best talks day 1:

* Please, no More Minutes, Milliseconds, Monoliths... or Monitoring Tools! - Adrian Cockcroft
    * gave 5 good rules for monitoring systems and showed what cloud / microservices monitoring looks like at Netflix
* Simple math to get some signal out of your noisy sea of data - Toufic Boubez
    * explained why static alert thresholds don't work and gave 3 techniques to use instead
* Car Alarms and Smoke Alarms - Dan Slimmon
    * how to use sensitivity and specificity to evaluate monitoring checks; some good math
* Metrics 2.0 - Dieter Plaetinck
    * metrics20 = a redesign of Graphite that fixes a bunch of its problems; keep an eye on this project
* StatsG at New York Times - Eric Buth
    * the first half of the talk, on ops philosophy, was really interesting; the second half, about StatsG, was less useful

best talks day 2:

* "Auditing all the things": The future of smarter monitoring and detection - Jen Andre
    * really awesome security talk with lots of good practical steps for us
* Is There An Echo In Here?: Applying Audio DSP algorithms to monitoring - Noah Kantrowitz
    * showed how to apply audio processing techniques to monitoring data; good math, very interesting

recurring themes / big takeaways:

* monitoring must scale ahead of the underlying system
* use high-frequency monitoring; waiting minutes for a check result or alert is bad
* collect data on everything using Graphite
* only alert when work isn't getting done; RAM / swap / CPU / etc. are not things you should alert on
* manually watching graphs & dashboards doesn't scale
* start using anomaly detection
* static thresholds do not work for data from the data center
* do more analysis to understand your data: scatterplots, histograms, distributions, correlations, etc.
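The sensitivity / specificity framing from the Car Alarms and Smoke Alarms talk can be made concrete with Bayes' rule. A minimal sketch, assuming we care about positive predictive value (the fraction of alerts that point at a real problem); the function name and the example numbers are hypothetical illustrations, not figures from the talk:

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Probability that an alert corresponds to a real problem (Bayes' rule).

    sensitivity: P(alert | problem)
    specificity: P(no alert | no problem)
    prevalence:  P(problem) in any given check interval
    """
    true_alerts = sensitivity * prevalence
    false_alerts = (1.0 - specificity) * (1.0 - prevalence)
    return true_alerts / (true_alerts + false_alerts)


if __name__ == "__main__":
    # Hypothetical numbers: a check that is 99% sensitive and 99% specific,
    # watching for a problem present in 0.1% of check intervals.
    print(f"{positive_predictive_value(0.99, 0.99, 0.001):.3f}")
```

With these made-up inputs only about 9% of alerts are real, even though the check looks "99% accurate" on both axes; when problems are rare, false alarms dominate unless specificity is very high.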
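As one illustration of replacing a static threshold with something that adapts to the data, here is a minimal sketch of a rolling median / median absolute deviation (MAD) detector. This is a generic robust-statistics technique chosen for illustration, not necessarily one of the 3 techniques from the Boubez talk; `mad_anomalies`, `window` and `k` are hypothetical names and defaults:

```python
from statistics import median


def mad_anomalies(values, window=30, k=5.0):
    """Return indices of points that sit far from the recent median.

    Each point is compared with the median of the previous `window` points,
    scaled by their median absolute deviation (MAD), instead of a fixed limit.
    """
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        center = median(history)
        mad = median(abs(x - center) for x in history)
        if mad == 0:
            # Perfectly flat history: treat any change as a deviation.
            if values[i] != center:
                flagged.append(i)
            continue
        # 1.4826 scales MAD to match the standard deviation for normal data.
        score = abs(values[i] - center) / (1.4826 * mad)
        if score > k:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Synthetic series: a small repeating wiggle around 100 with one spike.
    series = [100 + (i % 5) for i in range(60)]
    series[45] = 200
    print(mad_anomalies(series))
```

Median and MAD are used instead of mean and standard deviation because a single spike barely moves them, so one outlier does not inflate the baseline for the points that follow; `window` and `k` would need tuning per metric.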