## Sponsor plug: New Relic - Chase ##

New Relic browser / front end:

* how fast your pages load
* how fast your ajax calls are
* JS error tracking

interesting stuff we found:

* error messages are localized by the browser ("Syntax error" vs. "Erreur de syntaxe"), so the same error gets reported as different errors
* his site had no ajax, but there were a ton of AJAX errors
** what is this stuff?
** the majority are toolbars, malware, etc.
** browser extensions, google translate, etc.
** some are pretty nasty, "Skype click-to-call" got into an infinite loop and triggered tens of thousands of errors
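
One way to separate the extension / toolbar noise described above from a site's own errors is to look at where the failing script was loaded from. This is only a sketch, not New Relic's method: the function name and the `site_host` parameter are hypothetical, and it assumes each error report carries the URL of the script that threw.

```python
from urllib.parse import urlparse

# URL schemes that browsers use for scripts injected by extensions.
EXTENSION_SCHEMES = {"chrome-extension", "moz-extension", "safari-extension"}

def classify_error_source(script_url, site_host):
    """Return 'extension', 'first-party', 'third-party', or 'unknown'."""
    if not script_url:
        # Inline or injected scripts often arrive with no URL at all.
        return "unknown"
    parsed = urlparse(script_url)
    if parsed.scheme in EXTENSION_SCHEMES:
        return "extension"
    if parsed.hostname == site_host:
        return "first-party"
    return "third-party"
```

Grouping reports by this label would make it easier to see how much of the error volume comes from code the site does not control.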

## Sponsor plug: Elasticsearch - Rashid ##

* who uses ES? show of hands
* 70% use it vs. 30% don't (hmm... interesting..)

* i'm going to give a workshop on wednesday, so i'll demo a lot more then
* but if anyone has any questions, feel free to ask me now

* Q: why do we need log searching? why elasticsearch?
** A: a graph shows you when something might be wrong, but logs allow you to go back to the original event and see what exactly happened

* Q: what did you have for breakfast?
** A: yogurt, granola, melon

* Q: do you want to buy a musket?
** A: yes, to defend myself from the government

* Q: did you know you can 3d print a musket?
** A: yes, i'm terrified of this

* Q: does ZK cluster discovery work?
** A: haven't used it, but zen discovery works

* Q: can you talk about jepsen and ES?
** A: there's a recent blog post about it, it's a tough subject, distributed is hard, we don't have an answer for everything but we're doing pretty good

* Q: roadmap?
** A: for what?
* Q: kibana?
** A: will talk more on wed, better aggregations / facets, which are useful for turning logs into charts, "top N query" reduced from N queries to 1

* Q: when is ES going to learn how to reindex something something without something?
** A: push harder if you want this feature
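
The "top N query" point from the Kibana answer above can be illustrated with a terms aggregation, which asks Elasticsearch for the N most common values of a field in a single request instead of one request per value. This is a rough sketch of the request body only; the helper name, the aggregation name `top_values`, and the `host` field are assumptions, not taken from the talk.

```python
import json

def top_n_query(field, n):
    """Build a search body that returns the n most frequent values of `field`."""
    return {
        "size": 0,  # skip the matching documents, return only the aggregation
        "aggs": {
            "top_values": {
                "terms": {"field": field, "size": n},
            }
        },
    }

# Example: the 5 hosts that produced the most log lines (field name is hypothetical).
body = top_n_query("host", 5)
print(json.dumps(body, indent=2))
```

The body would be sent to an index's `_search` endpoint; the HTTP call is left out here.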

## Sponsor plug: Librato - Joe ##

* CTO of librato
* librato is a platform for storing, monitoring, and alerting on custom metrics
* composable monitoring system tailored to you
* in the past that meant building your own solution from scratch with a bunch of OSS

* librato lets you correlate arbitrary time series with each other
* marking events like deploys & config changes
* no proprietary agent, everything works over HTTP
* 80-100 products (middleware, web servers, databases, etc.) know how to speak to librato via opensource plugins
* if you can write to stdout, you can capture that log output and send to librato as metrics
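
The stdout point above could look roughly like the following: scan each log line for numeric `name=value` tokens and treat them as metric samples. The token format and the function name are assumptions made for illustration, and the step that actually submits the samples to librato is omitted because the notes do not describe it.

```python
def extract_metrics(line):
    """Pull numeric name=value pairs out of one log line."""
    metrics = {}
    for token in line.split():
        name, sep, raw = token.partition("=")
        if not sep or not name:
            continue  # not a name=value token
        try:
            metrics[name] = float(raw)
        except ValueError:
            continue  # value is not numeric, e.g. path=/home
    return metrics

# Example log line (hypothetical format):
print(extract_metrics("request done path=/home db.latency=12.5 status=200"))
```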

* new features:
** more integrations
** better alerts - tune the sensitivity of alerts using historical data
** better on-call information - associate URLs / documentation with alerts, find all previous occurrences of an alert
** "composite metrics" - custom query language to manipulate raw data, calculate ratios, aggregates (looks like graphite's URL/function interface)
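
As an illustration of what a "composite metric" such as a ratio involves, here is a small sketch that divides one time series by another at matching timestamps. It is plain Python, not librato's query language; the series layout (timestamp mapped to value) and the names are assumptions.

```python
def ratio_series(numerator, denominator):
    """Divide two {timestamp: value} series point by point.

    Only timestamps present in both series are kept, and points where the
    denominator is zero are dropped rather than raising an error.
    """
    result = {}
    for ts in sorted(numerator.keys() & denominator.keys()):
        if denominator[ts] != 0:
            result[ts] = numerator[ts] / denominator[ts]
    return result

# Example: error rate = errors / requests (sample data is made up).
errors = {0: 5, 60: 0, 120: 3}
requests = {0: 100, 60: 50, 180: 10}
print(ratio_series(errors, requests))
```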

## Sponsor plug: PagerDuty ##

* pagerduty sits between your monitoring systems and your on-call people
* we integrate with everyone
* we send SMS/email to the right person
* we take reliability seriously, full end-to-end tests
** we have 4 android phones in our lab constantly receiving texts to ensure deliverability!

new stuff:

* multi-user alerting
* on-call handoff notifications
* SSO
* outbound webhooks

multi-user alerting:

* we found this is a great way to do onboarding for new ops people
* put the new guy on-call alongside a veteran so they can get trained up in being on-call
* multi-user alerting is also good for higher levels of escalation
* for example if two people sleep through the alert, then set up your third escalation level to alert everyone instead of continuing to retry people one-by-one
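
The escalation example above can be sketched as a list of levels, where each level names everyone to page at once. This is a toy model under assumed names (`who_to_notify`, the policy layout, the responders), not PagerDuty's API or configuration format.

```python
def who_to_notify(levels, timed_out_levels):
    """Return the responders to page after `timed_out_levels` levels went unacknowledged.

    `levels` is a list of lists of responder names. Once the last level is
    reached, it keeps being used instead of wrapping around.
    """
    index = min(timed_out_levels, len(levels) - 1)
    return levels[index]

# Two single-person levels, then a third level that pages everyone together.
policy = [["alice"], ["bob"], ["alice", "bob", "carol", "dave"]]
print(who_to_notify(policy, 0))  # first page goes to alice only
print(who_to_notify(policy, 2))  # both slept through it, so page the whole team
```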

handoff notifications:

* notify by email, sms, and push when you go on or off call

outbound webhooks:

* now has integration with slack, hipchat, flowdock, etc.
* live demo of webhook FAILED, kinda awkward... lolz
* oh wait he just yelled from the crowd that it worked (sure it did)

## Sponsor plug: Dataloop.io - David ##

* lots of teams spend a lot of time building monitoring solutions using OSS
* but as soon as you try to get developers or QA to use it, you run into problems
* high learning curve, confusing documentation, difficult interfaces

* we want to un-silo the monitoring tools
* as we move to microservices, traditional monitoring gets more difficult

* we are building the monitoring tool for microservices
* easy to use
* flexibility of nagios / graphite, but with drag & drop
* easy to create alerts

* use existing nagios check scripts
* speaks graphite/statsd/carbon protocol
* create hierarchies with drag & drop
* use tags
* write plugins in any language

* another thing we do besides config is visualization
** nagios, collectd, and statsd all in one place
** create dashboards via drag & drop, resize
** send dashboard reports via email (good for weekly / monthly reports to management teams)
** embeddable widgets

* next, alerting:
** big feature is multiple triggers for alerts
** build context for your alerts
** condition A and condition B and condition C
** e.g. both web performance & service up/down check must trigger before alert goes off
** this decreases alert spam
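
The "condition A and condition B" idea above amounts to firing only when every trigger agrees. A minimal sketch, assuming each condition is a function over the latest readings; the metric names and thresholds are made up, and this is not Dataloop's rule syntax.

```python
def should_alert(conditions, readings):
    """Fire only if every condition holds for the current readings."""
    return all(check(readings) for check in conditions)

# Hypothetical triggers: slow responses AND a failing up/down check.
conditions = [
    lambda r: r["response_ms"] > 2000,
    lambda r: not r["service_up"],
]

print(should_alert(conditions, {"response_ms": 3500, "service_up": True}))
print(should_alert(conditions, {"response_ms": 3500, "service_up": False}))
```

A slow page alone does not alert here, which is the spam reduction the talk describes.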

* actions:
** email / SMS / phone
** send to jira
** trigger event handlers (any language)

* driven by API, command line tool, or github

* launching later this year, beta testing now

## Sponsor plug: Salesforce ##

no-show

## Sponsor plug: Puppet ##

* who doesn't know what puppet is?
* we have commercial & open source offerings

* who's coming to the puppet party tonight?
* it's really hard to get there, left then right

* we're hiring, a lot
* (scrolls through dozens of job listings)

* can everyone from puppet labs stand up?
* (like 20 people stood up)

* come to puppetconf in SF, september 20-24
* all kinds of presenters, lots of topics
* early bird pricing ends this month

## Sponsor plug: Pingdom ##

interesting numbers from our customers:

* 14 billion checks per month
* 9.4 million detected outages per month
* 8 million alerts sent per month
* total downtime comes to 500 million minutes, across 450k customers

* what can we do at pingdom to help with this?
* #1 most requested feature: alert management

new feature: BeepManager

* pingdom.com/beepmanager
* team members can customize their method of contact
* automated escalations
* integrate with other systems (nagios, new relic, rackspace cloud monitoring)
* alert flood protection
* access levels
* alert templates

* most important feature of a monitoring system is that it works for your team
* we are committed to making our tool work for your team

## Sponsor plug: Grok - Jared ##

* numenta.com/grok
* we do anomaly detection
* we've heard all about it these two days

* how do we solve it? science
* years of research, we've made some breakthroughs
* automatic & unsupervised machine learning on timeseries data
* open source at numenta.org

first product: grok

* mobile app
* automated model creation & monitoring for AWS instances
* showed some examples
* automatic anomaly detection in CPU load
* they used this to catch someone running manual builds on a build server
* required no setup / training
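
For readers unfamiliar with anomaly detection on a metric like CPU load, here is a deliberately simple stand-in: a rolling z-score that flags points far from the recent average. This is not Grok's algorithm (the talk describes unsupervised machine learning with no setup), and unlike Grok it needs a hand-picked `window` and `threshold`, both of which are assumptions here.

```python
from statistics import mean, pstdev

def find_anomalies(values, window=5, threshold=3.0):
    """Return indexes whose value is more than `threshold` standard
    deviations away from the mean of the preceding `window` points."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            continue  # flat history: no spread to compare against, so skip
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# Made-up CPU load samples with one spike (e.g. a manual build).
cpu = [10, 11, 10, 12, 11, 10, 95, 11]
print(find_anomalies(cpu))
```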

* free trial: simple to get running, 10 servers, no time limit

## Sponsor plug: BigPanda ##

* we launched our private beta yesterday

* we spend a lot of time tweaking tools, building thousands of alerts
* what do you use to manage your response to issues?
* jira, zendesk, email
* those tools are meant for humans
* they were not built for responding to tons of automatically created incidents, flapping alerts, etc.

* bigpanda is basically jira for ops

* live demo
* home page "OpsBox" shows all alerts
* UI should be very familiar to gmail users
* star alerts, mute alerts
* how do i rise above the noise of alerts?
* shows a timeline of alerts, when did it start warning, when did it reach critical, when did it go back to normal
* (pretty cool looking)
* shows a lot more data in context
* "Changes" view: event log of changes in your infrastructure

* we're already helping people today respond to alerts in a much more intelligent manner

## Sponsor plug: Datadog - Alexei ##

* cofounder and CTO of Datadog
* hosted monitoring service
* easily monitor from 5 to 50,000 hosts

* what have we been working on the past year?

* better graphs
* better visualizations, histograms
* better counts & counters
* heatmaps
* better alerts, more sophisticated alerting

* the ability to embed disturbing images into your dashboards (nicolas cage meme pics)

* more integrations: fastly, google cloud, slack, new relic, 50-60 integrations total

* monitoring is fun!
* who here has learned a lot these past two days? (everyone)
* who here wants to work on monitoring more? (still everyone)
* that's good news because we're hiring ha ha laffs