Last active
August 29, 2015 14:10
Revisions
mikepea revised this gist
Dec 3, 2014 . 1 changed file with 77 additions and 0 deletions.
@@ -17,6 +17,16 @@ So that I can fix bugs

As a front-end webdev
I would like to know the performance of our site from remote locations.
So that we can figure out how to optimise it.

As a front-end webdev
I want to know what devices (screen sizes) are viewing the site
So that we can support them correctly

As a front-end webdev
I want to know % of users that have javascript
So that we can establish what to do when they don't.
```

### Back-end web developer

@@ -33,6 +43,28 @@ So that I can be happy my application is functional

As a back-end webdev
I would like to be able to see all the logging related to a given web session
So that I can trace the path through the backend systems.

As a back-end webdev
I want to know the scale at which my app is working via key facts like:
* number of docs
* queries against db
* number of items in queue
* number of users
* key page views
So that I know how it is growing and can plan accordingly.

As a back-end webdev
I want to know which of my DB queues are taking longer than n milliseconds
So I can optimise them if possible.

As a back-end webdev
I want a series of smoke tests to be run after all deployments
So that I know that I haven’t broken anything when deploying my application

As a back-end webdev
I want a simple way of instrumenting my application to feed metrics to the metrics system
So that I can gain visibility of how my application is running in production
And so we can find and fix problems with it quickly
```

### Product Owner

@@ -50,6 +82,23 @@ As a product owner
I would like to see that key functionality of my site is used
So that I can be aware of trends in usability.

As a product owner
I would like the ability to see how releases affect user progress
so I can learn from what we release and simplify the user journey.
```

### Tech Arch

```
As a technical architect
I would like the ability to graph releases alongside system errors
so that I can see how a release affects the site's stability.

As a technical architect,
I would like to see how site errors affect user progress
so I can understand and communicate how release quality impacts user satisfaction.
```

@@ -72,13 +121,31 @@ As a security analyst
I would like the time on all log messages to be accurate to the millisecond and in UTC.
So that I can plot the timeline of events accurately.

As a security analyst
I would like to monitor inappropriate access
so that I can understand if attacks against our systems are increasing with time.
```

### Web operations

```
As a webop
I would like to NOT receive a message storm when my metrics collection system is down.
So that I don't miss important alerts.

As a webop
I would like to be able to upgrade my metrics aggregation system without losing data.
So that dependent monitoring systems don't fail.

As a webop
I would like to know when a system is on trajectory for failure based on the history of its metrics (eg root filesystem filling up, at current rate will fill in 3 days)
So that I can fix it before it fails.

As a webop
I would like my metrics retrieval system to be reliable
So that I can base my health check alerts on its data.

@@ -127,6 +194,16 @@ As a webop
I would like to be able to configure custom per-node/service thresholds for alerts
So that I don't get hassled when the world changes.

As a webop or developer
I want to set up suitable notifications from the monitoring system, with a customisable endpoint, such as
* Full escalation path alerting (eg PagerDuty)
* Simple 'working hours' level serious alerts
* Notification into a chatops room.
So that I know about any issues as they happen, with a forewarning of severity.

As an on-call webop or dev
I would like to have a link to supporting documentation when I receive an alert
So that I get a head start on how to fix or diagnose the problem.
```
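The "on trajectory for failure" story in this revision (root filesystem filling up at the current rate) can be illustrated with a straight-line fit through recent usage samples, extrapolated to the point where the filesystem is full. This is only a minimal sketch under stated assumptions: the gist does not prescribe a forecasting method, and the function name `hours_until_full` and the `(hours, used_fraction)` sample format are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def hours_until_full(samples: Sequence[Tuple[float, float]],
                     capacity: float = 1.0) -> Optional[float]:
    """Estimate hours until usage reaches capacity (hypothetical helper).

    samples: (hours_since_start, used_fraction) pairs, oldest first.
    Returns None when there are too few samples, all timestamps are
    identical, or usage is flat or shrinking.
    """
    n = len(samples)
    if n < 2:
        return None

    # Ordinary least-squares fit of used_fraction against time.
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    if slope <= 0:
        return None  # not growing, so no projected fill time

    # Extrapolate from the fitted usage at the most recent sample.
    intercept = mean_u - slope * mean_t
    current = intercept + slope * samples[-1][0]
    return max((capacity - current) / slope, 0.0)
```

For example, samples of 40%, 50% and 60% used taken 24 hours apart give an estimate of roughly 96 hours, which an alerting rule could compare against a lead time such as the three days mentioned in the story. A straight line is a deliberate simplification; bursty or seasonal growth would need a different model.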
mikepea revised this gist
Dec 2, 2014 . 1 changed file with 94 additions and 22 deletions.
@@ -2,31 +2,55 @@ Monitoring Pack User Stories
===

### Front-end web developer

```
As a front-end webdev
I would like visibility of which browsers are in use
So that I can convince my product owner to stop supporting crap browsers or devices

As a front-end webdev
I would like to see exceptions that are logged from my code
So that I can fix bugs

As a front-end webdev
I would like to know the performance of our site from remote locations.
So that we can figure out how to optimise it.
```

### Back-end web developer

```
As a back-end webdev
I would like to be able to instrument my code and visualise the data
So that I can see which parts need work

As a back-end webdev
I would like to know that all backend systems are operational.
So that I can be happy my application is functional

As a back-end webdev
I would like to be able to see all the logging related to a given web session
So that I can trace the path through the backend systems.
```

### Product Owner

```
As a product owner
I would like visibility of the dashboards that my team use to ascertain site health
So that I can also be happy that the site is performant

As a product owner
I would like to know that end-users are happy
So that I can dance for joy

As a product owner
I would like to see that key functionality of my site is used
So that I can be aware of trends in usability.
```

### Security analyst

@@ -47,15 +71,63 @@ So that I can be aware of intrusions and unauthorised access.

As a security analyst
I would like the time on all log messages to be accurate to the millisecond and in UTC.
So that I can plot the timeline of events accurately.
```

### Web operations

```
As a webop
I would like my metrics retrieval system to be reliable
So that I can base my health check alerts on its data.

As a webop
I would like to be able to easily inspect my graph data
So that I can easily review the overall performance of a cluster.

As a webop
I would like to be able to construct and share my own queries against metrics data
So that I can investigate the data available

As a webop
I would like to be able to create new alerts and checks quickly and easily
So I can quickly react to new issues discovered

As a webop
I would like to know the cache-hit ratio of my kernel fscache
So I can tune memory and backup timing for my applications.

As a webop
I would like visibility of key performance related kernel parameters
So I can see if changes in them affect system performance

As a webop
I would like visibility of key events in my system, such as reboots, deployments
So that I can see these markers on all graphs.

As a webop
I would like to be alerted when a metric has crossed a threshold
So I can examine the data and work out if it is a problem

As a webop
I would like to be provided with a link to the problem metric graph in an alert
So that I can inspect it myself.

As a webop
I would like to be alerted when a metric has started to behave abnormally
So I can see if it is a problem.

As a webop
I would like to be able to acknowledge or silence alerts that I know are not a problem
So that I don't get hassled.

As a webop
I would like to be able to configure custom per-node/service thresholds for alerts
So that I don't get hassled when the world changes.
```
mikepea created this gist
Dec 1, 2014
@@ -0,0 +1,61 @@
Monitoring Pack User Stories
===

### Web operations

```
As a webop
I would like my metrics retrieval system to be reliable
So that I can base my health check alerts on its data.

As a webop
I would like to be able to easily inspect my graph data
So that I can easily review the overall performance of a cluster.

As a webop
I would like to be able to construct my own queries against metrics data
So that I can investigate the data available

As a webop
I would like to be able to create new alerts and checks quickly and easily
So I can quickly react to new issues discovered

As a webop
I would like
So I can
```

### Security analyst

```
As a security analyst
I would like the data in the logging system to be encrypted at rest and in transit
So that we do not leak information

As a security analyst
I would like access to the data in the logging system to be authenticated and recorded.
So that we have awareness of who can view it and when they do.

As a security analyst
I would like to be able to view activity on all systems
So that I can be aware of intrusions and unauthorised access.

As a security analyst
I would like the time on all log messages to be accurate to the millisecond and in UTC.
So that I can plot the timeline of events accurately.
```

### Product Owner

```
As a product owner
I would like visibility of the dashboards that my team use to ascertain site health
So that I can also be happy that the site is performant

As a product owner
I would like to know that
```
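The security analyst story about log times being in UTC with millisecond resolution could be met on the application side with a formatter along these lines. This is a hedged sketch using Python's standard `logging` module, not something the gist specifies: the class name `UTCMillisFormatter`, the helper `build_formatter` and the chosen field layout are assumptions, and it only controls how timestamps are rendered. Real accuracy still depends on host clock synchronisation.

```python
import logging
import time


class UTCMillisFormatter(logging.Formatter):
    """Formatter that renders %(asctime)s in UTC (hypothetical name)."""

    # logging.Formatter calls self.converter(record.created);
    # time.gmtime makes the result UTC instead of local time.
    converter = time.gmtime


def build_formatter() -> logging.Formatter:
    # %(msecs)03d appends the millisecond part of the record's timestamp;
    # the trailing "Z" marks the time as UTC.
    return UTCMillisFormatter(
        fmt="%(asctime)s.%(msecs)03dZ %(levelname)s %(name)s %(message)s",
        datefmt="%Y-%m-%dT%H:%M:%S",
    )


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(build_formatter())
    logger = logging.getLogger("app")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.info("user session started")
```

Emitting every service's logs in one sortable UTC format is what makes it practical to merge them into a single timeline of events.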