@mikepea
Last active August 29, 2015 14:10

Revisions

  1. mikepea revised this gist Dec 3, 2014. 1 changed file with 77 additions and 0 deletions.
    77 changes: 77 additions & 0 deletions packmon_user_stories_1.md
    Original file line number Diff line number Diff line change
    @@ -17,6 +17,16 @@ So that I can fix bugs
    As a front-end webdev
    I would like to know the performance of our site from remote locations.
    So that we can figure out how to optimise it.
    As a front-end webdev
    I want to know what devices (screen sizes) are viewing the site
    So that we can support them correctly
    As a front-end webdev
    I want to know % of users that have javascript
    So that we can establish what to do when they don't.
    ```

    ### Back-end web developer
    @@ -33,6 +43,28 @@ So that I can be happy my application is functional
    As a back-end webdev
    I would like to be able to see all the logging related to a given web session
    So that I can trace the path through the backend systems.
    As a back-end webdev
    I want to know the scale at which my app is working via key facts like:
    * number of docs
    * queries against db
    * number of items in queue
    * number of users
    * key page views
    So that I know how it is growing and can plan accordingly.
    As a back-end webdev
    I want to know which of my DB queries are taking longer than n milliseconds
    So I can optimise them if possible.
    As a back-end webdev
    I want a series of smoke tests to be run after all deployments
    So that I know that I haven’t broken anything when deploying my application
    As a back-end webdev
    I want a simple way of instrumenting my application to feed metrics to the metrics system
    So that I can gain visibility of how my application is running in production
    And so we can find and fix problems with it quickly
    ```

    ### Product Owner
    @@ -50,6 +82,23 @@ As a product owner
    I would like to see that key functionality of my site is used
    So that I can be aware of trends in usability.
    As a product owner
    I would like the ability to see how releases affect user progress
    so I can learn from what we release and simplify the user journey.
    ```

    ### Tech Arch

    ```
    As a technical architect
    I would like the ability to graph releases alongside system errors
    so that I can see how a release affects the site's stability.
    As a technical architect
    I would like to see how site errors affect user progress
    so I can understand and communicate how release quality impacts user satisfaction.
    ```


    @@ -72,13 +121,31 @@ As a security analyst
    I would like the time on all log messages to be accurate to the millisecond and in UTC.
    So that I can plot the timeline of events accurately.
    As a security analyst
    I would like to monitor inappropriate access
    so that I can understand if attacks against our systems are increasing with time.
    ```



    ### Web operations

    ```
    As a webop
    I would like to NOT receive a message storm when my metrics collection system is down.
    So that I don't miss important alerts.
    As a webop
    I would like to be able to upgrade my metrics aggregation system without losing data.
    So that dependent monitoring systems don't fail.
    As a webop
    I would like to know when a system is on a trajectory for failure based on the history of its metrics (e.g. the root filesystem is filling up and, at the current rate, will be full in 3 days)
    So that I can fix it before it fails.
    As a webop
    I would like my metrics retrieval system to be reliable
    So that I can base my health check alerts on its data.
    @@ -127,6 +194,16 @@ As a webop
    I would like to be able to configure custom per-node/service thresholds for alerts
    So that I don't get hassled when the world changes.
    As a webop or developer
    I want to set up suitable notifications from the monitoring system, with a customisable endpoint, such as
    * Full escalation path alerting (e.g. PagerDuty)
    * Simple 'working hours' level serious alerts
    * Notification into a chatops room.
    So that I know about any issues as they happen, with a forewarning of severity.
    As an on-call webop or dev
    I would like to have a link to supporting documentation when I receive an alert
    So that I get a head start on how to fix or diagnose the problem.
    ```


  2. mikepea revised this gist Dec 2, 2014. 1 changed file with 94 additions and 22 deletions.
    116 changes: 94 additions & 22 deletions packmon_user_stories_1.md
    @@ -2,31 +2,55 @@ Monitoring Pack User Stories
    ===


    ### Web operations

    ### Front-end web developer

    ```
    As a webop
    I would like my metrics retrieval system to be reliable
    So that I can base my health check alerts on its data.
    As a front-end webdev
    I would like visibility of which browsers are in use
    So that I can convince my product owner to stop supporting crap browsers or devices
    As a webop
    I would like to be able to easily inspect my graph data
    So that I can easily review the overall performance of a cluster.
    As a front-end webdev
    I would like to see exceptions that are logged from my code
    So that I can fix bugs
    As a webop
    I would like to be able to construct my own queries against metrics data
    So that I can investigate the data available
    As a front-end webdev
    I would like to know the performance of our site from remote locations.
    So that we can figure out how to optimise it.
    ```

    As a webop
    I would like to be able to create new alerts and checks quickly and easily
    So I can quickly react to new issues discovered
    ### Back-end web developer

    As a webop
    I would like
    So I can
    ```
    As a back-end webdev
    I would like to be able to instrument my code and visualise the data
    So that I can see which parts need work
    As a back-end webdev
    I would like to know that all backend systems are operational.
    So that I can be happy my application is functional
    As a back-end webdev
    I would like to be able to see all the logging related to a given web session
    So that I can trace the path through the backend systems.
    ```

    ### Product Owner

    ```
    As a product owner
    I would like visibility of the dashboards that my team use to ascertain site health
    So that I can also be happy that the site is performant
    As a product owner
    I would like to know that end-users are happy
    So that I can dance for joy
    As a product owner
    I would like to see that key functionality of my site is used
    So that I can be aware of trends in usability.
    ```


    ### Security analyst
    @@ -47,15 +71,63 @@ So that I can be aware of intrusions and unauthorised access.
    As a security analyst
    I would like the time on all log messages to be accurate to the millisecond and in UTC.
    So that I can plot the timeline of events accurately.
    ```

    ### Product Owner


    ### Web operations

    ```
    As a product owner
    I would like visibility of the dashboards that my team use to ascertain site health
    So that I can also be happy that the site is performant
    As a webop
    I would like my metrics retrieval system to be reliable
    So that I can base my health check alerts on its data.
    As a webop
    I would like to be able to easily inspect my graph data
    So that I can easily review the overall performance of a cluster.
    As a webop
    I would like to be able to construct and share my own queries against metrics data
    So that I can investigate the data available
    As a webop
    I would like to be able to create new alerts and checks quickly and easily
    So I can quickly react to new issues discovered
    As a webop
    I would like to know the cache-hit ratio of my kernel fscache
    So I can tune memory and backup timing for my applications.
    As a webop
    I would like visibility of key performance related kernel parameters
    So I can see if changes in them affect system performance
    As a webop
    I would like visibility of key events in my system, such as reboots, deployments
    So that I can see these markers on all graphs.
    As a webop
    I would like to be alerted when a metric has crossed a threshold
    So I can examine the data and work out if it is a problem
    As a webop
    I would like to be provided with a link to the problem metric graph in an alert
    So that I can inspect it myself.
    As a webop
    I would like to be alerted when a metric has started to behave abnormally
    So I can see if it is a problem.
    As a webop
    I would like to be able to acknowledge or silence alerts that I know are not a problem
    So that I don't get hassled.
    As a webop
    I would like to be able to configure custom per-node/service thresholds for alerts
    So that I don't get hassled when the world changes.
    As a product owner
    I would like to know that
    ```



  3. mikepea created this gist Dec 1, 2014.
    61 changes: 61 additions & 0 deletions packmon_user_stories_1.md
    @@ -0,0 +1,61 @@
    Monitoring Pack User Stories
    ===


    ### Web operations

    ```
    As a webop
    I would like my metrics retrieval system to be reliable
    So that I can base my health check alerts on its data.
    As a webop
    I would like to be able to easily inspect my graph data
    So that I can easily review the overall performance of a cluster.
    As a webop
    I would like to be able to construct my own queries against metrics data
    So that I can investigate the data available
    As a webop
    I would like to be able to create new alerts and checks quickly and easily
    So I can quickly react to new issues discovered
    As a webop
    I would like
    So I can
    ```




    ### Security analyst

    ```
    As a security analyst
    I would like the data in the logging system to be encrypted at rest and in transit
    So that we do not leak information
    As a security analyst
    I would like access to the data in the logging system to be authenticated and recorded.
    So that we have awareness of who can view it and when they do.
    As a security analyst
    I would like to be able to view activity on all systems
    So that I can be aware of intrusions and unauthorised access.
    As a security analyst
    I would like the time on all log messages to be accurate to the millisecond and in UTC.
    So that I can plot the timeline of events accurately.
    ```

    ### Product Owner

    ```
    As a product owner
    I would like visibility of the dashboards that my team use to ascertain site health
    So that I can also be happy that the site is performant
    As a product owner
    I would like to know that
    ```