- scaling to many production environments
- metrics about deployments
- dashboards
- automating new env creation, no GUIs required
Deployments are currently initiated and managed using Bamboo's deployment plans. This works well with only a couple of environments, but will become increasingly difficult to manage once we have many customers. We currently need to create a new build plan for each production environment that we want Bamboo to deploy automatically. Although the deployment process is isolated to scripts in the infrastructure project, we still need to upload SSH keys to Bamboo and configure the build plan to run the scripts. This also means that any change to SSH keys, or adding or removing scripts, requires us to touch every build plan. That's something we can do with a few environments, but it would be terrible if we had 100+ customers. NO FIDDLING WITH GUIs.
Amazon's CodeDeploy looks really interesting and should tie in well with our infrastructure project. We can add a new project to an existing deployment group or create a new group. CodeDeploy can be integrated with GitHub and can use GitHub's branch CI statuses to determine when to deploy, so it can wait for master or staging to be green. We will need to spend some time investigating how CodeDeploy works, and possibly adjust our current deployment strategy to fit it.
Use something like Heaven to deploy our apps. The deployment server can receive a single API request from Bamboo and then start deploying to all environments that are set to track that branch. Bamboo only needs to run the CI, record the result, and trigger the deploy with a call to the deployer. Using a deployment server, we can track deployments much more easily and with far more metrics. This gives us a single place to make changes for deployments, and we never need to touch a GUI. It allows us to fully script the creation of an environment for a customer, which we cannot do today because Bamboo does not currently have an API for creating build plans. We can also make it easier to do custom things for "special" customers. Temporarily freezing a customer at a specific revision can be done in our own dashboard, instead of having a developer log into Bamboo and disable a script.
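As a rough sketch of the branch-tracking and freezing idea, the deployment server's core decision could look something like the following. The `Environment` fields, the `plan_deployments` function, and the sample environment names are all hypothetical and only illustrate the logic; they are not taken from Heaven or Bamboo.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Environment:
    """One customer environment known to the (hypothetical) deployment server."""
    name: str
    tracked_branch: str
    frozen_revision: Optional[str] = None  # set to pin a customer to one revision


def plan_deployments(environments, branch, revision):
    """Return (environment name, revision) pairs to deploy for a green build.

    Environments tracking a different branch are skipped, and frozen
    environments stay on their pinned revision.
    """
    plan = []
    for env in environments:
        if env.tracked_branch != branch:
            continue
        if env.frozen_revision is not None:
            continue
        plan.append((env.name, revision))
    return plan


# Assumed sample data: two customers tracking master (one frozen) plus staging.
environments = [
    Environment("customer-a", "master"),
    Environment("customer-b", "master", frozen_revision="abc123"),
    Environment("staging", "staging"),
]

print(plan_deployments(environments, "master", "def456"))
```

With this shape, freezing a customer is a data change on the environment record rather than a change to any build plan.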
- server level monitoring
- services level monitoring (DB, Work Queues)
- application level monitoring
- errors
- alerting
It's not in production until it's monitored!!!!
We have lots of options. New Relic supports server monitoring. AWS has CloudWatch. Nagios, or something more modern like Sensu, can also do this quite well. Whatever we use must be hooked up to something like PagerDuty so that we can easily change the on-call person. Sending an email to a distribution list leads to alert fatigue, and often to everyone ignoring the problem.
New Relic has plugins to monitor most of these, and can monitor back-end workers like Sidekiq. Another option is Nagios/Sensu, which can monitor almost anything. It is important that any tool we choose can alert us to queue length and unusual errors in the queues.
Far and away the easiest and best Rails app monitoring is New Relic. You can get a lot of performance timings and similar stats from your own StatsD/Graphite/Batsd setup, but you will lose things like slow transaction tracing. New Relic is not good at collecting many custom metrics, though, so you can't add as many stats as you would with a StatsD/Batsd implementation. New Relic also has SQL query performance timing and will suggest indexes, etc. There are alternative providers we could look into, such as Skylight or AppSignal, which both have more favorable pricing. Honeybadger has its own lightweight form of this in their error tracking product.
I recommend we use a service for this; Honeybadger is my current favorite. Sentry and other options are pretty solid for the price. Honeybadger actually has performance metrics for your app and, with its transaction tracing, could almost replace New Relic, but I've never tried using it for that. New Relic also has error capturing and reporting.
Continue using New Relic until it becomes cost prohibitive. At that point we can investigate other services or consider building our own. New Relic is very opinionated and supports Rails apps out of the box. It covers server, application, and error monitoring on its own. Yes, it's expensive, but we can reevaluate once the cost is very high or we understand our own needs better.
It is possible to build your own version of New Relic. You can utilize ActiveSupport::Notifications to record everything you want to know. You can use StatsD or even Splunk to create the statistics that you want, like p95, average, or median measurements. You can use Rails' own auto-explain logging to record slow queries.
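To make the p95/average/median idea concrete, here is a minimal sketch of the aggregation step, written in plain Python rather than against any real StatsD or Splunk API. It assumes request durations in milliseconds have already been collected (in a Rails app they would come from an instrumentation subscriber); the `percentile` and `summarize` helpers, the nearest-rank percentile definition, and the sample numbers are all illustrative choices.

```python
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100.

    Returns the smallest sample such that at least pct% of all samples
    are less than or equal to it.
    """
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def summarize(durations_ms):
    """Summarize a batch of request durations (milliseconds)."""
    return {
        "avg": statistics.mean(durations_ms),
        "median": statistics.median(durations_ms),
        "p95": percentile(durations_ms, 95),
    }


# Made-up sample: mostly fast requests with two slow outliers.
durations_ms = [12, 15, 11, 240, 14, 13, 16, 18, 12, 900]
print(summarize(durations_ms))
```

In this sample the two outliers pull the average far above the median, which is the reason to track p95 alongside the other two.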
We can also mix services in with our own monitoring, such as Honeybadger for errors.
Please, for the love of everything holy, do not send alerts to an email distribution list. Instead, send them to a service like PagerDuty. If it's too expensive we can use an open source tool that does a similar thing, like Flapjack, but other services such as New Relic and Honeybadger tend to be well integrated with PagerDuty.
This is vital for establishing an on-call rotation and for keeping emails from interrupting developers. Or worse, developers just start ignoring them.
- making it so anyone can deploy anything/anywhere
- automating everything
The goal of ChatOps is to make everything automated and open to everyone. If a PM wants to deploy code to an environment, they don't need to know anything but the command for our chatbot or how to use our dashboard UI. No logging into Bamboo, installing the correct SSH key, or setting up a project on your local machine. A sales rep wants to set up a demo environment? BOOM! Done. No more tickets for the developers to implement. This ties in pretty tightly with a deployment server; the current Bamboo method isn't going to work.
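As an illustration of how small the chatbot side could be, here is a sketch that turns a chat message into a deploy request a deployment server could accept. The command grammar (`deploy <app> [<branch>] to <environment>`), the `parse_deploy_command` function, and the default branch are all assumptions made for this example, not an existing bot's syntax.

```python
import re

# Hypothetical command grammar: "deploy <app> [<branch>] to <environment>"
DEPLOY_COMMAND = re.compile(
    r"^deploy\s+(?P<app>[\w-]+)"
    r"(?:\s+(?P<branch>[\w./-]+))?"
    r"\s+to\s+(?P<environment>[\w-]+)$"
)


def parse_deploy_command(message, default_branch="master"):
    """Return a deploy request dict, or None if the message is not a deploy command."""
    match = DEPLOY_COMMAND.match(message.strip())
    if match is None:
        return None
    return {
        "app": match.group("app"),
        # The branch is optional in the command; fall back to the default.
        "branch": match.group("branch") or default_branch,
        "environment": match.group("environment"),
    }


print(parse_deploy_command("deploy web to staging"))
print(parse_deploy_command("deploy web feature/login to customer-a"))
print(parse_deploy_command("what's for lunch?"))
```

The bot only needs to parse and forward the request; permissions, freezes, and the deploy itself stay in the deployment server.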
New Relic is expensive, and it charges per server. With one environment per customer, we will have lots of servers. Something like Honeybadger charges per application, and we currently have only four applications.
As part of ChatOps, we'll need to figure out the best way to configure everything on New Relic as well. When we change or create projects, New Relic might need to change too, for example when adding a new type of service.