sinhan · June 4, 2017 23:57
diff --git a/gistfile1.txt b/gistfile1.txt
 OpenTSDB:
 Scales well, keeps all data, creating a metric is easy: OpenTSDB (vs Ganglia).
 Anomaly detection: Skyline and Oculus by Etsy. Icinga checks for metrics
 Open source and distributed, based on HBase/HDFS
 tcollector framework to collect and put data
 Decouples measurement from storage
 Very precise and collects trillions of data points. Never loses precision
 Very good for IoT, distributed systems
 Monitor: application performance, network performance, resource utilization
 FrontEnds: by Box, Ticketmaster
 How to guard against a chatty service? It is difficult. Prometheus is better (DigitalOcean)
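
To make "collect and put data" concrete, here is a minimal sketch of building one data point in OpenTSDB's telnet-style put format (put, metric, Unix timestamp, value, then tag=value pairs). The function name, metric name and tag values are made up for illustration and are not part of tcollector.

```python
import time


def format_put_line(metric, value, tags, timestamp=None):
    """Build one line in OpenTSDB's telnet-style `put` format.

    `format_put_line` is a hypothetical helper, not a tcollector API.
    """
    if not tags:
        # OpenTSDB requires at least one tag on every data point.
        raise ValueError("at least one tag is required")
    if timestamp is None:
        timestamp = int(time.time())
    # Sort tags so the same input always produces the same line.
    tag_text = " ".join(f"{key}={val}" for key, val in sorted(tags.items()))
    return f"put {metric} {timestamp} {value} {tag_text}"


# Illustrative metric and tags (assumed names).
line = format_put_line(
    "sys.cpu.user", 42.5, {"host": "web01", "cpu": "0"}, timestamp=1496620620
)
print(line)  # put sys.cpu.user 1496620620 42.5 cpu=0 host=web01
```

Measurement stays decoupled from storage here: the collector only formats lines, and whatever ships them to a TSD can change independently.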

 Prometheus + Grafana
 Add Vulcan
 Pull-based just like Nagios, but only collects time series data
 Uses TCP for pull rather than UDP for push
 Not an event-based monitoring system, nor does it store raw data
 With a pull-based approach, your monitoring system needs to know which service instances exist and how to connect to them
 To get high availability, pull allows you to just run two identically configured Prometheus servers in parallel
 Whether you pull or push, any time-series database will fall over if you send it more samples than it can handle. 
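
A rough sketch of the pull model described above, assuming each target serves plain "name value" lines in the text exposition format with no trailing timestamps. The target URLs and helper names are hypothetical. The scraper has to be handed the list of targets, which is the service discovery requirement noted above.

```python
from urllib.request import urlopen


def parse_samples(text):
    """Parse `name value` lines, skipping blank lines and # comments.

    Simplification (assumption): no optional timestamp after the value.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the last space so label values containing spaces survive.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


def http_fetch(url):
    """Default fetcher: plain HTTP GET over TCP."""
    with urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def scrape(targets, fetch=http_fetch):
    """Pull every known target once; a failed pull marks the target as down."""
    results = {}
    for url in targets:
        try:
            results[url] = parse_samples(fetch(url))
        except OSError:
            # Covers connection errors and timeouts from urllib.
            results[url] = None
    return results
```

Two identically configured scrapers calling scrape() with the same target list would each hold a full copy of the data, which is the high availability argument made above.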

 EFK (Elasticsearch, Fluentd, and Kibana)
 Both Fluentd and Logstash provide log forwarders and log shippers
 Log shippers are an essential component in modern DevOps, because logs are streams, not files
 Log forwarders send logging events to log shippers. A forwarder's goal is to send those events upstream as quickly as possible. Log shippers make delivery/routing decisions based upon the log event stream. A shipper may aggregate events, and/or send them to remote storage or analysis tools. 
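
The forwarder/shipper split can be sketched in a few lines. This is an in-memory toy, not how Fluentd or Logstash are implemented; the class names and the routing rule (errors to an alerts sink, everything else to storage) are assumptions for illustration.

```python
from collections import defaultdict, deque


class Shipper:
    """Receives events and decides where each one goes."""

    def __init__(self):
        self.destinations = defaultdict(list)

    def receive(self, event):
        # Routing decision based on the event itself (assumed rule):
        # errors go to an alerting sink, the rest to bulk storage.
        route = "alerts" if event.get("level") == "ERROR" else "storage"
        self.destinations[route].append(event)


class Forwarder:
    """Buffers local events and sends them upstream as quickly as possible."""

    def __init__(self, shipper):
        self.shipper = shipper
        self.buffer = deque()

    def emit(self, event):
        self.buffer.append(event)
        self.flush()

    def flush(self):
        # Drain the buffer in arrival order.
        while self.buffer:
            self.shipper.receive(self.buffer.popleft())


shipper = Shipper()
forwarder = Forwarder(shipper)
forwarder.emit({"level": "INFO", "msg": "started"})
forwarder.emit({"level": "ERROR", "msg": "db timeout"})
```

The forwarder makes no decisions at all, which keeps it cheap to run on every host; all policy lives in the shipper.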


 Both Kibana and Grafana are powerful visualization tools. However, the Grafana and InfluxDB combination is used for metric data whereas Kibana is part of the popular ELK Stack, which provides more flexibility when exploring log data.
 Both platforms are good options and can even sometimes complement each other. First, use Kibana to analyze your logs. Then, export the data into Grafana as the visualization layer. Both rely on the same Elasticsearch repository.



 Generate -> Collect -> Transport -> Store -> Analyze -> Alerts
 Generate: application logs, syslog, proxy and web servers
 Logging considerations:
 	- Logging means more code
 	- Logging is not free
 	- Consider feedback to the UI instead of logging
 	- The more you log, the less you can find
 	- Consider logging only the worst scenarios (log exceptions)
 	- Agree on levels like FATAL, ERROR, WARN, INFO, DEBUG, TRACE
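
As one possible reading of "log only the worst scenarios", the sketch below sets a logger's threshold to ERROR and logs exceptions with their traceback. The logger name and the function are hypothetical. Note that Python's standard logging module calls the top level CRITICAL (FATAL is an alias) and has no TRACE level.

```python
import io
import logging

# Capture output in memory so the example is self-contained.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))

logger = logging.getLogger("checkout")  # hypothetical component name
logger.setLevel(logging.ERROR)          # agreed threshold: ERROR and above
logger.addHandler(handler)
logger.propagate = False


def parse_quantity(raw):
    """Return the quantity as an int, or None if the input is invalid."""
    try:
        return int(raw)
    except ValueError:
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("invalid quantity %r", raw)
        return None


logger.debug("this debug line is dropped by the threshold")
parse_quantity("three")
```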

 Collect: stdout, files
 Transport: transporters and collectors, Logstash, Flume, Fluentd. Pull vs push for traffic

 Store: short- vs long-term storage, speed of data ingestion and retrieval, data access. Can be S3, Elasticsearch, Cassandra, HBase

 Analyze:
 	Batch processing: HDFS, Hive, Pig -> MapReduce
 	UI based: Kibana, Graylog
 Alerts: send out events based on patterns or calculated metrics.
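
A small sketch of an alert driven by a calculated metric: match a pattern in log lines, compute the error rate, and emit an event when it crosses a threshold. The pattern, the 5% threshold and the event shape are assumptions chosen for illustration.

```python
import re

# Assumed convention: error lines contain the word ERROR.
ERROR_PATTERN = re.compile(r"\bERROR\b")


def error_rate(lines):
    """Fraction of lines that match the error pattern."""
    lines = list(lines)
    if not lines:
        return 0.0
    errors = sum(1 for line in lines if ERROR_PATTERN.search(line))
    return errors / len(lines)


def evaluate(lines, threshold=0.05):
    """Return an alert event if the error rate exceeds the threshold, else None."""
    rate = error_rate(lines)
    if rate > threshold:
        return {"alert": "high_error_rate", "rate": rate, "threshold": threshold}
    return None


window = [
    "INFO request served",
    "ERROR upstream timeout",
    "INFO request served",
    "INFO request served",
]
event = evaluate(window)  # 1 error in 4 lines is above the 5% threshold
```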

 Logging is not monitoring
 Logging: recording to diagnose the system
 Monitoring: observation, checking and recording

 In a containerized world: label data at the source. Push and parse as soon as possible
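
Labeling at the source could look like the sketch below: each event is emitted as one JSON line that already carries its labels, so nothing downstream has to guess where it came from. The label names (service, container_id) are hypothetical.

```python
import json


def make_event(message, level, labels, timestamp):
    """Serialize one log event as a JSON line with its labels attached."""
    event = {
        "ts": timestamp,
        "level": level,
        "msg": message,
        # Labels are nested so they cannot collide with the core fields.
        "labels": dict(labels),
    }
    return json.dumps(event, sort_keys=True)


# Assumed labels; a real deployment would read these from its environment.
line = make_event(
    "cache warmed",
    "INFO",
    {"service": "catalog", "container_id": "abc123"},
    timestamp=1496620620,
)
print(line)
```

Because the line is already structured, the collector can parse it immediately instead of applying regexes later in the pipeline.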

 =========================================

 1. Monitor individual servers/VMs: sustained CPU utilization, load per CPU, memory consumption, heap, NTP offsets, disk usage, active connections, network traffic, swap usage. Using Nagios
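
A minimal sketch of a load-per-CPU check in the style of a Nagios plugin, which reports status through its exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and one line of output. The thresholds of 0.7 and 1.0 are assumptions, not recommended values.

```python
import os

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(load_per_cpu, warn=0.7, crit=1.0):
    """Map a load-per-CPU value to a Nagios status code and label."""
    if load_per_cpu >= crit:
        return CRITICAL, "CRITICAL"
    if load_per_cpu >= warn:
        return WARNING, "WARNING"
    return OK, "OK"


def check_load():
    """Return (exit_code, message) for the 1-minute load average per CPU."""
    try:
        one_minute = os.getloadavg()[0]
    except (AttributeError, OSError):
        # os.getloadavg is unavailable on some platforms.
        return UNKNOWN, "LOAD UNKNOWN - load average unavailable"
    per_cpu = one_minute / (os.cpu_count() or 1)
    code, label = classify(per_cpu)
    # Text after the pipe is performance data.
    return code, f"LOAD {label} - load per CPU is {per_cpu:.2f} | load_per_cpu={per_cpu:.2f}"


def main():
    code, message = check_load()
    print(message)
    return code

# A real plugin would finish with: sys.exit(main())
```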

 2. Monitor applications: using shell scripting, Nagios/Sensu, log aggregator, Splunk, code instrumentation (Byteman for Tomcat), verifying server processes are up

 	- Apache: logs, mod_status (number of incoming requests, CPU usage, server load, server uptime, total traffic, worker pool, idle vs active connections, number of threads)
 	- RabbitMQ: logs, rabbitmqctl/management/Nagios plugin, number of messages in a queue, acknowledged vs unacknowledged, queue timeouts, CPU/memory usage, publish rate, get rate, health status, open sockets, open files
 	- NGINX: logs, access_log directive, ngxtop, HttpStubStatusModule, Server Density: requests per second, number of connections, baseline traffic, uptime, CPU overload
 	- Node.js server: logs, response time of important services/APIs/webpages, transaction errors, free, used, and max heap, non-heap memory, garbage collection, total time spent actively executing in each event loop tick, event loop ticks per minute
 	- Redis: log, redis-cli, memory, max concurrent connections, cache hit ratio, evictions, expired objects
 	- Tomcat: logs, enable JMX, JVM heap and memory utilization, thread usage, request throughput, sessions, thread pool, garbage collection, active threads, active, expired and rejected HTTP sessions
 	- Code: instrumentation for trace
 	- JBoss: same as Tomcat above
 	- HAProxy: properly distributing traffic, error rate (per min), proxy status, request rate (per min), active servers, sessions active, sessions queued, frontend metrics, backend metrics, health metrics, session utilization, HTTP client and server errors, average backend response time, number of connection retries, connection failures, responses denied or failed, queue time (the check directive will detect health of backend and frontend servers)

 	- Load balancer: session utilization, latency, denials, queue length and queue time
 	- Oracle: number of connections to database
 	- MySQL: read/write requests, Uptime, Threads_connected, Max_used_connections, Aborted_connects, InnoDB deadlocks, slow query log, slave lag, full table scans
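
As an example of turning raw counters from the list above into a metric, the Redis cache hit ratio can be derived from the keyspace_hits and keyspace_misses fields of the INFO output. The sketch below parses INFO-style text; the sample values are invented, and in practice the text would come from redis-cli INFO or a client library.

```python
def parse_info(text):
    """Parse `key:value` lines from Redis INFO output, skipping section headers."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            stats[key] = value
    return stats


def cache_hit_ratio(stats):
    """hits / (hits + misses), or None when there have been no lookups yet."""
    hits = int(stats.get("keyspace_hits", 0))
    misses = int(stats.get("keyspace_misses", 0))
    total = hits + misses
    return hits / total if total else None


# Invented sample in the shape of the INFO "Stats" section.
sample = (
    "# Stats\r\n"
    "keyspace_hits:900\r\n"
    "keyspace_misses:100\r\n"
    "evicted_keys:3\r\n"
    "expired_keys:12\r\n"
)
stats = parse_info(sample)
ratio = cache_hit_ratio(stats)  # 900 / (900 + 100)
```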

 3. Monitor website:
 	- Whitebox monitoring: monitoring based on metrics exposed by the internals of the system, like
 		HTTP responses and error codes
 		Response time for services, APIs, webpages
 		Drops/spikes for different pages: Browse, Search, Products, Add to cart, Checkout
 		Drops and spikes for aggregation from DB: OPM
 	- Blackbox monitoring: testing externally visible behavior as a user would see it (user journey). Most important
 		Page load performance
 		Page view throughput
 		Browser traces
 	- Ping tests: geographic performance
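
A blackbox check can be as small as timing one request from the outside, the way a user would hit the page. The sketch below is a single-request probe under that assumption; it does not measure full page load or browser rendering, and the function names are hypothetical. The fetcher and clock are injectable so the probe can be exercised without a network.

```python
import time
from urllib.error import HTTPError
from urllib.request import urlopen


def http_get(url):
    """Fetch the URL and return the HTTP status code."""
    try:
        with urlopen(url, timeout=10) as resp:
            resp.read()
            return resp.status
    except HTTPError as exc:
        # urlopen raises on 4xx/5xx; the status code is still useful.
        return exc.code


def probe(url, fetch=http_get, clock=time.monotonic):
    """Time one request and report whether it looked healthy."""
    start = clock()
    try:
        status = fetch(url)
    except OSError as exc:
        # Connection refused, DNS failure, timeout and similar.
        return {"url": url, "ok": False, "status": None,
                "seconds": clock() - start, "error": str(exc)}
    return {"url": url, "ok": 200 <= status < 400, "status": status,
            "seconds": clock() - start}
```

Running the same probe from several regions would give the geographic view mentioned under ping tests.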