Notes by @veekaybee, created February 16, 2023.
    ## [Systems Performance 2nd edition](https://www.brendangregg.com/blog/2020-07-15/systems-performance-2nd-edition.html)

    See [synthesized write-up here](https://vickiboykis.com/2022/12/05/the-cloudy-layers-of-modern-day-programming/)

+ Do a quick performance check in 60 seconds
+ Use a number of different tools available in Unix
+ Use flame graphs of the call stack if you have access to them
+ The best performance wins come from eliminating unnecessary work, for example a thread stuck spinning in a loop, or eliminating bad config
+ Mantras: don't do it (eliminate); don't do it again (caching); do it less (e.g. reduce polling); do it when they're not looking; do it concurrently; do it more cheaply

    + **Latency** is an essential performance metric - the time for an operation to complete
    - Operation request
    - Database query
    - File system operation
- We can improve latency by reducing disk reads, for example via caching
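As a minimal sketch of measuring the latency of a single file system operation from the shell (assuming GNU `date` with nanosecond `%N` support; the file read is just a stand-in operation):

```
# Time one file system operation in microseconds
start=$(date +%s%N)
cat /etc/passwd > /dev/null   # the operation being measured
end=$(date +%s%N)
echo "latency: $(( (end - start) / 1000 )) us"
```

In practice you would sample many operations and look at the distribution (percentiles), not a single measurement.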

    ## Actionable Chain of Events

    `Counter --> Statistics --> Metrics --> Alerts`

Profiling tools allow us to take sampled measurements of CPU activity, [including flamegraphs](https://www.brendangregg.com/flamegraphs.html), which show us the CPU footprint of each code path.

    ![](https://camo.githubusercontent.com/eecfbf00e6cc5baf6ae2b66283573d765f8fe29f1d3df10f4ce3423d942c0af3/687474703a2f2f7777772e6272656e64616e67726567672e636f6d2f466c616d654772617068732f6370752d626173682d666c616d6567726170682e737667)
The x-axis shows the stack profile population, sorted alphabetically (it is not the passage of time), and the y-axis shows stack depth, counting from zero at the bottom. Each rectangle represents a stack frame. The wider a frame is, the more often it was present in the stacks. The top edge shows what is on-CPU, and beneath it is its ancestry. Original flame graphs use random colors to help visually differentiate adjacent frames. Variations include inverting the y-axis (an "icicle graph"), changing the hue to indicate code type, and using a color spectrum to convey an additional dimension.
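As a sketch of where those frame widths come from: a real profile would come from something like `perf record -F 99 -g`, collapsed into "folded stack" lines by Gregg's `stackcollapse-perf.pl` and rendered by `flamegraph.pl`. The sample data and file path below are made up; each folded line carries a `;`-separated stack and a sample count, and a frame's width is the sum of counts of all stacks containing it:

```
# Toy folded-stack samples in the format flamegraph.pl consumes
cat > /tmp/stacks.folded <<'EOF'
main;parse;read_file 10
main;parse 5
main;render 20
EOF
# Sum samples per root frame; this total is what sets a frame's width
awk '{ n = $NF; sub(/;.*/, "", $1); total[$1] += n }
     END { for (f in total) print f, total[f] }' /tmp/stacks.folded
# prints "main 35"
```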

    Tracing - Event-based recording where data is saved for later analysis.

    ## Linux 60-second checklist

    Also here: [if you only have a bit of time to profile your system.](https://www.brendangregg.com/Articles/Netflix_Linux_Perf_Analysis_60s.pdf)

In 60 seconds you can get a high-level idea of system resource usage and running processes by running the
following ten commands. Look for errors and saturation metrics first, as they are both easy to interpret, and then
check resource utilization. Saturation is where a resource has more load than it can handle; it can show up
either as the length of a request queue or as time spent waiting.
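Saturation can be spot-checked from the shell. One rough, Linux-specific signal is a load average persistently above the CPU count, which suggests runnable work is queueing:

```
# Rough CPU saturation signal (Linux): compare the 1-minute load average
# against the number of CPUs; sustained load above the CPU count suggests queueing
cpus=$(nproc)
load1=$(cut -d' ' -f1 /proc/loadavg)
echo "1-min load: $load1, CPUs: $cpus"
```

This is only a hint: Linux load averages also include tasks in uninterruptible sleep (often disk I/O), so confirm with `vmstat`'s run-queue column before concluding the CPUs are saturated.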

Don't use only `top` just because it's the tool you know; that creates a streetlight effect.

```
uptime                # load averages: 1-, 5-, and 15-minute
dmesg | tail          # recent kernel messages, including errors
vmstat 1              # run queue, memory, swap, and CPU summary per second
mpstat -P ALL 1       # per-CPU utilization: look for imbalance
pidstat 1             # per-process CPU usage
iostat -xz 1          # disk I/O: IOPS, throughput, await, %util
free -m               # memory usage, including buffers/cache
sar -n DEV 1          # network device throughput
sar -n TCP,ETCP 1     # TCP connections and retransmits
top                   # rolling overview of the above
```

    ## High-level terminology

+ IOPS - input/output operations per second, a measure of data transfer rate
+ Latency - measure of time an operation spends waiting
+ Saturation - degree to which a resource has queued work it cannot yet service
+ Hit ratio - number of times needed data is found in the cache versus total accesses (hits + misses)
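As a quick arithmetic check of the hit-ratio definition (the hit and miss counts below are made up):

```
# Hit ratio = hits / (hits + misses)
hits=950; misses=50
awk -v h="$hits" -v m="$misses" 'BEGIN { printf "hit ratio: %.2f\n", h / (h + m) }'
# prints "hit ratio: 0.95"
```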

    ## Performance tradeoffs:

    ```
Good -- Fast -- Cheap : pick two
(high performance -- on time -- inexpensive)
    ```

File system record size: small records perform better for random I/O workloads; larger record sizes improve streaming workloads

    ## Types of caches

    + Performance tuning is most effective when done closest to the work performed

+ **MRU** - most recently used
+ **LRU** - least recently used
+ **MFU** - most frequently used
+ **LFU** - least frequently used

**Cold cache** - empty, or populated with unwanted data. Hit ratio is zero (or near zero) as it begins to warm up.
**Warm cache** - populated with useful data, but the hit ratio isn't yet large enough for the cache to be considered hot

    ```
    Cold --> Warm --> Hot
    Ratio improving
    ```
Cache tuning: aim to cache as high in the stack as possible, closer to where the work is performed, which directly reduces the operational overhead of cache hits.

p. 61: [Performance Mantras](https://www.brendangregg.com/methodology.html)

    ```
    State the goals of the study and define system boundaries
    List system services and possible outcomes
    Select performance metrics
    List system and workload parameters
    Select factors and their values
    Select the workload
    Design the experiments
    Analyze and interpret the data
    Present the results
    If necessary, start over
    ```

    ## Disk Utilization (p. 65)

    Disk utilization can become a problem even before it hits 100%. To find the bottleneck:

1. Measure the rate of server requests, and monitor this rate over time
2. Measure hardware and software resource usage
3. Express server requests in terms of resources used
4. Extrapolate server requests for each resource
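Step 4 can be sketched with made-up numbers: if the current request rate drives the disk to a known utilization, you can extrapolate the request rate at which the disk becomes the bottleneck (assuming utilization scales roughly linearly with load):

```
# Made-up numbers: 1000 req/s currently drive the disk to 40% utilization,
# so requests can grow by about 100/40 = 2.5x before the disk saturates
awk 'BEGIN { rate = 1000; disk_util_pct = 40; printf "max req/s: %d\n", rate * 100 / disk_util_pct }'
# prints "max req/s: 2500"
```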

    Constraints:

**Hardware:**
    + CPU Utilization
    + Memory Usage
    + Disk IOPS
    + Disk Throughput
    + Disk Capacity

**Software:**
+ Virtual memory usage
+ Processes/tasks
+ File descriptors

Sharding - a common strategy for databases where data is split into logical components, each managed by its own database
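A toy sketch of shard routing (the key and shard count are hypothetical): hash the key and take it modulo the number of shards to pick which database owns it:

```
# Hypothetical shard routing: hash a key to one of N database shards
key="user:42"
shards=4
h=$(printf '%s' "$key" | cksum | cut -d' ' -f1)   # cksum gives a stable CRC
echo "shard: $(( h % shards ))"
```

Modulo routing is simple but reshuffles most keys when the shard count changes; real systems often use consistent hashing or range-based partitioning instead.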

p. 106 - CPU-bound versus I/O-bound:
+ CPU-bound: performing heavy compute, such as scientific and mathematical workloads
+ I/O-bound: performing I/O, such as web servers and file servers, where low latency is important