Created by veekaybee on February 16, 2023 19:38
## [Systems Performance 2nd edition](https://www.brendangregg.com/blog/2020-07-15/systems-performance-2nd-edition.html)

See [synthesized write-up here](https://vickiboykis.com/2022/12/05/the-cloudy-layers-of-modern-day-programming/)

+ Do a quick performance check in 60 seconds
+ Use the many different tools available in Unix
+ Use flame graphs of the call stack if you have access to them
+ The biggest performance wins come from eliminating unnecessary work, for example a thread stuck in a loop, or from eliminating bad configuration
+ Mantras: don't do it (eliminate); do it, but don't do it again (caching); do it less (e.g., less frequent polling); do it when they're not looking; do it concurrently; do it more cheaply
+ **Latency** is an essential performance metric: the time for an operation to complete, such as:
  - an operation request
  - a database query
  - a file system operation
+ We can improve latency by reducing disk reads, e.g. through caching

## Actionable Chain of Events

`Counter --> Statistics --> Metrics --> Alerts`

Profiling tools let us take simple measurements of CPU usage, [including flame graphs](https://www.brendangregg.com/flamegraphs.html), which show where code spends its time on-CPU.

The x-axis shows the stack profile population, sorted alphabetically (it is not the passage of time), and the y-axis shows stack depth, counting from zero at the bottom. Each rectangle represents a stack frame. The wider a frame is, the more often it was present in the stacks. The top edge shows what is on-CPU, and beneath it is its ancestry. Original flame graphs use random colors to help visually differentiate adjacent frames.
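The frame-width rule above can be sketched in a few lines of Python. This minimal example assumes a hypothetical list of sampled stacks, with the root frame first and invented function names. It collapses them into the semicolon-separated "folded" lines (e.g. `main;parse;read_file 2`) that flame graph tools such as `flamegraph.pl` take as input. It then computes each frame's width as the fraction of samples that contain it. It only illustrates the counting and does not draw anything.

```python
from collections import Counter

# Hypothetical sampled call stacks (root frame first), as a profiler might
# collect them. Function names are invented for illustration.
samples = [
    ("main", "parse", "read_file"),
    ("main", "parse", "read_file"),
    ("main", "parse", "tokenize"),
    ("main", "compute", "matmul"),
    ("main", "compute", "matmul"),
    ("main", "compute", "matmul"),
    ("main", "compute", "reduce"),
    ("main", "log"),
]


def fold(stacks):
    """Collapse identical stacks into 'frame;frame;frame count' lines."""
    counts = Counter(";".join(s) for s in stacks)
    # Sort alphabetically, like the flame graph x-axis (which is not time).
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


def frame_widths(stacks):
    """Width of a frame = fraction of samples containing that frame
    with the same ancestry (the same stack prefix)."""
    total = len(stacks)
    widths = Counter()
    for s in stacks:
        for depth in range(1, len(s) + 1):
            widths[";".join(s[:depth])] += 1
    return {prefix: n / total for prefix, n in widths.items()}


for line in fold(samples):
    print(line)

for prefix, w in sorted(frame_widths(samples).items()):
    print(f"{w:6.1%}  {prefix}")
```

Here `main` spans 100% of the width because every sample contains it. `main;compute` spans half, because half of the samples passed through it. A real profile would come from a sampler such as Linux `perf`, whose output is converted to this folded form before rendering.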
Flame graph variations include inverting the y-axis (an "icicle graph"), changing the hue to indicate code type, and using a color spectrum to convey an additional dimension.

**Tracing** - event-based recording, where data is saved for later analysis.

## Linux 60-second checklist

Also here: [if you only have a bit of time to profile your system.](https://www.brendangregg.com/Articles/Netflix_Linux_Perf_Analysis_60s.pdf)

In 60 seconds you can get a high-level idea of system resource usage and running processes by running the following ten commands. Look first for errors and saturation metrics, as they are both easy to interpret, and then at resource utilization. Saturation is where a resource has more load than it can handle. It can be exposed either as the length of a request queue or as time spent waiting.

Don't use only `top` just because you don't know other tools. That creates a streetlight effect, where you look for problems only where it's easiest to look.

```
uptime
dmesg | tail
vmstat 1
mpstat -P ALL 1
pidstat 1
iostat -xz 1
free -m
sar -n DEV 1
sar -n TCP,ETCP 1
top
```

## High-level terminology

+ IOPS - input/output operations per second, a measure of the rate of data transfer operations
+ Latency - a measure of time an operation spends waiting
+ Saturation - the degree to which a resource has queued work it cannot service
+ Hit ratio - the number of times needed data is found in the cache versus total accesses (hits + misses)

## Performance tradeoffs

Good, fast, cheap: pick two. In performance terms:

```
Good             -- Fast    -- Cheap
high-performance -- On-time -- inexpensive
```

File system record size: small records perform better for random I/O workloads; larger record sizes improve streaming workloads.

## Types of caches

+ Performance tuning is most effective when done closest to where the work is performed
+ **MRU** - most recently used
+ **LRU** - least recently used
+ **MFU** - most frequently used
+ **LFU** - least frequently used

**Cold cache** - empty, or populated with unwanted data. The hit ratio is zero (or near zero as it begins to warm up).
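The hit ratio definition and the LRU policy listed above can be combined in a small sketch. This hypothetical `LRUCache` class is not from the book. It evicts the least recently used entry when it exceeds its capacity, and it counts hits and misses. Because it starts empty, its hit ratio begins at zero, as with a cold cache, and rises as repeated keys are served from the cache. The capacity, keys, and `load` callback are made up for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical fixed-size cache with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # key order doubles as recency order
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        """Return the cached value for key, calling load(key) on a miss."""
        if key in self.data:
            self.hits += 1
            self.data.move_to_end(key)  # mark as most recently used
            return self.data[key]
        self.misses += 1
        value = load(key)  # stand-in for a slow operation such as a disk read
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry
        return value

    def hit_ratio(self):
        """hits / (hits + misses), or 0.0 before any access."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = LRUCache(capacity=2)
ratios = []
for key in ["a", "b", "a", "b", "a", "c", "a", "b"]:
    cache.get(key, load=lambda k: k.upper())
    ratios.append(round(cache.hit_ratio(), 2))
print(ratios)  # [0.0, 0.0, 0.33, 0.5, 0.6, 0.5, 0.57, 0.5]
```

The first two accesses are misses on an empty cache, so the ratio starts at 0.0. It then climbs as `a` and `b` are reused. It dips when `c` evicts `b` and `b` has to be loaded again. For memoizing Python functions, the standard library's `functools.lru_cache` provides the same eviction policy and reports hits and misses via `cache_info()`.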
**Warm cache** - populated with useful data, but doesn't yet have a high enough hit ratio to be considered hot.

```
Cold --> Warm --> Hot
   (hit ratio improving)
```

Cache tuning: aim to cache as high in the stack as possible, closer to where the work is performed, which reduces the operational overhead of cache hits.

p. 61: [Performance mantras](https://www.brendangregg.com/methodology.html)

```
State the goals of the study and define system boundaries
List system services and possible outcomes
Select performance metrics
List system and workload parameters
Select factors and their values
Select the workload
Design the experiments
Analyze and interpret the data
Present the results
If necessary, start over
```

## Disk Utilization (p. 65)

Disk utilization can become a problem even before it hits 100%. To find the bottleneck:

1. Measure the rate of server requests, and monitor this rate over time
2. Measure hardware and software resource usage
3. Express server requests in terms of resources used
4. Extrapolate server requests to the limit of each resource

Constraints:

**Hardware:**

+ CPU utilization
+ Memory usage
+ Disk IOPS
+ Disk throughput
+ Disk capacity

**Software:**

+ Virtual memory usage
+ Processes/tasks
+ File descriptors

Sharding - a common strategy for databases where data is split into logical components, each managed by its own database.

p. 106 - CPU-bound versus I/O-bound:

+ CPU-bound: performing heavy compute, like scientific and mathematical workloads
+ I/O-bound: performing I/O, like web servers and file servers, where low latency is important
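One rough way to tell the two apart is to compare CPU time with wall-clock time. A CPU-bound task keeps the CPU busy for most of its elapsed time, while an I/O-bound task spends most of its time waiting. The minimal Python sketch below uses `time.sleep` as a stand-in for waiting on disk or network I/O. The workloads and the 0.5 cutoff are arbitrary, illustrative assumptions.

```python
import time


def classify(task, threshold=0.5):
    """Run task() and compare this process's CPU time to wall-clock time.
    A ratio near 1 suggests CPU-bound; near 0 suggests the task mostly waited.
    The threshold is an arbitrary illustrative cutoff."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    task()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    ratio = cpu / wall if wall > 0 else 0.0
    return ("CPU-bound" if ratio >= threshold else "I/O-bound"), ratio


def crunch_numbers():
    # Heavy compute: pure arithmetic keeps the CPU busy.
    sum(i * i for i in range(2_000_000))


def wait_on_io():
    # Assumption: sleeping stands in for waiting on a disk or network request.
    time.sleep(0.2)


for name, task in [("crunch_numbers", crunch_numbers), ("wait_on_io", wait_on_io)]:
    kind, ratio = classify(task)
    print(f"{name}: {kind} (CPU/wall = {ratio:.2f})")
```

On a busy machine the CPU-bound ratio can drop below 1, because the process also waits for its turn on a CPU. Treat the ratio as a hint rather than a verdict.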