Notes by @veekaybee, created February 16, 2023.
    ## [Systems Performance 2nd edition](https://www.brendangregg.com/blog/2020-07-15/systems-performance-2nd-edition.html)

    See [synthesized write-up here](https://vickiboykis.com/2022/12/05/the-cloudy-layers-of-modern-day-programming/)

+ Do a quick performance check in 60 seconds
+ Use a number of different tools available in Unix
+ Use flame graphs of the call stack if you have access to them
+ The best performance wins come from eliminating unnecessary work, for example a thread stuck spinning in a loop, or eliminating bad config
+ Mantras: don't do it (eliminate); don't do it again (caching); do it less (e.g. reduce polling); do it when they're not looking; do it concurrently; do it more cheaply

    + **Latency** is an essential performance metric - the time for an operation to complete
    - Operation request
    - Database query
    - File system operation
- We can improve latency by reducing disk reads, for example via caching
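As a minimal sketch of measuring the latency of a single file system operation from the shell (assuming GNU `date` with nanosecond `%N` support; the file read is just a stand-in operation):

```
# Time one file system operation in microseconds
start=$(date +%s%N)
cat /etc/passwd > /dev/null   # the operation being measured
end=$(date +%s%N)
echo "latency: $(( (end - start) / 1000 )) us"
```

In practice you would sample many operations and look at the distribution (percentiles), not a single measurement.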

    ## Actionable Chain of Events

    `Counter --> Statistics --> Metrics --> Alerts`

Profiling tools allow us to take sampled measurements of CPU activity, [including flamegraphs](https://www.brendangregg.com/flamegraphs.html), which show us the CPU footprint of each code path.

    ![](https://camo.githubusercontent.com/eecfbf00e6cc5baf6ae2b66283573d765f8fe29f1d3df10f4ce3423d942c0af3/687474703a2f2f7777772e6272656e64616e67726567672e636f6d2f466c616d654772617068732f6370752d626173682d666c616d6567726170682e737667)
The x-axis shows the stack profile population, sorted alphabetically (it is not the passage of time), and the y-axis shows stack depth, counting from zero at the bottom. Each rectangle represents a stack frame. The wider a frame is, the more often it was present in the stacks. The top edge shows what is on-CPU, and beneath it is its ancestry. Original flame graphs use random colors to help visually differentiate adjacent frames. Variations include inverting the y-axis (an "icicle graph"), changing the hue to indicate code type, and using a color spectrum to convey an additional dimension.
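As a sketch of where those frame widths come from: a real profile would come from something like `perf record -F 99 -g`, collapsed into "folded stack" lines by Gregg's `stackcollapse-perf.pl` and rendered by `flamegraph.pl`. The sample data and file path below are made up; each folded line carries a `;`-separated stack and a sample count, and a frame's width is the sum of counts of all stacks containing it:

```
# Toy folded-stack samples in the format flamegraph.pl consumes
cat > /tmp/stacks.folded <<'EOF'
main;parse;read_file 10
main;parse 5
main;render 20
EOF
# Sum samples per root frame; this total is what sets a frame's width
awk '{ n = $NF; sub(/;.*/, "", $1); total[$1] += n }
     END { for (f in total) print f, total[f] }' /tmp/stacks.folded
# prints "main 35"
```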

    Tracing - Event-based recording where data is saved for later analysis.

    ## Linux 60-second checklist

    Also here: [if you only have a bit of time to profile your system.](https://www.brendangregg.com/Articles/Netflix_Linux_Perf_Analysis_60s.pdf)

In 60 seconds you can get a high-level idea of system resource usage and running processes by running the
following ten commands. Look for errors and saturation metrics first, as they are both easy to interpret, and then
check resource utilization. Saturation is where a resource has more load than it can handle; it can show up
either as the length of a request queue or as time spent waiting.
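Saturation can be spot-checked from the shell. One rough, Linux-specific signal is a load average persistently above the CPU count, which suggests runnable work is queueing:

```
# Rough CPU saturation signal (Linux): compare the 1-minute load average
# against the number of CPUs; sustained load above the CPU count suggests queueing
cpus=$(nproc)
load1=$(cut -d' ' -f1 /proc/loadavg)
echo "1-min load: $load1, CPUs: $cpus"
```

This is only a hint: Linux load averages also include tasks in uninterruptible sleep (often disk I/O), so confirm with `vmstat`'s run-queue column before concluding the CPUs are saturated.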

Don't use only `top` just because it's the tool you know; that creates a streetlight effect.

```
uptime                # load averages: 1-, 5-, and 15-minute
dmesg | tail          # recent kernel messages, including errors
vmstat 1              # run queue, memory, swap, and CPU summary per second
mpstat -P ALL 1       # per-CPU utilization: look for imbalance
pidstat 1             # per-process CPU usage
iostat -xz 1          # disk I/O: IOPS, throughput, await, %util
free -m               # memory usage, including buffers/cache
sar -n DEV 1          # network device throughput
sar -n TCP,ETCP 1     # TCP connections and retransmits
top                   # rolling overview of the above
```

    ## High-level terminology

+ IOPS - input/output operations per second, a measure of data transfer rate
+ Latency - measure of time an operation spends waiting
+ Saturation - degree to which a resource has queued work it cannot yet service
+ Hit ratio - number of times needed data is found in the cache versus total accesses (hits + misses)
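As a quick arithmetic check of the hit-ratio definition (the hit and miss counts below are made up):

```
# Hit ratio = hits / (hits + misses)
hits=950; misses=50
awk -v h="$hits" -v m="$misses" 'BEGIN { printf "hit ratio: %.2f\n", h / (h + m) }'
# prints "hit ratio: 0.95"
```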

    ## Performance tradeoffs:

    ```
Good -- Fast -- Cheap : pick two
(high performance -- on time -- inexpensive)
    ```

File system record size: small records perform better for random I/O workloads; larger record sizes improve streaming workloads

    ## Types of caches

    + Performance tuning is most effective when done closest to the work performed

+ **MRU** - most recently used
+ **LRU** - least recently used
+ **MFU** - most frequently used
+ **LFU** - least frequently used

**Cold cache** - empty, or populated with unwanted data. Hit ratio is zero (or near zero) as it begins to warm up.
**Warm cache** - populated with useful data, but the hit ratio isn't yet large enough for the cache to be considered hot

    ```
    Cold --> Warm --> Hot
    Ratio improving
    ```
Cache tuning: aim to cache as high in the stack as possible, closer to where the work is performed, which directly reduces the operational overhead of cache hits.

p. 61: [Performance Mantras](https://www.brendangregg.com/methodology.html)

    ```
    State the goals of the study and define system boundaries
    List system services and possible outcomes
    Select performance metrics
    List system and workload parameters
    Select factors and their values
    Select the workload
    Design the experiments
    Analyze and interpret the data
    Present the results
    If necessary, start over
    ```

    ## Disk Utilization (p. 65)

    Disk utilization can become a problem even before it hits 100%. To find the bottleneck:

1. Measure the rate of server requests, and monitor this rate over time
2. Measure hardware and software resource usage
3. Express server requests in terms of resources used
4. Extrapolate server requests for each resource
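Step 4 can be sketched with made-up numbers: if the current request rate drives the disk to a known utilization, you can extrapolate the request rate at which the disk becomes the bottleneck (assuming utilization scales roughly linearly with load):

```
# Made-up numbers: 1000 req/s currently drive the disk to 40% utilization,
# so requests can grow by about 100/40 = 2.5x before the disk saturates
awk 'BEGIN { rate = 1000; disk_util_pct = 40; printf "max req/s: %d\n", rate * 100 / disk_util_pct }'
# prints "max req/s: 2500"
```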

    Constraints:

**Hardware:**
    + CPU Utilization
    + Memory Usage
    + Disk IOPS
    + Disk Throughput
    + Disk Capacity

**Software:**
+ Virtual memory usage
+ Processes/tasks
+ File descriptors

Sharding - a common strategy for databases where data is split into logical components, each managed by its own database
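A toy sketch of shard routing (the key and shard count are hypothetical): hash the key and take it modulo the number of shards to pick which database owns it:

```
# Hypothetical shard routing: hash a key to one of N database shards
key="user:42"
shards=4
h=$(printf '%s' "$key" | cksum | cut -d' ' -f1)   # cksum gives a stable CRC
echo "shard: $(( h % shards ))"
```

Modulo routing is simple but reshuffles most keys when the shard count changes; real systems often use consistent hashing or range-based partitioning instead.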

p. 106 - CPU-bound versus I/O-bound:
+ CPU-bound: performing heavy compute, such as scientific and mathematical workloads
+ I/O-bound: performing I/O, such as web servers and file servers, where low latency is important