## Troubleshooting and Debugging

### Tooling

```
observability, benchmarking, tuning, static performance tuning, profiling, and tracing
```

- `uptime`
  - Shows the CPU load averages (the number of processes running or waiting to run).
  - Gives a *high-level* idea of system usage: exponentially damped moving averages over 1, 5, and 15 minutes.
  - "High level" because it shows how the load is changing on a system: if the 1-minute average is higher than the 15-minute average, the load is increasing; if it is lower, the load is decreasing. A load of 0.0 means the CPU is idle.
  - Load averages measure CPU demand, i.e. the number of threads running or waiting to run on the CPU. On Linux they also count tasks blocked in uninterruptible I/O.
  - Better alternatives: per-CPU utilization with `mpstat -P ALL 1`; per-process CPU utilization with `top` or `pidstat`.
- `dmesg | tail`
  - Lists recent system messages; error messages related to performance can be found here.
- `vmstat`
  - Short for "virtual memory statistics"; a summary of the server's memory utilization.
  - **r**: number of processes running or waiting to run on a CPU. A value greater than the CPU count means the server is saturated.
  - **free**: free memory in kilobytes. A more detailed alternative is the `free` command.
  - **si, so**: swap-ins and swap-outs (paging ~ swapping). A page-out happens when pages are written from memory to disk; a page-in happens when process data is brought from disk into memory in the form of pages. Page-ins are fine: application initialization always causes some. Too many page-outs indicate that the kernel may be spending more time managing memory than running applications (thrashing). If page-outs are constant, check which processes are using the most CPU and memory with `ps`.
  - **us, sy, id, wa, st**: CPU time breakdown: user time, system (kernel) time, idle, wait I/O, and stolen time.
  - Options: `1` prints a report every second; `-t` adds a timestamp column; `-S M` reports data in megabytes.
- `mpstat -P ALL 1`
  - CPU time breakdown per CPU (core), selected with the `-P` option. A single overworked CPU/core indicates heavy usage by a single-threaded application.
  - **usr**: percentage of CPU utilization spent executing user-level applications.
  - **sys**: percentage of CPU utilization spent executing in the kernel.
- `pidstat 1`
  - Summary of per-process statistics, like `top`, but it does not clear the screen, so patterns over time are easy to see.
- `iostat -xz 1`
  - Used for block devices (hard disks), to understand the workload applied and the resulting performance.
  - **r/s, w/s, rkB/s, wkB/s**: number of reads and writes per second, and kB read and written per second, for the attached devices.
  - **%util**: percentage of time the device is doing work. Interpretation: values above 60 percent typically indicate degraded performance, though this depends on the device.
  - **await**: average time for I/O, in milliseconds. It includes both time queued and time being serviced. Larger than expected times might mean device saturation.
- `free -m`
  - Alternative: `cat /proc/meminfo`.
  - **buffer/cached**: sum of buffers and cache. Buffers are used for device I/O; the cache is used by the filesystem.
- `sar -n DEV 1`
- `sar -n TCP,ETCP 1`
- `top`
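
The `uptime` and `vmstat` rules of thumb above can be sketched in Python. This is a minimal illustration, not part of any of the tools listed: the function names `load_trend` and `run_queue_saturated` and the `tolerance` parameter are hypothetical, and the tolerance value of 0.05 is an arbitrary assumption used only to avoid flagging noise as a trend.

```python
import os


def load_trend(one_min, fifteen_min, tolerance=0.05):
    """Compare the 1- and 15-minute load averages, as described for uptime.

    `tolerance` is a hypothetical dead band, not something uptime defines.
    """
    if one_min > fifteen_min + tolerance:
        return "increasing"
    if one_min < fifteen_min - tolerance:
        return "decreasing"
    return "steady"


def run_queue_saturated(runnable, cpu_count):
    """vmstat 'r' heuristic: more runnable tasks than CPUs means saturation."""
    return runnable > cpu_count


if __name__ == "__main__":
    cpus = os.cpu_count() or 1  # cpu_count() may return None
    try:
        # Same three numbers that uptime prints (Unix only).
        one, five, fifteen = os.getloadavg()
    except (OSError, AttributeError):
        print("load averages are not available on this platform")
    else:
        print(f"load averages: {one:.2f} {five:.2f} {fifteen:.2f}")
        print(f"trend: {load_trend(one, fifteen)}")
        # Using the 1-minute load as a rough stand-in for vmstat's r column
        # is an assumption; on Linux the load average also counts tasks
        # blocked in uninterruptible I/O.
        print(f"more demand than CPUs ({cpus}): {run_queue_saturated(one, cpus)}")
```

Running the script prints the current load averages with a trend label. For anything beyond a quick check, the per-CPU and per-process tools listed above (`mpstat`, `pidstat`) give a more reliable picture.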