Tool categories: observability, benchmarking, tuning, static performance tuning, profiling, and tracing
Types of resources: CPU, memory, block devices (disk), network devices
-
uptime
- Measures CPU demand through the system load averages: the number of processes running or waiting to run (on Linux this also counts processes blocked in uninterruptible I/O, usually disk).
- Gives a high-level idea of system usage and how the load is changing. The three numbers are moving load averages over 1, 5, and 15 minutes.
- Interpretation: if the 1-minute average is higher than the 15-minute average, load is increasing; if it is lower, load is decreasing. A load of 0.0 means the CPUs are idle.
- A load average greater than the CPU count means there is more work than the CPUs can dispatch: CPU saturation.
- Better alternatives: per-CPU utilization with mpstat -P ALL 1; per-process CPU utilization with top or pidstat.
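
The interpretation rules above can be sketched as a small shell function. This is only an illustration, not part of uptime: the name classify_load and its output labels are made up here, and it treats a 1-minute load above the CPU count as saturation, as the notes do.

```shell
# classify_load ONE_MIN FIFTEEN_MIN NCPU   (hypothetical helper)
# Prints the load trend and whether the 1-minute load exceeds the CPU count.
classify_load() {
    awk -v one="$1" -v fifteen="$2" -v ncpu="$3" 'BEGIN {
        if (one + 0 > fifteen + 0)      trend = "increasing"
        else if (one + 0 < fifteen + 0) trend = "decreasing"
        else                            trend = "steady"
        state = (one + 0 > ncpu + 0) ? "saturated" : "ok"
        print trend, state
    }'
}

# Example on a live Linux system (assumes /proc/loadavg and nproc exist):
#   read -r one five fifteen rest < /proc/loadavg
#   classify_load "$one" "$fifteen" "$(nproc)"
```

Passing the numbers in as arguments keeps the function independent of where the load averages come from (uptime output, /proc/loadavg, or a saved log).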
-
dmesg | tail
- Shows the most recent system messages; error messages that point to performance problems can be found here.
-
vmstat
- Summary of the server's virtual memory statistics (vmstat is short for virtual memory stat).
- r: number of processes running or waiting to run on a CPU. A value greater than the CPU count means the server is saturated.
- free: free memory in kilobytes. The free command gives a more detailed breakdown.
- si, so: swap-ins and swap-outs (paging, loosely called swapping). A swap-out writes pages from memory to disk; a swap-in brings process data back from disk into memory, in the form of pages. Page-ins are fine: application initialization always causes some. Too many page-outs indicate that the kernel may be spending more time managing memory than running applications (thrashing). If page-outs are constant, use ps to find the processes consuming the most resources.
- us, sy, id, wa, st: CPU times: user time (application), system time (kernel), idle, wait I/O, and stolen time.
- Options: 1 prints a report every second, -t adds a timestamp column, -S M reports memory in megabytes.
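
The r and si/so checks can be applied to vmstat output mechanically. The sketch below is a hypothetical filter (vmstat_flags is not a real tool) and assumes the default vmstat column layout, where r is column 1 and si/so are columns 7 and 8.

```shell
# vmstat_flags NCPU   (hypothetical helper; reads vmstat output on stdin)
# Flags samples where the run queue exceeds the CPU count or swapping occurs.
vmstat_flags() {
    awk -v ncpu="$1" '
        $1 !~ /^[0-9]+$/ { next }            # skip the two header lines
        {
            msg = ""
            if ($1 + 0 > ncpu + 0)        msg = msg " run-queue>cpus"
            if ($7 + 0 > 0 || $8 + 0 > 0) msg = msg " swapping"
            if (msg == "")                msg = " ok"
            print "r=" $1 " si=" $7 " so=" $8 ":" msg
        }'
}

# Example on a live system (assumes vmstat and nproc are installed):
#   vmstat 1 5 | vmstat_flags "$(nproc)"
```

Note that the first line vmstat prints reports averages since boot, so it is usually ignored when reading the result.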
-
mpstat -P ALL 1
- Per-CPU (core) time breakdown, selected with the -P option. One CPU much busier than the others suggests heavy use by a single-threaded application.
- %usr: percentage of CPU time spent executing at the user (application) level.
- %sys: percentage of CPU time spent executing at the system (kernel) level.
-
pidstat 1
- Per-process statistics breakdown, like top, but it prints rolling output instead of clearing the screen, which makes patterns over time easy to see.
- %usr, %system: user and system CPU time for each process.
-
iostat -xz 1
- Used for block devices (hard disks), to understand the workload applied and the resulting performance.
- Workload metrics
- r/s, w/s, rkB/s, wkB/s: reads and writes per second, and kB read and written per second, for each attached device.
- Resulting performance metrics
- await: average time for an I/O, in milliseconds. It includes both time queued and time being serviced. Larger than expected times can indicate device saturation.
- %util: percentage of time the device was doing work. Interpretation: values above 60 percent typically mean degraded performance and a device approaching saturation; values near 100 percent usually mean saturation.
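
The 60 percent rule of thumb can be checked with a small filter over iostat -x output. This is a sketch under assumptions: busy_devices is a made-up name, and the %util column is located by its header label because its position differs between sysstat versions.

```shell
# busy_devices [THRESHOLD]   (hypothetical helper; reads `iostat -x` output on stdin)
# Prints the device name and %util for devices above THRESHOLD (default 60).
busy_devices() {
    awk -v limit="${1:-60}" '
        $1 ~ /^Device/ {                     # header row: find the %util column
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        # data rows: only lines wide enough to have a %util field
        col && NF >= col && $col + 0 > limit + 0 { print $1, $col }
    '
}

# Example on a live system (assumes sysstat is installed):
#   iostat -xz 1 3 | busy_devices 60
```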
-
free -m
- Alternative: cat /proc/meminfo
- buff/cache: sum of buffers and cache. Buffers are used for block device I/O; the cache is the page cache, used by file systems.
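
As a rough illustration of the sum described above, the same figure can be derived from /proc/meminfo. The helper name buff_cache_mb is hypothetical, and the result is only an approximation: recent versions of free also count reclaimable slab memory in buff/cache, so the numbers may not match exactly.

```shell
# buff_cache_mb [FILE]   (hypothetical helper)
# Sums the Buffers and Cached lines of a meminfo-format file, in whole MB.
# FILE defaults to /proc/meminfo; values in that file are in kB.
buff_cache_mb() {
    awk '$1 == "Buffers:" || $1 == "Cached:" { kb += $2 }
         END { printf "%d\n", kb / 1024 }' "${1:-/proc/meminfo}"
}

# Example on a live Linux system:
#   buff_cache_mb
```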
-
sar -n DEV 1
-
Checks network interface throughput, to verify that it is under the interface's limit. rxkB/s and txkB/s: measure of workload.
-
sar -n TCP,ETCP 1
- Overview of TCP metrics.
- active/s and passive/s: locally initiated (outbound) and remotely initiated (inbound) TCP connections per second. Used as a rough measure of network load on the server.
-
top
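- System-wide summary that includes many of the metrics above (load averages, CPU, memory, per-process usage).
- Consumes CPU itself to read /proc.
- %CPU is summed across all CPUs, so a multi-threaded process can show more than 100%.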
-
ps
- Process status listing
-
strace: system call tracer. Translates syscall arguments into readable form. Useful for solving system usage issues.
- Current implementations of strace use ptrace; an alternative is perf, via perf trace. strace slows the system down, so use it with caution.
- It blocks the target and slows the application down, so it shouldn't be used in production.
-
tcpdump
- Traces packets, packet sequences, etc. Doesn't scale well: capturing becomes a problem when network I/O is high volume (gigabits per second).
-
netstat
-
Prints network protocol statistics. Different options provide different information (interface stats, route table, etc.).
-
Better command: ss (and ip for route tables, etc.)
-
nicstat
-
Network interface stats
-
swapon -s
-
Shows swap device usage
-
lsof
-
Debugging tool. Helps understand the environment: who is connected to whom, and which files are open by which process.
-
sar
- System activity reporter. Covers many statistics (for example TCP, and DEV for network interfaces).
- Complements top by providing statistics from the past.