@psidney
Forked from ameenkhan07/FB-PE-InterviewTips.md
Created July 7, 2021 12:48
Facebook Production Engineering Interview

Troubleshooting and Debugging

Tooling

observability, benchmarking, tuning, static performance tuning, profiling, and tracing
  • uptime

    • Shows the load averages: on Linux, the number of tasks running or waiting to run on a CPU, plus tasks in uninterruptible sleep (usually waiting on disk I/O).
    • Gives a high-level idea of system demand, as exponentially damped moving averages over the last 1, 5, and 15 minutes.
    • "High level" because the three numbers only show how the load is changing: if the 1-minute average is higher than the 15-minute average, the load is increasing; if it is lower, the load is decreasing. A load of 0.0 means the system is idle.
    • Load averages measure demand, not utilization, so read them against the CPU count: a load of 4 suggests saturation on a 2-CPU host but not on a 16-CPU host.
    • Better alternatives: per-CPU utilization with mpstat -P ALL 1; per-process CPU utilization with top or pidstat.
  • dmesg | tail

    • Lists recent kernel messages. Errors that can explain a performance problem (for example OOM-killer activity) can be found here.
  • vmstat

    • Short for virtual memory statistics; prints a summary of the server's process, memory, swap, I/O, and CPU counters.
      • r: number of processes running on or waiting for a CPU. A value greater than the CPU count means the CPUs are saturated.
      • free: free memory in kilobytes. The free command is a more detailed alternative.
      • si, so: swap-ins and swap-outs (paging, loosely called swapping). A page-out writes pages from memory to disk; a page-in brings process data back from disk into memory as pages. Page-ins are fine on their own, since application initialization causes them. Sustained page-outs indicate memory pressure: the kernel may be spending more time managing memory than running the application (thrashing). If page-outs are constant, use ps to check which processes are consuming the most memory and CPU.
      • us, sy, id, wa, st: CPU time breakdown: user time, system (kernel) time, idle, waiting on I/O, and stolen time (taken by the hypervisor).
    • Options: a trailing 1 prints a sample every second, -t adds a timestamp column, -S M reports memory in megabytes.
  • mpstat -P ALL 1

    • Per-CPU (per-core) time breakdown, selected with the -P option. One CPU that is much busier than the others points to a single-threaded application.
    • %usr: percentage of CPU time spent executing at user level (application code).
    • %sys: percentage of CPU time spent executing at system level (kernel code).
  • pidstat 1

    • Per-process statistics, like top, but it prints a rolling summary instead of clearing the screen, which makes patterns over time easier to see.
  • iostat -xz 1

    • Used for block devices (hard disks), to understand the workload applied and the resulting performance.
    • r/s, w/s, rkB/s, wkB/s: reads and writes per second, and kB read and written per second, for each attached device.
    • %util: percentage of time the device was busy doing work. Interpretation: values above roughly 60 percent often go together with poor performance, which should also show up in await.
    • await: average time for an I/O in milliseconds, covering both time queued and time being serviced. Larger than expected times might mean device saturation.
  • free -m

    • Alternative: cat /proc/meminfo
    • buff/cache: sum of buffers and cache. Buffers are used for block device I/O; the page cache is used by the filesystem.
  • sar -n DEV 1

  • sar -n TCP,ETCP 1

  • top
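
The two rules of thumb above (1-minute versus 15-minute load for the trend, and vmstat's r column versus the CPU count for saturation) can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function names and the 0.05 tolerance are assumptions chosen for the example.

```python
import os


def parse_loadavg(text):
    """Return the 1-, 5-, and 15-minute load averages from /proc/loadavg-style text."""
    one, five, fifteen = (float(field) for field in text.split()[:3])
    return one, five, fifteen


def load_trend(one, fifteen, tolerance=0.05):
    """Compare the 1-minute load with the 15-minute load.

    The tolerance is a hypothetical dead band so tiny differences read as steady.
    """
    if one > fifteen + tolerance:
        return "increasing"
    if one < fifteen - tolerance:
        return "decreasing"
    return "steady"


def run_queue_saturated(r, cpu_count):
    """vmstat's r column above the CPU count suggests CPU saturation."""
    return r > cpu_count


if __name__ == "__main__":
    try:
        one, five, fifteen = os.getloadavg()  # available on Unix-like systems
    except OSError:
        print("load average not available on this platform")
    else:
        cpus = os.cpu_count() or 1
        print(f"load {one:.2f} {five:.2f} {fifteen:.2f} on {cpus} CPUs:",
              load_trend(one, fifteen))
```

On Linux the same three numbers can be read from /proc/loadavg and passed through parse_loadavg; the value for run_queue_saturated would come from the first column of vmstat output.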

What to Expect

• This 45-minute systems interview will focus on responding to real-world problems with an unhealthy service, such as a web server or database. The interview will start at a high level with troubleshooting a likely scenario, then dig deeper to find the cause and some possible solutions for it. The goal is to probe your knowledge of systems at scale and under load, so keep in mind the challenges of the Facebook environment.
• Depending on how your conversation goes, your interviewer may ask to use CoderPad.
• Some of the questions may be around scalability, so think of solutions that would apply and be effective in our environment.

 

Helpful Tips

• Focus on things that might show up in your average Operating Systems class such as tooling, memory management and unix process lifecycle.
• Spend time on a Linux system, maybe even install one from scratch. Run Linux as your primary desktop environment for a while to force yourself to learn how it works, even though servers != desktops.
• Brendan Gregg's blog & his book "Systems Performance" may help refresh basic OS material
• What's it like to be a PE at Facebook?

Systems

More specifically, Linux troubleshooting and debugging. Understanding things like memory, I/O, CPU, and the shell would be pretty helpful. Knowing how to actually write a Unix shell would also be a good idea. What tools might you use to debug something? On another note, this interview will likely push the boundaries of what you know (and of how you would implement it).
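
As a rough idea of what "writing a Unix shell" involves, here is a minimal sketch of the fork/exec/wait cycle in Python. It is an illustration under stated assumptions rather than a reference answer: the names run_command and repl are made up for this example, and pipes, redirection, and job control are left out entirely.

```python
import os
import shlex
import sys


def run_command(argv):
    """Fork a child, exec argv in it, and return the child's exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the requested program.
        try:
            os.execvp(argv[0], argv)
        except OSError as exc:
            sys.stderr.write(f"{argv[0]}: {exc.strerror}\n")
            os._exit(127)  # conventional "command not found" status
    # Parent: block until the child terminates, then decode its status.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return 128 + os.WTERMSIG(status)
    return os.WEXITSTATUS(status)


def repl():
    """Read a line, split it into words, run it; stop on EOF or 'exit'."""
    while True:
        try:
            argv = shlex.split(input("$ "))
        except EOFError:
            break
        except ValueError as exc:  # e.g. an unbalanced quote
            print(f"parse error: {exc}", file=sys.stderr)
            continue
        if not argv:
            continue
        if argv[0] == "exit":
            break
        if argv[0] == "cd":
            # cd has to be a builtin: a child cannot change the parent's directory.
            try:
                os.chdir(argv[1] if len(argv) > 1 else os.path.expanduser("~"))
            except OSError as exc:
                print(f"cd: {exc.strerror}", file=sys.stderr)
            continue
        run_command(argv)


if __name__ == "__main__":
    repl()
```

The parts worth being able to explain are why the child calls os._exit rather than returning when exec fails, why the parent must wait (otherwise the child lingers as a zombie), and why cd cannot be an external program.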

Design/Architecture 

This interview is all about taking an ambiguous question of how you might build a system and letting you guide the way. Your interviewer will add constraints when necessary, and the idea is to get a simple, workable solution on the board. Load and monitoring are things you might consider. What you consider is just as important as what you don’t, so ask clarifying questions and gather requirements when appropriate.
