@netojoaobatista
Forked from joewiz/post-mortem.md
Created July 17, 2018 14:12

Revisions

  1. @joewiz joewiz created this gist Oct 28, 2015.
    On Tue Oct 27, 2015, history.state.gov began buckling under load, intermittently issuing 500 errors. Nginx's error log was sprinkled with the following errors:

    > 2015/10/27 21:48:36 [crit] 2475#0: accept4() failed (24: Too many open files)
    > 2015/10/27 21:48:36 [alert] 2475#0: *7163915 socket() failed (24: Too many open files) while connecting to upstream...

    An article at http://www.cyberciti.biz/faq/linux-unix-nginx-too-many-open-files/ provided directions that mostly worked. Below are the steps we followed. The steps that diverged from the article's directions are marked with an *.

    1. * Instead of using `su` to run `ulimit` as the nginx account, we used `ps aux | grep nginx` to find nginx's process IDs, then checked each process's file handle limits with `cat /proc/pid/limits` (where `pid` is a process ID reported by `ps`). Depending on your system, the `cat` command may require `sudo`.
    2. Added `fs.file-max = 70000` to /etc/sysctl.conf
    3. Added `nginx soft nofile 10000` and `nginx hard nofile 30000` to /etc/security/limits.conf
    4. Ran `sysctl -p`
    5. Added `worker_rlimit_nofile 30000;` to /etc/nginx/nginx.conf.
    6. * The article suggested that `nginx -s reload` was enough for nginx to pick up the new settings, but not every nginx process did. Checking `/proc/pid/limits` again (see step 1 above) showed that the first worker process still had the original file handle limits of 1024 soft and 4096 hard (S1024/H4096). Even `nginx -s quit` did not shut nginx down. The fix was to kill nginx with `kill pid`. After nginx was restarted, every process owned by the nginx user had the new limits of 10000 soft and 30000 hard (S10000/H30000).
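
    The limit checks in steps 1 and 6 can be scripted so that every nginx process is inspected in one pass. The following is a minimal sketch and was not part of the original procedure. The function names `max_open_files` and `report_nginx_limits` are hypothetical, and the sketch assumes a Linux `/proc` filesystem plus `awk` and `pgrep` on the host.

    ```shell
    # Print the soft and hard "Max open files" values from a file
    # in /proc/<pid>/limits format. Argument: path to that file.
    max_open_files() {
        awk '/^Max open files/ { print "soft=" $4, "hard=" $5 }' "$1"
    }

    # Print one line per running process named exactly "nginx".
    # Assumption: pgrep is installed. Reading another user's limits
    # file may require root privileges.
    report_nginx_limits() {
        for pid in $(pgrep -x nginx); do
            printf '%s: %s\n' "$pid" "$(max_open_files "/proc/$pid/limits")"
        done
    }
    ```

    Calling `report_nginx_limits` before and after restarting nginx, as a user allowed to read those files, shows whether any process is still running with the old limits.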