# Collection of sysctl networking notes

- [List & reloading changes](#list--reloading-changes)
- [Summary details of each connection status](#summary-details-of-each-connection-status)
- [Settings tweaking in `/etc/sysctl.conf`](#settings-tweaking-in-etcsysctlconf)
  - [Detailed setting notes](#detailed-setting-notes)
  - [Control `TIME_WAIT` and connection tracking timeouts](#control-time_wait-and-connection-tracking-timeouts)
- [Miscellaneous](#miscellaneous)
  - [Nginx Plus additions](#nginx-plus-additions)
- [Further reading](#further-reading)

## List & reloading changes

```sh
$ sysctl --all
$ sysctl --load
```

## Summary details of each connection status

```sh
$ netstat --numeric --tcp | tail --lines +3 | \
  awk "{n[\$6]++} END { for(k in n) { print k, n[k]; }}"
```

## Settings tweaking in `/etc/sysctl.conf`

```
# per-socket receive/send buffers
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216

# per-socket receive/send buffers for TCP [min default max]
net.ipv4.tcp_rmem = 4096 12582912 16777216
net.ipv4.tcp_wmem = 4096 12582912 16777216
#net.ipv4.tcp_rmem = 4096 87380 16777216
#net.ipv4.tcp_wmem = 4096 65536 16777216
#net.ipv4.tcp_rmem = 4096 16060 262144
#net.ipv4.tcp_wmem = 4096 16384 262144

# port range used by TCP and UDP to choose the local port
# default: net.ipv4.ip_local_port_range = 32768 60999
net.ipv4.ip_local_port_range = 1024 61000

# various timewait socket setting tweaks
net.ipv4.tcp_tw_reuse = 1
#net.ipv4.tcp_tw_recycle = 1
#net.ipv4.tcp_max_tw_buckets = 400000
#net.ipv4.tcp_max_orphans = 60000

# time that must elapse before TCP/IP can release an orphaned/closed connection and reuse its resources
# default: net.ipv4.tcp_fin_timeout = 60
#net.ipv4.tcp_fin_timeout = 30

# note: net.ipv4.tcp_syncookies enabled by default with Ubuntu 12.04LTS+
net.ipv4.tcp_syncookies = 1

# remembered connection requests, without an ACK
# note: will increase automatically in proportion to available memory
# default: net.ipv4.tcp_max_syn_backlog = 128
net.ipv4.tcp_max_syn_backlog = 4096
#net.ipv4.tcp_max_syn_backlog = 8096

# upper limit allowed for a listen() backlog
# maximum established sockets (with an ACK) waiting to be accepted by listening process
# default: net.core.somaxconn = 128
# note: all three lines below are active and applied in order, so the last value (8192) takes effect
net.core.somaxconn = 1024
net.core.somaxconn = 4096
net.core.somaxconn = 8192

# give kernel more memory for TCP
#net.ipv4.tcp_mem = 50576 64768 98152

#net.core.netdev_max_backlog = 2500
#net.core.netdev_max_backlog = 5000

# note: only need to tweak if ip_conntrack is used - e.g. stateful iptables rules
# default: net.ipv4.netfilter.ip_conntrack_max = 65536
#net.ipv4.netfilter.ip_conntrack_max = 1048576
```

### Detailed setting notes

- `net.ipv4.tcp_tw_reuse = 1`: Allows `TIME_WAIT` sockets to be reused for new connections when it is safe from a protocol viewpoint. In detail, Linux will reuse an existing connection in the `TIME_WAIT` state for a new **outgoing** connection **only**, and can do so after just one second. Because this never applies to incoming connections, its practical use for a server might be fairly limited.
- `net.ipv4.tcp_tw_recycle = 1`:
  - As per [this article](http://vincent.bernat.im/en/blog/2014-tcp-time-wait-state-linux.html), enabling this is not worthwhile and is dangerous when clients connect from behind NAT devices - see also [this thread](https://www.varnish-cache.org/lists/pipermail/varnish-misc/2009-September/017483.html).
  - **Note:** this setting was removed in Linux 4.12, so it only applies to older kernels.
  - Warnings around this:
    - http://permalink.gmane.org/gmane.comp.web.haproxy/3701.
    - http://stackoverflow.com/questions/6426253/tcp-tw-reuse-vs-tcp-tw-recycle-which-to-use-or-both.
- `net.ipv4.tcp_max_tw_buckets`: Maximum number of `TIME_WAIT` sockets held by the system simultaneously. If this number is exceeded, a time-wait socket is immediately destroyed and a warning is printed.
  This limit exists only to prevent simple DoS attacks. Do not lower it artificially; rather increase it (probably after increasing installed memory) if network conditions require more than the default value.
- `net.ipv4.tcp_max_orphans`: Maximum number of TCP sockets not attached to any user file handle, held by the system. If this number is exceeded, orphaned connections are reset immediately and a warning is printed. This limit exists only to prevent simple DoS attacks. Do not rely on it or lower it artificially; rather increase it (probably after increasing installed memory) if network conditions require more than the default value, and tune network services which linger so they kill such states more aggressively. **Note:** each orphan eats up to 64K of unswappable memory.
- `net.ipv4.tcp_fin_timeout`: Time an orphaned (no longer referenced by any application) connection will remain in the `FIN_WAIT_2` state before it is aborted at the local end and its resources are released. Note that this governs `FIN_WAIT_2`, not the length of the `TIME_WAIT` state. Reducing the value releases orphaned connections faster, making more resources available for new connections. Can cause issues when set below **25-30** seconds.
- `net.ipv4.tcp_max_syn_backlog`: Maximum number of remembered connection requests which have not yet received an acknowledgment from the connecting client. The minimum value is 128 for low memory machines, and it increases in proportion to the memory of the machine. If the server suffers from overload, try increasing this number.
- `net.core.somaxconn`:
  - An upper limit for the value of the `backlog` parameter passed to the [`listen(2)`](http://man7.org/linux/man-pages/man2/listen.2.html) function. If the `backlog` argument is greater than the value of `/proc/sys/net/core/somaxconn`, then it is silently truncated to this limit.
  - **Note:** as per the [`listen(2)`](http://man7.org/linux/man-pages/man2/listen.2.html) man page, with Linux 2.2 the meaning of `backlog` changed:
    - It now specifies the queue length for _completely established_ sockets waiting to be accepted.
    - The maximum length of _incomplete connection requests_ is set via `net.ipv4.tcp_max_syn_backlog`.
  - Details: https://derrickpetzold.com/p/somaxconn/.
  - Raising this value [may **not** be wise](https://serverfault.com/questions/518862/will-increasing-net-core-somaxconn-make-a-difference).
  - View the current number of half-open connections (`SYN_RECV` sockets sit in the incomplete queue sized by `net.ipv4.tcp_max_syn_backlog`, not the accept queue):

    ```sh
    $ netstat --all --numeric --tcp | grep --count "SYN_RECV"
    ```

### Control `TIME_WAIT` and connection tracking timeouts

- Refer to: http://www.engineyard.com/blog/2012/linux-scalability/, this is good stuff.
- Review `ulimit -a` and raise the limit for open file handles (`ulimit -n`) where needed.
- `nf_conntrack_tcp_timeout_time_wait`: By default, a connection is supposed to stay in the `TIME_WAIT` state for twice the MSL. Its purpose is to make sure any lost packets that arrive after a connection is closed do not confuse the TCP subsystem. The default [maximum segment lifetime (MSL)](http://en.wikipedia.org/wiki/Maximum_Segment_Lifetime) is 60 seconds, which puts the default `TIME_WAIT` timeout value at 2 minutes. This means you'll run out of available ports if you receive more than about 400 requests a second.
- `nf_conntrack_tcp_timeout_established`: The established connection timeout. Technically this should only apply to connections that are in the `ESTABLISHED` state, and a connection should leave this state when a FIN packet goes through in either direction - but it seems this does not always happen. So how long do connections stay in this table then? It turns out that the default value for `nf_conntrack_tcp_timeout_established` is **432000** seconds (5 days).
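The port exhaustion figure in the `TIME_WAIT` note above is simple arithmetic: the number of usable local ports divided by the time each closed connection holds its port. A rough sketch follows, assuming every connection goes to a single destination address and port and holds its local port for the full timeout. The `max_conn_rate` helper is hypothetical and only for illustration; the port ranges are the default and tweaked `net.ipv4.ip_local_port_range` values quoted earlier in these notes.

```sh
#!/bin/sh
# Hypothetical helper: rough ceiling on new connections per second before
# local ports run out, assuming each closed connection keeps its port busy
# for the whole TIME_WAIT period and all traffic shares one destination.
max_conn_rate() {
  port_low=$1
  port_high=$2
  time_wait_secs=$3
  # usable ports (inclusive range) divided by seconds each one stays busy
  echo $(( (port_high - port_low + 1) / time_wait_secs ))
}

# default port range (32768-60999) with a 120 second TIME_WAIT
max_conn_rate 32768 60999 120   # prints 235

# widened port range (1024-61000) with a 120 second TIME_WAIT
max_conn_rate 1024 61000 120    # prints 499
```

Both results land in the same ballpark as the "about 400 requests a second" figure. If the effective `TIME_WAIT` period on a host is shorter (60 seconds is commonly cited for Linux), pass that as the third argument instead and the ceiling roughly doubles.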
```
net.netfilter.nf_conntrack_tcp_timeout_time_wait = 15
net.netfilter.nf_conntrack_tcp_timeout_established = 300
```

View maximum/current in use netfilter connection tracking counts:

```sh
# legacy ip_conntrack names; newer kernels expose net.netfilter.nf_conntrack_max
# and /proc/net/nf_conntrack instead
$ sysctl net.ipv4.netfilter.ip_conntrack_max
$ sysctl net.netfilter.nf_conntrack_count
$ cat /proc/net/ip_conntrack && wc --lines /proc/net/ip_conntrack
```

## Miscellaneous

### Nginx Plus additions

Part of the following Amazon Web Services AMI: https://aws.amazon.com/marketplace/pp/B00UU272MM

```
# AMI-ID: ami-5d56a83f // nginx-plus-ami-amazon-linux-hvm-v1.2-20180118.x86_64
net.ipv4.ip_local_port_range = 1024 64999
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.core.wmem_max = 16777216
net.core.rmem_max = 16777216
net.ipv4.tcp_tw_reuse = 1
net.core.netdev_max_backlog = 30000
net.core.somaxconn = 32768
net.ipv4.tcp_max_orphans = 32768
```

```
# AMI-ID: ami-aefed3cd // nginx-plus-ami-amazon-linux-hvm-v1.1-20160426.x86_64
net.ipv4.ip_local_port_range = 1024 65000
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.core.wmem_max = 16777216
net.core.rmem_max = 16777216
net.ipv4.tcp_tw_reuse = 1
net.core.netdev_max_backlog = 30000
net.core.somaxconn = 32768
net.ipv4.tcp_max_orphans = 32768
```

## Further reading

- Kernel.org references for `/proc/sys/net/ipv4/*` and `/proc/sys/net/netfilter/nf_conntrack_*` settings:
  - https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
  - https://www.kernel.org/doc/Documentation/networking/nf_conntrack-sysctl.txt
- http://agiletesting.blogspot.com/2009/03/haproxy-and-apache-performance-tuning.html
- http://baheyeldin.com/technology/linux/detecting-and-preventing-syn-flood-attacks-web-servers-running-linux.html
- http://comments.gmane.org/gmane.comp.web.haproxy/1384
- http://lartc.org/howto/lartc.kernel.obscure.html (explains `tcp_max_orphans` & `tcp_max_tw_buckets` well).
- https://lowlatencyweb.wordpress.com/2012/03/20/500000-requestssec-modern-http-servers-are-fast/
- https://redmine.lighttpd.net/projects/1/wiki/Docs_Performance
- https://www.frozentux.net/documents/ipsysctl-tutorial/
- https://www.mail-archive.com/haproxy@formilux.org/msg01708.html
- http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-1
- http://www.symantec.com/connect/articles/hardening-tcpip-stack-syn-attacks
- https://serverfault.com/questions/400822/tuning-linux-haproxy
- https://serverfault.com/questions/408576/why-does-nf-conntrack-count-keep-increasing
- https://vincent.bernat.ch/en/blog/2014-tcp-time-wait-state-linux
- https://www.slideshare.net/brendangregg/how-netflix-tunes-ec2-instances-for-performance (slide #33).
- https://man7.org/linux/man-pages/man2/listen.2.html
- https://man7.org/linux/man-pages/man7/tcp.7.html
- https://blog.cloudflare.com/syn-packet-handling-in-the-wild/