Skip to content

Instantly share code, notes, and snippets.

@karthicraghupathi
Forked from zpoint/gunicorn.md
Created February 28, 2024 22:16
Show Gist options
  • Save karthicraghupathi/b49845a7dc0cf7e8679b1ccd0ffd9f10 to your computer and use it in GitHub Desktop.
Save karthicraghupathi/b49845a7dc0cf7e8679b1ccd0ffd9f10 to your computer and use it in GitHub Desktop.

Revisions

  1. @zpoint zpoint created this gist Aug 4, 2021.
    230 changes: 230 additions & 0 deletions gunicorn.md
    Original file line number Diff line number Diff line change
    @@ -0,0 +1,230 @@
    # gunicorn workers![image title](http://www.zpoint.xyz:8080/count/tag.svg?url=github%2Fgist%2FPython%2Fgunicorn.md)

    We've learned `SyncWorker` for gunicorn in [part1](https://github.com/zpoint/Blog/blob/master/Python/django/django.md), now let's see how other workers work

    ![workers](https://user-images.githubusercontent.com/18083296/128213871-742e9645-2181-407f-b316-7b5e96597342.png)

    # contents

    * [eventlet](#Eventlet)
    * [gevent](#Gevent)
    * [thread](#thread)
    * [tornado](#tornado)
    * [read more](#read-more)

    ## Eventlet

    If you visite the official site of [eventlet](https://eventlet.net/)

    > Eventlet is a concurrent networking library for Python that allows you to change how you run your code, not how you write it.
    >
    > - It uses epoll or kqueue or libevent for [highly scalable non-blocking I/O](http://en.wikipedia.org/wiki/Asynchronous_I/O#Select.28.2Fpoll.29_loops).
    > - [Coroutines](http://en.wikipedia.org/wiki/Coroutine) ensure that the developer uses a blocking style of programming that is similar to threading, but provide the benefits of non-blocking I/O.
    > - The event dispatch is implicit, which means you can easily use Eventlet from the Python interpreter, or as a small part of a larger application.
    `EventletWorker` inherit from `AsyncWorker`, it override the `init_process` method and `run` method

    ```python3
    def patch(self):
    hubs.use_hub()
    eventlet.monkey_patch()
    patch_sendfile()

    def init_process(self):
    self.patch()
    super().init_process()
    ```

    After `fork` from the master process, the `init_process` calls `eventlet.monkey_patch()` , which replace the following modules by the corresponding `eventlet` support module by default

    ```python3
    for name, modules_function in [
    ('os', _green_os_modules),
    ('select', _green_select_modules),
    ('socket', _green_socket_modules),
    ('thread', _green_thread_modules),
    ('time', _green_time_modules),
    ('MySQLdb', _green_MySQLdb),
    ('builtins', _green_builtins),
    ('subprocess', _green_subprocess_modules),
    ]
    ```

    Eventlet replaced the default IO module by it's `green` module, when you calls the `socket` function, you are actually calling `_green_socket_modules` , which implements nonblocking IO

    On every `socket` read/write, or `time.sleep`, it actually save the current context and add the current gthread to the pooling list, and then calls pool to wait for next ready IO event

    It's like the `async` keyword in python3, but with less code invasion



    If you run your app in eventlet mode

    ```python3
    gunicorn --workers 2 --worker-class eventlet mysite.wsgi
    ```

    ![evenlet](https://user-images.githubusercontent.com/18083296/128213929-4dba4ac0-4abf-4e69-b64b-94e402bd3000.png)

    `EventletWorker` will spawn a new `gthread`, which in charge of accept connection from socket, after accept a new connection from socket, the `gthread` pass the django handle function to the `greenpool`, and use the `greenpool` to start the django function

    Thanks for `eventlet`, we can simply change `--worker-class` to make our django application blocking IO to nonblocking IO

    Compare to define `async` function directly, your code can run both in blocking and nonblocking mode, and easier to debug

    But defining `async` function with `async` keyword directly, require you to design your code in `async` style from the top down, gives you more power about `async` control. For example, `eventlet` with django parallel two different request, while `async` function is able to parallel different IO operation in the same request

    ## Gevent

    If you visite the official site of [gevent](http://www.gevent.org/)

    > gevent is a [coroutine](https://en.wikipedia.org/wiki/Coroutine) -based [Python](http://python.org/) networking library that uses [greenlet](https://greenlet.readthedocs.io/) to provide a high-level synchronous API on top of the [libev](http://software.schmorp.de/pkg/libev.html) or [libuv](http://libuv.org/) event loop.
    >
    > gevent is [inspired by eventlet](http://blog.gevent.org/2010/02/27/why-gevent/) but features a more consistent API, simpler implementation and better performance.
    The differences

    > 1. gevent is built on top of libevent(since 1.0, gevent uses libev and c-ares.)
    > * Signal handling is integrated with the event loop.
    > * Other libevent-based libraries can integrate with your app through single event loop.
    > * DNS requests are resolved asynchronously rather than via a threadpool of blocking calls.
    > * WSGI server is based on the libevent’s built-in HTTP server, making it [super fast](http://nichol.as/benchmark-of-python-web-servers).
    > 2. gevent’s interface follows the conventions set by the standard library
    > 3. gevent does not have all the features that Eventlet has.
    If you had another library (written in C) that used libevent’s event loop and want to integrate them together in a single process, gevent support while eventlet does not

    Let's go back to `gunicorn`

    `GeventWorker` inherit from `AsyncWorker`, it also override the `init_process` method and `run` method

    ```python3
    def patch(self):
    monkey.patch_all()

    def init_process(self):
    self.patch()
    hub.reinit()
    super().init_process()
    ```

    After `fork` from the master process, the `init_process` calls `gevent.monkey()` , which replace the following modules by the corresponding `gevent` support module

    ```python3
    def patch_all(socket=True, dns=True, time=True, select=True, thread=True, os=True, ssl=True,
    subprocess=True, sys=False, aggressive=True, Event=True,
    builtins=True, signal=True,
    queue=True, contextvars=True,
    **kwargs):
    pass

    ```

    The pattern is similar to [eventlet](#Eventlet), the interface is different, so the actual function being called in `run` is slightly different

    ```python3
    # gunicorn/workers/ggevent.py
    from gevent.pool import Pool
    from gevent.server import StreamServer

    def run(self):
    # ...
    pool = Pool(self.worker_connections)
    # ...
    server = StreamServer(s, handle=hfun, spawn=pool, **ssl_args)
    # ...
    server.start()
    ```

    If you run

    ```bash
    gunicorn --workers 2 --worker-class eventlet mysite.wsgi
    ```

    ![gevent](https://user-images.githubusercontent.com/18083296/128213982-34544381-be51-4a3c-8ebc-39cd4a4bf70a.png)

    The pros and cons of using `gevent` is the same as `eventlet`, we are not repeating it again

    If you focus more on performance, or you've C lib that use libevent’s(or libev) event loop that want to integrate into Python in a single process, consider using `gevent`

    If you rely on some specific features on `eventlet` such as `eventlet.db_pool` or `eventlet.processes`, you probably should keep using `eventlet`

    ## thread

    By default `gunicorn` use the `sync` [mode](https://github.com/zpoint/Blog/blob/master/Python/django/django.md), It prefork `workers` number of process and each worker is able to handle one request at a time

    `ThreadWorker` inherit from `Worker`, it also override the `init_process` method and `run` method

    ```python3
    def init_process(self):
    self.tpool = self.get_thread_pool()
    self.poller = selectors.DefaultSelector()
    self._lock = RLock()
    super().init_process()

    def enqueue_req(self, conn):
    conn.init()
    # submit the connection to a worker
    fs = self.tpool.submit(self.handle, conn)
    self._wrap_future(fs, conn)

    def accept(self, server, listener):
    try:
    sock, client = listener.accept()
    # initialize the connection object
    conn = TConn(self.cfg, sock, client, server)
    self.nr_conns += 1
    # enqueue the job
    self.enqueue_req(conn)
    except EnvironmentError as e:
    if e.errno not in (errno.EAGAIN, errno.ECONNABORTED,
    errno.EWOULDBLOCK):
    raise

    def run(self):
    # ....
    ```

    We can see that `init_process` create a thread pool, and `accept` just push the established connection to the `queue` inside the `ThreadPool` object

    > 1. If there is a concern about the application [memory footprint](https://en.wikipedia.org/wiki/Memory_footprint), using `threads` and its corresponding **gthread worker class** in favor of `workers` yields better performance because the application is loaded once per worker and every thread running on the worker shares some memory, this comes to the expense of some additional CPU consumption.
    Let's see an example

    ```bash
    gunicorn --workers 1 --worker-class gthread --threads 2 mysite.wsgi
    ```

    The `--threads` will only affect `gthread` worker class, other worker class will not be affected by `--threads` parameter

    ![gthread](https://user-images.githubusercontent.com/18083296/128214059-4045112c-3dac-4029-9b9a-b591684a790d.png)

    Each worker initialize a `ThreadPool` with size `--threads` threads, whenever the main thread accept a socket object, the object is pushed into the `queue`, and the working thread in `ThreadPool` will pop it from the `queue` and delegate the actual request to django application

    ## tornado

    The last worker class is `tornado`, the code is pretty simple

    ```python3
    # gunicorn/gunicorn/workers/gtornado.py
    def init_process(self):
    # IOLoop cannot survive a fork or be shared across processes
    # in any way. When multiple processes are being used, each process
    # should create its own IOLoop. We should clear current IOLoop
    # if exists before os.fork.
    IOLoop.clear_current()
    super().init_process()

    def run(self):
    # ...
    ```

    The `run` method initlaize monitor utility in `gunicorn` , start a tornado server instance, bind the listening sockets to the tornado server, and finally runs the `IOLoop`

    ## read more

    * [what are you using gevent for?](#https://groups.google.com/g/gevent/c/TelwPl3KgnE)
    * [Comparing gevent to eventlet](https://blog.gevent.org/2010/02/27/why-gevent/)
    * [Better performance by optimizing Gunicorn config](https://medium.com/building-the-system/gunicorn-3-means-of-concurrency-efbb547674b7)