### Overview ###

For those who are unfamiliar with the project, [PyPy](http://pypy.org/) is an implementation of the Python language that features a [JIT Compiler](http://en.wikipedia.org/wiki/Just-in-time_compilation). I have noticed a huge performance benefit in some personal projects by switching to PyPy. I have always been curious how it would perform on a large and complex project like OpenStack, but my early experiments ran into massive roadblocks around broken dependencies.

It has been six months since I last looked, so I figured it was time to try it again. Support has come a long way and, now that lxml is working, we are close enough to get a Proof-of-Concept running. Read on for instructions on running nova with PyPy.

### Preparation ###

Start out with a base Ubuntu 12.04 (precise) install and run devstack. I won't go through the details of getting devstack running here, because there are already instructions on the [devstack site](http://devstack.org).

### Install PyPy ###

First we need to download PyPy and unpack it:

```bash
wget https://bitbucket.org/pypy/pypy/downloads/pypy-2.0-beta1-linux64-libc2.15.tar.bz2
tar -jxvf pypy-2.0-beta1-linux64-libc2.15.tar.bz2
```

For convenience, we put PyPy in the path:

```bash
sudo ln -s $PWD/pypy-2.0-beta1/bin/pypy /usr/bin/pypy
```

Next we need distribute and pip so we can install dependencies:

```bash
curl -O http://python-distribute.org/distribute_setup.py
curl -O https://raw.github.com/pypa/pip/master/contrib/get-pip.py
pypy distribute_setup.py
pypy get-pip.py
```

### Modify Nova ###

A few modifications to nova are needed to run all of the binaries with PyPy. I had to make four changes:

1. [Updated binaries to run with pypy instead of python](https://github.com/vishvananda/nova/commit/b804bc50c61dd756d8af301e3f20394bfa2c7318)
1. [Removed dependency on psycopg2](https://github.com/vishvananda/nova/commit/a1e2ee753d80cfcc43b29fbe005b365ddc8c027d)
1. [Removed dependency on websockify](https://github.com/vishvananda/nova/commit/682caa2d7bb0d8038612ce263f375f385c5b4c59)
1. [Worked around lack of support for os.statvfs](https://github.com/vishvananda/nova/commit/7a4027de4c4b0d29cedbf0f6bb86ac04ded661c0)

Note that this is just a proof of concept, so I'm not overly worried about the lack of novnc support and improper disk usage reporting. The above fixes can be snagged from the pypy branch on my GitHub:

```bash
cd /opt/stack/nova
git remote add vishvananda https://github.com/vishvananda/nova.git
git fetch vishvananda
git checkout pypy
cd -
```

### Install Dependencies ###

We can use pip to install nova's dependencies for PyPy:

```bash
./pypy-2.0-beta1/bin/pip install -r /opt/stack/nova/tools/pip-requires
./pypy-2.0-beta1/bin/pip install -r /opt/stack/nova/tools/test-requires
```

### Fix Eventlet ###

Eventlet has some issues with PyPy that need to be patched. First we will install a specific version so we can be sure that the patches will work:

```bash
./pypy-2.0-beta1/bin/pip install eventlet==0.12.1
```

There is an issue with corolocal. It appears that, due to some difference in the PyPy implementation of WeakRef or the object model, the check for the existence of a specific `__init__` fails. Specifically, `thrl.__init__` can be `None`, and the attempt to call it doesn't work. We work around it by making sure it isn't `None` before calling it:

```bash
sed -i "s/if cls.__init__ is not object.__init__:/if cls.__init__ is not object.__init__ and thrl.__init__:/" pypy-2.0-beta1/site-packages/eventlet/corolocal.py
```

A more worrisome issue is that PyPy uses its own implementation of socket (it is similar to the Python 3 implementation). Eventlet's green socket implementation isn't compatible and breaks down. PyPy's sockets have a method called `_decref_socketios` that eventlet's green sockets don't support. This is supposed to be used to keep track of reference counts.
Completely rewriting eventlet's socket class to be compatible with PyPy is more than I want to deal with for a POC, but the following minimal patch makes it work:

```bash
patch pypy-2.0-beta1/site-packages/eventlet/greenio.py << 'EOF'
# fix issue with eventlet socket
335a336
>         self._closed = False
337a339,341
>     def _decref_socketios(self):
>         self.__del__()
>
374,378c378,384
<         try:
<             os.close(self._fileno)
<         except:
<             # os.close may fail if __init__ didn't complete (i.e file dscriptor passed to popen was invalid
<             pass
---
>         if not self._closed:
>             try:
>                 os.close(self._fileno)
>                 self._closed = True
>             except:
>                 # os.close may fail if __init__ didn't complete (i.e file dscriptor passed to popen was invalid
>                 pass
EOF
```

### Build libvirt ###

Libvirt can't be installed with pip, so you will have to install it manually. The first step is to create a short shell script called pypy-config. This tells the libvirt configure script where to find the PyPy include directory:

```bash
echo '#!/usr/bin/env bash' > $PWD/pypy-2.0-beta1/bin/pypy-config
echo "echo '-I$PWD/pypy-2.0-beta1/include/'" >> $PWD/pypy-2.0-beta1/bin/pypy-config
chmod 755 $PWD/pypy-2.0-beta1/bin/pypy-config
```

Next we download the libvirt dependencies and source:

```bash
sudo apt-get build-dep libvirt
apt-get source libvirt
```

Configure libvirt to build with PyPy:

```bash
cd libvirt-0.9.8
./configure --libdir=/usr/lib --prefix=/usr --sysconfdir=/etc --localstatedir=/var --with-python=`dirname $PWD`/pypy-2.0-beta1/bin/pypy
```

The lxc code seems to have trouble finding libnl. I suspect this could be fixed by passing `--without-lxc` to configure above, but I manually fixed it in the Makefile:

```bash
sed -i 's/libvirt_lxc_CFLAGS = \\/libvirt_lxc_CFLAGS = $(LIBNL_CFLAGS) \\/' src/Makefile
```

The current version of PyPy doesn't support PyInstance_New.
It looks like [this patch](https://bitbucket.org/pypy/pypy/commits/2dd5a00fac80bc87da0bb5c0a5ed7272b5a1e971) adds support, but it isn't in the beta, so we need to work around it by replacing the call with a call to PyEval_CallObject:

```bash
sed -i '1N;$!N;s/.*PyInstance_New.*\n.*\n.*/PyEval_CallObject(dom_class, pyobj_dom_args);/;P;D' python/libvirt-override.c
```

Now we should be able to build libvirt:

```bash
make
```

Devstack has already installed the libvirt binaries, so we just need the Python libraries. Note that PyPy wants its .so files to have a .pypy-20.so extension, so we copy them in with the proper name and make them executable:

```bash
for a in python/.libs/*.so; do
    # Rename only the trailing .so extension to .pypy-20.so
    b=$(echo ../pypy-2.0-beta1/site-packages/$(basename $a) | sed 's/\.so$/.pypy-20.so/')
    cp $a $b
    chmod 755 $b
done
```

Next we copy the .py files:

```bash
cp python/libvirt.py ../pypy-2.0-beta1/site-packages/
cp python/libvirt_*.py ../pypy-2.0-beta1/site-packages/
```

### Running Nova ###

Now that we've done all of the prep work, we can rerun the binaries using the PyPy versions:

```bash
killall screen
cd devstack
./rejoin_stack.sh
```

All of the normal nova commands should work with our shiny new PyPy install of nova.

### Stability ###

Overall nova runs pretty well. Startup time is a bit slow, but that is expected with a JIT. I did get one crash in nova-conductor after leaving it running for a while:

```
RPython traceback:
  File "pypy_jit_metainterp_compile.c", line 133, in force_now_1
  File "pypy_jit_metainterp_compile.c", line 219, in ResumeGuardForcedDescr_save_data
Fatal RPython error: AssertionError
Aborted
```

This seems to be a crash in the PyPy compiler, which might be because we are running a beta version. I suspect my hacky eventlet patching has exposed a bug, but perhaps some other library just isn't totally happy with PyPy yet.

__UPDATE__: Stackless support has issues in beta1, as mentioned on the [PyPy site](http://pypy.org/features.html), so this crash should be gone by the time 2.0 final is out.

### Performance ###

__UPDATE__: The current version of PyPy disables the JIT when eventlet is enabled. This is actively being worked on in a [branch](https://bitbucket.org/pypy/pypy/src/?at=jitframe-on-heap). I will detail my attempt to get the branch running in a future post.

Despite my experience with smaller projects, nova runs a bit more slowly with PyPy than with CPython. Specifically, simple API requests take about twice as long, and complex operations like instance launch can take 33% longer (8 seconds in PyPy vs 6 seconds in CPython). I'm surprised; I expected a big speedup since a lot of our time is spent in Python code (SQLAlchemy, I'm looking at you). It is possible that the JIT benefits are being offset by slowdowns in C extensions. Or perhaps eventlet's greenlet switches are preventing the JIT from optimizing effectively. It's hard to say without doing some in-depth profiling.

### Conclusion ###

PyPy is on the cusp of being ready for real production applications like this. Based on my experience with other projects, I suspect that with some optimization effort we could see a performance benefit over the CPython version. Eventlet clearly needs some upstream fixes, and PyPy might need a little longer to bake for stability reasons, but we should definitely keep our eye on PyPy as an option in the future.
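
If you want to repeat the comparison from the Performance section, a small timing helper is enough to get rough numbers. The following is a minimal sketch, not the script used for the figures above: `median_seconds` and `percent_slower` are hypothetical helper names, and the lambda workload is only a stand-in for whatever API call you want to time. It sticks to the standard library so the same file runs under both CPython and PyPy.

```python
from __future__ import print_function

import time


def median_seconds(fn, repeats=5):
    """Call fn() `repeats` times and return the middle wall-clock sample.

    For an even number of repeats this returns the upper of the two
    middle samples.
    """
    samples = []
    for _ in range(repeats):
        start = time.time()
        fn()
        samples.append(time.time() - start)
    samples.sort()
    return samples[len(samples) // 2]


def percent_slower(candidate, baseline):
    """Return how much longer `candidate` took than `baseline`, in percent."""
    return (candidate - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Stand-in workload (an assumption); replace it with the request
    # you actually want to measure.
    elapsed = median_seconds(lambda: sum(range(100000)))
    print("median: %.6f seconds" % elapsed)

    # The instance launch figures quoted above: 8 seconds vs 6 seconds.
    print("slowdown: %.1f%%" % percent_slower(8.0, 6.0))
```

Run the same file once with `python` and once with `pypy`, then feed the two medians to `percent_slower`. Taking the median rather than a single run matters more than usual here, since JIT warm-up makes the first few iterations unrepresentative.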