For those who are unfamiliar with the project, PyPy is an implementation of the Python language that features a JIT compiler. I have noticed a huge performance benefit in some personal projects by switching to PyPy. I have always been curious how it would perform on a large and complex project like OpenStack, but my early experiments ran into massive roadblocks around broken dependencies.
It has been six months since I last looked, so I figured it was time to try it again. Support has come a long way and, now that lxml is working, we are close enough to get a Proof-of-Concept running. Read on for instructions on running nova with PyPy.
Start out with a base Ubuntu 12.04 (Precise) install and run devstack. I won't go through the details of getting devstack running here, because there are already instructions on the devstack site.
First we need to download PyPy and unpack it:
wget https://bitbucket.org/pypy/pypy/downloads/pypy-2.0-beta1-linux64-libc2.15.tar.bz2
tar -jxvf pypy-2.0-beta1-linux64-libc2.15.tar.bz2
For convenience, we put PyPy in the path:
sudo ln -s $PWD/pypy-2.0-beta1/bin/pypy /usr/bin/pypy
Next we need distribute and pip so we can install dependencies:
curl -O http://python-distribute.org/distribute_setup.py
curl -O https://raw.github.com/pypa/pip/master/contrib/get-pip.py
pypy distribute_setup.py
pypy get-pip.py
A few modifications to nova are needed to run all of the binaries with PyPy. I had to make four changes:
- Updated binaries to run with pypy instead of python
- Removed dependency on psycopg2
- Removed dependency on websockify
- Worked around lack of support for os.statvfs
Note that this is just a proof of concept, so I'm not overly worried about the lack of novnc support and improper disk usage reporting.
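To illustrate the kind of workaround the last bullet describes, here is a minimal sketch of a disk usage helper that degrades gracefully when os.statvfs is unavailable. This is not the code from my branch: the function name disk_usage and the choice to report zeros are assumptions made for illustration only.

```python
import os


def disk_usage(path):
    """Return (total, free, used) in bytes for the filesystem holding path.

    Hypothetical helper: falls back to zeros when the interpreter does
    not provide os.statvfs, trading accurate reporting for a running service.
    """
    statvfs = getattr(os, "statvfs", None)
    if statvfs is None:
        # Assumption: inaccurate numbers are acceptable for a proof of concept.
        return (0, 0, 0)
    st = statvfs(path)
    total = st.f_frsize * st.f_blocks
    free = st.f_frsize * st.f_bavail
    used = st.f_frsize * (st.f_blocks - st.f_bfree)
    return (total, free, used)


print(disk_usage("/"))
```

Returning zeros keeps the caller's code path unchanged, which is why the disk usage reporting ends up wrong rather than crashing.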
The above fixes can be snagged from the pypy branch on my GitHub:
cd /opt/stack/nova
git remote add vishvananda https://github.com/vishvananda/nova.git
git fetch vishvananda
git checkout pypy
cd -
We can use pip to install nova's dependencies for PyPy:
./pypy-2.0-beta1/bin/pip install -r /opt/stack/nova/tools/pip-requires
./pypy-2.0-beta1/bin/pip install -r /opt/stack/nova/tools/test-requires
Eventlet has some issues with PyPy that need to be patched. First we will install a specific version so we can be sure that the patches will work:
./pypy-2.0-beta1/bin/pip install eventlet==0.12.1
There is an issue with corolocal. It appears that, due to some difference in the PyPy implementation of WeakRef or the object model, the check for the existence of a specific __init__ fails. Specifically, thrl.__init__ can be None, so the attempt to call it fails. We work around it by making sure it isn't None before calling it:
sed -i "s/if cls.__init__ is not object.__init__:/if cls.__init__ is not object.__init__ and thrl.__init__:/" pypy-2.0-beta1/site-packages/eventlet/corolocal.py
A more worrisome issue is that PyPy uses its own implementation of socket (it is similar to the Python3 implementation). Eventlet's green socket implementation isn't compatible and breaks down. PyPy's sockets have a method called _decref_socketios that eventlet's green sockets don't support. It is supposed to be used to keep track of reference counts. Completely rewriting eventlet's socket class to be compatible with PyPy is more than I want to deal with for a POC, but the following minimal patch makes it work:
cat | patch pypy-2.0-beta1/site-packages/eventlet/greenio.py << EOF
# fix issue with eventlet socket
335a336
> self._closed = False
337a339,341
> def _decref_socketios(self):
> self.__del__()
>
374,378c378,384
< try:
< os.close(self._fileno)
< except:
< # os.close may fail if __init__ didn't complete (i.e file dscriptor passed to popen was invalid
< pass
---
> if not self._closed:
> try:
> os.close(self._fileno)
> self._closed = True
> except:
> # os.close may fail if __init__ didn't complete (i.e file dscriptor passed to popen was invalid
> pass
EOF
Libvirt can't be installed with pip, so you will have to install it manually.
The first step is to create a short shell script called pypy-config. This tells the libvirt configure script where to find the PyPy include directory:
echo '#!/usr/bin/env bash' > $PWD/pypy-2.0-beta1/bin/pypy-config
echo "echo '-I$PWD/pypy-2.0-beta1/include/'" >> $PWD/pypy-2.0-beta1/bin/pypy-config
chmod 755 $PWD/pypy-2.0-beta1/bin/pypy-config
Next we download the libvirt dependencies and source:
sudo apt-get build-dep libvirt
apt-get source libvirt
Configure libvirt to build with PyPy:
cd libvirt-0.9.8
./configure --libdir=/usr/lib --prefix=/usr --sysconfdir=/etc --localstatedir=/var --with-python=`dirname $PWD`/pypy-2.0-beta1/bin/pypy
The lxc code seems to have trouble finding libnl. I suspect this could be fixed by passing --without-lxc to configure above, but I manually fixed it in the Makefile:
sed -i 's/libvirt_lxc_CFLAGS = \\/libvirt_lxc_CFLAGS = $(LIBNL_CFLAGS) \\/' src/Makefile
The current version of PyPy doesn't support PyInstance_New. It looks like this patch adds support, but it isn't in the beta, so we need to work around it by replacing the call with a call to PyEval_CallObject:
sed -i '1N;$!N;s/.*PyInstance_New.*\n.*\n.*/PyEval_CallObject(dom_class, pyobj_dom_args);/;P;D' python/libvirt-override.c
Now we should be able to build libvirt:
make
Devstack has already installed the libvirt binaries, so we just need the Python libraries. Note that PyPy wants its .so files to have a .pypy-20.so extension, so we copy them in with the proper name and make them executable:
for a in `ls python/.libs/*.so`; do b=$(echo ../pypy-2.0-beta1/site-packages/$(basename $a) | sed 's/\.so$/.pypy-20.so/'); cp $a $b; chmod 755 $b; done
Next we copy the .py files:
cp python/libvirt.py ../pypy-2.0-beta1/site-packages/
cp python/libvirt_*.py ../pypy-2.0-beta1/site-packages/
Now that we've done all of the prep work, we can rerun the binaries using the PyPy versions:
killall screen
cd devstack
./rejoin_stack.sh
All of the normal nova commands should work with our shiny new PyPy install of nova.
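If you want to double check that a process is really running under PyPy rather than CPython, a small sketch like the following can help. It also lists the extension module suffixes the interpreter will load, which is where a name like .pypy-20.so comes from. The function name running_on_pypy is made up for this example; the import fallback is there so the snippet runs on both Python 2.7 (which this PyPy release targets) and Python 3.

```python
import platform

try:
    # Python 3: suffixes the import system accepts for C extensions.
    from importlib.machinery import EXTENSION_SUFFIXES
except ImportError:
    # Python 2.7 fallback: filter imp.get_suffixes() down to C extensions.
    import imp
    EXTENSION_SUFFIXES = [
        suffix for (suffix, mode, kind) in imp.get_suffixes()
        if kind == imp.C_EXTENSION
    ]


def running_on_pypy():
    """Return True when the current interpreter reports itself as PyPy."""
    return platform.python_implementation() == "PyPy"


print("PyPy: %s" % running_on_pypy())
print("Extension suffixes: %s" % ", ".join(EXTENSION_SUFFIXES))
```

Run it with the pypy binary from the symlink above and again with the system python to compare the two.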
Overall nova runs pretty well. Startup time is a bit slow, but that is expected with a JIT. I did get one crash in nova-conductor after leaving it running for a while:
RPython traceback:
File "pypy_jit_metainterp_compile.c", line 133, in force_now_1
File "pypy_jit_metainterp_compile.c", line 219, in ResumeGuardForcedDescr_save_data
Fatal RPython error: AssertionError
Aborted
This seems to be a crash in the PyPy compiler, which might be because we are running a beta version. I suspect my hacky eventlet patching has exposed a bug, but perhaps some other library just isn't totally happy with PyPy yet.
UPDATE: Stackless support has issues in beta1 as mentioned on the PyPy site, so this crash should be gone by the time 2.0 final is out.
UPDATE: The current version of PyPy disables the JIT when eventlet is enabled. This is actively being worked on in a branch. I will detail my attempt to get the branch running in a future post.
Despite my experience with smaller projects, nova runs a bit more slowly with PyPy than with CPython. Specifically, simple API requests take about twice as long, and complex operations like instance launch can take 33% longer (8 seconds in PyPy vs. 6 seconds in CPython). I'm surprised; I expected a big speedup since a lot of our time is spent in Python code (SQLAlchemy, I'm looking at you).
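For anyone who wants to reproduce this kind of comparison, here is a rough sketch of how the numbers can be gathered and compared. The helpers time_call and percent_slower are hypothetical names for this example, and the timed function is a stand-in, not a real nova request.

```python
import time


def time_call(fn, repeat=5):
    """Return the best wall-clock time in seconds over several runs of fn."""
    best = float("inf")
    for _ in range(repeat):
        start = time.time()
        fn()
        best = min(best, time.time() - start)
    return best


def percent_slower(candidate_seconds, baseline_seconds):
    """How much longer the candidate takes, as a percentage of the baseline."""
    return (candidate_seconds - baseline_seconds) / baseline_seconds * 100.0


# The instance launch figures quoted above: 8 seconds vs. 6 seconds.
print("%.0f%% slower" % percent_slower(8.0, 6.0))

# Stand-in workload; replace with a real request against the API.
print("%.6f seconds" % time_call(lambda: sum(range(100000))))
```

Taking the best of several runs matters more than usual with a JIT, since the first few iterations include warm-up time.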
It is possible that the JIT benefits are being offset by slowdowns in C extensions. Or perhaps eventlet's greenlet switches are preventing the JIT from optimizing effectively. It's hard to say without doing some in-depth profiling.
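As a starting point for that profiling, the standard library's cProfile module can show where time goes in a single code path. The sketch below is written for Python 3 and uses a hypothetical workload function as a stand-in for a request handler. It is not something I ran against nova, and the profiler's own overhead may distort results under a JIT.

```python
import cProfile
import io
import pstats


def workload():
    # Hypothetical stand-in for a request handler; not nova code.
    return sum(i * i for i in range(100000))


def profile(fn, limit=10):
    """Run fn under cProfile and return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(fn)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


print(profile(workload))
```

Comparing the same report under both interpreters would at least show whether the extra time lands in C extension calls or in pure Python code.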
PyPy is on the cusp of being ready for real production applications like this. Based on my experience with other projects, I suspect that with some optimization effort we could see a performance benefit over the CPython version.
Eventlet clearly needs some upstream fixes, and PyPy might need a little longer to bake for stability reasons, but we should definitely keep our eye on PyPy as an option in the future.