dmalcolm (dmalcolm) wrote,

Automatically detecting reference-count bugs in Python extension modules

[ For the tl;dr version, scroll down to see the pretty screenshots :) ]

I've been working on a static analysis tool to automatically detect reference-count bugs in C Python extension code.

(see my earlier posts on verifying calls to the PyArg_ParseTuple API here and here)

Mismanaging reference counts can lead to the python process leaking memory (and other resources), for when an object becomes immortal, or segfaulting, when an object is cleaned up when things still refer to it.

My "cpychecker" code is still an early prototype (don't expect to use it on arbitrary C code yet), but here's an example of some of the things it's already capable of:

Can you see the reference-counting error in this (contrived) code fragment?
    22	PyObject *
    23	refcount_demo(PyObject *self, PyObject *args)
    24	{
    25	    PyObject *list;
    26	    PyObject *item;
    27	    list = PyList_New(1);
    28	    if (!list)
    29	        return NULL;
    31	    item = PyLong_FromLong(42);
    32	    if (!item)
    33	        return NULL;
    35	    PyList_SetItem(list, 0, item);
    36	    return list;
    37	}
    39	static PyMethodDef test_methods[] = {
    40	    {"refcount_demo",  refcount_demo, METH_VARARGS, NULL},
    41	    {NULL, NULL, 0, NULL} /* Sentinel */
    42	};

Compiling like this:
  [david@fedora-15 gcc-plugin]$ ./gcc-with-python -I/usr/include/python2.7 refcount-demo.c

the checker adds this output to gcc's:
refcount-demo.c: In function ‘refcount_demo’:
refcount-demo.c:37:1: error: ob_refcnt of PyListObject is 1 too high
refcount-demo.c:27:10: note: PyListObject allocated at:     list = PyList_New(1);
refcount-demo.c:27:10: note: when PyList_New() succeeds at:     list = PyList_New(1);
refcount-demo.c:28:8: note: when taking False path at:     if (!list)
refcount-demo.c:31:10: note: reaching:     item = PyLong_FromLong(42);
refcount-demo.c:31:10: note: when PyLong_FromLong() fails at:     item = PyLong_FromLong(42);
refcount-demo.c:32:8: note: when taking True path at:     if (!item)
refcount-demo.c:33:9: note: reaching:         return NULL;
refcount-demo.c:37:1: note: when returning

which can be navigated in any IDE that can parse GCC's output messages (works for me in emacs).

This demonstrates a particular path of execution that has a bug.

I found the textual output a bit heavy on the eye, so I've hacked up the plugin script so it can render graphical HTML visualizations of the errors that it finds.

Here's that same report, in HTML form:

The report shows the control flow through the function: lines that get executed are written in bold and outlined in blue, with arrows connecting them, and additional annotations in italics. (I'm not so good at HTML/CSS, so help here would be most welcome!).

I used the jsplumb JavaScript library to add lines to the HTML to link together elements. This uses the newish <canvas> element, so the control-flow lines may only appear in recent browsers. It works for me in Chromium 12 and Firefox 4. You can see the HTML report itself here:

(Currently it's hardcoded to generate the reports, but I'll probably add something like a -fdump-html command-line option to the gcc-with-cpychecker harness).

Here are some more examples:

Detecting the all-too-common: "return Py_None;" bug:


Another (very contrived) reference leak:


Detecting a stray Py_INCREF that makes the reference count too high, or segfaults python, depending on what happened earlier:


This is still an experimental prototype, so it's not yet ready for general purpose use, but I'm frantically working on it, and I hope it will be ready in time for inclusion in Fedora 16.

The checker is Free Software (licensed under GPLv3 or later), and if you want to get involved, go to (as I said above, I could really use some help with HTML and CSS! The checker is written in Python itself, if you're interested in hacking on the code).

(Thanks to Red Hat for allowing me to spend a substantial proportion of my $DAYJOB on this)
Tags: fedora, gcc, python

  • Post a new comment


    Anonymous comments are disabled in this journal

    default userpic

    Your IP address will be recorded 

  • 1 comment