Author Archives: Janis

Why import * is not allowed outside the global scope

So today I was looking at import machinery in Python due to this question about import styles in Persephone, an open source natural language processing library I’ve been contributing to.

Looking around at this more I’ve found a couple of things that I didn’t realize before:

Python 2.7.12 (default, Dec  4 2017, 14:50:18)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> def foo():
...     from math import *
...     return sqrt(4)
...
<stdin>:1: SyntaxWarning: import * only allowed at module level
>>> foo()
2.0

This warning was introduced in Python 2.1 via PEP 227; the 2.1 release was also the one that introduced the warnings framework itself. Nested scopes can break import *, so the language specification explicitly prohibits it inside functions. However, as you can see, the CPython 2.x implementation doesn’t enforce this: it emits a warning but the code still works when you try to use it.

Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> def foo():
...    from math import *
...    return sqrt(4)
...
File "<stdin>", line 1
SyntaxError: import * only allowed at module level

In Python 3 this produces a syntax error instead. The Python 3 approach here is much nicer considering that the code is prohibited by the spec. Why this happens is more interesting though, and ultimately comes down to efficiency.

Basically, significant efficiency gains can be had if the full set of local variables is known at the time the function is compiled.
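
We can see this from Python itself with the standard library dis module: names the compiler knows to be local are accessed by array index (the LOAD_FAST/STORE_FAST opcodes) rather than by dictionary lookup. A minimal sketch:

import dis

def foo():
    x = 1  # the compiler knows at compile time that x is local
    return x

dis.dis(foo)
# The disassembly shows STORE_FAST/LOAD_FAST, which index directly into
# the frame's local variable storage rather than searching a namespace dict.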

So let’s look at what a recent CPython implementation does. First, the frame object in Include/frameobject.h:

typedef struct _frame {
    PyObject_VAR_HEAD
    struct _frame *f_back;      /* previous frame, or NULL */
    PyCodeObject *f_code;       /* code segment */
    PyObject *f_builtins;       /* builtin symbol table (PyDictObject) */
    PyObject *f_globals;        /* global symbol table (PyDictObject) */
    PyObject *f_locals;         /* local symbol table (any mapping) */
    PyObject **f_valuestack;    /* points after the last local */
    /* Next free slot in f_valuestack.  Frame creation sets to f_valuestack.
       Frame evaluation usually NULLs it, but a frame that yields sets it
       to the current stack top. */
    PyObject **f_stacktop;
    PyObject *f_trace;          /* Trace function */
    char f_trace_lines;         /* Emit per-line trace events? */
    char f_trace_opcodes;       /* Emit per-opcode trace events? */

    /* Borrowed reference to a generator, or NULL */
    PyObject *f_gen;

    int f_lasti;                /* Last instruction if called */
    /* Call PyFrame_GetLineNumber() instead of reading this field
       directly.  As of 2.3 f_lineno is only valid when tracing is
       active (i.e. when f_trace is set).  At other times we use
       PyCode_Addr2Line to calculate the line from the current
       bytecode index. */
    int f_lineno;               /* Current line number */
    int f_iblock;               /* index in f_blockstack */
    char f_executing;           /* whether the frame is still executing */
    PyTryBlock f_blockstack[CO_MAXBLOCKS]; /* for try and loop blocks */
    PyObject *f_localsplus[1];  /* locals+stack, dynamically sized */
} PyFrameObject;

This represents a Python stack frame.

Note the f_localsplus member, which stores the local variables.

Now how these frames are evaluated is interesting. Because stack frames are frequently created and destroyed, an optimization is used to make function calls faster: each code object has a “zombie frame” associated with it that allows a faster call because the frame itself doesn’t need to have memory allocated for it during the call. This frame gets created at the same time as the code object, with its memory allocated up front, so when the call happens that memory simply gets reused and the required values are filled into the pre-existing frame directly.

There’s a particularly informative comment in Objects/frameobject.c:

 /* Stack frames are allocated and deallocated at a considerable rate.
   In an attempt to improve the speed of function calls, we:
   1. Hold a single "zombie" frame on each code object. This retains
   the allocated and initialised frame object from an invocation of
   the code object. The zombie is reanimated the next time we need a
   frame object for that code object. Doing this saves the malloc/
   realloc required when using a free_list frame that isn't the
   correct size. It also saves some field initialisation.
   In zombie mode, no field of PyFrameObject holds a reference, but
   the following fields are still valid:
     * ob_type, ob_size, f_code, f_valuestack;
     * f_locals, f_trace are NULL;
     * f_localsplus does not require re-allocation and
       the local variables in f_localsplus are NULL.

What this design means is that the struct member f_localsplus is created at the time the code object is created (the code object is created when the function definition is first encountered). All local variable names must therefore be known at the point the function is defined, so that slots for them can be reserved in this part of the frame object. Not having to recalculate the number of local variables and allocate and deallocate memory for them on every function call is a significant win: when the function is called, the locals don’t need to be looked up, they are already there and just need to be populated with values. As a result of this optimization, from mod import * must be made illegal within function scope, and hence the syntax error is raised.

For an example of implementation code that relies on this, here’s part of frame_dealloc in Objects/frameobject.c:

static void _Py_HOT_FUNCTION
frame_dealloc(PyFrameObject *f)
{
    PyObject **p, **valuestack;
    PyCodeObject *co;

    if (_PyObject_GC_IS_TRACKED(f))
        _PyObject_GC_UNTRACK(f);

    Py_TRASHCAN_SAFE_BEGIN(f)
    /* Kill all local variables */
    valuestack = f->f_valuestack;
    for (p = f->f_localsplus; p < valuestack; p++)
        Py_CLEAR(*p);

This fast clearing of locals to enable stack frame reuse can only work if f_localsplus is always the same size. If the size could vary between stack frames, many more checks would have to be inserted, and as discussed above that would be a performance hit in a very critical part of the implementation.

How a web page request makes it down to the metal

The other day I was interested in how many steps occur between sending a POST or GET request from a website and the actual processing that happens on the CPU of the server. I figured that I knew bits and pieces of the puzzle but I wanted to see the complete path from the highest levels of abstraction all the way down to the lowest without missing anything too big in between. It turns out that in a modern web system there are a lot of steps. I found this really fascinating, much like an explorer who wants to find a path from one known place to another. If you are interested in better understanding how your computer works you might find walking along this path with your tech stack helpful.


Interfacing Python with C or C++

One of the main downsides of writing applications in Python is that they can run slowly. When you run into an insurmountable runtime speed issue, the commonly accepted wisdom is to either use a more efficient Python implementation such as PyPy or to write the performance critical code in a more efficient compiled language (usually C or C++).

In this article I’m going to focus on ways to integrate fast compiled code for use by Python code. That said, I usually make a point of considering the feasibility of PyPy versus the costs of using a different implementation, as it can give you great performance increases with less added complexity.

If you want to use C or C++ compiled code in your Python project, here’s a quick summary of the options covered below:

  • Use the Python C API directly (CPython only)
  • Interface with C++ via pybind11 (or, historically, Boost.Python)
  • Use CFFI to wrap C code (works on both CPython and PyPy)
  • Embed Python in the other language

If there’s an important option missing please leave a comment with information on it!

Examples

A good way to demonstrate some of these options is via a simple example.

Let’s say we want to compute a factorial using a fairly naive recursive approach. (Note that function calls in Python are slow, which is a non-accidental part of this example; sometimes we could choose a better algorithm, but as an example this is a fairly simple function where choosing a faster language implementation results in a win.)

Here’s the Python version:

def factorial(n):
    """Compute a factorial of n"""
    if n <= 0:
        return 1
    else:
        return n * factorial(n - 1)

Use the Python C interface

If you are using the CPython implementation you can use the C API to call out to C code. Here’s an example:

#include <Python.h>

/* Computes the factorial value we need for the example */
int
compute_it(int x){
    if(x <= 0){
        return 1;
    }else{
        return compute_it(x - 1) * x;
    }
}

/* Python binding for our compute_it function */
static PyObject *
expensive_computations_compute_it(PyObject *self, PyObject *args){
    int input, result;

    if(!PyArg_ParseTuple(args, "i", &input)){
        return NULL;
    }
    result = compute_it(input);
    return Py_BuildValue("i", result);
}

/* Methods for our expensive_computations object*/
static PyMethodDef ExpensiveComputationMethods[] = {
    {"compute_it", expensive_computations_compute_it, METH_VARARGS, "Computes our test factorial function"},
    {NULL, NULL, 0, NULL} /* End of methods sentinel value */
};


/* define the module */
static struct PyModuleDef expensive_computations_module = {
    PyModuleDef_HEAD_INIT,
    "demo_module",
    -1,
    ExpensiveComputationMethods
};

/* Init code */
PyMODINIT_FUNC
PyInit_demo_module(void){
    return PyModule_Create(&expensive_computations_module);
}

int main(int argc, char* argv[]){
    /*Init the python interpreter */
    Py_Initialize();

    return 0;
}

More about this way of extending Python can be found in the docs.

Note that if your use case is just calling C library functions or system calls, this is a poor choice; if you aren’t writing any custom code, CFFI is a lot better in that case (not to mention more portable). One other thing worth mentioning is that there’s a lot of boilerplate here to convert between Python and C types. This is something that is handled better by some of the other libraries.
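
For instance, calling an existing C library function through CFFI’s ABI mode needs no custom C code and no compiler step at all; a minimal sketch:

from cffi import FFI

ffi = FFI()
ffi.cdef("int abs(int);")  # declare the signature of the C function we want
libc = ffi.dlopen(None)    # None loads the standard C library on most Unixes
print(libc.abs(-5))        # -> 5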

Interface Python with C++

Previously the easiest way to do this was with Boost.Python, as the library greatly reduced the boilerplate. However Boost is a huge dependency to introduce to a project and can be a massive pain to compile. (This is a fairly substantial downside if you weren’t already using Boost with your C++ code.) Thankfully there’s another option now that lets you get interoperability without introducing a heavy dependency on Boost: pybind11.

This library is header only, which makes it much easier to compile as part of your C++ project. See the pybind11 tutorial here for more about using this option.

Roughly it will look like this:

#include <vector>
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
namespace py = pybind11;
/**
 * computes factorials
 * @param n   The input number
 */
int compute_factorial(int n){
    if(n <= 0){
        return 1;
    }else{
        return compute_factorial(n - 1) * n;
    }
}

/**
 * Generate the python bindings for this C++ function
 */
PYBIND11_PLUGIN(factorials_example) {
    py::module m("factorial_example", "Computing factorials in c++ with python bindings using pybind11");
    m.def("factorial_from_cpp", &compute_factorial, "A function which computes factorials in C++");
    return m.ptr();
}
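
Assuming the example above has been compiled into a shared library named factorials_example (with a build along the lines of the script shown later on this page), usage from Python is then just:

import factorials_example

print(factorials_example.factorial_from_cpp(5))  # -> 120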

Use CFFI

This is one of the biggest contributions from the PyPy ecosystem to the broader Python universe; it works on both PyPy and CPython. The main idea behind CFFI is to generate an API to the C code without needing to worry about ABI issues. This makes writing the code much more efficient because a lot of the nasty details are hidden away from you. If you only need to call existing C code you don’t even need to run a C compiler yourself as part of the process; CFFI takes care of that for you.

Essentially you can write out your C code from within Python and CFFI will then compile the code and automatically create an API for you to use the C code from within Python. Honestly the whole idea is brilliant, as it cuts down the time and effort required to consume existing C code. It also makes porting your code between different Python implementations substantially easier because you don’t end up tied to an implementation specific way of creating extensions. You still need a C compiler, and there are a few limitations, but overall this approach tends to work well. Going with the factorial example again, it looks something like this:

# file "cffi_factorials.py"

from cffi import FFI
ffibuilder = FFI()

ffibuilder.cdef("int factorial_fast(int);")

ffibuilder.set_source("_example",
r"""
    static int factorial_fast(int n){
        if(n <= 0){
            return 1;
        }else{
            return factorial_fast(n - 1) * n;
        }
    }
""")

if __name__ == "__main__":
    ffibuilder.compile(verbose=True)

We then need to deal with actually building this as part of our build script, because we need to call the C compiler to build our code before we can call it from Python.

# file "setup.py"
from setuptools import setup

setup(
    ...
    setup_requires=["cffi>=1.0.0"],
    cffi_modules=["cffi_factorials.py:ffibuilder"],
    install_requires=["cffi>=1.0.0"],
)

As you can see this is much less verbose than using the Python C API.
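
After building (for example by running cffi_factorials.py directly, which calls ffibuilder.compile()), the generated _example module can be used straight from Python; a small sketch:

from _example import lib

print(lib.factorial_fast(5))  # -> 120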

Embedding Python in another language

If the majority of your project is in another language but you need some sort of scripting ability, embedding Python into your project can be a good option. This can be substantially easier overall than writing an interpreter in the main language. This is more of a high level decision, but it might be the right way to go depending on what your project is doing.

Putting the MMAP backed array library on PyPI

Given that the MMAP backed array library I have been working on is now in use by multiple projects, I figure the time has come to put it up on PyPI. The project started off as an in-house library used in a few projects; since we were making something fast for our internal use we didn’t spend a lot of time on making a nicely packaged library. Being able to do an editable pip install from a git repository was good enough. It’s been extremely useful for me, so it is probably useful for other people too; the time had come to do some housekeeping and get it out there on PyPI.

That of course means doing all the things in the Python packaging user guide. Many of those things were already done but I still needed to do the following:

  • Add a license file
  • Put the license type into setup.py under license (see the sketch below)
  • Put my name in as the author in setup.py under author
  • Create a PyPI account and upload the package
  • Put a PyPI badge in the README file 😉
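
For reference, here’s a minimal sketch of the relevant setup.py metadata fields; the version and license values are illustrative rather than the library’s actual ones:

from setuptools import setup

setup(
    name="mmap_backed_array",
    version="1.0.0",  # illustrative version number
    author="Janis",   # the author field mentioned above
    license="MIT",    # illustrative; use the actual license type here
)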

Then after that I ran it all through Cheesecake to make sure I hadn’t missed anything critical.

However these are rather small technical details that I can just grind through in an afternoon. They are necessary but not sufficient conditions for making a healthy open source library. The bigger issue is…

Making it easier for people to contribute

The bigger issue I wanted to address is making the project more friendly to contributors. I just finished watching VM Brasseur’s fantastic PyConAU 2016 talk about encouraging drive-through contributors and I realized there are significant improvements that can be made to this library.

Specifically with this library:

  • I needed to put some effort into describing the contribution process. That meant writing a CONTRIBUTORS file that explains how to make contributions. In it I tried to make clear how to make pull requests and what’s required to get them merged in. This also led to putting together some Python coding guidelines so that new contributors can get up to speed quickly.
  • Explain how people can get involved in the project.
  • Host documentation on readthedocs or similar.

If there’s anything else you can see that would make it easier to contribute I’d love to know; please leave a comment if there’s anything I’ve missed.

Update

The mmap_backed_array package is now up on PyPI!

Refactoring: Heuristics library episode 2 – Separation of concerns

In this video we discuss the separation of concerns and apply that to improving some of the code:

Summary

  • Discussed the separation of concerns concept
    • Improved code with this in mind
  • Set up an upstream repository and updated the fork of the repo to get new upstream changes
  • Made a PR with the changes in this episode: https://github.com/billy164/AAH/pull/4

If you have any questions please leave a comment!

Resolving CFFI install issues on Ubuntu

Some of my Python projects rely on a variety of packages, most of which install flawlessly via pip. Recently I’ve been setting up a new machine running Ubuntu 16.04 and I ran into a few issues installing CFFI via pip. Essentially CFFI requires some packages that are not installed by default on Ubuntu 16.04; the following should help you get past those issues.

When running pip install cffi the first failure you might see is something like this:

building '_cffi_backend' extension
creating build/temp.linux-x86_64-3.5
creating build/temp.linux-x86_64-3.5/c
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DUSE__THREAD -I/usr/include/ffi -I/usr/include/libffi -I/usr/include/python3.5m -I/home/janis/.virtualenvs/virt_env/include/python3.5m -c c/_cffi_backend.c -o build/temp.linux-x86_64-3.5/c/_cffi_backend.o
c/_cffi_backend.c:2:20: fatal error: Python.h: No such file or directory
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

----------------------------------------
Failed building wheel for cffi

This happens because the Python development headers are not installed (as hinted at by the missing Python.h include file). To fix that we must install:

sudo apt-get install python3-dev

Or, if for some reason you are running Python 2 and the packages were not already installed:

sudo apt-get install python-dev

After installing that, there’s a second failure you might encounter:

running build_ext
building '_cffi_backend' extension
creating build/temp.linux-x86_64-3.5
creating build/temp.linux-x86_64-3.5/c
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DUSE__THREAD -I/usr/include/ffi -I/usr/include/libffi -I/usr/include/python3.5m -I/home/janis/.virtualenvs/virt_env/include/python3.5m -c c/_cffi_backend.c -o build/temp.linux-x86_64-3.5/c/_cffi_backend.o
c/_cffi_backend.c:13:17: fatal error: ffi.h: No such file or directory
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

This is again a result of a missing package, this time libffi-dev:

sudo apt-get install libffi-dev

After installing that I got a successful CFFI install via pip.

Refactoring: Heuristics library episode 1 – introduction

We have a whole new refactoring series thanks to a library Billy has asked us to improve:

Summary

Here’s what happened in this video:

  • First look at code
  • Fixed mutable default parameter
  • Set up Tox for testing
    • Required writing a setup.py file
  • Set up TravisCI for continuous integration
  • Set up Coveralls for test coverage reports
  • Created PR with changes made in this video: https://github.com/billy164/AAH/pull/1

The issue with mutable default parameters

In the code there was a mutable default parameter:

def var_depth_search(number_of_machines, depth, number_of_tasks, tasks=[], limit=-1): 
    if not tasks:
        tasks = [random.randint(0, LONGEST_DURATION) for _ in range(number_of_tasks)]
    else:
        number_of_tasks = len(tasks)

In this particular case it works fine: the empty default is never mutated, it just gets replaced with a new list. But the overall pattern could easily have led to issues.

For example, if tasks were appended to or mutated in any way that adds values to it, the default value would change and the else branch would be taken on subsequent calls. This is a bit of a concern going forward; see this code for an example of what can happen:

>>> def var_depth_search(tasks=[]):
...     if not tasks:
...         tasks.extend([1,2,3])
...     else:
...         tasks.append("a")
...     print(tasks)
... 
>>> var_depth_search()
[1, 2, 3]
>>> var_depth_search()
[1, 2, 3, 'a']
>>> var_depth_search()
[1, 2, 3, 'a', 'a']

You’d think that calling it with no arguments should always go down the if branch, but that’s not what actually happens: the default list object is shared between calls. Seeing as this was not the desired behavior, I changed this code to check against None to do the default parameter handling.
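
A sketch of the None-checking idiom applied to the toy example above:

def var_depth_search(tasks=None):
    if tasks is None:
        tasks = []  # a fresh list is created on every call
    if not tasks:
        tasks.extend([1, 2, 3])
    else:
        tasks.append("a")
    print(tasks)

Now calling var_depth_search() with no arguments always goes down the if branch, because no list object is shared between calls.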

Generating python bindings from C++11 with pybind11

Code for this tutorial can be found here: https://github.com/shuttle1987/pybind11_tutorial_code

Recently I was involved in some discussion sparked by this blog post, where “rdrewd” was benchmarking a simple program that generated prime numbers in Python. What I like about it is that it’s a very good example of a situation where making pragmatic decisions about implementation is highly relevant, and the author focuses on the pragmatics of solving the underlying problem.

There’s a page on the python wiki about performance in python that is a good summary of some key issues. One passage I think is highly relevant (emphasis mine):

Some people are inappropriately obsessed with speed and think that just because C can provide better performance for certain types of problem, it must therefore be a better language for all purposes. Other people think that speed of development is far more important, and choose Python even for those applications where it will run slower. Often, they are surprised to find Python code can run at quite acceptable speeds, and in some cases even faster than what they could get from C/C++ with a similar amount of development time invested.

Usually it is not the absolute speed that is important, you should think about what would be an acceptable speed of execution. Optimizations beyond achieving this acceptable speed are wasteful of resources (usually: your time. And thus: money.).

The general philosophy of CPython is to write the bottleneck code in a faster language. The original blog post got my attention because it’s a very good example of a time when calling out to a faster language makes sense. Personally, if I’m faced with a situation where the code is running on CPython I’ll try PyPy first, because the speedups due to the tracing JIT are quite remarkable. Many times in my career switching to PyPy was sufficient to give a project’s implementation feasible run times. The times it has not been enough I’ve used various FFI techniques to get the benefit of fast development in Python while running performance critical code in a faster language.

If there’s one thing I’d like people to take away from this, it’s that choosing an implementation is not a binary choice; sometimes mixing and matching is best.

Generation of prime numbers

I mostly want to talk about programming implementation details but first I think it’s good to go over some algorithms.

Even the best implementation in the world can’t overcome a bad algorithm choice in a problem like this. Efficiently generating prime numbers is actually quite involved.

Naive algorithms like the Sieve of Eratosthenes are conceptually easy to understand but allow for almost no locality of reference. Modern CPUs are very sophisticated and are often explicitly designed to leverage locality of reference via memory caching. Using the CPU cache efficiently can have a huge impact on the performance of otherwise similar looking code.
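
For reference, here’s a minimal Sieve of Eratosthenes in Python; notice how the inner loop strides across the whole array in steps of n, which for large n is exactly the kind of scattered access pattern that makes poor use of CPU caches:

def sieve_of_eratosthenes(limit):
    """Return all primes up to and including limit."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for n in range(2, int(limit ** 0.5) + 1):
        if is_prime[n]:
            # Cross off multiples of n, starting from n*n
            for multiple in range(n * n, limit + 1, n):
                is_prime[multiple] = False
    return [n for n in range(2, limit + 1) if is_prime[n]]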

The best algorithms for generating primes allow for good locality of reference. For example the Sieve of Atkin has better asymptotic complexity than the Sieve of Eratosthenes, and it also has much better locality of reference than a naive implementation of the Sieve of Eratosthenes. (Upon further investigation: even though the asymptotic complexity of an idealized operation count is lower for some algorithms, the actual underlying operations on the hardware can carry a significantly large constant factor. In a practical setting this cannot be overlooked; benchmarking, as always, is very useful.)

A good implementation of a prime number generation algorithm will take locality into account, and will outperform naive implementations because algorithms with better locality of reference allow for much better utilization of cache-backed CPU hardware.

All of this applies in any language: when it matters, good algorithm choice along with good implementation makes a huge difference.

EDIT: As Dana Jacobsen has pointed out in the comments, the choice of algorithm here is quite involved, and optimizing these algorithms makes a substantial difference. Thanks for pointing out some considerations I overlooked.

Practical implementation

Let’s say we have chosen the right algorithm and that algorithm intrinsically takes some serious computing power to run. How do we actually get that into code in a way we can easily use in our applications?

The reference code that sparked this off looks like this in Python 3:

primes=[2]
cand=3
while len(primes) < 20:
    for factor in primes:
        if cand % factor == 0:
            # Not a prime
            break
    else:
        primes.append(cand)
    cand += 2
print(primes)

Original Python2 code on pastebin: http://pastebin.com/DHQtzN0s

For the sake of the article let’s assume that this algorithm is the one we have decided to use. The blog post talked of the relative merits of using different languages; a C implementation was made and people benchmarked various implementations of the same algorithm in different languages. Many people in the comments tried various languages and the inevitable discussions of the relative merits of each came up. The author expressed regret that nobody wrote a C++ implementation, so here’s a naive one I threw together today:

#include <iostream>
#include <vector>

/**
 * Generates a sequence of prime numbers
 * @param how_many   how many primes to generate
 */
std::vector<int> generate_primes(unsigned int how_many){
    std::vector<int> primes;
    primes.reserve(how_many);
    primes.push_back(2); //2 is the first prime number
    unsigned int candidate = 3;
    bool potential_prime = true;
    while(primes.size() < how_many){
        potential_prime = true;
        for(unsigned int prime: primes){
            if(candidate % prime == 0){
                candidate += 2;
                potential_prime = false;
                break;
            }
        }
        if(potential_prime){
            primes.push_back(candidate);
        }
    }
    return primes;
}

int main(){
    std::vector<int> first_20_primes = generate_primes(20);
    for(int prime : first_20_primes){
        std::cout << prime << std::endl;
    }
    return 0;
}

This C++11 solution is fairly close to the original Python code.

Comparatively speaking, you can see that Python makes it really easy to implement something quickly. There’s more that could be done with the C++ as well, but this is about the limit of quickly putting something together. As is fairly obvious, C++ is just a more verbose language.

Benchmarking

Decisions about performance benefit enormously from benchmarking. Because the impact of performance is very much non-uniform, finding where the performance matters is a crucial step. If you are ever trying to decide whether to use a foreign function interface or any other means of writing code in a different language for performance reasons, you really need to benchmark your code to find out where the real bottlenecks are.

Some benchmarks were provided in the blog post that were on extremely small sets of data. The problem size was small enough that the IO completely eclipsed the run time, but that was the author’s entire point: in a throwaway project this small the run time difference did not actually matter. In the particular case of generating 20 primes, just write it in Python; you will have the solution with the least amount of developer hours spent that way. The computational time is going to be small enough that the trade-off of time required to write it in another language is just not worth it.

However what happens when you want to generate the first thousand primes? How about the first 10 thousand? What happens if you put this code on a popular web server where you are getting a lot of hits and you are charged for the amount of CPU time used?

We should seriously consider another implementation when the balance shifts away from ease of implementation being the main factor. Don’t make the decision lightly: if you are writing high quality code to begin with, then using higher performance code that complicates the building of your project involves a real trade-off in the form of additional complexity.

First we would benchmark and see where the bottlenecks are, as this enables us to make a judgment call about the costs of improving the performance of the code by re-implementing those bottlenecks.
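
For instance, a quick measurement of the pure Python version with the standard library timeit module; this sketch assumes the code above has been wrapped in a function generate_primes(how_many) in a hypothetical module called primes_pure:

import timeit

duration = timeit.timeit(
    "generate_primes(1000)",
    setup="from primes_pure import generate_primes",  # hypothetical module
    number=10,
)
print("10 runs of generate_primes(1000) took", duration, "seconds")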

FFI and Python

So let’s say you have benchmarked, and you have found that your Python implementation of this function is such a bottleneck that you can’t satisfy your requirements. What do you do next?

One of the big things that has changed recently is how much easier it is to make Python bindings. You don’t even need to know the Python C API; you can use something like pybind11 to generate your Python modules.

Here’s how we would expose that C++ function to Python:

#include <vector>

#include <pybind11/pybind11.h>
#include <pybind11/stl.h>

namespace py = pybind11;

/**
 * Generates a sequence of prime numbers
 * @param how_many   how many primes to generate
 */
std::vector<int> generate_primes(unsigned int how_many){
    std::vector<int> primes;
    primes.reserve(how_many);
    primes.push_back(2); //2 is the first prime number
    unsigned int candidate = 3;
    bool potential_prime = true;
    while(primes.size() < how_many){
        potential_prime = true;
        for(unsigned int prime: primes){
            if(candidate % prime == 0){
                candidate += 2;
                potential_prime = false;
                break;
            }
        }
        if(potential_prime){
            primes.push_back(candidate);
        }
    }
    return primes;
}

/**
 * Generate the python bindings for this C++ function
 */
PYBIND11_PLUGIN(primes_py3) {
    py::module m("example", "Generating primes in c++ with python bindings using pybind11");

    m.def("generate_primes", &generate_primes, "A function which generates a list of primes. The length of this list is specified by the user");

    return m.ptr();
}

Overall the amount of boilerplate is remarkably small. Also, the pybind11 library is header only, which is great for keeping builds simple as we don’t need to link against it. However the build is still complex enough that making a script is worthwhile:

#!/bin/bash
g++ -O3 -fPIC -shared -std=c++11 -I /home/janis/code_snippets/prime_gen/pybind11-1.8.1/include `python3-config --cflags --ldflags` prime_gen_python3_bindings.cpp -o primes_py3.so

This compiles the C++ into a form that can be imported from Python directly:

janis@janis-laptop:~/code_snippets/prime_gen (master)$ python3
Python 3.4.3+ (default, Oct 14 2015, 16:03:50)
[GCC 5.2.1 20151010] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import primes_py3
>>> primes_py3.__doc__
'Generating primes in c++ with python bindings using pybind11'
>>> primes_py3.generate_primes.__doc__
'generate_primes(int) -> list<int>\n\nA function which generates a list of primes. The length of the list is specified by the user\n'
>>> primes_py3.generate_primes(20)
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71]

As you can see, from the Python side this is really easy to use; you don’t even know that the C++ is there.

When working on a team this can be a very powerful technique to have in your arsenal. By writing performance critical modules in this way you can keep the ease of use for the consumers of the code while getting the performance improvements.

Summary

Ease of development in smaller projects is one of the biggest wins that Python delivers, because speed of development is one of Python’s strongest points. C++ has made very different decisions because the philosophy of the language is fundamentally different: C++ is designed to be powerful, not for ease of use in smaller projects, though it has become a lot easier to use in the more modern revisions of the language. Where C++ really shines is in those situations where the power it gives you actually matters, such as those with high performance requirements.

As this tutorial shows, when there are performance bottlenecks you have options other than complete rewrites. Specifically, you can deal with just the parts your profiling/benchmarking has shown to be problematic.

If you are writing your performance critical code in C++, pybind11 is a convenient way to consume that code from Python. It is by no means the only way; you could use subprocesses or message queues like ZeroMQ, each of which has its own set of trade-offs. If you have C code then I’d recommend looking into CFFI; it might make your life a lot easier. I used it in my MMAP backed array library and saved myself a ton of hassles.

Essentially you want to use the right tool for the job, and sometimes this involves using more than one language.

If you have any questions about code implementation decisions and trade-offs please leave a comment below! Improving the performance of code is something we specialize in; contact us if you have a project you are looking for help with.


Deprecation warnings in Python

When you are maintaining a package you want to be able to deprecate functionality and give your users advance warning when you know that something is going away in the future. Because we wish to warn users without breaking their code, we want the ability to give this information without changing the flow of execution. Luckily Python has standard library support for this in the warnings module.

One thing you’ll notice is that warnings are actually part of the standard exception hierarchy. Within this there are some subclasses of Warning that provide more specific categories. Just like with exceptions, choosing the most specific correct category is good practice.

When you want to warn your users that something is going away in the future there’s PendingDeprecationWarning. Then, when you actually deprecate it, you can use DeprecationWarning.

The full list of categories can be found here: https://docs.python.org/3/library/warnings.html#warning-categories

The benefit of this approach is that by following convention you make your intentions unambiguous to the users of your code.

Here’s an example of how this works with methods:

#deprecation_example.py
import warnings

class Example:
    def __init__(self):
        self.items = [3,4,6,2,1]

    def slow_sort(self):
        warnings.warn(
            "slow_sort is deprecated, use fast_sort instead",
            DeprecationWarning
        )
        import time
        time.sleep(5)
        return sorted(self.items)

    def gone_in_future_version_sort(self):
        warnings.warn(
            "gone_in_future_version_sort will be deprecated in version 2, use fast_sort instead",
             PendingDeprecationWarning
        )
        return sorted(self.items)

    def fast_sort(self):
        return sorted(self.items)

Now when we run this:

janis:~/deprecation_tutorial$ python3
Python 3.4.3+ (default, Oct 14 2015, 16:03:50)
[GCC 5.2.1 20151010] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from deprecation_example import Example
>>> test = Example()
>>> test.slow_sort()
[1, 2, 3, 4, 6]
>>> test.gone_in_future_version_sort()
[1, 2, 3, 4, 6]
>>> test.fast_sort()
[1, 2, 3, 4, 6]

Now notice that nothing appeared to happen here. This is good! Warnings are for the sake of the developers; we do not want execution to change for users when we introduce a warning.

So in order to get the information displayed we have to start the interpreter with the -Wd flag:

janis:~/deprecation_tutorial$ python3 -Wd
Python 3.4.3+ (default, Oct 14 2015, 16:03:50)
[GCC 5.2.1 20151010] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from deprecation_example import Example
>>> test = Example()
>>> test.slow_sort()
/home/janis/deprecation_tutorial/deprecation_example.py:11: DeprecationWarning: slow_sort is deprecated, use fast_sort instead
  DeprecationWarning
[1, 2, 3, 4, 6]
>>> test.gone_in_future_version_sort()
/home/janis/deprecation_tutorial/deprecation_example.py:20: PendingDeprecationWarning: gone_in_future_version_sort will be deprecated in version 2, use fast_sort instead
  PendingDeprecationWarning
[1, 2, 3, 4, 6]
>>> test.fast_sort()
[1, 2, 3, 4, 6]

Now we see the warnings!

Say we want to write some code in a script:

from deprecation_example import Example
test_instance = Example()
test_instance.slow_sort()
test_instance.gone_in_future_version_sort()
test_instance.fast_sort()

Just like the REPL example, running this with no flags to Python will not display any warnings output.

janis:~/deprecation_tutorial$ python3 test_script.py
janis:~/deprecation_tutorial$

Invoking the interpreter with the -Wd flag will however give us the info:

janis:~/deprecation_tutorial$ python3 -Wd test_script.py
/home/janis/deprecation_tutorial/deprecation_example.py:11: DeprecationWarning: slow_sort is deprecated, use fast_sort instead
  DeprecationWarning
/home/janis/deprecation_tutorial/deprecation_example.py:20: PendingDeprecationWarning: gone_in_future_version_sort will be deprecated in version 2, use fast_sort instead
  PendingDeprecationWarning

Now say we wanted to do that via code:

import warnings
warnings.simplefilter('default')
from deprecation_example import Example
test_instance = Example()
test_instance.slow_sort()
test_instance.gone_in_future_version_sort()
test_instance.fast_sort()

Now that we have set it up we get the warning reporting even without the interpreter flag:

janis:~/deprecation_tutorial$ python3 test_script.py
/home/janis/deprecation_tutorial/deprecation_example.py:11: DeprecationWarning: slow_sort is deprecated, use fast_sort instead
  DeprecationWarning
/home/janis/deprecation_tutorial/deprecation_example.py:20: PendingDeprecationWarning: gone_in_future_version_sort will be deprecated in version 2, use fast_sort instead
  PendingDeprecationWarning

Note that if you are using the standard library logging module, you can set it up to log all warnings with logging.captureWarnings. In production systems this is a very good idea, as you want to gather as much information as you can about things that will impact your code.
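
A minimal sketch of routing warnings into logging; the simplefilter call is included because DeprecationWarning is filtered out by default in many contexts:

import logging
import warnings

logging.basicConfig()             # simple root logger setup for this example
logging.captureWarnings(True)     # warnings are now sent to the "py.warnings" logger
warnings.simplefilter("default")  # ensure DeprecationWarning isn't suppressed

warnings.warn("slow_sort is deprecated, use fast_sort instead", DeprecationWarning)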

If you have any questions please feel free to post a comment below!

Which techniques help us be more productive?

A couple of years ago I was asked by a junior developer if I had any suggestions for techniques they could learn about to improve their productivity. It was a great question, one which I’m really glad I was asked as it showed interest and also got me seriously thinking about how you can improve the productivity of a development team. One of the striking things is how despite it being a very broad topic the best things to learn will be highly dependent on the skills of the individual in question and the projects that they are working on. In many ways in the IT industry the best approaches to improving productivity are related to the best approaches to learning, more so than in any other industry I can think of, as knowledge is often the biggest barrier to better productivity. In such a fast changing industry learning new skills effectively is a huge asset, being able to efficiently learn improves productivity enormously. Often the trickiest question is figuring out what techniques to learn next. Hopefully after reading this you have an idea of how go about answering that question. Continue reading