Friday, December 31, 2010

Charming Python: Make Python run as fast as C with Psyco -- Using Psyco, the Python specializing compiler

David Mertz (mertz@gnosis.cx), Freelance writer, Gnosis Software, Inc.

Summary:  In some ways the design of Python resembles the design of Java. Both utilize a virtual machine that interprets specialized pseudo-compiled bytecodes. One area where JVMs are more advanced than Python is in optimizing the execution of bytecodes. Psyco, a Python specializing compiler, helps to level the playing field. Right now Psyco is an external module, but it could someday be included in Python itself. With only a tiny amount of extra programming, Psyco can often be used to increase the speed of Python code by orders of magnitude. In this article, David looks at what Psyco is, and tests it in some applications.

Python is usually fast enough for what you want it to do. Ninety percent of the concerns that novice programmers express about the execution speed of an interpreted/byte-compiled language like Python are simply naive. On modern hardware, most non-optimized Python programs run as fast as they need to, and there is really no point in spending extra programming effort to make an application run faster.

In this installment, therefore, I'm only interested in that other ten percent. Once in a while, Python programs (or programs in other languages) run impracticably slowly. What is needed for a given purpose varies greatly; shaving off milliseconds is rarely compelling, but speeding up tasks that run for minutes, hours, days, or weeks is often worthwhile. Moreover, it should be noted that not everything that runs slowly is CPU-bound. If a database query takes hours to complete, for example, it makes little difference whether the resulting dataset takes one or two minutes to process. This installment is also not about I/O issues.

There are a number of ways to speed up Python programs. The first technique that should come to every programmer's mind is improving the algorithms and data structures used. Micro-optimizing the steps of an inefficient algorithm is a fool's errand. For example, if the complexity order of the current technique is O(n**2), speeding up the steps by 10x is a lot less helpful than finding an O(n) substitute. This moral applies even when considering an approach as extreme as a rewrite in assembly: the right algorithm in Python will often go faster than the wrong algorithm in hand-tuned assembly.
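To make the point concrete, here is a small illustration of my own (not from the original article): detecting duplicates by comparing every pair of items is O(n**2), while a single pass with a set is O(n), and no amount of micro-tuning the first version will let it catch the second on large inputs.

def has_duplicates_slow(items):
    # Compares every pair of items: O(n**2)
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False

def has_duplicates_fast(items):
    # One pass, remembering what has been seen: O(n)
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False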

The second technique to consider is to profile your Python application, with an eye to rewriting key portions as C extension modules. Using an extension wrapper like SWIG (see Resources), you can create a C extension that executes the most time-consuming elements of your program as C code. Extending Python in this manner is relatively straightforward, but requires some time to learn (and knowledge of C). Very often you will find that the large majority of the time spent executing your Python application is spent in just a handful of functions, so considerable gains are possible.
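For instance, finding those hot functions takes only a few lines with the standard library profiler (a generic sketch of my own; myapp and its main() function are hypothetical stand-ins for your own application):

import profile   # cProfile, in later Pythons, has the same interface
import pstats
import myapp     # hypothetical application module

profile.run('myapp.main()', 'myapp.prof')
stats = pstats.Stats('myapp.prof')
stats.sort_stats('cumulative').print_stats(10)   # the ten costliest calls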

A third technique builds on the second. Greg Ewing has created a language called Pyrex, which melds Python and C. In particular, to use Pyrex, you write functions in a Python-like language that adds type declarations to selected variables. Pyrex (the tool) processes ".pyx" files into ".c" extensions. Once compiled with a C compiler, these Pyrex (the language) modules can be imported into and used in your regular Python applications. Since Pyrex uses almost the same syntax as Python itself (including loop, branch, and exception statements, assignment forms, block indentation, etc.), a Pyrex programmer need not learn C to write extensions. Moreover, Pyrex allows more seamless mixing of C-level variables and Python-level variables (objects) within the same code than does an extension written directly in C.
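As a taste of what this looks like, here is a sketch modeled on the well-known primes example from the Pyrex documentation (not code from this article); the cdef declarations are essentially the only departure from ordinary Python:

# primes.pyx -- processed by Pyrex into primes.c, then compiled by a C compiler
def primes(int kmax):
    cdef int n, k, i
    cdef int p[1000]
    result = []
    if kmax > 1000:
        kmax = 1000
    k = 0
    n = 2
    while k < kmax:
        i = 0
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:
            p[k] = n
            k = k + 1
            result.append(n)
        n = n + 1
    return result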

A final technique is the subject of this installment. The extension module Psyco can plug into the guts of the Python interpreter, and selectively substitute optimized machine code for portions of Python's interpreted bytecode. Unlike the other techniques described, Psyco operates strictly at Python runtime. That is, Python source code is compiled by the python command to bytecode in exactly the same manner as before (except for a couple of import statements and function calls added to invoke Psyco). But while the Python interpreter is running an application, Psyco sometimes checks to see if it can substitute some specialized machine code for regular Python bytecode actions. This specialized compilation is both very similar to what Java just-in-time compilers do (broadly speaking, at least) and architecture-specific. As of right now, Psyco is only available for i386 CPU architectures. The nice thing about Psyco is that you can use the very same Python code you have been writing all along (literally!), but let it run much faster.

How Psyco works

To understand Psyco completely, you probably need to have a good grasp of both the Python interpreter's eval_frame() function and i386 assembly. Unfortunately, I myself can claim neither expertise -- but I think I can explain Psyco in outline without going too far wrong.

In regular Python, the eval_frame() function is the inner loop of the Python interpreter. Basically, the eval_frame() function looks at the current bytecode in an execution context, and switches control out to a function appropriate for implementing that bytecode. The specifics of what this support function will do depend, in general, upon the states of various Python objects held in memory. Put simply, adding the Python objects "2" and "3" produces a different result than adding the objects "5" and "6," but both operations are dispatched in a similar way.
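You can watch the bytecodes that eval_frame() dispatches on by using the standard dis module (my illustration, not from the article; the exact opcode names vary between Python versions):

import dis

def add(a, b):
    return a + b

dis.dis(add)
# Prints something like:
#   LOAD_FAST     a
#   LOAD_FAST     b
#   BINARY_ADD        (one opcode, whatever types a and b turn out to be)
#   RETURN_VALUE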

Psyco replaces the eval_frame() function with a compound evaluation unit. There are several ways that Psyco is able to improve upon what Python does. In the first place, Psyco compiles operations to somewhat optimized machine code; in itself this produces only slight improvements, since what the machine code needs to accomplish is the same as what Python's dispatched functions do. Moreover, what is "specialized" in Psyco compilation is more than the choice of Python bytecodes; Psyco also specializes over variable values that are known in execution contexts. For example, in code like the following, the variable x is knowable for the duration of the loop:

x = 5
l = []
for i in range(1000):
    l.append(x*i)

An optimized version of this code need not multiply each i by "the content of the x variable/object" -- it is less expensive to simply multiply each i by 5, saving a lookup/dereference.

Aside from creating i386-specific code for small operations, Psyco caches this compiled machine code for later reuse. If Psyco is able to recognize that a particular operation is the same as something that was performed (and "specialized") earlier, it can rely on this cached code rather than needing to recompile the segment. This saves a bit more time.

The real savings in Psyco, however, arise from Psyco's categorization of operations into three different levels. For Psyco, there are "run-time," "compile-time" and "virtual-time" variables. Psyco promotes and demotes variables between the levels as needed. Run-time variables are simply the raw bytecodes and object structures that the regular Python interpreter handles. Compile-time variables are represented in machine registers and directly-accessed memory locations, once the operations have been compiled by Psyco into machine code.

The most interesting level is the virtual-time variables. A Python variable is, internally, a complete structure with lots of members -- even when the object only represents an integer. Psyco virtual-time variables represent Python objects that could potentially be built if the need arose, but whose details are omitted until they are needed. For example, consider an assignment like:

x = 15 * (14 + (13 - (12 / 11)))

Standard Python builds and destroys a number of objects to compute this value. An entire integer object is built to hold the value of (12/11); then a value is pulled out of the temporary object's structure, and used to compute a new temporary object (13 - PyInt). Psyco skips the objects, and just computes the values, knowing that an object can be created "if needed" from the value.
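Traced step by step (under the classic integer division of the Python of the day, where 12/11 truncates to 1), the interpreter builds a throwaway object for each intermediate value:

x = 15 * (14 + (13 - (12 / 11)))
# 12 / 11  -> temporary int object holding 1
# 13 - 1   -> temporary int object holding 12
# 14 + 12  -> temporary int object holding 26
# 15 * 26  -> 390, the only value that really needs an object (bound to x)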

Using Psyco


Explaining Psyco is relatively difficult, but using Psyco is far easier. Basically, all there is to it is telling the Psyco module which functions/methods to "specialize." No code changes need be made to any of your Python functions and classes themselves.

There are a couple of approaches to specifying what Psyco should do. The "shotgun" approach is to enable Psyco's just-in-time operation everywhere. To do that, put the following lines at the top of your module:

import psyco ; psyco.jit() 
from psyco.classes import *

The first line tells Psyco to do its magic on all global functions. The second line (in Python 2.2 and above) tells Psyco to do the same with class methods. To target Psyco's behavior a bit more precisely, you can use the commands:

psyco.bind(somefunc)          # or method, class
newname = psyco.proxy(func)

The second form leaves func as a standard Python function, but optimizes calls involving newname. In almost all cases other than testing and debugging, the psyco.bind() form is what you will use.
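For example, one might compare the plain and proxied versions like this (a minimal sketch, assuming Psyco is installed on a supported platform; crunch is a made-up numeric function):

import time
import psyco

def crunch(n):
    total = 0.0
    for i in xrange(n):
        total += i * i
    return total

fast_crunch = psyco.proxy(crunch)   # crunch itself is left untouched

start = time.time()
crunch(1000000)
print "plain python:", time.time() - start

start = time.time()
fast_crunch(1000000)
print "psyco proxy: ", time.time() - start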

Psyco's performance


As magic as Psyco is, using it still requires a little thought and testing. The main thing to understand is that Psyco is useful for handling blocks that loop many times, and it knows how to optimize operations involving integers and floating point numbers. For non-looping functions, and for operations on other types of objects, Psyco mostly just adds overhead for its analysis and internal compilation. Moreover, for applications with large numbers of functions and classes, enabling Psyco application-wide adds a large burden of machine-code compilation and memory usage for its caching. It is far better to selectively bind those functions that can benefit most from Psyco's optimizations.

I started my testing in a completely naive fashion. I simply considered what application I had run recently that I would not mind speeding up. The first example that came to mind was a text-manipulation program I use to convert drafts of my forthcoming book (Text Processing in Python) to LaTeX format. This application uses some string methods, some regular expressions, and some program logic driven mostly by regular expressions and string matches. It is actually a terrible candidate for Psyco, but I use it, so I started there.

On the first pass, all I did was add psyco.jit() to the top of my script. Painless enough. Unfortunately, the results were (expectedly) disappointing. Where the script initially took about 8.5 seconds to run, after Psyco's "speedup" it ran in about 12 seconds. Not so good! I guessed that the just-in-time compilation probably had some startup overhead that swamped the running time. So the next thing I tried was processing a much larger input file (consisting of multiple copies of the original one). This gave the very limited success of reducing running time from about 120 seconds to 110 seconds. The speedup was consistent across several runs, but fairly insignificant either way.

On the second pass with my text-processing candidate, instead of adding a blanket psyco.jit() call, I added only the line psyco.bind(main), since the main() function does loop a number of times (but only makes minimal use of integer arithmetic). The results here were nominally better. This approach shaved a few tenths of a second off the normal running time, and a few seconds off the large-input version. But still nothing spectacular (though also no harm done).

For a more relevant test of Psyco, I dug up some neural network code that I had written about in an earlier article (see Resources). This "code_recognizer" application can be trained to recognize the likely distribution of different ASCII values in different programming languages. Potentially, something like this could be useful in guessing file types (say, of lost network packets); but the code is actually completely generic as to what it is trained on -- it could learn to recognize faces, or sounds, or tidal patterns just as easily. In any case, "code_recognizer" is based on the Python library bpnn, which is also included (in modified form) as a test case with the Psyco 0.4 distribution. The important thing to know about "code_recognizer" for this article is that it does a lot of looping floating point math, and it takes a long time to run. In other words, we have a good candidate for Psyco to work on here.

After a little playing around, I established several details about how to use Psyco. For this application, with just a small number of classes and functions, it does not make too much difference whether you use just-in-time or targeted binding. But the best result, by a few percentage points, still comes about by selectively binding the best optimizable classes. More significantly, however, it is important to understand the scope of Psyco binding.

The code_recognizer.py script contains lines like:

from bpnn import NN

class NN2(NN):
    # customized output methods; math core inherited

That is, the interesting stuff from Psyco's point of view is in the class bpnn.NN. Adding either psyco.jit() or psyco.bind(NN2) to the code_recognizer.py script has little effect. To get Psyco to do the desired optimization, you need to either add psyco.bind(NN) to code_recognizer.py or add psyco.jit() to bpnn.py. Contrary to what you might assume, just-in-time compilation does not happen when an instance is created, or when methods run, but rather in the scope where a class is defined. In addition, binding descendant classes does not specialize their methods that are inherited from elsewhere.
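Concretely, either of the following arrangements specializes the inherited math core (an illustrative sketch using the article's module names, not verbatim source):

# In code_recognizer.py: bind the base class, where the math lives
import psyco
from bpnn import NN
psyco.bind(NN)

# ...or, equivalently, at the top of bpnn.py itself:
# import psyco ; psyco.jit()
# from psyco.classes import *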

Once the small details of proper Psyco binding are worked out, the resultant speedups are rather impressive. Using the same test cases and training regime the referenced article presented (500 training patterns, 1000 training iterations), neural net training time was reduced from about 2000 seconds to about 600 seconds -- better than a 3x speedup. Reducing the iterations as low as 10 showed proportional speedups (but worthless neural net recognition), as did intermediate numbers of iterations.

I find bringing the running time down from more than half an hour to about 10 minutes with two lines of new code to be quite remarkable. This speedup is still probably less than what a similar application in C would achieve, and it is certainly less than the 100x speedup that a few isolated Psyco test cases exhibit. But this application is fairly "real life," and the improvements are enough to be significant in many contexts.

Whither Psyco?


Psyco currently does not perform any sort of internal statistics gathering or profiling, and it does only minimal optimization of the generated machine code. Potentially, a later version might know how to target those Python operations that could actually benefit most, and discard cached machine code for non-optimizable sections. In addition, perhaps a future Psyco could decide to perform more extensive (but more costly) optimizations on heavily run operations. Such runtime analysis would be similar to what Sun's HotSpot technology does for Java. The fact that Java, unlike Python, has type declarations is actually less significant than many people assume (and prior work on the optimization of Self, Smalltalk, Lisp, and Scheme makes this point as well).

Although I suspect it will never actually happen, it would be exciting to have Psyco-type technology integrated into some future version of Python itself. A few lines for imports and bindings is not much to do, but letting Python just inherently run much faster would be even more seamless. We shall see.

Resources



  • Read the previous installments of Charming Python.
  • Find more information at Psyco's home page and project page at SourceForge.
  • The Simplified Wrapper and Interface Generator (SWIG) is a very widely -- perhaps predominantly -- used tool for writing C/C++ modules for Python and other "scripting" languages.
  • Greg Ewing has created the language Pyrex, which is used for writing Python extension modules. The idea behind Pyrex is to define a language that looks very close to Python itself, and that allows a mixture of Python and C datatypes to be combined, but which is ultimately transformed and compiled into a Python C-extension.
  • John Max Skaller's Vyper language was intended to be an enhanced Python, implemented in OCaml. One upshot that was hoped for in the project was compilation to the same machine code OCaml generates, which is generally comparable with the speed of C. Unfortunately, Vyper is a dead project, and a compiling version was never completed. Read David's interview with Skaller back when the project was alive (developerWorks, October 2000).
  • David co-authored with Andrew Blais An Introduction to neural networks (developerWorks, July 2001). In that article, they provided some code based on Neil Schemenauer's Python module bpnn. The current article utilizes that neural network code to demonstrate Psyco's capabilities.
  • Find more Linux articles in the developerWorks Linux zone.
