Table of Contents
Interactive Parallel Computing with IPython
Updated: 3/21/2006
Overview
IPython is currently being redesigned. One of the goals in this process is to make it possible to develop, execute, and debug parallel applications interactively and collaboratively using IPython. This page contains an overview of the parallel model we are using as well as notes on its implementation.
Parallel computing has been around a long time. In spite of this, relatively little has been done on interactive parallel computing. We should give some examples of other work alng these lines here.
In the langauage of Flynn's classification of computer architectures, our parallel computing system has a MIMD (multiple instruction/multiple data) architecture. This is the most general type of computing architecture and its flexibility allows us to create an interactive parallel computing system that is capable of expressing many sytles of parallelism with little difficulty. Furthermore, a MIMD architecture is the only parallel architecture that supports true collaborative computing. This is because the presence of multiple users implies the presence of multiple instruction/data streams. The challenge is to coordinate the instruction streams of the different users and manage shared data in a meaningful way.
The implementation of this sytem is based on the IPython Kernel and the related IPython Kernel Client. The basic ideas are these:
- A set of IPython kernels running on different processors constitute a cluster that can be used for parallel computations. These kernels have persistent Python namespaces and can execute Python code in those namespaces.
- A user works with a set of kernels using a lightweight InteractiveCluster? object that lives in their local Interactive IPython session. This object has methods for i) executing Python commands on kernels, ii) sending (pushing) Python objects from the local namespace to the kernel's namespaces and iii) retrieving (pulling) Python objects from the kernel's namespaces into the local namespace. By sending different commands/objects to and from the kernels the user can perform parallel computations interactively.
The MIMD design of this architecture comes from the fact that each can kernel execute different commands and store different Python objects in its namespace.
Design Goals
Here are more specific design goals:
- A user should be able to disconnect from and reconnect to a set of running kernels at any time.
- Multiple users should be able to access a set of kernels simultaneously.
- A user should be able to work with kernels interactively, like the regular serial IPython.
- The kernels should be accessible through multiple network protocols
- Kernels should be able to be started in a manner that is transparent to the user at any time.
- The user interface to the remotely running kernels should be lightweight.
- The known limitations of Python's Global Interpreter Lock (GIL) should be hidden from the user.
- The system should support many different styles of parallelism, from distributed memory and message passing to task farming and tuple spaces.
Comparison with other models of distributed/parallel computing
Message Passing Interface (MPI)
Message passing is the most popular method for multiple processes to communicate with each other. In general, message passing enables MIMD type parallel computations to be run. Thus, message passing is very consistent with our architecture. Because of this, we are designing our kernels to be able to make calls to standard message passing interface (MPI) libraries. The MPI calls are passed to the kernels just like any other command. This enables MPI based applications to be executed interactively and also allows Python objects to be passed between kernels using the highly optimized capabilities of MPI.
As a side note, many MPI programs are written using a SIMD style. This style of parallel program can also be executed interactively by sending identical commands to each kernel.
Remote objects/Remote Procedure Call
One common model of distributed computing is the Remote Procedure Call (RPC). In this model, the methods of objects that live on remote processes are invoked over the network. An RPC typically has three stages:
- The remote procedure is called in the local process with a list of arguments (python objects).
- The remote procedure is executed in the remote process with the list of arguments that have been sent over the network.
- The return value of the procedure is brought back to the local process and given to the user.
While these three steps may occur asynchronously, the RPC model typically assumes that all three stages occur every time. There are some weaknesses in this model however that make it unsuitable for our purposes:
- If a remote procedure takes a long time to execute, the local process must wait for it to complete to get the results. That is, if the local process dies while the RPC is executing, the result is lost.
- The results of each RPC call are stored on the local process only. Thus, multiple users cannot access the results of previous RPC calls so that collaboration is not possible.
- The RPC style of computation does not scale to a large number remote objects on different processors.
Because of these difficulties, we are not using the RPC style of distributed computation. But, the ideas of RPC distributed computation can clarify our model. In our model, the three stages of an RPC are independent. That is, the tasks of pushing python objects to a remote process, executing python code and retreiving (or pulling) python objects from a remote process can all be performed independently of each other. This makes is possible to:
- Store the results of previously executed commands persistently in the kernels namespace, just like a standard python session.
- Results can be retreived later as they are simply saved in the namespace of the kernels.
- Multiple users can access the results of computations as they are stored persistently in the kernel's namespace.
Thus, our system is like an RPC system, but where the user's process is assumed to be extremely transient. This is made up for by the remote processes, the kernels, being more stateful that in a usual RPC system.
Tuple Spaces/Linda
The basic idea behind Linda is Virtual Shared Memory or VSM. Virtual Shared Memory allows multiple processes to share a common, remotely managed namespace. There are a number of implementations of this idea in a variety of langauges (including Python):
While there are many similarities between the user interface of these systems and our architecture, they are fundamentally different. In a VSM system, each process in a parallel computation shares the same virtual namespace. While each process could also have its own private namespace, the parallelism is implemented using shared memory. In this model, all computational processes connect to a central service that is hosting the VSM. The VSM is also designed to be persistent.
In our model, each kenel has its own namespace and there is no centralized shared memory. With that said, it would be nearly trivial to add VSM capabilities to our kernels. We could simply create a new VSM server on some machine and then each kernel could read/write to it as needed. This might be useful in some cases.
Another interesting feature that some VSMs have is a blocking retrieval of an object from VSM. That is, if one process tries to get an non-existant object from a VSM, two things can happen. First, the attempt can fail immediately. This is what the Linda folks would call a non-blocking operation. The second option is that the retrieval operation blocks until some other process creates the object, at which time it is retreived. Thus, in this blocking mode, retrieving an object from the VSM becomes a synchronizing operation.
