Friday, September 2, 2011

Simple(r) concurrent Matlab programming with Theron

Preamble: What follows are my (long-winded) musings on the current (Sep. 2011) state of concurrent programming in Matlab. For a specific guide to using the Theron C++ library for Actors concurrency in Matlab, see the following post.

Concurrent Matlab

Matlab is a standard programming environment for scientific calculations. In our lab, we use it for a lot of analysis, especially for prototyping new analyses. Eventually, some of the performance-critical code gets ported over to Java, but we still have plenty of Matlab-only code that needs to run fast.

As personal computers add more and more cores without improving clock speeds much, there is a growing need for parallelized (AKA concurrent, multithreaded) software. Matlab is not a very popular language for supercomputing, but high-performance clusters demand parallelized software too. Matlab has some internal support for multithreading, so depending on which built-in functions you use, you may see significant multicore use on your machine. But there is currently no support for explicitly writing multithreaded programs in vanilla Matlab.

One solution is to start multiple instances of Matlab and let the processes interact through communication abstractions like sockets (also not supported in vanilla Matlab) or the file system. This is an okay solution for coarsely (AKA embarrassingly) parallel algorithms. Several nicely abstracted solutions exist, for example Matlab's own Parallel Computing Toolbox with its parfor construct. Free solutions include the multicore package from MatlabCentral FileExchange, and MatlabMPI/pMatlab from MIT. Multicore and pMatlab both use the file system for interprocess communication. (They have somewhat different abstraction models that make them appropriate for somewhat different styles of parallelism.) One upside to using the file system for interprocess communication is that it is trivial to use a network file system to make your code not only multiprocess but actually distributed across multiple machines.

But what if you need more finely grained parallelism? For example, you may want to code algorithms that flexibly break a problem into hundreds of sub-tasks and maximally exploit available hardware concurrency. At present, this seems to require you to either give up on Matlab entirely, or else write lower-level code that links into Matlab. The two methods I'm familiar with for linking lower-level code into Matlab are to write Java, which Matlab communicates with fairly smoothly, or else to write C/C++/Fortran and bring it in with Matlab's "mex" functionality.

Java's core concurrency support is well established, but it is based around low-level threads and locks. It's a minefield of synchronization issues. There are libraries available for more abstracted approaches, but I haven't worked with them. And if a contained segment of code is really performance-critical, writing it in native C/C++ makes sense.

So what are the options for writing concurrent C++ and linking it into Matlab? Unfortunately, current C++ compilers do not have native concurrency support, but this is expected to change in the not-too-distant future. C++11, the new standard, was approved last month with a new thread library based on the threading in the popular Boost distribution. Unfortunately, the C++11 threading standard looks a lot like Java core concurrency: it is built around low-level threads and mutexes, with no support for higher-level, safer, simpler concurrency abstractions. Luckily, there are libraries available offering concurrency abstractions for C++. The one I'd like to highlight today is Theron.
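To make the "low-level threads and mutexes" style concrete, here is a minimal sketch of a parallel sum written directly against the C++11 thread library. It assumes a compiler that already ships `<thread>` and `<mutex>`; the function name `parallel_sum` and the chunking scheme are my own illustration, not anything from Boost or Theron. Note how the programmer has to create, lock, and join everything by hand.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: sum a vector using num_threads worker threads.
// Each worker sums its own slice, then adds the partial result to a
// shared total while holding a mutex.
double parallel_sum(const std::vector<double>& data, unsigned num_threads) {
    assert(num_threads > 0);
    double total = 0.0;
    std::mutex total_mutex;  // guards `total`
    std::vector<std::thread> workers;

    // Slice length, rounded up so that every element is covered.
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin =
            std::min<std::size_t>(data.size(), static_cast<std::size_t>(t) * chunk);
        const std::size_t end = std::min<std::size_t>(data.size(), begin + chunk);
        workers.emplace_back([&data, &total, &total_mutex, begin, end] {
            double partial = 0.0;
            for (std::size_t i = begin; i < end; ++i) partial += data[i];
            // Forgetting this lock would be a data race on `total`.
            std::lock_guard<std::mutex> lock(total_mutex);
            total += partial;
        });
    }

    // Every thread must be joined before `total` is read or goes out of scope.
    for (auto& w : workers) w.join();
    return total;
}
```

Calling `parallel_sum(values, 4)` splits `values` across four threads. Even in an example this small, correctness depends on remembering the lock and the joins, which is exactly the kind of bookkeeping a higher-level abstraction is meant to take off your hands.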

Theron

Theron is a concurrency library for C++ based on the Actors model, in which most of the low-level thread/mutex concurrency implementation details are hidden from the user. It is not a strict Actors model based tightly on the academic definition of Actors, but it is in that spirit. It offers an Actors abstraction based on a backing thread pool, so that creating new Actors is cheaper than actually creating new threads, allowing you to trivially have hundreds or thousands of Actors in one program. The pool of worker threads services the Actors efficiently, allowing high exploitation of available hardware concurrency. Actors are also defined to communicate only through message passing and not through shared memory, eliminating most of the booby traps that make low-level concurrent programming such a pain. (There are ways to pass shared memory between Theron Actors if you really need to, e.g. if you have large input data structures you don't want to copy into messages. It's then up to the user to ensure that the Actors interact with the shared memory in well-defined, safe ways. Nevertheless, I have found that the Actors abstraction is very helpful for simplifying concurrent code even if some degree of shared memory is slipped in.)
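The message-passing idea can be sketched in plain C++11 without any library. To be clear, this is not Theron's API: the class name `SumActor` and its `send`/`stop` methods are hypothetical, and for brevity the toy gives each actor its own thread, whereas Theron services many Actors from a shared pool. What it does show is the core discipline: the actor's state is private, and other threads can only affect it by posting messages to a mailbox.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Toy actor (illustration only): owns a running total and a mailbox.
class SumActor {
public:
    SumActor() : total_(0), stopping_(false), worker_(&SumActor::run, this) {}
    ~SumActor() { stop(); }

    // Post a message to the mailbox; returns immediately.
    void send(int value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!stopping_ && "send() called after stop()");
            mailbox_.push(value);
        }
        ready_.notify_one();
    }

    // Let the actor drain its mailbox, then join it and return the total.
    int stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        ready_.notify_one();
        if (worker_.joinable()) worker_.join();
        return total_;  // safe to read: the worker thread has finished
    }

private:
    // Message loop, run on the actor's own thread.
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            ready_.wait(lock, [this] { return stopping_ || !mailbox_.empty(); });
            if (mailbox_.empty()) return;  // reached only once stopping_ is set
            const int value = mailbox_.front();
            mailbox_.pop();
            lock.unlock();
            total_ += value;  // the "handler": runs without holding the mailbox lock
            lock.lock();
        }
    }

    std::mutex mutex_;  // guards mailbox_ and stopping_
    std::condition_variable ready_;
    std::queue<int> mailbox_;
    int total_;         // touched only by the worker until it is joined
    bool stopping_;
    std::thread worker_;  // declared last so it starts after the other members exist
};
```

A caller creates a `SumActor`, calls `send()` any number of times from any thread, and finally calls `stop()` to collect the result. All of the locking lives inside the mailbox plumbing, so the handler itself never has to think about synchronization. A library like Theron provides that plumbing for you and schedules it much more cheaply than one thread per actor.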

Another nice feature of Theron is that it is designed so that different underlying threading models can be swapped in easily. Currently it has wrappers for native Win32 threading as well as Boost threads. The expectation is that as compilers supporting C++11 std::thread become available, it will be easy to adapt Theron to use the standard threading. Theron also has a very nice website with clear documentation of the simple, powerful API, a clear tutorial, a getting started guide, and good user support. Although Theron has historically been somewhat Windows-centric, I was able to download Theron, build it on my work Mac, and get a multithreaded program running in Matlab in about a day. Getting it working in Linux took longer, but I think I have it figured out now too; see the next post for details.

Update: libcppa

I recently came across another C++ Actors project, libcppa. It is much less mature than Theron and relies on a lot of C++11 features for its core functionality, but it looks interesting. It's hosted on GitHub, so it could be easy to dig into despite the sparse documentation. But Theron is clearly the choice for the time being.
