Charm++
Charm++ is a parallel object-oriented programming paradigm based on C++ and developed in the Parallel Programming Laboratory at the University of Illinois at Urbana–Champaign. Charm++ is designed with the goal of enhancing programmer productivity by providing a high-level abstraction of a parallel program while at the same time delivering good performance on a wide variety of underlying hardware platforms. Programs written in Charm++ are decomposed into a number of cooperating message-driven objects called chares. When a programmer invokes a method on an object, the Charm++ runtime system sends a message to the invoked object, which may reside on the local processor or on a remote processor in a parallel computation. This message triggers the execution of code within the chare to handle the message asynchronously.
Paradigm | Message-driven parallel programming, migratable objects, object-oriented, asynchronous many-tasking |
---|---|
Designed by | Laxmikant Kale |
Developer | Parallel Programming Laboratory |
First appeared | late 1980s |
Stable release | 6.10.2 / August 4, 2020 |
Implementation language | C++, Python |
Platform | Cray XC, XK, XE, IBM Blue Gene/Q, InfiniBand, TCP, UDP, MPI, OFI |
OS | Linux, Windows, macOS |
Website | http://charmplusplus.org |
Chares may be organized into indexed collections called chare arrays, and a message may be sent to an individual chare within a chare array or broadcast to the entire chare array at once.
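The delivery model described above can be sketched in plain C++ without the Charm++ toolchain. The following is a minimal single-process toy, not the Charm++ API: `ToyArray`, `Counter`, and `Message` are hypothetical names, and a real runtime would deliver messages across processors rather than through one local queue. It illustrates that a send only enqueues a message, that a broadcast amounts to one message per element, and that handlers run later, when the scheduler drains the queue.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical stand-in for a chare: it counts and sums what it receives.
struct Counter {
    int received = 0;  // number of messages handled so far
    int total = 0;     // sum of the payloads handled so far
    void add(int value) { ++received; total += value; }
};

// A queued invocation: which element to call and with what argument.
struct Message {
    std::size_t target;
    int value;
};

// Toy single-process "runtime": an indexed collection plus a message queue.
class ToyArray {
public:
    explicit ToyArray(std::size_t n) : elements_(n) {}

    // Asynchronous send: only enqueues; the handler does not run yet.
    void send(std::size_t index, int value) {
        assert(index < elements_.size());
        pending_.push(Message{index, value});
    }

    // Broadcast: one message per element of the collection.
    void broadcast(int value) {
        for (std::size_t i = 0; i < elements_.size(); ++i) send(i, value);
    }

    // Scheduler loop: deliver queued messages in FIFO order until none remain.
    // Returns the number of messages delivered.
    std::size_t run() {
        std::size_t delivered = 0;
        while (!pending_.empty()) {
            Message m = pending_.front();
            pending_.pop();
            elements_[m.target].add(m.value);
            ++delivered;
        }
        return delivered;
    }

    const Counter& at(std::size_t index) const { return elements_.at(index); }

private:
    std::vector<Counter> elements_;
    std::queue<Message> pending_;
};
```

For instance, after `arr.send(2, 10)` and `arr.broadcast(1)` on a four-element `ToyArray`, no handler has run until `arr.run()` is called, which then delivers five messages.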
The chares in a program are mapped to physical processors by an adaptive runtime system. The mapping of chares to processors is transparent to the programmer, and this transparency permits the runtime system to dynamically change the assignment of chares to processors during program execution to support capabilities such as measurement-based load balancing, fault tolerance, automatic checkpointing, and the ability to shrink and expand the set of processors used by a parallel program.
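As an illustration of what measurement-based load balancing means, the sketch below reassigns objects to processors from their measured costs using a generic greedy heuristic: heaviest object first, each placed on the currently least-loaded processor. This is an assumption chosen for brevity, not necessarily any strategy used by the Charm++ runtime; `greedyAssign` and `maxProcLoad` are hypothetical names, and the load units are arbitrary.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns assignment[i] = processor chosen for object i.
// loads[i] is the measured cost of object i (hypothetical units).
std::vector<std::size_t> greedyAssign(const std::vector<double>& loads,
                                      std::size_t numProcs) {
    assert(numProcs > 0);

    // Visit objects from heaviest to lightest; ties keep index order.
    std::vector<std::size_t> order(loads.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&loads](std::size_t a, std::size_t b) {
                         return loads[a] > loads[b];
                     });

    std::vector<double> procLoad(numProcs, 0.0);
    std::vector<std::size_t> assignment(loads.size(), 0);
    for (std::size_t obj : order) {
        // Pick the currently least-loaded processor (lowest index wins ties).
        auto least = std::min_element(procLoad.begin(), procLoad.end());
        std::size_t best = static_cast<std::size_t>(least - procLoad.begin());
        assignment[obj] = best;
        procLoad[best] += loads[obj];
    }
    return assignment;
}

// Largest total load on any processor under a given assignment.
double maxProcLoad(const std::vector<double>& loads,
                   const std::vector<std::size_t>& assignment,
                   std::size_t numProcs) {
    assert(numProcs > 0 && assignment.size() == loads.size());
    std::vector<double> procLoad(numProcs, 0.0);
    for (std::size_t i = 0; i < loads.size(); ++i) {
        procLoad[assignment[i]] += loads[i];
    }
    return *std::max_element(procLoad.begin(), procLoad.end());
}
```

Because the programmer never names a processor, a runtime is free to apply a reassignment like this between steps; the application code that invokes methods on the objects does not change.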
Applications implemented using Charm++ include NAMD (molecular dynamics), OpenAtom (quantum chemistry), ChaNGa and SpECTRE (astronomy), EpiSimdemics (epidemiology), Cello/Enzo-P (adaptive mesh refinement), and ROSS (parallel discrete event simulation). All of these applications have scaled to a hundred thousand cores or more on petascale systems.
Adaptive MPI (AMPI)[1] is an implementation of the Message Passing Interface standard on top of the Charm++ runtime system. It provides the capabilities of Charm++ within the more traditional MPI programming model. AMPI encapsulates each MPI process in a user-level migratable thread that is bound to a Charm++ object. Because each thread is embedded in a chare, AMPI programs can take advantage of the features of the Charm++ runtime system with few or no changes to the MPI program.
Charm4py allows writing Charm++ applications in Python, supporting migratable Python objects and asynchronous remote method invocation.
Example
The following Charm++ code, a "Hello World" program over a one-dimensional chare array, illustrates these concepts:[2]
- Header file (hello.h)
class Hello : public CBase_Hello {
 public:
  Hello();               // C++ constructor
  void sayHi(int from);  // Remotely invocable "entry method"
};
- Charm++ Interface file (hello.ci)
module hello {
  array [1D] Hello {
    entry Hello();
    entry void sayHi(int);
  };
};
- Source file (hello.cpp)
#include "hello.decl.h"
#include "hello.h"
// Declares CProxy_Main. Assumption: the main chare is defined in a
// separate "main" module, which is not shown in this example.
#include "main.decl.h"

// Globals assumed to be defined and initialized by the main chare
extern CProxy_Main mainProxy;
extern int numElements;

Hello::Hello() {
  // No member variables to initialize in this simple example
}

void Hello::sayHi(int from) {
  // Have this chare object say hello to the user.
  CkPrintf("Hello from chare # %d on processor %d (told by %d)\n",
           thisIndex, CkMyPe(), from);

  // Tell the next chare object in this array of chare objects
  // to also say hello. If this is the last chare object in
  // the array of chare objects, then tell the main chare
  // object to exit the program.
  if (thisIndex < (numElements - 1)) {
    thisProxy[thisIndex + 1].sayHi(thisIndex);
  } else {
    mainProxy.done();
  }
}

#include "hello.def.h"
Adaptive MPI (AMPI)
Adaptive MPI is an implementation of MPI (like MPICH, Open MPI, MVAPICH, etc.) on top of Charm++'s runtime system. Users can take pre-existing MPI applications, recompile them using AMPI's compiler wrappers, and begin experimenting with process virtualization, dynamic load balancing, and fault tolerance. AMPI implements MPI "ranks" as user-level threads rather than operating system processes. Context switches between these threads are fast, so several ranks can be co-scheduled on the same core, each running when messages are available for it. AMPI ranks, and all the data they own, can also be migrated at runtime across the cores and nodes of a job. This is useful for load balancing and for checkpoint/restart-based fault tolerance schemes. For more information on AMPI, see the manual: http://charm.cs.illinois.edu/manuals/html/ampi/manual.html
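The idea of moving a rank together with the data it owns can be illustrated with a small standalone sketch. It is a simplified model and not AMPI's actual mechanism, which also has to move a thread's execution state: here each rank's state is an explicit struct that is packed into a byte buffer, removed from one core, and rebuilt on another. `RankState`, `Core`, `pack`, `unpack`, and `migrate` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <vector>

// Hypothetical state owned by one virtual rank.
struct RankState {
    int rank = 0;
    int iteration = 0;
    std::vector<double> data;
};

// Serialize a rank's state into a flat byte buffer.
std::vector<unsigned char> pack(const RankState& s) {
    const std::size_t n = s.data.size();
    std::vector<unsigned char> buf(2 * sizeof(int) + sizeof(n) + n * sizeof(double));
    unsigned char* p = buf.data();
    std::memcpy(p, &s.rank, sizeof(int));       p += sizeof(int);
    std::memcpy(p, &s.iteration, sizeof(int));  p += sizeof(int);
    std::memcpy(p, &n, sizeof(n));              p += sizeof(n);
    if (n > 0) std::memcpy(p, s.data.data(), n * sizeof(double));
    return buf;
}

// Rebuild a rank's state from a buffer produced by pack().
RankState unpack(const std::vector<unsigned char>& buf) {
    assert(buf.size() >= 2 * sizeof(int) + sizeof(std::size_t));
    RankState s;
    std::size_t n = 0;
    const unsigned char* p = buf.data();
    std::memcpy(&s.rank, p, sizeof(int));       p += sizeof(int);
    std::memcpy(&s.iteration, p, sizeof(int));  p += sizeof(int);
    std::memcpy(&n, p, sizeof(n));              p += sizeof(n);
    assert(buf.size() == 2 * sizeof(int) + sizeof(n) + n * sizeof(double));
    s.data.resize(n);
    if (n > 0) std::memcpy(s.data.data(), p, n * sizeof(double));
    return s;
}

// A "core" is modeled as a map from rank id to the state it hosts.
using Core = std::map<int, RankState>;

// Move one rank from one core to another by way of a byte buffer.
void migrate(Core& from, Core& to, int rank) {
    auto it = from.find(rank);
    assert(it != from.end());
    std::vector<unsigned char> wire = pack(it->second);
    from.erase(it);
    to[rank] = unpack(wire);
}
```

The same packed buffer could be written to disk instead of handed to another core, which is why migration and checkpoint/restart are often discussed together.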
Charm4py
Charm4py[3] is a Python parallel computing framework built on top of the Charm++ C++ runtime, which it uses as a shared library. Charm4py simplifies the development of Charm++ applications and streamlines parts of the programming model. For example, there is no need to write interface files (.ci files) or to use Structured Dagger (SDAG), and programs do not need to be compiled. Users are still free to accelerate their application-level code with technologies like Numba. Standard ready-to-use binary versions can be installed on Linux, macOS and Windows with pip.
It is also possible to write hybrid Charm4py and MPI programs.[4] An example of a supported scenario is a Charm4py program using mpi4py libraries for specific parts of the computation.
References
- "Parallel Programming Laboratory". charm.cs.illinois.edu. Retrieved 2018-12-12.
- "Array 'Hello World': A Slightly More Advanced 'Hello World' Program: Array 'Hello World' Code". PPL - UIUC Parallel Programming Laboratory. http://charmplusplus.org/. Retrieved 2017-05-08.
- "Charm4py — Charm4py 1.0.0 documentation". charm4py.readthedocs.io. Retrieved 2019-09-11.
- "Running hybrid mpi4py and Charm4py programs (mpi interop)". Charm++ and Charm4py Forum. 2018-11-30. Retrieved 2018-12-11.