Introduction to Chapel
Chapel: a language for parallel computing on large-scale systems
- Modern, open-source parallel programming language developed at Cray Inc. (acquired by Hewlett Packard Enterprise in 2019).
- Offers the simplicity and readability of scripting languages such as Python or MATLAB: “Python for parallel programming”.
- Compiled language $\Rightarrow$ can deliver speed and performance comparable to Fortran and C.
- Supports high-level abstractions for data distribution and data parallel processing, and for task parallelism.
- Based on the PGAS (Partitioned Global Address Space) programming model: each node can access variables in the global address space, and the runtime does a lot of behind-the-scenes work to reduce and buffer remote memory accesses.
- Provides data-driven placement of computations.
- Designed around a multi-resolution philosophy: users can incrementally add more detail to their original code, bringing it as close to the machine as required, while still being able to achieve anything normally done with MPI and OpenMP.
The Chapel community is fairly small: relatively few people know or use Chapel, and there are correspondingly few libraries, with each problem reinforcing the other. However, you can use functions/libraries written in other languages:
- Direct calls will always be serial.
- High-level Chapel parallel libraries can use C/F90/etc libraries underneath.
- A slowly growing base of parallel Chapel libraries.
You can find the slides here.
Running Chapel codes on Cedar / Graham / Béluga / Narval
On Compute Canada clusters Cedar / Graham / Béluga / Narval we have two versions of Chapel: a single-locale (single-node) Chapel and a multi-locale (multi-node) Chapel. You can find the documentation on running Chapel in our wiki.
If you want to use single-locale Chapel, you will need to load the chapel-multicore module, e.g.
$ module spider chapel # list all Chapel modules
$ module load gcc/9.3.0 chapel-multicore
Multi-locale Chapel is provided by the chapel-ofi module on OmniPath clusters such as Cedar, and by the chapel-ucx module on InfiniBand clusters such as Graham, Béluga and Narval. Since multi-locale Chapel includes a parallel launcher for the right interconnect type, there is no single Chapel module for all cluster architectures.
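The cluster-to-module mapping above can be written down as a small shell helper. This is only a sketch: the function name chapel_module_for is hypothetical, it expects the cluster name in lowercase ASCII (e.g. beluga), and it just prints the module name rather than loading anything.

```shell
#!/bin/bash
# Hypothetical helper: print the multi-locale Chapel module for a given
# cluster, following the interconnect split described in the text.
chapel_module_for() {
    case "$1" in
        cedar)                echo "chapel-ofi" ;;   # OmniPath
        graham|beluga|narval) echo "chapel-ucx" ;;   # InfiniBand
        *)                    echo "unknown cluster: $1" >&2; return 1 ;;
    esac
}

chapel_module_for cedar     # prints chapel-ofi
chapel_module_for narval    # prints chapel-ucx
```

The printed name can then be passed to module load, together with whatever prerequisite modules module spider reports for it.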
Running Chapel codes inside a Docker container
If you are familiar with Docker and have it installed, you can run multi-locale Chapel inside a Docker container (e.g., on your laptop, or inside an Ubuntu VM on Arbutus):
$ docker pull chapel/chapel-gasnet # will emulate a cluster with 4 cores/node
$ mkdir -p ~/tmp
$ docker run -v /home/ubuntu/tmp:/mnt -it -h chapel chapel/chapel-gasnet # map host's ~/tmp to container's /mnt
$ cd /mnt
$ apt-get update
$ apt-get install nano # install nano inside the Docker container
$ nano test.chpl # file is /mnt/test.chpl inside the container and ~ubuntu/tmp/test.chpl on the host VM
$ chpl test.chpl -o test
$ ./test -nl 8
You can find more information at https://chapel-lang.org/install-docker.html
Running single-locale Chapel on macOS
You can compile and run Chapel codes on macOS. Multi-locale codes (e.g. containing distributed arrays) will compile but will run only as single-locale.
$ brew install chapel
Running Chapel codes on the training cluster
Depending on where our training cluster is deployed, its Chapel setup may or may not differ from that of the production clusters. On the training cluster, you can start single-locale Chapel with either
$ module load arch/avx2 # not necessary, unless you land on an avx512 node
$ module load gcc/9.3.0 chapel-multicore
or
$ source /project/def-sponsor00/shared/syncHPC/startSingleLocale.sh
Let’s write a simple Chapel code, compile and run it:
$ cd ~/tmp
$ nano test.chpl # type the following line into the file, then save and exit
writeln('If you can see this, everything works!');
$ chpl test.chpl -o test
$ ./test
You can optionally pass the flag --fast to the compiler to optimize the binary to run as fast as possible for the given architecture; this flag also turns off runtime checks such as array bounds checking.
Depending on the code, the binary might use one, several, or all cores on the current node. Running it directly, as above, assumes that you are allowed to use all of them. This might not be the case on an HPC cluster, where a login node is shared by many people at the same time and should not be saturated with CPU-intensive tasks. Therefore, we’ll be running test Chapel codes inside submitted jobs on compute nodes.
Let’s write the job script serial.sh:
#!/bin/bash
#SBATCH --time=0:5:0 # walltime in d-hh:mm or hh:mm:ss format
#SBATCH --mem-per-cpu=1000 # in MB
./test
and then submit it:
$ chpl test.chpl -o test
$ sbatch serial.sh
$ sq # same as `squeue -u $USER`
$ cat slurm-jobID.out
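The jobID in slurm-jobID.out is the number that sbatch reports at submission. A minimal sketch of capturing it in a script follows; the helper name slurm_output_file is hypothetical, the snippet assumes Slurm's default output file naming (no --output option in the job script), and it only submits if sbatch is found on the PATH.

```shell
#!/bin/bash
# Hypothetical helper: default Slurm output file name for a given job ID.
slurm_output_file() {
    echo "slurm-$1.out"
}

if command -v sbatch >/dev/null 2>&1; then
    # --parsable makes sbatch print only the job ID
    # (followed by ";cluster" on multi-cluster setups, stripped below).
    jobid=$(sbatch --parsable serial.sh)
    jobid=${jobid%%;*}
    echo "submitted job $jobid; output will go to $(slurm_output_file "$jobid")"
else
    echo "sbatch not found; run this on a cluster login node" >&2
fi
```

Once the job has finished, cat "$(slurm_output_file "$jobid")" shows the same file as the cat command above.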
Alternatively, today we could work inside a serial interactive job:
$ salloc --time=3:0:0 --mem-per-cpu=1000
Makefiles
In the rest of this workshop, we’ll be compiling codes test.chpl, baseSolver.chpl, begin.chpl, cobegin.chpl and many others. To simplify compilation, we suggest writing a file called Makefile in your working directory:
%: %.chpl
	chpl $^ -o $@
clean:
	@find . -maxdepth 1 -type f -executable -exec rm {} +
Note that the 2nd and the 4th lines start with TAB and not with multiple spaces – this is very important!
With this makefile, to compile any Chapel code, e.g. baseSolver.chpl, you would type:
$ make baseSolver
Add the --fast flag to the chpl command in the makefile to optimize your code. You can also type make clean to delete all executables in the current directory.
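Because editors sometimes silently convert TABs to spaces, one option is to generate the Makefile from the shell, where printf writes \t as a real TAB. The following sketch does that and also keeps --fast in a variable. The variable name CHPLFLAGS is an assumption made for this example (neither chpl nor make requires it), and note that the snippet overwrites any existing Makefile in the current directory.

```shell
#!/bin/bash
# Generate the Makefile with guaranteed TAB-indented recipe lines.
# CHPLFLAGS is a hypothetical variable name chosen for this sketch.
{
    printf 'CHPLFLAGS = --fast\n'
    printf '%%: %%.chpl\n'
    printf '\tchpl $(CHPLFLAGS) $^ -o $@\n'
    printf 'clean:\n'
    printf '\t@find . -maxdepth 1 -type f -executable -exec rm {} +\n'
} > Makefile

# Show the recipe lines; each one should begin with a TAB character.
grep -n "^$(printf '\t')" Makefile
```

With this variant, make baseSolver compiles with --fast, while make CHPLFLAGS= baseSolver overrides the variable on the command line and compiles without it.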