Intro to high-performance computing (HPC)
Monday, June 19th
2:00pm–5:00pm Pacific Time
This course is an introduction to High-Performance Computing on the Alliance clusters.
Instructor: Alex Razoumov (SFU)
Prerequisites: Working knowledge of the Linux Bash shell. We will provide guest accounts on one of our Linux systems.
Software: All attendees will need a remote secure shell (SSH) client installed on their computer in order to participate in the course exercises. On Windows we recommend the free Home Edition of MobaXterm. On Mac and Linux computers SSH is usually pre-installed (type ssh in a terminal to make sure it is there).
- Please download a ZIP file with all slides (single PDF combining all chapters) and sample codes.
- We’ll be using the same training cluster as in the morning – let’s try to log in now.
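To connect, run your SSH client from a terminal; the username and hostname below are placeholders for the guest account and cluster address handed out at the start of the session:

```bash
# replace userXX and the hostname with the values provided by the instructors
ssh userXX@training.cluster.address
```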
Part 1
Question 1: cluster filesystems
Let’s log in to the training cluster. Try to access /home, /scratch, and /project on the training cluster. Note that these only emulate the real production filesystems and have no speed benefits on the training cluster.
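For example, you can poke at the three filesystems with standard commands (per-user paths follow the usual Alliance layout and may differ slightly on the training cluster):

```bash
ls -ld /home /scratch /project   # do the directories exist and who owns them?
ls -l ~                          # your home; look for scratch/projects symlinks if present
df -h /home /scratch /project    # sizes here are not representative of production
```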
Question 2: edit a remote file
Edit a remote file in nano, vi, or emacs. Use cat or more to view its content in the terminal.
Question 3: gcc compiler
Load the default GNU compiler with the module command. Which version is it? Try to understand what the module does: run module show on it, echo $PATH, which gcc.
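One possible sequence (the exact default version depends on the cluster's module environment):

```bash
module load gcc      # load the default GNU compiler
gcc --version        # which version did you get?
module show gcc      # what the module actually does (PATH and other variables)
echo $PATH           # did the module prepend a new directory?
which gcc            # which gcc binary is now first in your PATH?
```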
Question 4: Intel compiler
Load the default Intel compiler. Which version is it? Does it work on the training cluster?
Question 5: third compiler?
Can you spot the third compiler family when you do module avail?
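Something along these lines should answer both questions; whether the Intel compiler actually runs on the training cluster is for you to find out:

```bash
module load intel    # load the default Intel compiler module
icc --version        # classic Intel C compiler; newer toolchains ship icx instead
module avail         # browse the full list and look for a third compiler family
```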
Question 6: scipy-stack
What other modules does scipy-stack/2022a load?
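A quick way to check:

```bash
module show scipy-stack/2022a   # prerequisites and modules it loads
module load scipy-stack/2022a
module list                     # everything now loaded, including dependencies
```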
Question 7: python3
How many versions of python3 do we have? What about python2?
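For example:

```bash
module avail python     # python modules available on this cluster
module spider python    # more detailed, cluster-wide view
# module spider <name> also answers the next question for any other package you use
```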
Question 8: research software
Think of a software package that you use. Check if it is installed on the cluster, and share your findings.
Question 9: file transfer
Transfer a file to/from the cluster (we did this already in the bash class) using either the command line or a GUI. Type “done” into the chat when done.
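From your own computer (not from the cluster), the command-line option looks like this; the file name is just an example:

```bash
scp myfile.txt userXX@training.cluster.address:   # local file -> your home on the cluster
scp userXX@training.cluster.address:myfile.txt .  # and back again
```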
Question 10: why HPC?
Can you explain (in 1-2 sentences) how HPC can help us solve problems? Why is a desktop/workstation not sufficient? Maybe you can give an example from your field?
Question 11: tmux
Try left+right or upper+lower split panes in tmux. Edit a file in one and run bash commands in the other. Try disconnecting temporarily and then reconnecting to the same session.
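A minimal tmux session with the default key bindings:

```bash
tmux new -s work      # start a named session
#   Ctrl-b %          split into left+right panes
#   Ctrl-b "          split into upper+lower panes
#   Ctrl-b o          cycle between panes
#   Ctrl-b d          detach (simulates a dropped connection)
tmux attach -t work   # reconnect to the same session
```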
Question 12: compiling
In introHPC/codes, compile the {pi,sharedPi,distributedPi}.c files. Try running a short serial code on the login node (not longer than a few seconds: modify the number of terms in the summation).
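One way to compile all three codes (the flags are a reasonable guess; add -lm if the compiler complains about math functions):

```bash
cd introHPC/codes
gcc -O2 pi.c -o pi                            # serial
gcc -O2 -fopenmp sharedPi.c -o sharedPi       # OpenMP (shared memory)
mpicc -O2 distributedPi.c -o distributedPi    # MPI (needs an MPI module/environment)
./pi                                          # quick serial test on the login node
```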
Question 13a: make
Write a makefile to replace these compilation commands with make {serial,openmp,mpi}.
Question 13b: make (cont.)
Add target all.
Add target clean. Try implementing clean for all executable files in the current directory, no matter what they are called.
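Here is one possible Makefile, written from the shell with printf so that the mandatory TAB characters in recipe lines are explicit (\t below); the target names follow the question, and the clean rule is just one way to do it:

```bash
{
printf '.PHONY: all serial openmp mpi clean\n'
printf 'all: serial openmp mpi\n'
printf 'serial:\n\tgcc -O2 pi.c -o pi\n'
printf 'openmp:\n\tgcc -O2 -fopenmp sharedPi.c -o sharedPi\n'
printf 'mpi:\n\tmpicc -O2 distributedPi.c -o distributedPi\n'
printf 'clean:\n\tfind . -maxdepth 1 -type f -executable -delete\n'
} > Makefile
make all      # or make serial / make openmp / make mpi
make clean    # careful: removes every executable regular file in this directory
```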
Question 14: Julia
Julia parallelism was not mentioned in the videos. Let’s quickly talk about it (slide 29).
Question 14b: parallelization
Suggest a computational problem to parallelize. Which of the parallel tools mentioned in the videos would you use, and why?
If you are not sure about the right tool, suggest a problem, and we can brainstorm the approach together.
Question 15: Python and R
If you use Python or R in your work, try running a Python or R script in the terminal.
If this script depends on packages, try installing them in your own directory with virtualenv. Note that only a few of you should do this on the training cluster at the same time.
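A minimal sketch (the module and package names are examples; adjust to whatever your script needs):

```bash
module load python/3.10         # or whichever python3 module is available
virtualenv ~/env_test           # python -m venv ~/env_test also works
source ~/env_test/bin/activate
pip install numpy               # packages your script depends on
python my_script.py             # your own script
deactivate
```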
Question 16: other
Any remaining questions? Type your question into the chat, ask via audio (unmute), or raise your hand in Zoom.
Part 2
Question 17: serial job
Submit a serial job that runs the hostname command. Try playing with the sq, squeue, and scancel commands.
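A minimal job script and the commands to submit and monitor it (the resource values are small placeholders):

```bash
cat > serial.sh << 'EOF'
#!/bin/bash
#SBATCH --time=00:01:00
#SBATCH --mem=100M
hostname
EOF
sbatch serial.sh       # submit
sq                     # Alliance shorthand for your jobs; squeue -u $USER also works
scancel <jobid>        # cancel a job if you need to
```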
Question 18: serial job (cont.)
Submit a serial job based on pi.c. Try sstat on a currently running job. Try seff and sacct on a completed job.
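After submitting a similar script that runs ./pi, the monitoring commands look like this (replace <jobid> with the job ID reported by sbatch):

```bash
sstat -j <jobid>       # resource usage of a running job
seff <jobid>           # efficiency summary once the job has completed
sacct -j <jobid>       # accounting record of a completed job
```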
Question 19: optimization timing
Using a serial job, time optimized (-O2) vs. unoptimized code. Type your findings into the chat.
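For example, build two executables and time both from inside your serial job:

```bash
gcc pi.c -o pi_noopt        # no optimization
gcc -O2 pi.c -o pi_opt      # optimized
# inside the job script:
time ./pi_noopt
time ./pi_opt
```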
Question 20: Python vs. C timing
Using a serial job, time pi.c vs. pi.py for the same number of terms (cannot be too large or too small – why?). Python pros – can you speed up pi.py?
Question 21: array job
Submit an array job for different values of n (number of terms) with pi.c. How can you run a different executable for each job inside the array?
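A sketch of an array job script; how pi takes its number of terms on the command line is an assumption, so adjust it to the actual code:

```bash
#!/bin/bash
#SBATCH --time=00:05:00
#SBATCH --mem=100M
#SBATCH --array=1-4
./pi $((SLURM_ARRAY_TASK_ID * 1000000))    # a different n per array task
# ./code_${SLURM_ARRAY_TASK_ID}            # one way to run a different executable per task
```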
Question 22: OpenMP job
Submit a shared-memory job based on sharedPi.c. Did you get any speedup? Type your answer into the chat.
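A possible shared-memory job script (core count and memory are placeholders):

```bash
#!/bin/bash
#SBATCH --time=00:05:00
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=500M
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./sharedPi
```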
Question 23: MPI job
Submit an MPI job based on distributedPi.c. Try scaling 1 → 2 → 4 → 8 cores. Did you get any speedup? Type your answer into the chat.
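A possible MPI job script; resubmit it with --ntasks set to 1, 2, 4, and 8 and compare the timings:

```bash
#!/bin/bash
#SBATCH --time=00:05:00
#SBATCH --ntasks=4
#SBATCH --mem-per-cpu=500M
srun ./distributedPi
```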
Question 24: serial interactive job
Test the serial code inside an interactive job. Please quit the job when done, as we have very few compute cores on the training cluster.
Note: we have seen the training cluster become unstable when too many interactive resources are in use. Strictly speaking, this should not happen; however, there is a small chance it might. We do have a backup.
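For example, a small interactive allocation for the serial code; the same idea with --cpus-per-task or more --ntasks covers the next two questions as well:

```bash
salloc --time=0:30:0 --ntasks=1 --mem-per-cpu=500M   # wait for a shell on a compute node
./pi            # run the code interactively
exit            # give the cores back as soon as you are done
```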
Question 25: shared-memory interactive job
Test the shared-memory code inside an interactive job. Please quit when done, as we have very few compute cores on the training cluster.
Question 26: MPI interactive job
Test the MPI code inside an interactive job. Please quit when done, as we have very few compute cores on the training cluster.
Question 27: debugging and optimization
Let’s talk about debugging, profiling and code optimization.
Question 28: permissions and file sharing
Let’s talk about file permissions and file sharing.
Share a file in your ~/projects directory (make it readable) with all other users in the def-sponsor00 group.
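One possible sequence; the exact path under ~/projects depends on how the training cluster is set up, so check with ls first:

```bash
ls ~/projects                         # find the group directory, e.g. def-sponsor00
cd ~/projects/def-sponsor00/$USER     # assumed layout; adjust to what you see
echo "hello from $USER" > shared.txt
chgrp def-sponsor00 shared.txt        # usually already the group in project space
chmod g+r shared.txt                  # make it readable by the group
ls -l shared.txt                      # verify the permissions
```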
Question 29: other
Are there questions on any of the topics that we covered today? You can type your question into the chat, ask via audio (unmute), or raise your hand in Zoom.
Videos: introduction
- Introduction (3 min)
- Cluster hardware overview (17 min)
- Basic tools on HPC clusters (18 min)
- File transfer (10 min)
- Programming languages and tools (16 min)
Updates:
- WestGrid ceased its operations on March 31, 2022. Since April 1st, your instructors in this course are based at Simon Fraser University.
- Some of the slides and links in the video have changed – please make sure to download the latest version of the slides (ZIP file).
- Compute Canada has been replaced by the Digital Research Alliance of Canada (the Alliance). All Compute Canada hardware and services are now provided to researchers by the Alliance and its regional partners. However, you will still see many references to Compute Canada in our documentation and support system.
- New systems were added (e.g. Narval in Calcul Québec), and some older systems were upgraded.
Videos: overview of parallel programming frameworks
Here we give you a brief overview of various parallel programming tools. Our goal here is not to learn how to use these tools, but rather to tell you at a high level what these tools do, so that you understand the difference between shared- and distributed-memory parallel programming models and know which tools you can use for each. Later, in the scheduler session, you will use this knowledge to submit parallel jobs to the queue.
Feel free to skip some of these videos if you are not interested in parallel programming.
- OpenMP (3 min)
- MPI (message passing interface) (9 min)
- Chapel parallel programming language (7 min)
- Python Dask (6 min)
- Make build automation tool (9 min)
- Other essential tools (5 min)
- Python and R on clusters (6 min)
Videos: Slurm job scheduler
- Slurm intro (8 min)
- Job billing with core equivalents (2 min)
- Submitting serial jobs (12 min)
- Submitting shared-memory jobs (9 min)
- Submitting MPI jobs (8 min)
- Slurm jobs and memory (8 min)
- Hybrid and GPU jobs (5 min)
- Interactive jobs (8 min)
- Getting information and other Slurm commands (6 min)
- Best computing / storage practices and summary (9 min)