Setup and running Jupyter notebooks

Disclaimer: These notes started a number of years ago from the official Software Carpentry (SWC) lesson but have since evolved quite a bit to include other topics.

Why Python?

Python is a free, open-source programming language, first developed in the late 1980s and 1990s, that has become very popular for scientific computing over the past 15 years. With Python, in a few minutes you can:

  • analyze thousands of texts,
  • process tables with billions of records,
  • manipulate thousands of images,
  • restructure and process data any way you want.

Python vs. Excel

  • Unlike Excel, Python can read any type of data, both structured and unstructured.
  • Python is free and open-source, so there are no artificial limitations on where or how you run it.
  • Python works on all platforms: Windows, Mac, Linux, Android, etc.
  • Data manipulation is much easier in Python. There are hundreds of data processing, machine learning, and visualization libraries.
  • Python can handle much larger amounts of data: limited not by Python, but by your available computing resources. In addition, Python can run at scale (in parallel) on larger systems.
  • Python workflows are more reproducible: you can rerun or modify the script at any time.

Python vs. other programming languages

Python pros                                 | Python cons
elegant scripting language                  | slow (interpreted, dynamically typed)
easy to write and read code                 | uses indentation for code blocks
powerful, compact constructs for many tasks |
very popular across all fields              |
huge number of external libraries           |
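
To make two of the points above concrete (indentation-delimited code blocks and compact constructs), here is a minimal sketch. The word list is made-up sample data, not part of the original notes.

```python
# Hypothetical sample data, used only for illustration
words = ["data", "python", "notebook", "cell", "kernel"]

# An explicit loop: the indented lines form the bodies of the "for" and "if" blocks
long_words = []
for word in words:
    if len(word) > 4:
        long_words.append(word.upper())

# The same result written compactly as a list comprehension
long_words_compact = [word.upper() for word in words if len(word) > 4]

print(long_words)
print(long_words == long_words_compact)
```

Both versions build the same list; the comprehension is just a shorter way of writing the loop.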

Installing Python locally

Today we’ll be running Python in the cloud, so you can skip this section. I am listing these options in case you want to install Python on your computer after the workshop.

Option 1: Install Python from https://www.python.org/downloads making sure to check the option “Add Python to PATH” during the installation.

Option 2: Install Python and the packages via Anaconda from https://www.anaconda.com/download.

Option 3: Install Python via your favourite package manager, e.g. on macOS, assuming you have Homebrew installed, run the command brew install python.

Post-installation: Install 3rd-party Python packages in the Command Prompt / terminal via pip install <packageName>, e.g. to be able to run Python inside a Jupyter Notebook run pip install jupyter.
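
After installing, one way to check from Python itself which packages are available is sketched below. The helper name is_installed and the list of module names are illustrative assumptions. Also note that a package's import name can differ from the name you pass to pip.

```python
import importlib.util
import sys

def is_installed(module_name):
    """Return True if a top-level module with this name can be found for import."""
    return importlib.util.find_spec(module_name) is not None

# Show which Python interpreter version is running
print(sys.version)

# Hypothetical list of modules to check; adjust to the packages you care about
for name in ["json", "jupyter", "numpy"]:
    print(name, "is installed" if is_installed(name) else "is missing")
```

If a module is reported as missing, install it with pip install <packageName> and run the check again.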

Starting Python

There are many ways to run Python commands:

  • from a Unix shell you can start a Python shell and type commands there,
  • you can launch Python scripts saved in plain text *.py files,
  • you can execute Python cells inside Jupyter notebooks; the code is stored inside JSON files (*.ipynb) and displayed as HTML in the browser.
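
To illustrate the last point, that a notebook is just a JSON file, the sketch below parses a small hand-written notebook and pulls out the source of its code cells. The notebook text is a simplified, made-up example in the nbformat 4 layout; real notebooks carry more metadata and cell outputs.

```python
import json

# A minimal, hand-written notebook (illustrative only, not a real file)
notebook_text = """
{
  "nbformat": 4,
  "nbformat_minor": 4,
  "metadata": {},
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["# My notes"]},
    {"cell_type": "code", "metadata": {}, "execution_count": null,
     "outputs": [], "source": ["print(1/2)"]}
  ]
}
"""

# Parse the JSON text into ordinary Python dictionaries and lists
notebook = json.loads(notebook_text)

# Each cell stores its text as a list of lines; join them and keep only code cells
code_cells = [
    "".join(cell["source"])
    for cell in notebook["cells"]
    if cell["cell_type"] == "code"
]
print(code_cells)
```

With a real notebook you would read the text from the *.ipynb file instead of a string.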

Today’s setup

Today we’ll be using JupyterHub on our training cluster. Point your browser to https://hss.c3.ca and log in with your username and password, then launch a JupyterHub server with time = 2.5 hours, 1 CPU core, memory = 3712 MB, GPU configuration = None, user interface = JupyterLab. Finally, start a new Python 3 notebook from the dashboard.

Some useful menu items in JupyterLab:

  • File | Save Notebook As - to rename your notebook
  • File | Download - to download the notebook to your computer
  • File | New Launcher - to open a new launcher dashboard, e.g. to start a terminal
  • File | Log Out - to terminate your job (everything is running inside a Slurm job!)

Explain: tab completion, annotating code, displaying figures inside the notebook.

  • Esc - leave the cell and switch to command mode (the border changes colour)
  • A - insert a cell above the current cell
  • B - insert a cell below the current cell
  • X - delete the current cell
  • M - turn the current cell into a Markdown cell
  • H - display help
  • Enter - re-enter the cell from command mode (the border becomes green)
  • you can enter LaTeX expressions in a Markdown cell, e.g. try typing \int_0^\infty f(x)dx between two dollar signs

print(1/2)   # to run all commands in the cell, either use the Run button or press Shift+Return
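
As a possible follow-up cell, the sketch below contrasts a few arithmetic operators in Python 3 and shows that a notebook displays the value of the last expression in a cell without an explicit print. The variable name half is only for illustration.

```python
print(1/2)     # true division: the result is a float
print(1//2)    # floor division: the result is rounded down to an integer
print(7 % 2)   # remainder of the division

half = 1/2
half           # in a notebook, the value of the last expression in a cell is displayed
```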