2 Using Software

2.1 Best Practice

2.1.1 Interactive mode vs batch mode

Most applications can run either in interactive mode or in batch mode. For small jobs, it doesn’t really matter which one you choose. For medium to large jobs, you can still write and debug your code in interactive mode, but once it is fully tested you should run it in batch mode, since you probably don’t want to wait in front of the screen for the job to finish. In batch mode, you can log off the system and log back in later to check the results. To do so, preface your application’s batch mode command with the Linux command nohup and end it with the control operator &.

nohup <batch mode command> &

nohup (no hangup) makes your job ignore the hangup signal sent when you log off, so the job is not killed. The & at the end runs the command in the background and returns your command prompt. For example, if you use Stata and have a do file named filename.do that takes hours to run, you can run it in batch mode as follows.

nohup stata -b do filename &
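If you start several background jobs, it helps to record each job’s process ID (PID) so you can check on it later. The sketch below assumes bash; sleep 30 is a placeholder for your real batch command, and job.log and job.pid are file names chosen for illustration. When nohup’s output would otherwise go to the terminal, nohup appends it to a file named nohup.out; redirecting it explicitly keeps the logs of different jobs apart.

```shell
# Start a long-running command in the background, immune to hangups.
# "sleep 30" is a placeholder for your real batch command.
nohup sleep 30 > job.log 2>&1 &

# $! holds the PID of the most recently started background job.
job_pid=$!
echo "$job_pid" > job.pid

# Later (even after logging back in), check whether the job is still running.
# kill -0 sends no signal; it only tests whether the process exists.
if kill -0 "$(cat job.pid)" 2>/dev/null; then
    echo "job $(cat job.pid) is still running"
else
    echo "job $(cat job.pid) has finished"
fi
```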

2.1.2 RRN vs compute nodes

RRN is part of a High-Performance Computing (HPC) system. The HPC system consists of a cluster of around 100 compute nodes with hundreds of CPUs. Currently, Rotman users have access to 8 compute nodes through a “standard” partition (a partition is simply a logical set of compute nodes). Each compute node has 24 CPUs and 240 GB of memory.

Since RRN is shared by many Rotman researchers, you should consider submitting very large, compute-intensive jobs to the compute nodes instead of running them on RRN. Although the compute nodes are also shared, once you obtain the requested resources (CPUs, memory, etc.), they are reserved exclusively for you for the duration of your computation.

See Running Jobs on Compute Nodes to learn how to submit jobs to compute nodes.
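Before starting a heavy job on RRN, you can get a rough sense of how busy the machine is. This is a minimal sketch assuming a Linux system that provides /proc/loadavg; the rule of thumb in the comments is an approximation, not a hard limit.

```shell
# Number of CPUs available to your processes on this machine.
cores=$(nproc)

# 1-minute load average: roughly the average number of processes
# running or waiting to run (first field of /proc/loadavg on Linux).
load1=$(cut -d ' ' -f 1 /proc/loadavg)

echo "CPUs: $cores, 1-minute load average: $load1"
# Rule of thumb: a load average close to (or above) the CPU count means
# the machine is already busy; consider the compute nodes for big jobs.
```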

2.2 Using Python

2.2.1 Setup

Different versions and distributions of Python are available on RRN, including standard Python 2 and 3 releases and Anaconda Distribution 2 and 3. (Use module avail to list all available software on RRN.)

We recommend that you use a standard Python 3 release. For example, load the standard Python 3.11.5 distribution as follows.

module load StdEnv/2023
module load python/3.11.5

StdEnv/2023 is a software environment that offers more recent software tools. (Use module avail after module load StdEnv/2023 to list all available software under StdEnv/2023.)

Note: a virtual environment lets you install and manage packages for a specific Python project (and hence avoid package version conflicts between projects). The standard Python distribution comes with a tool to create and manage virtual environments. See section Install packages below for how to set one up.

2.2.2 Interactive mode (Python console)

To use Python in interactive mode (Python console), type python in the terminal. Type exit() to exit the Python console.

2.2.3 Batch mode

To use Python in batch mode, use

python mycode.py

To capture standard output (stdout) and standard error (stderr), which are usually displayed on screen, in a log file, use

python mycode.py &> mycode_log.txt

&> redirects both the standard output and standard error streams to the specified file, so nothing is printed in the terminal. If the file already exists, it is overwritten.

Preface the above command with the Linux command nohup and append the control operator & so that the job runs in the background and keeps running even after you log off the system (see section Interactive mode vs batch mode).

nohup python mycode.py &> mycode_log.txt &
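When Python’s output is redirected to a file, print() output is block-buffered, so the log may stay empty for a long time even though the job is running. Python’s -u option (or setting the environment variable PYTHONUNBUFFERED=1) turns buffering off so the log updates as the script runs. In this sketch, mycode.py is a tiny stand-in script created on the fly, and the interpreter lookup simply picks python if it is on the PATH, else python3, so the example also runs outside RRN.

```shell
# A tiny stand-in for your own mycode.py (hypothetical content).
cat > mycode.py <<'EOF'
import time
for i in range(3):
    print("step %d done" % i)
    time.sleep(0.5)
EOF

# Use python if it is on the PATH (as after "module load python"), else python3.
PY=$(command -v python || command -v python3)

# -u turns off output buffering, so each print() reaches the log immediately.
nohup "$PY" -u mycode.py &> mycode_log.txt &
job_pid=$!
echo "started PID $job_pid"
```

To follow the log while the job runs, use tail -f mycode_log.txt; pressing Ctrl+C stops tail without affecting the background job.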

2.2.4 Install packages

To install a new Python package, use

pip install --user <packageName>

The above command installs a package to the base environment under your home directory.

However, best practice is to create a separate virtual environment for each Python project. Within a project’s virtual environment, you install only the packages that project requires. That way, you isolate each project’s package requirements and avoid potential package conflicts between projects.

We assume that you use a standard Python 3 release on RRN. Below are the steps to set up a virtual environment and install new packages.

Load a standard Python 3 distribution.

module load StdEnv/2023
module load python/3.11.5

Create a virtual environment. (Make sure you are in your home directory; if not, use the cd command to return to it. This ensures that the virtual environment folder created by the command below is placed under your home directory.)

virtualenv --no-download myenv

virtualenv is a tool for creating isolated Python virtual environments, and myenv is the environment name (you may want to pick your own). A folder named myenv will be created under your home directory, and all packages for this environment will be installed and managed there. The --no-download option prevents virtualenv from downloading the latest pip/setuptools/wheel tools from PyPI (the Python Package Index), because these tools are installed directly from our system. (The system manages its own set of Python packages/wheels for optimal performance on specific hardware.)

Activate the virtual environment.

source ~/myenv/bin/activate

Install packages.

You are now inside your virtual environment, and you should see (myenv) in the terminal prompt. This is an isolated Python environment whose packages you manage with the standard tool pip. For example, pip list displays all packages currently installed in the environment, and pip install <packageName> installs a new package along with its dependencies. As an example, we install the scikit-learn library below.

pip install scikit-learn

Note that a newly created virtual environment has no packages pre-installed (except a few necessary tools for installing new packages).

Return to base environment.

After you’re done with Python, type deactivate to return to the base environment.

Next time, you only need to repeat the first and third steps (load the Python module and activate the virtual environment) to start using Python in your virtual environment.
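To avoid typing the module and activation commands every time, you could wrap them in a shell function, for example in your ~/.bashrc. A minimal sketch: the function name start_myenv is our own hypothetical choice, and it assumes the environment lives at ~/myenv as in the steps above.

```shell
# Hypothetical helper: load Python and activate ~/myenv in one command.
# Add it to ~/.bashrc, then run "start_myenv" in any new terminal.
start_myenv() {
    module load StdEnv/2023
    module load python/3.11.5
    # Activation must happen in the current shell, which a function allows
    # (running a separate script would activate the env only in a subshell).
    source ~/myenv/bin/activate
}
```

When you are done, deactivate still returns you to the base environment.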

If you no longer need a virtual environment and want to delete it completely, run the following.

rm -r ~/myenv

Recall that a virtual environment is maintained in a folder under your home directory. The above command removes (rm) everything recursively (-r) under the ~/myenv folder and the folder itself. Use it cautiously.
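Because rm -r deletes without asking, a cautious alternative is to check that the folder really is a virtual environment before removing it. This sketch relies on the pyvenv.cfg file that virtualenv (version 20 and later) and Python’s built-in venv write at the top of every environment folder; the function name remove_venv is hypothetical.

```shell
# Hypothetical helper: delete a virtual environment folder only if it
# looks like one (it contains the pyvenv.cfg file that venv/virtualenv write).
remove_venv() {
    local dir="$1"
    if [ -f "$dir/pyvenv.cfg" ]; then
        rm -r -- "$dir"     # -- stops option parsing, in case dir starts with "-"
        echo "removed $dir"
    else
        echo "refusing to remove $dir: no pyvenv.cfg found" >&2
        return 1
    fi
}
```

Usage: remove_venv ~/myenv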

Lastly, if you run into problems installing a Python package, let us know. As mentioned above, the system manages its own set of Python packages/wheels for optimal performance on specific hardware, and certain restrictions and rules apply, so some packages might have installation issues.

2.2.5 Run Python on compute nodes

For large compute-intensive Python jobs, consider submitting them to the compute nodes. See Running Jobs on Compute Nodes to get started.

2.3 Using R

2.3.1 Setup

Several versions of R are available on RRN. (Use module avail to list all available software on RRN.) Load the R version you prefer, for example, R 4.4.0, and you are good to start using R.

module load StdEnv/2023
module load r/4.4.0

StdEnv/2023 is a software environment that offers more recent software tools. (Use module avail after module load StdEnv/2023 to list all available software under StdEnv/2023.)

2.3.2 Interactive mode (R console)

To start R in interactive mode (R console), type R in the terminal. Type quit() to quit the R console.

2.3.3 Batch mode

To use R in batch mode, use

Rscript mycode.R

See the Rscript manual for more options.

To capture standard output (stdout) and standard error (stderr), which are usually displayed on screen, in a log file, use

Rscript mycode.R &> mycode_log.txt

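&> redirects both the standard output and standard error streams to the specified file, so nothing is printed in the terminal. If the file already exists, it is overwritten.

To run the job in the background and keep it running after you log off, preface the command with nohup and append & (see section Interactive mode vs batch mode).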

nohup Rscript mycode.R &> mycode_log.txt &
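The &> operator used above is bash syntax. The small sketch below assumes bash, with echo commands standing in for Rscript and demo_log.txt, demo_out.txt and demo_err.txt as illustrative file names. It shows the portable equivalent of &>, the append form &>> (which adds to an existing log instead of overwriting it), and how to keep the two streams in separate files.

```shell
# &> sends both stdout and stderr to one file, overwriting it.
{ echo "a line on stdout"; echo "a line on stderr" >&2; } &> demo_log.txt

# Portable (POSIX sh) equivalent of the line above: > file 2>&1
{ echo "a line on stdout"; echo "a line on stderr" >&2; } > demo_log.txt 2>&1

# &>> appends both streams instead of overwriting (bash 4.0 or later).
echo "a second run" &>> demo_log.txt

# Keep the two streams in separate files instead.
{ echo "a line on stdout"; echo "a line on stderr" >&2; } > demo_out.txt 2> demo_err.txt
```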

2.3.4 Install R packages

In the R console, type the command below to list the currently installed packages.

installed.packages()

To install additional packages, use the command:

install.packages("packageName")

Packages are built from source and installed under your home directory. If you encounter problems building a package, the package most likely requires certain external libraries during compilation. In this case, load the relevant libraries in your Linux terminal (not in the R console) before you install the package. For example, sf, a popular R package for handling geospatial data, requires a few external libraries for compilation (listed on the sf CRAN site). You would therefore load those libraries in the terminal using the commands below, and then start the R console and install the package. Once sf is installed, you do not need to load those libraries again the next time you use it.

module load gdal/3.9.1
module load udunits/2.2.28

In fact, the R sf package requires more than just gdal and udunits for compilation. However, the other dependencies are already installed and set up on RRN, so you do not need to load them. If you encounter a package that requires a library that is not installed on RRN, please let us know.

2.3.5 Run R on compute nodes

For large compute-intensive R jobs, consider submitting them to the compute nodes. See Running Jobs on Compute Nodes to get started.

2.4 Using Matlab

2.4.1 Setup

Send a signed MATLAB usage statement form to cac.admin@queensu.ca and cc tdmdal@rotman.utoronto.ca. This will add you to the matlab group, which grants access to the software. To check whether you have been granted access, type groups while logged on to the system. If matlab appears in the output, you are good to go!
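The group check can also be scripted, for example at the top of a job script. This sketch assumes bash and uses id -nG, which prints the same group names as groups; padding the list with spaces makes the pattern match matlab only as a whole group name, not as part of a longer one.

```shell
# id -nG prints your group names separated by spaces (same list as groups).
# Padding with spaces lets the pattern match "matlab" only as a whole word.
case " $(id -nG) " in
    *" matlab "*) matlab_access="yes" ;;
    *)            matlab_access="not yet" ;;
esac
echo "Matlab access: $matlab_access"
```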

Type module avail matlab to see available Matlab versions, and then load the one you want to use. For example, if you want to use Matlab R2024b, type

module load StdEnv/2023
module load matlab/R2024b

StdEnv/2023 is a software environment that offers more recent software tools. (Use module avail after module load StdEnv/2023 to list all available software under StdEnv/2023.)

2.4.2 Interactive mode without GUI

To start Matlab in interactive mode without GUI, type the command below.

matlab -nodisplay -nosplash

2.4.3 Interactive mode with GUI

To start Matlab in interactive mode with GUI, simply type matlab in the terminal.

If you use Matlab R2017a, you may find the Matlab GUI a bit sluggish. In this case, create a file named java.opts in the folder where you start Matlab and add the following line to it. This should resolve the GUI problem.

-Dsun.java2d.pmoffscreen=false

2.4.4 Batch mode

To run Matlab in batch mode, use

matlab -nodisplay -nodesktop -nosplash <mycode.m >output.txt

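The < symbol tells matlab to read its input from the m file mycode.m, and the > symbol redirects the output to the file output.txt instead of printing it on screen.

To run the job in the background and keep it running after you log off, preface the command with nohup and append & (see section Interactive mode vs batch mode).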

nohup matlab -nodisplay -nodesktop -nosplash <mycode.m >output.txt &

2.4.5 Run Matlab on compute nodes

For large compute-intensive Matlab jobs, consider submitting them to the compute nodes. See Running Jobs on Compute Nodes to get started.

2.5 Using Stata

2.5.1 Setup

The Stata 18 environment has been set up for you so you do not need to do anything.

If you want to use the older Stata 17, load the stata17 module to set up the environment by typing the command below in the terminal.

module load stata17

2.5.2 Interactive mode without GUI

To start Stata/IC in interactive mode without GUI (Stata console mode), type stata in the terminal.

Use stata-se for Stata/SE and stata-mp for Stata/MP.

2.5.3 Interactive mode with GUI

To start Stata/IC in interactive mode with GUI, use xstata (use xstata-se for Stata/SE and xstata-mp for Stata/MP). You will lose your terminal command line prompt once the Stata GUI is running.

Alternatively, use xstata & (xstata-se & or xstata-mp &) to run the command in background and return the command line prompt.

2.5.4 Batch mode

To run Stata/IC in batch mode, use

stata -b do filename &

Stata will execute the code in filename.do and will automatically save the output in filename.log. The & at the end tells Linux to run the command in the background, freeing up your command line prompt.

If your code takes a long time to run, you may wish to start a batch Stata job, log off from your terminal, and log back in later to retrieve the output. To do this, preface the previous command with nohup.

nohup stata -b do filename &

nohup prevents your job from being killed when you log off the system.

Similarly, replace stata with stata-se for Stata/SE or stata-mp for Stata/MP.

2.5.5 A note on Stata/MP

We have a 32-core Stata/MP licence on the system. Since RRN is a 32-core node shared by many users, we ask that you set the number of processors for Stata/MP to fewer than 16 when you use it on RRN.

Use set processors # to set the number of processors for Stata/MP. For example,

set processors 12

Put it on the first line in your do file or run it first if you use interactive mode.

If you have a large workload that requires many cores and hours to run, you should consider submitting your job to a compute node on the HPC system. That way, you can request more CPUs to take full advantage of the 32-core Stata/MP licence. (See Running Jobs on Compute Nodes to get started.)

2.6 Using SAS

2.6.1 Setup

The SAS 9.4 environment has been set up for you so you do not need to do anything.

2.6.2 Interactive mode without GUI

To start SAS in interactive mode without GUI (i.e. in SAS command line mode), type sas -nodms in the terminal.

To exit the SAS command line, type endsas;. Note that the command ends with a semicolon.

2.6.3 Interactive mode with GUI

To start SAS in interactive mode with GUI, use sas. You will lose your terminal command line prompt once the SAS GUI is running.

Alternatively, use sas & to run the command in background and return the command line prompt.

The SAS GUI consists of a few SAS windows, Session Management, Explorer, Program Editor, Log, Result, and Output. To end the SAS session and close all the windows, click the “Terminate” button in the Session Management window.

2.6.4 Batch mode

To run SAS in batch mode, use

sas mycode.sas &

SAS will execute the code in mycode.sas and will automatically save the SAS console output in mycode.log. The & at the end tells Linux to run the command in the background, freeing up your command line prompt.

If your code takes a long time to run, you may wish to start a batch SAS job, log off from the system, and log back in later to check the output. To do this, preface the previous command with nohup.

nohup sas mycode.sas &

nohup prevents your job from being terminated when you log off the system.

2.6.5 Run SAS on compute nodes

For large compute-intensive SAS jobs, consider submitting them to the compute nodes. See Running Jobs on Compute Nodes to get started.

2.7 Managing Jobs in Batch Mode

2.7.1 Monitor jobs

Once your programs are running in the background, you can monitor their resource usage and performance with the htop utility: type htop in the terminal to start it. htop also shows information about other programs/processes running on the system, so it’s a great tool for seeing how busy RRN is overall.

The Linux manual for htop is a bit hard to read. You could perhaps get started with this visual tutorial, or this YouTube tutorial.
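For a quick, non-interactive snapshot of your own jobs (handy inside scripts or over a slow connection), ps can list your processes sorted by CPU use. This sketch assumes the procps version of ps that Linux systems ship; htop itself also accepts -u yourUserName to show only your processes.

```shell
# Your own processes, busiest first:
#   pid          process ID (needed for kill)
#   etime        elapsed time since the process started
#   %cpu / %mem  CPU and memory usage
#   args         the full command line
snapshot=$(ps -u "$(id -un)" -o pid,etime,%cpu,%mem,args --sort=-%cpu | head -n 15)
printf '%s\n' "$snapshot"
```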

2.7.2 Terminate jobs

If you want to terminate your program running in the background, you could use htop as well (F9/Kill). Alternatively, first find your program’s process id.

ps -u yourUserName

Then, use the kill command.

kill programProcessID

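If the above command doesn’t work, try the kill -9 option. Plain kill asks the program to terminate (signal SIGTERM), which the program can handle or ignore; kill -9 sends SIGKILL, which stops the program immediately without giving it a chance to clean up, so use it as a last resort.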

kill -9 programProcessID
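The two steps (a polite kill first, kill -9 only if needed) can be combined into a small function. This is a sketch assuming bash; the name stop_job and the 10-second default grace period are hypothetical choices. To look up the process ID by script name instead of reading it from ps, pgrep -u yourUserName -f mycode.py is one option.

```shell
# Hypothetical helper: send SIGTERM, wait up to a grace period,
# and fall back to SIGKILL (kill -9) only if the process is still alive.
stop_job() {
    local pid="$1" grace="${2:-10}"   # grace period in seconds, default 10
    if ! kill "$pid" 2>/dev/null; then
        echo "could not signal PID $pid (already gone or not yours)" >&2
        return 1
    fi
    local i
    for ((i = 0; i < grace; i++)); do
        # kill -0 sends no signal; it only checks that the process exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
    done
    echo "PID $pid did not stop after ${grace}s; sending SIGKILL" >&2
    kill -9 "$pid"
}
```

Usage: stop_job programProcessID, or stop_job programProcessID 30 for a longer grace period.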