2 Using Software

2.1 Best Practice

2.1.1 Interactive mode vs batch mode

Most applications can run either in interactive mode or in batch mode. For small jobs, it doesn’t really matter which one you choose. For medium to large jobs, you can still write and debug your code in interactive mode, but once it is fully tested you should run it in batch mode, since you probably don’t want to wait in front of the screen for your job to finish. In batch mode, you can log off the system and log back in later to check the results. To do so, prefix your application’s batch mode command with the Linux command nohup and end it with the control operator &.

nohup <batch mode command> &

nohup (no hangup) prevents your job from being killed when you log off the system. The & at the end runs the command in the background. For example, if you use Stata and have a do file named filename.do that takes hours to run, you can run it in batch mode as follows.

nohup stata -b do filename &
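To try nohup and & without waiting hours, the sketch below uses a short bash -c command as a hypothetical stand-in for a real batch command such as stata -b do filename; job.log is an arbitrary file name chosen for the example. The special shell variable $! gives the process ID of the job just sent to the background, which you can use later to monitor or terminate it (see section Managing Jobs in Batch Mode).

```shell
#!/bin/bash
# Hypothetical stand-in for a long batch command: print, wait 2 s, print again.
# Output is redirected explicitly, so nohup does not create a nohup.out file.
nohup bash -c 'echo "started"; sleep 2; echo "finished"' > job.log 2>&1 &

# $! holds the process ID (PID) of the job just sent to the background.
job_pid=$!
echo "Background job PID: $job_pid"

# kill -0 sends no signal; it only checks whether the process still exists.
if kill -0 "$job_pid" 2>/dev/null; then
    echo "Job $job_pid is running; the prompt is free for other commands."
fi
```

The prompt returns immediately; the job keeps running and writes its output to job.log.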

2.1.2 RRN vs compute nodes

RRN is part of a High-Performance Computing (HPC) system. The HPC system has a cluster of around 100 compute nodes and hundreds of CPUs. Currently, Rotman users have access to 8 compute nodes via a “standard” partition (a partition is just a logical set of compute nodes). Each compute node has 24 CPUs and 240 GB of memory.

Since RRN is shared by many Rotman researchers, if you have a very large compute-intensive job, you should consider submitting it to the compute nodes instead of running it on RRN. Although the compute nodes are also shared, once you obtain the requested resources (CPUs, memory, etc.), they are reserved exclusively for you for the duration of your computation.

See Running Jobs on Compute Nodes to learn how to submit jobs to compute nodes.

2.2 Using Python

2.2.1 Setup

Several versions and distributions of Python are available on RRN, including standard Python 2 and 3 releases and Anaconda Distribution 2 and 3. (Use module avail to list all available software on RRN.)

We recommend that you use a standard Python 3 release. For example, load the standard Python 3.11.5 distribution as follows.

module load StdEnv/2020
module load python/3.11.5

StdEnv/2020 is a software environment that offers more recent software tools. (Use module avail after module load StdEnv/2020 to list all available software under StdEnv/2020.)

Note: for more on setting up a Python virtual environment, see section Install packages below. A virtual environment lets you install and manage packages for a specific Python project (and hence avoid package version conflicts between projects). The standard Python distribution comes with a tool to create and manage virtual environments.

2.2.2 Interactive mode (Python console)

To use Python in interactive mode (Python console), type python in the terminal. Type exit() to exit the Python console.

2.2.3 Batch mode

To use Python in batch mode, use

python mycode.py

To capture standard output (stdout) and standard error (stderr), which are normally displayed on the screen, in a log file, use

python mycode.py &> mycode_log.txt

&> redirects both the standard output and standard error streams to the specified file. Nothing is printed in the terminal. If the file already exists, it is overwritten.

Prefix the above command with the Linux command nohup and end it with the control operator & so that the job runs in the background and is not killed even if you log off the system (see section Interactive mode vs batch mode).

nohup python mycode.py &> mycode_log.txt &
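The sketch below demonstrates the &> behaviour described above, using a hypothetical stand-in program (a bash -c one-liner that writes one line to each stream) in place of python mycode.py; demo_log.txt is an arbitrary file name. It also shows &>>, which appends to the log instead of overwriting it. Both &> and &>> are bash features; in a script run by plain sh, use the portable form > mycode_log.txt 2>&1 instead.

```shell
#!/bin/bash
# Stand-in "program": one line to stdout, one line to stderr.
prog='echo "to stdout"; echo "to stderr" >&2'

# &> sends both streams to demo_log.txt, replacing any previous content.
bash -c "$prog" &> demo_log.txt
bash -c "$prog" &> demo_log.txt   # overwritten again: still 2 lines

# &>> appends both streams instead: the file now has 4 lines.
bash -c "$prog" &>> demo_log.txt

# Portable equivalent of &> (also works in sh): > file 2>&1
bash -c "$prog" > demo_log_posix.txt 2>&1

wc -l < demo_log.txt
```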

2.2.4 Install packages

To install a new Python package, use

pip install --user <packageName>

The above command installs a package to the base environment under your home directory.

However, the best practice for managing Python packages is to create a virtual environment for each of your Python projects. Within a project’s virtual environment, you only need to install the packages required for that project. This isolates the package requirements of each project and avoids potential package conflicts between projects.

We assume that you use a standard Python 3 release on RRN. Below are the steps to set up a virtual environment and install new packages.

  1. Load a standard Python 3 distribution.
module load StdEnv/2020
module load python/3.11.5
  2. Create a virtual environment. (Make sure that you are in your home directory; if not, use the cd command to return to it. This ensures that the virtual environment folder created by the command below is under your home directory.)
virtualenv --no-download myenv

virtualenv is a tool to create isolated Python virtual environments. myenv is the specified environment name. (You may want to pick your own name.) A folder named myenv will be created under your home directory, and all packages for this environment will be installed and managed there. The --no-download option disables downloading the latest pip/setuptools/wheel tools from PyPI (the Python Package Index), since these tools are installed directly from our system. (The system manages its own set of Python packages/wheels for optimal performance on specific hardware.)

  3. Activate the virtual environment.
source ~/myenv/bin/activate
  4. Install packages.

Now you are inside your virtual environment. You should see (myenv) in the terminal prompt. This is an isolated Python environment in which you manage the installed packages with the standard tool pip. For example, pip list displays all packages currently installed in the environment, and pip install <packageName> installs a new package and its dependencies. As an example, we install the scikit-learn library below.

pip install scikit-learn

Note that a newly created virtual environment has no packages pre-installed (except a few tools needed to install new packages).

  5. Return to base environment.

After you’re done with Python, type deactivate to return to the base environment.

Next time, all you need to do is step (1) and step (3) to activate the virtual environment and start using Python.

If you don’t need a virtual environment anymore, and want to completely delete it, do the following.

rm -r ~/myenv

Recall that a virtual environment is maintained in a folder under your home directory. The above command removes (rm) everything recursively (-r) under the ~/myenv folder and the folder itself. Use it cautiously.
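Because rm -r cannot be undone, you may prefer a guarded version. The sketch below defines a hypothetical shell function, remove_venv (not a system command), that only deletes a folder containing bin/activate, which an environment created by virtualenv has, and asks for confirmation unless you pass yes as a second argument.

```shell
#!/bin/bash
# Hypothetical helper: delete a virtual environment folder only if it
# really looks like one (it contains bin/activate).
remove_venv() {
    local env_dir="$1" answer
    if [ ! -f "$env_dir/bin/activate" ]; then
        echo "Refusing to delete '$env_dir': no bin/activate found." >&2
        return 1
    fi
    # Ask before deleting unless the caller passes "yes" as the 2nd argument.
    if [ "$2" != "yes" ]; then
        read -r -p "Delete $env_dir and everything in it? [y/N] " answer
        [ "$answer" = "y" ] || return 1
    fi
    rm -r -- "$env_dir"
}

# Usage: remove_venv ~/myenv
```

If you find it useful, you could add the function to your ~/.bashrc so it is available in every session.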

Lastly, if you run into any problem installing a Python package, let us know. As mentioned above, the system manages its own set of Python packages/wheels for optimal performance on specific hardware. Because certain restrictions and rules are set on the system, some packages may have installation issues.

2.2.5 Run Python on compute nodes

For large compute-intensive Python jobs, consider submitting them to the compute nodes. See Running Jobs on Compute Nodes to get started.

2.3 Using R

2.3.1 Setup

Several versions of R are available on RRN. (Use module avail to list all available software on RRN.) Load the R version you prefer, for example, R 4.2.1, and you are good to start using R.

module load StdEnv/2020
module load r/4.2.1

StdEnv/2020 is a software environment that offers more recent software tools. (Use module avail after module load StdEnv/2020 to list all available software under StdEnv/2020.)

If you need to use RStudio, you will need to set up R differently. See RStudio on RRN for instructions.

2.3.2 Interactive mode (R console)

To start R in interactive mode (R console), type R in the terminal. Type quit() to quit the R console.

2.3.3 Batch mode

To use R in batch mode, use

Rscript mycode.R

See Rscript manual for more options.

To capture standard output (stdout) and standard error (stderr), which are normally displayed on the screen, in a log file, use

Rscript mycode.R &> mycode_log.txt

&> redirects both the standard output and standard error streams to the specified file. Nothing is printed in the terminal. If the file already exists, it is overwritten.

Prefix the above command with the Linux command nohup and end it with the control operator & so that the job runs in the background and is not killed even if you log off the system (see section Interactive mode vs batch mode).

nohup Rscript mycode.R &> mycode_log.txt &

2.3.4 Install R packages

In the R console, type the command below to list the currently installed packages.

installed.packages()

To install additional packages, use the command:

install.packages("packageName")

Packages will be built from source and installed under your home directory. If you encounter problems building a package, the default Intel compiler may not work well with that package. In this case, load the gcc compiler in your Linux terminal (not in the R console) using the command below, then start the R console and install the package again.

module load gcc/7.3.0

2.3.5 Run R on compute nodes

For large compute-intensive R jobs, consider submitting them to the compute nodes. See Running Jobs on Compute Nodes to get started.

2.3.6 RStudio on RRN

RStudio is not installed on RRN because its GUI does not perform well on the server. In general, we do NOT recommend using RStudio on RRN; we suggest using R from the command line (in either interactive or batch mode). Nevertheless, if you want to install RStudio yourself, the easiest way is to use R and RStudio from the Anaconda distribution. You can create a virtual environment using Anaconda’s package management tool conda and install R and RStudio in it. Anaconda is mostly known for its Python distribution, but it also distributes R.

  1. Load Anaconda distribution.
module load anaconda/3.5.3
  2. Create a virtual environment.
conda create --name r_env

conda is a package management system. r_env is the specified environment name. (You may want to pick your own name.)

  3. Activate the virtual environment.
source activate r_env

Note: For conda 4.6 or later, you can use conda activate r_env to activate the environment. The default conda version on RRN is 4.5.11 (use conda --version to check). However, if you still want to use conda activate r_env (to future-proof your workflow), do the following.

Run the command below once, log out, and log back in. After you log in, run step (1) again to load the anaconda module. This command adds a line to your .bashrc file so that you can use conda activate to activate the virtual environment you created in step (2).

echo ". /global/software/python/anaconda3/etc/profile.d/conda.sh" >> ~/.bashrc
  4. Install R, RStudio and R packages.

Now you are inside your virtual environment. You should see (r_env) in the terminal prompt. This is an isolated environment in which you manage the installed packages yourself. Let’s install base R, some essential R packages, and RStudio. Note that you first need to install pip, a Python package manager.

conda install pip
conda install r-base r-essentials rstudio

Use conda list to display all packages installed in your virtual environment.

Installing other R packages is similar. Simply use

conda install -c r <packageName>

Note that we use conda for package management instead of R itself (install.packages()). The advantage of using conda is that it takes care of package dependencies for us. This is especially helpful when a package is partially developed in languages other than R (e.g. C++).

The -c r option installs the package from the r channel, which is maintained by Anaconda. See this official Anaconda document for all R packages maintained. Note that conda package names add an r- prefix to the usual R package name (e.g. the R package ggplot2 is r-ggplot2).

If you are not able to find an R package with conda install -c r <packageName> (packages built and maintained by Anaconda’s official channel) or conda install -c conda-forge <packageName> (packages built and maintained by the conda community), you can still try install.packages("packageName") in the R console as discussed in Install R packages.

Start RStudio by typing rstudio in the terminal. (Typing rstudio & starts RStudio in the background and returns you to the terminal prompt.)

  5. Return to base environment.

After you’re done with R, type source deactivate to return to the base environment. (If you have followed the “Note” in step 3, you can use conda deactivate instead.)

Next time, all you need to do is step (1) and step (3) to activate the virtual environment and start using R.

Refer to conda’s official documentation for more on virtual environments and package management. Note that conda’s official documentation mainly targets Python users.
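The echo ... >> ~/.bashrc command in step 3 is meant to be run only once; running it again appends a duplicate line. If you are not sure whether you already ran it, a guarded append such as the sketch below adds the line only when it is missing. add_line_once is a hypothetical helper name, not a system command; grep -qxF searches quietly (-q) for the whole line (-x) as a fixed string (-F).

```shell
#!/bin/bash
# Hypothetical helper: append a line to a file only if that exact line
# is not already in it (grep -q: quiet, -x: whole line, -F: fixed string).
add_line_once() {
    local file="$1" line="$2"
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Usage on RRN (safe to run more than once):
#   add_line_once ~/.bashrc ". /global/software/python/anaconda3/etc/profile.d/conda.sh"
```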

2.4 Using Matlab

2.4.1 Setup

Send a signed MATLAB usage statement form to and cc . This will add you to the matlab group that allows access to the software. To check if you have been authorized access, type groups while logged on to the system. If matlab appears in the output, you are good to go!

Type module avail matlab to see available Matlab versions, and then load the one you want to use. For example, if you want to use Matlab R2023a, type

module load StdEnv/2020
module load matlab/R2023a

StdEnv/2020 is a software environment that offers more recent software tools. (Use module avail after module load StdEnv/2020 to list all available software under StdEnv/2020.)

2.4.2 Interactive mode without GUI

To start Matlab in interactive mode without GUI, type the command below.

matlab -nodisplay -nosplash

2.4.3 Interactive mode with GUI

To start Matlab in interactive mode with GUI, simply type matlab in the terminal.

If you use Matlab R2017a, you may find the Matlab GUI laggy. In this case, create a file named java.opts in the folder where you start Matlab and add the following line to it. This should resolve the GUI problem.

-Dsun.java2d.pmoffscreen=false

2.4.4 Batch mode

To run Matlab in batch mode, use

matlab -nodisplay -nodesktop -nosplash <mycode.m >output.txt

The < symbol tells the matlab command to take the Matlab m file mycode.m as input. The > symbol redirects the output to the output.txt file instead of printing it to the screen.

Prefix the above command with the Linux command nohup and end it with the control operator & so that the job runs in the background and is not killed even if you log off the system (see section Interactive mode vs batch mode).

nohup matlab -nodisplay -nodesktop -nosplash <mycode.m >output.txt &
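The < and > redirections in the commands above are ordinary shell features, so you can see how they behave without Matlab. In the sketch below, bash stands in for matlab and a hypothetical file cmds.txt stands in for mycode.m. Note that > captures standard output only; when you run the command without nohup, error messages still appear on the terminal unless you also add 2>&1 (for example, ... <mycode.m >output.txt 2>&1).

```shell
#!/bin/bash
# Stand-in for mycode.m: a file of commands for an interpreter to read.
cat > cmds.txt <<'EOF'
echo "result line"
echo "warning line" >&2
EOF

# "<" feeds cmds.txt to the interpreter as input; ">" captures stdout only,
# so "warning line" still appears on the terminal.
bash < cmds.txt > output.txt

# Adding 2>&1 sends stderr to the same file as stdout.
bash < cmds.txt > output_all.txt 2>&1
```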

2.4.5 Run Matlab on compute nodes

For large compute-intensive Matlab jobs, consider submitting them to the compute nodes. See Running Jobs on Compute Nodes to get started.

2.5 Using Stata

2.5.1 Setup

The Stata 18 environment has been set up for you so you do not need to do anything.

If you want to use an older version, Stata 17, load the stata17 module to set up the environment. Type the command below in the terminal.

module load stata17

2.5.2 Interactive mode without GUI

To start Stata/IC in interactive mode without GUI (Stata console mode), type stata in the terminal.

Use stata-se for Stata/SE and stata-mp for Stata/MP.

2.5.3 Interactive mode with GUI

To start Stata/IC in interactive mode with GUI, use xstata (use xstata-se for Stata/SE and xstata-mp for Stata/MP). You will lose your terminal command line prompt once the Stata GUI is running.

Alternatively, use xstata & (xstata-se & or xstata-mp &) to run the command in background and return the command line prompt.

2.5.4 Batch mode

To run Stata/IC in batch mode, use

stata -b do filename &

Stata will execute the code in filename.do and will automatically save the output in filename.log. The & at the end tells Linux to run the command in the background, freeing up your command line prompt.

If your code takes a long time to run, you may wish to start a batch Stata job, log off from your terminal, and log back in later to retrieve the output. To do this, preface the previous command with nohup.

nohup stata -b do filename &

nohup prevents your job from being killed when you log off the system.

Similarly, replace stata with stata-se for Stata/SE or stata-mp for Stata/MP.

2.5.5 A note on Stata/MP

We have Stata/MP 32-core installed on the system. Since RRN is a 32-core node shared by many users, we ask that you set the number of processors for Stata/MP to fewer than 16 when you use it on RRN.

Use set processors # to set the number of processors for Stata/MP. For example,

set processors 12

Put it on the first line in your do file or run it first if you use interactive mode.

If you have a large workload that requires many cores and hours to run, you should consider submitting your job to a compute node on the HPC system. That way, you can request more CPUs (each compute node has 24) and make better use of the 32-core Stata/MP licence. (See Running Jobs on Compute Nodes to get started.)

2.6 Using SAS

2.6.1 Setup

The SAS 9.4 environment has been set up for you so you do not need to do anything.

2.6.2 Interactive mode without GUI

To start SAS in interactive mode without GUI (i.e. in SAS command line mode), type sas -nodms in the terminal.

To exit SAS command line, use endsas;. Note the command ends with a semicolon.

2.6.3 Interactive mode with GUI

To start SAS in interactive mode with GUI, use sas. You will lose your terminal command line prompt once the SAS GUI is running.

Alternatively, use sas & to run the command in background and return the command line prompt.

The SAS GUI consists of a few SAS windows, Session Management, Explorer, Program Editor, Log, Result, and Output. To end the SAS session and close all the windows, click the “Terminate” button in the Session Management window.

2.6.4 Batch mode

To run SAS in batch mode, use

sas mycode.sas &

SAS will execute the code in mycode.sas and will automatically save the SAS console output in mycode.log. The & at the end tells Linux to run the command in the background, freeing up your command line prompt.

If your code takes a long time to run, you may wish to start a batch SAS job, log off from the system, and log back in later to check the output. To do this, preface the previous command with nohup.

nohup sas mycode.sas &

nohup prevents your job from being terminated when you log off the system.
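When a long batch job finishes, you can check mycode.log for problems without reading the whole file. SAS starts problem messages in the log with ERROR or WARNING, so searching for those words at the beginning of a line is a quick first check. The sketch below wraps this in a hypothetical function, check_sas_log (not part of SAS), that lists matching lines with their line numbers and returns a non-zero status if any ERROR line is found.

```shell
#!/bin/bash
# Hypothetical helper: show ERROR/WARNING lines (with line numbers) from a
# SAS log and return non-zero if any line starts with ERROR.
check_sas_log() {
    local log="$1"
    [ -f "$log" ] || { echo "No such log file: $log" >&2; return 2; }
    grep -nE '^(ERROR|WARNING)' "$log" || true   # print matches, if any
    ! grep -q '^ERROR' "$log"                     # status 0 means no errors
}

# Usage once the job is done:
#   check_sas_log mycode.log && echo "No ERROR lines in mycode.log"
```

A clean result is only a first check; NOTE lines in the log can still point to problems such as uninitialized variables.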

2.6.5 Run SAS on compute nodes

For large compute-intensive SAS jobs, consider submitting them to the compute nodes. See Running Jobs on Compute Nodes to get started.

2.7 Managing Jobs in Batch Mode

2.7.1 Monitor jobs

Once your programs are running in the background, you can monitor their resource usage and performance using the htop utility. Simply type htop in the terminal to start it. htop also shows information about other programs/processes running on the system, so it is also a good way to see how busy RRN is overall.

The Linux manual for htop is a bit hard to read. You could perhaps get started with this visual tutorial, or this YouTube tutorial.
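For a one-off, non-interactive snapshot of only your own processes, ps can print selected columns. The sketch below uses the procps ps format keywords pid, etime (elapsed time since the process started), pcpu, pmem and args (the full command line), sorted so that the busiest processes come first; the variable name me is just for illustration.

```shell
#!/bin/bash
# Your user name (the same as $USER in a normal login shell).
me=$(id -un)

# PID, elapsed time, %CPU, %memory and full command for your processes,
# busiest first (--sort=-pcpu sorts by CPU usage, descending).
ps -u "$me" -o pid,etime,pcpu,pmem,args --sort=-pcpu
```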

2.7.2 Terminate jobs

If you want to terminate your program running in the background, you could use htop as well (F9/Kill). Alternatively, first find your program’s process id.

ps -u yourUserName

Then, use the kill command.

kill programProcessID

If the above command doesn’t work, try the kill -9 option.

kill -9 programProcessID
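By default kill sends the TERM signal, which a program can catch in order to exit cleanly; kill -9 sends KILL, which cannot be caught, so the program gets no chance to clean up (for example, to finish writing its output). A common pattern is therefore to send TERM, wait a little, and use KILL only if the process is still running. The function stop_job below is a hypothetical helper sketching that pattern. To look up a process ID by name instead of scanning the ps output, pgrep -u yourUserName -f mycode.py lists the IDs of your processes whose command line matches mycode.py.

```shell
#!/bin/bash
# Hypothetical helper: stop a process with TERM, then KILL if it lingers.
# Usage: stop_job PID [GRACE_SECONDS]   (default grace period: 10 seconds)
stop_job() {
    local pid="$1" grace="${2:-10}" i
    # Polite request first; kill fails if the PID does not exist or is not yours.
    kill "$pid" 2>/dev/null || { echo "Cannot signal process $pid" >&2; return 1; }
    for ((i = 0; i < grace; i++)); do
        # kill -0 only tests whether the process still exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
    done
    # Still running after the grace period: force it to stop.
    kill -9 "$pid" 2>/dev/null
    return 0
}

# Example: stop_job 12345 30
```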