2 Using Software
2.1 Best Practice
2.1.1 Interactive mode vs batch mode
Most applications can run either in interactive mode or in batch mode. For small jobs, it really doesn’t matter which one you choose. For medium to large jobs, you could still code and debug your work in the interactive mode, but you should run your work in batch mode once it’s fully tested as probably you don’t want to wait in front of the screen for your job to finish. If you use batch mode, you’ll have the convenience of logging off the system and logging in later to check the result. You can do so by pre- and post-face your application’s batch mode command with the Linux command nohup
and control operator &
.
nohup
(no hangup) prevents your job from being killed when you log off the system. The &
at the end makes the command run in the background. For example, suppose you use Stata and you have a do file named filename.do
that takes hours to run, you could run it in batch mode as below.
2.1.2 RRN vs compute nodes
RRN is part of a High-Performance Computing (HPC) system. The HPC system has a cluster of around 100 compute nodes and hundreds of CPUs. Currently, Rotman users have access to 8 compute nodes via a “standard” partition (a partition is just a logical set of compute nodes). Each compute node has 24 CPUs and 240G memory.
Since RRN is shared by many Rotman researchers, if you have a very large compute-intensive job, you should consider submitting it to the compute nodes instead of running it on RRN. Although the compute nodes are also shared, once you obtain the requested resources (CPUs, memory, etc.) from the compute nodes, they are exclusive resources for you during your computation.
See Running Jobs on Compute Nodes to learn how to submit jobs to compute nodes.
2.2 Using Python
2.2.1 Setup
Different versions and distributions of Python are available on RRN including Standard Python 2 and 3 releases and Anaconda Distribution 2 and 3. (Use module avail
to list all available software on RRN.)
We recommend that you use Standard Python 3 release. For example, load the Python 3.11.5 standard distribution as below.
StdEnv/2023 is a software environment that offers more recent software tools. (Use module avail
after module load StdEnv/2023
to list all available software under StdEnv/2023.)
Note: see more on setting up a Python virtual environment below (section Install packages). A virtual environment allows you to install and manage packages for a specific Python project (and hence avoid package version conflict between projects). The standard Python distribution comes with a tool to create and manage virtual environment.
2.2.2 Interactive mode (Python console)
To use Python in interactive mode (Python console), type python
in the terminal. Type exit()
to exit the Python console.
2.2.3 Batch mode
To use Python in batch mode, use
To capture standard output stdout
and standard error stderr
(usually displayed on screen) in a log file, use
&>
redirects standard output and error streams to a file specified. Nothing will be printed in the terminal. If the file already exists, it gets overwritten.
Pre- and post-face the above command with the Linux command nohup
and control operator &
so that the job will run in background and won’t be killed even you log off the system (see section Interactive mode vs batch mode).
2.2.4 Install packages
To install a new Python package, use
The above command installs a package to the base environment under your home directory.
However, the best practice to manage Python packages is to create a virtual environment for each of your Python project. Within a project virtual environment, you only need to install packages required for the specific project. In that way, you isolate package requirement for each project and avoid potential package conflict between projects.
We assume that you use a Standard Python 3 release on RRN. Below are steps to setup a virtual environment and install new packages.
- Load a standard Python 3 distribution.
- Create a virtual environment. (Make sure that you are currently in your home directory, or otherwise use the
cd
command to return to your home directory. This ensures the virtual environment folder created by the below command will be under your home directly.)
virtualenv
is a tool to create isolated Python virtual environments. myenv
is the specified environment name. (You may want to pick your own name.) A folder named myenv
will be created under your home directory. Packages related to this environment will all be installed/managed there. The --no-download
option disables download of the latest pip/setuptools/wheel tools from PyPI (Python package repository) as these tools are installed directly from our system. (The system manages its own set of Python packages/wheels for optimal performance on specific hardware.)
- Activate the virtual environment.
- Install packages.
Now you are inside your virtual environment. You should see (myenv)
in the terminal prompt. This is an isolated Python environment that you can manage the packages to be installed using the standard tool pip
. For example, pip list
displays all packages currently installed in the environment. pip install <packageName>
installs a new package, and its dependencies. As an example. we install the scikit-learn library below.
Note that a newly created virtual environment has no packages pre-installed (except a few necessary tools for installing new packages).
- Return to base environment.
After you’re done with Python, type deactivate
to return to the base environment.
Next time, all you need to do is step (1) and step (3) to start the virtual environment and using Python.
If you don’t need a virtual environment anymore, and want to completely delete it, do the following.
Recall that a virtual environment is maintained in a folder under your home directory. The above command removes (rm
) everything recursively (-r
) under the ~/myenv
folder and the folder itself. Use it cautiously.
Lastly, if you run into any problem installing a Python package, let us know. As mentioned above, the system manages its own set of Python packages/wheels for optimal performance on specific hardware. Certain restrictions and rules are set on the system, so some packages might have installation issues.
2.2.5 Run Python on compute nodes
For large compute-intensive Python jobs, consider submitting them to the compute nodes. See Running Jobs on Compute Nodes to get started.
2.3 Using R
2.3.1 Setup
Several versions of R are available on RRN. (Use module avail
to list all available software on RRN.) Load the R version you prefer, for example, R 4.4.0, and you are good to start using R.
StdEnv/2023 is a software environment that offers more recent software tools. (Use module avail
after module load StdEnv/2023
to list all available software under StdEnv/2023.)
2.3.2 Interactive mode (R console)
To start R in interactive mode (R console), type R
in the terminal. Type quit()
to quit the R console.
2.3.3 Batch mode
To use R in batch mode, use
See Rscript manual for more options.
To capture standard output stdout
and standard error stderr
(usually displayed on screen) in a log file, use
&>
redirects standard output and error streams to a file specified. Nothing will be printed in the terminal. If the file already exists, it gets overwritten.
Pre- and post-face the above command with the Linux command nohup
and control operator &
so that the job will run in background and won’t be killed even you log off the system (see section Interactive mode vs batch mode).
2.3.4 Install R packages
In the R console, type the below command to list current installed packages.
To install additional packages, use the command:
Packages will be built/compiled from the source and installed under your home directory. If you encounter problems building a package, it’s likely that the package requires certain libraries/dependencies during the compilation. In this case, before you install the package, in your Linux terminal (not in the R console), load the relevant libraries. For example, a popular R package for handling geo-spatial data sf
requires a few external libraries for compilation. (You can find what those external libraries are at sf
CRAN site.) Therefore, you would load those libraries using the below command in the terminal, and then start R console and install the package. Once installed, you do not need to load those libraries again next time when using sf
.
In fact the R sf
package requires more than just gdal
and udunits
for compilation. However, the other dependencies are already installed and set up on the RRN, so you do not need to load them. If you encounter a package that requires a library that is not installed on RRN, please let us know.
2.3.5 Run R on compute nodes
For large compute-intensive R jobs, consider submitting them to the compute nodes. See Running Jobs on Compute Nodes to get started.
2.4 Using Matlab
2.4.1 Setup
Send a signed MATLAB usage statement form to cac.admin@queensu.ca and cc tdmdal@rotman.utoronto.ca. This will add you to the matlab group that allows access to the software. To check if you have been authorized access, type groups
while logged on to the system. If matlab
appears in the output, you are good to go!
Type module avail matlab
to see available Matlab versions, and then load the one you want to use. For example, if you want to use Matlab R2023a, type
StdEnv/2023 is a software environment that offers more recent software tools. (Use module avail
after module load StdEnv/2023
to list all available software under StdEnv/2023.)
2.4.2 Interactive mode without GUI
To start Matlab in interactive mode without GUI, type the command below.
2.4.3 Interactive mode with GUI
To start Matlab in interactive mode with GUI, simply type matlab
in the terminal.
If you use Matlab/R2017a, you may find the Matlab GUI a bit lagging. In this case, in the folder where you start Matlab, create a file named java.opts
and add the following line in it. This should resolve the GUI problem.
-Dsun.java2d.pmoffscreen=false
2.4.4 Batch mode
To run Matlab in batch mode, use
The <
symbol tells the matlab
command to take the Matlab m file mycode.m
as input. The >
symbol redirects the output to the output.txt
file instead of printing it to the screen.
Pre- and post-face the above command with the Linux command nohup
and control operator &
so that the job will run in background and won’t be killed even you log off the system (see section Interactive mode vs batch mode).
2.4.5 Run Matlab on compute nodes
For large compute-intensive Matlab jobs, consider submitting them to the compute nodes. See Running Jobs on Compute Nodes to get started.
2.5 Using Stata
2.5.1 Setup
The Stata 18 environment has been set up for you so you do not need to do anything.
If you want to use an older version of Stata, Stata 17, load the Stata17 module to setup the environment. Type the command below in the terminal.
2.5.2 Interactive mode without GUI
To start Stata/IC in interactive mode without GUI (Stata console mode), type stata
in the terminal.
Use stata-se
for Stata/SE and stata-mp
for Stata/MP.
2.5.3 Interactive mode with GUI
To start Stata/IC in interactive mode with GUI, use xstata
(use xstata-se
for Stata/SE and xstata-mp
for Stata/MP). You will lose your terminal command line prompt once the Stata GUI is running.
Alternatively, use xstata &
(xstata-se &
or xstata-mp &
) to run the command in background and return the command line prompt.
2.5.4 Batch mode
To run Stata/IC in batch mode, use
Stata will execute the code in filename.do
and will automatically save the output in filename.log
. The &
at the end tells Linux to run the command in the background, freeing up your command line prompt.
If your code takes a long time to run, you may wish to start a batch Stata job, log off from your terminal, and log back in later to retrieve the output. To do this, preface the previous command with nohup
.
nohup
prevents your job from being killed when you log off the system.
Similarly, replace stata
with stata-se
for Stata/SE or stata-mp
for Stata/MP.
2.5.5 A note on Stata/MP
We have Stata/MP 32-core installed on the system. Since RRN is a 32-core node shared by many users, we request that you set the number of processors for Stata/MP to be less than 16 when you use it on RRN.
Use set processors #
to set the number of processors for Stata/MP. For example,
Put it on the first line in your do file or run it first if you use interactive mode.
If you have a large workload that requires many cores and hours to run, your should consider submitting your job to a compute node on the HPC system. In that way, you can request more CPUs to take full advantage of the 32-core Stata/MP licence. (See Running Jobs on Compute Nodes to get started.)
2.6 Using SAS
2.6.2 Interactive mode without GUI
To start SAS in interactive mode without GUI (i.e. in SAS command line mode), type sas -nodms
in the terminal.
To exit SAS command line, use endsas;
. Note the command ends with a semicolon.
2.6.3 Interactive mode with GUI
To start SAS in interactive mode with GUI, use sas
. You will lose your terminal command line prompt once the SAS GUI is running.
Alternatively, use sas &
to run the command in background and return the command line prompt.
The SAS GUI consists of a few SAS windows, Session Management, Explorer, Program Editor, Log, Result, and Output. To end the SAS session and close all the windows, click the “Terminate” button in the Session Management window.
2.6.4 Batch mode
To run SAS in batch mode, use
SAS will execute the code in mycode.sas
and will automatically save the SAS console output in mycode.log
. The &
at the end tells Linux to run the command in the background, freeing up your command line prompt.
If your code takes a long time to run, you may wish to start a batch SAS job, log off from the system, and log back in later to check the output. To do this, preface the previous command with nohup
.
nohup
prevents your job from being terminated when you log off the system.
2.6.5 Run SAS on compute nodes
For large compute-intensive SAS jobs, consider submitting them to the compute nodes. See Running Jobs on Compute Nodes to get started.
2.7 Managing Jobs in Batch Mode
2.7.1 Monitor jobs
Once your programs are running in the background, you can monitor their resource usage and performance using the htop
utility. Simply type htop
in the terminal to start the utility. htop
also shows you information about other programs/processes running on the system so it’s also a great tool to see how busy the RRN is overall.
The Linux manual for htop
is a bit hard to read. You could perhaps get started with this visual tutorial, or this YouTube tutorial.