
3. Batch Submission Scripts

Bash Hello World PBS Script

This example uses the bash shell to print a simple “Hello World” message. Note that it specifies the shell with the “-S” option. If you do not specify a shell using the “-S” option (either inside the PBS script or as an argument to qsub), then your default shell will be used.

## Introductory PBS Example
## Copyright (c) 2014 The Center for Advanced Research Computing
## at The University of New Mexico
#PBS -lnodes=1:ppn=4
#PBS -lwalltime=1:00
## Specify the shell to be bash
#PBS -S /bin/bash
# print out a hello message indicating the host that this is running on
export THIS_HOST=`hostname`
echo Hello World from host $THIS_HOST

Note that the “ppn” value must always be less than or equal to the number of physical cores available on each node of the system on which you are running. For example, on the nano supercomputer, ppn should be <=4; on pequena, ppn should be <=8. See Systems for machine specifications.
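
For reference, the same requests can also be given as arguments to qsub, in which case they generally take precedence over the corresponding #PBS directives in the script. A minimal sketch (the 8-core request assumes a machine such as pequena, as noted above):

shell> qsub -S /bin/bash -l nodes=1:ppn=8 -l walltime=1:00 hello.pbs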

Tcsh Hello World PBS Script

This example uses the tcsh shell to print a simple “Hello World” message. Note that it specifies the shell with the “-S” option. If you do not specify a shell using “-S” (either inside the PBS script or as an argument to qsub), then your default shell will be used.

## Introductory Example
## Copyright (c) 2014 The Center for Advanced Research Computing
## at The University of New Mexico
#PBS -lnodes=1:ppn=4
#PBS -lwalltime=1:00
## Specify the shell to be tcsh
#PBS -S /bin/tcsh
# print out a hello message
# indicating the host this is running on
setenv THIS_HOST `hostname`
echo Hello World from host $THIS_HOST

Submitting the PBS Script to the Batch Scheduler

To run our simple PBS script, we need to submit it to the batch scheduler using the qsub command followed by the name of the script we would like to run.

In the following example, we submit our hello.pbs script to the batch scheduler using qsub. Note that qsub prints the job identifier when the job is successfully submitted; you can use this identifier to query the status of your job from your shell. For example:

shell> qsub hello.pbs
64811.nano.nano.alliance.unm.edu
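
Because qsub writes the job identifier to standard output, you can also capture it in a shell variable for later use. A minimal bash sketch (the variable name JOBID is just an example):

shell> JOBID=$(qsub hello.pbs)
shell> echo $JOBID
64811.nano.nano.alliance.unm.edu
shell> qstat $JOBID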

Checking on the Status of Your Job

If you would like to check the status of your job, you can use the qstat command to do so. With the hello.pbs script, the job may run so quickly that you do not see your job using qstat. The “-a” option causes PBS to display more information about the jobs currently in the scheduler.

If you would like to see the status of this job only, you would run the following from your shell:
shell> qstat 64811.nano.nano.alliance.unm.edu

Or, the shorter version with just the numeric portion of the job identifier:

shell> qstat 64811

In the example qstat output shown below, the username is “download” and the job identifier is 64811.nano.nano.alliance.unm.edu.

Note that your job can be in one of three states while it is in the scheduler: Running, Queued, or Exiting, denoted by R, Q, and E respectively in the job State column (the column labeled “S”).

shell> qstat -a
nano.nano.alliance.unm.edu:

                                                            Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
64758.nano.nano.alli jruser one_long frob0001 1049 1 -- -- 160:0 R 46:27
64760.nano.nano.alli jruser one_long frob-1000 2037 1 -- -- 160:0 R 46:22
64761.nano.nano.alli jruser one_long frob-3000 9944 1 -- -- 160:0 R 46:18
64762.nano.nano.alli jruser one_long frob-6000 21219 1 -- -- 160:0 R 46:14
64763.nano.nano.alli jruser one_long frob-12000 -- 1 -- -- 160:0 Q --
64764.nano.nano.alli jruser one_long frob-18000 -- 1 -- -- 160:0 Q --
64765.nano.nano.alli jruser one_long frob-28000 -- 1 -- -- 160:0 Q --
64766.nano.nano.alli jruser one_long frob-38000 -- 1 -- -- 160:0 Q --
64770.nano.nano.alli alice defaultq abcd 32682 4 -- -- 60:00 R 28:24
64797.nano.nano.alli bill one_node blub11234 18940 1 -- -- 48:00 R 16:09
64811.nano.nano.alli download one_node hello.pbs -- 1 -- -- 00:01 Q --
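
If the listing is long, you can restrict qstat to a single user's jobs with the “-u” option. For example, using the username from the listing above:

shell> qstat -u download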

Determining Which Nodes Your Job Is Using

If you would like to check which nodes your job is using, you can pass the “-n” option to qstat. Note that if you currently have a job running on a node of the machine, you may freely log into that node in order to check on the status of your job. When your job is finished, your processes on that node will all be killed by the system, and the node will be released back into the available resource pool.

shell> qstat -an
nano.nano.alliance.unm.edu:

                                                            Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
64758.nano.nano.alli jruser one_long frob0001 1049 1 -- -- 160:0 R 46:27
   nano34+nano34+nano34+nano34
64760.nano.nano.alli jruser one_long frob-1000 2037 1 -- -- 160:0 R 46:22
   nano28+nano28+nano28+nano28
64761.nano.nano.alli jruser one_long frob-3000 9944 1 -- -- 160:0 R 46:18
   nano12+nano12+nano12+nano12
64762.nano.nano.alli jruser one_long frob-6000 21219 1 -- -- 160:0 R 46:14
   nano11+nano11+nano11+nano11
64763.nano.nano.alli jruser one_long frob-12000 -- 1 -- -- 160:0 Q --
   --
64764.nano.nano.alli jruser one_long frob-18000 -- 1 -- -- 160:0 Q --
   --
64765.nano.nano.alli jruser one_long frob-28000 -- 1 -- -- 160:0 Q --
   --
64766.nano.nano.alli jruser one_long frob-38000 -- 1 -- -- 160:0 Q --
   --
64770.nano.nano.alli alice defaultq abcd 32682 4 -- -- 60:00 R 28:24
   nano27+nano27+nano27+nano27+nano25+nano25+nano25+nano25+nano24+nano24+nano24+nano24+nano23+nano23+nano23+nano23
64797.nano.nano.alli fred one_node blub11234 18940 1 -- -- 48:00 R 16:09
   nano20+nano20+nano20+nano20
64811.nano.nano.alli download one_node hello.pbs -- 1 -- -- 00:01 Q --
   --
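
Inside a running job, the same information is available in the file named by the $PBS_NODEFILE environment variable, which contains one line per core assigned to the job. A minimal bash sketch that could be added to a job script to record which nodes were used (sort -u prints each node name once):

# list the unique nodes assigned to this job
sort -u $PBS_NODEFILE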

Viewing Output and Error Files

Once your job has completed, you should see two files in the directory from which you submitted the job: hello.pbs.oXXXXX and hello.pbs.eXXXXX (where the Xs are replaced by the numerical portion of the job identifier returned by qsub). Any output from the job sent to “standard output” will be written to the hello.pbs.oXXXXX file and any output sent to “standard error” will be written to the hello.pbs.eXXXXX file. These files are referred to as the “output file” and the “error file” respectively throughout this document. For the example job, the error file is empty, and the output file contains the following:

Nano Portable Batch System Prologue
Job Id: 64811.nano.nano.alliance.unm.edu
Username: download
prologue running on host: nano10
Hello World from host nano10
Nano Portable Batch System Epilogue
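
The names of these files can be controlled with additional PBS directives. A minimal sketch, assuming you want the job to be called “hello” and prefer a single combined file rather than separate output and error files:

## name the job "hello", so the files become hello.oXXXXX and hello.eXXXXX
#PBS -N hello
## merge standard error into the standard output file
#PBS -j oe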

Multi-process Hello World (Single Node)

Bash
## Introductory Example
## Copyright (c) 2014 The Center for Advanced Research Computing
## at The University of New Mexico
#PBS -lnodes=1:ppn=4
#PBS -lwalltime=1:00
## Specify the shell to be bash
#PBS -S /bin/bash
# load the environment module to use OpenMPI built with the GNU compilers
source /etc/profile.d/module.sh
module load openmpi/gnu
# print out a hello message from each of the processors on this host
# indicating the host this is running on
export THIS_HOST=`hostname`
mpirun -np 4 -machinefile $PBS_NODEFILE /bin/sh -c \
  'echo Hello World from host $THIS_HOST'

Tcsh
## Introductory Example
## Copyright (c) 2014 The Center for Advanced Research Computing
## at The University of New Mexico
#PBS -lnodes=1:ppn=4
#PBS -lwalltime=1:00
## Specify the shell to be tcsh
#PBS -S /bin/tcsh
# load the environment module to use OpenMPI built with the GNU compilers
source /etc/profile.d/module.csh
module load openmpi/gnu
# print out a hello message from each of the processors on this host
# indicating the host this is running on
setenv THIS_HOST `hostname`
mpirun -np 4 -machinefile $PBS_NODEFILE /bin/sh -c \
  'echo Hello World from host $THIS_HOST'

In this job’s output file, you should see something like the following (the “no access to tty” warning comes from tcsh being started without a terminal attached and can safely be ignored):
Nano Portable Batch System Prologue
Job Id: 28297.nano.nano.alliance.unm.edu
Username: download
Job 28297.nano.nano.alliance.unm.edu running on nodes:
nano14
prologue running on host: nano14

Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
Hello World from host nano14
Hello World from host nano14
Hello World from host nano14
Hello World from host nano14
Nano Portable Batch System Epilogue

Multi-node Hello World

Bash
## Introductory Example
## Copyright (c) 2014 The Center for Advanced Research Computing
#PBS -lnodes=4:ppn=4
#PBS -lwalltime=1:00
## Specify the shell to be bash
#PBS -S /bin/bash
# load the environment module to use OpenMPI built with the GNU
# compilers

source /etc/profile.d/module.sh
module load openmpi/gnu
# print out a hello message from each of the processors on this host
# indicating the host this is running on
mpirun -np 16 -machinefile $PBS_NODEFILE /bin/sh -c \
  'echo Hello World from host `hostname`'
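
As a variation, the process count can be derived from $PBS_NODEFILE instead of being hard-coded, so that it always matches the nodes and ppn requested above. A minimal bash sketch (not part of the original example):

# one line per requested core, so the line count gives the total process count
NP=$(wc -l < $PBS_NODEFILE)
mpirun -np $NP -machinefile $PBS_NODEFILE /bin/sh -c \
  'echo Hello World from host `hostname`'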

Tcsh
## Introductory Example
## Copyright (c) 2014 The Center for Advanced Research Computing
#PBS -lnodes=4:ppn=4
#PBS -lwalltime=1:00
## Specify the shell to be tcsh
#PBS -S /bin/tcsh
# load the environment module to use OpenMPI built with the GNU compilers
source /etc/profile.d/module.csh
module load openmpi/gnu
# print out a hello message from each of the processors on this host
# indicating the host this is running on
mpirun -np 16 -machinefile $PBS_NODEFILE /bin/sh -c \
  'echo Hello World from host `hostname`'
