
505.277.6900
help@carc.unm.edu

Walk-in office hours
with Dr. Ryan Johnson,
Applications Scientist

Mondays 10 am to noon


CARC Queue Configuration

1. General Configuration Information

At CARC, all supercomputer batch jobs are submitted through the machine's head node via the PBS/Torque resource manager and are scheduled by the Maui scheduler. Several important constraints govern how jobs run in this mode.

1.1. Single Job per Node

Each node is allowed to run only a single job at a time, with the exception of poblano[1]. This prevents contention for resources within a node, such as memory and network bandwidth, that would result from multiple running jobs. It is up to the user to make the best use of the resources available on each node requested for their job.
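Because a node runs only one job at a time, a job that claims a node should use all of its processors. The sketch below assumes an 8-processor node and an MPI program named my_program; both the ppn value and the program name are illustrative and should be adjusted to the actual hardware and application:

```shell
#!/bin/bash
# Hypothetical PBS script: claim one full node and use every processor.
# The ppn value and program name are illustrative assumptions.
#PBS -l nodes=1:ppn=8
#PBS -l walltime=01:00:00

cd $PBS_O_WORKDIR
# Launch one MPI rank per allocated processor so no cores sit idle.
mpirun -np 8 ./my_program
```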

1.2. Single User per Node

The compute nodes of the systems at CARC, with the exception of poblano[1], are configured to allow only a single user to access the system at any time[2]. You may log into any nodes on which you currently have a running job[3] in order to monitor the progress of your job. When your job is complete, all of your login sessions will be terminated and all files on the local disk on all of the nodes in your job will be removed.
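To see which nodes a running job occupies (and may therefore be logged into), qstat's node listing can be used; the job ID and node name below are illustrative placeholders:

```shell
# List the nodes assigned to job 12345 (illustrative job ID):
qstat -n 12345

# Log into one of the listed nodes to monitor the job, e.g.:
ssh <node-name>
```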

1.3. Honoring Resource Requests

The scheduler is configured so that resource requests from users to PBS are honored exactly. This means that if you request one processor on each of eight compute nodes, PBS will not "pack" the job onto two compute nodes (each with four processors), but will instead allocate all eight nodes for the job (and set up the PBS_NODEFILE to contain one entry for each of the eight nodes). Again, it is up to the user to make the best use of the resources on their allocated nodes. We periodically monitor usage of the machines and may contact you if we feel that your jobs are not making good use of the resources that you have requested. All machines have a maximum wallclock time; if a job requires additional time, the user should use standard checkpoint/restart procedures.
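As a sketch of how literally such a request is honored, one processor on each of eight nodes can be written as follows; PBS will allocate eight distinct nodes rather than packing the ranks (the walltime and program name are illustrative assumptions):

```shell
#!/bin/bash
# Hypothetical PBS script: one processor on each of eight nodes.
#PBS -l nodes=8:ppn=1
#PBS -l walltime=04:00:00

cd $PBS_O_WORKDIR
# PBS_NODEFILE contains one entry per allocated node -- eight here.
cat $PBS_NODEFILE
mpirun -np 8 -machinefile $PBS_NODEFILE ./my_program
```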

If necessary, users can submit a request in advance to help@carc.unm.edu for additional resources to accommodate paper, conference, or research deadlines. The request should specify the machine name, number of nodes, wallclock time, and the start/end dates for the special resource request. Requests will be reviewed and approved subject to the availability of resources.

1.4. Machine-Specific Queue Information

2. Nano

Queues at a glance:

Queue      Total Nodes   Max Nodes   Min Nodes   Max Wallclock   Max # Jobs   Max # Queued
Name       Available     Per Job     Per Job     Time            Per User     Jobs Per User
---------  ------------  ----------  ----------  --------------  -----------  --------------
defaultq   35            8           2           80:00:00        --           --
debug      35            1           --          00:30:00        --           --
one_long   35            1           --          160:00:00       4            8
one_node   35            1           --          48:00:00        8            --

2.1. Default Queue

The default queue for this server is named route. If you do not specify otherwise, your jobs will be sent here. The route queue is a routing queue and, in general, will select the appropriate queue for a particular job. If you would like to submit to a different queue, you will need to run qsub with the -q argument followed by the name of the queue.
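In practice, queue selection looks like the following; the script name job.pbs is an illustrative placeholder:

```shell
# Submit to the default routing queue (route on nano):
qsub job.pbs

# Submit explicitly to a named queue with -q:
qsub -q one_node job.pbs
```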

2.2. Routing Queues

  1. route

This is the default queue for nano's PBS server. When no queue is specified to PBS, either on the qsub command-line or within a PBS script, jobs that you submit will be sent here. This queue routes to the following queues (in this order): one_node, one_long and defaultq. The job will be sent to the first of these queues that will allow jobs with the requested resources to run. This queue is available to all users.

2.3. Execution Queues

  1. defaultq

This queue is intended for jobs that use up to 8 nodes per job, for up to 80 hours of wallclock time. This queue has 35 specific nodes allocated to it. This queue will run jobs sent from the route routing queue. You can also explicitly specify the defaultq queue by running qsub -q defaultq followed by any additional parameters that you would like to pass. The queue is available to all users.

  1. debug

This queue is intended for small debugging jobs that use only a single node for up to 30 minutes of wallclock time. This queue has 35 specific nodes allocated to it. This queue must be explicitly specified using qsub -q debug followed by any additional parameters you would like to pass. Access to this queue is by special request.

  1. one_long

This queue is intended for jobs that use 1 node per job for up to 160 hours of wallclock time. This queue has 35 specific nodes allocated to it. Users may run up to 4 jobs at a time in this queue and may have up to 8 jobs queued. This queue will run jobs sent from the route routing queue. You can also explicitly specify the one_long queue by running qsub -q one_long followed by any additional parameters that you would like to pass.
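A one_long submission might look like the following; the ppn value and script name are illustrative assumptions:

```shell
# Single node for up to 160 hours in the one_long queue:
qsub -q one_long -l nodes=1:ppn=8,walltime=160:00:00 job.pbs
```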

  1. one_node

This queue is intended for jobs that use a single node per job for up to 48 hours of wallclock time. This queue has 35 specific nodes allocated to it. Users may run up to 8 jobs at a time in this queue. This queue will run jobs sent from the route routing queue. You can also explicitly specify the one_node queue by running qsub -q one_node followed by any additional parameters that you would like to pass. This queue is available to all users.

3. Pequena

Pequena is a single-node, shared-memory multiprocessor system.

Queues at a glance:

Queue      Total Nodes   Max Nodes   Min Nodes   Max Wallclock   Max # Jobs   Max # Queued
Name       Available     Per Job     Per Job     Time            Per User     Jobs Per User
---------  ------------  ----------  ----------  --------------  -----------  --------------
workq      22            8           --          48:00:00        --           --
debug      22            1           --          00:30:00        --           --
one_long   22            1           --          120:00:00       --           --
one_node   22            1           --          48:00:00        --           --


3.1. Default Queue

The default queue for pequena's PBS server is named workq. If you do not specify otherwise, your jobs will be sent here. The workq queue is an execution queue. If you would like to submit to a different queue, you will need to run qsub with the -q argument followed by the name of the desired queue.

3.2. Execution Queues

  1. workq

This queue is intended for jobs that use up to 8 nodes per job for up to 48 hours of wallclock time. This queue does not have specific nodes allocated to it and may use any free nodes in the cluster. This queue can be explicitly specified using qsub -q workq followed by any additional parameters you would like to pass. This queue is available to all users.

  1. debug

This queue is intended for small debugging jobs that use only a single node for up to 30 minutes of wallclock time. This queue does not have specific nodes allocated to it and may use any free nodes in the cluster. This queue must be explicitly specified using qsub -q debug followed by any additional parameters that you would like to pass. This queue is available to all users.

  1. one_long

This queue is intended for jobs that use a single node per job for up to 120 hours of wallclock time. This queue does not have specific nodes allocated to it and may use any free nodes in the cluster. This queue must be explicitly specified using qsub -q one_long followed by any additional parameters that you would like to pass. This queue is available to all users.

  1. one_node

This queue is intended for jobs that use a single node per job for up to 48 hours of wallclock time. This queue does not have specific nodes allocated to it and may use any free nodes in the cluster. This queue must be explicitly specified using qsub -q one_node followed by any additional parameters that you would like to pass. This queue is available to all users.

4. Metropolis

Queues at a glance:

Queue      Total Nodes   Max Nodes   Min Nodes   Max Wallclock   Max # Jobs   Max # Queued
Name       Available     Per Job     Per Job     Time            Per User     Jobs Per User
---------  ------------  ----------  ----------  --------------  -----------  --------------
defaultq   74            --          --          72:00:00        --           --

4.1. Default Queue

The default queue for Metropolis’ PBS server is named defaultq. If you do not specify otherwise, your jobs will be sent here. The defaultq queue is an execution queue. If you would like to submit to a different queue, you will need to run qsub with the -q argument followed by the name of the desired queue.

4.2. Execution Queues

  1. defaultq

This queue is intended for jobs of up to 72 hours of wallclock time; there are no restrictions on the number of nodes per job. This queue does not have specific nodes allocated to it and may use any free nodes in the cluster. You can also explicitly specify the defaultq queue by running qsub -q defaultq followed by any additional parameters that you would like to pass. This queue is available to all users.

5. Ulam

Queues at a glance:

Queue      Total Nodes   Max Nodes   Min Nodes   Max Wallclock   Max # Jobs   Max # Queued
Name       Available     Per Job     Per Job     Time            Per User     Jobs Per User
---------  ------------  ----------  ----------  --------------  -----------  --------------
defaultq   120           12          --          24:00:00        --           --
one_long   120           1           --          72:00:00        --           --


5.1. Default Queue

The default queue for Ulam's PBS server is named defaultq. If you do not specify otherwise, your jobs will be sent here. The defaultq queue is an execution queue. If you would like to submit to a different queue, you will have to run qsub with the -q argument followed by the name of the desired queue.

5.2. Execution Queues

  1. defaultq

This queue is intended for jobs that use up to 16 nodes per job for up to 24 hours of wallclock time. This queue does not have specific nodes allocated to it and may use any free nodes in the cluster. This queue can be explicitly specified using qsub -q defaultq followed by any additional parameters that you would like to pass. This queue is available to all users.

  1. one_long

This queue is intended for jobs that use a single node per job for up to 72 hours of wallclock time. This queue does not have specific nodes allocated to it and may use any free nodes in the cluster. This queue must be explicitly specified using qsub -q one_long followed by any additional parameters that you would like to pass. This queue is available to all users.

6. Gibbs

Queues at a glance:

Queue      Total Nodes   Max Nodes   Min Nodes   Max Wallclock   Max # Jobs   Max # Queued
Name       Available     Per Job     Per Job     Time            Per User     Jobs Per User
---------  ------------  ----------  ----------  --------------  -----------  --------------
defaultq   24            6           --          72:00:00        6            --

6.1. Default Queue

The default queue for this server is named route. If you do not specify otherwise, your jobs will be sent here. The route queue is a routing queue and, in general, will select the appropriate queue for a particular job. If you would like to submit to a different queue, you will need to run qsub with the -q argument followed by the name of the queue.

6.2. Routing Queues

  1. route

This is the default queue for Gibbs's PBS server. When no queue is specified to PBS, either on the qsub command-line or within a PBS script, jobs that you submit will be sent here. This queue routes to defaultq, which will run the job if the requested resources allow it. This queue is available to all users.

6.3. Execution Queues

  1. defaultq

This queue is intended for jobs that use up to 6 nodes per job for up to 72 hours of wallclock time. This queue does not have specific nodes allocated to it and may use any free nodes in the cluster. Users may run up to 6 jobs at a time in this queue. This queue will run jobs sent from the route routing queue. You can also explicitly specify the defaultq queue by running qsub -q defaultq followed by any additional parameters that you would like to pass. This queue is available to all users.

7. Poblano

Poblano is a 32-core, single-node system.

Queues at a glance:

Queue      Total Nodes   Max Nodes   Min Nodes   Max Wallclock   Max # Jobs   Max # Queued
Name       Available     Per Job     Per Job     Time            Per User     Jobs Per User
---------  ------------  ----------  ----------  --------------  -----------  --------------
defaultq   1             N/A         N/A         48:00:00        --           --

7.1. Default Queue

The default queue for poblano’s PBS server is named defaultq. If you do not specify otherwise, your jobs will be sent here. The defaultq queue is an execution queue. If you would like to submit to a different queue, you will need to run qsub with the -q argument followed by the name of the queue that you would like to submit your job to.

7.2. Execution Queues

  1. defaultq

This is the only queue on the system for job submission. This queue allows jobs to run for 48 hours on the Poblano single node system.

8. Galles

Queues at a glance:

Queue      Total Nodes   Max Nodes   Min Nodes   Max Wallclock   Max # Jobs   Max # Queued
Name       Available     Per Job     Per Job     Time            Per User     Jobs Per User
---------  ------------  ----------  ----------  --------------  -----------  --------------
all-nodes  200           --          --          --              --           --
Hadoop     17            --          --          --              --           --
Core2      105           40          40          --              --           --
PD-2.80    75            40          40          --              --           --
PD-3.00    3             3           --          --              --           --

8.1. Default Queue

The default queue for this server is named default. If you do not specify otherwise, your jobs will be sent here. The default queue is a routing queue and, in general, will select the appropriate queue for a particular job. If you would like to submit to a different queue, you will need to run qsub with the -q argument followed by the name of the queue.

8.2. Routing Queues

  1. default

This is the default queue for Galles's PBS server. When no queue is specified to PBS, either on the qsub command-line or within a PBS script, jobs that you submit will be sent here. This queue routes to the following queues (in this order): PD-3.00, Core2 and PD-2.80. The job will be sent to the first of these queues that will allow jobs with the requested resources to run. This queue is available to all users.

8.3. Execution Queues

  1. all-nodes

This queue has no restrictions on the number of nodes per job and no restrictions on the wallclock time. This queue has 200 specific nodes allocated to it. This queue must be explicitly specified using qsub -q all-nodes followed by any additional parameters you would like to pass. The queue is available to all users.

  1. Hadoop

This queue has no restrictions on the number of nodes per job and no restrictions on the wallclock time. This queue has 17 specific nodes allocated to it. This queue must be explicitly specified using qsub -q Hadoop followed by any additional parameters you would like to pass. The queue is available to all users.

  1. Core2

This queue is intended for jobs that use up to 40 nodes per job. This queue has no restrictions on the wallclock time. This queue has 105 specific nodes allocated to it. This queue will run jobs sent from the default routing queue. You can also explicitly specify the Core2 queue by running qsub -q Core2 followed by any additional parameters that you would like to pass. This queue is available to all users.

  1. PD-2.80

This queue is intended for jobs that use up to 40 nodes per job. This queue has no restrictions on the wallclock time. This queue has 75 specific nodes allocated to it. This queue will run jobs sent from the default routing queue. You can also explicitly specify the PD-2.80 queue by running qsub -q PD-2.80 followed by any additional parameters that you would like to pass. This queue is available to all users.

  1. PD-3.00

This queue is intended for jobs that use up to 3 nodes per job. This queue has no restrictions on the wallclock time. This queue has 3 specific nodes allocated to it. This queue will run jobs sent from the default routing queue. You can also explicitly specify the PD-3.00 queue by running qsub -q PD-3.00 followed by any additional parameters that you would like to pass. This queue is available to all users.



[1] Poblano is a Silicon Mechanics A422.v3 shared-memory multiprocessor, single-node machine.

[2] Certain system administrator user accounts may also access the systems at any time.

[3] You can determine which nodes a job is using by passing the -n option to qstat.

Center for Advanced Research Computing

MSC01 1190
1601 Central Ave. NE
Albuquerque, NM 87106

p: 505.277.8249
f:  505.277.8235
e: help@carc.unm.edu