Compute Job Submission

Running Jobs

The A*CRC pool of computers allows users to run a very large number of interactive and batch jobs of varied footprint and duration, with management, scheduling, control and status monitoring capabilities built into the system.

Submitting jobs to run on A*CRC resources implies full agreement with the terms and conditions governing the use of A*CRC systems and resources.

Schedulers

To maximise the use of each HPC resource, a scheduler selects jobs based on a set of pre-defined parameters such as the CPU time, memory, and number of cores or nodes required. The job with the highest priority in a queue, as determined by a set of flags and the queue policy on a given machine, is the one the scheduler starts next.

The Platform LSF scheduler is installed on all A*CRC systems.
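
A minimal sketch of what a batch submission to LSF might look like, assuming the standard bsub options -J, -n, -W, -o and -e. The job name, slot count, time limit and the my_program executable are placeholders for illustration, not A*CRC defaults; queue names and limits on each machine may differ.

```shell
#!/bin/sh
# Sketch: write a minimal LSF job script and submit it if bsub is present.
# Job name, slot count, time limit and program name are all placeholders.

write_job_script() {
    # $1 = path of the job script to create
    cat > "$1" <<'EOF'
#!/bin/sh
# Job name (placeholder)
#BSUB -J demo_job
# Number of job slots (cores) requested
#BSUB -n 4
# Wall-clock run limit, hours:minutes
#BSUB -W 0:30
# Standard output and error files; %J expands to the job ID
#BSUB -o demo_job.%J.out
#BSUB -e demo_job.%J.err
# Hypothetical executable
./my_program
EOF
}

write_job_script demo_job.lsf

# Submit only where LSF is installed; bsub reads the script on standard input.
if command -v bsub >/dev/null 2>&1; then
    bsub < demo_job.lsf
else
    echo "bsub not found; job script written to demo_job.lsf"
fi
```

Once a job is submitted, the LSF command bjobs lists its state in the queue.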

Interactive

Interactive-use machines have nodes specifically allocated to handle user logins, as well as other nodes configured into an interactive-use pool. The main purpose of the login nodes is editing files, compiling and linking code, and interacting with the batch system, for example to submit or query jobs. In addition, they can start both single-node applications and parallel jobs that run in the compute pool. The login node is usually not selected by the user but assigned automatically by a distribution system.

You can query the system and follow your job status using the following commands. See the man page for each command for details. 

Interactive Job Commands

Command   Machine Availability   Purpose
ps        All                    Show current status of processes
gtop      Aurora and Fuji        Display and update information about the top CPU processes
topas     Cirrus                 Display and update information about IBM AIX system events
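
As an illustration of the ps command, which is available on all machines, the sketch below uses the POSIX -o, -p and -u options to select a few columns. The function names are hypothetical helpers, and the chosen columns are only one reasonable selection.

```shell
#!/bin/sh
# Sketch: inspect processes with ps using POSIX options.

# Show PID, elapsed time, CPU share and command name for one process.
show_process() {
    ps -o pid,etime,pcpu,comm -p "$1"
}

# Show the same columns for every process owned by the current user.
show_my_processes() {
    ps -o pid,etime,pcpu,comm -u "$(id -un)"
}

show_process "$$"    # $$ is the PID of the current shell
```

Calling show_my_processes gives a quick overview of everything you are running on the node you are logged in to.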

Batch

These jobs can be run without terminal access (the default), with terminal access via run/proxy, or using the specific utility for each of the A*CRC systems. The run and proxy utilities allow connection to the standard input, standard output and standard error channels of jobs running in batch or elsewhere. A job must be started with run for it to be connectable; proxy can then be used from an interactive environment to handle its messages. See the man pages for run and proxy for more information.

Job Limits

Limits on job size and run duration are imposed on interactive and batch jobs. These limits can be viewed on each machine with machine-specific commands, for example qstat -Q on Aurora or llclass on Cirrus.
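
Because the limit-query command differs per machine, a small lookup helper can save checking the documentation each time. The sketch below covers only the two machines named above; limits_command is a hypothetical helper, not an A*CRC utility.

```shell
#!/bin/sh
# Sketch: map a machine name to the limit-query command named in the text.
# Only the two machines mentioned above are covered.

limits_command() {
    case "$1" in
        aurora) echo "qstat -Q" ;;
        cirrus) echo "llclass" ;;
        *)      echo "unknown machine: $1" >&2; return 1 ;;
    esac
}

limits_command aurora
limits_command cirrus
```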

csh
The standard C shell, with syntax and commands modelled on the C programming language.
zsh
The Z shell, designed for interactive use as well as scripting. Similar to ksh but with many additions, such as an enhanced command-line editor and more customisation options.
ksh
The Korn shell, largely backward compatible with the Bourne shell and often integrated with the POSIX shell into one common shell.
bsh
The IBM AIX Bourne shell, available on the A*CRC Cirrus system.
sh
Available in several flavours: the standard POSIX shell, the Korn shell or the Bourne shell.
tcsh
A superset of the C shell that adds features such as file name completion and command-line editing. Tcsh is also compatible with the Berkeley Unix C shell versions.
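
Not every shell is installed on every machine. The sketch below checks which of the shells listed above are present, using the POSIX command -v built-in; list_shells is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report which of the shells listed above are installed on this machine.

list_shells() {
    for name in csh zsh ksh bsh sh tcsh; do
        if path=$(command -v "$name" 2>/dev/null); then
            echo "$name: $path"
        else
            echo "$name: not installed"
        fi
    done
}

list_shells
echo "login shell: ${SHELL:-unknown}"
```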

Storage and Purge

A*CRC has a major archival storage system available on all its systems. Users are strongly urged to store vital files in archival storage because online files can be lost during a machine crash, not all directories are backed up, and files on some machines are purged. If you have an A*CRC account, you also have a storage account. To connect to storage, type ftp 

Purge policies are subject to change and, when revised, are announced in news postings and status e-mails. Once files are purged, there is no possibility of recovering them.
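
Before transferring vital files to archival storage, it can help to bundle a directory into a single compressed file and record a checksum. The sketch below is one way to do this with tar, gzip and cksum; bundle_for_archive and the file names are hypothetical, and the transfer step itself is left out.

```shell
#!/bin/sh
# Sketch: bundle a directory into one compressed archive before transfer.

bundle_for_archive() {
    # $1 = directory to bundle, $2 = output file (for example results.tar.gz)
    src_dir=$1
    out_file=$2
    parent=$(dirname "$src_dir")
    base=$(basename "$src_dir")
    # tar writes to standard output and gzip compresses it,
    # which avoids relying on a tar that supports -z
    (cd "$parent" && tar -cf - "$base") | gzip > "$out_file"
    # Record a checksum so the stored copy can be verified later
    cksum "$out_file" > "$out_file.cksum"
}

# Example call (paths are placeholders):
#   bundle_for_archive "$HOME/results" "$HOME/results.tar.gz"
```

The contents of an archive can be listed without unpacking it by running gzip -dc on the file and piping the result to tar -tf -.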

Visualisation

The Visualisation group at the Advanced Computing Program of IHPC manages visualisation resources and offers an interactive, real-time, realistic 3-D visualisation capability for A*CRC users.

Fault Reporting

For more information, please refer to the Fault Reporting page.

User Login – Online

Most A*CRC resources are available from our login nodes on each machine. Access the host via ssh.

The command to log in as a user is:
% ssh user@
An example is:
% ssh user@fuji.acrc.a-star.edu.sg

Once you are logged in, you can find your files or data through the UNIX file system.
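
A first look around after logging in might resemble the sketch below, which uses only the standard ls and df commands. show_location is a hypothetical helper; with no argument it inspects the current directory, which is normally your home directory right after login.

```shell
#!/bin/sh
# Sketch: first look around the file system after logging in.

show_location() {
    dir=${1:-$PWD}
    echo "Directory: $dir"
    ls -ld "$dir"    # permissions and owner of the directory
    df -P "$dir"     # file system holding it, with free space
}

show_location
```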