FAQs

General

ASPIRE2A general questions, and configuration queries

ASPIRE 2A is a supercomputer operated by the National Supercomputing Centre (NSCC) Singapore. It is an AMD-based Cray EX supercomputer with 15 PB of GPFS storage, 10 PB of Lustre storage and a Slingshot interconnect. It is built on an x86-based architecture with an HPL CPU benchmark Rpeak of 3.145 PFLOPS (Rmax 2.58 PFLOPS) of compute throughput and I/O performance of up to 500 GB/s.

ASPIRE 2A has 768 compute nodes (98,304 cores), 82 GPU nodes (352 GPUs), 16 large memory nodes (2,048 cores) and 16 high frequency nodes (1,024 cores), with memory ranging from 512 GB to 4 TB of RAM per node. ASPIRE 2A has login nodes dedicated to each stakeholder.

Server | CPU Model | Cores per socket | Sockets per server | Total physical cores per server | Available RAM (DDR4) | GPUs per server
Standard compute node (768 nodes) | Dual-CPU AMD EPYC Milan 7713 @ 2.0 GHz | 64 | 2 | 128 | 512 GB | No GPU
GPU compute node (64 nodes) | Single-CPU AMD 7713 @ 2.0 GHz | 64 | 1 | 64 | 512 GB | 4x NVIDIA A100 40 GB
GPU AI node (12 nodes) | Single-CPU AMD 7713 @ 2.0 GHz | 64 | 1 | 64 | 512 GB | 4x NVIDIA A100 40 GB (11 TB NVMe)
GPU AI node (6 nodes) | Dual-CPU AMD 7713 @ 2.0 GHz | 64 | 2 | 128 | 1 TB | 8x NVIDIA A100 40 GB (14 TB NVMe)
Large memory node (12 nodes) | Dual-CPU AMD 7713 @ 2.0 GHz | 64 | 2 | 128 | 2 TB | No GPU
Large memory node (4 nodes) | Dual-CPU AMD 7713 @ 2.0 GHz | 64 | 2 | 128 | 4 TB | No GPU
High frequency node (16 nodes) | Dual-CPU AMD 75F3 @ 2.95 GHz | 32 | 2 | 64 | 512 GB | No GPU

ASPIRE 2A is built on a RHEL8 Linux platform and currently does not support Microsoft Windows OS.

The ASPIRE 2A Job Portal is a graphical user interface which enables users to run, monitor, and manage jobs on ASPIRE 2A. It can be accessed at https://jobportal.nscc.sg/
Note: The Job Portal is accessible only via the NSCC VPN. Stakeholder-specific links are a work in progress.

The ASPIRE 2A Remote Visualisation Portal is a graphical user interface which enables users to run graphics-intensive applications using an Altair Access desktop session. It can be accessed at https://visual.nscc.sg
Note: The Visualisation Portal is accessible only via the NSCC VPN. Stakeholder-specific links are a work in progress.

SU is the currency used to utilise resources on the ASPIRE 2A system. SU is charged based on the requested CPU and GPU resources as follows:

  • For CPU-only jobs: 1 SU per ncpu per hour.
  • For GPU jobs: 64 SU per ngpu per hour (no charge for ncpus).

NOTE: GPU jobs have a fixed CPU and memory ratio enforced, i.e. for every 1 GPU requested, 16 CPUs and 124 GB of memory will be allocated, and 64 SU per hour will be charged.
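As a worked illustration of the rates above (the numbers are chosen for illustration only): a job requesting 2 GPUs for 5 hours is charged 2 x 64 x 5 = 640 SU, and is automatically allocated 32 CPUs and 248 GB of memory at no additional SU cost.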

Accessing ASPIRE2A

Queries regarding accessing ASPIRE2A, logging in and transferring files.

NSCC’s enrolment system is federated with SingAREN to enable users from A*STAR, NTU, NUS, SUTD, SMU, SP and TP to register seamlessly for ASPIRE 2A.

Please follow the instructions in the User Enrolment Guide for Stakeholders to enrol and get access to ASPIRE 2A. If you do not belong to one of the organisations above, please request an account via https://new.nscc.sg

Note: Login ID creation is subject to availability and approval. A new User ID will be assigned automatically if the requested User ID is not available.

Enrolment for NUSEXT users is automated. However, users must first enrol with NUS IT HPC; once the ID is created there, they can use the same credentials to enrol with NSCC through the portal.

To utilise ASPIRE 2A resources, you need to log in to the cluster. Please follow the table below to determine how to access ASPIRE 2A.

How to reach the login node via the optimal path:

User From Entity | Host/FQDN of Login Node | If connecting from outside the campus network
NUS | aspire2a.nus.edu.sg | Connect to the NUS VPN first
NTU | aspire2antu.nscc.sg | NTU users will need to request the NTU jumphost to access ASPIRE 2A via aspire2antu.nscc.sg. Please email your jumphost access request to [email protected]. How to access the jump host: Using-NTU-JumpHost-to-NSCC-ASPIRE-2A
A*STAR | aspire2a.a-star.edu.sg | Connect to the A*STAR VPN first
SUTD | aspire2a.sutd.edu.sg | Connect to the SUTD VPN first
Direct User | aspire2a.nscc.sg or login.asp2a.nscc.sg | For all users not from the above categories, please connect to the ASPIRE 2A VPN first before accessing the login nodes. Kindly refer to the VPN guides: 1: ASPIRE 2A VPN for Windows, 2: ASPIRE 2A VPN for Mac, 3: ASPIRE 2A VPN for Linux

For example, a user from NUS can reach the login node using an SSH client via “ssh user_id@aspire2a.nus.edu.sg”.

Common client tools for connecting to login nodes and file transfer:

Operating System | Function | Tools
Windows | SSH secure client | PuTTY, MobaXterm
Windows | File transfer | FileZilla, WinSCP
Linux/Unix | SSH secure client | Terminal or ssh
Linux/Unix | File transfer | FileZilla, scp, rsync -e ssh
macOS | SSH secure client | ssh
macOS | File transfer | scp, rsync -e ssh
Any OS | ASPIRE 2A Job Portal | https://jobportal2a.nscc.sg/
Any OS | ASPIRE 2A Remote Visualisation Portal | https://visual2a.nscc.sg/

Follow the instructions below to connect to ASPIRE 2A through the web portals:

  • Open the web browser and browse for the URL provided above.
  • Use your ASPIRE2A credentials to access the web portal.

Below are instructions to connect using SSH/SCP on UNIX/Linux or Mac:

  • Open any SSH terminal
  • Type “ssh user_id@login_hostname” # Replace user_id with your user ID and login_hostname with your assigned login hostname
  • Enter the password (characters are invisible while typing the password)
  • Once logged in successfully you should be able to see a “$” prompt that allows you to execute Linux commands
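For example, an illustrative login using the direct-user login node from the table above (replace user_id with your own ID):

$ ssh [email protected]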

If you want to use the X11 interface, replace the ssh command above with ssh -X (or ssh -Y). For X11 on macOS, make sure you have XQuartz installed (OS X 10.8 or higher).

Below are instructions to connect using SSH/SCP on Windows:

  • Open any SSH terminal software (e.g. putty, MobaXterm)
  • In case of PuTTY, type the login host name and click on the Open button. (PuTTY may allow you to log in automatically without asking for a password. If PuTTY asks for a username/password, type the login ID and password from your university.)
  • In case of MobaXterm, type the command at the prompt: “ssh userid@<login_hostname>”
  • Upon successful login, you will be prompted with “$” to use NSCC Supercomputer

Users will be able to access files on the Internet using http or https methods using tools such as curl or wget.

Starting from 2nd January 2025, a new firewall rule implemented by the NSCC security team is designed to enhance cluster security by blocking all traffic destined for blacklisted IPs and URLs. This measure aims to prevent unauthorized or malicious communication, ensuring a safer and more secure computing environment for all users. As a workaround, users are advised to download the files to their local system and then upload them to Aspire2a.

Please log in to https://user.nscc.sg/ with your organisation credentials and click on set/reset your password. Once a new password is set, you can log in to NSCC with the new password. Please follow the Password Reset guide for step-by-step instructions. Note that the password must contain:

  • a minimum of 8 characters;
  • a mixture of upper and lower cases;
  • numbers (0-9); and
  • at least one special character: !@#$%^&*+=?>

If you still cannot access your account after resetting your password, please contact our help desk at [email protected]

The ASPIRE 2A SSH session idle timeout is set to 12 hours. The SSH session will terminate automatically after 12 hours of inactivity. This is a security measure to ensure that NSCC offers a safe environment to all users of ASPIRE 2A.

The NSCC account password expires every 90 days. Please remember to reset your password before it expires. If you do not reset your password within 90 days, your account will expire. Upon expiry of the account, you will not be able to log in to the NSCC system, and messages such as those shown below will be displayed when connecting over SSH.

  • Your account has expired; please contact your system administrator
  • You are required to change your LDAP password immediately.
  • Connection closed by xxx.xxx.xxx.xxx

There is a grace period of 90 days on top of the 90-day expiry. After that, to reactivate the account, you must contact us at [email protected]; it is then mandatory to reset the password once the account has been reactivated. Files in the home directory are untouched for 90 days, after which they will be archived, while files in scratch still follow the purge policy. To reactivate the account, log in to https://user.nscc.sg/ with your organisation credentials and click on set/reset your password. Once a new password is set, you can log in to NSCC with the new password. PS: Changing the password through the passwd command has no effect on the expiry date. If your account is still not reactivated after resetting your password, please contact our help desk at [email protected]

The ASPIRE 2A VPN connection remains valid for up to 24 hours. The VPN connection will terminate automatically once the time limit is reached. Users are required to reconnect to re-establish the VPN connection.

Only the Duo 2FA mobile app is supported currently. NSCC is in the midst of exploring other alternatives.

Only Checkpoint VPN client is supported.

Yes. You can SSH to the compute nodes where your jobs are running, but only via the login nodes.

To access the compute node of a running job through SSH via the login nodes, you need to export the job ID as an environment variable first. For example, if the job ID is 123456.pbs101 and the assigned nodes are x1000c0s0b0n1 and x1000c1s6b0n1, you can do the following:

$ export PBS_JOBID=123456.pbs101

Then SSH will be allowed to x1000c0s0b0n1 and x1000c1s6b0n1 from the node where you exported PBS_JOBID, and the SSH sessions will be attached to that PBS job.

$ ssh x1000c0s0b0n1

Subsequent SSH sessions from any of the compute nodes will already have PBS_JOBID exported and will not require exporting it again.

  • If PBS_JOBID is not correctly exported and the job ID does not match, the SSH session will be rejected.
  • If there are multiple jobs running, you need to export the relevant job ID each time before you SSH to the compute node.

To check the job ID currently exported in your session on the login node, run:

$ echo $PBS_JOBID

Projects Application

NSCC resources can only be utilised through approved projects.

A personal quota is granted to users from all stakeholder organisations (A*STAR, NTU, NUS, SUTD) and IHLs (autonomous universities and polytechnics) upon account creation.

  • Stakeholder organisations (A*STAR, NTU, NUS, SUTD): 100,000 SU and 50 GB of storage
  • IHLs (SIT, SMU, SUSS, NP, NYP, RP, SP, TP): 10,000 SU and 50 GB of storage

The quota is fixed, non-transferable and cannot be extended. Once the personal quota is exceeded, you can only submit jobs through an approved project. Users are encouraged to apply for a project only if their resource requirements exceed the personal quota, or once they have depleted it.

  • The Project Applicant should be the Principal Investigator (PI) or the supervisor of the project. If required, the PI may authorise a team member to submit the application on their behalf. The authorisation can be in the form of an email acknowledgement to be sent to [email protected].
  • The Project Applicant will be responsible for all correspondence regarding the project, including but not limited to receiving notifications from NSCC, managing project membership and requesting changes in resources.

Please refer to this page for more information.

Please refer to https://www.nscc.sg/srapolicy/ for more information on the NSCC Singapore Strategic Allocation Policy.

Please refer to https://www.nscc.sg/srapolicy/ for more information on the NSCC Singapore Strategic Allocation Policy.

All resource allocations under the official Call for Projects, if approved, will be valid for 1 year, unless otherwise stated.

NSCC Singapore will announce the results of the Call for Projects via a Letter of Award to the respective Research Offices (RO) of the hosting institutions. For all communications, including the award of projects and post-award management of granted compute resources, NSCC Singapore will liaise with the ROs as the primary point of contact. 

Hence, project applicants are advised to reach out to your respective ROs for updates.

Please note that only the project applicant will be able to see the project in the Project Portal. Only projects allocated under the Call for Projects will be indicated in the portal. You may use the following command to check all projects that you are part of:

$ myprojects

Please complete Part I and Part II of the form and send the request to [email protected]. This is a form to request changes to the Project Applicant of an approved and active project. An official email is mandatory. Note: Requests for this change can only be made by the Project Applicant or Principal Investigator of the project.
For the addition of existing users:
Please complete Part I of the form and send the request to [email protected]. This is a form to request changes to the membership of an approved and active project. Note that requests for membership changes can only be made by the Project Applicant or Principal Investigator of the project.

Please note that only users with a valid NSCC user ID can be added to the project.
Account creation:
  • For users from NUS, NTU, A*STAR, SUTD, TCOMS, SMU, SIT, SP, TP and RP: Please register for one via the self-service portal here.
    • Users from SMU, SIT, SP, TP and RP need to contact [email protected] to request for access to NSCC’s VPN after account creation
  • For users from all other entities: Please request for an account via the Account Creation Form here.
    • Please note that this is subject to approval.
In order for the new SRAC committee to better assess your project, please provide all required information for your project via the Additional Project and Funding Information Form.
The PI/applicant of the project should send an email to [email protected] with the Form Response ID once completed.

Yes, projects that are not renewed will expire at the end of the project cycle. When your project ends, please update the project deliverables in the Project Portal. At the end of the project cycle, all granted resources will expire:

  • Project quota will be set to zero (0) hours and no jobs will be dispatched.
  • All project data will remain in the system for one month.
  • After one month, all project data will be archived.
  • All archived data will be retained for 5 months from the date of archival.
  • A charge of $0.022/GB will be imposed for retrieval of archived data.

Please plan your resource utilisation wisely and use your allocated resources as early as possible to avoid encountering resource bottlenecks towards the end of the project cycle.

Please refer to https://www.nscc.sg/srapolicy/ for more information on the NSCC Singapore Strategic Allocation Policy.

Please note that all project allocation grantees must acknowledge and cite NSCC in any published manuscripts and presentations, and submit proof of the deliverables achieved. All academic journals and conference papers shall cite NSCC with the following: “The computational work for this article was (fully/partially) performed on resources of the National Supercomputing Centre, Singapore (https://www.nscc.sg).”

For Call for Projects: Please log in to the Projects Portal and update the deliverables achieved for all projects under the “Deliverables” tab for each project.

For all other projects not found on the Projects Portal: Please submit your deliverables update via this form.

Please contact [email protected] if you would like to apply for resources other than through a project call.

Accounting

Queries related to usage of projects, checking the allocation, and usage of project.

The allocated disk quotas are limited as stated below:

/home - home directory - 50 GB

/scratch - scratch directory - 100 TB

These quotas will not be extended under any circumstances. If more disk space is needed, you should apply for a project through the Project Portal. You can check your current disk usage on /home by executing the “myquota” command.

$ myquota

+------------------------------------------------------+-------------------------+
|                                          Block Limit |             Inode Limit |
| Filesystem   Type           Usage      Quota         |    Usage      Quota     |
+------------------------------------------------------+-------------------------+
| asp2a-home   GPFS-FILESET   38.11G     50.00G        |   231638      0         |
| scratch      LUSTRE         1.574T     100T          |   142726      200000000 |
+------------------------------------------------------+-------------------------+

A purging policy is implemented on the scratch directory: files unused for more than 30 days will be purged automatically. Files under home or project directories are untouched.

You can check your current CPU and GPU usage by executing the “myusage” command.

$ myusage

Your usage as of 12/02/2025-09:15:05
+———————+———-+——————+———————+
| Time Range | Resource | Num of Jobs | Core/Card Hours |
+———————+———-+——————+———————+
| Past 7 days | CPU | 0 | 0.000 |
| | GPU | 0 | 0.000 |
+———————+———-+——————+———————+
| Past 30 days | CPU | 0 | 0.000 |
| | GPU | 0 | 0.000 |
+———————+———-+——————+———————+
| Overall | CPU | 124 | 125.390 |
| | GPU | 14 | 4.321 |
+=========================================================================+

Upon logging in to ASPIRE 2A, you will be able to see your SU balance statement with the usage split between CPU and GPU resources. The actual remaining balance is shown against the “SU” resource. The “Grant”, “Used” and “Balance” columns for these resources reflect the approved project allocation, the amount used and the remaining balance respectively.

+--------+---------------------+---------------------+---------------------+
| Unit   | Grant               | Used                | Balance             |
+--------+---------------------+---------------------+---------------------+
| SU     | 10512000.000        | 1342398.472         | 9154217.528         |
| CPU/HR | 10000000.000        | 1101916.673         | 8898083.327         |
| GPU/HR | 8000.000            | 3757.000            | 4243.000            |
+--------+---------------------+---------------------+---------------------+

Total SU Grant: 10512000

Total SU Used: 1342398.472

Total SU Balance: 9154217.528

“Total SU Grant” corresponds to the amount of resource that was granted.

“Total SU Used” corresponds to the amount consumed for completed jobs.

“Total SU Balance” corresponds to the remaining amount available at that instance. (i.e., “Total SU Grant” – “Total SU Used” – “Total SU for running jobs”).

Since accounting happens in prepaid mode, SU will be deducted in full for the requested resources and will only be refunded once the job completes. Please note that once Used + Total SU for running jobs >= Granted, subsequent jobs will be put ON HOLD and no new jobs will be allowed. New jobs will only be allowed when they fit into the Balance. For example, if you request 50 hours for your job but the job actually only needs 10 hours to complete, the AMS will hold back the requested 50 hours (marked as “pending core hours”) until the 10-hour job is completed. The remaining 40 hours will only be released back to you when the job completes. Whatever is categorised as “pending core hours” is deducted from your “available core hours”. Therefore, if you do not request an appropriate amount of walltime for your job, you may temporarily see fewer available core hours than you actually have.
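As a hedged numeric illustration (using the 1 SU per ncpu-hour rate above; the job size is hypothetical): a CPU job requesting 128 ncpus for 50 hours holds 128 x 50 = 6,400 SU while it runs. If it finishes after 10 hours, 128 x 10 = 1,280 SU are charged and the remaining 5,120 SU are returned to the balance.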

When logged in to the NSCC login nodes, you will be able to see the utilisation of the projects assigned to you in the MOTD page. To check the usage at any point in time, please execute the command:

$ myprojects -p <project_id>

For example, to view the project summary of project 90000001, execute the command:

$ myprojects -p 90000001

A similar output shown below will be displayed:

$myprojects -p 90000001

Project : 90000001
ExpDate : 2032-12-31
Organis : NSCC
P.Inves : Hiew Ngee Heng Paul
Appl.Id : null
Members : hiewnh fsg1 satriamt chungsy leety1 ngkg darrenting ganesan
jmathpal michaelqi ayush mahendra ronyap wang_yi malikm michael
nichola6
P.Title : NSCC Admin project

Project 90000001 balance as of 12/02/2025-08:20:06
+——–+———————+———————+———————+———————+
| Unit | Grant | Used | Balance | In Doubt |
+——–+———————+———————+———————+———————+
| SU | 9777561600.000 | 7497864.002 | 9770063735.998 | 0.000 |
+——–+———————+———————+———————+———————+
In doubt – SU deducted for current running jobs (Prepaid)

Project 90000001 SU Usage breakdown
+—————+—————–+—————–+—————–+
| Unit | Usage | SU Rate | SU Used |
+—————+—————–+—————–+—————–+
| CPU Hour | 5943744.859 | 1 | 5943744.859 |
| GPU Hour | 24283.111 | 64 | 1554119.143 |
+—————+—————–+—————–+—————–+

To check the storage disk usage, execute the command: $ myquota -p <project_id>

For example, for project 90000001, execute the command: $ myquota -p 90000001

A similar output shown below will be displayed:

$ myquota -p 90000001

Time of reporting: 12-02-2025 10:24:58
+———-+————————+—————–+————-+————-+
| Project | Directory | Filesystem | Block Usage | Inode Usage |
+———-+————————+—————–+————-+————-+
| 90000001 | /home/project/90000001 | asp2a-data | 21.11G | 57052 |
+———-+————————+—————–+————-+————-+

Upon logging in to ASPIRE 2A, you will be able to see the amount of core hours remaining in all the projects of which you are a member, as seen below:

Summary of Project 90000001 as of 12/02/2025-08:20:06
+——–+———————+———————+———————+———————+
| Unit | Grant | Used | Balance | In Doubt |
+——–+———————+———————+———————+———————+
| SU | 9777561600.000 | 7497864.002 | 9770063735.998 | 0.000 |
+——–+———————+———————+———————+———————+

Core hours remaining for the project:

Total Grant: 9777561600.000
Total Used: 7497864.002
Balance: 9770063735.998

“Total Grant” corresponds to the amount of resource that was granted. “Total Used” corresponds to the amount consumed so far. “In Doubt” (pending) corresponds to the amount currently held for running jobs (i.e. #running_jobs * per_job_cores_requested * per_job_walltime_requested).

“Balance” corresponds to “Total Grant” - “Total Used” - “In Doubt”. Be aware that once Used + In Doubt >= Granted, subsequent jobs will be put ON HOLD and no new jobs will be allowed. New jobs will only be allowed when their requested amount fits into the Balance.

For example, if you request 50 hours for your job but the job actually only needs 10 hours to complete, the AMS will hold back the requested 50 hours (marked as “pending core hours”) until the 10-hour job is completed. The remaining 40 hours will only be released back to you at that point.

Whatever is categorised as “pending core hours” is deducted from your “available core hours”, so if you do not request an appropriate amount of walltime for your job, you might temporarily see fewer available core hours than you actually have.

The myprojects command provides detailed information about project usage, user/project-specific details, and historical data reporting with customizable date ranges.

  • -p <project_id>: Fetch project details for a specific project ID.
  • -l: Show detailed project usage.
  • -s <start_date>: Specify the start date (in YYYY-MM-DD format) for detailed usage reporting. Defaults to yesterday if not specified.
  • -e <end_date>: Specify the end date (in YYYY-MM-DD format) for detailed usage reporting. Defaults to yesterday if not specified.
  • -h: Display help for the command.

Can I use multiple options together?

Yes, you can combine options to refine your query. For example:

myprojects -p <project_id> -l -s 2025-01-01 -e 2025-01-25

Note: The command only displays data from 2024-09-01 onwards.

How can I display the command’s help information?

myprojects -h

Applications and Libraries

All queries related to Aspire Applications and libraries

MPI is a message-passing library interface standard and includes MPI-1, MPI-2 and MPI-3. It is one of the most widely used technologies for parallelising computing tasks. There are various implementations of MPI such as OpenMPI, Cray MPICH and Intel MPI. The MPI standard defines programming interfaces for C and Fortran, so codes written in C/Fortran can use MPI for parallelisation; there are also MPI bindings for Python and Java. ASPIRE 2A is a Cray cluster and supports Cray MPICH, which can offer better performance than other MPI implementations.

There are several reasons:

  1. Improper MPI environment. For example, the Gromacs module “gromacs/2021.4” is compiled with Cray MPICH and must therefore be run within the Cray MPICH environment.
  2. Improper CPU binding. For Cray MPICH, the cpu-binding options are “depth”, “cores” and “numa”. Generally, “depth” will give better performance.

For example, the command “mpirun -np 32 --cpu-bind=depth -d 2 gmx_mpi mdrun …” will distribute 32 MPI processes across 64 CPU cores, with each MPI process running 2 OpenMP threads.

Cray MPICH is the default MPI environment and is also the MPI implementation supported by Cray. We encourage users to use Cray MPICH first. The module name is “cray-mpich” and the MPI compiler wrappers are cc for MPI C code, CC for MPI C++ code, and ftn for MPI Fortran code. Other MPI implementations available on ASPIRE 2A include OpenMPI and MVAPICH; their compiler wrappers are mpicc for C, mpic++ for C++ and mpif77/mpif90 for Fortran. Please note that support for these implementations is limited.
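A minimal compilation sketch using the default Cray environment and the wrappers named above (the source file names are placeholders):

$ module load cray-mpich                     # usually already loaded with PrgEnv-cray
$ cc -O2 -o hello_mpi hello_mpi.c            # MPI C code
$ CC -O2 -o hello_mpi_cxx hello_mpi.cpp      # MPI C++ code
$ ftn -O2 -o solver_mpi solver_mpi.f90       # MPI Fortran code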

One of the techniques to run your code in parallel is OpenMP. Codes using OpenMP are normally restricted to running on one physical server, unless you are running a hybrid code combining MPI and OpenMP, because OpenMP codes cannot communicate across nodes over the network interconnect. OpenMP is simple to use and does not need any wrappers: standard compilers like GCC/gfortran or Intel C/Fortran support OpenMP, provided the program contains OpenMP directives. In the job script, the environment variable OMP_NUM_THREADS should be used to set the number of OpenMP threads for the application. For example,

export OMP_NUM_THREADS=4

will allow the application to use 4 OpenMP threads in one computing task. An application which combines OpenMP and MPI, when run with Cray MPICH, can set the per-rank OpenMP thread placement with the “-d” parameter. For example:

mpirun -np 12 --cpu-bind depth -d 4 app

In this case, the application will run with 12 MPI processes, and each MPI process has 4 OpenMP threads running on dedicated cores.

Note: use --cpu-bind depth to control how tasks are bound to cores. Options include:

  • none – No CPU binding
  • numa, socket, core, thread – Bind ranks to the specified hardware
  • depth – Bind ranks to number of threads in ‘depth’ argument
  • list – Bind ranks to colon-separated range lists of CPUs
  • mask – Bind ranks to comma-separated bitmasks of CPUs

Use -d 4 to specify the number of CPUs (cores) to be allocated per process. This is useful if the job is multithreaded and requires more than one CPU per task for optimal performance; see the job-script sketch below.
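A minimal hybrid MPI/OpenMP job-script sketch, assuming the normal queue and the mpirun binding options described above; the project ID, chunk geometry and application name “app” are placeholders rather than an official template:

#!/bin/sh
#PBS -q normal
#PBS -P <project_id>
#PBS -l select=1:ncpus=48:mpiprocs=12:ompthreads=4
#PBS -l walltime=02:00:00

cd "$PBS_O_WORKDIR"
export OMP_NUM_THREADS=4
# 12 MPI ranks, each bound to a depth of 4 cores for its OpenMP threads
mpirun -np 12 --cpu-bind depth -d 4 ./app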

A serial application uses only one process at a time to perform calculations, while a parallel application can scale to multiple processes and make use of cores/CPUs on one or more servers over the high-speed interconnect. For this reason, parallel applications can generally shorten the walltime of computing tasks. To make the best use of the HPC system, you are strongly encouraged to use parallel codes on ASPIRE 2A where applicable.

Checkpointing is a technique used to save the output periodically during the runtime of an application. The advantage of using checkpointing is that it is possible to restart the program from the last checkpoint should the execution of the program terminate abruptly due to any reason. You are advised to use the checkpointing and restart technique while writing your application so that should the program be terminated, you would be able to restart the job from the last checkpoint.

The latest compiling tools and environments are supported on ASPIRE 2A, and all compiler environments are managed via modules. The default environment is PrgEnv-cray. Other available environments include:

  • PrgEnv-gnu for GNU GCC
  • PrgEnv-intel for the Intel compiler
  • PrgEnv-nvhpc for OpenACC support
  • PrgEnv-aocc for the AMD AOCC compiler
  • craype-accel-nvidia80 for the GPU CUDA compiler

The programming environment can be changed via the “module swap” command, for example “module swap PrgEnv-cray PrgEnv-intel”. When choosing a compiler, please refer to the following compiler table:

ASPIRE 2A
PrgEnv-gnu PrgEnv-intel craype-accel-nvidia80 PrgEnv-nvhpc openmpi PrgEnv-aocc cray-mpich
c cc cc cc cc
c++ CC CC CC CC
fortran ftn ftn ftn ftn
mpicc mpicc cc
mpic++ mpic++ CC
mpifortran mpif90 ftn
cuda nvcc cc
fortran(cuda) ftn

If your code is written in CUDA (*.cu), it can be compiled with the NVIDIA CUDA toolchain. On ASPIRE 2A, the package name is “craype-accel-nvidia80” and it can be used together with the PrgEnv-cray, PrgEnv-gnu, PrgEnv-intel and PrgEnv-aocc modules to compile applications. Use cc to compile C code, CC to compile C++ code, and nvcc for CUDA code.
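A brief compilation sketch under these modules (the source file names are illustrative):

$ module load craype-accel-nvidia80
$ nvcc -O2 -o saxpy saxpy.cu     # CUDA kernel source
$ cc -O2 -o driver driver.c      # host-side C code built in the same environment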

Currently, NSCC does not support compilers other than the versions mentioned above. If you are keen to use other compilers, you are welcome to install them in your own project or home directory. Please contact our service desk (access the Service Desk or contact [email protected]) if you face any difficulties during the installation.

The Cray Scientific Libraries (cray-libsci, cray-fftw) are available by default on ASPIRE 2A. Intel MKL math libraries are also available.

Commercial applications can be installed in ASPIRE 2A, but they need to be installed and used according to their respective licensing terms. Firstly, the license and the software modules of the intended application needs to be provided by you to NSCC for installation. Based on the license terms and conditions, we will then determine whether the software should be installed in ASPIRE 2A and how it should be installed. The same rule applies to applications with academic licenses. Please note that the accountability on the usage of the software should be held by the owner of the license. For more information, please contact our service desk.

Yes, you can request for additional libraries or applications through the Service Desk portal. However, the installation of libraries is subject to various conditions such as compatibility, time required to make the library available, dependencies and our software policies. Please contact the Service Desk for further clarification. Our Technical specialist will respond to you.

Yes, you can create your own virtual Python environment with Anaconda, Miniconda or one of the other Python distributions, and install JupyterLab/Notebook in that environment. The Jupyter Lab/Notebook can then be accessed through the VIS nodes or through batch jobs.
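A hedged setup sketch using the miniforge3 module described later in this FAQ (the environment name, Python version and port are illustrative):

$ module load miniforge3
$ conda create -n jupyter-env python=3.11 jupyterlab --yes
$ conda activate jupyter-env
$ jupyter lab --no-browser --ip=0.0.0.0 --port=8888    # then connect via a VIS session or SSH port forwarding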

As the underlying hardware and module environment on ASPIRE 2A are different, you will need to recompile your codes. You can list all the available modules with the module avail command, or refer here for more information.

Job Submission and Scheduling

Queries related to PBS Pro, Job submission and Job status

To find out why your jobs are not running, run “qstat -s <job_id>” to print the comments from the job scheduler about your job. If you see “Q” in the “S” column, it means the scheduler has not started your job yet.

A few common reasons listed below:

  • The operating system that you are using is different from that in ASPIRE 2A. Solution:
    • If you are running in your local Linux machine, a simple recompilation on the ASPIRE 2A login nodes will solve the issue.
    • If you are running on a Windows PC and you want to run in ASPIRE 2A, you need to either obtain a copy of the software for Linux or port the code from Windows to Linux.
    • If you are transferring scripts from a Windows PC, remember to check if the file was converted to Linux file type, e.g. using dos2unix command.
  • Different compiler/library stack
    • If your job is not running although your operating system in your own environment is the same as ASPIRE 2A, it means the compiler/libraries are not compatible. You need to recompile your codes on ASPIRE 2A login nodes.

Please contact our service desk (access Service Desk or contact [email protected]) if you still face any issues with running your job on ASPIRE 2A.

This error could be because you have forgotten to source the appropriate system .rc in your personal .rc file.

  • If you are using an sh-derived shell for your jobs, edit the .bashrc file to ensure it contains the line “. /etc/bashrc”.
  • If you are using a csh-derived shell for your jobs, edit the .cshrc file to ensure it contains the line “source /etc/csh.cshrc”.

If you have accidentally deleted the .bashrc file, you can copy back the original from /etc/skel/.bashrc. Example:

$ cp /etc/skel/.bash* ~/

Note: Avoid using the command “module purge” to remove all modules. Use “module rm” or “module swap” to unload or swap modules instead.

If your batch job script is named runjob.sh and your output is not redirected within the script, your job output will appear in runjob.sh.o****, where the digits at the end of the file name are your job ID. The final entries in the .o file give you details of the walltime and virtual memory used by the job. Similarly, the error messages of your job will be recorded in runjob.sh.e****, again with your job ID at the end of the file name.

In a PBS job script, the memory you specify using the -l mem= option is the total memory across all nodes. However, this value is internally converted into a per-node equivalent, and this is how it is monitored. For example, since ASPIRE 2A standard compute nodes have 128 cores per node, if you request -l select=2:ncpus=128 -l mem=10GB, the actual limit will be 10 GB on each of the two nodes. If you exceed this on either node, your job will be killed. Please note that if a job runs for less than a few minutes, the memory use reported in your .o file once the job completes may be inaccurate. We strongly discourage running short jobs of less than 1 hour, because there is significant overhead in setting up and tearing down a job and you may end up wasting a large amount of your grant. Instead, if you have many short jobs to run, consider merging them into a longer job.

Please note that you are not supposed to run any jobs on the login nodes. User jobs running on a login node will be killed automatically. If you want to run an interactive session, please submit an interactive job to the compute nodes using the qsub -I command:

$ qsub -I

The queues available for job submission are:

  • normal

The “normal” queue routes the job to various other queues on the EX system based on the resources requested. The details are outlined in the next question (both CPU and GPU jobs can be submitted). This queue is suitable for both CPU and GPU jobs, but not for AI jobs, which typically require higher IOPS and involve large numbers of small files.

  • ai

The “ai” queue routes the job to various queues on the accelerator system dedicated to GPU-only workloads, with NVMe local storage to boost IOPS and cater for workloads that access large numbers of small files. The NVMe storage is mounted at /raid as local disk on each AI node.
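A hedged sketch of staging data onto the node-local NVMe inside an ai-queue job (the dataset path and training script are hypothetical):

# inside the job script, after the PBS directives
cp -r /scratch/$USER/dataset /raid/dataset
python train.py --data /raid/dataset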

There are several queues in the system with different conditions that suit different workloads. Some of the queues in the system are listed below.

Complex | Route Queue | Exec Queue | Conditions (per job) | Restrictions
pbs101 | normal | q1 | ncpus = 1; walltime = 02:00:01 to 24:00:00 hours |
 | | q2 | ncpus = 2 to 64; walltime = 02:00:01 to 24:00:00 hours |
 | | q3 | ncpus = 65 to 127 (less than 1 node); walltime = 02:00:01 to 24:00:00 hours |
 | | q4 | ncpus = 128 (1 node); walltime = 02:00:01 to 24:00:00 hours |
 | | q5 | ncpus = 129 to 2048 (2 to 16 nodes); walltime = 02:00:01 to 24:00:00 hours |
 | | q6 | ncpus = 2049 to 98304 (16 to 768 nodes); walltime = 02:00:01 to 24:00:00 hours |
 | | qlong | ncpus = 1 to 128 (1 node); walltime = 24:00:01 to 120:00:00 hours | Maximum 2 running jobs per user at a time, up to a total of 128 ncpus
 | | qdev | ncpus = 1 to 128 (1 node); walltime = 00:00:01 to 02:00:00 hours | Maximum 2 running jobs per user at a time, up to a total of 256 ncpus
 | | g1 | ngpus = 1; walltime = 02:00:01 to 24:00:00 hours |
 | | g2 | ngpus = 2 or 3; walltime = 02:00:01 to 24:00:00 hours |
 | | g3 | ngpus = 4 (1 node); walltime = 02:00:01 to 24:00:00 hours |
 | | g4 | ngpus = 5 to 256 (2 to 64 nodes); walltime = 02:00:01 to 24:00:00 hours |
 | | glong | ngpus = 1 to 64 (1 to 16 nodes); walltime = 24:00:01 to 120:00:00 hours |
 | | gdev | ngpus = 1 to 4 (1 node); walltime = 00:00:01 to 02:00:00 hours | Maximum 2 running jobs per user; maximum 16 ngpus running at a time, all users combined
 | | l1 | ncpus = 1 to 128 (1 node); walltime = 00:00:01 to 24:00:00 hours; memory = 512 GB to 1 TB |
 | | l2 | ncpus = 1 to 128 (1 node); walltime = 00:00:01 to 24:00:00 hours; memory = 1 TB to 2 TB |
 | | l3 | ncpus = 1 to 128 (1 node); walltime = 00:00:01 to 24:00:00 hours; memory = 2 TB to 4 TB |
pbs102 | ai | aiq1 | ngpus = 1; walltime = 00:00:01 to 24:00:00 hours |
 | | aiq2 | ngpus = 2 or 3; walltime = 00:00:01 to 24:00:00 hours |
 | | aiq3 | ngpus = 4 (1 node); walltime = 00:00:01 to 24:00:00 hours |
 | | aiq4 | ngpus = 5 to 96; walltime = 00:00:01 to 24:00:00 hours | Maximum 6 chunks of 8 ngpus
 | | aidev | ngpus = 1; walltime = 00:00:01 to 02:00:00 hours |
 | | ailong | ngpus = 1 to 4 (1 node); walltime = 00:00:01 to 120:00:00 hours | Maximum of 1 running job per user

However, these queues do not accept jobs directly; jobs submitted to the normal/ai queues are routed to them automatically. NOTE: There is a system-wide restriction of a maximum of 100 jobs per user, unless otherwise specified at queue level.

  • E: Job is exiting after having run.
  • F: Job is finished. The job has completed execution, failed during execution, or was deleted.
  • H: Job is held. A job is put into a held state by the server, a user or an administrator, and stays held until it is released.
  • M: Job was moved to another server.
  • Q: Job is queued, eligible to run or be routed.
  • R: Job is running.
  • S: Job is suspended by the server. A job is put into the suspended state when a higher priority job needs the resources.
  • T: Job is in transition (being moved to a new location).
  • U: Job is suspended because the workstation became busy.
  • W: Job is waiting for its requested execution time to be reached, or the job specified a stage-in request which failed.
  • X: Subjobs only; the subjob is finished (expired).

Please refer to the PBS Pro manual for more details: https://help.altair.com/2021.1.3/PBS%20Professional/PBSUserGuide2021.1.3.pdf

Job arrays are a great way to organise the execution of multiple short jobs with similar properties, for example when they use similar data with different algorithms or a serially numbered set of input files, e.g. file01, file02, file03.

Example: Submit 10 jobs with consecutive index numbers using a job script:

#!/bin/sh
#PBS -N Simn1010Jobs
#PBS -J 1-10
echo "Main script: index " $PBS_ARRAY_INDEX
/opt/AppA --input /home/user01/runcase1/scriptlet_$PBS_ARRAY_INDEX

An interactive job can be submitted with the “qsub” command. See below for some examples. Submit a job to a CPU node requesting 4 CPU cores and 16 GB of memory:

qsub -I -l select=1:ncpus=4:mem=16gb -l walltime=02:00:00 -P Project-ID -q normal

Submit an interactive job to a GPU node requesting 1 GPU:

qsub -I -l select=1:ngpus=1 -l walltime=02:00:00 -P Project-ID -q normal

Submit an interactive job to AI queue using 1 GPU:

qsub -I -l select=1:ngpus=1 -l walltime=02:00:00 -P Project-ID -q ai

All the AI jobs should be submitted to the “ai” queue. An example of an AI job may look like this:

#!/bin/sh
#PBS -l select=1:ngpus=4
#PBS -l walltime=12:00:00
#PBS -q ai
#PBS -P <project_id>
#PBS -N mxnet_sing
#PBS -j oe
#PBS -o log

image="/app/apps/containers/mxnet/mxnet_22.12-py3.sif"
cd "$PBS_O_WORKDIR" || exit $?
[ -d log ] || mkdir log
module load singularity
datadir=/app/workshops/introductory/aidata/mxnet480
singularity exec --nv -B /scratch:/scratch -B /app:/app -B /home/project:/home/project $image \
    python /opt/mxnet/example/image-classification/train_imagenet.py \
    --gpus 0,1,2,3 \
    --batch-size 512 --num-epochs 1 \
    --data-train $datadir/train_480_100k.rec \
    --data-train-idx $datadir/train_480_100k.idx \
    --disp-batches 10 --network resnet-v1 \
    --num-layers 50 --data-nthreads 32 \
    --min-random-scale 0.533 \
    --max-random-shear-ratio 0 \
    --max-random-rotate-angle 0 \
    --kv-store nccl_allreduce

You can check the status of jobs in the “ai” queue by running the following command:
$ qstat -anstw @pbs102
This provides a detailed view of all jobs on the AI Cluster.

The “M” status indicates that the job has been moved to another queue. In many cases, GPU jobs are automatically moved to the “ai” queue if the associated project has access to AI Cluster resources and there is available capacity on the AI Cluster.
To check the updated status of your job after it has been moved, run:
$ qstat -anstw @pbs102

To use a Singularity image, the singularity module should be loaded. Users can prepare images themselves or use the pre-built images located in /app/apps/containers. For example, to pull an image from Docker Hub:

module load singularity
singularity build ubuntu.sif docker://ubuntu:latest

To run an image:

module load singularity
export SINGULARITY_BIND="/home/project:/home/project,/scratch:/scratch,/app:/app"
singularity exec --nv /app/apps/containers/pytorch/pytorch-nvidia-22.12-py3.sif python thePythonScript.py

The ai queue is only accessible upon request, or if specifically requested during project submission. To get ai queue access, please email your request to [email protected].

There are 3 types of nodes in the cluster based on their memory: 502 GB, 2 TB and 4 TB. A job can request multiple chunks of memory with a maximum of 4 TB (4000 GB) per chunk. For example, -l select=2:ncpus=128:mem=4000g requests 8 TB (8000 GB) of memory in total and will run on 2 nodes. However, the “ai” queues have a fixed CPU and memory ratio enforced, i.e. for every 1 GPU requested, 16 CPUs and 110 GB of memory will be allocated.

A maximum walltime of up to 120 hours can be used in both the normal and ai queues. Kindly refer to FAQ 6.8 for more information.

You may refer to the following links to get started with PBS:

PBS Pro Guide: – Click here

PBS Pro Quick Commands Reference:- Click here

PBS Pro Official Documentation:- Click here

Files and Filesystems

Queries regarding file systems such as home, scratch, project and fileset

By default, any file is restricted to the individual owner and the project/group. If you need to let other users access your files, please contact the service desk (access the Service Desk or contact [email protected]).

Please refer to the Technical Instructions for approved projects.

The scratch disk is a temporary space for your working or transient data. Any files that have not been accessed for more than 30 days may be removed without notice. Therefore please do regular housekeeping and move your files to your home or project space if you still need them.
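For example, a hedged housekeeping command using rsync (the paths and project ID are illustrative):

$ rsync -av /scratch/$USER/important_results/ /home/project/<project_id>/important_results/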

The scratch disk is a temporary space for your working or transient data. Any files more than 30 days old may be removed without notice. Therefore please do regular housekeeping and move your files to your home or project space if you still need them

No, NSCC does not currently have any backup system. If files are deleted, whether accidentally or intentionally, there is no way for us to retrieve them. Please note that the scratch disk is a temporary space for your working or transient data; any files more than 30 days old may be removed without notice. Therefore, please do regular housekeeping and move your files to your home or project space if you still need them.

Software

Details of applications and software that are available in ASPIRE2A

Please refer to our Software List. Once you are logged in to ASPIRE 2A, you can load the module of the application using the following commands:

$ module avail                 # check the available modules
$ module load <module_name>    # load the application

ASPIRE 2A Software Environment

Yes, you can install applications by yourself. Please feel free to install your intended application into your own home directory. Alternatively, you may set up a conda environment and install your desired packages there. Please refer to conda.io for more information.

Yes, but in order to do so, you will need to have a project registered with NSCC. When requesting, please make sure to let us know your project ID. In general, we will ask you to install applications in your home directory by yourself, unless there are any dependencies that can only be solved by the system administrators.

For parallel HDF5, load the module: module load cray-hdf5-parallel/1.12.1.1

Intel MPI is not officially supported on ASPIRE 2A yet, so no such modules are available. However, it is possible to install it in your own project folder.

File Transfer

Queries related to File transfer

You can use one of the file transfer protocols which supports copying over SSH (e.g. scp). Please note that the transfer rate depends on the network speed between ASPIRE 2A and your device.

The easiest way to transfer files from your organisation HPC is to use rsync.

For example, to transfer /scratch/johnsmit/project1 directory in your existing cluster to ASPIRE2A’s /project/johnsmit/ directory, you can run the following command on your cluster:

$ rsync -avz -e ssh /path/to/the/files <user_id>@<login>.nscc.sg:/project/johnsmit/

Note: Please replace <login> with the respective login node listed in the access methods table above. For VPN-connected users (either via a stakeholder VPN or the NSCC VPN), bandwidth may be limited and transfer speeds are expected to be slower.

From a PC: For Microsoft Windows/Mac, the preferred method is to use FileZilla or any other file transfer utility to transfer your local files to the desired location on NSCC. From a Linux machine or Mac terminal, you can use either the scp or rsync command to copy files from your local PC to NSCC.

The simple syntax for rsync is:

rsync -arvz -e ssh /path/to/files [email protected]:/destination/path/in/nscc

From a cluster in your organisation: there is no straightforward method in general, but if your cluster is on the A*STAR extranet, this is possible.

Please try the syntax below:

rsync -arvz -e ssh /path/to/files astar.nscc.sg:/destination/path/in/nscc

For all other users, you will need to copy your files to your PC and transfer back to NSCC.

Please use the FileZilla software if you prefer a GUI for file transfer.

Basic Linux

Basic Linux commands

Please refer to the Linux Tutorial.

For home and project directory:

Users are required to launch an interactive job to a compute node in order to make the changes using the GPFS commands mmgetacl and mmeditacl.

STEP 1 – Submit an interactive job and enter any node.

$ qsub -I -l select=1:ncpus=1:ompthreads=1:mem=2GB -P <project_id> -l walltime=01:00:00

qsub: waiting for job 925856.pbs101 to start
qsub: job 925856.pbs101 ready

STEP 2 – Define which editor will be used.

Select the editor to be used for editing the ACL. In this case, it is vim.

export EDITOR=/bin/vim

STEP 3 – Find the file or folder which you plan to change.

$ /usr/lpp/mmfs/bin/mmeditacl exampleFolder

mmeditacl: Should the modified ACL be applied? (yes) or (no) yes

After running “mmeditacl”, vim will open to edit the ACL of the file. At the end of the file, add the permission entries. In this example, the allowed user ID is “mycolleague”, and the user is allowed to “Read” and “Execute” the folder.

user:mycolleague:r-x-:allow

(X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED

After that, press “ESC” and type “:wq” to save and quit the editor. After saving the file, mmeditacl will prompt “Should the modified ACL be applied?”. Enter “yes” to confirm the change.

STEP 4 – Check the ACL

Run the command “mmgetacl” to check the folder’s permission.

$ /usr/lpp/mmfs/bin/mmgetacl exampleFolder

#owner:aaa
#group:root
special:owner@:rwxc:allow
 (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED
 (-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED
special:group@:----:allow
 (-)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED
 (-)DELETE (-)DELETE_CHILD (-)CHOWN (-)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED
special:everyone@:----:allow
 (-)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED
 (-)DELETE (-)DELETE_CHILD (-)CHOWN (-)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED
user:mycolleague:r-x-:allow
 (X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED
 (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED

STEP 5 – Exit the job

Type “exit” to quit the job.

For the scratch directory:

You can use the “setfacl” command to modify the permissions and let other members access or edit your file or folder.

Set the permission for a file or folder:

setfacl -m u:<user_id>:<permissions> file/folder

Get the permissions:

getfacl file/folder

Remove the permissions:

setfacl -x u:<user_id> file/folder
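A hedged worked example, reusing the “mycolleague” user from the GPFS example above (the path is illustrative):

setfacl -m u:mycolleague:rx /scratch/$USER/shared_results    # grant read and execute
getfacl /scratch/$USER/shared_results                        # verify the ACL
setfacl -x u:mycolleague /scratch/$USER/shared_results       # revoke the entry later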

Miniforge

About Miniforge

Miniforge is a fully open-source project, meaning its code is freely available for anyone to inspect and contribute to. This provides greater transparency and control over your environment compared to Anaconda, which has closed-source components. Anaconda has also updated its terms of service: organisations with more than 200 members require a valid licence. Because many different groups share the NSCC computing resources, it is not practical for NSCC to maintain such a licence. Miniforge covers the basic requirements for creating conda environments and is a good replacement for Anaconda/Miniconda.

Yes, there are some. The main downside is the limited set of pre-installed packages in the base environment. You will need to manually install any additional packages you need in your own conda environment, which can be time-consuming and requires more familiarity compared to Anaconda’s pre-installed options.

Yes. On ASPIRE 2A, just load the Miniforge environment with the command “module load miniforge3”. This loads the Miniforge base environment. The subsequent steps are the same as with Anaconda: you can create your environment and activate it with the conda or mamba commands.

Yes, it is. If there is a meaningful upgrade, we will also upgrade it.

Yes, you can.

Setting up a virtual environment with Miniforge works just as it does with Anaconda/Miniconda. The commands are:

$ module load miniforge3
$ conda create -n <env_name> --yes

The commands “conda” or “mamba” can help you install packages under your environment. The “pip” command is also supported.
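A brief hedged illustration (the environment name and packages are arbitrary examples):

$ module load miniforge3
$ conda activate myenv                       # an environment created earlier with conda create
$ mamba install -c conda-forge numpy scipy
$ pip install requests                       # pip also works inside the activated environment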

Yes, you can.

conda-forge and Anaconda’s channel are both sources for conda packages, but they have some key differences:

Governance:
  • conda-forge: Community-driven and open-source. Anyone can contribute packages and recipes.
  • Anaconda’s channel: Primarily curated by Anaconda Inc., with contributions from select partners.

Package scope and update frequency:
  • conda-forge: Aims to offer the widest possible range of packages, including bleeding-edge versions, pre-releases, and niche tools. Packages are generally updated more frequently, reflecting the faster pace of community development.
  • Anaconda’s channel: Focuses on stable and tested packages, often prioritising official releases and widely used libraries. Updates may be slower, prioritising thorough testing and compatibility.

Transparency:
  • conda-forge: All package recipes and build logs are publicly available for inspection.
  • Anaconda’s channel: Some recipes and build logs may be private or proprietary.

Licensing:
  • conda-forge: Primarily open-source packages, with some exceptions.
  • Anaconda’s channel: May include both open-source and commercially licensed packages.

Summary:
  • Choose conda-forge if you need the latest packages, niche tools, or prefer open-source transparency.
  • Choose Anaconda’s channel if you prioritise stability and want pre-tested, official releases.
