ASPIRE 2A is a supercomputer operated by the National Supercomputing Centre (NSCC) Singapore. It is an AMD-based Cray EX supercomputer with 15 PB of GPFS storage, 10 PB of Lustre storage and a Slingshot interconnect. It is built on an x86 architecture with an HPL CPU benchmark Rpeak of 3.145 PFLOPS (Rmax 2.58 PFLOPS) of compute throughput and I/O performance of up to 500 GB/s. ASPIRE 2A has 768 compute nodes (98,304 cores), 82 GPU nodes (352 GPUs), 16 large-memory nodes (2,048 cores) and 16 high-frequency nodes (1,024 cores), with memory ranging from 512 GB to 4 TB of RAM per node. ASPIRE 2A has login nodes dedicated to each stakeholder.

ASPIRE 2A is built on a RHEL 8 Linux platform and currently does not support Microsoft Windows OS.

The ASPIRE 2A job portal is a graphical user interface that enables users to run, monitor and manage jobs on ASPIRE 2A. It can be accessed at https://jobportal2a.nscc.sg/. Note: the job portal is accessible only via the NSCC VPN. Stakeholder links are work in progress.

The ASPIRE 2A Remote Visualisation Portal is a graphical user interface that enables users to run graphics-intensive applications using an Altair Access Desktop Session. It can be accessed at https://visual2a.nscc.sg/. Note: the visualisation portal is accessible only via the NSCC VPN. Stakeholder links are work in progress.

The SU is the currency used to utilise resources on the ASPIRE 2A system. SU is charged based on the requested CPU and GPU resources as follows:

For CPU-only jobs: 1 SU per ncpu per hour.
For GPU jobs: 64 SU per ngpu per hour (no charge for ncpus).

NOTE: GPU jobs have a fixed CPU and memory ratio enforced, i.e. for every 1 GPU requested, 16 CPUs and 124 GB of memory will be allocated, and the SU charged will be 64.

NSCC's enrolment system is federated with SingAREN to facilitate users from A*STAR, NTU, NUS, SUTD, SMU, SP and TP registering seamlessly to ASPIRE 2A.
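The SU charging rules quoted above can be sketched as simple arithmetic; the helper names and the job sizes below are purely illustrative:

```shell
# Minimal sketch of SU charging, assuming the rates stated above:
# 1 SU per CPU-core-hour; 64 SU per GPU-hour (ncpus are free on GPU jobs).

su_cpu_job() {          # usage: su_cpu_job <ncpus> <hours>
  echo $(( $1 * $2 ))
}

su_gpu_job() {          # usage: su_gpu_job <ngpus> <hours>
  echo $(( $1 * 64 * $2 ))
}

su_cpu_job 128 24       # 128 cores for 24 h -> 3072 SU
su_gpu_job 4 10         # 4 GPUs for 10 h    -> 2560 SU
```

Note that for a GPU job the 16 CPUs and 124 GB allocated per GPU are included in the 64 SU rate, so only ngpus enters the calculation.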
Please follow the instructions in the User Enrolment Guide for Stakeholders to enrol and get access to ASPIRE 2A. If you don't belong to one of the organisations above, please contact [email protected] with the following details:

*Note: Login ID creation is subject to availability and approvals. A new User ID will be assigned automatically if the requested User ID is not available.

Enrolment for NUSEXT users is automated. However, users must enrol with NUS IT HPC first, and can then use the same credentials to enrol with NSCC through the portal once the ID is created in NUS IT HPC.

To utilise ASPIRE 2A resources, you need to log in to the cluster. Please follow the table below to determine how to access ASPIRE 2A.

How to reach the login node via the optimal path: LoginNode or login.asp2a.nscc.sg
1 - ASPIRE 2A VPN for Windows
2 - ASPIRE 2A VPN for Mac
3 - ASPIRE 2A VPN for Linux

For example, a user from NUS can reach the login node using an SSH client via "ssh <userid>@aspire2a.nus.edu.sg".

Common client tools for connecting to login nodes and file transfer:

Follow the instructions below to connect to ASPIRE 2A through the portal.

Below are instructions to connect using SSH/SCP on UNIX/Linux or Mac: if you want to use the X11 interface, replace the above ssh command with "ssh -X" (or "ssh -Y" for trusted X11 forwarding); make sure you have XQuartz installed for OS X 10.8 or higher.

Below are instructions to connect using SSH/SCP on Windows:

Users will be able to access files on the Internet using HTTP or HTTPS, using tools such as curl or wget.

Please log in to https://user.nscc.sg/ with your organisation credentials and click on set/reset your password. Once a new password is set, you can log in to NSCC with the new password. Please follow the Password Reset guide and its step-by-step instructions.
Note that the password must contain:

If you still cannot access your account after resetting your password, please contact our help desk at [email protected].

The ASPIRE 2A SSH session idle timeout is set to 12 hours; the SSH session will terminate automatically after that. This is a security measure to ensure that NSCC offers a safe environment to all users of ASPIRE 2A.

The NSCC account password expires every 90 days. Please remember to reset your password before it expires. If you do not reset your password within 90 days, your account will expire. Upon expiry of the account you will not be able to log in to the NSCC system, and the messages shown below will be displayed when using the SSH connection. There is a grace period of 90 days on top of the 90-day expiry, after which you must contact [email protected] in order to reactivate the account; it is mandatory to reset the password after the account is reactivated. Files in the home directory are untouched for 90 days, after which they will be archived, while files in scratch still follow the purge policy.

To reactivate the account, log in to https://user.nscc.sg/ with your organisation credentials and click on set/reset your password. Once a new password is set, you can log in to NSCC with the new password. PS: Changing the password through the passwd command has no effect on the expiry date. If your account is still not reactivated after resetting your password, please contact our help desk at [email protected].

The ASPIRE 2A VPN connection remains valid for up to 24 hours. The VPN connection will automatically terminate after the time limit is reached, and users are required to reconnect to re-establish the VPN connection.

Only the Duo 2FA mobile app is supported currently; NSCC is in the midst of exploring other alternatives. Only the Checkpoint VPN client is supported. Yes.
You can SSH to the compute nodes where your jobs are running, and only via the login nodes. To access the compute node of a running job through SSH via a login node, you need to export the job ID in the environment first. For example, if the job ID is 123456.pbs101 and the nodes assigned are x1000c0s0b0n1 and x1000c1s6b0n1, you can do the following:

$ export PBS_JOBID=123456.pbs101

Then SSH will be allowed to x1000c0s0b0n1 and x1000c1s6b0n1 from the node where you exported PBS_JOBID, and the SSH sessions will be attached to that PBS job.

$ ssh x1000c0s0b0n1

Subsequent SSH sessions from any of the compute nodes will already have PBS_JOBID exported and will not require you to export PBS_JOBID again. To check the job ID currently exported in the session on the login node, you can do:

$ echo $PBS_JOBID

A personal quota is granted to users from all stakeholder organisations (A*STAR, NTU, NUS, SUTD) and IHLs (autonomous universities and polytechnics) upon account creation. The quota is fixed, non-transferable and cannot be extended. Once the personal quota is exceeded, you can only submit a job through an approved project. Users are encouraged to apply for projects only if their resource request exceeds the personal quota, or if they have depleted their personal quota.

For the Call for Research Project: For the Call for Educational HPC Project: Please refer to this page for more information on NSCC's Project Calls and their eligibility criteria.

The project allocations are reviewed by the Policy and Resource Allocation Committee (PRAC) and the Technical Resource Allocation Committee (TRAC), with representatives from the NSCC stakeholders, based on scientific merit and technical feasibility. The criteria for resource request approval include:

The expected turnaround time from project application to approval is 2 months. The project applicant can expect to receive the notification via email one month before the start of the allocation period for the respective calls.
Example 1: For the Jul-cycle Call for Research Project, approval will be sent to the project applicant in June via email. The resources will be provisioned in July.

Example 2: For the Jan-cycle Call for Research Project, approval will be sent to the project applicant in December via email. The resources will be provisioned in January.

Example 3: For the Call for Educational HPC Project, approval will be sent to the project applicant via email within 2 months after the closing of the application period, e.g. if the application period closes on 31 Jan, the notification email will be sent out by 31 Mar at the latest.

Please note that only the project applicant will be able to see the project in the Project Portal, and only projects allocated under the Call for Research Project are indicated in the portal. You may use the following command to check all projects that you are part of:

$ myprojects

Yes, but please note that only the PI and/or the project applicant can transfer the ownership of a project. The current owner or PI of the project should email [email protected] with the following information:

The project applicant will act as the owner and will be responsible for all correspondence for the project, including requests for addition/removal of members, requests for additional resources and submission of renewal applications, etc. All project-related notifications will be communicated to the project applicant via email.

The PI/applicant of the project should send an email to [email protected] with the following details:

Please note that requests made outside the project allocation calls will be reviewed by the Resource Allocation Committee within 2 weeks to 3 months, depending on the scale of the request.

Yes, projects that are not renewed will expire at the end of the 1-year project cycle. Please renew your project during the project call window if you wish to extend your project. When your project ends, please update the project deliverables in the Project Portal.
At the end of the project cycle, all resources granted will expire. Please plan your resource utilisation wisely and use your allocated resources as early as possible, to avoid encountering resource bottlenecks towards the end of the project cycle. Please submit a project renewal application during the project call window.

For the Call for Research Project: Please note that unutilised resources will not be brought forward.
For the Call for Educational HPC Project:

For the Call for Research Project: At the end of the project cycle, you will be requested to update your project deliverables.
For the Call for Educational HPC Project, and all other projects not found in the Projects Portal: please submit your deliverables update via this form.

Please contact [email protected] if you would like to apply for resources other than through a project call.

The disk quota allocated is limited as stated below:

/home - Home directory - 50GB. This will not be extended under any circumstances. If more disk space is needed, you should apply for a project through the Project Portal.

You can check your current disk usage on /home by executing the "myquota" command:

$ myquota
+--------------------------+---------------------+---------------------+
|                          |     Block Limit     |     Inode Limit     |
| Filesystem   Type        |   Usage     Quota   |   Usage     Quota   |
+--------------------------+---------------------+---------------------+
| asp2a-home GPFS-FILESET  |   0.00K    50.00G   |       1         0   |
| scratch    LUSTRE        |  416.3M      100T   |   11656 200000000   |
+--------------------------+---------------------+---------------------+

A purging policy is implemented on the scratch directory: files unused for more than 30 days will be purged automatically. Files under home or project directories are untouched.

You can check your current CPU usage by executing the "myusage" command.
$ myusage
Your usage as of 13/01/2023-15:31:57
+--------------+----------+-------------+-----------------+
| Time Range   | Resource | Num of Jobs | Core/Card Hours |
+--------------+----------+-------------+-----------------+
| Past 7 days  | CPU      |          55 |       15480.207 |
|              | GPU      |           0 |           0.000 |
| Past 30 days | CPU      |          67 |       63819.158 |
|              | GPU      |           0 |           0.000 |
| Overall      | CPU      |          67 |       63819.158 |
|              | GPU      |           0 |           0.000 |
+--------------+----------+-------------+-----------------+

Upon logging in to ASPIRE 2A, you will see your SU balance statement with the usage split between CPU and GPU resources. The actual remaining balance is shown against the "SU" resource. The "Grant", "Used" and "Balance" columns for these resources reflect the project allocation approved, used and remaining respectively.

+--------+---------------+--------------+--------------+
| Unit   |         Grant |         Used |      Balance |
+--------+---------------+--------------+--------------+
| SU     |  10512000.000 |  1342398.472 |  9154217.528 |
| CPU/HR |  10000000.000 |  1101916.673 |  8898083.327 |
| GPU/HR |      8000.000 |     3757.000 |     4243.000 |
+--------+---------------+--------------+--------------+

Total SU Grant: 10512000

"Total SU Grant" corresponds to the amount of resource that was granted. Since the accounting happens in prepaid mode, SU will be fully deducted for the requested resources and will only be refunded once the job is completed.
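The prepaid hold-and-refund behaviour can be sketched with hypothetical numbers (the 50-hour request and 10-hour actual runtime mirror the example discussed in this FAQ; the job size is illustrative):

```shell
# Sketch of the prepaid SU accounting described here: the full request is
# held at submission, and the unused portion is refunded on completion.
# Assumes a CPU-only job at 1 SU per core-hour; numbers are hypothetical.
ncpus=64
requested_h=50            # walltime requested at submission
actual_h=10               # walltime the job actually used

held=$(( ncpus * requested_h ))   # SU marked as pending at submission
charged=$(( ncpus * actual_h ))   # SU actually consumed
refund=$(( held - charged ))      # returned only when the job completes

echo "held=$held charged=$charged refund=$refund"
```

This is why requesting a walltime close to the real need matters: the held amount counts against your balance until the job finishes.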
Please note that once Used + total SU held for running jobs >= Grant, subsequent jobs will be put ON HOLD and no new jobs will be allowed. New jobs will only be allowed once they fit into the Balance. For example, if you request 50 hours for your job but in actuality your job only needs 10 hours to complete, the AMS will hold back the requested 50 hours (marking them as "pending core hours") until the 10-hour job is completed. The remaining 40 hours will only be released back to the user when the job is completed. What is categorised as "pending core hours" is deducted from your "available core hours". Therefore, if you do not request the right amount of walltime for your job, you might temporarily see fewer available core hours than you actually have.

When logged in to the NSCC login nodes, you will be able to see the utilisation of the projects assigned to you on the MOTD page. To check the usage at any point in time, please execute the command:

$ myprojects -p <project ID>

For example, to view the project summary of project 13001717, execute the command:

$ myprojects -p 13001717

A similar output to that shown below will be displayed:

$ myprojects -p 13001717
Project : 13001717
ExpDate : 2023-06-30
Members : <members of this project will be displayed here>
Project balance as of 13/01/2023-15:32:03
+--------+---------------+--------------+--------------+
| Unit   |         Grant |         Used |      Balance |
+--------+---------------+--------------+--------------+
| SU     |  10512000.000 |  1342398.472 |  9154217.528 |
| CPU/HR |  10000000.000 |  1101916.673 |  8898083.327 |
| GPU/HR |      8000.000 |     3757.000 |     4243.000 |
+--------+---------------+--------------+--------------+

To check the storage disk usage, execute the command:

$ myquota -p <project ID>

For example, for project 13001717, execute the command:
$ myquota -p 13001717

A similar output to that shown below will be displayed:

$ myquota -p 13001717
+--------------------------+---------------------+---------------------+
|                          |     Block Limit     |     Inode Limit     |
| Filesystem   Type        |   Usage     Quota   |   Usage     Quota   |
+--------------------------+---------------------+---------------------+
| asp2a-data GPFS-FILESET  |   8.07G     3.00T   |    3972         0   |
+--------------------------+---------------------+---------------------+

Upon logging in to ASPIRE 1, you will be able to see the amount of core hours remaining in all the projects of which you are a member, as seen below:

Core hours remaining for project:
Total Grant: 1000000.00

"Total Grant" corresponds to the amount of resource that was granted. Be aware that once Used + Pending >= Granted, subsequent jobs will be put ON HOLD and no new jobs will be allowed. For example, if you request 50 hours for your job but your job actually only needs 10 hours to complete, the AMS will hold back the requested 50 hours (marking them as "pending core hours") until the 10-hour job is completed. The remaining 40 hours will only be released back to the user at that point. What is categorised as "pending core hours" is deducted from your "available core hours", so if you do not request the right amount of walltime for your job, you might temporarily see fewer available core hours than you actually have.

MPI is a message-passing library interface standard and includes MPI-1, MPI-2 and MPI-3. It is one of the most widely used technologies to parallelise computing tasks. There are various implementations of MPI, such as OpenMPI, Cray MPICH and Intel MPI. The MPI standard defines programming interfaces for C and Fortran, which means that all codes written in C/Fortran can use MPI for parallelisation. There are also MPI bindings for Python and Java.
ASPIRE 2A is a Cray cluster and supports Cray MPICH, which can offer better performance than other MPI implementations. There are several reasons:

For example, the command "mpirun -np 32 --cpu-binding=depth -d 2 gmx_mpi mdrun ..." will launch 32 MPI processes with 2 OpenMP threads each, distributing the work across 64 CPU cores.

Cray MPICH is the default MPI environment and is also the MPI implementation supported by Cray. We encourage users to try Cray MPICH first. The module name is "cray-mpich", and the MPI compiler wrappers are cc for MPI C codes, CC for MPI C++ codes and ftn for MPI Fortran codes. Other MPI implementations available on ASPIRE 2A include OpenMPI and MVAPICH; their compilers are mpicc for C codes, mpic++ for C++ and mpif77/mpif90 for Fortran codes. Please note that support for these implementations is limited.

One of the techniques to run your code in parallel is OpenMP. Codes using OpenMP are normally restricted to run within one physical server, unless you are running a hybrid code made of MPI and OpenMP; OpenMP codes are not capable of communicating across nodes through a network infrastructure such as InfiniBand. OpenMP code is simple to use and does not need any wrappers. Standard compilers like GCC/gfortran or Intel C/Fortran can use OpenMP, but the program must contain OpenMP directives. In the job script, the environment variable OMP_NUM_THREADS should be used to manage the number of OpenMP threads in the application. For example, "export OMP_NUM_THREADS=4" will allow the application to use 4 OpenMP threads in one computing task.

An application that combines OpenMP and MPI, when run with Cray MPICH, can set up the OpenMP threads with the "-d" parameter. For example, "mpirun -np 12 --cpu-bind depth -d 4 <app>" will run the application with 12 MPI processes, each MPI process having 4 OpenMP threads running on dedicated cores.

Note: use --cpu-bind depth to control how tasks are bound to cores. Options include using -d 4 to specify the number of CPUs (cores) to be allocated per process.
This may be useful if the job is multithreaded and requires more than one CPU per task for optimal performance.

A serial application uses only one process at a time to perform calculations, while a parallel application can scale to multiple processes and make use of cores/CPUs on one or multiple servers over a high-speed interconnect. For this reason, parallel applications can generally shorten the wall time of computing tasks. It is strongly recommended to use parallel codes on ASPIRE 2A where applicable.

Checkpointing is a technique used to save the program state periodically during the runtime of an application. The advantage of checkpointing is that it is possible to restart the program from the last checkpoint should the execution terminate abruptly for any reason. You are advised to use the checkpoint-and-restart technique when writing your application, so that if the program is terminated you can restart the job from the last checkpoint.

The latest compiling tools and environments are supported on ASPIRE 2A, and all compiler environments are managed via modules. The default environment is PrgEnv-cray. The others are:

PrgEnv-gnu for the GNU GCC compiler
PrgEnv-intel for the Intel compiler
PrgEnv-nvhpc for OpenACC support
PrgEnv-aocc for the AMD AOCC compiler
craype-accel-nvidia80 for the GPU CUDA compiler

The programming environment can be changed via the "module swap" command, for example "module swap PrgEnv-cray PrgEnv-intel". When choosing a compiler, please refer to the following compiler table:

If your codes are developed with CUDA (*.cu), they can be compiled with NVIDIA CUDA. On ASPIRE 2A, the package name is "craype-accel-nvidia80", and it can work with the PrgEnv-cray, PrgEnv-gnu, PrgEnv-intel and PrgEnv-aocc modules to compile applications. Use cc to compile C codes, CC to compile C++ codes, and nvcc for CUDA codes.
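Returning to the checkpointing technique described above, here is a minimal illustrative sketch (not NSCC-specific; the file names are hypothetical). The loop records its progress after every step, so a restarted run resumes from the last saved step instead of from zero:

```shell
# Illustrative checkpoint/restart pattern: save progress to a file after
# each unit of work; on startup, resume from the checkpoint if it exists.
workdir=$(mktemp -d)          # throwaway directory for the demo
ckpt="$workdir/ckpt.txt"

i=0
[ -f "$ckpt" ] && i=$(cat "$ckpt")   # resume from the last checkpoint
while [ "$i" -lt 5 ]; do
  i=$(( i + 1 ))
  # ... the real computation for step $i would go here ...
  echo "$i" > "$ckpt"                # checkpoint after each step
done
echo "finished at step $(cat "$ckpt")"
```

Real applications usually checkpoint large state (restart files) at a coarser interval; the pattern of "write checkpoint, and on startup read it if present" is the same.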
Currently, NSCC does not support compilers other than the versions mentioned above. If you are keen to use other compilers, you are more than welcome to install them into your own project or home directory. Please contact our service desk (access the Service Desk or contact [email protected]) if you face any difficulties during the installation.

The Cray Scientific libraries (cray-libsci, cray-fftw) are available by default on ASPIRE 2A. The Intel MKL math libraries are also available.

Commercial applications can be installed on ASPIRE 2A, but they need to be installed and used according to their respective licensing terms. Firstly, the licence and the software modules of the intended application need to be provided by you to NSCC for installation. Based on the licence terms and conditions, we will then determine whether the software should be installed on ASPIRE 2A and how it should be installed. The same rule applies to applications with academic licences. Please note that accountability for the usage of the software is held by the owner of the licence. For more information, please contact our service desk.

Yes, you can request additional libraries or applications through the Service Desk portal. However, the installation of libraries is subject to various conditions, such as compatibility, the time required to make the library available, dependencies and our software policies. Please contact the Service Desk for further clarification; our technical specialists will respond to you.

Yes, you can create your own virtual Python environment with Anaconda, Miniconda or one of the other Python distributions, and install Jupyter Lab/Notebook in that environment. Jupyter Lab/Notebook can be accessed through the VIS nodes or through batch jobs.

As the underlying hardware and modules are different on ASPIRE 2A, you will need to recompile your codes. You can list all the available modules with the "module avail" command, or refer here for more info.
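As a sketch of the personal Python environment mentioned in the Jupyter answer above, the steps look like this (venv is shown; a conda environment is set up analogously with `conda create`; the path is illustrative):

```shell
# Hedged sketch: create a personal Python environment into which
# JupyterLab could then be pip-installed. Directory name is illustrative.
envdir=$(mktemp -d)/jlab-env
python3 -m venv "$envdir"          # create the isolated environment
. "$envdir/bin/activate"           # activate it in the current shell
python -c 'import sys; print(sys.prefix)'   # confirm the active prefix
# pip install jupyterlab           # then: jupyter lab --no-browser
```

On the cluster you would place the environment under your home or project directory rather than a temporary path, and launch the notebook from a VIS node or batch job as described above.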
To find out why your jobs are not running, run "qstat -s" to print the comments from the job scheduler about your job. If you see a "Q" in the column "S", it means the scheduler has not executed your job yet. A few common reasons are listed below:

Please contact our service desk (access the Service Desk or contact [email protected]) if you still face any issues running your job on ASPIRE 2A.

This error could be because you have forgotten to source the appropriate system rc file in your personal rc file, e.g. "source /etc/csh.cshrc". If you have accidentally deleted the .bashrc file, you can copy back the original .bashrc file from /etc/skel/.bashrc. Example:

$ cp /etc/skel/.bash* ~/

Note: Avoid using the command "module purge" to remove all modules. Use "module rm" or "module swap" to unload or swap modules instead.

If your batch job is named runjob.sh and your output is not redirected in the batch script, your job output will appear in runjob.sh.o****, where the digits at the end of the file name represent your job ID. The final entries in the .o file give you the details of the wall time and virtual memory used by the job. Similarly, error messages from your job will be recorded in runjob.sh.e****, where the digits at the end of the file name again represent your job ID.

In a PBS job script, the memory you specify in a select statement is a per-chunk (per-node) limit, and this is how it is monitored. For example, if you request -l select=2:ncpus=128:mem=10GB, the actual limit will be 10GB on each of the two nodes; if you exceed this on either of the nodes, your job will be killed. Please note that if a job runs for less than a few minutes, the memory use reported in your .o file once the job completes may be inaccurate.

We strongly discourage running short jobs of less than 1 hour.
This is because there is significant overhead in setting up and tearing down a job, and you may end up wasting large amounts of your grant. Instead, if you have many short jobs to run, consider merging them into one longer job.

Please note that you are not supposed to run any job on the login nodes; we automatically kill user jobs running on them. If you want to run an interactive session, please submit an interactive job to the compute nodes using the qsub -I command:

$ qsub -I

The queues available for job submission are:

The "normal" queue routes jobs to various other queues on the EX system based on the resources requested; the details are outlined in the next question. Both CPU and GPU jobs can be submitted. This queue is suitable for both CPU and GPU jobs, but not for AI jobs, which typically require higher IOPS and access to large numbers of small files.

The "ai" queue routes jobs to various queues on the accelerator system, which is dedicated to GPU-only workloads with NVMe as local storage, to boost IOPS and cater for access patterns involving large numbers of small files. The NVMe storage is mounted at /raid as a local disk on each AI node.

There are several queues in the system with different limits that suit different workloads. Below are some of the queues in the system.
[The original queue table did not survive extraction; the recoverable limits are summarised below. Queue names are omitted.]

- Short queues: walltime = 00:00:01 to 02:00:00 hours
- Standard queues: walltime = 02:00:01 to 24:00:00 hours
- Long queues: walltime = 24:00:01 to 120:00:00 hours
- GPU queue: maximum 16 ngpus running at a time, all users combined; walltime = 00:00:01 to 24:00:00 hours
- Large-memory queues: memory = 512 GB to 1 TB, 1 TB to 2 TB, or 2 TB to 4 TB; walltime = 00:00:01 to 24:00:00 hours

However, these queues do not accept jobs directly. Rather, jobs are routed to them once submitted to the normal/ai queue.

NOTE: There is a system-wide restriction of a maximum of 100 jobs per user, unless otherwise specified at queue level.

Job state codes reported by the scheduler include:
Q: Job is queued
R: Job is running
H: Job is held
E: Job is exiting after having run

Please refer to the PBS Pro manual for more details.

Job arrays are a great way to organise the execution of multiple short jobs with similar properties, or jobs that use similar data with different algorithms, or jobs that use a serial input-file numbering system, e.g. file01, file02, file03. Example: submit 10 jobs with consecutive index numbers.
Example using a job script:

#!/bin/sh
#PBS -N Simn1010Jobs
#PBS -J 1-10
echo "Main script: index " $PBS_ARRAY_INDEX
/opt/AppA -input /home/user01/runcase1/scriptlet_$PBS_ARRAY_INDEX

An interactive job can be submitted with the "qsub" command. See below for some examples.

Submit a job to a CPU node, asking for 4 CPU cores and 16 GB of memory:

Submit a job to a GPU node, asking for 1 GPU:

Submit an interactive job to the AI queue using 1 GPU:

All AI jobs should be submitted to the "ai" queue. An example AI job may look like this:

#PBS -l select=1:ngpus=4
#PBS -l walltime=12:00:00
#PBS -q ai
#PBS -P <projectID>
#PBS -N mxnet_sing
#PBS -j oe
#PBS -o log
image="/app/apps/containers/mxnet/mxnet_22.12-py3.sif"
cd "$PBS_O_WORKDIR" || exit $?
[ -d log ] || mkdir log
module load singularity
datadir=/app/workshops/introductory/aidata/mxnet480
singularity exec --nv -B /scratch:/scratch -B /app:/app -B /home/project:/home/project $image \
    python /opt/mxnet/example/image-classification/train_imagenet.py \
    --gpus 0,1,2,3 \
    --batch-size 512 --num-epochs 1 \
    --data-train $datadir/train_480_100k.rec \
    --data-train-idx $datadir/train_480_100k.idx \
    --disp-batches 10 --network resnet-v1 \
    --num-layers 50 --data-nthreads 32 \
    --min-random-scale 0.533 \
    --max-random-shear-ratio 0 \
    --max-random-rotate-angle 0 \
    --kv-store nccl_allreduce

To use a Singularity image, the singularity module should be loaded. Users can prepare images by themselves or use the pre-built images located in "/app/apps/containers". For example, to pull an image from Docker Hub:

singularity build ubuntu.sif docker://ubuntu:latest

To run an image:

export SINGULARITY_BIND="/home/project:/home/project,/scratch:/scratch,/app:/app"
singularity exec --nv /app/apps/containers/pytorch/pytorch-nvidia-22.12-py3.sif python thePythonScript.py

The AI queue is only accessible upon request, or if specifically mentioned during project submission. To get AI queue access, please email your request to [email protected].
There are 3 types of nodes in the cluster based on their memory: 502 GB, 2 TB and 4 TB. A job can request multiple chunks of memory, with a maximum of 4 TB (4000 GB) per chunk. For example, -l select=2:ncpus=128:mem=4000g is a job requesting 8 TB (8000 GB) of memory in total, which will run on 2 nodes. However, the "ai" queues have a fixed CPU and memory ratio enforced, i.e. for every 1 GPU requested, 16 CPUs and 110 GB of memory will be allocated.

A maximum walltime of up to 120 hours can be used in both the normal and AI queues. Kindly refer to FAQ 6.8 for more info.

You may refer to the following links to get started with PBS:
PBS Pro Quick Start Guide: http://help.nscc.sg/pbspro-quickstartguide/
PBS Pro Quick Commands Reference: PBS_Professional_Quick_Reference_31Jan2023
PBS Pro Official Documentation:

By default, access to any file is restricted to the individual and the project/group. If you need to let other users access your files, please contact the service desk (access the Service Desk or contact [email protected]).

The scratch disk is a temporary space for your working or transient data. Any files that have not been accessed for more than 30 days may be removed without notice. Therefore, please do regular housekeeping and move your files to your home or project space if you still need them.

The scratch disk is a temporary space for your working or transient data. Any files more than 30 days old may be removed without notice. Therefore, please do regular housekeeping and move your files to your home or project space if you still need them.

No, NSCC does not have any backup system currently. If files are deleted, whether accidentally or intentionally, there is no way we can retrieve them. Please note that the scratch disk is a temporary space for your working or transient data. Any files more than 30 days old may be removed without notice.
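A hedged sketch for spotting files that would meet the 30-day age criterion mentioned above, demonstrated on a throwaway directory (on the cluster you would point find at your own scratch area, e.g. /scratch/$USER; file names here are illustrative):

```shell
# List files not accessed for more than 30 days using find's -atime test.
# A demo directory is used so the snippet is self-contained; the stale
# file's access time is backdated with GNU touch.
dir=$(mktemp -d)
touch "$dir/recent.dat"
touch -a -d '40 days ago' "$dir/stale.dat"   # backdate atime (GNU touch)
find "$dir" -type f -atime +30               # prints only stale.dat
```

Running such a listing periodically makes it easy to move still-needed files to home or project space before the purge catches them.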
Therefore, please do regular housekeeping and move your files to your home or project space if you still need them.

Please refer to our software list (https://help.nscc.sg/wp-content/uploads/NSCC-ASPIRE-2A-Software-List_as-of-2-Sep-2022.pdf). Once you're logged in to ASPIRE 2A, you can load the module for an application using the following commands:

$ module avail                        # check the available modules
$ module load <application name>      # load the application

ASPIRE 2A Software Environment

Yes, you can install applications by yourself. Please feel free to install your intended application into your own home directory. Alternatively, you may set up a conda environment and install your desired packages there. Please refer to conda.io for more information.

Yes, but in order to do so, you will need to have a project registered with NSCC. When requesting, please make sure to let us know your project ID. In general, we will ask you to install applications in your home directory yourself, unless there are dependencies that can only be resolved by the system administrators.

For Parallel HDF5, use: module load cray-hdf5-parallel/1.12.1.1

Intel MPI is not officially supported on ASPIRE 2A yet, so no such modules are available. There is an interface to connect with the Intel MPI page, so it is possible to install it in the user's project folder.

You can use any file transfer protocol which supports copying over SSH (e.g. scp). Please note that the transfer rate depends on the network speed between ASPIRE 2A and your device. The easiest way to transfer files from your organisation's HPC system is to use rsync.
For example, to transfer the /scratch/johnsmit/project1 directory on your existing cluster to ASPIRE 2A's /project/johnsmit/ directory, you can run the following command on your cluster:

$ rsync -avz -e ssh /path/to/the/files <userid>@<loginnode>.nscc.sg:/project/johnsmit/

Note: Please replace the login node with the respective login node mentioned in the access methods table. For VPN-connected users (via either a stakeholder VPN or the NSCC VPN), bandwidth may be limited and transfer speed is expected to be slow.

From a PC: For Microsoft Windows/Mac, the preferred/suggested method is to use FileZilla or any other file transfer utility to transfer your local files to the desired location on NSCC. From a Linux machine or Mac terminal, you can use either the scp or the rsync command to copy files from your local PC to NSCC. The basic rsync syntax is:

<localpc>: rsync -arvz -e ssh /path/to/files userid@<loginnode>.nscc.sg:/destination/path/in/nscc

From a cluster in your organisation: If your cluster is published on the internet, there is no straightforward method, but if your cluster is on the A*STAR extranet this may be possible. Please try the syntax below:

<your cluster>: rsync -arvz -e ssh /path/to/files astar.nscc.sg:/destination/path/in/nscc

For all other users, you will need to copy your files to your PC and transfer them back to NSCC. Please use the FileZilla software for GUI-based file transfer.

Please refer to the Linux Tutorial.

Users are required to launch an interactive job on a compute node in order to change permissions using the GPFS commands mmgetacl and mmeditacl.

$ qsub -I -l select=1:ncpus=1:ompthreads=1:mem=2GB -P <yourProject> -l walltime=01:00:00
qsub: waiting for job 925856.pbs101 to start
qsub: job 925856.pbs101 ready

Select the configuration file editor. In this case, it is vim.

mmeditacl: Should the modified ACL be applied? (yes) or (no) yes

After running mmeditacl, vim will open to edit the ACL of the file. At the end of the file, add the permission entries. In this example, the allowed user ID is "mycolleague".
Allow the user to "Read" and "Execute" the folder:

(X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED
(-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED

After that, press "ESC" and type ":wq" to save and quit the editor. After saving the file, mmeditacl will prompt: "Should the modified ACL be applied?" Enter "yes" to confirm the change.

Run the command mmgetacl to check the folder's permissions:

#NFSv4 ACL
#owner:aaa
#group:root
special:owner@:rwxc:allow
(X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED
(-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED
special:group@:----:allow
(-)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED
(-)DELETE (-)DELETE_CHILD (-)CHOWN (-)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED
special:everyone@:----:allow
(-)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED
(-)DELETE (-)DELETE_CHILD (-)CHOWN (-)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED
user:mycolleague:r-x-:allow
(X)READ/LIST (-)WRITE/CREATE (-)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED
(-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED

Type "exit" to quit the job.

You can use the setfacl command to modify permissions and let other members access or edit your file or folder.

Miniforge is a fully open-source project, meaning its code is freely available for anyone to inspect and contribute to. This provides greater transparency and control over your environment compared to Anaconda, which has closed-source components. Anaconda updated its terms of service: for organisations with more than 200 members, a valid license is required.
However, on the NSCC cluster different groups share the computing resources, which makes it difficult for NSCC to pay for such a license. Miniforge meets the basic requirements for creating conda environments and is a good replacement for Anaconda/Miniconda.

Yes, there are some. The main downside is the limited set of pre-installed packages in the base environment. You'll need to manually install any additional packages you need in your own conda environment, which can be time-consuming and requires more technical knowledge compared to Anaconda's pre-installed options.

The answer is YES. On ASPIRE 2A, just load the Miniforge environment with the command "module load miniforge3". The base environment of Miniforge is then loaded into your session. The subsequent steps are the same as with Anaconda: you can create your environment and activate it with the conda or mamba command.

Yes, it is. If there is a meaningful upgrade, we will upgrade it as well.

Yes, you can set up a virtual environment with Miniforge just as with Anaconda/Miniconda. The commands are:

$ module load miniforge3
$ conda create -n <name of virtual env.> --yes

The conda or mamba command can help you install packages into your environment. The pip command is also supported.

Yes, you can. conda-forge and Anaconda's channel are both sources for conda packages, but they have some key differences:

Governance: conda-forge is community-driven and open-source; anyone can contribute packages and recipes. Anaconda's channel is primarily curated by Anaconda Inc., with contributions from select partners.

Package scope: conda-forge aims to offer the widest possible range of packages, including bleeding-edge versions, pre-releases, and niche tools. Anaconda's channel focuses on stable and tested packages, often prioritising official releases and widely used libraries.

Update frequency: conda-forge packages are generally updated more frequently, reflecting the faster pace of community development. Anaconda's channel updates may be slower, prioritising thorough testing and compatibility.

Transparency: On conda-forge, all package recipes and build logs are publicly available for inspection. On Anaconda's channel, some recipes and build logs may be private or proprietary.

Licensing: conda-forge hosts primarily open-source packages, with some exceptions. Anaconda's channel may include both open-source and commercially licensed packages.

Summary: Choose conda-forge if you need the latest packages, niche tools, or prefer open-source transparency. Choose Anaconda's channel if you prioritise stability and want pre-tested, official releases.
| Server | CPU Model | Cores per socket | Sockets per server | Total physical cores per server | Available RAM | GPUs per server |
|---|---|---|---|---|---|---|
| Standard Compute Node | Dual-CPU AMD EPYC Milan 7713 @ 2.0 GHz | 64 | 2 | 128 | 512 GB | No GPU |
| GPU Compute Node | Single-CPU AMD 7713 @ 2.0 GHz | 64 | 1 | 64 | 512 GB | 4x NVIDIA A100 40 GB |
| GPU AI Node | Single-CPU AMD 7713 @ 2.0 GHz | 64 | 1 | 64 | 512 GB | 4x NVIDIA A100 40 GB |
| GPU AI Node | Dual-CPU AMD 7713 @ 2.0 GHz | 64 | 2 | 128 | 1 TB | 8x NVIDIA A100 40 GB |
| Large Memory Node | Dual-CPU AMD 7713 @ 2.0 GHz | 64 | 2 | 128 | 2 TB | No GPU |
| Large Memory Node | Dual-CPU AMD 7713 @ 2.0 GHz | 64 | 2 | 128 | 4 TB | No GPU |
| High Frequency Node | Dual-CPU AMD 75F3 @ 2.95 GHz | 32 | 2 | 64 | 512 GB | No GPU |
| User From Entity | Host/FQDN | If connecting from outside the campus network |
|---|---|---|
| NUS | aspire2a.nus.edu.sg | Connect to the NUS VPN first |
| NTU | aspire2antu.nscc.sg | NTU users will need to request the NTU jumphost to access ASPIRE 2A via aspire2antu.nscc.sg. Please email your jumphost access request to [email protected]. How to access the jump host: Using-NTU-JumpHost-to-NSCC-ASPIRE-2A |
| A*STAR | aspire2a.a-star.edu.sg | Connect to the A*STAR VPN first |
| SUTD | aspire2a.sutd.edu.sg | Connect to the SUTD VPN first |
| Direct User | aspire2a.nscc.sg | For all users not from the above categories, please connect to the ASPIRE 2A VPN first before accessing the login nodes. Kindly refer to the VPN guides here: |
| Operating System | Function | Tools |
|---|---|---|
| Windows | SSH Secure Client | PuTTY, MobaXterm |
| Windows | File Transfer | FileZilla, WinSCP |
| Linux/Unix | SSH Secure Client | Terminal or ssh |
| Linux/Unix | File Transfer | FileZilla, scp, rsync -e ssh |
| MacOS | SSH Secure Client | ssh |
| MacOS | File Transfer | scp, rsync -e ssh |
| Any OS | ASPIRE 2A Job Portal | https://jobportal2a.nscc.sg/ |
| Any OS | ASPIRE 2A Remote Visualisation Portal | https://visual2a.nscc.sg/ |
$ export PBS_JOBID=123456.pbs101
$ echo $PBS_JOBID
/scratch - Scratch directory - 100 TB
Total SU Used: 1342398.472
Total SU Balance: 9154217.528
"Total SU Used" corresponds to the amount consumed by completed jobs.
"Total SU Balance" corresponds to the remaining amount available at that instant (i.e. "Total SU Grant" - "Total SU Used" - "Total SU for running jobs").
Total Used: 200000.00
Total Pending: 50000.00
Total Avail: 7500000.00
"Total Used" corresponds to the amount consumed so far.
"Total Pending" corresponds to the amount currently requested (i.e. #running_jobs * per_job_cores_requested * per_job_walltime_requested).
"Total Avail" corresponds to "Total Grant" - "Total Used" - "Total Pending".
New jobs are admitted only when their requested amount fits within "Total Avail" (where Avail = Granted - Used - Pending).
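The balance arithmetic above can be sketched as follows. The helper names are hypothetical, but the formula is the one stated (Avail = Granted - Used - Pending), and the figures match the sample report shown above, which implies a grant of 7,750,000:

```python
def total_avail(grant: float, used: float, pending: float) -> float:
    """Avail = Granted - Used - Pending, as described above."""
    return grant - used - pending

def job_allowed(request: float, grant: float, used: float, pending: float) -> bool:
    """A new job is admitted only if its requested amount fits into Avail."""
    return request <= total_avail(grant, used, pending)

# Figures matching the sample report above (grant inferred as 7,750,000)
avail = total_avail(7_750_000.0, 200_000.0, 50_000.0)
print(avail)  # 7500000.0, matching "Total Avail" above
print(job_allowed(10_000.0, 7_750_000.0, 200_000.0, 50_000.0))  # True
```

Note that "Pending" already reserves resources for running jobs, so a large running job reduces what new submissions can request.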
export OMP_NUM_THREADS=4
mpirun -np 12 --cpu-bind depth -d 4 app <input>
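As a sanity check for hybrid launches like the one above (12 MPI ranks, each with OMP_NUM_THREADS=4), the total core count is simply ranks times threads. A minimal sketch, with a hypothetical helper rather than an NSCC tool:

```python
def cores_needed(mpi_ranks: int, omp_threads: int) -> int:
    """Total cores for a hybrid MPI+OpenMP job: one core per thread per rank."""
    return mpi_ranks * omp_threads

# The example above: mpirun -np 12 with OMP_NUM_THREADS=4 (-d 4)
print(cores_needed(12, 4))  # 48
```

Keeping ranks x threads at or below the cores in the requested chunks (e.g. 128 per standard compute node) avoids oversubscription.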
ASPIRE 2A

| Compiler | PrgEnv-gnu | PrgEnv-intel | craype-accel-nvidia80 | PrgEnv-nvhpc | openmpi | PrgEnv-aocc | cray-mpich |
|---|---|---|---|---|---|---|---|
| c | cc | cc | - | cc | - | cc | - |
| c++ | CC | CC | - | CC | - | CC | - |
| fortran | ftn | ftn | - | ftn | - | ftn | - |
| mpicc | - | - | - | - | mpicc | - | cc |
| mpic++ | - | - | - | - | mpic++ | - | CC |
| mpifortran | - | - | - | - | mpif90 | - | ftn |
| cuda | - | - | nvcc | cc | - | - | - |
| fortran(cuda) | - | - | - | ftn | - | - | - |
Solution:
| Complex | Route Name | Exec Queue Name | Conditions (Per Job) | Restrictions |
|---|---|---|---|---|
| pbs101 | normal | q1 | ncpus = 1 | |
| | | q2 | ncpus = 2 to 64 | |
| | | q3 | ncpus = 65 to 127 (less than 1 node) | |
| | | q4 | ncpus = 128 (1 node) | |
| | | q5 | ncpus = 129 to 2048 (2 to 16 nodes) | |
| | | q6 | ncpus = 2049 to 98304 (16 to 768 nodes) | |
| | | qlong | ncpus = 1 to 128 (1 node) | Maximum 2 running jobs per user at a time, up to a total of 128 ncpus |
| | | qdev | ncpus = 1 to 128 (1 node) | Maximum 2 running jobs per user at a time, up to a total of 256 ncpus |
| | | g1 | ngpus = 1 | |
| | | g2 | ngpus = 2 or 3 | |
| | | g3 | ngpus = 4 (1 node) | |
| | | g4 | ngpus = 5 to 256 (2 to 64 nodes) | |
| | | glong | ngpus = 1 to 64 (1 to 16 nodes) | |
| | | gdev | ngpus = 1 to 4 (1 node) | Maximum 2 running jobs per user |
| | | l1 | ncpus = 1 to 128 (1 node) | |
| | | l2 | ncpus = 1 to 128 (1 node) | |
| | | l3 | ncpus = 1 to 128 (1 node) | |
| pbs102 | ai | aiq1 | ngpus = 1 | |
| | | aiq2 | ngpus = 2 or 3 | |
| | | aiq3 | ngpus = 4 (1 node) | |
| | | aiq4 | ngpus = 5 to 96 | Maximum 6 chunks of 8 ngpus |
| | | aidev | ngpus = 1 | |
| | | ailong | ngpus = 1 to 4 (1 node) | Maximum of 1 running job per user |
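The routing rules above can be illustrated with a small sketch. The helper below is hypothetical (the actual routing is performed by PBS on the server side) and only covers the q*/g* exec queues of the normal route:

```python
def normal_exec_queue(ncpus: int = 0, ngpus: int = 0) -> str:
    """Pick the exec queue in the 'normal' route per the routing table above.

    Hypothetical helper for illustration; GPU requests are routed by ngpus,
    CPU-only requests by ncpus.
    """
    if ngpus:
        if ngpus == 1:
            return "g1"
        if ngpus in (2, 3):
            return "g2"
        if ngpus == 4:
            return "g3"
        return "g4"  # 5 to 256 GPUs (2 to 64 nodes)
    if ncpus == 1:
        return "q1"
    if ncpus <= 64:
        return "q2"
    if ncpus <= 127:
        return "q3"  # less than 1 node
    if ncpus == 128:
        return "q4"  # exactly 1 node
    if ncpus <= 2048:
        return "q5"  # 2 to 16 nodes
    return "q6"      # up to 768 nodes

print(normal_exec_queue(ncpus=128))  # q4
print(normal_exec_queue(ngpus=2))    # g2
```

The qlong/qdev/glong/gdev and l1-l3 queues are selected by other job attributes (walltime, queue name, node type) rather than core count alone, so they are omitted from this sketch.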
F: Job is finished. Job has completed execution, job failed during execution, or job was deleted.
H: Job is held. A job is put into a held state by the server or by a user or administrator. A job stays in a held state until it is released by a user or administrator.
M: Job was moved to another server
Q: Job is queued, eligible to run or be routed
R: Job is running
S: Job is suspended by the server. A job is put into the suspended state when a higher priority job needs the resources.
T: Job is in transition (being moved to a new location)
U: Job is suspended due to workstation becoming busy
W: Job is waiting for its requested execution time to be reached, or the job specified a stage-in request which failed for some reason.
X: Subjobs only; subjob is finished (expired)
https://help.altair.com/2021.1.3/PBS%20Professional/PBSUserGuide2021.1.3.pdf
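For scripted monitoring, the one-letter state codes above can be mapped to readable descriptions. A minimal sketch whose mapping mirrors the list above; the helper itself is hypothetical, not part of PBS:

```python
# Mapping of the PBS job state codes listed above to short descriptions.
PBS_JOB_STATES = {
    "F": "finished",
    "H": "held",
    "M": "moved to another server",
    "Q": "queued",
    "R": "running",
    "S": "suspended by the server",
    "T": "in transition",
    "U": "suspended (workstation busy)",
    "W": "waiting",
    "X": "subjob finished",
}

def describe_state(code: str) -> str:
    """Translate a one-letter qstat state code; unknown codes are flagged."""
    return PBS_JOB_STATES.get(code.upper(), "unknown state")

print(describe_state("R"))  # running
print(describe_state("h"))  # held
```

This kind of lookup is handy when post-processing `qstat` output in a monitoring script.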
qsub -I -l select=1:ncpus=4:mem=16gb -l walltime=02:00:00 -P <projectId> -q normal
qsub -I -l select=1:ngpus=1 -l walltime=02:00:00 -P <projectId> -q normal
qsub -I -l select=1:ngpus=1 -l walltime=02:00:00 -P <projectId> -q ai
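Before submitting, it can help to estimate the SU cost of a request. The sketch below follows the charging rules described in the general section (1 SU per ncpu-hour for CPU-only jobs; 64 SU per GPU-hour for GPU jobs, with ncpus not charged); the function itself is hypothetical:

```python
def estimate_su(hours: float, ncpus: int = 0, ngpus: int = 0) -> float:
    """Estimate the SU charge for a job, per the charging rules above:
    CPU-only jobs cost 1 SU per ncpu-hour; GPU jobs cost 64 SU per
    GPU-hour, and their ncpus are not charged."""
    if ngpus:
        return 64 * ngpus * hours
    return ncpus * hours

# The interactive examples above: 4 cores for 2 hours, and 1 GPU for 2 hours
print(estimate_su(2, ncpus=4))   # 8
print(estimate_su(2, ngpus=1))   # 128
```

Note the fixed GPU ratio: requesting 1 GPU also allocates 16 CPUs, but the charge stays 64 SU per GPU-hour.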
#!/bin/sh
module load singularity
For home and project directory:
STEP 1 - Submit an interactive job and enter any node.
$ qsub -I -l select=1:ncpus=1:ompthreads=1:mem=2GB -P <yourProject> -l walltime=01:00:00
STEP 2 - Define which editor will be used.
export EDITOR=/bin/vim
STEP 3 - Find the file or folder which you plan to change.
$ /usr/lpp/mmfs/bin/mmeditacl exampleFolder
user:mycolleague:r-x-:allow
STEP 4 - Check the ACL
$ /usr/lpp/mmfs/bin/mmgetacl exampleFolder
STEP 5 - Exit the job
For the scratch directory:
Set the permission for a file or folder:
setfacl -m u:<userid>:<rwx> file/folder
Get the permission:
getfacl file/folder
Remove the permission:
setfacl -x u:<userid> file/folder
1.GENERAL
ASPIRE2A general questions, and configuration queries
2.ACCESSING ASPIRE2A
Queries regarding accessing ASPIRE2A, logging in and transferring files.
3.PROJECTS APPLICATION
NSCC resources can only be utilised through approved projects.
Yes, the projects are accorded priority in the following order:
All project allocations under the official Call for Research Project and Call for Educational HPC Project, if approved, will be valid for 1 year.
For the addition of existing users:
The PI/applicant of the project should email [email protected] with the following details:
Please note that only users with a valid NSCC user ID can be added to the project.
For users from stakeholder organisations (A*STAR, NUS, NTU, SUTD) without an account, please register for one here.
For adding of users from a non-stakeholder organisation (creation of account):
The applicant/PI of the project should email [email protected] with the following information:
Preferred User ID (limited to 8 characters):
Official Email Address:
NSCC Project ID:
Reasons for the new request (e.g. how the resources were used, applications used, etc.)
Calculations of the requested resource
Utilisation plan over the remaining months till the end of the project cycle
Deliverables achieved as of now
4.ACCOUNTING
Queries related to usage of projects, checking the allocation, and usage of project.
5.APPLICATIONS AND LIBRARIES
All queries related to Aspire Applications and libraries
6.JOB SUBMISSION AND SCHEDULING
Queries related to PBS Pro, job submission, and job status.
7.FILES AND FILESYSTEMS
Queries regarding file systems such as home, scratch, project, fileset
Please refer to the Technical Instruction for approved projects on:
8.SOFTWARE
Details of applications and software that are available in ASPIRE2A
9.FILE TRANSFER
Queries related to File transfer
10.BASIC LINUX
Basic Linux commands
11.Miniforge
About Miniforge