Contents
- ACCESSING NSCC PETASCALE SUPERCOMPUTER
- ACCOUNTING
- APPLICATIONS AND LIBRARIES
- JOB SUBMISSION AND SCHEDULING
- FILES AND FILESYSTEMS
- SOFTWARE
- FILE TRANSFER
- BASIC LINUX
- RESOURCE REQUEST, APPROVAL and ALLOCATION
ACCESSING NSCC PETASCALE SUPERCOMPUTER
Server | CPU Model | Cores per Socket | Sockets | Effective Cores per Server | Available RAM | GPUs |
Standard Compute Node | E5-2690 v3 @ 2.60GHz | 12 | 2 | 24 | 128 GB | No GPU |
GPU compute node | E5-2690 v3 @ 2.60GHz | 12 | 2 | 24 | 128 GB | One Tesla K40t |
Large memory node | E7-4830 v3 @ 2.10GHz | 12 | 2 | 24 | 1TB | No GPU |
Large memory node | E7-4830 v3 @ 2.10GHz | 12 | 4 | 48 | 1TB | No GPU |
Large memory node | E7-4830 v3 @ 2.10GHz | 12 | 4 | 48 | 2TB | No GPU |
Large memory node | E7-4830 v3 @ 2.10GHz | 12 | 4 | 48 | 6TB | No GPU |
Non-SingAREN or commercial users (including NUSEXT domain users) must send an email to the NSCC support email address with the following details:
Entity | Operating System | Host/FQDN | Tool/Function | File transfer |
NUS | Windows | nus.nscc.sg | PuTTY, MobaXterm, SSH Secure Client | FileZilla, WinSCP, SSH Secure Client |
NUS | Linux/Unix/Mac | nus.nscc.sg | Terminal/SSH | FileZilla (OSX), FileZilla (Linux), SCP, rsync |
NUS | All | https://nusweb.nscc.sg | PBS Compute Manager/Display Manager | PBS Compute Manager |
NTU | Windows | ntu.nscc.sg | PuTTY, MobaXterm, SSH Secure Client | FileZilla, WinSCP, SSH Secure Client |
NTU | Linux/Unix/Mac | ntu.nscc.sg | Terminal/SSH | FileZilla (OSX), FileZilla (Linux), SCP, rsync |
NTU | All | https://ntuweb.nscc.sg | PBS Compute Manager/Display Manager | PBS Compute Manager |
ASTAR | Windows | astar.nscc.sg | PuTTY, MobaXterm, SSH Secure Client | FileZilla, WinSCP, SSH Secure Client |
ASTAR | Linux/Unix/Mac | astar.nscc.sg | Terminal/SSH | FileZilla (OSX), FileZilla (Linux), SCP, rsync |
ASTAR | All | https://astarweb.nscc.sg | PBS Compute Manager/Display Manager | PBS Compute Manager |
SUTD | Windows | sutd.nscc.sg | PuTTY, MobaXterm, SSH Secure Client | FileZilla, WinSCP, SSH Secure Client |
SUTD | Linux/Unix/Mac | sutd.nscc.sg | Terminal/SSH | FileZilla (OSX), FileZilla (Linux), SCP, rsync |
SUTD | All | https://sutd.nscc.sg | PBS Compute Manager/Display Manager | PBS Compute Manager |
Direct users | Windows | aspire.nscc.sg | PuTTY, MobaXterm, SSH Secure Client | FileZilla, WinSCP, SSH Secure Client |
Direct users | Linux/Unix/Mac | aspire.nscc.sg | Terminal/SSH | FileZilla, SCP, rsync |
Direct users | All | https://aspireweb.nscc.sg | PBS Compute Manager/Display Manager | PBS Compute Manager |
Follow the instructions below to connect to the NSCC supercomputer through the web portal.
- Open a web browser and navigate to the portal URL provided above.
- Log in to the portal with your organization's credentials to access the NSCC supercomputer.
Follow the instructions below to connect using SSH/SCP from a Unix/Mac PC:
- Open the terminal
- Type ssh <username>@<login host>, using the login node for your organization from the table above
- Enter the password (characters are invisible while typing the password)
- Once logged in successfully, you will see a $ prompt at which you can type commands
If you want to use the X11 interface, replace the above ssh command with ssh -X. For trusted X11 forwarding (ssh -Y) on OS X 10.8 or higher, make sure you have installed XQuartz.
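For illustration, an X11-forwarded login session might look like the following sketch; the username and login host are placeholders, and xclock is just a convenient test client, if installed:
ssh -Y <username>@<login host>
xclock &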
Connecting from Windows:
- Open PuTTY or MobaXterm
- In PuTTY, type the login host name and click the Open button. (PuTTY may log you in automatically without asking for a password; if PuTTY asks for a username/password, type the login ID and password from your university.)
- In MobaXterm, type the command ssh <username>@<login host> at the prompt
- Upon successful login, you will be presented with a $ prompt to use the NSCC supercomputer
A*STAR users – log in to astar-exanet.nscc.sg to download files from the Internet
NUS users – ssh to login.nscc.sg to download files from the Internet
NTU users – ssh to login.nscc.sg to download files from the Internet
Direct users – ssh to aspire.nscc.sg to download files from the Internet
“openssl rsa -in old.key -out new.key”
The new.key can then be loaded by PuTTYgen.
Note: this needs to be done only when using puttygen.exe and when it fails to load the key.
Password requirements:
- Minimum 8 characters
- Mixture of upper and lower case
- Contains numbers (0-9)
- At least one special character, e.g. @#$%^&*+=?><
http://nscc.sg – Corporate website
http://beta.nscc.sg – Technical information about the NSCC supercomputer beta phase
http://workshop.nscc.sg – All NSCC-related information can be found here
https://help.nscc.sg – Information pertaining to usage of the NSCC supercomputer and other technical details
ACCOUNTING
If you anticipate running 1,000 instances of Gromacs, Lammps, OpenFOAM, Vasp, WRF, etc., each instance using 256 cores and lasting 24 hours, then you will need 1,000 x 256 x 24 = 6.144 million core hours (rounded up to 6.2 million).
If you plan to run bwa on 1 million genome sequences, each bwa instance running with 12 threads for 2 hours, then you will need 1,000,000 x 12 x 2 = 24 million core hours.
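As a quick sanity check, this arithmetic can be scripted; below is a minimal bash sketch using the numbers from the first example:
#!/bin/bash
# Core-hour estimate: jobs x cores per job x hours per job
jobs=1000
cores_per_job=256
hours_per_job=24
echo "$(( jobs * cores_per_job * hours_per_job )) core-hours"   # prints 6144000 core-hours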
APPLICATIONS AND LIBRARIES
MPI is provided through wrappers around the C/Fortran compilers, which means that existing C/Fortran code can be used with MPI once parallelization constructs are added.
The NSCC petascale supercomputer provides Intel MPI for better application performance.
For example, if a serial molecular dynamics application takes 24 hours to complete a simulation, the same application parallelized across 24 processes may complete in approximately one hour.
It is strongly suggested to use parallel codes whenever it is possible.
OpenMP code is simple to use and does not need any wrappers; standard compilers like GCC/gfortran or Intel C/Fortran support OpenMP, but the program must contain OpenMP directives.
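For illustration, a typical compile-and-run session might look like the sketch below; the module name, source file names, and process counts are assumptions, so check module avail for the actual module names:
module load intelmpi                   # load the Intel MPI toolchain (name assumed)
mpicc -O2 code.c -o code_mpi           # compile an MPI C program via the compiler wrapper
mpirun -np 24 ./code_mpi               # run with 24 MPI processes
gcc -O2 -fopenmp code.c -o code_omp    # enable OpenMP directives in GCC
OMP_NUM_THREADS=24 ./code_omp          # run with 24 OpenMP threads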
You are advised to use the checkpoint-and-restart technique when writing your application so that, in case of eventualities, you can restart the job from the last checkpoint, as sketched below.
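A minimal sketch of such restart logic in a job script; the --restart flag and checkpoint.dat file name are hypothetical and depend on your application:
# Resume from the last checkpoint if one exists (flag and file names are placeholders)
if [ -f checkpoint.dat ]; then
    ./run_my_program --restart checkpoint.dat
else
    ./run_my_program
fi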
Applications, libraries, and compilers on the NSCC supercomputer change frequently and are managed with environment modules. To list the available modules, run "module avail" at the prompt on an NSCC login node.
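Some common module commands, for illustration (the module name gcc is only an example):
module avail          # list all available modules
module load gcc       # load a module into your environment
module list           # show currently loaded modules
module unload gcc     # remove a module from your environment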
JOB SUBMISSION AND SCHEDULING
- If you see a "Q" in the column "S", it means the scheduler has not yet considered your job. Be patient.
- If you see Storage resources unavailable, it means that you have exceeded one of your storage quotas.
- If you see Waiting for software licenses, it indicates that all the licenses for a software package you have requested are currently in use.
- If you see Not Running: Insufficient amount of resource ncpus, it indicates that all the CPUs are busy. Please be patient; PBSPro scheduling is based on the resources available and requested. See the resource allocation policy for more details.
- The operating system you are using is different from the one running on the NSCC supercomputer
Solution: if the binary was built on your local Linux machine, a simple recompilation on the NSCC login nodes will solve the issue
- If you are running on a Windows PC and want to run on the NSCC supercomputer, you need to either obtain a copy of the software for Linux or port the code from Windows to Linux
- Different compiler/library stack: if the operating system is the same as NSCC's but the job still does not run, the compiler/libraries are not compatible, and you need to recompile on the NSCC supercomputer.
- Input file/job script created on a Windows machine: please see the FAQ "Why am I getting ^M: bad interpreter" below
If the job is still failing even though the above conditions are satisfied, you are advised to seek the guidance of the NSCC support helpdesk by creating a ticket from this portal.
- If you get a message in the .e file along the lines of
- /tbd/pbs/mom_priv/jobs/3917736.r-man2.SC: Command not found.
- or
/bin/csh^M: bad interpreter: No such file or directory
and you created your batch job script on a Windows box, then you need to remove some extraneous invisible characters from the script. Say your batch job script is called runjob.sh; then you should run
dos2unix runjob.sh
to convert the file from DOS to Unix format.
There are several simple editors on the NSCC supercomputer, such as vi, nano, and gedit, so you can create batch scripts directly.
- If you submit a script as an argument to qsub, check that there is a newline character at the end of the last executable line. The easiest way to do this is to simply cat the script; if the last line of the script has your shell prompt attached, edit the file to put a blank line at the end.
- Often when using a workstation, people run their job in the background, say ./runjob &, which works fine interactively. However, when translated to a queue batch script, the result is often:
#!/bin/sh
#PBS -q normal
#PBS -l walltime=00:10:00,mem=400MB
./runjob &
This script will exit almost immediately as it runs runjob in the background. Since the script exits immediately, the queue system assumes that the job is finished and kills off all user processes. Consequently, your code, which runs fine interactively, gets killed almost immediately on the queue.
There are two solutions. Try the batch job script:
#!/bin/sh
#PBS -q normal
#PBS -l walltime=00:10:00,mem=400MB
./runjob
NOTE: the missing &.
BUT, if runjob is itself a complicated script which starts up all sorts of programs in the background, try:
#!/bin/sh
#PBS -q normal
#PBS -l walltime=00:10:00,mem=400MB
./runjob
wait
which tells the shell (/bin/sh) to wait until all background jobs are finished before exiting. This will prevent your background jobs from being killed and allow your program to complete.
- If you are using an sh-derived shell for your jobs, edit the .bashrc file to ensure it contains the line . /etc/bashrc.
- If you are using a csh-derived shell for your jobs, edit the .cshrc file to ensure it contains the line source /etc/csh.cshrc.
There is a chance that the .bashrc file might have been deleted accidentally; you may copy the file back from the skeleton file /etc/skel/.bashrc to your home directory, for example: cp /etc/skel/.bash* ~/
For example, since the NSCC supercomputer has 24 cores per node, if you request -l select=2:ncpus=24:mem=10GB, the actual limit will be 10GB on each of the two nodes. If you exceed this on either of the nodes, your job will be killed.
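For example, such a request might appear in a job script as follows (the values are illustrative):
#PBS -l select=2:ncpus=24:mem=10GB    # mem is a per-node (per-chunk) limit, not a total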
Please note that if a job runs for less than a few minutes, the memory use reported in your .o file once the job completes may be inaccurate. We strongly discourage running short jobs of, e.g., less than 1 hour, because there is significant overhead in setting up and tearing down a job, and you may end up wasting large amounts of your grant. Instead, if you have many short jobs to run, consider merging them into a few longer jobs.
RSS exceeded.user=abc123, pid=12345, cmd=exe, rss=4028904, rlim=2097152 Killed
Each interactive process you run on the login nodes has a time limit (30 mins) and a memory limit (2GB) imposed on it. If you want to run a longer or more memory-intensive interactive job, please submit an interactive job (qsub -I), as sketched below.
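A minimal sketch of an interactive job request; the queue name and resource values are illustrative:
qsub -I -q normal -l select=1:ncpus=24 -l walltime=02:00:00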
External Queue Name | Internal Queue Name | Walltime | Other Limits | Remarks |
largemem | | 24 hours | To be decided | For jobs requiring more than 4GB per core |
normal | dev | 1 hour | 2 standard nodes per user | High-priority queue for testing and development work |
normal | small | 24 hours | Up to 24 cores per job | For jobs that do not require more than one node |
normal | medium | 24 hours | Up to the limit as per prevailing policies | For standard job runs requiring more than one node |
normal | long | 120 hours | 1 node per user | Low-priority queue for jobs which cannot be checkpointed |
gpu | gpunormal | 24 hours | Up to the limit as per prevailing policies | For "normal" jobs which require GPU |
gpu | gpulong | 240 hours | Up to the limit as per prevailing policies | Low-priority GPU jobs which cannot be checkpointed |
iworkq | | 8 hours | 1 node per user | For visualisation |
For more information, please contact the Service Desk through the portal or write to the NSCC support email address.
To receive email notifications about your job, add the following directives to your job script (replace the address with your own):
#PBS -M <your email address>
#PBS -m abe
The options to -m are:
a Send mail when job or subjob is aborted by batch system
b Send mail when job or subjob begins execution
e Send mail when job or subjob ends execution
n Do not send mail
Please note: sending emails using the mail command is disabled on the NSCC supercomputer.
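Putting this together, a minimal job script with mail notifications might look like the sketch below; the queue, resources, program name, and email address are placeholders:
#!/bin/bash
#PBS -q normal
#PBS -l select=1:ncpus=24
#PBS -l walltime=01:00:00
#PBS -M <your email address>
#PBS -m abe
# Run from the directory the job was submitted from
cd $PBS_O_WORKDIR
./run_my_program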
F : Job is finished. Job has completed execution, job failed during execution, or job was deleted.
H : Job is held. A job is put into a held state by the server or by a user or administrator. A job stays in a held state until it is released by a user or administrator.
M : Job was moved to another server
Q : Job is queued, eligible to run or be routed
R : Job is running
S : Job is suspended by server. A job is put into the suspended state when a higher priority job needs the resources.
T : Job is in transition (being moved to a new location)
U : Job is suspended due to workstation becoming busy
W : Job is waiting for its requested execution time to be reached or job specified a stagein request which failed for some reason.
X : Subjobs only; subjob is finished (expired.)
Please refer to the PBSPro manual for more details:
http://resources.altair.com/pbs/documentation/support/PBSProUserGuide12.1.pdf
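To see these states in practice, you can query the S column of qstat output (the job ID is a placeholder):
qstat <job id>       # show the current state of a queued or running job
qstat -x <job id>    # also include finished (F) jobs in the output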
#!/bin/bash
#PBS …
# Submit ten copies of job-script, passing a different index to each via the environment
for i in {1..10}; do
    qsub -v PBS_ARRAY_INDEX=$i job-script
done
You could also run 24 single-CPU jobs in parallel, if they all use similar resources and will finish around the same time, by using the following in the job script:
#!/bin/bash
#PBS -l ncpus=24
#PBS …
for i in {1..24}; do
    ./run_my_program args … &
done
wait
Please note the & at the end of the command line, and the wait for all background tasks to finish.
FILES AND FILESYSTEMS
SOFTWARE
FILE TRANSFER
To copy the directory /scratch/johnsmit/project1 to /project/johnsmit/ on a login node:
rsync -arvz /scratch/johnsmit/project1 login.nscc.sg:/project/johnsmit/
*Please replace login with the respective login node mentioned in the access methods table above.
rsync -arvz /<projectdir>/<project name>/<username>/<copy from> myorganizationcluster:/<my destination directory>
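For illustration, the same command with hypothetical names filled in (the host and all paths are placeholders):
rsync -arvz /project/project1/johnsmit/results myorganizationcluster:/home/johnsmit/results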
BASIC LINUX
RESOURCE REQUEST, APPROVAL and ALLOCATION
Note that incomplete submissions will not be processed.
- Scientific merits
- Quality and completeness of application
- Track record of applicant (compliance with previous deliverables/project outcomes)
- Alignment with national/NSCC agenda
- Resources requested vs stakeholder’s fair share
- Expected measurable outputs (e.g. number of patents, publications and postgraduate students etc.)
- Individual submission
- Group submission
- Industry submission
- Government Agency submission
- 1 – 28 Feb – Call for resource request submission
- 1 – 31 Mar – First round of verification by NSCC Project Admin
- 1 – 30 Apr – Second round of verification by TRAC
- 1 – 31 May – Approval by PRAC
- First 3 weeks of Jun – Resource provisioning by NSCC Tech Team
- Last week of Jun – Send Resource Request Approval notification email
https://help.nscc.sg/wp-content/uploads/2017/06/Technical_Instruction_Approved_Project_v0.1.pdf
- Login to the project portal (https://user.nscc.sg/project) and update the members section to include the userIDs to be granted permission.
- Email the NSCC support team about the request for immediate processing.