Training Agenda:
1. Introduction to HPC and PBS Works
- Products, features, and benefits
- PBS Pro components and roles
2. Job Management
- PBS job types
- Submitting jobs
- Managing jobs
- Job attributes
- Requesting job resources
- Default job resources
- Learning about various queues in ASPIRE 2A
- Exercise: Practice submitting jobs and observe their behaviour
3. Advanced Job Management
- Job exit codes
- File processing
- Job submission dependencies
- Job submission, signalling, shrink-to-fit jobs
- Exercise: Understand how to submit a job that is dependent on a previously submitted job’s status
4. Sample Job Scripts
- Single-node CPU jobs (SMP/multi SMP)
- Multi-node CPU jobs (MPI)
- Single-node GPU jobs
- Multi-node GPU jobs
- Data transfer jobs
- Exercise: Use PBS directives in a job script instead of specifying them on the qsub command line
- Exercise: Use the various mail notifications and observe the information sent to the user’s email address
- Exercise: Practice sample job scripts discussed
5. AI Cluster
- Sending jobs to the AI cluster
- Using NVMe fast storage and best practices
- Sample AI jobs
- Querying PBS commands from PBS102 cluster
6. Job Arrays
- Concept of job array
- Job array example scripts
- Job array environment variables
- Querying a job array
- Job array and subjob states
- Job array terminology
- Exercise: Practice submitting array jobs
7. Reservations
- Advance reservations (concept, requesting, status, submitting jobs)
- Exercise: Check status of advance reservation
- Exercise: Submit jobs to a confirmed reservation
8. Troubleshooting
- Understanding qsub errors
- Understanding job comments in qstat output
- General errors
- Documentation and Altair Community
9. Altair Access (Job Portal and Visualization Portal)
- Job submission
- Job monitoring
- Desktops for remote visualization
- Managing folders and files
- Personalization
- Documentation and Altair Community
- Exercise: Submit Jupyter notebook job on Job Portal
- Exercise: Submit xterm desktop session job and launch any application
Prerequisites:
1. A valid user account on the NSCC system, ASPIRE 2A
2. An SSH client such as PuTTY or MobaXterm pre-installed on the user’s laptop to connect to ASPIRE 2A
3. Basic understanding of Linux commands
a. File management
b. “vi” editor
c. Using “modules” in Linux
d. Process management
4. Basic PBS Pro job management
5. Users are expected to have participated in the introductory session and know how to log in to the ASPIRE 2A login nodes.
Expected outcome:
At the end of this course, users will have a fair understanding of advanced job management topics, including array jobs, reservations, job dependencies, file stage-in/stage-out, and the Job Portal and Visualization Portal.