ASPIRE 2A+: Expect Longer Wait Times for Large Jobs from 14 Oct 2025, 6:00 PM (SGT)

Dear ASPIRE 2A+ Users,

Please note that starting from 14 October 2025 (Tuesday), 6:00 PM (SGT), a project reservation will temporarily reduce the general resource pool in the ASPIRE 2A+ system.

Impact:

  • Large jobs (more than 8 GPUs): Expect longer queue wait times.
  • Small jobs (1–8 GPUs): No impact.

Tips:

  • To reduce delays, please enable checkpointing or pre-emption where possible.
  • Users are encouraged to schedule job runs between 6:00 PM and 10:00 AM to take advantage of off-peak hours, which may result in shorter queue times.
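
For users who have not yet added checkpointing to their workloads, the following is a minimal sketch of the idea in Python, not an NSCC-provided tool. It assumes the job's progress can be captured in a small JSON-serialisable dictionary; the function names, the checkpoint file path, and the placeholder "work" are all hypothetical. Real applications would normally use the checkpoint facility of their own framework instead.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically so an interrupted job never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_path, path)  # replaces the old checkpoint in one step


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as handle:
            return json.load(handle)
    return {"step": 0, "total": 0}


def run(path, total_steps, checkpoint_every=100, stop_after=None):
    """Hypothetical work loop that resumes from the last checkpoint.

    stop_after simulates the job being killed after that many steps in
    this run; progress made since the last checkpoint is then lost.
    """
    state = load_checkpoint(path)
    done_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and done_this_run >= stop_after:
            return state  # simulated interruption, nothing is saved here
        state["total"] += state["step"]  # placeholder for real work
        state["step"] += 1
        done_this_run += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)  # final checkpoint on normal completion
    return state
```

With this pattern, a job that is pre-empted or reaches its walltime limit can be resubmitted and will continue from the last saved step instead of starting over. The checkpoint interval is a trade-off: a shorter interval loses less work on interruption but spends more time writing to storage.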

Thank you for your patience and cooperation. Any changes to this arrangement will be announced via the MOTD or email.

Should you have any questions or need assistance, please contact our Helpdesk via the Service Desk Portal or email us at [email protected].

Thank you.

Warm regards,
The NSCC Team

[Resolved] Service Disruption for Users Accessing ASPIRE 2A

Dear ASPIRE 2A Users,

 

We are pleased to inform you that the issue with the Lustre storage system was resolved as of 1:15 AM, 4 October 2025. You may log in to the ASPIRE 2A system as usual.

 

We apologise for any inconvenience this may have caused and thank you for your understanding. Should you have any questions or need assistance, please contact our Helpdesk via the Service Desk Portal or email us at [email protected].

 
Thank you.

 

Warm regards,

The NSCC Team

[Update 2] Service Disruption for Users Accessing ASPIRE 2A

Dear ASPIRE 2A Users,

This is to update you on the service disruption on the ASPIRE 2A system.

Actions Taken:
  • Conducted checks on the hardware and filesystem.
Next Steps:
  • HPE engineers and the NSCC team will continue the filesystem recovery to ensure the integrity of data.

Next Update: 9:00 AM, 4 October 2025
 

Thank you.

Warm regards,
The NSCC Team

[Update] Service Disruption for Users Accessing ASPIRE 2A

Dear ASPIRE 2A Users,

This is to update you on the service disruption on the ASPIRE 2A system.

Actions Taken:
  • The replacement parts have arrived, and the hardware replacement was successfully completed at 1:15 PM.
Next Steps:
  • Conduct checks on the hardware and filesystems.
  • HPE engineers, supported by the NSCC team, will proceed with recovery.

Next Update: 8:00 PM, 3 October 2025
 

Thank you.

Warm regards,
The NSCC Team

Service Disruption for Users Accessing ASPIRE 2A

Dear ASPIRE 2A Users,

We wish to inform you that there is a service disruption on the ASPIRE 2A system. Our team is diligently investigating the issue and working towards a swift resolution.

  • Start Time: 7:15 AM

  • Type: Priority 1 outage

Cause of Disruption:

  • Issues with the Lustre storage system.

  • A storage volume consisting of 53 HDDs was lost from both HA pair nodes. Automatic failover was unsuccessful.

Impact of the Disruption:

  • All users are currently unable to access the ASPIRE 2A system.

Actions Taken:

  • Initial troubleshooting began after both nodes were restarted at 8:00 AM.

  • Job dispatch has been temporarily disabled.

  • A hardware issue has been confirmed and replacement parts have been arranged.

Next Steps:

  • Replacement parts are scheduled to arrive by 2:00 PM.

  • HPE engineers, supported by the NSCC team, will proceed with recovery immediately afterwards.

 

Next Update: 2:00 PM, 3 October 2025

 

We apologise for the disruption and are treating this with the highest priority.

Should you have any questions or need assistance, please contact our Helpdesk via the Service Desk Portal or email us at [email protected].

 

Thank you.

 

Warm regards,

The NSCC Team