Cybersecurity Incident Investigation for ASPIRE 2A

Dear ASPIRE 2A users,

Please be informed that we are investigating a cybersecurity incident affecting ASPIRE 2A. Our team is working to ensure the integrity and safety of the system.

Because this requires a methodical, system-wide verification process, we are currently unable to provide an exact restoration time.

Impact During this Period:
Users will not be able to access the system during this period.

We are working diligently to complete this verification and resolve the matter as quickly as possible.

Should you have any questions or need assistance, please contact our Helpdesk via the Service Desk Portal or email us at [email protected].

Thank you.

Warm regards,
The NSCC Team

[Advisory] MPI, Debugger and Profiler Behavior After CVE-2026-46333 Mitigation

Dear NSCC Users,

Red Hat has published a security advisory (CVE-2026-46333, Red Hat Security Bulletin RHSB-2026-004) describing a local privilege-escalation vulnerability in the Linux kernel. On 17 May 2026, NSCC applied Red Hat’s recommended mitigation to ASPIRE 2A.

The same mitigation has also been applied to ASPIRE 2A+ as a precautionary measure, while we await a response and further guidance from NVIDIA.

This is a defense-in-depth measure intended to ensure continued protection through the planned June kernel upgrade, where the underlying conditions may change. Please be assured that no NSCC user data, jobs, or accounts are known to have been affected by this vulnerability.

However, because this mitigation restricts certain kernel-level process tracking and memory access, it will temporarily alter the behavior of development tools and MPI frameworks as detailed below.

Debuggers and Profilers
Debuggers, profilers, and tracing tools that rely on ptrace may not function as expected under this mitigation. The most common symptom is “Operation not permitted” when a tool attempts to inspect or attach to a process.

General Guideline: Workflows where a tool starts your program from the beginning are more likely to continue working. Workflows where a tool reaches into or attaches to a process that is already running will likely fail.

If your usual workflow involves attaching to an existing PID (e.g., gdb -p, strace -p, perf -p, nsys attach, ncu --pid, py-spy --pid, gcore), expect it to fail or behave incorrectly.
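The guideline above can be sketched as a small lookup: the hypothetical helper below prints a launch-from-start alternative for several of the attach-style commands listed. The helper name is made up for illustration, and `./my_app` and `my_script.py` are placeholders for your own program. The invocations are standard launch forms of each tool; under this mitigation they are only more likely to work, not guaranteed to.

```sh
#!/bin/sh
# Hypothetical helper: print a launch-from-start alternative to an
# attach-style workflow. ./my_app and my_script.py are placeholders.
# Launching under the tool is only *more likely* to work under the
# mitigation; it is not guaranteed.
suggest_launch_form() {
  case "$1" in
    gdb)    echo "gdb --args ./my_app" ;;                 # instead of: gdb -p <PID>
    strace) echo "strace -f -o strace.log ./my_app" ;;    # instead of: strace -p <PID>
    perf)   echo "perf record -g -- ./my_app" ;;          # instead of: perf record -p <PID>
    nsys)   echo "nsys profile -o report ./my_app" ;;     # instead of attaching nsys to a running process
    ncu)    echo "ncu -o report ./my_app" ;;              # instead of attaching ncu to a running process
    py-spy) echo "py-spy record -o profile.svg -- python my_script.py" ;;  # instead of: py-spy ... --pid <PID>
    *)      echo "no launch-form suggestion for '$1'" >&2; return 1 ;;
  esac
}

# Print one suggestion per tool.
for tool in gdb strace perf nsys ncu py-spy; do
  printf '%-7s %s\n' "$tool" "$(suggest_launch_form "$tool")"
done
```

Replace the placeholder program and arguments with your own before use. gcore has no direct launch-form equivalent, so it is left out of the lookup.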

Tools that may not function as expected include, but are not limited to:

  • gdb, cuda-gdb
  • strace, ltrace
  • perf (record, stat, top)
  • nsight-systems (nsys), nsight-compute (ncu)
  • VTune
  • CrayPat (pat_run), perftools-lite
  • valgrind4hpc, heaptrack (in some modes)
  • gcore, py-spy, rr, and other tools that read /proc/<pid>/mem

Important Note: Arm Forge (DDT, MAP, PR) is known not to work under this mitigation and should not be used until further notice.

This list is not exhaustive. If you use a tool not mentioned here and observe unexpected behavior, assume it may be related to this change.

MPI and Intra-node Communication
Cray MPICH’s shared-memory single-copy optimizations (XPMEM, cross-memory-attach) rely on kernel mechanisms that this mitigation also gates. Without action, Cray MPICH jobs would be expected to fail or hang on intra-node communication.

We have applied site-wide environment defaults to disable the affected single-copy paths:
export MPICH_CH4_XPMEM_LMT_MSG_SIZE=NONE
export MPICH_SMP_SINGLE_COPY_MODE=NONE

With these in place, Cray MPICH is expected to function normally. The exports are applied in the default module environment, so you do not need to add them to your job scripts unless you have explicitly overridden them. There may be a small performance impact on large intra-node messages, but correctness is preserved.
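If you load additional modules or source your own environment files, a check along the lines of the sketch below could be placed near the top of a job script to confirm that the two defaults above are still set to NONE. The function name is hypothetical, and the check only inspects the two variables named in this advisory.

```sh
#!/bin/sh
# Hypothetical check: warn if the Cray MPICH single-copy defaults from
# this advisory have been overridden or unset in the current shell.
check_single_copy_defaults() {
  status=0
  for var in MPICH_CH4_XPMEM_LMT_MSG_SIZE MPICH_SMP_SINGLE_COPY_MODE; do
    # Indirect variable lookup that works in POSIX sh; unset reads as empty.
    eval "value=\${$var-}"
    if [ "$value" != "NONE" ]; then
      echo "WARNING: $var='$value' (expected NONE)" >&2
      status=1
    fi
  done
  return $status
}

check_single_copy_defaults || echo "Set both variables to NONE before launching Cray MPICH." >&2
```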

Open MPI and NCCL are not affected and require no changes.

If you build your own MPI from source, or use a non-default MPI distribution, please apply equivalent flags to disable single-copy / cross-memory-attach mechanisms in your implementation.
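For self-built stacks, the exact settings depend on your MPI implementation and version. The sketch below assumes Open MPI 4.x (where the shared-memory BTL is named vader) and upstream MPICH built with XPMEM support. Both variable names are assumptions to verify against your own build, for example with `ompi_info --param btl vader --level 9` for Open MPI.

```sh
#!/bin/sh
# Sketch for self-built MPI stacks; variable names are assumptions based
# on upstream defaults and must be checked against your MPI version.

# Open MPI 4.x: disable single-copy transfers (CMA/XPMEM/KNEM) in the
# vader shared-memory BTL, so messages are copied through shared-memory buffers.
export OMPI_MCA_btl_vader_single_copy_mechanism=none

# Upstream MPICH (CH4) built with XPMEM: assumed CVAR to turn the XPMEM path off.
export MPIR_CVAR_CH4_XPMEM_ENABLE=0
```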

Next Steps
We will revisit and roll back this mitigation once patched kernels are available, validated, and deployed in June. Until then, thank you for your patience and cooperation as you adapt your workflows.

If a particular task or business-critical workflow is materially impacted, please reach out to us so we can assist.

Should you have any questions or need assistance, please contact our Helpdesk via the Service Desk Portal or email us at [email protected].

Thank you.

Warm regards,
The NSCC Team

[Completed] NUS Fire Certification Inspection and Electrical Shutdown Affecting ASPIRE 2A & 2A+ from 15 May 2026, 3PM to 18 May 2026, 10AM

Dear NSCC users,

We are pleased to announce that the activities have been completed. You may now log in to the ASPIRE 2A and 2A+ systems as usual.

Important Note for ASPIRE 2A Users:

  • There will be temporary limitations to MPI, debugger and profiler behavior following the CVE-2026-46333 mitigation. Please refer to our follow-up email for details.

Should you have any questions or need assistance, please contact our Helpdesk via the Service Desk Portal or email us at [email protected].

Thank you.

Warm regards,
The NSCC Team

NUS Fire Certification Inspection and Electrical Shutdown Affecting ASPIRE 2A & 2A+ from 15 May 2026, 3PM to 18 May 2026, 10AM

Dear NSCC users,

We wish to provide you with an update following confirmation of the schedule for the upcoming Fire Certification Inspection and Electrical Shutdown at the NUS Innovation 4.0 building. All services on the ASPIRE 2A and ASPIRE 2A+ systems will be affected.

Maintenance Details:
  • Start: 15 May 2026 (Friday), 3:00 PM SGT
  • End: 18 May 2026 (Monday), 10:00 AM SGT
Impact During the Maintenance Period:
  • There will be a full shutdown of the ASPIRE 2A and ASPIRE 2A+ systems.
  • All queues will stop dispatching jobs, and all running jobs will be terminated gracefully before the systems shut down.
  • Users will not be able to access the systems during the maintenance period.

Should you have any questions or need assistance, please contact our Helpdesk via the Service Desk Portal or email us at [email protected].

Thank you.

Warm regards,
The NSCC Team

[Resolved] Service Disruption for NTU and SUTD Users Accessing ASPIRE 2A & ASPIRE 2A+

Dear NTU and SUTD users,

We are pleased to inform you that the network issue has been resolved. You may now log in to the ASPIRE 2A and 2A+ systems as usual.

Should you have any questions or need assistance, please contact our Helpdesk via the Service Desk Portal or email us at [email protected].

Thank you.

Warm regards,
The NSCC Team

Service Disruption for NTU and SUTD Users Accessing ASPIRE 2A & ASPIRE 2A+

Dear NTU and SUTD users,

We wish to inform you that there is a service disruption affecting network access to the ASPIRE 2A and 2A+ systems. Our team is diligently investigating the issue and working towards a swift resolution.

Cause of Disruption:
Issue with network connectivity between NSCC, NTU and SUTD.

Impact During the Disruption:
NTU and SUTD users will not be able to access the ASPIRE 2A and ASPIRE 2A+ systems from their respective institution’s network.

Should you have any questions or need assistance, please contact our Helpdesk via the Service Desk Portal or email us at [email protected].

Thank you.

Warm regards,
The NSCC Team

[Completed] Urgent Maintenance for ASPIRE 2A & 2A+ System on 8 May 2026, 9AM

Dear NSCC users,

We are pleased to announce that the urgent system maintenance for ASPIRE 2A & 2A+ has been completed.

Should you have any questions or need assistance, please contact our Helpdesk via the Service Desk Portal or email us at [email protected].

Thank you.

Warm regards,
The NSCC Team

Urgent Maintenance for ASPIRE 2A & 2A+ System on 8 May 2026, 9AM

Dear NSCC Users,

We wish to inform you that urgent system maintenance is currently being carried out on ASPIRE 2A & 2A+ to implement the mitigation for the Dirty Frag Linux local privilege-escalation vulnerability.

Maintenance Details:

  • Start: 8 May 2026 (Friday), 9:00 AM SGT
  • End: 11 May 2026 (Monday), 6:00 PM SGT

Impact During the Maintenance Period:

  • Users will not be able to access the ASPIRE 2A and ASPIRE 2A+ systems during the maintenance period.

Should you have any questions or need assistance, please contact our Helpdesk via the Service Desk Portal or email us at [email protected].

Thank you.

Warm regards,
The NSCC Team

[Completed] Urgent Maintenance for ASPIRE 2A on 30 April 2026, 8PM

Dear NSCC Users,

We are pleased to announce that the urgent system maintenance for ASPIRE 2A has been completed.

If you have any questions or need assistance, please contact our Helpdesk via the Service Desk Portal or email us at [email protected].

Thank you.

Warm regards,
The NSCC Team

ASPIRE 2A and ASPIRE 2A+ Scheduled System Maintenance From 3 Jun 2026, 9am to 8 Jun 2026, 4pm

Dear Users,

Please note that the ASPIRE 2A and ASPIRE 2A+ systems will undergo scheduled system maintenance from 3 June 2026, 9am to 8 June 2026, 4pm. The maintenance is intended to ensure the long-term reliability, uptime and stability of the systems.

Please take note of the following maintenance activities.

ASPIRE 2A Maintenance activities:

  1. Red Hat/Rocky Linux Patch
  2. HPCM Platform Software Stack Upgrade
  3. OS Security Patches for Non-BCM Servers
  4. Coolant Replacement
  5. PBS Update
  6. System Health Check

ASPIRE 2A+ Maintenance activities:

  1. DGX H100 and ConnectX-7 Firmware Upgrade on Compute Nodes
  2. DDN H6100 NAS Appliance Firmware from 3.11.6.4 to 3.11.6.6
  3. DDN EXAScaler Software and Firmware
  4. Ubuntu OS Update On BCM Servers
  5. PBS Update
  6. System Health Check

Please contact our Helpdesk via the Service Desk Portal or email us at [email protected] if you have any questions.

Thank you.

Warm regards,
The NSCC Team