3rd High-Performance Computing
NIST IR 8476
Yang Guo
Jeremy Licata
Victoria Pillitteri
Sanjay Rekhi
Robert Beverly
Xin Yuan
Gary Key
Rickey Gregg
Stephen Bowman
Catherine Hinton
Albert Reuther
Ryan Adamson
Aron Warren
Purushotham Bangalore
Erik Deumens
Csilla Farkas
September 2023
Certain commercial equipment, instruments, software, or materials, commercial or non-commercial, are identified in
this paper in order to specify the experimental procedure adequately. Such identification does not imply
recommendation or endorsement of any product or service by NIST, nor does it imply that the materials or
equipment identified are necessarily the best available for the purpose.
There may be references in this publication to other publications currently under development by NIST in
accordance with its assigned statutory responsibilities. The information in this publication, including concepts and
methodologies, may be used by federal agencies even before the completion of such companion publications. Thus,
until each publication is completed, current requirements, guidelines, and procedures, where they exist, remain
operative. For planning and transition purposes, federal agencies may wish to closely follow the development of
these new publications by NIST.
Organizations are encouraged to review all draft publications during public comment periods and provide feedback
to NIST. Many NIST cybersecurity publications, other than the ones noted above, are available at
https://csrc.nist.gov/publications.
Publication History
Approved by the NIST Editorial Review Board on 2023-09-20
Contact Information
ir8476-comments@nist.gov
All comments are subject to release under the Freedom of Information Act (FOIA).
NIST IR 8476 3rd HPC Security Workshop Report
September 2023
Abstract
High-performance computing (HPC) is a vital computational infrastructure for processing large
data volumes, performing complex simulations, and conducting advanced machine learning
model training. As such, HPC is a critical component of scientific discovery, innovation, and
economic competitiveness. Cybersecurity thus plays an important role in HPC by safeguarding
against abuse and misuse and ensuring data and research integrity. However, HPC systems often
have unique hardware, software, and user environments that pose distinct cybersecurity
challenges. This collaborative workshop gathered stakeholders from government, academia, and
industry to discuss community needs, ongoing activities, and future directions in HPC security.
This public workshop report provides detailed summaries of technical sessions, key takeaways
from breakout sessions, and a summary of the keynote presentations.
Keywords
high-performance computing (HPC); HPC security; HPC security posture; security guidance;
security compliance.
Table of Contents
1. Introduction: Workshop Objective, Participants, and Agenda
2. Workshop Session Highlights and Summaries
2.1. HPC Architecture and Security Posture
2.2. HPC Operator Security Experience
2.3. Risk Management Framework Development, Implementation, and Assessment
2.3.1. Presentation on the Trusted CI Framework
2.3.2. Presentation on the Development of TOSS 4 STIG
2.3.3. Panel Discussion
2.4. HPC Security Research
2.5. HPC Vendor Viewpoints
3. Breakout Session Key Takeaways
3.1. HPC System Vulnerabilities and Threats
3.2. HPC RMF: Challenges and Opportunities
3.3. HPC Security Implementations, Best Practices, and Challenges
3.4. Future HPC System and Its Implications for Security
3.4.1. Hardware
3.4.2. Software
3.4.3. Policy
4. Keynote Summary
4.1. Keynote 1 — The NSF HPC Security Landscape: Research Challenges to Production Capabilities
4.2. Keynote 2 — DoE’s Office of Science HPC Cybersecurity
4.3. Keynote 3 — Usable Computer Security and Privacy to Enable Data Sharing in High-Performance Computing Environments
List of Tables
Table 1. 3rd High-Performance Computing Security Workshop Agenda — Day 1
Table 2. 3rd High-Performance Computing Security Workshop Agenda — Day 2
List of Figures
Fig. 1. Workshop attendee composition
Preface
This report is intended to capture the activities and pertinent topics from the 3rd High-
Performance Computing Security Workshop for public record and dissemination. The content of
this report has been assimilated from individual workshop session scribe notes and edited for
clarity and uniformity. As such, no claims are made as to the completeness or accuracy of the
content herein. Rather, this report should be viewed as a summary of discussion-worthy points
from the workshop. Further, this report does not reflect official National Institute of Standards
and Technology (NIST), National Science Foundation (NSF), or federal positions or policies. All
listed talks and participants provided consent for their names and session content to appear in
this document.
Fig. 1. Workshop attendee composition (Government 41%, Universities 33%, Industries 21%, Others 5%)
The in-person two-day workshop attracted 102 attendees, with 41% representing government
agencies, 33% from academia, 21% from industry, and 5% from other sectors (refer to Fig. 1).
Among the attendees, over 50% held key roles in HPC security by serving as Chief Information
Security Officers (CISOs), security engineers, security controls assessors (SCAs), HPC
architects, and system engineers. The workshop encompassed three keynote talks, five technical
sessions, multiple interactive panel discussions, and four breakout sessions with specific
discussion focuses. Distinguished speakers were drawn from government agencies, national
laboratories, NSF-funded open science supercomputing centers, universities, and industry. The
workshop agenda and speakers’ details can be found in Table 1 and Table 2.
The rest of the workshop report is organized as follows:
• Section 2 covers highlights and summaries of individual sessions.
• Section 3 reports on the key takeaways of breakout sessions.
• Section 4 includes a summary of the keynotes.
Table 1. 3rd High-Performance Computing Security Workshop Agenda — Day 1
During the final presentation, Ian Lee from Lawrence Livermore National Laboratory (LLNL)
provided an overview of the security operations conducted at Livermore Computing. As the
complexity of HPC systems continues to grow to meet user needs, the challenges associated with
securing these systems have also escalated. While security compliance frameworks offer some
assistance, they are not the sole solution to ensuring comprehensive security. Ian emphasized the
critical importance of system monitoring, gaining insights into ongoing activities, and
illuminating the computing environment. He also described the implementation of a continuous
monitoring system at LLNL, which effectively addresses these requirements. This system
enables proactive monitoring and offers valuable visibility into the computing environment to
enhance security measures.
software dependencies and a larger attack surface has presented the operators with new
challenges in ensuring that users can operate securely.
Indeed, users themselves can present threats to HPC systems. In an HPC environment, it is not
unusual for users to bring their own code and compile their own binaries. The panel agreed that
this open software policy is necessary to support the unique needs of their diverse population of
science users but also agreed that user-installed software presents challenges to understanding
the threat surface and to identifying and patching vulnerable code. The panel unanimously agreed
that operators strive to support users and their unique application needs but stressed that
operators must be aware of those needs and that users should engage earlier and more deeply with
the operators.
The panel also discussed how the inherently multi-node, distributed nature of typical HPC
systems presents an advantage in performing rolling reboots, wherein machines are patched
when their jobs are finished. As the job scheduler will simply choose an available node, the patch
and reboot are transparent to the end users.
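The rolling-reboot approach the panel described can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not any site's tooling: the `Node` and `RollingPatcher` names are hypothetical, "patching" is reduced to flipping a flag, and the job scheduler is modeled as simply picking the first node that is not draining.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    jobs: set = field(default_factory=set)  # IDs of jobs currently running here
    patched: bool = False
    draining: bool = False  # draining nodes accept no new jobs


class RollingPatcher:
    """Hypothetical model: drain each node, patch it once idle, return it to service."""

    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}

    def start(self):
        # Stop placing new work on unpatched nodes; idle ones are patched right away.
        for node in self.nodes.values():
            if not node.patched:
                node.draining = True
                self._patch_if_idle(node)

    def _patch_if_idle(self, node):
        if node.draining and not node.jobs:
            node.patched = True  # stand-in for "apply updates and reboot"
            node.draining = False

    def schedule(self, job_id):
        # The scheduler simply picks the first available (non-draining) node.
        for node in self.nodes.values():
            if not node.draining:
                node.jobs.add(job_id)
                return node.name
        return None  # no node available; the job waits in the queue

    def job_finished(self, node_name, job_id):
        node = self.nodes[node_name]
        node.jobs.discard(job_id)
        self._patch_if_idle(node)


cluster = RollingPatcher([Node("n1"), Node("n2", jobs={"a"})])
cluster.start()                  # n1 is idle and patched at once; n2 drains
print(cluster.schedule("b"))     # n1
cluster.job_finished("n2", "a")  # n2's last job ends, so it is patched
print(all(n.patched for n in cluster.nodes.values()))  # True
```

Because new work is placed only on nodes that are not draining, a busy node is patched as soon as its last job finishes, which is why users see no interruption. A real deployment would hook this logic into the site's resource manager rather than reimplement a scheduler.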
The panel identified software supply chain security as both a current and future challenge,
especially in HPC, where code provenance is largely unknown and users compile their own code.
Asked how they might redesign HPC systems to be more secure, one panelist emphasized the
power of virtualization and containers to limit the attack surface. The panel discussed the
relative merits, performance implications, and obstacles of such a purely container-based approach.
In contrast, one panelist discussed how they have embraced end users as a second set of eyes or a
“human intrusion detection system.” The panelist noted that since the users are most intimately
familiar with their workload and how it performs on the HPC system, they are often able to
identify problems that could be indicative of a system failure or a larger security event that
warrants investigation. This was another example of how deeply engaging with HPC users
yielded benefits for this particular operator, wherein their users alerted them to potential
anomalies before the operators themselves were aware of the problem.
In the end, the panel enthusiastically identified with the HPC mission and emphasized that the
“supercomputer is a research instrument.” They further highlighted the market-like forces that
incentivize them to provide particular capabilities, both in terms of functionality and security. If
users are unable to effectively use one HPC system, they may choose to use a different one.
Thus, the operators concluded that making HPC performant, featureful, and secure ultimately
supports users, impacts science and discovery, and ensures a continued stream of funding.
Finally, when queried about their hypothetical primary concern in the context of a cybersecurity
incident (“what keeps you up at night”), the panel agreed that the most worrisome potential
consequence is that the science itself is somehow called into question. The panelists emphasized
how this fear has driven their continued work to secure their respective HPC systems and to
balance the security and usability of those systems amid a continually evolving threat and policy
landscape.
discuss how the NIST RMF supports their HPC cybersecurity programs and share valuable
insights, best practices, future challenges and opportunities, and lessons learned. Two panelists
presented their work on developing a security framework for scientific HPC facilities and
supporting the implementation of RMF, and a moderated discussion to exchange ideas followed.
Specific security requirements, such as auditing services, have the potential to impact the
availability of computing resources. This issue has been identified as a common challenge across
HPC environments. Another key challenge and opportunity is the need for effective
communication and information sharing (e.g., sharing lessons learned). In many cases,
organizations go through their own experimentation process on a case-by-case basis to derive
local solutions that address requirements or system issues. For instance, one organization has
developed a solution for limiting the performance impact of the auditing service. Sharing that
information through a common knowledge repository would be beneficial across the HPC
community.
The panelists agreed that communicating HPC challenges and needs to organizational
management requires the ability to express the risks without relying on cybersecurity jargon that
may not be universally understood.
There is also a growing concern that the security talent pool is shrinking, and outreach efforts are
needed to bring in the next generation of professionals. The existing community is already facing
limitations and experiencing attrition like other sectors. Creating more recruiting opportunities
and building on the existing knowledge base will be vital to the ongoing success of HPC security
in the future.
In summary, the panelists’ recommendations emphasized the importance of comprehending the
HPC operational environment. This includes documenting security and privacy requirements and
keeping track of security and operational performance metrics, which will allow management to
gain valuable insights into how well the HPC is performing in terms of both security and mission
objectives. Furthermore, automation will help achieve consistent implementations for
standardized and objective requirements. Tools like GitLab and Ansible play a significant role in
facilitating the maintenance and sharing of scripts that support the consistent implementation of
security requirements unique to HPC environments.
HPC domain but has grown into a larger project of building a general infrastructure that
supports automated detection of vulnerabilities.
3. Dr. Hoda Maleki from Augusta University presented research on searchable encryption
of scientific data. Searchable encryption is a popular field, and many techniques have
been developed, largely for cloud computing environments. This research focuses on
developing searchable encryption specifically for scientific data, which have distinct
characteristics from other types of data. The project has just started, and Hoda mostly
discussed what is to be expected in this project.
4. Dr. Erik Deumens, Director of University of Florida Information Technology Research
Computing, gave an informative talk on their efforts to create a community of practice
for regulated research. He discussed the need for standardization in this domain and the
SP 800-223 HPC security effort. He also described activities to build the community,
including monthly webinars, sharing resources, and organizing workshops. Additional
information can be found at the Regulated Research Community of Practice (RRCoP)
project website.
5. Dr. Joseph Manzano from Pacific Northwest National Laboratory (PNNL) gave a talk on
security issues for systems with accelerators. He described a scenario in which the
accelerator hardware can be exploited to infer information.
6. Dr. Yong Chen from Texas Tech University presented the research and development of a
software infrastructure that provides lightweight provenance and a service for the
collection, management, and analysis of provenance data (i.e., the metadata that describes
the history of the data). He also discussed how such a system can be used for security
purposes. The system has been deployed at the computing center at Texas Tech, and Dr.
Chen gave a demonstration of the system in the presentation. Such infrastructure can improve
scientific productivity in complex HPC simulation and analysis.
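To make the general idea of searchable encryption concrete, the sketch below builds a keyword index whose keys are HMAC tokens, one classic building block of searchable symmetric encryption. It is an illustrative assumption, not the technique from Dr. Maleki's project: the dataset IDs and keywords are hypothetical, encrypting the data themselves (which would need an authenticated cipher from a separate library) is omitted, and deterministic tokens still reveal which queries repeat and which records match.

```python
import hashlib
import hmac
import secrets
from collections import defaultdict


def keyword_token(key: bytes, keyword: str) -> bytes:
    """Deterministic keyed token for a keyword; the server sees only this value."""
    return hmac.new(key, keyword.lower().encode("utf-8"), hashlib.sha256).digest()


def build_index(key: bytes, documents: dict) -> dict:
    """Client side: map keyword tokens to document IDs before outsourcing the index."""
    index = defaultdict(set)
    for doc_id, keywords in documents.items():
        for kw in keywords:
            index[keyword_token(key, kw)].add(doc_id)
    return dict(index)


def search(index: dict, token: bytes) -> set:
    """Server side: match a query token without learning the plaintext keyword."""
    return index.get(token, set())


key = secrets.token_bytes(32)  # secret key held only by the data owner
docs = {  # hypothetical dataset IDs and their keywords
    "run-001": ["temperature", "ocean"],
    "run-002": ["salinity", "ocean"],
}
index = build_index(key, docs)
print(sorted(search(index, keyword_token(key, "ocean"))))  # ['run-001', 'run-002']
```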
manages the entire HPC system via multiple service meshes, networks, and security
policy enforcement points.
• David Reber from NVIDIA delivered a presentation titled “Data Center-Wide Security by
Default,” which tackled the challenges that stem from the disaggregated corporate
perimeter and distributed computing, including the user-edge-central data-center
computing paradigm and federated model training. NVIDIA utilizes confidential computing
(CC), Morpheus, and data processing units (DPUs) to enhance security. CC establishes a
trusted computing environment that facilitates secure deployments of proprietary
artificial intelligence (AI) models and federated learning. NVIDIA Morpheus — a
software development kit (SDK) accelerated by graphics processing units (GPUs) —
empowers cybersecurity developers to construct optimized AI pipelines for real-time data
filtering, processing, and classification. Additionally, DPUs represent a novel class of
programmable processors that offer services such as compute attestation, storage file
management, network pluggable alternatives, and telemetry. Toward the conclusion of
the talk, David emphasized the importance of adopting innovative technologies like
DPUs, CC, and Morpheus to ensure secure HPC in the future.
• Lowell Wofford from Amazon Web Services (AWS) presented his perspective on HPC
security in a talk titled “Beyond the Walled Garden: Cloud Security Patterns for HPC.”
As HPC expands beyond a single trusted computing zone, securely
orchestrating workflows becomes a significant challenge due to numerous interactions
among different components. Creating custom security controls for each interaction is
neither scalable nor reliable. To address the issue, one potential solution is to adopt an
API-driven design with unified APIs for the entire computing infrastructure. By
enforcing access policies at the unified API layer, access control can be centralized and
streamlined. The second challenge explored in this presentation was the dynamic creation
of trusted computing zones for individual tenants or applications in a multi-tenant,
distributed computing infrastructure. Lowell suggested that such requirements can be
facilitated by making trust programmatic and declarative so that individual walled
gardens of trust zones can be created on demand.
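The two ideas in the AWS talk, a single API layer where access policy is enforced and trust zones declared as data, can be illustrated with a small in-memory model. The `TrustZone` and `UnifiedAPI` names and the default-deny rule are assumptions for illustration; they do not correspond to any AWS service or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustZone:
    """Declarative description of a per-tenant trust zone (hypothetical schema)."""
    name: str
    members: frozenset          # principals (users or services) inside the zone
    allowed_actions: frozenset  # API actions permitted inside the zone


class UnifiedAPI:
    """Single enforcement point placed in front of every infrastructure API."""

    def __init__(self):
        self.zones = {}

    def declare_zone(self, zone: TrustZone) -> None:
        # Trust is created programmatically, on demand, from a declaration.
        self.zones[zone.name] = zone

    def authorize(self, principal: str, action: str, zone_name: str) -> bool:
        zone = self.zones.get(zone_name)
        # Default deny: unknown zones, non-members, and undeclared actions are rejected.
        return (
            zone is not None
            and principal in zone.members
            and action in zone.allowed_actions
        )


api = UnifiedAPI()
api.declare_zone(TrustZone(
    name="tenant-a",
    members=frozenset({"alice", "workflow-svc"}),
    allowed_actions=frozenset({"submit_job", "read_results"}),
))
print(api.authorize("alice", "submit_job", "tenant-a"))  # True
print(api.authorize("bob", "submit_job", "tenant-a"))    # False
```

Because every request passes through `authorize`, adding a tenant means declaring one more zone rather than writing new per-interaction controls.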
• System management services may expose specific vulnerabilities that could be exploited
by attackers.
• HPC systems support multi-user simultaneous use of the system without complete
segmentation.
• Securing high-speed interconnects in HPC systems requires specific considerations that
differ from other distributed systems.
The third working group discussion point focused on evaluating the potential impacts of cyber
attacks from the perspectives of various HPC stakeholders (e.g., system owners, vendors,
end users, and the scientific community), especially impacts on scientific discovery, business
decision making, and reputation. The group agreed that measuring such impacts may encompass
multiple factors, such as financial loss, reputation loss, and environmental impacts.
The fourth working group discussed the security measures utilized by enterprise information
systems and their effectiveness when applied to HPC systems. The attendees specifically
explored the measures that are most and least effective when implemented in HPC systems. The
consensus was that the following security best practices are often straightforward to implement
and highly effective in an HPC environment:
• Multi-factor authentication
• Network security monitoring
• Intrusion detection
• Vulnerability scanning
• Log aggregation
• Antivirus scanning
• Endpoint detection
• Hardware-based encryption
The least effective security controls, which also significantly impact system performance, were
identified as:
• Software-based encryption
• Patching and software maintenance
• Endpoint protection
• Antivirus software
• Signature-based detection of malicious software
• Antivirus controls (SI-3): The requirement to scan petabyte-scale file systems takes an
extensive amount of time and leads to significant performance issues for storage
operations. Additionally, some antivirus tools do not offer the option to exclude
filesystems or coordinate multiple scanners, resulting in multiple simultaneous scans of
the same filesystem. Targeting scans at the edge and gateway nodes, where files enter
the HPC system, as well as the file systems that hold the system and configuration data
should address the highest risk.
• File integrity monitoring (SI-7): Configuration management and validation can be
implemented using modern tools instead, such as Puppet Check.
• Logging of all file accesses (AU-2, AU-6): In particular, logging every read access degrades
performance on large parallel file systems with high I/O (input/output) activity. One solution
is to provide compensating information that identifies the files that could be accessed without
checking each file (e.g., monitoring at the directory level).
• Intrusion detection checking causes performance problems (SI-4): Using out-of-band
intrusion detection provides the needed visibility into the data flow without impacting
performance. Walking the filesystem can also determine whether files have changed.
• Scanning all open ports, protocols, and services (CM-7(1)): It is challenging to scan all
open ports, protocols, and services in HPC development environments due to the large
amount of application development activities. Instead, the daemons, services, and ports
that support the cluster should be listed as they are known services. Any services or
daemons created by users to be “published” also need to be reviewed. In addition,
scanning edge nodes provides needed visibility of the greatest risks.
• Application allowlist/denylist (CM-7(5)): This control is infeasible in a development
environment. Most applications have a simple data flow for input and output, are heavily
compute bound, and are executed in non-privileged mode. Allowlisting or denylisting
does not change the risk profile significantly. The applications that involve ports to the
outside network need closer scrutiny before deployment.
• Static code analysis and rigorous code review required for research software development
cause unacceptable delays in the software development process (RA-5): A software process
where code goes through stages of development, testing, and production allows for effective
risk management without delaying the development cycle.
• Unsupported hardware and software are common on research HPC systems (SA-22): There is a
gap between when vendors provide novel HPC systems and system components and when
authorizing officials are comfortable approving them.
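The CM-7(1) alternative in the list above (list the known cluster daemons and ports, then review anything else, with particular attention to user-created services) amounts to comparing observed listeners against a baseline. In this minimal sketch the baseline uses OpenSSH and Slurm default ports purely as hypothetical examples, and the observed `(port, process)` pairs are assumed to come from a collector such as `ss -ltnp` run on each node; a site would substitute its own inventory.

```python
# Hypothetical baseline of known cluster services: port -> expected process name.
# OpenSSH and Slurm defaults are used only as illustrative examples.
KNOWN_SERVICES = {
    22: "sshd",
    6817: "slurmctld",
    6818: "slurmd",
}


def review_listeners(observed):
    """Split observed (port, process) listeners into known services and items to review."""
    known, needs_review = [], []
    for port, process in sorted(observed):
        if KNOWN_SERVICES.get(port) == process:
            known.append((port, process))
        else:
            # Either an unexpected port or a known port served by an unexpected
            # process, e.g. a user-created daemon that should be reviewed
            # before it is "published".
            needs_review.append((port, process))
    return known, needs_review


# Example input as it might be collected from one node (hypothetical values).
observed = [(22, "sshd"), (6818, "slurmd"), (8888, "jupyter")]
known, needs_review = review_listeners(observed)
print(needs_review)  # [(8888, 'jupyter')]
```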
The session also discussed other security control issues and the best practices that have been
implemented in practice:
• STIG requirements can pose implementation problems. The STIG may call out specific
vendor products instead of focusing on desired functionality. Ideally, the STIG should be
designed similarly to the RMF and allow sites to partially implement it, just as SP 800-53
allows partial implementation of a control with proper justification and risk analysis. The
STIG should not be an all-or-nothing approach. Additionally, the STIG often contains
specific values, such as 10 concurrent sessions, and should instead allow sites and
organizations to choose the values.
• Promote the definition of HPC as a single, monolithic system. It should not be considered
a collection of a thousand different systems (e.g., individual compute nodes and admin
nodes).
• Vulnerability mitigation (SI-2) timeframes are difficult to follow. Jobs may have been
running or waiting in the queue for a long period of time, and patching will terminate
those jobs. Rolling patches should be encouraged whenever possible.
• There are no good tools for identifying where software vulnerabilities reside.
• The SP 800-53 overlay should have more “organizationally defined value” instead of
specific values, which will allow for a tailored value based upon risk, type of
vulnerability, and other factors. For example, a system categorized High-Moderate-Low should
have a 60-day mitigation period for a critical vulnerability, but a Moderate-Low-Low system could
have a 90-day mitigation period for critical vulnerabilities. The Committee on National Security
Systems Instruction (CNSSI) gives some guidance on scoping organization-based parameters. The
HPC security overlay should give guidance on what the values should be and explain why certain
values are possible or needed.
• There are broad concerns about disk encryption and encryption at rest. Physical security limits
access to disk drives, and striping data across many drives means that any single drive holds only
fragments of the data; together, these mitigate the risk of data loss.
• There were general questions about Federal Information Processing Standard (FIPS)
140-2 and the effects of quantum-resistant (post-quantum) algorithms on systems.
• Authorization boundaries should be investigated more closely in system security plans
(SSPs).
• The SSP would benefit from documentation, written in RMF verbiage and terminology, that records
where one control is tailored differently from another and/or where its wording contradicts the
verbiage of other security controls.
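The organization-defined values suggested for SI-2 can be expressed as a lookup keyed by system categorization and vulnerability severity. This sketch fills in only the two critical-vulnerability windows from the High-Moderate-Low and Moderate-Low-Low example in the list above; the function name and the decision to fail loudly on a missing entry are illustrative assumptions.

```python
from datetime import date, timedelta

# Organization-defined remediation windows (days), keyed by
# (system categorization, vulnerability severity). Only the two
# critical-vulnerability entries from the example in the text are filled in.
MITIGATION_DAYS = {
    ("High-Moderate-Low", "critical"): 60,
    ("Moderate-Low-Low", "critical"): 90,
}


def mitigation_deadline(categorization: str, severity: str, discovered: date) -> date:
    """Return the date by which a vulnerability must be mitigated."""
    try:
        days = MITIGATION_DAYS[(categorization, severity)]
    except KeyError:
        # Fail loudly so a missing organization-defined value is noticed, not defaulted.
        raise ValueError(
            f"no organization-defined value for {categorization}/{severity}"
        ) from None
    return discovered + timedelta(days=days)


print(mitigation_deadline("High-Moderate-Low", "critical", date(2023, 9, 1)))  # 2023-10-31
print(mitigation_deadline("Moderate-Low-Low", "critical", date(2023, 9, 1)))   # 2023-11-30
```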
There were several comments about security implementation challenges within the HPC
environment. Some of the concerns centered on the education of IT personnel, who may lack
training on the unique nature of HPC systems and their use. Another area of concern involved
applying the same rules to HPC assets as to enterprise-type systems; most HPC assets currently
follow non-HPC guidance (e.g., SP 800-171 or SP 800-53) in lieu of HPC-specific standards and
best practices.
There were comments about the conflict between system administrators and security staff and the
difficulty of finding common ground between the two groups regarding HPC. One suggestion
was to include input from both system administrators and security groups in acceptance standard
operating procedures (SOPs), policies, best practices, and even common dashboards to help
bridge the gap.
The group also discussed the common challenge of testing security implementations. All agreed
on the need for viable test environments for function and performance benchmarking, which are
often cost prohibitive. As a result, many have utilized smaller scale clusters, virtual alternatives,
and zoned system implementations within an existing cluster. It was noted that at least some
testing could be performed using these methods.
3.4.1. Hardware
• Composable/disaggregated systems (e.g., Compute Express Link [CXL]) and their implications for architectures
• Multi-tenancy: Exploring how to efficiently share resources among multiple users or
applications
• Supply chain security: Ensuring the integrity and security of hardware components
throughout the supply chain
• Real-time authorization of components: Enabling the verification and authorization of
components in real time
• Transition from Trusted Platform Module 2.0 (TPM2) to Virtual TPM (VTPM):
Evaluating the benefits and challenges of transitioning from TPM2 to Virtual TPM
• SmartNICs and the software executed on them: Understanding the role of SmartNICs and
the software running on them for enhancing system performance and security
• Software-defined networks (SDNs): Leveraging SDNs to enable more flexible and
efficient network management
3.4.2. Software
• Crypto agility: Developing software that can adapt to different cryptographic algorithms
and protocols to ensure long-term security
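One common way to build crypto agility into software is to read the algorithm name from configuration and tag every stored value with the algorithm that produced it, so a later migration does not invalidate existing data. The sketch below applies that pattern to message digests using Python's standard `hashlib`; the `CONFIG` key and the approved-algorithm list are hypothetical placeholders for a site policy.

```python
import hashlib
import hmac

# Hypothetical site policy: fixed-length digests the organization has approved.
APPROVED_DIGESTS = {"sha256", "sha384", "sha512", "sha3_256", "sha3_384", "sha3_512"}

# Algorithm choice lives in configuration rather than at each call site.
CONFIG = {"digest_algorithm": "sha256"}


def _check(name: str) -> str:
    if name not in APPROVED_DIGESTS:
        raise ValueError(f"digest algorithm not approved: {name}")
    return name


def digest(data: bytes) -> str:
    """Hash with the currently configured algorithm and record which one was used."""
    name = _check(CONFIG["digest_algorithm"])
    return f"{name}:{hashlib.new(name, data).hexdigest()}"


def verify(data: bytes, stored: str) -> bool:
    """Recompute using the algorithm named in the stored value, not the current default."""
    name, _, expected = stored.partition(":")
    actual = hashlib.new(_check(name), data).hexdigest()
    return hmac.compare_digest(actual, expected)


# Usage: migrate the default without invalidating previously stored digests.
old = digest(b"simulation output")
CONFIG["digest_algorithm"] = "sha3_256"
new = digest(b"simulation output")
print(old.split(":")[0], new.split(":")[0])  # sha256 sha3_256
print(verify(b"simulation output", old), verify(b"simulation output", new))  # True True
```

Switching `CONFIG["digest_algorithm"]` changes new digests immediately, while values already stored with the `sha256:` prefix continue to verify until they are re-hashed.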
3.4.3. Policy
• Transition to RMF and the subsequent move to the Cybersecurity Maturity Model
Certification (CMMC) in the coming decade
• Zero trust approach and its relevance to HPC systems: Implementing a zero trust
architecture that assumes no inherent trust in the system components and networks
• Supply chain management: Developing strategies to mitigate the risks associated with the
supply chain, including hardware and software components
These discussions highlighted the importance of considering hardware, software, and policy
aspects when designing and operating future HPC systems. By addressing these areas,
stakeholders can work toward building more secure, efficient, and resilient HPC environments.
Keynote Summary
Dr. Beverly described some of the ways in which HPC security is unique and presents distinct
challenges:
• May involve novel architectures, such as neuromorphic and quantum computing
• Typically involves specialized software, workloads, and data
• Has a unique population of users
• Focuses on performance
• Has a distinct science mission
• Has different adversaries than, for example, enterprise computing
OAC and the NSF have taken a holistic approach to securing science cyberinfrastructure through
multiple programs that target different capabilities and challenges. For instance, the Secure and
Trustworthy Computing (SaTC) program engages in fundamental cybersecurity research, while
the Cybersecurity Innovation for Cyberinfrastructure (CICI) program targets applied and
translational cybersecurity work for science cyberinfrastructure. CICI has three program areas:
usable and collaborative security for science, reference scientific security datasets, and
transition to cyberinfrastructure resilience. Across these areas, Dr. Beverly highlighted
some particularly exciting projects, including work on improving the performance and security
of data transport, analyzing and automatically patching scientific binary software, finding and
fixing configuration vulnerabilities in HPC, performing encrypted search and computing on
private data, and distributed credentialed access.
Dr. Beverly concluded by speaking about OAC’s initiatives for supporting the human side of
cyberinfrastructure through learning and workforce development, cyberinfrastructure
professionals, and the minority-serving cyberinfrastructure consortium.
heterogeneous sources of information, such as the network, computing nodes, operating system,
runtime, and applications.
Dr. Pino also discussed two new scientific initiatives that are closely related to HPC. The first
initiative, titled “5G for Science,” explores the potential of emerging 5G wireless technologies.
The new technologies offer opportunities and capabilities to enable the digital continuum that
links the wireless edge to advanced scientific user facilities, such as HPC. The second initiative
focuses on microelectronics research and development (R&D), which holds the promise of
revolutionizing memory and data storage to redefine the future of computing.