Nutanix Core Performance
Nutanix Tech Note
Version 1.0 • February 2019 • TN-2096
Copyright
Copyright 2019 Nutanix, Inc.
Nutanix, Inc.
1740 Technology Drive, Suite 150
San Jose, CA 95110
All rights reserved. This product is protected by U.S. and international copyright and intellectual
property laws.
Nutanix is a trademark of Nutanix, Inc. in the United States and/or other jurisdictions. All other
marks and names mentioned herein may be trademarks of their respective companies.
Contents
1. Executive Summary
2. Introduction
   2.1. Audience
   2.2. Purpose
3. Nutanix Enterprise Cloud Overview
   3.1. Nutanix Acropolis Architecture
4. Distributed Storage Fabric
5. Effective Data Tiering
   5.1. Random I/O
   5.2. Sequential I/O
   5.3. Curator
6. Data Locality
7. High-Performance Snapshots and Clones
8. Conclusion
Appendix
   About Nutanix
List of Figures
List of Tables
1. Executive Summary
Performance for a storage system has traditionally been one of the biggest pain points
in deployments. In a typical SAN or NAS architecture, the storage controller becomes a
performance bottleneck, especially when the architecture uses SSDs. The Nutanix architecture
delivers performance in a scaled-out, distributed way by giving every node in the cluster its own
Controller VM (CVM). Each CVM manages performance locally, using its own node's resources.
With the Nutanix Enterprise Cloud, you don’t have to design your deployments around future
performance needs. You can increase performance by simply adding nodes to the cluster.
Because the Nutanix solution is completely software-defined, you only need to upgrade the
software to get new performance features. You don’t need any special hardware or custom gear
to deliver new performance capabilities.
Nutanix also enables you to visualize performance through Prism. Prism provides end users
and administrators with a complete, detailed view of performance and event correlation data for
everything from VMs to disk drives.
2. Introduction
2.1. Audience
This technical note is part of the Nutanix Solutions Library. We wrote it for architects and
administrators responsible for managing performance. Readers of this document should already
be familiar with basic virtualization and performance concepts.
2.2. Purpose
In this document, we cover the following topics:
• Nutanix Distributed Storage Fabric (DSF) performance.
• Data tiering with Nutanix.
• I/O path for reads and writes.
• Data locality in the Nutanix architecture.
• Snapshot and clone performance in Nutanix.
Table 1: Document Version History
Version Number | Published     | Notes
1.0            | February 2019 | Original publication.
3. Nutanix Enterprise Cloud Overview
Nutanix delivers a web-scale, hyperconverged infrastructure solution purpose-built for
virtualization and cloud environments. This solution brings the scale, resilience, and economic
benefits of web-scale architecture to the enterprise through the Nutanix Enterprise Cloud
Platform, which combines three product families—Nutanix Acropolis, Nutanix Prism, and Nutanix
Calm.
Attributes of this Enterprise Cloud OS include:
• Optimized for storage and compute resources.
• Machine learning to plan for and adapt to changing conditions automatically.
• Self-healing to tolerate and adjust to component failures.
• API-based automation and rich analytics.
• Simplified one-click upgrade.
• Native file services for user and application data.
• Native backup and disaster recovery solutions.
• Powerful and feature-rich virtualization.
• Flexible software-defined networking for visualization, automation, and security.
• Cloud automation and life cycle management.
Nutanix Acropolis provides data services and can be broken down into three foundational
components: the Distributed Storage Fabric (DSF), the App Mobility Fabric (AMF), and AHV.
Prism furnishes one-click infrastructure management for virtual environments running on
Acropolis. Acropolis is hypervisor agnostic, supporting three third-party hypervisors—ESXi,
Hyper-V, and Citrix Hypervisor—in addition to the native Nutanix hypervisor, AHV.
Figure 1: Nutanix Enterprise Cloud
3.1. Nutanix Acropolis Architecture
Acropolis does not rely on traditional SAN or NAS storage or expensive storage network
interconnects. It combines highly dense storage and server compute (CPU and RAM) into a
single platform building block. Each building block delivers a unified, scale-out, shared-nothing
architecture with no single points of failure.
The Nutanix solution requires no SAN constructs, such as LUNs, RAID groups, or expensive
storage switches. All storage management is VM-centric, and I/O is optimized at the VM virtual
disk level. The software solution runs on nodes from a variety of manufacturers that are either
all-flash for optimal performance, or a hybrid combination of SSD and HDD that provides a
combination of performance and additional capacity. The DSF automatically tiers data across the
cluster to different classes of storage devices using intelligent data placement algorithms. For
best performance, these algorithms ensure that the most frequently used data is available in
memory or in flash on the node local to the VM.
To learn more about the Nutanix Enterprise Cloud, please visit the Nutanix Bible and
Nutanix.com.
4. Distributed Storage Fabric
Nutanix Acropolis Distributed Storage Fabric (DSF) drives high performance for guest VMs by
providing storage resources to VMs locally on the same host. This method enables the local
storage controller (one per Nutanix node) to devote its resources to handling I/O requests made
by VMs running on the same physical node. Other controllers running in the cluster are then
free to serve I/O requests made by their own local guest VMs. This architecture contrasts with
traditional storage arrays that have remote storage controllers and resources located across a
network (SAN or NAS).
The Nutanix architecture has several important performance benefits. First, because storage
resources are local, requests do not traverse the network, which drastically reduces latency by
removing the physical network from the I/O path. Second, each host (or Nutanix node) has its
own virtual storage controller (CVM), which eliminates the storage bottlenecks common in
shared storage architectures. As you add Nutanix nodes to the cluster, CVMs are added at the
same rate, so the scale-out architecture delivers predictable, scalable, and linear storage
performance.
5. Effective Data Tiering
The DSF monitors and fingerprints storage access patterns and treats various data types
differently to optimize performance for each guest VM. In a hybrid node (which consists of SSDs
and HDDs), the DSF keeps frequently accessed (hot) data and random I/O in the fastest storage
medium—high-speed memory or flash-based SSDs—and moves less frequently accessed (cold)
data and sequential I/O to higher-capacity HDDs, all while keeping data fully redundant and
protected from failure. All-flash nodes have no tiering because all the media is the same; the
DSF keeps hot data either in memory or on an SSD, depending on how recently it was accessed.
5.1. Random I/O
The DSF writes random I/O to a dedicated area on the SSD tier called the oplog. The oplog is an
SSD-based write cache built to handle I/O bursts. It stores data persistently and responds quickly
to guest VMs to deliver both low latency and improved performance.
During a random write burst, when the DSF writes data to the local oplog on one node, it
simultaneously sends the data across the network to the CVMs on one or more other nodes in
the cluster, where it is replicated to those nodes’ oplogs. After the DSF stores data on at least two
different nodes, it acknowledges the successful write to the guest VM. Because the data is
persisted to disk on at least two nodes, you can retrieve it later even if a power outage, SSD
failure, or node failure occurs just milliseconds after the write is acknowledged.
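
The following Python sketch models this write-acknowledgement flow under stated assumptions; the Node class, write_to_oplog function, and REPLICATION_FACTOR constant are illustrative names for this document, not Nutanix internals.

REPLICATION_FACTOR = 2  # data must be persisted on at least two nodes before the ack


class Node:
    """A simplified stand-in for a Nutanix node and its SSD-backed oplog."""

    def __init__(self, name):
        self.name = name
        self.oplog = []  # represents the persistent SSD write cache

    def persist_to_oplog(self, data):
        self.oplog.append(data)
        return True  # persisted durably on this node


def write_to_oplog(data, local_node, peer_nodes):
    """Write to the local oplog, replicate to peers, then acknowledge the guest VM."""
    persisted_on = 0
    if local_node.persist_to_oplog(data):
        persisted_on += 1
    # Replicate across the network to other CVMs' oplogs.
    for peer in peer_nodes:
        if persisted_on >= REPLICATION_FACTOR:
            break
        if peer.persist_to_oplog(data):
            persisted_on += 1
    # Only acknowledge the write once the data is durable on enough nodes.
    return persisted_on >= REPLICATION_FACTOR


local = Node("node-a")
peers = [Node("node-b"), Node("node-c")]
assert write_to_oplog(b"random 4K write", local, peers)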
Once the DSF persists write data to the oplog and sends an acknowledgement to the guest VM,
it coalesces and sequentially drains the data to the extent store, the persistent bulk storage of
DSF. In hybrid nodes, the extent store spans both the SSD and HDD tiers. In all-flash nodes,
the extent store consists only of SSDs. The oplog continuously drains to maintain space for
subsequent incoming writes and to maintain the highest level of performance.
Autonomous Extent Store
Starting with AOS 5.10, when certain conditions are met (all-flash nodes with at least 12 SSDs
per node), a feature called the Autonomous Extent Store (AES) lets the extent store, rather than
the oplog, handle sustained random write workloads. For sustained random writes, AES
writes and stores data in the extent store directly. Before, AOS stored all metadata globally, but
now metadata has two parts: one stored locally to the node and another stored globally. Local
metadata storage provides metadata locality in addition to data locality. Nodes don’t need to
know where each piece of physical metadata resides, which optimizes metadata lookups and
allows you to achieve efficient sustained random write performance without using the oplog. For
random write bursts, the DSF still uses the oplog and drains to the extent store, using AES where
possible.
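
As a rough illustration of the routing decision described above, the following Python sketch assumes hypothetical helper names (is_aes_eligible, route_random_write) and reuses the AOS 5.10 conditions cited in this section; it is a simplified model, not the actual AOS implementation.

MIN_SSDS_FOR_AES = 12  # condition cited above: all-flash nodes with at least 12 SSDs


def is_aes_eligible(node_is_all_flash, ssd_count):
    return node_is_all_flash and ssd_count >= MIN_SSDS_FOR_AES


def route_random_write(sustained, node_is_all_flash, ssd_count):
    """Return the tier that absorbs a random write in this simplified model."""
    if sustained and is_aes_eligible(node_is_all_flash, ssd_count):
        return "extent_store"  # AES: sustained random writes go straight to the extent store
    return "oplog"             # bursts (or non-AES nodes) still land in the oplog first


print(route_random_write(sustained=True, node_is_all_flash=True, ssd_count=24))   # extent_store
print(route_random_write(sustained=False, node_is_all_flash=True, ssd_count=24))  # oplog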
5.2. Sequential I/O
Sequential writes skip the oplog entirely and go directly to the extent store because sequential
data is continuous and can be efficiently written to disks in large blocks without performance
impact. Additionally, sequential data requires fewer metadata updates, so the system spends
less time updating metadata.
Figure 2: DSF I/O Path
You can also configure the DSF such that sequential data bypasses the SSD tier altogether and
is written directly to the HDDs. This process maintains high performance because HDDs already
write data sequentially. Writing sequential data directly to HDD also reduces the total amount of
data stored on the SSD tier, preserving SSD capacity for random data and extending the lifetime
of flash-based SSD storage.
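
A minimal Python sketch of the write-path routing this section describes, assuming hypothetical names (route_write, hdd_bypass_enabled); the optional flag stands in for the configuration that sends sequential data directly to HDD on hybrid nodes.

def route_write(is_sequential, is_hybrid_node, hdd_bypass_enabled=False):
    """Pick the first landing spot for a write in this simplified model."""
    if is_sequential:
        if is_hybrid_node and hdd_bypass_enabled:
            return "extent_store_hdd"  # sequential data written straight to HDD
        return "extent_store"          # sequential writes always skip the oplog
    return "oplog"                     # random writes are absorbed by the SSD oplog


print(route_write(is_sequential=True, is_hybrid_node=True, hdd_bypass_enabled=True))
print(route_write(is_sequential=False, is_hybrid_node=True))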
5.3. Curator
Curator, a system component that performs background tasks to keep the cluster running
smoothly, assesses data in the extent store to determine whether it is hot or cold. Data remains
on the highest performance hot tier (SSD) as long as it is accessed frequently. If data access
patterns diminish, however, Curator marks the data for migration to the high-capacity cold tier of
storage (HDD).
Curator uses a series of MapReduce algorithms to efficiently scan the metadata in a distributed
manner by analyzing different portions of metadata on each node in the cluster. It manages many
file system operations, including data tiering, disk rebalancing, defragmentation, repairing data
redundancy after either a disk or node failure, and much more. Curator runs a full scan every six
hours and a partial scan once per hour and whenever a critical event occurs. Examples of critical
events include disk or node failure, the SSD tier filling up beyond a given threshold, and a disk
(HDD or SSD) filling up beyond a given threshold relative to other disks.
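
The scan cadence described above can be modeled roughly as in the following Python sketch; the CuratorScheduler class, event names, and counters are illustrative stand-ins based on the schedule this section describes, not Curator's actual scheduler.

FULL_SCAN_INTERVAL_H = 6
PARTIAL_SCAN_INTERVAL_H = 1
CRITICAL_EVENTS = {"disk_failure", "node_failure", "ssd_tier_full", "disk_imbalance"}


class CuratorScheduler:
    def __init__(self):
        self.hours_since_full = 0
        self.hours_since_partial = 0

    def tick_hour(self, events=()):
        """Advance one hour and return the scans that should run."""
        self.hours_since_full += 1
        self.hours_since_partial += 1
        scans = []
        if self.hours_since_full >= FULL_SCAN_INTERVAL_H:
            scans.append("full_scan")
            self.hours_since_full = 0
        elif (self.hours_since_partial >= PARTIAL_SCAN_INTERVAL_H
              or CRITICAL_EVENTS.intersection(events)):
            scans.append("partial_scan")
            self.hours_since_partial = 0
        return scans


scheduler = CuratorScheduler()
print(scheduler.tick_hour())                         # ['partial_scan'] after one hour
print(scheduler.tick_hour(events={"disk_failure"}))  # partial scan triggered again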
Each CVM has a dynamic read cache, called the unified cache, that spans both the CVM’s
memory and the SSD. Curator places highly accessed data in this local read cache to allow the
DSF to serve requests directly from either memory or SSD, driving very low latency fetches.
The DSF promotes data from the HDD tier to the high-speed SSD tier when the access
frequency for the data increases. Stargate, another system component that manages data and
I/O in the cluster, moves the data. Curator takes a longer-term view and maintains optimal use of
SSD space by moving infrequently accessed data back to the HDD tier.
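
A simplified Python sketch of this promotion and demotion decision, assuming illustrative access-frequency thresholds (HOT_ACCESSES_PER_DAY, COLD_ACCESSES_PER_DAY); the real decisions are made by Stargate and Curator as described above.

HOT_ACCESSES_PER_DAY = 10   # assumed threshold for illustration only
COLD_ACCESSES_PER_DAY = 1


def choose_tier(current_tier, accesses_per_day):
    """Decide where an extent should live in this simplified two-tier model."""
    if current_tier == "hdd" and accesses_per_day >= HOT_ACCESSES_PER_DAY:
        return "ssd"   # promote data as access frequency rises
    if current_tier == "ssd" and accesses_per_day <= COLD_ACCESSES_PER_DAY:
        return "hdd"   # demote infrequently accessed data over time
    return current_tier


print(choose_tier("hdd", accesses_per_day=25))  # ssd
print(choose_tier("ssd", accesses_per_day=0))   # hdd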
The local CVM leverages all disks on a particular tier for I/O. This distribution fully utilizes the
bandwidth available across all disks for both reads and writes. It also provides RAID-0-like
performance, but without the risk of an entire node becoming unavailable if a single disk failure
occurs. Similarly, all CVMs in the cluster participate in oplog and extent store replication.
6. Data Locality
A common characteristic of a virtual machine cluster is that VMs migrate from host to host within
a cluster throughout the day and over time in order to optimize CPU and memory resources.
Because the DSF serves data locally to guest VMs, the VM’s data must follow when it moves
between hosts.
In a traditional shared storage environment, users access data over the network, so a VM’s data
stays in the same place (in other words, on the central array) even if the VM migrates throughout
the cluster. Because of the distributed and scalable nature of the Nutanix architecture, however,
the DSF keeps data as close to the VM as possible for the fastest performance and to minimize
both cross talk and network utilization.
After a VM completes its migration to another host, the CVM on the destination host takes
ownership of the migrated VM’s files (vDisks) and begins to serve all I/O requests for these
vDisks. Accordingly, writes also go to the local CVM and its local storage, ensuring that write
performance remains as fast as it was before the VM migration event.
The Nutanix platform serves all read requests for newly written data locally and forwards read
requests for previously written data to the source host's CVM. In the background, Curator dynamically moves
the VM’s remote data to the local Nutanix node so that all read I/O is performed locally and does
not traverse the network.
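
The read path immediately after a migration might be modeled as in the following Python sketch; serve_read and the extent maps are hypothetical stand-ins, and the migration_queue simply represents Curator's background relocation of remote data to the local node.

def serve_read(block_id, local_extents, remote_extents, migration_queue):
    """Serve a read locally when possible, otherwise forward to the source CVM."""
    if block_id in local_extents:
        return ("local", local_extents[block_id])
    # Data written before the migration still lives on the source node's CVM.
    data = remote_extents[block_id]
    # Curator later moves this extent locally so future reads stay on-node.
    migration_queue.append(block_id)
    return ("forwarded_to_source_cvm", data)


local = {"blk-new": b"written after migration"}
remote = {"blk-old": b"written before migration"}
queue = []
print(serve_read("blk-new", local, remote, queue))
print(serve_read("blk-old", local, remote, queue))
print(queue)  # ['blk-old'] queued for background relocation to the local node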
7. High-Performance Snapshots and Clones
Traditional hypervisor snapshots can degrade performance and are not typically recommended
for use in production environments. Performance degradation occurs because the hypervisor has
little to no knowledge about the back-end storage medium. The performance challenge is that the
hypervisor writes to delta files in the form of a change log, which records every change made to
a file since the snapshot was taken. As a result, a read must locate the most recent delta file that
the block of data was written to and then work through every change made to the original data.
More snapshots mean more changed data, which in turn imposes a substantial performance
penalty.
Nutanix uses a redirect-on-write algorithm that dramatically improves system efficiency when
performing snapshots because the snapshots have storage awareness and are designed for
production-level data protection.
First, Nutanix snapshots redirect writes to a new block instead of recording them in a change log,
which means the DSF doesn't have to look up existing data when writing. Second, the DSF
intelligently tracks snapshot trees in metadata to optimize performance and capacity while
minimizing system overhead. When a child vDisk writes to any block inside an extent group, it
gets its own copy of the parent's metadata for that extent group. This process essentially
eliminates overhead as the snapshot chain grows because each clone has its own copy of the
metadata.
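
A minimal Python sketch contrasting redirect-on-write with the single-lookup read path described above; the RedirectOnWriteVDisk class and its block map are an illustrative model, not the DSF metadata format.

class RedirectOnWriteVDisk:
    def __init__(self, parent_blocks):
        # On the first write after a snapshot or clone, the child gets its own copy
        # of the parent's metadata, so lookups never walk a snapshot chain.
        self.block_map = dict(parent_blocks)

    def write(self, block_id, data):
        self.block_map[block_id] = data  # redirect the write to a new block

    def read(self, block_id):
        return self.block_map[block_id]  # single lookup, regardless of chain depth


parent = {"blk-0": b"base data"}
clone = RedirectOnWriteVDisk(parent)
clone.write("blk-0", b"modified in clone")
print(clone.read("blk-0"))   # modified data, resolved with one metadata lookup
print(parent["blk-0"])       # parent's data is untouched by the clone's write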
Additionally, the distributed metadata system can query multiple vDisks in a single request,
which greatly reduces metadata lookup overhead for blocks that have not yet been written or
updated.
Clones (essentially writable snapshots) are closely related to snapshots. The DSF uses the
same underlying mechanism for cloning that it does for snapshots, so it benefits from the same
metadata optimizations.
8. Conclusion
As discussed, although Nutanix is a clustered system, it manages performance at the node level,
delivering the best possible performance for every guest VM. Data locality, which is unique to
Nutanix, ensures that reads are served locally to VMs even after they move around the cluster.
Performance scales linearly as more nodes are added to the cluster.
Effective data tiering ensures that performance is optimized for every VM according to access
patterns for I/O going to its vDisks. System processes make sure the relevant data is hosted on
the relevant tier to keep the cluster performing at the optimal level.
Prism provides a detailed end-to-end analysis of performance from the same management
platform used to manage the cluster so you don’t need to install additional software specifically
for managing performance.
Appendix
About Nutanix
Nutanix makes infrastructure invisible, elevating IT to focus on the applications and services that
power their business. The Nutanix Enterprise Cloud OS leverages web-scale engineering and
consumer-grade design to natively converge compute, virtualization, and storage into a resilient,
software-defined solution with rich machine intelligence. The result is predictable performance,
cloud-like infrastructure consumption, robust security, and seamless application mobility for a
broad range of enterprise applications. Learn more at www.nutanix.com or follow us on Twitter
@nutanix.
List of Figures
Figure 1: Nutanix Enterprise Cloud
Figure 2: DSF I/O Path
List of Tables
Table 1: Document Version History