Preserving Topological Structure with
Mapper-Based Dimensionality Reduction
Jiahao Lai
Hangzhou Jianxiake Technology Co., Ltd.
Hangzhou, China
laijiahao0430@gmail.com
Abstract. High-dimensional data often presents challenges in visualization and interpretation due to
its complex structures and intricate relationships. Traditional dimensionality reduction techniques, such as
PCA and t-SNE, often struggle to preserve the topological features of such data, leading to the loss of critical
structural information. To address this, we propose a novel dimensionality reduction technique rooted in
topological data analysis, which aims to maintain the intrinsic topological structure while mapping the data
into a lower-dimensional space. Our approach extracts persistent homology groups and critical points as
topological features, ensuring their invariance in the reduced representation, and optimizes the Wasserstein
distance between persistence diagrams using a Mapper-based skeleton. We demonstrate the effectiveness of our method on complex
real-world datasets, showcasing its ability to uncover meaningful structures that are often overlooked by
traditional methods.
Keywords: Topological Data Analysis, Dimensionality Reduction, Persistent Homology,
Mapper Algorithm
1 Introduction
Dimensionality reduction is crucial for analyzing high-dimensional data, as such data often
presents challenges like sparsity and the curse of dimensionality. In high-dimensional Eu-
clidean spaces, data points tend to be sparsely distributed, making it difficult to accurately
capture their structure without a massive number of samples. This sparsity leads to issues
such as the crowding problem, where mapping high-dimensional data into lower dimen-
sions can distort distances and relationships. Additionally, most points in high-dimensional
spaces concentrate near the boundary of the unit hypercube, making Euclidean distance metrics
less effective, as the relative difference between the largest and smallest pairwise distances becomes negligible. These
challenges limit the performance of traditional dimensionality reduction methods like PCA
and t-SNE [1], which often struggle to preserve the underlying structure of the data dur-
ing reduction. As a result, there is a growing need for techniques that can retain essential
structural information while mapping data to lower dimensions.
Given these challenges, dimensionality reduction techniques [2, 3, 1, 4, 5] have become
a focal point of research, offering several key benefits. First, they increase the density of
samples, allowing distance-based algorithms to function effectively. Second, they remove re-
dundancy in high-dimensional data, reducing storage and computational costs—especially
important for large datasets. Third, by mapping data into two or three dimensions, they
enable direct visualization, making it easier to interpret the underlying structure. In di-
mensionality reduction, it is often desirable to preserve the topological structure of high-
dimensional data, maintaining features like connectivity and the presence of loops. How-
ever, traditional methods like PCA and t-SNE lack mechanisms to explicitly preserve these
topological properties, often leading to information loss during the reduction process.
To address this, Topological Data Analysis (TDA) [6–8] has emerged as a powerful
framework for understanding the shape and structure of data beyond mere distances. TDA
focuses on identifying properties that remain unchanged under continuous transformations,
making it well-suited for capturing the intrinsic structure of complex datasets. One of the
key tools in TDA is persistent homology [9], which analyzes how topological features such
as connected components, loops, and voids persist across different scales. By computing
persistent Betti numbers, it provides a robust summary of the topological features present
in data, allowing us to understand the structure of high-dimensional datasets.
Another important tool in TDA is the Mapper algorithm [10], which is used to visu-
alize the shape of high-dimensional data [11]. Mapper constructs a graph-like skeleton by
clustering data using a filter function, revealing structures like branches and loops that
represent significant patterns. It effectively captures topological information that may be
overlooked by traditional methods, making it a versatile tool for exploring complex data
landscapes.
In this work, we propose a novel dimensionality reduction method that leverages TDA
to address the limitations of traditional approaches. Our method focuses on preserving key
topological features by maintaining consistent persistent Betti numbers between the orig-
inal and reduced data. Additionally, we integrate the Mapper skeleton into our approach
to better retain the global structure of high-dimensional data, ensuring that important
patterns and relationships are preserved throughout the reduction process.
2 Related Work
Numerous dimensionality reduction techniques have been developed to represent high-
dimensional datasets [2, 3, 1, 4, 5]. However, a common limitation of these methods is the
distortions they introduce [12, 13, 3], which can vary significantly with changes in hyper-
parameters. These distortions often disrupt the global structure of the data, affecting
relationships between clusters and pairwise distances. In fields like physics and biology,
where accurate data interpretation is essential, such inconsistencies can lead to misleading
conclusions. Moreover, methods like UMAP and t-SNE frequently produce non-canonical,
inconsistent representations, with results highly sensitive to initialization and hyperpa-
rameters [14, 3]. This challenge has led to increasing interest in TDA, which offers tools
to better preserve the intrinsic structure of data across scales.
Recent work in topological data analysis (TDA) has enabled the integration of persis-
tent homology with optimization [15–18], making it a powerful tool for assessing topolog-
ical similarity. Carrière et al. [19] developed methods for differentiating functions based on persistence
diagrams, and Leygonie et al. [20] categorized techniques for regularizing these functions, providing a
foundation for comparing datasets’ topological structures using metrics like the bottle-
neck and Wasserstein distances. Building on this, Rieck and Leitte [21, 22] introduced the
idea of using Wasserstein distance to compare persistence diagrams of high-dimensional
data with their lower-dimensional embeddings. This approach has inspired methods that
iteratively adjust low-dimensional embeddings to minimize topological differences, effec-
tively preserving structural features during dimensionality reduction. Extensions such as
topology-preserving graph autoencoders [23] and differentiable topological layers [24] fur-
ther broaden TDA’s applications in deep learning.
3 Methods
3.1 Preliminaries
Persistent Homology Persistent Homology is a fundamental concept in TDA that identi-
fies and quantifies topological features such as connected components, loops, and voids in
data. Given a point cloud X ⊂ R^n, we construct a simplicial complex K to approximate its shape. Homology groups H_p(K) capture p-dimensional features (e.g., connected components for p = 0, loops for p = 1). The p-th Betti number β_p represents the number of such features, providing a summary of the data's structure. Persistent homology extends this idea by analyzing how these features persist across different scales in a filtration of simplicial complexes {K_i}, where ∅ = K_0 ⊆ K_1 ⊆ · · · ⊆ K_m = K. As the scale increases, features may appear and disappear, which is captured in a persistence diagram: a collection of point pairs (i, j) indicating the birth and death of each feature. The persistence diagram provides a concise summary of the topological changes in the data across scales.
To quantify the difference between two diagrams, we use the Wasserstein distance:
\[
W_{p,q}\left(D_k^{(1)}, D_k^{(2)}\right) = \inf_{\pi : D_k^{(1)} \to D_k^{(2)}} \left( \sum_{x \in D_k^{(1)}} \left\lVert x - \pi(x) \right\rVert_q^p \right)^{1/p},
\]
where ∥·∥_q denotes the q-norm, and π ranges over all bijections between the diagrams D_k^{(1)} and D_k^{(2)}.
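For concreteness, the following is a minimal sketch of this computation, assuming the gudhi library (with POT installed for its Wasserstein module); the two noisy circles are illustrative data, not from the paper:

```python
import numpy as np
import gudhi
from gudhi.wasserstein import wasserstein_distance

def persistence_diagram(points, dim=1, max_edge=2.0):
    # Vietoris-Rips filtration; return (birth, death) pairs in dimension `dim`.
    rips = gudhi.RipsComplex(points=points, max_edge_length=max_edge)
    st = rips.create_simplex_tree(max_dimension=dim + 1)
    st.compute_persistence()
    return st.persistence_intervals_in_dimension(dim)

rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2 * np.pi, 100)
# Two noisy circles: each should exhibit one prominent 1-dimensional feature (a loop).
X = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.standard_normal((100, 2))
Y = 1.1 * np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.standard_normal((100, 2))

# W_{2,2} between the diagrams: order p = 2 with internal 2-norm, matching the equation above.
d = wasserstein_distance(persistence_diagram(X), persistence_diagram(Y),
                         order=2.0, internal_p=2.0)
print(f"W_(2,2) distance between diagrams: {d:.4f}")
```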
Mapper Mapper is a TDA tool that visualizes the shape of high-dimensional data by
creating a simplified graph representation. Given a point cloud X ⊂ R^n and a filter function f : X → R, Mapper projects the data using f and covers the range of f with overlapping intervals {I_i}_{i=1}^m. For each interval I_i, the subset X_i = f^{-1}(I_i) is clustered. The resulting Mapper graph M has nodes representing clusters, with edges between nodes if the clusters share data points. Formally, let {C_{ij}} be the clusters for interval I_i:
\[
M = \bigcup_{i=1}^{m} \left\{ C_{ij} \mid C_{ij} \neq \emptyset \right\},
\]
with an edge between C_{ij} and C_{kl} if C_{ij} ∩ C_{kl} ≠ ∅. The choice of f, interval overlaps, and
clustering method shapes the Mapper graph, making it effective for detecting features like
loops, branches, and connected components in complex data.
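A minimal, self-contained Mapper sketch along these lines is shown below, assuming numpy, networkx, and scikit-learn; the uniform interval cover, the DBSCAN clusterer, and all parameter values are illustrative assumptions rather than choices prescribed by the method:

```python
import numpy as np
import networkx as nx
from sklearn.cluster import DBSCAN

def mapper_graph(X, filter_values, n_intervals=10, overlap=0.5, eps=0.3):
    # Cover the filter range with overlapping intervals, cluster each
    # preimage, and connect clusters that share data points.
    lo, hi = filter_values.min(), filter_values.max()
    length = (hi - lo) / (n_intervals * (1 - overlap) + overlap)
    step = length * (1 - overlap)
    G, clusters = nx.Graph(), []
    for i in range(n_intervals):
        a = lo + i * step
        idx = np.where((filter_values >= a) & (filter_values <= a + length))[0]
        if len(idx) == 0:
            continue
        labels = DBSCAN(eps=eps, min_samples=3).fit_predict(X[idx])
        for lab in set(labels) - {-1}:            # ignore DBSCAN noise points
            members = set(idx[labels == lab])
            node = len(clusters)
            clusters.append(members)
            G.add_node(node, size=len(members))
            for other, other_members in enumerate(clusters[:-1]):
                if members & other_members:       # shared points -> edge
                    G.add_edge(node, other)
    return G, clusters

# Example: filter a 3-D point cloud by its first coordinate.
X = np.random.default_rng(1).standard_normal((500, 3))
G, clusters = mapper_graph(X, X[:, 0])
```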
3.2 Proposed Algorithm
In this work, we develop a dimensionality reduction algorithm that combines persistent
homology with Mapper-based initialization to preserve topological features. Our goal is to
find a low-dimensional embedding Y ⊂ Rd (with d ≪ n) that maintains the topological
structure of the original high-dimensional data X ⊂ Rn . The process involves the following
four steps:
Step 1: Constructing the Mapper Graph. We use the Mapper method to encode the struc-
tural information of X. Using a suitable filter function f : X → R, we create overlapping
intervals and cluster data subsets within each interval, resulting in a simplified Mapper
graph M . The nodes of M , which we refer to as critical points, represent clusters of similar
data points, capturing essential structures and transitions in X. These critical points play
a crucial role in guiding the subsequent dimensionality reduction process.
Step 2: Initializing the Low-Dimensional Embedding Y_mapper. We initialize a low-dimensional representation Y_mapper whose number of points matches the number of nodes (critical points) in the Mapper graph M. This initialization positions Y_mapper within the same structure as M, aligning it with the topology of X. To refine the alignment, we minimize the following loss function:

L(Y_mapper, X_mapper) = W_{2,2}(D(Y_mapper), D(X_mapper)),     (1)

where X_mapper denotes the critical points extracted from X via the Mapper graph, and W_{2,2} is the Wasserstein distance with p = q = 2, comparing the persistence diagrams D(X_mapper) and D(Y_mapper).
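One way to realize this optimization is sketched below. It reuses the persistence_diagram helper and the Mapper output (X, clusters) from the earlier sketches, takes cluster centroids as the critical-point coordinates X_mapper, and uses a gradient-free optimizer, since the paper does not specify how Eq. (1) is minimized; all of these choices are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from gudhi.wasserstein import wasserstein_distance

def step2_loss(y_flat, X_mapper, d=2):
    # Eq. (1): W_{2,2}(D(Y_mapper), D(X_mapper)); D(Y_mapper) is recomputed each call.
    Y = y_flat.reshape(-1, d)
    return wasserstein_distance(persistence_diagram(Y),
                                persistence_diagram(X_mapper),
                                order=2.0, internal_p=2.0)

# Critical-point coordinates: one centroid per Mapper node (an assumed choice).
X_mapper = np.stack([X[list(members)].mean(axis=0) for members in clusters])
Y0 = np.random.default_rng(2).standard_normal((len(X_mapper), 2))
# Nelder-Mead is adequate here because the number of Mapper nodes is small.
res = minimize(step2_loss, Y0.ravel(), args=(X_mapper,), method="Nelder-Mead")
Y_mapper = res.x.reshape(-1, 2)
```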
Step 3: Transition from Y_mapper to Y. After optimizing Y_mapper, which aligns with the critical points of X, we extend this representation to a full low-dimensional embedding Y that matches the original number of data points in X. Each point in X is assigned a position in Y according to its proximity to the nodes of the Mapper graph, so that the topological structure captured in Y_mapper guides the layout of the full embedding Y.
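One plausible reading of this transition, continuing the sketch above: each data point inherits the optimized position of its nearest Mapper node, with small jitter so that coincident points can later separate. The jitter scale is an assumption.

```python
def expand_embedding(X, X_mapper, Y_mapper, scale=0.05, seed=3):
    # Assign every point in X the 2-D position of its nearest Mapper-node
    # centroid, then add small Gaussian jitter to break ties.
    rng = np.random.default_rng(seed)
    d2 = ((X[:, None, :] - X_mapper[None, :, :]) ** 2).sum(axis=-1)
    nearest = d2.argmin(axis=1)          # index of nearest critical point per sample
    return Y_mapper[nearest] + scale * rng.standard_normal((len(X), 2))

Y = expand_embedding(X, X_mapper, Y_mapper)
```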
Step 4: Optimizing the Topological Loss between X and Y. To ensure that the final low-dimensional embedding Y maintains the topological features of the original data X, we define a combined loss function:

L(X, Y) = W_{2,2}(D(X), D(Y)) + λ L_critical(X, Y),     (2)

where W_{2,2} measures the Wasserstein distance between the persistence diagrams D(X) and D(Y), with p = q = 2, and the term L_critical(X, Y) penalizes distortions of the relative Euclidean distances among the critical points. We iteratively update Y to minimize L(X, Y), refining the low-dimensional embedding until it closely matches the topological structure of X.
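The combined objective of Eq. (2) can then be evaluated as below, continuing the previous sketches. The weight λ and the exact form of L_critical are not fixed by the paper; here L_critical is read, as one hedged interpretation of "relative Euclidean distance", as the mismatch between normalized pairwise distances among the critical points in X and in Y.

```python
from scipy.spatial.distance import pdist

def combined_loss(y_flat, X, critical_idx, lam=0.1, d=2):
    # Eq. (2): W_{2,2}(D(X), D(Y)) + lambda * L_critical(X, Y).
    Y = y_flat.reshape(-1, d)
    w = wasserstein_distance(persistence_diagram(X), persistence_diagram(Y),
                             order=2.0, internal_p=2.0)  # D(X) could be cached
    # Normalized pairwise distances among the critical points in both spaces.
    dX, dY = pdist(X[critical_idx]), pdist(Y[critical_idx])
    l_critical = np.mean((dX / dX.max() - dY / dY.max()) ** 2)
    return w + lam * l_critical
```

Minimizing this loss with the same optimizer as in Step 2, starting from the Y produced in Step 3, would realize this final step.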
Our algorithm thus integrates Mapper-based initialization with a topology-preserving
optimization framework, providing a robust solution for dimensionality reduction that
maintains both local and global data structures.
4 Dataset
In this study, we employ two datasets to evaluate the effectiveness of our proposed di-
mensionality reduction algorithm. The MNIST dataset serves as a familiar baseline for
comparison, while the Multiple-Activity Dataset represents a more complex and domain-
specific challenge that forms the core of our analysis.
4.1 MNIST Dataset
The MNIST dataset is a well-known benchmark in the field of machine learning, widely
used for evaluating image processing and dimensionality reduction techniques. It contains
a total of 70,000 grayscale images of handwritten digits (0–9), each represented as a 28×28
pixel image, resulting in a 784-dimensional vector for each sample. In our study, we select
a subset of 2,500 samples from the training set for the purpose of visual assessment, as
our primary goal is to evaluate the visualization capabilities of different dimensionality
reduction methods rather than performing classification tasks. This subset allows us to
examine how well our algorithm preserves the inherent structure of the high-dimensional
data when mapping it into a lower-dimensional space. The MNIST dataset provides a
straightforward benchmark to verify the generalizability of our algorithm to typical image
data.
4.2 Multiple-Activity Dataset
While MNIST serves as a benchmark, the Multiple-Activity Dataset is the central focus
of our study, presenting a more challenging and nuanced problem space. This dataset was
collected using Xsens Dot sensors, which are advanced wearable inertial measurement units
(IMUs) capable of high-precision tracking of human movement. The data captures nine
different indoor activities: Ascending Stairs, Descending Stairs, Jogging, Jumping, Lying,
Pushing an Object, Pushups, Sitting, and Walking. Sensors were strategically placed on
four body parts (left hand, right hand, left arm, and right arm), and each sensor recorded
nine features: Euler_x, Euler_y, Euler_z (Euler angles), Acc_x, Acc_y, Acc_z (acceleration),
and Gyr_x, Gyr_y, Gyr_z (gyroscope readings). The data was sampled at an interval of 16.7
milliseconds, resulting in a dense and high-dimensional time series representation.
Each activity comprises approximately 6,000 sensor records per body part, resulting
in a dataset of over 200,000 samples across all activities. This dataset is inherently more
complex than MNIST due to several factors:
– High Dimensionality and Temporal Nature: Unlike the static image data in
MNIST, the Multiple-Activity Dataset consists of time-series data where each sample
encapsulates a sequence of sensor readings. This requires the dimensionality reduction
method to preserve both spatial and temporal dependencies within the data.
– Sensor Correlation: The data from different body parts are interrelated, as move-
ments are often synchronized or coordinated. Capturing these relationships is crucial
for understanding the underlying dynamics of each activity.
– Activity Variability: The nine activities span a range of motion intensities and pat-
terns, from simple activities like sitting to more dynamic ones like jogging or jumping.
A successful dimensionality reduction technique must distinguish these subtle varia-
tions while maintaining the overall structure of the dataset.
The complexity and richness of this dataset make it a valuable test case for our di-
mensionality reduction algorithm, especially in scenarios where preserving the underlying
structure and temporal dynamics is critical. By applying our method to this dataset, we
aim to demonstrate not only the algorithm’s capacity to simplify high-dimensional data
but also its ability to uncover meaningful patterns in human movement data that would
be challenging for traditional techniques.
5 Experiment
Our experiments evaluate the effectiveness of our proposed dimensionality reduction method
in preserving topological features and structure across different types of high-dimensional
data, with a primary focus on its application to the Multiple-Activity Dataset and using
the MNIST dataset as a baseline for comparison.
5.1 Multiple-Activity Dataset Experiment
The primary set of experiments focuses on the Multiple-Activity Dataset. We begin by
preprocessing the data, removing outliers to reduce noise, and normalizing the data to
ensure a zero-mean distribution. To capture the temporal dynamics of human movements,
we organize the sensor readings into sliding windows of varying sizes. Each window con-
tains a continuous segment of readings, and the window size determines the temporal
resolution of each sample. For example, a window size of 50 represents a time sequence of
50 consecutive readings per sensor, resulting in a 450-dimensional vector (50 readings ×
9 features).
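For reference, a windowing step consistent with this description might look as follows; the stride (and hence the window overlap) is an assumption, since the paper does not state it:

```python
import numpy as np

def sliding_windows(signal, window=50, stride=25):
    # Cut a (T, 9) sensor stream into overlapping windows and flatten each
    # window to a single vector (window * 9 features, e.g. 450 for window=50).
    starts = range(0, len(signal) - window + 1, stride)
    return np.stack([signal[s:s + window].ravel() for s in starts])

# signal: zero-mean-normalized readings of one sensor, shape (T, 9).
signal = np.random.default_rng(4).standard_normal((6000, 9))
windows = sliding_windows(signal)            # shape (num_windows, 450)
```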
Our method projects these high-dimensional windows into a two-dimensional space,
enabling visualization of the activities. We compare the results with classical dimension-
ality reduction techniques such as PCA, multidimensional scaling (MDS),
and ISOMAP. Additionally, we test a hybrid approach, where PCA is used to reduce the
dimensionality to 10 before applying our method to further reduce it to two dimensions
(see Fig. 1).
Fig. 1. Visualization of dimensionality reduction results for nine activities from the Multiple-Activity
Dataset using a window size of 50 readings per sensor. The two-dimensional mapping reveals how different
activities are separated, capturing key temporal dynamics of movements such as jogging and jumping.
We also explore the impact of different window sizes, ranging from 30 to 100 readings,
on the quality of the reduced representations. This allows us to assess how the choice
of temporal resolution affects the separation and visualization of various activities (see
Fig. 2). Our observations indicate that the method performs well in capturing certain
dynamic activities, such as jogging and jumping, while the differentiation between less
dynamic activities remains challenging.
5.2 MNIST Dataset Experiment
To evaluate the generalizability of our method, we also apply it to the MNIST dataset, a
common benchmark in dimensionality reduction and visualization. Using a subset of 2,500
samples, we initially reduce the dimensionality from 784 to 10 using t-SNE, retaining the
primary structural features. Our method is then applied to further reduce the data to two
dimensions, allowing direct comparison with other methods.
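A preprocessing pipeline consistent with this description, sketched with scikit-learn; note that t-SNE with more than three output dimensions requires the exact method, and the subset choice mirrors the one described above:

```python
from sklearn.datasets import fetch_openml
from sklearn.manifold import TSNE

mnist = fetch_openml("mnist_784", version=1, as_frame=False)
X784 = mnist.data[:2500] / 255.0                    # 2,500-sample subset
# Intermediate 10-D embedding; method="exact" is required for n_components > 3.
X10 = TSNE(n_components=10, method="exact", random_state=0).fit_transform(X784)
# The topology-preserving reduction from Section 3 then maps X10 to 2-D.
```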
We assess two initialization strategies: PCA-based initialization (see Fig. 3), which provides a structured starting point, and random initialization (see Fig. 4), which tests the robustness of our approach. The results show that both strategies produce similar visual patterns, suggesting that the method can adapt to different initial conditions, though PCA initialization tends to offer slightly more refined results.

Fig. 2. Dimensionality reduction results for the Multiple-Activity Dataset using varying window sizes (5 to 50 readings). The figure illustrates how different temporal resolutions affect the separation and visualization of activities in a two-dimensional space.

Fig. 3. Left: Visualization of PH dimensionality reduction results with PCA initial weights. Right: Corresponding loss curve showing the change in loss during the optimization process.
5.3 Summary of Experimental Insights
The experiments reveal that the performance of dimensionality reduction techniques, in-
cluding our own, varies depending on the characteristics of the data. Our method showed
strengths in preserving the temporal structure of complex activities in the Multiple-
Activity Dataset, although it did not achieve consistent reductions in topological dis-
tance across all activities. For more static datasets like MNIST, the method effectively
maintained key structural features, but further fine-tuning could improve the separation
between similar classes.
To assess whether topological dimensionality reduction preserves the distributional differences among high-dimensional data from different activities, we employed a random forest classifier to evaluate the separability of the data after reduction. The classification accuracy provides insight into how well the reduced representations distinguish between the various activities.

Fig. 4. Left: Visualization of PH dimensionality reduction results with random initial weights. Right: Corresponding loss curve illustrating the progression of the loss throughout the optimization.
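A separability check of this kind could be run as follows; the embedding Y and the activity labels here are random stand-ins, and the forest size is an arbitrary choice:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Placeholders: Y is the 2-D reduced representation, labels the activity per window.
rng = np.random.default_rng(5)
Y = rng.standard_normal((1000, 2))
labels = rng.integers(0, 9, 1000)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
acc = cross_val_score(clf, Y, labels, cv=5).mean()
print(f"mean 5-fold classification accuracy: {acc:.3f}")
```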
To further explore the reasons behind these results, we introduced a feature extractor
before applying traditional dimensionality reduction methods. This extractor generates 36
features (see Table 1), including 27 time-domain features and 9 frequency-domain features.
As shown in Table 2, these commonly used signal processing features significantly improve
the classification performance on raw data. However, the performance often deteriorates
after dimensionality reduction, suggesting that the feature extraction process may not
align well with the reduced representation.
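A few representative entries of Table 1, sketched with numpy; here P is taken to be the magnitude spectrum of the window's FFT, which is an assumption about the paper's notation:

```python
import numpy as np

def extract_features(s, n_peaks=10):
    # A subset of the Table 1 features for one 1-D signal window s.
    feats = [s.mean(), s.std(), s.min(), s.max(),              # 1-4: mean, STD, min, max
             s.max() - s.min(),                                # 6: value range
             int(np.sum(np.diff(np.sign(s - s.mean())) != 0)), # 7: mean crossings
             float(np.sum(s ** 2))]                            # 28: signal energy
    P = np.abs(np.fft.rfft(s))                 # magnitude spectrum (P in Table 1)
    top = np.argsort(P)[-n_peaks:]
    feats += list(P[top])                      # 8-17: top-10 spectrum peak values
    feats += list(top.astype(float))           # 18-27: corresponding frequency bins
    return np.array(feats)

window = np.random.default_rng(6).standard_normal(50)
print(extract_features(window).shape)          # (27,) for this subset
```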
6 Conclusion
We proposed a novel dimensionality reduction method that leverages topological data
analysis, combining persistent homology and Mapper-based initialization to preserve topo-
logical features. Experiments on the MNIST and Multiple-Activity datasets demonstrated
that our method effectively retains critical structural information, providing improved
visualization and interpretability over traditional techniques. These results highlight the
potential of integrating topological insights into dimensionality reduction for analyzing
complex high-dimensional data.
References
1. L. Van der Maaten and G. Hinton, “Visualizing data using t-SNE,” Journal of machine learning
research, vol. 9, no. 11, 2008.
2. L. McInnes, J. Healy, and J. Melville, “UMAP: Uniform manifold approximation and projection for
dimension reduction,” arXiv preprint arXiv:1802.03426, 2018.
3. Y. Wang, H. Huang, C. Rudin, and Y. Shaposhnik, “Understanding how dimension reduction tools
work: an empirical approach to deciphering t-SNE, UMAP, TriMAP, and PaCMAP for data visual-
ization,” J Mach. Learn. Res, vol. 22, pp. 1–73, 2021.
4. M. Belkin and P. Niyogi, “Laplacian eigenmaps and spectral techniques for embedding and clustering,”
Advances in neural information processing systems, vol. 14, 2001.
5. J. B. Tenenbaum, V. De Silva, and J. C. Langford, “A global geometric framework for nonlinear
dimensionality reduction,” Science, vol. 290, no. 5500, pp. 2319–2323, 2000.
6. S. Barannikov, “The framed Morse complex and its invariants,” Advances in Soviet Mathematics,
vol. 21, pp. 93–115, 1994.
Table 1. Details of the Feature Extractor

Feature ID | Feature | Feature Detail
1 | Mean | μ = (1/K) Σ_{i=1}^{K} S_i
2 | STD | σ = sqrt( (1/K) Σ_{i=1}^{K} (S_i − μ)² )
3 | Min | Min(S)
4 | Max | Max(S)
5 | Mode | The most frequent value in S
6 | Value Range | Max(S) − Min(S)
7 | Mean Crossing | Count of times the signal crosses its mean value
8–17 | Freq. Peak Values | Top 10 frequency-spectrum peak values
18–27 | Top Frequencies | Frequencies of the top 10 frequency-spectrum peaks
28 | Signal Energy | Σ_{i=1}^{K} S_i²
29 | Shape Mean | μ_shape = (1/e) Σ_{i=1}^{N} i · P_i
30 | Shape STD | σ_shape = sqrt( (1/e) Σ_{i=1}^{N} (i − μ_shape)² P_i )
31 | Shape Skewness | γ_shape = (1/e) Σ_{i=1}^{N} ((i − μ_shape)/σ_shape)³ P_i
32 | Shape Kurtosis | ζ_shape = (1/e) Σ_{i=1}^{N} ((i − μ_shape)/σ_shape)⁴ P_i − 3
33 | Amplitude Mean | μ_amp = (1/N) Σ_{i=1}^{N} P_i
34 | Amplitude STD | σ_amp = sqrt( (1/N) Σ_{i=1}^{N} (P_i − μ_amp)² )
35 | Amplitude Skewness | γ_amp = (1/N) Σ_{i=1}^{N} ((P_i − μ_amp)/σ_amp)³
36 | Amplitude Kurtosis | ζ_amp = (1/N) Σ_{i=1}^{N} ((P_i − μ_amp)/σ_amp)⁴ − 3
Table 2. Results of Random Forest Classifier with and without Feature Extraction

Method | PH | PCA+PH | PCA | MDS | ISOMAP
Acc (without Feature Extraction) | 0.377 | 0.294 | 0.433 | 0.392 | 0.197
Acc (with Feature Extraction) | 0.252 | 0.252 | 0.294 | 0.141 | 0.281
7. A. J. Zomorodian, Computing and comprehending topology: Persistence and hierarchical Morse complexes (Ph.D. thesis). University of Illinois at Urbana-Champaign, 2001.
8. F. Chazal and B. Michel, “An introduction to topological data analysis: fundamental and practical
aspects for data scientists,” arXiv preprint arXiv:1710.04019, 2017.
9. A. Zomorodian and G. Carlsson, “Computing persistent homology,” in Proceedings of the twentieth
annual symposium on Computational geometry, 2004, pp. 347–356.
10. R. Kraft, “Illustrations of data analysis using the mapper algorithm and persistent homology,” 2016.
11. G. Singh, F. Mémoli, G. E. Carlsson et al., “Topological methods for the analysis of high dimensional
data sets and 3d object recognition.” PBG@ Eurographics, vol. 2, 2007.
12. T. Chari, J. Banerjee, and L. Pachter, “The specious art of single-cell genomics,” bioRxiv, 2021.
13. J. Batson, C. G. Haaf, Y. Kahn, and D. A. Roberts, “Topological obstructions to autoencoding,”
Journal of High Energy Physics, vol. 2021, no. 4, pp. 1–43, 2021.
14. D. Kobak and G. C. Linderman, “Initialization is critical for preserving global data structure in both
t-SNE and UMAP,” Nature biotechnology, vol. 39, no. 2, pp. 156–157, 2021.
15. O. Kachan, “Persistent homology-based projection pursuit,” in Proceedings of the IEEE/CVF Con-
ference on Computer Vision and Pattern Recognition Workshops, 2020, pp. 856–857.
16. A. Wagner, E. Solomon, and P. Bendich, “Improving metric dimensionality reduction with distributed
topology,” arXiv preprint arXiv:2106.07613, 2021.
17. M. Moor, M. Horn, B. Rieck, and K. Borgwardt, “Topological autoencoders,” in International confer-
ence on machine learning. PMLR, 2020, pp. 7045–7054.
18. B. J. Nelson and Y. Luo, “Topology-preserving dimensionality reduction via interleaving optimization,”
arXiv preprint arXiv:2201.13012, 2022.
19. M. Carrière, F. Chazal, M. Glisse, Y. Ike, H. Kannan, and Y. Umeda, “Optimizing persistent homology
based functions,” in International Conference on Machine Learning. PMLR, 2021, pp. 1294–1303.
20. J. Leygonie, S. Oudot, and U. Tillmann, “A framework for differential calculus on persistence bar-
codes,” Foundations of Computational Mathematics, pp. 1–63, 2021.
21. B. Rieck and H. Leitte, “Persistent homology for the evaluation of dimensionality reduction schemes,”
in Computer Graphics Forum, vol. 34, no. 3. Wiley Online Library, 2015, pp. 431–440.
22. ——, “Agreement analysis of quality measures for dimensionality reduction,” in Topological Methods
in Data Analysis and Visualization. Springer, 2015, pp. 103–117.
23. Z. Luo, C. Xu, Z. Zhang, and W. Jin, “A topology-preserving dimensionality reduction method for
single-cell rna-seq data using graph autoencoder,” Scientific reports, vol. 11, no. 1, pp. 1–8, 2021.
24. K. Kim, J. Kim, M. Zaheer, J. Kim, F. Chazal, and L. Wasserman, “Pllay: efficient topological layer
based on persistent landscapes,” Advances in Neural Information Processing Systems, vol. 33, pp.
15965–15977, 2020.