Autoencoders and Their Applications in Machine Learning
https://doi.org/10.1007/s10462-023-10662-6
Abstract
Autoencoders have become a hot research topic in unsupervised learning due to their
ability to learn data features and act as a dimensionality reduction method. Despite the rapid
evolution of autoencoder methods, there has yet to be a complete study that provides a
full roadmap of autoencoders, both to stimulate technical improvements and to orient
newcomers to the field. In this paper, we present a comprehensive survey of
autoencoders, starting with an explanation of the principle of the conventional autoencoder and
its primary development process. We then provide a taxonomy of autoencoders based
on their structures and principles and thoroughly analyze and discuss the related models.
Furthermore, we review the applications of autoencoders in various fields, including
machine vision, natural language processing, complex networks, recommender systems,
speech processing, anomaly detection, and others. Lastly, we summarize the limitations of
current autoencoder algorithms and discuss the future directions of the field.
* Fatemeh Daneshfar
  f.daneshfar@uok.ac.ir
    Kamal Berahmand
    kamal.berahmand@hdr.qut.edu.au
    Elaheh Sadat Salehi
    e.salehi@cse.shirazu.ac.ir
    Yuefeng Li
    y2.li@qut.edu.au
    Yue Xu
    yue.xu@qut.edu.au
1   School of Computer Science, Faculty of Science, Queensland University of Technology (QUT), Brisbane, Australia
2   Department of Computer Engineering, University of Kurdistan, Sanandaj, Iran
3   Department of Electrical and Computer Engineering, University of Shiraz, Shiraz, Iran
28   Page 2 of 52                                        K. Berahmand et al.
List of symbols
X	The input
X ′	The reconstructed output
X̂′	The noisy input
Z	The hidden representation of the input data
L	The graph Laplacian matrix
W	Non-negative matrices (basis vectors)
H	Non-negative matrices (coefficients or activations)
We	The encoder weight matrix
Wd	The decoder weight matrix
D	The distances matrix between neighbors
N	The number of data points
E	The expectation operator
𝜆	The regularization parameter
KL(.||.)	The Kullback–Leibler divergence
p(.)	The probability distribution
q(.)	The approximate probability distribution of p(.)
f(.)	The encoder function
g(.)	The decoder function
tr(.)	The trace of the matrix
D(.)	The discriminator’s output for a real data point
G(.)	The generator’s output for the latent variable
‖.‖	The 2-norm of a vector
‖.‖F	The Frobenius norm
‖X − X′‖_F^2	The reconstruction loss
Abbreviations
AA	Adversarial Autoencoder
AAE	Adversarial Autoencoder
AE	Autoencoder
AGAE	Adversarial Graph Autoencoder
BAE	Bayesian Autoencoder
BCE	Binary Cross-Entropy
BiRNNAE	Bidirectional Autoencoder
CAE	Contractive Autoencoder
CAE	Convolutional Autoencoder
CNN	Convolutional Neural Network
CVAE	Convolutional Variational Autoencoder
CSAE	Convolutional Sparse Autoencoder
DAE	Denoising Autoencoder
DVAE	Disentangled Variational Autoencoder
GAE	Graph Autoencoder
GAAE	Graph Attentional Autoencoder
GCN	Graph Convolution Network
GMAE	Graph Masked Autoencoder
GPU	Graphics Processing Unit
GRUAE	GRU Autoencoder
ISOMAP	Isometric Feature Mapping
Autoencoders and their applications in machine learning: a…                 Page 3 of 52   28
1 Introduction
Dimension reduction is crucial in machine learning for simplifying complex data sets
(Van Der Maaten et al. 2009), reducing computational complexity (Ray et al. 2021),
and mitigating the curse of dimensionality (Talpur et al. 2023), ultimately improving
model performance and interpretability. Dimension reduction encompasses two primary
approaches: feature selection (Solorio-Fernández et al. 2022), which involves choosing a
subset of the most informative features from the original dataset to reduce dimensionality
while maintaining interpretability; and feature extraction (Li et al. 2022), a method where
new, lower-dimensional features are derived from the original data to capture essential
patterns and relationships.
   Feature extraction comprises both linear and nonlinear techniques that transform the
original data into a lower-dimensional representation. Linear feature extraction methods such as
Factor Analysis (FA) (Garson 2022), Linear Discriminant Analysis (LDA) (Balakrishnama
and Ganapathiraju 1998), Principal Component Analysis (PCA) (Abdi and Williams
2010) and Non-negative Matrix Factorization (NMF) (Lee and Seung 2000) transform
the input data into a new set of features using linear combinations of the
original input features (Wang et al. 2023).
   Linear methods are relatively straightforward and computationally efficient. They
often provide interpretable results, making it easier to understand the importance
of each feature, and are effective when the underlying relationships in the data
are approximately linear. However, they capture global correlations, and result in
Fig. 1 Categorization of feature extraction methods into linear and non-linear approaches
Table 1  Methods for dimensionality reduction
Method type	Method	Loss function L	Description
Linear	FA (Shrestha 2021)	$\min(-0.5 \log|\Psi| + \mathrm{tr}(S\Psi^{-1}) + 0.5dp \log(2\pi))$	Explains patterns of correlations among observed variables by uncovering underlying latent factors
Linear	PCA (Hasan and Abdulazeez 2021)	$\max_{W^T W = I} \mathrm{trace}(W^T A W)$	Optimizes the projection of data onto its principal components by maximizing the variance along those components
Linear	LDA (Li et al. 2020)	$\min \left( \frac{w^T S_b w}{w^T S_w w} \right)$	Maximizes the separation between classes while minimizing the variance within each class
Linear	NMF (Wang et al. 2023)	$\min_{W,H \ge 0} \|X - WH\|_F^2$	Decomposes a non-negative matrix into two lower-dimensional non-negative matrices
Nonlinear	LLE (Miao et al. 2022)	$\min \left( \sum_i \| x_i - \sum_j w_{ij} x_j \|^2 \right)$	Seeks to preserve the local linear relationships between data points in a lower-dimensional space
Nonlinear	ISOMAP (Ding et al. 2022)	$\min \left( \|D - \hat{D}\|^2 \right)$	Constructs a low-dimensional representation of data while preserving the geodesic distances between data points on a manifold-like structure
Nonlinear	t-SNE (Meyer et al. 2022)	$\min \left( \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}} \right)$	Preserves the pairwise similarity relationships between data points in a lower-dimensional space
Nonlinear	AE (Bank et al. 2023)	$\min \left( \|X - X'\|_F^2 \right)$	Aims to encode and subsequently decode data, facilitating dimensionality reduction and feature extraction
Nonlinear	RNN (Shi et al. 2022)	–	Captures temporal dependencies from sequential data passed recursively through hidden layers
Nonlinear	CNN (Molaei et al. 2022)	–	Processes structured grid data by applying convolutional layers to automatically extract features
Fig. 2  Papers indexed in Google Scholar, Web of Science, and arXiv since 2012 with the keywords
"Autoencoders" and "Machine Learning"
   AEs offer a powerful set of capabilities, but they also come with certain drawbacks
that should be considered. One of the main drawbacks of using AEs is that they are sensitive
to the choice of hyperparameters, such as the number and size of layers, the learning rate,
the loss function, and the regularization. These hyperparameters can affect the performance
and the quality of the autoencoder, and may require trial and error or grid search to find
the optimal values (Bank et al. 2020). Another common concern with AEs is their lack of
robustness. They can be sensitive to noisy data, outliers, and variations in input, which can
lead to suboptimal representations and reconstructions (Singh and Ogunfunmi 2022). AEs
can be prone to overfitting, especially when trained on limited data. Additionally, they may
not inherently preserve the spatial or temporal locality of data during training. This can
be problematic for tasks where preserving the local structure is essential, such as image
segmentation or sequence modeling (Liu et al. 2023). Furthermore, AEs tend to capture
lower-order features and may struggle to represent complex, higher-order relationships in
the data. This limitation can impact their performance on tasks that require understanding
intricate dependencies (Miuccio et al. 2022).
   In recent years, substantial research efforts have been dedicated to addressing these
drawbacks through advancements in deep learning and AE techniques. Architectures
proposed in this area include regularized AEs, robust AEs, generative
AEs, convolutional AEs, recurrent AEs, semi-supervised AEs, graph AEs, and masked AEs.
As a result of these improvements, interest in autoencoder algorithms in machine
learning has grown steadily over the years, as shown in Fig. 2. The
graph shows the trend of papers published with the keywords "autoencoder" and "machine
learning" since 2012, revealing that over 90% of all indexed papers were published
between 2018 and 2023.
   Despite the importance of this research area, there is currently a lack of
comprehensive studies exploring the applications of AE algorithms in machine learning
on a wide scale. Existing review papers have examined specific themes, but none
offers a comprehensive review. In Table 2, we compare our contribution in
this paper to the descriptions of existing review papers in the field.
   To address this knowledge gap, our review focuses on three key research
questions:
• What are the different types of AE algorithms that have been developed and utilized
  in machine learning applications?
• What are the main methodological frameworks and the latest achievements in the
  application of AE algorithms?
• What are the gaps and future directions in this field, and how can they be addressed
  to enhance the effectiveness of AE algorithms in machine learning applications?
Table 2  Comparison of our article with the previous review or survey articles
Paper	Year	Brief description	Aspects not considered
Sagha et al. (2017)	2017	The article provides a comprehensive review of existing literature and studies that have utilized stacked denoising autoencoders for	Categorization of autoencoder taxonomies and applications. Comprehensiveness
Pratella et al. (2021)	2021	The review discusses several types of autoencoders, the advantages, and disadvantages of each algorithm, and provides examples of how they can be applied to rare disease diagnosis	Autoencoder applications in ML
Song et al. (2021)	2021	It proposes the use of autoencoders as a technique for network intrusion detection. The authors conduct experiments on a dataset of network traffic, comparing the performance of autoencoders to traditional anomaly detection techniques	Applications of autoencoder techniques in ML
Qian et al. (2022)	2022	The article provides an overview of fault detection and diagnosis, and then discusses the use of autoencoders for feature extraction in industrial processes. It covers different types of autoencoders, and how they can be used for fault detection and diagnosis	Comprehensiveness. Autoencoder techniques in ML
Shankar and Parsana (2022)	2022	The paper provides an overview and empirical comparison of different NLP models and introduces and empirically applies autoencoder models in the marketing domain	Categorization of autoencoder taxonomies and applications. Comprehensiveness
Singh and Ogunfunmi (2022)	2022	The paper provides an overview of VAEs and their applications	Categorization of autoencoder taxonomies. Autoencoder
Table 2  (continued)
them into distinct categories based on their architecture.
This paper is organized as follows. Section 2 provides a concise overview of the structure
and hyperparameters of AEs. Section 3 discusses various taxonomies of AEs that have
been proposed in the literature. In Sect. 4, we review previous applications of AEs in the
machine learning domain, categorizing them according to the task they were used for. In
Sect. 5, we explore publicly available software and platforms that can be used to
construct and develop AEs and review the performance of various autoencoders. Section 6 is dedicated
to discussing future directions in the field. Finally, in Sect. 7, we present our conclusions
based on the insights gathered from our analysis.
2 Background of autoencoder
The AE is a fundamental building block that can be stacked hierarchically to create deep models.
AEs organize, compress, and extract high-level features, allowing unsupervised learning
and the extraction of non-linear features (Chen and Guo 2023). Autoencoders have
advantages over Restricted Boltzmann Machines (RBMs) as they can learn more complex
data representations. RBMs are widely used for generating various data types, including
images (Hinton et al. 2006). RBMs are a type of Boltzmann Machine (BM) that learns a
probability distribution from inputs (Chen and Guo 2023). The main difference between
Autoencoders, RBMs, and BMs lies in their architectures. AEs have an encoder and a
decoder, while RBMs consist of visible and hidden layers. Boltzmann Machines (BMs)
are more general and fully connected, making them less tractable compared to RBMs.
AEs are feed-forward neural networks, allowing information to flow in one direction. In
contrast, RBMs and BMs are generative models capable of generating new samples from
the learned distribution.
2.1 Vanilla autoencoder
   During the encoding step, an AE maps an input vector X to a code vector Z using an
encoding function f_θ. In the decoding step, it maps the code vector Z back to the output
vector X′, aiming to reconstruct the input data using a decoding function g_θ. AEs adjust
the network’s weights (W) through fine-tuning, achieved by minimizing the reconstruction
error L between X and the reconstructed data X′. This reconstruction error acts as a loss
function used to optimize the network’s parameters (Chai et al. 2019). The objective
function of an AE can be written as:
$$\min_{\theta} J_{AE}(\theta) = \min_{\theta} \sum_{i=1}^{n} l(x_i, x_i') = \min_{\theta} \sum_{i=1}^{n} l(x_i, g_{\theta}(f_{\theta}(x_i))) \tag{1}$$
where x_i represents the i-th training sample, x_i′ represents the corresponding output,
and n is the total number of training samples. The term l refers to the
reconstruction error between the input and output, defined as:
$$L(X, X') = \sum_{i=1}^{n} \|X_i - X_i'\|^2 \tag{2}$$
The encoder and decoder mapping functions are Z = f_θ(X) = s(WX + b) and
X′ = g_θ(Z) = s(W′Z + b′), where s is a non-linear activation function such as sigmoid
or ReLU, W and W′ are weight matrices, and b and b′ are bias vectors. During training,
the weights and biases of the autoencoder are adjusted to minimize the reconstruction
error using an optimization algorithm like stochastic gradient descent. Once trained, the
encoding function can create low-dimensional representations of new input data (Z),
while the decoding function can reconstruct the original data from the low-dimensional
representation (X′).
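The encoder, decoder, and reconstruction error described above can be illustrated with a minimal sketch in plain Python. It assumes sigmoid activations for both mappings, untied weights, and per-sample stochastic gradient descent on the squared error of Eq. (2); the function names (`layer`, `init_params`, `reconstruction_loss`, `train`), the learning rate, and the toy data are illustrative choices, not taken from the survey.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def layer(W, b, x):
    """Return s(W x + b) for a list-of-rows weight matrix W."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]


def init_params(n_in, n_hidden, seed=0):
    """Small random weights; returns [W, b, W_dec, b_dec] (hypothetical layout)."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    W_dec = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    return [W, [0.0] * n_hidden, W_dec, [0.0] * n_in]


def reconstruction_loss(X, params):
    """Eq. (2): sum over samples of the squared Euclidean error."""
    W, b, W_dec, b_dec = params
    total = 0.0
    for x in X:
        x_rec = layer(W_dec, b_dec, layer(W, b, x))  # X' = g(f(X))
        total += sum((a - r) ** 2 for a, r in zip(x, x_rec))
    return total


def train(X, params, lr=0.5, epochs=300):
    """Per-sample gradient descent on the squared error; updates params in place."""
    W, b, W_dec, b_dec = params
    for _ in range(epochs):
        for x in X:
            z = layer(W, b, x)
            x_rec = layer(W_dec, b_dec, z)
            # Gradient of the squared error w.r.t. the decoder pre-activations.
            d_out = [2.0 * (r - a) * r * (1.0 - r) for a, r in zip(x, x_rec)]
            # Back-propagate to the encoder pre-activations (before any update).
            d_hid = [sum(d_out[k] * W_dec[k][j] for k in range(len(x)))
                     * z[j] * (1.0 - z[j]) for j in range(len(z))]
            for k, dk in enumerate(d_out):
                for j in range(len(z)):
                    W_dec[k][j] -= lr * dk * z[j]
                b_dec[k] -= lr * dk
            for j, dj in enumerate(d_hid):
                for i in range(len(x)):
                    W[j][i] -= lr * dj * x[i]
                b[j] -= lr * dj
    return params
```

After training, `layer(W, b, x)` plays the role of the encoding function and returns the low-dimensional code Z for a new input, while applying `layer(W_dec, b_dec, z)` corresponds to the decoding function.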
2.2 Stacked autoencoder
Stacked Autoencoder follows a layer-wise approach (Hoang and Kang 2019; Hinton et al.
2006). After training layer 1, it serves as the input for training layer 2. When evaluating
the reconstruction loss, it is assessed relative to layer 1 rather than the input layer. The
encoding process can be mathematically represented as follows:
in which k represents the k-th autoencoder, ak represents the encoding outcome of the k-th
autoencoder, and when k = 1, a0 = x denotes the input data. The decoding process can be
mathematically represented as follows:
2.3 Hyperparameters in autoencoder
Autoencoders come with various hyperparameters that must be defined prior to training,
and their values can significantly influence the model’s performance. It’s crucial to
understand that certain hyperparameters are usually set before training and remain
constant, while others can be dynamically tuned during training to optimize the model’s
performance. Selecting and adjusting hyperparameters often involves experimentation
and validation to achieve the best results for a particular task. The following outlines the
most common hyperparameters in autoencoders:
• Number of Hidden Layers: The quantity of hidden layers within the autoencoder
  defines its network depth and its capacity to capture intricate data patterns. This
  parameter is configured before training. While adding more hidden layers can
  enhance the model’s representational power, it may also introduce optimization
  challenges and elevate the risk of overfitting.
• Number of Neurons in Each Layer: The number of neurons in each layer governs
  the network’s data representation capacity and is typically set before training. A
  higher count of neurons can amplify the network’s capacity but might also elevate
  the risk of overfitting and complicate the optimization process.
• Size of Latent Space: Adjusting the size of the bottleneck layer permits fine-tuning
  the balance between model complexity and performance. This parameter is set prior
  to training.
• Activation Function: The activation function utilized in the bottleneck layer plays
  a pivotal role in the autoencoder’s performance. To optimize the autoencoder’s
  performance, the bottleneck layer activation function should be tailored before
  training. These functions determine the network’s nonlinearity and its ability to
  learn intricate data patterns. Common activation functions employed in bottleneck
  layers encompass sigmoid, tanh, ReLU, and SELU. Further details, including their
  equations, outputs, and output curves, are outlined in Table 3.
Table 3 (surviving row)  SELU	$f(x) = \lambda \begin{cases} x & \text{if } x > 0 \\ \alpha e^{x} - \alpha & \text{if } x \le 0 \end{cases}$	[−2, ∞]
• Objective Function: The objective function, also known as the loss function, is a
  critical element of an autoencoder, serving to train the network by minimizing the
  distinction between input and output data. It gauges the dissimilarity between the
  input and output data, and the autoencoder is trained to diminish this dissimilarity.
  The selection of the objective function hinges on the data type and the specific
  application and is generally determined before training. Common objective functions
  used in autoencoders include:
     – Mean Squared Error (MSE): This is the predominant objective function in
       autoencoders, measuring the average squared difference between input and output
       data. MSE is defined by Eq. (4):
$$L_{AE}(X, X') = \min\left(\|X - X'\|_F^2\right) \tag{4}$$
	  When choosing an autoencoder loss function, consider the problem’s unique needs.
  MSE suits regression-style reconstruction of continuous data, but it is sensitive to outliers and to data
  scaling. BCE is for binary classification but can be numerically unstable when predicted probabilities are near 0 or 1.
  The choice depends on the problem and task requirements. MSE is the
3 Autoencoder taxonomy
3.1 Regularized autoencoder
3.1.1 Sparse autoencoder
This combined penalty term encourages the model to acquire a sparse representation,
wherein only a limited number of neurons are active for each input.
3.1.2 Contractive autoencoder
Contractive Autoencoder (CAE) (Rifai et al. 2011) is an autoencoder that aims to produce
similar representations for similar input data by adding a penalty term to the loss function.
This penalty term, based on the Frobenius norm of the Jacobian matrix of the encoder
with respect to the input data, encourages local stability in the learned representation. The
primary objective of the CAE is to minimize the difference between the input data and
the reconstructed data while taking the penalty term into account. The overall loss
function of the CAE combines the reconstruction loss and the penalty term as follows:
$$L_{CAE}(X, X') = \min\left(\|X - X'\|_F^2 + \lambda \|J_F(X)\|_F^2\right) \tag{8}$$
where ‖J_F(X)‖_F^2 represents the squared Frobenius norm of the Jacobian matrix of the
encoded representation with respect to the input data. This norm measures the sensitivity of
the encoded representation to small variations in the input, calculated as:
$$\|J_F(X)\|_F^2 = \sum_{i,j} \left(\frac{\partial h_j(X)}{\partial X_i}\right)^2 \tag{9}$$
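As an illustration of Eq. (9), the sketch below evaluates the contractive penalty for a single input under the assumption of a one-layer sigmoid encoder h = s(WX + b), for which each Jacobian entry has the closed form ∂h_j/∂X_i = h_j(1 − h_j)W_ji. The encoder form and the function names are assumptions made for this example; a finite-difference version is included only as a sanity check on the closed form.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def encoder(x, W, b):
    """h = s(W x + b), with W given as a list of rows (assumed encoder form)."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
            for row, bj in zip(W, b)]


def contractive_penalty(x, W, b):
    """Squared Frobenius norm of the encoder Jacobian, Eq. (9).

    For sigmoid units, dh_j/dx_i = h_j * (1 - h_j) * W[j][i], so the sum over
    i and j factorises into a per-unit term times the squared row norm of W.
    """
    h = encoder(x, W, b)
    return sum((hj * (1.0 - hj)) ** 2 * sum(w * w for w in row)
               for hj, row in zip(h, W))


def contractive_penalty_numeric(x, W, b, eps=1e-6):
    """Same quantity via central finite differences (for checking only)."""
    total = 0.0
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        h_plus, h_minus = encoder(x_plus, W, b), encoder(x_minus, W, b)
        total += sum(((p - m) / (2.0 * eps)) ** 2 for p, m in zip(h_plus, h_minus))
    return total
```

In a training loop, this value would be multiplied by 𝜆 and added to the reconstruction loss, as in Eq. (8).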
3.1.3 Laplacian autoencoder
The standard Autoencoder may not emphasize the relationships between nearby data
points during its learning process, which can lead to extracted features lacking crucial
information about the data’s internal structure. In contrast, the Laplacian Autoencoder
prioritizes preserving the distances between neighboring data points, effectively capturing
the significant internal structure within the data. Inspired by this concept, the Laplacian
Autoencoder (LAE) (Jia et al. 2015) was introduced to facilitate the generation of lower-
dimensional representations for Autoencoders. This approach ensures that the learned
representations incorporate essential local structural information, enhancing their
suitability for specific data analysis tasks. The loss function for the Laplacian Autoencoder
is defined as follows:
$$L_{LAE}(X, X') = \min\left(\|X - X'\|_F^2 + \lambda\, \mathrm{tr}(Z^{T} L Z)\right) \tag{10}$$
where the matrix L, known as the graph Laplacian, is calculated based on how similar pairs of
data points are in the latent space. This calculation typically involves techniques like using k-nearest
neighbor graphs or Gaussian kernels.
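The regularizer tr(ZᵀLZ) in Eq. (10) can be sketched as follows. The example assumes a dense Gaussian-kernel affinity matrix A over a supplied set of points and the unnormalized Laplacian L = D − A, where D is the diagonal degree matrix; both choices, the bandwidth `sigma`, and the function names are illustrative assumptions. The helper `pairwise_form` evaluates the equivalent expression ½ Σ_ij A_ij ‖z_i − z_j‖², which makes explicit that the penalty pulls the codes of similar points together.

```python
import math


def gaussian_affinity(points, sigma=1.0):
    """Symmetric affinity A[i][j] = exp(-||p_i - p_j||^2 / (2 sigma^2)), zero diagonal."""
    n = len(points)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                A[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return A


def graph_laplacian(A):
    """Unnormalized Laplacian L = D - A, with D the diagonal degree matrix."""
    n = len(A)
    return [[(sum(A[i]) if i == j else 0.0) - A[i][j] for j in range(n)]
            for i in range(n)]


def laplacian_penalty(Z, L):
    """tr(Z^T L Z), where each row of Z is the latent code of one data point."""
    n, k = len(Z), len(Z[0])
    total = 0.0
    for c in range(k):
        col = [Z[i][c] for i in range(n)]
        total += sum(col[i] * L[i][j] * col[j] for i in range(n) for j in range(n))
    return total


def pairwise_form(Z, A):
    """Equivalent form: 0.5 * sum_ij A_ij * ||z_i - z_j||^2."""
    n = len(Z)
    return 0.5 * sum(A[i][j] * sum((a - b) ** 2 for a, b in zip(Z[i], Z[j]))
                     for i in range(n) for j in range(n))
```

A k-nearest-neighbor graph could replace the dense Gaussian affinity by zeroing all but the k largest entries in each row and symmetrizing the result.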
3.1.4 Orthogonal autoencoder
where I is the identity matrix, Z^T represents the transpose of the compressed representation
Z, and 𝜆 is a penalization parameter. Notably, setting 𝜆 to zero yields a conventional
autoencoder.
3.2 Robust autoencoder
3.2.1 Denoising autoencoder
where X represents the clean input data, and X̂′ denotes the noisy input data.
where W signifies the learned transformation matrix, and m represents the total number of
input examples.
    The M-DAE seeks the best solution for W, which can be expressed mathematically
as:
L2,1 Robust Autoencoder (L2,1-RAE) (Li et al. 2018) is a modified version of the Robust
Autoencoder (RAE) designed to enhance the autoencoder’s resilience when dealing
with noisy or corrupted input data. This enhancement is achieved through the use of
a specific type of regularization known as L2,1 regularization. L2,1 regularization
encourages the learned features to possess specific properties. Notably, it promotes
feature sparsity, meaning that most features consist of zeros, and robustness, enabling
them to handle scenarios with data outliers or noise. The mathematical expression of
the L2,1-RAE loss function is given as follows:
$$L_{2,1\text{-}RAE}(X, X') = \min\left(\|X - X'\|_F^2 + \lambda \cdot \|Z\|_{2,1}\right) \tag{15}$$
where ‖Z‖2,1 represents the L2,1-norm of the latent representations, which emphasizes both
sparsity and robustness in these learned features.
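To make the penalty in Eq. (15) concrete, the following sketch computes an L2,1-norm and the resulting loss for given matrices. It assumes the common row-wise convention, in which ‖Z‖_{2,1} is the sum of the Euclidean norms of the rows of Z; some works sum over columns instead, so the orientation here is an assumption, as are the function names.

```python
import math


def l21_norm(Z):
    """Sum over rows of the Euclidean norm of each row (row-wise convention assumed)."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in Z)


def frobenius_sq(A, B):
    """Squared Frobenius norm of A - B for equally shaped lists of rows."""
    return sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def l21_rae_loss(X, X_rec, Z, lam):
    """Value of the objective in Eq. (15) for fixed X, X' and Z."""
    return frobenius_sq(X, X_rec) + lam * l21_norm(Z)
```

Because each row contributes its unsquared norm, the penalty drives entire rows toward zero rather than shrinking all entries uniformly, which is the source of the sparsity described above.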
3.3 Generative autoencoder
3.3.1 Variational autoencoder
Variational Autoencoder (VAE) (An and Cho 2015) is a type of autoencoder that learns
to represent data in a lower-dimensional latent space and generate new data samples that
resemble the input. Unlike traditional autoencoders, VAEs are generative models that
can capture the underlying distribution of input data. In a VAE, the encoder maps input
data to a posterior distribution q(Z|X) instead of a fixed latent representation Z. During
reconstruction, Z is sampled from this distribution and passed through a decoder. The
regularization loss in VAE encourages q(Z|X) to match a specific distribution, often a
standard Gaussian. The VAE loss function is defined as:
$$L_{VAE} = -E_{q(Z|X)}\left[\log p(X|Z)\right] + KL\left(q(Z|X)\,\|\,p(Z)\right) \tag{16}$$
where the first term, the expected negative log-likelihood of the input under the decoder
distribution p(X|Z), measures the difference between the original input data and the data
reconstructed by the decoder. The second term, a regularization component, quantifies the
KL divergence between q(Z|X) and p(Z), typically a standard Gaussian distribution. This
loss function guides VAE training to balance accurate data reconstruction with a structured
latent space for generative purposes.
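A minimal sketch of the two terms in Eq. (16) is given below, under two assumptions that the text does not fix: q(Z|X) is a diagonal Gaussian parameterized by a mean vector and a log-variance vector, and the decoder likelihood p(X|Z) is Gaussian with unit variance, so that the negative log-likelihood reduces to half the squared error up to an additive constant. With p(Z) a standard Gaussian, the KL term has the closed form ½ Σ (μ² + σ² − log σ² − 1). Function names are illustrative.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)) in closed form."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))


def sample_latent(mu, log_var, rng):
    """Draw Z = mu + sigma * eps with eps ~ N(0, I) (reparameterized sample)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def vae_loss(x, x_rec, mu, log_var):
    """Single-sample estimate of Eq. (16).

    Assumes a unit-variance Gaussian decoder, so -log p(x|z) equals
    0.5 * ||x - x'||^2 plus a constant that is dropped here.
    """
    recon = 0.5 * sum((a - r) ** 2 for a, r in zip(x, x_rec))
    return recon + kl_to_standard_normal(mu, log_var)
```

The KL term vanishes exactly when the encoder outputs zero mean and unit variance, which is the sense in which it pulls q(Z|X) toward the standard Gaussian prior.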
3.3.2 Adversarial autoencoder
$$L_{AAE}(X, X') = \min\left(\|X - X'\|_F^2 + \log(D(X)) + \log(1 - D(G(Z)))\right) \tag{17}$$
where G(Z) is the decoder function that converts the latent representation back to the
original input data, and D(X) represents the discriminator’s output for the original input
data. The term log(1 − D(G(Z))) reflects the discriminator’s output for data generated by
the decoder.
3.3.3 Bayesian autoencoder
Bayesian Autoencoder (BAE) (Yong and Brintrup 2022) is a probabilistic AE that models
all parameters, in contrast to the Variational Autoencoder (VAE) that mainly models the
latent layer. BAE combines a Gaussian likelihood for data reconstruction with an isotropic
Gaussian prior for parameter uncertainty. The loss function maximizes data likelihood and
minimizes model complexity. The BAE loss function is defined as:
$$\log p(x \mid \theta) = -\left(\frac{1}{D}\sum_{i=1}^{D} \frac{1}{2\sigma_i^2}(x_i - x_i')^2 + \frac{1}{2}\log \sigma_i^2\right) \tag{18}$$
where 𝜎i2 is the variance of the Gaussian distribution, and log p(x|𝜃) represents the log-
likelihood of observing the original data x given the model parameters 𝜃 . It quantifies data
reconstruction through squared errors and variances while promoting model simplicity.
The training objective is to maximize this log-likelihood while minimizing regularization
to find optimal parameters 𝜃 for effective data pattern and uncertainty capture.
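A small numerical sketch of the Gaussian log-likelihood in Eq. (18) follows. It assumes that both the squared-error term and the log-variance term sit inside the sum over the D dimensions, which is one reading of the formula; the function name and the example values are illustrative.

```python
import math

def gaussian_log_likelihood(x, x_rec, var):
    # Per-dimension Gaussian log-likelihood, averaged over the D dimensions.
    # Assumption: both terms are inside the sum over i.
    D = len(x)
    total = sum((xi - ri) ** 2 / (2.0 * vi) + 0.5 * math.log(vi)
                for xi, ri, vi in zip(x, x_rec, var))
    return -total / D

x = [1.0, 2.0]
x_rec = [1.0, 0.0]                                           # second dimension is badly reconstructed
ll_unit = gaussian_log_likelihood(x, x_rec, [1.0, 1.0])      # unit variance everywhere
ll_wide = gaussian_log_likelihood(x, x_rec, [1.0, 4.0])      # larger variance on the poor dimension
```

Raising the variance on the poorly reconstructed dimension increases the log-likelihood in this example, which shows how the variance term lets the model express uncertainty instead of paying the full squared-error cost.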
3.3.4 Diffusion autoencoder
3.4 Convolutional autoencoder
data, as they excel at capturing spatial dependencies, which refer to the patterns and
relationships among pixels or locations within individual images or data frames. They find
wide-ranging applications in tasks such as image denoising, inpainting, segmentation, and
super-resolution.
the first term measures the difference between the original image and its reconstruction by
the decoder, while the second term encourages the latent representation q(Z|X) to follow a
standard Gaussian distribution through KL divergence regularization, ensuring a structured
latent space for effective generative capabilities.
where $N$ is the number of spatial rows in the data, $M$ is the number of spatial columns in
the data, $T$ is the number of time steps in the sequence, $X_{tij}$ represents the ground truth
value at spatial location $(i, j)$ at time step $t$, and $X'_{tij}$ represents the predicted value at
spatial location $(i, j)$ at time step $t$.
includes a sparsifying module designed to create sparse feature maps. This module retains
the highest value and its corresponding position within each local subregion before
performing unpooling, primarily through max pooling. The loss function used in CSAE,
which quantifies the disparities between the original input and the reconstructed output,
relies on the Frobenius norm and is defined as follows:
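$$L_{\text{CSAE}}(X, X') = \min \sum_{l=1}^{L} \left\lVert X^{(l)} - X'^{(l)} \right\rVert_F^2 \tag{22}$$

$$X'^{(l)} = \sum_{i=1}^{d} \left(\mathrm{rot}(W_i, 180) * Z_i^{l} + c_i\right) \tag{23}$$

$$Z^{l} = G_{p,s}\left(Z_i^{(l)}\right) = G_{p,s}\left(f(W_i \cdot X^{(l)} + b_i)\right) \tag{24}$$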
where $L$ is the number of layers, $X^{(l)}$ represents the original input at layer $l$, $X'^{(l)}$
represents the reconstructed output at layer $l$, $d$ is the number of feature channels, $Z_i^{l}$ is
the $i$th sparsified feature map, and $G_{p,s}(\cdot)$ represents the sparsifying operator, involving
max-pooling and unpooling operations to create sparse feature maps.
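The sparsifying operator described above (keep the largest activation and its position in each local subregion, then unpool) can be sketched on a single feature map. The example assumes non-overlapping p × p regions, that is, a stride equal to the pooling size, and fills all other positions with zeros; `sparsify` is an illustrative name rather than the original implementation.

```python
def sparsify(feature_map, p=2):
    # Max-pool over non-overlapping p x p regions, then unpool by writing each
    # maximum back at its original position and zero everywhere else.
    h, w = len(feature_map), len(feature_map[0])
    out = [[0.0] * w for _ in range(h)]
    for r0 in range(0, h, p):
        for c0 in range(0, w, p):
            cells = [(feature_map[r][c], r, c)
                     for r in range(r0, min(r0 + p, h))
                     for c in range(c0, min(c0 + p, w))]
            val, r, c = max(cells, key=lambda t: t[0])
            out[r][c] = val
    return out

fm = [[1.0, 5.0, 2.0, 0.0],
      [3.0, 4.0, 1.0, 7.0],
      [0.0, 0.0, 9.0, 8.0],
      [6.0, 2.0, 3.0, 1.0]]
sparse_fm = sparsify(fm, p=2)
```

Each 2 × 2 block of the result holds exactly one retained value, so the output has the same shape as the input while most entries are zero.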
3.5 Recurrent autoencoder
RNNs (Medsker and Jain 2001) are designed for processing sequential data, such as time
series, where the current state ($h_t$) relies on the previous state ($h_{t-1}$). Vanilla RNNs have
limited short-term memory, which leads to gradient problems in long sequences. To
address this, LSTM networks, equipped with three gates (forget gate, input gate, and output
gate), and GRU networks, which consist of two gates (update gate and reset gate), were
introduced. These architectures incorporate self-loops to effectively manage gradients over
extended sequences, addressing the vanishing or exploding gradient issue. A Recurrent
Autoencoder is an autoencoder that incorporates recurrent layers, such as LSTM or GRU,
within both the encoder and decoder components.
where X represents the clean input sequence and X ′ represents the reconstructed output
sequence.
GRU Autoencoder (GRUAE) (Dehghan et al. 2014) employs GRU units in both the
encoder and decoder parts. Unlike LSTM, GRU has a simpler architecture with only two
gates: the update and reset gates. This architectural simplicity can lead to easier training
and faster processing while still capturing long-term dependencies in input sequences.
The formulation of a GRU Autoencoder is similar to that of an LSTM Autoencoder,
making it flexible and effective for modeling sequential data,
$$L_{\text{GRUAE}}(X, X') = \min\left(\lVert X - X' \rVert_F^2\right) \tag{26}$$
where X represents the clean input sequence and X ′ represents the reconstructed output
sequence.
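To make the two-gate structure concrete, the following sketch performs a single GRU step in plain Python. It uses the standard GRU equations with the convention h_new = (1 − z) ⊙ h + z ⊙ h̃ (some references swap the roles of z and 1 − z); the parameter dictionary layout and the all-zero weights are purely illustrative.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(W, v):
    # Matrix-vector product for a matrix stored as a list of rows.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def gru_step(x, h, p):
    # p holds input weights W_*, hidden weights U_* and biases b_* (hypothetical layout).
    z = [sigmoid(a + b + c) for a, b, c in
         zip(matvec(p["W_z"], x), matvec(p["U_z"], h), p["b_z"])]    # update gate
    r = [sigmoid(a + b + c) for a, b, c in
         zip(matvec(p["W_r"], x), matvec(p["U_r"], h), p["b_r"])]    # reset gate
    rh = [ri * hi for ri, hi in zip(r, h)]                            # reset applied to the old state
    cand = [math.tanh(a + b + c) for a, b, c in
            zip(matvec(p["W_h"], x), matvec(p["U_h"], rh), p["b_h"])]  # candidate state
    # Interpolate between the old state and the candidate state.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

params = {"W_z": zeros(2, 2), "U_z": zeros(2, 2), "b_z": [0.0, 0.0],
          "W_r": zeros(2, 2), "U_r": zeros(2, 2), "b_r": [0.0, 0.0],
          "W_h": zeros(2, 2), "U_h": zeros(2, 2), "b_h": [0.0, 0.0]}
h_next = gru_step([1.0, -1.0], [0.4, -0.2], params)
```

With all-zero parameters both gates equal 0.5 and the candidate state is zero, so the step simply halves the previous hidden state. In a GRU autoencoder, a step of this kind would be applied across the sequence in the encoder and again in the decoder.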
3.5.3 Bidirectional autoencoder
where T is the sequence length, Xt represents the input at time step t, and Xt′ represents the
reconstructed output at time step t.
3.6 Semi‑supervised autoencoder
in which the first term represents the expectation of the conditional log-likelihood of the
latent variable z, the second term denotes the log-likelihood associated with y, and the third
term quantifies the Kullback–Leibler divergence between the prior distribution p(z) and the
posterior distribution q𝜙 (z|x, y).
Label and Sparse Regularized Autoencoder (LSRAE) (Chai et al. 2019) is a novel
approach that combines label and sparse regularizations with autoencoders to create a
semi-supervised learning method. This method effectively leverages the strengths of
both unsupervised and supervised learning processes. On one hand, sparse regularization
selectively activates a subset of neurons, enhancing the extraction of localized and
informative features. This unsupervised learning process helps uncover underlying data
concepts, improving generalization. On the other hand, label regularization enforces the
where the first term ensures precise data reconstruction, the second term promotes sparsity
within the hidden layer, facilitating efficient feature extraction. The third term acts as a
safeguard against overfitting by penalizing excessive weights. Lastly, the fourth term
enhances classification accuracy by quantifying the label error. Here, L denotes the actual
label, and T represents the desired label.
3.7 Graph autoencoder
Graph Autoencoder (GAE) (Pan et al. 2018) is a powerful method for reducing the
dimensionality of graph data, enhancing efficiency in graph analytics. It takes a graph
as input and outputs a condensed vector representation that captures its essential features.
Within GAE, the encoder converts the input graph into a lower-dimensional vector,
which the decoder uses to recreate the original graph. The model aims to minimize the
dissimilarity between input and output graphs while capturing essential graph features. The
loss function for GAE is defined as:
$$L_{\text{GAE}}(X, X') = \min\left(\lVert X - X' \rVert_F^2\right) \tag{31}$$
where $X'$ is computed from the inner product of the hidden representation $Z$ and its
transpose $Z^T$ using the logistic sigmoid function, $X' = \sigma(ZZ^T)$. The representation
$Z = \mathrm{GCN}(F, X)$ is obtained by applying the Graph Convolutional Network (GCN) to the
node features matrix $F$, based on the input data $X$.
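The inner-product decoder and the reconstruction loss of Eq. (31) can be sketched as follows. The GCN encoder is not implemented here; the node embeddings Z are hard-coded stand-ins for its output, and the small graph X is invented for illustration.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def inner_product_decoder(Z):
    # X' = sigmoid(Z Z^T): entry (i, j) scores the link between nodes i and j.
    return [[sigmoid(sum(a * b for a, b in zip(zi, zj))) for zj in Z] for zi in Z]

def reconstruction_loss(X, X_rec):
    # Squared Frobenius norm between the input graph and its reconstruction.
    return sum((a - b) ** 2 for ra, rb in zip(X, X_rec) for a, b in zip(ra, rb))

# Hypothetical embeddings for three nodes (stand-in for Z = GCN(F, X)).
Z = [[2.0, 0.0], [2.0, 0.0], [0.0, -2.0]]
X = [[1.0, 1.0, 0.0],
     [1.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]   # toy graph with self-loops: nodes 0 and 1 are linked
X_rec = inner_product_decoder(Z)
loss = reconstruction_loss(X, X_rec)
```

Nodes with aligned embeddings receive a score close to 1, while orthogonal embeddings give exactly 0.5, so the decoder output is symmetric by construction.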
Variational Graph Autoencoder (VGAE) (Kipf and Welling 2016) is a framework for
learning interpretable latent representations of graph-structured data. It employs a
probabilistic approach to encode graph information effectively. VGAE consists of two
essential components: an encoder and a decoder. The encoder utilizes a Graph Convolution
Network (GCN) to transform graph nodes into a lower-dimensional latent space. It
generates latent variables zi for each node by sampling from Gaussian distributions. These
latent variables capture crucial structural information of the graph. The decoder functions
as a generative model, aiming to reconstruct the original graph structure using the latent
variables zi . It estimates the likelihood of connections (edges) between nodes based on
their corresponding latent vectors. The VGAE loss function combines a reconstruction term
and a regularization term to guide the learning process effectively:
$$L_{\text{VGAE}} = -\mathbb{E}_{q(Z \mid F, X)}\left[\log p(X \mid Z)\right] + \mathrm{KL}\left(q(Z \mid F, X) \parallel p(Z)\right) \tag{32}$$
where q(Z|F, X) represents the encoding distribution, p(X|Z) models the likelihood of
the adjacency matrix given the latent variables, and KL(q(Z|F, X)||p(Z)) quantifies the
divergence between the encoding distribution and the prior distribution governing the
latent variables Z.
Adversarial Graph Autoencoder (AGAE) (Pan et al. 2018) leverages adversarial training
to acquire a lower-dimensional representation of the input graph. It employs an encoder to
map graph nodes to this lower-dimensional space and a decoder to reconstruct the original
graph. AGAE integrates an adversarial component, akin to a discriminator, to ensure the
learned embeddings preserve the graph structure. This unsupervised model combines
autoencoder-based reconstruction with adversarial training to generate high-quality graph
representations. The AGAE loss function is defined as follows:
$$L_{\text{AGAE}} = \mathbb{E}_{H \sim p_z}\left[\log D(Z)\right] + \mathbb{E}_{X}\left[\log(1 - D(G(F, X)))\right] \tag{33}$$
where G(⋅) represents the generator, and D(⋅) signifies the discriminator. The
discriminator’s role is to distinguish between the real input graph, pz , and the reconstructed
graph generated by the generator G(F, X).
Graph Attentional Autoencoder (GAAE) (Salehi and Davulcu 2019) is a variant of graph
autoencoders that combines Graph Attention Network (GAT) with GAE. It employs
attention mechanisms to weigh the importance of neighboring nodes and edges during the
reconstruction process. In essence, GAAE aims to learn a low-dimensional representation
of a graph while preserving its structural information using attention mechanisms. The
GAAE loss function is defined as follows:
$$L_{\text{GAAE}} = \min\left(\lVert X - \mathrm{Sigmoid}(ZZ^T) \rVert_F^2\right) \tag{34}$$
in which $Z$ represents the hidden layer representation of node $v$. The calculation of $Z_i^{(l)}$ is
based on the formula:
$$Z_i^{(l)} = \sigma\left(\sum_{j \in N_i} a_{ij} W^{(l-1)} Z_j^{(l-1)}\right) \tag{35}$$
where Ni denotes the set of neighbors of node vi , and W (l−1) represents the learnable
parameter matrix. The attention coefficient aij is computed using the following formula:
3.8 Masked autoencoders
generate coherent and contextually appropriate text or videos, making them valuable for
tasks like text completion (Zhang et al. 2022), text generation (Zhang et al. 2023), language
modeling, image captioning (Alzu’bi et al. 2021), and data augmentation (Xu et al. 2022).
Graph Masked Autoencoder (GMAE) (Hou et al. 2022) is a simplified and cost-effective
approach for self-supervised graph representation learning. Unlike most GAEs that focus
on reconstructing graph structures, GMAE’s core emphasis is on feature reconstruction
through masking. Additionally, GMAE departs from using MSE, opting for the cosine
error, which benefits cases where feature magnitudes vary, common in graph node
attributes. The primary objective of GMAE is to reconstruct the masked features of nodes,
$V' \subset V$, given the partially observed node signals. Formally, the GMAE loss function,
averaged over all masked nodes, is as follows:
$$L_{\text{GMAE}} = \min \frac{1}{\lvert V' \rvert} \sum_{v_i \in V'} \left(1 - \frac{x_i^T z_i}{\lVert x_i \rVert \cdot \lVert z_i \rVert}\right)^{\gamma}, \quad \gamma \geq 1 \tag{37}$$
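The cosine-based error of Eq. (37) can be sketched in a few lines. The example assumes that x_i is the original feature vector of a masked node and z_i its reconstruction, which the text above does not state explicitly; the function names and toy vectors are illustrative.

```python
import math

def cosine_similarity(x, z):
    dot = sum(a * b for a, b in zip(x, z))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in z)))

def scaled_cosine_error(X, Z, masked_nodes, gamma=2.0):
    # Average of (1 - cos(x_i, z_i))^gamma over the masked nodes only.
    errs = [(1.0 - cosine_similarity(X[i], Z[i])) ** gamma for i in masked_nodes]
    return sum(errs) / len(errs)

X = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]   # original node features (toy values)
Z = [[5.0, 0.0], [1.0, 0.0], [0.0, 0.0]]   # hypothetical reconstructions
loss = scaled_cosine_error(X, Z, masked_nodes=[0, 1], gamma=2.0)
```

Node 0 is reconstructed with the right direction but five times the magnitude and contributes zero error, which illustrates why the cosine error is insensitive to feature scale, unlike MSE. Node 2 is not masked and is therefore never evaluated.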
$$L_{\text{CMAE}} = \min\left(\lVert Y_m - Y_m' \rVert_F^2 + \lambda \log\left(\frac{-\exp(\rho_j^{-}/\tau)}{\exp(\rho_j^{-}/\tau) + \sum_{j=1}^{K} \exp(\rho_j^{-}/\tau)}\right)\right) \tag{38}$$
Table 4  Various autoencoder methods, including details on their respective improvements and utilized loss functions

Method | Improvement | Loss function
SAE | Learns a more compact and informative representation of the data | $\min\left(\lVert X - X' \rVert_F^2 + \lambda\,\mathrm{KL}(p \parallel q)\right)$
CAE | Learns a mapping that is robust to small input variations | $\min\left(\lVert X - X' \rVert_F^2 + \lambda \lVert J_F(X) \rVert_F^2\right)$
LAE | Learns a low-dimensional data representation while preserving the local structure | $\min\left(\lVert X - X' \rVert_F^2 + \lambda\,\mathrm{tr}(Z' L Z)\right)$
OAE | Enforcing orthogonality among latent features, enhancing class discriminability | $\min\left(\lVert X - X' \rVert_F^2 + \lambda \lVert Z^T Z - I \rVert_F^2\right)$
DAE | Introduces noise to input and reconstructs the output from the original clean input | $\min\left(\lVert X - \hat{X}' \rVert_F^2\right)$
M-DAE | Reconstructs clean data from noisy data where some of the features are missing | $\min\left(\frac{1}{m}\sum_{i=1}^{m} \lVert X - \hat{X}' W \rVert_F^2\right)$
2,1RAE | Enhances resilience to noisy data using L2,1 regularization, encouraging feature sparsity and robustness | $\min\left(\lVert X - X' \rVert_F^2 + \lambda \cdot \lVert Z \rVert_{2,1}\right)$
VAE | Learns the input data distribution and generates new data points from this distribution | $-\mathbb{E}_{q(Z \mid X)}\left[\log p(X \mid Z)\right] + \mathrm{KL}\left(q(Z \mid X) \parallel p(Z)\right)$
AAE | Learns the input data structure and generates new data points similar to them | $\min\left(\lVert X - X' \rVert_F^2 + \log(D(X)) + \log(1 - D(G(Z)))\right)$
BAE | Combining Gaussian likelihood and isotropic Gaussian prior for effective data pattern and uncertainty capture | $-\left(\frac{1}{D}\sum_{i=1}^{D} \frac{1}{2\sigma_i^2}(x_i - x_i')^2 + \frac{1}{2}\log \sigma_i^2\right)$
DiffusionAE | A specialized generative model, employing the Diffusion Probabilistic Loss for training | $-\log P(X \mid X')$
CVAE | Integrating convolutional layers and probabilistic modeling, using a Gaussian latent variable and KL divergence regularization | $-\mathbb{E}_{q(Z \mid X)}\left[\log p(X \mid Z)\right] + \mathrm{KL}\left(q(Z \mid X) \parallel p(Z)\right)$
ConvLSTM | Combines convolution and recurrent layers for spatiotemporal data | $\min \sum_{i=1}^{N}\sum_{j=1}^{M}\sum_{t=1}^{T} \lVert X_{ijt} - X'_{ijt} \rVert_F^2$
CSAE | Combines the convolutional layers of a CNN with the sparsity constraint of a SAE | $\min \sum_{l=1}^{L} \lVert X^{(l)} - X'^{(l)} \rVert_F^2$
LSTMAE | Uses LSTM units in the encoder and decoder parts of the network | $\min\left(\lVert X - X' \rVert_F^2\right)$
GRUAE | Uses GRU units in the encoder and decoder parts of the network | $\min\left(\lVert X - X' \rVert_F^2\right)$
BiRNNAE | Using bidirectional RNNs to minimize squared reconstruction error with an MSE loss for sequential data | $\min\left(\frac{1}{T}\sum_{t=1}^{T} \lVert X_t - X_t' \rVert_F^2\right)$
SSVAE | Combining log-likelihood terms for latent variables and Kullback–Leibler divergence | $-\mathbb{E}_{q_\phi(z \mid x, y)}\left[\log p_\theta(x \mid y, z)\right] - \log p_\theta(y) + \mathrm{KL}\left(q_\phi(z \mid x, y) \parallel p(z)\right)$
DVAE | A unique loss function for capturing complex data patterns and relationships between | $\mathbb{E}_{q(y, z \mid x)}\left(\log p(x \mid y, z) + \log p(y) + \log p(z) - \log q(y \mid x, z) - \log q(z \mid x)\right)$
LSRAE | Combining sparse and label regularizations with autoencoders to improve feature extraction and categorization accuracy | $\min \lVert X - X' \rVert_F^2 + \mathrm{KL}(p \parallel q) + \sum_{i=1}\sum_{j=1}(W_{ij})^2 + \sum_{i=1} \lVert L - T \rVert$
VGAE | Using a probabilistic approach, combining an Encoder and a Decoder guided by a loss function with a reconstruction term | $-\mathbb{E}_{q(Z \mid F, X)}\left[\log p(X \mid Z)\right] + \mathrm{KL}\left(q(Z \mid F, X) \parallel p(Z)\right)$
AGAE | Using adversarial training with encoder and decoder components to create compact graph representations | $\mathbb{E}_{H \sim p_z}\left[\log D(Z)\right] + \mathbb{E}_{X}\left[\log(1 - D(G(F, X)))\right]$
GAAE | Using attention mechanisms to reconstruct graphs effectively. Its loss emphasizes preserving structural information | $\min\left(\lVert X - \mathrm{Sigmoid}(ZZ^T) \rVert_F^2\right)$
GMAE | Prioritizing feature reconstruction through masking and employs cosine error | $\min \frac{1}{\lvert V' \rvert}\sum_{v_i \in V'}\left(1 - \frac{x_i^T z_i}{\lVert x_i \rVert \cdot \lVert z_i \rVert}\right)^{\gamma},\ \gamma \geq 1$
CMAE | Improving vision representations with online and target branches, online encoder reconstructs masked images, using cosine similarity loss | $\min\left(\lVert Y_m - Y_m' \rVert_F^2 + \lambda \log\left(\frac{-\exp(\rho_j^{-}/\tau)}{\exp(\rho_j^{-}/\tau) + \sum_{j=1}^{K}\exp(\rho_j^{-}/\tau)}\right)\right)$
SDMAE | Utilizing student and teacher branches to reconstruct missing information | $\min\left(\log q_\psi(\hat{x} \mid \tilde{x})\right)$
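$$L_{\text{SDMAE}} = \min\left(\log q_\psi(\hat{x} \mid \tilde{x})\right) \approx \min\left(\frac{\sum_{i=1}^{n} m^i f_\phi(x_i) f_\theta(\hat{x})}{\sqrt{\sum_{i=1}^{n} m^i (f_\phi(x_i))^2}\,\sqrt{\sum_{i=1}^{n} m^i (f_\theta(\hat{x}))^2}}\right) \tag{39}$$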
4 Application autoencoder
AEs have been widely used in various domains, including computer vision, natural
language processing, complex network analysis, recommenders, anomaly detection,
speech recognition, and more. Different types of autoencoder architectures have been
proposed to address specific challenges and improve performance in these domains.
For example, convolutional autoencoders are commonly used in image processing
tasks, while recurrent autoencoders are well-suited for sequential data processing. In
addition, variational autoencoders have been developed for generating new data samples
and improving model generalization. Although each architecture has its own advantages
and limitations, it is important to consider the specific requirements of the application
domain when selecting an appropriate architecture. Figure 5 provides an overview of
the applications of autoencoders in various domains, which can be used as a starting
point for selecting an appropriate architecture. However, further research is needed to
investigate which architectures are more suitable for which application categories and
which architectures are more popular in specific domains.
[Figure 5: diagram titled "Application of Autoencoder"; recoverable labels: Object Detection, 3D Shape]
Fig. 5  The process of creating the consensus matrix, including the generation of random walks of different
lengths and their combination
4.1 Machine vision
Machine vision utilizes computer algorithms and software to analyze and interpret
images or video data, aiming to enable machines to understand and interact with
the visual world (Jain et al. 1995). AEs play a vital role in various machine vision
applications by learning to extract meaningful image features and reducing data
dimensionality. These applications encompass tasks such as image classification
(Vincent et al. 2010), image clustering (Guo et al. 2017), image segmentation
(Myronenko 2019), image inpainting (Bertalmio et al. 2000), image generation (Vahdat
and Kautz 2020), object detection (Liang et al. 2018), and 3D shape analysis (Todd
2004).
   AEs are instrumental in image classification. Methods like Semi-supervised stacked
distance autoencoder (Hou et al. 2020) enhance feature representation by incorporating
semi-supervised learning, utilizing both labeled and unlabeled data to learn inter-data
point distances. Deep Convolutional Autoencoders (DCAE) aid in semi-supervised
classification, as seen in Geng et al. (2015), where they pre-train on unlabeled Synthetic
Aperture Radar (SAR) images and fine-tune using labeled data for high-resolution SAR
image classification.
   AEs are also valuable in image clustering, where they learn compressed image
representations for grouping similar images in the latent space. This technique involves
training a clustering algorithm like K-means on the latent space, as described in
references Song et al. (2013) and Yang et al. (2017). Additionally, AEs can be used for
unsupervised image clustering, making them suitable for scenarios with limited labeled
data.
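The clustering step described above can be sketched with a plain K-means (Lloyd's algorithm) run on latent codes. The latent vectors below are hard-coded stand-ins for the output of a trained encoder, and the naive initialization from the first k points is a simplification chosen to keep the example deterministic.

```python
def kmeans(points, k, iters=20):
    # Naive initialization (assumption): use the first k points as centers.
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical 2-D latent codes for four images (stand-in for encoder outputs).
latent = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
labels, centers = kmeans(latent, k=2)
```

Clustering in the latent space rather than in pixel space means the distance computations operate on a handful of learned features per image, which is what makes the approach practical for high-resolution inputs.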
   AEs are instrumental in image segmentation, with a wide array of applications that
enhance the precision and efficiency of this critical computer vision task. By learning
meaningful feature representations from image data, AEs provide a valuable foundation
for distinguishing objects and boundaries in images. Their capability for dimensionality
reduction streamlines the processing of high-resolution images, making segmentation
algorithms computationally more tractable (Zhang et al. 2019). AEs also excel in noise
reduction, eliminating unwanted artifacts from images, which is pivotal for accurate
segmentation (Tripathi 2021). They are integral in semantic segmentation (Ohgushi
et al. 2020), where they classify each pixel in an image, and instance segmentation (Lin
et al. 2020), distinguishing individual object instances. Furthermore, AEs contribute
to medical image segmentation (Ma et al. 2022), aiding in the precise identification
of structures and anomalies in healthcare images. Overall, AEs substantially elevate
the accuracy and efficiency of image segmentation tasks, encompassing a range of
applications that extend from object recognition to medical diagnosis.
   AEs find significant applications in the domain of image inpainting, a process
of reconstructing missing or corrupted parts of an image. They excel at capturing
complex patterns and textures within images, making them invaluable for this task.
AEs, particularly VAEs and GANs, offer high-quality inpainting results by learning to
generate realistic and coherent content to fill in the gaps (Tian et al. 2023; Han and
Wang 2021). They effectively model the underlying structures and features of images,
ensuring that the inpainted regions seamlessly blend with the surrounding content.
   AEs find versatile applications in image generation tasks, contributing to the creation
of high-quality and diverse visual content. They serve as a foundational component
in generative models such as VAEs and GANs, enabling the synthesis of realistic and novel
images (Huang and Jafari 2023). AEs are essential in encoding and decoding operations,
effectively generating images with specific features, styles, and content (Xu et al. 2019).
They also play a vital role in style transfer, where they transform images to adopt the
artistic characteristics of other images or styles (Kim et al. 2021).
   AEs play a role in object detection by extracting valuable features from images or video
frames, improving detection accuracy. Convolutional AEs are used to learn compressed
image representations that enhance the performance of object detection algorithms, such
as Region-based Convolutional Neural Networks (R-CNN) (Ding et al. 2019). VAEs can further
enhance object detection accuracy, as seen in the integration of a VAE with You Only Look
Once (YOLO) (Redmon et al. 2016).
   In the domain of 3D shape analysis, AEs learn compressed representations for tasks like
shape generation, completion, and retrieval. Achieving a disentangled latent representation
that separates various factors of variation is a challenge. Recent research introduces
methods like Split-AE (Saha et al. 2022) and 3D Shape Variational Autoencoder Latent
Disentanglement (Foti et al. 2022), addressing this challenge. Other approaches employ
deep learning features for 3D shape retrieval by projecting 3D shapes into 2D space and
utilizing AEs for feature learning (Zhu et al. 2016). Additionally, architectures like point-
cloud AEs combined with VAEs are explored to partition the latent space and enhance 3D
shape analysis (Aumentado-Armstrong et al. 2019).
   While AEs offer valuable capabilities in various machine vision applications, their
effectiveness often depends on the specific task and dataset characteristics, and they may
be complemented by specialized models in certain scenarios.
4.2 NLP
NLP is a field that explores how computers can understand and work with human
language in speech or text form to perform useful tasks (Chowdhary and Chowdhary
2020). This area mainly concentrates on methods for handling text data, including tasks
like categorizing text (text classification) (Kowsari et al. 2019), grouping similar texts
together (text clustering) (Aggarwal and Zhai 2012), generating new text (text generation)
(McKeown 1992), and assessing the sentiment expressed in text (sentiment analysis)
(Medhat et al. 2014). To tackle the complexities of working with textual data, researchers
have developed advanced models, often incorporating AEs. These models have proven
effective in addressing the challenges associated with processing text data (Li et al. 2023).
   AEs play a versatile role in text classification tasks, offering feature learning to capture
crucial patterns in text data (Guo et al. 2023; Ye et al. 2022), dimensionality reduction
for efficient processing of high-dimensional text features (Le et al. 2023; Che et al. 2020),
noise reduction to clean and enhance noisy text (García-Mendoza et al. 2022; Che et al.
2020), and semi-supervised learning for improved classification using limited labeled
data (Wu et al. 2019; Xu et al. 2017). They also excel in topic modeling by uncovering
underlying themes within text documents (Paul et al. 2023; Smatana and Butka 2019),
aid in anomaly detection to identify unusual patterns (Gorokhov et al. 2023; Bursic
et al. 2019), and enable coherent text generation (Semeniuta et al. 2017; Zhao et al.
2021). Their adaptability and versatility make them indispensable tools in NLP and text
analysis, enhancing various aspects of text classification.
   Another application of AEs in the field of NLP is text clustering, where AEs have been
applied to organize text documents into meaningful groups. One approach combines stacked
AEs with k-means clustering to group text documents into clusters
(Hosseini and Varzaneh 2022). In Deep Embedded Clustering (DEC), AEs play a pivotal
role by initializing feature representations of data points and serving as the foundation for
similarity computations during the clustering process. The embeddings learned by AEs
are jointly optimized with cluster assignments, thereby enhancing the overall quality of
clustering results (Xie et al. 2016; Daneshfar et al. 2023). AEs also provide a solution
to the challenges of short text clustering. They address the sparsity problem in short text
representations by employing low-dimensional continuous representations or embeddings
like Smooth Inverse Frequency (SIF) embeddings. Here, the encoder maps the input
short texts to a lower-dimensional continuous representation, and the decoder strives to
reconstruct the input from this representation. AEs are used to encode and reconstruct
these SIF embeddings, resulting in improved short text clustering quality (Hadifar et al.
2019).
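The coupling between AE embeddings and cluster assignments in DEC can be illustrated with a minimal NumPy sketch. It computes the Student's t soft assignment and the sharpened target distribution used by DEC (Xie et al. 2016). The embeddings `Z` and centroids `mu` below are hypothetical placeholders: in practice `Z` would come from a pretrained encoder and `mu` from k-means on `Z`, and the KL loss would be back-propagated into the encoder.

```python
import numpy as np

def soft_assign(Z, mu, alpha=1.0):
    """Student's t soft assignment q_ij between embeddings Z (n, d) and centroids mu (k, d)."""
    # Squared Euclidean distance between every embedding and every centroid.
    d2 = ((Z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened targets: p_ij proportional to q_ij^2 / f_j, with f_j = sum_i q_ij."""
    w = q ** 2 / q.sum(axis=0, keepdims=True)
    return w / w.sum(axis=1, keepdims=True)

def kl_clustering_loss(p, q):
    """KL(P || Q), the clustering objective minimized over encoder weights and centroids."""
    return float(np.sum(p * np.log(p / q)))

# Hypothetical stand-in for encoder outputs: two well-separated blobs in a 2-D latent space.
rng = np.random.default_rng(0)
Z = np.vstack([rng.normal(loc=-2.0, scale=0.3, size=(50, 2)),
               rng.normal(loc=2.0, scale=0.3, size=(50, 2))])
mu = np.array([[-2.0, -2.0], [2.0, 2.0]])  # assumed initial centroids (e.g. from k-means)

q = soft_assign(Z, mu)
p = target_distribution(q)
labels = q.argmax(axis=1)  # hard cluster labels read off the soft assignments
```

Only the assignment step is shown here; the gradient updates of the encoder and centroids are omitted and would normally be delegated to a deep learning library.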
4.3 Complex network
while preserving pairwise topology (Fan et al. 2021). Bayesian deep generative
frameworks are used to learn deep latent representations, improving link prediction in
HINs. Another method (Salha et al. 2019) inspired by Newtonian gravity extends the graph
autoencoder and VAE frameworks to address link prediction in directed graphs, effectively
reconstructing directed graphs from node embeddings. Lastly, the Multi-Scale Variational
Graph Autoencoder (MSVGAE) introduces a novel graph embedding framework that
leverages graph attribute information through self-supervised learning (Guo et al. 2022).
   In conclusion, autoencoders are versatile tools for intricate network analysis,
contributing significantly to tasks such as network embedding, deep clustering, and link
prediction by capturing complex patterns, enhancing representations, and enabling precise
predictions.
4.4 Recommender system
and Neural Collaborative Autoencoder (NCAE) (He et al. 2017). HCCAE combines the
learned representations with other recommendation models, while NCAE utilizes a neural
network to generate recommendations directly from the learned representations. These
models leverage additional information such as content features, social relationships, or
visual data to enhance their recommendations. Each model possesses unique characteristics
and objectives, making them suitable for addressing various challenges like cold start
problems, sequential data, semantic information, or visual styles.
4.5 Anomaly detection
Because AEs can learn complex patterns in data and detect anomalies that are
not easily identifiable, they have been widely used in the field of anomaly detection (Pang et al.
2021). An anomaly detection model can be used, for example, to detect fraudulent transactions
or to handle other highly imbalanced supervised tasks (Chandola et al. 2009). AEs can be used in supervised
(Alsadhan 2023), unsupervised (Lopes et al. 2022), and semi-supervised (Akcay et al.
2018; Ruff et al. 2019) anomaly detection tasks.
   In supervised anomaly detection, AEs are trained on both normal and anomalous
data. The AE is first trained on normal data to learn the underlying patterns and features
of normal data. Then, the AE is fine-tuned on the combined normal and anomalous
data to capture the difference between normal and anomalous data. During training, the
objective is to minimize the reconstruction error between the input and the output of the
AE. After training, the reconstruction error of the test data is compared to a threshold. If
the reconstruction error is above the threshold, the input data is classified as anomalous
(Pang et al. 2021). This approach combines the feature learning capabilities of AEs with
the discriminative power of supervised classifiers, enhancing the accuracy of anomaly
detection in real-world applications, including fraud detection (Alsadhan 2023; Debener
et al. 2023; Fanai and Abbasimehr 2023), network security (Ghorbani and Fakhrahmad
2022; Lopes et al. 2022), and fault detection (Ding et al. 2022; Ying et al. 2023) in
industrial processes.
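When labeled anomalies are available, one simple way to use them is to choose the threshold on the reconstruction error that maximizes the F1 score on a labeled validation set. The sketch below assumes the reconstruction errors have already been computed by a trained AE; the arrays `errors` and `labels` are hypothetical values chosen for illustration, not results from any cited study.

```python
import numpy as np

def f1_at_threshold(errors, labels, threshold):
    """F1 score when samples with error > threshold are predicted anomalous (label 1)."""
    pred = errors > threshold
    tp = np.sum(pred & (labels == 1))
    fp = np.sum(pred & (labels == 0))
    fn = np.sum(~pred & (labels == 1))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))

def select_threshold(errors, labels):
    """Scan the observed errors as candidate thresholds and keep the one with the best F1."""
    candidates = np.unique(errors)
    scores = [f1_at_threshold(errors, labels, t) for t in candidates]
    best = int(np.argmax(scores))
    return float(candidates[best]), scores[best]

# Hypothetical validation set: reconstruction errors with known labels (1 = anomalous).
errors = np.array([0.10, 0.12, 0.09, 0.15, 0.11, 0.80, 0.95, 0.14, 0.70, 0.13])
labels = np.array([0, 0, 0, 0, 0, 1, 1, 0, 1, 0])

threshold, best_f1 = select_threshold(errors, labels)
```

On this toy data the largest normal error (0.15) separates the two classes perfectly, so it is selected as the threshold. On real data the classes overlap and the selected threshold trades precision against recall.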
   In unsupervised tasks, the idea is to train AEs only on samples of one class
(the majority class), so that the network learns to reconstruct such inputs with low
reconstruction loss. If a sample from another class is then passed through
the AE, it yields a comparatively larger reconstruction loss. A threshold on the
reconstruction loss (the anomaly score) can therefore be chosen, and samples whose loss
exceeds it are considered anomalies (Sakurada and Yairi 2014). This inherent ability to capture complex data
representations without labeled anomalies makes AEs effective in detecting anomalies,
whether in cyber-security for identifying network intrusions (Lopes et al. 2022; An
et al. 2022; Lewandowski and Paffenroth 2022), in manufacturing for spotting defects
(Papananias et al. 2023; Sudo et al. 2021), or in finance for fraud detection (Du et al. 2022;
Jiang et al. 2023; Kennedy et al. 2023). The versatility of AEs and their capacity to adapt
to diverse data types contribute to their widespread use in unsupervised anomaly detection
scenarios, enhancing system security and reliability.
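The reconstruction-error procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration under strong assumptions: the AE is a linear bottleneck model trained by plain gradient descent (practical systems use deep non-linear AEs), the data are synthetic, and the 99th-percentile threshold is an arbitrary choice. All names are hypothetical.

```python
import numpy as np

def train_linear_autoencoder(X, hidden=2, lr=0.02, epochs=1000, seed=0):
    """Fit a bottleneck AE x -> x W_enc -> (x W_enc) W_dec by gradient descent on the MSE."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(d, hidden))
    W_dec = rng.normal(scale=0.1, size=(hidden, d))
    for _ in range(epochs):
        H = X @ W_enc                 # latent codes
        err = H @ W_dec - X           # reconstruction residual
        grad_dec = H.T @ err / n
        grad_enc = X.T @ (err @ W_dec.T) / n
        W_enc -= lr * grad_enc
        W_dec -= lr * grad_dec
    return W_enc, W_dec

def reconstruction_error(X, W_enc, W_dec):
    """Per-sample squared reconstruction error, used as the anomaly score."""
    return ((X @ W_enc @ W_dec - X) ** 2).sum(axis=1)

# Synthetic data (assumption): normal samples lie near a 2-D subspace of an 8-D space.
rng = np.random.default_rng(1)
basis = np.linalg.qr(rng.normal(size=(8, 2)))[0].T   # (2, 8) with orthonormal rows

def sample_normal(n):
    return 2.0 * rng.normal(size=(n, 2)) @ basis + 0.05 * rng.normal(size=(n, 8))

X_train, X_test_normal = sample_normal(500), sample_normal(200)
X_anomalous = rng.normal(scale=2.0, size=(20, 8))    # samples off the normal subspace

# Train on the majority class only, then threshold the anomaly score.
W_enc, W_dec = train_linear_autoencoder(X_train)
threshold = np.percentile(reconstruction_error(X_train, W_enc, W_dec), 99)
flag_normal = reconstruction_error(X_test_normal, W_enc, W_dec) > threshold
flag_anomalous = reconstruction_error(X_anomalous, W_enc, W_dec) > threshold
```

Because the AE only ever sees normal samples, it reconstructs the 2-D subspace well and leaves the off-subspace component of an anomalous sample as residual, which is what pushes its score above the threshold.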
   AEs have been employed effectively in semi-supervised anomaly detection by
capitalizing on their capacity to learn rich data representations (Zhou et al. 2023). In this
context, a portion of the training data is labeled as normal, while the majority remains
unlabeled. The AE is trained to reconstruct the normal data accurately, and during this
process, it learns to capture the underlying structure and features of the normal class.
When presented with new, unlabeled data, the AE endeavors to reconstruct it (Ruff et al.
2019). Anomalies, which deviate significantly from the learned normal patterns, result
in high reconstruction errors. By setting a suitable threshold on the reconstruction error,
anomalies can be effectively detected. This semi-supervised approach minimizes the need
for extensive labeled anomaly data and has proven effective in various domains, including
fraud detection (Charitou et al. 2020; DeLise 2023; Dzakiyullah et al. 2021), network
security (Dong et al. 2022; Hara and Shiomoto 2020; Hoang and Kim 2022; Thai et al.
2022), and quality control (Cacciarelli et al. 2022; Sae-Ang et al. 2022), where labeled
anomalies are often scarce.
4.6 Speech processing
4.7 Other
4.7.1 Fault diagnosis
4.7.2 Intrusion detection
   Autoencoders can play a significant role in automatic feature extraction for intrusion
detection systems. Kunang et al. (2018) propose a method in which an autoencoder is
employed to extract relevant features from raw network traffic data. These extracted
features are then used as input for a classifier, such as a Support Vector Machine
(SVM), to distinguish between normal and malicious traffic. Compared to traditional
rule-based or signature-based methods, autoencoders have the potential to enhance the
accuracy and efficiency of intrusion detection systems (Ieracitano et al. 2020).
4.7.3 Hyperspectral imaging
AEs find wide-ranging applications in hyperspectral image analysis due to their ability
to learn concise representations of high-dimensional data. Hyperspectral imaging is a
potent technique for capturing detailed spectral information about objects or scenes. It
involves multi-dimensional data where each pixel contains a spectrum of reflectance
or radiance values across numerous narrow, contiguous spectral bands (Jaiswal et al.
2023).
   AEs are employed for various tasks in managing hyperspectral data, including
hyperspectral data compression (Minkin et al. 2021), hyperspectral unmixing (Książek
et al. 2022), blind hyperspectral unmixing (Palsson et al. 2022), and dimensionality
reduction (Zabalza et al. 2016). In data compression, AEs condense hyperspectral data
while retaining crucial information, facilitating subsequent analysis and processing.
Hyperspectral unmixing entails decomposing a hyperspectral image into its constituent
parts, referred to as endmembers. AEs play a pivotal role in reconstructing the spectral
profiles of these identified components (endmembers) and determining their proportional
mixing amounts (abundances). This is indispensable for enhancing the efficiency of
hyperspectral analysis and classification tasks (Su et al. 2019). Blind hyperspectral
unmixing involves deconstructing the recorded spectrum of a pixel into a mixture of
endmembers while simultaneously discerning the proportions or fractions of these
endmembers within the pixel. Training an AE on hyperspectral images results in a lower-
dimensional representation of the data, rendering it more manageable for subsequent
analysis (Petersson et al. 2016).
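To make the roles of endmembers and abundances concrete, the following sketch shows the forward pass of a minimal unmixing AE under the linear mixing model. It assumes a single-layer encoder with a softmax output, so that abundances are non-negative and sum to one, and a linear decoder whose weight matrix plays the role of the endmember spectra. The sizes, names, and random weights are hypothetical; the cited works use their own architectures and training procedures.

```python
import numpy as np

def softmax(a):
    """Row-wise softmax, so each row is non-negative and sums to one."""
    a = a - a.max(axis=1, keepdims=True)   # subtract the row max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)

def unmixing_forward(pixels, W_enc, endmembers):
    """Encoder -> abundances (n_pixels, n_endmembers); linear decoder -> reconstructed spectra."""
    abundances = softmax(pixels @ W_enc)
    reconstruction = abundances @ endmembers
    return abundances, reconstruction

# Hypothetical sizes: 5 pixels, 50 spectral bands, 3 endmembers; weights are random placeholders.
rng = np.random.default_rng(0)
n_pixels, n_bands, n_endmembers = 5, 50, 3
endmembers = rng.uniform(0.0, 1.0, size=(n_endmembers, n_bands))   # decoder weights
true_abundances = rng.dirichlet(np.ones(n_endmembers), size=n_pixels)
pixels = true_abundances @ endmembers                              # linear mixing model
W_enc = rng.normal(scale=0.1, size=(n_bands, n_endmembers))        # untrained encoder

abundances, reconstruction = unmixing_forward(pixels, W_enc, endmembers)
```

Training would adjust `W_enc` (and, in the blind setting, `endmembers` as well) to minimize the difference between `pixels` and `reconstruction`; only the untrained forward pass is shown here.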
The development and availability of open-source libraries for various versions of AEs
have greatly facilitated research in this field. Three popular libraries that are widely
used for building and training autoencoder models are TensorFlow, PyTorch, and
Keras. Each of these libraries has its strengths and is preferred by different segments of
the machine learning and deep learning community. Table 5 lists publicly available
source code for the AE variants in our proposed taxonomy, together with publication year
and programming language. Researchers can access these code repositories to implement and test different
versions of AEs, and to compare their performance on various tasks. For instance, one
could use the available code to train a variational AE for image reconstruction or a graph
attention AE for node embedding. These libraries are not only useful for research but
also for practical applications, as they enable practitioners to easily deploy pre-trained
models on their own datasets.
Table 5  AE models and their corresponding years of publication, programming languages, and code repositories
Subsection  Model  Year  Language  Code Repository
   Table 6 presents a comprehensive overview of various AE
models and their diverse applications in machine learning. Each model is associated with
specific applications, datasets, methodology, evaluation metrics, and performance results.
Notable applications include feature learning, dimensionality reduction, graph-based data
representation, generative modeling, anomaly detection, and sequential data analysis. The
evaluation metrics vary depending on the application but commonly include error rates,
accuracy, precision, recall, F1 score, Area Under the Curve (AUC), and more. These AEs
demonstrate their effectiveness in tasks ranging from image classification and sentiment
analysis to graph representation learning and acoustic novelty detection, showcasing
their versatility in addressing a wide array of machine learning challenges across various
domains.
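Two of the metrics mentioned above can be computed directly from model outputs. The sketch below is a minimal NumPy illustration: AUC is computed as the fraction of (positive, negative) pairs that are ranked correctly, with ties counted as one half, and the error rate as the percentage of misclassified samples. The scores and labels are hypothetical and are not taken from Table 6.

```python
import numpy as np

def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]          # all positive-negative score differences
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg)))

def error_rate(pred, labels):
    """Percentage of misclassified samples."""
    return float(100.0 * np.mean(pred != labels))

# Hypothetical anomaly scores and ground-truth labels (1 = anomalous).
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2])
labels = np.array([0, 0, 1, 1, 1, 0])

auc = roc_auc(scores, labels)                              # 8 of 9 pairs ranked correctly
err = error_rate((scores > 0.5).astype(int), labels)       # one of six samples misclassified
```

The pairwise formulation is quadratic in the number of samples and is meant only to make the definition explicit; library implementations use a sort-based computation.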
Table 6  AE models and their corresponding applications
AE model                        Application                                                       Methodology                                   Dataset         Performance
SAE (Ng 2011)                   Sparse and discriminative feature learning                        Image classification                          MNIST           Error rate = 1.35
                                                                                                  Fault diagnosis                               CWRU            ACC = 100
CAE (Rifai et al. 2011)         Feature extraction and dimensionality reduction                   Feature extraction and classification         CIFAR           Error rate = 47.86
                                                                                                                                                MNIST           Error rate = 1.14
LAE (Jia et al. 2015)           Graph-based data representation learning                          Manifold generalization                       MNIST           Error rate = 0.98
                                                                                                                                                CIFAR-10        Error rate = 45.41
OAE (Wang et al. 2019)          Discriminative and diverse feature representations                Data clustering                               MNIST           ACC = 95.4, NMI = 90
DAE (Vincent et al. 2010)       Robust feature extraction                                         Data classification                           MNIST           Error rate = 1.21
M-DAE (Chen et al. 2012)        Anomaly detection                                                 Sentiment analysis                            Amazon reviews  Transfer rate = 1.1
L2,1-RAE (Li et al. 2018)       Outlier detection                                                 Unsupervised feature learning                 MNIST           ACC = 97.66
                                                                                                                                                Reuters-21578   ACC = 82.92
VAE (An and Cho 2015)           Generative modeling                                               Anomaly detection                             MNIST           AUC ROC = 91.7
LSTMAE (Nguyen et al. 2021)     Capture representations from sequential data                      Forecasting and anomaly detection             C-MAPSS         ACC = 98.36, F-score = 96.98
GRUAE (Dehghan et al. 2014)     Sequential data reconstruction                                    Determining parent-offspring resemblance      Family 101      Precision = 81.5
                                                                                                                                                KinFaceW-II     Precision = 74.5
BiRNNAE (Marchi et al. 2015)    Capture contextual information from both of sequence directions   Acoustic novelty detection                    PASCAL CHiME    Precision = 94.7, Recall = 92.0
SSVAE (Xu et al. 2017)          Data representation                                               Text classification                           IMDB            Error rate = 7.6
                                                                                                                                                AGNews          Error rate = 7.68
DVAE (Higgins et al. 2016)      Disentanglement representation learning in complex data           Unsupervised disentanglement representations  celebA          ACC = 83.9
LSRAE (Chai et al. 2019)        Extract the potential features to improve classification          Image classification                          MNIST           ACC = 98.33
VGAE (Kipf and Welling 2016)    Graph-based generative modeling                                   Link prediction                               Cora            ACC = 63.8, NMI = 45
AGAE (Pan et al. 2018)          Graph-based anomaly detection                                     Link prediction                               Cora            AUC = 92.4, AP = 92.6
GAAE (Salehi and Davulcu 2019)  Graph representation learning                                     Node classification                           Cora            ACC = 83.2
GMAE (Hou et al. 2022)          Sequence modeling and text generation                             Node classification                           Cora            Micro-f = 84.2
CMAE (Huang et al. 2022)        Data augmentation                                                 Image classification, data augmentation       ImageNet-1k     ACC = 85.3
SDMAE (Chen et al. 2022)        Generate high descriptive capability for MAE                      Image classification                          ImageNet-1k     ACC = 84.1
6 Future directions
Despite in-depth research on autoencoders and their improved algorithms in recent years,
the following issues still need to be addressed.
6.2 Hypergraph autoencoder
Autoencoders have proven effective in preserving the non-linear structure of data due to
their deep learning capabilities. However, they face a challenge in preserving higher-order
neighbor relationships in complex datasets: while autoencoders handle non-linear structure
well, they do not inherently capture such high-order relationships. To bridge this gap, integrating hypergraph-based
representations of data into the autoencoder framework emerges as a potential solution. By
transforming the data into a hypergraph and feeding it as input to the autoencoder, it may
be possible to preserve the critical high-order neighbor relationships. This approach holds
promise for enhancing the utility of autoencoders in scenarios where preserving intricate
data dependencies is crucial, potentially leading to improved performance across various
applications.
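One possible way to turn data into a hypergraph before passing it to an autoencoder is sketched below. It is only an illustration under stated assumptions: each hyperedge groups a sample with its k nearest neighbors, the incidence matrix is normalized with the degree-based scheme commonly used in hypergraph learning, and the propagated features `A @ X` stand in for the autoencoder input. The function names and the choice of k are hypothetical.

```python
import numpy as np

def knn_hypergraph_incidence(X, k=3):
    """Incidence matrix H (n_nodes, n_edges): hyperedge j joins node j and its k nearest neighbors."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    n = X.shape[0]
    H = np.zeros((n, n))
    for j in range(n):
        members = np.argsort(d2[j])[:k + 1]   # node j itself (distance 0) plus k neighbors
        H[members, j] = 1.0
    return H

def normalized_hypergraph_adjacency(H, edge_weights=None):
    """Compute Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} from the incidence matrix H."""
    w = np.ones(H.shape[1]) if edge_weights is None else edge_weights
    dv = H @ w                      # weighted vertex degrees
    de = H.sum(axis=0)              # hyperedge degrees (number of member nodes)
    dv_inv_sqrt = 1.0 / np.sqrt(dv)
    left = dv_inv_sqrt[:, None] * H * (w / de)[None, :]
    return left @ (H.T * dv_inv_sqrt[None, :])

# Hypothetical data: 10 samples with 4 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))
H = knn_hypergraph_incidence(X, k=3)
A = normalized_hypergraph_adjacency(H)
ae_input = A @ X   # features smoothed over hyperedges, a candidate autoencoder input
```

Because a hyperedge connects more than two nodes at once, `A` mixes information across whole neighborhoods rather than pairs, which is the high-order relationship the paragraph above refers to.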
Constructing an autoencoder involves crucial decisions about parameters like the number
of hidden layers and nodes, which significantly influence the model’s final performance.
While parameter selection is essential, the process of identifying the most suitable
configuration can be challenging. Some recent studies have explored leveraging
reinforcement learning techniques in conjunction with autoencoder construction. This
approach aims to optimize autoencoder parameters efficiently, potentially enhancing
model performance. The integration of reinforcement learning into parameter tuning
remains an open research direction that holds promise for automating and improving the
autoencoder design process.
7 Conclusion
Autoencoders have become a focal point in unsupervised learning due to their remarkable ability
to uncover data features and serve as a valuable dimensionality reduction tool. This paper has
conducted a thorough examination of autoencoders, covering their fundamental principles and a
detailed classification of models based on unique characteristics. We have also explored their use
in various areas, from computer vision to natural language processing, highlighting their adaptability.
During this study, we have recognized both the advantages and occasional drawbacks of
autoencoders. By classifying and summarizing these models based on their unique traits, we have
revealed possible directions for future enhancements and innovations. This insight paves the way
for further progress in the field.
    In summary, autoencoders have an important role in the field of machine learning,
and their significance is continuously growing. They have the remarkable ability to
find valuable insights in data and produce useful results, which can greatly impact various
areas. We expect ongoing progress and important developments in
the field of autoencoders, ultimately leading to the creation of even more powerful and
intelligent solutions that benefit society as a whole. Autoencoders are positioned to foster
innovation and shape the future of machine learning.
Author contributions KB and FD made a substantial contribution to the concept of the article and
drafted the article, ES analyzed the article data, and YL and YX revised the article.
Data availability The data that support the findings of this study are available from the corresponding author
upon reasonable request.
Declarations
Conflict of interest The authors declared no potential conflicts of interest with respect to the research,
authorship, and/or publication of this article.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License,
which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long
as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons
licence, and indicate if changes were made. The images or other third party material in this article
are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the
material. If material is not included in the article’s Creative Commons licence and your intended use is not
permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly
from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
References
Abdi H, Williams LJ (2010) Principal component analysis. Wiley interdisciplinary reviews:
      computational statistics 2(4):433–459
Aggarwal CC, Zhai C (2012) A survey of text clustering algorithms. Mining Text Data, 77–128
Akcay S, Atapour-Abarghouei A, Breckon TP (2018) Ganomaly: Semi-supervised anomaly detection via
      adversarial training. In: Computer Vision-ACCV 2018: 14th Asian conference on computer vision,
      Perth, Australia, December 2-6, 2018, Revised Selected Papers, Part III 14, Springer, pp 622–637
Alex SB, Mary L (2023) Variational autoencoder for prosody-based speaker recognition. ETRI J
      45(4):678–689
Al-Qatf M, Lasheng Y, Al-Habib M, Al-Sabahi K (2018) Deep learning approach combining sparse
      autoencoder with SVM for network intrusion detection. IEEE Access 6:52843–52856
Alsadhan N (2023) A multi-module machine learning approach to detect tax fraud. Comput Syst Sci Eng
      46(1):241–253
Alzu’bi A, Albalas F, Al-Hadhrami T, Younis LB, Bashayreh A (2021) Masked face recognition using
      deep learning: a review. Electronics 10(21):2666
An J, Cho S (2015) Variational autoencoder based anomaly detection using reconstruction probability.
      Special Lecture IE 2(1):1–18
An P, Wang Z, Zhang C (2022) Ensemble unsupervised autoencoders and gaussian mixture model for
      cyberattack detection. Inform Process Manag 59(2):102844
Aumentado-Armstrong T, Tsogkas S, Jepson A, Dickinson S (2019) Geometric disentanglement for
      generative latent shape models. In: Proceedings of the IEEE/CVF international conference on
      computer vision, pp 8181–8190
Azarang A, Kehtarnavaz N (2020) A review of multi-objective deep learning speech denoising methods.
      Speech Commun 122:1–10
Balakrishnama S, Ganapathiraju A (1998) Linear discriminant analysis-a brief tutorial. Inst Signal
      Inform Process 18(1998):1–8
Bank D, Koenigstein N, Giryes R (2020) Autoencoders. arXiv preprint arXiv:2003.05991
Bank D, Koenigstein N, Giryes R (2023) Autoencoders. Machine Learning for Data Science Handbook:
      Data Mining and Knowledge Discovery Handbook 353–374
Berahmand K, Li Y, Xu Y (2023) DAC-HPP: deep attributed clustering with high-order proximity
      preserve. Neural Comput Appl pp 1–19
Bertalmio M, Sapiro G, Caselles V, Ballester C (2000) Image inpainting. In: Proceedings of the 27th
      annual conference on computer graphics and interactive techniques, pp 417–424
Bhangale KB, Kothandaraman M (2022) Survey of deep learning paradigms for speech processing.
      Wireless Pers Commun 125(2):1913–1949
Bursic S, Cuculo V, D’Amelio A (2019) Anomaly detection from log files using unsupervised deep
      learning. In: International symposium on formal methods, Springer, pp 200–207
Cacciarelli D, Kulahci M, Tyssedal J (2022) Online active learning for soft sensor development using
      semi-supervised autoencoders. arXiv preprint arXiv:2212.13067
Cao S, Lu W, Xu Q (2016) Deep neural networks for learning graph representations. In: Proceedings of
      the AAAI conference on artificial intelligence, vol. 30
Chai Z, Song W, Wang H, Liu F (2019) A semi-supervised auto-encoder using label and sparse
      regularizations for classification. Appl Soft Comput 77:205–217
Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv 41(3):1–58
Charitou C, Garcez Ad, Dragicevic S (2020) Semi-supervised gans for fraud detection. In: 2020
      international joint conference on neural networks (IJCNN), IEEE, pp 1–8
Charte D, Charte F, García S, del Jesus MJ, Herrera F (2018) A practical tutorial on autoencoders for
      nonlinear feature fusion: taxonomy, models, software and guidelines. Inform Fus 44:78–96
Che L, Yang X, Wang L (2020) Text feature extraction based on stacked variational autoencoder.
      Microprocess Microsyst 76:103063
Chen S, Guo W (2023) Auto-encoders in deep learning-a review with new perspectives. Mathematics
      11(8):1777
Chen Y, Liu Y, Jiang D, Zhang X, Dai W, Xiong H, Tian Q (2022) Sdae: Self-distillated masked
      autoencoder. In: European conference on computer vision, Springer, pp 108–124
Chen M, Xu Z, Weinberger K, Sha F (2012) Marginalized denoising autoencoders for domain adaptation.
      arXiv preprint arXiv:1206.4683
Chowdhary K, Chowdhary K (2020) Natural language processing. Fundamentals of artificial
      intelligence, pp 603–649
Cui P, Wang X, Pei J, Zhu W (2018) A survey on network embedding. IEEE Trans Knowl Data Eng
      31(5):833–852
Daneshfar F, Soleymanbaigi S, Nafisi A, Yamini P (2023) Elastic deep autoencoder for text embedding
      clustering by an improved graph regularization. Expert Syst Appl 121780
Debener J, Heinke V, Kriebel J (2023) Detecting insurance fraud using supervised and unsupervised
      machine learning. J Risk Insurance
Dehghan A, Ortiz EG, Villegas R, Shah M (2014) Who do i look like? determining parent-offspring
      resemblance via gated autoencoders. In: Proceedings of the IEEE conference on computer vision
      and pattern recognition, pp 1757–1764
DeLise T (2023) Deep semi-supervised anomaly detection for finding fraud in the futures market. arXiv
      preprint arXiv:2309.00088
Ding L, Liu G-W, Zhao B-C, Zhou Y-P, Li S, Zhang Z-D, Guo Y-T, Li A-Q, Lu Y, Yao H-W et al
      (2019) Artificial intelligence system of faster region-based convolutional neural network
      surpassing senior radiologists in evaluation of metastatic lymph nodes of rectal cancer. Chin Med
      J 132(04):379–387
Ding S, Keal CA, Zhao L, Yu D (2022) Dimensionality reduction and classification for hyperspectral
      image based on robust supervised Isomap. J Ind Prod Eng 39(1):19–29
Ding Y, Zhuang J, Ding P, Jia M (2022) Self-supervised pretraining via contrast learning for intelligent
      incipient fault detection of bearings. Reliab Eng Syst Saf 218:108126
Dong Y, Chen K, Peng Y, Ma Z (2022) Comparative study on supervised versus semi-supervised machine
      learning for anomaly detection of in-vehicle can network. In: 2022 IEEE 25th international conference
      on intelligent transportation systems (ITSC), IEEE, pp 2914–2919
Du X, Yu J, Chu Z, Jin L, Chen J (2022) Graph autoencoder-based unsupervised outlier detection. Inf Sci
      608:532–550
Dutt A, Gader P (2023) Wavelet multiresolution analysis based speech emotion recognition system using 1d
      CNN LSTM networks. In: IEEE/ACM Transactions on audio, speech, and language processing
Dzakiyullah NR, Pramuntadi A, Fauziyyah AK (2021) Semi-supervised classification on credit card fraud
      detection using autoencoders. J Appl Data Sci 2(1):01–07
Fan H, Zhang F, Wei Y, Li Z, Zou C, Gao Y, Dai Q (2021) Heterogeneous hypergraph variational
      autoencoder for link prediction. IEEE Trans Pattern Anal Mach Intell 44(8):4125–4138
Fanai H, Abbasimehr H (2023) A novel combined approach based on deep autoencoder and deep classifiers
      for credit card fraud detection. Expert Syst Appl 217:119562
Fan S, Wang X, Shi C, Lu E, Lin K, Wang B (2020) One2multi graph autoencoder for multi-view graph
      clustering. In: Proceedings of the web conference 2020, pp 3070–3076
Farahnakian F, Heikkonen J (2018) A deep auto-encoder based approach for intrusion detection system.
      In: 2018 20th international conference on advanced communication technology (ICACT), IEEE, pp
      178–183
Foti S, Koo B, Stoyanov D, Clarkson MJ (2022) 3d shape variational autoencoder latent disentanglement
      via mini-batch feature swapping for bodies and faces. In: Proceedings of the IEEE/CVF conference
      on computer vision and pattern recognition, pp 18730–18739
Gaikwad SK, Gawali BW, Yannawar P (2010) A review on speech recognition technique. Int J Comput Appl
      10(3):16–24
Gao Z, Cecati C, Ding SX (2015) A survey of fault diagnosis and fault-tolerant techniques-part I: fault
      diagnosis with model-based and signal-based approaches. IEEE Trans Ind Electron 62(6):3757–3767
Gao Y, Wang L, Liu J, Dang J, Okada S (2023) Adversarial domain generalized transformer for cross-corpus
      speech emotion recognition. IEEE Trans Affect Comput. https://doi.org/10.1109/TAFFC.2023.3290795
García-Mendoza J-L, Villaseñor-Pineda L, Orihuela-Espina F, Bustio-Martínez L (2022) An autoencoder-
      based representation for noise reduction in distant supervision of relation extraction. J Intell Fuzzy
      Syst 42(5):4523–4529
Garson GD (2022) Factor analysis and dimension reduction in R: a social scientist’s toolkit. Taylor &
      Francis, New York
Geng J, Fan J, Wang H, Ma X, Li B, Chen F (2015) High-resolution SAR image classification via deep
      convolutional autoencoders. IEEE Geosci Remote Sens Lett 12(11):2351–2355
Ghorbani A, Fakhrahmad SM (2022) A deep learning approach to network intrusion detection using
       a proposed supervised sparse auto-encoder and SVM. Iran J Sci Technol Trans Electr Eng
       46(3):829–846
Girin L, Leglaive S, Bie X, Diard J, Hueber T, Alameda-Pineda X (2020) Dynamical variational
       autoencoders: a comprehensive review. arXiv preprint arXiv:2008.12595
Gorokhov O, Petrovskiy M, Mashechkin I, Kazachuk M (2023) Fuzzy CNN autoencoder for unsupervised
       anomaly detection in log data. Mathematics 11(18):3995
Guo X, Liu X, Zhu E, Yin J (2017) Deep clustering with convolutional autoencoders. In: Neural information
       processing: 24th International Conference, ICONIP 2017, Guangzhou, China, November 14-18,
       2017, Proceedings, Part II 24, Springer, pp 373–382
Guo Z, Wang F, Yao K, Liang J, Wang Z (2022) Multi-scale variational graph autoencoder for link
       prediction. In: Proceedings of the Fifteenth ACM international conference on web search and data
       mining, pp 334–342
Guo Y, Zhou D, Ruan X, Cao J (2023) Variational gated autoencoder-based feature extraction model for
       inferring disease-miRNA associations based on multiview features. Neural Netw
Hadifar A, Sterckx L, Demeester T, Develder C (2019) A self-training approach for short text clustering. In:
       Proceedings of the 4th workshop on representation learning for NLP (RepL4NLP-2019), pp 194–199
Han C, Wang J (2021) Face image inpainting with evolutionary generators. IEEE Signal Process Lett
       28:190–193
Hara K, Shiomoto K (2022) Intrusion detection system using semi-supervised learning with adversarial
       auto-encoder. In: NOMS 2020-2020 IEEE/IFIP network operations and management symposium,
       IEEE, pp 1–8
Hasan BMS, Abdulazeez AM (2021) A review of principal component analysis algorithm for dimensionality
       reduction. J Soft Comput Data Min 2(1):20–30
He X, Liao L, Zhang H, Nie L, Hu X, Chua T-S (2017) Neural collaborative filtering. In: Proceedings of the
       26th international conference on world wide web, pp 173–182
Hickok G, Poeppel D (2007) The cortical organization of speech processing. Nat Rev Neurosci 8(5):393–402
Higgins I, Matthey L, Pal A, Burgess C, Glorot X, Botvinick M, Mohamed S, Lerchner A (2016) beta-vae:
       Learning basic visual concepts with a constrained variational framework. In: International conference
       on learning representations
Hinton GE, Osindero S, Teh Y-W (2006) A fast learning algorithm for deep belief nets. Neural Comput
       18(7):1527–1554
Hoang D-T, Kang H-J (2019) A survey on deep learning based bearing fault diagnosis. Neurocomputing
       335:327–335
Hoang T-N, Kim D (2022) Detecting in-vehicle intrusion via semi-supervised learning-based convolutional
       adversarial autoencoders. Veh Commun 38:100520
Hosseini S, Varzaneh ZA (2022) Deep text clustering using stacked autoencoder. Multimedia tools and
       applications 81(8):10861–10881
Hosseini M, Celotti L, Plourde E (2021) Speaker-independent brain enhanced speech denoising. In: ICASSP
       2021-2021 IEEE international conference on acoustics, speech and signal processing (ICASSP),
       IEEE, pp 1310–1314
Hou L, Luo X-Y, Wang Z-Y, Liang J (2020) Representation learning via a semi-supervised stacked distance
       autoencoder for image classification. Front Inform Technol Electron Eng 21(7):1005–1018
Hou Z, Liu X, Cen Y, Dong Y, Yang H, Wang C, Tang J (2022) Graphmae: Self-supervised masked graph
       autoencoders. In: Proceedings of the 28th ACM SIGKDD conference on knowledge discovery and
       data mining, pp 594–604
Huang G, Jafari AH (2023) Enhanced balancing GAN: minority-class image generation. Neural Comput
       Appl 35(7):5145–5154
Huang Z, Jin X, Lu C, Hou Q, Cheng M-M, Fu D, Shen X, Feng J (2022) Contrastive masked autoencoders
       are stronger vision learners. arXiv preprint arXiv:2207.13532
Ieracitano C, Adeel A, Morabito FC, Hussain A (2020) A novel statistical analysis and autoencoder driven
       intelligent intrusion detection approach. Neurocomputing 387:51–62
Jain R, Kasturi R, Schunck BG et al (1995) Machine vision, vol 5. McGraw-Hill, New York
Jaiswal G, Rani R, Mangotra H, Sharma A (2023) Integration of hyperspectral imaging and autoencoders:
       benefits, applications, hyperparameter tunning and challenges. Comput Sci Rev 50:100584
Jha S, Shah S, Ghamsani R, Sanghavi P, Shekokar NM (2023) Analysis of RNNs and different ML and
       DL classifiers on speech-based emotion recognition system using linear and nonlinear features. CRC
       Press, Boca Raton, pp 109–126
Jia K, Sun L, Gao S, Song Z, Shi BE (2015) Laplacian auto-encoders: an explicit learning of nonlinear data
       manifold. Neurocomputing 160:250–260
Jiang S, Dong R, Wang J, Xia M (2023) Credit card fraud detection based on unsupervised attentional
       anomaly detection network. Systems 11(6):305
Kennedy RK, Salekshahrezaee Z, Villanustre F, Khoshgoftaar TM (2023) Iterative cleaning and learning of
       big highly-imbalanced fraud data using unsupervised learning. J Big Data 10(1):106
Kim S, Jang H, Hong S, Hong YS, Bae WC, Kim S, Hwang D (2021) Fat-saturated image generation from
       multi-contrast MRIs using generative adversarial networks with Bloch equation-based autoencoder
       regularization. Med Image Anal 73:102198
Kipf TN, Welling M (2016) Variational graph auto-encoders. arXiv preprint arXiv:1611.07308
Kowsari K, Jafari Meimandi K, Heidarysafa M, Mendu S, Barnes L, Brown D (2019) Text classification
       algorithms: a survey. Information 10(4):150
Książek K, Głomb P, Romaszewski M, Cholewa M, Grabowski B, Búza K (2022) Improving autoencoder
       training performance for hyperspectral unmixing with network reinitialisation. In: International
       Conference on Image Analysis and Processing, Springer, pp 391–403
Kumar S, Rath SP, Pandey A (2022) Improved far-field speech recognition using joint variational
       autoencoder. arXiv preprint arXiv:2204.11286
Kunang YN, Nurmaini S, Stiawan D, Zarkasi A, et al (2018) Automatic features extraction using
       autoencoder in intrusion detection system. In: 2018 international conference on electrical engineering
       and computer science (ICECOS), IEEE, pp 219–224
Le T-D, Noumeir R, Rambaud J, Sans G, Jouvet P (2023) Adaptation of autoencoder for sparsity reduction
       from clinical notes representation learning. IEEE J Transl Eng Health Med
Lee J-w, Lee J (2017) Idae: Imputation-boosted denoising autoencoder for collaborative filtering.
       In: Proceedings of the 2017 ACM on conference on information and knowledge management,
       pp 2143–2146
Lee D, Seung HS (2000) Algorithms for non-negative matrix factorization. Adv Neural Inform Process Syst
       13
Lei Y, Yang B, Jiang X, Jia F, Li N, Nandi AK (2020) Applications of machine learning to machine fault
       diagnosis: a review and roadmap. Mech Syst Signal Process 138:106587
Lewandowski B, Paffenroth R (2022) Autoencoder feature residuals for network intrusion detection:
       Unsupervised pre-training for improved performance. In: 2022 21st IEEE international conference on
       machine learning and applications (ICMLA), IEEE, pp 1334–1341
Li Y-J, Wang S-S, Tsao Y, Su B (2021) Mimo speech compression and enhancement based on convolutional
       denoising autoencoder. In: 2021 Asia-pacific signal and information processing association annual
       summit and conference (APSIPA ASC), IEEE, pp 1245–1250
Li F, Zurada JM, Wu W (2018) Sparse representation learning of data by autoencoders with L1/2
       regularization. Neural Netw World 28(2):133–147
Li H, Zhang L, Huang B, Zhou X (2020) Cost-sensitive dual-bidirectional linear discriminant analysis. Inf
       Sci 510:283–303
Li Z, Huang H, Zhang Z, Shi G (2022) Manifold-based multi-deep belief network for feature extraction of
       hyperspectral image. Remote Sens 14(6):1484
Li X, Li C, Rahaman MM, Sun H, Li X, Wu J, Yao Y, Grzegorzek M (2022) A comprehensive review
       of computer-aided whole-slide image analysis: from datasets to feature extraction, segmentation,
       classification and detection approaches. Artif Intell Rev 55(6):4809–4878. https://doi.org/10.1007/
       s10462-021-10121-0
Liang D, Krishnan RG, Hoffman MD, Jebara T (2018) Variational autoencoders for collaborative filtering.
       In: Proceedings of the 2018 World Wide Web Conference, pp 689–698
Liao L, Cheng G, Ruan H, Chen K, Lu J (2022) Multichannel variational autoencoder-based speech
       separation in designated speaker order. Symmetry 14(12):2514
Lin C-C, Hung Y, Feris R, He L (2020) Video instance segmentation tracking with a modified vae
       architecture. In: Proceedings of the IEEE/CVF conference on computer vision and pattern
       recognition, pp 13147–13157
Li P, Pei Y, Li J (2023) A comprehensive survey on design and application of autoencoder in deep learning.
       Appl Soft Comput 110176
Liu Y, Ponce C, Brunton SL, Kutz JN (2023) Multiresolution convolutional autoencoders. J Comput Phys
       474:111801
Lopes IO, Zou D, Abdulqadder IH, Ruambo FA, Yuan B, Jin H (2022) Effective network intrusion detection
       via representation learning: a denoising autoencoder approach. Comput Commun 194:55–65
Luo W, Li J, Yang J, Xu W, Zhang J (2017) Convolutional sparse autoencoders for image classification.
       IEEE Trans Neural Netw Learn Syst 29(7):3289–3294
Luo W, Liu W, Gao S (2017) Remembering history with convolutional lstm for anomaly detection. In: 2017
       IEEE international conference on multimedia and expo (ICME), IEEE, pp 439–444
Ma M, Sun C, Chen X (2018) Deep coupling autoencoder for fault diagnosis with multimodal sensory data.
      IEEE Trans Ind Inf 14(3):1137–1145
Makhzani A, Shlens J, Jaitly N, Goodfellow I, Frey B (2015) Adversarial autoencoders. arXiv preprint
      arXiv:1511.05644
Ma S, Li X, Tang J, Guo F (2022) Eaa-net: Rethinking the autoencoder architecture with intra-class features
      for medical image segmentation. arXiv preprint arXiv:2208.09197
Marchi E, Vesperini F, Eyben F, Squartini S, Schuller B (2015) A novel approach for automatic acoustic
      novelty detection using a denoising autoencoder with bidirectional lstm neural networks. In: 2015
      IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, pp 1996–2000
Martínez V, Berzal F, Cubero J-C (2016) A survey of link prediction in complex networks. ACM Comput
      Surv 49(4):1–33
McConville R, Santos-Rodriguez R, Piechocki RJ, Craddock I (2021) N2d: (not too) deep clustering via
      clustering the local manifold of an autoencoded embedding. In: 2020 25th international conference on
      pattern recognition (ICPR), IEEE, pp 5145–5152
McKeown K (1992) Text generation. Cambridge University Press, Cambridge
Medhat W, Hassan A, Korashy H (2014) Sentiment analysis algorithms and applications: a survey. Ain
      Shams Eng J 5(4):1093–1113
Medsker LR, Jain L (2001) Recurrent neural networks. Design Appl 5(64–67):2
Meyer BH, Pozo ATR, Zola WMN (2022) Global and local structure preserving GPU t-SNE methods for
      large-scale applications. Expert Syst Appl 201:116918
Miao J, Yang T, Sun L, Fei X, Niu L, Shi Y (2022) Graph regularized locally linear embedding for
      unsupervised feature selection. Pattern Recogn 122:108299
Minkin A (2021) The application of autoencoders for hyperspectral data compression. In: 2021 international
      conference on information technology and nanotechnology (ITNT), IEEE, pp 1–4
Miuccio L, Panno D, Riolo S (2022) A Wasserstein GAN autoencoder for SCMA networks. IEEE Wireless
      Commun Lett 11(6):1298–1302
Molaei S, Ghorbani N, Dashtiahangar F, Peivandi M, Pourasad Y, Esmaeili M (2022) Fdcnet: presentation
      of the fuzzy CNN and fractal feature extraction for detection and classification of tumors. Comput
      Intell Neurosci 2022
Myronenko A (2019) 3d mri brain tumor segmentation using autoencoder regularization. In: Brainlesion:
      Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 4th International Workshop,
      BrainLes 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 16, 2018,
      Revised Selected Papers, Part II 4, Springer, pp 311–320
Ng A et al (2011) Sparse autoencoder. CS294A Lecture Notes 72(2011):1–19
Nguyen HD, Tran KP, Thomassey S, Hamad M (2021) Forecasting and anomaly detection approaches using
      LSTM and LSTM autoencoder techniques with the applications in supply chain management. Int J Inf
      Manage 57:102282
Ohgushi T, Horiguchi K, Yamanaka M (2020) Road obstacle detection method based on an autoencoder
      with semantic segmentation. In: Proceedings of the Asian conference on computer vision
Palaz D, Collobert R (2015) Analysis of CNN-based speech recognition system using raw speech as input.
      Report, Idiap
Palsson B, Sveinsson JR, Ulfarsson MO (2022) Blind hyperspectral unmixing using autoencoders: a critical
      comparison. IEEE J Sel Topics Appl Earth Observ Remote Sens 15:1340–1372
Pang G, Shen C, Cao L, Hengel AVD (2021) Deep learning for anomaly detection: a review. ACM Comput
      Surv 54(2):1–38
Pan S, Hu R, Long G, Jiang J, Yao L, Zhang C (2018) Adversarially regularized graph autoencoder for
      graph embedding. arXiv preprint arXiv:1802.04407
Papananias M, McLeay TE, Mahfouf M, Kadirkamanathan V (2023) A probabilistic framework for product
      health monitoring in multistage manufacturing using unsupervised artificial neural networks and
      gaussian processes. Proc Inst Mech Eng Part B: J Eng Manufact 237(9):1295–1310
Paul D, Chakdar D, Saha S, Mathew J (2023) Online research topic modeling and recommendation utilizing
      multiview autoencoder-based approach. IEEE Trans Comput Soc Syst
Pereira RC, Santos MS, Rodrigues PP, Abreu PH (2020) Reviewing autoencoders for missing data
      imputation: technical trends, applications and outcomes. J Artif Intell Res 69:1255–1285
Petersson H, Gustafsson D, Bergstrom D (2016) Hyperspectral image analysis using deep learning-a review.
       In: 2016 sixth international conference on image processing theory, tools and applications (IPTA),
       IEEE, pp 1–6
Pratella D, Ait-El-Mkadem Saadi S, Bannwarth S, Paquis-Fluckinger V, Bottini S (2021) A survey of
       autoencoder algorithms to pave the diagnosis of rare diseases. Int J Mol Sci 22(19):10891
Preechakul K, Chatthee N, Wizadwongsa S, Suwajanakorn S (2022) Diffusion autoencoders: Toward a
       meaningful and decodable representation. In: Proceedings of the IEEE/CVF conference on computer
       vision and pattern recognition, pp 10619–10629
Qian J, Song Z, Yao Y, Zhu Z, Zhang X (2022) A review on autoencoder based representation learning for
       fault detection and diagnosis in industrial processes. Chemometrics Intell Lab Syst 104711
Ray P, Reddy SS, Banerjee T (2021) Various dimension reduction techniques for high dimensional data
       analysis: a review. Artif Intell Rev 54(5):3473–3515. https://doi.org/10.1007/s10462-020-09928-0
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection.
       In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788
Rifai S, Vincent P, Muller X, Glorot X, Bengio Y (2011) Contractive auto-encoders: Explicit invariance
       during feature extraction. In: Proceedings of the 28th international conference on machine
       learning, pp 833–840
Rituerto-González E, Peláez-Moreno C (2021) End-to-end recurrent denoising autoencoder embeddings for
       speaker identification. Neural Comput Appl 33(21):14429–14439
Ruff L, Vandermeulen RA, Görnitz N, Binder A, Müller E, Müller K-R, Kloft M (2019) Deep semi-
       supervised anomaly detection. arXiv preprint arXiv:1906.02694
Rumelhart DE, Hinton GE, Williams RJ, et al (1985) Learning internal representations by error propagation.
       Institute for Cognitive Science, University of California, San Diego La
Rusnac A-L, Grigore O (2022) CNN architectures and feature extraction methods for EEG imaginary
       speech recognition. Sensors 22(13):4679
Sae-Ang B-I, Kumwilaisak W, Kaewtrakulpong P (2022) Semi-supervised learning for defect segmentation
       with autoencoder auxiliary module. Sensors 22(8):2915
Sagha H, Cummins N, Schuller B (2017) Stacked denoising autoencoders for sentiment analysis: a review.
       Wiley Interdiscip Rev Data Min Knowl Discov 7(5):1212
Saha S, Minku LL, Yao X, Sendhoff B, Menzel S (2022) Split-ae: An autoencoder-based disentanglement
       framework for 3d shape-to-shape feature transfer. In: 2022 international joint conference on neural
       networks (IJCNN), IEEE, pp 1–9
Sakurada M, Yairi T (2014) Anomaly detection using autoencoders with nonlinear dimensionality
       reduction. In: Proceedings of the MLSDA 2014 2nd workshop on machine learning for sensory data
       analysis, pp 4–11
Salehi A, Davulcu H (2019) Graph attention auto-encoders. arXiv preprint arXiv:1905.10715
Salha G, Limnios S, Hennequin R, Tran V-A, Vazirgiannis M (2019) Gravity-inspired graph autoencoders
       for directed link prediction. In: Proceedings of the 28th ACM international conference on information
       and knowledge management, pp 589–598
Sayed HM, ElDeeb HE, Taie SA (2023) Bimodal variational autoencoder for audiovisual speech
       recognition. Mach Learn 112(4):1201–1226
Seki S, Kameoka H, Tanaka K, Kaneko T (2023) Jsv-vc: Jointly trained speaker verification and voice
       conversion models. In: ICASSP 2023-2023 IEEE international conference on acoustics, speech and
       signal processing (ICASSP), IEEE, pp 1–5
Semeniuta S, Severyn A, Barth E (2017) A hybrid convolutional variational autoencoder for text generation.
       arXiv preprint arXiv:1702.02390
Seyfioğlu MS, Özbayoğlu AM, Gürbüz SZ (2018) Deep convolutional autoencoder for radar-based
       classification of similar aided and unaided human activities. IEEE Trans Aerosp Electron Syst
       54(4):1709–1723
Shankar V, Parsana S (2022) An overview and empirical comparison of natural language processing (NLP)
       models and an introduction to and empirical application of autoencoder models in marketing. J Acad
       Mark Sci 50(6):1324–1350
Shi D, Zhao C, Wang Y, Yang H, Wang G, Jiang H, Xue C, Yang S, Zhang Y (2022) Multi actor hierarchical
       attention critic with RNN-based feature extraction. Neurocomputing 471:79–93
Shixin P, Kai C, Tian T, Jingying C (2022) An autoencoder-based feature level fusion for speech emotion
       recognition. Digital Commun Netw
Shrestha N (2021) Factor analysis as a tool for survey analysis. Am J Appl Math Stat 9(1):4–11
Singh A, Ogunfunmi T (2022) An overview of variational autoencoders for source separation, finance, and
       bio-signal applications. Entropy 24(1):55
Smatana M, Butka P (2019) Topicae: a topic modeling autoencoder. Acta Polytechnica Hungarica
      16(4):67–86
Solorio-Fernández S, Carrasco-Ochoa JA, Martínez-Trinidad JF (2022) A survey on feature
      selection methods for mixed data. Artif Intell Rev 55(4):2821–2846. https://doi.org/10.1007/
      s10462-021-10072-6
Song Y, Hyun S, Cheong Y-G (2021) Analysis of autoencoders for network intrusion detection. Sensors
      21(13):4294
Song C, Liu F, Huang Y, Wang L, Tan T (2013) Auto-encoder based data clustering. In: Progress in Pattern
      Recognition, Image Analysis, Computer Vision, and Applications: 18th Iberoamerican Congress,
      CIARP 2013, Havana, Cuba, November 20-23, 2013, Proceedings, Part I 18, Springer, pp 117–124
Srikotr T (2022) The improved speech spectral envelope compression based on VQ-VAE with adversarial
      technique. Thesis
Strub F, Mary J, Gaudel R (2016) Hybrid collaborative filtering with autoencoders. arXiv preprint
      arXiv:1603.00806
Strub F, Mary J, Preux P (2015) Collaborative filtering with stacked denoising autoencoders and sparse
      inputs. In: NIPS workshop on machine learning for ecommerce
Su Y, Li J, Plaza A, Marinoni A, Gamba P, Chakravortty S (2019) DAEN: deep autoencoder networks for
      hyperspectral unmixing. IEEE Trans Geosci Remote Sens 57(7):4309–4321
Sudo T, Kanishima Y, Yanagihashi H (2021) A study of anomalous sound detection using autoencoder for
      quality determination and condition diagnosis. IEICE Tech. Rep. 121(284):20–25
Talpur N, Abdulkadir SJ, Alhussian H, Hasan MH, Aziz N, Bamhdi A (2023) Deep neuro-fuzzy system
      application trends, challenges, and future perspectives: a systematic survey. Artif Intell Rev
      56(2):865–913. https://doi.org/10.1007/s10462-022-10188-3
Tanveer M, Rastogi A, Paliwal V, Ganaie M, Malik A, Del Ser J, Lin C-T (2023) Ensemble deep learning in
      speech signal tasks: a review. Neurocomputing 126436
Thai HH, Hieu ND, Van Tho N, Do Hoang H, Duy PT, Pham V-H (2022) Adversarial autoencoder and generative
      adversarial networks for semi-supervised learning intrusion detection system. In: 2022 RIVF international
      conference on computing and communication technologies (RIVF), IEEE, pp 584–589
Tian Y, Xu Y, Zhu Q-X, He Y-L (2022) Novel stacked input-enhanced supervised autoencoder integrated
      with gated recurrent unit for soft sensing. IEEE Trans Instrum Meas 71:1–9
Tian H, Zhang L, Li S, Yao M, Pan G (2023) Pyramid-VAE-GAN: transferring hierarchical latent
      variables for image inpainting. Comput Visual Med pp 1–15
Todd JT (2004) The visual perception of 3d shape. Trends Cogn Sci 8(3):115–121
Tripathi M (2021) Facial image denoising using autoencoder and UNET. Herit Sustain Dev 3(2):89–96
Vahdat A, Kautz J (2020) Nvae: a deep hierarchical variational autoencoder. Adv Neural Inf Process
      Syst 33:19667–19679
Van den Oord A, Dieleman S, Schrauwen B (2013) Deep content-based music recommendation. Adv
      Neural Inform Process Syst 26
Van Der Maaten L, Postma EO, van den Herik HJ et al (2009) Dimensionality reduction: a comparative
      review. J Mach Learn Res 10(66–71):13
Vincent P, Larochelle H, Lajoie I, Bengio Y, Manzagol P-A, Bottou L (2010) Stacked denoising
      autoencoders: Learning useful representations in a deep network with a local denoising criterion.
      J Mach Learn Res 11(12)
Wang W, Yang D, Chen F, Pang Y, Huang S, Ge Y (2019) Clustering with orthogonal autoencoder.
      IEEE Access 7:62421–62432
Wang G, Karnan L, Hassan FM (2022) Face feature point detection based on nonlinear high-dimensional
      space. Int J Syst Assurance Eng Manag 13(Suppl 1):312–321
Wang D, Cui P, Zhu W (2016) Structural deep network embedding. In: Proceedings of the 22nd ACM
      SIGKDD international conference on knowledge discovery and data mining, pp 1225–1234
Wang D, Li T, Deng P, Zhang F, Huang W, Zhang P, Liu J (2023) A generalized deep learning clustering
      algorithm based on non-negative matrix factorization. ACM Trans Knowledge Discovery Data
Wang C, Pan S, Long G, Zhu X, Jiang J (2017) Mgae: Marginalized graph autoencoder for graph
      clustering. In: Proceedings of the 2017 ACM on conference on information and knowledge
      management, pp 889–898
Wang H, Wang N, Yeung D-Y (2015) Collaborative deep learning for recommender systems. In:
      Proceedings of the 21st ACM SIGKDD international conference on knowledge discovery and data
      mining, pp 1235–1244
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.