Machine Learning Engineer Cheatsheet

The document provides an extensive overview of machine learning concepts, including types such as supervised, unsupervised, and reinforcement learning, along with their key terminologies and techniques. It covers data processing methods, core algorithms for regression and classification, deep learning architectures like CNNs and RNNs, and model evaluation metrics. Additionally, it discusses advanced topics like generative adversarial networks and hyperparameter tuning strategies.


MACHINE LEARNING

FUNDAMENTAL CONCEPTS
MACHINE LEARNING TYPES

Supervised Learning: Training a model on labeled data to predict outcomes.
  Regression: Predicting a continuous output variable.
  Classification: Predicting a categorical output variable.
Unsupervised Learning: Finding patterns and structure in unlabeled data.
  Clustering: Grouping similar data points together.
  Dimensionality Reduction: Reducing the number of variables while preserving essential information.
Reinforcement Learning: Training an agent to make decisions in an environment to maximize cumulative rewards.
  Agent: The learner and decision-maker.
  Environment: The world or system the agent interacts with.
  Reward: Feedback signal indicating the desirability of an action.
  State: The current situation or configuration of the environment.
  Action: What the agent does in a given state.
Semi-supervised Learning: Training a model on a dataset with both labeled and unlabeled data, typically when labeled data is scarce.
Self-supervised Learning: A form of unsupervised learning where the data provides the supervision, for example predicting a part of the input from other parts of the input.
Transfer Learning: Leveraging knowledge gained from one task to improve performance on a related task, often by using a pre-trained model.

BASIC TERMINOLOGY

Features: Input variables used to make predictions.
Labels/Targets: Output variables the model aims to predict.
Training Set: Data used to train the model.
Validation Set: Data used to tune hyperparameters and evaluate model performance during training.
Test Set: Data used to evaluate the final model's performance on unseen data (see the split sketch after this list).
Overfitting: When a model learns the training data too well, performing poorly on unseen data.
Underfitting: When a model is too simple to capture the underlying patterns in the data.
Bias: Error due to overly simplistic assumptions in the learning algorithm.
Variance: Error due to the model's sensitivity to small fluctuations in the training data.
Hyperparameters: Parameters set before training, controlling the learning process.
Parameters: Internal model parameters learned during training.
Model: A mathematical representation learned from data to make predictions.
Loss Function: Measures the error between predicted and actual values during training (e.g., MSE, Cross-Entropy).
Cost Function: The average loss over the entire training dataset.
Optimizer: Algorithm that updates model parameters to minimize the loss function (e.g., Gradient Descent).
Evaluation Metrics: Quantify model performance (e.g., Accuracy, F1-score, R-squared).
Cross-Validation: Technique to assess model performance by splitting data into multiple folds and training/testing on different combinations.
Regularization: Techniques to prevent overfitting by adding a penalty to the loss function.
Ensemble Learning: Combining multiple models to improve overall performance.
  Bagging (Bootstrap Aggregating): Training multiple models on different subsets of the training data and averaging their predictions.
  Boosting: Training models sequentially, where each model focuses on correcting the errors of the previous ones.
  Stacking: Training multiple models and then using another model (meta-learner) to combine their predictions.
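A minimal sketch of the training/validation/test split described above (assuming scikit-learn; the synthetic dataset and the 60/20/20 proportions are illustrative, not part of the original cheatsheet):

    # Split data into train/validation/test sets (assumed scikit-learn API).
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    # Hypothetical synthetic dataset standing in for real features X and labels y.
    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    # First hold out a test set, then carve a validation set out of the remainder.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

    print(len(X_train), len(X_val), len(X_test))   # 600 / 200 / 200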

DATA PROCESSING

Data Cleaning: Handling missing values, outliers, and inconsistencies in the data.
  Missing Values: Data points where values are not recorded. Strategies include imputation (mean, median, model-based) or removal.
  Outliers: Extreme values that deviate significantly from other data points. Can be handled by removal, transformation, or using robust models.
  Noise: Random errors or variations in the data.
Data Transformation: Applying mathematical functions to change the distribution or scale of features.
  Normalization: Scaling features to a specific range (e.g., 0 to 1).
  Standardization: Transforming features to have zero mean and unit variance.
  Log Transform: Applying the logarithm to reduce the impact of extreme values.
Feature Scaling: Ensuring features have similar ranges to prevent features with larger values from dominating the learning process.
Feature Encoding: Converting categorical features into numerical representations (see the pipeline sketch after this section).
  One-Hot Encoding: Creating binary features for each category.
  Label Encoding: Assigning a unique integer to each category.
Feature Engineering: Creating new features from existing ones to improve model performance.
Feature Selection: Choosing the most relevant features for the model.
  Filter Methods: Selecting features based on statistical measures (e.g., correlation, chi-squared).
  Wrapper Methods: Evaluating different subsets of features using a specific model.
  Embedded Methods: Feature selection is built into the model training process (e.g., Lasso regression).
Dimensionality Reduction: Reducing the number of features while preserving important information.
  PCA (Principal Component Analysis): Transforming features into a new set of uncorrelated features (principal components) that capture the most variance in the data.
  t-SNE (t-distributed Stochastic Neighbor Embedding): A non-linear technique primarily used for visualization, preserving local neighborhood structures in lower dimensions.
  LDA (Linear Discriminant Analysis): A supervised dimensionality reduction technique that maximizes the separation between different classes.
Data Splitting: Dividing the data into training, validation, and test sets.
Handling Imbalanced Datasets: Addressing datasets where one class significantly outnumbers others.
  Oversampling: Duplicating samples from the minority class.
  Undersampling: Removing samples from the majority class.
  SMOTE (Synthetic Minority Over-sampling Technique): Creating synthetic samples for the minority class.
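A minimal sketch of combining standardization and one-hot encoding into a single preprocessing step (assuming scikit-learn and pandas; the toy columns are invented for illustration):

    # Standardize numeric columns and one-hot encode a categorical column (assumed scikit-learn API).
    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    # Hypothetical toy dataset; column names are illustrative only.
    df = pd.DataFrame({
        "age": [25, 32, 47, 51],
        "income": [40000, 60000, 82000, 120000],
        "city": ["Austin", "Boston", "Austin", "Denver"],
    })

    preprocess = ColumnTransformer([
        ("numeric", StandardScaler(), ["age", "income"]),                   # zero mean, unit variance
        ("categorical", OneHotEncoder(handle_unknown="ignore"), ["city"]),  # one binary column per category
    ])

    X = preprocess.fit_transform(df)
    print(X.shape)   # 4 rows, 2 scaled numeric + 3 one-hot columns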
CORE ALGORITHMS & TECHNIQUES

SUPERVISED LEARNING

Regression:
  Linear Regression (Simple, Multiple): Modeling the relationship between a dependent variable and one or more independent variables using a linear equation.
  Polynomial Regression: Extending linear regression by adding polynomial terms to capture non-linear relationships.
  Regularized Regression: Adding a penalty term to the linear regression cost function to prevent overfitting.
    Ridge Regression (L2 Regularization): Adds a penalty proportional to the square of the magnitude of coefficients.
    Lasso Regression (L1 Regularization): Adds a penalty proportional to the absolute value of the magnitude of coefficients. Can perform feature selection.
    Elastic Net: A combination of Ridge and Lasso regression.
  Support Vector Regression (SVR): Using Support Vector Machines (SVM) for regression tasks.
  Decision Tree Regression: Building a tree-like structure where each node represents a decision based on a feature, and each leaf node represents a predicted value.
  Random Forest Regression: An ensemble of decision trees for regression.
  Gradient Boosting Regression: An ensemble method that builds trees sequentially, each correcting the errors of the previous ones.
    GBM (Gradient Boosting Machines): A general framework for gradient boosting.
    XGBoost (Extreme Gradient Boosting): A highly efficient and popular implementation of gradient boosting.
    LightGBM: Another fast and efficient gradient boosting framework, often faster than XGBoost.
    CatBoost: A gradient boosting library that handles categorical features well.

Classification:
  Logistic Regression: A linear model for binary classification that uses the logistic function to predict the probability of a sample belonging to a particular class (see the fitting sketch after this list).
  Support Vector Machines (SVM): Finding the optimal hyperplane that separates different classes with the largest margin.
  k-Nearest Neighbors (k-NN): Classifying a sample based on the majority class of its k nearest neighbors in the training data.
  Naive Bayes: A probabilistic classifier based on Bayes' theorem, assuming independence between features.
  Decision Trees: Building a tree-like structure for classification, where each node represents a decision based on a feature, and each leaf node represents a class label.
  Random Forests: An ensemble of decision trees for classification.
  Gradient Boosting Classifiers: Using gradient boosting for classification tasks.
    GBM, XGBoost, LightGBM, CatBoost: (See descriptions under Regression)
  Neural Networks (Multilayer Perceptron): A network of interconnected nodes (neurons) organized in layers, capable of learning complex non-linear relationships.
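A minimal sketch of fitting two of the listed classifiers and comparing their test accuracy (assuming scikit-learn; the bundled breast-cancer dataset stands in for real data):

    # Fit two classifiers on the same split and compare accuracy (assumed scikit-learn API).
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    for model in (LogisticRegression(max_iter=5000), RandomForestClassifier(n_estimators=200)):
        model.fit(X_train, y_train)                                   # learn parameters on the training set
        print(type(model).__name__, round(model.score(X_test, y_test), 3))  # accuracy on held-out data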
UNSUPERVISED LEARNING

1. Clustering:
  k-Means: Partitioning data into k clusters, where each data point belongs to the cluster with the nearest mean (centroid) (see the sketch after this section).
  Hierarchical Clustering: Building a hierarchy of clusters, using either an agglomerative (bottom-up) or a divisive (top-down) approach.
  DBSCAN (Density-Based Spatial Clustering of Applications with Noise): Grouping data points based on their density, identifying clusters of high density separated by areas of low density.
  Gaussian Mixture Models (GMM): Modeling the data distribution as a mixture of Gaussian distributions, where each Gaussian represents a cluster.

2. Dimensionality Reduction:
  Principal Component Analysis (PCA): (See description under Data Processing)
  Linear Discriminant Analysis (LDA): (See description under Data Processing)
  t-distributed Stochastic Neighbor Embedding (t-SNE): (See description under Data Processing)
  Autoencoders: Neural networks trained to reconstruct their input, learning a compressed representation of the data in a hidden layer.
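A minimal sketch of k-Means with a silhouette check (assuming scikit-learn; the blob data is synthetic and purely illustrative):

    # Cluster synthetic data with k-Means and score the result (assumed scikit-learn API).
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.metrics import silhouette_score

    # Hypothetical blob data standing in for real unlabeled features.
    X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

    kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
    labels = kmeans.fit_predict(X)

    print(silhouette_score(X, labels))   # closer to 1 means tighter, better-separated clusters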
DEEP LEARNING

NEURAL NETWORKS

Perceptron: The simplest form of a neural network, a single-layer model with a linear activation function.
Activation Functions: Introduce non-linearity into neural networks, allowing them to learn complex patterns.
  Sigmoid: Outputs values between 0 and 1, often used in the output layer for binary classification.
  ReLU (Rectified Linear Unit): Outputs the input if it's positive, otherwise outputs 0. A common choice for hidden layers.
  Tanh (Hyperbolic Tangent): Outputs values between -1 and 1.
  Softmax: Outputs a probability distribution over multiple classes, often used in the output layer for multi-class classification.
Backpropagation: Algorithm for computing the gradients of the loss function with respect to the network's weights, used to update the weights during training.
Gradient Descent: An optimization algorithm that iteratively updates model parameters in the direction of the negative gradient of the loss function.
  SGD (Stochastic Gradient Descent): Updates parameters using the gradient calculated from a single data point or a small batch of data points.
  Adam (Adaptive Moment Estimation): An adaptive learning rate optimization algorithm that combines the benefits of RMSprop and Momentum.
  RMSprop (Root Mean Square Propagation): An adaptive learning rate method that maintains a moving average of the squared gradients.
Loss Functions: Measure the error between predicted and actual values in neural networks.
  MSE (Mean Squared Error): Commonly used for regression tasks.
  Cross-Entropy: Commonly used for classification tasks.
Regularization: Techniques to prevent overfitting in neural networks.
  Dropout: Randomly dropping out neurons during training, forcing the network to learn more robust features.
  L1/L2 Regularization: Adding a penalty term to the loss function based on the magnitude of the weights (see descriptions under Regularized Regression).
  Batch Normalization: Normalizing the activations of each layer to have zero mean and unit variance, which can speed up training and improve performance.
Weight Initialization: Setting initial values for the weights of a neural network. Proper initialization can help with faster and more stable training. Examples include Xavier/Glorot initialization and He initialization.
Learning Rate: Controls the size of the steps taken during gradient descent.
Momentum: Helps accelerate gradient descent by accumulating past gradients.
Batch Size: The number of training examples used in one iteration of gradient descent.

CONVOLUTIONAL NEURAL NETWORKS (CNNs)

Convolutional Layers: Apply filters to input data to extract features (see the sketch after this subsection).
Pooling Layers: Reduce the spatial dimensions of feature maps, reducing the number of parameters and computation.
Filters/Kernels: Small matrices that slide across the input data, performing element-wise multiplication and summation to produce feature maps.
Padding: Adding extra pixels around the borders of the input to control the output size.
Stride: The number of pixels a filter moves across the input in each step.
Common Architectures:
  LeNet: One of the first successful CNN architectures, used for digit recognition.
  AlexNet: A deeper CNN that achieved significant improvements in image classification.
  VGG: A CNN architecture known for its simplicity and use of small 3x3 filters.
  ResNet (Residual Network): Introduced residual connections to enable the training of very deep networks.
  Inception: Uses modules with multiple filter sizes to capture features at different scales.
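A minimal sketch of a small CNN for 28x28 grayscale images (assuming the Keras API; the layer sizes and 10-class output are illustrative choices):

    # Tiny CNN: convolution + pooling blocks followed by a softmax classifier (assumed Keras API).
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=(28, 28, 1)),
        layers.Conv2D(32, kernel_size=3, padding="same", activation="relu"),  # convolution + ReLU
        layers.MaxPooling2D(pool_size=2),                                      # downsample feature maps
        layers.Conv2D(64, kernel_size=3, padding="same", activation="relu"),
        layers.MaxPooling2D(pool_size=2),
        layers.Flatten(),
        layers.Dropout(0.5),                                                    # regularization
        layers.Dense(10, activation="softmax"),                                 # 10-class output
    ])

    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    model.summary()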

RECURRENT NEURAL NETWORKS (RNNs)

Recurrent Units: Process sequential data by maintaining a hidden state that is updated at each time step.
Vanishing/Exploding Gradients: Problems that can occur when training RNNs, where gradients become too small or too large, making it difficult to learn long-range dependencies.
Long Short-Term Memory (LSTM): A type of RNN cell designed to address the vanishing gradient problem by using a memory cell and gates to control the flow of information (see the sketch after this subsection).
Gated Recurrent Unit (GRU): A simplified version of LSTM that also uses gates to control information flow.
Sequence-to-Sequence Models: RNN architectures that map an input sequence to an output sequence, used in tasks like machine translation and text summarization.
Attention Mechanisms: Allow RNNs to focus on specific parts of the input sequence when generating the output sequence.
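A minimal sketch of an LSTM-based sequence classifier (assuming the Keras API; the vocabulary size, sequence length, and binary output are illustrative choices):

    # LSTM over sequences of token ids, ending in a binary prediction (assumed Keras API).
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=(100,), dtype="int32"),            # sequences of 100 token ids
        layers.Embedding(input_dim=10000, output_dim=64),    # learn dense word vectors
        layers.LSTM(64),                                      # hidden state carries context across time steps
        layers.Dense(1, activation="sigmoid"),                # binary label (e.g., sentiment)
    ])

    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])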
OTHER DEEP LEARNING TOPICS

Generative Adversarial Networks (GANs): Consist of two networks, a generator and a discriminator, that are trained adversarially to generate realistic data.
Autoencoders: (See description under Unsupervised Learning)
Variational Autoencoders (VAEs): A type of autoencoder that learns a probabilistic representation of the input data, allowing for generating new samples.
Transformers: A neural network architecture based on the self-attention mechanism, which has achieved state-of-the-art results in many NLP tasks.
Transfer Learning in Deep Learning: Using a pre-trained deep learning model on a large dataset and fine-tuning it for a specific task, often with a smaller dataset.
Object Detection: Identifying and locating objects within an image.
  YOLO (You Only Look Once): A real-time object detection system that performs detection in a single pass.
  Faster R-CNN: A two-stage object detection system that uses a region proposal network to generate candidate object regions.
Image Segmentation: Partitioning an image into segments, where each segment corresponds to a different object or region.
  U-Net: A CNN architecture commonly used for image segmentation, particularly in biomedical applications.

MODEL EVALUATION & TUNING

EVALUATION METRICS

1. Regression:
  Mean Squared Error (MSE): Average squared difference between predicted and actual values.
  Root Mean Squared Error (RMSE): Square root of MSE, provides an error measure in the same units as the target variable.
  Mean Absolute Error (MAE): Average absolute difference between predicted and actual values.
  R-squared (Coefficient of Determination): Proportion of variance in the target variable explained by the model.

2. Classification:
  Accuracy: Proportion of correctly classified samples.
  Precision: Proportion of true positives among predicted positives (TP / (TP + FP)). Measures the ability of the classifier not to label a negative sample as positive.
  Recall: Proportion of true positives among actual positives (TP / (TP + FN)). Measures the ability of the classifier to find all the positive samples.
  F1-score: Harmonic mean of precision and recall, balances both metrics (see the sketch after this list).
  AUC-ROC: Area under the Receiver Operating Characteristic curve, measures the model's ability to distinguish between classes.
  Confusion Matrix: Table summarizing the performance of a classification model, showing counts of true positives, true negatives, false positives, and false negatives.

3. Clustering:
  Silhouette Score: Measures how similar a data point is to its own cluster compared to other clusters. Ranges from -1 to 1, with higher values indicating better clustering.
  Davies-Bouldin Index: Measures the average similarity between each cluster and its most similar cluster. Lower values indicate better clustering.
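A minimal sketch of computing several of the listed classification metrics (assuming scikit-learn; the label vectors are invented for illustration):

    # Compute accuracy, precision, recall, F1, and the confusion matrix (assumed scikit-learn API).
    from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                                 precision_score, recall_score)

    y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # hypothetical ground-truth labels
    y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # hypothetical predictions

    print("accuracy :", accuracy_score(y_true, y_pred))
    print("precision:", precision_score(y_true, y_pred))   # TP / (TP + FP)
    print("recall   :", recall_score(y_true, y_pred))      # TP / (TP + FN)
    print("f1       :", f1_score(y_true, y_pred))
    print(confusion_matrix(y_true, y_pred))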



CROSS-VALIDATION

k-Fold Cross-Validation: Dividing data into k folds, training on k-1 folds, and testing on the remaining fold, repeating k times.
Stratified k-Fold Cross-Validation: Ensures that each fold has approximately the same proportion of samples from each class as the full dataset.
Leave-One-Out Cross-Validation (LOOCV): Using each data point as a test set and the remaining data as the training set, repeating for all data points. Computationally expensive but useful for small datasets.

HYPERPARAMETER TUNING

Grid Search: Evaluating all possible combinations of hyperparameter values within a specified range (see the sketch after this list).
Random Search: Evaluating a random sample of hyperparameter combinations from a specified distribution. Often more efficient than grid search.
Bayesian Optimization: Building a probabilistic model of the objective function (e.g., a performance metric) and using it to select the most promising hyperparameter combinations to evaluate.
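A minimal sketch of grid search combined with stratified 5-fold cross-validation (assuming scikit-learn; the SVC estimator, parameter values, and bundled iris dataset are illustrative choices):

    # Exhaustive grid search over two SVC hyperparameters with stratified folds (assumed scikit-learn API).
    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV, StratifiedKFold
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)

    param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.1]}   # grid values are illustrative
    search = GridSearchCV(SVC(), param_grid, cv=StratifiedKFold(n_splits=5), scoring="accuracy")
    search.fit(X, y)

    print(search.best_params_, round(search.best_score_, 3))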

TOOLS & LIBRARIES

PYTHON LIBRARIES

NumPy: Fundamental library for numerical computing in Python, providing support for arrays, matrices, and mathematical functions.
Pandas: Powerful library for data manipulation and analysis, offering data structures like DataFrames for efficient data handling.
Scikit-learn: Comprehensive machine learning library with a wide range of algorithms, tools for model selection, evaluation, and preprocessing.
TensorFlow: Open-source deep learning framework developed by Google, known for its flexibility and scalability.
Keras: High-level neural networks API that can run on top of TensorFlow, PyTorch, or Theano, simplifying the process of building and training deep learning models.
PyTorch: Open-source deep learning framework developed by Facebook, known for its dynamic computation graphs and ease of use.
Matplotlib: Plotting library for creating static, animated, and interactive visualizations in Python.
Seaborn: Statistical data visualization library based on Matplotlib, providing a high-level interface for creating attractive and informative statistical graphics.
Statsmodels: Library focused on statistical modeling, including regression analysis, time series analysis, and hypothesis testing.

OTHER TOOLS

Jupyter Notebook/Lab: Interactive web-based environment for creating and sharing documents that contain live code, equations, visualizations, and narrative text.
Google Colab: Free cloud-based Jupyter notebook environment with GPU and TPU support, provided by Google.
Git/GitHub: Version control system (Git) and web-based hosting service (GitHub) for tracking changes to code and collaborating on projects.
SQL: Structured Query Language, used for managing and querying relational databases.
Cloud Platforms (AWS, GCP, Azure) - ML Services:
  AWS (Amazon Web Services): Offers SageMaker for building, training, and deploying machine learning models.
  GCP (Google Cloud Platform): Provides AI Platform for similar functionalities as SageMaker.
  Azure (Microsoft Azure): Offers Azure Machine Learning for building, deploying, and managing machine learning models.
MLflow: Open-source platform for managing the end-to-end machine learning lifecycle, including experiment tracking, model packaging, and deployment.
Weights & Biases: A tool for tracking and visualizing machine learning experiments, providing insights into model performance and hyperparameter tuning.

DEPLOYMENT & MLOPS


MODEL DEPLOYMENT

REST APIs (Flask, FastAPI): Creating web services that allow other applications to interact with a trained model (see the sketch after this list).
  Flask: A lightweight and popular web framework for building REST APIs in Python.
  FastAPI: A modern, high-performance web framework for building APIs with Python 3.7+, based on standard Python type hints.
Containerization (Docker): Packaging a model and its dependencies into a container for consistent and reproducible deployment across different environments.
Cloud Deployment (AWS SageMaker, Google AI Platform, Azure ML): Deploying models on cloud platforms using managed services.
Serverless Deployment: Deploying models as functions that are triggered by events, without managing servers (e.g., AWS Lambda, Google Cloud Functions, Azure Functions).

MLOPS

Model Versioning: Tracking different versions of a model, including its code, data, and hyperparameters.
Model Monitoring: Continuously tracking the performance of a deployed model and detecting issues like data drift or model degradation.
CI/CD for Machine Learning: Applying Continuous Integration/Continuous Deployment principles to automate the process of building, testing, and deploying machine learning models.
A/B Testing: Comparing different versions of a model in a production environment to determine which one performs better.
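A minimal sketch of serving a trained model behind a REST endpoint (assuming FastAPI, pydantic, and joblib; the model file name and feature format are hypothetical):

    # Expose a previously saved scikit-learn model through a small REST API (assumed FastAPI API).
    import joblib
    from fastapi import FastAPI
    from pydantic import BaseModel
    from typing import List

    app = FastAPI()
    model = joblib.load("model.joblib")   # hypothetical file produced earlier with joblib.dump

    class PredictRequest(BaseModel):
        features: List[float]             # one row of input features

    @app.post("/predict")
    def predict(req: PredictRequest):
        prediction = model.predict([req.features])[0]
        return {"prediction": float(prediction)}

    # Run locally with, e.g.:  uvicorn app:app --reload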

SPECIFIC TOPICS
Computer Vision:
  Image Processing: Manipulating and analyzing images using techniques like filtering, edge detection, and morphological operations.
  Object Detection: (See description under Deep Learning)
  Image Segmentation: (See description under Deep Learning)
  Image Classification: Assigning a label or category to an entire image.
  Transfer Learning in CV: Using pre-trained CNN models on large datasets (e.g., ImageNet) and fine-tuning them for specific computer vision tasks.

Time Series Analysis:
  Autocorrelation: Correlation of a time series with its own past values.
  Partial Autocorrelation: Correlation between a time series and its past values, removing the influence of intermediate lags.
  ARIMA (Autoregressive Integrated Moving Average): A class of statistical models for forecasting time series data (see the sketch after this list).
  Exponential Smoothing: A family of forecasting methods that use weighted averages of past observations.
  Prophet: A forecasting procedure developed by Facebook, designed for business time series data with seasonality and trend changes.
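A minimal sketch of fitting an ARIMA model and forecasting a few steps ahead (assuming statsmodels; the synthetic monthly series and the (1, 1, 1) order are illustrative, not recommendations):

    # Fit ARIMA(1, 1, 1) to a toy monthly series and forecast six months (assumed statsmodels API).
    import numpy as np
    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    # Hypothetical monthly series with a mild upward trend plus noise.
    index = pd.date_range("2020-01-01", periods=48, freq="MS")
    series = pd.Series(np.linspace(100, 160, 48) + np.random.default_rng(0).normal(0, 3, 48), index=index)

    fit = ARIMA(series, order=(1, 1, 1)).fit()
    print(fit.forecast(steps=6))   # predictions for the next 6 months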
Recommender Systems:
  Collaborative Filtering: Making recommendations based on the preferences of similar users or items.
  Content-Based Filtering: Making recommendations based on the characteristics of items and user profiles.
  Hybrid Approaches: Combining collaborative and content-based filtering techniques.

Anomaly Detection:
  One-Class SVM: A variation of SVM used for anomaly detection, where the model learns a boundary around the normal data points.
  Isolation Forest: An algorithm that isolates anomalies by randomly partitioning the data space (see the sketch after this list).
  Autoencoders: (See description under Unsupervised Learning and Deep Learning). Can be used for anomaly detection by measuring the reconstruction error.
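A minimal sketch of flagging outliers with Isolation Forest (assuming scikit-learn; the data and contamination value are invented for illustration):

    # Flag a few obvious outliers with Isolation Forest (assumed scikit-learn API).
    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(0)
    normal = rng.normal(0, 1, size=(200, 2))          # hypothetical "normal" points
    outliers = rng.uniform(6, 8, size=(5, 2))         # a few obvious anomalies
    X = np.vstack([normal, outliers])

    iso = IsolationForest(contamination=0.03, random_state=0).fit(X)
    print(iso.predict(X)[-5:])   # -1 marks predicted anomalies, +1 marks normal points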
NATURAL LANGUAGE PROCESSING (NLP)

Tokenization: Splitting text into individual words or units (tokens).
Stemming: Reducing words to their root form (e.g., running, runs, ran -> run).
Lemmatization: Reducing words to their base or dictionary form (e.g., better -> good).
Bag-of-Words: Representing text as a collection of unique words and their frequencies.
TF-IDF (Term Frequency-Inverse Document Frequency): A numerical statistic that reflects how important a word is to a document in a collection of documents (see the sketch after this list).
Word Embeddings: Representing words as dense vectors that capture semantic relationships.
  Word2Vec: A popular technique for creating word embeddings by training a neural network on a large corpus of text.
  GloVe (Global Vectors for Word Representation): Another method for generating word embeddings based on word co-occurrence statistics.
  FastText: An extension of Word2Vec that also considers subword information.
Sentiment Analysis: Determining the emotional tone or opinion expressed in text.
Topic Modeling (LDA): Discovering abstract "topics" that occur in a collection of documents.
  LDA (Latent Dirichlet Allocation): A probabilistic model that assumes each document is a mixture of topics and each topic is a distribution over words.
Text Classification: Categorizing text into predefined classes (e.g., spam detection, news categorization).
Transformers (BERT, RoBERTa, GPT):
  BERT (Bidirectional Encoder Representations from Transformers): A powerful transformer-based model for various NLP tasks, pre-trained on a massive amount of text data.
  RoBERTa (A Robustly Optimized BERT Pretraining Approach): An improved version of BERT with optimized training procedures.
  GPT (Generative Pre-trained Transformer): A transformer-based model primarily used for text generation, also capable of other NLP tasks.
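A minimal sketch of TF-IDF features feeding a linear text classifier (assuming scikit-learn; the tiny spam/ham corpus is invented for illustration):

    # TF-IDF vectorization piped into logistic regression for text classification (assumed scikit-learn API).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    texts = ["free prize, claim now", "meeting moved to friday",
             "win cash instantly", "lunch tomorrow?"]
    labels = [1, 0, 1, 0]   # 1 = spam, 0 = not spam (hypothetical labels)

    clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
    clf.fit(texts, labels)
    print(clf.predict(["claim your free cash prize"]))   # expected to lean toward the spam class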
By Shailesh Shakya @Beginnersblog & OpenAILearning
