DATA SCIENCE CHEATSHEET
1. Learn Python and Jupyter
Python is one of the most popular programming languages used in the
data science field. It's the easiest to pick up and will give you
maximum bang for buck. Focus on learning:
Data structures - Base Types, Lists, Dictionaries
Looping - for, in, while
Control Flow and Operators
Creating Functions and Classes

Jupyter Notebook and JupyterLab are the most common development
environments for data science with Python.

2. Identifying data science tasks
Common Data Science Tasks and the Algorithms behind them
Sales Forecasting - Regression
Churn Prediction - Binary Classification
Customer Propensity to Buy - Regression
Market Basket Analysis - Association Rules
Sign Language Recognition - Object Detection
Defect Analysis - Semantic Segmentation
Human Pose Modelling - PoseNet

How to find data science tasks?
FIRST: Search for “Machine Learning Examples for <Industry You’re
Interested In> industry”
THEN: Look for examples of those tasks on GitHub or Kaggle to get
an idea of how they’re structured

3. Understand the types of data you’ll encounter
and how to work with them
You’ll encounter lots of different types of data during your journey as
a data scientist. It's useful to know how to work with each of them.
Structured - CSV, Excel, SQL Views
Unstructured - Images, Video, Text, Audio

4. Analysing and Visualising your datasets
The most popular methods for analysing and visualising
STRUCTURED data are:
Viewing and Transforming: Microsoft Excel, Pandas (Python
library for data analytics, think Excel but using Python), NumPy
(Python library for basic array and mathematical functions)
Visualisation: Matplotlib (most popular Python plotting library),
Seaborn (easy to use and great looking visualisation library)

The most popular methods for analysing and visualising IMAGES
and VIDEOS are:
Viewing and Transforming: OpenCV (ridiculously powerful
computer vision Python library)
Visualisation: Matplotlib (handles viewing images in a Jupyter
Notebook using the plt.imshow() method)

The most popular methods for analysing and visualising TEXT and
NATURAL LANGUAGE are:
Viewing and Transforming: NLTK, TextBlob and spaCy
Visualisation: Matplotlib

The most popular methods for analysing and visualising AUDIO are:
Viewing and Transforming: SciPy
Visualisation: NumPy

5. Preprocessing and transformation
The core goal of preprocessing and transformation is to get your
data ready for modelling. Learn how to perform these tasks:
Structured: handling missing values, normalizing and scaling
data, splitting your data into dependent and independent variables
and creating a training and testing dataset
Images: checking your images are valid, labelling images using
labelme and labelImg, performing image augmentation using
OpenCV
Text: removing punctuation, stripping out stop words, applying
lemmatization and tokenization
Audio: conversion of .wav files to spectrograms

6. Modelling, Algorithms and Evaluation
Supervised
Structured Regression - Random Forest Regressor, Gradient
Boosting Regressor
Structured Classification - Random Forest Classifier, Gradient
Boosting Classifier
Image Classification - Keras Sequential Neural Network
Object Detection - TensorFlow Single Shot Detector
Semantic Segmentation - TensorFlow Mask R-CNN
Pose Estimation - PoseNet
Reinforcement Learning - Stable Baselines
Sentiment Analysis - TextBlob Sentiment

Unsupervised
Clustering - K-Means
Anomaly Detection - One Class SVM
Dimensionality Reduction - Principal Component Analysis

7. Deployment and Integration
Being able to deploy your models to cloud services allows you to
integrate your work with other parts of the business or startup.
Cloud Machine Learning Providers: Watson Machine Learning,
AWS SageMaker, Azure ML

You should also get an understanding of how to deploy your models
using Open Source tools including:
FastAPI
Django
Flask

8. Domain Expertise and Presentation Skills
How to learn about your industry?
Read blog posts, financial reports and industry white papers. Look
for data science examples in that industry.

How to improve your presentation skills?
Join a Toastmasters club
Practice presenting at a meetup
Make a YouTube video describing a project you built!
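
The structured-data tasks listed under "5. Preprocessing and transformation" (handling missing values, scaling, and creating a training and testing dataset) can be sketched in plain Python as below. This is a minimal illustration, not the cheatsheet's own method: the function names, the toy `spend` and `churned` values, and the 25% test fraction are all assumptions made for the example. In practice, pandas `fillna` and scikit-learn's `MinMaxScaler` and `train_test_split` cover the same steps.

```python
import random
from statistics import mean


def impute_missing(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def min_max_scale(column):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: avoid dividing by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def split_train_test(features, targets, test_fraction=0.25, seed=0):
    """Shuffle row indices, then hold out a fraction of the rows for testing."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [features[i] for i in train_idx],
        [features[i] for i in test_idx],
        [targets[i] for i in train_idx],
        [targets[i] for i in test_idx],
    )


# Hypothetical toy data: monthly spend (with gaps) and a churn label.
spend = [120.0, None, 80.0, 200.0, None, 160.0, 40.0, 100.0]
churned = [0, 1, 1, 0, 1, 0, 1, 0]

spend_clean = min_max_scale(impute_missing(spend))
X = [[s] for s in spend_clean]  # independent variables (features)
y = churned                     # dependent variable (target)
X_train, X_test, y_train, y_test = split_train_test(X, y)
```

Shuffling before the split keeps the held-out rows from simply being the last rows of the file, and fixing the seed makes the split reproducible between runs.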
NICHOLAS RENOTTE
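
The text tasks listed under "5. Preprocessing and transformation" (removing punctuation, stripping out stop words, tokenization) might look roughly like this using only the standard library. The stop-word set here is a tiny hypothetical list for illustration; NLTK and spaCy ship full lists. Lemmatization is left out because it needs a vocabulary that those libraries provide.

```python
import string

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "this", "to"}


def preprocess(text):
    """Lowercase, strip ASCII punctuation, split on whitespace, drop stop words."""
    no_punct = text.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.split()
    return [t for t in tokens if t not in STOP_WORDS]


tokens = preprocess("This is the best cheatsheet, and the easiest to follow!")
print(tokens)  # ['best', 'cheatsheet', 'easiest', 'follow']
```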