Python, Data science, and Unsupervised learning

Disclaimer
Presentations are intended for educational
purposes only and do not replace
independent professional judgment.
Statements of fact and opinions expressed
are those of the participants individually
and don’t necessarily reflect those of
blibli.com.
Blibli.com does not endorse or approve,
and assumes no responsibility for, the
content, accuracy or completeness of the
information presented.

Python, data science, and
unsupervised learning
Hendri Karisma
hendri.karisma@gdn-commerce.com / situkangsayur@gmail.com

Hendri Karisma
• Sr. Research and Development
Engineer at blibli.com (PT. Global
Digital Niaga)
• R&D team for machine learning
• Working on the fraud detection system.
Currently working on the dynamic
recommendation system project.

Definition of Informatics
“Automation of Information” –
Prof. Dr. Ing. Iping Supriana

Solution Approaches
• Analytical (Exact)
Example :
– Analytical solution : 22/3 ≈ 7.3333
– Numerical solution : 7.25
– Error = |7.25 – 22/3| ≈ |7.25 – 7.3333| ≈ 0.08333
• Numerical (Approx.)
– Are numerical methods only about the ML methods we know from the
textbooks?
– Newton-Raphson, Gaussian elimination, Gauss-Jordan, Jacobi method,
Gauss-Seidel, Lagrange, Newton-Gregory, Richardson extrapolation,
etc.
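The gap between an exact and an approximate answer can be sketched in a few lines of Python. The slide's own formula is not available, so the integrand below is an assumption: the integral of 4 - x^2 over [-1, 1], approximated by the trapezoid rule with four strips of width 0.5, happens to reproduce the slide's numbers (22/3 and 7.25). The names `trapezoid` and `f` are illustrative only.

```python
def trapezoid(f, a, b, n):
    """Approximate the integral of f over [a, b] with n equal-width trapezoids."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))        # end points carry half weight
    for i in range(1, n):
        total += f(a + i * h)          # interior points carry full weight
    return total * h


def f(x):
    # Hypothetical integrand, chosen only because it matches the slide's numbers.
    return 4.0 - x ** 2


exact = 22.0 / 3.0                     # analytical value of the integral on [-1, 1]
approx = trapezoid(f, -1.0, 1.0, 4)    # numerical value with four strips
error = abs(approx - exact)
print(approx, round(error, 5))
```

Increasing `n` shrinks the error, which is the usual trade-off of a numerical method: more computation for a closer approximation.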

Machine Learning Definition
“A computer program is said to learn
from experience E with respect to
some class of tasks T and performance
measure P, if its performance at tasks
in T, as measured by P, improves with
experience E.” – Prof. Tom Mitchell

Machine Learning Perspective
● Information Theory (Decision Tree :
ID-Tree, C4.5, etc)
● Probability (Bayesian : Naive
Bayes, Belief Network, etc)
● Graphical Model (Belief network, HMM,
CRF, Neural Network, etc)
● Numerical Method or Regression
(Stochastic Gradient Descent/Ascent:
Linear Regression, Multiple Linear
Regression, Neural Network, E-M
Algorithm, HMM)
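As an illustration of the numerical perspective, here is a minimal sketch of stochastic gradient descent for simple linear regression. The data, learning rate and epoch count are assumptions made for the example, not values from the talk.

```python
def sgd_linear_regression(xs, ys, lr=0.01, epochs=2000):
    """Fit y ~ w * x + b by stochastic gradient descent on squared error."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = (w * x + b) - y      # prediction error on one sample
            w -= lr * err * x          # gradient of 0.5 * err**2 w.r.t. w
            b -= lr * err              # gradient of 0.5 * err**2 w.r.t. b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]       # hypothetical noiseless data: y = 2x + 1
w, b = sgd_linear_regression(xs, ys)
print(round(w, 3), round(b, 3))
```

Each update uses a single sample, which is what separates the stochastic variant from batch gradient descent; on this noiseless data the estimates settle near the generating slope and intercept.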

Machine Learning
• Supervised
• Unsupervised
• Reinforcement Learning
• Semi-Supervised
• Deep Learning

Tools/libs in python
● Numpy
● Scipy
● Pandas
● Scikit-learn
● Matplotlib
● seaborn
● Tensorflow
*pydata.org
*anaconda
● Other Tech (to
support ML) :
– Apache Kafka
– Apache Spark
– DB : MongoDB, PostgreSQL
– Elasticsearch
– CUDA/OpenCL

NumPy, SciPy, pandas, and scikit-learn
● NumPy & SciPy : Arrays, Indexing, Slicing,
and Iterating, Reshaping, Shallow vs deep
copy, Broadcasting, Indexing (advanced),
Matrices, Matrix decompositions. SciPy is
built on top of NumPy.
● pandas : Reading data, Selecting columns
and rows, Filtering, Vectorized string
operations, Missing values, Handling time,
Time series. Built on top of NumPy.
● scikit-learn : Feature extraction, Classification,
Regression, Clustering, Dimension reduction,
Model selection

What we do at blibli using Python
● Data flow
● Data pooling
● Data preprocessing
● Machine Learning Service/app

Our systems that use Python for ML
● Personalized recommendation system
● Data engineering (especially the
data flow for ML engine)
● Machine learning engine
● Fraud detection experiments

EM Algorithms
Repeat until convergence {
    Expectation (E) step
    Maximization (M) step
}

EM Algorithms
There are three key ingredients that (as far as I know)
are almost always used in the EM algorithm :
● Data Distribution
● Maximum Likelihood Estimation (MLE)
● Expectation-Maximization (EM)
*Today we will use the Gaussian distribution as the
sample case
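For the Gaussian case, maximum likelihood estimation has a closed form: the sample mean and the divide-by-n variance. The sketch below uses a hypothetical sample and illustrative function names to compute those estimates and compare the log-likelihood at the MLE with a shifted mean.

```python
import math


def gaussian_log_likelihood(data, mean, var):
    """Log-likelihood of i.i.d. data under a univariate Gaussian N(mean, var)."""
    n = len(data)
    sq = sum((x - mean) ** 2 for x in data)
    return -0.5 * n * math.log(2.0 * math.pi * var) - sq / (2.0 * var)


def gaussian_mle(data):
    """Closed-form MLE: sample mean and biased (divide-by-n) variance."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    return mean, var


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical sample
mean_hat, var_hat = gaussian_mle(data)
best = gaussian_log_likelihood(data, mean_hat, var_hat)
worse = gaussian_log_likelihood(data, mean_hat + 1.0, var_hat)
print(mean_hat, var_hat, best > worse)
```

EM is needed when this closed form is unavailable, for example when each point may come from one of several Gaussians and the assignment is hidden.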

EM Algorithms
The algorithm has two main steps, just as its
name suggests:
– Expectation :
– Maximization:
*repeat until the maximum likelihood is reached :
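In the usual latent-variable notation the two steps and the quantity being maximized are commonly written as follows. The symbols are assumptions, since only the step names are given here: x^(i) are the observations, z^(i) the hidden variables, θ the parameters, and Q_i the distribution over z^(i).

```latex
\begin{align*}
\text{E-step:}\quad & Q_i\big(z^{(i)}\big) := p\big(z^{(i)} \mid x^{(i)};\, \theta\big) \\
\text{M-step:}\quad & \theta := \arg\max_{\theta} \sum_{i} \sum_{z^{(i)}}
    Q_i\big(z^{(i)}\big)\, \log \frac{p\big(x^{(i)}, z^{(i)};\, \theta\big)}{Q_i\big(z^{(i)}\big)} \\
\text{Log-likelihood:}\quad & \ell(\theta) = \sum_{i} \log \sum_{z^{(i)}} p\big(x^{(i)}, z^{(i)};\, \theta\big)
\end{align*}
```

Each pass through the two steps cannot decrease the log-likelihood, which is why the loop is run until it stops improving.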

Multivariate Gaussian
● Gaussian distribution :
● Multivariate Gaussian distribution :
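For reference, the standard densities are given below, with μ the mean, σ² the variance, Σ the covariance matrix and d the number of features. The notation is the conventional one and is assumed here rather than taken from the slide.

```latex
\begin{align*}
\mathcal{N}(x \mid \mu, \sigma^2) &=
    \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \\
\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) &=
    \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}}
    \exp\!\left(-\tfrac{1}{2}\, (\mathbf{x}-\boldsymbol{\mu})^{\top} \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)
\end{align*}
```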

EM Algorithm for Gaussian Mixtures
● Expectation :
● Maximization :
*Log likelihood :
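The following is a minimal sketch of EM for a two-component, one-dimensional Gaussian mixture, written with the standard library only. It is an illustration under stated assumptions, not the implementation used in the talk: the synthetic data, the initialisation at the data extremes, the fixed iteration count and all function names are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture with EM.

    Returns (weights, means, variances, log_likelihood_history).
    """
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    # Assumed initialisation: equal weights, means at the data extremes.
    weights = [0.5, 0.5]
    means = [min(data), max(data)]
    variances = [overall_var, overall_var]
    history = []

    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        log_lik = 0.0
        for x in data:
            joint = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            log_lik += math.log(total)
            resp.append([j / total for j in joint])
        history.append(log_lik)   # log-likelihood under the current parameters

        # M-step: re-estimate weights, means and variances from the
        # responsibility-weighted points.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            variances[k] = sum(r[k] * (x - means[k]) ** 2
                               for r, x in zip(resp, data)) / nk
            variances[k] = max(variances[k], 1e-6)   # guard against collapse
    return weights, means, variances, history


# Hypothetical data: two well-separated clusters around 0 and 10.
rng = random.Random(0)
data = ([rng.gauss(0.0, 1.0) for _ in range(200)]
        + [rng.gauss(10.0, 1.0) for _ in range(200)])
weights, means, variances, history = em_gmm_1d(data)
print([round(m, 2) for m in means], [round(w, 2) for w in weights])
```

The `history` list records the log-likelihood at each iteration; a practical version would stop once its increase falls below a tolerance instead of running a fixed number of iterations.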

Fraud – without target class/labels
● Fraud cases are anomalous data
● Anomalous data usually form one or
a few small groups
● A lot of features without labels
------------------------------------------
● We need an unsupervised algorithm
(EM algorithm)
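The idea of finding anomalies without labels can be sketched by fitting a density to the data and ranking points by how unlikely they are. To keep the example short, it swaps the Gaussian mixture for a single univariate Gaussian fitted by MLE; the transaction amounts, the planted anomaly and the function names are all hypothetical.

```python
import math


def fit_gaussian(values):
    """MLE of a univariate Gaussian: sample mean and divide-by-n variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var


def log_density(x, mean, var):
    """Log of the Gaussian density N(mean, var) at x."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def least_likely(values, k=1):
    """Return indices of the k points with the lowest fitted log-density."""
    mean, var = fit_gaussian(values)
    scores = [log_density(v, mean, var) for v in values]
    order = sorted(range(len(values)), key=lambda i: scores[i])
    return order[:k]


# Hypothetical transaction amounts; the last one is the planted anomaly.
amounts = [20.0, 22.5, 19.0, 21.0, 23.0, 18.5, 20.5, 21.5, 19.5, 250.0]
suspects = least_likely(amounts, k=1)
print(suspects)
```

With many features and several normal behaviour patterns, a mixture fitted by EM plays the role of the single Gaussian here, and a threshold on the density replaces the fixed `k`.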

Case Anomaly Detection
● Credit card data containing fraudulent transactions.

Distributed System/Scale Out
Python script
Persistence
Computation
Supervisor/Service
Using python

THANK YOU
Any question?
*we are hiring*
