Statistics in Data Science with Python

Who Am I?
Mahe Karim
Front End Developer
ID: 162-15-7770
Area of Interest:
• Full Stack Developer
• Data Analyst
• Animation
• Why Not Jump Into Passive Income? ;)

Implementation of Our Course
• Step 1: Statistics
• Step 2: Data Science
• Step 3: Python

Basic Road To Data Science
• Statistics
• Machine Learning
• Deep Learning
• Programming Language (Python / R)
• Data Science

Smartest way to be a Data Scientist / Analyst
Step 1: Statistics
• Core Statistics
• Statistical Machine Learning
• Probabilistic Modeling
Step 2: Computing
• Database
• Data Mining
• Data Design
Step 3: ML
• Deep Learning
• NLP
• Data Analysis

3 steps to learning the statistics and probability required for data science:
• Core Statistics Concepts: descriptive statistics, distributions, hypothesis testing, and regression.
• Bayesian Thinking: conditional probability, priors, posteriors, and maximum likelihood.
• Intro to Statistical Machine Learning: learn basic machine learning concepts and how statistics fits in.
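As a small illustration of the "Bayesian Thinking" step, the sketch below updates a prior into a posterior with Bayes' theorem for a yes/no hypothesis. The function name `posterior` and all of the numbers are hypothetical and chosen only for illustration; they do not come from the slides.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) for a binary hypothesis H given evidence E.

    prior               -- P(H), belief before seeing the evidence
    likelihood          -- P(E | H)
    false_positive_rate -- P(E | not H)
    """
    # Total probability of the evidence, summed over both hypotheses
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 1% prior, 95% likelihood, 5% false-positive rate
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(p, 3))
```

Even with a fairly reliable test, a small prior keeps the posterior well below one half, which is the kind of intuition this step is meant to build.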

Verified courses include Statistics

Most Important Topics In Statistics
• Part 1 - Simple Linear Regression
• Part 2 - Multivariate Linear Regression
• Part 3 - Logistic Regression
• Part 4 - Multivariate Logistic Regression
• Part 5 - Neural Networks
• Part 6 - Support Vector Machines
• Part 7 - K-Means Clustering & PCA
• Part 8 - Anomaly Detection & Recommendation

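import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Jupyter notebook magic: render plots inline (remove when running as a script)
%matplotlib inline
# Build the path portably; assumes the file sits in a "data" folder under the
# working directory (the path separators were lost in the original slide)
path = os.path.join(os.getcwd(), 'data', 'ex1data1.txt')
# The file has no header row, so name the two columns explicitly
data = pd.read_csv(path, header=None, names=['Population', 'Profit'])
data.head()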

# Scatter plot of the raw data
data.plot(kind='scatter', x='Population', y='Profit', figsize=(12, 8))

Implementing Simple Linear Regression
def computeCost(X, y, theta):
    # Squared error of the linear prediction X * theta.T against y
    inner = np.power(((X * theta.T) - y), 2)
    # Average over the m training examples, halved by convention
    return np.sum(inner) / (2 * len(X))

# append a ones column to the front of the data set (intercept term)
data.insert(0, 'Ones', 1)
# set X (training data) and y (target variable)
cols = data.shape[1]
X = data.iloc[:, 0:cols-1]
y = data.iloc[:, cols-1:cols]
# convert from data frames to numpy matrices
X = np.matrix(X.values)
y = np.matrix(y.values)
# start from all-zero parameters (intercept, slope)
theta = np.matrix(np.array([0, 0]))
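The slides define the cost function but do not show how the parameters are fitted. One common choice is batch gradient descent; the sketch below is a minimal, self-contained version under stated assumptions, not necessarily the method used in the talk. The names `compute_cost`, `gradient_descent`, `alpha`, `iters`, and the `*_demo` variables are hypothetical, plain NumPy arrays replace `np.matrix`, and the data is synthetic rather than the course data set.

```python
import numpy as np


def compute_cost(X, y, theta):
    """Squared-error cost: sum((X @ theta.T - y)^2) / (2 * m)."""
    residuals = X @ theta.T - y
    return float(np.sum(np.power(residuals, 2)) / (2 * len(X)))


def gradient_descent(X, y, theta, alpha, iters):
    """Batch gradient descent; returns fitted parameters and cost history."""
    theta = theta.astype(float).copy()
    cost = np.zeros(iters)
    for i in range(iters):
        error = X @ theta.T - y                            # shape (m, 1)
        theta = theta - (alpha / len(X)) * (error.T @ X)   # shape (1, n)
        cost[i] = compute_cost(X, y, theta)
    return theta, cost


# Synthetic stand-in data (hypothetical): y = 1 + 2x with no noise
x_demo = np.linspace(0, 1, 50).reshape(-1, 1)
X_demo = np.hstack([np.ones_like(x_demo), x_demo])  # ones column for the intercept
y_demo = 1 + 2 * x_demo
theta0 = np.zeros((1, 2))

g_demo, cost = gradient_descent(X_demo, y_demo, theta0, alpha=0.5, iters=2000)
print(g_demo, cost[-1])
```

The result has shape (1, 2), holding the intercept and the slope, so it can be indexed as `[0, 0]` and `[0, 1]`. The learning rate and iteration count here suit the synthetic data only and would need tuning for a real data set.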

# g is assumed to hold the fitted parameters as a 1x2 matrix (intercept, slope),
# e.g. the result of gradient descent; it is not defined in these slides
x = np.linspace(data.Population.min(), data.Population.max(), 100)
f = g[0, 0] + (g[0, 1] * x)
fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(x, f, 'r', label='Prediction')
ax.scatter(data.Population, data.Profit, label='Training Data')
ax.legend(loc=2)
ax.set_xlabel('Population')
ax.set_ylabel('Profit')
ax.set_title('Predicted Profit vs. Population Size')

Resources:
• https://elitedatascience.com/learn-statistics-for-data-science
• https://github.com/datasciencemasters/go
• An Introduction to Statistical Learning with Applications in R, by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani
• http://www.johnwittenauer.net/machine-learning-exercises-in-python-part-1/
• Think Stats
