Statistics in Data Science with Python
Mahe Karim
Front End Developer
ID - 162-15-7770
Area of Interest:
• Full Stack Developer
• Data Analyst
• Animation
• Why Not Jump Into Passive Income? ;)
Who I Am ?
Implementation of our course
• Step 1: Statistics
• Step 2: Data Science
• Step 3: Python
Basic Road To Data Science
• Statistics
• Machine Learning
• Deep Learning
• Programming Language (Python / R)
Data Science
Smartest way to be a Data Scientist / Analyst
Step 1: Statistics
• Core Statistics
• Statistical Machine Learning
• Probabilistic Modeling
Step 2: Computing
• Database
• Data Mining
• Data Design
Step 3: ML
• Deep Learning
• NLP
• Data Analysis
3 steps to learning the statistics and probability required for data science:
• Core Statistics Concepts: descriptive statistics, distributions, hypothesis testing, and regression.
• Bayesian Thinking: conditional probability, priors, posteriors, and maximum likelihood.
• Intro to Statistical Machine Learning: learn basic machine learning concepts and how statistics fits in.
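The first two steps can be made concrete with a few lines of NumPy. This is only a minimal sketch: the sample values, the test rates, and the helper name `bayes_posterior` are made up for illustration and do not come from the slides.

```python
import numpy as np

# Hypothetical sample, invented only to illustrate descriptive statistics.
sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
sample_mean = sample.mean()
sample_std = sample.std()  # population standard deviation (ddof=0)


def bayes_posterior(prior, true_positive_rate, false_positive_rate):
    """Return P(condition | positive test) using Bayes' rule."""
    # Total probability of seeing a positive test at all.
    evidence = true_positive_rate * prior + false_positive_rate * (1.0 - prior)
    return true_positive_rate * prior / evidence


# Hypothetical numbers: 1% prior, 90% true positive rate, 5% false positive rate.
posterior = bayes_posterior(prior=0.01, true_positive_rate=0.90, false_positive_rate=0.05)

print(f"mean={sample_mean:.2f}, std={sample_std:.2f}, posterior={posterior:.3f}")
```

With these assumed rates the posterior is only about 15%, even though the test looks accurate. That gap between prior and posterior is the core idea behind Bayesian thinking.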
Verified courses include STATISTICS
Most Important Topics in Statistics
• Part 1 - Simple Linear Regression
• Part 2 - Multivariate Linear Regression
• Part 3 - Logistic Regression
• Part 4 - Multivariate Logistic Regression
• Part 5 - Neural Networks
• Part 6 - Support Vector Machines
• Part 7 - K-Means Clustering & PCA
• Part 8 - Anomaly Detection & Recommendation
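import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Jupyter/IPython magic: render plots inline (remove when running as a plain script)
%matplotlib inline
# build the path portably; assumes ex1data1.txt sits in a data/ folder under the working directory
path = os.path.join(os.getcwd(), 'data', 'ex1data1.txt')
# the file has no header row, so name the two columns explicitly
data = pd.read_csv(path, header=None, names=['Population', 'Profit'])
data.head()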
Data Set:
data.plot(kind='scatter', x='Population', y='Profit', figsize=(12,8))
Implementing Simple Linear Regression
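def computeCost(X, y, theta):
    # squared error of each prediction X * theta.T against the target y
    inner = np.power(((X * theta.T) - y), 2)
    # sum of squared errors divided by 2m, where m is the number of training rows
    return np.sum(inner) / (2 * len(X))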
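The cost being computed is J(theta) = (1 / 2m) * sum((X theta - y)^2): half the mean squared error over the m training rows. As a quick sanity check, here is a small sketch on a hand-made data set. It uses plain NumPy arrays instead of `np.matrix`, and the name `compute_cost_array` and the toy values are assumptions for illustration only.

```python
import numpy as np


def compute_cost_array(X, y, theta):
    """Half the mean squared error of the linear model X @ theta."""
    residuals = X @ theta - y
    return np.sum(residuals ** 2) / (2 * len(X))


# Toy data (hypothetical): first column is the 'Ones' intercept column.
X_toy = np.array([[1.0, 1.0],
                  [1.0, 2.0],
                  [1.0, 3.0]])
y_toy = np.array([1.0, 2.0, 3.0])

cost_zero = compute_cost_array(X_toy, y_toy, np.array([0.0, 0.0]))     # theta = 0 predicts 0 everywhere
cost_perfect = compute_cost_array(X_toy, y_toy, np.array([0.0, 1.0]))  # y = x fits the toy data exactly

print(cost_zero, cost_perfect)
```

With all-zero parameters the cost is (1 + 4 + 9) / 6, and it drops to zero once theta describes the line y = x exactly.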
# append a ones column to the front of the data set
data.insert(0, 'Ones', 1)
# set X (training data) and y (target variable)
cols = data.shape[1]
X = data.iloc[:,0:cols-1]
y = data.iloc[:,cols-1:cols]
# convert from data frames to numpy matrices
X = np.matrix(X.values)
y = np.matrix(y.values)
theta = np.matrix(np.array([0,0]))
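The slides do not show how the fitted parameters are obtained from the initial theta. One common choice is batch gradient descent, sketched below under stated assumptions: the function name `gradient_descent`, the learning rate `alpha`, the iteration count `iters`, and the synthetic data are all hypothetical, and plain NumPy arrays are used instead of `np.matrix`.

```python
import numpy as np


def gradient_descent(X, y, theta, alpha, iters):
    """Batch gradient descent for linear regression (illustrative sketch)."""
    theta = theta.astype(float).copy()
    m = len(X)
    cost_history = np.zeros(iters)
    for i in range(iters):
        # Gradient of the cost (1 / 2m) * sum((X @ theta - y)^2) with respect to theta.
        gradient = X.T @ (X @ theta - y) / m
        theta -= alpha * gradient
        # Record the cost after this update to monitor convergence.
        cost_history[i] = np.sum((X @ theta - y) ** 2) / (2 * m)
    return theta, cost_history


# Synthetic, noise-free data on the line y = 1 + 2x (made up for this demo).
x_demo = np.linspace(0.0, 1.0, 20)
X_demo = np.column_stack([np.ones_like(x_demo), x_demo])  # prepend the intercept column
y_demo = 1.0 + 2.0 * x_demo

g_demo, demo_costs = gradient_descent(X_demo, y_demo, np.zeros(2), alpha=0.5, iters=2000)
print(g_demo)  # close to [1, 2]: intercept and slope of the generating line
```

Here `g_demo[0]` and `g_demo[1]` play the role of the intercept and slope. With the `np.matrix` layout used in the slides, the same two values would be read as `g[0, 0]` and `g[0, 1]`.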
# g is the 1x2 parameter matrix [intercept, slope] fitted by gradient descent
x = np.linspace(data.Population.min(), data.Population.max(), 100)
f = g[0, 0] + (g[0, 1] * x)
fig, ax = plt.subplots(figsize=(12,8))
ax.plot(x, f, 'r', label='Prediction')
ax.scatter(data.Population, data.Profit, label='Training Data')
ax.legend(loc=2)
ax.set_xlabel('Population')
ax.set_ylabel('Profit')
ax.set_title('Predicted Profit vs. Population Size')
Prediction ;) :D :p <3
Resources:
• https://elitedatascience.com/learn-statistics-for-data-science
• https://github.com/datasciencemasters/go
• An Introduction to Statistical Learning with Applications in R, by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani
• http://www.johnwittenauer.net/machine-learning-exercises-in-python-part-1/
• Think Stats
