What Is Data Science? Data Science Course - Data Science Tutorial For Beginners | Edureka
www.edureka.co/data-science
Edureka’s Data Science Certification Training
What is Data Science
What Will You Learn Today?
1. What is Data Science
2. Need Of Data Science
3. Use case of Data Science
4. Business Intelligence vs. Data Science
5. Tools used in Data Science
6. Lifecycle of Data Science
Need Of Data Science
[Diagram: Revolution of Technology. Labels: Unstructured Data, Data Storage, Data Flow, Lack of scientific insights, Lack of predictive analytics. Data Science: Prediction, Decision making, Pattern discovery.]
Need Of Data Science
You can use Data Science to:
➢ Recommend the right product to the right customer to enhance business.
➢ Predict the characteristics of high-LTV (lifetime value) customers and help with customer segmentation.
➢ Build intelligence and ability into machines.
➢ Predict fraudulent transactions before they happen.
➢ Perform sentiment analysis to predict the outcome of elections.
What Is Data Science
What Is Data Science?
➢ Data Science is a blend of various tools, algorithms, and machine learning principles with the goal of discovering hidden patterns in raw data.
➢ Data Science is primarily used to make decisions and predictions.
Now, let's understand Data Science with the help of some use cases.
What Is Data Science?
➢ Basketball teams use data to track team strategies and the outcomes of matches.
➢ The following parameters are used for model building:
• Average pass time of the ball.
• Number of successful passes.
• Speed and accuracy of successful baskets.
• Area of the court the player shadows on average.
➢ Models built with data science algorithms help discover patterns in how players play.
What Is Data Science?
➢ Amazon has a huge amount of consumer purchasing data.
➢ The data consists of consumer demographics (age, sex, location), purchasing history, and past browsing history.
➢ Based on this data, Amazon segments its customers, identifies patterns, and recommends the right product to the right customer at the right time.
What Is Data Science?
Google's self-driving car is a smart, driverless car.
➢ It collects data from its environment through sensors.
➢ It makes decisions such as when to speed up, when to slow down, when to overtake, and when to turn.
Use Cases Of Data Science
Skills Of Data Scientist
Role Of A Data Scientist
The Data Scientist will be responsible for designing and creating processes and layouts for complex, large-scale data sets used for modeling, data mining, and research purposes.
Responsibilities
➢ Selecting features and building and optimizing classifiers using machine learning techniques.
➢ Data mining using state-of-the-art methods.
➢ Extending the company’s data with third-party sources of information when needed.
➢ Processing, cleansing, and verifying the integrity of data for analysis.
➢ Building predictive models using machine learning algorithms.
BI Vs. Data Science
Characteristics | Business Intelligence | Data Science
Perspective | Looking backward | Looking forward
Data Sources | Structured (usually SQL, often a data warehouse) | Both structured and unstructured (logs, cloud data, SQL, NoSQL, text)
Approach | Statistics and visualization | Statistics, machine learning, graph analysis, natural language processing (NLP)
Focus | Past and present | Present and future
Tools | Pentaho, Microsoft BI, QlikView, R | RapidMiner, BigML, Weka, R
Tools Used In Data Science
Commonly used tools by Data Scientists
Data analysis: R, Spark, Python, SAS
Data warehousing: Hadoop, SQL, Hive
Data visualization: R, Tableau, Raw
Machine learning: Spark, Mahout, Azure ML Studio
Lifecycle Of Data Science
What if we could predict the occurrence of diabetes and take appropriate measures beforehand to prevent it?
Definitely! Let me take you through the steps to predict the vulnerable patients.
Lifecycle Of Data Science
Discovery
Data Preparation
Model Planning
Model Building
Operationalize
Communicate Results
➢ Discovery involves acquiring data from all the identified internal and external sources that can help answer the business question.
➢ This data could be:
• logs from web servers
• social media data
• census datasets
• data streamed from online sources via APIs
The doctor gets this data from the patient’s medical history.
Attributes:
npreg – Number of times pregnant
glucose – Plasma glucose concentration
bp – Blood pressure
skin – Triceps skinfold thickness
bmi – Body mass index
ped – Diabetes pedigree function
age – Age
income – Income
Income is an irrelevant attribute for predicting diabetes.
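Dropping an irrelevant attribute such as income can be sketched as below. This is only an illustration under assumptions: the record values are invented for the example, and `drop_attributes` is a hypothetical helper, not something from the course material.

```python
# Hypothetical patient record using the attribute names listed on the slide.
# All values are made up purely for illustration.
record = {
    "npreg": 2, "glucose": 130, "bp": 70, "skin": 25,
    "bmi": 31.2, "ped": 0.45, "age": 34, "income": 52000,
}


def drop_attributes(row, irrelevant=("income",)):
    """Return a copy of `row` without the attributes judged irrelevant."""
    return {name: value for name, value in row.items() if name not in irrelevant}


features = drop_attributes(record)
print(sorted(features))
```

The helper returns a new dictionary, so the original record stays available if the attribute turns out to matter later.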
Lifecycle Of Data Science: Data Preparation
➢ The data can have many inconsistencies, such as missing values, blank columns, abrupt values, and incorrect data formats, which need to be cleaned.
➢ You need to explore, preprocess, and condition the data prior to modeling.
➢ This will help you spot outliers and establish relationships between the variables.
This data has a lot of anomalies and needs cleansing before further analysis can be done.
We clean and preprocess this data by removing the outliers, filling in the null values, and normalizing the data types.
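One possible way to carry out these cleaning steps on a single column is sketched below. It rests on several assumptions that are not in the slides: the valid range (30 to 200) and the sample blood-pressure values are invented, and instead of deleting rows with outliers the sketch marks them as missing and fills every gap with the median.

```python
from statistics import median


def clean_column(values, low, high):
    """Clean one numeric column.

    - normalizes the data type: numeric strings such as "80" become floats
    - treats None, "" and unparseable entries as missing
    - treats values outside [low, high] as outliers and marks them missing
    - fills every missing entry with the median of the remaining values
    """
    parsed = []
    for v in values:
        try:
            x = float(v)
        except (TypeError, ValueError):
            x = None
        if x is not None and not (low <= x <= high):
            x = None
        parsed.append(x)

    valid = [x for x in parsed if x is not None]
    if not valid:
        raise ValueError("no valid values left to compute a fill value")
    fill = median(valid)
    return [fill if x is None else x for x in parsed]


# Assumed raw readings with mixed types, gaps and implausible values.
bp_raw = [72, "80", None, 0, 66, "", 500, "n/a", 74]
bp_clean = clean_column(bp_raw, low=30, high=200)
print(bp_clean)
```

Median imputation is just one choice; dropping the affected rows or using the mean would fit the same outline.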
Lifecycle Of Data Science: Model Planning
➢ Here, we determine the methods and techniques for drawing out the relationships between variables.
➢ Apply Exploratory Data Analysis (EDA) using various statistical formulas and visualization tools.
Use visualization techniques such as histograms, line graphs, and box plots to get a fair idea of the distribution of the data.
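A rough text-only version of two of these views is sketched below: the five-number summary that a box plot draws, and bin counts rendered as a simple histogram. The glucose readings and the bin width are assumptions chosen for the example, and both function names are hypothetical.

```python
from statistics import quantiles


def five_number_summary(values):
    """Min, quartiles and max: the numbers a box plot is drawn from."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": q2, "q3": q3, "max": max(values)}


def text_histogram(values, bin_width):
    """Count integer values per bin and render one row of '#' marks per bin."""
    counts = {}
    for v in values:
        start = (v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return [
        f"{start}-{start + bin_width - 1}: {'#' * counts[start]}"
        for start in sorted(counts)
    ]


# Assumed sample of plasma glucose readings, for illustration only.
glucose = [85, 89, 103, 110, 115, 116, 126, 137, 148, 183, 197]

print(five_number_summary(glucose))
for row in text_histogram(glucose, bin_width=20):
    print(row)
```

In practice a plotting library would draw these charts; the sketch only shows which numbers sit behind them.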
Lifecycle Of Data Science: Model Building
➢ Develop datasets for training and testing purposes.
➢ Consider whether existing tools will suffice for running the models.
➢ Analyze various learning techniques, such as classification, association, and clustering, to build the model.
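Developing training and testing datasets usually starts with a random split. The sketch below shows one simple way to do it; the 25% test fraction, the fixed seed, and the generated patient rows are all assumptions made for the example, and `train_test_split` here is a hand-written helper rather than a library function.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of `rows` and split it into (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Assumed toy dataset: twelve made-up patient rows.
patients = [{"id": i, "glucose": 90 + 5 * i} for i in range(12)]
train, test = train_test_split(patients)
print(len(train), len(test))
```

The model is then fitted on `train` only, and `test` is held back to check how well it generalizes.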
This is a decision tree based on different attributes.
Lifecycle Of Data Science: Operationalize
Deliver final reports, briefings, code, and technical documents.
Implement a pilot project in a real-time production environment.
Look for performance constraints, if any.
Lifecycle Of Data Science: Communicate Results
➢ Identify all the key findings and communicate them to the stakeholders.
➢ Explain the model and results to the medical authorities.
➢ Determine whether the results of the project are a success or a failure based on the criteria developed.
➢ Diabetes Positive set:
• glucose > 154
• glucose > 127 & <= 154 + bmi > 30.9
• glucose <= 127 + pregnant > 5
• glucose <= 127 + pregnant <= 5 + age > 28
• glucose <= 127 + pregnant <= 5 + age <= 28 + bmi > 30.9
➢ Diabetes Negative set:
• glucose > 127 & <= 154 + bmi <= 30.9
• glucose <= 127 + pregnant <= 5 + age <= 28 + bmi <= 30.9
➢ We can use this decision tree result to determine whether a patient is vulnerable to diabetes.
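The rule sets above can be written as a small function, sketched here under assumptions: `predict_diabetes` is a hypothetical name, the thresholds are copied from the slide, `glucose > 154` is treated as a positive outcome, and a negative result is returned only for the two remaining negative rules.

```python
def predict_diabetes(glucose, bmi, pregnant, age):
    """Apply the slide's decision-tree rules; True means 'diabetes positive'."""
    if glucose > 154:
        return True
    if glucose > 127:
        # 127 < glucose <= 154: the outcome depends on bmi alone.
        return bmi > 30.9
    # From here on glucose <= 127.
    if pregnant > 5:
        return True
    if age > 28:
        return True
    # glucose <= 127, pregnant <= 5, age <= 28: bmi decides.
    return bmi > 30.9


print(predict_diabetes(glucose=140, bmi=33.0, pregnant=1, age=25))
print(predict_diabetes(glucose=120, bmi=25.0, pregnant=2, age=25))
```

Each `if` corresponds to one branch of the tree, so the function returns as soon as a rule matches.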
Course Details
Go to www.edureka.co/data-science
Get Edureka Certified in Data Science Today!
What our learners have to say about us!
Shravan Reddy says- “I would like to recommend any one who
wants to be a Data Scientist just one place: Edureka. Explanations
are clean, clear, easy to understand. Their support team works
very well.. I took the Data Science course and I'm going to take
Machine Learning with Mahout and then Big Data and Hadoop”.
Gnana Sekhar says - “Edureka Data science course provided me a very
good mixture of theoretical and practical training. LMS pre recorded
sessions and assignments were very good as there is a lot of
information in them that will help me in my job. Edureka is my
teaching GURU now...Thanks EDUREKA.”
Balu Samaga says - “It was a great experience to undergo and get
certified in the Data Science course from Edureka. Quality of the
training materials, assignments, project, support and other
infrastructures are a top notch.”