Lecture # 3:
Advanced Programming
Introduction to Data Science
M Hassan Zaib Computer Science Department FUUAST 1
Oh Really! Is this a Data Scientist?
What is Data Science
Data Science is the area of study which involves
extracting insights from vast amounts of data by the use
of various scientific methods, algorithms, and processes.
It helps you to discover hidden patterns from the raw
data.
Data Science is an interdisciplinary field that allows you
to extract knowledge from structured or unstructured
data.
Why Data Science?
Data is the oil of today's world. With the right tools,
technologies, and algorithms, we can use data and convert
it into a distinctive business advantage.
Data Science can help you to detect fraud using
advanced machine learning algorithms
It helps you to prevent any significant monetary losses
It allows you to build intelligent capabilities into machines
You can perform sentiment analysis to gauge customer
brand loyalty
It enables you to make better and faster decisions
Helps you to recommend the right product to the right
customer to enhance your business
Statistics:
Statistics is the most critical component of Data Science. It is the method or science of
collecting and analyzing numerical data in large quantities to get useful insights.
Visualization:
Visualization techniques help you present huge amounts of data as
easy-to-understand, digestible visuals.
Machine Learning:
Machine Learning explores the building and study of algorithms which learn to
make predictions about unforeseen/future data.
Deep Learning:
Deep Learning is a subfield of machine learning in which multi-layer neural
networks learn useful representations directly from the data.
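As a small illustration of the Statistics component above, the following sketch summarizes a sample with Python's standard `statistics` module. The order counts are made-up values chosen for illustration; they do not come from the lecture.

```python
import statistics

# Hypothetical daily order counts (illustrative values only)
orders = [12, 15, 11, 18, 15, 22, 15]

summary = {
    "mean": statistics.mean(orders),
    "median": statistics.median(orders),
    "mode": statistics.mode(orders),
    "stdev": statistics.stdev(orders),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Even a summary this small already hints at the shape of the data: here the median and mode coincide, while the mean sits slightly higher because of the one large value.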
What is included in Data Science
Basic statistical and mathematical foundations for data
science
Data acquisition and cleaning.
Exploratory data analysis and visualization
Feature engineering
Model creation and validation
Data Science
Process
Discovery
The discovery step involves acquiring data from all the
identified internal and external sources that help you
answer the business question.
The data can be:
Logs from webservers
Data gathered from social media
Census datasets
Data streamed from online sources using APIs
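For the first source type listed above, web server logs, a possible starting point is to turn raw log lines into structured records. This is only a sketch: the sample lines are invented, and the pattern assumes the Common Log Format, which a real server may not use.

```python
import re
from collections import Counter

# Made-up lines in Common Log Format, plus one malformed line (illustrative only)
raw_logs = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '203.0.113.9 - - [10/Oct/2023:13:56:01 +0000] "GET /missing HTTP/1.1" 404 512',
    'not a log line',
]

LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+)'
)

def parse_logs(lines):
    """Return one dict per line that matches the pattern; skip the rest."""
    records = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            record = match.groupdict()
            record["status"] = int(record["status"])
            record["size"] = int(record["size"])
            records.append(record)
    return records

records = parse_logs(raw_logs)
status_counts = Counter(r["status"] for r in records)
print(status_counts)
```

Lines that do not match are skipped silently here; in practice you would probably count or log them, since a high rejection rate is itself a data-quality signal.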
Data Preparation
Data can have many inconsistencies, such as missing values,
blank columns, and incorrect data formats, which need to be
cleaned. You need to process, explore, and condition the
data before modeling. The cleaner your data, the
better your predictions.
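The three problems named above (missing values, blank columns, incorrect formats) can be sketched with pandas as follows. The table, its column names, and the choice to fill gaps with the median or a placeholder are assumptions made for illustration; the right strategy depends on the dataset.

```python
import pandas as pd

# Hypothetical raw data showing the problems named above:
# a missing value, a blank column, and numbers stored as text.
raw = pd.DataFrame({
    "age": ["25", "31", None, "47"],
    "city": ["Karachi", "Lahore", "Karachi", None],
    "notes": [None, None, None, None],   # entirely blank column
})

clean = raw.dropna(axis="columns", how="all")            # drop blank columns
clean = clean.assign(age=pd.to_numeric(clean["age"]))    # text -> numbers
clean = clean.assign(
    age=clean["age"].fillna(clean["age"].median()),      # fill missing age
    city=clean["city"].fillna("Unknown"),                # fill missing city
)
print(clean)
```

Each step returns a new DataFrame rather than modifying `raw`, so the original data stays available for comparison.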
Model Planning
In this stage, you need to determine the methods and
techniques for identifying relationships between the input
variables. Planning for a model is performed by using different
statistical formulas and visualization tools. SQL Analysis
Services, R, and SAS/ACCESS are some of the tools used
for this purpose.
Model Building
In this step, the actual model building process starts.
Here, the data scientist splits the dataset into training
and testing sets. Techniques like association, classification,
and clustering are applied to the training dataset. Once
prepared, the model is tested against the "testing"
dataset.
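The train/test workflow described above can be sketched with scikit-learn. The Iris dataset, the decision tree classifier, and the 75/25 split are choices made here for illustration only; the lecture does not prescribe a dataset or model.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Small built-in dataset, used only as a stand-in for real project data
X, y = load_iris(return_X_y=True)

# Hold out 25% of the rows as the "testing" dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Apply a classification technique to the training data only
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Evaluate on data the model has not seen during training
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Keeping the test rows out of `fit` is the point of the split: accuracy measured on the training data would overstate how well the model handles new data.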
Operations
In this stage, you deliver the final baselined model with
reports, code, and technical documents. The model is
deployed into a real-time production environment after
thorough testing.
Communicate Results
In this stage, the key findings are communicated to all
stakeholders. This helps you decide whether the project
succeeded or failed, based on the output of the
model.
Tools for Data Science
Python
Arc Language
R Language
Java
SAS
MATLAB
Some Others
Application Area of Data Science
Optimize Internet Search
Recommendation System
Image and Speech Recognition
Gaming
Many others
What do Data Scientists do?
National Security
Cyber Security
Business Analytics
Engineering
Healthcare
And more ….
Main Libraries
NumPy
Pandas
SciPy
Visualizations
Seaborn
Matplotlib
Machine and Deep Learning
scikit-learn
TensorFlow
Keras
Any Query?