Class X Data Science

Data science is an interdisciplinary field that utilizes scientific methods to extract insights from structured and unstructured data, combining expertise in programming, mathematics, and statistics. It is crucial for organizations to develop data science capabilities to remain competitive in the era of big data, with applications ranging from fraud detection to recommendation systems. The data science lifecycle includes stages such as data capture, processing, analysis, and communication to inform decision-making.

Uploaded by

mittalkrish51
DATA SCIENCE

CLASS X
Let's watch a short video about
Data Science

https://youtu.be/X3paOmcrTjQ
What is Data Science?
Data Science, also known as data-driven science,
uses scientific methods, processes and systems to
extract knowledge or insights from data in various
forms, i.e. either structured or unstructured.
Data science is an interdisciplinary field focused
on extracting knowledge from data sets, which
are typically large (see big data).

The field encompasses analysis, preparing data
for analysis, and presenting findings to inform
high-level decisions in an organization.
Data science is the field of study that combines domain
expertise, programming skills, and knowledge of
Mathematics and statistics to extract meaningful insights
from data.

Data science practitioners apply machine learning
algorithms to numbers, text, images, video, audio, and
more to produce artificial intelligence (AI) systems to
perform tasks that ordinarily require human intelligence.

In turn, these systems generate insights which analysts
and business users can translate into tangible business
value.
Why Is Data Science Important?
• More and more companies are coming to
realize the importance of data science, AI, and
machine learning.
• Regardless of industry or size, organizations
that wish to remain competitive in the age of
big data need to efficiently develop and
implement data science capabilities or risk
being left behind.
A Brief History of Data Science
• The term data science dates back to the
1960s, when it was originally used as a
substitute for "computer science".
• Approximately 15 years later, the term was
used to define the survey of data processing
methods used in different applications. In
2001, data science was introduced as an
independent discipline.
How Data Science Is Applied
• Data science incorporates tools from multiple
disciplines to gather a data set, process it,
extract meaningful information from it, and
interpret that information for decision-making
purposes.
• The disciplinary areas that make up the data
science field include data mining, statistics,
machine learning, analytics, and
programming.
Data science generally has a five-stage
lifecycle that consists of:
• Capture: Data acquisition, data entry, signal
reception, data extraction
• Maintain: Data warehousing, data cleansing, data
staging, data processing, data architecture
• Process: Data mining, clustering/classification,
data modeling, data summarization
• Analyze: Exploratory/confirmatory, predictive
analysis, regression, text mining, qualitative
analysis
• Communicate: Data reporting, data visualization,
business intelligence, decision making
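The five stages can be pictured as a small pipeline in which each stage hands its result to the next, with communication as the final step. The sketch below is only an illustration: the function names and the raw values are made up for this example, and real projects use far richer tools at every stage.

```python
import statistics

def capture():
    # Capture: acquire raw records (hard-coded, made-up values here).
    return ["12", "15", "", "15", "20", "n/a", "18"]

def maintain(raw):
    # Maintain: cleanse the data by dropping entries that are not whole numbers.
    return [int(value) for value in raw if value.isdigit()]

def process(clean):
    # Process: summarise the data by counting how often each value occurs.
    counts = {}
    for value in clean:
        counts[value] = counts.get(value, 0) + 1
    return counts

def analyze(clean):
    # Analyze: compute simple descriptive statistics.
    return {"mean": statistics.mean(clean), "median": statistics.median(clean)}

def communicate(counts, stats):
    # Communicate: turn the results into a short report for a reader.
    return (f"{len(counts)} distinct values; "
            f"mean {stats['mean']}, median {stats['median']}")

raw = capture()
clean = maintain(raw)
counts = process(clean)
stats = analyze(clean)
report = communicate(counts, stats)
print(report)
```

Keeping each stage in its own function makes it easy to see which step a problem comes from, for example a cleaning rule in `maintain` that throws away too much data.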
What Can Data Science be Used for?
• Anomaly detection (fraud, disease, crime, etc.)
• Automation and decision-making (background checks,
credit worthiness, etc.)
• Classifications (in an email server, this could mean
classifying emails as “important” or “junk”)
• Forecasting (sales, revenue and customer retention)
• Pattern detection (weather patterns, financial market
patterns, etc.)
• Recognition (facial, voice, text, etc.)
• Recommendations (based on learned preferences,
recommendation engines can refer you to movies,
restaurants and books you may like)
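As one concrete case from the list above, anomaly detection can be sketched with a very simple rule: flag any value that lies unusually far from the average. This is a minimal illustration and not how production fraud systems work; the function name `find_anomalies`, the threshold of 2 standard deviations, and the spending figures are all assumptions made for this example.

```python
import statistics

def find_anomalies(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)  # population standard deviation
    if spread == 0:
        # All values are identical, so nothing stands out.
        return []
    return [v for v in values if abs(v - mean) / spread > threshold]

# Made-up daily card spending; one day is far larger than the rest.
daily_spend = [42, 38, 45, 40, 41, 39, 44, 400, 43, 37]
suspicious = find_anomalies(daily_spend)
print(suspicious)
```

A rule like this is easy to explain, but a single extreme value also pulls the mean and the standard deviation towards itself, which is one reason real systems rely on more robust methods.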
Applications of Data Science
• Fraud and Risk Detection
• Genetics & Genomics
• Internet Search
• Targeted Advertising
• Website Recommendations
• Airline Route Planning
Data Scientist Job Roles
Some of the prominent Data Scientist job
titles are:
• Data Scientist
• Data Engineer
• Data Architect
• Data Administrator
• Data Analyst
• Business Analyst
• Data/Analytics Manager
• Business Intelligence Manager
Revisiting AI Life Cycle
1. Problem Scoping
2. Data Acquisition
3. Data Exploration
4. Modelling
5. Evaluation
Data Collection
• Data collection is the procedure of gathering,
measuring and analysing accurate information
for research using standard, validated
techniques.
• Basic data collection is an exercise which
requires little or no technological
knowledge.
The data collection methods:
• Interviews.
• Questionnaires and surveys.
• Observations.
• Documents and records.
• Focus groups.
• Oral histories.
Data Collection
• Collecting data by itself requires little or no
technological knowledge.
• But analysing the data by hand is a tedious
process for humans, as it consists largely of
numbers and alphanumeric values.
• That is where Data Science comes into the
picture.
• It not only gives us a clearer idea of the
dataset, but also adds value to it by providing
deeper and clearer analyses of it.
• And as AI gets incorporated in the process,
the machine can also make predictions and
suggestions based on the same data.
• In data domain-based projects, the data used
is mostly in numerical or alphanumeric
format, and such datasets are curated in the
form of tables.
• Such databases are very commonly found in
any institution for record maintenance and
other purposes.
Some examples of datasets which you
must already be aware of are:
Banks
• Databases of loans issued, account holders,
locker owners, employee registrations, bank
visitors, etc.
ATMs
• Usage details per day, cash denominations,
transaction details, visitor details, etc.
Movie Theatres
• Movie details, tickets sold offline, tickets sold
online, refreshment purchases, etc.
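To make the idea of a tabular dataset concrete, here is a small sketch of a movie theatre table held in plain Python. Each dictionary is one row and each key is one column. The movie names and ticket counts are invented purely for illustration.

```python
# Each dictionary is one row of the table; each key is a column name.
# All values below are made up for this example.
ticket_sales = [
    {"movie": "Movie A", "sold_offline": 120, "sold_online": 340},
    {"movie": "Movie B", "sold_offline": 80, "sold_online": 150},
    {"movie": "Movie C", "sold_offline": 200, "sold_online": 90},
]

# Add a new column computed from two existing columns.
for row in ticket_sales:
    row["total_sold"] = row["sold_offline"] + row["sold_online"]

# Find the row with the highest total.
best = max(ticket_sales, key=lambda row: row["total_sold"])
print(best["movie"], best["total_sold"])
```

Even this tiny table supports a typical analysis step: deriving a new column and then asking a question of it.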
Sources of Data
• There are various sources from which we can
collect the data we require. The data
collection process can be categorised in two
ways: Offline and Online.
Offline Data Collection
• Sensors
• Surveys
• Interviews
• Observations
Online Data Collection
• Open government data portals
• Reliable websites (e.g. Kaggle)
• World organisations' open statistical
websites
While accessing data from any of the data sources, the
following points should be kept in mind:

1. Only data that is available for public use should be
taken up.
2. Personal datasets should only be used with the consent
of the owner.
3. One should never breach someone's privacy to collect
data.
4. Data should only be taken from reliable sources, as
data collected from random sources can be wrong or
unusable.
5. Reliable sources of data ensure the authenticity of data,
which helps in proper training of the AI model.
Types of Data
• For Data Science, usually the data is collected
in the form of tables. These tabular datasets
can be stored in different formats. Some of
the commonly used formats are:
• CSV
• Spreadsheet
• SQL
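The sketch below shows the same small table in two of these formats: first as CSV text, then as a table in an SQL database. It uses only Python's built-in `csv` and `sqlite3` modules; the table name `ticket_sales`, the column names and the values are assumptions made up for this example, and the database lives in memory so no file is created.

```python
import csv
import io
import sqlite3

# A tiny CSV document: the first line holds the column names.
csv_text = """movie,sold_offline,sold_online
Movie A,120,340
Movie B,80,150
"""

# Read the CSV text into a list of dictionaries, one per row.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# CSV stores everything as text, so convert the counts to integers.
records = [
    (row["movie"], int(row["sold_offline"]), int(row["sold_online"]))
    for row in rows
]

# Store the same rows in an in-memory SQL database and query them.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ticket_sales (movie TEXT, sold_offline INTEGER, sold_online INTEGER)"
)
conn.executemany("INSERT INTO ticket_sales VALUES (?, ?, ?)", records)
totals = conn.execute(
    "SELECT movie, sold_offline + sold_online FROM ticket_sales ORDER BY movie"
).fetchall()
conn.close()
print(totals)
```

The CSV version is easy to open in a spreadsheet, while the SQL version lets the database do the arithmetic and sorting.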
Data Access
• After collecting the data, to be able to use it
for programming purposes, we should know
how to access it in Python code.
• To make our lives easier, there are various
Python packages which help us access
structured data (in tabular form) inside the
code.
Some of the Python tools and topics covered:
• Python lists (built into the language)
• NumPy (Numerical Python)
• Pandas
• Matplotlib (data visualization)
• Basic statistics with Python
(mean, median, mode, standard deviation,
variance, etc.)
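The basic statistics named above can be tried on an ordinary Python list using the built-in `statistics` module, so nothing needs to be installed. This is a minimal sketch: the list of scores is made up for illustration, and the population versions of variance and standard deviation are used here as an assumption about what is wanted.

```python
import statistics

# Made-up quiz scores out of 10, stored in a plain Python list.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    "mean": statistics.mean(scores),           # the average value
    "median": statistics.median(scores),       # the middle value when sorted
    "mode": statistics.mode(scores),           # the most frequent value
    "variance": statistics.pvariance(scores),  # average squared distance from the mean
    "std_dev": statistics.pstdev(scores),      # square root of the variance
}

for name, value in summary.items():
    print(name, value)
```

`pvariance` and `pstdev` treat the list as the whole population. When the list is only a sample of a larger group, `statistics.variance` and `statistics.stdev` are the matching sample versions.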