Data science:
Data science is the domain of study that deals with vast volumes of data using
modern tools and
techniques to find unseen patterns, derive meaningful information, and make business
decisions.
Data science uses complex machine learning algorithms to build predictive models.
The data used for analysis can come from many different sources and be presented in
various formats.
Data science is about the extraction, preparation, analysis, visualization, and
maintenance of information. It is a cross-disciplinary field that uses scientific
methods and processes to draw insights from data.
The Data Science Lifecycle
Data science’s lifecycle consists of five distinct stages, each with its own tasks:
Capture: Data Acquisition, Data Entry, Signal Reception, Data Extraction. This stage
involves
gathering raw structured and unstructured data.
Maintain: Data Warehousing, Data Cleansing, Data Staging, Data Processing, Data
Architecture.
This stage covers taking the raw data and putting it in a form that can be used.
Process: Data Mining, Clustering/Classification, Data Modeling, Data Summarization.
Data
scientists take the prepared data and examine its patterns, ranges, and biases to
determine how
useful it will be in predictive analysis.
Analyze: Exploratory/Confirmatory, Predictive Analysis, Regression, Text Mining,
Qualitative
Analysis. Here is the real meat of the lifecycle. This stage involves performing the
various
analyses on the data.
INTRODUCTION TO DATA SCIENCE
2
CSE NRCM P.LAKSHMI PRASANNA(ASST.PROFESSOR)
Communicate: Data Reporting, Data Visualization, Business Intelligence, Decision
Making. In
this final step, analysts prepare the analyses in easily readable forms such as
charts, graphs, and
reports.
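The five stages above can be sketched as a small pipeline. This is only an illustrative sketch, not a prescribed implementation: the records, the function names, and the choice of a least-squares line for the Analyze stage are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical raw records: (advertising spend, sales) as text; None marks a missing value.
RAW_RECORDS = [("10", "25"), ("20", "45"), ("30", None), ("40", "85"), ("50", "105")]


def capture():
    """Capture: gather raw data (a hard-coded stand-in for a real source)."""
    return list(RAW_RECORDS)


def maintain(records):
    """Maintain: drop incomplete rows and convert text fields to numbers."""
    return [(float(x), float(y)) for x, y in records if x is not None and y is not None]


def process(rows):
    """Process: summarize the prepared data (row count and value ranges)."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    return {"n": len(rows), "x_range": (min(xs), max(xs)), "y_range": (min(ys), max(ys))}


def analyze(rows):
    """Analyze: fit y = slope * x + intercept by ordinary least squares."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in rows) / sum((x - x_bar) ** 2 for x in xs)
    return slope, y_bar - slope * x_bar


def communicate(summary, model):
    """Communicate: render the findings as a short plain-text report."""
    slope, intercept = model
    return (f"rows used: {summary['n']}\n"
            f"fitted line: y = {slope:.2f} * x + {intercept:.2f}")


rows = maintain(capture())
report = communicate(process(rows), analyze(rows))
print(report)
```

In practice each stage would be far larger (a data warehouse instead of a list, charts instead of a text report), but the order in which data flows from one stage to the next is the same.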
Evolution of Data Science: Growth & Innovation
Data science was born from the idea of merging applied statistics with computer
science. The
resulting field of study would use the extraordinary power of modern computing.
Scientists
realized they could not only collect data and solve statistical problems but also
use that data to
solve real-world problems and make reliable fact-driven predictions.
1962: American mathematician John W. Tukey first articulated the data science dream.
In his
now-famous article “The Future of Data Analysis,” he foresaw the inevitable
emergence of a new
field nearly two decades before the first personal computers. While Tukey was ahead
of his time,
he was not alone in his early appreciation of what would come to be known as “data
science.”
1977: The theories and predictions of “pre” data scientists like Tukey and Peter Naur
became more
concrete with the establishment of The International Association for Statistical
Computing
(IASC), whose mission was “to link traditional statistical methodology, modern
computer
technology, and the knowledge of domain experts in order to convert data into
information and
knowledge.”
1980s and 1990s: Data science began taking more significant strides with the
emergence of the
first Knowledge Discovery in Databases (KDD) workshop and the founding of the
International
Federation of Classification Societies (IFCS).
1994: Business Week published a story on the new phenomenon of “Database Marketing.”
It
described the process by which businesses were collecting and leveraging enormous
amounts of
data to learn more about their customers, competition, or advertising techniques.
1990s and early 2000s: Data science emerged as a recognized and specialized field.
Several data science academic journals began to circulate, and proponents such as
Jeff Wu and William S. Cleveland continued to develop and expound upon the necessity
and potential of data science.
2000s: Technology made enormous leaps by providing nearly universal access to
internet
connectivity, communication, and (of course) data collection.
2005: Big data enters the scene. With tech giants such as Google and Facebook
accumulating large amounts of data, new technologies capable of processing it
became necessary.
Hadoop rose to
the challenge, and later on Spark and Cassandra made their debuts.
2014: Due to the increasing importance of data, and organizations’ interest in
finding patterns and
making better business decisions, demand for data scientists began to see dramatic
growth in
different parts of the world.
2015: Machine learning, deep learning, and Artificial Intelligence (AI) officially
enter the realm
of data science.
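2018: New data protection regulations, such as the European Union's General Data
Protection Regulation (GDPR), which took effect that year, are perhaps one of the
most significant developments in the evolution of data science.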
2020s: We are seeing additional breakthroughs in AI and machine learning, and an
ever-increasing demand for qualified professionals in big data.
Roles in Data Science
Data Analyst
Data Engineer
Database Administrator
Machine Learning Engineer
Data Scientist
Data Architect
Statistician
Business Analyst
Data and Analytics Manager
1. Data Analyst
Data analysts are responsible for a variety of tasks, including visualisation,
munging, and processing of massive amounts of data. They also query databases from
time to time. One of the most important skills of a data analyst is optimization.
A few important roles and responsibilities of a data analyst include:
Extracting data from primary and secondary sources using automated tools
Developing and maintaining databases
Performing data analysis and making reports with recommendations
To become a data analyst, learn the sought-after technologies for data analysis,
such as SQL, R, SAS, and Python.
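As a rough illustration of the analyst tasks listed above (querying a database, analysing the result, and reporting with a recommendation), the following sketch combines SQL and Python using the standard-library sqlite3 module. The sales table, its columns, and the figures in it are hypothetical and exist only for this example.

```python
import sqlite3

# Hypothetical sales table, built in memory so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 60.0), ("south", 40.0), ("east", 200.0)],
)

# Query the database: total and average sale per region, largest total first.
rows = conn.execute(
    "SELECT region, SUM(amount), AVG(amount) FROM sales "
    "GROUP BY region ORDER BY SUM(amount) DESC, region"
).fetchall()
conn.close()

# Turn the query result into a short report that ends with a recommendation.
lines = [f"{region}: total={total:.1f}, average={avg:.1f}" for region, total, avg in rows]
top_region = rows[0][0]
lines.append(f"Recommendation: review what drives sales in '{top_region}'.")
report = "\n".join(lines)
print(report)
```

A real analysis would read from an existing database rather than create one, but the pattern of aggregating in SQL and summarising in Python carries over.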