Data Science Life Cycle
Presented By:
Praanav Bhowmik
Durgesh Gupta
Agenda:
● What is Data Science?
● Qualities of Data Scientist
● Job Role & Skill Sets
● Data Science Life Cycle
○ Business Understanding
○ Data acquisition and understanding
○ Modeling
○ Deployment
○ Customer Acceptance
○ Monitoring & Maintenance
● Contribution of Data Scientist to the Organization.
Introduction
● Data Science combines multiple disciplines, using statistics, data analysis, and
machine learning to analyze data and extract information, knowledge, and
insights from it.
● Data Science is about data gathering, analysis and decision-making.
● Its key activities are finding patterns in data through analysis and visualization,
and making predictions about the future.
● It uses theory and methods to provide concrete, actionable solutions to
complex problems.
Qualities of Data Scientist
● Discover valuable insights from huge amounts of data, which can then be used to
shape company strategies and achieve business objectives.
● Empower management to make smarter decisions.
● Make it easier to achieve business goals.
● Challenge the workforce to embrace data.
● Refine target audiences.
● Identify new revenue opportunities.
● Bring an analytical mind and business acumen.
Job Responsibilities
● Fetch information from various sources and analyze it to get a clear
understanding of how an organization performs.
● Use statistical and analytical methods plus AI tools to automate specific
processes within the organization and develop smart solutions to business
challenges.
● Build predictive models and machine learning algorithms.
● Present information using data visualization tools.
● Propose solutions and strategies to tackle business challenges.
Skill Set Required
● Mathematics: Data scientists use mathematics to process and structure the data
they’re dealing with.
● Probability & Statistics: Statistics allows data scientists to slice and dice through data,
extracting the insights needed to make reasonable conclusions.
● Programming: A data scientist needs to know several programming languages, such as
Python for writing data manipulation, analysis, and visualization scripts, and R, Java,
or C for other specific goals.
● Data Management: Ability to extract data from relational databases, non-relational
databases, and unstructured sources.
● Machine Learning / Deep Learning: Knowledge of ML algorithms for building models.
● Cloud Computing: Ability to use the data and machine learning services and
frameworks offered by cloud service providers.
Data Science Life Cycle
Business Understanding
● The complete cycle revolves around the enterprise goal.
● Identify the key business variables that the analysis needs to predict.
● Define the project goals by asking and refining "sharp" questions that are relevant,
specific, and unambiguous.
● Find the relevant data that helps you answer the questions that define the
objectives of the project.
Data Acquisition and Understanding
● Real-world data sets are often noisy, have missing values, or contain a host of
other discrepancies.
● The aim is to produce a clean, high-quality data set whose relationship to the target
variables is understood.
● Develop a solution architecture for the data pipeline that refreshes and scores the
data regularly.
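As a rough illustration of handling missing values, the sketch below fills gaps in numeric columns with the column median. The function name `clean_rows`, the field names, and the sample records are hypothetical, and median imputation is only one of several possible cleaning strategies.

```python
from statistics import median

def clean_rows(rows, numeric_fields):
    """Return copies of rows with missing numeric values filled by the column median."""
    # Compute each column's median from the values that are present.
    medians = {}
    for field in numeric_fields:
        present = [r[field] for r in rows if r.get(field) is not None]
        medians[field] = median(present) if present else None

    cleaned = []
    for r in rows:
        fixed = dict(r)  # copy, so the raw data stays untouched
        for field in numeric_fields:
            if fixed.get(field) is None:
                fixed[field] = medians[field]
        cleaned.append(fixed)
    return cleaned

# Hypothetical sample records with gaps marked as None.
raw = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 29, "income": None},
    {"age": 41, "income": 58000},
]
cleaned = clean_rows(raw, ["age", "income"])
```

Keeping the raw records unchanged makes it possible to rerun the cleaning step whenever the pipeline refreshes the data.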
Modeling
● Determine the optimal data features for the machine-learning model.
● Create an informative machine-learning model that predicts the target most
accurately.
● The process for model training includes the following steps:
○ Split the input data randomly for modeling into a training data set and a test
data set.
○ Build the models by using the training data set.
○ Evaluate the models on both the training and the test data sets. Use a series
of competing machine-learning algorithms along with their associated tuning
parameters (known as a parameter sweep) that are geared toward
answering the question of interest with the current data.
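The training steps above can be sketched with the standard library alone. This is a minimal illustration, not a recommended modeling stack: the data is synthetic, the model is a toy one-dimensional k-nearest-neighbours classifier, and the helper names (`train_test_split`, `knn_predict`, `accuracy`) are hypothetical.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def knn_predict(train, x, k):
    """Predict a label for x by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > len(nearest) else 0

def accuracy(train, rows, k):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == label for x, label in rows)
    return hits / len(rows)

# Synthetic data (an assumption): label is 1 when x > 0.5, with 10% label noise.
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.random()
    label = int(x > 0.5)
    if rng.random() < 0.1:
        label = 1 - label
    data.append((x, label))

train, test = train_test_split(data)

# Parameter sweep: score every candidate k on both the training and the test set.
results = {k: (accuracy(train, train, k), accuracy(train, test, k)) for k in (1, 3, 5, 9)}
best_k = max(results, key=lambda k: results[k][1])
```

Comparing the two scores per setting shows overfitting: with k = 1 the model memorizes the training set, so its training accuracy says little about test performance. In practice, choosing the setting on a separate validation split or with cross-validation keeps the test score an honest estimate.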
Deployment
● Deploy models with a data pipeline to a production or production-like
environment for final user acceptance.
● After you have a set of models that perform well, you can operationalize them for
other applications to consume. Depending on the business requirements,
predictions are made either in real time or on a batch basis.
● To deploy models, you expose them through an open API.
● The API enables the model to be easily consumed from various applications.
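One way to picture exposing a model through an API is a small HTTP endpoint built on Python's standard `http.server` module. Everything model-related here is an assumption for illustration: `predict` is a stand-in linear score, the feature names `x1` and `x2` are invented, and a production deployment would normally sit behind a proper serving framework.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    """Stand-in model (hypothetical): a fixed linear score with a 0.5 cutoff."""
    score = 0.25 * features["x1"] + 0.5 * features["x2"]
    return {"score": score, "label": int(score >= 0.5)}

def predict_batch(rows):
    """Batch scoring: apply the same model to many records at once."""
    return [predict(row) for row in rows]

def handle_json(body):
    """Turn a JSON request body into a JSON response body."""
    return json.dumps(predict(json.loads(body)))

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Real-time scoring: one request in, one prediction out.
        length = int(self.headers.get("Content-Length", 0))
        payload = handle_json(self.rfile.read(length)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

def serve(port=8000):
    """Start the HTTP endpoint; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

Calling `serve()` starts the real-time endpoint, while `predict_batch` covers the batch case by scoring a list of records in one pass. Both paths share the same `predict` function, so the two modes cannot drift apart.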
Customer Acceptance
● Confirm that the pipeline, the model, and their deployment in a production
environment satisfy the customer's objectives.
● The customer should validate that the system meets their business needs and
that it answers the questions with acceptable accuracy to deploy the system to
production for use by their client's application.
● The project is handed off to the entity responsible for operations.
Monitoring & Maintenance
● The final but continuous phase of ML development is model monitoring and
maintenance.
● Post-deployment, you need to monitor your model to ensure it continues to
perform as expected.
● An ML model requires regular tuning and updating to meet performance
expectations.
● Failing to perform this essential step may result in diminishing model accuracy
over time.
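A simple form of post-deployment monitoring is to track accuracy over the most recent predictions and raise a flag when it falls below an agreed level. The class below is a minimal sketch under that assumption: the name `AccuracyMonitor`, the window size, and the 0.8 threshold are all illustrative choices, and it presumes that true outcomes eventually become available for comparison.

```python
from collections import deque

class AccuracyMonitor:
    """Track accuracy over the most recent predictions and flag drops."""

    def __init__(self, window=100, threshold=0.8):
        self.outcomes = deque(maxlen=window)  # True for each correct prediction
        self.threshold = threshold

    def record(self, predicted, actual):
        """Store whether one prediction matched the observed outcome."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Accuracy over the current window, or None if nothing is recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_attention(self):
        """True once the window is full and accuracy is below the threshold."""
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.accuracy() < self.threshold

monitor = AccuracyMonitor(window=5, threshold=0.8)
for predicted, actual in [(1, 1), (0, 0), (1, 1), (1, 1), (0, 0)]:
    monitor.record(predicted, actual)
healthy = monitor.needs_attention()   # all five recent predictions were correct

for predicted, actual in [(1, 0), (0, 1)]:
    monitor.record(predicted, actual)
degraded = monitor.needs_attention()  # only three of the last five were correct
```

A flag from `needs_attention` would typically trigger the tuning or retraining described above, rather than any automatic change to the model.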
Contribution to the Organization
● Data Science helps businesses monitor, manage, and collect performance
measures to improve decision-making across the organization.
● Companies may use trend analysis to make critical decisions that improve consumer
engagement and corporate performance and boost revenue.
● Data Science models make use of current data and may simulate a variety of
operations. As a result, businesses may look for candidates with a professional
certificate who have studied the best courses for data analytics.
● Data Science assists firms in identifying and refining target audiences by integrating
existing data with additional data points to provide meaningful insights.
Thank You
