Data Analytics Lecture 3-1 | PDF | Databases | Data

Data Analytics Lecture 3-1

Uploaded by

Sakshi Prajapati

Welcome to the

Course!

Data Analytics
Subject Code: BCS052
DEPARTMENT OF COMPUTER SCIENCE

Mohit Singh Tanwar


Assistant Professor
Department of CS, KIET Group of
Institutions
CONTENT FOR TODAY’S CLASS

• DATA ANALYTICS SUMMARY
• SOURCES OF DATA
• NATURE OF DATA
• CLASSIFICATION OF DATA
• CHARACTERISTICS OF DATA
• NEED OF DATA ANALYTICS
SOURCES OF DATA
• Data collection is the process of acquiring, extracting, and
storing large volumes of data, which may be structured or
unstructured (text, video, audio, XML files, records, or image
files), for use in later stages of data analysis.
• In big data analysis, data collection is the initial step,
carried out before looking for patterns or useful information
in the data.
• The data to be analyzed must be collected from different valid
sources.

e.g. Kaggle hosts publicly available real-world datasets.


SOURCES OF DATA
SOURCES
PRIMARY DATA

• Data that is raw, original, and extracted directly from the
official sources is known as primary data.
• This type of data is collected directly, using techniques such
as questionnaires, interviews, and surveys.
• The data collected must match the demands and requirements of
the target audience on which the analysis is performed;
otherwise it becomes a burden during data processing.
SELF STUDY

METHODS TO COLLECT PRIMARY DATA


LINK: https://www.geeksforgeeks.org/different-sources-of-data-for-data-analysis/
SOURCES
SECONDARY DATA

• Secondary data is data that has already been collected and is reused
for some other valid purpose.
• This type of data is derived from previously recorded primary data. It
has two types of sources: internal and external.
• Internal sources (sales records, customer databases, etc.)
• External sources (public data, social media data, commercial
third-party data)
SOURCES
OTHER SOURCES

• Sensor data: With the advancement of IoT devices, the sensors in these devices collect data
that can be used in sensor data analytics to track the performance and usage of products.
• Satellite data: Satellites collect terabytes of images and other data every day through
surveillance cameras, and this data can be mined for useful information.
• Web traffic: Thanks to fast and cheap internet access, data in many formats that users
upload to different platforms can be collected, with their permission, for data analysis.
Search engines also provide data through the keywords and queries that are searched most
often.
NATURE OF DATA
• The data that is collected is known as raw data. It is not useful in that form, but
cleaning out the impurities and using the data for further analysis yields information,
and the information obtained is known as “knowledge”.
• Knowledge has many meanings, such as business knowledge, sales of enterprise
products, disease treatment, etc.
• The main goal of data collection is to collect information-rich data.
• Most collected data is of two types: “qualitative data”, which is non-numerical data
such as words and sentences and mostly focuses on the behavior and actions of a
group, and “quantitative data”, which is numerical.
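The path from raw data to information described above can be sketched in a few lines of Python. This is only an illustration: the records, the field names, and the `clean` helper are hypothetical and do not come from the lecture.

```python
# Hypothetical raw survey records (made up for illustration).
raw_records = [
    {"customer": " Asha ", "rating": "4"},
    {"customer": "Ravi", "rating": ""},    # missing rating
    {"customer": "asha", "rating": "4"},   # duplicate of the first record
    {"customer": "Meera", "rating": "5"},
]

def clean(records):
    """Drop rows without a rating, normalise names, and remove duplicates."""
    seen = set()
    cleaned = []
    for row in records:
        name = row["customer"].strip().title()
        rating = row["rating"].strip()
        if not rating:
            continue  # incomplete row: skip it
        key = (name, rating)
        if key in seen:
            continue  # already kept an identical row
        seen.add(key)
        cleaned.append({"customer": name, "rating": int(rating)})
    return cleaned

# Cleaned data can now be summarised into a piece of information.
cleaned_records = clean(raw_records)
average_rating = sum(r["rating"] for r in cleaned_records) / len(cleaned_records)
print(cleaned_records)
print(average_rating)
```

The raw list is not directly usable (blank values, inconsistent names, duplicates); only after cleaning does the average rating become a meaningful summary.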
CONTD…
QUALITATIVE DATA
Nature: Qualitative data is descriptive and non-numeric. It captures the qualities,
attributes, or characteristics of phenomena.
Examples: Interviews, focus groups, open-ended survey responses, and observational data.
Usage: It provides deeper insights into people's experiences, motivations, and
attitudes. It's often used in exploratory research to understand context and generate
hypotheses.
Qualitative data cannot be organized or labeled numerically in a meaningful way.
For example, a user's favorite color has no numerical meaning or precedence. While it could be
assigned an arbitrary numerical value (1 = blue, 2 = red, 3 = yellow, etc.), those values have no
real meaning in relation to each other in regard to ranking, precedence, or magnitude.
It can be analyzed to provide answers to questions such as “Why do customers leave my
website before making a purchase?”, “How do customers perceive my brand?” and “What
demographic does my product attract?”
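The point about arbitrary numeric codes can be checked with a small Python sketch. The colour list and both code tables are made-up values used only for illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey answers: each user's favourite colour.
favorite_colors = ["blue", "red", "yellow", "blue", "red", "blue"]

# Two equally arbitrary ways of assigning numbers to the same labels.
codes_a = {"blue": 1, "red": 2, "yellow": 3}
codes_b = {"blue": 3, "red": 1, "yellow": 2}

# The "average colour" changes with the coding, so it carries no meaning.
mean_a = mean(codes_a[c] for c in favorite_colors)
mean_b = mean(codes_b[c] for c in favorite_colors)

# Counting categories is meaningful and does not depend on any coding.
counts = Counter(favorite_colors)
most_common_color = counts.most_common(1)[0][0]

print(mean_a, mean_b)
print(counts, most_common_color)
```

Because the mean depends entirely on which code table is chosen, arithmetic on such labels is not informative, while frequency counts stay the same under any coding.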
CONTD…
QUANTITATIVE DATA
Nature: Quantitative data is numerical and can be measured or quantified. It involves
quantities and statistical analysis.
Examples: Surveys with closed-ended questions, measurements, counts, and numerical data
from experiments.
Usage: It is used to identify patterns, test theories, and make predictions. It allows for
statistical analysis and can be used to generalize findings to a larger population.
It takes its meaning from the numerical values in the data — the “numbers are the
information, and are not just representational labels”. The numbers refer to a magnitude, or
sometimes an order.
Quantitative data answers questions such as “How much time do visitors spend on my
site?”, “How often do customers make a purchase?” and “How many users does each
advertising channel attract?” The answer to each of these questions is a measurable, numerical
value.
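As a minimal sketch of how a question like “How much time do visitors spend on my site?” is answered with quantitative data, the Python snippet below summarises a list of session durations. The numbers are hypothetical.

```python
from statistics import mean, median

# Hypothetical time spent on a site per visit, in minutes.
session_minutes = [2.5, 4.0, 3.5, 10.0, 5.0]

# Here the numbers are the information, so arithmetic on them is meaningful.
average_minutes = mean(session_minutes)
median_minutes = median(session_minutes)
longest_visit = max(session_minutes)

print(average_minutes, median_minutes, longest_visit)
```

Unlike labels, these values have magnitude and order, which is why a mean, a median, and a maximum can all be interpreted directly.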
CLASSIFICATION OF DATA

Data is classified into three types:

• Structured data
• Semi-structured data
• Unstructured data
STRUCTURED DATA
Nature: This type of data is highly organized and easily searchable, usually stored in
a fixed format. It follows a predefined schema with clearly defined fields and data
types.

Examples: Databases, spreadsheets, and tables where data is organized into rows
and columns with consistent data types (e.g., integers, strings, dates).

Usage: Structured data is easy to input, store, and query using standard database
management systems (DBMS) and is ideal for performing precise and complex
queries.

It covers all data that can be stored in an SQL database, in tables with rows and
columns. Such data has relational keys and can easily be mapped into pre-designed
fields. Today it is the most widely processed kind of data and the simplest to
manage. Example: relational data.
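A minimal sketch of structured data, using Python's built-in `sqlite3` module with an in-memory database. The table names, columns, and rows are hypothetical; the example only shows a predefined schema, a relational key, and a precise query.

```python
import sqlite3

# In-memory database; schema and rows are made up for illustration.
conn = sqlite3.connect(":memory:")

# A predefined schema: every column has a name and a data type.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE sales ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"  # relational key
    " amount REAL NOT NULL,"
    " sold_on TEXT NOT NULL)"
)

conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Asha"), (2, "Ravi")],
)
conn.executemany(
    "INSERT INTO sales (customer_id, amount, sold_on) VALUES (?, ?, ?)",
    [(1, 250.0, "2024-01-05"), (1, 100.0, "2024-01-09"), (2, 75.5, "2024-01-07")],
)

# Because the structure is fixed, a precise query is straightforward.
totals = conn.execute(
    "SELECT c.name, SUM(s.amount) FROM customers c"
    " JOIN sales s ON s.customer_id = c.id"
    " GROUP BY c.name ORDER BY c.name"
).fetchall()
conn.close()

print(totals)
```

The join on `customer_id` is what the slide means by relational keys: rows in one table are linked to rows in another through a shared field.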
UNSTRUCTURED DATA
Nature: Unstructured data does not follow a specific format or structure. It is often textual and can be
more challenging to organize and analyze.
Examples: Text documents (Word, PDF), emails, social media posts, audio files, and videos.
Usage: Analyzing unstructured data typically requires advanced techniques such as natural language
processing (NLP) or machine learning to extract meaningful insights.

Unstructured data is data that is not organized in a predefined manner and has no predefined data
model, so it is not a good fit for a mainstream relational database. Alternative platforms exist
for storing and managing it. It is increasingly prevalent in IT systems and is used by
organizations in a variety of business intelligence and analytics applications.
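To give a feel for why unstructured data needs extra processing, the sketch below pulls simple keyword counts out of free text. It is a deliberately crude stand-in for real NLP, and the review text and stop-word list are hypothetical.

```python
import re
from collections import Counter

# Hypothetical free-text customer review: no fields, no schema.
review = (
    "The delivery was late. The product is great, "
    "but the delivery was really late!"
)

# A tiny, made-up stop-word list; real NLP toolkits ship much larger ones.
STOP_WORDS = {"the", "was", "is", "but", "a", "and"}

# Step 1: impose some structure by splitting the text into lowercase tokens.
tokens = re.findall(r"[a-z']+", review.lower())

# Step 2: drop common words and count what remains.
keywords = [t for t in tokens if t not in STOP_WORDS]
term_counts = Counter(keywords)

print(term_counts.most_common(2))
```

Even this small example has to tokenize and filter the text before anything can be counted, which is the extra work that structured data does not require.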
SEMI-STRUCTURED DATA
Nature: This data has some organizational properties but does not conform to the rigid
structure of structured data. It may contain tags or markers to separate data elements,
but the overall format is more flexible.
Semi-structured data is information that does not reside in a relational database but
has some organizational properties that make it easier to analyze. With some
processing it can be stored in a relational database (this can be very hard for some
kinds of semi-structured data), but the semi-structured form exists to save space.
Examples: XML files, JSON files, and NoSQL databases.
Usage: Semi-structured data is more flexible than structured data and easier to analyze
than unstructured data. It is often used in applications that require both structured and
unstructured information.
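A minimal sketch of working with semi-structured data, using Python's built-in `json` module. The records and field names are hypothetical; the point is that keys act as markers, but each record may carry a different set of fields.

```python
import json

# Hypothetical JSON payload: every record has id and name,
# but the remaining fields differ from record to record.
raw = """
[
  {"id": 1, "name": "Asha", "email": "asha@example.com"},
  {"id": 2, "name": "Ravi", "phones": ["555-0100", "555-0101"]},
  {"id": 3, "name": "Meera", "address": {"city": "Ghaziabad"}}
]
"""

records = json.loads(raw)

# Flatten into uniform rows, filling gaps with None or 0, as one possible
# first step before loading the data into a relational table.
rows = [
    {
        "id": r["id"],
        "name": r["name"],
        "email": r.get("email"),
        "phone_count": len(r.get("phones", [])),
        "city": r.get("address", {}).get("city"),
    }
    for r in records
]

for row in rows:
    print(row)
```

The tags make the data far easier to parse than free text, yet the optional and nested fields show why it does not map directly onto a fixed table.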
DIFFERENCES
PYQs
CHARACTERISTICS OF DATA
The characteristics of good quality data are as follows:
NEXT CLASS TOPIC
1. Introduction of Big Data and Platform
2. 6 V’s of Big Data
3. Analytic Process and Tools and Modern Tools Used in Data
Analytics
4. Evolution of Analytic Scalability
5. Analysis vs Reporting
6. Applications of Data Analytics
