UNIT 1
Introduction to Big Data
Syllabus
Introduction to Big Data: Types of digital data, history of Big Data,
introduction to Big Data platform, drivers for Big Data, Big Data architecture
and characteristics, 5 Vs of Big Data, Big Data technology components, Big
Data importance and applications, Big Data features (security, compliance,
auditing and protection), Big Data privacy and ethics, Big Data analytics,
challenges of conventional systems, intelligent data analysis, nature of data,
analytic processes and tools, analysis vs reporting, modern data analytic tools.
Types of digital data
DIGITAL DATA
Digital data is information stored on a computer system as a series of 0's and
1's in a binary language. Digital data jumps from one value to the next in a
step-by-step sequence.
Example: Whenever we send an email, read a social media post, or take
pictures with our digital camera, we are working with digital data.
Digital data can be classified into three forms:
a. Unstructured Data: The data which does not conform to a data model or is
not in a form that can be used easily by a computer program is categorized as
unstructured data. About 80-90% of the data of an organization is in this format.
Example: Memos, chat rooms, PowerPoint presentations, images, videos,
letters, research, white papers, the body of an email, etc.
b. Semi-Structured Data: The data which does not conform to a data model
but has some structure is categorized as semi-structured data. However, it is
not in a form that can be used easily by a computer program.
Example: Emails, XML, markup languages like HTML, etc. Metadata for this
data is available but is not sufficient.
c. Structured Data: The data which is in an organized form (i.e. in rows and
columns) and can be easily used by a computer program is categorized as
structured data. Relationships exist between entities of data, such as
classes and their objects.
Example: Data stored in databases.
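The difference between the three forms can be seen in how much a program can do with each. The sketch below is only an illustration: the sample records, field names and addresses are made up, and it uses Python's standard csv and json modules.

```python
import csv
import io
import json

# Structured: rows and columns with a fixed schema (hypothetical sample data).
csv_text = "id,name,city\n1,Asha,Delhi\n2,Ravi,Pune\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: keys describe the data, but fields vary from record to record.
json_text = (
    '[{"from": "a@example.com", "subject": "Hi"},'
    ' {"from": "b@example.com", "attachments": 2}]'
)
emails = json.loads(json_text)

# Unstructured: free text with no data model; without extra processing a
# program can only do generic things with it, such as counting words.
memo = "Meeting moved to Friday. Please review the white paper before then."
word_count = len(memo.split())

print(rows[1]["city"])      # structured: direct access by column name
print(sorted(emails[1]))    # semi-structured: keys must be inspected per record
print(word_count)           # unstructured: only generic text operations
```

Note that the structured rows can be queried by column name straight away, while the second email record has to be inspected to find out which keys it carries.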
Introduction to Big Data platform
A big data platform is a type of IT solution that combines the features and
capabilities of several big data applications and utilities in a single
solution, which is then used for managing as well as analyzing Big Data.
It focuses on providing its users with efficient analytics tools for massive
datasets.
The users of such platforms can custom build applications according to their
use case, like calculating customer loyalty (an E-Commerce use case), and so
on.
Goal: The main goal of a Big Data Platform is to achieve: Scalability,
Availability, Performance, and Security.
Example: Some of the most commonly used Big Data Platforms are :
+ Hadoop Delta Lake Migration Platform
+ Data Catalog Platform
+ Data Ingestion Platform
+ IoT Analytics Platform
Big Data Architecture :
Big data architecture is designed to handle the ingestion, processing, and
analysis of data that is too large or complex for traditional database systems.
[Figure: big data architecture diagram; surviving labels: Analytical Data
Store, Real-time Message Ingestion]
The big data architectures include the following components:
Data sources: All big data solutions start with one or more data sources.
Example,
o Application data stores, such as relational databases.
o Static files produced by applications, such as web server log files.
o Real-time data sources, such as IoT devices.
Data storage: Data for batch processing operations is stored in a distributed
file store that can hold high volumes of large files in various formats (also
called data lake).
Example,
Azure Data Lake Store or blob containers in Azure Storage.
Batch processing: Since the data sets are so large, a big data
solution must process data files using long-running batch jobs to filter,
aggregate, and prepare the data for analysis.
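The filter and aggregate steps of a batch job can be sketched in a few lines. This is a toy, in-memory illustration only: the log records and the run_batch function are hypothetical, and a real batch job would read files from the data lake with a distributed engine.

```python
from collections import defaultdict

# Hypothetical web server log records, as a batch job might read them from files.
log_records = [
    {"path": "/home", "status": 200, "bytes": 512},
    {"path": "/cart", "status": 500, "bytes": 0},
    {"path": "/home", "status": 200, "bytes": 498},
    {"path": "/cart", "status": 200, "bytes": 1024},
]

def run_batch(records):
    """Filter out failed requests, then aggregate bytes served per path."""
    totals = defaultdict(int)
    for record in records:
        if record["status"] != 200:                  # filter step
            continue
        totals[record["path"]] += record["bytes"]    # aggregate step
    return dict(totals)                              # prepared for analysis

summary = run_batch(log_records)
print(summary)
```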
Real-time message ingestion: If a solution includes real-time sources, the
architecture must include a way to capture and store real-time messages for
stream processing.
Stream processing: After capturing real-time messages, the solution must
process them by filtering, aggregating, and preparing the data for analysis.
The processed stream data is then written to an output sink. We can use
open-source Apache streaming technologies like Storm and Spark Streaming
for this.
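The idea of stream processing (handle each message as it arrives, keep a running aggregate, write results to a sink) can be sketched in plain Python. This does not use Storm or Spark Streaming; the sensor_feed generator, the message fields and the alert threshold are assumptions made for the illustration.

```python
def process_stream(messages, sink, threshold=30.0):
    """Consume messages one at a time: filter, update a running average, emit."""
    count = 0
    total = 0.0
    for message in messages:              # messages may be an unbounded iterator
        if message["temp"] is None:       # filter: skip incomplete readings
            continue
        count += 1
        total += message["temp"]
        sink.append({
            "device": message["device"],
            "running_avg": total / count,             # aggregate so far
            "alert": message["temp"] > threshold,     # prepared for analysis
        })

def sensor_feed():
    """Hypothetical real-time source; a real one would never end."""
    yield {"device": "d1", "temp": 20.0}
    yield {"device": "d2", "temp": None}
    yield {"device": "d1", "temp": 40.0}

output_sink = []                          # stands in for the output sink
process_stream(sensor_feed(), output_sink)
print(output_sink)
```

Unlike a batch job, the function never needs the whole data set at once; it only keeps the running count and total.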
Analytical data store: Many big data solutions prepare data for analysis and
then serve the processed data in a structured format that can be queried using
analytical tools. Example: Azure Synapse Analytics provides a managed
service for large-scale, cloud-based data warehousing.
Analysis and reporting: The goal of most big data solutions is to provide
insights into the data through analysis and reporting. To empower users to
analyze the data, the architecture may include a data modelling layer.
Analysis and reporting can also take the form of interactive data exploration by
data scientists or data analysts.
Orchestration: Most big data solutions consist of repeated data processing
operations that transform source data, move data between multiple sources
and sinks, load the processed data into an analytical data store, or push the
results straight to a report. To automate these workflows, we can use an
orchestration technology such as Azure Data Factory.
Drivers for Big Data
Big Data has quickly risen to become one of the most desired topics in the
industry.
The main business drivers for such rising demand for Big Data Analytics are:
1. The digitization of society
2. The drop in technology costs
3. Connectivity through cloud computing
4. Increased knowledge about data science
5. Social media applications
6. The rise of Internet-of-Things (IoT)
Example: A number of companies that have Big Data at the core of their
strategy, like Apple, Amazon, Facebook and Netflix, have become very
successful at the beginning of the 21st century.
Big Data Characteristics :
Big data can be described by the following characteristics:
+ Volume
+ Variety
+ Velocity
5 Vs of Big Data, Big Data technology components
5 Vs of Big Data :
1. Volume :
Big Data refers to the vast volumes of data generated from many sources daily,
such as business processes, machines, social media platforms, networks, human
interactions, and so on.
Example: Facebook generates approximately a billion messages, records about
4.5 billion clicks of the “Like” button, and receives more than 350 million
new posts each day.
Big data technologies can handle such large amounts of data.
2. Variety :
Big Data can be structured, unstructured, or semi-structured, and is
collected from different sources.
In the past, data was only collected from databases and sheets, but these
days data comes in an array of forms, i.e. PDFs, emails, audio, social
media posts, photos, videos, etc.
3. Velocity :
Velocity refers to the speed with which data is generated in real-time.
Velocity plays an important role compared to the others.
It covers the speed of incoming data sets, the rate of change, and
activity bursts.
The primary aspect of Big Data is to provide demanding data rapidly.
Example of data that is generated with high velocity: Twitter messages or
Facebook posts.
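To get a feel for velocity, a daily volume can be converted into a rate per second. The small calculation below takes the figure of 350 million new posts per day quoted in the Volume example and assumes, purely for illustration, that posts arrive evenly through the day (real traffic comes in bursts).

```python
SECONDS_PER_DAY = 24 * 60 * 60      # 86,400 seconds

posts_per_day = 350_000_000         # figure quoted in the Volume example
posts_per_second = posts_per_day / SECONDS_PER_DAY

print(round(posts_per_second))      # roughly four thousand posts every second
```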
4. Veracity :
Veracity refers to the quality of the data that is being analyzed.
It is the process of being able to handle and manage data efficiently.
Example: Facebook posts with hashtags.
5. Value :
Value is an essential characteristic of big data.
It is not just any data that we process or store; it is valuable and reliable
data that we store, process and analyse.
Big Data importance and applications
Big Data Importance :
The importance of Big Data doesn't revolve around the amount of data a company
has but lies in how the company utilizes the gathered data.
Every company uses its collected data in its own way. The more effectively a
company uses its data, the more rapidly it grows.
By analysing the big data pools effectively the companies can get answers to:
Cost Savings :
o Some tools of Big Data like Hadoop can bring cost advantages to business
when large amounts of data are to be stored.
o These tools help in identifying more efficient ways of doing business.
Time Reductions :
o The high speed of tools like Hadoop and in-memory analytics can easily
identify new sources of data, which helps businesses analyze data
immediately.
o This helps us to make quick decisions based on the learnings.
Understand the market conditions :
o By analyzing big data we can get a better understanding of current market
conditions.
o For example: By analyzing customers' purchasing behaviours, a company
can find out the products that are sold the most and produce products
according to this trend. By this, it can get ahead of its competitors.
Control online reputation :
o Big data tools can do sentiment analysis.
o Therefore, you can get feedback about who is saying what about your
company.
o If you want to monitor and improve the online presence of your business,
then big data tools can help in all this.
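As a rough idea of what sentiment analysis does, the sketch below scores text by counting positive and negative words. It is a toy word-list approach, not the method of any particular big data tool; the word lists, the reviews and the sentiment_score function are all made up for the illustration.

```python
# Hypothetical word lists; real tools use far larger lexicons or trained models.
POSITIVE = {"good", "great", "love", "fast"}
NEGATIVE = {"bad", "slow", "broken", "hate"}

def sentiment_score(text):
    """Return (# positive words) - (# negative words) for one piece of text."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

reviews = [
    "Great phone, love the camera!",
    "Delivery was slow and the box was broken.",
]
scores = [sentiment_score(r) for r in reviews]
print(scores)   # a positive score suggests praise, a negative one a complaint
```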
Using Big Data Analytics to Boost Customer Acquisition (purchase) and
Retention :
o The customer is the most important asset any business depends on.
o No single business can claim success without first having to establish a solid
customer base.
o If a business is slow to learn what customers are looking for, then it is very
likely to deliver poor quality products.
o The use of big data allows businesses to observe various customer-related
patterns and trends.
Using Big Data Analytics to Solve Advertisers Problem and Offer Marketing
Insights :
o Big data analytics can help change all business operations.
o This includes the ability to match customer expectations, changing the
company's product line, etc.
o It also helps ensure that the marketing campaigns are powerful.
Big Data Applications :
In today's world big data has several applications, some of them are listed
below:
Tracking Customer Spending Habit, Shopping Behavior :
In big retail stores, the management team has to keep data on customers'
spending habits, shopping behaviour, most liked products, and which products
are being searched for or sold most. Based on that data, the
production/collection rate of that product gets fixed.
Recommendation :
By tracking customer spending habits and shopping behaviour, big retail stores
provide recommendations to the customers.
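One simple way such a recommendation could be produced is to count which products are bought together. The sketch below is an assumption-laden toy (the baskets and the recommend function are hypothetical, and real stores use far richer models), but it shows the idea of turning purchase history into a suggestion.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase history: one set of products per customer visit.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
]

# Count how often each ordered pair of products appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    for a, b in combinations(sorted(basket), 2):
        pair_counts[(a, b)] += 1
        pair_counts[(b, a)] += 1

def recommend(product, top_n=1):
    """Suggest the products most often bought together with `product`."""
    candidates = Counter(
        {other: n for (item, other), n in pair_counts.items() if item == product}
    )
    return [other for other, _ in candidates.most_common(top_n)]

print(recommend("bread"))
```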
Smart Traffic System :
Data about the condition of the traffic on different roads is collected through
cameras and GPS devices placed in vehicles.
All such data are analyzed, and jam-free or less jammed, less time-taking
ways are recommended.
One more benefit is that fuel consumption can be reduced.
Secure Air Traffic System :
Sensors are present at various places on the aircraft.
These sensors capture data like the speed of flight, moisture, temperature,
and other environmental conditions.
Based on such data analysis, an environmental parameter within flight is set
up and varied.
By analyzing the flight's machine-generated data, it can be estimated how long
the machine can operate flawlessly and when it should be replaced/repaired.
Auto Driving Car :
At various spots of the car, cameras and sensors are placed that gather data
like the size of the surrounding cars, obstacles, distance from those, etc.
These data are analyzed, then various calculations are carried out.
These calculations help to take action automatically.
Virtual Personal Assistant Tool :
Big data analysis helps virtual personal assistant tools like Siri, Cortana and
Google Assistant to provide the answer to the various questions asked by
users.
This tool tracks the location of the user, their local time, season, other data
related to questions asked, etc.
Analyzing all such data provides an answer.
Example: Suppose one user asks “Do I need to take an umbrella?” The tool
collects data like location of the user, season and weather condition at that
location, then analyzes these data to conclude if there is a chance of raining,
then provides the answer.
IoT :
Manufacturing companies install IoT sensors into machines to collect
operational data.
By analyzing such data, it can be predicted how long a machine will work
without any problem and when it will require repair.
Thus, the cost to replace the whole machine can be saved.
Education Sector :
Organizations conducting online educational courses utilize big data to search
for candidates interested in that course.
If someone searches for a YouTube tutorial video on a subject, then an online
or offline course provider organization on that subject sends an online ad to
that person about their course.
Media and Entertainment Sector :
Media and entertainment service providing companies like Netflix,
Amazon Prime and Spotify do analysis on data collected from their users.
Data like what type of video or music users are watching or listening to most,
how long users are spending on site, etc. are collected and analyzed to set
the next business strategy.
Big Data Technology Components :
Components of a Big Data Ecosystem
1. Ingestion :
The ingestion layer is the very first step of pulling in raw data.
It comes from internal sources, relational databases, non-relational databases,
social media, emails, phone calls etc.
There are two kinds of ingestion:
Batch, in which large groups of data are gathered and delivered together.
Streaming, which is a continuous flow of data. This is necessary for real-time
data analytics.
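The two kinds of ingestion differ mainly in when records are handed on. A minimal sketch, assuming a hypothetical list of events and two made-up helper functions, could look like this:

```python
from itertools import islice

def stream_ingest(source):
    """Streaming: hand each record downstream as soon as it arrives."""
    for record in source:
        yield record

def batch_ingest(source, batch_size):
    """Batch: gather records into groups and deliver each group together."""
    iterator = iter(source)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

events = ["e1", "e2", "e3", "e4", "e5"]       # hypothetical raw records
batches = list(batch_ingest(events, 2))
streamed = list(stream_ingest(events))
print(batches)
print(streamed)
```

With batch ingestion the last record waits until its group is delivered, which is why streaming is preferred when results are needed in real time.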
2. Storage :
Storage is where the converted data is stored in a data lake or warehouse and
eventually processed.
The data lake/warehouse is the most essential component of a big data
ecosystem.
It needs to contain only thorough, relevant data to make insights as valuable
as possible.
It must be efficient with as little redundancy as possible to allow for quicker
processing.
3. Analysis :
In the analysis layer, data gets passed through several tools, shaping it into
actionable insights.
There are four types of analytics on big data :
+ Diagnostic: Explains why a problem is happening.
+ Descriptive: Describes the current state of a business through
historical data.
+ Predictive: Projects future results based on historical data.
+ Prescriptive: Takes predictive analytics a step further by projecting
best future efforts.
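Three of these types can be illustrated on a tiny series of numbers (diagnostic analytics needs information about causes and is left out). The sketch below is only an example: the sales figures and stock level are invented, and a straight-line least-squares fit is just one simple way to project a trend.

```python
from statistics import mean

monthly_sales = [100, 110, 120, 130]      # hypothetical historical data

# Descriptive: summarise what has happened so far.
average_sales = mean(monthly_sales)

# Predictive: fit a least-squares line through the history and project
# the value for the next month.
xs = list(range(len(monthly_sales)))
x_bar, y_bar = mean(xs), mean(monthly_sales)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, monthly_sales))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar
forecast = slope * len(monthly_sales) + intercept

# Prescriptive: turn the forecast into a recommended action.
stock_on_hand = 125                       # hypothetical current stock
action = "reorder" if forecast > stock_on_hand else "hold"

print(average_sales, forecast, action)
```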
4. Consumption :
The final big data component is presenting the information in a format
digestible to the end-user.
This can be in the form of tables, advanced visualizations and even single
numbers if requested.
The most important thing in this layer is making sure the intent and meaning of
the output is understandable.
Big Data features —security, compliance, auditing and
protection
BIG DATA SECURITY :
1. Big data security is the collective term for all the measures and tools used to
guard both the data and analytics processes from attacks, theft, or other
malicious activities that could harm or negatively affect them.
2. For companies that operate on the cloud, big data security challenges are
multi-faceted.
3. When customers give their personal information to companies, they trust
them with personal data which can be used against them if it falls into the
wrong hands.
BIG DATA COMPLIANCE :
1. Data compliance is the practice of ensuring that sensitive data is organized
and managed in such a way as to enable organizations to meet enterprise
business rules along with legal and governmental regulations.
2. Organizations that don't implement these regulations can be fined up to tens
of millions of dollars and even receive a 20-year penalty.
BIG DATA AUDITING :
1. Auditors can use big data to expand the scope of their projects and draw
comparisons over larger populations of data.
2. Big data also helps financial auditors to streamline the reporting process
and detect fraud.
3. These professionals can identify business risks in time and conduct more
relevant and accurate audits.
BIG DATA PROTECTION
Big data security is the collective term for all the measures and tools used to
guard both the data and analytics processes from attacks, theft, or other
malicious activities that could harm or negatively affect them.
1. When customers give their personal information to companies, they trust
them with personal data which can be used against them if it falls into the
wrong hands.
2. That's why data privacy is there to protect those customers, but also
companies and their employees, from security breaches.
Big Data privacy and ethics
+ Most data is collected through surveys, interviews, or observation.
+ When customers give their personal information to companies, they trust
them with personal data which can be used against them if it falls into the
wrong hands.
+ That's why data privacy is there to protect those customers, but also
companies and their employees, from security breaches.
+ One of the main reasons why companies comply with data privacy
regulations is to avoid fines.
+ Organizations that don't implement these regulations can be fined up to
tens of millions of dollars and even receive a 20-year penalty.
+ Reasons why we need to take data privacy seriously are :
o Data breaches could hurt your business.
o It protects your customers' privacy.
o It maintains and improves brand value.
o It gives you a competitive advantage.
o It supports the code of ethics.
Challenges of conventional systems
+ Big data is the storage and analysis of large data sets.
+ These are complex data sets that can be either structured or
unstructured.
+ They are so large that it is not possible to work on them with traditional
analytical tools.
+ One of the major challenges of conventional systems was the
uncertainty of the Data Management Landscape.
+ Big data is continuously expanding, with new companies and technologies
appearing, so organizations have to find what works best for them without
the introduction of new risks.
+ These days, organizations are realising the value they get out of big
data analytics and hence they are deploying big data tools and
processes to bring more efficiency in their work environment.
Big Data Analytics :
Big data analytics is a complex process of examining big data to
uncover information, such as hidden patterns, correlations, market
trends and customer preferences.
This can help organizations make informed business decisions.
Data Analytics technologies and techniques give organizations a way to
analyze data sets and gather new information.
Big Data Analytics enables enterprises to analyze their data in full
context quickly, and some tools also offer real-time analysis.
Importance of Big Data Analytics :
+ Organizations use big data analytics systems and software to make
data-driven decisions that can improve business-related outcomes.
+ The benefits include more effective marketing, new revenue
opportunities, customer personalization and improved operational
efficiency.
With an effective strategy, these benefits can provide competitive
advantages over rivals.
+ Big Data Analytics tools also help businesses save time and money and
aid in gaining insights to inform data-driven decisions.
+ Big Data Analytics enables enterprises to narrow their Big Data to the
most relevant information and analyze it to inform critical business
decisions.
Intelligent data analysis, nature of data
Intelligent Data Analysis (IDA) is one of the most important approaches in the
field of data mining.
Based on the basic principles of IDA and the features of the datasets that IDA
handles, the development of IDA is briefly summarized from three aspects:
* Algorithm principle
* The scale
* Type of the dataset
Intelligent Data Analysis (IDA) is one of the major issues in artificial
intelligence and information.
Intelligent data analysis discloses hidden facts that were not known previously
and provides potentially important information or facts from large quantities of
data.
It also helps in making a decision.
Based on machine learning, artificial intelligence, recognition of patterns and
records, and visualization technology, IDA helps to obtain useful information,
necessary data and interesting models from a lot of data available online in
order to make the right choices.
IDA includes three stages:
(1) Preparation of data
(2) Data mining
(3) Data validation and Explanation
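The three stages can be walked through on a handful of records. The sketch below is a toy, not a real IDA system: the records are invented, and the "mining" step learns only a single age threshold, on the assumption (true of this sample only) that buyers tend to be younger.

```python
# Hypothetical customer records; one has a missing value.
raw = [
    {"age": 25, "bought": "yes"}, {"age": 31, "bought": "yes"},
    {"age": None, "bought": "no"}, {"age": 52, "bought": "no"},
    {"age": 47, "bought": "no"}, {"age": 29, "bought": "yes"},
]

# (1) Preparation of data: drop incomplete records, split into train and test.
clean = [r for r in raw if r["age"] is not None]
train, test = clean[:3], clean[3:]

# (2) Data mining: learn a simple rule (an age threshold) from the training part.
def mean_age(records, label):
    ages = [r["age"] for r in records if r["bought"] == label]
    return sum(ages) / len(ages)

threshold = (mean_age(train, "yes") + mean_age(train, "no")) / 2

def predict(age):
    return "yes" if age < threshold else "no"

# (3) Data validation and explanation: check the rule on unseen records.
accuracy = sum(predict(r["age"]) == r["bought"] for r in test) / len(test)
print(f"Rule: buys if age < {threshold}; accuracy on test data = {accuracy}")
```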
Modern data analytic tools
+ These days, organizations are realising the value they get out of big
data analytics and hence they are deploying big data tools and
processes to bring more efficiency to their work environment.
+ Many big data tools and processes are being utilised by companies
these days in the processes of discovering insights and supporting
decision making.
+ Data Analytics tools are types of application software that retrieve data
from one or more systems and combine it in a repository, such as a
data warehouse, to be reviewed and analysed.
+ Most organizations use more than one analytics tool, including
spreadsheets with statistical functions, statistical software packages,
data mining tools, and predictive modelling tools.
+ Together, these Data Analytics Tools give the organization a complete
overview of the company to provide key insights and understanding of
the market/business so smarter decisions may be made.
+ Data analytics tools not only report the results of the data but also
explain why the results occurred to help identify weaknesses, fix
potential problem areas, alert decision-makers to unforeseen events
and even forecast future results based on decisions the company might
make.
Below is a list of some data analytics tools :
+ R Programming (Leading Analytics Tool in the industry)
+ Python
+ Excel
+ SAS
+ Apache Spark
+ Splunk
+ RapidMiner
+ Tableau Public
+ KNIME
Analysis vs reporting
Reporting :
Once data is collected, it will be organized using tools such as
graphs and tables.
+ The process of organizing this data is called reporting.
+ Reporting translates raw data into information.
+ Reporting helps companies to monitor their online business and be
alerted when data falls outside of expected ranges.
+ Good reporting should raise questions about the business from its end
users.
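A report that organizes raw numbers into a table and raises an alert when a value falls outside the expected range could be sketched as follows. The order counts, the expected range and the out_of_range function are made up for the illustration.

```python
EXPECTED_RANGE = (900, 1500)    # hypothetical acceptable daily order counts

daily_orders = {"Mon": 1200, "Tue": 1350, "Wed": 640, "Thu": 1180}

def out_of_range(metrics, low, high):
    """Return the entries whose value falls outside [low, high]."""
    return {day: value for day, value in metrics.items()
            if not low <= value <= high}

alerts = out_of_range(daily_orders, *EXPECTED_RANGE)

# The report itself: raw data organized as a small table, with alerts flagged.
for day, value in daily_orders.items():
    flag = "  <-- ALERT" if day in alerts else ""
    print(f"{day:<4}{value:>6}{flag}")
```

The report only says that Wednesday was unusual; finding out why is the job of analysis.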
Analysis :
Analytics is the process of taking the organized data and analyzing it.
+ This helps users to gain valuable insights on how businesses can
improve their performance.
+ Analysis transforms data and information into insights.
+ The goal of the analysis is to answer questions by interpreting the data
at a deeper level and providing actionable recommendations.
Conclusion :
Reporting shows us “what is happening”.
The analysis focuses on explaining “why it is happening” and “what
we can do about it”.