Big Data Introduction Unit 1

The document provides an overview of Big Data, defining it as large and complex data sets that traditional management tools cannot efficiently process. It discusses characteristics of Big Data, including volume, velocity, variety, and veracity, and categorizes data into structured, unstructured, and semi-structured types. Additionally, it highlights the importance of Big Data analytics in various industries and its applications in improving business operations and decision-making.

Uploaded by

Sakthi Vel

 Introduction to Big Data

What is Data?
The quantities, characters, or symbols on which operations are performed by a computer,
which may be stored and transmitted in the form of electrical signals and recorded on
magnetic, optical, or mechanical recording media.
What is Big Data?
Big Data is data, but of enormous size. The term describes a collection of data
that is huge in volume and yet growing exponentially with time. In short, such
data is so large and complex that traditional data management tools cannot
store or process it efficiently.

  “Extremely large data sets that may be analyzed computationally to reveal patterns,
trends and associations, especially relating to human behavior and interaction, are
known as Big Data.”
 Examples Of Big Data
The following are some examples of Big Data:
  The New York Stock Exchange generates about one terabyte of new trade data per day.
  Social Media
Statistics show that 500+ terabytes of new data are ingested into the databases of the social
media site Facebook every day. This data is mainly generated through photo and video
uploads, message exchanges, comments, etc.
Twitter

  A single jet engine can generate 10+ terabytes of data in 30 minutes of flight time. With many
thousands of flights per day, data generation reaches many petabytes.
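The jet engine figure can be turned into a rough daily estimate. The sketch below is only a back-of-the-envelope calculation: the rate of 10 terabytes per 30 minutes comes from the text, while the number of flights, the flight duration, and the function name `daily_petabytes` are hypothetical values chosen for illustration. It assumes one engine per flight and 1,024 terabytes per petabyte.

```python
# Back-of-the-envelope estimate of daily jet-engine data volume.
TB_PER_HALF_HOUR = 10   # from the text: 10+ TB per 30 minutes of flight
TB_PER_PB = 1024        # assumption: binary convention, 1 PB = 1,024 TB


def daily_petabytes(flights_per_day, hours_per_flight):
    """Estimated data per day, in petabytes, counting one engine per flight."""
    tb_per_flight = TB_PER_HALF_HOUR * 2 * hours_per_flight
    return flights_per_day * tb_per_flight / TB_PER_PB


# Hypothetical inputs, not taken from the source document.
estimate = daily_petabytes(flights_per_day=5000, hours_per_flight=2)
print(f"{estimate:.1f} PB per day")
```

Even with these modest assumed inputs the result is in the hundreds of petabytes, which is consistent with the claim that the total reaches many petabytes.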
Tabular Representation of Various Memory Sizes

Name        Equal To            Size (In Bytes)
Bit         1 bit               1/8
Nibble      4 bits              1/2 (rare)
Byte        8 bits              1
Kilobyte    1,024 bytes         1,024
Megabyte    1,024 kilobytes     1,048,576
Gigabyte    1,024 megabytes     1,073,741,824
Terabyte    1,024 gigabytes     1,099,511,627,776
Petabyte    1,024 terabytes     1,125,899,906,842,624
Exabyte     1,024 petabytes     1,152,921,504,606,846,976
Zettabyte   1,024 exabytes      1,180,591,620,717,411,303,424
Yottabyte   1,024 zettabytes    1,208,925,819,614,629,174,706,176
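The byte counts in the table above follow one rule: each unit is 1,024 times the previous one. The short sketch below regenerates the sizes from kilobyte to yottabyte and converts a raw byte count into a readable unit. The helper names `unit_sizes` and `humanize` are illustrative and not part of the original notes.

```python
# Unit names from the table, in increasing order of size.
UNITS = [
    "Kilobyte", "Megabyte", "Gigabyte", "Terabyte",
    "Petabyte", "Exabyte", "Zettabyte", "Yottabyte",
]


def unit_sizes():
    """Map each unit name to its size in bytes (1024 ** position)."""
    return {name: 1024 ** power for power, name in enumerate(UNITS, start=1)}


def humanize(num_bytes):
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    unit = "Byte"
    for name in UNITS:
        if value < 1024:
            break
        value /= 1024
        unit = name
    return f"{value:.2f} {unit}"


for name, size in unit_sizes().items():
    print(f"{name:<10} {size:,}")
```

For example, `humanize(1536)` gives `1.50 Kilobyte`. Note that this is the binary convention used by the table; decimal definitions based on powers of 10 give slightly smaller values.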
 Characteristics Of Big Data
• The following are known as “Big Data Characteristics”.
1. Volume
2. Velocity
3. Variety
4. Veracity
1. Volume:
Volume means “how much data is generated”. Nowadays, organizations,
people and systems generate or receive vast amounts of data, from
terabytes (TB) to petabytes (PB) to exabytes (EB) and more.
2. Velocity:
Velocity means “how fast data is produced”. Nowadays, organizations,
people and systems generate huge amounts of data at a very fast rate.

3. Variety:
Variety means “different forms of data”. Nowadays, organizations, people and
systems generate huge amounts of data at a very fast rate and in different
formats. The different formats of data are discussed in detail shortly.
4. Veracity:
Veracity means “the quality, correctness or accuracy of captured data”.
Of the 4 Vs, it is the most important for any Big Data solution, because without
correct data there is no use in storing large amounts of data at a fast rate and
in different formats. The data should deliver correct business value.
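Veracity can be made concrete with a small data-quality check. The following is a minimal sketch, assuming hypothetical records with a `name` and an `age` field. The validation rules and the function names `is_valid` and `veracity_report` are illustrative assumptions, not a method described in the notes.

```python
def is_valid(record):
    """Return True when a record passes basic quality checks (assumed rules)."""
    name = record.get("name")
    age = record.get("age")
    if not isinstance(name, str) or not name.strip():
        return False  # missing or empty name
    if not isinstance(age, int) or not 0 <= age <= 120:
        return False  # missing, non-integer, or implausible age
    return True


def veracity_report(records):
    """Return the valid records and the fraction of records that are valid."""
    valid = [r for r in records if is_valid(r)]
    ratio = len(valid) / len(records) if records else 0.0
    return valid, ratio


# Hypothetical sample data, for illustration only.
sample = [
    {"name": "A", "age": 35},
    {"name": "", "age": 41},    # empty name
    {"name": "B", "age": -3},   # implausible age
    {"name": "C"},              # missing age
]
valid, ratio = veracity_report(sample)
print(f"{len(valid)} of {len(sample)} records usable ({ratio:.0%})")
```

Only one of the four sample records survives the checks, which illustrates why stored volume alone says little about the value of the data.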
 Types of Digital Data
1. Structured
2. Unstructured
3. Semi-structured

 Structured
  Any data that can be stored, accessed and processed in a fixed format is
termed 'structured' data.
  Over time, computer science has achieved great success in developing techniques
for working with this kind of data (where the format is well known in advance) and
in deriving value from it.
  However, issues are now foreseen when the size of such data grows to a huge
extent, with typical sizes in the range of multiple zettabytes.

  Do you know? 10^21 bytes equal 1 zettabyte; in other words, one billion terabytes form a zettabyte.
Looking at these figures, one can easily understand why the name Big Data is
given and imagine the challenges involved in its storage and processing.
  Do you know? Data stored in a relational database management system is one
example of 'structured' data.

• Examples Of Structured Data


An 'Employee' table in a database is an example of Structured Data

Employee_ID   Employee_Name     Gender   Department   Salary_In_lacs
2365          Rajesh Kulkarni   Male     Finance      650000
3398          Pratibha Joshi    Female   Admin        650000
7465          Shushil Roy       Male     Admin        500000
7500          Shubhojit Das     Male     Finance      500000
7699          Priya Sane        Female   Finance      550000
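Because the format of the 'Employee' table is fixed in advance, it can be declared as a schema and queried directly. The sketch below loads the five rows shown above into an in-memory SQLite database using Python's standard `sqlite3` module. The column types and the helper names `build_employee_db` and `names_in_department` are assumptions made for illustration.

```python
import sqlite3

# Rows copied from the 'Employee' table above.
ROWS = [
    (2365, "Rajesh Kulkarni", "Male", "Finance", 650000),
    (3398, "Pratibha Joshi", "Female", "Admin", 650000),
    (7465, "Shushil Roy", "Male", "Admin", 500000),
    (7500, "Shubhojit Das", "Male", "Finance", 500000),
    (7699, "Priya Sane", "Female", "Finance", 550000),
]


def build_employee_db():
    """Create an in-memory database with a fixed schema and load the rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE Employee (
               Employee_ID    INTEGER PRIMARY KEY,
               Employee_Name  TEXT NOT NULL,
               Gender         TEXT NOT NULL,
               Department     TEXT NOT NULL,
               Salary_In_lacs INTEGER NOT NULL
           )"""
    )
    conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?, ?)", ROWS)
    conn.commit()
    return conn


def names_in_department(conn, department):
    """Return employee names in one department, ordered by Employee_ID."""
    cur = conn.execute(
        "SELECT Employee_Name FROM Employee "
        "WHERE Department = ? ORDER BY Employee_ID",
        (department,),
    )
    return [row[0] for row in cur.fetchall()]


conn = build_employee_db()
print(names_in_department(conn, "Finance"))
```

The query works without any inspection of the data itself, because every row is guaranteed to have the same columns.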
 Unstructured
  Any data with an unknown form or structure is classified as unstructured data.
  In addition to its huge size, unstructured data poses multiple challenges in
processing it to derive value.
  A typical example of unstructured data is a heterogeneous data source containing a
combination of simple text files, images, videos, etc.
  Nowadays, organizations have a wealth of data available to them but, unfortunately, they
don't know how to derive value from it, since the data is in its raw, unstructured
form.
• Examples Of Un-structured Data
The output returned by 'Google Search'
 Semi-structured
  Semi-structured data can contain both forms of data.
  Semi-structured data appears structured in form, but it is not actually defined
with, e.g., a table definition in a relational DBMS.
  An example of semi-structured data is data represented in an XML file.

 Examples Of Semi-structured Data


Personal data stored in an XML file-

<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>
<rec><name>Subrato Roy</name><sex>Male</sex><age>26</age></rec>
<rec><name>Jeremiah J.</name><sex>Male</sex><age>35</age></rec>
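The XML records above carry their own field names but have no table definition, so a program has to discover the structure while parsing. The sketch below reads them with Python's standard `xml.etree.ElementTree` module. Since the records have no enclosing root element, the code wraps them in a hypothetical `<people>` element before parsing. The function name `parse_people` is also an illustrative assumption.

```python
import xml.etree.ElementTree as ET

# Records copied from the XML example above.
RECORDS = """\
<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>
<rec><name>Subrato Roy</name><sex>Male</sex><age>26</age></rec>
<rec><name>Jeremiah J.</name><sex>Male</sex><age>35</age></rec>
"""


def parse_people(fragment):
    """Parse <rec> elements into dictionaries with name, sex and age."""
    # A well-formed XML document needs exactly one root element,
    # so the fragment is wrapped in an assumed <people> root.
    root = ET.fromstring(f"<people>{fragment}</people>")
    return [
        {
            "name": rec.findtext("name"),
            "sex": rec.findtext("sex"),
            "age": int(rec.findtext("age")),
        }
        for rec in root.findall("rec")
    ]


people = parse_people(RECORDS)
average_age = sum(p["age"] for p in people) / len(people)
print(f"{len(people)} records, average age {average_age:.1f}")
```

Unlike the relational case, nothing prevents a record from omitting a tag or adding a new one, so real code would need to handle missing fields.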
 Big Data Analytics
 Big Data Analytics:
 Big Data analytics is the process of collecting, organizing and analyzing
large sets of data (called Big Data) to discover patterns and other useful
information.
 Big Data analytics can help organizations to better understand the
information contained within the data and will also help identify the data
that is most important to the business and future business decisions.
Analysts working with Big Data typically want the knowledge that comes
from analyzing the data.
 High-Performance Analytics Required:
 To analyze such a large volume of data, Big Data analytics is typically
performed using specialized software tools and applications for predictive
analytics, data mining, text mining, forecasting and data optimization.
 Collectively these processes are separate but highly integrated functions of
high-performance analytics.
 Using Big Data tools and software enables an organization to process extremely
large volumes of data that a business has collected to determine which data is
relevant and can be analyzed to drive better business decisions in the future.
 The Challenges:
  For most organizations, Big Data analysis is a challenge. Consider the sheer
volume of data and the different formats of the data (both structured and
unstructured) that is collected across the entire organization, and the many
different ways different types of data can be combined, contrasted and
analyzed to find patterns and other useful business information.
 The first challenge is in breaking down data silos to access all data an
organization stores in different places and often in different systems.
 A second challenge is in creating platforms that can pull in unstructured data as
easily as structured data.
 This massive volume of data is typically so large that it's difficult to process
using traditional database and software methods.
 How Big Data Analytics is Used Today:
 As the technology that helps an organization to break down data silos and analyze
data improves, business can be transformed in all sorts of ways.
  Today's advances in analyzing big data allow researchers to decode human DNA in
minutes, predict where terrorists plan to attack, determine which gene is most likely
to be responsible for certain diseases and, of course, which ads you are most likely to
respond to on Facebook.
 Another example comes from one of the biggest mobile carriers in the world.
 France's Orange launched its Data for Development project by releasing subscriber
data for customers in the Ivory Coast.
 The 2.5 billion records, which were made anonymous, included details on calls and
text messages exchanged between 5 million users.
 Researchers accessed the data and sent Orange proposals for how the data could serve
as the foundation for development projects to improve public health and safety.
 Proposed projects included one that showed how to improve public safety by tracking
cell phone data to map where people went after emergencies; another showed how to
use cellular data for disease containment. (source)
 The Benefits of Big Data Analytics:

  Enterprises are increasingly looking to find actionable insights into their
data. Many big data projects originate from the need to answer specific
business questions. With the right big data analytics platforms in place, an
enterprise can boost sales, increase efficiency, and improve operations,
customer service and risk management.
  Webopedia's parent company, QuinStreet, surveyed 540 enterprise decision-makers
involved in big data purchases to learn in which business areas companies
plan to use Big Data analytics to improve operations. About half of all
respondents said they were applying big data analytics to improve customer
retention, help with product development and gain a competitive advantage.
 Notably, the business area getting the most attention relates to increasing
efficiency and optimizing operations. Specifically, 62 percent of respondents
said that they use big data analytics to improve speed and reduce complexity.
 Application of Big Data
 Here is the list of top Big Data applications in today’s world:

• Big Data in Healthcare
• Big Data in Education
• Big Data in E-commerce
• Big Data in Media and Entertainment
• Big Data in Finance
• Big Data in Travel Industry
• Big Data in Telecom
• Big Data in Automobile
