Big Data
Understanding of Big Data
An avalanche of available data, increasing
exponentially
Google CEO Eric Schmidt said:
"Every two days we create as much information as
we did from the dawn of civilization up until 2003.
That's something like five exabytes of data."
Farnam Jahanian kicked off a May 1, 2012 briefing by
calling data
"a transformative new currency for science,
engineering, education, and commerce."
Instant detection: solving customer service issues in minutes
At this global consumer products company, when
large retail customers reported problems (a missing
or late shipment, for example), it could take several
work-intensive weeks to determine what went wrong.
By that time, these customers may have identified
other, separate challenges, and the cycle continued.
With advanced analytic and visualization capabilities
in place, this company was able to identify the root
causes of customer service issues in a matter of
minutes, making it easier to solve them and move on
to the next thing more quickly.
Now the team can review over 100 million data
entries in minutes, not weeks, which allows for more
time to focus on strategic activities that improve
customer service.
Understanding of Big Data
Farnam Jahanian (NSF):
Big Data is characterized not only by the enormous
volume of data but also by the diversity and
heterogeneity of the data and the velocity of its
generation.
Nuala O'Connor Kelly (GE):
It's the volume and velocity and variety of data to
achieve new results for
Nick Combs (EMC):
It's needle in a haystack or connecting the dots.
Arvind Krishna (IBM) added the fourth V:
Veracity:
data in doubt
Describes "contradictory data" or noisy data
Big Data (Noun)
extremely large data sets that may be analysed
computationally to reveal patterns, trends, and
associations, especially relating to human
behaviour and interactions
Understanding of Big Data
Big data is high-volume, high-velocity and/or
high-variety information assets that demand
cost-effective, innovative forms of information
processing that enable enhanced insight,
decision making, and process automation
Additional 4 Vs of Big Data
Veracity
Having a lot of data in different volumes coming in at high speed
is worthless if that data is incorrect. Incorrect data can cause a
lot of problems for organisations as well as for consumers.
Therefore, organisations need to ensure that both the data and
the analyses performed on the data are correct.
Especially in automated decision-making, where no human is
involved anymore, you need to be sure that both the data and
the analyses are correct.
If you want your organisation to become information-centric, you
should be able to trust that data as well as the analyses.
accountability.
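The point about checking data before automated decisions can be sketched in a few lines. This is only an illustrative sketch: the records, the field names (`order_id`, `quantity`, `price_aud`) and the two validation rules are hypothetical and not taken from any real system.

```python
# Hypothetical shipment records; field names and rules are illustrative only.
records = [
    {"order_id": "A1", "quantity": 5, "price_aud": 0.5},
    {"order_id": "A2", "quantity": -3, "price_aud": 0.5},  # negative quantity
    {"order_id": "A3", "quantity": 2, "price_aud": None},  # missing price
]


def find_problems(record: dict) -> list[str]:
    """Return a list of human-readable problems found in one record."""
    problems = []
    if not isinstance(record["quantity"], int) or record["quantity"] <= 0:
        problems.append("quantity must be a positive integer")
    price = record["price_aud"]
    if not isinstance(price, (int, float)) or price < 0:
        problems.append("price must be a non-negative number")
    return problems


# Only records without problems are passed on to automated decision-making.
trusted = [r for r in records if not find_problems(r)]
rejected = {r["order_id"]: find_problems(r) for r in records if find_problems(r)}

print("trusted:", [r["order_id"] for r in trusted])
print("rejected:", rejected)
```

Real veracity checks are usually far broader (duplicates, stale values, conflicting sources); the sketch only shows the idea of rejecting incorrect records before anything acts on them.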
Additional 4 Vs of Big Data
Variability
Big data is extremely variable. Brian Hopkins, a Forrester principal analyst,
defines variability as the variance in meaning, in lexicon. He refers to the
supercomputer Watson, which won at Jeopardy. The supercomputer had to
dissect an answer into its meaning and [] to figure out what the right
question was. That is extremely difficult because words have different
meanings and it all depends on the context. For the right answer, Watson had
to understand the context.
Variability is often confused with variety. Say you have a bakery that sells 10
different breads. That is variety. Now imagine you go to that bakery three
days in a row and every day you buy the same type of bread, but each day
it tastes and smells different. That is variability.
Variability is thus very relevant in performing sentiment analysis. Variability
means that the meaning is changing (rapidly). In (almost) the same tweets,
a word can have a totally different meaning. To perform a proper
sentiment analysis, algorithms need to understand the context
and decipher the exact meaning of a word in that context. This
is still very difficult.
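A small sketch of why context matters: a scorer that gives each word one fixed sentiment value cannot tell two meanings of the same word apart. The lexicon, the scores and both tweets are made up for illustration; this is not a real sentiment-analysis method.

```python
# Hypothetical mini-lexicon: every word gets one fixed sentiment score.
LEXICON = {"sick": -1, "great": 1, "terrible": -1, "love": 1}


def naive_sentiment(text: str) -> int:
    """Sum fixed per-word scores, ignoring context entirely."""
    words = [word.strip(".,!?") for word in text.lower().split()]
    return sum(LEXICON.get(word, 0) for word in words)


# Two (almost) identical tweets in which "sick" means opposite things.
tweet_praise = "That concert was sick!"       # slang: the concert was excellent
tweet_illness = "That concert made me sick."  # literal: the writer felt ill

score_praise = naive_sentiment(tweet_praise)
score_illness = naive_sentiment(tweet_illness)

# Both tweets receive the same score, although a human reads them differently.
print(score_praise, score_illness)
```

The scorer treats "sick" identically in both tweets, which is exactly the variability problem: the word's meaning depends on its context, and a fixed lexicon has no access to that context.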
Additional 4 Vs of Big Data
Visualization
This is the hard part of big data: making that vast amount of data
comprehensible in a manner that is easy to understand and read.
With the right analyses and visualizations, raw data can be put to
use; otherwise, raw data remains essentially useless. Visualizations
of course do not mean ordinary graphs or pie charts. They mean
complex graphs that can include many variables of data while still
remaining understandable and readable.
Visualizing might not be the most technologically difficult part, but it
is surely the most challenging part. Telling a complex story in a graph is
very difficult but also crucial. Luckily, there are more
and more big data startups appearing that focus on this aspect
and in the end, visualizations will make the difference. One of
them is [], and in the future this will be the direction to go, where visualizations
help organisations answer questions they did not know to ask.
Additional 4 Vs of Big Data
Value
All that available data will create a lot of value for organisations,
societies and consumers. Big data means big business, and every
industry will reap the benefits from big data. McKinsey states that
the potential annual value of big data to US health care is $300
billion, more than double the total annual health care spending of
Spain. They also mention that big data has a potential annual value
of €250 billion to Europe's public sector administration. Moreover,
in their well-regarded report from 2011, they state that the
potential annual consumer surplus from using personal location data
globally can be up to $600 billion in 2020. That is a lot of value.
Of course, data in itself is not valuable at all. The value is in the
analyses done on that data and in how the data is turned into
information and eventually into knowledge. The value is in
how organisations will use that data and turn themselves into
information-centric companies that rely on insights derived from
data analyses for their decision-making.
Data
A collection of facts and statistics gathered
together for reference or analysis.
The two major categories are qualitative and
quantitative data.
Qualitative data is everything that refers to the
quality of something: a description of the colours,
texture and feel of an object, a description of
experiences, and an interview are all qualitative
data.
Quantitative data is data that refers to a
number, e.g. the number of golf balls, the size,
the price, a score on a test, etc.
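The two categories can be illustrated with one small record. This is a sketch under assumptions: the field names and the "dimpled" texture value are hypothetical, and treating "is a number" as the test for quantitative data is a simplification.

```python
# Hypothetical observation about a golf ball, mixing both kinds of data.
golf_ball = {
    "colour": "white",     # qualitative: a description
    "texture": "dimpled",  # qualitative: a description (illustrative value)
    "diameter_mm": 43,     # quantitative: a measurement
    "price_aud": 0.5,      # quantitative: a price
}


def is_quantitative(value) -> bool:
    """Treat ints and floats (but not booleans) as quantitative."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


quantitative = {k: v for k, v in golf_ball.items() if is_quantitative(v)}
qualitative = {k: v for k, v in golf_ball.items() if not is_quantitative(v)}

print("quantitative:", quantitative)
print("qualitative:", qualitative)
```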
Unstructured vs. Structured data
A plain sentence such as "we have 5 white used golf balls with a
diameter of 43mm at 50 cents each"
might be easy to understand for a human, but for a computer
it is hard to understand.
The above sentence is what we call unstructured data.
Unstructured data has no fixed underlying structure:
the sentence could easily be changed, and it's not clear
which word refers to what exactly.
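To make the difficulty concrete, the sketch below tries to pull the facts out of the sentence with a pattern written for one exact phrasing. The pattern, its group names and the reworded sentence are hypothetical, chosen only to show how easily such extraction breaks.

```python
import re

# Hypothetical pattern written for one exact phrasing of the sentence.
PATTERN = re.compile(
    r"we have (?P<quantity>\d+) (?P<color>\w+) (?P<condition>\w+) golf balls "
    r"with a diameter of (?P<diameter_mm>\d+)mm at (?P<price_cents>\d+) cents each"
)

original = ("we have 5 white used golf balls with a "
            "diameter of 43mm at 50 cents each")
# Same facts, slightly different wording.
reworded = ("we have 5 used golf balls, white, 43mm across, "
            "for 50 cents each")

match_original = PATTERN.search(original)
match_reworded = PATTERN.search(reworded)

# The original phrasing matches and yields named fields ...
print(match_original.groupdict())
# ... but the reworded sentence does not match at all.
print(match_reworded)
```

A human reads both sentences as the same statement; the pattern only understands the first, because nothing in the text itself says which word is the colour, the condition or the price.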
Unstructured vs. Structured data
If you want your computer to process and analyse your data, it has to be
able to read and process the data.
This means it needs to be structured and in a machine-readable form.
The same thing expressed as CSV can look something like:
"quantity","color","condition","item","category","diameter (mm)","price per unit (AUD)"
5,"white","used","ball","golf",43,0.5
Note that words have quotes around them:
this distinguishes them as text (string values in computer speak),
whereas numbers do not have quotes.
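A minimal sketch of how a program might read such a CSV, assuming the text fields are quoted as the note describes. It uses Python's standard `csv` module; the `QUOTE_NONNUMERIC` setting tells the reader to convert every unquoted field to a float, so the quotes are what separate text from numbers.

```python
import csv
import io

# The golf ball example as CSV text, with text values in quotes.
CSV_TEXT = (
    '"quantity","color","condition","item","category",'
    '"diameter (mm)","price per unit (AUD)"\n'
    '5,"white","used","ball","golf",43,0.5\n'
)

# QUOTE_NONNUMERIC: unquoted fields are converted to float, quoted ones stay text.
reader = csv.reader(io.StringIO(CSV_TEXT), quoting=csv.QUOTE_NONNUMERIC)
header = next(reader)
row = next(reader)

# Pair each column name with its value.
record = dict(zip(header, row))
for name, value in record.items():
    print(f"{name}: {value!r} ({type(value).__name__})")
```

Because every value now sits under a named column, the computer no longer has to guess which word refers to what.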
Data : Types
Data Refined