Big Data and Big Data Analytics
Facts, Opportunities, Challenges,
and Applications
Moutaz Haddara
Agenda
o Fundamentals
o Challenges
o Opportunities & Applications
Fundamentals:
Big Data
There are some things that are so
big that they have implications for
everyone, whether we want it or
not.
Big Data is one of those things, and
is completely transforming the way
we do business and is impacting
most other parts of our lives.
The basic idea behind the phrase 'Big
Data' is that everything we do is
increasingly leaving a digital trace (or
data), which we (and others) can use and
analyse.
Big Data therefore refers to our ability to
make use of the ever-increasing volumes
of data.
Why big data?
Big Data - size doesn’t matter
• Big data is a broad term for data sets so large or
complex that traditional data processing applications are
inadequate.
Big Data: the three V’s
• "Big Data are high-volume, high-velocity, and/or
high-variety information assets that require new
forms of processing to enable enhanced
decision making, insight discovery and process
optimization"
Gartner 2012
Big Data: 3V’s
Big Data
Big Data Characteristics
• High-volume
– Amount of data
• High-velocity
– Rate at which data is collected, acquired, generated, or processed
• High-variety
– Different data types, such as audio, video, and image data
The three V’s of big data constitute a comprehensive definition:
• Volume
• Velocity
• Variety
Currently, three more V’s are also considered:
• Value
• Veracity
• Viability
Big Data
Volume…
…refers to the vast amounts of data generated
every second. We are not talking Terabytes but
Zettabytes or Brontobytes. If we take all the
data generated in the world between the
beginning of time and 2008, the same amount
of data will soon be generated every minute.
New big data tools use distributed systems so
that we can store and analyse data across
databases that are dotted around anywhere in
the world.
Velocity…
…refers to the speed at which new data is
generated and the speed at which data moves
around. Just think of social media messages
going viral in seconds. Technology allows us
now to analyse the data while it is being
generated (sometimes referred to as in-memory
analytics), without ever putting it into
databases.
Variety…
…refers to the different types of data we can
now use. In the past we only focused on
structured data that neatly fitted into tables or
relational databases, such as financial data. In
fact, 80% of the world’s data is unstructured
(text, images, video, voice, etc.) With big data
technology we can now analyse and bring
together data of different types such as
messages, social media conversations, photos,
sensor data, video or voice recordings.
Veracity…
…refers to the messiness or trustworthiness of
the data. With many forms of big data quality
and accuracy are less controllable (just think of
Twitter posts with hash tags, abbreviations,
typos and colloquial speech as well as the
reliability and accuracy of content) but
technology now allows us to work with this
type of data.
Viability…
…refers to testing hypotheses before investing in data
analytics projects, for example by identifying the variables
needed for analysis. From http://www.wired.com/insights/2013/05/the-missing-vs-in-big-data-viability-and-value/
Value…
…refers to the value acquired through analysing big
data…
It is all good having access to big data but unless we can
turn it into value it is useless. So you can safely argue
that 'value' is the most important V of Big Data. It is
important that businesses make a business case for any
attempt to collect and leverage big data. It is so easy to
fall into the buzz trap and embark on big data initiatives
without a clear understanding of costs and benefits.
Volume (Scale)
• Data Volume
– 44x increase from 2009 to 2020
– From 0.8 zettabytes (ZB) to 35 ZB
• Data volume is increasing exponentially
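The growth figures above can be checked with a little arithmetic. The sketch below assumes steady compound growth between 2009 and 2020 (an assumption for illustration, not a claim from the slide); the function names are hypothetical.

```python
def growth_factor(start: float, end: float) -> float:
    """How many times larger the end value is than the start value."""
    return end / start


def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Compound annual growth rate, assuming the same rate every year."""
    return (end / start) ** (1 / years) - 1


START_ZB = 0.8          # data volume in 2009, in zettabytes (from the slide)
END_ZB = 35.0           # projected data volume in 2020, in zettabytes
YEARS = 2020 - 2009     # 11 years

factor = growth_factor(START_ZB, END_ZB)
rate = annual_growth_rate(START_ZB, END_ZB, YEARS)
print(f"Growth factor: {factor:.2f}x, implied annual growth: {rate:.1%}")
```

The factor comes out close to the quoted 44x, and the implied yearly rate is what "increasing exponentially" means in practice under this assumption.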
Exponential increase in collected/generated data
• 30 billion RFID tags today (1.3B in 2005)
• 4.6 billion camera phones world wide
• 12+ TBs of tweet data every day
• 100s of millions of GPS enabled devices sold annually
• ? TBs of data every day
• 25+ TBs of log data every day
• 2+ billion people on the Web by end 2011
• 76 million smart meters in 2009… 200M by 2014
Variety (Complexity)
• Relational Data (Tables/Transaction/Legacy Data)
• Text Data (Web)
• Semi-structured Data (XML)
• Graph Data
– Social Network, Semantic Web (RDF), …
• Streaming Data
– You can only scan the data once
• A single application can be generating/collecting
many types of data
• Big Public Data (online, weather, finance, etc)
To extract knowledge, all these types of
data need to be linked together
A Single View to the Customer
• Social Media
• Banking / Finance
• Gaming
• Entertain
• Purchase
• Our Known History
Velocity (Speed)
• Data is being generated fast and needs to be processed fast
• Online Data Analytics
• Late decisions → missing opportunities
• Examples
– E-Promotions: Based on your current location, your purchase
history, what you like → send promotions right now for store next to
you
– Healthcare monitoring: sensors monitoring your activities and
body → any abnormal measurements require immediate reaction
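The healthcare example above can be sketched as a check that runs on each reading as it arrives, rather than after the data has been stored. This is a minimal illustration only: the thresholds and the sample heart-rate values are hypothetical, and a real monitoring system would be far more careful.

```python
from typing import Iterable, Iterator, Tuple


def abnormal_readings(
    readings: Iterable[float], low: float, high: float
) -> Iterator[Tuple[int, float]]:
    """Yield (position, value) as soon as a reading falls outside [low, high]."""
    for index, value in enumerate(readings):
        if value < low or value > high:
            yield index, value


# Hypothetical heart-rate stream (beats per minute) and hypothetical limits.
heart_rate = [72, 75, 71, 140, 74, 38]
alerts = list(abnormal_readings(heart_rate, low=50, high=120))
print(alerts)
```

Because the function is a generator, each alert is produced the moment the abnormal value is seen, which is the point of the velocity requirement.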
Real-time/Fast Data
• Mobile devices (tracking all objects all the time)
• Social media and networks (all of us are generating data)
• Scientific instruments (collecting all sorts of data)
• Sensor technology and networks (measuring all kinds of data)
• Progress and innovation are no longer hindered by the ability to collect data
• But, by the ability to manage, analyze, summarize, visualize, and discover
knowledge from the collected data in a timely manner and in a scalable fashion
Real-Time Analytics/Decision Requirement
Customer
• Influence Behavior
• Product Recommendations that are Relevant & Compelling
• Learning why Customers Switch to competitors and their offers; in time to Counter
• Friend Invitations to join a Game or Activity that expands business
• Improving the Marketing Effectiveness of a Promotion while it is still in Play
• Preventing Fraud as it is Occurring & preventing more proactively
Harnessing Big Data
• OLTP: Online Transaction Processing (DBMSs)
• OLAP: Online Analytical Processing (Data Warehousing)
• RTAP: Real-Time Analytics Processing (Big Data Architecture & technology)
What’s driving Big Data
- Optimizations and predictive analytics
- Complex statistical analysis
- All types of data, and many sources
- Very large datasets
- More of a real-time
- Ad-hoc querying and reporting
- Data mining techniques
- Structured data, typical sources
- Small to mid-size datasets
Types of (Big) Data
• Structured
– Relational Data
• Tables/Transaction/Legacy Data
• Semi-structured
– Text Data
• Emails, comments, tweets, etc.
• XML
• Unstructured
– Video
– Audio
– Graph Data
• Social Networks
Trends:
• SPIMES (Space & Time-Spatiotemporal data)
• Streaming Data
– You can only scan the data once
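"You can only scan the data once" means statistics must be updated incrementally, without keeping the stream. A minimal sketch using Welford's one-pass method for the mean and variance (the class name and sample values are illustrative assumptions):

```python
class RunningStats:
    """One-pass mean and population variance (Welford's method)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        """Fold one new observation into the statistics, then discard it."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        """Population variance of everything seen so far."""
        return self._m2 / self.count if self.count else 0.0


stats = RunningStats()
for reading in [2, 4, 4, 4, 5, 5, 7, 9]:  # each value is seen exactly once
    stats.update(reading)
print(stats.mean, stats.variance)
```

Memory use stays constant no matter how long the stream runs, since only three numbers are kept.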
(Big) Data Features
1. Digitally generated
– The data are created digitally and can be stored using a series of ones and
zeros, and thus can be manipulated by computers.
2. Passively produced
– A by-product of our daily lives or interaction with digital services
3. Automatically collected
– There is a system in place that extracts and stores the relevant data as it is
generated
4. Geographically or temporally trackable
– e.g. mobile phone location data or call duration time
5. Continuously analyzed
– i.e. information is relevant to human well-being and development and can be
analyzed in real-time.
Can we avoid Big Data? YES, but
• Pay cash for everything!
• Never go online!
– Never shop online!
• Do not use a mobile phone!
• Do not use (any) cards!
• Do not fill any forms!
• Do not use GPS!
• Do not track anything!
– E.g., CCTV, RFID
• Do not use transactional systems
– e.g., ERP, CRM!
Fundamentals:
Big Data Analytics (BDA)
Big Data Analytics?
Big data analytics is about two
things: big data and analytics.
Analytics
It is the discovery of hidden patterns and unknown
facts in (big) data.
Synonymous with Data Mining
Analytics
More numbers…..
Turning Big Data into Value: The datafication of our world gives us unprecedented amounts of data
in terms of Volume, Velocity, Variety and Veracity. The latest technology such as cloud computing and
distributed systems together with the latest software and analysis approaches allow us to leverage all
types of data to add value.
The ‘Datafication’ of our World:
• Activities
• Conversations
• Words
• Voice
• Social Media
• Browser logs
• Photos
• Videos
• Sensors
• Etc.
Big Data:
• Volume
• Velocity
• Variety
• Veracity
Analysing Big Data:
• Text analytics
• Sentiment analysis
• Face recognition
• Voice analytics
• Movement analytics
• Etc.
Value
Big Data
Opportunities
Potentials…..
• Customer Intelligence
• Supply Chain
• Quality Management
• Risk Management and Fraud Detection
• Smart Cities
Customer Intelligence
• Big data analytics can provide organizations with the ability to profile
and segment customers based on different socioeconomic
characteristics, as well as increase levels of customer satisfaction
and retention.
• This can allow them to make more informed marketing decisions,
and market to different segments based on their preferences along
with the recognition of sales and marketing opportunities.
• By performing sentiment analysis on this data, firms can be alerted
beforehand when customers are turning against them or shifting to
different products, and accordingly take action.
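As a rough illustration of the sentiment analysis mentioned above, the sketch below scores messages against two small word lists and reports the share of negative messages. The word lists, function names, and sample messages are all hypothetical; production systems typically use trained models rather than fixed lexicons.

```python
from typing import List

# Hypothetical, deliberately tiny lexicons for illustration only.
POSITIVE = {"great", "love", "good", "happy", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "slow", "broken"}


def sentiment_score(text: str) -> int:
    """Positive words add one, negative words subtract one."""
    words = [word.strip(".,!?").lower() for word in text.split()]
    return sum((word in POSITIVE) - (word in NEGATIVE) for word in words)


def share_negative(messages: List[str]) -> float:
    """Fraction of messages whose overall score is below zero."""
    if not messages:
        return 0.0
    negative = sum(1 for message in messages if sentiment_score(message) < 0)
    return negative / len(messages)


posts = ["I love this, great service!", "Terrible and slow app", "It is okay"]
print([sentiment_score(post) for post in posts], share_negative(posts))
```

Tracking `share_negative` over time is one simple way a firm could notice that customer opinion is turning.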
Supply Chain
• Big data analytics can be used to forecast demand changes, and
accordingly match their supply. This can increasingly benefit the
manufacturing, retail, as well as transport and logistics industries.
• By analyzing stock utilization and geospatial data on deliveries,
organizations can automate replenishment decisions, which will reduce lead
times and minimize costs and delays, as well as process interruptions.
• Furthermore, alternate pricing scenarios can be run instantly, which can
enable a reduction in inventories and an increase in profit margins.
• Accordingly, big data can lead to the identification of the root causes of cost,
and provide for better planning and forecasting.
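An automated replenishment decision of the kind described above can be as simple as a reorder-point rule. The sketch below uses the common textbook formula (expected demand during the lead time plus safety stock); the figures are hypothetical and the slides do not prescribe this particular rule.

```python
def reorder_point(avg_daily_demand: float, lead_time_days: float,
                  safety_stock: float) -> float:
    """Stock level at which a new order should be placed."""
    return avg_daily_demand * lead_time_days + safety_stock


def needs_replenishment(on_hand: float, avg_daily_demand: float,
                        lead_time_days: float, safety_stock: float) -> bool:
    """True when current stock has fallen to or below the reorder point."""
    return on_hand <= reorder_point(avg_daily_demand, lead_time_days,
                                    safety_stock)


# Hypothetical item: 20 units sold per day, 5-day lead time, 30 units buffer.
threshold = reorder_point(avg_daily_demand=20, lead_time_days=5, safety_stock=30)
print(threshold, needs_replenishment(120, 20, 5, 30))
```

In a big data setting, the demand and lead-time inputs would come from the forecasts and delivery data rather than being fixed constants.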
Quality Management
• Especially for the manufacturing, energy and utilities, and
telecommunications industries, big data can be used for quality
management, in order to increase profitability and reduce costs by
improving the quality of goods and services provided.
• In the manufacturing process, predictive analytics on big data can
be used to minimize the performance variability, as well as
prevent quality issues by providing early warning alerts.
• Real-time data analyses and monitoring of machine logs can
enable managers to make swifter decisions for quality management.
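One simple way to produce the early warning alerts mentioned above is a control chart: limits are derived from a baseline of known-good measurements and new values outside them are flagged. This is a sketch under that assumption, with hypothetical measurements; the slides do not name a specific technique.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def control_limits(baseline: Sequence[float], k: float = 3.0) -> Tuple[float, float]:
    """Lower and upper limits at k sample standard deviations from the mean."""
    centre = mean(baseline)
    spread = stdev(baseline)
    return centre - k * spread, centre + k * spread


def out_of_control(measurements: Sequence[float],
                   limits: Tuple[float, float]) -> List[int]:
    """Positions of measurements that fall outside the control limits."""
    lower, upper = limits
    return [i for i, value in enumerate(measurements)
            if not lower <= value <= upper]


# Hypothetical part widths (mm): a stable baseline, then new production data.
baseline = [10.0, 10.2, 9.8, 10.1, 9.9]
limits = control_limits(baseline)
flagged = out_of_control([10.1, 9.7, 10.9, 10.0], limits)
print(limits, flagged)
```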
Risk Management & Fraud Detection
• Industries such as investment or retail banking, as well as insurance, can
benefit from big data analytics in the area of risk management. Since the
evaluation and bearing of risk is a critical aspect for the financial services
sector, big data analytics can help in selecting investments by analyzing the
likelihood of gains against the likelihood of losses.
• Internal and external big data can be analyzed for the full and dynamic
appraisal of risk exposures
• High-performance analytics can also be used to integrate the risk profiles
managed in isolation across separate departments, into enterprise wide risk
profiles.
• In the banking and insurance industries, big data analytics can be used to detect
and prevent fraud. Analytics are already commonly used in automated
fraud detection, but organizations and sectors are looking towards
harnessing the potentials of big data in order to improve their systems.
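Weighing the likelihood of gains against the likelihood of losses, as described in the first bullet above, is at its simplest an expected-value calculation. The sketch below uses hypothetical probabilities and payoffs; real risk models consider far more than the expected value.

```python
from typing import List, Tuple


def expected_return(outcomes: List[Tuple[float, float]]) -> float:
    """Probability-weighted payoff; outcomes are (probability, payoff) pairs."""
    total_probability = sum(probability for probability, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(probability * payoff for probability, payoff in outcomes)


# Hypothetical investment: 60% chance to gain 100, 40% chance to lose 50.
investment = [(0.6, 100.0), (0.4, -50.0)]
print(expected_return(investment))
```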
Smart Cities
Smart cities make optimal use of all the interconnected
information available today to better understand and
control their operations and optimize the use of limited
resources.
IBM
Photo: Riox.io
The need for smarter cities
• Growing population
– According to Frost and Sullivan, 60% of the world’s population is
expected to live in urban environments by 2025.
• Resources management (water and energy use)
• Global warming (carbon emissions)
• Tighter city budgets
• Aging infrastructures
How can Big data help the
different sectors mentioned?
The answer is simply..
Big Data Analytics
Mega Smart Cities
Songdo
Photo: Coree Magazine
Smart City Projects
Some Examples
iNeighbour TV- healthcare
The couch still rules!!
Smart public transit- transportation
• Intermittent bus lanes in Lisbon, Portugal
– Bus/HOV lanes, though they improve traffic flow, are often empty
– Wireless sensors in the ground detect presence of public
transport in the bus lanes, so that lanes are only reserved when
public transit vehicles are approaching
Techne Summit 2015
Vehicles of tomorrow, or actually of today!
• Researchers at Korea’s Advanced Institute of Science and
Technology (KAIST) recently constructed a seven and a half mile
stretch of asphalt roadway with specialized electric cables designed
to power batteries on a moving passenger bus.
Photo: BBC
Challenges
DRIP Syndrome
The majority of organizations suffer from being “data rich & information
poor”.
The hardest question…
With the data reaching exorbitant levels, utilities are now
grappling with a critical, looming issue: how do you convert
this data into meaningful business intelligence to deliver
ROI, value, and profit to our organisation?
Challenges facing organisations
• What actionable insights can be derived from data
analytics?
• How can we have efficient marketing?
• How to forecast production?
• How to forecast maintenance?
• How to increase customer satisfaction?
• Etc…..
More Challenges…..
Storing unprecedented volumes of data
Scalability
Meta data management
Metadata and Semantics for describing content
Bigger Data are not always Better data
Right-sizing
Just because it is accessible doesn’t make it ethical.
Limited access to big data creates new digital divides.
Governing data collection
Effective governance of data resources
More Challenges…..
Integration
<10% of Big Data is relational
Complex data integration
Security
Security of source data, processing, and of “knowledge”
Privacy
Cost of analytics
Interpretations of outcomes!
Availability of the data scientist
HR’s!
Big Data
Infrastructure &
Example Technologies
Big Data: 3 worlds
Storage
• DBMS, Data marts, EDW
• Distributed systems
• MPP databases
• Non-relational DBs (NoSQL)
• In-memory databases (HANA)
• HDFS
Processing
• Hadoop / MapReduce
Analytics
• Association Rules, Clustering, Classification & Decision Trees, Regression
• Social Media & Social Network Analysis
• Text Mining
• Sentiment Analysis
• Data Visualization (ADV)
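The MapReduce model listed under Processing can be illustrated with the classic word-count example. The sketch below runs in a single Python process and does not use Hadoop's API; the function names are hypothetical and only mirror the map, shuffle, and reduce steps of the programming model.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    pairs = [pair for document in documents for pair in map_phase(document)]
    return reduce_phase(shuffle(pairs))


counts = word_count(["big data", "Big analytics"])
print(counts)
```

In a real cluster the map and reduce calls run in parallel on different machines, which is why each step only depends on its own inputs.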
Big Data tools
• Hadoop
• Teradata DBMS
• Aster Data
• Data Mining
• TWM
• Rapid Miner
• Dashboard
• Tableau
From Big Data 1.0 to Big Data 2.0
• One way to think about the state of big data technologies is to draw an analogy
with the business adoption of Internet technologies.
• In Web 1.0, businesses busied themselves with getting the basic internet
technologies in place, so that they could establish a web presence, build
electronic commerce capability, and improve the efficiency of their operations.
• Once firms had incorporated Web 1.0 technologies thoroughly (and in the
process had driven down prices of the underlying technology) they started to
look further. They began to ask what the Web could do for them, and how it
could improve things they’d always done—and we entered the era of Web 2.0,
where new systems and companies began taking advantage of the interactive
nature of the Web.
From Big Data 1.0 to Big Data 2.0
• We should expect a Big Data 2.0 phase to follow Big Data 1.0. Once
firms have become capable of processing massive data in a flexible
fashion, they should begin asking: “What can I now do that I couldn’t
do before, or do better than I could do before?” This is likely to be
the golden era of data science.
Current & Emerging
Technologies &
Enablers
In-memory Analytics
• In-memory analytics is an enterprise architecture (EA) framework
solution used to enhance business intelligence (BI) reporting by
querying data from system memory (RAM), versus the traditional
hard disk drive medium. This approach significantly reduces
querying time in an effort to facilitate efficient business decisions.
• Big Players: SAP (HANA Platform), IBM, Tibco, SAS, Oracle,
Tableau, RapidMiner.
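The idea of querying data held in RAM rather than on disk can be tried with Python's built-in `sqlite3` module, which supports a purely in-memory database. SQLite is used here only as a convenient stand-in, not as one of the products listed above; the table and sample rows are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def revenue_by_region(rows: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Load rows into an in-memory database and aggregate them by region."""
    # The special name ":memory:" keeps the whole database in RAM.
    connection = sqlite3.connect(":memory:")
    try:
        connection.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        connection.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        cursor = connection.execute(
            "SELECT region, SUM(amount) FROM sales "
            "GROUP BY region ORDER BY region"
        )
        return cursor.fetchall()
    finally:
        connection.close()


sample_sales = [("north", 100.0), ("south", 80.0), ("north", 150.0)]
totals = revenue_by_region(sample_sales)
print(totals)
```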
IoT
• The internet of things (IoT) is the internetworking of
physical devices, vehicles (also referred to as
"connected devices" and "smart devices"), buildings and
other items—embedded with electronics, software,
sensors, actuators, and network connectivity that enable
these objects to collect and exchange data.
IoT Applications.. Examples
• Supply Chain Management
• Industry 4.0 (a.k.a. Smart Factories, Factories
of the Future, FoF);
• Smart Cities;
• Smart Buildings;
• Smart Homes;
• Smart Vehicles;
• Smart etc…
Industry 4.0
• The term Industry 4.0 originates from a project in the high-tech strategy of
the German government. This project advocates the
computerization of the manufacturing industry. It is also known as
the 4th industrial revolution.
• More precisely, Industry 4.0 is based on the technological concepts of
cyber-physical systems and the Internet of Things (IoT), which enable the
Factory of the Future (FoF).
• Within the modular structured smart factories of Industry 4.0, cyber-
physical systems monitor physical processes, create a virtual copy of the
physical world and make decentralized decisions. Over the IoT, cyber-
physical systems communicate and cooperate with each other and with
humans in real time.
Industrial Revolution Waves
• 1st
– Steam Power
– Large-scale manufacture of machine tools
• 2nd
– Electrical Power
– Mass Production
• 3rd
– Electronics
– Programmable logic controller (PLC)
– Internet & IT/IS Systems
• 4th
– Industry 4.0
– IoT
– FoF
Industry 4.0
• Nowadays, machines are connected as a collaborative community, creating the
so-called Industry 4.0.
• Achieving faultless interaction between surrounding machines and their
corresponding systems turns regular machines into self-aware or smart
machines, and subsequently improves their performance in the environment
where they operate.
• IoT has been an enabler to Industry 4.0. IoT refers to the networked
interconnections of objects that are fitted with ubiquitous intelligence. IoT is
opening up huge opportunities for new applications, which promise a better quality of
life. Connecting physical objects to the Internet makes it possible
to access distant sensor data and control the physical world from a distance.
• A smart object is a term given to an embedded system, which is connected to
the Internet.
FoF
• The term FoF was first mentioned in 1986, when Irwin
Welber gave a keynote speech at the ’86 International Symposium on Robot
Manipulators.
• Welber (1987) explained that the FoF would look like a large-scale intelligent
machine, which operates with a highly integrated as well as organized
knowledge base. He also highlighted the need for both suppliers and customers
to become an integral part of the FoF environment.
• Becoming more and more adaptive is the primary motivation for manufacturing
enterprises to adopt the FoF initiative. This has become necessary because of the new
challenges manufacturing enterprises are facing, including the need to cut down
production costs, make customized products, and respond
to market changes faster than ever before.
Greening
Sustainable business, or green business, is an enterprise that has minimal negative
impact on the global or local environment, community, society, or economy.
Often, sustainable businesses have progressive environmental and human rights
policies. In general, a business is described as green if it matches the following four
criteria:
1. It incorporates principles of sustainability into each of its business decisions.
2. It supplies environmentally friendly products or services that replace demand for
nongreen products and/or services.
3. It is greener than traditional competition.
4. It has made an enduring commitment to environmental principles in its business
operations.
e.g. Greening of Supply Chains
Green Data Centers- energy
Photo: NCC
Green Universities
Source: Duurzaambedrijfsleven
Augmented Reality
• Augmented reality (AR) is a live direct or indirect view of a
physical, real-world environment whose elements are augmented
(or supplemented) by computer-generated sensory input such as
sound, video, graphics or GPS data.
• It is related to a more general concept called mediated reality, in
which a view of reality is modified by a computer or algorithms.
• As a result, the technology functions by enhancing one’s current
perception of reality.
• By contrast, virtual reality replaces the real world with a simulated
one.
Augmented Reality- Applications
• Gaming;
• Product customization;
• Product info & comparisons;
• Marketing;
• Retail;
• Real-time & location-based apps;
• And others.
Augmented Reality- Real-time Translation
Download Blippar
“What is the best measure for success?? Happiness”.
Richard Branson
Photo by DAVID ILIFF. License: CC-BY-SA 3.0
References
How Big Data And The Internet Of Things Create Smarter Cities
http://www.forbes.com/sites/bernardmarr/2015/05/19/how-big-data-and-the-internet-of-things-create-smarter-cities/
Smarter Cities
http://www.ibm.com/smarterplanet/us/en/smarter_cities/overview/
Frost & Sullivan (2014). “2014 Global Best-in-Class Smart City IntegratorProduct Leadership Award Visionary Innovation
Leadership Award”.
http://en.wikipedia.org/wiki/Masdar_City
https://en.wikipedia.org/wiki/Big_data
The Missing V’s in Big Data: Viability and Value.
http://www.wired.com/insights/2013/05/the-missing-vs-in-big-data-viability-and-value/
South Korea's $35 Billion 'Labor of Love’.
http://www.wsj.com/articles/SB10001424052702304579404579236150341041182
Korea Constructs Road That Wirelessly Charges Moving Electric Buses.
http://www.forbes.com/sites/williampentland/2013/08/11/korea-constructs-road-that-wirelessly-charges-moving-electric-buses/
Data Center for Facebook in Luleå.
http://www.ncc.se/en/our-projects/data-center-for-facebook/
Social TV.
http://socialitv.web.ua.pt/index.php/author/jferraz/
Thank you for your attention!
Questions?