Big Data Analytics
Sub Code : CS0444
Course Instructor : M J Yogesh
Designation: Assistant Professor
Department: Computer Science and Engg.
College: NIE, Mysore
Introduction
Big Data is a popular topic
Will it take away my current investment in Business Intelligence or
replace my organization?
How do I integrate my Data Warehouse and Business Intelligence
with Big Data?
How do I get started, so I can show some results?
What are the skills required?
What happens to data governance?
How do we deal with data privacy?
So, what is Big Data?
There are two common sources of data grouped
under the banner of Big Data.
First, we have a fair amount of data within the
corporation that, thanks to automation and access, is
increasingly shared.
This includes emails, mainframe logs, blogs, Adobe
PDF documents, business process events, and any
other structured, unstructured, or semi-structured
data available inside the organization.
Second, we are seeing a lot more data outside the
organization: some available publicly free of cost,
some based on paid subscription, and the rest
available selectively for specific business partners or
customers.
What are industries doing to create
new data?
Many organizations are trying to incentivize
customers to create new data.
For example, Foursquare (www.foursquare.com)
encourages me to document my visits to a set of
businesses advertised through Foursquare.
It provides me with points for each visit and rewards
me with the Mayor title if I am the most frequent
visitor to a specific business location.
Types of Big Data
Web and social media
Machine-to-Machine (M2M)
Big transaction data
Biometrics
Human Generated
Examples of Big Data
Social media text
Cell phone locations
Channel click information from set-top box
Web browsing and search
Product manuals
Communications network events
Call detail records (CDRs)
Radio Frequency Identification (RFID) tags
Maps
Traffic patterns
Weather data
Mainframe logs
Why is Big Data different from any
other data?
There are four Vs that characterize this data:
Volume
Velocity
Variety
Veracity
Volume
Most organizations were already struggling
with the increasing size of their databases as
the Big Data tsunami hit the data stores.
Organizations typically counted their data
storage for analytics infrastructure in
terabytes.
They have now graduated to applications
requiring storage in petabytes.
This data is straining the analytics
infrastructure in a number of industries.
Example
For a communications service provider (CSP)
with 100 million customers, the daily location
data could amount to about 50 terabytes.
If stored for 100 days, this data would occupy
about 5 petabytes.
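The arithmetic in this example can be checked with a short sketch. It assumes decimal units (1 petabyte = 1,000 terabytes); the function and variable names are illustrative and not part of the original example.

```python
# Back-of-the-envelope storage estimate for the CSP example.
# Assumption: decimal units, so 1 PB = 1,000 TB and 1 TB = 1e9 KB.
TB_PER_PB = 1_000
KB_PER_TB = 1e9


def retained_volume_pb(daily_tb: float, retention_days: int) -> float:
    """Total storage in petabytes for a daily volume kept for retention_days."""
    return daily_tb * retention_days / TB_PER_PB


customers = 100_000_000   # 100 million customers
daily_tb = 50             # daily location data in terabytes
retention_days = 100

total_pb = retained_volume_pb(daily_tb, retention_days)
per_customer_kb = daily_tb * KB_PER_TB / customers

print(f"Stored after {retention_days} days: {total_pb} PB")
print(f"Location data per customer per day: {per_customer_kb:.0f} KB")
```

The per-customer figure is a derived number, useful mainly as a sanity check on the daily total.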
Assignment 1
Choose an industry and find out the volume of data
it generates in a month.
Velocity
There are two aspects to velocity
(a) Throughput of data
(b) Latency
Throughput
Throughput represents the data moving through the pipes.
The amount of global mobile data is growing at a
78 percent compound annual growth rate and is
expected to reach 10.8 exabytes per month in
2016 as consumers share more pictures and
videos.
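To get a feel for what a 78 percent growth rate means, the sketch below works backward from the 2016 figure. It assumes the rate compounds annually and, purely for illustration, that the projection spans five years starting in 2011; neither the baseline year nor the intermediate values come from the slide.

```python
# Compound growth sketch. Assumptions (not from the slide): the 78 percent
# rate compounds once per year, and the projection runs 2011 -> 2016.
RATE = 0.78
TARGET_EB_2016 = 10.8   # exabytes per month, from the slide
YEARS = 5               # hypothetical projection window


def project(volume: float, annual_rate: float, years: int) -> float:
    """Volume after compounding annual_rate for the given number of years."""
    return volume * (1 + annual_rate) ** years


def implied_baseline(target: float, annual_rate: float, years: int) -> float:
    """Starting volume that grows into target after the given number of years."""
    return target / (1 + annual_rate) ** years


baseline = implied_baseline(TARGET_EB_2016, RATE, YEARS)
for offset in range(YEARS + 1):
    volume = project(baseline, RATE, offset)
    print(f"{2011 + offset}: {volume:.2f} EB per month")
```

At this rate the volume roughly doubles every 14 to 15 months, which is why throughput planning cannot rely on last year's capacity figures.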
To analyze this data, the corporate analytics
infrastructure is seeking bigger pipes and
massively parallel processing.
Latency
Latency is the other measure of velocity.
Analytics used to be a store-and-report
environment where reporting typically
contained data as of yesterday, popularly
represented as D-1.
Now, analytics is increasingly being
embedded in business processes using data-in-motion with reduced latency.
For example, Turn (www.turn.com) is conducting its
analytics in 10 milliseconds to place advertisements in
online advertising platforms.
Variety (Complexity)
Relational Data (Tables/Transaction/Legacy
Data)
Text Data (Web)
Semi-structured Data (XML)
Graph Data: Social Network, Semantic Web (RDF)
Streaming Data: you can only scan the data once
A single application can be generating/collecting
many types of data
Big Public Data (online, weather, finance, etc)
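The "scan once" constraint on streaming data shapes the algorithms that can be used: a statistic has to be updated as each item arrives, without storing or revisiting the stream. One common technique for this is Welford's online algorithm for mean and variance, sketched below. The readings are made-up sample values.

```python
from typing import Iterator


class RunningStats:
    """Mean and population variance computed in a single pass (Welford)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        return self._m2 / self.count if self.count else 0.0


def sensor_stream() -> Iterator[float]:
    """Stand-in for a live feed; each value is seen exactly once."""
    yield from [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]


stats = RunningStats()
for reading in sensor_stream():
    stats.update(reading)  # constant memory, no second pass

print(stats.count, stats.mean, stats.variance)
```

Memory use stays constant no matter how long the stream runs, which is the property a streaming platform needs.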
Figure: A Single View to the Customer (inputs: Banking, Finance, Social Media, Gaming, Entertain, Purchase, Our Known History)
Variety
As Data Warehouse technology was rapidly
introduced, the initial push was to create meta-models to represent all the data in one
standard format.
The data was compiled from a variety of
sources and transformed using ETL (Extract,
Transform, Load) or ELT (Extract the data and
Load it in the warehouse, then Transform it
inside the warehouse).
The basic premise was narrow variety and
structured content.
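The difference between ETL and ELT is only where the Transform step runs. The sketch below shows both orders on the same two rows, using an in-memory SQLite database as a stand-in for the warehouse. Table names, column names, and the sample rows are all hypothetical.

```python
import sqlite3

# Raw source rows: untrimmed names and amounts stored as text.
raw_rows = [("  Alice ", "12.50"), ("BOB", "7.25")]

conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse


def clean(customer: str, amount: str) -> tuple:
    """Transform step written in application code (used by ETL)."""
    return customer.strip().lower(), float(amount)


# ETL: Extract -> Transform outside the warehouse -> Load the clean rows.
conn.execute("CREATE TABLE sales_etl (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales_etl VALUES (?, ?)",
    [clean(c, a) for c, a in raw_rows],
)

# ELT: Extract -> Load the raw rows -> Transform with SQL inside the warehouse.
conn.execute("CREATE TABLE staging (customer TEXT, amount TEXT)")
conn.executemany("INSERT INTO staging VALUES (?, ?)", raw_rows)
conn.execute(
    """
    CREATE TABLE sales_elt AS
    SELECT LOWER(TRIM(customer)) AS customer,
           CAST(amount AS REAL)  AS amount
    FROM staging
    """
)

query = "SELECT customer, amount FROM {} ORDER BY customer"
etl_rows = conn.execute(query.format("sales_etl")).fetchall()
elt_rows = conn.execute(query.format("sales_elt")).fetchall()
print(etl_rows)
print(elt_rows)
```

Both paths end with the same clean table. ELT keeps the raw staging data in the warehouse, which can be convenient when the transformation rules are likely to change.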
Big Data has significantly expanded our
horizons, enabled by new data integration and
analytics technologies.
A number of call center analytics solutions are
seeking analysis of call center conversations
and their correlation with emails, trouble
tickets, and social media blogs.
The source data includes unstructured text,
sound, and video in addition to structured
data.
A number of applications are gathering data
from emails, documents, or blogs.
For example, Slice provides order analytics for
online orders (see www.slice.com for details).
Its raw data comes from parsing emails and
looking for information from a variety of
organizations: airline tickets, online bookstore
purchases, music download receipts, city
parking tickets, or anything you can purchase
and pay for that hits your email.
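As a rough illustration of turning an email into order data, the sketch below parses a made-up receipt with Python's standard email module and two regular expressions. It is not Slice's method; the message layout, addresses, and patterns are assumptions chosen for the example, and real receipts vary widely between merchants.

```python
import re
from email import message_from_string

# Hypothetical receipt; real merchant emails differ in layout and wording.
RAW_EMAIL = """\
From: orders@example-bookstore.com
To: reader@example.com
Subject: Your order #A1234 has shipped

Thank you for your purchase.
Order total: $42.99
"""

ORDER_ID = re.compile(r"order #(\w+)", re.IGNORECASE)
TOTAL = re.compile(r"total:\s*\$(\d+\.\d{2})", re.IGNORECASE)


def extract_order(raw: str) -> dict:
    """Pull merchant, order id, and total out of a plain-text receipt."""
    msg = message_from_string(raw)
    body = msg.get_payload()  # a str for non-multipart messages
    order_match = ORDER_ID.search(msg["Subject"] or "")
    total_match = TOTAL.search(body)
    return {
        "merchant": msg["From"],
        "order_id": order_match.group(1) if order_match else None,
        "total": float(total_match.group(1)) if total_match else None,
    }


order = extract_order(RAW_EMAIL)
print(order)
```

Fields that cannot be found are returned as None rather than guessed, which matters once veracity becomes a concern.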
Examples for Variety
Another example of enabling technology is IBM's InfoSphere Streams platform,
which has dealt with a variety of sources for real-time analytics and decision
making:
Medical instruments for neonatal analysis
Seismic data
CDRs (Call Detail Records)
Network events
RFID tags
Traffic patterns
Weather data
Mainframe logs
Voice in many languages
Video
Veracity
Unlike carefully governed internal data, most
Big Data comes from sources outside our
control and therefore suffers from significant
correctness or accuracy problems.
Veracity represents both the credibility of the
data source as well as the suitability of the
data for the target audience.
Source Credibility
If an organization were to collect product
information from third parties and offer it to
its contact center employees to support
customer queries, the data would have to be
screened for source accuracy and credibility.
Audience Suitability
We must also think about audience suitability
and how much truth can be shared with a
specific audience.
For example, if customer service has provided
inputs to engineering on product shortcomings
as seen at the customer touch points, this data
should be shared selectively, on a need-to-know basis.
Other data may be shared only with customers
who have valid contracts or other
prerequisites.
Some Make it 4Vs
Business Head Common Questions
The three most common questions have been
as follows:
1. What is Big Data and what are others doing
with it?
2. How do we build a strategic plan for Big Data
Analytics in response to a management
request?
3. How does Big Data change our analytics
organization and architecture?
Big Data Analytics
Big data is more real-time in
nature than traditional DW
applications
Traditional DW architectures (e.g.
Exadata, Teradata) are not well-suited for big data apps
Shared nothing, massively
parallel processing, scale out
architectures are well-suited for
big data apps
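The idea behind a shared-nothing, scale-out design can be sketched in a few lines: records are split across nodes by key, each node aggregates only its own partition, and the small partial results are merged at the end. The sketch below simulates the nodes as plain lists in one process; the record layout and node count are illustrative assumptions.

```python
import zlib
from collections import Counter


def partition(records, n_nodes):
    """Assign each record to a node by hashing its key (stable across runs)."""
    nodes = [[] for _ in range(n_nodes)]
    for key, value in records:
        node_id = zlib.crc32(key.encode("utf-8")) % n_nodes
        nodes[node_id].append((key, value))
    return nodes


def local_aggregate(node_records):
    """Work done on one node, using only the data that node owns."""
    totals = Counter()
    for key, value in node_records:
        totals[key] += value
    return totals


def merge(partials):
    """Combine the per-node results into the final answer."""
    result = Counter()
    for partial in partials:
        result.update(partial)  # Counter.update adds the counts together
    return result


records = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 1)]
nodes = partition(records, n_nodes=3)
partials = [local_aggregate(n) for n in nodes]  # would run in parallel on real nodes
totals = merge(partials)
print(dict(totals))
```

Because all records for a key land on the same node, the nodes never need to exchange data during aggregation, and adding nodes adds capacity.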
Big Data Technology
What's driving Big Data
- Optimizations and predictive analytics
- Complex statistical analysis
- All types of data, and many sources
- Very large datasets
- More of a real-time
- Ad-hoc querying and reporting
- Data mining techniques
- Structured data, typical sources
- Small to mid-size datasets
IBM Security Systems
What does Big Data trigger?
From Big Data and the Web: Algorithms for Data Intensive Scalable Computing, Ph.D Thesis, Gianmarco
2013 IBM Corporation
2012 IBM Corporation
What do revenues look like?
Drivers for Big Data
We are increasing the pace for Big Data creation.
There are three contributing factors:
Consumers
Automation
Monetization
More than each of these contributing factors, their interaction is
speeding the creation of Big Data.
With increasing automation, it is easier to offer Big Data creation and
consumption opportunities to the consumers and the monetization
process is increasingly providing an efficient marketplace for Big
Data.
Sophisticated Consumers
The increase in information level and the associated tools has
created a new breed of sophisticated consumers.
These consumers are far more analytic, far savvier at using
statistics, and far more connected, using social media to rapidly
collect and collate opinion from others.
We live in a world full of marketing messages.
While most of the marketing is still broadcast using newspaper,
magazine, network TV, radio, and display advertising, even in the
conventional media, narrow casting is gradually becoming more
prominent.
This is seen in local advertisement insertions in magazines,
insertion of narrow cast commercials using set-top boxes, and use of
commuter information to change street display ads.
The Internet world can become highly personalized.
Search engines, social network sites, and electronic
yellow pages insert advertisements specific to an
individual or to a micro-segment.
Internet cookies are increasingly used to track user
behavior and to tailor content based on this behavior.
Email and text messages rapidly led toward increased
interpersonal interactions.
Communication started not only with marketers but also
with third parties and friends.
Communication expanded to bulletin boards, group
chats, and social media, allowing us to converse
about our purchase intentions, fears, expectations,
and disappointments with small and large social
groups.
Unlike email and text, the conversations are on the
Web for others to read, either now or later.
The next sets of sources combine information from
more than one media.
For example, Facebook conversations involve a
number of media, including text, sound clips, photos,
and video.
Second world and alternate reality are
becoming interesting avenues for trying out
product ideas in a simulated world where
product usage can be experimented with.
We often need experts to help us sort out
product features and how they relate to our
product usage.
A large variety of experts are available today
to help us with usage, quality, pricing, and
value-related information about products.
A number of marketers are encouraging
advisor or ambassador programs using social
media sites.
These selected customers get a preview of
new products and actively participate in
evaluating and promoting new products.
At the end of the day, people we know and
trust sway our decisions.
This is the biggest contribution of social
networks.
They have brought consumers together such
that sharing customer experiences is now far
more frequent than ever before.
Many ways to utilize social networks to influence purchase
and reuse
Studying consumer experience: A fair amount of this data is unstructured.
By analyzing the text for sentiments, intensity, readership, related blogs, referrals,
and other information, we can organize the data into positive and negative
influences and their impact on the customer base.
Organizing customer experience: We can provide reviews to a prospective
buyer, so they can gauge how others evaluated the product.
Influencing social networks: We can provide marketing material, product
changes, company directions, and celebrity endorsements to social networks,
so that social media may influence and enhance the buzz.
Feedback to products, operations, or marketing: By using information
generated by social media, we can rapidly make changes in the product mix
and marketing to improve the offering to customers.
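Organizing unstructured text into positive and negative influences can be illustrated with a very small lexicon-based scorer. This is a deliberately naive sketch: the word lists and sample posts are invented, and production sentiment analysis typically uses trained models that handle negation, sarcasm, and context.

```python
# Tiny hand-made lexicons; real systems use far larger, weighted vocabularies.
POSITIVE = {"great", "love", "fast", "reliable"}
NEGATIVE = {"slow", "broken", "hate", "outage"}


def sentiment_score(text: str) -> int:
    """Count positive words minus negative words in the text."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    positives = sum(w in POSITIVE for w in words)
    negatives = sum(w in NEGATIVE for w in words)
    return positives - negatives


def label(text: str) -> str:
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


posts = [
    "Love the new phone, great battery!",
    "Another outage today. Network is slow and broken.",
    "Picked up my order this morning.",
]
labels = [label(p) for p in posts]
for post, lab in zip(posts, labels):
    print(f"{lab:8s} | {post}")
```

The absolute value of the score gives a crude measure of intensity, one of the attributes listed above.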
Figure: Real-Time Analytics/Decision Requirement (ways to influence customer behavior)
Product recommendations that are relevant and compelling
Improving the marketing effectiveness of a promotion while it is still in play
Learning why customers switch to competitors and their offers, in time to counter
Preventing fraud as it is occurring and preventing more proactively
Friend invitations to join a game or activity that expands business
Automation
Interactive Voice Response (IVR), kiosks, mobile devices, email,
chat, corporate Websites, third-party applications, and social
networks have generated a fair amount of event information about
the customers.
In addition, customer interactions via traditional media such as call
centers can now be analyzed and organized.
The biggest change is in our ability to modify the customer
experience using software policies, procedures, and personalization,
making self-service increasingly customer friendly.
Sales and marketing have received their biggest boost in
instrumentation from Internet-driven automation over the past 10
years.
Browsing, shopping, ordering, and customer service on the Web
not only has provided tremendous control to users but also has
created an enormous flood of information to the marketing, product,
and sales organization in understanding buyer behavior.
Each sequence of Web clicks can be collected, collated, and
analyzed for customer delight and outright defection.
More information can also be obtained about the sequence leading up
to a decision.
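Collecting and collating click sequences can be as simple as counting page-to-page transitions per session. The sketch below does this for three invented sessions and reports how many reached a goal page; the page names, session ids, and the choice of "checkout" as the goal are assumptions for illustration.

```python
from collections import Counter

# Hypothetical click sequences, one list of page names per session.
sessions = {
    "s1": ["home", "search", "product", "cart", "checkout"],
    "s2": ["home", "product", "cart"],
    "s3": ["home", "search", "product"],
}


def transition_counts(all_sessions: dict) -> Counter:
    """Count how often each page-to-page step occurs across sessions."""
    counts = Counter()
    for pages in all_sessions.values():
        counts.update(zip(pages, pages[1:]))
    return counts


def conversion_rate(all_sessions: dict, goal: str = "checkout") -> float:
    """Share of sessions that reached the goal page."""
    converted = sum(goal in pages for pages in all_sessions.values())
    return converted / len(all_sessions)


def last_page_of_dropouts(all_sessions: dict, goal: str = "checkout") -> Counter:
    """Where sessions that never reached the goal ended."""
    return Counter(p[-1] for p in all_sessions.values() if goal not in p)


transitions = transition_counts(sessions)
rate = conversion_rate(sessions)
dropouts = last_page_of_dropouts(sessions)
print(transitions.most_common(3))
print(f"Conversion rate: {rate:.0%}")
print(dict(dropouts))
```

The dropout counts point to where defection happens, while the transition counts show the sequence that leads up to a purchase.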
What are the sources of data from such self-service
interactions?
Product: As products become increasingly
electronic, they provide a lot of valuable data to
the supplier regarding product use and product
quality.
In many cases, suppliers can also collect
information about the context in which a
product was used.
Products can also supply information related to
frequency of use, interruptions, usage
skipping, and other related aspects.
Electronic touch points: A fair amount of data
can be collected from the touch points used for
product shopping, purchase, use, or payment.
IVR tree traversals can be logged, Web click
streams can be collected, and so on.
Components: Sometimes, components may
provide additional information. This information
could include data about component failures,
use, or lack thereof.
For example, a wireless CSP can collect data
from networks, cell towers, third parties, and
handheld devices to understand how all the
components together provided a good or bad
service to the customer.
Monetization
From a Big Data Analytics perspective, a data bazaar is the
biggest enabler to create an external marketplace, where we
collect, exchange, and sell customer information.
We are seeing a new trend in the marketplace, in which
customer experience from one industry is anonymized,
packaged, and sold to other industries.
Fortunately for us, Internet advertising came to our rescue in
providing an incentive to customers through free services and
across-the-board opt-ins.
Internet advertising is a remarkably complex field.
What is interesting is that this advertising money is enhancing
customer experience.
Big Data Analytics is creating a new market, where customer data from one
industry can be collected, categorized, anonymized, and repackaged for
sale to others:
1. Location: Location is increasingly available to suppliers. Assuming a
product is consumed in conjunction with a mobile device, the location of the
consumer becomes an important piece of information that may be available
to the supplier.
2. Cookies: Web browsers carry enormous information using web cookies.
Some of this may be directly associated with touch points.
3. Usage data: A number of data providers have started to collect,
synthesize, categorize, and package information for reuse. This includes
credit-rating agencies that rate consumers, social networks with blogs
published or "Like" clicked, and cable companies with audience
information.
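One step in repackaging customer data for sale is removing direct identifiers. The sketch below replaces the customer id with a keyed hash and drops the name and phone fields. Strictly speaking this is pseudonymization, which is weaker than full anonymization because combinations of the remaining fields can still identify a person. The record layout and the key are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be stored and rotated securely.
SECRET_KEY = b"replace-with-a-real-secret"

# Fields considered safe to pass on in this illustration.
SHARED_FIELDS = ("segment", "city")


def pseudonymize(customer_id: str) -> str:
    """Keyed hash of the id: stable for joins, not reversible without the key."""
    digest = hmac.new(SECRET_KEY, customer_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def repackage(record: dict) -> dict:
    """Keep only shareable fields and swap the id for its pseudonym."""
    shared = {field: record[field] for field in SHARED_FIELDS}
    shared["customer"] = pseudonymize(record["customer_id"])
    return shared


record = {
    "customer_id": "C-1001",
    "name": "Jane Doe",
    "phone": "555-0100",
    "segment": "frequent traveler",
    "city": "Mysore",
}
packaged = repackage(record)
print(packaged)
```

The same customer always maps to the same token, so a buyer can still link events for one consumer without learning who that consumer is.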
Big Data Analytics Applications
Here we discuss a number of important use
cases for Big Data Analytics.
In each case, Big Data Analytics is becoming
integrated with business processes and
traditional analytics to provide major outcomes.
In many cases, these use cases represent
game changers essential to the survival and
growth of an organization in an increasingly
competitive marketplace.
Some of these use cases are still in their
infancy, while others are becoming
increasingly commonplace.
Social Media Command Center
A Social Media Command Center combines
automated search and display of consumer
feedback expressed publicly on the social
media.
Often, the feedback is summarized in the form
of positive or negative sentiment.
Once the feedback is obtained, the marketer
can respond to specific comments by entering
into a conversation with the affected
consumers, whether to respond to questions
about an outage or obtain feedback about a
new product offering.
Figure: Gatorade Social Media Command Center
The marketing organization for Gatorade, a
sports drink product, decided to create a Social
Media Command Center to increase consumer
dialog with Gatorade.
Big Data Analytics can be used to monitor
social media for feedback on product, price,
and promotions as well as to automate the
actions taken in response to the feedback.
This may require communication with a
number of internal organizations, tracking a
product or service problem, and dialog with
customers as the feedback results in product
or service changes.
When consumers provide feedback, the dialog
Product Knowledge Hub
As consumers turn into sophisticated users of
technology and the marketplace becomes
specialized, the product knowledge seldom
belongs to one organization.
Take the Apple iPhone as an example.
The iPhone is marketed by Apple, but its parts
came from a large supply chain pool, the apps
running on the iPhone come from a large
community of app developers, and the
communications service is provided by a CSP.
Google's Android is even more diverse, as Google
provides the operating system while a cell phone
manufacturer makes the device.
The smartphones do not work in isolation.
They act as WiFi hubs for other devices.
So now, we are facing data that is characterized by both
variety and veracity. Can we use Big Data Analytics to
solve this problem?
What we have created is a knowledge hub, which can
now be used directly from a website or made available to
the call centers.
It significantly reduces call handling time in the call
centers and also increases first call resolution.
By placing the information on the web, we are now
promoting the CSP's website as the source of knowledge,
which increases web traffic and reduces the number of
people who resort to contacting the call center.
Once we have created a single source of knowledge, this
source can be used to upsell other products, connecting
usage knowledge to product features and using the
knowledge pool to discover new product or business
partnership ideas.
More Use Cases (SLE)
Infrastructure and Operations Studies
Product Selection, Design, and Engineering
Location-Based Services
Micro-Segmentation and Next Best Action
Online Advertising
Improved Risk Management
END OF UNIT-1