Unit – 1
(Learning Notes)
SYLLABUS:
Introduction to Big Data Analytics: Big Data Overview
State of the Practice in Analytics
Role of Data Scientists
Big Data Analytics in Industry Verticals
Big Data Overview
Data is created constantly, and at an ever-increasing rate. Mobile
phones, social media, and imaging technologies used to determine a medical
diagnosis all create new data, which must be stored somewhere for some
purpose. Devices and sensors automatically generate diagnostic information
that needs to be stored and processed in real time. Merely keeping up with
this huge influx of data is difficult, but substantially more challenging is
analysing vast amounts of it, especially when it does not conform to
traditional notions of data structure, to identify meaningful patterns and
extract useful information.
These challenges of the data deluge present the opportunity to
transform business, government, science, and everyday life. Several
industries have led the way in developing their ability to gather and exploit
data:
1. Credit card companies monitor every purchase their customers make
and can identify fraudulent purchases with a high degree of accuracy
using rules derived by processing billions of transactions.
2. Mobile phone companies analyse subscribers' calling patterns to
determine, for example, whether a caller's frequent contacts are on a
rival network. If that rival network is offering an attractive promotion
that might cause the subscriber to defect, the mobile phone company
can proactively offer the subscriber an incentive to remain in her
contract.
3. For companies such as LinkedIn and Facebook, data itself is their
primary product. The valuations of these companies are heavily derived
from the data they gather and host, which contains more and more
intrinsic value as the data grows.
Three attributes stand out as defining Big Data
characteristics:
1. Huge volume of data: Rather than thousands or millions of rows, Big
Data can be billions of rows and millions of columns.
2. Complexity of data types and structures: Big Data reflects the variety of
new data sources, formats, and structures, including the digital traces
left on the web and in other digital repositories for subsequent analysis.
3. Speed of new data creation and growth: Big Data can describe high-
velocity data, with rapid data ingestion and near-real-time analysis.
Definition:
Big Data is data whose scale, distribution, diversity, and/or
timeliness require the use of new technical architectures and
analytics to enable insights that unlock new sources of business
value.
Data Structures
Big Data can come in multiple forms, including structured and
unstructured data such as financial data, text files, multimedia files, and
genetic mappings. Contrary to much of the traditional data analysis
performed by organizations, most Big Data is unstructured or semi-structured
in nature, which requires different techniques and tools to process and
analyse. Distributed computing environments and massively parallel
processing (MPP) architectures that enable parallelized data ingest and
analysis are the preferred approach to process such complex data.
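The idea of parallelized analysis can be sketched in miniature with Python's
standard library: split the data into shards (partitions), process them
concurrently, and combine the per-shard results. The shard contents below
are invented placeholders, not data from the text.

    from concurrent.futures import ProcessPoolExecutor

    def analyse_shard(shard):
        # Each worker analyses its own partition independently.
        return sum(len(record) for record in shard)

    if __name__ == "__main__":
        # Four shards standing in for partitions of a much larger dataset.
        shards = [["alpha", "beta"], ["gamma"], ["delta", "epsilon"], ["zeta"]]
        with ProcessPoolExecutor() as pool:
            partial_results = list(pool.map(analyse_shard, shards))
        print(sum(partial_results))  # combine partial results, MPP-style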
1. STRUCTURED DATA: Data containing a defined data type, format, and
structure (that is, transaction data, online analytical processing [OLAP]
data cubes, traditional RDBMS, CSV files, and even simple
spreadsheets).
2. SEMI-STRUCTURED DATA: Textual data files with a discernible
pattern that enables parsing (such as Extensible Markup Language
[XML] data files that are self-describing and defined by an XML
schema).
3. QUASI-STRUCTURED DATA: Textual data with erratic data formats
that can be formatted with effort, tools, and time (for instance, web
clickstream data that may contain inconsistencies in data values and
formats).
4. UNSTRUCTURED DATA: Data that has no inherent structure, which
may include text documents, PDFs, images, and video.
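As a minimal illustration of how these levels of structure affect processing,
the Python sketch below parses one record of each kind: a structured CSV row,
a semi-structured XML fragment, and a quasi-structured clickstream log line.
The sample records and field names are invented for illustration.

    import csv, io, re
    import xml.etree.ElementTree as ET

    # Structured: a CSV row with a fixed, known schema.
    row = next(csv.reader(io.StringIO("1001,2024-01-15,249.99")))
    txn_id, txn_date, amount = row[0], row[1], float(row[2])

    # Semi-structured: self-describing XML, parsed by tag name.
    doc = ET.fromstring("<order><id>1001</id><total>249.99</total></order>")
    order_id = doc.findtext("id")

    # Quasi-structured: a clickstream line whose erratic format takes
    # effort (here, a regular expression) to recover fields from.
    line = "GET /products?id=42 HTTP/1.1 status=200 t=87ms"
    match = re.search(r"id=(\d+).*status=(\d+)", line)
    product_id, status = match.groups()

    print(txn_id, amount, order_id, product_id, status)

Unstructured data, such as free text or video, has no such record layout at
all and typically requires specialized processing (for example, natural
language processing or computer vision) before analysis.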
Analyst Perspective on Data Repositories
The introduction of spreadsheets enabled business users to create
simple logic on data structured in rows and columns and create their own
analyses of business problems. Database administrator training is not
required to create spreadsheets: They can be set up to do many things quickly
and independently of information technology (IT) groups. Spreadsheets are
easy to share, and end users have control over the logic involved. However,
their proliferation can result in "many versions of the truth." In other words,
it can be challenging to determine if a particular user has the most relevant
version of a spreadsheet, with the most current data and logic in it. Moreover,
if a laptop is lost or a file becomes corrupted, the data and logic within the
spreadsheet could be lost. This is an ongoing challenge because spreadsheet
programs such as Microsoft Excel still run on many computers worldwide.
With the proliferation of data islands (or "spreadmarts"), the need to centralize
the data is more pressing than ever.
State of the Practice in Analytics
Common business drivers for analytics include:
1. Optimize business operations
2. Identify business risk
3. Predict new business opportunities
4. Comply with laws or regulatory requirements
BI Versus Data Science
Although much is written generally about analytics, it is important to
distinguish between BI and Data Science. One way to evaluate the type of
analysis being performed is to examine the time horizon and the kind of
analytical approaches being used. BI tends to provide reports, dashboards,
and queries on business questions for the current period or in the past. BI
systems make it easy to answer questions related to quarter-to-date revenue,
progress toward quarterly targets, and how much of a given product was sold
in a prior quarter or year. These questions tend to be closed-ended and
explain current or past behaviour, typically by aggregating historical data
and grouping it in some way. BI provides hindsight and some insight and
generally answers questions related to "when" and "where" events occurred.
By comparison, Data Science tends to use disaggregated data in a more
forward-looking, exploratory way, focusing on analysing the present and
enabling informed decisions about the future. Rather than aggregating
historical data to look at how many of a given product sold in the previous
quarter, a team may employ Data Science techniques such as time series
analysis to forecast future product sales and revenue more accurately than
extending a simple trend line. In addition, Data Science tends to be more
exploratory in nature and may use scenario optimization to deal with more
open-ended questions.
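To make this contrast concrete, the sketch below compares extending a simple
trend line with a basic seasonal time series forecast (classical additive
decomposition). The quarterly sales figures are invented for illustration.

    import numpy as np

    # Eight quarters of product sales (illustrative numbers only).
    sales = np.array([120, 135, 160, 210, 130, 150, 175, 230], dtype=float)
    t = np.arange(len(sales))

    # "Extending a simple trend line": least-squares fit, projected ahead.
    slope, intercept = np.polyfit(t, sales, 1)
    future_t = np.arange(len(sales), len(sales) + 4)
    trend_forecast = slope * future_t + intercept

    # Time series refinement: average each quarter's deviation from the
    # trend and add it back to the projection (additive seasonality).
    residuals = sales - (slope * t + intercept)
    seasonal = residuals.reshape(-1, 4).mean(axis=0)  # one offset per quarter
    seasonal_forecast = trend_forecast + seasonal

    print("trend only:  ", trend_forecast.round(1))
    print("trend+season:", seasonal_forecast.round(1))

Even this toy example shows why a seasonal model can forecast sales more
accurately than a straight trend line when demand peaks in particular
quarters.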
This approach provides insight into current activity and foresight into
future events, while generally focusing on questions related to "how" and
"why" events occur. Where BI problems tend to require highly structured data
organized in rows and columns for accurate reporting, Data Science projects
tend to use many types of data sources, including large or unconventional
datasets.
Depending on an organization's goals, it may choose to embark on a BI
project if it is doing reporting, creating dashboards, or performing simple
visualizations, or it may choose Data Science projects if it needs to do a more
sophisticated analysis with disaggregated or varied datasets.
Current Analytical Architecture
For data sources to be loaded into the data warehouse, data needs to
be well understood, structured, and normalized with the appropriate
data type definitions. Although this kind of centralization enables
security, backup, and failover of highly critical data, it also means that
data typically must go through significant pre-processing and
checkpoints before it can enter this sort of controlled environment,
which does not lend itself to data exploration and iterative analytics.
As a result of this level of control on the EDW, additional local systems
may emerge in the form of departmental warehouses and local data
marts that business users create to accommodate their need for flexible
analysis. These local data marts may not have the same constraints for
security and structure as the main EDW and allow users to do some
level of more in-depth analysis. However, these one-off systems reside
in isolation, often are not synchronized or integrated with other data
stores and may not be backed up.
Once in the data warehouse, data is read by additional applications
across the enterprise for BI and reporting purposes. These are high-
priority operational processes getting critical data feeds from the data
warehouses and repositories.
At the end of this workflow, analysts get data provisioned for their
downstream analytics. Because users generally are not allowed to run
custom or intensive analytics on production databases, analysts create
data extracts from the EDW to analyse data offline in R or other local
analytical tools. Often, these tools are limited to in-memory analytics
on desktops, analysing samples of data rather than the entire
population of a dataset. Because these analyses are based on data
extracts, they reside in a separate location, and the results of the
analysis-and any insights on the quality of the data or anomalies-rarely
are fed back into the main data repository.
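A minimal sketch of this extract-and-sample workflow in Python with pandas;
the file name and the 1% sampling rate are assumptions for illustration.

    import pandas as pd

    # Hypothetical EDW extract, too large to load whole into memory.
    SAMPLE_FRACTION = 0.01
    pieces = []
    for chunk in pd.read_csv("edw_extract.csv", chunksize=100_000):
        # Keep ~1% of each chunk so the sample fits in desktop memory.
        pieces.append(chunk.sample(frac=SAMPLE_FRACTION, random_state=42))
    sample = pd.concat(pieces, ignore_index=True)
    print(f"analysing {len(sample)} sampled rows in memory")

As the challenges below note, working from such samples rather than the full
population is exactly the constraint that can skew model accuracy.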
Challenges in the current architecture:
1. The typical data architectures just described are designed for storing
and processing mission-critical data, supporting enterprise
applications, and enabling corporate reporting activities. Although
reports and dashboards are still important for organizations, most
traditional data architectures inhibit data exploration and more
sophisticated analysis. Moreover, traditional data architectures have
several additional implications for data scientists.
2. High-value data is hard to reach and leverage, and predictive analytics
and data mining activities are last in line for data. Because the EDWs
are designed for central data management and reporting, those wanting
data for analysis are generally prioritized after operational processes.
3. Data moves in batches from EDW to local analytical tools. This workflow
means that data scientists are limited to performing in-memory
analytics (such as with R, SAS, SPSS, or Excel), which will restrict the
size of the data sets they can use. As such, analysis may be subject to
constraints of sampling, which can skew model accuracy.
4. Data Science projects will remain isolated and ad hoc, rather than
centrally managed. The implication of this isolation is that the
organization can never harness the power of advanced analytics in a
scalable way, and Data Science projects will exist as nonstandard
initiatives, which are frequently not aligned with corporate business
goals or strategy.
Drivers of Big Data
To better understand the market drivers related to Big Data, it is helpful
to first understand the history of data stores and the kinds of repositories
and tools used to manage them. Key sources of the modern data deluge include:
1. Medical information, such as genomic sequencing and diagnostic
imaging
2. Photos and video footage uploaded to the World Wide Web
3. Video surveillance, such as the thousands of video cameras spread across a
city
4. Mobile devices, which provide geospatial location data of the users, as
well as metadata about text messages, phone calls, and application
usage on smartphones
5. Smart devices, which provide sensor-based collection of information
from smart electric grids, smart buildings, and many other public and
industry infrastructures
6. Non-traditional IT devices, including the use of radio-frequency
identification (RFID) readers, GPS navigation systems, and seismic
processing
Emerging Big Data Ecosystem and a New Approach to Analytics
Organizations and data collectors are realizing that the data they can
gather from individuals contains intrinsic value and, as a result, a new
economy is emerging. As this new digital economy continues to evolve, the
market sees the introduction of data vendors and data cleaners that use
crowdsourcing (such as Mechanical Turk and Galaxy Zoo) to test the
outcomes of machine learning techniques. Other vendors offer added value by
repackaging open source tools in a simpler way and bringing the tools to
market.
Vendors such as Cloudera, Hortonworks, and Pivotal have provided this
value-add for the open source framework Hadoop.
1. Data devices and the "Sensornet" gather data from multiple
locations and continuously generate new data about this data. For each
gigabyte of new data created, an additional petabyte of data is created
about that data.
a. For example, consider someone playing an online video game
through a PC, game console, or smartphone. In this case, the
video game provider captures data about the skill and levels
attained by the player. Intelligent systems monitor and log how
and when the user plays the game. As a consequence, the game
provider can fine-tune the difficulty of the game, suggest other
related games that would most likely interest the user, and offer
additional equipment and enhancements for the character based
on the user's age, gender, and interests. This information may get
stored locally or uploaded to the game provider's cloud to analyse
the gaming habits and opportunities for upsell and cross-sell and
identify archetypical profiles of specific kinds of users.
b. Smartphones provide another rich source of data. In addition to
messaging and basic phone usage, they store and transmit data
about Internet usage, SMS usage, and real-time location. This
metadata can be used for analysing traffic patterns by scanning
the density of smartphones in locations to track the speed of cars
or the relative traffic congestion on busy roads. In this way, GPS
devices in cars can give drivers real-time updates and offer
alternative routes to avoid traffic delays.
c. Retail shopping loyalty cards record not just the amount an
individual spends, but the locations of stores that person visits,
the kinds of products purchased, the stores where goods are
purchased most often, and the combinations of products
purchased together. Collecting this data provides insights into
shopping and travel habits and the likelihood of successful
advertisement targeting for certain types of retail promotions.
2. Data collectors are the entities that gather data from devices and their
users. Examples include:
a. A cable TV provider tracking the shows a person watches, which
TV channels someone will and will not pay to watch on demand,
and the prices someone is willing to pay for premium TV content.
b. Retail stores tracking the path a customer takes through their
store while pushing a shopping cart with an RFID chip, so they
can gauge which products get the most foot traffic using
geospatial data collected from the RFID chips.
3. Data aggregators make sense of the data collected from the various
entities of the "Sensornet" or the "Internet of Things." These
organizations compile data from the devices and usage patterns
collected by government agencies, retail stores, and websites. In turn,
they can choose to transform and package the data as products to sell
to list brokers, who may want to generate marketing lists of people who
may be good targets for specific ad campaigns.
a. Retail banks, acting as data buyers, may want to know which
customers have the highest likelihood of applying for a second
mortgage or a home equity line of credit. To provide input for this
analysis, retail banks may purchase data from a data aggregator.
This kind of data may include demographic information about
people living in specific locations; people who appear to have a
specific level of debt, yet still have solid credit scores (or other
characteristics such as paying bills on time and having savings
accounts) that can be used to infer creditworthiness; and those
who are searching the web for information about paying off debts
or doing home remodelling projects. Obtaining data from these
various sources and aggregators will enable a more targeted
marketing campaign, which would have been more challenging
before Big Data due to the lack of information or high-performing
technologies.
b. Using technologies such as Hadoop to perform natural language
processing on unstructured, textual data from social media
websites, users can gauge the reaction to events such as
presidential campaigns. People may, for example, want to
determine public sentiments toward a candidate by analysing
related blogs and online comments. Similarly, data users may
want to track and prepare for natural disasters by identifying
which areas a hurricane affects first and how it moves, based on
which geographic areas are tweeting about it or discussing it via
social media.
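As a toy illustration of this kind of sentiment gauging, the Python sketch
below tallies positive and negative words in a handful of comments. The word
lists and comments are invented; a real analysis would apply a curated
lexicon (or a trained model) over millions of posts on a platform such as
Hadoop.

    from collections import Counter
    import re

    # Tiny illustrative lexicons; real analyses use curated sentiment lists.
    POSITIVE = {"great", "support", "win", "hope"}
    NEGATIVE = {"scandal", "fail", "corrupt", "lies"}

    comments = [
        "Great rally tonight, real hope for change",
        "Another scandal, more lies from the campaign",
        "I support the candidate and hope she wins",
    ]

    tally = Counter()
    for comment in comments:
        words = set(re.findall(r"[a-z']+", comment.lower()))
        tally["positive"] += len(words & POSITIVE)
        tally["negative"] += len(words & NEGATIVE)

    print(tally)  # a rough public-sentiment signal toward the candidate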
KEY ROLES FOR THE NEW BIG DATA ECOSYSTEM
BIG DATA ANALYTICS ACTIVITIES
There are three recurring sets of activities that data scientists perform:
1. Reframe business challenges as analytics challenges. Specifically, this
is a skill to diagnose business problems, consider the core of a given
problem, and determine which kinds of candidate analytical methods
can be applied to solve it.
2. Design, implement, and deploy statistical models and data mining
techniques on Big Data. This set of activities is mainly what people
think about when they consider the role of the Data Scientist: namely,
applying complex or advanced analytical methods to a variety of
business problems using data.
3. Develop insights that lead to actionable recommendations. It is critical
to note that applying advanced methods to data problems does not
necessarily drive new business value. Instead, it is important to learn
how to draw insights out of the data and communicate them effectively.
DATA SCIENTIST SKILLS
1. Quantitative skill: such as mathematics or statistics
2. Technical aptitude: namely, software engineering, machine learning,
and programming skills
3. Skeptical mind-set and critical thinking: It is important that data
scientists can examine their work critically rather than in a one-sided
way.
4. Curious and creative: Data scientists are passionate about data and
finding creative ways to solve problems and portray information.
5. Communicative and collaborative: Data scientists must be able to
articulate the business value in a clear way and collaboratively work
with other groups, including project sponsors and key stakeholders.