Data Mining
With Big Data
Presented By:
Dinesh chandra yenduri
Rg.no : y15mc24095
Abstract
Big Data concerns large-volume, complex, growing data sets with multiple, autonomous sources. With the fast development of networking, data storage, and data collection capacity, Big Data is now rapidly expanding in all science and engineering domains, including the physical, biological, and biomedical sciences.
Outline
• Introduction
• What is Data Mining With Big Data
• How To Produce Big Data
• Big Data Characteristics
• The 4 Vs of Big Data
• Hadoop System Architecture
• Hadoop Framework
• Data Mining Challenges With Big Data
• Big Data Challenges and Solutions
• Advantages
• Conclusion
• References
Introduction
• The volume of business data worldwide, across all companies, doubles every 1.2 years (previously every 1.5 years).
• Every day, 2.5 quintillion bytes of data are produced, and more than 90 percent of all data was produced within the past two years.
• Facebook processes 10 TB of data every day; Twitter processes 7 TB.
• On 4 October 2012, the first presidential debate between President Barack Obama and Governor Mitt Romney triggered more than 10 million tweets within 2 hours.
• Examples: Boeing jets, scientific data, sensor data, Internet data
What is Data Mining With Big Data
How To Produce Big Data
Big Data Characteristics
• Data has grown tremendously.
• Big Data starts with large-volume, heterogeneous, autonomous sources under distributed and decentralized control.
The 4 Vs of Big Data
Volume
• Data quantity
Velocity
• Data speed
Variety
• Data types
Veracity
• Data authenticity
How To Manage Big Data
• By using Hadoop
• It is an open-source system
• It stores data in a distributed file system (HDFS)
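Hadoop pairs its distributed file system with the MapReduce processing model. The sketch below imitates that model in plain Python on a single machine, assuming the input has already been split into blocks. The function names (`map_block`, `shuffle`, `reduce_counts`, `word_count`) are illustrative only and are not Hadoop APIs.

```python
from collections import defaultdict

def map_block(block):
    """Map step: emit a (word, 1) pair for every word in one block of text."""
    return [(word.lower(), 1) for word in block.split()]

def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(groups):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(blocks):
    pairs = []
    for block in blocks:  # on a cluster, each block would be mapped on its own node
        pairs.extend(map_block(block))
    return reduce_counts(shuffle(pairs))

# Hypothetical input: three small "blocks" standing in for file blocks.
blocks = ["big data mining", "mining big data sets", "data"]
counts = word_count(blocks)
```

The point of the design is that `map_block` needs only its own block, so the map work can be spread over many nodes; only the grouped pairs have to move between machines.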
Hadoop System Architecture
Hadoop Framework
Data Mining Challenges With Big Data
• Big Data Mining Platform
• Big Data Semantics and Application Knowledge
• Big Data Mining Algorithm
Big Data Mining Platform
• Data sets are typically too large to fit into main memory.
• Parallel programming is needed to carry out the mining process.
• A Big Data processing framework relies on cluster computers: a high-performance computing platform spread over a large number of computing nodes.
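The platform requirements above can be sketched in a few lines: split the data into chunks, summarize each chunk independently, and merge the partial results. This is a minimal single-machine illustration, assuming a thread pool as a stand-in for cluster nodes; the helper names (`read_chunks`, `summarize`, `merge`, `parallel_summary`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def read_chunks(values, chunk_size):
    """Yield the input as fixed-size chunks (the last one may be shorter)."""
    chunk = []
    for value in values:
        chunk.append(value)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

def summarize(chunk):
    """Partial aggregate for one chunk: (count, sum, min, max)."""
    return (len(chunk), sum(chunk), min(chunk), max(chunk))

def merge(a, b):
    """Combine two partial aggregates into one."""
    return (a[0] + b[0], a[1] + b[1], min(a[2], b[2]), max(a[3], b[3]))

def parallel_summary(values, chunk_size=4, workers=2):
    """Summarize chunks in worker threads and merge the results.

    Returns None for empty input. Threads only stand in for cluster nodes here.
    """
    total = None
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part in pool.map(summarize, read_chunks(values, chunk_size)):
            total = part if total is None else merge(total, part)
    return total

count, total, low, high = parallel_summary(range(1, 11))
mean = total / count
```

Because each partial aggregate is tiny compared with its chunk, only a few numbers per chunk need to be combined, which is what makes the approach workable when the full data set cannot be held in memory.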
Big Data Mining Platform (Cont…)
• Big Data mining offers opportunities to go
beyond traditional relational databases to rely
on less structured data: weblogs, social media,
e-mail, sensors, and photographs that can be
mined for useful information
Big Data Semantics and Application
Knowledge
The two most important issues in this area are:
1) Data sharing and privacy
2) Domain and application knowledge
Data sharing and privacy
• Information sharing is an ultimate goal for all systems involving multiple parties.
• Two common approaches to protecting privacy while sharing are:
1) Restrict access to the data, for example by adding certification or access control to the data entries, so that sensitive information is accessible only to a limited group of users.
2) Anonymize data fields so that sensitive information cannot be pinpointed to an individual record.
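Both approaches can be illustrated with a small sketch. It assumes a record with `name`, `age`, and `diagnosis` fields and a role named `analyst`; these names, the key, and the helper functions are hypothetical. Note that a keyed hash is strictly pseudonymization, a weaker guarantee than full anonymization.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-key-change-me"  # hypothetical key; never hard-code real keys

def can_read(user_roles, required_role="analyst"):
    """Approach 1: access control, only users holding the role may read."""
    return required_role in user_roles

def pseudonymize(identifier, key=SECRET_KEY):
    """Approach 2a: replace an identifier with a truncated keyed hash."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

def generalize_age(age, width=10):
    """Approach 2b: report an age range instead of the exact age."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def anonymize(record):
    """Return a copy of the record without directly identifying fields."""
    return {
        "patient": pseudonymize(record["name"]),
        "age_range": generalize_age(record["age"]),
        "diagnosis": record["diagnosis"],
    }

record = {"name": "Alice Example", "age": 37, "diagnosis": "flu"}
safe = anonymize(record)
```

The same input name always maps to the same pseudonym, so records can still be linked for mining without exposing the name itself.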
Domain and application knowledge
• Domain and application knowledge provides
essential information for designing Big Data
mining algorithms and systems
• The domain and application knowledge can also
help design achievable business objectives by
using Big Data analytical techniques
Big Data Mining Algorithm
I. Local Learning and Model Fusion for
Multiple Information Sources
II. Mining from Sparse, Uncertain, and
Incomplete Data
III. Mining Complex and Dynamic Data
Local Learning and Model Fusion for Multiple
Information Sources
Because Big Data applications are characterized by autonomous sources and decentralized control, aggregating distributed data sources at a centralized site for mining is systematically prohibitive, owing to the potential transmission cost and privacy concerns.
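One simple way to picture local learning with model fusion: each site fits a model on its own data and shares only the model parameters, which are then combined centrally. The sketch below assumes a least-squares line as the local model and a sample-size-weighted average as the fusion rule; both are illustrative choices, not the only or necessarily the best ones, and the names `fit_local` and `fuse` are hypothetical.

```python
def fit_local(points):
    """Fit y = slope * x + intercept by least squares on one site's data."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return {"n": n, "slope": slope, "intercept": mean_y - slope * mean_x}

def fuse(models):
    """Fuse local models by averaging parameters, weighted by sample count."""
    total = sum(m["n"] for m in models)
    slope = sum(m["n"] * m["slope"] for m in models) / total
    intercept = sum(m["n"] * m["intercept"] for m in models) / total
    return slope, intercept

# Hypothetical data held at two separate sites; raw points never leave a site.
site_a = [(0, 1.0), (1, 3.0), (2, 5.0)]
site_b = [(3, 7.0), (4, 9.0), (5, 11.0), (6, 13.0)]
slope, intercept = fuse([fit_local(site_a), fit_local(site_b)])
```

Each site transmits three numbers instead of its raw records, which addresses both the transmission cost and the privacy concern.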
Mining from Sparse, Uncertain, and
Incomplete Data
• Sparse, uncertain, and incomplete data are defining features of Big Data applications.
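Two basic techniques for such data are sketched below under simple assumptions: filling missing values with the mean of the observed ones, and storing only the non-zero entries of a sparse vector. The helper names (`impute_mean`, `to_sparse`, `sparse_dot`) and the sample values are hypothetical, and mean imputation is only one of many possible strategies.

```python
def impute_mean(rows, column):
    """Fill missing (None) values in one column with the observed mean."""
    observed = [row[column] for row in rows if row[column] is not None]
    if not observed:
        return [dict(row) for row in rows]  # nothing to learn a mean from
    mean = sum(observed) / len(observed)
    return [
        {**row, column: mean if row[column] is None else row[column]}
        for row in rows
    ]

def to_sparse(vector):
    """Keep only the non-zero entries, as an index -> value mapping."""
    return {i: v for i, v in enumerate(vector) if v != 0}

def sparse_dot(a, b):
    """Dot product that visits only indices stored in the smaller vector."""
    if len(a) > len(b):
        a, b = b, a
    return sum(value * b[i] for i, value in a.items() if i in b)

rows = [{"temp": 20.0}, {"temp": None}, {"temp": 24.0}]
filled = impute_mean(rows, "temp")

u = to_sparse([0, 0, 3, 0, 2])
v = to_sparse([1, 0, 4, 0, 0])
product = sparse_dot(u, v)
```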
Mining Complex and Dynamic Data
• The rise of Big Data is driven by the rapid increase of complex data and by changes in its volume and nature.
• Documents posted on WWW servers, Internet backbones, social networks, communication networks, transportation networks, and so on all feature dynamic data.
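Dynamic data is often handled incrementally, updating a result as each new value arrives instead of recomputing over the full history. The sketch below shows one such technique, a sliding-window mean; the class name `SlidingWindowMean` and the sample stream are hypothetical.

```python
from collections import deque

class SlidingWindowMean:
    """Mean of the most recent `size` values, updated in constant time."""

    def __init__(self, size):
        self.window = deque(maxlen=size)
        self.total = 0.0

    def update(self, value):
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]  # oldest value is about to be evicted
        self.window.append(value)
        self.total += value
        return self.total / len(self.window)

monitor = SlidingWindowMean(size=3)
means = [monitor.update(x) for x in [10, 20, 30, 40, 50]]
```

Old values drop out of the window automatically, so the statistic follows changes in the stream while memory use stays fixed.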
Big Data Challenges and Solutions
• Location of Big Data sources: Big Data is commonly stored in different locations.
• Volume of Big Data: the size of Big Data grows continuously.
• Hardware resources: RAM capacity
• Privacy
• Domain knowledge
• Getting meaningful information
Solutions
• Parallel programming
• An efficient computing platform that does not rely on centralized data storage; instead, data is distributed across large-scale storage
• Restricting access to the data
Advantages
• Fast response
• Extraction of useful information
• Prediction of required data from a large amount of data
• Better presentation of results through visualization
Conclusion
Big Data is an emerging trend, and the need for Big Data mining is arising in all science and engineering domains. With Big Data technologies, we will hopefully be able to provide the most relevant and most accurate social sensing feedback to better understand our society in real time.
References
• R. Ahmed and G. Karypis, “Algorithms for Mining
the Evolution of Conserved Relational States in
Dynamic Networks,” Knowledge and Information
Systems, vol. 33, no. 3, pp. 603-630, Dec. 2012.
• M.H. Alam, J.W. Ha, and S.K. Lee, “Novel
Approaches to Crawling Important Pages Early,”
Knowledge and Information Systems, vol. 33, no. 3,
pp. 707-734, Dec. 2012.
• S. Aral and D. Walker, “Identifying Influential and
Susceptible Members of Social Networks,” Science,
vol. 337, pp. 337-341, 2012.
