HADOOP & DISTRIBUTED CLOUD COMPUTING
DATA PROCESSING IN CLOUD




Presentation by: Rajan Kumar Upadhyay || rajan24oct@gmail.com
CLOUD COMPUTING?

Cloud computing is a virtualized setup that includes the following:
• Delivery of computing as a service rather than as a product
• Shared resources (software, utilities, hardware) provided over a network (typically the Internet)

[Diagram: Delivery of computing → Public Utilities → Shared Resources]
DISTRIBUTED CLOUD COMPUTING

As the name suggests: distributed computing in the cloud.
Examples:
• Distributed computing is nothing more than utilizing many networked computers to partition a question or problem (split it into many smaller pieces) and allow the network to solve the issue piecemeal.
• Software like Hadoop. Written in Java, Hadoop is a scalable, efficient, distributed software platform designed to process enormous amounts of data. Hadoop can scale to thousands of computers across many clusters.
• Another instance of distributed computing, for storage instead of processing power, is BitTorrent. A torrent is a file that is split into many pieces and stored on many computers around the Internet. When a local machine wants to access that file, the small pieces are retrieved and rebuilt.
• P2P networks, which split communication/data packets into multiple pieces sent across multiple network routes, then reassemble them at the receiver's end.
Distributed computing in the cloud is the next-generation framework for extracting the maximum value from resources over a distributed architecture.
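The split/store/reassemble idea behind BitTorrent-style storage can be sketched in a few lines. This is a toy illustration, not the BitTorrent protocol: the chunk size is arbitrary, the "pieces" would really live on many peers, and a real torrent also hashes every piece to verify integrity.

```python
# Toy sketch: split a file into fixed-size pieces keyed by index,
# then rebuild it regardless of the order the pieces arrive in.
CHUNK_SIZE = 4  # bytes per piece (tiny, for demonstration only)

def split(data: bytes, size: int = CHUNK_SIZE) -> dict:
    """Split data into fixed-size pieces, keyed by piece index."""
    return {i // size: data[i:i + size] for i in range(0, len(data), size)}

def reassemble(pieces: dict) -> bytes:
    """Fetch pieces (possibly from many peers) and rebuild the file in order."""
    return b"".join(pieces[i] for i in sorted(pieces))

original = b"hello distributed world"
pieces = split(original)
# The pieces could now sit on different machines; retrieval order doesn't matter.
assert reassemble(pieces) == original
```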
WHAT IS HADOOP
A flexible infrastructure for large-scale computation and data processing on a network of commodity hardware.

Why Hadoop? A common infrastructure pattern extracted from building distributed systems:
• Scale
• Incremental growth
• Cost
• Flexibility
• Distributed File System
• Distributed Processing Framework

Adoption:
• Apache.org open-source project
• Yahoo!, Facebook, Google, Fox, Amazon, IBM, and the NY Times use it for their core infrastructure
• Widely adopted, a valuable and reusable skill set:
  • Taught at major universities
  • Easier to hire for
  • Easier to train on
  • Portable across projects and groups
HOW IT WORKS

HDFS: Hadoop Distributed File System
A distributed file system for large data:
• Your data in triplicate (one local and two remote copies)
• Built-in redundancy, resiliency to large-scale failures (automated restart and re-allocation)
• Intelligent distribution, striping across racks
• Accommodates very large data sizes on commodity hardware
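The triplicate storage described above is driven by HDFS's replication factor. A minimal hdfs-site.xml fragment is shown below; the property name is standard HDFS configuration, and 3 is the default value:

```xml
<configuration>
  <!-- Number of copies HDFS keeps of each block: 3 by default
       (one "local" copy and two remote, spread across racks). -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```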
PROGRAMMING MODEL

There are various programming models for Hadoop development. I personally like, and have experience with, Map/Reduce.

Why Map/Reduce:
• Simple programming technique:
  • Map(anything) -> (key, value)
  • Sort, partition on key
  • Reduce(key, values) -> (key, value)
• No parallel-processing / message-passing semantics for you to manage
• Programmable in Java or any other language
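The three steps above can be sketched without a cluster. The following is a plain-Python simulation of the Map → sort/partition on key → Reduce pipeline (a word count, the canonical example), not the actual Hadoop API:

```python
# Toy simulation of the Map/Reduce model: map emits (key, value) pairs,
# the framework sorts/groups them by key, and reduce folds each group.
from itertools import groupby
from operator import itemgetter

def map_fn(line):
    """Map(anything) -> (key, value) pairs: one (word, 1) per word."""
    for word in line.split():
        yield (word, 1)

def reduce_fn(key, values):
    """Reduce(key, values) -> (key, value): sum the counts."""
    return (key, sum(values))

def run_job(lines):
    # Map phase
    pairs = [pair for line in lines for pair in map_fn(line)]
    # Shuffle phase: sort and partition on key
    pairs.sort(key=itemgetter(0))
    # Reduce phase: one reduce call per distinct key
    return dict(reduce_fn(key, (v for _, v in group))
                for key, group in groupby(pairs, key=itemgetter(0)))

print(run_job(["to be or not to be"]))  # {'be': 2, 'not': 1, 'or': 1, 'to': 2}
```

On a real cluster the map and reduce calls run in parallel on many nodes, but the contract the programmer writes against is exactly this simple.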




                                                       Continued …
PROGRAMMING MODEL

The flow of a Map/Reduce job:

1. Create/allocate the cluster
2. Put data into the file system (data is split into blocks, stored in triplicate across your cluster)
3. Move computation to the data: your Map code is copied to the allocated nodes, preferring nodes that contain copies of your data
4. Program execution
5. Gather the output of map; sort or partition on key
6. Run the reduce task
7. Results of the job are stored on HDFS
PRACTICES

• Put the large data source into HDFS
• Perform aggregations, transformations, and normalizations on the data
• Load the results into an RDBMS
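As a toy end-to-end sketch of that practice, the snippet below uses Python's built-in sqlite3 as a stand-in for the target RDBMS. The table and column names are invented for illustration, and in production the load step is usually handled by a dedicated tool (e.g. Sqoop or a bulk loader) rather than row inserts:

```python
# Sketch: aggregate raw records (as a MapReduce job over HDFS data would),
# then load the much smaller aggregate into an RDBMS table.
import sqlite3
from collections import Counter

# Pretend these lines were read from a large file in HDFS.
raw_lines = ["alice 5", "bob 3", "alice 2"]

# Aggregation/normalization step: total per user.
totals = Counter()
for line in raw_lines:
    user, amount = line.split()
    totals[user] += int(amount)

# Load step: write the aggregate into the RDBMS.
conn = sqlite3.connect(":memory:")  # stand-in for a real database
conn.execute("CREATE TABLE user_totals (user TEXT PRIMARY KEY, total INTEGER)")
conn.executemany("INSERT INTO user_totals VALUES (?, ?)", totals.items())
conn.commit()

print(dict(conn.execute("SELECT user, total FROM user_totals")))
# {'alice': 7, 'bob': 3}
```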
THANK YOU

Thank you for reading this. I hope you find it useful. Please contact me at rajan24oct@gmail.com if you have any queries or feedback. My name is Rajan Kumar Upadhyay; I have more than 10 years of collective IT experience as a techie.
If you have anything to share, or are looking for consulting, please feel free to contact me.
