Quiz Assignment-I Solutions: Big Data Computing (Week-1)
___________________________________________________________________________
Q.1 ________________ is responsible for allocating system resources to the various
applications running in a Hadoop cluster and scheduling tasks to be executed on different
cluster nodes.
A. Hadoop Common
B. Hadoop Distributed File System (HDFS)
C. Hadoop YARN
D. Hadoop MapReduce
Answer: C) Hadoop YARN
Explanation:
Hadoop Common: It contains libraries and utilities needed by other Hadoop modules.
Hadoop Distributed File System (HDFS): It is a distributed file system that stores data on
commodity machines, providing very high aggregate bandwidth across the entire cluster.
Hadoop YARN: It is a resource management platform responsible for managing compute
resources in the cluster and using them to schedule users' applications. YARN is
responsible for allocating system resources to the various applications running in a Hadoop
cluster and scheduling tasks to be executed on different cluster nodes.
Hadoop MapReduce: It is a programming model that scales data processing across many
different processes.
Q. 2 Which of the following tools is designed for efficiently transferring bulk data between
Apache Hadoop and structured datastores such as relational databases?
A. Pig
B. Mahout
C. Apache Sqoop
D. Flume
Answer: C) Apache Sqoop
Explanation: Apache Sqoop is a tool designed for efficiently transferring bulk data between
Apache Hadoop and structured datastores such as relational databases.
Q. 3 _________________ is a distributed, reliable, and available service for efficiently
collecting, aggregating, and moving large amounts of log data.
A. Flume
B. Apache Sqoop
C. Pig
D. Mahout
Answer: A) Flume
Explanation: Flume is a distributed, reliable, and available service for efficiently collecting,
aggregating, and moving large amounts of log data. It has a simple and very flexible
architecture based on streaming data flows. It is robust and fault tolerant, with tunable
reliability, failover, and recovery mechanisms that help keep the cluster safe and reliable.
It uses a simple, extensible data model that allows for online analytic applications.
Q. 4 _______________ refers to the connectedness of big data.
A. Value
B. Veracity
C. Velocity
D. Valence
Answer: D) Valence
Explanation: Valence refers to the connectedness of big data, such as in the form of graph
networks.
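As a rough illustration of "connectedness", one common way to quantify it is the ratio of
actual connections to all possible pairwise connections between data items. The quiz does not
define a formula, so the function name `valence`, the ratio itself, and the sample edges below
are assumptions made for this sketch.

```python
def valence(num_items, connections):
    """Ratio of distinct undirected connections to all possible pairs.

    Assumed definition for illustration; not taken from the quiz.
    """
    possible = num_items * (num_items - 1) // 2
    if possible == 0:
        return 0.0
    # Treat (a, b) and (b, a) as the same connection and ignore self-links.
    unique = {frozenset(pair) for pair in connections if pair[0] != pair[1]}
    return len(unique) / possible


# Hypothetical data set: 4 items with 4 connections out of 6 possible pairs.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")]
density = valence(4, edges)
print(f"valence = {density:.2f}")
```

A value near 1 would indicate densely connected data, a value near 0 mostly isolated items.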
Q. 5 Consider the following statements:
Statement 1: Volatility refers to the data velocity relative to the timescale of the event being studied
Statement 2: Viscosity refers to the rate of data loss and stable lifetime of data
A. Only statement 1 is true
B. Only statement 2 is true
C. Both statements are true
D. Both statements are false
Answer: D) Both statements are false
Explanation: The correct statements are:
Statement 1: Viscosity refers to the data velocity relative to the timescale of the event being studied
Statement 2: Volatility refers to the rate of data loss and stable lifetime of data
Q. 6 ________________ refers to the biases, noise, and abnormality in data, and to the
trustworthiness of data.
A. Value
B. Veracity
C. Velocity
D. Volume
Answer: B) Veracity
Explanation: Veracity refers to the biases, noise, and abnormality in data, and to the
trustworthiness of data.
Q. 7 _____________ brings scalable parallel database technology to Hadoop and allows
users to submit low-latency queries against data stored in HDFS or HBase
without requiring a lot of data movement and manipulation.
A. Apache Sqoop
B. Mahout
C. Flume
D. Impala
Answer: D) Impala
Explanation: Impala was designed at Cloudera, and it is a query engine that runs on top of
Apache Hadoop. The project was officially announced at the end of 2012 and became a
publicly available, open source distribution. Impala brings scalable parallel database
technology to Hadoop and allows users to submit low-latency queries against data stored in
HDFS or HBase without requiring a lot of data movement and manipulation.
Q. 8 True or False?
NoSQL databases store unstructured data with no particular schema
A. True
B. False
Answer: A) True
Explanation: While traditional SQL can be used effectively to handle large amounts of
structured data, we need NoSQL (Not Only SQL) to handle unstructured data. NoSQL
databases store unstructured data with no particular schema.
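A minimal sketch of what "no particular schema" can look like in practice: records in the same
collection carry different fields, and queries simply skip records that lack a field. The
in-memory list, the `find` helper, and the sample records are hypothetical stand-ins for a
document store, not the API of any specific NoSQL product.

```python
# Hypothetical collection: each record has its own shape, with no shared schema.
records = [
    {"id": 1, "name": "Asha", "email": "asha@example.com"},
    {"id": 2, "name": "Ravi", "tags": ["admin", "ops"]},
    {"id": 3, "sensor": {"temp_c": 21.5}, "ts": "2020-01-01T00:00:00Z"},
]


def find(collection, field):
    """Return the records that contain the given top-level field."""
    return [doc for doc in collection if field in doc]


# Records without a "name" field are skipped rather than rejected.
for doc in find(records, "name"):
    print(doc["id"], doc["name"])
```

In a relational table, the third record would need either extra nullable columns or a separate
table; here it is stored as-is alongside the others.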
Q. 9 ____________ is a highly reliable distributed coordination kernel, which can be used for
distributed locking, configuration management, leadership election, work queues, etc.
A. Apache Sqoop
B. Mahout
C. ZooKeeper
D. Flume
Answer: C) ZooKeeper
Explanation: ZooKeeper is a central key-value store that distributed systems can use to
coordinate. Since it needs to be able to handle the load, ZooKeeper itself runs on many
machines.
Q. 10 True or False?
MapReduce is a programming model and an associated implementation for processing and
generating large data sets.
A. True
B. False
Answer: A) True
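As an illustration of the programming model, the sketch below runs a word count as separate
map, shuffle, and reduce steps in a single Python process. It is a toy under stated
assumptions: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the
sample input are made up for this example, and it does not use Hadoop's actual API or any
distribution across nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return (key, sum(values))


def word_count(documents):
    """Run the three phases in sequence on a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical input split into two "documents".
counts = word_count(["big data big cluster", "data node"])
print(counts)
```

In a real cluster, the map and reduce functions would run in parallel on different nodes and
the framework would handle the shuffle; only the two user-supplied functions resemble what a
developer writes.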
___________________________________________________________________________