BIG DATA TESTING
BIG DATA?
• Big data is a term for a collection of data sets so large and complex that it becomes difficult
to process using traditional data processing applications.
[Figure: Big Data activities vs. normal processing capabilities, by content volume]
“Big Data” Sources
• Social networking sites such as Facebook, LinkedIn, and Twitter
• Mobile device data such as text messages, call data, and app data
• Internet transactions such as e-commerce purchases and banking activities
• Network device and sensor data such as weather forecasting and temperature readings
Traditional Data Processing
Need for an RDBMS
• Very quick response
• Enables relations between data elements to be defined and managed
• A single database can be used for all applications
Limitations of traditional approach
• Data processing takes too long as the volume of data increases
• Not Scalable
OLTP & OLAP
[Figure: OLTP (Operations): business strategy, business processes, master data, transactions. OLAP (Information): business data warehouse, data mining, analytics.]
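The OLTP/OLAP split can be illustrated with a small sketch. It uses Python's built-in sqlite3 module purely for convenience; the `orders` table, its columns, and the function names are hypothetical and do not come from the slides. OLTP work is shown as a short write transaction on one row, OLAP work as a read-only aggregate over many rows.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical `orders` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    return conn


def record_order(conn: sqlite3.Connection, region: str, amount: float) -> int:
    """OLTP-style work: one small write transaction touching a single row."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO orders (region, amount) VALUES (?, ?)", (region, amount)
        )
    return cur.lastrowid


def sales_by_region(conn: sqlite3.Connection) -> dict:
    """OLAP-style work: a read-only aggregate that scans many rows."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    )
    return dict(rows.fetchall())


conn = build_demo_db()
for region, amount in [("east", 10.0), ("west", 5.0), ("east", 2.5)]:
    record_order(conn, region, amount)

report = sales_by_region(conn)
print(report)  # {'east': 12.5, 'west': 5.0}
```

In practice the two workloads usually run on separate systems (an operational database and a data warehouse); a single SQLite file is used here only to keep the sketch self-contained.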
Big Data Characteristics
5 Vs
• Volume
• Velocity
• Variety
• Value
• Veracity
HADOOP
• Apache Hadoop is a framework that allows distributed processing
of large datasets across clusters of commodity computers using
a simple programming model.
• It is an architecture that can scale with the volume, variety, and
speed requirements of big data by distributing the workload
across commodity servers that process the data in parallel.
Goals of HDFS:
• Fast recovery from hardware failures
• Access to streaming data
• Accommodation of large data sets
• Portability
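The "simple programming model" that Hadoop exposes is MapReduce. The sketch below imitates its three steps (map, shuffle, reduce) in a single Python process with a word count. It is only an illustration of the model: the function names are made up for this example, and a real Hadoop job would run the map and reduce tasks on different nodes of the cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence (single process, no cluster)."""
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = run_job(["big data testing", "big data tools"])
print(counts)  # {'big': 2, 'data': 2, 'testing': 1, 'tools': 1}
```

Because each call to `map_phase` depends only on its own line, and each call to `reduce_phase` only on its own key, both steps can be spread across many machines, which is what lets the workload be processed in parallel.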
Phases in Big Data Testing
Test Entry Points
• Data Staging Validation
• MapReduce Validation
• Output Validation
[Figure: Data Source (RDBMS, MongoDB, social media data, etc.) → Source HADOOP → ETL Process → Target Data Warehouse → BI]
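The validation phases above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a prescribed method: the record layout `(id, region, amount)`, the "sum of amount per region" business rule, and all function names are hypothetical. Staging validation is shown as a row-count and order-independent checksum comparison between source and staged data; MapReduce validation is shown as recomputing the expected aggregate independently and comparing it with the job's result.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str, float]  # hypothetical layout: (id, region, amount)


def dataset_fingerprint(records: Iterable[Record]) -> Tuple[int, int]:
    """Return (row count, order-independent checksum) for a set of records."""
    count, checksum = 0, 0
    for record in records:
        digest = hashlib.sha256(repr(record).encode("utf-8")).digest()
        # Adding per-row hashes modulo 2**64 ignores row order
        # but still changes when a row is lost, altered, or duplicated.
        checksum = (checksum + int.from_bytes(digest[:8], "big")) % 2**64
        count += 1
    return count, checksum


def validate_staging(source: Iterable[Record], staged: Iterable[Record]) -> List[str]:
    """Data staging validation: staged data should match the source."""
    issues = []
    src_count, src_sum = dataset_fingerprint(source)
    stg_count, stg_sum = dataset_fingerprint(staged)
    if src_count != stg_count:
        issues.append(f"row count mismatch: source={src_count}, staged={stg_count}")
    if src_sum != stg_sum:
        issues.append("checksum mismatch: staged content differs from source")
    return issues


def expected_totals(records: Iterable[Record]) -> Dict[str, float]:
    """Independently recompute the aggregate the job is assumed to produce."""
    totals: Dict[str, float] = defaultdict(float)
    for _id, region, amount in records:
        totals[region] += amount
    return dict(totals)


def validate_mapreduce_result(
    staged: Iterable[Record], job_output: Dict[str, float], tol: float = 1e-9
) -> List[str]:
    """MapReduce validation: compare the job's result with the expected one."""
    expected = expected_totals(staged)
    issues = []
    for key in sorted(set(expected) | set(job_output)):
        if key not in job_output:
            issues.append(f"missing key in output: {key}")
        elif key not in expected:
            issues.append(f"unexpected key in output: {key}")
        elif abs(expected[key] - job_output[key]) > tol:
            issues.append(f"value mismatch for {key}")
    return issues


source = [("1", "east", 10.0), ("2", "west", 5.0), ("3", "east", 2.5)]
staged = list(reversed(source))  # same rows, different order

staging_issues = validate_staging(source, staged)
logic_issues = validate_mapreduce_result(staged, {"east": 12.5, "west": 5.0})
truncated_issues = validate_staging(source, staged[:2])  # simulate a lost row
print(staging_issues, logic_issues, truncated_issues)
```

Output validation could reuse the same fingerprint comparison, this time between the Hadoop output and the rows loaded into the target data warehouse. On real volumes the fingerprints would be computed inside the cluster or database rather than by pulling every row into one Python process.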
Tools Used in Big Data Scenarios
• HDFS – For data storage
• Pig & Hive / MapReduce – For processing and transforming data
• Sqoop – For bulk transfer of data between RDBMS and HDFS
• Kafka – For real-time data streaming
Automation Tools & Challenges in Big Data Testing
• TestingWhiz – Helps verify structured and unstructured data sets and
schemas at different sources such as Hive, MapReduce, Sqoop, and Pig
• QuerySurge – Helps with end-to-end testing
Challenges in Big Data testing:
• Large datasets and possible latency
• Automation tools may not be well equipped to handle unexpected challenges
• Performance testing
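One basic measurement in performance testing is data throughput: how many records a processing step handles per second. The sketch below shows one possible way to measure it in Python. The `measure_throughput` helper and the sample records are hypothetical, and the comma-splitting lambda merely stands in for a real processing step.

```python
import time
from typing import Callable, Dict, Iterable


def measure_throughput(
    process: Callable[[str], object], records: Iterable[str]
) -> Dict[str, float]:
    """Run `process` on every record and report records per second."""
    count = 0
    start = time.perf_counter()
    for record in records:
        process(record)
        count += 1
    # Guard against a zero interval on very small inputs.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return {
        "records": count,
        "seconds": elapsed,
        "records_per_second": count / elapsed,
    }


# A generator keeps memory use flat, which matters with large datasets.
sample = (f"row-{i},value-{i}" for i in range(10_000))
stats = measure_throughput(lambda line: line.split(","), sample)
print(f"{stats['records']} records at {stats['records_per_second']:.0f} records/s")
```

Numbers from a single machine say little about cluster behaviour; a real performance test would repeat the measurement at increasing data volumes and node counts.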