Model Question Paper 3

This document is a model question paper consisting of 60 multiple-choice questions focused on data analysis and visualization concepts. The questions cover various topics such as data cleaning, data types, data visualization techniques, machine learning, and data governance. Each question provides four answer options, requiring the selection of the most appropriate response.

Uploaded by ganesh123143gani

Model Question Paper 3: Data Analysis and Visualization

Instructions: This paper contains 60 multiple-choice questions. Each question has four
possible answers. Choose the most appropriate answer for each question.

1. Why is data cleaning considered a critical step in the data analytics process?
a) Because it makes the dataset larger
b) Because of "garbage in, garbage out": poor-quality data leads to inaccurate
insights
c) Because it is the only step that requires coding
d) Because it is the final step before presenting results

2. What is the fundamental difference between data and information?
a) There is no difference
b) Data is processed information, while information is raw
c) Data is raw and unorganized, while information is processed, organized, and
has context
d) Data is always numerical, while information is always textual

3. Which of the following best describes 'data quality'?
a) The amount of data collected
b) The speed at which data is collected
c) The measure of data's fitness for its intended purpose, considering factors
like accuracy, completeness, and timeliness
d) The variety of data types in a dataset

4. Why would an organization choose a non-relational database over a relational one?
a) For applications requiring a rigid, predefined schema
b) When dealing with highly structured transactional data
c) For applications requiring high scalability and flexibility with unstructured or
semi-structured data
d) When they want to use complex SQL joins exclusively

5. What is the primary objective of data visualization?
a) To make simple data look more complex
b) To represent data graphically for aesthetic purposes only
c) To communicate complex information clearly and efficiently to reveal
patterns and insights
d) To replace the need for statistical analysis

6. In machine learning, what does the term 'supervised' refer to?
a) The model learns without any human intervention
b) The model learns from data that has been manually labeled with the correct
outcomes
c) The model supervises the data cleaning process
d) The model learns through a system of rewards and punishments

7. What is the main purpose of the 'discovery' phase in the data analytics lifecycle?
a) To build and test models
b) To clean and prepare the data
c) To understand and define the business problem, objectives, and scope
d) To deploy the final solution

8. Which data type would be most appropriate for storing a customer's gender (Male,
Female, Other)?
a) Quantitative Continuous
b) Quantitative Discrete
c) Qualitative Nominal
d) Qualitative Ordinal

9. What is a major limitation of using a pie chart?
a) It cannot show parts of a whole
b) It becomes difficult to compare the size of slices when there are many
categories
c) It is only suitable for time-series data
d) It cannot be created in Excel

10. The process of identifying and correcting or removing corrupt or inaccurate records from
a dataset is called:
a) Data Wrangling
b) Data Cleaning
c) Data Mining
d) Data Visualization

11. Which of the following is a core concept of statistics used heavily in data analytics?
a) Software engineering
b) Hypothesis testing
c) Network administration
d) Graphic design

12. What is the main goal of 'predictive analytics'?
a) To explain why a past event occurred
b) To provide a summary of historical data
c) To make forecasts about future events or outcomes
d) To recommend a specific course of action

13. Which statement best describes a 'data-driven culture'?
a) An environment where decisions are primarily based on intuition and experience
b) An organization where data is collected but rarely used
c) An environment where decisions at all levels are consistently based on data
and analysis
d) An organization that only uses data for marketing purposes

14. What is the role of 'unsupervised learning'?
a) To predict a target value based on labeled input data
b) To find hidden patterns or intrinsic structures in unlabeled data
c) To learn a task through trial and error
d) To classify data into predefined categories

15. Why is it important to document the data cleaning process?
a) To make the process longer and more complex
b) To ensure reproducibility and transparency in the analysis
c) It is not important
d) To get a higher salary

16. What does 'data timeliness' refer to?
a) The accuracy of the data
b) The completeness of the data
c) The degree to which data is up-to-date and available when needed
d) The number of sources the data comes from

17. A bar chart is used to compare quantities across different categories. What does the
length of each bar represent?
a) The frequency of the category
b) The value or quantity for that category
c) The rank of the category
d) The time period

18. Which of the following is an example of a data reduction technique?
a) Imputing missing values
b) Normalizing data
c) Aggregating daily sales data into monthly sales data
d) One-hot encoding categorical variables
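
As an illustration of the aggregation idea raised in question 18, here is a minimal Python sketch that rolls daily sales up into monthly totals, so fewer rows carry the same overall figures. The `daily_sales` records and the `aggregate_monthly` helper are hypothetical, made up for this example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales records (date, amount); the values are invented.
daily_sales = [
    (date(2024, 1, 5), 120.0),
    (date(2024, 1, 20), 80.0),
    (date(2024, 2, 3), 200.0),
    (date(2024, 2, 14), 50.0),
    (date(2024, 2, 28), 25.0),
]


def aggregate_monthly(records):
    """Sum daily amounts into one total per (year, month) key."""
    totals = defaultdict(float)
    for day, amount in records:
        totals[(day.year, day.month)] += amount
    return dict(totals)


monthly_sales = aggregate_monthly(daily_sales)
print(monthly_sales)  # five daily rows become two monthly rows
```

The total sales figure is preserved, but the day-level detail is no longer recoverable from the reduced data.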

19. What is the 'bias-variance tradeoff' in machine learning?
a) The tradeoff between the speed and accuracy of a model
b) The challenge of balancing a model's ability to fit the training data (low bias)
with its ability to generalize to new data (low variance)
c) The tradeoff between using supervised and unsupervised learning
d) The cost of data storage versus the cost of processing

20. Which of the following is a key reason for the recent explosion in the amount of data (Big
Data)?
a) The decrease in the use of computers
b) The rise of the internet, social media, and IoT devices
c) The decline in data storage capacity
d) The reduced speed of internet connections

21. What is the primary function of the Python Pandas library?
a) For advanced mathematical and scientific computing
b) For creating static and interactive visualizations
c) For data manipulation and analysis, particularly for tabular data
d) For building and training deep learning models

22. In data visualization, what is 'chart junk'?
a) Essential elements like titles and axis labels
b) The raw data used to create the chart
c) Unnecessary or distracting visual elements that do not add to the
understanding of the data
d) The caption explaining the chart

23. What is 'data profiling'?
a) The process of creating a predictive model
b) The process of examining a dataset to understand its structure, content,
and quality
c) The process of deploying a model to production
d) The process of collecting data

24. Which measurement scale provides the most information?
a) Nominal
b) Ordinal
c) Interval
d) Ratio

25. Why is "knowing your audience" a crucial principle in data visualization?
a) To use technical jargon that only experts can understand
b) To tailor the complexity and type of visualization to the audience's needs
and knowledge level
c) It is not important; all audiences should be treated the same
d) To choose the most complicated chart possible

26. What is the primary difference between a Data Engineer and a Data Scientist?
a) Data Engineers build and maintain the systems for data collection and
storage, while Data Scientists analyze that data to find insights
b) Data Scientists write better code than Data Engineers
c) Data Engineers focus on visualization, while Data Scientists focus on data cleaning
d) There is no difference

27. What does it mean if a dataset is 'skewed'?
a) The data is perfectly symmetrical
b) The data contains many outliers
c) The data's distribution is not symmetrical, with a tail pointing to the left or
right
d) The data is missing more than 50% of its values

28. Which of the following is a primary benefit of automating data wrangling tasks?
a) It introduces more human error
b) It makes the process less consistent
c) It saves time and ensures consistency for repetitive cleaning tasks
d) It is more expensive than manual cleaning

29. What is a 'confusion matrix' used for in machine learning?
a) To visualize the correlation between features
b) To summarize the performance of a classification model by showing true
vs. predicted classes
c) To plot the distribution of a variable
d) To select the best features for a model
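
To make the confusion matrix in question 29 concrete, the following Python sketch tabulates true against predicted classes for a tiny classifier output. The labels and predictions are invented for illustration, and `confusion_matrix` here is a hand-written helper, not a library function.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes, in `labels` order."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Hypothetical ground truth and model predictions.
y_true = ["spam", "spam", "ham", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "ham", "spam", "spam"]
labels = ["spam", "ham"]

matrix = confusion_matrix(y_true, y_pred, labels)

# Correct predictions sit on the diagonal, so accuracy is diagonal / total.
correct = sum(matrix[i][i] for i in range(len(labels)))
accuracy = correct / len(y_true)

for label, row in zip(labels, matrix):
    print(label, row)
print("accuracy:", accuracy)
```

Off-diagonal cells show which classes the model confuses with each other, which a single accuracy number hides.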

30. Which of the following is an ethical implication of data analysis?
a) Achieving high model accuracy
b) Ensuring data privacy and preventing misuse of personal information
c) Using open-source software
d) Publishing research findings

31. What is the main purpose of a box plot?
a) To show the relationship between two variables
b) To visualize the five-number summary (minimum, Q1, median, Q3,
maximum) and detect outliers
c) To compare parts of a whole
d) To display trends over time
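
The five-number summary mentioned in question 31 can be computed directly, as in the Python sketch below. Two assumptions are worth flagging: quartiles are taken with the "inclusive" method of `statistics.quantiles` (other quartile conventions give slightly different values), and outliers are flagged with the common 1.5 × IQR rule, which is a convention rather than something the question paper specifies. The sample data is made up.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return min, Q1, median, Q3, and max for a list of numbers."""
    ordered = sorted(values)
    q1, median, q3 = quantiles(ordered, n=4, method="inclusive")
    return {"min": ordered[0], "q1": q1, "median": median,
            "q3": q3, "max": ordered[-1]}


def iqr_outliers(values):
    """Flag points beyond 1.5 * IQR from the quartiles (assumed convention)."""
    s = five_number_summary(values)
    iqr = s["q3"] - s["q1"]
    low, high = s["q1"] - 1.5 * iqr, s["q3"] + 1.5 * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sample with one unusually large value.
data = [1, 3, 5, 7, 9, 11, 13, 40]
summary = five_number_summary(data)
outliers = iqr_outliers(data)
print(summary)
print(outliers)
```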

32. Why might a developer choose Python for a data science project?
a) Because it has very limited community support
b) Because it is a low-level programming language
c) Because of its extensive libraries (like Pandas, NumPy, Scikit-learn),
readability, and large community
d) Because it is primarily used for web design

33. What is the difference between 'correlation' and 'causation'?
a) They are the same thing
b) Correlation implies that one event caused another, while causation does not
c) Correlation indicates a relationship between two variables, but it does not
mean one causes the other
d) Causation is easier to prove than correlation

34. What is a 'dashboard' in the context of business intelligence?
a) A single, static chart
b) A tool for writing code
c) A visual interface that provides at-a-glance views of key performance
indicators (KPIs) relevant to a particular objective
d) A database for storing data

35. What is the main limitation of deleting rows with missing data?
a) It can introduce bias and lead to a significant loss of valuable information,
especially if the data is not missing at random
b) It is the most complex method to implement
c) It always improves the model's accuracy
d) It increases the size of the dataset

36. Which of the following best describes 'real-time analytics'?
a) Analyzing data that is a year old
b) The ability to process and analyze data as it is generated to provide
immediate insights
c) A type of data storage
d) The process of cleaning data once a month

37. What is 'data governance'?
a) The process of visualizing data
b) The overall management of the availability, usability, integrity, and
security of data used in an enterprise
c) A specific machine learning algorithm
d) A type of NoSQL database

38. Which chart is best for comparing a value against a target or benchmark?
a) Pie Chart
b) Bullet Chart
c) Scatter Plot
d) Line Chart

39. What is the purpose of 'cross-validation' in machine learning?
a) To make the model train faster
b) To assess how the results of a statistical analysis will generalize to an
independent dataset
c) To visualize the data
d) To collect more data

40. What does it mean to 'standardize' data?
a) To rescale it to a range between 0 and 1
b) To transform it so that it has a mean of 0 and a standard deviation of 1
c) To remove all outliers
d) To convert it to a different file format
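
For question 40, the z-score form of standardization can be sketched in a few lines of Python. This sketch assumes the population standard deviation (`statistics.pstdev`); some tools use the sample standard deviation instead, which changes the scaled values slightly. The `raw` values are invented.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to z-scores: subtract the mean, divide by the std dev."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("cannot standardize constant data")
    return [(v - mu) / sigma for v in values]


# Hypothetical measurements with mean 5 and population std dev 2.
raw = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z_scores = standardize(raw)
print(z_scores)
print(mean(z_scores), pstdev(z_scores))
```

Rescaling to the range 0 to 1 is a different transformation, usually called min-max scaling, and it does not in general produce a mean of 0.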

41. Which of these is NOT a goal of Exploratory Data Analysis (EDA)?
a) Discover patterns and relationships
b) Identify anomalies and outliers
c) Form hypotheses for further testing
d) Deploy a final, production-ready model

42. Why is 'data lineage' important?
a) It helps in understanding the data's origins, what happens to it, and where
it moves over time
b) It is a method for predicting future data trends
c) It is a type of data visualization
d) It is a security protocol

43. A company uses its historical sales data to build a model that forecasts sales for the
next three months. This is an example of:
a) Descriptive Analytics
b) Time Series Forecasting (a type of Predictive Analytics)
c) Diagnostic Analytics
d) Prescriptive Analytics

44. What is the main function of the 'WHERE' clause in an SQL query?
a) To sort the results
b) To group rows together
c) To specify which table to retrieve data from
d) To filter records that meet a specified condition
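
A small runnable illustration of the WHERE clause from question 44, using Python's built-in `sqlite3` module with an in-memory database. The `orders` table, its columns, and the rows are hypothetical and exist only for this sketch.

```python
import sqlite3

# In-memory database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("Asha", 250.0), ("Ben", 75.0), ("Chen", 120.0)],
)

# WHERE keeps only rows meeting the condition; ORDER BY then sorts them.
rows = conn.execute(
    "SELECT customer, amount FROM orders WHERE amount > ? ORDER BY amount",
    (100.0,),
).fetchall()
conn.close()

print(rows)
```

Note the division of labour in the query: FROM names the table, WHERE filters rows, and ORDER BY sorts whatever survives the filter.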

45. Which of the following is a characteristic of 'high-quality' data?
a) Inconsistent
b) Incomplete
c) Inaccurate
d) Consistent

46. What is the primary risk of having 'data silos' within an organization?
a) It promotes collaboration and a holistic view of the business
b) It leads to inconsistent and fragmented data, hindering effective
decision-making
c) It improves data security
d) It reduces data storage costs

47. What is the role of an 'API' in data collection?
a) It is a manual method for entering data
b) It allows different software applications to communicate and exchange
data in a structured way
c) It is a tool for visualizing data
d) It is a type of database

48. In visualization, what is the 'data-ink ratio'?
a) The total amount of ink used on a page
b) The proportion of a graphic's ink devoted to the non-redundant display of
data-information
c) The number of colors used in a chart
d) The size of the font used for labels

49. What is the key difference between supervised and unsupervised learning?
a) The presence or absence of labeled output data for training
b) The programming language used to implement the model
c) The size of the dataset
d) The speed of the algorithm

50. What is a 'future trend' in the field of data visualization?
a) A return to static, non-interactive charts
b) The use of Augmented Reality (AR) and Virtual Reality (VR) for
immersive data exploration
c) A decrease in the use of dashboards
d) Moving away from real-time data

51. What is the main purpose of a 'foreign key' in a relational database?
a) To uniquely identify a record within its own table
b) To create a link between two tables
c) To store text data
d) To define the data type of a column

52. Which of the following is a common challenge in data wrangling?
a) The data is always perfectly clean and structured
b) Deciding on the best strategy for handling complex issues like missing data
or outliers
c) The process is always fast and fully automated
d) There are too few tools available for the job

53. What is 'Natural Language Processing (NLP)'?
a) A field of AI that helps computers understand, interpret, and manipulate
human language
b) A programming language for statistics
c) A method for visualizing geographical data
d) A database query language

54. Why is Excel less suitable than Tableau or Power BI for large-scale enterprise BI?
a) Excel cannot create charts
b) Excel has limitations in handling very large datasets and lacks the
advanced interactive and collaborative features of dedicated BI tools
c) Excel is more expensive than Tableau and Power BI
d) Excel requires more coding knowledge

55. What does 'data transformation' involve?
a) Collecting data from a source
b) Changing the format, structure, or values of data
c) Deleting the data permanently
d) Storing the data in a warehouse

56. What is the primary advantage of using a graph database?
a) It is ideal for simple key-value lookups
b) It is optimized for storing and navigating complex relationships between
entities
c) It is the best choice for storing tabular data
d) It does not require any storage space

57. Which of the following is an example of 'ordinal' data?
a) Eye color (blue, brown, green)
b) Temperature in Celsius
c) Customer satisfaction ratings (e.g., "Very Dissatisfied", "Neutral", "Very
Satisfied")
d) The weight of a person

58. What is a 'data pipeline'?
a) A single visualization of data
b) A system for moving data from a source to a destination, often involving
steps like transformation and cleaning
c) A machine learning model
d) A summary report

59. Why is it important to avoid misleading scales in charts (e.g., truncating the
y-axis)?
a) Because it can exaggerate differences and misrepresent the data's story
b) Because it makes the chart look better
c) Because it is a requirement for all chart types
d) Because it saves ink
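
The effect of a truncated y-axis in question 59 can be quantified with simple arithmetic, sketched here in Python. The two values and the baseline of 95 are invented, and `bar_height_ratio` is a hypothetical helper that compares drawn bar heights and not the underlying values.

```python
def bar_height_ratio(a, b, baseline=0.0):
    """Ratio of drawn bar heights for b versus a when the axis starts at baseline."""
    if a <= baseline:
        raise ValueError("baseline must lie below the smaller value")
    return (b - baseline) / (a - baseline)


# Hypothetical values that differ by about 2%.
value_a, value_b = 98.0, 100.0

honest = bar_height_ratio(value_a, value_b)                    # axis starts at 0
truncated = bar_height_ratio(value_a, value_b, baseline=95.0)  # axis starts at 95

print(f"axis from 0:  second bar is {honest:.2f}x the first")
print(f"axis from 95: second bar is {truncated:.2f}x the first")
```

With the axis starting at 95, the second bar is drawn about 1.67 times as tall as the first even though the underlying values differ by only about 2%.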

60. What is the ultimate goal of data analytics in a business context?
a) To collect as much data as possible
b) To use data to make more informed decisions that drive business value
c) To create the most complex models possible
d) To hire more data scientists
