BigQuery Interview Cheat Sheet
Beginner Level
Q: What is Google BigQuery, and how does it differ from traditional databases?
A: Google BigQuery is a fully managed, serverless data warehouse for running SQL queries over very large datasets.
Unlike traditional row-oriented databases, which the user typically provisions and tunes, BigQuery stores data in a
columnar format, executes queries with the distributed Dremel engine, and bills on a pay-per-use basis (by bytes
scanned under on-demand pricing).
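Example: the pay-per-use model can be illustrated with a rough cost estimate based on bytes scanned. This is a
minimal sketch; the rate below is a placeholder assumption, not a quoted price, and the function name is hypothetical.

```python
# Assumed on-demand rate in USD per TiB scanned. This is a placeholder:
# real prices vary by region and change over time.
ASSUMED_USD_PER_TIB = 6.25
TIB = 2 ** 40  # bytes in one tebibyte


def estimate_query_cost(bytes_scanned: int, usd_per_tib: float = ASSUMED_USD_PER_TIB) -> float:
    """Return the estimated cost in USD for a query that scans `bytes_scanned` bytes."""
    return bytes_scanned / TIB * usd_per_tib


# Scanning half a TiB costs half the per-TiB rate.
half_tib_cost = estimate_query_cost(TIB // 2)
```

The point for an interview answer is that cost follows the data scanned, so selecting fewer columns or pruning
partitions lowers the bill directly.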
Q: What are the key features of BigQuery?
A: Serverless architecture, columnar storage, automatic scaling, GoogleSQL (standard SQL) support, built-in machine
learning through BigQuery ML, and integration with other GCP services such as Cloud Storage and Dataflow.
Q: What are datasets, tables, and schemas in BigQuery?
A: A dataset is a container for tables; a table holds structured data; a schema defines the structure of the table with
column names and types.
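Example: a schema is commonly written as a JSON list of fields, each with a name, a type, and a mode. The sketch
below builds one for a hypothetical orders table; the column names are illustrative assumptions.

```python
import json

# Hypothetical schema for an illustrative `orders` table.
# Each field has a name, a data type, and a mode (REQUIRED, NULLABLE, or REPEATED).
ORDERS_SCHEMA = [
    {"name": "order_id", "type": "STRING", "mode": "REQUIRED"},
    {"name": "customer_id", "type": "STRING", "mode": "NULLABLE"},
    {"name": "amount", "type": "NUMERIC", "mode": "NULLABLE"},
    {"name": "created_at", "type": "TIMESTAMP", "mode": "REQUIRED"},
]


def column_names(schema):
    """Return the column names in schema order."""
    return [field["name"] for field in schema]


# Serialized form, suitable for saving as a schema file.
schema_json = json.dumps(ORDERS_SCHEMA, indent=2)
```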
Q: Explain the difference between partitioned and clustered tables in BigQuery.
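A: Partitioned tables split data into segments by a column value (e.g., a date), by ingestion time, or by an integer
range, so queries that filter on the partitioning column scan only the matching partitions. Clustered tables sort
data by one or more columns, within each partition if the table is also partitioned, which speeds up filters and
aggregations on those columns. The two can be used separately or together.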
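Example: the effect of partition pruning can be sketched with a toy in-memory model. This is not how BigQuery is
implemented; the rows, sizes, and function names are assumptions for illustration. In GoogleSQL the equivalent
table options are the PARTITION BY and CLUSTER BY clauses of CREATE TABLE.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (event_date, customer_id, payload_bytes)
ROWS = [
    (date(2024, 1, 1), "b", 150),
    (date(2024, 1, 1), "a", 100),
    (date(2024, 1, 2), "a", 200),
    (date(2024, 1, 3), "c", 250),
]


def build_partitions(rows):
    """Group rows by date (partitioning), then sort each group by customer_id (clustering)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[0]].append(row)
    for part in partitions.values():
        part.sort(key=lambda r: r[1])
    return dict(partitions)


def bytes_scanned(partitions, wanted_date=None):
    """Sum the bytes of every partition that the date filter cannot rule out."""
    return sum(
        row[2]
        for day, part in partitions.items()
        if wanted_date is None or day == wanted_date
        for row in part
    )


PARTITIONS = build_partitions(ROWS)
full_scan = bytes_scanned(PARTITIONS)                      # no filter: every partition is read
pruned_scan = bytes_scanned(PARTITIONS, date(2024, 1, 1))  # filter: one partition is read
```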
Q: How does BigQuery store and process data?
A: Data is stored in a columnar format (Capacitor) on the Colossus distributed file system and processed by Dremel,
which splits each query into stages and runs them in parallel across many workers.
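Example: a toy sketch of the two ideas in this answer, reading only the column a query needs and aggregating it in
parallel shards whose partial results are then combined. The table contents and function names are assumptions, and
a thread pool stands in for distributed workers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table stored column by column rather than row by row.
TABLE = {
    "customer_id": ["a", "b", "a", "c", "b", "a"],
    "amount": [10, 20, 30, 40, 50, 60],
}


def shard(values, n_shards):
    """Split a column into n_shards interleaved slices."""
    return [values[i::n_shards] for i in range(n_shards)]


def parallel_sum(column, n_workers=3):
    """Compute SUM(column) by summing shards in parallel and merging the partial sums."""
    values = TABLE[column]  # only the requested column is touched
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(sum, shard(values, n_workers)))
    return sum(partials)


total_amount = parallel_sum("amount")
```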
Intermediate Level
Q: What is a BigQuery slot, and how does it impact performance?
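A: A slot is a virtual CPU that BigQuery uses to execute SQL queries. More slots allow more of a query to run in
parallel, which usually shortens execution time, although the gain depends on how well the query parallelizes. With
on-demand pricing, slots are allocated dynamically from a shared pool; with capacity-based pricing (reservations,
formerly flat-rate), you pay for a set amount of slot capacity.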
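Example: a deliberately simplified model of why more slots help only up to a point. It assumes a query consists of
equal-sized, independent work units and that each slot handles one unit at a time; the real scheduler is dynamic,
and the function name and parameters are hypothetical.

```python
import math


def estimated_wall_seconds(work_units: int, slots: int, seconds_per_unit: float = 1.0) -> float:
    """Estimate wall-clock time when `work_units` equal tasks run on `slots` parallel slots."""
    if slots < 1:
        raise ValueError("slots must be at least 1")
    waves = math.ceil(work_units / slots)  # number of rounds needed to finish all units
    return waves * seconds_per_unit


with_100_slots = estimated_wall_seconds(1000, 100)
with_2000_slots = estimated_wall_seconds(1000, 2000)  # extra slots beyond the work units sit idle
```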
Q: What is a materialized view, and how does it differ from a regular view?
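A: Materialized views store precomputed query results, which BigQuery refreshes periodically, so frequent queries
read the stored results instead of recomputing them. Regular (logical) views store only the query definition and run
it against the base tables on every access.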
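Example: the difference can be sketched with two small classes that count how often the underlying query runs. This
is a toy model with hypothetical class names; in BigQuery the refresh is managed by the service.

```python
class RegularView:
    """Stores only the query logic and recomputes the result on every read."""

    def __init__(self, base_rows, query):
        self.base_rows = base_rows
        self.query = query
        self.computations = 0

    def read(self):
        self.computations += 1
        return self.query(self.base_rows)


class MaterializedView:
    """Stores the query result and recomputes it only when refreshed."""

    def __init__(self, base_rows, query):
        self.base_rows = base_rows
        self.query = query
        self.computations = 0
        self._result = None
        self.refresh()

    def refresh(self):
        self.computations += 1
        self._result = self.query(self.base_rows)

    def read(self):
        return self._result


rows = [1, 2, 3]
view = RegularView(rows, sum)
mview = MaterializedView(rows, sum)
for _ in range(3):
    view.read()
    mview.read()
```

After three reads the regular view has run its query three times, while the materialized view has run it once.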
Q: How do you handle duplicate records in BigQuery?
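A: Use SELECT DISTINCT or GROUP BY to remove exact duplicate rows, or ROW_NUMBER() with PARTITION BY to deduplicate
by key. ROW_NUMBER() is useful for keeping the most recent record per key: number the rows within each key by
timestamp in descending order and keep only row 1.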
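Example: the ROW_NUMBER() pattern, run here against an in-memory SQLite database as a stand-in for BigQuery (this
needs SQLite 3.25 or later for window functions). The table and column names are assumptions.

```python
import sqlite3

# Keep the newest row per id: rank rows within each id by updated_at descending, keep rank 1.
DEDUP_SQL = """
SELECT id, status, updated_at
FROM (
  SELECT
    id, status, updated_at,
    ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rn
  FROM events
)
WHERE rn = 1
ORDER BY id
"""


def latest_per_key(rows):
    """Load rows into a temporary table and return the most recent row for each id."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE events (id INTEGER, status TEXT, updated_at TEXT)")
        conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
        return conn.execute(DEDUP_SQL).fetchall()
    finally:
        conn.close()


SAMPLE = [
    (1, "new", "2024-01-01"),
    (1, "shipped", "2024-01-03"),
    (2, "new", "2024-01-02"),
    (2, "new", "2024-01-02"),  # exact duplicate of the previous row
]
deduped = latest_per_key(SAMPLE)
```

Unlike DISTINCT, this pattern also collapses rows that share a key but differ in other columns.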
Advanced Level
Q: How does BigQuery handle schema changes?
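A: You can add new columns (as NULLABLE or REPEATED) and relax a REQUIRED column to NULLABLE in place. Columns can
also be renamed or dropped with ALTER TABLE RENAME COLUMN and ALTER TABLE DROP COLUMN. Changes outside these, such as
most data type conversions, require creating a new table with the desired schema.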
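Example: adding columns is the simplest kind of schema change. The sketch below checks whether a proposed schema is
purely additive, assuming that added columns must be NULLABLE or REPEATED. The function name and sample schemas are
hypothetical.

```python
def is_additive_change(old_schema, new_schema):
    """Return True if new_schema keeps every old field unchanged and
    only adds NULLABLE or REPEATED fields."""
    new_by_name = {field["name"]: field for field in new_schema}
    # Every existing field must still be present with the same definition.
    for field in old_schema:
        if new_by_name.get(field["name"]) != field:
            return False
    # Any newly added field must not be REQUIRED.
    old_names = {field["name"] for field in old_schema}
    added = [field for field in new_schema if field["name"] not in old_names]
    return all(field.get("mode", "NULLABLE") in ("NULLABLE", "REPEATED") for field in added)


OLD = [{"name": "id", "type": "STRING", "mode": "REQUIRED"}]
WITH_NOTE = OLD + [{"name": "note", "type": "STRING", "mode": "NULLABLE"}]
WITH_REQUIRED = OLD + [{"name": "qty", "type": "INT64", "mode": "REQUIRED"}]
RENAMED = [{"name": "ident", "type": "STRING", "mode": "REQUIRED"}]
```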
Q: How does query caching work in BigQuery?
A: If the exact same query text is re-run within about 24 hours, the referenced tables have not changed, and the
query uses no non-deterministic functions such as CURRENT_TIMESTAMP(), BigQuery returns the cached result at no
charge.
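Example: the caching rules can be sketched as a small lookup keyed on the exact query text, with an expiry time and
a table-version check. This is a toy model, not BigQuery's implementation; the class name, the integer table
version, and the numeric clock are assumptions.

```python
CACHE_TTL_SECONDS = 24 * 60 * 60  # results are reusable for roughly 24 hours


class QueryCache:
    """Toy result cache: a hit needs identical query text, an unexpired entry, and unchanged data."""

    def __init__(self):
        self._entries = {}  # query text -> (result, cached_at, table_version)

    def put(self, query, result, now, table_version):
        self._entries[query] = (result, now, table_version)

    def get(self, query, now, table_version):
        entry = self._entries.get(query)
        if entry is None:
            return None  # never cached, or the query text differs
        result, cached_at, cached_version = entry
        expired = now - cached_at >= CACHE_TTL_SECONDS
        data_changed = cached_version != table_version
        if expired or data_changed:
            del self._entries[query]
            return None
        return result


cache = QueryCache()
cache.put("SELECT COUNT(*) FROM t", [42], now=0, table_version=1)
hit = cache.get("SELECT COUNT(*) FROM t", now=3600, table_version=1)
miss_text = cache.get("select count(*) from t", now=3600, table_version=1)
```

Note that even a change in letter case counts as a different query text in this model, which mirrors the
exact-match requirement described above.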