Formatted BigQuery CheatSheet

The document is a cheat sheet for BigQuery interview questions, covering beginner to advanced levels. It explains key concepts such as the differences between BigQuery and traditional databases, features like serverless architecture and columnar storage, and details on handling schema changes and duplicate records. Additionally, it addresses performance aspects related to slots and query caching.

BigQuery Interview Cheat Sheet

Beginner Level

Q: What is Google BigQuery, and how does it differ from traditional databases?
A: Google BigQuery is a fully managed, serverless data warehouse that allows users to run SQL queries on massive
datasets with high speed and efficiency. Unlike traditional databases, BigQuery uses columnar storage, distributed
processing via Dremel, and a pay-per-use model.

Q: What are the key features of BigQuery?
A: Serverless architecture, columnar storage, automatic scaling, support for standard SQL, BigQuery ML integration,
and seamless integration with GCP services.

Q: What are datasets, tables, and schemas in BigQuery?
A: A dataset is a container for tables; a table holds structured data; a schema defines the structure of the table with
column names and types.

Q: Explain the difference between partitioned and clustered tables in BigQuery.
A: Partitioned tables split data into segments by a partitioning column (e.g., a date), by ingestion time, or by an
integer range, so queries that filter on the partitioning column scan less data. Clustered tables sort data by up to
four columns (within each partition, if the table is also partitioned), which speeds up filters and aggregations on
those columns.
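
The effect of partitioning and clustering on scanned data can be illustrated with a toy model in Python. This is only a sketch: the rows, the `build_partitions` and `query` helpers, and the row-level early stop are assumptions for illustration, and BigQuery itself prunes at the level of partitions and storage blocks rather than individual rows.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (event_date, customer_id, amount).
SAMPLE_ROWS = [
    (date(2024, 1, 1), "c2", 10),
    (date(2024, 1, 1), "c1", 20),
    (date(2024, 1, 2), "c1", 30),
    (date(2024, 1, 2), "c3", 40),
    (date(2024, 1, 3), "c2", 50),
]


def build_partitions(rows):
    """Group rows by date (partitioning), then sort each group by customer_id (clustering)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[0]].append(row)
    for part in partitions.values():
        part.sort(key=lambda r: r[1])
    return dict(partitions)


def query(partitions, day, customer_id):
    """Return (matching rows, number of rows read) for a filter on day and customer_id."""
    scanned = 0
    matches = []
    # Partition pruning: only the partition for `day` is opened at all.
    for row in partitions.get(day, []):
        scanned += 1
        if row[1] == customer_id:
            matches.append(row)
        elif row[1] > customer_id:
            # Rows are sorted by customer_id, so nothing later can match.
            break
    return matches, scanned


partitions = build_partitions(SAMPLE_ROWS)
matches, scanned = query(partitions, date(2024, 1, 2), "c1")
print(matches, scanned)
```

In this model the query reads two of the five rows, because the other two date partitions are never opened.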

Q: How does BigQuery store and process data?
A: Data is stored in a columnar format (Capacitor) on Colossus, Google's distributed file system, and processed by
Dremel, which parallelizes queries across many workers.
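
A minimal Python sketch of why columnar layout helps analytical queries: an aggregate over one column only has to read that column's values. The sample rows and the `to_columns` and `sum_column` helpers are hypothetical and do not reflect BigQuery's actual storage format.

```python
# Hypothetical row-oriented records.
ORDER_ROWS = [
    {"user": "a", "country": "DE", "amount": 10},
    {"user": "b", "country": "FR", "amount": 20},
    {"user": "c", "country": "DE", "amount": 30},
]


def to_columns(rows):
    """Pivot row-oriented records into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_column(columns, name):
    """Sum a single column; return (total, number of values read)."""
    values = columns[name]
    return sum(values), len(values)


columns = to_columns(ORDER_ROWS)
total, values_read = sum_column(columns, "amount")
print(total, values_read)
```

Summing `amount` touches three values here, whereas a row-oriented scan would read all nine fields.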

Intermediate Level

Q: What is a BigQuery slot, and how does it impact performance?
A: A slot is a virtual CPU that BigQuery uses to execute SQL queries. More slots generally let more of a query's work
run in parallel. Slots are allocated dynamically from a shared pool under on-demand pricing, or reserved as fixed
capacity under flat-rate (now capacity-based) pricing.

Q: What is a materialized view, and how does it differ from a regular view?
A: Materialized views store precomputed results and are faster for frequent queries. Regular views only store query logic
and compute results on each access.
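
The difference can be sketched in Python as a toy model. The `RegularView` and `MaterializedView` classes below are hypothetical; in particular, the explicit `refresh` call is a simplification, since BigQuery manages materialized view refresh itself.

```python
class RegularView:
    """Stores only the query logic; the query runs on every read."""

    def __init__(self, query):
        self._query = query

    def read(self, table):
        return self._query(table)


class MaterializedView:
    """Stores the precomputed result; reads do not re-run the query."""

    def __init__(self, query, table):
        self._query = query
        self._result = query(table)

    def read(self):
        return self._result

    def refresh(self, table):
        # Explicit here for illustration only.
        self._result = self._query(table)


executions = []


def total_amount(table):
    """Hypothetical 'expensive' query that records each execution."""
    executions.append(1)
    return sum(table)


amounts = [1, 2, 3]
regular = RegularView(total_amount)
materialized = MaterializedView(total_amount, amounts)  # runs the query once
regular.read(amounts)
regular.read(amounts)  # runs the query again
materialized.read()
materialized.read()  # served from the stored result
print(len(executions))
```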

Q: How do you handle duplicate records in BigQuery?
A: Use DISTINCT, ROW_NUMBER(), or GROUP BY to deduplicate. DISTINCT removes rows that are exact duplicates;
ROW_NUMBER() is useful for keeping the most recent record per key.
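
As a sketch of the ROW_NUMBER() approach, the Python snippet below holds an example query in `DEDUP_SQL` and mirrors its logic in plain Python so the behavior can be checked locally. The table name `my_project.my_dataset.events`, the columns `id` and `updated_at`, and the `latest_per_key` helper are all hypothetical.

```python
from datetime import datetime

# BigQuery SQL for the same idea; table and column names are hypothetical.
DEDUP_SQL = """
SELECT * EXCEPT (rn)
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rn
  FROM `my_project.my_dataset.events`
)
WHERE rn = 1
"""


def latest_per_key(rows, key="id", order_by="updated_at"):
    """Keep one row per key: the one with the greatest order_by value."""
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or row[order_by] > current[order_by]:
            latest[row[key]] = row
    return list(latest.values())


event_rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1), "status": "new"},
    {"id": 1, "updated_at": datetime(2024, 1, 3), "status": "shipped"},
    {"id": 2, "updated_at": datetime(2024, 1, 2), "status": "new"},
    {"id": 1, "updated_at": datetime(2024, 1, 2), "status": "paid"},
]
for row in latest_per_key(event_rows):
    print(row["id"], row["status"])
```

If two rows for the same key share the same timestamp, both the SQL and this helper pick one of them arbitrarily unless a tie-breaking column is added to the ordering.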

Advanced Level

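Q: How does BigQuery handle schema changes?
A: You can add new columns and relax a column from REQUIRED to NULLABLE. Columns can also be renamed or dropped
with ALTER TABLE ... RENAME COLUMN and ALTER TABLE ... DROP COLUMN. For changes that are not supported in place,
such as most data type changes, create a new table with the desired schema.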

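Q: How does query caching work in BigQuery?
A: If an identical query is re-run within about 24 hours and the underlying tables haven't changed, BigQuery returns
the cached result free of charge. Queries that use non-deterministic functions such as CURRENT_TIMESTAMP() are not
cached.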
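
The caching rule can be modeled with a small Python sketch. The `QueryCache` class, the `table_version` argument (standing in for "has the underlying data changed"), and the injectable clock are assumptions for illustration, not BigQuery's implementation.

```python
import time

CACHE_TTL_SECONDS = 24 * 60 * 60  # roughly the 24-hour window described above


class QueryCache:
    """Toy result cache keyed by exact query text."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._entries = {}  # sql -> (result, table_version, stored_at)

    def run(self, sql, table_version, execute):
        """Return (result, served_from_cache)."""
        now = self._clock()
        entry = self._entries.get(sql)
        if entry is not None:
            result, cached_version, stored_at = entry
            fresh = now - stored_at < CACHE_TTL_SECONDS
            if cached_version == table_version and fresh:
                return result, True
        # Cache miss: run the query and remember the result.
        result = execute(sql)
        self._entries[sql] = (result, table_version, now)
        return result, False


fake_now = [0.0]
cache = QueryCache(clock=lambda: fake_now[0])
run_count = []


def execute(sql):
    """Hypothetical executor that returns how many times it has run."""
    run_count.append(sql)
    return len(run_count)


print(cache.run("SELECT 1", 1, execute))  # first run, not cached
print(cache.run("SELECT 1", 1, execute))  # same text, same data: cached
print(cache.run("SELECT 1", 2, execute))  # data changed: re-executed
```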
