SparkSQL is a Spark component that allows SQL queries to be executed on Spark. It is built on Catalyst, an execution planning framework for relational operations that covers SQL parsing, logical optimization, and physical planning. Catalyst defines logical and physical operators, expressions, and data types, and it provides rule-based optimizations that transform query plans. The SQL core in SparkSQL builds SchemaRDDs to represent queries and supports reading from and writing to the Parquet and JSON formats.
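
To make the idea of rule-based plan transformation concrete, here is a minimal sketch in plain Python. It is a toy model and not Catalyst's actual API: the names `Literal`, `Attribute`, `Add`, `transform_up`, `fold_constants`, `drop_zero`, and `optimize` are all hypothetical and chosen only for illustration. It assumes an expression tree of immutable nodes and applies a batch of rewrite rules repeatedly until the tree stops changing.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


# Hypothetical expression nodes (illustrative only, not Catalyst classes).
@dataclass(frozen=True)
class Literal:
    value: int


@dataclass(frozen=True)
class Attribute:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Literal, Attribute, Add]
Rule = Callable[[Expr], Expr]


def transform_up(expr: Expr, rule: Rule) -> Expr:
    """Rewrite the children first, then apply the rule to the node itself."""
    if isinstance(expr, Add):
        expr = Add(transform_up(expr.left, rule), transform_up(expr.right, rule))
    return rule(expr)


def fold_constants(expr: Expr) -> Expr:
    """Rule: replace an addition of two literals with a single literal."""
    if (
        isinstance(expr, Add)
        and isinstance(expr.left, Literal)
        and isinstance(expr.right, Literal)
    ):
        return Literal(expr.left.value + expr.right.value)
    return expr


def drop_zero(expr: Expr) -> Expr:
    """Rule: simplify `e + 0` and `0 + e` to `e`."""
    if isinstance(expr, Add):
        if expr.left == Literal(0):
            return expr.right
        if expr.right == Literal(0):
            return expr.left
    return expr


def optimize(expr: Expr, rules: List[Rule], max_iterations: int = 100) -> Expr:
    """Apply the batch of rules repeatedly until the tree no longer changes."""
    for _ in range(max_iterations):
        new_expr = expr
        for rule in rules:
            new_expr = transform_up(new_expr, rule)
        if new_expr == expr:  # fixed point reached
            return expr
        expr = new_expr
    return expr


# x + (1 + -1): folding the constants gives x + 0, which then simplifies to x.
plan = Add(Attribute("x"), Add(Literal(1), Literal(-1)))
optimized = optimize(plan, [fold_constants, drop_zero])
print(optimized)
```

Each rule here is a small function from tree to tree, so rules can be written and tested independently and then combined in a batch. Iterating to a fixed point lets one rule expose opportunities for another, as when constant folding produces the `0` that the second rule then removes.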