Spark SQL with Scala Code Examples
Spark SQL
Code Examples
Background
• Spark SQL is Spark's module for
working with structured data.
• Spark SQL lets you query structured
data inside Spark programs, using
either SQL or a familiar DataFrame API.
Usable in Java, Scala, Python and R.
• Born out of the Shark project at Berkeley
Assumptions
These slides and examples assume you
already have at least a basic understanding
of Spark constructs such as RDDs, Actions,
and Transformations.
Resources
To learn more about Spark, check out
supergloo's free Spark Tutorials
Introduction
• DataFrames are a kind of Resilient Distributed Dataset (RDD)
• DataFrames are composed of Row objects accompanied by a
schema that describes the data type of each
column (see the sketch after this list)
• A DataFrame may be considered similar to a table in a
traditional relational database
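To make the Row-plus-schema relationship concrete, here is a minimal sketch for the Spark 1.x spark-shell (where sc and sqlContext are predefined); the column names and values are hypothetical, not from the examples below:

scala> import org.apache.spark.sql.Row
scala> import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

// An RDD of Row objects (hypothetical baby-name data)
scala> val rows = sc.parallelize(Seq(Row("DAVID", 2013, 272), Row("JAYDEN", 2013, 268)))

// A schema describing the data type of each column
scala> val schema = StructType(Seq(StructField("Name", StringType), StructField("Year", IntegerType), StructField("Count", IntegerType)))

// Rows + schema = DataFrame
scala> val df = sqlContext.createDataFrame(rows, schema)
scala> df.printSchema()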
Spark SQL with CSV
1. $SPARK_HOME/bin/spark-shell --packages com.databricks:spark-csv_2.10:1.3.0
2. scala> val baby_names = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("baby_names.csv")
3. scala> baby_names.registerTempTable("names")
4. scala> val distinctYears = sqlContext.sql("select distinct Year from names")
5. scala> distinctYears.collect.foreach(println)
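The same query can also be expressed with the DataFrame API instead of SQL, without registering a temp table; a minimal sketch, assuming the baby_names DataFrame from step 2 above:

// DataFrame API equivalent of "select distinct Year from names"
scala> val distinctYearsDF = baby_names.select("Year").distinct
scala> distinctYearsDF.collect.foreach(println)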
Spark SQL with JSON (slide 1 of 2)
JSON used in the following examples:
{"first_name":"James", "last_name":"Butterburg", "address": {"street": "6649 N Blue Gum St", "city": "New Orleans", "state": "LA", "zip": "70116" }}
{"first_name":"Josephine", "last_name":"Darakjy", "address": {"street": "4 B Blue Ridge Blvd", "city": "Brighton", "state": "MI", "zip": "48116" }}
{"first_name":"Art", "last_name":"Chemel", "address": {"street": "8 W Cerritos Ave #54", "city": "Bridgeport", "state": "NJ", "zip": "08014" }}
Spark SQL with JSON (slide 2 of 2)
1. $SPARK_HOME/bin/spark-shell
2. scala> val customers = sqlContext.jsonFile("customers.json")
3. scala> customers.registerTempTable("customers")
4. scala> val firstCityState = sqlContext.sql("SELECT first_name, address.city, address.state FROM customers")
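Because the schema is inferred from the JSON, the nested address object becomes a struct column, which is why dot notation such as address.city works in the query above. A quick way to confirm (output sketched from the sample data; abbreviated):

scala> customers.printSchema()
root
 |-- address: struct (nullable = true)
 |    |-- city: string (nullable = true)
 |    |-- state: string (nullable = true)
 |    |-- street: string (nullable = true)
 |    |-- zip: string (nullable = true)
 |-- first_name: string (nullable = true)
 |-- last_name: string (nullable = true)

scala> firstCityState.collect.foreach(println)
[James,New Orleans,LA]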
Spark SQL with JDBC MySQL (slide 1 of 2)
Requirements
1. MySQL instance
2. MySQL JDBC driver
Spark SQL with JDBC MySQL (slide 2 of 2)
1. $SPARK_HOME/bin/spark-shell --jars mysql-connector-java-5.1.26.jar
2. scala> val dataframe_mysql = sqlContext.read.format("jdbc").option("url", "jdbc:mysql://localhost/sparksql").option("driver", "com.mysql.jdbc.Driver").option("dbtable", "baby_names").option("user", "root").option("password", "root").load()
3. scala> dataframe_mysql.registerTempTable("names")
4. scala> dataframe_mysql.sqlContext.sql("select * from names").collect.foreach(println)
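A JDBC-backed DataFrame supports the same DataFrame API as the other sources; a minimal sketch, assuming the baby_names table has a Count column (an assumption, not shown above). Simple filters like this can be pushed down to MySQL as a WHERE clause:

// Filter evaluated against the MySQL table where possible
scala> val popular = dataframe_mysql.filter("Count > 100")
scala> popular.show(5)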
Conclusion
For more Spark SQL and other Spark tutorials visit:
http://www.supergloo.com/
Credit
Title slide image: https://flic.kr/p/8wFrUX