Spark SQL with Scala Code Examples
Spark SQL
Code Examples
Background
• Spark SQL is Spark's module for
working with structured data.
• Spark SQL lets you query structured
data inside Spark programs, using
either SQL or a familiar DataFrame API.
Usable in Java, Scala, Python and R.
• Born out of the Shark project at Berkeley
Assumptions
These slides and examples assume you
already have at least a basic understanding
of Spark constructs such as RDDs, Actions,
and Transformations.
Resources
To learn more about Spark, check out
supergloo's free Spark Tutorials
Introduction
• DataFrames are a kind of Resilient Distributed Dataset (RDD)
• DataFrames are composed of Row objects accompanied by a
schema that describes the data type of each
column (see the sketch after this list)
• A DataFrame may be considered similar to a table in a
traditional relational database
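To make the Row-plus-schema relationship concrete, here is a minimal sketch for the Spark 1.x spark-shell (where sc and sqlContext are predefined); the column names and values are hypothetical, not from the examples below:

scala> import org.apache.spark.sql.Row
scala> import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

// An RDD of Row objects (hypothetical baby-name data)
scala> val rows = sc.parallelize(Seq(Row("DAVID", 2013, 272), Row("JAYDEN", 2013, 268)))

// A schema describing the data type of each column
scala> val schema = StructType(Seq(StructField("Name", StringType), StructField("Year", IntegerType), StructField("Count", IntegerType)))

// Rows + schema = DataFrame
scala> val df = sqlContext.createDataFrame(rows, schema)
scala> df.printSchema()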
Spark SQL with CSV
1. $SPARK_HOME/bin/spark-shell --packages com.databricks:spark-csv_2.10:1.3.0
2. scala> val baby_names = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("baby_names.csv")
3. scala> baby_names.registerTempTable("names")
4. scala> val distinctYears = sqlContext.sql("select distinct Year from names")
5. scala> distinctYears.collect.foreach(println)
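The same query can also be expressed with the DataFrame API instead of SQL, without registering a temp table; a minimal sketch, assuming the baby_names DataFrame from step 2 above:

// DataFrame API equivalent of "select distinct Year from names"
scala> val distinctYearsDF = baby_names.select("Year").distinct
scala> distinctYearsDF.collect.foreach(println)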
Spark SQL with JSON (slide 1 of 2)
JSON used in the following examples:
{"first_name":"James", "last_name":"Butterburg", "address": {"street": "6649 N Blue Gum St", "city": "New Orleans", "state": "LA", "zip": "70116" }}
{"first_name":"Josephine", "last_name":"Darakjy", "address": {"street": "4 B Blue Ridge Blvd", "city": "Brighton", "state": "MI", "zip": "48116" }}
{"first_name":"Art", "last_name":"Chemel", "address": {"street": "8 W Cerritos Ave #54", "city": "Bridgeport", "state": "NJ", "zip": "08014" }}
Spark SQL with JSON (slide 2 of 2)
1. $SPARK_HOME/bin/spark-shell
2. scala> val customers = sqlContext.jsonFile("customers.json")
3. scala> customers.registerTempTable("customers")
4. scala> val firstCityState = sqlContext.sql("SELECT first_name, address.city, address.state FROM customers")
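Because the schema is inferred from the JSON, the nested address object becomes a struct column, which is why dot notation such as address.city works in the query above. A quick way to confirm (output sketched from the sample data; abbreviated):

scala> customers.printSchema()
root
 |-- address: struct (nullable = true)
 |    |-- city: string (nullable = true)
 |    |-- state: string (nullable = true)
 |    |-- street: string (nullable = true)
 |    |-- zip: string (nullable = true)
 |-- first_name: string (nullable = true)
 |-- last_name: string (nullable = true)

scala> firstCityState.collect.foreach(println)
[James,New Orleans,LA]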
Spark SQL with JDBC MySQL (slide 1 of 2)
Requirements
1. MySQL instance
2. MySQL JDBC driver
Spark SQL with JDBC MySQL (slide 2 of 2)
1. $SPARK_HOME/bin/spark-shell --jars mysql-connector-java-5.1.26.jar
2. scala> val dataframe_mysql = sqlContext.read.format("jdbc").option("url", "jdbc:mysql://localhost/sparksql").option("driver", "com.mysql.jdbc.Driver").option("dbtable", "baby_names").option("user", "root").option("password", "root").load()
3. scala> dataframe_mysql.registerTempTable("names")
4. scala> dataframe_mysql.sqlContext.sql("select * from names").collect.foreach(println)
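A JDBC-backed DataFrame supports the same DataFrame API as the other sources; a minimal sketch, assuming the baby_names table has a Count column (an assumption, not shown above). Simple filters like this can be pushed down to MySQL as a WHERE clause:

// Filter evaluated against the MySQL table where possible
scala> val popular = dataframe_mysql.filter("Count > 100")
scala> popular.show(5)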
Conclusion
For more Spark SQL and other Spark tutorials visit:
http://www.supergloo.com/
Credit
Title slide image: https://flic.kr/p/8wFrUX