Python Crash Course
Instructor: Li Yang
Python
• This lecture covers some basic manipulation of Python
• Python is an interpreted high-level general-purpose programming
language. Its design philosophy emphasizes code readability with its
use of significant indentation.
• We use Databricks notebooks, but you can also use other …
Databricks Notebook
• Notebooks use .ipynb files and also .py files
• You can’t open a .ipynb or .py file directly; you need a notebook to
open it
• Click on Workspace (left column)
• Right-click and choose the Import option
• Browse your local computer and upload the .ipynb or .py file
Databricks File System (DBFS)
• First, type %fs at the start of a notebook cell
• Basic manipulations (commonly used ones; see the sketch after this list):
• ls command: lists the contents of a directory
• cp command: copy a file or directory
• mkdirs command: create a directory if it doesn’t exist
• mv command: move a file or directory
• rm command: remove a file or directory
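A minimal sketch of the same operations through the Python API dbutils.fs, which is predefined in Databricks notebooks; the paths here (e.g. /FileStore/tables/demo.csv) are hypothetical placeholders:

dbutils.fs.ls("/FileStore/tables/")          # ls: list the contents of a directory
dbutils.fs.mkdirs("/FileStore/archive/")     # mkdirs: create a directory if it doesn't exist
dbutils.fs.cp("/FileStore/tables/demo.csv", "/FileStore/archive/demo.csv")  # cp: copy a file
dbutils.fs.mv("/FileStore/archive/demo.csv", "/FileStore/archive/old.csv")  # mv: move/rename
dbutils.fs.rm("/FileStore/archive/old.csv")  # rm: remove a file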
Read data from Databricks
• If your data is not on Databricks, you need to upload it first:
• Click the Data button in the left column
• Create a table
• Upload the file
• Create a table with the UI and choose a cluster for the table preview
• Preview the table. Then open a notebook and type %fs ls /FileStore/tables/
to check the data on Databricks
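The same check can be run in Python; display and dbutils are predefined in Databricks notebooks:

display(dbutils.fs.ls("/FileStore/tables/"))   # confirm the uploaded file is listed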
Dataframe
DataFrame is a 2-dimensional labeled data structure with columns of potentially
different types. You can think of it like a spreadsheet or SQL table, or a dict of
Series objects. It is generally the most commonly used data object in Python.
• The first step of data processing: read the data and build a dataframe for further use.
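As a minimal sketch (the column names and values are made up for illustration), a small Spark dataframe can also be built by hand in a notebook, where spark is the predefined SparkSession:

# Build a tiny dataframe from Python tuples; columns may have different types
df = spark.createDataFrame(
    [("Alice", 34), ("Bob", 45)],
    ["name", "age"],
)
display(df)   # renders it like a small spreadsheet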
Read data from Databricks
• If your data is on databricks already:
• Open or create a new notebook (we choose Python as our default language)
• Read data from DBFS to dataframe:
df = spark.read.format(file_type)\
    .option("multiline", "true")\
    .option("inferSchema", "true")\
    .option("header", "true")\
    .load(file_location)
• Remark: common readable file types are text, csv, json, parquet, and orc.
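For example, assuming a hypothetical CSV uploaded to /FileStore/tables/people.csv, the generic snippet above specializes to the following (parentheses replace the backslashes so the comments fit):

file_type = "csv"
file_location = "/FileStore/tables/people.csv"   # hypothetical upload path
df = (spark.read.format(file_type)
      .option("multiline", "true")     # allow fields that span multiple lines
      .option("inferSchema", "true")   # let Spark guess each column's data type
      .option("header", "true")        # treat the first row as column names
      .load(file_location))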
Write dataframe as file to Databricks
• Open or create a new notebook (we choose Python as our default
language)
• Write the dataframe from a notebook to DBFS:
df.write.format(file_type)\
    .option("header", "true")\
    .option("delimiter", ",")\
    .save(file_name)
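A concrete sketch with a hypothetical output path; note that Spark saves a directory of part files rather than a single file, and mode("overwrite") replaces any existing output:

(df.write.format("csv")
   .option("header", "true")      # write column names as the first row
   .option("delimiter", ",")      # field separator
   .mode("overwrite")             # replace the output if it already exists
   .save("/FileStore/tables/people_out"))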
Simple manipulations on dataframe
• df.cache(): cache the dataframe for quick reuse
• display(df): render the dataframe as a table
• df.printSchema(): details of the columns: names, data types,
nullability
• df.columns: names of columns
• df.describe().show(): summary of some statistics of each numeric column
• type(object): the data type of the object (see the sketch after this list)
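Run together on the hypothetical dataframe from earlier, these calls look like this (the printed output depends on your data):

df.cache()            # keep the data in memory after the first computation
display(df)           # interactive table view of the dataframe
df.printSchema()      # column names, data types, nullability
print(df.columns)     # e.g. ['name', 'age']
df.describe().show()  # count, mean, stddev, min, max per numeric column
print(type(df))       # <class 'pyspark.sql.dataframe.DataFrame'>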
Simple manipulations on dataframe
• df[["column1", …, "columnN"]]: create a new dataframe with N columns from a
dataframe
• df.withColumnRenamed("old column name", "new column name"): create a new
dataframe replacing an old column name with a new name
• df.withColumn("column name", df["column name"].cast("data type")): create a
new dataframe assigning the data type for the column
• df.withColumn("new column name", lit(default value)): create a new dataframe
adding a new column with a default value
• df.drop("column name"): create a new dataframe dropping the column with that
name
• df.createOrReplaceTempView("SQL temporary view name"): create a SQL
temporary view for later use (see the sketch below)
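A hedged sketch chaining these calls on the hypothetical name/age dataframe from earlier; note that lit must be imported from pyspark.sql.functions:

from pyspark.sql.functions import lit

df2 = df[["name", "age"]]                                # keep two columns
df2 = df2.withColumnRenamed("name", "full_name")         # rename a column
df2 = df2.withColumn("age", df2["age"].cast("integer"))  # change a column's type
df2 = df2.withColumn("country", lit("US"))               # add a constant column
df2 = df2.drop("country")                                # drop it again
df2.createOrReplaceTempView("people")                    # register for SQL queries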
Simple SQL manipulations
• Basic structure of SQL:
• Structure 1: return a new table
select column1, column2, … or * (all the columns)
from table (temporary view)
under conditions
• Structure 2: return values computed from a table
select fun(column1), fun(column2), … or fun(*) (all the columns)
from table (temporary view)
under conditions
• Remark: a column name cannot include spaces. For example, the column name new age is
not allowed. You need to change it to new_age or newage …
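Both structures, run from Python with spark.sql against the hypothetical people view registered above:

# Structure 1: return a new table
display(spark.sql("SELECT full_name, age FROM people"))

# Structure 2: return values computed from a table
display(spark.sql("SELECT COUNT(*) AS n, AVG(age) AS mean_age FROM people"))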
Simple SQL: where
• Basic structure of SQL:
• Structure 1: return a new table
select column1, column2, … or * (all the columns)
from table (temporary view)
where conditions
• Conditions (see the sketch below):
• condition on a column:
• column >, <, =, <> value
• column in (string1, …, stringN)
• column like pattern
• condition1 and (or) condition2
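Hedged examples of each condition type, again on the hypothetical people view:

display(spark.sql("SELECT * FROM people WHERE age > 18"))                       # comparison
display(spark.sql("SELECT * FROM people WHERE full_name IN ('Alice', 'Bob')"))  # membership
display(spark.sql("SELECT * FROM people WHERE full_name LIKE 'A%'"))            # pattern match
display(spark.sql("SELECT * FROM people WHERE age > 18 AND age < 65"))          # combined conditions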