KEMBAR78
4 Analyze NYC Data Using Dedicated SQL Pool | PDF | Sql | Databases
0% found this document useful (0 votes)
18 views5 pages

4 Analyze NYC Data Using Dedicated SQL Pool

The document outlines the process of creating a dedicated SQL pool named NycSQLPool1 in Synapse Studio and loading NYC Taxi data into it. It provides SQL scripts for creating a table and importing data, as well as instructions for querying and visualizing the data. The final query analyzes the relationship between passenger count and trip distances, displaying the results in a line chart.

Uploaded by

kasaramvenky082
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
18 views5 pages

4 Analyze NYC Data Using Dedicated SQL Pool

The document outlines the process of creating a dedicated SQL pool named NycSQLPool1 in Synapse Studio and loading NYC Taxi data into it. It provides SQL scripts for creating a table and importing data, as well as instructions for querying and visualizing the data. The final query analyzes the relationship between passenger count and trip distances, displaying the results in a line chart.

Uploaded by

kasaramvenky082
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 5

Analyze NYC data using Dedicated SQL Pool

Create a dedicated SQL pool:


In Synapse Studio, on the left-side pane, select Manage and click on
SQL pools under Analytics pools.
Select New
Dedicated SQL pool name = NycSQLPool1
For Performance level choose DW100C
Select Review + create > Create. Your dedicated SQL pool will be
ready in a few minutes.

Your dedicated SQL pool is associated with a SQL database that's also
called NycSQLPool1.
Navigate to Data and click on Workspace.
You should see a database named NycSQLPool1. If you do not see it,
click Refresh.

Note: A dedicated SQL pool consumes billable resources as long as


it's active. You can pause the pool later to reduce costs.

Load the NYC Taxi Data into SQLPOOL1


In Synapse Studio, navigate to the Develop hub, click the + button to
add new resource, then create new SQL script.

Select the pool 'NycSQLPool1' in Connect to drop down list above the
script.
Enter the following code:
IF NOT EXISTS (SELECT * FROM sys.objects O JOIN sys.schemas S ON
O.schema_id = S.schema_id WHERE O.NAME = 'NYCTaxiTrip' AND
O.TYPE = 'U' AND S.NAME = 'dbo')
CREATE TABLE dbo.NYCTaxiTrip
(
[DateID] int,
[MedallionID] int,
[HackneyLicenseID] int,
[PickupTimeID] int,
[DropoffTimeID] int,
[PickupGeographyID] int,
[DropoffGeographyID] int,
[PickupLatitude] float,
[PickupLongitude] float,
[PickupLatLong] nvarchar(4000),
[DropoffLatitude] float,
[DropoffLongitude] float,
[DropoffLatLong] nvarchar(4000),
[PassengerCount] int,
[TripDurationSeconds] int,
[TripDistanceMiles] float,
[PaymentType] nvarchar(4000),
[FareAmount] numeric(19,4),
[SurchargeAmount] numeric(19,4),
[TaxAmount] numeric(19,4),
[TipAmount] numeric(19,4),
[TollsAmount] numeric(19,4),
[TotalAmount] numeric(19,4)
)
WITH
(
DISTRIBUTION = ROUND_ROBIN,
CLUSTERED COLUMNSTORE INDEX
-- HEAP
)
GO

COPY INTO dbo.NYCTaxiTrip


(DateID 1, MedallionID 2, HackneyLicenseID 3, PickupTimeID 4,
DropoffTimeID 5,
PickupGeographyID 6, DropoffGeographyID 7, PickupLatitude 8,
PickupLongitude 9,
PickupLatLong 10, DropoffLatitude 11, DropoffLongitude 12,
DropoffLatLong 13,
PassengerCount 14, TripDurationSeconds 15, TripDistanceMiles 16,
PaymentType 17,
FareAmount 18, SurchargeAmount 19, TaxAmount 20, TipAmount 21,
TollsAmount 22,
TotalAmount 23)
FROM
'https://vnycdatalake.dfs.core.windows.net/users/NYCTaxiTrip.parqu
et'
WITH
(
FILE_TYPE = 'PARQUET'
,MAXERRORS = 0
,IDENTITY_INSERT = 'OFF'
)

Click the Run button to execute the script.


This script will finish in less than 60 seconds. It loads 2 million rows of
NYC Taxi data into a table called dbo.NYCTaxiTrip.

Explore the NYC Taxi data in the dedicated SQL pool


In Synapse Studio, go to the Data hub.
Go to NycSQLPool1 > Tables.
Right-click the dbo.NYCTaxiTrip table and select New SQL Script >
Select TOP 100 Rows.
Wait while a new SQL script is created and runs.
Notice that at the top of the SQL script Connect to is automatically
set to the SQL pool called NycSQLPool1.
Replace the text of the SQL script with the below code and run it:
SELECT PassengerCount,
SUM(TripDistanceMiles) as SumTripDistance,
AVG(TripDistanceMiles) as AvgTripDistance
INTO dbo.PassengerCountStats
FROM dbo.NYCTaxiTrip
WHERE TripDistanceMiles > 0 AND PassengerCount > 0
GROUP BY PassengerCount;
SELECT * FROM dbo.PassengerCountStats
ORDER BY PassengerCount;

The above query shows how the total trip distances and average trip
distance relate to the number of passengers.

In the SQL script result window, change the View to Chart to see a
visualization of the results as a line chart.

You might also like