Analyze NYC data using Dedicated SQL Pool
Create a dedicated SQL pool:
In Synapse Studio, on the left-side pane, select Manage and click on
SQL pools under Analytics pools.
Select New
Dedicated SQL pool name = NycSQLPool1
For Performance level choose DW100C
Select Review + create > Create. Your dedicated SQL pool will be
ready in a few minutes.
Your dedicated SQL pool is associated with a SQL database that's also
called NycSQLPool1.
Navigate to Data and click on Workspace.
You should see a database named NycSQLPool1. If you do not see it,
click Refresh.
Note: A dedicated SQL pool consumes billable resources as long as
it's active. You can pause the pool later to reduce costs.
Load the NYC Taxi Data into SQLPOOL1
In Synapse Studio, navigate to the Develop hub, click the + button to
add new resource, then create new SQL script.
Select the pool 'NycSQLPool1' in Connect to drop down list above the
script.
Enter the following code:
IF NOT EXISTS (SELECT * FROM sys.objects O JOIN sys.schemas S ON
O.schema_id = S.schema_id WHERE O.NAME = 'NYCTaxiTrip' AND
O.TYPE = 'U' AND S.NAME = 'dbo')
CREATE TABLE dbo.NYCTaxiTrip
(
[DateID] int,
[MedallionID] int,
[HackneyLicenseID] int,
[PickupTimeID] int,
[DropoffTimeID] int,
[PickupGeographyID] int,
[DropoffGeographyID] int,
[PickupLatitude] float,
[PickupLongitude] float,
[PickupLatLong] nvarchar(4000),
[DropoffLatitude] float,
[DropoffLongitude] float,
[DropoffLatLong] nvarchar(4000),
[PassengerCount] int,
[TripDurationSeconds] int,
[TripDistanceMiles] float,
[PaymentType] nvarchar(4000),
[FareAmount] numeric(19,4),
[SurchargeAmount] numeric(19,4),
[TaxAmount] numeric(19,4),
[TipAmount] numeric(19,4),
[TollsAmount] numeric(19,4),
[TotalAmount] numeric(19,4)
)
WITH
(
DISTRIBUTION = ROUND_ROBIN,
CLUSTERED COLUMNSTORE INDEX
-- HEAP
)
GO
COPY INTO dbo.NYCTaxiTrip
(DateID 1, MedallionID 2, HackneyLicenseID 3, PickupTimeID 4,
DropoffTimeID 5,
PickupGeographyID 6, DropoffGeographyID 7, PickupLatitude 8,
PickupLongitude 9,
PickupLatLong 10, DropoffLatitude 11, DropoffLongitude 12,
DropoffLatLong 13,
PassengerCount 14, TripDurationSeconds 15, TripDistanceMiles 16,
PaymentType 17,
FareAmount 18, SurchargeAmount 19, TaxAmount 20, TipAmount 21,
TollsAmount 22,
TotalAmount 23)
FROM
'https://vnycdatalake.dfs.core.windows.net/users/NYCTaxiTrip.parqu
et'
WITH
(
FILE_TYPE = 'PARQUET'
,MAXERRORS = 0
,IDENTITY_INSERT = 'OFF'
)
Click the Run button to execute the script.
This script will finish in less than 60 seconds. It loads 2 million rows of
NYC Taxi data into a table called dbo.NYCTaxiTrip.
Explore the NYC Taxi data in the dedicated SQL pool
In Synapse Studio, go to the Data hub.
Go to NycSQLPool1 > Tables.
Right-click the dbo.NYCTaxiTrip table and select New SQL Script >
Select TOP 100 Rows.
Wait while a new SQL script is created and runs.
Notice that at the top of the SQL script Connect to is automatically
set to the SQL pool called NycSQLPool1.
Replace the text of the SQL script with the below code and run it:
SELECT PassengerCount,
SUM(TripDistanceMiles) as SumTripDistance,
AVG(TripDistanceMiles) as AvgTripDistance
INTO dbo.PassengerCountStats
FROM dbo.NYCTaxiTrip
WHERE TripDistanceMiles > 0 AND PassengerCount > 0
GROUP BY PassengerCount;
SELECT * FROM dbo.PassengerCountStats
ORDER BY PassengerCount;
The above query shows how the total trip distances and average trip
distance relate to the number of passengers.
In the SQL script result window, change the View to Chart to see a
visualization of the results as a line chart.