Ppt gu scs team2_a_metro project

Predicting Post-SafeTrack Metro Reliability
GU SCS Data Science Capstone Project
September 10, 2016

Problem Statement
Facts
over 250 million riders annually
118 miles of track
over 13 disruptions per day

Problem Statement
highly publicized safety lapses & deferred maintenance
1-year timeframe
estimated $60,000,000 price tag
improved safety & reliability?

Hypothesis
The DC Metro System is a pivotal transportation asset for Washington DC and the surrounding
regions. The SafeTrack project is meant to increase system safety and reliability. While technical and
operational disruptions are inevitable, we believe that available data can provide insight into how
frequently Metro riders will experience post-SafeTrack disruptions and ultimately improve their Metro
commute expectations.
To quantify the outcome, we will explore several scenarios to provide riders with a
clearer picture of their post-SafeTrack commute.
Scenario #1: Improvement
Scenario #2: Improvement
Scenario #3: Improvement
Scenario #4: Improvement
Scenario #5: Improvement

Data Ingestion & Wrangling
System Operations Data: used to determine system behavior under optimal conditions
Disruption Data: historical data used to analyze the frequency and effect of technical and operational disruptions (i.e., delays)
Ridership Data: used in conjunction with operational datasets to quantify and extrapolate the scope of Metro delays

The Data
Planned Operating Schedule
Data_Source: wmata.com
Data_Scope: Provided operating data under a perfectly efficient system with no delays or disruptions
Disruption Data
Data_Source: opendatadc
Data_Scope: Provided 5 years of daily disruption logs, including cause of disruption and minutes delayed
Planned Operating Schedule and Disruption Data provided a basis for comparing pre- and post-SafeTrack system behavior
[graphic: arrival board (LN, CAR, DEST, MIN) with ON TIME and DELAYED labels]

The Data
24,335 records between April 2012 and July 2016
All Metro lines represented in the dataset
Description of disruption cause, translated as technical or operational
Delay, in minutes
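The translation of a free-text cause into "technical" or "operational" could be sketched as a simple keyword match. This is only an illustration: the slides do not say how the mapping was done, and the keyword list and sample descriptions below are invented for the example, not taken from the dataset.

```python
# Hypothetical keyword list; the source does not state the real mapping rules.
TECHNICAL_KEYWORDS = ("brake", "door", "signal", "track", "mechanical", "power")


def classify_cause(description):
    """Label a free-text disruption cause as 'technical' or 'operational'."""
    text = description.lower()
    if any(keyword in text for keyword in TECHNICAL_KEYWORDS):
        return "technical"
    # Anything without a technical keyword is treated as operational (assumption).
    return "operational"


# Made-up descriptions, for illustration only.
for cause in ("Train door malfunction", "Sick customer on platform"):
    print(cause, "->", classify_cause(cause))
```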

Computation & Analysis: Limitations
Accuracy: 'Garbage in - Garbage out' concept
Location: Station-to-Station
Completeness
Compounding Delays
Opted to take a two-pronged approach:
1.) Build data product
2.) Develop simulation based on available data

Computation & Analysis: Methodology
1.) Calculated the number of minutes of trips per day on each line.
2.) Broke daily delays into five tiers based on severity.
3.) Built in compounding delays based on expected train departures.
4.) Injected random noise into the system.
[graphic: Tiers 1-5 shown for each of Scenarios 1-5]
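The methodology steps on this slide can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the team's software: the tier cut-offs, the per-tier delay minutes, the rule that only delay beyond the headway carries over to the next train, and the Gaussian noise are all hypothetical choices made for the example.

```python
import random

# Assumed tier upper bounds in minutes; the slides do not state the cut-offs.
TIER_UPPER_BOUNDS = (5, 10, 20, 40)


def delay_tier(minutes):
    """Map a delay in minutes to a severity tier from 1 (mild) to 5 (severe)."""
    for tier, upper in enumerate(TIER_UPPER_BOUNDS, start=1):
        if minutes <= upper:
            return tier
    return 5


def simulate_day(n_departures, trip_minutes, headway_minutes, p_disruption,
                 tier_minutes, tier_weights, noise_sd, rng):
    """Return total trip minutes for one line on one simulated day."""
    total = 0.0
    carried = 0.0  # delay inherited from the train ahead (compounding)
    for _ in range(n_departures):
        delay = carried
        if rng.random() < p_disruption:
            # Draw a severity tier, then look up its assumed delay in minutes.
            tier = rng.choices(range(1, 6), weights=tier_weights)[0]
            delay += tier_minutes[tier - 1]
        noise = rng.gauss(0.0, noise_sd)  # random noise injected into each trip
        total += trip_minutes + delay + noise
        # Assumption: only delay beyond the headway reaches the next departure.
        carried = max(0.0, delay - headway_minutes)
    return total


rng = random.Random(2016)  # fixed seed so a run can be repeated
total = simulate_day(n_departures=100, trip_minutes=30.0, headway_minutes=6.0,
                     p_disruption=0.05, tier_minutes=(2, 7, 15, 30, 60),
                     tier_weights=(50, 25, 15, 7, 3), noise_sd=1.0, rng=rng)
print(round(total, 1))
```

A scenario can then be expressed as a change to `p_disruption` or `tier_minutes`, with many simulated days compared against the unmodified baseline.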

A Look Under The Hood
[software system demo]

Results
Created visualizations of the various simulations
Analyzed results to determine the shape of the data

Results
Current      9861.402   102.51522
Scenario #1  9868.713   97.10936
Scenario #2  9854.400   108.57028
Scenario #3  9852.256   102.1384
Scenario #4  9850.429   101.7149
Scenario #5  9848.057   104.1241

Current      8121.386   95.954
Scenario #1  8117.496   97.341
Scenario #2  8115.761   99.953
Scenario #3  8114.653   104.407
Scenario #4  8104.47    99.702
Scenario #5  8093.36    98.429

Current      5280.572   100.5566
Scenario #1  5261.651   92.5748
Scenario #2  5262.043   114.093
Scenario #3  5020.293   41.431
Scenario #4  5014.868   41.251
Scenario #5  5013.92    40.980

Current      6762.053   97.839
Scenario #1  6765.053   97.839
Scenario #2  6759.09    103.266
Scenario #3  6562.22    52.973
Scenario #4  6552.85    53.316
Scenario #5  6540.79    48.947

Current      6811.311   108.8495
Scenario #1  6815.311   108.8495
Scenario #2  6816.787   105.2023
Scenario #3  6809.531   108.5713
Scenario #4  6810.966   97.1970
Scenario #5  6809.322   98.0109

Current      11149.5    97.3886
Scenario #1  11159.6    98.4512
Scenario #2  11146.33   99.5911
Scenario #3  11138.77   112.8393
Scenario #4  11132.07   97.0613
Scenario #5  11123.83   101.226
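One way to read these figures is as a percent change from the Current row. The sketch below does that for the first numeric column of the third group of results (Current 5280.572). The slides do not label the columns or the groups, so what the number measures is left open here; only the arithmetic is shown.

```python
# First numeric column of the third results group, copied from the slide.
first_column = {
    "Current": 5280.572,
    "Scenario #1": 5261.651,
    "Scenario #2": 5262.043,
    "Scenario #3": 5020.293,
    "Scenario #4": 5014.868,
    "Scenario #5": 5013.92,
}


def percent_change(baseline, value):
    """Signed percent change of value relative to baseline."""
    return 100.0 * (value - baseline) / baseline


changes = {
    name: percent_change(first_column["Current"], value)
    for name, value in first_column.items()
    if name != "Current"
}

for name, change in changes.items():
    print(f"{name}: {change:+.2f}%")
```

For this group the change stays well under one percent for Scenarios #1 and #2 and falls by close to five percent from Scenario #3 onward.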

Conclusions
Scenario #1 Scenario #2 Scenario #3 Scenario #4 Scenario #5
Noticeable improvements in time and probability of delay were not realized until higher
scenario parameters were introduced.
Analysis of the results indicates that SafeTrack repairs must reduce disruption severity
and probability by roughly 30% - 50% for Metro riders to experience improved trip
safety and reliability.
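To give a sense of scale for the 30% - 50% range, consider a deliberately simple model in which expected delay is probability times severity. That model is an assumption made for this illustration, not something stated in the slides, and it ignores compounding and noise. If both factors are reduced by the same fraction, the remaining expected delay is the square of what is left of each.

```python
def remaining_delay_fraction(reduction):
    """Fraction of expected delay left when probability and severity
    are each cut by `reduction` (assumes expected delay = p * s)."""
    return (1.0 - reduction) ** 2


for reduction in (0.30, 0.50):
    left = remaining_delay_fraction(reduction)
    print(f"{reduction:.0%} reduction in both -> {left:.0%} of expected delay remains")
```

Under this assumption, a 30% cut in both factors leaves 49% of the expected delay and a 50% cut leaves 25%.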

Conclusions
Improvements in Stochastic System: SafeTrack’s improvements may not be noticed if they do not overcome the system’s random noise
Biases & Assumptions: Recognizing biases and stating assumptions is key to data science
Data Quality: The importance of accurate data cannot be overstated
Springboard for Future Work: Our software can be generalized and adapted
