Predicting Post-SafeTrack Metro Reliability
GU SCS Data Science Capstone Project September 10, 2016
Problem Statement
Facts
over 250 million riders annually
118 miles of track
over 13 disruptions per day
Problem Statement
highly publicized safety lapses & deferred maintenance
1-year timeframe
estimated $60,000,000 price tag
improved safety & reliability?
Hypothesis
The DC Metro System is a pivotal transportation asset for Washington DC and the surrounding
regions. The SafeTrack project is meant to increase system safety and reliability. While technical and
operational disruptions are inevitable, we believe that available data can provide insight into how
frequently Metro riders will experience post-SafeTrack disruptions and ultimately improve their Metro
commute expectations.
To quantify the outcome, we will explore several scenarios to provide riders with a clearer picture of their post-SafeTrack commute.
Scenario #1 Improvement
Scenario #2 Improvement
Scenario #3 Improvement
Scenario #4 Improvement
Scenario #5 Improvement
Data Ingestion & Wrangling
System Operations Data: used to determine system behavior under optimal conditions
Disruption Data: historical data used to analyze the frequency and effect of technical and operational disruptions (i.e., delays)
Ridership Data: used in conjunction with the operational datasets to quantify and extrapolate the scope of Metro delays
The Data
Planned Operating Schedule
Data_Source: wmata.com
Data_Scope: Provided operating data under a perfectly efficient system with no delays or disruptions
Disruption Data
Data_Source: opendatadc
Data_Scope: Provided 5 years of daily disruption logs, including cause of disruption and minutes delayed
Planned Operating Schedule and Disruption Data provided a basis for comparing pre- and post-SafeTrack system behavior
[Graphic: arrival boards (LN, CAR, DEST, MIN) with ON TIME and DELAYED labels]
The Data
24,335 records between April 2012 and July 2016
All Metro lines represented in the dataset
Description of disruption cause, translated as technical or operational
Delay, in minutes
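The translation of free-text disruption causes into technical or operational categories could be approximated with a simple keyword rule. The sketch below is illustrative only: the keyword list, the function name classify_cause, and the sample log entries are assumptions, not the project's actual rules or data.

```python
# Hypothetical keyword list; the project's real mapping is not given in the slides.
TECHNICAL_KEYWORDS = ("brake", "door", "signal", "track", "power", "equipment", "mechanical")


def classify_cause(description: str) -> str:
    """Label a free-text disruption cause as 'technical' or 'operational'."""
    text = description.lower()
    if any(keyword in text for keyword in TECHNICAL_KEYWORDS):
        return "technical"
    # Anything without a technical keyword is treated as operational.
    return "operational"


# Made-up sample rows: (cause description, delay in minutes)
sample_log = [
    ("Brake problem on train", 6),
    ("Sick customer", 4),
    ("Signal problem outside station", 11),
]

labeled = [(classify_cause(cause), minutes) for cause, minutes in sample_log]
```

A rule like this is easy to audit, which matters when the category feeds directly into later delay statistics; a real log would likely need a longer keyword list and manual spot checks.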
Computation & Analysis: Limitations
Accuracy
‘Garbage in - Garbage out’ concept
Location
Station-to-Station
Completeness
Compounding Delays
Opted to take a two-pronged approach:
1.) Build data product
2.) Develop simulation based on available data
Computation & Analysis: Methodology
1.) Calculated the number of minutes of trips per day on each line.
2.) Broke daily delays into five tiers based on severity.
3.) Built in compounding delays based on expected train departures.
4.) Injected random noise into the system.
[Graphic: delay tiers (Tier 1 through Tier 5) shown for Scenarios 1 through 5]
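The four methodology steps above can be sketched as a small Monte Carlo simulation. This is a minimal sketch under stated assumptions, not the team's software: the tier probabilities and delay minutes, the headway, the noise level, and every function name are invented for illustration. A scenario is modeled here as a fractional reduction applied to both the probability and the severity of a disruption.

```python
import random

# Assumed tiers: (probability a trip is hit, delay minutes). Values are invented.
TIERS = [(0.050, 2.0), (0.030, 5.0), (0.015, 10.0), (0.004, 20.0), (0.001, 45.0)]


def simulate_day(scheduled_minutes, headway, reduction=0.0, noise_sd=0.5, rng=None):
    """Return total trip minutes for one day on one line.

    scheduled_minutes: planned duration of each trip, in departure order.
    headway: planned minutes between departures; delay beyond it spills
             onto the following train (compounding).
    reduction: scenario improvement applied to tier probability and severity.
    """
    rng = rng or random.Random()
    total = 0.0
    carried = 0.0  # delay inherited from the train ahead
    for planned in scheduled_minutes:
        delay = carried
        draw = rng.random()
        cumulative = 0.0
        for prob, minutes in TIERS:
            cumulative += prob * (1.0 - reduction)
            if draw < cumulative:
                delay += minutes * (1.0 - reduction)
                break
        noise = rng.gauss(0.0, noise_sd)  # random noise on every trip
        total += max(planned + delay + noise, 0.0)
        carried = max(delay - headway, 0.0)
    return total


def simulate(days, scheduled_minutes, headway, reduction, seed=0):
    """Run the daily simulation repeatedly and return the list of daily totals."""
    rng = random.Random(seed)
    return [simulate_day(scheduled_minutes, headway, reduction, rng=rng) for _ in range(days)]
```

Comparing the mean and spread of `simulate(...)` with `reduction=0.0` against a scenario such as `reduction=0.3` gives the kind of before/after comparison the scenarios describe.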
Results of Simulated Scenario
A Look Under The Hood
[software system demo]
Results
Created visualizations of the various simulations
Analyzed results to determine the shape of the data
Results
Current      9861.402  102.51522
Scenario #1  9868.713  97.10936
Scenario #2  9854.400  108.57028
Scenario #3  9852.256  102.1384
Scenario #4  9850.429  101.7149
Scenario #5  9848.057  104.1241

Current      8121.386  95.954
Scenario #1  8117.496  97.341
Scenario #2  8115.761  99.953
Scenario #3  8114.653  104.407
Scenario #4  8104.47   99.702
Scenario #5  8093.36   98.429

Current      5280.572  100.5566
Scenario #1  5261.651  92.5748
Scenario #2  5262.043  114.093
Scenario #3  5020.293  41.431
Scenario #4  5014.868  41.251
Scenario #5  5013.92   40.980

Current      6762.053  97.839
Scenario #1  6765.053  97.839
Scenario #2  6759.09   103.266
Scenario #3  6562.22   52.973
Scenario #4  6552.85   53.316
Scenario #5  6540.79   48.947

Current      6811.311  108.8495
Scenario #1  6815.311  108.8495
Scenario #2  6816.787  105.2023
Scenario #3  6809.531  108.5713
Scenario #4  6810.966  97.1970
Scenario #5  6809.322  98.0109

Current      11149.5   97.3886
Scenario #1  11159.6   98.4512
Scenario #2  11146.33  99.5911
Scenario #3  11138.77  112.8393
Scenario #4  11132.07  97.0613
Scenario #5  11123.83  101.226
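One way to read these figures is as a percent change from the Current row. The slides do not label the two numeric columns, so the sketch below makes no claim about what they measure; it simply takes the first number in each row of the third group of results (Current 5280.572 through Scenario #5 5013.92) and compares it with Current. The function name pct_change is an assumption.

```python
def pct_change(baseline: float, value: float) -> float:
    """Percent change of value relative to baseline (negative means a decrease)."""
    return (value - baseline) / baseline * 100.0


# First numeric column of the third results group, copied from the slide.
current = 5280.572
scenarios = {
    "Scenario #1": 5261.651,
    "Scenario #2": 5262.043,
    "Scenario #3": 5020.293,
    "Scenario #4": 5014.868,
    "Scenario #5": 5013.92,
}

changes = {name: pct_change(current, value) for name, value in scenarios.items()}

for name, change in changes.items():
    print(f"{name}: {change:+.2f}% vs. Current")
```

In this group the first two scenarios move the figure by well under one percent, while Scenarios #3 through #5 each lower it by roughly five percent.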
Conclusions
Noticeable improvements in delay time and probability of delay were not realized until higher scenario parameters were introduced.
Analysis of the results indicates that SafeTrack repairs must reduce disruption severity and probability by roughly 30% to 50% for Metro riders to experience improved trip safety and reliability.
Conclusions
Improvements in Stochastic System: SafeTrack’s improvements may not be noticed if they do not overcome the system’s random noise
Biases & Assumptions: Recognizing biases and stating assumptions is key to data science
Data Quality: The importance of accurate data cannot be overstated
Springboard for Future Work: Our software can be generalized and adapted
Questions?
