Open Transit Data
A Developer’s Perspective
Sean J. Barbeau, Ph.D.
Center for Urban Transportation Research
University of South Florida
Overview
 Why Open Data?
 Anatomy of Transit Data Sharing
 GTFS and Related Data Formats
Why open data?
What is open data?
 Transit data that is shared with the
public
 Typically shared via website / FTP site /
web services
 Should be updated regularly, with any
changes in schedule / routes / stops
Open Data vs. Open Architecture vs. Open Source
 Open architectures mostly focus on:
• Standards within an agency’s software/hardware systems
• Interconnectivity with other government systems
 Open source means software source code is available
 Open data is the sharing of data with external public parties
[Diagram: the transit agency's systems (transit vehicle, AVL server, schedule system) share open data with 3rd party developers]
Why is open data important?
TCRP 115 – Open Data: Challenges and Opportunities for Transit
Agencies by Carol Schweiger (2015)
 “The benefits to the agency strongly support open transit data.
The availability of open transit data encourages innovation that
could not be accomplished solely by agency staff.
 The top five overall benefits experienced by survey respondents
were:
• (1) increased awareness of our services
• (2) empowered our customers
• (3) encouraged innovation
• (4) improved the perception of our agency (e.g.,
openness/transparency)
• (5) provided opportunities for private businesses
 The legal fears often thought to be barriers to opening transit
data have not been realized.”
http://www.trb.org/Publications/Blurbs/172202.aspx
Why is open data important?
7/20
TCRP 213 – Data Sharing Guidance for Public Transit Agencies –
Now and in the Future (2020)
 “Sharing data can facilitate the following:
• Promote transparency and increase awareness of the
transit agency and its engagement with transit customers.
• Spur innovation and support research that can help transit
agencies plan better service and operate more efficiently.
• Enable cost savings for transit agencies by using outside
resources for data processing and analysis.
• Generate revenue (e.g., through advertising).
• Support improved customer information.
• Support other community functions, such as informing
municipalities, real estate developers, and even law
enforcement agencies.”
trb.org/Main/Blurbs/180188.aspx
Successful open data programs
TCRP 115 – Open Data: Challenges and Opportunities
for Transit Agencies by Carol Schweiger (2015)
 “Five factors lead to a successful open data
program:
• (1) obtaining and maintaining management-level
support for such a program
• (2) recognizing the need for the appropriate level of
resources required to provide and maintain open data
• (3) establishing ways to monitor data accuracy,
timeliness, reliability, quality, usage, and maintenance
• (4) creating and maintaining licensing or registration
• (5) having an ongoing dialogue with both developers
and customers, a practice shown to increase the value
of the data and products that are based on the data”
http://www.trb.org/Publications/Blurbs/172202.aspx
The anatomy of
transit data sharing
© 1998 Nick Veasey
Two Types of Open Transit Data
1. Static
• Transit schedules / routes / stops
• Change ~3-4 times a year
2. Real-time
• Estimated arrival times / vehicle
positions / service alerts
• Can change every few seconds
Two Magnitudes of Open Data
A. “Fire hose”
• A dump of the complete state of the transit system
• Not directly suitable for mobile devices
Static -> All transit schedules/routes/stops
Real-time -> All estimated arrivals/vehicle positions/service alerts
B. “Faucet”
• Precise subset of transit data
• Suitable for mobile devices
Static -> “Stop ID 10 is served by Route 5”
Real-time -> “It is 2 minutes until Route 5 bus arrives at Stop ID 10”
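A minimal sketch of how a "faucet" answer can be derived from "fire hose" data. The row layout, field names, and sample values below are hypothetical and only mirror the Stop ID 10 / Route 5 example above; a real service would build the index from GTFS and GTFS-realtime feeds.

```python
from collections import defaultdict

# Hypothetical "fire hose" rows: every upcoming arrival in the system.
FIRE_HOSE = [
    {"stop_id": "10", "route_id": "5", "arrival_in_min": 2},
    {"stop_id": "10", "route_id": "8", "arrival_in_min": 11},
    {"stop_id": "22", "route_id": "5", "arrival_in_min": 7},
]


def build_index(rows):
    """Group the full dump by stop so single-stop lookups are cheap."""
    index = defaultdict(list)
    for row in rows:
        index[row["stop_id"]].append(row)
    return index


def routes_serving(index, stop_id):
    """Static faucet query: which routes serve this stop?"""
    return sorted({row["route_id"] for row in index.get(stop_id, [])})


def next_arrival(index, stop_id, route_id):
    """Real-time faucet query: minutes until the next arrival, or None."""
    minutes = [
        row["arrival_in_min"]
        for row in index.get(stop_id, [])
        if row["route_id"] == route_id
    ]
    return min(minutes) if minutes else None


index = build_index(FIRE_HOSE)
print(routes_serving(index, "10"))
print(next_arrival(index, "10", "5"))
```

The server holds the whole dump and the mobile app only receives the small answer, which is why the faucet form suits mobile devices.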
Transit Data Flow
Producer Consumer
Transit rider
app
Open Data
(“Faucet”)
Open Data
(“Fire hose”)
Agencies should focus on
producing “fire hose” data
first, “faucet” data second
Standard “fire hose”
formats:
• GTFS
• GTFS-realtime
Standard “faucet” formats:
• SIRI
GTFS and related formats
Successful Open Data Formats Are…
 Organic
• Created and improved by the people producing and consuming the data
 Open
• Open process for evolution
• Data/documentation not hidden behind log-ins
 Easy-to-use for app developers
• Is documentation simple to understand?
• Are there existing open-source software tools?
• Is data provided via best practice web service design (e.g., using RESTful
API with JSON, instead of SOAP with XML)?
General Transit Feed Specification (GTFS)
 Created by TriMet and Google in 2005
 Has become a de facto standard world-wide for transit schedule/route/stop data
 Over 1,500 agencies share open data in GTFS format
 GTFS-realtime for predictions (TripUpdates), VehiclePositions, and service Alerts
GTFS data consists of multiple text files
GTFS data powers many apps
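Because a GTFS dataset is a zip of comma-separated text files, it can be read with standard tools. The sketch below uses only the Python standard library; the in-memory sample feed, its values, and the helper names are made up for illustration and are not taken from any agency's data.

```python
import csv
import io
import zipfile


def read_gtfs_table(zip_bytes, table_name):
    """Return the rows of one GTFS text file as a list of dicts."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(table_name) as raw:
            # utf-8-sig tolerates the byte-order mark some feeds include.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))


def make_sample_feed():
    """Build a tiny in-memory zip with made-up data (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "routes.txt",
            "route_id,route_short_name,route_type\n5,5,3\n",
        )
        archive.writestr(
            "stops.txt",
            "stop_id,stop_name,stop_lat,stop_lon\n10,Main St,27.95,-82.46\n",
        )
    return buffer.getvalue()


feed = make_sample_feed()
routes = read_gtfs_table(feed, "routes.txt")
stops = read_gtfs_table(feed, "stops.txt")
print(routes[0]["route_short_name"], stops[0]["stop_name"])
```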
Quality is important!
 Any disconnect between agency data and app developers is
jarring to riders
 In one study, 9% of riders said they took the bus less often
due to errors in real-time information[9]
 Data errors and inconsistencies make analysis hard
 Use GTFS and GTFS-realtime validators to catch errors:
• https://gtfs.org/testing/
[9] A. Gooze, K. Watkins, and A. Borning (2013), “Benefits of Real-Time Information and the Impacts of Data Accuracy on the Rider
Experience,” in Transportation Research Board 92nd Annual Meeting, Washington, D.C., January 13, 2013.
gtfs.org/best-practices
 Recommendations & examples
 Organized by file, field, and “cases”
 Matches recommendations to type of
consuming application:
• Trip planning
• Arrival estimation
• Timetable generation
http://gtfs.org/best-practices/
Important best practices
 Datasets should be published at a public, permanent URL, including the zip file
name (www.agency.org/gtfs/gtfs.zip)
• http://gtfs.org/best-practices/#publishing_1
 Keep IDs the same across GTFS datasets
• http://gtfs.org/best-practices/#publishing_3
 One GTFS dataset should contain current and upcoming service (sometimes
called a “merged” dataset).
• At any time, the published GTFS dataset should be valid for at least the next 7
days.
• If possible, the GTFS dataset should cover at least the next 30 days of service.
• http://gtfs.org/best-practices/#publishing_4
 No login should be required, but may use API key
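The 7-day and 30-day coverage recommendations above can be checked automatically. This is a simplified sketch: it assumes rows already parsed from calendar.txt, looks only at the latest end_date, and ignores calendar_dates.txt and feed_info.txt, which a real check would also need to consider. The function names and status strings are hypothetical.

```python
from datetime import date, datetime


def last_service_date(calendar_rows):
    """Latest end_date (YYYYMMDD) found in calendar.txt rows."""
    return max(
        datetime.strptime(row["end_date"], "%Y%m%d").date()
        for row in calendar_rows
    )


def days_of_coverage(calendar_rows, today):
    """Number of days from today until the last scheduled service date."""
    return (last_service_date(calendar_rows) - today).days


def coverage_status(calendar_rows, today):
    """Compare coverage against the 7-day and 30-day recommendations."""
    days = days_of_coverage(calendar_rows, today)
    if days < 7:
        return "too short"
    if days < 30:
        return "meets 7-day minimum"
    return "meets 30-day recommendation"


# Made-up calendar.txt rows for illustration.
rows = [
    {"service_id": "WEEKDAY", "end_date": "20200720"},
    {"service_id": "WEEKEND", "end_date": "20200712"},
]
print(coverage_status(rows, date(2020, 7, 1)))
```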
GTFS-Continuous Stops
 Indicates riders can board or alight a vehicle along the route
alignment
 Adds two fields to routes.txt and stop_times.txt
• continuous_pickup, continuous_drop_off
◦ 0 - Continuous stopping
◦ 1 - No continuous stopping
◦ 2 - Must phone agency to arrange continuous stopping
◦ 3 - Must coordinate with driver to arrange continuous
stopping
 Adopted on May 13th, 2020
• Producers: Trillium and TriMet
• Consumers: Google
 For details:
• Proposal - https://github.com/google/transit/pull/208
• Example data
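A consuming app needs to turn the continuous_pickup and continuous_drop_off codes into rider-facing text. The sketch below maps the four values listed above; the function name is hypothetical, and treating an empty field like "1" follows the GTFS reference's "1 or empty" wording for these fields.

```python
# Meanings of continuous_pickup / continuous_drop_off values.
CONTINUOUS_STOPPING = {
    "0": "Continuous stopping",
    "1": "No continuous stopping",
    "2": "Must phone agency to arrange continuous stopping",
    "3": "Must coordinate with driver to arrange continuous stopping",
}


def describe_continuous(value):
    """Translate a raw field value from a GTFS row into text."""
    # An empty or missing field is treated like "1" (no continuous stopping).
    key = (value or "").strip() or "1"
    if key not in CONTINUOUS_STOPPING:
        raise ValueError(f"unexpected continuous stopping value: {value!r}")
    return CONTINUOUS_STOPPING[key]


# Made-up stop_times.txt row for illustration.
row = {"trip_id": "T1", "stop_id": "10", "continuous_pickup": "0", "continuous_drop_off": ""}
print(describe_continuous(row["continuous_pickup"]))
print(describe_continuous(row["continuous_drop_off"]))
```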
Proposal: GTFS-Flex v2 (https://bit.ly/gtfs-flex-v2)
 Flexible services that include some scheduled stops:
• Route deviation services: the vehicle serves a fixed route
and ordered set of stops, and may detour to pick up or
drop off a passenger between stops
• Point-to-zone service: the rider can board at a fixed stop
such as a train station, and then alight anywhere within
an area, or vice versa
• Point deviation or checkpoint service: the rider can
board at a fixed stop, and then alight anywhere among an
unordered list of stops, or the opposite.
 Booking rules - How far in advance booking should occur
or a phone number that should be called
 Booking times – Describes availability for on-demand
services where trips do not operate unless the service is
requested by at least one rider (e.g., one location to
another)
HART Flex South County service – Tampa, FL
https://mobilitydata.org/
General Bikeshare Feed Specification (GBFS)
 For sharing locations and availability
of bikeshare and scooters
 v2.0 adds deep-links between
multimodal (Google Maps, Transit
App) and bikeshare provider apps
• See MobilityData “What’s New in
GBFS v2.0” article
 v2.1-RC includes geofencing
information for floating bikes, rental
areas
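GBFS publishes availability as JSON, so a consumer can filter it with a few lines of code. The sketch below assumes a document shaped like the GBFS v2.0 station_status.json feed; the sample values and the function name are made up for illustration.

```python
import json

# Made-up sample shaped like a GBFS v2.0 station_status.json document.
SAMPLE_STATION_STATUS = json.loads("""
{
  "last_updated": 1594000000,
  "ttl": 60,
  "data": {
    "stations": [
      {"station_id": "A", "num_bikes_available": 4, "num_docks_available": 6, "is_renting": true},
      {"station_id": "B", "num_bikes_available": 0, "num_docks_available": 10, "is_renting": true},
      {"station_id": "C", "num_bikes_available": 3, "num_docks_available": 2, "is_renting": false}
    ]
  }
}
""")


def stations_with_bikes(feed):
    """IDs of stations that are renting and have at least one bike."""
    return [
        station["station_id"]
        for station in feed["data"]["stations"]
        if station["is_renting"] and station["num_bikes_available"] > 0
    ]


print(stations_with_bikes(SAMPLE_STATION_STATUS))
```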
TCRP G-16
 “Development of Transactional Data Specifications for Demand-Responsive Transportation”
 Released in 2020
 Describes interactive process of
ordering and delivering a trip
What’s next for agencies?
 Talk to your peers before issuing an RFP and contracting with a vendor
• https://mobilitydata.org/
 In RFPs/contracts, require that all scheduling software and automatic
vehicle location (AVL) vendors:
• Provide frequently-updated open GTFS and GTFS-realtime (TripUpdates,
VehiclePositions, Alert) data
• Follow GTFS Best Practices
• Use GTFS and GTFS-realtime validators
• If different vendors provide GTFS and GTFS-realtime, ensure the two feeds integrate
with each other (i.e., the IDs match)
 Test data quality prior to finalizing procurement
 Follow practices for successful open data program (TCRP 115)
 If interested in GTFS-Flex v2, contact me or comment on proposal
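One simple data-quality test for the "IDs match" requirement is to confirm that every trip referenced in the real-time feed exists in the static feed. The sketch below uses plain dictionaries as a stand-in for rows from trips.txt and for decoded GTFS-realtime TripUpdates; the function name and sample IDs are hypothetical, and the GTFS-realtime validators remain the more complete check.

```python
def unmatched_trip_ids(static_trips, realtime_updates):
    """Return real-time trip_ids that do not appear in the static feed."""
    known = {row["trip_id"] for row in static_trips}
    seen = {update["trip_id"] for update in realtime_updates}
    return sorted(seen - known)


# Made-up rows standing in for trips.txt and decoded TripUpdates.
static_trips = [{"trip_id": "T1"}, {"trip_id": "T2"}]
realtime_updates = [
    {"trip_id": "T1", "delay_sec": 60},
    {"trip_id": "t-0002", "delay_sec": 0},
]

print(unmatched_trip_ids(static_trips, realtime_updates))
```

An empty result means every real-time trip can be joined to the schedule; any ID in the list points to a mismatch between the scheduling and AVL vendors' data.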
Thank You!
Sean J. Barbeau, Ph.D.
barbeau@usf.edu
@sjbarbeau
