Case Study · Transit Intelligence
A real-time public transit tracking and performance analytics system that turns GTFS-Realtime feeds into live vehicle visibility, stop-level delay analysis, and reusable historical datasets.
Transit agencies currently integrated into the pipeline.
Vehicle-location messages processed each day from continuously polled GTFS-Realtime feeds.
Delay tracking compares actual arrivals against the scheduled arrival times for each trip.
Historical trip records are stored for later analysis and research.
Tech Stack
Python
Kafka
Spark
Docker
Airflow
Problem
Transit riders usually see fragments of the picture: a static route plan, an ETA, or a map that does not explain whether the system is actually performing well. SunTransit closes that gap by combining live location feeds with schedule-aware delay computation.
The result is a system that behaves like a transit operations lens. Riders can see where buses and trains are right now, while planners and researchers can inspect reliability patterns over time at the stop, trip, and agency level.
Architecture
SunTransit continuously ingests transit updates, normalizes them against scheduled service, and computes stop-level delay signals that can be explored both live and historically. The core design goal is simple: keep the real-time experience responsive while preserving a structured data product for later analytics.
This makes the project both an application and a data platform. One side serves an interface for live monitoring; the other side builds a durable dataset that can support research, reliability analysis, and future product extensions.
The platform ingests transit data, processes it into schedule-aware events, and exposes both a live monitoring experience and a historical performance dataset.
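As a rough illustration of what "schedule-aware" means here, the sketch below computes a stop-level delay as observed arrival minus scheduled arrival. It is not taken from the SunTransit repository: the names `gtfs_time_to_seconds`, `StopDelay`, and `compute_stop_delay` are hypothetical, and it assumes the observed arrival has already been converted to seconds after the start of the service day. The one detail it relies on from the GTFS specification is that scheduled times are `HH:MM:SS` strings whose hour field may be 24 or greater for trips that run past midnight.

```python
from dataclasses import dataclass


def gtfs_time_to_seconds(hhmmss: str) -> int:
    """Convert a GTFS 'HH:MM:SS' string to seconds after the service day starts.

    GTFS allows hours of 24 or more for trips that continue past midnight,
    so this deliberately does not wrap at 24 hours.
    """
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


@dataclass(frozen=True)
class StopDelay:
    """Hypothetical stop-level delay event (not the project's actual schema)."""
    trip_id: str
    stop_id: str
    delay_seconds: int  # positive = late, negative = early


def compute_stop_delay(trip_id: str, stop_id: str,
                       scheduled_arrival: str, observed_seconds: int) -> StopDelay:
    """Delay = observed arrival minus scheduled arrival, both in service-day seconds."""
    scheduled_seconds = gtfs_time_to_seconds(scheduled_arrival)
    return StopDelay(trip_id, stop_id, observed_seconds - scheduled_seconds)


# Illustrative data only: a schedule lookup keyed by (trip_id, stop_id).
schedule = {
    ("trip_1", "stop_A"): "08:15:00",
    ("trip_1", "stop_B"): "25:10:30",  # after midnight, same service day
}

# Vehicle observed at stop_A at 08:17:30 service-day time.
late = compute_stop_delay("trip_1", "stop_A",
                          schedule[("trip_1", "stop_A")],
                          8 * 3600 + 17 * 60 + 30)
print(late.delay_seconds)
```

In a real pipeline the observed time would typically arrive as a POSIX timestamp and need converting to the agency's local service day first; that step is left out to keep the sketch small.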
Interface
A continuously updating map surface for watching active buses and trains move through the network in real time.
A stop-level view of where delays accumulate, making reliability issues much easier to spot than raw event logs or tables.
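One way a stop-level delay view like this could be fed is by grouping delay events per stop and ranking stops by average delay. The sketch below uses only the Python standard library; `summarize_delays_by_stop` and the field names in its output are assumptions made for illustration, not the project's actual interface.

```python
from collections import defaultdict
from statistics import mean


def summarize_delays_by_stop(events):
    """Group (stop_id, delay_seconds) pairs by stop and rank stops by mean delay.

    Returns a list of dicts sorted so the stops with the highest average
    delay come first.
    """
    delays_by_stop = defaultdict(list)
    for stop_id, delay_seconds in events:
        delays_by_stop[stop_id].append(delay_seconds)

    summary = [
        {
            "stop_id": stop_id,
            "observations": len(delays),
            "mean_delay_seconds": mean(delays),
            "max_delay_seconds": max(delays),
        }
        for stop_id, delays in delays_by_stop.items()
    ]
    summary.sort(key=lambda row: row["mean_delay_seconds"], reverse=True)
    return summary


# Illustrative events only: (stop_id, delay in seconds).
sample_events = [
    ("stop_A", 60), ("stop_A", 120),
    ("stop_B", 0), ("stop_B", 30),
    ("stop_C", 300),
]

for row in summarize_delays_by_stop(sample_events):
    print(row["stop_id"], row["observations"], row["mean_delay_seconds"])
```

Ranking by mean delay is only one choice; a median or a high percentile would be less sensitive to a single badly delayed trip.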
Outputs
A structured historical record of how each trip actually performed against schedule. This is the analytical backbone of the project and the piece that becomes most useful for deeper reliability studies.
A live interface that surfaces current vehicle positions and transit performance in a way that is immediately readable for both operators and end users.
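To make the historical record more concrete, here is a minimal sketch of what one row of such a dataset might look like and how a simple reliability metric could be derived from it. Everything in it is an assumption for illustration: the `TripStopRecord` fields, the helper names, and especially the on-time window (up to 60 seconds early, up to 300 seconds late), which varies between agencies and is not stated by the project.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class TripStopRecord:
    """Hypothetical row of the historical dataset: one stop visit on one trip."""
    service_date: str       # e.g. "2024-03-01"
    agency_id: str
    trip_id: str
    stop_id: str
    scheduled_arrival: str  # GTFS-style "HH:MM:SS"
    delay_seconds: int      # positive = late, negative = early


def on_time_share(records, early_tolerance=60, late_tolerance=300):
    """Fraction of records inside the assumed on-time window, or None if empty."""
    if not records:
        return None
    on_time = sum(
        1 for record in records
        if -early_tolerance <= record.delay_seconds <= late_tolerance
    )
    return on_time / len(records)


def records_to_csv(records) -> str:
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=[f.name for f in fields(TripStopRecord)]
    )
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


# Illustrative records only.
history = [
    TripStopRecord("2024-03-01", "agency_x", "trip_1", "stop_A", "08:15:00", 0),
    TripStopRecord("2024-03-01", "agency_x", "trip_1", "stop_B", "08:22:00", 200),
    TripStopRecord("2024-03-01", "agency_x", "trip_1", "stop_C", "08:30:00", 400),
    TripStopRecord("2024-03-01", "agency_x", "trip_2", "stop_A", "08:45:00", -90),
]

print(on_time_share(history))
print(records_to_csv(history).splitlines()[0])
```

Keeping one row per stop visit, rather than one row per trip, lets the same table support stop, trip, and agency-level rollups.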
Access
If you are working on transit analytics, urban systems, or downstream modeling and want access to the historical dataset, reach out directly. The project was built with reuse in mind.
The only requirement is attribution if the dataset is used in a project, paper, or public release.
Project links
The repository contains the implementation details. For collaboration, dataset access, or portfolio context, use the links here.