Case Study · Transit Intelligence

SunTransit

A real-time public transit tracking and performance analytics system that turns GTFS-Realtime feeds into live vehicle visibility, stop-level delay analysis, and reusable historical datasets.

Real-time tracking · Delay analytics · Historical trip data · Phoenix + Boston

2

Transit agencies currently integrated into the pipeline.

13M-34.6M

Vehicle-location messages processed each day from continuously polled GTFS-Realtime feeds.

Stop-level

Delay tracking compares actual arrivals against scheduled trips.

Reusable

Historical trip records are stored for later analysis and research.

Tech Stack

Core systems behind SunTransit

Python
Kafka
Spark
Docker
Airflow
Linux
Jupyter

Problem

What SunTransit solves

Transit riders usually see fragments of the picture: a static route plan, an ETA, or a map that does not explain whether the system is actually performing well. SunTransit closes that gap by combining live location feeds with schedule-aware delay computation.

The result is a system that behaves like a transit operations lens. Riders can see where buses and trains are right now, while planners and researchers can inspect reliability patterns over time at the stop, trip, and agency level.

  • Track buses and trains on a live map.
  • Measure whether each vehicle is on time or delayed at each stop.
  • Persist historical records for deeper analysis and downstream products.
  • Support agencies, developers, and urban mobility research use cases.

Architecture

Data pipeline and system shape

SunTransit continuously ingests transit updates, normalizes them against scheduled service, and computes stop-level delay signals that can be explored both live and historically. The core design goal is simple: keep the real-time experience responsive while preserving a structured data product for later analytics.

This makes the project both an application and a data platform. One side serves an interface for live monitoring; the other side builds a durable dataset that can support research, reliability analysis, and future product extensions.

  • Ingest live agency feeds on a recurring schedule.
  • Join real-time observations with schedule information.
  • Compute arrival delay signals at the stop and trip level.
  • Store processed events for dashboards and longitudinal analysis.
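The join and delay steps in the list above can be sketched in a few lines. This is a minimal illustration, not the project's actual implementation: the `Observation` type, the `compute_delays` function, and the choice to key the schedule by `(trip_id, stop_sequence)` are assumptions made for the example. It relies on two facts from the GTFS specifications: scheduled times are `HH:MM:SS` strings measured from the start of the service day (and may exceed 24:00:00), while GTFS-Realtime reports arrivals as POSIX timestamps.

```python
from dataclasses import dataclass


def gtfs_time_to_seconds(hms: str) -> int:
    """Convert a GTFS 'HH:MM:SS' time to seconds since the start of the service day.

    GTFS allows hours >= 24 for trips that continue past midnight.
    """
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


@dataclass(frozen=True)
class Observation:
    """Hypothetical shape of one observed arrival taken from a real-time feed."""
    trip_id: str
    stop_sequence: int
    arrival_epoch: int  # POSIX seconds


def compute_delays(schedule, observations, service_day_start_epoch):
    """Join observations to the schedule and return (trip_id, stop_sequence, delay_s).

    `schedule` maps (trip_id, stop_sequence) to a GTFS 'HH:MM:SS' arrival time.
    `service_day_start_epoch` is supplied by the caller (an assumption that
    keeps time-zone handling out of this sketch).
    Positive delay means late; negative means early.
    """
    delays = []
    for obs in observations:
        scheduled = schedule.get((obs.trip_id, obs.stop_sequence))
        if scheduled is None:
            continue  # no matching scheduled stop; skipped in this sketch
        scheduled_epoch = service_day_start_epoch + gtfs_time_to_seconds(scheduled)
        delays.append((obs.trip_id, obs.stop_sequence, obs.arrival_epoch - scheduled_epoch))
    return delays
```

Keying on `stop_sequence` rather than `stop_id` is a deliberate choice here, since a loop route can visit the same stop more than once within a single trip.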

SunTransit dataflow diagram

System flow

The platform ingests transit data, processes it into schedule-aware events, and exposes both a live monitoring experience and a historical performance dataset.
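As a rough illustration of the ingest end of this flow, the sketch below polls a feed on a fixed interval and drops positions that have not changed since the previous poll. Everything here is hypothetical: the `poll_feed` name, the dictionary shape with `vehicle_id` and `timestamp` keys, the 15-second default, and the de-duplication rule are assumptions for the example, and the `fetch` callable stands in for whatever actually downloads and decodes the agency feed.

```python
import time
from typing import Callable, Dict, Iterable, Iterator, Optional


def poll_feed(
    fetch: Callable[[], Iterable[dict]],
    interval_s: float = 15.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield vehicle positions from repeated polls, skipping stale repeats.

    `fetch` returns the positions in the current feed snapshot. A position is
    yielded only if its timestamp is newer than the last one seen for that
    vehicle. `max_polls=None` polls forever.
    """
    last_seen: Dict[str, int] = {}  # vehicle_id -> newest timestamp yielded
    polls = 0
    while max_polls is None or polls < max_polls:
        for position in fetch():
            vehicle_id = position["vehicle_id"]
            timestamp = position["timestamp"]
            if last_seen.get(vehicle_id, -1) >= timestamp:
                continue  # same or older report; already emitted
            last_seen[vehicle_id] = timestamp
            yield position
        polls += 1
        if max_polls is None or polls < max_polls:
            sleep(interval_s)
```

Passing `fetch` and `sleep` in as parameters keeps the loop testable without network access; in a pipeline like the one described, the yielded positions would presumably be handed to a message queue rather than consumed directly.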

Interface

How the product is experienced

Live map showing real-time vehicle tracking

Live tracking

A continuously updating map surface for watching active buses and trains move through the network in real time.

Delay heatmap showing stop-level performance

Delay heatmap

A stop-level view of where delays accumulate, making reliability issues much easier to spot than raw event logs or tables.
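One plausible way to feed a view like this is to aggregate delay events per stop before rendering. The sketch below assumes each event is a dictionary with `stop_id` and `delay_s` keys and uses the mean as the summary statistic; both the event shape and the choice of statistic are assumptions for illustration, not a description of how the dashboard is built.

```python
from collections import defaultdict
from statistics import mean


def mean_delay_by_stop(events):
    """Group delay events by stop and return {stop_id: mean delay in seconds}."""
    delays_by_stop = defaultdict(list)
    for event in events:
        delays_by_stop[event["stop_id"]].append(event["delay_s"])
    return {stop_id: mean(delays) for stop_id, delays in delays_by_stop.items()}
```

A median or a high percentile may be a better fit than the mean when a few extreme delays would otherwise dominate a stop's color on the map.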

Outputs

Two products in one system

1. Data product

A structured historical record of how each trip actually performed against schedule. This is the analytical backbone of the project and the piece that becomes most useful for deeper reliability studies.
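To make "structured historical record" concrete, here is one hypothetical row layout and a CSV serializer for it. The `StopEvent` fields are illustrative guesses at what such a record might carry; the actual dataset schema is not documented here and may differ.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class StopEvent:
    """Hypothetical record: one trip's arrival at one stop, compared to schedule."""
    agency: str
    trip_id: str
    stop_id: str
    scheduled_arrival: str  # ISO 8601 timestamp
    actual_arrival: str     # ISO 8601 timestamp
    delay_s: int            # actual minus scheduled, in seconds


def to_csv(records) -> str:
    """Serialize StopEvent records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(StopEvent)],
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()
```

Storing both timestamps alongside the derived delay lets later analyses recompute or re-bucket delays without going back to the raw feed.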

2. Dashboard

A live interface that surfaces current vehicle positions and transit performance in a way that is immediately readable for both operators and end users.

Access

Reuse and collaboration

If you are working on transit analytics, urban systems, or downstream modeling and want access to the historical dataset, reach out directly. The project was built with reuse in mind.

The only requirement is attribution if the dataset is used in a project, paper, or public release.

Project links

Interested in the system or the dataset?

The repository contains the implementation details. For collaboration, dataset access, or portfolio context, use the links here.