Trend Analysis of Transit Crime Complaints and Subway Ridership

Executive Summary

This analysis examines the relationship between transit crime complaints and daily subway ridership across New York City, using 12 million records from the city's open data. My findings offer actionable insights for the security team at MTA Headquarters to help give New Yorkers a safe subway experience.

Key Questions Addressed:

  • Do complaints affect ridership?
  • What temporal patterns emerge?
  • Are there any crime hotspots?

Methodology Overview

Data Sources

MTA Ridership Data, NYPD Transit Crime Complaints Data, MTA Subway Station Data

Time Period Analyzed

January 2024 - June 2024

Analysis Techniques

Geospatial Mapping, Time Series Analysis, Correlation Studies, Statistical Thresholding

Area of Consideration

Brooklyn, Manhattan, Queens, The Bronx

Type of Crimes

Major Felonies (Grand Larceny, Felony Assault, Robbery, Burglary, Rape)

Key Takeaways

Crime-Ridership Correlation

Complaints generally rise with ridership, except at 14 St–Union Sq and 74 Broadway–Jackson Heights, where complaints are high despite low ridership.

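T-tests found no significant difference in the number of complaint days between these two stations and the non-anomaly stations (p = 0.82), even though their average daily ridership is significantly lower (p = 0.04).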

A weak maximum cross-correlation of 0.18 (at a 2-day lag) indicates little association between complaint days and ridership at these stations.

Notable Insight

Anomaly detection flagged unusual ridership on 8% of complaint days at 14 St–Union Sq (2 of 25) and 11.11% at 74 Broadway–Jackson Heights (2 of 18), suggesting ridership behaved unexpectedly on those days.

Geographic Hotspots

Complaint Days vs Ridership Chart

Exploratory Data Analysis

Complaint Statistics

  • Total Number of Days in the Time Period: 182
  • Number of Days with Complaints: 182
  • Total Number of Complaints: 1,007
  • Average Number of Daily Complaints: 5.53
  • Maximum Number of Complaints in a Day: 15 (occurred on May 4)
  • Most Common Type of Complaint: Grand Larceny with 541 complaints (about 54% of total) and a daily average of 2.97 complaints.

* For major felonies.
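
As a quick check, the daily averages above follow directly from the reported totals. The sketch below is illustrative only: `summarize_daily_counts` and its sample input are hypothetical and not part of the original analysis.

```python
from datetime import date


def summarize_daily_counts(daily_counts):
    """Summarize a mapping of date -> complaint count (hypothetical helper)."""
    total = sum(daily_counts.values())
    busiest_day = max(daily_counts, key=daily_counts.get)
    return {
        "days": len(daily_counts),
        "days_with_complaints": sum(1 for c in daily_counts.values() if c > 0),
        "total": total,
        "daily_average": round(total / len(daily_counts), 2),
        "max_in_a_day": daily_counts[busiest_day],
        "busiest_day": busiest_day,
    }


# Made-up sample, for illustration only (not the NYPD data).
sample = {date(2024, 5, 3): 4, date(2024, 5, 4): 15, date(2024, 5, 5): 0}
print(summarize_daily_counts(sample))

# Cross-check of the reported figures: totals divided by the 182 days.
print(round(1007 / 182, 2))  # average daily complaints
print(round(541 / 182, 2))   # average daily Grand Larceny complaints
```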

Ridership Statistics

  • Total Number of Complaint Stations: 256
  • Average Daily Ridership: 10,784.23
  • Standard Deviation: 3,412.59
  • Median Ridership: 5,598.0
  • Minimum Ridership: 10.0
  • 25% of Stations had ridership ≤ 9,452.82
  • 75% of Stations had ridership ≤ 12,352.86
  • Maximum Ridership: 158,893.0

* For major felonies per complaint station

Temporal Patterns

Complaint Trends

Complaint counts are sporadic throughout the period. Although the total number of daily complaints is low on most holidays, both weekends and holidays show elevated daily averages.

Ridership Trends

Daily ridership peaks during weekdays, with a dip on weekends. Ridership significantly drops on holidays.
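
One way to compare weekdays, weekends, and holidays is to label each date and average within each label. The following is a minimal sketch under stated assumptions: the `HOLIDAYS` set is an illustrative subset rather than the calendar used in the analysis, and the function names and sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Assumption: illustrative subset of 2024 holidays, not the analysis calendar.
HOLIDAYS = {date(2024, 1, 1), date(2024, 5, 27)}


def day_type(d):
    """Label a date as 'holiday', 'weekend', or 'weekday'."""
    if d in HOLIDAYS:
        return "holiday"
    # date.weekday() returns 5 for Saturday and 6 for Sunday.
    return "weekend" if d.weekday() >= 5 else "weekday"


def average_by_day_type(daily_values):
    """Average a mapping of date -> value within each day type."""
    groups = defaultdict(list)
    for d, value in daily_values.items():
        groups[day_type(d)].append(value)
    return {label: mean(values) for label, values in groups.items()}


# Made-up ridership values, for illustration only.
ridership = {
    date(2024, 1, 1): 100,  # Monday, holiday
    date(2024, 1, 2): 300,  # Tuesday
    date(2024, 1, 3): 500,  # Wednesday
    date(2024, 1, 6): 200,  # Saturday
    date(2024, 1, 7): 250,  # Sunday
}
print(average_by_day_type(ridership))
```

The same grouping can be applied to daily complaint counts to compare totals against per-day averages.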

Station-Level Trends

Ridership across stations with complaints closely mirrors the trend across all stations. 85.9% of the total daily ridership comes from complaint stations.

Methodology

Objective

To investigate anomaly stations that have a high number of complaint days but lower average ridership.

Stations Analyzed

Anomaly: 14 St–Union Sq, 74 Broadway–Jackson Heights

Non-Anomaly: Times Sq–42 St, Grand Central–42 St, 34 St–Penn Station

Approach

  • Standardization (Z-Scores): Measured how far each station's metrics deviate from their mean.
  • Hypothesis Testing:
    • F-Tests: Found no significant difference in variance between the anomaly and non-anomaly groups.
    • T-Tests: Tested whether the means of the two groups differ.
      • Number of Complaint Days: No significant mean difference (p > 0.05, t = -0.26)
      • Avg Daily Ridership: Significant mean difference (p < 0.05, t = -3.35)
  • Correlation Checks:
    • Autocorrelation: No strong patterns based on past days.
    • Cross-Correlation: Weak link between complaint days and ridership (max correlation = 0.18 at 2-day lag)
  • Anomaly Detection: Used statistical thresholding (Threshold = Mean ± 2 × Standard Deviation) based on stable, non-extreme data.
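
The standardization and cross-correlation steps above could look roughly like the following. This is a minimal sketch, not the original implementation: the function names and sample series are hypothetical, and it assumes the cross-correlation is a Pearson correlation between complaint counts on day t and ridership on day t + lag.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize values: distance from the mean in standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)  # population SD, so mean(zx * zy) is Pearson r
    if sigma == 0:
        raise ValueError("cannot standardize a constant series")
    return [(v - mu) / sigma for v in values]


def lagged_correlation(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag], for lag >= 0."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    if lag < 0 or lag >= len(x) - 1:
        raise ValueError("lag out of range")
    xs = x[: len(x) - lag]
    ys = y[lag:]
    zx, zy = zscores(xs), zscores(ys)
    return mean([a * b for a, b in zip(zx, zy)])


def best_lag(x, y, max_lag):
    """Return (lag, correlation) with the largest absolute correlation."""
    scores = {lag: lagged_correlation(x, y, lag) for lag in range(max_lag + 1)}
    lag = max(scores, key=lambda k: abs(scores[k]))
    return lag, scores[lag]


# Made-up series: y repeats x two days later, for illustration only.
complaints = [3, 1, 4, 1, 5, 9, 2, 6]
ridership = [0, 0, 3, 1, 4, 1, 5, 9]
print(best_lag(complaints, ridership, max_lag=3))
```

In this toy input the strongest link appears at a 2-day lag by construction. A value near 0.18, as reported above, would indicate a much weaker relationship.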

T-Test Results

Metric                      p-value (two-tailed)
Number of complaint days    0.82
Average daily ridership     0.04
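
For readers who want to reproduce this kind of result, here is a hedged sketch of a two-sample t-test using only the Python standard library. It assumes a pooled-variance (Student's) test, which seems reasonable given that the F-tests found no difference in variability, but the source does not state which variant was used. The function names and sample groups are hypothetical, and the p-value is obtained by numerically integrating the Student's t density.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Pooled-variance t statistic and degrees of freedom for two samples."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value via Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1 - 2 * central_area)


# Made-up group values, for illustration only.
t_stat, df = pooled_t([1, 2, 3], [2, 3, 4])
print(t_stat, df, two_tailed_p(t_stat, df))
```

Assuming 3 degrees of freedom (2 anomaly stations plus 3 non-anomaly stations, minus 2), `two_tailed_p(-3.35, 3)` gives roughly 0.044 and `two_tailed_p(-0.26, 3)` gives roughly 0.81, close to the values reported in the table.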

Anomaly Detection Model Results

Number of complaint days (non-holiday) flagged as anomalies for average daily ridership:

  • 14 St–Union Sq: 2 out of 25
  • 74 Broadway–Jackson Heights: 2 out of 18
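
The thresholding rule (mean ± 2 × standard deviation) can be sketched as follows. This is an illustration under stated assumptions, not the model used in the analysis: the function names and data are hypothetical, the baseline is taken to be the stable, non-extreme days, and the sample standard deviation is assumed because the source does not say which form was used.

```python
from statistics import mean, stdev


def thresholds(baseline, k=2.0):
    """Lower and upper bounds: mean ± k × standard deviation of the baseline."""
    mu = mean(baseline)
    sd = stdev(baseline)  # assumption: sample standard deviation
    return mu - k * sd, mu + k * sd


def flag_anomalies(baseline, observed, k=2.0):
    """Return the observed values that fall outside the baseline thresholds."""
    low, high = thresholds(baseline, k)
    return [v for v in observed if v < low or v > high]


def flagged_share(flagged, total):
    """Percentage of complaint days flagged, rounded to two decimals."""
    return round(100 * flagged / total, 2)


# Made-up ridership values, for illustration only.
baseline_days = [98, 100, 102, 100, 100]
complaint_days = [99, 103, 96, 101]
print(flag_anomalies(baseline_days, complaint_days))

# Shares implied by the counts reported above.
print(flagged_share(2, 25))
print(flagged_share(2, 18))
```

New complaint days can be passed as `observed` against the same baseline to check for similar deviations.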

Conclusions

Although not every complaint leads to an arrest, complaints still provide insight into what is troubling subway riders in New York City and where. This will help senior leadership take suitable actions to improve safety in the system.

Actionable Insights

Higher daily ridership usually comes with more complaint days. The exceptions are the 14 St–Union Sq and 74 Broadway–Jackson Heights Roosevelt Av stations, where complaint days are frequent even though ridership is relatively low.

Statistical testing found no significant relationship between average daily ridership and the number of complaint days at these stations.

Hence, deviations in ridership may be due to other factors such as socio-economic environment and design vulnerabilities (e.g., poor lighting, entrance/exit congestion, lack of patrol presence).

The anomaly detection model can take in new data to identify similar deviations from the norm.