Exhibits 2a & 2b Assumptions
I limited the data to the nine most frequented entry-stations [Exhibit 1] and separated riders into two groups based on entry time.
gen group = 1 if time == tc(18:54)      // Peak [6:54 - 6:59 pm]
replace group = 2 if time == tc(19:00)  // Off-Peak [7:00 - 7:05 pm]
I compared the average number of riders in group 1 to group 2 by destination station.
ttest rider_count, by(group) unequal
gen pvalue = r(p_l)  // lower one-sided p-value, Ha: mean(group 1) < mean(group 2)
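Because the comparison is made separately for each destination, the two commands above can be wrapped in a loop over exit-stations. The following is a minimal sketch under stated assumptions, not necessarily the code used for the exhibits: it assumes a numeric station identifier named exit_station (a hypothetical variable name) and it takes the place of the single gen command.

```stata
* Sketch only: exit_station is an assumed numeric station id
gen double pvalue = .
levelsof exit_station, local(exits)
foreach s of local exits {
    * Two-sample t test with unequal variances for this exit-station
    quietly ttest rider_count if exit_station == `s', by(group) unequal
    * r(p_l) is the lower one-sided p-value, Pr(T < t), for
    * Ha: mean(group 1) - mean(group 2) < 0
    quietly replace pvalue = r(p_l) if exit_station == `s'
}
```

The loop stops with an error if an exit-station has observations in only one group, so such stations would need to be dropped or handled separately.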
(The average for each group was taken from 387 observations: 9 observations per day, one from each of the nine most frequented entry-stations [Exhibit 1], over 43 days, so 9 × 43 = 387. For observations with no riders, I inserted zero counts, because the original data omits these instances. The visual representation aggregates rider counts across the nine most frequented entry-stations.)
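The zero-filling step can be sketched with Stata's fillin command, which adds every missing combination of the listed variables and flags the new rows in _fillin. This is an illustration only; the variable names date, entry_station and exit_station are assumptions and do not come from the original data.

```stata
* Sketch only: date, entry_station, exit_station are assumed variable names
* Keep the two five-minute periods being compared
keep if inlist(group, 1, 2)
* Add a row for every day x entry x exit x group combination that is absent
fillin date entry_station exit_station group
* Rows created by fillin have a missing rider_count; set them to zero
replace rider_count = 0 if _fillin == 1
* Each exit-station and group should now have 9 entries x 43 days = 387 rows
bysort exit_station group: assert _N == 387
```

The final assert is a sanity check on the 387 figure and fails if the data has duplicate rows or covers a different number of days or entry-stations.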
The null hypothesis is that mean rider volume to a particular exit-station is equal between the two groups (last period of peak / first period of off-peak). This is actually a stricter standard than required, considering that ridership trends downward over this time period (so we’d expect mean rider volume to be lower in group 2 if the fare change had no influence on travel-time decisions).
Still, the one-sided p-value produced by the test indicates that fifteen exit-stations are the destination of significantly more riders beginning their trip at 7:00–7:05 pm than at 6:54–6:59 pm. For these exit-stations, if mean ridership were truly equal between the two periods, there would be a less than 1% chance of seeing an increase this large by chance alone.
(Black Lines / Grey Lines)
Besides identifying these fifteen exit-stations of interest, the p-value allowed me to rank all exits based on their likelihood of being a destination where riders were influenced by the fare change. For simplicity, Exhibits 2a and 2b show the top nine ranked exit-stations (all with p-values < 0.01); the bottom nine stations also appear on Exhibit 2a.
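The ranking described above could be produced along the following lines. This is a sketch, assuming a variable pvalue that holds the one-sided p-value and is constant within each exit-station, plus a station identifier named exit_station (a hypothetical name).

```stata
* Sketch only: assumes pvalue is constant within exit_station (assumed name)
egen byte one_per_exit = tag(exit_station)   // flag one row per exit-station
* Default egen rank(): smallest p-value gets rank 1, ties share an average rank
egen p_rank = rank(pvalue) if one_per_exit
sort p_rank
* Show the nine exit-stations with the smallest p-values
list exit_station pvalue p_rank if one_per_exit & p_rank <= 9
```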
The ‘natural pattern of expected ridership’ is simply a line of best fit, found by regressing unsmoothed average rider counts on time over the afternoon period of declining ridership. Specifically, the time period used was 5:18–7:30 pm. This begins at the smoothed data peak [Exhibit 1] and extends until ridership flattens.
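A line of best fit of this kind can be sketched in Stata as below. The variable names avg_riders and time are assumptions, with time taken to be a clock value comparable to tc(18:54) as in the group definitions; the exact specification used for the exhibits is not given in the text.

```stata
* Sketch only: avg_riders and time are assumed variable names
* Fit a straight line to unsmoothed average ridership from 5:18 to 7:30 pm
regress avg_riders time if inrange(time, tc(17:18), tc(19:30))
* Fitted values give expected ridership under the declining trend;
* predict fills them in for every row, not only the fitting window
predict expected_riders, xb
```

Comparing rider_count averages against expected_riders around 7:00 pm then shows how far observed ridership departs from the trend.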