The v1 engine that shipped in March picks the next question properly. It does not yet pick it with the candidate's clock in mind. That gap is the subject of this note, and of the change that went live in the alpha cohort last Wednesday.
Pacing is now a first-class signal inside the selection loop. Time-on-question is read alongside the standard error on the candidate's ability estimate, and both feed the next selection. The engine is finally allowed to know whether the candidate is running out of clock.
The shape of the problem
Across the four hundred alpha sessions we have logged on v1, the same pattern shows up. The first eight quant items sit close to the two-minute budget. The middle six drift twenty seconds long. The last seven crash: three- to four-minute responses, item-skip rates climbing, and accuracy dropping by twelve points relative to the same difficulty band earlier in the section.
The accuracy drop in the back third is not driven by item difficulty. v1's Fisher-information selection actually eases the difficulty of the items in the last seven positions, because the candidate's estimated ability has usually settled by then and the engine targets information gain rather than maximum challenge. The drop is driven by time pressure: the candidate is answering items they could answer cleanly with thirty more seconds, and answering them wrong without it.
What v1 was doing about it
Nothing. The v1 loop logged time-on-question. It did not consume it. Selection ran on ability and topic balance. Pacing data sat in the response table, available for analytics, invisible to the loop. That is the gap this change closes.
The change
Pacing now enters the loop in two places. First, the candidate's pacing position — a running estimate of how much time-budget remains relative to items remaining — biases the difficulty target. When pacing is comfortable, the engine selects at maximum information. When pacing is tight, it widens the eligible band downward and prefers items the candidate is likely to answer in under a minute, recovering clock without sacrificing measurement.
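The pacing bias described above can be sketched in a few lines. This is a minimal illustration, not the shipped implementation: the flat per-item budget, the linear shift, and the constants `max_shift`, `base`, and `extra` are all assumptions, and the signatures are simplified relative to the loop (here `pacing_position` takes a plain list of seconds spent). It also does not model the preference for items likely to be answered in under a minute.

```python
def pacing_position(seconds_spent, section_seconds, section_items):
    """Time left per remaining item, as a fraction of the flat per-item budget.

    1.0 means on pace, below 1.0 means behind, above 1.0 means ahead.
    """
    items_left = section_items - len(seconds_spent)
    if items_left <= 0:
        return 1.0  # nothing left to pace
    time_left = max(section_seconds - sum(seconds_spent), 0.0)
    flat_budget = section_seconds / section_items
    return (time_left / items_left) / flat_budget


def _shortfall(pace):
    """How far behind pace the candidate is, clamped to [0, 1]."""
    return min(max(1.0 - pace, 0.0), 1.0)


def difficulty_target(theta, se, pace, max_shift=0.75):
    """Target difficulty: theta when on pace, shifted down when behind.

    se is accepted to match the loop's call but is unused in this sketch.
    max_shift is an assumed constant, in logits.
    """
    return theta - max_shift * _shortfall(pace)


def window(pace, base=0.3, extra=0.5):
    """Half-width of the eligible difficulty band; wider when pacing is tight."""
    return base + extra * _shortfall(pace)


# Example: 21-item section, two-minute flat budget, 15 items answered at 150 s each.
pace = pacing_position([150.0] * 15, section_seconds=21 * 120, section_items=21)
target = difficulty_target(theta=0.4, se=0.3, pace=pace)
half_width = window(pace)
```

With a candidate behind pace, the target drops below theta and the band widens, so its lower edge moves further toward easier items than its upper edge does. A candidate at or ahead of pace gets the unshifted target and the base window.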
Second, the stop condition is now sensitive to pacing collapse. If three consecutive responses run more than 90 seconds over their predicted time-on-task, and the candidate's standard error has stopped tightening, the engine ends the section early rather than burn clock on items it can already see are not going to land cleanly. A short, clean stop is worth more than a long, ragged one.
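As a rough sketch of the two stop predicates: the three-response run and the 90-second overrun come from the description above, while the `Response` record, the `lookback` of three items, and the `min_gain` threshold for "stopped tightening" are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Response:
    seconds_spent: float      # observed time on the item
    seconds_predicted: float  # predicted time-on-task for the item
    se_after: float           # standard error on theta after scoring this response


def pacing_collapse(history, run=3, overrun_seconds=90.0):
    """True if each of the last `run` responses overran its prediction by more than the threshold."""
    if len(history) < run:
        return False
    return all(
        r.seconds_spent - r.seconds_predicted > overrun_seconds
        for r in history[-run:]
    )


def se_stalled(history, lookback=3, min_gain=0.01):
    """True if the standard error fell by less than `min_gain` over the last `lookback` responses."""
    if len(history) <= lookback:
        return False
    return history[-lookback - 1].se_after - history[-1].se_after < min_gain


history = [
    Response(100, 110, 0.50),
    Response(230, 120, 0.40),
    Response(240, 120, 0.395),
    Response(250, 120, 0.394),
    Response(260, 120, 0.393),
]
should_stop = pacing_collapse(history) and se_stalled(history)
```

Requiring both predicates keeps a slow but still informative run of items from ending the section: the stop fires only when the clock is burning and the estimate is no longer improving.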
    # v1.1 selection step
    theta, se = irt_mle(history, item_params)        # ability estimate and its standard error
    pace = pacing_position(history, section_clock)   # time budget left relative to items left
    eligible = pool.filter(unseen, topic_balance, exposure_cap)
    target = difficulty_target(theta, se, pace)
    # Keep only items inside the pace-dependent difficulty window, then score them.
    scored = [
        (item, fisher_information(item, theta))
        for item in eligible
        if abs(item.b - target) <= window(pace)
    ]
    # max() returns an (item, information) pair; keep only the item.
    next_q, _ = max(scored, key=lambda x: x[1])
    # Stop early on pacing collapse once the standard error has stopped tightening.
    if pacing_collapse(history) and se_stalled(history):
        end_section()

Two changes from the March loop. The eligible pool is filtered against a pace-dependent difficulty window before scoring. The stop condition gets a second predicate. Everything else is what we shipped four weeks ago.
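For readers who want something runnable, here is a small self-contained version of the window-then-score step. It assumes a two-parameter logistic (2PL) item model, which this note does not specify, and the `Item` record and `select_next` helper are hypothetical names used only for this sketch.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    a: float  # discrimination
    b: float  # difficulty


def fisher_information(item, theta):
    """Item information under an assumed 2PL model: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-item.a * (theta - item.b)))
    return item.a ** 2 * p * (1.0 - p)


def select_next(eligible, theta, target, half_width):
    """Most informative item whose difficulty lies within half_width of target."""
    in_window = [it for it in eligible if abs(it.b - target) <= half_width]
    if not in_window:
        return None  # the caller decides how to widen the band or fall back
    return max(in_window, key=lambda it: fisher_information(it, theta))


pool = [Item("easy", 1.0, -0.5), Item("matched", 1.0, 0.0), Item("hard", 1.0, 1.5)]

on_pace = select_next(pool, theta=0.0, target=0.0, half_width=0.3)
behind = select_next(pool, theta=0.0, target=-0.5, half_width=0.2)
```

On pace, the window is centred on the ability estimate and the matched item wins. Behind pace, the shifted window excludes it and the easier item is served instead, even though it carries less information at the current theta.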
What this is designed to fix, and what it does not
From our internal alpha sessions, here is what the change is built to address. These are design goals and early observations, not a published outcomes study — we are not reporting validated effect sizes here.
The back-third accuracy gap. The aim is for items late in a section (positions 15–21) to land closer to the accuracy seen early (positions 1–8) at the same fitted difficulty. Part of that is pacing recovery; part is the engine routing around items it would otherwise have served at maximum difficulty in a position where the candidate has no clock to think.
Session length. The change is meant to reach the same measurement precision in less of the candidate's time — the same standard error on θ at session end, fewer wasted minutes getting there.
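One way to read the session-length goal is through the standard large-sample relationship between the standard error of a maximum-likelihood ability estimate and the information accumulated across the items administered. Whether the engine computes its standard error exactly this way is an assumption; here $I_i$ denotes the Fisher information of the $i$-th administered item and $n$ the number of items served.

```latex
\mathrm{SE}(\hat{\theta}) \approx \frac{1}{\sqrt{\sum_{i=1}^{n} I_i(\hat{\theta})}}
```

Precision depends on the information collected, not on the minutes spent collecting it. A section that reaches the same total information in fewer seconds per item ends at the same standard error with less of the candidate's clock used.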
What it does not fix: the pacing signal is still noisy at the item level. Time-on-task is a composite of three things (reading speed, working-out speed, and answer commitment), and the loop currently treats it as a single scalar. Candidates whose reading is slow but whose working-out is fast get treated like candidates whose working-out is slow, which is the wrong call about half the time. Separating those signals is the v2 pacing work, and it will need the calibration pipeline to support it.
What we are not doing
We are not surfacing the pacing signal directly to the candidate in this release. A "you are running slow" prompt mid-section is the kind of feedback that produces anxiety-driven rushing — the failure mode the pacing change was supposed to address. The engine is making the pacing-aware call inside the loop. The candidate sees the consequence (a slightly easier item, a slightly shorter section) without being told what triggered it. That is the right shape of the feedback at this stage.
We will write a longer note when the pacing signal gets decomposed into its three constituents and the loop starts routing on them separately. That work is on the Q3 roadmap.
— Brightroom Engineering