Track Record

How the desk's calls actually score.

A scoreboard can assert accuracy; this page lets you audit it. Every resolved public call is graded against real reporting, then scored for calibration — when the desk says 80%, does 80% actually happen?

How to read this
  • Accuracy — share of resolved calls that hit.
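  • Brier — the mean squared gap between the stated probability and the outcome. 0 is perfect; 0.25 is what always saying 50% earns. A miss costs more the more confident the call was, so it measures more than the miss count.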
  • Calibration — predicted vs. what actually happened, by confidence band.
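As a rough illustration of the first two measures, here is a minimal Python sketch. It assumes each resolved call is stored as a (stated probability, hit) pair; that layout, the function names, and the sample data are hypothetical and are not the desk's actual ledger or tooling.

```python
def accuracy(calls):
    """Share of resolved calls that hit."""
    if not calls:
        raise ValueError("need at least one resolved call")
    return sum(1 for _, hit in calls if hit) / len(calls)


def brier_score(calls):
    """Mean squared gap between stated probability and outcome (1 = hit, 0 = miss)."""
    if not calls:
        raise ValueError("need at least one resolved call")
    return sum((p - (1.0 if hit else 0.0)) ** 2 for p, hit in calls) / len(calls)


# Hypothetical calls, not taken from the ledger.
example = [(0.9, True), (0.6, True), (0.8, False)]

# Always saying 50% scores 0.25 whatever happens: the no-skill baseline.
coin_flip = [(0.5, True), (0.5, False)]

print(f"accuracy: {accuracy(example):.0%}")
print(f"brier:    {brier_score(example):.3f}")    # roughly 0.270
print(f"baseline: {brier_score(coin_flip):.3f}")  # 0.250
```

In the sample, the single miss at 80% confidence contributes 0.64 of the 0.81 total squared error, which is why a confident miss weighs so heavily on the score.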
Accuracy
83%
5 hits · 1 miss across 6 verified calls. Unverified expirations are excluded, and 1 call was voided as not falsifiable at filing — struck from the score in the open, not deleted.
Brier score
0.154
Reasonably calibrated. Lower is better; 0.25 is the no-skill baseline.
Verified calls
6
The record begins 2026-07-01. It sharpens as more calls resolve.
Confidence spread
0.09
Low — the board clusters its confidence. The desk is working to spread calls so the number carries more information.
Track record as of Jul 2, 21:02 UTC
Small sample, shown honestly. With 6 resolved calls, the numbers below are directional, not statistical proof. We publish them now because the point of a calibration desk is to keep score in the open from day one — the curve earns its authority as the ledger grows, not by waiting until it looks good.
Calibration

Predicted confidence vs. what actually happened.

Each row groups resolved calls by the confidence the desk stated at filing, then shows the real hit rate for that group. A well-calibrated desk tracks the diagonal: 80%-confidence calls hit about 80% of the time. Gaps are named, not hidden.
By confidence band

Reliability table

Confidence band   Calls   Predicted   Actual   Read
50–60%            1       58%         100%     under-confident
60–70%            no calls yet
70–80%            1       79%         0%       over-confident
80–90%            4       83%         100%     under-confident
90–100%           no calls yet

“Over-confident” means calls in that band hit less often than the desk claimed; “under-confident” means they hit more often. Both are calibration errors worth correcting — and both are visible here rather than averaged away into a single headline number.
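The grouping behind a reliability table like this one can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the desk's own code: calls are (stated confidence, hit) pairs, bands are half-open ten-point ranges with 100% folded into the top band, and the sample data reuses the band averages from the table. The per-call confidences in the 80–90% band are not published, so each is assumed to equal the band average of 83%.

```python
from collections import defaultdict


def reliability_table(calls, band_width=10):
    """Group (confidence, hit) pairs into bands and compare predicted vs. actual."""
    bands = defaultdict(list)
    for confidence, hit in calls:
        pct = round(confidence * 100, 6)  # guard against float noise such as 57.999999
        low = int(pct // band_width) * band_width
        low = min(low, 100 - band_width)  # fold 100% into the top band
        bands[low].append((confidence, hit))

    rows = []
    for low in sorted(bands):
        group = bands[low]
        predicted = sum(c for c, _ in group) / len(group)
        actual = sum(1 for _, hit in group if hit) / len(group)
        if actual > predicted:
            read = "under-confident"
        elif actual < predicted:
            read = "over-confident"
        else:
            read = "on the diagonal"
        rows.append((f"{low}–{low + band_width}%", len(group), predicted, actual, read))
    return rows


# Illustrative data: band averages come from the table; the four 0.83 values are assumed.
example_calls = [(0.58, True), (0.79, False)] + [(0.83, True)] * 4

for band, n, predicted, actual, read in reliability_table(example_calls):
    print(f"{band:8} {n:>2}  {predicted:.0%}  {actual:.0%}  {read}")
```

Bands with no calls simply do not appear in the output, which corresponds to the "no calls yet" rows.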

Over time

The record as it accumulates.

Cumulative accuracy after each resolution, oldest to newest. Early on this jumps around on tiny counts; it stabilizes as the ledger fills.
Cumulative accuracy

Every resolution, in order

  1. 2026-07-01 · 0% (0/1)
  2. 2026-07-01 · 50% (1/2)
  3. 2026-07-01 · 67% (2/3)
  4. 2026-07-01 · 75% (3/4)
  5. 2026-07-01 · 80% (4/5)
  6. 2026-07-01 · 83% (5/6)

These resolved together when the ledger was first verified against public reporting, so they share a date. As new calls come due on their own deadlines, this becomes a real timeline.
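The running figures in the list are a cumulative hit count divided by the number of resolutions so far. A minimal Python sketch follows, assuming the outcomes are available as an ordered list of booleans. The order used here (one miss, then five hits) is read off the fractions in the list; the function name is hypothetical.

```python
from itertools import accumulate


def cumulative_accuracy(outcomes):
    """Return (hits, resolved, accuracy) after each resolution, oldest first."""
    running_hits = accumulate(1 if hit else 0 for hit in outcomes)
    return [(hits, n, hits / n) for n, hits in enumerate(running_hits, start=1)]


# Order implied by the list: 0/1, 1/2, 2/3, 3/4, 4/5, 5/6.
outcomes = [False, True, True, True, True, True]

for hits, n, acc in cumulative_accuracy(outcomes):
    print(f"{acc:.0%} ({hits}/{n})")
```

Each early resolution moves the figure by many points (0% to 50% on the second call), which is the jumpiness on small counts described above.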