Cost data & sources

Calibration (track record)

Example: the Cost Index tags a beef move “strong.” That word isn’t a vibe — across 1,877 graded calls, the ones it labeled strong came true about 58% of the time versus a coin-flip near 50%, so a strong tag is a checked claim, not a hunch.

Does the confidence earn its name?

Method

Calibration tests whether stated confidence matches the real hit-rate. A forecast that says it is “70% sure” is well-calibrated only if calls like it come true about 70% of the time. The Cost Index publishes its own calibration record: its weak, medium, and strong direction calls came true about 48%, 51%, and 58% of the time across 1,877 scored calls, against a coin-flip baseline near 50%.

Why it matters

A confidence label is only worth reading if it’s earned. Calibration is how you check: grade a long run of past calls and see whether each tier comes true as often as it claims. The Cost Index does this in the open: “strong” calls land near 58% and “weak” ones near 48%, against a coin-flip near 50%. So “strong” genuinely beats chance, while “weak” honestly sits at about chance. The labels are earned, not decorative.

Frequently asked

What does calibration mean?

Calibration is the test of whether stated confidence matches reality. If you make a batch of calls you say you're 70% sure about, you're well-calibrated only when about 70% of them come true. Being right isn't the bar; being right as often as you claimed is. The honest way to prove it is to grade a long run of past calls and compare each confidence level against how often calls at that level actually came true.
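The grading step described above can be sketched in a few lines of Python. This is a minimal illustration, not the Cost Index's actual method: the `hit_rates` function and the `sample_calls` data are hypothetical, made up only to show how each confidence label is compared with how often its calls came true.

```python
from collections import defaultdict

def hit_rates(calls):
    """Return {label: fraction of calls with that label that came true}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for label, came_true in calls:
        totals[label] += 1
        if came_true:
            hits[label] += 1
    return {label: hits[label] / totals[label] for label in totals}

# Hypothetical graded calls, invented for illustration only.
# Each entry is (confidence label, whether the call came true).
sample_calls = [
    ("weak", True), ("weak", False),
    ("strong", True), ("strong", True), ("strong", False), ("strong", True),
]

rates = hit_rates(sample_calls)
for label, rate in rates.items():
    print(f"{label}: {rate:.0%} of calls came true")
```

A real record would hold far more calls per label; with only a handful, the rates swing too much to say anything about calibration.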

How is a confidence label different from being right?

A single right call proves nothing about a label — even a coin gets heads half the time. Calibration is about the rate across many calls. A label like "strong" earns its name only if strong calls come true more often than weak ones, and more often than a coin flip. A label that doesn't beat chance is decoration, not information.
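One rough way to see why a single call proves nothing is to measure how many standard errors an observed hit-rate sits from the coin-flip baseline. The sketch below uses a simple normal approximation; the function name and the count of 1,000 calls are assumptions for illustration, since the source does not give per-tier call counts.

```python
import math

def z_score_vs_chance(hits, n, baseline=0.5):
    """Standard errors between the observed hit-rate and the baseline."""
    observed = hits / n
    standard_error = math.sqrt(baseline * (1 - baseline) / n)
    return (observed - baseline) / standard_error

# A single correct call: a 100% hit-rate, but only one standard error from chance.
one_call = z_score_vs_chance(hits=1, n=1)

# Hypothetical: a 58% hit-rate sustained over 1,000 calls.
many_calls = z_score_vs_chance(hits=580, n=1000)

print(f"one call:   z = {one_call:.2f}")
print(f"many calls: z = {many_calls:.2f}")
```

The lower hit-rate over many calls ends up much further from chance than the perfect record over one call, which is why the rate across a long run is what counts.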

How well calibrated is the Muntin Cost Index?

The index grades its own direction calls and publishes the record. Across 1,877 scored calls, the ones it labeled weak came true about 48% of the time, medium about 51%, and strong about 58%. Each tier lands above the last, and all are measured against a coin-flip baseline near 50%. So "strong" genuinely beats chance, while "weak" honestly sits at about chance. The labels are earned, not decorative, and the full record is published at /cost-index/calibration.json.
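The two checks in that answer, that each tier lands above the last and how each compares with the baseline, can be sketched as follows. The rates are the rounded figures quoted above, and the baseline is taken as exactly 50% for simplicity. The function names are hypothetical, and the sketch does not read /cost-index/calibration.json because its format is not described here.

```python
# Rounded hit-rates quoted in the text; baseline assumed to be exactly 50%.
TIER_RATES = {"weak": 0.48, "medium": 0.51, "strong": 0.58}
BASELINE = 0.50

def tiers_climb(rates, order=("weak", "medium", "strong")):
    """True if each tier's hit-rate is strictly above the previous tier's."""
    values = [rates[tier] for tier in order]
    return all(a < b for a, b in zip(values, values[1:]))

def margin_over_baseline(rates, baseline=BASELINE):
    """Hit-rate minus baseline for each tier, rounded to two decimals."""
    return {tier: round(rate - baseline, 2) for tier, rate in rates.items()}

climbs = tiers_climb(TIER_RATES)
margins = margin_over_baseline(TIER_RATES)

print("tiers climb:", climbs)
for tier, margin in margins.items():
    print(f"{tier}: {margin:+.2f} versus baseline")
```

With these rounded figures, the weak tier sits two points below a 50% line and the strong tier eight points above it.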

What is a well-calibrated forecast?

One whose confidence matches its hit-rate: if it says it is 70% sure, calls like it come true about 70% of the time. Being right isn't enough; being right as often as you claimed is. The Cost Index checks its own calls this way: across 1,877 scored calls, weak, medium, and strong direction calls came true about 48%, 51%, and 58% of the time, each tier earning its label against a roughly 50% baseline.

The 90-second explainer

Calibration, in 90 seconds.

Does the confidence label earn its name? (1:14)

A confidence label is only worth reading if it is earned. "Strong" should mean more than a hunch — it should mean that calls labeled strong actually come true more often. If it does not, the word is decoration.
Here is the test. You grade a long run of past calls, and for each level you check how often calls at that level actually came true. Being right once proves nothing — even a coin lands heads half the time. What matters is the rate across many calls.
Here is what the Cost Index showed. Weak calls came true about forty-eight percent of the time, medium about fifty-one, and strong about fifty-eight. Each tier above the last — a climbing bar chart, the same look as the term card itself.
Compare that to a coin flip — about fifty percent. Draw that line across the bars. "Strong," near fifty-eight, genuinely beats it. "Weak," near forty-eight, honestly barely does. That is how you know the labels carry real information, not just confidence.
The labels are earned, not decorative. When the Cost Index says strong, that word has a track record behind it. The full record is public at slash cost dash index slash calibration dot json.
