Calibration (track record)

Example: the Cost Index tags a beef move “strong.” That word is not a vibe. Across 1,877 graded calls, the ones it labeled strong came true about 58% of the time, against a coin-flip baseline near 50%, so a strong tag is a checked claim, not a hunch.

Does the confidence earn its name?

Calibration tests whether stated confidence matches the real hit-rate. A forecast that says it is “70% sure” is well-calibrated only if calls like it come true about 70% of the time. The Cost Index publishes its own record: across 1,877 scored calls, its weak, medium, and strong direction calls came true about 48%, 51%, and 58% of the time, against a coin-flip baseline near 50%.

Why it matters

A confidence label is only worth reading if it’s earned. Calibration is how you check: grade a long run of past calls and see whether each tier comes true as often as it claims. The Cost Index does this in the open: “strong” calls land near 58% and “weak” ones near 48%, against a coin-flip baseline near 50%. So “strong” genuinely beats chance, while “weak” sits at roughly chance level (slightly below it, in fact), which is what a weak label should signal. The labels are earned, not decorative.

Frequently asked

What does calibration mean?

Calibration is the test of whether stated confidence matches reality. If you make a batch of calls you say you're 70% sure about, you're well-calibrated only when about 70% of them come true. Being right isn't the bar; being right as often as you claimed is. The honest way to prove it is to grade a long run of past calls and compare each confidence level against how often calls at that level actually came true.
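As a rough illustration of that grading step, the Python sketch below groups scored calls by confidence tier and computes how often each tier came true. The function name `hit_rate_by_tier`, the (tier, came_true) record layout, and the sample data are all hypothetical; they are not the Cost Index's actual schema or results.

```python
from collections import defaultdict


def hit_rate_by_tier(calls):
    """Return {tier: (hit_rate, n_calls)} for an iterable of (tier, came_true) pairs."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for tier, came_true in calls:
        totals[tier] += 1
        if came_true:
            hits[tier] += 1
    return {tier: (hits[tier] / totals[tier], totals[tier]) for tier in totals}


# Made-up sample for illustration only (hypothetical, not real Cost Index calls).
sample_calls = [
    ("strong", True), ("strong", True), ("strong", False), ("strong", True),
    ("weak", True), ("weak", False), ("weak", False), ("weak", True),
]

rates = hit_rate_by_tier(sample_calls)
for tier, (rate, n) in rates.items():
    print(f"{tier}: {rate:.0%} of {n} calls came true")
```

With a real record, each tier's rate would then be compared against what the tier claims and against the coin-flip baseline. A sample this small says nothing on its own, which is why the grading has to cover a long run of calls.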

How is a confidence label different from being right?

A single right call proves nothing about a label, since even a coin gets heads half the time. Calibration is about the rate across many calls. A label like "strong" earns its name only if strong calls come true more often than weak ones, and more often than a coin flip. A label that doesn't beat chance is decoration, not information.
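To make the coin-flip comparison concrete, the sketch below computes the exact probability that a fair coin would get at least a given number of calls right. The helper name `coin_flip_tail` and the call counts (58 of 100, 580 of 1,000) are hypothetical, chosen only to show how the same hit-rate becomes harder to explain by luck as the number of calls grows; they are not the per-tier counts behind the Cost Index figures.

```python
from math import comb


def coin_flip_tail(n_calls, n_hits):
    """Probability that a fair coin gets at least n_hits right out of n_calls."""
    favorable = sum(comb(n_calls, k) for k in range(n_hits, n_calls + 1))
    return favorable / 2 ** n_calls


# One right call out of one: a coin manages that half the time.
print(coin_flip_tail(1, 1))  # 0.5

# Hypothetical counts: the same 58% hit-rate over 100 calls and over 1,000 calls.
p_small = coin_flip_tail(100, 58)
p_large = coin_flip_tail(1000, 580)
print(f"58 of 100 by luck:   {p_small:.4f}")
print(f"580 of 1000 by luck: {p_large:.2e}")
```

The second probability is far smaller than the first, which is the sense in which a rate measured over many calls is evidence and a single right call is not.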

How well calibrated is the Muntin Cost Index?

The index grades its own direction calls and publishes the record. Across 1,877 scored calls, the ones it labeled weak came true about 48% of the time, medium about 51%, and strong about 58%. Each tier lands above the last, and all are measured against a coin-flip baseline near 50%. So "strong" genuinely beats chance, while "weak" sits at roughly chance level, slightly below it. The labels are earned, not decorative, and the full record is published at /cost-index/calibration.json.

What is a well-calibrated forecast?

One whose confidence matches its hit-rate: if it says it is 70% sure, calls like it come true about 70% of the time. Being right isn't enough; being right as often as you claimed is. The Cost Index checks its own record across 1,877 scored calls: weak, medium, and strong direction calls came true about 48%, 51%, and 58% of the time against a roughly 50% baseline, so the hit-rate rises with each tier.
