Tuesday, June 23, 2026

The Forward

Finance in motion.

Discipline

Measuring Forecast Accuracy Without Punishing the Team for Honesty

Measuring forecast accuracy turns a rolling forecast from a ritual into a feedback loop — but the wrong metrics teach planners to sandbag.

A paper archery target with a tight group of arrows landed off-center and one arrow near the outer edge

A rolling forecast that nobody scores is a calendar event, not a discipline. The first time a team starts measuring forecast accuracy — putting a number on how wrong last quarter's projection turned out to be — the whole exercise changes character. The forecast stops being a number you produce and starts being a number you're accountable to. That accountability is the point, and also the trap. Score it the wrong way and you teach your sharpest planners to lie to you politely, quarter after quarter, in the form of conservative numbers nobody can be blamed for missing.

Two errors, not one

Forecast error has two components, and conflating them is the root of most bad accuracy programs.

Bias is directional — the systematic tendency to over- or under-shoot. If your revenue forecast lands below actuals seven quarters out of eight, you have a bias problem, and it's almost always cultural rather than analytical. Someone is sandbagging.

Dispersion is the magnitude of error regardless of direction — how far off you tend to be on any given line. A forecast can be unbiased and still wildly imprecise, swinging twenty points high then twenty points low and netting to zero.

The standard tools split along the same line. Mean Absolute Percentage Error (MAPE) measures dispersion: average the absolute percentage miss across periods and you get a single number for how tight the forecast is. Forecast bias, the average of signed errors, measures direction. The demand-planning literature from the American Production and Inventory Control Society (APICS) has used this split for decades; it transfers cleanly to revenue and driver forecasting.

Track both. A team reporting 4% MAPE looks excellent until you notice every single miss is in the same direction — at which point the 4% isn't precision, it's a margin of safety they've quietly built in.
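A minimal sketch of the two measures side by side, in Python. The sign convention (forecast minus actual, divided by actual, so a negative bias means under-forecasting) and all of the numbers are assumptions made for illustration, not figures from any real team.

```python
from statistics import mean

def pct_errors(forecasts, actuals):
    """Signed percentage error per period: (forecast - actual) / actual."""
    return [(f - a) / a for f, a in zip(forecasts, actuals)]

def mape(forecasts, actuals):
    """Mean absolute percentage error: size of the miss, ignoring direction."""
    return mean(abs(e) for e in pct_errors(forecasts, actuals))

def bias(forecasts, actuals):
    """Mean signed percentage error: negative means under-forecasting."""
    return mean(pct_errors(forecasts, actuals))

# Illustrative numbers only.
actuals = [100.0, 100.0, 100.0, 100.0]
sandbagged = [96.0, 96.0, 96.0, 96.0]  # always 4% low
noisy = [120.0, 80.0, 120.0, 80.0]     # twenty points high, then twenty low

for name, series in [("sandbagged", sandbagged), ("noisy", noisy)]:
    print(f"{name:<11} MAPE {mape(series, actuals):.1%}  bias {bias(series, actuals):+.1%}")
```

The sandbagged series scores a tight MAPE with a bias of the same size, which is the one-directional pattern described above. The noisy series shows the opposite failure: a bias near zero sitting on top of a large MAPE.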

What to measure, and at what altitude

Top-line revenue accuracy is the headline, but it's a lagging, composite figure. By the time you know revenue was 8% light, the diagnostic value is gone. The useful work happens one level down, on the drivers.

If you've built a driver-based model, you already have the structure: forecast new logos, expansion rate, gross churn, and average deal size separately, and measure accuracy on each. A revenue miss that nets to 3% can hide a 15% overshoot on new bookings largely offset by a 12% overshoot on churn, two broken assumptions masquerading as one good forecast. (Over-forecasting churn understates revenue, so it pulls against the bookings overshoot; under-forecasting churn would compound it.) The Corporate Finance Institute's material on variance analysis makes the same case: decompose before you judge.
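As a rough sketch, per-driver scoring can be as simple as computing the signed percentage error on each driver and sorting by size. The driver names echo the list above; the values and the `driver_report` helper are hypothetical.

```python
# Hypothetical driver forecasts and actuals for one quarter (illustrative only).
drivers = {
    "new_logos":      {"forecast": 46.0,     "actual": 40.0},
    "expansion_rate": {"forecast": 0.080,    "actual": 0.082},
    "gross_churn":    {"forecast": 0.030,    "actual": 0.025},
    "avg_deal_size":  {"forecast": 25_000.0, "actual": 24_500.0},
}

def signed_pct_error(forecast, actual):
    """Positive means the forecast overshot the actual."""
    return (forecast - actual) / actual

def driver_report(drivers):
    """Return (driver, signed error) pairs, largest absolute miss first."""
    rows = [
        (name, signed_pct_error(v["forecast"], v["actual"]))
        for name, v in drivers.items()
    ]
    return sorted(rows, key=lambda row: abs(row[1]), reverse=True)

report = driver_report(drivers)
for name, err in report:
    print(f"{name:<15} {err:+.1%}")
```

Note that the error on a rate such as churn is measured relative to the rate itself, so a half-point miss on a 2.5% churn rate registers as a 20% error. Whether that is the right scale for your review is a design choice, not a rule.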

The altitude matters for tolerance, too. Not every line deserves the same accuracy bar.

Tolerance bands by line item

A flat "within 5%" target across the P&L is lazy. Some lines are genuinely more forecastable than others, and pretending otherwise punishes planners for the inherent variance of their inputs.

  • Contracted, recurring revenue: tight band, say ±2–3%. You have the contracts; a large miss means a data problem, not a forecasting one.
  • New-business bookings: wide band, ±15% or more at early stage. Pipeline conversion is noisy, and a planner who hits it to the point every quarter is probably managing the number rather than forecasting it.
  • Headcount-driven costs: tight on existing staff, wide on planned hires that depend on req timing.
  • Usage or consumption revenue: band sized to your actual historical dispersion, not a wish.

Set the bands off observed variance, then revisit them as the business matures. This connects directly to how often you re-forecast: the cadence and the tolerance bands are the same conversation viewed from two angles.
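One way to set a band off observed variance, sketched under assumptions: take each line's history of absolute percentage misses and pick the level that would have contained most of them. The 80% coverage level, the nearest-rank quantile, and the history below are all illustrative choices, not a standard.

```python
import math

def tolerance_band(past_abs_errors, coverage=0.8):
    """Band half-width that would have contained `coverage` of past misses.

    Uses a nearest-rank empirical quantile of absolute percentage errors.
    """
    if not past_abs_errors:
        raise ValueError("need at least one historical error")
    ordered = sorted(past_abs_errors)
    rank = max(1, math.ceil(coverage * len(ordered)))
    return ordered[rank - 1]

def outside_band(signed_error, band):
    """True when the miss, in either direction, exceeds the band."""
    return abs(signed_error) > band

# Hypothetical absolute percentage misses over the last eight periods.
history = {
    "contracted_recurring": [0.005, 0.010, 0.012, 0.008, 0.020, 0.015, 0.011, 0.009],
    "new_bookings":         [0.05, 0.22, 0.12, 0.18, 0.09, 0.14, 0.30, 0.07],
}
bands = {line: tolerance_band(errs) for line, errs in history.items()}

# Hypothetical signed misses for the latest period.
latest = {"contracted_recurring": -0.04, "new_bookings": -0.12}
flags = {line: outside_band(err, bands[line]) for line, err in latest.items()}

for line, err in latest.items():
    status = "outside band" if flags[line] else "within band"
    print(f"{line:<22} miss {err:+.1%}  band ±{bands[line]:.1%}  {status}")
```

With these made-up numbers, a 4% miss on contracted revenue gets flagged while a 12% miss on new bookings does not, which is the asymmetry the list above argues for.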

The sandbagging problem

Here's the failure mode that kills accuracy programs. You announce that forecast accuracy is now a tracked metric, possibly tied to comp or to credibility in the room. Rational planners respond by lowering their forecasts until they're trivially beatable. Accuracy "improves." The forecast becomes useless for planning because it no longer represents anyone's honest expectation.

The fix is to score bias as harshly as dispersion. A consistent under-forecast should cost a planner the same reputational capital as a wild miss. When you publish accuracy, publish the signed bias next to the MAPE so a one-directional pattern is impossible to hide. Research on forecasting practice — the International Institute of Forecasters maintains a deep literature here — has documented for years that incentive structures shape forecasts more than analytical technique does. Measure honesty, not just precision.
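To put a number on "one-directional," one option is a simple sign test: if a forecaster were equally likely to land high or low each period, how often would chance alone produce a split this lopsided? The sign test is an addition here rather than something the sources above prescribe, and it assumes the periods are independent, which quarterly forecasts only roughly are.

```python
from math import comb

def same_direction_probability(signed_errors):
    """Two-sided sign test on the direction of misses.

    Returns the probability that a forecaster with a 50/50 chance of landing
    high or low would produce a split at least this lopsided. Exact hits
    (error == 0) are dropped.
    """
    nonzero = [e for e in signed_errors if e != 0]
    n = len(nonzero)
    if n == 0:
        return 1.0
    under = sum(1 for e in nonzero if e < 0)
    k = max(under, n - under)  # size of the larger side
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical signed errors: seven of eight quarters under-forecast.
errors = [-0.03, -0.05, -0.02, 0.01, -0.04, -0.06, -0.03, -0.02]
p = same_direction_probability(errors)
print(f"chance of a split this lopsided with no bias: {p:.1%}")
```

A small probability is a prompt for a conversation, not a verdict. Eight quarters is a short record, and a run of same-direction misses can also come from a market that moved one way for two years.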

Variance against rolling, not against a stale budget

The traditional comparison is budget-to-actual. The problem: by month six, that comparison sets reality against assumptions locked in the prior November and since overtaken by events. The variance you compute is mostly noise from a world that no longer exists.

Measuring against your most recent rolling forecast is the more honest test. It asks: given what you knew a month ago, how good was your projection? That's a fair question. "How does this compare to a number we made up before we'd closed three of our ten biggest accounts?" is not.
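Scoring against the latest forecast requires keeping each forecast snapshot rather than overwriting it. A sketch of one possible layout, assuming forecasts are stored by the month they were made and the month they target; the integer month keys, the figures, and the `signed_errors` helper are all hypothetical.

```python
# Hypothetical snapshots: forecasts[(made_in, target)] = projected revenue.
# Month 0 stands in for the locked annual budget.
forecasts = {
    (0, 6): 1000.0,  # budget view of month 6
    (5, 6): 1180.0,  # rolling forecast for month 6, made in month 5
    (0, 7): 1020.0,  # budget view of month 7
    (6, 7): 1240.0,  # rolling forecast for month 7, made in month 6
}
actuals = {6: 1200.0, 7: 1250.0}

def signed_errors(forecasts, actuals, made_in_for):
    """Signed percentage error per target month.

    `made_in_for` maps a target month to the snapshot month to score against.
    Targets with no matching snapshot are skipped.
    """
    out = {}
    for target, actual in actuals.items():
        forecast = forecasts.get((made_in_for(target), target))
        if forecast is not None:
            out[target] = (forecast - actual) / actual
    return out

vs_rolling = signed_errors(forecasts, actuals, lambda target: target - 1)
vs_budget = signed_errors(forecasts, actuals, lambda target: 0)

for month in sorted(actuals):
    print(f"month {month}: vs rolling {vs_rolling[month]:+.1%}, vs budget {vs_budget[month]:+.1%}")
```

Keeping both views is cheap once the snapshots exist. The rolling comparison scores the forecaster; the budget comparison mostly measures how far the year has drifted from the plan.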

This is also where speed compounds. The learn-and-correct loop only closes as fast as you can see actuals. If month-end close takes three weeks, your accuracy review lands when the quarter is nearly over and the lesson is academic. Teams running continuous visibility into actuals — rather than waiting on a packaged report — can score last month's forecast in days and adjust the model while it still matters. Harvard Business Review's work on the high price of slow data makes the broader point: latency between event and insight has a real cost.

On the tooling: most teams stitch this together in a spreadsheet and a BI layer, which works until the reconciliation overhead eats the analyst's week. Purpose-built FP&A platforms (Cube and Pigment among them) automate the actuals-versus-forecast comparison, and it's worth auditing how fast your stack surfaces a variance once the number lands. The metric is only as good as the speed of the feedback.

The discipline is simple to state and hard to sustain: measure both bias and dispersion, set honest bands, score sandbagging, and compare against what you most recently believed rather than what you committed to half a year ago. Do that, and the forecast becomes a feedback loop instead of a quarterly performance.

About the author

The Forward Editors

Editorial
