by Max Mulitz
Data-generated scouting reports famously run to many pages. A team does not gain an advantage by summarizing the most data; the goal is to get the most useful information to the coaches. That starts with a question: what specific decisions are coaches looking to the data to help them make?
For example, say a defensive coordinator is deciding whether to play base defense or more nickel defense when his upcoming opponent is in 12 personnel. He expects his base defense to be above average against the run but below average against the pass, and his nickel defense to be the reverse: above average against the pass but below average against the run. To make this decision, the coordinator wants to know the probability that the opponent will run or pass from 12 personnel in normal down-and-distance situations, depending on the defensive personnel.
The problem is that this is a very specific question. A team may have faced nickel defense from 12 personnel in normal situations only a handful of times in a season, so there might not be a meaningful sample. The primary way to alleviate this is to use base rates. If NFL teams pass 60% of the time from 12 personnel in normal situations against base defenses but only 40% of the time against nickel defenses, that is useful information. If the base rates are very similar, that is also useful information. How often teams run from 12 personnel in general is not that useful for this question, since we have good reason to believe defensive personnel will affect run/pass choices.
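As a rough illustration of conditioning the base rate on defensive personnel, here is a minimal Python sketch. Everything in it is an assumption made for the example: the play records are invented rather than real NFL data, the tuple layout is hypothetical, and `is_normal_situation` uses a placeholder definition of "normal" (1st or 2nd down with 10 or fewer yards to go) that a real staff would replace with its own.

```python
from collections import defaultdict

# Hypothetical play records, invented for illustration (not real NFL data):
# (offensive personnel, defensive personnel, down, yards to go, play type)
plays = [
    ("12", "base", 1, 10, "run"),
    ("12", "base", 2, 6, "pass"),
    ("12", "base", 1, 10, "run"),
    ("12", "nickel", 1, 10, "run"),
    ("12", "nickel", 2, 5, "run"),
    ("12", "nickel", 3, 12, "pass"),  # 3rd & long: filtered out as not "normal"
    ("11", "nickel", 1, 10, "pass"),  # different offensive personnel: filtered out
]


def is_normal_situation(down, to_go):
    """Placeholder definition (an assumption): 1st or 2nd down, 10 or fewer yards to go."""
    return down in (1, 2) and to_go <= 10


def run_rate_by_defense(plays, offense="12"):
    """Return {defensive personnel: (run rate, sample size)} for one offensive grouping."""
    counts = defaultdict(lambda: {"run": 0, "total": 0})
    for off, defense, down, to_go, play_type in plays:
        # Keep only the offensive personnel and situations the question is about.
        if off != offense or not is_normal_situation(down, to_go):
            continue
        counts[defense]["total"] += 1
        if play_type == "run":
            counts[defense]["run"] += 1
    return {d: (c["run"] / c["total"], c["total"]) for d, c in counts.items()}


rates = run_rate_by_defense(plays)
for defense, (rate, n) in sorted(rates.items()):
    print(f"vs {defense}: run rate {rate:.0%} on {n} plays")
```

Reporting the sample size next to each rate is a deliberate choice: it keeps a tiny sample from being read as a stable tendency.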
The other way to expand our sample is to look for similar situations that fit our theory. Here that means looking at how the opponent’s run/pass balance changes depending on the defensive personnel in normal situations. Some teams run quite a bit more often vs. nickel than vs. base, and some teams don’t change their offense very much based on defensive personnel. Knowledge is theory-laden: which situations count as similar depends on the theory we bring to the data.
What data you exclude from the analysis is as important as what you include. How often teams run from 12 personnel overall isn’t very useful because it doesn’t account for the fact that we are considering using a lighter defense that may cause teams to run more.
Also, if you don’t constrain yourself to normal situations, the sample will be contaminated with examples of teams playing nickel defense on 3rd & long, which will inflate the probability of a pass.
If you don’t expand your sample by looking at classes of situations, you can be fooled into thinking an opponent is 80% likely to run in a given situation just because they have done it 4 of 5 times, even if the base rate is just 33% runs.
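One common way to blend a small sample with a base rate is to treat the base rate as a set of pseudo-plays added to the observed ones (a beta-binomial style shrinkage estimate). The article does not prescribe this method; the sketch below is just one possible approach, and the `prior_strength` of 20 pseudo-plays is an arbitrary assumption chosen for illustration, not a recommended value.

```python
def shrunk_run_rate(runs, plays, base_rate, prior_strength=20.0):
    """Blend an observed run rate with a base rate.

    prior_strength is a hypothetical tuning value: the number of pseudo-plays
    the base rate is worth. Larger values pull the estimate harder toward
    the base rate; 0 returns the raw observed rate.
    """
    if plays + prior_strength <= 0:
        raise ValueError("need at least one real or pseudo play")
    return (runs + prior_strength * base_rate) / (plays + prior_strength)


raw = 4 / 5                                      # observed: 4 runs in 5 plays
estimate = shrunk_run_rate(4, 5, base_rate=0.33)  # pulled toward the 33% base rate
print(f"raw rate: {raw:.0%}, shrunk estimate: {estimate:.1%}")
```

With these assumed numbers the estimate comes out near 42% rather than 80%, which is the point of the paragraph above: five plays should move the expectation off the base rate, but not very far.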
Interpreting data for game planning is the domain of coaches. A football scientist’s job is to make sure coaches are seeing the correct data that helps them game plan and are not weighed down with irrelevant or misleading data.