Piecing Together the Injury Puzzle

By Max Mulitz

 

There’s been some really great work from a variety of sources on injury rates. I’m going to try to synthesize the current research on injury rates and hopefully further the discussion by addressing a couple of areas that have so far gone unnoticed.

Football Outsiders and Pro Football Logic both peg the chance of a player missing at least one regular season game with an injury in a given season at ~38%, meaning 62% of players go uninjured. However, when you limit the sample to players who were on a team for the whole 16 game season, only 54% of players avoid injury the entire season. Football Outsiders also notes that the rate of players missing time with injuries is generally increasing over time, though Pro Football Logic’s data is from 2015. In total, a base rate of about a 58% chance of not missing time for the average player in a given season is probably about right. Another way to address this problem is to take the per game injury rate, convert it to the probability of not getting injured in a given game, and extrapolate that to a full season (15 games, since injuries suffered during week 17 can’t cause the player to miss any regular season games). Using Pro Football Logic’s injury rate per game of 4.1%, we can extrapolate an expected 47% chance of a player missing at least one game with an injury.

One question might be why the implied chance of a healthy season is a little lower than we’d expect, 53% instead of 58%. One contributing factor might be injury recurrence. If players who miss games at some point in the season with an injury are more likely to get re-injured later in the season than the average player, then the overall per game injury rate will lead us to expect more players to sustain at least one injury than actually do. A player with a 58% chance of staying healthy enough to dress for all 16 games has a 3.6% chance of being injured in a given game (assuming the same 15 game window and an equal risk of injury in every game).
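To make the arithmetic above easy to check, here is a minimal sketch of the two conversions. It assumes every game carries the same, independent injury risk and uses the 15 game window described above. The function names are mine and are only for illustration.

```python
def season_injury_chance(per_game_rate, games=15):
    """Chance of at least one injury across `games` independent games."""
    return 1 - (1 - per_game_rate) ** games


def implied_per_game_rate(healthy_season_chance, games=15):
    """Per game injury rate implied by a given chance of a fully healthy season."""
    return 1 - healthy_season_chance ** (1 / games)


# Pro Football Logic's 4.1% per game rate implies roughly a 47% season injury chance.
print(round(season_injury_chance(0.041), 3))

# A 58% healthy-season base rate implies roughly a 3.6% per game rate.
print(round(implied_per_game_rate(0.58), 4))
```

The gap between the 4.1% observed per game rate and the roughly 3.6% implied rate is the piece that injury recurrence might help explain.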

 

Injury Proneness

Injury Predictor is an NFL injury analysis and prediction website that has done research suggesting that “injury proneness” is predictable and documenting the increased risk of injury recurrence over a season.

It strikes me as inherently obvious that some players are injury prone. To give an extreme example, it became clear that Arian Foster was more likely than the average player to get injured in his final few seasons, which led to his in-season retirement. While I don’t agree with all of Injury Predictor’s methodology (predicting Drew Brees only has a 1% chance of missing a game with an injury this year is silly), they do claim to have demonstrated the ability to predict which players are at the highest risk of injury (66-75% chance of missing at least one game) and which players are at the lowest risk (~30% chance of missing time).

It may seem strange, given a base rate of about 40% injuries, to see injury prone players come in 20-30% above the baseline while lower risk players are injured only 10% less than average, but I actually think it makes intuitive sense. If you imagine me, or any person of average athleticism, playing in an NFL game where I either receive 20 carries or play until I am too injured to continue, the chances I would actually be healthy enough to touch the ball 20 times are vanishingly small. This should make it clear that injury likelihood can scale up to essentially 100% depending on the situation. On the other hand, even the healthiest player is one awkward helmet to the side of the knee away from being out for the rest of the season. Football is a brutal game on the human body, and there simply aren’t people who are impervious to its risks.

Anyway, my sense right now is that it is possible to identify high injury likelihood, but players with an extremely low rate of injury are probably mostly lucky. Also, because injury likelihood increases with age, by the time you identify a player as being particularly unlikely to get injured, his increased age has probably offset the benefit. That’s exactly what we see in the Football Outsiders piece, where the injury rate is flat across all ages, but when you look at players with long careers, their probability of getting injured increases each year of their career. Research suggests the negative physical effects of aging begin at around age 23, so each year of a player’s career is going to bring accumulated damage and increased risk.

Closing Thoughts

In an upcoming post, we’ll look at average injury length across position and then combine injury rate with injury lengths to look at total games lost due to injury. Understanding how many injuries a team can expect to have is integral to team building and achieving balance between acquiring top-end starters and maintaining quality depth through the lineup so the team can continue to function if injuries do occur.

Understanding the Lack of Predictive Value of Yards/Carry

By Max Mulitz

 

Previous research has shown that yards per carry is very inconsistent both within a season and year to year. The links show that depending on how you pick your sample of backs, the year to year correlation for Yards Per Carry for running backs is only between .1 and .3, meaning a significant majority of the difference in yards per carry in a given season is attributable to randomness. Below I’m going to lay out a thought experiment that helped my understanding of yards per carry.

Ok, imagine 3 hypothetical running backs. Running back A averages 4.0 yards per carry on his first 199 carries of a season, then on his 200th and final carry he goes 84 yards for a touchdown. He ends the season with 880 yards on 200 carries for a 4.4 yards per carry average, solidly above the league average of 4.0. Running back B averages 4.0 yards/carry on his first 199 carries, then on his 200th and final carry he scores a 44 yard touchdown. He finishes the season with 840 yards on 200 carries for 4.2 yards per carry, only modestly above average. The third back has 199 carries at 4.0 yards a carry and on his 200th carry he scores a 90 yard touchdown, but the play is called back for holding and he ends the year with 4.0 yards per carry. So three backs who had the exact same production on 99.5% of their carries can end up with very different Yards/Carry because of one play. This distortion gets far worse when you consider even lower numbers of carries. If you repeat the same thought experiment with only 100 carries, the difference between the top and bottom players doubles, to 4.8 yards/carry vs. 4.0.
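Here is a quick sketch of the thought experiment in code, in case you want to try other carry counts or run lengths. The helper name is made up for illustration, and the third back is treated as having no final carry because the penalty wipes out the play.

```python
def season_ypc(base_carries, base_ypc, final_run_yards=None):
    """Season yards per carry after an optional final carry.

    A final_run_yards of None means the last play did not count
    (for example, it was called back for holding).
    """
    yards = base_carries * base_ypc
    carries = base_carries
    if final_run_yards is not None:
        yards += final_run_yards
        carries += 1
    return yards / carries


for label, final_run in [("A", 84), ("B", 44), ("C", None)]:
    print(label, round(season_ypc(199, 4.0, final_run), 2))

# Same experiment with only 100 carries: the spread between A and C doubles.
print(round(season_ypc(99, 4.0, 84), 2), round(season_ypc(99, 4.0, None), 2))
```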

Though Yards per Carry is probably the most mainstream running back/running game statistic, it’s so noisy that it is very hard to use as a predictive indicator, especially for backs without a large sample of carries. I’ll be looking at some other ways to measure the running game in the near future.

Offensive Points Per Drive and Rushing and Passing Efficiency

By Max Mulitz

 

“Although the running game is much more physically demanding than the passing game, coaches should keep in mind that the physical matchups that occur during the running game are not as likely to enable a team to overwhelmingly dominate another team.” Bill Walsh- Finding The Winning Edge

 

We’ve already looked at the value of rushing and passing efficiency in terms of wins. But obviously an offense contributes to wins primarily through scoring points, so I thought I’d look at how rushing and passing efficiency contribute to overall offensive efficiency. I got Points Per Drive data from SportingCharts for the 2011 to 2015 seasons and used Rushing and Passing Efficiency data from Pro Football Reference.

First, let’s look at a multiple regression on Points Per Drive for each Team Season from 2011 to 2015 (for example, the 2011 Arizona Cardinals Offense is one data point and the 2013 Arizona Cardinals Offense is another).

Coefficients:    Estimate   Std. Error   t value    Pr(>|t|)
(Intercept)      -1.47405      0.22195    -6.641    4.81E-10 ***
Passing NY.A      0.44907      0.02478    18.126   <2.00E-16 ***
Rushing Y.A       0.14062      0.04072     3.453    0.000712 ***

The R-Squared for the model was .68. Here we see passing efficiency estimated to be about three times as important as rushing efficiency, where previously we saw passing around twice as important. This is probably because Net Yards Per Attempt combines Yards per Attempt and Sacks into one metric. Also, because most turnovers happen in the passing game and interceptions are negatively correlated with passing efficiency, by not including turnovers in the model their value will mostly be captured by passing efficiency (since efficient passing teams throw fewer interceptions, but rushing yards per carry and fumbles are uncorrelated.)
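To make the coefficients concrete, here is a small sketch that plugs efficiency numbers into the fitted equation. The coefficients are copied from the regression output above. The function name and the example inputs (the league-median 6.2 NY/A and 4.2 Y/A from the quartile table later in this post) are just for illustration.

```python
# Coefficients from the Points Per Drive regression above.
INTERCEPT = -1.47405
PASS_NYA_COEF = 0.44907
RUSH_YA_COEF = 0.14062


def predicted_ppd(pass_nya, rush_ya):
    """Points Per Drive predicted by the fitted linear model."""
    return INTERCEPT + PASS_NYA_COEF * pass_nya + RUSH_YA_COEF * rush_ya


baseline = predicted_ppd(6.2, 4.2)  # league-median passing and rushing efficiency
better_passing = predicted_ppd(6.7, 4.2) - baseline  # +0.5 NY/A
better_rushing = predicted_ppd(6.2, 4.7) - baseline  # +0.5 Y/A

print(round(baseline, 2))
print(round(better_passing, 3), round(better_rushing, 3))
print(round(PASS_NYA_COEF / RUSH_YA_COEF, 1))  # passing-to-rushing coefficient ratio
```

The same half yard of improvement is worth roughly three times as much in Points Per Drive when it comes through the passing game.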

Now consider the following charts, which plot Points Per Drive against Net Passing Yards Per Attempt and then against Rushing Yards Per Attempt, each with a trend line.

PPD.png

Rplot01.png

The relationship between net passing efficiency and PPD is both of a greater magnitude and lower variability than the relationship between rushing efficiency and PPD.

Now consider the following table, which breaks offenses into 4 groups based on offensive points per drive and examines the distribution of team rushing and passing efficiency in those groups.

 

                            All Teams 2011 to 2015
Points Per Drive              <1.5   1.5 to 2   2.01 to 2.5   2.51+   All
Number of Team Seasons          18         73            57      12   160
Max Rushing Y/A                4.8        5.2           5.4     4.9   5.4
Upper Quartile Rushing Y/A     4.2        4.4           4.7     4.3   4.5
Median Rushing Y/A             4.0        4.1           4.3     4.2   4.2
Lower Quartile Rushing Y/A     3.7        3.8           4.0     4.0   3.9
Min Rushing Y/A                3.3        3.1           3.4     3.8   3.1
Max Passing NY/A               6.1        7.7           7.5     8.3   8.3
Upper Quartile Passing NY/A    5.7        6.4           7.0     7.8   6.9
Median Passing NY/A            5.4        6.0           6.8     7.5   6.2
Lower Quartile Passing NY/A    5.0        5.7           6.4     7.3   5.8
Min Passing NY/A               4.2        5.0           5.3     6.5   4.2

So to read the table: from 2011 to 2015, there were 18 Team Seasons where an offense failed to score 1.5 Points Per Drive. For those teams, the median passing Net Yards/Attempt was 5.4 compared to a league median Net Yards/Attempt of 6.2. In fact, none of the 18 teams to score less than 1.5 Points Per Drive even had league median passing efficiency, with a maximum of 6.1.

One thing to look at is the middle 50% (Lower Quartile to Upper Quartile) of passing and rushing efficiency for each level of offensive efficiency. The offenses that scored <1.5 PPD had a middle 50% NY/A of 5-5.7, the 1.5-2 PPD offenses had a middle 50% NY/A of 5.7-6.4, the 2.01-2.5 PPD offenses had a middle 50% NY/A of 6.4-7, and the >2.5 PPD offenses had a middle 50% NY/A of 7.3-7.8. Notice that there is almost no overlap between the groups and that offensive quality improves almost linearly as NY/A increases.

On the other hand, the middle 50% of rushing efficiency for <1.5 PPD offenses was 3.7-4.2, for 1.5-2 PPD offenses it was 3.8-4.4, for 2.01-2.5 PPD offenses it was 4-4.7, and for >2.5 PPD offenses it was 4-4.3. There is enormous overlap at each level of rushing efficiency. While it seems high efficiency offenses may be less likely to have truly ineffective running games (look at the rising minimums), all qualities of rushing efficiency are represented at each level of offensive quality.

The data supports Bill Walsh’s point: passing efficiency, far more than rushing efficiency, drives the quality of an offense.

Understanding the Value of Cap Space

By Max Mulitz

 

Using data from OverTheCap I ran a regression to look for a correlation between offensive and defensive spending as a percentage of the current salary cap and wins. I used data from the 2013-2015 NFL Seasons. Results below.

Coefficients:    Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)      -4.53424      3.16746    -1.432    0.15564
Offense           0.15781      0.04718     3.345    0.00119 **
Defense           0.13955      0.04458     3.13     0.00233 **

The way to read the table is that for every extra 1% of Cap a team spends on Offense or Defense in a given year, they can expect to win ~0.15 extra games per season. It is interesting to note the coefficients for Offense and Defense are not significantly different from each other, so there’s no clear advantage to spending money on offense vs. defense.

This is a very preliminary look into valuing future Cap Space. Because NFL Teams are allowed to roll over unused cap space from season to season, we can estimate the value of the rolled over space in terms of wins. For instance, if the Base Cap is 166 Million next year (as OverTheCap expects), rolling over 5 Million dollars from this season would add 5/166 ≈ 3% of the cap, worth roughly 3 * .15 = .45 wins. So an extra 5 million dollars saved this year could be worth about .4-.5 wins the following year, assuming the money is spent in the offseason at league average efficiency. It’s obviously not practical to try to pin down the exact value of $1 Million of Cap Space at any given moment, as the actual value of money changes with the availability of players, team needs, etc., but having some sort of baseline estimate of the value of cap space going forward is valuable as a rule-of-thumb starting point when deciding whether it is worthwhile to cut a high priced veteran for cap space.
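The rule of thumb is simple enough to write down as a function. This is only a sketch. It assumes the rounded ~0.15 wins per 1% of cap from the regression above applies equally to rolled over money, and the function and parameter names are invented for illustration.

```python
def rollover_wins(rollover_millions, cap_millions, wins_per_cap_pct=0.15):
    """Rough wins value of rolled over cap space.

    wins_per_cap_pct is the approximate regression coefficient:
    wins gained per extra 1% of the cap spent.
    """
    cap_pct = 100 * rollover_millions / cap_millions
    return cap_pct * wins_per_cap_pct


# 5 Million rolled over against a projected 166 Million cap.
print(round(rollover_wins(5, 166), 2))
```

Swapping in the separate Offense (0.158) or Defense (0.140) coefficient moves the answer only slightly, which is consistent with the .4-.5 win range.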

Calculating SPARQ

By Max Mulitz

 

 

The Seahawks have been using SPARQ for some time now to supplement their draft preparation. Zach Whitman has an entire website dedicated to estimating and calculating SPARQ scores by position. He doesn’t publish the formula but here Josh Hermsmeyer recreates WR SPARQ using multiple linear regression. 

I thought I would use the 2016 Edge SPARQ Scores on Zach’s website to recalculate the formula for Z-Scores (normalized SPARQ Scores) for edge players. It seems SPARQ is just a multiple linear regression, as my adjusted R Squared was .986, almost a perfect fit. Results below.

Coefficients:     Estimate   Std. Error   t value    Pr(>|t|)
(Intercept)       6.777706     1.000752     6.773    8.81E-10 ***
Weight            0.02147      0.001098    19.545   <2.00E-16 ***
Forty            -1.566179     0.164255    -9.535    1.04E-15 ***
Ten              -2.35099      0.300435    -7.825    5.39E-12 ***
Three Cone       -0.771297     0.080434    -9.589    7.88E-16 ***
Short Shuttle    -1.385896     0.14201     -9.759    3.34E-16 ***
Bench             0.017426     0.002895     6.019    2.91E-08 ***
Broad             0.056072     0.003036    18.472   <2.00E-16 ***
Vertical          0.105705     0.006333    16.692   <2.00E-16 ***

The way to read the coefficients is that for every one pound increase in Weight, a player’s Z Score increases by about .02. For every .1 second increase in the Forty, an EDGE player’s Z Score decreases by about .157. For every one inch increase in Broad Jump, a player’s Z Score increases by .056, etc.
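Putting the regression into a function makes it easy to score a prospect. This is a sketch of my fitted approximation, not Zach’s actual formula. I am assuming the usual combine units (pounds, seconds, bench reps, and inches for the jumps), and the example player is made up.

```python
# Coefficients from the EDGE regression above (an approximation of SPARQ Z-Scores).
INTERCEPT = 6.777706
COEFS = {
    "weight": 0.02147,           # pounds
    "forty": -1.566179,          # seconds
    "ten": -2.35099,             # seconds, 10 yard split
    "three_cone": -0.771297,     # seconds
    "short_shuttle": -1.385896,  # seconds
    "bench": 0.017426,           # reps (assumed unit)
    "broad": 0.056072,           # inches
    "vertical": 0.105705,        # inches (assumed unit)
}


def edge_z_score(measurements):
    """Estimated EDGE Z-Score from a dict of combine measurements."""
    return INTERCEPT + sum(COEFS[name] * measurements[name] for name in COEFS)


# Hypothetical prospect, for illustration only.
prospect = {
    "weight": 260, "forty": 4.70, "ten": 1.62, "three_cone": 7.10,
    "short_shuttle": 4.35, "bench": 24, "broad": 120, "vertical": 34,
}
print(round(edge_z_score(prospect), 2))
```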

SPARQ is by no means the be-all and end-all of athletic measurement, but Zach has shown it’s strongly correlated with performance, and there is certainly value in being able to view a player’s combine performance as a single number as a starting point for integrating that information into an evaluation.

Modeling The Process of an NFL Play as a Sequential Game to determine the meaning of Offensive and Defensive Formational and Personnel Splits

By Max Mulitz

 

Here’s what a sequential game is. Below I took a crack at modeling the general process of a play as a sequential game.

PlayProcess1.png

Basically, nature determines the attributes of the situation such as the down & distance, score differential, time remaining, and whatever other factors could be relevant to a team’s strategy on a given play. Obviously in reality these factors are determined by previous events in the game, but from the perspective of selecting a given play they can be taken as a given.

Then in reality the order of the decisions made differs by team. Some offenses will choose their personnel, formation, and play prior to lining up and will not deviate from that regardless of what the defense does. Other offenses will select personnel, then formation, then choose between a menu of run and pass plays depending on defensive alignment. Still other offenses will select personnel and formation, then use motion to change the formation and snap the ball quickly to prevent the defense from adjusting their coverage to the new formation. This tactic is outlined below.

Play1.png

Similarly, a defense may attempt to disguise its front/coverage until right before the snap, so really after personnel is chosen both teams have options to manipulate the information the opponent observes.

On the other hand, the only option an offense has to manipulate the personnel of the defense is to hurry up following an in bounds play to prevent the opponent from substituting. Nevertheless, it is evident that the process of the offense and defense selecting personnel is usually a sequential game while formations and player location at the snap is more fluid. For this reason, analysis of Offensive Personnel vs. Defensive Personnel performance is easier to apply directly than formational/pre-snap alignment analysis.

Anyway, looking at personnel usage has become all the rage in Basketball. Thinking about sequential order of decision making on a play can help inform what situational splits are the most informative when looking at tendencies and making decisions.

The Win Probability Value of a Point

By Max Mulitz

 

 

This article about hockey analytics, which explains that 94% of team wins can be explained by goal differential, got me thinking about the value of a point in football. I grabbed the team point differential data from SportingCharts for each NFL team from 2011-2015 and ran a simple linear regression to predict wins.

The model had an adjusted R-Squared of .87. That means 87% of the variation in NFL teams’ wins in a given season can be explained by their point differential. The remaining 13% can be explained by luck relating to when a team scores its points as well as certain game management strategies that may distort point differentials (playing prevent defense up 3 scores in the 4th Quarter may increase your opponent’s chance of scoring while decreasing their chance to win the game if they take a long time to score, as one simple example). Notice this isn’t the same as saying 87% of the game is skill. It is certainly the case that elements of luck (having a high fumble recovery rate, for example) can influence a team’s point differential.

The equation for the model was as follows:

Wins = 8.0000 + 0.0278988 * Point differential

It makes sense that our intercept is 8, since a team with 0 point differential would expect to win half their games. What’s more interesting is we now have a strong estimate for the marginal value of a point in the NFL. For each additional point a team scores (or prevents) in a season, they can expect to win .028 extra games. Using this estimate to value kickers, we see our hypothetical uber kicker (worth 9.6 points a season) is worth .27 wins a season, almost equal to our earlier estimate of .3, and a top 10 kicker is worth .13 wins a season, similar to our earlier estimate of .1.

Football Outsiders’ FAQ actually shows us that point differential predicts next season’s wins better than the previous year’s wins do, which is probably strong evidence that point differential is a purer measure of team quality relative to luck than the previous year’s wins are.
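The fitted line is easy to use directly. Below is a minimal sketch with the intercept and slope from the model above. The function names are mine and are only for illustration.

```python
INTERCEPT = 8.0
WINS_PER_POINT = 0.0278988


def expected_wins(point_differential):
    """Expected wins in a 16 game season from season point differential."""
    return INTERCEPT + WINS_PER_POINT * point_differential


def wins_added(points_added):
    """Wins value of points added (or prevented) over a season."""
    return WINS_PER_POINT * points_added


print(round(expected_wins(100), 1))  # a +100 point differential team
print(round(wins_added(9.6), 2))     # the hypothetical uber kicker
```

The second function is the conversion that matters for player valuation, since any points-based estimate of a player’s contribution can be multiplied through to get wins.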

Anyway, one thing that the hockey, basketball and baseball analytics communities have figured out that football doesn’t seem to have yet is that we ultimately want to be optimizing for wins, not points added. Some Win Probability Added models exist in football but those are based on fitting a model to specific game situations and then measuring the changes in Win Probability over time.

Because Point Differential is an extremely strong linear predictor of team wins, Expected Points Added can easily be converted to a Wins Added value, which is useful since our goal in measuring player quality is ultimately to optimize for number of wins.

 

 

Shotgun Vs. Under Center Run/Pass Balance on 3rd/4th Down & One

By Max Mulitz

 

In this great post by Michael Lopez, we learn that NFL offenses are less effective on 3rd/4th & 1 from Shotgun than from Under Center. Below I did the same study Lopez did, except I broke the results out by runs and passes and I only used ArmChairAnalysis data from 2010-2015, as I feel the game has changed since 2000, especially with regard to how teams employ the shotgun formation. Results below:

 

               Plays   Rush Conversion Rate   Pass Conversion Rate   Pass %
Under Center    4318                    68%                    59%      18%
Gun             1430                    73%                    57%      56%

First, notice rushing is much more effective than passing on 3rd/4th & short both from under center and shotgun. In spite of this, teams rush 82% of the time when under center but only 44% of the time from the gun. Breaking these tendencies out by individual teams/offensive coordinators would be useful for a team from an individual game planning perspective, but that’s a project for another day.

We also notice that rushing is actually more effective from the gun than under center and passing is slightly more effective from under center, a counterintuitive result. The advantage of rushing from Shotgun vs. rushing from Under Center was statistically significant at P=.05, while the advantage for passing under center vs. in the gun was not. The next step for this analysis would probably be to look at how Play Action affects pass success rates both from the Gun and Under Center on 3rd/4th & Short.
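For anyone who wants to sanity check the significance claims, here is a rough sketch using a pooled two-proportion z-test. This is an assumption on my part about a reasonable test, not necessarily the exact one used for the numbers above, and the play counts are rebuilt from the rounded percentages in the table, so the p-values are approximate.

```python
import math


def two_proportion_p_value(p1, n1, p2, n2):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_err
    return math.erfc(abs(z) / math.sqrt(2))


# Play counts approximated from the rounded Pass % column in the table.
uc_plays, gun_plays = 4318, 1430
uc_rushes = round(uc_plays * 0.82)
gun_rushes = round(gun_plays * 0.44)
uc_passes = uc_plays - uc_rushes
gun_passes = gun_plays - gun_rushes

rush_p = two_proportion_p_value(0.73, gun_rushes, 0.68, uc_rushes)
pass_p = two_proportion_p_value(0.59, uc_passes, 0.57, gun_passes)
print(round(rush_p, 3), round(pass_p, 3))
```

With these approximate counts, the rushing gap clears the .05 threshold and the passing gap does not, which matches the result described above.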

One concern might be whether teams can sustain success running from the shotgun on 3rd & short or whether the success comes from the fact that teams rarely run. There were 9 teams that ran 50%+ of the time when in Shotgun. On average, these teams converted 76% of the time when running from the gun, slightly above the sample average. The leaders in gun % were Chip Kelly and Andy Reid’s Philadelphia Eagles, who were in gun 48% of the time on 3rd or 4th & short, ran on 78% of those plays and converted on an above average 77% of their plays when rushing. In total, 19 of 32 teams over the 5 year period were more successful running from the Shotgun than running from under center.

It seems that we have an example of Simpson’s Paradox. Being in the Shotgun doesn’t seem to inherently lower a team’s chance of converting on 3rd down and may even slightly help it when rushing. However, because teams often line up in shotgun with the intention of passing on 3rd and 1 (a strategy with a lower payoff than rushing), Shotgun success rates fall behind Under Center rates.
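The effect of the run/pass mix is easy to see by recombining the rates from the table. This is a sketch using the rounded table values, so the overall rates are approximate, and the function name is made up for illustration.

```python
def overall_conversion_rate(pass_share, rush_rate, pass_rate):
    """Blend rush and pass conversion rates by how often each is called."""
    return (1 - pass_share) * rush_rate + pass_share * pass_rate


under_center = overall_conversion_rate(0.18, 0.68, 0.59)
shotgun = overall_conversion_rate(0.56, 0.73, 0.57)

# Counterfactual: shotgun conversion rates with the under center run/pass mix.
shotgun_with_uc_mix = overall_conversion_rate(0.18, 0.73, 0.57)

print(round(under_center, 3), round(shotgun, 3), round(shotgun_with_uc_mix, 3))
```

Shotgun trails overall only because of how often teams pass from it. Give shotgun the under center play mix and it comes out ahead.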