For my next Sunday article (once again late), I thought I’d go over a statistic for rating players that I find frustrating, mostly because its authors arrogantly discuss it as if it were bulletproof.

In the advanced statistical realm of the NBA there are two main flavors of analysis: a microscopic view that uses every available box score number to estimate the value of each player, and a macroscopic one that uses the result (i.e. the score of the game) to back-calculate the value of each player or lineup. The mainstream media incorrectly assumes “statheads” care only about what’s in the box score and thus miss everything that doesn’t show up as a stat: setting a pick, fouling at just the right time to stop a layup, or most of what happens at the defensive end, like being able to guard Dwight Howard one-on-one and deny him deep post position. This is completely off base, although some people, like David Berri of Wins Produced fame, are guilty of this offense.

Macroscopic models, mainly adjusted +/- types that use the score of the game and the combinations of players on the floor to assign each player an effect in points, unfortunately have limited precision. Even with an 82-game schedule and 48 minutes a game, there isn’t enough information to shrink the margin of error below roughly five points. For reference, those five points are per typical game, and a gap that size would separate the previous champions, the Dallas Mavericks, from teams that couldn’t even make the playoffs. Microscopic models, however, have no problem with precision and can tell the reader with certainty what the result is virtually every time. They are obviously shortsighted, though, since they read only the box score; their knowledge of defense is limited to steals, blocks, fouls, and rebounds. What they lack is accuracy. The macroscopic model can show you the entire galaxy, but blurred and hazy; the microscopic one focuses on a few spiral arms of the galaxy with great detail and clarity. You can’t assume you know exactly what the rest of the galaxy looks like from only a small section.
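For the curious, the regularized adjusted +/- idea can be sketched as a possession-weighted ridge regression: each stint on the floor is one row, with +1 for every home player, -1 for every away player, and the home team’s margin per 100 possessions as the target. This is a minimal sketch under my own assumptions (no home-court intercept, a hand-picked penalty `lam`, and hypothetical names like `ridge_apm` and `Stint`); it is not the exact procedure behind the stats-for-the-nba.com numbers.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical stint record: (home players, away players,
# home margin per 100 possessions, possessions in the stint)
Stint = Tuple[Sequence[str], Sequence[str], float, float]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ridge_apm(stints: List[Stint], lam: float) -> Dict[str, float]:
    """Possession-weighted ridge regression of stint margin on player indicators."""
    players = sorted({p for home, away, _, _ in stints for p in [*home, *away]})
    idx = {p: i for i, p in enumerate(players)}
    k = len(players)
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for home, away, margin, poss in stints:
        # +1 for each home player on the floor, -1 for each away player.
        row = {idx[p]: 1.0 for p in home}
        row.update({idx[p]: -1.0 for p in away})
        for i, xi in row.items():
            xty[i] += poss * xi * margin
            for j, xj in row.items():
                xtx[i][j] += poss * xi * xj
    for i in range(k):
        xtx[i][i] += lam  # ridge penalty: shrinks every rating toward zero
    ratings = solve(xtx, xty)
    return {p: ratings[idx[p]] for p in players}
```

Without the penalty this design is always singular: adding the same constant to every player’s rating leaves every predicted margin unchanged, because each row has as many +1s as -1s. The ridge term is what makes the ratings solvable, at the cost of shrinking them toward zero, and that shrinkage is part of why the results stay blurry.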

Wins Produced, an invention of economics professor David Berri, is the microscopic kind that focuses on only a few points in space: it uses traditional box score stats to estimate how many wins each player contributes to his team. Unfortunately, the stat’s proponents are so arrogant in discussing and validating their findings that they drag down the entire advanced basketball statistics community with them. Others, like John Hollinger of PER fame, will discuss the limitations of their models and admit their flaws. Hollinger doesn’t rate players solely by his stat, and when it comes to defense he ignores it almost entirely. Wins Produced, by contrast, is discussed as if it were the only statistic a person needs to evaluate the league, apart from something like age.

Despite the confidence of its authors, Wins Produced isn’t a super-advanced metric; it’s really quite simple. The basis of the entire model is the possession. If offensive and defensive efficiency relate directly to wins, a reasonable assumption and one that has been tested statistically, then one should be able to find a proxy for wins from a model that calculates those efficiencies. Rebounds add possessions and thus add wins, while turnovers are lost possessions and thus cost wins. This is entirely fine at the team level, but there are huge assumptions glossed over when pinning those possessions on individual players.
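The team-level chain of reasoning (possessions, then efficiency, then wins) fits in a few lines. The possession estimate FGA + 0.44 × FTA - ORB + TOV is a commonly used approximation, not necessarily the exact one inside Wins Produced, and `wins_per_point` is a hypothetical slope you would have to fit from team-season data; none of these names come from Berri’s work.

```python
from dataclasses import dataclass


@dataclass
class TeamTotals:
    points: float
    fga: float  # field-goal attempts
    fta: float  # free-throw attempts
    orb: float  # offensive rebounds
    tov: float  # turnovers


def possessions(t: TeamTotals, ft_factor: float = 0.44) -> float:
    # An offensive rebound extends a possession instead of starting a new one,
    # so it is subtracted; a turnover ends a possession, so it is added.
    return t.fga + ft_factor * t.fta - t.orb + t.tov


def efficiency(t: TeamTotals) -> float:
    """Points per 100 possessions."""
    return 100.0 * t.points / possessions(t)


def estimated_wins(offense: TeamTotals, defense: TeamTotals,
                   wins_per_point: float, games: int = 82) -> float:
    # Hypothetical linear proxy: a .500 team plus a slope times the
    # per-100-possession efficiency margin.
    margin = efficiency(offense) - efficiency(defense)
    return games / 2 + wins_per_point * margin
```

Holding points fixed, extra offensive rebounds raise efficiency and extra turnovers lower it, which is exactly how rebounds "add" wins and turnovers "cost" them in this framework. Every input here is a team total; nothing yet says which player earned it.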

Although most stats are tied directly to wins through possessions, some, like blocks and assists, had their values estimated through analysis. However, the same average value is applied to every player. This means a block is worth the same for someone like Dwight Howard, who swats the shot out of bounds and lets the other team keep possession, as for Tim Duncan, who tries to tip the ball to a teammate to gain one. Assists have the same problem: an Andre Miller assist is typically more valuable than a LeBron James one, because Miller so often feeds a teammate for a dunk while James more often gets his assists on long jumpers. Nonetheless, there are even bigger problems with the statistics tied directly to a possession.
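A toy calculation shows what gets lost when one league-average value is applied to every block. The retention rates and block counts below are made up for illustration (they are not Howard’s or Duncan’s real numbers), and valuing a block by how often the defense recovers the ball is my own simplifying assumption.

```python
from typing import Dict


def player_block_values(retention: Dict[str, float], ppp: float) -> Dict[str, float]:
    # Assumed value of one block: the share of a player's blocks the defense
    # recovers, times the opponent's average points per possession (ppp).
    return {p: r * ppp for p, r in retention.items()}


def league_average_block_value(retention: Dict[str, float],
                               blocks: Dict[str, int], ppp: float) -> float:
    # Block-weighted average retention rate, then applied to everyone alike.
    total = sum(blocks.values())
    avg_rate = sum(retention[p] * blocks[p] for p in blocks) / total
    return avg_rate * ppp


retention = {"swatter": 0.30, "tipper": 0.70}  # hypothetical
blocks = {"swatter": 200, "tipper": 150}       # hypothetical
per_player = player_block_values(retention, ppp=1.0)
average = league_average_block_value(retention, blocks, ppp=1.0)
```

With these made-up numbers the averaged value overrates every block by the swatter and underrates every block by the tipper, even though both players record exactly the same stat.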

Take, for example, the rebound. Think of the retired Bruce Bowen, a respected defender, guarding an offensively potent wing during a possession. If he does everything correctly (denying good position after running up the court, moving his feet to prevent a drive to the lane, staying grounded on shot fakes, forcing a tough long jumper), he still receives no credit under Wins Produced; instead, someone like Rasho Nesterovic grabs the rebound and is rewarded with an uptick in his statistics. There is obviously something very wrong with that model.

For defense, there is another factor in Wins Produced that I have not yet discussed: the team defense adjustment. To tackle the difficult task of assessing defense beyond the surface level, they simply add an almost negligible number based solely on the player’s team; it’s essentially a team defensive efficiency statistic. That’s it. Carlos Boozer got the same benefit as Luol Deng, and Jason Richardson the same as Dwight Howard. There are also positional adjustments to mask the poor numbers of guards relative to centers and power forwards, but that’s not egregious compared to the next topic.
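Here is a minimal sketch of a team-based defensive adjustment, assuming it is spread across players in proportion to minutes played. The function names and numbers are hypothetical, and Berri’s exact formula may differ in its details; the point is structural, since the per-minute credit comes out identical for everyone on the roster.

```python
from typing import Dict


def team_defense_adjustment(team_adjustment: float,
                            minutes: Dict[str, float]) -> Dict[str, float]:
    # Split one team-level number by each player's share of minutes played.
    total = sum(minutes.values())
    return {p: team_adjustment * m / total for p, m in minutes.items()}


def per_48(adjustment: Dict[str, float], minutes: Dict[str, float]) -> Dict[str, float]:
    # Convert each player's share into a per-48-minute rate.
    return {p: 48.0 * adjustment[p] / minutes[p] for p in adjustment}


minutes = {"player_a": 2500.0, "player_b": 1800.0, "player_c": 900.0}  # hypothetical
adj = team_defense_adjustment(1.5, minutes)
rates = per_48(adj, minutes)
```

Whatever each defender actually does, the per-48 rate depends only on the team, which is why a poor individual defender and an elite one on the same roster receive the same credit.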

To validate their model, the authors behind the formula regularly present a table comparing each team’s Wins Produced with its actual record for the season. The differences are typically one to four wins. However, using those team results to validate the model is very problematic. Given how Wins Produced was built, they are essentially summing individual offensive and defensive efficiency statistics and comparing the total to team wins, which are highly correlated with efficiency; but statistics like rebounds are tied directly to the team, not the player. This does not mean they found the magical formula to explain player value; all they showed is that wins are explained by efficiency, which they then divided among the players based on simple box score statistics. To put it in simple terms, they defined a word using the same word in a slightly different form, like defining “assuming” as “making assumptions.” That is not a useful definition.
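The circularity is easy to demonstrate: if a team-level total is split among players by any weights at all, the player shares add back up to the team total, so a team-level table will "validate" even a random allocation. A minimal sketch, with hypothetical team values and randomly drawn weights standing in for the box score formula:

```python
import random
from typing import Dict, List


def allocate(team_value: float, weights: Dict[str, float]) -> Dict[str, float]:
    # Divide a team total among players in proportion to arbitrary weights.
    total = sum(weights.values())
    return {p: team_value * w / total for p, w in weights.items()}


def worst_team_error(team_values: List[float], seed: int = 0) -> float:
    # Allocate each team's value with random weights over a 13-man roster,
    # re-sum the player shares, and report the largest gap from the team value.
    rng = random.Random(seed)
    worst = 0.0
    for value in team_values:
        weights = {f"p{i}": rng.random() + 0.01 for i in range(13)}
        shares = allocate(value, weights)
        worst = max(worst, abs(sum(shares.values()) - value))
    return worst
```

The error is zero up to floating-point rounding no matter how the weights are chosen, so a close match at the team level says nothing about whether the split among players is right.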

Another large assumption David Berri and company make in defending Wins Produced is the argument that because performance measured in box score stats (or Wins Produced) doesn’t change much over time, and teammates have little effect on one another’s stats, box score stats comprehensively reflect a player’s worth. The logic behind that argument is deeply flawed. They essentially sidestepped the question and instead discussed how player stats don’t change much over time. The consistency of box score statistics from one season to the next has nothing to do with how much those statistics capture. If players are consistent in the NBA, then it’s reasonable to assume their box score stats won’t change much, but it’s also reasonable to assume the things not found in the box score won’t change much either. I don’t see how that argument addresses the limitations of the box score, yet it’s a real argument they make on their website.
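A small simulation shows why consistency and comprehensiveness are separate questions. Under purely hypothetical assumptions (each player has a box-score skill plus a "hidden" skill the box score never sees, both constant across seasons, with a little season-to-season noise in the box score), the box score number is highly consistent from year to year yet only moderately related to total value. The spreads chosen below are arbitrary.

```python
import random
from typing import List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_players: int = 2000, seed: int = 1) -> Tuple[float, float]:
    """Return (season-to-season consistency, correlation with total value)."""
    rng = random.Random(seed)
    box_skill = [rng.gauss(0, 1) for _ in range(n_players)]
    # Hypothetical skill the box score never records; constant across seasons.
    hidden_skill = [rng.gauss(0, 2) for _ in range(n_players)]
    total_value = [b + h for b, h in zip(box_skill, hidden_skill)]
    # Box score measure in two seasons = box skill plus small season noise.
    season1 = [b + rng.gauss(0, 0.3) for b in box_skill]
    season2 = [b + rng.gauss(0, 0.3) for b in box_skill]
    return pearson(season1, season2), pearson(season1, total_value)


consistency, validity = simulate()
```

With these spreads the theoretical values are about 0.92 for consistency and about 0.43 for validity, and the sampled numbers should land close to them: a stat can be extremely stable and still miss most of what matters.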

To illustrate the limitations of Wins Produced, I created a chart below plotting players’ Wins Produced against a regularized adjusted +/- score from stats-for-the-nba.com for the 2010-2011 season. The red line is the linear regression (least-squares) line between the two stats. Wins Produced explains only 19% of the variation in adjusted +/-, which means the two largely disagree on the value of players. Part of this could be explained by the inherent limitations of +/-, which can’t give precise results, but I also think it shows they disagree fundamentally, to some significant degree, on most players in the league. If Wins Produced is the truly faultless method the authors pretend it is in their articles (never mentioning that *gasp* the formula could have flaws), then it should correlate more highly with something as comprehensive as +/-.
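For a one-variable linear regression like the one in the chart, R² is simply the square of the correlation coefficient, so the 19% figure translates into a correlation of roughly 0.44 (assuming the line slopes upward):

```latex
R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}} = r^2
\quad\Longrightarrow\quad
r = \sqrt{0.19} \approx 0.44,
\qquad 1 - R^2 = 0.81
```

In other words, about 81% of the variation in adjusted +/- is left unexplained by Wins Produced.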

I also noted some interesting outliers. Dirk Nowitzki, strangely enough, was the greatest outlier based on standardized residuals. I suspect Wins Produced didn’t like his low rebounding and shot-blocking numbers, but Dallas didn’t seem to mind when they won the championship. Nick Collison is an intriguing outlier: either he’s an extremely underrated player on par with superstars like Chris Paul or Dwyane Wade, or +/- has a problem determining his correct value. Above the red regression line you mainly have players known for fundamental defense, like Jason Collins and Andrew Bogut, or offensive synergy, like Steve Nash and Manu Ginobili; below it are players known for putting up stats while forgoing man-to-man defense or other aspects of the game not tabulated in a simple stat, like Kevin Love and Kevin Martin.
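For anyone who wants to reproduce this kind of outlier ranking, here is a sketch of internally studentized residuals for a simple linear fit: each residual is divided by its own estimated standard error, which accounts for leverage. I’m assuming that is what "standardized residuals" means here, since some software uses slightly different definitions, and the data in the usage lines is made up, not the actual 2010-2011 numbers.

```python
from typing import List


def standardized_residuals(x: List[float], y: List[float]) -> List[float]:
    """Internally studentized residuals of the least-squares line y ~ a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [b - (intercept + slope * a) for a, b in zip(x, y)]
    # Residual standard error with n - 2 degrees of freedom (two fitted parameters).
    s = (sum(e * e for e in resid) / (n - 2)) ** 0.5
    # Leverage of each point in a simple regression.
    lev = [1.0 / n + (a - mx) ** 2 / sxx for a in x]
    return [e / (s * (1.0 - h) ** 0.5) for e, h in zip(resid, lev)]


# Made-up data: the point at index 3 sits far above an otherwise straight line.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [1.0, 2.0, 3.0, 10.0, 5.0, 6.0, 7.0]
z = standardized_residuals(x, y)
biggest = max(range(len(z)), key=lambda i: abs(z[i]))  # index 3
```

Dividing by each point’s own standard error matters because a point far from the mean of x pulls the line toward itself and would otherwise look less unusual than it really is.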

The main issue with Wins Produced is how the authors and supporters discuss the model as if it’s flawless. It’s an interesting method that can compete with Win Shares or PER, but claiming it explains wins with 95% accuracy is ridiculous. All they’re doing is dressing up a team’s efficiency numbers, allotting each piece to a player, and then summing those pieces to show how closely they correlate with wins. Well, of course they do; everyone knows a team’s points scored and allowed per possession can explain wins. If you have the audacity to ask them how defense can be explained by steals, blocks and rebounds, you’ll get a response about how your little brain can’t comprehend a counterintuitive result. My advice is to view Wins Produced as an interesting summation of box score stats, not something that should sway your complete view of a player. Using Wins Produced as your only method of evaluating a player is like zooming in on one feature of an animal. Maybe that one part can lead you to a conclusion about the rest of the animal, but you could just as easily become the blind man holding the elephant’s trunk who believes it’s a snake instead of something much more powerful.

Thank you for posting this! They are SO ARROGANT! When I try to explain that offenses need, at different times, 0, 1 or 2 people near the basket, and that you therefore cannot lump any five high-Wins Produced guys together and get a ball team, I get the "you don't understand the economics" bit from them. Some years back Orlando defeated Cleveland through superior spacing despite having weaker WP players and stats. When I pointed this out, they were all just like, "Oh no, it's randomness" ... What a chickensh*t cop-out! Can you imagine any player or coach saying we lost the series due to ... randomness? What they don't understand about Dirk is that he's the best stretch 4 of all time [and that is valuable!]. Funny how coaches are obsessed with spacing on offense and contested shots and ball pressure on defense when those things at best only go 5% of the way toward winning [in Bizarroland!].

What if the microscopic description got better? What if, as you say, those different types of blocks were accounted for? The model would get better, wouldn't it? I understand some of these things can't reasonably be tracked live, but with video technology, surely someone can make a box score on steroids with all the important distinctions between the different types of blocks, steals, and points, possibly even including spatial references for where the action happened.

I understand there is a limit to the amount of useful data and its contribution to a model, but can't the box score be improved, so that models based on it would be a bit more useful, to be used along with +/-?

One of the problems with adding variables to improve the model is that each new variable brings diminishing returns. Let's say you add deflections and charges drawn, distinguish between types of blocks (whether or not you gain possession afterwards), and track something like a successful pick-and-roll "stop," among others. The results would probably improve, but not by as much as the earlier variables did. You reach a point where the complexity of what you're tracking isn't worth the trouble.
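One concrete reason for the diminishing returns: a new stat that overlaps with stats already in the model adds much less than it would on its own. For two predictors the combined R² has a closed form in terms of the pairwise correlations; the correlations below are hypothetical, chosen only to show the effect (think of a deflections stat that is strongly correlated with steals).

```python
def two_predictor_r2(r_y1: float, r_y2: float, r_12: float) -> float:
    """R^2 of y regressed on two predictors, from pairwise correlations.

    r_y1, r_y2: correlation of each predictor with y; r_12: between predictors.
    """
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


def gain_from_new_stat(r_y1: float, r_y2: float, r_12: float) -> float:
    # Increase in R^2 from adding predictor 2 to a model that already has predictor 1.
    return two_predictor_r2(r_y1, r_y2, r_12) - r_y1 ** 2


# Hypothetical: existing stat correlates 0.5 with the outcome, new stat 0.4.
independent = gain_from_new_stat(0.5, 0.4, 0.0)  # 0.16
overlapping = gain_from_new_stat(0.5, 0.4, 0.7)  # about 0.005
```

The same new stat adds about 0.16 of R² when it is unrelated to what's already tracked, but only about 0.005 when it largely duplicates an existing stat; that is the diminishing return in miniature.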

But maybe there is something fundamentally wrong with how box score stats are kept. Maybe the new model would be much improved. Not all shots are equal. It could be that a model like PER or Wins Produced would be a whole lot better if the box score stats were changed. But think back to blocked shots. Not only does it matter whether you gain possession afterwards, but also where the block happens, whom you're blocking and what the shot clock reads, not to mention the intimidation you bring to the paint.

I think you could make an interesting model with different box score stats, but what +/- is trying to track is ultimately what you're looking for. It's about the result on the scoreboard.

"Even with an 82 game schedule and 48 minutes a game, there is not enough information to diminish the margin of error from roughly five points. For reference, these five points are over the course of a typical game, and it would separate the previous champions the Dallas Mavericks from teams that couldn't even get into the playoffs. "

"If Wins Produced is a truly faultless method like the authors pretend in their articles (never mentioning that *gasp* the formula could have flaws), then it should more highly correlate to something as comprehensive as +/-."

Wait. What? No. If +/- is as unreliable and noisy as it obviously is, then it is in fact useless to use as a point of comparison. Yet, as I roam the internet looking for analyses of models, this is the point of comparison I keep finding.

I was really hoping for a valid critique, but I didn't find one. You just seem put off by the "arrogance" of the authors. Well ... so?

-Whether or not a model conforms to your expectations is also not a useful criticism. Whether or not it produces useful results is what matters. I would HOPE that a useful model would produce novel insight.

-Rather than observing that you don't think it is fair to "pin" inefficiencies on particular players, why not check your assumptions and test them? It seems logical to me that efficiency would be closely tied to player effectiveness. I may watch a game and come away believing that the 30% shooting or 6 TOs hide the fact that Melo "won" the game for us, but the scientist in me knows darn well that isn't a reliable method of evaluation. Box score stats may miss quite a bit. But does that mean they don't reflect true productivity, especially as sample sizes grow? It seems to me that the exceptions in question are likely smaller in scale than the acknowledged randomness of Adj+/-. The latter you quickly gloss over and return to as your baseline stat for comparison, yet the former you treat as reason to invalidate a model.

-What does "advanced" have to do with "simple" and what possible bearing does it have on effectiveness? Simplicity is to be striven for. Needless complexity tends to be showmanship or obfuscation.