Monday, March 31, 2014

The One World Fallacy

What happens once doesn't make the event a certainty.

If you track the perception of great players year by year, you sometimes see violent swings with specific players, and it's usually because of a title or an otherwise successful season. One recent example is Dirk Nowitzki, who was seen as a star but not a true legend even after his MVP. For instance, Bill Simmons ranked him 39th in the 2010 version of his pyramid. However, after his 2011 title, won without an obvious star teammate, he ascended rapidly on all-time lists. Simmons has implied he's now top 20 ever. And this was only after a Cinderella playoff run. The Mavericks' point differential was only 8th in the league before the playoffs, and Dirk wasn't at the top of MVP lists. What's more likely: that Dirk finally "learned" how to win and had a season that pushed him over the top of about 20 different players in NBA history, or that a title changed the perception of a player because people put too much weight on championships for individual players?

Hakeem: was it destiny or luck?

After toiling for years with an organization that only provided mediocre talent, nearly leaving at one point, Olajuwon broke through in 1994 with impressive performances against his big man peers. The measure of his dominance, in fact, was the damage he caused to the legacy of the superstar centers he defended: David Robinson and Patrick Ewing. The respect paid to Hakeem, beyond his dazzling blocks and steals totals and his post game, stems from those events. However, there's also a growing legend attached to his title run. The run probably looks more impressive years later, given that he lacked all-star teammates, but some environmental factors are ignored and, of course, luck is not considered.

Let's tackle the environmental factors first. First of all, when people discuss recent championship teams, they like to say the title doesn't count because, for example, a star like Rose got hurt. But stars get hurt every season -- Shaq's 2000 title isn't taken away because the star of the 1999 championship team, Duncan, got hurt and missed the playoffs -- and when we look at the past, our memories are too fuzzy to recall every circumstance. The 1994 season has a pretty giant asterisk, if you're into such typographical glyphs: Michael Jordan, at 30 years old coming off three titles, overwhelming pick for greatest player ever, retired and the returning champions suddenly fell from contender status. Imagine LeBron retiring only to see Durant win a title, and how many people would, consequently, criticize Durant's title.

With a wide open field, Jordan's peers saw their odds increase. One of the strongest teams at the time was the Seattle SuperSonics. Led by Gary Payton, flanked by the ferocious Shawn Kemp, the underrated Detlef Schrempf, and other very good role players like Kendall Gill, Sam Perkins, and Nate McMillan, they won 63 games with the league's highest point differential, an offense that ranked second by basketball-reference's method, and a defense that was third. Their point differential of 9.1 was one of the greatest ever, slightly inflated by a weak schedule, but even after an adjustment they were still by far the "best" team that year. They split the season series with Houston, though they won every game next season and were the best threat to the Rockets. Gary Payton played well against them, and Olajuwon either would be drawn away from the basket to defend Sam Perkins at the three-point line or had to contend with Kemp's dunking and foul-drawing prowess. And the Sonics actually defeated them in 1993 and 1996 in the playoffs.

Yet they lost to the Denver Nuggets in the first round, leaving the path to the finals open for the second seed Rockets. In fact, they lost in the first round again next year, leaving the door open again. It's hard to say how they'd do against the Rockets in the playoffs since they lost to inferior teams. But losing a series doesn't necessarily mean you're worse than your credentials would indicate. An 8 seed facing a 1 seed doesn't have zero odds. Last season, Kevin Pelton estimated the Rockets had a 12.8% chance at beating the Thunder (this was before Westbrook's injury) and he gave the Lakers similar odds as a low seed against the Spurs. The Harden Rockets were a pretty strong 8 seed, and so were the Nuggets back in 1994. We can't say for certain how strong the Sonics were in the playoffs, by true talent, because the series was only played one time. Plus, in 1996 with largely the same crew they rode into the finals and gave the greatest team ever, the 72 win Bulls, a good fight before succumbing to the inevitable. A model's odds of an upset are partly a reflection of the uncertainty about the two teams' strengths, but obviously there's also variation in outcomes: the best team doesn't always win.
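To make the "an 8 seed doesn't have zero odds" point concrete, here is a minimal sketch of how a per-game win probability turns into a series win probability. It assumes every game is independent with the same win probability and ignores homecourt advantage, which real playoff models do not; the function name and the 30% per-game figure are illustrative assumptions, not numbers from Pelton's model.

```python
from math import comb


def series_win_prob(p_game: float, wins_needed: int) -> float:
    """Chance of reaching `wins_needed` wins before the opponent does.

    Assumes each game is an independent trial with win probability `p_game`
    (a simplification: no homecourt, no injuries, no matchup adjustments).
    """
    q = 1.0 - p_game
    # The series ends on a win; before that final win the underdog has
    # wins_needed - 1 wins and k losses in some order, for k = 0..wins_needed-1.
    return sum(
        comb(wins_needed - 1 + k, k) * p_game ** wins_needed * q ** k
        for k in range(wins_needed)
    )


# Hypothetical underdog that wins 30% of individual games.
best_of_five = series_win_prob(0.30, wins_needed=3)
best_of_seven = series_win_prob(0.30, wins_needed=4)
print(f"best of five: {best_of_five:.3f}, best of seven: {best_of_seven:.3f}")
```

Under these assumptions a team that wins only 30% of individual games still takes a best-of-five roughly one time in six, and the shorter the series, the better the underdog's chances.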

For some perspective on the odds a team has of winning a title, right now the Clippers, rated as the second best team, have 20.7% odds of winning according to the Hollinger playoff tool. Championship odds are never 70-30 for the two best teams; we should know from history the field is more wide open than that. So let's pretend the Rockets, a strong team in '94, had 18% odds. With the Sonics going down, their odds (let's say 30%, similar to San Antonio's right now) are distributed to everyone else, mostly to the western conference. We could reasonably say the Rockets now have a 27% chance at winning. So when they won the title, was it really Olajuwon's indomitable will? Then why didn't they win in 1993?
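The arithmetic behind that estimate can be sketched as follows. This is only an illustration using the made-up odds from the paragraph above (18% for the Rockets, 30% for the Sonics); the "rest of the field" entry and the function name are assumptions, and the split is purely proportional, with no extra weight given to western conference teams.

```python
def redistribute(odds: dict[str, float], eliminated: str) -> dict[str, float]:
    """Remove one team and rescale the remaining title odds so they sum to 1."""
    remaining = {team: p for team, p in odds.items() if team != eliminated}
    total = sum(remaining.values())
    return {team: p / total for team, p in remaining.items()}


# Hypothetical pre-playoff title odds from the text; everything else is lumped together.
title_odds = {"Rockets": 0.18, "Sonics": 0.30, "rest of the field": 0.52}

after_upset = redistribute(title_odds, eliminated="Sonics")
print(f"Rockets after the Sonics lose: {after_upset['Rockets']:.1%}")
```

A purely proportional split puts the Rockets at about 26%. The 27% figure above is slightly higher because a western conference team inherits more of Seattle's share than an eastern one does.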

In the finals the Rockets met the Knicks, and while it's infamous for Ewing's poor play, the Knicks still nearly won the title. New York actually outscored Houston by five points, and Houston was down 3 games to 2 at one point. Then Houston won by two points and six points in the final two games. By most measures the Knicks were the better regular season team, and they gave Houston a competitive fight. Olajuwon went off while the numbers of his role players waned, but this was partly by design as the Knicks had the personnel to play them straight up. The series is known for Ewing choking, but the rest of his team did not. However, if Ewing had not shot 36% from the field, the outcome would likely have differed given how close the games were. If Ewing played that series one hundred more times, like Groundhog Day with amnesia, would he really match his poor play every time? It's reasonable to say he would probably play poorly most of the time, but not that horrifically. He had gotten the best of Olajuwon in a game just a season before.

Houston beat the Knicks that season, who were very good, but so were the Rockets. And it wasn't all Hakeem. People like to mention that the 1994 Rockets were the only championship team with just one player with a PER above 17 besides the 1978 Sonics, a champion who only won 47 regular season games, but Olajuwon's was "only" 25.3 (only in the context of superstars on title teams) yet the Rockets won 58 games. Since PER is a summation of box score stats that approximate wins, what does that tell you about his supporting cast? He had no obvious secondary star, but it was a deep team. Otis Thorpe, their power forward, is underrated by PER, which as most die-hard basketball fans know is deeply flawed: Otis was a great defender, though it doesn't show up in individual box score stats, and he was very efficient that season. The fact that the Rockets were a much better team (by offensive/defensive efficiency and point differential) in 1994 than 1993, even though Olajuwon was markedly no different (he was arguably better in '93, at least according to stats like PER, for people who want to rate his supporting cast like that), should tell one that, yes, his supporting cast was better and his teammates, despite not going to the all-star game, were still providing a huge lift. If Hakeem was doing it all by himself in 1994, then logically he could have done it all himself the season before.

What changed was, quite simply, luck and circumstance. Jordan was gone. The best team in his wake, the Sonics, were bounced early. And the Rockets survived two seven-game series. A single shot from a Houston role player could have ended the Dream's legendary run. If a Sam Cassell three-pointer in game 6 in the finals is off by a couple inches, they lose. A single shot from a bench player could have derailed Hakeem's story. But what do we know of that shot?

The stochastic NBA and the consequences for winners and losers

Whenever I've queried people on the nature of jump shots or free throws, most people, and by a large margin, agree that there's randomness involved. In other words, shots aren't pre-ordained and otherwise good shooters can miss important shots, not from a lack of focus or will, but just due to the random nature of the game. Even if you're truly a clutch shooter, you do not hit 100% of your late-game shots; you have a chance, a probability of missing.

There are many discrete events in the NBA that depend on probabilities. Even a good rebounder can be thwarted by unlucky bounces. If Arron Afflalo sets up on the right baseline, you could run quickly to the left side opposite from Afflalo. That is the most likely spot to catch his potential miss. But if the ball hits the inside of the rim and bounces back to the right baseline, then your chances of rebounding the ball have vanished. That is pure luck. You can do this with free throws, you can do this with passes, you can do this with defensive choices -- there are no fixed values on these events, and they are stochastic. An NBA game by itself isn't actually a coin toss; it's a series of hundreds, or thousands, of coin tosses summed together. With enough events through 48 minutes, or more, we have a fairly good guess for the result of the game, but since the constituents of the game are probabilistic that guess never becomes a certainty. We'd need an infinite number of minutes for the game's outcome to be, strictly speaking, deterministic. It's why an entire season is more informative than a single game.
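The "summed coin tosses" idea can be sketched with a toy simulation. It assumes each possession is an independent weighted coin flip and that the two teams get the same number of possessions; the scoring probabilities (52% versus 50%), the function name, and the possession counts are all made up for illustration, not estimates of any real team.

```python
import random


def win_rate(p_better: float, p_worse: float, possessions: int,
             trials: int = 1000, seed: int = 0) -> float:
    """Share of simulated games won outright by the better team.

    Each possession is an independent coin flip; ties count as non-wins.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        better = sum(rng.random() < p_better for _ in range(possessions))
        worse = sum(rng.random() < p_worse for _ in range(possessions))
        if better > worse:
            wins += 1
    return wins / trials


# Roughly one game's worth of possessions versus ten games' worth.
short_game = win_rate(0.52, 0.50, possessions=100)
long_game = win_rate(0.52, 0.50, possessions=1000)
print(f"100 possessions: {short_game:.2f}, 1000 possessions: {long_game:.2f}")
```

In this toy setup the better team wins more often as the number of possessions grows, but its win rate only approaches 100% and never reaches it at any finite length, which is the sense in which a season says more than a game.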

Of course, that's only the surface of why the NBA is stochastic. One of the most common assumptions among even the most statistically inclined fans and writers is that a player during any point of the season is, basically, operating at the same level of value. No one would argue players don't improve, or get worse, from season to season, but if we suppose that's true on a yearly basis it's reasonable to expect this happens in-season. Players can change, sometimes at imperceptible rates and sometimes at visible ones, game to game, and this adds to the variability and the uncertainty of the game. There are other issues too, like injuries, which can vastly change the playoff odds.

Philosophically, people are often opposed to a random universe. If a team wins, there should be meaning. NBA analysts are derided for taking the fun out of the game by devaluing this, by saying that a win today could just as easily be a loss on another day. But this doesn't change the reality of the league. Additionally, a system of ranking players so heavily based on luck and teammate performance is inconsistent because a player could do absolutely nothing differently but someone else on his team could miss a shot and his value sinks. What reliability does that method have?

In a world where Hakeem loses in the finals in 1994, or earlier in the playoffs, without a reduction in his level of play, he should not be seen as a worse player -- because he wouldn't be.

Cinderella's changing face


The legend of playoff Hakeem, however, may have more to do with the 1995 title because he had two championships in a row, but that's still only a set of two trials, and a few circumstances still apply. Jordan came back near the end of the season, but he and the team weren't at their best, and the Sonics were out of the picture too early again. Obviously, there's another factor: they traded for Drexler midseason. Funnily enough, the "star-less" Rockets were better the season before without Drexler. It's a testament to why teams should be judged on their collective talent, not just their headliners.

What's strange about Olajuwon's second championship is how poorly the team was playing in the regular season. Before Drexler, the team had an adjusted point differential of +3.1, which is good but not championship material. Their defense had slipped, and so had their offense. With the Drexler-Thorpe trade, their defensive rating worsened by nearly 6 points per 100 possessions, and their offense improved by only 3.6 points while their pace exploded from a snail's pace of what would have been the league's slowest to one of the fastest in the league. However, the schedule was more difficult, and adjusting for that during the regular season their adjusted point differential was almost exactly the same.

Perhaps the team just needed time to gel and develop cohesion on offense and defense. It's difficult to say with certainty. But their adjusted point differential in the playoffs (adjusted for strength of schedule and homecourt advantage) was a scintillating +7.5. Maybe it was a season-long funk, maybe they were coasting until the playoffs -- whatever it was, they were much better, especially on offense. Of course, they still needed luck: they needed the full five games to beat the Jazz, winning the last game by only four points, and the full seven games to beat the Suns, where they escaped with a one-point lead in the seventh game.

The '95 Rockets faced four strong title-worthy teams on their path to a second title, which almost never happens. This is where Olajuwon is given extra credit, the reasoning being that you can win a title with a weak team as long as you have him, but there's no sufficient proof. The team in the regular season is not necessarily the same one you see in the playoffs, and the same core, with all-star Drexler swapped in for the underrated Thorpe, was a great team in the regular season the year before. If this team was truly magic, then what happened next season? They were swept by the Sonics, who were the opposite of the Rockets in the playoff overachieving department the previous two seasons. The NBA is a chaotic, swirling sea, and we try to map the depths by the few rocky outcrops that happen to rise above into our view.

We love underdogs for whatever psychological reasons -- perhaps it's cultural, perhaps it's part of the tradition of the US. We'll give more credit for winning when no one expects it, and we give most of that credit to the star power. Basketball's a team game, except when it comes to titles. Yet we'll forgive a team for underachieving at other times, and we'll ignore why a team isn't playing better with its star. The playoffs are short and ultimately unpredictable. A team that looks unbeatable under duress one season will lose the next one. Cinderella one year is a spoiled princess and a hag the next.

One world thinking

Olajuwon is a great player. I am not denying that. I would not doubt that he plays better in the playoffs, adjusting for the strengths of the teams he faces, because his PER actually increases from the regular season to the corresponding playoff season, which is rare for anyone. But if there's a world where the Rockets lose in game sevens in '94 and '95, through no fault of his own, some of the very same people who sing his praises would be criticizing his lack of title hardware.

Dirk Nowitzki lived through this and almost joined the club of Barkley and Karl Malone. His fans would point out his great playoff moments, his stats against great defenses, but the first round defeat and the loss in the finals soured his legacy. But every player with a long career has sets of both types of moments.

And one missed shot, one bad pass, one errant bounce can forever alter the perception of these great players. I don't want an event Dirk had no control over to determine his legacy. In another world, Dirk is vastly underrated. It shouldn't be that way.

Wednesday, March 26, 2014

2000 RAPM: non-prior and prior informed

Background for what RAPM is: +/- was a revolution for the NBA because it allowed for a completely new method of evaluating players. You look at how a team scores and defends with you on the court and without you. When you set players as variables, you can use regression to calculate player impact. It's a full scope view of what matters in a game: outscoring your opponent. However, it's noisy for a number of reasons. One is that some player combinations are rare (this is known as collinearity.) Another is that the models don't deal well with low minute players, as they don't have enough of a sample for an accurate estimate and will often produce a ludicrous result just to "fit" the data better.

In simple terms, RAPM deals with this by introducing a heavy dose of regression to the mean. While traditional adjusted +/- creates a model by minimizing error (the difference between the actual points per possession scored/allowed and the expected), RAPM also minimizes the coefficients in the model using a lambda term. The coefficients are reduced toward the "prior," which can be set as zero or as a set of prior values (like the previous season's result.) Using a prior set of values, which is a Bayesian approach, greatly improves the results. Players with few possessions/minutes will have results close to their priors because their sample size isn't big enough to prove to the model they're more or less valuable.
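The pull toward the prior can be sketched in the simplest possible case: one player, ignoring teammates and opponents entirely. Real RAPM fits every player jointly with ridge regression, so this is only the one-variable version of the same penalty, and the function name, lambda value, and sample numbers below are illustrative assumptions rather than anything taken from the spreadsheets.

```python
def shrunk_rating(observed_margins: list[float], prior: float, lam: float) -> float:
    """One-player ridge estimate penalized toward `prior`.

    Minimizes sum((y - b)**2) + lam * (b - prior)**2 over b, which gives
    b = prior + n / (n + lam) * (mean(y) - prior).
    """
    n = len(observed_margins)
    if n == 0:
        return prior  # no data at all: the estimate is just the prior
    raw = sum(observed_margins) / n
    weight = n / (n + lam)  # share of the estimate that comes from the data
    return prior + weight * (raw - prior)


# Hypothetical samples: the same +10 raw margin seen in few vs. many stints.
low_minutes = shrunk_rating([10.0] * 10, prior=0.0, lam=90.0)
high_minutes = shrunk_rating([10.0] * 900, prior=0.0, lam=90.0)
rookie = shrunk_rating([10.0] * 10, prior=-2.0, lam=90.0)
print(low_minutes, high_minutes, rookie)
```

With the same raw margin, the low-minute player stays close to his prior while the high-minute player keeps most of what the data shows, and a larger lambda pulls everyone closer to the prior.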

As the league tried to recover from the lockout season, a new dynasty took form. Although it sounds odd now, the Shaq and Kobe duo were once disappointments, but with Phil Jackson on board as a coach and Shaq putting in more work than he ever had before, the Lakers had an all-time great season -- 67 wins, a +8.6 point differential, and a championship. It was the dawn of a new era, and while it wasn't technically a new millennium (that's 2001), a number of new stars were surfacing. Garnett was second in MVP voting and did everything for his Minnesota team. Iverson took a step forward scoring 28 a game. Vince Carter reinvigorated the all-star weekend with his epic dunk contest (oh and he played basketball too.) And other young players emerged -- Dirk went from an unknown German to an intriguingly good player, Ray Allen topped 20 points per game for the first time, Kobe did so as well (the first of 14 such seasons), and Elton Brand was rookie of the year.

How does NPI RAPM view the players? Well, just as a reminder, non prior informed RAPM often has wonky results because there's not enough data -- Rodney Rogers tops the list. He was a big forward who had a career year, shooting efficiently and spacing the floor for the 53 win Phoenix Suns, who toppled the Duncan-less Spurs in the playoffs. Simply put, when Rogers was on the floor the Suns outscored their opponents by 9.6 points per 100 possessions, but it dropped to virtually 0 without him. He also won sixth man of the year; the fact that Phoenix plays with the best with him on the court is impressive. And the adjusted +/- rating, RAPM, obviously agrees that the Suns were better with him. Shaq, however, is third, curiously behind Terry Porter (Spurs that season.) Stockton and Vince Carter continue to be plus/minus stars even without the bias of a prior, and Payton looks great again on offense. As for a historically underrated player, Bo Outlaw was fourth overall -- and he was fifth in 1997 (NPI).


The preferred form of RAPM, however, is in the spreadsheet below. The top twenty consists of stars and highly respected players with unique skillsets in Sabonis, Rasheed Wallace, Divac, Mutombo, Eddie Jones, and Robert Horry. But Shaq destroys everyone. Playing 40 minutes a game, he would, going purely by the numbers, take an average team to 60 wins.


RAPM and MVP voting agree on the first two names, and the rest of the top five in voting are rated well too. The divergence starts with big men who, mainly due to defense, are found to be more valuable from plus/minus, but there are only a couple guys in the top ten in MVP voting who aren't highly rated. Iverson came in 7th, winning the MVP next season, but he had one of the worst defensive plus/minus values. Webber also had fairly mediocre plus/minus stats, but was voted 9th. The all-NBA voting follows a similar path as 11 of the 15 players were in the top 25 in RAPM. The other four names are known for not being rated well in the stats community: the aforementioned Iverson and Webber, Stephon Marbury, and Kobe Bryant, who's actually a significant negative on defense. For the Rookie of the Year results, Elton Brand and Steve Francis shared the trophy, but according to this stat they were two of the worst rookies. Instead the guys who came in third and fourth in the voting, Odom and Andre Miller, should have won -- they were first and second, respectively, in RAPM, where Odom in particular was a very valuable player at +3.4. Rookies are rarely that valuable.

The best offensive player was Shaq -- not surprising because he averaged nearly 30 points a game on great efficiency with almost 4 assists. Karl Malone was second; apparently he slipped on defense but was still a scoring machine. Grant Hill, in his last great season, was third, as he was a point forward who scored 26 a game. One surprise is that Gary Payton's value appears to be more on offense because he's fourth here, matching previous results. Shooting legend Reggie Miller was fifth in offense in his last all-star season. Iverson, criticized for his shot selection, was actually eighth. Defense, of course, is dominated by big men -- Mutombo is first again, the giant Shawn Bradley second, and the underrated Bo Outlaw third (though he did pick up one vote for Defensive Player of the Year.) David Robinson was fourth, as RAPM finds him to be the basis for San Antonio's defense. Rasheed Wallace and Vlade Divac, not known for defense, followed closely. The Defensive Player of the Year was actually Mourning, who was tenth in defensive RAPM -- not a terrible choice according to plus/minus, but I suspect there was some voter fatigue with Mutombo. As a last note, this is believed to be Shaq's best season on defense, but he doesn't show up, and instead the highest rated Laker, and highest rated perimeter player in the league, was actually Derek Fisher.

RAPM, like any metric, isn't perfect, but it can perform as well as or better than the popular box score metrics PER and Win Shares. For example, while Kobe's defensive +/- doesn't necessarily mean he was a "bad" defender, you can treat it like robust evidence about what effect his defense had in 2000. But every single metric agrees that Shaquille O'Neal stormed the league and was by far the best player. Plus/minus stats are often used to identify undervalued players, and in this context it's more of a historical retrospective for unheralded guys like Divac, but sometimes it's just fun seeing what legends have done in the past. Shaq owned the NBA that year.

Click here for the link to the spreadsheet.

Monday, March 3, 2014

1999 RAPM: non-prior and prior informed

Background for what RAPM is: +/- was a revolution for the NBA because it allowed a completely new method of evaluating players. You look at how a team scores and defends with you on the court and without you. When you set players as variables, you can use regression to calculate player impact. It's a full scope view of what matters in a game: outscoring your opponent. However, it's noisy for a number of reasons. One is that some player combinations are rare (this is known as collinearity.) Another is that the models don't deal well with players with low minutes, as they don't have enough of a sample for an accurate estimate and will often produce a ludicrous result just to "fit" the data better.

In simple terms, RAPM deals with this by introducing a heavy dose of regression to the mean. While traditional adjusted +/- creates a model by minimizing error (the difference between the actual points per possession scored/allowed and the expected), RAPM also minimizes the coefficients in the model using a lambda term. The coefficients are reduced toward the "prior," which can be set as zero or as a set of prior values (like the previous season's result.) Players with few possessions/minutes will have results close to their priors because their sample size isn't big enough to prove to the model they're more or less valuable.

The NBA lockout that delayed the 1999 season put a serious dent in the league, and the retirement of Michael Jordan and the dismantling of the Bulls didn't help. It was a transition year, with the old guard dropping off with the exception of iron-man Karl Malone, while the next generation was still developing and coming into its own. With a shortened season opening in February, games jammed together with too many back-to-backs, a team known for professionalism, led by a second-year Duncan and a 33-year-old David Robinson, won the championship. This was Shaq pre-Phil Jackson but post-Jordan; it's a truly lost season.

Nevertheless, the games were played, and we can't ignore what happened. Surprisingly, by NPI RAPM, the best player in 1999 was ... David Robinson, who had a monster defensive impact. Just a season before Duncan and Robinson were neck-and-neck on a per possession basis, but the Admiral leaps ahead here. Though with a 24.9 PER and a league-leading 0.261 Win Shares per 48 minutes, and stats that did not drop in the postseason, perhaps it shouldn't be surprising. However, he only played 31.7 minutes in the regular season, lending credence to an argument for another candidate like Alonzo Mourning. Jaren Jackson is probably the oddest name near the top of the list. He was a Spurs teammate but only scored 6.4 points per game. And he's why we turn to prior-informed versions....


With two seasons as seed data behind it, RPI RAPM (an adjusted ridge-regression model given prior information) is a more powerful tool. After a dominating 1998, Shaq loses the top spot due to poorer defense. This was probably Alonzo's greatest season with award-caliber defense coupled with potent scoring (20 points in a slow-paced league and a 56 TS%.) He was second in MVP voting -- justified here. As for some surprising results, Blaylock continues to be a plus/minus star, and Rasheed Wallace rockets to the top on the deep but strong Blazers team that nearly defeated the Lakers in 2000. Charles Barkley also continues his one-sided results: his offensive RAPM rises to 6.16 but his defense slips to -2.64.


Fourth in MVP voting was Iverson. He's 75th here with good offensive value despite his poor shooting efficiency, but that's submarined by his porous defense, according to RAPM. Duncan was third, and he's backed up by a very strong +5.2 mark. Rookie of the year Vince Carter, however, climbs from the rookie prior of -2 (given to all rookies) to +3.21 overall. He heads a pretty strong rookie class including Pierce, Brad Miller, and Dirk Nowitzki. Jason Williams received some rookie love, but by this method he was a significantly negative player. For validation of the model, 16 of the top 24 players were on all-NBA teams -- or perhaps that's validation of the all-NBA teams. The lowest rated all-NBA player was the young Kobe Bryant at -1.2 and it wasn't close: he was ranked 267th and the next lowest all-NBA player was McDyess at 117th.

The aforementioned Barkley was the best offensive player, according to RPI RAPM, and that's not just because of his past heroics: his non-prior informed RAPM was second. Shaq's stats were muted by fewer minutes and the slogging pace of the lockout season, but, with the first of his string of 30+ PER seasons, he's second in RPI RAPM on offense, barely trailing Barkley. Malone rounds out the top three, and the next highest players are more intriguing: Reggie's three-point bombing is probably underrated by most box score metrics at fourth here, Grant Hill's fifth, and Jeff Hornacek, again, has a strong showing at sixth followed by Blaylock. Both Hornacek and Blaylock are players that appear to have been underrated for lack of a method that wasn't available in the 90's. Mutombo, David Robinson, and Mourning were the top three by defense. Mourning won the Defensive Player of the Year award, which wasn't a bad choice, necessarily, but Mutombo has a huge lead in RAPM. Rasheed was fourth, and his value was demonstrated later when he was traded to the Pistons to complete one of the greatest defensive teams ever. Jaren Jackson appears to be an anomalous result, ranking fifth, but he was San Antonio's first "three-and-D" player. He concentrated on defense, and his value does not show up in box scores. His NPI RAPM wasn't entirely misleading. Trailing him were Shawn Bradley, Olajuwon, and Garnett. Gigantic size might actually be underrated because Yao Ming's defense looked better under the RAPM microscope and Bradley joins Gheorghe Muresan as another 7' 7" player with strong defensive results.

With three seasons of data, outliers can be weeded out and patterns emerge. We can reevaluate past legends like David Robinson and Stockton who rate well, but sub-stars should receive more discussion. Mookie Blaylock, by NPI RAPM, was 8th in 1997, 11th in 1998, and 36th in 1999. By prior-informed regression, he was fifth in 1999. Blaylock played in the shadows of point guards like Stockton and Kidd, but he may have been better than we thought. Defense is notoriously tricky to judge by basic stats, and this is where RAPM can be most illuminating. Bo Outlaw stands out as a defensive force who wasn't heralded. Divac fares well as a two-way center, often better than his famous teammate Chris Webber. And, again, we shouldn't completely toss aside Jaren Jackson's RAPM results. He was a forgotten role player and not highly regarded, but as with Shane Battier we should look beyond the box score.

Click here for the link to the spreadsheet.