Tuesday, April 30, 2013

The 2013 Season Predictions Review

As the 2012-13 season has closed, it's time to look back on the season with reverie -- and to look critically at what we thought would happen.

In my Denver preview, I discussed at length how important and underrated Nene was, and how McGee was a basketcase who wouldn't help a team despite his stats. Well, I was mostly correct because Washington went on a great run with Nene to end the year, but Denver stormed the league in the second half as well. I think that has to do with their other players improving, like Koufos, especially since McGee came off the bench.

My Lakers preview? Well, you can imagine how that looks now. I mentioned how putting a bunch of high usage players together increases their efficiency. Er, whoops. But to my credit there's this sentence: "Injuries are also a concern, and those issues could affect this team more than almost any other one because their bench is, simply put, deplorable." Injuries destroyed the team, but in hindsight the team just didn't fit together. For my Warriors preview, even though my win prediction was inaccurate I was actually pretty spot on: the article was about how, due to Bogut and Curry's injury uncertainties, the team had the highest variance of what we'd expect. Curry had a healthy year, playing a huge amount of minutes, while Bogut at least saw some action, while the team put in a good defensive system to essentially replace his value. As a result, they were an above average team and are now making noise against the Nuggets in the playoffs.

Random predictions

In previewing the season, I listed a random prediction for each team. It was all in good fun, and some weren't serious. But read today? It's pretty hilarious how wrong I was. I'll go through them briefly:


Minnesota Timberwolves: Kirilenko had a fine year as I suggested, but Love had no opportunity to flirt with a 50/40/90 season with 15 rebounds for a month. But someone else had a 50/40/90 season....

Utah Jazz: "Fans chant for Kanter to replace Jefferson, but it's the Millsap-Favors lineup that kills." Well, that didn't exactly happen -- looking at their lineup data, they really didn't play in the frontcourt together. Oops.

Oklahoma City Thunder: "Thabeet will once again fail but not after showing a couple promising games. ... Ibaka will put up heavy DPOTY consideration again, even though the subtraction of Howard decimates the Magic's defense." Betting against Thabeet is like taking candy from a baby, and Ibaka indeed fared well in the Defensive Player of the Year voting, inexplicably.

Denver Nuggets: JaVale McGee has a respectable season with some maturity? Hm, I was almost correct. He's been better, and on a better team his strengths are showcased.

Portland Blazers: "Lillard will be a weak starter at best." ...In this case, I'm glad to be wrong.

Golden State Warriors: "Curry will have another injury-plauged year, but a new player will emerge from out of nowhere and impress in typical Warriors fashion." Curry was healthy enough to break the three-point record. It's hard to be more wrong than that. Breakout player? Draymond Green?

Sacramento Kings: "Thomas Robinson will end up as runner-up rookie of the year with the still mercurial Cousins as one of the most intriguing young frontcourts." I didn't even get the team correct.

LA Clippers: "Chris Paul will shoot 90% from the line, but the team will still finish in the bottom three in free-throw shooting." I actually wasn't far off -- Paul was at 88.5% yet the Clippers were 27th in the league. So close!

Phoenix Suns: "Despite Nash's departure, Gortat has a career year averaging 17 and 10 on healthy percentages. And everyone forgets about the team." The latter point? Completely right. But Gortat averaged 11 and 8.5 and threw his team under the bus.

LA Lakers: "The Nash-Howard duo is an unguardable combination, but the year will be marred by discussion of how often Kobe shoots. (Hint: too much.)" Uh, yeah, well, the year was marred by a discussion of many things the Lakers did wrong, and one of them was Kobe's volume, so ....

Houston Rockets: "The Rockets surprise people again through their gluttony of forwards, Asik's defense ... a record that keeps them out of the playoffs but far from the top of the lottery." At least I knew they were going to be good and that Asik wouldn't be overmatched in a starting role.

San Antonio Spurs: "After people catch on the Spurs won't go away, they actually don't finish with the best record in the west; they go second after Oklahoma City. Also, Parker misses too many games." I was completely right for once, even down to Parker missing games.

Dallas: "Eddy Curry plays 12 games." Well, I was only off by ten games -- he played two. Strange as it sounds, at the beginning of the season there was some actual discussion about his signing with the Mavs.

New Orleans: "Anthony Davis will narrowly miss the all-star game, and many will discuss why he should have been on it." Davis kept getting injured and his minutes were too low. Otherwise he might have had a shot ... at least in the east.

Memphis: "Tony Allen and Tony Wroten combine to form arguably the most formidable backcourt defending duo in short time." It was a formidable defense, but uh Allen-Wroten only played eight minutes together.

Toronto Raptors: "Lowry and Jonas push the Raptors toward the direction of the playoffs, narrowly missing the extra season as everyone forgets Calderon." In a way, this was quite accurate, as they traded Calderon and didn't seem to miss him.

Philadelphia 76ers: "Bynum will miss at least 20 games and the 76ers will nearly miss the playoffs." They missed the playoffs, and I was technically correct he missed at "least" 20 games ... in another more accurate assessment, I was wrong about his missed games.

New York Knicks: "The Knicks perform well with Amare injured, but when he comes back and Carmelo plays small forward more their record dives and they refuse to adjust." This sorta happened, but fortunately Amare got injured again. Unfortunately for their bank account, he has a huge contract.

Brooklyn Nets: "Brook Lopez ups his rebounding to ... 6.2 a game. Their wings of Deron Williams, Joe Johnson, and Gerald Wallace often outrebound the frontline, hearkening back to the Kidd-Carter days." Lopez averaged 6.9 rebounds, but his minutes were pretty low. It was a nice season for Lopez, who rebounded much better. But the rebounding story this year is that Reggie Evans rebounded like prime Dennis Rodman, posting the second greatest rebound rate in league history.

Boston Celtics: "Fab Melo proves to be useless, Milicic proves to be himself, and Jared Sullinger puts up some pretty numbers with some pretty bad defense." In hindsight, it's surprising people thought they'd be good.

Chicago Bulls: "Bulls struggle when Deng gets injured and Boozer keeps getting more minutes than Gibson. They finish sixth in the conference." Deng somehow didn't get injured, even though he was basically driven into the ground, Boozer got more minutes, and yeah they finished sixth in the conference. Not bad.

Indiana Pacers: "Paul George has a breakout year with 18 points a game." Everyone's breakout candidate, but I was pretty accurate with his points -- he averaged 17.4.

Detroit Pistons: "Together, Monroe-Drummond average 25 points, 18 rebounds, 3 blocks, 2 steals, and 3 assists." They actually averaged 23.9 points, 17.2 rebounds, 2.2 blocks, 2.3 steals, and 4 assists. That's not a bad center combo.

Cleveland Cavaliers: "Irving makes the all-star team averaging over 20 points a game." Yup. He averaged 22.5, comfortably over 20 but not destroying the mark. But to be fair this was an easy one.

Milwaukee Bucks: "At times, the Bucks' plan of Ellis and Jennings ball-hogging alongside long-armed defenders from Moute to John Henson to Udoh works, but at most other times it doesn't -- a bottom six offense and a bottom 13 defense." Bottom 9 offense and top 12 defense, thanks to Larry Sanders, who came out of nowhere.

Miami Heat: "James' MVP-campaign nearly comes to an end, but when Wade gets injured and the Heat maintain a big lead in the standings he wins it over Chris Paul." Er, Wade finally got healthy and they pulled off a 27 game win streak. Same thing!

Atlanta Hawks: "Even though their "star" left for greener pastures, the Hawks improve their win percentage with the further development of Teague, their outside shooting, and a healthy Horford." Yeah, they got worse....

Washington Wizards: "John Wall has a break-out year averaging near 9 assists and shooting a respectable percentage from the field for the first time." At 44% from the field and a TS% of 52, closer to the league average, he was finally respectable on offense, but his assists were at 7.6. However, he's started to capitalize on his defensive potential.

Orlando Magic: "Orlando will give up 130 points one game while losing by over 40. This is what happens when you replace your starting center, Dwight Howard, with Glen Davis." They were so close! And not for lack of trying. One game against the Raptors of all teams they lost 123 to 88, and in another they lost 107 to 68. Glen Davis was indeed a terrible starting center, but he got injured and opened up a spot for rebounding savant Vucevic.

Charlotte Bobcats: "They increase their wins to an impressive ... 15. Veterans Sessions and Haywood have no point to participation in competition. By the way, they're paying Diop this much. Try to guess. It's fun!" Well, they had 21 wins, but tripling your win total isn't impressive when you started at 7 wins the year before. I had forgotten how much they were paying Diop this season. I guessed 5 million. Yikes....


Win totals

In reviewing my season prediction for games won, I wasn't Nostradamus but I wasn't terrible either. Truthfully, I used a very rudimentary system, and was curious about how it'd perform. To judge the predictions, there's root-mean-square error, or RMSE (take the sum of all the squared errors, divide by the number of predictions or teams, and then take the square root of the result). Here's a handy link for a summary of numerous predictions made. How did I perform? Well, I actually ended up in between two Wages of Wins predictions:

My RMSE: 7.73
WOW1: 7.61
WOW2: 7.87

Wages of Wins made two sets of predictions. The second iteration exists because the author changed his minutes allocation. While the first version was better, I don't think it's fair to put out two projections in competition. I wish I had made two separate predictions so I could just flaunt the better one. In essence, I think it's fair to say my modest win projection beat Wages of Wins. However, there's another issue: instead of using integer win predictions, they used a decimal form, and I didn't. To put us on equal footing, I converted their numbers to integers: WOW1 goes to 7.71 and WOW2 to 7.91. Thus, my rough prediction was essentially tied with their first set of predictions, while it beat their real (final) prediction by a fair margin.
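
For anyone who wants to score their own picks, here's a minimal sketch of the RMSE calculation described above, plus the rounding step used to put decimal forecasts on the same footing as integer ones. The three-team win totals are made up for illustration; they are not the real predictions.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length lists of win totals."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical three-team league (illustration only, not the real data).
predicted_wins = [50, 41, 28]
actual_wins = [45, 47, 28]
example_rmse = rmse(predicted_wins, actual_wins)

# Decimal forecasts are rounded to whole wins before being scored,
# so they are compared on the same terms as integer forecasts.
decimal_forecast = [49.6, 41.3, 27.8]
rounded_forecast = [round(w) for w in decimal_forecast]
rounded_rmse = rmse(rounded_forecast, actual_wins)
```

Because the errors are squared before averaging, one big miss (like a Lakers-sized one) hurts far more than several small ones.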

Breaking down my win predictions, I was of course destroyed by the Lakers, who suffered a calamitous season of outrageous fortune. I was 14 wins too high; however, I was 18 wins too low for the Golden State Warriors, who actually had a healthy season of Curry and also outperformed their point differential. For both New Orleans and Phoenix, I was 12 wins too high -- the frontcourt of Ryan Anderson and Anthony Davis didn't have much of an impact, and I thought it would, and Nash's departure destroyed the Suns' season. Likewise, Orlando imploded without Howard, even though they had some quality young guys. I was terrible in the west, where my RMSE for those teams was 8.90, while I was pretty respectable in the east at 6.35. But for nine teams I was within three wins, and for one team I hit it point blank. Unfortunately for Sacramento, it was the Kings at 28 wins. And strangely enough, I underestimated both teams in the Harden trade -- Durant unleashed an offensive season for the ages, and while Harden and Asik performed about as well as I thought, their motley cast of young players was hard to predict.

My takeaway from the predictions? Minutes projections are extremely important, and injuries are a last frontier in NBA analytics. Understanding Howard's injury, Bynum's knee, and Curry's ankles will save a season prediction from going sour. I now have an idea for a pretty interesting win projection model, but alas -- it won't understand Kobe's Achilles injury, and frankly modern medical science doesn't either.

Friday, April 26, 2013

How Westbrook Changes the 2013 Playoffs

The problem with predictions is that the future will not be like the past. Random events occur, throwing off everything to come. Westbrook allegedly hasn't missed a game since junior high, but after tearing his meniscus he's out for weeks, and he may not even make it back in time for the start of the NBA finals. As many NBA analysts have discussed, it was looking like a rematch of Thunder-Heat, especially when one considers the historically strong point differential of the Thunder. With Westbrook out, however, suddenly the west is wide open, and we may see an intriguing, veteran Spurs team in the finals or, as strange as it sounds, the Clippers.

Since Westbrook hasn't missed an NBA game before (439 including the playoffs consecutively), this is a fantastic challenge to the NBA statistics community on predicting the Thunder's performance without him. There are extremely few minutes with backup point guard Reggie Jackson with the other starters, much less ancient veteran Fisher. Lineup analysis is difficult, as the Thunder have never had to deal with extended stretches without Westbrook. It will take careful, precise predictions of how the team performs.

Additionally, there are two schools of thought on Westbrook -- he shoots way too much, considering Durant is on the same team, damaging his team in the process; or his aggressive style is overall a tremendous positive, even overriding the bad shots. The debate will be armed with new information, but I think the latter has a stronger case. The Thunder's offense was arguably the best during the 2013 season, especially adjusting for strength of schedule, after coming in second in 2012, and Westbrook was the point guard orchestrating the affairs and the player who shot the most. Although Durant is an amazing shooter, Westbrook often receives the short end of the stick at the end of shot clocks, responsible for creating offense out of thin air. But now low usage players will be replacing his minutes, and if he is indeed a monkey wrench in a well-oiled machine the Thunder should hardly miss a beat.

Methods: Win Shares, PER, and IPV (TalkingPractice)

Because the limited dataset of lineups featuring non-Westbrook point guards inhibits 5-man analysis, and because there are no missed games to show how the Thunder fare without him, the effect of losing Westbrook has to be estimated through individual player stats. I'll present three different methods and a handful of estimates based on guessing which players' minutes will increase without him. I'll also judge the Thunder's strength with and without him through SRS (point differential adjusted for strength of schedule).

(The next part explains how I calculated the new team strength, so you may skip that if you want.)


The first metric I used is the popular all-in-one player metric: Win Shares. This is easy to work with because Win Shares are in a system where it's easy to convert player changes into team level win percentage changes. For the win percentage of the Thunder before the injury, I used the projected regular season win percentage based on adjusted point differential because it's generally more predictive than plain ol' win percentage. From there, it's a quick back calculation to the new Westbrook-less point differential using the Pythagorean win formula (Points^14/(Points^14 + Opposing points^14)).
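
As a rough sketch of that back calculation, the snippet below applies the Pythagorean formula with the exponent of 14 and then inverts it to recover a point ratio from a win percentage. The last step, turning the ratio into a per-game point differential, needs an extra assumption that isn't in the text above: that the team and its opponents average a combined `2 * league_avg_points` per game. The function names and the default of 100 points are placeholders.

```python
def pythagorean_win_pct(points_for, points_against, exponent=14):
    """Expected win percentage from points scored and allowed."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)


def point_ratio_from_win_pct(win_pct, exponent=14):
    """Invert the formula: points scored divided by points allowed."""
    return (win_pct / (1 - win_pct)) ** (1 / exponent)


def point_diff_from_win_pct(win_pct, league_avg_points=100.0, exponent=14):
    """Per-game point differential implied by a win percentage.

    Assumption (not from the post): points for plus points against
    equals 2 * league_avg_points.
    """
    ratio = point_ratio_from_win_pct(win_pct, exponent)
    points_against = 2 * league_avg_points / (1 + ratio)
    points_for = ratio * points_against
    return points_for - points_against


# A team that outscores opponents 105 to 100 on average.
example_win_pct = pythagorean_win_pct(105.0, 100.0)
recovered_ratio = point_ratio_from_win_pct(example_win_pct)
```

Subtracting a player's estimated contribution from the win percentage and running the result back through `point_diff_from_win_pct` gives the new differential.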


The next all-in-one metric is PER, the flagship of the ESPN empire. Frankly, I'm using this out of curiosity, as I don't think it will perform well. Given its heavy bias toward high usage players like Westbrook and the low PERs of the bench guys, it'll be interesting to see the pessimistic outlook. For estimating the loss in SRS, I converted his EWA (estimated wins added through PER) to EWA per game and then subtracted that from the Thunder's win percentage. PER assumes Westbrook was displacing a replacement level player, but this isn't entirely accurate, so I added in the estimated boost in win percentage from a set of replacement minutes heavy in crappy backups (e.g. Fisher) and another set with more minutes devoted to better players (e.g. increasing the workload for Durant and Ibaka). I did the same boost with the other metrics. ESPN gives the full details on Estimated Wins Added on the advanced stats page, if one is curious.


VA: Value Added - the estimated number of points a player adds to a team’s season total above what a 'replacement player' (for instance, the 12th man on the roster) would produce. Value Added = ([Minutes * (PER - PRL)] / 67). PRL (Position Replacement Level) = 11.5 for power forwards, 11.0 for point guards, 10.6 for centers, 10.5 for shooting guards and small forwards
EWA: Estimated Wins Added - Value Added divided by 30, giving the estimated number of wins a player adds to a team’s season total above what a 'replacement player' would produce.
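
The two definitions above translate directly into code. This is a small sketch using the quoted formulas and replacement levels; the player in the example is hypothetical (a point guard with 2,010 minutes and a PER of 21.0), and the 82-game divisor for "EWA per game" is my assumption about a full regular season.

```python
# Position replacement levels as quoted in the definition above.
POSITION_REPLACEMENT_LEVEL = {
    "PF": 11.5,
    "PG": 11.0,
    "C": 10.6,
    "SG": 10.5,
    "SF": 10.5,
}


def value_added(minutes, per, position):
    """Value Added = Minutes * (PER - PRL) / 67."""
    prl = POSITION_REPLACEMENT_LEVEL[position]
    return minutes * (per - prl) / 67


def estimated_wins_added(minutes, per, position):
    """EWA = Value Added / 30."""
    return value_added(minutes, per, position) / 30


def ewa_per_game(ewa, games=82):
    """Wins added per game, i.e. a change in win percentage (82 games assumed)."""
    return ewa / games


# Hypothetical point guard, not a real stat line.
example_va = value_added(2010, 21.0, "PG")
example_ewa = estimated_wins_added(2010, 21.0, "PG")
example_win_pct_change = ewa_per_game(example_ewa)
```

Since EWA is measured against a replacement player, subtracting it outright treats the backups as replacement level, which is why the boost for the actual replacement minutes gets added back.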

The last metric used is a form of regularized adjusted +/- from TalkingPractice. Plus/minus is just looking at the point differential when a player is on the court, adjusted means regression is used to find the best fit, and regularized is basically a fancy mathematical way to reduce wildly high/low estimates for players with low minutes. The site calls their metric IPV (individual player value), and due to the NBA statistical analyst exodus (they're being hired by teams) it's the best, and one of the few, publicly available +/- stats. A few box score stats are also used to help shape IPV, so it's also not a complete departure from conventional stats.

For player minutes, I have two different situations. One is called the "pessimistic" model: no other player minutes will increase except point guard backups Reggie Jackson by 18 minutes, Fisher by 12 minutes, and shooting guard Sefolosha by 5 minutes. Note that the total minutes displaced here is 35, which is roughly Westbrook's average for the regular season. I'm calculating an adjusted point differential based on the regular season stats instead of a minutes distribution you'd see in the playoffs (Westbrook played 38 a game last year in the playoffs.) Since I've used regular season point differential in playoff predictions, this is fine, as I'll be comparing alike-things. The other minutes distribution, the "optimal model," gives more time to their better players, and suggests the Thunder play less smallball because they lost one of their best small players: Reggie Jackson 13 minutes, Fisher 8 minutes, Sefolosha 6 minutes, Durant 2 minutes, and Ibaka 6 minutes.
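
To keep the two scenarios straight, here's a small sketch that stores both minutes distributions and checks that each one hands out exactly the 35 minutes Westbrook leaves behind. The `added_value_per_game` helper and any per-48-minute ratings fed to it are hypothetical; they only show how a per-minute metric would be weighted by the extra minutes.

```python
WESTBROOK_MINUTES = 35  # roughly his regular season average, per the text

# Extra minutes per game handed out in each scenario.
pessimistic_minutes = {"Reggie Jackson": 18, "Fisher": 12, "Sefolosha": 5}
optimal_minutes = {
    "Reggie Jackson": 13,
    "Fisher": 8,
    "Sefolosha": 6,
    "Durant": 2,
    "Ibaka": 6,
}


def total_minutes(extra_minutes):
    """Total extra minutes handed out in a scenario."""
    return sum(extra_minutes.values())


def added_value_per_game(extra_minutes, value_per_48):
    """Weight each player's per-48-minute value by his extra minutes.

    value_per_48 is a hypothetical dict of ratings (points per 48 minutes);
    the real inputs would come from Win Shares, PER, or IPV.
    """
    return sum(
        minutes * value_per_48[player] / 48
        for player, minutes in extra_minutes.items()
    )


pessimistic_total = total_minutes(pessimistic_minutes)
optimal_total = total_minutes(optimal_minutes)
```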

Oklahoma City's new team strength

Based on a few different player metrics, there's a range of estimated team strengths in point differential for the Thunder without Westbrook. Win Shares doesn't love Westbrook because it's built on efficiency, and as I've discussed before it gives a lot of unearned credit just for simply being on a great defensive team. In replacing him, they'd still be an elite team, even with the pessimistic minutes distribution. PER, however, views his high usage play as extremely valuable, and finds the backup options as the flotsam of the league -- even Sefolosha doesn't have a good PER. IPV, however, finds an estimate in between the two metrics, and sees the dropoff as significant but not debilitating.

Metric       OKC pre-injury SRS   Westbrook per game   OKC post-injury SRS, pess. mins   OKC post-injury SRS, opt. mins
Win Shares   9.15                 5.1                  6.5                               7.2
PER          9.15                 7.2                  2.2                               2.7
IPV          9.15                 3.4                  5.3                               6.3

I don't believe Win Shares or PER are up to this task, and IPV nicely finds a middle ground between the total collapse of not having a creator and the underrated contributions of their other players. Given our limited information on Reggie Jackson and Fisher, as well as the future minutes distribution in the playoffs, any estimate will be a rough guess at best, but the magnitude is important to note. A point differential from 5.1 to 6.5 (IPV's range with a little leeway) is in the same range as the Nuggets (+5.4) and the Clippers (+6.4). PER and Win Shares also tend to clump all the player value at the very top, evident from Westbrook's estimated +/- per game.

However, Fisher's IPV value is strangely decent, near 0 instead of the typical negative value you'd see from an end of the bench type. This is where knowledge of the metrics you use can come in handy -- IPV uses a prior value that heavily influences the final number, and Fisher has had consistently strong +/- values in the past. Since he's played few minutes recently, there is not enough information to downgrade Fisher's IPV any further (for example, Granger's +/- is very similar to the one he had last year because he had very few possessions to convince the model his value was different.) That type of adjustment may bring down their point differential incrementally, and you can make another one due to hitherto unused lineups being rusty as players try to adapt to each other.

I'd estimate this Thunder team is now somewhere around the +5.5 SRS level, depending on who they play and how Fisher looks. This should be enough to finish off the Rockets, but don't be surprised if they drop a couple games. It looks like the Clippers will meet them in the next round, and they're more likely the stronger team. With homecourt advantage, the series based on my estimates is nearly a coin flip. I'd take the Clippers in seven, maybe even fewer -- they have a slight edge in adjusted point differential even adjusting for homecourt, and the Thunder have to go against Chris Paul without their all-star point guard. The Clippers often use a lot of players on offense on whom you can hide weaker defenders like Fisher, but it's not enough to overcome the loss of Westbrook. Against a healthy Spurs team? They'd have little chance.

It's time to see what Westbrook's value really is.

Free Throws and Hand Size: Part 2

Introduction

Last year, I looked at data on hand size and free-throw percentage to settle the age-old debate about whether or not hand size has a negative association with free throw shooting. Truthfully, I just wanted to stake my claim to this study area because only recently (starting in 2011) has hand size been objectively measured for an acceptable size of the NBA population thanks to the predraft camp, and I did this barely after the 2012 season even started. Now with another season done, there's a whole crop of new rookies, as well as more free throw attempts from guys drafted in the two seasons before that.

When Rondo steps up to the foul line, the commentator will often mention how his large hands make it difficult to shoot free throws and jump shots in general. When Shaq was playing, he was the go-to guy for this excuse. If your hands are too big, the reasoning goes, you are unable to properly grip the ball, and end up awkwardly shooting the projectile as if it were a tennis ball, not a basketball. And so Rondo, born with rare physical gifts, was doomed from the start, and thus was a bad free throw shooter from birth. It's a good story, but it's anecdotal evidence and there are no numbers to back up the claim.

Study methodology

I think there are two components to this myth. One is that large hands make it more difficult to shoot free throws (or jump shots). The other is that large hands make it impossible to be a good free throw shooter, or at least place a ceiling on your ability to shoot. Both are important to consider. For free throws being more difficult, trend or regression analysis is needed: is hand size a significant predictor of free-throw percentage? For placing a ceiling on free throw ability, even simple graphing can get the job done because you can identify how many players shoot well with large hands.

Although this myth involves overall shooting, only free throws are considered because they're objective, isolated events, which is rare in basketball. The excellent resource DraftExpress was used to compile hand length, as well as other information. Basketball-reference was used to collect free throw totals because of its snazzy season finder, which subsets by many factors like player year. Positions were tabulated for each player -- 1 for point guards, 5 for centers, 1.5 for players who played both point and shooting guard, etc. -- where they were determined by a combination of common sense, position listed on b-ref, and looking through lineup combinations for more obscure players on 82games. For height, I used height without shoes because shoes add an uneven number of inches to players, while for age I used day of birth age, meaning it's in decimal form based upon days, not simple calendar year.

I will note here that even measurements from the hallowed predraft camp are not flawless or immutable -- height changes based on time of day, where a person in the morning is taller than at night; measurement procedures are not perfect and I've seen guys tested again the following year and they somehow have larger hands even at age 21; and even things like age are not as absolute as one would think, since Shabazz recently aged an entire year when someone found his real birth certificate ("Dad, how many lies have I been living?"). As an additional note, hand width is sporadically measured, and I wish it were measured more often, but I can only do this analysis with hand length.

Results

There were 137 players with hand measurements who took a free throw for a total of 20,167 free throws -- and this includes 75 players with at least 50 free throws taken. At this point, the best and simplest thing to do is graph hand length and free-throw percentage. (While doing a research study, this is what's known as data exploration and should always be done to look for patterns.) The results are below for all players whose hand lengths were measured at the predraft camp from 2011 to the present with a minimum of 50 free throw attempts in regular season games. The immediate reaction here is that it's a huge mess of data points with only a weak trend. I coded position with a color, as best I could, so you can parse the data within positions. There are a few young guys near 80%, above average, but they span nearly the whole range of hand sizes, and there are players with smaller than average hands who are poor shooters. Also, you're probably wondering about the outliers: Greg Smith (Houston Rockets) has giant hands, though for a power forward he's not terrible, and, yeah, Drummond is the guy south of 40%, but his hands are pretty average for a center.

Since there's an obvious correlation of height and hand size, and a correlation of height and position, one can't simply look at hand size and free-throw percentage. Breaking down the results by position, however, the results are just as noisy. I've produced the same graphs but focusing on the five positions. It's not a big sample size, but as can be seen in the graph below for point guards there's no trend. Lillard, oddly enough, has the biggest hands out of the group, but he's the second best free-throw shooter, as well as a good marksman from three-point range -- and his hands are 9.75 inches wide, so it's not the case of "skinny" hands muddying the results.


I included the same graph for the other four positions below for perusing. It's the same story as the point guard graph: there is no trend. I believe looking at free throw shooting within positions is the best way to approach this topic. In grouping by position, players have roles that are more similar so they're expected to have a certain shooting skill level. Since there are more people who are point guard size, there's more potential to find great shooters. However, it's not a perfect system, as there's probably a self-selection bias for position where a player who can't shoot gravitates toward the frontcourt. Players with big hands for their height who thus can't shoot, the reasoning could go, will move a position to compensate for their skill deficiency, and sorting by position will mask these effects. But there's more we can do once we look into the numbers.




A series of regression tests have been run on the same data graphed above (players with a minimum of 50 FTAs). Hand size by itself is a significant predictor of free-throw percentage, but as we know that's because it's correlated with position quite well. In fact, position by itself is a much better predictor. The R^2 value shows how much of the variation is explained by the given variables, where the adjusted part gives a penalty for having more variables in the model. So in comparing the first two models, position explains 2.3 times more of the variability than hand size does. If you contend that position is the dominating factor but within positions hand size does matter, despite what you saw from the graphs, there is no evidence for this given the results in the third model. Hand size has a p-value of 0.473, meaning that if hand size truly had no effect once position is accounted for, a coefficient at least this large would still show up 47.3% of the time by chance. Another way to check this is with an F-test between models 2 and 3. Adding the variable of hand size did not significantly improve the results (again) according to the F-test, which is a useful tool when dealing with multiple models. Height was thrown into the mix to see if it could help the variables get along, but that was also not a success. Position is the overwhelming factor, not hand size.
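
For readers unfamiliar with the adjustment, here's a minimal sketch of the standard adjusted R^2 formula and of the 2.3x comparison between the hand-size-only and position-only models. The adjusted R^2 values are the ones reported for models 1 and 2; the raw R^2 of 0.5 and the sample of 11 in the toy example are made up purely to show the penalty at work.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Standard adjusted R^2: penalizes R^2 for each extra predictor."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Toy numbers (hypothetical): same raw fit, one versus two predictors.
one_predictor = adjusted_r2(0.5, n_obs=11, n_predictors=1)
two_predictors = adjusted_r2(0.5, n_obs=11, n_predictors=2)

# Reported adjusted R^2 for the hand-size-only and position-only models.
ADJ_R2_HAND_SIZE = 0.07987
ADJ_R2_POSITION = 0.18299
explained_ratio = ADJ_R2_POSITION / ADJ_R2_HAND_SIZE
```

If an extra variable doesn't improve the raw fit, the adjusted value goes down, which is what happens when hand size is added to the position model.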

Model  adj. R^2  Intercept  Variables       Coefficient  St. error  P-value
1      0.07987   119        Hand size (in)  -5.42        1.99       8.05E-3
2      0.18299   80.7       Position        -3.27        0.779      7.65E-5
3      0.17759   94.0       Hand size (in)  -1.62        2.24       0.473
                            Position        -2.90        0.932      2.68E-3
4      0.14540   172        Hand size (in)  -2.35        2.26       0.301
                            Height (in)     -1.03        0.401      0.0123
5      0.16668   80.1       Hand size (in)  -1.67        2.27       0.465
                            Position        -3.32        1.97       0.0964
                            Height (in)     0.200        0.830      0.811
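
To make the coefficients concrete, here's a short sketch that plugs values into model 3 (intercept 94.0, hand size -1.62, position -2.90). It assumes the outcome is free-throw percentage in percentage points, hand length is in inches, and position is on the 1-to-5 scale described earlier. The coefficients are the rounded ones from the table, so the outputs are approximate.

```python
# Rounded model 3 coefficients from the table above.
INTERCEPT = 94.0
HAND_SIZE_COEF = -1.62   # per inch of hand length
POSITION_COEF = -2.90    # per position step (1 = point guard, 5 = center)


def predicted_ft_pct(hand_length_in, position):
    """Model 3's fitted free-throw percentage for a given player profile."""
    return INTERCEPT + HAND_SIZE_COEF * hand_length_in + POSITION_COEF * position


# A point guard with 9.75 inch hands versus one with 8.75 inch hands.
big_hands_pg = predicted_ft_pct(9.75, 1)
small_hands_pg = predicted_ft_pct(8.75, 1)
hand_size_gap = small_hands_pg - big_hands_pg

# The same hands on a center instead of a point guard.
position_gap = predicted_ft_pct(9.75, 1) - predicted_ft_pct(9.75, 5)
```

An extra inch of hand is worth about 1.6 points in this model (and isn't statistically distinguishable from zero), while moving from point guard to center is worth about 11.6.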

Perhaps there's a problem with the data because I'm looking at free-throw percentage of guys with only 50 free throws, as their percentages might be a bit wonky and the noise could muddle the results. There are a couple approaches to this problem. One is to construct a different regression model where the players are weighted by how many free throw attempts they have, and the other is to pool all free throws into hand size categories. The latter approach will be used first. There are nine hand sizes with at least 300 attempts, where most are well over 2000, covering the range of 7.75 inches to 9.75. In determining whether or not position is more important, a weighted position average was calculated for each category (for example, if there are 100 attempts from a point guard and 50 from a shooting guard, then the average position is 1.33.) To save ink space and another table, the regression results are virtually the same -- hand size by itself is a fine estimator, but position is better and the results do not significantly improve when position and hand size are done together.
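
The weighted position average for a hand-size category is just an attempts-weighted mean. Here's a minimal sketch that reproduces the example from the paragraph above (100 attempts from a point guard and 50 from a shooting guard); the function name is mine.

```python
def weighted_average_position(attempts_by_position):
    """Average position in a hand-size category, weighted by free throw attempts.

    attempts_by_position maps a position number (1 = PG ... 5 = C,
    halves allowed for combo players) to total free throw attempts.
    """
    total_attempts = sum(attempts_by_position.values())
    if total_attempts == 0:
        raise ValueError("no free throw attempts in this category")
    weighted_sum = sum(
        position * attempts for position, attempts in attempts_by_position.items()
    )
    return weighted_sum / total_attempts


# Example from the text: 100 attempts at point guard, 50 at shooting guard.
example_average = weighted_average_position({1: 100, 2: 50})
```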



Moving onto the weighted regression models, which now include 137 players because with the weighting I don't have to worry about players with low totals skewing the results, the output is interesting. Again, position is a better predictor than hand size, but the p-value in model 8 suggests even when you adjust for position hand size is not a ludicrous variable to consider. A p-value of 0.165 is not significant by any conventional means, but with the limited data set it's intriguing. If you're wondering whether or not model 8 is superior for the inclusion of hand size, there are a few tests for this situation. The most popular is the F-test, where you're comparing the sums of squared residuals (a residual is the difference between predicted and actual value) between an unrestricted model (the model with more variables, in this case model 8) and the restricted model (number 7), with an adjustment for the number of observations and how many more variables the unrestricted model has. The F-test, however, states that the reduction in squared errors (i.e. model 8 fits the data better) is not statistically significant, and in fact the p-value is 0.165. Model 7 without hand size is preferred. As a last note, I repeated the analysis with height and hand size, since position is slightly subjective and there might be a self-selecting bias as well. The results were virtually the same, with a p-value of 0.120.

| Model | adj. R^2 | Intercept | Variable | Coefficient | St. error | P-value |
|---|---|---|---|---|---|---|
| 6 | 0.1318 | 131 | Hand size (in) | -6.77 | 1.45 | 7.71E-6 |
| 7 | 0.2340 | 81.3 | Position | -3.28 | 0.502 | 1.27E-9 |
| 8 | 0.2394 | 100 | Hand size (in) | -2.35 | 1.68 | 0.165 |
|   |        |     | Position | -2.77 | 0.618 | 1.58E-5 |
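For the curious, the F-test comparison of models 7 and 8 can be sketched in a few lines. This is the textbook nested-model F statistic, not necessarily the exact routine I ran; the function name is my own and the residual sums of squares in the example are invented, since I didn't list them above. In a weighted regression the sums would be the weighted ones.

```python
# Sketch only: textbook F statistic for comparing a restricted model with an
# unrestricted one that adds n_added variables. The RSS values are hypothetical.

def nested_f_statistic(rss_restricted, rss_unrestricted, n_obs,
                       n_params_unrestricted, n_added=1):
    """F = ((RSS_r - RSS_u) / q) / (RSS_u / (n - k)), where k counts the intercept."""
    numerator = (rss_restricted - rss_unrestricted) / n_added
    denominator = rss_unrestricted / (n_obs - n_params_unrestricted)
    return numerator / denominator

# 137 players; model 8 estimates three parameters (intercept, hand size, position).
f_stat = nested_f_statistic(rss_restricted=1020.0, rss_unrestricted=1005.0,
                            n_obs=137, n_params_unrestricted=3)
print(f_stat)

# When only one variable is added, F equals the square of that variable's
# t-statistic, which is why the F-test and the coefficient share a p-value.
t_hand_size = -2.35 / 1.68  # coefficient / standard error from model 8
print(round(t_hand_size ** 2, 2))
```

The second print uses the actual model 8 numbers from the table: an F of about 1.96 on 1 and 134 degrees of freedom, nowhere near the usual cutoffs.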


Conclusions

The next time you see Rondo brick a free throw off the front of the rim, don't believe the cause is his unusually large hands. Based on a large set of players, including ones with similarly sized hands or larger, that is no excuse to be such a poor foul shooter. A series of regression tests found no evidence of hand size influencing free-throw percentage.

You don't even need to mention the regression models for why Rondo doesn't have an excuse. According to Lee Jenkins of Sports Illustrated, Rondo's hands were measured by the Celtics at 9.5 inches long and 10 inches wide (that article by SI is completely bizarre: there are accounts of Rondo's Connect Four prowess and the doctor at his birth remarking on his humongous hands.) Kawhi Leonard, the Spurs' small forward, has hands 9.75 inches long and the second widest hands in the draft database at 11.25 inches (hand widths are incomplete, but it's 167 players.) However, he shoots 80.4% from the line and he's a decent three-point shooter -- and I didn't include the playoffs, where he's been slightly better. Rondo's at a pathetic 62% for his career, while players with his hand length are at 68% and they're typically power forwards. For another example, Andrew Nicholson has 10 inch long hands but shoots 80%. He's a rookie and a latecomer to the game, playing in the frontcourt where he's not even expected to shoot well; Rondo has no excuse. There are also plenty of historical examples -- Jordan's known for his baseball mitt hands and he shot 83.5%; Connie Hawkins palmed the ball like a tennis ball and still shot 78%; and plenty of giants with large hands like Sabonis, Yao, and Ilgauskas were 80% or better from the line.

But perhaps there's either a small effect hidden within the results here or something overlooked. As I discussed earlier, there are two hypotheses -- hand size negatively impacts free-throw shooting and hand size puts a limit on your ability as a free-throw shooter. The former has been challenged quite effectively with a bevy of statistics. The latter may still have some truth to it, but it's not a large effect since players with large hands are still capable of shooting above the league average. But it might break down at the extremes. There's only one player with 11.25 inch long hands, Greg Smith of the Rockets, and he shoots near 60% (I can't find an official source, but it appears that's around the range of Shaq's hands.)

Obviously, you can't draw conclusions from one player; it'll be interesting to see if anyone else shows up with boulder-sized hands when the next draft rolls around. It's also hard to tell from the data if there's a free-throw percentage ceiling across the span of hand sizes. One might posit, for example, that you can't be an elite shooter with hands greater than 9.5 inches, but there are only four players (Irving, Lillard, Klay Thompson, and Isaiah Thomas) you could reasonably argue are elite shooters for their age -- again, not enough for a conclusion.

However, even though the myth is not totally extinguished, it's severely limited -- hand size is not such an overwhelming factor that it applies independently to every player with significant results, and even if your hand size is above average you can still be an above average shooter. Sorry, Rondo, but you'll have to blame something else.

Future work

As with any investigation, the study is ongoing. Just because the hand measurements come from the NBA predraft camp doesn't mean they can only be applied to NBA stats; I can tabulate the college totals of these players, which increases the sample size because many of the players measured did not play a single NBA minute and most who did took few free throws. I can also use hand widths, which I have ignored in this article because using them limits the number of players. Perhaps hand width is a more important factor because it has more to do with grip, and college stats can provide enough information to test this. But we can't state anything definitively until we actually look at the numbers.

Monday, April 22, 2013

Defensive Win Shares Are Completely Broken

As the flagship one-metric stat of the popular website basketball-reference, win shares are heavily cited in the basketball world. Unfortunately, the stat is far from perfect, and the defensive side of the stat is particularly egregious. Defense is notoriously tricky to quantify, and win shares attempts it using the usual bevy of defensive box score stats like blocks and steals, but it also includes the team's defensive rating. This means that Zach Randolph gets the same credit for defense that Tony Allen and Marc Gasol do, apart from the few defensive box score stats. The influence of team defense on this stat is huge. What's even more problematic is that it's used for seasons before steals and blocks were recorded, and when defensive rebounds weren't tracked separately from total rebounds.

The best example of how the metric fails is Ryan Anderson. He was traded from the Magic, who with Howard were a perennially strong defensive team, to the Hornets, still reeling from tanking and poor decisions; what changed was his scenery, not his defensive skill. Obviously, motivation is important with defense, as is coaching, but his defensive rating tracks closely with his team's rating. 2012 was his breakout season, yet his defensive rating plummeted from 19th in the league, or nearly the 95th percentile, to almost exactly average. The table below has the full details, where percentile is based on players that season with at least 500 minutes.

Ryan Anderson's defensive rating (win shares)

| Season | Defensive rating | Defensive rating percentile | Team rating | Team ranking |
|---|---|---|---|---|
| 2011 | 101 | 94.4 | 101.8 | 3 |
| 2012 | 105 | 51.1 | 104.1 | 12 |
| 2013 | 112 | 7.0 | 110.3 | 28 |


When people use defensive win shares, or win shares in general, they may not fully understand how it's calculated. This is the clearest way to illustrate that.

Need another example? Omer Asik was traded from the defensively dominant Chicago Bulls to the Houston Rockets, where he has to adjust for the mistakes from James Harden and a roster of rookies. Few people watching Asik this year would say he has dropped off significantly in defensive intensity or skill. However, the previous year his rating was 2nd and before that, his rookie season, he was 3rd. This season? 68th. The Rockets are a league average defensive team because of him; when he's off the court they're terrifyingly bad, yet win shares only sees a player who's a great defensive rebounder with some blocked shots on a mediocre defensive team.


Omer Asik's defensive rating (win shares)

| Season | Defensive rating | Defensive rating percentile | Team rating | Team ranking |
|---|---|---|---|---|
| 2011 | 97 | 99.1 | 100.3 | 2 |
| 2012 | 92 | 99.4 | 98.3 | 2 |
| 2013 | 103 | 80.0 | 106.1 | 16 |
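If you want to see how tightly the individual rating follows the team rating, here is a quick sketch using only the six player-seasons from the two tables above. Six points is a tiny sample, so treat the correlation as an illustration rather than an estimate; the helper names are my own.

```python
from math import sqrt

# (season, player defensive rating, team defensive rating) from the two tables above.
anderson = [(2011, 101, 101.8), (2012, 105, 104.1), (2013, 112, 110.3)]
asik = [(2011, 97, 100.3), (2012, 92, 98.3), (2013, 103, 106.1)]

def pearson(xs, ys):
    """Plain Pearson correlation coefficient, written out to avoid dependencies."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def yearly_changes(rows):
    """Season-to-season change in (player rating, team rating)."""
    return [(cur[1] - prev[1], round(cur[2] - prev[2], 1))
            for prev, cur in zip(rows, rows[1:])]

player = [row[1] for row in anderson + asik]
team = [row[2] for row in anderson + asik]
r = pearson(player, team)
print(round(r, 2))
print(yearly_changes(anderson))
print(yearly_changes(asik))
```

Every time the team's rating moved, the player's rating moved in the same direction, which is what you'd expect from a stat that leans this heavily on team defense.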


You can see this with players traded midseason too: Tayshaun Prince went from a defensive rating of 111 to 103, effectively going from one of the worst defenders in the league, roughly the 10th percentile, to significantly above average. (And no, it's not effort: win shares only sees box score stats. His defensive rebounding leveled off while his steals and blocks went up, but the major culprit behind the change was Memphis' suffocating defense compared to Detroit's mess.)

Need a quick way to discredit the stat to someone? In 2013, DeJuan Blair was 7th in the league in defensive rating.

Saturday, April 20, 2013

2013 Playoff Preview and Predictions

I didn't produce any articles here for a while until the end of the season came (kidnapped by a mysterious Berri gang), but I have more time right now for more content. Expect articles and small research studies to be churned out more frequently.

Eastern Conference

(1) Miami Heat vs. (8) Milwaukee Bucks

When predicting a series, the best bet is five games if you think the outcome is pretty obvious. Four games are only more likely when the series is lopsided, like when a sub-0.500 team meets a defending champion running on all cylinders. Really not much to say here.

What to watch for:
-Larry Sanders' defense. The most unstoppable player once he gets to the rim (LeBron, not Battier; I know what you're all thinking) meets one of the best interior defenders.
-What happens with the big man rotation? How much time will Haslem get? How long will they play with a traditional power forward?
-The most interesting part of the series is seeing if the Heat are going for the undefeated playoff run.

Prediction: Miami in four.

(2) New York Knicks vs. (7) Boston Celtics

Although historically it seems more impressive because of the names involved, it's a 3.7 SRS (point differential adjusted for strength of schedule) team versus one that's -0.6. Yes, losing Rondo and gaining Bradley has led to some good defense, but over the last 25 games the Celtics still don't have a positive point differential. Also, if you haven't heard yet, the Knicks were just on a torrid stretch (but no, you likely dissenting MVP voter who will destroy LeBron's chance at a unanimous award, he can't be the MVP.) Injuries could make this series interesting, but the injuries appear to be on both sides.
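Since SRS comes up in nearly every matchup below, here is a minimal sketch of the idea, assuming the common formulation where a team's rating equals its average point margin plus the average rating of its opponents. The three teams and scores are invented, the function name is my own, and the simple fixed-point iteration is just one way to solve it; it is not guaranteed to converge for every possible schedule.

```python
# Sketch only: rating = average margin + average opponent rating, solved by
# repeated substitution. Teams and margins below are hypothetical.

def simple_ratings(games, iterations=200):
    """games: list of (team_a, team_b, margin), with margin = points_a - points_b."""
    margins = {}
    opponents = {}
    for team_a, team_b, margin in games:
        margins.setdefault(team_a, []).append(margin)
        margins.setdefault(team_b, []).append(-margin)
        opponents.setdefault(team_a, []).append(team_b)
        opponents.setdefault(team_b, []).append(team_a)

    avg_margin = {team: sum(m) / len(m) for team, m in margins.items()}
    ratings = dict(avg_margin)  # start from raw point differential
    for _ in range(iterations):
        ratings = {
            team: avg_margin[team]
            + sum(ratings[opp] for opp in opponents[team]) / len(opponents[team])
            for team in avg_margin
        }
    return ratings

# A hypothetical three-team round robin.
games = [("A", "B", 10), ("B", "C", 5), ("A", "C", 15)]
ratings = simple_ratings(games)
for team in sorted(ratings):
    print(team, round(ratings[team], 2))
```

In this toy league team A's raw margin of +12.5 shrinks to about +8.3, because its opponents were below average on the whole.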

What to watch for:
-If he stays healthy, Avery Bradley has a chance to be one of the greatest perimeter defenders ever. Normally, it's a bad idea to full-court press a team unless you're in dire straits at the end of a game. Sometimes people ask, after watching college basketball, why NBA teams don't employ it more often, and the answer is that NBA players are way too skilled to be trapped and pressured that easily. Avery Bradley is so great at this he's the exception.
-Every year we wonder if this is it for the Garnett-Pierce Celtics, and last year Ray Allen actually did leave. It's been a dominating run for the old guys, but perhaps this is farewell (again.)
-With the barrage of three-pointers and Carmelo at the 4, the offense clearly works well, but they need to reshape their defense into what it was last year to have any shot at making a dent in Miami's armor. How defenders like Tyson and Shumpert look here will be important later.

Prediction: Knicks in five.

(3) Indiana Pacers vs. (6) Atlanta Hawks

This match-up is similar to Knicks-Celtics. Once you adjust for strength of schedule, the Hawks are almost perfectly average. Pacers, along with the Grizzlies, have the best defense in the NBA. Horford is an undersized center and will be hard-pressed to score inside against the giant Hibbert, although the jump-shooting could bring him out from the paint. But the Hawks already have some problems scoring, and the Pacers' best lineups destroyed the league. There are virtually no factors that bode well for the Hawks; they lost Zaza Pachulia, a center known for his tough (and) annoying play. Recent play? Hawks have been, again, average in the past 25 games, while the Pacers have had a better offense. This series shouldn't take long.

What to watch for:
-Hibbert's defense is DPOTY-worthy, and his offensive slump ended months ago, which some people forgot. So this is a guy who was named an all-star, kept up his box score production after the contract once you ignore his wrist-injury plagued games, and improved defensively more than anyone not named Larry Sanders.
-The over/under of games on NBATV is the same as the over/under of games played in the series.
-The Pacers' defense is worth the price of admission, if you're an NBA die-hard.

Prediction: Pacers in five.

(4) Brooklyn Nets vs. (5) Chicago Bulls

The series rests on Noah's foot. His plantar fasciitis makes him unlikely to play in at least the first game. Given the lack of depth at center, no homecourt advantage, and the location of Rose unknown -- will he make some improbable comeback when we least expect it, with a cape and a mask? -- it's an uphill battle for the Bulls. The safe bet is to take the Nets -- they played better, won more games, have homecourt and fewer injury concerns, and they've played better in the second half. Deron Williams, in particular, has regained his form as an all-star point guard, and alongside the gawky Lopez who's defending decently and scoring well, Evans rebounding like Rodman, and Joe Johnson ... existing, the Nets shouldn't have any problems as long as nothing weird happens. The season numbers for the Nets aren't great, but over the last 25 games they've been a strong team. I'm going to predict another boring five game series, but in the first round that's the safest bet, especially when all the planets are in alignment for one team.

What to watch for:
-Seriously, Reggie Evans rebounded like Dennis Rodman. What's fascinating is that Rodman wasn't like this when he was younger and only transformed into an insane rebounding magnet in his 30's, just like Evans.
-Lopez has worked hard on improving his defense. With his size and length he should always have an advantage inside, but his mobility and effort have never been consistent. Hibbert isn't the fastest man in the world, but if he can contend for DPOTY then Lopez can be above average.
-Jimmy Butler has just joined the cast of excellent perimeter role players every contender wants: his defense is outstanding and he can hit a three. I think this group is led by Bruce Bowen with Battier as the accountant.
-You'd have a hard time convincing me Gerald Wallace actually played in the '13 season.

Prediction: Nets in five.

Western Conference

(1) Oklahoma City Thunder versus (8) Houston Rockets

The Thunder completed a season with an adjusted point differential that's 9th all-time, where all eight teams above them won a title except for the '72 Bucks, who were behind the Lakers in point differential the same year. People think of their offense, but OKC somehow has the fourth best defense in the league. When your offense relies on Durant's incredible scoring, Westbrook's drives, and some efficient third options (Martin/Ibaka/Collison), and you can append elite defense, the road to the finals is already paved for you. I guess this is a way of saying the Rockets stand no chance. I think the most likely outcome is one game lost, but it's nearly a toss-up between that and a sweep. The Rockets, however, have an adjusted point differential much stronger than a typical 8-seed, and it's actually just behind the 2nd-seed Knicks. But the gap is too large between Houston and Oklahoma City. Maybe one game they'll rain down threes like it's all-star Saturday night, but this series is only worth watching for the Harden reunion/humiliation.

What to watch for:
-Harden should get an award if he has a good series here. The Rockets don't really have another good offensive option, and the Thunder know him well. He's going to be attacked.
-Asik's defense should get DPOTY-consideration because when he's on the court the team is somehow above average and would be ranked somewhere in between the Heat and Hawks for 10th, while on the bench the Rockets plummet to 27th, barely ahead of the lowly Hornets.
-Thabo has quietly become a great perimeter defender.
-The Lin/Parsons romance, which should be spun into a great novel someday.

Prediction: Thunder in five.

(2) San Antonio Spurs versus (7) Los Angeles Lakers

Predictions are difficult when you have little relevant information. Nash/Gasol/Howard without Kobe has been the lineup for five more minutes than Jordan/Wilt/Nixon was in 2013. It's really hard to ascertain Nash's contribution at this point, especially with all the rust he's collected; but a conservative estimate has him likely better than the horrible point guards LA has had all year. The series is down to Pau/Howard trying to destroy the Spurs' frontline; unfortunately, the Spurs have gone with Splitter/Duncan lineups for a long time, negating any size advantages. The Spurs are also one of the best ball movement and three-point shooting teams in the league -- just what the Lakers can't defend. Tony Parker's injury concerns could make this interesting, but the Lakers, obviously, have more problems and unlike San Antonio no depth to replace the talent. With a crazy series of uncertainty, it's best to think simple and go with a conservative estimate. A 2-seed versus a 7-seed? Duncan is healthy and playing well? Kobe's out? Yeah, the truth has to be stretched for the Lakers here.

What to watch for:
-Howard versus Duncan could be one for the ages. Howard has a chip on his shoulder and something to prove, finally healthy; Duncan is having one of the best old man seasons in NBA history and likely wants to destroy the Lakers.
-The Lakers' perimeter defense and defensive rotation are vital if they want a chance, which is far-fetched anyhow.
-The high/low game between Pau and Howard is fun to watch for anyone who loves NBA centers.
-Manu, we all miss you.
-T-Mac, is that really you?

Prediction: Spurs in five.

(3) Denver Nuggets versus (6) Golden State Warriors

The Warriors are marginally an above average team, while Denver has played like a dark horse contender lately. That's the series in a nutshell. I think Gallinari's injury is more of a concern than most people think because they'll miss his defense and outside shooting, but that's only a problem when they face real competition. Lawson has a nagging heel injury, complicating matters further, but it appears it may not be an issue. The Warriors' defense is not disciplined enough to exploit Denver's offensive weaknesses, and while they've played well in the last 25 games, Denver has been even better. Unless Lawson is hampered by injury, this series will be short.

What to watch for:
-Curry's three-point shooting was otherworldly. No one has ever combined his accuracy, volume, and difficulty as he's often shooting off the dribble and contested.
-JaVale McGee. I don't need to add to that.
-The Nuggets in transition.
-Andre Miller's famed old man game: his hesitation, his footwork, his deceptiveness, his clever moves, his alley-oops ... it's all basketball skill, not brawn.

Prediction: Nuggets in five.

(4) Los Angeles Clippers versus (5) Memphis Grizzlies

This is the highest quality series of the first round and one of the toughest to predict. Both are strong teams and better than anyone except Miami in the east. The Clippers have fallen off defensively and can't contend with the Grizzlies' strength inside. Tony Allen and Conley will harass Paul to no end. The Grizzlies have the best defense in the west (though they're still not west of the Mississippi.) However, LA has homecourt advantage, a larger point differential, and have played better since the all-star break; they even won the season series with a recent win that almost single-handedly decided their seeding. 

I see the Clippers winning in most instances, but the Grizzlies can still make it a tough fight. Memphis' defense  falls off a cliff when Marc Gasol sits since they have no good defensive big men on the bench; consequently, he'll see little rest. The Clippers also have no idea what to do at center at the end of games where Jordan's free-throw shooting is problematic. Odom at center is fine for Don Nelson, not a finals contender. I'd be more convinced of a five game series if LA knew what to do with its rotation -- Bledsoe is fun but when he plays with Paul it's easier for Allen to guard Paul, Billups should be in retirement, Willie Green shouldn't be in a playoff rotation, Caron Butler at his best is a wash, Crawford has a claim for worst defender (with big minutes) in the league, Grant Hill remembers Betamax, and Barnes has been so surprisingly effective that I'm afraid Del Negro will limit his minutes somehow. (Lakers let Barnes go sign a minimum contract and desperately could use him right now.) But they have homecourt advantage. The safest play here is for the Clippers.

What to watch for:
-With the high value of defensive big men, it's difficult for Tony Allen to ever win a DPOTY trophy, but he deserves some sort of plaque for his perimeter defense.
-Speaking of defense, Griffin is somehow underrated and is slowly starting to realize his athleticism can be used for blocking shots.
-Marc Gasol's intelligent defense.
-Above the rim : below the rim :: Blake Griffin : Zach Randolph. Somehow, Griffin doesn't take advantage of this.

Prediction: Clippers in seven.

For fun, here's an early prediction of the rest of the playoffs:
-Thunder over the Clippers in five games.
-Spurs over the Nuggets in seven games.
-Heat over the Nets in five games.
-Knicks over the Pacers in seven games.

-Thunder over the Spurs in five games. 
(Depends on Parker's health and if the ghosts of T-Mac and McGrady show up.)
-Heat over the Knicks in five games.

-Heat over the Thunder in seven games.