Friday, March 18, 2011

The predictive power of K:BB ratio: you'd be better off using a psychic

Ah, K:BB ratio, beloved metric of the ESPN fantasy baseball analyst.

By the end of last season, ESPN must've commented upon every pitcher's K:BB ratio in at least one fantasy baseball 'news' bulletin. The assumption appeared to be that a great K:BB ratio indicated that, despite a bad ERA, better days were on the horizon (hello James Shields, Jason Hammel, Scott Baker et al), whilst a bad K:BB ratio paired with a good ERA suggested luck and a forthcoming regression to the mean (hello Trevor Cahill).

Trouble is, K:BB ratio is not very predictive - i.e. it is easy to have a bad K:BB ratio and good ERA or a good K:BB ratio and bad ERA. This is quite evident from the list of last year's qualified starters ranked by K:BB ratio - in amongst the good pitchers like Roy Halladay and Cliff Lee, you'll see the undraftable likes of Doug Fister and Rick Porcello.









Qualified Starters sorted by K:BB ratio

And here is the predictive value of K:BB ratio in graphical form (for the method behind the madness, see How to Pick a Starter Part I):


An R^2 of 0.05 is rubbish. It is lower than the R^2 of 0.13 for ERA itself, which means that K:BB ratio is less predictive of future performance than ERA - i.e. you'd be better off assuming that a pitcher with a high ERA will continue to have a high ERA and vice versa, ignoring the K:BB.

There are things that are better than ERA at predicting future performance, but K:BB ratio ain't one of them. Stay tuned to find out what they are.

Thursday, March 10, 2011

Sabermetrics and Moneyball are wrong

A new book that revels in small sample sizes and random pigeons (probably, I haven't read it) conclusively proves that Moneyball and Sabermetrics are wrong. After all, what have the Red Sox won since Theo Epstein took charge?

http://www.prweb.com/releases/2011/03/prweb5139404.htm

Wednesday, March 9, 2011

How to Pick a Starter Part I

There will, in the upcoming baseball season, be many, many blogs trying to identify those pitchers who've been unlucky (e.g. James Shields), and should be acquired at all costs, and those pitchers who've got great numbers which are little more than luck (e.g. Trevor Cahill), and therefore must be traded ASAP to some poor sap who doesn't follow advanced statistics.

So which stats should we use to judge future pitcher performance? Can stats divide the good from the lucky and the unlucky from the rubbish? To keep things simple, we'll begin by looking at what past ERA tells us about future ERA.

The Method

From Fangraphs, I downloaded the pitching stats of all qualified starters going back to 2005. In this initial analysis, I asked how consistent a pitcher's ERA is from year to year. To do this, I plotted each pitcher's ERA in year 1 against his ERA in year 2, and to increase the sample size (n=267) I repeated this for each pair of consecutive years - for instance, today's graph pools 2005 ERA vs 2006 ERA, 2006 ERA vs 2007 ERA etc, all the way through to 2010. I've treated each pitcher's yearly pair as an individual data point, so many pitchers' stats appear more than once. By looking at the correlation across all these year-to-year pairs, we can judge how good ERA is at predicting future success (at least as measured by ERA).
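For anyone who wants to try the pairing step at home, here is a minimal sketch in Python. It is not the spreadsheet I actually used: the function name, the row layout and the pitchers and ERAs in it are all made up for illustration, and it assumes you have already boiled the Fangraphs download down to (pitcher, year, ERA) rows for qualified starters.

```python
from collections import defaultdict

def consecutive_year_pairs(rows):
    """Turn (pitcher, year, era) rows into (pitcher, year, era_year1, era_year2)
    tuples, one per pair of consecutive seasons a pitcher appears in."""
    by_pitcher = defaultdict(dict)
    for pitcher, year, era in rows:
        by_pitcher[pitcher][year] = era

    pairs = []
    for pitcher, seasons in by_pitcher.items():
        for year in sorted(seasons):
            # Only keep a season if the pitcher also qualified the following year.
            if year + 1 in seasons:
                pairs.append((pitcher, year, seasons[year], seasons[year + 1]))
    return pairs

# Hypothetical rows: invented names and ERAs, NOT real Fangraphs data.
rows = [
    ("Pitcher A", 2005, 3.50),
    ("Pitcher A", 2006, 4.10),
    ("Pitcher A", 2007, 3.80),
    ("Pitcher B", 2005, 4.60),
    ("Pitcher B", 2007, 4.20),  # no 2006 season, so Pitcher B yields no pairs
]

pairs = consecutive_year_pairs(rows)
for pair in pairs:
    print(pair)
```

Pitcher A shows up twice here (2005-2006 and 2006-2007), which is exactly the "many pitchers appear more than once" effect described above.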

How good is this year's ERA at predicting next year's ERA?


We can measure the correlation between data points by calculating the R^2 for a line of best fit through them. If every pitcher's ERA were identical from year to year, R^2 would equal 1. Obviously this is impossible. Instead, the (low) R^2 of 0.13 for the correlation between this* year's ERA and next* year's ERA suggests that an individual pitcher's ERA is very variable from year to year.
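If you'd like to see what goes into that R^2, here is a rough sketch of the calculation in plain Python, assuming an ordinary least-squares line of best fit with year 1 ERA on the x-axis and year 2 ERA on the y-axis. The function name and the sample ERAs are hypothetical, and this is not necessarily how the graphing software does it under the hood.

```python
def r_squared(xs, ys):
    """R^2 of the ordinary least-squares line of best fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # Least-squares slope and intercept of the line of best fit.
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R^2 = 1 - (unexplained variation / total variation).
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_res / syy

# Hypothetical ERAs, invented for illustration only.
year1_era = [3.20, 3.90, 4.50, 5.10]

# If every pitcher repeated his ERA exactly, R^2 is 1 (up to rounding).
print(r_squared(year1_era, year1_era))

# Scramble next year's ERAs and the fit gets much worse.
year2_era = [4.40, 3.30, 4.90, 3.70]
print(r_squared(year1_era, year2_era))
```

With only one predictor, this R^2 is the same thing as the square of the ordinary correlation coefficient between the two columns.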

Helpfully, this data also gives a baseline, so we can ask whether a given pitching stat (e.g. WHIP, K:BB ratio, FIP, xFIP) is better (R^2 > 0.13) or worse (R^2 < 0.13) than ERA at predicting future ERA, and in this way identify the stat that is the best predictor of future pitching success. Those exciting analyses are coming very soon.
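The baseline test is simple enough to write down as a sketch. The function names below are hypothetical. The only numbers in it are the two R^2 values quoted on this blog: 0.13 for ERA, and 0.05 for K:BB ratio from the March 18 post. No values are assumed for WHIP, FIP or xFIP.

```python
# Baseline from the ERA vs ERA analysis: R^2 of 0.13.
ERA_BASELINE_R2 = 0.13

def verdict(stat_r2, baseline=ERA_BASELINE_R2):
    """Say whether a stat beats ERA at predicting next year's ERA."""
    if stat_r2 > baseline:
        return "better than ERA"
    if stat_r2 < baseline:
        return "worse than ERA"
    return "same as ERA"

def rank_stats(r2_by_stat):
    """Sort stats from most to least predictive by their R^2."""
    return sorted(r2_by_stat.items(), key=lambda item: item[1], reverse=True)

# Only the two R^2 values quoted on this blog.
quoted = {"ERA": 0.13, "K:BB": 0.05}

for name, r2 in rank_stats(quoted):
    print(name, r2, verdict(r2))
```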

*where this and next can be any pair of consecutive years.