Wednesday, March 9, 2011

How to Pick a Starter Part I

There will, in the upcoming baseball season, be many, many blogs trying to identify those pitchers who've been unlucky (e.g. James Shields), and should be acquired at all costs, and those pitchers who've got great numbers which are little more than luck (e.g. Trevor Cahill), and therefore must be traded ASAP to some poor sap who doesn't follow advanced statistics.

So which stats should we use to judge future pitcher performance? Can stats divide the good from the lucky and the unlucky from the rubbish? To keep things simple, we'll begin by looking at what past ERA tells us about future ERA.

The Method

From Fangraphs, I downloaded the pitching stats of all qualified starters going back to 2005. In this initial analysis, I asked how consistent a pitcher's ERA is from year to year. To do this, I plotted each pitcher's ERA in year 1 against his ERA in year 2. To increase the sample size (n=267), I repeated this for every pair of consecutive years: today's graph combines 2005 ERA vs 2006 ERA, 2006 ERA vs 2007 ERA and so on, all the way through to 2010. I've treated each pitcher's pair of consecutive seasons as an individual data point, so many pitchers' stats appear more than once. By looking at the correlation across all these year-to-year pairs, we can judge how good ERA is at predicting future success (at least as measured by ERA).
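For anyone who wants to build the same kind of data set, here is a minimal sketch of the pairing step in Python. It assumes the stats have already been reduced to (pitcher, season, ERA) rows. The function name `consecutive_year_pairs` and the toy rows are made up for illustration and are not real Fangraphs numbers.

```python
from collections import defaultdict


def consecutive_year_pairs(rows):
    """Return (era_year1, era_year2) for every pitcher who appears in two consecutive seasons.

    rows: iterable of (pitcher, season, era) tuples.
    """
    # Group each pitcher's seasons into a {season: era} lookup.
    by_pitcher = defaultdict(dict)
    for pitcher, season, era in rows:
        by_pitcher[pitcher][season] = era

    pairs = []
    for seasons in by_pitcher.values():
        for season, era in sorted(seasons.items()):
            # Only keep the pair if the very next season is also present.
            if season + 1 in seasons:
                pairs.append((era, seasons[season + 1]))
    return pairs


# Made-up toy rows (hypothetical pitchers, not real data).
toy_rows = [
    ("Pitcher A", 2005, 3.50),
    ("Pitcher A", 2006, 4.10),
    ("Pitcher A", 2007, 3.80),
    ("Pitcher B", 2005, 4.60),
    ("Pitcher B", 2007, 4.20),  # no 2006 season, so Pitcher B yields no pair
]

pairs = consecutive_year_pairs(toy_rows)
print(pairs)
```

Note that Pitcher A contributes two data points (2005-2006 and 2006-2007), which is exactly how one pitcher can appear more than once.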

How good is this year's ERA at predicting next year's ERA?


We can measure the correlation between data points by calculating the R^2 for a line of best fit through the data points. If every pitcher's ERA were identical from year to year, R^2 would equal 1. Obviously this is impossible. Instead, the (low) R^2 of 0.13 for the correlation between this* year's ERA and next* year's ERA means that this year's ERA explains only about 13% of the variation in next year's ERA. In other words, an individual pitcher's ERA is very variable from year to year.
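As a rough illustration of where that number comes from: for a straight line of best fit through (year 1, year 2) pairs, R^2 is the square of the ordinary correlation coefficient. The sketch below computes it in plain Python. The function name `r_squared` and the toy pairs are assumptions for illustration, not the actual data behind the graph.

```python
def r_squared(pairs):
    """R^2 of a least-squares straight line fitted through (x, y) pairs.

    For a one-variable linear fit this equals the squared Pearson correlation.
    """
    n = len(pairs)
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of squares and cross-products around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)

    return (sxy * sxy) / (sxx * syy)


# Made-up toy pairs, not real ERAs.
identical = [(3.0, 3.0), (4.0, 4.0), (5.0, 5.0)]  # same value both years
noisy = [(1.0, 1.0), (2.0, 3.0), (3.0, 2.0)]

print(r_squared(identical))  # 1.0
print(r_squared(noisy))      # 0.25
```

The first toy set mirrors the "identical from year to year" case and gives an R^2 of 1, while the second shows how quickly the value drops once the year 2 numbers wander.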

Helpfully, this data also gives a baseline. We can ask whether a given pitching stat (e.g. WHIP, K:BB ratio, FIP, xFIP) is better (R^2 > 0.13) or worse (R^2 < 0.13) than ERA at predicting future ERA, and in this way identify the stat that is the best predictor of future pitching success. Those exciting analyses are coming very soon.

*where this and next can be any pair of consecutive years.
