More on the Challenge

January 14th, 2009

I have to admit that even though I haven’t been writing about it, thoughts of this challenge haunt me all day long.  I have always put pressure on myself, but this is getting a bit out of hand.  I go to bed and lie awake thinking about possibly tweaking my figures a bit, and then I wake up and don’t exactly remember what I thought about the night before.

Regardless, I’m getting closer.  I would like to note that I ran multiple regressions to predict plate appearances from previous years’ totals, and it looks like Marcel is pretty close.  This is good to know, but it is also a double-edged sword.

Fact is, I was hoping to come up with something better, but Tango is right.  Pretty much all projection systems are about the same amount right and the same amount wrong.  Problem is that if I base mine on what Marcel is doing, I am going to correlate to Marcel at something like .990, which is ridiculous, I think.  I want to come up with something different that ends up somewhere in the middle of the pack of all these other projection systems.  Anything in the top half of the table would be considered a victory for me. However, judging by the scoring system, where only the top 5 out of 25 people or so score points in each “league”, I would just like to earn some points.

Fact is, about 600 batters go to the plate at least once every year, and I have to figure out who all of them are going to be.  I have noticed that the number has been slowly rising since 1990, which is as far back as I have checked.  I bet that has something to do with the Kenny Lofton phenomenon, where a player could contribute, but no one signs him because of his age and to save money.  Then, you end up with three different utility players sharing time between the minors and the majors making up for where Kenny would have played.  Is it good or bad for baseball?  I don’t know, but there’s probably a bunch of reasons I’m not even thinking of (like expansion).  By the way, about 500 of those 600 played the year before, so the trick will be coming up with the other 100, and especially the 10% of them (see Evan Longoria) that actually get some good playing time.

Anyway.  My plate appearance correlation is higher and my average error is lower than that of Marcel at the moment, but I am lower in correlation in things like home runs and RBIs.  However, my average error is right on, where Marcel ends up being off across the board.  The problem is that I have no idea if what I have is beating Marcel or losing.  It’s not really important either way, but if I could regularly beat Marcel, I would at least have a chance in the challenge.
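For reference, the yardsticks used in these comparisons (correlation, average error, and absolute error) can be sketched in a few lines of Python.  This is only an illustration: the function names and the sample numbers are made up, and the sign convention (projected minus actual) is an assumption.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def mean_error(projected, actual):
    """Average signed miss; near zero means no overall bias high or low."""
    return sum(p - a for p, a in zip(projected, actual)) / len(projected)

def mean_abs_error(projected, actual):
    """Average size of the miss, ignoring direction."""
    return sum(abs(p - a) for p, a in zip(projected, actual)) / len(projected)

# Hypothetical plate appearance projections versus what actually happened.
projected = [500, 300, 650, 120]
actual = [520, 280, 640, 150]

print(pearson(projected, actual))
print(mean_error(projected, actual))      # -5.0
print(mean_abs_error(projected, actual))  # 20.0
```

A system can have a small average error and still a large absolute error, because misses in opposite directions cancel in the first and not in the second.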

So where am I at?  Well, I am pretty happy with my plate appearance formula.  I have two versions, and both beat Marcel in correlation, average error and absolute error.  In one, I take half of last season, add in a couple tenths of a three-year average and then add 100.  In the other, I take 65% of last season, add in 10% of the season before and add about 85.  After that, it gets tougher.
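Written out, the two candidate formulas might look like the sketch below.  The function names are made up, "a couple tenths" is assumed to mean 0.2, and the sample player is hypothetical.

```python
def pa_formula_a(last_pa, three_year_avg, avg_weight=0.2):
    """Half of last season, plus a fraction of the three-year average, plus 100.

    avg_weight=0.2 is an assumption standing in for "a couple tenths".
    """
    return 0.5 * last_pa + avg_weight * three_year_avg + 100

def pa_formula_b(last_pa, prior_pa):
    """65% of last season, plus 10% of the season before, plus about 85."""
    return 0.65 * last_pa + 0.10 * prior_pa + 85

# Hypothetical player: 600 PA last year, 580 the year before, 620 before that.
seasons = [600, 580, 620]
three_year_avg = sum(seasons) / len(seasons)  # 600.0

print(round(pa_formula_a(seasons[0], three_year_avg)))  # 520
print(round(pa_formula_b(seasons[0], seasons[1])))      # 533
```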

I’ve been messing with the weights, and I have found that my HR correlation jumped up when I used a 9/6/4 weighting (Marcel uses a 5/4/3).  I regress to the mean with 1000 PA, which ends up being much less regression than Marcel, because my multipliers are higher.  Marcel’s multipliers add up to 12 and he uses 1200 PA, or 100 PA per unit of weight.  Mine add up to 19 and I use 1000, or about 53 per unit, so it’s about half.
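To make the "about half" arithmetic concrete, here is a minimal sketch.  It assumes the weights and the regression PA combine the way Marcel's rate projection is usually described: weighted stat totals plus league-average performance over the regression PA, divided by weighted PA plus the regression PA.  The function names, the league HR rate and the sample player are all hypothetical.

```python
def regressed_rate(stats, pas, weights, league_rate, regress_pa):
    """Weighted per-PA rate, pulled toward league_rate by regress_pa extra PA.

    stats, pas and weights are ordered most recent season first.
    """
    weighted_stat = sum(w * s for w, s in zip(weights, stats))
    weighted_pa = sum(w * p for w, p in zip(weights, pas))
    return (weighted_stat + league_rate * regress_pa) / (weighted_pa + regress_pa)

def regression_per_weight(weights, regress_pa):
    """Regression PA per unit of weight; smaller means less regression."""
    return regress_pa / sum(weights)

print(regression_per_weight((5, 4, 3), 1200))  # 100.0
print(regression_per_weight((9, 6, 4), 1000))  # a bit under 53

# Hypothetical hitter: 30, 25 and 20 HR in 600 PA each, most recent first.
hr = [30, 25, 20]
pa = [600, 600, 600]
LEAGUE_HR_RATE = 0.03  # assumed league average HR per PA, illustration only

marcel_rate = regressed_rate(hr, pa, (5, 4, 3), LEAGUE_HR_RATE, 1200)
my_rate = regressed_rate(hr, pa, (9, 6, 4), LEAGUE_HR_RATE, 1000)
print(marcel_rate, my_rate)
```

For this above-average hitter, the 9/6/4 version stays further from the league rate, which is what lighter regression should do.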

But I am only happy with Home Runs at the moment.  I would like to have a different multiplier for each statistic that I need.  I am aware the formula only calls for HR + SB + (H - .27*AB) + R/3 + RBI/3, but I am going to need to add in 2B and 3B and SB and CS, as my R and RBI statistics are based on regressions of those statistics.
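The scoring formula quoted above is simple enough to write down directly.  In this sketch the function name and the sample stat line are made up; only the formula itself comes from the challenge.

```python
def challenge_points(hr, sb, h, ab, r, rbi):
    """HR + SB + (H - .27*AB) + R/3 + RBI/3, as quoted for the challenge."""
    return hr + sb + (h - 0.27 * ab) + r / 3 + rbi / 3

# Hypothetical stat line: 30 HR, 10 SB, 170 H in 600 AB, 90 R, 99 RBI.
points = challenge_points(hr=30, sb=10, h=170, ab=600, r=90, rbi=99)
print(round(points, 1))  # 111.0
```

The (H - .27*AB) term only rewards hits beyond a .270 average, so a .270 hitter gets nothing from it no matter how many at-bats he has.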

I think where I am going to end up is that I come up with a different multiplier for each and every statistic, and then I use my regression formulas for R and RBI.  I then average my “projections” based on multipliers and the regression projections to come up with a third number for R and RBI.  I will not do this for HR, SB or H, as those statistics are not based on other factors, such as the batter in front or behind the person in question.

And then, I will tackle pitching.  Man, am I running out of time.  I hope a) that I don’t embarrass myself and b) that this happens again next year, because I have thoroughly enjoyed thinking about this so much and I believe that I can come up with something in the next year that will have a shot.

But we’ll have to wait and see.

Forecasting

Digging Deeper

December 22nd, 2008

I took a deeper look at Marcel, and it looks like I was a bit incorrect.  It seems that it can definitely project a player’s worst/best season, for one.  For another, using three years is likely good enough to produce as accurate a projection as anyone is going to get.  If a player like Jimmy Rollins gets 700 plate appearances three years in a row, you’d better believe that a guess of around 700 should come for the next year.  You can’t guess that a player that hasn’t been injured in the past three years would be injured or have any different amount of playing time than the year before, unless you heard it in the news or something.

And that’s not what I’m after.  I do intend, in the challenge, depending on the “due date,” to at least take a look at each team’s depth chart on MLB.com, but I likely won’t change much, save for possibly inflating some plate appearances for rookies.  Even if I were to do that, I would never have hit on Evan Longoria last year, as he wasn’t even on the opening day roster.

So, the goal is to be close and, moreover, closer than others.  I think I can do that on some levels.  I believe Marcel is way off in determining plate appearances.  I mean, it takes a safe road when it comes to guessing.  But go back to Jimmy Rollins, or just pick a player who gets 700 plate appearances year in and year out (see:  Bobby Abreu 2004, 2005 and 2006).  Marcel would take 50% of Abreu’s 686 from 2006, add 10% of his 719 from 2005, and then add 200, giving Abreu a projection of about 615.  That’s not too bad, but it seems to undercut the average a bit.  Secondly, it regresses everyone to around an average of 500 plate appearances, which would be good if we were looking only at everyday players.  Fact of the matter is, however, there are eight position players on each of 30 Major League teams.  This means that, at most, 240 players will reach the 502 plate appearance plateau.  Another fact is that over twice that number of position players get plate appearances every year.  The bulk of the players get below that amount.
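The Abreu arithmetic and the 240-player ceiling can be checked with a short sketch.  The function name is made up; the 50%, 10% and 200 figures are the Marcel plate appearance weights as described in this post.

```python
def marcel_pa(last_pa, prior_pa):
    """Marcel-style PA projection as described here: 50% of last season,
    10% of the season before, plus a flat 200."""
    return 0.5 * last_pa + 0.1 * prior_pa + 200

# Bobby Abreu: 686 PA in 2006, 719 PA in 2005.
abreu_projection = marcel_pa(686, 719)
print(round(abreu_projection))  # 615

# Eight position players on each of 30 teams caps the number of full-timers.
max_everyday_players = 8 * 30
print(max_everyday_players)  # 240
```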

The truth is that the average number of plate appearances for a position player in Major League Baseball is about 340-350 each season.  This means that regressing the number of plate appearances to 500 is too much.  Moreover, the average player who gets over 340 trips to the plate one year ends up with fewer the next year.  The same works in reverse on the other side of the line.  I feel the best thing to do is to attempt to regress the plate appearances toward 350 instead of 500 like Marcel does.
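One way to see where the 500 comes from: a player who gets the same x plate appearances every year is projected at 0.5x + 0.1x + 200, and that equals x only when x is 500.  The sketch below works that out and shows what flat add-on would aim at 350 instead.  Keeping Marcel's 50% and 10% weights is an assumption made purely for illustration, and the function names are made up; this is not necessarily the formula that ends up being used.

```python
def steady_state_pa(weight_last, weight_prior, constant):
    """PA level that projects to itself when a player repeats it every year.

    Solves x = weight_last * x + weight_prior * x + constant for x.
    """
    return constant / (1 - weight_last - weight_prior)

def constant_for_target(target_pa, weight_last, weight_prior):
    """Flat add-on that makes target_pa the level everyone is pulled toward."""
    return target_pa * (1 - weight_last - weight_prior)

print(round(steady_state_pa(0.5, 0.1, 200)))      # 500
print(round(constant_for_target(350, 0.5, 0.1)))  # 140
```

With those weights, adding 140 instead of 200 would pull players toward 350: anyone above that level is projected for fewer plate appearances than he keeps posting, and anyone below it for more.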

Also, from 1996-2007, the average number of plate appearances for a first-timer was about 100, not the 200 that Marcel projects.  Since all I am going to have time to do is to look at the league averages for these players, I would rather weed them out by starting them out at 100 instead of 200.  200 plate appearances is a lot.  Ramon Santiago spent the entire 2008 season in Detroit and got 156.  I don’t want to over-project (though I bet a lot of people, including me, miss on his numbers this coming year).

Forecasting

Forecasters’ Challenge

December 12th, 2008

I have been given the chance to compete in the forecasters’ challenge.  Not that I know too much about extrapolation or what have you, but I am still interested in giving it a shot.  I got the confirmation email over a month ago, and have been trying a few things, but I really got started today.

I have a two-tiered goal for this.  First, I don’t want to do so horribly that I lose to simply taking the previous year’s statistics as the system.  I think I can beat that.  However, I want to win, which would be the second tier.  The fact of the matter is, though, that even the best system might not win.  There is a lot of luck involved and, as has been shown, some of the systems are very close to one another when all is said and done.

Marcel is very simple, and though I have only about enough mathematical knowledge to compete with it, I feel I can do better.  I haven’t really looked at PECOTA because it requires being a Baseball Prospectus member (I let my subscription lapse), but here are some of the problems I have with a lot of projections that I have seen.

1.  They only use the last few years.  Fact of the matter is that just about half of the position players in a given season have five years’ experience or more.  It is entirely possible that it is not necessary to use more than two or three years, but I at least would like to look into the possibility of using as much history as each player has, if it would help.

2.  They are not set up to project a player having his best or worst ever season.  Even worse, they are not set up to project a player having his best or worst season in the past few years.  Even though Marcel regresses to the mean, unless a player’s last three seasons are very closely related and they are all far enough above or below the mean to matter, it is unlikely this will happen.  The age factor is in there as well, but for all intents and purposes, the score will end up somewhere between the highest and lowest of the last three seasons.  I’m not sure I can better this, but I would like to look at it a bit.

3.  I understand they have to come out relatively early to help people that want them, but I still think that they can do a bit better on new players.  This would take time, but maybe a plate appearance guess based on the 25- or 40-man roster, rather than using 200 like Marcel, might do something.  I don’t know.

What I do like about Marcel is that it figures plate appearances first and then figures secondary statistics based on those plate appearances.  The first thing I will do is attempt to determine if I can come up with a formula to extrapolate plate appearances that has a better correlation coefficient to true numbers than Marcel.  If I can do that, I, at least, don’t think I will be embarrassed.

Forecasting