Posts Tagged baseball
Winter retains its grip up here in Minnesota at this time of the year, but the days are getting longer and the Twins are in camp down in Florida, so Spring fever is building. I can’t wait to get out to a game at Target Field with the sun shining and our nine flagging down fly balls and bashing them out of the park.
It will be interesting to see if the bloom comes off the rose of our new stadium now that our home team has stunk up the place for two years running. However, Minnesotans are so crazy to get outdoors after being stuck indoors for half the year that they may not care that their club has regressed to its mediocre mean.
According to this article in the latest Chance magazine, new stadiums do not make teams statistically more competitive. Yes, teams do increase payroll in conjunction with the greater revenues from the flocks of fans that come with their new digs. But this drops off in a year or two and things go back to the way they were.
I am not surprised. Nevertheless, I am positive that the Twins will come around this year and make it to the playoffs. That is the nature of a true baseball fan—hopeless optimism.
“GOOD with numbers? Fascinated by data? The sound you hear is opportunity knocking.” This is how Steve Lohr of the New York Times leads off his article in today’s Sunday paper on The Age of Big Data. Certainly the abundance of data has created a big demand for people who can crunch numbers. However, I am not sure the end result will be nearly as profitable as employers may hope.
“Many bits of straw look like needles.”
- Trevor Hastie, Professor of Statistics, Stanford University, co-author of The Elements of Statistical Learning (2nd edition).
I take issue with extremely tortuous paths to complicated models based on happenstance data. This can be every bit as bad as oversimplifications such as relying on linear trend lines (re Why you should be very leery of forecasts). As I once heard DOE guru George Box say (in regard to overly complex Taguchi methodologies): Obscurity does not equal profundity.
For example, Lohr touts the replacement of earned run average (ERA) with the “Siera”—Skill-Interactive Earned Run Average. Get all the deadly details here from the inventors of this new pitching performance metric. In my opinion, baseball itself is already complicated enough (try explaining it to someone who only follows soccer) without going to such statistical extremes for assessing players.
The movie “Moneyball” being up for Academy Awards is stoking the fever for “big data.” I am afraid that in the end the call may be for “money back” after all is said and done.
Today’s New York Times sports section provided this intriguing graphic on “putting a price tag on winning”. Their hometown Yankees stand out as the big spenders by far. It paid off in wins over the last decade – the period studied. However, if you cover up the point depicting the Yanks, the graph makes a far less compelling case that salary buys wins – mainly due to counteractive results enjoyed by two low-payroll teams: the Minnesota Twins and the Oakland Athletics.
I found similar patterns and, more importantly, data to reproduce these, in this study of MLB Payroll Efficiency, 2006-2008 by Baseball Analyst Rich Lederer. No offense to Rich or the NY Times – it is the damn Yankees (sorry but I am weary of them defeating the Twins every post-season) who are to blame for this flaw in drawing conclusions from this data: One point exerts undue leverage on the fit, which you can see on this diagnostic graph generated by Design-Expert® software.
However, after doing the obvious thing – yanking the Yanks from the data – the conclusion remains the same: Higher payroll translates to more wins in Major League Baseball. Here are the stats with/without the Yankees:
- R-squared: 0.41/0.34
- Wins per $ million of payroll (slope of linear fit): +0.13/0.16
In this case, a high-leverage point does not exert its potential influence; that is, the end result does not change despite its extreme location. If you’d like to simulate how leverage impacts fit, download this educational simulation posted by Hans Lohninger, Associate Professor of Chemometrics at Vienna University of Technology.
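For anyone who would rather tinker in code, here is a minimal sketch of the with/without comparison described above. The payroll and win figures are made up for illustration (they are not Lederer's data or the NY Times numbers), and the function names `fit_line` and `leverage` are my own. It fits a straight line by least squares and computes each point's leverage, the diagonal of the hat matrix, which for a simple linear fit depends only on how far a point's x-value sits from the mean.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (slope, intercept, r_squared)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (intercept + slope * x)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return slope, intercept, 1.0 - ss_res / ss_tot

def leverage(x):
    """Hat-matrix diagonal for a straight-line fit: 1/n + (x - mean)^2 / Sxx."""
    x = np.asarray(x, dtype=float)
    dev_sq = (x - x.mean()) ** 2
    return 1.0 / len(x) + dev_sq / dev_sq.sum()

# Hypothetical payrolls ($ million) and season wins; the last team is the big spender.
payroll = np.array([40, 55, 60, 70, 75, 80, 90, 95, 100, 110, 200], dtype=float)
wins = np.array([75, 70, 85, 78, 88, 80, 84, 90, 82, 92, 97], dtype=float)

h = leverage(payroll)
slope_all, _, r2_all = fit_line(payroll, wins)
slope_cut, _, r2_cut = fit_line(payroll[:-1], wins[:-1])

# A common rule of thumb flags leverage above twice the average, 2 * p / n,
# where p = 2 coefficients (intercept and slope).
cutoff = 2 * 2 / len(payroll)
print(f"Big spender leverage: {h[-1]:.2f} (rule-of-thumb cutoff {cutoff:.2f})")
print(f"Slope with/without: {slope_all:.3f}/{slope_cut:.3f}")
print(f"R-squared with/without: {r2_all:.2f}/{r2_cut:.2f}")
```

High leverage only means a point has the potential to swing the fit. Whether it actually does shows up in how much the slope and R-squared move once the point is dropped.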
Twins fever is running rampant now in Minnesota. The home baseball club is leading their division and riding a wave of popularity with a new stadium and a home-town hero (Joe Mauer – a fellow graduate of my high school in Saint Paul). Over dinner this week with a colleague and a master statistician in town for a visit, the talk turned to the Twins and who now should be considered their ace pitcher – Carl Pavano or Francisco Liriano. Although appreciative of Pavano’s consistently good performance over the entire year, I felt that Liriano has come on stronger in the second half of the season. Having reached a stalemate, the three of us agreed that the data might tell the story.
I found everything I needed to make my case for Liriano at ESPN’s statistics site for Major League Baseball. Here is the pitching “split” for the second half of the MLB season so far for Liriano vs Pavano, respectively:
- Win-Loss: 7-0 vs 6-4
- WHIP (walks and hits per inning pitched): 1.27 vs 1.32
- ERA (earned run average): 2.22 vs 3.41
Pavano is good, but Liriano is my pick as the current ace of the Minnesota Twins pitching staff. Why argue with words? Let the data speak.
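For readers new to these two metrics, the standard definitions are simple enough to sketch in a few lines. The function names and the sample pitching line below are made up for illustration; they are not Liriano's or Pavano's actual totals. Note that box scores write a third of an inning as ".1", so innings need converting to true fractions first.

```python
def whip(walks, hits, innings):
    """Walks plus hits per inning pitched."""
    return (walks + hits) / innings

def era(earned_runs, innings):
    """Earned runs allowed per nine innings pitched."""
    return 9 * earned_runs / innings

def innings_from_outs(outs):
    """Convert outs recorded to true innings (3 outs per inning)."""
    return outs / 3

# Hypothetical pitching line: 270 outs (90 innings), 30 walks, 78 hits, 30 earned runs.
ip = innings_from_outs(270)
print(f"WHIP: {whip(30, 78, ip):.2f}")
print(f"ERA: {era(30, ip):.2f}")
```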
I managed to procure a seat to the opener yesterday for the new Minnesota Twins baseball stadium (Target Field) in Minneapolis. Although many questioned the wisdom of leaving it open to the elements, dire predictions of early games being snowed out did not materialize, at least this year. In fact, we enjoyed an unseasonably warm day while watching the hometown club defeat the Boston Red Sox to rousing cheers of the nearly 40,000 fans in attendance (39,715 to be precise). Many of the statistics for the landmark game are captured in this ESPN boxscore. Oh, oh, here I see an anomaly – the attendance reported at only 38,145 (96.6% full). My hunch is that the other 1,570 fans might be found at hometown hero Hrbek’s Bar – a spacious gathering spot in the stands of Target Field.
Although the pre-game festivities and competition provided great entertainment, I eagerly awaited the breaks between half-innings to peruse the details in the Minnesota Twins 2010 Record and Information Book – a 396 page tome filled with 6 point type. Nearly every page features a statistic biased in favor of the team or a particular player. This is done by focusing on one specific attribute and then choosing the time frame which puts it in the most flattering light. For example, we learn on page 122 that “Anthony Swarzak became the first starting pitcher in club history to pitch 7.0 scoreless innings in his Major League debut.” I like the way they carry the statistic to the tenths decimal place. : )
When offered a free $149 seat behind home plate, I abandoned my desk last Wednesday for a matinee game between Minnesota and Cleveland. It was dollar-a-dog day so I enjoyed a $2 lunch washed down by what the Wall Street Journal (9/11/09) considers to be one of the Major Leagues’ best-valued beers. They figure that the cost per ounce of brew – 27 cents at the Metrodome – should be based solely on the quality of the team.
The Twins have been consistent winners for some years and remain in the hunt for the Central Division of the American League after winning 2 out of 3 from the Detroit Tigers this weekend. Nevertheless, they chronically play second-fiddle to East Coast teams like the Boston Red Sox. WSJ reports that the crimson-hosers’ home field, Fenway Park, charges more than twice as much for beer as the Twins do – far more than the Red Sox merit by their winning percentage – a 46% overcharge, to be precise. Boston fares poorly on price-to-value across the board according to this in-depth analysis by a blogger writing as “JinAZ” for Beyond the Box Score – a Saber*-Slanted Baseball Community.
However, being a lifelong fan of baseball, and having been to many ballparks around the League, but never Fenway, I’d gladly pay a premium to see the Red Sox some day. (That explains why they can charge so much!)
*Referring to sabermetrics, the statistical analysis of baseball; a sabermetrician is a statistician specializing in baseball
On May 17 I reported that sabermetrician Clay Davenport computed odds of 200 to 1 against the Minnesota Twins making the Major League Baseball playoffs. Guess what? Not only did they achieve a place in the post-season, they won their divisional championship. Granted, it was very unlikely the way the Twins turned their season around, and it was downright surrealistic for them to end up in first after their very last game. As I blogged earlier, statistics be damned by what we now know:
Twins win improbable division title.
One might do well by betting on the Twins when they are down again in future. For example, at the beginning of the 1991 baseball season, odds on 1990’s last-place Twins winning that year’s title were 100-1. They ended up as the World Series champs. Similarly, in 1987 the Twins went all the way in Major League Baseball. Prior to that season I went to Las Vegas for a conference and saw a betting board with odds at 100 to 1 against the Twins winning the championship. Ever since I’ve second-guessed myself for not betting anything – even $10 would have netted me $1000! Unfortunately, I am a man of little faith in the face of such overwhelming statistics.
PS. Post-season results were not good – three games and out for the Twins in their playoff series with Oakland. This is a triumph for sabermetrics because the Athletics are led by its biggest proponent – Billy Beane.
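To put the odds quoted above in perspective, here is a small sketch of the arithmetic. The function names are my own, and it treats "N to 1 against" at face value. A betting board builds in a margin for the house, so its implied probability is only a rough guide to the true chance.

```python
def implied_probability(odds_against):
    """Chance of winning implied by odds of 'odds_against to 1'."""
    return 1 / (odds_against + 1)

def profit(stake, odds_against):
    """Winnings on top of the returned stake if the long shot comes in."""
    return stake * odds_against

# 200 to 1 against making the playoffs; 100 to 1 against winning the title.
print(f"200 to 1 implies about {implied_probability(200):.2%}")
print(f"100 to 1 implies about {implied_probability(100):.2%}")
print(f"$10 at 100 to 1 nets ${profit(10, 100)}")
```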