Tuesday, March 15, 2016

Baseball Analytics: SKYNET Is Activated

I attended the fifth annual Society for American Baseball Research (SABR) Analytics Conference held in Phoenix from March 10-12.  Among the 250 to 300 attendees were several seminal figures in the development of baseball analytics and baseball history.  It was a thrill to see John Dewan, Dick Cramer and John Thorn sitting together a couple of rows ahead of me.

For those not familiar with the topic, analytics is:
"the discovery and communication of meaningful patterns in data [and] relies on the simultaneous application of statistics, computer programming and operations research to quantify performance."
Analytics are used in many fields of human activity.  In baseball, the best known of the analytic approaches is sabermetrics, the empirical analysis of the sport or, as Bill James described it, "the search for objective knowledge about baseball". 

Readers of this blog are aware by now of THC's love of baseball history.  I also started reading Bill James in the early 1980s and have followed developments on the analytical side, although some aspects in recent years have been a bit mystifying to me. The conference was a good opportunity to immerse myself in its current status, though I'll admit that the math in a couple of presentations went over my head.

The format of the conference is a mixture of panels and research presentations.  The panels include analysts and baseball front office personnel, this year from the Diamondbacks, Orioles, Giants, Mariners and Padres, as well as the new General Managers for the Reds (Dick Williams) and Angels (Billy Eppler, with whom I was particularly impressed, despite the fact that he used to work for the Yankees).  The panels were outstanding.  I've attended a lot of conferences and it was evident that a lot of thought went into the topics and the mix of panelists.  The three moderators, Brian Kenny of MLB Network, Vince Gennaro, President of SABR, and Mike Ferrin from Arizona Diamondbacks/MLB Network Radio, did a terrific job asking questions and keeping the discussions on an interesting track.

Also participating on the panels were four former major leaguers, all now analysts for either ESPN or MLB Network: Aaron Boone, who once did a very bad thing to a pitch from Tim Wakefield; the blunt and sarcastic Dallas Braden (winner of the much coveted Best Facial Hair trophy); the startlingly spontaneous and unfiltered Eric Byrnes - you never knew where he'd land when he launched one of his verbal excursions, but following the trajectory was always entertaining and informative; and Alex Cora, who was very insightful (more on Alex, below).

(Aaron Boone, Eno Sarris from FanGraphs, Dallas Braden and Mike Ferrin, from Sports Illustrated for Kids)

I'll lay out my main takeaways from the conference, followed by a summary of some of the more interesting panels and presentations and ending with some odds and ends that you might find entertaining. 



Last season was the first in which STATCAST collected data on every major league game.  In terms of the sheer amount of data points it is revolutionary.  Gennaro mentioned that 99% of all the data in baseball history had been collected in the first game of the 2015 season.  Most of us see STATCAST during MLB Network game broadcasts and on its shows - things like the speed and track of an outfielder as he runs to field a ball, but it is much more than that.

Statcast, developed by Major League Baseball Advanced Media, collects data using high-resolution optical cameras and radar equipment that has been installed in every major league ballpark.  It measures the position of every player in the field at all times and the position and movement of the ball.  Among the many specific items it measures:

- Spin rate on pitches
- Pitcher arm slots
- Pitcher position on rubber
- Exit velocity for batted balls
- Launch angle when ball is struck by bat
- First step reaction time for fielders when balls are hit
- Outfielder speed and route efficiency

If you want to know more here is a primer on Statcast as well as its homepage.
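For readers wondering what a number like route efficiency might look like under the hood, here is a minimal sketch.  It assumes the metric is the straight-line distance from where the fielder started to where he ended up, divided by the distance he actually ran; that definition, the function names and the tracked positions below are my own illustration, not Statcast's code or data.

```python
import math

def path_length(points):
    """Total distance along a sequence of (x, y) positions, in feet."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def route_efficiency(points):
    """Straight-line distance from first to last position, divided by
    the distance actually covered (assumed definition, see lead-in)."""
    covered = path_length(points)
    if covered == 0:
        return 1.0  # fielder never moved, so nothing was wasted
    return math.dist(points[0], points[-1]) / covered

# Hypothetical positions for one outfielder on one fly ball:
# he breaks sideways for 30 feet, then runs back 40 feet.
route = [(0.0, 0.0), (30.0, 0.0), (30.0, 40.0)]
efficiency = route_efficiency(route)
print(f"{efficiency:.1%}")
```

A perfectly direct route would score 100%; the detour in this made-up example covers 70 feet to end up 50 feet away.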

The data dump is enormous but what it means is another thing altogether.  Several speakers mentioned that the key now for clubs is to figure out what data is actually useful.  Compounding the difficulty is that there is only one year of STATCAST data at this point, so determining what is normal or average performance is premature in most instances.  The answers are still to be determined, but it looked to me like we've entered a new world of information.

The only thing we need to fear is if STATCAST, like SKYNET, becomes self-aware and starts running the game.

(2027 Philadelphia Phillies roster?)

Process v Outcomes

I was struck by the implications of STATCAST combined with some of the advanced neurological and physical work being done by some of the presenters.  For instance, Jason Sherwin of Decervo is measuring how the visual system is connected to the neural system by measuring pitch recognition and the differences between the neural decision to swing and the physical initiation.

Many of the existing sabermetrics measure results (WAR, for example) and are focused on the relative evaluation of players.  The data from STATCAST and the emerging technologies to measure visual, neural and other physical systems have an additional element that can be used to train and develop players.  Quite a few of the baseball front office panelists as well as the ex-players stressed the importance of this process oriented data.  It will require some new skill sets in order to convey it in useful form to players, but the overall feeling was it may be more accepted by players compared to outcome measurements.  In addition, because STATCAST is visual, as well as measured, it is less "black-box", thus improving the conversation with users.  A couple of panelists noted that Statcast has already changed the nature of the conversation between coaches and players and the potential opportunity for players to use it for their own improvement is enormous.

On a cautionary note, Eric Byrnes responded to Vince Gennaro's claim that the new STATCAST data will help coaches be more effective with players.  While endorsing analytics in general, Byrnes remarked that "Some of the greatest hitters are some of the dumbest guys I've ever met", adding that hitting is a reactive event and you can mess someone up by trying to put too much in their head.

Growth of Team Analytics Capabilities

One speaker likened finding the most important data amidst the mass of newly available data to "not looking for a needle in a haystack, but rather looking for THE needle in a huge stack of needles."

The data avalanche is precipitating the further growth of existing club analytics operations and prompting the last holdouts, clubs like the Tigers and Phillies, to set up analytics groups over this past off-season.

Along with math and statistics backgrounds, clubs are looking for people with training in physics (not physical therapy, physics), database construction, knowledge of programming languages like Python and data visualization.

There's an arms race in baseball operations, not just on the mound.  Everyone is searching for how to use this new data and someone is going to end up with the competitive advantage if they can figure it out.  It was noted that the investment cost for analytics is very small compared to the enormous cost of player salaries and the potential return on that investment can be very high.

The Joey Votto Thing

Votto's name must have come up more than all other ballplayers combined, and it happened each day.  As most baseball fans are aware, the Votto Thing is the controversy around Joey's refusal to swing at any pitch outside the strike zone, even with runners in scoring position.  Votto has remarkable command of the strike zone and ends up taking a lot of walks.  Eno Sarris of FanGraphs related a conversation in which Votto said that by maintaining plate discipline he can extend his career.  Another panelist mentioned that Votto told him that he tried getting more aggressive and pulling the ball for about a month to get "cheap" home runs but it didn't work, so he went back to his normal approach.  This article by an ESPN writer who attended the conference contains more details.

(Joey Votto, not swinging)


Defensive Metrics Panel (Alex Cora, ESPN; John Dewan, Owner, Baseball Info Solutions; Caleb Peiffer, Manager Baseball Ops, Seattle Mariners)

This ended up being primarily a discussion on how defensive shifting was changing baseball.  Dewan pointed out that only a few years ago the Tampa Bay Rays led the majors, shifting 200 times in a season.  In 2015 the Rays and Astros each employed shifts about 1400 times and every team in the league shifted more than 200 times.

Dewan stated that his analysis showed "the more you shift, the more runs you save" (about 20-25 for the Rays and Stros last year) and that extreme shifts (where the SS is on the first base side of second) were much more effective than partial shifts (where the SS moves towards second, but remains on same side).

Alex Cora made several intriguing comments regarding shifts.  He mostly favors increased use, but says the biggest problem is the loss of double plays because shifted infielders are playing in unfamiliar positions. He added that teams are now beginning to practice turning double plays and making relay throws from shift positions.

(Alex Cora)

Cora went on to say that the skill sets for most infield positions are different because of shifting.  First basemen need to be more explosive moving to the bag and able to handle awkward throws.  Second basemen need to be quick coming in and have a better arm than traditional players at that position.  Third basemen must be more athletic because they will sometimes be playing SS or 2B - sluggers with quick reflexes but slow feet are less valuable as third basemen.  Shortstop skills are the only ones that remain the same.  Overall, that's why utility men who can effectively play all infield positions are becoming more valuable.

The new rule on sliding into 2nd was discussed.  The rule restricts types of slides but also eliminates the phantom double play and makes whether the second baseman or shortstop touched the bag reviewable.  Cora, who played a lot of SS and 2B, hates the new rule.  He said he only touched second about 5% of the time during his years in the majors and that by forcing the fielder to do so it will actually increase the risk of collision.

Unintended Consequences of Rising Strikeout Rates (Rob Mains, OnTheFieldOfPlay.com)

An interesting presentation that started by noting that 10 of the top 13 years in baseball history for frequency of batters Hit By Pitch (HBP) have occurred since 2000, with the other seasons all occurring before 1910.  As a side note, fans my age grew up hearing about how "in the old days" pitchers threw more high and inside, but Mains pointed out that the lowest HBP rates in baseball history are between 1925 and 1950, with frequency less than half of today's.  According to Mains, the recent rise in HBP is related to the rising strikeout rates, which have reached epidemic proportions with 21% of at bats resulting in a strikeout.

The most significant discovery by Mains is that 2014 and 2015 were the first years in baseball history when more at bats ended with the pitcher ahead in the count than the batter.  He found that when an at bat ended with the pitcher ahead in the count there were statistically significantly fewer sacrifice flies and more hit batsmen and wild pitches (the latter was 3x more likely).  Mains also noted that doubles, triples and home runs were reduced by a statistically significant amount when pitchers were ahead, though not singles.

Mains concluded that the critical factor is not overall worse control of pitches, as 2014 and 2015 saw the fewest walks since 1968, but rather that with pitchers ahead in the count more often, they expanded the zone, which he demonstrated with heat maps showing the distribution of pitches when pitchers were ahead and behind - the difference was striking.  The result was that with more pitches off the plate, inside and out, and low in the zone, there were more opportunities for hit batsmen and wild pitches.

As with all these research projects, ideas for further analysis were prompted by the presentation.  Given that pitchers are throwing faster than ever, is there a correlation between pitching speed and increases in wild pitches and hit batsmen?

Hidden Gold on the Diamond?  The Contribution of the Relative Age Effect to Talent Estimation Errors of High School Players in the June MLB Draft (Robert Brustad, Professor, School of Sport & Exercise Science, University of Northern Colorado)

Brustad's analysis focused on how age on draft day for high schoolers was relevant to subsequent performance, taking as its starting point that astute drafting of high schoolers has proven more difficult than drafting of college grads.  The difference in ages at the high school level doesn't seem like much (a year in most cases is as much as it gets) but Brustad remarked that at that age there is huge variability in maturation, physical and mental.  Older high school players tend to perform better, but based on the data Brustad presented, baseball teams overvalue current performance vs potential performance.  He found that the youngest quartile of draft players in the 2005 through 2012 drafts had more than twice the major league WAR value of the oldest quartile.  In seven of eight years, the youngest quartile performed best (in the eighth it was one of the middle quartiles). This was the same result an earlier study found for the 1965-96 drafts, prompting Brustad's comment that he was surprised teams had not learned anything in the intervening years and remain poor at talent projection.

Quantifying the Impact of Injuries on Playing Time and Performance (Joe Rosales, Baseball Info Solutions (BIS))

Realizing there was very limited injury information (outside of the Disabled List) being systematically collected, at the beginning of the 2015 season BIS began constructing a comprehensive injury database, including things like each time a batter fouled a ball off his body. Rosales reported on the first year's data.  BIS tracked about 4700 incidents of which fouls off body (1375), struck by ball/bat (1246) and Hit By Pitch (1003) constituted nearly 80% with no other category having more than 200.  Not surprisingly, catchers bore the brunt, having nearly 25% of the total incidents (pitchers were second with about 8%).  The five players with the most incidents were all catchers:

Salvador Perez (80)
Francisco Cervelli (66)
Russell Martin (53)
Derek Norris (50)
Stephen Vogt (47)

About 90% of the catcher events were being hit by foul balls or bat swings.  Nearly 40% of the impacts were to the head.  Perez had both the most head impacts and games with multiple head impacts.  Rosales noted that in the week after a game with multiple head impacts, catchers experienced a significant decline in offense.  While he noted it was a small sample size, he raised the question of whether these impacts were having a cumulative effect.

The only other significant finding was that for 14 days after a pitcher had a head impact, he lost about 1 MPH off his fastball.  As more data is collected in future years, additional studies may lead to more insight.
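As a quick check on the "nearly 80%" figure for the three biggest incident categories, here is the arithmetic using only the counts quoted above.  The total is the rounded "about 4700" from the presentation, so the share is approximate, and the variable names are my own.

```python
# Counts as reported in the BIS presentation; the total is rounded.
total_incidents = 4700
top_categories = {
    "fouls off body": 1375,
    "struck by ball/bat": 1246,
    "hit by pitch": 1003,
}

top_total = sum(top_categories.values())
share = top_total / total_incidents

print(top_total)        # combined incidents in the three categories
print(f"{share:.1%}")   # their share of all tracked incidents
```

The three categories add up to 3,624 incidents, or about 77% of the total.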

Regarding Perez's abilities, in another panel discussion, one of the participants told of a conversation with Royals first baseman Eric Hosmer in which he said the reason his team had such great success in preventing runners going from 1st to 3rd on singles was due less to the outfield throwing arms than to Salvador Perez.  In the case of almost every opposition runner reaching first, that team's 1B coach warned them about getting too long a lead because of Perez's ability to do a snap throw to first.

(Salvador Perez, injured; photo from CBS Sports) 

Splitting Range, Positioning, and Throwing in Defense (Scott Spratt, Baseball Info Solutions)

Spratt presented improvements in calculating Defensive Runs Saved (DRS) that better take into account positioning and what happens once balls are fielded by breaking down components into Range, Positioning and Throwing.  Some of the results were eye-opening.  At shortstop, all of Andrelton Simmons's performance came from range and throwing, while his positioning was very poor.  Didi Gregorius was the opposite, performing poorly on range and throwing but outstanding in positioning, with a net result close to that of Simmons.  Francisco Lindor's numbers in every category were amazing, based on his half season.

At second, Jose Altuve had poor range, good positioning and phenomenal throwing stats.  Among third basemen, while Nolan Arenado and Manny Machado were the best overall, Kyle Seager was #1 in throwing.

Another interesting analysis was the difference in how teams saved runs defensively.  Spratt compared the breakdown for the Giants and Indians.  The Giants infield saved 34 runs, 14 via Range, 19 by Throwing, but only one by Positioning.  In contrast, the Indians saved 19 runs, with 21 saved by Positioning and two by Throwing, offset by -4 in Range.
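The component breakdown is easy to sanity check, since the three pieces should add up to each team's total.  A minimal sketch, using only the numbers quoted above and reading the Indians' Throwing figure as +2 (my assumption, and the only sign under which the components reach 19); the class and field names are my own, not BIS's:

```python
from dataclasses import dataclass

@dataclass
class InfieldRunsSaved:
    """Infield Defensive Runs Saved split into the three components."""
    range_runs: int
    positioning_runs: int
    throwing_runs: int

    def total(self) -> int:
        return self.range_runs + self.positioning_runs + self.throwing_runs

giants = InfieldRunsSaved(range_runs=14, positioning_runs=1, throwing_runs=19)
# Throwing taken as +2 here (assumption, see lead-in).
indians = InfieldRunsSaved(range_runs=-4, positioning_runs=21, throwing_runs=2)

for name, team in (("Giants", giants), ("Indians", indians)):
    print(name, team.total())
```

Both totals match the figures Spratt presented, 34 for the Giants and 19 for the Indians.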

How Big Data and Analytics is Impacting Baseball's Business Operations (John Fisher, Senior VP, Ticket Sales & Marketing, AZ Diamondbacks; Ryan Gustafson, VP, Strategy & Innovation, SD Padres; Dan Migala, Chief Innovation Officer, PCG & Sports Desk Media)

Baseball business operations cover everything a franchise does that does not involve the actual play of the team, player development or contracting.  Operation of the stadium, ticketing, merchandising and cable, radio and web presence come under business ops.  While every club now has a baseball analytics group, only about half the teams have business analytics groups, though the number is growing.

The need for analytics is that unlike most other major sports, baseball teams rely on local resources for the bulk of their funding.  NFL teams get 80% of their revenue from the league (derived from TV, sponsorships and merchandise), while with most baseball teams it is the opposite, and apparently concessions and parking are relatively small monetary sources.  John Fisher remarked that 54% of the Diamondbacks' revenue comes from ticket sales, placing a premium on understanding customers and retaining and growing season ticket holders.  He mentioned that 12 hours after the signing of Zack Greinke was announced he got an email from the team owner, telling him he needed to generate a lot more revenue!

Fisher and Gustafson walked us through several analytical tools they are using to increase revenue, including predictive models of the lifetime value of potential customers to their franchises and, in the Diamondbacks' case, how a new look at season ticket holder data led them to completely revamp their approach towards retention.

One tidbit from the Diamondbacks: their realization that their hard core, baseball savvy customer base attended games Monday through Thursday, while the weekend demographic was different, has led them to different approaches to what is displayed on the scoreboards, with the weekday scoreboards featuring more player and statistical information.  The Diamondbacks also introduced new uniforms this season, more along college lines, to attract younger fans.


Dallas Braden remarked how great a teammate Yoenis Cespedes was when both were with the A's.  He made everyone in the lineup feel better.

Billy Eppler (Angels GM, who spent eleven years with Yankees):
- Gene Michael is the best evaluator of talent
- The best advice he got was from Brian Cashman - don't make decisions in the first 24 hours after something good or bad has happened.
- He also quoted Alex Rodriguez urging him to be more open about how management evaluates players: "tell the players what you value, they will make themselves that way".  Alex certainly did.

Several of the ex-players talked of the importance of team chemistry and having a positive workplace beyond whatever is measured in the metrics.  Gennaro spoke of his interviews with players, all of whom stressed the importance of this; positive team situations were highly valued, particularly because of the isolating nature of the batter/pitcher confrontation.

Baseball-Reference.com founder Sean Forman, responding to a question about the controversy over Ty Cobb's career hit total: "when I get to heaven, I'll see God's Baseball-Reference and finally know what Ty Cobb's hit total really was".


  1. Well done, fascinating, "Moneyball" comes to mind. Worthwhile info as long as it doesn't create overthinking athletes, but maybe technology, reliable statistics and quality training will handle that variable over time. Good stuff in AZ! dm

  2. This is great, Mark. You should hear the Joey Votto rancor on the Cincinnati talk radio. Personally, I don't think he should change a thing. The Reds are going to be terrible anyway, but ultimately I believe his strategy of some sort of small ball will be proven to be the most efficient way to deliver. He is not a supremely talented player - he's good - but he's borderline great because of the thought and the discipline.