Smart Baseball

by Keith Law


  As part of writing this book, I interviewed numerous GMs, front office executives, and team analysts, and a major question I asked them all was what they thought the future of analytics in baseball entailed. Their answers had two common themes: MLB’s Statcast data stream represents a quantum shift in how the industry uses analytics; and further advances will be incremental, rather than exponential, as all teams have access to the same data and any insights will spread more quickly throughout the game. David Forst, GM of the Oakland A’s, said that “Statcast is the ultimate playing field leveler; everyone can have access to everything on the field being measured. How much of your resources you put into breaking down that data and trusting it in your decision-making” will be the separating factor among clubs, instead of just different teams possessing different data.

  The Cardinals’ GM, John Mozeliak, has managed one of the most forward-thinking front offices in the game since the start of 2008, and summarized the view of many of his peers when he said, “When you think about what tomorrow’s going to look like, clearly the margins are getting tighter in terms of taking advantage of something the Cardinals may have had going for us ten years ago versus today.” The gains are smaller, and they may not last as long as analysts or scouts change teams or as information leaks out into the public sphere.

  One of the more surprising outcomes of my conversations with executives about the future of analytics was how many brought up injury prevention and rehabilitation as an area for future progress. The medical or training departments were typically their own islands, reporting back to the general manager but working largely independently of the rest of baseball operations. Now teams are integrating baseball data with their medical operations to try to predict injuries before they happen and to try to help players come back from injuries more quickly.

  A former GM cited it as the biggest area for future advances and potential competitive advantages for teams. “Injury prevention’s a big deal. A lot is being done there, with people trying to get a basis for it. If we can eliminate DL days, that’s huge. There’s more work being done with deliveries, pitch types, and what the impact is” of pitching mechanics or use of certain pitches. Other executives cited looking for pitchers whose velocities or spin rates are declining, using those data as a proxy for fatigue, shutting the pitchers down before they suffered any structural damage to a ligament or tendon. In late August 2016, when the Padres demoted pitcher Ryan Buchter to triple-A, his manager, Andy Green, specifically cited a reduced spin rate as evidence that Buchter was suffering from fatigue.

  Another GM mentioned that “having all of this data allows you to track player wellness better and hopefully cuts some potential injuries off at the pass. Even guys’ running speed, when they’re getting treatment for leg stuff, you see their speed drop down you know to give guys an off day. It might be some of the most low-hanging fruit out there, minimizing DL days, optimizing health and wellness.” He mentioned a rise in the number of companies trying to sell systems or tools that attack this goal, although their efficacy is a wild guess at this point. One analytics director, breaking down the roles of the various people in his department, specifically mentioned having a staffer “who’s more injury sports science rehab oriented,” as opposed to others who focus on the draft or work with the major-league coaching staff on positioning players.

  One point that came up repeatedly in these conversations is how the Statcast data rely on a two-dimensional system when, of course, players are three-dimensional objects moving in three-dimensional space. Teams have started to look for analysts with backgrounds in 3-D modeling, including those with physics backgrounds, to try to tease more insight out of what the Statcast data can provide, but the bigger leap forward would come with data that report on three dimensions. Think about how wearable technology could apply to the movements of a baseball player; the market of such “smart” technologies already includes elbow sleeves for pitchers and gloves for batters. If a player is susceptible to hamstring pulls, what if he could wear something on his thigh to track his leg movements and try to identify what’s causing these pulls or spot weakness in the muscle before it strains?

  Another analyst suggested the possibility of embedding sensors in the ball or the bat to measure things like spin rate—which is technically two numbers, since the ball spins on two axes—more accurately. This would require an up-front investment by the league itself, but that’s what MLB did for Statcast, recognizing that the teams would value the data and that some of the information could be turned into content to enhance the viewer’s experience. Again, more precise data about the movement of the ball or the bat could open the door to greater insight on how players get hurt and perhaps on how to reduce the odds of them doing so.

  Player development is not typically considered when writers or fans think about the impact of statistical analysis on baseball teams, but that integration has already started for many clubs, and several executives highlighted it as an area for further advances. Mejdal specifically cited the work that his colleague Mike Fast, who created the first widely used catcher-framing metric before joining the Astros, has done in helping work with young catchers in Houston’s system; even Jason Castro, a longtime big leaguer, saw his framing improve dramatically in 2015–16 due to work with the team’s coaching staff and analysts. Mejdal may seem to state the obvious when he discusses using the results of analytical work with players, but this philosophy is just now becoming widespread in the sport: “If you hope to have the best players and the best staff, how could you withhold something that would enable them to be better players and better staff?”

  Every MLB team now has the ability to install the TrackMan radar-based system in all of its minor-league stadiums as well; even low-revenue teams like Oakland have done so, meaning there’s no excuse for any team to choose not to install it. That means the granular data we see for big leaguers, like spin rate or exit velocity, are now available to teams on their own minor-league players, as well as opposing players on visiting clubs. (Teams have the option to trade their home parks’ data to other teams for those teams’ data.) So teams can now use this information to identify players with particular skills or who may be good targets for development, such as having a pitcher use a pitch more frequently or trying to change a hitter’s launch angle.

  The Astros declined to discuss it on the record, but multiple other executives pointed to Houston’s emphasis on the high fastball as an example of how new data have changed a development philosophy. In the high-offense era of the late 1990s and early 2000s, teams shied away from flyball pitchers, because they were more susceptible to home runs, and actively sought out groundball pitchers, with Derek Lowe and Brandon Webb both showing that a starter could thrive with just average velocity if he threw a sinker that forced hitters to hit the ball on the ground. This approach had a trade-off—a groundball is slightly more likely to become a hit than a flyball—but keeping the ball in the park made it worthwhile. The Astros appear to have seen, through Pitch f/x and now Statcast data, that pitchers who throw high spin-rate fastballs that don’t sink can also be effective, getting more swings and misses on four-seamers up in the zone or even above the zone—a pitch type and location that scared the hell out of everyone just a few years earlier.

  Wearable technology or sensors in the bat and ball would also provide a benefit here, as the giant black box of mechanics, whether for a pitcher or a hitter, might start to receive some statistical foundation. Some teams send their pitchers to the American Sports Medicine Institute or similar locations for biomechanical analysis, but the process is expensive, and optimizing a delivery is based on what little we already know about deliveries. (For example, ASMI research has shown that pitchers with short strides toward the plate release the ball higher in their deliveries, which reduces velocity and is correlated with a higher rate of shoulder injuries.) Collecting data on deliveries throughout a system, or even across multiple systems, would allow for a more systematic analysis of who gets hurt and who doesn’t, of who gets more spin or sink or break on pitches, of which hitters put more backspin on the ball, and so on. The questions behind that sentence are not new; teams have looked for those players for as long as scouts have driven to all corners of the country to find players. But putting data behind the assertions of what’s desirable in a player may upend some conventional wisdom and could improve the way the industry scouts players and develops them.

  We’ve already seen such adjustments appear even at the major-league level. Daniel Murphy went from a nice platoon piece to an MVP-caliber hitter in 2015 and 2016 by increasing his launch angle through changes in his setup and stance, moving closer to the plate and starting his hands in a better position to get the bat head into the zone.

  As revolutionary as Statcast will be to all sides of the game, there is a world beyond Statcast, believe it or not. Statcast is the talk of the town right now because it just moved in, and there’s plenty of truth in the belief that such an extensive data stream will yield new insights into player performance, particularly in the descriptive realm—better telling us what actually happened on the field and what those actions were worth. Dividing up responsibility between fielders on a tough defensive play, or between a pitcher and the fielders behind him over the course of a game or a season, has long been a game of statistical estimation—educated guesswork, really, based on things we knew to be true at a macro level, because we couldn’t measure this stuff at a micro level. Much of the last 150 years in the world of particle physics has been about using mathematics to explain the existence and behavior of particles our instruments could not see; the technology later caught up to the theory, proving that Ludwig Boltzmann was right about atoms, and that Peter Higgs and others were right about the Higgs field and breaking the symmetry of the electroweak force. Baseball analysis isn’t particle physics (yet!) but has followed a similar path of hypothesis to discovery as we’ve moved from generic play-by-play data to Pitch f/x to Statcast.

  However, since Statcast does level the playing field, some teams are now looking for competitive advantages in areas not covered by these data. Some are rather obvious; when any MLB team hosts an amateur event, such as the Perfect Game All-American Classic at Petco or the Under Armour All-American Game at Wrigley Field, the club can choose to turn on the TrackMan system and collect Statcast-style data on the players for their own use without an obligation to share them with other teams. If I were running a club, I’d be lobbying to hold as many of those events as I could afford, just to gather more data on potential draft picks.

  Moving beyond Statcast data means using other technologies to collect the information, often requiring that the players themselves consent to devoting some time to the measurement process. The most notable public example of this is Boston’s use of software that claims to measure hand-eye coordination, a process they call “neuroscouting.” The software asks the player, sitting at a computer, to tap a key when a ball appears on the screen in a certain location or with the seams oriented a certain way, and thus tracks reaction times and recognition. The first year the team used the software in the draft was 2011, and their fifth-round pick that year was an undersized high school infielder from Tennessee named Mookie Betts, whom they drafted on the strong recommendation of area scout Danny Watkins—and because Betts had one of the highest scores on their neuroscouting tool that year.

  There are already vendors in the “analytics” space offering solutions for this so-called neuroscouting, vision testing, personality or psychological testing, and more, some of which may work and much of which is probably pseudoscience capitalizing on an industry flush with cash and now populated with executives actively seeking tools like these. It isn’t hard to foresee, as one team analyst suggested, “measuring facial expressions coupled with body movement to assess body language for quantifying attitude, stress, or emotion within a game; cameras that emulate fMRI’s to measure brain activity; measuring heart rate, sweat, DNA, sleep patterns, food and water intake, energy, muscle fatigue, joint torque and angles, you name it.” One general manager mentioned that the problem with some of these tests is that we lack any understanding of what the resulting data might mean; citing psychological testing, he questioned whether “stubbornness” should be seen as a good quality, a bad one, or neither.

  The last common theme among these answers was the challenge of merely integrating all of these new data, Statcast and otherwise, with the team’s existing operations. One GM who had gone through the exercise of setting up his team’s first analytics department bemoaned the time required just to build a system capable of handling Statcast data, but also pointed out that they had few internal systems that required the development of new bridges to connect them to the Statcast database.

  One longtime executive who’s overseen the construction of analytics departments for multiple teams put the opportunity this way: “Anytime there’s a new data stream, organizations face the question of when and how to integrate the new data into their decision-making processes. Organizations that go too quickly might rely on data that is inaccurate or compromised in some way. (We saw that time and time again with early defensive data.) Organizations that wait too long might fall behind and find themselves at a competitive disadvantage. Organizations that rely too much on a single new measure might be underemphasizing other important variables. Organizations that don’t factor the new data enough in their mix might be missing out on the benefit of the breakthrough. So, generally, I think the next big opportunity is not in a single new data stream or field of research. Instead, it lies in applying new data at the right time and in the right proportion with other variables to best predict future performance.”

  Several executives on the baseball and analytics sides told me that the skills for which they’re hiring have changed substantially in the last few years with the rise of big data, or what one director called “continuous data.” Bachelor’s degrees in computer science and programming skills were sufficient before the advent of Statcast; now teams are actively looking for master’s degrees or even Ph.D.s. Cleaning data from these various sources—spotting obvious errors or anomalies and removing them before they skew any sort of analysis—is a huge part of the job now, which is yet another skill set (understanding how to develop algorithms that can spot these outliers en masse, rather than having to find them by hand) that wasn’t necessary five years ago. Middleware might be familiar to many of you, but the idea of developing software to connect two different systems was simply not germane to baseball teams until these new data sources arrived. Getting systems to communicate so decision-makers have access to the data they want when they need them is the first big opportunity.

  The second is cultural—getting the baseball operations folks, from scouts to coaches up to the GM, ready to work with the new data and incorporate them into a draft model, player development decisions, or in-game tactics. One team director of analytics said that “one thing we found out early is to make everything visual. Players have a lot of other things going on, so the more you can communicate in one picture, the more they can get it. We spend most of our time thinking about finding ways to visualize things.” With so many employees essentially working remotely—scouts on the road, coaches and players at minor-league affiliates that could be halfway across the country—being able to communicate data-driven concepts in clear language is a potential separator skill for people looking to work in baseball operations.

  Nobody to whom I spoke said that there’s another revolution coming. Statcast was, in their view, the culmination of a series of upheavals in the business that started with the A’s run of playoff appearances around the turn of the century, the publication of Moneyball, the decisions by the Red Sox, Cardinals, and Blue Jays to very publicly hire front office people devoted to statistical analysis, and the 2006 introduction of Pitch f/x data. Of course, no one thinks the stathead era in baseball is over; they believe the change is permanent, and that changes in the near future will be incremental rather than exponential. Teams are hiring a new breed of analysts, and they’re going to look for scouts and coaches who have the capacity or the experience to work with the recommendations or insights the analysts develop. It will likely be a period of experimentation with new ideas and technologies, some of which may stick, the way we’ve seen shifting and batter-specific defensive positioning become a routine part of the game. And the gap between what teams know about players and what those outside of front offices know will continue to grow.

  Epilogue

  When I started with the Blue Jays in January 2002 as a “Consultant, Baseball Operations,” later changing to the catch-all “Special Assistant” title, I was the analytics department. The largest part of my job was merely gathering data and putting it in a form I could present to my boss, J. P. Ricciardi, who was committed to having this information available for decisions but came from a baseball background rather than a technical one. I learned Perl, a scripting language well suited to tasks like scraping text from Web pages and searching through them for specific strings, and spent much of my time each spring working on collecting season data from college teams’ websites—eventually collecting from more than five hundred such sites each year—so that, first and foremost, we would have these data available in the draft room, and also so that we could identify players we might need to scout.

  At the time, only a handful of teams were mining college data to look for hidden value, although after the publication of Michael Lewis’s Moneyball in 2003, this practice became more common and the inefficiency in that market closed quickly. Prior to that, however, part of my job was to suggest to area scouts those players they should at least go evaluate—even if it was strictly to dismiss them as nonprospects—based on their statistical performances. Most of these players were doing well with skills that wouldn’t translate to the majors, such as the pitcher, whose name I’ve since lost track of, at Fairleigh Dickinson University who had solid numbers but a fastball that sat around 82 mph. But occasionally we’d find a late-round gem; our biggest success was Ryan Roberts, a senior at Texas-Arlington whom we signed for $1,000 in the 18th round in 2003. Roberts accumulated 5.7 WAR over a nine-year big-league career, playing in 518 games—although only 17 of them were for the Blue Jays.

 
