by Keith Law
By my last spring with the Blue Jays, I’d developed additional scripts to strip play-by-play game logs from college sites that offered them so we could estimate groundball rates and swinging-strike percentages for pitchers and very basic splits for position players. Within a few years, however, an independent data provider, collegesplits.com, began offering these data and much more (including left/right splits for pitchers and hitters) to teams for a fee, and by 2010 or so at least half of MLB teams were using this kind of information in their draft processes.
The other major part of my job at the time was to work with the data MLB provided for all professional players through delivery of daily flat files and the posting of game logs for every player. I wrote further scripts to deal with all of these, so we could easily identify, say, pitchers who were particularly effective at retiring left-handed hitters, or who had high groundout rates, or hitters whose value might be obscured by tough home ballparks. I spent more time working to collect, clean, and format these data than I did to “analyze” them, because the latter part was so straightforward—applying park effects, for example—while tiny glitches in the format of a Web page could throw a beautifully designed Perl script (if I do say so myself) into disarray.
MLB’s Pitch f/x data set just became available in my last year with Toronto, and I left to join ESPN before I got to do much of anything with it. Had I stuck around, my old tools would have been inadequate to the job; where I could store everything in Microsoft Access and export it to Excel for formatting, Pitch f/x had too many rows of data for basic desktop software. That was a job for a database programmer, and I am not one of those. This was the first inflection point, where hiring more than one person to staff an analytics department—and hiring someone with greater technical skills than I possessed—started to make sense.
Before the 2015 arrival of Statcast data, there were already teams that employed departments of six or more analysts, handling Pitch f/x data, college data, and some of the TrackMan data available for high school players from showcase events. Now Statcast data and its sheer size—the aforementioned terabyte of data per season—have led to an even greater rise in hiring; one department head estimated to me that all thirty MLB teams in total employ about two hundred people in analytics departments, from directors to entry-level programmers. I was one of the only full-time employees of any MLB team in 2002 whose job was to work with data; now, there are fifty to a hundred times more such people working for clubs, and I am completely underqualified for the job I used to hold.
As teams get smarter, the gap widens between what teams know about players and what we know about players—and by we, I mean not just the fans, but those of us who cover the industry for a living. Where fifteen or twenty years ago, the idea of even employing a single consultant to provide insight via statistical analysis was unorthodox, today teams employ entire departments of a half dozen or more analysts, some sporting Ph.D.s, to help gather, organize, and process data and queries to improve their decision-making on players. Increasing the accuracy of player projections—that is, what the player’s performance is likely to be next year, the year after, or over the life of a long-term contract—has long been a sort of holy grail for front offices, which is why you’re seeing so many resources thrown into analytics departments. Projections can never be perfect and should always have confidence intervals around them (“We’re 95 percent confident his OBP will be between .340 and .360”), but even marginal improvements in their accuracy can mean millions of dollars in value to a team.
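To make the idea of a confidence interval concrete, here is a minimal sketch in Python. It is not how any team builds a projection. It assumes each trip to the plate is an independent trial with a fixed chance of reaching base, which is a simplification, and it uses a standard Wilson score interval. The function name `obp_interval` and its parameters are hypothetical, chosen only for this illustration.

```python
import math


def obp_interval(times_on_base: int, opportunities: int, z: float = 1.96):
    """Return a (low, high) Wilson score interval for an on-base rate.

    Assumption for illustration: every opportunity is an independent trial
    with the same underlying probability of reaching base. z = 1.96
    corresponds to roughly 95 percent confidence.
    """
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    if not 0 <= times_on_base <= opportunities:
        raise ValueError("times_on_base must be between 0 and opportunities")

    n = opportunities
    p = times_on_base / n  # observed on-base rate
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# A hitter who reached base 210 times in 600 opportunities (a .350 rate).
low, high = obp_interval(210, 600)
print(f"95% interval: {low:.3f} to {high:.3f}")
```

Under these assumptions, a single season of about 600 opportunities produces an interval several times wider than the twenty points in the quoted example, which hints at why tightening projections takes more data and better models than one year's stat line.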
This puts the fan (in other words, you) in a different place today than in 2007 or 1997. It was reasonable in prior eras to think that when it came to player stats, we all knew what the teams knew, and in certain cases we seemed to know more, or simply to consider it more than the front offices in question did. Today there is no question that the teams have more data than we have, and that they are drawing conclusions that we won’t know about until much later, if at all. We may certainly still disagree with team decisions on players, but we don’t have the same information they do.
I still take hope in the recent statistical revolution and the ongoing changes promised by Statcast and any future data sources. Where once the discussion and coverage of baseball was ruled by superstition and myth, today more fans demand some rational underpinning to arguments over whether the Nationals gave up too much for Adam Eaton, whether Mike Trout is having the best start to a career in baseball history, whether Manny Machado or Bryce Harper will end up the best player from the 2010 draft, and so on. You can still try to write arrant nonsense or spew it on television, but you’ll be picked apart for doing so, because the rise of the analysts has led to a more educated fan base.
Every player’s stat line tries to tell the story of his season, so if you want to get the story right, you have to use the right stats. Using the old-fashioned, outdated stats I broke down in Part One meant getting the story wrong. They ascribed credit to one player for the actions of another, and sometimes led writers and fans to believe that players had mythical powers like the ability to play better in a clutch situation. We know better now, whether it’s how to value what a player did or how to dismiss quackery like clutch hitters and lineup protection.
Understanding more modern statistics, even those as simple as OBP or slugging percentage, allows everyone to better understand what’s happening on the field, whether it’s going well or poorly, or the moves that teams make off the field. If your favorite team just acquired a player you’ve never heard of before, you’re going to want to know whether he’ll help. The better the statistics you look at to answer this question, the more confident you can be in your answer. And now you’re better armed to watch the watchmen, to read the work of people who cover the game (like me) and see if we’re telling the right kind of stories about the game, or ignoring statistical information that leads to a different conclusion. When a broadcaster tells you that some player “just knows how to win” or is “a great RBI guy,” your BS detector will light up like a Christmas tree. When a manager or GM claims that a low-OBP player can lead off because he’s fast, you know why speed is a red herring. You’re armed to think rationally about a sport that, for most of its 150 or so years, was covered and treated and discussed in the most irrational terms.
This will still be true for the savvy fan even as the information gap I mentioned above grows. You don’t need to know or understand the importance of exit velocity or launch angle or spin rate to watch and enjoy a game, or to follow a player or team through a season. This information may help you—for example, it appears that a fastball with high velocity but just average spin rate isn’t going to be as effective as the velocity alone might imply, missing fewer bats and leading to more hard contact. And you, the savvy fan (you’re welcome), should keep an open mind about new advances; ten years ago we never thought about putting a value on catcher framing, but now it’s driving transactions and pushing the worst framers out of regular jobs.
Teams are developing better tools to drive their player projections, regressing performances to mean levels or employing mixed models to try to incorporate random effects into metrics for pitchers, but you don’t have to understand any of this to be an educated fan. You only have to accept that the search for knowledge within baseball never ends, so what appears to be a complete story of a player today may turn out to be incomplete tomorrow. I said in the chapter on pitching metrics that my 2009 NL Cy Young vote may end up looking wrong as we learn more about how much credit or blame falls on a pitcher when a ball in play becomes a hit. Using the best knowledge we have right now while remembering that we may know a lot more in the future is the essence of Smart Baseball.
Acknowledgments
I’d like to thank my editor, Matt Harper, for shepherding this project from concept to completion, taking a set of essays and helping me weave them together into something coherent and cogent.
My agents, Eric Lupfer for literary and Melissa Baron for anything else, helped make this book more than just some idea I had in the middle of thirty other ideas I had that never went anywhere. Eric in particular turned the elevator pitch into a written document and then into a formal proposal, one that landed me with HarperCollins faster than I could have hoped for.
Meredith Wills provided some essential research help, especially early in the process, which formed a lot of the foundation of the early chapters on ERA and fielding, although much of the work she did doesn’t appear directly in the book. The commentary about catchers whose proficiency at throwing out runners might hurt their apparent defensive value because runners stop attempting to steal against them comes from research Meredith did for this project.
I spoke to many people inside the industry to research this book, folks who made more time for me than I could have expected. The Statcast team at Major League Baseball Advanced Media, including Cory Schwartz, Greg Cain, Tom Tango (he exists!), Mike Petriello, and Daren Willman, spent an afternoon walking me through the product’s history and capabilities. I felt like a kid walking through a science museum for the first time.
Molly Knight was especially helpful with advice and a critical eye that helped make the final book cleaner and more polished.
There are more team executives who helped than I can list, and some requested that they remain anonymous, but among those I can thank publicly are David Forst, Theo Epstein, Alex Anthopoulos, John Mozeliak, Chris Long, Sig Mejdal, Jason Pare, James Click, Dan Fox, Matt Klentak, John Coppolella, Mitchel Lichtman, and Farhan Zaidi, who’d like me to say that he was especially unhelpful.
My editors and colleagues at ESPN, especially at ESPN.com and Insider, were gracious enough to give me the time I needed to write a book while maintaining a full-time job and regular presence across ESPN’s various platforms. I appreciate their constant support and understanding.
My entire career in baseball has been something of a happy accident, and it only occurred at all thanks to J.P. Ricciardi, who gave me my first job in the game (and, among other things, made “Joey Bagodonuts” a permanent part of my vocabulary), and Billy Beane, who helped convince J.P. to give me a shot. I also worked with some wonderful people in my four-plus years in Toronto, and have to single out Tony LaCava and Tommy Tanous for the time they spent with me at games, teaching the most basic aspects of scouting to someone who, for all my comfort with numbers, could barely tell a slider from a changeup when I first got there.
And finally, I’d like to thank my wife and daughter for their incredible patience throughout the writing process, for all the times I was there but not really there, buried in my computer or stuck on the phone, turning out a 275-page book inside of nine months.
Index
The pagination of this electronic edition does not match the edition from which it was created. To locate a specific entry, please use your e-book reader’s search tools.
Aaron, Hank, 33, 118, 119, 126
Adcock, Joe, 28
Adjusted Batting Runs (ABR), 15, 190–91
African-American players, 214–15
Alfonzo, Edgardo, 39
Alien & Sedition Acts, 5
Allen, Cody, 54–55
Alomar, Roberto, 35, 35, 211, 213
Altitude, playing at, 25. See also Coors Field
Altuve, Jose, 11
Alvarez, Dario, 48
American Sports Medicine Institute, 238, 266
Anaheim Angels, 79, 190, 251, 262
Angel Stadium, 190
Aparicio, Luis, 118, 172
Applied math, 207–29
Arbitration Projection Model, 50–51
Area scouts, 232–34
Arizona Diamondbacks, 24–25, 48, 100, 199, 208, 241
Arm, The (Passan), 238
Arm strength, 239
Arrieta, Jake, 154, 154–55, 161, 197
Arthur, Rob, 252
“Artificial intelligence,” 250
Assists, 79–80, 164–65, 169–70, 172
At bat, 178–79
  batting average and, 9–17
At Bat app, 247
Athleticism, 234–35
Atlanta Braves, 16, 24, 28, 31, 47–48, 145–46, 146, 199
Atlanta Journal-Constitution, 31
Aurilia, Rich, 34
AVG. See Batting average
BABIP (Batting Average on Balls In Play), 148–53, 155
  common formula, 148–49, 148n
  2015 NL Cy Young Award, 154–55
Baker, Dusty, 33–34
Ball, Trey, 234–35
Baltimore Orioles, 43, 44, 131, 142
Barfield, Jesse, 142
“Barrels,” 253
Baseball Between the Numbers, 65–66
Baseball Hall of Fame. See Hall of Fame
Baseball Info Solutions (BIS), 165, 166, 168
Baseball myths, 85–106
  clutch hitters, 86–89
  “hot hand,” 102–5
  lineup construction, 93–96
  lineup protection, 90–93
  productive outs, 96–102
Baseball Prospectus, 2, 63–64, 111, 134, 178, 179, 180
Baseball-Reference, 15, 50, 72, 117, 143n, 164, 166, 172, 192, 196n, 201
Baseball Research Journal, 86, 88
Baseball scouts. See Scouting
Baseball title creep, 202–3
Baseball traditions, 2–3
Baseball Writers’ Association of America (BBWAA), 3, 209
  Whitaker and, 210, 212
Base-stealing
  catcher and, 176–77
  times caught stealing, 32, 60, 62–63, 64, 66–68, 67, 187
Base Stealing Runs, 189–90
Batting average, 9–17
  calculating, 11–12
  correlation analysis, 13–14
  history of use, 10
  slugging percentage compared with, 122, 126
Batting Average on Balls In Play. See BABIP
Batting Runs, 15, 115n, 133, 188–91
Batting titles, 9–10, 15
Bauer, Trevor, 55, 243
Beane, Billy, 30
Bellos, Alex, 103
Beltre, Adrian, 147
Benard, Marvin, 34, 34
Bench, Johnny, 176
Bequeathed runners, 144–45
Berger, Mike, 241
Betances, Dellin, 48–49, 50, 145, 176–77
Betts, Mookie, 267
Beyond the Box Score (blog), 177
Biggio, Craig, 118
Biomechanical analysis, 266
Blair, Willie, 24
Blyleven, Bert, 29–30, 215
Boggs, Wade, 10, 40, 124
Bolt, Usain, 251, 257
Boltzmann, Ludwig, 267
Bonds, Barry, 33–34, 92
  batting average, 12, 33
  NL MVP, 16
  OBP, 117, 126
  RBIs, 39
  slugging percentage, 122, 126
  2001 season, 33–34
Book, The: Playing The Percentages In Baseball (Tango, Lichtman, and Dolphin), 87, 90–91, 94, 95
Borowski, Joe, 49
Boston (TV show), 4, 97–98
Boston Red Sox, 27, 53, 110, 116, 142–43, 207, 226–27, 234–35, 267, 270
Boswell, Thomas, 134
Boxberger, Brad, 49
Brach, Brad, 43–44
Brenly, Bob, 100
Brett, George, 80
Britton, Zach, 43–44, 51
Brock, Lou, 60, 67, 67–68, 118
Brooklyn Dodgers, 60
Brooks, Harold, 88
Brown, Kevin, 81, 215–19, 216, 218
Bryant, Kris, 253
Buchter, Ryan, 263
Bumgarner, Madison, 208
Bunning, Jim, 216
Bunts, 97–102
Cabrera, Mauricio, 47–48
Cabrera, Miguel, 11, 88, 100–101, 126, 135, 135, 253
Carpenter, Chris, 199–200
Carter, Joe, 35, 35–36, 39
Cary, Chuck, 142
Castellanos, Nick, 147
Castillo, Luis, 66, 122, 122
Castro, Jason, 264–65
Catcher defense metrics, 176–81
Catcher errors, 73–74
Catcher framing, 177–81
Caught stealing, 32, 60, 62–63, 64, 66–68, 67, 187
Chadwick, Henry, 10, 12
Chance, Dean, 222
Chapman, Aroldis, 54, 145, 254
Chase Field, 190
Chass, Murray, 45
Chesbro, Jack, 21–22
Chicago Cubs, 4–5, 113–14, 115, 151, 154, 161, 207–8, 258
Chicago Sun-Times, 45
Chicago White Sox, 52, 53, 77, 85, 117, 175, 233, 237, 239
ChyronHego, 247–48
Cincinnati Reds, 160, 160–61, 175, 254
Clark, Jack, 35, 35, 114, 257
Clemens, Roger, 23–24, 225–26
Cleveland Indians, 6, 54, 101–2, 174, 207–8
Closers, 21, 219–21
  Proven Closers, 47, 49, 50–55, 145
  save rule and, 47–55
Clutch and Win Probability Added (WPA), 157–59
Clutch hitters, 86–89, 157–59, 274
Clutch pitching, 47, 161
Cobb, Ty, 10, 67, 118, 118
Coleman, Vince, 39–40, 40, 60, 61
Collective bargaining agreement, 113
College players, 98, 100, 242–43, 271–72
Collegesplits.com, 272
Colon, Bartolo, 25
Colorado Rockies, 25, 49, 66, 116, 196
Complete games, 20, 27, 141
Concepcion, Davey, 210
Cook, Earnshaw, 109
Coors Field, 49, 100, 136, 187–88, 195, 196
Correa, Carlos, 169
Correlation analysis, 13–14, 38