Deriving Expected wOBA Utilizing Exit Velocity & Launch Angle
With the advent of freely available Statcast data that can be easily pulled from a great site like Baseball Savant, it has become easier to find the tools that allow an analyst to delve into areas where previous research had to use proxies such as hard versus soft contact or batted ball trajectory. These are rather large buckets, and both are subject to the personality and opinions of the stringer inputting the data, making them less than ideal candidates for translating balls in play into useful information. Taking a leap forward, we can now use measures such as exit velocity, or the speed of the ball off of the bat, and launch angle, or the vertical angle at which the ball comes off of the bat. With more precision should come the ability to better analyze which players are earning the production showing up in their traditional metrics, and which can expect dramatic shifts in one direction or another. This work will attempt to use past data to derive expected wOBA output as a function of both exit velocity and launch angle.
Some basic grunt work at the above link will eventually allow you to build a database capable of showing the average wOBA for every exit velocity and launch angle pair that existed in 2016. This is useful because we can code VLOOKUPs using this data to create a baseline for expectations based on those specific coordinates. First, though, you may question how reliable those historical wOBA figures would be for a given set of coordinates. In the snipped image above you can see that when a ball came off a bat at 108 MPH it led to a weighted on-base average (wOBA) of .962, which is quite good, but you can also see that the results vary wildly along the various launch angles. Mostly this is due to the incredibly small samples that can be seen along vast stretches of the horizontal trajectory highway. If we want to have good confidence in this kind of model we don’t want to be relying upon sample sizes of only a couple of balls in play. We can fix that.
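For anyone who would rather build that table outside of a spreadsheet, here is a minimal sketch of the idea. It assumes you already have one row per ball in play with an exit velocity, a launch angle and the wOBA value of the result. The sample rows and their wOBA values are made up for illustration, and `build_table` and `average_woba` are names I am inventing here, not anything provided by Baseball Savant.

```python
from collections import defaultdict

# Hypothetical balls in play: (exit velocity in MPH, launch angle in degrees,
# wOBA value of the result). The wOBA values are illustrative placeholders.
balls_in_play = [
    (103.2, 24.6, 2.0),   # e.g. a home run
    (102.9, 25.3, 0.0),   # an out
    (103.4, 25.1, 1.25),  # e.g. a double
    (88.0, 12.0, 0.9),    # e.g. a single
]

def build_table(bip):
    """Group balls in play into whole-number (EV, LA) cells.

    Returns {(ev, la): (woba_sum, count)} so that both the total production
    and the sample size of every cell stay available.
    """
    cells = defaultdict(lambda: [0.0, 0])
    for ev, la, woba_value in bip:
        # Note: Python's round() sends exact .5 values to the nearest even number.
        key = (round(ev), round(la))
        cells[key][0] += woba_value
        cells[key][1] += 1
    return {key: (total, n) for key, (total, n) in cells.items()}

def average_woba(table, ev, la):
    """Average wOBA for one cell, or None if no ball was ever hit there."""
    if (ev, la) not in table:
        return None
    total, n = table[(ev, la)]
    return total / n

table = build_table(balls_in_play)
print(table[(103, 25)])             # total production and sample size
print(average_woba(table, 103, 25)) # average wOBA for that cell
```

Keeping the running total and the count side by side, rather than only the average, is what makes it possible to pool neighboring cells later without losing track of sample size.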
Before introducing a good workaround to these sample size issues it might be beneficial to visualize what this data set looks like in order to better understand how final figures were arrived at for each set of coordinates. For that, let’s start with the weighted Runs Above Average (wRAA) table:
(open all images in new tab to see enlarged version)
Because these images would be massive I have sufficiently zoomed out to show what the bulk of the -90 to 90 degree launch angle (bottom to top) by 50 to 120 MPH exit velocity (left to right) looks like. You can think of the colorations as a heatmap, of sorts. The more green the more total production could be found in that area. The more red the less production to be found there. Each individual cell shows the result of multiplying the number of balls in play at that set of coordinates by the wOBA as given by Baseball Savant. Using wRAA instead of wOBA is advantageous for a few reasons that I will touch on in a moment.
Eventually we are going to want to calculate the wOBA for that given cell, so we’re also going to need to plot the number of balls in play for each set of coordinates so that we have a denominator to divide our wRAA by once we do a neat little trick. You can see a high frequency of balls in play centering around the 100 MPH exit velocity and between 0 and 30 degrees off of the bat.
Going back to the wRAA chart you might notice the funky border going on in the heart of the most productive area. I assure you, this has nothing to do with Carcosa. Earlier I mentioned that we want to bulk up the sample size behind each of these cells, which in turn increases the confidence that we can have in that specific set of coordinates. The reason that we use wRAA in the first place is that it allows us to easily lump different coordinates together and then divide by the total number of balls in play. Using this weighted average allows us to increase the sample size while still easily backing out into a wOBA figure. The trick is that I have weighted three different sections based on proximity to the cell we are concerned with.
In this example we’re focusing on a ball that went 103 MPH off the bat at an angle of 25 degrees. That specific cell gets a weight of 60% in the algorithm. Next, at a weight of 30%, I take the average of the nine cells (three-by-three) that are centered on our specific cell, and the last 10% goes to the average of the five-by-five grid that is centered on our cell of interest. I calculate the wRAA total for each segment of the formula and then divide by the number of balls in play in that segment. This, finally, gives us a really strong weighted average wOBA for any set of coordinates:
Note the near automatic homer zone radiating out of the intersection at 110 MPH and 30 degrees. There is also a neat little mirrored Nike swoosh that cuts through the heart of the data. These are balls that usually make it over the infielder and in front of the outfielder falling in for hits at a high rate, but within a very narrow band. Another area of interest is the pop up or can of corn zone that rides the northern plain, but makes pillages down into the southward lands from roughly 75 to 100 MPH. It is a great reminder that both aspects are needed in order to expect excellence on a ball in play, though one or the other can often lead to success, as well.
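The 60/30/10 weighting that produced this chart can be sketched roughly as follows. Treat it as my reading of the method rather than the exact spreadsheet formulas: each segment pools the production total and the balls in play over its window and divides one by the other. The sample numbers are invented, the function names are hypothetical, and renormalizing the weights when a segment has no balls in play is an assumption of mine that the write-up above does not spell out.

```python
def window_average(total, count, ev, la, radius):
    """Pooled wOBA over the square window centered on (ev, la).

    radius 0 is the cell itself, 1 is the three-by-three grid, 2 is the
    five-by-five grid. Returns None when the window holds no balls in play.
    """
    woba_sum = 0.0
    n = 0
    for d_ev in range(-radius, radius + 1):
        for d_la in range(-radius, radius + 1):
            key = (ev + d_ev, la + d_la)
            woba_sum += total.get(key, 0.0)
            n += count.get(key, 0)
    return woba_sum / n if n else None

def smoothed_woba(total, count, ev, la, weights=(0.6, 0.3, 0.1)):
    """Blend the cell, 3x3 and 5x5 pooled averages with 60/30/10 weights."""
    parts = [window_average(total, count, ev, la, r) for r in (0, 1, 2)]
    used = [(w, p) for w, p in zip(weights, parts) if p is not None]
    if not used:
        return None
    # Assumption: if a segment is empty, spread its weight over the others.
    weight_sum = sum(w for w, _ in used)
    return sum(w * p for w, p in used) / weight_sum

# Invented cells: production totals (wOBA x balls in play) and balls in play.
total = {(103, 25): 3.0, (104, 25): 1.0, (105, 27): 0.0}
count = {(103, 25): 2, (104, 25): 2, (105, 27): 4}

# Cell alone: 3.0 / 2 = 1.5, 3x3: 4.0 / 4 = 1.0, 5x5: 4.0 / 8 = 0.5,
# so the blend is 0.6 * 1.5 + 0.3 * 1.0 + 0.1 * 0.5, or about 1.25.
print(smoothed_woba(total, count, 103, 25))
```

Note that the three-by-three and five-by-five windows both include the center cell, so the cell of interest actually carries a bit more than 60% of the final figure whenever it has balls in play of its own.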
Now that we have a good idea of the expected wOBA for every exit velocity and launch angle pair it becomes pretty easy to create VLOOKUPs that spit out expected wOBA figures based on our two measures. With a baseline created we can now incorporate data from any or every ball in play in order to compare actual results to the expected baseline. I pulled every ball in play for every pitcher and batter in 2016 and then park adjusted each one based on the park that it was hit in, further drilling down to whether the batter was right- or left-handed. I did this for both our expected wOBA, which is based on the exit velocity and launch angle, and the actual wOBA, which is based on whether the ball was a single, double, triple, homer or out. These figures can be aggregated in any way that you want. Chris Archer versus lefties at home? No problem. Evan Longoria on fastballs over 94 at Yankee Stadium? Easy peasy. And so on. Let’s start by looking at the total park-adjusted expected wOBA, or xwOBA*, and park-adjusted actual wOBA, or awOBA*.
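The lookup and aggregation step, done outside of a spreadsheet, might look something like the sketch below. The table values, the player names and the `summarize` function are all hypothetical, and the park and handedness adjustments are left out entirely since they depend on factors not listed here, so what comes out is plain xwOBA and awOBA on balls in play rather than the starred versions.

```python
from collections import defaultdict

# Hypothetical smoothed table: (exit velocity, launch angle) -> expected wOBA.
expected_table = {(103, 25): 1.25, (88, 12): 0.70, (70, 60): 0.02}

# Hypothetical balls in play: (player, exit velocity, launch angle, actual wOBA value).
balls_in_play = [
    ("Batter A", 103, 25, 2.0),
    ("Batter A", 70, 60, 0.0),
    ("Batter B", 88, 12, 0.0),
    ("Batter B", 88, 12, 0.9),
]

def summarize(bip, table):
    """Return {player: (expected wOBA, actual wOBA, balls in play)}."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for player, ev, la, actual in bip:
        expected = table.get((ev, la))  # the VLOOKUP equivalent
        if expected is None:
            continue  # skip balls whose coordinates are missing from the table
        row = sums[player]
        row[0] += expected
        row[1] += actual
        row[2] += 1
    return {p: (x / n, a / n, n) for p, (x, a, n) in sums.items()}

summary = summarize(balls_in_play, expected_table)
for player, (xwoba, awoba, n) in summary.items():
    print(f"{player}: xwOBA {xwoba:.3f}, awOBA {awoba:.3f}, {n} balls in play")
```

Filtering `balls_in_play` before calling `summarize` (by pitcher, batter hand, park, pitch type and so on) is all it takes to get the kind of splits mentioned above.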
Bear in mind this is only balls in play and does not include walks, strikeouts or hit batters, at least not yet. Also, this includes the entire seasons for players including time spent with other teams. You can note that wild deviations from expectations aren’t fully restricted to those with small samples. The Logans underperformed, while Corey Dickerson went the other way by quite a bit. The thing about these snapshots is that it’s a moment in time. I prefer a bit more context, and to get that we can look at rolling averages over the course of the year:
This can give us good insight into the peaks and valleys of a player’s season or perhaps shine some light on times when the guy is hiding an injury. It also tells us when he was well out of whack from expectations. The middle of Archer’s season, for instance, shows a massive spike in his wOBA allowed, but by expected wOBA he was actually pitching more like a guy who was a little worse than average. You can also see the strides he made over the course of the year and the very strong, and sustainable, finish that he displayed. In addition to seeing how the season unfolded, we can also shine a light on the distribution of a player’s balls in play using what I’m calling a spray chart:
I’m using the created heatmap as a background and then graphing each ball in play by its coordinates as an overlay. This should give you a good idea of where the player is concentrating their profile, and who is either avoiding or embracing the danger zones based on their job of either hitting or pitching, respectively. Once these templates have been created it is very easy to populate the charts simply by filtering the data.
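Going back to the rolling averages for a moment, the calculation behind that kind of chart is simple enough to sketch. The window length below is an arbitrary choice for the example, since I have not stated the one used in the charts, and the two series are invented expected and actual wOBA values for consecutive balls in play.

```python
def rolling_average(values, window):
    """Mean of each run of `window` consecutive values, oldest first."""
    return [
        sum(values[i - window:i]) / window
        for i in range(window, len(values) + 1)
    ]

# Invented values for one pitcher's balls in play, in chronological order.
actual = [0.0, 2.0, 1.0, 0.0, 0.0]
expected = [0.3, 1.2, 0.6, 0.3, 0.0]

WINDOW = 3  # arbitrary for this example; a real chart would use far more balls

rolling_actual = rolling_average(actual, WINDOW)
rolling_expected = rolling_average(expected, WINDOW)

# A positive gap means results have been worse for the pitcher than the
# quality of contact alone would suggest.
gap = [a - x for a, x in zip(rolling_actual, rolling_expected)]
print(rolling_actual)
print(rolling_expected)
print(gap)
```

A longer window smooths out more noise but reacts more slowly to real changes, so it is worth trying a few lengths before reading too much into any one peak or valley.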
There is no limit to the things that we can look into using this tool, so feel free to reach out on Twitter (@sandykazmir) if there is a player that you would like me to take a look at. It’s incredibly early, but we can even look at stuff that is happening basically in real time. Additionally, while this research has revolved around creating a reliable metric for rating the productivity of balls in play, it has so far avoided walks, strikeouts and hit batters. These are things we can pretty easily fold in while maintaining everything on the same linear weights scale, as every plate appearance ends in one of these outcomes. Here’s a look at the 30 best batters in this young year (through 4/13) by wRAA, which I like because it rewards performance compared to average, but also rewards volume:
Here are the top 30 pitchers:
We can also focus on just the Rays:
The Rays have had three players provide above average production and several more that are on the other side of the coin. While xwOBA* is still our measure of expected production on balls in play, I’m now also using twOBA*, which takes into account all the plate appearances that did not end with a ball in play. The wRAA estimate is based off of this last component. Here are the pitchers:
There are some nicer tales to be told here. Archer has been really good through his first two starts and he is joined by fellow starter Jake Odorizzi on the positive side. Jumbo and the Horseman both come in as strong pieces of the bullpen so far. It’s not all roses as Andriese, Pruitt and Cobb have been lesser lights, but it’s good to keep in mind that this is from two games. Last thing before moving on is that I wanted to share each of the teams to this point in the season, with pitchers on the left and hitters on the right:
The Rays pitching has been a little better than average, but the bats are grading out as worst in the league. While we have seen flashes there just hasn’t been enough offense from the entirety of the lineup. We’re still seeing the big inning, but we’ve also become accustomed to the droughts, and so far it looks like they’re here to stay.
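For those who want to reproduce the twOBA* and wRAA columns from the tables above, here is a rough sketch of how the non-contact plate appearances can be folded in. The wRAA line uses the standard linear weights relationship, the gap to league wOBA divided by the wOBA scale and multiplied by plate appearances. Everything else is an assumption on my part: the constants are illustrative placeholders rather than the actual values for any season, the denominator is simplified to balls in play plus walks, hit batters and strikeouts, and the function names are made up.

```python
# Placeholder constants for illustration only; look up the published
# linear weights for the season you are working with.
W_BB = 0.69          # wOBA value of a walk
W_HBP = 0.72         # wOBA value of a hit batter
LEAGUE_WOBA = 0.320  # league average wOBA
WOBA_SCALE = 1.20    # converts wOBA points back to runs

def total_woba(xwoba_bip, bip, bb, hbp, k):
    """Blend expected wOBA on balls in play with walks, hit batters and strikeouts.

    Strikeouts are worth zero, so they only add to the denominator.
    """
    pa = bip + bb + hbp + k
    if pa == 0:
        return None
    return (xwoba_bip * bip + W_BB * bb + W_HBP * hbp) / pa

def wraa(woba, pa):
    """Runs above average for a given wOBA over a number of plate appearances."""
    return (woba - LEAGUE_WOBA) / WOBA_SCALE * pa

# Invented line: 10 balls in play at a .400 xwOBA, 2 walks, 0 HBP, 3 strikeouts.
t_woba = total_woba(0.400, bip=10, bb=2, hbp=0, k=3)
runs = wraa(t_woba, pa=15)
print(f"twOBA {t_woba:.3f}, wRAA {runs:.2f}")
```

Because wRAA multiplies the per-plate-appearance gap by volume, a hitter who is slightly above average over many plate appearances can outrank one who is far above average over a handful, which is the behavior I like about it.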