Analytics In The Outfield

ScoutGPT

AI is the hot-button topic right now. More specifically, Large Language Models (LLMs) like ChatGPT and Perplexity are where all the buzz is. Peeling the layers back for a second, these LLMs are models that learn associations between words. What words go together? Which part of the sentence should the attention focus on? The best use of AI, in my opinion, is automating tasks to give people more time to do other things. We still must be wary, though, because AI is not perfect and should be used as a tool rather than a full solution.

This emergence of AI in our lives made me wonder how we could apply it to baseball. I thought of different scenarios such as bullpen management and free agency decisions, among other things. Ultimately I settled on exploring its use for scouting reports. Let me preface this by saying I am not suggesting that AI should replace scouts. There is still something to be said for a scout watching a player (video AI analysis is coming, but that is another can of worms) to see a player’s tendencies that the numbers would not show. However, when an analytics team is game planning in conjunction with scouts, I think there is an opportunity to automate and reduce the hours spent researching.

I have many thoughts about AI and its use but let’s get to baseball!

Mike Trout’s August Struggle

As an example, I am going to build out a scouting report around Mike Trout’s struggle in August. Specifically, I will be looking at Aug. 7 – Aug. 21, 2025, when he was really struggling at the plate.

AVG	SLG	wOBA	K%	Bat Speed (mph)
0.250	0.275	0.343	26.4	70.8

There was a lot of swing and miss here, and his bat speed was down as well. There were articles about his struggle, such as this one:

“Trout, however, doesn’t believe his struggles are due to distraction. He stated before the Halos’ series against the Texas Rangers he’s finding difficulty in ‘recognizing the pitch, just being on time.’” [https://www.si.com/mlb/angels/onsi/angels-news/angels-manager-thinks-mike-trout-is-distracted-amid-brutal-slump].

So there was a lot said, as well as performance metrics to look at. Analysts and scouts from opposing teams would be synthesizing this data to figure out how to attack Trout at the plate during this time. Let’s see what AI comes up with!

Prompt: Build scouting report for Mike Trout, Aug. 7 – Aug 21, 2025

ChatGPT

Mike Trout (Los Angeles Angels) – Aug. 7 – Aug. 21, 2025 Scouting Report

Overview

Veteran three-time AL MVP Mike Trout (age 34 in 2025) entered this period as a key offensive leader for the Angels but was in one of the tougher stretches of his illustrious career. Trout’s overall 2025 season numbers show he hit .232/.359/.439 with 26 HR and 64 RBI in 130 games, reflective of a solid but markedly down year by his historic standards.

Performance Summary (Aug. 7–21)

Offense

Batting average & production: In August overall, Trout hit approximately .214 with 1 HR, 7 RBIs, and 14 runs scored over 24 games, indicating limited impact at the plate during this stretch.
Slump indicator: According to team reports from late August, he batted just .190 with a .207 slugging percentage since Aug. 7, suggesting a notable slump during much of this period.
Approach & timing: Trout and Angels coaching staff publicly noted struggles with pitch recognition and timing. Trout himself commented that he felt good in cage work but was “off” in game timing.
Power: Despite the slump, Trout had hit at least one long ball (No. 398 of his career on Aug. 6), but power production was limited immediately after that milestone.

Health & Availability

Injury context: Trout returned from a bone bruise in his surgically repaired left knee earlier in the season and had missed games, which may have affected rhythm and mobility.
No new major injury was reported between Aug. 7–21, but ongoing knee discomfort was mentioned as a lingering issue in other late-season commentary on his form.

Advanced Metrics (Context for Period)

Approach profile: Across 2025, Trout’s profile trended toward a high strikeout rate (~32 %) and a high percentage of three-true-outcome plate appearances (HR, BB, K), suggesting a more all-or-nothing approach that could contribute to streakiness at the plate.
Statcast baseline: Over the full 2025 season Trout produced respectable exit metrics (avg. exit velo ≈ 90.9 mph, Hard-Hit % around 49.3 %), but those numbers were slightly down from prior peak years, which may correlate with his mid-August struggles.

Strengths Observed in This Stretch

Professional approach: Still demonstrates elite plate discipline relative to average MLB hitters; walks remain a strong part of his game even during down periods.
Power potential: Capable of game-changing contact and milestone power (e.g., home run No. 398 just before the period).
Experience & baseball IQ: Leveraging veteran instincts, even when production dips.

Weaknesses / Issues

Consistency & timing: Struggles with timing and pitch recognition were evident in decreased batting average and slugging during Aug. 7–21.
Limiting injuries: Lingering knee issues may impact lower-half mechanics, reducing his ability to drive the ball with consistency.
Three-true-outcome skew: Heavy reliance on walks, strikeouts, and home runs creates volatility; tough stretches can look especially barren when power is suppressed and strikeouts are high.

Role & Usage

Primarily Designated Hitter with limited time in the field compared to earlier career phases.
Remains a middle-of-order bat and team cornerstone, though impact was muted down this stretch.

Overall Take (Aug. 7–21)

This period was not representative of Trout’s peak performance levels. While he continued to show glimpses of power and maintained a disciplined eye, his batting average and slugging were depressed, timing was off, and production lagged relative to expectations for an elite hitter. Lingering health concerns and adjustment to a DH-heavy role likely contributed to the uneven showings. That said, his veteran approach and ability to draw walks kept value from falling further.

Although the analysis is expansive, ChatGPT could not home in on that exact time period. Instead, it talked about the whole month of August. Overall it is a good summary but not exactly a scouting report. Let’s look at a different LLM.

Perplexity

Perplexity gave an extensive six-page scouting report.

Here are highlights:

Week by week game analysis

Summary of metrics

Tool evaluation (20-80) scouting report

Strengths, concerns, red flags, mechanical observations

You can find the full report below

Trout Scouting Report Download

I was incredibly impressed with the amount of detail Perplexity gave. It did exactly what I asked (a breakdown of that time period) and more. This is much closer to what a scout/analytics team would want from a scouting report. I did notice that its numbers did not match mine (yikes, I could definitely be wrong), so there is either human error or AI error.

I then asked an additional question: What pitches did he struggle with during this time period?

Publicly available pitch-by-pitch data is not granular enough (by date range) to isolate exactly which pitch types Mike Trout struggled with only between August 7–21, 2025, so we cannot cleanly separate that two-week window from the rest of his 2025 season by pitch type.

Over the full 2025 season, however, the underlying pattern is clear and likely applies to that stretch:

He showed elevated whiff rates against all categories, but especially offspeed (changeups/splitters) and breaking balls (sliders and curves), with offspeed whiff/swing around 45% and breaking balls around 37% in 2025.
Analytical reports from 2025 also note career‑high swing-and-miss rates against four-seamers, with around a 30%+ whiff rate on heaters and very poor results against sinkers, plus very high whiff rates versus sliders/sweepers (over 40% in one early‑season sample).

So while we cannot isolate Aug. 7–21 specifically, the best-supported answer is: during that period he was most vulnerable to elevated four‑seamers, sinkers below the zone, and breaking/offspeed pitches (sliders, curves, changeups/splitters), which were already the primary sources of his swing-and-miss and chase issues throughout the 2025 season.

This is where we get to the limitations of AI. We can get this data via Baseball Savant, but AI cannot filter the site with a query (yet). Something to note is that I have the Pro version of Perplexity but not of ChatGPT, which may have impacted the responses. In addition, there are whole courses on prompting and how to adjust prompts to get better results. I do not delve into that here, as I just wanted to see how they do in general. Something really cool about Perplexity is that it shows you each step in its “thinking” (i.e., where it is searching, what it is searching for, etc.) along with its sources.
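For anyone who wants to do the filtering the LLM could not, here is a minimal R sketch of the idea. It assumes you have already exported pitch-level data from Baseball Savant into a data frame with `game_date`, `pitch_name`, and `description` columns. The tiny `pitches` data frame is made up purely for illustration, the helper `whiff_by_pitch` is hypothetical, and the lists of swing and whiff labels are my assumption about how swings are coded.

```r
# Made-up pitch-level rows standing in for a Baseball Savant export (not real data).
pitches <- data.frame(
  game_date   = as.Date(c("2025-08-05", "2025-08-08", "2025-08-08",
                          "2025-08-12", "2025-08-15", "2025-08-20",
                          "2025-08-25")),
  pitch_name  = c("Slider", "Slider", "4-Seam Fastball",
                  "Slider", "4-Seam Fastball", "Changeup", "Changeup"),
  description = c("swinging_strike", "swinging_strike", "hit_into_play",
                  "foul", "swinging_strike", "swinging_strike", "ball"),
  stringsAsFactors = FALSE
)

# Assumed labels for whiffs and for swings in general.
whiff_labels <- c("swinging_strike", "swinging_strike_blocked")
swing_labels <- c(whiff_labels, "foul", "hit_into_play")

# Hypothetical helper: whiffs per swing by pitch type within a date window.
whiff_by_pitch <- function(df, start, end) {
  in_window <- df$game_date >= as.Date(start) & df$game_date <= as.Date(end)
  swings <- df[in_window & df$description %in% swing_labels, ]
  tapply(swings$description %in% whiff_labels, swings$pitch_name, mean)
}

result <- whiff_by_pitch(pitches, "2025-08-07", "2025-08-21")
print(round(result, 3))
```

With real data, the same function would answer the follow-up question directly for Aug. 7 – Aug. 21 instead of falling back on full-season rates.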

I think AI is very exciting for the scouting world as a way to automate processes and allow a scout/analyst to work more efficiently. Perplexity is pretty close, but it is not a scout. Training an LLM just on scouting literature could improve this, as could better prompting (looking at myself). I’m looking forward to revisiting this during the season.

Data & AI

BaseballSavant

ChatGPT

Perplexity

Special Thanks

I want to say thank you to Brad Ito, who talked through this idea with me and gave me fantastic LLM resources.

Bat Speed = Contact?

A question popped up in my mind recently: How does bat speed affect contact rate? We know that higher bat speed leads to better quality of contact, but the question here is about making contact at all. Intuitively we’d think that more bat speed leads to a little more time to make a swing decision and the ability to catch up to higher velocities, which then leads to more contact. On the other hand, maybe the gains are not as large as we think and the actual hitting ability of a batter is more important. Let’s dig into the data!

Here are the top hitters based on bat speed

Player	Average Bat Speed (MPH)	Whiff %	Batting Average	Barrel %
Oneil Cruz	78.65	34.3%	0.209	20.3%
Junior Caminero	78.65	24.5%	0.253	12%
Jo Adell	77.62	28%	0.229	16.7%
Nick Kurtz	77.57	33.4%	0.304	19.3%

What stands out is that the barrel percentages are pretty elite, as we’d expect, but the whiff % is really bad and the batting averages are nothing to write home about (except Kurtz, who had a phenomenal year). What gives? Maybe this is small sample theater (i.e., a small sample does not tell the whole story), so let’s look at 215 qualified players.

Variable	Correlation with Bat Speed
Whiffs per swing	0.618
Squared up per bat contact	-0.471
Competitive Swing %	0.411

None of these correlations is especially strong. Interestingly, as bat speed increases, whiffs per swing and competitive swing % increase (positive trend) while squared up per bat contact decreases (negative trend).
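I ran the correlations in Excel, but the same calculation is a one-liner in R. Here is a rough sketch using made-up numbers for five hypothetical hitters; the column names are my own choice and the values are not real Statcast data.

```r
# Made-up values for five hypothetical hitters (not real Statcast data).
hitters <- data.frame(
  bat_speed                  = c(68.0, 70.5, 72.0, 74.5, 77.0),
  whiff_per_swing            = c(0.18, 0.22, 0.21, 0.27, 0.31),
  squared_up_per_bat_contact = c(0.36, 0.33, 0.34, 0.30, 0.28)
)

# Pearson correlation of every other column with bat speed
others <- setdiff(names(hitters), "bat_speed")
cors <- sapply(hitters[others], function(col) cor(hitters$bat_speed, col))
print(round(cors, 3))
```

A positive value means the variable tends to rise with bat speed, and a negative value means it tends to fall.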

Let’s look at the regression!

We get the following equation

Bat Speed ~ -140.6 + 22.5*Whiff_Per_Swing – 9.94*squared_up_per_bat_contact + 233.3*percent_swings_competitive

We get an R-squared of 0.497, indicating that only 49.7% of the variability in bat speed is explained by the model. All variables are significant, and we see the same trends as the correlations showed.
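For those who would rather not use Excel, here is a hedged sketch of how a regression like this could be fit in R with `lm()`. The data is simulated from the equation above plus random noise, so it only illustrates the mechanics and will not reproduce my R-squared. The noise level, the variable ranges, and treating competitive swing % as a proportion between 0 and 1 are all assumptions.

```r
set.seed(1)
n <- 200

# Simulated stand-in data; real values would come from a Baseball Savant export.
sim <- data.frame(
  whiff_per_swing            = runif(n, 0.15, 0.35),
  squared_up_per_bat_contact = runif(n, 0.25, 0.40),
  percent_swings_competitive = runif(n, 0.85, 0.95)  # assumed to be a proportion
)

# Generate bat speed from the fitted equation in the post plus assumed noise.
sim$bat_speed <- -140.6 + 22.5 * sim$whiff_per_swing -
  9.94 * sim$squared_up_per_bat_contact +
  233.3 * sim$percent_swings_competitive + rnorm(n, sd = 1.5)

# Multiple linear regression with bat speed as the response
fit <- lm(bat_speed ~ whiff_per_swing + squared_up_per_bat_contact +
            percent_swings_competitive, data = sim)

print(round(coef(fit), 2))
r_squared <- summary(fit)$r.squared
print(round(r_squared, 3))
```

`summary(fit)` also prints the p-values used to judge whether each variable is significant.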

Well, it appears that there is more than meets the eye (literally) when it comes to bat speed and contact. Other variables, such as pitch selection, probably contribute to contact as well. So bat speed is a great indicator of quality of contact, but it is a moot point if a player is not making contact.

Note: As a professor who teaches data analytics, I decided to do this analysis in Excel since my students have to for the course. Although not great for large scale, you can explore data in Excel to get a start.

Dedication

This blog post is dedicated to my cousin Mike, who passed suddenly a couple months ago. He was the best food blogger in Orange County, and inspired me to do a blog. https://eatingmywaythroughoc.blogspot.com

Data

BaseballSavant

Who is Jack Kochanowicz?

Jack Kochanowicz burst onto the scene at the end of 2024 for the Angels, posting a 3.99 ERA, 104 ERA+, 1.19 WHIP, and 3.8% BB%. He came up as a high-groundball pitcher with the occasional strikeout. In 2025 he sports a 5.53 ERA, 74 ERA+, 1.59 WHIP, and 10.4% BB%, all well worse than what he did in 2024. On the surface we can see that his BB% went sky high, but what happened, and who is the real Jack Kochanowicz? Is the league catching up, or is he doing something different?

First, we will look at the changes from BaseballSavant year over year.

FB velo, average exit velocity, chase rate, and hard-hit % were about the same year over year. However, whiff % (+5%), K% (+6%), BB% (+7%), and barrel % (+4%) went up while GB% went down (-6%). So on the surface it appears there is more of a focus on swings and misses than groundballs, along with more walks and barrels. Interestingly, his xERA and xBA were around the same. Expected metrics are generally a better gauge because they strip out some of the randomness and luck, so it appears that maybe the 2025 version is more of who he is and he was lucky in 2024.

Let’s look deeper at the data.

First let’s look at pitch usage.

Pitch	2024	2025	Change
4-Seam Fastball	6.1%	18.4%	12.3%
Changeup	2.9%	10.9%	8.0%
Sinker	72.6%	47.8%	-24.8%
Slider	14.1%	16.3%	2.2%
Sweeper	4.3%	6.5%	2.2%

Kochanowicz has gone away from the sinker, with a higher reliance on the 4-seam fastball and changeup this season. His sinker has been hit harder this year, and his home run rate has skyrocketed even though he is throwing fewer sinkers.
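A usage table like this can be built from a pitch-by-pitch log. Below is a small R sketch with a made-up log (20 pitches, not his real data) that computes each pitch type’s share per year and the change in percentage points. The column names `year` and `pitch_name` are assumptions about how the export is laid out.

```r
# Made-up pitch log: one row per pitch (not real data).
pitch_log <- data.frame(
  year       = c(rep(2024, 10), rep(2025, 10)),
  pitch_name = c(rep("Sinker", 7), rep("Slider", 2), "4-Seam Fastball",
                 rep("Sinker", 5), rep("Slider", 2), rep("4-Seam Fastball", 3)),
  stringsAsFactors = FALSE
)

# Share of each pitch type within each year, in percent
usage <- prop.table(table(pitch_log$pitch_name, pitch_log$year), margin = 2) * 100

usage_change <- data.frame(
  pitch = rownames(usage),
  y2024 = as.numeric(usage[, "2024"]),
  y2025 = as.numeric(usage[, "2025"]),
  stringsAsFactors = FALSE
)

# Change in percentage points from 2024 to 2025
usage_change$change <- usage_change$y2025 - usage_change$y2024
print(usage_change)
```

To get splits by count, filter the log down to the pitches of interest (first pitches or two-strike pitches, for example) before building the table.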

This leads to the question: is he attacking hitters differently? Let’s look at the 1st-pitch distribution to see if there is the same trend.

1st Pitch	2024	2025	Change
4-Seam Fastball	3.0%	13.5%	10.5%
Changeup	0.4%	2.1%	1.7%
Sinker	75.2%	53.9%	-21.3%
Slider	17.1%	25.9%	8.8%
Sweeper	4.3%	4.6%	0.3%

A lot more 4-seam fastball and slider usage on the first pitch, and fewer sinkers. How about with 2 strikes?

2-strikes	2024	2025	Change
4-Seam Fastball	13.6%	24.0%	10.4%
Changeup	8.2%	17.6%	9.5%
Sinker	54.4%	39.4%	-15.1%
Slider	16.3%	10.4%	-5.9%
Sweeper	7.5%	8.6%	1.1%

Fewer sinkers and sliders, but more 4-seam fastballs and changeups.

It is clear that the approach throughout the entire plate appearance has drastically shifted from a lot of sinkers to a combo of 4-seam fastball and changeup. In 2024 he actually only threw 25 changeups! The sinker and slider have shown differing results the last two seasons.

Sinker

Year	wOBA	xwOBA	Difference (xwOBA - wOBA)
2025	0.360	0.349	-0.011
2024	0.330	0.348	0.018

Slider

Year	wOBA	xwOBA	Difference (xwOBA - wOBA)
2025	0.483	0.445	-0.038
2024	0.310	0.307	-0.003

For the sinker, his 2024 xwOBA was higher than his wOBA, indicating a little luck, but that has reversed in 2025, where his wOBA is higher than his xwOBA, indicating he has been a little unlucky. Overall the xwOBA is about the same year over year, so his sinker is about the same quality as last season.

Within each season, the wOBA and xwOBA for the slider were fairly close to each other, but the change year over year has been significant. Looking beyond this, the spin rate is similar to 2024. Exit velocity on the slider is up 2 mph, whiff % is up 3.4%, and launch angle is down 5 degrees. What gives?
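The arithmetic behind the two tables above can be sketched in a few lines of R. The wOBA and xwOBA values are copied from those tables. The difference column is expected minus actual, and the year-over-year step (2025 xwOBA minus 2024 xwOBA) is my own addition to show how much the underlying quality moved.

```r
# Values copied from the sinker and slider tables above.
results <- data.frame(
  pitch = c("Sinker", "Sinker", "Slider", "Slider"),
  year  = c(2025, 2024, 2025, 2024),
  woba  = c(0.360, 0.330, 0.483, 0.310),
  xwoba = c(0.349, 0.348, 0.445, 0.307),
  stringsAsFactors = FALSE
)

# Expected minus actual: negative means results were worse than the contact suggests
results$difference <- round(results$xwoba - results$woba, 3)
print(results)

# Year-over-year change in xwOBA (2025 minus 2024) for each pitch
signed_xwoba <- results$xwoba * ifelse(results$year == 2025, 1, -1)
xwoba_change <- round(tapply(signed_xwoba, results$pitch, sum), 3)
print(xwoba_change)
```

The sinker’s xwOBA barely moves between seasons, while the slider’s jumps by more than a hundred points.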

Let’s look at location

From an initial look it appears more sliders are in zone and middle-middle in 2025, but there are more sliders thrown in 2025. Time to isolate on just swings.

Looks like a bit more in the zone and closer to middle-middle.

The Angels broadcast talked about the sinker being up this year contributing to the use of the changeup more. Here is the plot of swings on the sinker.

To me, it looks like a similar distribution.

So who is Jack Kochanowicz? I think he is somewhere in the middle of 2024 and 2025 to date. Cheap answer, I know, but he drastically changed his approach to hitters while his expected metrics stayed about the same, which shows similar quality of contact from the hitters. Also, there is only a year and a half-ish of data on him. What is contributing to his higher ERA is his high walk %. I am definitely not a pitching coach, but based on the data he needs better command, more sinker usage, and either a fix to the slider location or less slider usage. He is a groundball pitcher who should use the Angels infield defense to his advantage and not worry as much about the strikeout.

It is fascinating to me how players can be so good for a time and then the numbers flip. I have always been interested in what they are doing differently and how they can get back to who they were. More analysis like this in the future!

Data

BaseballSavant

Putting a bow on the impact of the shift

I wrote over a year ago about the possible impact of banning the shift (Impact of Banning the Shift) and I revisited it for the 2024 season in May (Revisiting the impact of the shift). Now that the 2024 season has ended, let’s look at all the data. Again, we will look at just pulled baseballs on the ground (which the shift was used for) for the 2024 season.

Batting Average (2024)	Batting Average (2022)	Difference
0.207	0.222	-0.015

Interestingly, in May we found the batting average was 0.281. Granted, that was only a month and a half into the season, and it seems like the pitching caught up. In my original post about the shift, the estimated impact of the shift was -0.031. We see here with real data that the impact was actually smaller than estimated. The goal of banning it was more action, and it looks like more hits were not induced at the clip the league may have expected.
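As a quick illustration of how the comparison works, here is a small R sketch. The `pulled_gb` data frame is made up (one row per pulled ground ball, with a hypothetical `is_hit` flag), and the last two lines simply redo the subtraction with the two batting averages from the table above.

```r
# Made-up pulled ground balls: one row per batted ball (not real data).
pulled_gb <- data.frame(
  season = c(rep(2022, 9), rep(2024, 10)),
  is_hit = c(TRUE, TRUE, rep(FALSE, 7),
             TRUE, TRUE, rep(FALSE, 8))
)

# Batting average on pulled ground balls by season
avg_by_season <- tapply(pulled_gb$is_hit, pulled_gb$season, mean)
print(round(avg_by_season, 3))

# The same subtraction using the averages from the table above
table_difference <- 0.207 - 0.222
hits_per_100 <- round(100 * table_difference, 1)
print(hits_per_100)
```

With the table’s numbers, that works out to about 1.5 fewer hits per 100 pulled ground balls in 2024 than in 2022.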

Let’s look at the hitters that were identified as being the most negatively impacted by the shift in the first post (if they are still playing in MLB)

Name	Batting Average (2024)	Batting Average (2022)	Difference
Joey Gallo	0.080	0.071	+0.009
Byron Buxton	0.375	0.146	+0.229
Eddie Rosario	0.122	0.081	+0.041
Jose Ramirez	0.165	0.340	-0.175
Joc Pederson	0.276	0.250	+0.026
Salvador Perez	0.200	0.208	-0.008
Ozzie Albies	0.243	0.167	+0.076

Compared to the differences in May, these are mostly minuscule, with the largest positive change belonging to Byron Buxton and the largest negative to Jose Ramirez. In Jose’s case, he shoots the ball all over the field, while Buxton’s speed and better health may have had an impact.

We do not see the positive effects MLB wanted from banning the shift. Maybe there are cases up the middle where there are more hits now but for pulling the baseball there is minimal effect.

When I did my initial post, it was interesting that the impact would have been minimal, and then in May it looked like the opposite. This is definitely a case of regression to the mean and small-sample problems.

To me this was a very interesting result and I hope you enjoyed the ride!

Data

Baseball Savant

Quality Pitch % (QP%) and QP%+

There are many metrics to evaluate how good a pitcher is. From traditional metrics like ERA and WHIP to sabermetrics like FIP and WAR to Statcast-era metrics like Whiff% and CSW%, there are so many out there. Recently, there has been a trend toward treating a pitcher’s swing-and-miss ability as the main measure of how good he is, but really you should look at several. What I think is missing right now is a metric that combines command and stuff, so what I propose here is quality pitch % (QP%) and QP%+.

First we need to define what this metric encompasses. What really is a quality pitch? Well, it could be a pitch at the edge of the strike zone (the shadow zone), but also a swing-and-miss pitch. Even if it is not called a strike by the umpire or is put in play, a pitch on the edge of the strike zone is a good pitch. In addition, a pitch could be anywhere, and a hitter missing it shows it was a good pitch. Alas, we have a definition! The formula then becomes:

QP% = (# Shadow Pitches + # Swing and Miss Pitches)/(Total Pitches)

We will select the data with these criteria from BaseballSavant, looking at starting pitchers with at least 1000 pitches for the 2024 season and relief pitchers with at least 750 pitches in 2024.
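Here is a rough R sketch of the calculation on made-up data. The logical columns `in_shadow` and `is_whiff` are hypothetical flags you would derive from the Savant export. Two things here are assumptions on my part rather than part of the definition above: a pitch that is both a shadow pitch and a whiff is counted once, and QP%+ follows the usual plus-stat convention of 100 times a pitcher’s QP% divided by the QP% of the comparison group.

```r
# Made-up pitch-level data (not real). `in_shadow` and `is_whiff` are
# hypothetical flags for shadow-zone pitches and swings and misses.
pitches <- data.frame(
  pitcher   = c(rep("Pitcher A", 6), rep("Pitcher B", 4)),
  in_shadow = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE,
                TRUE, FALSE, FALSE, FALSE),
  is_whiff  = c(FALSE, TRUE, TRUE, FALSE, FALSE, FALSE,
                FALSE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Assumption: a pitch that is both a shadow pitch and a whiff counts once
pitches$quality <- pitches$in_shadow | pitches$is_whiff

# QP% per pitcher, in percent
qp_pct <- tapply(pitches$quality, pitches$pitcher, mean) * 100

# Assumed plus-stat convention: 100 * pitcher QP% / group QP%
group_qp <- mean(pitches$quality) * 100
qp_plus <- round(100 * qp_pct / group_qp)

print(round(qp_pct, 1))
print(qp_plus)
```

Under that convention, 100 is average for the group, so a QP%+ of 120 would mean a quality-pitch rate 20% better than the group.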

Starting Pitchers

Let’s look at the top starting pitchers in QP%

	Name	Quality Pitches	QP%	QP%+
1	Yamamoto, Yoshinobu	671	53.8%	126
2	Snell, Blake	836	52.7%	123
3	Crochet, Garrett	1198	52.5%	123
4	Gallen, Zac	1212	52.4%	122
5	Perez, Martin	1065	51.7%	121
	Ryan, Joe	1088	51.7%	121
7	Crawford, Kutter	1360	51.6%	120
8	Flaherty, Jack	1262	51.5%	120
	Castillo, Luis	1472	51.5%	120
	Ober, Bailey	1252	51.5%	120
11	Anderson, Tyler	1418	51.4%	120
	Imanaga, Shota	1232	51.4%	120
	Woo, Bryan	752	51.4%	120
	Gibson, Kyle	1341	51.4%	120

We see players that have great stuff like Snell, Yamamoto, Gallen, and Castillo but also interesting names like Crawford, Perez, and Anderson. Crawford (4.19 ERA, 103 ERA+) and Perez (4.36 ERA, 96 ERA+ but better with Padres) don’t jump off the page but Anderson (3.60 ERA, 118 ERA+ and All Star) has turned in an exceptional year.

Relief Pitchers

Let’s look at the top relief pitchers in QP%

	Name	Quality Pitches	QP%	QP%+
1	Nardi, Andrew	509	56.2%	146
2	Hader, Josh	581	54.3%	141
3	Clase, Emmanuel	489	54.3%	141
4	Miller, Mason	498	54.2%	141
5	Lee, Dylan	424	53.5%	139
6	Iglesias, Raisel	450	53.3%	138
7	Erceg, Lucas	467	53.2%	138
8	Yates, Kirby	498	53.0%	138
9	Estrada, Jeremiah	494	52.5%	136
10	Cano, Yennier	481	52.4%	136

We see similar trends to the starting pitchers, with Hader, Clase, Miller, and Iglesias among those here with fantastic stuff, but very surprisingly Nardi (5.07 ERA, 89 ERA+) is at the top.

So what does all this result in? Well, we can evaluate a pitcher’s ‘pitchability’ by using QP% but it doesn’t tell the whole story, just like the other metrics. However, I think we should look at this because if a pitcher locates, they should have more success. You can’t teach command, but you can work on spin rate, break, etc. in the pitching lab.

Data

Baseball Savant

(R)Shiny AAA Pitch Data

I first made an RShiny app in graduate school, when I discovered it in a course I was taking. The applications are simple yet powerful. Of course, I used baseball data for fun outside of my assignments. The goal was to look at pitch data and break it down by location. I wanted to bring that back to make something with AAA data!

Recently, MiLB has been tracking pitch data using Statcast. I used 2024 data from the Salt Lake Bees (Angels AAA affiliate) to make an RShiny app of pitch data. I don’t believe the data is complete, since not all stadiums have Statcast, but I used what there was. I isolated all swings, not including foul balls. My goal was to allow a user to filter by hits, in-play, whiffs, pitcher, and date. The output would be pitch location, separated by pitch type, and associated events. In addition, I added a stacked bar chart for frequency of pitch types by events. Let’s see how it turned out!

I didn’t do anything super fancy, but I think it is pretty neat. It is not BaseballSavant level but something you can do with open source data and software.

You can check it out here: http://cwatkins1123.shinyapps.io/Bees_Pitcher_App

I won’t gatekeep the code, so here it is for those interested:

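# Packages used below: shiny (app), dplyr (%>%, mutate, filter), ggplot2 (plots)
library(shiny)
library(dplyr)
library(ggplot2)

bees <- read.csv("bees_pitch_data_24_data.csv")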
bees$game_date <- as.Date(bees$game_date)
bees <- bees %>%
        mutate(events = ifelse(events == '', 'whiff', events))


tzone <- round(mean(bees$sz_top),2)
bzone <- round(mean(bees$sz_bot),2)
inKzone <- -.95
outKzone <- 0.95

kZone <- data.frame(
  x = c(inKzone, inKzone, outKzone, outKzone, inKzone)
  , y = c(bzone, tzone, tzone, bzone, bzone)
)


ui <- fluidPage(
  titlePanel("Bees Pitchers 2024", windowTitle = "Bees Pitchers 2024"),
  
  sidebarLayout(
    sidebarPanel(radioButtons("resultInput", "Result", choices = c("All", "Hits","In-Play","Whiffs"), selected = "All"),
                              uiOutput("playernameInput"),
                 sliderInput("dateInput",
                             "Dates:",
                             min = min(bees$game_date),
                             max = max(bees$game_date),
                             value = c(min(bees$game_date),max(bees$game_date)),
                             timeFormat="%m-%d-%Y")
    ),
    mainPanel(plotOutput("coolplot", width = "750px", height = "750px"),
              br(),
              plotOutput("coolplot2"),
              br(),
              textOutput("nrow"),
              br(),
              textOutput("credit"),
              br(),
              textOutput("signature"),
              br(),
              br())
    
  )
)
server <- function(input, output){
  output$playernameInput <- renderUI({
    selectInput("playernameInput", "Pitcher", 
                choices = sort(unique(bees$player_name)),
                selected = "Crouse, Hans")
  })
  filtered <- reactive({
    if(is.null(input$resultInput)) {return(NULL)}
    else if(input$resultInput == "Hits"){
      bees %>% 
        filter(player_name == input$playernameInput,
               events %in% c('single', 'double', 'triple', 'home_run'),
               game_date >= input$dateInput[1],
               game_date <= input$dateInput[2])
    }
    else if(input$resultInput == "In-Play"){
      bees %>% 
        filter(player_name == input$playernameInput,
               description == "hit_into_play",
               game_date >= input$dateInput[1],
               game_date <= input$dateInput[2])
    }
    else if(input$resultInput == "Whiffs"){
      bees %>% 
        filter(player_name == input$playernameInput,
               description %in% c("swinging_strike", "swinging_strike_blocked"),
               game_date >= input$dateInput[1],
               game_date <= input$dateInput[2])
    }
    else{
      bees %>% 
        filter(player_name == input$playernameInput,
               game_date >= input$dateInput[1],
               game_date <= input$dateInput[2])
    }
  })
  
  output$coolplot <- renderPlot({
    if(is.null(input$playernameInput)) {return(NULL)}
    ggplot(filtered(), aes(x = plate_x, y = plate_z)) + geom_point(aes(col = events)) +
      scale_y_continuous(limits = c(0,5)) +
      scale_x_continuous(limits = c(-2.2, 2.2)) + coord_equal() +
      geom_path(aes(x, y), data = kZone, lwd = 1, col = "red", alpha = .5) +
      labs(x = "x", y = "z", title = "Pitch Location") +
      theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 20),
            legend.title = element_text(face = "bold"))+facet_wrap(~pitch_name, ncol =2)
  }, height = 750, width = 750)
  
  output$coolplot2 <- renderPlot({
    if(is.null(input$playernameInput)) {return(NULL)}
    ggplot(filtered(),aes(fill = events, x = pitch_name))+
      # after_stat() replaces the deprecated ..count.. notation
      geom_bar(aes(y = after_stat(count / sum(count)))) +
      labs(x = "Pitch Type", y = "Frequency")
  })


  output$nrow <- renderText({
    if(is.null(input$playernameInput)) {return(NULL)}
    nn <-nrow(filtered())
    paste("Based on your criteria, there were", nn, "pitches found.")
  })
  
  output$credit<- renderText({
    paste("Data pulled from BaseballSavant")
  })
  
  output$signature <- renderText({
    paste("By Chris Watkins, Ph.D.")
  })
  
}
shinyApp(ui = ui, server = server)

Data

Baseball Savant

What happened to HRendon?

Anthony Rendon was an All-Star, a Silver Slugger, and 3rd in MVP voting in 2019 with the Nationals. His injury issues with the Angels have been well documented, and in 2024 he is on pace to play in his most games in a season with the Angels (over 58). What has interested me, though, is his lack of power from 2020-2024 after his increase in power from 2016-2019. Surely there is something in the numbers to pick out, right? Let’s see.

Year	Age	HR	SLG
2019	29	34	0.598
2020*	30	9	0.497
2021	31	6	0.382
2022	32	5	0.380
2023	33	2	0.318
2024	34	0	0.280

Anthony Rendon
*COVID shortened year

We see a significant decrease in homers (this trend can be seen in doubles as well), leading to a decrease in slugging. These should be prime years for Rendon. We could blame injuries for sure, but let’s compare to Mike Trout, who has had his share of injuries and is only two years younger than Rendon.

Year	Age	HR	SLG
2019	27	45	0.645
2020*	28	17	0.603
2021	29	8	0.624
2022	30	40	0.630
2023	31	18	0.490
2024	32	10	0.541

Mike Trout
*COVID shortened year

It’s worth noting in 2023 that Trout struggled hitting the fastball (that’s a story for another day), leading to more strikeouts and lower slugging. Regardless, we can see that when Trout was healthy, he was still Mike Trout (for the most part). What gives with Rendon? Well, for that we need to look at what he is swinging at and the contact metrics.

Year	Avg Exit Velo (MPH)	Avg Launch Angle	Barrel %	Chase %	Whiff %
2019	90.4	19.5	12%	20.6%	12.9%
2020*	90.1	19.5	6.3%	16.7%	14.7%
2021	89.1	22.3	5.6%	21.9%	17.2%
2022	89.6	18.7	8.3%	18.6%	19.6%
2023	90.1	16.2	4.8%	16.9%	17.9%
2024	88.1	14.1	2.5%	17.5%	12.5%

Anthony Rendon
*COVID shortened year

In terms of baseballs he swings at, Rendon is still elite in chase % and whiff %. So the pitches he is swinging at are still good pitches, and he makes contact. The average exit velocity has been similar, although 2024 is the lowest, but we see a significant decrease in barrel % and a varying launch angle. To me (not a hitting coach), it seems like he does not have a consistent swing because the launch angle has been up and down the last several years. This is the first year Statcast is tracking bat speed, and Rendon’s average bat speed is 68.4 MPH, which is not good at all. Interestingly, his 34.5% squared up % is elite.

Rendon’s eye at the plate is still elite, which gives promise for the future. Obviously he needs to stay on the field, but also needs a more consistent swing, more bat speed, and increase in launch angle for more consistent barrel %. Could be his injuries leading to this as well.

I, and all Angels fans, are hoping he figures it out for 2025.

Data

Baseball Savant

Baseball Reference

Can we perfectly predict exit velocity?

New Hawk-Eye technology is allowing us to measure bat speed for hitters, and the data has now become available through Baseball Savant. Exit velocity is related to the change in velocity (i.e., pitch speed to exit velocity), so it should depend on pitch speed, launch angle (how good the contact is), and bat speed. For the physics fanatics, it is related to momentum (p = mv). So, do we have everything we need to predict exit velocity perfectly? Let’s dive in!

I selected data from just August of contact (no foul balls) because we have enough data for a model (13,301 observations) from Baseball Savant.

First, let’s look at correlation between exit velocity and the predictive variables

Variable 1	Variable 2	Correlation
Exit Velocity	Pitch Speed	0.108
Exit Velocity	Launch Angle	0.149
Exit Velocity	Bat Speed	0.468

We see a positive correlation for each, meaning there is a positive trend between exit velocity and the other variables. These are weak correlations, with bat speed being the strongest of the three.

Next, we will model exit velocity (Exit_Velo ~ Pitch_Speed + Launch_Angle + Bat_Speed)

Note: release_speed = pitch_speed

With a linear regression model, we get a model that explains 23.6% of the variability in the data, with bat speed having the largest effect size. All variables are significant, but with a large data set this is expected. Let’s call it how it is: this model is not good, which is very interesting. Would more data fix this? Maybe, but the sample size is already large. What other variables could be impacting exit velocity? Maybe weather? From a physics perspective, we have the majority of what we need (except air friction from weather), so this is a surprising result, to me at least.

Being the physics lover that I am, let’s build a change in momentum statistic (Exit Velocity minus Pitch Speed).

	player_name	bat_speed	momentum_change
1	Cabbage, Trey	78.91	20.20
2	Narvaez, Carlos	75.12	17.10
3	Baker, Luken	74.89	16.20
4	Crawford, Brandon	79.51	15.60
5	Monasterio, Andruw	70.59	12.85
6	Haase, Eric	71.35	10.17
7	Sweeney, Trey	76.35	10.10
8	Riley, Austin	75.02	8.13
9	Cameron, Daz	70.83	8.02
10	Gonzalez, Romy	73.51	7.71
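A leaderboard like this can be sketched as follows. I am assuming the statistic is averaged over each player’s batted balls before ranking. The records, player names, and field names below are hypothetical placeholders, not real Baseball Savant rows.

```python
from collections import defaultdict

def momentum_change_by_player(batted_balls):
    """Average (exit velocity - pitch speed) per player, largest first."""
    totals = defaultdict(lambda: [0.0, 0])  # name -> [sum of differences, count]
    for ball in batted_balls:
        entry = totals[ball["player_name"]]
        entry[0] += ball["exit_velo"] - ball["pitch_speed"]
        entry[1] += 1
    averages = {name: total / count for name, (total, count) in totals.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)

# Hypothetical batted balls, for illustration only.
sample = [
    {"player_name": "Player A", "exit_velo": 100.0, "pitch_speed": 90.0},
    {"player_name": "Player A", "exit_velo": 95.0, "pitch_speed": 93.0},
    {"player_name": "Player B", "exit_velo": 105.0, "pitch_speed": 95.0},
]

for name, change in momentum_change_by_player(sample):
    print(f"{name}\t{change:.2f}")
```

In practice a minimum number of batted balls per player would probably be worth adding, since a handful of hard-hit balls can put a small-sample hitter at the top of the list.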

And compare that to the highest bat speeds.

	player_name	bat_speed
1	Stanton, Giancarlo	81.13
2	Walker, Jordan	80.28
3	Crawford, Brandon	79.51
4	Cabbage, Trey	78.91
5	Wallner, Matt	78.32
6	Wisdom, Patrick	77.86
7	Leon, Pedro	77.71
8	Judge, Aaron	77.49
9	Schwarber, Kyle	77.38
10	Adell, Jo	77.16

Looking at these lists, only two players in the top 10 for change in momentum are also in the top 10 for bat speed: Trey Cabbage and Brandon Crawford (only 20%!!).
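The overlap between the two lists is easy to double check with a few lines of Python. The names are copied from the two tables above, and the variable names are my own.

```python
# Top 10 by change in momentum (from the first table).
top_momentum_change = [
    "Cabbage, Trey", "Narvaez, Carlos", "Baker, Luken", "Crawford, Brandon",
    "Monasterio, Andruw", "Haase, Eric", "Sweeney, Trey", "Riley, Austin",
    "Cameron, Daz", "Gonzalez, Romy",
]

# Top 10 by bat speed (from the second table).
top_bat_speed = [
    "Stanton, Giancarlo", "Walker, Jordan", "Crawford, Brandon", "Cabbage, Trey",
    "Wallner, Matt", "Wisdom, Patrick", "Leon, Pedro", "Judge, Aaron",
    "Schwarber, Kyle", "Adell, Jo",
]

bat_speed_set = set(top_bat_speed)
overlap = [name for name in top_momentum_change if name in bat_speed_set]
share = len(overlap) / len(top_momentum_change)

print(overlap)         # ['Cabbage, Trey', 'Crawford, Brandon']
print(f"{share:.0%}")  # 20%
```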

The old adage is that a pitcher who throws fast “provides the power”, which may be the reason for these results. Or, more generally, a faster pitch leaves less room for a difference between exit velocity and pitch speed. There is more to learn now that bat speed is available, and it is exciting to see more metrics being released. Also, shout out to Jo Adell for being top 10 in bat speed in this data set!

Data

Baseball Savant

Revisiting the impact of the shift

I wrote over a year ago about the possible impact of banning the shift (Impact of Banning the Shift), so I wanted to revisit it for the 2024 season so far. This gives hitters and pitchers a full season (2023) to adjust to the change. Again, we will look at just pulled ground balls (which is what the shift was used against) for the 2024 season up to this point.

Batting Average (2024)	Batting Average (2022)	Difference
0.281	0.222	+0.059

We are definitely seeing an increase in batting average so far on pulled ground balls, which works out to about 6 more hits per 100 pulled ground balls. Recall that the impact in the previous post was -0.031, or about 3 fewer hits per 100 with the shift. A caveat here is that we are only about a month and a half into the season, so it could change by the end (which I will revisit!).
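The "hits per 100" framing is just the batting average difference scaled up. Here is a small sketch using the numbers from this post, where the function name is my own:

```python
def hits_per_100(avg_after, avg_before):
    """Extra hits per 100 balls in play implied by a batting average change."""
    return round((avg_after - avg_before) * 100, 1)

difference = round(0.281 - 0.222, 3)
print(difference)                  # 0.059
print(hits_per_100(0.281, 0.222))  # 5.9, i.e. about 6 more hits per 100

# The shift effect from the previous post, expressed the same way.
print(round(-0.031 * 100, 1))      # -3.1, i.e. about 3 fewer hits per 100
```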

Let’s look at the hitters who were identified as being the most negatively impacted by the shift in the previous post (if they are still playing in MLB):

Name	Batting Average (2024)	Batting Average (2022)	Difference
Joey Gallo	0.429	0.071	+0.358
Byron Buxton	0.552	0.146	+0.406
Eddie Rosario	0.214	0.081	+0.133
Jose Ramirez	0.235	0.340	-0.105
Joc Pederson	0.233	0.250	-0.017
Salvador Perez	0.467	0.208	+0.259
Ozzie Albies	0.375	0.167	+0.208

Nearly everyone on the list who was negatively impacted by the shift is showing more success in 2024 on pulled ground balls. Jose Ramirez sprays the ball all over the field, and Joc Pederson’s difference is not as large as the others, so something else may be going on there.

It will be interesting to see how this shakes out at the end of the season, but players are definitely seeing large improvements without the shift.

Data

Baseball Savant

Making the Case for RTB (Runners Total Bases) and RTB%

Many (if not all) of those in Sabermetrics think the RBI is dead and that it should not be considered when evaluating a player. This is because RBIs depend on the players around you getting on base and into scoring position, so they are not truly an individual statistic. Opponents of that thinking say there is something to be said for a player being ‘clutch’ and coming through in those situations, because more runs lead to more wins. My thinking is that there is a case in the middle, which is looking at how a player moves runners who are on base. To score runs you need to get on base and then move the runners until they eventually score, and there are more ways to move runners than just hits. This is where Runners Total Bases (RTB) comes in.

So what is RTB? Well, it is the number of bases that runners already on base are moved by the player at the plate.

For example, let’s say Shohei Ohtani is on first base with Mike Trout at the plate. Trout singles and Ohtani moves to 3rd base. Then, Trout’s RTB for that at bat is 2.

Simple, right? Naysayers would say that RTB alone is the same as the RBI because it depends on opportunities, so let’s fix that with RTB%.

For example, let’s assume Trout had 4 RTB and 2 opportunities in one game (i.e. 2 runners were on base in total during his PAs). Then RTB% = 4 / 2 = 2. This means that, on average, Trout moves each runner 2 bases.
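To make the definitions concrete, here is a minimal sketch of RTB and RTB% in Python. The data layout is an assumption of mine: each plate appearance is a list of (start_base, end_base) pairs, one per runner on base, with 4 meaning the runner scored. How to treat runners who are thrown out is not settled above, so this sketch does not try to handle it.

```python
def rtb(plate_appearance):
    """Bases gained by the runners who were on base during one plate appearance."""
    return sum(end - start for start, end in plate_appearance)

def rtb_rate(plate_appearances):
    """RTB% = total RTB / total runners on base (opportunities)."""
    total_rtb = sum(rtb(pa) for pa in plate_appearances)
    opportunities = sum(len(pa) for pa in plate_appearances)
    if opportunities == 0:
        return None  # no runners on base, so RTB% is undefined
    return total_rtb / opportunities

# Ohtani on first, Trout singles, Ohtani moves to third.
print(rtb([(1, 3)]))   # 2

# A hypothetical game: two plate appearances with one runner each,
# and each runner moves two bases (4 RTB on 2 opportunities).
game = [[(1, 3)], [(2, 4)]]
print(rtb_rate(game))  # 2.0
```

Plays that should not count, such as errors and walks, could be filtered out of the list before calling `rtb_rate`.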

From here we can build metrics like RTB%+ or add weights for the types of hits that contribute to RTB, among other things. I would remove walks, as well as plays that result in errors, since those are on the defense.

Is this metric perfect? Absolutely not, and it should be tested, but I think it is a start toward improving on the RBI and getting a metric that evaluates a player fairly. The way I see this helping is optimization: get players with high OBP and players with high RTB% and you will score runs, ideally with the high RTB% players hitting behind the high OBP players.

You probably noticed that I don’t have data showing an example of RTB. The data is available via Baseball Savant, but it would take a bit of time to code and apply the logic. With that in mind, and having a full-time job, I wanted to put the idea out there (good or not) and apply it later. Maybe in a future post I can take a small sample of data to show this in action.