Balks: An Illustrative & Quantitative History

By David Venturi

May 9, 2016

What is a balk? Pro Baseball Insider has the simplest definition:

In the simplest sense, a balk is when the pitcher tries to intentionally deceive the hitter or runner. It can be a flinch on the mound after the pitcher gets set, a deceptive pick off attempt, or even just as simple as dropping the ball once you become set. There are many actions that can result in a balk. When runners are on base and a balk is called, all the runners move up one base.

A full list of the actions that constitute a balk can be found here.

Balks are rare. Since 2000, there have been only 100-200 balks per season, which is roughly one every 12 to 24 games (or 216 to 432 innings pitched, counting both teams at 9 innings each) in a full 2430-game season.

Balks are difficult to spot. Balks sometimes go unnoticed by fans, players, and umpires. What constitutes a balk might be subjective depending on the umpire. Balks might even be ignored by umpires depending on the situation.

The definition of a balk has changed over time. Throughout baseball history, there have been a number of tweaks to the balk rule. With each tweak, balk totals for the subsequent season tended to spike or dip.

Questions for Investigation

  1. How have balks trended throughout baseball history? Does the trend align with rule changes, rule enforcements, etc.?
  2. The balk rule is designed to limit pitcher deception towards the baserunner. Did the balk rule changes and enforcements in the mid-to-late 1900s spark an increase in stolen base attempts?
  3. Are there significantly more balks called per inning pitched in the regular season compared to the postseason? If yes, why might this be occurring?
  4. Who is the all-time balk king? Who is the modern-day (post-2000) balk king? Who is the balk iron man (most innings pitched without a balk)?

Dataset

The Lahman Baseball Database contains complete batting and pitching statistics from 1871 to 2015, plus fielding statistics, standings, team stats, managerial records, post-season data, and more. The Master (player names, DOB, and biographical info), Pitching (regular season pitching statistics), PitchingPost (postseason pitching statistics), and Batting (regular season batting statistics) tables are used in this balk analysis.

The full database and a detailed description of its contents can be found on Sean Lahman's website.

Analysis

1. Balks Throughout Baseball History

How have balks trended throughout baseball history? Does the trend align with rule changes, rule enforcements, etc.?

Balks per inning pitched have been on a slow upward trajectory since 1885 or so, with spikes in 1899, 1950, 1963, and 1988. The spike in 1988 (1 balk for every ~40 innings pitched) was so dramatic that the season is referred to as The Year of the Balk. All of these spikes coincide with rule changes and enforcements. As per Recondite Baseball:

  • 1898/1899: The first balk rule dealing with runners on base was inserted into the rule book in 1898. It stated a pitcher was compelled to throw to a base if he made a motion in that direction. The following year, the balk rule was refined to say a pitcher could not fake a pickoff throw.
  • 1950: A new rule requiring a one-second stop before delivering a pitch with men on base was implemented in 1950.
  • 1963: The National League cracked down on balks ... for the 1963 season. An order to umpires to clamp down on balks resulted in twenty balks called in the first twenty games of the year.
  • 1988 (The Year of the Balk): The 1988 version [of the rules] replaced “complete stop” with “single complete and discernible stop, with both feet on the ground.” This slight change, intended to make balk calls more uniform throughout major league baseball, instead sparked one of [the] most frustrating summers ever for major league hurlers.
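The yearly balks-per-inning series behind this trend can be sketched as follows. This is a minimal illustration, not the code used for the original analysis: it assumes rows shaped like the Lahman Pitching table, with the columns yearID, BK, and IPouts (outs recorded, so 3 outs equal 1 inning), and the sample rows are made up.

```python
from collections import defaultdict


def balks_per_inning_by_year(rows):
    """Return {year: balks per inning pitched} from pitching rows.

    Each row is a mapping with the columns yearID, BK, and IPouts
    (outs recorded; 3 outs = 1 inning pitched).
    """
    balks = defaultdict(int)
    outs = defaultdict(int)
    for row in rows:
        # Skip rows where either value is missing (blank in the CSV).
        if row["BK"] in ("", None) or row["IPouts"] in ("", None):
            continue
        year = int(row["yearID"])
        balks[year] += int(row["BK"])
        outs[year] += int(row["IPouts"])
    # Convert outs to innings and divide, skipping years with no innings.
    return {
        year: balks[year] / (outs[year] / 3)
        for year in sorted(outs)
        if outs[year] > 0
    }


# Made-up sample rows (not real data), shaped like csv.DictReader output.
sample_rows = [
    {"yearID": "1987", "BK": "1", "IPouts": "600"},
    {"yearID": "1988", "BK": "4", "IPouts": "450"},
    {"yearID": "1988", "BK": "1", "IPouts": "150"},
]
rates = balks_per_inning_by_year(sample_rows)
for year, rate in rates.items():
    print(year, round(rate, 4))
```

With the real data, the same function could be fed the rows of a local copy of the Pitching table read through csv.DictReader; the file location is left to the reader.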

2. Balks & Stolen Base Attempts

The balk rule is designed to limit pitcher deception towards the baserunner. Did the balk rule changes and enforcements in the mid-to-late 1900s spark an increase in stolen base attempts?

The post-1950 balk rule changes and enforcements coincide with an increase in balks (blue) and an increase in stolen base attempts (red). Other factors could be at play here (e.g., an increase in player speed or managers calling for more stolen base attempts as strategy changed), but it appears likely that the balk rule changes and enforcements were effective in promoting the running game.

3. Regular Season Balks vs. Postseason Balks

Are there significantly more balks called per inning pitched in the regular season compared to the postseason? If yes, why might this be occurring?

3.1 Visual Trend

There is high variation in postseason balks per inning pitched (green). This variation is to be expected because of the small sample size of innings pitched in each postseason. For reference, there are ~28 thousand innings pitched in the postseason pitching table compared to ~3.75 million in the regular season pitching table. Quick mean calculations in Excel suggest that regular season BK/IP is much larger than postseason BK/IP, but it is difficult to confirm from this yearly visualization alone which overall value is larger.
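One way to compare overall rates despite the noisy yearly values is an innings-weighted mean and standard deviation of BK/IP, the kind of summary reported in the results of the hypothesis test. The sketch below is an assumption about how such statistics could be computed (each row's BK/IP weighted by its innings pitched, with a population-style weighted variance); the original post does not spell out its exact formula, and the rows are made up.

```python
import math


def weighted_mean_and_std(values, weights):
    """Return the weighted mean and weighted standard deviation.

    The variance is the weight-averaged squared deviation from the
    weighted mean (no small-sample correction is applied).
    """
    total_weight = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / total_weight
    variance = sum(
        w * (v - mean) ** 2 for v, w in zip(values, weights)
    ) / total_weight
    return mean, math.sqrt(variance)


# Made-up pitcher rows (not real data): (balks, innings pitched).
rows = [(0, 100.0), (2, 50.0), (1, 50.0)]
rates = [bk / ip for bk, ip in rows]  # per-row BK/IP
innings = [ip for _, ip in rows]      # weights

mean, std = weighted_mean_and_std(rates, innings)
print(mean, std)
```

Weighting by innings means the weighted mean equals total balks divided by total innings, so a pitcher with a handful of innings cannot dominate the average.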

3.2 Hypothesis Test

Setup

Question: Are there significantly more balks called per inning pitched in the regular season compared to the postseason?

$H_0: \mu_D \le 0$

$H_A: \mu_D > 0$

where $H_0$ is the null hypothesis, $H_A$ is the alternative hypothesis, and $\mu_D = \mu_{\text{regular season BK/IP}} - \mu_{\text{postseason BK/IP}}$ is the population mean difference in balks per inning pitched for the regular season compared to the postseason.

A right-tailed t-test comparing two independent means is appropriate for this scenario for the following reasons:

  • Regular season BK/IP and postseason BK/IP are treated as random samples from two independent populations. They are considered samples because the Lahman Baseball Database does not have data for all of history, and some data is missing for the years the database does cover. Regular season and postseason data are treated as independent of one another, assuming there is no significant relationship between the pitchers who advance to the postseason and the pitchers who commit an extremely high or low number of balks.
  • Population standard deviation is unknown.
  • Assumption: the BK/IP data is approximately normal. This assumption is not as important since the sample size of innings pitched is so large (~3.75 million innings pitched in the regular season pitching table and ~28 thousand in the postseason pitching table) and the Central Limit Theorem can be invoked.

Because the variances are not roughly equal, as illustrated by the above figure, the unpooled standard error (i.e., Welch's t-test) is appropriate for this test.

Results

Regular Season:
Weighted mean (BK/IP): 0.00356572754435
Weighted standard deviation (BK/IP): 0.00945781564073
Sample size (IP): 3,761,644.67

Postseason:
Weighted mean (BK/IP): 0.00256125009702
Weighted standard deviation (BK/IP): 0.0292809931418
Sample size (IP): 25,768.67

Unpooled SE: 0.000182471469219
t: 5.50484660225
df: 25,804.51
p-value: < 0.0001 (reported as 0 after rounding)

There is sufficient evidence at any conventional level of significance to support the claim that there are significantly more balks called per inning pitched in the regular season compared to the postseason.
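The test statistics can be recomputed from the summary values reported in the results. The sketch below uses the standard unpooled (Welch) standard error and the Welch-Satterthwaite degrees of freedom. One assumption to flag: with roughly 25,800 degrees of freedom, the right-tail p-value is approximated here with the standard normal distribution rather than the t distribution, which keeps the example within Python's standard library.

```python
import math
from statistics import NormalDist


def welch_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Right-tailed Welch test of mean1 > mean2 from summary statistics."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1
    v2 = sd2 ** 2 / n2  # squared standard error of group 2
    se = math.sqrt(v1 + v2)  # unpooled standard error
    t = (mean1 - mean2) / se
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    # Normal approximation to the right-tail p-value (an assumption;
    # reasonable only because df is very large here).
    p = 1.0 - NormalDist().cdf(t)
    return se, t, df, p


se, t, df, p = welch_from_summary(
    0.00356572754435, 0.00945781564073, 3761644.66667,  # regular season
    0.00256125009702, 0.0292809931418, 25768.6666667,   # postseason
)
print(f"SE={se:.12f}  t={t:.4f}  df={df:.2f}  p={p:.2e}")
```

The standard error, t, and df should land close to the reported values, and the p-value comes out tiny but nonzero, which is why it rounds to 0.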

So why was 1 balk called every ~280 innings in the regular season but only 1 every ~390 innings in the postseason? My speculation is that a combination of the two factors below is responsible for the discrepancy:

  • Umpires "swallow the whistle" in the postseason and tend to make fewer controversial calls. Balks are somewhat of a grey area and perhaps umpires are more conservative when the stakes are high in playoff games. From the linked article, "psychologists have found that people view inaction as less causal, less blameworthy and less harmful than action."
  • Pitchers are more careful not to commit a balk when the stakes are high, as is the case in the postseason. The 162-game regular season is a marathon and it is likely difficult to maintain a consistently high level of focus throughout. Perhaps pitchers consciously apply a higher level of focus in the playoffs with regard to balks.

4. Balk Kings & Iron Men

Who is the all-time balk king? Who is the modern-day (post-2000) balk king? Who is the balk iron man (most innings pitched without a balk)?

All-time Balk King

Name | First Game | Last Game | IP | BK | IP/BK
Don Heinkel | 1988-04-07 | 1989-05-18 | 62.67 | 7 | 8.95
Don Rowe | 1963-04-09 | 1963-07-18 | 54.67 | 5 | 10.93
Ravelo Manzanillo | 1988-09-25 | 1995-05-09 | 63.00 | 5 | 12.60
Ray Hayward | 1986-09-20 | 1988-07-05 | 78.67 | 5 | 15.73
German Gonzalez | 1988-08-05 | 1989-09-25 | 50.33 | 3 | 16.78
Tim Fortugno | 1992-07-20 | 1995-07-26 | 110.33 | 6 | 18.39
German Jimenez | 1988-06-28 | 1988-10-01 | 55.67 | 3 | 18.56
Steven Kent | 2002-04-04 | 2002-09-22 | 57.33 | 3 | 19.11
Yunesky Maya | 2010-09-07 | 2013-05-21 | 59.00 | 3 | 19.67
Gene Walter | 1985-08-09 | 1988-09-30 | 182.67 | 9 | 20.30

For pitchers with more than 50 innings pitched, Don Heinkel is the all-time IP/BK leader with 1 balk every ~9 innings pitched (7 balks in 62 and 2/3 innings). Heinkel, like many of the pitchers on this leaderboard, pitched in the era influenced by 1988, The Year of the Balk, so perhaps he doesn't deserve the Balk King title based on true "skill" alone.
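The leaderboard logic (career innings per balk, a 50-inning minimum, and an infinite ratio for pitchers who never balked) can be sketched as follows. This is an illustration rather than the original code: the function names are hypothetical, innings are passed as outs in the style of Lahman's IPouts column, and the "Short Stint" entry is invented to show the cutoff. The other three entries use career totals from the tables in this post.

```python
import math


def ip_per_balk(ip_outs, balks):
    """Innings pitched per balk; infinite when the pitcher never balked."""
    innings = ip_outs / 3  # 3 outs = 1 inning pitched
    return innings / balks if balks else math.inf


def balk_leaderboard(pitchers, min_innings=50.0):
    """Return (name, IP/BK) pairs sorted from most to least balk-prone.

    pitchers is an iterable of (name, ip_outs, balks) career totals.
    Only pitchers with more than min_innings innings are kept.
    """
    qualified = [
        (name, ip_per_balk(ip_outs, balks))
        for name, ip_outs, balks in pitchers
        if ip_outs / 3 > min_innings
    ]
    return sorted(qualified, key=lambda entry: entry[1])


careers = [
    ("Kirk Rueter", 5754, 0),  # 1918.00 IP, 0 BK
    ("Don Rowe", 164, 5),      # 54.67 IP, 5 BK
    ("Don Heinkel", 188, 7),   # 62.67 IP, 7 BK
    ("Short Stint", 30, 2),    # invented: 10 IP, below the cutoff
]
board = balk_leaderboard(careers)
for name, ratio in board:
    print(name, round(ratio, 2))
```

Sorting the same list in descending order of innings, restricted to entries with an infinite ratio, would give the iron man ranking instead.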

Modern-day Balk King

Name | First Game | Last Game | IP | BK | IP/BK
Steven Kent | 2002-04-04 | 2002-09-22 | 57.33 | 3 | 19.11
Yunesky Maya | 2010-09-07 | 2013-05-21 | 59.00 | 3 | 19.67
Evan Reed | 2013-05-16 | 2014-09-19 | 55.67 | 2 | 27.83
Al Alburquerque | 2011-04-15 | 2015-09-29 | 225.00 | 8 | 28.12
Franklin Morales | 2007-08-18 | 2015-10-04 | 486.00 | 17 | 28.59
Nick Neugebauer | 2001-08-19 | 2002-09-25 | 61.33 | 2 | 30.67
Ambiorix Burgos | 2005-04-23 | 2007-05-26 | 160.33 | 5 | 32.07
Matt Andriese | 2015-04-10 | 2015-10-04 | 65.67 | 2 | 32.83
Edgmer Escalona | 2010-09-10 | 2013-08-18 | 100.00 | 3 | 33.33
Travis Phelps | 2001-04-19 | 2004-09-11 | 105.67 | 3 | 35.22

For pitchers with more than 50 innings pitched in the post-2000 era (after balk rates normalized following the 1988 rule change), Steven Kent is the modern-day IP/BK leader with 1 balk every ~19 innings pitched (3 balks in 57 and 1/3 innings). Kent pitched in the 2002 season. Perhaps the more interesting name on this list is Franklin Morales. Morales, still active in 2016, has committed an astounding 17 balks in 486 innings, which equates to 1 balk every ~29 innings. His sample size of 486 innings pitched is more than double anyone else's on the leaderboard. For me, Franklin Morales is the modern-day balk king.

Balk Iron Man

Name | First Game | Last Game | IP | BK | IP/BK
Kirk Rueter | 1993-07-07 | 2005-07-29 | 1918.00 | 0 | inf
Sam Jones | 1951-09-22 | 1964-10-03 | 1643.33 | 0 | inf
Eric Milton | 1998-04-05 | 2009-06-27 | 1582.33 | 0 | inf
Don Larsen | 1953-04-18 | 1967-07-07 | 1548.00 | 0 | inf
Tom Brewer | 1954-04-18 | 1961-09-27 | 1509.33 | 0 | inf
Dick Hall | 1952-04-15 | 1971-09-25 | 1259.67 | 0 | inf
Paul Lindblad | 1965-09-15 | 1978-10-01 | 1213.67 | 0 | inf
Chad Billingsley | 2006-06-15 | 2015-07-18 | 1212.33 | 0 | inf
Steve McCatty | 1977-09-17 | 1985-09-25 | 1188.33 | 0 | inf
Bill Travers | 1974-05-19 | 1983-07-17 | 1120.67 | 0 | inf
Trevor Hoffman | 1993-04-06 | 2010-09-29 | 1089.33 | 0 | inf
Clem Labine | 1950-04-18 | 1962-04-24 | 1079.67 | 0 | inf
Jonathon Niese | 2008-09-02 | 2015-10-04 | 1068.33 | 0 | inf
Dave Boswell | 1964-09-18 | 1971-09-17 | 1065.33 | 0 | inf
Scott Baker | 2005-05-07 | 2015-05-02 | 1064.67 | 0 | inf

Kirk "Woody" Rueter is the all-time innings pitched leader without a balk. Rueter pitched 1918 balk-less innings over a career that spanned 13 years, ~275 more innings than the second-place Sam Jones. Perhaps even more impressive is the fact that Rueter began his career in 1993, when balk-calling rates had not yet settled down from the highs caused by the rule change in 1988, i.e., The Year of the Balk. Jonathon Niese is probably the active pitcher on this leaderboard who has the best shot at catching Rueter. Niese is only 29 years old and is still a starting pitcher who logs ~150-200 innings per year. It would take Niese ~4.5 more balk-less years at his current pace to take the title of Balk Iron Man away from Rueter.


This post was a project for the "Investigate a Dataset" phase of my Udacity Data Analyst Nanodegree. The code used to generate this analysis is located here. Data wrangling steps are outlined as well.