Sports Reference Blog

Archive for the 'Statgeekery' Category

2020 WAR Update

16th March 2020

As we approach the beginning of the 2020 season, we have made some updates to our Wins Above Replacement calculations.  You may notice some small changes to figures as you browse the site. As always, you can find full details on how we calculate WAR here.

Defensive Runs Saved Changes

Last week, we updated Defensive Runs Saved (DRS) totals across the site with new figures from Baseball Info Solutions.  The new methodology involves breaking down infielder defense using the PART system - assigning run values to Positioning, Air Balls, Range, and Throwing.  Under the new system, an infielder’s total DRS is the sum of his Air Balls, Range, and Throwing runs saved, while Positioning runs saved are credited to the team as a whole.  You can read more about the updates in the Sports Info Solutions blog.  The PART system applies to all infielders since 2013.

Folding these numbers into WAR, we see some significant changes for individual player seasons.  The 2019 Oakland A’s get even more recognition for defense on the left side of their infield, with shortstop Marcus Semien gaining 0.7 WAR and third baseman Matt Chapman gaining 1.6 WAR from the new DRS numbers, lifting both players above Mike Trout and into second and third place respectively on the 2019 AL WAR leaderboard.  Chapman’s 1.6 additional WAR represents the largest single-season change in this update.

On the other end of the spectrum, we see Adrian Beltre with the most significant drop in this update, losing 1.5 WAR in 2015.

Since we use DRS to measure the quality of a team’s defense, these new values also impact pitcher WAR values.  Team total DRS changed by as much as 46 runs for a given team and season - the 2019 Dodgers defense improved from 75 DRS to 121 DRS by non-pitchers under the new system.  Once applied to a specific pitcher, however, the changes to WAR are much smaller in magnitude than the changes to individual fielders. The most extreme example is Hyun-Jin Ryu, who pitched 182.2 innings in front of the 2019 Dodgers defense.  Considering the Dodgers defense to be 46 runs better across the entire season, and considering that Ryu was the pitcher for 13.52% of the Dodgers’ balls in play in 2019, we adjust our expected runs allowed for Ryu by 6.2 runs for the season. After following the rest of the steps in our pitching WAR calculation, the end result is a drop of 0.3 WAR for the season.  All other changes to pitching WAR from this change to team defense are smaller than Ryu’s 0.3 WAR drop in 2019.
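The Ryu arithmetic above can be reproduced in a few lines of Python. This is only a sketch of the single step described in the paragraph (scaling the team-level DRS change by the pitcher's share of balls in play), not the full pitching WAR calculation. The function and variable names are made up for illustration.

```python
def defense_adjustment_runs(team_drs_change: float, share_of_balls_in_play: float) -> float:
    """Portion of a team-level DRS change attributed to one pitcher.

    Hypothetical helper: it simply scales the team change by the share
    of the team's balls in play that came with this pitcher on the mound.
    """
    return team_drs_change * share_of_balls_in_play


# Figures quoted in the post for the 2019 Dodgers and Hyun-Jin Ryu.
team_drs_change = 121 - 75   # non-pitcher DRS, new system minus old system
ryu_share = 0.1352           # Ryu's share of the Dodgers' balls in play

ryu_adjustment = defense_adjustment_runs(team_drs_change, ryu_share)
print(round(ryu_adjustment, 1))  # 6.2
```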

Park Factors

Park factors for 2018 have been re-computed to include the 2019 season, since WAR uses a three-year average for park factors when computing pitching WAR.  The most significant change here is the Miami Marlins, whose pitching park factor rose from 90 to 95 (where <100 represents a pitcher’s park and >100 represents a hitter’s park).  José Ureña sees the biggest benefit from this, with his 2018 WAR rising by 0.7 wins. All other changes to pitching WAR from updated park factors are smaller than Ureña’s 0.7 WAR gain in 2018.

New Game Logs from Retrosheet (1904-1907)

Last month, we updated the site with new data from Retrosheet, including new game logs for players from 1904 to 1907.  Having game-level data allows us to be more precise in our WAR calculations, since we can consider the specific ballparks a pitcher played in and the opponents he faced.

Take Christy Mathewson in 1907 as an example.  Prior to this change, we used the league average (excluding his team) of 3.36 runs per nine innings as the expected quality of his opposition.  However, with game-level data, we can see that Mathewson’s actual opponents averaged 3.55 runs per nine innings, showing that Mathewson was probably used strategically and started more games against better opponents.  Indeed, Mathewson pitched in 10 of the Giants’ 22 games against the league’s best offense, the Pirates, as well as 7 of the Giants’ 22 games against the Cubs, the NL’s second-best offense. Against the Dodgers and Cardinals, who each struggled offensively and scored fewer than 3 runs per game, Mathewson pitched in just 8 games total.

Knowing this about his usage, we can set more accurate expectations for how many runs an average player would have allowed under Mathewson’s circumstances.  By adjusting the quality of his opposition, we expect an average pitcher to have allowed about 7 more runs over the course of the season, resulting in a bump of 0.9 WAR in 1907.  All other changes to pitching WAR from new game log data are smaller than Mathewson’s 0.9 WAR gain in 1907.
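As a rough illustration of where a figure like "about 7 more runs" can come from, the sketch below scales the gap between the two opponent-quality numbers by a pitcher's innings. The innings total used here is an assumption and is not stated in the post, and the real calculation includes further steps (park, defense, and so on) that this sketch ignores. The function name is hypothetical.

```python
def extra_expected_runs(league_r9: float, opponent_r9: float, innings: float) -> float:
    """Extra runs an average pitcher would be expected to allow when the
    actual opponents score more (per nine innings) than the league baseline."""
    return (opponent_r9 - league_r9) * innings / 9.0


LEAGUE_R9 = 3.36         # league average excluding his team (from the post)
OPPONENT_R9 = 3.55       # average of Mathewson's actual opponents (from the post)
ASSUMED_INNINGS = 315.0  # ASSUMPTION: innings total is not given in the post

extra = extra_expected_runs(LEAGUE_R9, OPPONENT_R9, ASSUMED_INNINGS)
print(round(extra, 2))  # 6.65
```

Under that assumed workload, the result rounds to the "about 7" runs mentioned above.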

Baserunning and Double Plays from Play-by-Play Data (1931-1947)

When calculating runs from baserunning and double plays, we use play-by-play data from seasons where it is complete enough to credit players for things like scoring from first on a double, advancing from first to third on a single, and hitting into fewer double plays than expected.

In the past, we have taken play-by-play data into account back to 1948 for baserunning and double plays, because the data further back than that has been incomplete and could give players an advantage in their WAR simply by having more complete play-by-play records than their peers.  As this data has become more complete over time, we have moved this cutoff back to 1931. The data is still somewhat sparse for games that took place during World War II (1943-45), but we felt it was worth including those years as well.

Pete Reiser of the Brooklyn Dodgers was skilled at taking extra bases, and it showed in the play-by-play accounts.  In 1942, he took extra bases at a rate of 55%, compared to the league average of 45%. Additionally, the Dodgers were tied with the Cardinals as the league’s top scoring offense, so Reiser had many opportunities to put his speed to use.  He scored from first on doubles a league-leading ten times in just 15 opportunities, and also scored from second on a single 24 times, good for 5th in the NL that year, in just 29 opportunities. Using this play-by-play data while computing WAR gives Reiser an additional 1.2 WAR in 1942.  All other changes to batting WAR from this change are smaller than Reiser’s 1.2 WAR gain in 1942.
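For readers who want to see the conversion rates behind those counts, here is a small Python sketch using only the numbers quoted above. The helper name is made up, and this is plain division rather than the baserunning runs formula used in WAR.

```python
def conversion_rate(successes: int, opportunities: int) -> float:
    """Share of opportunities on which the runner took the extra base."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    return successes / opportunities


# Pete Reiser, 1942 (counts quoted in the post).
first_to_home_on_double = conversion_rate(10, 15)
second_to_home_on_single = conversion_rate(24, 29)

print(f"{first_to_home_on_double:.1%}")   # 66.7%
print(f"{second_to_home_on_single:.1%}")  # 82.8%
```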

Caught Stealing Totals from Game Logs (1926-1940)

When crediting runners for how many runs they contributed with their baserunning, we take into account their stolen base and caught stealing totals.  Caught stealing totals are missing for many players between 1926 and 1940, but we have complete game logs for players in that span.

In the past, when we didn’t have a caught stealing total for a player, we would estimate how many times they were likely to have been caught stealing based on the league’s stolen base success rate and the ways the player reached base during the season.

We are now using actual caught stealing totals from the players’ game logs, so there are some changes for players who did considerably better or worse than we had been estimating.

Take, for example, Freddie Lindstrom.  In 1928, the Giants third baseman stole 15 bases, but his official season stat line does not have caught stealing available.  Previously, we had estimated that he was caught stealing 11.57 times, based on everything else we knew about his performance and the league he played in.  However, game logs indicate that Lindstrom was caught 21 times, nearly twice as often as we had estimated. This difference gets folded into our baserunning runs calculation and results in a drop of 0.4 WAR.  All other changes to batting WAR from this change are smaller than Lindstrom’s 0.4 WAR drop in 1928.
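To put the Lindstrom numbers in perspective, the sketch below compares the stolen base success rate implied by the old estimate with the rate implied by the game logs. It only uses the figures quoted above and does not reproduce the estimation method or the baserunning runs calculation. The function name is hypothetical.

```python
def success_rate(stolen_bases: float, caught_stealing: float) -> float:
    """Stolen base success rate: SB / (SB + CS)."""
    return stolen_bases / (stolen_bases + caught_stealing)


SB = 15              # Lindstrom's 1928 stolen bases
ESTIMATED_CS = 11.57  # previous estimate
ACTUAL_CS = 21        # total from game logs

estimated_rate = success_rate(SB, ESTIMATED_CS)
actual_rate = success_rate(SB, ACTUAL_CS)
extra_times_caught = ACTUAL_CS - ESTIMATED_CS

print(f"{estimated_rate:.1%} estimated vs {actual_rate:.1%} actual")
print(round(extra_times_caught, 2))  # 9.43
```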

Biggest Career Movers

Hall of Famer Ernie Lombardi sees the biggest change to his career WAR with this update, sinking from 46.8 WAR to 39.5 WAR, a drop of 7.3 wins.  The largest gain goes to infielder Lonny Frey, who picks up 5.2 wins. Both these players played in the 1930s and 1940s and saw big changes because of their baserunning.  Lombardi is known for being one of the slowest runners in baseball history, and this update shows that the numbers back that reputation. Frey was a fast runner in an era where stolen bases were rare, so he has been underrated to this point when it comes to his baserunning contributions.

On the mound, previously cited Hall of Famer Christy Mathewson is the big winner.  As discussed above, his WAR now recognizes how his manager would use him against tougher opponents, and he sees his career WAR jump by 2.2 wins.  Barney Pelty experiences the biggest drop, losing 1.9 wins.

We’ve highlighted some of the more extreme changes here, but to see full lists of the largest changes to season and career WAR totals, please see the spreadsheet here.

We're very excited about these new additions and hope you enjoy them as well. Thanks to Baseball Info Solutions for their contributions. Please let us know if you have any comments, questions or concerns.

Posted in Advanced Stats, Announcement, Baseball-Reference.com, Data, Features, History, Leaders, Play Index, Statgeekery, WAR | 5 Comments »

Ad-Free and Play Index Changes Coming to Baseball-Reference.com

4th March 2020

The Play Index launched on Baseball-Reference.com over thirteen years ago and has been one of the most used research tools for baseball ever since. We've made a few additions over the years, but the tools have largely stayed the same and the price has only gone from $29/year to $36/year during those thirteen years.

The Sports Reference sites have continued to grow in traffic and advertising revenue over that time to the extent that the Play Index and our ad-free options are a very, very small portion of our revenue. Most of that is on us, as we have not done a great job of promoting and marketing tools that are highly valued by a dedicated group of users. The Baseball Play Index represents less than 4% of our revenue and ad-free memberships are less than 1%. In addition, the Play Index tools are complicated to maintain and manage, and quite frankly are a money-loser for us at this time. It's well past time to re-think how these tools are set up within our constellation of sites.

While Sports Reference is doing quite well overall, I'm not comfortable with having so much of our revenue dependent on advertising. We are very beholden to search engines continuing to send us traffic, and likewise the ad market can be fickle and difficult for a small to medium size operator to navigate.

Also, advertising on the sites does not make it easier for you to answer the questions you have. This is our primary mission. We maintain a relatively low level of advertising on the sites (at least compared to your regional newspaper), and we are loath to add additional advertising units or more intrusive units. Some of you may use an ad blocker, in which case we are making no money from your use of the site at all, and the audience for our ad-free product has proven to be very small as well.

A subscription model aligns our interests much better with our users' interests as well. I realize that users are being asked to sign up for lots of subscriptions these days, but we feel the tools within the Play Index are so specialized and useful that they warrant a paywall.

So we are making some changes. The Play Index for each site will be moving to Stathead.com. Stathead.com will become the center for all of our subscription products. We expect these products to include tools and information beyond just a redesigned set of Play Index tools. This won't happen all at once, but we'll start with baseball and then proceed through the remainder of our sports. Also, we will be ending our ad-free product and instead Stathead memberships will have ad-free built-in. There just aren't enough users to justify a separate ad-free product. These changes will begin this month and continue through April on baseball and then continue with the other sites after that.

If you are a subscriber, we will make every effort to make certain you are happy with the options we provide to convert your ad-free or Play Index subscription over to Stathead including the option of a refund on your subscription. You will be hearing more from us about the changes over the next few weeks as we will email users directly.

During the deployment of these changes, the Play Index on Baseball-Reference.com (and the to-be-launched Stathead.com Baseball) will be free. They will continue to be free through at least April 30th. If you are a current subscriber to either of our products, we have already extended your subscription by an additional two months during this free period.

--sean forman

Posted in Announcement, Baseball-Reference.com, Play Index, Redesign, Statgeekery | 27 Comments »

2019-20 NBA Player Projections Added to Basketball Reference

18th October 2019

Basketball-Reference has added 2019-20 NBA player projections, using our Simple Projection System, which is adapted from Tom Tango's Marcel the Monkey Forecasting System.

Since we're not controlling substitution patterns, all projections are for per-36 minutes statistics. In addition to the full list of player projections linked above, we are also displaying them on individual player pages. Please use these responsibly and enjoy! Also, take the time to check out some other features you may be unaware of in the site's Frivolities section.

Posted in Announcement, Basketball-Reference.com, Features, Statgeekery | Comments Off on 2019-20 NBA Player Projections Added to Basketball Reference

Old Hoss Radbourn: 59 or 60 Wins?

10th April 2019

Keen-eyed Baseball-Reference users have written us asking about an update made to the statistics of Hall of Fame pitcher Old Hoss Radbourn. In the past, we had displayed Radbourn with 59 wins in his 1884 season with Providence. However, in a recent update, Radbourn has been bumped up to 60 wins.

Before we delve into what the correct number is, let's zoom out a bit, first. It will probably surprise most baseball fans to discover that there was no league-mandated rule in place for assigning wins and losses before 1950. Wins were awarded, but they were entirely up to the discretion of the official scorer. Compounding this issue is the fact that while the leagues tracked pitcher wins for much of the Deadball Era, they made many errors, and even briefly stopped officially counting pitcher wins and losses for a few years in the 1910s as ERA was first gaining popularity. A SABR member named Frank Williams meticulously corrected the record, and his research formed the basis for the accepted totals you see today.

Williams unveiled his groundbreaking work in 1982 with the article All the Record Books Are Wrong. I'd encourage you to read the article at that link (and thank you to John Thorn for re-posting it in its entirety).

Williams was the original source for the 59 wins attributed to Radbourn in 1884. He arrived at this number by determining what practices were used at the time to determine pitcher wins and losses. Earlier record books had retroactively applied the 1950 rule to Radbourn's era and given him 60 wins as a result. However, it was discovered by Frederick Ivor-Campbell that this was done in error and that one of his 1884 wins (on July 28) should have actually been credited to his teammate Cyclone Miller.

Miller was indeed the correct winner if you applied the 1950 rule, since he pitched 5 innings and left with a lead. However, Radbourn pitched 4 shutout innings and was more effective. Practice in the 1880s allowed for the more effective pitcher to be deemed the winning pitcher, per Pete Palmer. While Williams originally concluded that Miller was the correct winner of this game (leaving Radbourn with 59 wins on the season), he has recently concluded that, using the practices of the time, Radbourn is the correct winner and therefore has 60 wins in 1884.

Ironically, we end up back at the original 60 wins attributed to Radbourn's 1884 season all the way back in 1920, but hopefully we've learned a good deal along this path. We hope this serves as a reminder of how valuable the research done by SABR members is.

In conclusion, we are now showing that Old Hoss Radbourn was credited with 60 wins in his 1884 season.

Posted in Baseball-Reference.com, History, Statgeekery | 7 Comments »

2019 WAR Update

21st March 2019

As we approach the beginning of the 2019 season, we have made some updates to our Wins Above Replacement calculations.  You may notice some small changes to figures as you browse the site. As always, you can find full details on how we calculate WAR here.

Openers

Last season, the Tampa Bay Rays popularized the concept of the opener, where the first pitcher of the game is expected to pitch considerably less than a typical starting pitcher.  The opener is followed by a “headliner” or “bulk guy,” who enters the game after the opener but takes on responsibilities similar to a traditional starting pitcher. The Rays found success with this approach, and several other teams followed suit.

Our Wins Above Replacement calculation treats starting pitchers and relief pitchers differently, since relief pitchers have much lower ERAs than starters.  The opener strategy throws a wrinkle into this, since the opener is not expected to go deep into the game and the headliner is, so we have a starting pitcher who is behaving more like a relief pitcher and vice versa.

Tom Tango posted some thoughts on this last year, and the discussion in the comments of that post produced a working definition for the opener:

  1. Determine if we have an opener.  This pitcher must start the game and have either at most 2 innings pitched (6 outs), or at most 9 batters faced.
  2. Determine if we have a headliner. This pitcher must meet two criteria:
     a. Length of appearance: At least 4 innings pitched (12 outs), or at least 18 batters faced
     b. Order of appearance: They are the first reliever, OR they are the second reliever, but the first reliever entered mid-inning, and the second reliever started the following inning

If both these pitchers exist, then we have a game with an opener and a headliner.  Both pitchers must exist; you cannot have an opener without a headliner, and vice versa.
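The working definition above translates fairly directly into code. The Python sketch below is one possible reading of those rules, not the code used on the site. The `Appearance` structure, its field names, and the sample games are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Appearance:
    """One pitcher's line in a single game (hypothetical structure)."""
    name: str
    outs: int                 # outs recorded (3 per inning pitched)
    batters_faced: int
    entry_inning: int         # inning in which the pitcher entered
    entered_mid_inning: bool  # True if he entered after the inning began


def is_opener(starter: Appearance) -> bool:
    # At most 2 innings pitched (6 outs), or at most 9 batters faced.
    return starter.outs <= 6 or starter.batters_faced <= 9


def is_long_enough(reliever: Appearance) -> bool:
    # At least 4 innings pitched (12 outs), or at least 18 batters faced.
    return reliever.outs >= 12 or reliever.batters_faced >= 18


def find_opener_and_headliner(game: List[Appearance]) -> Optional[Tuple[str, str]]:
    """Return (opener, headliner) names, or None if the game has no such pair.

    `game` lists one team's pitchers in order of appearance, starter first.
    """
    if len(game) < 2 or not is_opener(game[0]):
        return None
    first = game[1]
    if is_long_enough(first):
        return game[0].name, first.name
    if len(game) >= 3:
        second = game[2]
        # The second reliever qualifies only if the first reliever entered
        # mid-inning and the second started the following inning.
        took_over_next_inning = (
            first.entered_mid_inning
            and not second.entered_mid_inning
            and second.entry_inning == first.entry_inning + 1
        )
        if took_over_next_inning and is_long_enough(second):
            return game[0].name, second.name
    return None


# Made-up games for illustration only.
opener_game = [
    Appearance("Opener A", outs=3, batters_faced=4, entry_inning=1, entered_mid_inning=False),
    Appearance("Bulk B", outs=15, batters_faced=20, entry_inning=2, entered_mid_inning=False),
]
bridged_game = [
    Appearance("Opener C", outs=4, batters_faced=7, entry_inning=1, entered_mid_inning=False),
    Appearance("Bridge D", outs=2, batters_faced=3, entry_inning=2, entered_mid_inning=True),
    Appearance("Bulk E", outs=18, batters_faced=24, entry_inning=3, entered_mid_inning=False),
]
traditional_game = [
    Appearance("Starter F", outs=18, batters_faced=25, entry_inning=1, entered_mid_inning=False),
    Appearance("Reliever G", outs=9, batters_faced=11, entry_inning=7, entered_mid_inning=False),
]

print(find_opener_and_headliner(opener_game))       # ('Opener A', 'Bulk B')
print(find_opener_and_headliner(bridged_game))      # ('Opener C', 'Bulk E')
print(find_opener_and_headliner(traditional_game))  # None
```

Because the function returns a pair or nothing, it reflects the requirement that an opener cannot exist without a headliner, and vice versa.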

Using this definition, we have updated our WAR calculation to treat openers like relievers and headliners like starters.  This change has been applied to all seasons since 1960, the first year we apply a starter/reliever adjustment.

Ryan Yarbrough, the Rays’ most frequent headliner, is an instructive case.  He pitched 38 games and 147.1 innings, but started just 6 times.  By the above definition, 16 of his relief appearances were as a headliner.  Prior to this adjustment, the Rays’ rookie had 0.9 WAR for 2018. After the adjustment, Yarbrough has 1.5 WAR.  The new calculation recognizes that Yarbrough is behaving more like a traditional starting pitcher, and holds his performance to the same standard it would if Yarbrough had started those games.

Park Factors

Park factors for recent seasons have been re-computed to be three-year rolling averages. For instance, 2017 Park Factors now encompass 2016-2018. This is something that needs to be done each year when the season ends.
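A minimal Python sketch of a three-year rolling window is shown below. It assumes an unweighted mean of the previous, current, and following seasons, which matches the 2016-2018 window in the example above but is otherwise an assumption; the actual park factor method may weight or regress the seasons differently. The single-season values are invented for illustration.

```python
from typing import Dict


def three_year_park_factor(single_year: Dict[int, float], year: int) -> float:
    """Unweighted mean of the single-season factors for year-1, year, year+1.

    ASSUMPTION: a plain average; the post only says "three-year rolling averages".
    """
    window = [single_year[y] for y in (year - 1, year, year + 1)]
    return sum(window) / len(window)


# Hypothetical single-season park factors (100 = neutral).
single_year_factors = {2016: 96.0, 2017: 99.0, 2018: 102.0}

factor_2017 = three_year_park_factor(single_year_factors, 2017)
print(factor_2017)  # 99.0
```

This also shows why the figures need to be recomputed each year: the 2018 factor cannot be finalized until the 2019 season is in the window.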

Catcher Defense Prior to 1953

With help from Sean Smith of baseballprojection.com (and of an unnamed team front office) and baserunning statistics from Pete Palmer, we have now incorporated catcher defense for the years 1890 through 1952 based on stolen bases, caught stealing, errors, passed balls, and, from 1925 on, wild pitches.  Prior to this update, these players’ defensive abilities were judged only based on errors and passed balls.

Duke Farrell is a particularly noteworthy beneficiary of this change.  His career WAR rises by nearly 8 wins, because he played in an era (1888-1905) with a lot of stolen base attempts and did a better job of throwing out runners than his contemporaries.

This change also impacts pitchers’ WAR figures, since we have more information about the quality of defenses to take into account.  For instance, Jack Taylor and Kid Nichols of the 1904 Cardinals see their WAR numbers rise by more than a win each after accounting for the fact that their catchers threw out fewer runners than the rest of the league.  Indeed, the Cardinals’ primary backstop Mike Grady saw his WAR drop by two wins with this update.

On the flipside, legendary pitcher Cy Young loses more than 4 wins over his career after accounting for the above-average work his teammates did behind the plate throughout his career.

We’ve highlighted some of the more extreme changes here, but to see full lists of the largest changes to season and career WAR totals, please see the spreadsheet here.

Posted in Advanced Stats, Announcement, Baseball-Reference.com, Data, Statgeekery, WAR | 12 Comments »

“Own Goals” Added to NBA/ABA Box Scores

12th March 2019

At Basketball Reference, we're constantly working to beef up the accuracy and content of our historical box scores. A coverage map of our progress can be seen here.

We recently undertook a mission to ensure that the sum of player points within our box scores correctly adds up to their team's points in that game. We uncovered an interesting wrinkle in the way the NBA (and the ABA) attributed points to players and teams decades ago.

Read the rest of this entry

Posted in Announcement, Basketball-Reference.com, History, Statgeekery | Comments Off on “Own Goals” Added to NBA/ABA Box Scores

Full Shooting Details for Every* 50-Pt Game in NBA History

6th February 2019

Through the games of February 5, 2019, there have been 533 50-point games in NBA history (497 in the regular season and 36 in the playoffs). We have a box score with FGM, FTM and Points scored for every player for every game in NBA history. What we don't always have (particularly for older seasons) is FGA and FTA. However, we have made a concerted effort to get these details for every 50-point game in league history, and we now have the full shooting details for all of these performances with the exception of one.

Read the rest of this entry

Posted in Announcement, Basketball-Reference.com, Data, Features, History, Play Index, Statgeekery | 4 Comments »

Get to Know Mike Daum & Chris Clemons: CBB’s Next 3,000-Pt Scorers

4th November 2018

They've been playing college basketball for well over 100 years. And yet only eight men in the history of major men's college basketball have managed to score 3,000 career points: Pete Maravich, Freeman Williams, Lionel Simmons, Alphonso Ford, Doug McDermott, Harry Kelly, Keydren Clark and Hersey Hawkins. Notably, none of these players joined the 3,000-pt club in the same season.

In the 2018-19 season, we could see membership in this club jump from eight to 10 as Campbell's Chris Clemons and South Dakota State's Mike Daum seem poised to become the first pair of players in NCAA history to join the 3,000-point club in the same season. Clemons, a 5'9" high-flyer, and Daum, a 6'9" double-double machine, have few things in common in style of play, but they each enter their senior seasons with identical career totals of 2,232 points. Every NCAA D-I men's basketball player who has entered their senior year with 2,200+ points has gone on to reach 3,000 that season.

Read the rest of this entry

Posted in Announcement, CBB at Sports Reference, Data, History, Statgeekery, Uncategorized | Comments Off on Get to Know Mike Daum & Chris Clemons: CBB’s Next 3,000-Pt Scorers

Ejection Totals and In-Game Tendencies Added to Manager Pages

24th October 2018

For manager pages on Baseball-Reference, we have added a column for ejections to their primary Managerial Stats table. Bobby Cox's career 162 ejections make for a nice finishing piece on his collection of accolades. We have ejections data for managers all the way back to the 1889 season, so even classics like John McGraw are fully accounted for. We'll also take the opportunity to mention that if you want to dig into what the causes of these ejections were, Retrosheet's Managers section will have that for you.

We have also added a new Managerial Tendencies table to managers' pages, showing how often their teams employed certain strategies and how their rate compared to the league they were managing in. We show a manager's tendencies in stolen base attempts at 2nd and 3rd, as well as how often their teams attempted sacrifice bunts, issued intentional walks, or made player substitutions.

Using one recent example, in Dusty Baker's final year with the Washington Nationals, his players attempted to steal 3rd base on 2.9% of the chances they had. Using 100 as the league average, Baker in 2017 had a league-adjusted rate of 180, meaning that Baker's team was attempting this almost twice as much as the average NL squad that season.
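The league-adjusted figure can be read as a simple index. The Python sketch below assumes the index is the team's rate divided by the league rate, times 100; the post does not spell out the formula, so treat that as an assumption. Under it, the league rate implied by Baker's numbers can be backed out. Function names are hypothetical.

```python
def league_adjusted_rate(team_rate: float, league_rate: float) -> float:
    """Index where 100 means the team matched the league rate (assumed formula)."""
    return 100.0 * team_rate / league_rate


def implied_league_rate(team_rate: float, index: float) -> float:
    """Invert the index to recover the league rate it implies."""
    return team_rate * 100.0 / index


# Dusty Baker's 2017 Nationals: steals of 3rd attempted on 2.9% of chances, index of 180.
league_rate = implied_league_rate(2.9, 180)
print(round(league_rate, 2))  # 1.61 (percent of chances)
```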

We have intentional walk tendencies back to 1955, while the other managerial tendencies are available since 1925. If you have any questions about this new feature or any other section of Baseball-Reference, feel free to contact us through our feedback form.

Posted in Announcement, Baseball-Reference.com, Data, Features, History, Statgeekery, Trivia | 3 Comments »

Introducing the WNBA Player Season Finder

10th August 2018

Regular Basketball-Reference users are well acquainted with the Play Index, which allows us to compare players across eras and slice and dice season-level data by many criteria. Today we are introducing the WNBA Player Season Finder, which will be accessible from both the Play Index page and from our WNBA home page. We have WNBA stats back to the league's inaugural 1997 season, which means you can now search all of WNBA history with this tool.

Just like our NBA Player Season Finder, with the new WNBA tool you can do single-season, combined season and total season searches. For example, with the combined season search, you can now create franchise career leaderboards, maybe to see how far ahead in first place Tamika Catchings is among point scorers in Indiana Fever history. Or with the total seasons search, you can now execute a search like players with the most qualified seasons of 2 blocks per game; Margo Dydek and Lisa Leslie lead with nine seasons each finishing with that mark in their career.

Of course, current season stats are also searchable with the Player Season Finder, so you can give them some perspective with past stats. A'ja Wilson is burning up the league in her first WNBA season, currently averaging over 20 points per game. Here's a look at the others in WNBA history who finished with 20 points per game in their rookie season.

Query Results Table
Column groups: Totals (G, GS); Per Game (MP through FTA); Shooting (FG% through eFG%)
Rk  Player            Season  Tm   Lg    PTS   G   GS  MP    FG   FGA   2P   2PA   3P   3PA  FT   FTA  FG%   FT%   2P%   3P%   eFG%
1   Cynthia Cooper    1997    HOU  WNBA  22.2  28  28  35.1  6.8  14.5  4.4  8.7   2.4  5.8  6.1  7.1  .470  .864  .508  .414  .553
2   Seimone Augustus  2006    MIN  WNBA  21.9  34  34  33.1  8.3  18.2  7.4  15.7  0.9  2.5  4.4  4.9  .456  .897  .473  .353  .481
3   A'ja Wilson       2018    LVA  WNBA  20.3  29  29  30.8  7.1  16.0  7.1  16.0  0.0  0.0  6.0  7.7  .446  .785  .446        .446
Provided by Basketball-Reference.com: View Original Table
Generated 8/10/2018.

Stay tuned for more additions to the WNBA section of our site here on the Sports-Reference Blog. If you have any questions or suggestions, feel free to contact us through our feedback form.

Posted in Advanced Stats, Announcement, Basketball-Reference.com, Features, History, Leaders, Play Index, Stat Questions, Statgeekery | 4 Comments »