Bill James' Pythagorean formula has for decades done an excellent job estimating a baseball team's winning percentage from very little data: if the average runs scored and allowed are denoted respectively by ${\rm RS}$ and ${\rm RA}$, there is some $\gamma$ such that the winning percentage is approximately ${\rm RS}^\gamma / ({\rm RS}^\gamma + {\rm RA}^\gamma)$. One important consequence is to determine the value of different players to the team, as it allows us to estimate how many more wins we would have given a fixed increase in run production. We summarize earlier work on the subject, and extend the earlier theoretical model of Miller (who estimated the run distributions as arising from independent Weibull distributions with the same shape parameter; this has been observed to describe the observed run data well). We now model runs scored and allowed as being drawn from independent Weibull distributions where the shape parameter is not necessarily the same, and then use the Method of Moments to solve a system of four equations in four unknowns. Doing so yields a predicted winning percentage that is consistently better than earlier models over the last 30 MLB seasons (1994 to 2023). This comes at a small cost as we no longer have a closed form expression but must evaluate a two-dimensional integral of two Weibull distributions and numerically estimate the solutions to the system of equations; as these are trivial to do with simple computational programs it is well worth adopting this framework and avoiding the issues of implementing the Method of Least Squares or the Method of Maximum Likelihood.
翻译:暂无翻译