
The Exponential Distribution

Problem

In a random sample of 1,000 Premier League players, the average professional career lasts 12 years, and career length follows an exponential distribution. Joe Schmoe is a promising new prospect. How likely is it that his career will last exactly 8 years?

Solution: the exponential distribution

To answer this, we need the probability density function (PDF) of the exponential distribution. For a continuous random variable X with rate parameter λ, the PDF is:

$$f(x;\lambda) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & x < 0 \end{cases}$$

In this formula, λ is the rate parameter: the average number of events per unit of time (here, retirements per year of career). Because the mean of an exponential distribution is 1/λ, the rate can be estimated by the reciprocal of the sample mean. In our example:

$$\lambda = \frac{1}{\bar{x}} = \frac{1}{12} \approx 0.0833 \text{ per year}$$
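As a sketch of how λ would be estimated from data, the snippet below simulates 1,000 hypothetical career lengths (synthetic data, not real Premier League records) from an exponential distribution with a 12-year mean, then estimates λ as the reciprocal of the sample mean. It cross-checks the estimate with SciPy's `expon.fit`, fixing the location at 0 so that the fitted `scale` is the sample mean. The seed and the simulated sample are illustrative assumptions.

```python
import numpy as np
from scipy.stats import expon

# Synthetic data (assumption): 1,000 career lengths with a true mean of 12 years
rng = np.random.default_rng(seed=42)
careers = rng.exponential(scale=12, size=1000)

# The maximum likelihood estimate of the rate is 1 / sample mean
lambda_hat = 1 / careers.mean()

# Cross-check with SciPy: with loc fixed at 0, the fitted scale equals the sample mean
loc, scale = expon.fit(careers, floc=0)

print(f"sample mean      = {careers.mean():.2f} years")
print(f"lambda_hat       = {lambda_hat:.4f} per year")
print(f"1 / fitted scale = {1 / scale:.4f} per year")
```

With a sample this large, the estimate should land near the true rate of 1/12 ≈ 0.083.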

Now that we have the rate parameter, we can evaluate the PDF at x = 8 to get our answer:

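import math

lambda_ = 1 / 12  # rate parameter: 1 / (mean career length of 12 years)
x = 8             # career length of interest, in years
res = lambda_ * math.e ** (-lambda_ * x)  # f(8) = λe^(-λx)
print(res)
# Output: 0.042784759919382666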
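A minimal sketch of how this density relates to an actual probability: the probability of a career ending in a one-year window centered on 8 years (7.5 to 8.5, an arbitrary illustrative choice) should be close to f(8) × 1 year. The helper names `pdf` and `prob_between` are hypothetical, and `prob_between` uses the closed-form CDF 1 − e^{−λx} that is discussed later in this post.

```python
import math

lambda_ = 1 / 12  # rate parameter (per year)

def pdf(x: float, lam: float) -> float:
    """Exponential density: lam * exp(-lam * x) for x >= 0, else 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def prob_between(a: float, b: float, lam: float) -> float:
    """P(a <= X <= b) for 0 <= a <= b, via the CDF F(x) = 1 - exp(-lam * x)."""
    return math.exp(-lam * a) - math.exp(-lam * b)

density = pdf(8, lambda_)
window = prob_between(7.5, 8.5, lambda_)  # one-year window centered on 8 years

print(f"f(8)               = {density:.5f}")
print(f"P(7.5 <= X <= 8.5) = {window:.5f}")
```

The two numbers agree to roughly four decimal places, which is why a density value can be read as "probability per year" near a point.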

Visualizing the PDF of the Exponential Distribution

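We have now calculated the probability density for a career lasting exactly 8 years. Note that this is a density rather than a probability: for a continuous variable, the probability of any single exact value is zero, so f(8) ≈ 0.043 measures how concentrated careers are around 8 years. In practice, there is no need to compute the PDF by hand when packages like SciPy are at our disposal.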

Let’s use SciPy to reproduce the same result. An important note from the SciPy documentation:

A common parameterization for expon is in terms of the rate parameter lambda, such that pdf = lambda * exp(-lambda * x). This parameterization corresponds to using scale = 1 / lambda.
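A quick sketch of what `scale = 1/λ` means in practice: evaluated on a grid of career lengths, `expon.pdf(x, scale=1/λ)` matches λe^{−λx}, and the distribution's mean equals the scale (12 years). Passing λ itself as `scale` (an easy slip) describes a distribution whose mean is about 0.083 years instead. The 0 to 40 year grid is an arbitrary choice.

```python
import numpy as np
from scipy.stats import expon

lambda_ = 1 / 12
x = np.linspace(0, 40, 81)  # career lengths from 0 to 40 years

scipy_pdf = expon.pdf(x, scale=1 / lambda_)  # SciPy's scale parameterization
manual_pdf = lambda_ * np.exp(-lambda_ * x)   # rate parameterization

print(np.allclose(scipy_pdf, manual_pdf))    # the two parameterizations agree
print(expon(scale=1 / lambda_).mean())       # mean = scale = 1/λ = 12 years
print(expon(scale=lambda_).mean())           # mistake: treats λ as the scale
```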

In the following snippet, we create an exponential random variable with scale = 1/λ = 12 and evaluate its PDF at 8 to verify our manual calculation above.

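from scipy.stats import expon

ex_dist = expon(scale=1/lambda_)  # frozen exponential distribution, scale = 1/λ = 12
ex_dist.pdf(8).item()             # .item() converts the NumPy scalar to a Python float
# Output: 0.042784759919382666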

Now that we’ve calculated the PDF, let’s visualize the shape of the PDF for an exponential distribution with Plotly.

import numpy as np
import plotly.graph_objects as go

# Evaluate the PDF on 100 points between the 5th and 95th percentiles
x_cont = np.linspace(ex_dist.ppf(0.05), ex_dist.ppf(0.95), 100)
expon_continuous = ex_dist.pdf(x_cont)

fig = go.Figure()
fig.add_trace(go.Scatter(
    x=x_cont,
    y=expon_continuous,
    mode='lines'
))

# Mark x = 8 and annotate the density value computed earlier (res)
fig.add_vline(x=8)
fig.add_annotation(x=8,
                   ax=35,
                   y=res,
                   xshift=5,
                   yshift=5,
                   text="f(8) ≈ 0.043",
                   xanchor='left',
                   yanchor='bottom',
                   showarrow=True,
                   arrowhead=1)

fig.update_layout(
    xaxis=dict(dtick=2, title="# of years until retirement"),
    yaxis=dict(title="Probability density"),
    width=800,
    title="Exponential Distribution for Career Longevity with λ ≈ 0.083"
)
fig.show()

[IMG]

Cumulative Distribution Function for the Exponential Distribution

Now, let’s slightly rephrase our question.

In a random sample of 1,000 Premier League players, the average professional career lasts 12 years, and career length follows an exponential distribution. Joe Schmoe is a promising new prospect. What is the probability that his career will be shorter than 8 years?

To answer this, we need the area under the PDF between 0 and 8 years. This area is shaded blue in the plot below:

[IMG]
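One way to confirm that this shaded area is a probability is to integrate the PDF numerically. The sketch below uses `scipy.integrate.quad` (chosen here for illustration; any numerical integrator would do) to integrate λe^{−λx} from 0 to 8; the result should be about 0.487. The helper name `exp_pdf` is hypothetical.

```python
import math
from scipy.integrate import quad

lambda_ = 1 / 12  # rate parameter (per year)

def exp_pdf(x: float) -> float:
    """Exponential density with rate lambda_ (valid for x >= 0)."""
    return lambda_ * math.exp(-lambda_ * x)

# Area under the PDF from 0 to 8 years equals P(X <= 8)
area, abs_err = quad(exp_pdf, 0, 8)
print(f"P(X <= 8) ≈ {area:.4f} (estimated error {abs_err:.1e})")
```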

To find this area, we can evaluate the cumulative distribution function (CDF) of the exponential distribution at 8. The CDF, F(x) = P(X ≤ x), gives the probability that X is at most x. First, let’s go over its formula:

$$F(x;\lambda) = P(X \le x) = \begin{cases} 1 - e^{-\lambda x} & x \ge 0 \\ 0 & x < 0 \end{cases}$$
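The CDF follows from integrating the PDF from 0 to x. Working it through for our example, with λ = 1/12, gives the value we expect SciPy to return:

$$F(x;\lambda) = \int_0^x \lambda e^{-\lambda t}\,dt = \left[-e^{-\lambda t}\right]_0^x = 1 - e^{-\lambda x}$$

$$F\left(8; \tfrac{1}{12}\right) = 1 - e^{-8/12} = 1 - e^{-2/3} \approx 1 - 0.5134 = 0.4866$$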

Now, let’s use SciPy to evaluate the CDF at 8, and then visualize the result with Plotly.

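ex_dist = expon(scale=1/lambda_)  # frozen exponential distribution, scale = 1/λ = 12
ex_dist.cdf(8).item()             # P(X ≤ 8), returned as a Python float
# Output: 0.486582880967408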

# Evaluate the CDF on the same grid used for the PDF plot
cdf = ex_dist.cdf(x_cont)

fig = go.Figure()
fig.add_trace(go.Scatter(
    x=x_cont,
    fill='tozeroy',
    y=cdf,
    mode='lines'
))

fig.update_layout(
    xaxis=dict(dtick=2, title="# of years until retirement"),
    yaxis=dict(title="Cumulative probability P(X ≤ x)"),
    width=800,
    title="CDF for Exponential Distribution with λ ≈ 0.083"
)

# Mark x = 8 and annotate P(X ≤ 8)
fig.add_vline(x=8)
fig.add_annotation(x=8,
                   y=ex_dist.cdf(8).item(),
                   ax=-30,
                   yshift=10,
                   xshift=-10,
                   text="P(X ≤ 8) ≈ 0.487",
                   xanchor='right',
                   showarrow=True,
                   arrowhead=1,
                   yanchor='bottom')
fig.show()

[IMG]

As the plot shows, the probability that a career lasts less than 8 years is about 0.487, or roughly 49%.
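As an independent sanity check (a simulation sketch, not part of the original analysis), we can draw many synthetic career lengths from the same distribution and count the fraction shorter than 8 years. With enough draws, that fraction should land close to 0.487. The sample size and seed are arbitrary choices.

```python
import numpy as np

lambda_ = 1 / 12
rng = np.random.default_rng(seed=0)

# Simulate 100,000 synthetic careers with mean 1/λ = 12 years
careers = rng.exponential(scale=1 / lambda_, size=100_000)

# Fraction of simulated careers shorter than 8 years vs. the closed-form CDF
frac_short = np.mean(careers < 8)
exact = 1 - np.exp(-lambda_ * 8)

print(f"simulated: {frac_short:.3f}  exact: {exact:.3f}")
```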

Conclusion

I believe sports provide an accessible analogy that makes complex topics easy to digest. In this post, we used career longevity to illustrate the theory behind the exponential distribution. We also saw how to use its probability density function and cumulative distribution function to answer questions about the length of a player’s career.

References

“scipy.stats.expon.” SciPy v1.14.1 Manual, docs.scipy.org/doc/scipy/reference/generated/scipy.stats.expon.html. Accessed 8 Sept. 2024.
