
The F Distribution

Introduction

This tutorial is a complete guide to the F distribution. It builds on my chi-squared distribution tutorial, covered here. I highly recommend having a solid understanding of the chi-squared distribution before moving on to the F distribution.

With that said, I will cover the theory behind the F distribution and reinforce it with data visualizations in Python. You will gain an understanding of what an F distribution is, how to calculate one, and how to visualize it with Python. By the end, you will have a holistic understanding of the properties of an F distribution, cemented by hands-on experience with NumPy, SciPy, and Plotly.

Formal Definition

Suppose U₁ and U₂ are independent continuous random variables, where U₁ has a chi-squared distribution with n degrees of freedom and U₂ has a chi-squared distribution with m degrees of freedom. Then the ratio (U₁ / n) / (U₂ / m) has an F distribution with n degrees of freedom in the numerator and m degrees of freedom in the denominator. More formally:
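F = (U₁ / n) / (U₂ / m) ~ F(n, m)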

Visualizing the F distribution in Python

For this tutorial, I will be using Plotly, NumPy, and SciPy stats. I will assume those packages are already installed. You can follow along in a Python file or Jupyter notebook with the following imports at the top of your file:

import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import numpy as np
from scipy.stats import chi2, f

Now, let’s use these packages to visualize how an F distribution relates to the chi-squared distribution. First, let’s draw samples from two chi-squared random variables with 5 and 10 degrees of freedom and plot their probability density functions:

# Degrees of freedom for the numerator (n) and denominator (m)
n, m = 5, 10

# Draw samples from each chi-squared distribution (the sample size is arbitrary)
U_1 = chi2.rvs(n, size=100_000)
U_2 = chi2.rvs(m, size=100_000)

# Evaluate each PDF on a grid for plotting
x = np.linspace(0, 30, 500)
U_1_pdf = chi2.pdf(x, n)
U_2_pdf = chi2.pdf(x, m)

fig = go.Figure()

fig.add_trace(
    go.Scatter(x=x, y=U_1_pdf, mode="lines", name=f"U<sub>1</sub>: ({n} dof)")
)

fig.add_trace(
    go.Scatter(x=x, y=U_2_pdf, mode="lines", name=f"U<sub>2</sub>: ({m} dof)")
)

fig.update_layout(
    width=700,
    title="Chi-Squared Distributions"
)
fig.show()

[IMG]

Additionally, we can manually construct an F-distributed sample from these two chi-squared samples and compare its histogram against the theoretical PDF:

# Manually construct F-distributed samples from the chi-squared samples
F_manual = (U_1 / n) / (U_2 / m)

fig = go.Figure()

# Histogram of the manually constructed samples, normalized to a density
fig.add_trace(
    go.Histogram(x=F_manual, name="F (manual)", histnorm="probability density")
)

# Theoretical F(n, m) PDF from scipy.stats
fig.add_trace(
    go.Scatter(x=x, y=f.pdf(x, n, m), mode="lines", name="f.pdf(x, n, m)")
)

fig.update_layout(xaxis=dict(range=[0, 15]), width=700, title="F Distribution PDF")
fig.show()

[IMG]

As you can see, we confirmed our manual calculation of an F distribution with the built-in probability density function provided by the scipy.stats module.
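If you want a numerical check to go along with the visual one, a quick sketch is a Kolmogorov–Smirnov test from scipy.stats, comparing our manually constructed sample to the theoretical F(n, m) CDF (the exact statistic and p-value will depend on your random draws):

from scipy.stats import kstest

# Compare the manual sample against the theoretical F(n, m) CDF;
# a large p-value means the sample is consistent with that distribution
result = kstest(F_manual, f(n, m).cdf)
print(result.statistic, result.pvalue)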

Properties of the F Distribution

Let’s dive deeper into the properties of an F distribution.

Mean

The mean of an F distribution depends only on the denominator degrees of freedom and exists only when m > 2: it is the degrees of freedom in the denominator over the degrees of freedom in the denominator minus 2, or:

μ = m / (m − 2),  for m > 2

We can verify this formula by taking the mean of our sample and comparing it to the value we get by plugging the appropriate numbers into the formula. Previously we used m to denote the degrees of freedom in the denominator, so that is what we plug in. Remember, the numbers won't match exactly because there is always some noise in a finite sample.

m / (m - 2)  # theoretical mean, with m = 10
>>> 1.25
F_manual.mean().item()  # sample mean
>>> 1.252
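SciPy can also report the theoretical mean directly, which makes for a quick cross-check of the formula (a small sketch reusing the same n and m as above):

f.mean(n, m)  # theoretical mean of F(n, m), equal to m / (m - 2)
>>> 1.25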

Median

The median of an F distribution is exactly 1 when the degrees of freedom in the numerator and the denominator are equal, because the ratio and its reciprocal then have the same distribution. Otherwise there is no simple closed form, and the median only approaches 1 as both degrees of freedom grow large. For our sample, with n = 5 and m = 10, the median comes out a little below 1:

np.median(F_manual).item()
>>> 0.94
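We can also ask scipy.stats for the theoretical median via the percent-point function (the inverse CDF); it should land close to the sample median above:

f.ppf(0.5, n, m)  # theoretical median of F(n, m), the 0.5 quantile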

Mode

Unlike the other measures of central tendency, the mode of an F distribution requires more explanation. Let’s look at how the mode of the F distribution changes with the degrees of freedom:

x = np.linspace(0, 4, 200)
dists = [(1, 3), (4, 12), (12, 4), (100, 100)]
PDF = [(f.pdf(x, n, m), n, m) for n, m in dists]
fig = go.Figure()
for (y, n, m) in PDF:
    fig.add_trace(go.Scatter(x=x, y=y, mode="lines", name=f"DOF<sub>num</sub>: {n} DOF<sub>den</sub>: {m}"))

fig.update_layout(title="F Dist Across Multiple Degrees of Freedom", width=700)
fig.show()

[IMG]

Notice how the mode increases with the degrees of freedom in the numerator. For n ≤ 2, the density is decreasing, so the mode is 0 (for n < 2 the density actually diverges as x approaches 0). For n > 2, the mode is always below 1, and it approaches 1 only as both n and m grow.
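In fact, for n > 2 the mode has a closed form, ((n − 2) / n) · (m / (m + 2)), which is why it always stays below 1. Here is a small sketch, reusing one of the (n, m) pairs from the plot above, that compares the formula to the numerical peak of the PDF on a fine grid:

# Closed-form mode of F(n, m) for n > 2 versus the numerical peak of the PDF
n, m = 12, 4
mode_formula = ((n - 2) / n) * (m / (m + 2))
grid = np.linspace(0.01, 4, 10_000)
mode_numeric = grid[np.argmax(f.pdf(grid, n, m))]
print(mode_formula, mode_numeric)  # both come out around 0.56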

Conclusion

In this blog post, we covered the theory behind the F distribution. You now understand how it relates to the chi-squared distribution. Additionally, you can calculate and visualize an F distribution both manually and by leveraging the scipy.stats module. In a subsequent tutorial, I will go over the F test and how it can be used to compare variances in the real world.

Let’s Connect

  • LinkedIn (Open to Work)
  • Twitter
  • Website (Consulting)