

Ljung-Box or Durbin Watson — Which test is more powerful

Last Updated on July 26, 2023 by Editorial Team

Author(s): Tanveer Hurra

Originally published on Towards AI.


Source: Pixabay

Durbin-Watson is more powerful, but there is a catch. Read on to find out more.

When it comes to statistical testing, one of the most important factors that we look for is the power of the test, which may be briefly defined as follows:

Power of a test: The probability that the test will reject the null hypothesis when the alternate hypothesis is true.

In simple words, the higher the probability that a test detects a true positive, the higher its power. This will become clearer throughout this article. We will check two statistical tests, Ljung-Box and Durbin-Watson, for their power and draw a conclusion about which one to use and when.

Both Ljung-Box and Durbin-Watson are used for roughly the same purpose, i.e., to check for autocorrelation in a data series. While Ljung-Box can be used for any lag value, Durbin-Watson can be used only for a lag of 1. The null and alternate hypotheses for both tests are the same:

H0: There is no autocorrelation in the data.

H1: There exists a significant autocorrelation.

We will use Python libraries to carry out the experiment, and the procedure will be as follows:

  1. Create a random dataset (no-correlation case).
  2. Carry out the Ljung-Box and Durbin-Watson tests on it and record the output.
  3. Repeat step 2 multiple times (1,000 times) to estimate the probability that the test rejects the null hypothesis, i.e., the probability of producing a false positive.
  4. Calculate the power of the test: 1 minus the value obtained in step 3.

We first need to load all the required libraries:

from statsmodels.stats.api import acorr_ljungbox
from statsmodels.stats.stattools import durbin_watson
import numpy as np
import matplotlib.pyplot as plt

We will first create a random dataset using the random.normal() function from NumPy, which draws a random number from the standard normal distribution.

sample_size = 150
random_data = [np.random.normal() for i in range(sample_size)]
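As a side note, the list comprehension above can be replaced by a single vectorized call: np.random.normal accepts a size argument and returns the whole sample as a NumPy array in one go.

```python
import numpy as np

sample_size = 150
# 150 i.i.d. draws from the standard normal distribution, in one vectorized call
random_data = np.random.normal(size=sample_size)
print(random_data.shape)  # (150,)
```

Both forms produce a standard normal sample; the vectorized version is simply faster and more idiomatic NumPy.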

The two tests imported from the statsmodels library can be used directly to calculate the test statistic and p-value. Here it is prudent to make clear that in the case of the Durbin-Watson test, we fail to reject the null hypothesis if the test statistic is around 2 and reject it otherwise. In the case of the Ljung-Box test, the decision can be made using the p-value that the test returns.

The whole logic can be wrapped into a function, as shown below:

def run_test(sample_size):
    # Create random data with the given sample size
    random_data = [np.random.normal() for i in range(sample_size)]
    dw = durbin_watson(random_data)
    # A tolerance of 0.2 around 2 is kept to decide in the case of DW
    if 1.8 < dw < 2.2:
        dw = 0
    else:
        dw = 1
    # acorr_ljungbox() returns both the test statistic and the p-value;
    # an index of 1 is used to access the p-value. (Note: in statsmodels
    # 0.13+, acorr_ljungbox returns a DataFrame instead; there, use
    # acorr_ljungbox(random_data, lags=[1])["lb_pvalue"].iloc[0].)
    ljung = float(acorr_ljungbox(random_data, lags=1)[1])
    # A significance level of 5% is considered
    if ljung > 0.05:
        ljung = 0
    else:
        ljung = 1
    return dw, ljung

Both of these tests return 0 if the null hypothesis is not rejected and 1 otherwise. Ideally, the function defined above should always return 0, as we are testing a data series of random nature. A value of 1 returned by the function is a false positive and will be used to judge the power of these two tests.

Now that we have the run_test() function, we can call it repeatedly to calculate the power of these tests. We will do this not for a single sample size but for multiple sample sizes, to understand how power relates to the size of the data.

sample_sizes = [50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000]

The sample sizes defined in the above Python list will be used to carry out this experiment, and we will run the run_test() function 1,000 times for each sample size.

The below lines of code will do the job for us:

number_of_runs = 1000

# Creating empty lists to hold the results for Durbin-Watson (dw) and Ljung-Box (lb)
dw_data = []
lb_data = []

for sample_size in sample_sizes:
    # run_test() is called 1000 times for each sample size
    x = [run_test(sample_size) for i in range(number_of_runs)]
    dw = [i[0] for i in x]
    lb = [i[1] for i in x]
    # Calculating the fraction of runs in which the null hypothesis was rejected
    dw_per = np.sum(dw) / number_of_runs
    lb_per = np.sum(lb) / number_of_runs
    # Populating the result lists
    dw_data.append(dw_per)
    lb_data.append(lb_per)

We now have the results and can check how the power of these two tests relates to the sample size of the data. We will use the matplotlib library to plot the results and draw insights and inferences:

plt.plot(sample_sizes, lb_data, label='Ljung-Box')
plt.plot(sample_sizes, dw_data, label='Durbin-Watson')
plt.xlabel('Sample Size')
plt.ylabel('1 - Power')
plt.legend()
plt.show()
Code Output. Source: Self

The above graph clearly shows that for a small sample size, using the Durbin-Watson test is a bad idea, as it has low power; for larger sample sizes, however, it performs better than Ljung-Box. In the case of Ljung-Box, the power is consistent irrespective of the sample size. So which one to use depends on the sample size you have at hand.

This article is also published on Tea Statistic.


