Minimizing the Mean Square Error: Frequentist Approach
Author(s): Varun Nakra
Originally published on Towards AI.
Statistical Inference and Mean Square Error
In a utopian world, we would have access to unlimited data, i.e., the entire population, so there wouldn't be a need for "inference": we would know with certainty whatever we are interested in. In all practical settings, however, we don't have access to the "population"; we only have observations that comprise a sample. This creates the need to make "inferences" about the population using the sample. Statistical inference is the process of drawing conclusions about one or more parameters (population characteristics), and point estimation is the selection of a single number, based on sample data, that represents a sensible value for those parameters. This gives rise to two concepts: the true value of the parameter, the "ground truth", which is hidden from us because we don't have access to the population, and the point estimate of the parameter.
The frequentist considers the true value of the parameter θ to be fixed but unknown. The Bayesian considers the true value of the parameter θ to be a realization of a random variable Θ (a realization that is never directly observed by us).
Again, under ideal settings, we could find an estimator that is always exactly equal to the true value of the parameter. However, since the estimate of the true value of the parameter is a function of the sample, it is itself a random variable: different samples result in different values of the estimate. For some samples the true parameter will be overestimated, and for others underestimated. This leads us to the idea of an "error" in our estimation.
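For an estimator θ̂ of the true parameter θ, the estimation error is the difference between the two:

$$e(\hat{\theta}) = \hat{\theta} - \theta$$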
Now, there are multiple ways, such as squared error, absolute error, etc., in which the error of estimation defined above can be quantified. However, we will stick to the squared error in this article and explore other "error functions" in related articles in the future.
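The quantity of interest for us is therefore the squared error:

$$\big(\hat{\theta} - \theta\big)^2$$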
It makes sense to "average" the squared error defined above to measure the performance of the estimator on "average". This gives rise to the concept of the "Mean Square Error", which is defined as follows:
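$$\mathrm{MSE}(\hat{\theta}) = \mathbb{E}\big[(\hat{\theta} - \theta)^2\big]$$

The expectation is taken over the sampling distribution of θ̂, i.e., over all the samples that could have been drawn.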
Minimizing the Mean Square Error: Frequentist Approach
As mentioned above, under the frequentist approach, we consider the true parameter to be fixed but unknown and attempt to minimize the mean square error of its estimator.
Consider a simple example of estimating the population mean using the sample mean of n observations x₁, …, xₙ. The sample mean is the estimator of the population mean and is defined as follows:
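$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$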
Instead of using the sample mean directly, we use a modified version of the sample mean, multiplied by a constant k. The objective is to solve for the optimal value of this constant, the one for which the mean square error is minimized.
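Writing the estimator as θ̂ = k·x̄ and assuming the observations are i.i.d. with mean θ and variance σ² (the standard setup for this example), the mean square error decomposes into the variance plus the squared bias:

$$\mathrm{MSE}(k\bar{x}) = \mathrm{Var}(k\bar{x}) + \big(\mathbb{E}[k\bar{x}] - \theta\big)^2 = \frac{k^2\sigma^2}{n} + (k-1)^2\theta^2$$

Setting the derivative with respect to k equal to zero gives the minimizer:

$$\frac{2k\sigma^2}{n} + 2(k-1)\theta^2 = 0 \quad\Longrightarrow\quad k_{\mathrm{opt}} = \frac{\theta^2}{\theta^2 + \sigma^2/n}$$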
As evident from the expression above, the optimal value of k depends on the unknown parameter θ. In other words, the minimum mean square error estimator is not realizable: the optimal scaling depends on θ through the bias term. Therefore, finding the minimum mean square error estimator is an unrealizable task for this simple example, and also for most practical cases. It is for this very reason that we constrain the bias to zero and find the estimator that minimizes the variance, leading to the concept of the Minimum Variance Unbiased Estimator (MVUE). However, unfortunately, even the MVUE does not always exist. Are there alternative ways of finding estimators with minimum variance and minimum bias? Stay tuned for the answer in the sequel article "Minimizing the Mean Square Error: Bayesian Approach".
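To make the point concrete, here is a minimal simulation sketch (not from the original article) that estimates the empirical MSE of k·x̄ for a few values of k, assuming normally distributed data with arbitrarily chosen θ = 2 and σ = 2:

```python
import numpy as np

# A minimal sketch (assumed setup, not from the original article): compare the
# empirical MSE of the scaled sample mean k * x_bar for a few values of k.
# Data are i.i.d. normal with true mean theta = 2.0 and std sigma = 2.0
# (arbitrary choices for illustration).
rng = np.random.default_rng(seed=0)
theta, sigma, n, n_sims = 2.0, 2.0, 10, 100_000

# Draw n_sims independent samples of size n and compute each sample mean.
x_bar = rng.normal(loc=theta, scale=sigma, size=(n_sims, n)).mean(axis=1)

for k in (0.8, 0.9, 1.0, 1.1):
    mse = np.mean((k * x_bar - theta) ** 2)
    print(f"k = {k:.1f}: empirical MSE = {mse:.4f}")

# The theoretical minimizer depends on the unknown theta, so the estimator
# that achieves it is not realizable in practice.
k_opt = theta**2 / (theta**2 + sigma**2 / n)
print(f"optimal k = {k_opt:.4f} (requires knowing theta)")
```

With these assumed numbers, k = 0.9 yields a smaller empirical MSE than the plain sample mean (k = 1), in line with the theoretical kₒₚₜ ≈ 0.909, but computing kₒₚₜ required the very θ we are trying to estimate.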