The Misuse of Statistics
Last Updated on June 22, 2020 by Editorial Team
Author(s): Arghyamalya Biswas
Opinion, Statistics
Statistics can be misused because it is possible to lie with statistics.
“Politicians use statistics in the same way that a drunk uses lamp-posts, for support rather than illumination.” - A. Lang
Statistics is the primary tool for assessing relationships and evaluating study questions, because it can reveal the truth underlying unbiased data. Unfortunately, these tools are often misused, either inadvertently, through ignorance or lack of planning, or consciously and deliberately, to achieve a particular target or result. In this era of Big Data, statistical methods are vital because they not only summarize and analyze data but also support interpretation and the assessment of future consequences. These consequences (social, economic, etc.) are among the key reasons that a politician or political party may distort data and statistical analysis for their own ends. A fact or conclusion can be altered in many ways by applying statistical techniques carelessly or by deliberately choosing the wrong one.
Let us consider a locality L. Suppose party A used to rule that locality, and in a certain election party B defeated party A. Since then, party B has been ruling locality L. Another election is now approaching, and party B, which obviously does not want to lose, wishes to show that its performance has been better than that of party A. Party B wants the inhabitants of locality L to have a good impression of it before the vital election, so it will use data and statistics in whatever way turns out in its favor. The inhabitants of locality L were asked to mark the performances of party A and party B on a scale of 6 (on the real line). Both parties were marked on their respective performances according to the inhabitants’ satisfaction levels. Now party B claims that people are more satisfied with it, and that the overall satisfaction level is higher for party B than for party A.
This is where statistics comes in to assess the claim raised by party B. Suppose X denotes the satisfaction mark given to party A by a randomly selected inhabitant of the locality, and Y denotes the mark the same inhabitant gives to party B. In this setting, it is clear that (X, Y) is paired data. Two standard tests, namely Fisher’s t-test (the two-sample t-test for independent samples) and the paired t-test, can be used to test a hypothesis about the equality of two means. The paired t-test is the usual choice when the data are bivariate, that is, when both measurements come from the same individual, whereas Fisher’s t-test is meant for the case where the two variables of interest are independent. So in this context, the paired t-test is the one expected to give the appropriate result. But what happens if one uses Fisher’s t-test instead of the paired t-test on the n data points observed for X and Y (the sample size is n)?
Suppose n = 10, so that ten pairs of observations on (X, Y) are given:
X: 1.77, 5.68, 0.07, 2.26, 2.60, 3.12, 3.56, 1.04, 2.68, 3.10
Y: 1.45, 3.93, -0.03, 1.01, 3.20, 2.02, 1.57, -0.61, 2.68, 2.43
Here we test H0: μ1 = μ2 against H1: μ1 > μ2, where μ1 and μ2 denote the mean satisfaction levels of the inhabitants with the performance of party A and party B, respectively.
We observe T1 = 3.024135 and T2 = 1.262702, where T1 and T2 denote the values of the test statistics for the paired t-test and Fisher’s t-test, respectively.
Test I, the paired t-test, rejects H0 if T1 > c1 = 1.833 (9 degrees of freedom).
Since 3.024135 > 1.833, Test I rejects H0 (the null hypothesis) at the 5% level of significance.
Test II, Fisher’s t-test, rejects H0 if T2 > c2 = 1.734 (18 degrees of freedom).
Since 1.262702 < 1.734, Test II fails to reject H0 at the 5% level of significance.
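The two test statistics can be checked directly from the ten pairs listed above. The following is a minimal sketch using only the Python standard library; the function names `paired_t` and `pooled_t` are illustrative choices, and the critical values are simply the ones quoted in the article rather than computed from the t distribution.

```python
import math
import statistics

# Satisfaction marks from the article: X for party A, Y for party B.
x = [1.77, 5.68, 0.07, 2.26, 2.60, 3.12, 3.56, 1.04, 2.68, 3.10]
y = [1.45, 3.93, -0.03, 1.01, 3.20, 2.02, 1.57, -0.61, 2.68, 2.43]


def paired_t(a, b):
    """t statistic of the paired t-test (df = n - 1)."""
    d = [ai - bi for ai, bi in zip(a, b)]
    n = len(d)
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))


def pooled_t(a, b):
    """t statistic of the equal-variance two-sample t-test (df = n1 + n2 - 2)."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / se


t1 = paired_t(x, y)
t2 = pooled_t(x, y)

# One-sided 5% critical values as quoted in the article (df = 9 and df = 18).
C1, C2 = 1.833, 1.734

print(f"paired t-test:     T1 = {t1:.6f}, reject H0: {t1 > C1}")
print(f"two-sample t-test: T2 = {t2:.6f}, reject H0: {t2 > C2}")
```

Both tests use the same numerator, the difference of the sample means; only the standard error in the denominator differs, and that alone is enough to flip the decision.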
This may induce party B to publish the conclusion based on Fisher’s t-test instead of the paired t-test, simply because Fisher’s t-test does not find enough evidence to contradict party B’s claim, even though the appropriate paired test indicates that mean satisfaction was higher with party A. Thus the people living in locality L will be digesting a false fact.
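One way to see why the two tests disagree on the same data is to compare their standard errors. The sketch below uses notation that is not in the original article and is introduced here only for illustration: D_i = X_i - Y_i, sample variances s_X^2, s_Y^2 and s_D^2, sample covariance s_{XY}, and pooled variance s_p^2 (which reduces to the average of the two variances when both samples have size n).

```latex
\[
T_1 = \frac{\bar{D}}{s_D/\sqrt{n}}, \qquad
T_2 = \frac{\bar{X}-\bar{Y}}{s_p\sqrt{2/n}}, \qquad
\bar{D} = \bar{X}-\bar{Y}, \qquad
s_p^2 = \frac{s_X^2 + s_Y^2}{2}
\]
\[
s_D^2 = s_X^2 + s_Y^2 - 2\,s_{XY}
\quad\Longrightarrow\quad
\frac{T_1}{T_2} = \sqrt{\frac{s_X^2 + s_Y^2}{s_X^2 + s_Y^2 - 2\,s_{XY}}}
\]
```

The numerators are identical, so whenever the covariance s_{XY} is positive the paired statistic is the larger of the two. In the ten pairs above the sample correlation between X and Y is roughly 0.83, so treating the samples as independent discards most of the information that makes the difference detectable.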
The trick described above twists the truth without involving the worst offense, namely data manipulation. The harsh truth about real-world data, however, is that data are manipulated even in the medical sector, social projects, environmental projects, and so on, for political and other benefits. Sometimes the tricks are applied to graphical displays such as bar diagrams, histograms, and time series plots.
For time series data, the choice of time interval is important for understanding the true trend of the variable of interest. To capture the trend of the employment rate, for example, the data must be seasonally adjusted. There are several small twisting factors that may produce “overestimates” or “underestimates,” and there are always people ready to use estimates obtained by statistically wrong means.
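The effect of the time interval can be illustrated with a toy example. The sketch below uses a synthetic monthly series, which is an assumption made purely for illustration and not real employment data: a flat level with a seasonal swing and no trend at all. Comparing the same calendar month one year apart is used here as a crude stand-in for proper seasonal adjustment.

```python
import math

# Synthetic monthly series (an assumption for illustration, not real data):
# a flat level of 100 plus a purely seasonal swing of +/-5, with no trend.
series = [100 + 5 * math.sin(2 * math.pi * m / 12) for m in range(36)]


def change(values, start, end):
    """Change in the series between two month indices."""
    return values[end] - values[start]


# Cherry-picked window: from a seasonal trough (month 9) to the next peak (month 15).
cherry_picked = change(series, 9, 15)

# Same calendar month one year apart: the seasonal component cancels out.
year_over_year = change(series, 9, 21)

print(f"trough-to-peak change: {cherry_picked:.2f}")
print(f"year-over-year change: {abs(year_over_year):.2f}")
```

A report that quotes only the trough-to-peak window would show a sizeable rise in a series that, by construction, has no trend.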
Thus we see that when statistics, a body of scientific methods, is used in a misleading fashion, it can trick the casual observer into believing something other than what the data really show. This is a statistical fallacy, which occurs when a statistical argument asserts a falsehood. In this century of data, such cheap tricks with data should be abolished, and that can happen only if we become more aware of statistics and understand it better.
“False facts are highly injurious to the progress of science, for they often long endure; but false views, if supported by some evidence, do little harm, as everyone takes a salutary pleasure in proving their falseness; and when this is done, one path towards error is closed, and the road to truth is often at the same time opened.” - Charles Darwin, The Descent
The Misuse of Statistics was originally published in Towards AI - Multidisciplinary Science Journal on Medium, where people are continuing the conversation by highlighting and responding to this story.