Null hypothesis significance testing (NHST) is the most commonly used
statistical approach in the health sciences (Silva-Ayçaguer, 2010). The
statistician Ronald Fisher introduced the null hypothesis, which states that
two groups do not differ, for example that the average height of 5-year-old
boys and girls does not differ (Fisher, 1955). Later, Neyman and Pearson
reviewed Fisher's approach and concluded that researchers either accept the
null hypothesis or reject it in favour of a second, alternative hypothesis.
They also reported that this decision could lead to two types of error, known
as type one and type two errors. This has developed into the null hypothesis
testing system we use today and is the cornerstone of data analysis
(Neyman, 1928).
The p value is the probability of obtaining results at least as extreme as
those observed if the null hypothesis is true. The null hypothesis is rejected
in favour of the alternative hypothesis if the p value is less than the
predetermined level of statistical significance (Daniel, 2002). It has been
reported that hypothesis testing can at best quantify doubt but cannot
eliminate it entirely, and that this doubt takes one of two forms: a type one
error or a type two error. A type one error occurs when a researcher rejects a
null hypothesis that is in fact true, and a type two error occurs when a
researcher fails to reject (accepts) a null hypothesis that is in fact false
(Banerjee, 2009). The important point made is that we can never completely
prove or disprove a hypothesis by statistical testing. We can only reject the
null hypothesis, which results by default in acceptance of the alternative
hypothesis, or fail to reject the null hypothesis, in which case we accept it
by default.
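
The link between the significance level and type one errors can be illustrated with a small simulation. The sketch below is illustrative only and is not taken from any of the cited papers. It assumes two groups drawn from the same normal population with a known standard deviation of 1, and it uses a simple two-sided z test; the function names, sample sizes and seed are hypothetical choices. Because the null hypothesis is true in every simulated study, every significant result is a type one error, and the share of such results should be close to the chosen significance level.

```python
import random
from statistics import NormalDist, mean


def z_test_p_value(sample_a, sample_b, sigma=1.0):
    """Two-sided p value for a difference in means, assuming a known sigma."""
    standard_error = sigma * (1 / len(sample_a) + 1 / len(sample_b)) ** 0.5
    z = (mean(sample_a) - mean(sample_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def type_one_error_rate(n_studies=2000, n_per_group=30, alpha=0.05, seed=1):
    """Share of simulated studies that reject a null hypothesis that is true."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_studies):
        # Both groups come from the same population, so H0 is true.
        group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if z_test_p_value(group_a, group_b) < alpha:
            false_positives += 1
    return false_positives / n_studies


if __name__ == "__main__":
    print(f"Observed type one error rate: {type_one_error_rate():.3f}")
```

Under these assumptions the observed rate should sit near .05, which shows that a significant p value limits the chance of a type one error but does not rule it out.
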
A threshold of p < .05 alone is not a safeguard against type one errors. The p value can be misinterpreted and misreported by the researcher, leading to a type one error, the probability of which is denoted α (alpha) (Head, 2015). This happens when the researcher concludes there is a significant difference between two sets of data when in fact there is none (Guyatt, 1995). For example, suppose a researcher hypothesised that the price of cars falls with age; the null hypothesis is then that price is unrelated to age. If the researcher collected the year of manufacture and current sale price of a set of cars, obtained p < .05 and rejected the null hypothesis although no such relationship exists in the wider population of cars, a type one error would have occurred.
P-hacking can also occur, in which researchers influence their findings by choosing which data or analyses to report until the results are significant. Head and colleagues (2015), in their paper The Extent and Consequences of P-Hacking in Science, state that p-hacking "occurs when researchers collect or select data or statistical analyses until nonsignificant results become significant" (Head, 2015). For example, suppose researchers hypothesised that drug A would be a better cure than drug B and planned a six-month study. If they found significant results at four months, stopped the study and wrote up their findings, they would be selecting the data that happened to give a significant result. Equally, if they ran the study for the full six months, found results that were not quite significant and chose to extend the study until the results supported their hypothesis, this could again be deemed p-hacking, because the study continues only until the desired result is obtained. Both practices increase the risk of a false positive result, or type one error.
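
The effect of stopping a study as soon as the results become significant can be sketched with a simulation. This is an illustration under stated assumptions, not the method of Head and colleagues. Drug A and drug B are assumed to have identical effects (so the null hypothesis is true), outcomes are assumed to be normally distributed with a known standard deviation of 1, and a two-sided z test is used. The function names, the interim looks at 10, 20, ..., 60 participants per group and the seed are hypothetical choices.

```python
import random
from statistics import NormalDist, mean


def z_test_p_value(sample_a, sample_b, sigma=1.0):
    """Two-sided p value for a difference in means, assuming a known sigma."""
    standard_error = sigma * (1 / len(sample_a) + 1 / len(sample_b)) ** 0.5
    z = (mean(sample_a) - mean(sample_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(looks, n_studies=2000, alpha=0.05, seed=2):
    """Share of null studies declared significant at any of the given looks."""
    rng = random.Random(seed)
    n_max = max(looks)
    hits = 0
    for _ in range(n_studies):
        # Both drugs have the same true effect, so any "significant"
        # difference is a false positive.
        drug_a = [rng.gauss(0, 1) for _ in range(n_max)]
        drug_b = [rng.gauss(0, 1) for _ in range(n_max)]
        # The study counts as significant if ANY look crosses the threshold,
        # which mimics stopping as soon as p < alpha is seen.
        if any(z_test_p_value(drug_a[:n], drug_b[:n]) < alpha for n in looks):
            hits += 1
    return hits / n_studies


# One planned analysis at the end of the study.
fixed_rate = false_positive_rate(looks=[60])
# Checking after every 10 participants per group and stopping when significant.
peeking_rate = false_positive_rate(looks=[10, 20, 30, 40, 50, 60])

if __name__ == "__main__":
    print(f"Single planned analysis: {fixed_rate:.3f}")
    print(f"Stop when significant:   {peeking_rate:.3f}")
```

Under these assumptions the single planned analysis keeps the false positive rate near .05, whereas repeatedly checking the data and stopping at the first significant result pushes it noticeably higher, even though no individual test uses a threshold other than .05.
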
Head and colleagues also report that the drive to produce significant results is due to publication bias, that is, journals are more likely to publish papers which show significant differences than those which do not (Head, 2015). Other reasons for p-hacking are that researchers appear more appealing if they regularly produce significant results, and that publishing in well-regarded journals makes them more likely to gain prestige, promotion or further grants (Raj, 2017). One of the biggest consequences of p-hacking is that it can lead to incorrect methods for treating patients (Head, 2015).
HARKing, or hypothesising after the results are known, is another way in which researchers attempt to show a significant difference: they replace their original hypothesis with a second one without reporting the change. In his paper HARKing: Hypothesizing After the Results are Known, Norbert L. Kerr defines HARKing as "presenting a post hoc hypothesis (i.e., one based on or informed by one's results) in one's research report as if it were, in fact, an a priori hypotheses" (Kerr, 1998). For example, a researcher hypothesises that energy drinks raise cognitive ability, measures the cognitive ability of the subjects and finds that energy drinks have no significant effect. If the researcher then introduces a new hypothesis that fits the findings and presents it as the original one, this is HARKing, and because the new hypothesis was chosen to match data already collected it also increases the risk of a type one error. The main point of HARKing is that the researcher does not disclose any of the changes made to the first hypothesis.
What can we do to identify publication bias? It has been reported that funnel plots and meta-analysis can reveal publication bias. If no publication bias is present, the scatter plot will be spread evenly around the mean and shaped like a funnel. Where publication bias exists, the plot will be empty around the mean, and it can also become asymmetrical if the effect size is small to medium. However, funnel plots have their limitations: their interpretation is subjective and may vary from one person to another, and asymmetry may also be caused by other factors such as poor methods or simply chance. The advantage of carrying out a meta-analysis is the ability to combine multiple studies, which provides a more accurate representation of the population and greater statistical power; it may also help to show the differences between studies (Song, 2013).
Publication bias can have serious consequences. It may distort the evidence from clinical trials, so that drug A is wrongly reported as having a significant effect on a specific illness; patients may then be exposed to further health risks and mortality rates may even increase. In these days of austerity, budgets could also be wasted on more expensive drugs when a cheaper alternative could produce the same if not better effects (Song, 2013).
In summary, p < .05 is not a safeguard against type one errors. It can be misinterpreted (Head, 2015), leading to the null hypothesis (H0) being wrongly rejected in favour of the alternative hypothesis (H1) and thus to a type one error. Researchers can influence their findings by selecting only significant results to report (p-hacking) (Head, 2015), and they can engage in HARKing, altering the original hypothesis without recording or reporting it (Kerr, 1998). Attempts have been made to identify instances of p-hacking and HARKing, but no all-encompassing method has been found to safeguard against these practices.
One solution suggested by Song and colleagues to combat these practices is to introduce compulsory registration at the beginning of studies which would force researchers to be fully transparent throughout the whole study (Song, 2013).