
Evaluation of Proposed Protocol Changing Statistical Significance From 0.05 to 0.005 in Foot and Ankle Randomized Controlled Trials

Published: March 21, 2022. DOI: https://doi.org/10.1053/j.jfas.2022.03.005
      In 2018, a group of 72 methodologists suggested shifting the p value threshold for statistical significance from the commonly accepted .05 convention to .005, with p values between .005 and .05 labeled “suggestive” (Benjamin et al, 2018). Increasing the stringency of statistical significance would decrease the risk of false positives across the medical literature. Randomized controlled trials (RCTs) are the gold standard of medical literature because of their status as Level 1 evidence. RCTs commonly report p values for statistically significant outcomes and, as a result, are also prone to misinterpretation. Misinterpretation of p values in RCTs is problematic because these studies serve as the evidentiary base for high-level recommendations in clinical practice guidelines. Applying the p value shift proposed by Benjamin et al (2018) to RCTs could potentially allow more accurate interpretations of results.
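      The proposed relabeling can be illustrated with a minimal Python sketch. The function name and the handling of values falling exactly on a threshold are assumptions made for illustration only (here, p values below .005 are treated as significant and values from .005 up to and including .05 as suggestive); they are not taken from Benjamin et al.

```python
def classify_p_value(p, new_threshold=0.005, old_threshold=0.05):
    """Label a p value under the proposed .005 threshold.

    Boundary handling is an assumption for illustration:
    p < new_threshold is "significant", p <= old_threshold is
    "suggestive", and anything larger is "not significant".
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p value must lie between 0 and 1")
    if p < new_threshold:
        return "significant"
    if p <= old_threshold:
        return "suggestive"
    return "not significant"


# Hypothetical p values, not data from any trial.
for p in (0.001, 0.03, 0.2):
    print(p, classify_p_value(p))
```

      Under this sketch, a result reported at p = .03 would be called significant under the .05 convention but only suggestive under the proposed protocol.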
      We evaluated the effect of the protocol suggested by Benjamin et al (2018) on foot and ankle-related RCTs in the top 3 foot and ankle-related journals. We hypothesized that many outcomes would change their designation when the methodology was applied. To evaluate this, we conducted a PubMed search for studies published from January 1, 2016 to November 10, 2021, in the following 3 journals: Foot and Ankle International, Journal of Foot and Ankle Surgery, and Foot and Ankle Surgery. The inclusion criteria were RCTs published in the above journals with specifically stated primary endpoints. If a study had multiple primary endpoints, all were included. We excluded any study that was not prospective and randomized by design and any study that did not state primary endpoints. Two authors extracted the data using a pilot-tested Google form; any disagreements or questions were resolved by published methodologic orthopedic authors. Statistical analysis consisted of descriptive statistics with percentages. All analyses were done using Google Sheets.
      Our study found 222 primary endpoints from 83 RCTs. Of the 222 endpoints, 101 (45.5%; 101/222) were at or below the 0.05 threshold, while 121 (54.5%; 121/222) were above it. We also found that 59 endpoints (26.6%; 59/222) were below 0.005. As a result, 58.4% (59/101) of the endpoints that were statistically significant in our sample's RCTs would remain statistically significant, while 41.6% (42/101) would be reclassified as “suggestive” under the proposed protocol change. Under the proposed protocol, only 15 studies in our sample would have all primary endpoints remain statistically significant.
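      The percentages above follow directly from the three reported counts. The short Python sketch below reproduces the arithmetic; the variable names are illustrative, and the original analysis was done in Google Sheets rather than with this code.

```python
# Counts reported in the text.
total_endpoints = 222
at_or_below_05 = 101   # endpoints with p <= 0.05
below_005 = 59         # endpoints with p < 0.005


def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


above_05 = total_endpoints - at_or_below_05        # 121
reclassified = at_or_below_05 - below_005          # 42, now "suggestive"
not_significant_new = total_endpoints - below_005  # 163

summary = {
    "at or below 0.05": pct(at_or_below_05, total_endpoints),              # 45.5
    "above 0.05": pct(above_05, total_endpoints),                          # 54.5
    "below 0.005": pct(below_005, total_endpoints),                        # 26.6
    "remain significant": pct(below_005, at_or_below_05),                  # 58.4
    "reclassified as suggestive": pct(reclassified, at_or_below_05),       # 41.6
    "not significant at 0.005": pct(not_significant_new, total_endpoints), # 73.4
}

for label, value in summary.items():
    print(f"{label}: {value}%")
```

      The last figure, 163 of 222 endpoints (73.4%), is the share of all primary endpoints that would not meet the 0.005 threshold.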
      We found that almost three-fourths of the primary outcomes in our sample would not have been statistically significant using the new p value protocol. The results also show that 58.4% of the statistically significant endpoints would remain significant, which differs from the findings of similar studies in other areas of medicine, including orthopedic surgery. In 2018, a study published in the Journal of the American Medical Association (JAMA) evaluated the change that would occur if the threshold for statistical significance were changed from 0.05 to 0.005 in the 3 general medical journals with the highest impact factors. The study found that 70% of outcomes would have maintained statistical significance under the new p value protocol (Wayant et al, 2018). Furthermore, Johnson et al looked at the same outcomes in orthopedic trauma journals and found that 41.5% of their statistically significant results would remain significant (Johnson et al, 2019). Although our results differ from these, they show that implementing this protocol would drastically change the literature.
      One of the biggest reasons suggested for the protocol change is to limit P hacking. P hacking is the process of analyzing data using multiple methods until one method yields a statistically significant result (Gadbury and Allison, 2012; Head et al, 2015). Authors use this method because of publication bias among journals, which in turn perpetuates publication bias in the medical literature. A study published by Razak et al (2016) in The Journal of Bone and Joint Surgery (JBJS) found the presence of P hacking in the top 3 orthopedic journals when results were pooled together. These results show that orthopedics is prone to P hacking and publication bias (Reddy et al, 2021; Reddy et al, 2022; Scott et al, 2019). In addition, Okike et al (2008) found that there was a lack of studies in orthopedic literature showing nonpositive results. Benjamin et al (2018) believe that changing the threshold of statistical significance to 0.005 will radically decrease P hacking. Reducing P hacking would in turn reduce one of the factors that perpetuate publication bias within orthopedic literature (Razak et al, 2016; Reddy et al, 2022).
      While there are multiple reasons to make the change in protocol, a few items must be considered before making that decision. One is that, no matter what change is made to the p value threshold, clinicians must be able to determine the clinical significance of results. A study may show a statistically significant difference that is not clinically significant. An example is the trial by Moore et al (2007), which showed a statistically significant benefit of treatment for advanced pancreatic cancer; however, the positive result concerned survival, which was prolonged by only 10 days, a difference that is not clinically significant (Ranganathan et al, 2015).
      Our results suggest that changing the threshold for statistical significance from 0.05 to 0.005 in foot and ankle RCTs would heavily alter the literature published in the field. Implementing this threshold is a promising measure for increasing RCT quality until a more substantial solution can be found. That said, our results must be interpreted with caution, and further evaluation is required.

      References

        • Benjamin DJ
        • Berger JO
        • Johannesson M
        • Nosek BA
        • Wagenmakers EJ
        • Berk R
        • Bollen KA
        • Brembs B
        • Brown L
        • Camerer C
        • Cesarini D
        • Chambers CD
        • Clyde M
        • Cook TD
        • De Boeck P
        • Dienes Z
        • Dreber A
        • Easwaran K
        • Efferson C
        • Fehr E
        • Fidler F
        • Field AP
        • Forster M
        • George EI
        • Gonzalez R
        • Goodman S
        • Green E
        • Green DP
        • Greenwald AG
        • Hadfield JD
        • Hedges LV
        • Held L
        • Hua Ho T
        • Hoijtink H
        • Hruschka DJ
        • Imai K
        • Imbens G
        • Ioannidis JPA
        • Jeon M
        • Jones JH
        • Kirchler M
        • Laibson D
        • List J
        • Little R
        • Lupia A
        • Machery E
        • Maxwell SE
        • McCarthy M
        • Moore DA
        • Morgan SL
        • Munafó M
        • Nakagawa S
        • Nyhan B
        • Parker TH
        • Pericchi L
        • Perugini M
        • Rouder J
        • Rousseau J
        • Savalei V
        • Schönbrodt FD
        • Sellke T
        • Sinclair B
        • Tingley D
        • Van Zandt T
        • Vazire S
        • Watts DJ
        • Winship C
        • Wolpert RL
        • Xie Y
        • Young C
        • Zinman J
        • Johnson VE
        Redefine statistical significance.
        Nat Hum Behav. 2018; 2: 6-10
        • Wayant C
        • Scott J
        • Vassar M.
        Evaluation of lowering the P value threshold for statistical significance from .05 to .005 in previously published randomized clinical trials in major medical journals.
        JAMA. 2018; 320: 1813-1815
        • Johnson AL
        • Evans S
        • Checketts JX
        • Scott JT
        • Wayant C
        • Johnson M
        • Norris B
        • Vassar M
        Effects of a proposal to alter the statistical significance threshold on previously published orthopaedic trauma randomized controlled trials.
        Injury. 2019; 50: 1934-1937
        • Gadbury GL
        • Allison DB
        Inappropriate fiddling with statistical analyses to obtain a desirable p-value: tests to detect its presence in published literature.
        PLoS One. 2012; 7: e46363
        • Head ML
        • Holman L
        • Lanfear R
        • Kahn AT
        • Jennions MD.
        The extent and consequences of p-hacking in science.
        PLoS Biol. 2015; 13: e1002106
        • Razak HRBA
        • Razak HRB
        • Ang JGE
        • Attal H
        • Howe TS
        • Allen JC.
        P-hacking in orthopaedic literature: a twist to the tail.
        J Bone Joint Surg. 2016; 98: e91. https://doi.org/10.2106/jbjs.16.00479
        • Reddy AK
        • Anderson JM
        • Gray HM
        • Fishbeck K
        • Vassar M.
        Clinical trial registry use in orthopaedic surgery systematic reviews.
        J Bone Joint Surg Am. 2021; 103: e41
        • Reddy AK
        • Scott JT
        • Checketts JX
        • Norris BL.
        The state of publication bias in orthopaedic surgery systematic reviews—what are steps to minimization.
        Injury. 2022; 53: 213-214
        • Scott J
        • Checketts JX
        • Cooper CM
        • Boose M
        • Wayant C
        • Vassar M.
        An evaluation of publication bias in high-impact orthopaedic literature.
        JB JS Open Access. 2019; 4: e0055
        • Okike K
        • Kocher MS
        • Mehlman CT
        • Heckman JD
        • Bhandari M.
        Publication bias in orthopaedic research: an analysis of scientific factors associated with publication in The Journal of Bone and Joint Surgery (American Volume).
        JBJS. 2008; 90: 595
        • Moore MJ
        • Goldstein D
        • Hamm J
        • Figer A
        • Hecht JR
        • Gallinger S
        • Au HJ
        • Murawa P
        • Walde D
        • Wolff RA
        • Campos D
        • Lim R
        • Ding K
        • Clark G
        • Voskoglou-Nomikos T
        • Ptasynski M
        • Parulekar W
        Erlotinib plus gemcitabine compared with gemcitabine alone in patients with advanced pancreatic cancer: a phase III trial of the National Cancer Institute of Canada Clinical Trials Group.
        J Clin Oncol. 2007; 25: 1960-1966
        • Ranganathan P
        • Pramesh CS
        • Buyse M.
        Common pitfalls in statistical analysis: “P” values, statistical significance and confidence intervals.
        Perspect Clin Res. 2015; 6: 116-117