Validation of the American College of Foot and Ankle Surgeons Scoring Scales

Published: April 29, 2011
DOI: https://doi.org/10.1053/j.jfas.2011.03.005

      Abstract

      The American College of Foot and Ankle Surgeons (ACFAS) assembled a task force to develop a scoring scale that could be used by the membership and practitioners-at-large. The original publication that introduced the scale focused primarily on use of the scale and provided only brief background on the development of the health measurement instrument. Concerns regarding the validity and reliability of the scale were raised within the professional community, and ACFAS assembled a task force to address these concerns. The purpose of this article is to address the issues raised by reporting the detailed methods used in the development of the ACFAS Scoring Scales. The authors who constitute this task force reviewed the body of work previously conducted and applied standards that serve to evaluate the scoring scale for: 1) validity, 2) reliability, and 3) sensitivity to change. The results showed that a systematic and comprehensive approach was used in the development of the scoring scales, and the task force concluded that the statistical methods and instrument development process for all 4 modules of the scoring scales were conducted in an appropriate manner. Furthermore, modules 1 and 2 have been rigorously assessed and the elements of these modules have been shown to meet standards for validity, reliability, and sensitivity to change.

      Level of Clinical Evidence

      Keywords

      The American College of Foot and Ankle Surgeons (ACFAS) Universal Evaluation Scoring Scale Task Force developed 4 anatomically based scoring scales intended as clinical instruments to be used to measure subjective and objective parameters germane to foot and ankle surgery. Modules 1 and 2 were released in 2002 (Zlotoff et al., 2002 [module 1]; Zlotoff et al., 2002 [module 2]), and a subsequent user’s guide containing all of the modules was published in 2005 (Thomas et al., 2005). The first 2 modules, which focus on the first metatarsophalangeal joint and first ray, and the forefoot, respectively, were developed and statistically analyzed in an effort to confirm that their development and design yield valid results.
      The original task force intended a priori that its work be periodically reevaluated by practicing surgeons after a reasonable trial period. However, reevaluation has not yet occurred and the scoring scales have not yet been widely used for a variety of reasons, including issues raised about the accuracy of validation by external review (Lavery and Armstrong, 2006) and by internal review by the ACFAS Evidence Based Medicine and Research Committee. Specifically, the methods, analysis, and results of the original development process were not reported and have subsequently been challenged (Lavery and Armstrong, 2006).
      The purpose of this report is to address the issues raised by reporting the detailed methods used in the development of the ACFAS Scoring Scales. In addition, the steps used for prior validation of modules 1 and 2, including assessments of validity, reliability, and sensitivity to change, are presented here as well.

      Materials and Methods

      The authors, who constitute the current ACFAS Scoring Scale Task Force, were assembled in 2010 to conduct the first reevaluation and to address the limitations of previous publications related to the ACFAS Scoring Scales. This process involved collection of all materials archived by the ACFAS that pertained to the development of the original scoring scales. We then reviewed the previous work to assess the conclusions that were reported in the original publications. Specific objectives of the current task force were derived from our own independent review of the available materials and from the criticisms raised by others and mentioned earlier in this report. In addition, we report the methods by which the original scoring scales were developed and the approach taken for instrument validation. Where possible, we sought to standardize the specific radiographic and objective functional items used in modules 1 and 2 to maximize reproducibility. To achieve this, we performed a detailed electronic search of the literature to determine whether published techniques possess a higher level of inter-rater and intra-rater reliability than those originally selected. Our objective was to strengthen the current scoring scale by identifying widely recognized and accepted techniques for making measurements, and bringing a higher level of standardization to the scoring process. The genesis of the scoring scales is depicted schematically in Figure 1.

      Results

      The 4 modules of the ACFAS Universal Evaluation Scoring Scales are depicted in Fig. 2, Fig. 3, Fig. 4, Fig. 5, and the user instructions that accompany the first 2 modules are depicted in Figure 6. The scoring scales, as well as the work completed by the original task force, served as the starting point for our analysis of the development of the instrument in terms of validity, reliability, and sensitivity to change. The methods of radiographic and functional measurement described in Figure 6 represent techniques that are generally known to show high levels of inter-rater and intra-rater reliability.
      Fig. 6. Instructions for the objective section of modules 1 and 2 of the ACFAS Universal Evaluation Scoring Scales.

       Instrument Development

      After a detailed review of all correspondence and materials archived by the ACFAS for the scoring scales, we determined that the original task force initiated its scoring scale development with a review of the literature including previously published scoring scales. A survey was conducted at the 1999 ACFAS Annual Scientific Conference asking members to rank the importance of having a scoring scale and the importance of different parameters. Of those members surveyed, 88% agreed or strongly agreed that the development of a foot and ankle surgery scoring scale was a worthwhile project for the ACFAS to conduct. The parameters were ranked by the members from the most to the least important, as follows: pain, function, shoe gear limitations, radiographic measurements, and cosmesis. The original task force then consulted additional experts in foot and ankle surgery to create initial scoring scales (i.e., sections, subsections, questions and answers, weighting of answers). This evolved into a modified Delphi process, a standard procedure in which a panel of experts is assembled, to obtain consensus for the conceptual framework of the specific questions and answers generated (Jones and Hunter, 1995). Subsequently, the format of these questions and answers was revised through the modified Delphi process. Patient focus group sessions were then conducted, and the findings were used to incorporate patients’ values and interpretations of the questions into the instruments (Table 1). Global questions were also derived to address the pain, appearance, and functional components of the scoring scales (Figure 1).
      Table 1. Patient focus groups: Reasons for having foot surgery from a patient perspective

                           Module 1                                                Module 2
                           Appearance   Pain     Inability to Wear All Shoe Types   Appearance   Pain     Inability to Wear All Shoe Types
      Presurgery/test      5.5%         86.8%    7.7%                               7.2%         84.5%    8.3%
      Presurgery/retest    8.7%         79.7%    11.6%                              12.3%        78.5%    9.2%
      Kappa coefficient    0.443                                                    0.403

      0 = poor agreement; 1 = slight agreement; 2 = fair agreement; 3 = moderate agreement; 4 = substantial agreement; 5 = almost perfect agreement (Landis J.R., Koch G.G. The measurement of observer agreement for categorical data).

       Final ACFAS Scoring Scale

      The final ACFAS Scoring Scales (Fig. 2, Fig. 3, Fig. 4, Fig. 5) comprise the following 4 modules:
      • Module 1: First Metatarsophalangeal Joint (MPJ) and First Ray (11 questions)
      • Module 2: Forefoot (Excluding First Ray) (12 questions)
      • Module 3: Rearfoot (Including Flatfoot) (16 questions)
      • Module 4: Ankle (22 questions)
      Each of the final scoring scales included a total of 100 points (50 subjective, 50 objective). The original task force selected a total of 100 points for ease of use, interpretability, and the ability to maintain analogy with other scoring scales. The subjective parameters consisted of subsections encompassing questions and answers pertaining to pain (30 points), appearance (cosmesis) (5 points), and functional capacity (15 points), whereas the objective parameters consisted of radiographic (18 points) and functional (musculoskeletal) (32 points) measurements. Item weighting was determined by means of a process that included expert consultation, the modified Delphi process, and patient focus groups.
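      The point allocation described above can be written down as a small lookup table. The following Python sketch is illustrative only: the subsection maxima come from the text, whereas the function name, the dictionary keys, and the example patient are hypothetical and are not part of the published instrument.

```python
# Maximum points per subsection, as stated in the text (50 subjective + 50 objective).
SUBJECTIVE_MAX = {"pain": 30, "appearance": 5, "functional_capacity": 15}
OBJECTIVE_MAX = {"radiographic": 18, "functional": 32}


def total_score(subjective: dict, objective: dict) -> int:
    """Return the 0-100 module total after checking each subsection against its maximum."""
    total = 0
    for maxima, scores in ((SUBJECTIVE_MAX, subjective), (OBJECTIVE_MAX, objective)):
        if set(scores) != set(maxima):
            raise ValueError(f"expected subsections {sorted(maxima)}, got {sorted(scores)}")
        for name, value in scores.items():
            if not 0 <= value <= maxima[name]:
                raise ValueError(f"{name} must lie between 0 and {maxima[name]}")
            total += value
    return total


# Hypothetical patient, invented for illustration.
example_total = total_score(
    {"pain": 20, "appearance": 3, "functional_capacity": 10},
    {"radiographic": 14, "functional": 25},
)
print(example_total)
```

      Checking each subsection against its maximum makes a data-entry error surface at scoring time rather than in the analysis.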
      The primary aim of the ACFAS Scoring Scale is to evaluate the subjective and objective health outcomes before and after foot and ankle surgery. The ACFAS Scoring Scale should be used with patients enrolled in prospective clinical trials for foot and ankle surgery under the following conditions:
      • Pathology/disease: Foot and ankle musculoskeletal diseases requiring surgical intervention
      • Population for intended use: Adults (≥18 years old), English speaking
      • Administration mode: Subjective component, self-administered; objective component, clinician-rated
      • Recall/observation period: Condition at present time administered preoperatively and postoperatively, except subjective pain response is over the past month
      Modules 1 and 2 of the ACFAS Scoring Scales (Fig. 2, Fig. 3, Fig. 4, Fig. 5) were tested in 91 patients (module 1) and 84 patients (module 2) at 6 centers over several years for validity, reliability, and sensitivity to change.

       Validity

      After a detailed review as noted above, we determined that the first validation study assessed both content (face) and initial construct validity with an a priori plan to assess criterion validity and more in-depth assessment of construct validity in future scoring scale updates. Content (face) validity was thoroughly assessed through the modified Delphi process consisting of 6 members and 2 consultants (Thomas et al., 2005) in collaboration with an independent biostatistician over several years’ time. This helped ensure adequate content (face) validity and appropriateness of questions and answers relative to the purpose of this scoring scale. A review of the content by the current task force reaffirmed the appropriateness of questions and answers, and helped to identify and refine areas where questions could be improved through further standardization.
      Construct validity, as it relates to the proposed use of the scales, refers to the anticipated behavior of the scale after surgical intervention. Construct validity was demonstrated via the expected directional change in the module scores after surgical intervention (e.g., after first metatarsal osteotomy for hallux valgus, it is expected that the intermetatarsal 1–2 angle would decrease rather than increase). We hypothesized that the scoring scale would increase between the specific preoperative state and postoperative surgical intervention, because higher scores represent more desirable outcomes. A paired 2-sided Student's t test using the preoperative and postoperative scores found a consistent increase in the total scores with the expected directionality (P < .01) (Table 2, Table 3).
      Table 2. Assessment of construct validity, modules 1 and 2: Patient reported outcomes by item

      Question/Item                   Preoperative   Postoperative   Expected Direction   Observed Direction

      Module 1: Subjective, n = 91
      Pain
        None                          7%             28%
        Slight                        13%            47%
        Moderate                      46%            19%
        Significant                   30%            6%
        Severe                        4%             0
      Appearance
        Like very much                3%             30%
        Mostly like                   7%             32%
        Neutral                       25%            26%
        Mostly dislike                37%            9%
        Dislike                       28%            2%
      Function/Shoes
        Any shoe all the time         3%             23%
        Any shoe most of the time     29%            40%
        Only walking shoes            67%            36%
        Only custom shoes             1%             2%

      Module 1: Objective, n = 89
      HA angle
        0°−20°                        32%            82%
        21°−30°                       38%            7%
        ≥ 31°                         2%             3%
        −1°-3°                        25%            7%
        > –3°                         3%             1%
      IM angle
        0°−10°                        28%            85%
        11°−19°                       66%            14%
        ≥ 20°                         5%             0
        < 0°                          1%             1%
      First MT declination
        16°−24°                       63%            68%
        25°−29°                       10%            24%
        > 29°                         5%             8%
        10°−15°                       23%            0
        < 10°                         N/A            N/A             N/A                  N/A
      Hallux purchase
        Not movable                   39%            39%
        Resistant                     39%            46%
        Easy                          22%            15%
      ROM first MPJ DF
        ≥ 60°                         40%            40%
        45°−59°                       22%            38%
        36°−45°                       20%            13%
        < 36°                         18%            10%
      ROM first MPJ PF
        ≥ 0°                          91%            76%
        < 0°                          9%             24%
      Hallux IPJ extension
        0°                            98%            93%
        < 0°                          2%             7%
      Limp: Yes                       42%            19%

      Module 2: Subjective, n = 84
      Pain
        None                          4%             26%
        Slight                        13%            38%
        Moderate                      42%            24%
        Significant                   33%            12%
        Severe                        8%             0
      Appearance
        Like very much                12%            26%
        Mostly like                   18%            24%
        Neutral                       27%            32%
        Mostly dislike                16%            15%
        Dislike                       27%            3%
      Function/Shoes
        Any shoe all the time         6%             19%
        Any shoe most of the time     23%            38%
        Only walking shoes            68%            43%
        Only custom shoes             4%             0

      Module 2: Objective, n = 82
      4–5 IM angle
        0°−8°                         84%            95%
        > 8°                          16%            5%
      MT length                       N/A            N/A             N/A                  N/A
      Transverse MPJ
        0°−5°                         77%            91%
        > 5°                          23%            9%
      Transverse IPJ
        0°−5°                         77%            91%
        > 5°                          23%            9%
      ROM MPJ DF
        ≥ 65°                         73%            82%
        45°−64°                       18%            16%
        < 45°                         9%             2%
      ROM MPJ PF
        ≥ 0°                          88%            93%
        < 0°                          12%            7%
      Digital purchase: Yes           67%            88%
      Drawer sign
        Stable                        73%            93%
        Subluxable                    16%            7%
        Dislocated                    11%            0
      Limp: Yes                       48%            0

      Abbreviations: DF, dorsiflexion; HA, hallux abductus; IM, intermetatarsal; IPJ, interphalangeal joint; MPJ, metatarsophalangeal joint; MT, metatarsal; N/A, not applicable; PF, plantar flexion; ROM, range of motion.
      Table 3. Construct validity, totals by parameter, modules 1 and 2

                               Preoperative Total   Postoperative Total   Expected Direction   Observed Direction   P Value
      Module 1, subjective     21.8 ± 8.04          32.3 ± 11.04                                                    .005
      Module 1, objective      32.9 ± 10.03         38.8 ± 9.23                                                     < .001
      Module 2, subjective     20.8 ± 8.99          31.7 ± 9.32                                                     < .001
      Module 2, objective      30.4 ± 9.55          36.3 ± 4.47                                                     < .001

      From 2-sided paired Student's t test. Patients with scores submitted for preoperative and postoperative testing.
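      For readers less familiar with the paired Student's t test named above, a minimal Python sketch follows. It assumes the standard textbook form of the statistic (mean of the within-patient differences divided by its standard error); the archived patient-level data are not reproduced in this report, so the scores below are invented for illustration and the function name is hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(pre, post):
    """Return (t statistic, degrees of freedom) for paired pre/post scores."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need at least two complete pre/post pairs")
    diffs = [after - before for before, after in zip(pre, post)]
    spread = stdev(diffs)  # sample standard deviation of the differences
    if spread == 0:
        raise ValueError("all differences are identical; t is undefined")
    standard_error = spread / math.sqrt(len(diffs))
    return mean(diffs) / standard_error, len(diffs) - 1


# Hypothetical total scores for six patients (not study data).
pre_scores = [20, 25, 18, 30, 22, 27]
post_scores = [31, 33, 24, 38, 35, 30]
t_stat, dof = paired_t(pre_scores, post_scores)
print(f"t = {t_stat:.2f} on {dof} degrees of freedom")
```

      A positive t statistic corresponds to the expected direction (higher postoperative scores). The 2-sided P value is then read from the t distribution with the stated degrees of freedom; a statistics package such as SciPy (`scipy.stats.ttest_rel`) performs both steps in one call.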

       Reliability

      After a detailed review as noted above, we have identified the following process as occurring. Reliability of the subjective portion of the scoring scales was assessed via test–retest with an a priori plan to assess internal consistency in future scoring scale updates. Test–retest analysis was conducted with correlation coefficients between the initial test and the retest 7 to 10 days later. In broad terms, test–retest is a technique used to confirm reliability by answering and then re-answering the same question at a later time. Initial and subsequent answers are compared to determine whether the question can be reliably answered. We confirmed that a test–retest evaluation was performed both preoperatively and postoperatively such that each question and answer were collected 4 times. Fair to substantial agreement was obtained in all categories (Table 4, Table 5).
      Table 4. Reliability: Kappa coefficient for test–retest of subjective parameters by item, modules 1 and 2

                                       Preoperative Test–Retest   Response Rate   Postoperative Test–Retest   Response Rate
      Module 1 pain                    0.390 (2)                  74.7%           N/A                         38.5%
      Module 1 appearance              0.398 (2)                  74.7%           N/A                         38.5%
      Module 1 functional capacity     0.499 (3)                  74.7%           0.519 (3)                   38.5%
      Module 2 pain                    0.464 (3)                  76.1%           0.317 (2)                   38.0%
      Module 2 appearance              0.512 (3)                  76.1%           0.509 (3)                   38.0%
      Module 2 functional capacity     0.580 (3)                  76.1%           0.643 (4)                   38.0%

      Agreement level in parentheses: 0 = poor agreement; 1 = slight agreement; 2 = fair agreement; 3 = moderate agreement; 4 = substantial agreement; 5 = almost perfect agreement (Landis J.R., Koch G.G. The measurement of observer agreement for categorical data).
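      The kappa coefficients in Table 4 summarize agreement between a patient's first and second answers to the same question after correcting for agreement expected by chance. The sketch below assumes the unweighted Cohen kappa; the archived material reviewed here does not state whether a weighted variant was used, so this is an illustration and not the original analysis. The answers are invented, the function names are hypothetical, and the bands are the commonly cited Landis and Koch cut points mapped onto the 0 to 5 codes of the table legend.

```python
from collections import Counter


def cohen_kappa(test, retest):
    """Unweighted Cohen kappa for two sets of categorical answers from the same patients."""
    if len(test) != len(retest) or not test:
        raise ValueError("need two non-empty answer lists of equal length")
    n = len(test)
    observed = sum(a == b for a, b in zip(test, retest)) / n
    test_counts, retest_counts = Counter(test), Counter(retest)
    # Chance agreement: product of the marginal proportions, summed over categories.
    expected = sum(test_counts[c] * retest_counts[c] for c in test_counts) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1 - expected)


def agreement_code(kappa):
    """Map kappa to the 0-5 codes of the table legend (Landis and Koch bands)."""
    if kappa < 0:
        return 0  # poor
    for upper, code in ((0.20, 1), (0.40, 2), (0.60, 3), (0.80, 4)):
        if kappa <= upper:
            return code
    return 5  # almost perfect


# Hypothetical pain answers from ten patients at test and at retest.
first = ["moderate", "moderate", "slight", "severe", "none",
         "moderate", "slight", "significant", "moderate", "none"]
second = ["moderate", "slight", "slight", "significant", "none",
          "moderate", "moderate", "significant", "moderate", "none"]
kappa = cohen_kappa(first, second)
print(f"kappa = {kappa:.3f}, agreement code = {agreement_code(kappa)}")
```

      In this invented example 7 of 10 answers match, yet kappa is only about 0.59 because some of that agreement would be expected by chance alone.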
      Table 5. Reliability: Test–retest of subjective parameter totals, modules 1 and 2

                                              Test             95% CI            Retest*          95% CI
      Preoperative mean module 1 score        21.8 ± 8.04      [20.13, 23.47]    23.2 ± 10.53     [20.67, 25.73]
      Postoperative mean module 1 score†      32.3 ± 11.04     [29.26, 35.34]    34.8 ± 9.9       [32.07, 37.53]
      Preoperative mean module 2 score        20.8 ± 8.99      [18.85, 22.75]    22.4 ± 10.62     [19.77, 25.03]
      Postoperative mean module 2 score†      31.7 ± 9.32      [28.45, 34.95]    32.6 ± 11.45     [29.03, 36.17]

      Abbreviation: CI, confidence interval.
      * Retest administered 7 to 10 days after first test for operative period.
      † Postoperative scores obtained 6 months after surgical intervention.
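      The intervals in Table 5 are symmetric about the mean, as expected for a confidence interval built from a mean, a standard deviation, and a sample size. The sketch below shows that construction with a normal approximation. It is an assumption that the original analysis followed this general form, and the sample size of 91 is taken from the module 1 subjective column of Table 2 rather than from Table 5 itself; a t-based interval would be slightly wider.

```python
import math
from statistics import NormalDist


def mean_ci(sample_mean, sample_sd, n, level=0.95):
    """Normal-approximation confidence interval for a mean from summary statistics."""
    if n < 2 or sample_sd < 0 or not 0 < level < 1:
        raise ValueError("invalid summary statistics")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z * sample_sd / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


# Preoperative module 1 test score from Table 5 (21.8 ± 8.04), assuming n = 91.
low, high = mean_ci(21.8, 8.04, 91)
print(f"[{low:.2f}, {high:.2f}]")
```

      Under these assumptions the result lies within a few hundredths of the tabulated interval.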

       Sensitivity to Change

      Sensitivity is based on the ability of the scoring scales to reflect a change after an intervention where a change would be reasonably expected (i.e., after surgery). After a detailed review, as described above, we have identified the following process as having occurred. The ACFAS Scoring Scale was administered preoperatively and again within a 6-month postoperative period. A paired 2-sided Student's t test was used to evaluate the mean change in the total score. A statistically significant change in the total score was detected, P < .01, which reflects the scoring scales’ capacity to detect a clinically significant change after surgical intervention. Loss to follow-up ranged between 22% and 35% over the follow-up period. Therefore, sensitivity analyses were performed to assess the influence of nonresponders compared with a compliers-only analysis with a paired 2-sided Student's t test (Table 6).
      Table 6. Sensitivity for change: Outcomes of scales for preoperative, postoperative, and difference for modules 1 and 2

                               Preoperative Total   Postoperative Total   Pre–Postoperative Difference*   Restricted Difference†   P Value*   95% CI*        P Value†   95% CI
      Module 1, subjective     21.8 ± 8.04          32.3 ± 11.04          +4.7 ± 13.54                    +13.5 ± 9.05             .005       [1.47, 7.93]   <.001      [9.99, 17.00]
      Module 1, objective      32.9 ± 10.03         38.8 ± 9.23           +6.5 ± 11.28                    N/A                      < .001     [3.83, 9.17]   N/A        N/A
      Module 2, subjective     20.8 ± 8.99          31.7 ± 9.32           +6.51 ± 11.26                   +9.19 ± 10.17            < .001     [3.09, 9.93]   <.001      [5.46, 12.92]
      Module 2, objective      30.4 ± 9.55          36.3 ± 4.47           +4.6 ± 8.26                     N/A                      < .001     [2.37, 6.83]   N/A        N/A

      Abbreviations: CI, confidence interval; N/A, not applicable.
      * Patients with scores submitted for preoperative and postoperative testing.
      † Compliers only analysis with scores submitted for all 4 time periods.
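      The two analysis populations in Table 6 differ only in which patients are retained. The Python sketch below illustrates that selection step under the definitions given in the table footnotes: one group with both a preoperative and a postoperative test score, and a compliers-only group with scores at all 4 time points. The records and field names are hypothetical, and the sketch stops at the mean change; it does not reproduce the t tests.

```python
from statistics import mean

# Hypothetical subjective totals at the four collection points; None marks a missing form.
records = [
    {"pre_test": 20, "pre_retest": 22, "post_test": 33, "post_retest": 34},
    {"pre_test": 25, "pre_retest": None, "post_test": 30, "post_retest": None},
    {"pre_test": 18, "pre_retest": 19, "post_test": None, "post_retest": None},
    {"pre_test": 30, "pre_retest": 28, "post_test": 41, "post_retest": 40},
    {"pre_test": 22, "pre_retest": 24, "post_test": 27, "post_retest": None},
]


def mean_change(rows):
    """Mean postoperative minus preoperative test score for the given patients."""
    return mean(row["post_test"] - row["pre_test"] for row in rows)


# Patients with scores submitted for preoperative and postoperative testing.
paired = [r for r in records if r["pre_test"] is not None and r["post_test"] is not None]
# Compliers only: scores submitted for all 4 time periods.
compliers = [r for r in records if all(score is not None for score in r.values())]

loss_to_follow_up = 1 - len(paired) / len(records)
print(f"paired: n = {len(paired)}, mean change = {mean_change(paired)}")
print(f"compliers: n = {len(compliers)}, mean change = {mean_change(compliers)}")
print(f"loss to follow-up = {loss_to_follow_up:.0%}")
```

      Comparing the two estimates shows how much the observed change depends on which patients returned their forms.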

      Discussion

      Scoring scale clinical instruments have traditionally been developed through a loose process that includes data thought to be clinically important to the investigator. However, this approach results in the genesis of scoring scales that generally provide insight into patient response to treatment but do not allow for comparison of results because they lack standard measurement and reporting techniques. The addition of objective data, such as radiographic measurements with high interobserver reliability and reliable, valid joint range-of-motion techniques, reduces investigator error and improves the usefulness of the scoring scale. In contrast to the above, the optimal scoring scale clinical instrument begins with a consensus panel that determines the critical elements to be assessed (the modified Delphi process). Once the areas of interest are determined, questions are crafted that will help to ascertain the relative success of the procedure or treatment to be evaluated. Draft scoring scales are then assembled and tested. Through constant refinement, the scale is eventually formed and must then be validated. The validation process is a formal, statistical process, and several specific criteria must be met. The optimal scoring scale clinical instrument will produce a quantity that is reliably reproducible and closely correlates with the patients’ symptoms. This requires that the scoring scale has undergone a validation process that includes the following criteria:
      • Validity based on content validity (face value), construct validity (subjective versus objective correlation), and criterion validity (correlation with gold standard)
      • Reliability as demonstrated by consistency in data collection (intra-rater test–retest)
      • Sensitivity to change after the study treatment
      We conducted the first reevaluation of the ACFAS Scoring Scales to address the limitations of previous publications (Lavery and Armstrong, 2006), to report the detailed methods used in their development, and to confirm the validation of modules 1 and 2 by evaluating the assessments of validity, reliability, and sensitivity to change that were previously conducted but not reported (Zlotoff et al., 2002 [module 1]; Zlotoff et al., 2002 [module 2]; Thomas et al., 2005). This was an extensive process that involved collection of all correspondence and materials archived by the ACFAS on the development of the original scoring scales and detailed review of this material over several sessions. We reviewed the previous work in full to assess the conclusions reported in the original publications and addressed the issues raised by others as mentioned previously. We reviewed the methods by which the original scoring scales were developed and the approach taken for instrument validation. Where possible, we sought to standardize the specific radiographic and objective functional items used in modules 1 and 2 to maximize reproducibility. To achieve this, we performed a detailed electronic search of the literature to determine whether published techniques possess a higher level of inter-rater and intra-rater reliability than those originally selected.
      In the original publications of the ACFAS Scoring Scales (Zlotoff et al., 2002 [module 1]; Zlotoff et al., 2002 [module 2]; Thomas et al., 2005), the validation process, data, and methods used were not reported. The subjective and objective sections were lacking in regard to definition, measurement criteria, and derivation of information. In addition, the rationale for the specific scaling (weighting) was not discussed. Despite these shortcomings, we have demonstrated that modules 1 and 2 of the ACFAS Scoring Scale (as originally published, i.e., without any additions or deletions) met the threshold for validation criteria as described in the last paragraph. Finally, through complete review of all pertinent information and detailed data analysis we have provided the evidence necessary to address the issues raised above.
      There are myriad strengths of the ACFAS Scoring Scales for foot-related health measurement. The sample size for modules 1 and 2 was suitably large, and the design involved a multicenter patient enrollment process. The general ACFAS membership determined the desire to proceed with the development of the scoring scales with the specific intent of having this complete the rigors of validation. There was a strong assessment and evaluation of content (face) validity that we were able to achieve through independent review. A patient focus group was used and a large number of experts were consulted, as well as involved members of the original task force that used a modified Delphi approach. There was also an appropriate time span between the test–retest periods that allowed for assessment of reliability. Reliability, validity, and sensitivity to change were all tested, and small confidence intervals, indicating good precision of the reported results, were observed. Finally, to the authors’ knowledge, modules 1 and 2 of the ACFAS Scoring Scales represent the first region-specific scoring scale that has been validated by quantifying validity, reliability, and sensitivity to change (
      • Schneider W.
      • Knahr K.
      Scoring in forefoot surgery: a statistical evaluation of single variables and rating systems.
      ,
      • Kitaoka H.B.
      • Patzer G.L.
      Analysis of clinical rating scales for the foot and ankle.
      ,
      • SooHoo N.F.
      • Shuler M.
      • Fleming L.L.
      Evaluation of the validity of the AOFAS clinical rating systems by correlation to the SF-36.
      ,
      • Parker J.
      • Nester C.J.
      • Long A.F.
      • Barrie J.
      The problem with measuring patient perceptions of outcome with existing outcome measures in foot and ankle surgery.
      ,
      • Button G.
      • Pinney S.
      A meta-analysis of outcome rating scales in foot and ankle surgery: Is there a valid, reliable, and responsive system?.
      ,
      • SooHoo N.F.
      • Samimi D.B.
      • Vyas R.M.
      • Botzler T.
      Evaluation of the validity of the Foot Function Index in measuring outcomes in patients with foot and ankle disorders.
      ,
      • Martin R.L.
      • Irrgang J.J.
      • Lalonde K.A.
      • Conti S.
      Current concepts review: foot and ankle outcome instruments.
      ,
      • SooHoo N.F.
      • Vyas R.
      • Samimi D.
      Responsiveness of the Foot Function Index, AOFAS Clinical Rating Systems, and SF-36 after foot and ankle surgery.
      ,
      • Baumhauer J.F.
      • Nawoczenski D.A.
      • DiGiovanni B.F.
      • Wilding G.E.
      Reliability and validity of the American Orthopaedic Foot and Ankle Society Clinical Rating Scale: a pilot study for the hallux and lesser toes.
      ,
      • Ibrahim T.
      • Beiri A.
      • Azzabi M.
      • Best A.J.
      • Taylor G.J.
      • Menon D.K.
      Reliability and validity of the subjective component of the American Orthopaedic Foot and Ankle Society Clinical Rating Scales.
      ,
      • Van der Leeden M.
      • Steultjens M.P.M.
      • Terwee C.B.
      • Rosenbaum D.
      • Turner D.
      • Woodburn J.
      • Dekker J.
      A systematic review of instruments measuring foot function, foot pain, and foot-related disability in patients with rheumatoid arthritis.
      ).
      As with any meaningful publication, the strengths and message must be placed in the appropriate context by discussing identified limitations. Standardization of data collection methods was inconsistent for both subjective and objective components, largely because the study was intended to use the scale in multiple centers. Examples include patients completing the subjective scale in a waiting room versus in the presence of the surgeon. An objective component example would include angular measurement via digital rather than conventional radiographic films. Consistency in methodology is critical to internal validity, and inconsistency can decrease the accuracy of the data reported. In an effort to clarify and standardize data collection methods, Figure 6 has been constructed for reference when conducting the objective components of the first 2 modules. Although this is an important consideration, patients’ results were compared with themselves at the different intervals and under similar conditions at those times. These differences are also reflective of the diversity of practice settings in which the scales were intended to be implemented. Another limitation identified was in the definitions of measurements. In the original task force the numeric endpoints were developed via expert consultation, but a more robust approach is to use statistical methods to determine the distribution of measurements. Although not the ideal methodology, expert consultation is still considered an acceptable practice. Similarly, the weighting of questions was not determined by the optimal statistical methods but instead through a composite of consensus of expert consultation, patient focus groups, and others as noted above. As a result, question generation methodology was emphasized over question reduction. The scales therefore may include questions that are less useful and add to the overall burden of the data collection.
      Modules 3 and 4 have yet to undergo the necessary evaluations for validity, reliability, and sensitivity to change. Their use cannot be fully endorsed at this time; however, there are future plans to conduct these necessary assessments. Demographic data from the sample population were not archived by ACFAS and therefore could not be analyzed by the authors. This makes generalizability difficult because the patient population cannot be strictly defined. Because multiple centers were selected, the samples are assumed to reflect an average foot and ankle surgeon’s clinical environment. Cluster analysis would have been an ideal but onerous method for analyzing differences at each of the centers. Criterion validity was not assessed in this setting because a gold standard for comparison does not exist. Several other scales are routinely used in clinical research; however, comparison with them as a gold standard could not be assessed because of significant concerns related to their own validity, reliability, and sensitivity to change, or because they were not analogous. As per mandate, future reviews and updates will attempt to address these limitations to their fullest extent. These updates will include the addition and removal of questions, development of more global questions, and other issues or concerns as they are identified.
      In conclusion, we have addressed the issues raised by reporting the detailed methods used in the development of the ACFAS Scoring Scales, as well as the steps used for prior validation of modules 1 and 2 that included assessments of validity, reliability, and sensitivity to change.

      References

        • Zlotoff H.J.
        • Christensen J.C.
        • Mendicino R.W.
        • Schuberth J.M.
        • Schwartz N.H.
        • Thomas J.L.
        • Weil Sr., L.S.
        ACFAS Universal Foot and Ankle Scoring System: first metatarsophalangeal joint and first ray (module 1).
        J Foot Ankle Surg. 2002; 41: 2-5
        • Zlotoff H.J.
        • Christensen J.C.
        • Mendicino R.W.
        • Schuberth J.M.
        • Schwartz N.H.
        • Thomas J.L.
        • Weil Sr., L.S.
        ACFAS Universal Foot and Ankle Scoring System: forefoot (module 2).
        J Foot Ankle Surg. 2002; 41: 109-111
        • Thomas J.L.
        • Christensen J.C.
        • Mendicino R.W.
        • Schuberth J.M.
        • Weil Sr., L.S.
        • Zlotoff H.J.
        • Roukis T.S.
        • Vanore J.V.
        ACFAS Scoring Scale user guide.
        J Foot Ankle Surg. 2005; 44: 316-335
        • Lavery L.A.
        • Armstrong D.G.
        Letter to the Editor: ACFAS Scoring Scale: ready, fire, aim?
        J Foot Ankle Surg. 2006; 45: 284-285
        • Jones J.
        • Hunter D.
        Consensus methods for medical and health services research.
        Br Med J. 1995; 311: 376-380
        • Schneider W.
        • Knahr K.
        Scoring in forefoot surgery: a statistical evaluation of single variables and rating systems.
        Acta Orthop Scand. 1998; 69: 498-504
        • Kitaoka H.B.
        • Patzer G.L.
        Analysis of clinical rating scales for the foot and ankle.
        Foot Ankle Int. 1997; 18: 443-446
        • SooHoo N.F.
        • Shuler M.
        • Fleming L.L.
        Evaluation of the validity of the AOFAS clinical rating systems by correlation to the SF-36.
        Foot Ankle Int. 2003; 24: 50-55
        • Parker J.
        • Nester C.J.
        • Long A.F.
        • Barrie J.
        The problem with measuring patient perceptions of outcome with existing outcome measures in foot and ankle surgery.
        Foot Ankle Int. 2003; 24: 56-60
        • Button G.
        • Pinney S.
        A meta-analysis of outcome rating scales in foot and ankle surgery: Is there a valid, reliable, and responsive system?
        Foot Ankle Int. 2004; 25: 521-525
        • SooHoo N.F.
        • Samimi D.B.
        • Vyas R.M.
        • Botzler T.
        Evaluation of the validity of the Foot Function Index in measuring outcomes in patients with foot and ankle disorders.
        Foot Ankle Int. 2006; 27: 38-42
        • Martin R.L.
        • Irrgang J.J.
        • Lalonde K.A.
        • Conti S.
        Current concepts review: foot and ankle outcome instruments.
        Foot Ankle Int. 2006; 27: 383-390
        • SooHoo N.F.
        • Vyas R.
        • Samimi D.
        Responsiveness of the Foot Function Index, AOFAS Clinical Rating Systems, and SF-36 after foot and ankle surgery.
        Foot Ankle Int. 2006; 27: 930-934
        • Baumhauer J.F.
        • Nawoczenski D.A.
        • DiGiovanni B.F.
        • Wilding G.E.
        Reliability and validity of the American Orthopaedic Foot and Ankle Society Clinical Rating Scale: a pilot study for the hallux and lesser toes.
        Foot Ankle Int. 2006; 27: 1014-1019
        • Ibrahim T.
        • Beiri A.
        • Azzabi M.
        • Best A.J.
        • Taylor G.J.
        • Menon D.K.
        Reliability and validity of the subjective component of the American Orthopaedic Foot and Ankle Society Clinical Rating Scales.
        J Foot Ankle Surg. 2007; 46: 65-74
        • Van der Leeden M.
        • Steultjens M.P.M.
        • Terwee C.B.
        • Rosenbaum D.
        • Turner D.
        • Woodburn J.
        • Dekker J.
        A systematic review of instruments measuring foot function, foot pain, and foot-related disability in patients with rheumatoid arthritis.
        Arthritis Rheum. 2008; 59: 1257-1269
        • Landis J.R.
        • Koch G.G.
        The measurement of observer agreement for categorical data.
        Biometrics. 1977; 33: 159-174