Minimal clinically important difference of commonly used patient-reported outcome measures in total knee arthroplasty: review of terminologies, methods and proposed values
Knee Surgery & Related Research volume 32, Article number: 19 (2020)
The aim of this article was to highlight various terminologies and methods of calculation of minimal clinically important difference (MCID) and summarize MCID values of frequently used patient-reported outcome measures (PROMs) evaluating total knee arthroplasty (TKA).
Materials and methods
PubMed and EMBASE databases were searched through May 2019. Of 71 articles identified, 18 articles matched and underwent a comprehensive analysis for terminologies used to indicate clinical significance, method of calculation, and reported MCID values.
MCID was the most common terminology (67% of studies), and anchor-based methods were most commonly employed (67% of studies) to calculate it. The analytical methods used to calculate MCID and the estimated values for clinical use are highly variable. MCID values reported for WOMAC scores are 20.5 to 36.0, 17.6 to 33.0 and 12.9 to 25.0 for the pain, function and stiffness sub-scales, respectively, and 4.7 to 10.0 for OKS.
There was a lack of standardization in the methodology employed to calculate MCID in the available studies. MCID values reported in this review could be used for patients undergoing TKA, although caution is advised in their interpretation and application.
Patient-reported outcome measures (PROMs) are frequently incorporated in clinical research as key outcome variables for evaluating treatment effects after total knee arthroplasty (TKA). By capturing patients’ input through self-completed questionnaires, PROMs help in a better understanding of the patient’s perspective and improve physician-patient communication. Consequently, numerous PROMs have been validated, including generic PROMs such as the 36-item Short Form survey (SF-36) and 12-item Short Form survey (SF-12), and disease- or joint-specific PROMs such as the Western Ontario and McMaster Universities Arthritis Index (WOMAC), Oxford Knee Score (OKS), and Knee Society Score (KSS) [3,4,5,6,7]. However, an accurate and meaningful interpretation of PROMs is challenging, as the traditionally reported statistically significant differences do not necessarily imply clinically meaningful change. Furthermore, statistical significance, which is centered on testing a null hypothesis using a statistically determined probability (the “p value”), does not provide adequate insights to make better treatment decisions. Hence, interpreting clinical research in terms of clinical rather than statistical significance has attracted researchers as a way to facilitate an evidence-based approach to clinical decision-making.
Introduced as a benchmark for reporting clinical significance, the concept of minimal clinically important difference (MCID) has emerged as an important psychometric property for interpreting changes in PROM scores from the patient’s perspective. MCID has garnered a lot of attention from clinicians and researchers because of its potentially wide applications in research, practice, and policy-making. In its group-level application in clinical research, MCID is used as a decision threshold to test the effectiveness of a promising new treatment against the current best practice. Additionally, at an individual level, it assists in preoperative discussions regarding patient expectations and helps in making balanced treatment decisions in clinical practice. Moreover, with the potential to interpret the usefulness of different forms of interventions, MCID helps in the formulation of health policies by subsidizing treatments with better patient-reported improvements. Hence, orthopedic surgeons should be familiar with the concept and critical issues related to MCID.
The understanding and utility of MCID in the context of TKA are challenged by multiple similar terminologies, multiple analytic methods used for calculation, and, consequently, wide variability in the calculated MCID values. Firstly, multiple terminologies are currently used to indicate clinically significant changes. Although certain terms like minimal clinically important change (MCIC) are used interchangeably with MCID, distinctions between other terminologies such as minimal important difference (MID), minimal important change (MIC), clinically important difference (CID), and minimum detectable change (MDC) need to be understood. Secondly, multiple methods are currently available to calculate MCID, which may have led to varied MCID values and to confusion in choosing the appropriate method and calculated MCID value. Thirdly, there is considerable variability in the MCID calculated across different studies for each of the PROMs [12,13,14,15]. It is undetermined whether this variability is due to diverse methodologies of calculation or to different clinical contexts in each of the studies (such as heterogeneous demographic characteristics, disease severity, baseline PROM scores, and time-points of analysis). These critical issues need to be thoroughly reviewed before considering the application of MCID in clinical and research contexts pertinent to TKA.
The purpose of this article is to help clinicians and researchers understand the concept and critical issues related to MCID by highlighting various terminologies, methodologies used for calculation, and reported MCID values of commonly used PROMs evaluating outcomes of TKA.
Materials and methods
A comprehensive search of the PubMed and EMBASE databases was conducted from their years of inception through May 2019, with the purpose of this article in mind. The search was conducted by two independent reviewers (SM and MC) and limited to peer-reviewed articles in the English language. The medical subject headings (MeSH) or keywords used for the search included “minimal clinically important difference” or “MCID,” “minimal important change” or “MIC,” “minimal important difference” or “MID,” “clinically important difference” or “CID,” “minimal clinically important change” or “MCIC,” and “total knee arthroplasty.” After removal of duplicates (n = 361), 1520 articles matched our search criteria, including four articles obtained by manually searching the references of the core articles. A preliminary screening of titles was performed and 992 articles were excluded as they were considered irrelevant to the current review. The remaining 528 abstracts were analyzed, of which 457 articles were excluded based on a priori inclusion and exclusion criteria. The full texts of the 71 articles that passed this screening were retrieved and assessed for eligibility.
The articles were considered eligible if they reported MCID for one of the six commonly utilized PROMs evaluating outcomes of primary TKA in osteoarthritis (OA) knee which were the WOMAC, OKS, 1989 - original KSS, 2011- new KSS, SF-36 and SF-12. The articles were excluded if (1) brief reference of MCID was available but details of its calculation were missing (n = 41), (2) reported MCID of non-relevant PROMs (n = 6), (3) MCIDs were calculated for outcomes of hip and knee arthroplasty together with no distinct estimates for TKA (n = 6) (Fig. 1). Finally, 18 studies were considered eligible for this review. These articles were analyzed for terminologies used to indicate clinically meaningful change, analytic method employed for calculation and proposed MCID values.
Among 18 studies included in this review, 12 (67%) used the terminology MCID to indicate clinically meaningful changes after TKA [14,15,16,17,18,19,20,21,22,23,24,25] (Table 1). All these 12 studies used the terminology MCID alone except for one study  which used both MCID and MIC. Of the remaining 6 (33%) studies, two studies [13, 26] used clinically important difference (CID) alone, one study  used MCIC alone, one study  used MID alone, and one study  used both MIC and MID. Hence, although MCID was the most frequently used terminology to indicate clinically important change after TKA, other related terminologies are employed in about one third of the available studies.
The methodology employed to calculate MCID of PROMs for TKA was variable in the included studies. Among 18 included studies, 12 (67%) studies [12,13,14,15,16,17,18, 21, 23, 25, 28, 29] employed anchor-based methods making them the most commonly used analytic methods to establish MCID. All of these 12 studies used only anchor-based method to determine MCID except for one study  which used both anchor-based and distribution-based methods. The remaining 6 (33%) studies [19, 20, 22, 24, 26, 27] used only distribution-based methods to calculate MCID.
Of the 12 studies employing anchor-based methods, receiver operating characteristic (ROC) curve analysis was the most frequently used method, with 7 (58%) studies employing it either alone [12, 13] or in combination with mean change and/or regression analysis methods [17, 23, 25, 28, 29] (Table 2). The mean change method was the second most-employed anchor-based method, used in a total of 6 (50%) studies either alone [14,15,16] or along with other anchor-based methods [10, 28, 29]. The mean difference method was used in 3 (25%) studies along with other anchor-based methods to establish MCID [23, 25, 28]. Additionally, 2 (17%) studies [18, 21] utilized linear regression analysis to establish MCID and 1 (8%) study employed logistic regression analysis.
Among the distribution-based methods, four studies [14, 16, 25, 28] reported MDC and one study  reported standard error of measurement (SEM) to assess the reliability of the MCID estimates calculated by the anchor-based approaches in those studies.
Apart from the different analytical methods employed to calculate MCID, the methodology varied among studies even when they used the same analytical method. For instance, the studies employing ROC curve analysis to calculate MCID used different transitional scales, cut-offs for area under the curve (AUC), and anchor questions (Table 2). Specifically, four studies used a 5-point transitional scale [17, 23, 25, 28], and one study each employed 6-point, 7-point, and 15-point scales. Moreover, the cut-off applied for AUC, which defines the diagnostic ability of the calculated MCID, was inconsistent across studies, ranging widely from 0.55 to 0.84. A minimum AUC cut-off value of 0.7 is recommended to ensure optimal diagnostic reliability of this method, and a higher AUC value indicates higher predictive accuracy. However, one of the studies in this review calculated the MCID of SF-12 with a reported AUC of less than 0.7, which calls the reliability of such an estimate into question. Similarly, among seven studies that used a distribution-based method, six studies [19, 20, 22,23,24, 27] used 0.5 times the standard deviation (SD) to calculate MCID while one study used 0.8 times the SD. Hence, the methodology applied within an individual analytical method also varied among the studies.
The variation in methodology was also found among studies which calculated MCID in the context of individual PROMs (Table 3). For instance, there were 6 (33%) studies which reported the MCID of WOMAC employing either mean change [14, 17, 22], ROC [12, 13], or both . Furthermore, these studies reported MCID values of WOMAC at varied time-points of analysis ranging from 6 months to 2 years. Similarly, among six studies reporting MCID values for OKS, three studies [18, 28, 29] used anchor-based methods while the other three studies [19, 20, 27] used distribution-based methods and evaluated MCID at varied time-points of follow-up ranging from 6 months to 5 years.
The calculated MCID values of PROMs for TKA also showed wide variation (Table 3). MCID values reported for WOMAC sub-scales in the included studies were 20.5 to 36.0 for pain, 17.6 to 33.0 for function and 12.9 to 25.0 for stiffness [12,13,14,15,16,17]. MCID values of OKS reported by studies were 4.7 to 10 points [18,19,20, 27,28,29]. Although these MCID values for OKS show variation, it was less pronounced than the variation in MCID values reported for WOMAC. Additionally, MCID values for OKS reported by distribution-based approaches were lower (4.7 to 5 points) than those calculated by anchor-based approaches (5 to 10 points). Among anchor-based methods, MCID reported by mean change was noted to be higher (9 to 10 points) than that reported by other methods (5 to 9 points) [19, 27]. Regarding the original KSS, two studies reported MCID values at 2 years’ follow-up as 1.9 to 9.0 for the knee score (KSS-KS) component and 4.4 to 10.2 for the function score (KSS-FS) component [20, 31]. No studies reported MCID values for the new KSS scoring system. Four studies reported MCID for the SF-36 score at 6 months to 5 years after TKA [14, 16, 20, 26]. Two [14, 16] of these studies utilized anchor-based methods to report the MCID of the Spanish translated and validated version of SF-36. The other two studies utilized distribution-based methods to report the MCID of the original scoring system of SF-36 and its Dutch translation [20, 26] (Table 3). Regarding SF-12, four studies [18, 22, 24, 25] reported MCID at 1 or 2 years after TKA using anchor- and distribution-based methods. Two [18, 22] of these studies reported MCID only for the physical component summary of SF-12 and two other studies reported MCID for both mental and physical components [24, 25]. MCID values reported for the physical component summary were in the range 1.8 to 5 points and those for the mental component were −1.4 to 5.4 points.
Orthopedic surgeons should have a detailed knowledge of the MCID of commonly used PROMs for patients undergoing TKA in terms of variations in the terminology used, the optimum methodology of MCID calculation, and a critical overview of the available MCID values. One of the key findings of this review was that MCID was the most common terminology used to indicate clinically important change in PROMs. However, other similar and confusing terminologies are mentioned in about one third of the available studies on the topic. The analytical methods used to calculate MCID and its available values for clinical use are highly variable. Nonetheless, the anchor-based methods, especially ROC-curve analysis, are most commonly used to calculate MCID in conjunction with other methods. MCID values that may be routinely used with caution are 20.5 to 36.0, 17.6 to 33.0 and 12.9 to 25.0 for the pain, function and stiffness sub-scales of the WOMAC score, respectively, and 4.7 to 10 for OKS.
MCID and related terminologies
Jaeschke et al.  introduced the concept of MCID in 1989 as the smallest difference in scores in the domain of interest which patients perceive as beneficial and which would mandate, in the absence of troublesome side effects and excessive cost, a change in the patient’s management. Subsequently, the term MCID has been generically used to indicate clinically meaningful change in a wide variety of clinical contexts, irrespective of the analytic method used in its estimation. Our study found that the terminology “MCID” has been used in two thirds of the studies that calculated the estimate of clinically meaningful change in PROMs after TKA. However, several other related terminologies like MCIC, MIC, MID, CID, and MDC may be confused with MCID and need further consideration. This will help physicians to improve their understanding of the distinctions between various terminologies and reach consensus regarding their appropriate utility in clinical and research setting.
Minimal clinically important change (MCIC) is a distinct terminology with a definition similar to that of MCID, and it was used synonymously with MCID in a study predicting the rates of satisfaction after TKA. Another terminology used in TKA-related studies [18, 28, 29] is MIC, which is defined as the change in the health status in a single group or single individual over a period of time. The MIC is specifically utilized to indicate changes “within a group” (calculated using the mean change method) or, more importantly, “within an individual” (calculated using ROC-curve analysis). As group averages fail to capture changes at the “within an individual” level, the MIC calculated by ROC analysis is of great relevance for assessing the progress of individual patients in clinical practice. Hence, we recommend the specific use of MIC to indicate changes in health status “within an individual”, while MCID can be used non-specifically to indicate changes in health status across all clinical contexts.
Minimal important difference (MID), defined as the differences in the health gain or loss between two independent groups of patients, is applicable in setting of clinical trials . The MID is calculated by differences in the mean scores in patients reporting themselves as “little better” and “about the same”. Accordingly, two studies have used this terminology while calculating the estimate of minimum important clinical change in PROMs after TKA [27, 28]. However, such similar contextual application of MID was not supported in other subsequently published studies which used MCID to compare health status between the comparison groups [23, 25]. Nonetheless, the concept of MID may be more relevant for clinical research where this terminology may be used interchangeably with MCID .
The concept of MID represented by above terminologies has been criticized in the context of TKA as one would expect larger than minimum improvements after TKA to be clinically more relevant [13, 26]. Accordingly, the terminology clinically important difference (CID) was used in two studies to indicate clinically relevant changes after TKA that are not necessarily minimum [13, 26]. CID was defined as the difference in scores of an outcome measure that is perceived by patients as beneficial or harmful. It is calculated using a transitional group that reports more than minimal improvements such as “good deal better” in contrast with MCID which involves the use of a transitional group showing slight or minimal improvements after TKA that is the “somewhat better” group . Future studies should evaluate the relative clinical relevance between CID and MCID to improve our understanding for their application in clinical practice and research related to TKA.
In contrast to all the above terminologies indicating clinical significance, minimum detectable change (MDC) is a purely statistical concept. MDC is defined as the minimal change that can be detected taking the measurement error into account . The concept of MDC is based on the standard error of measurement (SEM) and is used in reliability assessment of the calculated MCID values. For instance, MCID that is less than MDC is questioned for its reliability as it lies within the bounds of measurement error of the PROM. Conversely, with a MCID greater than MDC95, it is possible to state with 95% of confidence that the change in scores is outside the bounds of measurement error and thus reflecting a true change . Hence MDC acts as a reasonable starting point to detect the reliability of the calculated MCID values but cannot reflect clinically important change in a PROM.
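The relationship between SEM, MDC and MCID described above can be illustrated with a minimal sketch. It assumes the commonly cited formulas SEM = SD × √(1 − r), where r is a test-retest reliability coefficient such as an intraclass correlation coefficient, and MDC95 = 1.96 × √2 × SEM. These formulas, the function names and all numbers are illustrative assumptions and are not taken from the studies in this review.

```python
import math


def sem(sd_baseline: float, reliability: float) -> float:
    """Standard error of measurement (assumed formula: SD * sqrt(1 - r))."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd_baseline * math.sqrt(1.0 - reliability)


def mdc95(sem_value: float) -> float:
    """Minimum detectable change at 95% confidence (assumed: 1.96 * sqrt(2) * SEM)."""
    return 1.96 * math.sqrt(2.0) * sem_value


def mcid_exceeds_mdc(mcid: float, mdc: float) -> bool:
    """True when the MCID lies outside the bounds of measurement error."""
    return mcid > mdc


# Hypothetical inputs: baseline SD of 10 points and reliability of 0.91.
example_sem = sem(10.0, 0.91)
example_mdc = mdc95(example_sem)
print(round(example_sem, 2), round(example_mdc, 2))
print(mcid_exceeds_mdc(9.0, example_mdc), mcid_exceeds_mdc(5.0, example_mdc))
```

In this hypothetical case an MCID of 9 points would exceed the MDC95, whereas an MCID of 5 points would fall within measurement error and its reliability would be questioned.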
With this background on the nuances in the MCID-related terminologies, it is recommended to maintain a standardized terminology in the literature in order to avoid the confusion among the clinicians and researchers. Considering the continued and increasing utilization of the term MCID since its inception in 1989, it seems that the term MCID has stood the test of time and should be the choice of terminology used in clinical practice and research in future [23, 25]. Nonetheless, the specific use of MIC for indicating changes within an individual is potentially advantageous considering its relevance in clinical practice.
Analytical methods for calculating MCID
There is a wide variation in the analytic methods used to calculate MCID that is presented in this review. Nine distinct methods that are currently employed to calculate MCID can be categorized into anchor-based and distribution-based methods .
Anchor-based methods use an independent tangible criterion in the form of a clinical or patient-based anchor question to calculate MCID. The responses to these anchor questions are typically used to assign the population under study into transitional groups. For instance, the response to an anchor question “Compared to before surgery, how would you rate pain in the same knee?” is used to establish transitional groups such as “great deal better,” “somewhat better,” “equal,” “somewhat worse” and “great deal worse” on a typical 5-point global rating of change (GRC) scale . Thereafter, the baseline pre-TKA and post-TKA PROM scores are analyzed in four distinct methods (mean change, mean difference, ROC, and regression analysis) emphasizing on the transitional group that reports minimal change (“somewhat better”). Two third of the studies included in this review employed anchor-based methods; making them the most-used analytic methods to establish MCID of PROMs for patients treated with TKA (Table 2).
Among anchor-based methods, ROC-curve analysis was the most-used method to calculate MCID. It establishes the MCID threshold as the single point on the ROC curve with maximum sensitivity and specificity for dichotomizing patients into those who achieved a clinically meaningful change (somewhat better and great deal better) and those who did not (equal, somewhat worse and great deal worse). The United States Food and Drug Administration recommends ROC-curve analysis as the best available method to establish MCID for “within an individual” analysis. However, MCID determined by ROC-curve analysis may not be ideal for analyzing changes “within or between the groups”, as it involves a single point estimate on the ROC curve with no confidence intervals, which is equivalent to pointing at a single individual out of the whole group. Furthermore, the results of this review reflect heterogeneity in the methodology of the studies that calculated MCID related to TKA, in terms of the different transitional scales, ROC-curve cut-offs for AUC, and anchor questions used (Table 2).
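A minimal sketch of an ROC-based threshold search is given below. It assumes that the point of “maximum sensitivity and specificity” is operationalized with Youden's index (sensitivity + specificity − 1), which is one common choice and not necessarily the one used by the reviewed studies. The function names and the patient data are hypothetical.

```python
def roc_mcid(change_scores, improved):
    """Return (threshold, sensitivity, specificity) maximizing Youden's J.

    change_scores: PROM change from baseline to follow-up, one per patient.
    improved: True if the anchor response is "somewhat better" or
              "great deal better", otherwise False.
    """
    pairs = list(zip(change_scores, improved))
    n_pos = sum(1 for _, y in pairs if y)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both improved and not-improved patients are required")
    best = None
    for cut in sorted(set(change_scores)):
        # A patient is classified as improved when change >= cut.
        tp = sum(1 for x, y in pairs if y and x >= cut)
        tn = sum(1 for x, y in pairs if not y and x < cut)
        sens = tp / n_pos
        spec = tn / n_neg
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, cut, sens, spec)
    _, cut, sens, spec = best
    return cut, sens, spec


def auc(change_scores, improved):
    """Area under the ROC curve as the probability that an improved patient
    has a larger change score than a not-improved patient (ties count 0.5)."""
    pos = [x for x, y in zip(change_scores, improved) if y]
    neg = [x for x, y in zip(change_scores, improved) if not y]
    if not pos or not neg:
        raise ValueError("both improved and not-improved patients are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical cohort of ten patients.
changes = [2, 3, 4, 5, 6, 8, 9, 10, 12, 15]
anchor = [False, False, False, True, False, True, True, True, True, True]
threshold, sensitivity, specificity = roc_mcid(changes, anchor)
print(threshold, round(sensitivity, 2), round(specificity, 2), round(auc(changes, anchor), 2))
```

Reporting the AUC next to the threshold makes it easy to check the estimate against the recommended minimum of 0.7.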
The mean change method was the second most-employed anchor-based method to calculate MCID among the studies included in this review. With this method, MCID is calculated by estimating the absolute change in the mean PROM scores from baseline to follow-up in the sub-group of patients who report themselves as “somewhat better”. As this method entails the study of longitudinal changes in one group over a period of time, it is best used for cohort studies. Additionally, in individual-level application it is liable to misclassify certain individuals as not having changed when the magnitude of their change falls below the group mean.
The mean difference method was the third most-employed anchor-based method among the studies included in this review. In contrast to the mean change method, the mean difference method calculates MCID by estimating the difference in PROM score between two transitional groups (like “somewhat better” and “no change”), making it more relevant in clinical trials comparing intervention and control groups.
Regression analysis is the fourth most-employed anchor-based approach; it applies linear or logistic regression modeling to the mean score differences (from baseline to follow-up) to establish MCID. In the simplest form of linear regression, the slope of the linear relation between the differences in the PROM scores (independent variable) and transitional responses (dependent variable) is used to establish MCID [18, 21]. In logistic regression analysis, the non-linear relationship between the transitional responses (dependent variable) and all the confounding factors that can possibly affect them, such as age, sex and baseline PROM scores (independent variables), is analyzed to establish MCID. It has been proposed as one of the least biased methods for establishing the MCID for “between the groups” analysis that involves comparison of groups with different parametric characteristics with independent confounding influences on MCID.
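The mean change and mean difference calculations can be sketched as follows. The sketch assumes that the mean difference is taken between the mean change scores of two transitional groups, which is one possible reading of the method. The record layout, group labels and scores are hypothetical.

```python
from statistics import mean


def change_scores(records, group):
    """Change from baseline to follow-up for patients in one transitional group."""
    changes = [r["post"] - r["pre"] for r in records if r["anchor"] == group]
    if not changes:
        raise ValueError(f"no patients in transitional group {group!r}")
    return changes


def mean_change_mcid(records, minimal_group="somewhat better"):
    """Mean change method: mean change within the minimally improved group."""
    return mean(change_scores(records, minimal_group))


def mean_difference_mcid(records, minimal_group="somewhat better",
                         reference_group="equal"):
    """Mean difference method (assumed variant): difference between the mean
    change of the minimally improved group and that of the unchanged group."""
    return (mean(change_scores(records, minimal_group))
            - mean(change_scores(records, reference_group)))


# Hypothetical pre- and post-TKA scores with 5-point anchor responses.
patients = [
    {"pre": 20, "post": 30, "anchor": "somewhat better"},
    {"pre": 25, "post": 33, "anchor": "somewhat better"},
    {"pre": 18, "post": 30, "anchor": "somewhat better"},
    {"pre": 22, "post": 24, "anchor": "equal"},
    {"pre": 30, "post": 32, "anchor": "equal"},
    {"pre": 15, "post": 40, "anchor": "great deal better"},
]
print(mean_change_mcid(patients), mean_difference_mcid(patients))
```

Because the unchanged group in this hypothetical cohort also improves slightly, the mean difference estimate is smaller than the mean change estimate.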
Apart from the drawbacks of the individual methods highlighted above, there are a few shortcomings of anchor-based methods that warrant caution. Firstly, the anchor questions used to establish MCID are not validated, in addition to the heterogeneity in methodology pointed out earlier (Table 2). As the calculation of MCID is dependent on both these factors, they may be responsible for the wide variability in MCID values (Table 3). Secondly, using a single anchor question to calculate MCID has been a cause of concern, as it is difficult to completely capture the changes following TKA with one anchor question. Thirdly, anchors have been criticized for their susceptibility to recall bias (the patient’s memories of the prior health state may often be inaccurate) and their tendency to be affected by the patient’s current status. These limitations in the anchor-based methods used to calculate the available MCID values of PROMs related to TKA warrant caution during their interpretation and clinical application.
The distribution-based methods, in contrast to anchor-based methods, are grounded on the statistical significance with no direct relationship to clinical significance. While standard deviation (SD) is the most frequently employed statistical method to determine MCID, standard error of measurement (SEM) and MDC report the measurement error used to assess the reliability of the MCID calculated by anchor-based approaches. The rationale for using SD is based on an assumption that half of the SD of the pre-treatment scores most likely approximates to a moderate effect size . Seven out of 18 studies in this review employed distribution-based methods to calculate MCID of PROMs related to TKA (Table 2).
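The half-SD rule mentioned above can be sketched in a few lines. The use of the sample standard deviation (n − 1 denominator) is an assumption, as are the function name and the baseline scores.

```python
from statistics import stdev


def sd_based_mcid(baseline_scores, multiplier=0.5):
    """Distribution-based MCID as a multiple of the baseline SD.

    multiplier=0.5 is the half-SD rule; one study in this review used 0.8.
    The sample SD is used here, which is an assumption.
    """
    if len(baseline_scores) < 2:
        raise ValueError("at least two baseline scores are required")
    return multiplier * stdev(baseline_scores)


# Hypothetical pre-treatment PROM scores.
baseline = [10, 20, 30, 40, 50]
print(round(sd_based_mcid(baseline), 2), round(sd_based_mcid(baseline, 0.8), 2))
```

Because the estimate depends only on the spread of the sample, the same PROM can yield different values in cohorts with different baseline variability.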
In contrast to SD, which is sample-dependent, standard error of measurement (SEM) and MDC denote the measurement error in the PROM instrument, independent of the patient population. Although described as one of the methods used to establish MCID, we believe that SEM and MDC are statistical entities that best denote the measurement error with no consistent relationship with MCID and therefore cannot independently replace it.
The use of distribution-based methods to establish MCID has been challenged for not providing direct information regarding a patient’s perspectives of change. As they are more statistical than clinical it is believed that they do not address the “clinical” part of “minimal clinical important difference” . Secondly, although the magnitude of change determined by SD or effect size is certainly statistically significant, it might not necessarily be the reliable cut-off to establish MCID. Thirdly, as SD is sample dependent, MCID obtained by using SD cannot be generalized to other populations. Due to such inherent limitations, distribution-based methods are not employed alone and rather are used as a supplement to the anchor-based methods in the determination of MCID .
Considering multiple analytical methods with heterogeneity in methodology across studies and inherent limitations, multiple MCID values with wide variations have been reported for the same PROM (Table 3). Moreover, there is no established consensus yet on the best available approach to calculate MCID. Although it has been traditionally recommended to synthesize a smaller range of values by incorporating anchor- and distribution-based methods together, conceptually referred to as triangulation; it is interesting to note the paucity of such attempts in the literature pertaining to TKA . Although a modified Delphi model has been proposed to obtain a reasonable consensus in other specialties, it is important to recognize that these judgments cannot be objectively verified. Nonetheless, researchers are recommended to employ validated standardized methodology with multiple anchor questions and triangulation to completely capture the changes after TKA along with consistent reporting of the measurement error to ensure the reliability of calculated estimates.
Available MCID values of commonly used PROMs related to TKA
The commonly used PROMs evaluating the clinical outcomes of TKA use either disease- or joint-specific PROMs such as WOMAC, OKS and KSS or generic PROMs such as SF-36 and SF-12 which evaluate health-related quality of life.
Western Ontario and McMaster Universities Arthritis Index (WOMAC) is a validated, 24-item, disease-specific questionnaire used to evaluate patients with hip or knee OA, with three sub-scales measuring pain (five items), stiffness (two items), and function (17 items) . Each of the items has five possible responses with scores of 0 to 4 for each response with a maximum score of 96. Six out of 18 studies reported MCID values of WOMAC score using different analytical methods and at varied time-points of analysis ranging from 6 months to 2 years (Table 3). These studies reported a wide range of MCID values for WOMAC sub-scales ranging from 20.5 to 36.0 for pain, 17.6 to 33.0 for function and 12.9 to 25.0 for stiffness. The possible reasons for the wide variability in MCID values are inconsistencies in the analytic methods as highlighted previously.
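The item structure described above can be turned into a small scoring sketch. The item counts and the 0 to 4 response range come from the text. The rescaling of each sub-scale to 0 to 100 is an assumption added for illustration, and the scale on which the reviewed studies report their values should be checked before comparing them.

```python
# Item counts per sub-scale as described in the text; each item is scored 0 to 4.
WOMAC_ITEMS = {"pain": 5, "stiffness": 2, "function": 17}
MAX_ITEM_SCORE = 4


def womac_subscale(responses, subscale):
    """Raw sub-scale score: the sum of item responses."""
    if len(responses) != WOMAC_ITEMS[subscale]:
        raise ValueError(f"{subscale} expects {WOMAC_ITEMS[subscale]} items")
    if any(r not in range(MAX_ITEM_SCORE + 1) for r in responses):
        raise ValueError("each response must be an integer from 0 to 4")
    return sum(responses)


def womac_total(pain, stiffness, function):
    """Raw total score across the 24 items (maximum 96)."""
    return (womac_subscale(pain, "pain")
            + womac_subscale(stiffness, "stiffness")
            + womac_subscale(function, "function"))


def normalize_0_100(raw_score, subscale):
    """Assumed linear rescaling of a raw sub-scale score to a 0 to 100 range."""
    return 100.0 * raw_score / (WOMAC_ITEMS[subscale] * MAX_ITEM_SCORE)


# Hypothetical responses from one patient.
pain_raw = womac_subscale([2, 3, 1, 2, 2], "pain")
print(pain_raw, normalize_0_100(pain_raw, "pain"))
```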
Oxford Knee Score (OKS) is a validated, knee-joint-specific, 12-item questionnaire with five items assessing pain and seven items assessing function. Each item has equal weightage (0 to 4), giving a possible score ranging from 0 to 48, with a higher score indicating better outcomes. Six studies reported MCID values of OKS using different analytical methods at varied time-points of follow-up ranging from 6 months to 5 years (Table 3). Overall, MCID values of OKS, which ranged from 4.7 to 10.0 points, demonstrated better convergence than those of WOMAC.
The original KSS proposed in 1989 is a knee-joint-specific questionnaire with two sub-scales, knee rating (KSS-KS, 0–100 points) and function score (KSS-FS, 0–100 points) . The KSS-KS is further categorized into pain (0–50 points) which is patient-reported and knee score (0–50 points) that is clinician rated in terms of range of motion (ROM), alignment and stability. Two studies reported MCID value of original KSS sub-scales at 2 year follow-up using different analytical methods (Table 3). As a clinician-completed scoring system, concerns have been raised regarding its validity which has led to the proposition of new KSS . However, none of the studies in this review have reported MCID values of new KSS.
SF-36 is a generic instrument used to assess health-related quality of life with eight domains and two summary scales: physical component summary and mental component summary. Four studies reported the MCID of SF-36 at 6 months to 5 years after TKA using different analytical methods (Table 3). The 12-item Short Form survey (SF-12) is a consolidated version of SF-36 with 12 items and eight scales or domains. Four studies reported the MCID of SF-12 at 1 to 2 years after TKA using anchor- and distribution-based methods (Table 3). The reported MCID values range from 1.8 to 5.0 points for the SF-12 physical component summary and from −1.4 to 5.4 points for the mental component summary.
The limitations in the methodology of MCID calculation warrant caution from clinicians before applying these MCID values in clinical practice.
General considerations while using MCID values
The concept of MCID is associated with certain inherent limitations that the clinicians and researchers need to be mindful of before clinical application. Firstly, MCID is a context-specific entity. A context not only includes the type of disease (osteoarthritis) or treatment (TKA) but also comprises of population characteristics like age, sex, socio-economic factors, baseline disease severities and patient expectations. Hence, MCID is not a fixed attribute and should be cautiously applied across varied patient populations. Secondly, MCID is specifically meant to capture an individual’s response to treatment rather than the mean experience of the entire group. MCID reported as a single point estimate using group mean scores runs the risk of misclassifying certain individuals as non-responders when their improvement falls below the group mean. Therefore, MCID derived from group mean scores should be judiciously applied in clinical practice to detect individual-level changes. Instead, MCID by ROC analysis for individual-level application serves a better purpose in this regard . Thirdly, MCID is commonly used for determining sample sizes in clinical research based on an assumption that it ensures clinical significance to the statistically established significance. Such assumptions have been lately questioned and caution is advised while application of MCID in power analysis . Due to the aforementioned limitations in its use, established MCID values need to be utilized judiciously, considering the specific context of application.
The concept of MCID has been evolving over the past few years, with many areas of ongoing research. In our opinion, progress in the following areas will permit better use of this alluring concept. Firstly, consensus on the appropriate use of relevant terminologies and standardization of the methods of calculation are much needed to move beyond the existing state of conflict and to use MCID as a powerful outcome metric. Additionally, large organizations and consortia of researchers can provide consensus-based periodic updates on MCID values of commonly used PROMs. Secondly, to address the variability of MCID values, the preferred approach is to synthesize a smaller range of values by combining anchor- and distribution-based methods through triangulation, as they complement each other. Thirdly, as MCID is a context-specific entity, the best way to obtain reliable estimates is to establish an MCID that is specific to the clinical setting using standard patient populations and linguistically validated questionnaires, something that is feasible at large-volume centers. With progress in these areas, the utility of MCID can be vastly improved, making it a powerful tool in clinical research as well as an aid to clinical decision-making and better treatment practices.
Of the existing terminologies, MCID remains the most used, and anchor-based methods are the most-used analytic methods to calculate it. The MCID values of WOMAC and OKS are reported in most of the studies, with estimates ranging from 20.5 to 36.0 for pain, 17.6 to 33.0 for function, and 12.9 to 25.0 for the stiffness sub-scales of the WOMAC score, and from 4.7 to 10.0 points for OKS. As MCID is a context-specific value, the judicious use of published MCID values is advisable in both clinical and research settings. Although there is no ideal method, synthesizing a smaller range of MCID estimates by triangulation of both anchor- and distribution-based approaches is recommended. However, due to the paucity of such attempts in the literature pertaining to TKA, MCID determined by ROC is regarded as most suitable for “within an individual” analysis in clinical practice, and estimates obtained by regression analysis are considered least biased for “between the groups” analysis in clinical research comparing study and control groups.
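As a rough illustration of triangulation, the sketch below places two distribution-based estimates (half a standard deviation and the standard error of measurement) alongside an anchor-based estimate and reports the resulting range. Every number is hypothetical, including the baseline scores, the reliability coefficient and the anchor-based value. There is no single agreed triangulation rule, so summarising the estimates by their range and median is an assumption made for this example, not the method of any reviewed study.

```python
from math import sqrt
from statistics import median, stdev


def half_sd(baseline_scores):
    """Distribution-based estimate: half the standard deviation of baseline scores."""
    return 0.5 * stdev(baseline_scores)


def standard_error_of_measurement(baseline_scores, reliability):
    """Distribution-based estimate: SEM = SD * sqrt(1 - reliability)."""
    return stdev(baseline_scores) * sqrt(1 - reliability)


def triangulate(estimates):
    """Summarise several MCID estimates as a range with its median."""
    values = sorted(estimates.values())
    return {"low": values[0], "high": values[-1], "median": median(values)}


# Hypothetical baseline PROM scores and test-retest reliability.
baseline = [30, 40, 50, 60, 70]
reliability = 0.84

estimates = {
    "half_sd": half_sd(baseline),
    "sem": standard_error_of_measurement(baseline, reliability),
    "anchor": 9.0,  # hypothetical anchor-based estimate, e.g. from an ROC analysis
}
summary = triangulate(estimates)
```

A narrower, better justified range would normally give more weight to the anchor-based estimate and treat the distribution-based values as supporting evidence.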
Availability of data and materials
The data presented in this manuscript are available in the PubMed and EMBASE databases.
Abbreviations
CID: Clinically important difference
MCIC: Minimal clinically important change
MCID: Minimal clinically important difference
MDC: Minimum detectable change
MIC: Minimal important change
MID: Minimal important difference
KSS: Knee Society Score
OKS: Oxford Knee Score
PROMs: Patient-reported outcome measures
SF-36: 36-item Short Form survey
SF-12: 12-item Short Form survey
TKA: Total knee arthroplasty
WOMAC: Western Ontario and McMaster Universities Arthritis Index
Rolfson O, Bohm E, Franklin P, Lyman S, Denissen G, Dawson J et al (2016) Patient-reported outcome measures in arthroplasty registries Report of the Patient-Reported Outcome Measures Working Group of the International Society of Arthroplasty Registries Part II. Recommendations for selection, administration, and analysis. Acta orthopaedica 87(Suppl 1):9–23. https://doi.org/10.1080/17453674.2016.1181816
Anthoine E, Moret L, Regnault A, Sebille V, Hardouin JB (2014) Sample size used to validate a scale: a review of publications on newly-developed patient reported outcomes measures. Health Quality life Outcomes 12:176. https://doi.org/10.1186/s12955-014-0176-2
Insall JN, Dorr LD, Scott RD, Scott WN (1989) Rationale of the Knee Society clinical rating system. Clin Orthop Relat Res 248:13–14
Ware JE Jr, Sherbourne CD (1992) The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care 30(6):473–483
Ware J Jr, Kosinski M, Keller SD (1996) A 12-Item Short-Form Health Survey: construction of scales and preliminary tests of reliability and validity. Med Care 34(3):220–233. https://doi.org/10.1097/00005650-199603000-00003
Murray DW, Fitzpatrick R, Rogers K, Pandit H, Beard DJ, Carr AJ et al (2007) The use of the Oxford Hip and Knee Scores. J bone Joint Surg Br Vol 89(8):1010–1014. https://doi.org/10.1302/0301-620x.89b8.19424
Bellamy N, Buchanan WW (1986) A preliminary evaluation of the dimensionality and clinical importance of pain and disability in osteoarthritis of the hip and knee. Clin Rheumat 5(2):231–241. https://doi.org/10.1007/bf02032362
Page P (2014) Beyond statistical significance: clinical interpretation of rehabilitation research literature. Int J Phys Ther 9(5):726–736
Celik D, Coban O, Kilicoglu O (2019) Minimal clinically important difference of commonly used hip-, knee-, foot-, and ankle-specific questionnaires: a systematic review. J Clin Epidemiol 113:44–57. https://doi.org/10.1016/j.jclinepi.2019.04.017
Wright A, Hannon J, Hegedus EJ, Kavchak AE (2012) Clinimetrics corner: a closer look at the minimal clinically important difference (MCID). J Man Manipulative Ther 20(3):160–166. https://doi.org/10.1179/2042618612y.0000000001
King MT (2011) A point of minimal important difference (MID): a critique of terminology and methods. Expert Rev Pharmacoecon Outcomes Res 11(2):171–184. https://doi.org/10.1586/erp.11.9
Maratt JD, Lee YY, Lyman S, Westrich GH (2015) Predictors of satisfaction following total knee arthroplasty. J Arthroplasty 30(7):1142–1145. https://doi.org/10.1016/j.arth.2015.01.039
Chesworth BM, Mahomed NN, Bourne RB, Davis AM (2008) Willingness to go through surgery again validated the WOMAC clinically important difference from THR/TKR surgery. J Clin Epidemiol 61(9):907–918. https://doi.org/10.1016/j.jclinepi.2007.10.014
Escobar A, Quintana JM, Bilbao A, Arostegui I, Lafuente I, Vidaurreta I (2007) Responsiveness and clinically important differences for the WOMAC and SF-36 after total knee replacement. Osteoarthritis Cartilage 15(3):273–280. https://doi.org/10.1016/j.joca.2006.09.001
Escobar A, Riddle DL (2014) Concordance between important change and acceptable symptom state following knee arthroplasty: the role of baseline scores. Osteoarthritis Cartilage 22(8):1107–1110. https://doi.org/10.1016/j.joca.2014.06.006
Quintana JM, Escobar A, Arostegui I, Bilbao A, Azkarate J, Goenaga JI et al (2006) Health-related quality of life and appropriateness of knee or hip joint replacement. Arch Intern Med 166(2):220–226. https://doi.org/10.1001/archinte.166.2.220
Escobar A, Garcia Perez L, Herrera-Espineira C, Aizpuru F, Sarasqueta C, Gonzalez Saenz de Tejada M et al (2013) Total knee replacement; minimal clinically important differences and responders. Osteoarthritis Cartilage 21(12):2006–2012. https://doi.org/10.1016/j.joca.2013.09.009
Clement ND, MacDonald D, Simpson AH (2014) The minimal clinically important difference in the Oxford Knee Score and Short Form 12 score after total knee arthroplasty. Knee Surg Sports Traumatol Arthrosc 22(8):1933–1939. https://doi.org/10.1007/s00167-013-2776-5
Kiran A, Hunter DJ, Judge A, Field RE, Javaid MK, Cooper C et al (2014) A novel methodological approach for measuring symptomatic change following total joint arthroplasty. J Arthroplasty 29(11):2140–2145. https://doi.org/10.1016/j.arth.2014.06.008
Bin Abd Razak HR, Tan CS, Chen YJ, Pang HN, Tay KJ, Chin PL et al (2016) Age and preoperative Knee Society Score are significant predictors of outcomes among Asians following total knee arthroplasty. J Bone Joint Surg Am Vol 98(9):735–741. https://doi.org/10.2106/jbjs.15.00280
Lee WC, Kwan YH, Chong HC, Yeo SJ (2017) The minimal clinically important difference for Knee Society Clinical Rating System after total knee arthroplasty for primary osteoarthritis. Knee Surg Sports Traumatol Arthrosc 25(11):3354–3359. https://doi.org/10.1007/s00167-016-4208-9
Berliner JL, Brodke DJ, Chan V, SooHoo NF, Bozic KJ (2017) Can preoperative Patient-reported Outcome Measures be used to predict meaningful improvement in function after TKA? Clin Orthop Related Res 475(1):149–157. https://doi.org/10.1007/s11999-016-4770-y
Lizaur-Utrilla A, Gonzalez-Parreno S, Martinez-Mendez D, Miralles-Munoz FA, Lopez-Prats FA (2019) Minimal clinically important differences and substantial clinical benefits for Knee Society Scores. Knee Surg Sports Traumatol Arthrosc. https://doi.org/10.1007/s00167-019-05543-x
Blevins JL, Chiu YF, Lyman S, Goodman SM, Mandl LA, Sculco PK et al (2019) Comparison of expectations and outcomes in rheumatoid arthritis versus osteoarthritis patients undergoing total knee arthroplasty. J Arthroplasty 34(9):1946–52.e2. https://doi.org/10.1016/j.arth.2019.04.034
Clement ND, Weir D, Holland J, Gerrand C, Deehan DJ (2019) Meaningful changes in the Short Form 12 physical and mental summary scores after total knee arthroplasty. Knee 26(4):861–868. https://doi.org/10.1016/j.knee.2019.04.018
Keurentjes JC, Fiocco M, Nelissen RG (2014) Willingness to undergo surgery again validated clinically important differences in health-related quality of life after total hip replacement or total knee replacement surgery. J Clin Epidemiol 67(1):114–120. https://doi.org/10.1016/j.jclinepi.2013.04.010
Kiran A, Bottomley N, Biant LC, Javaid MK, Carr AJ, Cooper C et al (2015) Variations in good Patient Reported Outcomes after total knee arthroplasty. J Arthroplasty 30(8):1364–1371. https://doi.org/10.1016/j.arth.2015.02.039
Beard DJ, Harris K, Dawson J, Doll H, Murray DW, Carr AJ et al (2015) Meaningful changes for the Oxford Hip and Knee Scores after joint replacement surgery. J Clin Epidemiol 68(1):73–79. https://doi.org/10.1016/j.jclinepi.2014.08.009
Ingelsrud LH, Roos EM, Terluin B, Gromov K, Husted H, Troelsen A (2018) Minimal important change values for the Oxford Knee Score and the Forgotten Joint Score at 1 year after total knee replacement. Acta Orthopaedica 89(5):541–547. https://doi.org/10.1080/17453674.2018.1480739
Norman GR, Sloan JA, Wyrwich KW (2003) Interpretation of changes in health-related quality of life: the remarkable universality of half a standard deviation. Med Care 41(5):582–592. https://doi.org/10.1097/01.mlr.0000062554.74615.4c
Copay AG, Eyberg B, Chung AS, Zurcher KS, Chutkan N, Spangehl MJ (2018) Minimum clinically important difference: current trends in the orthopaedic literature, Part II: Lower extremity: a systematic review. JBJS Rev 6(9):e2. https://doi.org/10.2106/jbjs.rvw.17.00160
Jaeschke R, Singer J, Guyatt GH (1989) Measurement of health status. Ascertaining the minimal clinically important difference. Controlled Clin Trials 10(4):407–415. https://doi.org/10.1016/0197-2456(89)90005-6
Jayadevappa R, Cook R, Chhatre S (2017) Minimal important difference to infer changes in health-related quality of life-a systematic review. J Clin Epidemiol 89:188–198. https://doi.org/10.1016/j.jclinepi.2017.06.009
Turner D, Schunemann HJ, Griffith LE, Beaton DE, Griffiths AM, Critch JN et al (2009) Using the entire cohort in the receiver operating characteristic analysis maximizes precision of the minimal important difference. J Clin Epidemiol 62(4):374–379. https://doi.org/10.1016/j.jclinepi.2008.07.009
(2006) Guidance for industry: patient-reported outcome measures: use in medical product development to support labeling claims: draft guidance. Health Qual Life Outcomes 4:79. https://doi.org/10.1186/1477-7525-4-79.
Revicki D, Hays RD, Cella D, Sloan J (2008) Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes. J Clin Epidemiol 61(2):102–109. https://doi.org/10.1016/j.jclinepi.2007.03.012
Scuderi GR, Bourne RB, Noble PC, Benjamin JB, Lonner JH, Scott WN (2012) The new Knee Society Knee Scoring System. Clin Orthop Related Res 470(1):3–19. https://doi.org/10.1007/s11999-011-2135-0
One of the authors certifies that he (TKK) has received during study period, an amount of less than USD 10,000 from Smith and Nephew (Memphis, TN, USA), has received during the study period, an amount of less than USD 10,000 from B. Braun (Tuttlingen, Baden Wurttemberg, Germany), outside the submitted work. The author certifies that neither (s)he, nor any members of their family, have any commercial association (such as consultancies, stock ownership, equity interest, patent/licensing arrangements etc.) that might pose a conflict of interest in connection with the submitted article.
Cite this article
Maredupaka, S., Meshram, P., Chatte, M. et al. Minimal clinically important difference of commonly used patient-reported outcome measures in total knee arthroplasty: review of terminologies, methods and proposed values. Knee Surg & Relat Res 32, 19 (2020). https://doi.org/10.1186/s43019-020-00038-3