We present the development and results of a nationwide, prospective, observational follow-up programme including patient-reported outcome measures (PROMs) for the Swedish Hip Arthroplasty Register. The programme started in 2002 and has gradually expanded to include all units performing total hip replacement in Sweden. The self-administered PROMs protocol comprises the EQ-5D instrument, the Charnley class categorisation and visual analogue scales for pain and satisfaction. The current analyses include 34 960 total hip replacements with complete pre- and one-year post-operative questionnaires.
Patients eligible for total hip replacement generally report low health-related quality of life and suffer from pain. One year post-operatively the mean EQ-5D index increased to above the level of an age- and gender-matched population, with a considerable reduction of pain (p < 0.001). Females, younger patients and those with Charnley category C reported a lower EQ-5D index pre-operatively than males, older patients and Charnley category A or B, respectively (all p < 0.001). In a multivariable regression analysis Charnley category C, male gender and higher age were associated with less improvement in health-related quality of life (p < 0.001).
Nationwide implementation of a PROMs programme requires a structured organisation and effective data capture. Patients’ response rates to the Registry are good. The continuous collection of PROMs permits local and national improvement work and allows for further health-economic evaluation.
Implant survival has been the most commonly reported outcome variable following joint replacement surgery. However, retention of the prosthesis is not the only indicator of success.1–6 Traditionally, the indicators collected have included radiological results, rates of infection, re-admissions and other adverse events such as thromboembolic and cardiovascular complications. None of these adequately captures the ultimate outcome for the patient. In medical research and among health-care providers, the focus has shifted towards patient-reported outcomes7 and how they are measured and analysed.
The main indications for total hip replacement (THR) are pain and poor health-related quality of life (HRQoL). It is therefore logical to measure and report these variables when analysing the results of THR. Monitoring patient-reported outcome measures (PROMs) at a national level for a given intervention permits both local improvements in the delivery of care and national analyses such as health-economic evaluations.
There are currently 89 different national medical quality registers in Sweden. They are all run by the profession but funded by the Swedish Association of Local Authorities and Regions and the National Board of Health and Welfare. These organisations stipulated more than ten years ago that the registries should measure outcomes with a multidimensional approach and include both disease-specific and generic PROMs to supplement general follow-up. Recently, the Department of Health in England required the routine measurement of PROMs for all NHS patients in England before and after they undergo hip and knee replacement surgery.8
The aim of the Swedish Hip Arthroplasty Registry is to collect prospective, observational, nationwide data in order to monitor and improve the outcomes of THR for each patient in Sweden. Starting as a pilot project in 2002, the Registry introduced a PROMs follow-up programme, which has gradually been adopted by most units performing THR in Sweden. Collecting PROMs in a nationwide prospective study is demanding because of the volume of data acquired and the need for appropriate supporting information technology. We describe the development of the prospective nationwide Swedish PROMs programme for patients with THR and analyse the response rate. We have investigated the pre- and one-year post-operative PROMs in the Swedish THR population and examined how age, gender, diagnosis and comorbidity are associated with different patient-reported outcomes.
Patients and Methods
The Swedish Hip Arthroplasty Register was initiated in 1979.9 All public as well as private orthopaedic units in Sweden performing THR participate on a voluntary basis. Besides the information included in the social security number (date of birth and gender), individual data on diagnoses, side and detailed information on implants and fixation are reported. More recently, information about the American Society of Anesthesiologists’ (ASA) classification of physical status,10 height and weight has been added to the variables collected. Individual procedure registration captures between 97% and 99% of all primary procedures.11
In order to increase the utility of the Register, a standardised protocol including PROMs has been introduced stepwise since 2002. All patients are asked to complete a self-administered ten-item questionnaire, including Charnley’s functional categories (A, B and C),12,13 a visual analogue scale (VAS) for pain and the generic EQ-5D measurement.14 This is done pre-operatively and, unless the patient has undergone revision surgery, at one, six and ten years post-operatively.11 At follow-up, the instrument is supplemented with a VAS addressing satisfaction with the outcome of the intervention.
The Charnley categories partly permit correction of scores due to differing musculoskeletal comorbidity burdens. Category A comprises patients with unilateral hip disease, category B, patients with bilateral hip disease and category C, patients with multiple joint diseases or other major medical conditions impairing walking capacity. Originally, the classification was developed for categorisation by the interviewer. In the PROMs programme the Charnley category is assigned by using two self-administered questions: 1) Do you have any symptoms from the other hip? and 2) Do you have problems walking because of other reasons (e.g., pain from other joints, back pain, angina, or any other medical condition impairing your walking capacity)? The VAS for pain ranges from 0 (no pain) to 100 (worst pain imaginable). The question addresses the average pain-experience from the current hip during the last month. The vertical line supplied carries a subscale of indicators and ordered response levels (0 to 20, no or slight pain; 20 to 40, mild pain; 40 to 60, moderate pain; 60 to 80, severe pain; 80 to 100, unbearable pain). The VAS addressing satisfaction with the outcome of the hip surgery ranges from 0 (satisfied) to 100 (dissatisfied). This vertical line is also supplied with subscale indicators and ordered response levels (0 to 20, very satisfied; 20 to 40, satisfied; 40 to 60, uncertain; 60 to 80, not satisfied; 80 to 100, dissatisfied).
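The self-assignment of the Charnley category and the banding of the pain and satisfaction VASs can be expressed as two small lookup rules. The sketch below is an illustration under stated assumptions, not the Registry's software: the function names are hypothetical, the precedence (any other cause of walking problems gives C, otherwise symptoms from the other hip give B, otherwise A) is our reading of the category definitions, and placing boundary scores (20, 40, 60, 80) in the lower level is an assumption that happens to agree with treating a satisfaction score ≤ 40 as "satisfied".

```python
import math


def charnley_category(other_hip_symptoms: bool, other_walking_problems: bool) -> str:
    """Self-assigned Charnley category from the two questionnaire items.

    Hypothetical precedence: other causes of impaired walking give C,
    otherwise symptoms from the other hip give B, otherwise A.
    """
    if other_walking_problems:
        return "C"
    if other_hip_symptoms:
        return "B"
    return "A"


PAIN_LEVELS = ("no or slight pain", "mild pain", "moderate pain",
               "severe pain", "unbearable pain")
SATISFACTION_LEVELS = ("very satisfied", "satisfied", "uncertain",
                       "not satisfied", "dissatisfied")


def vas_level(score: float, labels: tuple) -> str:
    """Map a 0 to 100 VAS score to one of five ordered 20-point levels.

    Boundary scores (20, 40, 60, 80) go to the lower level (an assumption).
    """
    if not 0 <= score <= 100:
        raise ValueError(f"VAS score out of range: {score}")
    index = max(math.ceil(score / 20) - 1, 0)
    return labels[index]
```

For example, `charnley_category(True, False)` gives "B", and `vas_level(40, SATISFACTION_LEVELS)` gives "satisfied".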
The EQ-5D14 is an HRQoL or utility instrument that evaluates subjects in five dimensions, namely mobility, self-care, usual activities, pain/discomfort and anxiety/depression.14,15 In this version of the EQ-5D instrument (EQ-5D-3L; EuroQol Group, Rotterdam, The Netherlands) each dimension is divided into three levels of severity, generating 243 possible combinations of responses. The EQ-5D can be presented as a health profile or as a global health index with a weighted total value for HRQoL. The minimum value is −0.594, representing a state worse than death, and the maximum is 1.0. In order to adjust for cultural differences in response patterns, different tariffs are used when computing the index. In the absence of a specific Swedish tariff, the Registry uses the United Kingdom tariff based on time-trade-off valuation, a method used to determine the weighting of different health states.16 Since no health-economic analyses have been conducted, negative values have not been adjusted. The EQ-5D also contains a health state VAS (EQ-VAS) ranging from 0 (worst imaginable) to 100 (best imaginable).
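To make the index calculation concrete, the following sketch scores a five-digit EQ-5D-3L profile with the coefficients of the UK time-trade-off value set as we understand them (Dolan's N3 model); these numbers should be checked against the official EuroQol value set before any real use, and the function name is hypothetical. It reproduces the 243 possible profiles and the range from −0.594 to 1.0 stated in the paragraph.

```python
from itertools import product

# mobility, self-care, usual activities, pain/discomfort, anxiety/depression
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")
# Decrements for levels 2 and 3 (UK TTO value set, as we understand it).
DECREMENTS = {
    "MO": {2: 0.069, 3: 0.314},
    "SC": {2: 0.104, 3: 0.214},
    "UA": {2: 0.036, 3: 0.094},
    "PD": {2: 0.123, 3: 0.386},
    "AD": {2: 0.071, 3: 0.236},
}
CONSTANT = 0.081  # subtracted once if any dimension is worse than level 1
N3 = 0.269        # subtracted once if any dimension is at level 3


def eq5d_3l_uk_index(profile: str) -> float:
    """Index for a five-digit EQ-5D-3L profile such as '21231'."""
    if len(profile) != 5 or any(ch not in "123" for ch in profile):
        raise ValueError(f"invalid EQ-5D-3L profile: {profile!r}")
    levels = [int(ch) for ch in profile]
    if all(level == 1 for level in levels):
        return 1.0  # full health
    value = 1.0 - CONSTANT
    for dim, level in zip(DIMENSIONS, levels):
        if level > 1:
            value -= DECREMENTS[dim][level]
    if 3 in levels:
        value -= N3
    return round(value, 3)


# Three levels in each of five dimensions give 3**5 = 243 profiles.
ALL_PROFILES = ["".join(p) for p in product("123", repeat=5)]
```

Note how heavily the pain dimension and the N3 term weigh: moving pain/discomfort alone from level 2 to level 3 ("11121" to "11131") lowers the index from 0.796 to 0.264.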
The questionnaire has been adapted to an internet-based touch-screen application for pre-operative use in hospital clinics. This system has been tested internally for reliability and validity. The advantages include immediate online access to the results, no missing values and decreased risk of systematic errors due to problems such as illegible handwriting or incorrect manual registration. However, some units prefer to use the conventional pen-and-paper questionnaire pre-operatively.
At follow-up at one, six and ten years after the operation, the questionnaire is mailed to the patient with an explanatory letter and a stamped addressed envelope enclosed. Besides general information about the outcome measures programme and the survey, the covering letter tells patients to contact their orthopaedic surgeon if they have problems with their THR. Non-respondents receive the first and only reminder after eight weeks. Once a month, the Registry distributes to the participating clinics lists of all patients due to receive follow-up protocols. Each clinic is then responsible for the logistics, including checking the current address, sending out questionnaires and reminders and manually registering the data in the online database.
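The monthly list and reminder logic described above can be sketched as date arithmetic. This is a hypothetical illustration, not the Registry's system: the function names and the input format (pairs of procedure identifier and operation date) are our assumptions, as is the rule that an operation on 29 February falls due on 28 February in non-leap years. Revised hips are skipped because, as stated earlier, follow-up stops after revision surgery.

```python
from datetime import date, timedelta

FOLLOW_UP_YEARS = (1, 6, 10)         # follow-up intervals from the protocol
REMINDER_DELAY = timedelta(weeks=8)  # the single reminder to non-respondents


def add_years(d: date, years: int) -> date:
    """Same calendar day `years` later; 29 February falls back to 28 February."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def due_this_month(procedures, year, month, revised=frozenset()):
    """Hypothetical monthly list of (procedure_id, due_date) pairs.

    `procedures` is an iterable of (procedure_id, operation_date) pairs;
    identifiers in `revised` are skipped because revised hips leave follow-up.
    """
    due = []
    for procedure_id, op_date in procedures:
        if procedure_id in revised:
            continue
        for years in FOLLOW_UP_YEARS:
            due_date = add_years(op_date, years)
            if (due_date.year, due_date.month) == (year, month):
                due.append((procedure_id, due_date))
    return due


def reminder_date(mailed: date) -> date:
    """Date on which a non-respondent would receive the reminder."""
    return mailed + REMINDER_DELAY
```

For a questionnaire mailed on 1 January 2010, `reminder_date` returns 26 February 2010.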
From the Register, all procedures with complete pre-operative and one-year post-operative protocols were selected. Procedures for tumours and acute fractures were excluded. The main analyses comprised 34 960 THR procedures (32 396 patients; 9257 patients with bilateral THR, of whom 197 had undergone one-stage bilateral THR). The mean age was 68.1 years (15 to 96), and 20 220 procedures (57.8%) were in female patients. The demography, including the distribution of diagnoses and Charnley category, is presented in Table I. The mean time from operation to follow-up was 12.7 months (1 to 68).
An expected EQ-5D index was obtained from a Swedish reference population in the Western Region aged 16 to 84 years. This comprised data from 63 349 individuals collected during 2006 to 2008 in a population survey with random samples.17 Expected indices were assigned by gender and five-year age cohort. Patients over 84 years were conservatively assigned the index for the 80- to 84-year group. The mean percentage of the expected index was calculated as the mean of the individual ratios of each patient’s EQ-5D index to the assigned expected value.
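The comparison with the reference population amounts to a table lookup followed by a mean of ratios. The sketch below assumes a reference table keyed by sex and the first year of each five-year band; the numbers in `REFERENCE` are invented placeholders, not the Western Region values, and the function names are hypothetical.

```python
from statistics import mean


def expected_index(sex: str, age: int, reference: dict) -> float:
    """Reference EQ-5D index for a sex and five-year age band."""
    band_start = min(5 * (age // 5), 80)  # patients over 84 use the 80 to 84 band
    return reference[(sex, band_start)]


def mean_percent_of_expected(patients, reference) -> float:
    """Mean of individual (index / expected index) ratios, as a percentage."""
    ratios = [index / expected_index(sex, age, reference)
              for sex, age, index in patients]
    return 100 * mean(ratios)


# Placeholder reference values, NOT the published population figures.
REFERENCE = {("F", 65): 0.80, ("M", 65): 0.85, ("F", 80): 0.70}
patients = [("F", 67, 0.80), ("M", 66, 0.85), ("F", 91, 0.35)]
print(mean_percent_of_expected(patients, REFERENCE))
```

Here the 91-year-old woman is scored against the 80 to 84 band, and the three ratios (1.0, 1.0 and 0.5) average to roughly 83.3%. Averaging the individual ratios, rather than dividing the mean index by the mean expected value, follows the description in the paragraph.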
The development of the PROMs programme and the increase in participating units is illustrated in Figure 1. A completeness analysis was undertaken using data from all THRs performed in 2008 in all clinics with a running routine of collecting the questionnaire (73 of 78 clinics at that time). The completeness pre-operatively, at the one-year follow-up, and combined, was expressed as the proportion of respondents among all patients with THR during 2008 as reported to the Registry. The non-respondents’ data were analysed individually with respect to age, gender and diagnosis and at group level with respect to hospital and regional affiliation.
Statistical methods and data analysis.
Categorical variables are generally presented as frequencies and proportions. For continuous variables, means and standard deviations (SD), in some cases with 95% confidence intervals (CI), or medians and ranges are presented. The SPSS v 17.0 statistical package (SPSS Inc., Chicago, Illinois) was used for statistical analyses. The Wilcoxon signed-rank test, the paired samples t-test, the Mann-Whitney U test or Fisher’s exact test was used, as appropriate, to evaluate differences between groups or measures. Multivariable regression analyses were performed with the change (delta) values of the outcome parameters, or the satisfaction VAS, as dependent variables and gender, age, diagnosis and Charnley category as independent variables. A p-value < 0.05 was considered to show a significant difference.
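The paired pre- versus post-operative comparisons rely on the Wilcoxon signed-rank test. The analyses themselves were run in SPSS; the stdlib-only sketch below only illustrates the test's logic with a large-sample normal approximation (zero differences dropped, average ranks for ties, tie-corrected variance), and SPSS's exact handling of zeros and ties may differ. The function name is hypothetical.

```python
import math


def wilcoxon_signed_rank(pre, post):
    """Paired Wilcoxon signed-rank test with a normal approximation.

    Returns (w_plus, w_minus, z, p_two_sided). Zero differences are
    dropped and tied absolute differences receive average ranks.
    """
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0  # sum of (t^3 - t) over groups of tied |differences|
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    expected = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - expected) / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, w_minus, z, p_two_sided
```

With invented pain VAS scores `pre = [70, 60, 80, 50, 65]` and `post = [20, 15, 10, 55, 65]`, the unchanged pair is dropped and the function returns W+ = 1 and W− = 9. With cohorts of tens of thousands of paired observations, the normal approximation is the practical choice.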
Among the 12 300 procedures performed in 2008, the rates of completeness were 86.1% (10 588) for the pre-operative questionnaire and 90.2% (11 095) for the one-year follow-up questionnaire; both questionnaires were completed for 9727 procedures (79%). Table II presents the demography of non-respondents compared with respondents. There were no significant gender differences between respondents and non-respondents, but diagnosis (osteoarthritis versus other diagnoses) and ASA scores differed significantly. Diagnoses other than osteoarthritis were more frequent and the mean ASA score was higher among non-respondents. There were inconsistent differences with regard to age.
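The completeness rates are simple proportions of all registered procedures. The short sketch below (function name hypothetical) recomputes them from the 2008 counts given in the paragraph; it prints 86.1%, 90.2% and 79.1%, the last of which the paragraph rounds to 79%.

```python
def completeness_rates(n_procedures, n_pre, n_one_year, n_both):
    """Percentage of registered procedures with each questionnaire returned."""
    counts = {"pre-operative": n_pre, "one-year": n_one_year, "both": n_both}
    return {name: 100 * n / n_procedures for name, n in counts.items()}


# Counts for THRs performed in 2008, as reported in the text.
rates = completeness_rates(12_300, 10_588, 11_095, 9_727)
for name, pct in rates.items():
    print(f"{name}: {pct:.1f}%")
```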
The pre-operative EQ-5D indices were distributed bimodally, with one cluster (n = 16 156 procedures) having a median of 0.088 and the other (n = 18 804 procedures) a median of 0.69 (Fig. 2a). Post-operatively, no problems in any dimension were reported after 12 417 procedures (35.5%). Figure 2b presents the trimodal one-year post-operative distribution, with a large middle group (n = 19 985 procedures) clustering at a median of 0.73 and a small group of procedures with a poor response to surgery (n = 2558, 7.3%) clustering around a median of 0.089. We also examined the consistency of the Charnley self-categorisation. Patients undergoing 14 109 procedures (40%) considered themselves Charnley A or B, and those undergoing 9658 procedures (28%) considered themselves Charnley C, both pre-operatively and one year after surgery. However, patients involved in 11 182 procedures (32%) were inconsistent, shifting either from A or B to C (5275 procedures, 15%) or vice versa (5907 procedures, 17%).
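The consistency analysis groups A and B together and classifies each procedure by whether its pre- and post-operative self-categorisations fall on the same side of the A/B versus C divide. A minimal sketch, assuming categories are stored as the letters "A", "B" and "C"; the function names are hypothetical and the example pairs are invented.

```python
from collections import Counter


def charnley_transition(pre: str, post: str) -> str:
    """Classify a pair of self-assigned categories (A and B are grouped)."""
    pre_c, post_c = pre == "C", post == "C"
    if pre_c and post_c:
        return "consistent C"
    if not pre_c and not post_c:
        return "consistent A/B"
    return "A/B to C" if post_c else "C to A/B"


def transition_shares(pairs):
    """Percentage of procedures in each transition class."""
    counts = Counter(charnley_transition(pre, post) for pre, post in pairs)
    return {label: 100 * n / len(pairs) for label, n in counts.items()}


print(transition_shares([("A", "A"), ("B", "C"), ("C", "C"), ("C", "B")]))
```

Note that a shift from A to B counts as consistent under this grouping, which matches the way the paragraph reports the figures.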
Pre-operatively, patients generally reported a low EQ-5D index and considerable pain (Table III). One year post-operatively there were significant improvements: the EQ-5D index and EQ-VAS increased and pain was reduced (all p < 0.001, Wilcoxon signed-rank test). Females and younger patients (< 60 years) had a lower mean EQ-5D index pre-operatively but a greater mean gain at the one-year follow-up than males and older patients (both p < 0.001, Mann-Whitney U test). The mean satisfaction VAS score was 16.8 (95% CI 16.4 to 17.2), and patients in 30 988 procedures (88.6%) reported a score ≤ 40 (satisfied). Males were slightly (mean difference 2.7) but significantly more satisfied (p < 0.001, Mann-Whitney U test).
Compared with the EQ-5D index of the general population (expected EQ-5D index), a lower index was reported pre-operatively in 30 030 procedures (86%). One year after surgery, an EQ-5D index higher than expected was reported after 23 352 procedures (67%). However, the youngest age groups (< 44 years) generally did not reach the expected EQ-5D index (p < 0.001, Mann-Whitney U test). In Table IV the reported EQ-5D indices are compared with the expected values for different age groups and genders.
Procedures in patients with no other primary THR, either earlier or later, registered between 1 January 1992 and 31 December 2009 were assigned unilateral status. Bilateral status was assigned to all primary procedures in patients who at any time had a THR of the contralateral hip, and each bilateral procedure was further classified as the first or the second hip. Results are presented in Table V. The mean difference in EQ-5D index at the one-year follow-up was significantly higher for the first hip in bilateral cases than for both the second hip and unilateral THRs (p < 0.001, Mann-Whitney U test).
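The laterality classification can be sketched as a grouping of primary procedures by patient. This is an illustration under assumptions: the input is taken to be the full registry extract of primary THRs for the study period with at most one primary per hip, the names are hypothetical, and ordering one-stage (same-day) bilateral operations by side label is an arbitrary tie-break that the text does not specify.

```python
from collections import defaultdict
from datetime import date
from typing import NamedTuple


class PrimaryTHR(NamedTuple):
    patient_id: str
    side: str       # "L" or "R"
    op_date: date


def laterality_status(procedures):
    """Map each primary THR to 'unilateral', 'bilateral_first' or 'bilateral_second'."""
    by_patient = defaultdict(list)
    for proc in procedures:
        by_patient[proc.patient_id].append(proc)
    status = {}
    for procs in by_patient.values():
        if len({p.side for p in procs}) < 2:
            # No contralateral primary THR in the extract.
            for p in procs:
                status[p] = "unilateral"
            continue
        # Hypothetical tie-break: same-day (one-stage) operations ordered by side.
        ordered = sorted(procs, key=lambda p: (p.op_date, p.side))
        status[ordered[0]] = "bilateral_first"
        for p in ordered[1:]:
            status[p] = "bilateral_second"
    return status
```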
The influence of gender, age, diagnosis and Charnley category on the changes in EQ-5D index, EQ-VAS and pain, and on absolute satisfaction, was studied using multivariable regression analyses. The multivariable correlation coefficients were generally low, meaning that the variables included in the model predict the outcomes only to a small extent. Male gender was associated with a lower gain in EQ-5D index and less improvement in both pain VAS and EQ-VAS, but also with greater satisfaction. Charnley category was a strong predictor: Charnley C was associated with a smaller increase in EQ-5D index, less pain reduction, less improvement in EQ-VAS and less satisfaction. Older age was a negative predictor for all outcomes. Diagnosis was not a significant predictor of the change in EQ-VAS. For the change in EQ-5D index, avascular necrosis was associated with a slightly greater improvement but, in contrast, predicted less reduction of pain, while inflammatory hip disease predicted a greater improvement in EQ-5D index, more reduction of pain and greater satisfaction.
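The structure of these models (a change score as the dependent variable, regressed on patient characteristics) can be illustrated with ordinary least squares solved through the normal equations. This stdlib-only sketch is not the SPSS analysis: the names are hypothetical, diagnosis dummies are omitted for brevity, entering age as a continuous linear term is our assumption, and the example data are invented and noise-free so that the fit simply recovers the generating coefficients.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients [intercept, b1, ...]; rows exclude the intercept."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(xi[i] * xi[j] for xi in X) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Invented example: columns are male (0/1), Charnley C (0/1) and age in years;
# y is a change in EQ-5D index generated from known coefficients.
rows = [(0, 0, 60), (1, 0, 65), (0, 1, 70), (1, 1, 75), (0, 0, 80), (1, 0, 55), (0, 1, 62)]
true_coef = [0.9, -0.03, -0.08, -0.006]
y = [true_coef[0] + true_coef[1] * m + true_coef[2] * c + true_coef[3] * a
     for m, c, a in rows]
print(ols(rows, y))
```

With real, noisy data the same model yields low explained variance, which is consistent with the low multivariable correlation coefficients reported in the paragraph.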
Patients eligible for THR generally report low HRQoL and suffer from pain. At the one-year follow-up, HRQoL was restored to the level of an age- and gender-matched general population, with a significant reduction of pain. Younger patients reported a lower mean EQ-5D index pre-operatively but a greater mean gain at the one-year follow-up. However, the youngest patients generally did not reach the expected level of HRQoL. Our interpretation is that younger patients have more active lifestyles and are likely to be more hampered than older patients by the limitations of their hip disease, and subsequently of their artificial joint. Despite considerable gains in EQ-5D index after surgery, the failure of younger patients to reach the expected HRQoL suggests scope for improvement in the way these patients are managed. In order to preserve as much HRQoL as possible among younger patients, persevering with non-surgical treatment options is important. However, it might reasonably be argued that they should have surgery at an earlier stage, although the risk of revision complicates this analysis.
Patients with bilateral THR have a lower mean EQ-5D index for the first hip compared with the second hip both pre- and post-operatively, but the outcome after the second hip does not differ from that of those who thus far have only had unilateral surgery. From this study we are unable to draw any further conclusions about THR in bilateral hip disease but leave this for future investigations.
In all, 76% of the patients improved in EQ-5D index by more than 0.074 units, reportedly a minimum clinically important difference.18 Using this definition, only 5% reported a clinically significant reduction in EQ-5D index, while 19% reported an unchanged HRQoL. The minimum clinically important difference for pain measured with a VAS ranges from 9 to 15 points.19,20 More than 90% of respondents reported an improvement exceeding 15 points at the one-year follow-up.
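Classifying each procedure against a minimum clinically important difference (MCID) is a threshold rule on the change score. In the sketch below, the 0.074 EQ-5D threshold and the 15-point pain threshold are taken from the paragraph; applying the same threshold symmetrically to define a clinically relevant deterioration is our assumption, the function names are hypothetical, and the example pairs are invented.

```python
from collections import Counter

MCID_EQ5D = 0.074   # EQ-5D index threshold cited in the text
MCID_PAIN_VAS = 15  # upper end of the 9 to 15 point range for pain VAS


def classify_change(pre, post, mcid, higher_is_better=True):
    """'improved', 'unchanged' or 'worsened' relative to a symmetric MCID."""
    gain = (post - pre) if higher_is_better else (pre - post)
    if gain > mcid:
        return "improved"
    if gain < -mcid:
        return "worsened"
    return "unchanged"


def change_shares(pairs, mcid, higher_is_better=True):
    """Percentage of (pre, post) pairs in each change class."""
    counts = Counter(classify_change(pre, post, mcid, higher_is_better)
                     for pre, post in pairs)
    return {k: 100 * counts[k] / len(pairs)
            for k in ("improved", "unchanged", "worsened")}
```

For pain, lower scores are better, so `classify_change(70, 20, MCID_PAIN_VAS, higher_is_better=False)` returns "improved".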
As described by others,21,22 this population had a bimodal distribution of EQ-5D indices pre-operatively and a trimodal distribution post-operatively. This reflects the design of the scoring algorithm rather than a true separate grouping of the population: the algorithm gives problems in the pain dimension relatively greater weight in the index than problems in other dimensions. The pre-operative inconsistency between the preference-based EQ-5D index and the patient-centred EQ-VAS, which was absent at follow-up, is interesting. It may be an effect of response shift, meaning a change in internal standards or in the perception of HRQoL during the course of disease. This has previously been described by Ostendorf et al21 for patients eligible for THR. Moreover, the pronounced post-operative ceiling effect, with 35% of the THR population reporting the best possible health state, limits the possibilities of analysing the outcomes. For this reason the Registry has decided to test the EQ-5D-5L version, which has five levels of severity instead of three, against the EQ-5D-3L.23
The VASs used in the PROMs programme are not orthodox; they have been modified for two reasons. First, the scales are used in different settings (pen-and-paper forms, touch-screen and internet applications) and therefore cannot depend on a line of a specific length. Second, older patients tend to have difficulty understanding the traditional VAS.24 The modified VASs have been tested internally and showed adequate validity and reliability.
There are probably several reasons for inconsistencies in Charnley self-categorisation. Patients shifting to C may have become aware of, or may have developed, medical conditions impairing their mobility between the two assessments. Improvements in general health following THR could explain why some patients shifted from C to A or B. This could be due to surgery but also to other healthcare interventions. Alternatively, the normal variability of the medical condition that led the patient to report category C pre-operatively could explain the shift. Patients’ understandable misattribution of the origin of symptoms, e.g. referred knee pain or the spine-hip dilemma, is another conceivable explanation. The proportion of patients with Charnley B (bilateral hip disease) was lower than expected. The results suggest that patients forget a well-functioning replacement hip and hence categorise themselves as Charnley A. One could argue that the non-original self-assessment design may be one reason for the variability. However, inconsistencies over time have also been reported for the interviewer-based method of classifying patients.25 Despite these limitations of the instrument, Charnley categorisation is still one of the strongest predictors of patient-reported outcomes,26 and earlier recommendations about its use are confirmed.13
To our knowledge, there are few other examples of prospective, nationwide collection of PROMs for a common orthopaedic intervention. The New Zealand National Joint Register27 collects the Oxford hip score28 in a follow-up programme, but not pre-operatively. They report similar proportions of patients who do not respond satisfactorily to surgery. The recently started NHS PROMs programme covering hip and knee replacements includes the EQ-5D and the Oxford hip or knee scores, but the programme is in its infancy. Combining short disease-specific and generic instruments seems ideal. The most frequently used generic instrument in clinical trials is the 36-item Short-form health survey (SF-36).29 Because of the large number of questions it is not suitable for routine follow-up, and the SF-36 health utility index cannot be used for health-economic analysis in the way the EQ-5D index can. From a scientific point of view there is limited rationale for a full-scale nationwide PROMs programme, since random prospective samples would be statistically sufficient. However, the overall aims of the PROMs programme are 1) to supplement the register with patient-reported outcomes, 2) to increase the sensitivity of the register analyses in order to identify failures from the patients’ subjective viewpoint, 3) to facilitate local monitoring and improvement work, 4) to create a methodologically adequate health-economic instrument for cost-utility analysis and resource allocation, and 5) to reduce the number of routine consultations after THR. The last aim is achieved through the cover letter enclosed with the follow-up protocols, in which patients are asked to contact the clinician if they have hip problems. The Registry also recommends that clinics perform radiological follow-up at six and ten years, and provides some logistical support for this.
This study illustrates that implant survival as a single outcome parameter is too blunt. One year after primary THR, implant survival is close to 100% in most studies, whereas patient satisfaction in the Swedish THR population is barely 90%. Even though THR has outstanding overall results compared with many other operations, many patients do not consider that they have benefited, as was also demonstrated by Judge et al30 in a European multicentre cohort study. We must continue to investigate the reasons for these failures. Combining technical outcome parameters with generic and disease-specific measures, along with data on direct and indirect costs, will provide a more nuanced and complete assessment of the operation.
The completeness analysis was based on 73 of 78 clinics; we excluded units that were just starting to implement the follow-up programme during 2008. All units now collect PROMs. Completeness rates in this prospective nationwide programme are high, and the minor demographic differences among non-respondents are probably not of decisive importance for the true outcomes. The higher response rate one year post-operatively, when the questionnaire was sent by mail, reflects the varying local logistical difficulties of presenting the questionnaire to all patients before surgery.
There is no consensus on the indications or timing for surgery. Large variations between, and even within, countries in the severity of disease at the time the decision to operate is made have been reported.31 Collecting both disease-specific and generic data before and after the operation could help the profession to make the indications more precise and uniform at a national level. We cannot tell from these results whether surgery is being done at the right time or on the right patients. The high proportion of patients with rather moderate symptoms pre-operatively raises the question of whether these patients underwent surgery prematurely. Perhaps, for some patients, non-surgical options would have been preferable. On the other hand, the results suggest that the timing of surgery is mostly correct, since patients’ HRQoL is generally restored to above the expected level. A challenge for the healthcare system is to restore the function and HRQoL lost due to hip disease, but also to minimise that loss before an eventual joint replacement.
Females have worse pre-operative HRQoL and report more pain, but have a greater improvement in HRQoL and a greater reduction of pain than males. Although this gender difference is small, females exceed the expected EQ-5D index by a significantly larger margin than males. The difference in mean age at surgery is more than two years. The results indicate that females have surgery later in the disease course and that their selection with regard to comorbidity is stronger than among males. The fact that females report lower HRQoL than males across countries and age groups, both in the general population32,33 and in the THR population, stresses the importance of conducting prospective rather than cross-sectional studies.
The strength of this study lies in the large number of patients included, which avoids the problems of performance bias that can occur if patients are recruited from selected surgeons and centres. However, this large scale PROMs programme requires extensive logistical support. The concentrated set of PROM variables is a prerequisite for maintaining high response rates but it may limit the range and depth of the analyses.
Nationwide implementation of a PROMs programme requires a structured organisation and effective and reliable methods for data capture. Patients’ response rates to the Swedish Hip Arthroplasty Registry are good. The overall PROMs of THR surgery in Sweden are satisfactory, with a mean increase in EQ-5D index of 0.36 at the one-year follow-up. However, an important minority do not respond satisfactorily to surgery and the reasons for these short-term failures need to be investigated further. Charnley category C, male gender and higher age were associated with less improvement in HRQoL. The PROMs programme allows continuous monitoring and improvements in outcomes locally as well as nationally.
A table showing multivariable linear regression analyses for different outcome measures (EQ-5D, EQ-VAS, pain VAS and satisfaction VAS) with respect to age, gender, Charnley category and diagnosis is available with the electronic version of this article on our website at www.jbjs.org.uk
No benefits in any form have been received or will be received from a commercial party related directly or indirectly to the subject of this article.
- Received August 26, 2010.
- Accepted March 3, 2011.
- © 2011 British Editorial Society of Bone and Joint Surgery