We have evaluated the quality of life and functional outcome after unilateral primary total hip replacement (THR). Between 5 January 1998 and 31 July 2000, we recruited a consecutive series of 627 patients undergoing this procedure and investigated them prospectively. Each was assessed before operation and reviewed after six months, 18 months, three years and five years. The Short Form-36 Health Survey (SF-36) and Harris Hip scores were evaluated at each appointment.
All dimensions of the SF-36, except for mental health and general health perception, improved significantly after operation, and this improvement was maintained throughout the follow-up. The greatest improvement was seen at the six-month assessment. On average, women reported lower SF-36 scores pre-operatively, but the gender difference did not persist post-operatively. The Harris Hip scores improved significantly after operation, reaching a plateau after 18 months. The improved quality of life was sustained five years after THR.
Approximately 4500 primary hip replacements are performed in the National Health Service in Scotland every year,1 whereas in England over 30 000 procedures are carried out with an overall cost of £140 million a year.2 National guidance has been published on the selection of prostheses for primary total hip replacement (THR) using long-term viability as the determinant.3 However, the assessment of the functional outcome and quality of life is another important consideration in evaluating whether the operation is cost-effective.
The Short Form-36 Health Survey (SF-36) is a 36-item questionnaire which has been used extensively and validated as a measurement of general health status.4 It generates scores in eight dimensions, namely physical functioning, role limitation due to physical problems, role limitation due to emotional problems, social functioning, mental health, energy/vitality, bodily pain, and general health perception. It also contains a single item asking about perceived changes in health over the past year.
We have evaluated the effects of hip replacement on quality of life using this survey. Previous such assessments have extended only up to two years post-operatively, and the sustainability of these outcomes in the longer term remains unknown. We have therefore examined changes in the SF-36 over a period of five years and assessed the effects of age and gender on the scores. The Harris Hip Score (HHS)5 was used as a hip-specific measure of functional outcome.
Patients and Methods
Between 5 January 1998 and 31 July 2000, 627 consecutive patients underwent unilateral THR in our hospital. The primary diagnosis was osteoarthritis (OA) in 580 (92.5%), rheumatoid arthritis in 18 (2.9%), post-traumatic OA in nine (1.4%), avascular necrosis in eight (1.3%), developmental dysplasia in two (0.3%), and was not recorded in ten (1.6%). A total of 389 patients (62%) were women and 238 (38%) were men. The mean age was 68 years (19 to 96). The women, who had a mean age of 69.2 years (19 to 89), were significantly older than the men, who had a mean age of 66.8 years (24 to 96) (p = 0.003). The principal indication for surgery was pain. The operation was performed by, or under the direct supervision of, one of six experienced orthopaedic surgeons who were not authors. Each used the approach familiar to them and inserted a fully cemented prosthesis. The SF-36 questionnaire (UK version)6 and a HHS evaluation5 were carried out pre-operatively and at six months, 18 months, three years and five years post-operatively. The assessments and data collection were undertaken by a dedicated research audit nurse, who was not an author.
Statistical analysis was performed using SPSS version 13.0 (SPSS Inc., Chicago, Illinois). Scoring was carried out according to the coding and formulae detailed in the United Kingdom SF-36 analysis and interpretation manual.6 The transformed scores range between 0 (worst health, severe pain) and 100 (best health, no pain). A two-sample t-test was employed to compare means, and paired-samples t-tests were used, where appropriate, for matched responses. The use of parametric methods was justified by the large sample sizes. Repeated measures analysis of variance (ANOVA), adjusted for age and gender, was used to test for overall differences between the five time points for each scale. Where this gave a significant time effect, Bonferroni-corrected paired t-tests7 were used to compare the scores at consecutive time points and between the pre-operative score and each subsequent time point. The relationship between age and the SF-36 scores was investigated using Pearson’s correlation coefficient. Multiple regression was employed to assess the mean difference in score between the genders, having adjusted for the effect of age. We considered p < 0.05 to be statistically significant.
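Two of the steps above reduce to simple arithmetic. The manual's 0 to 100 rescaling is a linear transformation of each raw scale score: (raw score minus lowest possible raw score) divided by the possible raw score range, multiplied by 100. A Bonferroni correction multiplies each p-value by the number of comparisons and caps the result at 1. The sketch below is illustrative only: the function names are hypothetical, the physical functioning example assumes the usual coding of ten items scored 1 to 3 (raw range 10 to 30), and the paper does not state how many comparisons per scale entered the correction, so the number of tests is simply taken as the length of the list supplied.

```python
def transform_scale(raw, lowest, score_range):
    """Linearly rescale a raw SF-36 scale score to 0 (worst) .. 100 (best)."""
    return 100 * (raw - lowest) / score_range


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Assumed physical functioning coding: ten items scored 1-3, so raw 10..30.
print(transform_scale(22, lowest=10, score_range=20))  # 60.0

# Hypothetical p-values from three paired t-tests on one scale.
print(bonferroni([0.01, 0.2, 0.5]))
```

An adjusted p-value below 0.05 then corresponds to a raw p-value below 0.05/m, which is the same decision rule expressed the other way round.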
Figure 1 summarises the individual dimensions of the SF-36 and the HHS throughout the period of study. By the end of the five-year follow-up, 69 (11.0%) patients had died and 15 (2.4%) had undergone a revision procedure.
Repeated-measures analysis gave a time effect significant at p < 0.001 for all dimensions except mental health (p = 0.002) and general health perception (p = 0.63). The mean scores of all the dimensions except these two improved significantly following operation and remained significantly above the baseline scores throughout the entire follow-up period (p < 0.001). The greatest change occurred between the pre-operative assessment and the six-month review. None of the changes between the six- and 18-month reviews was statistically significant. Between 18 months and three years there was a significant decrease in the mean energy/vitality, bodily pain and changes in health scores (p = 0.014, p = 0.014 and p < 0.001, respectively). Between the three- and five-year reviews there was a significant drop in the mean scores for physical functioning, social functioning, bodily pain and changes in health (p < 0.001, p = 0.042, p = 0.042 and p = 0.021, respectively).
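The "time effect" reported above comes from partitioning the variation in scores into a between-time component, a between-patient component and a residual. The sketch below shows that partition for a one-way repeated-measures ANOVA on complete cases only. It is a simplification, not the authors' model: the paper's analysis was adjusted for age and gender, whereas this version has no covariates and applies no sphericity correction, and the function name and toy data are hypothetical.

```python
def rm_anova_f(data):
    """One-way repeated-measures ANOVA F statistic for the time effect.

    data: one list per patient, each holding one score per time point
    (complete cases only, all lists the same length).
    Returns (F, df_time, df_error).
    """
    n = len(data)       # number of patients
    k = len(data[0])    # number of time points
    grand = sum(sum(row) for row in data) / (n * k)
    time_means = [sum(row[t] for row in data) / n for t in range(k)]
    subject_means = [sum(row) / k for row in data]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_time = n * sum((m - grand) ** 2 for m in time_means)
    ss_subject = k * sum((m - grand) ** 2 for m in subject_means)
    # Residual variation after removing time and patient effects.
    ss_error = ss_total - ss_time - ss_subject

    df_time = k - 1
    df_error = (n - 1) * (k - 1)
    f = (ss_time / df_time) / (ss_error / df_error)
    return f, df_time, df_error


# Toy example: three patients, three time points.
print(rm_anova_f([[1, 3, 5], [2, 3, 7], [3, 6, 6]]))  # (12.0, 2, 4)
```

Removing the between-patient variation is what gives a repeated-measures design its power: each patient acts as their own control, so stable differences between patients do not inflate the error term.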
Effect of age (Table I).
The correlation between age and pre-operative SF-36 scores was inconsistent. Pearson’s correlation coefficients ranged between −0.153 and +0.117, suggesting only a weak correlation, even though some of these correlation coefficients were statistically significant.
Effect of gender.
Table II compares the mean SF-36 scores of men and women pre-operatively and at each review. As the women were significantly older than the men, multiple regression was performed to adjust for the effect of age. Before operation, women had significantly lower mean scores in all dimensions except general health perception. However, following THR there was no statistically significant difference between the genders in mean scores, apart from social functioning at six months and role limitation due to emotional problems at 18 months.
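Adjusting a gender comparison for age amounts to fitting score = b0 + b1 × female + b2 × age and reading off b1, the mean difference between women and men at the same age. The sketch below does this with an ordinary least-squares fit in NumPy on a small, entirely hypothetical data set in which women are slightly older on average, as in this cohort. It illustrates the technique only; the paper's regressions were run in SPSS and the function name is an assumption.

```python
import numpy as np


def age_adjusted_gender_difference(score, female, age):
    """Coefficient on the female indicator in the model score ~ 1 + female + age."""
    score = np.asarray(score, dtype=float)
    X = np.column_stack([
        np.ones_like(score),               # intercept
        np.asarray(female, dtype=float),   # 1 = woman, 0 = man
        np.asarray(age, dtype=float),      # age in years
    ])
    coef, *_ = np.linalg.lstsq(X, score, rcond=None)
    return coef[1]


# Hypothetical data: score falls 0.2 points per year of age, women score 5 points lower.
age = np.array([60, 65, 70, 75, 62, 68, 72, 80], dtype=float)
female = np.array([0, 0, 0, 0, 1, 1, 1, 1])
score = 50 - 5 * female - 0.2 * age

unadjusted = score[female == 1].mean() - score[female == 0].mean()
adjusted = age_adjusted_gender_difference(score, female, age)
print(round(unadjusted, 2), round(adjusted, 2))  # -5.6 -5.0
```

Because the women in this toy sample are three years older on average, the crude difference (-5.6) overstates the gender effect; the regression removes the 0.6-point share that is attributable to age.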
Harris hip score.
Figure 2 shows the trend of the HHS over the study period. As with the SF-36, the greatest change occurred between the pre-operative assessment and the review at six months (p < 0.001), with a time effect significant at p < 0.001 in the repeated-measures analysis. Between six and 18 months there was a further small but significant improvement (p < 0.001), after which the scores reached a plateau.
The curve of the HHS was similar to that for the physical functioning score, although the latter was found to deteriorate after three years. The female patients consistently achieved a lower mean HHS than the male patients. Using multiple regression, adjusted for age, the differences were statistically significant at the pre-operative, six-month and 18-month assessments (p < 0.001, p = 0.009, p = 0.048, respectively).
This prospective study confirms that THR significantly improves the quality of life in patients with OA of the hip.8–12 Our cohort compares favourably with those of other studies, which had smaller samples and shorter follow-up and included patients undergoing knee as well as hip replacement. Throughout the period of the study a dedicated audit nurse was responsible for the administration and collection of the questionnaires, ensuring the consistency and completeness of the database.
This paper represents the experience of a group of surgeons using a variety of components and two types of cemented prosthesis. As the focus was on the overall satisfaction of the patient with their THR, no subgroup analysis was undertaken.
At the end of five years, 13.4% of our patients had either died or undergone revision arthroplasty. Only 60% to 70% of the dimension scores were valid, reflecting the inherent difficulty of obtaining complete data from a large group of patients in a prospective study. Patients did not always complete the questionnaire fully, resulting in missing scores. The completion rate might be improved by using the shorter version, the SF-12, which uses 12 of the 36 items of the SF-36. However, details about health status and outcomes may be lost with a simpler questionnaire.
Our results confirm that the benefits of THR are sustained, even though certain SF-36 dimensions declined after 18 months. Although this decline was statistically significant, its clinical significance is debatable given the marginal change in the mean dimension scores. Nevertheless, such a decline has not been observed before, as previous studies have had a maximum follow-up of two years.8–12 Continued follow-up of our patients with the SF-36 will show whether this trend persists. However, it is important to appreciate the potential effects of ageing on SF-36 scores. Cross-sectional population studies have demonstrated an inverse relationship between these scores and age,13 and this may partially account for the deterioration observed in our longitudinal study.
There was a significant improvement in the mean changes in health score post-operatively, but this also declined after 18 months, although it remained higher than the pre-operative level. The changes in health score is based on a single item in which patients rate their current health against that in the preceding year. In contrast, general health perception is based on five items dealing with the patient’s rating of their health, its comparison with that of acquaintances, and the expectation of their future health. McGuigan et al8 followed up 114 patients undergoing total hip and knee replacement for two years and observed a significant increase in their changes in health scores but none in their general health perception scores. They suggested that patients perceive a positive change in their health after total joint replacement, yet still rate their health as poorer than that of their acquaintances and expect it to deteriorate with age.8 This highlights the importance of understanding the construct and content of any health measure when interpreting its results.
Instead of using arbitrary cut-off points to define elderly patients, we assessed the effect of age on the SF-36 scores using Pearson’s correlation coefficient, given the numerical nature of the variables. We identified a weak but, in some instances, statistically significant correlation between the individual SF-36 dimension scores and age. When the sample size is large, as in this study, a correlation coefficient is likely to be statistically significant even when its value is close to zero. Age has not been found to be a significant determinant of the SF-36 scores following THR.12,14 Kiebzak et al12 suggested a possible floor effect, arguing that the scores were universally low in all patients with end-stage OA, so that any further decline in function attributable to age may not be detected by the SF-36.
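The point about large samples can be checked numerically. The sketch below approximates the two-sided p-value for a Pearson correlation using Fisher's z-transformation, z = atanh(r) × √(n − 3), which is close to, but not identical with, the exact t-based test that SPSS reports. The function name is hypothetical, and n = 627 is only an upper bound here, since 60% to 70% of the dimension scores were valid.

```python
import math


def correlation_p_value(r, n):
    """Approximate two-sided p-value for H0: rho = 0 via Fisher's z-transformation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-sided tail probability of a standard normal deviate.
    return math.erfc(abs(z) / math.sqrt(2))


r = 0.117  # the largest positive coefficient reported for age
print(correlation_p_value(r, 627))  # roughly 0.003
print(correlation_p_value(r, 100))  # roughly 0.25
print(r ** 2)                       # share of variance explained, about 1.4%
```

With several hundred patients a coefficient of 0.117 clears the 0.05 threshold comfortably, yet age would explain only about 1.4% of the variation in the score, which is why a significant p-value alone says little about the strength of the relationship.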
Men tended to have higher SF-36 scores, as noted in several other studies, although the dimensions concerned varied between studies.8,9,12,15 However, in our cohort the differences between the genders were notable only pre-operatively. Following THR, of the 36 comparisons performed across the four post-operative reviews, only the differences in social functioning at six months and role limitation due to emotional problems at 18 months reached statistical significance, and these could well be a consequence of multiple statistical testing. Total hip replacement appeared to have eliminated the pre-operative differences in SF-36 scores between men and women, implying that women gained more in SF-36 scores after THR. A gender difference was also noted in the HHS, except that it remained statistically significant up to 18 months post-operatively.
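The multiple-testing argument can be made concrete with a short calculation for the 36 post-operative gender comparisons. If every null hypothesis were true and the tests were independent, the expected number of "significant" results at α = 0.05 would be 0.05 × 36 = 1.8, and the chance of at least one would be 1 − 0.95^36. The sketch below assumes independence, which does not strictly hold for correlated SF-36 scales measured repeatedly in the same patients, so the figures are indicative only; the function name is hypothetical.

```python
def familywise_error(alpha, m):
    """Probability of at least one false positive among m independent tests."""
    return 1 - (1 - alpha) ** m


alpha, m = 0.05, 36
print(familywise_error(alpha, m))  # about 0.84
print(alpha * m)                   # expected false positives under the null: 1.8
print(alpha / m)                   # Bonferroni per-test threshold, about 0.0014
```

Two significant results out of 36 is therefore close to what chance alone would be expected to produce.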
There were marked improvements in the HHS following THR. The greatest change was seen by six months, but patients continued to improve until 18 months. The level achieved was maintained at the five-year review, in contrast to the decline in the SF-36 scores. Lieberman et al15 examined the relationship between the HHS and the SF-36 scores in 144 patients following primary THR and found that the correlation was highest with the physical component summary score.
Apart from the HHS, there are a number of functional outcome scoring systems for the hip, including the Oxford hip score,16 the Merle d’Aubigné score17 and the Hospital for Special Surgery hip rating.18 The contents of these systems overlap to a large extent. Using several systems was impractical, and limited resources restricted us to a single hip-specific measure. The SF-36 is a general health survey and was not developed specifically to evaluate patients with OA. However, it allows the evaluation of important aspects of quality of life that may not be addressed adequately by hip-specific measures such as the HHS.15 The SF-36 has been criticised for its limitations when applied to individual patients,8 but its extensive use in outcome analysis and its proven validity and reliability make it useful for comparisons between different conditions.
We are grateful to Lorraine McComiskie for supervising the follow-up of our patients, Anne Simpson for the administration of questionnaires and data collection, and Rob Elton for statistical assistance. We acknowledge the contribution of the following consultant orthopaedic surgeons to the development of our local hip database: I. J. Brenkel, T. I. S. Brown, R. A. Buxton, T. W. Dougall, R. C. Marks and I. Weir.
No benefits in any form have been received or will be received from a commercial party related directly or indirectly to the subject of this article.
- Received August 1, 2006.
- Accepted February 12, 2007.
- © 2007 British Editorial Society of Bone and Joint Surgery