We sought to establish the incidence of joint failure secondary to adverse reaction to metal debris (ARMD) following metal-on-metal hip resurfacing in a large, three surgeon, multicentre study involving 4226 hips with a follow-up of 10 to 142 months. Three implants were studied: the Articular Surface Replacement; the Birmingham Hip Resurfacing; and the Conserve Plus. Retrieved implants underwent analysis using a co-ordinate measuring machine to determine volumetric wear. There were 58 failures associated with ARMD. The median chromium and cobalt concentrations in the failed group were significantly higher than in the control group (p < 0.001). Survival analysis showed a failure rate in the patients with Articular Surface Replacement of 9.8% at five years, compared with < 1% at five years for the Conserve Plus and 1.5% at ten years for the Birmingham Hip Resurfacing. Two ARMD patients had relatively low wear of the retrieved components. Increased wear from the metal-on-metal bearing surface was associated with an increased rate of failure secondary to ARMD. However, the extent of tissue destruction at revision surgery did not appear to be dose-related to the volumetric wear.
Metal-on-metal (MoM) hip resurfacing devices were re-introduced following a number of design changes.1,2 The encouraging early results of the Birmingham Hip Resurfacing (BHR; Smith and Nephew, Warwick, United Kingdom) designers’ series3 led to a rapid increase in the number of surgeons carrying out the procedure. A number of manufacturers subsequently developed their own implants, resulting in the large number of resurfacing systems in regular use in Europe and the United States.4 Each commercially available device has a different combination of modifications of the original designs.5 The central design features which are thought to influence wear are: the use of ‘as cast’ versus forged material; varying heat treatments of the femoral and acetabular components; the difference in diameter between mated components (the diametral clearance); the arc of acetabular cover, and the angle of function of the femoral component.5–10
With the technology currently available, the MoM bearing surface remains integral to the design of hip resurfacing. The main drawback of a metal articulation is the production of metal debris due to the combined effect of mechanical and corrosive wear. In fact, the popularity of the procedure has waned in the last two years following a number of reports of adverse reactions in the peri-prosthetic tissues of resurfaced hips.11–15
We have previously highlighted the disparity in blood chromium (Cr) and cobalt (Co) levels between patients receiving different hip resurfacings16 and have identified a relationship between increased wear of the articular surface and the incidence of soft-tissue lesions.17 The aim of this study was to involve other centres and thereby increase the number of hip resurfacing procedures available for analysis. This would allow us to gain more understanding of the incidence of adverse reactions to metal debris (ARMD) in three commonly used hip resurfacing arthroplasty designs, to investigate further the clinical effects of increased wear debris from MoM hip resurfacings, and to determine whether the extent of tissue destruction in ARMD is related to volumetric wear from the bearing surfaces. ARMD is an umbrella term17 used to describe joint failure secondary to wear of the bearing surface or corrosion debris, in the absence of any other obvious explanation. It encompasses metallosis, pseudotumour and aseptic lymphocyte-dominated vasculitis associated lesion (ALVAL).18
Patients and Methods
Three implant designs were used in the study: A: the Articular Surface Replacement (ASR; DePuy, Leeds, United Kingdom); B: BHR; C: Conserve Plus (C+; Wright Medical, Memphis, Tennessee). The important differences between the three devices can be seen in Table I.
All the patients of three experienced hip resurfacing surgeons (AVFN, KDS, JPH) who received a design A, B or C resurfacing prosthesis between January 1998 and January 2009 were involved in this study. A total of 4226 hips in 3888 patients were studied. Details of the patients are shown in Tables II, III and IV. Surgeon 1 (AVFN) is based in the United Kingdom (University Hospital of North Tees, Stockton, United Kingdom). Between 2002 and 2004 he used design B for all resurfacing procedures. From 2004 he used design A exclusively. This cohort of patients has been described previously.17,19 From 2007, patients at this centre have undergone routine whole blood and serum metal ion testing.20 Surgeon 2 (KDS) is based in Belgium (ANCA Clinic, Ghent, Belgium). He initially used design B but has subsequently also used designs A and C. Routine serum metal ions analysis is undertaken post-operatively. The early results of the patients receiving design B at this centre have been published previously.21 Surgeon 3 (JPH) is based in the United Kingdom (Freeman Hospital, Newcastle-upon-Tyne, United Kingdom) and uses design B exclusively. All surgeons used the posterior surgical approach.
At all centres, outcomes were assessed at six months, at one year and annually thereafter using the Harris hip score22 and the UCLA activity score. Any patient whose hip was revised or was listed for revision secondary to ARMD at the time of writing was recorded. At centres 1 and 2, standing radiographs were obtained at the time of blood sampling and analysed using EBRA software (University of Innsbruck, Innsbruck, Austria) to measure the inclination and anteversion of the acetabular component.23,24 This was carried out by one of the authors (DJL). At centre 3, the first 100 well-centred digitised standing radiographs were analysed in order to provide an assessment of the orientation of the acetabular component, whose position was allocated a ‘zone’ as shown in Figure 1. Zone 1 corresponds to a safe zone derived from our previous work based upon an analysis of metal ion results and explants.25
The diagnosis of ARMD was made by the consultant in charge and was based on clinical presentation, findings at revision and the histological appearance of capsular tissue taken at revision surgery. The appearance of the peri-prosthetic tissues at revision was graded as follows: 0, no soft-tissue necrosis; 1, small localised areas of tissue necrosis; 2, widespread tissue necrosis without obvious compromise of the stability of the implant; and 3, widespread tissue necrosis with compromised stability of the implant.
Serum metal ion analysis.
Blood samples for Cr and Co analysis were taken more than 12 months post-operatively to avoid the confounding effects of the running-in period.26 The methods of sampling and ion analysis at both centres have been described previously.20,27
Histopathological examination of tissues from revision procedures at centre 1.
The processing of tissue specimens has also been described previously.17 In our experience, the vast majority of tissues retrieved from ARMD patients exhibit two dominant, and often co-existent, cellular responses: histiocytic and lymphocytic. For the purpose of this paper, tissues were described as having a dominant ‘histiocytic’ response if there was a band of histiocytes > 1.5 mm in width or a dominant lymphocytic (ALVAL) response if there were multiple perivascular lymphocytic cuffs > 1.5 mm in size. Metal particulate load in the tissues was graded using a scale similar to Mirra’s classification28 in order to allow correlation with the rate of volumetric wear. We used a more reproducible grading system derived from the assessment of tissue iron in liver biopsies. It is based upon the ease of identification of particles and the magnification power used to identify them: 0, granules absent or barely discernable at × 400; 1, easily confirmed at × 400, barely discernable at × 250; 2, discrete granules resolved at × 100; 3, discrete granules resolved at × 25; 4, masses visible to the naked eye.
The average diameter of the lymphocytic cuff, the width of the histiocytic band and the integrity of the surface membrane were recorded for each patient and correlated to the rate of volumetric wear.17
The wear of all available femoral and acetabular explants from centres 1 and 3 was measured by a co-ordinate measuring machine using a scanning head (Legex 322; Mitutoyo, Halifax, United Kingdom) with a spatial resolution of < 1 μm in the area of measurement. Volumetric wear was calculated using a validated method.29 Volumetric wear rates were correlated with serum metal ion results using Spearman rank univariate analysis.
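As an illustration only, a Spearman rank correlation of the kind described above can be sketched in a few lines of Python. The function names and the five paired values are hypothetical and are not data from this study; tied values are given their average rank, which is an assumption about tie handling.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical paired observations (NOT study data):
# volumetric wear rate (mm3/year) and serum Co (ug/l) for five explants.
wear_rate = [0.5, 1.2, 3.4, 8.0, 15.0]
serum_co = [1.0, 2.5, 2.0, 20.0, 60.0]
rho = spearman_rho(wear_rate, serum_co)
```

Because only the ordering of the values is used, a few very high ion levels do not dominate the coefficient, which is the usual reason for preferring a rank method with skewed metal ion data.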
A Cox proportional hazards model was constructed using surgeon and implant design as qualitative variables and bearing diameter as a quantitative variable. Given the numbers involved in the study, a p-value of < 0.01 was considered to be statistically significant.
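For readers unfamiliar with the method, the sketch below fits a Cox model with a single binary covariate by Newton-Raphson maximisation of the partial likelihood. It is a simplified illustration and not the model used in this study, which included surgeon, implant design and bearing diameter. The function name, the follow-up times and the covariate coding are all hypothetical.

```python
import math


def cox_fit_single_covariate(times, events, x, max_iter=25, tol=1e-10):
    """Estimate the log hazard ratio for one covariate.

    times  : follow-up in months
    events : 1 if the hip failed at that time, 0 if censored
    x      : covariate value per hip (here 0 or 1)
    Ties are handled with the Breslow approximation.
    """
    beta = 0.0
    for _ in range(max_iter):
        grad, hess = 0.0, 0.0
        for t_i, e_i, x_i in zip(times, events, x):
            if not e_i:
                continue  # censored hips contribute only through risk sets
            # Risk set: every hip still under observation at time t_i.
            risk = [(math.exp(beta * x_j), x_j)
                    for t_j, x_j in zip(times, x) if t_j >= t_i]
            s0 = sum(w for w, _ in risk)
            s1 = sum(w * x_j for w, x_j in risk)
            s2 = sum(w * x_j * x_j for w, x_j in risk)
            grad += x_i - s1 / s0
            hess -= s2 / s0 - (s1 / s0) ** 2
        step = grad / hess
        beta -= step  # Newton-Raphson update
        if abs(step) < tol:
            break
    return beta


# Hypothetical data (NOT study data): x = 1 marks a design of interest.
months = [5, 8, 12, 20, 25, 30, 40, 50]
failed = [1, 1, 0, 1, 1, 0, 1, 0]
design = [1, 1, 0, 1, 0, 1, 0, 0]

beta_hat = cox_fit_single_covariate(months, failed, design)
hazard_ratio = math.exp(beta_hat)
```

A hazard ratio above 1 indicates earlier failure in the group coded 1. In practice a validated statistical package would be used, since this sketch provides no standard errors or p-values.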
Results
At the time of writing, there were 58 failures related to ARMD. The incidence of ARMD by centre was as follows.
Centre 1.
There was a clear difference in the failure rates between the design A and B patient groups. At a mean follow-up of 65 months (59 to 87), one design B patient had been revised. This was in contrast to the design A group, in which 24 hips had been revised and three more were awaiting revision, amounting to a failure rate of 6.3% at a mean follow-up of 37 months (10 to 67).
Centre 2.
Two patients with unilateral design A resurfacings had undergone revision, amounting to a rate of revision of 3.4% at a mean follow-up of 31 months (11 to 58). The failure rate secondary to ARMD was 1.2% (23 cases) in the design B group, at a mean follow-up of 68 months (10 to 142) and 0.42% (two cases) in the design C group at a mean follow-up of 37 months (10 to 82).
Centre 3.
Only design B was used at this centre, and three patients had been revised. This amounted to a failure rate of 0.45% at a mean follow-up of 65 months (10 to 130).
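The counts reported for the three centres can be tallied as a quick consistency check. The short Python sketch below assumes that the three preceding paragraphs describe centres 1, 2 and 3 in that order, and that the three hips awaiting revision at centre 1 count as failures. The variable names are illustrative.

```python
# ARMD failures by centre and implant design, as reported in the text.
armd_failures = {
    "centre 1": {"A": 24 + 3, "B": 1},  # 24 revised plus 3 awaiting revision
    "centre 2": {"A": 2, "B": 23, "C": 2},
    "centre 3": {"B": 3},
}

# Failures per centre.
per_centre = {centre: sum(designs.values())
              for centre, designs in armd_failures.items()}

# Failures per implant design, pooled across centres.
per_design = {}
for designs in armd_failures.values():
    for design, count in designs.items():
        per_design[design] = per_design.get(design, 0) + count

total_failures = sum(per_centre.values())
```

On these figures the total is 58, which agrees with the number of ARMD failures given in the abstract.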
The Cox proportional hazards model showed that patients receiving design A implants were at a significantly greater risk of ARMD failure than patients with designs B and C. Smaller implants were at greater risk of early failure but the surgeon performing the procedure did not significantly affect survival (Table V).
The most common presenting symptom was pain, located predominantly in the groin and occasionally radiating to the greater trochanter and down the thigh, and frequently associated with clicking and clunking sensations. One hip was asymptomatic but as the patient had experienced a severe ARMD with an ASR (DePuy) total hip replacement on one side she asked for the contralateral ASR to be revised. At revision, there were variable degrees and combinations of soft-tissue necrosis, joint effusions in 30, macroscopic metallosis in 18, component loosening in eight and masses in 17. In some cases the effusion was massive (> 200 ml) and had extended through the abductor musculature. The asymptomatic patient described above had severe soft-tissue destruction. Four patients presented with painless swellings in the lateral thigh and groin (two design B, two design A), one associated with femoral nerve symptoms. Revision surgery in these cases revealed psoas bursae containing caseous material. At centre 1, five male patients with design A implants were found to have increased levels of Cr and Co on routine screening. As they were asymptomatic they were simply observed. All five patients became symptomatic. Fractures of the femoral neck in association with gross macroscopic metallosis were found at revision. A psoas mass was also identified in one of these cases. Soft-tissue destruction was not extensive in these cases. Patients who were found to have a large effusion at revision surgery were revised significantly earlier than those found to have masses at revision (median time to revision (effusion) 21 months versus 52 months (masses), p < 0.001).
Table VI shows the significant differences in bearing size, acetabular component orientation and metal ion levels between the asymptomatic and ARMD cohorts. There was a clear trend towards increased risk of failure with decreasing size of femoral component in the three centres (Fig. 2). Figure 1 shows the relationship between the angle of inclination and anteversion of the acetabular component and failure. In two ARMD patients the acetabular components were within the safe zone for metal ion reduction (45° ± 5° inclination and 15° ± 5° anteversion).25 An overview of the metal ion results for each implant group can be seen in Figure 3. Median levels of Cr and Co in ARMD patients (n = 37 with pre-revision ion levels) were significantly higher than in the asymptomatic cohort, with the median Co concentration 20 times greater in the failed group. The two patients described above, with acetabular components in the safe zone, had ion levels comparable with the asymptomatic cohort (Co levels of 2.1 μg/l and 1.9 μg/l). There was no macroscopic metallosis in these cases, but histology of the tissue specimens showed a dominant ALVAL reaction. Both of these patients had grade 2 soft-tissue destruction at revision surgery.
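The safe-zone criterion quoted above (45° ± 5° inclination and 15° ± 5° anteversion) amounts to a simple range check, sketched below in Python. The function name is hypothetical, and treating the limits as inclusive is an assumption, since the text does not say how boundary values were classified.

```python
def in_safe_zone(inclination_deg, anteversion_deg, tolerance_deg=5.0):
    """Return True if the acetabular component lies within the safe zone.

    Targets are 45 degrees of inclination and 15 degrees of anteversion,
    each with a tolerance of 5 degrees. Boundary values are treated as
    inside the zone (an assumption, not stated in the text).
    """
    return (abs(inclination_deg - 45.0) <= tolerance_deg
            and abs(anteversion_deg - 15.0) <= tolerance_deg)


# Hypothetical orientations (NOT study data).
well_placed = in_safe_zone(47.0, 12.0)   # both angles within tolerance
too_steep = in_safe_zone(58.0, 15.0)     # inclination outside tolerance
```

A component must satisfy both conditions, so a well-inclined but excessively anteverted component falls outside the zone.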
Total rates of volumetric wear correlated well with serum Cr (r = 0.847, p < 0.001) and Co levels (r = 0.732, p < 0.001) and with particulate tissue load (r = 0.60, p < 0.001), but not with perivascular lymphocytic cuff thickness, histiocyte band width, surface membrane necrosis or macroscopic tissue necrosis at revision surgery. A relationship was identified between lymphocytic cuff diameter and surface membrane necrosis (r = 0.55, p = 0.011).
Discussion
This paper represents the largest collection of clinical and biochemical results from hip resurfacing patients in the current literature. We believe that the three-centre nature of the study, the experience of the surgeons involved and the use of three resurfacing designs provide a fair representation of the performance of modern hip resurfacing in the wider orthopaedic community.
Currently there is increasing concern over the potential adverse effects of metal debris. It remains to be shown whether these adverse reactions are dose-dependent and whether they are mediated primarily by an immune response to, or a direct toxic effect of the metal debris. Reported rates of ALVAL,21 pseudotumours9 and metallosis12 vary throughout the literature. There appears to be no consensus as to the boundaries between the described conditions. Liu et al9 define pseudotumour as “a soft-tissue mass associated with the implant which is neither malignant nor infective in nature”. It is not clear whether this term includes joint effusion. ALVAL is a histological diagnosis21 which has also been used to describe the clinical appearance of tissue necrosis and abnormal joint fluid at revision surgery.30 Metallosis is defined as aseptic fibrosis, local necrosis, or loosening of a device secondary to metallic corrosion and release of wear debris.31,32 At centre 1, metallic debris was identified, at least on a microscopic scale, in every tissue sample removed from ARMD joints. This is consistent with previous reports.33 In a number of tissue specimens, marked lymphocytic infiltration (ALVAL) was observed, coexistent with abundant particulate-laden histiocytes. The pathogenesis of these cellular processes is beyond the scope of this paper, hence our reason to use the umbrella term ARMD in order to determine the overall incidence of unsatisfactory clinical outcomes attributed to wear from MoM bearing surfaces.
In a smaller, single surgeon, single-site study, we previously highlighted clear differences in the release of metal ions between two resurfacing devices (designs A and B in the present study).16 De Smet et al27 demonstrated that serum metal ion concentrations correlate well with wear of retrieved femoral components. By using Cr and Co analysis as a surrogate indicator of wear, we therefore concluded that increased wear was associated with an increased probability of early failure. These results have been substantiated here. In this series, design A has the highest rate of failure secondary to ARMD (Fig. 4). It is also the most vulnerable to the effects of variation in acetabular component position in terms of ion generation. Design B is associated with a smaller range of Cr and Co values than design A and this is reflected in the lower failure rates secondary to ARMD. Patients receiving design C have the lowest serum ion levels and also the lowest failure rates secondary to ARMD. An obvious confounding factor here, however, is that the design C patients have a shorter mean follow-up than those with design B resurfacings. The Cox proportional hazards model should account for this factor in our analysis; however, we interpret these results with caution. The proportional hazards model we used will not account for unknown specific design features which may have an unexpected effect once a certain period of time has elapsed.
There is overwhelming evidence to show that surgeons cannot consistently position the acetabular components precisely. Without exception, studies show wide variations in the angles of inclination of the acetabular component and, to an even greater extent, its anteversion (Fig. 5).34 Surgeons must accept that some variables may be beyond their control, for example changes in pelvic tilt35 during pre-operative positioning, intra-operative pelvic rotation36 and patient size. Consequently they should choose implants which are compatible with the inevitable variability of acetabular component orientation.
While the three resurfacing systems have differences of design, including varying clearances and heat treatment or no heat treatment of both the femoral and acetabular components, we believe that the design feature most likely to explain the disparity in performance is the arc of acetabular cover. The acetabular component of design A has a smaller arc of cover than the other devices (Table I). Despite this decreased cover, the range of motion prior to impingement is not compromised, firstly due to the recessed nature of the articular surface and secondly due to the smaller functional angle of the femoral component relative to the other two designs. These features increase the vulnerability of the device to the two mechanisms which appear to be critical in the acceleration of wear rates: edge-loading and microseparation/subluxation.37–39 The smaller the arc of cover, i.e. the shallower the acetabular component, the greater the tendency to rim-loading at equivalent angles of inclination and anteversion, when matched for size.
There are likely to be important factors other than those described above. Two design A devices were found to have radial clearances of < 60 μm, meaning that any distortion of the acetabular components in vivo40 may be very detrimental. This is currently under investigation.
The relationship between the amount of wear debris and the extent of tissue destruction is not straightforward. Using serum ion levels as a surrogate measure, our results suggest that, when exposed to low levels of wear, most patients with a resurfaced hip do extremely well.41 Whether or not there is a causal relationship, patients exposed to more debris are more likely to have complications. In this series, the patients with extremely high levels of metal ions/wear from the bearing surfaces did not, however, present exclusively with worsening pain, but with other symptoms including delayed fracture of the femoral neck and/or masses. At revision surgery, macroscopic metallosis was the dominant feature. This was in contrast to the striking appearance of joint fluid and tissue necrosis, which was seen in patients with moderately increased wear/metal ion levels. Neither volumetric wear rates nor joint/serum ion concentrations correlated with microscopic necrosis or the extent of tissue destruction observed at revision surgery. The relationship between the thickness of perivascular lymphocytic cuffs and surface layer necrosis suggests that tissue destruction is not a result of toxic concentrations of metal debris, but is more likely to be the result of an immune response provoked by the debris.42 Patients found to have pseudotumours were revised at a significantly longer period from primary surgery than those with no masses. We speculate that pseudotumours are part of the same pathological spectrum of disease which causes joint effusions and tissue death, but represent a more advanced stage of the disease process. Metallic debris produced by different devices may have different sizes and shapes43 and wear particles produced by design A could conceivably have greater immunogenicity.
The central message, however, is that in the short- to mid-term at least, the vast majority of patients with MoM resurfaced hips will not experience severe soft-tissue reactions when exposed to the levels of wear associated with well-functioning bearing surfaces.
We conclude that increased wear from MoM hip resurfacings is associated with an increased probability of adverse clinical outcomes. These adverse outcomes include severe destruction of soft tissues and bony necrosis. It is likely that a provoked immune response is primarily responsible for the observed soft-tissue destruction, but in most patients high levels of wear are needed to instigate this negative cascade of events. We estimate that < 1% of patients develop reactions to normally wearing bearing surfaces.
A table showing the explant analysis and dominant cellular response of failed devices from centres 1 and 3 is available with the electronic version of this article on our website at www.jbjs.org.uk
No benefits in any form have been received or will be received from a commercial party related directly or indirectly to the subject of this article.
- Received May 12, 2010.
- Accepted October 14, 2010.
- © 2011 British Editorial Society of Bone and Joint Surgery