This study reports the mid-term results of a large-bearing hybrid metal-on-metal total hip replacement in 199 hips (185 patients) with a mean follow-up of 62 months (32 to 83).
Two patients died of unrelated causes and 13 were lost to follow-up. In all, 17 hips (8.5%) have undergone revision, and a further 14 are awaiting surgery. All revisions were symptomatic. Of the revision cases, 14 hips showed evidence of adverse reactions to metal debris. The patients revised or awaiting revision had significantly higher whole blood cobalt ion levels (p = 0.001), but no significant difference in acetabular component size or position compared with the unrevised patients. Wear analysis (n = 5) showed increased wear at the trunnion-head interface, normal levels of wear at the articulating surfaces and evidence of corrosion on the surface of the stem.
The cumulative survival rate, with revision for any reason, was 92.4% (95% confidence interval 87.4 to 95.4) at five years. Including those awaiting surgery, the revision rate would be 15.1% with a cumulative survival at five years of 89.6% (95% confidence interval 83.9 to 93.4).
This hybrid metal-on-metal total hip replacement series has shown an unacceptably high rate of failure, with evidence of high wear at the trunnion-head interface and passive corrosion of the stem surface. This raises concerns about the use of large heads on conventional 12/14 tapers.
Modular cobalt-chrome large-diameter femoral heads were introduced in 2003 to treat failure of the femoral component of metal-on-metal hip resurfacing when the acetabular component was well fixed. These had a 12/14 cone which allowed assembly on to an appropriate stem. Midland Medical Technologies (Birmingham, United Kingdom) who made the large diameter modular head did not manufacture a stem at that time, and surgeons used a variety of components from different manufacturers. Large modular heads on a femoral stem were increasingly used as a primary hip replacement instead of hip resurfacing for patients with poor quality of the femoral bone. The combination of a large durable bearing with a clinically proven femoral stem had perceived advantages, including a low rate of dislocation1 and wear,2 a greater range of movement3 and the potential for increased longevity compared with conventional metal-on-polyethylene total hip replacement (THR). These features made it an attractive option for young active patients with degenerative disease of the hip.
The reported survival of metal-on-metal THRs has been favourable, with few failures and complications, although most studies have involved 28 mm diameter heads.4–8 The results from smaller series on larger diameter bearings have also been encouraging.9–11
Recent reports of unexpected high failure rates for some types of hip resurfacing and a high incidence of adverse reaction to metal debris (ARMD)12,13 have led to an alert from the Medicines and Healthcare products Regulatory Agency prompting an urgent review of all metal-on-metal articulations.14
This study reports the mid-term clinical and radiological results of a large-bearing metal-on-metal hybrid THR. Secondary aims were to identify potential sites of failure from retrieval wear analysis and to identify factors predictive of revision.
Patients and Methods
Between 2002 and 2007, 199 metal-on-metal THRs were implanted by the senior author (JML). The author’s experience of hip resurfacing in post-menopausal women had been disappointing, owing to early failure of the femoral component, and the metal-on-metal hybrid THR was chosen because it offered the potential benefits of a large durable bearing combined with a femoral stem which had previously given excellent results. Other indications for the metal-on-metal hybrid THR included poor-quality proximal femoral bone resulting from conditions such as avascular necrosis of the femoral head, severe cystic degeneration or previous surgery. The cohort contained 110 women and 75 men (14 bilateral cases) with a mean age at surgery of 58.1 years (29 to 77) and a mean follow-up of 62 months (32 to 83).
All patients had a collarless polished tapered cobalt chrome stem (CPT; Zimmer, Warsaw, Indiana) implanted with cement (Palacos R+G; Heraeus Medical GmbH, Hanau, Germany). On the acetabular side, the Birmingham Hip Resurfacing (BHR) acetabular component (Smith & Nephew, Warwick, United Kingdom) coupled with a large modular metal femoral head (Midland Medical Technologies (MMT) Ltd, Birmingham, United Kingdom) was used until 2006, when they were replaced by the Adept resurfacing acetabular component and modular metal heads (Finsbury Orthopaedics, Leatherhead, United Kingdom). The MMT modular head was introduced in isolation without a matching stem. The choice of stem was left to the surgeon’s judgement, as was the practice at the time.15 The senior author initially used the same stem (MS30; Sulzer Orthopaedics, Alton, United Kingdom) as one of the inventors of the BHR. Sulzer Orthopaedics was then acquired by Zimmer. The senior author was already familiar with the CPT stem (Zimmer) and began to use it instead of the MS30 as it was made of chrome-cobalt, had the same 12/14 taper as the MS30 stem and had excellent clinical results.16 The Adept system was used from 2006, after which time the senior author used the uncemented Adept stem with the large diameter modular head. Currently there are no universally accepted standards, guidelines or testing methodologies for trunnions used in hip replacement surgery, and there are no firm guidelines for the accepted tolerances between different taper geometries.
All operations were undertaken through a posterolateral approach in a laminar flow operating theatre. All metal-on-metal hybrid THRs were scheduled to have radiological and clinical review at one, two and five years postoperatively. However, after the Medicines and Healthcare products Regulatory Agency alert in 201014 and concerns over the number of patients presenting with clinical and radiological failure, all patients were recalled for review. The following assessments were performed:
The Oxford hip score17 was recorded (0 = worst score, 48 = best score). Patients were specifically asked if they had experienced groin, start-up or trochanteric pain, new-onset clunking or clicking, or fatigue (limp after exertion). Those describing new-onset symptoms were categorised as ‘painful hips’.
Standardised digital anteroposterior (AP) pelvic and lateral hip radiographs were obtained. Assessment of the position of the acetabular component was carried out using Einzel-Bild-Roentgen-Analyse software18,19 (EBRA; University of Innsbruck, Innsbruck, Austria) on all radiographs from each patient. Measurements were performed by an independent assessor (DJL), and mean values were used for analysis. The EBRA version of the acetabular component was calculated as the angle between the true version and the proposed mid-range value (15°; proposed normal range of version 5° to 25°). This established a range of versions from the proposed optimum value. Radiographs were assessed by two authors (BJRFB, JML) for progressive radiolucencies around the acetabular (De Lee and Charnley)20 and femoral (Gruen, McNeice and Amstutz)21 components, osteolysis, bone resorption and component migration.
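The version calculation described above can be sketched in a few lines. This is an illustration only: it assumes that the 'angle between the true version and the proposed mid-range value' means the absolute difference from 15°, and the function names are hypothetical rather than part of the EBRA software.

```python
# Illustrative sketch only; not the EBRA software's own routine.
MID_RANGE_VERSION_DEG = 15.0             # proposed mid-range value (from the text)
NORMAL_VERSION_RANGE_DEG = (5.0, 25.0)   # proposed normal range (from the text)


def mean_version(measurements_deg):
    """Mean of the repeated radiographic measurements for one hip."""
    return sum(measurements_deg) / len(measurements_deg)


def version_deviation(version_deg):
    """Deviation from the mid-range value.

    Assumption: the deviation is taken as an absolute difference in degrees.
    """
    return abs(version_deg - MID_RANGE_VERSION_DEG)


def within_normal_range(version_deg):
    """True if the version lies inside the proposed 5 to 25 degree range."""
    low, high = NORMAL_VERSION_RANGE_DEG
    return low <= version_deg <= high
```

For example, a hip with a mean measured version of 22° would deviate by 7° from the mid-range value and would lie within the proposed normal range.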
Metal ion analysis.
Blood was sampled with a 21-gauge needle (Becton-Dickinson, Oxford, United Kingdom) and collected in trace element tubes containing sodium ethylenediaminetetraacetic acid (EDTA). Samples were measured by inductively coupled plasma mass spectrometry for whole blood cobalt (Co) and chromium (Cr) levels (expressed in nmol/l). Normal ranges were given as 0 nmol/l to 120 nmol/l for Co, and 0 nmol/l to 135 nmol/l for Cr (equivalent to 0 ppb to 7 ppb). Patients with bilateral hip replacements were analysed separately to avoid confounding bias.
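The equivalence between nmol/l and ppb quoted above follows from the atomic masses of the two metals. The sketch below shows the arithmetic; the atomic masses are standard chemistry values rather than figures from this study, 1 ppb is assumed to equal 1 μg/l, and the function names are hypothetical.

```python
# Standard atomic masses in g/mol (general reference values, not study data).
ATOMIC_MASS_G_PER_MOL = {"Co": 58.933, "Cr": 51.996}


def nmol_per_l_to_ppb(level_nmol_per_l, element):
    """Convert a concentration in nmol/l to ppb.

    Assumption: 1 ppb is treated as 1 microgram per litre.
    nmol/l multiplied by g/mol gives ng/l; dividing by 1000 gives ug/l.
    """
    return level_nmol_per_l * ATOMIC_MASS_G_PER_MOL[element] / 1000.0


def ppb_to_nmol_per_l(level_ppb, element):
    """Inverse conversion, from ppb (ug/l) back to nmol/l."""
    return level_ppb * 1000.0 / ATOMIC_MASS_G_PER_MOL[element]
```

Under these assumptions, 120 nmol/l of Co and 135 nmol/l of Cr both correspond to approximately 7 ppb, in line with the upper limits given above.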
In cases of revision joint fluid was aspirated (either before or around the time of revision surgery) and analysed for Co and Cr levels using the same technique and equipment as for blood metal ion sampling.
Revised components underwent volumetric wear analysis of the bearing surfaces using a coordinate measuring machine (Legex 322; Mitutoyo (UK) Ltd, Andover, United Kingdom) with an accuracy of 0.8 μm. Measurements were made every 5° on 18 concentric circles as well as at the pole of the component, thereby completing a total of 4500 to 6000 measurements for each component, depending on the radius of the explant. Rates of volumetric wear were calculated in Matlab (MathWorks, Natick, Massachusetts) using a previously validated program.22 The availability of the RedLux Artificial Hip Profiler (RedLux Ltd, Southampton, United Kingdom) has also enabled further surface wear analysis on the most recent explants (n = 4), by scanning the surface of spherical objects and providing a three-dimensional image of the surface with the shape and location (geometric information) of the wear patch as well as information on the wear volume.23 Wear at the taper junction was determined using the coordinate measuring machine to perform several out-of-roundness traces at 0.5 mm height intervals on the internal surface of the tapers. Two unused tapers were analysed as controls with maximum out-of-roundness of 8 μm and 10 μm, respectively.
All single variable hypothesis tests were conducted within a non-parametric framework, after confirming that data were not normally distributed. Group comparisons for continuous variables were made using the Mann-Whitney U test. Multifactorial analysis was conducted using logistic regression models with categorical explanatory variables. Survival was analysed using Kaplan-Meier estimates of the survival function, and groups were compared using the log-rank test. Statistical inferences were conducted with a two-sided significance level of 5%. SAS version 9.1.3 (SAS Institute Inc., Cary, North Carolina) and R (R Foundation for Statistical Computing, Vienna, Austria) were used to perform the analysis.
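For readers unfamiliar with the Kaplan-Meier method, a minimal sketch of the estimator is given below. It is not the SAS or R code used in this study. The confidence limits use the plain Greenwood formula with a normal approximation, which is an assumption; the statistical packages may apply a transformed interval, so the limits from this sketch would not necessarily match those reported. The function name and inputs are hypothetical.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier estimate of implant survival.

    times  : follow-up in months for each hip
    events : 1 if the hip was revised at that time, 0 if censored
    Returns a list of (time, survival, ci_low, ci_high) at each revision time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    greenwood_sum = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        revisions = 0
        removed = 0
        # Group all hips sharing this follow-up time; hips censored at t
        # are still counted as at risk at t (the usual convention).
        while i < len(data) and data[i][0] == t:
            revisions += data[i][1]
            removed += 1
            i += 1
        if revisions > 0:
            survival *= 1.0 - revisions / n_at_risk
            if n_at_risk > revisions:
                greenwood_sum += revisions / (n_at_risk * (n_at_risk - revisions))
            se = survival * math.sqrt(greenwood_sum)
            # Assumption: plain (untransformed) 95% interval, clipped to [0, 1].
            ci_low = max(0.0, survival - 1.96 * se)
            ci_high = min(1.0, survival + 1.96 * se)
            curve.append((t, survival, ci_low, ci_high))
        n_at_risk -= removed
    return curve
```

Each hip contributes its follow-up time and a flag for revision; hips in patients who died or were lost to follow-up enter as censored observations at their last review.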
Results
Two patients died from causes unrelated to their THR. There were 13 patients who were lost to follow-up. Further patient demographics are outlined in Table I.
A total of 17 hips (8.5%) required revision. The mean time to revision was 45.5 months (18 to 70), at a mean age at operation of 59.8 years (50 to 71). A total of 15 revisions were in women. Of the 17 revisions, 14 were the result of an ARMD, two due to deep infection and one to a peri-prosthetic fracture. The diagnosis of ARMD was made on MRI in nine cases, and clinically with subsequent histological confirmation after exploration of the hip and revision in five. Abundant necrotic caseous material was commonly found around the bone-component or bone-cement interfaces and often tracked anteriorly along the psoas tendon, with extensive bone loss and a spectrum of soft-tissue involvement and peri-implant fluid collections (Fig. 1). A further 14 patients are awaiting revision as indicated by radiological changes (n = 14) and/or pain (n = 9), along with high metal ion levels (n = 10). Details of all revision cases are summarised in Table II.
The cumulative survival rate for both acetabular and femoral components, with revision for any reason, was 97.0% (95% CI 93.4 to 98.6) at three years and 92.4% (95% CI 87.4 to 95.4) at five years (Fig. 2a), and for ARMD as the cause of revision 96.9% (95% CI 93.3 to 98.6) at three years and 93.6% (95% CI 89.0 to 96.3) at five years.
Analysis by gender showed that the cumulative survival rate in women remaining free of revision for any reason was 94.9% (95% CI 88.9 to 97.7) at three years and 88.1% (95% CI 80.4 to 93.0) at five years, and the corresponding rate for men was 100.0% at three years and 98.6% (95% CI 90.5 to 99.8) at five years (Fig. 2b).
Including those patients awaiting revision (i.e., all failures) as a worst-case scenario, the revision rate for any reason would be 15.1% at final follow-up, with a cumulative survival at five years of 89.6% (95% CI 83.9 to 93.4), and by gender, 12.5% at final follow-up with a cumulative survival at five years of 95.0% (95% CI 85.0 to 98.4) for men and 16.9% at final follow-up with a cumulative survival at five years of 87.8% (95% CI 79.9 to 92.8) for women.
All revision cases and nine of the 14 (64%) awaiting revision presented with symptoms. In two revision cases presentation was acute, with individual cases of dislocation and fracture (Vancouver B324 secondary to ARMD: no history of trauma and associated with a large volume of necrotic tissue in the cement-bone interface, with resorption and thinning of the cortex resulting in pathological fracture; Fig. 3b). In those not revised or awaiting revision, 17 patients (18 hips, 9%) had painful hips; 13 of the 17 (76%) were women. Mean pre- and postoperative Oxford hip scores for all patients and the revision/awaiting revision cohort are outlined in Table I.
A common progressive spectrum of radiological findings was seen which involved initial scalloping and bone resorption of the medial calcar region, with progressive lucency in Gruen zones 7 and 1, along with scalloping and bone resorption in zones 1 and 3 around the acetabular component (Fig. 3). This spectrum of change was observed in ten of the 14 patients diagnosed with ARMD and in all those awaiting revision. In three cases peri-prosthetic fractures were present (two Vancouver AG, one acute fracture Vancouver B3). A further seven asymptomatic patients have non-progressive medial calcar resorption and are being investigated with metal artefact reduction sequence MRI.
Acetabular component size and EBRA analysis.
Comparison of revision/awaiting revision versus non-revision cohorts showed no significant difference in acetabular component size (p = 0.77), inclination (p = 0.38) or version (p = 0.12) (Table I).
Metal ion analysis.
There was a significant increase in Co levels in the revision/awaiting revision group (p = 0.001) compared with the non-revision cohort, but no significant rise in Cr or Mo metal ion levels (p = 0.14 and p = 0.22, respectively) (Table I).
Black markings and deposits were visible at the trunnion/modular head interface (Fig. 4a). The stem had obvious pitting and evidence of corrosion along the surface which was more marked at the region of the proximal cement/stem interface and the tip of the stem (Fig. 4b).
The mean bearing surface wear between head and acetabular component was 1.86 mm³/yr (sd 1.55) (± 2 sd). These values, and the geometric information gathered from RedLux images, did not show abnormal wear volume, depth or position for the length of time the implants had been in place. The mean maximum out-of-roundness of the taper was 34.5 μm (sd 13.3) (± 2 sd; normal range 8 μm to 10 μm). A characteristic pattern was observed with two discrete regions of wear at polar opposites to each other on the margin of the edge of the trunnion (Fig. 4c). These findings indicate that increased wear at the trunnion/head interface and passive corrosion of the stem are the two main sources of metal ion debris.
The mean metal ion levels in fluid taken at revision cases were 18 635 nmol/l Co, and 35 740 nmol/l Cr. The mean ratio of Cr/Co ions was 6.5:1 in the joint fluid, compared to 1:2 in whole blood.
The presence of pain, high whole blood Co levels and radiological changes were included in a multiple logistic regression analysis model to determine the strongest predictors of revision or impending revision. The presence of an isolated raised Co level in the absence of either symptoms or radiological changes was not predictive of failure (p = 0.675). However, the presence of pain (p < 0.001) and radiological changes (p < 0.001) in isolation were both significant predictors of failure.
Discussion
Outcomes from the National Joint Registry for England and Wales25 as well as independent series from centres in the United Kingdom26 have caused concern about the survival of metal-on-metal articulations. This large metal-on-metal hybrid THR series has demonstrated an unacceptably high level of early failure associated with extensive soft tissue and bony involvement. In the latest joint registry report the revision rate quoted for large-head metal-on-metal hybrid THR was 7.8% (6.6% to 9.3%) at five years. This is comparable to our revision rate of 8.5% (4.5% to 13.4%) at five years. More alarming, however, is the increase in the failure rate that is occurring with time (after two years in women and five years in men, Fig. 2b) both in this series and in the registry.
This high rate of failure might reasonably be thought to be the result of a mismatch between the head and stem taper/trunnion. However, it is becoming increasingly clear that the same pattern of failure is seen in similar devices supplied by a single manufacturer.27 There are no universally accepted standards relating to the testing of these devices, and as a result there are limited data available on the mechanical behaviour of large diameter modular heads on 12/14 trunnions. The testing undertaken on these devices before introduction to the market clearly did not identify the risk of excessive wear between the head and the stem. There is an opportunity for the engineers and the orthopaedic manufacturing industry to develop an accepted testing methodology so that new devices can be tested appropriately before being implanted.
The majority of these failures have shown evidence of ARMD (14 of 17 revisions). It is now established that the reaction to metal debris may take several years to develop.13,26 This has been highlighted in this series by a mean time to revision of 45.5 months, with only one revision occurring within two years of implantation. It is therefore likely that we are underestimating the true rate of ARMD.
Pain was a prominent feature of failure but, unlike the recent series reported by Donell et al,28 was often accompanied by abnormal radiographs. The follow-up of these patients prior to the recall was planned at one and five years, and therefore risked not identifying patients until later in the failure process. We have learnt that the symptoms are often subtle and easily overlooked. In two revision cases the patients initially presented with mild lateral trochanteric pain and were treated conservatively for trochanteric bursitis. Later more intrusive symptoms prompted exploration to reveal, in one patient, complete destruction of the abductor insertion and a bald trochanter. Laterally based pain is a potential symptom of early failure and warrants further investigation.
Other established factors associated with early failure in hip resurfacing implants have included female gender, older age, high acetabular component inclination29–31 (> 50°) and version, small head size (< 50 mm) and high metal ion levels (Co > 7 ppb or 120 nmol/l).14 It remains to be established whether these risk factors also apply to the large-head metal-on-metal hybrid THRs.
In this series being female was significantly associated with revision alone (p = 0.008), but not when combined with those awaiting revision (p = 0.157). The initial early revisions were predominantly in women, but at the latest review there had been a rise in failure in men. The cause of the high rate of early revision in women is not known, but indicates the multifactorial nature of the failure process. At present it is not possible to determine whether gender is an isolated risk factor in large diameter metal-on-metal hybrid THRs, but this will become clearer as the duration of follow-up increases.
High Co ion levels were also significantly associated with revision/awaiting revision, but unlike the failures of hip resurfacing, age, component positioning and component size were not significant risk factors for impending or actual failure: 60% of the acetabular components in this cohort were implanted within the proposed ‘safe zone’32 (inclination 40° (± 10°), version 15° (± 10°)). Furthermore, within this cohort of failures or impending failure with raised Co or Cr levels, 80% of acetabular components were still placed within the safe zone. The use of ‘generic’ safe zones, however, must be interpreted with caution as each individual implant will have a spectrum of tolerance within which it functions optimally. There have been no previously reported series of either the large-head modular Birmingham or Adept hip systems, and therefore the optimum safe zone for these combinations is not known.
Retrieval analysis has identified the trunnion-head interface as a potential source of metal ion debris, with otherwise normal wear volume per year (including depth and position) at the articulating interface. The common wear pattern observed at the trunnions (∞ shape) suggests a mechanical cause from excess force at the interface. Burroughs et al33 have previously reported that torsional forces at the trunnion increase as head size increases when metal is tested on standard and highly cross-linked polyethylene. However, the observed pattern with two identical areas of wear at polar opposites around the taper circumference is more suggestive of ‘toggling’, rather than a rotational moment. Furthermore, the lubrication regime that will be integral to the magnitude of these forces will be different for metal-on-metal and metal-on-polyethylene articulations, and will be affected by several factors, including rim contact, impingement, acetabular component deformation, point loading, sliding distance34,35 and the duration of the bedding-in phase. These factors, along with the magnitude of early ion production, are therefore likely to be relevant in the currently unknown aetiology of these early failures.
A further source of metal ion production is passive corrosion of the stem surface. Analyses of metal particulate matter from tissues of failed metal-on-metal articulations have shown that Cr (in the form of chromium orthophosphate, a byproduct of corrosion)36,37 is the predominant ion. Joint fluid aspirated at revision surgery showed markedly elevated Cr levels compared to Co, a reciprocal finding to the ratio of Co and Cr in the whole blood of the same patients. This is a similar finding to that described by Langton et al,13 and confirms the macroscopic retrieval findings that corrosion has also had a role in the production of metal ions in these failed cases.
What remains unclear is whether mechanical wear at the trunnion or passive corrosion of the stem is the predominant contributor to metal ion production. Elevated levels of metal ions in the blood have been implicated by hip resurfacing studies and retrievals14 where the articulating surfaces are the proposed primary sources of metal ions. We suggest that the smaller surface area of the trunnion will result in less metal ion production than the larger articulating surfaces. In this series, testing the metal ion levels using a threshold set at 120 nmol/l had a sensitivity of 83% and specificity of 52% for failure. In order to obtain 100% sensitivity the threshold would need to be lowered to 40 nmol/l (equivalent to 2.4 ppb). Multiple logistic regression analysis further showed that an isolated high Co level at this threshold was not significantly predictive of failure compared to the presence of either radiological changes or symptoms (pain). This raises the concern that the threshold level of 7 ppb is too high for metal-on-metal hybrid THRs and may falsely reassure the surgeon that the implant is functioning well.
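The threshold analysis described above can be illustrated with a short sketch. It assumes that a Co level at or above the threshold counts as a positive test (the convention used in the study is not stated), that both failed and non-failed hips are present in the data, and the function names are hypothetical. No study data are reproduced here.

```python
def sensitivity_specificity(levels, failed, threshold):
    """Sensitivity and specificity of a Co threshold for predicting failure.

    levels    : whole blood Co level (nmol/l) for each hip
    failed    : True if the hip was revised or is awaiting revision
    threshold : cut-off in nmol/l; a level >= threshold is a positive test
                (assumption: the study's exact convention is not stated)
    """
    tp = sum(1 for x, f in zip(levels, failed) if f and x >= threshold)
    fn = sum(1 for x, f in zip(levels, failed) if f and x < threshold)
    tn = sum(1 for x, f in zip(levels, failed) if not f and x < threshold)
    fp = sum(1 for x, f in zip(levels, failed) if not f and x >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def highest_threshold_with_full_sensitivity(levels, failed):
    """Largest cut-off that still flags every failed hip.

    With the >= convention this is the lowest Co level among the failures.
    """
    return min(x for x, f in zip(levels, failed) if f)
```

Lowering the threshold raises sensitivity at the cost of specificity, which is why a cut-off chosen to capture every failure may flag many well-functioning hips.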
Without a specific diagnostic test the importance of a complete assessment of these patients including a clinical history, examination and standard plain radiographs, cannot be over-emphasised. Our early experience informs us that one must intervene as soon as possible in patients with even mild symptoms, to avoid catastrophic complications.
This metal-on-metal hybrid THR series has demonstrated unacceptably high failure rates and a high occurrence of ARMD. Retrieval analysis has highlighted concerns over excess wear at the trunnion along with evidence of corrosion to the stem. Pain is a positive predictor of failure, and new subtle symptoms should not be overlooked. Metal ion levels remain a useful aspect of assessment, but in isolation are not specific or predictive of failure. Further work is necessary to determine the true aetiology of the high failure rates in large-head metal-on-metal hybrid THRs and to establish the mechanical forces and their resultant effects on the 12/14 taper. With the increasing popularity of larger femoral heads this series highlights a need to develop international standardised testing regimes and evidence-based guidance for surgeons on the safe and appropriate use of large diameter modular heads on tapers of varying dimensions and geometries.
A scatter plot showing the acetabular component inclination and version of those hips revised or awaiting revision, with reference to the ‘safe zone’, is available with the electronic version of this article on our website at www.jbjs.org.uk
The authors wish to thank Mrs A. Wakefield for her assistance in the data collection for this paper.
No benefits in any form have been received or will be received from a commercial party related directly or indirectly to the subject of this article.
- Received November 18, 2010.
- Accepted January 25, 2011.
- © 2011 British Editorial Society of Bone and Joint Surgery