The Articular Surface Replacement (ASR) hip resurfacing arthroplasty has a failure rate of 12.0% at five years, compared with 4.3% for the Birmingham Hip Resurfacing (BHR). We analysed 66 ASR and 64 BHR explanted metal-on-metal hip replacements with the aim of understanding their mechanisms of failure. We measured the linear wear rates of the acetabular and femoral components and analysed the clinical cause of failure, pre-revision blood metal ion levels and orientation of the acetabular component.
There was no significant difference in metal ion levels (chromium, p = 0.82; cobalt, p = 0.40) or head wear rate (p = 0.14) between the two groups. The ASR had a significantly increased rate of wear of the acetabular component (p = 0.03) and a significantly increased occurrence of edge loading (p < 0.005), which can be attributed to differences in design between the ASR and BHR. The effects of differences in design on the in vivo wear rates are discussed: these may provide an explanation as to why the ASR is more sensitive to suboptimal positioning than the BHR.
Metal-on-metal (MoM) hip resurfacings and modular large-head MoM (LHMoM) hips both have the advantage over metal-on-polyethylene (MoP) hips of a reduced risk of dislocation, lower rates of wear and the elimination of polyethylene debris that causes osteolysis. 1 However, on 7 September 2010 the Medicines and Healthcare products Regulatory Agency (MHRA) issued a Medical Device Alert recalling both the Articular Surface Replacement (ASR, DePuy, Leeds, United Kingdom) resurfacing and ASR XL LHMoM hip replacements. 2 DePuy voluntarily withdrew the ASR after data from the National Joint Registry for England and Wales had shown a five-year revision rate of 12.0% for the ASR resurfacing. 3 By contrast, the comparable rate for the Birmingham Hip Resurfacing (BHR, Smith and Nephew, Warwick, United Kingdom) system, which has the lowest five-year revision rate of all current generation resurfacing hip systems, was 4.3%. 3
The ASR was launched as a ‘fourth-generation’ hip resurfacing in 2003, six years after the BHR, with the aim of improving the existing implant design, instrumentation and surgical technique, and hence the clinical outcome. 4 DePuy remain the only company to have published the design rationale along with supportive data for their resurfacing in a peer-reviewed journal. 4
The largest clinical study of BHR hip resurfacings (5000 hips) reported a revision rate of 3.6% at a mean follow-up of 7.1 years. 5 There is no study of similar size for the ASR: the largest presents clinical data for 214 ASR hips with a mean follow-up of 43 months and a revision rate of 5.6%. 6 Table I compares these studies and the clinical causes of failure. Siebel, Maubach and Morlock 7 presented a short-term follow-up (mean 46 weeks) of 300 ASR hips and reported a rate of revision of 2.8%, noting the effect of surgical skill on this rate.
A previous study 8 comparing blood metal ion levels in 160 patients with well-functioning ASR (n = 90) and BHR (n = 70) hips showed that the only significant difference between the groups was a lower level of cobalt (Co) in the BHR hips. The authors concluded that the ASR is more sensitive to the position of the acetabular component. There are several key design differences between the ASR and BHR acetabular components and it is important to understand the effect that each of these has on the clinical performance of the implant. The ASR has a reduced cup articular arc angle (CAAA) and clearance compared with the BHR, and the rim, or chamfer, of the acetabular component has a smaller radius. These factors increase the theoretical risk of edge contact and high wear. We define the CAAA as the portion of a hemisphere of the acetabular component that articulates with the head.
In this paper we combine clinical data, wear measurements and further tribological analysis from explanted ASR and BHR hips. Our purpose was to try to explain the difference in performance between the two systems and to understand their mechanisms of failure. This should clarify the importance of specific design features, which may then be applied to the design of future implants.
Materials and Methods
This was a retrospective study of consecutively collected ASR and BHR components, forming part of an implant retrieval programme. Implants were contributed by 93 surgeons from 68 centres across the UK. To be included in the study, we required each hip to have sufficient clinical data to identify the cause of failure, pre-revision measurement of whole blood metal ion levels, and either plain radiographs or a CT scan to measure the position of the components. These were available for 64 BHR and 66 ASR hips, a total of 260 components. Details of the patients’ demographic data and the components implanted are shown in Table II. Both modular and resurfacing head components were included in this study; Matthies et al 9 have shown that there is no significant difference in wear between modular and resurfacing heads.
The position of the components was measured either from three-dimensional (3D) CT scans or plain radiographs (AJH, JAS), and a consultant orthopaedic surgeon (AJH) classified the clinical cause of failure according to the categories in the National Joint Registry 3 using pre-, intra- and post-operative data. To be classified as ‘unexplained’ the components had to be adequately positioned, without evidence of infection and well fixed. There is no accepted definition of an adequate position of the acetabular component: in this paper a satisfactory inclination was defined as > 10° and < 70°. Satisfactory version was defined as being between −10° and +40°. This satisfactory range includes suboptimally positioned devices, but no orthopaedic surgeon would reasonably expect a device to work effectively outside this range. Data on the position of the acetabular component are included so that the reader can interpret the results using their own criteria.
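The working definition of a satisfactory position given above can be expressed as a simple check. This is a minimal sketch: the function name is hypothetical, and treating the version limits as inclusive is an assumption, since the text says only "between".

```python
def is_satisfactory_position(inclination_deg: float, version_deg: float) -> bool:
    """Apply the study's working definition of an adequately positioned cup."""
    # Inclination must be greater than 10 degrees and less than 70 degrees.
    inclination_ok = 10.0 < inclination_deg < 70.0
    # Version between -10 and +40 degrees (inclusive bounds are an assumption).
    version_ok = -10.0 <= version_deg <= 40.0
    return inclination_ok and version_ok
```

A cup at 45° inclination and 20° version passes this check, whereas one at 72° inclination does not.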
Whole blood samples were taken pre-operatively in trace element blood tubes (K2EDTA; Becton, Dickinson and Co., Franklin Lakes, New Jersey) using the Vacutainer system (Becton, Dickinson and Co.). These tubes are certified low for metals. Samples were anonymised and stored at −20°C. Standard operating procedures were established for chromium (Cr) and cobalt (Co) measurement in biological fluids using inductively coupled plasma mass spectrometry (ICPMS) (PerkinElmer Elan; PerkinElmer, Waltham, Massachusetts).
The wear measurements were obtained using a Taylor Hobson Talyrond 365 (Taylor Hobson Ltd, Leicester, United Kingdom). 10 This is a ‘roundness measuring machine’ in which the component is rotated on an accurate spindle (maximum run-out 20 nm) while a stylus (2 mm diameter ruby) contacts the surface and records the deviations from a perfect circle (stylus gauge resolution 10 nm).
The acetabular component was measured by a series of traces parallel to its rim along lines of latitude. An MI (maximum inscribed) circle (the maximum diameter of circle that can be completely enclosed by the measured profile) was fitted to each profile to represent the unworn shape of the component. The maximum depth of linear wear is represented by the maximum distance between the MI circle and the measured profile. Further polar measurements along the lines of longitude can be used to separate in vivo wear from form errors. Form errors are deviations from a perfect sphere caused by deformation during implantation or explantation, or manufacturing tolerances, and can often be larger than the wear scar. The wear of the femoral head was measured by a series of 12 equally spaced polar traces. An MC (minimum circumscribed) arc (the minimum diameter arc to completely enclose the measured profile) was fitted to each profile, and all 12 profiles were then plotted on the same axis. As the femoral head components are axisymmetrical, the maximum wear can be calculated from the maximum difference between worn and unworn profiles, allowing wear and form to be separated. This method allowed the location, depth and extent of any wear scar to be calculated.
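The depth calculation described above can be illustrated with a simplified sketch. It assumes that each trace is already centred on the spindle axis, so that the MI circle of a cup trace reduces to its smallest radius and the MC arc of a head trace to its largest radius; a real analysis would fit the circle centre and would also separate form errors from wear. The function names, the sample values and the conversion to a rate by dividing by the time implanted are assumptions made for illustration.

```python
from typing import Sequence


def cup_wear_depth_um(radii_um: Sequence[float]) -> float:
    """Maximum linear wear depth on one acetabular (internal) trace.

    Wear enlarges the bore, so depth is measured outwards from the
    maximum inscribed (MI) circle. Centred-trace assumption: the MI
    radius is simply the smallest measured radius.
    """
    mi_radius = min(radii_um)
    return max(r - mi_radius for r in radii_um)


def head_wear_depth_um(radii_um: Sequence[float]) -> float:
    """Maximum linear wear depth on one femoral head (external) trace.

    Wear reduces the radius, so depth is measured inwards from the
    minimum circumscribed (MC) arc, taken here as the largest radius.
    """
    mc_radius = max(radii_um)
    return max(mc_radius - r for r in radii_um)


def linear_wear_rate_um_per_year(depth_um: float, months_implanted: float) -> float:
    """Convert a wear depth to a mean linear wear rate (assumed definition)."""
    return depth_um / (months_implanted / 12.0)


# Hypothetical radial deviations (micrometres) around one cup trace.
trace = [0.0, 0.5, 12.0, 30.0, 8.0, 0.2]
depth = cup_wear_depth_um(trace)                   # 30.0
rate = linear_wear_rate_um_per_year(depth, 36.0)   # 10.0
```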
Edge-loaded implants have significantly higher wear rates. 9, 11 In this study a component was classified as edge loaded if the maximum depth of the wear scar occurred at the rim of the acetabular component.
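The classification rule can be sketched as follows, assuming that the wear depth of each latitude trace is stored in order from the pole of the acetabular component to its rim. Which trace counts as being "at the rim" is an assumption here (the last one), as is the handling of components with no measurable wear.

```python
from typing import Sequence


def is_edge_loaded(depth_by_latitude_um: Sequence[float]) -> bool:
    """Classify a cup as edge loaded if its deepest wear lies at the rim.

    depth_by_latitude_um holds the maximum wear depth of each latitude
    trace, ordered from the pole (index 0) to the rim (last index).
    """
    if not depth_by_latitude_um:
        raise ValueError("at least one trace is required")
    max_depth = max(depth_by_latitude_um)
    if max_depth <= 0.0:
        # No measurable wear scar, so the component cannot be edge loaded.
        return False
    return depth_by_latitude_um[-1] == max_depth
```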
A power study was done before starting the investigation, using a method similar to that used by Kwon et al. 12 Power analysis was performed for the rate of wear, the main outcome of the study. The sample size required to determine a statistically significant difference in linear wear rate was estimated using Altman’s nomogram. 13 Based on previous experience of wear measurements, 9 a difference in wear rate of 5 μm/year was selected as representing a significant difference in the rate of wear. A pilot study of ten hips was performed in order to estimate the expected standard deviation in linear wear rate: this was 7.73 μm/year, giving a standardised difference of 1.07. The significance level was set at 0.05 (95% confidence) for this study. Accordingly, a minimum sample size of 46 (23 in each group) provided a power of 95%.
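The sample size can be checked approximately with the usual normal-approximation formula for comparing two means, used here as a stand-in for reading Altman’s nomogram. The sketch takes the reported standardised difference of 1.07 as its input and assumes a two-sided significance level of 0.05, consistent with the p < 0.05 threshold used in the analysis; the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(standardised_difference: float,
                          alpha: float = 0.05,
                          power: float = 0.95) -> int:
    """Approximate group size for a two-sided comparison of two means."""
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    z_beta = standard_normal.inv_cdf(power)
    n = 2.0 * (z_alpha + z_beta) ** 2 / standardised_difference ** 2
    return ceil(n)


n_per_group = sample_size_per_group(1.07)
print(n_per_group, 2 * n_per_group)  # 23 46
```

Under these assumptions the approximation agrees with the 23 hips per group quoted above.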
Neither the linear wear rates nor the blood metal ion levels followed a normal distribution in the population studied, so we used a non-parametric approach for statistical analysis. The Mann-Whitney U test was used to compare the outcomes between the two groups. Spearman’s rank test was used to test for correlation between the rate of wear and the inclination of the acetabular component, and Fisher’s exact test was used to compare the difference in the incidence of edge loading between the two groups. A p-value < 0.05 was considered significant.
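As an illustration of the comparison of edge-loading incidence, a two-sided Fisher’s exact test for a 2 × 2 table can be written with the standard library alone. This is a sketch rather than the software used in the study, and the counts in the usage line are invented for illustration; they are not the study data.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Rows are the two groups; columns are edge loaded / not edge loaded.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def probability(k: int) -> float:
        # Hypergeometric probability of k edge-loaded hips in the first group.
        return comb(row1, k) * comb(row2, col1 - k) / total_tables

    p_observed = probability(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Sum every table that is no more likely than the observed one;
    # the small tolerance guards against floating-point ties.
    p_value = sum(probability(k) for k in range(k_min, k_max + 1)
                  if probability(k) <= p_observed * (1.0 + 1e-9))
    return min(1.0, p_value)


# Hypothetical counts: 18 of 20 edge loaded in one group, 11 of 20 in the other.
p = fisher_exact_two_sided(18, 2, 11, 9)
```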
Results
Table II provides a summary of the patient parameters in the two groups and the clinical causes of failure, acetabular component position and metal ion levels.
There were no significant differences in the causes of failure between the two groups, which were well matched for head size, inclination angle, age, gender and time of implantation. There was a difference in the version of the acetabular component, but this was not significant and both medians were within Lewinnek’s safe zone 14 (Table II).
The rate of linear wear of the acetabular component in the ASR group was significantly higher than that in the BHR group (p = 0.03). There was no significant difference in the rate of linear wear of the femoral head (p = 0.14) (Table III). The wear rates for both components are plotted in Figure 2. The incidence of edge loading was 87% for the ASR group and 63% for the BHR group, a significant difference (p < 0.005). There were not enough components in specific sections of the inclination spectrum for a reliable statistical analysis of the occurrence of edge loading against position; however, Figure 3 shows the rate of wear against position and edge loading, and the trends are considered in the discussion.
Discussion
This is the first study that directly compares explanted ASR and BHR resurfacing and stemmed large-diameter MoM hips; it is also one of the largest published studies of clinical and wear data from explanted hips. It compares the performance of the ASR and the BHR from three aspects: the clinical causes of failure, pre-revision metal ion levels, and analysis of component wear.
Clinical causes of failure.
There was no significant difference in the clinical causes of failure between the two groups. The majority of failures were classified as ‘unexplained pain’. The cause of failure was not associated with any differences in component wear rates or blood metal ion levels. There is much anecdotal evidence of increased rates of loosening of the ASR acetabular component, but this was not seen in this study. A review of the literature found no peer-reviewed papers reporting a higher rate of aseptic loosening of the acetabular component for the ASR.
Figure 4 shows that there are well-positioned, low-wearing hips that fail. In these cases the clinical cause of failure is usually unexplained pain. It is not possible to comment on the failure rate of hips within the safe zone, but it is clear that the ASR and BHR both suffer failures as a result of patient-specific factors unrelated to wear or malpositioning.
The pre-revision whole blood metal ion levels of the patients with failed implants in this study were significantly higher than those of patients with a well-functioning ASR or BHR hip reported by Langton et al 8 (Table IV). However, there was considerable overlap in the range of metal ions between the failed and the well-functioning hips. In the series of ‘well-functioning’ hips reported by Langton et al, 8 the patients with high metal ion levels probably had edge-loaded, high-wearing hips but were relatively tolerant of the metal wear products. The large spread of metal ion levels in those with failed hips, which is similar to the spread of wear rates (Figs 1 and 2), reflects the different mechanisms of failure. Not all mechanisms of failure, for example early femoral neck fracture or patient-specific metal hypersensitivity, are associated with high wear. The differences in pre-revision metal ion levels between the failed ASR and BHR hips were not significant.
The rate of wear of the acetabular component was significantly higher in the ASR than in the BHR hips. There was, however, no significant difference in the rate of wear of the femoral heads. The higher acetabular wear is due to the increased frequency of edge loading in the ASR group. Edge loading occurs when the contact patch between the components extends over the rim of the acetabular component. In vivo it has been shown to result in a wear rate up to 25 times greater than normal. 9, 11 It is thought that edge loading leads to increased contact pressures and disruption of the lubrication regime. The results of this study are consistent with the current literature, in which increased wear rates (and/or blood metal ion levels) have been attributed to edge loading, 9, 11, 15, 16 and in which the ASR has been reported to be particularly susceptible to high wear as a result of edge loading. 8, 17
The maximum linear wear rate is based on the maximum depth of the wear scar; the wear scar for edge-loaded acetabular components is located near to the rim, but can be extremely deep (up to 740 μm). The direction of the load vector moves continuously in relation to the surface of the head, so that the head wear scar tends to be shallower but more extensive. As a result, differences in the maximum depth of the wear scar between edge-loaded and non-edge-loaded head components become less significant.
The measurement of volumetric wear with a CMM (coordinate measuring machine) is unreliable for shallow, extensive wear scars such as those seen on the head. The depth of these wear scars is often of a similar magnitude to that of the component form. Form is the deviation from the designed spherical shape of the component caused by manufacturing errors or deformation. It is important that measurement analysis can separate in vivo wear from form. Nearly all CMM measurement protocols fit a perfect sphere to the measured data. This does not distinguish wear from form, and potentially leads to errors in the measurement of both volumetric and linear wear rates. The Talyrond measurement protocol uses the axisymmetrical nature of the components to separate wear and form by measuring the linear depth relative to an unworn section of the component.
The increased incidence of edge loading seen in the ASR hips is related to the design of the implant. The cup articular arc angle (CAAA) is the angle subtended by the articular surface: a reduced angle results in a reduced ‘centre-edge angle’ and an increased risk of edge loading. 16– 18 The CAAA is not constant across all sizes and is less for smaller heads: 18, 19 on average the CAAA of the ASR is 10° less than that of the BHR (as measured in our retrieval laboratory). A compounding factor is the reduced clearance of the ASR compared with other designs. 20 The authors have measured the clearance of unworn components according to British Standard. 21 The typical radial clearance for an ASR hip was approximately 50 μm compared with approximately 110 μm for the BHR hip.
Application of the Hertz Theory of Elastic Contact 22 suggests that a hip with a low clearance, and hence greater conformity between the head and acetabular component, will have a larger contact patch and a lower contact pressure. Therefore, although the ASR may benefit from lower contact pressures under optimal conditions, the increased size of the contact patch reduces the CPER (contact patch edge to rim) distance, which increases the risk of edge loading (Fig. 5). The effect of the lower clearance on the risk of edge loading is as significant as that of the reduced CAAA for the ASR compared with the BHR.
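The effect of clearance on the size of the contact patch can be illustrated with the classical Hertz formulae for a sphere in a conforming spherical socket. The sketch below uses the approximate radial clearances quoted above (50 μm and 110 μm); the 50 mm head diameter, the 2500 N load and the elastic constants for a cobalt-chromium alloy (E = 210 GPa, ν = 0.3) are illustrative assumptions, not values from the study. Hertz theory assumes a contact that is small relative to the radii and takes no account of the rim, so the figures indicate a trend only.

```python
from math import pi


def hertz_ball_in_socket(head_radius_m: float,
                         radial_clearance_m: float,
                         load_n: float,
                         youngs_modulus_pa: float = 210e9,
                         poisson_ratio: float = 0.3) -> tuple:
    """Return (contact radius in m, peak contact pressure in Pa)."""
    cup_radius_m = head_radius_m + radial_clearance_m
    # Effective radius of a convex sphere inside a concave socket.
    effective_radius = head_radius_m * cup_radius_m / radial_clearance_m
    # Effective modulus for two bodies of the same material.
    effective_modulus = youngs_modulus_pa / (2.0 * (1.0 - poisson_ratio ** 2))
    contact_radius = (3.0 * load_n * effective_radius
                      / (4.0 * effective_modulus)) ** (1.0 / 3.0)
    peak_pressure = 3.0 * load_n / (2.0 * pi * contact_radius ** 2)
    return contact_radius, peak_pressure


# Low clearance (about 50 um) versus higher clearance (about 110 um).
a_low, p_low = hertz_ball_in_socket(0.025, 50e-6, 2500.0)
a_high, p_high = hertz_ball_in_socket(0.025, 110e-6, 2500.0)
print(f"50 um clearance:  a = {a_low * 1e3:.2f} mm, p0 = {p_low / 1e6:.1f} MPa")
print(f"110 um clearance: a = {a_high * 1e3:.2f} mm, p0 = {p_high / 1e6:.1f} MPa")
```

Under these assumptions the lower clearance gives the larger contact radius and the lower peak pressure, which is the trade-off described above: a larger patch leaves less distance between its edge and the rim.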
In a hip simulator, hips with a low clearance have been shown to wear less and to have shorter ‘running-in’ times. 23, 24 However, these components were often tested in their optimum position, and this reduced wear has failed to translate to in vivo performance. As well as an increased risk of edge loading, hips with a low clearance are at greater risk from the effects of deformation of the acetabular component.
However, Langton et al 8 have noted that well-functioning large-diameter ASR hips in men are associated with significantly lower ion levels than the equivalent BHR hips, suggesting that under ideal in vivo conditions the ASR can wear less than the BHR, as the designers intended and as predicted in simulator studies. 4 Langton et al 8 attribute the reduced wear rate to fluid film lubrication, promoted by the low clearance. However, there is no evidence to support this theory, and it ignores the complex non-Newtonian behaviour of synovial fluid. Mavraki and Cann 25 showed that the protein films which form on the surface of metal components are very sensitive to contact pressures. The reduction in the rate of wear of components with a low clearance may be the result of lower contact pressures, as predicted by the Archard wear equation, a model commonly used to describe sliding wear between lubricated contacts. With this model the volume of wear can, in theory, be calculated. Lower contact pressures would also allow the formation of a protective protein-based boundary lubrication film, thereby further reducing the wear rate. 25
Several factors will contribute to the increased metal ion levels reported for smaller-diameter heads. 6, 8 For example, large acetabular components have an increased CAAA relative to small acetabular components: this results in increased coverage and an increased centre-edge angle. 18, 19 For identically positioned acetabular components, the distance between the edge of the contact patch and the rim of the component increases with head size, which reduces the risk of edge loading. Hertzian contact mechanics shows that the contact patch is larger and the contact pressure lower in hips with a larger head. 22
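For reference, the Archard wear equation referred to in the discussion of low-clearance hips is commonly written in the form below. The symbols are those of the standard textbook form and are assumptions of this sketch rather than the authors’ notation: V is the wear volume, K a dimensionless wear coefficient, F the normal load, s the sliding distance and H the hardness of the softer surface. Dividing by the contact area A gives the linear wear depth h in terms of the mean contact pressure p = F/A, which is why a lower contact pressure would be expected to reduce the linear wear measured in this study.

```latex
\[
  V = \frac{K F s}{H},
  \qquad
  h = \frac{V}{A} = \frac{K\,p\,s}{H},
  \quad \text{where } p = \frac{F}{A}.
\]
```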
In this study all the failed components that had been implanted with an inclination angle > 60° (Fig. 4) were edge loaded, as were all components with severely adverse version. The exception was one BHR, implanted for 51 months, with no measurable wear scar; the patient was in severe pain from the primary surgery and was immobile, which reduced the rate of wear to an undetectable level. For malpositioned components there was no difference in the rate of wear between ASR and BHR components. However, it is in the suboptimally positioned components within 10° of Lewinnek’s safe zone 14 that the difference in the incidence of edge loading and rate of wear can be seen (Fig. 3). There were not enough hips in each group to allow statistical analysis, but the trends are clear. Figure 5 shows the relationship between inclination and wear rate. There was a significant positive correlation between wear and inclination of the acetabular component with the BHR (p = 0.01, Spearman’s rank correlation coefficient = 0.4) which was not seen with the ASR (p = 0.6, Spearman’s rank correlation coefficient = 0.08). This suggests either that ASR hips wear more at all inclinations or that the ‘safe zone’ of the ASR that avoids edge loading is much narrower than that of the BHR. There were insufficient hips in this study to investigate the effects of head size or gender. The results concerning component position in our study are, like those in all published retrieval studies, limited by a lack of translational data, such as horizontal femoral offset, which may affect the rate of wear, as has been shown by hip simulator studies of micro-lateralisation. 27 Future work should probably take this into consideration.
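The rank correlation between inclination and wear rate quoted above can be illustrated with a short standard-library sketch of Spearman’s coefficient, computed as the Pearson correlation of the ranks with tied values given their average rank. The function names and the sample values are hypothetical, and no p-value is calculated.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 upwards, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined when all values are tied")
    return cov / sqrt(var_x * var_y)


# Hypothetical inclinations (degrees) and cup wear rates (um/year).
inclination = [38.0, 42.0, 47.0, 51.0, 55.0, 63.0]
wear_rate = [1.2, 0.8, 3.5, 2.9, 9.0, 22.0]
rho = spearman_rho(inclination, wear_rate)
```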
The design of the rim is one of the key differences between the ASR and the BHR. The bearing surface of the ASR stops 2.5 mm below the rim, where there is a groove for the introducer to be attached to the component. The ASR has a lower CAAA than the BHR, but as a result of the groove at the rim there is no difference in the range of movement between the ASR and the BHR before impingement occurs. Another significant difference between the two systems is the radius of the chamfer at the top of the bearing surface: measured with a stylus-based profilometer on unworn 56 mm acetabular components, this was 0.5 mm for the BHR and 0.2 mm for the ASR. The radius of the rim has an important effect on the peak contact pressure, as the contact patch extends over the rim of the component: a large radius reduces the peak contact pressure and hence the local wear rate. The larger radius of the rim chamfer on the BHR helps to reduce the wear rate of edge-loaded components, whereas in the ASR the reduced radius of the chamfer further compounds the effect of edge loading on wear. Future work should probably include measurement of the surface roughness of heads, particularly as the sharp edge of the ASR acetabular component may increase roughness and damage to the head, and result in increased wear.
Non-implant related factors.
The failure rates reported by the National Joint Registry 3 are only as accurate as the information they receive. Between 2001 and 2009, only 256 failed ASRs were reported. 30 The number of patients and failures used to calculate the five-year failure rate is likely to be very low. Since the MHRA medical device alert 30 and ASR recall, 2 which alerted surgeons to potential problems associated with the ASR, and in conjunction with the financial contribution of DePuy towards revision surgery, the threshold for revision surgery will be lowered. This, combined with an increased rate of reporting of failures, will result in a further increase in the ASR failure rate relative to that of the BHR and other implants.
It is probable that some of the difference in failure rates is the result of better surgical technique for the BHR than for the ASR and is not, therefore, related to the design of the implant. Several studies have shown the effect of surgical skill on the outcome of resurfacings, 7, 31 and further clinical studies of large numbers of patients treated by expert surgeons may help to separate surgical and implant failures.
This study was not a controlled clinical trial and we cannot comment on the relative failure rates of the ASR and BHR, although such information is available from the National Joint Registry. 3 It was potentially vulnerable to a collection bias, but the large number of surgeons and hospitals involved reduced this risk. It also included 32% of all ASR components reported to the MHRA 30 as having failed. An important limitation was that the power analysis was limited to the primary outcome of the study, which was component wear. Power analyses were not performed for the other outcome variables, such as whole blood metal ion levels, so these results must be interpreted with caution. The two groups were well matched for gender, age and femoral head size. There was a significant difference in the ratio of modular to resurfacing hips, which reflects the relative proportions of modular and resurfacings sold by both companies. However, Matthies et al 9 showed no significant difference in wear rates between modular and resurfacing hips. The median time of implantation is longer for the BHR than the ASR: the BHR has been available for six years longer than the ASR and will consequently have more long-term failures. It is beyond the scope of this study to address the mean time to failure of the hips. The wear rate of hips is not constant with time: the hip has an initial ‘running-in’ phase of increased wear and then a ‘steady-state’ phase of lesser wear. 32 The lower median duration of implantation of the ASR may result in an increase in the average wear rate, but at 35 months it is longer than the ASR running-in phase of 0.5 to 2 million cycles. 4
Valid measurement of the position of the acetabular component depends on the imaging modality: plain radiographs are inferior to CT. However, only one retrieval study has incorporated CT measurements, 29 so it is reassuring that the position of the acetabular component was quantified by CT in 35% of our patients.
This is the first study to explain the differences in metal ion levels seen in other studies, by examining explanted components and proposing mechanisms of failure based on the differences in design. The ASR is more sensitive to suboptimal positioning than the BHR as a result of the rim design, low clearance and reduced CAAA. However, both implants suffer high wear when incorrectly positioned, and both have examples of ‘unexplained’ failures of low-wearing well-positioned cups.
We are extremely grateful for the help provided by the retrieval centre coordinator G. Lloyd (Imperial College) and P. Coward (Royal National Orthopaedic Hospital). We are also grateful to all the surgeons and patients who donated their hips to this study. This work was funded by the British Orthopaedic Association (BOA) through an industry consortium of nine manufacturers: DePuy, Zimmer, Smith & Nephew, Biomet, JRI, Finsbury, Corin, Mathys and Stryker. The contract allows for freedom to publish all results.
No benefits in any form have been received or will be received from a commercial party related directly or indirectly to the subject of this article.
- Received December 14, 2010.
- Accepted May 17, 2011.
- © 2011 British Editorial Society of Bone and Joint Surgery