Prosthetic total elbow arthroplasty (TEA) is a recognised treatment for the painful arthritic elbow. The available implants can be broadly grouped into linked, where the humeral and ulnar components are physically connected, and unlinked. The former group can be subdivided further depending on whether there is varus-valgus play between the humeral and ulnar components (sloppy hinge) or there is no play (fixed hinge). A minority of implants incorporate a replacement for the head of the radius.
The results reported for TEA are not as good as those for hip and knee replacement, although the published studies have tended to be comparatively small.1 There is little consensus as to the best implant to use in different clinical settings. A systematic review of these published series will give a clearer picture of the relative performances of the different implants.2
Materials and Methods
We conducted a PubMed search of the literature for all papers addressing the subject of TEA by using the words “elbow”, “arthroplasty” or “replacement”, using all fields. We undertook a hand search of the bibliographies of the papers describing series of prosthetic elbow arthroplasties. Case reports and references to abstracts of conference presentations were excluded, as were endoprosthetic reconstructions following excision of tumours. Some groups have published series of articles in which the same patients were included in more than one paper. In each case the largest series was included in the overall analysis and the smaller ones rejected. Some papers included the results of both revision procedures and primary replacements. We excluded these unless the information for primary arthroplasties alone could be extracted. All papers were reviewed by two authors (CPL, AJG) to determine exclusions and assess data; any disagreements were resolved by discussion with the third author (AJC).
The information gathered from the selected papers included the functional outcome score, the range of movement, loosening, dislocation, deep infection, revision, radiolucencies, instability of the arthroplasty, problems with the wound, failure of repair of the triceps, ulnar nerve lesions and overall complications.
There was no universal definition of loosening. We considered the implant to be ‘loose’ when the paper referred to either migration of a component, a complete radiolucency of at least 2 mm or the presence of radiolucencies and symptomatic loosening. Components found to be loose at revision were also categorised as loose regardless of the radiographic findings. A second, broader consideration of radiolucencies was established to include all complete radiolucencies of at least 1 mm, all those which showed progression and those components which had previously been defined as being loose.
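The two definitions above can be written as explicit rules. The following is a minimal sketch only, not the data-extraction procedure actually used for this review; the `ComponentReport` record and all of its field names are hypothetical, introduced purely to show how the criteria combine.

```python
from dataclasses import dataclass


@dataclass
class ComponentReport:
    """Hypothetical summary of what a paper reports for one component."""
    migrated: bool = False
    complete_lucency_mm: float = 0.0   # width of a complete radiolucent line; 0 if none
    any_lucency: bool = False          # any radiolucency, complete or incomplete
    symptomatic_loosening: bool = False
    loose_at_revision: bool = False
    lucency_progressive: bool = False


def is_loose(r: ComponentReport) -> bool:
    """Stricter definition: migration, a complete radiolucency of at least
    2 mm, radiolucency with symptomatic loosening, or loose at revision."""
    lucency_present = r.any_lucency or r.complete_lucency_mm > 0
    return (
        r.migrated
        or r.complete_lucency_mm >= 2.0
        or (lucency_present and r.symptomatic_loosening)
        or r.loose_at_revision
    )


def has_radiolucency(r: ComponentReport) -> bool:
    """Broader definition: a complete radiolucency of at least 1 mm, any
    progressive radiolucency, or anything already classed as loose."""
    return r.complete_lucency_mm >= 1.0 or r.lucency_progressive or is_loose(r)
```

Under these rules every loose component also counts towards the radiolucency figure, so the second rate can never be lower than the first for the same set of cases.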
Implants which are not linked are vulnerable to subluxation or dislocation. ‘Dislocation’ was defined as cases which would require a procedure to obtain reduction even if no surgery was performed. Where there were subjective symptoms or radiographic signs of subluxation the joint was categorised as ‘unstable’, and this included all cases which had dislocated. As instability of linked implants reflects a failure of the implant, we described such cases as implant failures by disassembly. Other implants which failed by fracture of a component were noted. Where the bushing of a linked implant was revised due to wear, this was recorded as a revision, not as an implant failure. The mean rate of implant failure was gleaned from papers which specifically noted whether implants had failed or not. As it is almost certain that papers reporting revision would have mentioned any implant breakages, we have also calculated an ‘assumed’ failure rate with the number of cases with a known survival outcome as the denominator.
Definitions of infection also varied from paper to paper. In this review, deep infection refers to all cases in which the implant was retained after a washout, the elbow was revised for sepsis, or where dehiscence of the wound required either post-operative antibiotic treatment or flap closure to achieve tissue cover. Wound problems included all these cases and also those where there was reference to a haematoma, prolonged wound drainage, local erythema or dehiscence that was left to granulate or could be closed directly. Failures of repair of the triceps were noted.
The pre-operative condition of the ulnar nerve was noted when recorded. The rates of permanent and transient ulnar nerve palsies were documented, using the total number of implantations for each given paper as the denominator. A final default field to consider the overall complication rate was established, although it should be recognised that there was considerable variation in the recording of more minor complications such as erythema of the wound or a haematoma. Some patients may have had more than one complication.
An overview analysis was performed to reflect the findings of all papers reviewed. Sub-group analyses were carried out to consider the results of TEA for the specific indications of rheumatoid arthritis (RA), post-traumatic arthritis (PTA) and acute fractures, for different types of implant namely unlinked, sloppy hinges and fixed hinges and for selected implants still in widespread use including the Capitello-condylar (Johnson & Johnson, Leeds, UK), Coonrad-Morrey (Zimmer, Swindon, UK), Kudo (Biomet, Swansea, UK) and Souter-Strathclyde (Stryker Howmedica, Newbury, UK), which each command more than 10% of the market share in the UK.3
When considering length of follow-up and improvement in the range of movement, the series were weighted for size by multiplying the stated mean value by the number of cases in an individual series, then dividing the sum of these values by the total number of cases in order to obtain a corrected mean value for overview and sub-group analyses. Similarly, the figures for functional rating and revision rates were derived from the overall number of cases with a particular outcome divided by the total number of cases being considered from all series. These figures have been given the prefix ‘weighted’ in the text to emphasise this point. As the data for some fields were not available in all papers, the denominators vary. For revisions, the number of cases lost to follow-up or who had died has been excluded from the denominator unless the outcome of the arthroplasty was known and clearly stated. The figures used to calculate the percentages have been given. For complications, the rate has been calculated in the same way, but the range of rates reported is also presented, along with a median value. Given the heterogeneous nature of the papers reviewed, we feel that this provides readers with a better indication of the published experience than by providing only summary statistics which may give a misleading overall estimate of the effect of treatment.
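The weighting described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the calculation tool used for the review: the function names are hypothetical, and the numbers in the usage note are invented for demonstration and are not taken from any of the papers.

```python
from statistics import median


def weighted_mean(series):
    """series: (mean_value, n_cases) pairs. Series that do not report the
    field (mean_value is None) are skipped, so the denominator varies."""
    usable = [(m, n) for m, n in series if m is not None]
    total_cases = sum(n for _, n in usable)
    if total_cases == 0:
        raise ValueError("no series reported this field")
    return sum(m * n for m, n in usable) / total_cases


def pooled_rate(series):
    """series: (events, n_cases) pairs. Returns total events divided by
    total cases across all series."""
    total_cases = sum(n for _, n in series)
    if total_cases == 0:
        raise ValueError("no cases available")
    return sum(e for e, _ in series) / total_cases


def rate_summary(series):
    """Pooled rate plus the range and median of the per-series rates."""
    per_series = [e / n for e, n in series if n > 0]
    return {
        "pooled": pooled_rate(series),
        "min": min(per_series),
        "max": max(per_series),
        "median": median(per_series),
    }
```

For example, with three invented series reporting mean follow-up of 24 months (10 cases), 60 months (30 cases) and no follow-up data (20 cases), `weighted_mean([(24, 10), (60, 30), (None, 20)])` returns 51.0, because only the 40 cases with a stated follow-up enter the denominator.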
The initial search and reviews of the bibliography returned 783 references of which 186 were excluded because they were not English language publications and 460 because they did not consider clinical series of primary prosthetic elbow arthroplasties. Of the remaining 137 articles, 86 papers4–89 fulfilled the criteria for inclusion in the overall analysis; ten were excluded because it was not possible to isolate the data concerning primary implantation in papers which presented the results of both primary and revision procedures, and nine were excluded because there were insufficient clinical data to make a useful assessment. There were 32 papers which were otherwise acceptable but were excluded because the cases presented had been included in other papers, and 38 of the 86 (44%) were from the institution of the designer of the implant.4,5,8,10,12,14–17,19–21,24,26,27,31,33,39,41,44,46,47,50–54,56–60,63,79,81–83,85
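The selection arithmetic in this paragraph can be checked directly. The sketch below uses only the counts stated in the text; the variable names are ours and are purely illustrative.

```python
# Counts as stated in the text
retrieved = 783
non_english = 186
not_primary_series = 460
remaining = retrieved - non_english - not_primary_series

mixed_primary_and_revision = 10
insufficient_data = 9
duplicate_cohorts = 32
included = (remaining - mixed_primary_and_revision
            - insufficient_data - duplicate_cohorts)

# Share of included papers from the implant designer's institution
designer_institution = 38
designer_share_pct = round(100 * designer_institution / included)

print(remaining, included, designer_share_pct)
```

The three exclusion counts (10, 9 and 32) account for all 51 of the 137 articles that did not enter the overall analysis.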
The papers from the Gschwend group describing the GSB III implant (Zimmer)4,90,91 are an important contribution to the literature on elbow arthroplasty. However, the series included the same group of patients, of whom at least one received the GSB III implant as a two-stage revision, undergoing an excision arthroplasty following loosening of a GSB I prosthesis (Sulzer Inc, Winterthur, Switzerland) and subsequent re-implantation. These papers therefore should have been excluded under the criteria outlined above but following discussion we included the most recent paper in the analyses, treating the case noted above as an implantation for a flail elbow while accepting that it is possible that there may be more than one case within the series representing a revision procedure. Dee has published two papers26,92 that share some of the same patients. Only the larger, later series26 was included in the overall analysis, but the smaller, more detailed paper92 about a subgroup with rheumatoid arthritis was used in the subgroup analysis, from which the other paper was excluded.
Overall, 3618 implantations have been recorded in patients with a mean age of 58 years and a mean weighted follow-up of 60 months. The duration of follow-up was stated in only 69 of the 86 series.4–12,14,18–22,27,28,30,33–38,40–60,62–81,83–85,87,89 Sixteen11,17,19–21,23,25,34,36,45,52,57,58,61,67,82 of the 86 papers considered the results of more than one type of implant, although only five of these were comparative series.19,20,34,57,67 The results for the different types of implants are shown in Table I, for the more commonly used implants in the United Kingdom in Table II, and for different indications in Table III; the papers included are listed in the legend to each table.
Authors used their own outcome score in most of the papers. Of the more widely used outcome scores, the Mayo elbow performance score (MEPS)7 was used in 21 of the 86 papers (24%)5,50,51,54,57–60,62–64,66,68–71,76–78,87,88 and in a modified form in a further three (3%).59,80,81 When considering only papers published since this score was described,7 the MEPS or a modification has been used in half the papers considered (23 of 46). Of the other outcome measures employed, only the score of Ewald93 (used in eight of 86 papers18,28,35,41,44,46,53,55) was used in more than 5%. Overall, the proportion with a good or excellent result as defined by the score used in the individual paper was 78%. The functional results of the different types of implant (Table I) showed a higher proportion of excellent and good results for sloppy hinge devices (82%) than for unlinked (78%) or fixed hinge implants (73%). Analysis of the subgroups (Table III) showed similar results for RA and PTA, albeit with many more implantations followed up for a shorter period in the RA group. The clinical results reported for acute fracture were very good, although both the number of procedures and the duration of follow-up were comparatively small.
The mean arc of flexion/extension improved in all series, with a weighted improvement of 26°. Linked implants restored a better range of movement than unlinked components (Table I). This improvement was greatest in patients being treated for PTA, although the numbers studied were small (Table III). The mean weighted improvement in fixed flexion deformity was 6° but there was variation between the implants. There was a weighted mean improvement of 11° with the Coonrad-Morrey elbow, of 3° with the Capitello-condylar and the Souter-Strathclyde elbows, but with the Kudo implant, there was a weighted mean deterioration of 1°.
There was an overall weighted revision rate of 13%. Only 26 of the 86 (30%)7,9,12,14,16,21,29,48,52,61,63,64,68–71,74,75,79,81,82,84,86,94,95,100 papers had performed a formal survival analysis or presented data in such a way that a survival analysis could be undertaken.
Complication rates of 14% to 80% with a median rate of 33% were reported. Deep infections were seen in 5% (143 of 2940; median 5%) of cases. The rate of deep infection has remained reasonably constant over time when considering papers published in five-year blocks. Prior to 1979, the rate was 6.9% (10 of 144); from 1979 to 1983, the rate was 5.4% (13 of 238) and between 1984 and 1988 it was 7.7% (26 of 336). When considering papers published since 1989, the rate has remained at around 4% for the periods 1989 to 1993 and 1994 to 1998 (4.0% (26 of 647) and 3.9% (24 of 606) respectively) and 4.5% (44 of 969) for the period 1999 to 2003.
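As a worked check of the figures in this paragraph, the per-period counts can be summed and converted to rates. This is a sketch using only the counts quoted above; the dictionary labels and the helper `rate_tenths` are ours. The quoted percentages correspond to truncation, rather than rounding, to one decimal place, which is why integer division is used.

```python
# (deep infections, implantations) per publication period, from the text
blocks = {
    "before 1979": (10, 144),
    "from 1979": (13, 238),
    "from 1984": (26, 336),
    "from 1989": (26, 647),
    "from 1994": (24, 606),
    "from 1999": (44, 969),
}


def rate_tenths(events, cases):
    """Rate in tenths of a percent, truncated (69 means 6.9%)."""
    return (1000 * events) // cases


total_events = sum(e for e, _ in blocks.values())
total_cases = sum(n for _, n in blocks.values())

for label, (events, cases) in blocks.items():
    print(f"{label}: {rate_tenths(events, cases) / 10:.1f}% ({events} of {cases})")
print(f"overall: {total_events} of {total_cases}")
```

The six periods sum to 143 infections in 2940 implantations, matching the overall figure given at the start of the paragraph.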
The rate of post-operative wound problems was 9% (195 of 2115; median 7%). The duration of post-operative immobilisation was given in 63 of 86 papers. When considering the risk of wound problems in relation to this, the rate was 8.9% (34 of 382) with immobilisation for two days or less, 9.8% (84 of 861) with three to five days, 9.7% (25 of 257) with six to eight days and 6.0% (4 of 66) with nine or more days.
Although documented in only 41% of papers (35 of 86),4,5,8–10,14,17–22,25–28,32,33,37,41,49,53,54,57,62,64,66,67,72,73,79,82,85,87,89 post-operative disruption of the triceps was seen in 3% (47 of 1676; median 3%). When considering the approach used, disruption was seen in 0.56% (1 of 177) of cases where a triceps turndown was used, 2.8% (12 of 428) where the insertion of triceps was kept in continuity with the periosteum of the ulna by means of a posterolateral or sub-periosteal technique and 11% (14 of 129) where the insertion was released from the ulna.
Loosening was reported in 9% with lucencies in 14% (Table I). The rate of loosening for sloppy hinge devices appeared to be lower than for unlinked implants over a similar period of follow-up, both being lower than for fixed hinges. For unlinked implants the rate of radiolucencies (11%) reported was similar to the rate of loosening (10%). With the linked implants, the reported rates of radiolucencies were much higher than the rates of loosening (21% vs 11% for fixed hinges; 15% vs 5% for sloppy hinges). The rate of loosening appeared to be higher in RA. However, the numbers reported for PTA are small and the mean follow-up for acute fractures is relatively short, so these findings should be interpreted with caution. In the unlinked implants, dislocation was reported in 5% and instability in 14%.
Failure of the implant was seen in both linked and unlinked versions, with an overall rate of 4% (69 of 1604). Component failure was seen with the Baksi (humeral stem fracture), the Capitello-condylar (ulnar polyethylene fracture), the Coonrad III (ulnar stem fracture) and the Kudo 4 (humeral stem fracture). With linked implants, disassembly of the components represented failure or breakage of the axle locking mechanism or disassociation of the components. The reported rate was 6% (44 of 750) for sloppy hinge and 1% (1 of 166) with fixed hinge implants. In one paper, four of 23 Pritchard II sloppy hinge implants (Johnson & Johnson) had disassembled, with the axle of the locking mechanism seen to be backing out in a further six; these latter cases have not been included as failures in this analysis as they had not required intervention and continued to function.9
Permanent lesions of the ulnar nerve were seen in 5% (120 of 2416; median 3%; range 0% to 27%). Pre-operative symptoms or signs associated with the ulnar nerve were reported in only 13 of 86 papers, with rates from 0% to 19% recorded; 49 of 86 papers documented whether transient ulnar nerve lesions occurred, with rates from 0% to 37% being reported (median 2%).
This paper is a comprehensive review of the English-language literature concerning TEA. Assessing the methodological quality of the observational studies is an important step in a review of this kind.
A high proportion of the series reported (44%) originate from the institutions of the designers of the implant. The experience of these individuals, while valuable, is not necessarily representative of the experience of orthopaedic surgeons as a whole. However, as elbow arthroplasty is a relatively specialised procedure with few individuals carrying out the operation frequently, this is not as important as when considering knee or hip arthroplasty.
In 45% (62 of 137) of the papers reviewed, the same cohorts of patients had possibly been presented on several occasions; 32 papers were rejected on these grounds alone. Under some circumstances the later paper may update the series either to reinforce the longevity of good early results4,91 or, more importantly, to change the advice when deterioration is seen with time.6–9,94–98 In these instances, the authors generally highlighted the previous publications early in subsequent papers, leaving the reader in no doubt as to the results. However, it was not always obvious that the results for some patients had been previously described and this problem is well recognised.99 Multiple presentation of patients could skew the overall findings of a review and so we excluded all but the largest or most informative paper where publications appear to have included the same patients in more than one series.
The outcomes were frequently presented using non-validated scoring systems; this is not surprising in a field where validation of such scores has only recently been published.100 In five papers, the authors did not use an outcome score, either presenting the results for range of movement and pain independently10–12 or not considering clinical function.13,15
The reporting of outcome rarely involved the use of a recognised form of survival analysis such as the Kaplan-Meier technique, and so the survival data are less reliable. The importance of using reproducible reporting methods when presenting survival data has been highlighted.101 The revision rates reported, both by our re-calculations and by the median values quoted, were similar for all three classes of implant (Table I). The last paper reporting the outcome of a fixed-hinge device was from 1984.14 The indications for treatment of symptomatic replaced elbows may well have changed, so the survival figure for this class of implant may be artificially good. The revision rates for sloppy hinge and unlinked devices appear to be similar, with loosening being the most common mode of failure for both.
In terms of function, the results for both the sloppy hinge and unlinked implants appeared to be better than for the fixed hinges. The scoring systems used were, however, too varied to draw a firm conclusion, although the results for these two classes of implant are similar. The range of movement achieved with a sloppy hinge appeared to be better than that with unlinked implants, perhaps because the bone can be shortened to improve movement without risking instability of the prosthesis. This may explain the greater improvement in fixed flexion deformity seen with the Coonrad-Morrey elbow in comparison to the Capitello-condylar, Kudo and Souter-Strathclyde devices. Given the difference in the arc of movement, it may well be that the difference in functional outcome, where sloppy hinges appeared marginally better, does represent a true difference. However, the scoring systems used in the different papers may have a bearing on the functional scores achieved. Of the papers published since 1992, the Mayo elbow performance score was used in 68% of papers reviewing sloppy hinge implants and 30% of papers reviewing unlinked implants. This system allocated 10% of its points to the assessment of stability, which will be restored by a linked component in the absence of marked loosening or disassembly of the components. In reviewing outcome in unstable elbows, as is often the case in RA, implants of the linked type would be more likely to score highly for stability than the unlinked. This may have contributed to the apparent differences in functional outcome. It is important to remember that a stable joint is required for good elbow function.
One of the reasons given to support the use of an unlinked over a linked device is that by relying on soft-tissue balance to maintain congruency of the components, the risk of loosening is lower. However, the literature reviewed here suggests that sloppy hinge devices may have lower rates of loosening than unlinked devices (Tables I and II). The most common mode of failure for unlinked devices was aseptic loosening. However, given the lack of a universally accepted definition of radiographic loosening this should be interpreted with caution. Another factor which may have contributed to the apparent differences in survival of the different implants is that for some implants, such as the Kudo, the papers describe the results of many different design prototypes while others, particularly the GSB III, give the outcome of only one version of the design. There are few published series of fixed hinge devices with high rates of loosening, yet some abstracts excluded from this review document the high rate of failure by loosening of implants of this type.102,103 While the reported rates of loosening for linked devices are lower than those for unlinked implants, the former have much higher rates of progressive radiolucencies which may herald loosening in the future. While the evidence from Gschwend, with a loosening rate of 5% and a mean follow-up of 162 months, suggests that this may not be the case, there is clearly a need for more long-term data on the sloppy hinge devices, especially from independent groups.4
Elbow arthroplasty has an anecdotal reputation as carrying a high rate of infection. Kraay et al15 noted that their incidence of infection fell after the routine use of antibiotic-impregnated cement. After analysing the rate of deep infection in papers published in five-year blocks, we found that in publications since 1989 the rate has fallen to a steady state of around 4%, down from rates of 7.7% (1984 to 1988), 5.4% (1979 to 1983) and 6.9% (prior to 1979). This may reflect more widespread use of antibiotic-impregnated cement. However, the number of implants reported in each time period also differs, with lower numbers reported in the first three blocks (144, 238 and 336) than in the three since 1989 (647, 606 and 969).
No association was found between the duration of postoperative immobilisation and the rate of wound problems. Although information concerning breakdown of the triceps after operation is not well reported, release of the extensor mechanism from the ulnar periosteum appears to carry a higher risk of breakdown than do exposures which use a triceps turn-down or keep the extensor mechanism and the ulnar periosteum in continuity.
Information concerning ulnar nerve palsy is hard to review, given that it is not possible to determine how thoroughly this assessment was made. The reported rate of permanent palsy of 5%, while high, is not surprising given that the main indication for arthroplasty is rheumatoid arthritis. Pre-operative symptoms and signs were only recorded in 15% of the papers.
In this review the inclusion of only English-language publications may have increased the chance of “post-publication bias” in overestimating the treatment effect by including smaller studies which reached positive conclusions, while rejecting those which reached negative conclusions, since these series, if published, are more likely to feature in non-English language publications.104 However, the papers reviewed were case series, observational studies without control groups, and therefore the effect size, which is a measure of the difference in outcomes between the intervention and control groups, was not relevant.
The definitions of infection and loosening are arbitrary, but standardised. By using these standard criteria, data from many papers had to be excluded and this may well have biased the results. However, this was essential to ensure that comparisons could be made between the different publications and implants, and in particular to allow for the pooling of results. Similarly, the definition of a wound problem was arbitrary, but standardised; there was considerable variation in reporting practice among the papers.
When considering the functional outcomes, we have pooled the results from many different scoring systems, few of which have been validated. In many cases, the scoring system was unique to the publication, having been invented by the authors. All the systems used a combination of patient and clinician-based assessments, addressing pain, range of movement and functional ability amongst other things. As they had at least some common ground, we felt that it was appropriate to pool the results in terms of the proportion of patients returning a score rated “excellent” or “good”, albeit by arbitrary groupings. The assessment of functional outcome represents a small part of this review, and is among the least reliable of the fields considered.
By including the Gschwend paper,4 we broke one of our exclusion criteria. This step was taken only after prolonged discussion and was justified by a desire to include data on as many implants in contemporary use as possible.
We have reviewed the English-language literature of TEA in a systematic fashion. We have found that the reports of sloppy hinge implants suggest that they restore a better arc of movement, may return a higher proportion of good and excellent clinical results and may have a lower rate of radiographic loosening. The rates of revision of sloppy hinge and unlinked devices were comparable at a mean follow-up of five years. The papers reviewed tended not to use validated outcome measures or perform formal survivorship analyses and there were differences in the definitions used for several key data fields, notably loosening and infection. This made comparison difficult and makes the pooling of results unreliable. To date, there are very few comparative studies available, and none powered to detect differences in implant performance. This is not surprising given the nature of surgical implant research, but there is a clear demand for such series to throw more light on the performance of implants in selected patient groups in order to clarify the relative roles of the available implants.
© 2005 British Editorial Society of Bone and Joint Surgery