Aims The PROximal Fracture of the Humerus Evaluation by Randomisation (PROFHER) randomised clinical trial compared the operative and non-operative treatment of adults with a displaced fracture of the proximal humerus involving the surgical neck. The aim of this study was to determine the long-term treatment effects beyond the two-year follow-up.
Patients and Methods Of the original 250 trial participants, 176 consented to extended follow-up and were sent postal questionnaires at three, four and five years after recruitment to the trial. The Oxford Shoulder Score (OSS; the primary outcome), EuroQol 5D-3L (EQ-5D-3L), and any recent shoulder operations and fracture data were collected. Statistical and economic analyses, consistent with those of the main trial, were applied.
Results OSS data were available for 164, 155 and 149 participants at three, four and five years, respectively. There were no statistically or clinically significant differences between operative and non-operative treatment at each follow-up point. No participant had secondary shoulder surgery for a new complication. Analyses of EQ-5D-3L data showed no significant between-group differences in quality of life over time.
Conclusion These results confirm that the main findings of the PROFHER trial over two years are unchanged at five years.
Cite this article: Bone Joint J 2017;99-B:383–92.
- Proximal humeral fractures
- Randomised controlled trial
- Operative versus non-operative treatment
- Long-term follow-up
We report the five-year follow-up of the PROximal Fracture of the Humerus Evaluation by Randomisation (PROFHER) trial (trial registration identifier: ISRCTN50850043).
PROFHER was a pragmatic, multi-centre randomised controlled trial (RCT), funded by the United Kingdom National Institute for Health Research (NIHR), which compared operative and non-operative treatment of adults with a displaced fracture of the proximal humerus involving the surgical neck.1
Between September 2008 and April 2011, 250 adults were recruited into the trial. At two-year follow-up, the primary outcome, the Oxford Shoulder Score (OSS),2,3 was available for 215 participants.4 The results showed no significant difference between operative and non-operative treatment in the OSS over two years (p = 0.479) or in other patient-reported clinical outcomes in the two years following fracture;4,5 and the cost of surgery was considerably greater.6
The initial choice of a two-year follow-up for PROFHER was a pragmatic one which balanced feasibility and the expectation that any differences in the OSS between the two treatment groups at two years would represent a true and enduring effect. However, there is insufficient evidence from other RCTs to confirm this assumption.7 Recovery from serious injuries such as a fracture of the proximal humerus is a long and often incomplete process that can be hindered by complications. A substantial proportion (15/74, 20%) of participants in a trial with less severe (‘minimally displaced two-part’) fractures than in PROFHER had continuing ‘severe’ disability after two years, although less than that at one year (30/84, 37%).8
We reasoned that a five-year follow-up would allow for delays in recovery, potential functional deterioration, and subsequent operations resulting from complications, such as avascular necrosis and complications of surgical fixation or humeral head replacement, which could arise or become symptomatic later on. The extension made practical sense as the infrastructure was already in place and the potential availability of a large group of patients presented an opportunity to gain reliable evidence about patient-reported longer term outcome, as well as insight into the feasibility of future research.
We set up the extended follow-up study at the York Trials Unit, securing ethical approval in September 2010 from the institution,5 before the end of recruitment to, and without knowing the results of, the main trial.
Our primary aim was to obtain three, four and five-year data on the key outcomes (OSS, EuroQol 5D-3L (EQ-5D-3L),9 and subsequent surgery) to determine whether the effect of treatment detected at two-year follow-up had persisted or changed. A further aim, linked to the collection of EQ-5D-3L data and information about any further surgery, was to examine the potential effect on our economic analysis6 of any change in health related quality of life (HRQoL) and the costs related to this.
Our secondary aims were to generate longer term condition-specific data on shoulder function that would provide reference data for the interpretation of the findings of PROFHER and future studies of proximal humeral fractures and to inform future research in this area on the appropriate duration of follow-up.
Patients and Methods
The methodology of the main trial is reported elsewhere.1,5 The inclusion and exclusion criteria are shown in Table I.10 The final version of the extended study protocol (version 3.0; 09 August 2012) is published on the NIHR website.11 All related amendments were reviewed and approved by the Leeds West Research Ethics Committee (08/H1311/12).
Postal questionnaires were sent at three, four and five years after recruitment to the trial to the 176 participants who had completed and returned a consent form sent on receipt of their 24-month questionnaire. A pre-notification letter was sent before each questionnaire and, when necessary, reminders were sent after two and four weeks, with the option to complete questionnaires by telephone after six weeks. To maximise collection of data at the three time points, patients were asked to complete a short questionnaire restricted to the OSS, EQ-5D-3L, recent operations on their shoulder, and recent fractures. Patients were also sent an unconditional £5.00 incentive payment with each questionnaire. To avoid distressing bereaved families or friends, we collected data on patient mortality from NHS Digital at regular intervals before sending the questionnaires, using the NHS Summary Care Records available electronically to authorised staff.
The primary outcome measure was the OSS, which assesses pain, function and activities of daily living.2,3 It contains 12 items, each with five categories of response, and a range of total scores from 0 (worst outcome) to 48 (best outcome).3 Secondary outcomes were the EQ-5D-3L, used to estimate utilities (HRQoL weights),9 further shoulder surgery and further fractures. While mortality was a secondary outcome in the main follow-up, it was reported solely as a reason for loss to follow-up in the extended follow-up: mortality and definitive treatment of the fracture after two years could not reasonably be expected to be linked and would not anyway be listed as a cause of death. Overall, the OSS and EQ-5D-3L were collected at six, 12, 24, 36, 48 and 60 months; EQ-5D-3L data were also collected at baseline and three months. Secondary shoulder surgery and further fractures were collected from hospital forms at one and two years and from patient questionnaires at three, four and five years’ follow-up.
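The scoring rule described above can be illustrated with a minimal sketch. It assumes each of the 12 items is coded from 0 (worst) to 4 (best), which is consistent with the stated total range of 0 to 48; the function name is hypothetical, and the handling of missing items is not described in the text and is not covered here.

```python
from typing import Sequence

N_ITEMS = 12         # the OSS contains 12 items
MAX_ITEM_SCORE = 4   # assumption: five response categories coded 0 (worst) to 4 (best)


def oss_total(item_scores: Sequence[int]) -> int:
    """Sum the 12 item scores into a total from 0 (worst) to 48 (best)."""
    if len(item_scores) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores, got {len(item_scores)}")
    if any(not 0 <= score <= MAX_ITEM_SCORE for score in item_scores):
        raise ValueError("each item score must lie between 0 and 4")
    return sum(item_scores)
```

For example, a respondent choosing the best category for every item would score 48, and one choosing the worst category throughout would score 0.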
The main study was designed to detect a standardised effect size of 0.4 (approximating to five OSS points) with 80% power at the 5% significance level, and needed approximately 200 participants at two years.1 We assumed a 20% attrition rate at five years and based our proposal on a final sample size of 160, which would provide 71% power to detect a standardised effect size of 0.4 at the 5% significance level. Given the reduced statistical power for the extended follow-up, significance testing was limited to the primary outcome alone.
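The quoted power figures can be approximately reproduced with a simple two-sample calculation. The sketch below is illustrative only: it assumes equal allocation between the two groups and a normal approximation to the two-sided test, and the function name is hypothetical. The method actually used for the trial's sample size calculation is not stated here.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(effect_size: float, n_per_group: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample comparison of means.

    Uses the normal approximation and ignores the negligible probability
    of rejecting in the wrong direction.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    noncentrality = effect_size * sqrt(n_per_group / 2)
    return z.cdf(noncentrality - z_crit)


# 200 participants in total (100 per group), as targeted at two years
power_main = two_sample_power(0.4, 100)
# 160 participants in total (80 per group), as assumed at five years
power_extended = two_sample_power(0.4, 80)
```

Under these assumptions the calculation gives a power of roughly 0.81 for 200 participants and roughly 0.72 for 160, in line with the 80% and 71% quoted above.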
Statistical and economic analyses
All analyses were performed using Stata version 13.1 (StataCorp, College Station, Texas) and were on an intention-to-treat basis, participants being analysed in the groups to which they were randomised. Significance tests were two-sided at the 5% significance level.
OSS data from the extended follow-up time points were added to the primary analysis model of the PROFHER trial.4 The analysis compared OSS data from the two treatment groups over all follow-up assessments using a multilevel regression model. In order to account for the correlation of outcomes over time from the same patients, time points were nested within patients. The model adjusted for the fixed effects of treatment group; time (six months, one, two, three, four and five years); interaction between treatment group and time; tuberosity involvement at baseline (yes or no); age (< 65 years, ≥ 65 years), and gender and health status at baseline (EQ-5D-3L). The unstructured covariance pattern was retained from the primary analysis model. Patients with valid OSS data at one or more follow-up points for the standard or extended follow-up as well as complete covariate data were included in the analysis. Estimates of the difference in OSS between treatment groups, 95% confidence intervals (CI) and p-values were obtained for the extended follow-up at three, four and five years.
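One way to write down the model described above is sketched below. The notation is ours rather than the trial's: y_{ij} is the OSS of patient i at follow-up point j (j = 1, ..., 6 for six months and one to five years), T_i indicates allocation to surgery, and the first time point is assumed to be the reference category so that \gamma_1 = \delta_1 = 0.

```latex
\begin{aligned}
y_{ij} &= \beta_0 + \beta_T\,T_i + \gamma_j + \delta_j\,T_i
          + \beta_1\,\mathrm{tuberosity}_i + \beta_2\,\mathrm{age}_i
          + \beta_3\,\mathrm{gender}_i + \beta_4\,\mathrm{EQ5D}_{i0}
          + \varepsilon_{ij},\\
\boldsymbol{\varepsilon}_i &= (\varepsilon_{i1},\dots,\varepsilon_{i6})^{\top}
          \sim \mathcal{N}(\mathbf{0},\boldsymbol{\Sigma}),
          \qquad \boldsymbol{\Sigma}\ \text{unstructured}.
\end{aligned}
```

In this parameterisation the adjusted between-group difference at follow-up point j is \beta_T + \delta_j, which is the quantity estimated at three, four and five years.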
In a sensitivity analysis, the multilevel model was repeated substituting missing data with data derived by multiple imputation by chained equations. Missing outcome and covariate data were imputed from age, gender, tuberosity involvement, EQ-5D-3L index at baseline and available OSS data at other follow-up points.
As with the main trial, the possibility of differential long-term treatment responses for older patients (subgroups: < 65 years versus ≥ 65 years) and more complex fractures (subgroups: involvement of no tuberosities/one or both tuberosities) was explored. Expectations of the benefit of surgery over conservative treatment, established before the main trial results were known, were that this was greater in patients < 65 years and in patients with fractures involving one or both tuberosities,1 and that these benefits might only emerge in the longer term.11 Unadjusted mean OSSs by subgroups and treatment arm were therefore explored. Due to the substantially reduced statistical power for the subgroups, no statistical testing was performed.
We calculated the annual and overall frequencies of shoulder surgery and fractures in each treatment group that had occurred within the previous year. Extended follow-up data were combined with those of the main trial to establish the number of participants in each treatment group who had secondary shoulder surgery or a further fracture over five years. Free text providing details of further surgery and non-pre-specified fractures was categorised by two independent observers (HH and AR), who were blinded to the treatment group.
The economic analysis aimed to explore whether the results from the PROFHER trial were sustained over a five-year time period by determining the between-group differences in HRQoL (measured via the EQ-5D-3L) at set times (three, four and five years) and examining how this difference evolved over time. We also planned to estimate costs of any further shoulder surgery and report these descriptively.
The methods used to process the EQ-5D-3L data and calculate quality-adjusted life years (QALYs) were the same as those described in our previous cost-effectiveness report.6 Briefly, the EQ-5D-3L data were transformed into ‘health-related quality of life weights’ (utilities) using the United Kingdom general population tariff, which assigns societal values to each health state.12 QALYs were calculated by combining the utility estimates with the duration of time in each health state using the area under the curve method following the trapezium rule, which assumes linear interpolation between follow-up points.13 A discount rate was applied to QALYs after 12 months, at an annual rate of 3.5%.14
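The area under the curve calculation can be sketched as follows. This is a minimal illustration, not the trial's code: the function name is hypothetical, and the way discounting is applied (each interval discounted by the number of whole years elapsed at its start, so that the first 12 months are undiscounted) is an assumption, since the text gives only the rate and the starting point. It also assumes that no interval between assessments spans a year boundary, which holds for the trial's assessment schedule.

```python
from typing import Sequence

ANNUAL_DISCOUNT_RATE = 0.035  # 3.5% per year, applied after 12 months


def discounted_qalys(
    times_years: Sequence[float],
    utilities: Sequence[float],
    rate: float = ANNUAL_DISCOUNT_RATE,
) -> float:
    """Area under the utility curve by the trapezium rule, with discounting."""
    if len(times_years) != len(utilities) or len(times_years) < 2:
        raise ValueError("need matching times and utilities, at least two of each")
    total = 0.0
    for i in range(len(times_years) - 1):
        t0, t1 = times_years[i], times_years[i + 1]
        u0, u1 = utilities[i], utilities[i + 1]
        if t1 <= t0:
            raise ValueError("times must be strictly increasing")
        # Trapezium rule: linear interpolation between the two assessments
        area = (u0 + u1) / 2 * (t1 - t0)
        # Assumption: discount by the whole years elapsed at the interval start
        total += area / (1 + rate) ** int(t0)
    return total


# Assessment schedule from the text: baseline, 3 and 6 months, then yearly to 5 years
schedule = [0, 0.25, 0.5, 1, 2, 3, 4, 5]
```

A patient in full health (utility 1) at every assessment would accrue one QALY in the first year and 1/1.035 QALYs in the second under this convention.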
In the main trial, the base-case analysis was conducted for the imputed dataset by means of multiple imputation with chained equations, using seemingly unrelated regression analysis.6 This method accounts for the correlation between costs and effects from the same individuals and imputes the missing data. However, other regression-based methods are available for handling missing data in longitudinal studies, principally mixed models, and results may be sensitive to the methods used.15 A multilevel model similar to the primary OSS analysis was therefore conducted to investigate whether the results obtained in the main trial were robust to this alternative method of analysis. Specifically, the mean difference in utilities and QALYs (with 95% CIs) between the two groups was estimated using a multilevel model that adjusted for the fixed effects of treatment group, time (three and six months, one, two, three, four and five years), interaction between treatment group and time, tuberosity involvement at baseline, age, gender and baseline utility.
Uncertainty around the results was explored by means of sensitivity analysis that used multiple imputation by chained equations to replace missing data on QALYs in the multilevel model where missing outcome and covariate data were imputed from age, gender, tuberosity involvement, and baseline utility.
Of 176 patients (81% of the 218 who returned questionnaires at two years; 70% of 250 randomised trial participants) who consented to long-term follow-up at two years after randomisation, valid OSSs were received for 164 (93%) at three years, 155 (88%) at four years, and 149 (85%) at five years’ follow-up (Fig. 1). Retention was therefore slightly lower than anticipated in the extended follow-up. However, additional power was gained by the multilevel analysis. A total of ten patients died during the long-term follow-up, five in each trial arm.
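The retention percentages above follow directly from the reported counts. The short check below recomputes them; the helper name is hypothetical and the figures are those given in the text.

```python
def pct(numerator: int, denominator: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


consented = 176
share_of_two_year_responders = pct(consented, 218)  # of those returning two-year questionnaires
share_of_randomised = pct(consented, 250)           # of all randomised participants

# Valid OSS returns at three, four and five years among those who consented
oss_returns = {3: 164, 4: 155, 5: 149}
oss_response = {year: pct(n, consented) for year, n in oss_returns.items()}
```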
As found at baseline (except for smoking status, which did not affect the OSS results) and two-year follow-up,4 patient characteristics were balanced between groups at five-year follow-up in the 149 patients with complete OSS data (Tables II and III). Furthermore, the characteristics of the RCT population remained representative, as none of the baseline characteristics differed meaningfully between participants at the start of the trial and those remaining at the end.
Primary outcome (OSS)
Unadjusted OSS outcomes for patients with valid data were very similar in both groups for the extended follow-up period (Fig. 2). This featured a trend of small score increases between two and four years, with little difference in the fifth year. OSS scores were skewed towards maximum OSS shoulder function: over half the population had stable and satisfactory shoulder function3 at all three follow-up points: three years (median 42, interquartile range (IQR) 35 to 47.5); four years (median 43, IQR 37 to 48); five years (median 44, IQR 36 to 48).
When adding the long-term OSS follow-up data to the existing multilevel analysis, group differences were not statistically significant at any of the long-term follow-up time points. This was true for the primary analysis model including all patients with available outcome data at any time point as well as the sensitivity analysis including all patients using data derived by multiple imputation (Tables IV and V). None of the estimated mean differences was clinically meaningful; almost all were smaller than one OSS score point in magnitude with no consistent trend for the direction of the treatment effect.
The substantial overlap of the confidence intervals for the unadjusted OSS scores indicates that there were no marked differences between the treatment groups for the subgroups based on age (Fig. 3) or tuberosity involvement (Fig. 4). In both subgroups, the patterns of OSS score differences were not consistent with prior expectations.
Only one patient reported further shoulder surgery during the extended follow-up period. This was a reverse shoulder replacement in year three in a non-operative group patient who had already undergone surgery (arthroscopic capsular release and subacromial decompression) during the main follow-up. Consequently, the number of patients who needed secondary surgery remained at 11 in each treatment group.4
A total of 81 further fractures were reported by 52 patients over the five-year follow-up period. A small number of fractures are likely to be duplicated from one year to the next but as this could not be known definitively, patient data were accepted as submitted, with the exception of one participant who provided the date of their fracture. There were more fractures in the non-operative group (50 fractures, 33 patients) than the operative group (31 fractures, 19 patients), especially of the spine and hip (Table VI).
Inevitably, when compared with the 125 randomised into each treatment group, the extent of missing EQ-5D-3L data increased considerably in the extended follow-up period. For the 176 participants who consented to long-term follow-up, complete EQ-5D-3L scores were available for 159 (90%) at three years, 153 (87%) at four years and 151 (86%) at five years.
Figure 5 shows the distribution of mean utilities (EQ-5D-3L scores) for all the available patients across the five years for the two groups. Patients in the operative group started from a higher mean baseline utility (0.43; -0.36 to 1, operative versus 0.38; -0.35 to 1, non-operative). However, at the end of the second year there was little difference in EQ-5D-3L scores between treatment groups. This finding was consistent at three, four and five years with the 95% CIs overlapping at each assessment point. The same pattern applied for the analysis of utilities when adjusted for baseline utility or for all covariates (Table VII).
Between-group mean differences in QALYs based on individual patients’ utilities are shown in Table VIII. At the end of the five years, patients allocated to the non-operative group generally had a marginally higher QALY gain than patients allocated to the operative group. Hence the QALY gain for non-operative patients is maintained over time whether data are adjusted for baseline utility or for all covariates. The mixed model was repeated substituting missing data with data derived by multiple imputation by chained equations. For both analyses, there were negligible differences in the QALYs between the two groups at the different follow-up times (Table VIII).
The extended follow-up found no statistically or clinically significant differences between operative and non-operative treatment of displaced fractures of the proximal humerus involving the surgical neck at three, four or five years in the OSS, our primary outcome. Nor was there any trend for group differences relating to age or fracture type.
These findings mirror those of the main trial.4 No trial participant had secondary shoulder surgery for a new complication during the extended follow-up period. The between-group differences in utilities, based on EQ-5D-3L data, at three, four or five years were very small: the 95% CIs overlapped at each assessment. The same lack of statistically significant between-group differences applied to the HRQoL analysis that showed the trend for a QALY gain for participants in the non-operative group was maintained over time. Sensitivity analyses indicated minimal differences between the two groups at each follow-up time.
By exceeding the original target of 200 participants at two-year follow-up, PROFHER was sufficiently powered at final follow-up. By contrast, we were 11 short of the 160 participants with OSS data at five years, and therefore did not meet the revised statistical power criteria for the extended follow-up. However, we believe this is unlikely to affect the validity of the results. First, loss to follow-up, including identical mortality (five in each group), was balanced in the two groups. Secondly, baseline characteristics at five years were comparable between groups as well as being representative of the original population. Thirdly, much of the missing data were accounted for in the multilevel analysis, which included 231 patients. Fourthly, the between-group differences were small: the 95% CIs at each follow-up time were less than the minimal clinically important difference of five points. Fifthly, the between-group differences in the EQ-5D-3L were also very small, again reflecting comparability of the groups. Finally, there were no new complications warranting surgery.
Although there were no cost data to replicate the incremental cost-effectiveness analysis conducted for the PROFHER trial, the analyses of the health utility data for the five-year period produced results that are consistent with the main trial analysis:6 in general, patients allocated to surgery reported lower HRQoL. The very small differences in HRQoL between the two groups found for the mixed model and multiple imputation analyses indicate negligible differences in quality of life between the treatment groups. The costs of the only shoulder operation reported for the extended follow-up would not have affected the findings of the main trial.
We consider that it is unsafe to draw any conclusions about the effect of treatment from the observed difference between the two groups in the number of participants incurring further fractures. We suggest that this is primarily a chance effect. In terms of known risk factors for fractures (such as higher age, female gender, previous fracture and smoking), the two groups were at similar risk of further fracture at baseline except for smoking status, where there was a higher proportion of smokers in the non-operative group. This may partly explain a higher number of fractures in that group. Known inaccuracies, relating to both under- and over-reporting, of self-reported fractures16 are of some concern and indeed, based on additional participant commentary, we have confirmed one instance of duplicate reporting over time. We also have no information about whether there was any difference in the advice offered and medication provided for preventing further fractures in the two groups.
Our findings of an absence of treatment differences on the OSS in the extended follow-up underpin the main findings for the two-year follow-up. The only case of further surgery over the extended follow-up involved a patient who had already had surgery for a complication that occurred within the two-year follow-up.5 Given that most (15 of 22) secondary operations occurred in the first year, this finding and the lack of difference in the OSS provide reassurance that late symptomatic complications are rare. The HRQoL results show that the PROFHER economic analysis was applicable over a five-year period. The overall OSS results show that most patients had attained satisfactory shoulder function by two years: this was subsequently sustained. Therefore, the two-year follow-up would have been sufficient for the PROFHER trial, and this finding could inform the length of follow-up for future RCTs on these fractures.
Take home message:
- The results of the extended follow-up underpin the main findings of the PROFHER trial.
- There was no significant difference in patient-reported outcome between operative and non-operative treatment for the majority of adults with proximal humeral fractures involving the surgical neck.
H. Handoll: Advised on and contributed to methods and reporting throughout the trial, Wrote the first and revised drafts of the paper incorporating separate reports from AK and BC.
A. Keding: Trial statistician, Provided advice on methods, Produced and implemented statistical analysis plan, Contributed to the preparation of the paper.
B. Corbacho: Trial health economist, Produced the health economics analysis plan, Conducted the economics analysis, Contributed to the preparation of the paper.
S. Brealey: Trial manager, Advised on the design and coordinated the implementation of the extended follow-up including data collection, Contributed to the writing.
C. Hewitt: Statistician, Independently repeated primary analysis, Commented on paper.
A. Rangan: Chief investigator, Advised on the study design and on the clinical aspects of the analysis, Contributed to the writing.
We are grateful to the patients who generously completed questionnaires for the extended follow-up.
We thank R. Clarkson and L. Kottam (both at James Cook University Hospital, Middlesbrough, United Kingdom) for checking the Summary Care Records of patients for mortality, and staff at various participating sites for their help in locating missing patients.
This project was funded by the National Institute for Health Research (NIHR) Health Technology Assessment Programme (Project number: 06/404/53).
This paper presents independent research commissioned by the NIHR. The views and opinions expressed therein are those of the authors and do not necessarily reflect those of the UK National Institute for Health Research, the UK National Health Service, or the UK Department of Health.
The sponsor (Teesside University) managed the grant application process and monitored the study but it had no direct role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.
Although none of the authors has received or will receive benefits for personal or professional use from a commercial party related directly or indirectly to the subject of this article, benefits have been or will be received but will be directed solely to a research fund, foundation, educational institution, or other non-profit organization with which one or more of the authors are associated.
This is an open-access article distributed under the terms of the Creative Commons Attributions licence (CC-BY-NC), which permits unrestricted use, distribution, and reproduction in any medium, but not for commercial gain, provided the original author and source are credited.
This article was primary edited by G. Scott and first proof edited by A. C. Ross.
- Received October 15, 2016.
- Accepted November 28, 2016.
- ©2017 Handoll et al