The Eurocleft Cohort Study began in the late 1980s as an intercenter comparison of the records of 9-year-old children with complete unilateral cleft lip and palate. It sought to overcome, at least in part, some of the limitations and potential biases associated with the comparison of outcomes described in single-center reports. A full account of the methodology and findings has been presented elsewhere [1–7]. Five of the original six teams agreed to continue follow-up of the cohort until age 17. The stated surgical protocols of the respective centers are shown in Table 41.1.
W.C. Shaw, G. Semb
41
Table 41.1. Surgical treatment protocol of the five participating centers
A B D E F
Birth Presurgical Presurgical Presurgical
orthopedics (Hotz) orthopedics orthopedics
(Extra-oral (T-traction) strapping)
3 months Lip and hard Lip and hard
palate closure palate closure
(Tennison + (Millard +
vomerplasty) vomerplasty)
5 months Lip closure
(Modified Skoog/Tennison- Randall) and bone grafting
9 months
18 months Soft palate closure Hard and soft
(Modified von palate closure Langenbeck) (Veau–Wardill–
Kilner pushback)
24 months Soft palate closure
(Wardill pushback)
8–11 years Bone grafting Bone grafting Bone grafting Bone grafting and hard palate
closure Lip closure (Millard, Skoog)
Soft palate closure (von Langenbeck, Perko, Wardill, Kriens)
Hard and soft palate closure (Varied methods and timing) Lip (Varied methods and timing)
41.1.1 Outcomes at Age 9
At age 9, several differences between the centers were apparent. In the main, these reflected dissimilarities of centers D and F with the remainder. Patients from center D were characterized by a flattening of the nose, reduced nasal prominence, a short upper lip, and a concave profile. Center F patients were also characterized by a retruded upper lip and a relative reduction in upper face height. In center D’s patients, the skeletal profile was also retrusive compared with center B (Fig. 41.1). Major differences were evident for dental arch relationship (Fig. 41.2). Whereas only 7% of cases in center E were considered to have a likely future need for osteotomy, almost half (48%) of those in center D were.
It was not possible to ascribe success or failure to particular details of the surgical protocols, but poor outcomes appeared to be related to decentralized services without consistent protocols.
41.1.2 Follow-up
The aims of the follow-up were: to quantify the burden of care imposed by the respective protocols; to see whether the ranking of centers for different outcomes at age 9 was predictive for equivalent outcomes at age 17; to assess patient/parent satisfaction with care; and to explore interrelationships with outcome and burden [8–11]. A separate comparison of speech outcomes was carried out at age 11–14 [12].
41.1.3 Survey of Treatment Experience
The amount of treatment provided by the five different teams in 1976–79 differed remarkably (Table 41.2). Most notable was the lengthy hospital stay associated with presurgical orthopedics at that time in centers D and F. The subjects in center D also had more orthodontic visits for treatment and review, and more surgeries overall, than those in the other centers. From discussion with these centers it would seem that the large differences in the intensity of treatment stemmed not primarily from clinical need, but rather from differing beliefs and historical practices that had shaped the clinical protocols of the period.
41.1.4 Consistency of Outcomes Over Time
The statistical analysis used to compare the five centers was a general linear mixed model applied to longitudinal data [13]. The model included variance terms to account for between-subject variation in the intercept, as well as fixed factors for assessment point (9, 12, and 17 years) and center. Full details are reported elsewhere [14].
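The model structure just described can be written out as a random-intercept model. This is a sketch only: the symbols are introduced here for illustration, the normality assumptions are the conventional ones for a linear mixed model, and any interaction terms or covariance structure used in the original analysis [13, 14] are not shown.

```latex
y_{ij} = \mu + \alpha_{c(i)} + \tau_{j} + u_{i} + \varepsilon_{ij},
\qquad
u_{i} \sim N(0, \sigma_{u}^{2}),
\quad
\varepsilon_{ij} \sim N(0, \sigma^{2})
```

Here y_{ij} is the outcome score of subject i at assessment point j (9, 12, or 17 years), \alpha_{c(i)} is the fixed effect of the center c(i) that treated subject i, \tau_{j} is the fixed effect of the assessment point, and u_{i} is the subject-specific random intercept that absorbs between-subject variation.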
As Figure 41.3 indicates, the scores for dental arch relationship tended to improve in centers A, B, and E, but not in D and F. There was a consistent relationship over time for most cephalometric variables, e.g., soft tissue profile (Fig. 41.4), and for nasolabial appearance.
Fig. 41.1. Superimposition of mean plots for skeletal structures at age 9
Fig. 41.2. Goslon individual patient scores at age 9 by center. A Goslon score of 1 represents excellent maxillary prominence, and a score of 5 severe maxillary retrusion. One way to consider this outcome variable is the likely future need for subsequent maxillary osteotomy; cases falling below 3.5 at this age are likely candidates for osteotomy in the late teens
41.1.5 Lack of Association Between Outcome and the Amount of Treatment
Not surprisingly, follow-up of these five cohorts of patients from age 9 to age 17 confirmed the main finding of the first report, with some centers continuing to achieve considerably better outcomes than others at all age points. Perhaps more surprising is the lack of association between amount of treatment and final outcome (Tables 41.3–41.5). Especially ironic is the finding that the two centers with the highest intensity of early treatment (hospitalization in order to perform presurgical orthopedics) achieved the lowest rankings for eventual outcome (Table 41.3). Patients in the center with the least favorable outcomes (center D) also experienced the longest orthodontic treatment duration and the highest number of orthodontic visits. It appears that this was partly due to the complexity of center D’s orthodontic treatment protocols, with almost continuous treatment from the eruption of the primary dentition, and partly to the unfavorable dentofacial outcomes of primary surgery. (It is now almost 30 years since treatment began for these cohorts of patients, and hospitalization for orthopedics has long been discontinued.)
This lack of association between treatment outcome and intensity may represent a key lesson for the development of future protocols. It justifies an emphasis on simplicity, economy, and minimized burden for the patient, rather than adherence to demanding protocols with unsubstantiated promise.
Table 41.2. Amount of surgery, presurgical orthopedics, and orthodontic treatment at the five centers
                                  A      B      D      E      F
SURGERY
  Mean number of surgeries        4.8    3.3    6.0    4.4    3.5
  Mean days in hospital           33     31     60     24     26
EARLY ORTHOPEDICS
  Months of treatment             13     0      15     0      5
  Number of visits                11     0      8      0      17
  Days in hospital                0      0      60     0      146
ORTHODONTIC TREATMENT
  Treatment length (years)        5.6    3.3    8.5    3.5    4.0
  Number of visits: treatment     52     41     54     33     47
  Number of visits: follow-up     11     23     42     16     25
  Number of visits: total         63     64     94     49     72
Fig. 41.3. Mean dental arch relationship scores at ages 9, 12, and 17 years for participating centers
Fig. 41.4. Mean soft tissue profile (angle SSS-NS-SMS) at ages 9, 12, and 17 years for participating centers
41.1.6 Lack of Association Between Outcome and Satisfaction
Perhaps the most perplexing finding of the series is the inconsistency between objectively rated outcomes and patient/parent satisfaction. We observed instances where the highest levels of dissatisfaction with treatment outcome were reported by subjects attending the centers with the best objective ratings (Table 41.6). The possible reasons for this disparity have been discussed elsewhere [9] and highlight the need for concerted work on the understanding and measurement of patient satisfaction and the provision of more holistic models of cleft care.
41.1.7 The Benefits of Intercenter Comparison
It goes without saying that professionals entrusted with the provision of health care have an obligation to review the success of their practices and, where shortcomings are revealed, to take remedial action. Such efforts should constitute a continuous cycle, sometimes known as clinical audit. This has been defined as “the systematic critical analysis of the quality of care including procedures for diagnosis and treatment, the use of resources and the resulting outcome and quality of life for the patient” [15]. Often, efforts in clinical audit are divided into evaluating the process of care (the way in which it is delivered) and the outcomes of care (what is achieved). Cycles of outcome audit are more easily established when the intervention is common and the consequences are clear-cut and quickly observable. Cleft audit therefore involves a considerable challenge because of the lengthy follow-up required, the complexity, subtlety, and number of relevant outcomes, and above all, the relatively low number of cases.
However, intercenter collaboration still offers significant advantages, by providing insights into the processes and outcomes of treatment of comparable services elsewhere, the establishment of future goals, and the exchange of evidently successful practices.
Table 41.4. Relationship between outcome assessment (dental arch relationship using the 17-year-old yardstick) and amount of orthodontic treatment in the different centers
Objective ranking   Center   Treatment length (years)   No. of visits (treatment)   No. of visits (check-up)
Best                E        3.5                        33                          16
                    A        5.6                        52                          11
                    B        3.3                        41                          23
                    F        4.0                        47                          25
Worst               D        8.5                        54                          42
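As a rough illustration of how the relationship in Table 41.4 might be summarized, the sketch below computes a Spearman rank correlation between the objective ranking (1 = best) and orthodontic treatment length. This is not the authors' analysis: the helper names are hypothetical, and with only five centers the coefficient is descriptive at best.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Rows of Table 41.4, ordered from best to worst objective ranking.
centers = ["E", "A", "B", "F", "D"]
outcome_rank = [1, 2, 3, 4, 5]                 # 1 = best, 5 = worst
treatment_years = [3.5, 5.6, 3.3, 4.0, 8.5]    # orthodontic treatment length

rho = spearman_rho(outcome_rank, treatment_years)
print(f"Spearman rho (ranking vs. treatment length): {rho:.2f}")
```

A positive coefficient here means that longer treatment accompanied a worse ranking, not a better one. The same function could be applied to the other columns of Tables 41.3–41.5, where the tie handling matters because several centers share a value of zero.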
Table 41.5. Relationship between outcome assessment (dental arch relationship using the 17-year-old yardstick) and the mean number of surgeries per patient in the different centers
Objective ranking   Center   No. of surgeries
Best                E        4.4
                    A        4.8
                    B        3.8
                    F        3.5
Worst               D        6.0
Table 41.3. Relationship between outcome assessment (dental arch relationship using the 17-year-old yardstick) and amount of infant orthopedic treatment in the different centers
Objective ranking   Center   Months of treatment   No. of visits   Days in hospital
Best                E        0                     0               0
                    A        13                    11              0
                    B        0                     0               0
                    F        5                     17              146
Worst               D        15                    8               60
The participation of two UK centers in the 9-year-old comparisons in this series was an important stimulus for the UK’s national assessment and reorganization of cleft services that subsequently took place [16].
Perhaps the greatest benefit of intercenter comparisons is the co-operative spirit they foster and the gradual diminution of rivalry that occurs. It is no coincidence that the majority of the participants of this cohort study and the related Scandcleft study [17, 18] are now participants in the same multicenter randomized trials. Working together also allows the sharing of past successes and failures and specifically facilitates fruitful joint working, such as the development of rating scales and the formulation of new research questions.
41.1.8 Audit vs. Research
A fundamental limitation of intercenter comparisons is that they cannot distinguish between the influence of different individual elements of a center’s protocol on its outcomes, or between its protocols and the influence of the personnel who deliver them. In the present series, both centers with the poorer outcomes employed presurgical orthopedics (of different kinds), but their disappointing results might have been due to many other factors, such as the role of low-volume surgeons in center D and the use of early bone grafting in center F, rather than to an adverse effect of early orthopedics. What the comparison does tell these centers is that better results are being achieved elsewhere without orthopedics and the additional burden and cost that it imposes. (In any event, both centers changed in several ways: center D discontinued early orthopedics, adopted the protocols of center E, and changed the way care was delivered from low- to high-volume specialists, and center F discontinued early bone grafting as well as early orthopedics.)
While a series of intercenter comparisons with large numbers of cases would eventually allow one or another center to emerge as the highest achiever for particular outcomes, this would be of limited value to the clinical community as a whole. Only protocols can be transferred, not clinicians. Inevitably, the definition of good or bad protocols, or good or bad elements of protocols, requires the explicit arrangements of a randomized trial. And this is the main distinction between audit and research: clinical research (principally clinical trials) determines optimal procedures that can be transferred to an infinite number of centers; clinical audit determines whether they have been transferred successfully.
41.1.9 Alternatives to Intercenter Comparisons for Routine Clinical Audit
Despite the advantages of collaborative work listed above, there are two important limitations to the use of intercenter comparisons as the routine method of clinical audit. Firstly, multiple between-group comparisons increase the sample of cases required, and secondly, the logistic challenges and costs may be an issue. For most teams, reassurance that they are achieving outcomes in the mainstream of competent practice may be sufficient, rather than knowledge of their relative ranking against a selection of other centers. Several possibilities may be considered.
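The first limitation can be made concrete with a little arithmetic. The sketch below counts the pairwise comparisons among a given number of centers and, assuming a Bonferroni correction (an assumption made here for illustration, not a method named in the text), shows how the per-comparison significance level shrinks, which in turn raises the number of cases needed to detect a given difference. The function names are hypothetical.

```python
from math import comb


def pairwise_comparisons(n_centers: int) -> int:
    """Number of distinct center-versus-center comparisons."""
    return comb(n_centers, 2)


def bonferroni_alpha(n_centers: int, family_alpha: float = 0.05) -> float:
    """Per-comparison significance level under a Bonferroni correction."""
    return family_alpha / pairwise_comparisons(n_centers)


# A single pair of centers, the five centers of this follow-up, and a larger group.
for k in (2, 5, 10):
    print(f"{k} centers: {pairwise_comparisons(k)} comparisons, "
          f"per-comparison alpha = {bonferroni_alpha(k):.4f}")
```

With five centers there are ten pairwise comparisons, so each one would be tested at a level ten times stricter than a single two-center comparison.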
41.1.9.1 Good Practice Archive
One strategy would be to assemble an archive of relevant clinical records that are considered to be representative of good practice, perhaps drawn from the consecutive cases of respected centers. Provided other centers collect equivalent consecutive records, matching of cases on relevant characteristics would enable comparisons to be made. This could be performed by the center in question ‘behind closed doors’ or, if preferred, with more transparency. For example, the center’s cases could be mixed with the archive cases and independently rated, or rated by a blinded panel involving the center’s own personnel. With appropriate technology, such exercises could be managed via the Internet to maximize efficient use of time. However, this must be set against loss of the benefit of face-to-face professional interaction and discussion. (See Sect. 41.3 EUROCRAN below.)

Table 41.6. Relationship between objective ranking of nasolabial outcome and patient dissatisfaction
Nasal appearance
Objective ranking   Center   Percentage of respondents dissatisfied
Best                A        64
                    E        32
                    B        14
                    D        45
Worst               F        33
Lip appearance
Objective ranking   Center   Percentage of respondents dissatisfied
Best                B        14
                    A        41
                    F        6
                    E        42
Worst               D        16
41.1.9.2 Registries
An alternative or complementary approach is the use of a registry. Prospective entry of newborn patients would have the particular advantage of establishing a list of consecutive cases that could be used to affirm that follow-up and exclusion bias are not confounders of later comparisons.
41.1.9.3 Benchmarks
The simplest system would be for teams to obtain a minimal set of objective measurements on their cases that are subject to negligible measurement error, e.g., incisal overjet [19]. These could then be compared with tables or graphs summarizing ‘good practice’ outcomes. One way of doing this would be to convert values for the reference data to a normal distribution curve. Individual centers could then plot their mean value for a particular characteristic on the curve to determine the extent of any difference from the norm and its statistical significance [20] (Fig. 41.5).
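One possible reading of this benchmark check is a simple z-test of a center's mean against a reference distribution. The sketch below assumes that the reference data are summarized by a mean and standard deviation and that the center's cases are consecutive and independent. The function name and all numbers are hypothetical placeholders, not values from [20].

```python
from math import sqrt
from statistics import NormalDist


def compare_to_reference(center_mean, n_cases, ref_mean, ref_sd):
    """Return (z, two-sided p) for a center's mean against the reference norm.

    The standard error treats the center's n_cases as a random sample from
    the reference distribution (the null hypothesis).
    """
    standard_error = ref_sd / sqrt(n_cases)
    z = (center_mean - ref_mean) / standard_error
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical overjet values in millimeters, for illustration only.
z, p = compare_to_reference(center_mean=1.0, n_cases=36, ref_mean=2.0, ref_sd=3.0)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

The z value locates the center's mean on the reference curve in units of standard error, and the p value indicates how unusual a difference of that size would be if the center were performing at the norm.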
41.1.10 International Agreement on Record Collection and Comparison Methodology
All of the above would be facilitated by a broad consensus approach. At a recent meeting held under the auspices of the World Health Organization [21], a global consensus on recommendations for record keeping was agreed (www.who.int/ncd/hgn/publications.htm). This defines minimum record keeping across a range of cleft types and treatment episodes for centers that might wish to participate in future international comparisons.
Fig. 41.5. A simple two-step system to check outcome for a series of consecutive patients with UCLP based on the mean value for overjet on the noncleft central incisor. No special equipment or statistical skills are required. Any appropriate distribution may be selected as the reference base. From [20] with kind permission
In the meantime, urgent work is needed by cleft researchers to establish norms for outcomes over the range of cleft care and to undertake collaborative work to refine comparison methodology. Further work on the long-term reliability of early outcome assessment is also a high priority if the elimination of unsuccessful protocols is to take place more rapidly.
Longitudinal archives from cleft clinics around the world could make a significant contribution to this work by defining which early measurements are most likely to be predictive over time.