Abstract
Psychiatrists and psychologists acting as expert witnesses in court cases are often accused of bias or error. We examined the level of agreement and factors influencing agreement between expert reports admitted into evidence during adversarial civil proceedings. The inter-rater reliability of the psychiatric diagnosis was examined in 51 pairs of civil medicolegal reports written by experts engaged by the same side and 97 pairs of experts engaged by opposite sides. Reports written by experts engaged by the same adversarial side had good agreement about the presence of a mental disorder (κ = .74) but had only fair agreement about the specific psychiatric diagnosis (average κ = .31). Reports written by experts engaged by opposing adversarial sides had poor agreement about the presence of any mental disorder and also the specific psychiatric diagnosis. Experts were more likely to agree about the presence of a mental disorder if the plaintiff was involved in a fatal accident. The agreement of treating doctors and experts was similar to that of pairs of experts.
The role of advocates in adversarial proceedings is to represent their clients to the best of their ability. Advocates sometimes engage the services of expert witnesses to provide evidence in areas that are beyond the usual knowledge of the court. Unlike the advocate, the principal duty of an expert witness is to provide an accurate and unbiased opinion. The expert witness is required to resist pressure arising from efforts by the competing advocates to obtain the best possible outcome for their clients.
The possible causes of bias in the opinions of experts have been widely discussed.1–3 Advocates naturally select experts whose previous opinions are known to support a client's case. Other possible sources include the understandable wish to please the hiring party, the financial inducement of the prospect of further work, and the nature of the instructions and the selection of documents given to the expert witness by the lawyer. The conclusions of medicolegal assessments may also be influenced by the interaction between expert and plaintiff. Obvious examples are the desire of treating doctors not to damage a therapeutic relationship,4 the sympathy evoked by a bereaved plaintiff, and the countertransference evoked in the expert during the assessment.5 Consideration of the consequences of the expert's opinion for the patient and the defendant may also have an influence on the conclusions.
Even without the potential sources of bias, the formulation of psychiatric opinion regarding the effect of trauma is a complex task that includes some subjective assessment. Most psychiatric disorders have no biological markers or objective signs, and the expert must assess whether symptoms are present and are severe enough to meet the accepted criteria to make a diagnosis. In addition to making a diagnosis, experts are often asked to comment on the likely cause of symptoms, the patient's level of disability, and the prognosis. Despite longstanding concerns about the reliability of medicolegal assessments after trauma, the reliability of assessments of disorders such as post-traumatic stress disorder (PTSD) has received little scientific study. The Diagnostic and Statistical Manual of Mental Disorders (DSM-III) field trials reported only a very modest level of agreement between independent assessors in the diagnosis of minor mood disorders and anxiety disorders, did not report the reliability of PTSD, and may not even have included any patients with this diagnosis.6 PTSD was not one of the conditions studied in the ICD-10 field trials,7 and no attempt was made to establish whether PTSD could be distinguished from other psychiatric conditions in the DSM-IV trials.8
There are many studies that demonstrate the reliability of structured and semistructured interviews used in diagnosing depressive and anxiety disorders, including PTSD. There are also studies showing adequate agreement between clinical diagnoses and structured interviews in some specialized settings.9–12 However, the suitability of semistructured and structured interviews in forensic settings has been challenged, and the results of structured interviews rarely form the basis of experts’ opinions.13,14
When the existing uncertainties about the reliability of clinical diagnosis are considered in a legal context in which the patient, the legal representatives, and even the expert witness may have a stake in a particular outcome, it is not surprising that doubts have been expressed about the reliability of psychiatric diagnoses generated in assessments for the courts.15 In this retrospective study of a complete series of concluded claims for psychological injury after motor vehicle accidents, we examined the extent of agreement between experts and factors that may influence agreement.
For the purpose of the study, we assumed that any intrinsic unreliability in the specific diagnostic categories of psychiatric disorder is a source of random error. Thus if the experts were subject to this form of error, pairs of experts, irrespective of their roles, would be unlikely to agree about diagnosis. It was also assumed that the influences on the expert in favor of a particular outcome are a cause of systematic bias. If experts were subject to this form of bias then experts in the same role would be more likely to agree than experts from opposing sides.
The specific hypothesis was that experts from the same adversarial side would be more likely to agree about the psychiatric diagnosis than would experts from opposite sides. Our further hypotheses were that there would be higher levels of agreement about the presence of any mental disorder and the principal psychiatric diagnosis in pairs of reports by experts from the same profession (psychologist or psychiatrist), pairs of experts (rather than a pair made up of a treating doctor and an expert), and pairs of reports with respect to a plaintiff who was severely injured or bereaved.
Methods
The Sample of Reports
The reports used in the study have been described elsewhere16 and were provided to M.M.L. by the only law firm acting for the National Roads and Motorists' Association (NRMA), which at the time was a mutual society that provided insurance and other services to about half the motorists in the state of New South Wales, Australia. The files are the property of the NRMA (now called the Insurance Australia Group, or IAG), are held in secure storage by M.M.L., and will be returned for shredding in 2011. For the purpose of this study no identifying information was recorded, as the claims were identified with a number, and the experts were identified with a three-letter code. The electronic and paper records of the research are securely stored by M.M.L. The IAG has not sought any information from the study and has received copies of papers only after submission for publication to peer reviewed journals.
The reports were written about patients in a series of 559 consecutive third-party personal-injury claims. At the time, claims for personal injury after motor vehicle accidents were dealt with under common law, and most cases were settled by agreement between the plaintiff and defendant before a court hearing. In 67 claims, there were two or more reports written by a psychiatrist or a psychologist (excluding the reports of neuropsychologists). The reports consisted of all the psychiatric reports served on the defendant's lawyers by the plaintiffs and all the reports of assessments of the claimant by the defendant's experts.
Statistical Analysis
A κ statistic17 was used to measure inter-rater agreement about each specific psychiatric diagnosis, as it is an index of agreement about categorical variables that takes into account the differing prevalence of the conditions. The κ statistic is generally used to measure the level of agreement between two raters and generates a number between −1 and 1, with 1 indicating agreement in all cases, 0 indicating the level of agreement that may be expected by chance, and −1 indicating disagreement in all cases. The resulting values were used to classify the level of agreement as follows: 0 to .2, poor; .2 to .4, fair; .4 to .6, moderate; .6 to .8, good; and .8 to 1.0, very good. There are other ways of measuring agreement between multiple raters,18 but these methods were unsuitable for this naturalistic study, which had a varying number of raters per case. There is some controversy about the use of κ and the resultant scale of agreement (from poor to very good), and the statistic is also sensitive to the number of cases and the number of categories.19,20 However, the κ statistic was the measure used to compare inter-rater agreement between multiple sets of raters in the DSM-III,6 DSM-IV,8 and ICD-107 field trials.
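To make the chance correction concrete, the following is a minimal sketch in Python (not the SPSS procedure used in the study) of how κ is computed for two raters; the rating labels and values are invented for illustration only.

```python
# Sketch of Cohen's kappa: observed agreement corrected for chance agreement.
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Return kappa for two raters' categorical ratings of the same cases."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    # Observed proportion of cases on which the two raters agree.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement expected from each rater's marginal frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(ratings_a) | set(ratings_b)
    p_chance = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical presence/absence ratings of ten plaintiffs by a pair of experts.
expert_a = ["disorder", "disorder", "none", "disorder", "none",
            "disorder", "disorder", "none", "disorder", "disorder"]
expert_b = ["disorder", "none", "none", "disorder", "disorder",
            "disorder", "disorder", "none", "disorder", "disorder"]
print(round(cohens_kappa(expert_a, expert_b), 2))  # 0.52
```

In this invented example, raw agreement of 80 percent falls to κ of about .52 (moderate on the scale above) once the agreement expected from each rater's marginal frequencies is removed.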
A contingency table analysis (df = 2) was used for three-group comparisons of categorical data. The injury severity (IS) score was compared by using the Kruskal-Wallis test, and the plaintiff's age was analyzed with one-way ANOVA. All tests were two-tailed.
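As an illustration only, equivalent univariate comparisons could be run in Python with scipy rather than SPSS; the contingency table and group samples below are invented and do not reproduce the study data.

```python
# Sketch of the univariate comparisons: chi-square on a contingency table,
# Kruskal-Wallis for IS scores, and one-way ANOVA for age.
import numpy as np
from scipy import stats

# 3 x 2 contingency table (three report-pair groups x agreed / disagreed), df = 2.
table = np.array([[20, 10],
                  [15, 25],
                  [12, 30]])
chi2, p_chi2, dof, _ = stats.chi2_contingency(table)

# Kruskal-Wallis test comparing IS scores across the three groups.
is_scores = [[9, 12, 17, 25], [4, 9, 10, 13], [1, 4, 9, 9]]
h_stat, p_kw = stats.kruskal(*is_scores)

# One-way ANOVA comparing plaintiff age across the three groups.
ages = [[24, 35, 41, 52], [29, 33, 46, 60], [22, 38, 44, 57]]
f_stat, p_anova = stats.f_oneway(*ages)

print(dof, round(p_chi2, 3), round(p_kw, 3), round(p_anova, 3))
```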
Univariate measures that were found to be significant in the comparison of groups of reports that agreed about the presence of mental disorder or the principal psychiatric diagnosis were included in two generalized estimating equations (GEEs). For the GEE, the individual claim was used as the subject variable, and the particular report pair was the within-subject variable. A GEE was used in preference to logistic regression because it accounts for the lack of statistical independence among observations within identified data clusters (in this case, report pairs from the same claim). We used agreement about the presence of a mental disorder and agreement about the principal diagnosis as two binary dependent variables, with a logit link function, a binomial probability distribution, and an unstructured correlation matrix.
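For readers unfamiliar with this model, the following is a minimal sketch of the structure of such a GEE, fitted with Python's statsmodels rather than the SPSS procedure used in the study; the data frame, column names (claim_id, pair_index, agree_disorder, same_side, fatal_accident), and values are all hypothetical.

```python
# Sketch of a GEE with a logit link, binomial distribution, and unstructured
# working correlation, clustering report pairs within claims.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "claim_id":       [1, 2, 2, 2, 3, 3, 3, 4, 5, 5],   # subject variable (claim)
    "pair_index":     [0, 0, 1, 2, 0, 1, 2, 0, 0, 1],   # within-subject variable (pair)
    "agree_disorder": [1, 1, 0, 1, 1, 1, 0, 0, 1, 1],   # binary dependent variable
    "same_side":      [1, 0, 1, 0, 0, 1, 0, 0, 1, 0],
    "fatal_accident": [0, 1, 1, 1, 0, 0, 0, 0, 1, 1],
})

model = smf.gee(
    "agree_disorder ~ same_side + fatal_accident",
    groups="claim_id",
    time=np.asarray(df["pair_index"]),
    data=df,
    family=sm.families.Binomial(),           # logit link is the Binomial default
    cov_struct=sm.cov_struct.Unstructured(),  # unstructured working correlation
)
result = model.fit()
print(result.summary())
```

The second equation has the same structure, with agreement about the principal psychiatric diagnosis as the dependent variable.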
All the statistical analyses were performed with SPSS for Windows, version 15.0.
The Sample of Report Pairs
Of the 67 claims with two or more psychiatric reports, there were 42 claims with two reports, 16 with three, 8 with four, and 1 with five, making a total of 169 reports. Psychiatrists wrote 119 of the reports and psychologists wrote 50. Fifty-six of the reports were from experts engaged by the plaintiff, 68 were from the defendant's experts, and the remaining 45 were written by treating psychologists or psychiatrists.
There were 148 possible pairings of the 169 reports. Eighty-four of the reports appeared in one report pair, 48 reports were used in two report pairs, 32 in three report pairs, and 5 in four report pairs.
Of these, there were 73 report pairs in which both reports were written by psychiatrists, 17 in which both were written by psychologists, and a further 58 in which one report was written by a psychiatrist and the other by a psychologist. In 77 report pairs, both of the reports were written by experts who were not involved in the patient's treatment, in 63 one report was written by an expert and the other by a treating practitioner, and in 8 both reports were written by treating practitioners.
The sample presented a methodological dilemma about whether to use the findings of all possible report pairs in the analysis, because doing so would overestimate the number of degrees of freedom in claims with three or more reports. For example, if two of three pairs of reports about a plaintiff agreed about a specific diagnosis, then the third pair must logically also be in agreement. This problem could have been approached by including only an incomplete set of report pairs per claim or by excluding any claims with three or more reports. Either method results in an arbitrary omission of some report pairs and would alter the degree of agreement reported. Hence, we used an approach that included all the possible report pairs.
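The following small sketch, with an invented claim-to-report mapping, shows how all possible report pairs within each claim can be enumerated and why pairs drawn from the same claim are not independent observations.

```python
# Sketch: enumerate every report pair within each claim. When a claim has three
# or more reports, the agreement in one pair can sometimes be deduced from the
# other pairs, so pairs from the same claim lack statistical independence.
from itertools import combinations

reports_per_claim = {"claim_1": ["A", "B"], "claim_2": ["C", "D", "E"]}

pairs = [(claim, pair)
         for claim, reports in reports_per_claim.items()
         for pair in combinations(reports, 2)]
print(len(pairs), pairs)
# 4 pairs: ('claim_1', ('A', 'B')), ('claim_2', ('C', 'D')),
#          ('claim_2', ('C', 'E')), ('claim_2', ('D', 'E'))
```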
This approach prevented the use of the κ statistic to calculate an overall measure of agreement among all 148 report pairs, as the conclusion with respect to psychiatric diagnosis could be logically deduced in 46 of 148 report pairs from the results of the other 102 report pairs, and hence almost a third of the observations would have lacked statistical independence. However, within the group of 97 pairs of results from opposite sides, there were no instances when agreement between reports could be deduced from other report pairs. Within the group of 51 report pairs from the same side, there were six claims with three reports written by experts engaged by the same side in which the agreement between any one pair of raters could be predicted by the findings of the other two pairs. An analysis of the κ statistics after excluding these cases found minimally changed levels of agreement about the specific psychiatric diagnosis (see the footnotes to Table 1).
The GEE is an appropriate way to analyze all 148 pairs, as it accounts for the lack of independence of observations within cases.
Diagnostic Variables
Each diagnosis from each report was recorded. There is no ideal way of measuring inter-rater agreement between raters making multiple diagnoses.21,22 In the κ analysis, we restricted the consideration of the agreement to the specific psychiatric diagnosis, so that if one report contained a diagnosis of traumatic brain injury and an adjustment disorder and the second report in the pair had only a diagnosis of adjustment disorder, the result was coded as an agreement for adjustment disorder, but a disagreement for traumatic brain injury. Agreement between the earlier and later reports was measured when the reports were both from the same side.
In the GEE analysis, agreement was defined as agreement about the principal psychiatric diagnosis of the patient and agreement about the presence or absence of any mental disorder attributed to the accident. To determine the principal diagnosis in these claims, we used a previously described diagnostic hierarchy16 that ranked, in descending order, traumatic brain injury, PTSD, major depression, other anxiety disorders, adjustment disorders, bereavement, other disorders, and no diagnosis. PTSD was placed above major depression in this hierarchy because PTSD was diagnosed more frequently than depression in this setting and depression was often reported to be a complication of PTSD. Each report pair was scored 1 if the raters agreed about the principal psychiatric diagnosis (including no diagnosis) and, separately, 1 if they agreed about the presence or absence of mental disorder; a score of 0 was recorded in either category in which they disagreed.
As a result, two measures of agreement were generated about the patient for use as dependent variables in the GEE: agreement about the presence of any mental disorder and agreement about the principal psychiatric diagnosis.
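A brief sketch of this coding scheme may make the two dependent variables clearer; the helper names and report contents below are hypothetical, and the hierarchy follows the order given in the text.

```python
# Sketch: derive the principal diagnosis from a report's diagnoses using the
# hierarchy described above, then score a report pair on the two agreement measures.
HIERARCHY = [
    "traumatic brain injury", "PTSD", "major depression", "other anxiety disorder",
    "adjustment disorder", "bereavement", "other disorder", "no diagnosis",
]
RANK = {dx: i for i, dx in enumerate(HIERARCHY)}

def principal_diagnosis(diagnoses):
    """Highest-ranked diagnosis in a report; 'no diagnosis' if the list is empty."""
    if not diagnoses:
        return "no diagnosis"
    return min(diagnoses, key=lambda dx: RANK.get(dx, RANK["other disorder"]))

def score_pair(report_a, report_b):
    """Return (agree on presence of any disorder, agree on principal diagnosis)."""
    principal_a, principal_b = principal_diagnosis(report_a), principal_diagnosis(report_b)
    disorder_a = principal_a != "no diagnosis"
    disorder_b = principal_b != "no diagnosis"
    return int(disorder_a == disorder_b), int(principal_a == principal_b)

# One report diagnoses traumatic brain injury plus an adjustment disorder, the
# other an adjustment disorder only: agreement on presence, not on principal diagnosis.
print(score_pair(["traumatic brain injury", "adjustment disorder"],
                 ["adjustment disorder"]))  # -> (1, 0)
```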
Expert Variables
The details of the expert were collected from each report. The profession of the author (psychiatrist or psychologist), the role (defendant's or plaintiff's expert; treating doctors were considered to be on the plaintiff's side), and the status as a court-recognized expert (expert versus treating practitioner) were recorded for each case. For the GEE, each report pair was scored 1 on each of these variables if the two report writers matched and 0 if they differed. These pair-wise measures of agreement were within-subject variables in the GEE.
Plaintiff Variables
The patient's injuries were recorded using the injury severity (IS) score, an instrument for classifying the overall severity of multiple injuries. It classifies the injuries in bodily regions on a six-point scale from no injury to lethal injury and is scored by calculating the sum of the squares of the three most severe injuries, giving a maximum score of 75 for injuries that are not immediately lethal. In an acute medical setting, the IS score is correlated with mortality,23 and, applied retrospectively to medical files, it predicts the most acute setting of medical care (intensive care, a general ward, the emergency department, or outpatient treatment) and the number of subsequent days in the hospital.24 The plaintiff's age, sex, and whether another person was killed in the accident were also used as independent variables in the GEE.
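A worked sketch of this scoring rule follows; the regional severity grades are invented, not drawn from the study files.

```python
# Sketch of the IS score as described above: the sum of the squares of the
# three most severe regional severity grades.
def injury_severity_score(region_grades):
    """Sum of squares of the three highest regional severity grades."""
    top_three = sorted(region_grades.values(), reverse=True)[:3]
    return sum(grade ** 2 for grade in top_three)

grades = {"head": 3, "chest": 2, "abdomen": 1, "extremities": 3, "face": 0, "external": 1}
print(injury_severity_score(grades))  # 3**2 + 3**2 + 2**2 = 22
# Three regional grades of 5 give the stated maximum of 75 (3 * 5**2).
```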
The 67 plaintiffs had an average age of 36 (SD 15) years, 37 were male, and 13 had been involved in an accident in which there was a fatality. The median IS score was 12, indicating that most plaintiffs sustained moderate or severe physical injuries. The characteristics of plaintiffs who were the subject of three or more psychiatric or psychological reports did not differ significantly from those who were the subject of only two reports.
Results
Reports written by experts from the same adversarial side had good agreement about the presence of any mental disorder (κ = .74) but had only fair agreement about the specific diagnosis (average κ = .31). The agreement between experts who were employed by the same side was good for traumatic brain injury, but was in the poor or fair range for the other common diagnoses. There was poor agreement about the presence of any mental disorder (κ = .09) and also the specific psychiatric diagnosis (average κ = .14) in report pairs written by experts from opposing adversarial sides (Table 1).
There were only six instances in which there were two experts for the defendant for a single claim, although all the reports written at the request of the defendants’ lawyers were available to the researchers. In these six cases, one pair agreed that no diagnosis was warranted, five pairs agreed about the presence of mental disorder, but no pair agreed on the principal psychiatric diagnosis. It is likely that the sample of treating psychiatrist reports was also quite complete, as the insurance company generally paid for treatment before litigation only if the treating practitioner provided a report. All eight pairs of reports by treating practitioners agreed that the plaintiff had a mental disorder, but only four pairs agreed about the principal diagnosis.
In 53 report pairs, the report writers agreed about the principal psychiatric diagnosis, and in 30, they did not agree about the presence of a mental disorder. In the remaining 65 pairs, the report writers agreed that the plaintiff had a mental disorder but did not agree about the principal psychiatric diagnosis.
Report pairs that did not agree about the presence of a mental disorder were more likely to be written by writers from opposite adversarial sides about a plaintiff who had not been in a fatal accident (Table 2).
The GEE suggested that pairs of reports by writers engaged by the same side, and those written about a plaintiff involved in a fatal accident, were more likely to agree that the plaintiff had a mental disorder; pairs of reports from the same adversarial side were also more likely to agree about the principal psychiatric diagnosis (Table 3).
Conclusions
The results of this study should be interpreted with caution because of the relatively small sample size and the likelihood that some plaintiffs' reports that did not support the claim of psychiatric injury were not tendered. If a plaintiff's lawyer received an opinion that the plaintiff did not have a psychiatric disorder, the lawyer was free to seek a further opinion and was not required to disclose the contents of the earlier report. If a second report found that the plaintiff had a psychiatric disorder, it may not have agreed with the report of the defendant's expert, whereas the report that was not served on the defendant may well have agreed with the report of the defendant's expert. We were unable to find out how many plaintiffs' reports were not served. The withholding of reports that did not support the plaintiff's case may have decreased the level of agreement between experts from opposite adversarial sides, but it should have increased the level of agreement between pairs of experts engaged by the plaintiff. However, we found a low level of agreement about the actual diagnosis among experts from the same adversarial side.
A further limitation of the study is that more than half the experts who performed the assessments may have been aware of an earlier psychiatric opinion. Awareness of a previous diagnosis usually increases inter-rater reliability, but in an adversarial legal setting, it may have the opposite effect and may have contributed to the lack of agreement between the experts in this study, including the lack of agreement between experts engaged by the same side.
We found poor agreement about both the presence of any mental disorder and all the specific psychiatric diagnoses in reports by experts engaged by opposite sides. Experts from the same adversarial side usually agreed that a mental disorder was present, and being on the same side was the most important predictor of agreement between experts about the presence of a specific psychiatric diagnosis. This finding suggests that even if the experts were not biased, the reports that were eventually relied on in litigation contained predictable opinions about the presence or absence of psychiatric disorder.
We also found evidence of a significant level of error in making psychiatric diagnoses, as there was only modestly greater agreement about the diagnosis of common outpatient psychiatric disorders in reports from the same adversarial side when compared with those from opposite sides. The inter-rater reliability of the most common diagnosis, PTSD, in reports written by experts on the same adversarial side was only fair.
The diagnosis of PTSD presents particular difficulties in medicolegal settings. The disorder is defined as a consequence of trauma, even though the causal relationship between the traumatic event and subsequent symptoms may be the main matter before the court. Moreover, PTSD has few objective features and is a relatively new diagnostic category, for which the diagnostic criteria have been revised several times. We observed that some experts elicited a similar history but differed in their diagnoses because they disagreed about whether the plaintiff's experiences were sufficiently traumatic. Another reason for disagreement may have been the use of idiosyncratic diagnostic criteria, whether in an attempt to avoid prompting the patient to report the symptoms or out of caution about relying on the DSM-III and DSM-IV criteria alone in medicolegal settings. However, the low level of agreement between experts reported in this study may not have been entirely due to the medicolegal context, as the κ statistics for the diagnosis of anxiety and depressive disorders were similar to the modest levels of agreement reported in the DSM-III field trials.6
The hypothesis that report writers would be more likely to agree about the presence of a mental disorder in those involved in fatal accidents was confirmed, but the seriousness of the plaintiff's overall injuries was not associated with increased agreement between experts. The level of agreement with respect to the psychiatric effects of fatal accidents suggests that feelings of sympathy evoked in the report writers may be a source of bias in expert opinions. Notably, all the experts who examined plaintiffs who were involved in fatal accidents diagnosed psychiatric disorders. However, the diagnoses varied considerably among both plaintiff's and defendant's experts, and the range of diagnoses applied to bereaved patients was similar to that of the diagnoses of nonbereaved plaintiffs.
The hypothesis that treating practitioners and experts would be less likely to agree than two experts or two treating practitioners was not confirmed. This result suggests that the theoretical concerns about the tendency of treating doctors to provide reports that favor the patients under their care can be overstated.
The study does provide some evidence of both error and bias in written evidence about psychiatric diagnosis. In general, if both error and bias are known to be present, minimization of bias should be the priority, as a biased but reliable measure is always wrong, whereas an unbiased but unreliable opinion is sometimes correct. Measures to reduce bias obviously include the use of experts appointed and instructed by the court, disclosure of all reports, including reports that are unhelpful to the plaintiff, and codes of conduct reinforcing the expert's primary duty to the court. However, the findings of this study do not support reliance on court-appointed experts, as all experts may still be subject to bias arising from their perception of the wishes of the court and their subjective views of the plaintiff. Moreover, the degree of error in diagnosis suggests that it would be unwise to replace the current system with one in which psychiatric opinion is not tested by comparison with the opinions of other experts.
Although this was a relatively small study that considered only subjects involved in motor vehicle accidents in a single jurisdiction, the similarities in civil litigation and psychiatric practice in English-speaking countries make it relevant to other jurisdictions. To the authors’ knowledge, it is the only study of its kind, despite the very large number of psychiatric reports prepared for courts each year around the world. The findings suggest the need for more research, not only to improve the reliability of psychiatric evidence, but also to establish the reliability of the diagnosis of minor psychiatric disorders in clinical settings.
Acknowledgments
The authors acknowledge Mr. Victor Kelly, formerly of Abbott Tout Lawyers, Sydney, for making the files available for study; and Professor Daniel Shuman, Southern Methodist University, Dallas, TX; Professor Chris Tennant, University of Sydney; and Dr. Tim Slade, University of New South Wales, for their assistance.