Abstract
The purpose of this study was to assess the extent of agreement on psychiatric diagnosis in written evidence provided by experts in serious criminal matters in Australia. We found good or very good inter-rater agreement on the diagnoses of acquired brain injury, schizophrenia-spectrum psychosis, substance-induced psychotic disorder, and intellectual disability. There was moderate agreement on the diagnosis of depressive and personality disorders. Agreement on anxiety disorders, in particular post-traumatic stress disorder, was poor. Agreement on the principal Axis I diagnosis was moderate, and there was a similar probability of agreement within pairs of experts engaged by the same side and those engaged by opposite sides. Concern about bias in expert psychiatric opinion in criminal cases appears to have been overstated. There was little evidence to suggest that experts’ adversarial roles influenced their opinions on psychiatric diagnosis.
The evidence presented by experts to support the presence of psychiatric disorders has been described as biased1,2 or based on an inadequate scientific standard.3 However, most of the criticism of the work of mental health expert witnesses has been anecdotal2 and has focused on possible causes rather than empirical evidence of bias.
There are several potential sources of bias and perceived bias in adversarial legal proceedings. A lawyer may be aware of the opinions expressed by several experts in previous matters and may choose the expert who is most likely to support the client's case. The lawyer selects the documents to be examined and can instruct the expert to limit evidence to specific areas. The expert may feel sympathy for the client and write a report that is helpful, and, in some jurisdictions, an unfavorable report does not have to be presented in court. Factors that reduce the likelihood of bias include the expert's duty to be truthful and impartial and the prospect of cross-examination that could harm the expert's reputation.
Expert evidence can be excluded from proceedings if it is shown to be biased or unreliable. If the Daubert standard, which requires that scientific evidence have a known error rate, is strictly applied, the absence of empirical studies demonstrating that a branch of science is reliable may affect the admissibility of expert evidence.4,5
The reliability of the diagnosis is central to the science of psychiatry and to psychiatric evidence, because few psychiatric disorders have specific biological markers. Concern about the reliability of psychiatric diagnosis was the main reason for the adoption of specific definitions of mental disorders in the Diagnostic and Statistical Manual (DSM) classification system.6–9 The DSM-III and the International Classification of Diseases (10th edition; ICD-10) field trials used methods approximating ordinary clinical practice and reported satisfactory levels of agreement on the diagnosis of severe psychiatric illnesses, such as schizophrenia and bipolar disorder.8,10 However, agreement on the definitions of common outpatient conditions, including anxiety disorders and the less severe forms of depression, was less satisfactory, and neither trial reported the level of inter-rater agreement on the new diagnosis of post-traumatic stress disorder (PTSD).8,10
Subsequent studies of the inter-rater reliability or agreement on psychiatric diagnoses have involved structured and semistructured interviews; for example, the Structured Clinical Interview for DSM Disorders was used in the assessment of inter-rater reliability in the DSM-IV trials.11 Although structured diagnostic interviews have been found to improve levels of inter-rater agreement,12 they have not been widely used in forensic evaluations because of doubts about the validity of diagnoses generated when a list of symptoms is suggested to the person being evaluated.13,14 Hence, methods similar to those applied in forensic evaluations have seldom been used in research examining the reliability of psychiatric diagnoses.
In a study of reports about psychiatric injury after motor vehicle accidents, we found that experts’ diagnoses generally favored the side that engaged them and that experts engaged by the same side were significantly more likely to agree on the presence of a psychiatric disorder.15,16 The levels of agreement on the diagnosis of PTSD and other anxiety and depressive conditions were generally unsatisfactory, with kappa values of between −0.12 and 0.38.16 We were unable to locate any studies of the reliability of psychiatric diagnoses in reports prepared for criminal cases.
In the present study, we used methods similar to those in our study of civil cases16 to determine the extent of agreement on the psychiatric diagnoses made by experts engaged to examine clients in criminal matters. Considering our findings in the civil cases, we expected to find low levels of agreement on the diagnosis of anxiety and depressive disorders and a greater likelihood that experts engaged by the same side would agree about the psychiatric diagnosis than experts from opposite sides. Our first objective was to establish the level of agreement on diagnostic categories. We then tested two specific hypotheses:
First, that pairs of reports by experts from the same side are more likely to agree on the principal diagnosis and that pairs of reports of experts from opposing sides are less likely to agree. An association between same side report pairs and agreement on the diagnosis suggests that bias arises from the experts’ adversarial roles, and a lack of association suggests an absence of the effect of role.
Second, that pairs of reports by experts in the same profession (psychiatrist or psychologist) are more likely to agree on the principal diagnosis and that agreement is less likely in pairs of reports by experts in different professions. An association between same profession report pairs and agreement on the diagnosis suggests an effect of differences in the experts’ training and experience, and a lack of association suggests an absence of effect of profession.
We also examined the association of age, sex, marital status, employment status, history of criminal convictions, and seriousness of the charges with agreement on the principal psychiatric diagnosis, because of the possibility that agreement on diagnosis is associated with the defendant's demographic or criminologic variables.
Methods
The Sample of Reports
Copies of reports from a consecutive series of 110 criminal cases concluded between 2005 and 2007, in which there were two or more reports written by psychiatrists, psychologists, or both, were made available by the Office of the Director of Public Prosecutions (ODPP) in New South Wales (NSW), Australia. There were 270 reports, 226 of which were written by 30 psychiatrists and 44 of which were submitted by 15 psychologists. Defense experts wrote 148 of the reports and prosecution experts wrote 122.
All the cases involved serious indictable offenses that were dealt with in the higher courts, including 30 charges of murder or attempted murder (homicide offenses), 35 of malicious wounding or serious assault, 14 of sexual assault, 12 of offenses against property, 10 of drug-related offenses, and 9 of other offenses, including fraud, kidnapping, arson, and firearms offenses.
Permission to perform the study was obtained from the NSW Justice Health Research and Ethics Committee and the NSW Director of Public Prosecutions.
Data Collection
Data were collected on the following variables:
whether the expert was engaged by the defense or by the prosecution;
whether the expert was a psychiatrist or a psychologist;
the sex of the defendant and the age, marital status, and employment status at the time of the offense;
the most serious current charge and history of convictions;
and the psychiatric diagnoses.
All three authors independently rated reports from 28 cases, and there was no inter-rater disagreement on the experts’ diagnoses or the other variables recorded. The remaining 82 cases were rated by M.L. and O.N. or by M.L. and G.E., with no disagreements.
Coding of the Diagnoses
Australian psychiatrists generally adhere to the DSM diagnostic system, and the diagnoses made by the expert witnesses in this study were generally consistent with DSM-IV diagnoses. The diagnoses were coded according to categories that were consistent with DSM-IV chapter headings, and methods described in earlier research8 were used in the analysis of agreement. Hence schizophrenia, schizophreniform disorder, schizoaffective disorder, delusional disorder, and psychosis not otherwise specified were grouped together for the analysis of agreement on the presence of schizophrenia-spectrum disorder. Major depressive disorder, dysthymia, and adjustment disorder with depressed mood were grouped in a depressive disorders category. PTSD was included with anxiety disorders such as obsessive-compulsive and panic disorders. Non-DSM categories included acquired brain injury, intellectual disability, and psychosis. The category of acquired brain injury included traumatic brain injury, alcohol-related brain damage, and the dementias. Intellectual disability was coded if the expert made a diagnosis of mental disability of any severity, including borderline low intelligence. A further analysis was conducted of cases that fell in the broader category of psychotic disorders, which included schizophrenia-spectrum psychosis, substance-induced psychotic disorder (SIPD), psychotic depression, and mania, because the diagnosis of any of these disorders often results in a finding of reduced criminal responsibility.
Statistical Analysis
Analysis of multiple rating pairs for each subject presents a methodological dilemma arising from the number of degrees of freedom when three or more pairs are examined.16–18 The statistical methods used are described in our earlier study16 and include the use of generalized estimating equations (GEEs) to examine the association of the variables with agreement on the principal Axis I diagnosis.16
The kappa statistic19,20 was used to assess the level of agreement on the presence or absence of each diagnosis between experts on opposite sides and those on the same side. The ranges of kappa statistics showing agreement have been defined as poor, <0.2; fair, 0.2 to 0.4; moderate, 0.4 to 0.6; good, 0.6 to 0.8; and very good, 0.8 to 1.0.21 Nonparametric tests (the χ2 test, or Fisher's exact test when the number of cases was fewer than five in any cell) were used to examine possible associations between the experts’ roles and specific diagnoses, where agreement was rated as poor or fair.
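As an illustration only (the analyses in this study were run in SPSS, not with this code), the kappa statistic for a single diagnosis rated present or absent by two reports, and the descriptive bands quoted above, can be sketched as follows. The function names are hypothetical, and the assignment of boundary values (for example, treating exactly 0.4 as moderate) is an assumption, because the published ranges share their endpoints.

```python
from typing import Sequence


def cohen_kappa(rater_a: Sequence[bool], rater_b: Sequence[bool]) -> float:
    """Cohen's kappa for two raters judging one diagnosis as present/absent."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of report pairs in which the two raters concur.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement expected from each rater's marginal rate of diagnosis.
    p_a = sum(rater_a) / n
    p_b = sum(rater_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters never vary")
    return (observed - expected) / (1 - expected)


def agreement_band(kappa: float) -> str:
    """Map a kappa value to the descriptive bands quoted in the text.

    Assumption: each band includes its lower bound and excludes its upper.
    """
    if kappa < 0.2:
        return "poor"
    if kappa < 0.4:
        return "fair"
    if kappa < 0.6:
        return "moderate"
    if kappa < 0.8:
        return "good"
    return "very good"


if __name__ == "__main__":
    # Made-up ratings for ten report pairs, purely for illustration.
    first = [True, True, False, False, True, False, False, False, False, False]
    second = [True, False, False, False, True, False, False, False, True, False]
    k = cohen_kappa(first, second)
    print(round(k, 3), agreement_band(k))
```

Kappa corrects the raw proportion of agreement for the agreement expected by chance, which is why a rarely made diagnosis can show a low kappa even when most report pairs concur on its absence.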
Overall agreement in both the kappa and GEE analyses was determined by first applying a diagnostic hierarchy to the principal Axis I diagnosis similar to that described in a previous study.15 In this hierarchy, the diagnosis of acquired brain injury was considered first. If the diagnosis was made in one of the reports, the other report was examined for agreement or disagreement. If neither report contained a diagnosis of brain injury, the reports were examined for a diagnosis of schizophrenia-spectrum psychosis and then SIPD, other psychotic disorders (including in this study two cases of mania and one case of alcohol withdrawal delirium), depressive disorders, and anxiety disorders. Thus, the highest disorder in the hierarchy of acquired brain injury, schizophrenia spectrum psychosis, SIPD, other psychotic disorders, depressive disorders, and anxiety disorders was coded as the principal Axis I diagnosis. Substance dependence and abuse disorders were not considered in the determination of agreement on the principal Axis I diagnosis, in part because a proportion of the reports recorded a history of substance abuse in the body of the report but did not include a diagnosis of a substance abuse disorder in the conclusions. Axis II diagnoses of personality disorder were not considered in the GEE analysis of overall agreement because of the low number of cases and because a rating of agreement or nonagreement could be made on the basis of an Axis I diagnosis in every report pair. There were no pairs of reports with a disagreement on personality disorder without an accompanying disagreement on the Axis I diagnosis.
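The hierarchy described above can be expressed as a short sketch. This is an illustrative reading of the procedure rather than the authors' own code; the category labels and function names are hypothetical. Diagnoses that are not part of the hierarchy (substance use disorders and Axis II personality disorders) are ignored when the principal Axis I diagnosis is determined.

```python
from typing import Iterable, Optional

# Order of precedence for the principal Axis I diagnosis, highest first.
# The label strings are illustrative, not taken from the study's coding sheet.
HIERARCHY = (
    "acquired brain injury",
    "schizophrenia-spectrum psychosis",
    "substance-induced psychotic disorder",
    "other psychotic disorder",
    "depressive disorder",
    "anxiety disorder",
)


def principal_axis_i(diagnoses: Iterable[str]) -> Optional[str]:
    """Return the highest-ranked diagnosis in a report, or None if the
    report contains no diagnosis from the hierarchy."""
    recorded = set(diagnoses)
    for category in HIERARCHY:
        if category in recorded:
            return category
    return None


def pair_agrees(report_a: Iterable[str], report_b: Iterable[str]) -> bool:
    """A report pair agrees when both reports have the same principal
    diagnosis (including the case where neither has one)."""
    return principal_axis_i(report_a) == principal_axis_i(report_b)


if __name__ == "__main__":
    a = {"depressive disorder", "schizophrenia-spectrum psychosis"}
    b = {"schizophrenia-spectrum psychosis", "substance dependence"}
    print(principal_axis_i(a), pair_agrees(a, b))
```

Comparing the principal diagnoses is equivalent to walking down the hierarchy and checking, at the first level at which either report makes a diagnosis, whether the other report makes it too.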
The use of GEEs allowed the examination of factors associated with agreement on the principal Axis I diagnosis among the complete sample of report pairs. We used univariate GEEs to examine the associations between the independent variables and the binary dependent variable of agreement on the principal Axis I diagnosis. The individual case was regarded as the subject variable, and the particular report pair was the within-subject variable. All factors were also included in a multivariate GEE main-effects model.
A power analysis of the proportions of these variables (for example, same or opposite expert role) and the dependent variable (agreement or disagreement on the principal Axis I diagnosis) was performed prospectively, to determine the sample size needed for the GEEs on the basis of the following assumptions: that report pairs in which there was agreement on the Axis I diagnosis would be twice as likely to include reports written by experts from the same side and that there would be an equal number of defense and prosecution reports. We calculated that 108 report pairs were necessary for an 80 percent chance of finding a true association (p < .05) between agreement and report pairs with experts from the same side.22 The statistical analyses were performed with SPSS for Windows, version 15.0.
Results
Pairs of reports by experts engaged by the prosecution and the defense were available in 105 of the 110 cases. In 60 cases, there were two reports, in 38 cases there were three reports, in 5 cases there were four reports, and in 2 cases there were five reports. Because a case with two, three, four, or five reports yields one, three, six, or ten unique pairs, respectively, there were 224 unique pairs of reports: 60 + (38 × 3) + (5 × 6) + (2 × 10). These comprised 147 report pairs by experts from opposite sides and 77 report pairs by experts engaged by the same side (Table 1). Every report recorded either a psychiatric diagnosis equivalent to a DSM-IV diagnostic category or a statement that no mental disorder was present. Most report writers explicitly used DSM-IV categories, and the criteria relied on to make the diagnoses were generally referred to in the conclusions.
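The count of unique report pairs follows from the number of ways of choosing two reports from each case. A brief check of the arithmetic, using only the case counts given above (the variable names are illustrative):

```python
from math import comb

# Number of cases, keyed by how many reports each case contained.
cases_by_report_count = {2: 60, 3: 38, 4: 5, 5: 2}

# A case with k reports yields C(k, 2) unique report pairs.
pairs_per_case = {k: comb(k, 2) for k in cases_by_report_count}

total_cases = sum(cases_by_report_count.values())
total_pairs = sum(
    pairs_per_case[k] * n_cases for k, n_cases in cases_by_report_count.items()
)

print(pairs_per_case)  # {2: 1, 3: 3, 4: 6, 5: 10}
print(total_cases)     # 105
print(total_pairs)     # 224
```

The total of 224 also equals the sum of the 147 opposite-side and 77 same-side pairs.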
There was a good or very good level of agreement on the presence of SIPD (κsame role = 0.795, κopposite roles = 0.819) and the presence of intellectual disability (κsame role = 0.751, κopposite roles = 0.845). The agreement on schizophrenia-spectrum psychosis (κsame role = 0.737, κopposite roles = 0.630), personality disorder (κsame role = 0.555, κopposite roles = 0.607), acquired brain injury (κsame role = 0.623, κopposite roles = 0.653), and substance misuse (κsame role = 0.514, κopposite roles = 0.643) was also classified as good.
There was moderate agreement on the diagnosis of depressive disorder (κsame role = 0.381, κopposite roles = 0.476). Prosecution experts diagnosed depression on 19 occasions and defense experts on 28, a difference that did not reach statistical significance (χ2 = 1.357, p = .241). However, there was agreement in fewer than half (20/55) of the pairs of reports in which this diagnosis was raised.
Agreement on the diagnosis of anxiety disorders was poor (κsame role = 0.187, κopposite roles = 0.144). PTSD was diagnosed on 16 occasions by defense experts and once by a prosecution expert (Fisher's exact test, p < .001). It was the most common anxiety disorder and was diagnosed by at least one expert in 26 of the 224 pairs of reports. There was one case in which a defense and prosecution expert agreed on the presence of PTSD and one other in which two defense experts agreed. In the remaining 24 pairs, only the defense expert made the diagnosis of PTSD. There was one case of obsessive-compulsive disorder (κ = 1.0 in both groups) and two cases of mania (κ = 1.0 in both groups). The diagnoses of pathological grief (three pairs), malingering (three pairs), and attention deficit-hyperactivity disorder (two pairs) all had a kappa of 0 in both groups.
There was good agreement on the presence of psychosis (κsame role = 0.802, κopposite roles = 0.702) and moderate agreement on the principal Axis I diagnosis (κsame role = 0.491, κopposite roles = 0.453).
Kappa values were generally not higher in the group of reports written by experts from the same side compared with pairs of reports by experts on opposite sides, indicating that psychiatric diagnosis was not influenced by the expert's adversarial role. There was no association between report pairs of experts from the same side and agreement on the principal Axis I diagnosis in the univariate or multivariate GEEs. Agreement on the principal diagnosis was significantly lower in homicides compared with other offenses, when examined with univariate and multivariate GEEs (Table 2). A post hoc examination of homicides suggested that this finding was largely due to disagreement on emerging schizophrenia-spectrum psychosis. There were 11 pairs of reports about homicides in which one expert had made a diagnosis of first-episode schizophrenia and the other had not. In these pairs, the other expert had diagnosed SIPD (three pairs), psychotic depression (three pairs), PTSD (one pair), cognitive disorder (one pair), prodromal schizophrenia (one pair), malingered psychosis (one pair), and no psychiatric disorder (one pair).
There was a lower probability of agreement on the principal diagnosis of married than of unmarried defendants in the univariate analysis and a lower probability of agreement on male than on female defendants in the multivariate analysis.
Discussion
We found good agreement on the diagnosis of common psychiatric disorders among defendants in criminal matters. In contrast to the findings of our study of civil reports,16 there was little evidence of bias arising from whether the expert was engaged by the defense or by the prosecution. Agreement on the principal Axis I diagnosis was moderate, reflecting the narrower definition of inter-rater agreement used in our analysis.
We found low levels of agreement for the diagnosis of anxiety disorders, and to a lesser degree, depressive disorders. The higher level of agreement on psychotic illness may have been due to the presence of objective signs and a documented history of treatment in more of the defendants with those diagnoses than in those with diagnoses of PTSD and depression.
The diagnosis of PTSD is often of little relevance in criminal proceedings, because it usually does not reduce criminal responsibility or carry great weight in mitigation, unless the offending behavior was a direct consequence of some form of trauma. Hence, some experts may not have considered whether the accused had PTSD. However, disorders such as substance abuse and personality disorder are also rarely considered in trials and those diagnoses had moderate or good levels of inter-rater agreement. The poor level of agreement on the diagnosis of PTSD in a study that also found little evidence of bias and good levels of agreement in the diagnosis of other disorders casts further doubt on the reliability of the diagnosis of PTSD in medicolegal settings.23,24
The first hypothesis, that experts from the same side are more likely to agree than experts from opposite sides, was not confirmed. The kappa statistics for the principal Axis I diagnosis and for schizophrenia-spectrum disorders were slightly higher in the pairs of reports from the same side than in those from opposite sides, but this was not the case for any other disorder, and same expert role was not associated with agreement on the principal diagnosis in the GEEs.
The second hypothesis, that experts from the same profession are more likely to agree about the diagnosis than experts from differing professions, was not confirmed. Report pairs written by two psychiatrists were no more likely to agree on the diagnosis than were report pairs written by a psychiatrist and a psychologist.
There was some evidence that demographic factors and aspects of the charge itself were associated with agreement on the principal diagnosis. In the univariate analysis, there was less agreement on the principal diagnosis of married defendants, and in the multivariate analysis, there was less agreement on the psychiatric diagnosis of males. These findings should be interpreted cautiously because there were only 16 female defendants and because the association between being married and a lower level of agreement was not independent of other factors. Both univariate and multivariate test results showed a lower probability of agreement on the diagnosis in homicides. The apparent explanation for this finding is that patients with an established diagnosis of schizophrenia were more likely to face nonhomicide charges, whereas defendants in their first episode of psychosis, with no history of treatment for psychosis, committed 21 of the 30 homicide offenses. This finding concurs with that of Bourget and associates,25 who noted the difficulty in arriving at a reliable diagnosis in cases of homicide in first-episode psychosis. However, associations of agreement on diagnosis with age, marital status, and the presence of homicide charges should be regarded with caution and may represent type I errors, because we were not examining a specific hypothesis and did not apply a statistical correction for multiple comparisons of demographic and criminologic variables.
The higher level of reliability of psychiatric diagnoses made by expert witnesses in criminal matters is in contrast with findings in our earlier study of expert reports in civil proceedings and may be due to the nature of disorders found among those charged with serious criminal offenses. Many of the defendants had severe and disabling forms of schizophrenia, a condition that is more reliably diagnosed than are nonpsychotic disorders.8,10,26 Another reason could be the way in which the experts were engaged by the lawyers in criminal matters. The prosecution experts were selected from a panel of experienced forensic psychiatrists, most of whom were still in clinical practice and who had also prepared reports for the defense in other cases. The experts chosen by the defense on the basis of their experience and reputation in court were also often on the prosecution panel of experts. By contrast, the experts in our study of civil matters were often retired from clinical practice and had become full-time expert witnesses, usually providing reports for just one side or even for just one of the insurance companies or firms specializing in personal injury. A further difference between criminal and civil cases is that experts in civil matters are less likely to face cross-examination.15
One limitation of the finding of a satisfactory level of agreement between experts on major psychiatric disorders in this study is that levels of agreement on the absence of a mental disorder were higher than levels of agreement on the presence of a disorder. Although agreement on uncommon disorders relies on concurrence on their absence, this difference was also found in relation to the most common diagnosis, schizophrenia-spectrum psychosis.
Perhaps the most significant limitation of our assessment of agreement on psychiatric diagnosis resulted from the use of observational methods in the study. We were unable to establish whether any reports had been withheld by the defense, either because they were of no assistance or because they included history that was prejudicial to the defendant's case. Moreover, although we were able to control for a loss of statistical independence in cases with three or more reports with the use of GEEs, even the diagnoses in cases in which there were just two reports might not always have been entirely independent, because the writers of the prosecution reports would frequently have been aware of the diagnoses made in the reports by the defense experts, and both report writers might have been aware of the earlier diagnoses made by other treating doctors. Furthermore, the reason for requesting more reports in some cases could have been the disagreement about the diagnosis between the initial reports. Additional reports would be expected to lower agreement in most cases.
Agreement could also have been reduced by some experts who omitted diagnoses that they knew to be present but did not consider relevant. For example, the diagnosis of a substance abuse disorder was often not made despite a history of harmful substance use in the body of the report. This omission could also have contributed to the lower levels of reliability in the diagnoses of some disorders, including PTSD.
The role of expert witnesses in Australia is similar to that of expert witnesses in other jurisdictions with adversarial legal systems, and the question they are usually asked to address is not the defendant's psychiatric diagnosis, but the level of criminal responsibility or competence to stand trial. Hence, although all of the reports in this study included an opinion about psychiatric diagnosis, agreement on the presence or absence of a disorder did not necessarily mean that there was agreement on competence to stand trial or the availability of a mental illness defense. However, in a related study of the reliability of expert opinion about legal issues in the same case sample with the same statistical methods, we found moderate levels of agreement between defense and prosecution experts on the availability of the defense of mental illness (κ = 0.508; 95% CI, 0.295–0.720) and fair levels of agreement regarding competence to stand trial (κ = 0.293; 95% CI, 0.134–0.451), with little evidence of bias arising from the expert's role.27
Finally, although we examined a large consecutive sample of cases, the sample size did not provide sufficient statistical power to exclude the possibility that bias can sometimes influence the psychiatric diagnosis provided by experts. We showed that bias arising from whether the expert was engaged by the defense or prosecution in criminal proceedings did not have a great influence on the diagnosis. The good level of agreement, particularly about the diagnosis of psychotic disorders, is also likely to exist in other jurisdictions that use similar methods for training psychiatrists and qualifying experts. However, the standard of written opinions may have improved in recent years in NSW because of the introduction of codes of conduct for expert witnesses, which require experts to state that any report is complete, truthful, and objective and that they are aware that their first duty is to assist the court.28
Conclusions
The results of this study show that the psychiatric diagnoses made by expert witnesses in criminal matters are generally reliable. An exception appears to be the diagnosis of PTSD, either because it was not relevant to criminal proceedings, or because it is difficult, with the current diagnostic criteria, to make the diagnosis in a reliable way in legal settings.
Acknowledgments
We thank Nicholas Cowdrey, QC, New South Wales Director of Public Prosecutions, for permission to conduct the study; Craig Hyland, Office of the Director of Public Prosecutions, for assistance in collating the reports; and Peter Arnold for invaluable assistance in preparing the manuscript.
Footnotes
Disclosures of financial or other potential conflicts of interest: None.
- American Academy of Psychiatry and the Law