Abstract
Despite the growing attention to quality and quality improvement in health care in the United States, forensic psychiatry has yet to incorporate relevant developments and information and make quality an important item on the agenda. This article reviews the empirical research regarding the perceived quality of forensic evaluations, which has primarily examined criminal rather than civil forensic evaluations. Beyond the available research, many important policy and empirical questions must be addressed, including the definition of a quality forensic evaluation, the process used to assess quality, the indicators and measures used, the methods that provide incentives for performing quality evaluations, the role of forensic psychiatry training programs, and the role of the American Academy of Psychiatry and the Law (AAPL) or other professional organizations in the quality improvement enterprise.
In 1996, Dietz1 encouraged forensic psychiatrists to aspire to excellence in their chosen field. He emphasized the need to search for the truth, to be honest, to disclose biases and credentials to potential clients, and to accept cases with caution. He asked us to disclose completely “all information, all reasoning, and all opinions” (Ref. 1, p 162). He recommended that we volunteer the weaknesses of the case, despite the client's resistance. He cautioned us not to serve as advocates for a political or social cause, much as a lobbyist might do, or as a frustrated lawyer advocating for a party to the litigation. Rather, he advised that we adopt the role of the forensic scientist, and present our findings and opinions with “scrupulous fairness” (Ref. 1, p 161). He noted that the “ultimate arbiters of excellence in forensic practice” are our peers rather than our clients, who are the attorneys and courts (Ref. 1, p 154).
Though useful, these ideas are preliminary to a more far-reaching goal of addressing quality and quality improvement in forensic evaluations.2 In this article, some important concepts and developments regarding quality improvement in clinical medicine and mental health care are reviewed. Knowledge of these developments should further efforts to improve forensic evaluations. Later, a review of empirical studies of the quality of forensic evaluations is provided. Finally, some of the questions regarding quality and quality improvement in forensic evaluations are discussed.
Quality and Quality Improvement in Clinical Medicine
Two significant themes in the culture of medical practice in the United States are now the focus of substantial attention: patient safety and quality of medical care.3,4 Patient safety is just one component of quality. Quality care is certainly safe, but it is also timely, patient centered, efficient, effective, and equitable (i.e., quality does not vary because of gender, ethnicity, location, and socioeconomic status).5 We might, therefore, refer to the quality of health care in the plural rather than the singular. There are health care “qualities” rather than one quality, and each should be delineated and assessed.6,7 The different stakeholders in the health care system may each have diverging goals for a quality system, with patients focused on effectiveness and safety, but insurers focused on efficiency. The concern about both health care safety and quality reflects increased scrutiny and demand for accountability by the greater, non-health care community.8
Superimposed on fiscal concerns is the more recent concern about patient safety, medical error, and the cost of professional liability insurance.3 The Institute of Medicine's 1999 landmark report on medical error, To Err is Human: Building a Safer Health System,9 contended that there are as many as 100,000 fatalities annually in hospitals in the United States due to medical errors or adverse events. This report has prompted substantial interest in examining systems of medical care rather than individual physicians as the ultimate cause of medical error. It is revolutionary for medicine to examine health care systems as the source of adverse events, rather than to blame individual practitioners.10,11
The Institute of Medicine has drawn further attention to health care quality through its publication, Crossing the Quality Chasm,5 which recommended a framework and strategies to improve health care quality. Toward this end, the Institute of Medicine has adopted the following definition of health care quality: Quality is the “degree to which health services for individuals and populations increase the likelihood of desired health outcomes and are consistent with current professional knowledge” (Ref. 5, p 232).
Quality can be evaluated based on the structure of health care (i.e., physical facilities, staffing, organization, financing, provider type and duties, utilization controls and incentives), its process (i.e., the delivery of care, such as actual interventions of assessment and treatment), or its outcome (i.e., the results of care, such as morbidity, mortality, and patient function and satisfaction).6 Clinicians may be more comfortable measuring the process of care because of the multidetermined nature of clinical outcomes, and purchasers are more interested in measuring patient satisfaction and utilization and the outcome of health care.7 Cost control efforts focus on process and outcome, rather than the structure of care.6 Generally, health care quality is a property of the health care system. But, to improve the daily practice of medicine, we should monitor its performance and assess change in health care practice, once quality improvement efforts have been implemented.7
National efforts to assess and improve health care quality have been ongoing for several years through numerous organizations and agencies.12 The Centers for Medicare and Medicaid Services (CMS) has published hospital morbidity and mortality data. Another federal agency, the Agency for Healthcare Research and Quality (AHRQ), has developed a computerized quality-measurement system, with a database of specific measures, and sets of clinical performance measures. The National Committee for Quality Assurance (NCQA), composed of managed health care organizations, has produced the Health Plan Employer Data and Information Set (HEDIS), which provides standardized evaluation criteria or performance measures to help employers select health plans; these criteria are predominantly medical rather than psychiatric. The Joint Commission on Accreditation of Healthcare Organizations (JCAHO) has developed its own system for accreditation and performance measures. Some states publish data on various providers' outcomes for selected diseases or procedures. With the diverging priorities of these organizations, it has not been possible to obtain a consensus on performance measures.4
Measures of health care quality are essential for health care purchasing, monitoring, accreditation, and advancement of performance.7 Outcome measures of care in medicine are increasingly evaluated and include appropriate use of medication for chronic disease (diabetes, heart disease, post-myocardial infarction, asthma, and hypertension), complication rates after surgery, prolonged hospital lengths of stay, and rehospitalization rates.4,12 Quality of medical care indicators for screening include screening for breast cancer, cervical cancer, colorectal cancer, childhood immunization, and cholesterol levels.4 Ratings of physician performance include patient satisfaction, clinical outcome, and compliance with practice parameters and professional guidelines.13,14 Patient satisfaction ratings include timely access to care, provider-patient communication, rating of medical care, office organization, and health care quality of life. Patient and family satisfaction ratings are often used but have been criticized as failing to measure quality of health care.
Empirical study of the quality of medical care actually delivered in the United States reveals disappointing results. McGlynn and colleagues15 determined that only one-half of the recommended preventive, acute, or chronic care was actually provided to patients. More surprising was that quality of delivered care varied substantially by medical condition. These data bring a sense of urgency to the failure of our health care system as a whole. Given the failure of traditional medical care, integrated disease management programs that educate consumers and support practitioners are being developed and implemented for some chronic diseases.12,16
There are marked disparities in health care access, use, quality, and outcome by race, ethnicity, income, education, place of residence, and sex.17,18 Such disparities persist, even when availability of health insurance is eliminated as a confounding variable. Clinician bias is one explanatory factor in these disparities. Efforts to improve the overall quality of medical care have been demonstrated to reduce, but not eliminate, these disparities.19
Some insurers and managed care organizations offer a performance or quality financial bonus for excellent health care.20,21 Employers, distressed over large increases in health insurance premiums, could require quality assessment and improvement efforts and reward providers of high quality, efficient health care who obtain better patient outcomes. Family practitioners in the United Kingdom can contract with the National Health Service to link physicians' pay to the quality of health care provided.22 Such a quality bonus initiative requires that the medical practice regularly collect data and hire additional staff. Investment in health care technology, such as electronic medical records and prescriptions, is thus encouraged to improve quality and efficiency and reduce medical error.
Current efforts to improve patient safety and health care quality use several methods. Some insurers and managed care organizations have published health care quality results, so-called report cards, for hospitals and outpatient practices.23–25 Morbidity and mortality conferences parade poor outcomes before the hospital medical staff or department faculty. Incident reports to the hospital, state department of health, or other state agency, are increasingly required. Hospitals and other facilities are accredited, based on a variety of quality-of-care indicators and measures. Confidential peer review is undertaken by hospital staff to an unknown extent.26 Professional liability suits are brought, in part, to deter future adverse outcomes and improve care. Most of these health care quality improvement mechanisms have not enjoyed significant implementation with regard to improving forensic mental health evaluations.
There are several barriers to repeated measuring of health care quality, including motivational, financial, organizational, and technical.4,5,7,15,22,25 Measurement systems should be developed, implemented, and financed, with many methodological challenges. Specific goals are needed to implement practice changes. Attitudinal and motivational problems apply to the individual clinicians and the hospital or health care system seeking quality improvement. Resources devoted to quality improvement could compromise attention to other health care needs, such as providing for the care of the uninsured and underinsured.
Many of these matters regarding the quality of health care can apply to the quality of forensic evaluations and will be elaborated herein. Regard for the numerous dimensions of health care quality, absence of consensus on performance measures, challenges in evaluating health care quality, importance of the system of care in understanding quality, attention to process and outcome of care, presence of several stakeholders in the endeavor, usefulness of technology in the improvement of quality, disappointing actual quality of health care, impact of financial considerations on health care quality improvement, and future use of quality report cards are each potential problems for the quality of forensic evaluations, as well.
Quality and Quality Improvement in Psychiatry
We know relatively little about the quality of mental health treatment programs, how well they work, and how to measure their effectiveness.27 Quality improvement research activity has focused on the detection and treatment of serious depression, schizophrenia, and suicide.
Quality indicators specific to psychiatry include time until treatment, detection of mental disorder, and adequacy of treatment. Specific quality measures include rehospitalization rates, time interval until an outpatient appointment after hospitalization, adequacy and duration of acute and maintenance antidepressant medication treatment, frequency and duration of use of hospital seclusion or restraint, and use of appropriate laboratory testing. Outcome measures include symptomatic improvement or remission, functional status, return to employment, or complications such as criminal arrest or suicidal behavior.28,29
Some researchers have investigated the adequacy of treatment of mental disorders by primary care physicians and psychiatrists.30,31 Results generally indicate that primary care physicians fail to detect depression and anxiety or to treat them properly.32 In one study, only 19 percent of those with a probable depressive or anxiety disorder who were seen by a primary care physician received appropriate care (i.e., two months of psychotropic medication in the past year, or at least four counseling visits).33 Appropriate care was less likely to be provided to those patients who were black, less educated, male, or younger than 30 or older than 59 years. When seen by primary care physicians, depressed patients are undermedicated, not treated at all, or not referred for counseling. Antidepressant medication prescriptions are written without follow-up, leading to the patient's noncompliance and discontinuation of medication and symptom chronicity.30–33
Quality improvement research has demonstrated improved outcomes for depression and schizophrenia, using models for the management of chronic illness. These models include training clinicians in appropriate care, repeated and systematic follow-up of patients, monitoring, outreach programs, telephone support of patients, and collaboration between the treating mental health professional and the primary care physician.31,34–36
Even in private practice, clinicians are increasingly using outcome measures. Hatfield and Ogles37 surveyed a national sample of psychologists, almost all of whom had doctoral-level education and had been licensed for a mean of 18 years. Of the sample (n = 874), 37 percent reported that they used some form of outcome assessment, including standardized and individualized measures. They reported both internal and external motivations for using outcome measures.
Consumer ratings of behavioral health services in managed health plans are now commonplace and are required by the National Committee for Quality Assurance (NCQA) accreditation process.38,39 Surveys ask consumers about their experience in locating a clinician, seeking approval from the health plan, selecting a clinician with whom they were satisfied, and the time taken to obtain treatment.
As noted earlier in the discussion of quality of general medical care, the problems and challenges with respect to quality improvement of mental health care are relevant to quality improvement efforts regarding forensic evaluations. The absence of consensus quality or performance measures for mental health treatment, the need to examine systems of mental health care, the attention to process and outcome dimensions of mental health care, the frequently low quality of measured actual mental health care, and the usefulness of consumer ratings of quality are relevant to the quality improvement process regarding forensic evaluations.
Empirical Data Regarding Quality of Forensic Evaluations
Relatively little conceptual analysis has been undertaken with regard to the quality of forensic evaluations. The published literature, with few exceptions, has not benefited from current knowledge and research findings obtained from the work reviewed in the prior section regarding the general quality of health care. The published literature on the quality of forensic evaluations has largely been empirical and has concentrated on the adequacy of evaluation reports.
A variety of empirical studies have been conducted relevant to standards and quality of forensic evaluations.40 Topics investigated include contents of actual forensic reports, desired contents of forensic reports, perceived deficiencies of reports and evaluations, and prevalence of the use of diagnostic tests. Some studies report only descriptive data regarding expert report contents, while others include quality ratings. The studies have included the perspectives of attorneys, judges, and forensic mental health experts. Relatively few studies have been conducted regarding the quality of forensic evaluations, except in the case of child custody evaluations. Many of the published studies consist of small sample sizes in a restricted geographical area. A representative sample of these will now be reviewed, in chronological order, dealing with general forensic practice, psychological test usage, child custody, and surveys of judges and attorneys.
General Forensic Practice
Petrella and Poythress41 examined reports of competency-to-stand-trial and criminal responsibility evaluations conducted at Michigan's Center for Forensic Psychiatry using several quality measures. Their principal concern was the respective quality of evaluations conducted by psychiatrists, psychologists, and social workers at the Center, all of whom had at least some forensic evaluation training and used a standardized evaluation format. They used two measures of an evaluation's thoroughness: first, the use of data sources other than the interview; and, second, the quantity and comprehensiveness of the evaluator's notes recording the interview. In addition, they had three outside raters (a practicing attorney, a trial judge, and a law professor) rate the quality of a sample of forensic reports (n = 30), using a nine-point Likert scale, on a set of quality measures (e.g., for criminal responsibility reports: use of proper legal criteria, a clearly stated opinion, adequate basis for opinion, use of psychiatric jargon, clinical characteristics of the defendant, and overall report quality). The obtained results were that reports by psychologists were more thorough than those by psychiatrists, based on the use of more collateral data sources. In trial competency reports, the frequency of contacting attorneys was 43 percent by psychiatrists, 45 percent by psychologists, and 53 percent by social workers. Review of the defendant's previous medical or psychiatric records was even less frequent in trial competency evaluation reports, with 10 percent by psychiatrists, 25 percent by psychologists, and 35 percent by social workers. Psychologists' reports were often considered to be of higher quality when rated blindly by the outside raters.
Heilbrun and Collins42 rated trial competency and criminal responsibility reports of evaluations conducted in the hospital and in the community in Florida. Raters examined preadjudication (i.e., those recommended as incompetent to stand trial) and postadjudication trial competency reports (n = 277) prepared by psychologists and psychiatrists. Only 19 percent of the latter were certified by the American Board of Psychiatry and Neurology. Some evaluations included both trial competency and criminal responsibility. The reports were relatively brief, with a mean length of 3.9 pages for both hospital and community evaluations. Only 30 percent of community-based evaluations included a notice of nonconfidentiality to the evaluee. Psychological testing (i.e., the MMPI and the Wechsler Adult Intelligence Scale-Revised [WAIS-R]) was used in 13 percent of the hospital-based evaluations and 41 percent of the community evaluations. Hospital-based evaluators reviewed previous mental health evaluations in 81 percent of cases, but community-based evaluators did so in only 30 percent of cases. Arrest reports were reviewed in 95 percent of hospital evaluations and 48 percent of community evaluations. Arrest reports were reviewed in 42 percent of competency evaluations, but in 67 percent of criminal responsibility evaluations. Jail staff were interviewed in 1 percent of hospital evaluations, but 17 percent of community evaluations. Evaluators did not contact attorneys when conducting trial competency evaluations. The reports addressed the relevant state legal competency criteria in 95 percent of hospital reports and 61 percent of community reports. An ultimate issue opinion was offered in 95 percent of hospital reports, 99 percent of community reports, and 47 percent of criminal responsibility reports. No quality ratings were conducted.
Borum and Grisso43 surveyed a national sample of forensic psychologists (n = 53) and psychiatrists (n = 43) concerning their beliefs about the appropriate content for reports (i.e., not evaluations) on competency to stand trial and criminal responsibility. This novel approach did not examine evaluators' reported or actual behavior as experts. Of the sample, 81 to 83 percent had forensic board certification (American Board of Forensic Psychiatry, American Board of Forensic Psychology), with an average of 17 years of experience in conducting forensic evaluations, and 52 to 54 percent of their practice consisted of forensic evaluations. Each had conducted an average of 48 trial competency evaluations and 35 criminal responsibility evaluations in the prior year. On the written survey, each subject rated the importance of potential content items for competency and responsibility reports on an importance scale of essential, recommended, optional, or contraindicated. The investigators defined a “consensus” on these ratings when a content item was endorsed by 70 percent or more of the respondents. Content and quality ratings of the experts' actual reports were not evaluated; the investigators considered only opinions about the appropriate content of criminal forensic reports. Given the large number of content items in both types of reports, there was considerable variability regarding what experts thought was essential or recommended. For instance, there was much disagreement about whether the defendant's description of the alleged offense should be included in trial competency reports. One-third of the psychiatrists thought that it was contraindicated in reports, but one-half of the psychiatrists believed that it was important to include.
Similarly, one-third of psychiatrists and psychologists thought that it was important to include the police version of the alleged offense in a trial competency report, but 27 percent of psychiatrists thought that inclusion was contraindicated. Only one-fourth of forensic psychologists thought that psychological testing was essential for competency and responsibility evaluations. On the one hand, two-thirds (67%) of psychiatrists indicated that ultimate issue opinions were essential for trial competency evaluations; on the other hand, 13 percent of psychiatrists indicated that it was contraindicated to include ultimate issue opinions in competency reports. Regarding criminal responsibility reports, 59 percent of psychiatrists thought that ultimate issue opinions were essential, but 20 percent thought that they were contraindicated.
Robbins and colleagues44 evaluated 66 systematically selected competency-to-stand-trial reports from New Jersey (inpatient evaluations) and Oklahoma (community-based evaluations). Two raters independently coded the reports to determine whether the reports provided relevant functional psycholegal ability data. In all but four reports, examiners expressed an ultimate issue opinion about the defendant's trial competency. More than 25 percent of the defendants were deemed “incompetent” in the reports by the examiners, but examiners sometimes failed to provide treatment recommendations to restore competency. In all but four reports, examiners recorded a psychiatric diagnosis, but only 27 percent of reports stated how the diagnosis affected the defendant's functional ability. Only 11 reports included psychological testing or review of the treatment record, but most of these failed to describe how these data related to functional deficits; 26 of the reports were based only on an interview with the defendant, without testing or third-party data. Examiners included extraneous information, such as treatment for conditions unrelated to trial competency deficits, treatment of individuals found to be competent, or opinions about risk of violence and criminal responsibility. The authors expressed disappointment in the quality of sampled competency reports. They recommended a national sample of trial competency assessment reports to determine practice in this area, and a standardized format for conducting and recording competency-to-stand-trial evaluations.
Skeem and colleagues45 analyzed community examiners' reports of trial competency in Utah conducted between 1991 and 1994. Two clinicians examined each of the fifty defendants. There were 18 examiners, 80 percent of whom were psychologists. Only two examiners had forensic diplomate status, so that most were general clinicians rather than trained forensic experts. With regard to the use of collateral data in the assessments, 65 percent of the examiners reviewed police reports, 37 percent reviewed mental health records, and 9 percent contacted defense counsel. Most examiners failed to consider the defendant's legal decision-making capacity, but they did address related abilities, such as appreciation of charges, knowledge of courtroom personnel, and ability to disclose information to counsel. Examiners typically failed to relate the defendant's psychopathology to impaired competency to stand trial. Even when examiners opined that the defendant was competent, they only infrequently provided details regarding the defendant's psycholegal abilities. The examiner pairs agreed as to clinical diagnosis (79%) and trial competence (82%), but not as to the specific trial competence domains.
Hecker and Steinberg46 examined 172 juvenile predisposition reports from a Philadelphia area juvenile court jurisdiction between 1992 and 1996. The purpose of the reports was to provide relevant information to the juvenile court judge to guide court disposition. The reports had been prepared by four licensed psychologists. Report contents were quantitatively rated on a three-point scale for each content area in question. The investigators also examined the concordance between the psychologists' recommendations and the courts' disposition of the case. Perhaps the most significant finding of the study was that only 7 percent of the reports were deemed to have sufficient or better ratings of the evaluators' explanation of disposition recommendations. Judges were more likely to implement the evaluators' recommendations when there was adequate mental health information presented in the report and the evaluator adequately explained the recommendations.
Ryba and colleagues47 surveyed psychologists regarding appropriate practice of competence-to-stand-trial evaluations of juveniles. Respondents (n = 82) were doctoral level psychologists throughout the United States with considerable experience in conducting adult and juvenile trial competency evaluations. The mailed, written survey instrument was modeled after that of Borum and Grisso,43 which solicited information about the importance of elements of the competency evaluation instead of examining evaluators' reports. Similar results were obtained in this study, despite the different subject pool and juvenile evaluation population, though more subjects in this study indicated that particular items were essential or required than in the earlier study. Again, 68 percent of the sample indicated that ultimate issue trial competency opinions were essential, but 10 percent thought that they were contraindicated. Differences in opinion on this matter may reflect divergent jurisdictional requirements. Ten percent thought that proffered opinions on factors other than the referral issue of trial competency were essential, but 33 percent thought that they were contraindicated. Regarding psychological testing, 44 percent indicated that testing was essential, and 35 percent recommended it, predominately intelligence testing. Forensic assessment instruments were thought to be essential in 30 percent of juvenile competence evaluation reports and recommended in 40 percent, predominately the Competence Assessment for Standing Trial for Defendants with Mental Retardation (CAST-MR). Subjects were also queried about the frequency of the use of tests and instruments, revealing somewhat lower frequencies. The investigators did not determine the accuracy of these self-reported test frequencies to ascertain whether evaluators actually do what they report.
Ryba and colleagues48 separately reported on the assessment of juvenile maturity by psychologists in competency-to-stand-trial evaluations. A wide variety of techniques and as many as 34 tests were reportedly used in this regard, suggesting the absence of a definitive standard of practice in this area.
Zapf and colleagues49 examined competency-to-stand-trial reports in Alabama that had been completed between 1994 and 1997. Only those reports in which the defendant was assessed as incompetent to stand trial were reviewed (n = 53). Similar to findings of other studies in this area, these reports omitted discussion of several psycholegal functional areas required by law. For instance, 22 percent of the reports did not address the defendant's understanding of the nature of the proceedings; 15 percent did not address the defendant's ability to participate effectively in the trial process; 93 percent failed to discuss the defendant's ability to appreciate his or her role in the legal proceedings; and 27 percent did not discuss the defendant's restorability to competence.
Christy and colleagues50 evaluated juvenile competency-to-stand-trial evaluations conducted in Florida between 1997 and 2001 by private practitioners, 95 percent of whom had doctoral or medical degrees. Only those evaluations of juveniles adjudicated incompetent to proceed and referred to the state's juvenile restoration program were reviewed (1,357 reports for 674 juveniles). The juveniles had a median of two reports each, and reports were a median of four pages in length (range, 0.5–15 pages). Nearly one-half (48%) of the reports failed to state where the evaluation was conducted. Arrest reports were reviewed in 38 percent of cases, school records in 12 percent, and interviews with defense attorneys in 2 percent. Sixty percent of reports indicated at least one third-party interview. Intelligence testing was used in 44 percent of cases, reflecting that mental retardation was the basis for a finding of incompetence in half the cases. Forensic assessment instruments were used in 29 percent of evaluations. Important cognitive mental status information was omitted in one-half the reports. Evaluators' descriptions of the defendant's psycholegal functioning were better than those of clinical functioning, but they often failed to provide examples or supporting data regarding psycholegal capacities. Ultimate-issue opinions were provided in 96 percent of cases, as required by Florida law. In only 62 percent of cases was the predicate condition for the incompetency finding stated in the report, contrary to legal requirements.
Warren and colleagues51 reviewed 5,175 criminal responsibility evaluations, over a 10-year period, conducted by 222 evaluators trained at the University of Virginia. Training was a five- to seven-day program that included a sample report segment and completion of a written examination. Results revealed that these evaluators spent three hours interviewing the defendant, and eight hours total on the case, including report preparation. A surprising finding was that evaluators often did not have available the defendant's criminal history, psychiatric or medical records, defendant's statements about the charges, or witness accounts. Psychologist evaluators used testing in 22 percent of cases, and psychiatrists did so in 6 percent. Of interest was the finding that less experienced evaluators were more likely to opine that the defendant met the standard for legal insanity. With regard to race, 8.5 percent of minority defendants were judged legally insane, compared with 11.4 percent of white defendants, raising a question about racial disparity in the conduct of forensic evaluations.
Psychological Test Usage
Psychological tests are an important and sometimes essential component of forensic evaluations. Testing, in theory, can improve the quality of forensic assessments, although data to establish this hypothesis are lacking. Yet, there are many relevant questions about test selection, administration, interpretation, qualification of the examiner, and admissibility in court under a general acceptance or peer-based admissibility criterion.52,53 Data about actual test usage can help to answer these questions.
Borum and Grisso43,54 studied the frequency of reported psychological test usage in criminal forensic evaluations, examining the same national sample reported earlier. Evaluators were asked their opinions about the importance of psychological testing in trial competence and criminal responsibility evaluations, as well as their reported use of tests and forensic instruments. They provided data on the types of tests and the specific tests used for these evaluations. Actual test usage was not determined, and the data were self-report only. Results indicated that psychiatrists and psychologists considered testing to be equally important (i.e., ratings of “essential” or “recommended”), but psychologists reported using or ordering tests more frequently than psychiatrists. Both groups considered psychological testing to be more important for conducting criminal responsibility than trial competency evaluations. Objective personality testing, especially with the MMPI-2, was the most commonly used psychological test for either evaluation.
Lees-Haley and colleagues55 assessed the frequency of use of various neuropsychological tests in adult personal injury evaluations by a national sample of 100 expert neuropsychologists. Only reports of evaluations of traumatic brain injuries and toxic exposures were considered. The study did not rely on the self-report and recall of evaluators. The study observed that no single neuropsychologist used exactly the same test battery as any other and that the number of tests administered ranged from 1 to 32, with a mean of 11.7. The WAIS/WAIS-R, the MMPI/MMPI-2, and the Wechsler Memory Scale (WMS/WMS-R) were the instruments most frequently used by examiners, and that finding represented a significant change from results of earlier studies of the use of neuropsychological testing.
Boccaccini and Brodsky56 examined the reported frequency of psychological testing by a national sample of forensic psychologists in emotional injury cases. The sample (n = 80) reported the percentage of evaluators who used a particular instrument, the percentage of cases in which a particular instrument was used, and the reasons for selecting a particular instrument. Results indicated that there was substantial variation of test selection across evaluators and that there was no standard assessment procedure, even in emotional injury cases. The MMPI was the only instrument used by more than one-half of the evaluators. The evaluator's clinical experience with a specific test was as important as the availability of test norms in selecting tests. The Rorschach was the fourth most popular test used. Evaluators typically used four or five instruments in each evaluation. Actual frequency of test usage was not studied or reported.
Lally57 surveyed a national sample of board-certified forensic psychologists regarding the frequency of use and opinions about acceptability of psychological test instruments in six criminal forensic practice areas. The evaluators (n = 64) were asked to rate specific instruments as to whether they were recommended, acceptable, or unacceptable and to report the frequency of test use. Depending on the type of forensic evaluation, evaluators used four to six tests per evaluation with at least minimal frequency. A test was categorized as acceptable when at least half of the sample rated it acceptable. Projective tests generally were viewed as unacceptable across forensic evaluations, although the Rorschach was viewed less unfavorably than the others. Actual test usage by the forensic psychologists was not calculated.
Much additional research is needed on actual and desired use of psychological testing and the frequency and rationale for test administration, each across a variety of forensic settings and content areas. Investigation regarding the incremental value (i.e., additional information) that is produced by testing in many forensic contexts must be conducted. In many forensic evaluation contexts, it remains uncertain what tests, if any, and administered by whom, are essential to a quality forensic evaluation.
Several studies regarding psychological test selection are discussed in the next section with regard to child custody evaluations.
Child Custody Evaluation Practices
Keilin and Bloom58 sent a written questionnaire to psychologists, psychiatrists, and masters-level practitioners who were child custody experts in private practice to assess their “activities, beliefs, and experiences relating to child custody” (Ref. 58, p 339). The evaluators (n = 82) provided data on the frequency of use of various custody evaluation procedures, such as individual interviews, observations of interactions, and school or home visits. They reported the use and frequency of use of various psychological tests for the parents and children, but did not report the overall frequency of testing. Overall, a mean of 19 hours was reportedly spent on each custody evaluation. The evaluators were asked to rate the importance of various decision-making criteria for custody in hypothetical cases based on a nine-point Likert scale (e.g., child's preferences, parental alienation, quality of bonding, parental stability). They reported their actual and preferred child custody recommendations.
Ackerman and Ackerman59 surveyed 201 doctoral-level psychologists from 39 states about 112 aspects of child custody evaluations. Almost all of the sample (88%) worked in private practice. The survey paralleled that of Keilin and Bloom,58 but excluded psychiatrists and social workers. Eight percent of respondents stated that they did not test children, and two percent did not test adults. Respondents reportedly spent a mean of 26 hours on each custody evaluation, excluding consulting with attorneys and court testimony.
LaFortune and Carpenter60 surveyed by mail mental health professionals from five states who were experienced child custody evaluators. No psychiatrists were among the sample (n = 165). The investigators did not review child custody reports, but, using a five-point scale, queried the sample about their training and experience, interaction with the legal profession, desirable characteristics of evaluators, preferences for retention, and the role of experts. The respondents reported spending 21 hours completing a custody evaluation, including report writing. An “advocacy index” was presented based on the sample's response to questions about involvement with the retaining attorney.
Bow and Quinnell61 evaluated psychologists' reported practices in child custody evaluations. Respondents (n = 198) were a national sample of clinical and forensic psychologists with master's or doctoral degrees, with 92 percent in private practice. Evaluators were asked to rank child custody evaluation procedures according to their importance and to state their preferred procedures and the average time spent on them. They rated the importance of statutory child custody criteria. Nearly the entire sample (94%) reported that they made an explicit custody/visitation recommendation in their reports. Custody reports averaged 21 pages, with a range of 4 to 80 pages. They also reported use of psychological tests and parent-rating scales. Psychological testing was considered adjunctive to clinical interviews with the parents and children, rather than the primary procedure, though the parent was reportedly administered an MMPI in 88 percent of cases.62
Bow and Quinnell63 examined 52 child custody reports prepared by a national sample of doctoral-level psychologists. The reports ranged from 5 to 63 pages, with a mean of 24 pages. The report format and content varied widely. Specific problems noted by the investigators included failure to identify the specific evaluation procedures, the reason for the referral, or the specific testing used. Some reports did not use collateral contacts, demonstrated over-reliance on testing, were adult rather than child focused, or were conducted by an evaluator who had served as a therapist.
Surveys of Judges and Attorneys
Owens and colleagues64,65 surveyed and interviewed two small samples of trial judges in New York about their reasons for ordering trial competency evaluations and their satisfaction with them. Generally, they indicated that the evaluations were useful to them and had few complaints about them. However, some judges were interested in defendants' backgrounds and clinical data and often transformed the competency evaluation into an evaluation of violence risk, need for treatment, or sentencing.
LaFortune and Nicholson66 conducted via mail a semistructured survey of Oklahoma judges and attorneys about the adequacy of already submitted competency-to-stand-trial evaluations in two metropolitan areas. The subjects (n = 110) were asked to rate retrospectively six characteristics of expert reports provided by mental health professionals, using a five-point Likert scale: (1) timeliness of the reports; (2) familiarity of the examiners with legal criteria; (3) use of understandable language by the examiners; (4) presence of a factual basis for examiners' conclusions; (5) usefulness of reports in decision-making; and (6) overall quality of reports. Outpatient competency reports were judged to be of higher quality than inpatient reports. Subjects also reported their perceptions of the actual and optimal frequency of describing defendant characteristics in competency reports, typically indicating that reports tended to summarize rather than detail the relevant functional capacities. Generally, attorneys requested more specificity in reports, such as “more detailed descriptions of the factual bases for legal conclusions” (Ref. 66, p 248). Attorneys also stated their request for criminal responsibility information and opinions, even though that is an extraneous issue.
Redding and colleagues67 surveyed by mail experienced Virginia judges, prosecutors, and criminal defense attorneys about the testimony of mental health experts, by using a hypothetical insanity defense case. Questionnaires including vignettes were sent to the subjects, who were asked to rate the importance of eight types of mental health testimony evidence on a nine-point Likert scale. The subjects (n = 72) responded that they were primarily interested in the defendant's clinical diagnosis, whether the clinical condition met the relevant criminal responsibility standard, and the expert's ultimate opinion on the legal issue. The subjects were less interested in research or actuarial evidence from experts, such as diagnostic reliability and statistical crime data related to the diagnosis. Ratings of judges and prosecutors correlated strongly, but defense attorneys ranked clinical diagnosis, statistical data on diagnostic reliability, and explanations of criminal behavior to be more probative than did the other two groups. This difference probably reflects the divergent roles and needs of the subjects for mental health testimony. These responses from members of the bar highlight the hazards of over-reliance by judges and attorneys on conclusory legal testimony by mental health experts, as well as the need to educate the judiciary about the value of social science.
Dahir and colleagues68 conducted a telephone and mail survey of a national sample of 325 state trial judges in 1998 and 1999. While judges supported their gatekeeping role as defined by Daubert,53 they lacked the scientific literacy required by Daubert. Few judges understood concepts such as falsifiability or error rate.69 They therefore ruled, improperly, on the admissibility of psychological syndrome evidence in cases at bar, based primarily on the qualifications of the expert and general acceptance considerations.68 Additional training to improve judges' scientific understanding was recommended.
Analysis of Empirical Forensic Practice Studies
There are many limitations to the research that has been conducted regarding actual or reported forensic practice, though the studies provide some useful information. These empirical studies almost exclusively rely on expert reports, which, as noted, represent just one component of the evaluation. Characteristic of any documentation, the forensic report is only a window into the evaluation itself and is therefore limited. The report can only be as good as the evaluation that precedes it. We can work toward improving report content, writing, exposition, critical thinking, and decision making, but that effort goes only so far. In addition, the studies focus on criminal forensic evaluations in state court and child custody evaluations, with much less empirical literature on other forensic evaluations. Studies of trial competence evaluations predominate over those of criminal responsibility. Generalization of the study data is an important concern, given the limited sample sizes, sampling procedure, geography, legal jurisdiction, statutory requirements, characteristics of defendants and examiners, and location of evaluation in the hospital or community. Subject selection and recruitment are generally not random, and it is often unclear what would constitute a random sample of evaluators.
As yet, there are no published studies of observed forensic evaluations, or of observed evaluations in conjunction with written reports. The studies reveal problems in expert reports, such as the absence of stated reasoning to support the expert opinion (i.e., on causation, psycholegal function), the use of ultimate issue conclusions, the offering of opinions on extraneous areas, and the failure to acknowledge the limitations of the report or evaluation.45,70 Only some of the studies included designated quality measures, which are necessary to evaluate the adequacy of forensic reasoning and decision making in evaluation reports. Some studies inquire about the beliefs of evaluators, or their report of the frequency with which they use a test or instrument, but there is no accompanying evidence of the accuracy of these self-reports. Thresholds of evaluation quality probably differ from study to study.
Nevertheless, the studies raise substantial questions about the quality and thoroughness of criminal forensic evaluations, given the variability of the use of collateral information, contact with an attorney, psychological testing, and use of forensic assessment instruments.40,45,70 One of the most common report weaknesses is the failure to substantiate expert opinions and the related failure to relate psychopathology to expert opinions regarding psycholegal abilities.40,45,70 The relationships among symptom, diagnosis, psychological test-identified deficits, and psycholegal functional impairment are too often neglected.45,70 Failure to explicate these links comprehensively, including the evaluator's reasoning in reaching opinions, is likely to render the evaluation report less useful and cogent to the attorney or court that requested it. While there is sometimes consensus about evaluators' views regarding the desired content of reports, there is often lack of such consensus, depending on the specific content.
The empirical studies of psychological test usage reveal different types of information: reported frequency of test usage, importance of the test, acceptability of test usage, and reasons for test selection. Most of the studies are limited by reliance on the evaluators' retrospective memories for test selection and use, which are likely to be highly fallible. Test selection is reported without regard for the particular referral or clinical situation.62 In the only study to examine actual test usage, Bow and Quinnell63 examined actual reports submitted to them by evaluators. Some tests were judged acceptable for some forensic evaluations but not for others. Examining reported test use is not equivalent to judging whether a test satisfies general acceptance criteria under Frye52 or Daubert.53,57,71 Even if a test is judged as recommended by evaluators, such an opinion does not indicate that the standard of quality care requires such test use. It is noteworthy that the studies often reveal great divergence in test selection for forensic evaluations.
Practice Parameters and Guidelines for Forensic Evaluations
Examination of practice guidelines with regard to conducting forensic evaluations is relevant to determining evaluation quality and to quality improvement. Practice guidelines recommend specific professional conduct but usually differ from standards that are mandatory and prescriptive.72 They are designed to assist and educate clinicians and patients, rather than to legislate practice in a particular area. In contrast, medical review criteria are designed to evaluate health care outcomes and decisions and can differ from guidelines.73 Practice guidelines are typically prepared by many of the most knowledgeable experts in a particular area and are sponsored by professional medical societies, health plans, or government agencies. Ideally, they reflect professional consensus in a field. Practice guidelines are much more prevalent for medical treatment than for evaluation, and some organizations differentiate between treatment guidelines and practice guidelines.74 There are thousands of published practice guidelines in medicine, but relatively few for forensic evaluations.75
Existing practice guidelines pertinent to forensic evaluations include those for child custody,76–79 child and adolescent abuse victims,80 child protection,81 juvenile sex offenders,82 conduct disorder,83 post-traumatic stress disorder,84,85 criminal responsibility,86 and physicians' fitness for duty.87 The practice guideline regarding suicide, published by the American Psychiatric Association, also discusses suicide assessment.88 These guidelines are applicable to both generalist and specialist practitioners in the field.
Although practice parameters in forensic psychiatry can be useful in determining what constitutes a quality forensic evaluation, practice parameters are problematic in this regard. Most guidelines are written generally rather than in a detailed, specific, cookbook-like manner and cannot be readily translated or put into practice or even serve as practice review criteria.89 Many are presented as guidelines that are to be aspired to rather than adhered to. Practice guidelines generally are slow and expensive to develop and are soon out of date.90 Many guidelines are not evidence based, but only state the beliefs and practices of the authors. Many do not properly synthesize the available scientific evidence.91 Other barriers to successful implementation include that physicians are unaware of their existence, lack knowledge of their content, disagree with their parameters, have negative expectations regarding their likely outcome, and have difficulty changing practice habits.92 External barriers to their use include lack of time and resources, organizational constraints, lack of reimbursement, inability to generalize their use to every situation, fear of liability, and difficulty of use.92 Nevertheless, quality improvement activities in forensic evaluations must consider the extant forensic practice parameters. Much work remains to be accomplished in the development of forensic practice guidelines that reconcile the professional differences across all types of forensic practitioners regarding optimal practice.
Quality Concerns in Forensic Evaluations
The factors in quality of health care that were discussed earlier are relevant to our approach to quality assessment and improvement of forensic evaluations. As described earlier, there are several dimensions or domains of forensic evaluation quality. An evaluation, therefore, can be evaluated and improved in different respects. In this regard, the qualities of a forensic evaluation include timeliness, safety to evaluee and evaluator, effectiveness, efficacy, efficiency, evaluee-centeredness, and absence of disparity. Quality can be assessed by examination of the evaluation process, structure, and outcome. In these different respects, quality ratings can be individually or simultaneously conducted by the evaluator, evaluator's employer, client/payer, evaluee, professional peer, and others.
Quality indicators for forensic evaluations (i.e., general areas of interest but not specific measures) remain to be developed, after additional research. Typical indicators might include expertise (training and experience), the expert's role, data that form the basis of the evaluation, demeanor during interview, and expert decision-making. Specific measures regarding each indicator would then be developed. For the indicator of demeanor or communication with the evaluee, for instance, specific items to be measured could include whether the evaluator is disrespectful, argumentative, cynical, rejecting, angry, or biased. Table 1 lists several other sets of measures for evaluations. As in the general health care setting, achieving consensus regarding performance measures is a critical task for the forensic mental health field, and has yet to be achieved.
As in general medical care, there are many stakeholders with regard to forensic evaluations, and their repeated input is essential. Arguably, our peers are the best judges of the quality of our work—better than the retaining attorneys. Attorneys excessively rely on cross-examination of court testimony to expose the weaknesses and failures of forensic evaluations, with various degrees of success.93–95 Cross-examination is not, however, a quality improvement mechanism, but has different purposes. It is unsurprising, therefore, that attorneys, who are the ones who typically retain evaluators, are complacent about the deficient quality of forensic evaluations and reports. Nevertheless, soliciting client feedback through the use of satisfaction ratings could be useful. These ratings would be completed by the client (i.e., attorney, court, agency, evaluee, payer). Ratings could occur on a prospective or retrospective basis, after every evaluation, or a subsample of them. Table 2 lists several possible areas of inquiry. The primary purpose of the client satisfaction instrument should be for quality improvement rather than marketing.
A major lesson for those who work in the forensic mental health field, derived from the work on quality in health care, is that, rather than examine only the work of the individual clinician, we must consider the entire system of care. Third-party and other collateral data are essential for conducting forensic mental health evaluations, and barriers to obtaining such information must be eliminated.96,97 Process and outcome must be addressed as well. The costs of quality assessment and improvement must be considered, perhaps tempered by the ability in the future to adopt technology, such as software applications, to assist with review of expert reports in critical areas.
Raising the Quality of Forensic Evaluations
The general principles of quality improvement are useful in the context of advancing the quality of forensic evaluation. Initially, a quality improvement agenda must be established, with advance planning, motivation, and incentives. There must be financial support for quality improvement efforts. Quality indicators and measures must be developed and psychometrically evaluated. Other principles include the use of specific quality improvement interventions, regular audit and feedback, empirical assessment of change, and assessment of the barriers to change in individuals and the system.
As discussed, the empirical literature regarding forensic evaluations documents the need to raise the quality of evaluations and reports (Table 3). This is especially a concern regarding evaluations conducted by the generalist clinician who has no formal training or supervised experience in conducting forensic evaluations. Tolman and Mullendore102 reported that risk assessments performed by forensic specialists were of higher quality than those conducted by generalists with regard to the use of psychological assessment instruments; further, specialists were more familiar with the risk assessment literature than were generalists. In general medicine, research has been conducted on the relationship between board certification and clinical patient outcomes, with mixed results, though many studies reveal a positive correlation in this regard.103 As to forensic mental health evaluations, Otto and Heilbrun104 delineated several categories of forensic experts: accidental experts brought into court by their patient's attorney; legally informed clinicians without specialized forensic experience or training; proficient clinicians with supervised forensic experience; and specialist clinicians with the highest level of forensic experience, formal training, and board certification.104 Further research regarding the respective quality of evaluations conducted by these categories of experts is needed. Quality improvement approaches probably should be individualized for the different levels of practitioners, depending on the identified deficiencies.
Review of quality improvement activities in general medicine leads to the conclusion that no single, specific intervention is likely to be successful in improving the quality of forensic evaluation. Rather, a variety of interventions is appropriate, depending on the forensic service organization, model of service delivery, density of local forensically trained specialists and educators, and other factors. Table 4 lists some of these interventions. Successful interventions are not likely to be single events, but will require repetition and modification over time.
Perhaps the most significant problem area identified by the empirical studies of forensic reports is the evaluator's failure to state the reasoning by which the opinions are reached.40,45 Evaluators have repeatedly failed to link psychopathology, psychological test results, psychiatric diagnosis, and legal functional impairments. This omission is characteristic of both forensically trained and untrained evaluators. For maximum effect, quality improvement activities must focus on this deficit and on the other frequently identified evaluation weaknesses, such as inadequate use of collateral informants and data.96,97 Such an improvement will strengthen the evaluation's credibility and utility to attorneys and courts.
Several states have, by statute or general practice, developed credentialing procedures for forensic evaluators, usually in criminal or child custody cases.106 Those evaluators have attended mandatory training and may have completed a local examination. Recertification is, or could be, accomplished by incorporating quality audits of the evaluator's work product. The Commonwealth of Massachusetts, a leader in this respect, has had for many years a routine quality review and improvement component of expert reports for Designated Forensic Professionals who conduct criminal pretrial evaluations.107,108 Similarly, criminal court clinics are ideally situated to institute quality review activities of their employee-evaluators, as a condition of employment and promotion. Forensic credentialing for faculty can also be accomplished by university departments of psychiatry and hospital medical staff committees. The forensic mental health field could encourage legislation or policy that further promotes forensic credentialing, accompanied by mandatory quality audits and improvement activity. Volume credentialing requirements could also be set as they are for radiologists (i.e., mammography) and cardiothoracic surgeons, based on the reported association between volume and evaluator performance demonstrated even in mental health treatment.109–112 Specific credentialing for types of forensic evaluations could be implemented. For instance, mandatory forensic credentialing for participation in pretrial or postconviction death penalty cases is desirable, given the importance of such evaluations.113
The Maintenance of Certification program of the American Board of Medical Specialties is intended to increase the “accountability, service and quality of medical care” (Ref. 114, p 4). Components of this program include self-assessment, life-long learning, and performance in practice.103 Though problems with physician clinical performance assessment have been identified, the assessment of the competence and performance of forensic evaluators is certainly a feasible enterprise once consensus on performance measures has been obtained.115,116
Credentialing and recredentialing should require some review by peers of one's actual, not reported, work. Peer review of forensic testimony has been offered through the national and local efforts of members of the American Academy of Psychiatry and the Law (AAPL), although few expert witnesses have participated.117,118 No attempt has been made at peer review of actual evaluations, which consumes more resources but is likely to be more valuable.
Role of AAPL
AAPL has many opportunities to initiate quality improvement activities in forensic psychiatry. Educational, consultative, and research activities are within the purview of AAPL. The organization, through its committees or new task forces, could identify quality indicators, measures, and barriers to change. AAPL could prepare resource documents, position statements, and practice guidelines regarding evaluation of quality and quality improvement. The Awards Committee could acknowledge forensic service programs that have instituted quality improvement activities, much as the American Psychiatric Association gives awards for exemplary clinical service programs. The newly created AAPL Institute for Education and Research could play a substantial role in funding researchers to develop quality indicators and measures or to train forensic evaluators to improve the quality of their work. Beyond AAPL's traditional involvement in education and research, an expanded role could include performing quality audits of forensic evaluations for organizations on request.
Liaison among forensic mental health organizations and groups is essential in the quality improvement enterprise, and AAPL should be a leader in this field. Forensic psychiatrists often work on individual cases in conjunction with clinical or neuropsychologists. Almost all of the empirical work on forensic reports has been conducted by psychologists, and the subjects of those investigations have predominantly been psychologists. The studies reveal significant differences between psychiatrists and psychologists in the frequency of use of psychological testing and other evaluation modalities. Even among psychologists, selection of tests and forensic assessment instruments varies widely in forensic evaluations. The role and appropriateness of the psychological tests administered by psychiatrists have not been explicated. Consensus among differing forensic mental health practitioners in these areas is important.
Conclusions
Much work remains to be accomplished in improving the quality of forensic evaluations conducted by evaluators with or without specific forensic training and experience.119 There are numerous barriers to quality improvement, including motivational or psychological factors. Of great concern is that practitioners are likely to be satisfied with the quality of their evaluations, believe that little improvement is needed, and fail to undertake self-assessment or quality improvement unless externally mandated.102 Imposing an external quality agenda or standard may be considered a threat to the evaluator's autonomy and narcissism.120 Within organizations, barriers include financial and bureaucratic resistance to change. Scientific barriers include the absence to date of established quality indicators and performance measures. There is resistance to change within the legal system, including infrequent use of court-appointed evaluators, except in child custody cases. Despite the adoption of gatekeeping efforts incorporated in the rules of evidence, our adversarial legal system has excessively relied on cross-examination as its primary filter for excluding pseudoscience and inadequate expertise.53 Legal issues such as confidentiality and privilege of evaluations and reports must be addressed before they are shared with quality improvement staff or peers.99 Further, due to extant financial incentives, there is, regrettably, a market for mediocre quality forensic evaluations conducted with questionable or antiquated methods.121,122
In the long-term future, we expect that quality improvement at a more sophisticated level will transcend anything discussed heretofore. We hope to move beyond the exclusive reliance on unguided clinical judgment as the basis for expert forensic opinion, given its well-documented fallibility.123,124 Improvements in clinical and forensic decision-making are essential developments for the field.125,126 Just as technological developments will no doubt be used to improve the quality of general health care, we hope to see computer-assisted decision-making for risk assessments and other forensic evaluations,127–130 use of forensic cognitive neuroscience tools and specific forensic assessment instruments,97 and objective psychophysiological testing. We need a more complete and scientific understanding of self-report,131 clinical judgment,124–126 heuristics,126,132 and memory,133,134 given their central roles in forensic evaluation content and expert decision making. We also need improved methods of detecting, quantifying, and correcting for bias and lack of objectivity in experts.135–138
- American Academy of Psychiatry and the Law