A comparative study of violence risk assessment tools: A systematic review and metaregression analysis of 68 studies involving 25,980 participants

https://doi.org/10.1016/j.cpr.2010.11.009

Abstract

There are a large number of structured instruments that assist in the assessment of antisocial, violent and sexual risk, and their use appears to be increasing in mental health and criminal justice settings. However, little is known about which commonly used instruments produce the highest rates of predictive validity, and whether overall rates of predictive validity differ by gender, ethnicity, outcome, and other study characteristics. We undertook a systematic review and meta-analysis of nine commonly used risk assessment instruments following PRISMA guidelines. We collected data from 68 studies based on 25,980 participants in 88 independent samples. For 54 of the samples, new tabular data was provided directly by authors. We used four outcome statistics to assess rates of predictive validity, and analyzed sources of heterogeneity using subgroup analysis and metaregression. A tool designed to detect violence risk in juveniles, the Structured Assessment of Violence Risk in Youth (SAVRY), produced the highest rates of predictive validity, while an instrument used to identify adults at risk for general offending, the Level of Service Inventory – Revised (LSI-R), and a personality scale commonly used for the purposes of risk assessment, the Psychopathy Checklist – Revised (PCL-R), produced the lowest. Instruments produced higher rates of predictive validity in older and in predominantly White samples. Risk assessment procedures and guidelines by mental health services and criminal justice systems may need review in light of these findings.

Research Highlights

► Meta-analysis of violence risk assessments from 68 studies based on 25,980 participants.
► The predictive validity of tools measured by four outcome measures varies widely.
► Risk measures are more valid in older samples and when predicting violent offending.
► Tools designed for specific populations perform better.
► Actuarial tools and structured clinical judgment perform comparably.

Introduction

Risk assessment tools assist in the identification and management of individuals at risk of harmful behaviour. Due to the potential utility of such tools, researchers have developed many risk assessment instruments, the manuals for which promise high rates of construct and predictive validity (Bonta, 2002). Recent meta-analyses have identified over 120 different risk assessment tools currently used in general and psychiatric settings (for a metareview, see Singh & Fazel, 2010). These measures range from internationally utilized tools such as the Historical, Clinical, Risk Management – 20 (HCR-20; Webster, Douglas, Eaves, & Hart, 1997) to locally developed and implemented risk measures such as the North Carolina Assessment of Risk (NCAR; Schwalbe, Fraser, Day, & Arnold, 2004). Given the large selection of tools available to general and secure hospitals and clinics, prisons, the courts, and other criminal justice settings, a central question is which measures have the highest rates of predictive accuracy. To date, no single risk assessment tool has been consistently shown to have superior ability to predict offending (Campbell et al., 2007, Gendreau et al., 2002, SBU, 2005, Walters, 2003), and several major uncertainties remain regarding the populations and settings in which risk assessments may be accurately used (Leistico et al., 2008, Guy et al., 2005, Schwalbe, 2008, Smith et al., 2009).

Such uncertainties are important given that risk assessment tools have been increasingly used to influence decisions regarding accessibility of inpatient and outpatient resources, civil commitment or preventative detention, parole and probation, and length of community supervision in many Western countries including the US (Cottle et al., 2001, Schwalbe, 2008), Canada (Gendreau et al., 1996, Hanson and Morton-Bourgon, 2007), UK (Kemshall, 2001, Khiroya et al., 2009), Sweden (SBU, 2005), Australia (Mercado & Ogloff, 2007), and New Zealand (Vess, 2008). Recent work has suggested that the influence of risk assessment tools appears to be growing in both general and forensic settings. For example, violence risk assessment is now recommended in clinical guidelines for the treatment of schizophrenia in the US and the UK (American Psychiatric Association, 2004, National Institute for Health and Clinical Excellence, 2009). In the US, risk assessment tools are used routinely in the mental health care systems of the majority of the 17 states that have civil commitment laws (Mercado & Ogloff, 2007). Recent studies in England have found that two thirds of mental health clinicians in general settings are using structured risk assessment forms (Higgins, Watts, Bindman, Slade & Thornicroft, 2005), as are clinicians working in over 70% of forensic psychiatric units (Khiroya et al., 2009). Risk measures are also being used with increasing regularity in both criminal and civil court cases in the US and the UK (DeMatteo and Edens, 2006, Young, 2009). The widespread, often legally required use of risk measures (Seto, 2005) necessitates the regular and high-quality review of the evidence base.

The research base on the predictive validity of risk assessment tools has expanded considerably; however, policymakers and clinicians continue to be faced with conflicting findings of primary and review literature on a number of central issues (Gendreau et al., 2000, Singh and Fazel, 2010). Key uncertainties include:

  • (1) Are there differences between the predictive validity of risk assessment instruments?

  • (2) Do risk assessment tools predict the likelihood of violence and offending with similar validity across demographic backgrounds?

  • (3) Do actuarial instruments or tools which employ structured clinical judgment produce higher rates of predictive validity?

There is contrasting evidence as to whether risk assessment tools are equally valid in men and women. Several recent reviews have found no difference in tool performance between the genders (e.g., Schwalbe, 2008, Smith et al., 2009). Schwalbe (2008) conducted a meta-analysis of the validity literature on risk assessment instruments adapted for use in juvenile justice systems and found no differences in predictive validity based on gender. This finding was supported by a meta-analysis by Smith, Cullen, and Latessa (2009), who found that the Level of Service Inventory – Revised (LSI-R) produced non-significantly different rates of predictive validity in men and women. In contrast, other recent meta-analyses have found that the predictive validity of certain risk assessment tools is higher in male juveniles (Edens, Campbell & Weir, 2007) or in women (Leistico et al., 2008).

Another uncertainty is whether risk measures' predictive validity differs across ethnic backgrounds. There is evidence from primary studies and meta-analyses that risk assessment tools provide more accurate risk predictions for White participants than for participants from other ethnic backgrounds (Bhui, 1999, Edens et al., 2007, Långström, 2004, Leistico et al., 2008). This variation may be due to differences in the base rate of offending among individuals of different ethnicities (Federal Bureau of Investigation, 2002). These differences are seen in inpatient settings (Fujii et al., 2005, Hoptman et al., 1999, Lawson et al., 1984, McNiel and Binder, 1995, Wang and Diamond, 1999) and on discharge into the community (Lidz, Mulvey, & Gardner, 1993). Contrary evidence has been provided by reviews that assessed the moderating influence of ethnicity on predictive validity in White, Black, Hispanic, Asian, and Aboriginal participants and found no differences (Guy et al., 2005, Edens and Campbell, 2007, Schwalbe, 2007, Skeem et al., 2004).

Previous meta-analyses (e.g., Blair et al., 2008, Guy, 2008, Leistico et al., 2008) have found that participant age does not affect the predictive validity of risk assessment tools. However, epidemiological investigations and reviews (e.g., Gendreau et al., 1996) have found that younger age is a significant risk factor for offending. Therefore, we investigated the influence of age on predictive validity in the present meta-analysis.

Actuarial risk assessment tools estimate the likelihood of misconduct by assigning numerical values to risk factors associated with offending. These values are then combined using a statistical algorithm that translates an individual's total score into a probabilistic estimate of offending. The actuarial approach aims to ensure that each individual is appraised using the same criteria and can therefore be compared directly with others assessed using the same tool, regardless of who conducted the assessment.
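To make the actuarial logic concrete, the following is a minimal Python sketch of how such a tool might combine item ratings into a total score and map that score onto a probabilistic estimate. The item names, weights, and score bands are entirely hypothetical and do not correspond to any published instrument; real tools derive their items and probability tables empirically from validation samples.

```python
# Illustrative sketch of an actuarial scoring scheme. All items, weights, and
# score bands below are hypothetical and are NOT taken from any real instrument.

# Hypothetical risk items, each rated 0/1 from file information.
ITEM_WEIGHTS = {
    "prior_violent_offence": 3,
    "age_under_25_at_index": 2,
    "substance_misuse_history": 1,
    "employment_instability": 1,
}

# Hypothetical lookup table mapping total-score bands to estimated recidivism
# rates; real tools derive these bands empirically from construction samples.
SCORE_BANDS = [
    (0, 1, 0.05),   # total score 0-1 -> 5% estimated rate
    (2, 4, 0.20),   # total score 2-4 -> 20%
    (5, 7, 0.45),   # total score 5-7 -> 45%
]

def total_score(ratings: dict) -> int:
    """Sum the weighted item ratings; every examinee is scored on the same items."""
    return sum(ITEM_WEIGHTS[item] * int(present) for item, present in ratings.items())

def estimated_risk(score: int) -> float:
    """Translate a total score into a probabilistic estimate via the lookup table."""
    for low, high, rate in SCORE_BANDS:
        if low <= score <= high:
            return rate
    return SCORE_BANDS[-1][2]

ratings = {"prior_violent_offence": 1, "age_under_25_at_index": 1,
           "substance_misuse_history": 0, "employment_instability": 1}
score = total_score(ratings)
print(score, estimated_risk(score))   # prints: 6 0.45
```

Because the mapping from ratings to risk estimate is fixed in advance, two raters who score the items identically will always obtain the same estimate, which is the property the actuarial approach is designed to guarantee.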

The individual administering the assessment is thought to play a critical role when clinically based instruments are used. Broadly, clinical approaches to risk assessment can be dichotomized into unstructured clinical judgment and structured clinical judgment. Unstructured clinical judgment refers to a clinician's subjective prediction of whether an individual is likely to offend or not. No pre-specified set of risk factors is used to guide the clinician's analysis, which relies on professional experience for accuracy in assessing the likelihood of offending (Hanson, 1998). Recent reviews have suggested that this form of risk assessment has poor predictive validity (Daniels, 2005, Hanson and Morton-Bourgon, 2009). The poor performance of unstructured clinical judgment is thought to be a consequence of its reliance on subjective risk estimates that lack inter-rater and test–retest reliability (Hanson & Morton-Bourgon, 2009).

To increase construct validity and reliability, the authors of risk assessment tools developed new measures that adopt an approach known as structured clinical judgment (SCJ). In this approach, clinicians use empirically-based risk factors to guide their predictions of an individual's risk for offending (Douglas, Cox & Webster, 1999). Advocates of this approach believe that SCJ does more than simply assess risk: it also provides information that can be used for treatment planning and risk management (Douglas et al., 1999, Heilbrun, 1997). While past reviews have provided evidence that actuarial tools produce higher rates of predictive validity than instruments that rely on structured clinical judgment (Hanson and Morton-Bourgon, 2009, Hanson and Morton-Bourgon, 2007), other researchers have presented evidence that both forms of risk assessment produce equally valid predictions (Guy, 2008). Due to this uncertainty, we investigated this issue in the present meta-analysis.

Secondary uncertainties in the field of risk assessment include whether study design characteristics such as study setting, prospective vs. retrospective design, or length of follow-up influence predictive validity (Singh & Fazel, 2010).

Prisons, psychiatric hospitals, courts, and the community are the typical settings for research in the forensic risk assessment literature (Bjørkly, 1995, DeMatteo and Edens, 2006, Edens, 2001, Bauer et al., 2003). Meta-analytic evidence regarding the moderating role of study setting on effect size is mixed (Leistico et al., 2008, Skeem et al., 2004). Some experts (e.g., Edens, Skeem, Cruise & Cauffman, 2001) suggest that differences in the accuracy of risk assessments may be attributed to contextual differences between these study settings. Due to these differences, one measure may be superior to others in one study setting but not in another (Hanson & Morton-Bourgon, 2007).

Whether a study has a prospective or retrospective design may also influence predictive validity findings. Because the primary goal of risk assessment is to predict future offending, some researchers have argued that prospective research is not just appropriate but necessary to establish a tool's predictive validity (Caldwell, Bogat & Davidson, 1988). However, a strength of retrospective designs is that researchers do not have to wait for a follow-up period to elapse before investigating whether the studied individuals reoffended. This methodology is particularly useful with low base rate outcomes such as violent crime (Maden, 2001). Both actuarial and clinically based instruments can be used retrospectively. The latter tools can be scored using file information from sources such as psychological reports, institutional files, and/or court reports (de Vogel, de Ruiter, Hildebrand, Bos & van de Ven, 2004).

The evidence regarding the long-term efficacy of risk assessment tools is mixed. Using ROC analysis, Mossman (2000) concluded that accurate long-term predictions of violence were possible. In support, several recent meta-analyses have found that length of follow-up does not moderate effect size (Blair et al., 2008, Edens and Campbell, 2007, Edens et al., 2007, Schwalbe, 2007). Contrasting evidence has been found by reviews that have reported higher rates of predictive validity for studies with longer follow-up periods (Leistico et al., 2008, McCann, 2006, Smith et al., 2009). Other researchers have found tools to be valid only in the short term and, even then, only at modest levels (Bauer et al., 2003, Sreenivasan et al., 2000). Given that studies often follow participants for different lengths of time and that effect size may vary with time at risk, the role of this variable needs to be examined (Cottle, Lee, & Heilbrun, 2001). Very few tools have been validated for short follow-up periods, such as hours, days, or one to two weeks, even though these are often the most relevant timeframes in real-world clinical decision-making (SBU, 2005).
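Predictive validity in this literature is commonly summarized using ROC analysis, in particular the area under the ROC curve (AUC): the probability that a randomly selected recidivist receives a higher tool score than a randomly selected non-recidivist. The sketch below, based on made-up scores and outcomes, computes the AUC via its rank-based (Mann–Whitney) formulation; it is illustrative only and is not the analysis pipeline used in this review.

```python
# Toy illustration of the AUC as an index of predictive validity.
# Scores and outcomes below are invented for demonstration purposes.

import numpy as np

def auc_mann_whitney(scores, outcomes):
    """AUC via the rank-sum (Mann-Whitney) formulation; ties count as 0.5."""
    scores = np.asarray(scores, dtype=float)
    outcomes = np.asarray(outcomes, dtype=int)
    pos = scores[outcomes == 1]          # tool scores of those who reoffended
    neg = scores[outcomes == 0]          # tool scores of those who did not
    # Proportion of (recidivist, non-recidivist) pairs in which the recidivist
    # outscores the non-recidivist.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

scores   = [22, 15, 30, 8, 27, 12, 19, 25]   # hypothetical total scores
outcomes = [ 1,  0,  1, 0,  0,  0,  1,  1]   # 1 = reoffended during follow-up
print(round(auc_mann_whitney(scores, outcomes), 2))  # ~0.81 for this toy data
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect discrimination, which is why the statistic is convenient for comparing tools across samples with different base rates and follow-up lengths.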

In summary, despite the increasing use and potential importance of risk assessment instruments, it is unclear which instruments have the highest rates of predictive validity, and whether these rates differ by important demographic and study design characteristics. We have therefore undertaken a systematic review and meta-analysis to explore rates of predictive validity in commonly used instruments and to assess the potential sources of heterogeneity outlined above.
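As one illustration of the kind of heterogeneity analysis referred to above, the following numpy sketch fits a method-of-moments (DerSimonian–Laird type) random-effects metaregression of study effect sizes on a single moderator. The effect metric (log odds ratios), the moderator (sample mean age), and all numbers are hypothetical assumptions for the example; the review's actual analyses used its own effect size measures, moderators, and software.

```python
# Minimal random-effects metaregression sketch (moments / DerSimonian-Laird type).
# All inputs below are invented; this is not the review's actual analysis.

import numpy as np

def dl_metaregression(y, v, x):
    """Regress effect sizes y (sampling variances v) on moderator x.
    Returns the moderator coefficient, its standard error, and tau^2."""
    y = np.asarray(y, float); v = np.asarray(v, float); x = np.asarray(x, float)
    k = len(y)
    X = np.column_stack([np.ones(k), x])          # intercept + moderator
    p = X.shape[1]

    # Step 1: fixed-effect weights and WLS fit to obtain the residual Q statistic.
    W = np.diag(1.0 / v)
    XtWX_inv = np.linalg.inv(X.T @ W @ X)
    beta_fe = XtWX_inv @ X.T @ W @ y
    resid = y - X @ beta_fe
    Q_E = float(resid @ W @ resid)

    # Step 2: moment estimator of the between-study variance tau^2.
    trace_term = np.trace(W) - np.trace(XtWX_inv @ X.T @ W @ W @ X)
    tau2 = max(0.0, (Q_E - (k - p)) / trace_term)

    # Step 3: refit with random-effects weights 1 / (v_i + tau^2).
    W_re = np.diag(1.0 / (v + tau2))
    cov = np.linalg.inv(X.T @ W_re @ X)
    beta = cov @ X.T @ W_re @ y
    se = np.sqrt(np.diag(cov))
    return beta[1], se[1], tau2

# Hypothetical per-sample log odds ratios, their variances, and sample mean ages.
y   = [0.9, 1.4, 0.7, 1.8, 1.1, 1.6]
v   = [0.10, 0.08, 0.15, 0.05, 0.12, 0.07]
age = [17, 24, 19, 35, 22, 31]
slope, se, tau2 = dl_metaregression(y, v, age)
print(f"moderator slope = {slope:.3f}, SE = {se:.3f}, tau^2 = {tau2:.3f}")
```

A positive, reliably estimated slope in such a model would indicate larger effect sizes in older samples, which is the kind of moderator effect examined in the analyses reported below.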

Section snippets

Review protocol

The Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) Statement (Moher, Liberati, Tetzlaff & Altman, 2009), a 27-item checklist of review characteristics designed to enable transparent and consistent reporting of results, was followed.

Tool selection

Our goal was to analyze the predictive validity of the most commonly used risk assessment tools in the field today. Based on reviews of the literature (e.g., Bonta, 2002, Doren, 2002, Kemshall, 2001, Singh and Fazel, 2010), we

Descriptive characteristics

Information was collected on 25,980 participants in 88 independent samples from 68 studies. Information from 54 (n = 15,775; 60.7%) of the samples was specifically obtained from study authors for the purposes of this synthesis. Included studies were conducted in 13 countries: Argentina (n = 199; 0.8%), Austria (n = 799; 3.1%), Belgium (n = 732; 2.8%), Canada (n = 9,112; 35.1%), Denmark (n = 304; 1.2%), Finland (n = 208; 0.8%), Germany (n = 1,337; 5.1%), The Netherlands (n = 622; 2.4%), New Zealand (n = 220; 0.8%),

Discussion

This systematic review and meta-analysis investigated the predictive validity of nine commonly used risk assessment tools: HCR-20, LSI-R, PCL-R, SARA, SAVRY, SORAG, Static-99, SVR-20, and VRAG. We collected data from 68 studies constituting 88 independent samples. These samples included a total of 25,980 participants from 13 countries. Information on 61% of the participants was specifically obtained from study authors for the purposes of this synthesis. We investigated three main research

Conclusion

Violence risk assessment tools are increasingly used to make important decisions in clinical and criminal justice settings. The present meta-analysis found that the predictive validity of commonly used risk assessment measures varies widely. Our findings suggest that the closer the demographic characteristics of the tested sample are to the original validation sample of the tool, the higher the rate of predictive validity. We also found that tools designed for more specific populations were

Funding

There was no specific funding for this study.

Conflicts of Interest

None declared.

Acknowledgements

The authors thank Dr. Helen Doll for her assistance with statistical analysis, Professor Klaus Ebmeier for his assistance in the translation of several articles in German, and Sophie Westwood for her assistance with the inter-rater reliability check. The following authors are thanked for providing studies and/or tabular data for the analyses: April Beckmann, Sarah Beggs, Susanne Bengtson Pedersen, Klaus-Peter Dahle, Rebecca Dempster, Mairead Dolan, Kevin Douglas, Reinhard Eher, Jorge Folino,

References (99)

  • J. Bonta

    Offender risk assessment: Guidelines for selection and use

    Criminal Justice and Behavior

    (2002)
  • R. Borum et al.

    Manual for the structured assessment of violence risk in youth (SAVRY)

    (2002)
  • R. Borum et al.
  • P.M. Bossuyt et al.

    Towards complete and accurate reporting of studies of diagnostic accuracy: The STARD initiative

    British Medical Journal

    (2003)
  • R.A. Caldwell et al.

    The assessment of child abuse potential and the prevention of child abuse and neglect: A policy analysis

    American Journal of Community Psychology

    (1988)
  • M. Campbell et al.
  • H. Cleckley

    The mask of sanity

    (1941)
  • J. Cohen

    A coefficient of agreement for nominal scales

    Educational and Psychological Measurement

    (1960)
  • J. Cohen
  • C.C. Cottle et al.

    The prediction of criminal recidivism in juveniles: A meta-analysis

    Criminal Justice and Behavior

    (2001)
  • Daniels, B. A. (2005). Sex offender risk assessment: Evaluation and innovation. Unpublished doctoral dissertation,...
  • V. de Vogel et al.

    Type of discharge and risk of recidivism measured by the HCR-20: A retrospective study in a Dutch sample of treated forensic psychiatric patients

    International Journal of Forensic Mental Health

    (2004)
  • J.J. Deeks

    Systematic reviews of evaluation of diagnostic and screening tests

  • D. DeMatteo et al.

    The role and relevance of the Psychopathy Checklist – Revised in court: A case law survey of U.S. courts (1991-2004)

    Psychology, Public Policy, and Law

    (2006)
  • M. Dernevick et al.

    The use of psychiatric and psychological evidence in the assessment of terrorist offenders

    Journal of Forensic Psychiatry and Psychology

    (2010)
  • W.L. Deville et al.

    Conducting systematic reviews of diagnostic studies: Didactic guidelines

    BMC Medical Research Methodology

    (2002)
  • D.M. Doren

    Evaluating sex offenders: A manual for civil commitments and beyond

    (2002)
  • K.S. Douglas et al.

    Violence risk assessment: Science and practice

    Legal and Criminological Psychology

    (1999)
  • K.S. Douglas et al.

    The HCR-20 violence risk assessment scheme: Concurrent validity in a sample of incarcerated offenders

    Criminal Justice and Behavior

    (1999)
  • J.F. Edens

    Misuses of the Hare Psychopathy Checklist – Revised in court

    Journal of Interpersonal Violence

    (2001)
  • J. Edens et al.

    Identifying youths at risk for institutional misconduct: A meta-analytic investigation of the Psychopathy Checklist measures

    Psychological Services

    (2007)
  • J. Edens et al.

    Youth psychopathy and criminal recidivism: A meta-analysis of the Psychopathy Checklist measures

    Law and Human Behavior

    (2007)
  • J. Edens et al.

    The assessment of juvenile psychopathy and its association with violence: A critical review

    Behavioral Sciences and the Law

    (2001)
  • Editorial

    Should protocols for observational research be registered?

    Lancet

    (2010)
  • S. Fazel et al.

    Risk factors for criminal recidivism in older sexual offenders

    Sexual Abuse: A Journal of Research and Treatment

    (2006)
  • Federal Bureau of Investigation

    Uniform crime reports for the United States

    (2002)
  • D. Fujii et al.

    Ethnic differences in violence risk prediction of psychiatric inpatients using the Historical Clinical Risk Management – 20

    Psychiatric Services

    (2005)
  • P. Gendreau et al.
  • P. Gendreau et al.

    Cumulating knowledge: How meta-analysis can serve the needs of correctional clinicians and policy-makers

    (2000)
  • P. Gendreau et al.

    Is the PCL-R really the “unparalleled” measure of offender risk?: A lesson in knowledge cumulation

    Criminal Justice and Behavior

    (2002)
  • Guy, L. (2008). Performance indicators of the structured professional judgment approach for assessing risk for violence...
  • L. Guy et al.

    Does psychopathy predict institutional misconduct among adults? A meta-analytic investigation

    Journal of Consulting and Clinical Psychology

    (2005)
  • R.K. Hanson

    What do we know about sex offender risk assessment?

    Psychology, Public Policy, and Law

    (1998)
  • R.K. Hanson et al.
  • R.K. Hanson et al.
  • R.K. Hanson et al.

    The accuracy of recidivism risk assessments for sexual offenders: A meta-analysis of 118 prediction studies

    Psychological Assessment

    (2009)
  • R.K. Hanson et al.

    Static-99: Improving actuarial risk assessments for sex offenders (User Report 99-02)

    (1999)
  • R.D. Hare

    The Hare Psychopathy Checklist-Revised (PCL-R)

    (1991)
  • R.D. Hare

    The Hare Psychopathy Checklist – Revised

    (2003)