Abstract
The Static-99, an actuarial rating method, is used to assess the risk of sexual violence in legal contexts. Proponents of the Static-99 dismiss clinical judgment as unempirical. Two elements must be present to apply an actuarial risk model to a specific individual: sample representativeness and uniform measurement of outcome. This review demonstrates that both elements are lacking in the normative studies of the Static-99 and its revised version, the Static-99R. Studies conducted since the publication of the Static-99 have not replicated the original norms: sexual recidivism rates for the same Static-99 score vary widely, from low to high, depending on the sample used. A hypothetical case example illustrates how applying the Static-99 or Static-99R recidivism rates alone, to the exclusion of salient clinical factors for identifying sexual dangerousness, can have serious consequences for public safety.
Sexually violent predator (SVP) or sexually dangerous person (SDP) laws seek to identify a small group of extremely dangerous incarcerated sexual offenders who would represent a threat to public safety if released from custody. The laws allow for the indefinite civil psychiatric commitment of sex offenders after their prison sentences have been served. They require the presence of a mental disorder and a risk assessment of future sexual violence. The risk assessment can be performed using clinical interview methods, an actuarial approach, or a combination of both. In recent years, increasingly strong claims have been made for the actuarial approach at the expense of clinical evaluation. It has been argued that this quantitative method is objective and accurate and obviates the need for clinical judgment, which is viewed as subjective and potentially misleading.1–3
The actuarial approach uses a rating instrument with statistically identified risk factors and provides a precise numerical risk score for each individual. That score is then translated into qualitative descriptors such as low, moderate, or high risk, on the basis of the predicted rates of sexual recidivism (typically defined as charges or convictions) associated with each score. The actuarial method requires no clinical input; indeed, its proponents view such input as adding “noise” to the assessment.3
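The score-to-label pipeline just described can be sketched in a few lines of Python. Everything numeric here is a hypothetical placeholder: the rate table (`HYPOTHETICAL_NORMS`), the cut points, and the function name `actuarial_estimate` are invented for illustration and are not the published Static-99 norms or risk categories. The point of the sketch is structural: once the items are scored, the output depends entirely on which lookup table is plugged in.

```python
from bisect import bisect_right

# HYPOTHETICAL lookup table: total score -> predicted sexual recidivism (%).
# Placeholder values for illustration only; these are NOT published Static-99 norms.
HYPOTHETICAL_NORMS = {0: 5.0, 1: 6.0, 2: 9.0, 3: 12.0, 4: 18.0, 5: 25.0, 6: 31.0}

# HYPOTHETICAL cut points for the qualitative descriptor:
# below 10% -> "low", 10% up to 20% -> "moderate", 20% and above -> "high".
CUTOFFS = [10.0, 20.0]
LABELS = ["low", "moderate", "high"]


def actuarial_estimate(item_scores, norms=HYPOTHETICAL_NORMS):
    """Sum the item scores, look up a rate, and translate it into a label.

    No clinical input enters at any step; the result is fixed once the
    norms table and the cut points have been chosen.
    """
    total = sum(item_scores)
    top_row = max(norms)                 # highest score that has its own row
    rate = norms[min(total, top_row)]    # scores above the top row share its rate
    label = LABELS[bisect_right(CUTOFFS, rate)]
    return total, rate, label


print(actuarial_estimate([1, 1, 0, 0]))     # (2, 9.0, 'low')
print(actuarial_estimate([3, 1, 1, 1, 1]))  # (7, 31.0, 'high')
```

Passing a different `norms` dictionary for the same item scores changes the reported rate, and possibly the label, without any change in the offender.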
The most widely used and researched actuarial instrument for sex offender risk assessment is the Static-99.4,5 In the decade since its inception, the value placed on clinical judgment in risk assessment has declined markedly, while reliance on, and claims for, the actuarial approach have increased correspondingly. Recently, the Static-99 was revised and reissued as the Static-99R. Although replication studies have produced widely varying sexual recidivism rates for both the Static-99 and the Static-99R norms, evaluators continue to rely heavily on these actuarial instruments to determine an individual sex offender's specific risk of future sexual violence. Decisions regarding the involuntary commitment of SVP/SDP offenders carry the serious responsibility of balancing public safety needs against an individual's right to liberty. In this light, we review the empirical bases for the Static-99 and Static-99R and discuss the appropriate role of actuarial assessment in the evaluation of risk of recidivism.
Static-99/Static-99R: Changing Group Norms
The Static-99 has 10 items that were derived empirically, first through a meta-analysis3,4 and then through the amalgamation of an initial Canadian actuarial instrument (the Rapid Risk Assessment for Sexual Offense Recidivism (RRASOR)6) with a United Kingdom rating tool (the Structured Anchored Clinical Judgment-Minimum (SACJ-Min)7). Total scores on the Static-99 are translated into risk categories based on each score's sexual recidivism risk, as estimated statistically by survival analysis. The original normative estimates were based on three Canadian development samples and one U.K. validation sample of persons released from custody, largely in the 1970s. The recommended practice was to rate offenders with the Static-99 and to report risk on the basis of these Canadian and U.K. norms. The Static-99 codebook gave both sexual recidivism risk percentages and a relative risk ranking (i.e., low, moderate, high) for 5, 10, and 15 years following release from custody.4
Although it has been argued that Canadian and U.K. norms from the 1970s and earlier may not accurately represent the risk in some United States samples8 or may overestimate the risk in minority groups,9 Static-99 scores have assumed an unassailable quality as almost the last word in risk assessment. For example, the Static-99 has been adopted in several states as the risk measure employed by departments of corrections and is a mandated part of SVP assessments in at least one state.10 Despite emerging contradictory findings about the accuracy of Static-99 predictions, and despite wide divergence between the risk percentages in replication samples and those in the original norms, the tacit assumption remained that the Static-99 recidivism estimates were accurate, on the strength of the effect sizes reported for the predictive accuracy of actuarial estimates. Moreover, the Static-99 was regarded as the preferred actuarial method, given its wide use and large replication pool.
However, reliance on the original Static-99 norms shifted when the tool's developers, in their own analysis of newer Static-99 studies, found that the recidivism rates reported in the original norms were not holding firm; that is, they were not being replicated.11 In response, the developers stated that the original norms were based on recidivism estimates from the 1970s and 1980s and that, because Canadian12 and U.S.13 sexual recidivism rates had declined since then, renorming was warranted. Yet for almost a decade, these very risk percentages, now viewed as unstable, had been relied on without comment.
Despite larger Static-99 datasets, Helmus et al.11 reported significant differences among their samples in the recidivism rates associated with the same Static-99 score. What followed was a flurry of norms posted on the Static99.org website, with both a paper14 and professional workshops15,16 giving evaluators changing instructions on how to use these norms. The solution offered for understanding the different recidivism rates was to analyze subgroups derived from the overall or “routine” sample, which included sex offenders from various countries. The subgroup analysis compared the sexual recidivism rates of high-risk, untreated sex offenders with those of treated sex offenders; both groups came from Canada and thus represented only Canadian norms. It was suggested that this method would provide the boundaries of risk and that professional judgment would be necessary to determine where an individual's risk falls on this continuum. However, the developers cautioned that this reliance on professional judgment was a conditional recommendation, pending further research.
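A deliberately simple sketch shows why the bounded-range approach moves the decision onto the evaluator. The developers did not publish a formula for placing an offender within the range; the linear interpolation, the function `place_in_range`, and its `judgment_weight` parameter are hypothetical assumptions made only for illustration. The two bounds reuse five-year figures that appear in the case example below (17.4 percent for the treated Canadian sample and 32.7 percent for the high-risk untreated sample).

```python
def place_in_range(lower_pct, upper_pct, judgment_weight):
    """HYPOTHETICAL: locate an offender between two normative bounds.

    judgment_weight expresses the evaluator's professional judgment as a
    number from 0.0 (resembles the lower-bound sample) to 1.0 (resembles
    the upper-bound sample). The actuarial bounds do not supply this value.
    """
    if not 0.0 <= judgment_weight <= 1.0:
        raise ValueError("judgment_weight must lie between 0.0 and 1.0")
    return lower_pct + judgment_weight * (upper_pct - lower_pct)


TREATED_5YR = 17.4               # treated Canadian sample (figure from the case example)
HIGH_RISK_UNTREATED_5YR = 32.7   # high-risk untreated sample (figure from the case example)

for w in (0.0, 0.5, 1.0):
    estimate = place_in_range(TREATED_5YR, HIGH_RISK_UNTREATED_5YR, w)
    print(f"judgment weight {w:.1f} -> {estimate:.1f}% at 5 years")
```

Every value between the bounds is reachable, and which one is reported is decided by `judgment_weight`, that is, by the very professional judgment the actuarial approach set out to replace.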
In the fall of 2009, the newest iteration, the Static-99R,15 was presented and promoted as a replacement for the Static-99. The Static-99R amended the first item from “age” to “age at release” to offer age-corrected scores and a new set of norms. That process, also statistically derived, attempted to take into account the finding of lower rates of sexual recidivism in older sex offender age groups (e.g., 50 years and older, and 60 years and older). A caveat regarding this age correction is that age at release applied only to those sex offenders pending release who were in custody for a sexual offense. For those who were in custody for other offenses and whose sexual offense was historical, the age at release is calculated as their age at the time of their last sexual offense.
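The age-at-release caveat amounts to a simple selection rule for which age enters the first item. The sketch below assumes only the rule as stated above; the function and parameter names are hypothetical, and the item's actual age bands and weights are deliberately not encoded because they are not given in this text.

```python
def age_for_static99r_age_item(in_custody_for_sexual_offense,
                               age_at_release,
                               age_at_last_sexual_offense):
    """Return the age used to score the Static-99R age item (hypothetical helper).

    Rule as described in the text:
    - in custody for a sexual offense and pending release -> age at release;
    - in custody for some other offense, sexual offense historical ->
      age at the time of the last sexual offense.
    """
    if in_custody_for_sexual_offense:
        return age_at_release
    return age_at_last_sexual_offense


print(age_for_static99r_age_item(True, 62, 40))   # 62
print(age_for_static99r_age_item(False, 62, 40))  # 40
```

Two 62-year-olds with identical histories can therefore receive different age corrections depending solely on why they are currently in custody.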
In addition, the original Static-99 risk scores were identified through survival analysis, which includes “time to offense” as a factor in the hazard estimate for recidivism.17 Survival analysis is a popular method of measuring recidivism risk because how quickly someone is likely to reoffend after release is of interest. The downside of using Cox regression (a form of survival analysis) to develop risk scores is that it models the hazard of recidivism rather than an absolute base rate; consequently, the resulting hazard ratios are relative rather than absolute (i.e., they do not give the actual recidivism rate in the sample). To address this deficiency, the risk scores of the Static-99R were developed using logistic regression. Logistic regression has the advantage of yielding absolute risk estimates, but it constrains sample sizes (only offenders followed for the full fixed follow-up period can be included) and does not incorporate time to offense in the risk estimates. Ironically, after the resulting risk ratios are developed and validated with the different norming groups, the ultimate recommendation is to use relative rather than absolute risk ratios anyway. This leaves open the possibility that risk or hazard scores based on Cox regression analyses may ultimately be superior, as they do not artificially constrain the sample and they account for time to offense.
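The contrast between the two models can be made concrete. This is a minimal sketch in which the coefficients (`BETA`, `intercept`, `slope`) and the baseline survival values are invented and do not come from any Static-99 or Static-99R analysis. A Cox model yields only a hazard ratio relative to a reference score; that ratio becomes an absolute risk by time t only when combined with a baseline survival S0(t) through S(t | x) = S0(t)^HR. Logistic regression returns an absolute probability directly, but only for the single fixed follow-up window on which it was fit.

```python
import math


def logistic_absolute_risk(score, intercept, slope):
    """Logistic regression: absolute probability of recidivism within the
    fixed follow-up window the model was fit on."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))


def cox_hazard_ratio(score, reference_score, beta):
    """Cox proportional hazards: hazard relative to a reference score.
    Carries no base rate and no time horizon by itself."""
    return math.exp(beta * (score - reference_score))


def cox_absolute_risk(score, reference_score, beta, baseline_survival_t):
    """Absolute risk by time t, given baseline survival S0(t) at the reference
    score: S(t | x) = S0(t) ** HR, so risk = 1 - S0(t) ** HR."""
    hr = cox_hazard_ratio(score, reference_score, beta)
    return 1.0 - baseline_survival_t ** hr


# HYPOTHETICAL values chosen only to illustrate the arithmetic.
BETA = 0.3
for s0 in (0.95, 0.85):  # two samples with different baseline survival at time t
    risk = cox_absolute_risk(6, 2, BETA, s0)
    print(f"HR = {cox_hazard_ratio(6, 2, BETA):.2f}, S0(t) = {s0}: "
          f"absolute risk = {risk:.1%}")
```

With these made-up numbers the same hazard ratio corresponds to roughly 16 percent in one sample and 42 percent in the other, which is why a relative figure cannot stand in for a recidivism rate without a base rate from a specific sample.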
Further, the Static-99R samples were subdivided into several normative groups. The first, labeled “routine correctional samples,” was described as consisting of eight samples of sex offenders from Canada, the United States, England, Sweden, and Austria, drawn from correctional systems. The next group, called “nonroutine,” comprised all samples of sex offenders who had been pre-selected in some way. The nonroutine group was further divided into two groups: those pre-selected for treatment needs, which consisted of six samples of persons selected for treatment (although not necessarily receiving it), and those pre-selected for high risk/high need, which consisted of individuals referred for services at a forensic psychiatric facility.
Of note, both the Static-99 and the Static-99R normative data are based overwhelmingly on unpublished findings. Many come from doctoral dissertations or master's theses that give limited details about who made up the study sample.11 The renorming of the Static-99R drew on 23 recidivism samples (per information from an unpublished master's thesis by Helmus14), ranging in size from 175 to 1,278. Six samples were from the United States and 10 from Canada, with the remainder from the United Kingdom, New Zealand, Denmark, and Sweden (which contributed the largest sample). The U.S. data consisted of the following samples: Bartosh et al.,18 a peer-reviewed Arizona study of a corrections sample (N = 186); Epperson,19 an unpublished report of a North Dakota study of offenders released from either custody or probation (N = 178); Johansen,20 an unpublished dissertation on a Washington State corrections study of inmates in sex offender treatment (N = 273); Knight and Thornton,21 a report submitted to the Department of Justice on a Massachusetts study of 466 individuals forensically hospitalized at Bridgewater State Hospital between 1959 and 1984; Saum,22 an unpublished dissertation on another North Dakota study of sex offenders in treatment at the North Dakota Department of Human Services (N = 175); and Swinburne et al.,23 unpublished data presented as a poster at an Association for the Treatment of Sexual Abusers conference on a study of 681 Minnesota sex offenders in an outpatient sex offender treatment program.
As noted, only one of the U.S. studies, Bartosh et al.,18 was published in a peer-reviewed journal. Although the renorming process was presumably intended to include studies of more recent releases, it also used the Knight and Thornton21 Bridgewater State Hospital study, which is based on offenders released from the forensic hospital between 1959 and 1984. Such data seem more representative of the rejected older norms than of current samples. As the U.S. studies also show, each dataset reflects a different custodial status: probation, release from a forensic hospital, prison release, and outpatient sex offender treatment. Such wide variability undermines the very rationale for new norms, namely, better sample representativeness. It is difficult to see what Massachusetts insanity acquittees hospitalized more than 20 years ago and later released from hospital commitment have in common with the more contemporary Minnesota sex offenders in outpatient sex offender treatment.
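The heterogeneity of the U.S. contribution can be tabulated directly from the six samples listed above. The sample sizes, settings, and publication status are taken from the text; the data structure and the summary are only an illustrative sketch.

```python
from collections import namedtuple

Sample = namedtuple("Sample", "source state setting n peer_reviewed")

# U.S. samples in the Static-99R renorming, as described in the text.
US_SAMPLES = [
    Sample("Bartosh et al.", "Arizona", "corrections", 186, True),
    Sample("Epperson", "North Dakota", "custody or probation release", 178, False),
    Sample("Johansen", "Washington", "inmates in treatment", 273, False),
    Sample("Knight and Thornton", "Massachusetts", "forensic hospital, 1959-1984", 466, False),
    Sample("Saum", "North Dakota", "Dept. of Human Services treatment", 175, False),
    Sample("Swinburne et al.", "Minnesota", "outpatient treatment", 681, False),
]

total_n = sum(s.n for s in US_SAMPLES)
peer_n = sum(s.n for s in US_SAMPLES if s.peer_reviewed)
bridgewater_n = next(s.n for s in US_SAMPLES if s.state == "Massachusetts")
distinct_settings = len({s.setting for s in US_SAMPLES})

print(f"U.S. offenders: {total_n}")
print(f"Peer-reviewed share: {peer_n / total_n:.1%}")
print(f"Bridgewater (1959-1984 releases) share: {bridgewater_n / total_n:.1%}")
print(f"Distinct settings among {len(US_SAMPLES)} samples: {distinct_settings}")
```

With these figures the six U.S. samples total 1,959 offenders, of whom 186 (about 9.5 percent) come from the single peer-reviewed study and 466 (about 23.8 percent) from the 1959 to 1984 Bridgewater releases; no two samples share a setting.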
Another concern related to the new norms is that within a one-year period, multiple sets of even “newer” norms have been posted on the Static99.org website. The justification offered for changing the norms just months later is that the changes reflect larger samples and new statistical manipulations. These changes have then been followed by declarations that this rating process alone is sufficient to describe risk.15,16
The following hypothetical case example illustrates the difficulties in applying the Static-99 and Static-99R in individual sex offender risk assessments.
Hypothetical Case Example
Mr. X. is a 62-year-old man housed at a forensic state hospital as a pre-commitment sexually violent predator awaiting his commitment trial. His first sexual offense occurred when he was 20 and involved the rape and strangling of his 16-year-old girlfriend, whom he rendered unconscious and then sexually assaulted. He was sentenced to a term of three years to life; he served one and one-half years and was released under parole supervision. At age 24, while on parole, he was arrested for the rape of a 23-year-old woman. He was charged with and subsequently convicted of three counts of sodomy, one count of penetration with a foreign object (a bottle), and one count of assault with intent to cause great bodily injury. During this rape, Mr. X. strangled the victim with nylons and cut her throat. The woman was a patron at a bar where Mr. X. was drinking. He followed her to her car and asked her for directions. He then pushed her into her car, beat her head with his fists, and took her to a remote location where the sexual assault occurred. The victim was unconscious through much of the attack. She was then thrown into the trunk, where she was discovered in the early morning hours by a jogger who heard her moving inside. During the same period, Mr. X. was noted to have perpetrated a similar set of offenses against three other females, but prosecution was not pursued because these victims had been blindfolded and could not identify him. He was given a term of 30 years in state prison. He was released on parole almost 16 years later, when he was 40 years old. While on parole, he became romantically and sexually involved with his 65-year-old landlady. Three months into their relationship, she reported that Mr. X. tied her up, beat her about the head until she lost consciousness, and inserted a rolling pin into her vagina. Mr. X. was charged with and convicted of one count of assault and one count of penetration with a foreign object.
He received a prison term of 20 years. At age 55, he became eligible for parole, was found to meet the SVP criteria, and, after a probable-cause hearing, was sent to a forensic hospital. In the seven years since his forensic hospital placement, Mr. X. has participated diligently in sex offender treatment. He admitted to a total of 20 victims. He said that strangulation aroused him in his youth but no longer does. Mr. X. reported erectile dysfunction as well as an enlarged prostate, diabetes, and hypertension. In addition, his mobility is limited by severe lower extremity pain secondary to diabetic neuropathy. His recent penile plethysmography (PPG) demonstrated arousal to coerced sexual behavior in audio presentations of rape. A sexual history polygraph suggested that he was forthcoming in reporting his victims and his sexual interests. His treatment providers have uniformly characterized Mr. X. as highly motivated in treatment and superior in his participation. He has completed all the hospital phases of treatment, although he was not mandated to do so. The Static-99 rating based on Mr. X.'s history is shown in Table 1.
The following Static-99 norms could be applied. Original Static-99: 39 percent at 5 years and 45 percent at 10 years; Helmus (2008)14 new norm: 27 percent at 5 years and 33.5 percent at 10 years; Hanson and Thornton (2008)14: 29 percent at 5 years and 44.8 percent at 10 years; CSC (Correctional Service of Canada) sample (treatment, 2008): 17.4 percent at 5 years and 23.0 percent at 10 years; high risk (untreated, 2008)14: 32.7 percent at 5 years and 42.8 percent at 10 years; and age correction (Hanson, 2007)14: 9.1 percent at 5 years for offenders aged 60 and older.
These norms, which correspond to a Static-99 score of 7, suggest that Mr. X.'s risk at five years falls somewhere between 9.1 percent and 39 percent, or between 9.1 percent and 32.7 percent if the original Static-99 norms are excluded, as the developers of the Static-99 instruct. Translated into qualitative labels, these percentages place Mr. X. anywhere from low to high risk. Which of these norms should apply? Looking to the developers of the Static-99 for guidance, the evaluator is told to use professional judgment.11
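The spread can be computed directly from the five-year figures listed above. This is only a bookkeeping sketch: the labels are shortened from the text, and `five_year_range` is a hypothetical helper, not part of any Static-99 tool.

```python
# Five-year sexual recidivism estimates (%) quoted above for Mr. X.'s score.
STATIC99_5YR = {
    "Original Static-99": 39.0,
    "Helmus (2008) new norm": 27.0,
    "Hanson and Thornton (2008)": 29.0,
    "CSC treatment sample (2008)": 17.4,
    "High risk, untreated (2008)": 32.7,
    "Age correction, 60+ (Hanson, 2007)": 9.1,
}


def five_year_range(norms, exclude=()):
    """Lowest and highest estimate after dropping any excluded norm sets."""
    values = [pct for name, pct in norms.items() if name not in exclude]
    return min(values), max(values)


print(five_year_range(STATIC99_5YR))                                  # (9.1, 39.0)
print(five_year_range(STATIC99_5YR, exclude={"Original Static-99"}))  # (9.1, 32.7)
```

Even after the original norms are set aside, the highest available estimate is more than three times the lowest (32.7 / 9.1 ≈ 3.6).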
The Static-99R rating based on Mr. X.'s history is shown in Table 2. The norms that could be applied are as follows. Routine sample relative risk ratio: 1.71; routine sample logistic regression sexual recidivism: 8.7 percent; nonroutine overall sample: 15.4 percent sexual recidivism at 5 years and 22.6 percent sexual recidivism at 10 years; pre-selected treatment need: 12.3 percent sexual recidivism at 5 years and 18.2 percent sexual recidivism at 10 years; and high risk/high need: 20.1 percent sexual recidivism at 5 years and 29.6 percent sexual recidivism at 10 years.
Which Static-99R norms should the evaluator use? Is Mr. X. more similar to the Bridgewater, Massachusetts, offenders who represent the U.S. high risk/high need sample? Or is he more similar to the routine sample? Or should he be viewed as nonroutine? The difference in reported sexual recidivism is considerable: from 8.7 percent (low risk) to 29.6 percent (moderate risk) by 10 years. But is a 10-year estimate even reasonable, given that there are very few rapists over the age of 70 (Mr. X. would be 72 years old in 10 years)? Alternatively, it could be argued that Mr. X. was pre-selected for treatment while in custody, because that is when his SVP evaluation was initiated; is his risk therefore really 12.3 percent at 5 years? The empirically derived answer is this: by the Static-99R estimates, Mr. X. falls somewhere between low and high risk.
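The same bookkeeping applied to the Static-99R figures above shows how much of the spread comes from the choice of normative group. This is a minimal sketch: the figures are those quoted in the text, the helper name `spread` is hypothetical, and the routine-sample relative risk ratio (1.71) is left out on purpose because a ratio cannot be placed on the same scale as a percentage. The routine logistic regression figure is entered without a follow-up period because none is stated with it above.

```python
STATIC99R_ESTIMATES = [
    # (normative group, follow-up in years or None if not stated, percent)
    ("Routine, logistic regression", None, 8.7),
    ("Nonroutine, overall", 5, 15.4),
    ("Nonroutine, overall", 10, 22.6),
    ("Pre-selected treatment need", 5, 12.3),
    ("Pre-selected treatment need", 10, 18.2),
    ("High risk/high need", 5, 20.1),
    ("High risk/high need", 10, 29.6),
]


def spread(estimates, horizon=None):
    """Return (lowest, highest, highest / lowest) for one follow-up period.

    horizon=None pools every percentage regardless of follow-up period,
    which is what an evaluator faces before choosing a normative group.
    """
    values = [pct for _, years, pct in estimates
              if horizon is None or years == horizon]
    low, high = min(values), max(values)
    return low, high, high / low


for h in (None, 5, 10):
    low, high, ratio = spread(STATIC99R_ESTIMATES, h)
    label = "all figures" if h is None else f"{h}-year figures"
    print(f"{label}: {low}% to {high}% (x{ratio:.1f})")
```

Pooled, the figures run from 8.7 to 29.6 percent, the range discussed above; within a single stated follow-up period the spread narrows but does not disappear.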
Conclusions
Boccaccini et al.10 note that despite widespread use of actuarial instruments such as the Static-99, there has been little examination of how well they work in specific contexts. Nonetheless, the Static-99 has become a mandated part of risk assessment for sex offenders and is used in determining methods of monitoring such offenders when released to the community (e.g., sex offender registration, GPS monitoring, SVP/SDP evaluations). Its prominence is reflected in the fact that at least 30 states have reported using the Static-99 specifically in sex offender supervision decision-making and that professional groups (such as the Association for the Treatment of Sexual Abusers) as well as publications specifically cite the use of actuarial risk ratings in their guidelines on how to conduct such assessments.10
Several concerns are raised about this domination of the Static-99/Static-99R in sexual recidivism risk assessments, not the least of which is the applicability of group norms to individuals who differ from the samples from which the risk values were derived. Apart from the dizzying number of risk scores and qualifications, the validity of the risk scores themselves is dubious, given different definitions of recidivism in the norming samples, lack of clarity in statistical methods, and an overreliance on unpublished manuscripts and presentations to document methods. For the Static-99, 13 of the samples used charges as a recidivism indicator and 15 used convictions. Other studies comparing the two indicators have found that convictions underestimate recidivism relative to charges.19,20 It is therefore difficult to interpret the recidivism rates reported for risk scores when they are not clearly tied to one type of recidivism or the other. The Static-99/Static-99R risk estimates are not stable, and their applicability to persons who harbor idiosyncratic or difficult-to-quantify risk factors not described in existing studies (e.g., persistent or outlier sexual deviance such as klismaphilia or multiple paraphilias) is highly questionable.
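Why the outcome definition matters can be shown with a toy cohort. The sketch below assumes a hypothetical cohort of 10 offenders (three charged, two of them convicted) and hypothetical names `FollowUp` and `recidivism_rate`. Its only structural assumption is that a conviction presupposes a charge, so a conviction-based rate can never exceed a charge-based rate for the same cohort.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FollowUp:
    charged: bool    # any new sexual-offense charge during follow-up
    convicted: bool  # any new sexual-offense conviction during follow-up

    def __post_init__(self):
        # Structural assumption: a conviction cannot occur without a charge.
        if self.convicted and not self.charged:
            raise ValueError("a conviction presupposes a charge")


def recidivism_rate(records, definition):
    """Proportion of records counted as recidivists under one definition."""
    if definition == "charges":
        hits = sum(r.charged for r in records)
    elif definition == "convictions":
        hits = sum(r.convicted for r in records)
    else:
        raise ValueError(f"unknown definition: {definition!r}")
    return hits / len(records)


# HYPOTHETICAL cohort: 2 charged and convicted, 1 charged only, 7 neither.
cohort = ([FollowUp(True, True)] * 2
          + [FollowUp(True, False)]
          + [FollowUp(False, False)] * 7)

print(recidivism_rate(cohort, "charges"))      # 0.3
print(recidivism_rate(cohort, "convictions"))  # 0.2
```

The same 10 people yield 30 or 20 percent depending solely on the definition, and a norm pooled from samples that used different definitions inherits that ambiguity.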
Although the developers of the Static-99 have acknowledged the role of professional judgment, they have consistently used statistics to assail the use of anything other than an actuarial (and Static-99)-based assessment.3 Structured professional judgments were reported to be weak in predicting sexual recidivism, whereas actuarial instruments were viewed as superior. Although the “professional” judgment studies were few in number (three), and the professionals making the judgments were predominantly correctional or parole officers, the implication was that “noise” was added to the Static-99 risk assessment when “factors external to the actuarial” were used to adjust the final Static-99 rating. The dismissal of professional judgment as inaccurate by researchers regarded as prominent in the field of sexual recidivism3 can be used to persuade triers of fact to reject the clinical opinions of testifying experts. Expressing risk in numerical form, whether as a risk percentage, a d-statistic, an area under the receiver operating characteristic (ROC) curve, or a risk ratio, gives the trier of fact an impression of greater precision than actually exists. Ongoing research using the Static-99 across jurisdictions has produced variable findings on the accuracy of this tool, particularly with respect to risk percentages. Moreover, as noted by Boccaccini et al.,10 there are only two peer-reviewed published studies of the use of the Static-99 in U.S. samples, and only one of them was included in the Static-99 renorming. More than a critique of the analyses used, however, what is needed is a critique of the forums in which this vital information has been presented. Information on the statistical analyses underlying the new norms is reported in several unpublished venues. The serious nature of the sentencing decisions being made with these norms requires that these risk estimates be accurate.
Despite its limitations, this approach remains entrenched, largely because of the lure of quantification. Unlike other areas of mental health that address potential risk of harm (e.g., risk of suicide), in which individual factors are weighed in the assessment, sexual recidivism risk assessment seems to be stalled in “actuarial-land,” with a veneer of “quantification” belied by shifting “norms.” Although they purport to be empirically based, the current Static-99 and its newer iteration, the Static-99R, violate the basic tenet of evidence-based medicine that group findings be applied to the individual in a reasoned, not mechanical, way. Two core elements must be present to apply an actuarial risk model to a specific individual: sample representativeness and uniform measurement of outcome. Both of these elements are lacking in the Static-99 and Static-99R normative research. Thus, a call for caution must be sounded when these tools are used to make weighty decisions involving an individual's liberty and the protection of public safety.
Helmus and colleagues have made contradictory statements about the use of clinical judgment. In their 2009 report, they noted: Differences in recidivism within each Static-99 score on the basis of the same offender type suggest that evaluators can no longer, in an unqualified way, associate a single Static-99 score with a single recidivism estimate. Instead, each Static-99 score is associated with a range of recidivism estimates, and evaluators must make a separate judgment as to where a particular offender lies within that range. This new conceptualization of recidivism norms forces the evaluator to consider factors external to the risk scale [Ref. 11, p 41].
Yet Hanson and Morton-Bourgon3 also stated that professional judgment is weak in predicting sexual recidivism. In addition, Helmus and colleagues, in attempting to advise evaluators on which of the large array of Static-99 norms should be applied to a specific offender, noted, “Until further research is conducted, however, this professional judgment is unavoidable” (Ref. 11, p 42).
The Static-99/Static-99R can be seen as one more tool that the clinician may use in rendering opinions on risk, as long as the clinician understands and conveys its limitations. Ignoring salient clinical factors (e.g., sexual sadism) because clinical judgment is out of vogue or deemed unempirical, and relying instead on actuarial risk norms alone, can have serious consequences for the public. Clinical judgment is the process that incorporates all elements of a case, not just one, such as a numeric risk score. Forensic experts can provide an understanding of the deviant drives that underlie sexual offenses and can identify “red flags” for risk (e.g., strangulation of a child victim during the sexual act, which speaks to a pedophilic/sadistic focus) that are critical to the goal of a comprehensive risk assessment. Such assessments by forensic experts rely on education, skill, and professional experience. They should reflect reasoned judgments based on an understanding of all elements of a case, not just a small number of risk factors.
Footnotes
Disclosures of financial or other potential conflicts of interest: None.
- American Academy of Psychiatry and the Law