Two commentaries from Allen Frances in response to APA field trial documents
Allen Frances, MD, chaired the DSM-IV Task Force and is a former chair of the Department of Psychiatry at Duke University School of Medicine, Durham, NC. He is currently professor emeritus at Duke.
References and resources
DSM-5 document: Q & A on DSM-5 Prevalence and Reliability January 12, 2012
based on The American Journal of Psychiatry article DSM-5: How Reliable Is Reliable Enough? Helena Chmura Kraemer, Ph.D.; David J. Kupfer, M.D.; Diana E. Clarke, Ph.D.; William E. Narrow, M.D., M.P.H.; Darrel A. Regier, M.D., M.P.H. January 01, 2012, Vol. 169, No. 1
Consumer-Friendly Frequently Asked Questions about DSM-5 Field Trials
Frequently Asked Questions about DSM-5 Field Trials in Large, Academic Settings
DSM-5 Field Trial Protocol for Large, Academic Settings
DSM-5 Field Trial Protocol for Routine Clinical Practice Settings
APA’s Request for Proposals for Potential Field Trial Sites
DSM-5 Field Trials in Routine Clinical Practice Settings
Supplemental Material for Clinician Application to Own Institutional Review Board (IRB)
Inside DSM-5 Field Trials, Flyer, American Psychiatric Association Practice Research Network, December 2011
Commentary: DSM-5 Disorganization, Disarray, and Delays, Dr Dayle Jones, PhD, January 3, 2012
Two commentaries from Allen Frances, MD
Two Fallacies Invalidate the DSM-5 Field Trials
APA telegraphs that DSM 5 will be unreliable.
Allen Frances, MD | January 16, 2012
The designer of the DSM-5 Field Trials has just written a telling commentary in the American Journal of Psychiatry. She makes two very basic errors that reveal the fundamental worthlessness of these field trials and their inability to provide any information that will be useful for DSM-5 decision making.
1) The commentary states: “A realistic goal is a kappa between 0.4 and 0.6, while a kappa between 0.2 and 0.4 would be acceptable.” This is simply incorrect and flies in the face of all traditional standards of what is considered ‘acceptable’ diagnostic agreement among clinicians. Clearly, the commentary is attempting to greatly lower our expectations about the levels of reliability that were achieved in the field trials – to soften us up to the likely bad news that the DSM-5 proposals are unreliable. Unable to clear the historic bar of reasonable reliability, it appears that DSM-5 is choosing to drastically lower that bar – what was previously seen as clearly unacceptable is now being accepted.
Kappa is a statistic that measures agreement among raters, corrected for chance agreement. Historically, kappas above 0.8 are considered good, above 0.6 fair, and under 0.6 poor. Before this AJP commentary, no one had ever felt comfortable endorsing kappas as low as 0.2-0.4. As a comparison, the personality section in DSM III was widely derided when its kappas were around 0.5. A kappa between 0.2 and 0.4 comes dangerously close to no agreement. ‘Accepting’ such low levels is a blatant fudge factor – lowering standards in this drastic way cheapens the currency of diagnosis and defeats the whole purpose of providing diagnostic criteria.
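For readers who want to see what "corrected for chance agreement" means in practice, here is a minimal sketch of Cohen's kappa for two raters. It is an illustration only: the function name and the ratings are hypothetical, and the field trials themselves may have used a different kappa variant or design.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each label the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)

    # Observed agreement: share of cases where the two raters give the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: what two independent raters would reach if each simply
    # used every label as often as they actually did.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 100%")
    return (observed - expected) / (1 - expected)


# Hypothetical data: 1 = diagnosis given, 0 = diagnosis not given, ten patients.
clinician_a = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
clinician_b = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

kappa = cohen_kappa(clinician_a, clinician_b)
print(round(kappa, 2))
```

In this made-up example the two clinicians agree on 7 of 10 patients, yet 54% agreement would be expected by chance alone, so kappa comes out at roughly 0.35. Raw agreement of 70% therefore lands inside the 0.2-0.4 band under discussion.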
Why does this matter? Good reliability does not guarantee validity or utility – human beings often agree very well on things that are dead wrong. But poor reliability is a certain sign of very deep trouble. If mental health clinicians cannot agree on a diagnosis, it is essentially worthless. The low reliability of DSM-5 presaged in the AJP commentary confirms fears that its criteria sets are so ambiguously written and difficult to interpret that they will be a serious obstacle to clinical practice and research. We will be returning to the wild west of idiosyncratic diagnostic practice that was the bane of psychiatry before DSM III.
2) The commentary also states: “one contentious issue is whether it is important that the prevalence for diagnoses based on proposed criteria for DSM-5 match the prevalence for the corresponding DSM-IV diagnoses” …. “to require that the prevalence remain unchanged is to require that any existing difference between true and DSM-IV prevalence be reproduced in DSM-5. Any effort to improve the sensitivity of DSM-IV criteria will result in higher prevalence rates, and any effort to improve the specificity of DSM-IV criteria will result in lower prevalence rates. Thus, there are no specific expectations about the prevalence of disorders in DSM-5.”
This is also a fudge. For completely unexplained and puzzling reasons, the DSM-5 field trials failed to measure the impact of the DSM-5 proposals on rates of disorder. These quotes in the commentary are an attempt to justify this fatal flaw in design. The contention is that we have no way of knowing what true rates of a given diagnosis should be – so why bother to measure the likely impact of the DSM-5 proposals on rates? If rates double under DSM-5, the assumption will be that it is picking up previous false negatives with no need to worry about the risks of creating an army of new false positives.
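The arithmetic linking sensitivity, specificity, and diagnosed rates can be sketched as follows. This is a textbook relationship, not a reconstruction of any DSM-5 analysis, and every number below is hypothetical and chosen only for illustration.

```python
def diagnosed_rate(true_prevalence, sensitivity, specificity):
    """Share of the population that receives the diagnosis under a criteria set."""
    true_positives = sensitivity * true_prevalence
    false_positives = (1 - specificity) * (1 - true_prevalence)
    return true_positives + false_positives


def false_positive_share(true_prevalence, sensitivity, specificity):
    """Share of diagnosed people who do not actually have the disorder."""
    false_positives = (1 - specificity) * (1 - true_prevalence)
    return false_positives / diagnosed_rate(true_prevalence, sensitivity, specificity)


# Hypothetical disorder with a true prevalence of 5%.
TRUE_PREVALENCE = 0.05

# Hypothetical "old" criteria: miss 30% of true cases, mislabel 5% of non-cases.
old_rate = diagnosed_rate(TRUE_PREVALENCE, sensitivity=0.70, specificity=0.95)
old_fp = false_positive_share(TRUE_PREVALENCE, sensitivity=0.70, specificity=0.95)

# Hypothetical "new" criteria: looser wording catches more true cases
# but also mislabels 10% of non-cases.
new_rate = diagnosed_rate(TRUE_PREVALENCE, sensitivity=0.90, specificity=0.90)
new_fp = false_positive_share(TRUE_PREVALENCE, sensitivity=0.90, specificity=0.90)

print(f"old criteria: rate {old_rate:.1%}, false-positive share {old_fp:.0%}")
print(f"new criteria: rate {new_rate:.1%}, false-positive share {new_fp:.0%}")
```

Under these assumed numbers the diagnosed rate rises from about 8% to 14%, and the share of diagnosed people who are false positives rises from about 58% to 68%. A jump in rates alone cannot say whether false negatives are being recovered or false positives created, which is why the change in rates has to be measured and examined rather than assumed to be benign.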
This is irresponsible for two reasons. First off, we are already suffering from serious diagnostic inflation. Rates of psychiatric disorder are already sky high (25% in the general population in any year; 50% lifetime) and we have experienced three runaway false epidemics of childhood disorders in the past 15 years. Second, drug company marketing has been so abusive as to warrant enormous fines and so successful as to result in widespread misuse of medication for very questionable indications. Recent CDC data suggest that the severely ill remain very undertreated, but that the mildly ill or not ill at all have become massively overtreated, especially by primary care physicians.
The DSM-5 proposals will uniformly increase rates, sometimes dramatically. Not to have measured by how much is unfathomable and irresponsible. The new diagnoses suggested for DSM-5 will (mis)label people at the very populous boundary with normality. Mixed anxiety depression and binge eating disorder will likely have astoundingly high rates between 5-10% – that’s tens of millions of people now considered ‘normal’ suddenly converted into mentally ill by arbitrary DSM-5 fiat. Psychosis risk and disruptive mood disorder will be extremely common in the young; minor neurocognitive disorder among the elderly. Legions of the recently bereaved will be misdiagnosed as clinically depressed; rates of generalized anxiety and addiction will mushroom; and ADD, which has already almost tripled, will find even more room at the top. The field trial developers seem either unaware of or insensitive to the unacceptable risks involved in creating large numbers of false positive pseudo-patients.
Indeed, quite contrary to the blithe assertions put forward in the commentary, we should have rigorous expectations about prevalence changes triggered by any DSM revision. Rates should not be wildly different for the same disorder UNLESS there is clear evidence of a serious false negative problem and firm protections against creating a massive false positive problem. And new disorders with high prevalences should not be included without substantial scientific evidence and convincing proof of accuracy, reliability, and safety. We have known since they were first posted that none of the DSM-5 proposals comes remotely close to meeting a minimal standard for accuracy and safety. And now, the AJP commentary seems to be softening us up for the bad news that their reliability is also lousy.
The workers on DSM-5 ignore the often dire implications of drastically raising the prevalence of an existing disorder or adding an untested new disorder with high prevalence – i.e., the misguided and potentially harmful treatment, the unnecessary stigma, and rising health care costs that also cause a misallocation of very scarce resources. Just two examples. Do we really want even more antipsychotic medications prescribed for children, the elderly, and returning war veterans when these are already being used so loosely and inappropriately? Isn’t the current legal and illegal overuse of stimulant medications already a big enough problem without introducing a drastically lowered set of criteria for diagnosing ADD? Sad to say, DSM-5 has failed to do an adequate risk/benefit analysis on any of its suggestions. Every one of its changes is designed to chase elusive false negatives; none protects the interests of mislabeled false positives.
Given our country’s current binge of loose diagnostic and medication practice (particularly by the primary care physicians who do most of the prescribing), DSM-5 should not be in the business of casually raising rates and offering inviting new targets for aggressive drug marketing. Instead, DSM-5 should be working in the opposite direction – taking steps to increase the precision and specificity of its diagnostic criteria. And the texts describing each disorder should contain a new section warning about the risks of overdiagnosis and ways of avoiding it. It is impossible to say what is the “right” prevalence of any disorder, but it is careless and reckless to so dramatically increase the prevalences of mental disorders without evidence of need or proof of safety.
The DSM-5 field trials have cost APA at least $3 million (perhaps a whole lot more). They started off on the wrong foot by asking the wrong question – focusing only on reliability and completely ignoring prevalence. The deadlines for starting the trials and for delivering results have been repeatedly postponed because of poor planning, an excessively cumbersome design, and disorganized implementation. The results will be arriving at the very last minute when decisions should already have been made. And now we get a broad hint that the reliabilities, when they are finally reported, will be disastrously low.
What should be done now as DSM-5 enters its depressing endgame? There really is no rational choice except to drop the many unsupportable DSM-5 proposals and to dramatically improve the imprecise writing that plagues most of the DSM-5 criteria sets.
DSM-5: How Reliable Is Reliable Enough?
DSM 5 is willing to accept poor quality.
Allen Frances, MD | January 18, 2012
This is the title of a disturbing commentary written by the leaders of the DSM 5 Task Force and published in this month’s American Journal of Psychiatry. The contents suggest that we must lower our expectations and be satisfied with levels of unreliability in DSM 5 that historically have been clearly unacceptable. Two approaches are possible when the DSM 5 field trials reveal low reliability for a given suggestion: 1) admit that the suggestion was a bad idea or that it is written so ambiguously as to be unusable in clinical practice, research, and forensics; or 2) declare by arbitrary fiat that the low reliability is indeed now to be relabeled ‘acceptable’.
In the past, ‘acceptable’ meant kappas of 0.6 or above. When the personality disorders in DSM III came in at 0.54, they were roundly derided and given only a reluctant bye. For DSM 5, ‘acceptable’ reliability has been reduced to a startling 0.2-0.4. This barely exceeds the level of agreement you might expect to get by pure chance.
Previously in its development, DSM 5 has placed great store in its field trials. This quote is from the Chair of the DSM 5 Task Force: “There’s a myth that all the decisions have been made, when in fact, all the decisions haven’t been made. Just because things have been proposed doesn’t necessarily mean they’ll end up in the DSM-5. If they don’t achieve a level of reliability, clinician acceptability, and utility, it’s unlikely they’ll go forward.”
And this quote is from a 2010 interview given to a science writer by the head of the DSM 5 Oversight Committee: “It’s going to be based on the work of the field trials – based on the assessment and analysis of them. I don’t think anyone is going to say we’ve got to go forward if we get crappy results.”
The DSM 5 tune has now changed dramatically. The commentary written for AJP by the leadership of the DSM 5 Task Force appears to be suggesting that they will, in fact, “go forward,” and with subpar reliabilities of 0.2-0.4. Now consider that the original field trial plan was to have a second phase to permit fixing those diagnostic criteria that were found to have unacceptable reliability in the first phase. These would go back to the workgroups, who could then rewrite the offending criteria and retest the new version in the second phase of the field trial. But poor planning and administrative foul-ups kept pushing back the field trials so that they are now at least 18 months late in completion. As time was running out, DSM 5 leadership quietly dropped the second phase of the field trials, removing any reference to it from the timeline posted on the DSM-5 website. Their Plan B substitute for adequate field testing appears in AJP, to wit: a drastic lowering of the bar for what is ‘acceptable’ reliability.
Can ‘accepting’ unacceptably poor agreement uphold the integrity of psychiatric diagnosis? Poor reliability degrades our ability to communicate with one another clinically, and prohibits meaningful research. ‘Accepting’ as reliable kappas of 0.2-0.4 is to go backwards more than thirty years to the days of DSM II. Before DSM III, Bob Spitzer and Mel Sabshin saw the need to develop a criterion-based system that could achieve reasonable diagnostic agreement. This is the very minimum condition necessary for current clinical work and future progress in psychiatry.