Molecular Psychiatry (2021) 26:23–25
https://doi.org/10.1038/s41380-020-00931-z
COMMENT
Machine learning for psychiatry: getting doctors at the black box?
Dennis M. Hedderich1 ● Simon B. Eickhoff2,3
Received: 7 May 2020 / Revised: 14 September 2020 / Accepted: 22 October 2020 / Published online: 10 November 2020
© The Author(s) 2020. This article is published with open access
Abstract
Recent developments in the field of machine learning have spurred high hopes for diagnostic support for psychiatric patients
based on brain MRI. But while technical advances are undoubtedly remarkable, the current trajectory of mostly
proof-of-concept studies performed on retrospective, often repository-derived data may not be well suited to yield a
substantial impact in clinical practice. Here we review these developments and challenges, arguing for the need for stronger
involvement of, and input from, medical doctors in order to pave the way for machine learning in clinical psychiatry.
* Simon B. Eickhoff
[email protected]
1 Department of Neuroradiology, Klinikum rechts der Isar, Technical University of Munich, School of Medicine, Munich, Germany
2 Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
3 Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Centre Jülich, Jülich, Germany

Recent advances in algorithms and hardware have created high hopes for machine learning (ML) to become an almost universal solution for complicated problems. This enthusiasm has quickly taken over medical research, resulting in a growing number of publications highlighting the potential of ML, accompanied by increasingly strong claims to enter clinical practice [1]. With respect to brain imaging, as often a frontrunner for innovation, the motivation behind this development seems obvious: MRI is highly standardized and between-subject analyses have been established for decades. In addition, several large and open datasets provide relatively good resources for model training [2]. There is also a clinical need in psychiatry: neuropsychiatric disorders are a leading cause of morbidity and disability-adjusted life years lost worldwide, hence hopes are high for ML to accelerate the diagnostic and nosological progress in psychiatry.

Notwithstanding the potential impact of ML on psychiatry, it seems debatable whether the current state and trajectory are well aligned with the high expectations and oftentimes bold promises [3]. Especially in psychiatry, more consideration of prerequisites and further directions is needed to translate exciting proof-of-concept papers re-analyzing available datasets into clinical impact. We here highlight some of the pertinent aspects and discuss the need for increased clinical input along these steps.

While available datasets have enabled data scientists to explore many different questions, most work has focused on supervised algorithms for closed questions, in particular diagnostic classifications (“Does patient X have disease A or not?”). However, these closed-type questions are hardly reflective of clinical reality, where “open world” challenges prevail as doctors usually have to consider several differential diagnoses. These may not only have different a priori likelihoods that are again dependent on presenting symptoms and medical history, but may also coexist given the high prevalence of comorbidity, e.g. between anxiety disorders and depression or between substance abuse and psychosis. Finally, in spite of clinical complaints, a patient may actually not have any (detectable) brain disease. Consequently, even approaches showing excellent and robust performance on closed questions may be misleading in practice, if they dismiss a particular diagnosis without weighing alternative explanations. This illustrates the need for stronger consideration of actual use cases from the very start of algorithmic development. In turn, however, also expectations from the medical side need to be grounded with respect to methodological feasibility. Hence, much closer interaction between developers and users than currently practiced seems essential to avoid frustration on either side.

Driven by the idea of “AI based diagnostics”, most current research focusses on “supervised” ML, which, independent of the sophistication of architecture, in essence learns a mapping from input to target space based on a set of labeled observations (the training set). Obviously, representative training
data is essential to achieve generalization to future cases, but ensuring representativeness in ML in psychiatry is challenging and extends beyond adjusting for rather obvious socio-demographic factors like sex and age. However, influencing factors such as life events (e.g. birth weight, obesity, or traumatic experiences), as well as occupational history or child-rearing, which affect brain structure and function as well as neuropsychiatric outcomes, are widely neglected [4]. As such, they may potentially compromise ML performance through hidden stratification, which describes implications of unrecognized subsets of cases within the training set. Thus, if a depression classifier picks up on traces of childhood trauma through a high percentage of depressed patients with a history of childhood trauma, an actually depressed person without trauma may be misclassified as healthy. By training on retrospective, open-source datasets with little biographic information, these relationships get lost. Probably, the only remedy to this predicament is to increase structured reporting of biographic information in large MRI datasets and active consideration of these factors when pursuing classifiers for neuropsychiatric diseases. This calls for a close collaboration between clinicians and data scientists in order to obtain the same level of accurate and multidimensional biographic information as would be considered in clinical practice when weighing the likelihood of differential diagnoses as explanations for the current symptoms.

In addition to input features being systematically confounded by (undocumented) influences, noisy or misaligned target labels also represent a potentially serious pitfall. While this is true also for neighboring disciplines like neurology, given evolving pathophysiological concepts and diagnostic guidelines, it is a fundamental problem for psychiatry, where diagnoses are ultimately conventions on how to group symptoms into disease categories, given the lack of conclusive pathophysiological models. Hence, achieving perfect classification accuracy relative to clinical labels may actually not be desirable, particularly given that the latter are often acquired more easily in a clinical interview. On the one hand, this suggests that algorithms designed for robust learning on noisy labels may be more appropriate than those aimed at minimizing the (cross-validated) prediction error. More importantly, however, we would argue that this emphasizes the need for closed feedback-loops by clinical work-up of misclassified cases for advancing pathophysiological insight and ultimately classifications. If a healthy control was misclassified as “depressed”, can this be explained by mere technical deficiencies? Or does the subject share traits, biographic influences, even subclinical symptoms with the patients in the training group that the algorithm picked up? Such questions, which rely on more extended characterization and sufficient transparency of the algorithms, need to be addressed before visions of “precision psychiatry” can become reality. Further downstream, clinical decision makers at the deployment site need to be empowered by such knowledge in order to locally and ultimately at the individual level adapt and monitor the use of ML approaches. To ensure such ML literacy necessary for shared decision making, teaching data science to doctors needs to be developed and promoted.

As we have touched on in the previous paragraph, diagnostic labels in clinical psychiatry are notoriously unassertive with respect to their neurobiological underpinnings, limiting the usefulness of supervised ML strategies. As an alternative not relying on labeled data, unsupervised learning groups individuals based on detected patterns in high-dimensional data [5]. Once robust patterns are established, these may then inform pathophysiological concepts in psychiatry by comparison to clinical (phenomenologically driven) nosology or individual psychopathological work-up of the patients as described above for misclassified cases. Such an approach would resonate well with the Research Domain Criteria (RDoC) idea of dimensional psychiatry, and the increasing popularity of canonical correlation analysis (CCA) finding linked components in imaging and clinical data. Together, these may then lead to important refinements of current concepts for psychiatric disease classification. However, we would like to note that unsupervised approaches will inevitably find components in the data, which does not make these by themselves useful in clinical practice. But any ML approach in the diagnostic work-up of patients in psychiatry will only succeed if it creates an impact in real life, e.g. through choosing the most beneficial therapeutic option for a patient. This being said, it becomes obvious that the prognostication of treatment response may even be a more relevant question for ML to solve in psychiatry than correctly assigning a diagnostic label. Again, the evaluation of this impact needs to thoroughly involve medical experts as it entails multiple facets beyond, e.g., factor stability or generalization. On the one hand, it may promise more appropriate interventions, better long-term outcomes and reduced socio-economic costs; on the other hand, it may bring a deteriorating patient-physician relationship, unclear accountability and difficult acceptance [6]. Critically weighing benefits and drawbacks of ML in psychiatry calls for prospective, multi-center designs in a realistic clinical environment as opposed to the currently prevailing proof-of-concept studies. Learning from other fields like pharmacological drug testing, where the introduction of preregistration and external monitoring dramatically reduced the number of positive studies, this will likely lead to sobering results [7]. There is also a pertinent rationale for a closer integration of clinical (drug) trials and ML in psychiatry, especially for the prediction of treatment outcomes. Neuropsychopharmacology represents the main therapeutic option for most psychiatric disorders but non-response rates
are high. Integrating ML into clinical trials could thus open new opportunities towards a better insight into patient- or setting-specific factors that may influence the therapeutic response, and ultimately allow more targeted deployment of (new) drugs. Such models towards precision psychiatry need to be validated themselves in separate trials and ML-suggested stratifications need to be tested for their added benefit over clinical best-practice recommendations for the choice of therapeutic agents. In this context, we also note that realization of such potential will largely depend on the willingness and ability of psychiatrists and healthcare providers to integrate such novel markers into clinical routine; a non-trivial task for psychiatrists outside academic institutions, who in many countries provide the majority of care. Will such extensive evaluation protocols slow down technical innovation in a fast-moving field? Most likely, but if ML for medical imaging aspires to have an impact similar to pharmaceutical treatment, it seems indispensable to hold its evaluation to similar standards. This is even more true when dealing with an as yet poorly understood and complex organ such as the brain in the context of multidimensional concepts of neuropsychiatric diseases.

In summary, we argue for a deeper involvement of domain experts, particularly medical professionals, in the process of developing novel ML applications for clinical psychiatry to help fulfill the currently high expectations.

Acknowledgements Open Access funding enabled and organized by Projekt DEAL.

Compliance with ethical standards

Conflict of interest The authors declare that they have no conflict of interest.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

References

1. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25:44–56.
2. Eickhoff S, Nichols TE, Van Horn JD, Turner JA. Sharing the wealth: neuroimaging data repositories. Neuroimage. 2016;124:1065–8.
3. Arbabshirani MR, Plis S, Sui J, Calhoun VD. Single subject prediction of brain disorders in neuroimaging: promises and pitfalls. Neuroimage. 2017;145:137–65.
4. Fox SE, Levitt P, Nelson CA III. How the timing and quality of early experiences influence the development of brain architecture. Child Dev. 2010;81:28–40.
5. Hastie T, Tibshirani R, Friedman JH. Unsupervised learning. In: The elements of statistical learning: data mining, inference, and prediction. 2nd ed. New York: Springer; 2009.
6. Heinrichs B, Eickhoff SB. Your evidence? Machine learning algorithms for medical diagnosis and prediction. Hum Brain Mapp. 2020;41:1435–44.
7. Kaplan RM, Irvin VL. Likelihood of null effects of large NHLBI clinical trials has increased over time. PLoS ONE. 2015;10:e0132382.