Collective intelligence is a promising approach for improved medical diagnosis, according to a study using data from the Human Diagnosis Project (Human Dx) that compared the diagnostic accuracy of groups of physicians and medical students with that of individual physicians. The study, published in JAMA Network Open, shows the superiority of grouped diagnoses and suggests that the results could translate to clinical settings.
Madrid, May 16, 2019. Diagnosis is fundamental to the practice of medicine; however, studies suggest that diagnostic errors are highly prevalent even for common conditions, causing significant morbidity and mortality. For centuries, the predominant model has been that of an individual professional evaluating the patient and arriving at a diagnosis. However, the evidence points to collective intelligence improving on individual diagnosis, an approach that the National Academy of Medicine has advocated as a way to reduce medical errors. Ward rounds by teams of doctors, case discussions, and tumor committees are examples of this.
Group diagnosis may be more effective even when it involves only 2 doctors, and accuracy continued to rise as groups grew to 9 participants. The improvement was observed even when groups of nonspecialists were compared with specialists evaluating the patients individually. The data come from a study by the departments of Medicine and Public Health of the universities of Harvard and Washington, along with the Internal Medicine and Primary Care Service of the Brigham and Women's Hospital in Boston (United States). The principal investigator is Michael Barnett, a primary care physician and a professor in the Department of Health Policy and Management at Harvard.
The project is powered by the online platform Human Dx, which was designed to support the teaching of doctors, fellows (in the United States, physicians who continue clinical and/or research training after residency), residents, and medical students. Specifically, Barnett selected all cases authored by users between May 7, 2014, and October 5, 2016, with 10 or more respondents, which added up to a total of 1,572 cases solved by 2,069 users from 47 countries. In all, 70.2% of the users had training in internal medicine; 59.4% were residents or fellows, 20.8% were practicing physicians, and 19.8% were medical students. Only 10% belonged to surgery and other specialties.
The results showed an effect on the diagnostic quality, understood as a correct diagnosis within the top three of possible diagnoses. For groups, the three primary diagnoses were a collective differential generated through a weighted combination of diagnoses. To account for pairing of data by case, a variation of the McNemar test was used. The conclusion was that "a collective intelligence approach was associated with higher diagnostic accuracy compared with individuals, including individual specialists whose expertise matched the case diagnosis, across a range of medical cases."
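As a rough illustration of how a collective differential and the top-three criterion might work, the following Python sketch pools several ranked lists into one. The inverse-rank weighting (a diagnosis at rank r earns 1/r), the function names, and the example case are all assumptions made for illustration; the article does not say which weights the authors used, and real entries would also need free-text diagnoses to be normalized first.

```python
from collections import defaultdict


def collective_differential(differentials, top_k=3):
    """Pool ranked differentials from several users into one ranked list.

    Assumption (not stated in the article): a diagnosis listed at
    rank r (1-based) contributes a weight of 1/r to its total score.
    """
    scores = defaultdict(float)
    for ranked in differentials:
        for rank, dx in enumerate(ranked, start=1):
            scores[dx] += 1.0 / rank
    # Highest score first; ties broken alphabetically so output is stable.
    ordered = sorted(scores, key=lambda dx: (-scores[dx], dx))
    return ordered[:top_k]


def is_correct(differential, true_dx):
    """A case counts as solved if the true diagnosis is in the list."""
    return true_dx in differential


# Hypothetical case with three respondents (invented for illustration).
group = [
    ["pneumonia", "bronchitis", "pulmonary embolism"],
    ["bronchitis", "pneumonia"],
    ["pulmonary embolism", "pneumonia", "heart failure"],
]
top3 = collective_differential(group)
print(top3)
```

In this invented case, pneumonia leads the pooled list even though only one respondent ranked it first, because every respondent placed it high. That is the intuition behind combining independent differentials.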
The larger the group, the greater the precision
This cross-sectional study revealed, using a wide selection of online clinical cases and common symptoms, that the independent differential diagnoses performed by several physicians combined in a weighted list significantly exceeded the diagnoses of individual physicians, even in groups as small as 2, and that the accuracy increased in larger groups of up to 9 doctors. The mean accuracy was 85.6% for groups of 9 versus 62.5% for individual physicians.
Groups of nonspecialists also significantly outperformed individual specialists in solving cases matched to the specialty of the individual specialist. Among other subgroup analyses, the diagnostic accuracy of the groups was compared with that of individual specialists on cases in their own medical specialty, demonstrating the superiority of group diagnosis: accuracy of 77.7% for groups of 2 and 85.5% for groups of 9, versus 66.3% for individual specialists.
The size of the improvement varied according to the specifications used to combine the diagnoses, but the groups systematically outperformed the individuals. The absolute improvement in accuracy from individuals to groups of 9 varied by presenting symptom, from 17.3 percentage points (95% CI, 6.4% - 28.2%; P = 0.002) for abdominal pain to 29.8 percentage points (95% CI, 3.7% - 55.8%; P = 0.02) for fever. Groups ranging from 2 users (77.7% accuracy; 95% CI, 70.1% - 84.6%) to 9 users (85.5% accuracy; 95% CI, 75.1% - 95.9%) outperformed individual specialists in their subspecialty (66.3% accuracy; 95% CI, 59.1% - 73.5%; P < 0.001 vs. groups of 2 and 9).
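P values like these come from comparisons paired by case, for which the study used a variation of the McNemar test. The sketch below shows the textbook exact McNemar test, not the authors' variant, and the ten paired outcomes are invented for illustration. Only the discordant cases (where the group and the individual disagree on correctness) carry information about which approach is better.

```python
from math import comb


def mcnemar_exact(group_correct, individual_correct):
    """Exact two-sided McNemar test on paired correct/incorrect outcomes.

    Returns (b, c, p_value), where b counts cases the group solved but
    the individual missed, and c counts the reverse.
    """
    pairs = list(zip(group_correct, individual_correct))
    b = sum(1 for g, i in pairs if g and not i)
    c = sum(1 for g, i in pairs if i and not g)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant pairs, so no evidence either way
    # Under the null hypothesis, each discordant pair is a fair coin flip.
    k = min(b, c)
    tail = sum(comb(n, j) for j in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical outcomes for ten cases (True = correct diagnosis in top three).
group_hits = [True, True, True, True, True, True, True, True, False, True]
solo_hits = [True, False, False, True, False, True, False, False, False, False]

b, c, p_value = mcnemar_exact(group_hits, solo_hits)
print(b, c, p_value)
```

Here the group wins all six discordant cases, so the two-sided P value is 2 × (1/2)^6 = 0.03125.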
"Collective intelligence with groups of physicians and trainees was associated with improved diagnostic accuracy compared with individual users across multiple clinical disciplines," the authors concluded, emphasizing that this is a relatively new concept, although the medical profession has relied for some time on group decision-making processes such as the Delphi technique or tumor committees.
Pencil and paper
To date, this is the most extensive study of collective intelligence for general clinical diagnosis, both in the number of participating physicians and in the number of clinical cases used. The study also points to the importance of ranking diagnoses when building a collective intelligence approach: with the weighting rules used to construct the collective differential, accuracy for the top two diagnoses changed only minimally, no matter how long the list of combined diagnoses was.
The authors want future research to move into clinical settings, with or without high-cost technology for implementing collective intelligence strategies. They recognize that smartphones and the Internet could speed up the collection and integration of group decisions, but "collective intelligence need not be facilitated by software; it is possible that even with paper and pencil, diagnostic accuracy could be improved with a collective intelligence approach," argues Barnett's team in defending the viability of this diagnostic approach with few resources.
The experts warn that the added benefit of the collective intelligence approach would need to be weighed against "the time and workload necessary to implement in practice," an aspect that reinforces the need to evaluate this diagnostic approach in the complexities of the real-world setting. In the opinion of these experts in Public Health, "a collective intelligence approach could provide valuable diagnostic assistance for primary care clinicians in areas with (…) higher rates of diagnostic error."
On the other hand, the researchers recognize the limitations arising from the data set itself. Moreover, those who contribute to Human Dx may not be a representative sample of the medical community (the analysis includes only 431 practicing physicians). Additionally, the platform was not specifically designed to evaluate collective intelligence. In any case, Barnett emphasizes that "pooling the diagnoses of multiple physicians into a ranked list could be an effective approach to improving diagnostic accuracy, but further study in a clinical setting is needed."
Barnett M, Boddupalli D, Nundy S, et al. Comparative Accuracy of Diagnosis by Collective Intelligence of Multiple Physicians vs Individual Physicians. JAMA Network Open. 2019;2(3):e190096. doi:10.1001/jamanetworkopen.2019.0096