As a timer counted down, a team of physicians from St. Michael’s Medical Center in Newark, N.J., conferred on a medical diagnosis question. Then another. And another. With each question, the stakes at Doctor’s Dilemma, an annual competition held in May in Washington, D.C., grew higher. By the end, the team had wrestled with 45 conditions, symptoms, or treatments. They defeated 50 teams to win the 2016 Osler Cup.
The stakes are even higher for real-life diagnoses, where doctors always face time pressure. That is why researchers have tried since the 1960s to supplement doctors’ memory and decision-making skills with computer-based diagnostic aids. In 2012, for example, IBM pitted a version of its Jeopardy!-winning artificial intelligence, Watson, against questions from Doctor’s Dilemma. But Big Blue’s brainiac couldn’t replicate the overwhelming success it had against human Jeopardy! players.
The trouble is, computerized diagnosis aids do not yet measure up to the performance of human doctors, according to several recent studies. Nor do makers of such software seem to agree on a single benchmark by which to measure performance. Using reports on such software in the peer-reviewed literature, one team of researchers found wide performance variations across different diseases, as well as different usage patterns among doctors. For example,…