Authors

  1. McGraw, Mark

Article Content

A new report captures a number of suboptimal practices used in the data handling phase of machine learning (ML) system development, and it offers a series of strategies designed to eliminate them.

  

"Minimizing bias is critical to adoption and implementation of machine learning in clinical practice," the authors wrote in the report. Published in Radiology: Artificial Intelligence, the researchers go on to note that systematic biases produce consistent and reproducible differences between the observed and expected performance of ML systems, which result in suboptimal performance (2022; https://doi.org/10.1148/ryai.210290).

 

Such biases can be traced back to various phases of ML development, the authors wrote: data handling, model development, and performance evaluation. Their report presents 12 suboptimal practices that can occur during the data handling of an ML study, explaining how those practices can lead to biases and describing what can be done to mitigate them. The authors employed an "arbitrary and simplified framework" that splits ML data handling into four steps: data collection, data investigation, data splitting, and feature engineering, providing examples from the available research literature as well.

 

According to the authors, ML applications in radiology resulted in more than 8,000 publications globally between 2000 and 2018. However, mitigation of possible mathematical bias "remains a critical concern for adopters," they wrote. The researchers also noted that, while bias in medical research may be random (arising, for example, from sampling variability or measurement precision), systematic bias produces consistent and reproducible differences between observed and expected performance, with unrecognized bias potentially contributing to suboptimal results.

 

To mitigate bias, the investigators recommend that researchers "carefully design and implement a pipeline of data handling, model development, and performance evaluation. Each of these steps may introduce systemic or random bias. ... Such biases must be recognized and, ideally, eliminated."

 

In the report, the Mayo Clinic researchers provide mitigation strategies for each of the 12 suboptimal practices they see occurring across the four data handling steps of ML system development, such as:

 

* issues during data collection, such as improper identification of the dataset, reliance on a single source of data, or use of an unreliable data source;

 

* data investigation concerns, such as conducting insufficient exploratory data analysis (EDA), performing EDA without domain expertise, or failing to observe the actual data;

 

* data splitting issues, such as leakage between datasets, imbalanced datasets, or overfitting to hyperparameters; and

 

* feature engineering issues, such as improper feature removal, improper feature rescaling, and mismanagement of missing data (a brief illustrative sketch follows this list).
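To make the last two categories concrete, here is a minimal sketch, not taken from the report, of one standard safeguard: fitting missing-value imputation and feature rescaling on the training split only, so that no test-set statistics leak into preprocessing. The data, parameters, and library choices are illustrative assumptions.

    # Illustrative sketch only: impute and rescale features using statistics
    # computed from the training split, then reuse them on the held-out set.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))           # toy feature matrix
    X[rng.random(X.shape) < 0.1] = np.nan   # simulate missing values
    y = rng.integers(0, 2, size=200)        # toy binary labels

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0)

    # The pipeline learns imputation medians and scaling statistics from
    # X_train only; scoring on X_test reuses those fitted statistics.
    model = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    model.fit(X_train, y_train)
    print("held-out accuracy:", model.score(X_test, y_test))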

 

 

The researchers' starting point was the idea that AI "will impact the everyday routine of anyone working in health care, regardless of their specific role within an institution," noted study co-author Gian Marco Conte, MD, PhD, Assistant Professor of Radiology and Research Associate in the Radiology Informatics Laboratory at the Mayo Clinic.

 

For example, he said, a researcher, clinician, or administrator might be asked to review or design a study that uses an AI algorithm, or to evaluate a vendor's proposal for an AI-based tool to be implemented in clinical practice.

 

"So, even if not actively researching in the field of AI, it is essential to be familiar with its basic concepts," Conte said. "Among these fundamental concepts, bias is one of the most important, since it can affect AI algorithms at all levels, from design to clinical implementation."

 

It is also a fundamental concept to remember when reading or reviewing a paper about AI. "So, we aimed to make this fundamental concept accessible and clear to as many stakeholders as possible, because these are the people that will directly impact the implementation of AI in radiology," he noted.

 

Each of the suboptimal practices the study identified can introduce bias at different levels. And, while some of the best practices require technical knowledge, such as how to handle DICOM data correctly, most of them can and should be understood by the vast majority of stakeholders, Conte said.
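As one illustrative example of that kind of technical knowledge, and not something drawn from the report itself, raw DICOM pixel values often need the rescale slope and intercept stored in the header applied before they represent physical units such as Hounsfield units for CT. The file path below is hypothetical.

    # Illustrative sketch: convert stored DICOM pixel values to physical units.
    import numpy as np
    import pydicom

    ds = pydicom.dcmread("example_ct_slice.dcm")   # hypothetical file path
    pixels = ds.pixel_array.astype(np.float32)

    # Default to slope 1 and intercept 0 when the tags are absent.
    slope = float(getattr(ds, "RescaleSlope", 1.0))
    intercept = float(getattr(ds, "RescaleIntercept", 0.0))
    hu = pixels * slope + intercept                # Hounsfield units for CT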

 

"Some suboptimal practices can overestimate model performance, creating false and dangerous expectations and hype that harm the whole field," he continued, citing data leakage between training and validation/test sets as an example. This leakage can generate overly optimistic and biased results because the model performance is evaluated on data that's directly related to the one used during training.

 

Conte said one way to mitigate this issue would be to require authors to publish the code they used during data preparation, in addition to the code used during model training, so that reviewers and readers could identify a potential source of the leakage.

 

Speaking to Oncology Times, Conte also addressed the lack of diversity in collected data, another of the suboptimal practices he and his co-authors identified. Developing algorithms based on data collected at a single institution "can result in algorithms that work great for a specific portion of the population, but can produce wrong and dangerous results for another," he said.

 

To develop more heterogeneous and diverse datasets, researchers should collect data from multiple institutions and geographical locations, use data from different vendors, and collect data at different points in time. Conte noted this might not always be practical, however, due to the difficulty of sharing medical data between institutions.
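When multi-institution data are available, one way to probe how well a model generalizes across populations is to hold each institution out in turn and evaluate a model trained on the remaining sites. The sketch below is illustrative; the site labels, data, and model are assumptions, not part of the report.

    # Illustrative sketch: leave-one-institution-out evaluation.
    import numpy as np
    from sklearn.model_selection import LeaveOneGroupOut
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 8))                 # toy features
    y = rng.integers(0, 2, size=300)              # toy labels
    site = rng.choice(["A", "B", "C"], size=300)  # hypothetical institution labels

    logo = LeaveOneGroupOut()
    for train_idx, test_idx in logo.split(X, y, groups=site):
        clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        held_out = site[test_idx][0]
        print(f"held-out site {held_out}: accuracy {clf.score(X[test_idx], y[test_idx]):.2f}")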

 

"So, it is essential to always describe in detail the cohort(s) used to develop these algorithms so that the people evaluating them can know if their output can be trusted when applied to a different population," he said.

 

Ultimately, most if not all of the sources of bias the authors identified "are related to the lack of involvement of subject matter experts at every level of a study, which will produce unreliable and potentially dangerous results," Conte stated. "So, in general, the best way to mitigate bias is to have different subject matter experts collaborating on a project, from its initial design to its practical implementation."

 

Mark McGraw is a contributing writer.