When training a machine learning algorithm for a supervised-learning task in some clinical applications, uncertainty in the correct labels of patients may adversely affect the performance of the algorithm. For example, even clinical experts may have less confidence when assigning a medical diagnosis to some patients because of ambiguity in the patient’s case or imperfect reliability of the diagnostic criteria. As a result, the assigned labels for these patients may not accurately reflect their underlying state. Rather than excluding patients with diagnostic uncertainty, an alternative approach is to use this additional information about diagnostic certainty when training the algorithm, which could lead to more efficient learning and better generalization to new patient cases. We present a robust method, implemented with Support Vector Machines (SVMs), that accounts for such clinical diagnostic uncertainty when training an algorithm to detect patients who develop acute respiratory distress syndrome (ARDS), a pulmonary condition of the critically ill that is diagnosed using clinical criteria known to be imperfect.
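
To make the idea of using diagnostic certainty during training concrete, here is a minimal sketch of one common way to do it: scaling each patient's hinge-loss penalty by a confidence value between 0 and 1. This is not the formulation from the article, which is not given here. The function names, the `confidence` values, and the toy data are all hypothetical, and the optimizer is plain subgradient descent on a linear SVM chosen for brevity.

```python
import random

def train_weighted_linear_svm(X, y, confidence, C=1.0, epochs=200, lr=0.01, seed=0):
    """Linear SVM trained by subgradient descent on
    0.5 * ||w||^2 + C * sum_i confidence[i] * max(0, 1 - y[i] * (w . x_i + b)).
    Labels y[i] are +1 or -1; confidence[i] is an assumed certainty in [0, 1]."""
    rng = random.Random(seed)
    n = len(X)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            # Regularizer gradient, split evenly across the n per-sample steps.
            grad_w = [wj / n for wj in w]
            grad_b = 0.0
            if y[i] * score < 1.0:
                # Margin violation: the penalty is scaled by label confidence,
                # so uncertain labels pull on the boundary less.
                scale = C * confidence[i] * y[i]
                grad_w = [g - scale * xj for g, xj in zip(grad_w, X[i])]
                grad_b = -scale
            w = [wj - lr * g for wj, g in zip(w, grad_w)]
            b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on the side of the decision boundary."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Toy data (hypothetical): the last point carries a doubtful positive label.
X = [[2, 2], [3, 2], [2, 3], [3, 3],
     [-2, -2], [-3, -2], [-2, -3], [-3, -3],
     [-2.5, -2.5]]
y = [1, 1, 1, 1, -1, -1, -1, -1, 1]
confidence = [1.0] * 8 + [0.1]

w, b = train_weighted_linear_svm(X, y, confidence)
print(predict(w, b, [3, 3]), predict(w, b, [-3, -3]))
```

Setting every confidence to 1.0 recovers a conventional soft-margin SVM, which gives a natural baseline for comparison.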

We also apply a novel time-series sampling method to address the correlation among the longitudinal clinical data from each patient used in model training, since these observations are stochastically dependent rather than independent and identically distributed. Preliminary results show a meaningful improvement in the algorithm's ability to detect patients with ARDS on a hold-out sample when our method, which accounts for uncertainty in the training labels, is compared with a conventional SVM algorithm.
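
The sampling method itself is not described here, so the following is only a generic sketch of two standard precautions for correlated longitudinal data, not the method from the article: thinning each patient's observations so that retained samples are separated by a minimum time gap, and forming the hold-out set by patient rather than by observation. The record layout `(patient_id, time_hours, features)`, the function names, and the 6-hour gap are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def subsample_per_patient(records, min_gap_hours=6.0):
    """Keep, for each patient, only observations at least min_gap_hours apart.
    Each record is a tuple (patient_id, time_hours, features)."""
    by_patient = defaultdict(list)
    for rec in records:
        by_patient[rec[0]].append(rec)
    kept = []
    for pid in sorted(by_patient):
        last_time = None
        for rec in sorted(by_patient[pid], key=lambda r: r[1]):
            if last_time is None or rec[1] - last_time >= min_gap_hours:
                kept.append(rec)
                last_time = rec[1]
    return kept

def split_by_patient(records, holdout_fraction=0.25, seed=0):
    """Split so that no patient contributes to both training and hold-out."""
    ids = sorted({rec[0] for rec in records})
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_holdout = max(1, round(len(ids) * holdout_fraction))
    holdout_ids = set(ids[:n_holdout])
    train = [rec for rec in records if rec[0] not in holdout_ids]
    holdout = [rec for rec in records if rec[0] in holdout_ids]
    return train, holdout

# Hypothetical data: 4 patients, one observation per hour for 13 hours.
records = [(pid, float(t), [float(t)]) for pid in ("p1", "p2", "p3", "p4")
           for t in range(13)]
thinned = subsample_per_patient(records, min_gap_hours=6.0)
train, holdout = split_by_patient(thinned, holdout_fraction=0.25)
print(len(thinned), len(train), len(holdout))
```

Splitting by patient matters because observations from the same patient in both sets would let correlated samples leak into the evaluation and inflate hold-out performance.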

See the article here: http://ieeexplore.ieee.org/document/8304750/