Applying Data Preprocessing Methods to Predict Premature Birth


Data mining and pattern classification tools have enabled prediction of several medical outcomes with high levels of accuracy, largely because they can handle large datasets, even those with missing values. Preterm birth (PTB) can have damaging long-term effects on infants, and PTB rates have been increasing worldwide over the last two decades. The purpose of this work was to investigate whether preprocessing methods, applied to two different prenatal datasets, can improve the prediction accuracy of our software tool for predicting PTB. The primary software used in this work was R, which we used to handle the missing values and class imbalances found in the two datasets. The results show that, compared with our past work, we increased the performance of the prediction tool as measured by sensitivity, specificity, and ROC values.
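The abstract does not say which R routines were used for missing values and class imbalance. Purely as an illustration, the Python sketch below shows two common generic choices: per-column median imputation and random oversampling of the minority class. It also shows how sensitivity and specificity are computed from binary predictions. The function names and the specific techniques are assumptions chosen for illustration, not the authors' pipeline.

```python
import random
from statistics import median

def impute_median(rows):
    """Hypothetical helper: replace None in each column with that column's observed median."""
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        medians.append(median(observed) if observed else 0.0)
    return [[medians[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]

def oversample_minority(X, y, seed=0):
    """Hypothetical helper: duplicate random minority-class rows until classes are balanced."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        # Nothing to resample from; return copies unchanged.
        return list(X), list(y)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# Toy usage: one preterm (1) case among four records, with two missing values.
X = [[1.0, None], [2.0, 4.0], [None, 6.0], [4.0, 8.0]]
y = [1, 0, 0, 0]
X_filled = impute_median(X)
X_bal, y_bal = oversample_minority(X_filled, y)
print(sensitivity_specificity([1, 1, 0, 0], [1, 0, 0, 0]))
```

Imputation runs before oversampling here so that duplicated rows carry filled-in values; in a real evaluation, both steps would normally be fitted on the training split only, so that the test data do not leak into the preprocessing.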

Lead Researchers



  1. Erika Bariciak

    Investigator, CHEO Research Institute
  2. Jeff Gilchrist

    Associate Scientist, CHEO Research Institute