Synthetic Data

Synthetic data offers a safe and effective way to support research and innovation aimed at improving the health of children and youth. Recent advances in machine learning have made it feasible to generate high-quality synthetic datasets for this purpose. These datasets are created by training machine learning models on real data and then using those models to generate new, artificial records, much like a digital twin of the original dataset. Although synthetic records do not correspond to any real individuals, they preserve the statistical patterns and relationships found in the original data.
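To make the train-then-generate idea concrete, here is a minimal sketch in Python. It is not the method used by any particular group. It assumes a two-column numeric dataset and swaps in a simple bivariate Gaussian for the machine learning model. The function names (`fit_gaussian`, `sample_synthetic`, `pearson`) and the simulated "real" records are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fit_gaussian(records):
    """'Train' step: estimate means, standard deviations, and correlation."""
    xs, ys = zip(*records)
    return {
        "mean_x": statistics.fmean(xs),
        "mean_y": statistics.fmean(ys),
        "sd_x": statistics.stdev(xs),
        "sd_y": statistics.stdev(ys),
        "rho": pearson(xs, ys),
    }


def sample_synthetic(model, n, rng):
    """'Generate' step: draw n new records from the fitted distribution."""
    rho = model["rho"]
    records = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = model["mean_x"] + model["sd_x"] * z1
        # Mix z1 into y so the synthetic pair carries the fitted correlation.
        y = model["mean_y"] + model["sd_y"] * (rho * z1 + math.sqrt(1 - rho**2) * z2)
        records.append((x, y))
    return records


rng = random.Random(0)

# Stand-in for a real dataset: simulated (age, measurement) pairs.
# These are toy numbers for illustration, not clinical values.
real = []
for _ in range(500):
    age = rng.uniform(2, 17)
    real.append((age, 80 + 6 * age + rng.gauss(0, 5)))

model = fit_gaussian(real)
synthetic = sample_synthetic(model, 500, rng)

real_r = pearson(*zip(*real))
synth_r = pearson(*zip(*synthetic))
print(f"correlation in real data:      {real_r:.3f}")
print(f"correlation in synthetic data: {synth_r:.3f}")
```

The synthetic records are freshly sampled rather than copied from the input, yet the correlation between the two columns stays close to that of the original data. Practical generators use far richer models and handle mixed data types, but the workflow of fitting a model and then sampling from it is the same.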

This approach enables companies and researchers to train, validate, and test their models and software applications on synthetic data without compromising privacy. Rigorous privacy assessments are conducted to confirm that the risk of re-identification is minimal while the data’s utility for statistical and machine learning tasks is maintained. Methodological support is also available to help users apply synthetic data appropriately across different analytical contexts.

Researchers

  1. Khaled El Emam

    Senior Scientist, CHEO Research Institute; Professor, Faculty of Medicine, University of Ottawa