Guidelines and recommendations for machine learning in biology

    Dr Ian Walsh



    As technologies advance, biology is being flooded with complex data at a scale beyond the capacity of manual analysis. Artificial intelligence, particularly machine learning (ML), offers opportunities to find patterns in these data and to uncover hidden connections, for example between a disease and its causal agent or treatment. This has led to increasing use of ML in biology, both in academia and in the pharmaceutical industry. However, the community lacks shared ML standards that would prevent poor ML practices and misinterpretation of the data. In this publication, we propose a set of recommendations to help scientists make sound decisions when developing ML methods for biology.


    Societal Impact

    Standardized approaches to reporting ML methods and presenting ML results will improve the quality of science. This publication in Nature Methods presents early community consensus-based recommendations together with the discussions behind them. By providing guidelines, checklists and recommendations to users, it aims to influence the wider ML community and improve the overall effectiveness of ML methods.


    Technical Summary

    DOME, which stands for Data, Optimization, Model and Evaluation, is a set of guidelines for biological ML that we have proposed in Nature Methods. It addresses the need for a cohesive, unified set of recommendations covering data, optimization techniques, model selection and evaluation protocols. DOME is likely to increase the reproducibility and clarity of ML methods, making them easier for readers and peer reviewers to understand.

    DOME consists of community-wide guidelines, recommendations and checklists that seek to define best practices and to raise the standards and quality of publications. It is also meant to prompt further consensus-based discussion across the wider ML community.
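    To make the idea of a DOME-style checklist more concrete, the Python sketch below groups reporting items under the four DOME aspects and lists which items a method description leaves unaddressed. Only the four aspect names come from DOME itself; the individual items, the `DOME_CHECKLIST` table and the `missing_items` function are hypothetical illustrations, not the official checklist published in the paper.

```python
from __future__ import annotations

# Hypothetical sketch of a DOME-style reporting checklist.
# The four aspect names come from the DOME acronym; the items under each
# aspect are illustrative examples only, not the official DOME checklist.
DOME_CHECKLIST = {
    "Data": ["provenance", "dataset splits", "data availability"],
    "Optimization": ["algorithm", "hyperparameters", "overfitting checks"],
    "Model": ["interpretability", "output type", "software availability"],
    "Evaluation": ["evaluation method", "performance metrics", "baseline comparison"],
}


def missing_items(report: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return, per DOME aspect, the checklist items a method report leaves unaddressed.

    `report` maps an aspect name to the set of items the authors have documented.
    Aspects absent from `report` are treated as entirely undocumented, and
    aspects with no gaps are omitted from the result.
    """
    gaps: dict[str, list[str]] = {}
    for aspect, items in DOME_CHECKLIST.items():
        documented = report.get(aspect, set())
        todo = [item for item in items if item not in documented]
        if todo:
            gaps[aspect] = todo
    return gaps


if __name__ == "__main__":
    # A hypothetical method report that documents only some items.
    example_report = {
        "Data": {"provenance", "dataset splits"},
        "Optimization": {"algorithm", "hyperparameters", "overfitting checks"},
        "Evaluation": {"performance metrics"},
    }
    for aspect, todo in missing_items(example_report).items():
        print(f"{aspect}: {', '.join(todo)}")
```

    In this example the Optimization aspect is fully documented and is therefore not reported, while the Model aspect, which the report omits entirely, is flagged in full. A real checklist would follow the items defined in the DOME paper rather than these placeholders.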


    Figure 1. Biological data are complex, spanning genomic, proteomic and post-translational modification information. Machine learning (ML) typically encompasses four major aspects: data, optimization, model and evaluation (DOME). Well-conceived ML that follows the DOME recommendations strengthens biological ML applications such as medical decision-making, personalized and precision medicine, and drug manufacturing and development.



    Walsh, Ian, et al. "DOME: recommendations for supervised machine learning validation in biology." Nature Methods (2021): 1-6.