Machine learning algorithm for precise prediction of 2'-O-methylation (Nm) sites from experimental RiboMethSeq datasets.

Publication record


Publication date

March 2022

Journal

Methods (San Diego, Calif.)

Authors

Identified members of the Cancéropôle Est:
Prof. MOTORINE Iouri, Dr MARCHAND Virginie


All authors:
Pichot F, Marchand V, Helm M, Motorin Y

Abstract

Analysis of epitranscriptomic RNA modifications by deep sequencing-based approaches contributes substantially to general knowledge of their precise locations and relative stoichiometry in cellular RNAs. To reveal RNA modifications, several analytical approaches have been proposed, including antibody-driven enrichment, analysis of RT-signatures and specific chemical treatments. However, analysis and interpretation of these massive datasets, especially for low-abundance cellular RNAs (e.g. mRNA and lncRNA), is neither easy nor straightforward, since insufficient specificity and selectivity lead to large numbers of false-positive and false-negative identifications. The main issue in applying these methods lies in the subjective classification of potentially modified positions, mostly based on arbitrarily defined threshold values for different scores. Such an approach, using pre-defined score values, proved appropriate for datasets of limited complexity (tRNA and/or rRNA analysis), but application to longer reference sequences requires much better classification algorithms. In this work we applied a machine learning algorithm (Random Forest, RF) to create a predictive model for the analysis of 2'-O-methylated sites in RNA using RiboMethSeq datasets. The model was trained on a large collection of human rRNA datasets with well-known modification profiles, and prediction performance was assessed using experimentally defined profiles for other eukaryotic rRNAs (S. cerevisiae and A. thaliana). Application of this Random Forest prediction model to the detection of other RNA modifications and to more complex datasets is discussed.
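
The workflow described in the abstract (train a Random Forest on positions with known modification status, then assess it on a different reference set) can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes scikit-learn and NumPy are available, the three feature names are hypothetical placeholders for per-position RiboMethSeq scores, and all data are synthetic values generated for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score

# Hypothetical per-position score names (assumption, not taken from the paper).
FEATURES = ["score_a", "score_b", "score_c"]


def simulate_positions(n_modified, n_unmodified, shift, rng):
    """Return (X, y): synthetic per-position scores and 0/1 Nm labels.

    Modified positions receive high scores and unmodified ones low scores.
    `shift` narrows the gap to mimic a different reference RNA set.
    """
    modified = rng.normal(0.85 - shift, 0.05, size=(n_modified, len(FEATURES)))
    unmodified = rng.normal(0.20 + shift, 0.08, size=(n_unmodified, len(FEATURES)))
    X = np.clip(np.vstack([modified, unmodified]), 0.0, 1.0)
    y = np.concatenate([
        np.ones(n_modified, dtype=int),
        np.zeros(n_unmodified, dtype=int),
    ])
    return X, y


rng = np.random.default_rng(0)

# "Training" set: stands in for a reference with a well-known profile.
X_train, y_train = simulate_positions(100, 2000, shift=0.0, rng=rng)
# "Evaluation" set: stands in for a different reference with known sites.
X_test, y_test = simulate_positions(50, 1000, shift=0.03, rng=rng)

# class_weight="balanced" compensates for modified sites being rare.
model = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=0
)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)               # hard 0/1 calls per position
y_prob = model.predict_proba(X_test)[:, 1]   # estimated probability of Nm

precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Unlike a fixed cut-off applied to a single score, the classifier learns its decision rule from labelled positions and combines several scores at once. The synthetic classes here are deliberately well separated, so the printed metrics say nothing about performance on real data.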

Keywords

2’-O-methylation, RNA modification, Random Forest, deep sequencing, machine learning

Reference

Methods. 2022 Mar 18;: