Exploratory research Generalities Preclinical

Oligonucleotides and Machine Learning Tools

Today, oligonucleotides – short DNA or RNA molecules – are essential tools in molecular biology projects, but also in therapeutics and diagnostics. In 2021, ten or so antisense therapies are authorised on the market, and much more are under clinical trials.

The recent Covid-19 crisis has also brought PCR tests to the public’s knowledge, these tests use small sequences of about 20 nucleotides to amplify and detect genetic material. Oligos have been so successful that, since their synthesis was automated, their market share has grown steadily. It is estimated that it will reach $14 billion by 2026.

Oligonucleotides have an elegance in their simplicity. It was in the 1950s that Watson and Crick described the double helix that makes up our genetic code, and the way in which the bases Adenine/Thymine and Cytosine/Guanine pair up. Thanks to this property, antisense therapies can virtually target our entire genome, and regulate its expression. Diseases that are difficult to treat, such as Spinal Dystrophy Disorder or Duchenne’s disease, are now benefiting some therapeutic support (1).

This article does not aim to restate the history of oligonucleotides used in clinic (many reviews are already available in the literature (2), (3), (4)), but to provide a quick overview of what has been developed in this area, with a Machine Learning tint.

We hope that the article will inspire some researchers, and that others may find new ideas of research and exploration. At a time when Artificial Intelligence has reached a certain maturity, it is particularly interesting to exploit it and to streamline all decision making in R&D projects.

This list is not exhaustive, and if you have a project or article to share with us, please contact us at We will be happy to discuss it and include it in this article.

Using Deep Learning to design PCR primers

As the Covid-19 health crisis has shown, diagnosing the population is essential to control and evaluate a pandemic. Thanks to two primers of about twenty nucleotides, a specific sequence can be amplified and detected, even at a very low level (PCR technique is technically capable of detecting up to 3 copies of a sequence of interest (5)).

A group from Utrecht University in the Netherlands (6) has developed a CNN (for Convolutional Neural Network, a type of neural network particularly effective in image recognition) capable of revealing areas of exclusivity in a genome. This allows the development of highly specific primers for the target of interest. In their case, they analysed more than 500 genomes of viruses from the Coronavirus family in order to train the algorithm to sort the different genomes. The primers designed by the model showed similar efficiency to the sequences used in practice. This tool could be used to develop PCR diagnostic tools with greater efficiency and speed.

Predicting the penetration power of an oligonucleotide

There are many peptides that improve the penetration of oligonucleotides into cells. These are called CPPs for Cell Penetrating Peptides, small sequences of less than 30 amino acids. Using a random decision tree, a team from MIT (7) was able to predict the activity of CPPs for oligonucleotides, modified by morpholino phosphorodiamidates (MO). Although the use of this model is limited (there are many chemical modifications to date and MOs cover only a small fraction of them), it is still possible to develop it for larger chemical families. For example, the model was able to predict experimentally whether a CPP would improve the penetration of an oligonucleotide into cells by a factor of three.

Optimising therapeutic oligonucleotides

Although oligonucleotides are known to be little immunogenic (8), they do not escape the toxicity associated with all therapies. “Everything is poison, nothing is poison: it is the dose that makes the poison. “- Paracelsus

This last parameter is key in the future of a drug during its development. A Danish group (9) has developed a prediction model capable of estimating the hepatotoxicity of a nucleotide sequence in mouse models. Again, here “only” unmodified and LNA (Locked Nucleic Acid, a chemical modification that stabilises the hybridisation of the therapeutic oligonucleotide to its target) modified oligonucleotides were analysed. It would be interesting to increase the chemical space studied and thus extend the possibilities of the algorithm. However, it is this type of model that will eventually reduce attrition in the development of new drugs. From another perspective (10), a model has been developped for optimising the structure of LNAs using oligonucleotides as gapmers. Gapmers are hybrid oligonucleotide sequences that have two chemically modified ends, that are resistant to degrading enzymes, and an unmodified central part that can be degraded once hybridised to its target. It is this final ‘break’ that will generate the desired therapeutic effect. Using their model, the researchers were able to predict the gapmer design that has the best pharmacological profile.

Accelerating the discovery of new aptamers

Also known as “chemical antibodies”, aptamers are DNA or RNA sequences capable of recognising and binding to a particular target with the same affinity as a monoclonal antibody. Excellent reviews on the subject are available here (11) or here (12). In clinic, pegatinib is the first aptamer to be approved for use. The compound is indicated for certain forms of AMD.

Current research methods, based on SELEX (Systematic Evolution of Ligands by Exponential Enrichment), have made it possible to generate aptamers directed against targets of therapeutic and diagnostic interest, such as nucleolin or thrombin. Although the potential of the technology is attractive, it is difficult and time-consuming to discover new pairs of sequence/target. To boost the search of new candidates, an American team (13) was able to train an algorithm to optimise an aptamer and reduce the size of its sequence, while maintaining or even increasing its affinity to its target. They were able to prove experimentally that the aptamer generated by the algorithm had more affinity than the reference candidate, while being 70% shorter. The interest here is to keep the experimental part (the SELEX part), and to combine it with these in silico tools in order to accelerate the optimisation of new candidates.

There is no doubt that the future of oligonucleotides is promising, and their versatility is such that they can be found in completely different fields, ranging from DNA-based nanotechnology to CRISPR/Cas technology. The latter two areas alone could be the subject of individual articles, as their research horizons are so important and exciting.

In our case, we hope that this short article has given you some new ideas and concepts, and inspired you to learn more about oligonucleotides and machine learning.

  1. Bizot F, Vulin A, Goyenvalle A. Current Status of Antisense Oligonucleotide-Based Therapy in Neuromuscular Disorders. Drugs. 2020 Sep;80(14):1397–415.
  2. Roberts TC, Langer R, Wood MJA. Advances in oligonucleotide drug delivery. Nat Rev Drug Discov. 2020 Oct;19(10):673–94.
  3. Shen X, Corey DR. Chemistry, mechanism and clinical status of antisense oligonucleotides and duplex RNAs. Nucleic Acids Res. 2018 Feb 28;46(4):1584–600.
  4. Crooke ST, Liang X-H, Baker BF, Crooke RM. Antisense technology: A review. J Biol Chem [Internet]. 2021 Jan 1 [cited 2021 Jun 28];296. Available from:
  5. Bustin SA, Benes V, Garson JA, Hellemans J, Huggett J, Kubista M, et al. The MIQE Guidelines: Minimum Information for Publication of Quantitative Real-Time PCR Experiments. Clin Chem. 2009 Apr 1;55(4):611–22.
  6. Lopez-Rincon A, Tonda A, Mendoza-Maldonado L, Mulders DGJC, Molenkamp R, Perez-Romero CA, et al. Classification and specific primer design for accurate detection of SARS-CoV-2 using deep learning. Sci Rep. 2021 Jan 13;11(1):947.
  7. Wolfe JM, Fadzen CM, Choo Z-N, Holden RL, Yao M, Hanson GJ, et al. Machine Learning To Predict Cell-Penetrating Peptides for Antisense Delivery. ACS Cent Sci. 2018 Apr 25;4(4):512–20.
  8. Stebbins CC, Petrillo M, Stevenson LF. Immunogenicity for antisense oligonucleotides: a risk-based assessment. Bioanalysis. 2019 Nov 1;11(21):1913–6.
  9. Hagedorn PH, Yakimov V, Ottosen S, Kammler S, Nielsen NF, Høg AM, et al. Hepatotoxic Potential of Therapeutic Oligonucleotides Can Be Predicted from Their Sequence and Modification Pattern. Nucleic Acid Ther. 2013 Oct 1;23(5):302–10.
  10. Papargyri N, Pontoppidan M, Andersen MR, Koch T, Hagedorn PH. Chemical Diversity of Locked Nucleic Acid-Modified Antisense Oligonucleotides Allows Optimization of Pharmaceutical Properties. Mol Ther – Nucleic Acids. 2020 Mar 6;19:706–17.
  11. Zhou J, Rossi J. Aptamers as targeted therapeutics: current potential and challenges. Nat Rev Drug Discov. 2017 Mar;16(3):181–202.
  12. Recent Progress in Aptamer Discoveries and Modifications for Therapeutic Applications | ACS Applied Materials & Interfaces [Internet]. [cited 2021 Jul 25]. Available from:
  13. Bashir A, Yang Q, Wang J, Hoyer S, Chou W, McLean C, et al. Machine learning guided aptamer refinement and discovery. Nat Commun. 2021 Apr 22;12(1):2366.

These articles should interest you


Introduction to DeSci

How Science of the Future is being born before our eyes « [DeSci] transformed my research impact from a low-impact virology article every other year to saving the lives and…
Illustration In Silico

Towards virtual clinical trials?

Clinical trials are among the most critical and expensive steps in drug development. They are highly regulated by the various international health agencies, and for good reason: the molecule or…

To subscribe free of charge to the monthly Newsletter, click here.

Would you like to take part in the writing of Newsletter articles ? Would you like to take part in an entrepreneurial project on these topics ?

Contact us at ! Join our group LinkedIn !

By Quentin Vicentini

Quentin graduated from Pharmacy School in Lille – France. After various Research experiences in Medicinal Chemistry, he pursued his career abroad and settled in Oxford in 2019. His interest in innovation pushed him to learn more about Machine Learning in Drug Discovery. Today, Quentin is specializing in oligonucleotide chemistry, for therapeutics and diagnostics.