
Oligonucleotides and Machine Learning Tools

Today, oligonucleotides (short DNA or RNA molecules) are essential tools in molecular biology, as well as in therapeutics and diagnostics. As of 2021, about ten antisense therapies have been authorised, and many more are in clinical trials.

The recent Covid-19 crisis has also brought PCR tests to the public’s attention: these tests use short sequences of about 20 nucleotides to amplify and detect genetic material. Oligos have been so successful that, since their synthesis was automated, their market has grown steadily. It is estimated that it will reach $14 billion by 2026.

Oligonucleotides have an elegance in their simplicity. In the 1950s, Watson and Crick described the double helix that carries our genetic code, and the way in which the bases pair up: Adenine with Thymine, and Cytosine with Guanine. Thanks to this property, antisense therapies can target virtually any part of our genome and regulate its expression. Diseases that are difficult to treat, such as spinal muscular atrophy or Duchenne muscular dystrophy, now benefit from therapeutic options (1).
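To make the pairing rule concrete, here is a minimal Python sketch that derives the antisense strand of a DNA target by taking its reverse complement. The function name and the example sequence are our own, chosen for illustration only; a real design would also have to handle RNA targets and chemical modifications.

```python
# Watson-Crick pairing rules for DNA: A pairs with T, C pairs with G.
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def antisense(target: str) -> str:
    """Return the reverse complement of a DNA target, written 5'->3'."""
    target = target.upper()
    unknown = set(target) - set(DNA_COMPLEMENT)
    if unknown:
        raise ValueError(f"unexpected bases: {sorted(unknown)}")
    # The two strands run antiparallel, so the complement is read in reverse.
    return "".join(DNA_COMPLEMENT[base] for base in reversed(target))


example_target = "ATGGCCTTAGC"  # made-up sequence, for illustration only
print(antisense(example_target))
```

Applying the function twice returns the original sequence, which is a quick sanity check on the pairing table.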

This article does not aim to retrace the history of oligonucleotides used in the clinic (many reviews are already available in the literature (2), (3), (4)), but to provide a quick overview of what has been developed in this area, from a Machine Learning angle.

We hope that the article will inspire some researchers, and that others may find new ideas for research and exploration. At a time when Artificial Intelligence has reached a certain maturity, it is particularly interesting to exploit it to streamline decision making in R&D projects.

This list is not exhaustive: if you have a project or article to share with us, please contact us. We will be happy to discuss it and include it in this article.

Using Deep Learning to design PCR primers

As the Covid-19 health crisis has shown, testing the population is essential to monitor and control a pandemic. With two primers of about twenty nucleotides, a specific sequence can be amplified and detected, even at a very low level (PCR is technically capable of detecting as few as 3 copies of a sequence of interest (5)).

A group from Utrecht University in the Netherlands (6) has developed a CNN (Convolutional Neural Network, a type of neural network particularly effective in image recognition) capable of revealing regions that are exclusive to a genome. This allows the development of highly specific primers for the target of interest. In their case, they analysed more than 500 genomes of viruses from the Coronavirus family in order to train the algorithm to tell the different genomes apart. The primers designed by the model showed similar efficiency to the sequences used in practice. This tool could be used to develop PCR diagnostic tools with greater efficiency and speed.
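The CNN itself is beyond the scope of a short example, but the idea of a region exclusive to one genome can be illustrated with a much simpler stand-in: listing the short subsequences (k-mers) of a target that appear in none of the other genomes. This is not the method used in (6); the function names and toy sequences below are assumptions made for illustration.

```python
def kmers(sequence: str, k: int) -> set:
    """All substrings of length k found in the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def exclusive_kmers(target: str, others: list, k: int) -> list:
    """k-mers of the target that occur in none of the other sequences."""
    background = set()
    for genome in others:
        background |= kmers(genome, k)
    found, seen = [], set()
    # Walk the target in order so candidates are reported by position.
    for i in range(len(target) - k + 1):
        candidate = target[i:i + k]
        if candidate not in background and candidate not in seen:
            seen.add(candidate)
            found.append(candidate)
    return found


# Toy sequences; real genomes are tens of thousands of bases long
# and candidate primers are about 20 nucleotides.
target_genome = "ACGTTGCA"
other_genomes = ["ACGTAGCA", "TTTTACGT"]
print(exclusive_kmers(target_genome, other_genomes, k=4))
```

A real primer candidate would then still need to be checked for melting temperature, GC content and secondary structure.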

Predicting the penetration power of an oligonucleotide

There are many peptides that improve the penetration of oligonucleotides into cells. These are called CPPs, for Cell-Penetrating Peptides: short sequences of fewer than 30 amino acids. Using a random forest model, a team from MIT (7) was able to predict the activity of CPPs for phosphorodiamidate morpholino oligonucleotides (PMOs). Although the use of this model is limited (there are many chemical modifications to date and PMOs cover only a small fraction of them), it could be extended to larger chemical families. For example, the model was able to predict, with experimental confirmation, whether a CPP would improve the penetration of an oligonucleotide into cells by a factor of three.

Optimising therapeutic oligonucleotides

Although oligonucleotides are known to be weakly immunogenic (8), they do not escape the toxicity associated with all therapies. “Everything is poison, nothing is poison: it is the dose that makes the poison.” - Paracelsus

Toxicity is a key parameter for the future of a drug in development. A Danish group (9) has developed a prediction model capable of estimating the hepatotoxicity of a nucleotide sequence in mouse models. Here again, “only” unmodified and LNA-modified oligonucleotides were analysed (LNA, for Locked Nucleic Acid, is a chemical modification that stabilises the hybridisation of the therapeutic oligonucleotide to its target). It would be interesting to widen the chemical space studied and thus extend the possibilities of the algorithm. Nevertheless, it is this type of model that will eventually reduce attrition in the development of new drugs.

From another perspective, a model has been developed (10) for optimising the structure of LNA-modified oligonucleotides used as gapmers. Gapmers are hybrid oligonucleotide sequences with two chemically modified ends, which are resistant to degrading enzymes, and an unmodified central part that, once hybridised to its target RNA, allows the enzyme RNase H to cleave that RNA. It is this final ‘break’ that generates the desired therapeutic effect. Using their model, the researchers were able to predict the gapmer design with the best pharmacological profile.
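The gapmer layout described above (modified wings around an unmodified gap) can be sketched in a few lines of Python. The notation, "L" for an LNA-modified position and "d" for an unmodified DNA position, and the parameter names are our own assumptions; the designs explored in (10) are richer than this simple wing-gap-wing enumeration.

```python
def gapmer_pattern(length: int, wing5: int, wing3: int) -> str:
    """Modification pattern: LNA wings ('L') around a DNA gap ('d')."""
    gap = length - wing5 - wing3
    if wing5 < 1 or wing3 < 1 or gap < 1:
        raise ValueError("a gapmer needs two wings and a non-empty gap")
    return "L" * wing5 + "d" * gap + "L" * wing3


def enumerate_designs(length: int, max_wing: int, min_gap: int) -> list:
    """All wing-gap-wing patterns whose gap has at least min_gap positions."""
    designs = []
    for wing5 in range(1, max_wing + 1):
        for wing3 in range(1, max_wing + 1):
            if length - wing5 - wing3 >= min_gap:
                designs.append(gapmer_pattern(length, wing5, wing3))
    return designs


print(gapmer_pattern(16, 3, 3))
print(len(enumerate_designs(16, max_wing=4, min_gap=9)), "candidate designs")
```

Each candidate pattern could then be scored by a predictive model, which is where approaches such as those of (9) and (10) come in.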

Accelerating the discovery of new aptamers

Also known as “chemical antibodies”, aptamers are DNA or RNA sequences capable of recognising and binding to a particular target with an affinity comparable to that of a monoclonal antibody. Excellent reviews on the subject are available (11), (12). In the clinic, pegaptanib was the first aptamer to be approved for use. The compound is indicated for certain forms of age-related macular degeneration (AMD).

Current research methods, based on SELEX (Systematic Evolution of Ligands by Exponential Enrichment), have made it possible to generate aptamers directed against targets of therapeutic and diagnostic interest, such as nucleolin or thrombin. Although the potential of the technology is attractive, discovering new sequence/target pairs is difficult and time-consuming. To boost the search for new candidates, an American team (13) was able to train an algorithm to optimise an aptamer and reduce the size of its sequence, while maintaining or even increasing its affinity for its target. They were able to prove experimentally that the aptamer generated by the algorithm had a higher affinity than the reference candidate, while being 70% shorter. The interest here is to keep the experimental part (the SELEX part), and to combine it with these in silico tools in order to accelerate the optimisation of new candidates.

There is no doubt that the future of oligonucleotides is promising. Their versatility is such that they can be found in completely different fields, ranging from DNA-based nanotechnology to CRISPR/Cas technology. These two areas alone could each be the subject of an article, as their research horizons are so broad and exciting.

In our case, we hope that this short article has given you some new ideas and concepts, and inspired you to learn more about oligonucleotides and machine learning.

  1. Bizot F, Vulin A, Goyenvalle A. Current Status of Antisense Oligonucleotide-Based Therapy in Neuromuscular Disorders. Drugs. 2020 Sep;80(14):1397–415.
  2. Roberts TC, Langer R, Wood MJA. Advances in oligonucleotide drug delivery. Nat Rev Drug Discov. 2020 Oct;19(10):673–94.
  3. Shen X, Corey DR. Chemistry, mechanism and clinical status of antisense oligonucleotides and duplex RNAs. Nucleic Acids Res. 2018 Feb 28;46(4):1584–600.
  4. Crooke ST, Liang X-H, Baker BF, Crooke RM. Antisense technology: A review. J Biol Chem [Internet]. 2021 Jan 1 [cited 2021 Jun 28];296. Available from:
  5. Bustin SA, Benes V, Garson JA, Hellemans J, Huggett J, Kubista M, et al. The MIQE Guidelines: Minimum Information for Publication of Quantitative Real-Time PCR Experiments. Clin Chem. 2009 Apr 1;55(4):611–22.
  6. Lopez-Rincon A, Tonda A, Mendoza-Maldonado L, Mulders DGJC, Molenkamp R, Perez-Romero CA, et al. Classification and specific primer design for accurate detection of SARS-CoV-2 using deep learning. Sci Rep. 2021 Jan 13;11(1):947.
  7. Wolfe JM, Fadzen CM, Choo Z-N, Holden RL, Yao M, Hanson GJ, et al. Machine Learning To Predict Cell-Penetrating Peptides for Antisense Delivery. ACS Cent Sci. 2018 Apr 25;4(4):512–20.
  8. Stebbins CC, Petrillo M, Stevenson LF. Immunogenicity for antisense oligonucleotides: a risk-based assessment. Bioanalysis. 2019 Nov 1;11(21):1913–6.
  9. Hagedorn PH, Yakimov V, Ottosen S, Kammler S, Nielsen NF, Høg AM, et al. Hepatotoxic Potential of Therapeutic Oligonucleotides Can Be Predicted from Their Sequence and Modification Pattern. Nucleic Acid Ther. 2013 Oct 1;23(5):302–10.
  10. Papargyri N, Pontoppidan M, Andersen MR, Koch T, Hagedorn PH. Chemical Diversity of Locked Nucleic Acid-Modified Antisense Oligonucleotides Allows Optimization of Pharmaceutical Properties. Mol Ther – Nucleic Acids. 2020 Mar 6;19:706–17.
  11. Zhou J, Rossi J. Aptamers as targeted therapeutics: current potential and challenges. Nat Rev Drug Discov. 2017 Mar;16(3):181–202.
  12. Recent Progress in Aptamer Discoveries and Modifications for Therapeutic Applications. ACS Appl Mater Interfaces [Internet]. [cited 2021 Jul 25]. Available from:
  13. Bashir A, Yang Q, Wang J, Hoyer S, Chou W, McLean C, et al. Machine learning guided aptamer refinement and discovery. Nat Commun. 2021 Apr 22;12(1):2366.

These articles should interest you


Introduction to DeSci

How Science of the Future is being born before our eyes « [DeSci] transformed my research impact from a low-impact virology article every other year to saving the lives and…
Illustration In Silico

Towards virtual clinical trials?

Clinical trials are among the most critical and expensive steps in drug development. They are highly regulated by the various international health agencies, and for good reason: the molecule or…

To subscribe free of charge to the monthly Newsletter, click here.

Would you like to take part in the writing of Newsletter articles? Would you like to take part in an entrepreneurial project on these topics?

Contact us! Join our LinkedIn group!


Health data: an introduction to the synthetic data revolution

Data, sometimes described as the black gold of the 21st century, are the essential fuel for artificial intelligence and are already widely used by the pharmaceutical industry. However, particularly because of the sensitivity of health information, their use has several limitations. Will synthetic data be one of the solutions to these problems?

What is synthetic data and why use it?

Synthetic data are data created artificially through the use of generative algorithms, rather than collected from real events. Originally developed in the 1990s to allow work on U.S. Census data, without disclosing respondents’ personal information, synthetic data have since been developed to generate high-quality, large-scale datasets.

These data are generally generated from real data, for example from patient files in the case of health data, and preserve their statistical distribution. Thus, it is theoretically possible to generate virtual patient cohorts that have no real identity but correspond statistically, in every respect, to real cohorts. Researchers have succeeded in synthesizing virtual patient records from publicly available demographic and epidemiological data. In this case, we speak of “fully synthetic data”, as opposed to “partially synthetic data”, which are synthetic data produced to replace missing values in real data sets collected in the traditional way.


Currently, despite various initiatives aiming to democratize their use (such as the Health Data Hub in France, to which we will come back in future articles), many problems still limit the optimal and large-scale use of patient data, in spite of their ever-growing volume. Synthetic data are one of the solutions that can be used.

  • Health data privacy:

Naturally, health data are particularly sensitive in terms of confidentiality. The need to preserve patient anonymity leads to a number of problems in terms of accessibility and data processing costs. Many players do not have easy access to these data, and even when they do manage to gain access, processing them involves significant regulatory and cybersecurity costs. Access times are also often extremely long, which slows down research projects. For some databases, it is even a regulatory requirement to hire a third-party company accredited to handle these data.

To allow their use, patient data are generally anonymized using methods such as the deletion of identifying variables, their modification by the addition of noise, or the grouping of categorical variables in order to avoid categories containing too few individuals. However, the effectiveness of these methods has regularly been questioned by studies showing that it was generally possible to re-identify patients by matching records (probabilistically or deterministically) with other databases. Synthetic data generation can, in this context, be used as a safe and easy-to-use alternative.
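As an illustration, the three classical anonymization methods mentioned above (deleting identifiers, adding noise, grouping rare categories) can be sketched in plain Python. The field names, the noise level and the threshold are hypothetical and chosen only for the example; this is not a compliant anonymization procedure.

```python
import random
from collections import Counter


def anonymize(records, identifiers, noisy_field, noise_sd,
              category_field, min_count, seed=0):
    """Apply three classical anonymization steps to a list of dict records."""
    rng = random.Random(seed)
    counts = Counter(r[category_field] for r in records)
    cleaned = []
    for record in records:
        # 1. Delete directly identifying variables.
        row = {k: v for k, v in record.items() if k not in identifiers}
        # 2. Perturb a numeric variable with Gaussian noise.
        row[noisy_field] = round(row[noisy_field] + rng.gauss(0, noise_sd), 1)
        # 3. Group categories that contain too few individuals.
        if counts[record[category_field]] < min_count:
            row[category_field] = "other"
        cleaned.append(row)
    return cleaned


patients = [  # invented records, for illustration only
    {"name": "A", "age": 34.0, "diagnosis": "asthma"},
    {"name": "B", "age": 51.0, "diagnosis": "asthma"},
    {"name": "C", "age": 47.0, "diagnosis": "rare disease X"},
]
for row in anonymize(patients, {"name"}, "age", 2.0, "diagnosis", min_count=2):
    print(row)
```

Even after these steps, the remaining combination of variables may still allow re-identification by matching with other databases, which is precisely the limitation described above.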

  • Data quality:

The technique of synthetic data generation is commonly used to fill in missing data in real data sets that are impossible or very costly to collect again. These new data are representative of the statistical distribution of variables from the real data set.

  • The volume of health data datasets is too small to be exploited by artificial intelligence:

The training of Machine or Deep Learning models sometimes requires large volumes of data in order to obtain satisfactory predictions: a commonly cited rule of thumb is that about 10 times as many examples as the model has degrees of freedom are required. However, when Machine Learning is used in health care, the volume of data is often too small to give good results, for example in rare pathologies that are poorly documented, or in sub-populations comprising few individuals. In such cases, the use of synthetic data is part of the data scientists’ toolbox.

The use of synthetic data is an emerging field, and some experts believe it will help overcome some of the current limitations of AI. Among the advantages of synthetic data in the field of AI, we can mention that it is fast and inexpensive to create as much data as needed, without having to label them by hand as is often the case with real data, and that these data can be modified several times in order to make the model as efficient as possible in its processing of real data.

The different techniques for generating synthetic data

The generation of synthetic data involves several phases:

  • The preparation of the sample data from which the synthetic data will be generated: in order to obtain a satisfactory result, it is necessary to clean the data, and to harmonize them if they come from different sources
  • The actual generation of the synthetic data (some of these techniques are detailed below)
  • The verification and the evaluation of the confidentiality offered by the synthetic data
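The first two phases can be sketched as follows, using only the Python standard library. This is a deliberately naive illustration: each variable is modelled independently (a Gaussian for numeric fields, observed frequencies for categorical ones), so correlations between variables are lost. The function and field names are assumptions made for the example, not an established API.

```python
import random
import statistics
from collections import Counter


def prepare(records, fields):
    """Phase 1: keep only complete records (a stand-in for real cleaning)."""
    return [r for r in records if all(r.get(f) is not None for f in fields)]


def fit(records, numeric, categorical):
    """Learn a very simple per-variable model from the real sample."""
    model = {}
    for field in numeric:
        values = [r[field] for r in records]
        model[field] = ("gauss", statistics.mean(values), statistics.stdev(values))
    for field in categorical:
        counts = Counter(r[field] for r in records)
        model[field] = ("choice", list(counts.keys()), list(counts.values()))
    return model


def generate(model, n, seed=0):
    """Phase 2: draw n synthetic records from the fitted model."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n):
        row = {}
        for field, spec in model.items():
            if spec[0] == "gauss":
                row[field] = rng.gauss(spec[1], spec[2])
            else:
                row[field] = rng.choices(spec[1], weights=spec[2])[0]
        synthetic.append(row)
    return synthetic


real = [  # invented sample, for illustration only
    {"age": 34, "sex": "F"}, {"age": 51, "sex": "M"},
    {"age": 47, "sex": "F"}, {"age": None, "sex": "M"},
]
clean = prepare(real, ["age", "sex"])
cohort = generate(fit(clean, ["age"], ["sex"]), n=5)
print(cohort)
```

Generative Deep Learning models play the role of `fit` and `generate` in practice, precisely because they can capture the dependencies that this sketch ignores.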

Figure 1 – Synthetic Data Generation Schema

The methods of data generation are numerous, and their use depends on the objective pursued and the type of data to be created: should we create data from already existing data, and thus follow their statistical distributions? Or should we create fully virtual data that follow rules making them realistic (text, for example)? “Data-driven” methods, which take advantage of existing data, rely on generative Deep Learning models. “Process-driven” methods, in which mathematical models generate data from the underlying physical processes, rely on what is called agent-based modelling.

Operationally, synthetic data are usually created in the Python language – very well known to Data Scientists. Different Python libraries are used, such as: Scikit-Learn, SymPy, Pydbgen and VirtualDataLab. A future Resolving Pharma article will follow up this introduction by presenting how to create synthetic health data using these libraries.

Evaluation of synthetic data

It is common to evaluate anonymized patient data according to two main criteria: the quality of the use that can be made with the data, and the quality of anonymization that has been achieved. It has been shown that the more the data is anonymized, the more limited the use is, since important but identifying features are removed, or precision is lost by grouping classes of values. There is a balance to be found between the two, depending on the destination of the data.

Synthetic data are evaluated according to three main criteria:

  • The fidelity of the data to the base sample
  • The fidelity of the data to the distribution of the general population
  • The level of anonymization allowed by the data

Different methods and metrics exist to evaluate these criteria.

By ensuring that the quality of the data generated is sufficient for its intended use, evaluation is an essential and central element of the synthetic data generation process.
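As a minimal sketch of what such metrics can look like, the code below computes one very simple fidelity indicator (the gap between real and synthetic means for a variable) and one simple privacy indicator (the distance from a synthetic record to the closest real record). These two metrics and their names are our own illustrative choices, not a standard evaluation protocol, and the distance is computed on unscaled variables for brevity.

```python
import math
from statistics import mean


def mean_gap(real, synthetic, field):
    """Fidelity: absolute difference between real and synthetic means."""
    real_mean = mean([r[field] for r in real])
    synthetic_mean = mean([s[field] for s in synthetic])
    return abs(real_mean - synthetic_mean)


def distance_to_closest_record(real, synthetic, fields):
    """Privacy: smallest Euclidean distance between a synthetic and a real record.

    A value of 0 means a synthetic record reproduces a real one exactly.
    """
    def distance(a, b):
        return math.sqrt(sum((a[f] - b[f]) ** 2 for f in fields))

    return min(distance(s, r) for s in synthetic for r in real)


real = [{"age": 40, "weight": 70}, {"age": 60, "weight": 80}]       # invented
synthetic = [{"age": 43, "weight": 74}, {"age": 55, "weight": 80}]  # invented
print(mean_gap(real, synthetic, "age"))
print(distance_to_closest_record(real, synthetic, ["age", "weight"]))
```

The two indicators pull in opposite directions: a synthetic set that copies the real data has perfect fidelity and no privacy, which mirrors the balance discussed above for anonymized data.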

Which use cases for synthetic data in the pharmaceutical industry?

A few months ago, Accenture Life Sciences and Phesi, two companies providing services to pharmaceutical companies, co-authored a report urging them to integrate more techniques involving synthetic data into their activities. The use case mentioned in this report is that of synthetic control arms, which, however, generally use real data from different clinical trials that are statistically reworked.

Outside the pharmaceutical industry, in the world of Health, synthetic data are already used to train visual recognition models in imaging: researchers can artificially add pathologies to images of healthy patients and thus test their algorithms on their ability to detect the pathologies. Based on this use-case, it is also possible to create histological section data that could be used to train AI models in preclinical studies.


There is no doubt that the burgeoning synthetic data industry is well on its way to improving artificial intelligence as we currently know it, and its use in the health industry. This is particularly true when handling sensitive and difficult-to-access data. We can imagine, for example, a world where it is easier and more efficient for manufacturers to create their own synthetic data than to seek access to medical or medico-administrative databases. This technology would then be one of those that change the organization of innovation in the health industries, by giving real data a less central place.



Blockchain, Mobile Apps: will technology solve the problem of counterfeit drugs?

« Fighting counterfeit drugs is only the start of what blockchain could achieve through creating [pharmaceutical] ‘digital trust’.»

Andreas Schindler, Blockchain Expert

20% of the medicines circulating in the world are counterfeit; most of them do not contain the right active substance, or do not contain it in the right quantity. Worth 200 billion dollars per year, this traffic (10 to 20 times more profitable for organized crime than heroin) causes the death of hundreds of thousands of people every year, the majority of whom are children whose parents believe they are treating them with real medicine. To fight this scourge, laboratories and international health authorities must form a united front, in which technology could be the keystone.

The problem of counterfeit drugs

It is an almost invisible scourge whose contours are difficult to define, a low-key global epidemic that does not provoke lockdowns or massive vaccination campaigns, but which nevertheless kills hundreds of thousands of patients every year. Counterfeit medicines, defined by the WHO as “medicines that are fraudulently manufactured, mislabeled, of poor quality, conceal the details or identity of the source, and do not meet defined standards”, generally concern serious diseases such as AIDS, tuberculosis or malaria, and lead to the death of approximately 300,000 children under the age of 5 from pneumonia and malaria. In fact, the general term “counterfeit drugs” covers very different products: some contain no active ingredient, some contain active ingredients different from what is indicated on the label, and others contain the indicated Active Pharmaceutical Ingredient (API) in the wrong quantity. In addition to the countless human tragedies they cause, counterfeit medicines also create future problems by increasing antibiotic resistance in areas of the world where health systems are already failing and will probably not be able to cope with this new challenge.

From a financial perspective, and apart from public health considerations, counterfeit medicines are also an economic and political problem for countries: this traffic, which represents 200 billion dollars per year, feeds organized crime networks and represents a very high cost for health systems. For the pharmaceutical industry, the problems caused by this traffic are numerous: a loss of 20% of worldwide sales revenue; a loss of confidence among patients, who most of the time do not know that the counterfeit drugs are not the originals; and finally considerable expenses to fight the counterfeits.

Initiatives against counterfeit drugs

Counterfeit medicines are usually distributed through highly complex networks, which makes it particularly difficult to curb their spread. In its “Guide for the development of measures to eliminate counterfeit medicines”, the WHO identifies various legal, social and political initiatives that States can put in place to limit the spread of these counterfeit medicines. While these recommendations are relevant, they are particularly difficult to implement in regions of the world where countries have few resources and where institutions are plagued by endemic corruption. In this article, we will therefore focus on solutions implemented by private companies: start-ups specialized in fighting counterfeit drugs, or large pharmaceutical companies.

One of the methods used by various start-ups, such as PharmaSecure, based in India, or Sproxil, based in Nigeria and actively collaborating with the government of that country, is to take advantage of widespread smartphone access to let people identify counterfeit drug boxes. The model is as follows: drug manufacturers work with these start-ups to place codes (numerical codes or QR codes) inside the box or on the packaging of the drug, concealed under a surface that needs to be scratched or removed. Patients can download a free app and scan these codes to verify that the medication is authentic. These applications also allow patients to receive advice on their treatments. They act as a trusted third party that assures the patient, the final consumer of the drug, that no one has fraudulently taken the place of the legitimate manufacturer.
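The verification logic behind such a scheme can be sketched as a registry of single-use codes. The class below is purely hypothetical: it is not how PharmaSecure or Sproxil implement their services, and the names and messages are invented for illustration.

```python
import secrets


class CodeRegistry:
    """Hypothetical trusted third party holding the codes issued to a manufacturer."""

    def __init__(self):
        self._codes = {}

    def issue(self, product: str) -> str:
        """Create a random code to be printed under the scratch surface of one box."""
        code = secrets.token_hex(8)
        self._codes[code] = {"product": product, "checked": False}
        return code

    def verify(self, code: str) -> str:
        """Answer a patient's scan; each code is accepted only once."""
        entry = self._codes.get(code)
        if entry is None:
            return "unknown code: possible counterfeit"
        if entry["checked"]:
            return "code already used: possible copy"
        entry["checked"] = True
        return "authentic: " + entry["product"]


registry = CodeRegistry()
box_code = registry.issue("antimalarial, batch 42")  # invented product label
print(registry.verify(box_code))    # first scan of a genuine box
print(registry.verify(box_code))    # same code scanned again
print(registry.verify("deadbeef"))  # code that was never issued
```

Making each code single-use is what prevents a counterfeiter from simply copying the code of one genuine box onto many fake ones.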

Figure 1 – Model for drug authenticity verification using mobile apps

The system described above works in almost the same way as serialization, whose implementation began several years ago and which is described in Delegated Regulation (EU) 2016/161, except that the verification is performed by the patient and not by the pharmacist.

Other mobile apps, such as CheckFake and DrugSafe, are developing a different verification system, taking advantage of the smartphone’s camera to check the shape, content, and color compliance of drug packaging. Finally, another category of mobile apps implements a system that analyses the shape and the color of the drugs themselves to identify which tablets they are, and certify they are authentic.

These different solutions have a number of qualities, in particular their ease of deployment and use by patients all over the world. On the other hand, they are engaged in an arms race with counterfeiters, who are pushed to produce ever more realistic counterfeits. In addition, these technologies can hardly be applied to other circuits, such as securing the entire supply chain or tracking drugs within hospitals. This is why many large pharmaceutical groups, such as Merck or Novartis, are betting on a different technology: the Blockchain.

Presentation of the Blockchain technology

Blockchain is a technology conceived in 2008, on which cryptocurrencies have since been built. It is a cryptographically secured technology for storing and transmitting information without a central controlling body. Its main objective is to allow a computer protocol to act as a vector of trust between different actors without a third-party intermediary. The Blockchain mechanism allows the participating actors to reach a unanimous agreement on the content of the data, and to prevent its subsequent falsification.

The historical method of consensus between actors is the so-called “proof of work”: a number of actors provide computing power to validate the arrival of new information. In the context of cryptocurrencies, these actors are called miners: very powerful, energy-hungry computing machines are all given a complex mathematical problem to solve at the same time. The first one to succeed validates the transactions and is paid for it. Each of the participants, called “nodes”, therefore holds an up-to-date history of the ledger that is the Blockchain.

The way to corrupt a proof-of-work blockchain is to gather enough computing power to carry out a so-called “51%” attack, i.e., to steer the consensus towards a falsification of the chain, double spending in particular. In practice, this attack is hardly conceivable on blockchains such as Bitcoin, as the computing power required would be phenomenal (perhaps one day the quantum computer will make what we currently consider to be cryptography obsolete, but that is another debate). Other validation techniques now exist, such as proof of stake or proof of storage. They were essentially designed to address the issues of scalability and energy sustainability of blockchains.
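To give an intuition of proof of work, here is a toy Python sketch in which “mining” a block means finding a nonce such that the block’s SHA-256 hash starts with a given number of zeros, each block storing the hash of the previous one. The block layout, the difficulty value and the function names are simplifying assumptions; real blockchains such as Bitcoin use a different block format, adjust the difficulty over time and run over a peer-to-peer network.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """SHA-256 of the block's content, serialized deterministically."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()


def mine(index: int, data: str, previous_hash: str, difficulty: int = 3) -> dict:
    """Try nonces until the block hash starts with `difficulty` zeros."""
    nonce = 0
    while True:
        block = {"index": index, "data": data,
                 "previous_hash": previous_hash, "nonce": nonce}
        if block_hash(block).startswith("0" * difficulty):
            return block
        nonce += 1


def is_valid(chain: list, difficulty: int = 3) -> bool:
    """Check each block's proof of work and its link to the previous block."""
    for i, block in enumerate(chain):
        if not block_hash(block).startswith("0" * difficulty):
            return False
        if i > 0 and block["previous_hash"] != block_hash(chain[i - 1]):
            return False
    return True


genesis = mine(0, "genesis", "0" * 64)
second = mine(1, "box 123 shipped to wholesaler", block_hash(genesis))
chain = [genesis, second]
print(is_valid(chain))

genesis["data"] = "tampered"  # rewriting history breaks the link to the next block
print(is_valid(chain))
```

To make the tampered chain valid again, an attacker would have to redo the proof of work for the modified block and every block after it, faster than the rest of the network extends the honest chain, which is the idea behind the “51%” attack.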

Figure 2 – Diagram of how to add a block to a blockchain.

Conceived in the aftermath of the 2008 financial crisis, this technology has a strong political connotation: Bitcoin’s philosophy, for example, is to allow individuals to free themselves from banking and political control systems. Thus, the original blockchains, such as Bitcoin, are said to be “open”: anyone can read and write to the chain’s registers. Over time, and for the convenience of private companies, semi-closed blockchains (everyone can read but only a centralizing organization can write) and closed blockchains (reading and writing are reserved for a centralizing organization) have been developed. These new forms of blockchain move considerably away from the original philosophy, and one can legitimately question their relevance: they keep some of the disadvantages of the blockchain in terms of difficulty of use, while also retaining the problems associated with a centralized database, since a single entity can deliberately decide to corrupt it, or be hacked.

This closed configuration often allows for greater scalability but raises a question that is as much technological as it is philosophical: is a blockchain, when fully centralized, still a blockchain?

Prospects for the use of technology in the fight against counterfeit drugs

At a time when trust is more than ever a central issue for the pharmaceutical industry, which sees its legitimacy and honesty questioned relentlessly, it is logical that the players in this sector are interested in this technology of trust par excellence. Among the various use cases, which we will no doubt come back to in future articles, the fight against counterfeit drugs is one of the most promising and most important in terms of human lives potentially saved. For example, Merck recently began collaborating with Walmart, IBM, and KPMG on an FDA-led pilot project that uses blockchain to allow patients to track the entire pathway of the medication they take. This concept is already being tested in Hong Kong on Gardasil, using mobile applications downloaded by pharmacists and patients. The entire drug supply chain is thus built around the blockchain, making it possible to retrieve and assemble a large amount of data concerning, for example, shipping dates or storage conditions and temperatures.

The aforementioned consortium is also exploring the use of Non-Fungible Tokens (NFTs): unique and non-interchangeable digital tokens. Each box of medication produced would have an associated NFT, which would follow the box through its circuit, from the manufacturer to the wholesaler, from the wholesaler to the pharmacist and from the pharmacist to the patient. Thus, in the future, each patient would receive an NFT at the same time as the medication, certifying its origin. None of the actors in the supply chain could fraudulently add counterfeit drugs, since these would not have an associated NFT. This future is appealing and would improve drug safety, but it will only be achievable after significant work, on the one hand to educate stakeholders and on the other hand to set up digital interfaces accessible to all patients.
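The chain of custody described above can be sketched with a small in-memory ledger in which each box has one token that only its current holder can pass on. This is a hypothetical illustration, not the design of the consortium’s pilot: a real NFT would live on a blockchain, with transfers authorized by cryptographic signatures rather than by a plain sender name.

```python
class BoxTokenLedger:
    """Hypothetical registry: one unique token per box of medication."""

    def __init__(self):
        self._holder = {}
        self._history = {}

    def mint(self, token_id: str, manufacturer: str) -> None:
        """Create the token when the box is produced."""
        if token_id in self._holder:
            raise ValueError("token already exists: tokens are unique")
        self._holder[token_id] = manufacturer
        self._history[token_id] = [manufacturer]

    def transfer(self, token_id: str, sender: str, receiver: str) -> None:
        """Pass the token along with the box; only the current holder may do so."""
        if self._holder.get(token_id) != sender:
            raise PermissionError("sender does not hold this token")
        self._holder[token_id] = receiver
        self._history[token_id].append(receiver)

    def history(self, token_id: str) -> list:
        """Full path of the box, from manufacturer to current holder."""
        return list(self._history[token_id])


ledger = BoxTokenLedger()
ledger.mint("box-001", "manufacturer")
ledger.transfer("box-001", "manufacturer", "wholesaler")
ledger.transfer("box-001", "wholesaler", "pharmacist")
ledger.transfer("box-001", "pharmacist", "patient")
print(ledger.history("box-001"))
```

A counterfeit box inserted midway would have no minted token, and an actor who does not hold a token cannot transfer it, so the patient would have nothing to verify it against.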


With the emergence of e-commerce and its ever-increasing ease of access, the problem of counterfeit drugs has exploded in recent years, and the pharmaceutical ecosystem will need to mobilize and innovate in order to curb it and to restore the trust that has been damaged. Several fascinating initiatives using blockchain technology are currently being carried out by various stakeholders in the health sector. We can see in these projects the outline of a potential solution to drug counterfeiting, but we must nevertheless consider them with a critical eye. Since the explosion of cryptocurrencies in 2017, the temptation to market the buzzword “blockchain” can be strong, unfortunately even when a centralized database would be perfectly sufficient. Can we go so far as to think, as some specialists in this technology do, that blockchain is only viable and useful when it is used for financial transfers? The debate is open, and there is no doubt that the future will quickly bring an answer!




Using Real World Data, an interview with Elise Bordet – RWD and Analytics Lead

Every month, Resolving Pharma interviews the stakeholders who shape the health and pharmaceutical industries of tomorrow. In this first interview, Elise Bordet honors us with her participation. Many thanks for your time and your insights!

“Data access and analytics capabilities will become an increasingly important competitive advantage for pharmaceutical companies.”

[Resolving Pharma] To begin with, could you introduce yourself and talk about your background? Why did you choose to work at the intersection of Data and Pharma?

[Elise Bordet] I am an agronomist by training. I did a PhD in Immunology-Virology, followed by an MBA, before joining my current company. I am passionate about very technical and cutting-edge topics, and about the implementation of new research approaches. I was very impressed by a conference on Artificial Intelligence and the notion of a 4th industrial revolution, and I didn’t want to miss out on this subject.

I was very attached to fundamental research in the public sector, but I still wanted to form my own opinion about the pharmaceutical industry, and I am not disappointed at all. I think that it is a great place to contribute to research and the common good.

I love the ever-changing topics, where everything changes on a daily basis, where you always have to challenge yourself to stay updated on the latest innovations. Pharma, Data and AI subjects are heaven for me!

Can you tell us what Real World Data is and how the pharmaceutical industry uses it?

Real World Data is defined as data that is not collected in a randomized clinical trial. Therefore, it is a huge topic. It ranges from data collected in registries to larger databases such as medico-administrative databases.

This data allows the pharmaceutical industry to create drugs that are better adapted to the reality of Health systems. It also allows the creation of new research approaches, to support “drug repurposing” approaches for example.

How do Real World Evidence-based approaches differ from traditional pharmaceutical industry approaches? What are their added values?

Actually, these approaches have existed for a long time, particularly in Pharmacovigilance (the famous Phase IV). However, the amount of data available, its quality, and our computing and analysis capacities have all changed radically. These changes allow us to answer new research questions, which had remained unanswered because we did not have the capacity to look at what was happening in reality. The second subject is the major contribution of Artificial Intelligence: scientifically, we will be able to go much further.

In your opinion, how is the pharmaceutical industry going to balance the use of Real World Evidence with more traditionally generated clinical and pre-clinical data in the future?

Real World Data will play an increasingly important role. Each type of data has its advantages and disadvantages. In fact, it is not a question of setting one type of data against another. Quite the contrary: the most interesting thing is to bring all these data together and extract as much information from them as possible.

What impact could this type of data have on the drug value chain and the partnerships that the pharmaceutical industry needs to put in place?

Data access and analysis capabilities will become an increasingly important competitive advantage for pharmaceutical companies. A company’s Data strategy is one of its essential pillars. I imagine that in the future we will look not only at the value of a company’s portfolio, but also at the value and the impact of the analytics that the company can perform. Data is going to weigh so heavily on the probability of success of projects that it is difficult to imagine not taking it into account in the metrics of economic valuation.

You recently gave a presentation on digital twin technology. Can you explain what it is?

The digital twin is a very elegant concept that can be summarized as follows: with each development, we generate new data that we can rely on for the next projects. This data should allow us to model most of the levels of biological organization: molecular, cellular, tissue, and then the scale of organs or even of whole organisms. This modeling will avoid regenerating knowledge that has already been created and will notably allow us to accelerate pre-clinical and clinical development, and perhaps even to model the first Phase I results very precisely.

How do you see the pharmaceutical industry in 30 years’ time?

Wow! Everything is going to be different! First of all, I think that, as in all industries, technology will have enabled a profound transformation of all decision making, what we call “data-driven decision making”. Science will have made incredible progress, computing and prediction capacities will have been multiplied, and there will be new approaches in Artificial Intelligence that we do not know today. We will have made immense progress in the interoperability of the various health databases that are fragmented today. It is a good exercise to try to project ourselves 30 years ahead. We won’t remember how we did things before; that’s the principle of technological revolutions. We’ve already forgotten how we lived without cell phones and the Internet! We will no longer be able to imagine ourselves without Data and AI at the center of our decisions and projects. From a more organizational point of view, data sharing will have facilitated public and private scientific collaborations and the implementation of projects that will accelerate research, such as the Health Data Hub in France or the European Health Data Space that will be launched by the European Union.

Do you have any advice for someone who wants to work in Data Science in the Healthcare sector?

We scientists learned through doubt and are still haunted by it. Just because you have expertise in one field (clinical trials, laboratory research, etc.) does not mean that you cannot acquire other skills in Data Science or Artificial Intelligence, for example. Versatile profiles are and will be the most sought after. So my advice is: don’t panic!

If you can, start training yourself quickly. The Internet puts the best courses on programming, Data Science and many other advanced subjects just a click away, so take advantage of it!

Go ahead and start tomorrow!
