Breaking the 90% Barrier: How AI is Transforming Drug Development
The landscape of pharmaceutical drug discovery is undergoing a dramatic transformation, driven by the integration of AI. With the global AI in drug discovery market valued at US$ 1.70 billion in 2023 and projected to reach US$ 11.93 billion by 2033, growing at a CAGR of 21.5%, the industry stands at the cusp of a revolutionary change in how new therapeutics are developed. This shift has been widely discussed in both the drug development and AI worlds, yet I’ve failed to find a sufficient breakdown that blends the two together.
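As a quick sanity check on the market figures above, the compound growth arithmetic can be reproduced in a few lines. This is only a sketch: the helper names `project_value` and `implied_cagr` are mine, and the 10-year span is simply 2023 to 2033.

```python
def project_value(start_value: float, annual_growth: float, years: int) -> float:
    """Compound a starting value forward at a fixed annual growth rate."""
    return start_value * (1.0 + annual_growth) ** years


def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Annual growth rate that turns start_value into end_value over `years`."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures quoted in the text, in US$ billions; the span is 2023 -> 2033.
projected_2033 = project_value(1.70, 0.215, 10)
cagr = implied_cagr(1.70, 11.93, 10)

print(f"Projected 2033 market size: {projected_2033:.2f} billion")
print(f"Implied CAGR: {cagr:.1%}")
```

Both directions land close to the quoted numbers, so the market size and the growth rate are at least internally consistent.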
The status quo - the current drug development process
The conventional drug development process is both time-intensive and financially demanding. Many researchers dedicate their entire careers to developing a single drug, with costs ranging from $314 million to $4.46 billion per successful drug, depending on the therapeutic area. This extensive process consists of five critical stages:
1. Discovery and Development
This step consists of years of research (usually 3-6 years), typically conducted by highly trained scientists in laboratory settings. It encompasses pre-discovery research to understand disease mechanisms and identify potential targets, as well as drug discovery efforts to find therapeutic molecules or biologics that can effectively treat a disease or alleviate its symptoms.
2. Preclinical Research
This stage focuses on validating drug candidates, essentially ensuring that they are safe for humans to use. The main goals here are clarifying the drug's mode of action, checking for potential toxicity, and validating efficacy through in vitro (research done in a lab dish or test tube) and in vivo (research done in a living organism) models, along with initial formulation evaluations.
3. Clinical Research
Human trials begin, marking a critical phase in determining the drug's real-world effectiveness and safety. This is usually further broken down into three phases, which move from small groups of people to progressively larger ones.
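4 & 5. FDA Review and Post-Market Safety Monitoring
The final stages involve regulatory review and approval, ensuring the drug meets all safety and efficacy standards, followed by continued safety monitoring once the drug is on the market.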
The Current Crisis in Drug Development
As laid out above, this is an incredibly long and thorough (often tedious) process. Beyond the length of a single run, a drug can fail at any step along the way. The most striking statistic is the 90% failure rate in clinical trials. Of those failures, 40-50% are attributed to a lack of clinical efficacy, 30% to unmanageable toxicity, 10-15% to poor drug-like properties, and 10% to a lack of commercial need or poor strategic planning.
These failures are particularly problematic because they often require researchers to return to square one:
Phase 1 (Safety Testing): The drug usually fails here when it isn’t safe for human consumption. Toxicity issues often force a complete restart from preclinical development, sending scientists back to the drug discovery phase.
Phase 2 (Initial Efficacy Testing): The drug fails when it does not show enough benefit in a small group of patients. Insufficient effectiveness requires new formulations or new target identification, which in most cases means going back to the drug discovery phase.
Phase 3 (Large-scale Efficacy Testing): Poor performance against existing treatments necessitates returning to drug discovery.
So despite the drug discovery component not being as long as the trials, every time a drug fails, researchers end up right at the beginning of another 3-6 year research phase.
On top of being incredibly time-consuming, failure is also very expensive. As noted earlier, depending on the type of drug being developed, the end-to-end cost of bringing a drug to market can range from $314M to $4.46B. Most of that cost comes from out-of-pocket spending on candidates that fail, particularly in Phase 3 clinical trials.
When a drug fails, scientists can’t simply tweak it and try again immediately. Instead, they often have to go back to square one, spending years identifying and testing a new candidate. Unlike software where failures lead to quick iterations and improvements, drug development is stuck in a slow, expensive cycle where each failure sets the process back by years. The rebound rate is slow mainly because of the many bottlenecks in the drug discovery phase.
A deeper look at the drug discovery phase
Drug discovery at a high level involves identifying a potential drug candidate that interacts with a disease-related biological target. There are 3 key steps:
Find a biological molecule (protein, enzyme or gene) that is linked to a disease.
Screen thousands or even millions of compounds to find ones that interact with the target.
Optimize leads by refining promising compounds to increase effectiveness and reduce toxicity.
As it stands now, this phase is incredibly inefficient for several reasons. The first is identifying and validating biological targets, since the success of a drug is contingent on its ability to interact with a specific target involved in disease pathology. Developing reliable and reproducible assays (procedures to test or measure the activity of a drug) has also proven to be a huge time sink. Data management and integration is another major burden: managing large volumes of data from various sources, including experimental results, computational models and literature, can be overwhelming. Optimizing this phase would let researchers move from theory to testing faster and build a shorter feedback loop, so they learn faster and greatly shorten the time it takes to get a drug to patients.
The need to shorten the drug development process
The need to move faster in this field is quite real. Every year people die from diseases that still have no cure. For patients with serious or life-threatening conditions, the typical 10-15+ year timeline for drug development is too long. Faster development means potentially life-saving therapies reach patients sooner. There are also strong incentives from a company and investor point of view to accelerate development, as this would allow pharmaceutical companies to gain a first-mover advantage, capture market share and secure investment.
The AI Revolution in Drug Discovery
Now the fun part, the tech! The integration of AI is transforming each stage of the drug discovery process. When Google DeepMind released AlphaFold, the drug development world paused. The space suddenly looked incredibly promising, and more and more ML researchers rushed to contribute to it. Before jumping into the more research-based content, let’s focus on the types of data we can extract from proteins that become valuable when trying to make any sort of prediction. To keep it brief, here are the two most common groups of data:
Sequence data
Think of this as one-dimensional data. It consists of amino acid composition, amino acid sequence length, molecular weight and isoelectric point.
Structural data
This is more “multidimensional” data. The goal with structural data is to capture more context on the overall structure of these proteins, including alpha helices, beta sheets, ligand-binding regions, solvent interactions, various bonding interactions, etc.
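To make the sequence-level descriptors concrete, here is a minimal sketch that computes length, amino acid composition and an approximate molecular weight from a one-letter protein sequence. The function name `sequence_features` and the example peptide are my own, the residue masses are rounded average values, and the isoelectric point is left out because it needs a table of pKa values and an iterative solve.

```python
from collections import Counter

# Average residue masses in daltons (amino acid minus one water).
# Rounded reference values; treat the resulting weights as approximate.
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER_MASS = 18.02  # one water is added back for the free chain ends


def sequence_features(sequence: str) -> dict:
    """Return length, fractional composition and approximate molecular weight."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    unknown = set(seq) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    counts = Counter(seq)
    length = len(seq)
    composition = {aa: counts[aa] / length for aa in sorted(counts)}
    weight = sum(RESIDUE_MASS[aa] for aa in seq) + WATER_MASS
    return {"length": length, "composition": composition, "molecular_weight": weight}


# Arbitrary 10-residue peptide used purely for illustration.
features = sequence_features("MKTAYIAKQR")
print(features["length"], round(features["molecular_weight"], 1))
print(features["composition"])
```

Descriptors like these are cheap to compute for every protein in a dataset, which is part of why sequence data is such a common starting point.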
Building on this, researchers have developed increasingly sophisticated approaches to target identification and validation.
The one I personally find most interesting is Graph Neural Networks (GNNs). We can treat atoms as nodes and chemical bonds as edges in a graph. These networks implement message-passing algorithms where each atom's properties are updated based on its chemical environment, starting from features such as atomic number, hybridization state, and formal charge.
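A stripped-down sketch of one message-passing step might look like the following. It uses the heavy atoms of ethanol as a toy graph and a plain mean over each atom and its bonded neighbors. Real GNNs apply learned weight matrices and nonlinearities at this step, which are omitted here, and the numeric hybridization code is just an illustrative encoding I picked.

```python
# Heavy atoms of ethanol (CH3-CH2-OH) as a toy molecular graph.
# Feature vector per atom: [atomic number, hybridization code, formal charge].
# The hybridization code (3.0 for sp3) is an assumed, illustrative encoding.
node_features = {
    0: [6.0, 3.0, 0.0],  # methyl carbon
    1: [6.0, 3.0, 0.0],  # methylene carbon
    2: [8.0, 3.0, 0.0],  # hydroxyl oxygen
}
bonds = [(0, 1), (1, 2)]


def build_neighbors(num_atoms, bond_list):
    """Turn a list of bonds (edges) into an adjacency list."""
    neighbors = {i: [] for i in range(num_atoms)}
    for a, b in bond_list:
        neighbors[a].append(b)
        neighbors[b].append(a)
    return neighbors


def message_passing_step(features, neighbors):
    """Update each atom with the mean of its own and its neighbors' features."""
    updated = {}
    for atom, own in features.items():
        vectors = [own] + [features[n] for n in neighbors[atom]]
        updated[atom] = [sum(column) / len(vectors) for column in zip(*vectors)]
    return updated


neighbors = build_neighbors(len(node_features), bonds)
after_one = message_passing_step(node_features, neighbors)
after_two = message_passing_step(after_one, neighbors)
print(after_one)
print(after_two)
```

After one step the methyl carbon is unchanged because its only neighbor is another carbon. After two steps its first feature shifts, since information from the oxygen two bonds away has reached it. Stacking steps is how these networks widen each atom's view of its chemical environment.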
Support Vector Machines (SVMs): I recently read a paper where a team used specialized kernels designed specifically for chemical similarity comparisons. They experimented with molecular fingerprints that encode structural and chemical properties. These kernels can capture subtle patterns in molecular structure that correlate with biological activity.
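I don't know which kernel that paper used, but one widely used similarity measure for fingerprints is the Tanimoto coefficient, so here is a sketch of a kernel matrix built from it. The fingerprints below are made-up sets of "on" bit indices, not real molecules.

```python
def tanimoto(fp_a: frozenset, fp_b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not fp_a and not fp_b:
        return 1.0  # convention: two empty fingerprints count as identical
    return len(fp_a & fp_b) / len(fp_a | fp_b)


def kernel_matrix(fingerprints):
    """Pairwise similarity matrix over a list of fingerprints."""
    return [[tanimoto(a, b) for b in fingerprints] for a in fingerprints]


# Hypothetical fingerprints: each set holds the indices of bits switched on.
fingerprints = [
    frozenset({1, 4, 7, 9}),
    frozenset({1, 4, 7, 12}),
    frozenset({2, 5, 30}),
]

gram = kernel_matrix(fingerprints)
for row in gram:
    print([round(value, 2) for value in row])
```

A matrix like this can be handed to an SVM implementation that accepts precomputed kernels (scikit-learn's `SVC(kernel="precomputed")` is one option), so the classifier works directly on chemical similarity rather than on raw bit vectors.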
Random forests have also been explored for chemical space exploration. These are ensemble methods that combine hundreds or thousands of decision trees, each trained on a different random subset of the molecules and their properties.
Three-dimensional convolutional neural networks (3D CNNs) are another one I find super cool. They process voxelized representations of protein-ligand complexes, considering multiple channels of atomic properties simultaneously. These networks are often augmented with physics-informed layers that incorporate molecular mechanics terms, ensuring predictions align with known chemical principles.
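To give a feel for what "voxelized with multiple channels" means, here is a small sketch that drops atoms into a 3D occupancy grid with one channel per element. The channel assignment, grid size and coordinates are all assumptions for illustration. Real pipelines typically use richer channels (partial charge, hydrophobicity and so on) and smooth each atom over neighboring voxels.

```python
CHANNELS = {"C": 0, "N": 1, "O": 2}  # assumed: one channel per element type


def voxelize(atoms, grid_size=8, resolution=1.0):
    """Map atoms to a [channel][x][y][z] occupancy grid centered on the origin.

    atoms: iterable of (element, x, y, z), coordinates in angstroms.
    Atoms outside the grid or with an unlisted element are skipped.
    """
    grid = [[[[0.0] * grid_size for _ in range(grid_size)]
             for _ in range(grid_size)] for _ in CHANNELS]
    half = grid_size * resolution / 2.0
    for element, x, y, z in atoms:
        if element not in CHANNELS:
            continue
        index = [int((coord + half) // resolution) for coord in (x, y, z)]
        if all(0 <= i < grid_size for i in index):
            grid[CHANNELS[element]][index[0]][index[1]][index[2]] += 1.0
    return grid


# Made-up coordinates; the last atom lies outside the 8 x 8 x 8 angstrom box.
atoms = [
    ("C", 0.2, 0.1, -0.3),
    ("O", 1.4, 0.0, 0.0),
    ("N", 10.0, 0.0, 0.0),
]
grid = voxelize(atoms)
occupied = sum(v for channel in grid for plane in channel for row in plane for v in row)
print(occupied)
```

The resulting grid has the same shape as a tiny multi-channel 3D image, which is exactly what lets convolutional layers slide over it.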
Variational autoencoders (VAEs) and generative adversarial networks (GANs) can now explore chemical space in a directed manner, generating novel molecules with desired properties. These models work by learning a continuous representation of molecular structure. The generative models are often coupled with predictive models in a multi-objective optimization framework, simultaneously considering factors like synthetic accessibility, binding affinity, and potential toxicity.
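One simple way to picture the multi-objective part is a Pareto filter over generated candidates: keep every molecule that no other molecule beats on all objectives at once. The sketch below assumes the predictive models have already produced scores. The candidate names and numbers are invented for illustration, and real systems may use weighted sums or other strategies instead.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    binding_affinity: float         # predicted score, higher is better
    synthetic_accessibility: float  # predicted score, higher is easier to make
    toxicity_risk: float            # predicted score, lower is better


def objectives(c: Candidate):
    """Express every objective so that larger always means better."""
    return (c.binding_affinity, c.synthetic_accessibility, -c.toxicity_risk)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    pairs = list(zip(objectives(a), objectives(b)))
    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Invented scores standing in for model predictions.
candidates = [
    Candidate("mol_A", 0.9, 0.4, 0.3),
    Candidate("mol_B", 0.7, 0.8, 0.2),
    Candidate("mol_C", 0.6, 0.7, 0.4),  # worse than mol_B on every objective
    Candidate("mol_D", 0.5, 0.9, 0.1),
]
front = pareto_front(candidates)
print([c.name for c in front])
```

Here mol_C drops out because mol_B is better on all three objectives, while the other three each represent a different trade-off that a chemist could reasonably pick.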
I’ve also read a lot about multi-modal learning systems that combine different types of data (e.g. structural data, sequence information, chemical descriptors) with the various models described above.
Researchers are also getting better at iterating on these models. Through techniques like layer-wise relevance propagation and attention visualization, they can understand why models make specific predictions. I’ve also seen researchers integrate Bayesian neural networks and ensemble methods to provide uncertainty quantification.
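The ensemble flavor of uncertainty quantification is easy to sketch: run the same input through several models and treat the spread of their predictions as a rough confidence signal. The three "models" below are toy linear functions standing in for independently trained networks, so the numbers mean nothing beyond illustrating the mechanics.

```python
from statistics import mean, stdev


def ensemble_predict(models, x):
    """Return the mean prediction and the spread (sample std dev) across models."""
    predictions = [model(x) for model in models]
    return mean(predictions), stdev(predictions)


# Toy stand-ins for trained models; each maps an input feature to a prediction.
models = [
    lambda x: 2.0 * x,
    lambda x: 2.1 * x + 0.1,
    lambda x: 1.9 * x - 0.1,
]

near_mean, near_spread = ensemble_predict(models, 1.0)
far_mean, far_spread = ensemble_predict(models, 10.0)
print(f"x=1.0:  prediction {near_mean:.2f} +/- {near_spread:.2f}")
print(f"x=10.0: prediction {far_mean:.2f} +/- {far_spread:.2f}")
```

The models disagree more at the second input, and that larger spread is the kind of flag that tells a researcher to trust the prediction less or to go collect more data there.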
Early Success Stories and Industry Impact
The implementation of AI in drug discovery has shown promising results. AI-discovered drugs demonstrate 80-90% success rates in Phase 1 trials, compared to traditional rates of 40-65%. AI has also shown the potential to double R&D productivity, with overall success rates potentially increasing from 5-10% to 9-18%. Additionally, time and cost reductions of 70-80% have been reported in early discovery stages.
Another exciting stat shows how much more buzz there is in this space as a result. Clinical trial AI has shown remarkable growth, with a 444% increase in publication volume since 2019 (CAGR 40%), closely followed by AI drug discovery at 421% (CAGR 39%).
The Path Forward
The integration of AI in drug discovery represents more than just technological advancement; it addresses crucial human needs. For patients with life-threatening conditions, reducing the typical 10-15 year development timeline could mean the difference between life and death. For pharmaceutical companies, accelerated development provides competitive advantages and improved return on investment.
As AI technologies continue to evolve and demonstrate their value in drug discovery, we can expect to see increased adoption across the industry. The combination of reduced timelines, improved success rates, and lower costs promises to revolutionize how we develop new therapeutics, ultimately benefiting both patients and the healthcare industry as a whole.