Designing synthetic reaction pathways for molecules is a fundamental aspect of organic synthesis, with significant implications for the biomedical, pharmaceutical, and materials industries. Retrosynthetic analysis is the most widely used approach for developing synthetic routes.
Retrosynthetic analysis aims to decompose a target molecule into simpler precursors using established reactions. This methodology, first formalized by Corey, led to the development of computer-aided synthesis planning (CASP). In recent years, artificial intelligence (AI)-driven retrosynthesis has facilitated the exploration of more complex molecules and significantly reduced the time and energy required to design synthetic experiments. Single-step retrosynthesis prediction is a crucial component of retrosynthetic planning, and several deep learning-based methods have been proposed with promising results. These methods fall into three broad groups: template-based, template-free, and semi-template-based.
Template-based methods treat retrosynthesis prediction as a template retrieval problem and compare the target molecule with precomputed templates. These templates capture the essential features of the reaction center in specific types of chemical reactions. They can be generated manually or automatically and guide the model toward the most suitable chemical transformation for a given molecule. Various works have proposed different ways to prioritize candidate templates. RetroSim ranked candidate templates by the molecular fingerprint similarity between the given product and the molecules in the corpus. NeuralSym was the first to use deep neural networks for template selection, learning a multi-class classifier. GLN built a conditional graph logic network to learn the conditional joint probability of templates and reactants. LocalRetro evaluated the suitability of local atom/bond templates at all predicted reaction centers of a target molecule and accounted for non-local effects in chemical reactions through global reactivity attention; it has demonstrated state-of-the-art performance among template-based methods. Although they provide interpretability and molecular validity, template-based models suffer from limited generalization and scalability, which can hinder their practical utility.
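The similarity-based ranking idea described above can be illustrated with a minimal sketch. It is not RetroSim's implementation: the fingerprint here is a set of character bigrams of the SMILES string (an assumption standing in for a real molecular fingerprint), and the corpus, the template labels, and the function names `toy_fingerprint`, `tanimoto`, and `rank_templates` are all hypothetical.

```python
from typing import FrozenSet, List, Tuple


def toy_fingerprint(smiles: str, n: int = 2) -> FrozenSet[str]:
    """Return the set of character n-grams of a SMILES string.

    ASSUMPTION: a crude stand-in for a real molecular fingerprint,
    used only so that the sketch runs with the standard library.
    """
    return frozenset(smiles[i:i + n] for i in range(len(smiles) - n + 1))


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_templates(target: str,
                   corpus: List[Tuple[str, str]]) -> List[Tuple[float, str]]:
    """Rank templates by similarity of their known product to the target.

    `corpus` holds (product SMILES, template label) pairs.
    """
    target_fp = toy_fingerprint(target)
    scored = [(tanimoto(target_fp, toy_fingerprint(product)), template)
              for product, template in corpus]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical corpus: each known product is paired with an illustrative
# template label rather than a real reaction template.
CORPUS = [
    ("c1ccccc1Br", "aryl bromination"),
    ("CC(=O)NC", "amide coupling"),
    ("CC(=O)OCC", "esterification"),
]

if __name__ == "__main__":
    # Target: methyl acetate.
    for score, template in rank_templates("CC(=O)OC", CORPUS):
        print(f"{score:.2f}  {template}")
```

In this toy corpus the ester precedent ranks first and the aryl bromide last. Methyl and ethyl acetate happen to share an identical bigram set, which shows how coarse the stand-in fingerprint is; a practical system would use a chemistry-aware fingerprint instead.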
Template-free methods use deep generative models to generate reactant molecules without relying on predefined templates. Most existing methods reformulate the task as a sequence-to-sequence problem, using a sequence representation of molecules, specifically the simplified molecular-input line-entry system (SMILES). Liu et al. first used a long short-term memory (LSTM)-based sequence-to-sequence (Seq2Seq) model to convert the SMILES representation of a product into the SMILES of the reactants. Karpov et al. further proposed a Transformer-based Seq2Seq method for retrosynthesis. SCROP integrated a grammar corrector into the Transformer architecture to address the prevalent problem of grammatically invalid output in Seq2Seq methods. R-SMILES established a closely aligned one-to-one mapping between the SMILES representations of the products and the reactants to improve the efficiency of synthesis prediction in Transformer-based methods. PMSR devised three pre-training tasks tailored to retrosynthesis (auto-regression, molecule recovery, and contrastive reaction classification), improving retrosynthesis performance and achieving state-of-the-art accuracy among template-free methods. Some studies frame the task as a graph-to-sequence problem, taking the molecular graph as input. Graph2SMILES combined a sequential graph encoder with a Transformer decoder to preserve the permutation invariance of SMILES. Retroformer introduced a local attention head in the Transformer encoder to strengthen its reasoning about reactions. Recent studies, including MEGAN, MARS, and Graph2Edits, have explored end-to-end molecular graph editing models that represent a chemical reaction as a series of graph edits, drawing inspiration from the arrow-pushing formalism. However, these approaches usually require time-consuming predictions of sequential graph edit operations. Fang et al. developed a substructure-level decoding method by automatically extracting commonly preserved portions of product molecules. However, the extraction of substructures is fully data-driven, and its coverage depends on the reaction dataset. Furthermore, incorrect substructures can lead to erroneous predictions. While template-free methods are fully data-driven, they raise concerns regarding the interpretability, chemical validity, and diversity of the generated molecules.
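To make the sequence-to-sequence formulation concrete, the sketch below tokenizes SMILES strings and builds one source/target pair of the kind such a model would be trained on. It is an illustration under stated assumptions, not the preprocessing of any method named above: the regular expression follows a pattern commonly used for SMILES tokenization, the helper names `tokenize_smiles` and `make_seq2seq_pair` are hypothetical, and the esterification example is chosen only for illustration.

```python
import re
from typing import List, Tuple

# Regex of the kind commonly used to split SMILES into tokens: bracket
# atoms, two-letter halogens, single atoms, bonds, branches, ring closures.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/"
    r"|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into tokens; fail if any character is skipped."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize SMILES: {smiles!r}")
    return tokens


def make_seq2seq_pair(product: str,
                      reactants: List[str]) -> Tuple[List[str], List[str]]:
    """Build (source, target) token sequences for retrosynthesis.

    The source is the product; the target is the reactants joined by '.',
    the SMILES separator for disconnected molecules.
    """
    return tokenize_smiles(product), tokenize_smiles(".".join(reactants))


if __name__ == "__main__":
    # Illustrative pair: methyl acetate from acetic acid and methanol.
    source, target = make_seq2seq_pair("CC(=O)OC", ["CC(=O)O", "CO"])
    print("source:", " ".join(source))
    print("target:", " ".join(target))
```

Nothing in this representation forces a decoded token sequence to be a valid molecule, which is one reason grammatical invalidity arises in Seq2Seq methods.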
Semi-template-based methods combine the strengths of the two aforementioned approaches. They follow a two-stage procedure: first, the target molecule is fragmented into synthons by identifying reactive sites; then the synthons are converted into reactants using techniques such as leaving-group selection, graph generation, or SMILES generation. RetroXpert first identified the reaction center of the target molecule with an edge-enhanced graph attention network to obtain synthons, and then generated the corresponding reactants from those synthons. RetroPrime introduced the mix-and-match and label-and-align strategies within a Transformer-based two-stage workflow to mitigate the challenges of insufficient diversity and chemical implausibility. G2Gs first partitioned the target molecular graph into several synthons by identifying potential reaction centers, and then translated the synthons into complete reactant graphs using a variational graph translation framework. GraphRetro first transformed the target into synthons by