EditRetro: Redefining Retrosynthesis Prediction as a Molecular String Editing Task

2024/07/31 03:06

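Designing synthetic reaction pathways for molecules is a fundamental aspect of organic synthesis and matters to fields such as biomedicine, pharmaceuticals, and the materials industry. Retrosynthetic analysis is the most widely used method for developing synthetic routes.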

Organic synthesis plays a pivotal role in fields such as the biomedical, pharmaceutical, and materials industries. Retrosynthetic analysis is the primary approach for designing synthetic routes: it decomposes a target molecule into simpler precursors using established reactions. This methodology, first formalized by Corey, led to the development of computer-aided synthesis planning (CASP). In recent years, artificial intelligence (AI)-driven retrosynthesis has enabled the exploration of more complex molecules and has significantly reduced the time and effort required to design synthetic experiments. Single-step retrosynthesis prediction is a crucial component of retrosynthetic planning, and several deep learning-based methods for it have shown promising results. These methods fall broadly into three groups: template-based, template-free, and semi-template-based methods.

Template-based methods treat retrosynthesis prediction as a template-retrieval problem, matching the target molecule against precomputed templates. These templates capture the essential features of the reaction center in specific types of chemical reactions. They can be generated manually or automatically and guide the model in identifying the most suitable chemical transformation for a given molecule. Various works have proposed different ways to prioritize candidate templates. RetroSim ranked candidate templates by the molecular fingerprint similarity between the given product and the molecules in the reaction corpus. NeuralSym pioneered the use of deep neural networks for template selection by learning a multi-class classifier. GLN built a conditional graph logic network to learn the conditional joint probability of templates and reactants. LocalRetro evaluated the suitability of local atom/bond templates at every predicted reaction center of a target molecule and captured non-local effects in chemical reactions through global reactivity attention; it achieved state-of-the-art performance among template-based methods. Although template-based models offer interpretability and molecular validity, their limited generalization and scalability can hinder their practical utility.
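As a rough illustration of RetroSim-style prioritization, the Python sketch below ranks templates by the Tanimoto similarity between the target's fingerprint and the fingerprints of the corpus products each template was extracted from. Assumptions: fingerprints are modelled as sets of "on" bit indices rather than computed with a cheminformatics toolkit, the template names and bit sets are invented, and any further re-scoring steps of the published method are omitted.

```python
"""Rank candidate templates by product fingerprint similarity (RetroSim-style sketch).

Hypothetical simplification: a fingerprint is a frozenset of "on" bit indices.
"""
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity of two bit sets: |a & b| / |a | b|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_templates(
    target: Fingerprint,
    corpus: List[Tuple[Fingerprint, str]],
) -> List[Tuple[str, float]]:
    """Score each template by the best similarity between the target and any
    corpus product it was extracted from; return templates highest score first."""
    best: Dict[str, float] = {}
    for product_fp, template in corpus:
        score = tanimoto(target, product_fp)
        if score > best.get(template, -1.0):
            best[template] = score
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy corpus: (product fingerprint, template name) pairs.
    corpus = [
        (frozenset({1, 2, 3, 4}), "amide_coupling"),
        (frozenset({1, 2, 9}), "suzuki_coupling"),
        (frozenset({7, 8, 9}), "boc_deprotection"),
    ]
    for name, score in rank_templates(frozenset({1, 2, 3}), corpus):
        print(f"{name}: {score:.2f}")
```

Here the target shares three of four bits with the first corpus product, so `amide_coupling` ranks first (0.75), ahead of `suzuki_coupling` (0.50) and `boc_deprotection` (0.00). Taking the maximum over all corpus products per template is one reasonable aggregation choice, not necessarily RetroSim's.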

Template-free methods use deep generative models to generate reactant molecules without relying on predefined templates. Most existing methods reformulate the task as a sequence-to-sequence problem over a sequence representation of molecules, namely the simplified molecular-input line-entry system (SMILES). Liu et al. first used a long short-term memory (LSTM)-based sequence-to-sequence (Seq2Seq) model to translate the SMILES of a product into the SMILES of its reactants. Karpov et al. then proposed a Transformer-based Seq2Seq method for retrosynthesis. SCROP integrated a grammar corrector into the Transformer architecture to address the grammatically invalid outputs that are common in Seq2Seq methods. R-SMILES established a closely aligned one-to-one mapping between the SMILES of products and reactants to make synthesis prediction in Transformer-based methods more efficient. PMSR devised three pre-training tasks tailored to retrosynthesis (auto-regression, molecule recovery, and contrastive reaction classification), improving retrosynthesis performance and achieving state-of-the-art accuracy among template-free methods. Other studies cast the task as a graph-to-sequence problem, taking the molecular graph as input. Graph2SMILES combined a sequential graph encoder with a Transformer decoder to preserve the permutation invariance of SMILES. Retroformer introduced a local attention head in the Transformer encoder to strengthen its reasoning about reactions. Recent studies, including MEGAN, MARS, and Graph2Edits, have explored end-to-end molecular graph editing models that, inspired by the arrow-pushing formalism, represent a chemical reaction as a series of graph edits. However, these approaches usually require time-consuming sequential prediction of graph edit operations. Fang et al. developed a substructure-level decoding method by automatically extracting the commonly preserved portions of product molecules. However, this substructure extraction is fully data-driven, so its coverage depends on the reaction dataset, and incorrect substructures can lead to erroneous predictions. More generally, because template-free methods are fully data-driven, they raise concerns about the interpretability, chemical validity, and diversity of the generated molecules.
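To make the SMILES-as-sequence framing concrete, the Python sketch below splits SMILES strings into atom-level tokens, the usual input and output units of Seq2Seq models, and computes a token-level Levenshtein distance. The regex follows the pattern popularized by the Molecular Transformer and is an assumption here, not the tokenizer of any method named above. The distance is only an illustrative way to see why closely aligned product and reactant SMILES, the goal of R-SMILES, leave a model less to rewrite; it is not R-SMILES's own alignment procedure.

```python
"""Atom-level SMILES tokenization and token-level edit distance (illustrative)."""
import re
from typing import List, Sequence

# Assumed tokenizer pattern: bracket atoms, two-letter halogens, organic-subset
# atoms, bonds, branches, and ring-closure digits (including %nn).
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into tokens, rejecting characters the regex skips."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"untokenizable characters in {smiles!r}")
    return tokens


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance (insert/delete/substitute, unit cost), two-row DP."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,                          # delete x
                curr[j - 1] + 1,                      # insert y
                prev[j - 1] + (0 if x == y else 1),   # substitute (free if equal)
            )
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    product = tokenize("CC(=O)Nc1ccccc1")           # acetanilide
    aligned = tokenize("CC(=O)Cl.Nc1ccccc1")        # acetyl chloride + aniline
    shuffled = tokenize("c1ccc(N)cc1.ClC(C)=O")     # same reactants, other atom order
    print(edit_distance(product, aligned))          # 2: insert "Cl" and "."
    print(edit_distance(product, shuffled))
```

When the reactant SMILES mirror the product's atom order, acetyl chloride plus aniline differs from acetanilide by just two inserted tokens; writing the same reactants in a different order gives a noticeably larger distance even though the chemistry is identical.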

Semi-template-based methods combine the strengths of the two approaches above. They follow a two-stage procedure: first, the target molecule is fragmented into synthons by identifying reactive sites; then the synthons are converted into reactants using techniques such as leaving-group selection, graph generation, or SMILES generation. RetroXpert first identified the reaction center of the target molecule with an edge-enhanced graph attention network to obtain synthons, and then generated the corresponding reactants from the synthons. RetroPrime introduced mix-and-match and label-and-align strategies into a Transformer-based two-stage workflow to mitigate insufficient diversity and chemical implausibility. G2Gs first partitioned the target molecular graph into several synthons by identifying potential reaction centers, and then translated the synthons into complete reactant graphs with a variational graph translation framework. GraphRetro first transformed the target into synthons by
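The two-stage procedure can be reduced to plain graph bookkeeping, as in the hypothetical Python sketch below: stage 1 deletes a reaction-center bond and collects the remaining connected components as synthons, and stage 2 bonds a leaving group to a synthon's attachment site. Everything here is a simplifying assumption: the reaction center and leaving groups are supplied by hand rather than predicted by a learned model (as RetroXpert, G2Gs, or GraphRetro would do), leaving groups are single atoms, and bond orders, charges, and hydrogens are ignored.

```python
"""Two-stage semi-template retrosynthesis as graph bookkeeping (hypothetical sketch)."""
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Set

Bond = FrozenSet[int]  # unordered pair of atom indices; bond order ignored


@dataclass
class MolGraph:
    atoms: Dict[int, str]                       # atom index -> element symbol
    bonds: Set[Bond] = field(default_factory=set)


def split_into_synthons(mol: MolGraph, center: Bond) -> List[MolGraph]:
    """Stage 1: delete the reaction-center bond and return each connected
    component of the remaining graph as a synthon."""
    if center not in mol.bonds:
        raise ValueError("reaction center must be an existing bond")
    bonds = mol.bonds - {center}
    neighbours: Dict[int, Set[int]] = {a: set() for a in mol.atoms}
    for bond in bonds:
        u, v = tuple(bond)
        neighbours[u].add(v)
        neighbours[v].add(u)

    synthons: List[MolGraph] = []
    seen: Set[int] = set()
    for start in sorted(mol.atoms):
        if start in seen:
            continue
        component: Set[int] = set()
        stack = [start]
        while stack:  # iterative depth-first search over the component
            atom = stack.pop()
            if atom in component:
                continue
            component.add(atom)
            stack.extend(neighbours[atom] - component)
        seen |= component
        synthons.append(MolGraph(
            atoms={a: mol.atoms[a] for a in component},
            bonds={b for b in bonds if b <= component},
        ))
    return synthons


def attach_leaving_group(synthon: MolGraph, site: int,
                         group: Optional[str]) -> MolGraph:
    """Stage 2: complete a synthon into a reactant by bonding a one-atom leaving
    group to its attachment site (None means the site only regains hydrogen)."""
    if site not in synthon.atoms:
        raise ValueError("attachment site must belong to the synthon")
    if group is None:
        return MolGraph(dict(synthon.atoms), set(synthon.bonds))
    new_index = max(synthon.atoms) + 1
    atoms = {**synthon.atoms, new_index: group}
    bonds = synthon.bonds | {frozenset((site, new_index))}
    return MolGraph(atoms, bonds)


if __name__ == "__main__":
    # Acetanilide, CC(=O)Nc1ccccc1, as a bare graph (indices are arbitrary).
    atoms = {0: "C", 1: "C", 2: "O", 3: "N",
             4: "c", 5: "c", 6: "c", 7: "c", 8: "c", 9: "c"}
    pairs = [(0, 1), (1, 2), (1, 3), (3, 4), (4, 5),
             (5, 6), (6, 7), (7, 8), (8, 9), (9, 4)]
    product = MolGraph(atoms, {frozenset(p) for p in pairs})

    # Hand-picked reaction center: the amide C-N bond between atoms 1 and 3.
    acyl, amine = split_into_synthons(product, frozenset((1, 3)))
    acyl_chloride = attach_leaving_group(acyl, site=1, group="Cl")
    aniline = attach_leaving_group(amine, site=3, group=None)
    print(sorted(acyl_chloride.atoms.values()))  # ['C', 'C', 'Cl', 'O']
    print(sorted(aniline.atoms.values()))        # ['N', 'c', 'c', 'c', 'c', 'c', 'c']
```

Breaking a ring bond instead would leave the graph connected and yield a single synthon, which is one reason real models must learn which bonds form the reaction center.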

News source: www.nature.com