Patchscopes: Performing Surgery on the Neurons of Large Language Models (LLMs)

2025/02/23 01:00

Large Language Models (LLMs) have revolutionized the field of artificial intelligence, demonstrating remarkable capabilities in natural language understanding and generation. These models, composed of layers of interconnected artificial neurons, communicate through vectors of numbers known as hidden representations. However, deciphering the meaning encoded within these hidden representations has been a significant challenge. The field of machine learning interpretability seeks to bridge this gap, and Google researchers have proposed "Patchscopes," a method for understanding what an LLM is "thinking."

Patchscopes is a novel interpretability method that enables researchers to perform "surgery" on the neurons of an LLM. This involves cutting out and replacing hidden representations between different prompts and layers, allowing for a detailed inspection of the information contained within. The core concept is the "inspection prompt," which acts as a lens into the LLM's mind, facilitating the extraction of human-interpretable meaning. The framework leverages the inherent ability of LLMs to translate their own hidden representations into understandable text.

Understanding the Transformer Architecture: A Foundation for Patchscopes

Patchscopes builds upon a deep understanding of LLMs and the transformer architecture, which forms the backbone of many modern language models. Transformer models process text by first tokenizing the input, breaking it down into smaller units (words or sub-words). Each token is then embedded into a high-dimensional vector space, creating an initial hidden representation.
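
As a toy illustration (with a made-up five-word vocabulary and random embedding values standing in for a real tokenizer and learned embedding matrix), the tokenize-then-embed step looks like:

```python
# Toy illustration only: a hypothetical vocabulary and random embedding
# values stand in for a real tokenizer and learned embedding matrix.
import random

VOCAB = {"the": 0, "largest": 1, "city": 2, "in": 3, "spain": 4}
DIM = 4  # embedding dimension

random.seed(0)
EMBED = {i: [random.uniform(-1, 1) for _ in range(DIM)] for i in VOCAB.values()}

def tokenize(text):
    """Break the input into known units and map each to an integer id."""
    return [VOCAB[w] for w in text.lower().split()]

def embed(token_ids):
    """Look up the initial hidden representation for each token."""
    return [EMBED[t] for t in token_ids]

hidden = embed(tokenize("the largest city in Spain"))
print(len(hidden), len(hidden[0]))  # 5 tokens, each a 4-dimensional vector
```

In a real model the embedding table is learned and the dimension is in the thousands, but the shape of the data, one vector per token, is the same.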

The transformer architecture consists of multiple layers of transformer blocks. Each layer refines the hidden representation based on the output of the preceding layer and the relationships between tokens in the input sequence. This process continues through the final layer, where the hidden representation is used to generate the output text. Decoder-only models, which are the focus of Patchscopes, only consider preceding tokens when generating the next token, making them particularly well-suited for language generation tasks.
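
The decoder-only constraint can be pictured as a causal mask: position i may attend only to positions up to and including i. A minimal sketch:

```python
# Decoder-only models mask attention so position i sees only positions <= i.
# Toy illustration of that causal mask (1 = attention allowed, 0 = masked).

def causal_mask(n):
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

for row in causal_mask(4):
    print(row)
# Lower-triangular: each token attends only to itself and earlier tokens.
```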

The Patchscopes framework operates on a simple yet powerful premise: LLMs possess the inherent ability to translate their own hidden representations into human-understandable text. By patching hidden representations between different locations during inference, researchers can inspect the information within a hidden representation, understand LLM behavior, and even augment the model's performance.

The process involves several key steps:

Source Prompt: A source prompt is fed into the LLM, generating hidden representations at each layer. This prompt serves as the context from which information will be extracted.

Inspection Prompt: An inspection prompt is designed to elicit a specific type of information from the LLM. This prompt typically includes a placeholder token where the hidden representation from the source prompt will be inserted.

Patching: The hidden representation from a specific layer and token position in the source prompt is "patched" into the placeholder token in the inspection prompt. This effectively replaces the LLM's internal representation with the extracted information.

Generation: The LLM continues decoding from the patched inspection prompt, generating text based on the combined information from the source and inspection prompts.

Analysis: The generated text is analyzed to understand the information encoded in the hidden representation. This can involve evaluating the accuracy of factual information, identifying the concepts captured by the representation, or assessing the model's reasoning process.

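The five steps above can be sketched end to end on a toy stand-in for an LLM; the model, values, and layer rule below are all illustrative, not the actual Patchscopes implementation.

```python
# A self-contained toy sketch of the five steps. The "model" here is not a
# real LLM: each layer just mixes a position with its left context, a crude
# stand-in for causal self-attention in a decoder-only transformer.

N_LAYERS = 3

def layer(hiddens, depth):
    """Toy layer: position i mixes in the sum of positions 0..i."""
    out = []
    for i, h in enumerate(hiddens):
        ctx = sum(sum(v) for v in hiddens[: i + 1])
        out.append([x + 0.1 * (depth + 1) * ctx for x in h])
    return out

def forward(hiddens, patch=None):
    """Run all layers; optionally patch (layer, position, vector) mid-run."""
    for d in range(N_LAYERS):
        if patch is not None and patch[0] == d:
            _, pos, vec = patch
            hiddens = hiddens[:pos] + [vec] + hiddens[pos + 1:]
        hiddens = layer(hiddens, d)
    return hiddens

# 1. Source prompt: run it and cache the input to every layer.
source = [[1.0, 0.0], [0.0, 1.0]]          # two "tokens"
cache, h = [], source
for d in range(N_LAYERS):
    cache.append(h)
    h = layer(h, d)

# 2.-4. Inspection prompt with a placeholder at position 1: patch in the
# source representation of token 0 taken at layer 1, then keep decoding.
inspection = [[0.5, 0.5], [0.0, 0.0]]      # position 1 is the placeholder
patched = forward(inspection, patch=(1, 1, cache[1][0]))

# 5. Analysis: compare against the unpatched run.
baseline = forward(inspection)
print(patched[1] != baseline[1])  # True: the patched information propagated
print(patched[0] == baseline[0])  # True: positions left of the patch are untouched
```

Note that position 0 comes out identical in both runs: because information only flows left to right in this toy (as in a decoder-only model), a patch at position 1 cannot affect earlier positions.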
Case Study 1: Entity Resolution

The first case study explores how LLMs resolve entities (people, places, movies, etc.) across different layers of the model. The goal is to understand at what point the model associates a token with its correct meaning. For example, how does the model determine that "Diana" refers to "Princess Diana" rather than the generic name?

To investigate this, a source prompt containing the entity name is fed into the LLM. The hidden representation of the entity token is extracted at each layer and patched into an inspection prompt designed to elicit a description of the entity. By analyzing the generated descriptions, researchers can determine when the model has successfully resolved the entity.

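The layer sweep itself reduces to a simple loop. In the sketch below, `run_patchscope` is a stub standing in for a real patched forward pass, and the layer at which "Diana" resolves is invented for illustration:

```python
# Sketch of the entity-resolution layer sweep: patch the entity token's
# representation from each layer into the inspection prompt, and record the
# first layer whose generated description names the entity correctly.
# `run_patchscope` is a stub; the layer-12 switch is invented.

N_LAYERS = 24

def run_patchscope(layer):
    """Stub: pretend "Diana" only resolves to the person after layer 12."""
    return "Diana, Princess of Wales" if layer >= 12 else "a common given name"

def resolution_layer(target="Princess"):
    for layer in range(N_LAYERS):
        if target in run_patchscope(layer):
            return layer
    return None

print(resolution_layer())  # 12: resolution in the early layers, before layer 20
```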
The results of this case study suggest that entity resolution typically occurs in the early layers of the model (before layer 20). This aligns with theories about layer function, which posit that early layers are responsible for establishing context from the prompt. The study also reveals that tokenization (how the input text is broken down into tokens) has a significant impact on how the model navigates its embedding space.

Case Study 2: Attribute Extraction

The second case study focuses on evaluating how accurately the model's hidden representation captures well-known concepts and their attributes. For example, can the model identify that the largest city in Spain is Madrid?

To extract an attribute, a source prompt containing the subject (e.g., "Spain") is fed into the LLM. The hidden representation of the subject token is extracted and patched into an inspection prompt designed to elicit the specific attribute (e.g., "The largest city is x"). By analyzing the generated text, researchers can determine whether the model correctly identifies the attribute.
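
Concretely, the inspection prompt carries an explicit placeholder token whose position is located before patching, so the subject's hidden representation can be written into that slot; the prompt string and placeholder name below simply follow the example in the text:

```python
# The inspection prompt contains a placeholder token ("x" here, as in the
# example above); its position is where the subject's representation lands.
source_prompt = "Spain"
inspection_prompt = "The largest city is x"
placeholder_pos = inspection_prompt.split().index("x")
print(placeholder_pos)  # 4: the token slot that will receive the patch
```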

This case study compares Patchscopes to a technique called "probing," which involves training a classifier to predict an attribute from a hidden representation. Unlike probing, Patchscopes does not require training a separate classifier for each attribute of interest.
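
For contrast, a probe in this sense is just a small supervised classifier trained on (hidden representation, label) pairs, one per attribute. The sketch below fits a tiny perceptron on synthetic vectors to make that training requirement concrete; the data and dimensions are invented:

```python
# Toy probing sketch: a probe is a classifier trained to read an attribute
# off hidden representations. All vectors here are synthetic stand-ins.

def train_probe(examples, epochs=20, lr=0.1):
    """Fit a tiny perceptron: does a hidden vector encode the attribute?"""
    dim = len(examples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in examples:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = y - pred
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def probe_predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Synthetic "hidden representations": attribute present iff first coord high.
data = [([1.0, 0.2], 1), ([0.9, 0.1], 1), ([0.1, 0.8], 0), ([0.0, 0.9], 0)]
w, b = train_probe(data)
print(probe_predict(w, b, [0.95, 0.15]))  # 1: attribute detected
```

The key contrast is that this fitting step, with its labeled data, must be repeated for every attribute and layer a probe is meant to read, whereas a patchscope reuses the LLM's own generation ability zero-shot.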
