Patchscopes: Surgery on the Neurons of Large Language Models (LLMs)

2025/02/23 01:00

Large Language Models (LLMs) have revolutionized the field of artificial intelligence, demonstrating remarkable capabilities in natural language understanding and generation. These models, composed of layers of interconnected artificial neurons, communicate through vectors of numbers known as hidden representations. However, deciphering the meaning encoded within these hidden representations has been a significant challenge. The field of machine learning interpretability seeks to bridge this gap, and Google researchers have proposed "Patchscopes," a method for understanding what an LLM "thinks."

Patchscopes is a novel interpretability method that enables researchers to perform "surgery" on the neurons of an LLM. This involves cutting out and replacing hidden representations between different prompts and layers, allowing for a detailed inspection of the information contained within. The core concept is the "inspection prompt," which acts as a lens into the LLM's mind, facilitating the extraction of human-interpretable meaning. The framework leverages the inherent ability of LLMs to translate their own hidden representations into understandable text.

Understanding the Transformer Architecture: A Foundation for Patchscopes

Patchscopes builds upon a deep understanding of LLMs and the transformer architecture, which forms the backbone of many modern language models. Transformer models process text by first tokenizing the input, breaking it down into smaller units (words or sub-words). Each token is then embedded into a high-dimensional vector space, creating an initial hidden representation.
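
The tokenize-then-embed step can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the six-word vocabulary, the whitespace tokenizer, and the random embedding table are hypothetical stand-ins, whereas real models use learned sub-word vocabularies and trained embeddings.

```python
import random

# Hypothetical toy vocabulary; real models use learned sub-word vocabularies.
VOCAB = {"<unk>": 0, "the": 1, "largest": 2, "city": 3, "in": 4, "spain": 5}
EMBED_DIM = 4  # assumed tiny dimension, chosen only for readability


def tokenize(text):
    """Split on whitespace and map each word to a token id (toy tokenizer)."""
    return [VOCAB.get(word, VOCAB["<unk>"]) for word in text.lower().split()]


def build_embedding_table(vocab_size, dim, seed=0):
    """Create a fixed random table with one vector per token id."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]


def embed(token_ids, table):
    """Look up the initial hidden representation for each token."""
    return [table[t] for t in token_ids]


table = build_embedding_table(len(VOCAB), EMBED_DIM)
token_ids = tokenize("The largest city in Spain")
hidden = embed(token_ids, table)
print(token_ids)                    # [1, 2, 3, 4, 5]
print(len(hidden), len(hidden[0]))  # 5 4
```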

The transformer architecture consists of multiple layers of transformer blocks. Each layer refines the hidden representation based on the output of the preceding layer and the relationships between tokens in the input sequence. This process continues through the final layer, where the hidden representation is used to generate the output text. Decoder-only models, which are the focus of Patchscopes, only consider preceding tokens when generating the next token, making them particularly well-suited for language generation tasks.
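
The "only consider preceding tokens" constraint is usually enforced with a causal mask. The sketch below is a simplification under stated assumptions: a plain average over the visible positions stands in for attention, which in a real transformer uses learned, weighted mixing. The function names are illustrative.

```python
def causal_mask(seq_len):
    """mask[i][j] is True when position i may look at position j (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def causal_average(vectors):
    """Toy stand-in for attention: each position averages itself and earlier ones."""
    mask = causal_mask(len(vectors))
    out = []
    for row in mask:
        visible = [vectors[j] for j, allowed in enumerate(row) if allowed]
        out.append([sum(col) / len(visible) for col in zip(*visible)])
    return out


print(causal_mask(3)[1])                        # [True, True, False]
print(causal_average([[1.0], [3.0], [5.0]]))    # [[1.0], [2.0], [3.0]]
```

Because position 0 never sees later positions, its output is unchanged by anything that follows it, which is why a hidden representation at a given token position summarizes only that token and its left context.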

The Patchscopes framework operates on a simple yet powerful premise: LLMs possess the inherent ability to translate their own hidden representations into human-understandable text. By patching hidden representations between different locations during inference, researchers can inspect the information within a hidden representation, understand LLM behavior, and even augment the model's performance.

The process involves several key steps:

Source Prompt: A source prompt is fed into the LLM, generating hidden representations at each layer. This prompt serves as the context from which information will be extracted.

Inspection Prompt: An inspection prompt is designed to elicit a specific type of information from the LLM. This prompt typically includes a placeholder token where the hidden representation from the source prompt will be inserted.

Patching: The hidden representation from a specific layer and token position in the source prompt is "patched" into the placeholder token in the inspection prompt. This effectively replaces the LLM's internal representation with the extracted information.

Generation: The LLM continues decoding from the patched inspection prompt, generating text based on the combined information from the source and inspection prompts.

Analysis: The generated text is analyzed to understand the information encoded in the hidden representation. This can involve evaluating the accuracy of factual information, identifying the concepts captured by the representation, or assessing the model's reasoning process.
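
The steps above can be sketched end to end on a toy model. This is not the researchers' implementation: `ToyDecoder`, its averaging layers, and the `patchscope` helper are hypothetical stand-ins that only show where the hidden representation is read, where it is written, and how decoding continues from the patched state.

```python
import random


class ToyDecoder:
    """Hypothetical stand-in for a decoder-only LLM (not a real model)."""

    def __init__(self, vocab_size=8, dim=4, n_layers=3, seed=0):
        rng = random.Random(seed)
        self.n_layers = n_layers
        self.embed = [[rng.uniform(-1, 1) for _ in range(dim)]
                      for _ in range(vocab_size)]

    def _layer(self, states):
        # Causal mixing: position i averages itself with all earlier positions.
        out = []
        for i in range(len(states)):
            prefix = states[: i + 1]
            out.append([sum(col) / len(prefix) for col in zip(*prefix)])
        return out

    def forward(self, token_ids, patch=None):
        """Return hidden states per layer; index 0 is the embedding output.

        patch = (layer, position, vector) overwrites one hidden state at that
        layer before the later layers run.
        """
        states = [list(self.embed[t]) for t in token_ids]
        all_states = []
        for layer in range(self.n_layers + 1):
            if layer > 0:
                states = self._layer(states)
            if patch is not None and patch[0] == layer:
                states = [list(s) for s in states]
                states[patch[1]] = list(patch[2])
            all_states.append(states)
        return all_states

    def next_token(self, final_states):
        """Greedy decode: token whose embedding best matches the last position."""
        last = final_states[-1]
        scores = [sum(a * b for a, b in zip(last, e)) for e in self.embed]
        return max(range(len(scores)), key=scores.__getitem__)


def patchscope(model, source_ids, src_layer, src_pos,
               inspect_ids, tgt_layer, tgt_pos):
    source_states = model.forward(source_ids)      # run the source prompt
    vector = source_states[src_layer][src_pos]     # hidden representation to inspect
    patched = model.forward(inspect_ids,           # run the inspection prompt with
                            patch=(tgt_layer, tgt_pos, vector))  # the vector patched in
    return model.next_token(patched[-1]), patched  # continue decoding


model = ToyDecoder()
token, patched = patchscope(model, [1, 2, 3], 2, 2, [4, 5, 0], 1, 2)
print(token)
```

The source and target layers are separate arguments because the method allows patching between different layers as well as different prompts. With a real model, the analysis step would read the generated text; here the output is just a token id from a random toy vocabulary and carries no meaning.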

Case Study 1: Entity Resolution

The first case study explores how LLMs resolve entities (people, places, movies, etc.) across different layers of the model. The goal is to understand at what point the model associates a token with its correct meaning. For example, how does the model determine that "Diana" refers to "Princess Diana" rather than the generic name?

To investigate this, a source prompt containing the entity name is fed into the LLM. The hidden representation of the entity token is extracted at each layer and patched into an inspection prompt designed to elicit a description of the entity. By analyzing the generated descriptions, researchers can determine when the model has successfully resolved the entity.
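
The per-layer procedure amounts to a sweep over layers. In the sketch below, `extract` and `describe` are assumed callables standing in for "read the entity token's hidden representation at this layer" and "patch it into the inspection prompt and decode." The two `fake_*` stubs are made up so the example runs and say nothing about how a real model behaves.

```python
def sweep_layers(extract, describe, n_layers):
    """Patch the entity vector from every layer and collect the descriptions."""
    return {layer: describe(extract(layer)) for layer in range(n_layers)}


def first_resolved_layer(descriptions, expected):
    """Earliest layer whose description mentions the expected entity, else None."""
    for layer in sorted(descriptions):
        if expected.lower() in descriptions[layer].lower():
            return layer
    return None


# Made-up stubs so the sketch runs; they do not reflect any real model's behavior.
def fake_extract(layer):
    return [float(layer)]


def fake_describe(vector):
    return "Princess Diana" if vector[0] >= 2 else "a given name"


descriptions = sweep_layers(fake_extract, fake_describe, n_layers=4)
print(first_resolved_layer(descriptions, "Princess Diana"))  # 2
```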

The results of this case study suggest that entity resolution typically occurs in the early layers of the model (before layer 20). This aligns with theories about layer function, which posit that early layers are responsible for establishing context from the prompt. The study also reveals that tokenization (how the input text is broken down into tokens) has a significant impact on how the model navigates its embedding space.

Case Study 2: Attribute Extraction

The second case study focuses on evaluating how accurately the model's hidden representation captures well-known concepts and their attributes. For example, can the model identify that the largest city in Spain is Madrid?

To extract an attribute, a source prompt containing the subject (e.g., "Spain") is fed into the LLM. The hidden representation of the subject token is extracted and patched into an inspection prompt designed to elicit the specific attribute (e.g., "The largest city is x"). By analyzing the generated text, researchers can determine whether the model correctly identifies the attribute.
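
Two small pieces of this setup can be sketched: locating the placeholder in the inspection prompt, and scoring whether the generated text names the expected attribute. The whitespace tokenization, the substring-match scoring rule, and the sample generations are all assumptions for illustration; the article does not specify how correctness is judged.

```python
def placeholder_position(prompt, placeholder="x"):
    """Index of the last placeholder token, where the source vector is patched in."""
    tokens = prompt.split()
    return len(tokens) - 1 - tokens[::-1].index(placeholder)


def attribute_accuracy(generations, gold):
    """Fraction of subjects whose generated text contains the gold attribute."""
    hits = sum(gold[subject].lower() in text.lower()
               for subject, text in generations.items())
    return hits / len(generations)


print(placeholder_position("The largest city is x"))  # 4

# Hypothetical generated text, not real model output.
generations = {"Spain": "Madrid, the capital"}
gold = {"Spain": "Madrid"}
print(attribute_accuracy(generations, gold))  # 1.0
```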

This case study compares Patchscopes to a technique called "probing," which involves training a classifier to predict an attribute from a hidden representation. Unlike probing, Patchscopes does not

Original source: substack
