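Redefining Generative AI: Embracing Structure for Enhanced Output Precision

2024/04/19 08:06

Structured generative AI lets a generative model produce output in a specified format. The approach restricts token selection to valid options, which prevents syntax errors and yields executable queries and parsable data structures. In addition, consistent tokenization of punctuation and keywords simplifies the patterns the model has to learn, which shortens training time and improves accuracy. By exploiting knowledge of the output structure, structured generative AI provides a powerful tool for converting natural language into a variety of structured formats.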

Introduction

Generative AI has made significant strides in producing coherent, grammatical text. When the output must be structured, however, as with SQL queries or JSON data, models often make syntax errors that prevent the generated code from being executed or parsed.

Enter Structured Generative AI

To overcome this limitation, we introduce "structured generative AI," a technique that constrains generation to a predefined format. Using knowledge of the output language's structure, it ensures that only legitimate tokens are considered at each generation step, which rules out syntax errors and keeps the output valid.

Mechanism of Token Generation

Generative AI models, such as those built on transformer architectures, generate tokens sequentially: each new token is chosen based on the input and the tokens generated so far. At each step, a classifier assigns a score (a logit) to every token in the vocabulary, and these scores are converted into probabilities that guide the selection of the next token.
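
A single decoding step can be sketched in a few lines. This is a minimal illustration, not the model's actual code: the five-token vocabulary and the scores are made up, and greedy selection (always taking the most probable token) is only one of several possible selection strategies.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical vocabulary and made-up scores for one decoding step.
vocab = ["SELECT", "name", ",", "FROM", "customers"]
logits = [0.2, 1.5, 2.0, 1.0, -0.5]

probs = softmax(logits)
next_token = vocab[probs.index(max(probs))]  # greedy choice: highest probability
print(next_token)
```

In a real model the vocabulary holds tens of thousands of entries and the scores come from the network's final layer, but the selection step has the same shape.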

Constraining Token Generation

Structured generative AI incorporates knowledge of the output language's structure to limit token generation. Illegitimate tokens, such as incorrect punctuation or invalid keywords, have their scores set to negative infinity, which gives them a probability of zero and excludes them from consideration. For instance, if a valid SQL query requires a comma after "SELECT name", the scores of all other tokens are set to negative infinity, ensuring that only a comma can be selected.
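
The effect of the mask on the probabilities can be checked directly. The sketch below uses a made-up vocabulary and made-up scores; `mask_logits` is a hypothetical helper name, not a library function.

```python
import math

def mask_logits(logits, allowed_ids):
    """Return a copy of logits with every disallowed position set to -inf."""
    return [x if i in allowed_ids else -math.inf for i, x in enumerate(logits)]

def softmax(logits):
    """Softmax that tolerates -inf entries (they become probability 0)."""
    peak = max(x for x in logits if x != -math.inf)
    exps = [math.exp(x - peak) for x in logits]  # math.exp(-inf) is 0.0
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["SELECT", "name", ",", "FROM", "customers"]
logits = [0.2, 3.0, 1.0, 0.5, -0.5]  # unconstrained, "name" would win
allowed = {vocab.index(",")}         # the grammar permits only a comma here

probs = softmax(mask_logits(logits, allowed))
print(dict(zip(vocab, probs)))
```

Because the exponential of negative infinity is exactly zero, the masked tokens cannot be chosen by greedy selection or by sampling, whatever the original scores were.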

Implementation with Hugging Face

Hugging Face, a leading provider of pretrained models and tools for natural language processing, offers a convenient way to implement structured generative AI through its "logits processor" feature. This feature allows users to define a custom function that modifies the token scores after they have been calculated but before the final selection is made.
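
Where this hook sits in the decoding step can be shown without the library. The sketch below is a framework-free stand-in that works on plain Python lists: in the transformers library a custom processor subclasses `LogitsProcessor`, implements `__call__(input_ids, scores)` on tensors, and is passed to `generate` inside a `LogitsProcessorList`. The names `BanTokens`, `apply_processors` and `pick_next` are hypothetical.

```python
import math

class BanTokens:
    """Processor-shaped callable: receives the ids generated so far and the
    scores for the next token, and returns modified scores."""

    def __init__(self, banned_ids):
        self.banned_ids = set(banned_ids)

    def __call__(self, input_ids, scores):
        return [-math.inf if i in self.banned_ids else s
                for i, s in enumerate(scores)]

def apply_processors(processors, input_ids, scores):
    """Run each processor in order, feeding the output of one into the next."""
    for processor in processors:
        scores = processor(input_ids, scores)
    return scores

def pick_next(scores):
    """Greedy selection, performed only after all processors have run."""
    return max(range(len(scores)), key=lambda i: scores[i])

scores = [0.1, 2.5, 1.7, 0.3]  # made-up scores for a 4-token vocabulary
processors = [BanTokens([1]), BanTokens([3])]
adjusted = apply_processors(processors, input_ids=[0], scores=scores)
print(pick_next(scores), pick_next(adjusted))
```

The model itself is untouched; only the scores it emits are edited before a token is chosen, which is why the technique works with an already trained model.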

Example: SQL Query Generation

To demonstrate the power of structured generative AI, let's consider the task of generating SQL queries from natural language. We initialize a pretrained BART model and define a set of rules that specify which tokens are allowed to follow each other in a valid SQL query.

rules = {'': ['SELECT', 'DELETE'],  # beginning of the generation
         'SELECT': ['name', 'email', 'id'],  # names of columns in our schema
         'DELETE': ['FROM'],  # DELETE takes no column list in SQL
         'name': [',', 'FROM'],
         'email': [',', 'FROM'],
         'id': [',', 'FROM'],
         ',': ['name', 'email', 'id'],
         'FROM': ['customers', 'vendors'],  # names of tables in our schema
         'customers': [''],
         'vendors': ['']}  # end of the generation

Using these rules, we create a logits processor that converts the rules into token IDs and modifies the token probabilities accordingly.
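
The article does not show the processor itself, so the following is only a sketch of the idea under simplifying assumptions. It uses a trimmed rule table in the same shape as the one above, a hypothetical toy vocabulary in which every word is exactly one token, and plain lists instead of tensors. `RuleProcessor` is a hypothetical name. A real tokenizer may split a word into several sub-tokens, which this sketch ignores.

```python
import math

# Trimmed rule table in the same shape as the article's rules.
rules = {'': ['SELECT'],
         'SELECT': ['name', 'email'],
         'name': [',', 'FROM'],
         'email': [',', 'FROM'],
         ',': ['name', 'email'],
         'FROM': ['customers'],
         'customers': ['']}

# Hypothetical toy vocabulary; '' doubles as the start and end marker.
vocab = ['', 'SELECT', 'name', 'email', ',', 'FROM', 'customers', 'banana']
token_to_id = {tok: i for i, tok in enumerate(vocab)}

class RuleProcessor:
    def __init__(self, rules, token_to_id):
        # Convert the string rules into {previous_id: set of allowed next ids}.
        self.allowed = {token_to_id[prev]: {token_to_id[t] for t in nxt}
                        for prev, nxt in rules.items()}

    def __call__(self, input_ids, scores):
        allowed = self.allowed[input_ids[-1]]  # rule keyed on the last token
        return [s if i in allowed else -math.inf for i, s in enumerate(scores)]

processor = RuleProcessor(rules, token_to_id)
scores = [0.0, 0.5, 0.1, 0.2, 0.3, 0.4, 0.6, 9.9]  # 'banana' scores highest
after_name = processor([token_to_id['SELECT'], token_to_id['name']], scores)
survivors = [vocab[i] for i, s in enumerate(after_name) if s != -math.inf]
print(survivors)
```

Converting the rules to IDs once, in the constructor, keeps the per-step work down to a dictionary lookup and a mask.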

Results: Enhanced SQL Query Generation

Running the BART model with the logits processor constrains every generated query to the predefined rules. The output is therefore syntactically correct and can be executed without syntax errors.

import torch
from transformers import LogitsProcessorList

# tokenizer, pretrained_model and SQLLogitsProcessor are assumed to be defined
# earlier: the BART tokenizer and model, and the rule-based processor described above.
to_translate = 'customers emails from the us'
words = to_translate.split()
tokenized_text = tokenizer([words], is_split_into_words=True, return_offsets_mapping=True)
logits_processor = LogitsProcessorList([SQLLogitsProcessor(tokenizer)])
out = pretrained_model.generate(
    torch.tensor(tokenized_text["input_ids"]),
    max_new_tokens=20,
    logits_processor=logits_processor)
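
The call above needs a trained model, so the guarantee it relies on is easier to see in a toy setting. The sketch below replaces the model with random scores (`fake_model_scores` is a hypothetical stand-in) and shows that a constrained greedy loop can only emit token sequences the rule table permits. The rule table is a SELECT-only variant of the article's rules.

```python
import math
import random

rules = {'': ['SELECT'],
         'SELECT': ['name', 'email', 'id'],
         'name': [',', 'FROM'],
         'email': [',', 'FROM'],
         'id': [',', 'FROM'],
         ',': ['name', 'email', 'id'],
         'FROM': ['customers', 'vendors'],
         'customers': [''],
         'vendors': ['']}
vocab = sorted(rules)  # every token in this table also appears as a key

def fake_model_scores(rng):
    """Stand-in for a language model: one random score per vocabulary entry."""
    return [rng.uniform(-1.0, 1.0) for _ in vocab]

def constrained_decode(rng, max_new_tokens=20):
    tokens = ['']  # '' marks the start of generation
    for _ in range(max_new_tokens):
        scores = fake_model_scores(rng)
        allowed = rules[tokens[-1]]
        masked = [s if tok in allowed else -math.inf
                  for tok, s in zip(vocab, scores)]
        best = vocab[masked.index(max(masked))]  # greedy pick among survivors
        if best == '':  # '' after a table name ends the query
            break
        tokens.append(best)
    return tokens[1:]

def follows_rules(tokens):
    """Check that every adjacent pair of tokens is permitted by the rule table."""
    path = [''] + tokens
    return all(b in rules[a] for a, b in zip(path, path[1:]))

query = constrained_decode(random.Random(0))
print(' '.join(query))
```

The constraint guarantees that each transition is legal, not that the query is the one the user asked for, and a query cut off by `max_new_tokens` may still be incomplete. Choosing the right columns and table remains the model's job.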

The Significance of Tokenization

Tokenization, the process of converting text into a sequence of tokens, plays a crucial role in structured generative AI. Consistent tokenization ensures that similar concepts and punctuation are represented by the same token, simplifying the model's learning process. For instance, adding spaces before words and punctuation enhances consistency and reduces the complexity of patterns that the model needs to learn.
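
One way to apply this to training targets is to normalize spacing before tokenizing. The sketch below is an assumption about how such preprocessing might look, not the author's code. `normalize_sql` is a hypothetical helper that handles commas only. The motivation is that byte-level BPE tokenizers such as the one used by BART typically encode a word differently depending on whether a space precedes it.

```python
import re

def normalize_sql(query):
    """Surround each comma with single spaces, collapse repeated whitespace,
    and add a leading space so the first word looks like every other word."""
    spaced = re.sub(r"\s*,\s*", " , ", query)
    collapsed = re.sub(r"\s+", " ", spaced).strip()
    return " " + collapsed

variants = ["SELECT name,email FROM customers",
            "SELECT name , email  FROM customers",
            "SELECT name, email FROM customers"]
normalized = {normalize_sql(v) for v in variants}
print(normalized)
```

All three spellings collapse to one string, so the model sees a single pattern for the comma and for each keyword instead of several.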

Applications of Structured Generative AI

The applications of structured generative AI extend far beyond SQL query generation. It empowers various tasks, including:

JSON Data Extraction: Generating structured JSON data from natural language, enabling seamless data parsing and storage.
Query Generation: Creating executable queries for various database systems, facilitating efficient information retrieval.
Code Generation: Producing valid code snippets in different programming languages, accelerating software development.
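
The same rule-table idea carries over to JSON in a limited way. The sketch below assumes a flat object with two fixed keys and a handful of made-up values; `json_rules` and `expand` are hypothetical names. It enumerates every token sequence the table permits and confirms that each one parses. Nested or variable-length JSON would need a grammar that tracks nesting depth, which pairwise rules cannot express.

```python
import json

# Hypothetical rule table for a flat object with fixed keys and sample values.
json_rules = {'': ['{'],
              '{': ['"name":'],
              '"name":': ['"Ada"', '"Bob"'],
              '"Ada"': [','],
              '"Bob"': [','],
              ',': ['"age":'],
              '"age":': ['36', '41'],
              '36': ['}'],
              '41': ['}'],
              '}': ['']}

def expand(prefix, last):
    """Yield every token sequence the rule table permits from `last` onward."""
    for nxt in json_rules[last]:
        if nxt == '':  # '' marks the end of generation
            yield prefix
        else:
            yield from expand(prefix + [nxt], nxt)

documents = [''.join(tokens) for tokens in expand([], '')]
parsed = [json.loads(doc) for doc in documents]
print(len(parsed))
```

Every reachable output is valid JSON with the expected keys, which is the property a downstream parser needs.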

Conclusion

Structured generative AI is a technique that improves the precision and applicability of generative AI models. By incorporating knowledge of the output language's structure, it eliminates syntax errors in the generated output, so that queries can be executed and data structures parsed. This enables a wide range of applications, allowing users to extract information, generate queries, and produce code more efficiently and accurately.
