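Structured generative AI lets generative models produce output in a specific format. By restricting token selection to valid options, the approach prevents syntax errors and yields queries that execute and data structures that parse. Consistent tokenization of punctuation and keywords also simplifies the patterns the model must learn, which reduces training time and improves accuracy. By exploiting knowledge of the output's structure, structured generative AI offers a powerful way to translate natural language into a variety of structured formats.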
Redefining Generative AI: Embracing Structure for Enhanced Output Precision
Introduction
Generative AI, a transformative technology revolutionizing natural language processing, has made significant strides in generating coherent and grammatically sound text. However, when it comes to producing structured output, such as SQL queries or JSON data, generative AI often falters, succumbing to errors that hinder the execution or parsing of the generated code.
Enter Structured Generative AI
To overcome this limitation, we introduce the concept of "structured generative AI," a technique that constrains generation to a predefined format. By using knowledge of the output language's structure, it ensures that only legitimate tokens are considered at each generation step, which rules out syntax errors and keeps the output valid.
Mechanism of Token Generation
Generative AI models, such as transformer architectures, generate tokens sequentially, relying on the input and previously generated tokens to determine the next selection. At each step, a classifier assigns probability values to all tokens in the vocabulary, guiding the selection of the next token.
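The selection step described above can be sketched in a few lines of plain Python. This is only an illustration: the five-token vocabulary and the scores are made up, and a real model scores tens of thousands of tokens and may sample instead of always taking the most probable one.

```python
import math

def softmax(logits):
    """Turn raw scores (logits) into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pick_next_token(vocabulary, logits):
    """Greedy selection: return the most probable token and its probability."""
    probs = softmax(logits)
    best = max(range(len(vocabulary)), key=probs.__getitem__)
    return vocabulary[best], probs[best]

# Hypothetical vocabulary and scores for a single generation step.
vocabulary = ['SELECT', 'name', ',', 'FROM', 'customers']
logits = [0.1, 2.0, 1.2, 0.3, -1.0]

token, prob = pick_next_token(vocabulary, logits)
print(token, round(prob, 3))
```

Nothing in this loop knows about SQL, which is why an unconstrained model can pick a token that breaks the syntax.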
Constraining Token Generation
Structured generative AI uses knowledge of the output language's structure to limit token generation. Illegitimate tokens, such as incorrect punctuation or invalid keywords, have their scores (logits) set to negative infinity, which drives their probability to zero and excludes them from consideration. For instance, if the only valid continuation after "SELECT name" is a comma, the scores of all other tokens are set to negative infinity, ensuring that only a comma can be selected.
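A minimal sketch of the masking step, in plain Python with invented scores: every token outside the allowed set gets a score of negative infinity, so its probability after the softmax is exactly zero. The vocabulary, the scores, and the helper names `mask_logits` and `softmax` are assumptions made for illustration.

```python
import math

def mask_logits(logits, allowed_indices):
    """Keep the scores of allowed tokens; set every other score to -inf."""
    allowed = set(allowed_indices)
    return [x if i in allowed else float('-inf') for i, x in enumerate(logits)]

def softmax(logits):
    """Convert scores to probabilities; exp(-inf) evaluates to 0.0."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical step: the model has produced "SELECT name" and, left alone,
# would prefer to repeat 'name'.
vocabulary = ['name', ',', 'FROM', 'email', 'customers']
logits = [2.5, 0.4, 1.1, 1.9, 0.2]

# Suppose the grammar allows only ',' or 'FROM' at this position.
allowed = [vocabulary.index(','), vocabulary.index('FROM')]
probs = softmax(mask_logits(logits, allowed))
best = vocabulary[max(range(len(probs)), key=probs.__getitem__)]
print(best, [round(p, 3) for p in probs])
```

The model's preferences among the allowed tokens are preserved; only the invalid options are removed.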
Implementation with Hugging Face
Hugging Face, a leading provider of pretrained models and tools for natural language processing, offers a convenient way to implement structured generative AI through its "logits processor" feature. This feature lets users define a custom function that modifies the token scores (logits) after the model has computed them but before the next token is selected.
Example: SQL Query Generation
To demonstrate the power of structured generative AI, let's consider the task of generating SQL queries from natural language. We initialize a pretrained BART model and define a set of rules that specify which tokens are allowed to follow each other in a valid SQL query.
rules = {'<s>': ['SELECT', 'DELETE'],  # beginning of the generation
         'SELECT': ['name', 'email', 'id'],  # names of columns in our schema
         'DELETE': ['name', 'email', 'id'],
         'name': [',', 'FROM'],
         'email': [',', 'FROM'],
         'id': [',', 'FROM'],
         ',': ['name', 'email', 'id'],
         'FROM': ['customers', 'vendors'],  # names of tables in our schema
         'customers': ['</s>'],
         'vendors': ['</s>']}  # end of the generation
Using these rules, we create a logits processor that converts the rules into token IDs and modifies the token probabilities accordingly.
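The article does not show the processor itself, so the following is a hedged sketch of what such a class could look like, not the author's implementation. It subclasses `LogitsProcessor` from `transformers` and assumes that every entry in the rules maps to exactly one vocabulary token, which a real subword tokenizer does not guarantee. The class name `RuleLogitsProcessor`, the `token_to_id` callable, the toy vocabulary, and the `<s>`/`</s>` start and end markers are all assumptions for illustration.

```python
import torch
from transformers import LogitsProcessor

# Grammar from the article; '<s>' and '</s>' are assumed start/end markers.
RULES = {
    '<s>': ['SELECT', 'DELETE'],
    'SELECT': ['name', 'email', 'id'],
    'DELETE': ['name', 'email', 'id'],
    'name': [',', 'FROM'],
    'email': [',', 'FROM'],
    'id': [',', 'FROM'],
    ',': ['name', 'email', 'id'],
    'FROM': ['customers', 'vendors'],
    'customers': ['</s>'],
    'vendors': ['</s>'],
}

class RuleLogitsProcessor(LogitsProcessor):
    """Hypothetical sketch: allow only tokens the rules permit after the last one."""

    def __init__(self, rules, token_to_id):
        # Convert the string rules into token-ID rules once, up front.
        self.allowed = {
            token_to_id(previous): [token_to_id(t) for t in followers]
            for previous, followers in rules.items()
        }

    def __call__(self, input_ids, scores):
        # Start with everything forbidden, then restore the allowed scores.
        masked = torch.full_like(scores, float('-inf'))
        for row, sequence in enumerate(input_ids):
            allowed_ids = self.allowed.get(int(sequence[-1]))
            if allowed_ids is None:
                masked[row] = scores[row]  # no rule for this token: unconstrained
            else:
                masked[row, allowed_ids] = scores[row, allowed_ids]
        return masked

# Toy vocabulary standing in for a real tokenizer (an assumption).
TOY_VOCAB = {token: index for index, token in enumerate(
    ['<s>', '</s>', 'SELECT', 'DELETE', 'name', 'email', 'id', ',', 'FROM',
     'customers', 'vendors'])}

processor = RuleLogitsProcessor(RULES, TOY_VOCAB.__getitem__)
generated_so_far = torch.tensor(
    [[TOY_VOCAB['<s>'], TOY_VOCAB['SELECT'], TOY_VOCAB['name']]])
raw_scores = torch.zeros(1, len(TOY_VOCAB))  # pretend the model is indifferent
constrained = processor(generated_so_far, raw_scores)
surviving = [t for t, i in TOY_VOCAB.items() if constrained[0, i] > float('-inf')]
print(surviving)
```

With a real tokenizer, `token_to_id` could be something like `tokenizer.convert_tokens_to_ids`, provided the rule entries are written exactly as they appear in the vocabulary.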
Results: Enhanced SQL Query Generation
Running the BART model with the logits processor yields significant improvements in the quality of generated SQL queries. The model now adheres to the predefined rules, producing syntactically correct queries that can be executed without errors.
import torch
from transformers import LogitsProcessorList

# tokenizer, pretrained_model and SQLLogitsProcessor are assumed to be defined already
to_translate = 'customers emails from the us'
words = to_translate.split()
tokenized_text = tokenizer([words], is_split_into_words=True, return_offsets_mapping=True)
logits_processor = LogitsProcessorList([SQLLogitsProcessor(tokenizer)])
out = pretrained_model.generate(
    torch.tensor(tokenized_text["input_ids"]),
    max_new_tokens=20,
    logits_processor=logits_processor)
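One way to check that a generated query really adheres to the rules is to walk the token sequence through the same rule table. The sketch below assumes the output has already been decoded and split into rule-level tokens; the function name `follows_rules` and the `<s>`/`</s>` markers are assumptions, not part of the article's code.

```python
RULES = {
    '<s>': ['SELECT', 'DELETE'],
    'SELECT': ['name', 'email', 'id'],
    'DELETE': ['name', 'email', 'id'],
    'name': [',', 'FROM'],
    'email': [',', 'FROM'],
    'id': [',', 'FROM'],
    ',': ['name', 'email', 'id'],
    'FROM': ['customers', 'vendors'],
    'customers': ['</s>'],
    'vendors': ['</s>'],
}

def follows_rules(tokens, rules, start='<s>', end='</s>'):
    """Return True if every token is an allowed successor of the one before it."""
    previous = start
    for token in list(tokens) + [end]:
        if token not in rules.get(previous, []):
            return False
        previous = token
    return True

good = follows_rules(['SELECT', 'email', 'FROM', 'customers'], RULES)
missing_column = follows_rules(['SELECT', 'FROM', 'customers'], RULES)
unfinished = follows_rules(['SELECT', 'name'], RULES)
print(good, missing_column, unfinished)
```

A check like this only confirms that the output matches the rule table; it says nothing about whether the query answers the user's question.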
The Significance of Tokenization
Tokenization, the process of converting text into a sequence of tokens, plays a crucial role in structured generative AI. Consistent tokenization ensures that similar concepts and punctuation are represented by the same token, simplifying the model's learning process. For instance, adding spaces before words and punctuation enhances consistency and reduces the complexity of patterns that the model needs to learn.
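The spacing idea can be illustrated with a small normalization helper. With byte-level BPE tokenizers, a word with a leading space and the same word without one are typically different tokens, so writing every word and comma with a leading space gives each of them a single, consistent form. The helper name `space_out` is hypothetical, and this sketch handles only commas.

```python
import re

def space_out(query):
    """Put a space before every word and comma so each has one consistent form."""
    separated = re.sub(r'\s*,\s*', ' , ', query)  # isolate commas
    return ''.join(' ' + piece for piece in separated.split())

tight = space_out('SELECT name,email FROM customers')
loose = space_out('SELECT name, email  FROM customers')
print(repr(tight))
```

Both inputs normalize to the same string, so the model sees one pattern during training instead of several spelling variants of the same query.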
Applications of Structured Generative AI
The applications of structured generative AI extend far beyond SQL query generation. It empowers various tasks, including:
- JSON Data Extraction: Generating structured JSON data from natural language, enabling seamless data parsing and storage.
- Query Generation: Creating executable queries for various database systems, facilitating efficient information retrieval.
- Code Generation: Producing valid code snippets in different programming languages, accelerating software development.
Conclusion
Structured generative AI markedly improves the precision and applicability of generative AI models. By incorporating knowledge of the output language's structure, it rules out syntax errors, so the generated code can always be parsed; whether a query is semantically right still depends on the model. This enables a wide range of applications, letting users extract information, generate queries, and produce code more efficiently and accurately.