Redefining Generative AI: Embracing Structure for Enhanced Output Precision

Apr 19, 2024 at 08:06 am

Structured generative AI constrains a generative model so that its output follows a specific format. Each token choice is restricted to the options the format allows, which prevents syntax errors. As a result, generated queries are syntactically valid and generated data structures can be parsed. Consistent tokenization of punctuation and keywords also simplifies the patterns the model must learn, which reduces training time and improves accuracy. By using knowledge of the output's structure, structured generative AI becomes a practical tool for translating natural language into structured formats such as SQL and JSON.

Introduction

Generative AI has made significant strides in natural language processing and now produces coherent, grammatically sound text. Structured output such as SQL queries or JSON data is harder. There, a single misplaced token, such as a missing comma or an unbalanced bracket, is enough to make the generated code fail to execute or parse.

Enter Structured Generative AI

To overcome this limitation, we introduce structured generative AI, a technique that constrains generation to a predefined format. It uses knowledge of the output language's structure so that only tokens that are legal at the current position are considered during generation. This rules out syntax errors and keeps the output valid.

Mechanism of Token Generation

Generative language models, typically built on the transformer architecture, produce output one token at a time. Each choice is conditioned on the input and on the tokens generated so far. At each step, the model's final layer acts as a classifier over the vocabulary. It assigns a score (logit) to every token, a softmax converts those scores into probabilities, and a decoding strategy such as greedy search or sampling selects the next token.
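The loop below is a minimal sketch of this process using greedy decoding, which always takes the most probable token. `next_logits` is a hypothetical stand-in for the model's forward pass, and the four-token vocabulary is a toy. Real models often use sampling or beam search instead of greedy selection.

```python
import math
from typing import Callable, List

def softmax(logits: List[float]) -> List[float]:
    """Convert raw scores (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_decode(next_logits: Callable[[List[int]], List[float]],
                  prompt: List[int], eos_id: int, max_new_tokens: int) -> List[int]:
    """Generate token IDs one at a time, each conditioned on everything so far."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = softmax(next_logits(tokens))  # one probability per vocabulary entry
        next_id = max(range(len(probs)), key=lambda i: probs[i])  # greedy choice
        tokens.append(next_id)
        if next_id == eos_id:  # stop once the end-of-sequence token appears
            break
    return tokens

# Toy "model" over a 4-token vocabulary (hypothetical): it strongly prefers
# the token whose ID is one greater than the last generated token.
def toy_next_logits(tokens: List[int]) -> List[float]:
    return [5.0 if i == (tokens[-1] + 1) % 4 else 0.0 for i in range(4)]

print(greedy_decode(toy_next_logits, prompt=[0], eos_id=3, max_new_tokens=10))
# -> [0, 1, 2, 3]
```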

Constraining Token Generation

Structured generative AI uses knowledge of the output language's structure to restrict which tokens can be generated. Before each selection, the logits of illegitimate tokens, such as misplaced punctuation or invalid keywords, are set to negative infinity. After the softmax, this gives them a probability of exactly zero. For instance, if the rules say that only a comma may follow "SELECT name", every other token's logit is set to negative infinity, so the comma is the only token that can be selected.
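Here is a minimal sketch of the masking step. It uses plain Python for readability; a real implementation would operate on tensors. The vocabulary IDs are hypothetical: ID 1 is assumed to be the comma, and it is the only token the rules allow next. Because exp(-inf) is 0, masked tokens end up with probability exactly 0 after the softmax.

```python
import math
from typing import Iterable, List

def mask_logits(logits: List[float], allowed_ids: Iterable[int]) -> List[float]:
    """Set the logit of every token outside `allowed_ids` to negative infinity."""
    allowed = set(allowed_ids)
    if not any(0 <= i < len(logits) for i in allowed):
        raise ValueError("at least one vocabulary token must remain allowed")
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]

def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # finite, because at least one token is allowed
    exps = [math.exp(x - m) for x in logits]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 4-token vocabulary: 0 = "FROM", 1 = ",", 2 = "WHERE", 3 = "name".
logits = [3.0, 1.0, 0.5, 2.0]        # unconstrained, the model prefers "FROM"
masked = mask_logits(logits, {1})    # the rules allow only the comma here
print(softmax(masked))               # -> [0.0, 1.0, 0.0, 0.0]
```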

Implementation with Hugging Face

Hugging Face's transformers library, a widely used source of pretrained models and NLP tooling, supports this through logits processors. A custom logits processor receives the token IDs generated so far together with the scores (logits) for the next token, and it returns modified scores. It runs after the model computes the scores but before the next token is selected. Processors are passed to generate() wrapped in a LogitsProcessorList.
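The class below sketches the shape of such a processor. It mirrors the `__call__(input_ids, scores)` signature of Hugging Face's `LogitsProcessor`, but it uses plain Python lists so that it runs without torch. In real code you would subclass `transformers.LogitsProcessor`, receive tensors of shape (batch, sequence length) and (batch, vocabulary size), and return a modified scores tensor. The class name and the `allowed_after` table are hypothetical.

```python
import math
from typing import Dict, List, Set

class AllowedNextTokensProcessor:
    """Pure-Python stand-in for a Hugging Face logits processor (hypothetical)."""

    def __init__(self, allowed_after: Dict[int, Set[int]]):
        # Maps the most recent token ID to the set of IDs allowed to follow it.
        self.allowed_after = allowed_after

    def __call__(self, input_ids: List[List[int]],
                 scores: List[List[float]]) -> List[List[float]]:
        new_scores = []
        for seq, row in zip(input_ids, scores):  # one row per sequence in the batch
            allowed = self.allowed_after.get(seq[-1])
            if allowed is None:
                new_scores.append(list(row))  # no rule for this token: leave as-is
            else:
                new_scores.append(
                    [s if i in allowed else -math.inf for i, s in enumerate(row)])
        return new_scores

proc = AllowedNextTokensProcessor({0: {2}})
print(proc([[5, 0], [5, 1]], [[1.0, 2.0, 0.5], [1.0, 2.0, 0.5]]))
# -> [[-inf, -inf, 0.5], [1.0, 2.0, 0.5]]
```

The processor returns new lists instead of mutating its input. That keeps the sketch easy to test, whereas tensor-based processors commonly modify scores in place.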

Example: SQL Query Generation

To demonstrate structured generative AI, consider the task of generating SQL queries from natural language. We initialize a pretrained BART model and define a set of rules that specify, for each token, which tokens may follow it in a valid SQL query.

rules = {'

Using these rules, we create a logits processor that converts the rules into token IDs. At each generation step, it sets the scores of tokens the rules do not allow to negative infinity.
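Below is a sketch of the conversion step under stated assumptions. It uses a small hypothetical rule set for queries of the form SELECT col[, col] FROM customers. A toy word-to-ID vocabulary stands in for the tokenizer. Each word-level rule becomes a mapping from a token ID to the set of token IDs allowed next. The sketch assumes every keyword maps to exactly one token. With a real subword tokenizer this must be checked, which is one reason consistent spacing matters.

```python
from typing import Dict, List, Set

def rules_to_token_ids(rules: Dict[str, List[str]],
                       vocab: Dict[str, int]) -> Dict[int, Set[int]]:
    """Translate word-level follow rules into token-ID-level follow rules."""
    id_rules: Dict[int, Set[int]] = {}
    for prev_word, next_words in rules.items():
        # A KeyError here means a rule word has no single-token vocabulary entry.
        id_rules[vocab[prev_word]] = {vocab[w] for w in next_words}
    return id_rules

# Hypothetical rules: which word may follow each word in a simple SELECT query.
rules = {
    "<s>": ["SELECT"],
    "SELECT": ["name", "email"],
    "name": [",", "FROM"],
    "email": [",", "FROM"],
    ",": ["name", "email"],
    "FROM": ["customers"],
    "customers": ["</s>"],
}
# Toy vocabulary standing in for tokenizer.convert_tokens_to_ids (hypothetical IDs).
vocab = {w: i for i, w in enumerate(
    ["<s>", "SELECT", "name", "email", ",", "FROM", "customers", "</s>"])}

id_rules = rules_to_token_ids(rules, vocab)
print(id_rules[vocab["name"]])  # -> {4, 5}, i.e. "," or "FROM"
```

The resulting ID table has the same form as the `allowed_after` mapping a rules-based processor would consult at each step.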

Results: Enhanced SQL Query Generation

Running the BART model with the logits processor constrains every generated query to the predefined rules. The output is therefore syntactically correct with respect to those rules and can be executed without syntax errors.

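import torch
from transformers import LogitsProcessorList

# tokenizer, pretrained_model (BART) and SQLLogitsProcessor come from the earlier
# steps; a BART fast tokenizer needs add_prefix_space=True for pre-split words.
to_translate = 'customers emails from the us'
words = to_translate.split()
tokenized_text = tokenizer([words], is_split_into_words=True, return_offsets_mapping=True)
logits_processor = LogitsProcessorList([SQLLogitsProcessor(tokenizer)])
out = pretrained_model.generate(
    torch.tensor(tokenized_text["input_ids"]),
    max_new_tokens=20,
    logits_processor=logits_processor)
# Decode the generated IDs back into a SQL string.
print(tokenizer.decode(out[0], skip_special_tokens=True))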

The Significance of Tokenization

Tokenization, the process of converting text into a sequence of tokens, also plays a crucial role in structured generative AI. Consistent tokenization ensures that the same keyword or punctuation mark is always represented by the same token, which simplifies the patterns the model has to learn. With BART's byte-level BPE tokenizer, for example, "name" at the start of a string and " name" after a space map to different tokens, and the same holds for punctuation. Adding a space before every word and punctuation mark removes this variation.
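Here is a minimal preprocessing sketch of the spacing idea, assuming the target queries are plain strings. It gives every word and common SQL punctuation mark a single leading space. As a result, "name,email" and "name , email" normalize to the same text and therefore tokenize the same way. The function name and the punctuation set are illustrative choices, not taken from the original. The sketch also pads punctuation inside quoted string literals, which a production version would need to skip.

```python
import re

# Multi-character operators come first so that "<=" is kept as one unit.
_PUNCT = re.compile(r"(<=|>=|<>|!=|[,();*=<>])")

def normalize_spacing(text: str) -> str:
    """Give every word and punctuation mark exactly one leading space."""
    padded = _PUNCT.sub(r" \1 ", text)      # isolate each punctuation mark
    return " " + " ".join(padded.split())   # collapse whitespace, prefix a space

a = normalize_spacing("SELECT name,email FROM customers WHERE country='US'")
b = normalize_spacing("SELECT name , email FROM customers WHERE country = 'US'")
print(repr(a))  # -> " SELECT name , email FROM customers WHERE country = 'US'"
assert a == b   # both spellings now produce identical text, hence identical tokens
```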

Applications of Structured Generative AI

The applications of structured generative AI extend well beyond SQL query generation. They include:

  • JSON Data Extraction: Generating structured JSON data from natural language, enabling seamless data parsing and storage.
  • Query Generation: Creating executable queries for various database systems, facilitating efficient information retrieval.
  • Code Generation: Producing valid code snippets in different programming languages, accelerating software development.

Conclusion

Structured generative AI substantially improves the precision and applicability of generative models whenever the output must follow a fixed format. It incorporates knowledge of the output language's structure into decoding, which prevents syntax errors and lets generated queries, data, and code be parsed reliably. This enables a wide range of applications, helping users extract information, generate queries, and produce code more efficiently and accurately.
