TokenBridge: Bridging the Gap Between Continuous and Discrete Token Representations in Visual Generation

2025/03/28 06:13

Autoregressive visual generation models have emerged as a groundbreaking approach to image synthesis, drawing inspiration from the token-prediction mechanisms of language models. These models use image tokenizers to transform visual content into discrete or continuous tokens, which facilitates flexible multimodal integration and allows architectural innovations from LLM research to be adapted. However, the field faces the critical challenge of determining the optimal token representation strategy: the choice between discrete and continuous tokens remains a fundamental dilemma, affecting both model complexity and generation quality.

Existing work on visual tokenization explores two primary approaches: continuous and discrete token representations. Variational autoencoders establish continuous latent spaces that maintain high visual fidelity and have become foundational in diffusion model development. Discrete methods such as VQ-VAE and VQGAN enable straightforward autoregressive modeling but encounter significant limitations, including codebook collapse and information loss.
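The discrete route described above can be sketched with a minimal vector-quantization step: each continuous latent vector is snapped to its nearest codebook entry, and the residual distance is exactly the information the discrete bottleneck discards. The sizes below (a 512-entry codebook of 16-dimensional embeddings) are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration: 512-entry codebook, 16-dim latents.
codebook = rng.normal(size=(512, 16))
features = rng.normal(size=(64, 16))   # 64 continuous latent vectors

# VQ step: map each continuous vector to its nearest codebook entry.
dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
indices = dists.argmin(axis=1)          # discrete token ids
quantized = codebook[indices]           # reconstruction after lookup

# Quantization error: the information lost by the discrete bottleneck.
mse = ((features - quantized) ** 2).mean()
print(indices.shape, quantized.shape, float(mse))
```

If many codebook entries are never selected (codebook collapse), the effective vocabulary shrinks and this error grows, which is the limitation the article attributes to VQ-VAE-style methods.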

Autoregressive image generation has evolved from computationally intensive pixel-based methods to more efficient token-based strategies. While models like DALL-E show promising results, hybrid methods such as GIVT and MAR introduce complex architectural modifications to improve generation quality, complicating the traditional autoregressive modeling pipeline.

To bridge this critical gap between continuous and discrete token representations in visual generation, researchers from the University of Hong Kong, ByteDance Seed, Ecole Polytechnique, and Peking University propose TokenBridge. It aims to utilize the strong representation capacity of continuous tokens while maintaining the modeling simplicity of discrete tokens. TokenBridge decouples the discretization process from initial tokenizer training by introducing a novel post-training quantization technique. Moreover, it implements a unique dimension-wise quantization strategy that independently discretizes each feature dimension, complemented by a lightweight autoregressive prediction mechanism. It efficiently manages the expanded token space while preserving high-quality visual generation capabilities.

TokenBridge introduces a training-free, dimension-wise quantization technique that operates independently on each feature channel, effectively addressing previous token-representation limitations. The approach capitalizes on two crucial properties of variational autoencoder features: their bounded range, enforced by the KL constraint, and their near-Gaussian distribution.
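A minimal sketch of such training-free, per-channel quantization is shown below. It assumes, as the article notes, that the VAE latents are bounded and roughly Gaussian, so a fixed symmetric range can be discretized into uniform bins. The bin count (16) and clipping bound (3.0) are illustrative assumptions, not the paper's actual hyperparameters.

```python
import numpy as np

def quantize_dimwise(z, num_bins=16, bound=3.0):
    """Training-free scalar quantization applied independently per channel.

    Assumes latents are bounded (KL constraint) and near-Gaussian, so a
    fixed symmetric range [-bound, bound] suffices. `num_bins` and `bound`
    are illustrative choices, not values from the TokenBridge paper.
    """
    z_clipped = np.clip(z, -bound, bound)
    # Map [-bound, bound] onto integer bin indices 0 .. num_bins-1.
    idx = np.round((z_clipped + bound) / (2 * bound) * (num_bins - 1)).astype(int)
    # De-quantize: bin centers back in the continuous latent space.
    z_hat = idx / (num_bins - 1) * 2 * bound - bound
    return idx, z_hat

rng = np.random.default_rng(0)
z = rng.normal(size=(4, 8))           # 4 tokens, 8 feature channels each
idx, z_hat = quantize_dimwise(z)
print(idx.shape, z_hat.shape)
```

Because each channel is discretized independently, no codebook is learned and nothing is retrained; the maximum per-element error is half a bin width, which is the sense in which the discretization is decoupled from tokenizer training.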

The autoregressive model adopts a Transformer architecture in two primary configurations: a default L model comprising 32 blocks at a width of 1024 (approximately 400 million parameters) for initial studies, and a larger H model with 40 blocks at a width of 1280 (around 910 million parameters) for final evaluations. This design allows a detailed exploration of the proposed quantization strategy across different model scales.
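The two configurations can be summarized with a rough back-of-the-envelope parameter estimate. The `12 * depth * width**2` rule of thumb counts only the attention and MLP weight matrices of a standard Transformer body; it lands near the article's ~400M figure for the L model, while the H model's stated ~910M presumably includes embeddings, heads, and other components this body-only estimate ignores.

```python
from dataclasses import dataclass

@dataclass
class ARConfig:
    blocks: int
    width: int

    def approx_params(self) -> int:
        # Rough Transformer-body estimate: ~12 * depth * width^2
        # (attention + MLP weights; ignores embeddings and output heads).
        return 12 * self.blocks * self.width ** 2

L_model = ARConfig(blocks=32, width=1024)   # article: ~400M parameters
H_model = ARConfig(blocks=40, width=1280)   # article: ~910M parameters

print(f"L ~ {L_model.approx_params() / 1e6:.0f}M body params")
print(f"H ~ {H_model.approx_params() / 1e6:.0f}M body params")
```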

The results demonstrate that TokenBridge outperforms traditional discrete-token models, achieving a superior Fréchet Inception Distance (FID) with significantly fewer parameters. For instance, TokenBridge-L secures an FID of 1.76 with only 486 million parameters, in contrast to LlamaGen's 2.18 using 3.1 billion parameters. When benchmarked against continuous approaches, TokenBridge-L outperforms GIVT, achieving an FID of 1.76 versus 3.35.

The H-model configuration further validates the method's effectiveness, matching MAR-H in FID (1.55) while delivering superior Inception Score and Recall metrics with marginally fewer parameters. These results highlight TokenBridge's capability to bridge discrete and continuous token representations.

In conclusion, researchers present TokenBridge, which bridges the longstanding gap between discrete and continuous token representations. It achieves high-quality visual generation with remarkable efficiency by introducing a post-training quantization approach and dimension-wise autoregressive decomposition. The research demonstrates that discrete token approaches using standard cross-entropy loss can compete with state-of-the-art continuous methods, eliminating the need for complex distribution modeling techniques. This finding opens a promising pathway for future investigations, potentially transforming how researchers conceptualize and implement token-based visual synthesis technologies.

Check out the Paper, GitHub Page and Project. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 85k+ ML SubReddit.
