Cryptocurrency News Articles

FFTNet: Adaptive Spectral Filtering for Efficient Long-Range Interactions

2025/03/01 10:37

Deep learning models have significantly advanced natural language processing and computer vision by enabling efficient data-driven learning.

The remarkable capabilities of deep learning models in domains like natural language processing and computer vision are a product of efficient data-driven learning. However, a major obstacle to pushing these models even further is the computational burden of self-attention mechanisms, especially when handling long sequences or tasks with extensive data.

Traditional transformers perform pairwise comparisons between all tokens in a sequence to generate rich representations, an operation that scales quadratically with sequence length. For shorter sequences, this strategy is highly effective, but as sequences become longer, the models struggle with excessive memory usage and slow inference times. This poses a practical limitation for tasks like machine translation with rich context or open-ended code generation, which often involve processing lengthy sequences.
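
The quadratic cost described above can be made concrete with a minimal, single-head sketch (identity projections are a simplification; real transformers use learned query/key/value matrices):

```python
import numpy as np

def naive_self_attention(x):
    """Toy single-head self-attention: every token is compared with every
    other token, so the score matrix has shape (n, n) and the cost grows
    as O(n^2 * d) with sequence length n."""
    n, d = x.shape
    q, k, v = x, x, x                                # identity projections for brevity
    scores = q @ k.T / np.sqrt(d)                    # (n, n) pairwise comparisons
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v                               # (n, d) mixed representation

x = np.random.default_rng(0).normal(size=(8, 4))
out = naive_self_attention(x)
```

Doubling the sequence length quadruples the size of the score matrix, which is exactly the memory and latency bottleneck the paragraph describes.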

To navigate this computational challenge, researchers have been developing more efficient architectures that can process long sequences without a significant drop in performance. This pursuit has focused on reducing the computational complexity of self-attention while preserving its ability to capture long-range dependencies, which are crucial for modeling the intricate structure of language and visual scenes.

One promising avenue has been exploring Fourier-based models for token mixing. These models, such as FNet, utilize the Fast Fourier Transform (FFT) to achieve efficient mixing in O(n log n) time. However, many Fourier-based models rely on a static Fourier transform, which might not be optimal for varying input distributions and tasks. Moreover, FNet's performance in LRA and ImageNet has been reported to be lower than traditional self-attention models.
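
The static mixing step in FNet is remarkably simple; a minimal sketch (following the published FNet recipe of a 2D FFT with only the real part kept):

```python
import numpy as np

def fnet_mix(x):
    """FNet-style static token mixing: an FFT over the sequence and feature
    dimensions, keeping only the real part. Runs in O(n log n) along the
    sequence axis, with no learned parameters in the mixing step."""
    return np.fft.fft2(x).real

x = np.random.default_rng(0).normal(size=(16, 8))
mixed = fnet_mix(x)
```

Because the transform is fixed, it cannot adapt to the input distribution, which is the limitation FFTNet's learnable filter targets.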

Another class of methods focuses on low-rank approximations of the attention matrix to achieve near-linear complexity. Models like Performer and Linformer decompose the attention matrix into low-rank components, reducing the computational cost. Nonetheless, these models might introduce additional approximations that could affect the quality of attention computation, especially in capturing fine-grained dependencies between tokens.
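
A Linformer-style sketch of the low-rank idea (here `E` and `F` are random projections for illustration; in the actual model they are learned):

```python
import numpy as np

def low_rank_attention(x, E, F):
    """Project the n keys and values down to k << n rows before attention,
    so the score matrix is (n, k) instead of (n, n) — near-linear in n."""
    n, d = x.shape
    k_proj = E @ x                           # (k, d) compressed keys
    v_proj = F @ x                           # (k, d) compressed values
    scores = x @ k_proj.T / np.sqrt(d)       # (n, k) instead of (n, n)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v_proj                        # (n, d)

rng = np.random.default_rng(0)
n, d, k = 64, 8, 4
x = rng.normal(size=(n, d))
E, F = rng.normal(size=(k, n)), rng.normal(size=(k, n))
out = low_rank_attention(x, E, F)
```

The compression is where fine-grained token-to-token dependencies can be lost, as the paragraph notes.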

Convolutional architectures have also been integrated to process sequences in a more efficient manner. These models extract hierarchical features from local neighborhoods using convolutional modules and combine them to capture long-range dependencies without direct token comparisons. While convolutional models excel at extracting spatial features in image processing, they might not be as efficient in fully capturing the complex interactions between tokens in natural language or the diverse patterns in image data.
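
A minimal sketch of convolutional token mixing (a single shared 1D kernel across channels; real models use learned per-channel kernels and stacked layers):

```python
import numpy as np

def conv_token_mix(x, kernel):
    """Mix each token only with its neighbours via a 1D convolution,
    costing O(n * k) per channel. Long-range structure must be built up
    by stacking such layers, rather than in one global step."""
    # 'same' mode preserves the sequence length
    return np.stack([np.convolve(x[:, c], kernel, mode="same")
                     for c in range(x.shape[1])], axis=1)

x = np.random.default_rng(0).normal(size=(16, 4))
out = conv_token_mix(x, kernel=np.array([0.25, 0.5, 0.25]))
```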

Now, a research team from the University of Southern California has introduced FFTNet, an adaptive spectral filtering framework that introduces a novel variant of the Fast Fourier Transform (FFT) for global token mixing in O(n log n) time. In contrast to traditional self-attention, which performs pairwise comparisons between all tokens, FFTNet operates on the frequency domain, presenting an efficient and scalable approach for processing long sequences.

At the heart of FFTNet lies a learnable spectral filter that refines the frequency components of the input signal. This filter adjusts the amplitude and phase of different frequencies based on their contribution to the task at hand. The filtered frequency representation is then modulated by a novel activation function, termed modReLU, which applies a standard ReLU function to the real and imaginary components of the complex Fourier coefficients. This step introduces non-linearity into the model, enabling it to learn more complex mappings between input and output.
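
Taking the article's description of modReLU literally (ReLU applied separately to the real and imaginary parts of each complex coefficient), the activation is a one-liner:

```python
import numpy as np

def mod_relu(z):
    """Non-linearity on complex Fourier coefficients, as described above:
    a standard ReLU applied independently to real and imaginary parts."""
    return np.maximum(z.real, 0.0) + 1j * np.maximum(z.imag, 0.0)

z = np.array([1.0 - 2.0j, -3.0 + 4.0j])
out = mod_relu(z)
# → [1.+0.j, 0.+4.j]
```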

Finally, the modified frequency representation is transformed back into the original sequence domain using the inverse FFT, and a global context vector is computed from the spectral domain to guide the spectral filter. This integration of spatial and spectral information allows FFTNet to capture both local and global dependencies in the input sequence.
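
The full pipeline described in the last two paragraphs can be sketched end to end under assumed shapes (the global-context guidance of the filter is omitted for brevity; `filt` stands in for the learned spectral filter, here random):

```python
import numpy as np

def fftnet_layer(x, filt):
    """Sketch of the FFTNet mixing step: FFT over the sequence axis,
    elementwise complex spectral filtering, ReLU on real/imaginary parts,
    then inverse FFT back to the token domain (real part kept)."""
    z = np.fft.fft(x, axis=0)                                 # to frequency domain
    z = z * filt                                              # adaptive spectral filtering
    z = np.maximum(z.real, 0) + 1j * np.maximum(z.imag, 0)    # modReLU step
    return np.fft.ifft(z, axis=0).real                        # back to sequence domain

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 8))                                  # (sequence, features)
filt = rng.normal(size=(32, 1)) + 1j * rng.normal(size=(32, 1))
y = fftnet_layer(x, filt)
```

Every step is O(n log n) or elementwise, which is what keeps the layer scalable to long sequences.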

In their experiments, the researchers systematically evaluated the performance of FFTNet on the Long Range Arena (LRA) and ImageNet benchmarks, comparing it with standard Transformer, FNet, and Vision Transformer (ViT) variants. Their results demonstrate that FFTNet achieves superior or comparable performance to existing models in both text and image-based tasks.

On the ListOps task of the LRA benchmark, FFTNet attains an accuracy of 37.65%, outperforming both standard Transformer (36.06%) and FNet (35.33%). In text classification tasks, FFTNet consistently shows better performance than its counterparts, showcasing its strength in processing long sequences.

For image-based tasks, FFTNet delivers competitive results. On ImageNet classification, the researchers paired ViT variants with FFTNet for efficient computation. Among them, FFTNetViT_B_16e200 achieves the highest accuracy at 79.0%, while FFTNetViT_L_14e150 has the lowest computational cost in FLOPs. Specifically, FFTNetViT_B_16e200 requires 314.3M FLOPs, significantly fewer than the standard Vision Transformer's 1.3B FLOPs.

This research highlights the potential of spectral methods for efficient and scalable sequence processing. By introducing an adaptive spectral filtering framework with efficient time complexity and the capacity to capture long-range dependencies, FFTNet provides a promising building block for developing more efficient and powerful deep learning models.

Disclaimer: info@kdj.com

The information provided is not trading advice. kDJ.com assumes no liability for any investments made based on the information in this article. Cryptocurrencies are highly volatile; please research thoroughly and invest with caution!

If you believe content used on this website infringes your copyright, please contact us immediately (info@kdj.com) and we will remove it promptly.
