FFTNet: Adaptive Spectral Filtering for Efficient Long-Range Interactions
Mar 01, 2025 at 10:37 am
Deep learning models have significantly advanced natural language processing and computer vision by enabling efficient data-driven learning.
The remarkable capabilities of deep learning models in domains like natural language processing and computer vision are a product of efficient data-driven learning. However, a major obstacle to pushing these models even further is the computational burden of self-attention mechanisms, especially when handling long sequences or tasks with extensive data.
Traditional transformers perform pairwise comparisons between all tokens in a sequence to generate rich representations, an operation that scales quadratically with sequence length. For shorter sequences, this strategy is highly effective, but as sequences become longer, the models struggle with excessive memory usage and slow inference times. This poses a practical limitation for tasks like machine translation with rich context or open-ended code generation, which often involve processing lengthy sequences.
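The quadratic cost described above is easy to see in code. The following is a minimal sketch of single-head self-attention (with identity query/key/value projections for brevity, not the full transformer formulation): the n × n score matrix is what makes both time and memory scale quadratically with sequence length.

```python
import numpy as np

def self_attention(x):
    """Naive single-head self-attention; the (n, n) score matrix
    is the quadratic bottleneck in both time and memory."""
    n, d = x.shape
    q, k, v = x, x, x                       # identity projections for brevity
    scores = q @ k.T / np.sqrt(d)           # (n, n): quadratic in sequence length
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)      # row-wise softmax
    return w @ v                            # (n, d)

x = np.random.randn(512, 64)
out = self_attention(x)
print(out.shape)  # (512, 64)
```

Doubling the sequence length quadruples the size of `scores`, which is exactly the memory and latency wall the article refers to.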
To navigate this computational challenge, researchers have been developing more efficient architectures that can process long sequences without a significant drop in performance. This pursuit has focused on reducing the computational complexity of self-attention while preserving its ability to capture long-range dependencies, which are crucial for modeling the intricate structure of language and visual scenes.
One promising avenue has been Fourier-based token mixing. Models such as FNet use the Fast Fourier Transform (FFT) to mix tokens in O(n log n) time. However, many Fourier-based models rely on a static Fourier transform, which may not be optimal across varying input distributions and tasks; moreover, FNet's reported performance on the LRA and ImageNet benchmarks falls short of traditional self-attention models.
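The key point about FNet-style mixing is that it has no learned parameters in the mixing step at all, which is both its efficiency advantage and the "static transform" limitation noted above. A minimal sketch (following the FNet recipe of a 2-D FFT over sequence and feature axes, keeping only the real part):

```python
import numpy as np

def fnet_mixing(x):
    """FNet-style token mixing: a 2-D FFT over the sequence and feature
    axes, keeping only the real part. O(n log n) in sequence length,
    with no learnable parameters in the mixing step."""
    return np.fft.fft2(x).real

x = np.random.randn(128, 32)
mixed = fnet_mixing(x)
print(mixed.shape)  # (128, 32)
```

Because the transform is fixed, every input is mixed the same way regardless of content; this is the gap FFTNet's learnable filter is designed to close.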
Another class of methods focuses on low-rank approximations of the attention matrix to achieve near-linear complexity. Models like Performer and Linformer decompose the attention matrix into low-rank components, reducing the computational cost. Nonetheless, these models might introduce additional approximations that could affect the quality of attention computation, especially in capturing fine-grained dependencies between tokens.
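As a rough illustration of the low-rank idea, here is a Linformer-style sketch: keys and values are projected from length n down to a small rank k before the softmax, so the score matrix is (n, k) rather than (n, n). The projections `E` and `F` here are fixed random matrices for illustration; in the actual model they are learned, and this simplified version omits the separate query/key/value weight matrices.

```python
import numpy as np

def linformer_attention(x, k=32, seed=0):
    """Low-rank attention sketch: compress keys/values to rank k,
    making cost linear in sequence length n."""
    n, d = x.shape
    rng = np.random.default_rng(seed)
    E = rng.standard_normal((k, n)) / np.sqrt(n)  # key-compression projection
    F = rng.standard_normal((k, n)) / np.sqrt(n)  # value-compression projection
    q, key, v = x, E @ x, F @ x                   # keys/values now (k, d)
    scores = q @ key.T / np.sqrt(d)               # (n, k) instead of (n, n)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v                                  # (n, d)

x = np.random.randn(512, 64)
y = linformer_attention(x)
print(y.shape)  # (512, 64)
```

The compression is also where the approximation error comes from: fine-grained token-to-token interactions that do not survive the projection to rank k are lost, which matches the caveat above.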
Convolutional architectures have also been integrated to process sequences in a more efficient manner. These models extract hierarchical features from local neighborhoods using convolutional modules and combine them to capture long-range dependencies without direct token comparisons. While convolutional models excel at extracting spatial features in image processing, they might not be as efficient in fully capturing the complex interactions between tokens in natural language or the diverse patterns in image data.
Now, a research team from the University of Southern California has introduced FFTNet, an adaptive spectral filtering framework built around a novel variant of the Fast Fourier Transform (FFT) for global token mixing in O(n log n) time. In contrast to traditional self-attention, which performs pairwise comparisons between all tokens, FFTNet operates in the frequency domain, offering an efficient and scalable approach to processing long sequences.
At the heart of FFTNet lies a learnable spectral filter that refines the frequency components of the input signal. This filter adjusts the amplitude and phase of different frequencies based on their contribution to the task at hand. The filtered frequency representation is then modulated by a novel activation function, termed modReLU, which applies a standard ReLU function to the real and imaginary components of the complex Fourier coefficients. This step introduces non-linearity into the model, enabling it to learn more complex mappings between input and output.
Finally, the modified frequency representation is transformed back into the original sequence domain using the inverse FFT, and a global context vector is computed from the spectral domain to guide the spectral filter. This integration of spatial and spectral information allows FFTNet to capture both local and global dependencies in the input sequence.
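The pipeline described in the two paragraphs above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the spectral filter is a stand-in constant rather than a learned parameter, the global-context guidance of the filter is omitted, and `modrelu` follows the article's description of applying ReLU separately to the real and imaginary parts of the Fourier coefficients.

```python
import numpy as np

def modrelu(z):
    """Non-linearity on complex coefficients: ReLU applied separately to
    the real and imaginary parts (per the article's description)."""
    return np.maximum(z.real, 0) + 1j * np.maximum(z.imag, 0)

def fftnet_layer(x, filt):
    """One adaptive spectral-filtering step:
    FFT over tokens -> elementwise filter -> modReLU -> inverse FFT."""
    z = np.fft.fft(x, axis=0)           # frequency-domain representation
    z = z * filt                        # spectral filter (learned in the real model)
    z = modrelu(z)                      # non-linearity on complex coefficients
    return np.fft.ifft(z, axis=0).real  # back to the sequence domain

n, d = 256, 64
x = np.random.randn(n, d)
filt = np.ones((n, 1), dtype=complex)   # stand-in for a learned per-frequency filter
out = fftnet_layer(x, filt)
print(out.shape)  # (256, 64)
```

Because the filter multiplies each frequency component elementwise, reweighting amplitudes and phases costs only O(n) on top of the O(n log n) transforms, which is where the overall complexity claim comes from.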
In their experiments, the researchers systematically evaluated the performance of FFTNet on the Long Range Arena (LRA) and ImageNet benchmarks, comparing it with standard Transformer, FNet, and Vision Transformer (ViT) variants. Their results demonstrate that FFTNet achieves superior or comparable performance to existing models in both text and image-based tasks.
On the ListOps task of the LRA benchmark, FFTNet attains an accuracy of 37.65%, outperforming both standard Transformer (36.06%) and FNet (35.33%). In text classification tasks, FFTNet consistently shows better performance than its counterparts, showcasing its strength in processing long sequences.
For image-based tasks, FFTNet exhibits competitive results. On ImageNet classification, the researchers evaluated Vision Transformer variants equipped with FFTNet mixing. Among them, FFTNetViT_B_16e200 achieves the highest accuracy at 79.0%, while FFTNetViT_L_14e150 has the lowest computational cost in terms of FLOPs. Specifically, FFTNetViT_B_16e200 requires 314.3M FLOPs, significantly fewer than the standard Vision Transformer's 1.3B FLOPs.
This research highlights the potential of spectral methods for efficient and scalable sequence processing. By introducing an adaptive spectral filtering framework with O(n log n) time complexity and the capacity to capture long-range dependencies, FFTNet provides a promising building block for more efficient and powerful deep learning models.