FFTNet: Adaptive Spectral Filtering for Efficient Long-Range Interactions

2025/03/01 10:37


The remarkable capabilities of deep learning models in domains like natural language processing and computer vision are a product of efficient data-driven learning. However, a major obstacle to pushing these models even further is the computational burden of self-attention mechanisms, especially when handling long sequences or tasks with extensive data.


Traditional transformers perform pairwise comparisons between all tokens in a sequence to generate rich representations, an operation that scales quadratically with sequence length. For shorter sequences, this strategy is highly effective, but as sequences become longer, the models struggle with excessive memory usage and slow inference times. This poses a practical limitation for tasks like machine translation with rich context or open-ended code generation, which often involve processing lengthy sequences.

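To make the scaling concrete, here is a minimal PyTorch sketch of standard scaled dot-product attention; the (n, n) score matrix it materializes is the quadratic bottleneck described above. The sizes are illustrative, not drawn from the paper.

```python
import torch

def naive_attention(q, k, v):
    # q, k, v: (n, d). The (n, n) score matrix below encodes every pairwise
    # token comparison, so memory and compute grow as O(n^2) in sequence length.
    scores = q @ k.T / (q.shape[-1] ** 0.5)   # (n, n) pairwise comparisons
    return torch.softmax(scores, dim=-1) @ v  # weighted sum over all tokens

n, d = 4096, 64  # illustrative sizes
q, k, v = (torch.randn(n, d) for _ in range(3))
out = naive_attention(q, k, v)  # materializes a 4096 x 4096 score matrix
```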

To navigate this computational challenge, researchers have been developing more efficient architectures that can process long sequences without a significant drop in performance. This pursuit has focused on reducing the computational complexity of self-attention while preserving its ability to capture long-range dependencies, which are crucial for modeling the intricate structure of language and visual scenes.


One promising avenue has been the exploration of Fourier-based models for token mixing. These models, such as FNet, use the Fast Fourier Transform (FFT) to achieve efficient mixing in O(n log n) time. However, many Fourier-based models rely on a static Fourier transform, which may not be optimal across varying input distributions and tasks. Moreover, FNet's performance on the LRA and ImageNet benchmarks has been reported to fall short of traditional self-attention models.

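For contrast, FNet-style static mixing can be written in a few lines. This is a sketch of the FNet idea (a parameter-free 2D FFT, keeping the real part), not of FFTNet's adaptive filter.

```python
import torch

def fnet_token_mixing(x):
    # x: (seq_len, d_model). FNet mixes tokens with a 2D FFT over the
    # sequence and hidden dimensions and keeps only the real part.
    # O(n log n), but the transform is fixed: nothing here is learned,
    # which is the limitation adaptive spectral filtering targets.
    return torch.fft.fft2(x).real
```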

Another class of methods focuses on low-rank approximations of the attention matrix to achieve near-linear complexity. Models like Performer and Linformer decompose the attention matrix into low-rank components, reducing the computational cost. Nonetheless, these models might introduce additional approximations that could affect the quality of attention computation, especially in capturing fine-grained dependencies between tokens.

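A shape-level sketch of the low-rank idea, in the spirit of Linformer: projecting keys and values along the sequence axis shrinks the score matrix from (n, n) to (n, r). The projection E is learned in the real model; it is random here purely for illustration.

```python
import torch

n, d, r = 4096, 64, 256            # r << n is the rank of the approximation
E = torch.randn(n, r) / n ** 0.5   # stand-in for Linformer's learned projection

def lowrank_attention(q, k, v):
    k_proj, v_proj = E.T @ k, E.T @ v   # (r, d): compress the sequence axis
    scores = q @ k_proj.T / d ** 0.5    # (n, r) instead of (n, n)
    return torch.softmax(scores, dim=-1) @ v_proj

q, k, v = (torch.randn(n, d) for _ in range(3))
out = lowrank_attention(q, k, v)        # O(n * r) score computation
```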

Convolutional architectures have also been adopted to process sequences more efficiently. These models extract hierarchical features from local neighborhoods using convolutional modules and combine them to capture long-range dependencies without direct token comparisons. While convolutional models excel at extracting spatial features in image processing, they may be less effective at fully capturing the complex interactions between tokens in natural language or the diverse patterns in image data.

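As an illustration of the convolutional approach, stacked 1-D convolutions mix tokens locally and grow the receptive field with depth; the layer sizes here are arbitrary.

```python
import torch
import torch.nn as nn

# Each Conv1d layer mixes a token only with its local window (O(n) per layer);
# stacking layers widens the receptive field, capturing longer-range structure
# hierarchically rather than through direct pairwise token comparisons.
mixer = nn.Sequential(
    nn.Conv1d(64, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv1d(64, 64, kernel_size=3, padding=1),
)
x = torch.randn(1, 64, 4096)   # (batch, channels, seq_len)
y = mixer(x)                   # same shape; receptive field of 5 tokens here
```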

Now, a research team from the University of Southern California has introduced FFTNet, an adaptive spectral filtering framework that introduces a novel variant of the Fast Fourier Transform (FFT) for global token mixing in O(n log n) time. In contrast to traditional self-attention, which performs pairwise comparisons between all tokens, FFTNet operates on the frequency domain, presenting an efficient and scalable approach for processing long sequences.


At the heart of FFTNet lies a learnable spectral filter that refines the frequency components of the input signal. This filter adjusts the amplitude and phase of different frequencies based on their contribution to the task at hand. The filtered frequency representation is then modulated by a novel activation function, termed modReLU, which applies a standard ReLU function to the real and imaginary components of the complex Fourier coefficients. This step introduces non-linearity into the model, enabling it to learn more complex mappings between input and output.

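Putting the described steps together, here is a minimal sketch of the forward pass, under stated assumptions: the per-frequency scale/phase parameterization and the module name are illustrative inventions, and modReLU is implemented as the article describes it (ReLU applied separately to the real and imaginary parts). The authors' actual implementation may differ.

```python
import torch
import torch.nn as nn

class AdaptiveSpectralFilter(nn.Module):
    """Sketch of the described pipeline, not the authors' implementation."""

    def __init__(self, seq_len: int, d_model: int):
        super().__init__()
        n_freq = seq_len // 2 + 1                 # rfft keeps n//2 + 1 bins
        # Assumed parameterization: a learnable scale and phase per frequency.
        self.scale = nn.Parameter(torch.ones(n_freq, 1))
        self.phase = nn.Parameter(torch.zeros(n_freq, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (seq_len, d_model)
        f = torch.fft.rfft(x, dim=0)                      # to the frequency domain
        f = f * self.scale * torch.exp(1j * self.phase)   # adjust amplitude/phase
        # modReLU as described in the text: ReLU on real and imaginary parts.
        f = torch.complex(torch.relu(f.real), torch.relu(f.imag))
        return torch.fft.irfft(f, n=x.shape[0], dim=0)    # back to sequence domain

layer = AdaptiveSpectralFilter(seq_len=4096, d_model=64)
y = layer(torch.randn(4096, 64))   # (4096, 64), mixed in O(n log n)
```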

Finally, the modified frequency representation is transformed back into the original sequence domain using the inverse FFT, and a global context vector is computed from the spectral domain to guide the spectral filter. This integration of spatial and spectral information allows FFTNet to capture both local and global dependencies in the input sequence.

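The article does not spell out how the global context vector conditions the filter, so the following is a hypothetical extension of the sketch above: pool the spectral magnitudes into a context vector and map it to a per-frequency gate. Treat the names and the gating form as assumptions.

```python
import torch
import torch.nn as nn

d_model, n_freq = 64, 4096 // 2 + 1
to_gate = nn.Linear(d_model, n_freq)       # hypothetical context-to-gate map

def contextual_gate(f: torch.Tensor) -> torch.Tensor:
    # f: (n_freq, d_model) complex spectrum, as in the sketch above.
    context = f.abs().mean(dim=0)          # global summary of the spectrum
    return torch.sigmoid(to_gate(context)).unsqueeze(-1)   # (n_freq, 1) gate

f = torch.fft.rfft(torch.randn(4096, d_model), dim=0)
f_guided = f * contextual_gate(f)          # context vector guides the filter
```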

In their experiments, the researchers systematically evaluated the performance of FFTNet on the Long Range Arena (LRA) and ImageNet benchmarks, comparing it with standard Transformer, FNet, and Vision Transformer (ViT) variants. Their results demonstrate that FFTNet achieves superior or comparable performance to existing models in both text and image-based tasks.


On the ListOps task of the LRA benchmark, FFTNet attains an accuracy of 37.65%, outperforming both the standard Transformer (36.06%) and FNet (35.33%). On text classification tasks, FFTNet consistently outperforms these counterparts, underscoring its strength in processing long sequences.


For image-based tasks, FFTNet delivers competitive results. For ImageNet classification, the researchers evaluated ViT variants that use FFTNet for efficient computation. Among them, FFTNetViT_B_16e200 achieves the highest accuracy at 79.0%, while FFTNetViT_L_14e150 achieves the lowest computational cost in terms of FLOPs. Specifically, FFTNetViT_B_16e200 requires 314.3M FLOPs, roughly a four-fold reduction from the standard Vision Transformer's 1.3B FLOPs.


This research highlights the potential of spectral methods for efficient and scalable sequence processing. By introducing an adaptive spectral filtering framework with O(n log n) time complexity and the capacity to capture long-range dependencies, FFTNet provides a promising building block for developing more efficient and powerful deep learning models.
