FFTNet: Adaptive Spectral Filtering for Efficient Long-Range Interactions

2025/03/01 10:37

Deep learning models have significantly advanced natural language processing and computer vision by enabling efficient data-driven learning.

The remarkable capabilities of deep learning models in domains such as natural language processing and computer vision stem from efficient data-driven learning. However, a major obstacle to pushing these models further is the computational burden of self-attention, especially on long sequences or data-intensive tasks.

Standard Transformers compare every pair of tokens in a sequence to build rich representations, an operation whose cost scales quadratically with sequence length. This works well for short sequences, but as sequences grow longer, models suffer from excessive memory usage and slow inference. That is a practical limitation for tasks that routinely involve lengthy inputs, such as context-rich machine translation or open-ended code generation.
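
A minimal NumPy sketch of single-head self-attention makes the quadratic term concrete: the score matrix has one entry per pair of tokens. It uses random weights and has no masking or multi-head logic. The function names are illustrative.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product attention over tokens x of shape (n, d)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    # Every token is scored against every other token: an (n, n) matrix.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v, scores

def score_entries(n):
    """Number of entries in the attention score matrix for n tokens."""
    return n * n

rng = np.random.default_rng(0)
n, d = 128, 16
x = rng.standard_normal((n, d))
wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
out, scores = self_attention(x, wq, wk, wv)
print(out.shape, scores.shape)  # (128, 16) (128, 128)
print(score_entries(2_048) // score_entries(1_024))  # 4: doubling n quadruples the cost
```

Memory for `scores` follows the same n² curve, which is why long inputs run out of memory before they run out of compute.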

To address this computational challenge, researchers have been developing more efficient architectures that can process long sequences without a significant drop in performance. This work has focused on reducing the computational complexity of self-attention while preserving its ability to capture long-range dependencies, which are crucial for modeling the intricate structure of language and visual scenes.

One promising avenue is Fourier-based token mixing. Models such as FNet use the Fast Fourier Transform (FFT) to mix tokens efficiently in O(n log n) time. However, many Fourier-based models rely on a static Fourier transform that does not adapt to its input, which may be suboptimal across varying input distributions and tasks. Moreover, FNet has been reported to underperform traditional self-attention models on LRA and ImageNet.
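
FNet's mixing step applies a parameter-free Fourier transform over both the sequence and hidden dimensions and keeps only the real part. The minimal NumPy sketch below shows why such a transform is called static: the same fixed linear map is applied to every input. The function name is illustrative.

```python
import numpy as np

def fnet_mixing(x):
    """FNet-style token mixing for x of shape (n_tokens, d_model).

    A 2D FFT over the sequence and hidden axes, keeping only the real part.
    There are no learnable parameters, so the mixing is identical for every input.
    """
    return np.fft.fft2(x).real

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 8))
b = rng.standard_normal((64, 8))
y = fnet_mixing(a)
print(y.shape, y.dtype)  # (64, 8) float64
# Being fixed and linear, the transform cannot adapt to what it is mixing:
print(np.allclose(fnet_mixing(a + b), fnet_mixing(a) + fnet_mixing(b)))  # True
```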

Another class of methods approximates the attention matrix with low-rank structure to reach near-linear complexity. Linformer, for example, projects keys and values down to a fixed, smaller number of positions, while Performer approximates softmax attention with random feature maps. Both avoid forming the full n × n attention matrix and so reduce computational cost. These approximations can, however, degrade the quality of the attention computation, especially when capturing fine-grained dependencies between tokens.
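
The low-rank idea can be shown with a Linformer-style sketch in NumPy. It assumes a single head and a single random projection `proj` shared by keys and values. Linformer learns its projections and can keep them separate. All names are illustrative. With `k` fixed, the n × n score matrix becomes n × k, so cost grows linearly in n.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def low_rank_attention(x, wq, wk, wv, proj):
    """Linformer-style attention: proj of shape (k, n) compresses the sequence axis of K and V."""
    q, key, v = x @ wq, x @ wk, x @ wv
    key_c, v_c = proj @ key, proj @ v            # (k, d) each instead of (n, d)
    scores = q @ key_c.T / np.sqrt(q.shape[-1])  # (n, k) instead of (n, n)
    return softmax(scores) @ v_c, scores

rng = np.random.default_rng(0)
n, d, k = 1024, 16, 64
x = rng.standard_normal((n, d))
wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
proj = rng.standard_normal((k, n)) / np.sqrt(n)  # learned in Linformer; random stand-in here
out, scores = low_rank_attention(x, wq, wk, wv, proj)
print(out.shape, scores.shape)  # (1024, 16) (1024, 64)
```

The trade-off named above shows up directly: each query attends to only `k` mixed summaries of the sequence, never to individual tokens.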

Convolutional architectures have also been adopted to process sequences more efficiently. These models extract hierarchical features from local neighborhoods with convolutional modules and stack them to capture long-range dependencies without direct token-to-token comparisons. While convolutional models excel at extracting spatial features in images, they may not fully capture the complex interactions between tokens in natural language or the diverse patterns in image data.
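
The sketch below shows why convolutions need depth to reach distant tokens. Each stride-1 layer with kernel size k widens the receptive field by k − 1 tokens. Without dilation or pooling, covering a sequence of length n therefore takes on the order of n / k layers. It uses a single channel and an all-ones kernel for readability, and the names are illustrative.

```python
import numpy as np

def conv1d_same(x, kernel):
    """1D convolution along the sequence axis of a single channel, zero-padded to keep length."""
    return np.convolve(x, kernel, mode="same")

def receptive_field(num_layers, kernel_size):
    """Tokens visible to one output after stacking stride-1, undilated convolutions."""
    return 1 + num_layers * (kernel_size - 1)

# Push a single impulse through two kernel-3 layers and see how far it spreads.
signal = np.zeros(11)
signal[5] = 1.0
kernel = np.ones(3)
out = signal
for _ in range(2):
    out = conv1d_same(out, kernel)
print(np.nonzero(out)[0])      # [3 4 5 6 7]
print(receptive_field(2, 3))   # 5
print(receptive_field(10, 3))  # 21
```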

Now, a research team from the University of Southern California has introduced FFTNet, an adaptive spectral filtering framework that uses the Fast Fourier Transform (FFT) for global token mixing in O(n log n) time. Unlike traditional self-attention, which compares all pairs of tokens, FFTNet operates in the frequency domain, offering an efficient and scalable approach to processing long sequences.
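
The O(n log n) claim rests on a standard identity. The DFT along the sequence axis is a dense n × n linear map in which every output frequency depends on every token. The FFT computes the same result without materializing that matrix. This NumPy check compares both routes. It illustrates the FFT itself, not FFTNet's full layer.

```python
import numpy as np

def dft_matrix(n):
    """Dense n x n DFT matrix with F[j, k] = exp(-2*pi*i*j*k / n)."""
    idx = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / n)

rng = np.random.default_rng(0)
n, d = 256, 8
x = rng.standard_normal((n, d))

dense = dft_matrix(n) @ x     # O(n^2 * d): explicit all-pairs token mixing
fast = np.fft.fft(x, axis=0)  # O(n log n * d): same result via the FFT
print(np.allclose(dense, fast))  # True
```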

At the heart of FFTNet is a learnable spectral filter that refines the frequency components of the input signal. The filter adjusts the amplitude and phase of each frequency according to its contribution to the task at hand. The filtered frequency representation then passes through modReLU, a complex-valued activation that, as commonly defined, applies ReLU to the magnitude of each Fourier coefficient (shifted by a learnable bias) while leaving its phase unchanged. This step introduces non-linearity into the model, enabling it to learn more complex mappings between input and output.
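
Here is a minimal NumPy sketch of the filter-then-activate step. It makes three assumptions:

- The filter and the modReLU bias are random stand-ins for learned parameters.
- The sequence is transformed with a real FFT (`np.fft.rfft`) for brevity. The article does not say which FFT variant FFTNet uses.
- modReLU follows its common definition, ReLU(|z| + b) · z/|z|.

Function names are illustrative, not FFTNet's code.

```python
import numpy as np

def mod_relu(z, bias):
    """modReLU: ReLU on the bias-shifted magnitude; the phase is left unchanged."""
    mag = np.abs(z)
    scale = np.maximum(mag + bias, 0.0) / (mag + 1e-9)  # epsilon avoids 0/0
    return z * scale

def spectral_filter_layer(x, filt, bias):
    """x: (n, d) real tokens; filt: (n//2 + 1, d) complex; bias: (n//2 + 1, d) real."""
    spec = np.fft.rfft(x, axis=0)  # to the frequency domain along the sequence
    spec = spec * filt             # complex gain: rescales amplitude, shifts phase
    spec = mod_relu(spec, bias)    # non-linearity on the complex coefficients
    return np.fft.irfft(spec, n=x.shape[0], axis=0)  # back to the token domain

rng = np.random.default_rng(0)
n, d = 128, 16
x = rng.standard_normal((n, d))
freqs = n // 2 + 1
filt = 1.0 + 0.1 * (rng.standard_normal((freqs, d)) + 1j * rng.standard_normal((freqs, d)))
bias = np.full((freqs, d), -0.5)  # hypothetical fixed bias; learned in practice
y = spectral_filter_layer(x, filt, bias)
print(y.shape)  # (128, 16)
```

With a negative bias b, coefficients whose magnitude is at most −b are zeroed. That is how the activation can suppress weak frequency components outright.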

Finally, the inverse FFT maps the modified frequency representation back to the original sequence domain, and a global context vector computed from the spectral domain guides the spectral filter. Integrating spatial and spectral information in this way lets FFTNet capture both local and global dependencies in the input sequence.
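
The article does not spell out how the context vector is computed or how it modulates the filter, so the following is a hypothetical sketch:

- The context is the mean spectral magnitude per channel.
- A two-layer MLP, with invented weights `w1` and `w2`, turns the context into per-channel gains on a base filter before the inverse FFT.
- modReLU is omitted to keep the sketch short.

It shows the data flow only, not FFTNet's actual parameterization.

```python
import numpy as np

def context_vector(spec):
    """Hypothetical global context: mean spectral magnitude per channel, shape (d,)."""
    return np.abs(spec).mean(axis=0)

def adaptive_filter(base_filt, ctx, w1, w2):
    """Hypothetical: a tiny MLP turns the context into a per-channel gain on the base filter."""
    hidden = np.maximum(ctx @ w1, 0.0)  # (h,)
    gain = 1.0 + np.tanh(hidden @ w2)   # (d,), each gain lies in (0, 2)
    return base_filt * gain             # broadcasts over the frequency axis

def adaptive_spectral_mixing(x, base_filt, w1, w2):
    """x: (n, d) tokens. modReLU is omitted here to keep the sketch short."""
    spec = np.fft.rfft(x, axis=0)
    filt = adaptive_filter(base_filt, context_vector(spec), w1, w2)
    return np.fft.irfft(spec * filt, n=x.shape[0], axis=0)  # inverse FFT back to tokens

rng = np.random.default_rng(0)
n, d, h = 64, 8, 4
x = rng.standard_normal((n, d))
base_filt = np.ones((n // 2 + 1, d), dtype=complex)
w1 = rng.standard_normal((d, h))
w2 = 0.1 * rng.standard_normal((h, d))
y = adaptive_spectral_mixing(x, base_filt, w1, w2)
print(y.shape)  # (64, 8)

# Sanity check: with a zero MLP output and an all-ones base filter, the layer is the identity.
print(np.allclose(adaptive_spectral_mixing(x, base_filt, w1, np.zeros((h, d))), x))  # True
```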

In their experiments, the researchers systematically evaluated FFTNet on the Long Range Arena (LRA) and ImageNet benchmarks, comparing it with the standard Transformer, FNet, and Vision Transformer (ViT) variants. Their results show that FFTNet matches or outperforms existing models on both text and image tasks.

On the ListOps task of the LRA benchmark, FFTNet reaches 37.65% accuracy, outperforming both the standard Transformer (36.06%) and FNet (35.33%). On text classification, FFTNet also consistently outperforms its counterparts, underscoring its strength in processing long sequences.
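
In absolute terms, the ListOps margins follow from simple subtraction of the quoted accuracies:

```latex
\begin{aligned}
37.65\% - 36.06\% &= 1.59 \text{ percentage points (vs.\ standard Transformer)} \\
37.65\% - 35.33\% &= 2.32 \text{ percentage points (vs.\ FNet)}
\end{aligned}
```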

For image tasks, FFTNet delivers competitive results. On ImageNet classification, the researchers built ViT variants that use FFTNet for efficient computation. Among them, FFTNetViT_B_16e200 achieves the highest accuracy, 79.0%, while FFTNetViT_L_14e150 has the lowest computational cost in FLOPs. FFTNetViT_B_16e200 itself requires 314.3M FLOPs, far fewer than the 1.3B FLOPs of the standard Vision Transformer.
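
Taking the quoted figures at face value, FFTNetViT_B_16e200 needs roughly a quarter of the standard ViT's FLOPs, about a fourfold reduction:

```latex
\frac{1.3 \times 10^{9}}{314.3 \times 10^{6}} \approx 4.14,
\qquad
\frac{314.3 \times 10^{6}}{1.3 \times 10^{9}} \approx 0.242
```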

This research highlights the potential of spectral methods for efficient, scalable sequence processing. By introducing an adaptive spectral filtering framework that runs in O(n log n) time and captures long-range dependencies, FFTNet provides a promising building block for more efficient and powerful deep learning models. As we continue to push
