The race to expand large language models (LLMs) beyond the million-token threshold has ignited a fierce debate in the AI community.

2025/04/13 03:30

Models like MiniMax-Text-01 boast a 4-million-token capacity, and Gemini 1.5 Pro can process up to 2 million tokens at once.

The race to expand large language models (LLMs) beyond the million-token threshold has ignited a fierce debate in the AI community. Models like MiniMax's MiniMax-Text-01 boast a 4-million-token capacity, and Gemini 1.5 Pro can process up to 2 million tokens simultaneously, setting a new standard in parallel processing. These models now promise game-changing applications, like analyzing entire codebases, legal contracts or research papers in a single inference call.

At the core of this discussion is context length: the amount of text an AI model can process and retain at once. A longer context window lets a machine learning (ML) model handle much more information in a single request and reduces the need to chunk documents into sub-documents or split conversations. For scale, a model with a 4-million-token capacity could digest roughly 10,000 pages of text in one go.
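
The page figure is easy to sanity-check. The sketch below assumes roughly 400 tokens per page, which is simply the ratio implied by the article's own numbers (4,000,000 / 10,000); real token density depends on the tokenizer and the page layout. The names `TOKENS_PER_PAGE` and `pages_for_context` are illustrative, not taken from any library.

```python
# Rough sanity check of the "4 million tokens ~ 10,000 pages" figure.
# ASSUMPTION: about 400 tokens per page, the ratio implied by the article's
# own numbers; actual density varies by tokenizer and layout.
TOKENS_PER_PAGE = 400


def pages_for_context(context_tokens: int, tokens_per_page: int = TOKENS_PER_PAGE) -> int:
    """Return how many whole pages fit in a context window of the given size."""
    if tokens_per_page <= 0:
        raise ValueError("tokens_per_page must be positive")
    return context_tokens // tokens_per_page


if __name__ == "__main__":
    windows = [("128K window", 128_000), ("2M window", 2_000_000), ("4M window", 4_000_000)]
    for name, window in windows:
        print(f"{name}: ~{pages_for_context(window):,} pages")
```

Under the same assumption, a 128K-token window holds a few hundred pages, which is why the jump to millions of tokens changes what fits in a single request.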

In theory, this should mean better comprehension and more sophisticated reasoning. But do these massive context windows translate to real-world business value?

As enterprises weigh the costs of scaling infrastructure against potential gains in productivity and accuracy, the question remains: Are we unlocking new frontiers in AI reasoning, or simply stretching the limits of token memory without meaningful improvements? This article examines the technical and economic trade-offs, benchmarking challenges and evolving enterprise workflows shaping the future of large-context LLMs.

Why are AI companies racing to expand context lengths?

The promise of deeper comprehension, fewer hallucinations and more seamless interactions has led to an arms race among leading labs to expand context length.

For enterprises, this means being able to analyze an entire legal contract to extract key clauses, debug a large codebase to identify bugs or summarize a lengthy research paper without breaking context.

The hope is that eliminating workarounds like chunking or retrieval-augmented generation (RAG) could make AI workflows smoother and more efficient.

Solving the ‘needle-in-a-haystack’ problem

The "needle-in-a-haystack" problem refers to AI's difficulty in identifying critical information (needle) hidden within massive datasets (haystack). LLMs often miss key details, leading to inefficiencies.
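
A needle-in-a-haystack check can be sketched in a few lines: bury one fact at different depths in filler text and see whether the model's answer still contains it. The code below is a minimal illustration, not a standard benchmark. `ask_model` is a hypothetical callable standing in for whatever LLM client is in use, and `recency_biased_stub` is a toy stand-in that only "reads" the end of its input, so the harness runs without any API.

```python
from typing import Callable, Dict, Sequence

# Toy data for the demo; both strings are made up for illustration.
FILLER = "The quarterly report was filed on time."
NEEDLE = "The vault code is 7319."


def build_haystack(needle: str, depth: float, n_sentences: int = 1000,
                   filler: str = FILLER) -> str:
    """Insert `needle` into filler text at a relative depth (0.0 = start, 1.0 = end)."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    sentences = [filler] * n_sentences
    sentences.insert(round(depth * n_sentences), needle)
    return " ".join(sentences)


def recall_by_depth(ask_model: Callable[[str, str], str], needle: str, question: str,
                    expected: str, depths: Sequence[float]) -> Dict[float, bool]:
    """For each depth, record whether the model's answer contains the expected string."""
    results = {}
    for depth in depths:
        context = build_haystack(needle, depth)
        answer = ask_model(context, question)
        results[depth] = expected.lower() in answer.lower()
    return results


def recency_biased_stub(context: str, question: str) -> str:
    """HYPOTHETICAL model stand-in: it only looks at the last 2,000 characters."""
    tail = context[-2000:]
    marker = "The vault code is "
    start = tail.find(marker)
    if start == -1:
        return "I don't know."
    return tail[start:start + len(marker) + 4]


if __name__ == "__main__":
    results = recall_by_depth(recency_biased_stub, NEEDLE,
                              "What is the vault code?", "7319", [0.0, 0.5, 1.0])
    for depth, found in results.items():
        print(f"needle at depth {depth:.1f}: {'found' if found else 'missed'}")
```

With the stub, only the needle placed at the very end is found. Swapping in a real model client for `ask_model` turns the same loop into a rough recall-versus-depth probe.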

Larger context windows help models retain more information and potentially reduce hallucinations. They can also improve accuracy and enable novel use cases.

Increasing the context window also helps the model better reference relevant details and reduces the likelihood of generating incorrect or fabricated information. A 2024 Stanford study found that 128K-token models exhibited an 18% lower hallucination rate compared to RAG systems when analyzing merger agreements.

However, early adopters have reported some challenges. For instance, JPMorgan Chase's research demonstrates how models perform poorly on approximately 75% of their context, with performance on complex financial tasks collapsing to nearly zero beyond 32K tokens. Models still broadly struggle with long-range recall, often prioritizing recent data over deeper insights.

This raises questions: Does a 4-million-token window truly enhance reasoning, or is it just a costly expansion of memory? How much of this vast input does the model actually use? And do the benefits outweigh the rising computational costs?

What are the economic trade-offs of using RAG?

RAG combines the power of LLMs with a retrieval system to fetch relevant information from an external database or document store. This allows the model to generate responses based on both pre-existing knowledge and dynamically retrieved data.
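
The retrieve-then-generate flow can be illustrated with a deliberately simple sketch. It ranks passages by word overlap with the query and assembles a prompt from the top matches; production systems typically use embeddings or BM25 instead, and the final call to an LLM is left out. The document store, function names and prompt wording are all assumptions made for the example.

```python
import re
from collections import Counter
from typing import List

# Toy document store; the passages are invented for illustration.
DOCS = [
    "The termination clause allows either party to exit with 90 days notice.",
    "The indemnification clause caps liability at the contract value.",
    "Office hours are 9 to 5 on weekdays.",
]


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, passage: str) -> int:
    """Toy relevance score: occurrences in the passage of each distinct query word."""
    passage_counts = Counter(tokenize(passage))
    return sum(passage_counts[word] for word in set(tokenize(query)))


def retrieve(query: str, store: List[str], k: int = 2) -> List[str]:
    """Return the k passages with the highest overlap score."""
    return sorted(store, key=lambda p: score(query, p), reverse=True)[:k]


def build_prompt(query: str, passages: List[str]) -> str:
    """Combine retrieved passages and the question into a single prompt string."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    question = "What notice does the termination clause require?"
    print(build_prompt(question, retrieve(question, DOCS, k=2)))
```

Only the retrieved passages reach the model, so prompt size tracks `k` and the passage length rather than the size of the whole store.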

As companies adopt LLMs for increasingly complex tasks, they face a critical decision: Use massive prompts with large context windows, or rely on RAG to fetch relevant information dynamically.

Comparing AI inference costs: Multi-step retrieval vs. large single prompts

While large prompts offer the advantage of simplifying workflows into a single step, they require more GPU power and memory, rendering them costly at scale. In contrast, RAG-based approaches, despite requiring multiple retrieval and generation steps, often reduce overall token consumption, leading to lower inference costs without sacrificing accuracy.
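
A back-of-the-envelope comparison shows why token consumption drives this trade-off. The sketch below counts input tokens only and uses a made-up price of $1.00 per million tokens; every number in it is an assumption chosen for illustration, and it ignores output tokens, indexing and embedding costs, and any prompt caching a provider may offer.

```python
# Input-token comparison: one large prompt vs. multi-step RAG.
# ASSUMPTION: all sizes and the price below are hypothetical.

def single_prompt_tokens(corpus_tokens: int, question_tokens: int) -> int:
    """The whole corpus is sent along with the question in one call."""
    return corpus_tokens + question_tokens


def rag_tokens(chunk_tokens: int, chunks_per_step: int, steps: int,
               question_tokens: int) -> int:
    """Each step sends the question plus a handful of retrieved chunks."""
    return steps * (chunks_per_step * chunk_tokens + question_tokens)


def cost_usd(tokens: int, price_per_million: float) -> float:
    """Convert a token count to dollars at a flat per-million-token price."""
    return tokens / 1_000_000 * price_per_million


if __name__ == "__main__":
    price = 1.00  # hypothetical USD per million input tokens
    big = single_prompt_tokens(corpus_tokens=1_000_000, question_tokens=200)
    rag = rag_tokens(chunk_tokens=500, chunks_per_step=8, steps=3, question_tokens=200)
    print(f"single prompt: {big:,} tokens, ${cost_usd(big, price):.4f}")
    print(f"multi-step RAG: {rag:,} tokens, ${cost_usd(rag, price):.4f}")
```

The gap narrows as the number of retrieval steps or the chunk size grows, and it says nothing about answer quality, which is the other half of the decision.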

For most enterprises, the best approach depends on the use case:

A large context window is valuable when:

Per Google research, stock prediction models using 128K-token windows and 10 years of earnings transcripts outperformed RAG by 29%. Similarly, GitHub Copilot's internal testing showed that tasks like monorepo migrations were completed 2.3x faster with large prompts than with RAG.

Breaking down the diminishing returns

The limits of large context models: Latency, costs and usability

While large context models offer impressive capabilities, there are limits to how much extra context is truly beneficial. As context windows expand, three key factors come into play: latency, costs and usability.

Google's Infini-attention technique attempts to circumvent these trade-offs by storing compressed representations of arbitrary-length context within bounded memory. However, compression leads to information loss, and models struggle to balance immediate and historical information. This leads to performance degradations and
