The race to expand large language models (LLMs) beyond the million-token threshold has ignited a fierce debate in the AI community. Models like MiniMax's MiniMax-Text-01 boast a 4-million-token capacity, and Gemini 1.5 Pro can process up to 2 million tokens simultaneously, setting a new standard in parallel processing. These models now promise game-changing applications, like analyzing entire codebases, legal contracts or research papers in a single inference call.
At the core of this discussion is context length: the amount of text an AI model can process and retain at once. A longer context window lets a machine learning (ML) model handle far more information in a single request, reducing the need to chunk documents into sub-documents or split conversations. For perspective, a model with a 4-million-token capacity could digest roughly 10,000 pages of books in one go.
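To make that trade-off concrete, below is a minimal sketch of the decision an application has to make before a long context window removes it. The rough words-to-tokens ratio, the chunk sizes and the `needs_chunking`/`split_into_chunks` helpers are illustrative assumptions, not any particular framework's API.

```python
# Minimal sketch: decide whether a document fits in a model's context window
# or must be split into chunks. Token counts are approximated with a simple
# whitespace split; real systems use the model's own tokenizer.

def needs_chunking(document: str, context_window: int, reserved_for_output: int = 2_000) -> bool:
    """Return True if the document exceeds the usable input budget."""
    approx_tokens = int(len(document.split()) * 1.3)  # rough words-to-tokens ratio
    return approx_tokens > context_window - reserved_for_output

def split_into_chunks(document: str, chunk_tokens: int = 8_000, overlap: int = 200) -> list[str]:
    """Naive fixed-size chunking with overlap: the workaround a long context window removes."""
    words = document.split()
    step = chunk_tokens - overlap
    return [" ".join(words[i:i + chunk_tokens]) for i in range(0, len(words), step)]

# With a 4M-token window, a very large corpus may skip chunking entirely;
# with a 128K window the same corpus would still be split into many pieces.
```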
In theory, this should mean better comprehension and more sophisticated reasoning. But do these massive context windows translate to real-world business value?
As enterprises weigh the costs of scaling infrastructure against potential gains in productivity and accuracy, the question remains: Are we unlocking new frontiers in AI reasoning, or simply stretching the limits of token memory without meaningful improvements? This article examines the technical and economic trade-offs, benchmarking challenges and evolving enterprise workflows shaping the future of large-context LLMs.
Why are AI companies racing to expand context lengths?
The promise of deeper comprehension, fewer hallucinations and more seamless interactions has led to an arms race among leading labs to expand context length.
For enterprises, this means being able to analyze an entire legal contract to extract key clauses, debug a large codebase, or summarize a lengthy research paper without breaking context.
The hope is that eliminating workarounds like chunking or retrieval-augmented generation (RAG) could make AI workflows smoother and more efficient.
Solving the ‘needle-in-a-haystack’ problem
The "needle-in-a-haystack" problem refers to AI's difficulty in identifying critical information (needle) hidden within massive datasets (haystack). LLMs often miss key details, leading to inefficiencies.
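A rough sketch of how such a probe is often run: plant a single fact at different depths of a long filler document and check whether the model can retrieve it. The `query_model` function is a hypothetical placeholder for whatever LLM call is in use.

```python
# Minimal needle-in-a-haystack probe, assuming query_model(prompt) -> str
# wraps the LLM API under test (hypothetical placeholder).

def build_haystack(needle: str, filler: str, total_sentences: int, depth: float) -> str:
    """Bury the needle at a relative depth (0.0 = start, 1.0 = end) of the filler text."""
    sentences = [filler] * total_sentences
    sentences.insert(int(depth * total_sentences), needle)
    return " ".join(sentences)

def run_probe(query_model, depths=(0.0, 0.25, 0.5, 0.75, 1.0)) -> dict:
    """Check whether the model recovers the planted fact at each depth."""
    needle = "The secret access code is 7431."
    results = {}
    for depth in depths:
        haystack = build_haystack(needle, "The sky was a uniform gray that day.", 5_000, depth)
        answer = query_model(f"{haystack}\n\nWhat is the secret access code?")
        results[depth] = "7431" in answer
    return results
```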
Larger context windows help models retain more information and potentially reduce hallucinations. They also help improve accuracy and enable novel use cases.
Increasing the context window also helps the model better reference relevant details and reduces the likelihood of generating incorrect or fabricated information. A 2024 Stanford study found that 128K-token models exhibited an 18% lower hallucination rate compared to RAG systems when analyzing merger agreements.
However, early adopters have reported some challenges. For instance, JPMorgan Chase's research demonstrates how models perform poorly on approximately 75% of their context, with performance on complex financial tasks collapsing to nearly zero beyond 32K tokens. Models still broadly struggle with long-range recall, often prioritizing recent data over deeper insights.
This raises questions: Does a 4-million-token window truly enhance reasoning, or is it just a costly expansion of memory? How much of this vast input does the model actually use? And do the benefits outweigh the rising computational costs?
What are the economic trade-offs of using RAG?
RAG combines the power of LLMs with a retrieval system to fetch relevant information from an external database or document store. This allows the model to generate responses based on both pre-existing knowledge and dynamically retrieved data.
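In code, a minimal version of that loop looks something like the sketch below, where `embed` and `generate` are hypothetical stand-ins for an embedding model and an LLM call.

```python
import numpy as np

# Minimal RAG sketch: embed document chunks, retrieve the top-k most similar
# to the query, and pass only those into the prompt.

def retrieve(query: str, chunks: list[str], embed, k: int = 3) -> list[str]:
    """Rank chunks by cosine similarity to the query and keep the top k."""
    query_vec = embed(query)
    chunk_vecs = [embed(c) for c in chunks]
    scores = [
        np.dot(query_vec, v) / (np.linalg.norm(query_vec) * np.linalg.norm(v))
        for v in chunk_vecs
    ]
    top = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)[:k]
    return [chunks[i] for i in top]

def rag_answer(query: str, chunks: list[str], embed, generate) -> str:
    """Generate an answer grounded only in the retrieved chunks."""
    context = "\n\n".join(retrieve(query, chunks, embed))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return generate(prompt)
```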
As companies adopt LLMs for increasingly complex tasks, they face a critical decision: Use massive prompts with large context windows, or rely on RAG to fetch relevant information dynamically.
Comparing AI inference costs: Multi-step retrieval vs. large single prompts
While large prompts offer the advantage of simplifying workflows into a single step, they require more GPU power and memory, rendering them costly at scale. In contrast, RAG-based approaches, despite requiring multiple retrieval and generation steps, often reduce overall token consumption, leading to lower inference costs without sacrificing accuracy.
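A back-of-the-envelope comparison illustrates the gap; the prices and sizes below are assumed purely for illustration and do not reflect any vendor's actual rates.

```python
# Illustrative per-query spend: one giant prompt vs. several small RAG calls.
PRICE_PER_1K_INPUT = 0.003   # assumed $/1K input tokens
PRICE_PER_1K_OUTPUT = 0.015  # assumed $/1K output tokens

def single_prompt_cost(context_tokens: int, output_tokens: int = 1_000) -> float:
    """Cost of stuffing everything into one large prompt."""
    return (context_tokens / 1_000) * PRICE_PER_1K_INPUT + (output_tokens / 1_000) * PRICE_PER_1K_OUTPUT

def rag_cost(retrieved_tokens: int, steps: int = 3, output_tokens: int = 1_000) -> float:
    """Cost of several smaller retrieval-plus-generation calls."""
    return steps * single_prompt_cost(retrieved_tokens, output_tokens)

print(single_prompt_cost(1_000_000))  # ~$3.02 for a single 1M-token prompt
print(rag_cost(8_000))                # ~$0.12 for three 8K-token calls
```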
For most enterprises, the best approach depends on the use case:
A large context window is valuable when:
Per Google research, stock prediction models using 128K-token windows and 10 years of earnings transcripts outperformed RAG by 29%. On the other hand, GitHub Copilot's internal testing showed that tasks like monorepo migrations were completed 2.3x faster with large prompts compared to RAG.
Breaking down the diminishing returns
The limits of large context models: Latency, costs and usability
While large context models offer impressive capabilities, there are limits to how much extra context is truly beneficial. As context windows expand, three key factors come into play: latency, cost and usability.
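Latency and cost are the easiest to quantify: the work done by standard self-attention grows roughly with the square of the prompt length, so each jump in context size is far more expensive than it looks. The figures below are an order-of-magnitude illustration that ignores FlashAttention-style optimizations and KV-cache memory pressure.

```python
# Rough illustration of quadratic attention scaling relative to a 128K prompt.
def relative_attention_cost(tokens: int, baseline: int = 128_000) -> float:
    """Attention work relative to the baseline prompt length (quadratic scaling)."""
    return (tokens / baseline) ** 2

for n in (128_000, 1_000_000, 4_000_000):
    print(f"{n:>9,} tokens -> ~{relative_attention_cost(n):,.0f}x the attention work of a 128K prompt")
# 128K -> 1x, 1M -> ~61x, 4M -> ~977x
```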
Google's Infini-attention technique attempts to circumvent these trade-offs by storing compressed representations of arbitrary-length context within bounded memory. However, compression leads to information loss, and models struggle to balance immediate and historical information, which leads to performance degradation.
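The core idea can be sketched as a fixed-size memory that past segments are folded into, which is why recall stays bounded but lossy. The snippet below is a toy illustration in the spirit of the published technique, not Google's implementation.

```python
import numpy as np

# Toy compressive memory: past key/value segments are folded into a
# fixed-size matrix, so memory stays bounded however long the context grows,
# at the price of lossy recall. Illustrative only.

def elu_plus_one(x: np.ndarray) -> np.ndarray:
    """Positive feature map used for linear-attention-style reads and writes."""
    return np.where(x > 0, x + 1.0, np.exp(x))

class CompressiveMemory:
    def __init__(self, dim: int):
        self.M = np.zeros((dim, dim))   # fixed-size memory matrix
        self.z = np.zeros(dim)          # normalization term

    def update(self, keys: np.ndarray, values: np.ndarray) -> None:
        """Fold a new segment's keys/values into the bounded memory."""
        sigma_k = elu_plus_one(keys)
        self.M += sigma_k.T @ values
        self.z += sigma_k.sum(axis=0)

    def retrieve(self, queries: np.ndarray) -> np.ndarray:
        """Read out an approximation of past values for the given queries."""
        sigma_q = elu_plus_one(queries)
        return (sigma_q @ self.M) / (sigma_q @ self.z + 1e-6)[:, None]
```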