Integrating human values into a model trained with learning-based algorithms requires fine-tuning the LLM, which is computationally expensive and time-consuming, and the fine-tuned model can still produce responses that are biased or undesirable to the user. What is needed is a model that can efficiently adapt to user preferences in real time through algorithms that intervene at inference time. By freezing the base model, this approach avoids repeatedly retraining models to obtain the desired behavior and eliminates the computational cost of fine-tuning LLMs.
Researchers developed inference-time alignment methods that integrate human values into LLMs using implicit and explicit value functions, without changing the base model. Implicit functions operate during token generation, evaluating candidates word by word and preferring the output with the highest probability. In contrast, explicit functions impose a rigid structure to evaluate larger chunks of text, generating the next sequence of words with the highest probability while maintaining overall context. The explicit function is inflexible and computationally expensive and fails to address token-level optimization, while the implicit function suffers from interpretability issues and requires frequent forward passes, leading to low real-time efficiency.
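To make the two guidance signals concrete, here is a minimal sketch, assuming hypothetical helper names rather than the paper's actual code: the implicit value is realized as the log-probability gap between a tuned model and the frozen base model (in the spirit of EFT, discussed below), and the explicit value as a scalar score over a whole sequence, with tuned-model log-likelihood standing in for a trained reward model.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
base = AutoModelForCausalLM.from_pretrained("gpt2")
tuned = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in for an aligned/tuned model

@torch.no_grad()
def implicit_bonus(ids):
    # Per-token guidance: log p_tuned - log p_base over the next-token vocabulary.
    lp_tuned = F.log_softmax(tuned(ids).logits[:, -1, :], dim=-1)
    lp_base = F.log_softmax(base(ids).logits[:, -1, :], dim=-1)
    return lp_tuned - lp_base

@torch.no_grad()
def explicit_score(ids):
    # Sequence-level guidance; a trained reward model would normally go here.
    # Tuned-model log-likelihood serves as a stand-in scalar score.
    return -tuned(ids, labels=ids).loss.item()
```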
To address the drawbacks of both functions, the proposed method, Integrated Value Guidance (IVG), combines the implicit function's token-level optimization with the explicit function's broader, sequence-level perspective. This combination sidesteps the adaptation challenges and alignment-efficacy trade-offs of either function alone, narrowing performance discrepancies and making the method easier to implement. These advantages translate into stronger performance on tasks such as controlled sentiment generation and summarization; IVG applied to smaller models like GPT-2 can compete with much larger models.
IVG incorporates the two value functions, implicit and explicit, to align the model with human values. First, token-wise sampling uses the implicit function to guide individual tokens up to a fixed sequence length, generating multiple candidate sequences. Then, chunk-level beam search compares these candidates with the explicit function and selects the one scored highest. Although this procedure makes the output more robust, the frequent forward passes increase computation at inference time, leading to slower responses.
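Under the same assumptions, the sketch below shows how the two stages could compose: token-wise sampling biased by the implicit bonus, wrapped in a chunk-level beam search ranked by the explicit score. The parameters chunk_len, beam_width, and beta are illustrative and not taken from the paper.

```python
@torch.no_grad()
def ivg_decode(prompt_ids, chunk_len=16, beam_width=4, num_chunks=8, beta=1.0):
    # Token-wise guided sampling nested inside a chunk-level beam search.
    beams = [prompt_ids]
    for _ in range(num_chunks):
        candidates = []
        for seq in beams:
            for _ in range(beam_width):                # expand each beam
                ids = seq
                for _ in range(chunk_len):             # token-wise sampling
                    logits = base(ids).logits[:, -1, :] + beta * implicit_bonus(ids)
                    next_id = torch.multinomial(F.softmax(logits, dim=-1), num_samples=1)
                    ids = torch.cat([ids, next_id], dim=-1)
                candidates.append(ids)
        # Chunk-level beam search: keep the sequences the explicit value rates highest.
        candidates.sort(key=explicit_score, reverse=True)
        beams = candidates[:beam_width]
    return beams[0]
```

The frequent forward passes inside the inner loop, two per generated token for the implicit bonus, are exactly the inference-time cost the paragraph above describes.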
Researchers used two experimental setups to evaluate IVG: (1) controlled sentiment generation and summarization, and (2) instruction-following. In the first, the GPT-2 model family is guided using synthetic datasets from a gold reward model to generate positive movie reviews and summarize Reddit posts. The second evaluates instruction-tuned models on the AlpacaEval 2.0 benchmark. It employs Tulu Guidance, which uses specific models for the implicit function and trains a reward-based model for the explicit function, and Ultraguidance, which fine-tunes a model with Direct Preference Optimization (DPO) for both functions. GPT-4-turbo served as the reference for judging responses in the second experiment, and IVG consistently performed well.
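Continuing the sketch, a toy invocation mirroring the positive-movie-review setup might look like this:

```python
ids = tok("This movie was", return_tensors="pt").input_ids
out = ivg_decode(ids, chunk_len=8, beam_width=3, num_chunks=4)
print(tok.decode(out[0], skip_special_tokens=True))
```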
In addition to these two experiments, an ablation study showed that chunk-level beam search (CBS) offers higher speed efficiency than Emulator Fine-Tuning (EFT), which uses the implicit function to emulate fine-tuning. These results indicate that CBS is the more practical choice.
In conclusion, Integrated Value Guidance (IVG) offers a novel and efficient approach to aligning large language models with human preferences purely at inference time, bypassing the complexities of traditional fine-tuning. By leveraging implicit and explicit value functions, IVG enhances performance in both token-wise sampling and chunk-level decoding, as demonstrated by significant improvements on sentiment generation, summarization, and instruction-following tasks. The results show that IVG is a versatile method, with strong empirical evidence that it can outclass existing approaches, making it a promising solution for aligning large models in real-world applications.