Exploring llama.cpp Internals and a Basic Chat Program Flow
This tutorial will guide you through the process of building a simple C++ program that performs inference on GGUF LLM models using the llama.cpp framework. We will cover the essential steps involved in loading the model, performing inference, and displaying the results. The code for this tutorial can be found here.
Prerequisites
To follow along with this tutorial, you will need the following:
A Linux-based operating system (native or WSL)
CMake installed
A GCC or Clang C++ toolchain installed
Step 1: Setting Up the Project
Let's start by setting up our project. We will be building a C/C++ program that uses llama.cpp to perform inference on GGUF LLM models.
Create a new project directory; let's call it smol_chat.
Within the project directory, let's clone the llama.cpp repository into a subdirectory called externals. This will give us access to the llama.cpp source code and headers.
mkdir -p externals
cd externals
git clone https://github.com/ggerganov/llama.cpp.git
cd ..
Step 2: Configuring CMake
Now, let's configure our project to use CMake. This will allow us to easily compile and link our C/C++ code with the llama.cpp library.
Create a CMakeLists.txt file in the project directory.
In the CMakeLists.txt file, add the following code:
cmake_minimum_required(VERSION 3.10)
project(smol_chat)

set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# Build llama.cpp (and its common helper library) as part of our project.
set(LLAMA_BUILD_COMMON ON)
add_subdirectory("${CMAKE_CURRENT_SOURCE_DIR}/externals/llama.cpp")

add_executable(smol_chat main.cpp LLMInference.cpp)
target_include_directories(smol_chat PUBLIC ${CMAKE_CURRENT_SOURCE_DIR})
target_link_libraries(smol_chat PRIVATE llama common)
This code specifies the minimum CMake version, sets and enforces the C++ standard, pulls in the llama.cpp sources we cloned under externals as a subproject, adds an executable named smol_chat built from main.cpp and LLMInference.cpp, adds the current source directory to the include path, and links the llama and common libraries against our executable.
Step 3: Defining the LLM Interface
Next, let's define a C++ class that will handle the high-level interactions with the LLM. This class will abstract away the low-level llama.cpp function calls and provide a convenient interface for performing inference.
In the project directory, create a header file called LLMInference.h.
In LLMInference.h, declare the following class:
#pragma once

#include "llama.h"
#include <string>
#include <vector>

class LLMInference {
public:
    LLMInference(const std::string& model_path);
    ~LLMInference();

    void startCompletion(const std::string& query);
    std::string completeNext();

private:
    llama_model* llama_model_ = nullptr;
    llama_context* llama_context_ = nullptr;
    llama_sampler* llama_sampler_ = nullptr;
    std::vector<llama_chat_message> _messages;          // conversation history
    std::vector<char>               _formattedMessages; // buffer for formatted prompt text
    std::vector<llama_token>        _tokens;            // tokens pending decode
    llama_batch                     batch_;
};
This class has a public constructor that takes the path to the GGUF LLM model as an argument and a destructor that deallocates any dynamically-allocated objects. It also has two public member functions: startCompletion, which initiates the completion process for a given query, and completeNext, which fetches the next token in the LLM's response sequence.
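To see how this interface is meant to be driven, here is a minimal, illustrative sketch of the main.cpp referenced in CMakeLists.txt. It is not part of the original tutorial code; in particular, it assumes completeNext() returns an empty string once the response is finished.

// main.cpp -- a minimal driver for LLMInference (illustrative sketch).
#include "LLMInference.h"
#include <iostream>
#include <string>

int main(int argc, char* argv[]) {
    if (argc < 2) {
        std::cerr << "usage: smol_chat <path-to-gguf-model>\n";
        return 1;
    }
    LLMInference llm(argv[1]);

    std::string query;
    std::cout << "> " << std::flush;
    while (std::getline(std::cin, query)) {
        llm.startCompletion(query);
        // Stream the response token by token; an empty string marks the end
        // of the sequence (an assumption made for this sketch).
        for (std::string piece = llm.completeNext(); !piece.empty(); piece = llm.completeNext()) {
            std::cout << piece << std::flush;
        }
        std::cout << "\n> " << std::flush;
    }
    return 0;
}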
Step 4: Implementing LLM Inference Functions
Now, let's define the implementation for the LLMInference class in a file called LLMInference.cpp.
In LLMInference.cpp, include the necessary headers and implement the class methods as follows:
#include "LLMInference.h"
#include "common.h"
#include <cstdlib>
#include <cstring>
#include <string>
LLMInference::LLMInference(const std::string& model_path) {
    // Load the GGUF model and create an inference context with default parameters.
    llama_model_ = llama_load_model_from_file(model_path.c_str(), llama_model_default_params());
    llama_context_ = llama_new_context_with_model(llama_model_, llama_context_default_params());

    // Build a sampler chain: min-p filtering, temperature scaling, and a final
    // distribution sampler that actually picks the next token.
    llama_sampler_ = llama_sampler_chain_init(llama_sampler_chain_default_params());
    llama_sampler_chain_add(llama_sampler_, llama_sampler_init_min_p(0.0f, 1));
    llama_sampler_chain_add(llama_sampler_, llama_sampler_init_temp(0.8f));
    llama_sampler_chain_add(llama_sampler_, llama_sampler_init_dist(LLAMA_DEFAULT_SEED));
}
LLMInference::~LLMInference() {
    // Free the heap-allocated message contents and release the llama.cpp handles.
    for (auto& msg : _messages) {
        std::free(const_cast<char*>(msg.content));
    }
    llama_sampler_free(llama_sampler_);
    llama_free(llama_context_);
    llama_free_model(llama_model_);
}
void LLMInference::startCompletion(const std::string& query)
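{
    // NOTE: the body of this method and completeNext() below are a sketch based
    // on the class description in Step 3, not the tutorial's original code. The
    // llama.cpp helpers used here (common_tokenize, llama_batch_get_one,
    // llama_sampler_sample, common_token_to_piece, llama_token_is_eog) exist in
    // recent versions of the library, but their exact signatures vary between
    // releases, so adjust them to the revision you cloned. Chat templating is
    // skipped for brevity.

    // Record the user turn so the conversation history can grow over time.
    _messages.push_back({"user", strdup(query.c_str())});

    // Tokenize the query (common_tokenize is a helper from common.h) and queue
    // the prompt tokens as a single batch for the next decode call.
    _tokens = common_tokenize(llama_context_, query, /*add_special=*/true, /*parse_special=*/true);
    batch_  = llama_batch_get_one(_tokens.data(), (int32_t) _tokens.size());
}

std::string LLMInference::completeNext() {
    // Run the model on the pending batch.
    if (llama_decode(llama_context_, batch_) != 0) {
        return "";  // decode failed; end the response
    }

    // Sample the next token using the sampler chain built in the constructor.
    llama_token token = llama_sampler_sample(llama_sampler_, llama_context_, -1);

    // An empty string signals the end of the response (see the main.cpp sketch).
    if (llama_token_is_eog(llama_get_model(llama_context_), token)) {
        return "";
    }

    // Convert the token to text and queue it as the next single-token batch.
    std::string piece = common_token_to_piece(llama_context_, token, /*special=*/true);
    _tokens = { token };
    batch_  = llama_batch_get_one(_tokens.data(), (int32_t) _tokens.size());
    return piece;
}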