
llama.cpp: Writing a Simple C++ Inference Program for GGUF LLM Models

2025/01/14 03:04

Exploring llama.cpp internals and the flow of a basic chat program

This tutorial will guide you through the process of building a simple C++ program that performs inference on GGUF LLM models using the llama.cpp framework. We will cover the essential steps involved in loading the model, performing inference, and displaying the results. The code for this tutorial can be found here.

Prerequisites

To follow along with this tutorial, you will need the following:

A Linux-based operating system (native or WSL)

CMake installed

A GCC or Clang toolchain installed

Step 1: Setting Up the Project

Let's start by setting up our project. We will be building a C/C++ program that uses llama.cpp to perform inference on GGUF LLM models.

Create a new project directory; let's call it smol_chat.

Within the project directory, let's clone the llama.cpp repository into a subdirectory called externals. This will give us access to the llama.cpp source code and headers.

mkdir -p externals
cd externals
git clone https://github.com/ggerganov/llama.cpp.git
cd ..

Step 2: Configuring CMake

Now, let's configure our project to use CMake. This will allow us to easily compile and link our C/C++ code with the llama.cpp library.

Create a CMakeLists.txt file in the project directory.

In the CMakeLists.txt file, add the following code:

cmake_minimum_required(VERSION 3.10)
project(smol_chat)

set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# Build the cloned llama.cpp sources so their library targets are available to link against.
add_subdirectory(externals/llama.cpp)

add_executable(smol_chat main.cpp LLMInference.cpp)
target_include_directories(smol_chat PUBLIC ${CMAKE_CURRENT_SOURCE_DIR})
target_link_libraries(smol_chat PRIVATE llama common)

This code specifies the minimum CMake version, sets the C++ standard and marks it as required, pulls the cloned llama.cpp project into the build with add_subdirectory, declares the smol_chat executable from our two source files, adds the project directory to the include path, and links the llama library (together with llama.cpp's common helper library) to our executable.

Step 3: Defining the LLM Interface

Next, let's define a C++ class that will handle the high-level interactions with the LLM. This class will abstract away the low-level llama.cpp function calls and provide a convenient interface for performing inference.

In the project directory, create a header file called LLMInference.h.

In LLMInference.h, declare the following class:

#pragma once

#include "llama.h"
#include <string>
#include <vector>

class LLMInference {

public:

    LLMInference(const std::string& model_path);

    ~LLMInference();

    void startCompletion(const std::string& query);

    std::string completeNext();

private:

    // llama.cpp exposes the model, context and sampler through opaque pointers.
    llama_model*   llama_model_   = nullptr;
    llama_context* llama_context_ = nullptr;
    llama_sampler* llama_sampler_ = nullptr;

    // Element types are assumptions: the chat history, the buffer that receives the
    // rendered chat template, and the token ids of the current prompt.
    std::vector<llama_chat_message> _messages;
    std::vector<char>               _formattedMessages;
    std::vector<llama_token>        _tokens;

    llama_batch batch_;

};

This class has a public constructor that takes the path to the GGUF LLM model as an argument and a destructor that deallocates any dynamically-allocated objects. It also has two public member functions: startCompletion, which initiates the completion process for a given query, and completeNext, which fetches the next token in the LLM's response sequence.
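
The CMakeLists.txt above expects a main.cpp, which is where this interface gets used. The tutorial's main.cpp is not shown above, so the following is only a minimal sketch of a chat loop built on the class: the command-line handling, the exit prompt, and the convention that completeNext() returns an empty string once generation is finished are assumptions rather than code from the tutorial.

#include "LLMInference.h"
#include <iostream>
#include <string>

int main(int argc, char* argv[]) {
    if (argc < 2) {
        std::cerr << "usage: smol_chat <path-to-gguf-model>\n";
        return 1;
    }

    // The constructor loads the model, creates the context and configures the sampler.
    LLMInference llm(argv[1]);

    std::string query;
    while (true) {
        std::cout << "user> ";
        if (!std::getline(std::cin, query) || query == "exit") {
            break;
        }
        llm.startCompletion(query);

        // Stream the reply token by token; an empty string is assumed to signal
        // the end of generation.
        std::string piece;
        while (!(piece = llm.completeNext()).empty()) {
            std::cout << piece << std::flush;
        }
        std::cout << "\n";
    }
    return 0;
}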

Step 4: Implementing LLM Inference Functions

Now, let's define the implementation for the LLMInference class in a file called LLMInference.cpp.

In LLMInference.cpp, include the necessary headers and implement the class methods as follows:

#include "LLMInference.h"
#include "common.h"   // llama.cpp's common helpers (tokenization, token-to-piece)
#include <cstdlib>    // std::free
#include <cstring>    // strdup

LLMInference::LLMInference(const std::string& model_path) {
    // Load the GGUF model and create an inference context, both with default parameters.
    llama_model_   = llama_load_model_from_file(model_path.c_str(), llama_model_default_params());
    llama_context_ = llama_new_context_with_model(llama_model_, llama_context_default_params());

    // Samplers are composed into a chain; the final dist sampler is added so the
    // chain actually selects a token after temperature and min-p have run.
    llama_sampler_ = llama_sampler_chain_init(llama_sampler_chain_default_params());
    llama_sampler_chain_add(llama_sampler_, llama_sampler_init_temp(0.8f));
    llama_sampler_chain_add(llama_sampler_, llama_sampler_init_min_p(0.0f, 1));
    llama_sampler_chain_add(llama_sampler_, llama_sampler_init_dist(LLAMA_DEFAULT_SEED));
}

LLMInference::~LLMInference() {
    // Free the copied message contents before tearing down the llama.cpp objects.
    for (auto& msg : _messages) {
        std::free(const_cast<char*>(msg.content));
    }
    llama_sampler_free(llama_sampler_);
    llama_free(llama_context_);      // llama_free releases the context
    llama_free_model(llama_model_);
}

void LLMInference::startCompletion(const std::string& query)
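{
    // The remainder of the implementation is not reproduced here, so the body below is
    // only a sketch of how startCompletion and completeNext are commonly written on top
    // of llama.cpp. Function names and signatures differ between llama.cpp releases;
    // verify each call against the headers cloned into externals/llama.cpp.

    // Copy the user message into the history; the destructor later frees this copy.
    _messages.push_back({ "user", strdup(query.c_str()) });

    // Render the whole conversation through the chat template bundled with the model.
    _formattedMessages.resize(llama_n_ctx(llama_context_));
    const int len = llama_chat_apply_template(llama_model_, nullptr,
                                              _messages.data(), _messages.size(),
                                              /* add_ass */ true,
                                              _formattedMessages.data(),
                                              _formattedMessages.size());
    // (A robust version would grow the buffer and re-apply the template if len exceeds it.)
    const std::string prompt(_formattedMessages.data(), len);

    // Tokenize the rendered prompt and stage it as the first batch to decode.
    _tokens = common_tokenize(llama_context_, prompt, /* add_special */ true, /* parse_special */ true);
    batch_  = llama_batch_get_one(_tokens.data(), (int32_t) _tokens.size());
}

std::string LLMInference::completeNext() {
    // Decode whatever is staged in the batch, sample one token and convert it to text.
    if (llama_decode(llama_context_, batch_) != 0) {
        return "";
    }
    const llama_token token = llama_sampler_sample(llama_sampler_, llama_context_, -1);
    if (llama_token_is_eog(llama_model_, token)) {
        return "";   // empty string = end of generation
    }
    const std::string piece = common_token_to_piece(llama_context_, token, true);

    // Keep the sampled token in a member so the pointer stays valid for the next decode.
    _tokens = { token };
    batch_  = llama_batch_get_one(_tokens.data(), 1);
    return piece;
}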
