llama.cpp: Writing A Simple C++ Inference Program for GGUF LLM Models

2025/01/14 03:04

This tutorial will guide you through the process of building a simple C++ program that performs inference on GGUF LLM models using the llama.cpp framework. We will cover the essential steps involved in loading the model, performing inference, and displaying the results. The code for this tutorial can be found here.

Prerequisites

To follow along with this tutorial, you will need the following:

A Linux-based operating system (native or WSL)

CMake installed

A GCC or Clang C++ toolchain installed

Step 1: Setting Up the Project

Let's start by setting up our project. We will be building a C/C++ program that uses llama.cpp to perform inference on GGUF LLM models.

Create a new project directory; let's call it smol_chat.

Within the project directory, let's clone the llama.cpp repository into a subdirectory called externals. This will give us access to the llama.cpp source code and headers.

mkdir -p externals
cd externals
git clone https://github.com/ggerganov/llama.cpp.git
cd ..

Step 2: Configuring CMake

Now, let's configure our project to use CMake. This will allow us to easily compile and link our C/C++ code with the llama.cpp library.

Create a CMakeLists.txt file in the project directory.

In the CMakeLists.txt file, add the following code:

cmake_minimum_required(VERSION 3.10)
project(smol_chat)

set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# build llama.cpp from the sources cloned into externals/
add_subdirectory("${CMAKE_CURRENT_SOURCE_DIR}/externals/llama.cpp")

add_executable(smol_chat main.cpp LLMInference.cpp)
target_include_directories(smol_chat PUBLIC ${CMAKE_CURRENT_SOURCE_DIR})
target_link_libraries(smol_chat PRIVATE llama common)

This code specifies the minimum CMake version, sets the C++ standard, builds the llama.cpp sources we cloned into externals/ via add_subdirectory, defines an executable named smol_chat from main.cpp and the LLMInference.cpp we will write shortly, adds the project directory to its include path, and links the llama and common libraries that llama.cpp provides against our executable.
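
With the llama.cpp checkout in externals/ and this CMakeLists.txt in place, a standard out-of-source CMake build from the project root should work; the build/ directory name below is just a conventional choice for this example:

cmake -B build
cmake --build build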

Step 3: Defining the LLM Interface

Next, let's define a C++ class that will handle the high-level interactions with the LLM. This class will abstract away the low-level llama.cpp function calls and provide a convenient interface for performing inference.

In the project directory, create a header file called LLMInference.h.

In LLMInference.h, declare the following class:

#pragma once

#include <string>
#include <vector>

#include "llama.h"

class LLMInference {

public:

    LLMInference(const std::string& model_path);
    ~LLMInference();

    void startCompletion(const std::string& query);
    std::string completeNext();

private:

    // llama.cpp handles for the loaded model, the inference context, and the sampler chain
    llama_model*   llama_model_   = nullptr;
    llama_context* llama_context_ = nullptr;
    llama_sampler* llama_sampler_ = nullptr;

    // chat messages and the buffer produced by applying the model's chat template
    std::vector<llama_chat_message> _messages;
    std::vector<char>               _formattedMessages;

    // tokens of the prompt currently being processed
    std::vector<llama_token> _tokens;

    llama_batch batch_;
};

This class has a public constructor that takes the path to the GGUF LLM model as an argument and a destructor that deallocates any dynamically allocated objects. It also has two public member functions: startCompletion, which initiates the completion process for a given query, and completeNext, which fetches the next token in the LLM's response sequence.
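
Before implementing these methods, it is worth seeing how a caller would drive this interface. The main.cpp below is only an illustrative sketch: the prompt string is arbitrary, and it assumes the convention that completeNext returns an empty string once the model has finished generating.

#include <iostream>
#include <string>

#include "LLMInference.h"

int main(int argc, char* argv[]) {
    if (argc < 2) {
        std::cerr << "usage: smol_chat <path-to-gguf-model>" << std::endl;
        return 1;
    }
    // load the model once, then stream the response token by token
    LLMInference llm(argv[1]);
    llm.startCompletion("Write a haiku about the sea.");
    // assumption: completeNext() returns an empty string when generation is done
    std::string piece = llm.completeNext();
    while (!piece.empty()) {
        std::cout << piece << std::flush;
        piece = llm.completeNext();
    }
    std::cout << std::endl;
    return 0;
}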

Step 4: Implementing LLM Inference Functions

Now, let's define the implementation for the LLMInference class in a file called LLMInference.cpp.

In LLMInference.cpp, include the necessary headers and implement the class methods as follows:

#include "LLMInference.h"

#include "common.h"

#include

#include

#include

LLMInference::LLMInference(const std::string& model_path) {

llama_load_model_from_file(&llama_model_, model_path.c_str(), llama_model_default_params());

llama_new_context_with_model(&llama_context_, &llama_model_);

llama_sampler_init_temp(&llama_sampler_, 0.8f);

llama_sampler_init_min_p(&llama_sampler_, 0.0f);

}

LLMInference::~LLMInference() {

for (auto& msg : _messages) {

std::free(msg.content);

}

llama_free_model(&llama_model_);

llama_free_context(&llama_context_);

}

void LLMInference::startCompletion(const std::string& query)
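
Picking up from the startCompletion declaration above, the remaining methods can follow the same pattern as llama.cpp's own minimal examples. The sketch below is one way to do it, under a few assumptions: the chat-template bookkeeping that _messages and _formattedMessages are reserved for is skipped, the helpers common_tokenize and common_token_to_piece come from the common library we linked, completeNext signals the end of generation by returning an empty string, and exact llama.cpp function names can differ between versions.

// Sketch only: the query is tokenized and decoded directly; a full chat app
// would first render _messages with the model's chat template.
void LLMInference::startCompletion(const std::string& query) {
    // tokenize the raw query and queue it as the first batch to decode
    _tokens = common_tokenize(llama_context_, query, /*add_special=*/true, /*parse_special=*/true);
    batch_  = llama_batch_get_one(_tokens.data(), static_cast<int32_t>(_tokens.size()));
}

std::string LLMInference::completeNext() {
    // run the pending batch through the model to produce logits
    if (llama_decode(llama_context_, batch_) != 0) {
        throw std::runtime_error("llama_decode() failed");
    }
    // sample the next token from the logits of the last position
    llama_token token = llama_sampler_sample(llama_sampler_, llama_context_, -1);
    if (llama_token_is_eog(llama_model_, token)) {
        // end of generation: return an empty string so the caller can stop
        return "";
    }
    // detokenize the sampled token and make it the next single-token batch
    std::string piece = common_token_to_piece(llama_context_, token, /*special=*/true);
    _tokens = { token };
    batch_  = llama_batch_get_one(_tokens.data(), static_cast<int32_t>(_tokens.size()));
    return piece;
}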
