
gollama

package module
v0.2.0-llamacpp.b6099
Published: Aug 6, 2025 License: MIT Imports: 20 Imported by: 0

README

Gollama.cpp


A high-performance Go binding for llama.cpp using purego for cross-platform compatibility without CGO.

Features

  • Pure Go: No CGO required, uses purego for C interop
  • Cross-Platform: Supports macOS (CPU/Metal), Linux (CPU/NVIDIA/AMD), Windows (CPU/NVIDIA/AMD)
  • Performance: Direct bindings to llama.cpp shared libraries
  • Compatibility: Version-synchronized with llama.cpp releases
  • Easy Integration: Simple Go API for LLM inference
  • GPU Acceleration: Supports Metal, CUDA, HIP, Vulkan, OpenCL, SYCL, and other backends

Supported Platforms

Gollama.cpp uses a platform-specific architecture with build tags to ensure optimal compatibility and performance across all operating systems.

✅ Fully Supported Platforms
macOS
  • CPU: Intel x64, Apple Silicon (ARM64)
  • GPU: Metal (Apple Silicon)
  • Status: Full feature support with purego
  • Build Tags: Uses !windows build tag
Linux
  • CPU: x86_64, ARM64
  • GPU: NVIDIA (CUDA/Vulkan), AMD (HIP/ROCm/Vulkan), Intel (SYCL/Vulkan)
  • Status: Full feature support with purego
  • Build Tags: Uses !windows build tag
🚧 In Development
Windows
  • CPU: x86_64, ARM64
  • GPU: NVIDIA (CUDA/Vulkan), AMD (HIP/Vulkan), Intel (SYCL/Vulkan), Qualcomm Adreno (OpenCL) - planned
  • Status: Build compatibility implemented, runtime support in development
  • Build Tags: Uses windows build tag with syscall-based library loading
  • Current State:
    • ✅ Compiles without errors on Windows
    • ✅ Cross-compilation from other platforms works
    • 🚧 Runtime functionality being implemented
    • 🚧 GPU acceleration being added
Platform-Specific Implementation Details

Our platform abstraction layer uses Go build tags to provide the following (see the sketch after this list):

  • Unix-like systems (!windows): Uses purego for dynamic library loading
  • Windows (windows): Uses native Windows syscalls (LoadLibraryW, FreeLibrary, GetProcAddress)
  • Cross-compilation: Supports building for any platform from any platform
  • Automatic detection: Runtime platform capability detection
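
A minimal sketch of this split, using hypothetical file and function names rather than the package's actual source layout:

// loader_unix.go (hypothetical illustration)
//go:build !windows

package platform

import "github.com/ebitengine/purego"

// openLibrary loads a shared library via purego's Dlopen on Unix-like systems.
func openLibrary(path string) (uintptr, error) {
    return purego.Dlopen(path, purego.RTLD_NOW|purego.RTLD_GLOBAL)
}

and the Windows counterpart:

// loader_windows.go (hypothetical illustration)
//go:build windows

package platform

import "golang.org/x/sys/windows"

// openLibrary loads a DLL via the native LoadLibraryW syscall on Windows.
func openLibrary(path string) (uintptr, error) {
    handle, err := windows.LoadLibrary(path)
    return uintptr(handle), err
}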

Installation

go get github.com/dianlight/gollama.cpp

The Go module automatically downloads pre-built llama.cpp libraries from the official ggml-org/llama.cpp releases on first use. No manual compilation required!

Cross-Platform Development

Build Compatibility Matrix

Our CI system tests compilation across all platforms:

| Target Platform  | Build From Linux | Build From macOS | Build From Windows |
|------------------|------------------|------------------|--------------------|
| Linux (amd64)    | ✅               | ✅               | ✅                 |
| Linux (arm64)    | ✅               | ✅               | ✅                 |
| macOS (amd64)    | ✅               | ✅               | ✅                 |
| macOS (arm64)    | ✅               | ✅               | ✅                 |
| Windows (amd64)  | ✅               | ✅               | ✅                 |
| Windows (arm64)  | ✅               | ✅               | ✅                 |
Development Workflow
# Test cross-compilation for all platforms
make test-cross-compile

# Build for specific platform
GOOS=windows GOARCH=amd64 go build ./...
GOOS=linux GOARCH=arm64 go build ./...
GOOS=darwin GOARCH=arm64 go build ./...

# Run platform-specific tests
go test -v -run TestPlatformSpecific ./...

Quick Start

package main

import (
    "fmt"
    "log"

    "github.com/dianlight/gollama.cpp"
)

func main() {
    // Initialize the library
    gollama.Backend_init()
    defer gollama.Backend_free()

    // Load model
    params := gollama.Model_default_params()
    model, err := gollama.Model_load_from_file("path/to/model.gguf", params)
    if err != nil {
        log.Fatal(err)
    }
    defer gollama.Model_free(model)

    // Create context
    ctxParams := gollama.Context_default_params()
    ctx, err := gollama.Init_from_model(model, ctxParams)
    if err != nil {
        log.Fatal(err)
    }
    defer gollama.Free(ctx)

    // Tokenize and generate
    prompt := "The future of AI is"
    tokens, err := gollama.Tokenize(model, prompt, true, false)
    if err != nil {
        log.Fatal(err)
    }

    // Create batch and decode
    batch := gollama.Batch_init(int32(len(tokens)), 0, 1)
    defer gollama.Batch_free(batch)

    for i, token := range tokens {
        // Request logits only for the last prompt token; that position is sampled below
        gollama.Batch_add(batch, token, int32(i), []int32{0}, i == len(tokens)-1)
    }

    if err := gollama.Decode(ctx, batch); err != nil {
        log.Fatal(err)
    }

    // Sample the next token greedily from the logits at the last position
    sampler := gollama.Sampler_init_greedy()
    defer gollama.Sampler_free(sampler)

    newToken := gollama.Sampler_sample(sampler, ctx, -1)
    
    // Convert token to text
    text := gollama.Token_to_piece(model, newToken, false)
    fmt.Printf("Generated: %s\n", text)
}
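
Continuing from the Quick Start above (reusing model, ctx, sampler, and newToken), a simple generation loop can be sketched with the documented Batch_get_one and Sampler_sample helpers. This is a simplification: a real loop would also stop when the model emits its end-of-sequence token.

// Generate up to maxTokens additional tokens after the prompt (illustrative sketch)
maxTokens := 32
current := newToken
for i := 0; i < maxTokens; i++ {
    // Feed the previously sampled token back in as a single-token batch
    next := gollama.Batch_get_one([]gollama.LlamaToken{current})
    if err := gollama.Decode(ctx, next); err != nil {
        log.Fatal(err)
    }

    // Greedily pick the next token from the logits of the last position
    current = gollama.Sampler_sample(sampler, ctx, -1)
    fmt.Print(gollama.Token_to_piece(model, current, false))
}
fmt.Println()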

Advanced Usage

GPU Configuration

Gollama.cpp automatically downloads the appropriate pre-built binaries with GPU support and configures the optimal backend:

// Automatic GPU detection and configuration
params := gollama.Model_default_params()
params.NGpuLayers = 32 // Offload layers to GPU (if available)

// Detect available GPU backend
backend := gollama.DetectGpuBackend()
fmt.Printf("Using GPU backend: %s\n", backend.String())

// Platform-specific optimizations:
// - macOS: Uses Metal when available
// - Linux: Supports CUDA, HIP, Vulkan, and SYCL
// - Windows: Supports CUDA, HIP, Vulkan, OpenCL, and SYCL
params.SplitMode = gollama.LLAMA_SPLIT_MODE_LAYER
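
The detected backend can also drive how aggressively layers are offloaded. The layer counts below are arbitrary illustrations, not recommended values:

// Pick a GPU layer count based on the detected backend (illustrative values only)
modelParams := gollama.Model_default_params()
switch gollama.DetectGpuBackend() {
case gollama.LLAMA_GPU_BACKEND_METAL, gollama.LLAMA_GPU_BACKEND_CUDA, gollama.LLAMA_GPU_BACKEND_HIP:
    modelParams.NGpuLayers = 32 // offload most layers on strong GPU backends
case gollama.LLAMA_GPU_BACKEND_VULKAN, gollama.LLAMA_GPU_BACKEND_SYCL, gollama.LLAMA_GPU_BACKEND_OPENCL:
    modelParams.NGpuLayers = 16 // a more conservative default elsewhere
default:
    modelParams.NGpuLayers = 0 // CPU-only fallback
}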
GPU Support Matrix
| Platform | GPU Type        | Backend   | Status                   |
|----------|-----------------|-----------|--------------------------|
| macOS    | Apple Silicon   | Metal     | ✅ Supported             |
| macOS    | Intel/AMD       | CPU only  | ✅ Supported             |
| Linux    | NVIDIA          | CUDA      | ✅ Available in releases |
| Linux    | NVIDIA          | Vulkan    | ✅ Available in releases |
| Linux    | AMD             | HIP/ROCm  | ✅ Available in releases |
| Linux    | AMD             | Vulkan    | ✅ Available in releases |
| Linux    | Intel           | SYCL      | ✅ Available in releases |
| Linux    | Intel/Other     | Vulkan    | ✅ Available in releases |
| Linux    | Intel/Other     | CPU       | ✅ Fallback              |
| Windows  | NVIDIA          | CUDA      | ✅ Available in releases |
| Windows  | NVIDIA          | Vulkan    | ✅ Available in releases |
| Windows  | AMD             | HIP       | ✅ Available in releases |
| Windows  | AMD             | Vulkan    | ✅ Available in releases |
| Windows  | Intel           | SYCL      | ✅ Available in releases |
| Windows  | Qualcomm Adreno | OpenCL    | ✅ Available in releases |
| Windows  | Intel/Other     | Vulkan    | ✅ Available in releases |
| Windows  | Intel/Other     | CPU       | ✅ Fallback              |

The library automatically downloads pre-built binaries from the official llama.cpp releases with the appropriate GPU support for your platform. The download happens automatically on first use!

Model Loading Options
modelParams := gollama.Model_default_params()
modelParams.UseMmap = 1   // Memory mapping (bool as uint8)
modelParams.UseMlock = 1  // Memory locking (bool as uint8)
modelParams.VocabOnly = 0 // Load full model, not just the vocabulary

ctxParams := gollama.Context_default_params()
ctxParams.NCtx = 4096 // Context size in tokens
Library Management

Gollama.cpp automatically downloads pre-built binaries from the official llama.cpp releases. You can also manage libraries manually:

// Load a specific version
err := gollama.LoadLibraryWithVersion("b6099")

// Clean cache to force re-download
err = gollama.CleanLibraryCache()
Command Line Tools
# Download libraries for current platform
make download-libs

# Download libraries for all platforms  
make download-libs-all

# Test download functionality
make test-download

# Test GPU detection and functionality
make test-gpu

# Detect available GPU backends
make detect-gpu

# Clean library cache
make clean-libs
Available Library Variants

The downloader automatically selects the best variant for your platform:

  • macOS: Metal-enabled binaries (arm64/x64)
  • Linux: CPU-optimized binaries (CUDA/HIP/Vulkan/SYCL versions available)
  • Windows: CPU-optimized binaries (CUDA/HIP/Vulkan/OpenCL/SYCL versions available)
Cache Location

Downloaded libraries are cached in:

  • Linux/macOS: ~/.cache/gollama/libs/
  • Windows: %LOCALAPPDATA%/gollama/libs/
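
If you need to locate the cache yourself (for inspection or manual cleanup), the documented paths can be derived with the standard library. This sketch only mirrors the locations listed above; it is not the package's internal logic:

package main

import (
    "fmt"
    "os"
    "path/filepath"
    "runtime"
)

func main() {
    // Mirror the documented cache locations for downloaded libraries
    var cacheDir string
    if runtime.GOOS == "windows" {
        cacheDir = filepath.Join(os.Getenv("LOCALAPPDATA"), "gollama", "libs")
    } else {
        home, _ := os.UserHomeDir()
        cacheDir = filepath.Join(home, ".cache", "gollama", "libs")
    }
    fmt.Println("library cache:", cacheDir)
}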

Building from Source

Prerequisites
  • Go 1.21 or later
  • Make
Quick Start
# Clone and build
git clone https://github.com/dianlight/gollama.cpp
cd gollama.cpp

# Build for current platform
make build

# Run tests (downloads libraries automatically)
make test

# Build examples
make build-examples


# Generate release packages
make release
GPU Detection Logic

The Makefile implements intelligent GPU detection:

  1. CUDA Detection: Checks for nvcc compiler and CUDA toolkit
  2. HIP Detection: Checks for hipconfig and ROCm installation
  3. Priority Order: CUDA > HIP > CPU (on Linux/Windows)
  4. Metal: Always enabled on macOS when Xcode is available

No manual configuration or environment variables required!

Version Compatibility

This library tracks llama.cpp versions. The version number format is:

vX.Y.Z-llamacpp.bNNNN

Where:

  • X.Y.Z is the gollama.cpp semantic version
  • bNNNN is the corresponding llama.cpp build number

For example: v0.2.0-llamacpp.b6099 uses llama.cpp build b6099.
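
These values are also exposed as package constants (Version, LlamaCppBuild, and FullVersion), so a program can report at runtime which llama.cpp build its bindings target:

package main

import (
    "fmt"

    "github.com/dianlight/gollama.cpp"
)

func main() {
    // Print the binding version and the llama.cpp build it tracks
    fmt.Println("gollama version:", gollama.Version)       // e.g. "0.2.0"
    fmt.Println("llama.cpp build:", gollama.LlamaCppBuild) // e.g. "b6099"
    fmt.Println("full version:", gollama.FullVersion)      // e.g. "v0.2.0-llamacpp.b6099"
}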

Documentation

Examples

See the examples directory for complete examples.

Contributing

Contributions are welcome! Please read our Contributing Guide for details.

License

This project is licensed under the MIT License - see the LICENSE file for details.

This license is compatible with llama.cpp's MIT license.

Acknowledgments

  • llama.cpp - The underlying C++ library
  • purego - Pure Go C interop library
  • ggml - Machine learning tensor library

Support

  • Issues - Bug reports and feature requests
  • Discussions - Questions and community support

Documentation

Overview

Package gollama provides Go bindings for llama.cpp using purego. This package allows you to use llama.cpp functionality from Go without CGO.

The bindings are designed to be as close to the original llama.cpp C API as possible, while providing Go-friendly interfaces where appropriate.

Example usage:

// Initialize the library
gollama.Backend_init()
defer gollama.Backend_free()

// Load a model
params := gollama.Model_default_params()
model, err := gollama.Model_load_from_file("model.gguf", params)
if err != nil {
	log.Fatal(err)
}
defer gollama.Model_free(model)

// Create context and generate text
ctxParams := gollama.Context_default_params()
ctx, err := gollama.Init_from_model(model, ctxParams)
if err != nil {
	log.Fatal(err)
}
defer gollama.Free(ctx)

Index

Constants

const (
	LLAMA_ERR_SUCCESS         = 0
	LLAMA_ERR_FAIL            = -1
	LLAMA_ERR_INVALID_PARAM   = -2
	LLAMA_ERR_OUT_OF_MEMORY   = -3
	LLAMA_ERR_FILE_NOT_FOUND  = -4
	LLAMA_ERR_FILE_READ       = -5
	LLAMA_ERR_FILE_WRITE      = -6
	LLAMA_ERR_INVALID_FORMAT  = -7
	LLAMA_ERR_UNSUPPORTED     = -8
	LLAMA_ERR_BACKEND_INIT    = -9
	LLAMA_ERR_CONTEXT_FULL    = -10
	LLAMA_ERR_TOKEN_INVALID   = -11
	LLAMA_ERR_MODEL_CORRUPTED = -12
	LLAMA_ERR_GPU_UNAVAILABLE = -13
)

Error codes matching llama.cpp return values
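
For example, a raw status code can be converted into a typed Go error and inspected:

// Sketch: translate a llama.cpp status code into a Go error and categorize it
if err := gollama.ErrorFromCode(gollama.LLAMA_ERR_OUT_OF_MEMORY); err != nil {
    fmt.Println(gollama.CategorizeError(err)) // prints the error category via its String method
}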

const (
	// Version is the gollama.cpp version
	Version = "0.2.0"
	// LlamaCppBuild is the llama.cpp build number this version is based on
	LlamaCppBuild = "b6099"
	// FullVersion combines both version numbers
	FullVersion = "v" + Version + "-llamacpp." + LlamaCppBuild
)

Version information

const (
	LLAMA_DEFAULT_SEED = 0xFFFFFFFF
	LLAMA_TOKEN_NULL   = -1

	// File magic numbers
	LLAMA_FILE_MAGIC_GGLA = 0x67676c61
	LLAMA_FILE_MAGIC_GGSN = 0x6767736e
	LLAMA_FILE_MAGIC_GGSQ = 0x67677371

	// Session constants
	LLAMA_SESSION_MAGIC   = LLAMA_FILE_MAGIC_GGSN
	LLAMA_SESSION_VERSION = 9

	LLAMA_STATE_SEQ_MAGIC   = LLAMA_FILE_MAGIC_GGSQ
	LLAMA_STATE_SEQ_VERSION = 2
)

Constants from llama.h

Variables

var (
	// Library errors
	ErrLibraryNotLoaded   = errors.New("llama.cpp library not loaded")
	ErrLibraryLoadFailed  = errors.New("failed to load llama.cpp library")
	ErrFunctionNotFound   = errors.New("function not found in library")
	ErrInvalidLibraryPath = errors.New("invalid library path")

	// Model errors
	ErrModelNotLoaded       = errors.New("model not loaded")
	ErrModelLoadFailed      = errors.New("failed to load model")
	ErrModelSaveFailed      = errors.New("failed to save model")
	ErrInvalidModelPath     = errors.New("invalid model path")
	ErrModelCorrupted       = errors.New("model file corrupted")
	ErrUnsupportedModelType = errors.New("unsupported model type")

	// Context errors
	ErrContextNotCreated     = errors.New("context not created")
	ErrContextCreationFailed = errors.New("failed to create context")
	ErrInvalidContextSize    = errors.New("invalid context size")
	ErrContextFull           = errors.New("context is full")

	// Token errors
	ErrTokenizationFailed = errors.New("tokenization failed")
	ErrInvalidToken       = errors.New("invalid token")
	ErrTokenOutOfRange    = errors.New("token out of vocabulary range")

	// Generation errors
	ErrGenerationFailed      = errors.New("text generation failed")
	ErrSamplingFailed        = errors.New("token sampling failed")
	ErrInvalidSamplingParams = errors.New("invalid sampling parameters")

	// Memory errors
	ErrOutOfMemory            = errors.New("out of memory")
	ErrMemoryAllocationFailed = errors.New("memory allocation failed")
	ErrInvalidMemorySize      = errors.New("invalid memory size")

	// Configuration errors
	ErrInvalidConfig          = errors.New("invalid configuration")
	ErrConfigValidationFailed = errors.New("configuration validation failed")
	ErrUnsupportedPlatform    = errors.New("unsupported platform")

	// Backend errors
	ErrBackendNotAvailable = errors.New("backend not available")
	ErrBackendInitFailed   = errors.New("backend initialization failed")
	ErrGPUNotAvailable     = errors.New("GPU not available")
	ErrCUDANotAvailable    = errors.New("CUDA not available")
	ErrMetalNotAvailable   = errors.New("metal backend not available")
	ErrVulkanNotAvailable  = errors.New("vulkan backend not available")

	// File I/O errors
	ErrFileNotFound      = errors.New("file not found")
	ErrFileReadFailed    = errors.New("failed to read file")
	ErrFileWriteFailed   = errors.New("failed to write file")
	ErrInvalidFileFormat = errors.New("invalid file format")

	// Parameter errors
	ErrInvalidParameter    = errors.New("invalid parameter")
	ErrParameterOutOfRange = errors.New("parameter out of range")
	ErrMissingParameter    = errors.New("missing required parameter")

	// Thread/concurrency errors
	ErrThreadingFailed      = errors.New("threading operation failed")
	ErrConcurrencyViolation = errors.New("concurrency violation")
	ErrDeadlock             = errors.New("deadlock detected")
)

Error types for different categories of errors

Functions

func ApplyConfig

func ApplyConfig(config *Config) error

ApplyConfig applies the configuration to the library

func Backend_free

func Backend_free()

Backend_free frees the llama + ggml backend

func Backend_init

func Backend_init() error

Backend_init initializes the llama + ggml backend

func Batch_free

func Batch_free(batch LlamaBatch)

Batch_free frees a batch

func CheckResult

func CheckResult(result int, function string) error

CheckResult checks a result code and returns an appropriate error

func CleanLibraryCache

func CleanLibraryCache() error

CleanLibraryCache removes cached library files to force re-download

func Cleanup

func Cleanup()

Cleanup function to be called when the program exits

func Decode

func Decode(ctx LlamaContext, batch LlamaBatch) error

Decode decodes a batch

func ErrorFromCode

func ErrorFromCode(code int) error

ErrorFromCode converts a llama.cpp error code to a Go error

func Free

func Free(ctx LlamaContext)

Free frees a context

func GetConfigPath

func GetConfigPath() string

GetConfigPath returns the default configuration file path

func GetSHA256ForFile

func GetSHA256ForFile(filepath string) (string, error)

GetSHA256ForFile calculates the SHA256 checksum for a given file

func Get_embeddings

func Get_embeddings(ctx LlamaContext) *float32

Get_embeddings returns the embeddings for the context

func Get_embeddings_ith

func Get_embeddings_ith(ctx LlamaContext, i int32) *float32

Get_embeddings_ith returns the embeddings for the ith sequence in the context

func Get_logits

func Get_logits(ctx LlamaContext) *float32

Get_logits gets logits for all tokens

func Get_logits_ith

func Get_logits_ith(ctx LlamaContext, i int32) *float32

Get_logits_ith gets logits for a specific token

func HandleError

func HandleError(err error, context string) error

HandleError is a convenience function that uses the global error handler

func IsFatalError

func IsFatalError(err error) bool

IsFatalError checks if an error is fatal and should stop execution

func IsRetryableError

func IsRetryableError(err error) bool

IsRetryableError checks if an error is retryable
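
A hedged sketch of how these helpers might be combined, retrying only transient failures and giving up on fatal ones:

// loadWithRetry is an illustrative helper built on the documented error utilities
func loadWithRetry(path string, params gollama.LlamaModelParams) (gollama.LlamaModel, error) {
    const attempts = 3
    var lastErr error
    for i := 0; i < attempts; i++ {
        model, err := gollama.Model_load_from_file(path, params)
        if err == nil {
            return model, nil
        }
        lastErr = gollama.HandleError(err, "loading model") // log and normalize via the global handler
        if gollama.IsFatalError(err) || !gollama.IsRetryableError(err) {
            break // do not retry corrupted files, missing GPUs, and similar fatal cases
        }
    }
    return 0, lastErr
}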

func LoadLibraryWithVersion

func LoadLibraryWithVersion(version string) error

LoadLibraryWithVersion loads a specific version of the llama.cpp library

func Max_devices

func Max_devices() uint64

Max_devices returns the maximum number of devices

func Memory_clear

func Memory_clear(ctx LlamaContext, reset bool) bool

Memory_clear clears the KV cache

func Model_free

func Model_free(model LlamaModel)

Model_free frees a model

func Model_n_embd

func Model_n_embd(model LlamaModel) int32

Model_n_embd returns the number of embedding dimensions for the model

func Must

func Must(err error)

Must panics if the error is not nil - use only for initialization

func Print_system_info

func Print_system_info() string

Print_system_info prints system information

func RegisterFunction

func RegisterFunction(fptr interface{}, name string) error

RegisterFunction registers a function with the global library handle

func Sampler_free

func Sampler_free(sampler LlamaSampler)

Sampler_free frees a sampler

func SetGlobalConfig

func SetGlobalConfig(config *Config) error

SetGlobalConfig sets the global configuration

func SetGlobalErrorHandler

func SetGlobalErrorHandler(handler *ErrorHandler)

SetGlobalErrorHandler sets the global error handler

func Set_causal_attn

func Set_causal_attn(ctx LlamaContext, causal bool)

Set_causal_attn sets whether to use causal attention

func Set_embeddings

func Set_embeddings(ctx LlamaContext, embeddings bool)

Set_embeddings sets whether to extract embeddings

func Supports_gpu_offload

func Supports_gpu_offload() bool

Supports_gpu_offload returns whether GPU offload is supported

func Supports_mlock

func Supports_mlock() bool

Supports_mlock returns whether mlock is supported

func Supports_mmap

func Supports_mmap() bool

Supports_mmap returns whether mmap is supported

func Token_to_piece

func Token_to_piece(model LlamaModel, token LlamaToken, special bool) string

Token_to_piece converts a token to its string representation using the model

func Try

func Try(fn func() error) (err error)

Try executes a function and handles any panics as errors

func WrapError

func WrapError(err error, message string) error

WrapError wraps an error with additional context

func WrapErrorf

func WrapErrorf(err error, format string, args ...interface{}) error

WrapErrorf wraps an error with formatted additional context

Types

type Config

type Config struct {
	// Library settings
	LibraryPath   string `json:"library_path,omitempty"`
	UseEmbedded   bool   `json:"use_embedded"`
	EnableLogging bool   `json:"enable_logging"`
	LogLevel      int    `json:"log_level"`

	// Performance settings
	NumThreads    int  `json:"num_threads"`
	EnableGPU     bool `json:"enable_gpu"`
	GPULayers     int  `json:"gpu_layers"`
	MetalEnabled  bool `json:"metal_enabled"`
	CUDAEnabled   bool `json:"cuda_enabled"`
	VulkanEnabled bool `json:"vulkan_enabled"`

	// Memory settings
	ContextSize       int  `json:"context_size"`
	BatchSize         int  `json:"batch_size"`
	UbatchSize        int  `json:"ubatch_size"`
	MemoryMapEnabled  bool `json:"memory_map_enabled"`
	MemoryLockEnabled bool `json:"memory_lock_enabled"`

	// Model settings
	ModelPath        string `json:"model_path,omitempty"`
	VocabOnly        bool   `json:"vocab_only"`
	UseQuantization  bool   `json:"use_quantization"`
	QuantizationType string `json:"quantization_type,omitempty"`

	// Backend settings
	BackendType string `json:"backend_type,omitempty"`
	DeviceID    int    `json:"device_id"`

	// Debug settings
	VerboseLogging bool `json:"verbose_logging"`
	DebugMode      bool `json:"debug_mode"`
}

Config holds configuration options for gollama

func DefaultConfig

func DefaultConfig() *Config

DefaultConfig returns a configuration with sensible defaults

func GetGlobalConfig

func GetGlobalConfig() *Config

GetGlobalConfig returns the global configuration

func LoadConfig

func LoadConfig(path string) (*Config, error)

LoadConfig loads configuration from a JSON file

func LoadConfigFromEnv

func LoadConfigFromEnv() *Config

LoadConfigFromEnv loads configuration from environment variables

func LoadDefaultConfig

func LoadDefaultConfig() *Config

LoadDefaultConfig loads configuration from the default locations

func (*Config) SaveConfig

func (c *Config) SaveConfig(path string) error

SaveConfig saves configuration to a JSON file

func (*Config) Validate

func (c *Config) Validate() error

Validate validates the configuration
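
A short sketch tying these helpers together: start from defaults, adjust, validate, persist, and apply:

// Sketch: build, validate, save, and apply a configuration
cfg := gollama.DefaultConfig()
cfg.EnableGPU = true
cfg.GPULayers = 32
cfg.ContextSize = 4096

if err := cfg.Validate(); err != nil {
    log.Fatal(err)
}
if err := cfg.SaveConfig(gollama.GetConfigPath()); err != nil {
    log.Fatal(err)
}
if err := gollama.ApplyConfig(cfg); err != nil {
    log.Fatal(err)
}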

type DownloadResult

type DownloadResult struct {
	Platform    string
	Success     bool
	Error       error
	LibraryPath string
	SHA256Sum   string
}

DownloadResult represents the result of a download task

func DownloadLibrariesForPlatforms

func DownloadLibrariesForPlatforms(platforms []string, version string) ([]DownloadResult, error)

DownloadLibrariesForPlatforms downloads libraries for multiple platforms in parallel platforms should be in the format []string{"linux/amd64", "darwin/arm64", "windows/amd64"} version can be empty for latest version or specify a specific version like "b6099"
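
For example, pre-fetching libraries for several targets in a release pipeline might look like this sketch:

// Download libraries for several platforms in parallel (pass "" as version for the latest release)
results, err := gollama.DownloadLibrariesForPlatforms(
    []string{"linux/amd64", "darwin/arm64", "windows/amd64"}, "b6099")
if err != nil {
    log.Fatal(err)
}
for _, r := range results {
    if !r.Success {
        log.Printf("%s failed: %v", r.Platform, r.Error)
        continue
    }
    log.Printf("%s -> %s (sha256 %s)", r.Platform, r.LibraryPath, r.SHA256Sum)
}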

type DownloadTask

type DownloadTask struct {
	Platform     string
	AssetName    string
	DownloadURL  string
	TargetDir    string
	ExpectedSHA2 string
}

DownloadTask represents a single download task for parallel processing

type ErrorCategory

type ErrorCategory int

ErrorCategory represents different categories of errors

const (
	CategoryLibrary ErrorCategory = iota
	CategoryModel
	CategoryContext
	CategoryToken
	CategoryGeneration
	CategoryMemory
	CategoryConfig
	CategoryBackend
	CategoryFile
	CategoryParameter
	CategoryThread
)

func CategorizeError

func CategorizeError(err error) ErrorCategory

CategorizeError determines the category of an error

func (ErrorCategory) String

func (ec ErrorCategory) String() string

String returns the string representation of an error category

type ErrorHandler

type ErrorHandler struct {
	// contains filtered or unexported fields
}

ErrorHandler provides centralized error handling and logging

func GetGlobalErrorHandler

func GetGlobalErrorHandler() *ErrorHandler

GetGlobalErrorHandler returns the global error handler

func NewErrorHandler

func NewErrorHandler(enableLogging bool) *ErrorHandler

NewErrorHandler creates a new error handler

func (*ErrorHandler) HandleError

func (eh *ErrorHandler) HandleError(err error, context string) error

HandleError processes and logs an error

func (*ErrorHandler) SetLogCallback

func (eh *ErrorHandler) SetLogCallback(callback func(level int, message string))

SetLogCallback sets the log callback function

type LibraryDownloader

type LibraryDownloader struct {
	// contains filtered or unexported fields
}

LibraryDownloader handles downloading pre-built llama.cpp binaries

func NewLibraryDownloader

func NewLibraryDownloader() (*LibraryDownloader, error)

NewLibraryDownloader creates a new library downloader instance

func (*LibraryDownloader) CleanCache

func (d *LibraryDownloader) CleanCache() error

CleanCache removes old cached library files

func (*LibraryDownloader) DownloadAndExtract

func (d *LibraryDownloader) DownloadAndExtract(downloadURL, filename string) (string, error)

DownloadAndExtract downloads and extracts the library archive

func (*LibraryDownloader) DownloadAndExtractWithChecksum

func (d *LibraryDownloader) DownloadAndExtractWithChecksum(downloadURL, filename, expectedChecksum string) (string, string, error)

DownloadAndExtractWithChecksum downloads and extracts the library archive with checksum verification

func (*LibraryDownloader) DownloadMultiplePlatforms

func (d *LibraryDownloader) DownloadMultiplePlatforms(platforms []string, version string) ([]DownloadResult, error)

DownloadMultiplePlatforms downloads libraries for multiple platforms in parallel

func (*LibraryDownloader) FindAssetByPattern

func (d *LibraryDownloader) FindAssetByPattern(release *ReleaseInfo, pattern string) (string, string, error)

FindAssetByPattern finds an asset that matches the given pattern

func (*LibraryDownloader) FindLibraryPath

func (d *LibraryDownloader) FindLibraryPath(extractedDir string) (string, error)

FindLibraryPath finds the main library file in the extracted directory

func (*LibraryDownloader) FindLibraryPathForPlatform

func (d *LibraryDownloader) FindLibraryPathForPlatform(extractedDir, goos string) (string, error)

FindLibraryPathForPlatform finds the main library file for a specific platform

func (*LibraryDownloader) GetLatestRelease

func (d *LibraryDownloader) GetLatestRelease() (*ReleaseInfo, error)

GetLatestRelease fetches the latest release information from GitHub

func (*LibraryDownloader) GetPlatformAssetPattern

func (d *LibraryDownloader) GetPlatformAssetPattern() (string, error)

GetPlatformAssetPattern returns the asset name pattern for the current platform

func (*LibraryDownloader) GetPlatformAssetPatternForPlatform

func (d *LibraryDownloader) GetPlatformAssetPatternForPlatform(goos, goarch string) (string, error)

GetPlatformAssetPatternForPlatform returns the asset name pattern for a specific platform

func (*LibraryDownloader) GetReleaseByTag

func (d *LibraryDownloader) GetReleaseByTag(tag string) (*ReleaseInfo, error)

GetReleaseByTag fetches release information for a specific tag

type LibraryLoader

type LibraryLoader struct {
	// contains filtered or unexported fields
}

Library loader manages the loading and lifecycle of llama.cpp shared libraries

func (*LibraryLoader) GetHandle

func (l *LibraryLoader) GetHandle() uintptr

GetHandle returns the library handle

func (*LibraryLoader) IsLoaded

func (l *LibraryLoader) IsLoaded() bool

IsLoaded returns whether the library is loaded

func (*LibraryLoader) LoadLibrary

func (l *LibraryLoader) LoadLibrary() error

LoadLibrary loads the appropriate llama.cpp library for the current platform

func (*LibraryLoader) LoadLibraryWithVersion

func (l *LibraryLoader) LoadLibraryWithVersion(version string) error

LoadLibraryWithVersion loads the llama.cpp library for a specific version If version is empty, it loads the latest version

func (*LibraryLoader) UnloadLibrary

func (l *LibraryLoader) UnloadLibrary() error

UnloadLibrary unloads the library and cleans up resources

type LlamaAdapterLora

type LlamaAdapterLora uintptr

type LlamaAttentionType

type LlamaAttentionType int32
const (
	LLAMA_ATTENTION_TYPE_CAUSAL     LlamaAttentionType = 0
	LLAMA_ATTENTION_TYPE_NON_CAUSAL LlamaAttentionType = 1
)

type LlamaBatch

type LlamaBatch struct {
	NTokens int32        // number of tokens
	Token   *LlamaToken  // tokens
	Embd    *float32     // embeddings (if using embeddings instead of tokens)
	Pos     *LlamaPos    // positions
	NSeqId  *int32       // number of sequence IDs per token
	SeqId   **LlamaSeqId // sequence IDs
	Logits  *int8        // whether to compute logits for each token
}

func Batch_get_one

func Batch_get_one(tokens []LlamaToken) LlamaBatch

Batch_get_one creates a batch from a single set of tokens

func Batch_init

func Batch_init(nTokens, embd, nSeqMax int32) LlamaBatch

Batch_init creates a new batch

type LlamaChatMessage

type LlamaChatMessage struct {
	Role    *byte // role string
	Content *byte // content string
}

Chat message

type LlamaContext

type LlamaContext uintptr

func Init_from_model

func Init_from_model(model LlamaModel, params LlamaContextParams) (LlamaContext, error)

Init_from_model creates a context from a model

type LlamaContextParams

type LlamaContextParams struct {
	Seed              uint32               // RNG seed, -1 for random
	NCtx              uint32               // text context, 0 = from model
	NBatch            uint32               // logical maximum batch size
	NUbatch           uint32               // physical maximum batch size
	NSeqMax           uint32               // max number of sequences
	NThreads          int32                // number of threads to use for generation
	NThreadsBatch     int32                // number of threads to use for batch processing
	RopeScalingType   LlamaRopeScalingType // RoPE scaling type
	PoolingType       LlamaPoolingType     // pooling type for embeddings
	AttentionType     LlamaAttentionType   // attention type
	RopeFreqBase      float32              // RoPE base frequency
	RopeFreqScale     float32              // RoPE frequency scaling factor
	YarnExtFactor     float32              // YaRN extrapolation mix factor
	YarnAttnFactor    float32              // YaRN magnitude scaling factor
	YarnBetaFast      float32              // YaRN low correction dim
	YarnBetaSlow      float32              // YaRN high correction dim
	YarnOrigCtx       uint32               // YaRN original context size
	DefragThold       float32              // defragment the KV cache if holes/size > thold
	CbEval            uintptr              // evaluation callback
	CbEvalUserData    uintptr              // user data for evaluation callback
	TypeK             int32                // data type for K cache
	TypeV             int32                // data type for V cache
	AbortCallback     uintptr              // abort callback
	AbortCallbackData uintptr              // user data for abort callback
	Logits            uint8                // whether to compute and return logits (bool as uint8)
	Embeddings        uint8                // whether to compute and return embeddings (bool as uint8)
	Offload_kqv       uint8                // whether to offload K, Q, V to GPU (bool as uint8)
	FlashAttn         uint8                // whether to use flash attention (bool as uint8)
	NoPerf            uint8                // whether to disable performance measurements (bool as uint8)
}

Context parameters

func ContextDefaultParams

func ContextDefaultParams() LlamaContextParams

func Context_default_params

func Context_default_params() LlamaContextParams

Context_default_params returns default context parameters

type LlamaError

type LlamaError struct {
	Code     int    `json:"code"`
	Message  string `json:"message"`
	Function string `json:"function,omitempty"`
	File     string `json:"file,omitempty"`
	Line     int    `json:"line,omitempty"`
	Cause    error  `json:"cause,omitempty"`
}

LlamaError represents a structured error from the llama.cpp library

func NewLlamaError

func NewLlamaError(code int, message string) *LlamaError

NewLlamaError creates a new LlamaError

func NewLlamaErrorWithCause

func NewLlamaErrorWithCause(code int, message string, cause error) *LlamaError

NewLlamaErrorWithCause creates a new LlamaError with an underlying cause

func NewLlamaErrorWithContext

func NewLlamaErrorWithContext(code int, message, function string) *LlamaError

NewLlamaErrorWithContext creates a new LlamaError with function context

func (*LlamaError) Error

func (e *LlamaError) Error() string

Error implements the error interface

func (*LlamaError) Is

func (e *LlamaError) Is(target error) bool

Is implements error matching

func (*LlamaError) Unwrap

func (e *LlamaError) Unwrap() error

Unwrap returns the underlying cause

type LlamaFtype

type LlamaFtype int32
const (
	LLAMA_FTYPE_ALL_F32        LlamaFtype = 0
	LLAMA_FTYPE_MOSTLY_F16     LlamaFtype = 1
	LLAMA_FTYPE_MOSTLY_Q4_0    LlamaFtype = 2
	LLAMA_FTYPE_MOSTLY_Q4_1    LlamaFtype = 3
	LLAMA_FTYPE_MOSTLY_Q8_0    LlamaFtype = 7
	LLAMA_FTYPE_MOSTLY_Q5_0    LlamaFtype = 8
	LLAMA_FTYPE_MOSTLY_Q5_1    LlamaFtype = 9
	LLAMA_FTYPE_MOSTLY_Q2_K    LlamaFtype = 10
	LLAMA_FTYPE_MOSTLY_Q3_K_S  LlamaFtype = 11
	LLAMA_FTYPE_MOSTLY_Q3_K_M  LlamaFtype = 12
	LLAMA_FTYPE_MOSTLY_Q3_K_L  LlamaFtype = 13
	LLAMA_FTYPE_MOSTLY_Q4_K_S  LlamaFtype = 14
	LLAMA_FTYPE_MOSTLY_Q4_K_M  LlamaFtype = 15
	LLAMA_FTYPE_MOSTLY_Q5_K_S  LlamaFtype = 16
	LLAMA_FTYPE_MOSTLY_Q5_K_M  LlamaFtype = 17
	LLAMA_FTYPE_MOSTLY_Q6_K    LlamaFtype = 18
	LLAMA_FTYPE_MOSTLY_IQ2_XXS LlamaFtype = 19
	LLAMA_FTYPE_MOSTLY_IQ2_XS  LlamaFtype = 20
	LLAMA_FTYPE_MOSTLY_Q2_K_S  LlamaFtype = 21
	LLAMA_FTYPE_MOSTLY_IQ3_XS  LlamaFtype = 22
)

type LlamaGpuBackend

type LlamaGpuBackend int32
const (
	LLAMA_GPU_BACKEND_NONE   LlamaGpuBackend = 0
	LLAMA_GPU_BACKEND_CPU    LlamaGpuBackend = 1
	LLAMA_GPU_BACKEND_CUDA   LlamaGpuBackend = 2
	LLAMA_GPU_BACKEND_METAL  LlamaGpuBackend = 3
	LLAMA_GPU_BACKEND_HIP    LlamaGpuBackend = 4
	LLAMA_GPU_BACKEND_VULKAN LlamaGpuBackend = 5
	LLAMA_GPU_BACKEND_OPENCL LlamaGpuBackend = 6
	LLAMA_GPU_BACKEND_SYCL   LlamaGpuBackend = 7
)

func DetectGpuBackend

func DetectGpuBackend() LlamaGpuBackend

DetectGpuBackend detects the available GPU backend on the current system

func (LlamaGpuBackend) String

func (b LlamaGpuBackend) String() string

String returns the string representation of the GPU backend

type LlamaLogitBias

type LlamaLogitBias struct {
	Token LlamaToken
	Bias  float32
}

Logit bias

type LlamaMemory

type LlamaMemory uintptr

Common types matching llama.cpp

func Get_memory

func Get_memory(ctx LlamaContext) LlamaMemory

Get_memory returns the memory handle for the context

type LlamaModel

type LlamaModel uintptr

Opaque types (represented as pointers)

func Model_load_from_file

func Model_load_from_file(pathModel string, params LlamaModelParams) (LlamaModel, error)

Model_load_from_file loads a model from a file

type LlamaModelParams

type LlamaModelParams struct {
	Devices                  uintptr        // ggml_backend_dev_t * - NULL-terminated list of devices
	TensorBuftOverrides      uintptr        // const struct llama_model_tensor_buft_override *
	NGpuLayers               int32          // number of layers to store in VRAM
	SplitMode                LlamaSplitMode // how to split the model across multiple GPUs
	MainGpu                  int32          // the GPU that is used for the entire model
	TensorSplit              *float32       // proportion of the model to offload to each GPU
	ProgressCallback         uintptr        // llama_progress_callback function pointer
	ProgressCallbackUserData uintptr        // context pointer passed to the progress callback
	KvOverrides              uintptr        // const struct llama_model_kv_override *
	VocabOnly                uint8          // only load the vocabulary, no weights (bool as uint8)
	UseMmap                  uint8          // use mmap if possible (bool as uint8)
	UseMlock                 uint8          // force system to keep model in RAM (bool as uint8)
	CheckTensors             uint8          // validate model tensor data (bool as uint8)
	UseExtraBufts            uint8          // use extra buffer types (bool as uint8)
}

Model parameters

func ModelDefaultParams

func ModelDefaultParams() LlamaModelParams

Helper functions for platforms where struct returns aren't supported

func Model_default_params

func Model_default_params() LlamaModelParams

Model_default_params returns default model parameters

type LlamaModelQuantizeParams

type LlamaModelQuantizeParams struct {
	NThread              int32      // number of threads to use for quantizing
	Ftype                LlamaFtype // quantize to this llama_ftype
	OutputTensorType     int32      // output tensor type
	TokenEmbeddingType   int32      // token embeddings tensor type
	AllowRequantize      uint8      // allow quantizing non-f32/f16 tensors (bool as uint8)
	QuantizeOutputTensor uint8      // quantize output.weight (bool as uint8)
	OnlyF32              uint8      // quantize only f32 tensors (bool as uint8)
	PureF16              uint8      // disable k-quant mixtures and quantize all tensors to the same type (bool as uint8)
	KeepSplit            uint8      // keep split tensors (bool as uint8)
	IMatrix              *byte      // importance matrix data
	KqsWarning           uint8      // warning for quantization quality loss (bool as uint8)
}

Model quantize parameters

type LlamaPoolingType

type LlamaPoolingType int32
const (
	LLAMA_POOLING_TYPE_UNSPECIFIED LlamaPoolingType = -1
	LLAMA_POOLING_TYPE_NONE        LlamaPoolingType = 0
	LLAMA_POOLING_TYPE_MEAN        LlamaPoolingType = 1
	LLAMA_POOLING_TYPE_CLS         LlamaPoolingType = 2
	LLAMA_POOLING_TYPE_LAST        LlamaPoolingType = 3
	LLAMA_POOLING_TYPE_RANK        LlamaPoolingType = 4
)

type LlamaPos

type LlamaPos int32

Common types matching llama.cpp

type LlamaRopeScalingType

type LlamaRopeScalingType int32
const (
	LLAMA_ROPE_SCALING_TYPE_UNSPECIFIED LlamaRopeScalingType = -1
	LLAMA_ROPE_SCALING_TYPE_NONE        LlamaRopeScalingType = 0
	LLAMA_ROPE_SCALING_TYPE_LINEAR      LlamaRopeScalingType = 1
	LLAMA_ROPE_SCALING_TYPE_YARN        LlamaRopeScalingType = 2
)

type LlamaSampler

type LlamaSampler uintptr

func Sampler_init_greedy

func Sampler_init_greedy() LlamaSampler

Sampler_init_greedy creates a greedy sampler

type LlamaSamplerChainParams

type LlamaSamplerChainParams struct {
	NoPerf uint8 // whether to disable performance timing measurements (bool as uint8)
}

Sampler chain parameters

func SamplerChainDefaultParams

func SamplerChainDefaultParams() LlamaSamplerChainParams

func Sampler_chain_default_params

func Sampler_chain_default_params() LlamaSamplerChainParams

Sampler_chain_default_params returns default sampler chain parameters

type LlamaSeqId

type LlamaSeqId int32

Common types matching llama.cpp

type LlamaSplitMode

type LlamaSplitMode int32
const (
	LLAMA_SPLIT_MODE_NONE  LlamaSplitMode = 0
	LLAMA_SPLIT_MODE_LAYER LlamaSplitMode = 1
	LLAMA_SPLIT_MODE_ROW   LlamaSplitMode = 2
)

type LlamaToken

type LlamaToken int32

Common types matching llama.cpp

func Sampler_sample

func Sampler_sample(sampler LlamaSampler, ctx LlamaContext, idx int32) LlamaToken

Sampler_sample samples a token from the logits at the given index (-1 for last token)

func Tokenize

func Tokenize(model LlamaModel, text string, addSpecial, parseSpecial bool) ([]LlamaToken, error)

Tokenize tokenizes text

type LlamaTokenAttr

type LlamaTokenAttr int32
const (
	LLAMA_TOKEN_ATTR_UNDEFINED   LlamaTokenAttr = 0
	LLAMA_TOKEN_ATTR_UNKNOWN     LlamaTokenAttr = 1 << 0
	LLAMA_TOKEN_ATTR_UNUSED      LlamaTokenAttr = 1 << 1
	LLAMA_TOKEN_ATTR_NORMAL      LlamaTokenAttr = 1 << 2
	LLAMA_TOKEN_ATTR_CONTROL     LlamaTokenAttr = 1 << 3
	LLAMA_TOKEN_ATTR_USER_DEF    LlamaTokenAttr = 1 << 4
	LLAMA_TOKEN_ATTR_BYTE        LlamaTokenAttr = 1 << 5
	LLAMA_TOKEN_ATTR_LSTRIP      LlamaTokenAttr = 1 << 6
	LLAMA_TOKEN_ATTR_RSTRIP      LlamaTokenAttr = 1 << 7
	LLAMA_TOKEN_ATTR_SINGLE_WORD LlamaTokenAttr = 1 << 8
)

type LlamaTokenData

type LlamaTokenData struct {
	Id    LlamaToken // token id
	Logit float32    // log-odds of the token
	P     float32    // probability of the token
}

Structs

type LlamaTokenDataArray

type LlamaTokenDataArray struct {
	Data     *LlamaTokenData // pointer to token data array
	Size     uint64          // number of tokens
	Selected int64           // index of selected token (-1 if none)
	Sorted   uint8           // whether the array is sorted by probability (bool as uint8)
}

func Token_data_array_from_logits

func Token_data_array_from_logits(model LlamaModel, logits *float32) *LlamaTokenDataArray

Token_data_array_from_logits creates a token data array from logits

func Token_data_array_init

func Token_data_array_init(model LlamaModel) *LlamaTokenDataArray

Token_data_array_init creates a token data array (helper function)

type LlamaTokenType

type LlamaTokenType int32
const (
	LLAMA_TOKEN_TYPE_UNDEFINED LlamaTokenType = iota
	LLAMA_TOKEN_TYPE_NORMAL
	LLAMA_TOKEN_TYPE_UNKNOWN
	LLAMA_TOKEN_TYPE_CONTROL
	LLAMA_TOKEN_TYPE_USER_DEFINED
	LLAMA_TOKEN_TYPE_UNUSED
	LLAMA_TOKEN_TYPE_BYTE
)

type LlamaVocab

type LlamaVocab uintptr

type LlamaVocabType

type LlamaVocabType int32

Enums

const (
	LLAMA_VOCAB_TYPE_NONE LlamaVocabType = iota
	LLAMA_VOCAB_TYPE_SPM
	LLAMA_VOCAB_TYPE_BPE
	LLAMA_VOCAB_TYPE_WPM
	LLAMA_VOCAB_TYPE_UGM
	LLAMA_VOCAB_TYPE_RWKV
)

type ReleaseInfo

type ReleaseInfo struct {
	TagName string `json:"tag_name"`
	Assets  []struct {
		Name               string `json:"name"`
		BrowserDownloadURL string `json:"browser_download_url"`
		Size               int64  `json:"size"`
	} `json:"assets"`
}

ReleaseInfo represents GitHub release information

Directories

Path Synopsis
cmd
