23 Oct 20:21

0bf47a1

b6829 Latest

server: add memory breakdown print (#16740)

Assets 16

| Asset | Digest | Size | Uploaded (UTC) |
| --- | --- | --- | --- |
| cudart-llama-bin-win-cuda-12.4-x64.zip | sha256:8c79a9b226de4b3cacfd1f83d24f962d0773be79f1e7b75c6af4ded7e32ae1d6 | 373 MB | 2025-10-23T20:21:34Z |
| llama-b6829-bin-macos-arm64.zip | sha256:570ab0813b548a353dc5e917a196dd726c551cca54cfe11836a2edbd89d8f8b4 | 10.4 MB | 2025-10-23T20:21:51Z |
| llama-b6829-bin-macos-x64.zip | sha256:09454a6b6fee267d15c8c52965b0dc377224122d44bc4a91f6d4d5c16a5f2949 | 27.1 MB | 2025-10-23T20:21:52Z |
| llama-b6829-bin-ubuntu-s390x-z15.zip | sha256:f0d2360c388fb6f8ee67ef8e9777868883cd21c3805e7a616908e5a246f62911 | 10.5 MB | 2025-10-23T20:21:54Z |
| llama-b6829-bin-ubuntu-vulkan-x64.zip | sha256:e2e1122fec71f54b05b299b7abc70f98e70d99e60d454b4d5d52ea2aa0023f71 | 25.9 MB | 2025-10-23T20:21:55Z |
| llama-b6829-bin-ubuntu-x64.zip | sha256:73358bcc03756d7a1cf32b1d24edbbd2bfcb7aaf4bc74b27c15beae51abc23a5 | 12.5 MB | 2025-10-23T20:21:57Z |
| llama-b6829-bin-win-cpu-arm64.zip | sha256:85968a93b863605d3a30489274afa1b8032140564def5fbec78d82465e88c924 | 10.6 MB | 2025-10-23T20:21:58Z |
| llama-b6829-bin-win-cpu-x64.zip | sha256:4e3fb22d0a5e2b5edecbf473b41bd0cc39a5922ceaf11a05022dd6ab48f2f4f3 | 13.7 MB | 2025-10-23T20:21:59Z |
| llama-b6829-bin-win-cuda-12.4-x64.zip | sha256:1abd4115f3e5d20b5f33d81e1e4ec15b6c2a85acb5a1119f3fbfff06f735a7af | 169 MB | 2025-10-23T20:22:01Z |
| llama-b6829-bin-win-hip-radeon-x64.zip | sha256:9ffc5d7e756fb08962ede6c3b4dab7ab735753ac8311bd647d9a6fc0e77e266e | 322 MB | 2025-10-23T20:22:10Z |
| Source code (zip) | | | 2025-10-23T19:30:17Z |
| Source code (tar.gz) | | | 2025-10-23T19:30:17Z |
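
Each binary asset in the b6829 release is listed with a `sha256:` digest, which can be used to check a download before unpacking it. The following is a minimal sketch, assuming the archive was saved under its listed name in the current directory. The helper names `sha256_of_file` and `matches_published_digest` are illustrative and not part of llama.cpp or any GitHub tooling.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in 1 MiB chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, published):
    """Compare a file against a digest string of the form 'sha256:<hex>'."""
    algo, _, expected = published.strip().partition(":")
    if algo.lower() != "sha256" or not expected:
        raise ValueError(f"unsupported digest format: {published!r}")
    return hmac.compare_digest(sha256_of_file(path), expected.lower())


if __name__ == "__main__":
    # Assumption: the archive sits in the current directory under its listed name.
    archive = Path("llama-b6829-bin-ubuntu-x64.zip")
    published = "sha256:73358bcc03756d7a1cf32b1d24edbbd2bfcb7aaf4bc74b27c15beae51abc23a5"
    if archive.exists():
        print("OK" if matches_published_digest(archive, published) else "MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch usually means a truncated or corrupted download, so the safest response is to fetch the archive again rather than unpack it.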

23 Oct 14:39

github-actions

b6827

d0660f2

mtmd-cli : allow using --jinja (#16718)

* mtmd-cli : allow using --jinja

* support -sys

* implement chat_history

* fix clear memory

* rm -sys support, added TODO

23 Oct 12:38

github-actions

b6826

fe6a988

Manually link -lbsd to resolve flock symbol on AIX (#16610)

23 Oct 12:06

github-actions

b6825

061f0ef

ggml-cuda: use passed ops instead of hardcoded ops (#16712)

23 Oct 10:16

github-actions

b6824

8cf6b42

server : send partial stop string when <EOG> is reached (#15007)

23 Oct 01:28

github-actions

b6823

9de9672

sycl: use async memory allocation to fix crashes during graph recordi…

22 Oct 21:13

github-actions

b6822

63d2fc4

Add experimental ggml-hexagon backend for the Hexagon NPU (#16547)

* model: add support for extra bufs for all devices

* hexagon: add experimental ggml-hexagon backend for the Hexagon NPU

This commit introduces a new experimental backend `ggml-hexagon` with support for the Hexagon NPU.

Highlights:
- Supports Hexagon versions: v73, v75, v79, and v81
- Targets Android devices based on Snapdragon SoCs: Gen3, 8-Elite, and 8-Elite Gen5
- Supports Q4_0, Q8_0, MXFP4, and FP32 data types
- Implements core LLM ops: MUL_MAT/MUL_MAT_ID, ADD/SUB/MUL/ADD_ID, RMS_NORM, ROPE, GLU/SWIGLU, SOFTMAX

**Note:** This backend is experimental and may exhibit instability or limited performance across supported devices.
It is intended for early testing and feedback from the llama.cpp/ggml developer and user community.

Co-Authored-By: Rajdeep Ganguly <rganguly@qti.qualcomm.com>
Co-Authored-By: Todor Boinovski <todorb@qti.qualcomm.com>

* hexagon: fix format checker errors

* hexagon: update readme and cmake presets

* ci: add android-ndk-build jobs that build plain ARM64 and Snapdragon versions

* hexagon: add simple graph optimizer for stacking MUL_MAT ops with the same input

* hexagon: move ADB helper scripts into scripts/snapdragon/adb

* hexagon: replace all f/printfs with GGML_LOG_...

* readme: add hexagon to the list of supported backends

* hexagon: stack matmuls with quantized inputs only

* hexagon: add TODO for fixing issues in hexagon_graph_optimize

* hexagon: update to hex-sdk 6.4.0 and add scripts for running on QDC

* scripts: fix lint errors

* scripts: update qdc pytest script to make linter happy

* hexagon: add reduce sum in fp32

* hexagon: reduce number of vector stores in matmul output

* hexagon: remove the need for vdelta in reduce-multiply-x8

* hexagon: consistent use of reduce_sum_fp32 for row_sums

* hexagon: some more matmul optimizations and comments

Optimize cases where tensor dims are not a multiple of 1024 (e.g. in Qwen models).
We already handled those cases, but at a higher overhead.

* hexagon: update cmake presets

* hexagon: add OPMASK support for run-bench.sh wrapper

* hexagon: update to use GGML_BACKEND_API

* hexagon: remove unused logic for setting tensor flags for the views

* hexagon: add asserts to set/get_tensor to make sure we handle complete tensors

Same asserts as the CPU backend.

* hexagon: use cpy_tensor slow path for non-host buffers

* hexagon: error checks in the buffer allocator

* cmake: move include(extProj) under ggml-hexagon

* hexagon: don't forget to delete the backend on free

* hexagon: set/get_tensor size asserts apply only to quantized tensors

* hexagon: reintroduce HEX_VERBOSE wrapper for GGML_LOG_DEBUG for now

GGML_LOG_DEBUG is always enabled for test-backend-ops and the output gets in the way.
Ideally we need finer-grained log levels.

* docs: typos in hexagon developer docs (libggm-...)

* hexagon: overhaul error handling in the session/device allocation

This should handle all failure paths in the session allocation.

* hexagon: update cmake presets to enable fp16 vectors

* hexagon: remove unused time_usec function

* hexagon: don't forget to release buffer contexts

* hexagon: fixed indents in hvx-utils (missed clang-format auto-format failure)

* hexagon: remove custom can_repeat function and use ggml_can_repeat

---------

Co-authored-by: Rajdeep Ganguly <rganguly@qti.qualcomm.com>
Co-authored-by: Todor Boinovski <todorb@qti.qualcomm.com>

22 Oct 18:46

github-actions

b6821

a2e0088

Revert "ggml : Leverage the existing GGML_F32_VEC helpers to vectoriz…

22 Oct 10:39

github-actions

b6818

d8eaa26

tests : fix test-thread-safety when compiling with multiple backends …

22 Oct 05:02

github-actions

b6817

9285325

CUDA: fix bug in topk-moe softmax (#16711)

Releases: ggml-org/llama.cpp
