KEMBAR78
server : (webui) revamp the input area, plus many small UI improvements by ngxson · Pull Request #13365 · ggml-org/llama.cpp · GitHub
Skip to content

Conversation

@ngxson
Copy link
Collaborator

@ngxson ngxson commented May 7, 2025

Many small UI/UX improvements that I wanted to have, but now finally have time to do.

TODO:

  • allow upload files
  • allow uploading non-txt file, for example .c, .py, etc
  • check if server has multimodal, if not, don't allow image upload
  • allow renaming conversation
  • list conversations: group by time
  • autoscroll: detect size changed instead of checking on each token generated (better performance)

Overall UI:

image

Detail improvements

(1) Remove background color in assistant message, to make it easier to read

image

(2) Improve the input UI/UX

image

(3) Allow uploading files (plus, prepare the code to upload image - enable it once the API support image input via #12898 )

Drap-and-drop file to input area is supported

TODO in a next PR: allow ctrl+v long content, the long content will be converted to file automatically

image image

(4) Use HeroIcons everywhere + more consistent icons

image

(5) Move conversation options menu to sidebar, add "rename" option

image

(6) Improve "thought process" display

image

Note: the default assistant message is removed, this is because more and more models not requiring system message

@ngxson
Copy link
Collaborator Author

ngxson commented May 7, 2025

CC @gary149 if you have some suggestions :)

@ZUIcat
Copy link

ZUIcat commented May 8, 2025

Could you please label the conversation with the model name, like open-web-ui does? For example, displaying the model's name on each message bubble.

@ngxson
Copy link
Collaborator Author

ngxson commented May 8, 2025

Could you please label the conversation with the model name, like open-web-ui does? For example, displaying the model's name on each message bubble.

Yes I thought about this but forgot to note it down, will do it now

@ngxson ngxson marked this pull request as ready for review May 8, 2025 08:48
@ngxson ngxson requested review from ggerganov and slaren May 8, 2025 08:48
throw std::runtime_error("Failed to parse messages: " + std::string(e.what()) + "; messages = " + messages.dump(2));
// @ngxson : disable otherwise it's bloating the API response
// printf("%s\n", std::string("; messages = ") + messages.dump(2));
throw std::runtime_error("Failed to parse messages: " + std::string(e.what()));
Copy link
Collaborator Author

@ngxson ngxson May 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small note @ochafik , we should never reflect user input via the error message, as it usually comes with security risks. In this case, it's not a security risk, but it's a bit inconvenient to display the error in the UI

If you want to print the input for debugging, consider using LOG_DBG

Comment on lines +284 to +290
<b>Server Info</b>
<p>
<b>Model</b>: {serverProps?.model_path?.split(/(\\|\/)/).pop()}
<br />
<b>Build</b>: {serverProps?.build_info}
<br />
</p>
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to myself: maybe display extra info, like server capabilities (speculative, FIM, multimodal, etc)

Copy link
Collaborator Author

@ngxson ngxson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving some comments to make reviewing easier

Comment on lines +101 to +110
// WARN: vibe code below
// This code is a heuristic to determine if a string is likely not binary.
// It is necessary because input file can have various mime types which we don't have time to investigate.
// For example, a python file can be text/plain, application/x-python, etc.
export function isLikelyNotBinary(str: string): boolean {
const options = {
prefixLength: 1024 * 10, // Check the first 10KB of the string
suspiciousCharThresholdRatio: 0.15, // Allow up to 15% suspicious chars
maxAbsoluteNullBytes: 2,
};
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each file extension like .py, .c, .cpp, .sh, .bat, etc has their own set of mime type which is very difficult to keep track. Therefore, this function was added.

It should also work with unicode text. I tested with:

model_path: string;
n_ctx: number;
has_multimodal: boolean;
// TODO: support params
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a next PR, we can use the server's default params to override the settings, this will address #13277

Comment on lines +22 to +34
export function useChatScroll(msgListRef: React.RefObject<HTMLDivElement>) {
useEffect(() => {
if (!msgListRef.current) return;

const resizeObserver = new ResizeObserver((_) => {
scrollToBottomThrottled(true, 10);
});

resizeObserver.observe(msgListRef.current);
return () => {
resizeObserver.disconnect();
};
}, [msgListRef]);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This replaces the old logic where scrollToBottom gets triggered every time a new token is received. The new approach relies on ResizeObserver, which is a browser's API to observe if a HTML element changes its size. This should now cover the case where browser window is resized

Comment on lines +85 to +89
} else if (extra.type === 'textFile') {
contentArr.push({
type: 'text',
text: `File: ${extra.name}\nContent:\n\n${extra.content}`,
});
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is now the input file will be formatted upon sending to server. The file name will be preserved

@slaren
Copy link
Member

slaren commented May 8, 2025

Looks very nice.

A bit of a nit, but I noticed that there is not enough margin at the bottom of the code boxes and it leads to situations like this:
image

Caused by a line separator (---) after the code:

    dfs(0, graph);
    std::cout << "\n";

    return 0;
}
```

---

### 🔍 Explanation

- **Graph Representation**: The graph is represented using an **adjacency list** as a `std::vector<std::vector<int>>`.

@ngxson
Copy link
Collaborator Author

ngxson commented May 8, 2025

Thanks for testing, this should be fixed now:

Before:

image

After:

image

@slaren
Copy link
Member

slaren commented May 8, 2025

That fixes it for the line separator, but I still think that the code blocks have the wrong margin. For example:
image
It should have even spacing on the top and bottom (from VSCode markdown preview):
image

Source:

```cpp
void dfs_recursive(int node, const std::vector<std::vector<int>>& graph, std::vector<bool>& visited) {
    visited[node] = true;
    std::cout << node << " ";
    for (int neighbor : graph[node]) {
        if (!visited[neighbor])
            dfs_recursive(neighbor, graph, visited);
    }
}
```

And in `main`:

```cpp
std::vector<bool> visited(graph.size(), false);
dfs_recursive(0, graph, visited);
std::cout << "\n";
```

@ngxson
Copy link
Collaborator Author

ngxson commented May 8, 2025

Ok I see, I added a margin bottom mb-3 to <pre> block, it should look better now:

image

@slaren
Copy link
Member

slaren commented May 8, 2025

Looks great now.

@ngxson ngxson merged commit 8c83449 into ggml-org:master May 8, 2025
44 checks passed
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request May 9, 2025
* origin/master: (39 commits)
server : vision support via libmtmd (ggml-org#12898)
sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (ggml-org#12858)
metal : optimize MoE for large batches (ggml-org#13388)
CUDA: FA support for Deepseek (Ampere or newer) (ggml-org#13306)
llama : do not crash if there is no CPU backend (ggml-org#13395)
CUDA: fix crash on large batch size for MoE models (ggml-org#13384)
imatrix : Add --parse-special for enabling parsing of special tokens in imatrix calculation (ggml-org#13389)
llama-run: add support for downloading models from ModelScope (ggml-org#13370)
mtmd : fix batch_view for m-rope (ggml-org#13397)
llama : one-off chat template fix for Mistral-Small-2503 (ggml-org#13398)
rpc : add rpc_msg_set_tensor_hash_req (ggml-org#13353)
vulkan: Allow up to 4096 elements for mul_mat_id row_ids (ggml-org#13326)
server : (webui) rename has_multimodal --> modalities (ggml-org#13393)
ci : limit write permission to only the release step + fixes (ggml-org#13392)
mtmd : Expose helper_decode_image_chunk (ggml-org#13366)
server : (webui) fix a very small misalignment (ggml-org#13387)
server : (webui) revamp the input area, plus many small UI improvements (ggml-org#13365)
convert : support rope_scaling type and rope_type (ggml-org#13349)
mtmd : fix the calculation of n_tokens for smolvlm (ggml-org#13381)
context : allow cache-less context for embeddings (ggml-org#13108)
...
@AeneasZhu
Copy link

image

@ngxson my vertical 3 points icon is missing from the sidebar.

@AeneasZhu
Copy link

image

@ngxson my vertical 3 points icon is missing from the sidebar.

I found the reason behind that. After I turned off the fullscreen, the icon showed. Thank you for your help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants