mtmd: Expose helper_decode_image_chunk #13366
Conversation
I think this can be made simpler: in the application code, you can handle the embedding copy as I said. This way, you can even have a C++ struct with std::vector<float>, which makes memory management much easier. The mtmd API already provides enough functions to let you do that, so I think we should not extend it further.
A struct in your app could look like this:
    // in the application code (requires #include <vector> and "mtmd.h")
    struct my_image {
        std::vector<float>  embeddings; // the encoded embeddings
        mtmd_input_chunk  * chunk;      // the chunk containing mtmd_image_tokens
    };
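For concreteness, a minimal sketch of filling that struct in application code, assuming the existing mtmd calls mtmd_input_chunk_get_tokens_image, mtmd_encode, mtmd_get_output_embd, and mtmd_image_tokens_get_n_tokens (names from memory, check mtmd.h) and an n_embd taken from the model:

    // Sketch only: copy the encoder output right after mtmd_encode(), so the
    // embeddings outlive the buffer owned by the mtmd_context.
    my_image encode_and_copy(mtmd_context * mctx, mtmd_input_chunk * chunk, int n_embd) {
        const mtmd_image_tokens * img = mtmd_input_chunk_get_tokens_image(chunk);
        mtmd_encode(mctx, img);                              // run the vision encoder
        const float * embd = mtmd_get_output_embd(mctx);     // valid until the next encode
        const size_t n_tok = mtmd_image_tokens_get_n_tokens(img);
        my_image out;
        out.chunk = chunk;                                   // caller keeps the chunk alive
        out.embeddings.assign(embd, embd + n_tok * n_embd);  // own a copy of the embeddings
        return out;
    }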
Nice, thanks! 💯 💯
Btw @mattjcly, one nice-to-have thing that I'm thinking about: currently mtmd_helper_decode_image_chunk runs non-stop, while it actually supports smaller batches under the hood. This can lead to a poor UX where the user hits the "stop" button in the UI, but mtmd_helper_decode_image_chunk still tries to decode the whole image, which may take a few extra seconds to finish.
I'm thinking about another version of mtmd_helper_decode_image_chunk (of course to be added in another PR) that supports interruptibility. Maybe we could expose the i_batch and n_batch to the public API. Do you have any other ideas?
Edit: another idea could be to add a helper that does the pre/post batch preparation, so you can call llama_decode(prepared_image_batch) in the user code; but this may still look quite cumbersome 😞
I like this - I think that 1) having a point where the decoding can be stopped between batches would be great, and 2) having a way, as a user, to get progress information during image decoding in the multi-batch case (other than just the current log) would be great.
Interesting. How would you envision this as the method of supporting interruptibility from the client side? Just trying to understand more.
The most intuitive way is to provide the application code with the notion of "a list of batches" instead of a one-do-all API call. Pseudocode would look like this:

    list_batches = mtmd_generate_decode_batches()
    for batch in list_batches:
        llama_decode(batch)

Then if you want the interruptibility:

    list_batches = mtmd_generate_decode_batches()
    for batch in list_batches:
        if check_user_interrupt():
            break  # stop the decode
        llama_decode(batch)

I'm thinking about this line, maybe this will be implemented as a cpp-only API to make it easier to manage batch allocation.
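To make that concrete, a hypothetical cpp-only shape of such a loop could look like the sketch below; the image_decode_batches struct and its preparation step do not exist in mtmd today, only llama_decode is the real API. Progress reporting also falls out naturally, since the caller sees the batch index and total count.

    // Hypothetical sketch of the "list of batches" idea (not an existing mtmd API).
    #include <functional>
    #include <vector>
    #include "llama.h"

    struct image_decode_batches {
        std::vector<llama_batch> views; // one view per n_batch image tokens
    };

    // The caller drives the loop, so it can stop cleanly between batches.
    static int decode_interruptible(llama_context * lctx,
                                    image_decode_batches & batches,
                                    const std::function<bool()> & user_interrupted) {
        for (llama_batch & batch : batches.views) {
            if (user_interrupted()) {
                return 1;  // stopped early; caller decides how to clean up the KV cache
            }
            if (llama_decode(lctx, batch) != 0) {
                return -1; // decode error
            }
        }
        return 0;          // all batches decoded
    }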
* origin/master: (39 commits)
  server : vision support via libmtmd (ggml-org#12898)
  sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (ggml-org#12858)
  metal : optimize MoE for large batches (ggml-org#13388)
  CUDA: FA support for Deepseek (Ampere or newer) (ggml-org#13306)
  llama : do not crash if there is no CPU backend (ggml-org#13395)
  CUDA: fix crash on large batch size for MoE models (ggml-org#13384)
  imatrix : Add --parse-special for enabling parsing of special tokens in imatrix calculation (ggml-org#13389)
  llama-run: add support for downloading models from ModelScope (ggml-org#13370)
  mtmd : fix batch_view for m-rope (ggml-org#13397)
  llama : one-off chat template fix for Mistral-Small-2503 (ggml-org#13398)
  rpc : add rpc_msg_set_tensor_hash_req (ggml-org#13353)
  vulkan: Allow up to 4096 elements for mul_mat_id row_ids (ggml-org#13326)
  server : (webui) rename has_multimodal --> modalities (ggml-org#13393)
  ci : limit write permission to only the release step + fixes (ggml-org#13392)
  mtmd : Expose helper_decode_image_chunk (ggml-org#13366)
  server : (webui) fix a very small misalignment (ggml-org#13387)
  server : (webui) revamp the input area, plus many small UI improvements (ggml-org#13365)
  convert : support rope_scaling type and rope_type (ggml-org#13349)
  mtmd : fix the calculation of n_tokens for smolvlm (ggml-org#13381)
  context : allow cache-less context for embeddings (ggml-org#13108)
  ...
New API

Decoding-only helper

mtmd_helper_decode_image_chunk: Split out from mtmd_helper_eval_chunk_single. Same logic as before, but having it as a standalone function enables clients to run mtmd_encode at some prior time, cache those embeddings, and then send them into mtmd_helper_decode_image_chunk later to decode the embeddings without having to re-encode the image (expensive). A rough usage sketch is below.
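As a rough usage sketch (the exact parameter list of mtmd_helper_decode_image_chunk here is from memory and may differ slightly; check mtmd.h), decoding from cached embeddings could look like this:

    // Sketch only: decode an image chunk from embeddings cached by an earlier
    // mtmd_encode(), without re-running the vision encoder.
    #include <vector>
    #include "mtmd.h"

    static int32_t decode_cached_image(mtmd_context * mctx, llama_context * lctx,
                                       const mtmd_input_chunk * chunk,
                                       std::vector<float> & cached_embd, // from an earlier encode
                                       llama_pos & n_past, llama_seq_id seq_id, int32_t n_batch) {
        llama_pos new_n_past = n_past;
        int32_t ret = mtmd_helper_decode_image_chunk(
            mctx, lctx, chunk,
            cached_embd.data(),       // pre-computed embeddings, no re-encode needed
            n_past, seq_id, n_batch, &new_n_past);
        if (ret == 0) {
            n_past = new_n_past;      // advance past the image tokens
        }
        return ret;
    }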
Edit: removed the APIs below, which were in the original PR.

Output embedding copy

mtmd_get_output_embd_copy: Allows the client to embed with mtmd_encode, then get a copy of the embeddings to hold onto past the lifetime of the embeddings within the mtmd_context. Useful for caching these embeddings and sending them into mtmd_helper_decode_image later.

mtmd_image_tokens management functions

mtmd_image_tokens_copy: Allows clients to get a copy of the mtmd_image_tokens from an mtmd_input_chunk, for later use alongside pre-computed embeddings with mtmd_helper_decode_image.

mtmd_image_tokens_free: Frees an mtmd_image_tokens *, as received from mtmd_image_tokens_copy.

image_tokens_ptr (made public; existed privately in mtmd.cpp before): Enables automatic memory management of mtmd_image_tokens *.
@ngxson I'm thinking that maybe there's a way to avoid the need to expose a new API for mtmd_image_tokens, since I feel like the rationale "for later use to send alongside pre-computed embeddings" for mtmd_image_tokens_copy could potentially be weak, and the API of mtmd_helper_decode_image could be reworked to not need this object in full? But it also seemed like the simplest conversion to enable decoupled embedding + decoding.