Description
Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
Expected llama-cpp-python to correctly override the model metadata when passing {"tokenizer.ggml.pre": "llama3"} as the kv_overrides argument.
Current Behavior
The string value of the override always comes through empty when the model loads, as the log line

validate_override: Using metadata override ( str) 'tokenizer.ggml.pre' =

shows, and so the model ends up using the default pre-tokenizer instead of the llama3 one.
Example output:
```
llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from ./Meta-Llama-3-8B-Instruct.Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = .
llama_model_loader: - kv 2: llama.vocab_size u32 = 128256
llama_model_loader: - kv 3: llama.context_length u32 = 8192
llama_model_loader: - kv 4: llama.embedding_length u32 = 4096
llama_model_loader: - kv 5: llama.block_count u32 = 32
llama_model_loader: - kv 6: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 7: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 8: llama.attention.head_count u32 = 32
llama_model_loader: - kv 9: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 10: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 11: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 12: general.file_type u32 = 15
llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 14: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 15: tokenizer.ggml.scores arr[f32,128256] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 128009
llama_model_loader: - kv 20: tokenizer.chat_template str = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv 21: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q4_K: 193 tensors
llama_model_loader: - type q6_K: 33 tensors
validate_override: Using metadata override ( str) 'tokenizer.ggml.pre' =
llm_load_vocab: missing pre-tokenizer type, using: 'default'
llm_load_vocab:
llm_load_vocab: ************************************
llm_load_vocab: GENERATION QUALITY WILL BE DEGRADED!
llm_load_vocab: CONSIDER REGENERATING THE MODEL
llm_load_vocab: ************************************
...
```
Environment and Context
- WSL with Ubuntu 20.04
```
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 48 bits physical, 48 bits virtual
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
Vendor ID: AuthenticAMD
CPU family: 25
Model: 80
Model name: AMD Ryzen 9 5900HX with Radeon Graphics
Stepping: 0
CPU MHz: 3293.809
BogoMIPS: 6587.61
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 256 KiB
L1i cache: 256 KiB
L2 cache: 4 MiB
L3 cache: 16 MiB
$ uname -a
Linux LAPTOP 5.15.146.1-microsoft-standard-WSL2 #1 SMP Thu Jan 11 04:09:03 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
```
- SDK versions:
```
$ python3 --version
Python 3.8.10
$ make --version
GNU Make 4.2.1
Built for x86_64-pc-linux-gnu
$ g++ --version
g++ (Ubuntu 13.1.0-8ubuntu1~20.04.2) 13.1.0
```
Failure Information (for bugs)
Steps to Reproduce
I'm running the following code:
```python
from llama_cpp import Llama

my_model_path = "./Meta-Llama-3-8B-Instruct.Q4_K_M.gguf"
CONTEXT_SIZE = 8000
model = Llama(model_path=my_model_path, kv_overrides={"tokenizer.ggml.pre": "llama3"}, n_ctx=CONTEXT_SIZE)
```
Findings
I stepped through the code with a debugger, and the problem seems to be the following call in Llama.__init__, where kv_overrides is converted into the C overrides array:
```python
ctypes.memmove(
    self._kv_overrides_array[i].value.str_value,
    v_bytes,
    min(len(v_bytes), 128),
)
```
The memmove never reaches the struct. As far as I can tell, ctypes special-cases c_char-array struct fields: reading self._kv_overrides_array[i].value.str_value returns a temporary, NUL-truncated bytes copy rather than a writable view of the struct's memory, so memmove writes into that throwaway copy and str_value stays empty.
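Here is a minimal, self-contained sketch of the behavior. The Value and Override types below are hypothetical stand-ins that only mimic the str_value layout of llama_model_kv_override; treat this as a sketch of my hypothesis, not the real bindings:

```python
import ctypes

# Hypothetical stand-ins mimicking llama_model_kv_override's layout,
# with just enough structure to demonstrate the ctypes behavior.
class Value(ctypes.Union):
    _fields_ = [("str_value", ctypes.c_char * 128)]

class Override(ctypes.Structure):
    _fields_ = [("value", Value)]

ov = Override()

# Reading a c_char-array field returns a temporary, NUL-truncated bytes
# copy, not a writable view of the struct's memory. That copy is what
# memmove receives as its destination in the snippet above.
print(type(ov.value.str_value))  # <class 'bytes'>

# Writing through the field's actual address does update the struct:
ctypes.memmove(
    ctypes.addressof(ov.value) + Value.str_value.offset,
    b"llama3",
    len(b"llama3"),
)
print(ov.value.str_value)  # b'llama3'

# Plain attribute assignment also works: the ctypes setter copies the
# bytes into the struct itself.
ov.value.str_value = b"llama3"
```

If this diagnosis is right, pointing memmove at the field's real address (or simply assigning the padded bytes to str_value) should fix the empty override.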