fs/ggml: prevent runtime panics on malformed or corrupt GGUF inputs #14489
Open
maralcbr wants to merge 2 commits into ollama:main from
readGGUFString, readGGUFArray, and the tensor-info decoder all cast user-supplied uint64 values to int without first validating them. On 64-bit platforms any value larger than math.MaxInt64 wraps to a negative integer, which then causes make() to panic at runtime with "makeslice: len out of range".

This was observed in the wild when loading multi-shard Unsloth UD-Q3_K_XL GGUFs that contain mxfp4 tensors: a misaligned reader (present in prior versions where size-validation seeks were interleaved with tensor-info reads) could land on weight data whose bytes decoded as a huge string length, crashing the server instead of returning an error.

Fix by:
- Checking the raw uint64 length in readGGUFString before casting and returning a descriptive error if it exceeds 1 GiB.
- Checking the array element count in readGGUFArray before casting and returning an error if it exceeds 2^32.
- Validating that a tensor's dims field does not exceed GGML_MAX_DIMS (4) before allocating the shape slice.

Tests added for all three cases to ensure errors are returned rather than panics produced.

Made-with: Cursor
…ruptInputs

The three regression tests added in the previous commit were nested inside TestWriteGGUF, which tests write and round-trip behavior. These tests have nothing to do with writing; they verify that Decode returns a descriptive error (rather than panicking) when given malformed inputs.

Move them, along with their writeMinimalGGUF/writeKVString helpers, into a new top-level TestDecodeGGUFCorruptInputs function. Add a comment explaining that the oversized-string and oversized-array cases would have triggered a "makeslice: len out of range" runtime panic on unpatched code (a uint64 holding math.MaxUint64 converts to int as -1, and make([]T, -1) panics).

Made-with: Cursor
Summary
readGGUFString, readGGUFArray, and the tensor-info decoder all cast user-supplied uint64 values to int without first validating them. On 64-bit platforms any value larger than math.MaxInt64 wraps to a negative integer, which then causes make() to panic at runtime with "makeslice: len out of range".

This was reproduced loading multi-shard Unsloth UD-Q3_K_XL GGUFs that contain mxfp4 tensors: prior to the separation of tensor-info reads from size-validation seeks (now in a dedicated post-processing loop), a misaligned file reader could land on raw weight data whose bytes decoded as an enormous string length, crashing the entire Ollama server process instead of returning a descriptive error.

Changes
- readGGUFString: validate the raw uint64 length before casting to int; return a descriptive error if it exceeds 1 GiB (far beyond any legitimate GGUF string).
- readGGUFArray: validate the element count before casting to int; return an error if it exceeds 2^32.
- gguf.Decode tensor loop: validate that dims does not exceed GGML_MAX_DIMS (4) before allocating the shape slice, matching the llama.cpp spec.

All three fixes convert would-be runtime panics into well-formed errors that propagate up cleanly.
Tests
Three new sub-tests added to TestWriteGGUF:
- oversized_string_length_returns_error: covers maxUint64, maxInt64+1 (the overflow boundary), and a length just over the 1 GiB cap
- oversized_array_length_returns_error: maxUint64 element count
- too_many_tensor_dims_returns_error: dims = 5 > GGML_MAX_DIMS

Test plan
- go test ./fs/ggml/... passes (all new sub-tests return errors rather than panicking)
- TestWriteGGUF round-trip tests still pass
- ollama create completes without server panic