♻️ Refactor JSON Schema for bytes, use "contentMediaType": "application/octet-stream" instead of "format": "binary"#12841
tiangolo wants to merge 6 commits into
Merging this PR will degrade performance by 7.72%
please review |
    ) -> JsonSchemaValue:
        field_schema = handler(core_schema)
-       field_schema.update(type='string', format=self.encoder.get_json_format())
+       field_schema.update(type='string', contentEncoding=self.encoder.get_json_format())
I think this is too breaking to be included in V2. Here is what I propose:
The return type of `EncoderProtocol.get_json_format()` is changed from `str` to `str | None`. If a `str` is returned, it is used as the `format` key. A new `get_content_encoding()` method is added, which also returns `str | None`. The Pydantic built-in `EncoderProtocol` implementations implement `get_content_encoding()` and no longer implement `get_json_format()`.
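A minimal sketch of this proposal, assuming simplified stand-ins for `EncoderProtocol` implementations. The class names `LegacyEncoder` and `Base64UrlEncoderSketch` and the helper `encoded_field_schema` are hypothetical and are not Pydantic's actual code. It illustrates how `str | None` return values could let a custom encoder keep emitting `format` while a built-in one switches to `contentEncoding`:

```python
from __future__ import annotations

from typing import Any, Optional


class LegacyEncoder:
    """Hypothetical third-party encoder that only provides the old `format` value."""

    def get_json_format(self) -> Optional[str]:
        return 'my-custom-format'

    def get_content_encoding(self) -> Optional[str]:
        return None


class Base64UrlEncoderSketch:
    """Hypothetical built-in encoder after the proposed change."""

    def get_json_format(self) -> Optional[str]:
        return None

    def get_content_encoding(self) -> Optional[str]:
        return 'base64url'


def encoded_field_schema(encoder: Any) -> dict[str, Any]:
    """Build the schema fragment, emitting only the keys the encoder provides."""
    schema: dict[str, Any] = {'type': 'string'}
    json_format = encoder.get_json_format()
    if json_format is not None:
        schema['format'] = json_format
    content_encoding = encoder.get_content_encoding()
    if content_encoding is not None:
        schema['contentEncoding'] = content_encoding
    return schema


print(encoded_field_schema(LegacyEncoder()))
print(encoded_field_schema(Base64UrlEncoderSketch()))
```

Under these assumptions an existing custom encoder keeps its current output, which is the backwards-compatibility concern raised in the comment.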
Makes sense! I also reverted the custom logic I had in FastAPI because Swagger UI currently only supports the format key.
I added/kept both. Not fully sure if this is what you wanted, let me know if it was different.
please review |
♻️ Refactor JSON Schema for bytes, use `"contentMediaType": "application/octet-stream"` instead of `"format": "binary"`
FastAPI counterpart: fastapi/fastapi#14953
Change Summary
I suspect the JSON Schema for `bytes` using `"format": "binary"` comes from my first implementation.
It was defined and suggested in OpenAPI 3.0.x (not in JSON Schema): https://spec.openapis.org/oas/v3.0.3.html#considerations-for-file-uploads
OpenAPI 3.1.x aligned support with JSON Schema draft 07, so it was suggested to update file uploads to use the regular JSON Schema format, `"contentMediaType": "application/octet-stream"`: https://learn.openapis.org/upgrading/v3.0-to-v3.1
This is defined in JSON Schema 07: https://json-schema.org/draft-07/json-schema-validation#rfc.section.8.4
For the Base64 fields, a reference example for Base64url in OpenAPI: https://spec.openapis.org/oas/latest.html#example-url-encoded-form-with-binary-values
And the relevant section in the JSON Schema spec: https://json-schema.org/draft/2020-12/draft-bhutton-json-schema-validation-01#name-contentencoding
JSON Schema 2020-12 Note
Now OpenAPI 3.2 is aligned with JSON Schema 2020-12, which is what Pydantic v2 implements.
It's the same as in JSON Schema draft 07, so this still applies: https://json-schema.org/draft/2020-12/draft-bhutton-json-schema-validation-01#name-contentmediatype
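As an illustration of the difference between the two representations, here is a small sketch that rewrites the OpenAPI 3.0-style `"format": "binary"` into the JSON Schema `contentMediaType` form. The helper `migrate_binary_format` is hypothetical (it is not part of Pydantic or FastAPI) and only handles the simple string case:

```python
from typing import Any


def migrate_binary_format(schema: Any) -> Any:
    """Return a copy of `schema` where string schemas using `"format": "binary"`
    use `"contentMediaType": "application/octet-stream"` instead (hypothetical helper)."""
    if isinstance(schema, list):
        return [migrate_binary_format(item) for item in schema]
    if not isinstance(schema, dict):
        return schema
    # Recurse first so nested properties, items, anyOf members, etc. are covered.
    migrated = {key: migrate_binary_format(value) for key, value in schema.items()}
    if migrated.get('type') == 'string' and migrated.get('format') == 'binary':
        del migrated['format']
        migrated['contentMediaType'] = 'application/octet-stream'
    return migrated


old_schema = {
    'type': 'object',
    'properties': {'file': {'type': 'string', 'format': 'binary'}},
}
new_schema = migrate_binary_format(old_schema)
print(new_schema)
```

The input dictionary is left untouched, so the old and new shapes can be compared side by side.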
Docs
I see there's a file `docs/plugins/schema_mappings.toml` that seems like it could potentially need updating, but I'm not sure what the format is. It's currently used in the JSON Schema docs, but it seems that setup is currently broken: https://docs.pydantic.dev/latest/concepts/json_schema/#json-schema-types
Backwards Compatibility and Versions
As this is defined in JSON Schema 2020, which is what Pydantic supports, I would not consider this a "breaking change" but rather a bug fix. Of course, it's up to you how you would consider releasing this, just my POV.
AI Disclaimer
I used Opus 4.6 to implement it, but the default output was not really good, I had to guide it quite a bit to implement it correctly, and tweak it and tweak it.
Full LLM conversation
User: Are there any upgrades that can be done in the JSON schema generation to keep it in line with JSON Schema 2020?
GitHub Copilot: Created 3 todos
Read , lines 1 to 120
Searched for regex `schema_dialect|draft|2020|json-schema.org|schema_version|\$schema` (`**/pydantic/**/*.py`), 20 results
Read , lines 120 to 500
Searched for regex `definitions|\$defs|prefixItems|items|additionalProperties|exclusiveMinimum|exclusiveMaximum` (`**/pydantic/json_schema.py`), 20 results
Analyze JSON schema 2020-12 conformance
Read , lines 1 to 250
Read , lines 251 to 500
Read , lines 501 to 750
Read , lines 751 to 1000
Read , lines 1001 to 1250
Read , lines 1251 to 1500
Read , lines 1501 to 1750
Read , lines 1751 to 2000
Read , lines 2001 to 2250
Read , lines 2251 to 2500
Read , lines 2501 to 2874
Searched for regex `additionalItems|definitions|dependencies|\$schema|contentEncoding|unevaluatedProperties|unevaluatedItems|\$dynamicRef|\$dynamicAnchor` (`**/pydantic/json_schema.py`), 20 results
Searched for regex `anyOf|oneOf|allOf` (`**/pydantic/json_schema.py`), 14 results
Searched for text `additionalProperties` (`**/pydantic/json_schema.py`), 17 results
Searched for regex `uniqueItems|discriminator|format.*binary|format.*base64` (`**/pydantic/json_schema.py`), 20 results
Read
Completed: Analyze current JSON schema generation (1/3)
Here's a comprehensive analysis of JSON Schema 2020-12 conformance gaps and upgrade opportunities in Pydantic's schema generation:
Already Conformant
These areas already follow 2020-12 correctly:
- `$defs` used everywhere (not legacy `definitions`)
- `prefixItems` used for tuple schemas (not the old positional `items` array)
- `exclusiveMinimum`/`exclusiveMaximum` emitted as numbers (not draft-04 booleans)
- `additionalItems` keyword not used (removed in 2020-12)
Conformance Issues
1. Missing `$schema` keyword (medium severity)
At json_schema.py, the `$schema` declaration is explicitly commented out:
The schema dialect is defined at json_schema.py as `'https://json-schema.org/draft/2020-12/schema'` but never included in output. Without `$schema`, consumers cannot identify the dialect.
2. `bytes_schema` uses `format` instead of `contentEncoding` (medium severity)
At json_schema.py, bytes are represented as:
`{'type': 'string', 'format': 'base64url' if ... else 'binary'}`
Per 2020-12, the correct approach is:
`{"type": "string", "contentEncoding": "base64", "contentMediaType": "application/octet-stream"}`
`format` is for semantic string formats (`date-time`, `uri`, etc.), not encoding schemes. `format: 'binary'` is an OpenAPI extension, not a standard JSON Schema format.
3. OpenAPI `discriminator` in JSON Schema output (low severity)
At json_schema.py, tagged unions emit an OpenAPI-style `discriminator` object with `propertyName`/`mapping`. This is an OpenAPI 3.x extension, not part of JSON Schema 2020-12. The code itself has a TODO acknowledging this should be opt-in.
Missed Opportunities
4. Nullable types default to verbose `anyOf` instead of `type` array
At json_schema.py, nullable schemas default to:
`{"anyOf": [{"type": "string"}, {"type": "null"}]}`
The compact 2020-12-idiomatic form `{"type": ["string", "null"]}` is supported via `union_format='primitive_type_array'` but isn't the default. A hybrid approach (type arrays for simple primitives, `anyOf` for complex types) would produce cleaner schemas.
5. No use of `if`/`then`/`else`
Discriminated unions could be expressed as conditional schemas instead of `oneOf`. While `oneOf` is valid, `if`/`then`/`else` can be more efficient for validators and more readable.
6. No use of `unevaluatedProperties`/`unevaluatedItems`
Only `additionalProperties` is used. `unevaluatedProperties` is more correct when composing schemas via `allOf` (e.g., inheritance), since `additionalProperties` doesn't account for properties validated by subschemas. Not a practical bug currently since Pydantic inlines inherited fields.
7. No use of `$dynamicRef`/`$dynamicAnchor`
Standard `$ref` is used for recursive schemas. `$dynamicRef`/`$dynamicAnchor` would only matter for extensible recursive schema patterns.
Recommended Priority for Upgrades
1. `$schema` in output
2. `contentEncoding` for bytes
3. `type` arrays for simple nullables
4. `discriminator` opt-in
Items 1 and 2 are the most actionable conformance fixes. Items 3-4 are quality-of-life improvements. Items 5-7 (from the missed opportunities) are low priority and would require significant refactoring for marginal gain.
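The hybrid approach mentioned for nullable types (type arrays for simple primitives, `anyOf` otherwise) could look roughly like the sketch below. The helper `collapse_simple_any_of` and the `PRIMITIVES` set are hypothetical names for illustration only; this is not how Pydantic's `union_format` option is implemented:

```python
from __future__ import annotations

from typing import Any

# JSON Schema primitive type names that can safely share one `type` array.
PRIMITIVES = {'string', 'number', 'integer', 'boolean', 'null'}


def _is_bare_primitive(member: Any) -> bool:
    """True only for members of the exact form {"type": "<primitive>"}."""
    return (
        isinstance(member, dict)
        and set(member) == {'type'}
        and isinstance(member['type'], str)
        and member['type'] in PRIMITIVES
    )


def collapse_simple_any_of(schema: dict[str, Any]) -> dict[str, Any]:
    """Rewrite `anyOf` of bare primitives into a `type` array; leave anything else alone."""
    members = schema.get('anyOf')
    if not members or 'type' in schema:
        return schema
    if all(_is_bare_primitive(member) for member in members):
        rest = {key: value for key, value in schema.items() if key != 'anyOf'}
        return {**rest, 'type': [member['type'] for member in members]}
    return schema


simple = {'anyOf': [{'type': 'string'}, {'type': 'null'}], 'title': 'Name'}
constrained = {'anyOf': [{'type': 'string', 'maxLength': 3}, {'type': 'null'}]}
print(collapse_simple_any_of(simple))
print(collapse_simple_any_of(constrained))
```

Members that carry extra keywords such as `maxLength` are deliberately not collapsed, because a `type` array cannot attach a constraint to only one of its types.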
User: Fix the format: binary one
GitHub Copilot: Read , lines 775 to 800
Searched for text `ser_json_bytes` (`**/pydantic/**/*.py`), 8 results
So there are three modes: `'utf8'`, `'base64'`, and `'hex'`. Let me check what the tests expect:
Searched for regex `format.*base64|format.*binary|bytes_schema|contentEncoding` (`**/tests/**/*.py`), 20 results
Read , lines 1870 to 1970
Searched for regex `format.*hex|hex.*format` (`**/tests/test_json_schema.py`), no results
Read , lines 870 to 900
Read , lines 510 to 530
Read , lines 2010 to 2310
Let me also check the Base64 types in test_types.py:
Read , lines 5590 to 5720
Searched for regex `format.*base64|Base64.*json_schema|base64.*schema` (`**/pydantic/**/*.py`), 4 results
Read , lines 2400 to 2470
Created 3 todos
Let me read the remaining test contexts I need:
Read , lines 518 to 555
Read , lines 2118 to 2135
Now I have all the context. The fix:
- `base64` mode: `format: 'base64url'` → `contentEncoding: 'base64'`
- `utf8` mode (default): `format: 'binary'` → removed (just `{'type': 'string'}`)
- `hex` mode: was falling through to `format: 'binary'` → `contentEncoding: 'base16'`
Replacing 106 lines with 110 lines in ,
Starting: Run tests to verify (3/3)
Now let me also check for the `Optional[bytes]` test case that I need to handle, since it was referencing `format: 'binary'`:
Read , lines 870 to 900
Good, the replacements look correct. Let me run the affected tests:
Filtering tests...
Ran terminal command: python -m pytest tests/test_json_schema.py::test_decimal_json_schema tests/test_json_schema.py::test_str_basic_types tests/test_json_schema.py::test_model_default_bytes tests/test_json_schema.py::test_dataclass_default_bytes tests/test_json_schema.py::test_typeddict_default_bytes tests/test_json_schema.py::test_constraints_schema_validation tests/test_json_schema.py::test_constraints_schema_serialization tests/test_json_schema.py::test_bytes_constrained_types -x -q 2>&1 | tail -40
Ran terminal command: python -m pytest tests/test_json_schema.py -k "test_decimal_json_schema or test_str_basic_types or test_model_default_bytes or test_dataclass_default_bytes or test_typeddict_default_bytes or test_constraints_schema_validation or test_constraints_schema_serialization or test_bytes_constrained_types" -x -q -p no:cacheprovider -o "addopts=" 2>&1 | tail -40
There's an environment issue (pydantic-core version mismatch), not related to my changes. Let me verify the logic directly:
Ran terminal command: cd /home/user/code/pydantic && python -c "
from pydantic.json_schema import GenerateJsonSchema
from pydantic_core import core_schema
gen = GenerateJsonSchema()
# Test default (utf8) mode
result = gen.bytes_schema(core_schema.bytes_schema())
print('utf8 mode:', result)
# Test base64 mode
from pydantic._internal import _config
gen2 = GenerateJsonSchema()
gen2._config_wrapper_stack = _config.ConfigWrapperStack(_config.ConfigWrapper({'ser_json_bytes': 'base64'}))
result2 = gen2.bytes_schema(core_schema.bytes_schema())
print('base64 mode:', result2)
# Test hex mode
gen3 = GenerateJsonSchema()
gen3._config_wrapper_stack = _config.ConfigWrapperStack(_config.ConfigWrapper({'ser_json_bytes': 'hex'}))
result3 = gen3.bytes_schema(core_schema.bytes_schema())
print('hex mode:', result3)
" 2>&1
Completed: Run tests to verify (3/3)
All three modes produce correct output:
- `utf8` (default): `{'type': 'string', 'format': 'binary'}` → `{'type': 'string'}`
- `base64`: `{'type': 'string', 'format': 'base64url'}` → `{'type': 'string', 'contentEncoding': 'base64'}`
- `hex`: `{'type': 'string', 'format': 'binary'}` (wrong) → `{'type': 'string', 'contentEncoding': 'base16'}`
Changes made:
json_schema.py: `bytes_schema()` now uses the JSON Schema 2020-12 `contentEncoding` keyword instead of non-standard `format` values. `base64` mode uses `contentEncoding: 'base64'`, `hex` mode uses `contentEncoding: 'base16'` (per RFC 4648), and `utf8` mode emits a plain `{'type': 'string'}` since no encoding is applied.
test_json_schema.py: Updated 9 test expectations to match the new schema output: removed `'format': 'binary'` references, replaced `'format': 'base64url'` with `'contentEncoding': 'base64'`.
Note: The `Base64Bytes`/`Base64UrlBytes` types in types.py were not changed. Those use their own `get_json_format()` methods and emit `format: 'base64'`/`format: 'base64url'` as custom pydantic-specific format values for explicitly base64-typed fields, which is a separate concern.
Made changes.
User: Is "hex" used anywhere else? Or did you just add it?
GitHub Copilot: I added the `hex` branch. The original code was:
That ternary lumped both `utf8` and `hex` into the `else` branch, giving them both `format: 'binary'`. The `hex` mode is a real config option (it's defined as `Literal['utf8', 'base64', 'hex']` in config.py), but the old code never distinguished it from `utf8` in the JSON schema output.
I added the explicit `hex` → `contentEncoding: 'base16'` mapping since base16 is the standard name for hex encoding per RFC 4648, and `contentEncoding` is the correct 2020-12 keyword for it. If you'd prefer to keep the scope narrower and only fix the `base64`/`utf8` cases, I can remove the `hex` branch and leave it as a plain `{'type': 'string'}` (same as `utf8`).
User: Don't add the hex one to keep changes minimal
GitHub Copilot: Replacing 6 lines with 4 lines in
Done. The `hex` branch is removed; `hex` and `utf8` both produce a plain `{'type': 'string'}` now.
Made changes.
User: You are not using contentMediaType: application/octet-stream, why?
GitHub Copilot: Good catch: `contentMediaType` tells consumers what the decoded content actually is. For bytes, that's `application/octet-stream`. Without it, a consumer knows the string is base64-encoded but not that the result is raw binary. Let me add it.
Replacing 4 lines with 5 lines in
Now update the test expectations to include `contentMediaType`:
Searched for text `contentEncoding` (`**/tests/test_json_schema.py`), 3 results
Replacing 15 lines with 15 lines in
Let me verify the output:
Ran terminal command: cd /home/user/code/pydantic && python -c "
from pydantic.json_schema import GenerateJsonSchema
from pydantic_core import core_schema
from pydantic._internal import _config
gen = GenerateJsonSchema()
gen._config_wrapper_stack = _config.ConfigWrapperStack(_config.ConfigWrapper({'ser_json_bytes': 'base64'}))
print(gen.bytes_schema(core_schema.bytes_schema()))
"
Output is now:
`{"type": "string", "contentEncoding": "base64", "contentMediaType": "application/octet-stream"}`
This is the fully idiomatic JSON Schema 2020-12 representation for base64-encoded binary data.
Made changes.
User: Add the contentMediaType to the non-base64 version as well
GitHub Copilot: Read , lines 785 to 800
Replacing 5 lines with 4 lines in
Now update all the test expectations to include `contentMediaType` for non-base64 bytes too:
Searched for regex `'type': 'string'.*'title'.*(?:bytes|Bytes|binary)|bytes.*'type': 'string'|Optional\[bytes\]|Union.*bytes` (`**/tests/test_json_schema.py`), 7 results
Let me read the full context around each of these test locations:
Read , lines 870 to 900
Read , lines 518 to 530
Read , lines 1875 to 1970
Read , lines 2015 to 2030
Now let me also read the constrained bytes test and the serialization constraints test:
Read , lines 2290 to 2310
Read , lines 2118 to 2135
Read , lines 524 to 560
Now I have all the locations. Every bytes schema without `ser_json_bytes='base64'` needs `contentMediaType` added:
Replacing 102 lines with 102 lines in
Let me verify:
Ran terminal command: cd /home/user/code/pydantic && python -c "
from pydantic.json_schema import GenerateJsonSchema
from pydantic_core import core_schema
from pydantic._internal import _config
# utf8 (default)
gen = GenerateJsonSchema()
print('utf8:', gen.bytes_schema(core_schema.bytes_schema()))
# base64
gen2 = GenerateJsonSchema()
gen2._config_wrapper_stack = _config.ConfigWrapperStack(_config.ConfigWrapper({'ser_json_bytes': 'base64'}))
print('base64:', gen2.bytes_schema(core_schema.bytes_schema()))
"
Done. Both modes now include `contentMediaType: 'application/octet-stream'`:
`{'type': 'string', 'contentMediaType': 'application/octet-stream'}`
`{'type': 'string', 'contentMediaType': 'application/octet-stream', 'contentEncoding': 'base64'}`
Made changes.
User: Consider val_json_bytes as well
GitHub Copilot: Searched for text `val_json_bytes` (`**/pydantic/**/*.py`), 6 results
Read , lines 610 to 650
Read , lines 780 to 800
Searched for regex `self\.mode|self\._mode` (`**/pydantic/json_schema.py`), 10 results
The logic is clear: in `validation` mode, the schema should reflect what's accepted as input (governed by `val_json_bytes`), while in `serialization` mode it should reflect the output format (governed by `ser_json_bytes`).
Replacing 4 lines with 5 lines in
Ran terminal command: cd /home/user/code/pydantic && python -c "
from pydantic.json_schema import GenerateJsonSchema
from pydantic_core import core_schema
from pydantic._internal import _config
# validation mode with val_json_bytes=base64
gen = GenerateJsonSchema()
gen._config_wrapper_stack = _config.ConfigWrapperStack(_config.ConfigWrapper({'val_json_bytes': 'base64'}))
gen._mode = 'validation'
print('validation, val=base64:', gen.bytes_schema(core_schema.bytes_schema()))
# serialization mode with ser_json_bytes=base64
gen2 = GenerateJsonSchema()
gen2._config_wrapper_stack = _config.ConfigWrapperStack(_config.ConfigWrapper({'ser_json_bytes': 'base64'}))
gen2._mode = 'serialization'
print('serialization, ser=base64:', gen2.bytes_schema(core_schema.bytes_schema()))
# validation mode with val_json_bytes=utf8 (default)
gen3 = GenerateJsonSchema()
gen3._mode = 'validation'
print('validation, val=utf8:', gen3.bytes_schema(core_schema.bytes_schema()))
# serialization mode with ser_json_bytes=utf8 (default)
gen4 = GenerateJsonSchema()
gen4._mode = 'serialization'
print('serialization, ser=utf8:', gen4.bytes_schema(core_schema.bytes_schema()))
# mixed: val=base64 but ser=utf8, validation mode
gen5 = GenerateJsonSchema()
gen5._config_wrapper_stack = _config.ConfigWrapperStack(_config.ConfigWrapper({'val_json_bytes': 'base64', 'ser_json_bytes': 'utf8'}))
gen5._mode = 'validation'
print('validation, val=base64/ser=utf8:', gen5.bytes_schema(core_schema.bytes_schema()))
# mixed: val=utf8 but ser=base64, serialization mode
gen6 = GenerateJsonSchema()
gen6._config_wrapper_stack = _config.ConfigWrapperStack(_config.ConfigWrapper({'val_json_bytes': 'utf8', 'ser_json_bytes': 'base64'}))
gen6._mode = 'serialization'
print('serialization, val=utf8/ser=base64:', gen6.bytes_schema(core_schema.bytes_schema()))
"
Working correctly. The schema now uses:
- `val_json_bytes` in validation mode (what encoding the input is expected in)
- `ser_json_bytes` in serialization mode (what encoding the output will be in)
This means mixed configs work properly: e.g., `val_json_bytes='base64'` + `ser_json_bytes='utf8'` will show `contentEncoding: 'base64'` only in the validation schema, not the serialization schema.
Made changes.
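The mode-dependent selection described in this exchange can be sketched in isolation as follows. The function `bytes_json_schema` is a hypothetical stand-alone illustration that mirrors what the conversation describes; it is not the actual `GenerateJsonSchema.bytes_schema()` method and may differ from what was finally merged:

```python
from __future__ import annotations

from typing import Any


def bytes_json_schema(
    mode: str,
    val_json_bytes: str = 'utf8',
    ser_json_bytes: str = 'utf8',
) -> dict[str, Any]:
    """Pick the relevant config value for the schema mode and build the bytes schema."""
    # Validation schemas describe accepted input, serialization schemas describe output.
    encoding = val_json_bytes if mode == 'validation' else ser_json_bytes
    schema: dict[str, Any] = {
        'type': 'string',
        'contentMediaType': 'application/octet-stream',
    }
    # Only base64 is mapped; 'utf8' and 'hex' add no contentEncoding in this sketch.
    if encoding == 'base64':
        schema['contentEncoding'] = 'base64'
    return schema


print(bytes_json_schema('validation', val_json_bytes='base64', ser_json_bytes='utf8'))
print(bytes_json_schema('serialization', val_json_bytes='base64', ser_json_bytes='utf8'))
```

With a mixed configuration, only the schema whose mode matches the base64 setting carries `contentEncoding`.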
User: Now run the tests to check it all works, use the venv at .venv
GitHub Copilot: Ran terminal command: .venv/bin/python -m pytest tests/test_json_schema.py -x -q -o "addopts=" 2>&1 | tail -30
A `SecretBytes` test also picks up `contentMediaType`. Let me check it:
Read , lines 980 to 1010
The `SecretBytes` test expects just `{'type': 'string', 'writeOnly': True, 'format': 'password'}` but now `bytes_schema` adds `contentMediaType`. Let me check how `SecretBytes` generates its schema; it likely goes through `bytes_schema` internally:
Searched for regex `SecretBytes|secret_bytes|secret.*bytes` (`**/pydantic/**/*.py`), 20 results
Searched for regex `class _SecretField|__get_pydantic_json_schema__|pydantic_js_functions|writeOnly|password` (`**/pydantic/types.py`), 20 results
Read , lines 1746 to 1800
I see: `SecretBytes` calls `handler(cls._inner_schema)` where `_inner_schema` is a bytes core schema, so it goes through `bytes_schema()` and picks up the `contentMediaType`. Then it overlays `type='string'`, `writeOnly=True`, `format='password'`. The `contentMediaType` bleeds through.
The fix here is to update the test expectation, since `SecretBytes` does represent binary data (bytes) and `contentMediaType: 'application/octet-stream'` is semantically correct for it too.
Replacing 13 lines with 15 lines in
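The "bleed through" follows from how a dictionary overlay works: keys set by the inner schema survive unless the overlay names them. A tiny sketch, using a hypothetical `secret_overlay` function rather than the real `SecretBytes` implementation:

```python
from typing import Any, Dict


def secret_overlay(inner_schema: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay the secret-field keys on top of whatever the inner schema produced."""
    schema = dict(inner_schema)  # copy, so the inner schema is not mutated
    schema.update(type='string', writeOnly=True, format='password')
    return schema


# What the bytes schema is described as producing after the change.
inner = {'type': 'string', 'contentMediaType': 'application/octet-stream'}
print(secret_overlay(inner))
```

Because `update()` only overwrites the keys it is given, `contentMediaType` remains in the result alongside `writeOnly` and `format`.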
Ran terminal command: .venv/bin/python -m pytest tests/test_json_schema.py -x -q -o "addopts=" 2>&1 | tail -30
Another SecretBytes test. Let me find and fix it:
Read , lines 4620 to 4670
Replacing 12 lines with 16 lines in
Ran terminal command: .venv/bin/python -m pytest tests/test_json_schema.py -x -q -o "addopts=" 2>&1 | tail -20
All 532 tests pass. Let me also run the types tests since those had `base64`/`base64url` format expectations:
Ran terminal command: .venv/bin/python -m pytest tests/test_types.py -x -q -o "addopts=" -k "base64 or bytes" 2>&1 | tail -20
The `Base64Bytes`/`Base64UrlBytes` types also go through `bytes_schema` and inherit `contentMediaType`. Let me fix these test expectations:
Read , lines 5595 to 5625
Read , lines 5690 to 5720
Replacing 34 lines with 36 lines in
Ran terminal command: .venv/bin/python -m pytest tests/test_types.py -x -q -o "addopts=" -k "base64 or bytes" 2>&1 | tail -15
Still failing at the same line. Let me check the actual file content to make sure the edit took:
Read , lines 5595 to 5625
The edit is there. The issue is likely dict key ordering: JSON schema `sort` alphabetizes keys. Let me check what the actual output looks like:
Ran terminal command: .venv/bin/python -m pytest tests/test_types.py::test_base64 -x -q -o "addopts=" --no-header -vv 2>&1 | head -60
The issue is `Base64Str`: it's not bytes, it's a string type. It doesn't go through `bytes_schema`, so it doesn't get `contentMediaType`. The test is parametrized over both `Base64Bytes` and `Base64Str`, and they share the same assertion. I need to check what `Base64Str` actually produces:
Ran terminal command: .venv/bin/python -c "
from pydantic import BaseModel
from pydantic.types import Base64Bytes, Base64Str
from typing import Optional
class M1(BaseModel):
    v: Base64Bytes
class M2(BaseModel):
    v: Base64Str
print('Base64Bytes:', M1.model_json_schema())
print('Base64Str:', M2.model_json_schema())
"
`Base64Bytes` gets `contentMediaType` but `Base64Str` doesn't (since it's a string, not bytes). The test uses the same assertion for both. I need to make the assertion conditional:
Read , lines 5566 to 5620
Replacing 18 lines with 23 lines in
Now do the same for the `test_base64url` test:
Read , lines 5656 to 5720
Same pattern: `Base64UrlStr` won't have `contentMediaType`:
Replacing 18 lines with 23 lines in
Ran terminal command: .venv/bin/python -m pytest tests/test_types.py -x -q -o "addopts=" -k "base64 or bytes" 2>&1 | tail -10
Now let me run the full test suites for both files:
Ran terminal command: .venv/bin/python -m pytest tests/test_json_schema.py tests/test_types.py -x -q -o "addopts=" 2>&1 | tail -10
Another test. Let me check it:
Read , lines 3530 to 3570
Interesting: `os.PathLike[bytes]` generates the schema through `bytes_schema` too and picks up `contentMediaType`. But its JSON schema output uses `format: 'path'` which overrides the type semantics. Let me check what actually gets output:
Continue to iterate?
Made changes.
User: @agent Continue: "Continue to iterate?"
GitHub Copilot: Ran terminal command: .venv/bin/python -c "
from pydantic import BaseModel
import os
class M(BaseModel):
    byte_type: os.PathLike[bytes]
print(M.model_json_schema())
"
Replacing 10 lines with 10 lines in
Summarized conversation history
Ran terminal command: .venv/bin/python -m pytest tests/test_json_schema.py tests/test_types.py -x -q -o "addopts="
All 1483 tests pass (with 4 skipped, 2 expected failures). The `bytes_schema` changes are fully validated.
Made changes.
User: Update the tests in test_types.py in test_base64url for JSON Schema with base64 data to use the standard JSON Schema format
With contentMediaType and contentEncoding instead of format
GitHub Copilot: Read , lines 5679 to 5735
Read , lines 5638 to 5678
The test is parametrized with both `Base64UrlBytes` and `Base64UrlStr`. Only `Base64UrlBytes` goes through `bytes_schema()` and should use `contentEncoding`/`contentMediaType`, while `Base64UrlStr` keeps `format: 'base64url'`. I need to make the schema assertion conditional, as was done for `test_base64`.
Let me check how `test_base64` handles this:
Read , lines 5575 to 5625
Let me check how `Base64UrlBytes`/`Base64UrlStr` generate their schemas to understand what output they actually produce:
Searched for text `Base64Url` (`**/pydantic/types.py`), 20 results
Read , lines 2400 to 2480
Read , lines 2480 to 2600
Let me check the current actual schema output for these types:
Ran terminal command: cd /home/user/code/pydantic && .venv/bin/python -c "
from pydantic import BaseModel, Base64UrlBytes, Base64UrlStr
from typing import Optional
class M1(BaseModel):
    v: Base64UrlBytes
    v2: Optional[Base64UrlBytes] = None
class M2(BaseModel):
    v: Base64UrlStr
    v2: Optional[Base64UrlStr] = None
import json
print('Base64UrlBytes:', json.dumps(M1.model_json_schema(), indent=2))
print()
print('Base64UrlStr:', json.dumps(M2.model_json_schema(), indent=2))
"
Now I see the issue: `EncodedBytes.__get_pydantic_json_schema__` adds `format` on top of the `contentMediaType` from `bytes_schema()`. I need to update both `EncodedBytes` and `EncodedStr` to use `contentEncoding` instead of `format`, then update the test.
Read , lines 2570 to 2610
Read , lines 2610 to 2660
Created 5 todos
Now I'll update both `EncodedBytes` and `EncodedStr` in types.py to use `contentEncoding` instead of `format`, and update the test assertions in `test_base64url` (and `test_base64` for consistency).
Replacing 20 lines with 20 lines in
Starting: Update test_base64url assertions (3/5)
Now let me verify the new output:
Ran terminal command: cd /home/user/code/pydantic && .venv/bin/python -c "
from pydantic import BaseModel, Base64UrlBytes, Base64UrlStr, Base64Bytes, Base64Str
from typing import Optional
import json
class M1(BaseModel):
    v: Base64UrlBytes
    v2: Optional[Base64UrlBytes] = None
class M2(BaseModel):
    v: Base64UrlStr
    v2: Optional[Base64UrlStr] = None
print('Base64UrlBytes:', json.dumps(M1.model_json_schema(), indent=2))
print()
print('Base64UrlStr:', json.dumps(M2.model_json_schema(), indent=2))
"
`Base64UrlBytes` now has `contentEncoding` + `contentMediaType`, `Base64UrlStr` has `contentEncoding` only (no `contentMediaType` since it's not binary data). Now I'll update the test assertions. Let me also check `test_base64`:
Read , lines 5586 to 5622
Both `test_base64` and `test_base64url` need updating. Since they're parametrized with both Bytes and Str variants, and the schema differs (`contentMediaType` present only for bytes), I'll make the assertions conditional.
Replacing 34 lines with 42 lines in
Starting: Run tests to verify (5/5)
Ran terminal command: .venv/bin/python -m pytest tests/test_types.py -x -q -o "addopts=" -k "test_base64"
All 29 base64 tests pass. Now let me run the full test suites to check for any other tests affected:
Ran terminal command: .venv/bin/python -m pytest tests/test_json_schema.py tests/test_types.py -x -q -o "addopts="
Completed: Run tests to verify (5/5)
All 1480 tests pass. Here's a summary of the changes:
Implementation (types.py):
- `EncodedBytes.__get_pydantic_json_schema__`: changed `format=` to `contentEncoding=`
- `EncodedStr.__get_pydantic_json_schema__`: changed `format=` to `contentEncoding=`
This means `Base64UrlBytes` now produces:
`{"type": "string", "contentEncoding": "base64url", "contentMediaType": "application/octet-stream"}`
(`contentMediaType` comes from `bytes_schema()`, `contentEncoding` from the encoder)
And `Base64UrlStr` produces:
`{"type": "string", "contentEncoding": "base64url"}`
Tests (test_types.py):
- `test_base64` and `test_base64url`: schema assertions updated to expect `contentEncoding` instead of `format`, with conditional `contentMediaType` for the bytes variants.
Made changes.
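To show what the two keywords give a schema consumer, here is a hedged sketch of a client-side decoder. The function `decode_string_value` is hypothetical, and treating a missing `contentMediaType` as UTF-8 text is an assumption made for this example, not something the schemas above state:

```python
import base64
from typing import Any, Dict, Union


def decode_string_value(schema: Dict[str, Any], value: str) -> Union[str, bytes]:
    """Decode a JSON string according to `contentEncoding` and `contentMediaType`."""
    encoding = schema.get('contentEncoding')
    if encoding == 'base64url':
        # Restore any stripped padding before decoding.
        raw = base64.urlsafe_b64decode(value + '=' * (-len(value) % 4))
    elif encoding == 'base64':
        raw = base64.b64decode(value)
    else:
        return value  # no declared encoding: keep the string as-is
    if schema.get('contentMediaType') == 'application/octet-stream':
        return raw  # declared as raw binary
    return raw.decode('utf-8')  # assumption: otherwise treat the payload as UTF-8 text


bytes_field = {
    'type': 'string',
    'contentEncoding': 'base64url',
    'contentMediaType': 'application/octet-stream',
}
str_field = {'type': 'string', 'contentEncoding': 'base64url'}

encoded = base64.urlsafe_b64encode(b'\xff\xfe').decode('ascii')
print(decode_string_value(bytes_field, encoded))
print(decode_string_value(str_field, 'aGVsbG8'))
```

The bytes variant returns raw `bytes`, while the string variant returns text, which reflects why only the bytes schema carries `contentMediaType`.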
Related issue number
Checklist
Selected Reviewer: @Viicos