Add regex patterns to JSON schema for Decimal type #11987
Old tests have been updated to align with the new validation pattern for Decimal fields in the JSON schema.
…ith new test cases.
…ontain validation pattern.
PR Change Summary: Enhanced JSON schema validation for Decimal fields by introducing a regex pattern based on the max_digits and decimal_places parameters.
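The pattern summarized above has to agree with how validation counts digits. Below is a stdlib-only sketch of that counting rule. It assumes validation works on the normalized Decimal, which is an assumption about pydantic's behaviour rather than code from this PR; `digit_counts` and `fits` are hypothetical names.

```python
from decimal import Decimal
from typing import Optional, Tuple


def digit_counts(value: str) -> Tuple[int, int]:
    """Return (whole_digits, decimal_places) for a finite decimal string."""
    # normalize() strips trailing zeros from the coefficient, so '12.3400' -> 12.34
    # and '1200' -> 1.2E+3; as_tuple() exposes the coefficient digits and the exponent.
    _, digits, exponent = Decimal(value).normalize().as_tuple()
    if exponent >= 0:
        # Integer trailing zeros were moved into the exponent but still count as digits.
        return len(digits) + exponent, 0
    places = -exponent
    # Leading fractional zeros (e.g. the two zeros in 0.001) count towards the total too.
    total = max(len(digits), places)
    return total - places, places


def fits(value: str, max_digits: Optional[int] = None, decimal_places: Optional[int] = None) -> bool:
    whole, places = digit_counts(value)
    if max_digits is not None and whole + places > max_digits:
        return False
    if decimal_places is not None:
        if places > decimal_places:
            return False
        # Whole-digit budget is max_digits - decimal_places, clamped at zero.
        if max_digits is not None and whole > max(max_digits - decimal_places, 0):
            return False
    return True
```

With this rule, '1200' normalizes to 1.2E+3 and counts as four whole digits, so it fails max_digits=4, decimal_places=2 even though its four digits are within max_digits.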
CodSpeed Performance Report: Merging #11987 will not alter performance.
Please review, thanks!
Thanks for the contribution. I've checked locally and the regex seems to correctly match.
Similar to #11420, we'd also have to add patterns for when `max_digits` xor `decimal_places` is None. Would you be interested in pursuing this?
refactor: Update tests to match the new regex
Force-pushed from ca8c4f6 to 5874292.
Thanks for reviewing! It seems the pattern already handles cases where `max_digits` or `decimal_places` is None. I've added tests to cover these scenarios:

```python
import re
from decimal import Decimal
from typing import Annotated

import pytest
from pydantic import Field, TypeAdapter


@pytest.fixture
def get_decimal_pattern():
    # Helper that builds a Decimal schema and returns the pattern of its second
    # `anyOf` entry (the string branch).
    def pattern(max_digits=None, decimal_places=None) -> str:
        field = TypeAdapter(Annotated[Decimal, Field(max_digits=max_digits, decimal_places=decimal_places)])
        return field.json_schema()['anyOf'][1]['pattern']

    return pattern


@pytest.mark.parametrize("valid_decimal", ["0.1", "0000.1", "11.1", "0011.1", "11111111.1"])
def test_decimal_pattern_max_digits_set_none(valid_decimal, get_decimal_pattern):
    max_digits = None
    decimal_places = 1
    pattern = get_decimal_pattern(max_digits=max_digits, decimal_places=decimal_places)
    assert re.fullmatch(pattern, valid_decimal) is not None


@pytest.mark.parametrize("valid_decimal", ["0.1", "000.1", "0.001000", "0000.001000"])
def test_decimal_pattern_decimal_places_set_none(valid_decimal, get_decimal_pattern):
    max_digits = 3
    decimal_places = None
    pattern = get_decimal_pattern(max_digits=max_digits, decimal_places=decimal_places)
    assert re.fullmatch(pattern, valid_decimal) is not None


@pytest.mark.parametrize("invalid_decimal", ["0.0001", "111.1", "11.01"])
def test_decimal_pattern_with_invalid_decimals_decimal_places_set_none(invalid_decimal, get_decimal_pattern):
    max_digits = 3
    decimal_places = None
    pattern = get_decimal_pattern(max_digits=max_digits, decimal_places=decimal_places)
    assert re.fullmatch(pattern, invalid_decimal) is None


@pytest.mark.parametrize("valid_decimal", ["11111111", "1111.11111", "0.00000001"])
def test_decimal_pattern_decimal_places_max_digits_set_none(valid_decimal, get_decimal_pattern):
    max_digits = None
    decimal_places = None
    pattern = get_decimal_pattern(max_digits=max_digits, decimal_places=decimal_places)
    assert re.fullmatch(pattern, valid_decimal) is not None
```
Sorry, what I said was confusing. I wanted to say that it would be nice to have simpler patterns generated for cases where …
refactor: Delete redundant tests. test: Create new tests
…s are both set. The pattern incorrectly counted trailing zeros in integer strings towards max_digits. For example, "1200" with max_digits=4 and decimal_places=2 would incorrectly match, even though the pattern must allow only two integer places.
Thank you for clarifying your request. I've updated the implementation to generate simpler patterns for each combination of max_digits and decimal_places.
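One way to produce a pattern for each combination of max_digits and decimal_places is sketched below. It is an independent illustration, not the code in this PR: `decimal_pattern` and `_fraction` are hypothetical names, the counting rules are assumed (leading zeros and trailing fractional zeros are free, trailing integer zeros count), and exponent notation such as '1e3' is deliberately not handled. It sticks to lookahead, character classes, bounded repetition and non-capturing groups, which Python's `re` shares with the ECMA-262 dialect that JSON Schema refers to.

```python
import re
from typing import Optional


def _fraction(max_places: Optional[int]) -> str:
    # Optional '.', then up to max_places significant places followed by any run of
    # trailing zeros (assumed not to count as decimal places); unbounded if None.
    if max_places is None:
        return r'(?:\.\d*)?'
    return rf'(?:\.\d{{0,{max_places}}}0*)?'


def decimal_pattern(max_digits: Optional[int] = None, decimal_places: Optional[int] = None) -> str:
    # Require a digit after an optional sign and optional leading dot, then swallow the
    # sign and any leading zeros so they never count towards a limit.
    prefix = r'^(?=[+-]?\.?\d)[+-]?0*'
    if max_digits is None and decimal_places is None:
        # No limits: any digits with at most one dot.
        body = r'\d*' + _fraction(None)
    elif max_digits is None:
        # Only decimal_places: unbounded whole part, bounded fraction.
        body = r'(?:[1-9]\d*)?' + _fraction(decimal_places)
    elif decimal_places is None:
        # Only max_digits: one branch per split of the budget between whole and fraction.
        branches = [_fraction(max_digits)]  # whole part is zero
        for whole in range(1, max_digits + 1):
            branches.append(rf'[1-9]\d{{{whole - 1}}}' + _fraction(max_digits - whole))
        body = '(?:' + '|'.join(branches) + ')'
    else:
        # Both set: at most max_digits - decimal_places whole digits (never negative).
        places = min(decimal_places, max_digits)
        whole_max = max_digits - places
        whole_part = rf'(?:[1-9]\d{{0,{whole_max - 1}}})?' if whole_max > 0 else ''
        body = whole_part + _fraction(places)
    return prefix + body + '$'


# The '1200' case from the commit message: only two whole digits are allowed.
assert re.fullmatch(decimal_pattern(4, 2), '1200') is None
assert re.fullmatch(decimal_pattern(4, 2), '0012.3400') is not None
# The review example for max_digits=5.
assert re.fullmatch(decimal_pattern(max_digits=5), '1000.1111111') is None
```

The only-max_digits case enumerates every split of the digit budget between the whole and fractional parts, which is why it yields the longest regex of the four cases.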
Given the complexity of implementing this correctly (see my comments above), I'm starting to think we should just define a simpler regex describing the allowed characters (something like `^(?:\+|-)?[\d.]+$` in validation mode, and `^-?[\d.]+$` in serialization mode). This regex would also allow invalid inputs, but at least we'd be certain that valid inputs won't be disallowed.
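A quick way to see the trade-off in the suggestion above is to run both permissive patterns against a few strings. The constant names and sample values are illustrative only.

```python
import re

# The two permissive patterns suggested in the review comment.
VALIDATION_PATTERN = r'^(?:\+|-)?[\d.]+$'  # optional '+' or '-', then digits and dots
SERIALIZATION_PATTERN = r'^-?[\d.]+$'      # optional '-' only


def matches(pattern: str, value: str) -> bool:
    return re.fullmatch(pattern, value) is not None


# Plain decimal strings in these samples are accepted...
assert all(matches(VALIDATION_PATTERN, v) for v in ['12.5', '+12.5', '-0.001', '100'])
assert all(matches(SERIALIZATION_PATTERN, v) for v in ['12.5', '-0.001', '100'])
# ...but malformed ones slip through, which is the trade-off the comment accepts.
assert all(matches(VALIDATION_PATTERN, v) for v in ['.', '1.1.1', '..'])
# The serialization variant rejects an explicit '+' sign.
assert not matches(SERIALIZATION_PATTERN, '+12.5')
```

Neither pattern admits exponent notation such as '1e3', which Python's `Decimal` constructor accepts, so if validation accepts such strings the "valid inputs won't be disallowed" guarantee holds for plain decimal strings only.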
```python
# Case 4: Both are None (no restrictions)
else:
    pattern += r'\d*\.?\d*$'  # look for arbitrary integer or decimal
```
This allows '.' as an input, which can't be validated.
Thank you for pointing this out. However, `r'\d*\.?\d*$'` is not the full pattern used for validation. The complete pattern includes an additional regex component that prevents '.' from being a valid input:
pydantic/pydantic/json_schema.py, lines 682 to 684 in 88adb2d:

```python
pattern = (
    r'^(?!^[-+.]*$)[+-]?0*'  # check it is not empty string and not one or sequence of ".+-" characters.
)
```
Additionally, the edge case for '.' is covered by the following test:
pydantic/tests/test_json_schema.py, lines 7005 to 7009 in 88adb2d:

```python
@pytest.mark.parametrize('invalid_decimal', ['.', '-.', '..', '1.1.1', '0.0.0', '1..1', '-', '--'])
def test_decimal_pattern_reject_invalid_with_decimal_places_max_digits_unset(invalid_decimal, get_decimal_pattern):
    pattern = get_decimal_pattern()
    assert re.fullmatch(pattern, invalid_decimal) is None
```
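To check this exchange, the two quoted pieces can be concatenated and exercised with Python's `re` module. This assumes the Case 4 suffix is appended directly to the prefix from json_schema.py, as the thread describes; the invalid samples come from the quoted test, and the valid ones are chosen for illustration.

```python
import re

# Shared prefix quoted from json_schema.py: the negative lookahead rejects strings made
# only of '-', '+' and '.' (including the empty string), then an optional sign and
# leading zeros are consumed.
PREFIX = r'^(?!^[-+.]*$)[+-]?0*'
# Case 4 suffix (max_digits and decimal_places both None).
CASE_4_SUFFIX = r'\d*\.?\d*$'
FULL_PATTERN = PREFIX + CASE_4_SUFFIX


def matches(pattern: str, value: str) -> bool:
    return re.fullmatch(pattern, value) is not None


INVALID = ['.', '-.', '..', '1.1.1', '0.0.0', '1..1', '-', '--']
VALID = ['0', '1.5', '-0.5', '+12.', '.5', '0.00000001']

# On its own, the suffix does accept '.', as the review comment says...
assert matches(CASE_4_SUFFIX, '.')
# ...but the lookahead in the prefix rejects it once the two are combined.
assert not any(matches(FULL_PATTERN, v) for v in INVALID)
assert all(matches(FULL_PATTERN, v) for v in VALID)
```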
```python
# Case 2: Only max_digits is set
elif max_digits is not None and decimal_places is None:
```
With `max_digits` set to, e.g., 5, it wrongly matches '1000.1111111', etc.
Thank you for your feedback. From my testing, the pattern appears to work correctly when `max_digits=5`: the invalid value '1000.1111111' is rejected as expected. Here is a test demonstrating this behavior:
```python
import re
from decimal import Decimal
from typing import Annotated
import pytest
from pydantic import Field, TypeAdapter

@pytest.fixture
def get_decimal_pattern():
    def pattern(max_digits=None, decimal_places=None) -> str:
        field = TypeAdapter(Annotated[Decimal, Field(max_digits=max_digits, decimal_places=decimal_places)])
        return field.json_schema()['anyOf'][1]['pattern']
    return pattern

@pytest.mark.parametrize('invalid_decimal', ['1000.1111111'])
def test_only_max_digits_set(invalid_decimal, get_decimal_pattern):
    pattern = get_decimal_pattern(max_digits=5, decimal_places=None)
    assert re.fullmatch(pattern, invalid_decimal) is None
```
Let me know if there's a specific case I may have missed!
I think in validation mode, it's better to use the pattern … Also, the more complex regex patterns are only created in two cases: …
I believe these cases don't come up very often, but if a user sets these options, they probably want the regex pattern to actually check … For serialization mode, I agree that a simple regex like …
Please let me know what you think about this. I should also mention that it's possible I don't fully understand the purpose of the validation and serialization schemas, or all the cases where they might be used.
Thanks @Dima-Bulavenko! @Jack70248, I'm adding you as a co-author for the initial work.
`tests/test_multi_body_errors.py::test_openapi_schema` failed with pydantic 2.12.0a1 due to pydantic/pydantic#11987. Since we're testing the exact contents of the JSON schema, the easiest fix seems to be to add version-dependent handling for this.
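The version-dependent handling mentioned above could look roughly like the sketch below. Everything here is hypothetical: the helper names are made up, the schema shape is simplified, the expected pattern is passed in rather than hard-coded (it would be copied from the schema the new version actually emits), and in a real test the version string would come from `pydantic.VERSION`. Comparing integer tuples rather than strings keeps '2.9' below '2.12'.

```python
from typing import Tuple


def minor_version(version: str) -> Tuple[int, int]:
    # '2.12.0a1' -> (2, 12); a pre-release tag sits in the patch component and is ignored.
    major, minor = version.split('.')[:2]
    return int(major), int(minor)


def expected_decimal_string_schema(version: str, pattern: str) -> dict:
    # Hypothetical expected fragment: only versions from 2.12 on carry a 'pattern' key.
    schema = {'type': 'string'}
    if minor_version(version) >= (2, 12):
        schema['pattern'] = pattern
    return schema
```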
Change Summary
This PR adds a regular expression pattern to the JSON schema of `Decimal` fields to validate their string representation according to the `max_digits` and `decimal_places` parameters of the `Field` class.
…`Decimal` fields to align with the new validation pattern.
…`max_digits` and `decimal_places`.
You can run the new tests with the following command:
Related issue number
fixes #11016
fixes #11420
fixes #10867
Selected Reviewer: @Viicos