I'm trying to think about a way for pydantic to communicate extra field information to hypothesis which is:
- reusable by other libraries - e.g. doesn't use hypothesis types
- doesn't require any understanding of pydantic internals - e.g. not based on pydantic core schema
- can be extended without further integration discussion - e.g. is a proper protocol, not a list of types
For reference, here is the hypothesis plugin from pydantic V1. The types we'd like to support with this system are as follows:
- EmailStr - a string that is a valid email address, could crudely be just a regex
- NameEmail - a name and email in the name <email> format; can be a regex + email
- PyObject (now ImportString) - a string that can represent anything; the hypothesis plugin used a random attribute of math
- Color - could be a regex
- Luhn-valid card number - currently generated by trial and error
- IPvAnyAddress - either IPv4 or IPv6
- JsonWrapper - a JSON string, possibly with a type hint
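Two of the listed types don't need trial and error or a regex to produce a random example. The sketch below computes the Luhn check digit directly, so every generated card number is valid on the first attempt, and draws an IPvAnyAddress by first picking IPv4 or IPv6. The function names are hypothetical (not part of pydantic or hypothesis); only the standard library is assumed.

```python
import ipaddress
import random
from typing import Optional, Union


def luhn_check_digit(payload: str) -> str:
    """Return the digit that makes payload + digit pass the Luhn check."""
    total = 0
    # Walk right-to-left over the payload. The check digit will occupy the
    # final (undoubled) position, so the payload's rightmost digit is doubled.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def is_luhn_valid(number: str) -> bool:
    """Standard Luhn test: double every second digit counting from the right."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def random_card_number(length: int = 16, rng: Optional[random.Random] = None) -> str:
    """Random digits plus a computed check digit: no retry loop needed."""
    rng = rng or random.Random()
    payload = "".join(str(rng.randint(0, 9)) for _ in range(length - 1))
    return payload + luhn_check_digit(payload)


def random_ip_any(
    rng: Optional[random.Random] = None,
) -> Union[ipaddress.IPv4Address, ipaddress.IPv6Address]:
    """Pick IPv4 or IPv6 at random, then draw a random address of that kind."""
    rng = rng or random.Random()
    if rng.random() < 0.5:
        return ipaddress.IPv4Address(rng.getrandbits(32))
    return ipaddress.IPv6Address(rng.getrandbits(128))
```

Either function could back the random_example for its type, independently of any test library.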
Everything else, I think, is covered by using Annotated and things already defined by annotated-types.
Note that in V2 some of the above are implemented as arguments to Annotated (albeit with an alias), while others are legitimate custom types.
One idea I had was to use JSON Schema: you provide a method or property on either the type or the argument to Annotated which returns some JSON Schema. But looking at the above list, I don't think JSON Schema would help with many of these types.
Therefore here's my proposal:
annotated-types defines a new property or method on types and arguments to Annotated which returns the following pieces of information (could be a tuple, a dict with specific keys, or a dataclass defined herein):
- documentation_example - a canonical example of the datatype as might be shown in documentation, e.g. [email protected]
- random_example - a random, varying example of the datatype, e.g. [email protected]
- type_code - a string that libraries can use to identify the type (e.g. email) and thereby decide to do more powerful things; e.g. hypothesis has a better strategy for generating random email addresses than pydantic will. Would be None if no type_code exists for a field.
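A minimal sketch of the data structure, assuming the dataclass option from the list above. The name ExtraTypeInfo comes from the usage examples; the field order (type_code first) is inferred from those calls, and typing the examples as Any is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ExtraTypeInfo:
    """Hypothetical container for the three pieces of information above.

    Field order follows the ExtraTypeInfo(...) calls in the usage examples:
    type code first, then the documentation example, then the random one.
    """

    # Agreed identifier such as 'email'; None when no type code exists.
    type_code: Optional[str]
    # Stable, canonical value suitable for documentation.
    documentation_example: Any
    # Varying value a generator can fall back on.
    random_example: Any


info = ExtraTypeInfo("color", "#ff0000", "#00ff00")
```

frozen=True makes instances immutable and hashable, which suits a value a consuming library may cache; a plain tuple or dict would carry the same information.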
The idea is that annotated-types defines:
- the above data structure
- a list of agreed type_codes, starting with the ones from above
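One way to make this a proper protocol rather than a list of types is a runtime-checkable typing.Protocol plus a shared set of agreed codes. This is a hypothetical sketch: the method name extra_type_info, the helper get_extra_type_info, and every type code other than 'email' and 'color' are illustrative placeholders, not an agreed annotated-types API.

```python
from dataclasses import dataclass
from typing import Any, Optional, Protocol, runtime_checkable


@dataclass(frozen=True)
class ExtraTypeInfo:
    type_code: Optional[str]
    documentation_example: Any
    random_example: Any


# Hypothetical starting registry, one code per type listed earlier;
# only 'email' and 'color' appear in the proposal itself.
KNOWN_TYPE_CODES = frozenset(
    {"email", "name_email", "import_string", "color",
     "payment_card_number", "ip_any_address", "json"}
)


@runtime_checkable
class HasExtraTypeInfo(Protocol):
    # Hypothetical method name. A type would define it as a classmethod, an
    # Annotated argument as a normal method; no registration step is needed.
    def extra_type_info(self) -> ExtraTypeInfo: ...


def get_extra_type_info(obj: object) -> Optional[ExtraTypeInfo]:
    """Return obj's ExtraTypeInfo, or None if it doesn't implement the protocol."""
    # isinstance() on a runtime_checkable Protocol only checks that the
    # attribute exists, not its signature.
    if isinstance(obj, HasExtraTypeInfo):
        return obj.extra_type_info()
    return None


class ColorMetadata:
    """Example Annotated argument implementing the protocol."""

    def extra_type_info(self) -> ExtraTypeInfo:
        return ExtraTypeInfo("color", "#ff0000", "#00ff00")
```

Because any object defining the method participates, a new type code needs agreement on the code string only, not on a new integration.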
Example usage
EmailStr: would emit ExtraTypeInfo('email', '[email protected]', '[email protected]').
hypothesis would ignore the crude random example and use its own strategy for email addresses, since it recognises the email type code.
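On the producer side, an EmailStr-like Annotated argument might look like the following. This is an illustrative sketch: EmailMetadata, the extra_type_info method name and the example.com addresses are placeholders, and the random example is deliberately crude (random lowercase letters at a fixed domain).

```python
import random
import string
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ExtraTypeInfo:
    type_code: Optional[str]
    documentation_example: Any
    random_example: Any


class EmailMetadata:
    """Hypothetical Annotated argument standing in for pydantic's EmailStr."""

    def extra_type_info(self) -> ExtraTypeInfo:
        # Crude random example: eight random lowercase letters at a fixed
        # placeholder domain. Each call produces a new value.
        local_part = "".join(random.choices(string.ascii_lowercase, k=8))
        return ExtraTypeInfo("email", "user@example.com", f"{local_part}@example.com")
```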
Color: would emit ExtraTypeInfo('color', '#ff0000', '#00ff00'); if hypothesis doesn't recognize color, it could fall back to using the random example generated by pydantic.
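The consumer side of both examples reduces to one rule: use your own generator when you recognise the type code, otherwise fall back to random_example. A hypothetical sketch with no dependency on pydantic or hypothesis; in hypothesis itself the 'email' entry could map to something like hypothesis.strategies.emails(), but here a plain function stands in for it.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass(frozen=True)
class ExtraTypeInfo:
    type_code: Optional[str]
    documentation_example: Any
    random_example: Any


def _own_email() -> str:
    # Stand-in for a richer, consumer-owned email generator.
    user = random.choice(["alice", "bob", "carol"])
    return f"{user}@example.org"


# Hypothetical table: type codes this consumer has its own generators for.
OWN_GENERATORS: Dict[str, Callable[[], Any]] = {"email": _own_email}


def pick_example(info: ExtraTypeInfo) -> Any:
    """Prefer our own generator for a recognised type code; otherwise fall back."""
    generator = OWN_GENERATORS.get(info.type_code) if info.type_code is not None else None
    if generator is not None:
        return generator()
    return info.random_example
```

Unknown codes such as 'color' and missing codes (None) take the same fallback path, so producers can add codes before any consumer supports them.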
If a user wanted their own UKPostCode type (an alias of Annotated[str, UKPostCodeMetadata]), then UKPostCodeMetadata could emit ExtraTypeInfo(None, 'W1A 1AA', 'sp119dg'), with None for the type code since no type code exists for UK postcodes; hypothesis would just use the random example 'sp119dg'.
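For the UKPostCode case, a consumer could read the metadata straight off the Annotated alias using only typing.get_origin and typing.get_args, with no knowledge of pydantic internals. The method name and helper are hypothetical, the sketch passes an instance of UKPostCodeMetadata rather than the class, and it assumes Python 3.9+ for typing.Annotated; the example values are the ones from the text.

```python
from dataclasses import dataclass
from typing import Annotated, Any, Optional, get_args, get_origin


@dataclass(frozen=True)
class ExtraTypeInfo:
    type_code: Optional[str]
    documentation_example: Any
    random_example: Any


class UKPostCodeMetadata:
    # Hypothetical user-defined metadata.
    def extra_type_info(self) -> ExtraTypeInfo:
        # No agreed type code exists for UK postcodes, hence None.
        return ExtraTypeInfo(None, "W1A 1AA", "sp119dg")


UKPostCode = Annotated[str, UKPostCodeMetadata()]


def extra_info_from_annotated(tp: Any) -> Optional[ExtraTypeInfo]:
    """Return ExtraTypeInfo from the first Annotated argument that provides it."""
    if get_origin(tp) is not Annotated:
        return None
    # get_args(Annotated[str, m]) == (str, m): skip the base type.
    for meta in get_args(tp)[1:]:
        method = getattr(meta, "extra_type_info", None)
        if callable(method):
            return method()
    return None
```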
In theory another tool could use this data (e.g. for generating documentation) with no knowledge of hypothesis or pydantic.
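As an illustration, a documentation generator might read only documentation_example and type_code and ignore random_example entirely. The function and field names below are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ExtraTypeInfo:
    type_code: Optional[str]
    documentation_example: Any
    random_example: Any


def render_field_docs(fields: Dict[str, ExtraTypeInfo]) -> str:
    """One line per field: name, type code (or 'custom'), canonical example."""
    lines = []
    for name, info in fields.items():
        label = info.type_code or "custom"
        lines.append(f"{name} ({label}): e.g. {info.documentation_example!r}")
    return "\n".join(lines)


print(render_field_docs({
    "colour": ExtraTypeInfo("color", "#ff0000", "#00ff00"),
    "postcode": ExtraTypeInfo(None, "W1A 1AA", "sp119dg"),
}))
# colour (color): e.g. '#ff0000'
# postcode (custom): e.g. 'W1A 1AA'
```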