Tissuebox is a pure Pythonic schema validator that takes advantage of Python's functional programming style to provide a simple yet powerful validation framework. Typical uses are validating incoming JSON objects in HTTP requests, or validating any Python dict in other common scenarios.
Use pip to install Tissuebox
pip install tissuebox
Tissuebox requires Python 3.7; we are considering adding support for earlier versions of Python 3.
Assume an incoming JSON object (or a Python dict) that contains hotel details; we will build on this example throughout.
payload = {
"name": "Park Sheraton",
"available": True,
"price_per_night": 270,
"email": "[email protected]",
"web": "www.sheraton.com",
}
You can use Tissuebox to define a schema that validates the payload against basic data types, and then check the payload using the validate method.
from tissuebox import validate
from tissuebox.basic import boolean, integer, string
schema = {
'name': string,
'available': boolean,
'price_per_night': integer
}
validate(payload, schema)
will return
True
A Tissuebox schema is simply a dict whose keys are payload keys and whose values are type_functions, to which the corresponding payload values are passed. A type_function accepts a single parameter and returns a tuple with two items: (boolean, msg).
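To make that contract concrete, here is a minimal sketch of two type_functions written in the (boolean, msg) style. The names `is_string` and `is_integer` and their messages are hypothetical stand-ins for illustration; they are not the actual implementations of `tissuebox.basic.string` or `integer`.

```python
def is_string(x):
    """Hypothetical type_function: accepts one value, returns (bool, msg)."""
    return isinstance(x, str), "must be a string"


def is_integer(x):
    """Hypothetical type_function; bool is excluded because True is an int in Python."""
    return isinstance(x, int) and not isinstance(x, bool), "must be an integer"


payload = {"name": "Park Sheraton", "price_per_night": 270}
schema = {"name": is_string, "price_per_night": is_integer}

# Pass each payload value to the type_function registered under the same key.
results = {key: fn(payload[key]) for key, fn in schema.items()}
print(results)
# {'name': (True, 'must be a string'), 'price_per_night': (True, 'must be an integer')}
```

Returning the message alongside the verdict lets a validator report why a value failed without calling anything else.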
Tissuebox aims to build up a library of commonly used types. For now, common data types like email, url, rfc_datetime and geolocation are part of Tissuebox's standard collection. You can contribute more via GitHub.
from tissuebox import validate
from tissuebox.basic import email, integer, string, url
schema = {
'name': string,
'price_per_night': integer,
"email": email,
"web": url
}
validate(payload, schema)
will return
True
One of the ways Tissuebox stands out from other alternatives is that type_functions are stored and passed around as Python variables. This helps identify schema definition errors ahead of time, since most IDEs display squiggly lines when a variable can't be resolved. Other frameworks, like JsonSchema and Cerberus, pass types within strings, which makes it hard for IDEs to detect errors in the schema.
Defining a nested schema is straightforward, and it makes schemas reusable. Suppose the payload has an address field. We can define a separate schema, address, and pass it to the main schema as below.
from tissuebox import validate
from tissuebox.basic import email, integer, string, url
payload = {
"name": "Park Sheraton",
"available": True,
"price_per_night": 270,
"email": "[email protected]",
"web": "www.sheraton.com",
"address": {
"street": "128 George St",
"city": "Sydney",
"state": "NSW",
"zip": 2000
}
}
address = {
"street": string,
"city": string,
"state": string,
"zip": integer
}
schema = {
'name': string,
'price_per_night': integer,
"email": email,
"web": url,
"address": address
}
validate(payload, schema)
would return
True
The preferred way to define a nested schema is to use a . (dot) as a delimiter representing the nested fields of the payload hierarchy. Admittedly, this has a downside: it becomes ambiguous if a dot is itself part of a key, which would be an unfortunate scenario. But it can improve readability tremendously. See for yourself how elegantly we can express the schema once we introduce the address field to our payload.
schema = {
'name': string,
'price_per_night': integer,
"email": email,
"web": url,
"address.street": string,
"address.city": string,
"address.state": string,
"address.zip": integer
}
The primary reason we suggest the latter method is that it lets you quickly define a nested field at any depth without creating unnecessary intermediate schema objects.
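The two styles describe the same constraints. As a minimal sketch (not Tissuebox's actual code), a hypothetical `flatten_schema` helper can turn the nested form into the dot-separated form by joining key paths with `.`. Built-in types stand in for type_functions so the block runs on its own.

```python
def flatten_schema(schema, prefix=""):
    """Hypothetical helper: convert nested dict schemas into dot-separated keys."""
    flat = {}
    for key, rule in schema.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(rule, dict):
            # A nested dict is a sub-schema: recurse and extend the path.
            flat.update(flatten_schema(rule, path))
        else:
            flat[path] = rule
    return flat


address = {"street": str, "city": str, "state": str, "zip": int}
nested = {"name": str, "price_per_night": int, "address": address}

flat = flatten_schema(nested)
print(list(flat))
# ['name', 'price_per_night', 'address.street', 'address.city', 'address.state', 'address.zip']
```

The flat form needs no intermediate `address` object, which is exactly the convenience the dot syntax offers at deeper nesting levels.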
Let us enforce that the field address.state must be one of the eight Australian states and territories. Tissuebox lets you define an enum using the {} (i.e. set) syntax. Look at the example below.
schema = {
'name': string,
'price_per_night': integer,
"email": email,
"web": url,
"address.state": {'ACT', 'NSW', 'NT', 'QLD', 'SA', 'TAS', 'VIC', 'WA'},
"address.zip": integer
}
To get a feel for how Tissuebox responds when we pass something that is not an Australian state:
payload = {
"name": "Park Sheraton",
"available": True,
"price_per_night": 270,
"email": "[email protected]",
"web": "www.sheraton.com",
"address": {
"street": "128 George St",
"city": "Sydney",
"state": "TX",
"zip": 2000
}
}
errors = []
validate(payload, schema, errors)
would return
False
You can also collect errors in a list by passing it to the validate method. That way you are informed not only that the validation fails, but also why it fails.
# The value of the errors list will be
['["address"]["state"] is failing to be enum of `{\'SA\', \'QLD\', \'NT\', \'TAS\', \'VIC\', \'WA\', \'ACT\', \'NSW\'}`']
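A set works naturally as an enum because membership testing is a single `in` check. The following is a minimal sketch of that idea as a type_function factory; `one_of` is a hypothetical name, and its message only imitates the format shown above.

```python
def one_of(allowed):
    """Hypothetical factory: returns a type_function accepting members of `allowed`."""
    allowed = frozenset(allowed)

    def check(x):
        try:
            ok = x in allowed
        except TypeError:  # unhashable values such as lists can't be members
            ok = False
        return ok, "is failing to be enum of {}".format(sorted(allowed))

    return check


AU_STATES = {'ACT', 'NSW', 'NT', 'QLD', 'SA', 'TAS', 'VIC', 'WA'}
state = one_of(AU_STATES)

print(state('NSW')[0])  # True
print(state('TX')[0])   # False
```

Sorting the allowed values keeps the message stable, whereas printing a raw set (as in the error above) shows its members in arbitrary order.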
Let us assume the payload has a staffs field, which is an array of staff names.
payload = {
"name": "Park Sheraton",
"email": "[email protected]",
"web": "www.sheraton.com",
"staffs": ["John Doe", "Jane Smith"]
}
Now the schema simply looks like this:
schema = {
'name': string,
"email": email,
"web": url,
"staffs": [string]
}
So to declare an element as an array, simply use the [] syntax: an array of strings is [string], and an array of cats is [cat]. The array syntax can be either empty or of length one, where the single element is a type_function or another nested schema.
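Conceptually, [string] means "apply string to every element". Here is a minimal sketch of that rule in plain Python; `validate_array` and `is_string` are hypothetical helpers, and the error-path format simply imitates the style of the messages shown earlier.

```python
def is_string(x):
    """Hypothetical type_function in the (bool, msg) style."""
    return isinstance(x, str), "must be a string"


def validate_array(values, type_function, field, errors):
    """Hypothetical helper: apply one type_function to every element of a list."""
    if not isinstance(values, list):
        errors.append('["{}"] must be an array'.format(field))
        return False
    ok = True
    for index, item in enumerate(values):
        valid, msg = type_function(item)
        if not valid:
            ok = False
            # Record which element failed, e.g. ["staffs"][1]
            errors.append('["{}"][{}] {}'.format(field, index, msg))
    return ok


errors = []
print(validate_array(["John Doe", "Jane Smith"], is_string, "staffs", errors))  # True
print(validate_array(["John Doe", 42], is_string, "staffs", errors))            # False
print(errors)  # ['["staffs"][1] must be a string']
```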
There are two scenarios where Tissuebox handles arrays implicitly:
- If the incoming payload is a list of dicts, Tissuebox validates the given schema against every item in the list.
- If a dot-separated nested key passes through an array at any intermediate level, Tissuebox detects this and iterates the validation over that array automatically.
Both behaviors are implemented to make Tissuebox as intuitive as possible.
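Both cases can be pictured as path resolution that fans out whenever it meets a list. The sketch below is a hypothetical `resolve` helper, not Tissuebox's internal code; it collects every value reachable through a dot-separated path (including from a top-level list of dicts), so a validator could then check each one.

```python
def resolve(data, path):
    """Hypothetical helper: return all values found at a dot-separated path.

    Lists met along the way (including at the top level) are iterated,
    so every element is followed.
    """
    current = [data]
    for key in path.split("."):
        next_level = []
        for node in current:
            # Fan out over list elements before looking up the key.
            items = node if isinstance(node, list) else [node]
            for item in items:
                if isinstance(item, dict) and key in item:
                    next_level.append(item[key])
        current = next_level
    return current


payload = {
    "name": "Roger",
    "kids": [
        {"name": "Billy", "age": 10},
        {"name": "Christina", "age": 13},
    ],
}

print(resolve(payload, "kids.name"))  # ['Billy', 'Christina']
print(resolve(payload, "name"))       # ['Roger']
```

Missing keys simply contribute no values, which matches the idea that absent fields are skipped rather than treated as errors.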
By now you will have observed that a Tissuebox schema is simply a collection of key:value pairs, where each value is the data type the payload value is verified against. Tissuebox defines these as type_functions: functions that receive the payload value and report whether it is valid.
Suppose you want to validate that the zip code is a valid Australian postcode. Since Tissuebox doesn't have a built-in type_function for that purpose, you can write your own, as below. For brevity, a few fields have been removed from the payload and schema.
>>> import re
>>> from tissuebox import validate
>>> def australian_zip(x):
...     # https://www.etl-tools.com/regular-expressions/is-australian-post-code.html
...     pattern = r'(0[289][0-9]{2})|([1345689][0-9]{3})|(2[0-8][0-9]{2})|(290[0-9])|(291[0-4])|(7[0-4][0-9]{2})|(7[8-9][0-9]{2})'
...     # fullmatch anchors every alternative; bool() turns the match into True/False
...     return bool(re.fullmatch(pattern, str(x))), "must be a valid Australian zip"
...
>>> hotel = {
...     "address": {
...         "zip": 200
...     }
... }
>>>
>>> schema = {
...     "address.zip": australian_zip
... }
>>> errors = []
>>> validate(hotel, schema, errors)
False
>>> errors
['["address"]["zip"] must be a valid Australian zip']
In Tissuebox, type_functions always accept one argument: the payload value. Sometimes it makes sense for a type_function to take additional parameters. To achieve that, such type_functions are declared as higher-order functions.
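As a minimal sketch of that pattern, here is a hypothetical `in_range(low, high)` factory (not part of Tissuebox) that closes over its parameters and returns a single-argument type_function in the (boolean, msg) style.

```python
def in_range(low, high):
    """Hypothetical parameterized type_function factory."""

    def check(x):
        # Reject bools explicitly: True and False are ints in Python.
        is_number = isinstance(x, (int, float)) and not isinstance(x, bool)
        return is_number and low <= x <= high, "between {} and {}".format(low, high)

    return check


rating = in_range(1, 5)
print(rating(4.5))  # (True, 'between 1 and 5')
print(rating(5.1))  # (False, 'between 1 and 5')
```

Because the factory is called once when the schema is built, the schema still maps each key to a plain one-argument function.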
Let us validate that price_per_night is a multiple of 50, and that the hotel's Yelp review rating is between 1 and 5.
>>> from tissuebox import validate
>>> from tissuebox.basic import between, divisible, string
>>> schema = {
...     "name": string,
...     "rating": between(1, 5),
...     "price_per_night": divisible(50)
... }
>>>
>>> hotel = {
...     "name": "Park Sheraton",
...     "price_per_night": 370,
...     "rating": 5.1
... }
>>>
>>> errors = []
>>> validate(hotel, schema, errors)
False
>>> errors
[
    '["price_per_night"] is failing to be `divisible(50)`',
    '["rating"] is failing to be `between(1, 5)`'
]
Out of curiosity, here is the implementation of divisible from the Tissuebox library. It is defined as a higher-order function that returns another function, which always accepts a single parameter. When writing custom validators, you are encouraged to use the same pattern.
def divisible(n):
    # numeric() is a helper defined elsewhere in the Tissuebox library
    def divisible(x):
        return numeric(x) and numeric(n) and x % n == 0, "multiple of {}".format(n)
    return divisible
As we have observed, a Tissuebox schema is a dict in key:value format, and in Python, dict keys are unique. Redeclaring the same key is a terrible idea, since the earlier value is silently overridden.
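This is plain Python behaviour, independent of Tissuebox: when a dict literal repeats a key, the last value silently wins. A quick check, with strings standing in for type_functions since only the key handling matters here:

```python
# Strings stand in for type_functions; only the duplicate-key handling matters.
schema = {
    "price_per_night": "integer",
    "price_per_night": "positive",
    "price_per_night": "divisible(50)",
}

print(schema)       # {'price_per_night': 'divisible(50)'}
print(len(schema))  # 1
```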
Assume that you attempt something like this:
from tissuebox.basic import divisible, integer, positive, string
schema = {
'name': string,
'price_per_night': integer,
'price_per_night': positive,
'price_per_night': divisible(50),
"address.zip": integer
}
Here price_per_night is overridden by the last declaration, so only divisible(50) would take effect; this must be avoided. It can be solved with another special, yet still Pythonic, syntax.
Simply use () to chain type_functions.
```python
from tissuebox.basic import divisible, integer, positive, string
schema = {
'name': string,
'price_per_night': (integer, positive, divisible(50)),
"address.zip": integer
}
```
Now Tissuebox checks all of these conditions against price_per_night.
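One way to picture the () chain is a loop that runs every type_function in the tuple and keeps each failure message. The sketch below uses hypothetical stand-ins (`is_integer`, `is_positive`, `multiple_of`, `check_all`) and is not Tissuebox's implementation.

```python
def is_integer(x):
    return isinstance(x, int) and not isinstance(x, bool), "integer"


def is_positive(x):
    return isinstance(x, (int, float)) and not isinstance(x, bool) and x > 0, "positive"


def multiple_of(n):
    def check(x):
        return isinstance(x, int) and not isinstance(x, bool) and x % n == 0, "multiple of {}".format(n)
    return check


def check_all(value, rules):
    """Hypothetical AND-chain: every rule in the tuple must pass."""
    failures = []
    for rule in rules:
        ok, msg = rule(value)
        if not ok:
            failures.append(msg)  # keep going so every failure is reported
    return not failures, failures


price_rules = (is_integer, is_positive, multiple_of(50))
print(check_all(250, price_rules))  # (True, [])
print(check_all(-30, price_rules))  # (False, ['positive', 'multiple of 50'])
```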
Tissuebox only applies type_functions to values that are present in the payload; fields missing from the payload are silently ignored.
When a specific value must be present in the payload, declare it with the required function. It is common to combine required with other type_functions using the () syntax described above.
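The interplay between "missing fields are skipped" and required can be sketched as follows, for top-level keys only. `REQUIRED`, `is_string` and `check_field` are hypothetical names for illustration; Tissuebox's own `required` may work differently internally.

```python
REQUIRED = object()  # hypothetical sentinel standing in for a `required` rule


def is_string(x):
    return isinstance(x, str), "must be a string"


def check_field(payload, key, rules, errors):
    """Hypothetical: validate payload[key] against a tuple of rules."""
    if key not in payload:
        if REQUIRED in rules:
            errors.append('["{}"] is required'.format(key))
            return False
        return True  # absent and optional: silently skipped
    ok = True
    for rule in rules:
        if rule is REQUIRED:
            continue  # presence already confirmed above
        valid, msg = rule(payload[key])
        if not valid:
            ok = False
            errors.append('["{}"] {}'.format(key, msg))
    return ok


errors = []
payload = {"name": "Park Sheraton"}
print(check_field(payload, "name", (REQUIRED, is_string), errors))  # True
print(check_field(payload, "web", (is_string,), errors))            # True (skipped)
print(check_field(payload, "city", (REQUIRED, is_string), errors))  # False
print(errors)  # ['["city"] is required']
```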
from tissuebox.basic import integer, required, string
schema = {
'name': (required, string),
"address.city": (required, string),
"address.zip": integer
}
- Tissuebox has several advantages over current alternatives like jsonschema, Cerberus, etc.
- Truly Pythonic, relying heavily on short and static methods. The schema definition itself takes full advantage of Python's built-in syntax, like {} for enums, () for chaining multiple rules, [] for arrays, etc.
- Highly readable, with concise schema definitions.
- Highly extensible, with the ability to plug in your own custom functions without complicated class inheritance.
- Provides all error messages upfront upon validation.
0 - Tissuebox needs to support primitive literals
validate(5, 5) would be True, while validate(4, 5) is False
1 - Tissuebox needs to validate basic primitives. Supported primitives are int, str, float, list, dict, Decimal, bool and None
validate(5, int) would return True
validate('hello', str) would return True
2 - Tissuebox needs to validate array of primitives
validate([1, 2, 3], [int]) would return True
3 - Tissuebox needs to validate array of mixed primitives
validate([1, 'hello', 'world', 2, 3, 4], [int, str]) would return True
4 - Tissuebox needs to support tissues. A tissue is a tiny function that takes a single argument and returns a boolean
validate('[email protected]', email) would return True
5 - Tissuebox needs to support list of tissues
validate(['[email protected]', '[email protected]'], [email]) would return True
6 - Tissuebox needs to support list of mixed tissues
validate(['[email protected]', '[email protected]', 'www.duck.com'], [email, url]) would return True
7 - Tissuebox needs to support tissues with parameters
validate(9, lt(10)) would return True
8 - Tissuebox needs to support tissues with parameters
validate(9, lt(10)) would return True
validate(11, lt(10)) would return False
9 - Tissuebox must support the {} syntax, which means an OR condition; it should also work for lists
validate(1, {int, str}) is True
validate('Hello', {int, str}) is True
validate(1.1, {int, str}) is False
validate([1, 2, 'hello', 'world'], [{int, str}]) is True
10 - Tissuebox must support the () syntax, which means an AND condition; it should also work for lists
validate(4, (divisible(2), lt(10))) is True
validate([2, 4, 6, 8], [(divisible(2), lt(10))]) is True
11 - Tissuebox must support dict based schemas
s = {
'name': str,
'active': bool,
'age': int,
'pets': [str]
}
p = {
'name': 'Roger',
'active': True,
'age': 38,
'pets': ['Jimmy', 'Roger', 'Jessey']
}
validate(p, s) would return True
12 - Tissuebox must support sub-schemas, i.e. schemas can be reused
kid = {
'name': str,
'age': int,
'grade': int
}
schema = {
'name': str,
'active': bool,
'age': int,
'pets': [str],
'kids': [kid]
}
payload = {
'name': 'Roger',
'active': True,
'age': 38,
'pets': ['Jimmy', 'Roger', 'Jessey'],
'kids': [
{
'name': "Billy",
'age': 10,
'grade': 4
},
{
'name': "Christina",
'age': 13,
'grade': 8
}
]
}
validate(payload, schema) would return True
13 - Tissuebox should be able to handle dot-separated keys.
The schema above can be expressed using the alternate syntax below:
schema = {
'name': str,
'active': bool,
'age': int,
'pets': [str],
'kids.name': str,
'kids.age': int,
'kids.grade': int
}
- Add support for pre-emptive evaluation of the schema. For example, (1, 2) doesn't make sense since it would always be False, so evaluate it once and cache the result.
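The rules in items 0 to 4, 9 and 10 compose recursively. The sketch below is a toy `toy_validate` function written for illustration, assuming types are matched with isinstance, tissues are plain predicates, and literals are compared with ==; it is not Tissuebox's implementation and ignores dict schemas, error messages and dot keys. `lt` and `is_even` are hypothetical tissues.

```python
def toy_validate(value, rule):
    """Toy sketch of the roadmap semantics (not Tissuebox's code)."""
    if isinstance(rule, type):
        # Item 1: a type matches instances of that type (checked before callable).
        return isinstance(value, rule)
    if isinstance(rule, (set, frozenset)):
        # Item 9: {} means OR, so any member rule may match.
        return any(toy_validate(value, r) for r in rule)
    if isinstance(rule, tuple):
        # Item 10: () means AND, so every member rule must match.
        return all(toy_validate(value, r) for r in rule)
    if isinstance(rule, list):
        if not rule:
            return isinstance(value, list)  # [] accepts any list
        # Items 2 and 3: every element must match one of the listed rules.
        return isinstance(value, list) and all(
            any(toy_validate(item, r) for r in rule) for item in value
        )
    if callable(rule):
        # Item 4: a tissue is a single-argument predicate.
        return bool(rule(value))
    # Item 0: anything else is a literal compared by equality.
    return value == rule


def lt(n):
    return lambda x: x < n  # hypothetical tissue with a parameter (item 7)


def is_even(x):
    return x % 2 == 0  # hypothetical tissue


print(toy_validate(5, 5))                                # True
print(toy_validate(4, 5))                                # False
print(toy_validate([1, 'hello', 2], [int, str]))         # True
print(toy_validate(1.1, {int, str}))                     # False
print(toy_validate([2, 4, 6, 8], [(is_even, lt(10))]))   # True
```

Note that `isinstance(True, int)` is True in Python, so a real implementation would need an explicit decision about whether bool satisfies int.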