Simple declarative data extraction and loading in Python, featuring:
- 🍰 Ease of use: Data extraction is defined with simple, declarative types.
- ⚙ XML / HTML / JSON Extraction: Extraction can be performed across a wide array of structured data formats.
- 🐼 Pandas Integration: Results are easily castable to Pandas DataFrames and Series.
- 😀 Custom Output Classes: Results can be automatically loaded into autogenerated dataclasses, or custom model types.
- 🚀 Performance: XML loading is backed by the excellent and fast lxml library. JSON parsing is backed by UltraJSON, with jsonpath_ng providing flexible data extraction.

To extract data from XML, use this import statement, and see the example below:

    from yankee.xml.schema import Schema, fields as f, CSSSelector

To extract data from JSON, use this import statement, and see the example below:

    from yankee.json.schema import Schema, fields as f, JSONPath

To extract data from HTML, use this import statement:

    from yankee.html.schema import Schema, fields as f, CSSSelector

To extract data from Python objects (either objects or dictionaries), use this import statement:

    from yankee.base.schema import Schema, fields as f

Complete documentation is available on Read The Docs.

Data extraction from XML. By default, data keys are XPath expressions, but can also be CSS selectors.
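
For readers new to XPath, here is a rough standard-library sketch of what relative path expressions select. It uses `xml.etree.ElementTree` rather than yankee or lxml, so it only illustrates the idea and is not how the library works internally. The `order_xml` document and all names in it are made up for this illustration.

```python
import datetime
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
order_xml = """
<order>
  <customer>Ada</customer>
  <placed>2024-03-15</placed>
  <totals>
    <items>
      <count>3</count>
    </items>
  </totals>
</order>
"""

root = ET.fromstring(order_xml)

# ElementTree supports a limited XPath subset; simple relative paths
# like these select child and descendant elements by tag name.
result = {
    "customer": root.findtext("./customer"),
    "placed": datetime.date.fromisoformat(root.findtext("./placed")),
    "item_count": int(root.findtext("./totals/items/count")),
}

print(result)
```

Each path starts at the current element (`.`) and walks down one tag per `/`-separated step. The text found at the end of the path is then converted to the target type, which is the role the typed fields play in a schema.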
Take this:

    <xmlObject>
        <name>Johnny Appleseed</name>
        <birthdate>2000-01-01</birthdate>
        <something>
            <many>
                <levels>
                    <deep>123</deep>
                </levels>
            </many>
        </something>
    </xmlObject>

Do this:

    from yankee.xml.schema import Schema, fields as f, CSSSelector

    class XmlExample(Schema):
        name = f.String("./name")
        birthday = f.Date(CSSSelector("birthdate"))
        deep_data = f.Int("./something/many/levels/deep")

    # xml_doc holds the XML document shown above
    XmlExample().load(xml_doc)

Get this:

    {
        "name": "Johnny Appleseed",
        "birthday": datetime.date(2000, 1, 1),
        "deep_data": 123
    }

Data extraction from JSON. By default, data keys are implied from the field names, but can also be JSONPath expressions.
Take this:

    {
        "name": "Johnny Appleseed",
        "birthdate": "2000-01-01",
        "something": [
            {"many": {
                "levels": {
                    "deep": 123
                }
            }}
        ]
    }

Do this:

    from yankee.json.schema import Schema, fields as f

    class JsonExample(Schema):
        name = f.String()
        birthday = f.Date("birthdate")
        deep_data = f.Int("something.0.many.levels.deep")

    # json_doc holds the JSON document shown above
    JsonExample().load(json_doc)

Get this:

    {
        "name": "Johnny Appleseed",
        "birthday": datetime.date(2000, 1, 1),
        "deep_data": 123
    }
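
To make the dotted path in the JSON example concrete, the sketch below resolves `something.0.many.levels.deep` by hand using only the standard library. This is an illustrative stand-in, not yankee's implementation (the library relies on jsonpath_ng for path handling); the `resolve` helper is a hypothetical name introduced here.

```python
import datetime
import json
from functools import reduce

json_doc = """
{
    "name": "Johnny Appleseed",
    "birthdate": "2000-01-01",
    "something": [
        {"many": {"levels": {"deep": 123}}}
    ]
}
"""


def resolve(data, dotted_path):
    """Hypothetical helper: walk nested dicts and lists along a dotted path.

    Segments applied to a list are treated as integer indexes;
    segments applied to a dict are treated as keys.
    """
    def step(node, key):
        if isinstance(node, list):
            return node[int(key)]
        return node[key]

    return reduce(step, dotted_path.split("."), data)


data = json.loads(json_doc)

result = {
    # Key implied from the field name
    "name": resolve(data, "name"),
    # Key given explicitly, value converted to a date
    "birthday": datetime.date.fromisoformat(resolve(data, "birthdate")),
    # Nested path that passes through a list index
    "deep_data": int(resolve(data, "something.0.many.levels.deep")),
}

print(result)
```

The `0` segment selects the first element of the `something` list, and the remaining segments are plain dictionary keys.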