Merged
27 changes: 24 additions & 3 deletions docs/concepts/materializing-features.md
@@ -33,7 +33,7 @@ In the above example, we define a Redis table called `nycTaxiDemoFeature` and ma

## Feature Backfill

It is also possible to backfill the features for a particular time range, like below. If the `BackfillTime` part is not specified, it's by default to `now()` (i.e. if not specified, it's equivalent to `BackfillTime(start=now, end=now, step=timedelta(days=1))`).
It is also possible to backfill features up to a particular time, as shown below. If `BackfillTime` is not specified, it defaults to `now()` (i.e. it is equivalent to `BackfillTime(start=now, end=now, step=timedelta(days=1))`).

```python
client = FeathrClient()
@@ -46,9 +46,30 @@ settings = MaterializationSettings("nycTaxiMaterializationJob",
client.materialize_features(settings)
```

Note that if you don't have features available in `now`, you'd better specify a `BackfillTime` range where you have features.
Feathr submits one materialization job per step for performance reasons. For example, with `BackfillTime(start=datetime(2022, 2, 1), end=datetime(2022, 2, 20), step=timedelta(days=1))`, Feathr submits 20 jobs that run in parallel.

Also, Feathr will submit a materialization job for each of the step for performance reasons. I.e. if you have `BackfillTime(start=datetime(2022, 2, 1), end=datetime(2022, 2, 20), step=timedelta(days=1))`, Feathr will submit 20 jobs to run in parallel for maximum performance.
Please note that `start` and `end` form a closed interval, which means that both the start and end dates are included in the materialization jobs.
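
As a rough illustration of the closed interval, the sketch below enumerates the cutoff times that one job per step would cover. `backfill_cutoffs` is a hypothetical helper written for this explanation, not part of the Feathr API; it only mirrors the counting described above.

```python
from datetime import datetime, timedelta


def backfill_cutoffs(start, end, step):
    """Hypothetical helper: list every cutoff in the closed interval [start, end]."""
    if step <= timedelta(0):
        raise ValueError("step must be a positive timedelta")
    cutoffs = []
    current = start
    # `<=` keeps the end date, which is what makes the interval closed.
    while current <= end:
        cutoffs.append(current)
        current += step
    return cutoffs


cutoffs = backfill_cutoffs(
    datetime(2022, 2, 1), datetime(2022, 2, 20), timedelta(days=1)
)
print(len(cutoffs))  # 20, one per day from Feb 1 through Feb 20 inclusive
```

With a half-open interval the same call would yield 19 cutoffs, so the inclusive end date is what accounts for the 20 jobs.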

Please also note that the `start` and `end` parameters are cutoff times. For example, we might have a dataset like the one below:

| TrackingID | UserId | Spending | Date |
| ---------- | ------ | -------- | ---------- |
| 1 | 1 | 10 | 2022/05/01 |
| 2 | 2 | 15 | 2022/05/02 |
| 3 | 3 | 19 | 2022/05/03 |
| 4 | 1 | 18 | 2022/05/04 |
| 5 | 3 | 7 | 2022/05/05 |

If we call the API like this:
`BackfillTime(start=datetime(2022, 5, 2), end=datetime(2022, 5, 4), step=timedelta(days=1))`

Feathr will trigger 3 jobs:

- job 1 will backfill all data up to 2022/05/02 (so features using data from 2022/05/01 will also be materialized)
- job 2 will backfill all data up to 2022/05/03 (so features using data from 2022/05/01 and 2022/05/02 will also be materialized)
- job 3 will backfill all data up to 2022/05/04 (so features using data from 2022/05/01, 2022/05/02, and 2022/05/03 will also be materialized)

This is particularly useful for aggregated features. For example, if there is a feature defined as `user_purchase_in_last_2_days`, this guarantees that every materialized feature value is computed from all the data it needs.
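
To make the cutoff behaviour concrete, here is a minimal plain-Python sketch over the sample table. It does not call Feathr. It assumes that rows dated on or before the cutoff are visible to a job, and that "last 2 days" means the cutoff day plus the day before; both are assumptions made for illustration, and `spending_in_last_2_days` is a hypothetical function.

```python
from collections import defaultdict
from datetime import date, timedelta

# (TrackingID, UserId, Spending, Date) rows from the table above.
rows = [
    (1, 1, 10, date(2022, 5, 1)),
    (2, 2, 15, date(2022, 5, 2)),
    (3, 3, 19, date(2022, 5, 3)),
    (4, 1, 18, date(2022, 5, 4)),
    (5, 3, 7, date(2022, 5, 5)),
]


def spending_in_last_2_days(rows, cutoff):
    """Hypothetical aggregation: per-user spending on the cutoff day and the day before."""
    window_start = cutoff - timedelta(days=1)
    totals = defaultdict(int)
    for _tracking_id, user_id, spending, day in rows:
        # Assumption: rows after the cutoff are invisible to this job.
        if window_start <= day <= cutoff:
            totals[user_id] += spending
    return dict(totals)


cutoffs = [date(2022, 5, 2), date(2022, 5, 3), date(2022, 5, 4)]
results = {cutoff: spending_in_last_2_days(rows, cutoff) for cutoff in cutoffs}
for cutoff, totals in results.items():
    print(cutoff, totals)
```

Under these assumptions, the job for 2022/05/02 still needs the 2022/05/01 row even though that date lies before `start`, which is why each job reads all data up to its cutoff rather than only the data inside the backfill range.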

More reference on the APIs:

4 changes: 4 additions & 0 deletions docs/how-to-guides/streaming-source-ingestion.md
@@ -88,3 +88,7 @@ res = client.multi_get_online_features('kafkaSampleDemoFeature', ['1', '2'], ['f
```

You can also refer to the [test case](../../feathr_project/test/test_azure_kafka_e2e.py) for more details.

## Kafka configuration

Please refer to the [Feathr Configuration Doc](./feathr-configuration-and-env.md#kafkasasljaasconfig) for more details on the credentials.
@@ -5,7 +5,7 @@


class BackfillTime:
"""Time range to materialize/backfill feature data.
"""Time range to materialize/backfill feature data. Please refer to https://linkedin.github.io/feathr/concepts/materializing-features.html#feature-backfill for a more detailed explanation.

Attributes:
start: start time of the backfill, inclusive.