Add Data Models in Feathr #659
Merged
13 commits
c6acae3 Add Data Models in Feathr (hyingyang-linkedin)
45a05ac Update models.py (hyingyang-linkedin)
0696648 Update models.py (hyingyang-linkedin)
5cc163d Update models.py (hyingyang-linkedin)
53dfb5e Update models.py (hyingyang-linkedin)
9d9af5f Update models.py (hyingyang-linkedin)
0133fff Merge branch 'model' of https://github.com/hyingyang-linkedin/feathr … (hyingyang-linkedin)
50ddce9 Update models (hyingyang-linkedin)
2cd0b9d Merge branch 'model' of https://github.com/hyingyang-linkedin/feathr … (hyingyang-linkedin)
d07c922 Update comment based on PR review comments (hyingyang-linkedin)
dbe39db Merge branch 'model' of https://github.com/hyingyang-linkedin/feathr … (hyingyang-linkedin)
79de04d To make Source of DerivedFeature to be one object, addine MultiFeaure… (hyingyang-linkedin)
9e7e45b Merge branch 'model' of https://github.com/hyingyang-linkedin/feathr … (hyingyang-linkedin)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,61 @@ | ||
|
|
||
| # Feathr Abstract Backend Data Model Diagram | ||
|
|
||
| This file contains the diagram of the abstract backend data models for the feature registry. | ||
| [Python Code](./models.py) | ||
|
|
||
| ```mermaid | ||
| classDiagram | ||
| Project "1" --> "n" FeatureName : contains | ||
| Project "1" --> "n" Anchor : contains | ||
| FeatureName "1" --> "n" Feature : contains | ||
| Anchor "1" --> "n" Feature : contains | ||
| Feature <|-- AnchorFeature : extends | ||
| Feature <|-- DerivedFeature: extends | ||
| Feature --> Transformation : contains | ||
| Source <|-- DataSource: extends | ||
| Source <|-- MultiFeatureSource: extends | ||
| MultiFeatureSource "1" --> "1..n" FeatureSource: contains | ||
| AnchorFeature --> DataSource : contains | ||
| DerivedFeature --> MultiFeatureSource: contains | ||
|
|
||
| class Source{ | ||
| } | ||
| class DataSource{ | ||
| } | ||
| class FeatureSource{ | ||
| +FeatureNameId input_feature_name_id | ||
| } | ||
| class MultiFeatureSource{ | ||
| +List[FeatureSource] sources | ||
| } | ||
| class Feature{ | ||
| +FeatureId id | ||
| +FeatureNameId feature_name_id | ||
| +Source source | ||
| +Transformation transformation | ||
| } | ||
| class AnchorFeature{ | ||
| +DataSource source | ||
| } | ||
| class DerivedFeature{ | ||
| +MultiFeatureSource source | ||
| } | ||
| class FeatureName{ | ||
| +FeatureNameId id | ||
| +ProjectId project_id | ||
| +List[FeatureId] feature_ids | ||
| } | ||
| class Project{ | ||
| +ProjectId id | ||
| +List[FeatureNameId] feature_name_ids | ||
| +List[AnchorId] anchor_ids | ||
| } | ||
| class Anchor{ | ||
| +AnchorId id | ||
| +ProjectId project_id | ||
| +DataSource source | ||
| +List[FeatureId] anchor_feature_ids | ||
| } | ||
| ``` |
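
The diagram links entities through ID fields (for example `Project.feature_name_ids` and `FeatureName.feature_ids`) rather than through nested objects. The following is a minimal sketch of how a registry backend might resolve those links. It uses standard-library dataclasses as a dependency-free stand-in for the pydantic models, and the `features_in_project` helper, the in-memory dictionaries, and the sample IDs are all hypothetical; none of them are part of this PR.

```python
from dataclasses import dataclass, field
from typing import Dict, List


# Frozen dataclasses stand in for the pydantic ID models so they can be dict keys.
@dataclass(frozen=True)
class ProjectId:
    id: str


@dataclass(frozen=True)
class FeatureNameId:
    id: str


@dataclass(frozen=True)
class FeatureId:
    id: str


@dataclass
class Feature:
    id: FeatureId
    feature_name_id: FeatureNameId


@dataclass
class FeatureName:
    id: FeatureNameId
    project_id: ProjectId
    feature_ids: List[FeatureId] = field(default_factory=list)


@dataclass
class Project:
    id: ProjectId
    feature_name_ids: List[FeatureNameId] = field(default_factory=list)


def features_in_project(
    project: Project,
    feature_names: Dict[FeatureNameId, FeatureName],
    features: Dict[FeatureId, Feature],
) -> List[Feature]:
    """Hypothetical helper: follow Project -> FeatureName -> Feature by ID."""
    return [
        features[feature_id]
        for name_id in project.feature_name_ids
        for feature_id in feature_names[name_id].feature_ids
    ]


# Sample data (made up for illustration): one FeatureName with two implementations.
project_id = ProjectId("fraud_detection")
name_id = FeatureNameId("txn_count")
offline_id, online_id = FeatureId("txn_count_offline"), FeatureId("txn_count_online")

features = {
    offline_id: Feature(id=offline_id, feature_name_id=name_id),
    online_id: Feature(id=online_id, feature_name_id=name_id),
}
feature_names = {
    name_id: FeatureName(id=name_id, project_id=project_id, feature_ids=[offline_id, online_id]),
}
project = Project(id=project_id, feature_name_ids=[name_id])

found = features_in_project(project, feature_names, features)
print([feature.id.id for feature in found])
```

Storing IDs instead of nested objects keeps each entity small and lets a backend fetch related entities lazily, at the cost of one lookup per link.
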
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,146 @@ | ||
| from pydantic import BaseModel | ||
| from typing import List | ||
|
|
||
| """ | ||
| This file defines the abstract backend data models for the feature registry. | ||
| The backend API server uses these models to talk to the feature registry backend. | ||
| The purpose is to decouple the backend data models from API-specific data models. | ||
| Each feature registry provider/implementation extends these abstract | ||
| data models and the backend API. | ||
| Diagram of the data models: ./data-model-diagram.md | ||
| """ | ||
|
|
||
|
|
||
| class FeatureId(BaseModel): | ||
|     """ | ||
|     Unique ID that identifies a Feature. | ||
|     The ID can be a simple string, an int, or a complex key. | ||
|     """ | ||
|     id: str  # ID of a feature | ||
|
|
||
|
|
||
| class FeatureNameId(BaseModel): | ||
|     """ | ||
|     Unique ID that identifies a FeatureName. | ||
|     The ID can be a simple string, an int, or a complex key. | ||
|     """ | ||
|     id: str  # ID of a FeatureName | ||
|
|
||
|
|
||
| class AnchorId(BaseModel): | ||
|     """ | ||
|     Unique ID that identifies an Anchor. | ||
|     The ID can be a simple string, an int, or a complex key. | ||
|     """ | ||
|     id: str  # ID of an anchor | ||
|
|
||
|
|
||
| class ProjectId(BaseModel): | ||
|     """ | ||
|     Unique ID that identifies a Project. | ||
|     The ID can be a simple string, an int, or a complex key. | ||
|     """ | ||
|     id: str  # ID of a project | ||
|
|
||
|
|
||
| class Source(BaseModel): | ||
|     """ | ||
|     Source of the feature. | ||
|     It defines where the feature is extracted or derived from. | ||
|     """ | ||
|     pass | ||
|
|
||
|
|
||
| class DataSource(Source): | ||
|     """ | ||
|     Data source of the feature. | ||
|     It defines the raw data source the feature is extracted from. | ||
|     """ | ||
|     pass | ||
|
|
||
|
|
||
| class FeatureSource(BaseModel): | ||
|     """ | ||
|     Represents one feature source of a derived feature, that is, a source FeatureName | ||
|     that is used to create other derived features. | ||
|     """ | ||
|     input_feature_name_id: FeatureNameId  # ID of the input FeatureName | ||
|
|
||
|
|
||
| class MultiFeatureSource(Source): | ||
|     """ | ||
|     Feature sources of a derived feature. | ||
|     It defines the one or more features that the feature is derived from. | ||
|     """ | ||
|     sources: List[FeatureSource]  # All source features that the feature is derived from | ||
|
|
||
|
|
||
| class Transformation(BaseModel): | ||
|     """ | ||
|     The transformation of a Feature. | ||
|     A transformation represents the logic that produces the feature value from the feature's Source. | ||
|     """ | ||
|     pass | ||
|
|
||
|
|
||
| class Feature(BaseModel): | ||
|     """ | ||
|     Actual implementation of a FeatureName. | ||
|     An implementation defines where a feature is extracted from (Source) and how it is computed (Transformation). | ||
|     The Source of a feature can be raw data sources and/or other features. | ||
|     """ | ||
|     id: FeatureId  # Unique ID of the Feature | ||
|     feature_name_id: FeatureNameId  # ID of the FeatureName that the feature belongs to | ||
|     source: Source  # Source can be either a data source or a feature source | ||
|     transformation: Transformation  # Transformation logic that produces the feature value | ||
|
|
||
|
|
||
| class AnchorFeature(Feature): | ||
|     """ | ||
|     Feature implementation of a FeatureName that is anchored to a data source. | ||
|     """ | ||
|     source: DataSource  # Raw data source that the feature is extracted from | ||
|
|
||
|
|
||
| class DerivedFeature(Feature): | ||
|     """ | ||
|     Feature implementation that is derived from other FeatureNames. | ||
|     """ | ||
|     source: MultiFeatureSource  # Source features that the feature is derived from | ||
|
|
||
|
|
||
| class FeatureName(BaseModel): | ||
|     """ | ||
|     Named feature interface that can be backed by multiple Feature implementations across | ||
|     different environments accessing different sources (data lake access for batch training, | ||
|     KV store access for online serving). Each FeatureName is defined by a feature producer. | ||
|     Feature consumers reference a feature by that name to access its data, | ||
|     agnostic of the runtime environment. Each FeatureName also encloses attributes that do not | ||
|     change across implementations. | ||
|     """ | ||
|     id: FeatureNameId  # Unique ID of the FeatureName, used to extract data for the current FeatureName | ||
|     project_id: ProjectId  # ID of the project the FeatureName belongs to | ||
|     feature_ids: List[FeatureId]  # IDs of the features that the FeatureName has | ||
|
|
||
|
|
||
| class Project(BaseModel): | ||
|     """ | ||
|     Group of FeatureNames. It can be a project the team is working on, | ||
|     or a namespace shared by related FeatureNames. | ||
|     """ | ||
|     id: ProjectId  # Unique ID of the project | ||
|     feature_name_ids: List[FeatureNameId]  # IDs of the FeatureNames that the project has | ||
|     anchor_ids: List[AnchorId]  # IDs of the Anchors that the project has | ||
|
|
||
|
|
||
| class Anchor(BaseModel): | ||
|     """ | ||
|     Group of AnchorFeatures that are anchored on the same DataSource. | ||
|     It is mainly used by feature producers to gather information about a DataSource | ||
|     and the Feature implementations associated with that DataSource. | ||
|     """ | ||
|     id: AnchorId  # Unique ID of the Anchor | ||
|     project_id: ProjectId  # ID of the Project that the anchor belongs to | ||
|     source: DataSource  # Data source of the Anchor | ||
|     anchor_feature_ids: List[FeatureId]  # IDs of the anchor features that the anchor has | ||
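
The module docstring says that each registry provider extends these abstract models. A minimal sketch of what such an extension might look like follows; it repeats a subset of the models so that it runs on its own, and `HdfsDataSource`, `SqlTransformation`, their fields, and all sample values are hypothetical names chosen for illustration, not classes from this PR or from Feathr.

```python
from typing import List

from pydantic import BaseModel


# Subset of the abstract models from models.py, repeated so the sketch is self-contained.
class FeatureId(BaseModel):
    id: str


class FeatureNameId(BaseModel):
    id: str


class Source(BaseModel):
    pass


class DataSource(Source):
    pass


class FeatureSource(BaseModel):
    input_feature_name_id: FeatureNameId


class MultiFeatureSource(Source):
    sources: List[FeatureSource]


class Transformation(BaseModel):
    pass


class Feature(BaseModel):
    id: FeatureId
    feature_name_id: FeatureNameId
    source: Source
    transformation: Transformation


class AnchorFeature(Feature):
    source: DataSource  # narrows the base field to raw data sources


class DerivedFeature(Feature):
    source: MultiFeatureSource  # narrows the base field to feature sources


# Hypothetical provider-specific extensions (assumptions, not part of this PR).
class HdfsDataSource(DataSource):
    path: str


class SqlTransformation(Transformation):
    expression: str


# An anchor feature read from a raw data source.
anchor_feature = AnchorFeature(
    id=FeatureId(id="f1"),
    feature_name_id=FeatureNameId(id="user_age"),
    source=HdfsDataSource(path="/data/users"),
    transformation=SqlTransformation(expression="age"),
)

# A derived feature computed from the FeatureName defined above.
derived_feature = DerivedFeature(
    id=FeatureId(id="f2"),
    feature_name_id=FeatureNameId(id="user_is_adult"),
    source=MultiFeatureSource(
        sources=[FeatureSource(input_feature_name_id=FeatureNameId(id="user_age"))]
    ),
    transformation=SqlTransformation(expression="user_age >= 18"),
)

print(anchor_feature.source.path)
print(derived_feature.source.sources[0].input_feature_name_id.id)
```

Note that a derived feature refers to its inputs by `FeatureNameId`, not by `FeatureId`, so it depends on the named interface instead of on one specific implementation.
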