
Commit e1895bb

Updates on docs (#950)

1 parent 308321d

3 files changed: +9, -79 lines changed
Lines changed: 6 additions & 72 deletions
@@ -1,6 +1,6 @@
 ---
 layout: default
-title: Using SQL databases, CosmosDb, and ElasticSearch with Feathr
+title: Using SQL databases and CosmosDb with Feathr
 parent: Feathr How-to Guides
 ---
 
@@ -14,7 +14,7 @@ To use SQL database as source, we need to create `JdbcSource` instead of `HdfsSource`
 
 A `JdbcSource` can be created with the following statement:
 
-```
+```python
 source = feathr.JdbcSource(name, url, dbtable, query, auth)
 ```
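
As a concrete sketch of that statement, a source pointing at an Azure SQL table might be declared as follows; the name, URL, and table are illustrative placeholders, and passing only one of `dbtable` or `query` is an assumption, not something this page states:

```python
import feathr

# All values here are placeholders for illustration.
source = feathr.JdbcSource(
    name="source1",
    url="jdbc:sqlserver://someserver.database.windows.net:1433;database=somedb",
    dbtable="dbo.some_table",  # or pass `query="SELECT ..."` instead
    auth="USERPASS",           # or "TOKEN"
)
```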

@@ -43,7 +43,7 @@ When the `auth` parameter is set to `TOKEN`, you need to set the following environment variables
 
 I.e., if you created a source:
 
-```
+```python
 src1_name="source1"
 source1 = JdbcSource(name=src1_name, url="jdbc:...", dbtable="table1", auth="USERPASS")
 anchor1 = FeatureAnchor(name="anchor_name",
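
Presumably, with `USERPASS` auth as above, credentials for `source1` are supplied through environment variables before the job is submitted; the upper-cased-name convention shown for sinks later on this page is assumed here to apply to sources as well:

```python
import os

# Assumed convention: variable names are the upper-cased source name
# plus `_USER` / `_PASSWORD` suffixes.
os.environ["SOURCE1_USER"] = "some_user_name"
os.environ["SOURCE1_PASSWORD"] = "some_magic_word"
```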
@@ -87,17 +87,16 @@ credential = DefaultAzureCredential()
 token = credential.get_token("https://management.azure.com/.default").token
 ```
 
-
 ## Using SQL database as the offline store
 
 To use SQL database as the offline store, you can use `JdbcSink` as the `output_path` parameter of `FeathrClient.get_offline_features`, e.g.:
-```
+```python
 name = 'output'
 sink = client.JdbcSink(name, some_jdbc_url, dbtable, "USERPASS")
 ```
 
 Then you need to set the following environment variables before submitting the job:
-```
+```python
 os.environ[f"{name.upper()}_USER"] = "some_user_name"
 os.environ[f"{name.upper()}_PASSWORD"] = "some_magic_word"
 client.get_offline_features(..., output_path=sink)
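
Filled out end to end, the offline-store flow above might look like the sketch below; the client construction, observation path, key, and feature name are illustrative assumptions, not part of this page:

```python
import os
from feathr import FeathrClient, FeatureQuery, ObservationSettings, TypedKey, ValueType

client = FeathrClient()  # assumes a feathr_config.yaml in the working directory

name = "output"
sink = client.JdbcSink(name, "jdbc:sqlserver://...;database=somedb", "output_table", "USERPASS")

# Credentials for the sink, derived from its upper-cased name.
os.environ[f"{name.upper()}_USER"] = "some_user_name"
os.environ[f"{name.upper()}_PASSWORD"] = "some_magic_word"

# Hypothetical observation data and feature; substitute your own.
key = TypedKey(key_column="id", key_column_type=ValueType.INT32)
query = FeatureQuery(feature_list=["f_some_feature"], key=key)
settings = ObservationSettings(observation_path="wasbs://container@account.blob.core.windows.net/obs.csv")

client.get_offline_features(observation_settings=settings,
                            feature_query=query,
                            output_path=sink)
```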
@@ -106,69 +105,4 @@ client.get_offline_features(..., output_path=sink)
 
 ## Using SQL database as the online store
 
-Same as the offline, create JDBC sink and add it to the `MaterializationSettings`, set corresponding environment variables, then use it with `FeathrClient.materialize_features`.
-
-## Using CosmosDb as the online store
-
-To use CosmosDb as the online store, create `CosmosDbSink` and add it to the `MaterializationSettings`, then use it with `FeathrClient.materialize_features`, e.g..
-
-```
-name = 'cosmosdb_output'
-sink = CosmosDbSink(name, some_cosmosdb_url, some_cosmosdb_database, some_cosmosdb_collection)
-os.environ[f"{name.upper()}_KEY"] = "some_cosmosdb_api_key"
-client.materialize_features(..., materialization_settings=MaterializationSettings(..., sinks=[sink]))
-```
-
-Feathr client doesn't support getting feature values from CosmosDb, you need to use [official CosmosDb client](https://pypi.org/project/azure-cosmos/) to get the values:
-
-```
-from azure.cosmos import exceptions, CosmosClient, PartitionKey
-
-client = CosmosClient(some_cosmosdb_url, some_cosmosdb_api_key)
-db_client = client.get_database_client(some_cosmosdb_database)
-container_client = db_client.get_container_client(some_cosmosdb_collection)
-doc = container_client.read_item(some_key)
-feature_value = doc['feature_name']
-```
-
-## Using ElasticSearch as online store
-
-To use ElasticSearch as the online store, create `ElasticSearchSink` and add it to the `MaterializationSettings`, then use it with `FeathrClient.materialize_features`, e.g..
-
-```
-name = 'es_output'
-sink = ElasticSearchSink(name, host="esnode1:9200", index="someindex", ssl=False, auth=True)
-os.environ[f"{name.upper()}_USER"] = "some_user_name"
-os.environ[f"{name.upper()}_PASSWORD"] = "some_magic_word"
-client.materialize_features(..., materialization_settings=MaterializationSettings(..., sinks=[sink]))
-```
-
-Feathr client doesn't support getting feature values from ElasticSearch, you need to use [official ElasticSearch client](https://pypi.org/project/elasticsearch/) to get the values, e.g.:
-
-```
-from elasticsearch import Elasticsearch
-
-es = Elasticsearch("http://esnode1:9200")
-resp = es.get(index="someindex", id="somekey")
-print(resp['_source'])
-```
-
-The feature generation job uses `upsert` mode to write data, so after the job the index may contain stale data, the recommended way is to create a new index each time, and use index alias to seamlessly switch over, detailed information can be found from [the official doc](https://www.elastic.co/guide/en/elasticsearch/reference/master/aliases.html), currently Feathr doesn't provide any helper to do this.
-
-NOTE:
-+ You can use no auth or basic auth only, no other authentication methods are supported.
-+ If you enabled SSL, you need to make sure the certificate on ES nodes is trusted by the Spark cluster, otherwise the job will fail.
-
-## Using ElasticSearch as offline store
-
-To use ElasticSearch as the offline store, create `ElasticSearchSink` and use it with `FeathrClient.get_offline_features`, e.g..
-
-```
-name = 'es_output'
-sink = ElasticSearchSink(name, host="esnode1", index="someindex", ssl=False, auth=True)
-os.environ[f"{name.upper()}_USER"] = "some_user_name"
-os.environ[f"{name.upper()}_PASSWORD"] = "some_magic_word"
-client.get_offline_features(..., output_path=sink)
-```
-
-NOTE: The feature joining process doesn't generate meaningful keys for each document, you need to make sure the output dataset can be accessed/queried by some other ways such as full-text-search, otherwise you may have to fetch all the data from ES to get what you look for.
+Same as for the offline store, create a JDBC sink and add it to the `MaterializationSettings`, set the corresponding environment variables, then use it with `FeathrClient.materialize_features`.
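
A minimal sketch of that online-store flow, under the same assumptions as the earlier examples (the sink name, URL, table, and feature name are placeholders):

```python
import os
from feathr import FeathrClient, MaterializationSettings

client = FeathrClient()  # assumes an existing Feathr configuration

name = "online_output"
sink = client.JdbcSink(name, "jdbc:sqlserver://...;database=somedb", "online_table", "USERPASS")

# Credentials for the sink, derived from its upper-cased name.
os.environ[f"{name.upper()}_USER"] = "some_user_name"
os.environ[f"{name.upper()}_PASSWORD"] = "some_magic_word"

settings = MaterializationSettings(name="sql_online_job",
                                   sinks=[sink],
                                   feature_names=["f_some_feature"])
client.materialize_features(settings)
```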
Lines changed: 2 additions & 4 deletions
@@ -1,5 +1,3 @@
-# SQL-Based Registry for Feathr
+# Purview-Based Registry for Feathr
 
-This is the reference implementation of [the Feathr API spec](./api-spec.md), base on SQL databases instead of PurView.
-
-Please note that this implementation uses iterations of `select` to retrieve graph lineages, this approach is very inefficient and should **not** be considered as production-ready. We only suggest to use this implementation for testing/researching purposes.
+This is the reference implementation of [the Feathr API spec](./api-spec.md), based on Purview.

registry/sql-registry/README.md

Lines changed: 1 addition & 3 deletions
@@ -1,5 +1,3 @@
 # SQL-Based Registry for Feathr
 
-This is the reference implementation of [the Feathr API spec](./api-spec.md), base on SQL databases instead of PurView.
-
-Please note that this implementation uses iterations of `select` to retrieve graph lineages, this approach is very inefficient and should **not** be considered as production-ready. We only suggest to use this implementation for testing/researching purposes.
+This is the reference implementation of [the Feathr API spec](./api-spec.md), based on SQL databases.
