
Commit e1895bb

Updates on docs (#950)

1 parent 308321d

3 files changed: +9, -79 lines changed
Lines changed: 6 additions & 72 deletions
@@ -1,6 +1,6 @@
 ---
 layout: default
-title: Using SQL databases, CosmosDb, and ElasticSearch with Feathr
+title: Using SQL databases and CosmosDb with Feathr
 parent: Feathr How-to Guides
 ---
 
@@ -14,7 +14,7 @@ To use SQL database as source, we need to create `JdbcSource` instead of `HdfsSource`
 
 A `JdbcSource` can be created with the following statement:
 
-```
+```python
 source = feathr.JdbcSource(name, url, dbtable, query, auth)
 ```
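
As a concrete sketch of that statement, a source pointing at an Azure SQL table might be declared as follows; the name, URL, and table are illustrative placeholders, and passing only one of `dbtable` or `query` is an assumption, not something this page states:

```python
import feathr

# All values here are placeholders for illustration.
source = feathr.JdbcSource(
    name="source1",
    url="jdbc:sqlserver://someserver.database.windows.net:1433;database=somedb",
    dbtable="dbo.some_table",  # or pass `query="SELECT ..."` instead
    auth="USERPASS",           # or "TOKEN"
)
```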

@@ -43,7 +43,7 @@ When the `auth` parameter is set to `TOKEN`, you need to set the following environment variables
 
 I.e., if you created a source:
 
-```
+```python
 src1_name="source1"
 source1 = JdbcSource(name=src1_name, url="jdbc:...", dbtable="table1", auth="USERPASS")
 anchor1 = FeatureAnchor(name="anchor_name",
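
Presumably, with `USERPASS` auth as above, credentials for `source1` are supplied through environment variables before the job is submitted; the upper-cased-name convention shown for sinks later on this page is assumed here to apply to sources as well:

```python
import os

# Assumed convention: variable names are the upper-cased source name
# plus `_USER` / `_PASSWORD` suffixes.
os.environ["SOURCE1_USER"] = "some_user_name"
os.environ["SOURCE1_PASSWORD"] = "some_magic_word"
```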
@@ -87,17 +87,16 @@ credential = DefaultAzureCredential()
 token = credential.get_token("https://management.azure.com/.default").token
 ```
 
-
 ## Using SQL database as the offline store
 
 To use SQL database as the offline store, you can use `JdbcSink` as the `output_path` parameter of `FeathrClient.get_offline_features`, e.g.:
-```
+```python
 name = 'output'
 sink = client.JdbcSink(name, some_jdbc_url, dbtable, "USERPASS")
 ```
 
 Then you need to set the following environment variables before submitting the job:
-```
+```python
 os.environ[f"{name.upper()}_USER"] = "some_user_name"
 os.environ[f"{name.upper()}_PASSWORD"] = "some_magic_word"
 client.get_offline_features(..., output_path=sink)
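
Filled out end to end, the offline-store flow above might look like the sketch below; the client construction, observation path, key, and feature name are illustrative assumptions, not part of this page:

```python
import os
from feathr import FeathrClient, FeatureQuery, ObservationSettings, TypedKey, ValueType

client = FeathrClient()  # assumes a feathr_config.yaml in the working directory

name = "output"
sink = client.JdbcSink(name, "jdbc:sqlserver://...;database=somedb", "output_table", "USERPASS")

# Credentials for the sink, derived from its upper-cased name.
os.environ[f"{name.upper()}_USER"] = "some_user_name"
os.environ[f"{name.upper()}_PASSWORD"] = "some_magic_word"

# Hypothetical observation data and feature; substitute your own.
key = TypedKey(key_column="id", key_column_type=ValueType.INT32)
query = FeatureQuery(feature_list=["f_some_feature"], key=key)
settings = ObservationSettings(observation_path="wasbs://container@account.blob.core.windows.net/obs.csv")

client.get_offline_features(observation_settings=settings,
                            feature_query=query,
                            output_path=sink)
```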
@@ -106,69 +105,4 @@ client.get_offline_features(..., output_path=sink)
 
 ## Using SQL database as the online store
 
-Same as the offline, create JDBC sink and add it to the `MaterializationSettings`, set corresponding environment variables, then use it with `FeathrClient.materialize_features`.
-
-## Using CosmosDb as the online store
-
-To use CosmosDb as the online store, create `CosmosDbSink` and add it to the `MaterializationSettings`, then use it with `FeathrClient.materialize_features`, e.g..
-
-```
-name = 'cosmosdb_output'
-sink = CosmosDbSink(name, some_cosmosdb_url, some_cosmosdb_database, some_cosmosdb_collection)
-os.environ[f"{name.upper()}_KEY"] = "some_cosmosdb_api_key"
-client.materialize_features(..., materialization_settings=MaterializationSettings(..., sinks=[sink]))
-```
-
-Feathr client doesn't support getting feature values from CosmosDb, you need to use [official CosmosDb client](https://pypi.org/project/azure-cosmos/) to get the values:
-
-```
-from azure.cosmos import exceptions, CosmosClient, PartitionKey
-
-client = CosmosClient(some_cosmosdb_url, some_cosmosdb_api_key)
-db_client = client.get_database_client(some_cosmosdb_database)
-container_client = db_client.get_container_client(some_cosmosdb_collection)
-doc = container_client.read_item(some_key)
-feature_value = doc['feature_name']
-```
-
-## Using ElasticSearch as online store
-
-To use ElasticSearch as the online store, create `ElasticSearchSink` and add it to the `MaterializationSettings`, then use it with `FeathrClient.materialize_features`, e.g..
-
-```
-name = 'es_output'
-sink = ElasticSearchSink(name, host="esnode1:9200", index="someindex", ssl=False, auth=True)
-os.environ[f"{name.upper()}_USER"] = "some_user_name"
-os.environ[f"{name.upper()}_PASSWORD"] = "some_magic_word"
-client.materialize_features(..., materialization_settings=MaterializationSettings(..., sinks=[sink]))
-```
-
-Feathr client doesn't support getting feature values from ElasticSearch, you need to use [official ElasticSearch client](https://pypi.org/project/elasticsearch/) to get the values, e.g.:
-
-```
-from elasticsearch import Elasticsearch
-
-es = Elasticsearch("http://esnode1:9200")
-resp = es.get(index="someindex", id="somekey")
-print(resp['_source'])
-```
-
-The feature generation job uses `upsert` mode to write data, so after the job the index may contain stale data, the recommended way is to create a new index each time, and use index alias to seamlessly switch over, detailed information can be found from [the official doc](https://www.elastic.co/guide/en/elasticsearch/reference/master/aliases.html), currently Feathr doesn't provide any helper to do this.
-
-NOTE:
-+ You can use no auth or basic auth only, no other authentication methods are supported.
-+ If you enabled SSL, you need to make sure the certificate on ES nodes is trusted by the Spark cluster, otherwise the job will fail.
-
-## Using ElasticSearch as offline store
-
-To use ElasticSearch as the offline store, create `ElasticSearchSink` and use it with `FeathrClient.get_offline_features`, e.g..
-
-```
-name = 'es_output'
-sink = ElasticSearchSink(name, host="esnode1", index="someindex", ssl=False, auth=True)
-os.environ[f"{name.upper()}_USER"] = "some_user_name"
-os.environ[f"{name.upper()}_PASSWORD"] = "some_magic_word"
-client.get_offline_features(..., output_path=sink)
-```
-
-NOTE: The feature joining process doesn't generate meaningful keys for each document, you need to make sure the output dataset can be accessed/queried by some other ways such as full-text-search, otherwise you may have to fetch all the data from ES to get what you look for.
+Same as for the offline store, create a JDBC sink and add it to the `MaterializationSettings`, set the corresponding environment variables, then use it with `FeathrClient.materialize_features`.
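
A minimal sketch of that online-store flow, under the same assumptions as the earlier examples (the sink name, URL, table, and feature name are placeholders):

```python
import os
from feathr import FeathrClient, MaterializationSettings

client = FeathrClient()  # assumes an existing Feathr configuration

name = "online_output"
sink = client.JdbcSink(name, "jdbc:sqlserver://...;database=somedb", "online_table", "USERPASS")

# Credentials for the sink, derived from its upper-cased name.
os.environ[f"{name.upper()}_USER"] = "some_user_name"
os.environ[f"{name.upper()}_PASSWORD"] = "some_magic_word"

settings = MaterializationSettings(name="sql_online_job",
                                   sinks=[sink],
                                   feature_names=["f_some_feature"])
client.materialize_features(settings)
```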
Lines changed: 2 additions & 4 deletions
@@ -1,5 +1,3 @@
-# SQL-Based Registry for Feathr
+# Purview-Based Registry for Feathr
 
-This is the reference implementation of [the Feathr API spec](./api-spec.md), base on SQL databases instead of PurView.
-
-Please note that this implementation uses iterations of `select` to retrieve graph lineages, this approach is very inefficient and should **not** be considered as production-ready. We only suggest to use this implementation for testing/researching purposes.
+This is the reference implementation of [the Feathr API spec](./api-spec.md), based on Purview.

registry/sql-registry/README.md

Lines changed: 1 addition & 3 deletions
@@ -1,5 +1,3 @@
 # SQL-Based Registry for Feathr
 
-This is the reference implementation of [the Feathr API spec](./api-spec.md), base on SQL databases instead of PurView.
-
-Please note that this implementation uses iterations of `select` to retrieve graph lineages, this approach is very inefficient and should **not** be considered as production-ready. We only suggest to use this implementation for testing/researching purposes.
+This is the reference implementation of [the Feathr API spec](./api-spec.md), based on SQL databases.
