Merged

Changes from all commits

31 commits
| Commit | Description | Author | Date |
|--------|-------------|--------|------|
| 23a4e30 | Package Setup Update for Data Files (#206) | GBBBAS | Apr 28, 2023 |
| 7093be0 | Time weighted average api (#207) | cching95 | May 2, 2023 |
| 27786fd | fix unit tests for twa api (#208) | cching95 | May 2, 2023 |
| 8c2dfe2 | Performance Updates for Turbodbc and Databricks SQL Connector (#211) | GBBBAS | May 3, 2023 |
| c9cacf5 | Update to install pyarrow (#212) | GBBBAS | May 3, 2023 |
| bef5ef6 | Turbodbc Performance Updates (#213) | GBBBAS | May 3, 2023 |
| c175b26 | Update pyarrow install (#214) | GBBBAS | May 3, 2023 |
| 37fb4f8 | Performance Updates, Transformer and Destinations Updates (#215) | GBBBAS | May 4, 2023 |
| d57ab47 | Delta Merge Destination Component (#220) | GBBBAS | May 5, 2023 |
| e2e2f36 | Fledge transformer component (#221) | JamesKnBr | May 5, 2023 |
| f7df870 | Process Control Data Model to Delta (#225) | GBBBAS | May 9, 2023 |
| 2dee3d2 | Rest API Destination Component (#223) | GBBBAS | May 9, 2023 |
| 3b5de55 | Hashicorp Vault Secret Component (#227) | GBBBAS | May 9, 2023 |
| 2bb3288 | Spark ADLS Gen2 Service Principal Connect Utility (#232) | GBBBAS | May 9, 2023 |
| 40e5bd2 | Azure Key Vault Secret Component (#230) | GBBBAS | May 9, 2023 |
| a6b2c0a | SSIP PI Transformer Components (#233) | GBBBAS | May 9, 2023 |
| 7dad487 | Process Control Data Model (#234) | GBBBAS | May 9, 2023 |
| ce87ffb | For Each Batch Updates (#238) | GBBBAS | May 10, 2023 |
| 4bf1e33 | added timezone to docs (#236) | rodalynbarce | May 10, 2023 |
| d5a6645 | Delta Table Creation using Pydantic Models for Column (#240) | GBBBAS | May 10, 2023 |
| b87e376 | Remove Duplicates from PCDM to Delta (#241) | GBBBAS | May 10, 2023 |
| dd936e8 | Interpolation at Time Function and API fixes (#244) | cching95 | May 11, 2023 |
| 3d17976 | Union of Types for ADLS Gen 2 Credentials Parameter (#246) | GBBBAS | May 11, 2023 |
| f4078ca | Updates for AMI Meter Classes (#243) | vbayon | May 11, 2023 |
| 82a072a | Combine ChangeTypes for Merge of PCDM Data (#242) | GBBBAS | May 11, 2023 |
| f2e75f7 | Fix for Unioned Types on ACLs (#247) | GBBBAS | May 11, 2023 |
| 21f079a | Support 3.3.* Pyspark Versions (#251) | GBBBAS | May 12, 2023 |
| c52a084 | Fix setup.py package (#252) | GBBBAS | May 12, 2023 |
| a9da2a5 | Interpolation at Time API and unit tests (#253) | cching95 | May 12, 2023 |
| 7a4fc53 | Hotfix for resample and interpolate API (#257) | cching95 | May 12, 2023 |
| a5c05c9 | Refactor parse_date for final use case (#255) | rodalynbarce | May 12, 2023 |
1 change: 0 additions & 1 deletion .devcontainer/devcontainer.json
@@ -36,7 +36,6 @@
"python.testing.pytestEnabled": true,
"python.testing.cwd": "${workspaceFolder}",
"python.analysis.extraPaths": ["${workspaceFolder}"],
// "python.defaultInterpreterPath": "~/micromamba/envs/rtdip-sdk/bin/python",
"terminal.integrated.env.linux":{
"PYTHONPATH": "${workspaceFolder}:${env:PYTHONPATH}"
},
2 changes: 1 addition & 1 deletion .github/workflows/sonarcloud.yml
@@ -19,7 +19,7 @@ run-name: ${{ github.event.workflow_run.display_title }}
on:
workflow_run:
workflows: [PR]
types: [requested]
types: [completed]
branches-ignore:
- "develop"
- "main"
13 changes: 10 additions & 3 deletions .github/workflows/sonarcloud_reusable.yml
@@ -40,13 +40,16 @@ on:
required: true

jobs:
job_test_python_latest_version:
job_test_python_pyspark_latest_version:
defaults:
run:
shell: bash -l {0}
strategy:
matrix:
os: [ubuntu-latest]
python-version: ["3.10"]
pyspark: ["3.3.2"]
delta-spark: ["2.3.0"]
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v3
@@ -58,7 +61,7 @@
- name: Setup Python
uses: actions/setup-python@v4
with:
python-version: 3.11
python-version: ${{ matrix.python-version }}

- name: Install Boost
run: |
@@ -74,12 +77,16 @@
uses: mamba-org/provision-with-micromamba@main
with:
environment-file: environment.yml
extra-specs: |
python=${{ matrix.python-version }}
pyspark=${{ matrix.pyspark }}
delta-spark=${{ matrix.delta-spark }}
cache-env: true

- name: Test
run: |
mkdir -p coverage-reports
coverage run -m pytest --junitxml=xunit-reports/xunit-result-unitttests.xml tests && tests_ok=true
coverage run -m pytest --junitxml=xunit-reports/xunit-result-unitttests.xml tests
coverage xml --omit "venv/**,maintenance/**,xunit-reports/**" -i -o coverage-reports/coverage-unittests.xml
echo Coverage `coverage report --omit "venv/**" | grep TOTAL | tr -s ' ' | cut -d" " -f4`

16 changes: 14 additions & 2 deletions .github/workflows/test.yml
@@ -18,14 +18,22 @@ on:
workflow_call:

jobs:
job_test_python_previous_version:
job_test_python_pyspark_versions:
defaults:
run:
shell: bash -l {0}
strategy:
matrix:
os: [ubuntu-latest]
python-version: ["3.8", "3.9", "3.10"]
pyspark: ["3.3.0", "3.3.1", "3.3.2"]
include:
- pyspark: "3.3.0"
delta-spark: "2.2.0"
- pyspark: "3.3.1"
delta-spark: "2.3.0"
- pyspark: "3.3.2"
delta-spark: "2.3.0"
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v3
@@ -51,11 +59,15 @@
uses: mamba-org/provision-with-micromamba@main
with:
environment-file: environment.yml
extra-specs: |
python=${{ matrix.python-version }}
pyspark=${{ matrix.pyspark }}
delta-spark=${{ matrix.delta-spark }}
cache-env: true

- name: Test
run: |
mkdir -p coverage-reports-previous
coverage run -m pytest --junitxml=xunit-reports/xunit-result-unitttests.xml tests && tests_ok=true
coverage run -m pytest --junitxml=xunit-reports/xunit-result-unitttests.xml tests
coverage xml --omit "venv/**,maintenance/**,xunit-reports/**" -i -o coverage-reports-previous/coverage-unittests.xml
echo Coverage `coverage report --omit "venv/**" | grep TOTAL | tr -s ' ' | cut -d" " -f4`
5 changes: 3 additions & 2 deletions .vscode/launch.json
@@ -1,6 +1,7 @@
{
"version": "0.2.0",
"configurations": [
"configurations":
[
{
"name": "Attach to Python Functions",
"type": "python",
@@ -19,6 +20,6 @@
"PYTEST_ADDOPTS": "--no-cov"
},
"justMyCode": false
}
},
]
}
6 changes: 3 additions & 3 deletions .vscode/settings.json
@@ -14,13 +14,13 @@
"python.testing.pytestArgs": [
"--cov=.",
"--cov-report=xml:cov.xml",
"tests"
"tests",
"-vv"
],
"python.testing.unittestEnabled": false,
"python.testing.pytestEnabled": true,
"python.testing.cwd": "${workspaceFolder}",
// "python.testing.cwd": "${workspaceFolder}",
"python.analysis.extraPaths": ["${workspaceFolder}"],
// "python.defaultInterpreterPath": "~/micromamba/envs/rtdip-sdk/bin/python",
"terminal.integrated.env.osx":{
"PYTHONPATH": "${workspaceFolder}:${env:PYTHONPATH}"
},
15 changes: 15 additions & 0 deletions docs/api/deployment/azure.md
@@ -33,3 +33,18 @@ To deploy the RTDIP APIs from Docker Hub, follow the steps below:
```bash
az functionapp config container set --name <function_app_name> --resource-group <resource_group_name> --docker-custom-image-name rtdip/api:azure-<version>
```

### Environment Variables

#### Azure Active Directory
1. Once Authentication has been configured correctly on the Azure Function App, set the following Environment Variable to the Tenant ID of the relevant Active Directory:
- TENANT_ID

#### Databricks
1. The following Environment Variables are required and the values can be retrieved from your Databricks SQL Warehouse or Databricks Cluster:
- DATABRICKS_SQL_SERVER_HOSTNAME
- DATABRICKS_SQL_HTTP_PATH

#### ODBC Driver
1. To allow the APIs to leverage Turbodbc for connectivity and potential performance improvements, set the following environment variable:
- RTDIP_ODBC_CONNECTION=turbodbc
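
For orientation only, a minimal sketch of how these settings might be consumed at runtime is shown below, assuming the SDK's ```DatabricksSQLConnection``` connector; the token value is a placeholder and only the variable names match the settings above.

```python
import os
from rtdip_sdk.odbc.db_sql_connector import DatabricksSQLConnection

# Values are supplied through the Function App's application settings described above.
server_hostname = os.getenv("DATABRICKS_SQL_SERVER_HOSTNAME")
http_path = os.getenv("DATABRICKS_SQL_HTTP_PATH")
odbc_connection = os.getenv("RTDIP_ODBC_CONNECTION")  # e.g. "turbodbc" to enable Turbodbc

# Placeholder token; in a deployed Function App the token is obtained via the
# configured Azure Active Directory authentication (TENANT_ID).
connection = DatabricksSQLConnection(server_hostname, http_path, "<access_token>")
```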
77 changes: 77 additions & 0 deletions docs/domains/process_control/data_model.md
@@ -0,0 +1,77 @@
# Process Control Data Model

The Process Control Data Model consists of two key components:

1. **Metadata** describes the sensor/object/measurement, such as its Description, Unit of Measure and Status, and also provides metadata used in queries, such as the Step logic used in interpolation.
2. **Events** contains transactional data, capturing the name of the sensor/object/measurement, the timestamp of the event, the status of the event recording and the value.

## Data Model

``` mermaid
erDiagram
METADATA ||--o{ EVENTS : contains
METADATA {
string TagName PK
string Description
string UoM
string DataType
boolean Step
string Status
dict Properties "Key Value pairs of varying metadata"
}
EVENTS {
string TagName PK
timestamp EventTime PK
string Status
dynamic Value "Value can be of different Data Types"
}
```
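
For illustration, a minimal PySpark representation of the EVENTS table above could look like the sketch below; the column names follow the diagram, while holding ```Value``` as a string is purely an assumption made here to accommodate its varying data types.

```python
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

# Sketch of an EVENTS schema based on the diagram above; Value is kept as a
# string for illustration only, since its data type can differ per tag.
events_schema = StructType([
    StructField("TagName", StringType(), False),
    StructField("EventTime", TimestampType(), False),
    StructField("Status", StringType(), True),
    StructField("Value", StringType(), True),
])
```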

## References

| Reference | Description |
|------------|--------------------|
|[IEC 61850](https://en.wikipedia.org/wiki/IEC_61850#:~:text=IEC%2061850%20is%20an%20international,architecture%20for%20electric%20power%20systems.)|Description of IEC 61850|
|[IEC CIM](https://en.wikipedia.org/wiki/Common_Information_Model_(electricity))|Description of IEC CIM|

## Mappings

### Fledge OPC UA South Plugin

[Fledge](https://www.lfedge.org/projects/fledge/) provides support for sending data between various data sources and destinations. The mapping below is for the [OPC UA South Plugin](https://fledge-iot.readthedocs.io/en/latest/plugins/fledge-south-opcua/index.html), whose data can be sent to message brokers such as Kafka and Azure IoT Hub.

This mapping is performed by the [RTDIP Fledge to PCDM Component](../../sdk/code-reference/pipelines/transformers/spark/fledge_json_to_pcdm.md) and can be used in an [RTDIP Ingestion Pipeline.](../../sdk/pipelines/framework.md)

| From Data Model | From Field | From Type | To Data Model |To Field| To Type | Mapping Logic |
|------|----|---------|------|------|--------|-----------|
| Fledge OPC UA | Object ID | string | EVENTS| TagName | string | |
| Fledge OPC UA | EventTime | string | EVENTS| EventTime | timestamp | Converted to a timestamp |
| | | | EVENTS| Status | string | Can be defaulted in the [RTDIP Fledge to PCDM Component](../../sdk/code-reference/pipelines/transformers/spark/fledge_json_to_pcdm.md), otherwise Null |
| Fledge OPC UA | Value | string | EVENTS | Value | dynamic | Converts Value into either a float number or string based on how it is received in the message |
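
A rough usage sketch of this transformation is shown below; the class name, parameters and input path are assumptions made for illustration, so refer to the linked component reference for the actual API.

```python
from pyspark.sql import SparkSession
# Hypothetical class name -- confirm against the component reference linked above.
from rtdip_sdk.pipelines.transformers.spark.fledge_json_to_pcdm import FledgeOPCUAJsonToPCDMTransformer

spark = SparkSession.builder.getOrCreate()
fledge_df = spark.read.json("/path/to/fledge_opcua_messages")  # raw Fledge OPC UA JSON payloads

# Produces rows shaped like the EVENTS model: TagName, EventTime, Status, Value.
pcdm_df = FledgeOPCUAJsonToPCDMTransformer(data=fledge_df).transform()
```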

### OPC Publisher

[OPC Publisher](https://learn.microsoft.com/en-us/azure/industrial-iot/overview-what-is-opc-publisher) connects to OPC UA assets and publishes data to the Microsoft Azure Cloud's IoT Hub.

The mapping below is performed by the [RTDIP OPC Publisher to PCDM Component](../../sdk/code-reference/pipelines/transformers/spark/opc_publisher_json_to_pcdm.md) and can be used in an [RTDIP Ingestion Pipeline.](../../sdk/pipelines/framework.md)

| From Data Model | From Field | From Type | To Data Model |To Field| To Type | Mapping Logic |
|------|----|---------|------|------|--------|-----------|
| OPC Publisher | DisplayName | string | EVENTS| TagName | string | |
| OPC Publisher | SourceTimestamp | string | EVENTS| EventTime | timestamp | Converted to a timestamp |
| OPC Publisher | StatusCode.Symbol | string | EVENTS| Status | string | Null values can be overridden in the [RTDIP OPC Publisher to PCDM Component](../../sdk/code-reference/pipelines/transformers/spark/opc_publisher_json_to_pcdm.md) |
| OPC Publisher | Value.Value | string | EVENTS | Value | dynamic | Converts Value into either a float number or string based on how it is received in the message |

### SSIP PI

[SSIP PI](https://bakerhughesc3.ai/oai-solution/shell-sensor-intelligence-platform/) connects to OSIsoft PI Historians and sends the data to the cloud.

The mapping below is performed by the RTDIP SSIP PI to PCDM Component and can be used in an [RTDIP Ingestion Pipeline.](../../sdk/pipelines/framework.md)

| From Data Model | From Field | From Type | To Data Model |To Field| To Type | Mapping Logic |
|------|----|---------|------|------|--------|-----------|
| SSIP PI | TagName | string | EVENTS| TagName | string | |
| SSIP PI | EventTime | string | EVENTS| EventTime | timestamp | |
| SSIP PI | Status | string | EVENTS| Status | string | |
| SSIP PI | Value | dynamic | EVENTS | Value | dynamic | |

Binary file added docs/domains/process_control/images/iot_hub.png
Binary file added docs/domains/process_control/images/mqtt.png
58 changes: 58 additions & 0 deletions docs/domains/process_control/overview.md
@@ -0,0 +1,58 @@
# Process Control Domain Overview

RTDIP provides the ability to consume data from process control systems, transform it and store it in an open source format to enable:

- Data Science, ML and AI applications to consume the data
- Real time data in Digital Twins
- BI and Analytics
- Reporting

## Process Control Systems

Process control systems monitor, control and safeguard production operations and generate vast amounts of data. Typical industry use cases include:

- Electricity Generation, Transmission and Distribution
- Chemicals, Gas, Oil Production and Distribution
- LNG Processing and Product Refining

Process control systems record variables such as temperature, pressure and flow, and automatically make adjustments to maintain preset specifications in a technical process.

This data can be made available to other systems over a number of protocols, such as [OPC UA.](https://opcfoundation.org/about/opc-technologies/opc-ua/) The protocols in turn make the data available to connectors that can send the data onwards to other systems and the cloud.

## Architecture

``` mermaid
graph LR
A(Process Control) --> B(OPC UA Server);
B --> C(Connectors);
C --> D(Message Broker);
D --> E(RTDIP);
E --> F[(Destinations)];
```

### Connectors

A number of connectors are available from various suppliers. Some open source options include:

<center>[![Fledge](https://www.lfedge.org/wp-content/uploads/2019/09/fledge-horizontal-color.svg){width=40%}](https://www.lfedge.org/projects/fledge/) </center>

<center> [![Edge X Foundry](https://github.com/lf-edge/artwork/blob/master/edgexfoundry/horizontal/color/edgexfoundry-horizontal-color.png?raw=true){width=50%}](https://www.lfedge.org/projects/edgexfoundry/) </center>

### Message Brokers

Message Brokers support publishing of data from connectors and subscribing to data by consumers (pub/sub). Popular options used with RTDIP are:

<center>[![Kafka](../process_control/images/kafka-logo-wide.png){width=40%}](https://kafka.apache.org/) </center>

<center>[![MQTT](../process_control/images/mqtt.png){width=40%}](https://mqtt.org/) </center>

<center>[![Azure IoT Hub](../process_control/images/iot_hub.png){width=40%}](https://azure.microsoft.com/en-us/products/iot-hub) </center>


## Real Time Data Ingestion Platform

For more information about the Real Time Data Platform and its components to connect to data sources and destinations, please refer to this [link.](../../sdk/overview.md)




3 changes: 1 addition & 2 deletions docs/macros.py
@@ -13,7 +13,6 @@
# limitations under the License.

from github import Github
import arrow

def define_env(env):
@env.macro
@@ -25,7 +24,7 @@ def github_releases(owner, repo):
title_markdown = "##[{}]({})".format(release.title, release.html_url)

subtitle_markdown = ":octicons-tag-24:[{}]({}) ".format(release.tag_name, release.html_url)
subtitle_markdown += ":octicons-calendar-24: Published {} ".format(arrow.get(release.published_at).humanize())
subtitle_markdown += ":octicons-calendar-24: Published {} ".format(release.published_at)
if release.draft:
subtitle_markdown += ":octicons-file-diff-24: Draft "
if release.prerelease:
@@ -0,0 +1,2 @@
# Write to Delta using Merge
::: src.sdk.python.rtdip_sdk.pipelines.destinations.spark.delta_merge
@@ -0,0 +1,2 @@
# Write to Delta
::: src.sdk.python.rtdip_sdk.pipelines.destinations.spark.pcdm_to_delta
@@ -0,0 +1,2 @@
# Write to Rest API
::: src.sdk.python.rtdip_sdk.pipelines.destinations.spark.rest_api
2 changes: 2 additions & 0 deletions docs/sdk/code-reference/pipelines/secrets/azure_key_vault.md
@@ -0,0 +1,2 @@
# Azure Key Vault Secrets
::: src.sdk.python.rtdip_sdk.pipelines.secrets.azure_key_vault
2 changes: 2 additions & 0 deletions docs/sdk/code-reference/pipelines/secrets/hashicorp_vault.md
@@ -0,0 +1,2 @@
# Hashicorp Vault Secrets
::: src.sdk.python.rtdip_sdk.pipelines.secrets.hashicorp_vault
@@ -0,0 +1,2 @@
# Convert Fledge Json to Process Control Data Model
::: src.sdk.python.rtdip_sdk.pipelines.transformers.spark.fledge_json_to_pcdm

This file was deleted.

@@ -0,0 +1,2 @@
# Convert OPC Publisher Json to Process Control Data Model
::: src.sdk.python.rtdip_sdk.pipelines.transformers.spark.opc_publisher_json_to_pcdm

This file was deleted.

@@ -0,0 +1,2 @@
# Convert SSIP PI Binary File data to the Process Control Data Model
::: src.sdk.python.rtdip_sdk.pipelines.transformers.spark.ssip_pi_binary_file_to_pcdm
@@ -0,0 +1,2 @@
# Convert SSIP PI Binary JSON data to the Process Control Data Model
::: src.sdk.python.rtdip_sdk.pipelines.transformers.spark.ssip_pi_binary_json_to_pcdm
@@ -0,0 +1 @@
::: src.sdk.python.rtdip_sdk.pipelines.utilities.spark.adls_gen2_spn_connect
4 changes: 2 additions & 2 deletions docs/sdk/code-reference/query/interpolate.md
@@ -18,8 +18,8 @@ parameters = {
"data_security_level": "Security Level",
"data_type": "float", #options:["float", "double", "integer", "string"]
"tag_names": ["tag_1", "tag_2"], #list of tags
"start_date": "2023-01-01", #start_date can be a date in the format "YYYY-MM-DD" or a datetime in the format "YYYY-MM-DDTHH:MM:SS"
"end_date": "2023-01-31", #end_date can be a date in the format "YYYY-MM-DD" or a datetime in the format "YYYY-MM-DDTHH:MM:SS"
"start_date": "2023-01-01", #start_date can be a date in the format "YYYY-MM-DD" or a datetime in the format "YYYY-MM-DDTHH:MM:SS" or specify the timezone offset in the format "YYYY-MM-DDTHH:MM:SS+zzzz"
"end_date": "2023-01-31", #end_date can be a date in the format "YYYY-MM-DD" or a datetime in the format "YYYY-MM-DDTHH:MM:SS" or specify the timezone offset in the format "YYYY-MM-DDTHH:MM:SS+zzzz"
"sample_rate": "1", #numeric input
"sample_unit": "hour", #options: ["second", "minute", "day", "hour"]
"agg_method": "first", #options: ["first", "last", "avg", "min", "max"]
30 changes: 30 additions & 0 deletions docs/sdk/code-reference/query/interpolation_at_time.md
@@ -0,0 +1,30 @@
# Interpolation at Time Function
::: src.sdk.python.rtdip_sdk.functions.interpolation_at_time

## Example
```python
from rtdip_sdk.authentication.authenticate import DefaultAuth
from rtdip_sdk.odbc.db_sql_connector import DatabricksSQLConnection
from rtdip_sdk.functions import interpolation_at_time

auth = DefaultAuth().authenticate()
token = auth.get_token("2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default").token
connection = DatabricksSQLConnection("{server_hostname}", "{http_path}", token)

parameters = {
"business_unit": "Business Unit",
"region": "Region",
"asset": "Asset Name",
"data_security_level": "Security Level",
"data_type": "float", #options:["float", "double", "integer", "string"]
"tag_names": ["tag_1", "tag_2"], #list of tags
"timestamps": ["2023-01-01"] #list of timestamps can be a date in the format "YYYY-MM-DD" or a datetime in the format "YYYY-MM-DDTHH:MM:SS" or specify the timezone offset in the format "YYYY-MM-DDTHH:MM:SS+zzzz"
}
x = interpolation_at_time.get(connection, parameters)
print(x)
```

This example uses [```DefaultAuth()```](../authentication/azure.md) and [```DatabricksSQLConnection()```](db-sql-connector.md) to authenticate and connect. You can find other ways to authenticate [here](../authentication/azure.md). The alternative built-in connection methods are [```PYODBCSQLConnection()```](pyodbc-sql-connector.md) and [```TURBODBCSQLConnection()```](turbodbc-sql-connector.md).

!!! note "Note"
    ```server_hostname``` and ```http_path``` can be found on the [SQL Warehouses Page](../../queries/databricks/sql-warehouses.md). <br />