Slings data from a data source to a data target.
```shell
pip install sling
```
Then you should be able to run `sling --help` from the command line.
```shell
sling run --src-conn MY_PG --src-stream myschema.mytable \
  --tgt-conn YOUR_SNOWFLAKE --tgt-object yourschema.yourtable \
  --mode full-refresh
```
Or pass a YAML/JSON string or file:
```shell
cat << EOF > /path/to/replication.yaml
source: MY_POSTGRES
target: MY_SNOWFLAKE

# default config options which apply to all streams
defaults:
  mode: full-refresh
  object: new_schema.{stream_schema}_{stream_table}

streams:
  my_schema.*:
EOF

sling run -r /path/to/replication.yaml
```
Run a replication from file:
```python
import yaml
from sling import Replication

with open('path/to/replication.yaml') as file:
    config = yaml.load(file, Loader=yaml.FullLoader)

replication = Replication(**config)
replication.run()
```
Build a replication dynamically:
```python
from sling import Replication, ReplicationStream

# build sling replication, one stream per folder
# (`folders` is assumed to be an iterable of (folder, table_name) pairs)
streams = {}
for (folder, table_name) in list(folders):
    streams[folder] = ReplicationStream(
        mode='full-refresh', object=table_name, primary_key='_hash_id')

replication = Replication(
    source='aws_s3',
    target='snowflake',
    streams=streams,
    env=dict(SLING_STREAM_URL_COLUMN='true', SLING_LOADED_AT_COLUMN='true'),
    debug=True,
)

replication.run()
```
`--src-conn`/`source.conn` and `--tgt-conn`/`target.conn` can be a name or URL of a folder:
- `MY_PG` (connection ref in db, profile or env)
- `postgresql://user:[email protected]:5432/database`
- `s3://my_bucket/my_folder/file.csv`
- `gs://my_google_bucket/my_folder/file.json`
- `file:///tmp/my_folder/file.csv` (local storage)
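A connection name such as `MY_PG` typically resolves to an entry in sling's `env.yaml` (or an environment variable of the same name). A minimal sketch of such an entry, where the host, credentials, and database are placeholder values:

```yaml
connections:
  MY_PG:
    type: postgres
    host: host.ip
    port: 5432
    user: user
    password: password
    database: database
```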
`--src-stream`/`source.stream` can be an object name to stream from:

- `TABLE1`
- `SCHEMA1.TABLE2`
- `OBJECT_NAME`
- `select * from SCHEMA1.TABLE3` (if source conn is DB)
- `/path/to/file.sql` (if source conn is DB)
`--tgt-object`/`target.object` can be an object name to write to:

- `TABLE1`
- `SCHEMA1.TABLE2`
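The target object also supports runtime variables, as in the `new_schema.{stream_schema}_{stream_table}` default used in the replication example above. As an illustration only (plain Python string formatting standing in for sling's own variable substitution), this is how such a template expands per stream:

```python
# Illustration: expanding a target-object template for one stream.
# str.format stands in for sling's internal substitution here.
template = "new_schema.{stream_schema}_{stream_table}"

def render_object(template: str, schema: str, table: str) -> str:
    """Fill the {stream_schema}/{stream_table} placeholders for one stream."""
    return template.format(stream_schema=schema, stream_table=table)

print(render_object(template, "my_schema", "orders"))
# new_schema.my_schema_orders
```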
An example task config as JSON:

```json
{
  "source": {
    "conn": "MY_PG_URL",
    "stream": "select * from my_table",
    "options": {}
  },
  "target": {
    "conn": "s3://my_bucket/my_folder/new_file.csv",
    "options": {
      "header": false
    }
  }
}
```
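The same payload can be assembled in Python and serialized with the standard `json` module before handing it to sling; a minimal sketch, reusing the placeholder connection names and paths from above:

```python
import json

# Build the task config as a plain dict, mirroring the JSON example.
config = {
    "source": {
        "conn": "MY_PG_URL",
        "stream": "select * from my_table",
        "options": {},
    },
    "target": {
        "conn": "s3://my_bucket/my_folder/new_file.csv",
        "options": {"header": False},
    },
}

payload = json.dumps(config)  # a JSON string to pass to sling (or write to a file)
print(payload)
```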