PhilanthroLab/vandelay

vandelay

Dead simple data pipeline utility belt.

Install

npm install vandelay --save

Example - Flat File

import { tap, fetch, transform, parse } from 'vandelay'

fetch({
  url: 'http://google.com/example.geojson',
  parser: parse('geojson')
})
  .pipe(transform(async (row) => {
    // otherApi is a placeholder for your own async lookup
    const external = await otherApi(row.field)
    return {
      ...row,
      external
    }
  }))
  .pipe(tap(async (row, meta) => {
    // send row to an external api, db, or whatever!
  }))

Example - API

import { tap, fetch, transform, parse } from 'vandelay'

fetch({
  url: 'http://google.com/api/example',
  parser: parse('json', { selector: 'results.*' }),
  pagination: {
    offsetParam: 'offset',
    limitParam: 'limit',
    limit: 100 // page size (required); 100 is just an example value
  }
})
  .pipe(transform(async (row, meta) => {
    // otherApi is a placeholder for your own async lookup
    const external = await otherApi(row.field)
    return {
      ...row,
      external
    }
  }))
  .pipe(tap(async (row, meta) => {
    // send row to an external api, db, or whatever!
  }))

API

fetch(source[, options])

Returns a stream that fetches the given source and emits the parsed and selected objects.

source

  • url - Required String
  • parser - Required Function
  • pagination - Optional Object
    • offsetParam - Required String (if not using pageParam)
    • pageParam - Required String (if not using offsetParam)
    • limitParam - Required String
    • startPage - Optional Number, defaults to 0
    • limit - Required Number
  • setup - Optional Function or String
    • If it is a function, it runs asynchronously once for each source, before the request starts. It receives the source and a meta object as arguments.
      • The first argument is the source object being set up.
      • The second argument is a meta information object, which contains a context key if one was provided.
    • If it is a string, it is compiled and sandboxed using vm2.
    • Returns an object that controls the request parameters.
  • oauth - Optional Object
    • grant - Required Object
      • url - Required String
        • OAuth2 API URL
      • type - Required String
        • OAuth2 grant type
  • headers - Optional Object
  • query - Optional Object
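To make the pagination fields above concrete, here is a small sketch of the URLs a paginated source might request. The pageUrl helper is hypothetical and not part of vandelay; it assumes the offset is page * limit in offset mode and that startPage is added to the zero-based request count. How vandelay decides when to stop paging is not covered here.

```javascript
// Hypothetical helper (not part of vandelay): builds the URL for request
// number i (0, 1, 2, ...) from a pagination config like the one above.
// Assumption: offset mode sends (startPage + i) * limit, page mode sends startPage + i.
function pageUrl(url, pagination, i) {
  const { offsetParam, pageParam, limitParam, startPage = 0, limit } = pagination
  const u = new URL(url)
  const page = startPage + i
  if (offsetParam) {
    u.searchParams.set(offsetParam, String(page * limit))
  } else if (pageParam) {
    u.searchParams.set(pageParam, String(page))
  } else {
    throw new Error('pagination needs either offsetParam or pageParam')
  }
  u.searchParams.set(limitParam, String(limit))
  return u.toString()
}

const pagination = { offsetParam: 'offset', limitParam: 'limit', limit: 100 }
for (let i = 0; i < 3; i++) {
  // http://google.com/api/example?offset=0&limit=100, then offset=100, then offset=200
  console.log(pageUrl('http://google.com/api/example', pagination, i))
}
```

Under the same assumptions, pageParam: 'page' with startPage: 1 would produce page=1, page=2 and page=3 instead.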

options

  • concurrency - Optional Number, defaults to 10
  • timeout - Optional Number
    • Timeout for the entire request, defaults to one day
  • connectTimeout - Optional Number
    • Timeout to establish the initial connection, defaults to five minutes
  • context - Optional Object
    • If specified, its values are expanded into the URL, which is treated as an RFC 6570 URI Template
  • setup - Optional Object
    • sandbox - Optional Object
      • Creates a frozen global context, used for sandboxed setup functions
      • Only applies when using a string setup function
    • timeout - Optional Number
      • Only applies when using a string setup function
    • compiler - Optional Function
      • Only applies when using a string setup function
  • onError - Optional Function
    • Receives a context object when an error occurs, so you can decide how to handle the error and opt out of the default behavior.
    • The default handler will emit an error on the stream.
  • onFetch - Optional Function
    • Receives the URL as the only argument, for debugging or logging purposes.
    • Called every time an HTTP request is created.
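The context option relies on RFC 6570 URI Templates. The sketch below implements only the simplest form, Level 1 {name} expansion, to show what templating a context into a URL means. The expand function is hypothetical and is not vandelay's implementation; operator expressions such as {?query} or {/path} are not handled.

```javascript
// Hypothetical sketch of RFC 6570 Level 1 expansion: each {name} in the
// template is replaced by the percent-encoded context value.
// Operator expressions such as {?query} or {/path} are not handled.
function expand(template, context) {
  return template.replace(/\{([A-Za-z0-9_]+)\}/g, (_, name) => {
    const value = context[name]
    // Undefined variables expand to an empty string (null is treated the same here)
    if (value === undefined || value === null) return ''
    // Level 1 leaves only unreserved characters unencoded, so also encode
    // the characters encodeURIComponent lets through: ! ' ( ) *
    return encodeURIComponent(String(value)).replace(
      /[!'()*]/g,
      (c) => '%' + c.charCodeAt(0).toString(16).toUpperCase()
    )
  })
}

const url = expand('http://google.com/api/{dataset}/rows?year={year}', {
  dataset: 'crime reports',
  year: 2019
})
console.log(url) // http://google.com/api/crime%20reports/rows?year=2019
```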

parse(format[, options])

Returns a function that creates a parser stream. Parser streams receive text as input, and output objects.

format

Built-in parsers are:

  • csv
    • Optional autoFormat option, to automatically infer types of values and convert them.
    • Optional camelcase option, to camelcase and normalize header keys.
    • Optional zip option: if the content is a zip file, each CSV file in the archive is parsed.
  • excel
    • Optional autoFormat option, to automatically infer types of values and convert them.
    • Optional camelcase option, to camelcase and normalize header keys.
    • Optional zip option: if the content is a zip file, each XLSX file in the archive is parsed.
  • json
    • Requires a selector option that specifies where to find rows in the data.
      • If needed, you may provide multiple selectors as an array (selector: [ 'a.*', 'b.*' ]).
    • Optional zip option: if the content is a zip file, each JSON file in the archive is parsed.
  • xml
    • Requires a selector option that specifies where to find rows in the data.
      • Note that the selector applies to the xml2js output.
    • Optional autoFormat option, to automatically infer types of values and convert them.
    • Optional camelcase option, to camelcase and normalize header keys.
    • Optional zip option: if the content is a zip file, each XML file in the archive is parsed.
  • ndjson
  • shp
  • kml
  • kmz
  • gdb
  • gpx
  • gtfs
  • gtfsrt
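As a rough illustration of what a selector such as results.* picks out, the hypothetical select helper below walks an already-parsed object in memory: keys are separated by dots, and * fans out over every element of an array (or every value of an object). This is a simplification for illustration only; vandelay's parsers are streams, and the selector grammar they accept may be richer than this.

```javascript
// Hypothetical, in-memory illustration of selector paths like 'results.*'.
// Keys are dot-separated; '*' fans out over every array element (or object
// value). This is not vandelay's parser, which works on a stream.
function select(data, selector) {
  const paths = Array.isArray(selector) ? selector : [selector]
  const rows = []
  for (const path of paths) {
    let nodes = [data]
    for (const key of path.split('.')) {
      const next = []
      for (const node of nodes) {
        if (node === null || typeof node !== 'object') continue
        if (key === '*') next.push(...Object.values(node))
        else if (key in node) next.push(node[key])
      }
      nodes = next
    }
    // Each selector's matches are appended in order
    rows.push(...nodes)
  }
  return rows
}

const body = {
  meta: { count: 2 },
  results: [{ id: 1 }, { id: 2 }],
  extra: [{ id: 3 }]
}
console.log(select(body, 'results.*').map((row) => row.id)) // [ 1, 2 ]
console.log(select(body, ['results.*', 'extra.*']).map((row) => row.id)) // [ 1, 2, 3 ]
```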

options

  • Optional autoFormat option, to automatically infer types of values and convert them.
    • If set to simple, it only infers types from values and trims keys.
    • If set to aggressive, it also camelcases keys, on top of simple mode.
    • If set to extreme, it adds more complex mappings on top of aggressive mode.
      • For example, converting startLat and startLon fields into a GeoJSON Point.
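The three autoFormat levels can be pictured with the sketch below. The inference rules (numbers, booleans, empty strings to null), the camelcasing regex, and the name of the generated start field are all assumptions made for illustration; vandelay's actual rules are not documented here.

```javascript
// Hypothetical sketch of the autoFormat levels; the rules below are
// illustrative assumptions, not vandelay's implementation.
function inferValue(value) {
  if (typeof value !== 'string') return value
  const v = value.trim()
  if (v === '') return null
  if (v === 'true') return true
  if (v === 'false') return false
  if (/^-?\d+(\.\d+)?$/.test(v)) return Number(v)
  return v
}

// 'Trip ID' -> 'tripId', ' Start Lat ' -> 'startLat'
function camelCase(key) {
  return key
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+(.)?/g, (_, c) => (c ? c.toUpperCase() : ''))
}

function autoFormat(row, mode) {
  const out = {}
  for (const [key, value] of Object.entries(row)) {
    // simple only trims keys; aggressive and extreme also camelcase them
    const k = mode === 'simple' ? key.trim() : camelCase(key)
    out[k] = inferValue(value)
  }
  const hasStart = typeof out.startLat === 'number' && typeof out.startLon === 'number'
  if (mode === 'extreme' && hasStart) {
    // GeoJSON positions are [longitude, latitude]; the 'start' field name is hypothetical
    out.start = { type: 'Point', coordinates: [out.startLon, out.startLat] }
    delete out.startLat
    delete out.startLon
  }
  return out
}

const raw = { ' Start Lat ': '41.88', 'Start Lon': '-87.63', 'Trip ID': '42' }
console.log(autoFormat(raw, 'simple'))
console.log(autoFormat(raw, 'aggressive'))
console.log(autoFormat(raw, 'extreme'))
```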

transform(transformer[, options])

transformer(row, meta)

  • Asynchronous function, receives the current row and the meta information object.
    • Meta information object contains: row, url, accessToken, context, source, and header (if using a JSON parser)
  • If transformer is a string, it is compiled and sandboxed using vm2.
  • If transformer is an object, it is used with object-transform-stack to map objects.
  • Returning an object passes it downstream; returning null or undefined removes the row from the stream (skip).
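The transformer contract above can be exercised directly, without vandelay, by calling an async function with a row and a meta object. The status, amount and sourceUrl fields and the sample rows are hypothetical; inside a pipeline the same function would be passed to transform(), as in the examples at the top.

```javascript
// Example transformer following the contract above:
// async (row, meta) => object (keep) | null or undefined (skip).
// The status, amount and sourceUrl fields are hypothetical.
const transformer = async (row, meta) => {
  // Returning null removes the row from the stream
  if (row.status === 'void') return null
  // Returning an object passes it downstream; url comes from the meta object
  return { ...row, amount: Number(row.amount), sourceUrl: meta.url }
}

// Calling it directly shows both outcomes
const meta = { url: 'http://google.com/api/example' }
transformer({ id: 1, status: 'paid', amount: '12.5' }, meta).then(console.log)
transformer({ id: 2, status: 'void', amount: '3' }, meta).then(console.log) // null
```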

options

  • sandbox - Optional Object
    • Creates a frozen global context, used for sandboxed transformers
    • Only applies when using a string transformer
  • timeout - Optional Number
    • Only applies when using a string transformer
  • compiler - Optional Function
    • Only applies when using a string transformer
  • concurrency - Optional Number, defaults to 10
  • onBegin(row, meta) - Optional Function
  • onError(err, row, meta) - Optional Function
  • onSkip(row, meta) - Optional Function
  • onSuccess(row, meta) - Optional Function
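The concurrency limit and the lifecycle hooks can be pictured with the following hypothetical runner. It is a sketch, not vandelay's implementation: a small pool of workers keeps at most concurrency transformer calls pending at once, and onBegin, onSkip, onSuccess and onError fire around each row. The exact hook arguments, their ordering, and what happens to a failed row are assumptions here, and results are collected in completion order.

```javascript
// Hypothetical runner (not vandelay's implementation) illustrating the
// transform options: a pool of `concurrency` workers pulls rows, so no more
// than that many transformer calls are pending at once, and the hooks fire
// around each row. Hook arguments and ordering are assumptions.
async function runTransform(rows, transformer, opts = {}) {
  const { concurrency = 10, onBegin, onError, onSkip, onSuccess } = opts
  const results = []
  let next = 0

  async function worker() {
    // JavaScript is single-threaded, so next++ cannot race between workers
    while (next < rows.length) {
      const row = rows[next++]
      const meta = { row }
      if (onBegin) onBegin(row, meta)
      try {
        const out = await transformer(row, meta)
        if (out === null || out === undefined) {
          if (onSkip) onSkip(row, meta)
        } else {
          results.push(out) // completion order, not input order
          if (onSuccess) onSuccess(out, meta) // here it receives the transformed row
        }
      } catch (err) {
        // In this sketch a failing row is dropped after onError runs
        if (onError) onError(err, row, meta)
      }
    }
  }

  const workers = Array.from({ length: Math.min(concurrency, rows.length) }, () => worker())
  await Promise.all(workers)
  return results
}

// Demo: 25 rows, at most 4 in flight, every fifth row skipped, one failure
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))
const rows = Array.from({ length: 25 }, (_, i) => ({ id: i }))
const counts = { skipped: 0, failed: 0 }
let inFlight = 0
let maxInFlight = 0

runTransform(rows, async (row) => {
  inFlight++
  maxInFlight = Math.max(maxInFlight, inFlight)
  await sleep(5)
  inFlight--
  if (row.id % 5 === 0) return null
  if (row.id === 7) throw new Error('bad row')
  return { ...row, ok: true }
}, {
  concurrency: 4,
  onSkip: () => { counts.skipped++ },
  onError: () => { counts.failed++ }
}).then((results) => {
  console.log(results.length, counts, maxInFlight) // 19 { skipped: 5, failed: 1 } 4
})
```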

tap(fn[, options])

fn(row, meta)

  • Asynchronous function, receives the current row and the meta information object.
  • Returning an object will pass it on, and null or undefined will remove the item from the stream.

options

  • concurrency - Optional Number, defaults to 10

normalize([options])

Returns a stream that emits plain objects without any meta fields attached; useful at the end of a pipeline.

options

  • concurrency - Optional Number, defaults to 10

About

Imports, exports, and ETL
