A universal data import tool (a.k.a. ETL) 🙌
For now it's a single import job, made more composable and put into its own place to be used in other projects. The vision is to add more things, so it can handle more data sources and targets, eventually.
Add this line to your application's Gemfile:
gem 'importeur'And then execute:
$ bundle
Or install it yourself as:
$ gem install importeur
The center of it all is the ETL class. By inistantiating it with different
extractors, transformers and loaders, it can be whatever you want it to be:
Importeur::ETL.new(
extractor: my_extractor,
transformer: my_transformer,
loader: my_loader
)All three only need to have a call method and, hence, can be simple Procs.
The extractor should return something enumerable, which is then iterated over
and each item passed into the transformer. The loader receives the enumerable
result of that. If the transformer returns nil or false that element is
skipped and not received by the loader. The transformer can also return an
array of elements, so a single element from extractor can be received as
multiple elements by the loader.
So far a more or less generic Extractor exists.
Importeur::Extractor.new(
source,
cursor,
cursor_key
)It takes as arguments a data source that needs to implement
- An
itemsmethod, returning anything enumerable, as well as adataset_unique_id, returning a unique ID for the current version of the dataset - A cursor, that needs to implement
read(key)andwrite(key, value)(e.g.Rails.cache) it makes the extractor returnnilif there is no new data. - A key (string), to be passed into the cursor.
An ActiveRecordPostgresLoader exists, that has a few very specific
dependencies. As the name suggests, it imports data into a Postgres database
using ActiveRecord. Additionally, it supports acts_as_paranoid (optional).
Example use-cases can be found in the spec/integration directory.
In order to be able to run the tests, a Postgres database is needed. Having
a PostgreSQL user with CREATEDB permission, create .env.test file with your
DATABASE_URL configuration, for example:
DATABASE_URL: 'postgres://user:password@localhost:5432/importeur_test'
Then run bundle exec rake create_test_db to create the database and
bundle exec rspec to run tests.
We run rake rubocop to make sure, everything looks good.
Bug reports and pull requests are welcome on GitHub at https://github.com/ad2games/importeur.