Design issues #31

@turicas

Some decisions need to be made before we declare the API stable. We can collect
here all the questions for discussion (we should answer them as soon as
possible, since they affect the current implementation and delaying them would
cause rework).

(A) About rows.Table

  • A.1) What about laziness? Should rows.Table always be lazy? Never
    lazy? Support both? What are the implications? If it's lazy, how do we deal
    with deletion and addition of rows?
  • A.2) How should we handle row filtering? What would be the best API?
    For example: we have a rows.Table with many rows but want to filter some
    rows. Should we provide a special method for this or use Python's built-in
    filter? Using Python's built-in filter would be the more Pythonic way but
    we can optimize some operations on certain plugins if we provide a special
    method (example: filtering on a MySQL-based Table).
  • A.3) What if we want to filter while importing? This is not a filter
    on a pre-existing rows.Table as in question A.2: it is a filter
    executed during the import process, so that only some rows are imported.
  • A.4) We should provide an API to modify rows during
    iteration over the Table. The user could specify a custom function that
    receives a Table.Row object and returns a new one (which is then what gets
    yielded when iterating over the Table). This way we can deal with the
    addition of new fields and other custom operations on the fly. How should we
    expose this API? This implementation may also solve the problem in
    question A.3.
  • A.5) The default row class is a collections.namedtuple. What is the
    best API to change it? Should the default be a different class? If we want
    an object with read-write access and also value access via attributes,
    AttrDict would be a good option.
    Should we add metadata to the row instance, such as its index in the Table?
    See sqlite3.Row and other Python DB-API implementations.
  • A.6) The current rows architecture is good for importing and exporting
    data but is not well suited for working with that data. One key limitation
    is that we cannot create a Table from a CSV, change some rows' values and
    save it to the same CSV without doing a batch operation. Should we implement
    read-write access? It could add a lot of complexity to the implementation
    (not only in the Table itself but also in the plugins), since we'd need to
    deal with problems like seeking through the rows and saving/flushing partial
    data (not the entire set), among others.
  • A.7) As many users will use rows to import and export data, it'd be
    handy to have a shortcut (and maybe some optimizations) for it. If the
    entire Table is lazy we may not need this shortcut, because we can iterate
    over one Table (lazily) while saving into another.
  • A.8) Should we implement __add__ (so that, for example,
    sum([table1, table2, ..., tableN]) returns another Table with all the
    rows, but only if all the tables' types are the same)? What metadata should
    remain?
  • A.9) Which other operations should be implemented? Join, intersect,
    ...?
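
To make questions A.1, A.2, A.4 and A.8 more concrete, here is a minimal sketch of how a lazy table could behave. LazyTable and its filter and map methods are hypothetical names invented for this discussion; they are not the current rows API. Rows are only built when the table is iterated, filter and map return new lazy tables, and __radd__ is included because the built-in sum starts from 0.

```python
from collections import namedtuple
from itertools import chain


class LazyTable:
    """Hypothetical lazy table for discussion; NOT the actual rows.Table."""

    def __init__(self, fields, source):
        self.fields = tuple(fields)
        self.Row = namedtuple("Row", self.fields)
        # `source` is a zero-argument callable returning an iterable of tuples,
        # so the data can be re-read every time the table is iterated.
        self._source = source

    def __iter__(self):
        # A.1: rows are created only while iterating.
        return (self.Row(*values) for values in self._source())

    def filter(self, predicate):
        # A.2: returns another lazy table. A database-backed plugin could
        # override this method to run the filter on the server instead.
        return LazyTable(
            self.fields, lambda: (tuple(row) for row in self if predicate(row))
        )

    def map(self, function, fields=None):
        # A.4: `function` receives a row and returns a new row (any tuple).
        new_fields = self.fields if fields is None else fields
        return LazyTable(new_fields, lambda: (tuple(function(row)) for row in self))

    def __add__(self, other):
        # A.8: only tables with identical fields can be concatenated.
        if not isinstance(other, LazyTable) or other.fields != self.fields:
            return NotImplemented
        return LazyTable(
            self.fields, lambda: chain(map(tuple, self), map(tuple, other))
        )

    def __radd__(self, other):
        # sum() starts from 0, so `0 + table` must return the table itself.
        if isinstance(other, int) and other == 0:
            return self
        return NotImplemented


data = [("Alice", 30), ("Bob", 17), ("Carol", 45)]
people = LazyTable(["name", "age"], lambda: iter(data))

adults = people.filter(lambda row: row.age >= 18)
tagged = adults.map(
    lambda row: row + (row.age >= 40,), fields=["name", "age", "senior"]
)
print([tuple(row) for row in tagged])
# [('Alice', 30, False), ('Carol', 45, True)]

combined = sum([people, people])
print(len(list(combined)))  # 6
```

Nothing is evaluated until iteration, so rows appended to data after adults is created still show up when adults is iterated. That is exactly the kind of implication A.1 asks about.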

(B) About rows.fields

  • B.1) Should field instances (values, actually) be native Python
    objects or custom objects (based on custom classes)? I'm inclined to use
    native Python objects (as it's implemented today).
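
As an illustration of the "native Python objects" option in B.1, here is a rough sketch in which field classes only convert values and never wrap them. The class names and the deserialize/serialize methods are assumptions made for this example, not a description of how rows.fields is implemented.

```python
import datetime


class IntegerField:
    """Hypothetical field: converts text to a plain int and back."""

    @classmethod
    def deserialize(cls, value):
        return int(value)

    @classmethod
    def serialize(cls, value):
        return str(value)


class DateField:
    """Hypothetical field: converts text to a plain datetime.date and back."""

    FORMAT = "%Y-%m-%d"

    @classmethod
    def deserialize(cls, value):
        return datetime.datetime.strptime(value, cls.FORMAT).date()

    @classmethod
    def serialize(cls, value):
        return value.strftime(cls.FORMAT)


age = IntegerField.deserialize("42")
day = DateField.deserialize("2015-01-31")

# The values are ordinary Python objects, so the usual operations just work.
next_day = day + datetime.timedelta(days=1)
print(type(age).__name__, type(day).__name__, DateField.serialize(next_day))
```

With this approach the field classes carry the type knowledge while rows hold plain int and datetime.date values. The trade-off is that a value on its own no longer knows which field produced it.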

(C) About Plugins

(D) About CLI

  • D.1) Should we implement --query (to query using SQL -- same as
    import-and-filter)?
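
One possible way to back a --query option is to load the imported rows into an in-memory SQLite database and run the user's SQL there. The sketch below only illustrates that idea: query_rows and the default table name table1 are hypothetical, and field names are interpolated into the SQL, so it assumes they come from a trusted source.

```python
import sqlite3


def query_rows(fields, rows, sql, table_name="table1"):
    """Load `rows` into an in-memory SQLite table and run `sql` against it."""
    connection = sqlite3.connect(":memory:")
    try:
        columns = ", ".join('"{}"'.format(name) for name in fields)
        placeholders = ", ".join("?" for _ in fields)
        connection.execute('CREATE TABLE "{}" ({})'.format(table_name, columns))
        connection.executemany(
            'INSERT INTO "{}" VALUES ({})'.format(table_name, placeholders), rows
        )
        cursor = connection.execute(sql)
        header = [column[0] for column in cursor.description]
        return header, cursor.fetchall()
    finally:
        connection.close()


header, result = query_rows(
    ["name", "age"],
    [("Alice", 30), ("Bob", 17), ("Carol", 45)],
    "SELECT name FROM table1 WHERE age >= 18 ORDER BY name",
)
print(header, result)
# ['name'] [('Alice',), ('Carol',)]
```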

(E) Other

  • E.1) How to deal with Table collections? Examples: an XLS file can have more than one sheet (each one is a rows.Table itself), and an HTML file could contain more than one <table>. See how tablib deals with it.
  • E.2) See sqlite3's detect_types.
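
For reference on E.2, this is roughly how detect_types works in Python's sqlite3 module: with PARSE_DECLTYPES, the declared column type selects a registered converter, so values come back as native Python objects. The events table and its contents are made up for this example, and the adapter and converter are registered explicitly rather than relying on the module defaults.

```python
import datetime
import sqlite3

# Adapter: Python object -> value stored in SQLite.
sqlite3.register_adapter(datetime.date, lambda value: value.isoformat())
# Converter: stored bytes -> Python object, selected by the declared type "date".
sqlite3.register_converter(
    "date", lambda raw: datetime.date.fromisoformat(raw.decode())
)

connection = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
connection.execute("CREATE TABLE events (name TEXT, day date)")
connection.execute(
    "INSERT INTO events VALUES (?, ?)", ("launch", datetime.date(2015, 1, 31))
)

row = connection.execute("SELECT name, day FROM events").fetchone()
print(row)  # ('launch', datetime.date(2015, 1, 31))
connection.close()
```

A similar registry of converters keyed by field type could be one way for rows plugins to decide how to turn raw values into Python objects.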
