tbanel/orgtbljoin

Join several Org Mode Tables

One table (the master table) is grown by selectively appending columns of other tables (the reference tables).

New

The wizard (see Wizard) has been extended. It can now modify an existing block as well as create a new one. It is more fine-grained and offers more extensive help for each query.

It is still invoked with C-c C-x x join or M-x orgtbl-join.

Example

Here is a list of products for a cooking recipe.

| type     | quty |
|----------+------|
| onion    |   70 |
| tomato   |  120 |
| eggplant |  300 |
| tofu     |  100 |

We want to complete it with nutritional facts: quantities of fiber, sugar, proteins, and carbohydrates. For this purpose, we have a long reference table of standard products. (This table has been freely borrowed from Nut-Nutrition, http://nut.sourceforge.net/, by Jim Jozwiak).

#+tblname: nut
| type     | Fiber | Sugar | Protein | Carb |
|----------+-------+-------+---------+------|
| eggplant |   2.5 |   3.2 |     0.8 |  8.6 |
| tomato   |   0.6 |   2.1 |     0.8 |  3.4 |
| onion    |   1.3 |   4.4 |     1.3 |  9.0 |
| egg      |     0 |  18.3 |    31.9 | 18.3 |
| rice     |   0.2 |     0 |     1.5 | 16.0 |
| bread    |   0.7 |   0.7 |     3.3 | 16.0 |
| orange   |   3.1 |  11.9 |     1.3 | 17.6 |
| banana   |   2.1 |   9.9 |     0.9 | 18.5 |
| tofu     |   0.7 |   0.5 |     6.6 |  1.4 |
| nut      |   2.6 |   1.3 |     4.9 |  7.2 |
| corn     |   4.7 |   1.8 |     2.8 | 21.3 |

Let us put the cursor on the type column of the recipe table, and type M-x orgtbl-join.

A few questions are asked. Then the recipe gets new columns appended with the needed nutrition facts:

| type     | quty | Fiber | Sugar | Protein | Carb |
|----------+------+-------+-------+---------+------|
| onion    |   70 |   1.3 |   4.4 |     1.3 |  9.0 |
| tomato   |  120 |   0.6 |   2.1 |     0.8 |  3.4 |
| eggplant |  300 |   2.5 |   3.2 |     0.8 |  8.6 |
| tofu     |  100 |   0.7 |   0.5 |     6.6 |  1.4 |

SQL equivalent

If you are familiar with SQL, you would get a similar result with a join (a left outer join by default; the :full parameter selects other kinds of join).

-- inner join: only rows present in both tables
select *
from recipe, nut
where recipe.type = nut.type;

-- left outer join: every recipe row is kept (the default behavior)
select *
from recipe
left outer join nut on recipe.type = nut.type;
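
For readers who prefer code to SQL, here is a minimal Python sketch of the same left outer join. It is only a model of the semantics described in this README, not the package's Emacs Lisp implementation; the function name left_join and the list-of-rows representation are assumptions made for the illustration, and the reference table is shortened.

```python
# Illustrative model only: a table is a list of rows, the first row is the header.
recipe = [
    ["type", "quty"],
    ["onion", 70],
    ["tomato", 120],
    ["eggplant", 300],
    ["tofu", 100],
]
nut = [
    ["type", "Fiber", "Sugar", "Protein", "Carb"],
    ["eggplant", 2.5, 3.2, 0.8, 8.6],
    ["tomato", 0.6, 2.1, 0.8, 3.4],
    ["onion", 1.3, 4.4, 1.3, 9.0],
    ["tofu", 0.7, 0.5, 6.6, 1.4],
    ["corn", 4.7, 1.8, 2.8, 21.3],
]


def left_join(master, ref, mas_col, ref_col):
    """Append the matching reference cells to every master row."""
    m = master[0].index(mas_col)
    r = ref[0].index(ref_col)

    def without_key(row):
        # The joining column is output only once, so drop it from reference rows.
        return row[:r] + row[r + 1:]

    joined = [master[0] + without_key(ref[0])]
    for mrow in master[1:]:
        matches = [rrow for rrow in ref[1:] if rrow[r] == mrow[m]]
        if not matches:
            # Unmatched master rows are kept and padded with empty cells.
            matches = [[""] * len(ref[0])]
        for rrow in matches:
            joined.append(mrow + without_key(rrow))
    return joined


joined = left_join(recipe, nut, "type", "type")
for row in joined:
    print(row)
```

The master table drives the row order, and the unused corn row of the reference table does not appear in the result.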

In-place, Push, Pull

Three modes are available: in-place, push, pull.

In in-place mode

The master table is changed (in-place) by appending columns from reference tables.

Invoke it with the M-x orgtbl-join command. The cursor must be positioned on the column used to perform the join.

In push mode

The master table drives the creation of derived tables. Specify the wanted result in #+ORGTBL: SEND directives (as many as desired):

#+ORGTBL: SEND enriched orgtbl-to-joined-table :ref-table nut :mas-column type :ref-column type
| type     | quty |
|----------+------|
| onion    |   70 |
| tomato   |  120 |
| eggplant |  300 |
| tofu     |  100 |

The receiving blocks must be created somewhere else in the same file:

#+BEGIN RECEIVE ORGTBL enriched
#+END RECEIVE ORGTBL enriched

Typing C-c C-c with the cursor on the first pipe of the master table refreshes all derived tables.

In pull mode

So-called “dynamic blocks” may also be used. The resulting table knows how to build itself. Example:

A master table is unaware that it will be enriched in a joined table:

#+TBLNAME: recipe
| type     | quty |
|----------+------|
| onion    |   70 |
| tomato   |  120 |
| eggplant |  300 |
| tofu     |  100 |

Create somewhere else a dynamic block which carries the specification of the join:

#+BEGIN: join :mas-table recipe :mas-column type :ref-table nut :ref-column type
| type     | quty | Fiber | Sugar | Protein | Carb |
|----------+------+-------+-------+---------+------|
| onion    |   70 |   1.3 |   4.4 |     1.3 |  9.0 |
| tomato   |  120 |   0.6 |   2.1 |     0.8 |  3.4 |
| eggplant |  300 |   2.5 |   3.2 |     0.8 |  8.6 |
| tofu     |  100 |   0.7 |   0.5 |     6.6 |  1.4 |
#+END:

Typing C-c C-c with the cursor on the #+BEGIN: line refreshes the table.

As a rule of thumb

For quick and once-only processing, use in-place mode.

Use pull or push modes for reproducible work. Pull mode might be easier to use than push mode, because there is a wizard bound to C-c C-x x (see below). Other than that, the two modes use the same underlying engine, so choosing one or the other is just a matter of convenience.

Duplicates

The reference tables may contain several matching rows for the same value in the master table. In this case, one row is created in the joined table for each match. Therefore, the resulting table may be longer than the master table. For example, if a reference table contains three rows for “eggplant”:

#+tblname: nut
| type     | Cooking | Fiber | Sugar | Protein | Carb |
|----------+---------+-------+-------+---------+------|
| ...      | ...     |   ... |   ... |     ... |  ... |
| eggplant | boiled  |   2.5 |   3.2 |     0.8 |  8.6 |
| eggplant | pickled |   3.4 |   6.5 |     1.2 | 13.3 |
| eggplant | raw     |   2.8 |   1.9 |     0.8 |  4.7 |
| ...      | ...     |   ... |   ... |     ... |  ... |

Then the resulting table will have those three rows appended:

| type     | quty | Cooking | Fiber | Sugar | Protein | Carb |
|----------+------+---------+-------+-------+---------+------|
| ...      |  ... | ...     |   ... |   ... |     ... |  ... |
| eggplant |  300 | boiled  |   2.5 |   3.2 |     0.8 |  8.6 |
| eggplant |  300 | pickled |   3.4 |   6.5 |     1.2 | 13.3 |
| eggplant |  300 | raw     |   2.8 |   1.9 |     0.8 |  4.7 |

If you are familiar with SQL, this behavior is reminiscent of the left outer join.

Duplicate entries may happen both in the master and the reference tables. The joined table will have all combinations. So for instance if there are 2 eggplant rows in the master table, and 3 eggplant rows in the reference table, then the joined table will get 6 eggplant rows.
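
The "all combinations" rule can be pictured with a short Python sketch. This is a model of the behavior described above, not the package's code; the second master quantity (150) is made up for the illustration.

```python
from itertools import product

# Two eggplant rows in the master table (the 150 quantity is hypothetical).
master_rows = [("eggplant", 300), ("eggplant", 150)]
# Three eggplant rows in the reference table.
ref_rows = [
    ("eggplant", "boiled"),
    ("eggplant", "pickled"),
    ("eggplant", "raw"),
]

# Every master row is paired with every reference row having the same key.
combos = [
    (m_type, quty, cooking)
    for (m_type, quty), (r_type, cooking) in product(master_rows, ref_rows)
    if m_type == r_type
]
print(len(combos))  # 6
```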

Selecting the output columns

By default, all columns from the master table and all the reference tables are output (except the joining column, which is output only once).

This can be customized with the :cols parameter. Give it the list of desired columns, in the order they should be output.

Columns may be specified by their name (if they have one) or by a dollar form. Thus, $3 means the third column (numbering begins with 1).

By default, the first example gives all columns (except type which appears only once):

#+BEGIN: join :mas-table recipe :mas-column type :ref-table nut :ref-column type
| type     | quty | Fiber | Sugar | Protein | Carb |
|----------+------+-------+-------+---------+------|
| onion    |   70 |   1.3 |   4.4 |     1.3 |  9.0 |
| tomato   |  120 |   0.6 |   2.1 |     0.8 |  3.4 |
| eggplant |  300 |   2.5 |   3.2 |     0.8 |  8.6 |
| tofu     |  100 |   0.7 |   0.5 |     6.6 |  1.4 |
#+END:

If we want only quty and Protein, we specify it like this:

#+BEGIN: join :cols (quty Protein) :mas-table recipe :mas-column type :ref-table nut :ref-column type
| quty | Protein |
|------+---------|
|   70 |     1.3 |
|  120 |     0.8 |
|  300 |     0.8 |
|  100 |     6.6 |
#+END:

Or like this:

#+BEGIN: join :cols "quty Protein" :mas-table recipe :mas-column type :ref-table nut :ref-column type
| quty | Protein |
|------+---------|
|   70 |     1.3 |
|  120 |     0.8 |
|  300 |     0.8 |
|  100 |     6.6 |
#+END:
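
The two ways of naming a column (header name or dollar form) and the two ways of writing the list (Lisp list or string) can be modeled as follows. This is a hedged Python sketch of the selection logic only; select_columns is a hypothetical name, not a function of the package.

```python
def select_columns(table, cols):
    """Keep only the requested columns, in the requested order.

    cols is a list of names, or one space-separated string.
    A spec like "$3" selects the third column (numbering begins with 1).
    """
    header = table[0]
    if isinstance(cols, str):
        cols = cols.split()

    def index_of(spec):
        if spec.startswith("$"):
            return int(spec[1:]) - 1
        return header.index(spec)

    indexes = [index_of(spec) for spec in cols]
    return [[row[i] for i in indexes] for row in table]


joined = [
    ["type", "quty", "Fiber", "Sugar", "Protein", "Carb"],
    ["onion", 70, 1.3, 4.4, 1.3, 9.0],
    ["tomato", 120, 0.6, 2.1, 0.8, 3.4],
]
by_name = select_columns(joined, ["quty", "Protein"])
by_string = select_columns(joined, "quty Protein")
by_dollar = select_columns(joined, "$2 $5")
```

All three calls select the same two columns, since quty is the second column and Protein the fifth.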

How to handle missing rows?

It may happen that no row in the reference table matches a value in the master table. By default, in this case, the master row is kept, with empty cells added to it. Information from the master table is not lost. If, for example, a line in the recipe refers to an unknown “amaranth” product (a cereal known by the ancient Incas), then the resulting table will still contain the amaranth row, with empty nutritional facts.

| type     | quty | Fiber | Sugar | Protein | Carb |
|----------+------+-------+-------+---------+------|
| onion    |   70 |   1.3 |   4.4 |     1.3 |  9.0 |
| tomato   |  120 |   0.6 |   2.1 |     0.8 |  3.4 |
| eggplant |  300 |   2.5 |   3.2 |     0.8 |  8.6 |
| tofu     |  100 |   0.7 |   0.5 |     6.6 |  1.4 |
| amaranth |  120 |       |       |         |      |

This behavior is controlled by the :full parameter:

  • :full mas the joined result contains the full master table (the default)
  • :full ref the joined result contains the full reference tables
  • :full mas+ref the joined result contains all rows from both master and all reference tables
  • :full none or :full nil the joined result contains only rows that appear in both tables

The use cases may be as follows:

  • :full mas is useful when the reference table is large, as a dictionary or a nutritional facts table. We just pick the needed rows from the reference.
  • :full mas+ref is useful when both tables are similar. For instance, one table has been grown by a team, and the other independently by another team. The joined table will contain additional rows from both teams.
  • :full none is useful to create the intersection of tables. For instance we have a list of items in the main warehouse, and another list of damaged items. We are interested only in damaged items in the main warehouse.
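
The four values of :full can be summarized by which joining keys survive in the result. The Python sketch below is a model under assumptions: rows_kept is a hypothetical helper, and the order in which reference-only rows are listed (after the master rows) is assumed for the illustration.

```python
def rows_kept(master_keys, ref_keys, full="mas"):
    """Return the joining keys that appear in the joined result."""
    matched = [k for k in master_keys if k in ref_keys]
    only_ref = [k for k in ref_keys if k not in master_keys]
    if full == "mas":
        return list(master_keys)             # full master table
    if full == "ref":
        return matched + only_ref            # full reference table
    if full == "mas+ref":
        return list(master_keys) + only_ref  # rows from both sides
    if full in ("none", "nil"):
        return matched                       # intersection only
    raise ValueError(f"unknown :full value: {full}")


master = ["onion", "tomato", "eggplant", "tofu", "amaranth"]
ref = ["eggplant", "tomato", "onion", "tofu", "corn"]

for mode in ("mas", "ref", "mas+ref", "none"):
    print(mode, rows_kept(master, ref, mode))
```

With these inputs, amaranth survives only when the master table is kept in full, and corn only when the reference table is.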

Malformed input tables

Sometimes an input table may be unaligned or malformed, with incomplete rows, like these:

| type     | Fiber | Sugar |      | Carb |
|----------+-------+-------+------+------|
| eggplant |   2.5 |   3.2 |  0.8 |  8.6 |
| tomato   |   0.6 |   2.1 |  0.8 |  3.4 |
| onion    |   1.3 |   4.4 |  1.3 |  9.0 |
    | egg      |     0 |  18.3 | 31.9 | 18.3 |
| rice     |   0.2 |     0 |  1.5 | 16.0 |
| tofu     |  0.7
| nut      |   2.6 |   1.3 |  4.9 |  7.2 |

| type     | quty |
|----------+------|
| onion    |   70 |
| tomato   |
| eggplant |  300 |
  | tofu     |  100 |

Missing cells are handled as though they were empty.
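
As a rough picture of what "handled as though they were empty" means, the Python sketch below parses a few unaligned lines and pads short rows with empty cells. It is an illustration only; parse_row and pad_rows are hypothetical helpers, not part of the package.

```python
def parse_row(line):
    """Split one table line into trimmed cells, ignoring indentation."""
    cells = line.strip().strip("|").split("|")
    return [cell.strip() for cell in cells]


def pad_rows(rows):
    """Give every row the width of the widest one, adding empty cells."""
    width = max(len(row) for row in rows)
    return [row + [""] * (width - len(row)) for row in rows]


lines = [
    "| type     | quty |",
    "| onion    |   70 |",
    "| tomato   |",
    "| eggplant |  300 |",
    "  | tofu     |  100 |",
]
rows = pad_rows([parse_row(line) for line in lines])
for row in rows:
    print(row)
```

The tomato row ends up with an empty quty cell, and the indented tofu row is read like any other.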

Headers

The master and the reference tables may or may not have a header. When there is a header, it may extend over several lines. A header ends with a horizontal line.

OrgtblJoin tries to preserve as much of the master table as possible. Therefore, if the master table has a header, the joined table will have it verbatim, over as many lines as needed.

The headers of the reference tables (if any) fill in the header (if any) of the resulting table. But if there is no room in the resulting table header, the header lines of the reference tables are ignored, partly or fully.

Headers are useful to refer to columns. If there is no header, then columns must be referred to with $ names: $1 is the name of the first column, $2 is the name of the second column, and so on. This is pretty much the same as in the Org Mode spreadsheet.

Wizard

The wizard may be invoked in-place or for the pull-mode.

Invoke the wizard in-place by typing M-x orgtbl-join with the cursor inside the master table to be enriched. The cursor should be anywhere in the column serving the join process.

The menu entry Tbl > Column > Join with another table is equivalent to M-x orgtbl-join.

For the pull-mode, the same wizard may create a fresh new block #+BEGIN: join..., or amend an existing one. Invoke it with

  • either M-x orgtbl-join-insert-dblock-join
  • or C-c C-x x join.

Put the cursor on an empty space in your Org Mode file, or on an existing #+BEGIN: join... block.

For all questions, completion is available.

Note: there are many kinds of dynamic blocks that can be inserted besides join.

As there might be as many reference tables as wanted, the wizard continues asking for reference tables. When done, answer n when the wizard asks if you want an additional reference table to be joined.

The wizard does not (yet) take into account the :cols and :post parameters. If such parameters were already specified, the wizard leaves them untouched.

Joining 2 similar tables

What if, instead of appending data from other tables to a main table, we need to join 2 similar or symmetric tables holding different data?

Let’s assume we have these 2 tables:

#+TBLNAME: TagsQ1
| tag  | Q1 |
|------+----|
| tagA | 25 |
| tagB | 18 |
| tagC | 13 |
| tagD |  6 |
| tagE |  2 |
| tagF |  2 |
| tagG |  1 |

and

#+TBLNAME: TagsQ2
| tag    | Q2 |
|--------+----|
| tagA   |  2 |
| tagD   |  3 |
| tagE   |  3 |
| tagF   |  5 |
| tagG   |  7 |
| tagH   | 11 |
| tagI   | 15 |

Looking closely at both tables, we can observe that some of these tags appear in both (tags A, D, E, F, G), some only in Q1 (B, C) and others only in Q2 (H, I, …).

We want to create a table that includes all the tags, with a column with their frequency for table TagsQ1 and another for TagsQ2.

So we can create the orgtbl-join block with the Wizard. Type C-c C-x x, then answer join.

As our tables are somewhat symmetric (neither is the primary one), you can arbitrarily choose TagsQ1 as the “master table” and TagsQ2 as the “reference table”.

So continue answering the wizard:

  1. Master table: TagsQ1
  2. Reference table: TagsQ2
  3. joining column in reference table: tag
  4. joining column in master table: tag

Then there is a question about which table should appear entirely. In the result you want, there might be missing values in both Q1 and Q2 columns. Therefore the right answer is: mas+ref

Eventually you get:

#+BEGIN: join :mas-table "TagsQ1" :ref-table "TagsQ2" :mas-column "tag" :ref-column "tag" :full "mas+ref"
| tag    | Q1 | Q2 |
|--------+----+----|
| tagA   | 25 |  2 |
| tagB   | 18 |
| tagC   | 13 |
| tagD   |  6 |  3 |
| tagE   |  2 |  3 |
| tagF   |  2 |  5 |
| tagG   |  1 |  7 |
|--------+----+----|
| tagH   |    | 11 |
| tagI   |    | 15 |
| tag... |    | 19 |
#+END:

The tagB and tagC rows are incomplete on purpose. To fill in the table, just type TAB inside it.

Post-joining spreadsheet formulas

Additional columns can be specified for the resulting table. Taking the recipe example again, we add a 7th column that multiplies columns 2 and 3. This is done with a line beginning with #+TBLFM: below the table, as usual in the Org spreadsheet. This line survives re-computations.

Moreover, we add a spreadsheet formula with the :formula parameter. It fills in the header of the 7th column. It is translated into a usual #+TBLFM: spreadsheet line.

#+BEGIN: join :mas-table recipe :mas-column type :ref-table nut :ref-column type :formula "@1$7=totfiber"
#+name: richer
| type     | quty | Fiber | Sugar | Protein | Carb | totfiber |
|----------+------+-------+-------+---------+------+----------|
| onion    |   70 |   1.3 |   4.4 |     1.3 |  9.0 |      91. |
| tomato   |  120 |   0.6 |   2.1 |     0.8 |  3.4 |      72. |
| eggplant |  300 |   2.5 |   3.2 |     0.8 |  8.6 |     750. |
| tofu     |  100 |   0.7 |   0.5 |     6.6 |  1.4 |      70. |
#+TBLFM: $7=$2*$3::@1$7=totfiber
#+END:
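
To make the arithmetic of the two formulas explicit, here is a small Python sketch that recomputes the 7th column from the joined rows. It only mirrors what $7=$2*$3 and @1$7=totfiber express; the actual evaluation is done by the Org spreadsheet, not by code like this.

```python
header = ["type", "quty", "Fiber", "Sugar", "Protein", "Carb"]
rows = [
    ["onion", 70, 1.3, 4.4, 1.3, 9.0],
    ["tomato", 120, 0.6, 2.1, 0.8, 3.4],
    ["eggplant", 300, 2.5, 3.2, 0.8, 8.6],
    ["tofu", 100, 0.7, 0.5, 6.6, 1.4],
]

# @1$7=totfiber: the first line of column 7 receives the header text.
header = header + ["totfiber"]
# $7=$2*$3: column 7 is column 2 (quty) times column 3 (Fiber).
rows = [row + [row[1] * row[2]] for row in rows]

for row in rows:
    print(row[0], round(row[6], 1))
```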

Post processing

The joined table can be post-processed with the :post parameter. It accepts a Lisp lambda, a Lisp function, a Lisp expression, or a Babel block.

The processing receives the joined table as parameter in the form of a Lisp expression. It can process it in any way it wants, provided it returns a valid Lisp table.

A Lisp table is a list of rows. Each row is either a list of cells, or the special symbol hline.

In this example, a lambda expression adds a hline and a row for ginger.

#+BEGIN: join ... :post (lambda (table) (append table '(hline (ginger na na na na))))
| product   |   quty | Carb | Fiber | Sugar | Protein |
|-----------+--------+------+-------+-------+---------|
| onion     |     70 |  9.0 |   1.3 |   4.4 |     1.3 |
| unknown   |    999 |
| tomatoe   |    120 |  3.4 |   0.6 |   2.1 |     0.8 |
|-----------+--------+------+-------+-------+---------|
| ginger    |     33 |   na |    na |    na |      na |
#+END:

The lambda can be moved to a defun. The function is then passed to the :post parameter:

#+begin_src elisp
(defun my-function (table)
  (append table
          '(hline (ginger na na na na))))
#+end_src
... :post my-function

The :post parameter can also refer to a Babel block. Example:

#+BEGIN: join ... :post "my-babel-block(tbl=*this*)"
...
#+END:
#+name: my-babel-block
#+begin_src elisp :var tbl=""
(append tbl
        '(hline (ginger na na na na)))
#+end_src

The block is passed the table to process in a Lisp variable called *this*.

Virtual input table from Babel

Any of the input tables may be the result of executing a Babel script. In this case, the table is virtual in the sense that it appears nowhere.

(Babel is the Org Mode infrastructure to run scripts in any language, like Python, R, C++, Java, D, shell, whatever, with inputs and outputs connected to Org Mode).

Example:

Here is a script in Emacs Lisp which creates an Org Mode table.

#+name: ascript
#+begin_src elisp
(list
 '(type quty)
 'hline
 (list "tomato" (* 53.1 12))
 (list "tofu" (* 12.5 7)))
#+end_src

If executed, the script would output this table:

#+RESULTS: ascript
| type   |  quty |
|--------+-------|
| tomato | 637.2 |
| tofu   |  87.5 |

But instead, OrgtblJoin will execute the script and consume its output:

#+BEGIN: join :mas-table "ascript" :ref-table "nut" :mas-column "type" :ref-column "type" :full "mas"
| type   |  quty | Fiber | Sugar | Protein | Carb |
|--------+-------+-------+-------+---------+------|
| tomato | 637.2 |   0.6 |   2.1 |     0.8 |  3.4 |
| tofu   |  87.5 |   0.7 |   0.5 |     6.6 |  1.4 |
#+END:

Here the parameter :mas-table specifies the name of the script to be executed.

Wide variety of input tables

As in any other Org Mode source block, the input table may come from several places. OrgAggregate adds even more kinds of input.

The parameter after :mas-table or :ref-table may be:

  • mytable: an ordinary Org Mode table in the same buffer, named mytable.
  • /some/dir/file.org:mytable: an ordinary Org Mode table named mytable, in a distant Org file named /some/dir/file.org.
  • mybabel: an Org Mode Babel block named mybabel in the current buffer, generating a table as its output, written in any language.
  • mybabel(param1=123,param2=456): passing parameters to an Org Mode Babel block named mybabel in the current buffer, generating a table as its output, written in any language.
  • /some/dir/file.org:mybabel(param1=123,param2=456): an Org Mode Babel block named mybabel in a distant org file named /some/dir/file.org, called with parameters.
  • /some/dir/file.csv:(csv params…): a comma-separated-values file in the CSV format, in the file /some/dir/file.csv. The separators may be TAB, comma, or semicolon; they are guessed, and different separators may be mixed. Any empty row in the CSV file is interpreted as a horizontal separator (hline in Org table parlance).

    Parameters may be:

    • header: the first row in the CSV file is interpreted as a header containing the column names.
    • colnames (column1 column2 column3 …): the column names are given explicitly, in case the CSV file contains only data, no header.

    In any case, the columns may be referenced as $1, $2, $3, … as usual.

  • /some/dir/file.json:(json params…): a file containing a JSON formatted table, in the file /some/dir/file.json. Currently, the only accepted format is an array of arrays. There are currently no parameters. In the future it may be possible to specify alternative sub-formats.
  • 34cbc63a-c664-471e-a620-d654b26ffa31: an identifier of an Org Mode sub-tree. The sub-tree is supposed to contain a table, which is retrieved. Those Org Mode identifiers span all known Org Mode files. To add such an identifier, put the cursor on the heading of the sub-tree, and type M-x org-id-get-create.
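
The CSV rules listed above (mixed separators, empty rows becoming hlines) can be pictured with the Python sketch below. It is a simplified model written for this README: read_csv_like is a hypothetical name, quoting and the header and colnames parameters are ignored, and the package's real parser may differ.

```python
import re

HLINE = "hline"  # stands for a horizontal separator line


def read_csv_like(text):
    """Split each line on TAB, comma, or semicolon; empty lines become hlines."""
    table = []
    for line in text.splitlines():
        if not line.strip():
            table.append(HLINE)
        else:
            table.append([cell.strip() for cell in re.split(r"[\t,;]", line)])
    return table


# Different separators are mixed on purpose.
sample = "type;quty\nonion,70\n\ntomato\t120"
table = read_csv_like(sample)
for row in table:
    print(row)
```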

Org Mode also provides table slicing. All of the previous references may be followed by an optional slicing. Examples:

  • mytable[0:5]: retain only the first 6 rows of the input table; if the table has a header, then it counts as 2 rows (the header and the separation line); in this example, it would retain rows 0 and 1 for the header, and rows 2,3,4,5 for the content.
  • mytable[,0:1]: retain only the first 2 columns.
  • mytable[0:5,0:1]: retain only the first 6 rows and the first 2 columns.
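
Note that both ends of a slice are included, which differs from slices in many programming languages. The Python sketch below models the three examples above; org_slice is a hypothetical helper written for this illustration, with the separation line represented by the string "hline".

```python
HLINE = "hline"


def org_slice(table, rows=None, cols=None):
    """rows and cols are (first, last) pairs, both ends included."""
    if rows is not None:
        table = table[rows[0]:rows[1] + 1]
    if cols is not None:
        table = [
            row if row == HLINE else row[cols[0]:cols[1] + 1]
            for row in table
        ]
    return table


table = [
    ["type", "quty", "Fiber"],
    HLINE,
    ["onion", 70, 1.3],
    ["tomato", 120, 0.6],
    ["eggplant", 300, 2.5],
    ["tofu", 100, 0.7],
    ["amaranth", 120, ""],
]

first_six = org_slice(table, rows=(0, 5))            # like mytable[0:5]
two_cols = org_slice(table, cols=(0, 1))             # like mytable[,0:1]
both = org_slice(table, rows=(0, 5), cols=(0, 1))    # like mytable[0:5,0:1]
```

Here first_six holds the header, the separation line, and four content rows, as described for mytable[0:5].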

Chaining

In an example above, we gave a name to the resulting joined table: #+name: richer. Doing so, the joined table may become an input for a further computation, for example in a Babel block.

The name will survive re-computations. This happens only in pull mode.

Note that the #+name: richer line could appear above the #+BEGIN: line. But sometimes this is not taken into account by further Babel blocks.

Multiple reference tables

OrgtblJoin used to handle just one reference table. Now, as many as wanted are handled.

To specify the reference tables, just use several times the :ref-table and :ref-column parameters. They must match: for instance, the third :ref-table must match the third :ref-column.
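
The positional pairing of reference tables and reference columns can be modeled in Python as a list of pairs. This is a sketch under assumptions: join_many is a hypothetical name, rows are dictionaries for brevity, each reference key is assumed unique and each reference table non-empty, and the price table is invented for the illustration.

```python
def join_many(master, mas_col, refs):
    """refs is a list of (ref_table, ref_col) pairs, matched by position."""
    result = [dict(row) for row in master]
    for ref_table, ref_col in refs:
        extra = [name for name in ref_table[0] if name != ref_col]
        lookup = {row[ref_col]: row for row in ref_table}
        for row in result:
            match = lookup.get(row[mas_col])
            for name in extra:
                # Unmatched master rows get empty cells.
                row[name] = match[name] if match else ""
    return result


recipe = [
    {"type": "onion", "quty": 70},
    {"type": "tofu", "quty": 100},
    {"type": "amaranth", "quty": 120},
]
nut = [{"type": "onion", "Fiber": 1.3}, {"type": "tofu", "Fiber": 0.7}]
price = [{"product": "tofu", "price": 2.5}]  # hypothetical second reference table

joined = join_many(recipe, "type", [(nut, "type"), (price, "product")])
unchanged = join_many(recipe, "type", [])  # zero reference tables
```

With an empty list of pairs, the result is simply a copy of the master table.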

For now, the :full and :mas-column parameters should be mentioned just once. This could change in the future with as many such parameters as reference tables.

One side effect of supporting multiple reference tables is that zero reference tables is now accepted. In this case, the result of the join is just the master table. But it can be changed in several ways:

  • Selection and re-ordering of columns through the :cols parameter.
  • Additional computed columns through the :formula parameter and survival of #+TBLFM: lines.
  • Lisp and Babel post-processing through the :post parameter.

Installation

Emacs package on Melpa: add the following lines to your .emacs file, and reload it.

(add-to-list 'package-archives '("melpa" . "http://melpa.org/packages/") t)
(package-initialize)

You may also customize this variable:

M-x customize-variable package-archives

Then browse the list of available packages and install orgtbl-join

M-x package-list-packages

Alternatively, you can download the Lisp files, and load them:

(load-file "orgtbl-join.el")

You may want to add an entry in the Table menu, Column sub-menu. You may also want to call orgtbl-join with C-c j. One way to do so is to use use-package in your .emacs init file:

(use-package orgtbl-join
  :after (org)
  :bind ("C-c j" . orgtbl-join)
  :init
  (easy-menu-add-item
   org-tbl-menu '("Column")
   ["Join with another table" orgtbl-join (org-at-table-p)]))

Note: there used to be an orgtbl-join-setup-keybindings function to do just what the above use-package does. With this approach, key and menu bindings are no longer hard-coded in the package.

Author, contributors

Comments, enhancements, etc. welcome.

Author

  • Thierry Banel, tbanelwebmin at free dot fr
  • bymoz089 (GitHub) found and tracked-down a bug in the in-place joining
  • Eduardo Mercovich (GitHub edumerco) wrote the documentation for the 2 similar tables use case.

Contributors

  • Dirk Schmitt, surviving #+NAME: line
  • wuqui, :cols parameter
  • Misohena (https://misohena.jp/blog/author/misohena), double width Japanese characters (string-width vs. length)
  • Shankar Rao, :post post-processing
  • Piotr Panasiuk, #+CAPTION: and any tags survive
  • Luis Miguel Hernanz, multiple reference tables suggestion, fix regex bug

Changes

  • remove duplicate reference column
  • fix keybindings
  • #+NAME: inside #+BEGIN: survives
  • missing input cells handled as empty ones
  • back-port Org Mode 9.4 speed up
  • increase performance when inserting result into the buffer
  • aligned output in push mode
  • 2 as column name no longer supported, write $2
  • add :full parameter
  • remove C-c C-x i, use standard C-c C-x x instead
  • added the :cols parameter
  • :post post-processing
  • 3x speedup org-table-to-lisp and avoid Emacs 27 to 30 incompatibilities
  • #+CAPTION: and any other tag survive inside #+BEGIN:
  • now there can be several reference tables in a join, instead of just one.
  • Documentation is now integrated right into Emacs in the info format. Type M-: (info "orgtbl-join")
  • TOC in README.org (thanks org-make-toc)
  • Virtual input table produced by Babel blocks
  • Speedup of resulting table recalculation when there are formulas in #+tblfm: or in :formula. The overall join may be up to 4 times faster and use as little as a quarter of the memory.
  • Add the chapter “Joining 2 similar tables” for a common use case.

GPL 3 License

Copyright (C) 2014-2025 Thierry Banel

orgtbl-join is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

orgtbl-join is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.
