One table (the master table) is grown by selectively appending columns of other tables (the reference tables).
The wizard (see Wizard) has been extended. It can now modify an existing block as well as create a new one. It is more fine-grained and offers more extensive help for each query.
Still call it with C-c C-x x join or orgtbl-join.
- Example
- SQL equivalent
- In-place, Push, Pull
- Duplicates
- Selecting the output columns
- How to handle missing rows?
- Malformed input tables
- Headers
- Wizard
- Joining 2 similar tables
- Post-joining spreadsheet formulas
- Post processing
- Virtual input table from Babel
- Wide variety of input tables
- Chaining
- Multiple reference tables
- Installation
- Author, contributors
- Changes
- GPL 3 License
Here is a list of products for a cooking recipe.
| type     | quty |
|----------+------|
| onion    |   70 |
| tomato   |  120 |
| eggplant |  300 |
| tofu     |  100 |
We want to complete it with nutritional facts: quantities of fiber, sugar, proteins, and carbohydrates. For this purpose, we have a long reference table of standard products. (This table has been freely borrowed from Nut-Nutrition, http://nut.sourceforge.net/, by Jim Jozwiak).
#+tblname: nut
| type     | Fiber | Sugar | Protein | Carb |
|----------+-------+-------+---------+------|
| eggplant |   2.5 |   3.2 |     0.8 |  8.6 |
| tomato   |   0.6 |   2.1 |     0.8 |  3.4 |
| onion    |   1.3 |   4.4 |     1.3 |  9.0 |
| egg      |     0 |  18.3 |    31.9 | 18.3 |
| rice     |   0.2 |     0 |     1.5 | 16.0 |
| bread    |   0.7 |   0.7 |     3.3 | 16.0 |
| orange   |   3.1 |  11.9 |     1.3 | 17.6 |
| banana   |   2.1 |   9.9 |     0.9 | 18.5 |
| tofu     |   0.7 |   0.5 |     6.6 |  1.4 |
| nut      |   2.6 |   1.3 |     4.9 |  7.2 |
| corn     |   4.7 |   1.8 |     2.8 | 21.3 |
Let us put the cursor on the type column of the recipe table, and
type M-x orgtbl-join.
A few questions are asked. Then the recipe gets new columns appended with the needed nutrition facts:
| type     | quty | Fiber | Sugar | Protein | Carb |
|----------+------+-------+-------+---------+------|
| onion    |   70 |   1.3 |   4.4 |     1.3 |  9.0 |
| tomato   |  120 |   0.6 |   2.1 |     0.8 |  3.4 |
| eggplant |  300 |   2.5 |   3.2 |     0.8 |  8.6 |
| tofu     |  100 |   0.7 |   0.5 |     6.6 |  1.4 |
If you are familiar with SQL, you would get a similar result with a
join (actually a left outer join by default, but that can be
configured with the :full parameter).
select *
from recipe, nut
where recipe.type = nut.type;

select *
from recipe
left outer join nut on recipe.type = nut.type;

Three modes are available: in-place, push, pull.
The master table is changed (in-place) by appending columns from reference tables.
Invoke it with the M-x orgtbl-join command. The cursor must be
positioned on the column used to perform the join.
The master table drives the creation of derived tables. Specify the wanted
result in #+ORGTBL: SEND directives (as many as desired):
#+ORGTBL: SEND enriched orgtbl-to-joined-table :ref-table nut :mas-column type :ref-column type
| type     | quty |
|----------+------|
| onion    |   70 |
| tomato   |  120 |
| eggplant |  300 |
| tofu     |  100 |
The receiving blocks must be created somewhere else in the same file:
#+BEGIN RECEIVE ORGTBL enriched
#+END RECEIVE ORGTBL enriched
Typing C-c C-c with the cursor on the first pipe of the master table
refreshes all derived tables.
So-called “dynamic blocks” may also be used. The resulting table knows how to build itself. Example:
A master table is unaware that it will be enriched in a joined table:
#+TBLNAME: recipe
| type     | quty |
|----------+------|
| onion    |   70 |
| tomato   |  120 |
| eggplant |  300 |
| tofu     |  100 |
Create somewhere else a dynamic block which carries the specification of the join:
#+BEGIN: join :mas-table recipe :mas-column type :ref-table nut :ref-column type
| type     | quty | Fiber | Sugar | Protein | Carb |
|----------+------+-------+-------+---------+------|
| onion    |   70 |   1.3 |   4.4 |     1.3 |  9.0 |
| tomato   |  120 |   0.6 |   2.1 |     0.8 |  3.4 |
| eggplant |  300 |   2.5 |   3.2 |     0.8 |  8.6 |
| tofu     |  100 |   0.7 |   0.5 |     6.6 |  1.4 |
#+END:
Typing C-c C-c with the cursor on the #+BEGIN: line refreshes the
table.
For quick and once-only processing, use in-place mode.
Use pull or push modes for reproducible work. The pull mode might be
easier to use than the push, because there is a wizard bound to C-c C-x x
(see below). Other than that, the two modes use the same underlying engine,
so using one or the other is just a matter of convenience.
The reference tables may contain several matching rows for the same value in the master table. In this case, as many rows are created in the joined table. Therefore, the resulting table may be longer than the master table. Example, if a reference table contains three rows for “eggplants”:
#+tblname: nut
| type     | Cooking | Fiber | Sugar | Protein | Carb |
|----------+---------+-------+-------+---------+------|
| ...      | ...     |   ... |   ... |     ... |  ... |
| eggplant | boiled  |   2.5 |   3.2 |     0.8 |  8.6 |
| eggplant | pickled |   3.4 |   6.5 |     1.2 | 13.3 |
| eggplant | raw     |   2.8 |   1.9 |     0.8 |  4.7 |
| ...      | ...     |   ... |   ... |     ... |  ... |
Then the resulting table will have those three rows appended:
| type     | quty | type     | Cooking | Fiber | Sugar | Protein | Carb |
|----------+------+----------+---------+-------+-------+---------+------|
| ...      |  ... | ...      | ...     |   ... |   ... |     ... |  ... |
| eggplant |  300 | eggplant | boiled  |   2.5 |   3.2 |     0.8 |  8.6 |
| eggplant |  300 | eggplant | pickled |   3.4 |   6.5 |     1.2 | 13.3 |
| eggplant |  300 | eggplant | raw     |   2.8 |   1.9 |     0.8 |  4.7 |
If you are familiar with SQL, this behavior is reminiscent of the left outer join.
Duplicate entries may happen both in the master and the reference
tables. The joined table will have all combinations. So for instance
if there are 2 eggplant rows in the master table, and 3 eggplant rows
in the reference table, then the joined table will get 6 eggplant
rows.
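The row multiplication described above can be pictured with a minimal sketch in Emacs Lisp. It is not the package's implementation: my/join-sketch and my/drop-nth-sketch are hypothetical helpers that work on header-less Lisp tables (lists of rows), with zero-based column indices, and that mimic the default behavior (every master row is kept).

#+begin_src elisp
(require 'seq)

(defun my/drop-nth-sketch (n row)
  "Return a copy of ROW without its Nth cell (zero-based)."
  (append (seq-take row n) (seq-drop row (1+ n))))

(defun my/join-sketch (master reference mas-col ref-col)
  "Join MASTER with REFERENCE, keeping every master row.
MASTER and REFERENCE are lists of rows, each row being a list of cells.
MAS-COL and REF-COL are the zero-based indices of the joining columns.
Hypothetical helper for illustration, not part of orgtbl-join."
  ;; Number of empty cells appended to a master row without any match.
  (let ((pad (max 0 (1- (length (car reference))))))
    (seq-mapcat
     (lambda (mrow)
       (let ((matches
              (seq-filter (lambda (rrow)
                            (equal (nth ref-col rrow) (nth mas-col mrow)))
                          reference)))
         (if matches
             ;; One output row per matching reference row;
             ;; the joining column is output only once.
             (mapcar (lambda (rrow)
                       (append mrow (my/drop-nth-sketch ref-col rrow)))
                     matches)
           ;; No match: the master row survives with empty cells.
           (list (append mrow (make-list pad ""))))))
     master)))
#+end_src

Calling it with 2 eggplant rows in the master table and 3 eggplant rows in the reference table returns 2 × 3 = 6 rows, one for each combination.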
By default, all columns from the master table and all the reference tables are output (except the joining column, which is output only once).
This can be customized with the :cols parameter. Give it the list of
desired columns, in the order they should be output.
Columns may be specified by their name (if they have one) or by a
dollar form. Thus, $3 means the third column (numbering begins with
1).
By default, the first example gives all columns (except type, which
appears only once):
#+BEGIN: join :mas-table recipe :mas-column type :ref-table nut :ref-column type
| type     | quty | Fiber | Sugar | Protein | Carb |
|----------+------+-------+-------+---------+------|
| onion    |   70 |   1.3 |   4.4 |     1.3 |  9.0 |
| tomato   |  120 |   0.6 |   2.1 |     0.8 |  3.4 |
| eggplant |  300 |   2.5 |   3.2 |     0.8 |  8.6 |
| tofu     |  100 |   0.7 |   0.5 |     6.6 |  1.4 |
#+END:
If we want only quty and Protein, we specify it like this:
#+BEGIN: join :cols (quty Protein) :mas-table recipe :mas-column type :ref-table nut :ref-column type
| quty | Protein |
|------+---------|
|   70 |     1.3 |
|  120 |     0.8 |
|  300 |     0.8 |
|  100 |     6.6 |
#+END:
Or like this:
#+BEGIN: join :cols "quty Protein" :mas-table recipe :mas-column type :ref-table nut :ref-column type
| quty | Protein |
|------+---------|
|   70 |     1.3 |
|  120 |     0.8 |
|  300 |     0.8 |
|  100 |     6.6 |
#+END:
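How a :cols specification could be turned into column positions can be sketched in Emacs Lisp as follows. This is only an illustration under stated assumptions: my/resolve-column-sketch and my/select-columns-sketch are hypothetical names, cells and column names are plain strings, and the package's own code may proceed differently.

#+begin_src elisp
(require 'seq)

(defun my/resolve-column-sketch (spec header)
  "Return the zero-based index designated by SPEC in HEADER.
SPEC is a string: either a column name or a dollar form such as \"$3\".
HEADER is the list of column names.  Hypothetical helper."
  (if (string-match "\\`\\$\\([0-9]+\\)\\'" spec)
      ;; Dollar forms are numbered from 1.
      (1- (string-to-number (match-string 1 spec)))
    (or (seq-position header spec)
        (error "Unknown column: %s" spec))))

(defun my/select-columns-sketch (cols header rows)
  "Return HEADER and ROWS restricted to COLS, in the order of COLS.
COLS is a list of specifications accepted by `my/resolve-column-sketch'."
  (let ((indices (mapcar (lambda (spec)
                           (my/resolve-column-sketch spec header))
                         cols)))
    (mapcar (lambda (row)
              (mapcar (lambda (i) (nth i row)) indices))
            (cons header rows))))
#+end_src

With the header of the joined recipe table, the specifications "quty" and "$2" designate the same column, so either form can be used in the list.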
It may happen that no row in the reference table matches a value in
the master table. By default, in this case, the master row is kept,
with empty cells added to it. Information from the master table is
not lost. If, for example, a line in the recipe refers to an unknown
“amaranth” product (a cereal known by the ancient Incas), then the
resulting table will still contain the amaranth row, with empty
nutritional facts.
| type     | quty | type     | Fiber | Sugar | Protein | Carb |
|----------+------+----------+-------+-------+---------+------|
| onion    |   70 | onion    |   1.3 |   4.4 |     1.3 |  9.0 |
| tomato   |  120 | tomato   |   0.6 |   2.1 |     0.8 |  3.4 |
| eggplant |  300 | eggplant |   2.5 |   3.2 |     0.8 |  8.6 |
| tofu     |  100 | tofu     |   0.7 |   0.5 |     6.6 |  1.4 |
| amaranth |  120 |          |       |       |         |      |
This behavior is controlled by the :full parameter:
- :full mas: the joined result contains the full master table (the default)
- :full ref: the joined result contains the full reference tables
- :full mas+ref: the joined result contains all rows from both the master and all reference tables
- :full none or :full nil: the joined result contains only rows that appear in both tables
The use cases may be as follows:
- :full mas is useful when the reference table is large, such as a dictionary or a nutritional facts table. We just pick the needed rows from the reference.
- :full mas+ref is useful when both tables are similar. For instance, one table has been grown by a team, and the other independently by another team. The joined table will contain additional rows from both teams.
- :full none is useful to create the intersection of tables. For instance, we have a list of items in the main warehouse, and another list of damaged items. We are interested only in damaged items in the main warehouse.
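The effect of each :full value can be summarized by which joining values survive in the result. The sketch below is a hypothetical helper (my/full-keys-sketch is not part of the package) that assumes a single reference table and only illustrates the set logic; the row order produced by the package itself may differ.

#+begin_src elisp
(require 'seq)

(defun my/full-keys-sketch (mas-keys ref-keys full)
  "Return the joining values retained under the FULL mode.
MAS-KEYS and REF-KEYS are the values found in the joining columns of
the master and reference tables.  FULL is one of the symbols mas,
ref, mas+ref, none, or nil.  Hypothetical helper."
  (let ((only-ref (seq-remove (lambda (k) (member k mas-keys)) ref-keys)))
    (pcase full
      ('mas mas-keys)                       ; whole master table
      ('ref ref-keys)                       ; whole reference table
      ('mas+ref (append mas-keys only-ref)) ; union of both
      ((or 'none 'nil)                      ; intersection
       (seq-filter (lambda (k) (member k ref-keys)) mas-keys))
      (_ (error "Unknown :full value: %S" full)))))
#+end_src

For the warehouse use case, the none mode keeps only the items present in both lists, while mas+ref keeps every item known to either list.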
Sometimes an input table may be unaligned or malformed, with incomplete rows, like these:
| type | Fiber | Sugar | | Carb |
|----------+-------+-------+------+------|
| eggplant | 2.5 | 3.2 | 0.8 | 8.6 |
| tomato | 0.6 | 2.1 | 0.8 | 3.4 |
| onion | 1.3 | 4.4 | 1.3 | 9.0 |
| egg | 0 | 18.3 | 31.9 | 18.3 |
| rice | 0.2 | 0 | 1.5 | 16.0 |
| tofu | 0.7
| nut | 2.6 | 1.3 | 4.9 | 7.2 |

| type | quty |
|----------+------|
| onion | 70 |
| tomato |
| eggplant | 300 |
| tofu | 100 |
Missing cells are handled as though they were empty.
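Treating missing cells as empty amounts to padding short rows. A minimal sketch, assuming a Lisp table made of rows (lists of string cells) and hline symbols; my/pad-rows-sketch is a hypothetical name, not a function of the package:

#+begin_src elisp
(defun my/pad-rows-sketch (table)
  "Return TABLE with every row padded with empty cells to the widest row.
TABLE is a list whose elements are rows (lists of cells) or the
symbol hline, which is left untouched.  Hypothetical helper."
  (let ((width (apply #'max 0
                      (mapcar (lambda (row)
                                (if (listp row) (length row) 0))
                              table))))
    (mapcar (lambda (row)
              (if (listp row)
                  ;; Append as many empty cells as are missing.
                  (append row (make-list (- width (length row)) ""))
                row))
            table)))
#+end_src

Applied to the second table above, the incomplete tomato row would get one empty quty cell, and the other rows would be returned unchanged.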
The master and the reference tables may or may not have a header. When there is a header, it may extend over several lines. A header ends with a horizontal line.
OrgtblJoin tries to preserve as much of the master table as possible. Therefore, if the master table has a header, the joined table will have it verbatim, over as many lines as needed.
The reference tables' headers (if any) fill in the header (if any) of the resulting table. But if there is no room in the resulting table header, the reference tables' header lines will be ignored, partly or fully.
Headers are useful to refer to columns. If there is no header, then
columns must be referred to with $ names: $1 is the name of the first
column, $2 is the name of the second column, and so on. This is
pretty much the same as in the Org Mode spreadsheet.
The wizard may be invoked in-place or for the pull-mode.
Invoke the wizard in-place by typing M-x orgtbl-join with the cursor
inside the master table to be enriched. The cursor should be anywhere
in the column serving the join process.
The menu entry Tbl > Column > Join with another table is equivalent to
M-x orgtbl-join.
For the pull-mode, the same wizard may create a fresh new block
#+BEGIN: join..., or amend an existing one. Invoke it with
- either M-x orgtbl-join-insert-dblock-join
- or C-c C-x x join.
Put the cursor on an empty space in your Org Mode file, or on an
existing #+BEGIN: join... block.
For all questions, completion is available.
Note: there are many kinds of dynamic blocks that can be inserted
besides join.
As there might be as many reference tables as wanted, the wizard
continues asking for reference tables. When done, answer n when the
wizard asks if you want an additional reference table to be joined.
The wizard does not (yet) take into account the :cols and :post
parameters. If such parameters were already specified, the
wizard will leave them untouched.
What if, instead of appending data from some tables to a main table, we need to join 2 similar or symmetric tables holding different data?
Let’s assume we have these 2 tables:
#+TBLNAME: TagsQ1
| tag  | Q1 |
|------+----|
| tagA | 25 |
| tagB | 18 |
| tagC | 13 |
| tagD |  6 |
| tagE |  2 |
| tagF |  2 |
| tagG |  1 |
and
#+TBLNAME: TagsQ2
| tag    | Q2 |
|--------+----|
| tagA   |  2 |
| tagD   |  3 |
| tagE   |  3 |
| tagF   |  5 |
| tagG   |  7 |
| tagH   | 11 |
| tagI   | 15 |
Looking closely at both tables, we can observe that some of these tags appear in both (tags A, D, E, F, G), some only in Q1 (B, C), and others only in Q2 (H, I, …).
We want to create a table that includes all the tags, with a column with their frequency for table TagsQ1 and another for TagsQ2.
So we can create the orgtbl-join block with the Wizard. Type C-c C-x x, then answer join.
As our tables are somewhat symmetric (neither is the primary one), arbitrarily choose TagsQ1 as the “master table” and TagsQ2 as the “reference table”.
Then continue answering the wizard's questions:
- Master table: TagsQ1
- Reference table: TagsQ2
- joining column in reference table: tag
- joining column in master table: tag
Then there is a question about which table should appear entirely. In the result you want, there might be missing values in both Q1 and Q2 columns. Therefore the right answer is: mas+ref
Eventually you get:
#+BEGIN: join :mas-table "TagsQ1" :ref-table "TagsQ2" :mas-column "tag" :ref-column "tag" :full "mas+ref"
| tag    | Q1 | Q2 |
|--------+----+----|
| tagA   | 25 |  2 |
| tagB   | 18 |
| tagC   | 13 |
| tagD   |  6 |  3 |
| tagE   |  2 |  3 |
| tagF   |  2 |  5 |
| tagG   |  1 |  7 |
|--------+----+----|
| tagH   |    | 11 |
| tagI   |    | 15 |
| tag... |    | 19 |
#+END:
The tagB and tagC rows are incomplete on purpose. To fill in the table, just type TAB inside it.
Additional columns can be specified for the resulting table. With the
previous example, we added a 7th column multiplying columns 2 and 3.
This results in a line beginning with #+TBLFM: below the table, as
usual in Org spreadsheet. This line will survive re-computations.
Moreover, we added a spreadsheet formula with a :formula
parameter. This will fill-in the 7th column header. It is translated
into a usual #+TBLFM: spreadsheet line.
#+BEGIN: join :mas-table recipe :mas-column type :ref-table nut :ref-column type :formula "@1$7=totfiber"
#+name: richer
| type     | quty | Fiber | Sugar | Protein | Carb | totfiber |
|----------+------+-------+-------+---------+------+----------|
| onion    |   70 |   1.3 |   4.4 |     1.3 |  9.0 |      91. |
| tomato   |  120 |   0.6 |   2.1 |     0.8 |  3.4 |      72. |
| eggplant |  300 |   2.5 |   3.2 |     0.8 |  8.6 |     750. |
| tofu     |  100 |   0.7 |   0.5 |     6.6 |  1.4 |      70. |
#+TBLFM: $7=$2*$3::@1$7=totfiber
#+END:
The joined table can be post-processed with the :post parameter. It
accepts a Lisp lambda, a Lisp function, a Lisp expression, or a Babel
block.
The processing receives the joined table as parameter in the form of a Lisp expression. It can process it in any way it wants, provided it returns a valid Lisp table.
A Lisp table is a list of rows. Each row is either a list of cells, or
the special symbol hline.
In this example, a lambda expression adds a hline and a row for ginger.
#+BEGIN: join ... :post (lambda (table) (append table '(hline (ginger na na na na))))
| product   | quty   | Carb | Fiber | Sugar | Protein |
|-----------+--------+------+-------+-------+---------|
| onion     | 70     |  9.0 |   1.3 |   4.4 |     1.3 |
| unknown   | 999    |
| tomatoe   | 120    |  3.4 |   0.6 |   2.1 |     0.8 |
|-----------+--------+------+-------+-------+---------|
| ginger    | 33     | na   | na    | na    | na      |
#+END:
The lambda can be moved to a defun. The function is then passed to the
:post parameter:
#+begin_src elisp
(defun my-function (table)
  (append table
          '(hline (ginger na na na na))))
#+end_src
... :post my-function
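As a further illustration of what a :post function may do, here is a sketch that drops the rows left incomplete by the join. It relies on an assumption: empty cells reach the function as empty strings. The name my/drop-incomplete-rows-sketch is hypothetical.

#+begin_src elisp
(require 'seq)

(defun my/drop-incomplete-rows-sketch (table)
  "Return TABLE without the rows that contain an empty cell.
TABLE is a Lisp table: a list whose elements are either rows (lists
of cells) or the symbol hline, which is kept as is.
Assumption: empty cells are represented by empty strings."
  (seq-remove
   (lambda (row)
     (and (listp row)
          (seq-some (lambda (cell) (equal cell "")) row)))
   table))
#+end_src

It would be passed like any other function: ... :post my/drop-incomplete-rows-sketch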
The :post parameter can also refer to a Babel block. Example:
#+BEGIN: join ... :post "my-babel-block(tbl=*this*)"
...
#+END:
#+name: my-babel-block
#+begin_src elisp :var tbl=""
(append tbl
        '(hline (ginger na na na na)))
#+end_src
The block is passed the table to process in a Lisp variable called
*this*.
Any of the input tables may be the result of executing a Babel script. In this case, the table is virtual in the sense that it appears nowhere.
(Babel is the Org Mode infrastructure to run scripts in any language, like Python, R, C++, Java, D, shell, whatever, with inputs and outputs connected to Org Mode).
Example:
Here is a script in Emacs Lisp which creates an Org Mode table.
#+name: ascript
(list '(type quty)
      'hline
      (list "tomato" (* 53.1 12))
      (list "tofu" (* 12.5 7)))
If executed, the script would output this table:
#+RESULTS: ascript
| type   |  quty |
|--------+-------|
| tomato | 637.2 |
| tofu   |  87.5 |
But instead, OrgtblJoin will execute the script and consume its output:
#+BEGIN: join :mas-table "ascript" :ref-table "nut" :mas-column "type" :ref-column "type" :full "mas"
| type   |  quty | Fiber | Sugar | Protein | Carb |
|--------+-------+-------+-------+---------+------|
| tomato | 637.2 |   0.6 |   2.1 |     0.8 |  3.4 |
| tofu   |  87.5 |   0.7 |   0.5 |     6.6 |  1.4 |
#+END:
Here the parameter :mas-table specifies the name of the script to be
executed.
As in any other Org Mode source block, the input table may come from several places. OrgAggregate adds even more kinds of input.
The parameter after :mas-table or :ref-table may be:
- mytable: an ordinary Org Mode table in the same buffer, named mytable.
- /some/dir/file.org:mytable: an ordinary Org Mode table named mytable, in a distant Org file named /some/dir/file.org.
- mybabel: an Org Mode Babel block named mybabel in the current buffer, generating a table as its output, written in any language.
- mybabel(param1=123,param2=456): passing parameters to an Org Mode Babel block named mybabel in the current buffer, generating a table as its output, written in any language.
- /some/dir/file.org:mybabel(param1=123,param2=456): an Org Mode Babel block named mybabel in a distant Org file named /some/dir/file.org, called with parameters.
- /some/dir/file.csv:(csv params…): a comma-separated-values file in the CSV format, in the file /some/dir/file.csv. The separator may be TAB, comma, or semicolon; it is guessed, and different separators may be mixed. Any empty row in the CSV file is interpreted as a horizontal separator (hline in Org table parlance). Parameters may be:
  - header: the first row in the CSV file is interpreted as a header containing the column names.
  - colnames (column1 column2 column3 …): the column names are given explicitly, in case the CSV file contains only data, no header.
  In any case, the columns may be referenced as $1, $2, $3, … as usual.
- /some/dir/file.json:(json params…): a file containing a JSON formatted table, in the file /some/dir/file.json. Currently, the only accepted format is an array of arrays. There are currently no parameters. In the future it may be possible to specify alternative sub-formats.
- 34cbc63a-c664-471e-a620-d654b26ffa31: an identifier of an Org Mode sub-tree. The sub-tree is supposed to contain a table, which is retrieved. Those Org Mode identifiers span all known Org Mode files. To add such an identifier, put the cursor on the heading of the sub-tree, and type M-x org-id-get-create.
Org Mode also provides table slicing. All of the previous references may be followed by an optional slicing. Examples:
- mytable[0:5]: retain only the first 6 rows of the input table; if the table has a header, then it counts as 2 rows (the header and the separation line); in this example, it would retain rows 0 and 1 for the header, and rows 2, 3, 4, 5 for the content.
- mytable[,0:1]: retain only the first 2 columns.
- mytable[0:5,0:1]: retain only the first 6 rows and the first 2 columns.
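The slicing rules can be mimicked on a Lisp table to check which rows and columns a given slice would retain. This is a sketch of the semantics described above, not Org Mode's own slicing code; both function names are hypothetical, and indices beyond the table are clamped here for convenience.

#+begin_src elisp
(require 'seq)

(defun my/slice-rows-sketch (table from to)
  "Return rows FROM to TO of TABLE, zero-based and inclusive.
hline separators count as rows.  TO is clamped to the table length."
  (seq-subseq table from (min (1+ to) (length table))))

(defun my/slice-columns-sketch (table from to)
  "Return TABLE with only columns FROM to TO, zero-based and inclusive.
hline separators are kept as is."
  (mapcar (lambda (row)
            (if (listp row)
                (seq-subseq row from (min (1+ to) (length row)))
              row))
          table))
#+end_src

On a table with a one-line header, (my/slice-rows-sketch table 0 5) returns the header, the hline, and the first 4 data rows, which matches the mytable[0:5] example.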
In an example above, we gave a name to the resulting joined table:
#+name: richer. Doing so the joined table may become an input for a
further computation, for example in a Babel block.
The name will survive re-computations. This happens only in pull mode.
Note that the #+name: richer line could appear above the #+BEGIN:
line. But sometimes this is not taken into account by further Babel
blocks.
OrgtblJoin used to handle just one reference table. Now, as many as wanted are handled.
To specify the reference tables, just use several times the :ref-table
and :ref-column parameters. They must match: for instance, the third
:ref-table must match the third :ref-column.
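The pairing rule can be sketched as a walk over the parameter list, collecting the :ref-table and :ref-column values in order of appearance. The walk is needed because plist-get would only return the first occurrence of a repeated key. The function name and the prices table in the usage example are hypothetical, and the package's actual parsing may differ.

#+begin_src elisp
(require 'cl-lib)

(defun my/collect-ref-pairs-sketch (params)
  "Return the (REF-TABLE . REF-COLUMN) pairs found in the plist PARAMS.
Repeated :ref-table and :ref-column keys are paired by order of
appearance: the third :ref-table goes with the third :ref-column."
  (let (tables columns)
    (while params
      (pcase (car params)
        (:ref-table (push (cadr params) tables))
        (:ref-column (push (cadr params) columns)))
      (setq params (cddr params)))
    (unless (= (length tables) (length columns))
      (error "Each :ref-table needs a matching :ref-column"))
    (cl-mapcar #'cons (nreverse tables) (nreverse columns))))
#+end_src

For example, the parameters :ref-table nut :ref-column type :ref-table prices :ref-column product would give the pairs (nut . type) and (prices . product).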
For now, the :full and :mas-column parameters should be mentioned
just once. This could change in the future with as many such
parameters as reference tables.
One side effect of supporting multiple reference tables is that zero reference tables is now accepted. In this case, the result of the join is just the master table. But it can be changed in several ways:
- Selection and re-ordering of columns through the :cols parameter.
- Additional computed columns through the :formula parameter and survival of #+TBLFM: lines.
- Lisp and Babel post-processing through the :post parameter.
Emacs package on Melpa: add the following lines to your .emacs file,
and reload it.
(add-to-list 'package-archives '("melpa" . "http://melpa.org/packages/") t)
(package-initialize)
You may also customize this variable:
M-x customize-variable package-archives
Then browse the list of available packages and install orgtbl-join
M-x package-list-packages
Alternatively, you can download the Lisp files, and load them:
(load-file "orgtbl-join.el")
You may want to add an entry in the Table menu, Column sub-menu. You
may also want to call orgtbl-join with C-c j. One way to do so is to
use use-package in your .emacs init file:
(use-package orgtbl-join
  :after (org)
  :bind ("C-c j" . orgtbl-join)
  :init
  (easy-menu-add-item
   org-tbl-menu '("Column")
   ["Join with another table" orgtbl-join (org-at-table-p)]))
Note: there used to be an orgtbl-join-setup-keybindings function to do
just what the above use-package does. In this new way, key and menu
bindings are no longer hard-coded in the package.
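For setups that do not use use-package, roughly the same bindings can be obtained with plain Emacs Lisp. This is a sketch that assumes orgtbl-join is installed and autoloaded (as is the case for a package installed from Melpa); the C-c j key is only the example used above.

#+begin_src elisp
;; Bind the in-place join command globally.
(global-set-key (kbd "C-c j") #'orgtbl-join)

;; Add the menu entry once Org Mode is loaded, since the menu
;; variable org-tbl-menu is defined by Org Mode.
(with-eval-after-load 'org
  (easy-menu-add-item
   org-tbl-menu '("Column")
   ["Join with another table" orgtbl-join (org-at-table-p)]))
#+end_src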
Comments, enhancements, etc. welcome.
Author
- Thierry Banel, tbanelwebmin at free dot fr
- bymoz089 (GitHub) found and tracked-down a bug in the in-place joining
- Eduardo Mercovich (GitHub edumerco) wrote the documentation for the 2 similar tables use case.
Contributors
- Dirk Schmitt, surviving #+NAME: line
- wuqui, :cols parameter
- Misohena (https://misohena.jp/blog/author/misohena), double width Japanese characters (string-width vs. length)
- Shankar Rao, :post post-processing
- Piotr Panasiuk, #+CAPTION: and any tags survive
- Luis Miguel Hernanz, multiple reference tables suggestion, fix regex bug
- remove duplicate reference column
- fix keybindings
- #+NAME: inside #+BEGIN: survives
- missing input cells handled as empty ones
- back-port Org Mode 9.4 speed up
- increase performance when inserting result into the buffer
- aligned output in push mode
- 2 as column name no longer supported, write $2
- add :full parameter
- remove C-c C-x i, use standard C-c C-x x instead
- added the :cols parameter
- :post post-processing
- 3x speedup org-table-to-lisp and avoid Emacs 27 to 30 incompatibilities
- #+CAPTION: and any other tag survive inside #+BEGIN:
- now there can be several reference tables in a join, instead of just one.
- Documentation is now integrated right into Emacs in the info format. Type M-: (info "orgtbl-join")
- TOC in README.org (thanks org-make-toc)
- Virtual input table produced by Babel blocks
- Speedup of resulting table recalculation when there are formulas in #+tblfm: or in :formula. The overall join may be up to x4 faster and ÷4 less memory hungry.
- Add the chapter “Joining 2 similar tables” for a common use case.
orgtbl-join is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
orgtbl-join is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.