
krisbalintona/indexed

 
 


Indexed

MELPA

An efficient cache of metadata about all your Org files.

Builds fast. My M-x indexed-reset:

indexed: Analyzed 160616 lines in 10734 entries (3397 with ID)
         in 2394 files in 1.37s (+ 0.16s to build SQLite DB)

This library came from asking myself “what could I move out of org-node, that’d make sense in core?” Maybe a proposal for upstream, or at least a PoC.

Many Org plugins now reinvent the wheel when it comes to keeping track of a set of files and what may be in them.

Examples: org-roam’s DB, org-node’s hash tables, orgrr’s hash tables, …, and some just re-run grep all the time, which still leads to writing elisp to cross-reference the results with something useful.

And let’s not talk about the org-agenda… Try putting 2000 files into org-agenda-files! It needs to open each file in real-time to know anything about them, so everyday commands grind to a halt.

Quick overview

Data will exist after a setup akin to this, once you have waited a second or two.

(setq indexed-org-dirs '("~/org" "~/Sync/notes"))
(indexed-updater-mode)
(indexed-roam-mode) ;; optional

Two different APIs to access the same data.

  • sql
  • elisp

Why two? It’s free. When the data has been gathered anyway, there is no reason to only insert it into a SQLite db, nor only put it in a hash table.

Hash table is nicer for simple lookups (and slightly faster). SQL excels for complex lookups.

As a bonus, it’s fairly easy to design your own SQL database, so this library does not lock you in. Included is one that matches org-roam’s database perfectly, and a future milestone might be adding one that matches org-sql.

To use SQL, see Appendix I. For the elisp API, see Appendix II.
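
As a rough illustration of the two styles, here is the same question ("which file holds the entry with this ID?") asked both ways. This is only a sketch under assumptions: that indexed-roam-mode is enabled and EmacSQL is installed, that the roam-compatible database has a nodes table with id and file columns as in org-roam's schema, and that indexed-entry-by-id returns nil for an unknown ID. The variable my-id is a placeholder, not part of the library.

```elisp
;; Placeholder: substitute an ID that actually exists in your files.
(defvar my-id "some-org-id")

;; Elisp API: fetch the entry object, then ask it for its file.
(let ((entry (indexed-entry-by-id my-id)))
  (and entry (indexed-file-name entry)))

;; SQL API: the same question put to the roam-compatible database.
;; EmacSQL returns a list of rows, each row a list of column values.
(emacsql (indexed-roam)
         [:select file :from nodes :where (= id $s1)]
         my-id)
```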

Data only

A design choice: Indexed only delivers data. It could easily ship conveniences like, let’s call it a function “indexed-goto”:

(defun indexed-goto (entry)
  (find-file (indexed-file-name entry))
  (goto-char (indexed-pos entry)))

but in my experience, that will spiral into dozens of lines over time, to handle a variety of edge cases, and then it will no longer be universally applicable. Maybe you prefer to handle edge cases differently than I do.

So, it is up to you to write your own “goto” function and all else to do with user interaction.
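
For instance, a personal "goto" might already grow two edge cases of its own. The following is a sketch of one possible user-side function, not something the library ships. The name my-indexed-goto is hypothetical, and the choices to widen the buffer and to clamp the position are just one set of preferences.

```elisp
(defun my-indexed-goto (entry)
  "Visit the file of ENTRY and move point to where ENTRY begins.
Signal a `user-error' if the file has vanished since it was indexed."
  (let ((file (indexed-file-name entry)))
    (unless (file-readable-p file)
      (user-error "File is gone or unreadable: %s" file))
    (find-file file)
    ;; Personal preference: undo any narrowing so the position is reachable.
    (widen)
    ;; Clamp, in case the file shrank after it was indexed.
    (goto-char (min (indexed-pos entry) (point-max)))))
```

Someone else might prefer to keep the narrowing, or to re-index the file rather than clamp, which is exactly why this stays in user code.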

No Org at init

A design choice: Indexed does not use Org code to analyze your files, but a custom, dumber parser. That’s for three reasons:

  1. Fast init. Since I want Emacs init to be fast, it’s not allowed to load Org, but I still want to be able to use a command like org-node-find to browse my Org stuff, before deciding to actually jump into an Org file.

    That means the data must be able to exist, before Org has loaded.

    • Future milestone: I want to be told at init if there’s a dangling Org clock somewhere. Or a deadline.
  2. Robustness. Many users heavily customize Org, so no surprise that it sometimes breaks. In my experience, it’s very nice then to have an alternative way to browse that does not depend on a functional Org setup.
  3. Fast rebuild. As they say, there are two hard things in computer science: cache invalidation and naming things [note1].

    Indexed must update its cache as the user saves, renames and deletes files. Not a difficult problem, until you realize that files may change due to a Git operation, OS file operations, a rm command on the terminal, edits by another Emacs instance, or even remote edits by Logseq.

    A robust approach to cache invalidation is to avoid trying: ensure that a full rebuild is fast enough that you can just do that instead.

    In fact, indexed-updater-mode does a bit of both, because it is still important that saving a file does not lag; it does its best to update only the necessary tables on save, and an idle timer causes a full reset every now and then.

    • [note1]: I sometimes feel I failed at naming this library! Got opinions/ideas? Leave a comment!

Appendix I: A SQLite database, for free

Included are two designs:

  1. A drop-in for org-roam’s (org-roam-db), called (indexed-roam).
  2. Our own experimental (indexed-orgdb).

Quick start with indexed-roam

UPDATE NOTICE [2025-03-23 Sun 16:44]: Function (indexed-roam) now returns an EmacSQL connection.

Without org-roam installed

Activating the mode creates an in-memory database by default.

(indexed-roam-mode)

Test that it works:

(emacsql (indexed-roam) [:select * :from files :limit 10])

With org-roam installed

To end your dependence on org-roam-db-sync, you can set the following. It will overwrite the “org-roam.db” file.

(setq org-roam-db-update-on-save nil)
(setq indexed-roam-overwrite t)
(indexed-roam-mode)

Now, you have a new, all-fake org-roam.db! Test that it works:

(org-roam-db-query [:select * :from files :limit 10])

N.B.: because (equal (org-roam-db) (indexed-roam)), the above is equivalent to these:

(emacsql (org-roam-db) [:select * :from files :limit 10])
(emacsql (indexed-roam) [:select * :from files :limit 10])

There’s a known issue if you use multiple Emacsen: you may get the error “attempt to write a readonly database”. If that happens, get unstuck with M-: (org-roam-db--close-all).

Quick start with indexed-orgdb

This DB is a bit different, subject to redesign.

(indexed-orgdb-mode)

Note that it probably does not work with EmacSQL, only with the built-in sqlite-select of Emacs 29+.

In practice, you can often translate a statement like

(org-roam-db-query [:select tag :from tags :where (= id $s1)] id)

to

(sqlite-select (indexed-orgdb) "select tag from tags where id = ?;" (list id))

or if you like mysterious aliases,

(indexed-orgdb "select tag from tags where id = ?;" id)

Appendix II: Lisp API

There are three types of objects: file-data, org-entry and org-link. Logically speaking, files contain entries and entries contain links, right?

Functions of no argument

  • indexed-org-files
    • Return all file objects
  • indexed-org-entries
    • Return all entry objects
  • indexed-org-id-nodes
    • Return all entry objects that have an ID
  • indexed-org-links-and-citations
    • Return all link objects
  • indexed-org-links
    • Return all link objects with a type such as id: or https:

Functions operating on raw file paths

  • indexed-entry-near-lnum-in-file
  • indexed-entry-near-pos-in-file
  • indexed-id-nodes-in
  • indexed-entries-in

Functions operating on raw id

  • indexed-entry-by-id
  • indexed-file-by-id
  • indexed-links-from

Functions operating on raw titles

  • indexed-id-node-by-title

Functions operating on ORG-LINK

  • indexed-dest
  • indexed-type
  • indexed-heading-above
  • indexed-pos
  • indexed-id-nearby
    • (old alias: indexed-origin. Org-roam calls the same thing “source” and org-node calls it “origin”, but both terms presume an ID-centric design to everything, and make less sense when you allow for the absence of IDs.)

Functions operating on ENTRY

  • indexed-deadline
  • indexed-heading-lvl
  • indexed-id-links-to
  • indexed-olpath
  • indexed-olpath-with-self
  • indexed-olpath-with-self-with-title
  • indexed-olpath-with-title
  • indexed-pos
  • indexed-priority
  • indexed-properties
  • indexed-property
  • indexed-property-assert
  • indexed-roam-aliases – works without indexed-roam-mode
  • indexed-roam-reflinks-to – needs indexed-roam-mode
  • indexed-roam-refs – needs indexed-roam-mode
  • indexed-root-heading-to
  • indexed-scheduled
  • indexed-tags
  • indexed-tags-inherited
  • indexed-tags-local
  • indexed-todo-state
  • indexed-toptitle
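
To give a feel for how these accessors combine, here is a small sketch that collects the location of every entry carrying a TODO state. It assumes that indexed-org-entries returns a sequence of entry objects and that indexed-todo-state returns nil when an entry has no state. The function name my-indexed-todo-locations is made up for this example.

```elisp
(require 'seq)

(defun my-indexed-todo-locations ()
  "Return a list of (FILE . POS) for every indexed entry with a TODO state."
  (mapcar (lambda (entry)
            (cons (indexed-file-name entry) (indexed-pos entry)))
          ;; Keep only the entries whose TODO state is non-nil.
          (seq-filter #'indexed-todo-state (indexed-org-entries))))
```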

Polymorphic functions (that work on all three types)

  • indexed-file-name
  • indexed-file-data
  • indexed-file-title
  • indexed-file-title-or-basename
  • indexed-file-mtime

Hooks used by command indexed-reset and mode indexed-updater-mode

  • indexed-pre-full-reset-functions
  • indexed-post-full-reset-functions
  • indexed-record-file-functions
  • indexed-record-entry-functions
  • indexed-record-link-functions

Additional hooks used by mode indexed-updater-mode

  • indexed-pre-incremental-update-functions
  • indexed-post-incremental-update-functions
  • indexed-forget-file-functions
  • indexed-forget-entry-functions
  • indexed-forget-link-functions
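
A sketch of attaching your own function to one of these hooks follows. The arguments each hook passes are not documented here, so the function accepts and ignores any arguments; treat that, and the assumption that indexed-org-entries returns a sequence, as guesses to verify against the source. The name my-indexed-report is hypothetical.

```elisp
(defun my-indexed-report (&rest _)
  "Echo how many entries Indexed currently knows about."
  (message "Indexed now knows %d entries"
           (length (indexed-org-entries))))

;; Run the report after every full reset.
(add-hook 'indexed-post-full-reset-functions #'my-indexed-report)
```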

Extension: indexed-x.el

A separate file indexed-x.el is loaded when you enable indexed-updater-mode.

It is separate because indexed-updater-mode is not strictly necessary – it could be replaced by a simple timer that calls indexed-reset every 20 seconds, or whatever you deem suitable.

The file also ships some extra tools.
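
The timer-based alternative to indexed-updater-mode mentioned above might be sketched like this. It assumes indexed-reset can be called from Lisp with no arguments. The variable and function names are made up for the example.

```elisp
(defvar my-indexed-reset-timer nil
  "Timer that periodically calls `indexed-reset', or nil.")

(defun my-indexed-start-periodic-reset ()
  "Call `indexed-reset' every 20 seconds, replacing any earlier timer."
  (when (timerp my-indexed-reset-timer)
    (cancel-timer my-indexed-reset-timer))
  (setq my-indexed-reset-timer
        (run-with-timer 20 20 #'indexed-reset)))
```

An idle timer (run-with-idle-timer) would be a reasonable variation if you would rather not rebuild while typing.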

Programmer tool: Instantly index thing at point

You may want to call the following functions after inserting entries or links in a custom way, if they need to become indexed instantly without waiting for the user to save the buffer:

  • indexed-x-ensure-entry-at-point-known
  • indexed-x-ensure-link-at-point-known

Examples of when those are useful are when you write a command like org-node-extract-subtree or a subroutine like org-node-backlink--add-in-target.
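
As a smaller illustration, a custom command that inserts a heading might finish by indexing it on the spot. This sketch assumes that indexed-x-ensure-entry-at-point-known takes no arguments and inspects the entry at point, which you should confirm in indexed-x.el. It also assumes the buffer visits a file, since org-id-get-create expects that. The command name is hypothetical.

```elisp
(require 'org)
(require 'org-id)

(defun my-insert-heading-with-id (title)
  "Append a top-level heading TITLE with a fresh ID, and index it at once."
  (interactive "sTitle: ")
  (goto-char (point-max))
  (unless (bolp) (insert "\n"))
  (insert "* " title "\n")
  ;; Move back onto the new heading before giving it an ID.
  (forward-line -1)
  (org-id-get-create)
  ;; Assumption: zero-argument function acting on the entry at point.
  (when (fboundp 'indexed-x-ensure-entry-at-point-known)
    (indexed-x-ensure-entry-at-point-known)))
```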

Appendix III: Make your own database

Steps:

  1. Read file indexed-roam.el as a reference implementation, or file indexed-orgdb.el if you want only the built-in sqlite feature and no EmacSQL
    • See how it looks up the data it needs
    • See which things require a prin1-to-string
    • See how arguments are ultimately passed to sqlite-execute

    [TODO: write a simpler example impl]

  2. Hook your own DB-creator onto indexed-post-full-reset-functions, or just on some hook that suits your use-case
  3. Done!
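
Pending a simpler official example, the steps above can be sketched with only the built-in SQLite support of Emacs 29+. Everything named my-… is hypothetical, the one-table schema is invented for illustration, and the sketch assumes indexed-org-entries returns a sequence and that the hook tolerates a function that ignores its arguments.

```elisp
(require 'seq)

(defvar my-indexed-db nil
  "Connection to a hypothetical custom database, or nil.")

(defun my-indexed-build-db (&rest _)
  "Rebuild an in-memory SQLite table listing all indexed entries."
  (when (sqlitep my-indexed-db)
    (sqlite-close my-indexed-db))
  ;; With no file argument, `sqlite-open' creates an in-memory database.
  (setq my-indexed-db (sqlite-open))
  (sqlite-execute my-indexed-db
                  "CREATE TABLE entries (file TEXT, pos INTEGER, tags TEXT);")
  ;; One transaction for all inserts keeps the rebuild fast.
  (sqlite-transaction my-indexed-db)
  (seq-doseq (entry (indexed-org-entries))
    (sqlite-execute my-indexed-db
                    "INSERT INTO entries VALUES (?, ?, ?);"
                    (list (indexed-file-name entry)
                          (indexed-pos entry)
                          ;; A Lisp list cannot be bound directly; serialize it.
                          (prin1-to-string (indexed-tags entry)))))
  (sqlite-commit my-indexed-db)
  my-indexed-db)

(add-hook 'indexed-post-full-reset-functions #'my-indexed-build-db)
```

Afterwards, something like (sqlite-select my-indexed-db "SELECT file FROM entries WHERE pos = 1;") would query the result.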

Appendix IV: User stuff

Modes

  • indexed-updater-mode
  • indexed-roam-mode

Config settings

  • indexed-warn-title-collisions
  • indexed-seek-link-types
  • indexed-org-dirs
  • indexed-org-dirs-exclude
  • indexed-sync-with-org-id
  • indexed-roam-overwrite

Commands

  • indexed-list-dead-id-links
  • indexed-list-title-collisions
  • indexed-list-problems
  • indexed-list-entries
  • indexed-list-db-contents
  • indexed-reset

Tip: Fully inform org-id

Never sit through a slow M-x org-id-update-id-locations again.

(setq indexed-sync-with-org-id t)

This tells org-id about everything that Indexed can find under indexed-org-dirs. That’s nice because if you click an ID-link that org-id does not know about, it reacts by running org-id-update-id-locations, hanging Emacs for quite a while.

Never had this problem? If you came here from org-node or org-roam, that’s because they solve this problem for you.

About

Turn thousands of Org files into a database in seconds
