- A map implementation based on a DAWG (Directed Acyclic Word Graph)
- Maps are serialized to byte sequences in double-array trie format
- It takes a static key set and assigns a unique identifier to each key
- Note that the input keys must be unique and lexically ordered
- The identifier assigned to a key is the zero-origin index of the key in the input sequence
- This library aims to provide a handy way to build maps with tens of millions of elements in Common Lisp
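
Because the input keys must be unique and ordered, and identifiers are simply positions in that input, it can help to normalize a key list before building. The following is a minimal sketch in plain Common Lisp. `prepare-keys` and `expected-id` are hypothetical helpers, not part of this library, and the sketch assumes that "lexically ordered" means the order given by `string<`.

```lisp
;; Hypothetical helpers for illustration; they are not part of the library.
(defun prepare-keys (keys)
  "Return a fresh list of KEYS with duplicates removed, sorted with STRING<."
  ;; REMOVE-DUPLICATES may share structure with its argument, so copy
  ;; before handing the list to the destructive SORT.
  (sort (copy-list (remove-duplicates keys :test #'string=)) #'string<))

(defun expected-id (key prepared-keys)
  "Zero-origin index of KEY in PREPARED-KEYS, or NIL if KEY is absent."
  (position key prepared-keys :test #'string=))

(defparameter *keys* (prepare-keys '("hello" "he" "hell" "h" "he")))
;; *keys* is now ("h" "he" "hell" "hello")

(expected-id "hell" *keys*) ; 2, the index of "hell" in the sorted input
```

A list prepared this way satisfies the uniqueness and ordering restrictions under the stated assumption about ordering.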

(require :asdf)
(push *default-pathname-defaults* asdf:*central-registry*)
(asdf:load-system :dawg)

(dawg:build :input "/usr/share/dict/words" :output "words.dawg")
(defparameter *dawg* (dawg:load "words.dawg"))

(dawg:member? "hello" *dawg*)
;; => T
(dawg:get-id "hello" *dawg*)
;; => 50195
(dawg:each-common-prefix (id end) ("hello" *dawg*)
  (print (list id (subseq "hello" 0 end))))
;; (49012 "h")
;; (49845 "he")
;; (50183 "hell")
;; (50195 "hello")

dawg:build builds a DAWG index file from the input key set.

- input:
  - The pathname of a key set file, or a list of keys
  - A "key set file" is a line-delimited plain text file (each line represents one key)
  - Restrictions:
    - The input keys must be unique and lexically ordered
    - A key cannot contain null characters
  - Type: (or string pathname list)
- output:
  - The pathname of the resulting DAWG index file
  - Type: (or string pathname)
- byte-order:
  - The endianness of the output file
  - Type: (member :native :little :big)
  - Default: :native
- show-progress:
  - Indicates whether to show progress
  - Type: boolean
  - Default: nil

dawg:load loads the DAWG map from the specified index file.

- index-path:
  - The pathname of an index file built with the dawg:build function
  - Type: (or string pathname file-stream)
- byte-order:
  - The endianness of the input file
  - Type: (member :native :little :big)
  - Default: :native

dawg:member? returns t if dawg contains the given key, otherwise nil.

- key:
  - Type: (simple-array character)
- dawg:
  - Type: dawg:dawg
- start:
  - The start position in key
  - Type: positive-fixnum
  - Default: 0
- end:
  - The end position in key
  - Type: positive-fixnum
  - Default: (length key)
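
The key type is listed as (simple-array character), which not every Lisp string satisfies (for example, a string with a fill pointer is typically not a simple array). If a key comes from such a source, one possible approach is to coerce it first. This is only a sketch: `as-simple-character-string` is a hypothetical helper, and whether the library rejects other string types is an assumption based on the Type entries above.

```lisp
;; Hypothetical helper for illustration; it is not part of the library.
(defun as-simple-character-string (string)
  "Return STRING as a (simple-array character (*)), copying only when needed."
  (if (typep string '(simple-array character (*)))
      string
      (coerce string '(simple-array character (*)))))

;; A string built with a fill pointer, e.g. while accumulating input.
(defparameter *buffer*
  (make-array 5 :element-type 'character
                :initial-contents "hello"
                :adjustable t
                :fill-pointer 5))

;; The result has the same characters and the simple type.
(defparameter *key* (as-simple-character-string *buffer*))
```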

dawg:get-id returns the identifier assigned to the given key.
If the key does not exist in dawg, this function returns nil.

- key:
  - Type: (simple-array character)
- dawg:
  - Type: dawg:dawg
- start:
  - The start position in key
  - Type: positive-fixnum
  - Default: 0
- end:
  - The end position in key
  - Type: positive-fixnum
  - Default: (length key)
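
The start and end parameters select the part of key that is actually looked up. The sketch below models that behavior over a plain sorted list, so it runs without an index file. `model-get-id` is a hypothetical stand-in, not the library's implementation, and taking start and end as keyword arguments is an assumption of this model.

```lisp
;; Hypothetical reference model over a plain sorted list; not the library's code.
(defun model-get-id (key sorted-keys &key (start 0) (end (length key)))
  "Index in SORTED-KEYS of the substring of KEY from START to END, or NIL."
  (position-if (lambda (candidate)
                 ;; Compare the whole candidate with KEY[START, END).
                 (string= candidate key :start2 start :end2 end))
               sorted-keys))

(defparameter *model-keys* '("h" "he" "hell" "hello"))

(model-get-id "hello" *model-keys*)               ; looks up "hello"
(model-get-id "say hello" *model-keys* :start 4)  ; looks up "hello"
(model-get-id "hello" *model-keys* :end 2)        ; looks up "he"
```

Looking up a substring this way avoids allocating a new string with subseq for each query.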

dawg:each-common-prefix executes a common-prefix search for the given key.
For each key in dawg that matches a prefix of key, match-id and match-end are bound and then body is executed.
The loop can be exited early by calling return.

- match-id:
  - The identifier of the key that matched a prefix of the input key
  - Type: positive-fixnum
- match-end:
  - The end position of the matched part in the input key (i.e., the length of the matched key)
  - Type: positive-fixnum
- key:
  - Type: (simple-array character)
- dawg:
  - Type: dawg:dawg
- start:
  - The start position in key
  - Type: positive-fixnum
  - Default: 0
- end:
  - The end position in key
  - Type: positive-fixnum
  - Default: (length key)
- body:
  - The expression to be executed in each iteration

A simplified version of dawg:each-common-prefix that does not bind the match-end variable in each iteration.
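
To make the meaning of match-id, match-end, and the early exit with return concrete, here is a naive reference model of common-prefix search over a plain sorted list. `do-common-prefixes` and `first-prefix-of-at-least` are hypothetical names; the model scans every key linearly, whereas the library searches its DAWG, so it illustrates the reported values only, not the implementation.

```lisp
;; Hypothetical reference model; it scans a list instead of walking a DAWG.
(defmacro do-common-prefixes ((match-id match-end) (key sorted-keys) &body body)
  "Run BODY for every element of SORTED-KEYS that is a prefix of KEY."
  (let ((k (gensym "KEY"))
        (candidate (gensym "CANDIDATE")))
    `(let ((,k ,key))
       ;; LOOP establishes a NIL block, so RETURN in BODY ends the search.
       (loop for ,candidate in ,sorted-keys
             for ,match-id from 0
             for ,match-end = (length ,candidate)
             when (and (<= ,match-end (length ,k))
                       (string= ,candidate ,k :end2 ,match-end))
               do (progn ,@body)))))

(defparameter *sorted-keys* '("h" "he" "hell" "hello" "help"))

(defun first-prefix-of-at-least (n key)
  "Return (id prefix) for the first matching prefix of KEY with length >= N."
  (do-common-prefixes (id end) (key *sorted-keys*)
    (when (>= end n)
      (return (list id (subseq key 0 end))))))

(first-prefix-of-at-least 3 "hello") ; (2 "hell"); "hello" is never visited
```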