Forthress is a Forth dialect made for fun and educational purposes.
Forthress is written in NASM using a bootstrap technique: the main
interpreter/compiler loop (the outer loop) is written in Forthress itself. The inner
interpreter (see `next` in src/forthress.asm) is written in assembly, and so
are some words.
Most of the language traits are fairly close to the classic Forth dialects. Several things should be mentioned about Forthress:
- It uses Indirect Threaded Code
- Strings are null-terminated
- XT stands for execution token, the address immediately following the word header
- The word header has zero bytes around the name. Here is an example for `dup`:
| link (8 bytes) | zero (1) | name (variable) | zero (1) | flags (1) | implementation |
|---|---|---|---|---|---|
| 0x0000000000000000 | 0 | d u p | 0 | 0 | dup_impl |
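
The header layout in the table above can be modelled in a few lines of Python. This is only a sketch of the byte layout, not the Forthress source: the little-endian packing is an assumption based on the x86-64 target, `DUP_IMPL` is a made-up address standing in for `dup_impl`, and the `cfa` function here is a model of what the `cfa` word has to compute.

```python
import struct

def build_header(link: int, name: str, flags: int) -> bytes:
    """Pack a header: link (8 bytes), zero byte, name, zero byte, flags (1 byte)."""
    return (struct.pack("<Q", link) + b"\x00" +
            name.encode("ascii") + b"\x00" + bytes([flags]))

def cfa(memory: bytes, header_addr: int) -> int:
    """Return the XT: the address right after the header starting at header_addr."""
    pos = header_addr + 8 + 1          # skip the link and the leading zero byte
    pos = memory.index(b"\x00", pos)   # find the zero byte that ends the name
    return pos + 1 + 1                 # skip that zero byte and the flags byte

DUP_IMPL = 0x401000                    # hypothetical address of dup_impl
header = build_header(link=0, name="dup", flags=0)
memory = header + struct.pack("<Q", DUP_IMPL)

xt = cfa(memory, 0)
print(xt)                                          # 14 = 8 + 1 + 3 + 1 + 1
print(hex(struct.unpack_from("<Q", memory, xt)[0]))
```

The cell found at the XT holds the address of the implementation, which is what makes the code indirect threaded.
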
Forthress was written as an exercise and an example of how one can create a working Forth interpreter which bootstraps itself.
Forthress is also created as an example for my course book "Low-Level Programming: C, Assembly, and Program Execution on Intel x86-64 Architecture".
- `drop` ( a -- )
- `swap` ( a b -- b a )
- `dup` ( a -- a a )
- `rot` ( a b c -- b c a )
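
The stack effects above can be checked against a small Python model. This is an illustration only: the data stack is represented as a list whose last element is the top of the stack, which is a modelling choice and says nothing about how Forthress lays out its stack in memory.

```python
def drop(stack):
    """( a -- )"""
    stack.pop()

def swap(stack):
    """( a b -- b a )"""
    stack[-1], stack[-2] = stack[-2], stack[-1]

def dup(stack):
    """( a -- a a )"""
    stack.append(stack[-1])

def rot(stack):
    """( a b c -- b c a ): the third element from the top moves to the top."""
    stack.append(stack.pop(-3))

stack = [1, 2, 3]   # 3 is on top
rot(stack)          # [2, 3, 1]
swap(stack)         # [2, 1, 3]
dup(stack)          # [2, 1, 3, 3]
drop(stack)         # [2, 1, 3]
print(stack)
```
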
Arithmetic:
- `+` ( x y -- [ x + y ] )
- `*` ( x y -- [ x * y ] )
- `/` ( x y -- [ x / y ] )
- `%` ( x y -- [ x mod y ] )
- `-` ( x y -- [ x - y ] )
- `<` ( x y -- [ x < y ] )
Logic:
- `not` ( a -- a' ) a' = 0 if a != 0; a' = 1 if a == 0
- `=` ( a b -- c ) c = 1 if a == b; c = 0 if a != b
- `land` ( a b -- a && b ) Logical and
- `lor` ( a b -- a || b ) Logical or
Bitwise:
- `and` ( a b -- a & b ) Bitwise and
- `or` ( a b -- a | b ) Bitwise or
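
The difference between the logical and the bitwise words is easiest to see on operands other than 0 and 1. The Python model below is a sketch: it assumes that `land` and `lor` follow C's `&&` and `||` and always produce 0 or 1, which the notation above suggests but does not state outright. The function names are chosen for this example only.

```python
def f_not(a):
    """not: 1 if a == 0, otherwise 0."""
    return 1 if a == 0 else 0

def f_eq(a, b):
    """=: 1 if a == b, otherwise 0."""
    return 1 if a == b else 0

def land(a, b):
    """Logical and (assumed C-like): 1 if both operands are non-zero."""
    return 1 if (a != 0 and b != 0) else 0

def lor(a, b):
    """Logical or (assumed C-like): 1 if at least one operand is non-zero."""
    return 1 if (a != 0 or b != 0) else 0

def band(a, b):
    """Bitwise and."""
    return a & b

def bor(a, b):
    """Bitwise or."""
    return a | b

print(land(2, 1), band(2, 1))   # 1 0
print(lor(2, 1), bor(2, 1))     # 1 3
```
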
- `'` Read a word, find its XT, and place it on the stack (or zero if there is no such word).
Example:
' dup . ( will output dup's address ) colon "info", info
- `count` ( str -- len ) Accepts a null-terminated string and calculates its length.
- `printc` ( str cnt -- ) Prints a given number of characters from a string.
- `.` Drops the top element from the stack and sends it to stdout.
- `.S` Shows the stack contents. Does not pop elements.
- `init` Stores the data stack base. It is useful for `.S`.
- `docol` This is the implementation of any colon word. The XT itself is not used, but the implementation (`i_docol`) is.
- `exit` Exit from a colon word.
- `r>` Push from the return stack onto the data stack.
- `>r` Pop from the data stack onto the return stack.
- `r@` Non-destructive copy from the top of the return stack to the top of the data stack.
- `find` ( str -- header_addr ) Accepts a pointer to a string, returns a pointer to the word header in the dictionary.
- `cfa` ( word_addr -- xt ) Converts a word header start address to the execution token.
- `emit` ( c -- ) Outputs a single character to stdout.
- `word` ( addr -- len ) Reads a word from stdin and stores it starting at address addr. The word length is pushed onto the stack.
- `number` ( str -- len num ) Parses an integer from a string.
- `prints` ( addr -- ) Prints a null-terminated string.
- `bye` Exits Forthress.
- `syscall` ( call_num a1 a2 a3 a4 a5 a6 -- new_rax new_rdx ) Executes a syscall. The following registers store the arguments (according to the ABI): rdi, rsi, rdx, r10, r8 and r9.
- `branch` Jump to a location. The location is absolute, so using it interactively is nearly impossible; as a low-level primitive to implement `if` and similar constructs, however, it is much more convenient. `branch` is a compile-only word.
- `0branch` Jump to a location if TOS = 0. The location is calculated in a similar way. `0branch` is a compile-only word.
- `lit` Pushes the value immediately following this XT.
- `inbuf` Address of the input buffer (used by the interpreter/compiler).
- `mem` Address of user memory.
- `last_word` Address of the header of the last word.
- `state` State cell address. The state cell stores either 1 (compilation mode) or 0 (interpretation mode).
- `here` Points to the last cell of the word currently being defined.
- `execute` ( xt -- ) Execute the word with this execution token on TOS.
- `@` ( addr -- value ) Fetch a value from memory.
- `!` ( val addr -- ) Store a value at the address.
- `c!` ( char addr -- ) Store one byte at the address.
- `c@` ( addr -- char ) Read one byte starting at addr.
- `,` ( x -- ) Add x to the word being defined.
- `c,` ( c -- ) Add a single byte to the word being defined.
- `create` ( flags name -- ) Create an entry in the dictionary; name is the new name. Only the immediate flag is implemented at the moment.
- `:` Read a word from the current input stream and start defining it.
- `;` End the current word definition.
- `interpret` Forthress interpreter/compiler. Uses `in_fd` internally to know what to interpret.
- `interpret-fd` ( fd -- ) Interpret everything read from file descriptor `fd`.
- `trap` Default implementation of a word that will be executed on SIGSEGV. `trap_dispatch` selects the most recent `trap` version.
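
To make the roles of `docol`, `exit`, `lit` and `0branch` more concrete, here is a toy indirect threaded code machine in Python. It is a sketch under several assumptions and not a port of the assembly in src/forthress.asm: memory is a Python list of cells, an implementation is a Python function, `halt` is a hypothetical word added so the loop can stop, and the colon words `double` and `maybe-double` exist only in this example. The branch target is an absolute address, as described for `branch` above.

```python
class VM:
    def __init__(self):
        self.mem = []      # cell memory; a cell holds an int or an implementation
        self.data = []     # data stack, top at the end
        self.ret = []      # return stack, top at the end
        self.pc = 0        # address of the next XT to run
        self.w = 0         # XT of the word being executed
        self.running = False

    def here(self):
        return len(self.mem)

    def primitive(self, impl):
        """A native word: one cell holding its implementation. Returns the XT."""
        self.mem.append(impl)
        return len(self.mem) - 1

    def colon(self, body):
        """A colon word: a docol cell followed by the XTs of its body."""
        xt = len(self.mem)
        self.mem.append(docol)
        self.mem.extend(body)
        return xt

    def run(self, start):
        self.pc = start
        self.running = True
        while self.running:              # the inner interpreter
            self.w = self.mem[self.pc]   # fetch the next XT
            self.pc += 1
            self.mem[self.w](self)       # indirect jump through the XT cell

def docol(vm):
    vm.ret.append(vm.pc)                 # remember where to return to
    vm.pc = vm.w + 1                     # continue with the first XT of the body

def exit_(vm):
    vm.pc = vm.ret.pop()

def lit(vm):
    vm.data.append(vm.mem[vm.pc])        # push the cell following the lit XT
    vm.pc += 1

def zbranch(vm):
    target = vm.mem[vm.pc]               # absolute target stored after the XT
    vm.pc += 1
    if vm.data.pop() == 0:
        vm.pc = target

def dup(vm):
    vm.data.append(vm.data[-1])

def plus(vm):
    y = vm.data.pop()
    x = vm.data.pop()
    vm.data.append(x + y)

def halt(vm):                            # hypothetical, only stops this model
    vm.running = False

vm = VM()
XT_EXIT, XT_LIT, XT_0BRANCH = vm.primitive(exit_), vm.primitive(lit), vm.primitive(zbranch)
XT_DUP, XT_PLUS, XT_HALT = vm.primitive(dup), vm.primitive(plus), vm.primitive(halt)

# double ( n -- 2n )
XT_DOUBLE = vm.colon([XT_DUP, XT_PLUS, XT_EXIT])

# maybe-double ( n flag -- n' ): skips double when flag is 0
base = vm.here()
XT_MAYBE = vm.colon([XT_0BRANCH, base + 4, XT_DOUBLE, XT_EXIT])

start = vm.here()
vm.mem.extend([XT_LIT, 21, XT_LIT, 1, XT_MAYBE,
               XT_LIT, 5, XT_LIT, 0, XT_MAYBE, XT_HALT])
vm.run(start)
print(vm.data)   # [42, 5]
```

Because the branch target has to be written as an absolute address (`base + 4` here), typing such code by hand is awkward, which matches the remark that `branch` and `0branch` are meant to be compiled by higher-level words.
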
- `dp` Address of a cell storing the end of the global data segment.
- `mem` Address of the start of the global data segment.
- `state` Compile (1) or interpret (0).
- `here` Current position in the current word. Used in compile mode by immediate words.
- `in_fd` The file descriptor from which we are currently reading words.
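
The `state` cell is what lets one loop act as both interpreter and compiler. The Python sketch below shows the usual Forth decision logic under stated assumptions: the dictionary is a plain `dict` mapping a name to an implementation and an immediate flag, compiled code is collected in a list, and an unknown word raises an exception. None of these details are taken from the Forthress source, where the loop is itself written in Forthress.

```python
def dup(stack):
    stack.append(stack[-1])

def plus(stack):
    y = stack.pop()
    x = stack.pop()
    stack.append(x + y)

def interpret(tokens, dictionary, stack, compiled, state):
    """Process tokens; state is 1 for compilation mode, 0 for interpretation mode."""
    for token in tokens:
        entry = dictionary.get(token)        # models find followed by cfa
        if entry is not None:
            impl, immediate = entry
            if state == 1 and not immediate:
                compiled.append(impl)        # models , (append the XT)
            else:
                impl(stack)                  # models execute
            continue
        try:
            value = int(token)               # models number
        except ValueError:
            raise ValueError("unknown word: " + token) from None
        if state == 1:
            compiled.extend(["lit", value])  # lit followed by the value
        else:
            stack.append(value)

dictionary = {"dup": (dup, False), "+": (plus, False)}

stack, compiled = [], []
interpret("2 dup +".split(), dictionary, stack, compiled, state=0)
print(stack)      # [4]

stack, compiled = [], []
interpret("2 dup +".split(), dictionary, stack, compiled, state=1)
print(compiled == ["lit", 2, dup, plus])   # True
```

In interpretation mode the words run immediately; in compilation mode the same input only extends the definition under construction, and a word flagged immediate would still be executed.
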
The Forthress interpreter uses the following words (in order of appearance):
- `dup`
- `find`
- `branch0`
- `cfa`
- `state`
- `fetch`
- `lit`
- `minus`
- `fetch_char`
- `not`
- `swap`
- `drop`
- `comma`
- `exit`
- `execute`
- `number`
- `state`
- `here`
- `equals`
- `prints`
- `bye`

Linux/LXSS only (it relies on system calls). I don't think we should support more systems because this is an educational project first, and multiple preprocessor directives would clutter it to death.
- `src/forthress.asm` defines the entry point, the most important constants, the inner interpreter, memory regions, etc.
- `src/macro.inc` is a utility file which stores macro definitions to sweeten word definitions.
- `src/words.inc` is the assembly file containing all predefined words.
- `src/util.asm` is built into a separate static library containing input and output utility functions to read strings or numbers from an arbitrary descriptor and output them to an arbitrary descriptor.

Forthress uses Linux system calls directly to deal with I/O and does not rely on any library (such as `libc`).