Notes on *
Recently I was working on some nodes for drawing simple geometric primitives on a small LCD using the GA144. The screen resolution was 128 x 128, so the coordinates of a point in this 2D plane can be represented by a pair of 7-bit integers. Conveniently, this lets me pack a point or 2D vector into a single 18-bit word and still have 4 bits left for additional data or flags.
Most geometric transformations and intersection tests require a multiply operation. The GA144 does not have a full 18 x 18 multiplier, but it does have a "Multiply Step" instruction, +*. At first I went to great lengths to avoid doing any multiplies at all, because an 18 x 18 multiply function built on the multiply step takes close to 100 ns.
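: *
a! @p push dup   \ multiplier to a, loop count to R
17
dup or           \ T = 0, multiplicand stays in S
+* unext a ;     \ 18 multiply steps, low word of the product read from a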
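To make the loop above easier to follow off-chip, here is a minimal Python model of the multiply step. It is only a sketch based on the published description of +*: it treats T:A as a 36-bit shift register, sign-extends T and S for the add, and ignores the carry latch used in extended arithmetic mode. The names `to_signed`, `mul_step` and `mul18` are made up for this illustration.

```python
MASK = (1 << 18) - 1          # F18A words are 18 bits wide

def to_signed(word):
    """Interpret an 18-bit word as a two's-complement integer."""
    return word - (1 << 18) if word & (1 << 17) else word

def mul_step(t, s, a):
    """One modelled +* step: conditionally add S to T, then shift T:A right one bit."""
    total = to_signed(t)
    if a & 1:                                  # low multiplier bit selects the add
        total += to_signed(s)
    new_a = ((total & 1) << 17) | (a >> 1)     # low bit of the sum moves into A17
    new_t = (total >> 1) & MASK                # arithmetic shift keeps the sign
    return new_t, new_a

def mul18(multiplicand, multiplier):
    """Model of the 18-step loop: returns (T, A), the high and low product words."""
    t, s, a = 0, multiplicand & MASK, multiplier & MASK
    for _ in range(18):
        t, a = mul_step(t, s, a)
    return t, a

t, a = mul18(1000, 1000)
print(t, a, to_signed(t) * (1 << 18) + a)
```

In this model the low 18 bits of the product end up in a and the high bits in T, so reading a at the end gives the full answer only when the product fits in one word.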
The nice thing about the +* instruction is that there are variations on how it can be used, especially if you don't need a full 18-bit multiply. In my case I only needed a 7-bit x 7-bit operation. Like the + instruction, +* needs another cycle for the ripple carry to propagate through the second half of the 18-bit word. The unext in the examples so far takes care of that. All of this is documented nicely at https://colorforth.github.io/arith.htm.
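: *7b
a! @p push dup   \ multiplier to a, loop count to R
6
dup or           \ T = 0, pre-shifted multiplicand stays in S
+* unext ;       \ 7 multiply steps, product left in T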
In the short multiply function, the signed multiplicand must be shifted left by the number of bits being multiplied (seven here). So, in my case, the format is:
multiplier in a reg = 00000000000|XXXXXXX
multiplicand in S = 0000XXXXXXX|0000000
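The alignment above can be checked with a small Python model. This is a sketch under assumptions: `mul_step` is an approximation of +* (T:A treated as a 36-bit shift register, carry latch ignored), and `mul7` is a hypothetical name. With the multiplicand pre-shifted left by 7, seven steps leave the product in T, which would explain why the short version does not need to read a at the end.

```python
MASK = (1 << 18) - 1          # 18-bit word

def to_signed(word):
    """Interpret an 18-bit word as a two's-complement integer."""
    return word - (1 << 18) if word & (1 << 17) else word

def mul_step(t, s, a):
    """One modelled +* step: add S to T if A0 is set, then shift T:A right."""
    total = to_signed(t) + (to_signed(s) if a & 1 else 0)
    return (total >> 1) & MASK, ((total & 1) << 17) | (a >> 1)

def mul7(m, k):
    """7-bit x 7-bit product using seven modelled +* steps."""
    s = (m & 0x7F) << 7       # multiplicand in S: 0000XXXXXXX|0000000
    a = k & 0x7F              # multiplier in a:   00000000000|XXXXXXX
    t = 0
    for _ in range(7):
        t, a = mul_step(t, s, a)
    return t                  # the product is left in T

print(mul7(100, 100))
```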
In addition to multiplying 7-bit integers, I also found the need to work with fixed-point fractional numbers. What's really cool is that some of this comes for free with the short multiply. The trick is that the lower 7 (or up to 9) bits of the multiplicand can be used as fractional bits without changing any of the code. So what we get is a 7i.7f x 7i = 7i multiply, where the result is a truncated integer. In my case this is perfect: most of the time the result I am looking for will be quantized to the 128 x 128 2D plane anyway.
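To see the fractional case concretely, the same kind of Python model can be fed a 7i.7f multiplicand. As before, this is only a sketch: `mul_step` approximates +* (carry latch ignored), and `to_fixed` and `mul_fixed` are names invented for the example. In this model the integer part of the product ends up in T, and the fractional bits that were shifted out land in the top of a.

```python
MASK = (1 << 18) - 1          # 18-bit word
FRAC_BITS = 7                 # fractional bits in the 7i.7f multiplicand

def to_signed(word):
    """Interpret an 18-bit word as a two's-complement integer."""
    return word - (1 << 18) if word & (1 << 17) else word

def mul_step(t, s, a):
    """One modelled +* step: add S to T if A0 is set, then shift T:A right."""
    total = to_signed(t) + (to_signed(s) if a & 1 else 0)
    return (total >> 1) & MASK, ((total & 1) << 17) | (a >> 1)

def to_fixed(x):
    """Encode a non-negative value as 7i.7f (scaled by 128, truncated)."""
    return int(x * (1 << FRAC_BITS)) & 0x3FFF

def mul_fixed(s, k):
    """7i.7f x 7i with seven modelled steps: returns (integer part, fraction bits)."""
    t, a = 0, k & 0x7F
    for _ in range(7):
        t, a = mul_step(t, s, a)
    return t, a >> 11         # top 7 bits of a hold the discarded fraction

whole, frac = mul_fixed(to_fixed(1.5), 5)
print(whole, frac / 128)
```

Here 1.5 x 5 comes out as 7 in T, with the dropped half still visible in a, which matches the truncating behaviour described above.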
The last trick I found is a nice optimization. If you are multiplying either 9i.0f x 9i or 0i.9f x 9i, the second cycle for the ripple carry is not needed, which makes the operation much faster. In the last example below, the multiplier is already set in a. My calculations put this function at around 21.6 ns, or about 46 MIPS. Not bad!
: mul
dup dup or
+* +* +* +*
+* +* +* ;