Try to avoid promoting to int (16-bit) when doing operations with char (8-bit) #14

Currently we do all arithmetic in 16-bit, even when the operands and destination are 8-bit (i.e. `char` types). This is a very single-pass-compiler and C thing to do (surely this is what the usual arithmetic conversions were designed for), but since uxn has native 8-bit operations and a limited stack size, it's neither efficient nor particularly æsthetically pleasing in the generated assembly. This will get worse if we start doing sign extension when promoting `char` to `int` (see https://github.com/lynn/chibicc/issues/9).

So, it would be nice if `(char)(some_char * 2 + 3)` could be codegen'd as `#02 MUL #03 ADD` rather than `#0002 MUL2 #0003 ADD2`. As I see it, there are two ways this could be done: in a “single-pass” fashion by changing the codegen step, or with some sort of later optimisation pass.

I am optimistic about the former approach. I think we could do it by propagating cast/conversion information downwards when doing codegen for expressions.

Currently the codegen behaviour is something like:

* When encountering `(char)(some_char * 2 + 3)`:
   * Recurse to generate `(some_char * 2 + 3)`
      * Recurse to generate `some_char * 2`
         * Recurse to generate `some_char`
            * Output code for loading `some_char`
            * Output code to extend to `int` (something like `#00 SWP`)
         * Recurse to generate `2`
            * Output `#0002`
         * Output `MUL2`
      * Recurse to generate `3`
         * Output `#0003`
      * Output `ADD2`
   * Output code to truncate to `char` (something like `NIP`)
   * Output code to extend to `int` (something like `#00 SWP`)

In the new system, expression codegen would take a new flag, say `truncate_to_byte`, and the behaviour would look something like:

* When encountering `(char)(some_char * 2 + 3)`:
   * Recurse to generate `(some_char * 2 + 3)` _with `truncate_to_byte` set_
      * Recurse to generate `some_char * 2` _with `truncate_to_byte` set_
         * Recurse to generate `some_char` _with `truncate_to_byte` set_
            * Output code for loading `some_char`
            * ~Output code to extend to `int` (something like `#00 SWP`)~
         * Recurse to generate `2` _with `truncate_to_byte` set_
            * Output _`#02`_ ~`#0002`~
         * Output _`MUL`_ ~`MUL2`~
      * Recurse to generate `3` _with `truncate_to_byte` set_
         * Output _`#03`_ ~`#0003`~
      * Output _`ADD`_ ~`ADD2`~
   * ~Output code to truncate to `char` (something like `NIP`)~
   * Output code to extend to `int` (something like `#00 SWP`) _but only if `truncate_to_byte` is not set_

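A trickier case is something like `(char)((1 << 8) >> 8)`, where we can't use 8-bit operations all the way down: in 16 bits it evaluates to 1, but in 8 bits `1 << 8` is already 0. That can be handled by not propagating `truncate_to_byte` through operators whose low result byte depends on the high bytes of their operands, such as `>>`, `/` and `%`.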

An important thing to note here is that it's strictly a codegen optimisation: the type information isn't affected. I don't think it's possible to do the same trick during type assignment instead; it would break things.
