# LoongArch Emulator (laemu)

A high-performance command-line emulator for executing LoongArch ELF binaries.

## Overview

`laemu` (LoongArch Emulator) is a userspace emulator that runs 64-bit LoongArch (LA64) binaries on Linux, macOS, Windows, and FreeBSD hosts. It emulates the Linux syscall ABI, so statically linked LoongArch programs run without LoongArch hardware.

## Building

### Quick Build (Using build.sh)

The easiest way to build is using the provided build script:

```bash
cd emulator
./build.sh
```

For performance-critical workloads:

```bash
./build.sh --native
```

For help and all available options:

```bash
./build.sh --help
```

### Build Script Options

**Performance Options:**
- `-n, --native` - Enable native CPU optimizations (`-march=native`)
- `--lto` - Enable link-time optimization (enabled by default)
- `--no-lto` - Disable link-time optimization
- `-d, --debug` - Build in Debug mode (default: Release)

**Library Feature Options:**
- `--masked-memory-bits N` - Set masked memory arena size to 2^N bytes
  - Example: `--masked-memory-bits 32` for a 4 GiB arena
  - Default: 0 (disabled, full address range)
- `--no-threaded` - Disable threaded dispatch optimization
- `--tailcall-dispatch` - Use tailcall dispatch instead of threaded dispatch
- `--embed <file.c>` - Embed pre-compiled binary translation from file

**Examples:**
```bash
# Standard optimized build
./build.sh

# Maximum performance with native optimizations
./build.sh --native

# With a 4 GiB masked memory arena
./build.sh --masked-memory-bits 32

# Build with embedded pre-compiled translation
./build.sh --bintr --embed program_bintr.c

# Debug build
./build.sh --debug
```

### Manual CMake Build

If you prefer direct CMake:

```bash
cd emulator
mkdir -p .build
cd .build
cmake .. -DCMAKE_BUILD_TYPE=Release
make
```

**CMake Options:**
- `NATIVE=ON` - Enable native CPU optimizations (`-march=native`)
- `LTO=ON` - Enable link-time optimization (default: ON)
- `LA_MASKED_MEMORY_BITS=N` - Set masked memory arena to 2^N bytes (0 = disabled)
- `LA_DEBUG=ON/OFF` - Enable debug output (default: OFF)
- `LA_BINARY_TRANSLATION=ON/OFF` - Enable binary translation (default: ON)
- `LA_THREADED=ON/OFF` - Enable threaded dispatch (default: ON)
- `LA_TAILCALL_DISPATCH=ON/OFF` - Use tailcall dispatch (default: OFF)
- `LA_EMBED_BINTR="path"` - Embed pre-compiled binary translation from file

**Example:**
```bash
cmake .. -DCMAKE_BUILD_TYPE=Release -DNATIVE=ON -DLTO=ON -DLA_MASKED_MEMORY_BITS=32
make
```

The emulator binary will be located at `.build/laemu`.

## Usage

```
laemu [options] <program> [args...]
```

### Options

| Option | Long Form | Description |
|--------|-----------|-------------|
| `-h` | `--help` | Show help message |
| `-v` | `--verbose` | Enable verbose output (loader & syscalls) |
| `-s` | `--silent` | Suppress all output except errors |
|  | `--precise` | Use precise simulation mode (slower, for verification) |
|  | `--stats` | Show bytecode usage statistics after execution |
| `-f <num>` | `--fuel <num>` | Maximum instructions to execute (default: 2000000000)<br/>Use 0 for unlimited |
| `-m <size>` | `--memory <size>` | Maximum memory in MiB (default: 4096) |
| `-n` | `--no-translate` | Disable binary translation (interpret only) |
|  | `--no-regcache` | Disable register caching in translated code |
|  | `--fast` | Enable fastest binary translation (unsafe optimizations) |
|  | `--nbit-as` | Use automatic N-bit address masking in binary translation |
| `-T` | `--trace` | Trace binary translation execution |
| `-O <file>` | `--output <file>` | Write generated translation code to file |

**Note:** The emulator automatically detects architecture from the ELF binary header.

### Examples

**Execute a program:**
```bash
./laemu program.elf
```

**Pass arguments to the guest program:**
```bash
./laemu program.elf arg1 arg2 arg3
```

**Verbose mode:**
```bash
./laemu --verbose program.elf
```

**Show bytecode statistics:**
```bash
./laemu --stats program.elf
```

**Limit execution and memory:**
```bash
./laemu --fuel 1000000 --memory 256 program.elf
```

**Silent execution (only errors and exit code):**
```bash
./laemu --silent program.elf
```

**Unlimited execution:**
```bash
./laemu --fuel 0 long_running_program.elf
```

**Generate binary translation code (for embedding):**
```bash
./laemu -O program_bintr.c program.elf
```

**Generate fastest embedded translation (unsafe optimizations):**
```bash
./laemu --fast -O program_bintr.c program.elf
```
This skips memory bounds checks for maximum performance.

**Run in precise simulation mode (slower, for verification):**
```bash
./laemu --precise program.elf
```

## Output

### Default Mode
```
Program exited with code 0 (0.142857 seconds)
```
When `-f`/`--fuel` is given, the number of instructions executed is also shown.
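
For scripting, the summary line can be split into its exit code and elapsed time. The following is a minimal sketch that assumes the line looks exactly like the sample above. `parse_summary` is a hypothetical helper, not part of `laemu`, and it prints nothing if the line does not match (for example, the verbose-mode line without a timing).

```bash
# Hypothetical helper: extract "<exit_code> <seconds>" from a summary line
# shaped like "Program exited with code 0 (0.142857 seconds)".
# Prints nothing when the line has a different shape.
parse_summary() {
  printf '%s\n' "$1" |
    sed -n 's/^Program exited with code \(-\{0,1\}[0-9][0-9]*\) (\([0-9.][0-9.]*\) seconds).*$/\1 \2/p'
}

# Demo on the sample line from this README
parse_summary "Program exited with code 0 (0.142857 seconds)"
```

In a real run, the line would come from the emulator's output, for example via `./laemu program.elf | tail -n 1`, assuming the summary is the last line printed.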

### Verbose Mode
Shows loader information and syscall traces:
```
Loaded 4096 bytes from program.elf
Detected LA64 architecture
Arguments:
  program.elf
  arg1
Program entry point at: 0x120000790
[Syscall traces...]
Program exited with code 0
```

## Exit Codes

The emulator returns:
- **Guest exit code** - if the program runs to completion
- **-1** - if execution times out (instruction limit exceeded) or a machine exception occurs; POSIX shells report this status as 255
- **1** - on a fatal error (file not found, invalid binary, etc.)
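
A wrapper script can map these statuses to coarse labels. The sketch below assumes a POSIX shell, where a process exit status of -1 is observed as 255. `classify_status` is a hypothetical helper name. A guest that itself exits with 1 or 255 cannot be told apart from an emulator error this way.

```bash
# Hypothetical helper: label a laemu exit status.
# Assumption: POSIX shell, so the emulator's -1 shows up as 255.
# Caveat: guest exit codes 1 and 255 are ambiguous with emulator errors.
classify_status() {
  case "$1" in
    0)   echo "ok" ;;
    1)   echo "fatal-or-guest-1" ;;
    255) echo "timeout-or-exception" ;;
    *)   echo "guest-exit-$1" ;;
  esac
}

# Usage with a real run (program path is a placeholder):
#   ./laemu --silent program.elf; classify_status "$?"
classify_status 255
```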

## Environment Variables (Windows/Non-POSIX Fallback)

On platforms without `getopt_long` support, configuration can be done via environment variables:

| Variable | Description |
|----------|-------------|
| `VERBOSE` | Set to enable verbose output |
| `SILENT` | Set to enable silent mode |
| `TIMING` | Set to enable timing output |
| `FUEL` | Maximum instructions to execute (0 = unlimited) |
| `MEMORY` | Maximum memory in MiB |

**Note:** Architecture is always auto-detected from the ELF binary.

**Example (Windows):**
```cmd
set TIMING=1
set MEMORY=256
laemu.exe program.elf
```

**Example (Unix):**
```bash
VERBOSE=1 TIMING=1 ./laemu program.elf
```

## Features

### Performance
- Fast interpreter with decoder cache
- **Binary translation** support for near-native performance
  - **Embedded translations**: ~75% of native speed (always activated when available)
  - **JIT compilation**: ~40% of native speed (fallback via libtcc)
  - See [Embedded Binary Translation Guide](../docs/EMBEDDED_BINTR.md)
- Efficient memory management with flat memory arena
- Native optimizations available via `--native`
- Link-time optimization enabled by default
- Threaded bytecode dispatch for maximum speed
- Optional tailcall dispatch for specific workloads

### Compatibility
- Linux syscall emulation
- Static binary support
- LA64 architecture (64-bit LoongArch)
- Cross-platform (Linux, macOS, Windows, FreeBSD)

### Execution Control
- Configurable instruction limits (fuel)
- Memory limits
- Silent mode for scripting
- Precise mode for verification
- Bytecode usage statistics with `--stats`

### Memory Management
- **Flat Memory Arena**: Uses a single contiguous memory region
- **Masked Memory Bits**: Optional power-of-two memory size restriction
  - When enabled (e.g., `--masked-memory-bits 32`), memory addresses are masked to fit within 2^N bytes
  - Example: `--masked-memory-bits 32` creates a 4 GiB arena where addresses wrap around
  - Default: disabled (bounds-checked arena)
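
The arithmetic behind the masked arena can be checked in the shell. This sketch only illustrates the 2^N sizing and the wrap-around of masked addresses. The helper names are made up for the example, and the emulator's internal implementation may differ.

```bash
# Illustration of the arithmetic only; not the emulator's actual code.
arena_size() { echo $(( 1 << $1 )); }               # arena size: 2^N bytes
mask_addr()  { echo $(( $1 & ((1 << $2) - 1) )); }  # keep the low N bits of an address

arena_size 32              # size in bytes of a 32-bit arena
mask_addr 0x100000010 32   # bit 32 is dropped, so the address wraps to a low offset
```

With N = 32, an address just past the 4 GiB boundary maps back to the start of the arena instead of faulting, which is the wrap-around described above.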

### Program Arguments
Arguments passed after the program path are forwarded to the guest:
```bash
./laemu program.elf --guest-arg value
# Guest sees: argv[0]="program.elf", argv[1]="--guest-arg", argv[2]="value"
```

## Use Cases

### Running Cross-Compiled Programs

Compile and run LoongArch programs on x86_64:

```bash
loongarch64-linux-gnu-gcc -static hello.c -o hello.elf
./laemu hello.elf
```

### Performance Testing

Benchmark LoongArch binaries:

```bash
# The default output already reports elapsed time
./laemu benchmark.elf
```

### CI/CD Integration

Automated testing of LoongArch software:

```bash
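#!/bin/bash
status=0
for test in tests/*.elf; do
  echo "Running $test..."
  # Treat any non-zero emulator exit status as a test failure
  if ./laemu --silent "$test"; then echo "PASS"; else echo "FAIL"; status=1; fi
done
exit "$status"  # non-zero if any test failed, so the CI job fails too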
```

### Development Workflow

Test cross-compiled code without LoongArch hardware:

```bash
# Build
loongarch64-linux-gnu-gcc -static myapp.c -o myapp.elf

# Test
./laemu myapp.elf test_input.txt

# Debug
./laemu --verbose myapp.elf
```

## Binary Requirements

The emulator supports:
- **Statically-linked** LoongArch ELF binaries
- **LA64** architecture
- Linux syscall ABI

Compile guest programs with:
```bash
loongarch64-linux-gnu-gcc -static -Wl,-Ttext-segment=0x200000 program.c -o program.elf
```

## Troubleshooting

### "Failed to open file"
- Check the file path
- Ensure the file exists and is readable

### "Execution timeout"
- Program exceeded instruction limit
- Increase fuel: `--fuel 10000000000`
- Or use unlimited: `--fuel 0`

### Slow Execution
- Ensure you built in Release mode
- Try native optimizations: `-DNATIVE=ON`
- Enable LTO: `-DLTO=ON`

### "Machine exception"
- Guest program has a bug (segfault, illegal instruction, etc.)
- Try verbose mode: `--verbose`
- Or use the debugger: `../tests/debug_test program.elf`

## Comparison with Debugger

libloong provides two executables:

| Tool | Purpose | Output | Speed |
|------|---------|--------|-------|
| `laemu` | Production emulator | Minimal | Fast |
| `debug_test` | Instruction tracer | Full traces | Slow |

Use `laemu` for:
- Running programs normally
- Performance testing
- CI/CD integration
- Production workflows

Use `debug_test` for:
- Debugging guest programs
- Verifying emulator correctness
- Understanding instruction execution
- Development and testing

## Integration with CI/CD

Example GitHub Actions usage:

```yaml
- name: Build emulator
  run: |
    cd emulator
    ./build.sh

- name: Run tests
  run: |
    ./emulator/.build/laemu test_suite.elf
```

See [.github/workflows/emulator-build.yml](../.github/workflows/emulator-build.yml) for a complete example.

## Architecture

The emulator is built on libloong:

```
laemu (CLI)
    ↓
Machine
    ↓
┌─────────────┬──────────────┐
│ CPU         │ Memory       │
│ - Decoder   │ - Arena      │
│ - Executor  │ - Execute    │
│ - Registers │   segments   │
└─────────────┴──────────────┘
    ↓
Linux Syscall Emulation
```

## Binary Translation

For maximum performance, the emulator supports binary translation:

1. **Build with binary translation**: `./build.sh --bintr` (recommended for all CLI builds)
2. **Generate translation code**: Use `-O <file.c>` to generate C code for embedding
3. **Build with embedding**: Use `./build.sh --bintr --embed <file.c>`
4. **Automatic activation**: Embedded translations activate automatically (~75% native performance)
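
Steps 2 and 3 can be sketched as a small helper that prints the two commands for a given guest binary. Several things here are assumptions: `embed_plan` is a hypothetical name, the commands are meant to run from the `emulator/` directory with `laemu` already built at `.build/laemu`, and the `<name>_bintr.c` file name is just the convention used in this README's examples.

```bash
# Hypothetical helper: print the generate-then-embed commands for one guest ELF.
# Assumptions: run from emulator/, laemu already built at .build/laemu.
embed_plan() {
  plan_elf=$1
  plan_out="$(basename "$plan_elf" .elf)_bintr.c"   # e.g. program.elf -> program_bintr.c
  echo "./.build/laemu -O $plan_out $plan_elf"      # step 2: generate translation code
  echo "./build.sh --bintr --embed $plan_out"       # step 3: rebuild with it embedded
}

embed_plan program.elf
```

Printing the commands first makes it easy to review them before running them by hand. The helper does no quoting, so paths with spaces would need extra care.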

**Recommendation**: Always build the CLI with `--bintr` enabled. You can disable translation for specific programs at runtime using `--no-translate` if needed. This gives you maximum flexibility without needing to rebuild.

For details, see [Embedded Binary Translation Guide](../docs/EMBEDDED_BINTR.md).

## See Also

- [Embedded Binary Translation](../docs/EMBEDDED_BINTR.md) - Near-native performance guide
- [Debugger](../tests/README.md) - Instruction-level debugging tool
- [Library Integration](../docs/INTEGRATION.md) - Using libloong in your project
- [API Reference](../docs/API.md) - Full library API
- [ISA Support](../docs/ISA.md) - Supported LoongArch instructions
