I can achieve around 1 token per second on a Ryzen 7 3700X on Linux with the 65B model and 4-bit quantization.
If we used 8-bit instead, would it run faster? I have 128GB RAM. Is 8-bit already supported?
```
$ ./main -m models/65B/ggml-model-q4_0.bin -t 8 -n 128
main: mem per token = 70897348 bytes
main:     load time = 14010.35 ms
main:   sample time =   335.09 ms
main:  predict time = 140527.48 ms / 1089.36 ms per token
main:    total time = 157951.48 ms
```
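A rough back-of-envelope suggests the opposite of a speedup: CPU token generation is typically memory-bandwidth bound, since every weight must be streamed from RAM once per token, so doubling the weight size (q4 to q8) should roughly double the time per token. The sketch below assumes a ~40 GB/s memory bandwidth for a dual-channel DDR4 setup like the 3700X's; the real figure depends on your RAM configuration.

```python
# Bandwidth-bound lower bound on time per token, 4-bit vs 8-bit weights.
# All numbers here are assumptions for illustration, not measurements.

params = 65e9        # 65B parameters
bandwidth = 40e9     # ~40 GB/s, assumed dual-channel DDR4 (varies by system)

for bits in (4, 8):
    bytes_per_token = params * bits / 8   # each weight read once per token
    secs = bytes_per_token / bandwidth
    print(f"q{bits}: ~{secs:.2f} s/token (bandwidth-bound lower bound)")
```

The q4 estimate (~0.8 s/token) is in the same ballpark as the ~1.09 s/token measured above, which supports the bandwidth-bound reading; under that assumption q8 would be slower, not faster.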