Support for 4 bit Quantization #580

@vikigenius

Description

Language-model progress has been rapid recently, and with the LLaMA weights being released, a lot of progress is being made on the C++ side:

https://github.com/ggerganov/llama.cpp

I see that fp16 support is on the roadmap.

But it might also be a good idea to consider supporting 4-bit quantization and related techniques. Is that something that will be considered?
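For context, the scheme used in llama.cpp splits each weight tensor into small blocks and stores one scale per block plus a 4-bit integer per weight. Below is a simplified, hedged sketch of that idea in plain Python; the block size, scale formula, and function names are illustrative and do not reproduce llama.cpp's exact on-disk format.

```python
# Simplified sketch of block-wise symmetric 4-bit quantization,
# similar in spirit to (but not byte-identical with) llama.cpp's Q4_0.
# Each block of 32 floats is stored as one float scale plus 32 small ints.

BLOCK = 32  # illustrative block size

def quantize_q4(values):
    """Quantize floats to 4-bit ints per block; returns (scale, ints) pairs."""
    blocks = []
    for i in range(0, len(values), BLOCK):
        chunk = values[i:i + BLOCK]
        amax = max(abs(v) for v in chunk) or 1.0
        scale = amax / 7.0  # map [-amax, amax] onto the int range [-7, 7]
        q = [max(-8, min(7, round(v / scale))) for v in chunk]
        blocks.append((scale, q))
    return blocks

def dequantize_q4(blocks):
    """Reconstruct approximate floats from (scale, ints) blocks."""
    out = []
    for scale, q in blocks:
        out.extend(v * scale for v in q)
    return out

weights = [0.12, -0.5, 0.33, 0.9, -0.07] * 8  # 40 dummy weights
restored = dequantize_q4(quantize_q4(weights))
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max abs error: {max_err:.4f}")
```

The round-trip error per weight is bounded by half the block scale, which is why storage drops roughly 8x versus fp32 while quality degrades only modestly.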
