Results in ~11 GB of weights vs. 16 GB; now available in Hugging Face Transformers via `load_in_8bit=True`: https://huggingface.co/hivemind/gpt-j-6B-8bit
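The size reduction comes from storing weights as 1-byte int8 instead of 2-byte fp16. A minimal sketch of absmax int8 quantization illustrates the idea (the actual bitsandbytes backend behind `load_in_8bit=True` uses a more sophisticated vector-wise scheme with outlier handling; this simplified version is for intuition only):

```python
import numpy as np

def quantize_absmax(w):
    """Absmax int8 quantization: scale weights so the largest maps to 127."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)  # 1 byte per parameter
    return q, scale

def dequantize(q, scale):
    """Recover approximate fp32 weights for computation."""
    return q.astype(np.float32) * scale

w = np.random.randn(64, 64).astype(np.float32)
q, scale = quantize_absmax(w)
w_hat = dequantize(q, scale)

# int8 halves storage vs. fp16 (and quarters it vs. fp32);
# per-weight rounding error is bounded by half a quantization step.
assert q.dtype == np.int8
assert np.max(np.abs(w - w_hat)) <= 0.5 * scale + 1e-6
```

In practice you would not call this yourself; passing `load_in_8bit=True` to `AutoModelForCausalLM.from_pretrained` (with `bitsandbytes` installed and a CUDA GPU) handles quantization at load time.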