Hello! A portion of the MoE implementation for Mixtral is copied directly from MegaBlocks. It's somewhat error-prone code, and I've been meaning to factor out helpers for it that we could reuse to avoid having this duplicated in vLLM. If this is interesting to you, I'll send a PR :)