Don't disable uneven k to support more headdims #21
Adding the flag increased the wheel size of
Thanks @WoosukKwon, I guess that's fairly significant. The motivation here is that we currently depend on flash attention for certain features like multi-step scheduling, so it's a big performance downside for models that have these other head sizes (in particular, some of the IBM Granite models use head_dim 80).
@njhill We can just add the head size 80 to Would that work for you?
@WoosukKwon yes, I think for our immediate needs that would be great, if you're sure that will be sufficient!
@WoosukKwon I've opened #22 for this.
@njhill Since the idea above doesn't work, can you check how much this PR affects the vLLM wheel size?
Originally disabled by @WoosukKwon in eee8e47