We seem to have a working implementation of [AudioLDM2](https://github.com/haoheliu/AudioLDM2). I understand you have already mentioned that you will implement Vocos and AudioCraft, but it seems to me that AudioLDM2 produces better outputs. Could you please have a look? :)