Question about evaluating motion editing (motion in-betweening) without a text condition

I noticed that the code does not include motion editing without text, and that the default training does not use classifier-free guidance (CFG). How can I perform motion in-betweening without providing any text input? I tried both setting the word embeddings to zero and leaving the text input empty, but the results were extremely poor in either case.
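
To make the question concrete, here is a minimal sketch of the condition dropout that, as far as I understand, CFG-style training normally relies on: each sample's text condition is randomly replaced with a null condition, so the model also learns the text-free case. Without it, zeroed embeddings at inference are probably out of distribution, which might explain the poor results. The function name `drop_text_condition`, the parameter `p_uncond`, and the nested-list layout are my own illustration and are not taken from this repository.

```python
import random


def drop_text_condition(word_embs, p_uncond=0.1, rng=None):
    """Zero the word embeddings of each sample with probability p_uncond.

    Hypothetical helper, not part of the repository.
    word_embs: list of samples, each a list of token vectors (lists of floats).
    Returns (new_embs, dropped), where dropped[i] is True if sample i was zeroed.
    """
    rng = rng or random.Random()
    new_embs, dropped = [], []
    for sample in word_embs:
        # Decide per sample whether to train it as "unconditional".
        drop = rng.random() < p_uncond
        if drop:
            # Null condition: same shape as the input, all zeros.
            new_sample = [[0.0] * len(tok) for tok in sample]
        else:
            # Keep the text condition (copied so the input is not mutated).
            new_sample = [list(tok) for tok in sample]
        new_embs.append(new_sample)
        dropped.append(drop)
    return new_embs, dropped


# Example: with p_uncond=1.0 every sample loses its text condition.
batch = [[[0.5, -1.0], [2.0, 3.0]], [[1.0, 1.0]]]
null_batch, dropped = drop_text_condition(batch, p_uncond=1.0)
```

With `p_uncond=1.0` this reproduces what I tried at inference time (all embeddings zeroed). My assumption is that a small value such as 0.1 would have to be applied during training for that to work, but I would appreciate confirmation of whether this is the intended route.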