We need to explore integrating full-duplex (simultaneous input and output) or, eventually, full-triplex (input, output, and animation) models to gain finer control over real-time interactions. This would allow:
- Continuous voice input while output is being generated.
- Coordinated voice + animation playback.
- Reduced latency and smoother conversational flow.
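
As a rough illustration of the full-duplex idea, the sketch below runs a simulated input stream and a simulated output stream concurrently with `asyncio`, so that input keeps arriving while output is still being produced. Everything here is an assumption made for illustration: the function names (`capture_input`, `play_output`, `run_full_duplex`) are hypothetical, and plain lists stand in for real audio capture and playback.

```python
import asyncio


async def capture_input(chunks, events):
    # Stand-in for a microphone stream (hypothetical): log each chunk as it arrives.
    for chunk in chunks:
        events.append(("in", chunk))
        await asyncio.sleep(0)  # yield to the event loop so output can proceed


async def play_output(tokens, events):
    # Stand-in for speech playback (hypothetical): emit one token at a time.
    for token in tokens:
        events.append(("out", token))
        await asyncio.sleep(0)  # yield to the event loop so input can proceed


async def run_full_duplex(chunks, tokens):
    # Run both directions concurrently; neither waits for the other to finish.
    events = []
    await asyncio.gather(
        capture_input(chunks, events),
        play_output(tokens, events),
    )
    return events


if __name__ == "__main__":
    for kind, value in asyncio.run(
        run_full_duplex(["hi", "wait"], ["Sure", "here", "is"])
    ):
        print(kind, value)
```

In this sketch the input and output events end up interleaved in a single log rather than one stream completing before the other starts. A full-triplex variant could, under the same assumptions, add a third coroutine for animation to the `asyncio.gather` call.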