The motivation for this project is to communicate with robots through natural speech. To do this, we use DeepSpeech, an open-source speech-to-text engine developed by Mozilla.
To get started, install the following dependencies.
ROS Noetic packages:
sudo apt install ros-noetic-turtlebot3-gazebo #simulation env
sudo apt install ros-noetic-turtlebot3-slam #slam
sudo apt install ros-noetic-turtlebot3-navigation #navigation stack
sudo apt install ros-noetic-gmapping #for mapping
sudo apt install ros-noetic-dwa-local-planner #dynamic window approach (DWA) local planner
sudo apt install ros-noetic-behaviortree-cpp-v3 #for task planning
Install the following Python packages:
pip3 install mediapipe #gesture recognition
#for using deepspeech
pip3 install deepspeech #speech to text
sudo apt-get install python3-pyaudio #audio I/O (PortAudio bindings)
# for using Whisper
pip3 install git+https://github.com/openai/whisper.git
# to update Whisper to the latest version of the repository later:
pip3 install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
# Whisper also requires ffmpeg; on Ubuntu or Debian:
sudo apt update && sudo apt install ffmpeg
# for text-to-speech
sudo apt install libespeak-dev
pip3 install pyttsx3
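Before building, you can optionally confirm that the Python packages above import cleanly. The shell function below is a hypothetical helper, not part of this repository; the module names deepspeech, whisper, pyttsx3, mediapipe, and pyaudio are assumed to be the import names of the packages installed above.

```shell
# Hypothetical helper (not part of hri_speech): try to import each named
# Python module with python3 and print "ok: <name>" or "missing: <name>".
# Returns non-zero if any module fails to import.
check_py_modules() {
  local rc=0 mod
  for mod in "$@"; do
    if python3 -c "import ${mod}" >/dev/null 2>&1; then
      echo "ok: ${mod}"
    else
      echo "missing: ${mod}"
      rc=1
    fi
  done
  return "${rc}"
}

# Assumed import names for the packages installed above.
check_py_modules deepspeech whisper pyttsx3 mediapipe pyaudio || echo "some Python modules are missing"
```

Importing whisper and mediapipe can take a few seconds. A "missing" line can also mean the package was installed for a different interpreter than python3.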
After installing all the dependencies, go to the src folder of your catkin workspace and clone this repository.
cd <your_catkin_workspace>/src
git clone git@github.com:brukg/hri_speech.git
Then build the project from the workspace root and source your workspace.
cd <your_catkin_workspace>
catkin build
source devel/setup.bash # or: source devel/setup.zsh, depending on your shell
Next, set the TurtleBot3 model used by the TurtleBot3 simulation and navigation packages, go to the project directory, and create a folder named models.
export TURTLEBOT3_MODEL=waffle
roscd hri_speech # or: cd <your_catkin_workspace>/src/hri_speech
mkdir models
Go inside the models folder and download the two DeepSpeech model files: the acoustic model (.pbmm) and the external scorer (.scorer).
cd models
# make sure the files below are downloaded into the project's models directory
wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.pbmm
wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.scorer
The OpenAI davinci model is used to extract semantic meaning from the transcribed text. To use it, set up an OpenAI API key and export it before running the project, replacing the string in the command below with your key.
export OPENAI_API_KEY="key obtained from openai account"
Finally, run the project with the following command.
roslaunch hri_speech start_all.launch
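If roslaunch fails early, a quick check that the environment variables and model files from the steps above are in place can help narrow things down. The function below is a hypothetical sketch (check_hri_setup is not part of this repository). It assumes it is run from the package directory in the same shell you launch from, and it only checks that TURTLEBOT3_MODEL and OPENAI_API_KEY are exported and non-empty and that both DeepSpeech 0.9.3 files exist and are non-empty.

```shell
# Hypothetical pre-launch check (not part of hri_speech).
# Usage: check_hri_setup [models_dir]   (defaults to ./models)
check_hri_setup() {
  local models_dir="${1:-models}" rc=0 var file
  # printenv only sees exported variables, which are what roslaunch inherits.
  for var in TURTLEBOT3_MODEL OPENAI_API_KEY; do
    if [ -z "$(printenv "${var}")" ]; then
      echo "not set: ${var}"
      rc=1
    fi
  done
  # The two DeepSpeech 0.9.3 files downloaded into the models folder.
  for file in deepspeech-0.9.3-models.pbmm deepspeech-0.9.3-models.scorer; do
    if [ ! -s "${models_dir}/${file}" ]; then
      echo "missing or empty: ${models_dir}/${file}"
      rc=1
    fi
  done
  if [ "${rc}" -eq 0 ]; then
    echo "setup looks complete"
  fi
  return "${rc}"
}

# Example: run from the package directory (e.g. after roscd hri_speech).
check_hri_setup models || echo "fix the items above before running roslaunch"
```

Because exported variables only apply to the current shell and the processes it starts, the check must run in the same terminal that runs roslaunch.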