Speech-Based Human Robot Interaction and Task Assignment

The motivation for this project is to communicate with robots through natural speech. For this we use DeepSpeech, an open-source speech-to-text engine developed by Mozilla.

Getting Started

To get started, first install the following dependencies.

Dependencies

ROS-Noetic

    sudo apt install ros-noetic-turtlebot3-gazebo      # simulation environment
    sudo apt install ros-noetic-turtlebot3-slam        # SLAM
    sudo apt install ros-noetic-turtlebot3-navigation  # navigation stack
    sudo apt install ros-noetic-gmapping               # mapping
    sudo apt install ros-noetic-dwa-local-planner      # Dynamic Window Approach (DWA) local planner
    sudo apt install ros-noetic-behaviortree-cpp-v3    # task planning

Install the following Python packages:

    pip3 install mediapipe  #gesture recognition

    # for using DeepSpeech
    pip3 install deepspeech                 # speech to text
    sudo apt-get install python3-pyaudio    # microphone audio input

    # for using Whisper
    pip3 install git+https://github.com/openai/whisper.git
    # to update an existing Whisper install to the latest version
    pip3 install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
    # Whisper needs ffmpeg (on Ubuntu or Debian)
    sudo apt update && sudo apt install ffmpeg

    # for text to speech
    sudo apt install libespeak-dev
    pip3 install pyttsx3
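
Before moving on, it can help to confirm that the Python packages above are importable. The following is a minimal sketch, not part of the project: `check_py_modules` is a hypothetical helper, and the module names are assumed to match the import names of the packages installed above.

```shell
# Hypothetical helper (not part of hri_speech): try to import each
# Python module given as an argument and report the result.
check_py_modules() {
    local missing=0 mod
    for mod in "$@"; do
        if python3 -c "import $mod" >/dev/null 2>&1; then
            echo "ok: $mod"
        else
            echo "not importable: $mod"
            missing=1
        fi
    done
    # Return non-zero if at least one module could not be imported.
    return "$missing"
}

# Assumed import names for the packages installed above.
check_py_modules mediapipe deepspeech pyaudio whisper pyttsx3 \
    || echo "install the packages reported above"
```

The helper only checks that an import succeeds; it does not check package versions.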

How to run the project

After installing all the dependencies, go to the src folder of your ROS workspace and clone this repository.

    cd <your_catkin_workspace>/src
    git clone git@github.com:brukg/hri_speech.git

Then build the project from the workspace root using the following commands and source your workspace.

    cd <your_catkin_workspace>
    catkin build
    source devel/setup.bash   # or: source devel/setup.zsh, depending on your shell

Next, set the TurtleBot3 model, go to the project directory, and create a folder named models.

    export TURTLEBOT3_MODEL=waffle
    roscd hri_speech   # or: cd <your_catkin_workspace>/src/hri_speech
    mkdir models

Go inside the models folder and download the two DeepSpeech model files (the acoustic model and the scorer).

    cd models
    # make sure to download the files below into the project's models directory
    wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.pbmm
    wget https://github.com/mozilla/DeepSpeech/releases/download/v0.9.3/deepspeech-0.9.3-models.scorer
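
An interrupted download can leave a missing or empty file behind, so a quick check of the models folder may save a failed launch later. The sketch below assumes the two file names from the wget commands above; `check_models` is a hypothetical helper, not something shipped with the project.

```shell
# Hypothetical helper (not part of hri_speech): check that both
# DeepSpeech files exist and are non-empty in the given directory.
check_models() {
    local dir="${1:-.}" missing=0 f
    for f in deepspeech-0.9.3-models.pbmm deepspeech-0.9.3-models.scorer; do
        if [ -s "$dir/$f" ]; then
            echo "found: $f"
        else
            echo "missing or empty: $f"
            missing=1
        fi
    done
    # Return non-zero if at least one file is missing or empty.
    return "$missing"
}

# Run from inside the models folder.
check_models . || echo "re-run the wget commands above"
```

This only tests for presence and non-zero size; it does not verify that the files are complete.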

An OpenAI davinci model is used to extract semantic meaning from the transcribed text. For this you need to set up an OpenAI API key and export it before running the project. Replace the string in the command below with your key.

    export OPENAI_API_KEY="key obtained from openai account"
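
Because the key has to be exported in the same shell that runs the launch file, a small pre-launch check can catch a forgotten export. This is a sketch under that assumption; `require_api_key` is a hypothetical helper and it only tests that the variable is non-empty, not that the key is valid.

```shell
# Hypothetical helper (not part of hri_speech): fail early if
# OPENAI_API_KEY is unset or empty in the current shell.
require_api_key() {
    if [ -z "${OPENAI_API_KEY:-}" ]; then
        echo "OPENAI_API_KEY is not set" >&2
        return 1
    fi
    # Print only the length so the key itself never reaches the terminal.
    echo "OPENAI_API_KEY is set (${#OPENAI_API_KEY} characters)"
}

require_api_key || echo "export the key before running roslaunch"
```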

Finally, run the project using the following command.

    roslaunch hri_speech start_all.launch
