A complete system for capturing human motion from a webcam or video and transferring it in real time to a 3D avatar using MediaPipe, a DNN, and Three.js.
Created by Ashok BK and Ashim Nepal
The system detects body movements from webcam or video input and transfers them in real time to a 3D avatar. You can use your own ReadyPlayerMe avatar and switch between webcam and video file inputs.
- Real-time pose detection from webcam or video file input
- Custom NN model for pose correction and refinement
- 17-keypoint skeleton mapping from detected landmarks
- Kalman filtering for smoother motion
- 3D visualization using Three.js
- Real-time motion transfer to 3D avatar models
- WebSocket communication between detection and visualization components
- Video or webcam input options with easy configuration
- Python 3.8+ with pip
- Web browser with WebGL support
- VS Code with Live Server extension (recommended for frontend)
- Webcam (for live capture) or video files (for pre-recorded motion)
- Internet connection (for loading avatar models)
1. Clone this repository:

   ```bash
   git clone https://github.com/BlazeWild/Real-Time-Motion-Transfer-to-a-3D-Avatar.git
   cd Real-Time-Motion-Transfer-to-a-3D-Avatar
   ```

2. Create a virtual environment in the main project directory:

   On Windows:

   ```bash
   python -m venv venv
   venv\Scripts\activate
   ```

   On macOS/Linux:

   ```bash
   python -m venv venv
   source venv/bin/activate
   ```

3. Install the required Python packages:

   ```bash
   pip install -r backend_process/requirements.txt
   ```
**Important for Windows users:** The `run.bat` file must be run from Git Bash (not Command Prompt or PowerShell). From the project root directory, run:

```bash
./run.bat
```

For macOS/Linux users, follow the manual startup process below.
1. Open Git Bash in the project root directory.

2. Run the `run.bat` file:

   ```bash
   ./run.bat
   ```

3. Choose your input source:

   - Option 1: Webcam (default)
   - Option 2: Video file (you can provide just the video name, like "video", and the system will find it automatically)

4. For video files, you can configure:

   - Playback speed (delay between frames)
   - Looping options
   - Frame rate for processing
   - Debug mode for better model updates

5. The system will:

   - Activate the virtual environment
   - Start the Python backend for pose detection
   - Open the frontend file in VS Code
   - Provide instructions for opening with Live Server

6. In VS Code, right-click on `frontend_dis/index.html` and select "Open with Live Server".
Manual startup:

1. Activate the virtual environment:

   On Windows:

   ```bash
   venv\Scripts\activate
   ```

   On macOS/Linux:

   ```bash
   source venv/bin/activate
   ```

2. Start the Python backend:

   For webcam:

   ```bash
   python backend_process/scripts/capture.py
   ```

   For a video file:

   ```bash
   python backend_process/scripts/capture.py --video "path_to_video.mp4" --delay 1 --frame-rate 30
   ```

3. Serve the frontend:

   - Using VS Code Live Server: right-click on `frontend_dis/index.html` and select "Open with Live Server"
   - Or using Python's built-in server:

     ```bash
     python -m http.server 8000 --directory frontend_dis
     ```
You can easily use your own custom avatar from ReadyPlayerMe:
- Visit ReadyPlayerMe and create your custom avatar
- After creating your avatar, click "Download" and choose "glTF/GLB"
- You can also just copy the URL from the share link (ends with .glb)
- Open `frontend_dis/glb-model.js` in a text editor
- Find line 45 with:
  `const modelPath = "https://models.readyplayer.me/67be034c9fab1c21c486eb14.glb";`
- Replace the URL with your avatar's URL
- Save the file and refresh the browser window

Example:

```javascript
// Replace this
const modelPath = "https://models.readyplayer.me/67be034c9fab1c21c486eb14.glb";

// With your avatar URL
const modelPath = "https://models.readyplayer.me/YOUR_AVATAR_ID.glb";
```
Stand in front of your webcam (or use a video file), ensuring your full body is visible.
The application will detect your pose and display:

- Top: MediaPipe pose detection output
- Bottom: Processed 17-keypoint skeleton
- Browser: 3D avatar following your movements

Keyboard controls:

- `k`: Toggle the Kalman filter for smoother movement
- `d`: Toggle DNN correction
- `q`: Quit the application
- `s`: Save a screenshot
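The key handling itself is implemented in `capture.py`; the snippet below is only an illustrative sketch of how these bindings are typically wired with OpenCV's `cv2.waitKey` loop (window name and variable names are assumptions):

```python
import cv2

# Illustrative only: the real key handling lives in capture.py.
cap = cv2.VideoCapture(0)
use_kalman, use_dnn = True, True

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("pose", frame)

    key = cv2.waitKey(1) & 0xFF
    if key == ord("k"):      # toggle Kalman smoothing
        use_kalman = not use_kalman
    elif key == ord("d"):    # toggle DNN correction
        use_dnn = not use_dnn
    elif key == ord("s"):    # save a screenshot of the current frame
        cv2.imwrite("screenshot.png", frame)
    elif key == ord("q"):    # quit the application
        break

cap.release()
cv2.destroyAllWindows()
```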
The system processes motion in several stages:
- MediaPipe Pose Detection: Captures 33 pose world landmarks using Google's MediaPipe (BlazePose) library
- Landmark Selection: Extracts 12 essential keypoints from the 33 MediaPipe landmarks (see the selection sketch after this list):
- Shoulders, elbows, wrists
- Hips, knees, ankles
- DNN Correction: Applies a neural network to correct and refine keypoint positions for accurate depth
- Orientation Enrichment: Calculates local quaternions for 8 joints to apply longitudinal rotation
- 17-Keypoint Mapping: Creates a full skeleton by:
- Adding calculated joints (hips center, spine, neck)
- Organizing joints in a standard hierarchy
- Kalman Filtering: Applies statistical smoothing to reduce jitter and improve motion quality
- 3D Model Animation: Transfers processed joint rotations to the avatar's skeleton
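To make the Landmark Selection stage concrete, here is a minimal sketch that pulls the 12 keypoints out of MediaPipe's 33 pose world landmarks using MediaPipe's documented landmark indices. The `(33, 3)` array layout and the function name are illustrative; the actual selection logic lives in `processing.py`:

```python
import numpy as np

# MediaPipe Pose landmark indices for the 12 keypoints the pipeline keeps.
KEYPOINT_INDICES = {
    "left_shoulder": 11,  "right_shoulder": 12,
    "left_elbow": 13,     "right_elbow": 14,
    "left_wrist": 15,     "right_wrist": 16,
    "left_hip": 23,       "right_hip": 24,
    "left_knee": 25,      "right_knee": 26,
    "left_ankle": 27,     "right_ankle": 28,
}

def select_keypoints(world_landmarks: np.ndarray) -> np.ndarray:
    """Reduce the (33, 3) MediaPipe world landmarks to the (12, 3) subset."""
    return world_landmarks[list(KEYPOINT_INDICES.values())]
```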
Key files and directories:

- `backend_process/scripts/capture.py` - Main entry point; handles webcam/video capture and UI display
- `processing.py` - Core processing logic for keypoint extraction and visualization
- `quat_cal.py` - Quaternion calculations for rotational data
- `backend_process/dependencies/` - Python virtual environment (created during setup)
- `backend_process/models/` - Directory for model files (`dnn_model.pth`)
- `backend_process/videos/` - Place for storing video files (created automatically)
- `websocket_server.py` - Handles WebSocket communication with the frontend
- `video_websocket.py` - Streams video frames to the frontend
- `frontend_dis/` - Frontend files for 3D visualization:
  - `index.html` - Main frontend page
  - `canva.js` - Canvas and Three.js initialization
  - `glb-model.js` - 3D model handling and animation
  - `live-reload.js` - Auto-refresh functionality for development
- `streamlit_app/` - Alternative Streamlit-based UI (see the separate README in that folder):
  - Self-contained app with a web interface
  - Uses the `uv` package manager
  - No need for Live Server
- `run.bat` - Windows batch file for easy startup (requires Git Bash)
- `videos/` - Alternative location for video files
For a simpler, web-based interface, check out the streamlit_app/ folder. It provides:
- Modern web UI built with Streamlit
- All-in-one interface without needing Live Server
- Same pose detection and processing features
- See `streamlit_app/README.md` for setup instructions
The NN model consists of a multi-layer perceptron with the following architecture:
- Input: 36 values (12 keypoints × 3 coordinates)
- Hidden layers: 72 → 64 → 50 → 54 neurons
- Output: 36 values (12 corrected keypoints × 3 coordinates)
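A minimal PyTorch sketch of an MLP with that layer layout is shown below. The ReLU activations are an assumption (the trained weights are loaded from `backend_process/models/dnn_model.pth`); treat this as a shape reference rather than the exact network definition:

```python
import torch
import torch.nn as nn

class PoseCorrectionMLP(nn.Module):
    """36 -> 72 -> 64 -> 50 -> 54 -> 36 multi-layer perceptron."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(36, 72), nn.ReLU(),
            nn.Linear(72, 64), nn.ReLU(),
            nn.Linear(64, 50), nn.ReLU(),
            nn.Linear(50, 54), nn.ReLU(),
            nn.Linear(54, 36),   # 12 corrected keypoints x 3 coordinates
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Example: correct one flattened 12-keypoint frame (12 x 3 = 36 values).
model = PoseCorrectionMLP()
corrected = model(torch.zeros(1, 36))
```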
The system maps MediaPipe's output to a 17-keypoint skeleton including:
- Hips (center)
- Spine (3 points)
- Head and neck
- Arms and hands (6 points)
- Legs and feet (6 points)
We implement a Kalman filter for each keypoint to reduce noise and jitter:
- State variables: Position (x,y,z) and velocity
- Observation: Raw keypoint positions
- Process noise and measurement covariance are tuned for smooth motion
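A simplified per-keypoint constant-velocity Kalman filter along these lines might look like the sketch below. The noise values and the 0.02 s time step (matching the 50 Hz update rate mentioned below) are placeholders; the tuned parameters live in the backend code:

```python
import numpy as np

class KeypointKalman:
    """Constant-velocity Kalman filter for one 3D keypoint (state: position + velocity)."""

    def __init__(self, dt=0.02, process_noise=1e-3, measurement_noise=1e-2):
        self.x = np.zeros(6)                                # [px, py, pz, vx, vy, vz]
        self.P = np.eye(6)                                  # state covariance
        self.F = np.eye(6)                                  # state transition
        self.F[:3, 3:] = dt * np.eye(3)                     # position += velocity * dt
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))])   # we observe position only
        self.Q = process_noise * np.eye(6)
        self.R = measurement_noise * np.eye(3)

    def update(self, z):
        # Predict
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Correct with the raw keypoint position z (shape (3,))
        y = z - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ self.H) @ self.P
        return self.x[:3]                                   # smoothed position
```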
- The backend sends 17-keypoint data and DNN status via WebSocket (port 8765)
- Video frames are streamed via a separate WebSocket (port 8766)
- The frontend receives this data and applies it to the 3D model
- 50Hz update rate for real-time performance
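The exact message schema isn't documented here, so the following is only a rough sketch of how a backend could publish keypoint data on port 8765 with the `websockets` package; the field names (`keypoints`, `dnn_enabled`) are hypothetical, and the real format is defined in `websocket_server.py`:

```python
import asyncio
import json
import websockets

async def handler(ws):
    # Push one hypothetical keypoint payload at a 50 Hz cadence.
    while True:
        payload = {
            "keypoints": [[0.0, 0.0, 0.0]] * 17,  # 17 joints, x/y/z each
            "dnn_enabled": True,
        }
        await ws.send(json.dumps(payload))
        await asyncio.sleep(1 / 50)

async def main():
    async with websockets.serve(handler, "localhost", 8765):
        await asyncio.Future()                     # run forever

asyncio.run(main())
```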
Common issues:
- No video feed: Check if your webcam is connected and accessible
- Poor detection: Ensure good lighting and that your full body is visible
- No model movement: Check WebSocket connection status in browser console
- DNN correction fails: Verify the model file exists in `backend_process/models/`
- Missing dependencies: Make sure the virtual environment is activated and all packages are installed
- Live Server not refreshing: Use the buttons in VS Code or toggle focus on the window
If Live Server isn't auto-refreshing:
- Make sure the `live-reload.js` script is loaded in the HTML
- Try clicking into another application window and back
- Manually refresh once to trigger the auto-refresh mechanism
Contributions are welcome! Please feel free to submit a Pull Request.
- MediaPipe for pose detection
- PyTorch for neural network implementation
- Three.js for 3D visualization
- ReadyPlayerMe for 3D avatar models