🏦 VBank: Voice-Activated Banking Platform. 🎙️ Talk to Your Bank: perform banking tasks like checking balance, transferring money, and viewing transactions using natural voice commands. 🧠 Smart & Secure: 🔐 biometric authentication (voice + face), 🧬 liveness detection with OTP, 🛡️ JWT-based session security.

indu-shekhar/VBank

💬 Voice-Activated Banking Platform

Version 8.1.9 | Author: Indu Shekhar Jha | Date: July 7, 2025


πŸ–ΌοΈ Demo Screenshots

Login Page Demo Banking Page Demo

📚 Table of Contents

  1. Executive Summary
  2. System Overview
  3. Frontend Architecture
  4. Backend Architecture
  5. API Specification
  6. Database Design
  7. Voice & Biometric Modules
  8. Security & Compliance
  9. Deployment & DevOps Pipeline
  10. Appendices

Executive Summary

Voice-Activated Banking Platform is a secure, user-centric application enabling customers to perform banking operations using natural voice commands. Designed for accessibility and robust security, it integrates advanced voice recognition, liveness detection, and biometric authentication to ensure only authorized users can execute sensitive transactions.

Target: Retail banking customers, financial institutions, and accessibility-focused organizations seeking seamless, hands-free banking experiences.

System Overview

System Architecture

```mermaid
mindmap
  root((Voice-Activated Banking Platform))
    Executive Summary
      Secure, User-Centric Application
      Natural Voice Commands
      Accessibility & Robust Security
      Integrates Voice Recognition
      Liveness Detection
      Biometric Authentication
      Target Audience
        Retail Banking Customers
        Financial Institutions
        Accessibility-Focused Organizations
      Key Features
        Voice-driven Command Execution
        Multi-factor Authentication
        Real-time Speech Recognition & Synthesis
        Secure Session & Error Management
        Modular, Extensible Architecture
    System Overview
    Frontend Architecture
      Component Hierarchy
        App Root
        Output Display
        State Controller
        SpeechService
        MediaRecorderService
        ApiService
        TransactionTable
      State Management (BankAppState)
        AWAITING_ACTIVATION
        LISTENING_FOR_COMMAND
        PROCESSING
        PRESENTING
      Event Handling & UI Flow
        Activation: 'hello bank'
        Command Recording & Sending
        Verification: OTP Prompt & Record
        Response Presentation & Reset
      Error Recovery
        Errors Routed to PRESENTING
        Graceful UI Inform
        Resets to AWAITING_ACTIVATION
    Backend Architecture
      API Routing: Modular, Blueprint-based
      Authentication & Session Handling
        JWT for Stateless Authentication
        Flask Sessions for OTP/Liveness
      External Integrations
        Voice/Biometric Modules
        Database
      Detailed Overview (Flask-based REST API)
        Key Modules
        API Endpoint Details
        Security & Compliance
    API Specification
      /process_command (POST)
      /generate_otp (GET)
      /verify_otp_audio (POST)
      /secret (GET)
      Example Request/Response
    Database Design
      ER Diagram
        USER
        TRANSACTION_HISTORY
      Table Definitions
        User Table
        TransactionHistory Table
      Sample Queries
        Get user by email
        Get last 5 transactions
    Voice & Biometric Modules
      Speech Recognition & Synthesis
      Intent Handling
        Classifies User Intent
        Extracts Entities (Amount, Recipient)
    Security & Compliance
      Authentication Flow
        User Speaks Command
        Audio for Voice Verification
        Liveness Required (OTP)
        User Repeats OTP
        OTP Audio Sent
        Verification Success
        Original Command Execution
        Result Presentation
      Data Encryption (HTTPS, At Rest)
      Authentication (JWT)
      Liveness (Session-based OTP, Expiry)
      Data Minimization
      Safe Fallback on Error
    Deployment & DevOps Pipeline
      CI/CD Pipeline
      Configuration (Env Vars, Secrets)
      Monitoring (Uptime, Error, Security)
      Rollbacks (Blue/Green, Versioned)
    Frontend Architecture (Detailed)
      High-Level Overview
      State Management & UI Flow
      Low-Level Design Details
    Appendices
      Glossary
      References
      Future Roadmap
```

Architecture Diagram

User Device (Mic, Browser)
⬇️⬆️
Frontend (SPA) (JS, HTML, CSS)
⬇️⬆️
Backend (API) (App Server)

```
+-------------------+      +-------------------+      +-------------------+
|   User Device     |<---->|  Frontend (SPA)   |<---->|   Backend (API)   |
|  (Mic, Browser)   |      | (JS, HTML, CSS)   |      |   (App Server)    |
+-------------------+      +-------------------+      +-------------------+
        |                        |                         |
        |<---Voice/Audio-------->|                         |
        |                        |<---REST/JSON----------->|
        |                        |                         |
        |                        |<---DB/Storage---------->|
        |                        |                         |
        |                        |<---Biometric/Voice----->|
```

Technology Stack

  • Frontend: SPA, event-driven, browser APIs for speech/audio, stateful UI, Tailwind CSS for styling.
  • Backend: RESTful API, authentication/session, business logic, biometric/voice integration.
  • Database: Relational, stores user, transaction, and session data.
  • Voice/Biometric: Speech-to-text, text-to-speech, voice matching, liveness detection.

Frontend Architecture

Component Hierarchy

  • App Root
    • Output Display
    • State Controller
    • SpeechService (recognition/synthesis)
    • MediaRecorderService (audio capture)
    • ApiService (backend communication)
    • TransactionTable (dynamic rendering)

State Management

  • Centralized state machine (BankAppState):
    • AWAITING_ACTIVATION: Idle, waiting for trigger phrase.
    • LISTENING_FOR_COMMAND: Recording and recognizing user command.
    • PROCESSING: Sending command/audio to backend, handling OTP/liveness.
    • PRESENTING: Presenting results or errors, then resetting.
  • All transitions go through a single transitionToState method, ensuring cleanup and consistent UI.

Event Handling & UI Flow

  • User says "hello bank" → transitions to command listening.
  • On command, records audio, sends to backend.
  • If liveness/OTP required, prompts and records OTP audio.
  • On backend response, presents result or error, then resets.

Error Recovery

  • All errors (network, backend, speech) are caught and routed to PRESENTING state.
  • After presenting, always resets to AWAITING_ACTIVATION.
  • UI and state are never left in an inconsistent state.

Backend Architecture

API Routing

  • Modular blueprint-based routing for all endpoints.
  • Endpoints for command processing, OTP generation/verification, authentication, and user/session management.

Authentication & Session Handling

  • JWT-based authentication for all sensitive endpoints.
  • Session management for OTP and liveness flows, with expiry and state tracking.

External Integrations

  • Voice/biometric modules for speech recognition, synthesis, and authentication.
  • Database for persistent user and transaction data.

Backend Architecture (Detailed)

High-Level Overview

The backend is a modular, Flask-based REST API server responsible for all business logic, authentication, session management, voice/biometric verification, and database operations. It is structured for maintainability, security, and extensibility.

Key Modules:

  • run.py: Application entry point, configures logging, initializes the Flask app, and ensures database schema creation.
  • intent.py: Handles all voice command processing, OTP/liveness flows, intent recognition, and transaction logic.
  • auth.py: Manages user registration and login, including voice and face biometric enrollment and verification.
  • utils.py: Provides utility functions for OTP generation, normalization, fuzzy matching, audio/face processing, and biometric verification.

Application Startup & Configuration

  • Logging is configured at INFO level with timestamps and log levels for traceability.
  • The Flask app is created via a factory (create_app), ensuring modularity and testability.
  • On startup, the database schema is created if not present.
  • The app runs in debug mode for development; production should use a WSGI server and disable debug.

API Routing & Blueprints

  • All endpoints are organized into Flask blueprints (intent_bp, auth_bp) for separation of concerns.
  • Each blueprint encapsulates related routes and logic (e.g., intent, authentication).

Authentication & Session Handling

  • JWT Authentication: All sensitive endpoints require a valid JWT access token, created on successful login and checked on each request.
  • Session Management: Flask session is used for OTP/liveness state, with session.permanent = True to ensure persistence across requests. OTPs have expiry timestamps.
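
The expiry check on the OTP state can be sketched as follows. This is a minimal illustration, not the repository's code: a plain dict stands in for the Flask session, and the key names (`otp_numeric`, `otp_expires_at`, `liveness_verified`) and the 120-second lifetime are assumptions.

```python
import time
from typing import Optional

OTP_TTL_SECONDS = 120  # assumed lifetime; the real value is not stated in this README


def store_otp(session: dict, otp_numeric: str, now: Optional[float] = None) -> None:
    """Save the OTP and its expiry timestamp in a session-like mapping."""
    now = time.time() if now is None else now
    session["otp_numeric"] = otp_numeric
    session["otp_expires_at"] = now + OTP_TTL_SECONDS
    session["liveness_verified"] = False


def otp_is_valid(session: dict, now: Optional[float] = None) -> bool:
    """Return True only if an OTP is stored and has not yet expired."""
    now = time.time() if now is None else now
    expires_at = session.get("otp_expires_at")
    return "otp_numeric" in session and expires_at is not None and now < expires_at
```

Passing `now` explicitly keeps the check easy to test; in a request handler it would default to the current time.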

Voice & Biometric Verification

  • Voice Verification: On registration, user audio is converted to a standard format and stored. On login/command, new audio is compared to the enrolled sample using a voice verification model. Distance thresholds and logging are used for security and debugging.
  • Face Verification: On registration, a face encoding is extracted and stored. On login, the provided face image is compared to the stored encoding using facial recognition.
  • Liveness/OTP: For every command, unless already verified in the session, a random OTP is generated, spoken to the user, and must be repeated back. The backend verifies both the voice and the recognized OTP text using fuzzy matching.
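
As an illustration of the distance-threshold idea in voice verification, the sketch below compares two embedding vectors by cosine distance. The README does not name the verification model or its metric, so the metric, the function names, and the 0.35 threshold are all assumptions.

```python
import math
from typing import Sequence

VOICE_DISTANCE_THRESHOLD = 0.35  # hypothetical; would need tuning on real enrollment data


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return 1.0 - dot / (norm_a * norm_b)


def voices_match(
    enrolled: Sequence[float],
    candidate: Sequence[float],
    threshold: float = VOICE_DISTANCE_THRESHOLD,
) -> bool:
    """Accept the candidate only if its distance to the enrolled sample is small."""
    return cosine_distance(enrolled, candidate) <= threshold
```

A lower threshold rejects more impostors but also more genuine users, which is why the distance is worth logging alongside each decision.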

Intent Recognition & Command Processing

  • User commands are recognized via speech-to-text and sent to /process_command.
  • Intent is predicted using a trained logistic regression model on TF-IDF features (intent.py).
  • Supported intents: CheckBalance, TransferMoney, GetLastTransactions.
  • For transfers, entity extraction parses recipient and amount from the command text.
  • All actions are logged for traceability.
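
The entity extraction step for transfers could look roughly like the sketch below. The regular expression, the accepted verbs and currency words, and the function name are assumptions for illustration; the actual parsing rules in intent.py are not shown in this README.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: "<verb> <amount> [currency] to <recipient>"
TRANSFER_PATTERN = re.compile(
    r"(?:transfer|send|pay)\s+(?P<amount>\d+(?:\.\d+)?)\s*(?:rupees|dollars)?"
    r"\s+to\s+(?P<recipient>.+)",
    re.IGNORECASE,
)


def extract_transfer_entities(command: str) -> Optional[Tuple[str, float]]:
    """Return (recipient, amount) for a transfer command, or None if it does not parse."""
    match = TRANSFER_PATTERN.search(command.strip())
    if match is None:
        return None
    # Speech-to-text output often ends with a period; drop it from the recipient.
    recipient = match.group("recipient").strip().rstrip(".")
    return recipient, float(match.group("amount"))
```

Returning `None` for unparseable text lets the caller answer with a clarification prompt instead of guessing a recipient or amount.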

Error Handling & Logging

  • All endpoints use try/except/finally blocks for robust error handling.
  • Errors are logged with context and returned as structured JSON with appropriate HTTP status codes.
  • Resource cleanup (temp files) is always performed in finally blocks.
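
The try/except/finally pattern with temp-file cleanup might be sketched like this. The function name, the `verify` callback, and the response bodies are hypothetical; the function returns a `(body, status)` pair in the style of a Flask view but does not depend on Flask.

```python
import logging
import os
import tempfile
from typing import Callable, Tuple

logger = logging.getLogger(__name__)


def handle_upload(audio_bytes: bytes, verify: Callable[[str], bool]) -> Tuple[dict, int]:
    """Write the upload to a temp file, verify it, and always delete the file."""
    tmp_path = None
    try:
        if not audio_bytes:
            return {"error": "audio file is required"}, 400
        fd, tmp_path = tempfile.mkstemp(suffix=".wav")
        with os.fdopen(fd, "wb") as handle:
            handle.write(audio_bytes)
        if not verify(tmp_path):
            return {"error": "voice verification failed"}, 401
        return {"success": True}, 200
    except Exception:
        # Log the full context server-side; return only a generic message to the client.
        logger.exception("upload handling failed")
        return {"error": "internal server error"}, 500
    finally:
        # Runs on every path, including early returns and exceptions.
        if tmp_path is not None and os.path.exists(tmp_path):
            os.remove(tmp_path)
```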

API Endpoint Details

| Endpoint | Method | Auth | Parameters | Request Format | Response Format | Error Handling |
|---|---|---|---|---|---|---|
| /register | POST | No | email, user_id, audio-file, face-image | multipart/form-data | JSON: message/error | 400/500 JSON error |
| /login | POST | No | email, audio-file, face-image | multipart/form-data | JSON: access_token/error | 400/401/404/500 JSON error |
| /process_command | POST | Yes | command, voice_sample | multipart/form-data | JSON: result, error | 400/401/404/500 JSON error |
| /generate_otp | GET | Yes | - | - | JSON: otp_numeric, text | 400/401 JSON error |
| /verify_otp_audio | POST | Yes | otp_audio | multipart/form-data | JSON: success, error | 400/401/404/500 JSON error |
| /secret | GET | Yes | access_token (cookie) | - | HTML/JSON | 401/404 JSON error |

Example: /process_command Flow

  1. Receives command and voice sample.
  2. Authenticates user via JWT.
  3. Verifies voice sample against enrolled audio.
  4. If liveness/OTP not verified, generates OTP and returns challenge.
  5. If OTP verified, predicts intent and executes command (balance, transfer, history).
  6. Logs all actions and errors.
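
The ordering of these checks can be sketched as a plain function. Everything here is illustrative: the `CommandRequest` fields stand in for results the real endpoint would compute (JWT validation, voice match, session liveness flag), and the response keys and status codes are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class CommandRequest:
    jwt_valid: bool          # outcome of JWT validation
    voice_matches: bool      # outcome of voice verification
    liveness_verified: bool  # whether the OTP challenge was already passed
    command: str             # recognized command text


def process_command(
    req: CommandRequest,
    predict_intent: Callable[[str], str],
    handlers: Dict[str, Callable[[str], dict]],
) -> Tuple[dict, int]:
    """Apply the checks in order and run the command only when all pass."""
    if not req.jwt_valid:
        return {"error": "unauthorized"}, 401
    if not req.voice_matches:
        return {"error": "voice verification failed"}, 401
    if not req.liveness_verified:
        # Ask the client to run the OTP challenge before retrying the command.
        return {"liveness_required": True}, 200
    handler = handlers.get(predict_intent(req.command))
    if handler is None:
        return {"error": "unsupported command"}, 400
    return handler(req.command), 200
```

Keeping the checks as early returns makes it hard to reach the intent handlers without passing authentication, voice verification, and liveness first.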

Example: /register Flow

  1. Receives email, user_id, audio, and face image.
  2. Converts and stores audio, extracts and stores face encoding.
  3. Creates new user in database.
  4. Returns success or error.

Utility Functions (utils.py)

  • OTP Generation: Generates a random 4-digit numeric OTP and a matching text phrase to be spoken.
  • OTP Normalization & Fuzzy Matching: Cleans and compares recognized text to expected OTP, allowing for minor errors.
  • Audio Processing: Converts uploaded audio to standard format for verification.
  • Face Processing: Extracts face encodings for biometric matching.
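
A minimal sketch of the OTP helpers, using only the Python standard library. The digit-word phrase format, the function names, and the 0.75 similarity cutoff are assumptions; `difflib.SequenceMatcher` is used here as a stand-in for whatever fuzzy-matching approach utils.py actually uses.

```python
import difflib
import re
import secrets
from typing import Tuple

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]
WORD_TO_DIGIT = {word: str(i) for i, word in enumerate(DIGIT_WORDS)}


def generate_otp(length: int = 4) -> Tuple[str, str]:
    """Return (numeric OTP, spoken phrase), e.g. a 4-digit code and its digit words."""
    digits = "".join(secrets.choice("0123456789") for _ in range(length))
    text = " ".join(DIGIT_WORDS[int(d)] for d in digits)
    return digits, text


def normalize_otp(recognized: str) -> str:
    """Reduce a transcript to a digit string, ignoring filler words and punctuation."""
    tokens = re.findall(r"[a-z]+|\d", recognized.lower())
    return "".join(WORD_TO_DIGIT.get(tok, tok if tok.isdigit() else "") for tok in tokens)


def otp_matches(recognized: str, expected_digits: str, min_ratio: float = 0.75) -> bool:
    """Compare the normalized transcript to the expected OTP, tolerating minor errors."""
    candidate = normalize_otp(recognized)
    ratio = difflib.SequenceMatcher(None, candidate, expected_digits).ratio()
    return ratio >= min_ratio
```

With a cutoff of 0.75, a 4-digit OTP still matches when one digit is dropped by the recognizer. A stricter cutoff trades convenience for security, so the value deserves deliberate tuning.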

Database Integration

  • Uses SQLAlchemy ORM for all database operations.
  • User table stores email, user_id, audio_file (binary), face_encoding (array), and balance.
  • TransactionHistory table records all transactions with sender, recipient, type, amount, and timestamp.
  • All queries are parameterized, and frequently filtered columns are indexed for performance.

Security & Compliance (Backend)

  • All sensitive data is encrypted in transit (HTTPS) and at rest (database encryption recommended).
  • JWT tokens are used for stateless authentication.
  • OTPs and session data are never exposed to the client except as needed for liveness.
  • Biometric data is processed and stored securely: the enrolled voice sample and face encoding are kept for verification, and temporary uploads are deleted after processing.
  • Logging avoids sensitive data exposure.
  • All errors are handled gracefully, with no stack traces or sensitive info leaked to clients.

API Specification

| Endpoint | Method | Auth | Parameters | Request Format | Response Format | Error Handling |
|---|---|---|---|---|---|---|
| /register | POST | No | email, user_id, audio-file, face-image | multipart/form-data | JSON: message/error | 400/500 JSON error |
| /login | POST | No | email, audio-file, face-image | multipart/form-data | JSON: access_token/error | 400/401/404/500 JSON error |
| /process_command | POST | Yes | command, voice_sample | multipart/form-data | JSON: result, error | 400/401/404/500 JSON error |
| /generate_otp | GET | Yes | - | - | JSON: otp_numeric, text | 400/401 JSON error |
| /verify_otp_audio | POST | Yes | otp_audio | multipart/form-data | JSON: success, error | 400/401/404/500 JSON error |
| /secret | GET | Yes | access_token (cookie) | - | HTML/JSON | 401/404 JSON error |

Example Request:

```
POST /process_command
Authorization: Bearer <token>
Content-Type: multipart/form-data

command=Check my balance
voice_sample=<audio file>
```

Example Response:

```json
{
  "balance": 1234.56
}
```

Database Design

ER Diagram

```mermaid
erDiagram
    USER {
        int user_id PK
        string email
        binary audio_file
        float balance
        %% other fields can be added here as needed
    }
    TRANSACTIONHISTORY {
        int transaction_id PK
        string acc_email
        string sent_to_email
        string transaction_type
        float amount
        datetime timestamp
    }
    USER ||--o{ TRANSACTIONHISTORY : has
```
```
+---------+      +---------------------+
|  User   |<---->| TransactionHistory  |
+---------+      +---------------------+
| user_id |      | transaction_id      |
| email   |      | acc_email           |
| ...     |      | sent_to_email       |
| audio   |      | transaction_type    |
| balance |      | amount              |
+---------+      | timestamp           |
                 +---------------------+
```

Table Definitions

| Table | Columns | Indexes |
|---|---|---|
| User | user_id (PK), email, audio_file, balance, ... | user_id, email |
| TransactionHistory | transaction_id (PK), acc_email, sent_to_email, transaction_type, amount, timestamp | acc_email, sent_to_email |

Sample Queries

```sql
-- Get user by email (:email is a bind parameter)
SELECT * FROM User WHERE email = :email;

-- Get last 5 transactions
SELECT * FROM TransactionHistory WHERE acc_email = :email ORDER BY timestamp DESC LIMIT 5;
```
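
To show the "last 5 transactions" query in parameterized form, here is a small self-contained sketch. It uses the standard-library `sqlite3` module with an in-memory database instead of SQLAlchemy, and the index name, helper name, and sample rows are made up for illustration.

```python
import sqlite3


def last_transactions(conn: sqlite3.Connection, email: str, limit: int = 5) -> list:
    """Fetch the most recent transactions for one account, newest first."""
    cursor = conn.execute(
        "SELECT transaction_type, amount, sent_to_email FROM TransactionHistory "
        "WHERE acc_email = ? ORDER BY timestamp DESC LIMIT ?",
        (email, limit),  # values are bound by the driver, never string-formatted
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TransactionHistory ("
    "transaction_id INTEGER PRIMARY KEY, acc_email TEXT, sent_to_email TEXT, "
    "transaction_type TEXT, amount REAL, timestamp TEXT)"
)
conn.execute("CREATE INDEX idx_history_acc_email ON TransactionHistory (acc_email)")

# Illustrative rows only: seven transfers on consecutive days.
sample_rows = [
    ("alice@example.com", "bob@example.com", "transfer", float(day * 10),
     f"2025-01-0{day}T00:00:00")
    for day in range(1, 8)
]
conn.executemany(
    "INSERT INTO TransactionHistory "
    "(acc_email, sent_to_email, transaction_type, amount, timestamp) "
    "VALUES (?, ?, ?, ?, ?)",
    sample_rows,
)
```

Calling `last_transactions(conn, "alice@example.com")` returns the five newest rows. ISO 8601 timestamps sort correctly as text, which is why `ORDER BY timestamp DESC` works on a TEXT column here.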

Voice & Biometric Modules

Speech Recognition & Synthesis

  • Converts user audio to text for command and OTP.
  • Synthesizes spoken responses for all outputs.

Intent Handling

  • Classifies user command intent (balance, transfer, history) using text analysis.
  • Extracts entities (amount, recipient) from recognized text.

Authentication Flow

```mermaid
flowchart TD
    A[User Command] --> B[Voice Verification]
    B -->|Liveness required| C[Generate OTP]
    C --> D[Speak OTP]
    D --> E[Record OTP]
    E --> F[Voice & Speech Match]
    F -->|Success| G[Execute command]
    F -->|Fail| H[Error & Retry]
```

Security & Compliance

  • All sensitive data encrypted in transit (HTTPS) and at rest.
  • JWT authentication for all protected endpoints.
  • Session-based OTP and liveness with expiry.
  • No sensitive data exposed in logs or client.
  • Fallback: On error, system resets to safe state and requires re-authentication.
  • Data privacy: Only minimal, necessary data stored per user.

Deployment & DevOps Pipeline

  • CI/CD pipeline for automated testing, build, and deployment.
  • Environment configuration via environment variables and secrets management.
  • Monitoring and alerting for uptime, errors, and security events.
  • Rollback strategies: Blue/green deployments, versioned releases.

Appendices

Glossary

  • SPA: Single Page Application
  • JWT: JSON Web Token
  • OTP: One-Time Password
  • Liveness Detection: Verifying user is present and not replaying a recording

References

Future Roadmap

🚀 Roadmap
  • Add support for additional biometric factors (for example, fingerprint)
  • Expand command set (bill pay, account linking)
  • Integrate with third-party financial APIs
  • Enhance accessibility features (multi-language, screen reader support)
  • Advanced fraud detection and anomaly monitoring

Frontend Architecture and Design (Detailed)

High-Level Overview

The frontend is a modern, event-driven single-page application (SPA) that provides a seamless, voice-first user experience for banking operations. It is designed for accessibility, security, and extensibility, integrating tightly with the backend for authentication, command execution, and biometric verification.

Key Components:

  • index.html: Login page, guides user through voice and face authentication.
  • register.html: Registration page, collects and enrolls user voice and face biometrics.
  • bank_index.html: Main banking dashboard, enables voice-driven banking commands and displays results.
  • app.js: Orchestrates the login flow, state machine, and voice/camera capture for authentication.
  • bank_script.js: Manages the banking command flow, state machine, OTP/liveness, and backend integration.
  • register.js: Handles user registration, including audio recording and face capture.
  • voice_ui_components.js: Provides reusable classes for speech recognition, synthesis, and media recording.

Component Hierarchy & UI Flow

```
App Root (HTML)
├── Output/Status Display
├── State Controller (App/BankApp)
│   ├── SpeechService (recognition/synthesis)
│   ├── MediaRecorderService (audio capture)
│   ├── ApiService (backend communication)
│   └── UI Components (forms, tables, camera, audio)
└── TransactionTable (dynamic rendering)
```

  • Login/Registration: Guides user through multi-step process using voice prompts, audio/face capture, and state transitions.
  • Banking Dashboard: Listens for activation, processes commands, handles OTP/liveness, and presents results.

State Management

  • Centralized state machines in both login (AuthAppState) and banking (BankAppState) flows.
  • All transitions go through a single transitionToState method, ensuring cleanup, consistent UI, and robust error recovery.
  • States include: activation, input, confirmation, recording, capturing, processing, presenting, error, and reset.

Event Handling & UI Flow

  • Voice Activation: Listens for trigger phrase (e.g., "hello bank" or "hello indu") to start flows.
  • Speech Recognition: Captures and processes user commands, email, confirmations, and OTPs.
  • Audio Recording: Uses MediaRecorder API to capture voice samples for authentication and commands.
  • Face Capture: Uses getUserMedia and Canvas APIs to capture and preview face images.
  • Form Submission: All data is sent to the backend via fetch with FormData, handling both success and error responses.
  • Dynamic UI Updates: Output/status fields, transaction tables, and error messages are updated in real time based on state and backend responses.

Error Recovery & Robustness

  • All errors (network, backend, speech, device) are caught and routed to a safe state (PRESENTING or ERROR).
  • After presenting a result or error, the app always resets to the initial state, ready for the next user action.
  • Defensive coding ensures that UI and state are never left inconsistent, even on unexpected failures.
  • Logging and user feedback are provided for all error conditions.

Integration with Backend

  • All API calls are made via fetch with proper authentication (JWT in cookies or headers).
  • Endpoints for registration, login, command processing, OTP/liveness, and transaction history are fully integrated.
  • All backend responses are handled with defensive JSON parsing and error handling.
  • State transitions and UI updates are driven by backend responses (e.g., liveness required, OTP success/failure, command results).

Low-Level Design Details

1. SpeechService (voice_ui_components.js, bank_script.js, app.js)

  • Wraps browser SpeechRecognition and SpeechSynthesis APIs.
  • Handles start/stop, error events, and result callbacks.
  • Ensures recognition is stopped before speaking to avoid feedback loops.
  • Used for all voice input and output throughout the app.

2. MediaRecorderService (voice_ui_components.js, bank_script.js, register.js, app.js)

  • Wraps MediaRecorder API for audio capture.
  • Handles start/stop, data collection, and blob creation.
  • Used for both command/OTP audio and registration/login samples.

3. ApiService (voice_ui_components.js, bank_script.js, app.js)

  • Centralizes all backend communication (login, command, OTP, etc.).
  • Handles JWT token management and error handling.
  • Ensures all requests are authenticated and responses are parsed defensively.

4. State Controllers (AuthApp, BankApp)

  • Each flow (login, registration, banking) is managed by a dedicated class with a state machine.
  • All UI actions, API calls, and error handling are routed through state transitions.
  • Ensures a consistent, recoverable user experience.

5. UI Components (register.html, index.html, bank_index.html)

  • Responsive, accessible layouts using modern CSS and Tailwind.
  • Dynamic elements for audio/face capture, transaction tables, and status/output.
  • All user actions are guided by voice and visual prompts.

6. Registration Flow (register.js, register.html)

  • Audio and face are captured via browser APIs and attached to a hidden form.
  • On submit, FormData is sent to /register endpoint.
  • Result or error is displayed and logged.

7. Login Flow (app.js, index.html)

  • Multi-step, voice-driven process: activation → email → confirmation → voice → face → submit.
  • All steps are managed by state machine and voice prompts.
  • On success, JWT is stored in cookie for subsequent API calls.

8. Banking Command Flow (bank_script.js, bank_index.html)

  • Listens for activation, records command, sends to backend.
  • If liveness/OTP required, prompts and records OTP audio, verifies with backend.
  • On success, executes original command and presents result.
  • Transaction history is dynamically rendered in a styled table.

9. Error Handling and Recovery

  • All API and device errors are caught and presented to the user.
  • State is always reset after error or completion, preventing stuck UI.
  • Defensive checks for device permissions, missing files, and backend failures.

10. Accessibility and UX

  • All flows are voice-guided and keyboard-accessible.
  • Visual feedback (status, output, transaction tables) is provided at every step.
  • Responsive design for desktop and mobile.

Example UI Flow Diagram


```mermaid
flowchart TD
    A[AWAITING_ACTIVATION] -->|trigger phrase| B[LISTENING_FOR_COMMAND]
    B -->|command spoken| C[PROCESSING]
    C -->|liveness required| D[OTP PROMPT]
    D -->|OTP spoken| E[PROCESSING]
    E -->|result| F[PRESENTING]
    F --> G[AWAITING_ACTIVATION]
```

Example Code Snippet: State Transition (BankApp)

```javascript
transitionToState(newState) {
  console.log(`[STATE] Transitioning from ${this.state} to ${newState}`);
  // Clean up previous state if needed
  switch (this.state) {
    case BankAppState.LISTENING_FOR_COMMAND:
    case BankAppState.PROCESSING:
      this.speechService.stop();
      this.recorderService.stopRecording();
      break;
  }
  this.state = newState;
  switch (this.state) {
    case BankAppState.AWAITING_ACTIVATION:
      output.textContent = "Say 'hello bank' to begin.";
      this.userCommand = "";
      this.livenessRequired = false;
      this.otpText = null;
      this.otpNumeric = null;
      this.lastCommandBlob = null;
      this.livenessBlob = null;
      try { this.speechService.start(); } catch (e) { console.error(e); }
      break;
    // ...other states...
  }
}
```

Security and Privacy in the Frontend

  • All sensitive data (audio, face images, tokens) is handled in memory and never stored in localStorage or indexedDB.
  • JWT tokens are stored in cookies with secure flags.
  • All API calls use HTTPS and include authentication headers.
  • Device permissions are requested only as needed and released after use.
