FelixHuoEZ/subtitle-processor


Subtitle Processing Service 字幕处理服务

English | 中文

Note: This README is entirely generated by AI and is for reference only.
注意:本 README 完全由 AI 生成,仅供参考。

Recent Updates

  • Current language decision flow is documented in docs/language-decision-logic.md, including spoken language, content locale, subtitle strategy, and Readwise branching.
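  • Runtime hotword settings can be toggled without restarting: /process/settings/hotword persists to config/hotword_settings.json, and the Telegram bot adds /hotword_status and /hotword_toggle commands to switch the setting online.
  • The Telegram bot adds interactive prompts for tags and hotwords plus a /skip shortcut command, and polls /process/status/<id> in the background to push the subtitle file automatically.
  • scripts/build-and-push.sh now also builds the bgutil-provider image; the default Dockerfile keeps only the required dependencies, with the X11/VNC components left in as comments, so the built image is lighter.
  • The backend exposes /process/status/<id>?include_content=1 and /process/status/<id>/subtitle so that external clients can query task progress and the raw subtitle text.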
  • YouTube alternate URLs (/shorts/<id>, /live/<id>, youtu.be/<id>, /embed/<id>, /v/<id>) are normalized to watch?v= with fallback to the original URL if needed.
  • Download concurrency + 403 backoff retries are configurable, and transcription concurrency can be capped.
  • Optional Readwise URL-only clipping when Chinese subtitles are available (READWISE_URL_ONLY_WHEN_ZH_SUBS).
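
The URL normalization mentioned above can be illustrated with a minimal sketch. This is not the project's code: the function name normalize_youtube_url, the accepted host list, and the 11-character video ID pattern are assumptions made for illustration. Extra query parameters (such as a timestamp) are dropped here for simplicity.

```python
import re
from urllib.parse import urlparse

# Assumption: video IDs are 11 characters from [A-Za-z0-9_-].
_ID = r"[A-Za-z0-9_-]{11}"
_ALT_PATH = re.compile(rf"^/(?:shorts|live|embed|v)/({_ID})(?:/|$)")
_SHORT_PATH = re.compile(rf"^/({_ID})(?:/|$)")


def normalize_youtube_url(url: str) -> str:
    """Return a watch?v= URL for known alternate forms, else the original URL."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    match = None
    if host == "youtu.be":
        match = _SHORT_PATH.match(parsed.path)
    elif host in {"youtube.com", "m.youtube.com"}:
        match = _ALT_PATH.match(parsed.path)

    if match is None:
        # Fallback: anything unrecognized is passed through untouched.
        return url
    return f"https://www.youtube.com/watch?v={match.group(1)}"
```

Returning the input unchanged for unrecognized URLs mirrors the "fallback to the original URL" behavior described in the list, so downstream code can still try the link as given.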

🌍 English

Overview

A comprehensive subtitle processing service that automatically downloads, transcribes, and manages video subtitles from various platforms. Features a Telegram bot interface and a web management portal.

🚀 Features

  • Multi-Platform Support

    • YouTube video subtitle extraction
    • YouTube alternate links (/shorts/<id>, /live/<id>, youtu.be/<id>, /embed/<id>, /v/<id>) normalized with fallback
    • Member-only/age-restricted YouTube videos with your own cookies/profile
    • Bilibili video subtitle processing
    • Automatic fallback to audio transcription
  • Subtitle Processing

    • Direct subtitle download from platforms
    • Audio transcription using FunASR
    • Support for multiple subtitle formats (SRT, VTT, JSON3)
  • User Interfaces

    • Telegram Bot for easy access
    • Web interface for subtitle management
    • Real-time subtitle viewing and searching
  • File Management

    • Automatic file organization
    • Metadata tracking
    • Timeline visualization
  • Deployment Flexibility

    • Telegram webhook served from a single entrypoint, while worker nodes run a processing-only stack
    • Build script with persistent cache/export to push and reload images quickly
    • .env overrides for image tags per environment
  • Readwise Integration

    • Automatic article creation from subtitles
    • Rich text formatting support
    • Seamless sync with Readwise Reader
    • Smart content segmentation for long videos
    • Optional URL-only clipping when Chinese subtitles are available
  • Hotword Management

    • Runtime toggle API (/process/settings/hotword) with persisted JSON state
    • Telegram commands /hotword_status and /hotword_toggle to view and switch automatic hotwords
    • /prompt_toggle on|off|status only affects the current bot process (not persisted or shared)
    • Conversation flow supports manual hotword input, or /skip to skip it
    • config/hotwords-example/ and config/hotword_settings.json.example provide customizable templates
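
To make "persisted JSON state" concrete, here is a minimal sketch of how a toggle could be stored on disk. The key names (auto_hotwords, post_process, mode, max_count) come from the installation notes in this README; the default values, the helper names, and the write-then-rename approach are assumptions, not the service's actual implementation.

```python
import json
import os
import tempfile
from pathlib import Path

# Hypothetical defaults: the keys are named in this README, the values are not.
DEFAULTS = {"auto_hotwords": False, "post_process": False, "mode": "default", "max_count": 20}


def load_settings(path: Path) -> dict:
    """Read the settings file, falling back to defaults if missing or invalid."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        data = {}
    if not isinstance(data, dict):
        data = {}
    return {**DEFAULTS, **data}


def save_settings(path: Path, settings: dict) -> None:
    """Write to a temp file, then rename, so readers never see a partial file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(settings, handle, ensure_ascii=False, indent=2)
    os.replace(tmp_name, path)


def toggle_auto_hotwords(path: Path) -> bool:
    """Flip auto_hotwords, persist the result, and return the new value."""
    settings = load_settings(path)
    settings["auto_hotwords"] = not settings["auto_hotwords"]
    save_settings(path, settings)
    return settings["auto_hotwords"]
```

Because the state lives in a file rather than in process memory, it survives restarts, which is the difference from /prompt_toggle noted in the list.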

🛠️ Technical Stack

  • Backend: Python Flask
  • Frontend: HTML/CSS/JavaScript
  • Transcription: FunASR
  • Container: Docker
  • Storage: JSON-based file system

📦 Installation

  1. Clone the repository
  2. Install Docker and Docker Compose
  3. Configure environment variables:
    TELEGRAM_TOKEN=your_telegram_bot_token
    READWISE_TOKEN=your_readwise_token
  4. Configure hotword settings (optional but recommended):
    cp config/hotword_settings.json.example config/hotword_settings.json
    # Edit config/hotword_settings.json to set defaults for auto_hotwords/post_process/mode/max_count
    # For advanced generation rules, copy config/hotwords-example/hotwords_config-example.yml to config/hotwords/hotwords_config.yml
  5. Configure YouTube cookies (required for member-only/age-restricted videos):
    • Option A (Firefox profile): copy your Firefox profile directory into firefox_profile/ or set cookies in config/config.yml.
      • macOS: ~/Library/Application Support/Firefox/Profiles/<profile>
      • Windows: C:\Users\<USER_NAME>\AppData\Roaming\Mozilla\Firefox\Profiles\
      • Linux: ~/.mozilla/firefox/<profile>
    • Option B (cookie file): export cookies to Netscape format and set YTDLP_COOKIE_FILE=/path/to/cookies.txt.
    • Ensure the profile contains cookies.sqlite and you are logged into YouTube.
  6. Start the services:
    docker-compose up --build

⚙️ Optional Configuration

  • READWISE_URL_ONLY_WHEN_ZH_SUBS=true to clip the original URL to Readwise when Chinese subtitles exist (skips subtitle download/transcription).
  • DOWNLOAD_CONCURRENCY (0/1 means serial), plus DOWNLOAD_MAX_RETRIES, DOWNLOAD_RETRY_BASE_DELAY, DOWNLOAD_RETRY_BACKOFF, DOWNLOAD_RETRY_MAX_DELAY for 403 backoff.
  • TRANSCRIBE_CONCURRENCY to cap concurrent transcriptions (0/1 means serial, empty means unlimited).
  • YTDLP_COOKIE_FILE to provide a Netscape-format cookie file instead of a Firefox profile.

All defaults are listed in .env.example.
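
As a rough illustration of how these knobs might interact, the sketch below computes a retry delay schedule and interprets a concurrency value. It assumes exponential backoff (base delay multiplied by the backoff factor on each attempt, capped at the maximum delay), which the variable names suggest but this README does not spell out. The function names are hypothetical, and "empty means unlimited" is documented only for TRANSCRIBE_CONCURRENCY.

```python
from typing import List, Optional


def backoff_delays(max_retries: int, base_delay: float,
                   backoff: float, max_delay: float) -> List[float]:
    """Delay before each retry: base * backoff**attempt, capped at max_delay."""
    return [min(max_delay, base_delay * backoff ** attempt)
            for attempt in range(max_retries)]


def parse_concurrency(raw: Optional[str]) -> Optional[int]:
    """Interpret a concurrency setting: None is unlimited, 1 is serial."""
    if raw is None or raw.strip() == "":
        return None  # empty: no cap
    return max(1, int(raw))  # 0 and 1 both mean serial


# Example values only; the real defaults live in .env.example.
print(backoff_delays(max_retries=4, base_delay=2.0, backoff=2.0, max_delay=10.0))
```

With these example values the fourth delay would be 16 seconds uncapped, so the cap of 10 seconds takes over.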

🧩 Distribute Docker Images to Multiple Hosts

  1. Generate and push images from a build machine:
    cp images.env.example images.env
    # Edit images.env to set IMAGE_PREFIX (e.g. docker.io/myteam) and IMAGE_TAG
    # Optionally set EXTRA_TAGS=latest if you also want a latest tag
    set -a; source images.env; set +a
    ./scripts/build-and-push.sh
    The script tags/pushes:
    • ${IMAGE_PREFIX}/subtitle-processor:${IMAGE_TAG}
    • ${IMAGE_PREFIX}/transcribe-audio:${IMAGE_TAG}
    • ${IMAGE_PREFIX}/telegram-bot:${IMAGE_TAG}
    Notes:
    • If IMAGE_PREFIX points at a self-signed registry, place its CA at ~/.docker/certs.d/<registry>/ca.crt; scripts/build-and-push.sh will auto-mount it into BuildKit.
    • If IMAGE_PREFIX points at a private registry host and BASE_IMAGE_REGISTRY is unset, scripts/build-and-push.sh now defaults it to <registry>/dockerhub.
    • To avoid unstable Docker Hub access on the build machine, set BASE_IMAGE_REGISTRY to an internal mirror prefix such as 10.0.0.23:5443/dockerhub. The Dockerfiles will then read:
      • ${BASE_IMAGE_REGISTRY}/library/python:3.11-slim
      • ${BASE_IMAGE_REGISTRY}/library/python:3.9-slim
      • ${BASE_IMAGE_REGISTRY}/nvidia/cuda:11.8.0-base-ubuntu22.04
      • ${BASE_IMAGE_REGISTRY}/brainicism/bgutil-ytdlp-pot-provider:1.2.2
    • If the build host itself must go through a local proxy, export BUILDER_HTTP_PROXY / BUILDER_HTTPS_PROXY before running the script.
  2. On each target host, create (or edit) .env with the new image references:
    SUBTITLE_PROCESSOR_IMAGE=${IMAGE_PREFIX}/subtitle-processor:${IMAGE_TAG}
    TRANSCRIBE_AUDIO_IMAGE=${IMAGE_PREFIX}/transcribe-audio:${IMAGE_TAG}
    TELEGRAM_BOT_IMAGE=${IMAGE_PREFIX}/telegram-bot:${IMAGE_TAG}
  3. Pull and start containers without rebuilding locally:
    docker compose pull
    docker compose up -d --no-build
  4. For the usual "build here, then roll it out on the NAS" workflow, use the fixed wrapper:
    ./scripts/release-to-nas.sh
    Useful variants:
    • ./scripts/release-to-nas.sh --dry-run
    • ./scripts/release-to-nas.sh --services subtitle-processor,telegram-bot
    • ./scripts/release-to-nas.sh --nas-only --service subtitle-processor
    The wrapper runs the local build-and-push.sh, then on the NAS executes docker compose pull, docker compose up -d --force-recreate, and docker compose ps inside /share/ZFS530_DATA/.qpkg/container-station/data/application/subtitle. When --services is provided, the same service filter applies to both the local build step and the NAS deploy step. It uses direct ssh nas when available and otherwise falls back to ~/nas-remote.
  5. If the private base-image mirror is missing and build-and-push.sh fails on ${BASE_IMAGE_REGISTRY}/...: not found, sync the mirror first:
    ./scripts/sync-base-images.sh
    Notes:
    • The script defaults to the 4 upstream bases used by the current Dockerfiles.
    • It runs on the NAS over ssh nas, pulls both linux/amd64 and linux/arm64, then publishes a multi-arch manifest to ${BASE_IMAGE_REGISTRY}.
    • Use ./scripts/sync-base-images.sh --verify-only to inspect the mirrored manifests without copying again.
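
The .env contents from step 2 can also be generated rather than typed by hand. The helper below is a hypothetical convenience, not one of the repository's scripts; it only reuses the three variable and image names listed above.

```python
def render_image_env(image_prefix: str, image_tag: str) -> str:
    """Build the three image override lines for a target host's .env file."""
    services = {
        "SUBTITLE_PROCESSOR_IMAGE": "subtitle-processor",
        "TRANSCRIBE_AUDIO_IMAGE": "transcribe-audio",
        "TELEGRAM_BOT_IMAGE": "telegram-bot",
    }
    prefix = image_prefix.rstrip("/")  # tolerate a trailing slash in the prefix
    return "".join(
        f"{variable}={prefix}/{name}:{image_tag}\n"
        for variable, name in services.items()
    )


# Example values only.
print(render_image_env("docker.io/myteam", "v1"), end="")
```

Writing resolved values instead of ${IMAGE_PREFIX} placeholders means the target host does not need images.env to be sourced as well.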

🤖 Telegram Deployment (Single Entry)

  • Choose one machine (for example the NAS that fronts Caddy) to run the telegram-bot service with webhook enabled. Configure telegram.webhook.public_url (or TELEGRAM_WEBHOOK_* envs) only on this host so it remains the sole webhook endpoint. Start the stack with the Telegram profile:
    docker compose --profile telegram up -d
  • On additional worker machines, keep running subtitle-processor and transcribe-audio but skip the bot service. The default profile starts only the processing services, so a plain docker compose up -d works. You can also target the services explicitly:
    docker compose up -d subtitle-processor transcribe-audio
    or comment out the telegram-bot section in the worker’s compose file. Setting TELEGRAM_BOT_ENABLED=false in the worker’s environment keeps the container in health-check mode if you ever need the image present.
  • The worker nodes will still take part in transcription because the primary bot forwards requests to them via the shared FunASR server list in config/config.yml.
  • This “single entry + multiple workers” layout prevents Telegram from redelivering the same webhook to different instances, eliminating duplicate replies in chats.
  • Each webhook is acknowledged immediately and the heavy lifting runs in background tasks, so Telegram never retries the same update due to timeouts.
  • For exceptionally long jobs you can raise the HTTP timeouts via SUBTITLE_CONNECT_TIMEOUT (default 120 seconds) and SUBTITLE_READ_TIMEOUT (default 1800 seconds). Defaults are defined in docker-compose.yml and may be overridden with environment variables if needed.
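
The "acknowledge immediately, process in the background" pattern and the two timeouts can be sketched as follows. This is a framework-free illustration under stated assumptions: handle_webhook, process_update, and subtitle_timeouts are hypothetical names, the sleep stands in for the real subtitle job, and only the environment variable names and their defaults (120 and 1800 seconds) come from this README.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List, Mapping, Tuple

executor = ThreadPoolExecutor(max_workers=2)
processed: List[int] = []


def process_update(update: dict) -> None:
    """Stand-in for the slow subtitle job (hypothetical)."""
    time.sleep(0.05)
    processed.append(update["update_id"])


def handle_webhook(update: dict) -> Tuple[int, Future]:
    """Queue the work and return at once, so the sender gets a fast 200."""
    future = executor.submit(process_update, update)
    return 200, future


def subtitle_timeouts(env: Mapping[str, str]) -> Tuple[float, float]:
    """Return (connect, read) timeouts in seconds, using the README defaults."""
    return (
        float(env.get("SUBTITLE_CONNECT_TIMEOUT", 120)),
        float(env.get("SUBTITLE_READ_TIMEOUT", 1800)),
    )
```

The returned future is exposed here only so a caller or test can wait for completion; a real webhook handler would return just the HTTP response. Many HTTP clients accept a (connect, read) pair in this shape.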

🔧 Usage

  1. Telegram Bot

    • Send video URL to the bot
    • Receive processed subtitle file
  2. Web Interface

    • Access http://localhost:5000
    • Upload video files or URLs
    • View and search subtitles
  3. Readwise Integration

    • Automatically creates articles in Readwise Reader
    • Preserves video metadata (title, URL, publish date)
    • Intelligently splits long content into readable segments
    • Access transcripts alongside your other reading materials
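
For scripted use outside the bot and web UI, the status endpoints listed under Recent Updates could be polled roughly as below. The paths come from this README, but the response shape is not documented here: the "status" field and the terminal values "completed" and "failed" are assumptions, and fetch is a caller-supplied function (hypothetical) that performs the HTTP GET and returns parsed JSON.

```python
import time
from typing import Callable

# Assumed terminal status values; not taken from the README.
TERMINAL = {"completed", "failed"}


def status_path(task_id: str, include_content: bool = False) -> str:
    """Build the status path, optionally asking for the subtitle content."""
    path = f"/process/status/{task_id}"
    return path + "?include_content=1" if include_content else path


def poll_status(fetch: Callable[[str], dict], task_id: str,
                interval: float = 2.0, max_polls: int = 30,
                sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the task reaches a terminal state, then fetch it with content."""
    for _ in range(max_polls):
        payload = fetch(status_path(task_id))
        if payload.get("status") in TERMINAL:
            return fetch(status_path(task_id, include_content=True))
        sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")
```

Polling without include_content keeps the intermediate responses small; the full content is requested once, after the task has finished.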

📝 License

MIT License

🙏 Acknowledgments

Special thanks to:

  • Windsurf - The world's first agentic IDE that made this project development possible
  • Claude 3.5 Sonnet - For providing comprehensive AI assistance throughout the development process

🌏 中文

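Overview

A comprehensive subtitle processing service that automatically downloads, transcribes, and manages video subtitles from various platforms. It provides a Telegram bot interface and a web management portal.

Recent Updates

  • The current language decision flow is documented in docs/language-decision-logic.md, including flowcharts for primary language, content context, subtitle strategy, and Readwise branching.
  • scripts/build-and-push.sh supports a persistent BuildKit cache; after a multi-arch push it automatically loads the image for the current architecture on the local machine, so a separate docker pull is no longer needed.
  • The Telegram webhook returns immediately and runs subtitle processing in the background, avoiding duplicate replies caused by retries.
  • Telegram deployment now uses a "single entry + multiple workers" layout, so the same message is not answered by several bot instances.
  • The documentation adds guidance on image distribution and .env overrides to help bring multiple machines online quickly.
  • YouTube links such as shorts/<id>, live/<id>, youtu.be/<id>, embed/<id>, and v/<id> are converted to watch?v= automatically, falling back to the original URL when necessary.
  • Download concurrency and 403 backoff retries are configurable, and transcription concurrency can optionally be capped.
  • Optional: clip the URL directly to Readwise when Chinese subtitles are detected (READWISE_URL_ONLY_WHEN_ZH_SUBS).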

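🚀 Features

  • Multi-Platform Support

    • YouTube video subtitle extraction
    • YouTube shorts/<id>, live/<id>, youtu.be/<id>, embed/<id>, and v/<id> links are normalized automatically, with a fallback strategy
    • Member-only/restricted YouTube videos are supported (requires your own cookies/profile)
    • Bilibili video subtitle processing
    • Automatic fallback to audio transcription
  • Subtitle Processing

    • Direct subtitle download from platforms
    • Audio transcription using FunASR
    • Support for multiple subtitle formats (SRT, VTT, JSON3)
  • User Interfaces

    • Telegram bot for easy access
    • Web interface for subtitle management
    • Real-time subtitle viewing and searching
  • File Management

    • Automatic file organization
    • Metadata tracking
    • Timeline visualization
  • Deployment Flexibility

    • Telegram enables the webhook on a single entry point only; other nodes focus on processing tasks
    • Build script with a persistent cache for faster pushes and local image loading
    • .env overrides for image tags to suit different environments
  • Readwise Integration

    • Automatic article creation from subtitles
    • Rich text formatting support
    • Seamless sync with Readwise Reader
    • Smart segmentation of long video content
    • Optional: clip the original URL directly when Chinese subtitles are detected
  • Hotword Management

    • The runtime hotword switch can be adjusted online through /process/settings/hotword and Telegram commands
    • /prompt_toggle on|off|status only affects the current bot process; it is not persisted or synced across nodes
    • The tag/hotword conversation supports manual input or a quick /skip
    • config/hotword_settings.json.example and config/hotwords-example/ provide customizable templates for extending the automatic hotword strategy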

🛠️ Technical Stack

  • Backend: Python Flask
  • Frontend: HTML/CSS/JavaScript
  • Transcription: FunASR
  • Container: Docker
  • Storage: JSON-based file system

📦 Installation

  1. Clone the repository
  2. Install Docker and Docker Compose
  3. Configure environment variables:
    TELEGRAM_TOKEN=your_telegram_bot_token
    READWISE_TOKEN=your_readwise_token
  4. Optional: configure the default hotword strategy
    cp config/hotword_settings.json.example config/hotword_settings.json
    # Edit the defaults for the hotword switch, mode, maximum count, and so on
    # For custom generation rules, copy config/hotwords-example/hotwords_config-example.yml to config/hotwords/hotwords_config.yml
  5. Configure YouTube cookies (downloading member-only/restricted videos requires your own cookies):
    • Option A (Firefox profile): copy your Firefox profile directory into firefox_profile/, or set the cookies path in config/config.yml.
      • macOS: ~/Library/Application Support/Firefox/Profiles/<profile>
      • Windows: C:\Users\<USER_NAME>\AppData\Roaming\Mozilla\Firefox\Profiles\
      • Linux: ~/.mozilla/firefox/<profile>
    • Option B (cookie file): export cookies in Netscape format and set YTDLP_COOKIE_FILE=/path/to/cookies.txt
    • Ensure the profile contains cookies.sqlite and that you are logged into YouTube.
  6. Start the services:
    docker-compose up --build

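⚙️ Optional Configuration

  • READWISE_URL_ONLY_WHEN_ZH_SUBS=true: clip the original URL directly to Readwise when Chinese subtitles are detected (skips subtitle download/transcription).
  • DOWNLOAD_CONCURRENCY (0/1 is treated as serial), together with DOWNLOAD_MAX_RETRIES, DOWNLOAD_RETRY_BASE_DELAY, DOWNLOAD_RETRY_BACKOFF, and DOWNLOAD_RETRY_MAX_DELAY for 403 backoff retries.
  • TRANSCRIBE_CONCURRENCY: cap concurrent transcriptions (0/1 is serial, empty means unlimited).
  • YTDLP_COOKIE_FILE: use a Netscape-format cookie file instead of a Firefox profile.

Default values are listed in .env.example.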

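🤖 Telegram Single-Entry Deployment

  • Run telegram-bot with the webhook enabled on one machine only (for example the NAS that hosts Caddy). Set telegram.webhook.public_url in that node's configuration file or environment variables, and start the stack with the telegram profile:
    docker compose --profile telegram up -d
  • Other worker nodes run only subtitle-processor and transcribe-audio. The default profile starts only the processing services, so running docker compose up -d is enough; you can also name the services explicitly:
    docker compose up -d subtitle-processor transcribe-audio
  • Alternatively, comment out the telegram-bot service in their docker-compose.yml. If the container must be kept, set TELEGRAM_BOT_ENABLED=false in the environment so that it only serves health checks and does not process messages.
  • All nodes share the transcription server list in config/config.yml; after receiving a request, the primary node still delegates transcription to the backend FunASR services.
  • This topology prevents Telegram from delivering the same webhook to multiple instances, which removes duplicate replies at the source.
  • Every webhook request is answered immediately and subtitle generation moves to a background task, so Telegram does not retry because of timeouts.
  • For very long videos, raise the connect/read timeouts of subtitle requests through SUBTITLE_CONNECT_TIMEOUT (default 120 seconds) and SUBTITLE_READ_TIMEOUT (default 1800 seconds). The defaults are defined in docker-compose.yml and can be overridden with environment variables when needed.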

🧩 Distribute Docker Images to Multiple Hosts

  1. Generate and push images from a build machine:
    cp images.env.example images.env
    # Edit images.env to set IMAGE_PREFIX (e.g. docker.io/myteam) and IMAGE_TAG
    # To push a latest tag as well, set EXTRA_TAGS=latest
    set -a; source images.env; set +a
    ./scripts/build-and-push.sh
    The script pushes the following images:
    • ${IMAGE_PREFIX}/subtitle-processor:${IMAGE_TAG}
    • ${IMAGE_PREFIX}/transcribe-audio:${IMAGE_TAG}
    • ${IMAGE_PREFIX}/telegram-bot:${IMAGE_TAG}
    Notes:
    • If IMAGE_PREFIX points at a self-signed registry, place its CA at ~/.docker/certs.d/<registry>/ca.crt; scripts/build-and-push.sh mounts it into BuildKit automatically.
    • If IMAGE_PREFIX points at a private registry and BASE_IMAGE_REGISTRY is not set explicitly, scripts/build-and-push.sh now defaults it to <registry>/dockerhub.
    • If the build machine has unstable access to Docker Hub, set BASE_IMAGE_REGISTRY to an internal mirror such as 10.0.0.23:5443/dockerhub. The Dockerfiles then read:
      • ${BASE_IMAGE_REGISTRY}/library/python:3.11-slim
      • ${BASE_IMAGE_REGISTRY}/library/python:3.9-slim
      • ${BASE_IMAGE_REGISTRY}/nvidia/cuda:11.8.0-base-ubuntu22.04
      • ${BASE_IMAGE_REGISTRY}/brainicism/bgutil-ytdlp-pot-provider:1.2.2
    • If the build machine itself must go through a local proxy, export BUILDER_HTTP_PROXY / BUILDER_HTTPS_PROXY before running the script.
  2. In the project root on each target machine, create (or edit) the .env file with the latest image references:
    SUBTITLE_PROCESSOR_IMAGE=${IMAGE_PREFIX}/subtitle-processor:${IMAGE_TAG}
    TRANSCRIBE_AUDIO_IMAGE=${IMAGE_PREFIX}/transcribe-audio:${IMAGE_TAG}
    TELEGRAM_BOT_IMAGE=${IMAGE_PREFIX}/telegram-bot:${IMAGE_TAG}
  3. Pull and start the containers without rebuilding locally:
    docker compose pull
    docker compose up -d --no-build
  4. For the usual "build and push locally, then pull and restart on the NAS" workflow, use the fixed script:
    ./scripts/release-to-nas.sh
    Useful variants:
    • ./scripts/release-to-nas.sh --dry-run
    • ./scripts/release-to-nas.sh --services subtitle-processor,telegram-bot
    • ./scripts/release-to-nas.sh --nas-only --service subtitle-processor
    The script first runs the local build-and-push.sh, then runs docker compose pull, docker compose up -d --force-recreate, and docker compose ps in /share/ZFS530_DATA/.qpkg/container-station/data/application/subtitle on the NAS. When the current shell can reach ssh nas directly, it runs synchronously; otherwise it falls back to ~/nas-remote automatically. When --services is passed, the service filter applies to both the local build and the NAS deploy.
  5. If the private base-image mirror is missing and build-and-push.sh fails with ${BASE_IMAGE_REGISTRY}/...: not found, run this first:
    ./scripts/sync-base-images.sh
    Notes:
    • By default the script syncs the 4 upstream base images used by the current Dockerfiles.
    • It runs on the NAS over ssh nas, pulls linux/amd64 and linux/arm64 separately, then pushes a multi-arch manifest to ${BASE_IMAGE_REGISTRY}.
    • To check only whether the mirror is complete, run ./scripts/sync-base-images.sh --verify-only.

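🔧 Usage

  1. Telegram Bot

    • Send a video URL to the bot
    • Receive the processed subtitle file
  2. Web Interface

    • Access http://localhost:5000
    • Upload video files or URLs
    • View and search subtitles
  3. Readwise Integration

    • Automatically creates articles in Readwise Reader
    • Preserves video metadata (title, URL, publish date)
    • Intelligently splits long content into readable segments
    • Access transcripts alongside your other reading materials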

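📝 License

MIT License

🙏 Acknowledgments

Special thanks to:

  • Windsurf - The world's first agentic IDE, which made the development of this project possible
  • Claude 3.5 Sonnet - For comprehensive AI assistance throughout the development process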
