Mirror of https://github.com/JefferyHcool/BiliNote.git, synced 2026-06-03 23:01:38 +08:00
Compare commits
22 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| | db556b8991 | | |
| | b431db545a | | |
| | 25face4b67 | | |
| | edfd6e4765 | | |
| | b53cafda5a | | |
| | adda5fd240 | | |
| | 3e28f1fe38 | | |
| | bffa285cd0 | | |
| | b740e70068 | | |
| | 261c95cf12 | | |
| | 1cc7f38e14 | | |
| | 7fffd6873b | | |
| | 7b927db363 | | |
| | c42ceaaa32 | | |
| | 177ee4ba3a | | |
| | aae17abf9a | | |
| | 33d44e32d2 | | |
| | ce58cb9352 | | |
| | de630dadb3 | | |
| | e9d4740cc7 | | |
| | ec33ae35ed | | |
| | 0742387235 | | |
3
.github/workflows/docker-build.yml
vendored
@@ -66,6 +66,9 @@ jobs:
 echo "Run the container:"
 echo " docker run -d -p 80:80 \\"
 echo " -v bilinote-data:/app/backend/data \\"
+echo " -v bilinote-config:/app/backend/config \\"
+echo " -v bilinote-static:/app/backend/static \\"
+echo " -v bilinote-models:/app/backend/models \\"
 echo " --name bilinote \\"
 echo " ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest"
 echo ""
13
.github/workflows/main.yml
vendored
@@ -79,6 +79,19 @@ jobs:
     key: ${{ runner.os }}-cargo-${{ hashFiles('BillNote_frontend/src-tauri/Cargo.lock') }}
     restore-keys: ${{ runner.os }}-cargo-

+# Inject the version from the tag into tauri.conf.json: Tauri uses that file's static version as
+# the artifact version, so without this sync every build artifact carries the value hardcoded in the conf (previously 2.0.0).
+# github.ref_name looks like v2.3.2; strip the leading v. On workflow_dispatch (no tag) the step is skipped and the static value is kept.
+- name: Sync version from tag
+  if: startsWith(github.ref, 'refs/tags/v')
+  working-directory: BillNote_frontend
+  shell: bash
+  run: |
+    VERSION="${GITHUB_REF_NAME#v}"
+    echo "Injecting version $VERSION into tauri.conf.json"
+    node -e "const f='src-tauri/tauri.conf.json'; const fs=require('fs'); const j=JSON.parse(fs.readFileSync(f,'utf8')); j.version=process.argv[1]; fs.writeFileSync(f, JSON.stringify(j,null,2)+'\n');" "$VERSION"
+    node -e "console.log('tauri.conf.json version =', require('./src-tauri/tauri.conf.json').version)"
+
 # Package the Tauri app
 - name: Build Tauri App
   working-directory: BillNote_frontend
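The version-sync step above strips a leading `v` from the tag name and rewrites the `version` field of a JSON config. As a rough illustration only, the same idea can be sketched in Python; the helper names `version_from_tag` and `inject_version` and the temporary file are assumptions made for this example, and the workflow itself uses bash and `node -e`, not this code.

```python
import json
import tempfile
from pathlib import Path


def version_from_tag(ref_name: str) -> str:
    """Strip one leading 'v', like the shell expansion ${GITHUB_REF_NAME#v}."""
    return ref_name[1:] if ref_name.startswith("v") else ref_name


def inject_version(conf_path: Path, version: str) -> None:
    """Rewrite the top-level "version" field, keeping every other key."""
    conf = json.loads(conf_path.read_text(encoding="utf-8"))
    conf["version"] = version
    # Two-space indent plus a trailing newline, mirroring the node one-liner.
    conf_path.write_text(
        json.dumps(conf, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
    )


# Demo against a throwaway file (hypothetical contents, not the real config).
with tempfile.TemporaryDirectory() as tmp:
    conf_file = Path(tmp) / "tauri.conf.json"
    conf_file.write_text('{"productName": "BiliNote", "version": "2.0.0"}', encoding="utf-8")
    inject_version(conf_file, version_from_tag("v2.3.2"))
    injected = json.loads(conf_file.read_text(encoding="utf-8"))

print(injected["version"])
```

Keeping the tag as the single source of truth means the static value in the config only matters for untagged builds.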
@@ -1,7 +1,7 @@
 {
   "$schema": "../node_modules/@tauri-apps/cli/config.schema.json",
   "productName": "BiliNote",
-  "version": "2.0.0",
+  "version": "2.3.3",
   "identifier": "com.jefferyhuang.bilinote",
   "build": {
     "frontendDist": "../dist",
20
CHANGELOG.md
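@@ -2,6 +2,26 @@

 All notable changes to this project are recorded here. The format follows [Keep a Changelog](https://keepachangelog.com/zh-CN/1.1.0/), and the project adheres to [Semantic Versioning](https://semver.org/lang/zh-CN/).

+## [2.3.3] - 2026-05-22
+
+### Fixed
+
+- **Data persistence for the prebuilt Docker image**: the documented `docker run` mounted only `data/` (the media cache), while the SQLite database (LLM provider configuration + note history) and the note files lived outside that volume, so deleting or upgrading the container lost configuration and history. The database is now redirected to `/app/backend/data/bili_note.db` and notes to `data/note_results` (both persisted with the data volume). The README now mounts four data volumes, `data` / `config` / `static` / `models`, and warns **not** to mount the whole `/app/backend` (a named volume freezes the code shipped in the image, so `docker pull` upgrades do not take effect). The `docker-compose` path was already correct (`./backend:/app` bind-mounts the whole directory) and is unaffected.
+
+## [2.3.2] - 2026-05-22
+
+### Fixed
+
+- **Backend crash on startup (Docker)**: after the `python:3.11-slim` base image moved to Debian 13 / glibc 2.41, the prebuilt `ctranslate2` 4.5.0 library, which carries an "executable stack" marker, was refused by glibc at load time (`cannot enable executable stack ... Invalid argument`). Because `from faster_whisper import WhisperModel` is a top-level import, the import failure took down the whole backend startup and the container restarted in a loop. Upgraded `ctranslate2` 4.5.0→4.6.0 (the wheel adds the `noexecstack` linker flag, fixing the problem at the binary level)
+- **Whisper model falsely reported as "model not found in offline mode"**: the download layout (modelscope custom directory) and the load layout (faster-whisper HF cache) did not match, so loads missed the cache. Download, load, integrity check, and corruption self-healing now all use the HF cache layout, with backward compatibility for the old modelscope directory
+- **Desktop build artifacts always versioned 2.0.0**: the Release workflow now injects the version from the git tag into `tauri.conf.json` before `pnpm tauri build`, so the artifact version matches the Release version
+
+## [2.3.1] - 2026-05-22
+
+### Changed
+
+- **Updated WeChat group QR codes**: the old QR code was about to expire, so the join QR codes for the 5 community groups (groups 1-5) in the README were replaced.
+
 ## [2.3.0] - 2026-05-14

 Theme: a focused pass on deployment and runtime resilience. It clears out the "won't install, won't start, dies halfway" problems across Docker, the desktop app, and the online engines, and adds a global proxy and a readiness gate for the transcription model.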
@@ -95,14 +95,16 @@ COPY ./nginx/default.conf /etc/nginx/conf.d/default.conf
 # Set fallback defaults with environment= in the [supervisord] block; in [program:backend],
 # reference them explicitly with %(ENV_*)s, which amounts to "forwarding the variables the host passes in
 # via docker run -e or env_file on to python main.py". Skipping this step is the root cause of users' "editing .env has no effect" reports.
-RUN mkdir -p /var/log/supervisor
+# /app/backend/data persists the database and notes (see DATABASE_URL / NOTE_OUTPUT_DIR below);
+# create the directory up front so that, when started without a volume, sqlite does not fail to create the database for lack of a parent directory.
+RUN mkdir -p /var/log/supervisor /app/backend/data
 COPY <<EOF /etc/supervisor/conf.d/supervisord.conf
 [supervisord]
 nodaemon=true
 user=root
 logfile=/var/log/supervisor/supervisord.log
 pidfile=/var/run/supervisord.pid
-environment=BACKEND_PORT="8483",BACKEND_HOST="0.0.0.0",TRANSCRIBER_TYPE="fast-whisper",WHISPER_MODEL_SIZE="tiny",FFMPEG_BIN_PATH="",HF_ENDPOINT="https://hf-mirror.com",STATIC="/static",OUT_DIR="./static/screenshots",DATA_DIR="data",NOTE_OUTPUT_DIR="note_results",IMAGE_BASE_URL="/static/screenshots",ENV="production",GROQ_TRANSCRIBER_MODEL="whisper-large-v3-turbo"
+environment=BACKEND_PORT="8483",BACKEND_HOST="0.0.0.0",TRANSCRIBER_TYPE="fast-whisper",WHISPER_MODEL_SIZE="tiny",FFMPEG_BIN_PATH="",HF_ENDPOINT="https://hf-mirror.com",STATIC="/static",OUT_DIR="./static/screenshots",DATA_DIR="data",NOTE_OUTPUT_DIR="data/note_results",DATABASE_URL="sqlite:////app/backend/data/bili_note.db",IMAGE_BASE_URL="/static/screenshots",ENV="production",GROQ_TRANSCRIBER_MODEL="whisper-large-v3-turbo"

 [program:nginx]
 command=nginx -g "daemon off;"
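The new `DATABASE_URL` value has four slashes after `sqlite:`. Assuming it follows the SQLAlchemy-style convention, in which `sqlite:///` introduces the path and a further leading slash makes that path absolute, the sketch below shows which file and parent directory the URL points at. The helper name `sqlite_path_from_url` is made up for this example and is not part of the project.

```python
from pathlib import PurePosixPath


def sqlite_path_from_url(url: str) -> PurePosixPath:
    """Return the file path encoded in a sqlite:/// URL (SQLAlchemy-style, assumed)."""
    prefix = "sqlite:///"
    if not url.startswith(prefix):
        raise ValueError(f"not a sqlite file URL: {url!r}")
    # Everything after the three slashes is the path; a leading "/" makes it absolute.
    return PurePosixPath(url[len(prefix):])


db_path = sqlite_path_from_url("sqlite:////app/backend/data/bili_note.db")
print(db_path)         # the database file
print(db_path.parent)  # the directory that has to exist before sqlite creates the file
```

The parent directory here is the one the `RUN mkdir -p` line in the hunk above creates ahead of time.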
@@ -118,7 +120,7 @@ stdout_logfile=/var/log/supervisor/backend.log
 stderr_logfile=/var/log/supervisor/backend.log
 autorestart=true
 priority=20
-environment=BACKEND_PORT="%(ENV_BACKEND_PORT)s",BACKEND_HOST="%(ENV_BACKEND_HOST)s",TRANSCRIBER_TYPE="%(ENV_TRANSCRIBER_TYPE)s",WHISPER_MODEL_SIZE="%(ENV_WHISPER_MODEL_SIZE)s",FFMPEG_BIN_PATH="%(ENV_FFMPEG_BIN_PATH)s",HF_ENDPOINT="%(ENV_HF_ENDPOINT)s",STATIC="%(ENV_STATIC)s",OUT_DIR="%(ENV_OUT_DIR)s",DATA_DIR="%(ENV_DATA_DIR)s",NOTE_OUTPUT_DIR="%(ENV_NOTE_OUTPUT_DIR)s",IMAGE_BASE_URL="%(ENV_IMAGE_BASE_URL)s",ENV="%(ENV_ENV)s",GROQ_TRANSCRIBER_MODEL="%(ENV_GROQ_TRANSCRIBER_MODEL)s"
+environment=BACKEND_PORT="%(ENV_BACKEND_PORT)s",BACKEND_HOST="%(ENV_BACKEND_HOST)s",TRANSCRIBER_TYPE="%(ENV_TRANSCRIBER_TYPE)s",WHISPER_MODEL_SIZE="%(ENV_WHISPER_MODEL_SIZE)s",FFMPEG_BIN_PATH="%(ENV_FFMPEG_BIN_PATH)s",HF_ENDPOINT="%(ENV_HF_ENDPOINT)s",STATIC="%(ENV_STATIC)s",OUT_DIR="%(ENV_OUT_DIR)s",DATA_DIR="%(ENV_DATA_DIR)s",NOTE_OUTPUT_DIR="%(ENV_NOTE_OUTPUT_DIR)s",DATABASE_URL="%(ENV_DATABASE_URL)s",IMAGE_BASE_URL="%(ENV_IMAGE_BASE_URL)s",ENV="%(ENV_ENV)s",GROQ_TRANSCRIBER_MODEL="%(ENV_GROQ_TRANSCRIBER_MODEL)s"
 EOF

 # Update the nginx config to use the local backend
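The `%(ENV_NAME)s` references in the `[program:backend]` line have the same shape as Python's named `%` formatting, so the substitution idea can be illustrated in a few lines. This is a sketch of the pattern only, not supervisor's implementation; `expand_env_refs` and the two-variable environment are assumptions chosen for the example.

```python
def expand_env_refs(template: str, environ: dict[str, str]) -> str:
    """Expand %(ENV_NAME)s references from a mapping of environment variables."""
    # Each variable NAME is exposed to the template under the key ENV_NAME.
    mapping = {f"ENV_{name}": value for name, value in environ.items()}
    return template % mapping


# Hypothetical host environment, e.g. what `docker run -e` might provide.
host_env = {
    "BACKEND_PORT": "8483",
    "DATABASE_URL": "sqlite:////app/backend/data/bili_note.db",
}
line = 'BACKEND_PORT="%(ENV_BACKEND_PORT)s",DATABASE_URL="%(ENV_DATABASE_URL)s"'
expanded = expand_env_refs(line, host_env)
print(expanded)
```

In this sketch a reference to a variable missing from `host_env` raises `KeyError`, so every name used in the template needs a value.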
58
README.md
@@ -3,7 +3,7 @@
 <p align="center">
 <img src="./doc/icon.svg" alt="BiliNote Banner" width="50" height="50" />
 </p>
-<h1 align="center" > BiliNote v2.3.0</h1>
+<h1 align="center" > BiliNote v2.3.3</h1>
 </div>

 <p align="center"><i>AI video note generator: let AI take notes on your videos</i></p>
@@ -18,13 +18,34 @@
 <img src="https://img.shields.io/github/stars/jefferyhcool/BiliNote?style=social" />
 </p>

+<p align="center">
+<a href="https://www.bilinote.app/"><b>🚀 BiliNote Pro · Online</b></a>
+</p>
+
+<p align="center">
+<b>Don't want to deal with deployment?</b> Visit <a href="https://www.bilinote.app/"><b>www.bilinote.app</b></a> and start right away: no installation, no environment setup, no model downloads. Sign up and turn videos into notes.
+<br/>
+The dependency, proxy, and model-download pitfalls of local deployment simply don't apply to the cloud version.
+</p>
+
+<p align="center">
+<a href="https://www.bilinote.app/">
+<img src="https://img.shields.io/badge/%E7%AB%8B%E5%8D%B3%E4%BD%93%E9%AA%8C-BiliNote%20Pro-ff5c5c?style=for-the-badge" alt="Try BiliNote Pro now" />
+</a>
+</p>
+
+

 ## ✨ Introduction

 BiliNote is an open-source AI video note assistant. Given a video link from Bilibili, YouTube, Douyin, or similar platforms, it automatically extracts the content and generates well-structured Markdown notes with clear key points. It also supports inserted screenshots, jump-to-source links, and AI Q&A.
-## Online Use
-You can use it [here](https://www.bilinote.app/).

+> 💡 **Want to use it right away without deploying locally?** [BiliNote Pro online at www.bilinote.app](https://www.bilinote.app/) is live: cloud-hosted and ready out of the box, with none of the hassle of dependency installation / proxy configuration / model downloads.
+
+## 🌐 Online Use (Recommended)
+
+Visit **[www.bilinote.app](https://www.bilinote.app/)** to use BiliNote Pro online; no local deployment required.
+
 ## 📝 Documentation
 See the detailed documentation [here](https://docs.bilinote.app/)
 ## 📦 Desktop Download
@@ -142,10 +163,17 @@ docker pull ghcr.io/jefferyhcool/bilinote:latest

 docker run -d -p 80:80 \
   -v bilinote-data:/app/backend/data \
+  -v bilinote-config:/app/backend/config \
+  -v bilinote-static:/app/backend/static \
+  -v bilinote-models:/app/backend/models \
   --name bilinote \
   ghcr.io/jefferyhcool/bilinote:latest
 ```

+The four volumes above persist, respectively: `data` (the SQLite database + generated notes), `config` (LLM provider configuration / cookies / transcription settings), `static` (video screenshots referenced by notes), and `models` (the Whisper model cache; optional, avoids re-downloading every time). With these in place, configuration and history survive pulling a new image with `docker pull` and recreating the container.
+
+> ⚠️ **Do not** mount the whole backend directory with `-v volume-name:/app/backend`. A named volume is populated from the image contents on first start and then frozen, so later `docker pull` upgrades are shadowed by the old code and "the upgrade has no effect". Mount only the data subdirectories listed above.
+
 Open: `http://localhost`

 You can also build locally with docker-compose:
@@ -281,10 +309,17 @@ docker pull ghcr.io/jefferyhcool/bilinote:latest
 # Run the container
 docker run -d -p 80:80 \
   -v bilinote-data:/app/backend/data \
+  -v bilinote-config:/app/backend/config \
+  -v bilinote-static:/app/backend/static \
+  -v bilinote-models:/app/backend/models \
   --name bilinote \
   ghcr.io/jefferyhcool/bilinote:latest
 ```

+The four volumes above persist, respectively: `data` (the SQLite database + generated notes), `config` (LLM provider configuration / cookies / transcription settings), `static` (video screenshots referenced by notes), and `models` (the Whisper model cache; optional, avoids re-downloading every time). With these in place, configuration and history survive pulling a new image with `docker pull` and recreating the container.
+
+> ⚠️ **Do not** mount the whole backend directory with `-v volume-name:/app/backend`. A named volume is populated from the image contents on first start and then frozen, so later `docker pull` upgrades are shadowed by the old code and "the upgrade has no effect". Mount only the data subdirectories listed above.
+
 Open: `http://localhost`

 You can also build locally with docker-compose:
@@ -309,11 +344,20 @@ docker-compose -f docker-compose.gpu.yml up -d

 ### Contact and Join the Community

-Scan to join the BiliNote WeChat group (if the QR code has expired, please report it in [Issues](https://github.com/JefferyHcool/BiliNote/issues)):
+Scan to join a BiliNote WeChat group (5 groups in total; pick any one. The QR codes are refreshed periodically; if one has expired, please report it in [Issues](https://github.com/JefferyHcool/BiliNote/issues)):

-<p align="center">
-<img src="./doc/wechat.png" alt="BiliNote WeChat group" width="240" />
-</p>
+<table align="center">
+<tr>
+<td align="center"><img src="./doc/wechat-group-1.png" alt="BiliNote group 1" width="200" /><br/>Group 1</td>
+<td align="center"><img src="./doc/wechat-group-2.png" alt="BiliNote group 2" width="200" /><br/>Group 2</td>
+<td align="center"><img src="./doc/wechat-group-3.png" alt="BiliNote group 3" width="200" /><br/>Group 3</td>
+</tr>
+<tr>
+<td align="center"><img src="./doc/wechat-group-4.png" alt="BiliNote group 4" width="200" /><br/>Group 4</td>
+<td align="center"><img src="./doc/wechat-group-5.png" alt="BiliNote group 5" width="200" /><br/>Group 5</td>
+<td></td>
+</tr>
+</table>



@@ -119,12 +119,21 @@ _downloading: dict[str, str] = {} # model_size -> status ("downloading" | "done
 def _check_whisper_model_exists(model_size: str, subdir: str = "whisper") -> bool:
     """Check whether the given whisper model has been fully downloaded locally.

-    model.bin must be on disk to count as complete; an empty directory or a partial download does not count as "downloaded",
-    otherwise the monitoring page shows a green check while loading fails with "Unable to open file 'model.bin'".
+    faster-whisper caches models in the HF cache layout:
+    <model_dir>/models--Systran--faster-whisper-{size}/snapshots/<hash>/model.bin
+    model.bin must be found in some snapshot directory to count as complete.
+    (The legacy modelscope layout <model_dir>/whisper-{size}/model.bin is also recognized for compatibility.)
     """
-    model_dir = get_model_dir(subdir)
-    model_path = os.path.join(model_dir, f"whisper-{model_size}")
-    return (Path(model_path) / "model.bin").exists()
+    model_dir = Path(get_model_dir(subdir))
+    # HF cache layout
+    hf_repo_dir = model_dir / f"models--Systran--faster-whisper-{model_size}" / "snapshots"
+    if hf_repo_dir.exists():
+        for snapshot in hf_repo_dir.iterdir():
+            if (snapshot / "model.bin").exists():
+                return True
+    # Legacy modelscope layout (backward compatible for existing users)
+    legacy = model_dir / f"whisper-{model_size}" / "model.bin"
+    return legacy.exists()


 def _check_mlx_whisper_model_exists(model_size: str) -> bool:
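The layout check in the hunk above can be exercised without downloading anything. The sketch below rebuilds the same two lookups (HF cache snapshot first, legacy modelscope directory second) against a temporary directory. The function name `whisper_model_cached`, the snapshot name `abc123`, and the stub file contents are assumptions for the demo; `get_model_dir` from the project is replaced by an explicit argument.

```python
import tempfile
from pathlib import Path


def whisper_model_cached(model_dir: Path, model_size: str) -> bool:
    """True if model.bin exists in an HF-cache snapshot or in the legacy directory."""
    snapshots = model_dir / f"models--Systran--faster-whisper-{model_size}" / "snapshots"
    if snapshots.is_dir():
        if any((snap / "model.bin").exists() for snap in snapshots.iterdir()):
            return True
    # Legacy layout: <model_dir>/whisper-{size}/model.bin
    return (model_dir / f"whisper-{model_size}" / "model.bin").exists()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    before = whisper_model_cached(root, "tiny")   # nothing on disk yet

    snap = root / "models--Systran--faster-whisper-tiny" / "snapshots" / "abc123"
    snap.mkdir(parents=True)
    half = whisper_model_cached(root, "tiny")     # snapshot directory, but no model.bin

    (snap / "model.bin").write_bytes(b"stub")     # placeholder bytes, not real weights
    after = whisper_model_cached(root, "tiny")

print(before, half, after)
```

The middle case is the one the docstring cares about: a snapshot directory left behind by an interrupted download does not count as a downloaded model.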
@@ -189,24 +198,37 @@ class ModelDownloadRequest(BaseModel):


 def _do_download_whisper(model_size: str):
-    """Download the faster-whisper model in the background."""
-    from app.transcriber.whisper import MODEL_MAP
-    from modelscope import snapshot_download
+    """Download the faster-whisper model in the background.
+
+    Go straight through huggingface_hub.snapshot_download so the model lands in the HF cache layout;
+    that way faster-whisper hits the cache directly at load time (WhisperModel(model_size_or_path=size_name,
+    download_root=model_dir)), fully aligned with the load path.
+    """
+    from huggingface_hub import snapshot_download

     try:
         _downloading[model_size] = "downloading"
         model_dir = get_model_dir("whisper")
-        model_path = os.path.join(model_dir, f"whisper-{model_size}")
-        # Decide by model.bin rather than directory existence: a partial directory does not count as "downloaded"
-        if (Path(model_path) / "model.bin").exists():
+
+        # Skip the download if the model is already present
+        if _check_whisper_model_exists(model_size, "whisper"):
             _downloading[model_size] = "done"
             return
-        repo_id = MODEL_MAP.get(model_size)
-        if not repo_id:
-            _downloading[model_size] = "failed"
-            return
-        logger.info(f"Starting whisper model download: {model_size}")
-        snapshot_download(repo_id, local_dir=model_path)
+        repo_id = f"Systran/faster-whisper-{model_size}"
+        logger.info(f"Starting whisper model download: {repo_id}")
+        # Use the same allow_patterns as faster-whisper's utils.py to avoid downloading unrelated files;
+        # omit local_dir so it uses the default HF cache layout (aligned with the load logic)
+        snapshot_download(
+            repo_id,
+            cache_dir=model_dir,
+            allow_patterns=[
+                "config.json",
+                "preprocessor_config.json",
+                "model.bin",
+                "tokenizer.json",
+                "vocabulary.*",
+            ],
+        )
         logger.info(f"whisper model download complete: {model_size}")
         _downloading[model_size] = "done"
     except Exception as e:
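To see what the `allow_patterns` list in the hunk above would keep, the sketch below filters a file listing with glob-style matching from Python's `fnmatch`. Two things are assumptions here: that the patterns behave like `fnmatch` globs, and the repository file listing itself, which is invented for the example. No network access or `huggingface_hub` call is involved.

```python
from fnmatch import fnmatch

# The patterns from the hunk above.
ALLOW_PATTERNS = [
    "config.json",
    "preprocessor_config.json",
    "model.bin",
    "tokenizer.json",
    "vocabulary.*",
]


def kept_files(repo_files: list[str], patterns: list[str]) -> list[str]:
    """Return the files that match at least one glob pattern, in listing order."""
    return [name for name in repo_files if any(fnmatch(name, p) for p in patterns)]


# Hypothetical repository listing, for illustration only.
repo_files = [
    "README.md",
    "config.json",
    "model.bin",
    "tokenizer.json",
    "vocabulary.txt",
    ".gitattributes",
]
selected = kept_files(repo_files, ALLOW_PATTERNS)
print(selected)
```

The wildcard entry `vocabulary.*` is the only non-literal pattern, so it covers whichever vocabulary file extension a given model repository ships.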
@@ -11,8 +11,6 @@ from events import transcription_finished
 from pathlib import Path
 import os
 import shutil
-from tqdm import tqdm
-from modelscope import snapshot_download


 '''
@@ -20,19 +18,16 @@ from modelscope import snapshot_download
 '''
 logger=get_logger(__name__)
-
-MODEL_MAP={
-    "tiny": "pengzhendong/faster-whisper-tiny",
-    'base':'pengzhendong/faster-whisper-base',
-    'small':'pengzhendong/faster-whisper-small',
-    'medium':'pengzhendong/faster-whisper-medium',
-    'large-v1':'pengzhendong/faster-whisper-large-v1',
-    'large-v2':'pengzhendong/faster-whisper-large-v2',
-    'large-v3':'pengzhendong/faster-whisper-large-v3',
-    'large-v3-turbo':'pengzhendong/faster-whisper-large-v3-turbo',
-}
-
+# Legacy note: we used to download via modelscope into a custom directory and pass that path to WhisperModel.
+# But the logic of download_model in faster-whisper 1.1.1 (utils.py:76) is:
+# anything in size_or_id containing "/" is treated as an HF repo_id; there is no "return the local directory as-is" branch.
+# Passing in /app/models/whisper/whisper-tiny → it is treated as a nonexistent HF repo →
+# the online request fails → fallback to local_files_only=True → nothing found in the HF cache (because the layout is
+# modelscope's, not HF's) → LocalEntryNotFoundError, which misleadingly blames "offline mode".
+# Fix: let faster-whisper handle the download entirely: pass the size name, with download_root
+# as the HF cache root. HF_ENDPOINT already points to hf-mirror.com in the Dockerfile,
+# so it works from mainland China. Remove the modelscope path to avoid the layout mismatch.
 class WhisperTranscriber(Transcriber):
     # TODO: make this configurable
     def __init__(
         self,
         model_size: str = "base",
@@ -48,44 +43,40 @@ class WhisperTranscriber(Transcriber):
             print('CUDA not available; using the CPU for computation')

         self.compute_type = compute_type or ("float16" if self.device == "cuda" else "int8")
+        self.model_size = model_size

         model_dir = get_model_dir("whisper")
-        model_path = os.path.join(model_dir, f"whisper-{model_size}")
-        repo_id = MODEL_MAP[model_size]
-
-        # Step 1: directory / model.bin missing → download.
-        # The key criterion is model.bin rather than directory existence: if the first download is interrupted (network drop / disk full /
-        # container killed) it leaves a partial directory, and checking only for the directory would skip the download.
-        model_bin = Path(model_path) / "model.bin"
-        if not model_bin.exists():
-            if Path(model_path).exists():
-                logger.warning(f"Model directory {model_path} exists but model.bin is missing (previous download did not finish); downloading again")
-            else:
-                logger.info(f"Model whisper-{model_size} not found; starting download...")
-            model_path = snapshot_download(repo_id, local_dir=model_path)
-            logger.info("Model download complete")
-
-        # Step 2: load. model.bin may exist but be truncated (killed mid-download),
-        # in which case WhisperModel() raises "File model.bin is incomplete: failed to read a buffer...".
-        # Catch that, delete the damaged directory, download again, and retry once: self-healing that avoids an endless loop of 500s.
         try:
-            self.model = WhisperModel(
-                model_size_or_path=model_path,
-                device=self.device,
-                compute_type=self.compute_type,
-                download_root=model_dir,
-            )
+            self.model = self._build_model(model_size, model_dir)
         except Exception as e:
-            logger.warning(f"Failed to load whisper-{model_size} (model file likely corrupted / truncated): {e}; deleting and downloading again")
-            shutil.rmtree(model_path, ignore_errors=True)
-            model_path = snapshot_download(repo_id, local_dir=model_path)
-            logger.info("Model re-download complete; retrying load")
-            self.model = WhisperModel(
-                model_size_or_path=model_path,
-                device=self.device,
-                compute_type=self.compute_type,
-                download_root=model_dir,
-            )
+            # Self-heal: corrupted / truncated / partial cache → delete the matching HF cache and download once more
+            logger.warning(f"Failed to load whisper-{model_size}: {e}; purging the cache and downloading again")
+            self._purge_cache(model_dir, model_size)
+            self.model = self._build_model(model_size, model_dir)
+
+    def _build_model(self, model_size: str, model_dir: str) -> WhisperModel:
+        return WhisperModel(
+            model_size_or_path=model_size,  # pass the size name and let faster-whisper map it to Systran/faster-whisper-*
+            device=self.device,
+            compute_type=self.compute_type,
+            download_root=model_dir,
+        )
+
+    @staticmethod
+    def _purge_cache(model_dir: str, model_size: str) -> None:
+        """Delete the snapshot directory for this size in the HF cache, forcing a re-download next time.
+
+        HF cache layout: <model_dir>/models--Systran--faster-whisper-{size}/
+        A missing directory is not an error: the user may have changed the endpoint, or the cache layout may have changed.
+        """
+        candidates = [
+            Path(model_dir) / f"models--Systran--faster-whisper-{model_size}",
+            Path(model_dir) / f"whisper-{model_size}",  # legacy modelscope directory; clean it up too
+        ]
+        for path in candidates:
+            if path.exists():
+                logger.info(f"Purging damaged cache: {path}")
+                shutil.rmtree(path, ignore_errors=True)
     @staticmethod
     def is_torch_installed() -> bool:
         try:
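The new constructor follows a "load, purge the cache on failure, retry once" pattern. The sketch below isolates that control flow with a stub loader so it runs without faster-whisper. The names `load_with_self_heal` and `fake_load`, and the stub's rule that an empty `model.bin` counts as truncated, are inventions for this example rather than project or library behavior.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def load_with_self_heal(load: Callable[[], str], cache_dirs: list[Path]) -> str:
    """Try to load; on any failure purge the cache directories and retry once."""
    try:
        return load()
    except Exception:
        for path in cache_dirs:
            shutil.rmtree(path, ignore_errors=True)  # missing directories are fine
        return load()  # a second failure propagates to the caller


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "models--Systran--faster-whisper-tiny"
    cache.mkdir()
    (cache / "model.bin").write_bytes(b"")  # simulate a truncated download

    def fake_load() -> str:
        """Stub loader: 'downloads' when the file is absent, rejects truncated files."""
        model = cache / "model.bin"
        if not model.exists():
            cache.mkdir(parents=True, exist_ok=True)
            model.write_bytes(b"weights")  # stands in for a fresh download
        if model.read_bytes() != b"weights":
            raise RuntimeError("File model.bin is incomplete")
        return "loaded"

    result = load_with_self_heal(fake_load, [cache])

print(result)
```

Retrying exactly once keeps a persistent failure (for example, no network) from turning into an endless loop: the second exception reaches the caller unchanged.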
@@ -24,7 +24,7 @@ click-repl==0.3.0
 colorama==0.4.6
 coloredlogs==15.0.1
 cssselect2==0.8.0
-ctranslate2==4.5.0
+ctranslate2==4.6.0
 distro==1.9.0
 dnspython==2.7.0
 email_validator==2.2.0
BIN doc/wechat-group-1.png (normal file)
Binary file not shown. Size after: 560 KiB
BIN doc/wechat-group-2.png (normal file)
Binary file not shown. Size after: 562 KiB
BIN doc/wechat-group-3.png (normal file)
Binary file not shown. Size after: 568 KiB
BIN doc/wechat-group-4.png (normal file)
Binary file not shown. Size after: 565 KiB
BIN doc/wechat-group-5.png (normal file)
Binary file not shown. Size after: 491 KiB