24 Commits

Author SHA1 Message Date
Jianwu Huang
f6d299ce48 Merge pull request #353 from JefferyHcool/feature/extension-video-understanding
feat(extension): multimodal video-understanding toggle + frame-sampling/grid parameters (aligned with web NoteForm)
2026-05-07 17:28:00 +08:00
Jianwu Huang
ed1ee0a151 Merge pull request #352 from JefferyHcool/feature/extension-form-parity
Feature/extension form parity
2026-05-07 17:27:35 +08:00
huangjianwu
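a7c717abbd feat(extension): multimodal video-understanding toggle + frame-sampling/grid parameters (aligned with web NoteForm)
The web NoteForm has long had the video_understanding / video_interval / grid_size trio.
The extension lacked them, so users who wanted visual understanding with a vision model had to submit the task from the web UI.

New fields (kept in sync between Settings and GenerateRequest in types.ts):
- video_understanding: boolean, default false (off)
- video_interval: number, 1-30 seconds, default 6 (same as the web NoteForm default)
- grid_size: [number, number], each 1-10, default [2,2]

UI changes:
- popup "Advanced" collapsible section: toggle + interval + grid_size rows/columns in a three-column layout; the interval and grid inputs are shown only when the toggle is on,
  along with a hint that a vision model must be selected
- options General page: a separate "Video understanding (multimodal)" section exposing the same fields
- popup start() and background startTask() include these three fields in the generate_note request;
  they are omitted when the feature is off (to avoid overriding backend defaults)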
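
The omit-when-disabled rule described above can be sketched as follows. This is a Python illustration, not the extension's TypeScript: `video_fields` and the dict-based settings object are hypothetical names, and the fallback values simply repeat the defaults listed in this commit message.

```python
from typing import Any


def video_fields(settings: dict[str, Any]) -> dict[str, Any]:
    """Return the video-understanding part of a generate_note payload.

    Hypothetical helper: when the toggle is off, no keys are emitted at all,
    so the backend falls back to its own defaults.
    """
    if not settings.get("video_understanding"):
        return {}
    # Defaults mirror the commit message: interval 6 seconds, grid [2, 2].
    rows, cols = settings.get("grid_size", (2, 2))
    return {
        "video_understanding": True,
        "video_interval": settings.get("video_interval", 6),
        "grid_size": [rows, cols],
    }
```

Returning an empty dict (rather than `False` plus stale interval and grid values) is what keeps a disabled toggle from overriding anything on the backend side.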

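Regression risk: the default is false, so behavior is unchanged for existing users.

Depends on: feature/extension-form-parity (stacked on top of it, because both touch the same group of Settings fields).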

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 17:22:57 +08:00
huangjianwu
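799ab64a28 feat(extension): align NoteForm fields with the web UI (style presets + full format set + extras)
The note options in the extension popup / options pages had drifted from the web NoteForm in three places:

1. The style field was effectively broken
   · backend prompt_builder.get_style_format is an enum mapping (minimal/detailed/
     academic/tutorial/xiaohongshu/life_journal/task_oriented/business/
     meeting_minutes, 9 in total) and simply returns '' on a miss
   · the extension offered a free-text box, so whatever the user typed failed to match the enum, which was equivalent to sending nothing
   · fix: both popup and options now use a dropdown of the 9 presets, strictly matching the backend
2. The format field was missing half of its options
   · the backend supports four: toc / link / screenshot / summary
   · the extension exposed only two checkboxes: screenshot / link
   · fix: types.ts adds a NOTE_FORMATS constant and the UI renders all 4 checkboxes.
     On submit, the format array and the individual screenshot/link booleans are derived from settings.formats, the single source of truth
3. The extras field was missing
   · backend VideoRequest.extras is appended verbatim to the end of the prompt sent to the LLM
   · fix: a textarea is added to the popup's collapsed "Advanced" section and to the options default-generation section

Settings defaults: style='minimal', formats=['toc','summary'], extras=''.
If an old settings object holds an invalid style string, the dropdown shows blank; the user only needs to pick a value once.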
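
Why free text amounted to "sending nothing" can be seen from a minimal sketch of an enum-style lookup. The body of the backend's `get_style_format` is not shown in this commit, so `style_format_sketch` and the placeholder instruction strings below are assumptions; only the nine key names and the `''`-on-miss behavior come from the message above.

```python
# Placeholder texts; the backend's real style instructions are not part of this commit.
STYLE_FORMATS: dict[str, str] = {
    name: f"<instructions for {name}>"
    for name in (
        "minimal", "detailed", "academic", "tutorial", "xiaohongshu",
        "life_journal", "task_oriented", "business", "meeting_minutes",
    )
}


def style_format_sketch(style: str) -> str:
    """Look up a style preset; any unknown value yields '' (no style hint)."""
    return STYLE_FORMATS.get(style, "")
```

With a lookup of this shape, a value such as `"keep it short"` silently produces an empty style section, which is why the dropdown restricts input to the nine keys.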

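logic/types.ts:
- adds the NoteStyle / NoteFormat type aliases and the NOTE_STYLES / NOTE_FORMATS constants
- the Settings interface gains formats: NoteFormat[] / extras: string, and style becomes NoteStyle
- the old screenshot / link booleans are kept (backward compatible with existing storage), but the UI no longer binds to them and their values are derived from formats on submit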
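
The "derived from formats" rule can be sketched as below. It is written in Python for illustration only (the extension itself is TypeScript), and `derive_format_fields` is a hypothetical name; the three output keys are the ones this commit sends in the generate_note request.

```python
def derive_format_fields(formats: list[str]) -> dict[str, object]:
    """Derive the format array and the legacy booleans from one list.

    Because both booleans are computed from `formats`, they cannot
    disagree with the array that is sent alongside them.
    """
    return {
        "format": list(formats),
        "screenshot": "screenshot" in formats,
        "link": "link" in formats,
    }
```

For the default `['toc', 'summary']` selection, both legacy booleans come out false while the array still carries the two selected formats.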

The generate_note submission logic in popup / background / options is consolidated in the same way in all three places.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 17:18:28 +08:00
huangjianwu
c0837e0132 chore(release): merge release/2.1.4 back into develop 2026-05-07 16:45:11 +08:00
huangjianwu
c9497b502c chore(release): v2.1.4
CI engineering fixes; no runtime behavior changes. See CHANGELOG.md for details.
2026-05-07 16:44:59 +08:00
huangjianwu
1aea86a8d6 docs: v2.1.4 CHANGELOG + README version
CI engineering fixes; no runtime behavior changes. See CHANGELOG.md for details.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 16:44:49 +08:00
Jianwu Huang
9237cac9c3 Merge pull request #351 from JefferyHcool/fix/ci-commitlint
fix(ci): drop the bogus input from the commitlint workflow + standardize release merge commit titles
2026-05-07 14:47:03 +08:00
Jianwu Huang
f97ab0b7bc Merge pull request #350 from JefferyHcool/fix/ci-drop-linux-tauri-build
ci(tauri): drop Linux from desktop builds, keep only macOS + Windows
2026-05-07 14:42:30 +08:00
huangjianwu
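ac72cc6d6e ci(tauri): drop Linux from desktop builds, keep only macOS + Windows
The Tauri Linux build (ubuntu-22.04, x86_64-unknown-linux-gnu) consistently took 17m+ to finish
across the v2.1.x releases, slower than macOS / Windows, and there is no actual distribution channel for Linux desktop users.
It is removed from the matrix outright.

Cleanup:
- remove the ubuntu-22.04 entry from the matrix
- remove the entire 'Install Linux Dependencies' step (triggered only on ubuntu)
- remove the two find commands for .deb / .AppImage in the artifact collection step

Linux users can still use the Docker image (ghcr.io/jefferyhcool/bilinote); that path is unchanged.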

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 14:30:07 +08:00
huangjianwu
7358cd0123 fix(ci): drop the bogus input from the commitlint workflow + standardize release merge commit titles
The "Lint commit messages" job failed when v2.1.3 was pushed to master, for two reasons:

1. The workflow set 'firstParent: false', but that field is not among the valid inputs of
   wagoid/commitlint-github-action@v6, so it was ignored with a warning

2. The release merge commit title 'Release v2.1.3' does not follow the type(scope): subject format,
   so commitlint reported subject-empty + type-empty
   · @commitlint/config-conventional ignores commits prefixed with 'Merge ' by default,
     but we set the title manually with -m to 'Release vX.Y.Z', which bypassed that exemption
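
To see why 'Release v2.1.3' yields both type-empty and subject-empty, here is a simplified approximation of a conventional-commit header check. It is an illustrative regex only, not commitlint's actual parser; `HEADER_RE` and `parse_header` are hypothetical names.

```python
import re
from typing import Optional

# Simplified header pattern: type, optional (scope), optional "!", then ": subject".
HEADER_RE = re.compile(r"^(\w+)(?:\(([^)]*)\))?!?: (.+)$")


def parse_header(title: str) -> Optional[tuple[str, Optional[str], str]]:
    """Return (type, scope, subject), or None when no conventional header is found."""
    match = HEADER_RE.match(title)
    if match is None:
        return None
    commit_type, scope, subject = match.groups()
    return commit_type, scope, subject
```

A title with no `type: ` prefix fails to match as a whole, so neither a type nor a subject can be extracted from it, whereas `chore(release): vX.Y.Z` splits cleanly into all three parts.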

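Fix:
- remove the bogus firstParent input from .github/workflows/commitlint.yml
- add merge commit title templates to RELEASING.md §3:
  · merging into master: 'chore(release): vX.Y.Z'
  · merging back into develop: 'chore(release): merge release/X.Y.Z back into develop'
- add the same reminder to CONTRIBUTING.md §6.3

The earlier 'Release v2.1.x' merge commits on master / develop are already in the history
and cannot be rewritten (force-pushing master is not allowed). On push, however, commitlint only lints the new commits
in the pushed range and does not re-check old ones, so the error will not keep recurring.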

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 14:19:39 +08:00
huangjianwu
80f081613b Merge branch 'release/2.1.3' back into develop 2026-05-07 14:14:43 +08:00
huangjianwu
26e23d0f2c Release v2.1.3
Fixes issue #282 (DeepSeek and other non-multimodal providers rejected requests with HTTP 400). See CHANGELOG.md for details.
2026-05-07 14:14:33 +08:00
huangjianwu
234e3b9d2a docs: v2.1.3 CHANGELOG + README version
Fixes issue #282 (DeepSeek and other non-multimodal providers rejected requests with HTTP 400). See CHANGELOG.md for details.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 14:14:22 +08:00
Jianwu Huang
1d93d1c5f5 Merge pull request #345 from voidborne-d/fix/backend-deepseek-content-format
fix(backend): UniversalGPT.create_messages emit string content when no images
2026-05-07 14:12:34 +08:00
huangjianwu
c19d462505 Merge branch 'release/2.1.2' back into develop 2026-05-07 14:06:35 +08:00
huangjianwu
64882e6a77 Release v2.1.2
Follow-up fix for the failed v2.1.1 ghcr.io image build. See CHANGELOG.md for details.
2026-05-07 14:06:26 +08:00
huangjianwu
f32a6944d1 docs: v2.1.2 CHANGELOG + README version
Follow-up fix for the failed v2.1.1 ghcr.io image build. See CHANGELOG.md for details.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 14:06:16 +08:00
Jianwu Huang
c5c84a8ec7 Merge pull request #349 from JefferyHcool/fix/docker-frontend-build
fix(docker): fix the image build failure triggered by tag push
2026-05-07 14:04:38 +08:00
huangjianwu
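c4413c66a1 fix(docker): fix the image build failure triggered by tag push
The ghcr.io image push failed on the v2.1.1 tag, stopping at frontend-builder step 7/7,
'pnpm run build': vite loadConfigFromBundledFile died within 1.5s with no line number.
This is the typical symptom of the native binding of a plugin imported at the top of vite.config.ts (@tailwindcss/vite)
failing to require inside the container.

Three fixes:
1. Dockerfile.complete + BillNote_frontend/Dockerfile: node:18-alpine → node:20-alpine
   · Tailwind v4 no longer supports Node 18 (the package in practice needs 20+)
   · Vite 6 also recommends Node 20+
2. Frontend stage of Dockerfile.complete: copy pnpm-lock.yaml + switch to --frozen-lockfile
   · previously no lockfile was passed in, so every pnpm install re-resolved semver ranges and could pull a native dep newer than the local one
3. Force-add BillNote_frontend/pnpm-lock.yaml to the repository (git add -f)
   · the root .gitignore had an odd 'BiliNote/pnpm-lock.yaml' entry (a misspelled path);
     it never actually matched this file, yet the lockfile had never been committed, so the CI and local dependency graphs kept drifting apart
   · the lockfile pins both the musl and gnu variants of @tailwindcss/oxide, so it runs fine on alpine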

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 14:01:26 +08:00
Jianwu Huang
26ee15ce28 Merge pull request #347 from JefferyHcool/docs/readme-wechat-qr
docs(readme): add the WeChat group QR code to the contact/community section
2026-05-07 13:59:47 +08:00
huangjianwu
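29fa3d9540 docs(readme): add the WeChat group QR code to the contact/community section
The section previously only said "the latest community address will be published once updates resume after the annual meeting"; it now embeds doc/wechat.png directly.
GitHub automatically resolves relative image paths to raw.githubusercontent.com when rendering, so no CDN is needed.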

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 13:57:40 +08:00
huangjianwu
61cb4ec9fa Merge branch 'release/2.1.1' back into develop 2026-05-07 13:55:01 +08:00
voidborne-d
3ff7086491 fix(backend): UniversalGPT.create_messages emit string content when no images
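DeepSeek deepseek-chat and other non-multimodal models accept ``content`` only as a string. The old implementation
built ``content`` as the multimodal array ``[{"type":"text","text":...}]`` even when there was no
``video_img_urls`` input, so the DeepSeek API returned
``Failed to deserialize the JSON body into the target type: messages[0]:
unknown variant `image_url`, expected `text```, and the whole note-generation flow crashed.

Fix: ``create_messages`` falls back to string content when there are no screenshots and keeps
the original multimodal array when there are, so multimodal models do not regress. ``_build_merge_messages`` is also
changed to string content: the merge stage never carries images, and the old array form would make the merge stage
that follows long-video chunking hit the same DeepSeek 400.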
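
The switching rule described above, reduced to a standalone sketch. `build_content` is a hypothetical free function used only for illustration; in the repository the logic lives inside ``UniversalGPT.create_messages``.

```python
from typing import Optional, Union

Content = Union[str, list[dict]]


def build_content(text: str, image_urls: Optional[list[str]] = None) -> Content:
    """Return a plain string without images, or a multimodal array with them."""
    if not image_urls:
        # None and [] both take the text-only path.
        return text
    parts: list[dict] = [{"type": "text", "text": text}]
    for url in image_urls:
        parts.append({
            "type": "image_url",
            "image_url": {"url": url, "detail": "auto"},
        })
    return parts
```

Testing truthiness rather than `is not None` is what makes an explicitly empty list behave like a missing one, which matches the two string-content cases listed in the test summary.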

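Adds ``backend/tests/test_universal_gpt_content_format.py`` (6 cases):

- no images / an explicitly empty image list both produce string content
- with images, the multimodal array is still emitted (including ``image_url`` + ``detail: auto``)
- the text-only payload contains no ``image_url`` field at all
- ``_build_merge_messages`` uses string content and still includes the partials text

Red baseline: running these 6 cases against the unpatched ``universal_gpt.py`` fails 3 string-content
assertions (the same root cause as issue #282); with the patch, 6/6 pass.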

Closes #282
2026-05-07 13:50:59 +08:00
16 changed files with 11299 additions and 64 deletions

View File

@@ -26,7 +26,5 @@ jobs:
uses: wagoid/commitlint-github-action@v6
with:
configFile: .commitlintrc.json
# On PRs, check every commit between base..head; on push, validate only the latest commit
firstParent: false
failOnWarnings: false
helpURL: https://github.com/JefferyHcool/BiliNote/blob/develop/CONTRIBUTING.md#5-提交规范

View File

@@ -13,8 +13,6 @@ jobs:
include:
- platform: macos-latest
target: universal-apple-darwin
- platform: ubuntu-22.04
target: x86_64-unknown-linux-gnu
- platform: windows-latest
target: x86_64-pc-windows-msvc
@@ -24,13 +22,6 @@ jobs:
- name: Checkout Code
uses: actions/checkout@v4
# Linux system dependencies (required by Tauri)
- name: Install Linux Dependencies
if: matrix.platform == 'ubuntu-22.04'
run: |
sudo apt-get update
sudo apt-get install -y libwebkit2gtk-4.1-dev libappindicator3-dev librsvg2-dev patchelf
# Set up the Python environment (with pip cache)
- name: Set up Python
uses: actions/setup-python@v5
@@ -103,9 +94,6 @@ jobs:
# Windows: .msi, .exe (NSIS)
find "$BUNDLE_DIR" -name "*.msi" -exec cp {} release-artifacts/ \; 2>/dev/null || true
find "$BUNDLE_DIR/nsis" -name "*.exe" -exec cp {} release-artifacts/ \; 2>/dev/null || true
# Linux: .deb, .AppImage
find "$BUNDLE_DIR" -name "*.deb" -exec cp {} release-artifacts/ \; 2>/dev/null || true
find "$BUNDLE_DIR" -name "*.AppImage" -exec cp {} release-artifacts/ \; 2>/dev/null || true
echo "=== Collected artifacts ==="
ls -lh release-artifacts/

View File

@@ -70,6 +70,7 @@ async function startTask(url: string): Promise<{ ok: boolean, taskId?: string, e
// Bilibili: fetch the subtitles in the browser first (using the local login cookies) and send them along with the request
const prefetched = platform === 'bilibili' ? await fetchBilibiliSubtitle(url) : null
const formats = settings.formats || []
try {
const res = await fetch(`${backend}/api/generate_note`, {
method: 'POST',
@@ -80,13 +81,15 @@ async function startTask(url: string): Promise<{ ok: boolean, taskId?: string, e
quality: settings.quality,
provider_id: settings.providerId,
model_name: settings.modelName,
screenshot: settings.screenshot,
link: settings.link,
// The backend accepts both the format array and the separate screenshot/link booleans; derive both from formats to keep a single source of truth
format: [...formats],
screenshot: formats.includes('screenshot'),
link: formats.includes('link'),
style: settings.style || undefined,
format: [
...(settings.screenshot ? ['screenshot'] : []),
...(settings.link ? ['link'] : []),
],
extras: settings.extras || undefined,
video_understanding: settings.video_understanding || undefined,
video_interval: settings.video_understanding ? settings.video_interval : undefined,
grid_size: settings.video_understanding ? settings.grid_size : undefined,
prefetched_transcript: prefetched ?? undefined,
}),
})

View File

@@ -7,9 +7,14 @@ export const DEFAULT_SETTINGS: Settings = {
providerId: '',
modelName: '',
quality: 'medium',
formats: ['toc', 'summary'],
screenshot: false,
link: false,
style: '',
style: 'minimal',
extras: '',
video_understanding: false,
video_interval: 6,
grid_size: [2, 2],
}
export const MAX_TASKS = 30

View File

@@ -40,6 +40,9 @@ export interface GenerateRequest {
format?: string[]
style?: string
extras?: string
video_understanding?: boolean
video_interval?: number
grid_size?: [number, number]
// Subtitles fetched directly in the browser by the client; skips the backend's download_subtitles + audio transcription
prefetched_transcript?: {
language: string
@@ -78,14 +81,52 @@ export interface TaskRecord {
result?: NoteResult
}
// Kept one-to-one with note_styles in backend/app/gpt/prompt_builder.py
export type NoteStyle =
| 'minimal' | 'detailed' | 'academic' | 'tutorial'
| 'xiaohongshu' | 'life_journal' | 'task_oriented'
| 'business' | 'meeting_minutes'
// Kept one-to-one with note_formats in backend/app/gpt/prompt_builder.py
export type NoteFormat = 'toc' | 'link' | 'screenshot' | 'summary'
export const NOTE_STYLES: Array<{ value: NoteStyle, label: string }> = [
{ value: 'minimal', label: '精简' },
{ value: 'detailed', label: '详细' },
{ value: 'tutorial', label: '教程' },
{ value: 'academic', label: '学术' },
{ value: 'xiaohongshu', label: '小红书' },
{ value: 'life_journal', label: '生活向' },
{ value: 'task_oriented', label: '任务导向' },
{ value: 'business', label: '商业风格' },
{ value: 'meeting_minutes', label: '会议纪要' },
]
export const NOTE_FORMATS: Array<{ value: NoteFormat, label: string }> = [
{ value: 'toc', label: '目录' },
{ value: 'summary', label: 'AI 总结' },
{ value: 'screenshot', label: '原片截图' },
{ value: 'link', label: '原片跳转' },
]
export interface Settings {
backendUrl: string
providerId: string
modelName: string
quality: Quality
// Set of toggled output formats (screenshot / link stay in sync with the two booleans below)
formats: NoteFormat[]
screenshot: boolean
link: boolean
style: string
style: NoteStyle
extras: string
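// Multimodal video understanding: sample frames, tile them into grid images, and feed them to a vision model to improve answers about on-screen content
// Requires the selected model to be a vision model (e.g. the gpt-4o / gemini / claude-opus families); text-only models ignore the images
video_understanding: boolean
// Frame-sampling interval in seconds, range 1-30, default 6
video_interval: number
// Grid layout [rows, cols]; each grid image holds at most rows*cols frames. Default [2,2]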
grid_size: [number, number]
}
export interface ProviderUpdatePayload {

View File

@@ -3,9 +3,16 @@ import { onMounted, ref } from 'vue'
import { getProviders, ping } from '~/logic/api'
import { settings, settingsReady } from '~/logic/storage'
import { getModelsByProvider } from '~/logic/api'
import type { Model, Provider } from '~/logic/types'
import { NOTE_FORMATS, NOTE_STYLES, type Model, type NoteFormat, type Provider } from '~/logic/types'
import { watch } from 'vue'
function toggleFormat(value: NoteFormat, checked: boolean) {
const cur = settings.value.formats || []
settings.value.formats = checked
? Array.from(new Set([...cur, value]))
: cur.filter(v => v !== value)
}
const providers = ref<Provider[]>([])
const models = ref<Model[]>([])
const status = ref<{ kind: 'idle' | 'ok' | 'err', text: string }>({ kind: 'idle', text: '' })
@@ -128,13 +135,67 @@ onMounted(async () => {
</label>
<label class="flex flex-col gap-1">
<span class="text-gray-600">笔记风格</span>
<input v-model="settings.style" class="input" placeholder="留空使用默认">
<select v-model="settings.style" class="input">
<option v-for="s in NOTE_STYLES" :key="s.value" :value="s.value">{{ s.label }}</option>
</select>
</label>
<label class="flex items-center gap-2">
<input v-model="settings.screenshot" type="checkbox"> 自动插入截图
</div>
<div class="flex flex-col gap-1 text-sm">
<span class="text-gray-600">输出形式 web NoteForm 对齐</span>
<div class="flex flex-wrap gap-x-4 gap-y-2">
<label v-for="f in NOTE_FORMATS" :key="f.value" class="flex items-center gap-2">
<input
type="checkbox"
:checked="(settings.formats || []).includes(f.value)"
@change="toggleFormat(f.value, ($event.target as HTMLInputElement).checked)"
>
{{ f.label }}
</label>
</div>
</div>
<label class="flex flex-col gap-1 text-sm">
<span class="text-gray-600">额外提示词追加到 prompt 末尾</span>
<textarea
v-model="settings.extras"
class="input resize-y"
rows="3"
placeholder="例如:重点关注游戏开发部分;保留所有专业术语原文"
/>
</label>
</section>
<section class="section-card">
<h2 class="font-semibold">视频理解多模态</h2>
<p class="text-xs text-gray-500">
启用后会按抽帧间隔截取视频帧拼成网格图连同字幕一起喂给视觉模型提升画面相关问题的回答质量
<strong class="text-amber-700">需要选择视觉模型</strong>GPT-4o / Gemini / Claude 文字模型会忽略图片
</p>
<label class="flex items-center gap-2 text-sm">
<input v-model="settings.video_understanding" type="checkbox">
启用视频理解
</label>
<div v-if="settings.video_understanding" class="grid grid-cols-3 gap-3 text-sm">
<label class="flex flex-col gap-1">
<span class="text-gray-600">抽帧间隔(, 1-30)</span>
<input v-model.number="settings.video_interval" type="number" min="1" max="30" class="input">
</label>
<label class="flex items-center gap-2">
<input v-model="settings.link" type="checkbox"> 插入原片跳转链接
<label class="flex flex-col gap-1">
<span class="text-gray-600">拼图行 (1-10)</span>
<input
:value="settings.grid_size?.[0] ?? 2"
type="number" min="1" max="10" class="input"
@input="settings.grid_size = [Number(($event.target as HTMLInputElement).value) || 2, settings.grid_size?.[1] ?? 2]"
>
</label>
<label class="flex flex-col gap-1">
<span class="text-gray-600">拼图列 (1-10)</span>
<input
:value="settings.grid_size?.[1] ?? 2"
type="number" min="1" max="10" class="input"
@input="settings.grid_size = [settings.grid_size?.[0] ?? 2, Number(($event.target as HTMLInputElement).value) || 2]"
>
</label>
</div>
</section>

View File

@@ -4,7 +4,7 @@ import { detectPlatform } from '~/logic/platform'
import { settings, settingsReady, tasks, tasksReady, upsertTask } from '~/logic/storage'
import { generateNote, getTaskStatus, resolveImageUrl } from '~/logic/api'
import { fetchBilibiliSubtitle } from '~/logic/bilibili-subtitle'
import type { TaskRecord } from '~/logic/types'
import { NOTE_FORMATS, NOTE_STYLES, type NoteFormat, type TaskRecord } from '~/logic/types'
const tabUrl = ref<string>('')
const tabTitle = ref<string>('')
@@ -67,19 +67,22 @@ async function start() {
try {
// Bilibili: fetch the subtitles directly in the user's browser (using the local login cookies), skipping the backend's download_subtitles and audio transcription
const prefetched = platform.value === 'bilibili' ? await fetchBilibiliSubtitle(tabUrl.value) : null
const formats = settings.value.formats || []
const { task_id } = await generateNote({
video_url: tabUrl.value,
platform: platform.value!,
quality: settings.value.quality,
provider_id: settings.value.providerId,
model_name: settings.value.modelName,
screenshot: settings.value.screenshot,
link: settings.value.link,
// The backend VideoRequest accepts both the format array and the separate screenshot/link booleans; derive both from formats to keep a single source of truth
format: [...formats],
screenshot: formats.includes('screenshot'),
link: formats.includes('link'),
style: settings.value.style || undefined,
format: [
...(settings.value.screenshot ? ['screenshot'] : []),
...(settings.value.link ? ['link'] : []),
],
extras: settings.value.extras || undefined,
video_understanding: settings.value.video_understanding || undefined,
video_interval: settings.value.video_understanding ? settings.value.video_interval : undefined,
grid_size: settings.value.video_understanding ? settings.value.grid_size : undefined,
prefetched_transcript: prefetched ?? undefined,
})
activeTaskId.value = task_id
@@ -108,6 +111,13 @@ function openOptions() {
browser.runtime.openOptionsPage()
}
function toggleFormat(value: NoteFormat, checked: boolean) {
const cur = settings.value.formats || []
settings.value.formats = checked
? Array.from(new Set([...cur, value]))
: cur.filter(v => v !== value)
}
async function openSidePanel() {
// Can only be called in the synchronous context of a user gesture, and needs an explicit tabId
try {
@@ -176,7 +186,7 @@ onUnmounted(() => {
</div>
<fieldset class="border rounded p-2 flex flex-col gap-2" :disabled="!supported || submitting">
<div class="grid grid-cols-3 gap-2 text-xs">
<div class="grid grid-cols-2 gap-2 text-xs">
<label class="flex flex-col gap-1">
<span class="text-gray-600">画质</span>
<select v-model="settings.quality" class="border rounded px-1 py-0.5">
@@ -185,14 +195,76 @@ onUnmounted(() => {
<option value="slow">高质</option>
</select>
</label>
<label class="flex items-center gap-1 mt-4">
<input v-model="settings.screenshot" type="checkbox"> 截图
</label>
<label class="flex items-center gap-1 mt-4">
<input v-model="settings.link" type="checkbox"> 跳转
<label class="flex flex-col gap-1">
<span class="text-gray-600">笔记风格</span>
<select v-model="settings.style" class="border rounded px-1 py-0.5">
<option v-for="s in NOTE_STYLES" :key="s.value" :value="s.value">{{ s.label }}</option>
</select>
</label>
</div>
<div class="flex flex-col gap-1 text-xs">
<span class="text-gray-600">输出形式</span>
<div class="flex flex-wrap gap-x-3 gap-y-1">
<label v-for="f in NOTE_FORMATS" :key="f.value" class="flex items-center gap-1">
<input
type="checkbox"
:checked="(settings.formats || []).includes(f.value)"
@change="toggleFormat(f.value, ($event.target as HTMLInputElement).checked)"
>
{{ f.label }}
</label>
</div>
</div>
<details class="text-xs">
<summary class="cursor-pointer text-gray-500">高级</summary>
<label class="flex flex-col gap-1 mt-2">
<span class="text-gray-600">额外提示词追加到 prompt 末尾</span>
<textarea
v-model="settings.extras"
class="border rounded px-1 py-1 resize-y"
rows="2"
placeholder="例如:重点关注游戏开发部分;保留所有专业术语原文"
/>
</label>
<label class="flex items-center gap-2 mt-2">
<input v-model="settings.video_understanding" type="checkbox">
<span class="text-gray-600">启用视频理解抽帧拼图喂视觉模型</span>
</label>
<div v-if="settings.video_understanding" class="grid grid-cols-3 gap-2 mt-2">
<label class="flex flex-col gap-1">
<span class="text-gray-600">抽帧间隔()</span>
<input
v-model.number="settings.video_interval"
type="number" min="1" max="30"
class="border rounded px-1 py-0.5"
>
</label>
<label class="flex flex-col gap-1">
<span class="text-gray-600">拼图行</span>
<input
:value="settings.grid_size?.[0] ?? 2"
type="number" min="1" max="10"
class="border rounded px-1 py-0.5"
@input="settings.grid_size = [Number(($event.target as HTMLInputElement).value) || 2, settings.grid_size?.[1] ?? 2]"
>
</label>
<label class="flex flex-col gap-1">
<span class="text-gray-600">拼图列</span>
<input
:value="settings.grid_size?.[1] ?? 2"
type="number" min="1" max="10"
class="border rounded px-1 py-0.5"
@input="settings.grid_size = [settings.grid_size?.[0] ?? 2, Number(($event.target as HTMLInputElement).value) || 2]"
>
</label>
</div>
<p v-if="settings.video_understanding" class="text-amber-700 mt-1">
需要选择视觉模型GPT-4o / Gemini / Claude 文字模型会忽略图片
</p>
</details>
<div class="text-xs text-gray-600">
<span v-if="settings.providerId && settings.modelName">
模型{{ settings.modelName }}

View File

@@ -1,5 +1,6 @@
# === Frontend build stage ===
FROM node:18-alpine AS builder
# Tailwind v4 / Vite 6 need Node 20+; on alpine, pnpm pulls the musl native binary according to the lockfile.
FROM node:20-alpine AS builder
RUN corepack enable && corepack prepare pnpm@latest --activate

BillNote_frontend/pnpm-lock.yaml (generated, new file, 10810 lines; diff suppressed because it is too large)

View File

@@ -2,6 +2,39 @@
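All notable changes to this project are documented here. The format follows [Keep a Changelog](https://keepachangelog.com/zh-CN/1.1.0/), and the project adheres to [Semantic Versioning](https://semver.org/lang/zh-CN/).
## [2.1.4] - 2026-05-07
CI engineering fixes; no runtime behavior changes.
### Internal
- Dropped Linux (`ubuntu-22.04 / x86_64-unknown-linux-gnu`) from the desktop Tauri build matrix. The Linux desktop build consistently took 17m+ and had no distribution channel; Linux users can still use the Docker image (`ghcr.io/jefferyhcool/bilinote`).
- Removed the invalid `firstParent` input from the commitlint workflow (not supported by wagoid/commitlint-github-action@v6; it was ignored with a warning).
- Standardized release merge commit titles: `chore(release): vX.Y.Z` (merging into master) / `chore(release): merge release/X.Y.Z back into develop` (merging back into develop), so that commitlint recognizes them correctly. `RELEASING.md` §3 and `CONTRIBUTING.md` §6.3 are updated accordingly.
## [2.1.3] - 2026-05-07
### Fixed
- DeepSeek and other non-multimodal providers rejected requests with HTTP 400 (issue #282). `UniversalGPT.create_messages` and `_build_merge_messages` previously built content **unconditionally** as the OpenAI multimodal array `[{"type":"text",...}]`; models such as DeepSeek `deepseek-chat` do not recognize the `image_url` variant and respond with `invalid_request_error`. `GPTFactory.from_config` always instantiates `UniversalGPT`, so the problem affected **all** non-multimodal providers configured through the model settings page, not just DeepSeek.
- The content shape now switches on whether `video_img_urls` is non-empty: with images, the multimodal array is kept (no regression for vision models); without images, it falls back to a string. The merge stage never carries images, so it always uses a string.
- This matches the message-builder behavior of `deepseek_gpt.py` / `openai_gpt.py` / `qwen_gpt.py` in the same package.
- Added `backend/tests/test_universal_gpt_content_format.py` with 6 regression cases (including an assertion that the literal `image_url` does not appear in the JSON).
Thanks to @voidborne-d for the fix (#345).
## [2.1.2] - 2026-05-07
Follow-up fix for the ghcr.io image build failure on v2.1.1.
### Fixed
- Docker image build failure: the ghcr.io push triggered by the v2.1.1 tag died at frontend-builder step 7/7, `pnpm run build` (vite `loadConfigFromBundledFile` exited abnormally within 1.5s while loading the `@tailwindcss/vite` plugin).
- `Dockerfile.complete` and `BillNote_frontend/Dockerfile`: `node:18-alpine` → `node:20-alpine` (Tailwind v4 no longer supports Node 18; Vite 6 also recommends Node 20+).
- The frontend stage of `Dockerfile.complete` now also copies `pnpm-lock.yaml` and uses `--frozen-lockfile`, so builds no longer re-resolve semver ranges and pull native deps newer than the local ones.
- `BillNote_frontend/pnpm-lock.yaml` is force-added to the repository (it had never been committed, so the CI and local dependency graphs kept drifting apart).
- Added the WeChat group QR code to the README contact/community section (it previously only said "the latest community address will be published once updates resume after the annual meeting").
## [2.1.1] - 2026-05-07
Engineering and documentation wrap-up; no runtime behavior changes.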

View File

@@ -250,6 +250,7 @@ chore(ci): 优化 docker 构建缓存
- `feature/*` / `fix/*` into `develop`: **Squash and merge** is recommended, to keep the develop history linear.
- `release/*` into `master` and back into `develop`: use **Merge commit (--no-ff)** to preserve the release structure.
· use `chore(release): vX.Y.Z` (merging into master) / `chore(release): merge release/X.Y.Z back into develop` (merging back into develop) as the merge commit title so that commitlint passes.
- `hotfix/*`: same as release.
### 6.4 After merging

View File

@@ -26,15 +26,18 @@ RUN pip install --no-cache-dir -i ${PIP_INDEX} -r requirements.txt
COPY ./backend /tmp/backend
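# === Stage 2: build the frontend ===
FROM node:18-alpine AS frontend-builder
# Node 18-alpine cannot run Tailwind v4 / Vite 6 (the former requires Node 20+, the latter recommends Node 20+),
# so upgrade to node:20-alpine. alpine uses musl; pnpm pulls the *-linux-x64-musl native binary according to the lockfile.
FROM node:20-alpine AS frontend-builder
RUN corepack enable && corepack prepare pnpm@latest --activate
WORKDIR /tmp/frontend
# Copy package.json first to take advantage of dependency layer caching
COPY ./BillNote_frontend/package.json ./
RUN pnpm install
# Copy package.json + lockfile first to take advantage of dependency layer caching
# --frozen-lockfile keeps CI and local development on the same dependency versions, preventing breaking upgrades introduced by semver drift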
COPY ./BillNote_frontend/package.json ./BillNote_frontend/pnpm-lock.yaml ./
RUN pnpm install --frozen-lockfile
COPY ./BillNote_frontend /tmp/frontend

View File

@@ -3,7 +3,7 @@
<p align="center">
<img src="./doc/icon.svg" alt="BiliNote Banner" width="50" height="50" />
</p>
<h1 align="center" > BiliNote v2.1.1</h1>
<h1 align="center" > BiliNote v2.1.4</h1>
</div>
<p align="center"><i>AI video note generator: let AI take notes on your videos</i></p>
@@ -53,6 +53,21 @@ BiliNote 是一个开源的 AI 视频笔记助手支持通过哔哩哔哩、Y
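- Video cover banner shown at the top of notes
- Workspace and generation-history panels can be collapsed/expanded
### v2.1.4 changes
- CI: dropped Linux from the desktop Tauri build (the 17m+ slow lane is retired; Linux users can still use the Docker image)
- CI: fixed the commitlint workflow + standardized the release merge commit title convention
### v2.1.3 changes
- Fixed DeepSeek and other non-multimodal providers rejecting requests with HTTP 400 (issue #282): the `UniversalGPT` message builder now switches between string and multimodal-array content depending on whether images are present
- Thanks to @voidborne-d (#345)
### v2.1.2 changes
- Fixed the ghcr.io Docker image build failure triggered by v2.1.1 (Node 18 incompatible with Tailwind v4, missing lockfile)
- Added the WeChat group QR code to the README
### v2.1.1 changes
- Engineering and documentation wrap-up (CONTRIBUTING.md / RELEASING.md / issue + PR templates / commitlint CI / extension release workflow)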
@@ -205,7 +220,12 @@ docker-compose -f docker-compose.gpu.yml up -d
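- [ ] Export notes to PDF / Word / Notion
### Contact and Join the Community
The latest community address will be published once updates resume after the annual meeting
Scan the QR code to join the BiliNote WeChat group (if the QR code has expired, please report it in [Issues](https://github.com/JefferyHcool/BiliNote/issues)):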
<p align="center">
<img src="./doc/wechat.png" alt="BiliNote 交流微信群" width="240" />
</p>

View File

@@ -42,10 +42,13 @@ git push -u origin release/X.Y.Z
Open two PRs on GitHub:
| PR | base | Merge method |
|---|---|---|
| `release/X.Y.Z` → `master` | `master` | **Merge commit (--no-ff)** |
| `release/X.Y.Z` → `develop` | `develop` | **Merge commit (--no-ff)** |
| PR | base | Merge method | Merge commit title |
|---|---|---|---|
| `release/X.Y.Z` → `master` | `master` | **Merge commit (--no-ff)** | `chore(release): vX.Y.Z` |
| `release/X.Y.Z` → `develop` | `develop` | **Merge commit (--no-ff)** | `chore(release): merge release/X.Y.Z back into develop` |
> ⚠️ The merge commit title **must** follow the `type(scope): subject` format (commitlint validates it on push to master/develop).
> The `Release vX.Y.Z` form used in the past is rejected by commitlint with `type-empty` / `subject-empty`.
`master` branch protection requires an approved review. Merging back into `develop` brings the small fixes made during the release freeze back into develop.

View File

@@ -53,20 +53,26 @@ class UniversalGPT(GPT):
extras=kwargs.get('extras'),
)
# ⛳ Build the content array, supporting mixed text + image_url
content: List[dict] = [{"type": "text", "text": content_text}]
video_img_urls = kwargs.get('video_img_urls', [])
for url in video_img_urls:
content.append({
"type": "image_url",
"image_url": {
"url": url,
"detail": "auto"
}
})
content: list[dict] | str
if video_img_urls:
# With screenshots, use the OpenAI multimodal content array (text + image_url)
content = [{"type": "text", "text": content_text}]
for url in video_img_urls:
content.append({
"type": "image_url",
"image_url": {
"url": url,
"detail": "auto"
}
})
else:
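# Text-only case: fall back to string content. DeepSeek deepseek-chat and other non-multimodal models
# do not recognize the [{"type":"text",...}] array form and return invalid_request_error
# (issue #282). The OpenAI spec itself also allows content to be a string.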
content = content_text
# Correct format: wrap everything in a single message (role + content array)
messages = [{
"role": "user",
"content": content
@@ -83,9 +89,10 @@ class UniversalGPT(GPT):
def _build_merge_messages(self, partials: list) -> list:
merge_text = MERGE_PROMPT + "\n\n" + "\n\n---\n\n".join(partials)
# The merge stage has no images, so use string content directly for compatibility with non-multimodal models (issue #282)
return [{
"role": "user",
"content": [{"type": "text", "text": merge_text}]
"content": merge_text
}]
def _checkpoint_path(self, checkpoint_key: str) -> Path:

View File

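@@ -0,0 +1,189 @@
"""Regression test for issue #282: UniversalGPT switches content between string and array form depending on whether images are present.
DeepSeek deepseek-chat and other non-multimodal models accept ``content`` only as a string; the old implementation unconditionally
emitted ``[{"type":"text","text":...}]``, causing ``invalid_request_error``.
"""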
import importlib.util
import pathlib
import sys
import types
import unittest
def _install_stubs():
app_mod = types.ModuleType("app")
gpt_pkg = types.ModuleType("app.gpt")
models_pkg = types.ModuleType("app.models")
base_mod = types.ModuleType("app.gpt.base")
class _GPT:
pass
base_mod.GPT = _GPT
prompt_builder_mod = types.ModuleType("app.gpt.prompt_builder")
def _generate_base_prompt(**_kwargs):
return "PROMPT_BODY"
prompt_builder_mod.generate_base_prompt = _generate_base_prompt
prompt_mod = types.ModuleType("app.gpt.prompt")
prompt_mod.BASE_PROMPT = ""
prompt_mod.AI_SUM = ""
prompt_mod.SCREENSHOT = ""
prompt_mod.LINK = ""
prompt_mod.MERGE_PROMPT = "MERGE_HEAD"
utils_mod = types.ModuleType("app.gpt.utils")
def _fix_markdown(text):
return text
utils_mod.fix_markdown = _fix_markdown
request_chunker_mod = types.ModuleType("app.gpt.request_chunker")
class _RequestChunker:
def __init__(self, *_args, **_kwargs):
pass
def group_texts_by_budget(self, texts, _builder, **_kwargs):
return [texts]
request_chunker_mod.RequestChunker = _RequestChunker
gpt_model_mod = types.ModuleType("app.models.gpt_model")
class _GPTSource:
pass
gpt_model_mod.GPTSource = _GPTSource
transcriber_model_mod = types.ModuleType("app.models.transcriber_model")
class _TranscriptSegment:
def __init__(self, **kwargs):
self.start = kwargs.get("start", 0)
self.end = kwargs.get("end", 0)
self.text = kwargs.get("text", "")
transcriber_model_mod.TranscriptSegment = _TranscriptSegment
sys.modules.setdefault("app", app_mod)
sys.modules.setdefault("app.gpt", gpt_pkg)
sys.modules.setdefault("app.models", models_pkg)
sys.modules["app.gpt.base"] = base_mod
sys.modules["app.gpt.prompt_builder"] = prompt_builder_mod
sys.modules["app.gpt.prompt"] = prompt_mod
sys.modules["app.gpt.utils"] = utils_mod
sys.modules["app.gpt.request_chunker"] = request_chunker_mod
sys.modules["app.models.gpt_model"] = gpt_model_mod
sys.modules["app.models.transcriber_model"] = transcriber_model_mod
def _load_universal_gpt_class():
_install_stubs()
root = pathlib.Path(__file__).resolve().parents[1]
module_path = root / "app" / "gpt" / "universal_gpt.py"
spec = importlib.util.spec_from_file_location(
"universal_gpt_content_format", module_path
)
if spec is None or spec.loader is None:
raise ImportError("universal_gpt module spec not found")
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
return module.UniversalGPT
UniversalGPT = _load_universal_gpt_class()
class _DummyClient:
"""create_messages never actually calls the client, so an empty shell is enough."""
def _make_gpt():
return UniversalGPT(_DummyClient(), model="deepseek-chat")
class TestCreateMessagesContentFormat(unittest.TestCase):
"""Covers the output shape of create_messages for different video_img_urls inputs."""
def test_no_images_emits_string_content(self):
"""Without images, content is a str (parseable by DeepSeek / non-multimodal models)."""
gpt = _make_gpt()
messages = gpt.create_messages(segments=[])
self.assertEqual(len(messages), 1)
self.assertEqual(messages[0]["role"], "user")
self.assertIsInstance(messages[0]["content"], str)
self.assertEqual(messages[0]["content"], "PROMPT_BODY")
def test_empty_image_list_emits_string_content(self):
"""An explicitly empty list must also take the text-only branch, so the image fields are not triggered by mistake."""
gpt = _make_gpt()
messages = gpt.create_messages(segments=[], video_img_urls=[])
self.assertIsInstance(messages[0]["content"], str)
def test_with_images_emits_multimodal_array(self):
"""With images, keep the multimodal array form so multimodal models do not regress."""
gpt = _make_gpt()
messages = gpt.create_messages(
segments=[],
video_img_urls=["https://example.com/a.jpg", "https://example.com/b.jpg"],
)
content = messages[0]["content"]
self.assertIsInstance(content, list)
self.assertEqual(len(content), 3) # 1 text + 2 images
self.assertEqual(content[0], {"type": "text", "text": "PROMPT_BODY"})
self.assertEqual(content[1]["type"], "image_url")
self.assertEqual(content[1]["image_url"]["url"], "https://example.com/a.jpg")
self.assertEqual(content[1]["image_url"]["detail"], "auto")
self.assertEqual(content[2]["image_url"]["url"], "https://example.com/b.jpg")
def test_no_image_url_field_when_no_images(self):
"""The image_url keyword must not appear in a text-only payload; it is the root cause of the DeepSeek 400."""
gpt = _make_gpt()
messages = gpt.create_messages(segments=[])
import json
serialized = json.dumps(messages, ensure_ascii=False)
self.assertNotIn("image_url", serialized)
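class TestBuildMergeMessagesContentFormat(unittest.TestCase):
"""The merge stage never carries images and should always take the string content path."""
def test_merge_messages_use_string_content(self):
"""Otherwise the merge stage after long-video chunking would reproduce the issue #282 error."""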
gpt = _make_gpt()
messages = gpt._build_merge_messages(["partial-A", "partial-B"])
self.assertEqual(len(messages), 1)
self.assertEqual(messages[0]["role"], "user")
self.assertIsInstance(messages[0]["content"], str)
self.assertIn("MERGE_HEAD", messages[0]["content"])
self.assertIn("partial-A", messages[0]["content"])
self.assertIn("partial-B", messages[0]["content"])
def test_merge_messages_no_image_url_field(self):
gpt = _make_gpt()
messages = gpt._build_merge_messages(["x"])
import json
serialized = json.dumps(messages, ensure_ascii=False)
self.assertNotIn("image_url", serialized)
if __name__ == "__main__":
unittest.main()