5 Commits

Author SHA1 Message Date
Jianwu Huang
f6d299ce48 Merge pull request #353 from JefferyHcool/feature/extension-video-understanding
feat(extension): multimodal video-understanding toggle + frame-extraction/grid-tiling parameters (aligned with web NoteForm)
2026-05-07 17:28:00 +08:00
Jianwu Huang
ed1ee0a151 Merge pull request #352 from JefferyHcool/feature/extension-form-parity
Feature/extension form parity
2026-05-07 17:27:35 +08:00
huangjianwu
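a7c717abbd feat(extension): multimodal video-understanding toggle + frame-extraction/grid-tiling parameters (aligned with web NoteForm)
The web NoteForm has long had the trio video_understanding / video_interval / grid_size.
The extension did not, so users who wanted "visual understanding" with a vision model had to submit the task from the web client.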

New fields (kept in sync between Settings and GenerateRequest in types.ts):
- video_understanding: boolean, default false (off)
- video_interval: number, 1-30 seconds, default 6 (same as the web NoteForm default)
- grid_size: [number, number], each value 1-10, default [2,2]
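
The ranges above could be enforced with a small normalizer before the values are stored or sent. This is a minimal sketch, not code from this commit: `VideoSettings`, `clampInt`, and `normalizeVideoSettings` are hypothetical names, and rounding to whole numbers is an assumption (the commit only states the ranges and defaults).

```typescript
// Hypothetical shape holding only the three fields introduced by this commit.
interface VideoSettings {
  video_understanding: boolean
  video_interval: number
  grid_size: [number, number]
}

// Defaults as stated in the commit message.
const VIDEO_DEFAULTS: VideoSettings = {
  video_understanding: false,
  video_interval: 6,
  grid_size: [2, 2],
}

// Round to an integer and clamp into [min, max].
// Falls back when the input cannot be read as a finite number.
// Rounding is an assumption; the commit does not say whether fractions are allowed.
function clampInt(value: unknown, min: number, max: number, fallback: number): number {
  const n = typeof value === 'number' ? value : Number(value)
  if (!Number.isFinite(n))
    return fallback
  return Math.min(max, Math.max(min, Math.round(n)))
}

// Hypothetical helper: fills in defaults and applies the documented ranges.
function normalizeVideoSettings(raw: Partial<VideoSettings>): VideoSettings {
  const [rows, cols] = raw.grid_size ?? VIDEO_DEFAULTS.grid_size
  return {
    video_understanding: raw.video_understanding === true,
    video_interval: clampInt(raw.video_interval, 1, 30, VIDEO_DEFAULTS.video_interval),
    grid_size: [
      clampInt(rows, 1, 10, VIDEO_DEFAULTS.grid_size[0]),
      clampInt(cols, 1, 10, VIDEO_DEFAULTS.grid_size[1]),
    ],
  }
}
```

With this sketch, an out-of-range input such as `{ video_interval: 45, grid_size: [0, 12] }` would come back as interval 30 and grid `[1, 10]`, and an empty object would yield the defaults.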

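UI changes:
- popup "Advanced" collapsible section: toggle plus a three-column row for interval and grid_size rows/cols; the numeric inputs are shown only when the toggle is on,
  along with a hint that a vision model must be selected
- options General page: a dedicated "Video understanding (multimodal)" section exposing the same fields
- popup start() and background startTask() include these three fields in the generate_note request;
  they are omitted when the toggle is off (to avoid overriding backend defaults)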
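
The "omitted when off" behavior relies on `JSON.stringify` dropping properties whose value is `undefined`. The sketch below mirrors the three request lines from this diff. The wrapper `videoFields`, the `VideoSettings` interface, and the example URL are hypothetical and exist only to make the snippet self-contained.

```typescript
// Hypothetical subset of Settings holding only the video-understanding fields.
interface VideoSettings {
  video_understanding: boolean
  video_interval: number
  grid_size: [number, number]
}

// Same expressions as in start() / startTask(): every field becomes undefined when the toggle is off.
function videoFields(s: VideoSettings) {
  return {
    video_understanding: s.video_understanding || undefined,
    video_interval: s.video_understanding ? s.video_interval : undefined,
    grid_size: s.video_understanding ? s.grid_size : undefined,
  }
}

const base = { video_url: 'https://example.com/v' } // placeholder URL

// Toggle off: JSON.stringify skips undefined-valued properties, so nothing is sent.
const offBody = JSON.stringify({
  ...base,
  ...videoFields({ video_understanding: false, video_interval: 6, grid_size: [2, 2] }),
})
// offBody === '{"video_url":"https://example.com/v"}'

// Toggle on: all three fields are serialized.
const onBody = JSON.stringify({
  ...base,
  ...videoFields({ video_understanding: true, video_interval: 6, grid_size: [2, 2] }),
})
// onBody === '{"video_url":"https://example.com/v","video_understanding":true,"video_interval":6,"grid_size":[2,2]}'
```

Because the keys are absent rather than `false` or `null`, the backend presumably falls back to its own defaults, which is the stated intent of the commit.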

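Regression risk: the default is false, so behavior is unchanged for existing users.

Depends on: feature/extension-form-parity (stacked on top of it, because both touch the same group of Settings fields).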

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 17:22:57 +08:00
huangjianwu
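799ab64a28 feat(extension): align NoteForm fields with the web client (style presets + complete format set + extras)
Previously the note options in the extension's popup / options page did not match the web NoteForm. There were three gaps: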

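1. The style field was effectively broken
   · backend prompt_builder.get_style_format is an enum mapping (minimal/detailed/
     academic/tutorial/xiaohongshu/life_journal/task_oriented/business/
     meeting_minutes, 9 in total) and returns '' on a miss
   · the extension offered a free-text box, so whatever the user typed never matched the enum, which amounted to sending nothing
   · Fix: popup + options both switch to a dropdown of the 9 presets, strictly aligned with the backend
2. Half of the format field was missing
   · backend supports four values: toc / link / screenshot / summary
   · the extension exposed only two checkboxes, screenshot / link
   · Fix: types.ts adds a NOTE_FORMATS constant and the UI renders all 4 checkboxes.
     When building the request, the format array and the screenshot/link booleans are both derived from settings.formats, the single source of truth
3. The extras field was missing
   · backend VideoRequest.extras is appended directly to the end of the prompt sent to the LLM
   · Fix: a textarea is added to the popup's collapsed "Advanced" section and to the default generation options section in options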
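
The "single source of truth" rule from gap 2 can be isolated into one function. This is a sketch under assumptions: `deriveFormatFields` and `FormatFields` are hypothetical names, while the three derived expressions follow the request-building lines in this diff.

```typescript
// Same union as NoteFormat in logic/types.ts.
type NoteFormat = 'toc' | 'link' | 'screenshot' | 'summary'

// Hypothetical shape for the three request fields that depend on the selection.
interface FormatFields {
  format: NoteFormat[]
  screenshot: boolean
  link: boolean
}

// Derive the format array and both legacy booleans from the one formats list,
// so the three values sent to the backend cannot disagree with each other.
function deriveFormatFields(formats: NoteFormat[] | undefined): FormatFields {
  const list = formats ?? []
  return {
    format: [...list], // copy, so the stored settings array is not shared
    screenshot: list.includes('screenshot'),
    link: list.includes('link'),
  }
}
```

For example, a selection of `['toc', 'link']` gives `screenshot: false` and `link: true`, regardless of what the old `settings.screenshot` / `settings.link` booleans still hold in storage.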

Settings defaults: style='minimal', formats=['toc','summary'], extras=''.
If an old settings object holds an invalid style string, the dropdown renders blank; the user only needs to pick a style once more.
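
The commit leaves an invalid stored style for the user to fix by reselecting. One possible alternative, shown here only as a sketch and not something this commit does, is to coerce unknown values to the default when settings are loaded. `NOTE_STYLE_VALUES`, `isNoteStyle`, and `coerceStyle` are hypothetical names; the nine values are the ones listed in the commit message.

```typescript
// The nine style presets named in the commit message.
const NOTE_STYLE_VALUES = [
  'minimal', 'detailed', 'academic', 'tutorial',
  'xiaohongshu', 'life_journal', 'task_oriented',
  'business', 'meeting_minutes',
] as const

type NoteStyle = typeof NOTE_STYLE_VALUES[number]

// Type guard: true only for one of the nine known presets.
function isNoteStyle(value: unknown): value is NoteStyle {
  return typeof value === 'string'
    && (NOTE_STYLE_VALUES as readonly string[]).includes(value)
}

// Hypothetical load-time coercion: keep valid values, replace anything else with the fallback.
function coerceStyle(stored: unknown, fallback: NoteStyle = 'minimal'): NoteStyle {
  return isNoteStyle(stored) ? stored : fallback
}
```

Under this sketch a legacy free-text value, or the old empty-string default, would silently become `'minimal'`. Whether silent replacement is preferable to a visibly blank dropdown is a product decision the commit does not discuss.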

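logic/types.ts:
- add the NoteStyle / NoteFormat type aliases and the NOTE_STYLES / NOTE_FORMATS constants
- the Settings interface gains formats: NoteFormat[] / extras: string, and style becomes NoteStyle
- the old screenshot / link booleans are kept (backward compatibility with old storage), but the UI no longer binds to them, and on submit they are derived from formats as well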
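
Since the old booleans stay in storage but no longer drive the UI, a user who had enabled screenshots before the upgrade would start from the new default formats. A one-time migration along the following lines could carry the old choices over. This is purely illustrative and is not part of the commit: `StoredSettings` and `migrateFormats` are hypothetical, and merging the old flags into the new defaults is an assumed policy.

```typescript
type NoteFormat = 'toc' | 'link' | 'screenshot' | 'summary'

// Hypothetical view of what may be found in older extension storage.
interface StoredSettings {
  formats?: NoteFormat[]
  screenshot?: boolean
  link?: boolean
}

// If formats already exists, trust it. Otherwise start from the defaults and
// add whatever the legacy booleans had enabled.
function migrateFormats(
  stored: StoredSettings,
  defaults: NoteFormat[] = ['toc', 'summary'],
): NoteFormat[] {
  if (Array.isArray(stored.formats))
    return [...stored.formats]
  const merged = new Set<NoteFormat>(defaults)
  if (stored.screenshot)
    merged.add('screenshot')
  if (stored.link)
    merged.add('link')
  return Array.from(merged)
}
```

A `Set` keeps insertion order, so the defaults come first and the legacy flags are appended without duplicates.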

The generate_note submission logic in all three places (popup / background / options) is updated to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 17:18:28 +08:00
huangjianwu
c0837e0132 chore(release): merge release/2.1.4 back into develop
2026-05-07 16:45:11 +08:00
5 changed files with 209 additions and 27 deletions

View File

@@ -70,6 +70,7 @@ async function startTask(url: string): Promise<{ ok: boolean, taskId?: string, e
// Bilibili: fetch the subtitles in the browser first (using the local logged-in cookies) and send them along with the submission
const prefetched = platform === 'bilibili' ? await fetchBilibiliSubtitle(url) : null
const formats = settings.formats || []
try {
const res = await fetch(`${backend}/api/generate_note`, {
method: 'POST',
@@ -80,13 +81,15 @@ async function startTask(url: string): Promise<{ ok: boolean, taskId?: string, e
quality: settings.quality,
provider_id: settings.providerId,
model_name: settings.modelName,
screenshot: settings.screenshot,
link: settings.link,
// backend accepts both the format array and the separate screenshot/link booleans; derive all of them from formats to keep a single source of truth
format: [...formats],
screenshot: formats.includes('screenshot'),
link: formats.includes('link'),
style: settings.style || undefined,
format: [
...(settings.screenshot ? ['screenshot'] : []),
...(settings.link ? ['link'] : []),
],
extras: settings.extras || undefined,
video_understanding: settings.video_understanding || undefined,
video_interval: settings.video_understanding ? settings.video_interval : undefined,
grid_size: settings.video_understanding ? settings.grid_size : undefined,
prefetched_transcript: prefetched ?? undefined,
}),
})

View File

@@ -7,9 +7,14 @@ export const DEFAULT_SETTINGS: Settings = {
providerId: '',
modelName: '',
quality: 'medium',
formats: ['toc', 'summary'],
screenshot: false,
link: false,
style: '',
style: 'minimal',
extras: '',
video_understanding: false,
video_interval: 6,
grid_size: [2, 2],
}
export const MAX_TASKS = 30

View File

@@ -40,6 +40,9 @@ export interface GenerateRequest {
format?: string[]
style?: string
extras?: string
video_understanding?: boolean
video_interval?: number
grid_size?: [number, number]
// Subtitles fetched directly by the client in the browser; skips the backend's download_subtitles + audio transcription
prefetched_transcript?: {
language: string
@@ -78,14 +81,52 @@ export interface TaskRecord {
result?: NoteResult
}
// Mirrors note_styles in backend/app/gpt/prompt_builder.py one-to-one
export type NoteStyle =
| 'minimal' | 'detailed' | 'academic' | 'tutorial'
| 'xiaohongshu' | 'life_journal' | 'task_oriented'
| 'business' | 'meeting_minutes'
// Mirrors note_formats in backend/app/gpt/prompt_builder.py one-to-one
export type NoteFormat = 'toc' | 'link' | 'screenshot' | 'summary'
export const NOTE_STYLES: Array<{ value: NoteStyle, label: string }> = [
{ value: 'minimal', label: '精简' },
{ value: 'detailed', label: '详细' },
{ value: 'tutorial', label: '教程' },
{ value: 'academic', label: '学术' },
{ value: 'xiaohongshu', label: '小红书' },
{ value: 'life_journal', label: '生活向' },
{ value: 'task_oriented', label: '任务导向' },
{ value: 'business', label: '商业风格' },
{ value: 'meeting_minutes', label: '会议纪要' },
]
export const NOTE_FORMATS: Array<{ value: NoteFormat, label: string }> = [
{ value: 'toc', label: '目录' },
{ value: 'summary', label: 'AI 总结' },
{ value: 'screenshot', label: '原片截图' },
{ value: 'link', label: '原片跳转' },
]
export interface Settings {
backendUrl: string
providerId: string
modelName: string
quality: Quality
// Set of toggled output formats (screenshot / link stay in sync with the two booleans below)
formats: NoteFormat[]
screenshot: boolean
link: boolean
style: string
style: NoteStyle
extras: string
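// Multimodal video understanding: extract frames, tile them into grid images, and feed them to a vision model to improve answers about on-screen content
// Requires the selected model to be a vision model (e.g. the gpt-4o / gemini / claude-opus families); text-only models ignore the images
video_understanding: boolean
// Frame-extraction interval (seconds), range 1-30, default 6
video_interval: number
// Grid layout [rows, cols]; each grid image holds at most rows*cols frames. Default [2,2]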
grid_size: [number, number]
}
export interface ProviderUpdatePayload {

View File

@@ -3,9 +3,16 @@ import { onMounted, ref } from 'vue'
import { getProviders, ping } from '~/logic/api'
import { settings, settingsReady } from '~/logic/storage'
import { getModelsByProvider } from '~/logic/api'
import type { Model, Provider } from '~/logic/types'
import { NOTE_FORMATS, NOTE_STYLES, type Model, type NoteFormat, type Provider } from '~/logic/types'
import { watch } from 'vue'
function toggleFormat(value: NoteFormat, checked: boolean) {
const cur = settings.value.formats || []
settings.value.formats = checked
? Array.from(new Set([...cur, value]))
: cur.filter(v => v !== value)
}
const providers = ref<Provider[]>([])
const models = ref<Model[]>([])
const status = ref<{ kind: 'idle' | 'ok' | 'err', text: string }>({ kind: 'idle', text: '' })
@@ -128,13 +135,67 @@ onMounted(async () => {
</label>
<label class="flex flex-col gap-1">
<span class="text-gray-600">笔记风格</span>
<input v-model="settings.style" class="input" placeholder="留空使用默认">
<select v-model="settings.style" class="input">
<option v-for="s in NOTE_STYLES" :key="s.value" :value="s.value">{{ s.label }}</option>
</select>
</label>
<label class="flex items-center gap-2">
<input v-model="settings.screenshot" type="checkbox"> 自动插入截图
</div>
<div class="flex flex-col gap-1 text-sm">
<span class="text-gray-600">输出形式 web NoteForm 对齐</span>
<div class="flex flex-wrap gap-x-4 gap-y-2">
<label v-for="f in NOTE_FORMATS" :key="f.value" class="flex items-center gap-2">
<input
type="checkbox"
:checked="(settings.formats || []).includes(f.value)"
@change="toggleFormat(f.value, ($event.target as HTMLInputElement).checked)"
>
{{ f.label }}
</label>
</div>
</div>
<label class="flex flex-col gap-1 text-sm">
<span class="text-gray-600">额外提示词追加到 prompt 末尾</span>
<textarea
v-model="settings.extras"
class="input resize-y"
rows="3"
placeholder="例如:重点关注游戏开发部分;保留所有专业术语原文"
/>
</label>
</section>
<section class="section-card">
<h2 class="font-semibold">视频理解(多模态)</h2>
<p class="text-xs text-gray-500">
启用后会按抽帧间隔截取视频帧,拼成网格图,连同字幕一起喂给视觉模型,提升画面相关问题的回答质量。
<strong class="text-amber-700">需要选择视觉模型</strong>(GPT-4o / Gemini / Claude),文字模型会忽略图片。
</p>
<label class="flex items-center gap-2 text-sm">
<input v-model="settings.video_understanding" type="checkbox">
启用视频理解
</label>
<div v-if="settings.video_understanding" class="grid grid-cols-3 gap-3 text-sm">
<label class="flex flex-col gap-1">
<span class="text-gray-600">抽帧间隔(秒, 1-30)</span>
<input v-model.number="settings.video_interval" type="number" min="1" max="30" class="input">
</label>
<label class="flex items-center gap-2">
<input v-model="settings.link" type="checkbox"> 插入原片跳转链接
<label class="flex flex-col gap-1">
<span class="text-gray-600">拼图行 (1-10)</span>
<input
:value="settings.grid_size?.[0] ?? 2"
type="number" min="1" max="10" class="input"
@input="settings.grid_size = [Number(($event.target as HTMLInputElement).value) || 2, settings.grid_size?.[1] ?? 2]"
>
</label>
<label class="flex flex-col gap-1">
<span class="text-gray-600">拼图列 (1-10)</span>
<input
:value="settings.grid_size?.[1] ?? 2"
type="number" min="1" max="10" class="input"
@input="settings.grid_size = [settings.grid_size?.[0] ?? 2, Number(($event.target as HTMLInputElement).value) || 2]"
>
</label>
</div>
</section>

View File

@@ -4,7 +4,7 @@ import { detectPlatform } from '~/logic/platform'
import { settings, settingsReady, tasks, tasksReady, upsertTask } from '~/logic/storage'
import { generateNote, getTaskStatus, resolveImageUrl } from '~/logic/api'
import { fetchBilibiliSubtitle } from '~/logic/bilibili-subtitle'
import type { TaskRecord } from '~/logic/types'
import { NOTE_FORMATS, NOTE_STYLES, type NoteFormat, type TaskRecord } from '~/logic/types'
const tabUrl = ref<string>('')
const tabTitle = ref<string>('')
@@ -67,19 +67,22 @@ async function start() {
try {
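// Bilibili: fetch the subtitles directly in the user's browser (using the local logged-in cookies), skipping the backend's download_subtitles and audio transcription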
const prefetched = platform.value === 'bilibili' ? await fetchBilibiliSubtitle(tabUrl.value) : null
const formats = settings.value.formats || []
const { task_id } = await generateNote({
video_url: tabUrl.value,
platform: platform.value!,
quality: settings.value.quality,
provider_id: settings.value.providerId,
model_name: settings.value.modelName,
screenshot: settings.value.screenshot,
link: settings.value.link,
// backend VideoRequest accepts both the format array and the separate screenshot/link booleans; derive all of them from formats to keep a single source of truth
format: [...formats],
screenshot: formats.includes('screenshot'),
link: formats.includes('link'),
style: settings.value.style || undefined,
format: [
...(settings.value.screenshot ? ['screenshot'] : []),
...(settings.value.link ? ['link'] : []),
],
extras: settings.value.extras || undefined,
video_understanding: settings.value.video_understanding || undefined,
video_interval: settings.value.video_understanding ? settings.value.video_interval : undefined,
grid_size: settings.value.video_understanding ? settings.value.grid_size : undefined,
prefetched_transcript: prefetched ?? undefined,
})
activeTaskId.value = task_id
@@ -108,6 +111,13 @@ function openOptions() {
browser.runtime.openOptionsPage()
}
function toggleFormat(value: NoteFormat, checked: boolean) {
const cur = settings.value.formats || []
settings.value.formats = checked
? Array.from(new Set([...cur, value]))
: cur.filter(v => v !== value)
}
async function openSidePanel() {
// Can only be called from the synchronous context of a user gesture, and needs an explicit tabId
try {
@@ -176,7 +186,7 @@ onUnmounted(() => {
</div>
<fieldset class="border rounded p-2 flex flex-col gap-2" :disabled="!supported || submitting">
<div class="grid grid-cols-3 gap-2 text-xs">
<div class="grid grid-cols-2 gap-2 text-xs">
<label class="flex flex-col gap-1">
<span class="text-gray-600">画质</span>
<select v-model="settings.quality" class="border rounded px-1 py-0.5">
@@ -185,14 +195,76 @@ onUnmounted(() => {
<option value="slow">高质</option>
</select>
</label>
<label class="flex items-center gap-1 mt-4">
<input v-model="settings.screenshot" type="checkbox"> 截图
</label>
<label class="flex items-center gap-1 mt-4">
<input v-model="settings.link" type="checkbox"> 跳转
<label class="flex flex-col gap-1">
<span class="text-gray-600">笔记风格</span>
<select v-model="settings.style" class="border rounded px-1 py-0.5">
<option v-for="s in NOTE_STYLES" :key="s.value" :value="s.value">{{ s.label }}</option>
</select>
</label>
</div>
<div class="flex flex-col gap-1 text-xs">
<span class="text-gray-600">输出形式</span>
<div class="flex flex-wrap gap-x-3 gap-y-1">
<label v-for="f in NOTE_FORMATS" :key="f.value" class="flex items-center gap-1">
<input
type="checkbox"
:checked="(settings.formats || []).includes(f.value)"
@change="toggleFormat(f.value, ($event.target as HTMLInputElement).checked)"
>
{{ f.label }}
</label>
</div>
</div>
<details class="text-xs">
<summary class="cursor-pointer text-gray-500">高级</summary>
<label class="flex flex-col gap-1 mt-2">
<span class="text-gray-600">额外提示词追加到 prompt 末尾</span>
<textarea
v-model="settings.extras"
class="border rounded px-1 py-1 resize-y"
rows="2"
placeholder="例如:重点关注游戏开发部分;保留所有专业术语原文"
/>
</label>
<label class="flex items-center gap-2 mt-2">
<input v-model="settings.video_understanding" type="checkbox">
<span class="text-gray-600">启用视频理解抽帧拼图喂视觉模型</span>
</label>
<div v-if="settings.video_understanding" class="grid grid-cols-3 gap-2 mt-2">
<label class="flex flex-col gap-1">
<span class="text-gray-600">抽帧间隔(秒)</span>
<input
v-model.number="settings.video_interval"
type="number" min="1" max="30"
class="border rounded px-1 py-0.5"
>
</label>
<label class="flex flex-col gap-1">
<span class="text-gray-600">拼图行</span>
<input
:value="settings.grid_size?.[0] ?? 2"
type="number" min="1" max="10"
class="border rounded px-1 py-0.5"
@input="settings.grid_size = [Number(($event.target as HTMLInputElement).value) || 2, settings.grid_size?.[1] ?? 2]"
>
</label>
<label class="flex flex-col gap-1">
<span class="text-gray-600">拼图列</span>
<input
:value="settings.grid_size?.[1] ?? 2"
type="number" min="1" max="10"
class="border rounded px-1 py-0.5"
@input="settings.grid_size = [settings.grid_size?.[0] ?? 2, Number(($event.target as HTMLInputElement).value) || 2]"
>
</label>
</div>
<p v-if="settings.video_understanding" class="text-amber-700 mt-1">
需要选择视觉模型(GPT-4o / Gemini / Claude),文字模型会忽略图片
</p>
</details>
<div class="text-xs text-gray-600">
<span v-if="settings.providerId && settings.modelName">
模型{{ settings.modelName }}