refine custom identifier skill scope

jxxghp
2026-04-21 17:31:37 +08:00
parent d1d3fc7f30
commit 8c256d91bd
2 changed files with 62 additions and 12 deletions

@@ -23,7 +23,12 @@ class UpdateCustomIdentifiersInput(BaseModel):
description=(
"The complete list of custom identifier rules to save. "
"This REPLACES the entire existing list. "
-"Always query existing identifiers first, merge new rules, then pass the full list."
+"Always query existing identifiers first, merge new rules, then pass the full list. "
+"These rules are global and affect future recognition for all torrents/files. "
+"When adding a rule for a user-provided sample, prefer narrow regex patterns that include "
+"sample-specific anchors such as the title alias, year, season/episode marker, group tag, "
+"resolution, or other distinctive fragments. Avoid overly broad patterns like bare generic "
+"tags, pure episode numbers, or common release words unless the user explicitly wants a global rule."
),
)
@@ -35,6 +40,10 @@ class UpdateCustomIdentifiersTool(MoviePilotTool):
"This tool REPLACES all existing identifier rules with the provided list. "
"IMPORTANT: Always use 'query_custom_identifiers' first to get existing rules, "
"then merge new rules into the list before calling this tool to avoid accidentally deleting existing rules. "
+"IMPORTANT: New identifier rules are global. When the rule is created from a specific torrent/file name, "
+"make the regex as narrow as possible and include distinctive elements from that sample so unrelated titles "
+"are not affected. Prefer contextual replacements with capture groups/backreferences over bare block words "
+"when a generic word like REPACK, WEB-DL, 1080p, 字幕, or a simple episode marker would otherwise match too broadly. "
"Supported rule formats (spaces around operators are required): "
"1) Block word: just the word/regex to remove; "
"2) Replacement: '被替换词 => 替换词'; "

@@ -1,11 +1,13 @@
---
name: generate-identifiers
-version: 1
+version: 2
description: >-
Use this skill when a user provides a torrent name or file name and wants to fix recognition issues,
or asks to add/manage custom identifiers (自定义识别词).
This skill generates identifier rules based on the WordsMatcher preprocessing logic,
checks for duplicates against existing rules, and saves them via MCP tools.
+Because custom identifiers are global, generated rules must default to conservative,
+sample-specific regex patterns instead of broad matches unless the user explicitly wants global cleanup.
Applicable scenarios include:
1) A torrent or file name is incorrectly recognized (wrong title, season, episode, etc.);
2) The user wants to block unwanted keywords from torrent names;
@@ -34,9 +36,11 @@ There are **four formats**. Operators must have spaces on both sides.
Removes matched text from the title. Supports regex.
```
REPACK
+SomeUniqueAlias
```
+Use a bare block word only when the token itself is specific enough globally, or when the user explicitly wants a global cleanup rule.
### 2. Replacement (被替换词 => 替换词)
Regex substitution. The left side is a regex pattern, the right side is the replacement (supports backreferences).
@@ -84,6 +88,40 @@ Lines starting with `#` are comments and will be skipped during processing.
5. **Chinese number support**: Episode offset handles Chinese numbers (一二三四五六七八九十).
6. **Empty replacement**: Using nothing after `=>` is equivalent to a block word.
+## Global Scope Guardrails
+Custom identifiers are **global**. A new rule affects all future torrent/file recognition, not just the sample provided by the user.
+When generating a new rule, default to **the narrowest regex that still fixes the user's sample**:
+- Extract the sample's unique anchors first: wrong title alias, year, season/episode marker, group tag, source, resolution, release tag, file extension, or other distinctive fragments.
+- The matching side should usually contain **at least two meaningful anchors**, and one of them should normally be the title alias or another highly distinctive identifier from the user-provided sample.
+- Prefer matching the **full wrong alias or a stable unique fragment** from the sample, not a short generic substring.
+- Avoid generic global rules such as bare `1080p`, `WEB-DL`, `中字`, `国配`, `REPACK`, `S01E01`, or pure numbers unless the user explicitly wants a global cleanup rule.
+- If the rule only needs to fix one specific naming pattern, prefer a **contextual replacement** with capture groups/backreferences over a bare block word.
+- For episode offset rules, the `前定位词` and `后定位词` should use sample-specific context so the offset only runs on the intended naming pattern.
+- For direct TMDB/Douban binding, the left side should match the user's specific wrong alias or naming pattern, not a broad season/episode pattern that could hit other media.
+### Narrow vs Broad Examples
+Bad (too broad for a global rule):
+```
+REPACK
+1080p
+S01E01 => {[tmdbid=12345;type=tv;s=1;e=1]}
+```
+Better (scoped to the user's sample pattern):
+```
+(\[SubGroup\].*?My\.Show.*?2024.*?)REPACK => \1
+Some\.Weird\.Name(?:\.2024)?(?:\.S01E\d+)? => {[tmdbid=12345;type=tv;s=1]}
+\[Baha\] <> \[1080P\] >> EP-12
+```
+Before saving, mentally test the rule against:
+- the user's sample: it should match
+- unrelated titles with common release tags: it should usually **not** match
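The mental test above can also be approximated mechanically. The following is a minimal sketch, assuming the matching side of a rule behaves like a Python regular expression searched against the raw name; `check_scope` and the sample titles are hypothetical and not part of MoviePilot.

```python
import re


def check_scope(pattern, sample, unrelated):
    """Report whether a rule's matching side hits the sample and which
    unrelated titles it would also hit (hypothetical helper)."""
    regex = re.compile(pattern)
    return {
        "matches_sample": regex.search(sample) is not None,
        "false_hits": [title for title in unrelated if regex.search(title)],
    }


# Illustrative names only; substitute the user's real sample.
sample = "[SubGroup] My.Show.2024.REPACK.1080p.mkv"
unrelated = [
    "Other.Movie.2023.REPACK.1080p.mkv",
    "Another.Show.S01E01.WEB-DL.mkv",
]

# A bare block word matches the sample but also an unrelated release.
broad = check_scope(r"REPACK", sample, unrelated)
# The scoped pattern matches the sample and nothing else in the list.
narrow = check_scope(r"(\[SubGroup\].*?My\.Show.*?2024.*?)REPACK", sample, unrelated)

print(broad)
print(narrow)
```

A non-empty `false_hits` list is a hint that the rule needs more sample-specific anchors before it is saved.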
## Workflow
### Step 1: Analyze the Problem
@@ -92,6 +130,7 @@ Parse the torrent/file name provided by the user. Identify:
- What is being incorrectly recognized (title, season, episode, year, quality, etc.)
- What the correct recognition result should be
- Which identifier format(s) will solve the problem
+- Which fragments in the provided sample are unique enough to use as regex anchors, so the rule does not accidentally affect unrelated titles
### Step 2: Generate the Identifier Rule(s)
@@ -99,6 +138,7 @@ Write the rule using the appropriate format. Ensure:
- Regex special characters are properly escaped
- Add a comment line (starting with `#`) above the rule to describe what it does
- Test the regex mentally against the provided name to verify correctness
+- Because the rule is global, prefer the most specific viable match; if a bare block word would be too broad, rewrite it as a contextual replacement that includes sample-specific anchors
### Step 3: Query Existing Identifiers
@@ -159,30 +199,30 @@ Tell the user:
**User**: "Torrent name `My.Show.2024.REPACK.1080p.mkv`: REPACK causes a recognition error"
-**Solution**: Block word:
+**Solution**: Contextual replacement, scoped to this title pattern:
```
-# Block the REPACK tag
-REPACK
+# Remove REPACK only in My.Show.2024 names
+(My\.Show\.2024\.)REPACK(\.1080p) => \1\2
```
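As a rough check of the scoped replacement, the sketch below applies it with Python's `re.sub`. It assumes the left side is used as a regex and the right side as a replacement template in the way `re.sub` treats them, which this diff does not confirm; `apply_replacement` is a hypothetical helper, not the `WordsMatcher` implementation.

```python
import re


def apply_replacement(rule, title):
    """Apply one 'pattern => replacement' rule to a title (hypothetical helper)."""
    pattern, replacement = rule.split(" => ", 1)
    return re.sub(pattern, replacement, title)


rule = r"(My\.Show\.2024\.)REPACK(\.1080p) => \1\2"

# The sample is rewritten; both captured dots are kept.
fixed = apply_replacement(rule, "My.Show.2024.REPACK.1080p.mkv")
# An unrelated title that also contains REPACK is left untouched.
untouched = apply_replacement(rule, "Other.Movie.2023.REPACK.1080p.mkv")

print(fixed)      # My.Show.2024..1080p.mkv
print(untouched)  # Other.Movie.2023.REPACK.1080p.mkv
```

Under these assumptions the result keeps the separators on both sides of the removed tag, so the name contains `..`. If a single separator is preferred, one of the dots can be moved out of its capture group.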
### Non-Standard Naming
**User**: "File name `[OldName] EP01.mkv` should be recognized as NewName"
-**Solution**: Replacement:
+**Solution**: Replacement scoped to the wrong alias:
```
-# Replace OldName with NewName
-OldName => NewName
+# Replace the specific wrong alias OldName with NewName
+\[OldName\] => [NewName]
```
### Force TMDB ID Recognition
**User**: "Torrent name `Some.Weird.Name.S01E01.1080p.mkv` is not recognized; the TMDB ID is 12345 and it is a TV series"
-**Solution**: Direct ID specification:
+**Solution**: Direct ID specification with a sample-specific alias pattern:
```
-# Force Some.Weird.Name to be recognized as TMDB ID 12345
-Some\.Weird\.Name => {[tmdbid=12345;type=tv;s=1]}
+# Force-bind TMDB ID 12345 only for the Some.Weird.Name naming pattern
+Some\.Weird\.Name(?:\.S01E\d+)?(?:\.1080p)? => {[tmdbid=12345;type=tv;s=1]}
```
### Combined Fix
@@ -224,4 +264,5 @@ The `WordsMatcher.prepare()` method (in `app/core/meta/words.py`) processes each
- Always query existing rules first before updating
- Never remove existing rules unless the user explicitly asks
- Add comment lines before new rules for maintainability
+- Remember that new rules are global. If a rule looks broad, rewrite it to include more sample-specific anchors before saving.
- When uncertain about the correct approach, present multiple options and let the user choose
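The query-then-merge requirement in the notes above can be sketched as a pure list operation. This is an illustration only: `merge_rules` is a hypothetical helper, and the actual calls to `query_custom_identifiers` and the update tool are omitted because their invocation interface is not shown in this diff.

```python
def merge_rules(existing, new_rules):
    """Return the existing rules followed by any new rules not already present.

    Comment lines (starting with '#') are kept only when the rule they
    describe is actually added, so a duplicate rule leaves no orphan comment.
    """
    merged = list(existing)
    seen = {line.strip() for line in existing if not line.strip().startswith("#")}
    pending_comments = []
    for line in new_rules:
        text = line.strip()
        if not text:
            continue
        if text.startswith("#"):
            pending_comments.append(line)
            continue
        if text not in seen:
            merged.extend(pending_comments)
            merged.append(line)
            seen.add(text)
        pending_comments = []
    return merged


existing = ["# legacy alias fix", r"\[OldName\] => [NewName]"]
new_rules = [
    "# duplicate of an existing rule, skipped",
    r"\[OldName\] => [NewName]",
    "# Remove REPACK only in My.Show.2024 names",
    r"(My\.Show\.2024\.)REPACK(\.1080p) => \1\2",
]

full_list = merge_rules(existing, new_rules)
print("\n".join(full_list))
```

Because the update tool replaces the whole list, passing `full_list` rather than `new_rules` alone is what keeps the existing rules from being deleted.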