feat(BackupX): harden agent cluster backup workflow

Squash merge PR #61
This commit is contained in:
Wu Qing
2026-05-13 14:24:45 +08:00
committed by GitHub
parent 7a6ffd4ddd
commit 7084d47c4b
30 changed files with 1360 additions and 155 deletions

View File

@@ -8,6 +8,8 @@ description: File, MySQL, PostgreSQL, SQLite and SAP HANA — what they back up
BackupX supports five built-in backup types. Type determines which runner executes the job.
When a task is routed to a remote Agent, the source tools and paths are resolved on that Agent host. Multi-target uploads are still tracked per storage target; if at least one target succeeds, the backup record is marked successful and the per-target result table shows partial failures.
## File / Directory
Tars (and optionally gzips) one or more filesystem paths.

View File

@@ -62,6 +62,8 @@ The script runs automatically and:
5. Runs `systemctl enable --now backupx-agent`
6. Polls `/api/v1/agent/self` until the master confirms `status: online` (up to 30 s)
Docker mode uses the same `BACKUPX_AGENT_MASTER`, `BACKUPX_AGENT_TOKEN`, and `BACKUPX_AGENT_TEMP_DIR=/var/lib/backupx-agent/tmp` environment contract. After starting the container, the installer also probes `/api/v1/agent/self`; if the node does not come online, it prints `docker ps` and `docker logs --tail=100 backupx-agent` diagnostics before exiting non-zero.
If you choose the URL-based fallback command and `curl` prints HTML or the shell reports `Syntax error: newline unexpected`, the install URL is being served by the web console instead of the backend. Ensure either `/api/install/` or `/install/` is forwarded to the BackupX backend, or use the embedded command generated by the console.
Reruns are idempotent — to upgrade or re-provision, simply generate a new install command and run it again. The one-time install link expires after its TTL or after first consumption, whichever is sooner.
@@ -81,9 +83,15 @@ In the **Backup Tasks** page, pick the target node when creating the task. When
- Local (`nodeId=0`) → Master executes in-process
- Remote node → Master enqueues the command → Agent claims → Agent runs locally → uploads → reports back
The node table shows the Agent health and command queue state: pending/dispatched depth, running long commands, timeouts, oldest active command age, and the latest Agent-side error. The same queue depth, running-command, and timeout snapshots are exported as Prometheus metrics:
- `backupx_agent_command_queue_depth`
- `backupx_agent_command_running`
- `backupx_agent_command_timeout_total`
## Known limitations
- **Encrypted backups don't work via Agent** — the Agent doesn't hold Master's AES-256 key. Tasks with `encrypt: true` will fail if routed to an Agent
- **Encrypted backups are Master-only** — the Agent doesn't hold Master's AES-256 key. Creating or updating a task with `encrypt: true` and a remote node or node pool is rejected up front
- **Directory browser timeout** — remote dir listing is a synchronous RPC through the queue (15s default)
- **Dispatched command timeout** — claimed-but-unfinished commands are marked `timeout` after 10 minutes

View File

@@ -42,6 +42,8 @@ Go to **Backup Tasks → New**. Three steps:
2. **Source** — paths for file backup (multi-source supported), or connection info for databases
3. **Storage & policy** — pick target(s), compression, retention days, encryption on/off
For Agent-routed tasks, encryption must stay off because the Agent never receives the Master's encryption key. BackupX rejects remote-node or node-pool tasks with encryption enabled during create/update.
Save, then click **Run Now** to trigger a test. Live logs stream on the **Backup Records** page.
:::note

View File

@@ -8,6 +8,8 @@ description: 文件、MySQL、PostgreSQL、SQLite 和 SAP HANA — 各自的能
BackupX 支持五种内置备份类型,类型决定了用哪个 runner 执行。
当任务路由到远程 Agent 时,源路径和外部工具都会在该 Agent 主机上解析。多存储目标上传仍会逐目标记录结果;只要至少一个目标上传成功,备份记录即为成功,详情中的目标结果表会展示部分失败。
## 文件 / 目录
打包(可选 gzip一个或多个文件系统路径。

View File

@@ -62,6 +62,8 @@ Web 控制台 → **节点管理** → **添加节点**,打开三步向导:
5. 执行 `systemctl enable --now backupx-agent`
6. 轮询 `/api/v1/agent/self`,直到 Master 确认 `status: online`(最多 30 秒)
Docker 模式使用同一组环境变量约定:`BACKUPX_AGENT_MASTER`、`BACKUPX_AGENT_TOKEN` 和 `BACKUPX_AGENT_TEMP_DIR=/var/lib/backupx-agent/tmp`。容器启动后,安装脚本同样会探测 `/api/v1/agent/self`;如果节点没有上线,会输出 `docker ps` 与 `docker logs --tail=100 backupx-agent` 排查命令,并以非零状态退出。
如果使用 URL 备用命令时 `curl` 输出 HTML或 shell 报 `Syntax error: newline unexpected`,说明安装 URL 被 Web 控制台接管而不是转发到后端。需要确保 `/api/install/` 或 `/install/` 至少一个路径能转发到 BackupX 后端,或改用控制台生成的嵌入式命令。
脚本是幂等的:升级或重装只需重新生成一条安装命令再跑一次。一次性安装链接在 TTL 到期或被首次消费后立即作废。
@@ -81,9 +83,15 @@ Web 控制台 → **节点管理** → **添加节点**,打开三步向导:
- 本机 / 未指定(`nodeId=0`Master 进程内直接执行
- 远程节点Master 写入命令队列 → Agent 拉取 → Agent 本地执行 → 上传 → 回报
节点列表会展示 Agent 健康与命令队列状态pending/dispatched 深度、运行中的长任务、超时数、最旧活跃命令年龄和最近 Agent 错误。同样的队列深度、运行中命令数和超时快照会导出为 Prometheus 指标:
- `backupx_agent_command_queue_depth`
- `backupx_agent_command_running`
- `backupx_agent_command_timeout_total`
## 已知限制
- **Agent 不支持加密备份**Agent 不持有 Master 的 AES-256 密钥。`encrypt: true` 的任务路由到 Agent 时会直接上报失败
- **加密备份仅支持 Master 本机执行**Agent 不持有 Master 的 AES-256 密钥。创建或更新任务时,如果 `encrypt: true` 且选择了远程节点或节点池,会在入口直接拒绝
- **目录浏览超时**:远程目录浏览通过命令队列做同步 RPC默认 15s 超时
- **派发命令超时**Agent 领取但未完成的命令超过 10 分钟会被置 `timeout`

View File

@@ -42,6 +42,8 @@ description: 部署 BackupX、添加存储目标、创建第一个备份任务
2. **源配置** — 文件备份选择源路径(支持多个),数据库备份填写连接信息
3. **存储与策略** — 选择存储目标(支持多个)、压缩策略、保留天数、是否加密
对于路由到 Agent 的任务,加密必须关闭,因为 Agent 不会拿到 Master 的加密密钥。BackupX 会在创建/更新阶段拒绝开启加密的远程节点或节点池任务。
保存后可点击 **立即执行** 测试,**备份记录** 页面实时查看执行日志。
:::note