feat(BackupX): harden agent cluster backup workflow

Squash merge PR #61
Wu Qing
2026-05-13 14:24:45 +08:00
committed by GitHub
parent 7a6ffd4ddd
commit 7084d47c4b
30 changed files with 1360 additions and 155 deletions


@@ -62,6 +62,8 @@ The script runs automatically and:
5. Runs `systemctl enable --now backupx-agent`
6. Polls `/api/v1/agent/self` until the master confirms `status: online` (up to 30 s)
Docker mode uses the same `BACKUPX_AGENT_MASTER`, `BACKUPX_AGENT_TOKEN`, and `BACKUPX_AGENT_TEMP_DIR=/var/lib/backupx-agent/tmp` environment contract. After starting the container, the installer also probes `/api/v1/agent/self`; if the node does not come online, it prints `docker ps` and `docker logs --tail=100 backupx-agent` diagnostics before exiting non-zero.
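The readiness probe described above can be sketched as a bounded polling loop. This is a minimal illustration, not the installer's actual code: the bearer-token header, the `{"status": ...}` response shape, the 2-second interval, and the helper names `wait_until_online` and `http_status_fetcher` are all assumptions.

```python
import json
import time
import urllib.request


def wait_until_online(fetch_status, timeout_s=30.0, interval_s=2.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll fetch_status() until it returns "online" or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        try:
            if fetch_status() == "online":
                return True
        except (OSError, ValueError):
            pass  # master unreachable or non-JSON reply; retry until the deadline
        if clock() >= deadline:
            return False
        sleep(interval_s)


def http_status_fetcher(master_url, token):
    """Build a fetcher for GET {master_url}/api/v1/agent/self.

    Assumption: the token is sent as a bearer header and the reply is a
    JSON object with a "status" field.
    """
    def fetch():
        req = urllib.request.Request(
            master_url.rstrip("/") + "/api/v1/agent/self",
            headers={"Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(req, timeout=5) as resp:
            return json.load(resp).get("status")
    return fetch
```

A caller that gets `False` back would then print its diagnostics (for Docker mode, the `docker ps` and `docker logs` output mentioned above) and exit non-zero.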
If you choose the URL-based fallback command and `curl` prints HTML or the shell reports `Syntax error: newline unexpected`, the install URL is being served by the web console instead of the backend. Ensure either `/api/install/` or `/install/` is forwarded to the BackupX backend, or use the embedded command generated by the console.
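One way to catch the misrouted-URL case before piping the download into a shell is to check whether the body looks like an HTML page. The helper below is a hypothetical sketch, and the heuristic (a leading doctype or `<html` tag) is an assumption about what the web console returns.

```python
def looks_like_html(body: str) -> bool:
    """Return True if the downloaded body starts like an HTML document."""
    head = body.lstrip().lower()
    return head.startswith(("<!doctype html", "<html"))
```

If this returns `True` for the install URL, the request reached the web console rather than the backend, which matches the symptom described above.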
Reruns are idempotent: to upgrade or re-provision, generate a new install command and run it again. The one-time install link expires after its TTL or after first consumption, whichever comes first.
@@ -81,9 +83,15 @@ In the **Backup Tasks** page, pick the target node when creating the task. When
- Local (`nodeId=0`) → Master executes in-process
- Remote node → Master enqueues the command → Agent claims → Agent runs locally → uploads → reports back
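The two routing paths above can be modeled in a few lines. This is an illustrative in-memory sketch under stated assumptions: the `Master` class, its per-node `deque` queues, and the `dispatch`/`claim` method names are hypothetical and do not reflect BackupX's real types or storage.

```python
from collections import deque
from dataclasses import dataclass, field

LOCAL_NODE_ID = 0  # nodeId=0 means "run on the Master itself"


@dataclass
class Master:
    queues: dict = field(default_factory=dict)      # node_id -> deque of pending commands
    local_runs: list = field(default_factory=list)  # stands in for in-process execution

    def dispatch(self, node_id, task):
        """Run locally for node 0, otherwise enqueue for the target Agent."""
        if node_id == LOCAL_NODE_ID:
            self.local_runs.append(task)
            return "ran-local"
        self.queues.setdefault(node_id, deque()).append(task)
        return "enqueued"

    def claim(self, node_id):
        """Called by an Agent: take the oldest pending command, if any."""
        queue = self.queues.get(node_id)
        return queue.popleft() if queue else None
```

In this model the Agent pulls work by calling `claim`, so the Master never needs to open a connection to the node.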
The node table shows the Agent health and command queue state: pending/dispatched depth, running long commands, timeouts, oldest active command age, and the latest Agent-side error. The same queue depth, running-command, and timeout snapshots are exported as Prometheus metrics:
- `backupx_agent_command_queue_depth`
- `backupx_agent_command_running`
- `backupx_agent_command_timeout_total`
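To show how these metrics might be consumed, the sketch below sums each `backupx_agent_command_*` series from a Prometheus text-format scrape. The `node` label and the sample values are invented for illustration, and the parser is deliberately simple: it does not handle label values that contain `}`.

```python
import re
from collections import defaultdict

# metric name, optional {labels}, then the sample value
_SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)')

# Hypothetical scrape output; label names and values are assumptions.
SAMPLE = """\
# TYPE backupx_agent_command_queue_depth gauge
backupx_agent_command_queue_depth{node="a"} 3
backupx_agent_command_queue_depth{node="b"} 1
backupx_agent_command_running{node="a"} 1
backupx_agent_command_timeout_total{node="a"} 2
go_goroutines 12
"""


def sum_by_metric(text, prefix="backupx_agent_command_"):
    """Sum sample values per metric name, keeping only names with the prefix."""
    totals = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        match = _SAMPLE_LINE.match(line)
        if match and match.group(1).startswith(prefix):
            totals[match.group(1)] += float(match.group(3))
    return dict(totals)
```

Calling `sum_by_metric(SAMPLE)` gives one cluster-wide total per metric, which is enough for a quick health check without a full Prometheus server.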
## Known limitations
- **Encrypted backups don't work via Agent** — the Agent doesn't hold Master's AES-256 key. Tasks with `encrypt: true` will fail if routed to an Agent
- **Encrypted backups are Master-only** — the Agent doesn't hold Master's AES-256 key. Creating or updating a task with `encrypt: true` and a remote node or node pool is rejected up front
- **Directory browser timeout** — remote dir listing is a synchronous RPC through the queue (15s default)
- **Dispatched command timeout** — claimed-but-unfinished commands are marked `timeout` after 10 minutes
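Two of the rules above, the up-front rejection of encrypted remote tasks and the 10-minute timeout on dispatched commands, can be sketched as follows. The function names, the `id`/`status`/`claimed_at` fields, and the choice to measure the 10 minutes from claim time are assumptions made for illustration, not the project's actual schema.

```python
from datetime import datetime, timedelta, timezone

LOCAL_NODE_ID = 0
DISPATCH_TIMEOUT = timedelta(minutes=10)


def validate_task(encrypt, node_id=LOCAL_NODE_ID, node_pool=None):
    """Reject encrypted tasks that target a remote node or a node pool."""
    if encrypt and (node_id != LOCAL_NODE_ID or node_pool):
        raise ValueError("encrypted backups are Master-only")


def sweep_timeouts(commands, now):
    """Mark claimed-but-unfinished commands older than DISPATCH_TIMEOUT as 'timeout'.

    Returns the ids of the commands that were just marked.
    """
    expired = []
    for cmd in commands:
        if cmd["status"] == "dispatched" and now - cmd["claimed_at"] >= DISPATCH_TIMEOUT:
            cmd["status"] = "timeout"
            expired.append(cmd["id"])
    return expired
```

Validating at create or update time means a misconfigured task fails immediately with a clear message instead of failing later on the Agent.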