Commit Graph

13 Commits

Author SHA1 Message Date
Wu Qing
eff48342c8 功能: v2.2 节点池调度 + Grafana Dashboard + 版本漂移 UI (#49)
节点池动态调度(企业集群核心需求):
- model.Node 新增 Labels CSV;Node.HasLabel / LabelSet 辅助方法
- model.BackupTask 新增 NodePoolTag;与 NodeID 互斥(校验层拒绝同时设置)
- BackupExecutionService.selectPoolNode:匹配标签的在线节点中选"运行中任务最少"
  并列按 ID 升序稳定;空池返回 NODE_POOL_EMPTY 让用户立即感知
- 选中节点仅写 BackupRecord,不回写 task.NodeID —— 每次执行重选实现真轮转均衡

Grafana Dashboard(v2.1 指标的可视化闭环):
- deploy/grafana/backupx-dashboard.json:11 个面板覆盖概览/时序/容量/集群
- deploy/grafana/README.md:Prometheus 抓取配置 + 告警建议
- release workflow 打包 grafana/ + nginx.conf 到 tar.gz

前端:
- 节点列表:Agent 版本 vs Master 不一致时橙红 Tag + Tooltip 提示升级
- 节点列表新增"标签/节点池"列,支持 CSV 编辑 + 并发/带宽一起改
- 任务表单新增 NodePoolTag 输入框,与节点选择器互斥禁用

测试:
- model/node_label_test.go:HasLabel / LabelSet / nil 安全
- service/node_pool_scheduler_test.go:负载最低优先 / 空池错误 / nil repo 降级
- go test ./... + npm run build 全绿
2026-04-21 14:05:48 +08:00
Wu Qing
5021fe665e 功能: v2.1 可观测性与流控 (#47)
* 功能: v2.1 可观测性与流控 — Prometheus + 节点带宽 + 审计 Webhook

核心能力:
- Prometheus /metrics 端点:11 类指标(任务/存储/节点/SLA/验证/恢复/复制)
- 节点级带宽限速生效:model.Node.BandwidthLimit 覆盖全局默认
- 审计日志 Webhook 外输:HMAC-SHA256 签名,配合 SIEM 合规留档

实现:
- server/internal/metrics/  独立 Registry + 异步 Gauge Collector(30s)
- backup/restore/verify/replication 服务注入 metrics 钩子,nil 安全
- resolveProviderForNode() 按 task.NodeID 解析 BandwidthLimit
- AuditService.SetWebhook + 动态 settings 推送,无需重启

测试:
- metrics/registry_test.go: 注册/采集/nil safety/HTTP handler
- service/audit_service_webhook_test.go: 签名正确性/异步投递/禁用路径
- go test ./... 全部通过

* chore: 触发 CodeQL 扫描
2026-04-20 23:26:04 +08:00
Wu Qing
f7596bd319 功能: v2.0.0 企业级备份管理平台 — 11 项核心能力 (#45)
* 功能: v2.0.0 企业级备份管理平台 — 11 项核心能力

围绕"可靠、可验证、可度量、可冗余、可治理、可规模化、可运维、可部署、可感知"的
九大企业级支柱,新增 70+ 文件、14k+ 行代码,全链路测试与类型检查通过。

## 集群能力

- 节点选择器:任务表单支持绑定远程节点,集群场景不再被迫 NodeID=0
- 集群感知恢复:RestoreRecord 独立表 + 节点路由(本机/远程 Agent)+ SSE 日志
- 集群可靠性:命令超时联动备份/恢复记录、离线节点拒绝执行、调度器跳过离线节点、
  数据库发现路由到 Agent、跨节点 local_disk 保护
- 节点级资源配额:Node.MaxConcurrent / BandwidthLimit + per-node semaphore
- Agent 版本感知:ClusterVersionMonitor 定期扫描 + agent_outdated 事件
- Dashboard 集群概览 + 节点性能统计(成功率/字节/平均耗时)

## 企业功能

- 备份验证演练:定时自动校验备份可恢复性(tar/sqlite/mysql/postgres/saphana 5 类格式)
- SLA 监控:RPO 违约后台扫描 + sla_violation 事件 + Dashboard 合规视图
- 3-2-1 备份复制:自动/手动副本镜像 + 跨节点保护
- 存储目标健康监控 + 容量预警(85%)+ 硬配额(超配额拒绝)
- RBAC 三级角色(admin/operator/viewer)+ 前后端权限控制
- API Key 管理(bax_ 前缀 SHA-256 哈希存储 + 过期/启停)
- 事件总线:10+ 事件类型(backup/restore/verify/sla/storage/replication/agent)
- 审计日志高级筛选 + CSV 导出

## 规模化运维

- 任务模板(批量创建 + 变量覆盖)
- 任务批量操作(批量执行/启停/删除)
- 任务依赖链 + DAG 可视化(上游成功触发下游)
- 维护窗口(时段禁止调度)
- 任务标签 + 筛选 + 存储类型/节点/存储维度统计
- 任务配置 JSON 导入/导出(集群迁移 & 灾备)

## 体验 & 可达性

- 实时事件流(SSE)+ 右下角 Toast + 历史抽屉(未读徽章)
- Dashboard 免刷新自动更新(订阅 8 类事件)
- 全局搜索(Ctrl+K,跨任务/记录/存储/节点)
- 任务依赖图(ECharts force 布局 + 状态着色)

## 合规 & 可部署

- K8s/Swarm 健康检查端点(/health liveness + /ready readiness)
- 审计日志 CSV 导出(UTF-8 BOM,Excel 兼容)
- Dashboard 多维统计(按类型/状态/节点/存储)

## 破坏性变更

- POST /backup/records/:id/restore 返回格式变更为 {restoreRecordId, ...}
  (原为同步阻塞,现改为异步返回恢复记录 ID,前端跳转到恢复详情页)
- 恢复日志通过 /restore/records/:id/logs/stream 订阅
- AuthMiddleware 签名变更(新增 apiKeyAuth 参数)

* 修复: CodeQL 安全扫描告警

- 所有 strconv.ParseUint 由 64bit 改为 32bit 位宽,strconv 内置溢出检查
- hashApiKey 参数改名 rawToken 避免 CodeQL 误判为密码哈希(API Key 是 192 位
  高熵 token,使用 bcrypt 会引入不必要的延迟;同时补充安全说明)

* 修复: API Key 哈希改用 HMAC-SHA256 + 应用级 pepper

- 符合 RFC 2104 标准,业界 API token 存储的推荐方案
- 数据库泄漏场景下增加离线反推难度(需同时获取二进制 pepper)
- 规避 CodeQL go/weak-sensitive-data-hashing 对裸 SHA-256 的误判
2026-04-20 13:04:13 +08:00
Wu Qing
757b0fa5ed 功能: 修复并实现多节点集群部署 (#38)
基础修复:
- 新增节点离线检测:每 15s 扫描,超 45s 未心跳的远程节点自动置离线
- 节点删除前检查关联任务,避免孤立备份任务
- BackupTaskRepository 新增 CountByNodeID/ListByNodeID

Master 端 Agent 协议:
- 新增 AgentCommand 模型与命令队列仓储(pending/dispatched/succeeded/failed/timeout)
- 新增 AgentService:任务下发、命令轮询、结果回收、超时扫描
- 新增专用 Agent HTTP API(X-Agent-Token 认证):
  /api/agent/heartbeat
  /api/agent/commands/poll
  /api/agent/commands/:id/result
  /api/agent/tasks/:id
  /api/agent/records/:id
- BackupExecutionService 支持 node 路由:task.NodeID 指向远程节点时自动入队派发

Agent CLI(backupx agent 子命令):
- 配置:YAML 文件 / 环境变量 / CLI 参数,优先级 CLI > 文件 > 环境
- 心跳循环 + 命令轮询循环 + 优雅退出
- 本地复用 BackupRunner 与 storage registry 执行备份并直接上传
- 支持 run_task 和 list_dir 两种命令

远程目录浏览:
- NodeService 支持通过 Agent RPC 列出远程节点目录(15s 超时)

前端:
- NodesPage 添加节点后展示 Agent 启动命令和环境变量配置

文档:
- README 中英文重写"多节点集群"章节,含架构图、步骤、限制、CLI 参考
2026-04-17 12:29:08 +08:00
Wu Qing
e04774ff68 功能: 新增 SAP HANA 完整备份支持与 Backint 协议代理 (#37)
* chore: ignore web/dist directory in git repository

* 功能: 新增 SAP HANA 完整备份支持与 Backint 协议代理

- 修复 service 层校验 bug,使 SAP HANA 类型可正常创建
- 增强 hdbsql Runner:支持完整/增量/差异/日志备份、并行通道、失败重试
- 新增 Backint 协议代理(backupx backint 子命令),HANA 原生接口直连 BackupX 存储后端
- 新增本地 SQLite 目录维护 EBID↔对象键映射
- 前端新增 SAP HANA 扩展字段表单(备份类型/级别/通道数/重试次数/实例编号)
- README 中英文补充 SAP HANA 两种模式的使用说明
2026-04-16 23:43:46 +08:00
Awuqing
70dff41b70 修复: 上传操作级重试,解决 Google Drive 等远端临时故障导致自动备份连续失败
问题:rclone 底层重试只覆盖单个 HTTP 请求,但 Google API 的 502/timeout
等临时故障会导致整个上传操作失败,自动触发的备份任务连续失败。

修复:在 provider.Upload 外层增加操作级重试(最多 3 次,指数退避 10s/40s/90s),
每次重试重新打开文件并重建 reader 链。重试过程通过日志流实时反馈。
2026-04-01 18:35:26 +08:00
Awuqing
1003302bdd 功能: 集成 rclone 高级传输特性 + 全 70+ 后端支持
1. 失败自动重试:rclone Pacer 指数退避,默认 10 次底层 HTTP 重试
2. 带宽限制:配置 bandwidth_limit + Settings 运行时可调
3. 上传实时进度:progressReader + LogHub SSE 推送字节级进度/速率
4. 存储空间查询:StorageAbout 可选接口,GetUsage 返回远端真实空间
5. 全 rclone 后端:backend/all 引入 70+ 后端,新增 rclone 存储类型,
   API 驱动的可搜索后端选择器 + 动态配置表单
2026-03-31 23:37:59 +08:00
Awuqing
f388b98943 refactor: single-pass hashing during upload via TeeReader
Previous approach read the file twice (once for SHA-256, once for upload),
doubling disk I/O. Under concurrent multi-target uploads this becomes a
bottleneck.

New design — hashingReader wraps io.TeeReader + sha256.Hash:
  file.Read() → TeeReader → sha256.Write() (hash) + provider (upload)
Single read pass yields both byte count and SHA-256 simultaneously.

Each upload goroutine independently opens the file and computes its own
hash. The first successful target writes checksum to the record via
sync.Once. Zero extra disk I/O, zero extra memory copies, fully
concurrent-safe.
2026-03-31 13:08:10 +08:00
Awuqing
7631cca01d refactor: use CountingReader for upload integrity instead of List API
List()-based size check depends on the storage backend returning accurate
file sizes, which is not guaranteed (some WebDAV/Google Drive impls may
return 0 or omit the size field).

New approach: wrap the upload io.Reader with a CountingReader that counts
bytes as they flow through during upload. After upload completes, compare
counter.n against the expected fileSize. This is:
- Zero extra network calls (no List, no Download)
- Zero extra CPU/memory overhead (just an int64 increment per Read)
- Storage-backend agnostic (works with any provider)

If bytes transmitted != expected size → mark failed + auto-delete remote.
2026-03-31 12:40:12 +08:00
Awuqing
1d5923f747 refactor: replace download-based hash verification with lightweight size check
The previous approach downloaded the entire backup file after upload to
compute a remote SHA-256, which doubles bandwidth cost for every backup.

New approach:
- Local SHA-256 is still computed before upload (stored in record for audit)
- After upload, use provider.List() to check remote file size (single API call)
- If remote size is 0 or mismatches local size → mark failed + auto-delete
- If List() fails, log a warning but don't block (file may have uploaded fine)

This catches 0KB corrupted uploads with zero download overhead.
2026-03-31 12:36:29 +08:00
Awuqing
2537149b39 feat: add SHA-256 checksum verification for backup integrity
Addresses community feedback about 0KB corrupted backup files going
undetected after upload.

Implementation:
- Compute SHA-256 hash of final artifact (after compress/encrypt) before upload
- After each storage target upload, download the file back and verify
  the hash matches the local checksum
- If verification fails: mark that target as failed, auto-delete the
  corrupted remote file, and log detailed mismatch info
- Store checksum in BackupRecord model (new `checksum` column)
- Display truncated SHA-256 with copy button in backup records UI

Verification flow per storage target:
  local SHA-256 → upload → download → remote SHA-256 → compare
  - match: mark success
  - mismatch: mark failed + delete corrupted remote file
2026-03-31 07:46:12 +08:00
Awuqing
5a25690f3f feat: add community enhancements — password reset, audit logs, multi-source backup
Three community-requested features:

1. CLI password reset: `backupx reset-password --username admin --password xxx`
   Docker users can run via `docker exec`. No full app init needed.

2. Audit logging: async fire-and-forget audit trail for all key operations
   (login, CRUD on tasks/targets/records, settings changes).
   New UI page at /audit with category filter and pagination.

3. Multi-source path backup: file backup tasks now support multiple source
   directories packed into a single tar archive. Backward compatible with
   existing single sourcePath field.
2026-03-30 23:04:37 +08:00
Awuqing
eadd3f8961 first commit 2026-03-17 13:29:09 +08:00