fix(gateway): debounce restart with single-flight queue (#248)

Root cause for #243 / #244 / #240: model edits trigger
api.restartGateway() with only 300ms debounce. Fast consecutive
edits stack up restart calls, creating zombie Gateway processes,
failed restarts, and CPU fan spikes.

Layer A (frontend):
- New src/lib/gateway-restart-queue.js: 3s debounce + single-flight
  lock + reschedule when a request arrives mid-restart
- Refactor src/pages/models.js doAutoSave: write config immediately,
  schedule restart via queue with 'Apply now' toast button
- Subscribe to queue state for unified success/failure toast
- Add i18n: models.configQueued, models.applyNow
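
The queue file itself is not shown in this view, so the following is only a minimal sketch of how a debounce + single-flight + reschedule queue could be put together. `createRestartQueue`, its option names, and the state strings are hypothetical, not the actual contents of src/lib/gateway-restart-queue.js.

```javascript
// Hypothetical sketch: NOT the real gateway-restart-queue.js.
// restartFn is any function returning a promise (e.g. a wrapper around api.restartGateway()).
function createRestartQueue(restartFn, { debounceMs = 3000 } = {}) {
  let timer = null;      // pending debounce timer, if any
  let inFlight = false;  // single-flight lock
  let rerun = false;     // a request arrived while a restart was running
  const listeners = new Set();

  const emit = (state, detail) => listeners.forEach((fn) => fn(state, detail));

  async function run() {
    timer = null;
    if (inFlight) {      // never start a second restart concurrently
      rerun = true;
      return;
    }
    inFlight = true;
    emit('restarting');
    try {
      emit('success', await restartFn());
    } catch (err) {
      emit('failure', err);
    } finally {
      inFlight = false;
      if (rerun) {       // config changed mid-restart: schedule one more pass
        rerun = false;
        schedule();
      }
    }
  }

  function schedule() {
    if (inFlight) {      // defer until the current restart finishes
      rerun = true;
      emit('queued');
      return;
    }
    clearTimeout(timer); // each new request resets the debounce window
    timer = setTimeout(run, debounceMs);
    emit('queued');
  }

  function applyNow() {  // what an 'Apply now' button could call
    clearTimeout(timer);
    return run();
  }

  function subscribe(fn) {
    listeners.add(fn);
    return () => listeners.delete(fn);
  }

  return { schedule, applyNow, subscribe };
}
```

Under these assumptions, an auto-save handler would write the config, call `schedule()`, and let a single `subscribe` callback drive the success/failure toast.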

Layer B (backend):
- src-tauri/src/commands/config.rs: wrap restart_gateway /
  reload_gateway with tokio::sync::Mutex + 2s cooldown
- Cargo.toml: add tokio 'sync' feature
- scripts/dev-api.js: same guard for Web mode (inflight promise
  reuse + 2s cooldown)
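
For the Web-mode guard, "inflight promise reuse + 2s cooldown" could look roughly like the sketch below. `createGuardedRestart`, the `now` option, and the `{ skipped }` result shape are hypothetical names used for illustration; the real scripts/dev-api.js may differ.

```javascript
// Hypothetical sketch: NOT the real scripts/dev-api.js.
// doRestart is any function that performs the restart and may return a promise.
function createGuardedRestart(doRestart, { cooldownMs = 2000, now = Date.now } = {}) {
  let inflight = null;        // promise of the restart currently running
  let lastFinishedAt = null;  // timestamp of the last finished attempt

  return function guardedRestart() {
    // Concurrent callers share the restart that is already running.
    if (inflight) return inflight;

    // Within the cooldown the request is treated as merged into the previous restart.
    if (lastFinishedAt !== null && now() - lastFinishedAt < cooldownMs) {
      return Promise.resolve({ skipped: true });
    }

    inflight = Promise.resolve()
      .then(doRestart)
      .then((result) => ({ skipped: false, result }))
      .finally(() => {
        // Recorded on success and failure alike, mirroring the Rust guard.
        lastFinishedAt = now();
        inflight = null;
      });
    return inflight;
  };
}
```

Unlike the Rust version, which makes concurrent callers wait on a mutex and then hit the cooldown check, this variant hands every concurrent caller the same promise; both approaches result in one restart per burst.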

Effects:
- 10 rapid edits within 3s -> 1 restart (was 10+ with races)
- Backend serializes concurrent restart calls, no zombie spawns
- User sees single 'Apply now' toast instead of restart storm

Refs #243 #244 #240
晴天
2026-04-24 19:35:39 +08:00
committed by GitHub
parent 66e57adab0
commit 5235853373
6 changed files with 273 additions and 42 deletions


@@ -4963,15 +4963,52 @@ async fn reload_gateway_internal(app: Option<&tauri::AppHandle>) -> Result<Strin
}
}
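/// Global Gateway restart mutex (single-flight lock).
/// Ensures only one restart runs at a time, which prevents zombie processes from piling up (issue #243).
static RESTART_MUTEX: tokio::sync::Mutex<()> = tokio::sync::Mutex::const_new(());
/// Timestamp of the last finished restart (drives the 2-second cooldown that blocks repeated calls slipping past the frontend debounce).
static LAST_RESTART_FINISHED_AT: std::sync::Mutex<Option<std::time::Instant>> =
    std::sync::Mutex::new(None);
const RESTART_COOLDOWN: std::time::Duration = std::time::Duration::from_secs(2);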
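/// Restart entry point guarded by the single-flight lock and a 2s cooldown.
/// Even if several requests slip past the frontend throttle, the backend runs them serially and does not repeat a restart within 2s.
async fn restart_gateway_guarded(app: Option<&tauri::AppHandle>) -> Result<String, String> {
    // Acquire the mutex; concurrent calls are serialized.
    let _guard = RESTART_MUTEX.lock().await;
    // 2-second cooldown: if a restart has just finished, skip this one (the previous restart already applied the config).
    let last_finished = {
        let guard = LAST_RESTART_FINISHED_AT.lock().unwrap();
        *guard
    };
    if let Some(last) = last_finished {
        if last.elapsed() < RESTART_COOLDOWN {
            // Message: "Gateway was just restarted; this request was merged (cooling down)"
            return Ok("Gateway 刚重启过,本次请求已合并(冷却中)".to_string());
        }
    }
    let result = reload_gateway_internal(app).await;
    // Record the time on success and failure alike, so a failure cannot trigger a retry storm.
    {
        let mut guard = LAST_RESTART_FINISHED_AT.lock().unwrap();
        *guard = Some(std::time::Instant::now());
    }
    result
}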
#[tauri::command]
pub async fn reload_gateway(app: tauri::AppHandle) -> Result<String, String> {
-    reload_gateway_internal(Some(&app)).await
+    restart_gateway_guarded(Some(&app)).await
}
/// Restart the Gateway service (same implementation as reload_gateway)
#[tauri::command]
pub async fn restart_gateway(app: tauri::AppHandle) -> Result<String, String> {
-    reload_gateway_internal(Some(&app)).await
+    restart_gateway_guarded(Some(&app)).await
}
/// Run openclaw doctor --fix to automatically repair configuration problems