diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..cb9eb81
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,115 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+GoProxy is an intelligent proxy pool system written in Go. It automatically fetches HTTP/SOCKS5 proxies from public sources, validates them (exit IP + geolocation + latency), and serves them via 4 proxy ports (HTTP random/stable, SOCKS5 random/stable) plus a WebUI dashboard.
+
+## Build & Run
+
+```bash
+# Run directly (requires Go 1.25, CGO enabled for sqlite3)
+go run .
+
+# Build and run
+go build -o proxygo .
+./proxygo
+
+# Docker
+docker compose up -d
+```
+
+CGO is required (`CGO_ENABLED=1`) because of the `github.com/mattn/go-sqlite3` dependency.
+
+## Testing
+
+There are no Go unit tests (`go test ./...`). Testing is done via shell scripts against a running instance:
+
+```bash
+# HTTP proxy test (continuous, Ctrl+C to stop)
+./test/test_proxy.sh # port 7777 (random)
+./test/test_proxy.sh 7776 # port 7776 (stable)
+
+# HTTP proxy HTTPS access test (random visits to Google/OpenAI/GitHub etc.)
+./test/test_http_https.sh # port 7777, continuous
+./test/test_http_https.sh 7776 20 # port 7776, 20 iterations
+
+# SOCKS5 proxy test
+./test/test_socks5.sh localhost 7779 # random
+./test/test_socks5.sh localhost 7780 50 # stable, 50 iterations
+
+# Go/Python test scripts
+go run test/test_proxy.go 7777
+python test/test_proxy.py 7776
+```
+
+## Architecture
+
+The system is a single binary with several cooperating goroutines. Module `go.mod` name is `goproxy`.
+
+### Module Dependency Flow
+
+```
+main.go (orchestrator)
+  ├── config/ — Global config (env vars + config.json), thread-safe singleton
+  ├── storage/ — SQLite persistence layer (proxies + source_status tables)
+  ├── fetcher/ — Multi-source proxy fetcher with circuit breaker (SourceManager)
+  ├── validator/ — Concurrent proxy validation (connectivity + exit IP + geo + latency)
+  ├── pool/ — Pool manager (admission control, slot allocation, replacement logic)
+  ├── checker/ — Background health checker (batch-based, skips S-grade when healthy)
+  ├── optimizer/ — Background quality optimizer (replaces slow proxies with faster ones)
+  ├── proxy/ — Outward-facing proxy servers
+  │   ├── server.go — HTTP proxy (implements http.Handler)
+  │   └── socks5_server.go — SOCKS5 proxy (raw TCP, manual protocol implementation)
+  ├── webui/ — Dashboard server (embedded HTML in html.go, API in dashboard.go)
+  └── logger/ — In-memory log collector for WebUI display
+```
+
+### Key Design Patterns
+
+- **Pool state machine**: healthy → warning → critical → emergency. State determines fetch mode (optimize/refill/emergency) and latency thresholds.
+- **Slot-based capacity**: Pool has fixed size split between HTTP/SOCKS5 by configurable ratio (default 3:7). Each protocol has guaranteed minimum slots.
+- **Smart admission**: New proxies enter if slots available, or replace worst existing proxy if significantly faster (30%+ by default via `ReplaceThreshold`). HTTP proxies must also pass an HTTPS CONNECT tunnel test (random real HTTPS site visit with retry) before admission.
+- **Protocol-parallel validation**: `smartFetchAndFill` splits candidates by protocol and validates SOCKS5/HTTP concurrently. SOCKS5 fills faster (no HTTPS check overhead); HTTP validation runs in parallel without blocking SOCKS5 admission.
+- **Circuit breaker on sources**: `SourceManager` tracks consecutive failures per source URL. 3 fails → degraded, 5 → disabled for 30min.
+- **Auto-retry on proxy failure**: Both HTTP and SOCKS5 servers retry with different upstream proxies on failure (up to `MaxRetry` times), deleting failed proxies immediately.
+- **SOCKS5 service only uses SOCKS5 upstreams** (many free HTTP proxies don't support CONNECT). HTTP service can use either protocol upstream.
+
+### Background Goroutines (started in main.go)
+
+1. **Status monitor** — every 30s, checks pool state and triggers `smartFetchAndFill` if needed
+2. **Health checker** — every `HealthCheckInterval` min, validates a batch of proxies
+3. **Optimizer** — every `OptimizeInterval` min, fetches from slow sources and replaces B/C grade proxies
+4. **Config watcher** — listens for WebUI config changes and adjusts pool slots
+
+### Ports
+
+| Port | Service |
+|------|---------|
+| 7776 | HTTP proxy (lowest-latency mode) |
+| 7777 | HTTP proxy (random rotation mode) |
+| 7778 | WebUI dashboard |
+| 7779 | SOCKS5 proxy (random rotation mode) |
+| 7780 | SOCKS5 proxy (lowest-latency mode) |
+
+### Configuration
+
+- Environment variables: `WEBUI_PASSWORD`, `PROXY_AUTH_ENABLED`, `PROXY_AUTH_USERNAME`, `PROXY_AUTH_PASSWORD`, `BLOCKED_COUNTRIES`, `DATA_DIR`
+- Persistent config: `config.json` (or `$DATA_DIR/config.json`) — pool capacity, latency thresholds, intervals. Editable via WebUI.
+- Config is loaded once at startup via `config.Load()`, updated in-memory via `config.Save()`. Thread-safe via `sync.RWMutex`.
+
+### Storage
+
+SQLite with `MaxOpenConns(1)` (single-writer). Two tables: `proxies` (with quality grades S/A/B/C based on latency) and `source_status` (circuit breaker state). Schema auto-migrates on startup.
+
+### WebUI
+
+The entire frontend is embedded as Go string literals in `webui/html.go`. The server (`webui/server.go`) serves HTML and API endpoints. `webui/dashboard.go` contains API handlers. Dual-role auth: guest (read-only) and admin (full control via password).
+
+## Code Conventions
+
+- All log messages use `[module]` prefix: `[pool]`, `[fetch]`, `[health]`, `[optimize]`, `[monitor]`, `[socks5]`, `[proxy]`, `[tunnel]`, `[storage]`, `[source]`
+- Comments and log messages are in Chinese
+- Quality grades: S (≤500ms), A (501-1000ms), B (1001-2000ms), C (>2000ms)
+- `storage.Proxy` is the shared data type across all modules
diff --git a/README.md b/README.md
index d9f4ad4..6a0f91c 100644
--- a/README.md
+++ b/README.md
@@ -27,15 +27,17 @@ GoProxy 从多个公开代理源自动抓取 HTTP/SOCKS5 代理,通过严格
 ## ✨ 核心特性
 
 ### 🎯 智能池子机制
-- **固定容量管理**:可配置池子大小和 HTTP/SOCKS5 协议比例
+- **固定容量管理**:可配置池子大小和 HTTP/SOCKS5 协议比例(默认 3:7)
 - **质量分级**:S/A/B/C 四级评分(基于延迟),智能选择高质量代理
 - **动态状态感知**:Healthy → Warning → Critical → Emergency 四级状态自适应
 - **严格准入标准**:必须通过出口 IP、地理位置、延迟三重验证才可入池
+- **HTTPS 可用性验证**:HTTP 协议代理入池前额外验证 HTTPS CONNECT 隧道能力,随机访问真实 HTTPS 网站确认可用(失败自动换站重试),确保入池的 HTTP 代理都能正常访问 HTTPS 网站
 - **智能替换**:新代理必须显著优于现有代理(默认快 30%)才触发替换
 
 ### 🚀 按需抓取
 - **源分组策略**:快更新源(5-30min)用于紧急补充,慢更新源(每天)用于优化轮换
 - **断路器保护**:连续失败的源自动降级/禁用,冷却后恢复
+- **协议并发验证**:抓取到的候选代理按协议分组,SOCKS5 和 HTTP 各自并发验证入池。SOCKS5 无额外检测,天然更快优先填充;HTTP 带 HTTPS CONNECT 检测较慢但不阻塞 SOCKS5 入池
 - **多模式抓取**:
   - **Emergency**:单协议缺失或池子 <10%,使用所有可用源
   - **Refill**:池子 <80%,使用快更新源
@@ -133,6 +135,7 @@ GoProxy 从多个公开代理源自动抓取 HTTP/SOCKS5 代理,通过严格
 ├── test/ # 🧪 测试脚本与文档
 │   ├── test_proxy.sh # HTTP 代理测试脚本(Bash)
 │   ├── test_socks5.sh # SOCKS5 代理测试脚本(Bash)
+│   ├── test_http_https.sh # HTTP 代理 HTTPS 访问测试脚本(Bash)
 │   ├── test_proxy.go # Go 测试脚本
 │   ├── test_proxy.py # Python 测试脚本
 │   └── README.md # 测试脚本使用说明
@@ -571,7 +574,7 @@ proxies = {'http': 'socks5://myuser:secure_pass_123@server-ip:7779', 'https': 's
 ```json
 {
   "pool_max_size": 100,
-  "pool_http_ratio": 0.5,
+  "pool_http_ratio": 0.3,
   "pool_min_per_protocol": 10,
   "max_latency_ms": 2000,
   "max_latency_healthy": 1500,
@@ -615,7 +618,7 @@ proxies = {'http': 'socks5://myuser:secure_pass_123@server-ip:7779', 'https': 's
 | 参数 | 默认值 | 说明 | 推荐范围 |
 | --- | --- | --- | --- |
 | `pool_max_size` | `100` | 代理池总容量 | 50-150 ⚠️ |
-| `pool_http_ratio` | `0.5` | HTTP 协议占比 | 0.3-0.8 |
+| `pool_http_ratio` | `0.3` | HTTP 协议占比 | 0.2-0.5 |
 | `pool_min_per_protocol` | `10` | 每协议最少保证数量 | 5-50 |
 
 > ⚠️ **容量限制说明**:公开代理源质量有限,验证通过率通常只有 1-3%。受地理过滤、延迟标准、出口检测等因素影响,**实际填充率约为 70-90%**。如设置 150 容量,实际可能稳定在 105-135 个。建议根据实际需求设置合理容量。
@@ -657,7 +660,7 @@ proxies = {'http': 'socks5://myuser:secure_pass_123@server-ip:7779', 'https': 's
 ```json
 {
   "pool_max_size": 50,
-  "pool_http_ratio": 0.5,
+  "pool_http_ratio": 0.3,
   "validate_concurrency": 100,
   "health_check_interval": 10,
   "health_check_batch_size": 10,
@@ -949,6 +952,7 @@ Emergency (总数<10% 或 单协议缺失)
 3. **地理位置查询**:获取出口 IP 的国家/城市
 4. **延迟测试**:测量连接延迟
 5. **质量评估**:根据延迟计算质量等级
+6. **HTTPS 隧道验证**(仅 HTTP 协议):通过代理实际访问随机 HTTPS 网站(Google/OpenAI/GitHub/Cloudflare/httpbin),验证 CONNECT 隧道可用性,首次失败自动换站重试
 
 **入池判断逻辑**
 - ✅ 协议槽位未满:直接加入
@@ -1140,6 +1144,18 @@ go run test/test_proxy.go 7777
 python test/test_proxy.py 7776
 ```
 
+**HTTP 代理 HTTPS 访问测试**:
+```bash
+# 持续测试 HTTP 代理访问 HTTPS 网站(随机访问 Google/OpenAI/GitHub 等)
+./test/test_http_https.sh
+
+# 指定端口
+./test/test_http_https.sh 7776
+
+# 指定端口 + 测试次数
+./test/test_http_https.sh 7777 20
+```
+
 **SOCKS5 代理测试**:
 ```bash
 # 测试 SOCKS5 随机轮换模式(7779 端口)
@@ -1347,7 +1363,8 @@ docker logs proxygo --tail 200 | grep -i "socks5.*failed"
 ### 本项目增强功能
 
 在原项目基础上,我们进行了大量改进和功能增强:
-- 🆕 **智能池子机制**:固定容量管理、质量分级(S/A/B/C)、智能替换逻辑
+- 🆕 **智能池子机制**:固定容量管理、质量分级(S/A/B/C)、智能替换逻辑、HTTP/SOCKS5 默认 3:7 比例
+- 🆕 **HTTPS 可用性验证**:HTTP 协议代理入池/刷新时额外验证 HTTPS CONNECT 隧道,随机访问真实网站确认可用
 - 🆕 **按需抓取策略**:源分组、断路器保护、Emergency/Refill/Optimize 多模式
 - 🆕 **分层健康管理**:批次检查、智能跳过 S 级、定时优化轮换
 - 🆕 **智能重试机制**:自动故障切换、失败即删除、防重复尝试
@@ -1357,7 +1374,7 @@ docker logs proxygo --tail 200 | grep -i "socks5.*failed"
 - 🆕 **黑客风格 WebUI**:Matrix 美学、实时仪表盘、完整配置界面、中英文切换
 - 🆕 **双角色权限**:访客模式(只读)+ 管理员模式(完全控制),可安全公网开放
 - 🆕 **扩展存储层**:质量等级、使用统计、源状态管理
-- 🆕 **测试套件**:HTTP + SOCKS5 测试脚本,持续运行模式,显示国旗 emoji
+- 🆕 **测试套件**:HTTP + SOCKS5 + HTTPS 访问测试脚本,持续运行模式,显示国旗 emoji
 - 🆕 **CI/CD 自动化**:GitHub Actions 自动构建多架构镜像(amd64/arm64),双仓库发布
 - 🆕 **环境变量配置**:docker-compose + .env 文件,灵活配置各种部署场景
 
diff --git a/config/config.go b/config/config.go
index 8e639ce..eac884f 100644
--- a/config/config.go
+++ b/config/config.go
@@ -161,7 +161,7 @@ func DefaultConfig() *Config {
 
 		// 池子容量配置
 		PoolMaxSize: 100, // 总容量
-		PoolHTTPRatio: 0.5, // HTTP占50%
+		PoolHTTPRatio: 0.3, // HTTP占30%
 		PoolMinPerProtocol: 10, // 每协议最少10个
 
 		// 延迟标准配置
diff --git a/main.go b/main.go
index 18a1c78..40e6566 100644
--- a/main.go
+++ b/main.go
@@ -172,41 +172,49 @@ func smartFetchAndFill(fetch *fetcher.Fetcher, validate *validator.Validator, st
 		return
 	}
 
-	log.Printf("[main] 抓取到 %d 个候选代理,开始严格验证...", len(candidates))
+	// 按协议分组
+	var httpCandidates, socks5Candidates []storage.Proxy
+	for _, c := range candidates {
+		if c.Protocol == "http" {
+			httpCandidates = append(httpCandidates, c)
+		} else {
+			socks5Candidates = append(socks5Candidates, c)
+		}
+	}
 
-	// 严格验证并尝试入池
-	addedCount := 0
-	validCount := 0
-	rejectedNoExit := 0
-	rejectedLatency := 0
-	rejectedGeo := 0
-	rejectedFull := 0
+	log.Printf("[main] 抓取到 %d 个候选代理(SOCKS5=%d HTTP=%d),按协议并发验证...",
+		len(candidates), len(socks5Candidates), len(httpCandidates))
 
-	for result := range validate.ValidateStream(candidates) {
+	// 共享计数器
+	var addedCount atomic.Int32
+	var validCount atomic.Int32
+	var rejectedNoExit atomic.Int32
+	var rejectedLatency atomic.Int32
+	var rejectedGeo atomic.Int32
+	var rejectedFull atomic.Int32
+
+	// 入池处理函数(两个协程共用)
+	processResult := func(result validator.Result) {
 		if !result.Valid {
-			continue
+			return
 		}
-		validCount++
+		validCount.Add(1)
 
 		latencyMs := int(result.Latency.Milliseconds())
 
-		// 根据池子状态动态调整延迟标准
 		cfg := config.Get()
 		maxLatency := cfg.GetLatencyThreshold(status.State)
 
-		// 检查:有出口IP、有位置
 		if result.ExitIP == "" || result.ExitLocation == "" {
-			rejectedNoExit++
-			continue
+			rejectedNoExit.Add(1)
+			return
 		}
 
-		// 检查:延迟达标
 		if latencyMs > maxLatency {
-			rejectedLatency++
-			continue
+			rejectedLatency.Add(1)
+			return
 		}
 
-		// 尝试加入池子
 		proxyToAdd := storage.Proxy{
 			Address: result.Proxy.Address,
 			Protocol: result.Proxy.Protocol,
@@ -216,41 +224,71 @@ func smartFetchAndFill(fetch *fetcher.Fetcher, validate *validator.Validator, st
 		}
 
 		if added, reason := poolMgr.TryAddProxy(proxyToAdd); added {
-			addedCount++
+			addedCount.Add(1)
 		} else if reason == "slots_full" {
-			rejectedFull++
+			rejectedFull.Add(1)
 		} else if len(result.ExitLocation) >= 2 {
-			// 检查是否被地理过滤
 			countryCode := result.ExitLocation[:2]
 			for _, blocked := range cfg.BlockedCountries {
 				if countryCode == blocked {
-					rejectedGeo++
+					rejectedGeo.Add(1)
 					break
 				}
 			}
 		}
-
-		// 如果是紧急模式且已达到最小要求,停止验证
-		if mode == "emergency" && status.HTTP >= cfg.PoolMinPerProtocol && status.SOCKS5 >= cfg.PoolMinPerProtocol {
-			log.Println("[main] 🎉 紧急模式:达到最小要求,停止验证")
-			break
-		}
-
-		// 动态检查是否已经填满
-		if addedCount > 0 && addedCount%20 == 0 {
-			currentStatus, _ := poolMgr.GetStatus()
-			if !poolMgr.NeedsFetchQuick(currentStatus) {
-				log.Println("[main] ✅ 池子已填满,停止验证")
-				break
-			}
-		}
 	}
 
+	// 池子是否已满的检查函数
+	poolFilled := func() bool {
+		currentStatus, _ := poolMgr.GetStatus()
+		return !poolMgr.NeedsFetchQuick(currentStatus)
+	}
+
+	var wg sync.WaitGroup
+
+	// SOCKS5 协程:验证快,优先填充
+	if len(socks5Candidates) > 0 {
+		wg.Add(1)
+		go func() {
+			defer wg.Done()
+			count := 0
+			for result := range validate.ValidateStream(socks5Candidates) {
+				processResult(result)
+				count++
+				if count%20 == 0 && poolFilled() {
+					log.Println("[main] ✅ SOCKS5 验证中检测到池子已满,停止")
+					break
+				}
+			}
+			log.Printf("[main] SOCKS5 验证完成,处理 %d 个", count)
+		}()
+	}
+
+	// HTTP 协程:有额外 HTTPS 检测,较慢
+	if len(httpCandidates) > 0 {
+		wg.Add(1)
+		go func() {
+			defer wg.Done()
+			count := 0
+			for result := range validate.ValidateStream(httpCandidates) {
+				processResult(result)
+				count++
+				if count%20 == 0 && poolFilled() {
+					log.Println("[main] ✅ HTTP 验证中检测到池子已满,停止")
+					break
+				}
+			}
+			log.Printf("[main] HTTP 验证完成,处理 %d 个", count)
+		}()
+	}
+
+	wg.Wait()
+
 	// 最终状态
 	finalStatus, _ := poolMgr.GetStatus()
 	log.Printf("[main] 填充完成: 验证%d 通过%d 入池%d | 拒绝[无出口:%d 延迟:%d 地理:%d 满:%d] | 最终: %s HTTP=%d SOCKS5=%d",
-		len(candidates), validCount, addedCount,
-		rejectedNoExit, rejectedLatency, rejectedGeo, rejectedFull,
+		len(candidates), validCount.Load(), addedCount.Load(),
+		rejectedNoExit.Load(), rejectedLatency.Load(), rejectedGeo.Load(), rejectedFull.Load(),
 		finalStatus.State, finalStatus.HTTP, finalStatus.SOCKS5)
 }
 
diff --git a/test/test_http_https.sh b/test/test_http_https.sh
new file mode 100755
index 0000000..7c0c09b
--- /dev/null
+++ b/test/test_http_https.sh
@@ -0,0 +1,81 @@
+#!/bin/bash
+
+# GoProxy HTTP 协议代理 HTTPS 访问测试脚本
+# 随机访问多个 HTTPS 网站,验证 HTTP 代理的 CONNECT 隧道能力
+# 用法: ./test_http_https.sh [端口号,默认7777] [测试次数,默认持续运行]
+# 按 Ctrl+C 停止测试
+
+PROXY_HOST="127.0.0.1"
+PROXY_PORT="${1:-7777}"
+MAX_COUNT="${2:-0}" # 0 = 持续运行
+DELAY=2
+
+# 测试目标(HTTPS 网站)
+TARGETS=(
+    "https://www.google.com"
+    "https://www.openai.com"
+    "https://www.github.com"
+    "https://www.cloudflare.com"
+    "https://httpbin.org/ip"
+)
+
+# 统计变量
+total=0
+success=0
+fail=0
+
+# 获取毫秒时间戳
+get_ms_time() {
+    python3 -c 'import time; print(int(time.time() * 1000))'
+}
+
+# 捕获 Ctrl+C 信号
+trap ctrl_c INT
+function ctrl_c() {
+    echo ""
+    echo "---"
+    if [ $total -gt 0 ]; then
+        loss_rate=$(awk "BEGIN {printf \"%.1f\", ($total - $success)/$total*100}")
+        success_rate=$(awk "BEGIN {printf \"%.1f\", $success/$total*100}")
+        echo "$total requests transmitted, $success succeeded, $fail failed, ${loss_rate}% loss, ${success_rate}% success rate"
+    fi
+    exit 0
+}
+
+echo "HTTP PROXY HTTPS TEST — $PROXY_HOST:$PROXY_PORT"
+echo "targets: ${#TARGETS[@]} HTTPS sites"
+echo ""
+
+while true; do
+    # 随机选择目标
+    idx=$((RANDOM % ${#TARGETS[@]}))
+    target="${TARGETS[$idx]}"
+
+    total=$((total + 1))
+
+    start_time=$(get_ms_time)
+    response=$(curl -x "http://${PROXY_HOST}:${PROXY_PORT}" \
+        -s -k \
+        -o /dev/null \
+        -w "%{http_code}" \
+        --connect-timeout 10 \
+        --max-time 15 \
+        "${target}" 2>&1)
+    end_time=$(get_ms_time)
+    elapsed=$((end_time - start_time))
+
+    if [[ "$response" =~ ^[23] ]]; then
+        echo "✅ seq=$total ${target} -> HTTP $response time=${elapsed}ms"
+        success=$((success + 1))
+    else
+        echo "❌ seq=$total ${target} -> HTTP $response time=${elapsed}ms"
+        fail=$((fail + 1))
+    fi
+
+    # 达到指定次数则停止
+    if [ "$MAX_COUNT" -gt 0 ] && [ "$total" -ge "$MAX_COUNT" ]; then
+        ctrl_c
+    fi
+
+    sleep $DELAY
+done
diff --git a/validator/validator.go b/validator/validator.go
index 29ba1be..4cc846f 100644
--- a/validator/validator.go
+++ b/validator/validator.go
@@ -83,6 +83,52 @@ func getExitIPInfo(client *http.Client) (string, string) {
 	return result.Query, location
 }
 
+// HTTPS 测试目标列表,随机选一个验证代理的 CONNECT 隧道能力
+var httpsTestTargets = []string{
+	"https://www.google.com",
+	"https://www.openai.com",
+	"https://www.github.com",
+	"https://www.cloudflare.com",
+	"https://httpbin.org/ip",
+}
+
+// checkHTTPSConnect 通过 HTTP 代理实际访问一个随机 HTTPS 网站,验证 CONNECT 隧道是否可用
+// 首次失败会换一个目标重试一次,避免目标网站偶尔抽风导致误杀
+func checkHTTPSConnect(proxyAddr string, timeout time.Duration) bool {
+	proxyURL, err := url.Parse(fmt.Sprintf("http://%s", proxyAddr))
+	if err != nil {
+		return false
+	}
+
+	client := &http.Client{
+		Transport: &http.Transport{
+			Proxy: http.ProxyURL(proxyURL),
+			TLSHandshakeTimeout: timeout,
+		},
+		Timeout: timeout,
+	}
+
+	// 随机起始索引
+	start := int(time.Now().UnixNano() % int64(len(httpsTestTargets)))
+
+	for attempt := 0; attempt < 2; attempt++ {
+		idx := (start + attempt) % len(httpsTestTargets)
+		resp, err := client.Get(httpsTestTargets[idx])
+		if err != nil {
+			continue
+		}
+		io.Copy(io.Discard, resp.Body)
+		resp.Body.Close()
+
+		// 2xx 或 3xx 都算成功(部分网站会重定向)
+		if resp.StatusCode >= 200 && resp.StatusCode < 400 {
+			return true
+		}
+	}
+
+	return false
+}
+
 // ValidateAll 并发验证所有代理,返回验证结果
 func (v *Validator) ValidateAll(proxies []storage.Proxy) []Result {
 	var results []Result
@@ -172,6 +218,13 @@ func (v *Validator) ValidateOne(p storage.Proxy) (bool, time.Duration, string, s
 		}
 	}
 
+	// HTTP 代理额外检测:必须支持 HTTPS CONNECT 隧道
+	if p.Protocol == "http" {
+		if !checkHTTPSConnect(p.Address, v.timeout) {
+			return false, latency, exitIP, exitLocation
+		}
+	}
+
 	return true, latency, exitIP, exitLocation
 }
 
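
The per-protocol fan-out that `smartFetchAndFill` adopts in this change can be sketched in isolation. This is a minimal standalone sketch, not the repository's code: `candidate`, `splitByProtocol`, and `processAll` are hypothetical names, and the `accept` callback is an assumed stand-in for validation plus `poolMgr.TryAddProxy`. It shows only the grouping step and the shared `atomic.Int32` counters, and leaves out the early stop when the pool fills.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// candidate is a hypothetical stand-in for storage.Proxy.
type candidate struct {
	Address  string
	Protocol string
}

// splitByProtocol groups candidates the way the diff does: "http" goes in one
// slice, and every other protocol value is treated as SOCKS5.
func splitByProtocol(cs []candidate) (httpC, socks5C []candidate) {
	for _, c := range cs {
		if c.Protocol == "http" {
			httpC = append(httpC, c)
		} else {
			socks5C = append(socks5C, c)
		}
	}
	return httpC, socks5C
}

// processAll starts one goroutine per non-empty group and tallies results in
// atomic counters shared by both goroutines. accept is a hypothetical callback
// (an assumption for this sketch) and must be safe to call concurrently.
func processAll(cs []candidate, accept func(candidate) bool) (processed, accepted int32) {
	httpC, socks5C := splitByProtocol(cs)

	var processedCount, acceptedCount atomic.Int32
	var wg sync.WaitGroup

	for _, group := range [][]candidate{socks5C, httpC} {
		if len(group) == 0 {
			continue
		}
		wg.Add(1)
		go func(g []candidate) {
			defer wg.Done()
			for _, c := range g {
				processedCount.Add(1)
				if accept(c) {
					acceptedCount.Add(1)
				}
			}
		}(group)
	}

	wg.Wait()
	return processedCount.Load(), acceptedCount.Load()
}

func main() {
	cs := []candidate{
		{Address: "10.0.0.1:8080", Protocol: "http"},
		{Address: "10.0.0.2:1080", Protocol: "socks5"},
		{Address: "10.0.0.3:1080", Protocol: "socks5"},
	}
	processed, accepted := processAll(cs, func(c candidate) bool {
		return c.Protocol == "socks5"
	})
	fmt.Printf("processed=%d accepted=%d\n", processed, accepted)
}
```

Atomic counters are enough for simple tallies like these. Compound state such as the pool itself still needs its own synchronization, because both goroutines reach it at the same time.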