GithubBackup/httprunner

mirror of https://github.com/httprunner/httprunner.git synced 2026-07-16 18:12:57 +08:00

lilong.129 4959c2e47e feat: extractJSONFromContent

2025-06-10 14:08:44 +08:00


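HttpRunner AI Module Documentation

📖 Overview

The HttpRunner AI module is an intelligent UI automation engine that integrates several AI services. It provides planning, assertion verification, and computer vision recognition built on large language models (LLMs) for UI automation testing.

🎯 Core Features

1. Intelligent Planning

Vision-language model driven: generates action sequences from screenshots and natural-language instructions
Multi-model support: supports specialized models such as UI-TARS and Doubao vision
Context awareness: maintains conversation history to support multi-turn planning
Action parsing: parses model output into standardized tool calls

2. Intelligent Assertion

Visual verification: checks assertion conditions against screenshots
Natural-language assertions: accepts assertion conditions written in natural language
Structured output: returns a standardized assertion result together with the reasoning

3. Computer Vision

OCR text recognition: extracts on-screen text and its position
UI element detection: recognizes icons, buttons, and other UI elements
Popup detection: automatically identifies and locates popups and their close buttons
Coordinate conversion: converts between relative and absolute coordinates

4. Session Management

Conversation history: maintains the full conversation context
Message management: manages user image messages and assistant replies
History cleanup: automatically removes expired conversation records

🏗️ Architecture

Overall architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   UI Driver     │    │   AI Module     │    │  LLM Services   │
│   (XTDriver)    │◄──►│   (ai package)  │◄──►│ (OpenAI/Doubao) │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                              │
                              ▼
                       ┌─────────────────┐
                       │  CV Services    │
                       │   (VEDEM)       │
                       └─────────────────┘

Core interfaces

ILLMService - LLM service interface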

type ILLMService interface {
    Call(ctx context.Context, opts *PlanningOptions) (*PlanningResult, error)
    Assert(ctx context.Context, opts *AssertOptions) (*AssertionResult, error)
}

IPlanner - planner interface

type IPlanner interface {
    Call(ctx context.Context, opts *PlanningOptions) (*PlanningResult, error)
}

IAsserter - asserter interface

type IAsserter interface {
    Assert(ctx context.Context, opts *AssertOptions) (*AssertionResult, error)
}

ICVService - computer vision service interface

type ICVService interface {
    ReadFromBuffer(imageBuf *bytes.Buffer, opts ...option.ActionOption) (*CVResult, error)
    ReadFromPath(imagePath string, opts ...option.ActionOption) (*CVResult, error)
}

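🔧 Main Components

1. AI Service Manager (ai.go)

Purpose: manages LLM services and provides a single entry point for planning and assertion

Core types:

type combinedLLMService struct {
    planner  IPlanner  // provides planning
    asserter IAsserter // provides assertion
}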

type ModelConfig struct {
    *openai.ChatModelConfig
    ModelType option.LLMServiceType
}

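Main responsibilities:

Model configuration management and validation
Reading and validating environment variables
Safe handling of API keys
Support for multiple model types

Supported model types:

DOUBAO_1_5_THINKING_VISION_PRO_250428: Doubao thinking vision pro model
DOUBAO_1_5_UI_TARS_250428: Doubao UI-TARS, a model specialized for UI automation

2. Planner (planner.go)

Purpose: plans UI actions with a vision-language model

Core types: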

type Planner struct {
    modelConfig *ModelConfig
    model       model.ToolCallingChatModel
    parser      LLMContentParser
    history     ConversationHistory
}

type PlanningOptions struct {
    UserInstruction string          `json:"user_instruction"`
    Message         *schema.Message `json:"message"`
    Size            types.Size      `json:"size"`
}

type PlanningResult struct {
    ToolCalls     []schema.ToolCall `json:"tool_calls"`
    ActionSummary string            `json:"summary"`
    Thought       string            `json:"thought"`
    Content       string            `json:"content"`
    Error         string            `json:"error,omitempty"`
}

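Workflow:

Receive the user instruction and a screenshot
Build the conversation history, including the system prompt
Call the vision-language model to generate a response
Parse the model output into standardized tool calls
Update the conversation history to support multi-turn interaction

Features:

Tool registration and function calling
Conversation history management
Parsing of multiple output formats
Detailed logging

3. Asserter (asserter.go)

Purpose: verifies assertions with a vision-language model

Core types: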

type Asserter struct {
    modelConfig  *ModelConfig
    model        model.ToolCallingChatModel
    systemPrompt string
    history      ConversationHistory
}

type AssertOptions struct {
    Assertion  string     `json:"assertion"`
    Screenshot string     `json:"screenshot"`
    Size       types.Size `json:"size"`
}

type AssertionResult struct {
    Pass    bool   `json:"pass"`
    Thought string `json:"thought"`
}

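Workflow:

Receive the assertion condition and a screenshot
Build the assertion verification prompt
Call the vision-language model to make the judgment
Parse the model output into a structured result
Return the pass status and the reasoning

Features:

Structured JSON output
Natural-language assertions
Detailed record of the reasoning
Adapts to multiple models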
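
To illustrate the "parse the model output into a structured result" step, here is a minimal sketch that pulls a JSON object out of model text and decodes it into the `AssertionResult` shape shown above. `parseAssertion` is a hypothetical helper written for this example, not the module's own parser, and the first-brace/last-brace heuristic is an assumption about how stray prose around the JSON might be tolerated.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

type AssertionResult struct {
	Pass    bool   `json:"pass"`
	Thought string `json:"thought"`
}

// parseAssertion decodes the text between the first '{' and the last '}',
// which tolerates extra prose or code fences around the JSON object.
func parseAssertion(content string) (*AssertionResult, error) {
	start := strings.Index(content, "{")
	end := strings.LastIndex(content, "}")
	if start < 0 || end < start {
		return nil, errors.New("no JSON object found in model output")
	}
	var result AssertionResult
	if err := json.Unmarshal([]byte(content[start:end+1]), &result); err != nil {
		return nil, fmt.Errorf("decode assertion result: %w", err)
	}
	return &result, nil
}

func main() {
	content := "Here is the result:\n{\"pass\": true, \"thought\": \"The login button is visible.\"}\nDone."
	result, err := parseAssertion(content)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(result.Pass, result.Thought)
}
```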

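4. Content Parsers (parser_*.go)

Purpose: parse the output of different models into a standardized tool-call format

JSONContentParser (parser_default.go)

For general-purpose models that can emit JSON
Parses action sequences in standard JSON format
Handles coordinate normalization and parameter processing

UITARSContentParser (parser_ui_tars.go)

Adapted specifically to the Thought/Action format of the UI-TARS model
Parses several coordinate formats (<point>, <bbox>, [x,y,x,y])
Maps and normalizes parameter names
Converts relative coordinates to absolute coordinates

Core types: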

type LLMContentParser interface {
    SystemPrompt() string
    Parse(content string, size types.Size) (*PlanningResult, error)
}

type Action struct {
    ActionType   string         `json:"action_type"`
    ActionInputs map[string]any `json:"action_inputs"`
}

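Parsing features:

Support for multiple coordinate formats
Parameter name mapping
Coordinate system conversion
Error handling and validation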
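
A rough sketch of Thought/Action parsing with a `<point>` coordinate, assuming model output shaped like `Thought: ...` followed by `Action: name(args)`. The exact text a UI-TARS model emits, the regular expressions, and the `parseThoughtAction` helper are all illustrative assumptions; the real parser handles more formats than this.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

var (
	// Captures the thought text, the action name, and the raw argument string.
	thoughtActionRe = regexp.MustCompile(`(?s)Thought:\s*(.*?)\s*Action:\s*(\w+)\((.*)\)\s*$`)
	// Captures the two integers inside a <point>x y</point> coordinate.
	pointRe = regexp.MustCompile(`<point>(\d+)\s+(\d+)</point>`)
)

type parsedAction struct {
	Thought    string
	ActionType string
	X, Y       float64 // absolute pixel coordinates
}

// parseThoughtAction extracts the thought, the action name, and the first
// point, then scales the 0-1000 relative point to the given screen size.
func parseThoughtAction(content string, width, height int) (*parsedAction, error) {
	m := thoughtActionRe.FindStringSubmatch(content)
	if m == nil {
		return nil, fmt.Errorf("content does not match the Thought/Action format")
	}
	p := pointRe.FindStringSubmatch(m[3])
	if p == nil {
		return nil, fmt.Errorf("no <point> coordinate in the action arguments")
	}
	// The regex only matches digits, so parsing cannot fail here.
	relX, _ := strconv.ParseFloat(p[1], 64)
	relY, _ := strconv.ParseFloat(p[2], 64)
	return &parsedAction{
		Thought:    m[1],
		ActionType: m[2],
		X:          relX * float64(width) / 1000,
		Y:          relY * float64(height) / 1000,
	}, nil
}

func main() {
	content := "Thought: The login button is near the bottom.\n" +
		"Action: click(start_box='<point>500 800</point>')"
	action, err := parseThoughtAction(content, 1080, 1920)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(action.ActionType, action.X, action.Y)
}
```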

5. Computer Vision Service (cv.go)

Purpose: provides image recognition and analysis

Core types:

type CVResult struct {
    URL               string             `json:"url,omitempty"`
    OCRResult         OCRResults         `json:"ocrResult,omitempty"`
    LiveType          string             `json:"liveType,omitempty"`
    LivePopularity    int64              `json:"livePopularity,omitempty"`
    UIResult          UIResultMap        `json:"uiResult,omitempty"`
    ClosePopupsResult *ClosePopupsResult `json:"closeResult,omitempty"`
}

type OCRText struct {
    Text    string          `json:"text"`
    RectStr string          `json:"rect"`
    Rect    image.Rectangle `json:"-"`
}

type UIResult struct {
    Box
}

type ClosePopupsResult struct {
    Type      string `json:"type"`
    PopupArea Box    `json:"popupArea"`
    CloseArea Box    `json:"closeArea"`
    Text      string `json:"text"`
}

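Main capabilities:

OCR text recognition: extracts text and its exact position
UI element detection: recognizes buttons, icons, and other interface elements
Popup detection: automatically identifies popups and their close buttons
Region filtering: restricts matches to a specified region
Coordinate calculation: computes center points and random points

OCR features:

Precise text positioning
Regular expression matching
Selection by index
Filtering by region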

6. Session Manager (session.go)

Purpose: manages the history and context of AI conversations

Core types:

type ConversationHistory []*schema.Message

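Management policy:

User messages: at most 4 user image messages are kept
Assistant messages: at most 10 assistant replies are kept
Automatic cleanup: the oldest messages are deleted once a limit is exceeded
System messages: the system prompt is always kept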
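
The policy above can be sketched as a single trimming pass. This is an illustration only: `message`, `role`, and `trim` are hypothetical stand-ins for `schema.Message` and the module's own logic, and for simplicity every user message is counted as an image message.

```go
package main

import "fmt"

type role string

const (
	system    role = "system"
	user      role = "user"
	assistant role = "assistant"
)

type message struct {
	Role    role
	Content string
}

const (
	maxUserMessages      = 4
	maxAssistantMessages = 10
)

// trim drops the oldest user and assistant messages beyond the limits and
// never drops system messages. Relative order is preserved.
func trim(history []message) []message {
	userCount, assistantCount := 0, 0
	keep := make([]bool, len(history))
	// Walk backwards so the newest messages claim the available slots first.
	for i := len(history) - 1; i >= 0; i-- {
		switch history[i].Role {
		case user:
			userCount++
			keep[i] = userCount <= maxUserMessages
		case assistant:
			assistantCount++
			keep[i] = assistantCount <= maxAssistantMessages
		default:
			keep[i] = true
		}
	}
	trimmed := make([]message, 0, len(history))
	for i, m := range history {
		if keep[i] {
			trimmed = append(trimmed, m)
		}
	}
	return trimmed
}

func main() {
	history := []message{{system, "system prompt"}}
	for i := 1; i <= 6; i++ {
		history = append(history, message{user, fmt.Sprintf("screenshot %d", i)})
		history = append(history, message{assistant, fmt.Sprintf("reply %d", i)})
	}
	for _, m := range trim(history) {
		fmt.Println(m.Role, m.Content)
	}
}
```

Screenshots dominate the size of a request, which is presumably why the user-image limit is tighter than the assistant limit.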

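Features:

Message management
Memory optimization
Logging and debugging
Redaction of sensitive information

🚀 Usage Guide

1. Environment configuration

The HttpRunner AI module supports multi-model service configuration: you can configure several LLM services at once and switch between them in test cases.

Multi-model configuration

Service-specific configuration:

# Doubao thinking vision pro configuration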
DOUBAO_1_5_THINKING_VISION_PRO_250428_BASE_URL=https://ark.cn-beijing.volces.com/api/v3
DOUBAO_1_5_THINKING_VISION_PRO_250428_API_KEY=your_doubao_api_key

# Doubao UI-TARS configuration
DOUBAO_1_5_UI_TARS_250428_BASE_URL=https://ark.cn-beijing.volces.com/api/v3
DOUBAO_1_5_UI_TARS_250428_API_KEY=your_doubao_ui_tars_api_key

Default configuration (backward compatible):

```bash
# Default configuration, used when no service-specific configuration is found
LLM_MODEL_NAME=doubao-1.5-thinking-vision-pro-250428
OPENAI_BASE_URL=https://ark.cn-beijing.volces.com/api/v3
OPENAI_API_KEY=your_default_api_key
```

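Environment variable naming rules

Convert the service name to uppercase
Replace hyphens (-) and dots (.) with underscores (_)
Append the corresponding suffix: _BASE_URL or _API_KEY
The model name is derived directly from the service type and needs no separate configuration

For example: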

doubao-1.5-thinking-vision-pro-250428 → DOUBAO_1_5_THINKING_VISION_PRO_250428_*
gpt-4 → GPT_4_*
claude-3.5-sonnet → CLAUDE_3_5_SONNET_*
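
The naming rules can be expressed as a few lines of Go. `envPrefix` is a hypothetical helper name chosen for this sketch; only the rules themselves come from the text above.

```go
package main

import (
	"fmt"
	"strings"
)

// envPrefix converts a service name into its environment variable prefix:
// hyphens and dots become underscores, then the result is uppercased.
func envPrefix(serviceName string) string {
	replacer := strings.NewReplacer("-", "_", ".", "_")
	return strings.ToUpper(replacer.Replace(serviceName))
}

func main() {
	names := []string{"doubao-1.5-thinking-vision-pro-250428", "gpt-4", "claude-3.5-sonnet"}
	for _, name := range names {
		prefix := envPrefix(name)
		fmt.Println(prefix+"_BASE_URL", prefix+"_API_KEY")
	}
}
```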

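Configuration priority

Service-specific configuration (highest priority): {SERVICE_NAME}_BASE_URL, {SERVICE_NAME}_API_KEY
Default configuration (backward compatible): OPENAI_BASE_URL, OPENAI_API_KEY, LLM_MODEL_NAME
Model name: the service type name is preferred; LLM_MODEL_NAME is used only when the default configuration is used entirely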
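
One plausible reading of this priority order, sketched as a lookup function. The per-variable fallback and the `resolve` and `resolvedConfig` names are assumptions made for illustration; the module's actual resolution logic may differ in detail. The lookup function is passed in so the sketch can be exercised with a map instead of the real environment (`os.Getenv` has the same signature).

```go
package main

import (
	"fmt"
	"strings"
)

type resolvedConfig struct {
	BaseURL, APIKey, Model string
}

// resolve prefers service-specific variables and falls back to the default
// ones. LLM_MODEL_NAME only applies when nothing service-specific was set.
func resolve(service string, getenv func(string) string) (resolvedConfig, error) {
	prefix := strings.ToUpper(strings.NewReplacer("-", "_", ".", "_").Replace(service))
	cfg := resolvedConfig{
		BaseURL: getenv(prefix + "_BASE_URL"),
		APIKey:  getenv(prefix + "_API_KEY"),
		Model:   service,
	}
	usedSpecific := cfg.BaseURL != "" || cfg.APIKey != ""
	if cfg.BaseURL == "" {
		cfg.BaseURL = getenv("OPENAI_BASE_URL")
	}
	if cfg.APIKey == "" {
		cfg.APIKey = getenv("OPENAI_API_KEY")
	}
	if cfg.BaseURL == "" || cfg.APIKey == "" {
		return resolvedConfig{}, fmt.Errorf("no configuration found for %q", service)
	}
	if !usedSpecific {
		if name := getenv("LLM_MODEL_NAME"); name != "" {
			cfg.Model = name
		}
	}
	return cfg, nil
}

func main() {
	env := map[string]string{
		"OPENAI_BASE_URL": "https://default.example/v1",
		"OPENAI_API_KEY":  "default-key",
		"LLM_MODEL_NAME":  "doubao-1.5-thinking-vision-pro-250428",
		"GPT_4_BASE_URL":  "https://gpt.example/v1",
		"GPT_4_API_KEY":   "gpt-key",
	}
	lookup := func(key string) string { return env[key] }
	for _, service := range []string{"gpt-4", "claude-3.5-sonnet"} {
		cfg, err := resolve(service, lookup)
		fmt.Println(service, cfg.BaseURL, cfg.Model, err)
	}
}
```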

Example .env file

# Default configuration
LLM_MODEL_NAME=doubao-1.5-thinking-vision-pro-250428
OPENAI_BASE_URL=https://ark.cn-beijing.volces.com/api/v3
OPENAI_API_KEY=your_default_api_key

# doubao-1.5-thinking-vision-pro-250428
DOUBAO_1_5_THINKING_VISION_PRO_250428_BASE_URL=https://ark.cn-beijing.volces.com/api/v3
DOUBAO_1_5_THINKING_VISION_PRO_250428_API_KEY=your_doubao_thinking_api_key

# doubao-1.5-ui-tars-250428
DOUBAO_1_5_UI_TARS_250428_BASE_URL=https://ark.cn-beijing.volces.com/api/v3
DOUBAO_1_5_UI_TARS_250428_API_KEY=your_doubao_ui_tars_api_key

2. Creating an LLM service

Specifying the service in a test case

{
    "config": {
        "name": "AI test case",
        "llm_service": "doubao-1.5-thinking-vision-pro-250428"
    },
    "teststeps": [
        {
            "name": "AI action step",
            "android": {
                "actions": [
                    {
                        "method": "start_to_goal",
                        "params": "Launch the app and complete a task"
                    }
                ]
            }
        }
    ]
}

Using it in Go code

// Create the Doubao thinking vision pro service
llmService, err := ai.NewLLMService(option.DOUBAO_1_5_THINKING_VISION_PRO_250428)
if err != nil {
    log.Fatal().Err(err).Msg("failed to create LLM service")
}

// Alternatively, switch to the Doubao UI-TARS service
// (plain assignment, because llmService and err are already declared)
llmService, err = ai.NewLLMService(option.DOUBAO_1_5_UI_TARS_250428)
if err != nil {
    log.Fatal().Err(err).Msg("failed to create LLM service")
}

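Switching models

To switch to a different model service, change only the llm_service field in the test case:

{
    "config": {
        "name": "Lianliankan game test",
        "llm_service": "doubao-1.5-ui-tars-250428"
    }
}

The matching configuration is looked up automatically from the service name, so the environment variables do not need to change.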

3. Using intelligent planning

// Prepare the planning options
planningOpts := &ai.PlanningOptions{
    UserInstruction: "tap the login button",
    Message: &schema.Message{
        Role: schema.User,
        MultiContent: []schema.ChatMessagePart{
            {
                Type: schema.ChatMessagePartTypeImageURL,
                ImageURL: &schema.ChatMessageImageURL{
                    URL: "data:image/jpeg;base64," + base64Screenshot,
                },
            },
        },
    },
    Size: types.Size{Width: 1080, Height: 1920},
}

// Run planning
result, err := llmService.Call(ctx, planningOpts)
if err != nil {
    log.Error().Err(err).Msg("planning failed")
    return
}

// Handle the planning result
for _, toolCall := range result.ToolCalls {
    log.Info().Str("action", toolCall.Function.Name).
        Interface("args", toolCall.Function.Arguments).
        Msg("planned action")
}

4. Using intelligent assertion

// Prepare the assertion options
assertOpts := &ai.AssertOptions{
    Assertion:  "the login button should be visible",
    Screenshot: "data:image/jpeg;base64," + base64Screenshot,
    Size:       types.Size{Width: 1080, Height: 1920},
}

// Run the assertion
result, err := llmService.Assert(ctx, assertOpts)
if err != nil {
    log.Error().Err(err).Msg("assertion failed")
    return
}

// Check the assertion result
if result.Pass {
    log.Info().Str("thought", result.Thought).Msg("assertion passed")
} else {
    log.Warn().Str("thought", result.Thought).Msg("assertion failed")
}

5. Using computer vision

// Create the CV service
cvService, err := ai.NewCVService(option.CVServiceTypeVEDEM)
if err != nil {
    log.Fatal().Err(err).Msg("failed to create CV service")
}

// Read from an image buffer
cvResult, err := cvService.ReadFromBuffer(imageBuffer)
if err != nil {
    log.Error().Err(err).Msg("CV analysis failed")
    return
}

// Handle the OCR results
ocrTexts := cvResult.OCRResult.ToOCRTexts()
for _, ocrText := range ocrTexts {
    log.Info().Str("text", ocrText.Text).
        Str("rect", ocrText.RectStr).
        Msg("found text")
}

// Find specific text ("登录" means "Log in")
targetText, err := ocrTexts.FindText("登录", option.WithRegex(false))
if err != nil {
    log.Error().Err(err).Msg("text not found")
    return
}

// Get the center point of the text
center := targetText.Center()
log.Info().Float64("x", center.X).Float64("y", center.Y).
    Msg("text center coordinates")

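📋 Configuration Parameters

Model configuration

Parameter	Type	Description	Default
`BaseURL`	string	API base URL	Read from environment variables
`APIKey`	string	API key	Read from environment variables
`Model`	string	Model name	Read from environment variables
`Temperature`	float32	Sampling temperature	0
`TopP`	float32	Top-P parameter	0.7
`Timeout`	time.Duration	Request timeout	30s

Planning options

Parameter	Type	Description	Required
`UserInstruction`	string	User instruction	✓
`Message`	*schema.Message	Message content	✓
`Size`	types.Size	Screen size	✓

Assertion options

Parameter	Type	Description	Required
`Assertion`	string	Assertion condition	✓
`Screenshot`	string	Base64-encoded screenshot (data URL)	✓
`Size`	types.Size	Screen size	✓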

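🔍 Advanced Features

1. Multi-model adaptation

The AI module supports several language models, each with its own strengths:

Doubao thinking vision pro: a vision-language model with deep reasoning, suited to analyzing complex scenes
Doubao UI-TARS: a model optimized for UI automation that uses the Thought/Action format

2. Coordinate system conversion

Conversion between several coordinate formats is supported:

// Convert a relative coordinate (0-1000 range) to an absolute pixel coordinate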
func convertRelativeToAbsolute(relativeCoord float64, isXCoord bool, size types.Size) float64 {
    if isXCoord {
        return math.Round((relativeCoord/DefaultFactor*float64(size.Width))*10) / 10
    }
    return math.Round((relativeCoord/DefaultFactor*float64(size.Height))*10) / 10
}

3. Parameter name mapping

Parameter names that differ across model output formats are mapped automatically:

func normalizeParameterName(paramName string) string {
    switch paramName {
    case "start_point":
        return "start_box"
    case "end_point":
        return "end_box"
    case "point":
        return "start_box"
    default:
        return paramName
    }
}

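4. Conversation history optimization

Conversation history is managed to balance context completeness against memory use:

User image message limit: 4
Assistant reply limit: 10
Cleanup strategy: FIFO (first in, first out)

⚠️ Notes

1. Environment variable configuration

Make sure all required environment variables are set correctly
API keys need sufficient permissions and quota
Multiple services can be configured at the same time
The model name is derived from the service type and does not need manual configuration

2. Image format requirements

Image data is passed as Base64-encoded strings
JPEG is recommended to reduce the amount of data transferred
The image size must be provided accurately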

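3. Coordinate system

Doubao UI-TARS uses a 1000x1000 relative coordinate system
Coordinate conversion requires the correct screen size
Coordinate formats differ between models

4. Error handling

Network requests can fail, so a suitable retry mechanism is needed
Model output formats can be unstable, so parsing logic must be robust
Resource usage should be monitored to avoid memory leaks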

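5. Performance considerations

LLM calls have noticeable latency and are good candidates for asynchronous handling
Image data is large, so keep network transfer in mind
Conversation history consumes memory and needs periodic cleanup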

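🧪 Test Data

The module ships with test data in the testdata/ directory:

xhs-feed.jpeg: Xiaohongshu feed screen
popup_risk_warning.png: risk warning popup
llk_*.png: Lianliankan (tile-matching game) screens
deepseek_*.png: DeepSeek app screens
chat_list.jpeg: chat list screen

These files cover a range of typical UI scenarios and are used to verify that the AI module behaves correctly.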

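📈 Extending the Module

Adding support for a new model

Define the new model type in the option package
Implement the corresponding LLMContentParser
Add model validation logic to GetModelConfig
Update the system prompt and output format

Adding a new CV service

Implement the ICVService interface
Add the service creation logic to NewCVService
Define service-specific configuration and options
Add matching test cases

Improving the parsing logic

Extend coordinate format support
Improve the parameter mapping rules
Strengthen error handling
Optimize performance and memory use

These extension points let the AI module keep evolving to support more models and services for UI automation.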
