feat: implement AIQuery with OutputSchema support

- Add the AIQuery method to StepMobile for extracting information from the screen using natural language
- Implement the full AIQuery flow in driver_ext_ai.go, including screenshot capture and the LLM query
- Add OutputSchema support, allowing users to define custom output formats for structured queries
- Add the ToolAIQuery MCP tool, fully integrated into the MCP server
- Add the OutputSchema field and the WithOutputSchema option function to ActionOptions
- Add configuration support and field mappings for ACTION_Query
- Improve test coverage:
  * Add the TestAIQuery unit test covering multiple OutputSchema scenarios
  * Add the TestToolAIQuery MCP tool test
  * Define GameInfo, UIElementInfo, and other structs for testing
- Update documentation:
  * Add a complete AIQuery usage guide to docs/uixt/ai.md
  * Cover basic usage, OutputSchema examples, and best practices
- Support complex nested structs and array types in OutputSchema
- Keep the API design consistent with the existing AIAction and AIAssert features
lilong.129
2025-06-12 23:12:25 +08:00
parent fb0418fa95
commit f6e7e970f8
9 changed files with 502 additions and 11 deletions

View File

@@ -508,4 +508,210 @@ type Element struct {
queryResult, err := driver.LLMService.Query(ctx, queryOpts)
```
With the HttpRunner UIXT AI module, you can easily implement intelligent UI automation testing and significantly improve test efficiency and accuracy.
# AI Features Guide
HttpRunner v5 provides powerful AI capabilities for intelligent test operations based on Vision-Language Models (VLMs).
## Feature Overview
HttpRunner v5 integrates several AI features:
- **AIAction**: perform UI operations using natural language
- **AIAssert**: run assertions and verifications using natural language
- **AIQuery**: extract information from the screen using natural language
- **StartToGoal**: goal-oriented intelligent operation sequences
## AIQuery in Detail
### Overview
AIQuery is a new AI query feature in HttpRunner v5 that lets you extract information from screenshots using natural language. Built on Vision-Language Models (VLMs), it understands screen content and returns structured query results.
### Features
- **Natural language queries**: describe the information you want in plain language
- **Intelligent screen analysis**: screen content is analyzed by an AI vision model
- **Structured output**: query results are returned in a formatted structure
- **Multi-platform support**: works on Android, iOS, Browser, and other platforms
### Basic Usage
#### 1. Using AIQuery in a Test Step
```go
// basic query example
hrp.NewStep("Query Screen Content").
Android().
AIQuery("Please describe what is displayed on the screen")
// extract specific information
hrp.NewStep("Extract App List").
Android().
AIQuery("What apps are visible on the home screen? List them as a comma-separated string")
// UI element analysis
hrp.NewStep("Analyze Buttons").
Android().
AIQuery("Are there any buttons visible? Describe their text and positions")
```
#### 2. Configuring the LLM Service
Before using AIQuery, configure the LLM service:
```go
testcase := &hrp.TestCase{
Config: hrp.NewConfig("AIQuery Test").
SetLLMService(option.OPENAI_GPT_4O), // configure the LLM service
TestSteps: []hrp.IStep{
// steps that use AIQuery
},
}
```
#### 3. Supported Options
AIQuery supports the following options:
```go
hrp.NewStep("Query with Options").
Android().
AIQuery("Describe the screen content",
option.WithLLMService("openai_gpt_4o"), // specify the LLM service
option.WithCVService("openai_gpt_4o"), // specify the CV service
option.WithOutputSchema(CustomSchema{}), // custom output format
)
```
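Options such as `WithLLMService` and `WithOutputSchema` follow Go's functional-options pattern. The following self-contained sketch uses simplified stand-in types (not the actual `option` package) purely to illustrate how such options compose:

```go
package main

import "fmt"

// ActionOptions and ActionOption are simplified stand-ins for the real
// option package types; field names mirror the originals for illustration.
type ActionOptions struct {
	LLMService   string
	CVService    string
	OutputSchema interface{}
}

type ActionOption func(*ActionOptions)

// WithLLMService sets the LLM service name.
func WithLLMService(name string) ActionOption {
	return func(o *ActionOptions) { o.LLMService = name }
}

// WithOutputSchema sets the custom output schema.
func WithOutputSchema(schema interface{}) ActionOption {
	return func(o *ActionOptions) { o.OutputSchema = schema }
}

// NewActionOptions applies each option to a fresh struct.
func NewActionOptions(opts ...ActionOption) *ActionOptions {
	o := &ActionOptions{}
	for _, opt := range opts {
		opt(o)
	}
	return o
}

func main() {
	opts := NewActionOptions(
		WithLLMService("openai_gpt_4o"),
		WithOutputSchema(struct{ Rows int }{}),
	)
	fmt.Println(opts.LLMService) // openai_gpt_4o
}
```

Because each option is just a function mutating the options struct, new options (like `WithOutputSchema` in this commit) can be added without changing existing call sites.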
#### 4. Custom Output Format (OutputSchema)
AIQuery supports custom output formats and can return structured data:
```go
// define a custom output format
type GameAnalysis struct {
Content string `json:"content"` // required: human-readable description
Thought string `json:"thought"` // required: AI reasoning process
GameType string `json:"game_type"` // game type
Rows int `json:"rows"` // number of rows
Cols int `json:"cols"` // number of columns
Icons []string `json:"icons"` // icon types
TotalIcons int `json:"total_icons"` // total icon count
}
// query with the custom format
hrp.NewStep("Analyze Game Interface").
Android().
AIQuery("分析这个连连看游戏界面,告诉我有多少行多少列,有哪些不同类型的图案",
option.WithOutputSchema(GameAnalysis{}))
```
### Practical Use Cases
#### 1. Game Interface Analysis
```go
// analyze a LianLianKan (连连看) game board
hrp.NewStep("Analyze Game Board").
Android().
AIQuery("This is a LianLianKan (连连看) game interface. Please analyze: 1) How many rows and columns are there? 2) What types of icons are present?")
```
#### 2. Checking Application State
```go
// check the application state
hrp.NewStep("Check App State").
Android().
AIQuery("Is the login screen displayed? Are there any error messages visible?")
```
#### 3. Content Extraction
```go
// extract list content
hrp.NewStep("Extract List Items").
Android().
AIQuery("Extract all items from the list displayed on screen as a JSON array")
```
### Comparison with Other AI Features
| Feature | Purpose | Return Value | Typical Use |
|------|------|--------|----------|
| AIAction | perform operations | none | tap, input, swipe, and other interactions |
| AIAssert | assertion | boolean | verifying screen state and element presence |
| AIQuery | information query | string | extracting screen information and analyzing content |
### Best Practices
#### 1. Clear Query Descriptions
```go
// good: specific and clear
AIQuery("How many unread messages are shown in the notification badge?")
// avoid: too vague
AIQuery("Tell me about the screen")
```
#### 2. Structured Queries
```go
// request structured output
AIQuery("List all visible buttons with their text and approximate positions in JSON format")
```
#### 3. Context-Aware Queries
```go
// include application context
AIQuery("In this shopping app, what products are displayed in the current category? Include product names and prices")
```
### Error Handling
Common errors you may encounter with AIQuery:
1. **LLM service not configured**: make sure an LLM service is set in the test configuration
2. **Network connectivity issues**: check the network connection and API key configuration
3. **Screenshot failure**: make sure the device connection is working
### Notes
1. AIQuery requires a network connection to reach the LLM service
2. The accuracy of query results depends on the LLM model in use
3. Use specific, unambiguous descriptions in your queries for better results
4. For complex information extraction, ask for structured data in JSON format
## Complete Example
Below is a complete example of using AIQuery:
```go
func TestAIQuery(t *testing.T) {
testCase := &hrp.TestCase{
Config: hrp.NewConfig("AIQuery Demo").
SetLLMService(option.OPENAI_GPT_4O),
TestSteps: []hrp.IStep{
hrp.NewStep("Take Screenshot").
Android().
ScreenShot(),
hrp.NewStep("Query Screen Content").
Android().
AIQuery("Please describe what is displayed on the screen and identify any interactive elements"),
hrp.NewStep("Extract App Information").
Android().
AIQuery("What apps are visible on the screen? List them as a comma-separated string"),
hrp.NewStep("Analyze UI Elements").
Android().
AIQuery("Are there any buttons or clickable elements visible? Describe their locations and purposes"),
},
}
err := hrp.NewRunner(t).Run(testCase)
assert.Nil(t, err)
}
```

View File

@@ -1 +1 @@
v5.0.0-beta-2506121751
v5.0.0-beta-2506131027

View File

@@ -201,6 +201,18 @@ func (s *StepMobile) AIAction(prompt string, opts ...option.ActionOption) *StepM
return s
}
// AIQuery query information from screen using VLM
func (s *StepMobile) AIQuery(prompt string, opts ...option.ActionOption) *StepMobile {
action := option.MobileAction{
Method: option.ACTION_Query,
Params: prompt,
Options: option.NewActionOptions(opts...),
}
s.obj().Actions = append(s.obj().Actions, action)
return s
}
// DoubleTapXY double taps the point {X,Y}, X & Y is percentage of coordinates
func (s *StepMobile) DoubleTapXY(x, y float64, opts ...option.ActionOption) *StepMobile {
s.obj().Actions = append(s.obj().Actions, option.MobileAction{
@@ -863,11 +875,15 @@ func runStepMobileUI(s *SessionRunner, step IStep) (stepResult *StepResult, err
action.Method == option.ACTION_AIAssert || action.Method == option.ACTION_Query {
if config.LLMService != "" && action.Options.LLMService == "" {
action.Options.LLMService = string(config.LLMService)
log.Debug().Str("action", string(action.Method)).Str("llmService", action.Options.LLMService).Msg("Applied global LLM service config to action")
log.Debug().Str("action", string(action.Method)).
Str("llmService", action.Options.LLMService).
Msg("Applied global LLM service config to action")
}
if config.CVService != "" && action.Options.CVService == "" {
action.Options.CVService = string(config.CVService)
log.Debug().Str("action", string(action.Method)).Str("cvService", action.Options.CVService).Msg("Applied global CV service config to action")
log.Debug().Str("action", string(action.Method)).
Str("cvService", action.Options.CVService).
Msg("Applied global CV service config to action")
}
}
}

View File

@@ -11,6 +11,35 @@ import (
"github.com/stretchr/testify/require"
)
// GameInfo defines the output format for game interface analysis
type GameInfo struct {
Content string `json:"content"` // required: human-readable description
Thought string `json:"thought"` // required: AI reasoning process
GameType string `json:"game_type"` // game type
Rows int `json:"rows"` // number of rows
Cols int `json:"cols"` // number of columns
Icons []string `json:"icons"` // icon types
TotalIcons int `json:"total_icons"` // total icon count
}
// UIElementInfo defines the output format for UI element analysis
type UIElementInfo struct {
Content string `json:"content"` // required: human-readable description
Thought string `json:"thought"` // required: AI reasoning process
ScreenType string `json:"screen_type"` // screen type
Elements []UIElement `json:"elements"` // list of UI elements
ButtonCount int `json:"button_count"` // button count
TextCount int `json:"text_count"` // text count
}
// UIElement describes a single UI element
type UIElement struct {
Type string `json:"type"` // element type (button, text, input, etc.)
Text string `json:"text"` // element text
Clickable bool `json:"clickable"` // whether clickable
Description string `json:"description"` // element description
}
func TestIOSSettingsAction(t *testing.T) {
testCase := &hrp.TestCase{
Config: hrp.NewConfig("ios ui action on Settings").
@@ -173,3 +202,92 @@ func TestAIAction(t *testing.T) {
err := hrp.NewRunner(t).Run(testCase)
assert.Nil(t, err)
}
func TestAIQuery(t *testing.T) {
testCase := &hrp.TestCase{
Config: hrp.NewConfig("AIQuery Demo with OutputSchema").
SetLLMService(option.DOUBAO_SEED_1_6_250615), // Configure LLM service for AI operations
TestSteps: []hrp.IStep{
// Step 1: Take a screenshot for analysis
hrp.NewStep("Take Screenshot").
Android().
ScreenShot(),
// Step 2: Basic AIQuery without OutputSchema
hrp.NewStep("Basic Query").
Android().
AIQuery("Please describe what is displayed on the screen and identify any interactive elements"),
// Step 3: Use AIQuery to extract specific information
hrp.NewStep("Extract App Information").
Android().
AIQuery("What apps are visible on the screen? List them as a comma-separated string"),
// Step 4: Use AIQuery for UI element analysis
hrp.NewStep("Analyze UI Elements").
Android().
AIQuery("Are there any buttons or clickable elements visible? Describe their locations and purposes"),
// Step 5: Use AIQuery with validation
hrp.NewStep("Query and Validate").
Android().
AIQuery("Is the home screen currently displayed?").
Validate().
AssertAI("The query result should indicate whether home screen is visible"),
// Step 6: Use AIQuery with simple custom OutputSchema
hrp.NewStep("Query with Simple Custom Schema").
Android().
AIQuery("Analyze the screen and provide structured information about UI elements",
option.WithOutputSchema(struct {
Content string `json:"content"`
Thought string `json:"thought"`
ElementType string `json:"element_type"`
ElementText []string `json:"element_text"`
ButtonCount int `json:"button_count"`
}{})),
// Step 7: Use AIQuery with GameInfo OutputSchema
hrp.NewStep("Game Analysis with Custom Schema").
Android().
AIQuery("分析这个游戏界面,告诉我游戏类型、行列数和图标信息",
option.WithOutputSchema(GameInfo{})),
// Step 8: Use AIQuery with UIElementInfo OutputSchema
hrp.NewStep("UI Element Analysis with Custom Schema").
Android().
AIQuery("分析屏幕上的UI元素识别所有按钮、文本和可交互元素",
option.WithOutputSchema(UIElementInfo{})),
// Step 9: Complex analysis with nested structure
hrp.NewStep("Complex Analysis with Nested Schema").
Android().
AIQuery("Provide a comprehensive analysis of this interface including all interactive elements and their properties",
option.WithOutputSchema(struct {
Content string `json:"content"`
Thought string `json:"thought"`
AppName string `json:"app_name"`
ScreenTitle string `json:"screen_title"`
MainActions []struct {
Name string `json:"name"`
Description string `json:"description"`
Available bool `json:"available"`
} `json:"main_actions"`
NavigationElements []struct {
Type string `json:"type"`
Label string `json:"label"`
Position string `json:"position"`
} `json:"navigation_elements"`
ContentSummary struct {
HasImages bool `json:"has_images"`
HasText bool `json:"has_text"`
HasForms bool `json:"has_forms"`
Keywords []string `json:"keywords"`
} `json:"content_summary"`
}{})),
},
}
err := hrp.NewRunner(t).Run(testCase)
assert.Nil(t, err)
}

View File

@@ -322,7 +322,37 @@ type SessionData struct {
}
func (dExt *XTDriver) AIQuery(text string, opts ...option.ActionOption) (string, error) {
return "", nil
if dExt.LLMService == nil {
return "", errors.New("LLM service is not initialized")
}
screenShotBase64, err := GetScreenShotBufferBase64(dExt.IDriver)
if err != nil {
return "", err
}
// get window size
size, err := dExt.IDriver.WindowSize()
if err != nil {
return "", errors.Wrap(err, "get window size for AI query failed")
}
// parse action options to extract OutputSchema
actionOptions := option.NewActionOptions(opts...)
// execute query
queryOpts := &ai.QueryOptions{
Query: text,
Screenshot: screenShotBase64,
Size: size,
OutputSchema: actionOptions.OutputSchema,
}
result, err := dExt.LLMService.Query(context.Background(), queryOpts)
if err != nil {
return "", errors.Wrap(err, "AI query failed")
}
return result.Content, nil
}
func (dExt *XTDriver) AIAssert(assertion string, opts ...option.ActionOption) error {

View File

@@ -127,6 +127,7 @@ func (s *MCPServer4XTDriver) registerTools() {
// AI Tools
s.registerTool(&ToolStartToGoal{})
s.registerTool(&ToolAIAction{})
s.registerTool(&ToolAIQuery{})
s.registerTool(&ToolFinished{})
}

View File

@@ -115,6 +115,7 @@ func TestToolInterfaces(t *testing.T) {
&ToolSecondaryClickBySelector{},
&ToolWebCloseTab{},
&ToolAIAction{},
&ToolAIQuery{},
&ToolFinished{},
}
@@ -1308,6 +1309,39 @@ func TestToolAIAction(t *testing.T) {
assert.Error(t, err)
}
// TestToolAIQuery tests the ToolAIQuery implementation
func TestToolAIQuery(t *testing.T) {
tool := &ToolAIQuery{}
// Test Name
assert.Equal(t, option.ACTION_Query, tool.Name())
// Test Description
assert.NotEmpty(t, tool.Description())
// Test Options
options := tool.Options()
assert.NotNil(t, options)
// Test ConvertActionToCallToolRequest with valid params
action := option.MobileAction{
Method: option.ACTION_Query,
Params: "What is displayed on the screen?",
}
request, err := tool.ConvertActionToCallToolRequest(action)
assert.NoError(t, err)
assert.Equal(t, string(option.ACTION_Query), request.Params.Name)
assert.Equal(t, "What is displayed on the screen?", request.Params.Arguments["prompt"])
// Test ConvertActionToCallToolRequest with invalid params
invalidAction := option.MobileAction{
Method: option.ACTION_Query,
Params: 123, // should be string
}
_, err = tool.ConvertActionToCallToolRequest(invalidAction)
assert.Error(t, err)
}
// TestToolFinished tests the ToolFinished implementation
func TestToolFinished(t *testing.T) {
tool := &ToolFinished{}

View File

@@ -130,6 +130,71 @@ func (t *ToolAIAction) ConvertActionToCallToolRequest(action option.MobileAction
return mcp.CallToolRequest{}, fmt.Errorf("invalid AI action params: %v", action.Params)
}
// ToolAIQuery implements the ai_query tool call.
type ToolAIQuery struct {
// Return data fields - these define the structure of data returned by this tool
Prompt string `json:"prompt" desc:"AI query prompt that was executed"`
Result string `json:"result" desc:"Query result content"`
}
func (t *ToolAIQuery) Name() option.ActionName {
return option.ACTION_Query
}
func (t *ToolAIQuery) Description() string {
return "Query information from screen using AI vision model with natural language prompts"
}
func (t *ToolAIQuery) Options() []mcp.ToolOption {
unifiedReq := &option.ActionOptions{}
return unifiedReq.GetMCPOptions(option.ACTION_Query)
}
func (t *ToolAIQuery) Implement() server.ToolHandlerFunc {
return func(ctx context.Context, request mcp.CallToolRequest) (*mcp.CallToolResult, error) {
driverExt, err := setupXTDriver(ctx, request.Params.Arguments)
if err != nil {
return nil, fmt.Errorf("setup driver failed: %w", err)
}
unifiedReq, err := parseActionOptions(request.Params.Arguments)
if err != nil {
return nil, err
}
// Build action options from unified request
opts := unifiedReq.Options()
// AI query logic with options
result, err := driverExt.AIQuery(unifiedReq.Prompt, opts...)
if err != nil {
return NewMCPErrorResponse(fmt.Sprintf("AI query failed: %s", err.Error())), nil
}
message := fmt.Sprintf("Successfully queried information with prompt: %s", unifiedReq.Prompt)
returnData := ToolAIQuery{
Prompt: unifiedReq.Prompt,
Result: result,
}
return NewMCPSuccessResponse(message, &returnData), nil
}
}
func (t *ToolAIQuery) ConvertActionToCallToolRequest(action option.MobileAction) (mcp.CallToolRequest, error) {
if prompt, ok := action.Params.(string); ok {
arguments := map[string]any{
"prompt": prompt,
}
// Extract options to arguments
extractActionOptionsToArguments(action.GetOptions(), arguments)
return buildMCPCallToolRequest(t.Name(), arguments), nil
}
return mcp.CallToolRequest{}, fmt.Errorf("invalid AI query params: %v", action.Params)
}
// ToolFinished implements the finished tool call.
type ToolFinished struct {
// Return data fields - these define the structure of data returned by this tool

View File

@@ -184,11 +184,12 @@ type ActionOptions struct {
Params []float64 `json:"params,omitempty" yaml:"params,omitempty" desc:"Generic parameter array"`
// AI related
Prompt string `json:"prompt,omitempty" yaml:"prompt,omitempty" desc:"AI action prompt"`
Content string `json:"content,omitempty" yaml:"content,omitempty" desc:"Content for finished action"`
LLMService string `json:"llm_service,omitempty" yaml:"llm_service,omitempty" desc:"LLM service type for AI actions"`
CVService string `json:"cv_service,omitempty" yaml:"cv_service,omitempty" desc:"Computer vision service type for AI actions"`
ResetHistory bool `json:"reset_history,omitempty" yaml:"reset_history,omitempty" desc:"Whether to reset conversation history before AI planning"`
Prompt string `json:"prompt,omitempty" yaml:"prompt,omitempty" desc:"AI action prompt"`
Content string `json:"content,omitempty" yaml:"content,omitempty" desc:"Content for finished action"`
LLMService string `json:"llm_service,omitempty" yaml:"llm_service,omitempty" desc:"LLM service type for AI actions"`
CVService string `json:"cv_service,omitempty" yaml:"cv_service,omitempty" desc:"Computer vision service type for AI actions"`
ResetHistory bool `json:"reset_history,omitempty" yaml:"reset_history,omitempty" desc:"Whether to reset conversation history before AI planning"`
OutputSchema interface{} `json:"output_schema,omitempty" yaml:"output_schema,omitempty" desc:"Custom output schema for structured AI query response"`
// Time related
Seconds float64 `json:"seconds,omitempty" yaml:"seconds,omitempty" desc:"Sleep duration in seconds"`
@@ -558,6 +559,13 @@ func WithResetHistory(resetHistory bool) ActionOption {
}
}
// WithOutputSchema sets the custom output schema for structured AI query response
func WithOutputSchema(schema interface{}) ActionOption {
return func(o *ActionOptions) {
o.OutputSchema = schema
}
}
// HTTP API direct usage methods
// ValidateForHTTPAPI validates the request for HTTP API usage
@@ -700,6 +708,9 @@ func (o *ActionOptions) validateActionSpecificFields(actionType ActionName) erro
ACTION_StartToGoal: func() error {
return o.requireFields("prompt", o.Prompt != "")
},
ACTION_Query: func() error {
return o.requireFields("prompt", o.Prompt != "")
},
ACTION_Finished: func() error {
return o.requireFields("content", o.Content != "")
},
@@ -774,6 +785,8 @@ func (o *ActionOptions) GetMCPOptions(actionType ActionName) []mcp.ToolOption {
ACTION_SleepRandom: {"platform", "serial", "params"},
ACTION_AIAction: {"platform", "serial", "prompt", "llm_service", "cv_service"},
ACTION_StartToGoal: {"platform", "serial", "prompt", "llm_service", "cv_service"},
ACTION_Query: {"platform", "serial", "prompt", "llm_service", "cv_service", "output_schema"},
ACTION_AIAssert: {"platform", "serial", "prompt", "llm_service", "cv_service"},
ACTION_Finished: {"content"},
ACTION_ListAvailableDevices: {},
ACTION_SelectDevice: {"platform", "serial"},
@@ -862,7 +875,15 @@ func (o *ActionOptions) generateMCPOptionsForFields(fields []string) []mcp.ToolO
}
}
case reflect.Map, reflect.Interface:
// Skip map and interface types for now
// Handle OutputSchema as object type
if name == "output_schema" {
if required {
options = append(options, mcp.WithObject(name, mcp.Required(), mcp.Description(desc)))
} else {
options = append(options, mcp.WithObject(name, mcp.Description(desc)))
}
}
// Skip other map and interface types for now
continue
default:
log.Warn().Str("field_type", fieldType.String()).Msg("Unsupported field type")