mirror of https://github.com/httprunner/httprunner.git, synced 2026-05-07 05:42:46 +08:00
feat: implement AIQuery with OutputSchema support

- Add AIQuery method to StepMobile for extracting information from the screen with natural language
- Implement the full AIQuery flow in driver_ext_ai.go, including screenshot capture and the LLM query
- Add OutputSchema support so users can define custom output formats for structured queries
- Add the ToolAIQuery MCP tool, fully integrated into the MCP server
- Add the OutputSchema field and the WithOutputSchema option function to ActionOptions
- Add configuration support and field mapping for ACTION_Query
- Improve test coverage:
  * Add the TestAIQuery unit test covering multiple OutputSchema scenarios
  * Add the TestToolAIQuery MCP tool test
  * Define GameInfo, UIElementInfo, and other structs for testing
- Update documentation:
  * Add a complete AIQuery usage guide to docs/uixt/ai.md
  * Cover basic usage, OutputSchema examples, and best practices
- Support complex nested structs and array types in OutputSchema
- Keep the API design consistent with the existing AIAction and AIAssert features
@@ -508,4 +508,210 @@ type Element struct {
queryResult, err := driver.LLMService.Query(ctx, queryOpts)
```

With the HttpRunner UIXT AI module, you can easily build intelligent UI automation tests and significantly improve test efficiency and accuracy.
# AI Feature Guide

HttpRunner v5 provides powerful AI features that support intelligent test operations based on vision language models (VLM).

## Feature Overview

HttpRunner v5 integrates several AI features:

- **AIAction**: perform UI operations with natural language
- **AIAssert**: run assertions with natural language
- **AIQuery**: extract information from the screen with natural language
- **StartToGoal**: goal-driven intelligent operation sequences

## AIQuery in Detail

### Overview

AIQuery is a new AI query feature in HttpRunner v5 that lets users extract information from screenshots with natural language. Built on vision language models (VLM), it understands screen content and returns structured query results.

### Features

- **Natural language queries**: describe the information you want in plain language
- **Intelligent screen analysis**: analyze screen content with an AI vision model
- **Structured output**: return formatted query results
- **Multi-platform support**: Android, iOS, Browser, and more

### Basic Usage

#### 1. Use AIQuery in a test step
```go
// Basic query example
hrp.NewStep("Query Screen Content").
	Android().
	AIQuery("Please describe what is displayed on the screen")

// Extract specific information
hrp.NewStep("Extract App List").
	Android().
	AIQuery("What apps are visible on the home screen? List them as a comma-separated string")

// UI element analysis
hrp.NewStep("Analyze Buttons").
	Android().
	AIQuery("Are there any buttons visible? Describe their text and positions")
```
#### 2. Configure the LLM service

Before using AIQuery, configure the LLM service:

```go
testcase := &hrp.TestCase{
	Config: hrp.NewConfig("AIQuery Test").
		SetLLMService(option.OPENAI_GPT_4O), // configure the LLM service
	TestSteps: []hrp.IStep{
		// steps that use AIQuery
	},
}
```
#### 3. Supported options

AIQuery supports the following options:

```go
hrp.NewStep("Query with Options").
	Android().
	AIQuery("Describe the screen content",
		option.WithLLMService("openai_gpt_4o"),  // specify the LLM service
		option.WithCVService("openai_gpt_4o"),   // specify the CV service
		option.WithOutputSchema(CustomSchema{}), // custom output format
	)
```
#### 4. Custom output format (OutputSchema)

AIQuery supports custom output formats so queries can return structured data:

```go
// Define a custom output format
type GameAnalysis struct {
	Content    string   `json:"content"`     // required: human-readable description
	Thought    string   `json:"thought"`     // required: AI reasoning process
	GameType   string   `json:"game_type"`   // game type
	Rows       int      `json:"rows"`        // number of rows
	Cols       int      `json:"cols"`        // number of columns
	Icons      []string `json:"icons"`       // icon types
	TotalIcons int      `json:"total_icons"` // total icon count
}

// Query with the custom format
hrp.NewStep("Analyze Game Interface").
	Android().
	AIQuery("Analyze this LianLianKan game board: how many rows and columns are there, and which icon types appear?",
		option.WithOutputSchema(GameAnalysis{}))
```
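The OutputSchema mechanics can be pictured with a stdlib-only sketch. This is not HttpRunner's actual schema generator; it only illustrates, via reflection, how the `json` tags on a struct like `GameAnalysis` expose the field names that a schema description for the LLM could be built from:

```go
package main

import (
	"fmt"
	"reflect"
)

// GameAnalysis mirrors the schema struct from the documentation above.
type GameAnalysis struct {
	Content    string   `json:"content"`
	Thought    string   `json:"thought"`
	GameType   string   `json:"game_type"`
	Rows       int      `json:"rows"`
	Cols       int      `json:"cols"`
	Icons      []string `json:"icons"`
	TotalIcons int      `json:"total_icons"`
}

// fieldNames collects the json tag of every field, roughly the information a
// schema generator needs before describing the expected output to the LLM.
func fieldNames(v interface{}) []string {
	t := reflect.TypeOf(v)
	names := make([]string, 0, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		names = append(names, t.Field(i).Tag.Get("json"))
	}
	return names
}

func main() {
	fmt.Println(fieldNames(GameAnalysis{}))
}
```

Any struct passed to `WithOutputSchema` can be inspected this way, which is why nested structs and slices also work as schemas.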
### Practical Scenarios

#### 1. Game interface analysis

```go
// Analyze a LianLianKan game board
hrp.NewStep("Analyze Game Board").
	Android().
	AIQuery("This is a LianLianKan (连连看) game interface. Please analyze: 1) How many rows and columns are there? 2) What types of icons are present?")
```
#### 2. App state checks

```go
// Check the app state
hrp.NewStep("Check App State").
	Android().
	AIQuery("Is the login screen displayed? Are there any error messages visible?")
```
#### 3. Content extraction

```go
// Extract list contents
hrp.NewStep("Extract List Items").
	Android().
	AIQuery("Extract all items from the list displayed on screen as a JSON array")
```
### Comparison with Other AI Features

| Feature | Purpose | Return Value | Typical Use |
|---------|---------|--------------|-------------|
| AIAction | Perform operations | none | tap, input, swipe, and other interactions |
| AIAssert | Assert and verify | boolean | verify screen state and element presence |
| AIQuery | Query information | string | extract screen information and analyze content |
### Best Practices

#### 1. Write specific query descriptions

```go
// Good: specific and unambiguous
AIQuery("How many unread messages are shown in the notification badge?")

// Avoid: too vague
AIQuery("Tell me about the screen")
```
#### 2. Ask for structured output

```go
// Request structured output
AIQuery("List all visible buttons with their text and approximate positions in JSON format")
```
#### 3. Provide application context

```go
// Use the app's context in the query
AIQuery("In this shopping app, what products are displayed in the current category? Include product names and prices")
```
### Error Handling

Common errors AIQuery may encounter:

1. **LLM service not configured**: make sure the test config sets an LLM service
2. **Network problems**: check network connectivity and the API key configuration
3. **Screenshot failure**: make sure the device connection is healthy
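The first error above corresponds to a fail-fast guard before any LLM call is attempted; the pattern can be sketched in isolation (the `queryGuard` helper is hypothetical, not part of HttpRunner):

```go
package main

import (
	"errors"
	"fmt"
)

// queryGuard mimics the precondition check a query performs before
// contacting the LLM service: fail fast when no service is configured.
func queryGuard(llmConfigured bool) error {
	if !llmConfigured {
		return errors.New("LLM service is not initialized")
	}
	return nil
}

func main() {
	if err := queryGuard(false); err != nil {
		fmt.Println("query aborted:", err)
	}
}
```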
### Notes

1. AIQuery needs a network connection to reach the LLM service
2. Result accuracy depends on the LLM model in use
3. Use specific, unambiguous query descriptions for better results
4. For complex information extraction, ask for structured JSON output
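Note 4 suggests requesting JSON for complex extractions. Once AIQuery hands back the raw string, decoding it is ordinary `encoding/json` work; the result literal below is made up for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AppList is a hypothetical shape for a structured query result.
type AppList struct {
	Content string   `json:"content"`
	Apps    []string `json:"apps"`
}

// decodeQueryResult parses the JSON string an AIQuery step might return
// when the prompt asked for structured output.
func decodeQueryResult(raw string) (AppList, error) {
	var out AppList
	err := json.Unmarshal([]byte(raw), &out)
	return out, err
}

func main() {
	// Hypothetical result string; a real one comes from the LLM service.
	raw := `{"content": "home screen", "apps": ["Settings", "Chrome"]}`
	res, err := decodeQueryResult(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(res.Apps)
}
```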
## Complete Example

Here is a complete AIQuery example:

```go
func TestAIQuery(t *testing.T) {
	testCase := &hrp.TestCase{
		Config: hrp.NewConfig("AIQuery Demo").
			SetLLMService(option.OPENAI_GPT_4O),
		TestSteps: []hrp.IStep{
			hrp.NewStep("Take Screenshot").
				Android().
				ScreenShot(),
			hrp.NewStep("Query Screen Content").
				Android().
				AIQuery("Please describe what is displayed on the screen and identify any interactive elements"),
			hrp.NewStep("Extract App Information").
				Android().
				AIQuery("What apps are visible on the screen? List them as a comma-separated string"),
			hrp.NewStep("Analyze UI Elements").
				Android().
				AIQuery("Are there any buttons or clickable elements visible? Describe their locations and purposes"),
		},
	}

	err := hrp.NewRunner(t).Run(testCase)
	assert.Nil(t, err)
}
```
@@ -1 +1 @@
v5.0.0-beta-2506121751
v5.0.0-beta-2506131027

step_ui.go
@@ -201,6 +201,18 @@ func (s *StepMobile) AIAction(prompt string, opts ...option.ActionOption) *StepM
	return s
}

// AIQuery queries information from the screen using a VLM
func (s *StepMobile) AIQuery(prompt string, opts ...option.ActionOption) *StepMobile {
	action := option.MobileAction{
		Method:  option.ACTION_Query,
		Params:  prompt,
		Options: option.NewActionOptions(opts...),
	}

	s.obj().Actions = append(s.obj().Actions, action)
	return s
}

// DoubleTapXY double taps the point {X,Y}, X & Y is percentage of coordinates
func (s *StepMobile) DoubleTapXY(x, y float64, opts ...option.ActionOption) *StepMobile {
	s.obj().Actions = append(s.obj().Actions, option.MobileAction{

@@ -863,11 +875,15 @@ func runStepMobileUI(s *SessionRunner, step IStep) (stepResult *StepResult, err
	action.Method == option.ACTION_AIAssert || action.Method == option.ACTION_Query {
	if config.LLMService != "" && action.Options.LLMService == "" {
		action.Options.LLMService = string(config.LLMService)
		log.Debug().Str("action", string(action.Method)).
			Str("llmService", action.Options.LLMService).
			Msg("Applied global LLM service config to action")
	}
	if config.CVService != "" && action.Options.CVService == "" {
		action.Options.CVService = string(config.CVService)
		log.Debug().Str("action", string(action.Method)).
			Str("cvService", action.Options.CVService).
			Msg("Applied global CV service config to action")
	}
}
}
@@ -11,6 +11,35 @@ import (
	"github.com/stretchr/testify/require"
)

// GameInfo defines the output format for game interface analysis
type GameInfo struct {
	Content    string   `json:"content"`     // required: human-readable description
	Thought    string   `json:"thought"`     // required: AI reasoning process
	GameType   string   `json:"game_type"`   // game type
	Rows       int      `json:"rows"`        // number of rows
	Cols       int      `json:"cols"`        // number of columns
	Icons      []string `json:"icons"`       // icon types
	TotalIcons int      `json:"total_icons"` // total icon count
}

// UIElementInfo defines the output format for UI element analysis
type UIElementInfo struct {
	Content     string      `json:"content"`      // required: human-readable description
	Thought     string      `json:"thought"`      // required: AI reasoning process
	ScreenType  string      `json:"screen_type"`  // screen type
	Elements    []UIElement `json:"elements"`     // list of UI elements
	ButtonCount int         `json:"button_count"` // number of buttons
	TextCount   int         `json:"text_count"`   // number of text elements
}

// UIElement describes a single UI element
type UIElement struct {
	Type        string `json:"type"`        // element type (button, text, input, etc.)
	Text        string `json:"text"`        // element text
	Clickable   bool   `json:"clickable"`   // whether the element is clickable
	Description string `json:"description"` // element description
}

func TestIOSSettingsAction(t *testing.T) {
	testCase := &hrp.TestCase{
		Config: hrp.NewConfig("ios ui action on Settings").
@@ -173,3 +202,92 @@ func TestAIAction(t *testing.T) {
	err := hrp.NewRunner(t).Run(testCase)
	assert.Nil(t, err)
}

func TestAIQuery(t *testing.T) {
	testCase := &hrp.TestCase{
		Config: hrp.NewConfig("AIQuery Demo with OutputSchema").
			SetLLMService(option.DOUBAO_SEED_1_6_250615), // configure LLM service for AI operations
		TestSteps: []hrp.IStep{
			// Step 1: take a screenshot for analysis
			hrp.NewStep("Take Screenshot").
				Android().
				ScreenShot(),

			// Step 2: basic AIQuery without OutputSchema
			hrp.NewStep("Basic Query").
				Android().
				AIQuery("Please describe what is displayed on the screen and identify any interactive elements"),

			// Step 3: use AIQuery to extract specific information
			hrp.NewStep("Extract App Information").
				Android().
				AIQuery("What apps are visible on the screen? List them as a comma-separated string"),

			// Step 4: use AIQuery for UI element analysis
			hrp.NewStep("Analyze UI Elements").
				Android().
				AIQuery("Are there any buttons or clickable elements visible? Describe their locations and purposes"),

			// Step 5: use AIQuery with validation
			hrp.NewStep("Query and Validate").
				Android().
				AIQuery("Is the home screen currently displayed?").
				Validate().
				AssertAI("The query result should indicate whether home screen is visible"),

			// Step 6: use AIQuery with a simple custom OutputSchema
			hrp.NewStep("Query with Simple Custom Schema").
				Android().
				AIQuery("Analyze the screen and provide structured information about UI elements",
					option.WithOutputSchema(struct {
						Content     string   `json:"content"`
						Thought     string   `json:"thought"`
						ElementType string   `json:"element_type"`
						ElementText []string `json:"element_text"`
						ButtonCount int      `json:"button_count"`
					}{})),

			// Step 7: use AIQuery with the GameInfo OutputSchema
			hrp.NewStep("Game Analysis with Custom Schema").
				Android().
				AIQuery("Analyze this game interface and report the game type, row and column counts, and icon information",
					option.WithOutputSchema(GameInfo{})),

			// Step 8: use AIQuery with the UIElementInfo OutputSchema
			hrp.NewStep("UI Element Analysis with Custom Schema").
				Android().
				AIQuery("Analyze the UI elements on screen and identify all buttons, text, and interactive elements",
					option.WithOutputSchema(UIElementInfo{})),

			// Step 9: complex analysis with a nested structure
			hrp.NewStep("Complex Analysis with Nested Schema").
				Android().
				AIQuery("Provide a comprehensive analysis of this interface including all interactive elements and their properties",
					option.WithOutputSchema(struct {
						Content     string `json:"content"`
						Thought     string `json:"thought"`
						AppName     string `json:"app_name"`
						ScreenTitle string `json:"screen_title"`
						MainActions []struct {
							Name        string `json:"name"`
							Description string `json:"description"`
							Available   bool   `json:"available"`
						} `json:"main_actions"`
						NavigationElements []struct {
							Type     string `json:"type"`
							Label    string `json:"label"`
							Position string `json:"position"`
						} `json:"navigation_elements"`
						ContentSummary struct {
							HasImages bool     `json:"has_images"`
							HasText   bool     `json:"has_text"`
							HasForms  bool     `json:"has_forms"`
							Keywords  []string `json:"keywords"`
						} `json:"content_summary"`
					}{})),
		},
	}

	err := hrp.NewRunner(t).Run(testCase)
	assert.Nil(t, err)
}
@@ -322,7 +322,37 @@ type SessionData struct {
}

func (dExt *XTDriver) AIQuery(text string, opts ...option.ActionOption) (string, error) {
	if dExt.LLMService == nil {
		return "", errors.New("LLM service is not initialized")
	}

	screenShotBase64, err := GetScreenShotBufferBase64(dExt.IDriver)
	if err != nil {
		return "", err
	}

	// get window size
	size, err := dExt.IDriver.WindowSize()
	if err != nil {
		return "", errors.Wrap(err, "get window size for AI query failed")
	}

	// parse action options to extract OutputSchema
	actionOptions := option.NewActionOptions(opts...)

	// execute query
	queryOpts := &ai.QueryOptions{
		Query:        text,
		Screenshot:   screenShotBase64,
		Size:         size,
		OutputSchema: actionOptions.OutputSchema,
	}
	result, err := dExt.LLMService.Query(context.Background(), queryOpts)
	if err != nil {
		return "", errors.Wrap(err, "AI query failed")
	}

	return result.Content, nil
}

func (dExt *XTDriver) AIAssert(assertion string, opts ...option.ActionOption) error {
@@ -127,6 +127,7 @@ func (s *MCPServer4XTDriver) registerTools() {
	// AI Tools
	s.registerTool(&ToolStartToGoal{})
	s.registerTool(&ToolAIAction{})
	s.registerTool(&ToolAIQuery{})
	s.registerTool(&ToolFinished{})
}
@@ -115,6 +115,7 @@ func TestToolInterfaces(t *testing.T) {
	&ToolSecondaryClickBySelector{},
	&ToolWebCloseTab{},
	&ToolAIAction{},
	&ToolAIQuery{},
	&ToolFinished{},
}
@@ -1308,6 +1309,39 @@ func TestToolAIAction(t *testing.T) {
	assert.Error(t, err)
}

// TestToolAIQuery tests the ToolAIQuery implementation
func TestToolAIQuery(t *testing.T) {
	tool := &ToolAIQuery{}

	// Test Name
	assert.Equal(t, option.ACTION_Query, tool.Name())

	// Test Description
	assert.NotEmpty(t, tool.Description())

	// Test Options
	options := tool.Options()
	assert.NotNil(t, options)

	// Test ConvertActionToCallToolRequest with valid params
	action := option.MobileAction{
		Method: option.ACTION_Query,
		Params: "What is displayed on the screen?",
	}
	request, err := tool.ConvertActionToCallToolRequest(action)
	assert.NoError(t, err)
	assert.Equal(t, string(option.ACTION_Query), request.Params.Name)
	assert.Equal(t, "What is displayed on the screen?", request.Params.Arguments["prompt"])

	// Test ConvertActionToCallToolRequest with invalid params
	invalidAction := option.MobileAction{
		Method: option.ACTION_Query,
		Params: 123, // should be string
	}
	_, err = tool.ConvertActionToCallToolRequest(invalidAction)
	assert.Error(t, err)
}

// TestToolFinished tests the ToolFinished implementation
func TestToolFinished(t *testing.T) {
	tool := &ToolFinished{}
@@ -130,6 +130,71 @@ func (t *ToolAIAction) ConvertActionToCallToolRequest(action option.MobileAction
	return mcp.CallToolRequest{}, fmt.Errorf("invalid AI action params: %v", action.Params)
}

// ToolAIQuery implements the ai_query tool call.
type ToolAIQuery struct {
	// Return data fields - these define the structure of data returned by this tool
	Prompt string `json:"prompt" desc:"AI query prompt that was executed"`
	Result string `json:"result" desc:"Query result content"`
}

func (t *ToolAIQuery) Name() option.ActionName {
	return option.ACTION_Query
}

func (t *ToolAIQuery) Description() string {
	return "Query information from screen using AI vision model with natural language prompts"
}

func (t *ToolAIQuery) Options() []mcp.ToolOption {
	unifiedReq := &option.ActionOptions{}
	return unifiedReq.GetMCPOptions(option.ACTION_Query)
}

func (t *ToolAIQuery) Implement() server.ToolHandlerFunc {
	return func(ctx context.Context, request mcp.CallToolRequest) (*mcp.CallToolResult, error) {
		driverExt, err := setupXTDriver(ctx, request.Params.Arguments)
		if err != nil {
			return nil, fmt.Errorf("setup driver failed: %w", err)
		}

		unifiedReq, err := parseActionOptions(request.Params.Arguments)
		if err != nil {
			return nil, err
		}

		// Build action options from unified request
		opts := unifiedReq.Options()

		// AI query logic with options
		result, err := driverExt.AIQuery(unifiedReq.Prompt, opts...)
		if err != nil {
			return NewMCPErrorResponse(fmt.Sprintf("AI query failed: %s", err.Error())), nil
		}

		message := fmt.Sprintf("Successfully queried information with prompt: %s", unifiedReq.Prompt)
		returnData := ToolAIQuery{
			Prompt: unifiedReq.Prompt,
			Result: result,
		}

		return NewMCPSuccessResponse(message, &returnData), nil
	}
}

func (t *ToolAIQuery) ConvertActionToCallToolRequest(action option.MobileAction) (mcp.CallToolRequest, error) {
	if prompt, ok := action.Params.(string); ok {
		arguments := map[string]any{
			"prompt": prompt,
		}

		// Extract options to arguments
		extractActionOptionsToArguments(action.GetOptions(), arguments)

		return buildMCPCallToolRequest(t.Name(), arguments), nil
	}
	return mcp.CallToolRequest{}, fmt.Errorf("invalid AI query params: %v", action.Params)
}

// ToolFinished implements the finished tool call.
type ToolFinished struct {
	// Return data fields - these define the structure of data returned by this tool
@@ -184,11 +184,12 @@ type ActionOptions struct {
	Params []float64 `json:"params,omitempty" yaml:"params,omitempty" desc:"Generic parameter array"`

	// AI related
	Prompt       string      `json:"prompt,omitempty" yaml:"prompt,omitempty" desc:"AI action prompt"`
	Content      string      `json:"content,omitempty" yaml:"content,omitempty" desc:"Content for finished action"`
	LLMService   string      `json:"llm_service,omitempty" yaml:"llm_service,omitempty" desc:"LLM service type for AI actions"`
	CVService    string      `json:"cv_service,omitempty" yaml:"cv_service,omitempty" desc:"Computer vision service type for AI actions"`
	ResetHistory bool        `json:"reset_history,omitempty" yaml:"reset_history,omitempty" desc:"Whether to reset conversation history before AI planning"`
	OutputSchema interface{} `json:"output_schema,omitempty" yaml:"output_schema,omitempty" desc:"Custom output schema for structured AI query response"`

	// Time related
	Seconds float64 `json:"seconds,omitempty" yaml:"seconds,omitempty" desc:"Sleep duration in seconds"`
@@ -558,6 +559,13 @@ func WithResetHistory(resetHistory bool) ActionOption {
	}
}

// WithOutputSchema sets the custom output schema for structured AI query response
func WithOutputSchema(schema interface{}) ActionOption {
	return func(o *ActionOptions) {
		o.OutputSchema = schema
	}
}

// HTTP API direct usage methods

// ValidateForHTTPAPI validates the request for HTTP API usage
@@ -700,6 +708,9 @@ func (o *ActionOptions) validateActionSpecificFields(actionType ActionName) erro
	ACTION_StartToGoal: func() error {
		return o.requireFields("prompt", o.Prompt != "")
	},
	ACTION_Query: func() error {
		return o.requireFields("prompt", o.Prompt != "")
	},
	ACTION_Finished: func() error {
		return o.requireFields("content", o.Content != "")
	},
@@ -774,6 +785,8 @@ func (o *ActionOptions) GetMCPOptions(actionType ActionName) []mcp.ToolOption {
	ACTION_SleepRandom:          {"platform", "serial", "params"},
	ACTION_AIAction:             {"platform", "serial", "prompt", "llm_service", "cv_service"},
	ACTION_StartToGoal:          {"platform", "serial", "prompt", "llm_service", "cv_service"},
	ACTION_Query:                {"platform", "serial", "prompt", "llm_service", "cv_service", "output_schema"},
	ACTION_AIAssert:             {"platform", "serial", "prompt", "llm_service", "cv_service"},
	ACTION_Finished:             {"content"},
	ACTION_ListAvailableDevices: {},
	ACTION_SelectDevice:         {"platform", "serial"},
@@ -862,7 +875,15 @@ func (o *ActionOptions) generateMCPOptionsForFields(fields []string) []mcp.ToolO
		}
	}
case reflect.Map, reflect.Interface:
	// Handle OutputSchema as object type
	if name == "output_schema" {
		if required {
			options = append(options, mcp.WithObject(name, mcp.Required(), mcp.Description(desc)))
		} else {
			options = append(options, mcp.WithObject(name, mcp.Description(desc)))
		}
	}
	// Skip other map and interface types for now
	continue
default:
	log.Warn().Str("field_type", fieldType.String()).Msg("Unsupported field type")