feat: implement AIQuery with OutputSchema support

- Add the AIQuery method to StepMobile for extracting information from the screen using natural language
- Implement the full AIQuery flow in driver_ext_ai.go, including screenshot capture and the LLM query
- Add OutputSchema support, allowing users to define custom output formats for structured queries
- Add the ToolAIQuery MCP tool, fully integrated into the MCP server
- Add the OutputSchema field and the WithOutputSchema option function to ActionOptions
- Add configuration support and field mappings for ACTION_Query
- Improve test coverage:
  * Add the TestAIQuery unit test covering multiple OutputSchema scenarios
  * Add the TestToolAIQuery MCP tool test
  * Define GameInfo, UIElementInfo, and other structs for testing
- Update documentation:
  * Add a complete AIQuery usage guide to docs/uixt/ai.md
  * Cover basic usage, OutputSchema examples, and best practices
- Support complex nested structs and array types in OutputSchema
- Keep the API design consistent with the existing AIAction and AIAssert features
lilong.129
2025-06-12 23:12:25 +08:00
parent fb0418fa95
commit f6e7e970f8
9 changed files with 502 additions and 11 deletions

View File

@@ -508,4 +508,210 @@ type Element struct {
queryResult, err := driver.LLMService.Query(ctx, queryOpts)
```
With the HttpRunner UIXT AI module, you can easily implement intelligent UI automation testing and significantly improve test efficiency and accuracy.
# AI Features Guide
HttpRunner v5 provides powerful AI capabilities for intelligent test operations based on Vision-Language Models (VLMs).
## Feature Overview
HttpRunner v5 integrates several AI features:
- **AIAction**: perform UI operations using natural language
- **AIAssert**: run assertions and verifications using natural language
- **AIQuery**: extract information from the screen using natural language
- **StartToGoal**: goal-oriented intelligent operation sequences
## AIQuery in Detail
### Overview
AIQuery is a new AI query feature in HttpRunner v5 that lets you extract information from screenshots using natural language. Built on Vision-Language Models (VLMs), it understands screen content and returns structured query results.
### Features
- **Natural language queries**: describe the information you want in plain language
- **Intelligent screen analysis**: screen content is analyzed by an AI vision model
- **Structured output**: query results are returned in a formatted structure
- **Multi-platform support**: works on Android, iOS, Browser, and other platforms
### Basic Usage
#### 1. Using AIQuery in a Test Step
```go
// basic query example
hrp.NewStep("Query Screen Content").
Android().
AIQuery("Please describe what is displayed on the screen")
// extract specific information
hrp.NewStep("Extract App List").
Android().
AIQuery("What apps are visible on the home screen? List them as a comma-separated string")
// UI element analysis
hrp.NewStep("Analyze Buttons").
Android().
AIQuery("Are there any buttons visible? Describe their text and positions")
```
#### 2. Configuring the LLM Service
Before using AIQuery, configure the LLM service:
```go
testcase := &hrp.TestCase{
Config: hrp.NewConfig("AIQuery Test").
SetLLMService(option.OPENAI_GPT_4O), // configure the LLM service
TestSteps: []hrp.IStep{
// steps that use AIQuery
},
}
```
#### 3. Supported Options
AIQuery supports the following options:
```go
hrp.NewStep("Query with Options").
Android().
AIQuery("Describe the screen content",
option.WithLLMService("openai_gpt_4o"), // specify the LLM service
option.WithCVService("openai_gpt_4o"), // specify the CV service
option.WithOutputSchema(CustomSchema{}), // custom output format
)
```
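Options such as `WithLLMService` and `WithOutputSchema` follow Go's functional-options pattern. The following self-contained sketch uses simplified stand-in types (not the actual `option` package) purely to illustrate how such options compose:

```go
package main

import "fmt"

// ActionOptions and ActionOption are simplified stand-ins for the real
// option package types; field names mirror the originals for illustration.
type ActionOptions struct {
	LLMService   string
	CVService    string
	OutputSchema interface{}
}

type ActionOption func(*ActionOptions)

// WithLLMService sets the LLM service name.
func WithLLMService(name string) ActionOption {
	return func(o *ActionOptions) { o.LLMService = name }
}

// WithOutputSchema sets the custom output schema.
func WithOutputSchema(schema interface{}) ActionOption {
	return func(o *ActionOptions) { o.OutputSchema = schema }
}

// NewActionOptions applies each option to a fresh struct.
func NewActionOptions(opts ...ActionOption) *ActionOptions {
	o := &ActionOptions{}
	for _, opt := range opts {
		opt(o)
	}
	return o
}

func main() {
	opts := NewActionOptions(
		WithLLMService("openai_gpt_4o"),
		WithOutputSchema(struct{ Rows int }{}),
	)
	fmt.Println(opts.LLMService) // openai_gpt_4o
}
```

Because each option is just a function mutating the options struct, new options (like `WithOutputSchema` in this commit) can be added without changing existing call sites.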
#### 4. Custom Output Format (OutputSchema)
AIQuery supports custom output formats and can return structured data:
```go
// define a custom output format
type GameAnalysis struct {
Content string `json:"content"` // required: human-readable description
Thought string `json:"thought"` // required: AI reasoning process
GameType string `json:"game_type"` // game type
Rows int `json:"rows"` // number of rows
Cols int `json:"cols"` // number of columns
Icons []string `json:"icons"` // icon types
TotalIcons int `json:"total_icons"` // total icon count
}
// query with the custom format
hrp.NewStep("Analyze Game Interface").
Android().
AIQuery("分析这个连连看游戏界面,告诉我有多少行多少列,有哪些不同类型的图案",
option.WithOutputSchema(GameAnalysis{}))
```
### Practical Use Cases
#### 1. Game Interface Analysis
```go
// analyze a LianLianKan (连连看) game board
hrp.NewStep("Analyze Game Board").
Android().
AIQuery("This is a LianLianKan (连连看) game interface. Please analyze: 1) How many rows and columns are there? 2) What types of icons are present?")
```
#### 2. Checking Application State
```go
// check the application state
hrp.NewStep("Check App State").
Android().
AIQuery("Is the login screen displayed? Are there any error messages visible?")
```
#### 3. Content Extraction
```go
// extract list content
hrp.NewStep("Extract List Items").
Android().
AIQuery("Extract all items from the list displayed on screen as a JSON array")
```
### Comparison with Other AI Features
| Feature | Purpose | Return Value | Typical Use |
|------|------|--------|----------|
| AIAction | perform operations | none | tap, input, swipe, and other interactions |
| AIAssert | assertion | boolean | verifying screen state and element presence |
| AIQuery | information query | string | extracting screen information and analyzing content |
### Best Practices
#### 1. Clear Query Descriptions
```go
// good: specific and clear
AIQuery("How many unread messages are shown in the notification badge?")
// avoid: too vague
AIQuery("Tell me about the screen")
```
#### 2. Structured Queries
```go
// request structured output
AIQuery("List all visible buttons with their text and approximate positions in JSON format")
```
#### 3. Context-Aware Queries
```go
// include application context
AIQuery("In this shopping app, what products are displayed in the current category? Include product names and prices")
```
### Error Handling
Common errors you may encounter with AIQuery:
1. **LLM service not configured**: make sure an LLM service is set in the test configuration
2. **Network connectivity issues**: check the network connection and API key configuration
3. **Screenshot failure**: make sure the device connection is working
### Notes
1. AIQuery requires a network connection to reach the LLM service
2. The accuracy of query results depends on the LLM model in use
3. Use specific, unambiguous descriptions in your queries for better results
4. For complex information extraction, ask for structured data in JSON format
## Complete Example
Below is a complete example of using AIQuery:
```go
func TestAIQuery(t *testing.T) {
testCase := &hrp.TestCase{
Config: hrp.NewConfig("AIQuery Demo").
SetLLMService(option.OPENAI_GPT_4O),
TestSteps: []hrp.IStep{
hrp.NewStep("Take Screenshot").
Android().
ScreenShot(),
hrp.NewStep("Query Screen Content").
Android().
AIQuery("Please describe what is displayed on the screen and identify any interactive elements"),
hrp.NewStep("Extract App Information").
Android().
AIQuery("What apps are visible on the screen? List them as a comma-separated string"),
hrp.NewStep("Analyze UI Elements").
Android().
AIQuery("Are there any buttons or clickable elements visible? Describe their locations and purposes"),
},
}
err := hrp.NewRunner(t).Run(testCase)
assert.Nil(t, err)
}
```

View File

@@ -1 +1 @@
v5.0.0-beta-2506121751
v5.0.0-beta-2506131027

View File

@@ -201,6 +201,18 @@ func (s *StepMobile) AIAction(prompt string, opts ...option.ActionOption) *StepM
return s
}
// AIQuery query information from screen using VLM
func (s *StepMobile) AIQuery(prompt string, opts ...option.ActionOption) *StepMobile {
action := option.MobileAction{
Method: option.ACTION_Query,
Params: prompt,
Options: option.NewActionOptions(opts...),
}
s.obj().Actions = append(s.obj().Actions, action)
return s
}
// DoubleTapXY double taps the point {X,Y}, X & Y is percentage of coordinates
func (s *StepMobile) DoubleTapXY(x, y float64, opts ...option.ActionOption) *StepMobile {
s.obj().Actions = append(s.obj().Actions, option.MobileAction{
@@ -863,11 +875,15 @@ func runStepMobileUI(s *SessionRunner, step IStep) (stepResult *StepResult, err
action.Method == option.ACTION_AIAssert || action.Method == option.ACTION_Query {
if config.LLMService != "" && action.Options.LLMService == "" {
action.Options.LLMService = string(config.LLMService)
log.Debug().Str("action", string(action.Method)).Str("llmService", action.Options.LLMService).Msg("Applied global LLM service config to action")
log.Debug().Str("action", string(action.Method)).
Str("llmService", action.Options.LLMService).
Msg("Applied global LLM service config to action")
}
if config.CVService != "" && action.Options.CVService == "" {
action.Options.CVService = string(config.CVService)
log.Debug().Str("action", string(action.Method)).Str("cvService", action.Options.CVService).Msg("Applied global CV service config to action")
log.Debug().Str("action", string(action.Method)).
Str("cvService", action.Options.CVService).
Msg("Applied global CV service config to action")
}
}
}

View File

@@ -11,6 +11,35 @@ import (
"github.com/stretchr/testify/require"
)
// GameInfo defines the output format for game interface analysis
type GameInfo struct {
Content string `json:"content"` // required: human-readable description
Thought string `json:"thought"` // required: AI reasoning process
GameType string `json:"game_type"` // game type
Rows int `json:"rows"` // number of rows
Cols int `json:"cols"` // number of columns
Icons []string `json:"icons"` // icon types
TotalIcons int `json:"total_icons"` // total icon count
}
// UIElementInfo defines the output format for UI element analysis
type UIElementInfo struct {
Content string `json:"content"` // required: human-readable description
Thought string `json:"thought"` // required: AI reasoning process
ScreenType string `json:"screen_type"` // screen type
Elements []UIElement `json:"elements"` // list of UI elements
ButtonCount int `json:"button_count"` // button count
TextCount int `json:"text_count"` // text count
}
// UIElement describes a single UI element
type UIElement struct {
Type string `json:"type"` // element type (button, text, input, etc.)
Text string `json:"text"` // element text
Clickable bool `json:"clickable"` // whether clickable
Description string `json:"description"` // element description
}
func TestIOSSettingsAction(t *testing.T) {
testCase := &hrp.TestCase{
Config: hrp.NewConfig("ios ui action on Settings").
@@ -173,3 +202,92 @@ func TestAIAction(t *testing.T) {
err := hrp.NewRunner(t).Run(testCase)
assert.Nil(t, err)
}
func TestAIQuery(t *testing.T) {
testCase := &hrp.TestCase{
Config: hrp.NewConfig("AIQuery Demo with OutputSchema").
SetLLMService(option.DOUBAO_SEED_1_6_250615), // Configure LLM service for AI operations
TestSteps: []hrp.IStep{
// Step 1: Take a screenshot for analysis
hrp.NewStep("Take Screenshot").
Android().
ScreenShot(),
// Step 2: Basic AIQuery without OutputSchema
hrp.NewStep("Basic Query").
Android().
AIQuery("Please describe what is displayed on the screen and identify any interactive elements"),
// Step 3: Use AIQuery to extract specific information
hrp.NewStep("Extract App Information").
Android().
AIQuery("What apps are visible on the screen? List them as a comma-separated string"),
// Step 4: Use AIQuery for UI element analysis
hrp.NewStep("Analyze UI Elements").
Android().
AIQuery("Are there any buttons or clickable elements visible? Describe their locations and purposes"),
// Step 5: Use AIQuery with validation
hrp.NewStep("Query and Validate").
Android().
AIQuery("Is the home screen currently displayed?").
Validate().
AssertAI("The query result should indicate whether home screen is visible"),
// Step 6: Use AIQuery with simple custom OutputSchema
hrp.NewStep("Query with Simple Custom Schema").
Android().
AIQuery("Analyze the screen and provide structured information about UI elements",
option.WithOutputSchema(struct {
Content string `json:"content"`
Thought string `json:"thought"`
ElementType string `json:"element_type"`
ElementText []string `json:"element_text"`
ButtonCount int `json:"button_count"`
}{})),
// Step 7: Use AIQuery with GameInfo OutputSchema
hrp.NewStep("Game Analysis with Custom Schema").
Android().
AIQuery("分析这个游戏界面,告诉我游戏类型、行列数和图标信息",
option.WithOutputSchema(GameInfo{})),
// Step 8: Use AIQuery with UIElementInfo OutputSchema
hrp.NewStep("UI Element Analysis with Custom Schema").
Android().
AIQuery("分析屏幕上的UI元素识别所有按钮、文本和可交互元素",
option.WithOutputSchema(UIElementInfo{})),
// Step 9: Complex analysis with nested structure
hrp.NewStep("Complex Analysis with Nested Schema").
Android().
AIQuery("Provide a comprehensive analysis of this interface including all interactive elements and their properties",
option.WithOutputSchema(struct {
Content string `json:"content"`
Thought string `json:"thought"`
AppName string `json:"app_name"`
ScreenTitle string `json:"screen_title"`
MainActions []struct {
Name string `json:"name"`
Description string `json:"description"`
Available bool `json:"available"`
} `json:"main_actions"`
NavigationElements []struct {
Type string `json:"type"`
Label string `json:"label"`
Position string `json:"position"`
} `json:"navigation_elements"`
ContentSummary struct {
HasImages bool `json:"has_images"`
HasText bool `json:"has_text"`
HasForms bool `json:"has_forms"`
Keywords []string `json:"keywords"`
} `json:"content_summary"`
}{})),
},
}
err := hrp.NewRunner(t).Run(testCase)
assert.Nil(t, err)
}

View File

@@ -322,7 +322,37 @@ type SessionData struct {
}
func (dExt *XTDriver) AIQuery(text string, opts ...option.ActionOption) (string, error) {
return "", nil
if dExt.LLMService == nil {
return "", errors.New("LLM service is not initialized")
}
screenShotBase64, err := GetScreenShotBufferBase64(dExt.IDriver)
if err != nil {
return "", err
}
// get window size
size, err := dExt.IDriver.WindowSize()
if err != nil {
return "", errors.Wrap(err, "get window size for AI query failed")
}
// parse action options to extract OutputSchema
actionOptions := option.NewActionOptions(opts...)
// execute query
queryOpts := &ai.QueryOptions{
Query: text,
Screenshot: screenShotBase64,
Size: size,
OutputSchema: actionOptions.OutputSchema,
}
result, err := dExt.LLMService.Query(context.Background(), queryOpts)
if err != nil {
return "", errors.Wrap(err, "AI query failed")
}
return result.Content, nil
}
func (dExt *XTDriver) AIAssert(assertion string, opts ...option.ActionOption) error {

View File

@@ -127,6 +127,7 @@ func (s *MCPServer4XTDriver) registerTools() {
// AI Tools
s.registerTool(&ToolStartToGoal{})
s.registerTool(&ToolAIAction{})
s.registerTool(&ToolAIQuery{})
s.registerTool(&ToolFinished{})
}

View File

@@ -115,6 +115,7 @@ func TestToolInterfaces(t *testing.T) {
&ToolSecondaryClickBySelector{},
&ToolWebCloseTab{},
&ToolAIAction{},
&ToolAIQuery{},
&ToolFinished{},
}
@@ -1308,6 +1309,39 @@ func TestToolAIAction(t *testing.T) {
assert.Error(t, err)
}
// TestToolAIQuery tests the ToolAIQuery implementation
func TestToolAIQuery(t *testing.T) {
tool := &ToolAIQuery{}
// Test Name
assert.Equal(t, option.ACTION_Query, tool.Name())
// Test Description
assert.NotEmpty(t, tool.Description())
// Test Options
options := tool.Options()
assert.NotNil(t, options)
// Test ConvertActionToCallToolRequest with valid params
action := option.MobileAction{
Method: option.ACTION_Query,
Params: "What is displayed on the screen?",
}
request, err := tool.ConvertActionToCallToolRequest(action)
assert.NoError(t, err)
assert.Equal(t, string(option.ACTION_Query), request.Params.Name)
assert.Equal(t, "What is displayed on the screen?", request.Params.Arguments["prompt"])
// Test ConvertActionToCallToolRequest with invalid params
invalidAction := option.MobileAction{
Method: option.ACTION_Query,
Params: 123, // should be string
}
_, err = tool.ConvertActionToCallToolRequest(invalidAction)
assert.Error(t, err)
}
// TestToolFinished tests the ToolFinished implementation
func TestToolFinished(t *testing.T) {
tool := &ToolFinished{}

View File

@@ -130,6 +130,71 @@ func (t *ToolAIAction) ConvertActionToCallToolRequest(action option.MobileAction
return mcp.CallToolRequest{}, fmt.Errorf("invalid AI action params: %v", action.Params)
}
// ToolAIQuery implements the ai_query tool call.
type ToolAIQuery struct {
// Return data fields - these define the structure of data returned by this tool
Prompt string `json:"prompt" desc:"AI query prompt that was executed"`
Result string `json:"result" desc:"Query result content"`
}
func (t *ToolAIQuery) Name() option.ActionName {
return option.ACTION_Query
}
func (t *ToolAIQuery) Description() string {
return "Query information from screen using AI vision model with natural language prompts"
}
func (t *ToolAIQuery) Options() []mcp.ToolOption {
unifiedReq := &option.ActionOptions{}
return unifiedReq.GetMCPOptions(option.ACTION_Query)
}
func (t *ToolAIQuery) Implement() server.ToolHandlerFunc {
return func(ctx context.Context, request mcp.CallToolRequest) (*mcp.CallToolResult, error) {
driverExt, err := setupXTDriver(ctx, request.Params.Arguments)
if err != nil {
return nil, fmt.Errorf("setup driver failed: %w", err)
}
unifiedReq, err := parseActionOptions(request.Params.Arguments)
if err != nil {
return nil, err
}
// Build action options from unified request
opts := unifiedReq.Options()
// AI query logic with options
result, err := driverExt.AIQuery(unifiedReq.Prompt, opts...)
if err != nil {
return NewMCPErrorResponse(fmt.Sprintf("AI query failed: %s", err.Error())), nil
}
message := fmt.Sprintf("Successfully queried information with prompt: %s", unifiedReq.Prompt)
returnData := ToolAIQuery{
Prompt: unifiedReq.Prompt,
Result: result,
}
return NewMCPSuccessResponse(message, &returnData), nil
}
}
func (t *ToolAIQuery) ConvertActionToCallToolRequest(action option.MobileAction) (mcp.CallToolRequest, error) {
if prompt, ok := action.Params.(string); ok {
arguments := map[string]any{
"prompt": prompt,
}
// Extract options to arguments
extractActionOptionsToArguments(action.GetOptions(), arguments)
return buildMCPCallToolRequest(t.Name(), arguments), nil
}
return mcp.CallToolRequest{}, fmt.Errorf("invalid AI query params: %v", action.Params)
}
// ToolFinished implements the finished tool call.
type ToolFinished struct {
// Return data fields - these define the structure of data returned by this tool

View File

@@ -184,11 +184,12 @@ type ActionOptions struct {
Params []float64 `json:"params,omitempty" yaml:"params,omitempty" desc:"Generic parameter array"`
// AI related
Prompt string `json:"prompt,omitempty" yaml:"prompt,omitempty" desc:"AI action prompt"`
Content string `json:"content,omitempty" yaml:"content,omitempty" desc:"Content for finished action"`
LLMService string `json:"llm_service,omitempty" yaml:"llm_service,omitempty" desc:"LLM service type for AI actions"`
CVService string `json:"cv_service,omitempty" yaml:"cv_service,omitempty" desc:"Computer vision service type for AI actions"`
ResetHistory bool `json:"reset_history,omitempty" yaml:"reset_history,omitempty" desc:"Whether to reset conversation history before AI planning"`
Prompt string `json:"prompt,omitempty" yaml:"prompt,omitempty" desc:"AI action prompt"`
Content string `json:"content,omitempty" yaml:"content,omitempty" desc:"Content for finished action"`
LLMService string `json:"llm_service,omitempty" yaml:"llm_service,omitempty" desc:"LLM service type for AI actions"`
CVService string `json:"cv_service,omitempty" yaml:"cv_service,omitempty" desc:"Computer vision service type for AI actions"`
ResetHistory bool `json:"reset_history,omitempty" yaml:"reset_history,omitempty" desc:"Whether to reset conversation history before AI planning"`
OutputSchema interface{} `json:"output_schema,omitempty" yaml:"output_schema,omitempty" desc:"Custom output schema for structured AI query response"`
// Time related
Seconds float64 `json:"seconds,omitempty" yaml:"seconds,omitempty" desc:"Sleep duration in seconds"`
@@ -558,6 +559,13 @@ func WithResetHistory(resetHistory bool) ActionOption {
}
}
// WithOutputSchema sets the custom output schema for structured AI query response
func WithOutputSchema(schema interface{}) ActionOption {
return func(o *ActionOptions) {
o.OutputSchema = schema
}
}
// HTTP API direct usage methods
// ValidateForHTTPAPI validates the request for HTTP API usage
@@ -700,6 +708,9 @@ func (o *ActionOptions) validateActionSpecificFields(actionType ActionName) erro
ACTION_StartToGoal: func() error {
return o.requireFields("prompt", o.Prompt != "")
},
ACTION_Query: func() error {
return o.requireFields("prompt", o.Prompt != "")
},
ACTION_Finished: func() error {
return o.requireFields("content", o.Content != "")
},
@@ -774,6 +785,8 @@ func (o *ActionOptions) GetMCPOptions(actionType ActionName) []mcp.ToolOption {
ACTION_SleepRandom: {"platform", "serial", "params"},
ACTION_AIAction: {"platform", "serial", "prompt", "llm_service", "cv_service"},
ACTION_StartToGoal: {"platform", "serial", "prompt", "llm_service", "cv_service"},
ACTION_Query: {"platform", "serial", "prompt", "llm_service", "cv_service", "output_schema"},
ACTION_AIAssert: {"platform", "serial", "prompt", "llm_service", "cv_service"},
ACTION_Finished: {"content"},
ACTION_ListAvailableDevices: {},
ACTION_SelectDevice: {"platform", "serial"},
@@ -862,7 +875,15 @@ func (o *ActionOptions) generateMCPOptionsForFields(fields []string) []mcp.ToolO
}
}
case reflect.Map, reflect.Interface:
// Skip map and interface types for now
// Handle OutputSchema as object type
if name == "output_schema" {
if required {
options = append(options, mcp.WithObject(name, mcp.Required(), mcp.Description(desc)))
} else {
options = append(options, mcp.WithObject(name, mcp.Description(desc)))
}
}
// Skip other map and interface types for now
continue
default:
log.Warn().Str("field_type", fieldType.String()).Msg("Unsupported field type")