httprunner/docs/uixt/mcp-tools.md
2025-06-11 14:57:08 +08:00


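MCP Tools Documentation

Overview

HttpRunner UIXT implements a standardized tool interface on top of the Model Context Protocol (MCP). Every UI operation is wrapped as an MCP tool that an AI model can call directly, which is what makes AI-driven UI automation possible.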

MCP Architecture

Overall Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                            MCP Ecosystem                            │
├─────────────────────────────────────────────────────────────────────┤
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐  │
│  │   MCP Client    │    │   MCP Server    │    │  Tool Registry  │  │
│  │   (AI Model)    │◄──►│  (UIXT Server)  │◄──►│ (registration)  │  │
│  └─────────────────┘    └─────────────────┘    └─────────────────┘  │
├─────────────────────────────────────────────────────────────────────┤
│                             Tool Layer                              │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐  │
│  │  Device Tools   │    │  Action Tools   │    │    AI Tools     │  │
│  └─────────────────┘    └─────────────────┘    └─────────────────┘  │
├─────────────────────────────────────────────────────────────────────┤
│                         Underlying Drivers                          │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐  │
│  │ Android Driver  │    │   iOS Driver    │    │ Browser Driver  │  │
│  └─────────────────┘    └─────────────────┘    └─────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘

Core Components

MCPServer4XTDriver

The main MCP protocol server:

type MCPServer4XTDriver struct {
    mcpServer     *server.MCPServer                // underlying MCP protocol server
    mcpTools      []mcp.Tool                       // registered tool definitions
    actionToolMap map[option.ActionName]ActionTool // action name -> tool implementation
}

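ActionTool Interface

The common contract that every MCP tool implements:

type ActionTool interface {
    // Name returns the tool name.
    Name() option.ActionName
    // Description returns the tool description.
    Description() string
    // Options returns the MCP option definitions (the tool's parameters).
    Options() []mcp.ToolOption
    // Implement returns the handler that carries out the tool's logic.
    Implement() server.ToolHandlerFunc
    // ConvertActionToCallToolRequest converts an action into a tool call request.
    ConvertActionToCallToolRequest(action MobileAction) (mcp.CallToolRequest, error)
}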

Tool Categories

Device Management Tools (mcp_tools_device.go)

list_available_devices

Discover available devices and simulators.

{
  "name": "uixt__list_available_devices",
  "description": "List all available devices including Android devices, iOS devices, and simulators",
  "inputSchema": {
    "type": "object",
    "properties": {},
    "required": []
  }
}

Response example

{
  "action": "list_available_devices",
  "success": true,
  "message": "Found 1 available device",
  "devices": [
    {
      "platform": "android",
      "serial": "emulator-5554",
      "name": "Android Emulator",
      "status": "online"
    }
  ],
  "count": 1
}

select_device

Select a specific device to operate on.

{
  "name": "uixt__select_device",
  "description": "Select a specific device by platform and serial number",
  "inputSchema": {
    "type": "object",
    "properties": {
      "platform": {
        "type": "string",
        "description": "Device platform (android, ios, browser, harmony)"
      },
      "serial": {
        "type": "string",
        "description": "Device serial number or identifier"
      }
    },
    "required": ["platform", "serial"]
  }
}

Touch Tools (mcp_tools_touch.go)

tap_xy

Tap at relative coordinates (0-1 range).

{
  "name": "uixt__tap_xy",
  "description": "Tap at relative coordinates (0-1 range)",
  "inputSchema": {
    "type": "object",
    "properties": {
      "x": {
        "type": "number",
        "description": "X coordinate (0-1 range)"
      },
      "y": {
        "type": "number",
        "description": "Y coordinate (0-1 range)"
      }
    },
    "required": ["x", "y"]
  }
}

tap_abs_xy

Tap at absolute pixel coordinates.

{
  "name": "uixt__tap_abs_xy",
  "description": "Tap at absolute pixel coordinates",
  "inputSchema": {
    "type": "object",
    "properties": {
      "x": {
        "type": "number",
        "description": "Absolute X coordinate in pixels"
      },
      "y": {
        "type": "number",
        "description": "Absolute Y coordinate in pixels"
      }
    },
    "required": ["x", "y"]
  }
}
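
The relationship between the relative coordinates accepted by tap_xy and the absolute pixels accepted by tap_abs_xy can be sketched as follows. This is a minimal illustration, not UIXT's actual conversion: the helper name relToAbs, the rounding mode, and the choice to reject out-of-range values instead of clamping them are all assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// relToAbs converts a relative coordinate in [0, 1] to an absolute pixel
// offset on an axis that is `size` pixels long. Out-of-range and NaN inputs
// are rejected. (Hypothetical helper, not part of UIXT.)
func relToAbs(rel float64, size int) (int, error) {
	if math.IsNaN(rel) || rel < 0 || rel > 1 {
		return 0, fmt.Errorf("relative coordinate %v outside [0, 1]", rel)
	}
	return int(math.Round(rel * float64(size))), nil
}

func main() {
	x, _ := relToAbs(0.5, 1080)
	y, _ := relToAbs(0.5, 1920)
	fmt.Println(x, y) // 540 960
}
```

Relative coordinates keep a script portable across screen sizes, while absolute coordinates are only valid for the resolution they were recorded on.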

tap_ocr

Find text with OCR and tap on it.

{
  "name": "uixt__tap_ocr",
  "description": "Find text using OCR and tap on it",
  "inputSchema": {
    "type": "object",
    "properties": {
      "text": {
        "type": "string",
        "description": "Text to find and tap"
      },
      "regex": {
        "type": "boolean",
        "description": "Whether to use regex matching"
      },
      "index": {
        "type": "integer",
        "description": "Index of text occurrence to tap (0-based)"
      }
    },
    "required": ["text"]
  }
}
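
How the text, regex, and index parameters might interact can be sketched as below. The ocrText type and pickOCRMatch function are hypothetical, and treating a non-regex text as a substring match is an assumption; the actual matching rules in UIXT may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// ocrText is a hypothetical OCR result: recognized text plus its center point.
type ocrText struct {
	Text string
	X, Y float64
}

// pickOCRMatch returns the index-th (0-based) result whose text matches
// pattern. With useRegex false, pattern is matched as a plain substring.
func pickOCRMatch(results []ocrText, pattern string, useRegex bool, index int) (ocrText, error) {
	match := func(s string) bool { return strings.Contains(s, pattern) }
	if useRegex {
		re, err := regexp.Compile(pattern)
		if err != nil {
			return ocrText{}, fmt.Errorf("invalid regex %q: %w", pattern, err)
		}
		match = re.MatchString
	}
	seen := 0
	for _, r := range results {
		if !match(r.Text) {
			continue
		}
		if seen == index {
			return r, nil
		}
		seen++
	}
	return ocrText{}, fmt.Errorf("match %d for %q not found (%d matches)", index, pattern, seen)
}

func main() {
	results := []ocrText{{"Login", 0.5, 0.2}, {"Settings", 0.5, 0.5}, {"Log out", 0.5, 0.8}}
	m, err := pickOCRMatch(results, "^Log", true, 1)
	fmt.Println(m.Text, err) // Log out <nil>
}
```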

tap_cv

Find a UI element with computer vision and tap on it.

{
  "name": "uixt__tap_cv",
  "description": "Find UI element using computer vision and tap on it",
  "inputSchema": {
    "type": "object",
    "properties": {
      "element_type": {
        "type": "string",
        "description": "Type of UI element to find"
      },
      "description": {
        "type": "string",
        "description": "Description of the element"
      }
    },
    "required": ["element_type"]
  }
}

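Swipe Tools (mcp_tools_swipe.go)

swipe

General-purpose swipe that accepts either a direction or explicit coordinates and detects which one was supplied.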

{
  "name": "uixt__swipe",
  "description": "Perform swipe gesture with automatic direction or coordinate detection",
  "inputSchema": {
    "type": "object",
    "properties": {
      "direction": {
        "type": "string",
        "description": "Swipe direction (up, down, left, right)"
      },
      "from_x": {
        "type": "number",
        "description": "Start X coordinate (0-1 range)"
      },
      "from_y": {
        "type": "number",
        "description": "Start Y coordinate (0-1 range)"
      },
      "to_x": {
        "type": "number",
        "description": "End X coordinate (0-1 range)"
      },
      "to_y": {
        "type": "number",
        "description": "End Y coordinate (0-1 range)"
      }
    }
  }
}
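
One way the "direction or coordinates" detection could work is sketched below. The function resolveSwipe is hypothetical, and the preset start and end points for each direction (a swipe across the middle half of the screen) are assumptions rather than UIXT's actual values.

```go
package main

import "fmt"

// resolveSwipe turns swipe arguments into [fromX, fromY, toX, toY] in
// relative coordinates. A non-empty "direction" takes precedence; otherwise
// all four coordinates must be present. (Hypothetical helper.)
func resolveSwipe(args map[string]interface{}) ([4]float64, error) {
	if dir, ok := args["direction"].(string); ok && dir != "" {
		switch dir {
		case "up": // finger moves from the lower part of the screen upward
			return [4]float64{0.5, 0.75, 0.5, 0.25}, nil
		case "down":
			return [4]float64{0.5, 0.25, 0.5, 0.75}, nil
		case "left":
			return [4]float64{0.75, 0.5, 0.25, 0.5}, nil
		case "right":
			return [4]float64{0.25, 0.5, 0.75, 0.5}, nil
		}
		return [4]float64{}, fmt.Errorf("unknown direction %q", dir)
	}
	var out [4]float64
	for i, key := range []string{"from_x", "from_y", "to_x", "to_y"} {
		v, ok := args[key].(float64)
		if !ok {
			return out, fmt.Errorf("missing or non-numeric %s", key)
		}
		out[i] = v
	}
	return out, nil
}

func main() {
	fmt.Println(resolveSwipe(map[string]interface{}{"direction": "up"}))
	// [0.5 0.75 0.5 0.25] <nil>
}
```

Because the schema marks no field as required, a handler along these lines has to report an error itself when neither form is supplied.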

swipe_to_tap_app

Swipe until the app is found, then tap on it.

{
  "name": "uixt__swipe_to_tap_app",
  "description": "Swipe to find and tap on an app",
  "inputSchema": {
    "type": "object",
    "properties": {
      "app_name": {
        "type": "string",
        "description": "Name of the app to find and tap"
      },
      "max_swipes": {
        "type": "integer",
        "description": "Maximum number of swipes to perform"
      }
    },
    "required": ["app_name"]
  }
}
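
The role of max_swipes can be illustrated with a bounded search loop. This is a sketch under assumptions: swipeToFind and its callback parameters are hypothetical, and the real tool may check the screen and swipe in a different order.

```go
package main

import "fmt"

// swipeToFind checks whether the target is visible and, if not, swipes and
// checks again, performing at most maxSwipes swipes. It returns the number
// of swipes performed. (Hypothetical helper.)
func swipeToFind(maxSwipes int, visible func() bool, swipe func() error) (int, error) {
	for n := 0; ; n++ {
		if visible() {
			return n, nil
		}
		if n >= maxSwipes {
			return n, fmt.Errorf("not found after %d swipes", n)
		}
		if err := swipe(); err != nil {
			return n, err
		}
	}
}

func main() {
	page := 0 // simulated home screen page; the app lives on page 2
	n, err := swipeToFind(5,
		func() bool { return page == 2 },
		func() error { page++; return nil },
	)
	fmt.Println(n, err) // 2 <nil>
}
```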

Input Tools (mcp_tools_input.go)

input

Type text into the focused element.

{
  "name": "uixt__input",
  "description": "Input text into the focused element",
  "inputSchema": {
    "type": "object",
    "properties": {
      "text": {
        "type": "string",
        "description": "Text to input"
      }
    },
    "required": ["text"]
  }
}

set_ime

Set the Input Method Editor (IME).

{
  "name": "uixt__set_ime",
  "description": "Set the Input Method Editor (IME)",
  "inputSchema": {
    "type": "object",
    "properties": {
      "ime": {
        "type": "string",
        "description": "IME package name or identifier"
      }
    },
    "required": ["ime"]
  }
}

Button Tools (mcp_tools_button.go)

press_button

Press a device button.

{
  "name": "uixt__press_button",
  "description": "Press a device button",
  "inputSchema": {
    "type": "object",
    "properties": {
      "button": {
        "type": "string",
        "description": "Button name (home, back, volume_up, volume_down, etc.)"
      }
    },
    "required": ["button"]
  }
}

App Management Tools (mcp_tools_app.go)

list_packages

List all installed app packages.

{
  "name": "uixt__list_packages",
  "description": "List all installed app packages on the device",
  "inputSchema": {
    "type": "object",
    "properties": {},
    "required": []
  }
}

app_launch

Launch an app.

{
  "name": "uixt__app_launch",
  "description": "Launch an app by package name",
  "inputSchema": {
    "type": "object",
    "properties": {
      "package_name": {
        "type": "string",
        "description": "Package name of the app to launch"
      }
    },
    "required": ["package_name"]
  }
}

app_terminate

Terminate an app.

{
  "name": "uixt__app_terminate",
  "description": "Terminate a running app",
  "inputSchema": {
    "type": "object",
    "properties": {
      "package_name": {
        "type": "string",
        "description": "Package name of the app to terminate"
      }
    },
    "required": ["package_name"]
  }
}

Screen Tools (mcp_tools_screen.go)

screenshot

Capture a screenshot.

{
  "name": "uixt__screenshot",
  "description": "Take a screenshot of the device screen",
  "inputSchema": {
    "type": "object",
    "properties": {},
    "required": []
  }
}

Response example

{
  "action": "screenshot",
  "success": true,
  "message": "Screenshot captured successfully",
  "screenshot": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA...",
  "width": 1080,
  "height": 1920
}

get_screen_size

Get the screen size.

{
  "name": "uixt__get_screen_size",
  "description": "Get the screen size of the device",
  "inputSchema": {
    "type": "object",
    "properties": {},
    "required": []
  }
}

Utility Tools (mcp_tools_utility.go)

sleep

Wait for the specified number of seconds.

{
  "name": "uixt__sleep",
  "description": "Sleep for specified number of seconds",
  "inputSchema": {
    "type": "object",
    "properties": {
      "seconds": {
        "type": "number",
        "description": "Number of seconds to sleep"
      }
    },
    "required": ["seconds"]
  }
}

close_popups

Close popups or dialogs.

{
  "name": "uixt__close_popups",
  "description": "Close popups or dialogs on the screen",
  "inputSchema": {
    "type": "object",
    "properties": {},
    "required": []
  }
}

Web Tools (mcp_tools_web.go)

secondary_click

Right-click at the given coordinates.

{
  "name": "uixt__secondary_click",
  "description": "Perform secondary click (right-click) at coordinates",
  "inputSchema": {
    "type": "object",
    "properties": {
      "x": {
        "type": "number",
        "description": "X coordinate for secondary click"
      },
      "y": {
        "type": "number",
        "description": "Y coordinate for secondary click"
      }
    },
    "required": ["x", "y"]
  }
}

hover_by_selector

Hover over an element located by selector.

{
  "name": "uixt__hover_by_selector",
  "description": "Hover over element by CSS selector or XPath",
  "inputSchema": {
    "type": "object",
    "properties": {
      "selector": {
        "type": "string",
        "description": "CSS selector or XPath of the element"
      }
    },
    "required": ["selector"]
  }
}

AI Tools (mcp_tools_ai.go)

start_to_goal

Carry out a task from start to goal, described in natural language.

{
  "name": "uixt__start_to_goal",
  "description": "Execute a task from start to goal using natural language description",
  "inputSchema": {
    "type": "object",
    "properties": {
      "goal": {
        "type": "string",
        "description": "Natural language description of the goal"
      }
    },
    "required": ["goal"]
  }
}

ai_action

Perform an AI-driven action from a natural language prompt.

{
  "name": "uixt__ai_action",
  "description": "Execute AI-driven action using natural language prompt",
  "inputSchema": {
    "type": "object",
    "properties": {
      "prompt": {
        "type": "string",
        "description": "Natural language prompt for the action"
      }
    },
    "required": ["prompt"]
  }
}

Tool Implementation

ActionTool Implementation Example

// Tap tool implementation
type ToolTapXY struct {
    X float64 `json:"x" desc:"X coordinate (0-1 range)"`
    Y float64 `json:"y" desc:"Y coordinate (0-1 range)"`
}

func (t *ToolTapXY) Name() option.ActionName {
    return option.ActionTapXY
}

func (t *ToolTapXY) Description() string {
    return "Tap at relative coordinates (0-1 range)"
}

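func (t *ToolTapXY) Options() []mcp.ToolOption {
    // Assuming mcp is github.com/mark3labs/mcp-go/mcp, ToolOption is a
    // functional option, so parameters are declared with the mcp.With*
    // helpers rather than with struct literals.
    return []mcp.ToolOption{
        mcp.WithNumber("x",
            mcp.Required(),
            mcp.Description("X coordinate (0-1 range)"),
        ),
        mcp.WithNumber("y",
            mcp.Required(),
            mcp.Description("Y coordinate (0-1 range)"),
        ),
    }
}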

func (t *ToolTapXY) Implement() server.ToolHandlerFunc {
    return func(ctx context.Context, req mcp.CallToolRequest) (*mcp.CallToolResult, error) {
        // Parse arguments (JSON numbers decode to float64)
        x, ok := req.Params.Arguments["x"].(float64)
        if !ok {
            return mcp.NewToolResultError("invalid x coordinate"), nil
        }

        y, ok := req.Params.Arguments["y"].(float64)
        if !ok {
            return mcp.NewToolResultError("invalid y coordinate"), nil
        }

        // Perform the tap
        err := GetXTDriverFromContext(ctx).TapXY(x, y)
        if err != nil {
            return mcp.NewToolResultError(fmt.Sprintf("tap failed: %v", err)), nil
        }

        // Populate the fields that are echoed back in the response
        t.X = x
        t.Y = y

        return NewMCPSuccessResponse(
            fmt.Sprintf("Tapped at coordinates (%.2f, %.2f)", x, y),
            t,
        ), nil
    }
}

Response Format

All tools share one flat response format:

func NewMCPSuccessResponse(message string, actionTool ActionTool) *mcp.CallToolResult {
    response := map[string]interface{}{
        "action":  string(actionTool.Name()),
        "success": true,
        "message": message,
    }

    // Use reflection to copy the tool's JSON-tagged fields into the response
    toolValue := reflect.ValueOf(actionTool)
    if toolValue.Kind() == reflect.Ptr {
        toolValue = toolValue.Elem()
    }

    // NumField panics on non-struct values, so only walk struct tools
    if toolValue.Kind() == reflect.Struct {
        toolType := toolValue.Type()
        for i := 0; i < toolValue.NumField(); i++ {
            field := toolType.Field(i)
            jsonTag := field.Tag.Get("json")
            if jsonTag != "" && jsonTag != "-" {
                fieldName := strings.Split(jsonTag, ",")[0]
                response[fieldName] = toolValue.Field(i).Interface()
            }
        }
    }

    // Assuming mcp-go: mcp.Content is an interface, so the text result is
    // built with the NewToolResultText helper instead of a struct literal.
    return mcp.NewToolResultText(toJSONString(response))
}
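
The flattening step can be tried in isolation with only the standard library. In the sketch below, tapResult and flatten are hypothetical stand-ins for a tool struct and the response builder; it additionally skips unexported fields, since calling Interface() on them would panic.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"strings"
)

// tapResult is a hypothetical tool struct used only for this illustration.
type tapResult struct {
	X      float64 `json:"x"`
	Y      float64 `json:"y,omitempty"`
	Debug  string  `json:"-"` // excluded by its tag
	secret string  // unexported, skipped
}

// flatten merges the JSON-tagged exported fields of tool into a flat map
// next to the common action/success/message keys.
func flatten(action, message string, tool interface{}) map[string]interface{} {
	out := map[string]interface{}{"action": action, "success": true, "message": message}
	v := reflect.ValueOf(tool)
	for v.Kind() == reflect.Ptr && !v.IsNil() {
		v = v.Elem()
	}
	if v.Kind() != reflect.Struct {
		return out
	}
	t := v.Type()
	for i := 0; i < v.NumField(); i++ {
		f := t.Field(i)
		if f.PkgPath != "" { // unexported field
			continue
		}
		name := strings.Split(f.Tag.Get("json"), ",")[0]
		if name == "" || name == "-" {
			continue
		}
		out[name] = v.Field(i).Interface()
	}
	return out
}

func main() {
	m := flatten("tap_xy", "Tapped", &tapResult{X: 0.5, Y: 0.5, Debug: "trace", secret: "s"})
	b, _ := json.Marshal(m)
	fmt.Println(string(b))
	// {"action":"tap_xy","message":"Tapped","success":true,"x":0.5,"y":0.5}
}
```

One consequence of a flat format is that a tool field tagged action, success, or message would overwrite the common key of the same name, so tool structs should avoid those tags.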

Tool Registration

Server Initialization

func NewMCPServer() *MCPServer4XTDriver {
    // Named s rather than server so the variable does not shadow the
    // server package.
    s := &MCPServer4XTDriver{
        mcpTools:      make([]mcp.Tool, 0),
        actionToolMap: make(map[option.ActionName]ActionTool),
    }

    // Register all tools
    s.registerDeviceTools()
    s.registerTouchTools()
    s.registerSwipeTools()
    s.registerInputTools()
    s.registerButtonTools()
    s.registerAppTools()
    s.registerScreenTools()
    s.registerUtilityTools()
    s.registerWebTools()
    s.registerAITools()

    return s
}

Tool Registration Method

func (s *MCPServer4XTDriver) registerTool(tool ActionTool) {
    // Build the MCP tool definition. Assuming mcp-go, mcp.NewTool derives
    // the input schema (type, properties, required) from the tool's options.
    opts := append(
        []mcp.ToolOption{mcp.WithDescription(tool.Description())},
        tool.Options()...,
    )
    mcpTool := mcp.NewTool(fmt.Sprintf("uixt__%s", tool.Name()), opts...)

    // Record the tool on the server
    s.mcpTools = append(s.mcpTools, mcpTool)
    s.actionToolMap[tool.Name()] = tool
}

Tool Invocation

Client-Side Call

// Call a tool through an MCP client. The parameter is named c so that it
// does not shadow the client package.
func callTool(c client.MCPClient, toolName string, args map[string]interface{}) (*mcp.CallToolResult, error) {
    req := mcp.CallToolRequest{
        Params: mcp.CallToolParams{
            Name:      fmt.Sprintf("uixt__%s", toolName),
            Arguments: args,
        },
    }

    return c.CallTool(context.Background(), req)
}

// Usage example (c is an already initialized client.MCPClient)
result, err := callTool(c, "tap_xy", map[string]interface{}{
    "x": 0.5,
    "y": 0.5,
})

Server-Side Handling

func (s *MCPServer4XTDriver) CallTool(ctx context.Context, req mcp.CallToolRequest) (*mcp.CallToolResult, error) {
    // Extract the action name by stripping the uixt__ prefix
    toolName := strings.TrimPrefix(req.Params.Name, "uixt__")
    actionName := option.ActionName(toolName)

    // Look up the tool
    tool, exists := s.actionToolMap[actionName]
    if !exists {
        return mcp.NewToolResultError(fmt.Sprintf("tool %s not found", toolName)), nil
    }

    // Run the tool's handler
    handler := tool.Implement()
    return handler(ctx, req)
}
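
The dispatch logic can be exercised without an MCP library. The sketch below uses hypothetical handler and registry types. It also makes one deliberate change: strings.TrimPrefix returns its input unchanged when the prefix is absent, so a bare "tap_xy" would resolve just like "uixt__tap_xy", and this version rejects names that lack the prefix. Whether UIXT intends to accept bare names is not stated in this document.

```go
package main

import (
	"fmt"
	"strings"
)

const toolPrefix = "uixt__"

// handler is a simplified stand-in for a tool handler function.
type handler func(args map[string]interface{}) (string, error)

// registry maps bare action names to handlers. (Hypothetical type.)
type registry struct {
	tools map[string]handler
}

// call resolves a prefixed tool name and runs its handler.
func (r *registry) call(name string, args map[string]interface{}) (string, error) {
	if !strings.HasPrefix(name, toolPrefix) {
		return "", fmt.Errorf("tool %q lacks the %q prefix", name, toolPrefix)
	}
	action := strings.TrimPrefix(name, toolPrefix)
	h, ok := r.tools[action]
	if !ok {
		return "", fmt.Errorf("tool %s not found", action)
	}
	return h(args)
}

func main() {
	r := &registry{tools: map[string]handler{
		"tap_xy": func(args map[string]interface{}) (string, error) {
			return fmt.Sprintf("tapped at (%v, %v)", args["x"], args["y"]), nil
		},
	}}
	out, err := r.call("uixt__tap_xy", map[string]interface{}{"x": 0.5, "y": 0.5})
	fmt.Println(out, err) // tapped at (0.5, 0.5) <nil>
}
```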

Extension Development

Creating a Custom Tool

// 1. Define the tool struct
type ToolCustomAction struct {
    Parameter1 string `json:"parameter1" desc:"Description of parameter1"`
    Parameter2 int    `json:"parameter2" desc:"Description of parameter2"`
}

// 2. Implement the ActionTool interface
func (t *ToolCustomAction) Name() option.ActionName {
    return option.ActionName("custom_action")
}

func (t *ToolCustomAction) Description() string {
    return "Perform a custom action"
}

func (t *ToolCustomAction) Options() []mcp.ToolOption {
    // Assuming mcp-go, where ToolOption is a functional option. parameter2
    // is declared as a number; integer values arrive as JSON numbers.
    return []mcp.ToolOption{
        mcp.WithString("parameter1",
            mcp.Required(),
            mcp.Description("Description of parameter1"),
        ),
        mcp.WithNumber("parameter2",
            mcp.Description("Description of parameter2"),
        ),
    }
}

func (t *ToolCustomAction) Implement() server.ToolHandlerFunc {
    return func(ctx context.Context, req mcp.CallToolRequest) (*mcp.CallToolResult, error) {
        // Parse arguments
        param1, ok := req.Params.Arguments["parameter1"].(string)
        if !ok {
            return mcp.NewToolResultError("invalid parameter1"), nil
        }

        // Optional: JSON numbers decode to float64; defaults to 0 when absent
        param2, _ := req.Params.Arguments["parameter2"].(float64)

        // Run the custom logic
        err := performCustomAction(param1, int(param2))
        if err != nil {
            return mcp.NewToolResultError(fmt.Sprintf("custom action failed: %v", err)), nil
        }

        // Populate the fields that are echoed back in the response
        t.Parameter1 = param1
        t.Parameter2 = int(param2)

        return NewMCPSuccessResponse("Custom action completed", t), nil
    }
}

// 3. Register the tool. To satisfy ActionTool, ToolCustomAction must also
// implement ConvertActionToCallToolRequest (omitted here).
func (s *MCPServer4XTDriver) registerCustomTools() {
    s.registerTool(&ToolCustomAction{})
}

Tool Grouping

// Register tools grouped by function
func (s *MCPServer4XTDriver) registerToolGroup(groupName string, tools []ActionTool) {
    for _, tool := range tools {
        // Add the group name as a prefix
        mcpTool := mcp.Tool{
            Name:        fmt.Sprintf("uixt__%s__%s", groupName, tool.Name()),
            Description: fmt.Sprintf("[%s] %s", groupName, tool.Description()),
            InputSchema: generateInputSchema(tool),
        }

        s.mcpTools = append(s.mcpTools, mcpTool)
        // The map is keyed by the bare action name, so a dispatcher must
        // strip the group prefix as well as uixt__ before looking it up.
        s.actionToolMap[tool.Name()] = tool
    }
}

Best Practices

1. Tool Design Principles

// Single responsibility: each tool does exactly one thing
type ToolSinglePurpose struct {
    // Explicit parameter definitions
    TargetText string `json:"target_text" desc:"Text to search for"`
}

// Parameter validation: validate arguments inside the tool implementation
func (t *ToolSinglePurpose) Implement() server.ToolHandlerFunc {
    return func(ctx context.Context, req mcp.CallToolRequest) (*mcp.CallToolResult, error) {
        // Validate arguments
        if err := t.validateParameters(req.Params.Arguments); err != nil {
            return mcp.NewToolResultError(err.Error()), nil
        }

        // Run the tool logic
        return t.execute(ctx, req)
    }
}

2. Error Handling

// Unified error handling
func handleToolError(err error, toolName string) *mcp.CallToolResult {
    if err == nil {
        return nil
    }

    // Log the error
    log.Error().Err(err).Str("tool", toolName).Msg("tool execution failed")

    // Return a user-friendly error message
    return mcp.NewToolResultError(fmt.Sprintf("Tool %s failed: %v", toolName, err))
}

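3. Performance Optimization

// Cache of tool execution results
type ToolCache struct {
    cache map[string]*mcp.CallToolResult
    mutex sync.RWMutex
}

func (c *ToolCache) GetOrExecute(key string, executor func() (*mcp.CallToolResult, error)) (*mcp.CallToolResult, error) {
    c.mutex.RLock()
    result, exists := c.cache[key] // reading from a nil map is safe
    c.mutex.RUnlock()
    if exists {
        return result, nil
    }

    // Cache miss: run the tool. Concurrent callers with the same key may
    // each run the executor; the last result stored wins.
    result, err := executor()
    if err != nil {
        return nil, err
    }

    // Store the result, initializing the map lazily so that the zero value
    // of ToolCache is usable (writing to a nil map would panic)
    c.mutex.Lock()
    if c.cache == nil {
        c.cache = make(map[string]*mcp.CallToolResult)
    }
    c.cache[key] = result
    c.mutex.Unlock()

    return result, nil
}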
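
A cache like this needs a key that identifies a call. One possible scheme, sketched below, hashes the tool name together with the JSON encoding of its arguments. The cacheKey function is hypothetical. It relies on encoding/json writing map keys in sorted order, so two argument maps with the same content yield the same key.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// cacheKey derives a deterministic key from a tool name and its arguments.
// (Hypothetical helper.)
func cacheKey(tool string, args map[string]interface{}) (string, error) {
	raw, err := json.Marshal(args) // map keys are emitted in sorted order
	if err != nil {
		return "", err
	}
	// A NUL separator keeps the tool name and the payload from running together.
	sum := sha256.Sum256(append([]byte(tool+"\x00"), raw...))
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	a, _ := cacheKey("get_screen_size", map[string]interface{}{"a": 1, "b": 2})
	b, _ := cacheKey("get_screen_size", map[string]interface{}{"b": 2, "a": 1})
	fmt.Println(a == b) // true
}
```

Caching is only safe for tools whose result does not depend on changing device state. A read-only query such as get_screen_size is a plausible candidate, whereas taps, swipes, and screenshots should run every time.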

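4. Tool Composition

// Composite tool: combines several basic tools
type ToolComposite struct {
    Steps []ToolStep `json:"steps" desc:"Sequence of tool steps"`
}

type ToolStep struct {
    Tool      string                 `json:"tool"`
    Arguments map[string]interface{} `json:"arguments"`
}

func (t *ToolComposite) Implement() server.ToolHandlerFunc {
    return func(ctx context.Context, req mcp.CallToolRequest) (*mcp.CallToolResult, error) {
        // Decode the "steps" argument into t.Steps via a JSON round trip;
        // without this the loop below would iterate over an empty slice.
        raw, err := json.Marshal(req.Params.Arguments["steps"])
        if err != nil {
            return mcp.NewToolResultError(fmt.Sprintf("invalid steps: %v", err)), nil
        }
        if err := json.Unmarshal(raw, &t.Steps); err != nil {
            return mcp.NewToolResultError(fmt.Sprintf("invalid steps: %v", err)), nil
        }

        results := make([]interface{}, 0, len(t.Steps))

        for i, step := range t.Steps {
            // Run each step in order and stop at the first failure
            result, err := executeToolStep(ctx, step)
            if err != nil {
                return mcp.NewToolResultError(fmt.Sprintf("step %d failed: %v", i+1, err)), nil
            }
            results = append(results, result)
        }

        // results holds the per-step output; this example does not return it
        return NewMCPSuccessResponse("Composite tool completed", t), nil
    }
}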

Troubleshooting

Common Issues

Tool Registration Fails

// Check tool registration
func validateToolRegistration(s *MCPServer4XTDriver) error {
    tools := s.ListTools()
    if len(tools) == 0 {
        return fmt.Errorf("no tools registered")
    }

    // Check that the required tools are present
    requiredTools := []string{"tap_xy", "screenshot", "app_launch"}
    for _, required := range requiredTools {
        found := false
        for _, tool := range tools {
            // Compare the full name: a suffix match on "tap_xy" would also
            // accept any other tool whose name merely ends in tap_xy.
            if tool.Name == "uixt__"+required {
                found = true
                break
            }
        }
        if !found {
            return fmt.Errorf("required tool %s not found", required)
        }
    }

    return nil
}

Tool Calls Fail

// Debug a tool call
func debugToolCall(req mcp.CallToolRequest) {
    log.Debug().
        Str("tool", req.Params.Name).
        Interface("arguments", req.Params.Arguments).
        Msg("tool call debug")

    // Log each argument's dynamic type to spot type mismatches
    for key, value := range req.Params.Arguments {
        log.Debug().
            Str("param", key).
            Str("type", fmt.Sprintf("%T", value)).
            Interface("value", value).
            Msg("parameter debug")
    }
}

Performance Issues

// Monitor tool performance
func monitorToolPerformance(toolName string, executor func() (*mcp.CallToolResult, error)) (*mcp.CallToolResult, error) {
    start := time.Now()

    result, err := executor()

    elapsed := time.Since(start)
    log.Info().
        Str("tool", toolName).
        Dur("elapsed", elapsed).
        Bool("success", err == nil).
        Msg("tool performance")

    // Flag any call that takes longer than five seconds
    if elapsed > 5*time.Second {
        log.Warn().
            Str("tool", toolName).
            Dur("elapsed", elapsed).
            Msg("slow tool execution")
    }

    return result, err
}

References