Commit Graph

50 Commits

Author SHA1 Message Date
lilong.129
a1c8b7fab3 refactor: remove unused handlers and related files to streamline the server codebase 2025-06-21 22:08:54 +08:00
lilong.129
ed5d3127cb fix: add missing action options 2025-06-19 21:57:26 +08:00
lilong.129
b271e655b1 feat: add MCP plugin support and optimize AI service configuration
- Add UIXT runner with MCP plugin support
- Refactor AI service options handling
- Optimize configuration parsing for LLM and CV services
- Update dependencies to latest versions
2025-06-13 20:24:57 +08:00
lilong.129
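f6e7e970f8 feat: implement AIQuery and support OutputSchema
- Add AIQuery method to StepMobile to extract information from the screen using natural language
- Implement full AIQuery functionality in driver_ext_ai.go, including screenshot capture and LLM querying
- Add OutputSchema support, allowing users to define custom output formats for structured queries
- Add ToolAIQuery MCP tool, fully integrated into the MCP server
- Add OutputSchema field and WithOutputSchema option function to ActionOptions
- Add configuration support and field mapping for ACTION_Query
- Improve test coverage:
  * Add TestAIQuery unit tests covering multiple OutputSchema usage scenarios
  * Add TestToolAIQuery MCP tool test
  * Define GameInfo, UIElementInfo, and other structs for testing
- Update documentation:
  * Add a complete AIQuery usage guide to docs/uixt/ai.md
  * Cover basic usage, OutputSchema examples, best practices, and more
- Support complex nested structs and array types in OutputSchema
- Keep the API design consistent with the existing AIAction and AIAssert features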
2025-06-13 10:27:08 +08:00
lilong.129
72df285fed fix: get resultsPath 2025-06-12 14:51:15 +08:00
lilong.129
fbc888655f feat: optimize ILLMService interface to support different models for each component
- Add LLMServiceConfig to support mixed model configuration
- Enable Planner, Asserter, Querier to use different optimal models
- Provide recommended configurations for various use cases
- Maintain backward compatibility with existing API
- Update documentation to reflect current state without iteration history
- Merge test files and add comprehensive configuration tests
- Resolve circular dependency by moving config to option package
2025-06-11 12:18:31 +08:00
lilong.129
88ae8faee1 feat: enhance VLM response parsing and DOUBAO model support
- Fix JSON extraction logic by prioritizing brace counting method
- Add support for DOUBAO string array coordinate format
- Introduce IS_UI_TARS helper function for model type checking
- Add comprehensive tests for JSON parsing and coordinate handling
- Improve error handling with retry delays for LLM service failures
2025-06-10 15:56:13 +08:00
lilong.129
533c1f4bff feat: add mcp tool ToolScreenRecord 2025-06-09 17:18:26 +08:00
lilong.129
660e8ca124 feat: add mcp tool ToolGetForegroundApp 2025-06-08 19:25:09 +08:00
lilong.129
b642ea004e feat: implement UI automation test history isolation
- Add ResetHistory option to PlanningOptions and ActionOptions
- Implement task completion detection with isTaskFinished() method
- Add executeActions() method to separate action execution logic
- Modify ConversationHistory.Clear() to completely clear all messages including system message
- Refactor StartToGoal() to automatically reset history on first attempt
- Add WithResetHistory() option function for consistent API
- Consolidate test files into driver_ext_ai_test.go with comprehensive test coverage
2025-06-06 15:29:42 +08:00
lilong.129
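5f400735fc fix: StartToGoal command cannot be interrupted with CTRL+C
- Add context.Context parameter to AI-related methods to support interruption

- Add context cancellation checks in the retry loop

- Create a cancelable context and listen for interrupt signals

- Update MCP tool calls to use the context-aware methods

Users can now interrupt long-running AI automation tasks with CTRL+C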
2025-06-05 20:00:20 +08:00
lilong.129
d883aa6a21 change: rename VLM name 2025-06-05 18:09:25 +08:00
lilong.129
c4e7ab00a7 feat: implement ToolStartToGoal and fix LLM service initialization
- Add ToolStartToGoal implementation with AI-driven goal automation
- Fix LLM service not initialized issue by applying global AI config to XTDriver creation
- Ensure XTDriver is created with proper AI services from the first initialization
- Add StartToGoal method to StepMobile for goal-oriented automation
- Register ToolStartToGoal in MCP server and add corresponding action type
- Add comprehensive test case for StartToGoal functionality
- Fix ReturnSchema consistency across AI tools (StartToGoal, AIAction, Finished)
- Extract AI service options in MCP argument processing

This resolves the root cause where XTDriver was created without AI services
in runStepMobileUI, ensuring only one XTDriver initialization with complete
AI service configuration.
2025-06-05 16:52:11 +08:00
lilong.129
c204542f1f feat: optimize UI-TARS parser with coordinate conversion and action mapping
- Add action mapping for UI-TARS parser to convert action names to option.ActionName
- Implement bounding box to center point coordinate conversion for better accuracy
- Update coordinate normalization to handle coordinates > 1000 properly
- Enhance test cases to verify coordinate scaling and center point conversion
- Improve action argument processing with proper coordinate transformation
- Add comprehensive test coverage for coordinate conversion edge cases

Key improvements:
- Bounding box [x1,y1,x2,y2] now converts to center point [cx,cy] for actions
- Coordinate scaling properly handles different screen resolutions
- Action names are mapped through doubao_1_5_ui_tars_action_mapping
- Enhanced error handling for invalid coordinate formats
2025-06-04 23:16:14 +08:00
lilong.129
bd8cb5abf4 refactor: move MobileAction to option package and update imports
- Move MobileAction struct from uixt package to uixt/option package
- Delete uixt/driver_action.go file as MobileAction is now in option package
- Update all import statements across the codebase to use option.MobileAction
- Update ActionTool interface to use option.MobileAction in ConvertActionToCallToolRequest method
- Maintain backward compatibility while improving package organization
- Clean up code structure by consolidating action-related types in option package

Files affected:
- server/uixt.go: Updated imports and type references
- step.go: Updated imports and ActionResult struct
- step_ui.go: Updated all MobileAction references to option.MobileAction
- uixt/mcp_server.go: Updated ActionTool interface and removed detailed comments
- uixt/mcp_server_test.go: Updated all test cases to use option.MobileAction
- uixt/mcp_tools_*.go: Updated ConvertActionToCallToolRequest method signatures
- uixt/option/action.go: Added MobileAction struct definition
- uixt/sdk.go: Updated ExecuteAction method signature
2025-06-03 18:15:28 +08:00
lilong.129
2fe5b14d63 refactor: integrate and optimize MCP tool calling methods 2025-05-27 21:39:17 +08:00
lilong.129
866cc0e4d2 feat: implement MCP hooks integration with anti_risk option 2025-05-27 19:46:08 +08:00
lilong.129
404865ba6b refactor: complete ActionOptions unification and pointer type optimization 2025-05-27 13:34:12 +08:00
lilong.129
7fb966b7ba refactor: improve ActionMethod type safety and eliminate type conversions 2025-05-27 11:49:30 +08:00
lilong.129
466fe39cb9 docs: add comprehensive migration summary for ActionOptions and Request integration
- Document the complete integration process of ActionOptions and Request structures
- Include detailed statistics: 40 tools migrated with 100% test pass rate
- Provide technical implementation details and usage examples
- Record backward compatibility guarantees and migration helpers
- Summarize code quality improvements and performance optimizations
- Outline future development plans and goals

This documentation serves as a complete record of the unification initiative
and provides guidance for future development and maintenance.
2025-05-26 23:13:19 +08:00
lilong.129
6ae4c300c1 add generic swipe tool with auto-detection of direction vs coordinate params
- Added ACTION_Swipe to option/action.go for generic swipe functionality
- Implemented ToolSwipe in mcp_server.go that automatically detects parameter type:
  - String params (up/down/left/right) use direction-based swipe logic
  - Array params [fromX, fromY, toX, toY] use coordinate-based swipe logic
- Added comprehensive test coverage for ToolSwipe in mcp_server_test.go
- Updated tool registration to include the new generic swipe tool
- All tests pass, confirming backward compatibility with existing tools
2025-05-26 22:39:23 +08:00
lilong.129
77f5683f9a fix: remove unnecessary IgnoreNotFoundError and MaxRetryTimes from coordinate-based tap tools
- Removed IgnoreNotFoundError and MaxRetryTimes parameters from TapRequest, TapAbsXYRequest, and DoubleTapXYRequest structures
- Updated corresponding tool implementations to remove references to these non-existent fields
- These parameters are not applicable to coordinate-based operations as they don't involve element searching
- Only OCR/CV-based operations need these error handling parameters

This ensures that only relevant tools have the ignore_NotFoundError functionality,
making the API more consistent and avoiding confusion.
2025-05-26 22:10:08 +08:00
lilong.129
df65f9a828 fix: MCP server ignore_NotFoundError option not working
- Fixed TapByOCR and TapByCV tools to properly handle ignore_NotFoundError option
- Added option parameters to all MCP tool request structures
- Fixed ConvertActionToCallToolRequest methods to extract action options
- Added extractActionOptionsToArguments helper function for consistent option handling
- Extended fix to all MCP tools: SwipeToTapApp, SwipeToTapText, SwipeToTapTexts, TapXY, TapAbsXY
- Added comprehensive tests for option parameter handling
- Updated test expectations to match actual registered tools

This ensures that when ignore_NotFoundError is set to true, OCR/CV operations
will return nil instead of throwing errors when target elements are not found,
allowing tests to continue execution as expected.
2025-05-26 22:02:01 +08:00
lilong.129
9a5e0849de fix: handle GetOrCreateXTDriver when serial is empty 2025-05-26 21:25:25 +08:00
lilong.129
2569670c7f feat: implement unified XTDriver cache 2025-05-26 19:39:46 +08:00
lilong.129
36c5044402 feat: add mcp tool finished 2025-05-26 09:05:48 +08:00
lilong.129
778344c826 change: remove call function tool 2025-05-26 00:43:01 +08:00
lilong.129
2e17d9df16 refactor: merge DoAction to mcp server tools 2025-05-25 23:53:07 +08:00
lilong.129
7986c4899f refactor: move DoAction to MCP tools call 2025-05-25 08:10:57 +08:00
lilong.129
4ff2692f02 refactor: move action options 2025-05-25 00:15:18 +08:00
lilong.129
97dad38b7b refactor: move tool request types to option 2025-05-24 23:51:58 +08:00
lilong.129
c377664518 refactor: add LLMServiceTypeDoubaoVL 2025-05-22 15:34:11 +08:00
lilong.129
d145784910 fix: swipe with params 2025-05-14 14:36:46 +08:00
lilong.129
d95eec78b0 feat: add WithPreMarkOperation and WithPostMarkOperation to mark UI operation before/after action 2025-05-12 08:58:27 +08:00
lilong.129
9bafea53af feat: support action options for AppLaunch/AppTerminate 2025-05-10 00:01:30 +08:00
lilong.129
3715cbb432 feat: support pre hook and post hook for actions 2025-05-09 23:01:27 +08:00
徐聪
6cce5e3c5b fix: web ui test 2025-05-07 20:12:06 +08:00
lilong.129
cfc71819d2 feat: mark tap/swipe UI operation 2025-05-05 16:31:13 +08:00
lilong.129
0e9389c796 refactor: NewXTDriver api, return error if init failed 2025-04-30 14:31:36 +08:00
lilong.129
d2976844fc fix: load testcase panic caused by config options 2025-04-27 11:50:50 +08:00
徐聪
382aad2d9f fix: fix some issues with the browser driver 2025-04-24 22:57:08 +08:00
lilong.129
182de16751 feat: ApplySwipeOffset 2025-03-17 17:57:10 +08:00
lilong.129
9fb53590ca refactor: rename ApplyTapOffset 2025-03-17 17:44:04 +08:00
lilong.129
b34a2218fe feat: tap random point in ocr text rect 2025-03-17 15:36:35 +08:00
lilong.129
3e7e9b0ef9 change: tap/swipe with offset 2025-03-17 14:36:09 +08:00
lilong.129
0d416e74a1 change: use context to control ScreenRecord timeout or cancellation 2025-03-06 22:16:49 +08:00
lilong.129
79e0323471 fix: screen record with scrcpy 2025-03-06 17:50:01 +08:00
lilong.129
cc81c00a82 feat: add adb screen record 2025-03-06 16:57:51 +08:00
lilong.129
b5fffdf548 move ghdc to pkg 2025-03-05 21:33:06 +08:00
lilong.129
e107389d6e refactor: move uixt pkg 2025-03-05 11:04:02 +08:00