- Add action mapping for UI-TARS parser to convert action names to option.ActionName
- Implement bounding box to center point coordinate conversion for better accuracy
- Update coordinate normalization to handle coordinates > 1000 properly
- Enhance test cases to verify coordinate scaling and center point conversion
- Improve action argument processing with proper coordinate transformation
- Add comprehensive test coverage for coordinate conversion edge cases
Key improvements:
- Bounding box [x1,y1,x2,y2] is now converted to its center point [cx,cy] = [(x1+x2)/2, (y1+y2)/2] for actions
- Coordinate scaling properly handles different screen resolutions
- Action names are mapped through doubao_1_5_ui_tars_action_mapping
- Enhanced error handling for invalid coordinate formats
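
A minimal sketch of the bounding-box-to-center conversion and the scaling step described above. The function names are hypothetical, and the 0..1000 coordinate grid is an assumption about the model output that this commit message does not confirm. Handling of values above 1000 is not shown because the message does not specify it.

```go
package main

import (
	"errors"
	"fmt"
)

// bboxCenter returns the center point of a bounding box [x1, y1, x2, y2].
// Hypothetical helper, not the parser's actual function.
func bboxCenter(box []float64) (cx, cy float64, err error) {
	if len(box) != 4 {
		return 0, 0, errors.New("bounding box must have exactly 4 values")
	}
	return (box[0] + box[2]) / 2, (box[1] + box[3]) / 2, nil
}

// scaleToScreen maps a point from an assumed 0..1000 grid to screen pixels.
func scaleToScreen(x, y float64, width, height int) (px, py float64) {
	const grid = 1000.0 // assumption: model coordinates are relative to 1000
	return x * float64(width) / grid, y * float64(height) / grid
}

func main() {
	cx, cy, err := bboxCenter([]float64{100, 200, 300, 400})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	px, py := scaleToScreen(cx, cy, 1080, 1920)
	fmt.Printf("center=(%.0f,%.0f) pixels=(%.0f,%.0f)\n", cx, cy, px, py)
}
```

Converting to the center first and scaling afterwards keeps the scaling step independent of whether the model returned a box or a point.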
- Move MobileAction struct from uixt package to uixt/option package
- Delete uixt/driver_action.go, since MobileAction now lives in the uixt/option package
- Update all import statements across the codebase to use option.MobileAction
- Update ActionTool interface to use option.MobileAction in ConvertActionToCallToolRequest method
- Maintain backward compatibility while improving package organization
- Clean up code structure by consolidating action-related types in option package
Files affected:
- server/uixt.go: Updated imports and type references
- step.go: Updated imports and ActionResult struct
- step_ui.go: Updated all MobileAction references to option.MobileAction
- uixt/mcp_server.go: Updated ActionTool interface and removed detailed comments
- uixt/mcp_server_test.go: Updated all test cases to use option.MobileAction
- uixt/mcp_tools_*.go: Updated ConvertActionToCallToolRequest method signatures
- uixt/option/action.go: Added MobileAction struct definition
- uixt/sdk.go: Updated ExecuteAction method signature
- Document the complete integration process of ActionOptions and Request structures
- Include detailed statistics: 40 tools migrated with 100% test pass rate
- Provide technical implementation details and usage examples
- Record backward compatibility guarantees and migration helpers
- Summarize code quality improvements and performance optimizations
- Outline future development plans and goals
This documentation serves as a complete record of the unification initiative
and provides guidance for future development and maintenance.
- Added ACTION_Swipe to option/action.go for generic swipe functionality
- Implemented ToolSwipe in mcp_server.go, which automatically detects the parameter type:
  - String params (up/down/left/right) use direction-based swipe logic
  - Array params [fromX, fromY, toX, toY] use coordinate-based swipe logic
- Added comprehensive test coverage for ToolSwipe in mcp_server_test.go
- Updated tool registration to include the new generic swipe tool
- All tests pass, confirming backward compatibility with existing tools
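
The parameter detection described above could look roughly like the following type switch. This is a sketch only: classifySwipeParams and its return labels are hypothetical and are not the ToolSwipe implementation, and the array form is assumed to arrive as []float64.

```go
package main

import (
	"errors"
	"fmt"
)

// classifySwipeParams reports which swipe logic a parameter value selects.
// Hypothetical helper for illustration only.
func classifySwipeParams(params interface{}) (string, error) {
	switch v := params.(type) {
	case string:
		// Direction-based swipe: only the four named directions are valid.
		switch v {
		case "up", "down", "left", "right":
			return "direction", nil
		}
		return "", fmt.Errorf("unknown swipe direction %q", v)
	case []float64:
		// Coordinate-based swipe: expects [fromX, fromY, toX, toY].
		if len(v) != 4 {
			return "", errors.New("coordinate swipe needs exactly 4 values")
		}
		return "coordinate", nil
	default:
		return "", fmt.Errorf("unsupported swipe params type %T", params)
	}
}

func main() {
	for _, p := range []interface{}{"up", []float64{0.5, 0.8, 0.5, 0.2}, 42} {
		kind, err := classifySwipeParams(p)
		fmt.Println(kind, err)
	}
}
```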
- Removed IgnoreNotFoundError and MaxRetryTimes parameters from TapRequest, TapAbsXYRequest, and DoubleTapXYRequest structures
- Updated the corresponding tool implementations to drop references to the removed fields
- These parameters do not apply to coordinate-based operations, which involve no element search
- Only OCR/CV-based operations need these error handling parameters
As a result, only tools that search for elements expose ignore_NotFoundError,
which keeps the API consistent and avoids confusion.
- Fixed TapByOCR and TapByCV tools to properly handle ignore_NotFoundError option
- Added option parameters to all MCP tool request structures
- Fixed ConvertActionToCallToolRequest methods to extract action options
- Added extractActionOptionsToArguments helper function for consistent option handling
- Extended the fix to the other affected MCP tools: SwipeToTapApp, SwipeToTapText, SwipeToTapTexts, TapXY, TapAbsXY
- Added comprehensive tests for option parameter handling
- Updated test expectations to match actual registered tools
This ensures that when ignore_NotFoundError is set to true, OCR/CV operations
return nil instead of an error when the target element is not found,
so tests can continue executing as expected.