- Add complete HTML report generator with template-based rendering
- Implement log time filtering for step-specific logs
- Support responsive design and interactive UI features
- Consolidate duplicate report implementations
- Use defer for summary saving and HTML report generation to ensure they run regardless of exit path
- Remove unnecessary sync.Once for cleanup operations since defer guarantees single execution
- Simplify error handling logic by removing redundant runErr checks
- Improve interrupt handling with better logging messages
- Ensure graceful cleanup and data persistence even when interrupted
- Fix parameter mapping issue where AI model's 'content' parameter wasn't mapped to 'text' field
- Add mapParameterName function to handle parameter name mapping (content->text, key->keycode)
- Add comprehensive unit tests for convertProcessedArgs and mapParameterName functions
- Update existing test cases to match new parameter format (x,y for single coords, from_x,from_y,to_x,to_y for drag)
This resolves the issue where uixt__input action was not working due to parameter name mismatch.
- Support configuring multiple LLM services simultaneously
- Auto-derive model names from service types to simplify configuration
- Maintain backward compatibility with existing configurations
- Refactor configuration logic into dedicated env module
- Add comprehensive unit test coverage
- Update documentation with new configuration approach
- Add ResetHistory option to PlanningOptions and ActionOptions
- Implement task completion detection with isTaskFinished() method
- Add executeActions() method to separate action execution logic
- Modify ConversationHistory.Clear() to completely clear all messages including system message
- Refactor StartToGoal() to automatically reset history on first attempt
- Add WithResetHistory() option function for consistent API
- Consolidate test files into driver_ext_ai_test.go with comprehensive test coverage
- Remove all manual ReturnSchema() methods from tools
- Implement automatic schema generation using reflection
- Unify response format to flat structure with action/success/message fields
- Simplify tool implementation by removing MCPResponse embedding
- Update documentation to reflect new architecture
- Achieve ~70% code reduction while maintaining type safety
- Add ToolStartToGoal implementation with AI-driven goal automation
- Fix LLM service not initialized issue by applying global AI config to XTDriver creation
- Ensure XTDriver is created with proper AI services from the first initialization
- Add StartToGoal method to StepMobile for goal-oriented automation
- Register ToolStartToGoal in MCP server and add corresponding action type
- Add comprehensive test case for StartToGoal functionality
- Fix ReturnSchema consistency across AI tools (StartToGoal, AIAction, Finished)
- Extract AI service options in MCP argument processing
This resolves the root cause where XTDriver was created without AI services
in runStepMobileUI, ensuring only one XTDriver initialization with complete
AI service configuration.
- Remove redundant ActionSummary field from PlanningResult struct
- Update parsers to use unified Thought field instead of duplicate fields
- Modify chat interface to display Thought instead of ActionSummary
- Update planner logging to use thought instead of summary
- Adjust prompt templates to use thought field consistently
- Switch test LLM service from UI-TARS to DoubaoVL
- Add default parameter handling for sleep tool
- Add action mapping for UI-TARS parser to convert action names to option.ActionName
- Implement bounding box to center point coordinate conversion for better accuracy
- Update coordinate normalization to handle coordinates > 1000 properly
- Enhance test cases to verify coordinate scaling and center point conversion
- Improve action argument processing with proper coordinate transformation
- Add comprehensive test coverage for coordinate conversion edge cases
Key improvements:
- Bounding box [x1,y1,x2,y2] now converts to center point [cx,cy] for actions
- Coordinate scaling properly handles different screen resolutions
- Action names are mapped through doubao_1_5_ui_tars_action_mapping
- Enhanced error handling for invalid coordinate formats
- Move MobileAction struct from uixt package to uixt/option package
- Delete uixt/driver_action.go file as MobileAction is now in option package
- Update all import statements across the codebase to use option.MobileAction
- Update ActionTool interface to use option.MobileAction in ConvertActionToCallToolRequest method
- Maintain backward compatibility while improving package organization
- Clean up code structure by consolidating action-related types in option package
Files affected:
- server/uixt.go: Updated imports and type references
- step.go: Updated imports and ActionResult struct
- step_ui.go: Updated all MobileAction references to option.MobileAction
- uixt/mcp_server.go: Updated ActionTool interface and removed detailed comments
- uixt/mcp_server_test.go: Updated all test cases to use option.MobileAction
- uixt/mcp_tools_*.go: Updated ConvertActionToCallToolRequest method signatures
- uixt/option/action.go: Added MobileAction struct definition
- uixt/sdk.go: Updated ExecuteAction method signature
- Add identification for normal shutdown pipe errors in startStdioLog
- Optimize stdio log error handling logic to distinguish between normal shutdown and actual errors
- Add proper handling for SIGTERM (exit status 143) in isSignalError function
- Add debug logging for MCP config loading process
- Ensure clean shutdown without confusing error messages
- Add support for loading environment variables from ~/.hrp/.env
- Implement priority order: current working directory > ~/.hrp/.env > system environment variables
- Use godotenv.Overload() for both global and local .env files to ensure proper priority
- Maintain backward compatibility with existing functionality
- Add comprehensive error handling and logging
- Add detailed documentation for HttpRunner AI module
- Cover planning, assertion, computer vision, and session management
- Include architecture design, usage guide, and configuration
- Provide code examples and best practices
- Document all core components and interfaces
- Add detailed package-level documentation for mcp_server.go
- Create MCP_SERVER_DOCUMENTATION.md with complete implementation guide
- Create MCP_TOOLS_REFERENCE.md with quick reference for all tools
- Add extensive code comments for key structures and functions
- Document architecture, features, extension guide, and best practices
- Include usage examples and troubleshooting information
This provides complete documentation for developers to understand,
use, and extend the HttpRunner MCP server functionality.
- Replace all mapToStruct calls with parseActionOptions function
- Add parseActionOptions implementation for MCP request parameter parsing
- Remove undefined mapToStruct function that was causing compilation errors
- Standardize parameter names (fromX/fromY/toX/toY -> from_x/from_y/to_x/to_y)
- Add AntiRisk support for TapAbsXY and Drag tools
- Improve parameter validation for Drag tool
- Update corresponding test cases to match new parameter names
This fixes compilation errors and ensures all MCP tools work correctly.
- Add AntiRisk field to TConfig struct for global anti-risk switch
- Add SetAntiRisk method to configure global anti-risk setting
- Implement automatic AntiRisk application in mobile UI steps
- Global AntiRisk setting applies to all actions unless explicitly disabled
- Maintains backward compatibility with existing action-level AntiRisk settings
- Document the complete integration process of ActionOptions and Request structures
- Include detailed statistics: 40 tools migrated with 100% test pass rate
- Provide technical implementation details and usage examples
- Record backward compatibility guarantees and migration helpers
- Summarize code quality improvements and performance optimizations
- Outline future development plans and goals
This documentation serves as a complete record of the unification initiative
and provides guidance for future development and maintenance.
- Modified ToolSwipe.ConvertActionToCallToolRequest to delegate to ToolSwipeDirection and ToolSwipeCoordinate
- Removed duplicate parameter handling logic in favor of reusing existing implementations
- Fixed linter error by removing unused variable
- Maintained backward compatibility while reducing code duplication
- All tests pass, confirming the refactoring is successful
- Added ACTION_Swipe to option/action.go for generic swipe functionality
- Implemented ToolSwipe in mcp_server.go that automatically detects parameter type:
- String params (up/down/left/right) use direction-based swipe logic
- Array params [fromX, fromY, toX, toY] use coordinate-based swipe logic
- Added comprehensive test coverage for ToolSwipe in mcp_server_test.go
- Updated tool registration to include the new generic swipe tool
- All tests pass, confirming backward compatibility with existing tools
- Merged all individual MCP tool test functions from mcp_tools_test.go into mcp_server_test.go
- Added require import for additional test assertions
- Removed duplicate TestMCPServer4XTDriver function
- Deleted the original mcp_tools_test.go file
- All 39 MCP tools now have comprehensive unit tests in a single file
- Tests cover tool name, description, options, and request conversion functionality
- Removed IgnoreNotFoundError and MaxRetryTimes parameters from TapRequest, TapAbsXYRequest, and DoubleTapXYRequest structures
- Updated corresponding tool implementations to remove references to these non-existent fields
- These parameters are not applicable to coordinate-based operations as they don't involve element searching
- Only OCR/CV-based operations need these error handling parameters
This ensures that only relevant tools have the ignore_NotFoundError functionality,
making the API more consistent and avoiding confusion.
- Fixed TapByOCR and TapByCV tools to properly handle ignore_NotFoundError option
- Added option parameters to all MCP tool request structures
- Fixed ConvertActionToCallToolRequest methods to extract action options
- Added extractActionOptionsToArguments helper function for consistent option handling
- Extended fix to all MCP tools: SwipeToTapApp, SwipeToTapText, SwipeToTapTexts, TapXY, TapAbsXY
- Added comprehensive tests for option parameter handling
- Updated test expectations to match actual registered tools
This ensures that when ignore_NotFoundError is set to true, OCR/CV operations
will return nil instead of throwing errors when target elements are not found,
allowing tests to continue execution as expected.