- Add new AI Querier module for structured information extraction from screenshots
- Support custom output schema for structured data response
- Implement automatic type conversion and data validation
- Add comprehensive test suite with various data structure examples
- Refactor callModelWithLogging to utils.go as shared function for planner, asserter, and querier
- Eliminate code duplication across AI modules (30+ lines of repeated code)
- Improve maintainability with unified logging and timing logic
- Add environment variable checks in test setup to handle missing API keys gracefully
Key features:
- Custom output schema support with JSON Schema generation
- Automatic data type conversion with reflection
- Fallback mechanisms for robust parsing
- Comprehensive documentation and usage examples
- Backward compatibility with existing functionality
- Handle direction parameter in convertProcessedArgs for scroll actions
- Ensure scroll operations map to swipe with both coordinates and direction
- Add comprehensive test coverage for scroll action parsing
- Fix issue where scroll direction was missing from tool call arguments
- Replace byte-based brace counting with UTF-8 aware rune iteration
- Add proper string state tracking to handle escaped quotes
- Add comprehensive test cases for Chinese character handling
- Fix parsing errors when JSON contains Chinese text like 2048经典
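The rune-based brace counting with string state tracking described above could look roughly like the sketch below. The function name extractJSONObject and the sample reply are hypothetical and are not taken from the actual parser; this only illustrates the technique.

```go
package main

import "fmt"

// extractJSONObject returns the first balanced {...} object found in s.
// Hypothetical helper for illustration; not the project's actual function.
func extractJSONObject(s string) (string, bool) {
	start := -1       // byte offset of the opening brace
	depth := 0        // current brace nesting depth
	inString := false // true while inside a JSON string literal
	escaped := false  // true when the previous rune was a backslash in a string

	// Ranging over a string yields whole runes together with their byte
	// offsets, so multi-byte characters such as "经典" are never split.
	for i, r := range s {
		if inString {
			switch {
			case escaped:
				escaped = false // this rune is escaped, e.g. the quote in \"
			case r == '\\':
				escaped = true
			case r == '"':
				inString = false
			}
			continue
		}
		switch r {
		case '"':
			// Only track strings inside an object; quotes in the
			// surrounding prose are ignored.
			if depth > 0 {
				inString = true
			}
		case '{':
			if depth == 0 {
				start = i
			}
			depth++
		case '}':
			if depth > 0 {
				depth--
				if depth == 0 {
					return s[start : i+1], true
				}
			}
		}
	}
	return "", false
}

func main() {
	reply := `Thought: open the game {"app": "2048经典", "note": "brace } and \" quote"} done`
	obj, ok := extractJSONObject(reply)
	fmt.Println(ok, obj)
}
```

Braces and quotes that appear inside string values do not affect the depth counter, which is the case the escaped-quote tracking is meant to cover.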
- Fix JSON extraction logic by prioritizing brace counting method
- Add support for DOUBAO string array coordinate format
- Introduce IS_UI_TARS helper function for model type checking
- Add comprehensive tests for JSON parsing and coordinate handling
- Improve error handling with retry delays for LLM service failures
- Implement lazy loading for directory creation in config.go
- Add logFile parameter to InitLogger for better control
- Use dynamic directory existence check instead of flags
- Use console-only logging for the report command to prevent directory creation
- Support both JSON and colorized console output formats
- Maintain backward compatibility for all other commands
Changes:
- config.go: Convert directory paths to getter methods with lazy creation
- logger.go: Add logFile parameter and improve logging control
- cmd/root.go: Detect report command and disable file logging
- uixt/*: Update all references to use new getter methods
Fixes the issue where 'hrp report results/' would create unwanted timestamp directories
- Add RegisterTools method to ILLMService interface
- Create shared MCP to eino tool converter
- Auto-register built-in uixt tools in XTDriver initialization
- Refactor MCPHost to use shared converter
- Add comprehensive test coverage for tool conversion
This enables the doubao-1.5-thinking-vision-pro model to access
MCP tools through the function-calling mechanism.
- Add ModelName field to PlanningResult and SubActionResult
- Update HTML report with improved layout and model name display
- Fix elapsed time setting bug and enhance mobile responsiveness
- Fix parameter mapping issue where AI model's 'content' parameter wasn't mapped to 'text' field
- Add mapParameterName function to handle parameter name mapping (content->text, key->keycode)
- Add comprehensive unit tests for convertProcessedArgs and mapParameterName functions
- Update existing test cases to match new parameter format (x,y for single coords, from_x,from_y,to_x,to_y for drag)
This resolves the issue where the uixt__input action was not working due to a parameter name mismatch.
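The parameter name mapping can be sketched as a small lookup table. Only the two pairs named above (content->text, key->keycode) are included; the mapArguments helper and the table name are hypothetical additions for illustration.

```go
package main

import "fmt"

// parameterAliases holds the two renames named in the commit message.
// The variable name is an assumption made for this sketch.
var parameterAliases = map[string]string{
	"content": "text",
	"key":     "keycode",
}

// mapParameterName returns the tool-side name for a model-side parameter,
// leaving unknown names unchanged.
func mapParameterName(name string) string {
	if mapped, ok := parameterAliases[name]; ok {
		return mapped
	}
	return name
}

// mapArguments (hypothetical helper) returns a copy of args with keys renamed.
func mapArguments(args map[string]any) map[string]any {
	out := make(map[string]any, len(args))
	for k, v := range args {
		out[mapParameterName(k)] = v
	}
	return out
}

func main() {
	fmt.Println(mapArguments(map[string]any{"content": "hello", "x": 100}))
}
```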
- Support configuring multiple LLM services simultaneously
- Auto-derive model names from service types to simplify configuration
- Maintain backward compatibility with existing configurations
- Refactor configuration logic into dedicated env module
- Add comprehensive unit test coverage
- Update documentation with new configuration approach
- Add ResetHistory option to PlanningOptions and ActionOptions
- Implement task completion detection with isTaskFinished() method
- Add executeActions() method to separate action execution logic
- Modify ConversationHistory.Clear() to completely clear all messages including system message
- Refactor StartToGoal() to automatically reset history on first attempt
- Add WithResetHistory() option function for consistent API
- Consolidate test files into driver_ext_ai_test.go with comprehensive test coverage
- Remove all manual ReturnSchema() methods from tools
- Implement automatic schema generation using reflection
- Unify response format to flat structure with action/success/message fields
- Simplify tool implementation by removing MCPResponse embedding
- Update documentation to reflect new architecture
- Achieve ~70% code reduction while maintaining type safety
- Add ToolStartToGoal implementation with AI-driven goal automation
- Fix the "LLM service not initialized" issue by applying the global AI config when creating XTDriver
- Ensure XTDriver is created with proper AI services from the first initialization
- Add StartToGoal method to StepMobile for goal-oriented automation
- Register ToolStartToGoal in MCP server and add corresponding action type
- Add comprehensive test case for StartToGoal functionality
- Fix ReturnSchema consistency across AI tools (StartToGoal, AIAction, Finished)
- Extract AI service options in MCP argument processing
This resolves the root cause where XTDriver was created without AI services
in runStepMobileUI, ensuring only one XTDriver initialization with complete
AI service configuration.
- Remove redundant ActionSummary field from PlanningResult struct
- Update parsers to use unified Thought field instead of duplicate fields
- Modify chat interface to display Thought instead of ActionSummary
- Update planner logging to use thought instead of summary
- Adjust prompt templates to use thought field consistently
- Switch test LLM service from UI-TARS to DoubaoVL
- Add default parameter handling for sleep tool
- Add action mapping for UI-TARS parser to convert action names to option.ActionName
- Implement bounding box to center point coordinate conversion for better accuracy
- Update coordinate normalization to handle coordinates > 1000 properly
- Enhance test cases to verify coordinate scaling and center point conversion
- Improve action argument processing with proper coordinate transformation
- Add comprehensive test coverage for coordinate conversion edge cases
Key improvements:
- Bounding box [x1,y1,x2,y2] now converts to center point [cx,cy] for actions
- Coordinate scaling properly handles different screen resolutions
- Action names are mapped through doubao_1_5_ui_tars_action_mapping
- Enhanced error handling for invalid coordinate formats
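As an illustration of the bounding box to center point conversion, here is a minimal sketch. The function names are hypothetical, and the 0..1000 relative grid used for scaling is an assumption made for this example; the parser's actual normalization rules, including how values above 1000 are treated, are not reproduced here.

```go
package main

import (
	"errors"
	"fmt"
)

// bboxCenter converts [x1, y1, x2, y2] into its center point (cx, cy).
func bboxCenter(box []float64) (cx, cy float64, err error) {
	if len(box) != 4 {
		return 0, 0, errors.New("bounding box must have 4 values [x1,y1,x2,y2]")
	}
	return (box[0] + box[2]) / 2, (box[1] + box[3]) / 2, nil
}

// scaleToScreen maps a point on an assumed 0..1000 relative grid to pixels.
func scaleToScreen(x, y float64, width, height int) (float64, float64) {
	const grid = 1000.0 // assumption: model coordinates are relative to 1000
	return x * float64(width) / grid, y * float64(height) / grid
}

func main() {
	cx, cy, err := bboxCenter([]float64{100, 200, 300, 400})
	if err != nil {
		panic(err)
	}
	px, py := scaleToScreen(cx, cy, 1080, 2400)
	fmt.Println(cx, cy, px, py)
}
```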
- Move MobileAction struct from uixt package to uixt/option package
- Delete uixt/driver_action.go file as MobileAction is now in option package
- Update all import statements across the codebase to use option.MobileAction
- Update ActionTool interface to use option.MobileAction in ConvertActionToCallToolRequest method
- Maintain backward compatibility while improving package organization
- Clean up code structure by consolidating action-related types in option package
Files affected:
- server/uixt.go: Updated imports and type references
- step.go: Updated imports and ActionResult struct
- step_ui.go: Updated all MobileAction references to option.MobileAction
- uixt/mcp_server.go: Updated ActionTool interface and removed detailed comments
- uixt/mcp_server_test.go: Updated all test cases to use option.MobileAction
- uixt/mcp_tools_*.go: Updated ConvertActionToCallToolRequest method signatures
- uixt/option/action.go: Added MobileAction struct definition
- uixt/sdk.go: Updated ExecuteAction method signature
- Add detailed documentation for HttpRunner AI module
- Cover planning, assertion, computer vision, and session management
- Include architecture design, usage guide, and configuration
- Provide code examples and best practices
- Document all core components and interfaces
- Add detailed package-level documentation for mcp_server.go
- Create MCP_SERVER_DOCUMENTATION.md with complete implementation guide
- Create MCP_TOOLS_REFERENCE.md with quick reference for all tools
- Add extensive code comments for key structures and functions
- Document architecture, features, extension guide, and best practices
- Include usage examples and troubleshooting information
This provides complete documentation for developers to understand,
use, and extend the HttpRunner MCP server functionality.
- Replace all mapToStruct calls with parseActionOptions function
- Add parseActionOptions implementation for MCP request parameter parsing
- Remove references to the undefined mapToStruct function that were causing compilation errors
- Standardize parameter names (fromX/fromY/toX/toY -> from_x/from_y/to_x/to_y)
- Add AntiRisk support for TapAbsXY and Drag tools
- Improve parameter validation for Drag tool
- Update corresponding test cases to match new parameter names
This fixes compilation errors and ensures all MCP tools work correctly.
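One common way to parse request arguments into a typed options struct is a JSON round trip, sketched below for the drag parameters. The dragOptions type, the parseDragOptions function, and the anti_risk key are hypothetical; the actual parseActionOptions implementation may work differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dragOptions is a hypothetical subset of the options used by a drag tool.
type dragOptions struct {
	FromX    float64 `json:"from_x"`
	FromY    float64 `json:"from_y"`
	ToX      float64 `json:"to_x"`
	ToY      float64 `json:"to_y"`
	AntiRisk bool    `json:"anti_risk,omitempty"` // key name is an assumption
}

// parseDragOptions validates required keys, then decodes args into a struct.
func parseDragOptions(args map[string]any) (*dragOptions, error) {
	for _, key := range []string{"from_x", "from_y", "to_x", "to_y"} {
		if _, ok := args[key]; !ok {
			return nil, fmt.Errorf("missing required parameter %q", key)
		}
	}
	raw, err := json.Marshal(args)
	if err != nil {
		return nil, fmt.Errorf("marshal arguments: %w", err)
	}
	var opts dragOptions
	if err := json.Unmarshal(raw, &opts); err != nil {
		return nil, fmt.Errorf("unmarshal arguments: %w", err)
	}
	return &opts, nil
}

func main() {
	opts, err := parseDragOptions(map[string]any{
		"from_x": 10.0, "from_y": 20.0, "to_x": 30.0, "to_y": 40.0,
	})
	fmt.Println(opts, err)

	// Old-style camelCase names are rejected by the required-key check.
	_, err = parseDragOptions(map[string]any{"fromX": 10.0})
	fmt.Println(err)
}
```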
- Document the complete integration process of ActionOptions and Request structures
- Include detailed statistics: 40 tools migrated with 100% test pass rate
- Provide technical implementation details and usage examples
- Record backward compatibility guarantees and migration helpers
- Summarize code quality improvements and performance optimizations
- Outline future development plans and goals
This documentation serves as a complete record of the unification initiative
and provides guidance for future development and maintenance.
- Modified ToolSwipe.ConvertActionToCallToolRequest to delegate to ToolSwipeDirection and ToolSwipeCoordinate
- Removed duplicate parameter handling logic in favor of reusing existing implementations
- Fixed linter error by removing unused variable
- Maintained backward compatibility while reducing code duplication
- All tests pass, confirming the refactoring is successful
- Added ACTION_Swipe to option/action.go for generic swipe functionality
- Implemented ToolSwipe in mcp_server.go that automatically detects parameter type:
  - String params (up/down/left/right) use direction-based swipe logic
  - Array params [fromX, fromY, toX, toY] use coordinate-based swipe logic
- Added comprehensive test coverage for ToolSwipe in mcp_server_test.go
- Updated tool registration to include the new generic swipe tool
- All tests pass, confirming backward compatibility with existing tools
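The parameter type detection for the generic swipe tool can be illustrated with a type switch. This is a sketch under assumptions: classifySwipeParams is a hypothetical name, and the real ToolSwipe delegates to the existing direction and coordinate tools rather than returning a label.

```go
package main

import "fmt"

// classifySwipeParams reports whether params describe a direction-based or a
// coordinate-based swipe. Hypothetical helper for illustration only.
func classifySwipeParams(params any) (string, error) {
	switch v := params.(type) {
	case string:
		switch v {
		case "up", "down", "left", "right":
			return "direction", nil
		}
		return "", fmt.Errorf("unknown swipe direction %q", v)
	case []float64:
		if len(v) != 4 {
			return "", fmt.Errorf("expected 4 coordinates, got %d", len(v))
		}
		return "coordinate", nil
	case []any:
		// Arrays decoded from JSON arrive as []any holding float64 values.
		if len(v) != 4 {
			return "", fmt.Errorf("expected 4 coordinates, got %d", len(v))
		}
		for _, e := range v {
			if _, ok := e.(float64); !ok {
				return "", fmt.Errorf("coordinate %v is not a number", e)
			}
		}
		return "coordinate", nil
	default:
		return "", fmt.Errorf("unsupported swipe params type %T", params)
	}
}

func main() {
	fmt.Println(classifySwipeParams("up"))
	fmt.Println(classifySwipeParams([]any{0.5, 0.8, 0.5, 0.2}))
	fmt.Println(classifySwipeParams(42))
}
```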