Commit Graph

4541 Commits

Author SHA1 Message Date
lilong.129
caf75b087b fix: remove unnecessary tests 2025-06-10 22:52:52 +08:00
lilong.129
514d321188 refactor: remove toggle buttons and expand all actions by default in HTML report 2025-06-10 21:24:21 +08:00
lilong.129
81a92ae155 docs: update AI module README with latest features
- Add comprehensive documentation for the new Query functionality
- Update interface method names from Call to Plan for consistency
- Add OpenAI GPT-4o model support documentation
- Include detailed usage examples for basic and custom schema queries
- Add configuration examples for multiple model services
- Document new features like ResetHistory, Usage statistics, and automatic type conversion
- Expand advanced features section with custom output format examples
- Update all code examples to reflect the latest API changes

The documentation now reflects the current state of the AI module with all three core capabilities:
- Planning (renamed from Call)
- Assertion
- Query (new feature)

All examples and configurations are updated to match the latest implementation.
2025-06-10 20:52:44 +08:00
lilong.129
c513e56d30 feat: add Query method to ILLMService interface
- Add Query method to ILLMService interface for unified AI service access
- Update combinedLLMService to include querier functionality
- Add comprehensive tests for ILLMService Query method
- Support both basic query and custom schema query through unified interface
- Add environment variable checks for test reliability

This allows users to access all AI capabilities (planning, assertion, and query) 
through a single ILLMService interface, providing better API consistency and ease of use.
2025-06-10 20:45:49 +08:00
lilong.129
7c45acd061 feat: add AI Querier module with custom output schema support and refactor common model calling logic
- Add new AI Querier module for structured information extraction from screenshots
- Support custom output schema for structured data response
- Implement automatic type conversion and data validation
- Add comprehensive test suite with various data structure examples
- Refactor callModelWithLogging to utils.go as shared function for planner, asserter, and querier
- Eliminate code duplication across AI modules (30+ lines of repeated code)
- Improve maintainability with unified logging and timing logic
- Add environment variable checks in test setup to handle missing API keys gracefully

Key features:
- Custom output schema support with JSON Schema generation
- Automatic data type conversion with reflection
- Fallback mechanisms for robust parsing
- Comprehensive documentation and usage examples
- Backward compatibility with existing functionality
2025-06-10 20:41:35 +08:00
lilong.129
fa9a53d2ae change: add test 2025-06-10 18:16:36 +08:00
lilong.129
304abe653a feat: optimize HTML report layout and clean up redundant code
- Redesign planning section with three-column layout
- Improve screenshot display with adaptive sizing
- Enhance actions details presentation
- Add compact request toggle functionality
- Remove unused CSS styles and redundant code
- Improve responsive design for mobile devices
2025-06-10 18:13:19 +08:00
lilong.129
9c906934fd fix: resolve Chinese character encoding issue in HTML report downloads
- Add decodeBase64UTF8 function to properly handle UTF-8 encoded Base64 content
- Replace atob() with TextDecoder for correct Chinese character decoding
- Explicitly specify UTF-8 charset when creating download Blob
- Fix garbled Chinese text when downloading summary.json from HTML report
2025-06-10 17:07:08 +08:00
lilong.129
98bd41ff33 fix: add direction parameter support for scroll operations in UI-TARS parser
- Handle direction parameter in convertProcessedArgs for scroll actions
- Ensure scroll operations map to swipe with both coordinates and direction
- Add comprehensive test coverage for scroll action parsing
- Fix issue where scroll direction was missing from tool call arguments
2025-06-10 16:40:10 +08:00
lilong.129
c322d7c36c fix: improve JSON extraction to handle UTF-8 Chinese characters properly
- Replace byte-based brace counting with UTF-8 aware rune iteration
- Add proper string state tracking to handle escaped quotes
- Add comprehensive test cases for Chinese character handling
- Fix parsing errors when JSON contains Chinese text like 2048经典
2025-06-10 16:09:50 +08:00
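A minimal Go sketch of the technique commit c322d7c36c describes: iterate over runes, track whether the scanner is inside a string (including escaped quotes), and count braces only outside strings. The function name extractFirstJSONObject is hypothetical and chosen for illustration; this is not the project's implementation.

```go
package main

import "fmt"

// extractFirstJSONObject returns the first balanced {...} object found in s.
// Hypothetical helper for illustration only.
func extractFirstJSONObject(s string) (string, bool) {
	start := -1       // byte offset of the opening brace
	depth := 0        // current brace nesting level
	inString := false // true while inside a JSON string literal
	escaped := false  // true right after a backslash inside a string

	// Ranging over a string yields runes together with their byte offsets,
	// so multi-byte characters are never split.
	for i, r := range s {
		if inString {
			switch {
			case escaped:
				escaped = false
			case r == '\\':
				escaped = true
			case r == '"':
				inString = false
			}
			continue
		}
		switch r {
		case '"':
			// Only track strings once an object has started, so stray
			// quotes in surrounding prose do not confuse the scanner.
			if depth > 0 {
				inString = true
			}
		case '{':
			if depth == 0 {
				start = i
			}
			depth++
		case '}':
			if depth > 0 {
				depth--
				if depth == 0 {
					return s[start : i+1], true
				}
			}
		}
	}
	return "", false
}

func main() {
	content := "Thought: open the game\n" +
		`{"app": "2048经典", "note": "brace } and \" quote"}` +
		" trailing text"
	obj, ok := extractFirstJSONObject(content)
	fmt.Println(ok, obj)
}
```

Braces and quotes that appear inside string values are skipped, so the returned slice covers exactly one top-level object.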
lilong.129
88ae8faee1 feat: enhance VLM response parsing and DOUBAO model support
- Fix JSON extraction logic by prioritizing brace counting method
- Add support for DOUBAO string array coordinate format
- Introduce IS_UI_TARS helper function for model type checking
- Add comprehensive tests for JSON parsing and coordinate handling
- Improve error handling with retry delays for LLM service failures
2025-06-10 15:56:13 +08:00
lilong.129
4959c2e47e feat: extractJSONFromContent 2025-06-10 14:08:44 +08:00
lilong.129
7dc0f869be fix: extract JSON content from various formats in the response 2025-06-10 14:02:41 +08:00
lilong.129
90401eeb78 change: remove unnecessary logs 2025-06-10 13:19:36 +08:00
lilong.129
f5f6d177ab fix: optimize report command to avoid creating timestamp directories
- Implement lazy loading for directory creation in config.go
- Add logFile parameter to InitLogger for better control
- Use dynamic directory existence check instead of flags
- Report command now uses console-only logging to prevent directory creation
- Support both JSON and colorized console output formats
- Maintain backward compatibility for all other commands

Changes:
- config.go: Convert directory paths to getter methods with lazy creation
- logger.go: Add logFile parameter and improve logging control
- cmd/root.go: Detect report command and disable file logging
- uixt/*: Update all references to use new getter methods

Fixes the issue where 'hrp report results/' would create unwanted timestamp directories
2025-06-10 12:06:08 +08:00
lilong.129
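6588d95154 fix: garbled Chinese characters in summary.json
- Improve how the Dump2JSON function writes files so that UTF-8 encoding is handled correctly
- Add a file sync operation to prevent incomplete data
- Add UTF-8 encoding tests to verify the fix
- Apply the same file-writing improvement to HTML report generation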
2025-06-10 11:03:10 +08:00
lilong.129
12cebef3b9 change: set llm timeout to 120s 2025-06-09 22:42:19 +08:00
lilong.129
39acadb0a7 feat: add MCP tools registration to LLM service
- Add RegisterTools method to ILLMService interface
- Create shared MCP to eino tool converter
- Auto-register built-in uixt tools in XTDriver initialization
- Refactor MCPHost to use shared converter
- Add comprehensive test coverage for tool conversion

This enables doubao-1.5-thinking-vision-pro model to access
MCP tools through function calling mechanism.
2025-06-09 22:19:43 +08:00
lilong.129
dd52faef57 refactor: move Call function 2025-06-09 20:52:32 +08:00
lilong.129
f1544d4a5c feat: implement separate log levels for console and file output
- Console logger respects user-specified log level
- File logger always uses DEBUG level to capture all logs
- Add custom leveledMultiWriter for different output levels
- Remove global log level setting for more granular control
2025-06-09 19:16:39 +08:00
lilong.129
533c1f4bff feat: add mcp tool ToolScreenRecord 2025-06-09 17:18:26 +08:00
lilong.129
96da4515a1 feat: optimize test report UI and add LLM usage tracking 2025-06-09 17:04:55 +08:00
lilong.129
e85802cdda feat: add download for summary.json and hrp.log in report.html 2025-06-09 00:29:27 +08:00
lilong.129
a91a10ac13 docs: update cmd docs 2025-06-09 00:06:23 +08:00
lilong.129
cf360c8c46 feat: compress image data for html report 2025-06-08 23:48:23 +08:00
lilong.129
14cef72f5a feat: add model name display in AI actions and optimize HTML report
- Add ModelName field to PlanningResult and SubActionResult
- Update HTML report with improved layout and model name display
- Fix elapsed time setting bug and enhance mobile responsiveness
2025-06-08 22:08:51 +08:00
lilong.129
660e8ca124 feat: add mcp tool ToolGetForegroundApp 2025-06-08 19:25:09 +08:00
lilong.129
b9de3cf7a3 refactor: simplify AI action execution and improve sub-action handling 2025-06-08 19:16:37 +08:00
lilong.129
bdf64a08aa feat: enhance HTML report with statistics and collapsible log fields 2025-06-08 10:05:30 +08:00
lilong.129
f2607f7664 style: optimize log display for more compact layout
- Move log message to same line as timestamp and level
- Reduce padding and font sizes for tighter spacing
- Optimize log data display with left border and indentation
- Add responsive design for mobile devices
- Achieve more compact display with fewer lines per log entry
2025-06-08 09:34:21 +08:00
lilong.129
5f7698c6b4 fix: improve Chinese character display in HTML reports
- Fix JSON serialization to preserve Chinese characters instead of Unicode escaping
- Use SetEscapeHTML(false) in toJSON template function
- Apply safeHTML to prevent HTML entity encoding of Chinese text
- Now displays {"text":"连了又连"} with the Chinese text intact rather than in an escaped form
2025-06-08 09:29:41 +08:00
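A small Go sketch of the SetEscapeHTML(false) step mentioned in commit 5f7698c6b4. In the standard encoding/json package this setting stops the encoder from rewriting <, > and & as \u003c, \u003e and \u0026. The helper name toJSON matches the template function named in the commit, but its body here is an assumption, not the project's code.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strings"
)

// toJSON serializes v without HTML escaping.
// Assumed shape for illustration; the real template function may differ.
func toJSON(v any) (string, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false) // keep <, > and & as literal characters
	if err := enc.Encode(v); err != nil {
		return "", err
	}
	// Encode appends a trailing newline; drop it for inline use.
	return strings.TrimSuffix(buf.String(), "\n"), nil
}

func main() {
	s, err := toJSON(map[string]string{"text": "连了又连 <b>&"})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(s)
}
```

When the result is embedded in an html/template page, it still needs to be marked as trusted (the commit mentions a safeHTML helper for this), otherwise the template engine applies its own entity encoding.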
lilong.129
4053cc9985 feat: add comprehensive HTML report generation with log filtering
- Add complete HTML report generator with template-based rendering
- Implement log time filtering for step-specific logs
- Support responsive design and interactive UI features
- Consolidate duplicate report implementations
2025-06-08 09:23:14 +08:00
lilong.129
ec4f1eb68a refactor: unify action execution interface and merge AI action handling 2025-06-07 23:59:07 +08:00
lilong.129
fcf3009c67 fix: abnormal indent in summary.json 2025-06-07 20:45:35 +08:00
lilong.129
e75edf8400 feat: add log file output to results/taskID directory 2025-06-07 16:52:41 +08:00
lilong.129
604eed3340 refactor: optimize runner error handling and cleanup logic
- Use defer for summary saving and HTML report generation to ensure they run regardless of exit path
- Remove unnecessary sync.Once for cleanup operations since defer guarantees single execution
- Simplify error handling logic by removing redundant runErr checks
- Improve interrupt handling with better logging messages
- Ensure graceful cleanup and data persistence even when interrupted
2025-06-07 16:36:53 +08:00
lilong.129
460570f651 fix(uixt): fix uixt__input not working and add comprehensive unit tests
- Fix parameter mapping issue where AI model's 'content' parameter wasn't mapped to 'text' field
- Add mapParameterName function to handle parameter name mapping (content->text, key->keycode)
- Add comprehensive unit tests for convertProcessedArgs and mapParameterName functions
- Update existing test cases to match new parameter format (x,y for single coords, from_x,from_y,to_x,to_y for drag)

This resolves the issue where uixt__input action was not working due to parameter name mismatch.
2025-06-07 15:03:29 +08:00
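A short Go sketch of the parameter-name mapping commit 460570f651 describes (content->text, key->keycode). mapParameterName is the function name given in the commit, but the body below and the mapArguments helper are assumptions made for illustration.

```go
package main

import "fmt"

// parameterAliases lists model-side names and the field names the
// tool expects. Only the two pairs named in the commit are included.
var parameterAliases = map[string]string{
	"content": "text",
	"key":     "keycode",
}

// mapParameterName returns the tool-side name for a model-side
// parameter, or the name unchanged when no alias is registered.
func mapParameterName(name string) string {
	if mapped, ok := parameterAliases[name]; ok {
		return mapped
	}
	return name
}

// mapArguments (hypothetical) applies the mapping to a whole argument set.
func mapArguments(args map[string]any) map[string]any {
	out := make(map[string]any, len(args))
	for name, value := range args {
		out[mapParameterName(name)] = value
	}
	return out
}

func main() {
	args := mapArguments(map[string]any{"content": "hello", "x": 120})
	fmt.Println(args)
}
```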
lilong.129
334c0dc141 fix: validators not executed when a mobile step includes validate 2025-06-06 22:18:43 +08:00
lilong.129
484eebdefd feat: implement multi-model service configuration support
- Support configuring multiple LLM services simultaneously
- Auto-derive model names from service types to simplify configuration
- Maintain backward compatibility with existing configurations
- Refactor configuration logic into dedicated env module
- Add comprehensive unit test coverage
- Update documentation with new configuration approach
2025-06-06 22:17:59 +08:00
lilong.129
b642ea004e feat: implement UI automation test history isolation
- Add ResetHistory option to PlanningOptions and ActionOptions
- Implement task completion detection with isTaskFinished() method
- Add executeActions() method to separate action execution logic
- Modify ConversationHistory.Clear() to completely clear all messages including system message
- Refactor StartToGoal() to automatically reset history on first attempt
- Add WithResetHistory() option function for consistent API
- Consolidate test files into driver_ext_ai_test.go with comprehensive test coverage
2025-06-06 15:29:42 +08:00
lilong.129
6e1bd5bbe2 feat: optimize MCP tools response format with automatic schema generation
- Remove all manual ReturnSchema() methods from tools
- Implement automatic schema generation using reflection
- Unify response format to flat structure with action/success/message fields
- Simplify tool implementation by removing MCPResponse embedding
- Update documentation to reflect new architecture
- Achieve ~70% code reduction while maintaining type safety
2025-06-05 23:17:06 +08:00
lilong.129
56831845ca change: fix logs 2025-06-05 20:26:18 +08:00
lilong.129
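5f400735fc fix: StartToGoal command could not be interrupted with CTRL+C
- Add a context.Context parameter to AI-related methods to support interruption
- Add context cancellation checks in the retry loop
- Create a cancellable context and listen for interrupt signals
- Update MCP tool calls to use context-aware methods

Users can now interrupt long-running AI automation tasks with CTRL+C.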
2025-06-05 20:00:20 +08:00
lilong.129
d883aa6a21 change: rename VLM name 2025-06-05 18:09:25 +08:00
lilong.129
8cdc71d90b change: RoundToOneDecimal 2025-06-05 17:47:29 +08:00
lilong.129
c4e7ab00a7 feat: implement ToolStartToGoal and fix LLM service initialization
- Add ToolStartToGoal implementation with AI-driven goal automation
- Fix LLM service not initialized issue by applying global AI config to XTDriver creation
- Ensure XTDriver is created with proper AI services from the first initialization
- Add StartToGoal method to StepMobile for goal-oriented automation
- Register ToolStartToGoal in MCP server and add corresponding action type
- Add comprehensive test case for StartToGoal functionality
- Fix ReturnSchema consistency across AI tools (StartToGoal, AIAction, Finished)
- Extract AI service options in MCP argument processing

This resolves the root cause where XTDriver was created without AI services
in runStepMobileUI, ensuring only one XTDriver initialization with complete
AI service configuration.
2025-06-05 16:52:11 +08:00
lilong.129
0add3231ff refactor: merge ActionSummary and Thought fields to eliminate duplication
- Remove redundant ActionSummary field from PlanningResult struct
- Update parsers to use unified Thought field instead of duplicate fields
- Modify chat interface to display Thought instead of ActionSummary
- Update planner logging to use thought instead of summary
- Adjust prompt templates to use thought field consistently
- Switch test LLM service from UI-TARS to DoubaoVL
- Add default parameter handling for sleep tool
2025-06-05 14:19:09 +08:00
lilong.129
0864f74021 fix: update AI parser to use doubao-1.5-thinking-vision-pro configuration 2025-06-05 13:28:31 +08:00
lilong.129
c204542f1f feat: optimize UI-TARS parser with coordinate conversion and action mapping
- Add action mapping for UI-TARS parser to convert action names to option.ActionName
- Implement bounding box to center point coordinate conversion for better accuracy
- Update coordinate normalization to handle coordinates > 1000 properly
- Enhance test cases to verify coordinate scaling and center point conversion
- Improve action argument processing with proper coordinate transformation
- Add comprehensive test coverage for coordinate conversion edge cases

Key improvements:
- Bounding box [x1,y1,x2,y2] now converts to center point [cx,cy] for actions
- Coordinate scaling properly handles different screen resolutions
- Action names are mapped through doubao_1_5_ui_tars_action_mapping
- Enhanced error handling for invalid coordinate formats
2025-06-04 23:16:14 +08:00
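A small Go sketch of the bounding-box conversion listed in commit c204542f1f: a box [x1,y1,x2,y2] is reduced to its center point [cx,cy], then scaled to screen pixels. The function names and the assumption that model coordinates lie on a 0..1000 relative grid are illustrative; the commit does not spell out how values above 1000 are treated, so that case is left out here.

```go
package main

import "fmt"

// relativeGrid is the assumed size of the model's coordinate grid.
const relativeGrid = 1000.0

// boxCenter returns the center point of a box given as [x1, y1, x2, y2].
func boxCenter(box [4]float64) (cx, cy float64) {
	return (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
}

// scaleToScreen converts a coordinate on the relative grid into
// pixels for a screen dimension of the given size.
func scaleToScreen(rel, size float64) float64 {
	return rel / relativeGrid * size
}

func main() {
	cx, cy := boxCenter([4]float64{100, 200, 300, 400})
	// Example screen of 1080x1920 pixels (illustrative values).
	fmt.Println(scaleToScreen(cx, 1080), scaleToScreen(cy, 1920))
}
```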
lilong.129
1df529ecaa merge master 2025-06-03 18:20:55 +08:00