4549 Commits

Author SHA1 Message Date
lilong.129
b271e655b1 feat: add MCP plugin support and optimize AI service configuration
- Add UIXT runner with MCP plugin support
- Refactor AI service options handling
- Optimize configuration parsing for LLM and CV services
- Update dependencies to latest versions
2025-06-13 20:24:57 +08:00
lilong.129
409cd693f0 refactor: GetScreenshotBase64WithSize 2025-06-13 12:01:21 +08:00
lilong.129
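f6e7e970f8 feat: implement AIQuery and support OutputSchema
- Add AIQuery method to StepMobile to extract information from the screen using natural language
- Implement the full AIQuery functionality in driver_ext_ai.go, including screenshot capture and LLM querying
- Add OutputSchema support, allowing users to define custom output formats for structured queries
- Add the ToolAIQuery MCP tool, fully integrated into the MCP server
- Add the OutputSchema field and the WithOutputSchema option function to ActionOptions
- Add configuration support and field mapping for ACTION_Query
- Improve test coverage:
  * Add TestAIQuery unit tests covering multiple OutputSchema usage scenarios
  * Add TestToolAIQuery MCP tool tests
  * Define structs such as GameInfo and UIElementInfo for testing
- Update documentation:
  * Add a complete AIQuery usage guide to docs/uixt/ai.md
  * Cover basic usage, OutputSchema examples, best practices, and more
- Support complex nested structs and array types in OutputSchema
- Keep the API design consistent with the existing AIAction and AIAssert features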
2025-06-13 10:27:08 +08:00
lilong.129
fb0418fa95 feat: add LianLianKan (连连看) game bot implementation
- Add complete LianLianKan game bot with AI-powered interface analysis
- Implement static analysis solver with 0-2 turn connection algorithms
- Support cross-platform game automation (Android, iOS, HarmonyOS, Browser)
- Include comprehensive test suite with real game data
- Add command line tool and documentation
- Integrate with HttpRunner @/uixt module and Doubao AI models
2025-06-12 17:51:23 +08:00
lilong.129
72df285fed fix: get resultsPath 2025-06-12 14:51:15 +08:00
lilong.129
51ee639cac docs: update docs 2025-06-11 14:57:08 +08:00
lilong.129
fbc888655f feat: optimize ILLMService interface to support different models for each component
- Add LLMServiceConfig to support mixed model configuration
- Enable Planner, Asserter, Querier to use different optimal models
- Provide recommended configurations for various use cases
- Maintain backward compatibility with existing API
- Update documentation to reflect current state without iteration history
- Merge test files and add comprehensive configuration tests
- Resolve circular dependency by moving config to option package
2025-06-11 12:18:31 +08:00
lilong.129
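50414ec74d fix(ai): fix OpenAI structured output parsing and refactor code structure
- Fix parsing of the properties wrapper layer in OpenAI structured output
- Refactor the parseCustomSchemaResult function for better maintainability:
  - Split it into several small single-responsibility functions
  - Eliminate duplicated field extraction logic
  - Use a clear strategy pattern to handle the different parsing scenarios
- Strengthen test cases with concrete value and structure checks
- Remain fully backward compatible; all existing tests pass

Fixes: TestQueryFunctionality/ComprehensiveAnalysis test failure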
2025-06-11 11:15:02 +08:00
lilong.129
caf75b087b fix: remove unnecessary tests 2025-06-10 22:52:52 +08:00
lilong.129
514d321188 refactor: remove toggle buttons and expand all actions by default in HTML report 2025-06-10 21:24:21 +08:00
lilong.129
81a92ae155 docs: update AI module README with latest features
- Add comprehensive documentation for the new Query functionality
- Update interface method names from Call to Plan for consistency
- Add OpenAI GPT-4o model support documentation
- Include detailed usage examples for basic and custom schema queries
- Add configuration examples for multiple model services
- Document new features like ResetHistory, Usage statistics, and automatic type conversion
- Expand advanced features section with custom output format examples
- Update all code examples to reflect the latest API changes

The documentation now reflects the current state of the AI module with all three core capabilities:
- Planning (renamed from Call)
- Assertion
- Query (new feature)

All examples and configurations are updated to match the latest implementation.
2025-06-10 20:52:44 +08:00
lilong.129
c513e56d30 feat: add Query method to ILLMService interface
- Add Query method to ILLMService interface for unified AI service access
- Update combinedLLMService to include querier functionality
- Add comprehensive tests for ILLMService Query method
- Support both basic query and custom schema query through unified interface
- Add environment variable checks for test reliability

This allows users to access all AI capabilities (planning, assertion, and query) 
through a single ILLMService interface, providing better API consistency and ease of use.
2025-06-10 20:45:49 +08:00
lilong.129
7c45acd061 feat: add AI Querier module with custom output schema support and refactor common model calling logic
- Add new AI Querier module for structured information extraction from screenshots
- Support custom output schema for structured data response
- Implement automatic type conversion and data validation
- Add comprehensive test suite with various data structure examples
- Refactor callModelWithLogging to utils.go as shared function for planner, asserter, and querier
- Eliminate code duplication across AI modules (30+ lines of repeated code)
- Improve maintainability with unified logging and timing logic
- Add environment variable checks in test setup to handle missing API keys gracefully

Key features:
- Custom output schema support with JSON Schema generation
- Automatic data type conversion with reflection
- Fallback mechanisms for robust parsing
- Comprehensive documentation and usage examples
- Backward compatibility with existing functionality
2025-06-10 20:41:35 +08:00
lilong.129
fa9a53d2ae change: add test 2025-06-10 18:16:36 +08:00
lilong.129
304abe653a feat: optimize HTML report layout and clean up redundant code
- Redesign planning section with three-column layout
- Improve screenshot display with adaptive sizing
- Enhance actions details presentation
- Add compact request toggle functionality
- Remove unused CSS styles and redundant code
- Improve responsive design for mobile devices
2025-06-10 18:13:19 +08:00
lilong.129
9c906934fd fix: resolve Chinese character encoding issue in HTML report downloads
- Add decodeBase64UTF8 function to properly handle UTF-8 encoded Base64 content
- Replace atob() with TextDecoder for correct Chinese character decoding
- Explicitly specify UTF-8 charset when creating download Blob
- Fix garbled Chinese text when downloading summary.json from HTML report
2025-06-10 17:07:08 +08:00
lilong.129
98bd41ff33 fix: add direction parameter support for scroll operations in UI-TARS parser
- Handle direction parameter in convertProcessedArgs for scroll actions
- Ensure scroll operations map to swipe with both coordinates and direction
- Add comprehensive test coverage for scroll action parsing
- Fix issue where scroll direction was missing from tool call arguments
2025-06-10 16:40:10 +08:00
lilong.129
c322d7c36c fix: improve JSON extraction to handle UTF-8 Chinese characters properly
- Replace byte-based brace counting with UTF-8 aware rune iteration
- Add proper string state tracking to handle escaped quotes
- Add comprehensive test cases for Chinese character handling
- Fix parsing errors when JSON contains Chinese text like 2048经典
2025-06-10 16:09:50 +08:00
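The rune-based brace counting with string state tracking described in commit c322d7c36c can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's code: the helper name `extractFirstJSONObject` is hypothetical, and it only looks for the first balanced top-level object.

```go
package main

import "fmt"

// extractFirstJSONObject returns the first balanced {...} object found in s.
// Hypothetical helper for illustration; not the project's implementation.
func extractFirstJSONObject(s string) (string, bool) {
	start, depth := -1, 0
	inString, escaped := false, false
	// Ranging over a string yields runes; i is the byte offset of each rune.
	for i, r := range s {
		if inString {
			switch {
			case escaped:
				escaped = false // an escaped rune can never close the string
			case r == '\\':
				escaped = true
			case r == '"':
				inString = false
			}
			continue
		}
		switch r {
		case '"':
			if depth > 0 {
				inString = true // only track strings once inside an object
			}
		case '{':
			if depth == 0 {
				start = i
			}
			depth++
		case '}':
			if depth > 0 {
				depth--
				if depth == 0 {
					return s[start : i+1], true
				}
			}
		}
	}
	return "", false
}

func main() {
	reply := `Result: {"name":"2048经典","note":"a \"}\" b"} done`
	obj, ok := extractFirstJSONObject(reply)
	fmt.Println(obj, ok)
}
```

Braces inside string values, including ones next to escaped quotes, do not affect the depth counter, so the surrounding prose in a model reply can be discarded safely.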
lilong.129
88ae8faee1 feat: enhance VLM response parsing and DOUBAO model support
- Fix JSON extraction logic by prioritizing brace counting method
- Add support for DOUBAO string array coordinate format
- Introduce IS_UI_TARS helper function for model type checking
- Add comprehensive tests for JSON parsing and coordinate handling
- Improve error handling with retry delays for LLM service failures
2025-06-10 15:56:13 +08:00
lilong.129
4959c2e47e feat: extractJSONFromContent 2025-06-10 14:08:44 +08:00
lilong.129
7dc0f869be fix: extracts JSON content from various formats in the response 2025-06-10 14:02:41 +08:00
lilong.129
90401eeb78 change: remove unnecessary logs 2025-06-10 13:19:36 +08:00
lilong.129
f5f6d177ab fix: optimize report command to avoid creating timestamp directories
- Implement lazy loading for directory creation in config.go
- Add logFile parameter to InitLogger for better control
- Use dynamic directory existence check instead of flags
- Report command now uses console-only logging to prevent directory creation
- Support both JSON and colorized console output formats
- Maintain backward compatibility for all other commands

Changes:
- config.go: Convert directory paths to getter methods with lazy creation
- logger.go: Add logFile parameter and improve logging control
- cmd/root.go: Detect report command and disable file logging
- uixt/*: Update all references to use new getter methods

Fixes the issue where 'hrp report results/' would create unwanted timestamp directories
2025-06-10 12:06:08 +08:00
lilong.129
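6588d95154 fix: fix garbled Chinese characters in summary.json
- Improve how Dump2JSON writes files so that UTF-8 encoding is handled correctly
- Add a file sync operation to prevent incomplete data
- Add a UTF-8 encoding test to verify the fix
- Apply the same file-writing improvement to HTML report generation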
2025-06-10 11:03:10 +08:00
lilong.129
12cebef3b9 change: set llm timeout to 120s 2025-06-09 22:42:19 +08:00
lilong.129
39acadb0a7 feat: add MCP tools registration to LLM service
- Add RegisterTools method to ILLMService interface
- Create shared MCP to eino tool converter
- Auto-register built-in uixt tools in XTDriver initialization
- Refactor MCPHost to use shared converter
- Add comprehensive test coverage for tool conversion

This enables doubao-1.5-thinking-vision-pro model to access
MCP tools through function calling mechanism.
2025-06-09 22:19:43 +08:00
lilong.129
dd52faef57 refactor: move Call function 2025-06-09 20:52:32 +08:00
lilong.129
f1544d4a5c feat: implement separate log levels for console and file output
- Console logger respects user-specified log level
- File logger always uses DEBUG level to capture all logs
- Add custom leveledMultiWriter for different output levels
- Remove global log level setting for more granular control
2025-06-09 19:16:39 +08:00
lilong.129
533c1f4bff feat: add mcp tool ToolScreenRecord 2025-06-09 17:18:26 +08:00
lilong.129
96da4515a1 feat: optimize test report UI and add LLM usage tracking 2025-06-09 17:04:55 +08:00
lilong.129
e85802cdda feat: add download for summary.json and hrp.log in report.html 2025-06-09 00:29:27 +08:00
lilong.129
a91a10ac13 docs: update cmd docs 2025-06-09 00:06:23 +08:00
lilong.129
cf360c8c46 feat: compress image data for html report 2025-06-08 23:48:23 +08:00
lilong.129
14cef72f5a feat: add model name display in AI actions and optimize HTML report
- Add ModelName field to PlanningResult and SubActionResult
- Update HTML report with improved layout and model name display
- Fix elapsed time setting bug and enhance mobile responsiveness
2025-06-08 22:08:51 +08:00
lilong.129
660e8ca124 feat: add mcp tool ToolGetForegroundApp 2025-06-08 19:25:09 +08:00
lilong.129
b9de3cf7a3 refactor: simplify AI action execution and improve sub-action handling 2025-06-08 19:16:37 +08:00
lilong.129
bdf64a08aa feat: enhance HTML report with statistics and collapsible log fields 2025-06-08 10:05:30 +08:00
lilong.129
f2607f7664 style: optimize log display for more compact layout
- Move log message to same line as timestamp and level
- Reduce padding and font sizes for tighter spacing
- Optimize log data display with left border and indentation
- Add responsive design for mobile devices
- Achieve more compact display with fewer lines per log entry
2025-06-08 09:34:21 +08:00
lilong.129
5f7698c6b4 fix: improve Chinese character display in HTML reports
- Fix JSON serialization to preserve Chinese characters instead of Unicode escaping
- Use SetEscapeHTML(false) in toJSON template function
- Apply safeHTML to prevent HTML entity encoding of Chinese text
- Now displays {"text":"连了又连"} instead of {"text":"\u8fde\u4e86\u53c8\u8fde"}
2025-06-08 09:29:41 +08:00
lilong.129
4053cc9985 feat: add comprehensive HTML report generation with log filtering
- Add complete HTML report generator with template-based rendering
- Implement log time filtering for step-specific logs
- Support responsive design and interactive UI features
- Consolidate duplicate report implementations
2025-06-08 09:23:14 +08:00
lilong.129
ec4f1eb68a refactor: unify action execution interface and merge AI action handling 2025-06-07 23:59:07 +08:00
lilong.129
fcf3009c67 fix: abnormal indent in summary.json 2025-06-07 20:45:35 +08:00
lilong.129
e75edf8400 feat: add log file output to results/taskID directory 2025-06-07 16:52:41 +08:00
lilong.129
604eed3340 refactor: optimize runner error handling and cleanup logic
- Use defer for summary saving and HTML report generation to ensure they run regardless of exit path
- Remove unnecessary sync.Once for cleanup operations since defer guarantees single execution
- Simplify error handling logic by removing redundant runErr checks
- Improve interrupt handling with better logging messages
- Ensure graceful cleanup and data persistence even when interrupted
2025-06-07 16:36:53 +08:00
lilong.129
460570f651 fix(uixt): fix uixt__input not working and add comprehensive unit tests
- Fix parameter mapping issue where AI model's 'content' parameter wasn't mapped to 'text' field
- Add mapParameterName function to handle parameter name mapping (content->text, key->keycode)
- Add comprehensive unit tests for convertProcessedArgs and mapParameterName functions
- Update existing test cases to match new parameter format (x,y for single coords, from_x,from_y,to_x,to_y for drag)

This resolves the issue where uixt__input action was not working due to parameter name mismatch.
2025-06-07 15:03:29 +08:00
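The parameter name mapping from commit 460570f651 (content->text, key->keycode) amounts to an alias lookup applied to the model's arguments. The sketch below reuses the `mapParameterName` name from the commit message, but its body and the `remapArgs` helper are assumptions for illustration.

```go
package main

import "fmt"

// parameterAliases lists the two mappings named in the commit message.
var parameterAliases = map[string]string{
	"content": "text",
	"key":     "keycode",
}

// mapParameterName returns the tool field name for a model-supplied
// parameter name, or the name unchanged when no alias exists.
func mapParameterName(name string) string {
	if mapped, ok := parameterAliases[name]; ok {
		return mapped
	}
	return name
}

// remapArgs applies mapParameterName to every key (hypothetical helper).
func remapArgs(args map[string]any) map[string]any {
	out := make(map[string]any, len(args))
	for k, v := range args {
		out[mapParameterName(k)] = v
	}
	return out
}

func main() {
	fmt.Println(remapArgs(map[string]any{"content": "hello", "x": 10}))
}
```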
lilong.129
334c0dc141 fix: fix validators not running when a mobile step contains validate 2025-06-06 22:18:43 +08:00
lilong.129
484eebdefd feat: implement multi-model service configuration support
- Support configuring multiple LLM services simultaneously
- Auto-derive model names from service types to simplify configuration
- Maintain backward compatibility with existing configurations
- Refactor configuration logic into dedicated env module
- Add comprehensive unit test coverage
- Update documentation with new configuration approach
2025-06-06 22:17:59 +08:00
lilong.129
b642ea004e feat: implement UI automation test history isolation
- Add ResetHistory option to PlanningOptions and ActionOptions
- Implement task completion detection with isTaskFinished() method
- Add executeActions() method to separate action execution logic
- Modify ConversationHistory.Clear() to completely clear all messages including system message
- Refactor StartToGoal() to automatically reset history on first attempt
- Add WithResetHistory() option function for consistent API
- Consolidate test files into driver_ext_ai_test.go with comprehensive test coverage
2025-06-06 15:29:42 +08:00
lilong.129
6e1bd5bbe2 feat: optimize MCP tools response format with automatic schema generation
- Remove all manual ReturnSchema() methods from tools
- Implement automatic schema generation using reflection
- Unify response format to flat structure with action/success/message fields
- Simplify tool implementation by removing MCPResponse embedding
- Update documentation to reflect new architecture
- Achieve ~70% code reduction while maintaining type safety
2025-06-05 23:17:06 +08:00
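Reflection-based schema generation, as described in commit 6e1bd5bbe2, can be sketched by walking a response struct's fields and reading their `json` tags. The `tapResponse` type and `schemaOf` function below are hypothetical; only the flat action/success/message fields come from the commit message.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// tapResponse is a hypothetical flat tool response.
type tapResponse struct {
	Action  string  `json:"action"`
	Success bool    `json:"success"`
	Message string  `json:"message,omitempty"`
	X       float64 `json:"x"`
}

// schemaOf maps each exported field's JSON name to its Go kind.
// It returns an empty map for values that are not structs.
func schemaOf(v any) map[string]string {
	out := map[string]string{}
	t := reflect.TypeOf(v)
	for t != nil && t.Kind() == reflect.Ptr {
		t = t.Elem()
	}
	if t == nil || t.Kind() != reflect.Struct {
		return out
	}
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if !f.IsExported() {
			continue
		}
		// Keep the tag name and drop options such as ",omitempty".
		name, _, _ := strings.Cut(f.Tag.Get("json"), ",")
		if name == "-" {
			continue
		}
		if name == "" {
			name = f.Name
		}
		out[name] = f.Type.Kind().String()
	}
	return out
}

func main() {
	fmt.Println(schemaOf(tapResponse{}))
}
```

Deriving the schema from the struct keeps the declared return shape and the actual response in one place, which is the motivation the commit gives for removing the manual ReturnSchema() methods.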
lilong.129
56831845ca change: fix logs 2025-06-05 20:26:18 +08:00