Commit Graph

40 Commits

Author SHA1 Message Date
lilong.129
a1a235d2b4 change: disable tool registration for models with function calling 2025-07-03 23:07:39 +08:00
lilong.129
1694f36837 change: replace openai model with ark model 2025-07-03 22:13:23 +08:00
lilong.129
90ce090e35 fix: remove redundant message cleaning logic in callModelWithLogging
The previous message cleaning logic was flawed:
- cleanedMsg.Content was already set to message.Content
- The condition checked if message.Content == "" then set cleanedMsg.Content = ""
- This was redundant since cleanedMsg.Content would already be empty

The real fix for the API 400 error is in planner.go where we ensure Tool messages
have non-empty content. The utils.go changes were unnecessary.
2025-06-26 13:41:39 +08:00
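The planner.go fix mentioned above can be pictured as a guard that fills in empty Tool message content before the request is sent. This is a minimal sketch, not the code in planner.go. The Message type, the ensureToolContent function and the placeholder string are all hypothetical.

```go
package main

import "fmt"

// Message is a simplified, hypothetical stand-in for the chat message type
// used by the planner; the real type is not shown in the commit.
type Message struct {
	Role       string
	Content    string
	ToolCallID string
}

// emptyToolPlaceholder is a hypothetical filler value; the text actually
// used in planner.go is not stated in the commit.
const emptyToolPlaceholder = "tool call executed, no output"

// ensureToolContent returns a copy of msgs in which every tool message has
// non-empty content, so the request does not carry an empty content field.
func ensureToolContent(msgs []Message) []Message {
	out := make([]Message, len(msgs))
	copy(out, msgs)
	for i := range out {
		if out[i].Role == "tool" && out[i].Content == "" {
			out[i].Content = emptyToolPlaceholder
		}
	}
	return out
}

func main() {
	msgs := []Message{
		{Role: "user", Content: "open settings"},
		{Role: "tool", ToolCallID: "call_1"}, // model returned empty content
	}
	fixed := ensureToolContent(msgs)
	fmt.Println(fixed[1].Content) // tool call executed, no output
}
```

Only tool messages are touched, which matches the commit's point that the fix belongs in the planner rather than in a generic cleaning step.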
lilong.129
e070801b00 fix: resolve doubao-seed-1.6-250615 model API 400 error with empty content
- Fix empty Tool message content when the model returns empty content during function calling
- Add content validation in callModelWithLogging to handle empty content in messages
- Ensure compatibility between UI-TARS and function calling models

This resolves the "missing messages.content parameter" error that occurred with the
doubao-seed-1.6-250615 model but not with doubao-1.5-ui-tars-250328.
2025-06-26 10:57:27 +08:00
lilong.129
81a92ae155 docs: update AI module README with latest features
- Add comprehensive documentation for the new Query functionality
- Update interface method names from Call to Plan for consistency
- Add documentation for OpenAI GPT-4o model support
- Include detailed usage examples for basic and custom schema queries
- Add configuration examples for multiple model services
- Document new features like ResetHistory, Usage statistics, and automatic type conversion
- Expand advanced features section with custom output format examples
- Update all code examples to reflect the latest API changes

The documentation now reflects the current state of the AI module with all three core capabilities:
- Planning (renamed from Call)
- Assertion
- Query (new feature)

All examples and configurations are updated to match the latest implementation.
2025-06-10 20:52:44 +08:00
lilong.129
7c45acd061 feat: add AI Querier module with custom output schema support and refactor common model calling logic
- Add new AI Querier module for structured information extraction from screenshots
- Support custom output schema for structured data response
- Implement automatic type conversion and data validation
- Add comprehensive test suite with various data structure examples
- Refactor callModelWithLogging to utils.go as shared function for planner, asserter, and querier
- Eliminate code duplication across AI modules (30+ lines of repeated code)
- Improve maintainability with unified logging and timing logic
- Add environment variable checks in test setup to handle missing API keys gracefully

Key features:
- Custom output schema support with JSON Schema generation
- Automatic data type conversion with reflection
- Fallback mechanisms for robust parsing
- Comprehensive documentation and usage examples
- Backward compatibility with existing functionality
2025-06-10 20:41:35 +08:00
lilong.129
88ae8faee1 feat: enhance VLM response parsing and DOUBAO model support
- Fix JSON extraction logic by prioritizing the brace-counting method
- Add support for DOUBAO string array coordinate format
- Introduce IS_UI_TARS helper function for model type checking
- Add comprehensive tests for JSON parsing and coordinate handling
- Improve error handling with retry delays for LLM service failures
2025-06-10 15:56:13 +08:00
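Brace counting, which the commit above prioritizes for JSON extraction, can be sketched as follows: scan to the first opening brace, track nesting depth, ignore braces inside string literals, and stop when the depth returns to zero. The helper name extractJSONObject is hypothetical, and the real parser may differ in details such as which object it picks when a response contains several.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// extractJSONObject (hypothetical) returns the first balanced {...} object in
// text. It counts brace depth and skips braces inside JSON string literals,
// honouring backslash escapes. Iterating bytes is safe for UTF-8 input
// because multi-byte sequences never contain ASCII brace or quote bytes.
func extractJSONObject(text string) (string, bool) {
	start, depth := -1, 0
	inString, escaped := false, false
	for i := 0; i < len(text); i++ {
		c := text[i]
		if start == -1 {
			if c == '{' {
				start, depth = i, 1
			}
			continue
		}
		if inString {
			switch {
			case escaped:
				escaped = false
			case c == '\\':
				escaped = true
			case c == '"':
				inString = false
			}
			continue
		}
		switch c {
		case '"':
			inString = true
		case '{':
			depth++
		case '}':
			depth--
			if depth == 0 {
				return text[start : i+1], true
			}
		}
	}
	return "", false // no opening brace, or braces never balanced
}

func main() {
	resp := "Sure! Here is the plan: {\"thought\": \"tap the {OK} button\", \"actions\": [{\"action\": \"click\"}]} Hope this helps."
	obj, ok := extractJSONObject(resp)
	fmt.Println(ok, json.Valid([]byte(obj))) // true true
}
```

Unlike a regex that grabs everything between the first "{" and the last "}", depth counting stops at the matching brace, so trailing prose containing braces does not corrupt the result.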
lilong.129
90401eeb78 change: remove unnecessary logs 2025-06-10 13:19:36 +08:00
lilong.129
39acadb0a7 feat: add MCP tools registration to LLM service
- Add RegisterTools method to ILLMService interface
- Create shared MCP to eino tool converter
- Auto-register built-in uixt tools in XTDriver initialization
- Refactor MCPHost to use shared converter
- Add comprehensive test coverage for tool conversion

This enables doubao-1.5-thinking-vision-pro model to access
MCP tools through function calling mechanism.
2025-06-09 22:19:43 +08:00
lilong.129
96da4515a1 feat: optimize test report UI and add LLM usage tracking 2025-06-09 17:04:55 +08:00
lilong.129
14cef72f5a feat: add model name display in AI actions and optimize HTML report
- Add ModelName field to PlanningResult and SubActionResult
- Update HTML report with improved layout and model name display
- Fix elapsed time setting bug and enhance mobile responsiveness
2025-06-08 22:08:51 +08:00
lilong.129
484eebdefd feat: implement multi-model service configuration support
- Support configuring multiple LLM services simultaneously
- Auto-derive model names from service types to simplify configuration
- Maintain backward compatibility with existing configurations
- Refactor configuration logic into dedicated env module
- Add comprehensive unit test coverage
- Update documentation with new configuration approach
2025-06-06 22:17:59 +08:00
lilong.129
b642ea004e feat: implement UI automation test history isolation
- Add ResetHistory option to PlanningOptions and ActionOptions
- Implement task completion detection with isTaskFinished() method
- Add executeActions() method to separate action execution logic
- Modify ConversationHistory.Clear() to clear all messages, including the system message
- Refactor StartToGoal() to automatically reset history on first attempt
- Add WithResetHistory() option function for consistent API
- Consolidate test files into driver_ext_ai_test.go with comprehensive test coverage
2025-06-06 15:29:42 +08:00
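The history-isolation options described above roughly follow Go's functional-options pattern. A minimal sketch under simplified assumptions: the Message and ConversationHistory definitions, the PlanningOption type and the plan function are illustrative stand-ins. Only the names ResetHistory, WithResetHistory and Clear come from the commit, and their real signatures may differ.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the real types.
type Message struct {
	Role    string
	Content string
}

type ConversationHistory []Message

// Clear drops every message, including the system message.
func (h *ConversationHistory) Clear() { *h = nil }

type PlanningOptions struct {
	ResetHistory bool
}

// PlanningOption is an assumed functional-option type.
type PlanningOption func(*PlanningOptions)

// WithResetHistory sets ResetHistory on the options it is applied to.
func WithResetHistory(reset bool) PlanningOption {
	return func(o *PlanningOptions) { o.ResetHistory = reset }
}

// plan is a hypothetical planning step: it applies the options, clears the
// history when requested, then records the new user prompt.
func plan(h *ConversationHistory, prompt string, opts ...PlanningOption) {
	o := PlanningOptions{}
	for _, opt := range opts {
		opt(&o)
	}
	if o.ResetHistory {
		h.Clear()
	}
	*h = append(*h, Message{Role: "user", Content: prompt})
}

func main() {
	h := ConversationHistory{
		{Role: "system", Content: "you are a UI agent"},
		{Role: "user", Content: "old goal"},
	}
	plan(&h, "new goal", WithResetHistory(true))
	fmt.Println(len(h)) // 1
}
```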
lilong.129
d883aa6a21 change: rename VLM name 2025-06-05 18:09:25 +08:00
lilong.129
0add3231ff refactor: merge ActionSummary and Thought fields to eliminate duplication
- Remove redundant ActionSummary field from PlanningResult struct
- Update parsers to use unified Thought field instead of duplicate fields
- Modify chat interface to display Thought instead of ActionSummary
- Update planner logging to use thought instead of summary
- Adjust prompt templates to use thought field consistently
- Switch test LLM service from UI-TARS to DoubaoVL
- Add default parameter handling for sleep tool
2025-06-05 14:19:09 +08:00
lilong.129
c204542f1f feat: optimize UI-TARS parser with coordinate conversion and action mapping
- Add action mapping for UI-TARS parser to convert action names to option.ActionName
- Implement bounding box to center point coordinate conversion for better accuracy
- Update coordinate normalization to handle coordinates > 1000 properly
- Enhance test cases to verify coordinate scaling and center point conversion
- Improve action argument processing with proper coordinate transformation
- Add comprehensive test coverage for coordinate conversion edge cases

Key improvements:
- Bounding box [x1,y1,x2,y2] now converts to center point [cx,cy] for actions
- Coordinate scaling properly handles different screen resolutions
- Action names are mapped through doubao_1_5_ui_tars_action_mapping
- Enhanced error handling for invalid coordinate formats
2025-06-04 23:16:14 +08:00
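The bounding-box conversion and scaling described above reduce to two small steps: take the midpoint of [x1,y1,x2,y2], then map that point from the model's coordinate space onto the device screen. This sketch assumes the model reports coordinates on a 0 to 1000 relative scale. The constant and function names are hypothetical, and the commit's special handling of values above 1000 is not modeled.

```go
package main

import "fmt"

// modelScale is an assumption: the model's coordinates are taken to lie on
// a 0..1000 relative grid, independent of the real screen resolution.
const modelScale = 1000.0

// bboxCenter converts a bounding box [x1, y1, x2, y2] to its center [cx, cy].
func bboxCenter(box [4]float64) [2]float64 {
	return [2]float64{(box[0] + box[2]) / 2, (box[1] + box[3]) / 2}
}

// toScreen maps a relative point onto a screen of width x height pixels.
// Multiplying before dividing keeps integer-valued results exact.
func toScreen(p [2]float64, width, height int) [2]float64 {
	return [2]float64{
		p[0] * float64(width) / modelScale,
		p[1] * float64(height) / modelScale,
	}
}

func main() {
	center := bboxCenter([4]float64{100, 200, 300, 400}) // [200 300]
	fmt.Println(toScreen(center, 1080, 2400))            // [216 720]
}
```

Keeping the two steps separate means the same center point can be rescaled for any device resolution without re-parsing the model output.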
lilong.129
4e74247cab fix: missing tool call ID 2025-05-26 09:28:46 +08:00
lilong.129
014140ccc7 change: append tool call message for planner 2025-05-24 10:28:55 +08:00
lilong.129
81c854f963 refactor: merge ai parser 2025-05-24 00:25:44 +08:00
lilong.129
19ddcb40cc change: update ui-tars prompt 2025-05-23 22:05:21 +08:00
lilong.129
009bfa4ecb refactor: replace ui-tars parser with https://github.com/bytedance/UI-TARS/blob/main/codes/ui_tars/action_parser.py 2025-05-22 22:52:47 +08:00
lilong.129
3f1ee03529 refactor: mcphost planner 2025-05-18 21:55:01 +08:00
lilong.129
fcddcfb630 refactor: GetModelConfig 2025-04-30 15:21:17 +08:00
lilong.129
2ae252b52a refactor: merge planner 2025-04-30 14:07:48 +08:00
lilong.129
4d7c7e8aaf refactor: ai asserter 2025-04-29 20:08:22 +08:00
lilong.129
7132eec39e feat: add status code for llm 2025-04-28 21:06:53 +08:00
lilong.129
68dbeb368a refactor: add a message to the conversation history 2025-04-28 20:12:08 +08:00
lilong.129
7fa4155390 refactor: move code 2025-04-27 22:37:48 +08:00
lilong.129
9bcdd5d19a feat: add AIAssert 2025-04-27 22:25:06 +08:00
lilong.129
84ff75c3b1 change: add tests 2025-04-27 19:13:55 +08:00
lilong.129
70a8ee01f7 refactor: llm planner 2025-04-21 21:33:30 +08:00
lilong.129
ebeae596a7 stash 2025-04-21 14:39:37 +08:00
lilong.129
2ad5c4f6db fix: load env 2025-03-23 10:06:50 +08:00
lilong.129
12e0f7f9a2 feat: save screenshots for PlanNextAction 2025-03-22 01:07:28 +08:00
lilong.129
8a3b6b5c4c feat: appendConversationHistory for ai planner 2025-03-22 00:06:30 +08:00
lilong.129
868acd45ac fix: load jpeg image 2025-03-20 20:39:32 +08:00
lilong.129
da0bdc4fe5 fix: convertCoordinateAction 2025-03-20 18:02:35 +08:00
lilong.129
3801ffb744 feat: load .env file from current working directory upward recursively 2025-03-20 14:23:56 +08:00
lilong.129
b5f3e7ff96 change: remove unused code 2025-03-19 22:47:10 +08:00
lilong.129
55acaceb09 feat: add TapByLLM/PlanNextAction for XTDriver 2025-03-19 21:16:21 +08:00