* fix: convert AI tests from skip statements to build tags
- Add //go:build localtest tag to uixt/ai/ai_test.go and examples/game/llk/main_test.go
- Remove environment-based skip statements and hasRequiredEnvVars functions
- Maintain consistency with existing build tag approach for mobile/device tests
- Prevent CI/CD failures when external AI services are not available
Co-authored-by: debugtalk <debugtalk@users.noreply.github.com>
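
For illustration, a build-tagged test file might look like the sketch below. The package name `ai` and the test name are assumptions, not taken from the repository. A file starting with this constraint is compiled only when the tag is passed, for example with `go test -tags localtest ./...`.

```go
//go:build localtest

package ai

import "testing"

// TestPlannerWithRealModel is a hypothetical test that needs a live AI service.
// Without -tags localtest the whole file is excluded from compilation,
// so no environment-based skip check is needed inside the test.
func TestPlannerWithRealModel(t *testing.T) {
	t.Log("compiled and run only with: go test -tags localtest ./...")
}
```

Excluding the file at compile time means a default `go test ./...` run in CI never reaches code that depends on external services.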
* fix: add missing BoundBox type and field to Element struct
- Add BoundBox struct with X, Y, Width, Height fields
- Update Element struct to include BoundBox field
- Fix structural mismatch between test expectations and Go code
- Resolve CI compilation failures
Co-authored-by: debugtalk <debugtalk@users.noreply.github.com>
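
A minimal sketch of the type described above. The `float64` field types, the `Text` field, and the `Center` helper are assumptions added for illustration. Only the `BoundBox` name and its `X`, `Y`, `Width`, `Height` fields come from the commit.

```go
package main

// BoundBox describes a rectangle on screen.
// The float64 field types are an assumption.
type BoundBox struct {
	X      float64
	Y      float64
	Width  float64
	Height float64
}

// Element is a hypothetical, trimmed-down version of the struct;
// only the BoundBox field is taken from the commit message.
type Element struct {
	Text     string // illustrative field
	BoundBox BoundBox
}

// Center returns the midpoint of the box (helper added for illustration).
func (b BoundBox) Center() (float64, float64) {
	return b.X + b.Width/2, b.Y + b.Height/2
}
```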
---------
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
The previous message cleaning logic was flawed:
- cleanedMsg.Content was already set to message.Content
- The condition checked whether message.Content == "" and, if so, set cleanedMsg.Content = ""
- This was redundant since cleanedMsg.Content would already be empty
The real fix for the API 400 error is in planner.go where we ensure Tool messages
have non-empty content. The utils.go changes were unnecessary.
- Fix Tool message content issue when model returns empty content in function calling
- Add content validation in callModelWithLogging to handle empty content in messages
- Ensure compatibility between UI-TARS and function calling models
This resolves the "missing messages.content parameter" error that occurs with the
doubao-seed-1.6-250615 model but not with doubao-1.5-ui-tars-250328
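
The Tool message fix can be sketched roughly as follows. The `Message` type, the `"tool"` role string, and the `ensureToolContent` name are hypothetical stand-ins, not the actual code in planner.go.

```go
package main

// Message is a hypothetical stand-in for the chat message type.
type Message struct {
	Role    string
	Content string
}

// ensureToolContent returns a copy of msgs in which every tool message
// with empty content carries the given placeholder instead.
// The input slice is left unmodified.
func ensureToolContent(msgs []Message, placeholder string) []Message {
	out := make([]Message, len(msgs))
	copy(out, msgs)
	for i := range out {
		if out[i].Role == "tool" && out[i].Content == "" {
			out[i].Content = placeholder
		}
	}
	return out
}
```

Messages with other roles pass through unchanged, so only the case that triggered the API 400 error is touched.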
- Add LLMServiceConfig to support mixed model configuration
- Enable Planner, Asserter, Querier to use different optimal models
- Provide recommended configurations for various use cases
- Maintain backward compatibility with existing API
- Update documentation to reflect current state without iteration history
- Merge test files and add comprehensive configuration tests
- Resolve circular dependency by moving config to option package
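
One possible shape for a mixed model configuration is sketched below. The field names and both helper functions are assumptions. Only the `LLMServiceConfig` name and the idea of a separate model per component come from the commit.

```go
package main

// LLMServiceConfig is a hypothetical shape; the field names are assumptions.
type LLMServiceConfig struct {
	PlannerModel  string
	AsserterModel string
	QuerierModel  string
}

// newUniformConfig uses one model for every component, which mirrors
// the single-model behaviour kept for backward compatibility.
func newUniformConfig(model string) LLMServiceConfig {
	return LLMServiceConfig{PlannerModel: model, AsserterModel: model, QuerierModel: model}
}

// withPlanner returns a copy that overrides only the planner model.
func (c LLMServiceConfig) withPlanner(model string) LLMServiceConfig {
	c.PlannerModel = model
	return c
}
```

For example, `newUniformConfig("doubao-seed-1.6-250615").withPlanner("doubao-1.5-ui-tars-250328")` would plan with the UI-TARS model while asserting and querying with the other.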
- Add comprehensive documentation for the new Query functionality
- Update interface method names from Call to Plan for consistency
- Add OpenAI GPT-4o model support documentation
- Include detailed usage examples for basic and custom schema queries
- Add configuration examples for multiple model services
- Document new features like ResetHistory, Usage statistics, and automatic type conversion
- Expand advanced features section with custom output format examples
- Update all code examples to reflect the latest API changes
The documentation now reflects the current state of the AI module with all three core capabilities:
- Planning (renamed from Call)
- Assertion
- Query (new feature)
All examples and configurations are updated to match the latest implementation.
- Add Query method to ILLMService interface for unified AI service access
- Update combinedLLMService to include querier functionality
- Add comprehensive tests for ILLMService Query method
- Support both basic query and custom schema query through unified interface
- Add environment variable checks for test reliability
This allows users to access all AI capabilities (planning, assertion, and query)
through a single ILLMService interface, providing better API consistency and ease of use.
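
The unified interface could look roughly like this. All method signatures and the three component interfaces are assumptions made for illustration. The commit only confirms the names `ILLMService`, `combinedLLMService`, and the Plan, Assert, and Query capabilities.

```go
package main

import "context"

// The three component interfaces below use hypothetical signatures.
type Planner interface {
	Plan(ctx context.Context, instruction string) (string, error)
}

type Asserter interface {
	Assert(ctx context.Context, assertion string) (bool, error)
}

type Querier interface {
	Query(ctx context.Context, question string) (string, error)
}

// ILLMService exposes all three capabilities through one interface.
type ILLMService interface {
	Planner
	Asserter
	Querier
}

// combinedLLMService delegates each call to the embedded component,
// so each component can be backed by a different model.
type combinedLLMService struct {
	Planner
	Asserter
	Querier
}

// Compile-time check that the combined type satisfies the interface.
var _ ILLMService = combinedLLMService{}
```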
- Add new AI Querier module for structured information extraction from screenshots
- Support custom output schema for structured data response
- Implement automatic type conversion and data validation
- Add comprehensive test suite with various data structure examples
- Refactor callModelWithLogging to utils.go as shared function for planner, asserter, and querier
- Eliminate code duplication across AI modules (30+ lines of repeated code)
- Improve maintainability with unified logging and timing logic
- Add environment variable checks in test setup to handle missing API keys gracefully
Key features:
- Custom output schema support with JSON Schema generation
- Automatic data type conversion with reflection
- Fallback mechanisms for robust parsing
- Comprehensive documentation and usage examples
- Backward compatibility with existing functionality
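
As a rough illustration of filling a custom output schema, the sketch below converts loosely typed model output into a caller-supplied struct. It uses a JSON round trip rather than hand-written reflection, which is a simplification and not necessarily how the Querier does it. `GameInfo` and `convertTo` are hypothetical names.

```go
package main

import "encoding/json"

// GameInfo is a hypothetical custom output schema supplied by the caller.
type GameInfo struct {
	Name  string `json:"name"`
	Score int    `json:"score"`
}

// convertTo re-encodes loosely typed data and decodes it into out,
// letting encoding/json handle conversions such as float64 to int.
// out must be a pointer to the target value.
func convertTo(raw map[string]any, out any) error {
	data, err := json.Marshal(raw)
	if err != nil {
		return err
	}
	return json.Unmarshal(data, out)
}
```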
- Handle direction parameter in convertProcessedArgs for scroll actions
- Ensure scroll operations map to swipe with both coordinates and direction
- Add comprehensive test coverage for scroll action parsing
- Fix issue where scroll direction was missing from tool call arguments
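
A sketch of the idea, assuming hypothetical argument keys (`from_x`, `from_y`, `direction`) and a hypothetical function name. The real convertProcessedArgs may use different names and types.

```go
package main

// convertScrollArgs builds swipe arguments for a scroll action.
// The key names are assumptions made for illustration.
func convertScrollArgs(x, y float64, direction string) map[string]any {
	args := map[string]any{
		"from_x": x,
		"from_y": y,
	}
	// Carry the direction through so it is not lost in the tool call arguments.
	if direction != "" {
		args["direction"] = direction
	}
	return args
}
```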
- Replace byte-based brace counting with UTF-8 aware rune iteration
- Add proper string state tracking to handle escaped quotes
- Add comprehensive test cases for Chinese character handling
- Fix parsing errors when JSON contains Chinese text like 2048经典
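
The brace counting approach described above can be sketched as follows. The function name is hypothetical and the real implementation may differ in detail. The sketch iterates over runes, tracks whether it is inside a string, and honours backslash escapes so that braces inside string values are not counted.

```go
package main

// extractJSONObject returns the first balanced {...} object found in s,
// or "" if there is none.
func extractJSONObject(s string) string {
	start := -1
	depth := 0
	inString := false
	escaped := false
	for i, r := range s { // i is the byte offset of rune r
		if start < 0 {
			if r == '{' {
				start = i
				depth = 1
			}
			continue
		}
		if inString {
			switch {
			case escaped:
				escaped = false // this rune was escaped, take it literally
			case r == '\\':
				escaped = true
			case r == '"':
				inString = false
			}
			continue
		}
		switch r {
		case '"':
			inString = true
		case '{':
			depth++
		case '}':
			depth--
			if depth == 0 {
				return s[start : i+1]
			}
		}
	}
	return ""
}
```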
- Fix JSON extraction logic by prioritizing brace counting method
- Add support for DOUBAO string array coordinate format
- Introduce IS_UI_TARS helper function for model type checking
- Add comprehensive tests for JSON parsing and coordinate handling
- Improve error handling with retry delays for LLM service failures
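
A retry with a delay between attempts might be sketched like this. The function name, the fixed delay policy, and the `errTransient` value are assumptions. The commit does not state the attempt count or the delay that is used.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient is a hypothetical error value used to simulate a failing call.
var errTransient = errors.New("transient failure")

// retryWithDelay calls fn up to attempts times and sleeps for delay
// after each failed attempt except the last one.
func retryWithDelay(attempts int, delay time.Duration, fn func() error) error {
	if attempts <= 0 {
		return errors.New("attempts must be positive")
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("all %d attempts failed: %w", attempts, err)
}
```

Wrapping the last error with `%w` keeps it available to callers through `errors.Is` and `errors.As`.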