feat: add TTS configuration and functionality

- Add TTS model, voice name, and speed options to .env.example
- Update the README files with documentation for the TTS settings
- Add TTS-related settings to the configuration class
- Add a TTS request model to support text-to-speech
- Update the smart routing middleware to support audio requests
- Add an API endpoint to the routes for handling TTS requests
- Update the frontend configuration editor to support the TTS options
snaily
2025-07-05 00:47:55 +08:00
parent 418b3ca13c
commit f38b5ae870
9 changed files with 313 additions and 65 deletions


@@ -74,3 +74,8 @@ FAKE_STREAM_EMPTY_DATA_INTERVAL_SECONDS=5
# Safety settings (JSON string format)
# Note: the example values may need adjustment depending on what the model actually supports
SAFETY_SETTINGS=[{"category": "HARM_CATEGORY_HARASSMENT", "threshold": "OFF"}, {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "OFF"}, {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "OFF"}, {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "OFF"}, {"category": "HARM_CATEGORY_CIVIC_INTEGRITY", "threshold": "BLOCK_NONE"}]
URL_NORMALIZATION_ENABLED=false
# TTS configuration
TTS_MODEL=gemini-2.5-flash-preview-tts
TTS_VOICE_NAME=Zephyr
TTS_SPEED=normal
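At startup these variables are loaded into the application's configuration with the same defaults shown above. A minimal standalone sketch of that lookup-with-defaults behavior (using `os.getenv` here for illustration; the project itself uses a pydantic `Settings` class):

```python
import os

# Defaults mirror .env.example; os.getenv falls back to the default
# when the variable is not set in the environment.
DEFAULTS = {
    "TTS_MODEL": "gemini-2.5-flash-preview-tts",
    "TTS_VOICE_NAME": "Zephyr",
    "TTS_SPEED": "normal",
}

tts_config = {name: os.getenv(name, default) for name, default in DEFAULTS.items()}
print(tts_config)
```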

README.md

@@ -11,7 +11,7 @@
[![Uvicorn](https://img.shields.io/badge/Uvicorn-running-purple.svg)](https://www.uvicorn.org/)
[![Telegram Group](https://img.shields.io/badge/Telegram-Group-blue.svg?logo=telegram)](https://t.me/+soaHax5lyI0wZDVl)
> Telegram Group: <https://t.me/+soaHax5lyI0wZDVl>
## Project Introduction
@@ -40,39 +40,39 @@ app/
## ✨ Feature Highlights
* **Multi-Key Load Balancing**: Supports configuring multiple Gemini API Keys (`API_KEYS`) for automatic sequential polling, improving availability and concurrency.
* **Visual Configuration Takes Effect Immediately**: Configurations modified through the admin backend take effect without restarting the service. Remember to click save for changes to apply.
![Configuration Panel](files/image4.png)
* **Dual Protocol API Compatibility**: Supports forwarding CHAT API requests in both Gemini and OpenAI formats.
```plaintext
openai baseurl `http://localhost:8000(/hf)/v1`
gemini baseurl `http://localhost:8000(/gemini)/v1beta`
```
* **Supports Image-Text Chat and Image Modification**: `IMAGE_MODELS` configures which models can perform image-text chat and image editing. When actually calling, use the `configured_model-image` model name to use this feature.
![Chat with Image Generation](files/image6.png)
![Modify Image](files/image7.png)
* **Supports Web Search**: Supports web search. `SEARCH_MODELS` configures which models can perform web searches. When actually calling, use the `configured_model-search` model name to use this feature.
![Web Search](files/image8.png)
* **Key Status Monitoring**: Provides a `/keys_status` page (requires authentication) to view the status and usage of each Key in real-time.
![Monitoring Panel](files/image.png)
* **Detailed Logging**: Provides detailed error logs for easy troubleshooting.
![Call Details](files/image1.png)
![Log List](files/image2.png)
![Log Details](files/image3.png)
* **Support for Custom Gemini Proxy**: Supports custom Gemini proxies, such as those built on Deno or Cloudflare.
* **OpenAI Image Generation API Compatibility**: Adapts the `imagen-3.0-generate-002` model interface to be compatible with the OpenAI image generation API, supporting client calls.
* **Flexible Key Addition**: Flexible way to add keys using regex matching for `gemini_key`, with key deduplication.
![Add Key](files/image5.png)
* **OpenAI Format Embeddings API Compatibility**: Perfectly adapts to the OpenAI format `embeddings` interface, usable for local document vectorization.
* **Streamlined Response Optimization**: Optional stream output optimizer (`STREAM_OPTIMIZER_ENABLED`) to improve the experience of long-text stream responses.
* **Failure Retry and Key Management**: Automatically handles API request failures, retries (`MAX_RETRIES`), automatically disables Keys after too many failures (`MAX_FAILURES`), and periodically checks for recovery (`CHECK_INTERVAL_HOURS`).
* **Docker Support**: Supports AMD and ARM architecture Docker deployments. You can also build your own Docker image.
> Image address: docker pull ghcr.io/snailyp/gemini-balance:latest
* **Automatic Model List Maintenance**: Supports fetching OpenAI and Gemini model lists, perfectly compatible with NewAPI's automatic model list fetching, no manual entry required.
* **Support for Removing Unused Models**: Too many default models are provided, many of which are not used. You can filter them out using `FILTERED_MODELS`.
* **Proxy Support**: Supports configuring HTTP/SOCKS5 proxy servers (`PROXIES`) for accessing the Gemini API, convenient for use in special network environments. Supports batch adding proxies.
## 🚀 Quick Start
@@ -80,79 +80,83 @@ app/
#### a) Build with Dockerfile
1. **Build Image**:
```bash
docker build -t gemini-balance .
```
2. **Run Container**:
```bash
docker run -d -p 8000:8000 --env-file .env gemini-balance
```
* `-d`: Run in detached mode.
* `-p 8000:8000`: Map port 8000 of the container to port 8000 of the host.
* `--env-file .env`: Use the `.env` file to set environment variables.
> Note: If using an SQLite database, you need to mount a data volume to persist
>
> ```bash
> docker run -d -p 8000:8000 --env-file .env -v /path/to/data:/app/data gemini-balance
> ```
>
> Where `/path/to/data` is the data storage path on the host, and `/app/data` is the data directory inside the container.
#### b) Deploy with an Existing Docker Image
1. **Pull Image**:
```bash
docker pull ghcr.io/snailyp/gemini-balance:latest
```
2. **Run Container**:
```bash
docker run -d -p 8000:8000 --env-file .env ghcr.io/snailyp/gemini-balance:latest
```
* `-d`: Run in detached mode.
* `-p 8000:8000`: Map port 8000 of the container to port 8000 of the host (adjust as needed).
* `--env-file .env`: Use the `.env` file to set environment variables (ensure the `.env` file exists in the directory where the command is executed).
> Note: If using an SQLite database, you need to mount a data volume to persist
>
> ```bash
> docker run -d -p 8000:8000 --env-file .env -v /path/to/data:/app/data ghcr.io/snailyp/gemini-balance:latest
> ```
>
> Where `/path/to/data` is the data storage path on the host, and `/app/data` is the data directory inside the container.
### Run Locally (Suitable for Development and Testing)
If you want to run the source code directly locally for development or testing, follow these steps:
1. **Ensure Prerequisites are Met**:
* Clone the repository locally.
* Install Python 3.9 or higher.
* Create and configure the `.env` file in the project root directory (refer to the "Configure Environment Variables" section above).
* Install project dependencies:
```bash
pip install -r requirements.txt
```
2. **Start Application**:
Run the following command in the project root directory:
```bash
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```
* `app.main:app`: Specifies the location of the FastAPI application instance (the `app` object in the `main.py` file within the `app` module).
* `--host 0.0.0.0`: Makes the application accessible from any IP address on the local network.
* `--port 8000`: Specifies the port number the application listens on (you can change this as needed).
* `--reload`: Enables automatic reloading. When you modify the code, the service will automatically restart, which is very suitable for development environments (remove this option in production environments).
3. **Access Application**:
After the application starts, you can access `http://localhost:8000` (or the host and port you specified) through a browser or API tool.
### Complete Configuration List
@@ -181,6 +185,7 @@ If you want to run the source code directly locally for development or testing,
| `SHOW_THINKING_PROCESS` | Optional, whether to display the model's thinking process | `true` |
| `THINKING_MODELS` | Optional, list of models that support thinking functions | `[]` |
| `THINKING_BUDGET_MAP` | Optional, thinking function budget mapping (model_name:budget_value) | `{}` |
| `URL_NORMALIZATION_ENABLED` | Optional, whether to enable intelligent URL routing mapping | `false` |
| `BASE_URL` | Optional, Gemini API base URL, no modification needed by default | `https://generativelanguage.googleapis.com/v1beta` |
| `MAX_FAILURES` | Optional, number of times a single key is allowed to fail | `3` |
| `MAX_RETRIES` | Optional, maximum number of retries for failed API requests | `3` |
@@ -194,6 +199,10 @@ If you want to run the source code directly locally for development or testing,
| `AUTO_DELETE_REQUEST_LOGS_ENABLED`| Optional, whether to enable automatic deletion of request logs | `false` |
| `AUTO_DELETE_REQUEST_LOGS_DAYS` | Optional, automatically delete request logs older than this many days (e.g., 1, 7, 30) | `30` |
| `SAFETY_SETTINGS` | Optional, safety settings (JSON string format), used to configure content safety thresholds. Example values may need adjustment based on actual model support. | `[{"category": "HARM_CATEGORY_HARASSMENT", "threshold": "OFF"}, {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "OFF"}, {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "OFF"}, {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "OFF"}, {"category": "HARM_CATEGORY_CIVIC_INTEGRITY", "threshold": "BLOCK_NONE"}]` |
| **TTS Related** | | |
| `TTS_MODEL` | Optional, TTS model name | `gemini-2.5-flash-preview-tts` |
| `TTS_VOICE_NAME` | Optional, TTS voice name | `Zephyr` |
| `TTS_SPEED` | Optional, TTS speed | `normal` |
| **Image Generation Related** | | |
| `PAID_KEY` | Optional, paid API Key for advanced features like image generation | `your-paid-api-key` |
| `CREATE_IMAGE_MODEL` | Optional, image generation model | `imagen-3.0-generate-002` |
@@ -219,20 +228,20 @@ The following are the main API endpoints provided by the service:
### Gemini API Related (`(/gemini)/v1beta`)
* `GET /models`: List available Gemini models.
* `POST /models/{model_name}:generateContent`: Generate content using the specified Gemini model.
* `POST /models/{model_name}:streamGenerateContent`: Stream content generation using the specified Gemini model.
### OpenAI API Related
* `GET (/hf)/v1/models`: List available models (uses Gemini format underneath).
* `POST (/hf)/v1/chat/completions`: Perform chat completion (uses Gemini format underneath, supports streaming).
* `POST (/hf)/v1/embeddings`: Create text embeddings (uses Gemini format underneath).
* `POST (/hf)/v1/images/generations`: Generate images (uses Gemini format underneath).
* `GET /openai/v1/models`: List available models (uses OpenAI format underneath).
* `POST /openai/v1/chat/completions`: Perform chat completion (uses OpenAI format underneath, supports streaming, can prevent truncation, and is faster).
* `POST /openai/v1/embeddings`: Create text embeddings (uses OpenAI format underneath).
* `POST /openai/v1/images/generations`: Generate images (uses OpenAI format underneath).
## 🤝 Contributing
@@ -242,9 +251,9 @@ Pull Requests or Issues are welcome.
Special thanks to the following projects and platforms for providing image hosting services for this project:
* [PicGo](https://www.picgo.net/)
* [SM.MS](https://smms.app/)
* [CloudFlare-ImgBed](https://github.com/MarSeventh/CloudFlare-ImgBed) open source project
## 🙏 Thanks to Contributors
@@ -266,7 +275,7 @@ CDN acceleration and security protection for this project are sponsored by Tence
## 💖 Friendly Projects
* **[OneLine](https://github.com/chengtx809/OneLine)** by [chengtx809](https://github.com/chengtx809) - OneLine: AI-driven hot event timeline generation tool
## 🎁 Project Support


@@ -178,6 +178,7 @@ app/
| `SHOW_THINKING_PROCESS` | Optional, whether to display the model's thinking process | `true` |
| `THINKING_MODELS` | Optional, list of models that support thinking functions | `[]` |
| `THINKING_BUDGET_MAP` | Optional, thinking function budget mapping (model_name:budget_value) | `{}` |
| `URL_NORMALIZATION_ENABLED` | Optional, whether to enable intelligent URL routing mapping | `false` |
| `BASE_URL` | Optional, Gemini API base URL, no modification needed by default | `https://generativelanguage.googleapis.com/v1beta` |
| `MAX_FAILURES` | Optional, number of times a single key is allowed to fail | `3` |
| `MAX_RETRIES` | Optional, maximum number of retries for failed API requests | `3` |
@@ -191,6 +192,10 @@ app/
| `AUTO_DELETE_REQUEST_LOGS_ENABLED`| Optional, whether to enable automatic deletion of request logs | `false` |
| `AUTO_DELETE_REQUEST_LOGS_DAYS` | Optional, automatically delete request logs older than this many days (e.g., 1, 7, 30) | `30` |
| `SAFETY_SETTINGS` | Optional, safety settings (JSON string format), used to configure content safety thresholds. Example values may need adjustment based on actual model support. | `[{"category": "HARM_CATEGORY_HARASSMENT", "threshold": "OFF"}, {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "OFF"}, {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "OFF"}, {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "OFF"}, {"category": "HARM_CATEGORY_CIVIC_INTEGRITY", "threshold": "BLOCK_NONE"}]` |
| **TTS Related** | | |
| `TTS_MODEL` | Optional, TTS model name | `gemini-2.5-flash-preview-tts` |
| `TTS_VOICE_NAME` | Optional, TTS voice name | `Zephyr` |
| `TTS_SPEED` | Optional, TTS speed | `normal` |
| **Image Generation Related** | | |
| `PAID_KEY` | Optional, paid API Key for advanced features like image generation | `your-paid-api-key` |
| `CREATE_IMAGE_MODEL` | Optional, image generation model | `imagen-3.0-generate-002` |


@@ -77,6 +77,11 @@ class Settings(BaseSettings):
    THINKING_MODELS: List[str] = []
    THINKING_BUDGET_MAP: Dict[str, float] = {}
    # TTS-related configuration
    TTS_MODEL: str = "gemini-2.5-flash-preview-tts"
    TTS_VOICE_NAME: str = "Zephyr"
    TTS_SPEED: str = "normal"
    # Image generation configuration
    PAID_KEY: str = ""
    CREATE_IMAGE_MODEL: str = DEFAULT_CREATE_IMAGE_MODEL
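Note that `TTS_SPEED` is not passed to the API as a native parameter; the service embeds it into the text prompt sent to the model. A small sketch of that prompt construction, mirroring the service code added in this commit:

```python
def build_tts_prompt(text: str, speed: str = "normal") -> str:
    """Build the instruction prompt sent to the TTS model.

    Mirrors `f"Speak in a {settings.TTS_SPEED} speed voice: {request.input}"`
    from the new TTS service.
    """
    return f"Speak in a {speed} speed voice: {text}"

print(build_tts_prompt("Hello there", "fast"))  # Speak in a fast speed voice: Hello there
```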


@@ -33,3 +33,10 @@ class ImageGenerationRequest(BaseModel):
    quality: Optional[str] = None
    style: Optional[str] = None
    response_format: Optional[str] = "url"


class TTSRequest(BaseModel):
    model: str = "gemini-2.5-flash-preview-tts"
    input: str
    voice: str = "Kore"
    response_format: Optional[str] = "wav"
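A request body matching this model might look like the following (a sketch; field values other than the declared defaults are illustrative):

```python
import json

# Example body for the new POST /v1/audio/speech endpoint. Only "input"
# is required; the other fields fall back to the defaults declared on
# the TTSRequest model above.
payload = {
    "model": "gemini-2.5-flash-preview-tts",
    "input": "Hello from the TTS endpoint!",
    "voice": "Kore",
    "response_format": "wav",
}

body = json.dumps(payload)
print(body)
```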


@@ -67,9 +67,9 @@ class SmartRoutingMiddleware(BaseHTTPMiddleware):
            r"^/gemini/v1beta/models/[^/:]+:(generate|streamGenerate)Content$",  # Gemini with prefix
            r"^/v1beta/models$",  # Gemini model list
            r"^/gemini/v1beta/models$",  # Gemini model list with prefix
            r"^/v1/(chat/completions|models|embeddings|images/generations|audio/speech)$",  # v1 format
            r"^/openai/v1/(chat/completions|models|embeddings|images/generations|audio/speech)$",  # OpenAI format
            r"^/hf/v1/(chat/completions|models|embeddings|images/generations|audio/speech)$",  # HF format
            r"^/vertex-express/v1beta/models/[^/:]+:(generate|streamGenerate)Content$",  # Vertex Express Gemini format
            r"^/vertex-express/v1beta/models$",  # Vertex Express model list
            r"^/vertex-express/v1/(chat/completions|models|embeddings|images/generations)$",  # Vertex Express OpenAI format
@@ -146,6 +146,8 @@ class SmartRoutingMiddleware(BaseHTTPMiddleware):
return "/openai/v1/embeddings", {"type": "openai_embeddings"}
elif "image" in path.lower():
return "/openai/v1/images/generations", {"type": "openai_images"}
elif "audio" in path.lower():
return "/openai/v1/audio/speech", {"type": "openai_audio"}
elif method == "GET":
if "model" in path.lower():
return "/openai/v1/models", {"type": "openai_models"}
@@ -161,6 +163,8 @@ class SmartRoutingMiddleware(BaseHTTPMiddleware):
return "/v1/embeddings", {"type": "v1_embeddings"}
elif "image" in path.lower():
return "/v1/images/generations", {"type": "v1_images"}
elif "audio" in path.lower():
return "/v1/audio/speech", {"type": "v1_audio"}
elif method == "GET":
if "model" in path.lower():
return "/v1/models", {"type": "v1_models"}
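The effect of the widened route patterns can be checked with a small standalone sketch (pattern strings copied from the middleware above; the sample paths are illustrative):

```python
import re

# The v1-style patterns after this commit, now including audio/speech.
patterns = [
    r"^/v1/(chat/completions|models|embeddings|images/generations|audio/speech)$",
    r"^/openai/v1/(chat/completions|models|embeddings|images/generations|audio/speech)$",
    r"^/hf/v1/(chat/completions|models|embeddings|images/generations|audio/speech)$",
]

def matches_any(path: str) -> bool:
    """Return True if the path is handled by one of the v1-style routes."""
    return any(re.match(p, path) for p in patterns)

print(matches_any("/v1/audio/speech"))          # True: newly routed TTS path
print(matches_any("/hf/v1/chat/completions"))   # True: existing chat path
print(matches_any("/v1/audio/transcriptions"))  # False: not covered by this commit
```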


@@ -1,4 +1,4 @@
from fastapi import APIRouter, Depends, HTTPException, Response
from fastapi.responses import StreamingResponse
from app.config.config import settings
@@ -7,6 +7,7 @@ from app.domain.openai_models import (
ChatRequest,
EmbeddingRequest,
ImageGenerationRequest,
TTSRequest,
)
from app.handler.retry_handler import RetryHandler
from app.handler.error_handler import handle_route_errors
@@ -14,6 +15,7 @@ from app.log.logger import get_openai_logger
from app.service.chat.openai_chat_service import OpenAIChatService
from app.service.embedding.embedding_service import EmbeddingService
from app.service.image.image_create_service import ImageCreateService
from app.service.tts.tts_service import TTSService
from app.service.key.key_manager import KeyManager, get_key_manager_instance
from app.service.model.model_service import ModelService
@@ -24,6 +26,7 @@ security_service = SecurityService()
model_service = ModelService()
embedding_service = EmbeddingService()
image_create_service = ImageCreateService()
tts_service = TTSService()
async def get_key_manager():
@@ -41,6 +44,11 @@ async def get_openai_chat_service(key_manager: KeyManager = Depends(get_key_mana
return OpenAIChatService(settings.BASE_URL, key_manager)
async def get_tts_service():
    """Get the TTS service instance."""
    return tts_service
@router.get("/v1/models")
@router.get("/hf/v1/models")
async def list_models(
@@ -147,3 +155,21 @@ async def get_keys_list(
},
"total": len(keys_status["valid_keys"]) + len(keys_status["invalid_keys"]),
}
@router.post("/v1/audio/speech")
@router.post("/hf/v1/audio/speech")
async def text_to_speech(
    request: TTSRequest,
    _=Depends(security_service.verify_authorization),
    api_key: str = Depends(get_next_working_key_wrapper),
    tts_service: TTSService = Depends(get_tts_service),
):
    """Handle an OpenAI-compatible TTS request."""
    operation_name = "text_to_speech"
    async with handle_route_errors(logger, operation_name):
        logger.info(f"Handling TTS request for model: {request.model}")
        logger.debug(f"Request: \n{request.model_dump_json(indent=2)}")
        logger.info(f"Using API key: {api_key[:6]}...")  # avoid logging the full key
        audio_data = await tts_service.create_tts(request, api_key)
        return Response(content=audio_data, media_type="audio/wav")


@@ -0,0 +1,94 @@
import datetime
import io
import re
import time
import wave
from typing import Optional
from google import genai
from app.config.config import settings
from app.database.services import add_error_log, add_request_log
from app.domain.openai_models import TTSRequest
from app.log.logger import get_openai_logger
logger = get_openai_logger()
def _create_wav_file(audio_data: bytes) -> bytes:
    """Creates a WAV file in memory from raw audio data."""
    with io.BytesIO() as wav_file:
        with wave.open(wav_file, "wb") as wf:
            wf.setnchannels(1)  # Mono
            wf.setsampwidth(2)  # 16-bit
            wf.setframerate(24000)  # 24kHz sample rate
            wf.writeframes(audio_data)
        return wav_file.getvalue()


class TTSService:
    async def create_tts(self, request: TTSRequest, api_key: str) -> Optional[bytes]:
        """Create audio using the Google Gemini SDK."""
        start_time = time.perf_counter()
        request_datetime = datetime.datetime.now()
        is_success = False
        status_code = None
        response = None
        error_log_msg = ""
        try:
            client = genai.Client(api_key=api_key)
            response = await client.aio.models.generate_content(
                model=settings.TTS_MODEL,
                contents=f"Speak in a {settings.TTS_SPEED} speed voice: {request.input}",
                config={
                    "response_modalities": ["Audio"],
                    "speech_config": {
                        "voice_config": {
                            "prebuilt_voice_config": {
                                "voice_name": settings.TTS_VOICE_NAME
                            }
                        }
                    },
                },
            )
            if (
                response.candidates
                and response.candidates[0].content.parts
                and response.candidates[0].content.parts[0].inline_data
            ):
                raw_audio_data = response.candidates[0].content.parts[0].inline_data.data
                is_success = True
                status_code = 200
                return _create_wav_file(raw_audio_data)
            # A response without inline audio data is a failure; raise instead
            # of silently returning None.
            raise ValueError("No audio data found in the model response")
        except Exception as e:
            is_success = False
            error_log_msg = f"Generic error: {e}"
            logger.error(f"An error occurred in TTSService: {error_log_msg}")
            match = re.search(r"status code (\d+)", str(e))
            if match:
                status_code = int(match.group(1))
            else:
                status_code = 500
            raise
        finally:
            end_time = time.perf_counter()
            latency_ms = int((end_time - start_time) * 1000)
            if not is_success:
                await add_error_log(
                    gemini_key=api_key,
                    model_name=settings.TTS_MODEL,
                    error_type="google-tts",
                    error_log=error_log_msg,
                    error_code=status_code,
                    request_msg=request.input,
                )
            await add_request_log(
                model_name=settings.TTS_MODEL,
                api_key=api_key,
                is_success=is_success,
                status_code=status_code,
                latency_ms=latency_ms,
                request_time=request_datetime,
            )
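The WAV framing used by `_create_wav_file` can be sanity-checked standalone by reading the bytes back with the `wave` module (a sketch; the one-second silence payload is illustrative):

```python
import io
import wave

def create_wav_file(audio_data: bytes) -> bytes:
    """Wrap raw 16-bit mono 24kHz PCM in a WAV container (mirrors the helper above)."""
    with io.BytesIO() as wav_file:
        with wave.open(wav_file, "wb") as wf:
            wf.setnchannels(1)
            wf.setsampwidth(2)
            wf.setframerate(24000)
            wf.writeframes(audio_data)
        return wav_file.getvalue()

# 24000 frames of 16-bit silence = one second of audio.
wav_bytes = create_wav_file(b"\x00\x00" * 24000)

with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
    print(wf.getnchannels(), wf.getframerate(), wf.getnframes())  # 1 24000 24000
```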


@@ -745,6 +745,13 @@ endblock %} {% block head_extra_styles %}
>
Model Configuration
</button>
<button
class="tab-btn px-5 py-2 rounded-full font-medium text-sm transition-all duration-200"
data-tab="tts"
style="background-color: #f8fafc !important; color: #64748b !important; border: 2px solid #e2e8f0 !important; box-shadow: 0 1px 3px 0 rgba(0, 0, 0, 0.1) !important; font-weight: 500 !important;"
>
TTS Configuration
</button>
<button
class="tab-btn px-5 py-2 rounded-full font-medium text-sm transition-all duration-200"
data-tab="image"
@@ -1370,11 +1377,97 @@ endblock %} {% block head_extra_styles %}
</div>
</div>
</div>
<!-- TTS configuration -->
<div class="config-section" id="tts-section">
<h2
class="text-xl font-bold mb-6 pb-3 border-b flex items-center gap-2 text-gray-800 border-violet-300 border-opacity-30"
>
<i class="fas fa-volume-up text-violet-400"></i> TTS Configuration
</h2>
<!-- TTS model -->
<div class="mb-6">
<label for="TTS_MODEL" class="block font-semibold mb-2 text-gray-700"
>TTS Model</label
>
<select
id="TTS_MODEL"
name="TTS_MODEL"
class="w-full px-4 py-3 rounded-lg form-select-themed"
>
<option value="gemini-2.5-flash-preview-tts">gemini-2.5-flash-preview-tts</option>
<option value="gemini-2.5-pro-preview-tts">gemini-2.5-pro-preview-tts</option>
</select>
<small class="text-gray-500 mt-1 block">Model used for TTS</small>
</div>
<!-- TTS voice name -->
<div class="mb-6">
<label for="TTS_VOICE_NAME" class="block font-semibold mb-2 text-gray-700"
>TTS Voice Name</label
>
<select
id="TTS_VOICE_NAME"
name="TTS_VOICE_NAME"
class="w-full px-4 py-3 rounded-lg form-select-themed"
>
<option value="Zephyr">Zephyr (Bright)</option>
<option value="Puck">Puck (Upbeat)</option>
<option value="Charon">Charon (Informative)</option>
<option value="Kore">Kore (Firm)</option>
<option value="Fenrir">Fenrir (Excitable)</option>
<option value="Leda">Leda (Youthful)</option>
<option value="Orus">Orus (Firm)</option>
<option value="Aoede">Aoede (Breezy)</option>
<option value="Callirhoe">Callirhoe (Easy-going)</option>
<option value="Autonoe">Autonoe (Bright)</option>
<option value="Enceladus">Enceladus (Breathy)</option>
<option value="Iapetus">Iapetus (Clear)</option>
<option value="Umbriel">Umbriel (Easy-going)</option>
<option value="Algieba">Algieba (Smooth)</option>
<option value="Despina">Despina (Smooth)</option>
<option value="Erinome">Erinome (Clear)</option>
<option value="Algenib">Algenib (Gravelly)</option>
<option value="Rasalgethi">Rasalgethi (Informative)</option>
<option value="Laomedeia">Laomedeia (Upbeat)</option>
<option value="Achernar">Achernar (Soft)</option>
<option value="Alnilam">Alnilam (Firm)</option>
<option value="Schedar">Schedar (Even)</option>
<option value="Gacrux">Gacrux (Mature)</option>
<option value="Pulcherrima">Pulcherrima (Forward)</option>
<option value="Achird">Achird (Friendly)</option>
<option value="Zubenelgenubi">Zubenelgenubi (Casual)</option>
<option value="Vindemiatrix">Vindemiatrix (Gentle)</option>
<option value="Sadachbia">Sadachbia (Lively)</option>
<option value="Sadaltager">Sadaltager (Knowledgeable)</option>
<option value="Sulafat">Sulafat (Warm)</option>
</select>
<small class="text-gray-500 mt-1 block">Voice name for TTS; controls style, tone, accent, and pacing</small>
</div>
<!-- TTS speed -->
<div class="mb-6">
<label for="TTS_SPEED" class="block font-semibold mb-2 text-gray-700"
>TTS Speed</label
>
<select
id="TTS_SPEED"
name="TTS_SPEED"
class="w-full px-4 py-3 rounded-lg form-select-themed"
>
<option value="slow">Slow</option>
<option value="normal">Normal</option>
<option value="fast">Fast</option>
</select>
<small class="text-gray-500 mt-1 block">Select the TTS speaking speed</small>
</div>
</div>
<!-- Image generation configuration -->
<div class="config-section" id="image-section">
<h2
class="text-xl font-bold mb-6 pb-3 border-b flex items-center gap-2 text-gray-800 border-violet-300 border-opacity-30"
>
<i class="fas fa-image text-violet-400"></i> Image Generation Configuration
</h2>
@@ -1511,12 +1604,12 @@ endblock %} {% block head_extra_styles %}
</div>
</div>
<!-- Streaming output configuration -->
<div class="config-section" id="stream-section">
<h2
class="text-xl font-bold mb-6 pb-3 border-b flex items-center gap-2 text-gray-800 border-violet-300 border-opacity-30"
>
<i class="fas fa-stream text-violet-400"></i> Streaming Output Configuration
</h2>
<!-- Enable streaming output optimization -->