From 44ef8aec14a04a7419ed40c05e8d3e410886f126 Mon Sep 17 00:00:00 2001
From: Wendong
Date: Tue, 11 Mar 2025 18:19:23 +0800
Subject: [PATCH] update readme

---
 README.md    | 34 ++++++++++++++++++++++------------
 README_zh.md | 34 ++++++++++++++++++++++------------
 2 files changed, 44 insertions(+), 24 deletions(-)

diff --git a/README.md b/README.md
index aee4d26..c602824 100644
--- a/README.md
+++ b/README.md
@@ -219,11 +219,7 @@ Alternatively, you can set environment variables directly in your terminal:
 
 > **Note**: Environment variables set directly in the terminal will only persist for the current session.
 
-### Additional Models
-For information on configuring other AI models beyond OpenAI, please refer to our [CAMEL models documentation](https://docs.camel-ai.org/key_modules/models.html#supported-model-platforms-in-camel).
-
-> **Note**: For optimal performance, we strongly recommend using OpenAI models. Our experiments show that other models may result in significantly lower performance on complex tasks and benchmarks.
 
 ## **Running with Docker**
 
@@ -265,11 +261,19 @@ python owl/run.py
 
 ## Running with Different Models
 
-### Additional Models
+### Model Requirements
 
-For information on configuring other AI models beyond OpenAI, please refer to our [CAMEL models documentation](https://docs.camel-ai.org/key_modules/models.html#supported-model-platforms-in-camel).
+- **Tool Calling**: OWL requires models with robust tool calling capabilities to interact with various toolkits. Models must be able to understand tool descriptions, generate appropriate tool calls, and process tool outputs.
 
-OWL supports various LLM backends. You can use the following scripts to run with different models:
+- **Multimodal Understanding**: For tasks involving web interaction, image analysis, or video processing, models with multimodal capabilities are required to interpret visual content and context.
+
+#### Supported Models
+
+For information on configuring AI models, please refer to our [CAMEL models documentation](https://docs.camel-ai.org/key_modules/models.html#supported-model-platforms-in-camel).
+
+> **Note**: For optimal performance, we strongly recommend using OpenAI models (GPT-4 or later versions). Our experiments show that other models may result in significantly lower performance on complex tasks and benchmarks, especially those requiring advanced multimodal understanding and tool use.
+
+OWL supports various LLM backends, though capabilities may vary depending on the model's tool calling and multimodal abilities. You can use the following scripts to run with different models:
 
 ```bash
 # Run with Qwen model
@@ -325,6 +329,8 @@ Example tasks you can try:
 
 # 🧰 Configuring Toolkits
 
+> **Important**: Effective use of toolkits requires models with strong tool calling capabilities. For multimodal toolkits (Web, Image, Video), models must also have multimodal understanding abilities.
+
 OWL supports various toolkits that can be customized by modifying the `tools` list in your script:
 
 ```python
@@ -347,11 +353,15 @@ tools = [
 
 ## Available Toolkits
 
 Key toolkits include:
-- **WebToolkit**: Browser automation
-- **VideoAnalysisToolkit**: Video processing
-- **AudioAnalysisToolkit**: Audio processing
-- **CodeExecutionToolkit**: Python code execution
-- **ImageAnalysisToolkit**: Image analysis
+
+### Multimodal Toolkits (Require multimodal model capabilities)
+- **WebToolkit**: Browser automation for web interaction and navigation
+- **VideoAnalysisToolkit**: Video processing and content analysis
+- **ImageAnalysisToolkit**: Image analysis and interpretation
+
+### Text-Based Toolkits
+- **AudioAnalysisToolkit**: Audio processing (requires OpenAI API)
+- **CodeExecutionToolkit**: Python code execution and evaluation
 - **SearchToolkit**: Web searches (Google, DuckDuckGo, Wikipedia)
 - **DocumentProcessingToolkit**: Document parsing (PDF, DOCX, etc.)
diff --git a/README_zh.md b/README_zh.md
index 588f0f7..04e3074 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -218,10 +218,6 @@ OWL requires various API keys to interact with different services. `owl/.env_templat
 
 > **Note**: Environment variables set directly in the terminal are only valid for the current session.
 
-### Additional Models
-
-For information on configuring AI models other than OpenAI, please refer to our [CAMEL models documentation](https://docs.camel-ai.org/key_modules/models.html#supported-model-platforms-in-camel).
-
 ## **Running with Docker**
 
 If you prefer to run the OWL project with Docker, we provide full Docker support:
@@ -267,11 +263,19 @@ python owl/run_mini.py
 
 ## Running with Different Models
 
-### Additional Models
+### Model Requirements
 
-For information on configuring AI models other than OpenAI, please refer to our [CAMEL models documentation](https://docs.camel-ai.org/key_modules/models.html#supported-model-platforms-in-camel).
+- **Tool Calling**: OWL requires models with strong tool calling capabilities to interact with various toolkits. Models must be able to understand tool descriptions, generate appropriate tool calls, and process tool outputs.
 
-OWL supports multiple LLM backends. You can use the following scripts to run different models:
+- **Multimodal Understanding**: For tasks involving web interaction, image analysis, or video processing, models with multimodal capabilities are required to interpret visual content and context.
+
+#### Supported Models
+
+For information on configuring models, please refer to our [CAMEL models documentation](https://docs.camel-ai.org/key_modules/models.html#supported-model-platforms-in-camel).
+
+> **Note**: For optimal performance, we strongly recommend using OpenAI models (GPT-4 or later). Our experiments show that other models may perform significantly worse on complex tasks and benchmarks, especially those requiring multimodal understanding and tool use.
+
+OWL supports multiple LLM backends, though capabilities may vary with each model's tool calling and multimodal abilities. You can use the following scripts to run different models:
 
 ```bash
 # Run with Qwen model
@@ -321,6 +325,8 @@ OWL will automatically invoke document-related tools to process the file and extract the answer.
 
 # 🧰 Configuring Toolkits
 
+> **Important**: Effective use of toolkits requires models with strong tool calling capabilities. For multimodal toolkits (Web, Image, Video), models must also have multimodal understanding abilities.
+
 OWL supports various toolkits that can be customized by modifying the `tools` list in your script:
 
 ```python
@@ -343,11 +349,15 @@ tools = [
 
 ## Available Toolkits
 
 Key toolkits include:
-- **WebToolkit**: Browser automation
-- **VideoAnalysisToolkit**: Video processing
-- **AudioAnalysisToolkit**: Audio processing
-- **CodeExecutionToolkit**: Python code execution
-- **ImageAnalysisToolkit**: Image analysis
+
+### Multimodal Toolkits (require multimodal model capabilities)
+- **WebToolkit**: Browser automation for web interaction and navigation
+- **VideoAnalysisToolkit**: Video processing and content analysis
+- **ImageAnalysisToolkit**: Image analysis and interpretation
+
+### Text-Based Toolkits
+- **AudioAnalysisToolkit**: Audio processing (requires OpenAI API)
+- **CodeExecutionToolkit**: Python code execution and evaluation
 - **SearchToolkit**: Web searches (Google, DuckDuckGo, Wikipedia)
 - **DocumentProcessingToolkit**: Document parsing (PDF, DOCX, etc.)
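
The toolkit customization both README hunks describe follows a simple aggregation pattern: each toolkit exposes its tools, and the script flattens them into the single `tools` list the agent receives. Below is a minimal, self-contained sketch of that pattern, assuming a `get_tools()` convention; the `Example...Toolkit` classes and tool names are hypothetical stand-ins for illustration, not CAMEL's actual classes.

```python
# Hypothetical sketch of the `tools`-list aggregation pattern.
# ExampleSearchToolkit and ExampleCodeExecutionToolkit are stand-ins;
# a real toolkit would return callable tool wrappers, not plain strings.

class ExampleSearchToolkit:
    """Stand-in toolkit exposing its tools via get_tools()."""

    def get_tools(self):
        return ["search_google", "search_duckduckgo", "search_wiki"]


class ExampleCodeExecutionToolkit:
    """Stand-in toolkit for code execution."""

    def get_tools(self):
        return ["execute_python"]


# Flatten every toolkit's tools into the single list handed to the agent.
tools = [
    *ExampleSearchToolkit().get_tools(),
    *ExampleCodeExecutionToolkit().get_tools(),
]

print(tools)
```

Enabling or disabling a toolkit then amounts to adding or commenting out one entry in the list.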