merge master

2025-12-26 05:16:21 +08:00 · 2025-03-18 09:51:43 +08:00 · 2025-03-18 09:51:43 +08:00 · 354eb7113b
commit 354eb7113b
parent 0a9693c5cc c67d35d187
18 changed files with 327253 additions and 124 deletions
--- a/.github/ISSUE_TEMPLATE/bug_report.md
+++ b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,25 @@
+---
+name: Bug Report
+about: Create a bug report to help us improve
+title: '[BUG] '
+labels: bug
+assignees: ''
+---
+
+## Version Information
+- Commit Hash: <!-- e.g. abc123def456 -->
+
+## Error Message
+<!-- Paste your error message in markdown -->
+```
+
+```
+
+## Description
+
+
+### Current Behavior
+<!-- Describe what actually happened -->
+
+### Expected Behavior
+<!-- Describe what you expected to happen -->
--- a/.github/ISSUE_TEMPLATE/config.yml
+++ b/.github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1 @@
+blank_issues_enabled: false
--- a/21
+++ b/21
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2025 Dongle
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
--- a/README.md
+++ b/README.md
@@ -11,6 +11,14 @@

 https://github.com/user-attachments/assets/bf27f8bd-136b-402e-bc7d-994b99bcc368

+Supported models:
+
+| Vendor| Model |
+| --- | --- |
+| [openainext](https://api.openai-next.com) | gpt-4o,gpt-4o-2024-08-06,gpt-4o-2024-11-20 |
+|[yeka](https://2233.ai/api)|gpt-4o,o1|
+|openai|gpt-4o,gpt-4o-2024-08-06,gpt-4o-2024-11-20,o1,gpt-4.5-preview-2025-02-27|
+


 </div>
@@ -65,8 +73,6 @@ python main.py
 ```
 Then open `http://localhost:7888/` in your browser to configure your API key and basic settings.

-For supported vendors and models, please refer to this [link](./SUPPORT_MODEL.md)
-
 ## 📝 FAQ

 ### 🔧 CUDA Version Mismatch
--- a/README_CN.md
+++ b/README_CN.md
@@ -8,6 +8,18 @@

 https://github.com/user-attachments/assets/bf27f8bd-136b-402e-bc7d-994b99bcc368

+
+目前支持的模型:
+
+
+| Vendor| Model |
+| --- | --- |
+| [openainext](https://api.openai-next.com) | gpt-4o,gpt-4o-2024-08-06,gpt-4o-2024-11-20 |
+|[yeka](https://2233.ai/api)|gpt-4o,o1|
+|openai|gpt-4o,gpt-4o-2024-08-06,gpt-4o-2024-11-20,o1,gpt-4.5-preview-2025-02-27|
+
+
+
 </div>

 > 特别声明：autoMate 项目还处于非常早期阶段，目前的能力还不足以解决任何问题，当前仅限于学习和交流。不过我会不断的寻求突破点，不停地融入最新的技术！如果你有任何疑问，也可以加vx好友，入群交流。
@@ -58,9 +70,6 @@ python main.py
 ```
 然后在浏览器中打开`http://localhost:7888/`，配置您的API密钥和基本设置。

-支持的厂商及模型可以参考这个[链接](./SUPPORT_MODEL.md)
-
-

 ## 📝常见问题

@@ -78,11 +87,7 @@ python main.py
 pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
 ```

-### 模型无法下载
-多半是被墙了，可以从百度网盘直接下载模型。

-通过网盘分享的文件：weights.zip
-链接: https://pan.baidu.com/s/1Tj8sZZK9_QI7whZV93vb0w?pwd=dyeu 提取码: dyeu


 ## 🤝 参与共建
--- a/README_JA.md
+++ b/README_JA.md
@@ -2,58 +2,58 @@

 <img src="./resources/logo.png" width="120" height="120" alt="autoMate logo">
 <h1>autoMate</h1>
-<p><b>🤖 AI駆動のローカルオートメーションツール | コンピュータに仕事をさせる</b></p>
+<p><b>🤖 AI搭載ローカル自動化ツール | パソコンに仕事を任せよう</b></p>

 [English](./README.md) | [简体中文](./README_CN.md)

->"面倒な作業を自動化し、生活のための時間を取り戻す"
+>"面倒な作業を自動化し、人生の時間を取り戻そう"


 https://github.com/user-attachments/assets/bf27f8bd-136b-402e-bc7d-994b99bcc368

+対応モデル：

+| ベンダー | モデル |
+| --- | --- |
+| [openainext](https://api.openai-next.com) | gpt-4o,gpt-4o-2024-08-06,gpt-4o-2024-11-20 |
+|[yeka](https://2233.ai/api)|gpt-4o,o1|
+|openai|gpt-4o,gpt-4o-2024-08-06,gpt-4o-2024-11-20,o1,gpt-4.5-preview-2025-02-27|

 </div>

-## 💫 コンピュータとの関係を再定義する
+> 特記事項：autoMateプロジェクトは現在、非常に初期段階にあります。現時点での機能は限定的で、主に学習とコミュニケーションを目的としています。しかし、私たちは継続的にブレークスルーを追求し、最新技術を統合しています！ご質問がありましたら、WeChatでお気軽にお問い合わせください。

-繰り返しの作業に夜遅くまで取り組むのに疲れましたか？単調な作業があなたの創造力と貴重な時間を消費していませんか？
+<div align="center">
+<img src="./resources/wxchat.png" width="120" height="120" alt="autoMate logo">
+</div>

-autoMateは単なるツールではありません。AGI時代のインテリジェントなデジタル同僚であり、仕事と生活のバランスを取り戻すために絶え間なく働きます。
+## 💫 パソコンとの関係を再定義

-**オートメーションがあなたの生活にもっと多くの可能性をもたらします。**
+反復的な作業で夜遅くまで働くことに疲れていませんか？単調な作業があなたの創造性と貴重な時間を奪っていませんか？
+
+autoMateは単なるツールではありません - AGI時代のインテリジェントなデジタル同僚として、仕事と生活のバランスを取り戻すためにたゆまぬ努力を続けています。
+
+**自動化であなたの人生にもっと可能性を。**

 ## 💡 プロジェクト概要
-autoMateは、OmniParserを基盤とした革新的なAI+RPAオートメーションツールで、AIを「デジタル従業員」に変え、以下のことができます：
+autoMateは、OmniParserを基盤とした革新的なAI+RPA自動化ツールで、AIをあなたの「デジタル従業員」に変えます：

- 📊 コンピュータのインターフェースを自動的に操作し、複雑なワークフローを完了する
- 🔍 画面の内容をインテリジェントに理解し、人間の視覚と操作をシミュレートする
- 🧠 タスクの要件に基づいて自律的に意思決定し、行動を取る
- 💻 ローカルデプロイメントをサポートし、データのセキュリティとプライバシーを保護する
+- 📊 パソコンインターフェースを自動操作し、複雑なワークフローを完了
+- 🔍 画面の内容を知的に理解し、人間の視覚と操作をシミュレート
+- 🧠 タスク要件に基づいて自律的に判断し行動
+- 💻 ローカルデプロイメントでデータセキュリティとプライバシーを保護

-従来のRPAツールの複雑なルール設定とは異なり、autoMateは自然言語のタスク記述だけで複雑なオートメーションプロセスを完了します。繰り返しの作業に別れを告げ、真に価値を生み出すことに集中しましょう！
+## ✨ 特徴

-## 🌟 autoMateがあなたの仕事を変える理由
-
-> "autoMateを使用する前は、毎日3時間かけてレポートを処理していました。今ではタスクを設定するのに10分しかかからず、本当に重要なことに集中できます。" - 財務マネージャーのフィードバック。
-
-autoMateがあなたの時間を何時間も占めていたタスクを自動的に完了するのを初めて見たとき、言葉にできない安堵感を感じるでしょう。これは単なる効率の向上ではなく、創造力の解放です。
-
-想像してみてください：毎朝目覚めると、昨夜のデータ整理、レポート生成、メール返信がすべて完了しており、あなたの知恵と創造力を本当に必要とする仕事だけが残っています。これがautoMateがもたらす未来です。
-
-## ✨ 機能
-
- 🔮 コード不要のオートメーション - タスクを自然言語で記述し、プログラミング知識は不要
- 🖥️ フルインターフェースコントロール - 特定のソフトウェアに限定されず、あらゆるビジュアルインターフェースの操作をサポート
- 🚅 簡素化されたインストール - 公式バージョンと比較して簡素化されたインストールプロセス、中国語環境をサポートし、ワンクリックでデプロイ
- 🔒 ローカル操作 - データのセキュリティを保護し、プライバシーの懸念なし
- 🌐 マルチモデルサポート - 主流の大規模言語モデルと互換性あり
- 💎 継続的な成長 - 使用するにつれて、あなたの作業習慣とニーズに適応し学習
+- 🔮 ノーコード自動化 - 自然言語でタスクを記述、プログラミング知識不要
+- 🖥️ 完全なインターフェース制御 - 特定のソフトウェアに限定されない、あらゆる視覚的インターフェースでの操作をサポート
+- 🚅 簡単インストール - 公式版と比べてインストール手順を簡素化、日本語環境対応、ワンクリックデプロイメント
+- 🔒 ローカル実行 - データセキュリティを保護、プライバシーの心配なし

 ## 🚀 クイックスタート

 ### 📦 インストール
-プロジェクトをクローンし、環境を設定します：
+プロジェクトをクローンし、環境をセットアップ：

 ```bash
 git clone https://github.com/yuruotong1/autoMate.git
@@ -62,32 +62,42 @@ conda create -n "automate" python==3.12
 conda activate automate
 pip install -r requirements.txt
 ```
+
 ### 🎮 アプリケーションの起動

 ```bash
 python main.py
 ```
-その後、ブラウザで`http://localhost:7888/`を開き、APIキーと基本設定を構成します。
+ブラウザで`http://localhost:7888/`を開き、APIキーと基本設定を構成してください。

-サポートされているベンダーとモデルについては、この[リンク](./SUPPORT_MODEL.md)を参照してください。
-
-## 📝 FAQ
+## 📝 よくある質問

 ### 🔧 CUDAバージョンの不一致
-エラーが表示された場合：「GPUドライバが互換性がありません。readmeに従って適切なtorchバージョンをインストールしてください」と表示された場合、ドライバの互換性がないことを示しています。次のいずれかを行うことができます：
+4GB以上のVRAMを搭載したNVIDIAグラフィックスカードの使用を推奨しますが、CPU上での実行も可能です（ただし非常に遅くなります）：

-1. CPUのみで実行（遅いが機能する）
-2. `pip list`でtorchバージョンを確認
-3. [公式サイト](https://pytorch.org/get-started/locally/)でサポートされているCUDAバージョンを確認
-4. Nvidiaドライバを再インストール
+1. `pip list`を実行してtorchバージョンを確認；
+2. [公式サイト](https://pytorch.org/get-started/locally/)で対応するCUDAバージョンを確認；
+3. 現在インストールされているtorchとtorchvisionをアンインストール；
+4. 公式のtorchインストールコマンドをコピーし、お使いのCUDAバージョン用のtorchを再インストール。

-## 🤝 コントリビュート
+例えば、CUDAバージョンが12.4の場合、以下のコマンドでtorchをインストール：

-優れたオープンソースプロジェクトは、集団の知恵を体現しています。autoMateの成長は、あなたの参加と貢献にかかっています。バグの修正、機能の追加、ドキュメントの改善など、あなたの努力は何千人もの人々が繰り返し作業から解放されるのを助けます。
+```bash
+pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
+```

-よりインテリジェントな未来を一緒に創りましょう。

-> ["賢い質問の仕方"](https://github.com/ryanhanwu/How-To-Ask-Questions-The-Smart-Way)、["オープンソースコミュニティへの質問の仕方"](https://github.com/seajs/seajs/issues/545)、["効果的なバグ報告の仕方"](http://www.chiark.greenend.org.uk/%7Esgtatham/bugs.html)、および["オープンソースプロジェクトに解決できない質問を提出する方法"](https://zhuanlan.zhihu.com/p/25795393)を読むことを強くお勧めします。より良い質問は、より良いサポートを得るのに役立ちます。
+ファイル：weights.zip
+リンク：https://pan.baidu.com/s/1Tj8sZZK9_QI7whZV93vb0w?pwd=dyeu
+パスワード：dyeu
+
+## 🤝 コントリビューション
+
+優れたオープンソースプロジェクトは集合知の結晶です。autoMateの成長はあなたの参加と貢献に依存しています。バグ修正、機能追加、ドキュメント改善など、あなたの努力は何千人もの人々を反復作業から解放することにつながります。
+
+より知的な未来を一緒に創造しましょう。
+
+> より良いサポートを受けるために、["賢い質問の仕方"](https://github.com/ryanhanwu/How-To-Ask-Questions-The-Smart-Way)、["オープンソースコミュニティへの質問方法"](https://github.com/seajs/seajs/issues/545)、["効果的なバグ報告の方法"](http://www.chiark.greenend.org.uk/%7Esgtatham/bugs.html)、["オープンソースプロジェクトへの回答不能な質問の提出方法"](https://zhuanlan.zhihu.com/p/25795393)をお読みいただくことを強くお勧めします。

 <a href="https://github.com/yuruotong1/autoMate/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=yuruotong1/autoMate" />
@@ -96,6 +106,6 @@ python main.py
 ---

 <div align="center">
-⭐ すべてのスターはクリエイターへの励ましであり、より多くの人々がautoMateを発見し、恩恵を受ける機会です ⭐
-今日のあなたのサポートが、私たちの明日の原動力です
+⭐ 一つ一つのスターは開発者への励みであり、より多くの人々がautoMateを発見し恩恵を受ける機会となります ⭐
+今日のあなたのサポートが、私たちの明日への原動力です
 </div>
--- a/SUPPORT_MODEL.md
+++ b/SUPPORT_MODEL.md
@@ -1,6 +0,0 @@
-| Vendor-en | Vendor-ch | Model | base-url |
-| --- | --- | --- | --- |
-| openainext international| openainext 国际| gpt-4o-2024-11-20 | https://api.openai-next.com/v1 |
-| openainext international| openainext 国际| gpt-4.5-preview-2025-02-27 | https://api.openai-next.com/v1 |
-| openainext china| openainext中国|gpt-4o-2024-11-20|https://cn.api.openai-next.com/v1
-| openainext china| openainext中国|gpt-4.5-preview-2025-02-27|https://cn.api.openai-next.com/v1
--- a/gradio_ui/agent/vision_agent.py
+++ b/gradio_ui/agent/vision_agent.py
@@ -1,3 +1,4 @@
+import os
 from typing import List, Optional
 import cv2
 import torch
@@ -56,11 +57,22 @@ class VisionAgent:
                trust_remote_code=True,
                local_files_only=True
            ).to(self.device)
+            "processor", 
+            trust_remote_code=True
+        )
+        
+        try:
+            self.caption_model = AutoModelForCausalLM.from_pretrained(
+                caption_model_path, 
+                torch_dtype=self.dtype,
+                trust_remote_code=True
+            ).to(self.device)
            
             # No need to load weights separately; they are already included in florence_base_path
            
        except Exception as e:
            print(f"Model loading failed: {e}")
+            print(f"Model loading failed for path: {caption_model_path}")
            raise e
        self.prompt = "<CAPTION>"
        
--- a/gradio_ui/app.py
+++ b/gradio_ui/app.py
@@ -14,7 +14,7 @@ from gradio_ui.loop import (
 import base64
 from xbrain.utils.config import Config

-from util.download_weights import MODEL_DIR
+from util.download_weights import OMNI_PARSER_MODEL_DIR
 CONFIG_DIR = Path("~/.anthropic").expanduser()
 API_KEY_FILE = CONFIG_DIR / "api_key"

@@ -320,8 +320,8 @@ def run():
        model.change(fn=update_model, inputs=[model, state], outputs=None)
        api_key.change(fn=update_api_key, inputs=[api_key, state], outputs=None)
        chatbot.clear(fn=clear_chat, inputs=[state], outputs=[chatbot])
-        vision_agent = VisionAgent(yolo_model_path=os.path.join(MODEL_DIR, "icon_detect", "model.pt"),
-                                 caption_model_path=os.path.join(MODEL_DIR, "icon_caption"))
+        vision_agent = VisionAgent(yolo_model_path=os.path.join(OMNI_PARSER_MODEL_DIR, "icon_detect", "model.pt"),
+                                  caption_model_path=os.path.join(OMNI_PARSER_MODEL_DIR, "icon_caption"))
        vision_agent_state = gr.State({"agent": vision_agent})
        submit_button.click(process_input, [chat_input, state, vision_agent_state], [chatbot, task_list])
        stop_button.click(stop_app, [state], None)
--- a/main.py
+++ b/main.py
@@ -1,17 +1,19 @@
+# import os
+# os.environ["HF_ENDPOINT"] = "https://hf-mirror.com/"
+
 from gradio_ui import app
 from util import download_weights
-import torch
-def run():
-    try:
-        print("cuda is_available: ", torch.cuda.is_available())
-        print("MPS is_available: ", torch.backends.mps.is_available())
-        print("cuda device_count", torch.cuda.device_count()) 
-        print("cuda device_name", torch.cuda.get_device_name(0)) 
-    except Exception:
-        print("GPU driver is not compatible, please install the appropriate version of torch according to the readme!")

+import torch
+
+def run():
+    if not torch.cuda.is_available():
+        print("Warning: GPU is not available, so the CPU will be used. The application may run slower!\nYour computer will very likely heat up!")
+    print("Downloading the weight files...")
    # download the weight files
-    download_weights.download()   
+    # download_weights.download_models()  
+    # Configure the HuggingFace mirror
+    # print("HuggingFace mirror configured to use ModelScope registry")
    app.run()
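Review note: the new startup check only warns when CUDA is unavailable, while the removed code also probed MPS. A minimal sketch of a fallback order follows. The helpers `pick_device` and `describe_device` are hypothetical and not part of this commit; the two flags are assumed to come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
def pick_device(cuda_available: bool, mps_available: bool = False) -> str:
    """Return the preferred torch device string (hypothetical helper)."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def describe_device(device: str) -> str:
    """Build the startup message for the chosen device."""
    if device == "cpu":
        return "Warning: no GPU available, falling back to CPU; expect slower runs."
    return f"Using accelerator: {device}"


if __name__ == "__main__":
    # Assumption: in main.py the flags would be read from torch, e.g.
    #   torch.cuda.is_available() and torch.backends.mps.is_available()
    print(describe_device(pick_device(cuda_available=False)))
```

Keeping the decision in a pure function makes it testable without a GPU or a torch install.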


--- a/processor/added_tokens.json
+++ b/processor/added_tokens.json
--- a/processor/merges.txt
+++ b/processor/merges.txt
--- a/processor/preprocessor_config.json
+++ b/processor/preprocessor_config.json
@@ -0,0 +1,33 @@
+{
+  "auto_map": {
+    "AutoProcessor": "microsoft/Florence-2-base--processing_florence2.Florence2Processor"
+  },
+  "crop_size": {
+    "height": 768,
+    "width": 768
+  },
+  "do_center_crop": false,
+  "do_convert_rgb": null,
+  "do_normalize": true,
+  "do_rescale": true,
+  "do_resize": true,
+  "image_mean": [
+    0.485,
+    0.456,
+    0.406
+  ],
+  "image_processor_type": "CLIPImageProcessor",
+  "image_seq_length": 577,
+  "image_std": [
+    0.229,
+    0.224,
+    0.225
+  ],
+  "processor_class": "Florence2Processor",
+  "resample": 3,
+  "rescale_factor": 0.00392156862745098,
+  "size": {
+    "height": 768,
+    "width": 768
+  }
+}
--- a/processor/special_tokens_map.json
+++ b/processor/special_tokens_map.json
--- a/processor/tokenizer.json
+++ b/processor/tokenizer.json
--- a/processor/tokenizer_config.json
+++ b/processor/tokenizer_config.json
--- a/processor/vocab.json
+++ b/processor/vocab.json
--- a/util/download_weights.py
+++ b/util/download_weights.py
@@ -1,65 +1,22 @@
 import os
 from pathlib import Path
 from modelscope import snapshot_download
-import subprocess
-import shutil
 __WEIGHTS_DIR = Path("weights")
-MODEL_DIR = os.path.join(__WEIGHTS_DIR, "AI-ModelScope", "OmniParser-v2___0") 
-PROCESSOR_DIR = os.path.join(__WEIGHTS_DIR, "AI-ModelScope", "Florence-2-base")
-def download():
-    # Create weights directory
-   
-    __WEIGHTS_DIR.mkdir(exist_ok=True)
-    
-    # List of files to download
-    files = [
-        "icon_detect/train_args.yaml",
-        "icon_detect/model.pt",
-        "icon_detect/model.yaml",
-        "icon_caption/generation_config.json",
-        "icon_caption/model.safetensors",
-    ]
+OMNI_PARSER_MODEL_DIR = os.path.join(__WEIGHTS_DIR, "AI-ModelScope", "OmniParser-v2___0")

-    # Extra config files downloaded from Florence2 repo
-    config_files = [
-        "configuration_florence2.py",
-        "modeling_florence2.py"
-    ]
-    
-    # Check and download missing files
-    missing_files = []
-    for file in files:
-        file_path = os.path.join(MODEL_DIR, file)
-        if not os.path.exists(file_path):
-            missing_files.append(file)
-            break
-    
-    if not missing_files:
-        print("Model files already detected!")
-        return
+def __download_omni_parser():
+    # Create weights directory
+    __WEIGHTS_DIR.mkdir(exist_ok=True)
    
    snapshot_download(
        'AI-ModelScope/OmniParser-v2.0',
-        cache_dir='weights',
-        ignore_file_pattern=['config.json']
+        cache_dir='weights'
        )
    
-    snapshot_download(
-        'AI-ModelScope/Florence-2-base',
-        cache_dir='weights',
-        allow_file_pattern=['*.py', '*.json']
-    )
-
-    # Move downloaded Florence config files into icon_caption
-    for file_path in config_files:
-        source_dir = os.path.join(PROCESSOR_DIR, file_path)
-        dest_dir = os.path.join(MODEL_DIR, "icon_caption", file_path)
-        shutil.copy(source_dir, dest_dir)
-
-    # Move customized config.json into icon_caption to load the model from local path
-    shutil.copy(os.path.join("util", "config.json"), os.path.join(MODEL_DIR, "icon_caption", "config.json"))
    
-    print("Download complete")
+def download_models():
+    __download_omni_parser()
+

 if __name__ == "__main__":
-    download()
+    download_models()
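
Review note: this rewrite drops the check that skipped the download when the weight files were already on disk. A minimal sketch of such a guard follows. The names `REQUIRED_FILES`, `missing_files`, and `ensure_models` are hypothetical; the file list is copied from the removed `download()` body, and `fetch` stands in for whatever performs the download (for example `__download_omni_parser`).

```python
from pathlib import Path
from typing import Callable, Iterable, List

# File list copied from the removed download() implementation.
REQUIRED_FILES = [
    "icon_detect/train_args.yaml",
    "icon_detect/model.pt",
    "icon_detect/model.yaml",
    "icon_caption/generation_config.json",
    "icon_caption/model.safetensors",
]


def missing_files(model_dir: Path, required: Iterable[str] = REQUIRED_FILES) -> List[str]:
    """Return the required files that are not present under model_dir."""
    return [name for name in required if not (model_dir / name).is_file()]


def ensure_models(model_dir: Path, fetch: Callable[[], None]) -> bool:
    """Call fetch() only when something is missing; return True if it ran."""
    if not missing_files(model_dir):
        print("Model files already detected!")
        return False
    fetch()
    return True
```

Unlike the removed loop, which stopped at the first missing file, this version reports every missing file, which may help when diagnosing a partial download.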