docs: rewrite local LLMs page (#9307)
## News
- 2025/05/21: We collaborated with Mistral AI and released [Devstral Small](https://mistral.ai/news/devstral) that achieves [46.8% on SWE-Bench Verified](https://github.com/SWE-bench/experiments/pull/228)!
- 2025/03/31: We released an open model OpenHands LM 32B v0.1 that achieves 37.1% on SWE-Bench Verified
([blog](https://www.all-hands.dev/blog/introducing-openhands-lm-32b----a-strong-open-coding-agent-model), [model](https://huggingface.co/all-hands/openhands-lm-32b-v0.1)).
## Quickstart: Running OpenHands with a Local LLM using LM Studio
This guide explains how to serve a local Devstral LLM using [LM Studio](https://lmstudio.ai/) and have OpenHands connect to it.

We recommend:

- **LM Studio** as the local model server, which handles metadata downloads automatically and offers a simple, user-friendly interface for configuration.
- **Devstral Small 2505** as the LLM for software development, trained on real GitHub issues and optimized for agent-style workflows like OpenHands.

### Hardware Requirements
Running Devstral requires a recent GPU with at least 16GB of VRAM, or a Mac with Apple Silicon (M1, M2, etc.) with at least 32GB of RAM.
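
If you're not sure whether your machine meets these requirements, a quick check along these lines can help. This is a minimal sketch; it assumes an NVIDIA GPU on Linux for the first command and a Mac with Apple Silicon for the rest:

```bash
# Linux with an NVIDIA GPU: report the total VRAM per GPU (Devstral needs at least 16GB)
nvidia-smi --query-gpu=name,memory.total --format=csv

# macOS with Apple Silicon: report the chip and total unified memory (at least 32GB recommended)
sysctl -n machdep.cpu.brand_string
echo "$(($(sysctl -n hw.memsize) / 1024 / 1024 / 1024)) GB RAM"
```
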
### 1. Install LM Studio
Download and install the LM Studio desktop app from [lmstudio.ai](https://lmstudio.ai/).
### 2. Download Devstral Small

1. Make sure to set the User Interface Complexity Level to "Power User" by clicking the appropriate label at the bottom of the window.
2. Click the "Discover" button (Magnifying Glass icon) on the left navigation bar to open the Models download page.

![Open the LM Studio model hub](screenshots/01_lm_studio_model_hub.png)

3. Search for the "Devstral Small 2505" model, confirm it's the official Mistral AI (mistralai) model, then proceed to download.

![Search for Devstral Small 2505 in the model hub](screenshots/02_lm_studio_download_devstral.png)

4. Wait for the download to finish.

### 3. Load the Model

1. Click the "Developer" button (Console icon) on the left navigation bar to open the Developer Console.
2. Click the "Select a model to load" dropdown at the top of the application window.

![Open the model load dialog](screenshots/03_lm_studio_open_load_model.png)

3. Enable the "Manually choose model load parameters" switch.
4. Select 'Devstral Small 2505' from the model list.

![Select Devstral Small 2505 from the list of models](screenshots/04_lm_studio_select_model.png)

![Adjust the model load settings](screenshots/05_lm_studio_load_model_settings.png)

5. Enable the "Show advanced settings" switch at the bottom of the Model settings flyout to show all the available settings.
6. Set "Context Length" to at least 32768 and enable Flash Attention.
7. Click "Load Model" to start loading the model.

### 4. Start the LLM server

1. Enable the switch next to "Status" at the top-left of the window.
2. Take note of the Model API Identifier shown in the sidebar on the right; you can sanity-check it with the optional check below.


### 5. Start OpenHands
1. Check [the installation guide](/usage/local-setup) and ensure all prerequisites are met before running OpenHands, then run:
```bash
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.45-nikolaik
docker run -it --rm --pull=always \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.45-nikolaik \
-e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v ~/.openhands:/.openhands \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app \
docker.all-hands.dev/all-hands-ai/openhands:0.45
```
> **Note**: If you used OpenHands before version 0.44, you may want to run `mv ~/.openhands-state ~/.openhands` to migrate your conversation history to the new location.
2. Wait until the server is running (see log below):
```
Digest: sha256:e72f9baecb458aedb9afc2cd5bc935118d1868719e55d50da73190d3a85c674f
Status: Image is up to date for docker.all-hands.dev/all-hands-ai/openhands:0.45
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:3000 (Press CTRL+C to quit)
```
3. Visit `http://localhost:3000` in your browser.
### 6. Configure OpenHands to use the LLM server

Once you open OpenHands in your browser, you'll need to configure it to use the local LLM server you just started.

When started for the first time, OpenHands will prompt you to set up the LLM provider.

1. Click "see advanced settings" to open the LLM Settings page.

![OpenHands launch screen](screenshots/07_openhands_launch.png)

2. Enable the "Advanced" switch at the top of the page to show all the available settings.

3. Set the following values:
   - **Custom Model**: `openai/mistralai/devstral-small-2505` (the Model API identifier from LM Studio, prefixed with "openai/")
   - **Base URL**: `http://host.docker.internal:1234/v1`
   - **API Key**: `local-llm`

4. Click "Save Settings" to save the configuration.

![OpenHands LLM settings](screenshots/08_openhands_llm_settings.png)

That's it! You can now start using OpenHands with the local LLM server.

If you encounter any issues, let us know on [Slack](https://join.slack.com/t/openhands-ai/shared_invite/zt-34zm4j0gj-Qz5kRHoca8DFCbqXPS~f_A) or [Discord](https://discord.gg/ESHStjSjD4).

## Advanced: Alternative LLM Backends

This section describes how to run local LLMs with OpenHands using alternative backends like Ollama, SGLang, or vLLM, without relying on LM Studio.

### Create an OpenAI-Compatible Endpoint with Ollama

- Install Ollama following [the official documentation](https://ollama.com/download).
- Example launch command for Devstral Small 2505:

```bash
# ⚠️ WARNING: OpenHands requires a large context size to work properly.
# When using Ollama, set OLLAMA_CONTEXT_LENGTH to at least 32768.
# The default (4096) is far too small: not even the system prompt will fit, and the agent will not behave correctly.
OLLAMA_CONTEXT_LENGTH=32768 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_KEEP_ALIVE=-1 nohup ollama serve &
ollama pull devstral:latest
```
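
After the pull completes, it's worth confirming that the model is available and that the OpenAI-compatible endpoint answers before pointing OpenHands at it. A small sketch, assuming the default port `11434` and an Ollama version that exposes the OpenAI-compatible `/v1` routes:

```bash
# List locally available models; devstral should appear here
ollama list

# Query the OpenAI-compatible endpoint that OpenHands will use
curl http://localhost:11434/v1/models
```
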
### Create an OpenAI-Compatible Endpoint with vLLM or SGLang

First, download the model checkpoints. For [Devstral Small 2505](https://huggingface.co/mistralai/Devstral-Small-2505):

```bash
huggingface-cli download mistralai/Devstral-Small-2505 --local-dir mistralai/Devstral-Small-2505
```

<Note>
The model checkpoints downloaded here should NOT be in GGUF format.
</Note>

#### Serving the model using SGLang

- Install SGLang following [the official documentation](https://docs.sglang.ai/start/install.html).
- Example launch command for Devstral Small 2505 (with at least 2 GPUs):

```bash
SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1 python3 -m sglang.launch_server \
    --model mistralai/Devstral-Small-2505 \
    --served-model-name Devstral-Small-2505 \
    --port 8000 \
    --tp 2 --dp 1 \
    --host 0.0.0.0 \
    --api-key mykey --context-length 131072
```

#### Serving the model using vLLM

- Install vLLM following [the official documentation](https://docs.vllm.ai/en/latest/getting_started/installation.html).
- Example launch command for Devstral Small 2505 (with at least 2 GPUs):

```bash
vllm serve mistralai/Devstral-Small-2505 \
    --host 0.0.0.0 --port 8000 \
    --api-key mykey \
    --tensor-parallel-size 2 \
    --served-model-name Devstral-Small-2505 \
    --enable-prefix-caching
```
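
Both SGLang and vLLM now expose an OpenAI-compatible API on port `8000`, protected by the API key passed at launch. A quick sketch to confirm the endpoint is reachable, assuming the examples above (`mykey` as the key and `Devstral-Small-2505` as the served model name):

```bash
# The Authorization header must carry the same key passed via --api-key
curl http://localhost:8000/v1/models -H "Authorization: Bearer mykey"
```
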

### Run OpenHands (Alternative Backends)

#### Using Docker

Run OpenHands using [the official docker run command](../installation).

#### Using Development Mode

Use the instructions in [Development.md](https://github.com/All-Hands-AI/OpenHands/blob/main/Development.md) to build OpenHands.
Ensure `config.toml` exists by running `make setup-config` which will create one for you. In the `config.toml`, enter the following:

```
[core]
workspace_base="/path/to/your/workspace"

[llm]
model="openhands-lm-32b-v0.1"
ollama_base_url="http://localhost:8000"
```
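
If you are instead serving Devstral with vLLM or SGLang as in the examples above, the same file can point at that endpoint. This is a sketch that assumes the `[llm]` section accepts the `model`, `base_url`, and `api_key` keys from the config template and that the server is listening on port 8000:

```
[core]
workspace_base="/path/to/your/workspace"

[llm]
# The "openai/" prefix routes requests to an OpenAI-compatible server
model="openai/Devstral-Small-2505"
base_url="http://localhost:8000/v1"
api_key="mykey"
```
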
Start OpenHands using `make run`.

### Configure OpenHands (Alternative Backends)

Once OpenHands is running, open the Settings page in the UI and go to the `LLM` tab.

<Note>
**API Key for Local LLMs**: When using local LLM servers (including Ollama, LM Studio, vLLM, etc.), you can enter any value as the API key if your server doesn't require authentication. The OpenHands UI requires an API key to be entered, but for local servers without authentication, you can use any placeholder value like `local-key`, `test123`, or `dummy`.
</Note>

1. Click **"see advanced settings"** to access the full configuration panel.
2. Enable the **Advanced** toggle at the top of the page.
3. Set the following parameters, if you followed the examples above:
   - **Custom Model**: `openai/<served-model-name>`
     e.g. `openai/devstral` if you're using Ollama, or `openai/Devstral-Small-2505` for SGLang or vLLM.
   - **Base URL**: `http://host.docker.internal:<port>/v1`
     Use port `11434` for Ollama, or `8000` for SGLang and vLLM.
   - **API Key**:
     - For **Ollama**: any placeholder value (e.g. `dummy`, `local-llm`)
     - For **SGLang** or **vLLM**: use the same key provided when starting the server (e.g. `mykey`)