Add instruction to use Arctic Inference (#9547)

This commit is contained in:
Ryan H. Tran 2025-07-04 20:34:05 +07:00 committed by GitHub
parent 96008736a4
commit 8bc9207c24

@@ -179,6 +179,23 @@ If you are interested in further improved inference speed, you can also try Snow
of vLLM, [ArcticInference](https://www.snowflake.com/en/engineering-blog/fast-speculative-decoding-vllm-arctic/),
which can achieve up to 2x speedup in some cases.
1. Install the Arctic Inference library, which automatically patches vLLM:
```bash
pip install git+https://github.com/snowflakedb/ArcticInference.git
```
2. Run the launch command with speculative decoding enabled:
```bash
vllm serve mistralai/Devstral-Small-2505 \
--host 0.0.0.0 --port 8000 \
--api-key mykey \
--tensor-parallel-size 2 \
--served-model-name Devstral-Small-2505 \
--speculative-config '{"method": "suffix"}'
```
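Once the server is up, clients talk to it through vLLM's OpenAI-compatible API; speculative decoding is transparent on the client side, so the request shape is unchanged. The sketch below builds a chat-completions request matching the launch command above (host, port, `mykey`, and the served model name are taken from that command; the helper function is illustrative, not part of vLLM):

```python
import json

# Values matching the `vllm serve` command above; adjust for your deployment.
BASE_URL = "http://0.0.0.0:8000/v1"   # --host / --port
API_KEY = "mykey"                     # --api-key
MODEL = "Devstral-Small-2505"         # --served-model-name


def build_chat_request(prompt: str) -> tuple[dict, dict]:
    """Build headers and body for an OpenAI-style chat-completions call.

    Speculative decoding happens server-side, so no extra client
    parameters are needed.
    """
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return headers, body


headers, body = build_chat_request("Summarize speculative decoding in one line.")
print(json.dumps(body, indent=2))
# To actually send it, POST the body to f"{BASE_URL}/chat/completions"
# with urllib.request, or point the `openai` client at BASE_URL.
```

The same payload works with any OpenAI-compatible client library by setting its base URL to the server address and passing the API key from the launch command.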
### Run OpenHands (Alternative Backends)
#### Using Docker