diff --git a/docs/usage/llms/local-llms.mdx b/docs/usage/llms/local-llms.mdx
index 0e75fb581c..a1d14d5c92 100644
--- a/docs/usage/llms/local-llms.mdx
+++ b/docs/usage/llms/local-llms.mdx
@@ -179,6 +179,23 @@ If you are interested in further improved inference speed, you can also try Snow
 of vLLM, [ArcticInference](https://www.snowflake.com/en/engineering-blog/fast-speculative-decoding-vllm-arctic/),
 which can achieve up to 2x speedup in some cases.
 
+1. Install the Arctic Inference library that automatically patches vLLM:
+
+```bash
+pip install git+https://github.com/snowflakedb/ArcticInference.git
+```
+
+2. Run the launch command with speculative decoding enabled:
+
+```bash
+vllm serve mistralai/Devstral-Small-2505 \
+    --host 0.0.0.0 --port 8000 \
+    --api-key mykey \
+    --tensor-parallel-size 2 \
+    --served-model-name Devstral-Small-2505 \
+    --speculative-config '{"method": "suffix"}'
+```
+
 ### Run OpenHands (Alternative Backends)
 
 #### Using Docker