
vLLM streaming

vLLM's OpenAI-compatible server can stream responses: set "stream": true in the request body. Here's an example using curl against the chat completions endpoint:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-2-7b-chat-hf",
    "messages": [{"role": "user", "content": "Write a short poem about robots"}],
    "stream": true
}'

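The response comes back as server-sent events. Each "data:" line carries a JSON chunk whose choices[0].delta.content field holds the newly generated text (typically a single token's worth), and the stream ends with "data: [DONE]". Because text is delivered as soon as it is generated rather than after the whole completion finishes, streaming suits chat UIs and other latency-sensitive applications.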
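The same stream can also be consumed from a program. Below is a minimal sketch using only the Python standard library. It assumes the server from the curl example is running at http://localhost:8000 and emits OpenAI-style events ("data: {...}" lines, terminated by "data: [DONE]"). The helper names iter_deltas and stream_chat are hypothetical, not part of vLLM; in practice an OpenAI-compatible client library pointed at the server would likely be the more convenient choice.

```python
import json
import urllib.request
from typing import Iterable, Iterator

# Assumption: same server URL and model name as in the curl example above.
URL = "http://localhost:8000/v1/chat/completions"
MODEL = "meta-llama/Llama-2-7b-chat-hf"


def iter_deltas(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield the text fragments carried by an SSE chat-completion stream.

    Hypothetical helper: parses raw byte lines, so it can be tested offline.
    """
    for raw in lines:
        line = raw.decode("utf-8").strip()
        # Only "data:" lines carry events; blank separator lines are skipped.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # sentinel marking the end of the stream
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:  # role-only and final chunks may carry no text
                yield text


def stream_chat(prompt: str) -> Iterator[str]:
    """Send a streaming chat request and yield text as it arrives (hypothetical helper)."""
    body = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }).encode("utf-8")
    req = urllib.request.Request(
        URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        # Iterating the HTTP response yields the body one line at a time.
        yield from iter_deltas(resp)
```

With the server running, a loop such as for piece in stream_chat("Write a short poem about robots"): print(piece, end="", flush=True) prints the poem incrementally. Keeping the SSE parsing in its own function separates it from the network call, so the parsing logic can be checked against recorded stream lines without a live server.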