Number Representations & States

"how numbers are stored and used in computers"

Continuous Batching in vLLM

You can send many requests to the server at the same time. Here's an example of a single, non-streaming request using curl:

    curl -X POST http://localhost:8000/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "meta-llama/Llama-2-7b-chat-hf",
        "messages": [{"role": "user", "content": "Summarize the plot of Dune."}],
        "stream": false
      }'

You can send multiple such requests in parallel, and vLLM will dynamically batch them together under the hood.
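
To make the parallel case concrete, here is a minimal Python sketch that issues several of these requests concurrently from a thread pool. It assumes the same server URL and model name as the curl example, and that the server returns the usual OpenAI-style chat completion body. The helper names (`build_payload`, `post_chat`, `run_parallel`) are made up for illustration and are not part of vLLM.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: a vLLM OpenAI-compatible server is listening here,
# exactly as in the curl example.
URL = "http://localhost:8000/v1/chat/completions"
MODEL = "meta-llama/Llama-2-7b-chat-hf"


def build_payload(prompt: str) -> dict:
    """Build the same JSON body as the curl example for one prompt."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }


def post_chat(prompt: str) -> str:
    """Send one chat completion request and return the reply text."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    # Assumes the OpenAI-style response shape: choices[0].message.content
    return reply["choices"][0]["message"]["content"]


def run_parallel(prompts, send=post_chat, max_workers=8):
    """Issue all prompts concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, prompts))
```

With the server running, calling `run_parallel(["Summarize the plot of Dune.", "Summarize the plot of Hamlet."])` should return one reply per prompt. The client only has to keep several requests in flight at once; any batching happens on the server side, so no client-side grouping is needed. The `send` parameter is there so the fan-out logic can be exercised without a live server.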