vLLM can be run in standalone (offline) mode, meaning it is not integrated with any serving stack: you load the model and run inference directly from a Python process.
```python
from vllm import LLM, SamplingParams

# Load the model
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.1")

# Define sampling parameters
params = SamplingParams(temperature=0.8, top_p=0.9, max_tokens=128)

# Run inference on a batch of prompts
prompts = [
    "Write a haiku about the moon.",
    "What are the key differences between Python and Java?",
]
outputs = llm.generate(prompts, sampling_params=params)

for output in outputs:
    print(f"Prompt: {output.prompt}")
    print(f"Response: {output.outputs[0].text}\n")
```
This standalone mode is useful for batch inference, local testing and development, or embedding LLM inference directly in custom applications.