vLLM can be run in standalone (offline) mode, meaning it is not integrated with any serving stack: you load the model and run inference directly from a Python process.
```python
from vllm import LLM, SamplingParams

# Load the model
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.1")

# Define sampling parameters
params = SamplingParams(temperature=0.8, top_p=0.9, max_tokens=128)

# Run inference on a batch of prompts
prompts = [
    "Write a haiku about the moon.",
    "What are the key differences between Python and Java?",
]
outputs = llm.generate(prompts, sampling_params=params)

for output in outputs:
    print(f"Prompt: {output.prompt}")
    print(f"Response: {output.outputs[0].text}\n")
```
This standalone mode is useful for batch inference, local testing and development, or embedding LLM inference directly in custom applications.