# Online Serving

Online serving examples demonstrate how to use vLLM in an online setting, where a running server is queried for predictions in real time.
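
As a quick orientation before diving into the examples below, here is a minimal client sketch. It assumes a vLLM OpenAI-compatible server has already been started (e.g. with `vllm serve <model>`) and is listening on the default port 8000; the model name is a placeholder and should match whatever model the server was launched with.

```python
from openai import OpenAI

# Point the official OpenAI client at the local vLLM server.
# The API key can be any string unless the server was started with --api-key.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",
)

# Send a chat completion request, served in real time by vLLM.
response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct",  # placeholder; use your served model
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```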

:::{toctree}
:caption: Examples
:maxdepth: 1
api_client
chart-helm
cohere_rerank_client
disaggregated_prefill
gradio_openai_chatbot_webserver
gradio_webserver
jinaai_rerank_client
multi-node-serving
openai_chat_completion_client
openai_chat_completion_client_for_multimodal
openai_chat_completion_client_with_tools
openai_chat_completion_structured_outputs
openai_chat_completion_structured_outputs_with_reasoning
openai_chat_completion_with_reasoning
openai_chat_completion_with_reasoning_streaming
openai_chat_embedding_client_for_multimodal
openai_completion_client
openai_cross_encoder_score
openai_embedding_client
openai_pooling_client
openai_transcription_client
opentelemetry
prometheus_grafana
run_cluster
sagemaker-entrypoint
:::
