vLLM Configuration

What is vLLM?
vLLM is an open-source library for fast LLM inference and serving. It provides:
- High-throughput serving with PagedAttention
- Continuous batching of incoming requests
- Optimized CUDA kernels for faster inference
- An OpenAI-compatible API server
- Support for various popular open-source models
Prerequisites
Before configuring vLLM in PipesHub, ensure you have:
- A running vLLM server instance
- The endpoint URL where your vLLM server is accessible
- (Optional) An API key, if you’ve configured authentication on your vLLM server
- The model name/path used when starting your vLLM server
Starting a vLLM Server
If you haven’t started a vLLM server yet, the sketch below shows one way to launch one. Once running, the server exposes an OpenAI-compatible API at http://localhost:8000/v1/ (or your server’s IP/domain).
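A minimal sketch, assuming you want to serve Qwen/Qwen3-8B on the default port with an API key (model name, port, and key are placeholders; adjust for your hardware and vLLM version):

```bash
# Install vLLM (most models require a CUDA-capable GPU)
pip install vllm

# Launch an OpenAI-compatible server on port 8000.
# --api-key is optional; omit it to run without authentication.
vllm serve Qwen/Qwen3-8B \
  --port 8000 \
  --api-key your-secret-key
```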
Required Fields
Endpoint URL *
The Endpoint URL is the base API endpoint of your vLLM server.

Format: http://your-server:port/v1/

Examples:
- Local deployment: http://localhost:8000/v1/
- Remote server: http://192.168.1.100:8000/v1/
- Domain-based: https://vllm.yourdomain.com/v1/

Notes:
- The endpoint URL must include the /v1/ suffix
- Use https:// for production deployments with SSL/TLS
- Ensure the server is accessible from where PipesHub is running (see the check below)
- Check firewall rules if connecting to a remote vLLM server
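A quick way to confirm the server is reachable from the machine running PipesHub is to call vLLM’s health endpoint (the URL below is an example):

```bash
# Replace with your server's address; an HTTP 200 response means the server is up.
curl http://localhost:8000/health
```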
API Key *
The API Key field is used to authenticate requests to your vLLM server.

Configuration options:
- If your vLLM server was started with --api-key, enter that key here
- If your vLLM server was started without authentication, you can enter any placeholder value (e.g., no-key or dummy)
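When a key is configured, vLLM expects it as a Bearer token on every request. A minimal check, assuming the example key and URL used above:

```bash
# Succeeds only if the key matches the server's --api-key.
# If the server was started without --api-key, the header is not required.
curl http://localhost:8000/v1/models \
  -H "Authorization: Bearer your-secret-key"
```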
Model Name *
The Model Name must match the model identifier used when starting your vLLM server.

Example: Qwen/Qwen3-8B
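As a quick check that the name is correct, you can send a minimal request with that identifier (endpoint, key, and model below are examples); the /v1/models query shown under Troubleshooting lists the identifiers the server actually exposes:

```bash
# The "model" value must match the server's model identifier exactly.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-secret-key" \
  -d '{"model": "Qwen/Qwen3-8B", "messages": [{"role": "user", "content": "Hello"}]}'
```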
Optional Features
Multimodal
Enable this checkbox if your vLLM server is running a model that supports multimodal input (text + images).

When to enable:
- You’re using a vision-language model (e.g., LLaVA, Qwen-VL)
- The model was specifically trained for multimodal understanding
- You need to process documents with images or visual content
Reasoning
Enable this checkbox if your model has enhanced reasoning capabilities.

When to enable:
- You’re using a reasoning-focused model (e.g., DeepSeek-R1)
- The model is designed for complex problem-solving tasks
- Your use case involves mathematical, logical, or multi-step reasoning
Configuration Steps
As shown in the image above:
- Select "OpenAI API Compatible" as your Provider Type from the dropdown
- Enter your vLLM server’s Endpoint URL (e.g., http://localhost:8000/v1/)
- Enter your API Key (or a placeholder if authentication is disabled)
- Specify the exact Model Name used when starting your vLLM server
- (Optional) Check "Multimodal" if using a vision-language model
- (Optional) Check "Reasoning" if using a reasoning-focused model
- Click "Add Model" to complete the setup
Supported Models
vLLM supports a wide range of open-source models. For the most up-to-date list of supported models, check the vLLM documentation.

Performance Considerations
Optimizing your vLLM deployment (a combined example follows this list):
- GPU Memory: Ensure adequate GPU memory for your model size
- Batch Size: vLLM automatically manages batching for optimal throughput
- Tensor Parallelism: For large models, use multiple GPUs with --tensor-parallel-size
- Quantization: Use quantized models (GPTQ, AWQ) to reduce memory usage
- Context Length: Adjust --max-model-len based on your use case
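A sketch of how these flags might be combined (model names and values are placeholders; the right settings depend on your hardware and vLLM version):

```bash
# Spread a large model across 2 GPUs and cap the context length at 8192 tokens.
vllm serve Qwen/Qwen3-8B \
  --tensor-parallel-size 2 \
  --max-model-len 8192

# Serving an AWQ- or GPTQ-quantized checkpoint reduces GPU memory usage.
vllm serve <your-awq-quantized-model> --quantization awq
```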
Troubleshooting
Connection Issues:
- Verify the endpoint URL is correct and includes /v1/
- Check that the vLLM server is running: curl http://localhost:8000/health
- Ensure network connectivity between PipesHub and the vLLM server
- Check firewall rules and port accessibility
- For remote servers, ensure proper DNS resolution

Authentication Issues:
- Verify the API key matches what was set with --api-key when starting vLLM
- If no authentication was configured, any placeholder value should work
- Check vLLM server logs for authentication failures

Model Name Issues:
- Confirm the model name exactly matches the one used to start the vLLM server
- Query available models: curl http://localhost:8000/v1/models
- Restart the vLLM server if the model was changed

Performance Issues:
- Monitor GPU memory usage and utilization
- Check vLLM server logs for warnings or errors
- Consider using a smaller model or quantization
- Adjust --max-model-len if seeing out-of-memory errors
- Use tensor parallelism for large models

Server Errors:
- Verify CUDA/GPU drivers are properly installed
- Check you have sufficient GPU memory for the model
- Review vLLM server logs for detailed error messages
- Ensure the model is compatible with your vLLM version