Sentence Transformer
Configure PipesHub Workplace AI to use Sentence Transformer embedding models
Sentence Transformer Embeddings Configuration
The Sentence Transformer embeddings configuration screen in PipesHub where you’ll enter your Embedding Model
PipesHub allows you to integrate with state-of-the-art Sentence Transformer embedding models to enable semantic search, document retrieval, and other AI features in your workspace.
About Sentence Transformers
Sentence Transformers is a Python framework for state-of-the-art sentence, text, and image embeddings. These models are designed to map sentences and paragraphs to a dense vector space where semantically similar texts are close to each other, making them ideal for:
- Semantic search
- Document retrieval
- Text clustering
- Text classification
- Question answering
- Multilingual applications
Required Fields
Embedding Model *
The Embedding Model field is the only required parameter for Sentence Transformer configuration. It defines which model you want to use with PipesHub.
Popular Sentence Transformer models include:
- `all-MiniLM-L6-v2` - Lightweight general-purpose model (384 dimensions)
- `all-mpnet-base-v2` - High-performance general model (768 dimensions)
- `multi-qa-mpnet-base-dot-v1` - Optimized for question answering
- `paraphrase-multilingual-mpnet-base-v2` - Supports 50+ languages
- `all-distilroberta-v1` - Balanced performance and efficiency
How to choose a model:
- For general purpose use, select `all-MiniLM-L6-v2` or `all-mpnet-base-v2`
- For multilingual applications, select `paraphrase-multilingual-mpnet-base-v2`
- For specific tasks like question answering, select domain-specific models
- Check the Sentence Transformers documentation for the most up-to-date options
Configuration Steps
As shown in the image above:
- Select “Sentence Transformer” as your Provider from the dropdown
- Specify your desired Embedding Model in the designated field (marked with *)
- Click “Continue” to proceed with setup
The system configuration interface clearly indicates which fields are required with an asterisk (*). The Embedding Model field is the only required field to successfully configure Sentence Transformer integration.
Model Specifications
| Model | Dimensions | Performance | Size | Languages | Best For |
|---|---|---|---|---|---|
| all-MiniLM-L6-v2 | 384 | High | 80MB | English | General purpose, efficient |
| all-mpnet-base-v2 | 768 | Very High | 420MB | English | High accuracy |
| multi-qa-mpnet-base-dot-v1 | 768 | Very High | 420MB | English | Question answering |
| paraphrase-multilingual-mpnet-base-v2 | 768 | High | 420MB | 50+ languages | Multilingual applications |
| all-distilroberta-v1 | 768 | High | 290MB | English | Balanced efficiency |
Advanced Usage
Sentence Transformer models offer several advantages over other embedding models:
- Self-contained - No external API required
- Cost-effective - No usage costs once deployed
- Privacy - Data never leaves your infrastructure
- Customizable - Models can be fine-tuned for specific domains
- Multilingual - Support for many languages with specialized models
Technical Details
The Sentence Transformer architecture consists of:
- A transformer network (like BERT, RoBERTa, or MPNet) that generates embeddings for each token
- A pooling layer that combines token embeddings into a fixed-size sentence embedding
- Optional normalization to ensure consistent vector magnitudes
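The pooling and normalization steps above can be illustrated in plain Python, using toy 3-dimensional token embeddings rather than real model output (production models use 384-768 dimensions):

```python
import math

# Toy token embeddings: one small vector per token, for illustration only.
token_embeddings = [
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
    [2.0, 2.0, 2.0],
]

# Mean pooling: average the token vectors into one fixed-size sentence vector.
dim = len(token_embeddings[0])
pooled = [
    sum(tok[i] for tok in token_embeddings) / len(token_embeddings)
    for i in range(dim)
]

# L2 normalization: scale to unit length so a dot product equals cosine similarity.
norm = math.sqrt(sum(x * x for x in pooled))
sentence_embedding = [x / norm for x in pooled]
```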
The models map text to a dense vector space where semantic similarity corresponds to vector similarity, typically measured using cosine similarity.
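Cosine similarity itself is straightforward to compute; a stdlib-only sketch with toy vectors:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors: the first pair point in nearly the same direction.
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.5]))  # close to 1.0
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # orthogonal: 0.0
```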
Usage Considerations
- Larger models generally provide better performance but require more memory
- Models with higher dimensions may provide more nuanced embeddings but require more storage
- Consider your specific use case when selecting a model (general purpose vs. domain-specific)
- Most models have a maximum sequence length of 128-512 tokens; longer inputs are truncated
- First-time use downloads the model weights, so initial loading takes longer than subsequent runs
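Because of the sequence-length limit above, long documents are usually split into chunks before embedding. A rough word-based sketch (word counts only approximate token counts, which depend on the model's tokenizer, so leave headroom below the model's maximum):

```python
def chunk_text(text: str, max_words: int = 200, overlap: int = 20) -> list[str]:
    """Split text into overlapping word-based chunks for embedding.

    Overlap preserves context across chunk boundaries; word counts are a
    rough proxy for token counts, not an exact limit.
    """
    words = text.split()
    if len(words) <= max_words:
        return [" ".join(words)]
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks
```

Each chunk is then embedded separately, and retrieval operates on chunk-level vectors.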
Troubleshooting
- If you encounter errors, verify your model name is spelled correctly
- For memory issues, consider using a smaller model like `all-MiniLM-L6-v2`
- For slow performance, ensure you have adequate computational resources
- For multilingual applications, ensure you’re using a model that supports your target languages
For additional support, refer to the Sentence Transformers documentation or contact PipesHub support.