Overview
Amazon S3 (Simple Storage Service) is a scalable object storage service designed to store and retrieve any amount of data from anywhere on the web. It provides industry-leading scalability, data availability, security, and performance. The S3 connector enables you to sync files and folders from your S3 buckets into PipesHub, making your cloud storage content searchable through AI-powered search and accessible across your organization.
S3 Data Structure
The connector understands S3’s hierarchical structure: Buckets → Folders → Files
| Entity | Description |
|---|---|
| Buckets | Top-level containers for organizing objects (similar to root directories) |
| Folders | Logical groupings of objects using prefixes (e.g., documents/reports/) |
| Files | Individual objects stored in buckets with unique keys (paths) |
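S3 stores every object under a flat key, so the folder level in the table above is inferred from the / separators in each key. The minimal sketch below shows how the implied folders can be derived from a list of keys. The function name is hypothetical, and this is not PipesHub's code.

```python
def folders_from_keys(keys: list[str]) -> set[str]:
    """Return every folder prefix implied by a flat list of S3 object keys.

    S3 keeps keys such as "documents/reports/q1.pdf" in a flat namespace;
    folders exist only as the "/"-separated prefixes of those keys.
    """
    folders: set[str] = set()
    for key in keys:
        parts = key.split("/")
        # Each leading run of segments (excluding the final file name) is a folder.
        for depth in range(1, len(parts)):
            folders.add("/".join(parts[:depth]) + "/")
    return folders


if __name__ == "__main__":
    keys = ["documents/reports/q1.pdf", "documents/readme.txt", "logo.png"]
    print(sorted(folders_from_keys(keys)))  # ['documents/', 'documents/reports/']
```

A zero-byte "folder marker" key such as `documents/reports/` yields the same prefixes, so empty folders created in the S3 console are also represented.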
What Gets Synced
The connector indexes the following content for AI-powered search:
- Files: All file types stored in S3 buckets (documents, images, videos, archives, etc.)
- Folders: Directory structure and organization
- Metadata: File names, sizes, modification dates, MIME types
- Permissions: Access control based on bucket and object permissions
Configuration Guide
Setup
Setup Overview
The S3 connector provides access to your AWS S3 buckets through Access Key authentication. It syncs files, folders, and metadata, enabling comprehensive search and access across your S3 content.
Authentication
The S3 connector uses an AWS Access Key ID and Secret Access Key for authentication. This method allows secure programmatic access to your S3 buckets without requiring OAuth setup.
Access Key authentication is simpler to set up than OAuth but requires you to manage AWS IAM credentials. Follow AWS security best practices when creating and storing access keys.
How to configure and enable the S3 Connector
Step 1: Create an IAM User in AWS
- Sign in to AWS Management Console:
- Navigate to console.aws.amazon.com and sign in with your AWS account credentials.
- Access IAM Service:
- In the AWS Management Console, search for “IAM” in the top search bar
- Click on “IAM” from the results to open the Identity and Access Management console

- Navigate to Users:
- In the left sidebar, click “Users”
- Click the “Create user” button (orange button at the top right)

Step 2: Configure User Details
- Enter User Name:
- Enter a descriptive name for the user (e.g., “PipesHub-S3-Connector”)
- Click “Next” button

- Set Permissions:
- Select “Attach policies directly”
- Search for and select “AmazonS3ReadOnlyAccess” policy
- This policy provides read-only access to all S3 buckets
For more granular control, you can create a custom IAM policy that restricts access to specific buckets. See the “Advanced: Custom IAM Policy” section below for details.

- Review and Create:
- Review the user configuration
- Click “Create user” button at the bottom
Step 3: Create Access Keys
- Access Security Credentials:
- After creating the user, you’ll be redirected to the user details page
- Click on the “Security credentials” tab
- Scroll down to the “Access keys” section

- Create Access Key:
- Click “Create access key” button
- Select “Application running outside AWS” as the use case
- Check the confirmation box
- Click “Next” button

- Download or Copy Credentials:
- Access Key ID: Copy this value immediately
- Secret Access Key: Click “Show” button and copy this value immediately

- Download Credentials (Optional):
- Click “Download .csv file” button to save credentials to a secure location
- Click “Done” button to complete the process
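Before you paste the keys into PipesHub, you can optionally confirm that they work. The sketch below is not part of PipesHub. It assumes the boto3 SDK is installed and that the keys are exported as the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables, which boto3 reads automatically. The helper name `check_s3_access` is hypothetical. It takes the client as a parameter so it can be exercised without AWS.

```python
from __future__ import annotations


def check_s3_access(s3) -> dict[str, str]:
    """Report, per bucket, the detected region or the AWS error code.

    `s3` is any object exposing boto3's S3 client methods `list_buckets`
    and `get_bucket_location`.
    """
    report: dict[str, str] = {}
    # Needs s3:ListAllMyBuckets; invalid keys make this call raise.
    for bucket in s3.list_buckets().get("Buckets", []):
        name = bucket["Name"]
        try:
            # Needs s3:GetBucketLocation; a null constraint means us-east-1.
            location = s3.get_bucket_location(Bucket=name).get("LocationConstraint")
            report[name] = location or "us-east-1"
        except Exception as exc:
            # botocore's ClientError exposes the code under exc.response["Error"]["Code"].
            error = getattr(exc, "response", {}).get("Error", {})
            report[name] = error.get("Code", type(exc).__name__)
    return report


if __name__ == "__main__":
    import boto3  # pip install boto3

    # boto3 reads AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the environment.
    client = boto3.client("s3")
    for bucket_name, result in check_s3_access(client).items():
        print(f"{bucket_name}: {result}")
```

If a bucket reports `AccessDenied` instead of a region, the policy is usually missing `s3:GetBucketLocation`. If `list_buckets` itself raises an exception, the keys are usually wrong or lack `s3:ListAllMyBuckets`.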
Step 4: Open S3 in PipesHub
- Open the PipesHub app.
- In the left sidebar, click Connectors (under Workspace).
- Find the S3 tile.
- Click + Setup on the tile.
Step 5: Fill in your S3 credentials

- Instance name: type any name you like (for example, S3). This is just a label so you can recognise it later.
- Access Key: paste the Access Key ID you copied from AWS in Step 3.
- Secret Key: paste the Secret Access Key you copied from AWS in Step 3.
- Click Next → (bottom right).
The two links at the top (S3 Access Key Setup and Pipeshub Documentation) are just help links. You can ignore them if you’ve already followed Steps 1–3.
Step 6: Choose how often S3 should sync
You’re now on the Configure Records tab. The top part is called Sync settings.
- Sync Strategy (pick one):
- Scheduled: PipesHub checks S3 for new or changed files automatically (recommended).
- Manual: nothing is pulled until you click Sync yourself.
- Sync Interval: only shown if you chose Scheduled. Pick how often to check S3 (for example, 1 Hour).
Step 7: Pick what to sync (filters)
Still on the same tab, scroll down to Indexing & sync filters. Filters let you limit what PipesHub pulls from S3. If you don’t add any filters, everything will be synced.
- Enable Manual Indexing: leave this off. (Turning it on stops PipesHub from automatically indexing new files for search.)
- Click + Add filter to add a filter. You’ll see four options:
- Bucket Names: pick which S3 buckets to include or exclude. PipesHub loads your bucket list automatically, so just select from the dropdown.
- File Extensions: include or exclude specific file types. Type them comma-separated, e.g. pdf, docx, txt.
- Modified Date: only sync files changed after, before, or between the dates you choose.
- Created Date: same idea, but based on when the file was first uploaded to S3.
- Click Save Configuration (bottom right) when you’re done.
You can change filters any time later. New filters take effect on the next sync.
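To reason about which objects a given set of filters will keep, here is a minimal sketch of the Step 7 filter semantics. It assumes two things: the filters combine with AND, and an unset filter keeps everything. The class and function names are hypothetical and do not reflect PipesHub's implementation. Created Date is left out because S3 object listings report only `LastModified`.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import PurePosixPath


def parse_extensions(text: str) -> set[str]:
    """Turn user input like "pdf, DOCX, .txt" into {"pdf", "docx", "txt"}."""
    return {part.strip().lower().lstrip(".") for part in text.split(",") if part.strip()}


@dataclass
class SyncFilters:
    """Hypothetical model of the Step 7 filters; None means "no filter"."""

    include_buckets: set[str] | None = None
    include_extensions: set[str] | None = None
    exclude_extensions: set[str] | None = None
    modified_after: datetime | None = None
    modified_before: datetime | None = None


def should_sync(bucket: str, key: str, last_modified: datetime, f: SyncFilters) -> bool:
    """Return True if an object passes every configured filter (filters are ANDed)."""
    if f.include_buckets is not None and bucket not in f.include_buckets:
        return False
    # PurePosixPath("reports/q1.PDF").suffix is ".PDF"; folder-like keys have no suffix.
    ext = PurePosixPath(key).suffix.lower().lstrip(".")
    if f.include_extensions is not None and ext not in f.include_extensions:
        return False
    if f.exclude_extensions is not None and ext in f.exclude_extensions:
        return False
    # boto3 returns LastModified as a timezone-aware datetime, so compare aware values.
    if f.modified_after is not None and last_modified < f.modified_after:
        return False
    if f.modified_before is not None and last_modified > f.modified_before:
        return False
    return True


if __name__ == "__main__":
    filters = SyncFilters(
        include_buckets={"company-docs"},
        include_extensions=parse_extensions("pdf, docx, txt"),
        modified_after=datetime(2024, 1, 1, tzinfo=timezone.utc),
    )
    print(should_sync("company-docs", "reports/q1.pdf",
                      datetime(2024, 6, 1, tzinfo=timezone.utc), filters))  # True
```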
Step 8: Turn on sync
After saving, the S3 tile now shows a card for your new instance.
- Find the Sync Enabled row at the bottom of the card.
- Click the toggle so it turns on.
Want to connect a second AWS account? Click + Add Another Instance in the top-right and repeat Steps 5–8 with the new account’s keys.
Step 9: Check that sync is working
Click the instance card to open the Overview panel. This shows you live progress.
- Records Status (the main scoreboard):
- Total: how many files PipesHub knows about.
- Failed: files that couldn’t be fetched (usually a permission problem in AWS).
- Unsupported: file types PipesHub can’t read (e.g. encrypted or unknown binary).
- Processing: being indexed right now.
- Not Started: waiting in the queue.
- Records by Type: shows how many FILE records have been synced.
- Sync (top right): click it to pull the latest changes right now.
- Full sync: click this to re-scan everything from scratch (slower; use only if something looks wrong).
- Manage Configuration: opens the setup panel again if you need to change keys or filters.
If the Failed count stays at 0, everything’s working.
Advanced: Custom IAM Policy
For enhanced security, you can create a custom IAM policy that restricts access to specific buckets or prefixes.
Example Custom Policy
Here’s a basic example policy for a single bucket:
Enhanced Custom Policy (Multiple Buckets)
For production use with multiple buckets and additional permissions, use this comprehensive policy:
Replace the bucket names in the Resource array (your-bucket-name-1, your-bucket-name-2, your-bucket-name-3) with your actual bucket names. Note that arn:aws:s3:::*/* matches objects in every bucket, not only the listed ones. To limit object access to the listed buckets, use one object ARN per bucket instead (for example, arn:aws:s3:::your-bucket-name-1/*).
To create and attach the policy:
- In IAM, go to “Policies” → “Create policy”
- Use the JSON editor to paste the policy above (replace your-bucket-name with your actual bucket name)
- Name the policy (e.g., “PipesHub-S3-ReadOnly-SpecificBuckets”)
- Attach the policy to your IAM user instead of the managed policy
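As an alternative to hand-editing JSON, the sketch below generates a least-privilege policy. It grants only the four actions named in the Troubleshooting section (`s3:ListAllMyBuckets`, `s3:ListBucket`, `s3:GetBucketLocation`, `s3:GetObject`), scopes object access per bucket, and adds an optional `kms:Decrypt` statement for SSE-KMS buckets. The function name is hypothetical. Review the output before you attach it to your IAM user.

```python
from __future__ import annotations

import json


def build_readonly_policy(buckets: list[str], kms_key_arn: str | None = None) -> dict:
    """Build a read-only IAM policy limited to the given bucket names."""
    if not buckets:
        raise ValueError("at least one bucket name is required")
    statements = [
        {   # Bucket discovery; this action does not support resource-level scoping.
            "Sid": "ListAllBuckets",
            "Effect": "Allow",
            "Action": "s3:ListAllMyBuckets",
            "Resource": "*",
        },
        {   # Bucket-level actions apply to the bucket ARN.
            "Sid": "ListAndLocateBuckets",
            "Effect": "Allow",
            "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
            "Resource": [f"arn:aws:s3:::{name}" for name in buckets],
        },
        {   # Object-level reads apply to "<bucket ARN>/*".
            "Sid": "ReadObjects",
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": [f"arn:aws:s3:::{name}/*" for name in buckets],
        },
    ]
    if kms_key_arn:
        # Only needed when objects are encrypted with SSE-KMS.
        statements.append({
            "Sid": "DecryptSseKmsObjects",
            "Effect": "Allow",
            "Action": "kms:Decrypt",
            "Resource": kms_key_arn,
        })
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    policy = build_readonly_policy(["your-bucket-name-1", "your-bucket-name-2"])
    print(json.dumps(policy, indent=2))  # paste the output into the IAM JSON editor
```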
Supported Features
The S3 connector syncs the following data from your AWS S3 buckets:
- Buckets: All accessible buckets in your AWS account
- Files: All file types with their metadata (name, size, modification date, MIME type)
- Folders: Directory structure and organization using S3 prefixes
- Permissions: Access control based on bucket and object-level permissions
Data Sync Behavior
Initial Sync
The first sync performs a complete scan of your configured S3 buckets:
- Bucket Discovery: Lists all accessible buckets in your AWS account (or uses configured bucket filters)
- Region Detection: Automatically detects and caches the region for each bucket
- Full Object Listing: Fetches all objects from selected buckets using pagination
- Metadata Extraction: Extracts file names, sizes, modification dates, MIME types, and paths
- Folder Structure: Creates logical folder structure from S3 object prefixes
- Indexing: Indexes file content for AI-powered search (based on indexing filter settings)
- Permission Assignment: Assigns permissions based on connector scope (Personal or Team)
| Bucket Size | Objects | Estimated Time |
|---|---|---|
| Small | < 1,000 | 5-15 minutes |
| Medium | 1,000 - 10,000 | 15-45 minutes |
| Large | 10,000 - 100,000 | 1-3 hours |
| Very Large | 100,000 - 1,000,000 | 3-8 hours |
| Enterprise | 1,000,000+ | 8+ hours |
Use bucket and file extension filters to significantly reduce initial sync time. For example, syncing only PDF files from a specific bucket can reduce sync time by 80-90%.
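The region detection and paginated listing steps can be sketched as follows. The sketch assumes boto3 S3 client semantics: `get_bucket_location` returns a null `LocationConstraint` for us-east-1, and `list_objects_v2` returns at most 1,000 keys per page, with a `NextContinuationToken` while `IsTruncated` is true. The helper names are hypothetical, and PipesHub's connector may work differently. boto3's `get_paginator("list_objects_v2")` runs the same loop for you.

```python
from __future__ import annotations

from typing import Iterator


def bucket_region(s3, bucket: str, cache: dict[str, str]) -> str:
    """Detect a bucket's region once and cache it (needs s3:GetBucketLocation)."""
    if bucket not in cache:
        constraint = s3.get_bucket_location(Bucket=bucket).get("LocationConstraint")
        if constraint is None:
            region = "us-east-1"      # us-east-1 buckets report no constraint
        elif constraint == "EU":
            region = "eu-west-1"      # legacy value used by some older buckets
        else:
            region = constraint
        cache[bucket] = region
    return cache[bucket]


def list_all_objects(s3, bucket: str) -> Iterator[dict]:
    """Yield every object in a bucket, following ListObjectsV2 continuation tokens."""
    kwargs = {"Bucket": bucket}
    while True:
        page = s3.list_objects_v2(**kwargs)
        yield from page.get("Contents", [])
        if not page.get("IsTruncated"):
            return
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


if __name__ == "__main__":
    import boto3  # credentials come from the standard AWS environment/config

    cache = {}
    discovery = boto3.client("s3")
    for b in discovery.list_buckets()["Buckets"]:
        region = bucket_region(discovery, b["Name"], cache)
        regional = boto3.client("s3", region_name=region)
        count = sum(1 for _ in list_all_objects(regional, b["Name"]))
        print(f"{b['Name']} ({region}): {count} objects")
```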
Incremental Sync
After the initial sync, subsequent syncs are much faster:
- Timestamp-Based Detection: Uses LastModified timestamps to identify changed objects
- Sync Point Tracking: Stores the last sync timestamp per bucket in sync points
- Change Detection: Only fetches objects modified since the last sync timestamp
- Pagination Resume: Uses continuation tokens to resume interrupted syncs
- Efficient Updates: Processes only new, modified, or deleted objects
- API Optimization: Reduces API calls by 90%+ compared to full syncs
Each incremental sync runs as follows:
- The connector reads the sync point for each bucket (stored in our database)
- Retrieves the last_sync_time timestamp
- Queries S3 for objects with LastModified >= last_sync_time
- Processes only the changed objects
- Updates the sync point with the new maximum timestamp
Incremental syncs typically take 5-15 minutes for most buckets, regardless of total bucket size, as they only process changes.
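The sync-point logic described above can be sketched in a few lines. This is an illustration under assumptions, not PipesHub's code. The `sync_points` dict stands in for the database, objects are `list_objects_v2` entries with `Key` and `LastModified`, and deletions are not handled. ListObjectsV2 has no server-side date filter, so the sketch filters the listed objects on the client.

```python
from __future__ import annotations

from datetime import datetime
from typing import Iterable


def incremental_changes(
    bucket: str,
    objects: Iterable[dict],
    sync_points: dict[str, datetime],
) -> list[dict]:
    """Return objects changed since the bucket's sync point, then advance it.

    `objects` are list_objects_v2 entries with "Key" and "LastModified"
    (boto3 returns LastModified as a timezone-aware datetime).
    """
    last_sync_time = sync_points.get(bucket)  # None on the very first sync
    changed = [
        obj
        for obj in objects
        if last_sync_time is None or obj["LastModified"] >= last_sync_time
    ]
    if changed:
        # Store the newest LastModified seen (S3's clock, not the local one).
        sync_points[bucket] = max(obj["LastModified"] for obj in changed)
    return changed
```

With `>=` rather than `>`, an object whose `LastModified` equals the stored sync point is processed again on the next run, so processing must be idempotent. The benefit is that no object sharing that exact timestamp can be skipped.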
Permission Handling
The connector respects AWS IAM permissions and PipesHub access controls:
- IAM-Based Access: Only syncs buckets and objects the IAM user has permission to access
- Bucket Policies: Respects S3 bucket policies and access control lists (ACLs)
- PipesHub Scope:
- Personal Scope: Only the connector creator can access synced content
- Team Scope: All organization users have read access to synced content
- Permission Inheritance: Files inherit permissions from their parent bucket (Record Group)
- Access Control: Users see only content they have permission to view in both AWS and PipesHub
Sync Frequency
Scheduled Sync:
- Default interval: 60 minutes
- Configurable: 15 min, 30 min, 1 hour, 4 hours, 24 hours
- Runs automatically in the background
- Best for: Keeping data up-to-date with minimal manual intervention
Manual Sync:
- Triggered on-demand from the connector settings
- Best for: Testing, troubleshooting, or syncing after bulk changes
- Useful when: You need an immediate sync after uploading many files
For large buckets with infrequent changes, consider longer sync intervals (4-24 hours) to reduce AWS API usage and costs.
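To see why longer intervals reduce API usage, the arithmetic below estimates LIST requests per day. As a simplifying assumption, every scheduled sync lists the whole bucket, since ListObjectsV2 returns at most 1,000 keys per request and cannot filter by date. The estimate ignores the GetObject calls for changed files, and actual connector behavior may differ. The function name is hypothetical.

```python
import math

MAX_KEYS_PER_PAGE = 1000  # ListObjectsV2 returns at most 1,000 keys per request


def list_requests_per_day(object_count: int, interval_hours: float) -> int:
    """Estimate LIST requests per day, assuming each sync re-lists the whole bucket."""
    syncs_per_day = 24 / interval_hours
    # Even an empty bucket needs one LIST request per sync.
    pages_per_sync = max(1, math.ceil(object_count / MAX_KEYS_PER_PAGE))
    return math.ceil(syncs_per_day * pages_per_sync)


if __name__ == "__main__":
    for hours in (0.25, 1, 4, 24):
        print(f"every {hours} h: {list_requests_per_day(1_000_000, hours):,} LIST requests/day")
```

Under these assumptions, a bucket of 1,000,000 objects drops from 24,000 to 1,000 listing requests per day when the interval moves from 1 hour to 24 hours.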
Troubleshooting
Common Issues and Solutions
Common Issues
Invalid credentials error:
Symptoms:
- Error message: “InvalidAccessKeyId” or “SignatureDoesNotMatch”
- Connector fails to initialize
- Authentication fails immediately
Solutions:
- Verify Access Key ID and Secret Access Key are correct
- Ensure you copied the full values without extra spaces or line breaks
- Check that the IAM user is active in AWS IAM console
- Verify the access keys haven’t been deleted or rotated
- Try creating new access keys if the old ones are lost or compromised
Access denied errors:
Symptoms:
- Error: “AccessDenied” when listing or accessing buckets
- Some buckets appear but others don’t
- Sync fails with permission errors
Solutions:
- Verify the IAM user has the following permissions:
  - s3:ListAllMyBuckets: to list all buckets
  - s3:ListBucket: to list objects in each bucket
  - s3:GetBucketLocation: to detect bucket regions
  - s3:GetObject: to download object content
- Check that the bucket policy allows access from your IAM user
- Ensure the bucket exists and is in the same AWS account
- Verify bucket region matches your AWS region configuration
- Review IAM policy JSON to ensure ARNs are correct
No buckets in the filter dropdown:
Symptoms:
- Bucket dropdown is empty
- No buckets available to select
- “No buckets found” message
Solutions:
- Check that the IAM user has s3:ListAllMyBuckets permission
- Verify credentials are correctly entered in PipesHub
- Ensure the IAM user has access to at least one bucket
- Check AWS CloudTrail logs for permission errors
- Verify the IAM user is in the same AWS account as the buckets
- Try refreshing the bucket list in the filter configuration
No files synced:
Symptoms:
- Connector shows “Active” but no files appear
- Sync completes but no records are indexed
- Indexing progress stays at 0%
Solutions:
- Verify the connector status shows “Active” or “Syncing”
- Check that buckets are selected in the filter configuration
- Ensure file extension filters aren’t excluding all files
- Verify date filters aren’t excluding all files (e.g., “Modified After” set to a future date)
- Review sync logs for specific error messages
- Verify the IAM user has s3:GetObject permission
- Check that objects exist in the selected buckets
- Ensure indexing filters are enabled (Index Files, Index Folders)
Slow initial sync:
Symptoms:
- Initial sync runs for hours without completion
- Progress bar moves very slowly
- High AWS API usage
Solutions:
- Use Bucket Filters: Sync only necessary buckets instead of all buckets
- Use File Extension Filters: Filter to specific file types (e.g., pdf, docx, txt)
- Use Date Filters: Sync only recently modified files (e.g., last 6 months)
- Increase Sync Interval: For large buckets, use 4-24 hour intervals
- Check Bucket Size: Very large buckets (millions of objects) will take longer
- Monitor Progress: Check indexing progress to ensure sync is actually running
A bucket with 1 million objects can take 6-12 hours for initial sync. Using filters to sync only 10% of objects reduces this to 1-2 hours.
Region detection errors:
Symptoms:
- Warnings about region detection
- Sync fails with region mismatch errors
- Objects not found errors
Solutions:
- The connector automatically detects bucket regions, and this usually works
- If region detection fails, ensure the IAM user has s3:GetBucketLocation permission
- Some buckets may require explicit region configuration in AWS
- Check that buckets are in supported AWS regions
- Verify the IAM user has access to the region where buckets are located
Credentials stopped working:
Symptoms:
- Sync was working but suddenly stopped
- Authentication errors after working previously
- “Credentials invalid” errors
Solutions:
- Access keys don’t expire automatically, but they can be rotated, deactivated, or deleted
- If access keys were rotated in AWS, update credentials in PipesHub immediately
- Check if the IAM user or access keys were deleted in AWS IAM console
- Re-authenticate by updating credentials in connector settings
- Verify the IAM user still exists and is active
- Check AWS CloudTrail for any credential-related events
Files not searchable:
Symptoms:
- Files synced but not searchable
- Search returns no results for known files
- Indexing shows 0% complete
Solutions:
- Verify indexing filters are enabled (Index Files should be ON)
- Check that file types are supported for indexing (PDF, DOCX, TXT, etc.)
- Images and binary files cannot be indexed for text search
- Wait for indexing to complete, since large files take time to process
- Check indexing progress in connector status
- Verify file content is not encrypted or password-protected
- Ensure files have readable text content (not just images)
Missing files:
Symptoms:
- Some files appear but others don’t
- Files in certain folders are missing
- Inconsistent sync results
Solutions:
- Check file extension filters, since files may be excluded by filter settings
- Verify date filters aren’t excluding files
- Check bucket permissions, since the IAM user may not have access to all objects
- Review sync logs for specific errors about missing files
- Verify files actually exist in S3 (check S3 console)
- Ensure files aren’t in excluded buckets (if bucket filter is active)
- Check if files are in Glacier storage class (not directly accessible)
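To check whether missing files are in archive storage, you can scan a bucket listing for storage classes that need a restore. This is a minimal sketch that assumes `list_objects_v2` entries, which include `StorageClass`. The helper name and placeholder bucket are hypothetical. Glacier Instant Retrieval (`GLACIER_IR`) objects are readable without a restore, so they are not flagged.

```python
from __future__ import annotations

from typing import Iterable

# Archive classes whose objects need a restore before GetObject succeeds.
# GLACIER_IR (Glacier Instant Retrieval) is readable directly, so it is not listed.
# Intelligent-Tiering archive tiers also need a restore, but listings report them
# as INTELLIGENT_TIERING, so this check does not catch them.
RESTORE_REQUIRED = {"GLACIER", "DEEP_ARCHIVE"}


def keys_needing_restore(objects: Iterable[dict]) -> list[str]:
    """Return the keys of listed objects stored in an archive storage class."""
    return [obj["Key"] for obj in objects if obj.get("StorageClass") in RESTORE_REQUIRED]


if __name__ == "__main__":
    import boto3

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket="your-bucket-name"):  # placeholder bucket
        for key in keys_needing_restore(page.get("Contents", [])):
            print(key)
```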
AWS S3 Compatibility
This connector is designed for Amazon S3 and uses the AWS S3 API. It supports:
- All AWS S3 Regions: Works with buckets in any AWS region worldwide
- Standard S3 Storage Classes: Standard, Standard-IA, One Zone-IA, Intelligent-Tiering
- S3-Compatible Services: Should work with MinIO and other S3-compatible storage (with proper endpoint configuration)
Limited or unsupported:
- S3 Glacier and Glacier Deep Archive (requires restore process)
- S3 Outposts (may require special configuration)
- Requester Pays buckets (not currently supported)
The connector uses the standard S3 API and should work with S3-compatible services like MinIO, but is primarily tested and optimized for AWS S3.
FAQ
How long does the initial sync take?
The initial sync duration depends on the size of your S3 buckets:
| Bucket Size | Estimated Time |
|---|---|
| Small (< 1,000 objects) | 5-15 minutes |
| Medium (1,000 - 10,000 objects) | 15-45 minutes |
| Large (10,000 - 100,000 objects) | 1-3 hours |
| Very Large (100,000 - 1,000,000 objects) | 3-8 hours |
| Enterprise (1,000,000+ objects) | 8+ hours |
Tip: Use bucket and file extension filters to significantly reduce initial sync time. For example, syncing only PDF files from specific buckets can reduce sync time by 80-90%.
Can I sync multiple AWS accounts?
Yes, but each connector instance connects to a single AWS account. To sync buckets from multiple AWS accounts, you’ll need to:
- Create separate IAM users in each AWS account
- Create separate S3 connector instances in PipesHub
- Configure each connector with credentials from the respective AWS account
What file types can be indexed for search?
The connector can index text-based file types for AI-powered search.
Supported:
- Documents: PDF, DOCX, XLSX, PPTX, ODT, RTF
- Text: TXT, MD, CSV, JSON, XML, HTML
- Code: Python, JavaScript, TypeScript, Java, Go, and many others
Not supported for text indexing:
- Images: JPG, PNG, GIF, SVG, WebP
- Videos: MP4, AVI, MOV, etc.
- Archives: ZIP, RAR, TAR, etc.
- Binary files without extractable text
How does incremental sync work?
After the initial sync, the connector uses timestamp-based incremental sync:
- Stores Sync Point: Saves the last sync timestamp for each bucket
- Queries Changes: On next sync, queries S3 for objects modified since that timestamp
- Processes Only Changes: Downloads and indexes only new or modified objects
- Updates Timestamp: Saves the new maximum timestamp for the next sync
Can I sync S3 Glacier files?
No, the connector does not directly support the S3 Glacier or Glacier Deep Archive storage classes. These storage classes require a restore process before files can be accessed.
If you need to sync Glacier files:
- Restore the files to Standard storage class in S3
- Wait for restoration to complete
- The connector will then sync the restored files
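The restore step can be scripted. This is a minimal sketch that assumes boto3's `restore_object` call and a client passed in by the caller. The helper name, the 7-day default, and the Standard retrieval tier are illustrative choices. A restore creates a temporary readable copy for the requested number of days. To move an object permanently to the Standard storage class, copy it onto itself with `StorageClass` set to `STANDARD` after the restore has finished.

```python
from __future__ import annotations


def request_restores(s3, bucket: str, keys: list[str], days: int = 7,
                     tier: str = "Standard") -> dict[str, str]:
    """Ask S3 to restore archived objects; return {key: "requested" or error code}.

    Tier is "Standard", "Bulk", or "Expedited" (Expedited is not available
    for Deep Archive).
    """
    results: dict[str, str] = {}
    for key in keys:
        try:
            s3.restore_object(
                Bucket=bucket,
                Key=key,
                RestoreRequest={"Days": days, "GlacierJobParameters": {"Tier": tier}},
            )
            results[key] = "requested"
        except Exception as exc:
            # e.g. "RestoreAlreadyInProgress" when an earlier request is still running.
            error = getattr(exc, "response", {}).get("Error", {})
            results[key] = error.get("Code", type(exc).__name__)
    return results


if __name__ == "__main__":
    import boto3

    client = boto3.client("s3")
    print(request_restores(client, "your-bucket-name", ["archive/report.pdf"]))  # placeholders
```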
What happens if I delete a file in S3?
When a file is deleted from S3:
- The deletion is detected during the next incremental sync
- The file record is marked as deleted in PipesHub
- The file is removed from search results
- Historical access to the file may still be available depending on your retention settings
Can I sync encrypted S3 buckets?
Yes, the connector supports S3 buckets with encryption:
- SSE-S3: Server-side encryption with S3-managed keys (fully supported)
- SSE-KMS: Server-side encryption with AWS KMS keys (the IAM user also needs kms:Decrypt permission on the KMS key)
- SSE-C: Server-side encryption with customer-provided keys (not supported)
- Client-side encryption: Files encrypted before upload (not supported)
How do I sync only specific folders in a bucket?
Currently, the connector syncs entire buckets and has no folder (prefix) filter. To narrow what gets synced instead:
- Use Bucket Filters: Create separate buckets for different content types
- Use File Extension Filters: Filter to specific file types
- Use Date Filters: Sync only recently modified files
Useful Links
- AWS IAM Console: console.aws.amazon.com/iam
- AWS S3 Documentation: docs.aws.amazon.com/s3
- IAM Access Keys Guide: docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys
- S3 IAM Policies: docs.aws.amazon.com/AmazonS3/latest/userguide/using-with-s3-actions.html
- S3 Console: s3.console.aws.amazon.com
Ready to Get Started?
Connect your S3 buckets to PipesHub in just a few minutes. Follow the step-by-step guide above to enable organization-wide file search and access across all your S3 content.
