
Amazon S3

Cloud object storage service

Overview

Amazon S3 (Simple Storage Service) is a scalable object storage service designed to store and retrieve any amount of data from anywhere on the web. It provides industry-leading scalability, data availability, security, and performance. The S3 connector enables you to sync files and folders from your S3 buckets into PipesHub, making your cloud storage content searchable through AI-powered search and accessible across your organization.

S3 Data Structure

The connector maps S3 content onto a Buckets → Folders → Files hierarchy. S3 itself stores objects in a flat namespace; folders are derived from the / separators in object keys.
Entity | Description
Buckets | Top-level containers for organizing objects (similar to root directories)
Folders | Logical groupings of objects that share a key prefix (e.g., documents/reports/)
Files | Individual objects stored in buckets, each identified by a unique key (path)
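Because S3 keys are flat strings, a folder view has to be derived from them. The sketch below shows one way to do that in Python; folders_from_keys is a hypothetical helper for illustration, not PipesHub’s implementation, and it assumes / is the delimiter.

```python
def folders_from_keys(keys):
    """Return the sorted folder prefixes implied by a list of S3 object keys.

    Hypothetical helper: every "/"-terminated prefix of a key is treated as a
    folder, so "documents/reports/q1.pdf" implies "documents/" and
    "documents/reports/". Console-created folder markers such as
    "documents/archive/" are handled the same way.
    """
    folders = set()
    for key in keys:
        parts = key.split("/")[:-1]  # drop the file name (last segment)
        for depth in range(1, len(parts) + 1):
            folders.add("/".join(parts[:depth]) + "/")
    return sorted(folders)


keys = [
    "documents/reports/q1.pdf",
    "documents/reports/q2.pdf",
    "documents/readme.txt",
    "logo.png",  # top-level object: implies no folder
]
print(folders_from_keys(keys))  # ['documents/', 'documents/reports/']
```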

What Gets Synced

The connector indexes the following content for AI-powered search:
  • Files: All file types stored in S3 buckets (documents, images, videos, archives, etc.)
  • Folders: Directory structure and organization
  • Metadata: File names, sizes, modification dates, MIME types
  • Permissions: Access control based on bucket and object permissions

Configuration Guide

Setup

Setup Overview

The S3 connector provides access to your AWS S3 buckets through Access Key authentication. It syncs files, folders, and metadata, enabling comprehensive search and access across your S3 content.

Authentication

The S3 connector uses AWS Access Key ID and Secret Access Key for authentication. This method allows secure programmatic access to your S3 buckets without requiring OAuth setup.
Access Key authentication is simpler to set up than OAuth but requires you to manage AWS IAM credentials. Ensure you follow AWS security best practices when creating and storing access keys.

How to configure and enable the S3 Connector

Step 1: Create an IAM User in AWS

  1. Sign in to AWS Management Console:
    Navigate to console.aws.amazon.com and sign in with your AWS account credentials.
  2. Access IAM Service:
    • In the AWS Management Console, search for “IAM” in the top search bar
    • Click on “IAM” from the results to open the Identity and Access Management console
[Screenshot: AWS IAM Console]
  3. Navigate to Users:
    • In the left sidebar, click “Users”
    • Click the “Create user” button (orange button at the top right)
[Screenshot: Create IAM User]

Step 2: Configure User Details

  1. Enter User Name:
    • Enter a descriptive name for the user (e.g., “PipesHub-S3-Connector”)
    • Click “Next” button
[Screenshot: Enter User Name]
  2. Set Permissions:
    • Select “Attach policies directly”
    • Search for and select “AmazonS3ReadOnlyAccess” policy
    • This policy provides read-only access to all S3 buckets
For more granular control, you can create a custom IAM policy that restricts access to specific buckets. See the “Advanced: Custom IAM Policy” section below for details.
[Screenshot: Attach S3 Read Only Policy]
  3. Review and Create:
    • Review the user configuration
    • Click “Create user” button at the bottom

Step 3: Create Access Keys

  1. Access Security Credentials:
    • After creating the user, you’ll be redirected to the user details page
    • Click on the “Security credentials” tab
    • Scroll down to the “Access keys” section
[Screenshot: Security Credentials Tab]
  2. Create Access Key:
    • Click “Create access key” button
    • Select “Application running outside AWS” as the use case
    • Check the confirmation box
    • Click “Next” button
[Screenshot: Create Access Key]
  3. Download or Copy Credentials:
    • Access Key ID: Copy this value immediately
    • Secret Access Key: Click “Show” button and copy this value immediately
Important: The Secret Access Key is shown only once. If you don’t save it now, you’ll need to create a new access key pair. Store these credentials securely.
[Screenshot: Access Key Credentials]
  4. Download Credentials (Optional):
    • Click “Download .csv file” button to save credentials to a secure location
    • Click “Done” button to complete the process
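If you prefer to script Steps 1 to 3 instead of clicking through the console, the sketch below uses the AWS SDK for Python (boto3) IAM calls create_user, attach_user_policy and create_access_key. The function name and default user name are illustrative assumptions; the function takes the client as a parameter so it can be exercised without AWS access, and running it for real requires credentials with IAM administrative rights.

```python
# Hypothetical helper mirroring Steps 1-3 with boto3's IAM client.
# Pass it a client such as boto3.client("iam").
READ_ONLY_POLICY_ARN = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"


def provision_connector_user(iam, user_name="PipesHub-S3-Connector"):
    """Create the IAM user, attach read-only S3 access, and return a new key pair."""
    iam.create_user(UserName=user_name)
    iam.attach_user_policy(UserName=user_name, PolicyArn=READ_ONLY_POLICY_ARN)
    access_key = iam.create_access_key(UserName=user_name)["AccessKey"]
    # The secret is only returned by this call, just like the console's one-time display.
    return access_key["AccessKeyId"], access_key["SecretAccessKey"]
```

For example, after `import boto3`, call `key_id, secret = provision_connector_user(boto3.client("iam"))` and store both values securely right away.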

Step 4: Open S3 in PipesHub

  1. Open the PipesHub app.
  2. In the left sidebar, click Connectors (under Workspace).
  3. Find the S3 tile.
  4. Click + Setup on the tile.
A panel called S3 Configuration opens on the right. It has two tabs: Authenticate Instance and Configure Records. You start on Authenticate Instance.

Step 5: Fill in your S3 credentials

[Screenshot: S3 Authenticate Instance tab in PipesHub]
Fill in the fields from top to bottom:
  1. Instance name — type any name you like (for example S3). This is just a label so you can recognise it later.
  2. Access Key — paste the Access Key ID you copied from AWS in Step 3.
  3. Secret Key — paste the Secret Access Key you copied from AWS in Step 3.
  4. Click Next → (bottom right).
Paste the keys exactly as you copied them. Extra spaces or line breaks will make the connection fail.
The two links at the top (S3 Access Key Setup and Pipeshub Documentation) are just help links. You can ignore them if you’ve already followed Steps 1–3.
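Since stray whitespace is a common reason the connection fails, a quick local check before pasting can help. This sketch assumes the typical AWS key formats (20-character uppercase access key IDs, like AWS’s documented example AKIAIOSFODNN7EXAMPLE, and 40-character secret keys). These are heuristics only; check_pasted_keys is a hypothetical helper, and only AWS can confirm the keys actually work.

```python
import re

# Heuristic patterns for the *typical* AWS key formats (an assumption, not a guarantee).
ACCESS_KEY_ID_RE = re.compile(r"[A-Z0-9]{20}")
SECRET_KEY_RE = re.compile(r"[A-Za-z0-9/+=]{40}")


def check_pasted_keys(access_key_id, secret_key):
    """Return a list of likely problems with the pasted values (empty list = looks OK)."""
    problems = []
    for label, value in (("Access Key", access_key_id), ("Secret Key", secret_key)):
        if value != value.strip():
            problems.append(f"{label} has leading/trailing whitespace or a line break")
    if not ACCESS_KEY_ID_RE.fullmatch(access_key_id.strip()):
        problems.append("Access Key should be 20 uppercase letters/digits")
    if not SECRET_KEY_RE.fullmatch(secret_key.strip()):
        problems.append("Secret Key should be 40 characters")
    return problems


print(check_pasted_keys("AKIAIOSFODNN7EXAMPLE\n", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"))
```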

Step 6: Choose how often S3 should sync

You’re now on the Configure Records tab. The top part is called Sync settings.
[Screenshot: S3 Configure Records tab in PipesHub, sync settings and filters]
  1. Sync Strategy — pick one:
    • Scheduled — PipesHub checks S3 for new or changed files automatically. (Recommended.)
    • Manual — nothing is pulled until you click Sync yourself.
  2. Sync Interval — only shown if you chose Scheduled. Pick how often to check S3 (for example 1 Hour).

Step 7: Pick what to sync (filters)

Still on the same tab, scroll down to Indexing & sync filters. Filters let you limit what PipesHub pulls from S3. If you don’t add any filters, everything will be synced.
  1. Enable Manual Indexing — leave this off. (Turning it on stops PipesHub from automatically indexing new files for search.)
  2. Click + Add filter to add a filter. You’ll see four options:
    • Bucket Names — pick which S3 buckets to include or exclude. PipesHub loads your bucket list automatically — just select from the dropdown.
    • File Extensions — include or exclude specific file types. Type them comma-separated, e.g. pdf, docx, txt.
    • Modified Date — only sync files changed after / before / between the dates you choose.
    • Created Date — same idea, but based on when the file was first uploaded to S3.
  3. Click Save Configuration (bottom right) when you’re done.
You can change filters any time later. New filters take effect on the next sync.
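In the sketch below the filters combine with AND logic: an object is synced only if it passes every filter you configured, and an unset filter lets everything through (matching "no filters means everything is synced"). This is an illustrative model under assumptions: the S3Object and SyncFilters names and the AND semantics are not taken from PipesHub’s code, and only the bucket, file-extension and modified-date filters are covered.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class S3Object:
    bucket: str
    key: str
    last_modified: datetime


@dataclass
class SyncFilters:
    # Hypothetical representation of the Step 7 filters; None means "no filter".
    include_buckets: Optional[set] = None
    exclude_buckets: Optional[set] = None
    include_extensions: Optional[set] = None  # lowercase, e.g. {"pdf", "docx", "txt"}
    modified_after: Optional[datetime] = None
    modified_before: Optional[datetime] = None


def should_sync(obj, f):
    """Return True if the object passes every configured filter."""
    if f.include_buckets is not None and obj.bucket not in f.include_buckets:
        return False
    if f.exclude_buckets is not None and obj.bucket in f.exclude_buckets:
        return False
    if f.include_extensions is not None:
        name = obj.key.rsplit("/", 1)[-1]  # file name only, so dotted folders don't count
        ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
        if ext not in f.include_extensions:
            return False
    if f.modified_after is not None and obj.last_modified < f.modified_after:
        return False
    if f.modified_before is not None and obj.last_modified > f.modified_before:
        return False
    return True


filters = SyncFilters(
    include_buckets={"docs"},
    include_extensions={"pdf", "docx", "txt"},
    modified_after=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
obj = S3Object("docs", "reports/q1.pdf", datetime(2024, 6, 1, tzinfo=timezone.utc))
print(should_sync(obj, filters))  # True
```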

Step 8: Turn on sync

After saving, the S3 tile now shows a card for your new instance.
[Screenshot: S3 instance card with Sync Paused and Sync Enabled toggle]
Right now the card says Sync Paused — nothing is being synced yet.
  1. Find the Sync Enabled row at the bottom of the card.
  2. Click the toggle so it turns on.
That’s it — PipesHub will now start pulling files from S3.
Want to connect a second AWS account? Click + Add Another Instance in the top-right and repeat Steps 5–8 with the new account’s keys.

Step 9: Check that sync is working

Click the instance card to open the Overview panel. This shows you live progress.
[Screenshot: S3 instance Overview panel with Records Status and Records by Type]
What to look at:
  • Records Status — the main scoreboard:
    • Total — how many files PipesHub knows about.
    • Failed — files that couldn’t be fetched (usually a permission problem in AWS).
    • Unsupported — file types PipesHub can’t read (e.g. encrypted or unknown binary).
    • Processing — being indexed right now.
    • Not Started — waiting in the queue.
  • Records by Type — shows how many FILE records have been synced.
  • Sync (top right) — click it to pull the latest changes right now.
  • Full sync — click this to re-scan everything from scratch (slower; use only if something looks wrong).
  • Manage Configuration — opens the setup panel again if you need to change keys or filters.
If Total keeps going up and Failed stays at 0, everything’s working.

Advanced: Custom IAM Policy

For enhanced security, you can create a custom IAM policy that restricts access to specific buckets or prefixes.

Example Custom Policy

Here’s a basic example policy for a single bucket:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListAllMyBuckets",
        "s3:GetBucketLocation"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::your-bucket-name",
        "arn:aws:s3:::your-bucket-name/*"
      ]
    }
  ]
}

Enhanced Custom Policy (Multiple Buckets)

For production use with multiple buckets and additional permissions, use this comprehensive policy:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListBucket",
                "s3:GetBucketLocation",
                "s3:GetObjectVersion",
                "s3:GetObjectAttributes",
                "s3:GetObjectTagging"
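            ],
            "Resource": [
                "arn:aws:s3:::your-bucket-name-1",
                "arn:aws:s3:::your-bucket-name-1/*",
                "arn:aws:s3:::your-bucket-name-2",
                "arn:aws:s3:::your-bucket-name-2/*",
                "arn:aws:s3:::your-bucket-name-3",
                "arn:aws:s3:::your-bucket-name-3/*"
            ]
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "s3:ListAllMyBuckets",
            "Resource": "*"
        }
    ]
}
Replace the bucket names in the Resource array (your-bucket-name-1, your-bucket-name-2, your-bucket-name-3) with your actual bucket names. Each bucket needs two entries: arn:aws:s3:::bucket-name for bucket-level actions (s3:ListBucket, s3:GetBucketLocation) and arn:aws:s3:::bucket-name/* for object-level actions (s3:GetObject, s3:GetObjectVersion, s3:GetObjectAttributes, s3:GetObjectTagging). Avoid a wildcard such as arn:aws:s3:::*/*, which matches objects in every bucket, not only the ones you list.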
Steps to apply custom policy:
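  1. In IAM, go to “Policies” → “Create policy”
  2. Use the JSON editor to paste one of the policies above (replace the placeholder bucket names with your actual bucket names)
  3. Name the policy (e.g., “PipesHub-S3-ReadOnly-SpecificBuckets”)
  4. Attach the policy to your IAM user instead of the AmazonS3ReadOnlyAccess managed policy (if you attached AmazonS3ReadOnlyAccess in Step 2, detach it; permissions add up, so keeping it would cancel out the restriction)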
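To produce the single-bucket policy shown earlier for any list of buckets, a short script can build the Resource array so that every bucket gets both its bucket ARN and its bucket/* object ARN. build_read_only_policy is a hypothetical helper; paste its printed JSON into the IAM JSON editor.

```python
import json


def build_read_only_policy(bucket_names):
    """Build a least-privilege read-only policy for the given buckets.

    Mirrors the basic example policy: ListAllMyBuckets/GetBucketLocation on "*",
    and ListBucket/GetObject on each bucket ARN plus its "/*" object ARN.
    """
    resources = []
    for name in bucket_names:
        resources.append(f"arn:aws:s3:::{name}")    # bucket-level actions
        resources.append(f"arn:aws:s3:::{name}/*")  # object-level actions
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListAllMyBuckets", "s3:GetBucketLocation"],
                "Resource": "*",
            },
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetObject"],
                "Resource": resources,
            },
        ],
    }


print(json.dumps(build_read_only_policy(["team-docs", "hr-files"]), indent=2))
```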

Supported Features

The S3 connector syncs the following data from your AWS S3 buckets:
  • Buckets: All accessible buckets in your AWS account
  • Files: All file types with their metadata (name, size, modification date, MIME type)
  • Folders: Directory structure and organization using S3 prefixes
  • Permissions: Access control based on bucket and object-level permissions

Data Sync Behavior

Initial Sync

The first sync performs a complete scan of your configured S3 buckets:
  • Bucket Discovery: Lists all accessible buckets in your AWS account (or uses configured bucket filters)
  • Region Detection: Automatically detects and caches the region for each bucket
  • Full Object Listing: Fetches all objects from selected buckets using pagination
  • Metadata Extraction: Extracts file names, sizes, modification dates, MIME types, and paths
  • Folder Structure: Creates logical folder structure from S3 object prefixes
  • Indexing: Indexes file content for AI-powered search (based on indexing filter settings)
  • Permission Assignment: Assigns permissions based on connector scope (Personal or Team)
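The pagination step maps onto the S3 ListObjectsV2 API, which returns at most 1,000 keys per call plus a NextContinuationToken while more remain. A minimal sketch of the loop, assuming a boto3 S3 client (boto3.client("s3")); it illustrates the API pattern rather than PipesHub’s actual code, and boto3’s built-in paginator (s3.get_paginator("list_objects_v2")) does the same job.

```python
def list_all_objects(s3, bucket):
    """Yield every object in `bucket`, following ListObjectsV2 continuation tokens.

    `s3` is a boto3 S3 client or anything exposing the same list_objects_v2 method.
    """
    kwargs = {"Bucket": bucket}
    while True:
        page = s3.list_objects_v2(**kwargs)
        for obj in page.get("Contents", []):  # "Contents" is absent for empty pages
            yield obj  # dict with Key, Size, LastModified, ETag, StorageClass, ...
        if not page.get("IsTruncated"):
            break
        kwargs["ContinuationToken"] = page["NextContinuationToken"]
```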
Sync Duration Estimates:
Bucket Size | Objects | Estimated Time
Small | < 1,000 | 5-15 minutes
Medium | 1,000 - 10,000 | 15-45 minutes
Large | 10,000 - 100,000 | 1-3 hours
Very Large | 100,000 - 1,000,000 | 3-8 hours
Enterprise | 1,000,000+ | 8+ hours
Use bucket and file extension filters to significantly reduce initial sync time. For example, syncing only PDF files from a specific bucket can reduce sync time by 80-90%.

Incremental Sync

After the initial sync, subsequent syncs are much faster:
  • Timestamp-Based Detection: Uses LastModified timestamps to identify changed objects
  • Sync Point Tracking: Stores the last sync timestamp per bucket in sync points
  • Change Detection: Only fetches objects modified since the last sync timestamp
  • Pagination Resume: Uses continuation tokens to resume interrupted syncs
  • Efficient Updates: Processes only new, modified, or deleted objects
  • API Optimization: Reduces API calls by 90%+ compared to full syncs
How It Works:
  1. Connector reads the sync point for each bucket (stored in our database)
  2. Retrieves the last_sync_time timestamp
  3. Lists the bucket’s objects and selects those with LastModified >= last_sync_time
  4. Processes only the changed objects
  5. Updates the sync point with the new maximum timestamp
Incremental syncs typically take 5-15 minutes for most buckets, regardless of total bucket size, as they only process changes.
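Steps 1 to 5 can be sketched as a single function. Here sync_points is a plain dict standing in for the sync-point store in PipesHub’s database, and list_objects and process are hypothetical callbacks. The inclusive >= comparison mirrors step 3, so objects stamped exactly at last_sync_time are processed again on the next run rather than risk being missed.

```python
from datetime import datetime, timezone


def incremental_sync(list_objects, sync_points, bucket, process):
    """Run one incremental pass over `bucket` and return how many objects were processed.

    Hypothetical stand-ins: list_objects(bucket) yields dicts with "Key" and
    "LastModified" (timezone-aware datetimes), sync_points is a dict playing the
    role of the sync-point store, and process(obj) indexes one object.
    """
    last_sync_time = sync_points.get(bucket)  # steps 1-2: read the sync point
    newest = last_sync_time
    changed = 0
    for obj in list_objects(bucket):
        modified = obj["LastModified"]
        # Step 3: keep objects with LastModified >= last_sync_time (all of them on a first run).
        if last_sync_time is None or modified >= last_sync_time:
            process(obj)  # step 4: handle only the changed objects
            changed += 1
        if newest is None or modified > newest:
            newest = modified
    if newest is not None:
        sync_points[bucket] = newest  # step 5: save the new maximum timestamp
    return changed


objects = [
    {"Key": "a.pdf", "LastModified": datetime(2024, 5, 1, tzinfo=timezone.utc)},
    {"Key": "b.pdf", "LastModified": datetime(2024, 6, 1, tzinfo=timezone.utc)},
]
points, processed = {}, []
incremental_sync(lambda b: objects, points, "docs", processed.append)  # first run
incremental_sync(lambda b: objects, points, "docs", processed.append)  # second run
print([o["Key"] for o in processed])  # ['a.pdf', 'b.pdf', 'b.pdf']
```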

Permission Handling

The connector respects AWS IAM permissions and PipesHub access controls:
  • IAM-Based Access: Only syncs buckets and objects the IAM user has permission to access
  • Bucket Policies: Respects S3 bucket policies and access control lists (ACLs)
  • PipesHub Scope:
    • Personal Scope: Only the connector creator can access synced content
    • Team Scope: All organization users have read access to synced content
  • Permission Inheritance: Files inherit permissions from their parent bucket (Record Group)
  • Access Control: Users see only content they have permission to view in both AWS and PipesHub
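The two-layer rule (AWS access decides what gets synced at all; PipesHub scope decides who can see it) can be modelled as below. ConnectorInstance, its "personal"/"team" values and can_view are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ConnectorInstance:
    creator_id: str
    org_id: str
    scope: str  # "personal" or "team" (hypothetical values)


def can_view(user_id, user_org_id, connector, aws_readable):
    """Both layers must allow access: AWS (via the IAM user) and PipesHub scope."""
    if not aws_readable:  # the IAM user could not read the object, so it was never synced
        return False
    if connector.scope == "personal":
        return user_id == connector.creator_id
    if connector.scope == "team":
        return user_org_id == connector.org_id
    return False  # unknown scope: deny by default
```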

Sync Frequency

Scheduled Sync:
  • Default interval: 60 minutes
  • Configurable: 15 min, 30 min, 1 hour, 4 hours, 24 hours
  • Runs automatically in the background
  • Best for: Keeping data up-to-date with minimal manual intervention
Manual Sync:
  • Triggered on-demand from the connector settings
  • Best for: Testing, troubleshooting, or syncing after bulk changes
  • Useful when: You need immediate sync after uploading many files
For large buckets with infrequent changes, consider longer sync intervals (4-24 hours) to reduce AWS API usage and costs.
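To see why longer intervals reduce API usage, you can estimate LIST requests per day. This sketch assumes every sync pass pages through the whole bucket with ListObjectsV2 at the default maximum of 1,000 keys per page (the S3 list API cannot filter by modification date server-side); requests for downloading changed files come on top. list_requests_per_day is a hypothetical helper.

```python
import math


def list_requests_per_day(object_count, interval_minutes, page_size=1000):
    """Estimate ListObjectsV2 requests per day for scheduled syncs of one bucket."""
    syncs_per_day = (24 * 60) / interval_minutes
    pages_per_sync = max(1, math.ceil(object_count / page_size))  # an empty bucket still costs 1 call
    return syncs_per_day * pages_per_sync


print(list_requests_per_day(500_000, 60))       # 12000.0 (1-hour interval)
print(list_requests_per_day(500_000, 24 * 60))  # 500.0 (24-hour interval)
```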

Troubleshooting

Common Issues and Solutions


Invalid credentials error
Symptoms:
  • Error message: “InvalidAccessKeyId” or “SignatureDoesNotMatch”
  • Connector fails to initialize
  • Authentication fails immediately
Solutions:
  1. Verify Access Key ID and Secret Access Key are correct
  2. Ensure you pasted the full values exactly as shown, with no leading/trailing spaces or line breaks
  3. In the AWS IAM console, check that the access key’s status is Active
  4. Verify the access keys haven’t been deactivated, deleted, or rotated
  5. Create a new access key pair if the old one is lost or compromised

Bucket access denied
Symptoms:
  • Error: “AccessDenied” when listing or accessing buckets
  • Some buckets appear but others don’t
  • Sync fails with permission errors
Solutions:
  1. Verify the IAM user has the following permissions:
    • s3:ListAllMyBuckets - To list all buckets
    • s3:ListBucket - To list objects in each bucket
    • s3:GetBucketLocation - To detect bucket regions
    • s3:GetObject - To download object content
  2. Check that the bucket policy allows access from your IAM user
  3. Ensure the bucket exists and is in the same AWS account
  4. Verify bucket region matches your AWS region configuration
  5. Review IAM policy JSON to ensure ARNs are correct
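Before digging through CloudTrail, you can check whether an identity policy grants the four actions listed above. The sketch is deliberately simplified: it only looks at Allow statements in one policy document and ignores Deny statements, conditions, bucket policies and organization SCPs, so an empty result is necessary but not sufficient. missing_permissions is a hypothetical helper; it matches action names case-insensitively and ARNs case-sensitively, using fnmatch for the * and ? wildcards.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """IAM allows Action, Resource and Statement to be a single value or a list."""
    return value if isinstance(value, list) else [value]


def missing_permissions(policy, bucket):
    """Return the connector's required actions that no Allow statement grants for `bucket`."""
    required = {
        "s3:ListAllMyBuckets": "*",
        "s3:ListBucket": f"arn:aws:s3:::{bucket}",
        "s3:GetBucketLocation": f"arn:aws:s3:::{bucket}",
        "s3:GetObject": f"arn:aws:s3:::{bucket}/example-key",  # any object in the bucket
    }
    missing = []
    for action, resource in required.items():
        granted = any(
            stmt.get("Effect") == "Allow"
            and any(fnmatchcase(action.lower(), a.lower()) for a in _as_list(stmt.get("Action", [])))
            and any(fnmatchcase(resource, r) for r in _as_list(stmt.get("Resource", [])))
            for stmt in _as_list(policy.get("Statement", []))
        )
        if not granted:
            missing.append(action)
    return missing
```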

No buckets appearing
Symptoms:
  • Bucket dropdown is empty
  • No buckets available to select
  • “No buckets found” message
Solutions:
  1. Check that the IAM user has s3:ListAllMyBuckets permission
  2. Verify credentials are correctly entered in PipesHub
  3. Ensure the IAM user has access to at least one bucket
  4. Check AWS CloudTrail logs for permission errors
  5. Verify the IAM user is in the same AWS account as the buckets
  6. Try refreshing the bucket list in the filter configuration

No data syncing
Symptoms:
  • Connector shows “Active” but no files appear
  • Sync completes but no records are indexed
  • Indexing progress stays at 0%
Solutions:
  1. Verify the connector status shows “Active” or “Syncing”
  2. Check that buckets are selected in the filter configuration
  3. Ensure file extension filters aren’t excluding all files
  4. Verify date filters aren’t excluding all files (e.g., “Modified After” set to future date)
  5. Review sync logs for specific error messages
  6. Verify the IAM user has s3:GetObject permission
  7. Check that objects exist in the selected buckets
  8. Ensure indexing filters are enabled (Index Files, Index Folders)

Sync taking too long
Symptoms:
  • Initial sync runs for hours without completion
  • Progress bar moves very slowly
  • High AWS API usage
Solutions:
  1. Use Bucket Filters: Sync only necessary buckets instead of all buckets
  2. Use File Extension Filters: Filter to specific file types (e.g., pdf, docx, txt)
  3. Use Date Filters: Sync only recently modified files (e.g., last 6 months)
  4. Increase Sync Interval: For large buckets, use 4-24 hour intervals
  5. Check Bucket Size: Very large buckets (millions of objects) will take longer
  6. Monitor Progress: Check indexing progress to ensure sync is actually running
A bucket with 1 million objects can take 6-12 hours for initial sync. Using filters to sync only 10% of objects reduces this to 1-2 hours.

Region detection errors
Symptoms:
  • Warnings about region detection
  • Sync fails with region mismatch errors
  • Objects not found errors
Solutions:
  1. The connector automatically detects bucket regions - this usually works
  2. If region detection fails, ensure the IAM user has s3:GetBucketLocation permission
  3. Some buckets may require explicit region configuration in AWS
  4. Check that buckets are in supported AWS regions
  5. Verify the IAM user has access to the region where buckets are located
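Region detection relies on the GetBucketLocation API, whose response has two quirks: buckets in us-east-1 return an empty LocationConstraint (None in boto3), and some older eu-west-1 buckets return the legacy value EU. A minimal sketch of normalizing and caching the result, assuming a boto3-style S3 client; normalize_bucket_region and make_region_lookup are hypothetical names, not PipesHub’s code.

```python
def normalize_bucket_region(location_constraint):
    """Map a GetBucketLocation LocationConstraint value to a region name."""
    if not location_constraint:  # None or "" means us-east-1
        return "us-east-1"
    if location_constraint == "EU":  # legacy alias for eu-west-1
        return "eu-west-1"
    return location_constraint


def make_region_lookup(s3):
    """Return a function that detects each bucket's region once and caches it."""
    cache = {}

    def region_for(bucket):
        if bucket not in cache:
            resp = s3.get_bucket_location(Bucket=bucket)
            cache[bucket] = normalize_bucket_region(resp.get("LocationConstraint"))
        return cache[bucket]

    return region_for
```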

Token expired or sync stopped
Symptoms:
  • Sync was working but suddenly stopped
  • Authentication errors after working previously
  • “Credentials invalid” errors
Solutions:
  1. Access keys don’t expire automatically, but they can be rotated or deleted
  2. If access keys were rotated in AWS, update credentials in PipesHub immediately
  3. Check if the IAM user or access keys were deleted in AWS IAM console
  4. Re-authenticate by updating credentials in connector settings
  5. Verify the IAM user still exists and is active
  6. Check AWS CloudTrail for any credential-related events

Files not appearing in search
Symptoms:
  • Files synced but not searchable
  • Search returns no results for known files
  • Indexing shows 0% complete
Solutions:
  1. Verify indexing filters are enabled (Index Files should be ON)
  2. Check that file types are supported for indexing (PDF, DOCX, TXT, etc.)
  3. Images and binary files cannot be indexed for text search
  4. Wait for indexing to complete - large files take time to process
  5. Check indexing progress in connector status
  6. Verify file content is not encrypted or password-protected
  7. Ensure files have readable text content (not just images)

Partial sync or missing files
Symptoms:
  • Some files appear but others don’t
  • Files in certain folders are missing
  • Inconsistent sync results
Solutions:
  1. Check file extension filters - files may be excluded by filter settings
  2. Verify date filters aren’t excluding files
  3. Check bucket permissions - IAM user may not have access to all objects
  4. Review sync logs for specific errors about missing files
  5. Verify files actually exist in S3 (check S3 console)
  6. Ensure files aren’t in excluded buckets (if bucket filter is active)
  7. Check if files are in Glacier storage class (not directly accessible)
If you rotate or change AWS access keys, you must update the configuration in PipesHub immediately. The connector will fail to sync until new credentials are provided.

AWS S3 Compatibility

This connector is designed for Amazon S3 and uses the AWS S3 API. It supports:
  • All AWS S3 Regions: Works with buckets in any AWS region worldwide
  • Standard S3 Storage Classes: Standard, Standard-IA, One Zone-IA, Intelligent-Tiering
  • S3-Compatible Services: Should work with MinIO and other S3-compatible storage (with proper endpoint configuration)
Not Supported:
  • S3 Glacier Flexible Retrieval (formerly S3 Glacier) and S3 Glacier Deep Archive (objects must be restored before they can be read)
  • S3 Outposts (may require special configuration)
  • Requester Pays buckets (not currently supported)
The connector uses the standard S3 API and should work with S3-compatible services like MinIO, but is primarily tested and optimized for AWS S3.
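Each ListObjectsV2 entry carries a StorageClass field, which is enough to skip archived objects before trying to download them. A sketch, assuming skip-on-archive is the desired behavior (not confirmed as PipesHub’s logic); directly_readable is a hypothetical helper. A restored Glacier object still reports GLACIER, so this check skips temporary restores too, and Intelligent-Tiering objects in its optional archive tiers cannot be detected this way.

```python
# Storage classes whose objects must be restored before GetObject succeeds.
NEEDS_RESTORE = {"GLACIER", "DEEP_ARCHIVE"}


def directly_readable(obj):
    """True if a ListObjectsV2 entry can be downloaded without a restore first."""
    return obj.get("StorageClass", "STANDARD") not in NEEDS_RESTORE


listing = [
    {"Key": "a.pdf", "StorageClass": "STANDARD_IA"},
    {"Key": "b.pdf", "StorageClass": "DEEP_ARCHIVE"},
]
print([o["Key"] for o in listing if directly_readable(o)])  # ['a.pdf']
```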

FAQ

How long does the initial sync take?

The initial sync duration depends on the size of your S3 buckets:
Bucket Size | Estimated Time
Small (< 1,000 objects) | 5-15 minutes
Medium (1,000 - 10,000 objects) | 15-45 minutes
Large (10,000 - 100,000 objects) | 1-3 hours
Very Large (100,000 - 1,000,000 objects) | 3-8 hours
Enterprise (1,000,000+ objects) | 8+ hours
Tip: Use bucket and file extension filters to significantly reduce initial sync time. For example, syncing only PDF files from specific buckets can reduce sync time by 80-90%.

Can one connector instance sync buckets from multiple AWS accounts?

No, each connector instance connects to a single AWS account. To sync buckets from multiple AWS accounts, you’ll need to:
  1. Create separate IAM users in each AWS account
  2. Create separate S3 connector instances in PipesHub
  3. Configure each connector with credentials from the respective AWS account
This allows you to manage and filter buckets from different accounts separately.

How does incremental sync work?

After the initial sync, the connector uses timestamp-based incremental sync:
  1. Stores Sync Point: Saves the last sync timestamp for each bucket
  2. Queries Changes: On next sync, queries S3 for objects modified since that timestamp
  3. Processes Only Changes: Downloads and indexes only new or modified objects
  4. Updates Timestamp: Saves the new maximum timestamp for the next sync
This makes subsequent syncs 90%+ faster than full syncs, typically completing in 5-15 minutes regardless of bucket size.
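
Does the connector support S3 Glacier storage classes?

No, the connector does not directly support S3 Glacier or Glacier Deep Archive storage classes. These storage classes require a restore process before files can be accessed.
If you need to sync Glacier files:
  1. Restore the objects in S3
  2. Wait for the restore to complete, then copy the objects to the Standard storage class (a restore alone creates only a temporary copy, and the object keeps its Glacier storage class)
  3. The connector will then sync the files on its next run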
Consider using S3 Intelligent-Tiering or Standard-IA for frequently accessed files that need to be searchable.

What happens when a file is deleted from S3?

When a file is deleted from S3:
  • The deletion is detected during the next incremental sync
  • The file record is marked as deleted in PipesHub
  • The file is removed from search results
  • Historical access to the file may still be available depending on your retention settings
Note: If you use S3 versioning, deleted files may still appear if previous versions exist.
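A LastModified comparison alone cannot see deletions, because a deleted object simply disappears from the listing. One common approach, shown here as an assumption rather than the connector’s documented mechanism, is to compare the keys PipesHub already knows about with the keys in a fresh listing. detect_deletions is a hypothetical helper.

```python
def detect_deletions(known_keys, listed_keys):
    """Return keys that were synced before but are missing from the current S3 listing."""
    return sorted(set(known_keys) - set(listed_keys))


print(detect_deletions(["a.pdf", "b.pdf", "c.pdf"], ["a.pdf", "c.pdf", "d.pdf"]))  # ['b.pdf']
```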

Does the connector support encrypted S3 buckets?

Yes, the connector supports S3 buckets with encryption:
  • SSE-S3: Server-side encryption with S3-managed keys (fully supported)
  • SSE-KMS: Server-side encryption with AWS KMS keys (requires KMS permissions)
  • SSE-C: Server-side encryption with customer-provided keys (not supported)
  • Client-side encryption: Files encrypted before upload (not supported)
If you use SSE-KMS, the IAM user also needs kms:Decrypt permission on the KMS key(s) that encrypt your objects.

Can I sync only specific folders within a bucket?

Currently, the connector syncs entire buckets. To narrow what gets synced:
  1. Use Bucket Filters: Create separate buckets for different content types
  2. Use File Extension Filters: Filter to specific file types
  3. Use Date Filters: Sync only recently modified files
Folder-level filtering within a bucket is not currently supported, but may be added in future releases.

Ready to Get Started?

Connect your S3 buckets to PipesHub in just a few minutes. Follow the step-by-step guide above to enable organization-wide file search and access across all your S3 content.