What is this?
Google Cloud Storage (GCS) is where a lot of teams keep their files in the cloud: documents, images, backups, anything. This connector copies those files into PipesHub so you can search them with AI, just like any other file in your workspace.
What you’ll need before you start
- A Google Cloud account with access to the project that owns your GCS buckets
- Permission to create a service account in that project (usually a project owner or admin)
- About 10 minutes
What gets synced
- Files of any type (PDFs, Word docs, images, videos, code, archives)
- Folders (including the nested folder structure inside each bucket)
- Details like file name, size, when it was last changed
- Who can see what — PipesHub respects the permissions already set in Google Cloud
Step-by-step setup
How the connector signs in
Instead of logging in with a password, the connector uses a service account: a kind of robot user inside Google Cloud that has its own key file.
You’ll create this service account, download its key (a small .json file), and upload that file into PipesHub. That’s it.
Service accounts are the standard way to let one system talk to Google Cloud on your behalf. Keep the key file safe: anyone who has it can read the buckets you give it access to.
Part A — In Google Cloud (about 5 minutes)
Step 1: Create a service account
- Go to console.cloud.google.com and sign in.
- Click the menu (☰) in the top-left and pick IAM & Admin → Service Accounts.

- Click Create Service Account at the top.
- Give it a name you’ll recognise later, like pipeshub-gcs-connector. You can skip the description.
- Click Create and Continue.

Step 2: Give it permission to read your files
Google Cloud needs to know what this service account is allowed to do. We only want it to read files, never write or delete.
- In the Role dropdown, search for and select Storage Object Viewer. This grants read-only access to the objects in every bucket in the project.
- Click Continue, then Done.

Want to limit it to just one or two buckets instead of everything? See Advanced Configuration further down for a custom role.
Step 3: Download the key file
This is the file you’ll upload to PipesHub.
- You’ll land on the list of service accounts. Click the one you just made.
- Open the Keys tab.
- Click Add Key → Create new key.

- Choose JSON and click Create.

- A .json file downloads automatically. Keep it somewhere safe; you can’t download it again later.
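If you want to sanity-check the key before uploading it (for example, to rule out a copy-pasted or truncated file), the Python sketch below reads it with the standard json module and looks for the fields a Google service account key normally contains: type set to "service_account", plus project_id, private_key, and client_email. The function name check_service_account_key is hypothetical, and the choice of fields is an assumption; this is only a local format check. It can't tell you whether the key is still active in Google Cloud, and PipesHub may validate the file differently.

```python
import json

# Fields a Google service account JSON key normally contains. Checking these is an
# illustrative sanity check (an assumption), not PipesHub's own validation logic.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(text: str) -> list:
    """Hypothetical helper: return a list of problems found; an empty list means it looks OK."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]

    # Report every required field that is absent or empty.
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not data.get(name)]

    # Keys for users or other credential types carry a different "type" value.
    if data.get("type") and data["type"] != "service_account":
        problems.append(f"type is {data['type']!r}, expected 'service_account'")

    # The private key is stored as a PEM block inside a JSON string.
    key = data.get("private_key")
    if isinstance(key, str) and key and "BEGIN PRIVATE KEY" not in key:
        problems.append("private_key does not look like a PEM private key")
    return problems
```

To use it, pass in the file’s contents, for example check_service_account_key(open("key.json").read()). An empty list means nothing obviously wrong was found.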

Part B — In PipesHub (about 3 minutes)
Step 4: Open the GCS connector
- In PipesHub, click Workspace → Connectors on the left sidebar.
- Find the tile called GCS (“Sync files and folders from Google Cloud Storage”).
- Click + Setup on that tile.
Step 5: Upload your key (Authenticate Instance tab)

- Instance name: type a friendly name like GCS or Marketing GCS. This is just a label for you.
- Service Account JSON: click Upload JSON and pick the .json file you downloaded in Step 3. Once it’s uploaded, you’ll see its filename on screen.
- Click Next →.
Step 6: Decide how often to sync (Configure Records tab)

- Sync Strategy
- Scheduled — runs on its own. (Recommended for most people.)
- Manual — only runs when you click the sync button yourself.
- Sync Interval — how often a scheduled sync happens. 1 Hour is the default. You can pick 15 min, 30 min, 4 hours, or 24 hours.
Not sure? Leave it on Scheduled every 1 Hour. You can change it any time.
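Every scheduled sync checks Google Cloud for changes, so the interval decides how many of those checks happen each day. A quick calculation, assuming each sync starts on schedule and no manual syncs are added (the dictionary and function names are just for illustration):

```python
# Sync Interval choices from the Configure Records tab, converted to minutes.
SYNC_INTERVALS_MIN = {"15 min": 15, "30 min": 30, "1 hour": 60, "4 hours": 240, "24 hours": 1440}

MINUTES_PER_DAY = 24 * 60


def syncs_per_day(interval_minutes: int) -> int:
    """Number of scheduled syncs that start in one day at this interval."""
    return MINUTES_PER_DAY // interval_minutes


for label, minutes in SYNC_INTERVALS_MIN.items():
    print(f"{label:>8}: {syncs_per_day(minutes):2d} syncs per day")
```

The default of 1 hour works out to 24 syncs a day; moving a very large bucket to 4 or 24 hours cuts that to 6 or 1.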
Step 7: Choose what to pull in (still on Configure Records)

| Filter | What it does | When to use it |
|---|---|---|
| Bucket Names | Only sync specific buckets. The list is auto-filled from your Google Cloud project. | You only care about one or two buckets out of many |
| File Extensions | Only sync certain file types. Enter extensions separated by commas, like pdf, docx, txt. | You only want documents, not images or videos |
| Modified Date | Only sync files changed after/before/between certain dates. | You want to skip very old files |
| Created Date | Like Modified Date, but based on when the file was first uploaded. | Same idea, different timestamp |
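A file is synced only if it passes every filter you’ve set. The Python sketch below models that rule. FileInfo, Filters, and passes_filters are hypothetical names, not PipesHub code, and the sketch assumes that an unset filter allows everything, that extensions are compared case-insensitively without a leading dot, and that date bounds are inclusive. The two timestamps are assumed to come from each object’s GCS metadata (updated for Modified Date, timeCreated for Created Date).

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class FileInfo:
    """Hypothetical record for one GCS object."""
    bucket: str
    name: str          # full object name, e.g. "reports/2024/q1.pdf"
    updated: datetime  # assumed source for the Modified Date filter
    created: datetime  # assumed source for the Created Date filter


@dataclass
class Filters:
    """Hypothetical filter settings; an empty or None value means "no filter"."""
    buckets: set = field(default_factory=set)
    extensions: str = ""  # comma-separated, as typed in the UI, e.g. "pdf, docx, txt"
    modified_after: Optional[datetime] = None
    modified_before: Optional[datetime] = None
    created_after: Optional[datetime] = None
    created_before: Optional[datetime] = None


def parse_extensions(text: str) -> set:
    """Turn "pdf, .DOCX ,txt" into {"pdf", "docx", "txt"}."""
    return {p.strip().lstrip(".").lower() for p in text.split(",") if p.strip()}


def in_range(value: datetime, after: Optional[datetime], before: Optional[datetime]) -> bool:
    """Inclusive check; a missing bound leaves that side open."""
    return (after is None or value >= after) and (before is None or value <= before)


def passes_filters(f: FileInfo, flt: Filters) -> bool:
    """A file is kept only if it passes every filter that is set."""
    if flt.buckets and f.bucket not in flt.buckets:
        return False
    wanted = parse_extensions(flt.extensions)
    if wanted:
        # Extension = text after the last dot in the final path segment.
        basename = f.name.rsplit("/", 1)[-1]
        ext = basename.rsplit(".", 1)[-1].lower() if "." in basename else ""
        if ext not in wanted:
            return False
    return in_range(f.updated, flt.modified_after, flt.modified_before) and in_range(
        f.created, flt.created_after, flt.created_before
    )


utc = timezone.utc
flt = Filters(buckets={"marketing"}, extensions="pdf, docx, txt",
              modified_after=datetime(2024, 1, 1, tzinfo=utc))
doc = FileInfo("marketing", "reports/2024/q1.PDF",
               datetime(2024, 4, 2, tzinfo=utc), datetime(2024, 4, 1, tzinfo=utc))
photo = FileInfo("marketing", "assets/logo.png",
                 datetime(2024, 5, 1, tzinfo=utc), datetime(2024, 5, 1, tzinfo=utc))
print(passes_filters(doc, flt))    # True
print(passes_filters(photo, flt))  # False: png isn't in the extension list
```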
Start small. Pick one bucket and maybe pdf, docx, txt for extensions. You can always loosen the filters later, but an initial sync over a huge bucket with no filters can take hours.
Step 8: Turn sync on

- Flip the Sync Enabled toggle on the card.
- The status changes to Sync Enabled and the initial sync starts in the background.
Step 9: Watch the sync progress
Click the instance card to open the Overview panel. This is where you’ll come back later to check on things.
- Total — how many files the connector has picked up so far
- Failed — files it couldn’t read (usually a permission problem)
- Unsupported — files it can store but can’t read the text of (images, videos, archives)
- Processing — files currently being read and indexed
- Not Started — files waiting in the queue
- Sync — run an extra sync right now (only grabs what’s changed)
- Full sync — re-scan everything from scratch (use sparingly, it’s slow)
- Manage Configuration — re-opens the two tabs from Steps 5–7 so you can change credentials or filters later
Advanced Configuration
Limiting access to specific buckets (custom role)
Storage Object Viewer gives the service account read access to every bucket in the project. If that’s more than you want, you can make a custom role with a narrower scope.
Here’s an example role with just the permissions the connector actually needs:
- In Google Cloud Console, go to IAM & Admin → Roles.
- Click Create Role, paste in the permissions above, and save.
- Go to IAM & Admin → Service Accounts, open your service account, and assign the new role instead of (or in addition to) Storage Object Viewer.
Want to scope the service account to one specific bucket? Skip the custom role and instead go to the bucket in GCS, open Permissions, and grant Storage Object Viewer to just your service account’s email address. That’s simpler than a custom role.
How syncing actually works
The initial sync
When you turn the connector on, it does a full scan:
- Lists every bucket you gave it access to (or just the ones in your filter)
- Walks through every object (file) inside
- Grabs the file’s text content so you can search it
Want it faster? Filters are the lever. Syncing only PDFs from one bucket is 90% quicker than syncing everything.
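In a standard (flat-namespace) GCS bucket there are no real folders: a bucket is a flat list of objects, and a name like reports/2024/q1.pdf only implies folders through its slashes. To illustrate how a walk over every object can still rebuild the nested folder structure, here is a small Python sketch that derives each folder path from a listing of object names. The function name is hypothetical, and this isn’t necessarily how PipesHub does it.

```python
from typing import Iterable, List


def folders_from_object_names(names: Iterable[str]) -> List[str]:
    """Hypothetical helper: every folder path implied by a flat list of object names.

    "reports/2024/q1.pdf" implies "reports/" and "reports/2024/". A name ending in "/"
    (a zero-byte placeholder some tools create for an empty folder) counts as a folder too.
    """
    folders = set()
    for name in names:
        parts = name.split("/")
        # Every prefix that stops before the last segment is a folder.
        for i in range(1, len(parts)):
            folders.add("/".join(parts[:i]) + "/")
    return sorted(folders)


objects = ["reports/2024/q1.pdf", "reports/2024/q2.pdf", "reports/summary.docx", "images/", "readme.txt"]
for folder in folders_from_object_names(objects):
    print(folder)
```

This prints images/, reports/ and reports/2024/; readme.txt sits at the bucket root, so it adds no folder.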
Every sync after that
Once the initial sync is done, the connector only asks Google Cloud: “what’s changed since last time?” So scheduled syncs usually finish in 5–15 minutes, no matter how big the bucket is.
Who can see the synced files?
PipesHub respects two things:
- Google Cloud permissions: the connector can only pull files the service account has access to.
- The scope you picked when setting up the connector:
- Personal — only you can see the files in PipesHub
- Team — everyone in your organization can see them
Sync frequency — scheduled vs manual
Scheduled sync (the default) runs on its own every hour (or whatever interval you picked). Set it and forget it.
Manual sync only runs when you click Sync or Full sync on the Overview panel. Useful when:
- You just uploaded a batch of files and want them searchable right now
- You’re testing filter changes
- You’re debugging why something didn’t sync
What this connector works with
- All Google Cloud regions worldwide
- All storage classes — Standard, Nearline, Coldline, and Archive (Archive / Coldline may be slower to read)
- Encrypted buckets — Google-managed encryption works out of the box
Not supported, or needs extra setup:
- Requester Pays buckets
- Customer-supplied encryption keys (CSEK)
- Customer-managed encryption keys (CMEK) need extra KMS permissions on the service account
Ready to connect?
Follow the nine steps above and your GCS files will become searchable in PipesHub as soon as the initial sync finishes.
Something not working?
Troubleshooting
“Invalid credentials” or “Authentication failed”
Usually means the JSON key file is wrong or out of date.
- Make sure you uploaded the original file you downloaded from Google Cloud — not a copy-paste of the contents
- Check the service account still exists in Google Cloud → IAM & Admin → Service Accounts
- Check the key hasn’t been deleted under the service account’s Keys tab
- Worst case, delete the old key, create a fresh one, and upload it via Manage Configuration
Some buckets are missing from the filter dropdown
The service account can only list buckets it has permission to see.
- Give it Storage Object Viewer at the project level (or on each bucket individually)
- Check you’re in the right project — the connector talks to the project your JSON key came from
- Reopen + Add filter → Bucket Names to refresh the list
Connector shows “Sync Enabled” but no files show up
Walk through these in order:
- Are your filters too strict? Open Manage Configuration and check.
- Does the service account actually have read access to objects, not just buckets?
- Look at the Failed tile on the Overview panel — errors there will tell you why.
- Any files at all in the bucket you picked?
Initial sync is taking forever
Big buckets take real time. Things that help:
- Limit to specific buckets with a Bucket Names filter
- Limit to the file types you care about with File Extensions (e.g. pdf, docx)
- Skip old files with a Modified Date filter
- For gigantic buckets (millions of objects), raise the Sync Interval to 4–24 hours — less API usage, less cost
Sync used to work, now it doesn’t
Most often, the service account key was rotated or deleted in Google Cloud.
- Check the service account’s Keys tab — is the key you uploaded still there?
- Create a new key and re-upload via Manage Configuration → Authenticate Instance
- Also check the service account itself wasn’t deleted or had its role removed
Files are synced but I can’t find them in search
- Processing tile still above zero? Indexing hasn’t finished yet — give it time.
- Files showing as Unsupported? That’s expected for images, videos, archives — they’re stored but their content can’t be read.
- Password-protected or encrypted PDFs can’t be indexed either.
Some files synced, others didn’t
- Check file extension filters aren’t excluding them
- Check date filters aren’t excluding them
- Check the service account has access to those specific objects (some might live in a bucket with extra IAM rules)
- Archive / Coldline files may take longer to fetch
Frequently asked questions
How long does the initial sync take?
Depends entirely on how many files are in the buckets you picked.
Using filters to sync only what you actually need is the biggest speedup.
| Files in the synced buckets | Rough initial sync time |
|---|---|
| Under 1,000 files | 5–15 minutes |
| 1,000 – 10,000 | 15–45 minutes |
| 10,000 – 100,000 | 1–3 hours |
| 100,000 – 1,000,000 | 3–8 hours |
| 1,000,000+ | 8+ hours |
Can I connect more than one Google Cloud project?
Yes. Each instance is linked to one project, but you can add as many instances as you like. Click + Add Another Instance at the top of the GCS panel and repeat the setup with a different key file. Each instance has its own filters and its own on/off toggle.
Which file types can actually be searched?
Fully searchable (we can read the text):
- Documents — PDF, DOCX, XLSX, PPTX, ODT, RTF
- Text — TXT, MD, CSV, JSON, XML, HTML
- Code — Python, JavaScript, Java, Go, and most others
Stored but not searchable (the content can’t be read):
- Images — JPG, PNG, GIF, SVG
- Video and audio
- Archives — ZIP, RAR, TAR
- Anything password-protected or encrypted
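To get a rough idea, before syncing, of how much of a bucket will be searchable, you can sort object names by extension using the lists above. In the Python sketch below, the extension sets are copied from those lists where possible; "jpeg", the video and audio extensions (mp4, mov, mp3, wav), and the limited set of code extensions are assumptions standing in for “video and audio” and “most others”. Password-protected files can’t be spotted from the name at all, and classify is a hypothetical helper.

```python
from collections import Counter

# Copied from the FAQ lists where possible; entries marked "assumed" are illustrative guesses.
SEARCHABLE = {
    "pdf", "docx", "xlsx", "pptx", "odt", "rtf",  # documents
    "txt", "md", "csv", "json", "xml", "html",    # text
    "py", "js", "java", "go",                     # code (the FAQ says "most others" too)
}
NOT_SEARCHABLE = {
    "jpg", "jpeg", "png", "gif", "svg",           # images ("jpeg" assumed)
    "mp4", "mov", "mp3", "wav",                   # video and audio (assumed extensions)
    "zip", "rar", "tar",                          # archives
}


def classify(name: str) -> str:
    """Hypothetical helper: guess how an object will be handled from its extension."""
    basename = name.rsplit("/", 1)[-1]
    ext = basename.rsplit(".", 1)[-1].lower() if "." in basename else ""
    if ext in SEARCHABLE:
        return "searchable"
    if ext in NOT_SEARCHABLE:
        return "stored only"
    return "unknown"


names = ["docs/plan.PDF", "assets/logo.png", "backup/site.tar", "src/main.go", "data/blob.bin"]
print(Counter(classify(n) for n in names))
```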
How does the incremental sync know what's new?
Every file in GCS has an updated timestamp. After each sync, the connector remembers the most recent timestamp it saw. Next time it runs, it just asks Google Cloud for files with a newer timestamp than that. Fast and efficient.
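That watermark approach can be sketched in Python. The changed_since function below is a hypothetical illustration: it takes a listing of (object name, updated timestamp) pairs plus the watermark saved by the previous sync, and returns the objects to re-read along with the new watermark. It filters a listing it already has; whether PipesHub does the comparison on its own side or Google’s isn’t stated here.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Optional, Tuple


def changed_since(
    listing: Iterable[Tuple[str, datetime]], watermark: Optional[datetime]
) -> Tuple[List[str], Optional[datetime]]:
    """Hypothetical helper: return (names to re-sync, new watermark).

    listing: (object_name, updated) pairs, e.g. taken from a bucket listing.
    watermark: newest 'updated' value saved by the previous sync, or None on the first run.
    """
    changed: List[str] = []
    newest = watermark
    for name, updated in listing:
        # Only objects strictly newer than the saved watermark count as changed.
        if watermark is None or updated > watermark:
            changed.append(name)
        # Track the most recent timestamp seen, to save for the next run.
        if newest is None or updated > newest:
            newest = updated
    return changed, newest


utc = timezone.utc
listing = [
    ("a.pdf", datetime(2024, 6, 1, 9, 0, tzinfo=utc)),
    ("b.pdf", datetime(2024, 6, 1, 12, 0, tzinfo=utc)),
    ("c.pdf", datetime(2024, 6, 2, 8, 30, tzinfo=utc)),
]
first, mark = changed_since(listing, None)  # first run: everything is new
print(first, mark.isoformat())              # ['a.pdf', 'b.pdf', 'c.pdf'] 2024-06-02T08:30:00+00:00
again, _ = changed_since(listing, mark)     # nothing changed since then
print(again)                                # []

listing[1] = ("b.pdf", datetime(2024, 6, 3, 10, 0, tzinfo=utc))  # someone edits b.pdf
edited, mark = changed_since(listing, mark)
print(edited, mark.isoformat())             # ['b.pdf'] 2024-06-03T10:00:00+00:00
```

One design choice worth noting: a strict “newer than” comparison can miss an object whose updated time exactly equals the saved watermark if it was written while the previous sync was running. A more defensive version compares with “at least as new” and skips objects whose timestamp it has already recorded.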
Will it sync Archive and Coldline files?
Yes, but Archive and Coldline are designed for infrequent access — reading them can be slower and, depending on your Google Cloud billing, more expensive. For files you expect people to search often, Standard or Nearline storage is a better fit.
What happens when I delete a file from GCS?
On the next sync, the connector notices it’s gone and removes it from PipesHub too, including from search results.
If you use object versioning in GCS, older versions may still appear until they’re also deleted.
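A timestamp can’t reveal a deletion, because a deleted object simply stops appearing in the listing. One common way to catch deletions, shown here as an assumption rather than a description of PipesHub’s internals, is to compare the object names recorded after the last sync with the names in the current listing. diff_listing is a hypothetical name.

```python
from typing import Iterable, List, Tuple


def diff_listing(
    previously_synced: Iterable[str], current_listing: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: return (added, deleted) object names, each sorted."""
    before, now = set(previously_synced), set(current_listing)
    # In the bucket now but not last time = added; the reverse = deleted.
    return sorted(now - before), sorted(before - now)


synced_last_time = ["reports/q1.pdf", "reports/q2.pdf", "notes.txt"]
listed_now = ["reports/q1.pdf", "notes.txt", "reports/q3.pdf"]
added, deleted = diff_listing(synced_last_time, listed_now)
print("added:", added)      # added: ['reports/q3.pdf']
print("deleted:", deleted)  # deleted: ['reports/q2.pdf']
```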
Does it work with encrypted buckets?
- Google-managed encryption — yes, works by default
- Customer-managed keys (CMEK) — yes, but the service account needs KMS decrypt permission on the key
- Customer-supplied keys (CSEK) — not supported
Can I sync just one folder inside a bucket?
Not directly — filters work at the bucket level, not the folder level. A few workarounds:
- Split content into separate buckets so you can pick the right one
- Use a File Extensions filter if the folder you care about has a distinct file type
- Use a Modified Date filter if the folder you care about has freshly-changed files