# Crawling Manager API
The Crawling Manager API provides endpoints for scheduling, managing, and monitoring data crawling jobs across various connector types, including Google Workspace, OneDrive, SharePoint Online, Slack, and Confluence.

## Base URL

All endpoints are prefixed with `/api/v1/crawlingManager`.
## Authentication

All endpoints require:

- Authentication via the `Authorization` header with a valid JWT token
- Admin privileges - only users with the admin role can access these endpoints
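As a sketch, a client might build its request headers like this. The `Authorization` header is from the docs above; the `Bearer` scheme is an assumption (it is the usual convention for JWTs, but the docs do not state it), and the token value is a placeholder.

```python
def auth_headers(jwt_token: str) -> dict:
    """Build request headers for the Crawling Manager API.

    The Bearer scheme is an assumption; the docs only say the JWT
    goes in the Authorization header.
    """
    return {
        "Authorization": f"Bearer {jwt_token}",
        "Content-Type": "application/json",
    }

headers = auth_headers("eyJhbGciOi...placeholder")
```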
## Supported Connectors

- `gmail` - Gmail connector
- `drive` - Google Drive connector
- `onedrive` - OneDrive connector
- `sharepointonline` - SharePoint Online connector
- `confluence` - Confluence connector
- `slack` - Slack connector
- `linear` - Linear connector
- `dropbox` - Dropbox connector
- `outlook` - Outlook connector
- `jira` - Jira connector
- `atlassian` - Atlassian connector
- `github` - GitHub connector
- `box` - Box connector
- `s3` - Amazon S3 connector
- `azure` - Azure connector
- `airtable` - Airtable connector
- `zendesk` - Zendesk connector
## Job Naming

Scheduled jobs are named `crawl-{connector}-{orgId}`, where the connector is normalized to lowercase with hyphens.
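The naming scheme above can be sketched as a small helper. Lowercasing comes from the docs; replacing spaces and underscores with hyphens is an assumed detail of the normalization, and `org_id` is a hypothetical parameter name.

```python
def job_name(connector: str, org_id: str) -> str:
    """Build a crawl-{connector}-{orgId} job name.

    Lowercasing is documented; hyphenating spaces/underscores is an
    assumption about what "normalized with hyphens" means.
    """
    normalized = connector.strip().lower().replace("_", "-").replace(" ", "-")
    return f"crawl-{normalized}-{org_id}"

print(job_name("Gmail", "org123"))  # → crawl-gmail-org123
```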
## API Endpoints
### POST /:connector/schedule - Schedule Job
Schedule a new crawling job for a specific connector type.

Endpoint: `POST /api/v1/crawlingManager/:connector/schedule`

Parameters:

- `connector` (string, path) - The connector type (gmail, drive, onedrive, sharepointonline, confluence, slack, linear, dropbox, outlook, jira, atlassian, github, box, s3, azure, airtable, zendesk)
- `scheduleType` (required) - The type of schedule
- `isEnabled` (boolean, default: true) - Whether the schedule is enabled
- `timezone` (string, default: "UTC") - Timezone for schedule execution

Schedule Configuration Types: all schedule configurations inherit the base properties above. The supported types are Daily, Weekly, Monthly, Custom (Cron), and Once.
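As an illustration, request bodies for two of the schedule types might look like the following. The base properties (`scheduleType`, `isEnabled`, `timezone`) come from the parameter list above; the type-specific fields (`hour`, `minute`, `cronExpression`) and the `scheduleType` string values are assumptions for illustration only.

```python
import json

# Hypothetical payload for a Daily schedule.
daily_schedule = {
    "scheduleType": "daily",   # assumed value
    "isEnabled": True,
    "timezone": "UTC",
    "hour": 2,                 # assumed field: run at 02:00
    "minute": 0,               # assumed field
}

# Hypothetical payload for a Custom (Cron) schedule.
custom_schedule = {
    "scheduleType": "custom",             # assumed value
    "isEnabled": True,
    "timezone": "UTC",
    "cronExpression": "0 2 * * 1-5",      # assumed field: weekdays, 02:00
}

body = json.dumps(daily_schedule)
```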
### GET /:connector/schedule - Get Job Status
Retrieve the status of a scheduled crawling job for a specific connector.

Endpoint: `GET /api/v1/crawlingManager/:connector/schedule`

Parameters:

- `connector` (string, path) - The connector type
Status: `200 OK`

Job States:

- `waiting` - Job is waiting to be processed
- `active` - Job is currently being processed
- `completed` - Job completed successfully
- `failed` - Job failed with errors
- `delayed` - Job is delayed for future execution
- `paused` - Job is paused
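A client polling this endpoint needs to know when to stop. The six states are from the list above; splitting them into terminal and in-flight states is our reading, not something the docs state explicitly.

```python
# The six documented job states.
ALL_STATES = {"waiting", "active", "completed", "failed", "delayed", "paused"}

# Assumed split: completed/failed are terminal; the rest may still change.
TERMINAL_STATES = {"completed", "failed"}

def is_settled(state: str) -> bool:
    """Return True once a job has reached a terminal state."""
    if state not in ALL_STATES:
        raise ValueError(f"unknown job state: {state}")
    return state in TERMINAL_STATES
```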
### GET /schedule/all - Get All Job Statuses
Retrieve all scheduled crawling jobs for the organization.

Endpoint: `GET /api/v1/crawlingManager/schedule/all`
Status: `200 OK`
### DELETE /:connector/schedule - Remove Job
Remove a scheduled crawling job for a specific connector.

Endpoint: `DELETE /api/v1/crawlingManager/:connector/schedule`

Parameters:

- `connector` (string, path) - The connector type
Status: `200 OK`
### DELETE /schedule/all - Remove All Jobs
Remove all scheduled crawling jobs for the organization.

Endpoint: `DELETE /api/v1/crawlingManager/schedule/all`
Status: `200 OK`
### POST /:connector/pause - Pause Job
Pause a scheduled crawling job for a specific connector.

Endpoint: `POST /api/v1/crawlingManager/:connector/pause`

Parameters:

- `connector` (string, path) - The connector type
Status: `200 OK`
### POST /:connector/resume - Resume Job
Resume a paused crawling job for a specific connector.

Endpoint: `POST /api/v1/crawlingManager/:connector/resume`

Parameters:

- `connector` (string, path) - The connector type
Status: `200 OK`
### GET /stats - Get Queue Statistics
Retrieve statistics about the crawling job queue.

Endpoint: `GET /api/v1/crawlingManager/stats`
Status: `200 OK`

Statistics Fields:

- `waiting` - Number of jobs waiting to be processed
- `active` - Number of jobs currently being processed
- `completed` - Number of completed jobs
- `failed` - Number of failed jobs
- `delayed` - Number of delayed jobs
- `paused` - Number of paused jobs
- `repeatable` - Number of repeatable/scheduled jobs
- `total` - Total number of jobs across all states
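A quick client-side sanity check on a stats response might look like this. The field names are from the list above; whether `repeatable` is counted inside `total` is an assumption, and the example payload is made up for illustration.

```python
def stats_consistent(stats: dict) -> bool:
    """Check that `total` equals the sum of the per-state counts.

    Assumes `repeatable` is included in `total`; adjust if the
    server counts it separately.
    """
    per_state = ("waiting", "active", "completed", "failed",
                 "delayed", "paused", "repeatable")
    return stats["total"] == sum(stats[k] for k in per_state)

# Hypothetical response body for illustration.
example = {"waiting": 2, "active": 1, "completed": 10, "failed": 1,
           "delayed": 0, "paused": 0, "repeatable": 3, "total": 17}
print(stats_consistent(example))  # → True
```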
## Data Types
## System Configuration
The API includes built-in rate limiting and concurrency controls:
- Maximum 5 concurrent jobs per queue
- Jobs are automatically retried with exponential backoff (5000ms initial delay) on failure
- Stalled jobs are detected after 30 seconds
- Maximum 3 retry attempts for failed jobs
- Job history retention: Last 10 completed and 10 failed jobs per connector type
- Jobs are removed and recreated when updating schedules (no job modification)
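The retry behavior above can be made concrete. The 5000 ms initial delay and the 3-attempt cap are from the docs; the doubling factor per attempt is the usual convention for exponential backoff and is an assumption here.

```python
def retry_delays_ms(initial_ms: int = 5000, max_attempts: int = 3) -> list:
    """Delay before each retry attempt under exponential backoff.

    Initial delay (5000 ms) and attempt cap (3) are documented;
    the doubling factor is assumed.
    """
    return [initial_ms * (2 ** i) for i in range(max_attempts)]

print(retry_delays_ms())  # → [5000, 10000, 20000]
```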
## Error Handling

All endpoints follow a consistent error response format:

- `200` - Success
- `201` - Created (for scheduling jobs)
- `400` - Bad Request (validation errors, invalid configuration)
- `401` - Unauthorized (missing or invalid authentication)
- `403` - Forbidden (insufficient privileges, admin required)
- `404` - Not Found (job not found)
- `500` - Internal Server Error
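On the client side, these status codes might be handled with a small helper like the following sketch. The codes and their meanings are from the table above; the exception type and function are hypothetical, not part of the API.

```python
class CrawlingManagerError(Exception):
    """Raised for error responses from the Crawling Manager API."""

# Messages mirror the documented status codes.
ERROR_MESSAGES = {
    400: "Bad Request (validation errors, invalid configuration)",
    401: "Unauthorized (missing or invalid authentication)",
    403: "Forbidden (insufficient privileges, admin required)",
    404: "Not Found (job not found)",
    500: "Internal Server Error",
}

def raise_for_status(status: int) -> None:
    """Pass 200/201 through; raise CrawlingManagerError otherwise."""
    if status in (200, 201):
        return
    raise CrawlingManagerError(ERROR_MESSAGES.get(status, f"HTTP {status}"))
```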