
How do we connect Sourcegraph to GitHub Enterprise, GitLab, and Bitbucket so permissions match what users can actually access?
When you roll out a universal code understanding platform, the first non‑negotiable is: permissions in Sourcegraph must match permissions in your code hosts. If a developer can’t see a repo in GitHub Enterprise, GitLab, or Bitbucket, they shouldn’t discover it through Deep Search, Batch Changes, or any AI-powered workflow in Sourcegraph.
As someone who’s run this in a regulated environment, I’ll walk through how to connect Sourcegraph to GitHub Enterprise, GitLab, and Bitbucket so repo visibility and access line up with what users can actually reach—no more, no less.
How Sourcegraph thinks about permissions
Sourcegraph doesn’t replace your identity or authorization model. It layers on top of it.
At a high level:
- Auth is centralized. Users authenticate via Single Sign-On (SSO) with SAML, OpenID Connect, or OAuth. That means the same identity you use for GitHub Enterprise / GitLab / Bitbucket (often Okta, Azure AD, Ping, etc.) is what you use in Sourcegraph.
- Repo permissions are synchronized. Sourcegraph reads visibility and permissions from each connected code host. A user only sees (and can search) repositories they can access in the underlying host.
- Access is enforced everywhere. Code Search, Deep Search, Code Navigation, Batch Changes, Monitors, Insights—everything respects repo‑level permissions.
- Enterprise controls apply. You get RBAC on top (roles within Sourcegraph), SCIM for automated user lifecycle, and audit logs so security can see who accessed what and when.
The result: one place to search and understand code across GitHub, GitLab, Bitbucket, Perforce, and more—but always within the same access boundaries your security team already trusts.
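The enforcement idea above can be sketched in a few lines. This is a minimal illustration, not Sourcegraph's implementation: `SearchHit`, `filter_hits`, and the repository names are hypothetical, and the sketch assumes the user's accessible-repo set has already been synced from the code hosts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchHit:
    repo: str  # hypothetical repo name, e.g. "github.example.com/payments/api"
    path: str
    line: int


def filter_hits(hits: list[SearchHit], accessible_repos: set[str]) -> list[SearchHit]:
    """Drop every hit whose repository is not in the user's synced access set."""
    return [hit for hit in hits if hit.repo in accessible_repos]


# Hypothetical data: what the index contains vs. what one user may read.
all_hits = [
    SearchHit("github.example.com/payments/api", "auth/token.go", 42),
    SearchHit("gitlab.example.com/security/vault-config", "secrets.tf", 7),
]
alice_repos = {"github.example.com/payments/api"}

visible = filter_hits(all_hits, alice_repos)
```

The point of the design is that the filter sits below every feature, so search, navigation, and automation all inherit the same boundary instead of each re-implementing it.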
Step 1: Decide your identity & SSO integration
Before you connect code hosts, lock in how users log in to Sourcegraph. This is what keeps identities and permissions aligned.
Recommended pattern
Use your existing IdP (Okta, Azure AD, Google Workspace, etc.):
- Configure SSO with:
- SAML or OpenID Connect as the primary choice for enterprise
- OAuth if you’re aligning tightly to a specific provider like GitHub
- Enable SCIM user provisioning so:
- New employees get Sourcegraph accounts automatically
- Departing users are deprovisioned and lose access
- Use RBAC in Sourcegraph to:
- Restrict admin capabilities to a small group
- Separate “operator” roles (who manage repos and Batch Changes) from normal developers
This ensures Sourcegraph identities track your org chart and security model, not a separate local user database.
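The net effect of automated provisioning can be pictured as a reconciliation between the IdP's active users and existing accounts. SCIM itself is a push protocol in which the IdP sends create, update, and deactivate requests, so the function below is only a hedged sketch of the outcome; `reconcile_accounts` and the email addresses are made up for illustration.

```python
def reconcile_accounts(idp_active: set[str], existing: set[str]) -> tuple[set[str], set[str]]:
    """Return (accounts to create, accounts to deactivate).

    idp_active: users currently active in the identity provider.
    existing:   users that currently have an account in the tool.
    """
    to_create = idp_active - existing
    to_deactivate = existing - idp_active
    return to_create, to_deactivate


# Hypothetical example: one new hire, one departure.
create, deactivate = reconcile_accounts(
    idp_active={"ana@example.com", "bo@example.com"},
    existing={"bo@example.com", "cy@example.com"},
)
```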
Step 2: Connect GitHub Enterprise with permission sync
GitHub Enterprise is often the largest source of code sprawl. The key is to treat GitHub as the source of truth for repository permissions and let Sourcegraph mirror that.
How Sourcegraph connects to GitHub Enterprise
At a conceptual level, you:
- Create a machine identity in GitHub:
- Use a GitHub App or a personal access token (PAT) with:
- repo access (for private repositories)
- read:org (if you’ll filter by organizations or teams)
- Scope it only to the organizations and repos you intend to index.
- Add GitHub Enterprise as a code host in Sourcegraph:
- Point Sourcegraph to your GitHub Enterprise Server URL or GitHub Enterprise Cloud org
- Provide the app credentials or token
- Configure how frequently Sourcegraph:
- Syncs repositories
- Refreshes permissions
- Configure repository selection:
- Include repos by org, name patterns, or explicit lists
- Optionally exclude:
- Archived repos
- Specific orgs or projects that are out of scope
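The include and exclude rules above amount to a small selection function. The sketch below is an assumption-laden illustration of that logic, not Sourcegraph's configuration schema: `Repo`, `select_repos`, and all repository names are hypothetical, and glob-style patterns stand in for whatever pattern syntax your setup uses.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Repo:
    full_name: str  # "org/name"
    archived: bool = False


def select_repos(repos, include_orgs, include_patterns, explicit, exclude_orgs, skip_archived=True):
    """Return the sorted names of repos that match an include rule and no exclude rule."""
    selected = []
    for repo in repos:
        org = repo.full_name.split("/", 1)[0]
        # Exclusions win over every include rule.
        if org in exclude_orgs or (skip_archived and repo.archived):
            continue
        included = (
            org in include_orgs
            or repo.full_name in explicit
            or any(fnmatchcase(repo.full_name, pattern) for pattern in include_patterns)
        )
        if included:
            selected.append(repo.full_name)
    return sorted(selected)


# Hypothetical inventory.
inventory = [
    Repo("platform/api"),
    Repo("platform/old-monolith", archived=True),
    Repo("tools/deploy-scripts"),
    Repo("tools/scratch"),
    Repo("sandbox/experiments"),
    Repo("data/etl"),
]

to_sync = select_repos(
    inventory,
    include_orgs={"platform", "sandbox"},
    include_patterns=["tools/deploy-*"],
    explicit={"data/etl"},
    exclude_orgs={"sandbox"},
)
```

Letting exclusions take precedence is a deliberate choice in this sketch: an out-of-scope org stays out even if a broad include rule would otherwise match it.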
How permissions stay in sync
GitHub’s permission model is the backbone:
- Private repo access is derived from:
- Org membership
- Team membership
- Direct repo collaborators
- Sourcegraph queries GitHub to figure out:
- Which repos exist
- Which users (or teams) can see which repos
- When a user logs into Sourcegraph:
- Sourcegraph ties their identity (via SSO / OAuth) to their GitHub account
- Only repositories they can access in GitHub are surfaced in:
- Code Search / Deep Search
- Repo lists
- Batch Changes targets
- Insights and Monitors
Key outcome: an engineer who only has GitHub access will see in Sourcegraph exactly the synced repositories they can see in GitHub Enterprise, now searchable across the entire org, down to symbols and patterns.
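As a mental model, a user's private-repo access is the union of the three sources listed above. The sketch below is a simplification under stated assumptions: in practice the answer comes from the GitHub API rather than being recomputed, and the function name, the "base repos" notion for org members, and all data are hypothetical.

```python
def github_accessible_repos(user, org_members, org_base_repos, team_members, team_repos, collaborators):
    """Union of repos reachable via org membership, team membership, and direct grants."""
    repos = set()
    # Org membership: repos that every member of the org can read.
    for org, members in org_members.items():
        if user in members:
            repos |= org_base_repos.get(org, set())
    # Team membership: repos granted to teams the user belongs to.
    for team, members in team_members.items():
        if user in members:
            repos |= team_repos.get(team, set())
    # Direct collaborators: repos granted to the user individually.
    for repo, users in collaborators.items():
        if user in users:
            repos.add(repo)
    return repos


# Hypothetical organization data.
org_members = {"acme": {"sam", "lee"}}
org_base_repos = {"acme": {"acme/handbook"}}
team_members = {"acme/payments": {"sam"}}
team_repos = {"acme/payments": {"acme/billing-api", "acme/ledger"}}
collaborators = {"acme/partner-sdk": {"lee"}}

sam_repos = github_accessible_repos("sam", org_members, org_base_repos, team_members, team_repos, collaborators)
lee_repos = github_accessible_repos("lee", org_members, org_base_repos, team_members, team_repos, collaborators)
```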
Step 3: Connect GitLab (self-managed or SaaS) with matching access
For teams with GitLab (often for regulated or self-hosted projects), the model is similar but mapped to GitLab concepts.
How Sourcegraph connects to GitLab
- Create a GitLab token or application:
- Use a personal access token, group-level access token, or OAuth application
- Ensure it has:
- api scope (for project and membership lookup)
- read_repository for cloning and indexing
- Add GitLab as a code host in Sourcegraph:
- For GitLab SaaS, use https://gitlab.com
- For self-managed GitLab, use your internal URL
- Provide the token or OAuth client credentials
- Define project selection rules:
- Include by:
- Groups / subgroups
- Project name patterns
- Exclude confidential or out-of-scope projects if needed
How permissions stay in sync
GitLab’s access model centers around:
- Membership in groups / subgroups
- Project‑level roles (Guest/Reporter/Developer/Maintainer/Owner)
- Public vs internal vs private projects
Sourcegraph:
- Syncs the list of projects and their visibility from GitLab
- Maps users’ Sourcegraph identities to their GitLab accounts
- Enforces that:
- Users only see search results from projects they can access in GitLab
- Project-level visibility is reflected in all Sourcegraph features
That means, for example, a contractor with access to a single GitLab group won’t suddenly see internal migration repos or security-sensitive projects when using Deep Search.
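The contractor example can be made concrete with a simplified model of GitLab's rules. The numeric access levels match GitLab's role values, but everything else is an assumption of this sketch: the helper names, the project paths, and in particular the Reporter threshold for reading code in non-public projects, which you should check against GitLab's permission table for your version.

```python
from enum import IntEnum
from typing import Optional


class AccessLevel(IntEnum):
    GUEST = 10
    REPORTER = 20
    DEVELOPER = 30
    MAINTAINER = 40
    OWNER = 50


def effective_level(project_path: str, memberships: dict[str, AccessLevel]) -> Optional[AccessLevel]:
    """Highest level granted on the project or any parent group or subgroup."""
    parts = project_path.split("/")
    levels = []
    for i in range(1, len(parts) + 1):
        prefix = "/".join(parts[:i])
        if prefix in memberships:
            levels.append(memberships[prefix])
    return max(levels, default=None)


def can_read_code(visibility: str, level: Optional[AccessLevel], is_external: bool = False) -> bool:
    """Simplified visibility check; the Reporter threshold is an assumption."""
    is_member = level is not None and level >= AccessLevel.REPORTER
    if visibility == "public":
        return True
    if visibility == "internal":
        return is_member or not is_external
    return is_member  # private


# Hypothetical contractor: external user, Developer on a single group.
contractor = {"vendor-integrations": AccessLevel.DEVELOPER}

sdk = can_read_code("private", effective_level("vendor-integrations/sdk/client", contractor), is_external=True)
sec = can_read_code("private", effective_level("security/incident-tools", contractor), is_external=True)
docs = can_read_code("internal", effective_level("platform/docs", contractor), is_external=True)
```

Here the contractor reaches the nested `vendor-integrations/sdk/client` project through group inheritance, while the private security project and the internal docs project stay out of view.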
Step 4: Connect Bitbucket (Server/Data Center or Cloud) with correct bounds
Bitbucket is still common in enterprises, especially in older or regulated environments. It’s usually where a lot of “legacy that can’t break” lives—exactly the code you don’t want leaking across permission boundaries.
How Sourcegraph connects to Bitbucket
- Set up credentials in Bitbucket:
- For Bitbucket Server/Data Center:
- Create a service account with:
- Read access to relevant projects and repos
- API permissions for project/repo listing
- For Bitbucket Cloud:
- Use an app password or OAuth consumer with:
- Repository read
- Workspace membership read
- Add Bitbucket as a code host in Sourcegraph:
- Provide your Bitbucket base URL (Server/DC) or cloud workspace
- Configure authentication (username + app password, or OAuth)
- Decide which projects/repos to sync:
- Include specific projects (e.g., ENG, LEGACY, SEC)
- Exclude sensitive or external-facing projects if you don’t want them in Sourcegraph at all
How permissions stay in sync
Bitbucket:
- Uses projects and groups plus user accounts to control repo access
- May also have fine-grained permissions at the repo level
Sourcegraph:
- Pulls project and repo lists from Bitbucket
- Queries Bitbucket to determine which users can see which repos
- Matches Sourcegraph users to Bitbucket identities via SSO mapping
- Ensures search, navigation, and automation are limited to those repos
That means developers with access only to a subset of Bitbucket projects will see that same subset in Sourcegraph, with no ability to search across the rest.
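To see how project-level and repo-level grants combine, consider this rough sketch. It assumes grants are expressed as `user:` or `group:` principals, which is a convention invented here for illustration; the function name and all project, repo, and group names are likewise hypothetical rather than Bitbucket's API.

```python
def bitbucket_visible_repos(user, user_groups, repo_project, project_grants, repo_grants):
    """Repos the user can read via a project-level or repo-level grant, directly or through a group."""
    principals = {f"user:{user}"} | {f"group:{g}" for g in user_groups.get(user, set())}
    visible = set()
    for repo, project in repo_project.items():
        # A repo inherits its project's grants and may add its own.
        granted = project_grants.get(project, set()) | repo_grants.get(repo, set())
        if principals & granted:
            visible.add(repo)
    return visible


# Hypothetical instance data.
user_groups = {"dana": {"legacy-maintainers"}}
repo_project = {
    "LEGACY/billing": "LEGACY",
    "LEGACY/mainframe-bridge": "LEGACY",
    "SEC/key-rotation": "SEC",
    "ENG/build-tools": "ENG",
    "ENG/web": "ENG",
}
project_grants = {"LEGACY": {"group:legacy-maintainers"}, "SEC": {"group:security"}}
repo_grants = {"ENG/build-tools": {"user:dana"}}

dana_repos = bitbucket_visible_repos("dana", user_groups, repo_project, project_grants, repo_grants)
```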
Step 5: Map identities correctly across all hosts
The tricky part in multi‑host environments isn’t just connecting GitHub, GitLab, and Bitbucket—it’s making sure the same human maps to the right accounts everywhere.
Common patterns that work well
- Email‑based linking
- Users log in via SSO with their corporate email.
- That email is also the primary email on their GitHub / GitLab / Bitbucket accounts.
- Sourcegraph associates identities based on this shared email.
- Username/handle mapping
- In some setups, usernames are standardized (e.g., jdoe in both IdP and GitHub).
- Sourcegraph can use that for mapping when emails differ or are private.
- Manual or scripted mapping for edge cases
- For contractors or external collaborators, you might:
- Create explicit mappings in Sourcegraph
- Or ensure their corporate IdP and code host identities line up before onboarding them to Sourcegraph
The goal: one Sourcegraph user identity that can be tied back to the same person in each code host so permission sync stays clean.
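The three patterns above can be combined into one lookup with a clear precedence order. This is a hedged sketch of the mapping rule you might document internally, not how Sourcegraph links accounts; `HostAccount`, `link_account`, and the precedence (manual override, then email, then username) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HostAccount:
    login: str
    email: Optional[str]  # None when the code host keeps the email private


def link_account(sso_email, sso_username, host_accounts, manual_overrides):
    """Return the code host login for an SSO identity, or None if nothing matches."""
    # 1. Explicit mappings for edge cases (contractors, external collaborators).
    if sso_email in manual_overrides:
        return manual_overrides[sso_email]
    # 2. Shared corporate email.
    for account in host_accounts:
        if account.email and account.email.lower() == sso_email.lower():
            return account.login
    # 3. Standardized username, used when emails differ or are private.
    for account in host_accounts:
        if account.login.lower() == sso_username.lower():
            return account.login
    return None


# Hypothetical code host accounts.
accounts = [
    HostAccount("jdoe", None),
    HostAccount("asmith-gh", "a.smith@example.com"),
    HostAccount("ext-riley", "riley@vendor.example"),
]
overrides = {"riley.contractor@example.com": "ext-riley"}

by_email = link_account("A.Smith@example.com", "asmith", accounts, overrides)
by_username = link_account("john.doe@example.com", "jdoe", accounts, overrides)
by_override = link_account("riley.contractor@example.com", "rcontractor", accounts, overrides)
unlinked = link_account("new.person@example.com", "nperson", accounts, overrides)
```

Returning `None` for an unmatched identity, rather than guessing, keeps an unlinked user at zero private-repo access until someone fixes the mapping.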
Step 6: Layer RBAC, audit, and governance on top
Once the base permission model is correct, you can safely unlock Sourcegraph’s higher‑leverage workflows without expanding blast radius.
Use RBAC to scope power features
Even if repo access is correct, you’ll want guardrails around high-impact actions:
- Admins / Operators
- Can configure code hosts
- Can run and manage Batch Changes across many repos
- Can define global Monitors and Insights
- Developers
- Can search and navigate code they’re authorized to see
- May be allowed to run Batch Changes in a narrow scope (e.g., only in certain projects)
- Read‑only / Stakeholders
- Can browse and search within permission boundaries
- Typically can’t author Batch Changes or edit configuration
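The role split above can be expressed as a capability table plus a scope check for Batch Changes. The role names, capability strings, and prefix-based scoping below are illustrative assumptions, not Sourcegraph's built-in RBAC model.

```python
# Hypothetical role-to-capability table mirroring the three tiers above.
ROLE_CAPABILITIES = {
    "operator": {"search", "configure_code_hosts", "batch_changes", "global_monitors"},
    "developer": {"search", "batch_changes"},
    "stakeholder": {"search"},
}


def is_allowed(role: str, action: str) -> bool:
    """Unknown roles get no capabilities."""
    return action in ROLE_CAPABILITIES.get(role, set())


def can_run_batch_change(role: str, target_repos, scope_prefixes=()) -> bool:
    """Operators may target any repo; other roles only repos inside their scope."""
    if not is_allowed(role, "batch_changes"):
        return False
    if role == "operator":
        return True
    prefixes = tuple(scope_prefixes)
    return all(repo.startswith(prefixes) for repo in target_repos)


in_scope = can_run_batch_change("developer", ["ENG/web", "ENG/api"], ["ENG/"])
out_of_scope = can_run_batch_change("developer", ["ENG/web", "SEC/keys"], ["ENG/"])
```

This check sits on top of repo permissions: a developer still needs read access to every target repo, and the role only decides whether the action itself is permitted.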
Audit logs and compliance
Sourcegraph provides audit logs for security teams:
- Track admin changes (code host configuration, RBAC updates)
- Track key user activities, especially actions that could impact many repos
- Align with your compliance posture:
- SOC 2 Type II
- GDPR/CCPA alignment
- Enterprise authentication (SAML/OIDC/OAuth), SCIM, RBAC
In heavily regulated environments, this is what makes security comfortable with exposing search and automation across 100 or 1M repositories.
How this plays with AI and agentic search
If you’re bringing AI agents into the mix—via Deep Search, Sourcegraph MCP, or your own agents—the permission model matters even more.
Sourcegraph’s posture:
- Agents respect the same permissions as humans.
- Agents use Sourcegraph as their “eyes” on the codebase.
- They only see repositories the underlying user/token can see.
- Zero data retention for LLM inference.
- Your prompt and inference data isn’t stored beyond what’s required.
- It’s not shared with third parties.
- Every answer is grounded in code.
- Deep Search and AI workflows point back to the specific files and lines used.
- That makes review and governance possible, especially when agents propose large changes.
This is why getting permissions right at the GitHub/GitLab/Bitbucket layer is critical: it becomes the hard boundary for everything humans and agents do in Sourcegraph.
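Because answers point back to specific files and lines, a reviewer can check an agent's citations against the calling user's access set. The sketch below shows one such governance check under stated assumptions: the `Citation` shape, the helper names, and the data are hypothetical and do not reflect the actual response format of Deep Search or Sourcegraph MCP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    repo: str
    path: str
    line: int


def out_of_bounds(citations, accessible_repos):
    """Citations that reference a repo the calling user cannot access."""
    return [c for c in citations if c.repo not in accessible_repos]


def render(citation: Citation) -> str:
    """Human-readable pointer a reviewer can follow back to the code."""
    return f"{citation.repo}/{citation.path}#L{citation.line}"


# Hypothetical agent answer and the access set of the user it acted for.
answer_citations = [
    Citation("acme/billing-api", "internal/auth/session.go", 88),
    Citation("acme/ledger", "cmd/migrate/main.go", 14),
]
user_repos = {"acme/billing-api", "acme/ledger", "acme/handbook"}

violations = out_of_bounds(answer_citations, user_repos)
pointers = [render(c) for c in answer_citations]
```

An empty `violations` list is the expected result when enforcement works; a non-empty one is a signal worth alerting on.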
Practical rollout tips from the field
A few patterns that have worked well when I’ve rolled this out:
- Start with read‑only, limited scope.
- Begin with one code host (often GitHub Enterprise), a subset of orgs/projects, and SSO wired up.
- Validate that a sample of users see exactly what they expect—no more, no less.
- Bring security and compliance in early.
- Demo how permissions mirror code hosts.
- Show SSO, RBAC, audit logs, and zero data retention posture.
- Gradually add other hosts.
- Add GitLab, then Bitbucket, validating permission behavior at each step.
- Use Insights/Monitors incrementally, then introduce Batch Changes once trust is established.
- Document identity mapping rules.
- Make it clear how a Sourcegraph user links to GitHub/GitLab/Bitbucket accounts.
- Keep this in your internal “developer tooling” runbook so new teams know the expectations.
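The "no more, no less" validation can be scripted as a diff between what each sample user can access on the code host and what they see in Sourcegraph. A minimal sketch, assuming you have already collected both sets by whatever means you trust; `permission_drift` and the sample data are hypothetical.

```python
def permission_drift(expected, observed):
    """Compare per-user repo sets and report only the users whose sets differ.

    expected: user -> repos the user can access on the code host.
    observed: user -> repos the user can see in Sourcegraph.
    """
    report = {}
    for user in sorted(expected.keys() | observed.keys()):
        want = expected.get(user, set())
        got = observed.get(user, set())
        extra = got - want    # visible in Sourcegraph but not on the host: a leak
        missing = want - got  # accessible on the host but not visible: a sync gap
        if extra or missing:
            report[user] = {"extra": sorted(extra), "missing": sorted(missing)}
    return report


# Hypothetical sample of three users.
expected = {
    "sam": {"acme/billing-api", "acme/ledger"},
    "lee": {"acme/handbook"},
    "dana": {"LEGACY/billing"},
}
observed = {
    "sam": {"acme/billing-api", "acme/ledger"},
    "lee": {"acme/handbook", "acme/ledger"},
    "dana": set(),
}

drift = permission_drift(expected, observed)
```

Treat any `extra` entry as a blocker for rollout; `missing` entries usually point to a sync delay or an unlinked identity.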
What you get once everything is wired up
When Sourcegraph is correctly connected to GitHub Enterprise, GitLab, and Bitbucket with aligned permissions:
- Developers and agents get lightning-fast search at enterprise scale across all their code—whether that’s 100 or 1M repositories.
- Security teams know:
- Access matches existing code host permissions.
- AI agents don’t see more than the humans they act for.
- There’s zero data retention for inference and clear audit trails.
- Platform and migration teams can:
- Use Batch Changes for controlled, multi‑repo edits.
- Use Monitors to catch insecure patterns and bad practices.
- Use Insights to understand how code is changing across repositories and hosts.
You turn scattered GitHub, GitLab, and Bitbucket repos into a single, governed surface for code understanding—without breaking the permission model that keeps your organization safe.