How do we connect Sourcegraph to GitHub Enterprise, GitLab, and Bitbucket so permissions match what users can actually access?

When you roll out a universal code understanding platform, the first non‑negotiable is: permissions in Sourcegraph must match permissions in your code hosts. If a developer can’t see a repo in GitHub Enterprise, GitLab, or Bitbucket, they shouldn’t discover it through Deep Search, Batch Changes, or any AI-powered workflow in Sourcegraph.

As someone who’s run this in a regulated environment, I’ll walk through how to connect Sourcegraph to GitHub Enterprise, GitLab, and Bitbucket so repo visibility and access line up with what users can actually reach—no more, no less.


How Sourcegraph thinks about permissions

Sourcegraph doesn’t replace your identity or authorization model. It layers on top of it.

At a high level:

  • Auth is centralized. Users authenticate via Single Sign-On (SSO) with SAML, OpenID Connect, or OAuth. That means the same identity you use for GitHub Enterprise / GitLab / Bitbucket (often Okta, Azure AD, Ping, etc.) is what you use in Sourcegraph.
  • Repo permissions are synchronized. Sourcegraph reads visibility and permissions from each connected code host. A user only sees (and can search) repositories they can access in the underlying host.
  • Access is enforced everywhere. Code Search, Deep Search, Code Navigation, Batch Changes, Monitors, Insights—everything respects repo‑level permissions.
  • Enterprise controls apply. You get RBAC on top (roles within Sourcegraph), SCIM for automated user lifecycle, and audit logs so security can see who accessed what and when.

The result: one place to search and understand code across GitHub, GitLab, Bitbucket, Perforce, and more—but always within the same access boundaries your security team already trusts.
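
To make the "synchronized, then enforced everywhere" idea concrete, here is a minimal sketch of filtering search hits against a user’s synced repository set. The `SearchHit` type, the `filter_hits` helper, and the repository names are hypothetical and only illustrate the principle; they are not Sourcegraph’s internal implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchHit:
    repo: str  # hypothetical repo identifier, e.g. "github.example.com/payments/api"
    path: str
    line: int


def filter_hits(hits: list[SearchHit], accessible_repos: set[str]) -> list[SearchHit]:
    """Keep only hits from repositories the user can access on the code host."""
    return [hit for hit in hits if hit.repo in accessible_repos]


hits = [
    SearchHit("github.example.com/payments/api", "auth.go", 42),
    SearchHit("gitlab.example.com/security/vault-config", "secrets.yaml", 7),
]
# Hypothetical synced permissions: this user can only reach the payments repo.
alice_repos = {"github.example.com/payments/api"}
visible = filter_hits(hits, alice_repos)
```

The point of the design is that the filter runs on every result path, so a feature cannot widen access simply by forgetting to check.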


Step 1: Decide your identity & SSO integration

Before you connect code hosts, lock in how users log in to Sourcegraph. This is what keeps identities and permissions aligned.

Recommended pattern

Use your existing IdP (Okta, Azure AD, Google Workspace, etc.):

  • Configure SSO with:
    • SAML or OpenID Connect as the primary choice for enterprise
    • OAuth if you’re aligning tightly to a specific provider like GitHub
  • Enable SCIM user provisioning so:
    • New employees get Sourcegraph accounts automatically
    • Departing users are deprovisioned and lose access
  • Use RBAC in Sourcegraph to:
    • Restrict admin capabilities to a small group
    • Separate “operator” roles (who manage repos and Batch Changes) from normal developers

This ensures Sourcegraph identities track your org chart and security model, not a separate local user database.
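
For the SCIM part of that pattern, the request bodies an IdP sends are defined by the SCIM 2.0 standard (RFC 7643 and RFC 7644). The sketch below builds a provisioning body and a deactivation body. The schema URNs come from the RFCs; the helper names are hypothetical, and which operations your Sourcegraph version accepts is an assumption to confirm against its documentation.

```python
import json

# Schema URNs defined by the SCIM 2.0 RFCs (7643 and 7644).
SCIM_USER = "urn:ietf:params:scim:schemas:core:2.0:User"
SCIM_PATCH = "urn:ietf:params:scim:api:messages:2.0:PatchOp"


def scim_create_user(user_name: str, email: str) -> dict:
    """Body an IdP would POST to a SCIM /Users endpoint to provision an account."""
    return {
        "schemas": [SCIM_USER],
        "userName": user_name,
        "emails": [{"value": email, "primary": True}],
        "active": True,
    }


def scim_deactivate_user() -> dict:
    """Body an IdP would PATCH to /Users/{id} when someone leaves."""
    return {
        "schemas": [SCIM_PATCH],
        "Operations": [{"op": "replace", "path": "active", "value": False}],
    }


print(json.dumps(scim_deactivate_user(), indent=2))
```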


Step 2: Connect GitHub Enterprise with permission sync

GitHub Enterprise is often the largest source of code sprawl. The key is to treat GitHub as the source of truth for repository permissions and let Sourcegraph mirror that.

How Sourcegraph connects to GitHub Enterprise

At a conceptual level, you:

  1. Create a machine identity in GitHub:
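    • Use a GitHub App or a personal access token (PAT). For a classic PAT, grant these scopes:
      • repo (for private repositories)
      • read:org (if you’ll filter by organizations or teams)
    • A GitHub App uses read permissions on the app itself rather than token scopes.
    • Scope the credential only to the organizations and repos you intend to index.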
  2. Add GitHub Enterprise as a code host in Sourcegraph:
    • Point Sourcegraph to your GitHub Enterprise Server URL or GitHub Enterprise Cloud org
    • Provide the app credentials or token
    • Configure how frequently Sourcegraph:
      • Syncs repositories
      • Refreshes permissions
  3. Configure repository selection:
    • Include repos by org, name patterns, or explicit lists
    • Optionally exclude:
      • Archived repos
      • Specific orgs or projects that are out of scope
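
The include and exclude rules from step 3 can be sketched as a small selection function. This is an illustration of the logic only: the `Repo` type, `select_repos`, and the glob-style patterns are assumptions, not the syntax of the Sourcegraph code host configuration.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Repo:
    full_name: str  # "org/name"
    archived: bool = False


def select_repos(repos, include_patterns, exclude_patterns=(), skip_archived=True):
    """Return names of repos matching an include pattern and no exclude rule."""
    selected = []
    for repo in repos:
        if skip_archived and repo.archived:
            continue
        # Exclusions win over inclusions, so out-of-scope repos never slip in.
        if any(fnmatchcase(repo.full_name, p) for p in exclude_patterns):
            continue
        if any(fnmatchcase(repo.full_name, p) for p in include_patterns):
            selected.append(repo.full_name)
    return selected


repos = [
    Repo("payments/api"),
    Repo("payments/old-gateway", archived=True),
    Repo("sandbox/demo"),
    Repo("platform/infra"),
]
in_scope = select_repos(repos, ["payments/*", "platform/*"], ["*/infra"])
```

Letting exclusions take precedence is a deliberate choice here: a broad include pattern added later cannot silently pull in something that was explicitly ruled out.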

How permissions stay in sync

GitHub’s permission model is the backbone:

  • Private repo access is derived from:
    • Org membership
    • Team membership
    • Direct repo collaborators
  • Sourcegraph queries GitHub to figure out:
    • Which repos exist
    • Which users (or teams) can see which repos
  • When a user logs into Sourcegraph:
    • Sourcegraph ties their identity (via SSO / OAuth) to their GitHub account
    • Only repositories they can access in GitHub are surfaced in:
      • Code Search / Deep Search
      • Repo lists
      • Batch Changes targets
      • Insights and Monitors

Key outcome: A GitHub-only engineer logging into Sourcegraph will see exactly what they see in GitHub Enterprise, now searchable across every repository they can access, down to symbols and patterns.
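
As a rough model of how those three sources combine, the sketch below computes a user’s visible repositories as the union of org-wide access, team grants, and direct collaborator grants. The `GitHubOrg` structure and its field names are hypothetical simplifications; GitHub’s real model has more detail (base permissions, outside collaborators, nested teams).

```python
from dataclasses import dataclass, field


@dataclass
class GitHubOrg:
    members: set[str] = field(default_factory=set)
    # Repos every org member can read (hypothetical stand-in for base permissions).
    member_readable: set[str] = field(default_factory=set)
    team_members: dict[str, set[str]] = field(default_factory=dict)
    team_repos: dict[str, set[str]] = field(default_factory=dict)
    collaborators: dict[str, set[str]] = field(default_factory=dict)  # repo -> users


def visible_repos(org: GitHubOrg, user: str) -> set[str]:
    """Union of org-wide, team-based, and direct collaborator access."""
    repos: set[str] = set()
    if user in org.members:
        repos |= org.member_readable
    for team, members in org.team_members.items():
        if user in members:
            repos |= org.team_repos.get(team, set())
    for repo, users in org.collaborators.items():
        if user in users:
            repos.add(repo)
    return repos


org = GitHubOrg(
    members={"alice", "bob"},
    member_readable={"acme/handbook"},
    team_members={"payments": {"alice"}},
    team_repos={"payments": {"acme/billing"}},
    collaborators={"acme/partner-sdk": {"carol"}},
)
```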


Step 3: Connect GitLab (self-managed or SaaS) with matching access

For teams with GitLab (often for regulated or self-hosted projects), the model is similar but mapped to GitLab concepts.

How Sourcegraph connects to GitLab

  1. Create a GitLab token or application:
    • Use a personal access token, group-level access token, or OAuth application
    • Ensure it has:
      • api scope (for project and membership lookup)
      • read_repository for cloning and indexing
  2. Add GitLab as a code host in Sourcegraph:
    • For GitLab SaaS, use https://gitlab.com
    • For self-managed GitLab, use your internal URL
    • Provide the token or OAuth client credentials
  3. Define project selection rules:
    • Include by:
      • Groups / subgroups
      • Project name patterns
    • Exclude sensitive or out-of-scope projects if needed

How permissions stay in sync

GitLab’s access model centers around:

  • Membership in groups / subgroups
  • Project‑level roles (Guest/Reporter/Developer/Maintainer/Owner)
  • Public vs internal vs private projects

Sourcegraph:

  • Syncs the list of projects and their visibility from GitLab
  • Maps users’ Sourcegraph identities to their GitLab accounts
  • Enforces that:
    • Users only see search results from projects they can access in GitLab
    • Project-level visibility is reflected in all Sourcegraph features

That means, for example, a contractor with access to a single GitLab group won’t suddenly see internal migration repos or security-sensitive projects when using Deep Search.
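
A simplified sketch of how visibility and role interact when deciding whether someone can read a project’s code. The numeric access levels are the ones GitLab uses for member roles; the rule that Guests cannot read code in private projects reflects my reading of GitLab’s permission table and should be verified against your GitLab version. The `can_read_code` helper is hypothetical and ignores details such as external users and custom roles.

```python
from enum import IntEnum


class AccessLevel(IntEnum):
    # Numeric values GitLab uses for member roles.
    NO_ACCESS = 0
    GUEST = 10
    REPORTER = 20
    DEVELOPER = 30
    MAINTAINER = 40
    OWNER = 50


def can_read_code(visibility: str, level: AccessLevel, signed_in: bool = True) -> bool:
    """Decide code visibility from project visibility plus the user's role."""
    if visibility == "public":
        return True
    if visibility == "internal":
        # Simplification: ignores GitLab's "external user" flag.
        return signed_in
    # Private project: assumes Guest cannot read code, Reporter and above can.
    return level >= AccessLevel.REPORTER
```

For example, `can_read_code("private", AccessLevel.GUEST)` is `False` under this model, while the same Guest can read an internal project as long as they are signed in.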


Step 4: Connect Bitbucket (Server/Data Center or Cloud) with correct bounds

Bitbucket is still common in enterprises, especially in older or regulated environments. It’s usually where a lot of “legacy that can’t break” lives—exactly the code you don’t want leaking across permission boundaries.

How Sourcegraph connects to Bitbucket

  1. Set up credentials in Bitbucket:
    • For Bitbucket Server/Data Center:
      • Create a service account with:
        • Read access to relevant projects and repos
        • API permissions for project/repo listing
    • For Bitbucket Cloud:
      • Use an app password or OAuth consumer with:
        • Repository read
        • Workspace membership read
  2. Add Bitbucket as a code host in Sourcegraph:
    • Provide your Bitbucket base URL (Server/DC) or cloud workspace
    • Configure authentication (username + app password, or OAuth)
  3. Decide which projects/repos to sync:
    • Include specific projects (e.g., ENG, LEGACY, SEC)
    • Exclude sensitive or external-facing projects if you don’t want them in Sourcegraph at all

How permissions stay in sync

Bitbucket:

  • Uses projects and groups plus user accounts to control repo access
  • May also have fine-grained permissions at the repo level

Sourcegraph:

  • Pulls project and repo lists from Bitbucket
  • Queries Bitbucket to determine which users can see which repos
  • Matches Sourcegraph users to Bitbucket identities via SSO mapping
  • Ensures search, navigation, and automation are limited to those repos

That means developers with access only to a subset of Bitbucket projects will see that same subset in Sourcegraph, with no ability to search across the rest.
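
The interplay of project-level and repo-level grants can be sketched as follows. Everything here is illustrative: `readable_repos`, the `group:` prefix convention, and the sample data are assumptions, and real Bitbucket permissions also distinguish read, write, and admin levels.

```python
def readable_repos(user, user_groups, project_grants, repo_grants, project_repos):
    """Repos a user can read via project-level or repo-level grants.

    Grants map a project key or repo name to the set of principals
    (user names or "group:<name>" entries) with at least read access.
    """
    principals = {user} | set(user_groups)
    readable = set()
    for project, grantees in project_grants.items():
        if principals & grantees:
            readable |= project_repos.get(project, set())
    for repo, grantees in repo_grants.items():
        if principals & grantees:
            readable.add(repo)
    return readable


project_repos = {
    "ENG": {"ENG/web", "ENG/api"},
    "LEGACY": {"LEGACY/mainframe-bridge"},
    "SEC": {"SEC/keys"},
}
project_grants = {"ENG": {"group:engineers"}, "SEC": {"group:security"}}
repo_grants = {"LEGACY/mainframe-bridge": {"dana"}}

dana = readable_repos("dana", {"group:engineers"}, project_grants, repo_grants, project_repos)
```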


Step 5: Map identities correctly across all hosts

The tricky part in multi‑host environments isn’t just connecting GitHub, GitLab, and Bitbucket—it’s making sure the same human maps to the right accounts everywhere.

Common patterns that work well

  • Email‑based linking
    • Users log in via SSO with their corporate email.
    • That email is also the primary email on their GitHub / GitLab / Bitbucket accounts.
    • Sourcegraph associates identities based on this shared email.
  • Username/handle mapping
    • In some setups, usernames are standardized (e.g., jdoe in both IdP and GitHub).
    • Sourcegraph can use that for mapping when emails differ or are private.
  • Manual or scripted mapping for edge cases
    • For contractors or external collaborators, you might:
      • Create explicit mappings in Sourcegraph
      • Or ensure their corporate IdP and code host identities line up before onboarding them to Sourcegraph

The goal: one Sourcegraph user identity that can be tied back to the same person in each code host so permission sync stays clean.
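
The three patterns above can be expressed as an ordered fallback: explicit mapping first, then email, then username. This is a sketch of the decision order only. `HostAccount` and `link_account` are hypothetical names, and how Sourcegraph links accounts in your deployment depends on the auth provider you configure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HostAccount:
    login: str
    email: Optional[str]  # None when the code host keeps the email private


def link_account(sso_email, sso_username, accounts, overrides=None):
    """Return the code host login for an SSO user, or None if nothing matches."""
    overrides = overrides or {}
    key = sso_email.lower()
    if key in overrides:  # manual mapping for contractors and edge cases
        return overrides[key]
    for account in accounts:  # email-based linking
        if account.email and account.email.lower() == key:
            return account.login
    for account in accounts:  # username fallback; only safe if usernames are IdP-controlled
        if account.login.lower() == sso_username.lower():
            return account.login
    return None


accounts = [
    HostAccount("jdoe", "jdoe@example.com"),
    HostAccount("asmith", None),
    HostAccount("ext-contractor7", "kim@vendor.example"),
]
```

Returning `None` rather than guessing matters: an unlinked user should see nothing private until the mapping is fixed, which is safer than linking to the wrong account.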


Step 6: Layer RBAC, audit, and governance on top

Once the base permission model is correct, you can safely unlock Sourcegraph’s higher‑leverage workflows without expanding blast radius.

Use RBAC to scope power features

Even if repo access is correct, you’ll want guardrails around high-impact actions:

  • Admins / Operators
    • Can configure code hosts
    • Can run and manage Batch Changes across many repos
    • Can define global Monitors and Insights
  • Developers
    • Can search and navigate code they’re authorized to see
    • May be allowed to run Batch Changes in a narrow scope (e.g., only in certain projects)
  • Read‑only / Stakeholders
    • Can browse and search within permission boundaries
    • Typically can’t author Batch Changes or edit configuration
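
One way to picture how RBAC layers on top of repo permissions is a check with two gates: the role must allow the action, and the repo must be one the user can access. The role names and the `Action` enum below are hypothetical and do not correspond to Sourcegraph’s built-in roles.

```python
from enum import Enum, auto


class Action(Enum):
    SEARCH = auto()
    RUN_BATCH_CHANGE = auto()
    CONFIGURE_CODE_HOST = auto()


# Hypothetical role definitions mirroring the three groups described above.
ROLE_ACTIONS = {
    "operator": {Action.SEARCH, Action.RUN_BATCH_CHANGE, Action.CONFIGURE_CODE_HOST},
    "developer": {Action.SEARCH, Action.RUN_BATCH_CHANGE},
    "stakeholder": {Action.SEARCH},
}


def is_allowed(role, action, repo=None, accessible_repos=frozenset()):
    """Both gates must pass: the role grants the action, and the repo is visible."""
    if action not in ROLE_ACTIONS.get(role, set()):
        return False
    if repo is None:  # site-level action, not tied to a single repository
        return True
    return repo in accessible_repos
```

Note that a role never widens repo access in this model: an operator who cannot see a repo on the code host still fails the second gate.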

Audit logs and compliance

Sourcegraph provides audit logs for security teams:

  • Track admin changes (code host configuration, RBAC updates)
  • Track key user activities, especially actions that could impact many repos
  • Align with your compliance posture:
    • SOC 2 Type II
    • GDPR/CCPA alignment
    • Enterprise authentication (SAML/OIDC/OAuth), SCIM, RBAC

In heavily regulated environments, this is what makes security comfortable with exposing search and automation across 100 or 1M repositories.
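
If you forward audit events to a SIEM, a structured, one-line-per-event shape tends to be easiest to query. The sketch below shows one possible shape. The field names are assumptions for illustration and are not Sourcegraph’s audit log schema.

```python
import json
from datetime import datetime, timezone


def audit_event(actor, action, target, when=None):
    """Serialize one audit record as a single JSON line (hypothetical fields)."""
    when = when or datetime.now(timezone.utc)
    record = {
        "timestamp": when.isoformat(),
        "actor": actor,
        "action": action,
        "target": target,
    }
    return json.dumps(record, sort_keys=True)


line = audit_event(
    "admin@example.com",
    "code_host.update",
    "github-enterprise",
    when=datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
)
```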


How this plays with AI and agentic search

If you’re bringing AI agents into the mix—via Deep Search, Sourcegraph MCP, or your own agents—the permission model matters even more.

Sourcegraph’s posture:

  • Agents respect the same permissions as humans.
    • Agents use Sourcegraph as their “eyes” on the codebase.
    • They only see repositories the underlying user/token can see.
  • Zero data retention for LLM inference.
    • Your prompt and inference data isn’t stored beyond what’s required.
    • It’s not shared with third parties.
  • Every answer is grounded in code.
    • Deep Search and AI workflows point back to the specific files and lines used.
    • That makes review and governance possible, especially when agents propose large changes.

This is why getting permissions right at the GitHub/GitLab/Bitbucket layer is critical: it becomes the hard boundary for everything humans and agents do in Sourcegraph.
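
A small sketch of what "agents respect the same permissions" and "grounded in code" can look like together: context is filtered by the calling user’s repositories before the agent sees it, and the answer lists the files and lines it relied on. `Citation`, `agent_context`, and `render_answer` are hypothetical names, not part of Deep Search or the Sourcegraph MCP server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    repo: str
    path: str
    start_line: int
    end_line: int


def agent_context(candidates, user_repos):
    """Drop snippets the calling user cannot access before the agent sees them."""
    return [c for c in candidates if c.repo in user_repos]


def render_answer(text, citations):
    """Append the files and line ranges the answer relied on."""
    refs = "\n".join(
        f"- {c.repo}/{c.path}#L{c.start_line}-L{c.end_line}" for c in citations
    )
    return f"{text}\n\nSources:\n{refs}"


candidates = [
    Citation("acme/api", "auth/session.go", 10, 24),
    Citation("acme/secrets", "keys.tf", 1, 5),
]
allowed = agent_context(candidates, {"acme/api"})
answer = render_answer("Sessions expire after 30 minutes.", allowed)
```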


Practical rollout tips from the field

A few patterns that have worked well when I’ve rolled this out:

  • Start with read‑only, limited scope.
    • Begin with one code host (often GitHub Enterprise), a subset of orgs/projects, and SSO wired up.
    • Validate that a sample of users see exactly what they expect—no more, no less.
  • Bring security and compliance in early.
    • Demo how permissions mirror code hosts.
    • Show SSO, RBAC, audit logs, and zero data retention posture.
  • Gradually add other hosts.
    • Add GitLab, then Bitbucket, validating permission behavior at each step.
    • Use Insights/Monitors incrementally, then introduce Batch Changes once trust is established.
  • Document identity mapping rules.
    • Make it clear how a Sourcegraph user links to GitHub/GitLab/Bitbucket accounts.
    • Keep this in your internal “developer tooling” runbook so new teams know the expectations.
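
The "no more, no less" validation step is easy to script once you have two inputs per sample user: the repos they should see (from the code host) and the repos they actually see in Sourcegraph. How you collect each set is up to you; the `permission_drift` helper below is a hypothetical sketch that only does the comparison.

```python
def permission_drift(expected, actual):
    """Compare per-user repo sets and report over-exposure and missing access.

    Both arguments map a user name to a set of repo names.
    Users whose sets match exactly are left out of the report.
    """
    report = {}
    for user in expected.keys() | actual.keys():
        want = expected.get(user, set())
        got = actual.get(user, set())
        over, missing = got - want, want - got
        if over or missing:
            report[user] = {"over_exposed": over, "missing": missing}
    return report


expected = {"alice": {"acme/api", "acme/web"}, "bob": {"acme/api"}}
actual = {"alice": {"acme/api", "acme/web"}, "bob": {"acme/api", "acme/secrets"}}
drift = permission_drift(expected, actual)
```

Anything under `over_exposed` is a security finding and should block the rollout; entries under `missing` usually point to an identity mapping problem or a permission sync that has not run yet.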

What you get once everything is wired up

When Sourcegraph is correctly connected to GitHub Enterprise, GitLab, and Bitbucket with aligned permissions:

  • Developers and agents get lightning-fast search at enterprise scale across all their code—whether that’s 100 or 1M repositories.
  • Security teams know:
    • Access matches existing code host permissions.
    • AI doesn’t see more than humans.
    • There’s zero data retention for inference and clear audit trails.
  • Platform and migration teams can:
    • Use Batch Changes for controlled, multi‑repo edits.
    • Use Monitors to catch insecure patterns and bad practices.
    • Use Insights to understand how code is changing across repositories and hosts.

You turn scattered GitHub, GitLab, and Bitbucket repos into a single, governed surface for code understanding—without breaking the permission model that keeps your organization safe.

