Sourcegraph proof-of-value plan: what should we test (indexing speed, permissions sync, search relevance, Deep Search quality) in 2–4 weeks?

Most teams only get one real shot at evaluating a code understanding platform before stakeholders decide whether it becomes critical infrastructure or “just another tool.” In a 2–4 week Sourcegraph proof-of-value (PoV), you don’t have time for a broad, unfocused trial. You need a tight, high-signal plan that proves Sourcegraph can handle your real codebase: the size, the sprawl, the legacy pockets, and the access controls your auditors care about.

Below is a practical PoV blueprint I’d use, based on my experience as a former Staff Engineer / Dev Productivity lead rolling Sourcegraph out into a hybrid GitHub + Perforce environment with thousands of repositories.

PoV Goal and Timebox: What “Success” Should Look Like

In 2–4 weeks, the PoV should answer four questions with hard evidence:

Indexing speed & coverage
Can Sourcegraph index our real repositories (and branches) fast enough, and keep up as we push new code?
Permissions sync & governance
Does Sourcegraph reliably enforce our existing access model (GitHub, GitLab, Bitbucket, Gerrit, Perforce, SSO/RBAC) so that both humans and agents only see what they’re allowed to see?
Search relevance & depth (Code Search)
Can developers reliably find the right code, symbols, and patterns across all repositories with a few precise queries?
Deep Search quality for complex questions
When we ask Deep Search complex, multi-step questions about our legacy systems, does it return accurate, explainable answers that link back to the right code?

Everything else in this PoV plan rolls up to one of those questions.

At-a-Glance 2–4 Week PoV Plan

Week	Focus Area	What to Validate	Success Evidence
1	Indexing & connectivity	Repos added, indexing speed, branch coverage	% of target repos indexed; time from connect → searchable
1–2	Permissions & identity	SAML/OIDC, SCIM, RBAC, repo permissions parity	No permission leakage; auditors sign off on behavior
2–3	Search relevance	Real developer workflows with Code Search	Time-to-answer vs. today; search result quality
3–4	Deep Search quality	Complex, cross-repo questions & incidents	Accurate answers with code links; fewer human handoffs

You can overlap some of this (e.g., start search tests as soon as a representative slice of repos is indexed), but it helps to anchor each week to a primary outcome.

1. Test Indexing Speed and Coverage

If Sourcegraph can’t keep up with your codebase, nothing else matters. Prove early that it can index your repositories, whether you have a hundred or a million, at the speed your teams need.

1.1 Define a realistic PoV corpus

Resist the temptation to index only a toy subset. You want a slice that exposes real complexity:

At least 3–5 critical systems
- One modern service (e.g., TypeScript/Go/Java microservice)
- One legacy monolith (Java/C#/C++)
- One “hairy” repo (generated code, huge history, odd layout)
Multiple code hosts (if applicable)
- GitHub, GitLab, Bitbucket, Gerrit, Perforce: include at least one repository from each host you actually use.
Multiple branches
- Mainline branch plus at least one long-lived release or LTS branch where bugfixes are still happening.
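
If you want the corpus checklist to be enforceable rather than aspirational, one option is to encode the planned corpus as data and check it before indexing starts. This is an illustrative sketch only: the `PovRepo` type, the kind labels, and the host names are hypothetical and not part of Sourcegraph.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical labels mirroring the corpus checklist above.
REQUIRED_KINDS = {"modern-service", "legacy-monolith", "hairy"}
MIN_SYSTEMS = 3


@dataclass
class PovRepo:
    name: str  # e.g. "github.com/acme/checkout" (placeholder)
    host: str  # e.g. "github", "gitlab", "perforce"
    kind: str  # one of REQUIRED_KINDS, or "other"
    branches: list[str] = field(default_factory=lambda: ["main"])


def coverage_gaps(corpus: list[PovRepo], hosts_in_use: set[str]) -> list[str]:
    """Return human-readable gaps between the planned corpus and the checklist."""
    gaps = []
    for kind in sorted(REQUIRED_KINDS - {r.kind for r in corpus}):
        gaps.append(f"missing repo of kind: {kind}")
    for host in sorted(hosts_in_use - {r.host for r in corpus}):
        gaps.append(f"no repo from code host: {host}")
    # Proxy: a repo with more than one listed branch has a release/LTS branch.
    if not any(len(r.branches) > 1 for r in corpus):
        gaps.append("no repo with a long-lived release/LTS branch")
    if len(corpus) < MIN_SYSTEMS:
        gaps.append(f"fewer than {MIN_SYSTEMS} critical systems")
    return gaps
```

An empty list means the plan matches the checklist; anything else is a conversation to have before you connect the first code host.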

1.2 What to measure

For indexing, capture metrics that your infra and platform teams will care about:

Time from connect → first search
- For each repo and code host, note:
  - Timestamp when you add the connection
  - Timestamp when you can successfully search it in Sourcegraph
Incremental indexing behavior
- Make a change in a PoV repo, push to the main branch, and measure:
  - Time from push → updated code visible in search
Multi-branch indexing
- Enable multi-branch indexing for key repos and validate:
  - Branch filter queries (e.g., repo:my-service@release-2024.04) respond quickly
  - Cross-branch searches complete fast enough for interactive use
Scale test
- Add more repositories mid-PoV (e.g., double the corpus) and confirm that time-to-searchable and search latency stay at acceptable levels.
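
Measuring “push → visible in search” by hand is error-prone, so a small polling loop helps. The sketch below is deliberately generic: `probe` is a hypothetical callable you write yourself, and the loop only times how long it takes to start returning True. `clock` and `sleep` are injectable so the loop can be tested without actually waiting.

```python
import time
from typing import Callable, Optional


def time_until_visible(
    probe: Callable[[], bool],
    timeout_s: float = 1800.0,
    interval_s: float = 15.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll `probe` until it returns True.

    Returns the elapsed seconds, or None if `timeout_s` passes first.
    """
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(interval_s)
```

In practice you might push a commit containing a unique marker (say, a UUID in a comment) and have `probe` search for that marker through your instance’s search API or the `src` CLI, returning True on the first match. Polling every 15 seconds bounds the measurement error at one interval, which is fine for minute-scale targets. The same loop can time permission revocation if the probe returns True once a removed user can no longer see the repo.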

1.3 How to run the test

Create a shared tracking doc with columns: repo, host, size (LOC/revisions), added-at, searchable-at, delta.
Have infra / platform sign off that the numbers are acceptable for going org-wide.
Ask at least 3–5 developers from different teams to confirm that search feels fast enough for the queries they run every day.

2. Test Permissions Sync and Governance

For an enterprise rollout, the most important question after performance is: “Does this respect our access model 100% of the time?” This is where PoVs often fall short if they treat permissions as a checkbox rather than a test suite.

2.1 Integrate identity the way you would in production

Configure Sourcegraph with your real identity stack:

Single Sign-On (SSO)
- SAML, OpenID Connect, or OAuth—whichever you use in production.
SCIM provisioning (if you use it)
- Validate user and group provisioning flows.
Role-based access controls (RBAC)
- Define roles similar to your current patterns (e.g., “Developer,” “Read-only,” “Security,” “Admin”).

2.2 Validate repo & branch permissions

Use real accounts, not admin-only testing:

Positive tests (should see)
- A developer who has access to Repo A in GitHub/GitLab/Bitbucket/Gerrit/Perforce should see the same repo (and only relevant branches) in Sourcegraph.
Negative tests (should not see)
- A developer without access to Repo B in the code host must not see:
  - Repo B in repo lists
  - Files from Repo B in search results
  - Any code snippets from Repo B in Deep Search answers
Edge cases
- User removed from a group: confirm access is revoked in Sourcegraph within your normal propagation window.
- Private forks / confidential projects: confirm they never show up for unauthorized users.
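
The positive and negative tests reduce to a set comparison per user. A minimal sketch, assuming you can export, for each test user, the repositories the code host grants them and the repositories Sourcegraph shows them (how you produce each list is up to you; no specific API is implied). Run over the 10–20 users sampled for the parity check, the same comparison yields the parity percentage for your summary.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ParityResult:
    user: str
    leaks: frozenset[str]  # visible in Sourcegraph, NOT granted on the code host: blocker
    gaps: frozenset[str]   # granted on the code host, NOT visible in Sourcegraph: investigate

    @property
    def ok(self) -> bool:
        return not self.leaks and not self.gaps


def check_parity(user: str, host_repos: set[str], sourcegraph_repos: set[str]) -> ParityResult:
    """Compare what one user can see on the code host vs. in Sourcegraph."""
    return ParityResult(
        user=user,
        leaks=frozenset(sourcegraph_repos - host_repos),
        gaps=frozenset(host_repos - sourcegraph_repos),
    )


def parity_rate(results: list[ParityResult]) -> float:
    """Percentage of sampled users with exact 1:1 parity."""
    return 100 * sum(r.ok for r in results) / len(results)
```

Treat any leak as a stop-the-PoV bug. Gaps are less severe but still matter; they can point at a permissions sync that has not finished yet, which is itself worth timing.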

2.3 Deep Search and agents must honor permissions

Deep Search should behave like a well-trained engineer who has exactly the same access as the person asking:

Run Deep Search prompts from accounts with different roles and verify:
- Answers never leak content from repos the user can’t access.
- Referenced code links resolve and respect permissions (no “access denied” surprises on click-through).
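
Part of this check can be automated: pull the repository out of every link in a Deep Search answer and compare it with the repos that user is allowed to see. The sketch assumes links follow the `<repo>[@revision]/-/blob/<path>` shape that Sourcegraph file URLs usually have; `sg.example.com` and the repo names are placeholders.

```python
import re
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://[^\s)\]>\"'`]+")


def repos_cited(answer_text: str, sourcegraph_host: str) -> set:
    """Repositories referenced by Sourcegraph file links in an answer."""
    repos = set()
    for url in URL_RE.findall(answer_text):
        parsed = urlparse(url)
        if parsed.netloc != sourcegraph_host or "/-/" not in parsed.path:
            continue  # not a Sourcegraph code link
        repo_part = parsed.path.lstrip("/").split("/-/", 1)[0]
        repos.add(repo_part.split("@", 1)[0])  # drop an optional @revision
    return repos


def unauthorized_citations(answer_text: str, sourcegraph_host: str, allowed_repos) -> set:
    """Cited repos the asking user should not be able to see (should be empty)."""
    return repos_cited(answer_text, sourcegraph_host) - set(allowed_repos)
```

An empty result shows only that the links are clean; a reviewer still has to read the prose itself for leaked snippets or names.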

2.4 What to measure and document

Parity checks:
- Sample N users (e.g., 10–20) and confirm 1:1 parity between their code host permissions and what they see in Sourcegraph.
Audit comfort:
- Involve security / compliance early. Ask explicitly:
  - “Would you approve this behavior at enterprise scale?”
Configuration notes:
- Capture SSO, SCIM, and RBAC configuration details so you can reproduce them in production after the PoV.

3. Test Search Relevance with Real Developer Workflows

This is where developers feel the difference between “another search bar” and a true code understanding platform. Focus on the day-to-day tasks that currently cost them 20–60 minutes at a time.

3.1 Collect 10–20 real search scenarios up front

Before you start, ask engineers from different teams to share:

Recent tasks where they:
- Had to grep across many repos to find a call site or configuration.
- Needed to identify all usages of a deprecated API or pattern.
- Searched for security-relevant patterns (e.g., missing input validation, insecure crypto).
- Spent time navigating monoliths or tangled service meshes.
For each scenario, document:
- Current steps (tools used, time spent, pain points).
- What “good” would look like (e.g., “Find all call sites of Foo.doWork() in 3 minutes or less”).

3.2 Re-run those scenarios in Sourcegraph Code Search

Use Sourcegraph’s advanced query capabilities:

Keyword + structural patterns
- Combine filters such as repo:, -repo:, file:, and lang: with keyword, regular-expression, or structural patterns.
Symbol and definition search
- Use Code Navigation and precise indexing to jump from usage → definition → references.
Cross-repo refactor exploration
- Identify all usages of a deprecated function or class across repos; sanity check the results.

Examples of tests to run:

“Find every call to the legacy encrypt() helper outside of security-utils.”
- Exclude the helper’s home repo with -repo:security-utils and match the call with a content pattern; add file: or lang: filters to narrow further.
“List all places we construct HttpClient without a timeout.”
- Use content patterns and language filters.
“Show me all services importing old-config-lib.”
- Search by import path across all service repos.
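
To make these scenarios repeatable across testers, it helps to store the exact query string with each scenario. The helper below only assembles text from Sourcegraph’s `repo:`, `-repo:`, `file:`, `lang:`, and `patternType:` filters; the repo names and the `-service` naming convention are hypothetical, so confirm each query against your instance’s search syntax docs.

```python
from typing import Optional


def sg_query(
    pattern: str,
    *,
    repo: Optional[str] = None,
    exclude_repo: Optional[str] = None,
    file: Optional[str] = None,
    lang: Optional[str] = None,
    pattern_type: Optional[str] = None,
) -> str:
    """Assemble a Sourcegraph query: filters first, then the search pattern."""
    filters = [
        ("repo", repo),
        ("-repo", exclude_repo),
        ("file", file),
        ("lang", lang),
        ("patternType", pattern_type),
    ]
    parts = [f"{name}:{value}" for name, value in filters if value]
    parts.append(pattern)
    return " ".join(parts)


# "Find every call to the legacy encrypt() helper outside of security-utils."
ENCRYPT_CALLS = sg_query(r"\bencrypt\(", exclude_repo="security-utils", pattern_type="regexp")

# "Show me all services importing old-config-lib." (assumes service repos end in -service)
OLD_CONFIG_IMPORTS = sg_query(
    r"import.*old-config-lib", repo=r"^github\.com/acme/.*-service$", pattern_type="regexp"
)
```

A pattern like `\bencrypt\(` also matches the helper’s own definition and unrelated `.encrypt(` methods, so expect to sanity-check hits or tighten the pattern; how many tweaks that takes is exactly the refinement count measured in 3.3.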

3.3 How to measure search relevance

For each scenario, capture:

Time-to-answer
- How long it takes from starting a query to confidently answering the original question.
Number of queries / refinement steps
- How many query tweaks are needed to get to useful results.
Result quality
- Did the top results include the most relevant files?
- Were there false positives that made it harder to use?
Developer perception
- Ask participants to rate on a 1–5 scale:
  - “Sourcegraph made this task faster than my current workflow.”
  - “I trust the completeness of what I’m seeing.”

You’re looking for clear reductions in time-to-answer and higher confidence in coverage—especially in “sprawling” areas where teams usually guess.

4. Test Deep Search Quality on Complex, Cross-Repo Questions

Deep Search is Sourcegraph’s Agentic AI Search layer. The PoV should validate that it can handle the messy cases where coding agents usually fail: legacy codebases, cross-repo workflows, hidden coupling.

4.1 Pick the right Deep Search test cases

Avoid trivial “summarize this file” prompts. Instead, focus on tasks that already cost senior engineers real time:

Architecture and data flow questions
- “How does a payment refund flow from the API layer through to the ledger update?”
Incident investigation & postmortem support
- “Given this stack trace, where in the code do we likely throw this error, and what upstream inputs can trigger it?”
Security and compliance follow-up
- “Show me all endpoints that handle user PII and how they log or redact it.”
Migration planning
- “We want to replace LegacyCacheClient with NewCacheClient. What are the main usage patterns and integration points across services?”

4.2 How to run Deep Search evaluations

For each test case:

Baseline
- Ask a senior engineer how they’d solve it today and roughly how long it would take.
Deep Search prompt
- Craft a targeted prompt that references your real codebase, e.g.:
  - “In this repository group, explain how the refund flow works end-to-end and link to the key functions in each service.”
Review the answer
- Check:
  - Are the code examples and file paths correct?
  - Does the explanation align with the actual control flow?
  - Are there hallucinations (e.g., imaginary files, functions)?
Token and context behavior
- Observe how Deep Search behaves in large repos and multi-repo contexts. The design goal is:
  - Search that is exhaustive enough, with answers that stay focused.
  - Clear links back to the exact code used so humans can verify.

Remember: Deep Search is backed by real Sourcegraph code understanding and search, not just a generic LLM. It should feel grounded: every statement traceable to code.
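
Part of the hallucination check can be mechanized: compare every file path an answer cites against the files that actually exist. This is a heuristic sketch that assumes cited paths appear in backticks (adjust `CITED_PATH_RE` to your answers’ format) and that you run it per repository against a local checkout. It cannot catch invented functions or a wrong account of control flow; those still need a human reviewer.

```python
import re
import subprocess

# Assumption: the answer wraps file paths in backticks, e.g. `src/api/refund.py`.
CITED_PATH_RE = re.compile(r"`([^`\s]+/[^`\s]+\.[A-Za-z0-9]+)`")


def cited_paths(answer_text: str) -> set:
    """Backticked tokens that contain a slash and end in a file extension."""
    return set(CITED_PATH_RE.findall(answer_text))


def tracked_files(repo_dir: str) -> set:
    """Every file tracked in a local checkout, via `git ls-files`."""
    result = subprocess.run(
        ["git", "ls-files"], cwd=repo_dir, capture_output=True, text=True, check=True
    )
    return set(result.stdout.splitlines())


def nonexistent_citations(answer_text: str, known_files: set) -> set:
    """Cited paths that do not exist in the repo: likely hallucinations."""
    return cited_paths(answer_text) - set(known_files)
```

Typical use is `nonexistent_citations(answer, tracked_files("/path/to/checkout"))`; a non-empty result is an automatic point off the accuracy rating below.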

4.3 What to measure for Deep Search quality

For each question:

Accuracy rating (1–5)
- 5 = “I’d ship a change or write docs based on this with minimal verification.”
Coverage rating (1–5)
- 5 = “It covered all major paths / edge cases I care about.”
Senior engineer time saved
- Estimate “time saved vs. baseline” for exploration and explanation.
Trust level
- Would the reviewer rely on Deep Search during a real incident or migration?

You want several 4–5 scores from skeptical senior engineers to build credible support for rollout.

5. Optional: Test Batch Changes, Monitors, and Insights (If Time Allows)

If you have bandwidth in a 4-week PoV, add at least a small test around Sourcegraph’s “turn understanding into action” workflows. These are often what unlock budget and sponsorship beyond “search.”

5.1 Batch Changes: multi-repo edits

Pick a safe, low-risk change:
- E.g., replace a deprecated config flag, update a logging API, or rename a constant across services.
Use Batch Changes to:
- Search for all affected call sites across repos.
- Generate a batch of changes (branches + changesets) automatically.
- Review and apply them through your normal review/merge process.
Measure:
- Time to prepare and roll out the change vs. your current scripting/manual approach.
- Number of repos touched.
- Developer feedback on review experience.
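
For reference, a batch spec for the “replace a deprecated config flag” example might look like the dict below. The top-level fields (`name`, `description`, `on`, `steps`, `changesetTemplate`) follow the Batch Changes spec format; the flag names, the assumption that the flag lives in `*.yaml` files, and the `alpine:3` image are placeholders. Batch specs are normally written in YAML. Because JSON is valid YAML 1.2, printing the dict as JSON should also work, but check the output with `src batch preview` before relying on it.

```python
import json


def rename_flag_spec(old_flag: str, new_flag: str) -> dict:
    """Hypothetical batch spec that renames a config flag in YAML config files."""
    return {
        "name": f"rename-{old_flag}",
        "description": f"Replace deprecated config flag {old_flag} with {new_flag}",
        # Only repos whose code mentions the old flag get a changeset.
        "on": [{"repositoriesMatchingQuery": old_flag}],
        "steps": [
            {
                # Assumes the flag lives in *.yaml files; widen the find if not.
                "run": (
                    "find . -name '*.yaml' ! -path './.git/*' "
                    f"-exec sed -i 's/{old_flag}/{new_flag}/g' {{}} ';'"
                ),
                "container": "alpine:3",
            }
        ],
        "changesetTemplate": {
            "title": f"Rename config flag {old_flag} to {new_flag}",
            "body": "Automated rename prepared during the Sourcegraph PoV.",
            "branch": f"batch/rename-{old_flag}",
            "commit": {"message": f"Rename config flag {old_flag} to {new_flag}"},
        },
    }


if __name__ == "__main__":
    print(json.dumps(rename_flag_spec("legacy_timeout_ms", "request_timeout_ms"), indent=2))
```

Generating the spec from a function keeps the “how long did it take to prepare” measurement honest: the second and third rename in the PoV cost only a new pair of flag names.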

5.2 Monitors: catch risky patterns early

Define 2–3 patterns your security or platform team cares about:
- Use of insecure crypto APIs
- Direct access to forbidden dependencies
- Logging of PII
Create Monitors on those queries to:
- Detect new occurrences when they land.
- Notify the right owners or hook into downstream automation.
Measure:
- How quickly a new bad pattern is detected.
- How much manual log scraping or ad-hoc grepping this replaces.

5.3 Insights: track migrations and standards

For an ongoing migration (framework, library, or service), create an Insight:
- Track usage of old vs. new API over time.
Measure:
- How easy it is for tech leads to see migration progress without bespoke scripts or dashboards.
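
A tech lead’s “how far along are we?” usually comes down to one ratio per data point. Assuming you have read the old-API and new-API usage counts off the Insight’s two series (the dates and numbers in any example are made up), the percentage migrated can be computed like this:

```python
from datetime import date
from typing import List, Optional, Tuple


def migration_progress(
    series: List[Tuple[date, int, int]],
) -> List[Tuple[date, Optional[float]]]:
    """Turn (date, old_usages, new_usages) points into (date, % migrated).

    Returns None for a point with no usages at all, rather than dividing by zero.
    """
    progress = []
    for day, old, new in series:
        total = old + new
        progress.append((day, round(100 * new / total, 1) if total else None))
    return progress
```

If old code is deleted outright rather than migrated, both counts can fall at once, so read the ratio alongside the raw series rather than instead of it.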

These workflows illustrate that Sourcegraph is more than a search bar—it becomes a platform for controlled, auditable change and governance across repos.

6. Who to Involve in the PoV

You’ll move faster and get a more reliable read if you treat this as a cross-functional effort:

Infra / platform engineering
- Owns indexing, performance, code host integration, and SSO/SCIM/RBAC.
Security / compliance
- Validates permissions behavior, zero data retention posture, and audit requirements.
Staff / senior engineers from 2–3 key domains
- Evaluate search relevance and Deep Search quality in real workflows.
Engineering leadership
- Defines success criteria: “If we see X, Y, Z in this PoV, we move to rollout.”

Make sure they all see the same PoV plan, metrics, and outcomes—not just a demo.

7. How to Summarize PoV Results for Stakeholders

At the end of 2–4 weeks, you want a single-page summary that answers:

Indexing & scale
- “We connected N repositories across GitHub/GitLab/Bitbucket/Gerrit/Perforce. Time from connection → searchable averaged X minutes. Incremental changes show up within Y seconds/minutes.”
Permissions & governance
- “SSO (SAML/OIDC/OAuth), SCIM, and RBAC configured. Sampled N users; 100% parity between code host permissions and Sourcegraph. Security signed off on behavior and zero data retention.”
Search relevance
- “We re-ran 15 real scenarios. Average time-to-answer decreased from X minutes to Y minutes. Developers rated speed 4.3/5 and trust in completeness 4.5/5.”
Deep Search quality
- “We tested 8 complex questions (incidents, migrations, security). Senior engineers rated answer accuracy 4.1/5 and coverage 4.0/5. All answers linked to exact code; no permission leaks observed.”
Action workflows (if tested)
- “One Batch Change touched 40 repos with a single reviewed rollout. Monitors configured for 3 risky patterns; Insights added for 1 migration.”

End with a clear recommendation: either proceed to production rollout or list the specific gaps blocking that decision.

Final Verdict

In a tight 2–4 week window, the most effective Sourcegraph proof-of-value plan concentrates on four things:

Indexing speed & coverage: Prove Sourcegraph can keep up with your real repos and branches.
Permissions sync & governance: Show that SSO, SCIM, and RBAC work end-to-end and that search/Deep Search never overstep your access model.
Search relevance: Demonstrate that developers get faster, more complete answers than they do with grep, per-repo search, or ad-hoc scripts.
Deep Search quality: Validate that agentic search delivers accurate, linked-to-code answers across the messy, legacy parts of your codebase.

If you have time, layer in Batch Changes, Monitors, and Insights to show how code understanding turns into controlled, auditable change at scale.

Once you’ve proven these in your own environment, you’re not just testing a tool—you’re validating a new code understanding layer that can support both your developers and your AI agents as your codebase keeps multiplying.
