How good is code search ranking, really? When you search for router in a web framework, do you get the file that defines routing — or a changelog entry that mentions the word? When you search for context in Go’s standard library, do you get context/context.go — or context_test.go?

We benchmarked four code search tools across 41 queries and 8 repositories to find out. The results were stark: searchcode returned the correct #1 result 85% of the time, compared to 29% for Tool A. On the 16 queries where all four tools could be compared head-to-head, Tool B scored 63% and Tool C 50%.

This post walks through the methodology, the raw results, and the technical reasons behind the gap.

Methodology

Tools tested

Repositories

We chose well-known open source projects across multiple languages:

| Repository | Language | Stars | Why chosen |
|---|---|---|---|
| golang/go | Go | 135k+ | Massive stdlib, deep package hierarchy |
| gin-gonic/gin | Go | 80k+ | Popular web framework, clear file structure |
| expressjs/express | JavaScript | 65k+ | Node.js web framework, well-organized |
| pallets/flask | Python | 70k+ | Python web framework, clean codebase |
| rust-lang/regex | Rust | 3.9k | Complex parsing/compilation pipeline |
| servo/servo | Rust | 36k+ | Browser engine, deep component hierarchy |
| jetbrains/kotlin | Kotlin/Java | 50k+ | Compiler, massive codebase |
| aquasecurity/vuln-list-update | Go | 191 | Vulnerability updater, many subpackages |

How we judged “correct”

For each query, we defined the expected #1 result before searching: the file a developer would most likely want to find. For router in gin, that’s routergroup.go or gin.go (where routing is implemented), not BENCHMARKS.md or README.md. For context in Go, it’s context/context.go, not context_test.go.

A result was marked correct if the #1 result was a core implementation file relevant to the query. We gave partial credit for results in the right package but wrong file. Documentation, changelogs, test files, and example files were marked incorrect — a developer searching for parser wants the parser implementation, not a changelog entry mentioning a parser fix.
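
The judging was done by hand, but the exclusion rule above can be approximated in code. The sketch below is only an illustration: the pattern list and the `is_non_implementation` helper are assumptions made for this example, not the benchmark's actual tooling.

```python
import re

# Hypothetical patterns approximating "documentation, changelogs, test files,
# and example files". The real judging was manual.
NON_IMPLEMENTATION_PATTERNS = [
    r"\.(md|rst)$",            # Markdown / reStructuredText docs and changelogs
    r"(^|/)docs?/",            # documentation directories
    r"(^|/)tests?/",           # test directories
    r"_test\.[A-Za-z]+$",      # files such as context_test.go
    r"(^|/)examples?/",        # example directories
]


def is_non_implementation(path: str) -> bool:
    """Return True if the path looks like a doc, changelog, test, or example."""
    return any(re.search(pattern, path) for pattern in NON_IMPLEMENTATION_PATTERNS)


for path in ["BENCHMARKS.md", "context_test.go", "context/context.go"]:
    print(path, is_non_implementation(path))
```

A rule like this only filters out obvious non-implementation files. Deciding which implementation file is the right one for a query still needs a human.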

Results: Four-Way Comparison

We ran 8 queries across gin-gonic/gin and expressjs/express where all four tools could be compared head-to-head.

gin-gonic/gin

| Query | searchcode #1 | Tool A #1 | Tool C #1 | Tool B #1 |
|---|---|---|---|---|
| router | gin.go | BENCHMARKS.md | routergroup.go | routergroup.go |
| context | context.go | context_test.go | context.go | context.go |
| middleware | gin.go | README.md | routergroup.go | README.md |
| binding | binding/binding.go | binding_nomsgpack.go | context.go | binding/binding.go |

expressjs/express

| Query | searchcode #1 | Tool A #1 | Tool C #1 | Tool B #1 |
|---|---|---|---|---|
| router | lib/application.js | History.md | test/Router.js | lib/application.js |
| request | lib/request.js | test/req.xhr.js | test/express.static.js | lib/request.js |
| response | lib/response.js | test/res.status.js | lib/response.js | lib/response.js |
| middleware | lib/application.js | README.md | test/app.use.js | examples/route-middleware/index.js |

Four-way scorecard

| Tool | Correct | Accuracy |
|---|---|---|
| searchcode | 8/8 | 100% |
| Tool B | 6/8 | 75% |
| Tool C | 3/8 | 38% |
| Tool A | 0/8 | 0% |

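Tool A did not return a core implementation file for any of the eight queries. Seven of its top results were documentation or test files, and the eighth, binding_nomsgpack.go, is not the main binding implementation.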

Results: Four-Way on Large Codebases

We extended the four-way comparison to two much larger repositories: servo/servo (a browser engine in Rust) and jetbrains/kotlin (the Kotlin compiler).

A note on Tool C: Tool C can filter to a single repository, but only for repos that appear in its faceted sidebar — essentially popular repos already in its index. The URL parameter filter[repo] is silently ignored; you must use f.repo= or click from the sidebar. For smaller repos like aquasecurity/vuln-list-update, Tool C cannot scope at all.

servo/servo

| Query | searchcode #1 | Tool A #1 | Tool C #1 | Tool B #1 |
|---|---|---|---|---|
| layout | components/layout/layout_impl.rs | components/layout/flow/mod.rs | components/layout/dom.rs | components/layout/flow/float.rs |
| script | components/script/script_thread.rs | tests/wpt/.../client.py | components/script/dom/html/htmlscriptelement.rs | components/shared/embedder/user_contents.rs |
| render | components/paint/painter.rs | tests/wpt/.../serializer.py | components/paint/painter.rs | components/media/.../render.rs |
| parse | components/script/dom/html/htmlimageelement.rs | python/servo/try_parser.py | components/script/dom/servoparser/async_html.rs | python/servo/try_parser.py |

For script, Tool A returned a WebDriver test tool from tests/wpt/ — a third-party Python file completely unrelated to Servo’s script engine. For render, it returned an html5lib serializer from the same test tools directory.

| Tool | Correct | Accuracy |
|---|---|---|
| searchcode | 3/4 | 75% |
| Tool C | 3/4 | 75% |
| Tool B | 2/4 | 50% |
| Tool A | 1/4 | 25% |

Tool C performed well here — htmlscriptelement.rs for script and async_html.rs for parse are both strong results for a tool with no code-aware ranking.

jetbrains/kotlin

| Query | searchcode #1 | Tool A #1 | Tool C #1 | Tool B #1 |
|---|---|---|---|---|
| compiler | cli/.../KotlinToJVMBytecodeCompiler.kt | repo/gradle-build-conventions/.../ideaExtKotlinDsl.kt | compiler/build-tools/.../compat/... | plugins/compose/design/compiler-metrics.md |
| parser | compiler/psi/parser/.../KDocParser.java | kotlin-native/performance/.../JsonParser.kt | js/js.parser/.../JavaScriptParserListener.java | compiler/psi/parser/.../KDocParser.java |
| type | compiler/tests-spec/testData/... | wasm/wasm.ir/.../Types.kt | core/compiler.common/.../AbstractTypeChecker.kt | kotlin-native/runtime/.../Types.h |
| resolve | compiler/fir/resolve/.../FirExpressionsResolveTransformer.kt | analysis/.../testData/lazyResolve/superTypes.kt | analysis/analysis-api/.../KaResolver.kt | js/js.ast/.../JsNameRef.java |

The Kotlin compiler is a stress test — 778k matches for type alone. Tool A returned a gradle build convention file for compiler and test data for resolve. Tool B returned a Markdown design doc for compiler. searchcode hit the actual KotlinToJVMBytecodeCompiler.kt but stumbled on type (returning test spec data).

| Tool | Correct | Accuracy |
|---|---|---|
| searchcode | 3/4 | 75% |
| Tool C | 2/4 | 50% |
| Tool B | 2/4 | 50% |
| Tool A | 0/4 | 0% |

aquasecurity/vuln-list-update (3-way, Tool C cannot scope)

| Query | searchcode #1 | Tool A #1 | Tool B #1 |
|---|---|---|---|
| main | main.go | main.go | main.go |
| update | redhat/csaf/vex.go | cwe/cwe.go | nvd/nvd.go |
| fetch | redhat/csaf/vex.go | utils/utils.go | nvd/nvd.go |
| config | redhat/csaf/vex.go | git/git.go | git/git.go |
| debian | debian/tracker/debian.go | debian/tracker/debian.go | README.md |
| alpine | alpine/alpine.go | alpine-unfixed/alpine_test.go | alpine/alpine.go |

For update, fetch, and config, every tool returned a different valid implementation file — these queries are genuinely ambiguous in a repo where every subpackage has its own Update() method and Config struct. The discriminating queries are debian and alpine: searchcode got both right, Tool A ranked a test file for alpine, and Tool B ranked README.md for debian.

| Tool | Correct | Accuracy |
|---|---|---|
| searchcode | 5/6 | 83% |
| Tool A | 4/6 | 67% |
| Tool B | 4/6 | 67% |

Results: Deep Dive on golang/go

The Go standard library is the hardest test case — thousands of packages, many files with overlapping terminology. We tested 7 queries comparing searchcode and Tool A.

| Query | searchcode #1 | Tool A #1 | searchcode correct | Tool A correct |
|---|---|---|---|---|
| sort | sort/zsortinterface.go | slices/sort.go | ~ | ~ |
| mutex | runtime/mprof.go | cmd/go/internal/lockedfile/mutex.go | no | ~ |
| context cancel | context/context.go | context/context.go | yes | yes |
| handler | log/slog/handler.go | (wrong) | yes | no |
| scanner | go/scanner/scanner.go | | yes | |
| http client request | net/http/request.go | runtime/valgrind_amd64.s | yes | no |
| json marshal | html/template/js.go | encoding/json/v2/errors.go | no | no |

Score: searchcode 5/7, Tool A 3/7

Notable: for http client request, Tool A returned an assembly file from the runtime (valgrind_amd64.s) — completely unrelated to HTTP.

Results: searchcode vs Tool A (All Repos)

rust-lang/regex (5 queries)

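| Query | searchcode #1 | Tool A #1 | searchcode correct | Tool A correct |
|---|---|---|---|---|
| parser | ast/parse.rs | CHANGELOG.md | yes | no |
| compile | regex-test/lib.rs | regex-test/lib.rs | ~ | no |
| match | dfa/dense.rs | regex-test/lib.rs | ~ | no |
| literal | ast/parse.rs | nfa/thompson/literal_trie.rs | no | ~ |
| error | ast/parse.rs | hir/mod.rs | ~ | yes |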

Score: searchcode 4/5, Tool A 2/5

pallets/flask (5 queries)

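| Query | searchcode #1 | Tool A #1 | searchcode correct | Tool A correct |
|---|---|---|---|---|
| route | sansio/scaffold.py | CHANGES.rst | yes | no |
| blueprint | sansio/blueprints.py | docs/blueprints.rst | yes | no |
| request response | app.py | app.py | yes | yes |
| template render | sansio/scaffold.py | docs/tutorial/templates.rst | yes | no |
| config | config.py | docs/config.rst | yes | no |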

Score: searchcode 5/5, Tool A 1/5

expressjs/express (5 queries)

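| Query | searchcode #1 | Tool A #1 | searchcode correct | Tool A correct |
|---|---|---|---|---|
| router | lib/application.js | History.md | yes | no |
| middleware | lib/application.js | README.md | yes | no |
| request | lib/request.js | test/req.xhr.js | yes | no |
| response | lib/response.js | test/res.status.js | yes | no |
| view render | lib/application.js | examples/view-constructor/index.js | yes | no |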

Score: searchcode 5/5, Tool A 0/5

Aggregate Scorecard

searchcode vs Tool A (all 41 queries)

| Repository | Queries | searchcode | Tool A |
|---|---|---|---|
| golang/go | 7 | 5 (71%) | 3 (43%) |
| rust-lang/regex | 5 | 4 (80%) | 2 (40%) |
| gin-gonic/gin | 5 | 5 (100%) | 1 (20%) |
| pallets/flask | 5 | 5 (100%) | 1 (20%) |
| expressjs/express | 5 | 5 (100%) | 0 (0%) |
| servo/servo | 4 | 3 (75%) | 1 (25%) |
| jetbrains/kotlin | 4 | 3 (75%) | 0 (0%) |
| aquasecurity/vuln-list-update | 6 | 5 (83%) | 4 (67%) |
| Total | 41 | 35 (85%) | 12 (29%) |

searchcode is 2.9x more accurate than Tool A at returning the correct #1 result.
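
The totals and the 2.9x figure follow directly from the per-repository counts. As a quick check, the short script below sums the rows of the aggregate table and recomputes both accuracies and their ratio. The variable names are illustrative only.

```python
# (queries, searchcode correct, Tool A correct) per repository,
# copied from the aggregate scorecard.
results = {
    "golang/go": (7, 5, 3),
    "rust-lang/regex": (5, 4, 2),
    "gin-gonic/gin": (5, 5, 1),
    "pallets/flask": (5, 5, 1),
    "expressjs/express": (5, 5, 0),
    "servo/servo": (4, 3, 1),
    "jetbrains/kotlin": (4, 3, 0),
    "aquasecurity/vuln-list-update": (6, 5, 4),
}

total_queries = sum(queries for queries, _, _ in results.values())
searchcode_correct = sum(sc for _, sc, _ in results.values())
tool_a_correct = sum(a for _, _, a in results.values())

searchcode_accuracy = searchcode_correct / total_queries
tool_a_accuracy = tool_a_correct / total_queries
ratio = searchcode_accuracy / tool_a_accuracy

print(f"searchcode: {searchcode_correct}/{total_queries} = {searchcode_accuracy:.0%}")  # 35/41 = 85%
print(f"Tool A:     {tool_a_correct}/{total_queries} = {tool_a_accuracy:.0%}")          # 12/41 = 29%
print(f"ratio:      {ratio:.1f}x")                                                      # 2.9x
```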

Four-way comparison (16 queries across gin, express, servo, kotlin)

| Tool | Correct | Accuracy |
|---|---|---|
| searchcode | 14/16 | 88% |
| Tool B | 10/16 | 63% |
| Tool C | 8/16 | 50% |
| Tool A | 1/16 | 6% |

Why searchcode Wins

searchcode’s ranking advantage comes from a handful of code-aware heuristics layered on top of BM25 text relevance scoring. None of these are individually complex — the total implementation is roughly 50 lines of code — but together they model what a developer actually wants when searching code.

1. Test dampening

Files matching test patterns (_test.go, *_test.rs, /test/, /tests/, -test/) have their ranking score multiplied by 0.4. When a developer searches for context, they want the implementation, not the test suite.

This single heuristic addresses Tool A’s most common failure mode. Across our benchmark, Tool A’s #1 result was a test file in 6 of 27 queries — including context_test.go for “context” in gin, test/req.xhr.js for “request” in express, and reactiveArray.spec.ts for “reactive” in Vue.
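
As a rough illustration of test dampening, the sketch below applies the 0.4 multiplier to any path matching the patterns listed above. The function names and the exact matching rules (suffix match for file patterns, substring match for directory patterns) are assumptions for this example, not searchcode's actual implementation.

```python
TEST_DAMPENING = 0.4

# Patterns taken from the description above. How they are matched is assumed.
TEST_FILE_SUFFIXES = ("_test.go", "_test.rs")
TEST_DIR_MARKERS = ("/test/", "/tests/", "-test/")


def is_test_path(path: str) -> bool:
    """Return True if the path looks like a test file or lives in a test directory."""
    # Prefix a slash so a top-level "test/foo.js" still matches "/test/".
    normalized = "/" + path.lstrip("/")
    if normalized.endswith(TEST_FILE_SUFFIXES):
        return True
    return any(marker in normalized for marker in TEST_DIR_MARKERS)


def dampen_tests(score: float, path: str) -> float:
    """Multiply the ranking score by 0.4 for test files; leave others unchanged."""
    return score * TEST_DAMPENING if is_test_path(path) else score


print(dampen_tests(10.0, "context_test.go"))  # dampened
print(dampen_tests(10.0, "context.go"))       # unchanged
```

With equal text relevance, the implementation file now outranks its test file by a factor of 2.5.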

2. Complexity gravity

Files with higher cyclomatic complexity get a ranking boost. Implementation files are inherently more complex than documentation, configuration, or boilerplate — they contain the actual logic. A file with branching, loops, and error handling is more likely to be what a developer is looking for than a flat list of exports.
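
The exact boost formula is not given here, so the following is only a sketch under stated assumptions: complexity is approximated by counting branching tokens, and the boost grows logarithmically so that very large files do not dominate. Both `estimate_complexity` and `complexity_boost`, along with the `weight` parameter, are hypothetical.

```python
import math
import re

# Rough stand-in for cyclomatic complexity: count branching keywords and
# boolean operators. Real analyzers are language-aware; this is not.
BRANCH_TOKENS = re.compile(r"\b(?:if|elif|for|while|case|catch|except)\b|&&|\|\|")


def estimate_complexity(source: str) -> int:
    """Count branch points in the source text."""
    return len(BRANCH_TOKENS.findall(source))


def complexity_boost(score: float, complexity: int, weight: float = 0.1) -> float:
    """Scale the score up with complexity (assumed logarithmic growth)."""
    return score * (1.0 + weight * math.log1p(complexity))


code = "if a && b:\n    for x in y:\n        pass"
changelog = "# Changelog\n- fixed parser bug"

print(estimate_complexity(code))       # 3
print(estimate_complexity(changelog))  # 0
print(complexity_boost(1.0, estimate_complexity(code)))
```

A changelog scores zero complexity and gets no boost, while a file full of branches is pushed upward.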

3. Noise penalty

The ratio of complexity to file size penalizes large, low-complexity files. Changelogs, READMEs, and JSON configs are typically long but contain minimal logic. This pushes them down in results.

Tool A ranked a documentation or changelog file #1 in 11 of 27 queries: BENCHMARKS.md, README.md (3x), History.md, CHANGELOG.md, CHANGES.rst, docs/blueprints.rst, docs/config.rst, docs/tutorial/templates.rst, docs/doc.md.
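
One way to picture the noise penalty is as a multiplier driven by complexity per line. The threshold, the floor, and the linear scaling in the sketch below are all assumptions chosen for illustration. Only the idea of penalizing large, low-complexity files comes from the description above.

```python
def noise_penalty(
    score: float,
    complexity: int,
    line_count: int,
    min_density: float = 0.01,  # assumed: below 1 branch per 100 lines counts as noise
    floor: float = 0.5,         # assumed: maximum penalty halves the score
) -> float:
    """Penalize files whose complexity is low relative to their size."""
    if line_count <= 0:
        return score
    density = complexity / line_count
    if density >= min_density:
        return score
    # Scale linearly from `floor` (no logic at all) up to 1.0 (at the threshold).
    factor = floor + (1.0 - floor) * (density / min_density)
    return score * factor


# ast/parse.rs from rust-lang/regex: complexity 304 over 5,497 lines.
print(noise_penalty(1.0, complexity=304, line_count=5497))  # 1.0, no penalty
# A hypothetical 2,000-line changelog with no branching logic.
print(noise_penalty(1.0, complexity=0, line_count=2000))    # 0.5
```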

4. Filename boost

When the query term matches the filename stem exactly, the file gets a 1.0 boost. Substring matches get a 0.5 boost. Searching for context boosts context.go. Searching for scanner boosts scanner.go. This is intuitive — if someone names a file router.go, it’s probably the canonical file for routing.
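
A minimal sketch of the filename boost follows, using the 1.0 and 0.5 values stated above. It assumes a single-term query and case-insensitive matching. How the boost is combined with the text relevance score is not specified here, so the function only returns the boost value.

```python
from pathlib import PurePosixPath


def filename_boost(query: str, path: str) -> float:
    """Return 1.0 for an exact filename-stem match, 0.5 for a substring match."""
    stem = PurePosixPath(path).stem.lower()  # "context/context.go" -> "context"
    term = query.lower()
    if stem == term:
        return 1.0
    if term in stem:
        return 0.5
    return 0.0


print(filename_boost("context", "context/context.go"))  # 1.0
print(filename_boost("context", "context_test.go"))     # 0.5
print(filename_boost("scanner", "README.md"))           # 0.0
```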

5. Directory name matching

Parent directory names matching the query get an additional boost. For context cancel, the file context/context.go gets a double boost — directory match plus filename match. This handles the common Go pattern of package/package.go.
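
Directory matching can be sketched the same way. The size of the directory boost is not stated above, so the 0.5 default below is an assumption, as is the choice to split multi-word queries on whitespace and match each term against every parent directory name.

```python
from pathlib import PurePosixPath


def directory_boost(query: str, path: str, boost: float = 0.5) -> float:
    """Return `boost` if any query term equals a parent directory name."""
    terms = set(query.lower().split())
    directories = {part.lower() for part in PurePosixPath(path).parent.parts}
    return boost if terms & directories else 0.0


print(directory_boost("context cancel", "context/context.go"))  # 0.5
print(directory_boost("router", "lib/application.js"))          # 0.0
```

For context/context.go this stacks with the filename match, which is how the package/package.go pattern ends up with a double boost.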

The structural advantage

searchcode computes ranking at query time. Every heuristic improvement applies instantly to every query across every indexed repository, with no re-indexing required. Tools that bake ranking signals into their index need to re-index millions of repositories to deploy a ranking change — making iteration on relevance painfully slow.
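
To make the query-time point concrete, here is a small sketch in which the index supplies only a text relevance score and heuristics are applied as plain functions when results are ranked. The `Hit` type, the `rerank` function, and the BM25 values are invented for illustration. The point is that adding or changing a heuristic touches the ranking call, not the index.

```python
from typing import Callable, NamedTuple, Sequence


class Hit(NamedTuple):
    path: str
    bm25: float  # text relevance from the index (values below are made up)


# A heuristic maps (query, hit) to a score multiplier.
Heuristic = Callable[[str, Hit], float]


def rerank(query: str, hits: Sequence[Hit], heuristics: Sequence[Heuristic]) -> list[Hit]:
    """Apply every heuristic at query time and sort by the adjusted score."""
    def final_score(hit: Hit) -> float:
        score = hit.bm25
        for heuristic in heuristics:
            score *= heuristic(query, hit)
        return score

    return sorted(hits, key=final_score, reverse=True)


def dampen_tests(query: str, hit: Hit) -> float:
    """Simplified test dampening: only checks the Go test suffix."""
    return 0.4 if hit.path.endswith("_test.go") else 1.0


hits = [Hit("context_test.go", 9.0), Hit("context.go", 8.0)]

print([h.path for h in rerank("context", hits, [])])              # ['context_test.go', 'context.go']
print([h.path for h in rerank("context", hits, [dampen_tests])])  # ['context.go', 'context_test.go']
```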

Why Others Struggle

Each competing tool has a characteristic failure mode:

Tool A: documentation and changelogs

Tool A’s ranking appears to weight raw term frequency heavily. Changelogs mention every feature by name. READMEs describe every module. Documentation references every API. These files contain every keyword — but they’re the last place a developer wants to land when searching for an implementation.

Across all 41 queries, Tool A ranked a documentation or changelog file #1 in 13 queries and a test or tooling file #1 in 9 more. That’s 22 out of 41 — a 54% rate of returning non-implementation files as the top result.

Tool C: inconsistent but improving

Tool C’s results are a mixed bag. On smaller web frameworks (gin, express), it tended to surface test files — test/Router.js for router, test/app.use.js for middleware. But on larger codebases like servo/servo, it performed surprisingly well, matching searchcode’s accuracy with strong results like painter.rs for render and async_html.rs for parse.

Tool C can scope to a single repository, but only for repos in its index. You must use the f.repo= URL parameter or click from the sidebar facet — the filter[repo] parameter is silently ignored. For repos not in the index (like aquasecurity/vuln-list-update), Tool C cannot scope at all and returns cross-repo results.

Tool B: examples and docs

Tool B performed well overall (75% on gin and express, 63% across all 16 four-way queries), but its failures skewed toward example files and documentation. For middleware in gin, it returned README.md. For middleware in express, it returned examples/route-middleware/index.js. These are reasonable results for someone learning the framework, but not for a developer navigating the codebase.

Tool B also requires authentication — you must be signed in to use it.

Repository Coverage

We tested 9 repositories across multiple hosting platforms:

| Repository | searchcode | Tool A |
|---|---|---|
| torvalds/linux | yes | yes |
| anomalyco/opencode | yes | yes |
| vuejs/core | yes | yes |
| rust-lang/regex | yes | yes |
| earthboundkid/requests | yes | yes |
| boyter/dcd | yes | yes |
| boyter/pincer | yes | no |
| golang-io/requests | yes | no |
| esr/loccount (non-GitHub) | yes | no |

Tool A’s public instance indexed 6 of 9 repos (67%). The three failures were smaller repos and a non-GitHub-hosted repo. searchcode indexed all 9 (100%).

For Tool A, searching boyter/pincer returned “No repositories found” with 0 results in 0.01 seconds — the repo simply isn’t in the index. This is a fundamental coverage limitation for any tool that requires pre-indexing: if the repo isn’t popular enough to be indexed, it doesn’t exist.

Beyond Search: code_analyze

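searchcode offers structural analysis capabilities that none of the other tools we tested provides. A single code_analyze call returns repository-level metrics such as file count, lines of code, total complexity, a language breakdown, and quality findings, along with a list of the most complex files.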

For example, analyzing rust-lang/regex:

| Metric | Value |
|---|---|
| Files | 381 |
| Code lines | 127,000 |
| Total complexity | 5,512 |
| Languages | 220 Rust files |
| Quality findings | 3,588 |

The most complex files list immediately reveals the architectural core:

| File | Complexity | Lines |
|---|---|---|
| ast/parse.rs | 304 | 5,497 |
| hir/parse.rs | 234 | 1,768 |
| dfa/dense.rs | 221 | 2,189 |

For a smaller project like erikbern/git-of-theseus, the analysis reveals the entire architecture at a glance:

| File | Complexity | Lines | Role |
|---|---|---|---|
| analyze.py | 99 | 540 | Core (68% of complexity) |
| survival_plot.py | 17 | 112 | Plotting |
| line_plot.py | 11 | 62 | Plotting |
| stack_plot.py | 11 | 59 | Plotting |
| utils.py | 3 | 13 | Helpers |

None of the other code search tools we tested offers anything comparable. Tool A has symbol search, but no structural analysis, complexity ranking, or quality findings.

MCP and AI Agent Integration

searchcode exposes its full capabilities through MCP (Model Context Protocol), making it directly usable by AI agents. The comparison with browser-based tools is significant:

| Capability | searchcode (MCP) | Browser-based tools |
|---|---|---|
| Output format | Structured JSON | HTML (requires parsing) |
| Code context | Configurable line context | Collapsed matches |
| Filtering | lang:, path:, regex, only-declarations, only-comments, only-strings, only-code | lang:, type:, repo: |
| Repo analysis | code_analyze (complexity, LOC, tech stack) | None |
| Auth required | No | Tool B requires sign-in |
| Repo coverage | Any public git repo | Varies by index |

The structural filters deserve special mention. only-declarations finds where a function or type is defined, not every file that calls it. only-comments finds design notes, TODOs, and documentation within code. only-strings finds error messages and user-facing text. These filters have no equivalent in any other tool tested.

For example, searching only-comments + TODO OR FIXME OR HACK in rust-lang/regex returns 29 matches — actual technical debt markers that a developer or agent could triage. No other tool can isolate these without manually filtering results.
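
To show what a comment-scoped search buys you, the sketch below does a deliberately naive version by hand: it looks for TODO, FIXME, or HACK only in the text after a `//` on each line. It is not how searchcode implements only-comments. It ignores block comments and would be fooled by `//` inside a string literal, which is exactly the kind of edge case a structural filter handles for you.

```python
import re
from typing import Iterator

MARKER = re.compile(r"\b(?:TODO|FIXME|HACK)\b")


def debt_markers_in_comments(source: str) -> Iterator[tuple[int, str]]:
    """Yield (line_number, comment_text) for `//` comments containing a marker."""
    for number, line in enumerate(source.splitlines(), start=1):
        _, separator, comment = line.partition("//")
        if separator and MARKER.search(comment):
            yield number, comment.strip()


sample = '''fn parse() {
    let label = "TODO";  // shown in the UI
    // TODO: handle nested groups
    let x = 1; // FIXME overflow
}'''

for number, comment in debt_markers_in_comments(sample):
    print(number, comment)
# 3 TODO: handle nested groups
# 4 FIXME overflow
```

The string literal "TODO" on line 2 is skipped because it sits in code, not in a comment. A plain text search would have reported it.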

Conclusion

Code search ranking is a solved problem that most tools haven’t solved. Across 41 queries and 8 repositories, searchcode returned the correct #1 result 85% of the time, nearly 3x better than Tool A (29%). On the 16 queries where all four tools could be compared, it scored 88%, substantially ahead of Tool B (63%) and Tool C (50%). The gap isn’t due to sophisticated machine learning or massive infrastructure. It comes from five simple heuristics that model what developers actually want: implementation files over tests, code over documentation, complex logic over boilerplate, and files whose names and directories match the query.

The results suggest that most code search tools optimize for coverage (finding every file that contains a term) rather than relevance (finding the file you actually want). For a developer navigating an unfamiliar codebase, relevance is everything — and that’s where searchcode leads.