Q&A: Sourcegraph’s Universal Code Search Tool

In software development, code search is a way to better navigate and understand code. But it’s often overlooked, and most development tools and coding environments offer only clunky, limited search functionality.

Tech startup Sourcegraph aims to change that with its universal code search tool of the same name, which is meant to make searching code as seamless as doing a Google search on the web. To achieve that efficiency, Sourcegraph models code and its dependencies as a graph and performs queries on that graph in real time.

Unlike Facebook’s internal search tool or Google’s code search for its own open-source projects, Sourcegraph makes its source code publicly available. The tool is free for individuals and teams of up to 10, and is available to larger teams through tiered pricing. It supports over 30 programming languages and integrates with developer tools such as GitHub and GitLab for code hosting, Codecov for code coverage, and Jira Software for project management.

Sourcegraph, which is based in San Francisco, closed a US $23 million Series B funding round last month and has raised $43 million to date. Engineering teams at Adidas, Cloudflare, Lyft, Uber, Yelp, and others already use the tool.

Sourcegraph co-founder and CEO Quinn Slack spoke to IEEE Spectrum about the inner workings of the company’s universal code search functionality and the advantages of code search for software developers.

This interview has been edited and condensed for clarity.

IEEE Spectrum: What are the benefits of code search for developers?

Quinn Slack: Code search makes it easy to find usage examples for your company’s own code. You can see how the engineers who are most experienced with a certain type of code are using it in your code base. You can also search some of the internal libraries you need to work with and see how other developers are using them. You’ll learn new techniques or you might find that everyone else is making a common mistake, or you might figure out how to do something better and educate the rest of your team.

Slack: Search is where people go to find answers to questions, which means you need to have all the information to show them the best answers. With universal code search, you need to have all the code—not just the latest version but the entire history—from every repository. An analogy I like is Wikipedia having its own search box, but almost everyone searches on Google because Google has all the answers.

Making code search universal is tough. We had to write a new search back-end from the ground up, and we had to come up with a common way to make it understand all these different languages. Then, we had to integrate it with other tools and services, so we can get information from code coverage tools, logging tools, tracing tools, and feature flag tools, to name a few. We have a hybrid search, which means that if someone pushes some code, then you’ll be able to search it on Sourcegraph instantly—even if it isn’t indexed yet.
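
The hybrid idea Slack describes can be illustrated with a toy sketch. This is not Sourcegraph's implementation; the class `HybridSearch` and its methods are hypothetical names, and whitespace tokenization stands in for real code parsing. Indexed files are answered from an inverted index, while files that were pushed but not yet indexed are scanned directly, so new code is searchable right away.

```python
from collections import defaultdict


class HybridSearch:
    """Toy hybrid search: an inverted index plus a live scan of unindexed files."""

    def __init__(self):
        self.index = defaultdict(set)  # token -> paths already indexed
        self.files = {}                # path -> content already indexed
        self.pending = {}              # path -> content pushed but not yet indexed

    def push(self, path, content):
        # A new push supersedes any indexed copy and is searchable immediately.
        self._drop_from_index(path)
        self.pending[path] = content

    def index_pending(self):
        # Background step (assumed): move pending files into the inverted index.
        for path, content in self.pending.items():
            self.files[path] = content
            for token in set(content.split()):
                self.index[token].add(path)
        self.pending.clear()

    def _drop_from_index(self, path):
        old = self.files.pop(path, None)
        if old is not None:
            for token in set(old.split()):
                self.index[token].discard(path)

    def search(self, token):
        # Fast path: look up the index. Slow path: scan files not yet indexed.
        hits = set(self.index.get(token, ()))
        hits.update(p for p, c in self.pending.items() if token in c.split())
        return sorted(hits)
```

In this sketch, a file passed to `push` shows up in `search` results before `index_pending` has run, which is the behavior the hybrid approach is meant to provide.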

IEEE Spectrum: Why did you decide to model code as a graph?

Slack: All code is connected. When writing code, you’re using libraries and calling services written by other developers. We had to understand the relationship of one piece of code to every other piece of code to answer the questions users have when searching code.
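
To make the graph model concrete, here is a minimal sketch under stated assumptions; it does not reflect Sourcegraph's actual data model. The class `CodeGraph` and the symbol names are hypothetical. Symbols are nodes, a "uses" relationship is a directed edge, and questions such as "who calls this?" become graph traversals.

```python
from collections import defaultdict, deque


class CodeGraph:
    """Toy code graph: nodes are symbols, edges mean 'caller uses callee'."""

    def __init__(self):
        self.uses = defaultdict(set)     # symbol -> symbols it depends on
        self.used_by = defaultdict(set)  # symbol -> symbols that depend on it

    def add_use(self, caller, callee):
        # Store the edge in both directions so lookups are cheap either way.
        self.uses[caller].add(callee)
        self.used_by[callee].add(caller)

    def references(self, symbol):
        # Direct usages: everything that uses this symbol.
        return sorted(self.used_by.get(symbol, ()))

    def transitive_dependents(self, symbol):
        # Breadth-first walk over reverse edges: everything affected by a change.
        seen = set()
        queue = deque([symbol])
        while queue:
            current = queue.popleft()
            for dependent in self.used_by.get(current, ()):
                if dependent not in seen:
                    seen.add(dependent)
                    queue.append(dependent)
        seen.discard(symbol)  # a dependency cycle should not list the symbol itself
        return sorted(seen)
```

Here `references` answers the "find usage examples" question from earlier in the interview, and `transitive_dependents` shows how a change to one library function could ripple out to the code that relies on it.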