darkowlzz

go-git - Grep


go-git now supports git-grep. Follow the “go-git” tag to read other go-git related posts.

  • git-grep
  • The Amazing Worktree
  • Grep Options
  • Option Validation
  • Grep Results

git-grep

git-grep - Print lines matching a pattern

  • git-scm.com

git-grep performs pattern matching in the tracked files in the work tree, blobs registered in the index file, or blobs in given tree objects. Each match is printed to stdout as the file name followed by the content of the matching line. Options can add more attributes to the result, such as line numbers with the -n option.

Example result:

$ git grep "clean"
...
content/post/git-clean.md:with the `git clean` command. By default, `git clean` would delete any untracked
...

Internally, git performs a regex match of the provided pattern against the content of every tracked file. The beauty of this lies in how the file content is queried through the worktree: instead of checking out the given commit or reference and then searching through all the files, the worktree is used to reach every required file at its correct version.

The Amazing Worktree

Git worktree maintains a collection of all the files in a repo at a given point. A commit, a branch, a tag, or any other reference has its own tree, which consists of all the files in the repo at that reference. When, say, a branch is checked out, the tree of the commit that the branch points to is loaded and becomes the current worktree.

In go-git, Worktree provides getTreeFromCommitHash(), which returns the tree for a given commit hash. Grep can obtain that tree and search it directly, avoiding an actual checkout, so the repository state never changes. The tree provides a file iterator that traverses all of its file nodes.

To implement this in go-git, the GrepOptions and GrepResult types were created to hold the options and the results of the operation, respectively.

Grep Options

go-git's GrepOptions consists of Pattern, PathSpec, InvertMatch, CommitHash and ReferenceName. An example of GrepOptions is:

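opts := git.GrepOptions{
	// Regex matched against each line of each file.
	Pattern:    regexp.MustCompile("import"),
	// Commit whose tree is searched.
	CommitHash: plumbing.NewHash("2d55a722f3c3ecc36da919dfd8b6de38352f3507"),
	// Only files whose path matches are searched.
	PathSpec:   regexp.MustCompile("go/"),
}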

These options tell go-git to match Pattern against the tree of CommitHash, searching only the files whose path matches PathSpec.

Since go-git is a library, it accepts pre-compiled regexp objects (*regexp.Regexp) for the pattern and pathspec instead of raw strings.
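One practical consequence of taking compiled regexps is that invalid input is caught by the caller, before any tree is walked, and a compiled pattern can be reused across calls. The helper below, `compileGrepPatterns`, is hypothetical; it relies only on the standard library's `regexp.Compile`.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileGrepPatterns turns user-supplied strings into compiled regexps,
// reporting which input was invalid. Hypothetical helper for illustration.
func compileGrepPatterns(pattern, pathSpec string) (*regexp.Regexp, *regexp.Regexp, error) {
	p, err := regexp.Compile(pattern)
	if err != nil {
		return nil, nil, fmt.Errorf("invalid pattern %q: %w", pattern, err)
	}
	ps, err := regexp.Compile(pathSpec)
	if err != nil {
		return nil, nil, fmt.Errorf("invalid pathspec %q: %w", pathSpec, err)
	}
	return p, ps, nil
}

func main() {
	// An unbalanced parenthesis is rejected before any grep runs.
	if _, _, err := compileGrepPatterns("import (", "go/"); err != nil {
		fmt.Println(err)
	}
	p, ps, _ := compileGrepPatterns("import", "go/")
	fmt.Println(p.MatchString("import \"fmt\""), ps.MatchString("go/main.go"))
}
```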

Option Validation

GrepOptions is validated to ensure that the options passed are not ambiguous. For example, passing both a commit hash and a reference is ambiguous, since either one alone identifies the tree to search. If neither a commit hash nor a reference is passed, the tree at the repo's HEAD is used.

Grep Results

git-grep sends its results to stdout as strings, but that doesn't suit go-git. Since it's a library, it returns values of type GrepResult, which hold each attribute of a result in its own field.

GrepResult contains FileName, LineNumber, Content and TreeName. The result of a grep in go-git is a slice of GrepResult.

Example usage:

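r, err := git.PlainOpen("path/to/a/git/repo")
if err != nil {
	log.Fatal(err)
}
worktree, err := r.Worktree() // fails for bare repositories
if err != nil {
	log.Fatal(err)
}
grepResults, err := worktree.Grep(&git.GrepOptions{
	Pattern:     regexp.MustCompile("hello world"),
	InvertMatch: true, // select the lines that do NOT match Pattern
})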
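Because Grep() returns a slice of GrepResult rather than text, callers can format results however they like. The sketch below uses a hypothetical `grepResult` type mirroring the four fields named above and renders it in the tree:file:line:content shape that `git grep -n <pattern> <tree>` prints. The `String` method and `render` helper are illustrations, not go-git's API.

```go
package main

import (
	"fmt"
	"strings"
)

// grepResult is a hypothetical mirror of the fields described for GrepResult.
type grepResult struct {
	FileName   string
	LineNumber int
	Content    string
	TreeName   string // e.g. a commit hash or reference name
}

// String formats one result like `git grep -n <pattern> <tree>` output.
func (r grepResult) String() string {
	return fmt.Sprintf("%s:%s:%d:%s", r.TreeName, r.FileName, r.LineNumber, r.Content)
}

// render joins a slice of results into git-grep-style text, one per line.
func render(results []grepResult) string {
	lines := make([]string, len(results))
	for i, r := range results {
		lines[i] = r.String()
	}
	return strings.Join(lines, "\n")
}

func main() {
	results := []grepResult{
		{FileName: "go/main.go", LineNumber: 3, Content: "import \"fmt\"", TreeName: "HEAD"},
	}
	fmt.Println(render(results))
}
```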

Like Clean(), Grep() is a method of Worktree; many of go-git's operations are performed on the worktree.

As with git-clean, this is not a complete implementation of git-grep; for the same reason, it is kept simple. Right now, Pattern and PathSpec each accept a single pattern, though they should be able to accept a slice of patterns. It also can't grep untracked files. These features and more can be added on top of this base implementation.

go-git is a really nice project to contribute to by implementing well-known git concepts, and the maintainers are great, with quick reviews and responses. Thanks to their years of work on the underlying plumbing, building blocks like Worktree, the git objects, Storer, etc. simplify implementing various operations on top.
