gitrim

package module
v0.2.0 Latest
Warning

This package is not in the latest version of its module.

Published: Sep 29, 2023 License: MIT Imports: 14 Imported by: 0

README

gitrim

Git Trim is a deterministic tool to manipulate trees contained in git commits.


Trim/Filter Git History

Often, the history and files contained in a git repository need to be filtered in some way. A simple case is a contributor who is only allowed to access part of the repo. This can be done with git-filter-branch, although it is not very user-friendly.

gitrim does just that:

  1. Read the git commit history.
  2. Starting from the earliest commit, filter the tree contained in each commit and copy over the author, committer, and commit message. The parents are replaced with the newly created commits, and GPG signatures are omitted.

As long as the filters don't change, the generated git history is deterministic and can be mapped one-to-one back to the original repo.

Modifications made in the trimmed/filtered repo can be recreated in the original repo by

  1. filtering the changes
  2. applying them to the original repo, copying over the author, committer, and commit message, and adding the original commits as parents. GPG signatures are again omitted.

The commits in the filtered/trimmed repo will match the commits reproduced from the original repo, provided they carry no GPG signatures.

Filters

All filters implement the Filter interface.

The pattern used is a more restricted version of the pattern used by .gitignore.

  • ** matches multiple directory levels, and it can only appear once in a pattern.
  • * matches a single level of names.
  • ! and escapes are unsupported.
  • paths are always relative to the root. For example, LICENSE will only match LICENSE in the root of the repo. To match LICENSE at all directory levels, use **/LICENSE.

Refer to the documentation on PatternFilter for details.
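To make these rules concrete, here is a minimal, stdlib-only sketch of the matching semantics described above. The helper matchPattern is hypothetical and is not gitrim's PatternFilter; it assumes each pattern segment is matched against one path segment with Go's path.Match, and that ** stands for zero or more whole directory levels.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchPattern is a hypothetical sketch of the restricted rules above, not gitrim's PatternFilter:
//   - patterns and paths are relative to the repo root
//   - "*" matches within a single path segment (via path.Match)
//   - "**" matches zero or more whole directory levels and may appear only once
func matchPattern(pattern, p string) (bool, error) {
	pat := strings.Split(pattern, "/")
	segs := strings.Split(p, "/")

	star := -1
	for i, s := range pat {
		if s == "**" {
			if star >= 0 {
				return false, fmt.Errorf("pattern %q: ** may appear only once", pattern)
			}
			star = i
		}
	}

	// matchSegs checks each pattern segment against the path segment at the same position.
	matchSegs := func(ps, xs []string) (bool, error) {
		for i := range ps {
			ok, err := path.Match(ps[i], xs[i])
			if err != nil || !ok {
				return false, err
			}
		}
		return true, nil
	}

	if star < 0 {
		// without "**" the pattern must cover the whole path, level by level
		if len(pat) != len(segs) {
			return false, nil
		}
		return matchSegs(pat, segs)
	}

	// with "**": the prefix must match the leading segments and the suffix the trailing ones
	prefix, suffix := pat[:star], pat[star+1:]
	if len(segs) < len(prefix)+len(suffix) {
		return false, nil
	}
	if ok, err := matchSegs(prefix, segs[:len(prefix)]); !ok || err != nil {
		return false, err
	}
	return matchSegs(suffix, segs[len(segs)-len(suffix):])
}

func main() {
	for _, p := range []string{"LICENSE", "docs/LICENSE", "plumbing/format/index.go"} {
		a, _ := matchPattern("LICENSE", p)
		b, _ := matchPattern("**/LICENSE", p)
		c, _ := matchPattern("plumbing/**/*.go", p)
		fmt.Println(p, a, b, c)
	}
}
```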

Example

See Example

CLI

  • filter-git-hist filters the history of a git repo and outputs it to another git repo.
  • expand-git-commit expands the new commit back to the original repo.
  • dump-git-tree prints the files of a branch/tree/commit/head. Optionally, filters can be applied.
  • remove-git-gpg removes gpg signatures for commits.

Documentation

Overview

gitrim is a filtered git repo generator. It creates a linear history of commits from another linear history of commits by applying selected filters to the entries contained in the commit trees.

See FilterLinearHistory and ExpandCommit for details.

See Filter and PatternFilter for how to use the filters.

Example

Example of cloning a repo into an in-memory store, selecting several commits starting from a specific commit, and filtering them into another in-memory store.

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/go-git/go-git/v5"
	"github.com/go-git/go-git/v5/plumbing"
	"github.com/go-git/go-git/v5/storage/memory"

	"github.com/fardream/gitrim"
)

func OrPanic(err error) {
	if err != nil {
		log.Panic(err)
	}
}

// Example of cloning a repo into an in-memory store, selecting several commits starting from a specific commit, and filtering them into another in-memory store.
func main() {
	// URL for the repo
	url := "https://github.com/fardream/gmsk"
	// commit to start from
	headcommithash := plumbing.NewHash("e0235243feee0ec1bde865be5fa2c0b761eff804")

	// Clone repo
	r, err := git.Clone(memory.NewStorage(), nil, &git.CloneOptions{
		URL: url,
	})
	OrPanic(err)

	// find the commit
	headcommit, err := r.CommitObject(headcommithash)
	OrPanic(err)

	// obtain the history of the repo.
	hist, err := gitrim.GetLinearHistory(context.Background(), headcommit, plumbing.ZeroHash, 10)
	OrPanic(err)

	// select 3 files
	orfilter, err := gitrim.NewOrFilterForPatterns(
		"README.md",
		"LICENSE",
		"capis.go",
	)
	OrPanic(err)

	// output storer
	outputfs := memory.NewStorage()

	newhist, err := gitrim.FilterLinearHistory(context.Background(), hist, outputfs, orfilter)
	OrPanic(err)

	// Note the result is deterministic
	fmt.Printf("From %d commits, generated %d commits.\nHead commit is:\n", len(hist), len(newhist))
	fmt.Println(newhist[5].String())

}
Output:

From 10 commits, generated 6 commits.
Head commit is:
commit 65e88d11b1331c3031945587c4c28635886fdc92
Author: Chao Xu <fardream@users.noreply.github.com>
Date:   Sat Sep 02 20:19:42 2023 -0400

    Update doc for slice input. (#57)

Constants

This section is empty.

Variables

This section is empty.

Functions

func CopyTree

func CopyTree(ctx context.Context, t *object.Tree, s storer.Storer) error

CopyTree copies the given tree into the storer.Storer. If the tree already exists in s, the function returns a nil error right away.

func DumpTree

func DumpTree(ctx context.Context, prepath []string, tree *object.Tree, filter Filter, output io.Writer) error

DumpTree writes the file entries in this tree and its sub trees to an io.Writer.

func ExpandCommit

func ExpandCommit(
	ctx context.Context,
	sourceStorer storer.Storer,
	filteredOrig *object.Commit,
	filteredNew *object.Commit,
	target *object.Commit,
	targetStorer storer.Storer,
	filter Filter,
) (*object.Commit, error)

ExpandCommit takes the changes that filteredNew contains relative to filteredOrig and tries to apply them to target, generating a new commit.

func ExpandTree

func ExpandTree(
	ctx context.Context,
	sourceStorer storer.Storer,
	filteredOrig *object.Tree,
	filteredNew *object.Tree,
	target *object.Tree,
	targetStorer storer.Storer,
	filter Filter,
) (*object.Tree, error)

ExpandTree takes the changes made in the filteredNew tree relative to the filteredOrig tree and applies them to the target tree, returning a new tree.

func FilterCommit

func FilterCommit(
	ctx context.Context,
	c *object.Commit,
	parents []*object.Commit,
	s storer.Storer,
	filters Filter,
) (*object.Commit, bool, error)

FilterCommit creates a new object.Commit in the given storer.Storer by applying filters to the tree in the input object.Commit. Optionally, parent commits can be set on the generated commit. The author info, committer info, and commit message are copied from the input commit; however, GPG signature information is dropped. The function returns three values: the new commit, a boolean indicating whether the returned commit is actually a parent containing the same tree, and an error.

  • If the tree is empty after filtering, nil is returned for the commit, isparent is set to false, and the error is also nil.
  • If the generated tree is exactly the same as the parent's, the parent commit is returned and isparent is set to true.

Submodules will be silently ignored.

func FilterDFSPath added in v0.2.0

func FilterDFSPath(ctx context.Context, dfspath []*object.Commit, s storer.Storer, filter Filter) ([]*object.Commit, error)

FilterDFSPath filters a slice of object.Commit that comes from a depth-first search from a commit. This means the earlier commits should come first in the input slice dfspath, and the head (latest) commit should come last. dfspath can be obtained by GetDFSPath. The result is saved into a storer.Storer.

  • The commits without parents will become the new roots of the filtered repo.
  • Filtered commits containing empty trees are dropped, and along that path the next non-nil commit becomes the new root.
  • Filtered commits containing exactly the same tree as their parent are also dropped, and the commit after a dropped commit takes the dropped commit's parent as its own parent.

The newly created commits have exactly the same author info, committer info, and commit message, but with parents correctly relinked and GPG signature information dropped.
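The three rules can be sketched on a toy graph where each commit already carries the id of its filtered tree ("" for empty). commit, newCommit and filterDFS are hypothetical types and helpers, not gitrim API; applying the same-tree rule only when exactly one generated parent remains is an assumption made here for merge commits.

```go
package main

import "fmt"

// commit is a toy stand-in for an original commit; tree is the hypothetical
// id of its tree after filtering, "" meaning the filtered tree is empty.
type commit struct {
	hash, tree string
	parents    []*commit
}

// newCommit is a toy stand-in for a generated commit.
type newCommit struct {
	from, tree string // from: hash of the original commit
	parents    []*newCommit
}

// filterDFS sketches the FilterDFSPath rules; dfspath must list parents before children.
func filterDFS(dfspath []*commit) []*newCommit {
	mapped := map[string]*newCommit{} // original hash -> generated commit; nil if none
	var out []*newCommit
	for _, c := range dfspath {
		if c.tree == "" {
			mapped[c.hash] = nil // empty tree: dropped, children lose this parent
			continue
		}
		// collect the generated counterparts of the parents, skipping dropped ones and duplicates
		var parents []*newCommit
		for _, p := range c.parents {
			np := mapped[p.hash]
			if np == nil || contains(parents, np) {
				continue
			}
			parents = append(parents, np)
		}
		// assumption: the same-tree rule applies only when one generated parent remains
		if len(parents) == 1 && parents[0].tree == c.tree {
			mapped[c.hash] = parents[0] // dropped; children link to this parent instead
			continue
		}
		nc := &newCommit{from: c.hash, tree: c.tree, parents: parents}
		mapped[c.hash] = nc
		out = append(out, nc)
	}
	return out
}

func contains(xs []*newCommit, x *newCommit) bool {
	for _, y := range xs {
		if y == x {
			return true
		}
	}
	return false
}

func main() {
	a := &commit{hash: "a", tree: "t1"}
	b := &commit{hash: "b", tree: "t1", parents: []*commit{a}} // same tree as a: dropped
	c := &commit{hash: "c", tree: "t2", parents: []*commit{b}} // relinked to a's copy
	d := &commit{hash: "d", tree: "", parents: []*commit{a}}   // empty tree: dropped
	e := &commit{hash: "e", tree: "t3", parents: []*commit{d}} // becomes a new root
	m := &commit{hash: "m", tree: "t4", parents: []*commit{c, e}}
	for _, nc := range filterDFS([]*commit{a, b, c, d, e, m}) {
		ps := []string{}
		for _, p := range nc.parents {
			ps = append(ps, p.from)
		}
		fmt.Println(nc.from, nc.tree, ps)
	}
}
```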

Example

Example of cloning a repo into an in-memory store, selecting several commits starting from a specific commit, and filtering them into another in-memory store.

package main

import (
	"context"
	"fmt"
	"log"
	"strings"

	"github.com/go-git/go-git/v5"
	"github.com/go-git/go-git/v5/plumbing"
	"github.com/go-git/go-git/v5/storage/memory"

	"github.com/fardream/gitrim"
)

func FilterDFSPanic(err error) {
	if err != nil {
		log.Panic(err)
	}
}

func removeEmptyLines(s string) string {
	lines := strings.Split(s, "\n")
	r := make([]string, 0, len(lines))
	for _, line := range lines {
		if len(strings.TrimSpace(line)) == 0 {
			r = append(r, "")
		} else {
			r = append(r, line)
		}
	}

	return strings.Join(r, "\n")
}

// Example of cloning a repo into an in-memory store, selecting several commits starting from a specific commit, and filtering them into another in-memory store.
func main() {
	// URL for the repo
	url := "https://github.com/go-git/go-git"
	// commit to start from
	headcommithash := plumbing.NewHash("7d047a9f8a43bca9d137d8787278265dd3415219")

	// Clone repo
	r, err := git.Clone(memory.NewStorage(), nil, &git.CloneOptions{
		URL: url,
	})
	FilterDFSPanic(err)

	// find the commit
	headcommit, err := r.CommitObject(headcommithash)
	FilterDFSPanic(err)

	graph, err := gitrim.GetDFSPath(context.Background(), headcommit, []plumbing.Hash{plumbing.NewHash("99e2f85843878671b028d4d01bd4668676226dd1")}, 90)

	FilterDFSPanic(err)

	// select README.md, LICENSE, and all .go files under plumbing/
	orfilter, err := gitrim.NewOrFilterForPatterns(
		"README.md",
		"LICENSE",
		"plumbing/**/*.go",
	)
	FilterDFSPanic(err)

	// output storer
	outputfs := memory.NewStorage()

	newgraph, err := gitrim.FilterDFSPath(context.Background(), graph, outputfs, orfilter)
	FilterDFSPanic(err)

	// Note the result is deterministic
	fmt.Printf("From %d commits, generated %d commits.\nHead commit is:\n", len(graph), len(newgraph))

	commitinfo := newgraph[5].String()
	commitinfo = removeEmptyLines(strings.ReplaceAll(commitinfo, "\r\n", "\n"))
	fmt.Println(commitinfo)

	lastcommit := newgraph[88]
	fmt.Println("parents:")
	fmt.Println(lastcommit.ParentHashes[0])
	fmt.Println(lastcommit.ParentHashes[1])

}
Output:

From 241 commits, generated 89 commits.
Head commit is:
commit d5f3d5523dcd0e977f555831385eae31ccd8a30d
Author: cui fliter <imcusg@gmail.com>
Date:   Thu Sep 22 16:27:41 2022 +0800

    *: fix some typos (#567)

    Signed-off-by: cui fliter <imcusg@gmail.com>

    Signed-off-by: cui fliter <imcusg@gmail.com>

parents:
a6fae4bd1c424c3e7da6bc5c4ac8397a9f28db92
f09651ec4e2589543cf3ce89167c46cc43f3c0cd

func FilterLinearHistory

func FilterLinearHistory(
	ctx context.Context,
	hist []*object.Commit,
	s storer.Storer,
	filter Filter,
) ([]*object.Commit, error)

FilterLinearHistory applies filters to a sequence of commits of a linear history and produces new commits in the provided storer.Storer. Similar to FilterDFSPath:

  • The first commit will become the new root of the filtered repo.
  • Filtered commits containing empty trees cause all previous commits to be dropped. The next commit with non-empty tree will become the new root.
  • Filtered commits containing exactly the same tree as their parent are also dropped, and the commit after a dropped commit takes the dropped commit's parent as its own parent.

The input commits can be obtained from GetLinearHistory.
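The linear-history rules can be sketched the same way, with each input commit carrying a hypothetical id for its filtered tree ("" when the filtered tree is empty). input, output and filterLinear are illustrative names, not gitrim API; the sketch only models parent linking, not tree storage.

```go
package main

import "fmt"

// input is a toy stand-in for an original commit: msg identifies it and tree is
// the hypothetical id of its tree after filtering, "" meaning an empty tree.
type input struct{ msg, tree string }

// output is a toy stand-in for a generated commit.
type output struct {
	msg, tree string
	parent    *output
}

// filterLinear applies the three rules documented for FilterLinearHistory.
func filterLinear(hist []input) []*output {
	var out []*output
	for _, c := range hist {
		switch {
		case c.tree == "":
			// an empty filtered tree drops everything generated so far;
			// the next non-empty commit becomes the new root
			out = nil
		case len(out) > 0 && out[len(out)-1].tree == c.tree:
			// same tree as the parent: drop this commit
		default:
			var parent *output
			if len(out) > 0 {
				parent = out[len(out)-1]
			}
			out = append(out, &output{msg: c.msg, tree: c.tree, parent: parent})
		}
	}
	return out
}

func main() {
	hist := []input{{"c1", "t1"}, {"c2", "t1"}, {"c3", ""}, {"c4", "t2"}, {"c5", "t3"}, {"c6", "t3"}}
	for _, c := range filterLinear(hist) {
		p := "<root>"
		if c.parent != nil {
			p = c.parent.msg
		}
		fmt.Println(c.msg, "parent:", p)
	}
}
```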

func FilterTree

func FilterTree(
	ctx context.Context,
	t *object.Tree,
	prepath []string,
	s storer.Storer,
	filter Filter,
) (*object.Tree, error)

FilterTree filters the entries of the tree by the filter and stores the result in the given storer.Storer. If the tree is empty after filtering, nil is returned for both the tree and the error.

Note: Submodules will be silently ignored.

func GetDFSPath added in v0.2.0

func GetDFSPath(
	ctx context.Context,
	head *object.Commit,
	rootcommits []plumbing.Hash,
	maxGeneration int,
) ([]*object.Commit, error)

GetDFSPath computes a deterministic depth-first search path from a head commit. The returned slice has the head commit as its last element and one of the root commits as its first. The search always visits the first parent, then the second, and so on; therefore the commits returned first are the history that git reports with the "--first-parent" option.

rootcommits can optionally be set so that the search stops along a path once one of those commits is seen. The maxGeneration limit can be turned off by setting it to 0 or a negative value.
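A minimal sketch of the traversal order on a toy graph: a post-order depth-first search that explores parents in order, so that parents precede children and the head is last. commit and dfsPath are hypothetical, and counting the head as generation 1 is an assumption about how maxGeneration is measured; gitrim's implementation may differ in these details.

```go
package main

import (
	"fmt"
	"strings"
)

// commit is a toy stand-in for *object.Commit.
type commit struct {
	hash    string
	parents []*commit
}

// dfsPath visits the first parent, then the second, and so on, appending a commit
// only after its parents, so the head comes last. A commit in roots is included but
// its parents are not explored. maxGeneration <= 0 disables the generation limit.
func dfsPath(head *commit, roots map[string]bool, maxGeneration int) []*commit {
	var out []*commit
	seen := map[string]bool{}
	var visit func(c *commit, gen int)
	visit = func(c *commit, gen int) {
		if seen[c.hash] {
			return
		}
		seen[c.hash] = true
		if !roots[c.hash] && (maxGeneration <= 0 || gen < maxGeneration) {
			for _, p := range c.parents {
				visit(p, gen+1)
			}
		}
		out = append(out, c)
	}
	visit(head, 1)
	return out
}

func hashes(cs []*commit) string {
	s := make([]string, len(cs))
	for i, c := range cs {
		s[i] = c.hash
	}
	return strings.Join(s, " ")
}

func main() {
	a := &commit{hash: "a"}
	b := &commit{hash: "b", parents: []*commit{a}}
	c := &commit{hash: "c", parents: []*commit{a}}
	m := &commit{hash: "m", parents: []*commit{b, c}} // merge: b is the first parent
	h := &commit{hash: "h", parents: []*commit{m}}
	fmt.Println(hashes(dfsPath(h, nil, 0))) // a b c m h
}
```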

Example
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/go-git/go-git/v5"
	"github.com/go-git/go-git/v5/plumbing"
	"github.com/go-git/go-git/v5/storage/memory"

	"github.com/fardream/gitrim"
)

func GetDFSPathPanic(err error) {
	if err != nil {
		log.Panic(err)
	}
}

func main() {
	// URL for the repo
	url := "https://github.com/go-git/go-git"
	// commit to start from
	headcommithash := plumbing.NewHash("7d047a9f8a43bca9d137d8787278265dd3415219")

	// Clone repo
	r, err := git.Clone(memory.NewStorage(), nil, &git.CloneOptions{
		URL: url,
	})
	GetDFSPathPanic(err)

	// find the commit
	headcommit, err := r.CommitObject(headcommithash)
	GetDFSPathPanic(err)

	graph, err := gitrim.GetDFSPath(context.Background(), headcommit, nil, 0)

	GetDFSPathPanic(err)

	fmt.Println(len(graph))
	fmt.Println(graph[0].Hash.String())
	fmt.Println(graph[len(graph)-1].Hash.String())

}
Output:

1986
5d7303c49ac984a9fec60523f2d5297682e16646
7d047a9f8a43bca9d137d8787278265dd3415219

func GetHash

func GetHash(o object.Object) (*plumbing.Hash, error)

GetHash returns the hash of the

func GetLinearHistory

func GetLinearHistory(
	ctx context.Context,
	head *object.Commit,
	startHash plumbing.Hash,
	numCommit int,
) ([]*object.Commit, error)

GetLinearHistory produces the linear history from a given head commit.

  • The number of commits included in the history can be limited by numCommit. A limit <= 0 indicates no limit on how many commits can be returned.
  • The start commit can be specified by startHash.

It returns an error when a commit in the resulting history has more than one parent.
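The walk described above can be sketched on a toy commit graph. commit and linearHistory are hypothetical; the sketch assumes the result is ordered oldest first (as FilterLinearHistory expects), that the start commit is included as the oldest entry, and that an empty start means no start commit is given.

```go
package main

import "fmt"

// commit is a toy stand-in for *object.Commit.
type commit struct {
	hash    string
	parents []*commit
}

// linearHistory walks back from head along the single parent and stops at the
// commit whose hash equals start (if start != ""), after numCommit commits
// (if numCommit > 0), or at a root. A merge commit on the way is an error.
func linearHistory(head *commit, start string, numCommit int) ([]*commit, error) {
	var rev []*commit
	for c := head; c != nil; {
		rev = append(rev, c)
		if c.hash == start || (numCommit > 0 && len(rev) >= numCommit) {
			break
		}
		switch len(c.parents) {
		case 0:
			c = nil // reached a root commit
		case 1:
			c = c.parents[0]
		default:
			return nil, fmt.Errorf("commit %s has more than one parent", c.hash)
		}
	}
	// reverse so that the oldest commit comes first
	for i, j := 0, len(rev)-1; i < j; i, j = i+1, j-1 {
		rev[i], rev[j] = rev[j], rev[i]
	}
	return rev, nil
}

func main() {
	a := &commit{hash: "a"}
	b := &commit{hash: "b", parents: []*commit{a}}
	c := &commit{hash: "c", parents: []*commit{b}}
	d := &commit{hash: "d", parents: []*commit{c}}
	h, _ := linearHistory(d, "", 2)
	fmt.Println(h[0].hash, h[1].hash) // c d
}
```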

func LoadPatternStringFromString

func LoadPatternStringFromString(str string, ignoreUnsupported bool) ([]string, error)

LoadPatternStringFromString loads patterns from the string content of a pattern file such as .gitignore. As with LoadPatternFilterFromString, a false ignoreUnsupported makes it return an error if unsupported patterns are encountered.
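A sketch of how such a loader might work, assuming blank lines and # comments are skipped and that a leading ! or a backslash escape counts as unsupported (the two features the pattern rules list as unsupported). loadPatterns is hypothetical; in particular, skipping unsupported lines when ignoreUnsupported is true is an assumption, not documented behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// loadPatterns is a hypothetical sketch of reading a .gitignore-like file.
// Blank lines and "#" comments are skipped and surrounding whitespace is trimmed.
// Negations ("!") and "\" escapes are treated as unsupported: they cause an error
// unless ignoreUnsupported is true, in which case this sketch skips them.
func loadPatterns(content string, ignoreUnsupported bool) ([]string, error) {
	var out []string
	for i, line := range strings.Split(content, "\n") {
		line = strings.TrimSpace(line) // also drops a trailing "\r"
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if strings.HasPrefix(line, "!") || strings.Contains(line, `\`) {
			if ignoreUnsupported {
				continue
			}
			return nil, fmt.Errorf("line %d: unsupported pattern %q", i+1, line)
		}
		out = append(out, line)
	}
	return out, nil
}

func main() {
	src := "# keep docs\nREADME.md\n\n!secret.txt\n**/LICENSE\n"
	if _, err := loadPatterns(src, false); err != nil {
		fmt.Println("error:", err)
	}
	ps, _ := loadPatterns(src, true)
	fmt.Println(ps) // [README.md **/LICENSE]
}
```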

func RemoveGPGForDFSPath added in v0.2.0

func RemoveGPGForDFSPath(ctx context.Context, dfspath []*object.Commit, s storer.Storer) ([]*object.Commit, error)

RemoveGPGForDFSPath removes GPG signatures from a depth-first search graph and saves the new commits into s.

Example
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/go-git/go-git/v5"
	"github.com/go-git/go-git/v5/plumbing"
	"github.com/go-git/go-git/v5/storage/memory"

	"github.com/fardream/gitrim"
)

func RemoveGPGForDFSPanic(err error) {
	if err != nil {
		log.Panic(err)
	}
}

func main() {
	// URL for the repo
	url := "https://github.com/go-git/go-git"
	// commit to start from
	headcommithash := plumbing.NewHash("7d047a9f8a43bca9d137d8787278265dd3415219")

	// Clone repo
	r, err := git.Clone(memory.NewStorage(), nil, &git.CloneOptions{
		URL: url,
	})
	RemoveGPGForDFSPanic(err)

	// find the commit
	headcommit, err := r.CommitObject(headcommithash)
	RemoveGPGForDFSPanic(err)

	graph, err := gitrim.GetDFSPath(context.Background(), headcommit, []plumbing.Hash{plumbing.NewHash("99e2f85843878671b028d4d01bd4668676226dd1")}, 90)

	RemoveGPGForDFSPanic(err)

	// output storer
	outputfs := memory.NewStorage()

	newgraph, err := gitrim.RemoveGPGForDFSPath(context.Background(), graph, outputfs)
	RemoveGPGForDFSPanic(err)

	// Note the result is deterministic
	fmt.Printf("From %d commits, generated %d commits.\n", len(graph), len(newgraph))

	lastcommit := newgraph[240]
	fmt.Println("last commit hash:")
	fmt.Println(lastcommit.Hash)
	fmt.Println("parents:")
	fmt.Println(lastcommit.ParentHashes[0])
	fmt.Println(lastcommit.ParentHashes[1])

}
Output:

From 241 commits, generated 241 commits.
last commit hash:
dc860bcd4bf62d0f90c518022e75621ecbe62885
parents:
4a2c8f269c2f122b814f767dab8f579bea6466cd
bc0c0692b987229363edb4f591a6eb3318e3ae67

func RemoveGPGForLinearHistory

func RemoveGPGForLinearHistory(ctx context.Context, hist []*object.Commit, s storer.Storer) ([]*object.Commit, error)

RemoveGPGForLinearHistory removes GPG signatures from the commits and saves the new commits into s.

func SetLogger

func SetLogger(l *slog.Logger)

SetLogger sets the logger used by gitrim. The default one comes from slog.Default.

Types

type AndFilter

type AndFilter struct {
	// contains filtered or unexported fields
}

AndFilter combines multiple Filters into one Filter with an "and" operation: the path is only included when all the filters include it.

func NewAndFilter

func NewAndFilter(filters ...Filter) *AndFilter

NewAndFilter creates a new filter that combines the given filters with an "and" operation.

func (*AndFilter) Add

func (f *AndFilter) Add(filters ...Filter)

func (*AndFilter) Filter

func (f *AndFilter) Filter(paths []string, isdir bool) FilterResult

type CachedFilter

type CachedFilter struct {
	// contains filtered or unexported fields
}

CachedFilter records the paths it sees; the cache is not safe for concurrent use.

func NewCachedFilter

func NewCachedFilter(underlying Filter) *CachedFilter

func (*CachedFilter) Filter

func (f *CachedFilter) Filter(paths []string, isdir bool) FilterResult

func (*CachedFilter) Reset

func (f *CachedFilter) Reset()

Reset clears the cache.
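A minimal sketch of the memoization idea, assuming results are keyed by the joined path and the isdir flag (gitrim's actual cache key is not documented here). cachedFilter and countingFilter are hypothetical local types; like the documented type, the plain map makes this unsafe for concurrent use without external locking.

```go
package main

import (
	"fmt"
	"strings"
)

type FilterResult uint8

const (
	FilterResult_Out FilterResult = iota
	FilterResult_DirDive
	FilterResult_In
)

// Filter has the same method set as the documented gitrim Filter interface.
type Filter interface {
	Filter(paths []string, isdir bool) FilterResult
}

// countingFilter is a hypothetical underlying filter that counts its calls.
type countingFilter struct{ calls int }

func (c *countingFilter) Filter(paths []string, isdir bool) FilterResult {
	c.calls++
	if len(paths) > 0 && paths[len(paths)-1] == "LICENSE" {
		return FilterResult_In
	}
	return FilterResult_Out
}

type cacheKey struct {
	path  string
	isdir bool
}

// cachedFilter memoizes the underlying filter's results in a plain map.
type cachedFilter struct {
	underlying Filter
	cache      map[cacheKey]FilterResult
}

func newCachedFilter(u Filter) *cachedFilter {
	return &cachedFilter{underlying: u, cache: make(map[cacheKey]FilterResult)}
}

func (f *cachedFilter) Filter(paths []string, isdir bool) FilterResult {
	k := cacheKey{strings.Join(paths, "/"), isdir}
	if r, ok := f.cache[k]; ok {
		return r
	}
	r := f.underlying.Filter(paths, isdir)
	f.cache[k] = r
	return r
}

// Reset drops all cached results.
func (f *cachedFilter) Reset() { f.cache = make(map[cacheKey]FilterResult) }

func main() {
	u := &countingFilter{}
	f := newCachedFilter(u)
	f.Filter([]string{"LICENSE"}, false)
	f.Filter([]string{"LICENSE"}, false)
	fmt.Println(u.calls) // 1: the second lookup hits the cache
}
```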

type FilePatchError

type FilePatchError struct {
	FromFile string
	ToError  string
}

FilePatchError is an error containing the information about the invalid file patch.

func (*FilePatchError) Error

func (e *FilePatchError) Error() string

type Filter

type Filter interface {
	Filter(paths []string, isdir bool) FilterResult
}

Filter is the interface used to filter the path of the tree.

func NewOrFilterForPatterns

func NewOrFilterForPatterns(patterns ...string) (Filter, error)

NewOrFilterForPatterns creates a new "or" filter over all the patterns: a path is included if any of the patterns includes it.

type FilterResult

type FilterResult uint8

FilterResult indicates the result of a filter. It can be one of:

  • the input is out (excluded)
  • the input is a directory, and its entries should be filtered (dir_dive)
  • the input is in (included)
The logical or operation for filter results:

  • out or in: in
  • out or dir_dive: dir_dive
  • dir_dive or in: in

The logical and operation for filter results:

  • out and in: out
  • out and dir_dive: out
  • dir_dive and in: dir_dive

Note that the enum values are FilterResult_Out = 0, FilterResult_DirDive = 1, and FilterResult_In = 2, so the or operation takes the maximum and the and operation takes the minimum.

const (
	FilterResult_Out     FilterResult = iota // Out
	FilterResult_DirDive                     // DirDive
	FilterResult_In                          // In
)
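Because Out < DirDive < In, the or/and combinators reduce to max/min. The sketch below mirrors the documented enum in a local FilterResult type (not an import of gitrim) and shows that reduction; treating an empty argument list as Out for "or" and In for "and" (the identity elements) is an assumption of this sketch.

```go
package main

import "fmt"

// FilterResult mirrors the enum documented above (Out=0, DirDive=1, In=2).
// This is a local, illustrative copy, not gitrim's type.
type FilterResult uint8

const (
	FilterResult_Out FilterResult = iota
	FilterResult_DirDive
	FilterResult_In
)

// resultsOr returns the maximum of the inputs (Out for no inputs).
func resultsOr(rs ...FilterResult) FilterResult {
	out := FilterResult_Out
	for _, r := range rs {
		if r > out {
			out = r
		}
	}
	return out
}

// resultsAnd returns the minimum of the inputs (In for no inputs).
func resultsAnd(rs ...FilterResult) FilterResult {
	out := FilterResult_In
	for _, r := range rs {
		if r < out {
			out = r
		}
	}
	return out
}

func main() {
	fmt.Println(resultsOr(FilterResult_Out, FilterResult_DirDive)) // 1 (DirDive)
	fmt.Println(resultsAnd(FilterResult_DirDive, FilterResult_In)) // 1 (DirDive)
}
```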

func FilterPath

func FilterPath(f Filter, fullpath string, isdir bool) FilterResult

FilterPath calls Filter f on fullpath string.

func FilterResultsAnd

func FilterResultsAnd(r ...FilterResult) FilterResult

FilterResultsAnd performs the and operation on the filter results:

  • out and in: out
  • out and dir_dive: out
  • dir_dive and in: dir_dive

func FilterResultsOr

func FilterResultsOr(r ...FilterResult) FilterResult

FilterResultsOr performs the or operation on filter results:

  • out or in: in
  • out or dir_dive: dir_dive
  • dir_dive or in: in

This is equivalent to taking the maximum of the inputs.

func PatternDirFilter

func PatternDirFilter(paths []string, filtersegs []PatternFilterSegment) FilterResult

PatternDirFilter filters the directory according to a directory filter.

The result is "In" if the filters match all the leading path segments, with zero or more trailing path segments left over. Below are two examples of "In":

// path matches all filter segments, and path has extra segments
| p | p | p | p
| f | f | f
// path matches all filter segments, and path has no extra segments
| p | p | p
| f | f | f

The result is "DirDive" if the path has fewer segments than there are filters and each path segment matches the corresponding filter:

| p | p | p
| f | f | f | f

For empty paths or filtersegs, it will always return "Out".
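A rough sketch of the In/DirDive/Out decision described above. dirFilter is a hypothetical helper working on plain strings rather than PatternFilterSegment, and it assumes a single filter segment is compared with path.Match; gitrim's actual segment matching may differ.

```go
package main

import (
	"fmt"
	"path"
)

type FilterResult uint8

const (
	FilterResult_Out FilterResult = iota
	FilterResult_DirDive
	FilterResult_In
)

// dirFilter compares the overlapping segments and then decides by length:
// all filters matched -> In; path is a matching prefix of the filters -> DirDive.
func dirFilter(paths, filtersegs []string) FilterResult {
	if len(paths) == 0 || len(filtersegs) == 0 {
		return FilterResult_Out
	}
	n := len(filtersegs)
	if len(paths) < n {
		n = len(paths)
	}
	for i := 0; i < n; i++ {
		if ok, err := path.Match(filtersegs[i], paths[i]); err != nil || !ok {
			return FilterResult_Out
		}
	}
	if len(paths) >= len(filtersegs) {
		return FilterResult_In // every filter matched; extra path segments are allowed
	}
	return FilterResult_DirDive // a matching directory: look inside it
}

func main() {
	filter := []string{"plumbing", "*", "index"}
	fmt.Println(dirFilter([]string{"plumbing", "format"}, filter))                  // 1 (DirDive)
	fmt.Println(dirFilter([]string{"plumbing", "format", "index", "x.go"}, filter)) // 2 (In)
}
```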

func (FilterResult) IsIn

func (r FilterResult) IsIn() bool

IsIn reports whether the filter result is In.

func (FilterResult) String

func (i FilterResult) String() string

type OrFilter

type OrFilter struct {
	// contains filtered or unexported fields
}

OrFilter combines multiple Filters into one Filter with an "or" operation: the path is included if any one of the filters includes it.

func NewOrFilter

func NewOrFilter(filters ...Filter) *OrFilter

func (*OrFilter) Add

func (f *OrFilter) Add(filters ...Filter)

func (*OrFilter) Filter

func (f *OrFilter) Filter(paths []string, isdir bool) FilterResult
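The "or" composition can be sketched against an interface with the same method set as Filter. orFilter and prefixFilter below are hypothetical local types, not gitrim's OrFilter; they only illustrate that the combined result is the maximum of the underlying results.

```go
package main

import (
	"fmt"
	"strings"
)

type FilterResult uint8

const (
	FilterResult_Out FilterResult = iota
	FilterResult_DirDive
	FilterResult_In
)

// Filter has the same method set as the documented gitrim Filter interface.
type Filter interface {
	Filter(paths []string, isdir bool) FilterResult
}

// prefixFilter is a hypothetical filter that includes everything under one
// top-level directory and nothing else.
type prefixFilter struct{ dir string }

func (f prefixFilter) Filter(paths []string, isdir bool) FilterResult {
	if len(paths) > 0 && paths[0] == f.dir {
		return FilterResult_In
	}
	return FilterResult_Out
}

// orFilter includes a path if any underlying filter includes it (the maximum result).
type orFilter struct{ filters []Filter }

var _ Filter = (*orFilter)(nil) // compile-time check that orFilter satisfies Filter

func (f *orFilter) Add(filters ...Filter) { f.filters = append(f.filters, filters...) }

func (f *orFilter) Filter(paths []string, isdir bool) FilterResult {
	r := FilterResult_Out
	for _, sub := range f.filters {
		if got := sub.Filter(paths, isdir); got > r {
			r = got
		}
	}
	return r
}

func main() {
	f := &orFilter{}
	f.Add(prefixFilter{"docs"}, prefixFilter{"cmd"})
	fmt.Println(f.Filter(strings.Split("cmd/main.go", "/"), false))   // 2 (In)
	fmt.Println(f.Filter(strings.Split("internal/x.go", "/"), false)) // 0 (Out)
}
```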

type PatternFilter

type PatternFilter struct {
	// contains filtered or unexported fields
}

PatternFilter filters the entries according to a restricted form of gitignore patterns:

  • `**` is for multi level directories, and it can only appear once in the match.
  • `*` matches a single level of names.
  • `!` and escapes are unsupported.
  • paths are always relative to the root. For example, `LICENSE` will only match `LICENSE` in the root of the repo. To match `LICENSE` at all directory levels, use `**/LICENSE`.

func LoadPatternFilterFromString

func LoadPatternFilterFromString(str string, ignoreUnsupported bool) ([]*PatternFilter, error)

LoadPatternFilterFromString loads the string content of a pattern file such as .gitignore. If ignoreUnsupported is false, the loader returns an error if any unsupported pattern, such as ! (negation), is encountered.

func NewPatternFilter

func NewPatternFilter(pattern string) (*PatternFilter, error)

func (*PatternFilter) Filter

func (f *PatternFilter) Filter(paths []string, isdir bool) FilterResult

type PatternFilterSegment

type PatternFilterSegment string

PatternFilterSegment is a segment in PatternFilter

type TrueFilter

type TrueFilter struct{}

TrueFilter always returns FilterResult_In for any input.

func NewTrueFilter

func NewTrueFilter() *TrueFilter

func (TrueFilter) Filter

func (TrueFilter) Filter(path []string, isdir bool) FilterResult

Directories

Path Synopsis
cmd
cmd package contains helper functions for various commands.
dump-git-tree
dump-git-tree dumps the git tree and optionally applies pattern filters.
expand-git-commit
expand-git-commit adds back the changes made in a repo filtered by filter-git-hist to the unfiltered repo.
filter-git-hist
filter-git-hist is a more robust but limited git-filter-branch.
remove-git-gpg
remove-git-gpg removes gpg information from a series of commits.
