bibtex

package module
v0.0.0-...-575be89 Latest
This package is not in the latest version of its module.
Published: Dec 10, 2024 License: BSD-3-Clause Imports: 10 Imported by: 2

README

Bibtex parser in Go

https://pkg.go.dev/github.com/lmondada/bibtex

A parser for Bibtex files, a reference formatting language, in Go. I needed a parser for my static site generator and the existing library https://github.com/caltechlibrary/bibtex was difficult to integrate into idiomatic Go.

  • Uses a real, recursive descent parser based on the Golang parser to read Bibtex files into an AST.
  • Handles parsing different author formats.
  • Reasonably fast: parses a 30,000 line Bibtex file in 16 ms.
go get github.com/lmondada/bibtex

Example: read a bibtex file into an AST

func readBibtexFile() ([]bibtex.Entry, error) {
	f, err := os.Open("refs.bib")
	if err != nil {
		return nil, err
	}
	// Close the file once parsing is done.
	defer f.Close()

	entries, err := bibtex.Read(f)
	if err != nil {
		return nil, err
	}
	return entries, nil
}

Example: format a bibtex entry into HTML

The Bibtex entry below is followed by its formatted output and then by the Go code that produces it:

@article{chattopadhyay2019procella,
  title={Procella: Unifying serving and analytical data at YouTube},
  author={Chattopadhyay, Biswapesh and Dutta, Priyam and Liu, Weiran and Tinn, Ott and Mccormick, Andrew and Mokashi, Aniket and Harvey, Paul and Gonzalez, Hector and Lomax, David and Mittal, Sagar and others},
  journal={Proceedings of the VLDB Endowment},
  volume={12},
  number={12},
  pages={2022--2034},
  year={2019},
  publisher={VLDB Endowment}
}

B. Chattopadhyay, P. Dutta, W. Liu, O. Tinn, A. Mccormick, A. Mokashi, P. Harvey, H. Gonzalez, D. Lomax, S. Mittal et al, "Procella: Unifying serving and analytical data at YouTube," in Proceedings of the VLDB Endowment, Vol. 12, 2019.

// formatEntry returns an HTML string in an IEEE citation style.
func formatEntry(entry bibtex.Entry) string {
	w := strings.Builder{}
	w.WriteString("<div>")

	// Format all authors.
	authors := entry.Author
	for i, author := range authors {
		sp := strings.Split(author.First, " ")
		for _, s := range sp {
			if r, _ := utf8.DecodeRuneInString(s); r != utf8.RuneError {
				w.WriteRune(r)
				w.WriteString(". ")
			}
		}
		w.WriteString(author.Last)
		if i < len(authors)-2 {
			w.WriteString(", ")
		} else if i == len(authors)-2 {
			if authors[len(authors)-1].IsOthers() {
				w.WriteString(" <em>et al</em>")
				break
			}
			w.WriteString(" and ")
		}
	}

	title := entry.Tags[bibtex.FieldTitle]
	title = trimBraces(title)
	w.WriteString(`, "`)
	w.WriteString(title)
	w.WriteString(`,"`)

	journal := entry.Tags[bibtex.FieldJournal]
	journal = trimBraces(journal)
	if journal != "" {
		w.WriteString(" in <em class=cite-journal>")
		w.WriteString(journal)
		w.WriteString("</em>")
	}

	vol := entry.Tags[bibtex.FieldVolume]
	vol = trimBraces(vol)
	if vol != "" {
		w.WriteString(", Vol. ")
		w.WriteString(vol)
	}

	year := entry.Tags[bibtex.FieldYear]
	year = trimBraces(year)
	if year != "" {
		w.WriteString(", ")
		w.WriteString(year)
	}

	w.WriteString(".")
	w.WriteString(`</div>`)
	return w.String()
}

func trimBraces(s string) string {
	return strings.TrimFunc(s, func(r rune) bool {
		return r == '{' || r == '}'
	})
}
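
One detail of trimBraces is easy to miss: strings.TrimFunc only strips runes from the two ends of the string, so braces inside a value (which Bibtex uses to protect capitalization) are left in place and can end up unbalanced. A minimal sketch of that behavior, using only the standard library; the second sample title and the stripBraces helper are made up for illustration and are not part of this package:

```go
package main

import (
	"fmt"
	"strings"
)

// trimBraces is copied from the example above: it removes leading and
// trailing '{' and '}' runes and leaves interior braces untouched.
func trimBraces(s string) string {
	return strings.TrimFunc(s, func(r rune) bool {
		return r == '{' || r == '}'
	})
}

// stripBraces is a hypothetical alternative that removes every brace,
// including interior ones.
func stripBraces(s string) string {
	return strings.NewReplacer("{", "", "}", "").Replace(s)
}

func main() {
	fmt.Println(trimBraces("{Linear algebra}"))
	// prints: Linear algebra

	fmt.Println(trimBraces("{{Procella}: Unifying {YouTube} data}"))
	// prints: Procella}: Unifying {YouTube} data

	fmt.Println(stripBraces("{{Procella}: Unifying {YouTube} data}"))
	// prints: Procella: Unifying YouTube data
}
```

Which variant fits depends on whether the output should keep brace-protected groups visible; for HTML output, removing all braces is probably the more common choice.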

Features

  • Parse authors.
  • Resolve string abbreviations.
  • Resolve Crossref references.

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func ExtractAuthors

func ExtractAuthors(txt *ast.ParsedText) (ast.Authors, error)

ExtractAuthors extracts the authors from the parsed text of a bibtex field, usually from the author or editor field of bibtex entry.
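
To give a feel for what author extraction involves, here is a deliberately simplified sketch that works on a plain string instead of an *ast.ParsedText. It is not the library's algorithm: the author type and splitAuthors function are hypothetical, and the sketch ignores name prefixes ("von"), suffixes ("Jr."), and brace-protected names.

```go
package main

import (
	"fmt"
	"strings"
)

// author is a hypothetical stand-in for the library's author type.
type author struct {
	First, Last string
}

// splitAuthors splits a Bibtex author field on the " and " separator and
// handles the "Last, First" and "First Last" name forms.
func splitAuthors(field string) []author {
	var out []author
	for _, name := range strings.Split(field, " and ") {
		name = strings.TrimSpace(name)
		if name == "" {
			continue
		}
		// "Last, First": the comma separates the two parts.
		if last, first, ok := strings.Cut(name, ","); ok {
			out = append(out, author{
				First: strings.TrimSpace(first),
				Last:  strings.TrimSpace(last),
			})
			continue
		}
		// "First Last": treat the final word as the last name.
		i := strings.LastIndex(name, " ")
		if i < 0 {
			out = append(out, author{Last: name})
			continue
		}
		out = append(out, author{First: name[:i], Last: name[i+1:]})
	}
	return out
}

func main() {
	for _, a := range splitAuthors("Francese, Rita and Michele Risi and others") {
		fmt.Printf("first=%q last=%q\n", a.First, a.Last)
	}
}
```

The trailing "others" entry comes out as an author with only a last name, which is why the README example above checks for it separately before writing "et al".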

func SimplifyEscapedTextResolver

func SimplifyEscapedTextResolver(root ast.Node) error

SimplifyEscapedTextResolver replaces ast.TextEscaped nodes with a plain ast.Text containing the value that was escaped. Meaning, `\&` is converted to `&`.
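
The effect on a single value can be illustrated on a plain string. The sketch below is an approximation only: the real resolver rewrites AST nodes, and the unescape function and its set of special characters are assumptions chosen for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// unescape drops the backslash in front of a small set of TeX special
// characters. The set is an assumption made for this illustration.
func unescape(s string) string {
	const special = `&%$#_{}`
	var sb strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] == '\\' && i+1 < len(s) && strings.IndexByte(special, s[i+1]) >= 0 {
			i++ // skip the backslash so only the escaped character is written
		}
		sb.WriteByte(s[i])
	}
	return sb.String()
}

func main() {
	fmt.Println(unescape(`Springer Science \& Business Media`))
	// prints: Springer Science & Business Media

	fmt.Println(unescape(`\alpha`))
	// prints: \alpha (macros are left alone)
}
```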

Types

type AuthorResolver

type AuthorResolver struct {
	// contains filtered or unexported fields
}

AuthorResolver extracts ast.Authors from the expression value of a tag statement.

func NewAuthorResolver

func NewAuthorResolver(tags ...string) AuthorResolver

func (AuthorResolver) Resolve

func (a AuthorResolver) Resolve(root ast.Node) error

type Biber

type Biber struct {
	// contains filtered or unexported fields
}

Biber contains methods for parsing, resolving, and rendering bibtex.

func New

func New(opts ...Option) *Biber
Example (RenderToString)
input := `
    @book{greub2012linear,
      title={Linear algebra},
      author={Greub, {WERNER} H},
      volume={23},
      year={2012},
      publisher={Springer Science \& Business Media}
    }

    @inproceedings{francese2015model,
      title={Model-driven development for multi-platform mobile applications},
      author={Francese, Rita and Risi, Michele and Scanniello, Giuseppe and Tortora, Genoveffa},
      booktitle={Product-Focused Software Process Improvement: 16th International Conference, PROFES 2015, Bolzano, Italy, December 2-4, 2015, Proceedings 16},
      pages={61--67},
      year={2015},
      organization={Springer}
    }`

bib := New(
	WithResolvers(
		// NewAuthorResolver creates a resolver for the "author" field that parses
		// author names into an ast.Authors node.
		NewAuthorResolver("author"),
		// SimplifyEscapedTextResolver replaces ast.TextEscaped nodes with a plain
		// ast.Text containing the value that was escaped. Meaning, `\&` is converted to
		// `&`.
		ResolverFunc(SimplifyEscapedTextResolver),
		// RenderParsedTextResolver replaces ast.ParsedText with a simplified rendering
		// of ast.Text.
		NewRenderParsedTextResolver(),
	),
)

file, err := bib.Parse(strings.NewReader(input))
if err != nil {
	panic(err.Error())
}
entries, err := bib.Resolve(file)
if err != nil {
	panic(err.Error())
}

// Use intermediate type since tag output order is not deterministic.
// Go maps are unordered.
type TagOutput struct {
	Field string
	Value string
}
type EntryOutput struct {
	Type string
	Key  string
	Tags []TagOutput
}
entryOutputs := make([]EntryOutput, 0, len(entries))
for _, entry := range entries {
	tags := make([]TagOutput, 0, len(entry.Tags))
	for field, expr := range entry.Tags {
		switch expr := expr.(type) {
		case ast.Authors:
			sb := strings.Builder{}
			if len(expr) > 0 {
				for i, author := range expr {
					if i > 0 {
						sb.WriteString("\n")
					}
					first := author.First.(*ast.Text).Value
					prefix := author.Prefix.(*ast.Text).Value
					last := author.Last.(*ast.Text).Value
					suffix := author.Suffix.(*ast.Text).Value
					name := fmt.Sprintf("%s %s %s %s", first, prefix, last, suffix)
					name = strings.TrimSpace(name)
					name = strings.Join(strings.Fields(name), " ") // remove consecutive spaces
					sb.WriteString(field)
					sb.WriteString(": ")
					sb.WriteString(name)
				}
				tags = append(tags, TagOutput{Field: field, Value: sb.String()})
			}
		case *ast.Text:
			tags = append(tags, TagOutput{Field: field, Value: fmt.Sprintf("%s: %s", field, expr.Value)})
		default:
			tags = append(tags, TagOutput{Field: field, Value: fmt.Sprintf("%s: %T", field, expr)})
		}
	}
	sort.Slice(tags, func(i, j int) bool {
		return tags[i].Field < tags[j].Field
	})
	entryOutputs = append(entryOutputs, EntryOutput{
		Type: entry.Type,
		Key:  entry.Key,
		Tags: tags,
	})
}

for _, out := range entryOutputs {
	fmt.Printf("type: %s\n", out.Type)
	fmt.Printf("key: %s\n", out.Key)
	for _, tag := range out.Tags {
		fmt.Println(tag.Value)
	}
	fmt.Println()
}
Output:

type: book
key: greub2012linear
author: WERNER H Greub
publisher: Springer Science & Business Media
title: Linear algebra
volume: 23
year: 2012

type: inproceedings
key: francese2015model
author: Rita Francese
author: Michele Risi
author: Giuseppe Scanniello
author: Genoveffa Tortora
booktitle: Product-Focused Software Process Improvement: 16th International Conference, PROFES 2015, Bolzano, Italy, December 2-4, 2015, Proceedings 16
organization: Springer
pages: 61--67
title: Model-driven development for multi-platform mobile applications
year: 2015

func (*Biber) Parse

func (b *Biber) Parse(r io.Reader) (*ast.File, error)

func (*Biber) Render

func (b *Biber) Render(w io.Writer, root ast.Node) error

func (*Biber) Resolve

func (b *Biber) Resolve(node ast.Node) ([]Entry, error)

Resolve resolves all bibtex entries from an AST. The AST is a faithful representation of the source code. By default, resolving the AST means replacing all abbreviation expressions with their values, inlining concatenation expressions, simplifying tag values by replacing TeX quote macros with Unicode graphemes, and stripping TeX macros.

The exact resolve steps are configurable using bibtex.WithResolvers.
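
As a rough picture of the first two default steps (abbreviation replacement and concatenation inlining), the sketch below expands a tag value written as source text. It operates on strings, not on the AST, and resolveValue is a hypothetical helper. It assumes abbreviation names are matched case-insensitively and does not handle a '#' inside a quoted string.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveValue expands a tag value such as `pvldb # " Endowment"`: parts are
// separated by '#', quoted parts are literals, and bare parts are looked up
// in abbrevs (keyed by lower-case name).
func resolveValue(value string, abbrevs map[string]string) (string, error) {
	var sb strings.Builder
	for _, part := range strings.Split(value, "#") {
		part = strings.TrimSpace(part)
		if len(part) >= 2 && strings.HasPrefix(part, `"`) && strings.HasSuffix(part, `"`) {
			sb.WriteString(part[1 : len(part)-1]) // string literal
			continue
		}
		v, ok := abbrevs[strings.ToLower(part)]
		if !ok {
			return "", fmt.Errorf("unknown abbreviation %q", part)
		}
		sb.WriteString(v)
	}
	return sb.String(), nil
}

func main() {
	// Corresponds to: @string{pvldb = "Proceedings of the VLDB"}
	abbrevs := map[string]string{"pvldb": "Proceedings of the VLDB"}

	got, err := resolveValue(`pvldb # " Endowment"`, abbrevs)
	if err != nil {
		panic(err)
	}
	fmt.Println(got) // prints: Proceedings of the VLDB Endowment
}
```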

type CiteKey

type CiteKey = string

CiteKey is the citation key for a Bibtex entry, like the "foo" in:

@article{ foo }

type Entry

type Entry struct {
	Type EntryType
	Key  CiteKey
	// All tags in the entry with the corresponding expression value.
	Tags map[Field]ast.Expr
}

Entry is a Bibtex entry, like an @article{} entry, that provides the rendered plain text of the entry.

type EntryType

type EntryType = string

EntryType is the type of Bibtex entry. An "@article" entry is represented as "article". String alias to allow for unknown entries.

const (
	EntryArticle       EntryType = "article"
	EntryBook          EntryType = "book"
	EntryBooklet       EntryType = "booklet"
	EntryInBook        EntryType = "inbook"
	EntryInCollection  EntryType = "incollection"
	EntryInProceedings EntryType = "inproceedings"
	EntryManual        EntryType = "manual"
	EntryMastersThesis EntryType = "mastersthesis"
	EntryMisc          EntryType = "misc"
	EntryPhDThesis     EntryType = "phdthesis"
	EntryProceedings   EntryType = "proceedings"
	EntryTechReport    EntryType = "techreport"
	EntryUnpublished   EntryType = "unpublished"
)

type Field

type Field = string

Field is a single field in a Bibtex Entry.

const (
	FieldAddress      Field = "address"
	FieldAnnote       Field = "annote"
	FieldAuthor       Field = "author"
	FieldBookTitle    Field = "booktitle"
	FieldChapter      Field = "chapter"
	EntryDOI          Field = "doi"
	FieldCrossref     Field = "crossref"
	FieldEdition      Field = "edition"
	FieldEditor       Field = "editor"
	FieldHowPublished Field = "howpublished"
	FieldInstitution  Field = "institution"
	FieldJournal      Field = "journal"
	FieldKey          Field = "key"
	FieldMonth        Field = "month"
	FieldNote         Field = "note"
	FieldNumber       Field = "number"
	FieldOrganization Field = "organization"
	FieldPages        Field = "pages"
	FieldPublisher    Field = "publisher"
	FieldSchool       Field = "school"
	FieldSeries       Field = "series"
	FieldTitle        Field = "title"
	FieldType         Field = "type"
	FieldVolume       Field = "volume"
	FieldYear         Field = "year"
)

type Option

type Option func(*Biber)

Option is a functional option to change how Bibtex is parsed, resolved, and rendered.

func WithParserMode

func WithParserMode(mode parser.Mode) Option

WithParserMode sets the parser mode, overwriting any previously set mode. parser.Mode is a bit flag, so combine multiple flags with bitwise OR:

WithParserMode(parser.ParserStrings|parser.Trace)
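
For readers less familiar with bit flags in Go, the following sketch shows how such a mode type combines and tests flags. The Mode type and the flag names here are hypothetical stand-ins and are not taken from the parser package.

```go
package main

import "fmt"

// Mode is a hypothetical stand-in for parser.Mode.
type Mode uint

// Each flag occupies its own bit, so flags can be combined with '|'.
const (
	flagA Mode = 1 << iota // 1
	flagB                  // 2
	flagC                  // 4
)

// has reports whether flag f is set in m.
func (m Mode) has(f Mode) bool { return m&f != 0 }

func main() {
	mode := flagA | flagC
	fmt.Println(mode.has(flagA), mode.has(flagB), mode.has(flagC))
	// prints: true false true
}
```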

func WithRenderer

func WithRenderer(kind ast.NodeKind, r render.NodeRendererFunc) Option

WithRenderer sets the renderer for the node kind, replacing the previous renderer.

func WithResolvers

func WithResolvers(rs ...Resolver) Option

WithResolvers appends the resolvers to the list of resolvers. Resolvers run in the order given.
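
The append-and-run-in-order behavior can be sketched with the functional options pattern. All types below (node, resolver, biber, option) are lower-case stand-ins written for this example so that it compiles on its own; they mirror the shape of the documented API but are not the package's implementation.

```go
package main

import "fmt"

type node struct{ trace []string } // stand-in for ast.Node

type resolver func(n *node) error // stand-in for Resolver

type biber struct{ resolvers []resolver }

type option func(*biber)

// withResolvers appends to the existing list, so repeated options accumulate.
func withResolvers(rs ...resolver) option {
	return func(b *biber) { b.resolvers = append(b.resolvers, rs...) }
}

func newBiber(opts ...option) *biber {
	b := &biber{}
	for _, opt := range opts {
		opt(b)
	}
	return b
}

// resolve runs every resolver in the order it was registered.
func (b *biber) resolve(n *node) error {
	for _, r := range b.resolvers {
		if err := r(n); err != nil {
			return err
		}
	}
	return nil
}

// tag returns a resolver that records its name, to make the order visible.
func tag(name string) resolver {
	return func(n *node) error {
		n.trace = append(n.trace, name)
		return nil
	}
}

func main() {
	b := newBiber(
		withResolvers(tag("authors")),
		withResolvers(tag("escapes"), tag("render")),
	)
	n := &node{}
	if err := b.resolve(n); err != nil {
		panic(err)
	}
	fmt.Println(n.trace) // prints: [authors escapes render]
}
```

Order matters in practice: in the RenderToString example above, the author resolver is listed before the resolver that flattens parsed text.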

type RenderParsedTextResolver

type RenderParsedTextResolver struct {
	// contains filtered or unexported fields
}

RenderParsedTextResolver replaces ast.ParsedText with a simplified rendering of ast.Text.

func NewRenderParsedTextResolver

func NewRenderParsedTextResolver() *RenderParsedTextResolver

func (*RenderParsedTextResolver) Resolve

func (r *RenderParsedTextResolver) Resolve(root ast.Node) error

type Resolver

type Resolver interface {
	Resolve(n ast.Node) error
}

Resolver performs an in-place mutation of an ast.Node to support resolving Bibtex entries. Typically, the mutations simplify the AST to support easier manipulation, like replacing ast.TextEscaped with the escaped value.

type ResolverFunc

type ResolverFunc func(n ast.Node) error

func (ResolverFunc) Resolve

func (r ResolverFunc) Resolve(n ast.Node) error
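
ResolverFunc follows the same adapter idea as http.HandlerFunc in the standard library: it lets an ordinary function satisfy the Resolver interface. The sketch below re-declares the types against a stand-in Node so it compiles without the ast package. The body of Resolve is an assumption (the source shows only its signature), and upperCase is a made-up resolver.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a stand-in for ast.Node so the sketch is self-contained.
type Node struct{ Value string }

type Resolver interface {
	Resolve(n *Node) error
}

// ResolverFunc adapts a plain function to the Resolver interface.
type ResolverFunc func(n *Node) error

// Resolve calls the underlying function (assumed behavior).
func (r ResolverFunc) Resolve(n *Node) error { return r(n) }

// upperCase is a made-up resolver that mutates the node in place.
func upperCase(n *Node) error {
	n.Value = strings.ToUpper(n.Value)
	return nil
}

func main() {
	var r Resolver = ResolverFunc(upperCase)
	n := &Node{Value: "springer"}
	if err := r.Resolve(n); err != nil {
		panic(err)
	}
	fmt.Println(n.Value) // prints: SPRINGER
}
```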

Directories

Path     Synopsis
ast      Package ast declares the types used to represent syntax trees for bibtex files.
asts     Package asts contains utilities for constructing and manipulating ASTs.
parser   Package parser is the exported entry points for invoking the parser.
scanner  Package scanner implements a scanner for bibtex source text.
token    Package token defines constants representing the lexical tokens of the bibtex language and basic operations on tokens (printing, predicates).
