larkdown

package module
v0.0.8 Latest
Published: Jan 2, 2024 | License: MIT | Imports: 12 | Imported by: 0

README

larkdown

Lock down your markdown.

Larkdown allows you to treat markdown files as a tree where headings are branches, to extract data from that tree, and then either update and render the tree back to markdown, or continue to render to HTML.

It lets you treat this:

# Title

## Subheading

### Sub-subheading

- a list
- of things

## Another subheading

Some content

like this

{
  "Title": [
    "# Title",
    {
      "Subheading": [
        "## Subheading",
        {
          "Sub-subheading": [
            "### Sub-subheading",
            {
              "list": ["a list", "of things"]
            }
          ]
        }
      ]
    },
    {
      "Another subheading": ["## Another subheading", "some content"]
    }
  ]
}

and then query that data structure to find a node. With a node you can then decode it into useful data like strings and slices of strings, or change it and re-save back to markdown.
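
To make the "headings are branches" idea concrete, here is a minimal standard-library sketch. It is not larkdown's implementation (larkdown works on the goldmark AST, and this sketch would misread `#` lines inside code blocks); the `Section`, `BuildTree`, and `FindPath` names are hypothetical and exist only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Section is a hypothetical tree node: one heading plus everything nested under it.
type Section struct {
	Level    int        // 1 for "#", 2 for "##", and so on; 0 for the document root
	Title    string     // heading text without the leading "#" marks
	Content  []string   // non-blank, non-heading lines directly under this heading
	Children []*Section // deeper headings nested under this one
}

// parseHeading reports the level and title of an ATX heading line such as "## Tags".
func parseHeading(line string) (level int, title string, ok bool) {
	trimmed := strings.TrimLeft(line, "#")
	level = len(line) - len(trimmed)
	if level == 0 || level > 6 || !strings.HasPrefix(trimmed, " ") {
		return 0, "", false
	}
	return level, strings.TrimSpace(trimmed), true
}

// BuildTree groups each line under its nearest preceding heading.
func BuildTree(markdown string) *Section {
	root := &Section{}
	stack := []*Section{root}
	for _, line := range strings.Split(markdown, "\n") {
		if strings.TrimSpace(line) == "" {
			continue
		}
		level, title, ok := parseHeading(line)
		if !ok {
			top := stack[len(stack)-1]
			top.Content = append(top.Content, line)
			continue
		}
		// Pop until the top of the stack is a shallower heading (or the root).
		for len(stack) > 1 && stack[len(stack)-1].Level >= level {
			stack = stack[:len(stack)-1]
		}
		node := &Section{Level: level, Title: title}
		parent := stack[len(stack)-1]
		parent.Children = append(parent.Children, node)
		stack = append(stack, node)
	}
	return root
}

// FindPath walks the tree by heading titles and returns nil if any step is missing.
func FindPath(node *Section, titles ...string) *Section {
	for _, title := range titles {
		var next *Section
		for _, child := range node.Children {
			if child.Title == title {
				next = child
				break
			}
		}
		if next == nil {
			return nil
		}
		node = next
	}
	return node
}

func main() {
	doc := "# Title\n\n## Subheading\n\n### Sub-subheading\n\n- a list\n- of things\n\n## Another subheading\n\nSome content\n"
	root := BuildTree(doc)
	fmt.Println(FindPath(root, "Title", "Subheading", "Sub-subheading").Content)
	fmt.Println(FindPath(root, "Title", "Another subheading").Content)
}
```

Walking by a path of heading titles is the same shape as the `match.Branch` queries used in the examples that follow, only without the AST.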

Specifically, larkdown takes an AST generated by the excellent goldmark library for parsing CommonMark markdown, and lets you query, update, and re-render that AST. This makes it easy to take a markdown file, run it through goldmark, query some structured data, and then either finish using goldmark to render the file to HTML, or make some updates and save back to markdown.

Motivation

This library acts as a test bed for an idea: markdown has the excellent property of being good for both machine and human reading. Therefore (with the right tooling) it should be possible to use markdown files as an extremely portable data store. You can author and edit data using a markdown editor like Obsidian, use it to back a web application (with editing capabilities!), and write scripts to slice and dice the data. And if you want to switch or give up a tool at some point, it's no problem: it's just markdown, so you don't need to export or transform it.

Larkdown is an attempt to build that tooling.

Usage

go get github.com/will-wow/larkdown

You can use larkdown to pull out important data about a file before sending it to a frontend to be rendered:

package larkdown_test

import (
	"bytes"
	"fmt"

	"github.com/yuin/goldmark"
	"github.com/yuin/goldmark/extension"
	"github.com/yuin/goldmark/text"
	"go.abhg.dev/goldmark/hashtag"

	"github.com/will-wow/larkdown"
	"github.com/will-wow/larkdown/match"
)

var recipeMarkdown = `
# My Recipe

Here's a long story about making dinner.

## Tags

#dinner #chicken

## Ingredients

- Chicken
- Vegetables
- Salt
- Pepper

## Comments

| Name      | Comment    |
| --------- | ---------- |
| Alice     | It's good! |
| Bob       | It's bad   |

`

type Comment struct {
	Name    string
	Comment string
}

func (c Comment) String() string {
	return fmt.Sprintf("%s: %s", c.Name, c.Comment)
}

type Recipe struct {
	Tags        []string
	Ingredients []string
	Comments    []Comment
	Html        bytes.Buffer
}

func Example() {
	source := []byte(recipeMarkdown)
	// Preprocess the markdown into goldmark AST
	md := goldmark.New(
		// Parse hashtags so they can be matched against.
		goldmark.WithExtensions(
			extension.Table,
			&hashtag.Extender{Variant: hashtag.ObsidianVariant},
		),
	)
	doc := md.Parser().Parse(text.NewReader(source))

	recipe := Recipe{}

	// ====
	// Get the ingredients from the list
	// ====

	// Set up ingredientsQuery to match the first list under ## Ingredients
	ingredientsQuery := []match.Node{
		match.Branch{Level: 1},
		match.Branch{Level: 2, Name: []byte("Ingredients")},
		match.Index{Index: 0, Node: match.List{}},
	}

	// Decode the list items into a slice of strings
	ingredients, err := larkdown.Find(doc, source, ingredientsQuery, larkdown.DecodeListItems)
	if err != nil {
		panic(fmt.Errorf("couldn't find an ingredients list: %w", err))
	}
	recipe.Ingredients = ingredients

	// ====
	// Get the tags from the file
	// ====

	// Matcher for the tags header
	tagsQuery := []match.Node{
		match.Branch{Level: 2, Name: []byte("Tags")},
	}

	// Find all Tags under the tags header, and decode their contents into strings.
	tags, err := larkdown.FindAll(doc, source, tagsQuery, match.Tag{}, larkdown.DecodeTag)
	if err != nil {
		// This will not return an error if there are no tags, only if something else went wrong.
		panic(fmt.Errorf("error finding tags: %w", err))
	}
	recipe.Tags = tags

	// ====
	// Get the comments from a table
	// ====

	// Matcher for a table under ## Comments
	tableQuery := []match.Node{
		match.Branch{Level: 2, Name: []byte("Comments")},
		match.Table{},
	}

	// Get data from the comments table
	commentsTable, err := larkdown.Find(doc, source, tableQuery, larkdown.DecodeTableToMap)
	if err != nil {
		panic(fmt.Errorf("error finding comments: %w", err))
	}
	for _, comment := range commentsTable {
		recipe.Comments = append(recipe.Comments, Comment{
			Name:    comment["Name"],
			Comment: comment["Comment"],
		})
	}

	fmt.Println(recipe.Ingredients)
	fmt.Println(recipe.Tags)
	fmt.Println(recipe.Comments)

	// Output:
	// [Chicken Vegetables Salt Pepper]
	// [dinner chicken]
	// [Alice: It's good! Bob: It's bad]
}

Or you can use it to update a markdown file in-place, and still render to HTML afterwards:

package mdrender_test

import (
	"bytes"
	"fmt"
	"strings"

	"github.com/yuin/goldmark"
	"github.com/yuin/goldmark/ast"
	"github.com/yuin/goldmark/parser"
	"github.com/yuin/goldmark/text"
	"go.abhg.dev/goldmark/hashtag"

	"github.com/will-wow/larkdown"
	"github.com/will-wow/larkdown/gmast"
	"github.com/will-wow/larkdown/match"
	"github.com/will-wow/larkdown/mdfront"
	"github.com/will-wow/larkdown/mdrender"
	"github.com/will-wow/larkdown/query"
)

var postMarkdown = `# Markdown in Go

## Tags

#markdown #golang

In this essay I will explain...
`

type PostData struct {
	Slug string `yaml:"slug"`
}

// Matcher for a line of #tags under the heading ## Tags
var tagsQuery = []match.Node{
	match.Branch{Level: 2, Name: []byte("tags"), CaseInsensitive: true},
	match.NodeOfKind{Kind: ast.KindParagraph},
}

var titleQuery = []match.Node{match.Heading{Level: 1}}

func ExampleNewRenderer() {
	source := []byte(postMarkdown)
	// Preprocess the markdown into goldmark AST
	md := goldmark.New(
		goldmark.WithExtensions(
			// Parse hashtags so they can be matched against.
			&hashtag.Extender{Variant: hashtag.ObsidianVariant},
			// Support frontmatter rendering.
			// This does nothing on its own, but sets up a place to render frontmatter to.
			&mdfront.Extender{},
		),
	)

	// Set up context for the metadata
	context := parser.NewContext()
	// Parse the markdown into an AST, with context
	doc := md.Parser().Parse(text.NewReader(source), parser.WithContext(context))

	// ====
	// Get the tags from the file
	// ====

	// Find the paragraph of tags under the ## Tags heading, to append to
	tagsLine, err := query.QueryOne(doc, source, tagsQuery)
	if err != nil {
		panic(fmt.Errorf("error finding tags heading: %w", err))
	}

	// ====
	// Edit the AST to add a new tag
	// ====

	// Create a new tag
	space, source := gmast.NewSpace(source)
	hashtag, source := gmast.NewHashtag("testing", source)

	// Append the new tag to the tags line
	gmast.AppendChild(tagsLine,
		space,
		hashtag,
	)

	// ====
	// Add a slug to the post's frontmatter.
	// ====

	// Find the title header to use as a slug
	// In practice, you might want to also pull this frontmatter out of an existing document
	// using goldmark-meta or goldmark-frontmatter.
	title, err := larkdown.Find(doc, source, titleQuery, larkdown.DecodeText)
	if err != nil {
		panic(fmt.Errorf("error finding title: %w", err))
	}

	// Slugify the title
	slug := strings.ReplaceAll(strings.ToLower(title), " ", "-")

	// Set up a struct to render the frontmatter
	data := &PostData{Slug: slug}

	// ====
	// Use larkdown renderer to render back to markdown
	// ====
	var newMarkdown bytes.Buffer
	// Here we set up the renderer outside the goldmark.New call, so you can use the normal
	// goldmark HTML renderer, and also render back to markdown.
	err = larkdown.NewNodeRenderer(
		// Pass the metaData to the renderer to render back to markdown
		mdrender.WithFrontmatter(data),
	).Render(&newMarkdown, source, doc)
	if err != nil {
		panic(fmt.Errorf("error rendering Markdown: %w", err))
	}

	// ====
	// Also render to HTML
	// ====
	var html bytes.Buffer
	err = md.Renderer().Render(&html, source, doc)
	if err != nil {
		panic(fmt.Errorf("error rendering HTML: %w", err))
	}

	// The new #testing tag is after the #golang tag in the HTML output
	fmt.Println("HTML:")
	fmt.Println(html.String())

	// The new #testing tag is after the #golang tag in the markdown
	fmt.Println("Markdown:")
	fmt.Println(newMarkdown.String())

	// Output:
	// HTML:
	// <h1>Markdown in Go</h1>
	// <h2>Tags</h2>
	// <p><span class="hashtag">#markdown</span> <span class="hashtag">#golang</span> <span class="hashtag">#testing</span></p>
	// <p>In this essay I will explain...</p>
	//
	// Markdown:
	// ---
	// slug: markdown-in-go
	// ---
	//
	// # Markdown in Go
	//
	// ## Tags
	//
	// #markdown #golang #testing
	//
	// In this essay I will explain...
}

Roadmap

  • basic querying and unmarshaling of headings, lists, and text
  • make sure this works with extracting front matter
  • make sure this doesn't interfere with rendering the markdown to HTML with goldmark
  • tag matchers/decoders
  • handle finding multiple matches
  • generic matcher for any goldmark kind
  • basic markdown renderer
  • Full markdown renderer
  • Basic markdown editing tools
  • More markdown editing tools
  • options for recording extra debugging data for failed matches
  • use options to support not setting a matcher or decoder
  • handle a list of matchers for FindAll extractors
  • matchers/decoders for more nodes:
    • codeblocks by language
    • tables with slice of string map output
    • tables with structured output
  • add an "end on" option for branches, to end on the next subheading of a specific level
  • nth instance matcher for queries like "the second list"
  • query validator to make sure it even makes sense
  • query syntax based on CSS selectors
  • Update queries to fit with CSS selectors
  • cli for selector queries
  • generic unmarshaler into json
  • benchmark
  • more docs and tests

Alternatives

  • markdown-to-json: Python-based library for parsing markdown into JSON with a similar nested style.

Contributing

Install task

This project uses task as its task runner.

# macos
brew install go-task/tap/go-task

# linux/wsl
sh -c "$(curl --location https://taskfile.dev/install.sh)" -- -d

Or follow the installation instructions for more options.

List commands

For a list of all commands for this project, run

task --list

Format

task fmt

Lint

task lint

Test

task test

Make ready for a commit

# runs fmt lint test
task ready

Publish

VERSION=0.0.N task publish

Documentation

Overview

Example

package main

import (
	"bytes"
	"fmt"

	"github.com/yuin/goldmark"
	"github.com/yuin/goldmark/extension"
	"github.com/yuin/goldmark/text"
	"go.abhg.dev/goldmark/hashtag"

	"github.com/will-wow/larkdown"
	"github.com/will-wow/larkdown/match"
)

var recipeMarkdown = `
# My Recipe

Here's a long story about making dinner.

## Tags

#dinner #chicken

## Ingredients

- Chicken
- Vegetables
- Salt
- Pepper

## Comments

| Name      | Comment    |
| --------- | ---------- |
| Alice     | It's good! |
| Bob       | It's bad   |

`

type Comment struct {
	Name    string
	Comment string
}

func (c Comment) String() string {
	return fmt.Sprintf("%s: %s", c.Name, c.Comment)
}

type Recipe struct {
	Tags        []string
	Ingredients []string
	Comments    []Comment
	Html        bytes.Buffer
}

func main() {
	source := []byte(recipeMarkdown)
	// Preprocess the markdown into goldmark AST
	md := goldmark.New(
		// Parse hashtags so they can be matched against.
		goldmark.WithExtensions(
			extension.Table,
			&hashtag.Extender{Variant: hashtag.ObsidianVariant},
		),
	)
	doc := md.Parser().Parse(text.NewReader(source))

	recipe := Recipe{}

	// ====
	// Get the ingredients from the list
	// ====

	// Set up ingredientsQuery to match the first list under ## Ingredients
	ingredientsQuery := []match.Node{
		match.Branch{Level: 1},
		match.Branch{Level: 2, Name: []byte("Ingredients")},
		match.Index{Index: 0, Node: match.List{}},
	}

	// Decode the list items into a slice of strings
	ingredients, err := larkdown.Find(doc, source, ingredientsQuery, larkdown.DecodeListItems)
	if err != nil {
		panic(fmt.Errorf("couldn't find an ingredients list: %w", err))
	}
	recipe.Ingredients = ingredients

	// ====
	// Get the tags from the file
	// ====

	// Matcher for the tags header
	tagsQuery := []match.Node{
		match.Branch{Level: 2, Name: []byte("Tags")},
	}

	// Find all Tags under the tags header, and decode their contents into strings.
	tags, err := larkdown.FindAll(doc, source, tagsQuery, match.Tag{}, larkdown.DecodeTag)
	if err != nil {
		// This will not return an error if there are no tags, only if something else went wrong.
		panic(fmt.Errorf("error finding tags: %w", err))
	}
	recipe.Tags = tags

	// ====
	// Get the comments from a table
	// ====

	// Matcher for a table under ## Comments
	tableQuery := []match.Node{
		match.Branch{Level: 2, Name: []byte("Comments")},
		match.Table{},
	}

	// Get data from the comments table
	commentsTable, err := larkdown.Find(doc, source, tableQuery, larkdown.DecodeTableToMap)
	if err != nil {
		panic(fmt.Errorf("error finding comments: %w", err))
	}
	for _, comment := range commentsTable {
		recipe.Comments = append(recipe.Comments, Comment{
			Name:    comment["Name"],
			Comment: comment["Comment"],
		})
	}

	fmt.Println(recipe.Ingredients)
	fmt.Println(recipe.Tags)
	fmt.Println(recipe.Comments)
}

Output:

[Chicken Vegetables Salt Pepper]
[dinner chicken]
[Alice: It's good! Bob: It's bad]

Functions

func DecodeListItems added in v0.0.1

func DecodeListItems(node ast.Node, source []byte) (out []string, err error)

DecodeListItems decodes an ast.List into a slice of strings, one per list item.

func DecodeTableToMap added in v0.0.7

func DecodeTableToMap(node ast.Node, source []byte) ([]map[string]string, error)

DecodeTableToMap decodes a table node into a slice of maps of column names to column string values.
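
The result shape (one map per body row, keyed by column name) can be sketched with the standard library alone. This is only an illustration of the output, assuming the header and body cells have already been extracted as strings; `rowsToMaps` is a hypothetical helper, not part of larkdown, and it does not show how larkdown reads the AST.

```go
package main

import "fmt"

// rowsToMaps is a hypothetical helper, not part of larkdown. It pairs each cell
// with its column header and produces one map per body row.
func rowsToMaps(header []string, rows [][]string) []map[string]string {
	out := make([]map[string]string, 0, len(rows))
	for _, row := range rows {
		record := make(map[string]string, len(header))
		for i, name := range header {
			// Skip columns that a short row does not fill.
			if i < len(row) {
				record[name] = row[i]
			}
		}
		out = append(out, record)
	}
	return out
}

func main() {
	header := []string{"Name", "Comment"}
	rows := [][]string{
		{"Alice", "It's good!"},
		{"Bob", "It's bad"},
	}
	for _, record := range rowsToMaps(header, rows) {
		fmt.Printf("%s: %s\n", record["Name"], record["Comment"])
	}
}
```

Keying by column name means callers can read `record["Name"]` without caring about column order, which is how the README example builds its Comment values.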

func DecodeTag added in v0.0.1

func DecodeTag(node ast.Node, source []byte) (string, error)

Decode a #tag parsed by go.abhg.dev/goldmark/hashtag into a string. Only the text content of the tag is returned, not the # prefix.

func DecodeText added in v0.0.1

func DecodeText(node ast.Node, source []byte) (string, error)

DecodeText decodes all the text inside any node.

func Find added in v0.0.1

func Find[T any](
	doc ast.Node,
	source []byte,
	matcher []match.Node,
	fn func(node ast.Node, source []byte) (T, error),
	opts ...FindOption,
) (out T, err error)

Use a matcher to find a node, and then decode its contents and return structured data.

Example

package main

import (
	"fmt"

	"github.com/yuin/goldmark"
	"github.com/yuin/goldmark/text"

	"github.com/will-wow/larkdown"
	"github.com/will-wow/larkdown/match"
)

var findMarkdown = `
# My Recipe

Here's a long story about making dinner.

## Tags

#dinner #chicken

## Ingredients

- Chicken
- Vegetables
- Salt
- Pepper
`

func main() {
	source := []byte(findMarkdown)
	// Preprocess the markdown into goldmark AST
	md := goldmark.New()
	doc := md.Parser().Parse(text.NewReader(source))

	// Set up ingredientsQuery to match the first list under ## Ingredients
	ingredientsQuery := []match.Node{
		match.Branch{Level: 1},
		match.Branch{Level: 2, Name: []byte("Ingredients")},
		match.Index{Index: 0, Node: match.List{}},
	}

	// Decode the list items into a slice of strings
	ingredients, err := larkdown.Find(doc, source, ingredientsQuery, larkdown.DecodeListItems)
	if err != nil {
		panic(fmt.Errorf("couldn't find an ingredients list: %w", err))
	}

	fmt.Println(ingredients)
}

Output:

[Chicken Vegetables Salt Pepper]
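
Find is generic over the decoder's return type, which is why the same call can yield a string or a slice. The sketch below shows that "match, then decode" shape using only the standard library. The `node` type, `findFirst`, and the decoders are hypothetical stand-ins for illustration; they are not larkdown or goldmark APIs.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// node is a hypothetical stand-in for ast.Node, used only in this sketch.
type node struct {
	kind string
	text string
}

var errNoMatch = errors.New("no node matched")

// findFirst decodes the first node accepted by match. The decoder fixes the
// result type T, so one finder serves every kind of node.
func findFirst[T any](nodes []node, match func(node) bool, decode func(node) (T, error)) (T, error) {
	var zero T
	for _, n := range nodes {
		if match(n) {
			return decode(n)
		}
	}
	return zero, errNoMatch
}

// isKind builds a matcher for a single node kind.
func isKind(kind string) func(node) bool {
	return func(n node) bool { return n.kind == kind }
}

// decodeText returns the node's text unchanged.
func decodeText(n node) (string, error) { return n.text, nil }

// decodeItems splits the node's text into one string per line.
func decodeItems(n node) ([]string, error) {
	return strings.Split(n.text, "\n"), nil
}

func main() {
	nodes := []node{
		{kind: "heading", text: "My Recipe"},
		{kind: "list", text: "Chicken\nVegetables"},
	}

	title, err := findFirst(nodes, isKind("heading"), decodeText)
	fmt.Println(title, err)

	items, err := findFirst(nodes, isKind("list"), decodeItems)
	fmt.Println(items, err)
}
```

Because T is inferred from the decoder argument, callers never spell out the type parameter, just as the example above passes larkdown.DecodeListItems without one.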

func FindAll added in v0.0.1

func FindAll[T any](
	doc ast.Node,
	source []byte,
	matcher []match.Node,
	extractor match.Node,
	fn func(node ast.Node, source []byte) (T, error),
	opts ...FindAllOption,
) (out []T, err error)

Use a matcher to find all matching nodes, then decode their contents and return structured data.

Example

package main

import (
	"fmt"

	"github.com/yuin/goldmark"
	"github.com/yuin/goldmark/text"
	"go.abhg.dev/goldmark/hashtag"

	"github.com/will-wow/larkdown"
	"github.com/will-wow/larkdown/match"
)

var findAllMarkdown = `
# My Recipe

Here's a long story about making dinner.

## Tags

#dinner #chicken

## Ingredients

- Chicken
- Vegetables
- Salt
- Pepper
`

func main() {
	source := []byte(findAllMarkdown)
	// Preprocess the markdown into goldmark AST
	md := goldmark.New(
		// Parse hashtags so they can be matched against.
		goldmark.WithExtensions(
			&hashtag.Extender{Variant: hashtag.ObsidianVariant},
		),
	)
	doc := md.Parser().Parse(text.NewReader(source))

	tagsQuery := []match.Node{
		match.Branch{Level: 2, Name: []byte("Tags")},
	}

	// Find all Tags under the tags header, and decode their contents into strings.
	tags, err := larkdown.FindAll(doc, source, tagsQuery, match.Tag{}, larkdown.DecodeTag)
	if err != nil {
		// This will not return an error if there are no tags, only if something else went wrong.
		panic(fmt.Errorf("error finding tags: %w", err))
	}

	fmt.Println(tags)
}

Output:

[dinner chicken]

func NewNodeRenderer added in v0.0.4

func NewNodeRenderer(opts ...mdrender.Option) renderer.Renderer

NewNodeRenderer returns a new goldmark renderer.Renderer, with default config, that renders nodes as Markdown.

Types

type FindAllConfig added in v0.0.8

type FindAllConfig struct {
	// AllowNoMatch allows FindAll to return nil when no match is found. By default it will return a query.QueryError.
	AllowNoMatch bool
}

FindAllConfig configures the FindAll function.

type FindAllOption added in v0.0.8

type FindAllOption func(*FindAllConfig)

FindAllOption describes a functional option for FindAll.

func FindAllAllowNoMatch added in v0.0.8

func FindAllAllowNoMatch() FindAllOption

FindAllAllowNoMatch allows FindAll to return nil when no match is found. By default it will return a query.QueryError.

type FindConfig added in v0.0.8

type FindConfig struct {
	// AllowNoMatch allows Find to return nil when no match is found. By default it will return a query.QueryError.
	AllowNoMatch bool
}

FindConfig configures the Find function.

type FindOption added in v0.0.8

type FindOption func(*FindConfig)

FindOption describes a functional option for Find.

func FindAllowNoMatch added in v0.0.8

func FindAllowNoMatch() FindOption

FindAllowNoMatch allows Find to return nil when no match is found. By default it will return a query.QueryError.
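
FindOption and FindAllOption follow Go's functional options pattern: each option is a function that mutates a config struct before the lookup runs. A minimal standard-library sketch of that pattern follows. The lower-case types mirror the declarations above, but the option body and the `lookup` function are assumptions made for illustration, not larkdown's source.

```go
package main

import (
	"errors"
	"fmt"
)

// findConfig is a local copy of the FindConfig shape, for this sketch only.
type findConfig struct {
	allowNoMatch bool
}

// findOption mirrors FindOption: a function that adjusts the config.
type findOption func(*findConfig)

// allowNoMatch plays the role of FindAllowNoMatch. Its body is an assumption.
func allowNoMatch() findOption {
	return func(c *findConfig) { c.allowNoMatch = true }
}

var errNotFound = errors.New("no match")

// lookup is a hypothetical finder. Without options a missing key is an error;
// with allowNoMatch it returns the zero value and a nil error.
func lookup(data map[string]string, key string, opts ...findOption) (string, error) {
	cfg := findConfig{}
	for _, opt := range opts {
		opt(&cfg)
	}
	value, ok := data[key]
	if !ok && !cfg.allowNoMatch {
		return "", errNotFound
	}
	return value, nil
}

func main() {
	data := map[string]string{"title": "My Recipe"}

	_, err := lookup(data, "tags")
	fmt.Println(err)

	value, err := lookup(data, "tags", allowNoMatch())
	fmt.Printf("%q %v\n", value, err)
}
```

The variadic options parameter keeps the common call short while leaving room for new settings without changing the function signature.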

Directories

Path            Synopsis
gmast           gmast provides some helper functions for working with goldmark's AST.
internal/test   Test helpers
match           match provides a query language for matching nodes in a larkdown.Tree
mdfront         Package mdfront adds support for rendering frontmatter to markdown for goldmark.
query           query handles finding a match in a tree, but not unmarshaling the node.
