Outline
The rfc5322
package implements a parser for address-list
and date-time
strings, as defined in RFC5322.
It also supports encoded words (RFC2047) and has international tokens (RFC6532).
Generated code
The lexer and parser are generated using ANTLR4.
The grammar is defined in the g4 files:
- RFC5322Parser.g4 defines the parser grammar,
- RFC5322Lexer.g4 defines the lexer grammar.
These grammars are derived from the ABNF grammar provided in the RFCs mentioned above,
albeit with some relaxations added to support "nonstandard" (and in some cases, bad) input.
Running go generate
generates a parser which recognises strings conforming to the grammar:
- parser/rfc5322_lexer.go
- parser/rfc5322parser_base_listener.go
- parser/rfc5322_parser.go
- parser/rfc5322parser_listener.go
The generated parser can then be used to convert a valid address/date into an abstract syntax tree.
Parsing
Once we have an abstract syntax tree, we must turn it into something usable, namely a mail.Address
or time.Time
.
The generated code in the parser
directory implements a walker.
This walker walks over the abstract syntax tree,
calling a callback when entering and another when when exiting each node.
By default, the callbacks are no-ops, unless they are overridden.
walker.go
The walker
type extends the base walker, overriding the default no-op callbacks
to do something specific when entering and exiting certain nodes.
The goal of the walker is to traverse the syntax tree, picking out relevant information from each node's text.
For example, when parsing a mailbox
node, the relevant information to pick out from the parse tree is the
name and address of the mailbox. This information can appear in a number of different ways, e.g. it might be
RFC2047 word-encoded, it might be a string with escaped chars that need to be handled, it might have comments
that should be ignored, and so on.
So while walking the syntax tree, each node needs to ask its children what their "value" is.
The mailbox
needs to ask its child nodes (either a nameAddr
node or an addrSpec
node)
what the name and address are.
If the child node is a nameAddr
, it needs to ask its displayName
child what the name is
and the angleAddr
what the address is; these in turn ask word
nodes, addrSpec
nodes, etc.
Each child node is responsible for telling its parent what its own value is.
The parent is responsible for assembling the children into something useful.
Ideally, this would be done with the visitor pattern. But unfortunately, the generated parser only
provides a walker interface. So we need to make use of a stack, pushing on nodes when we enter them
and popping off nodes when we exit them, to turn the walker into a kind of visitor.
parser.go
This file implements two methods,
ParseAddressList(string) ([]*mail.Address, error)
and
ParseDateTime(string) (time.Time, error)
.
These methods set up a parser from the raw input, start the walker, and convert the walker result
into an object of the correct type.
Example: Parsing dateTime
Parsing a date-time is rather simple. The implementation begins in date_time.go
. The abridged code is below:
type dateTime struct {
year int
...
}
func (dt *dateTime) withYear(year *year) {
dt.year = year.value
}
...
func (w *walker) EnterDateTime(ctx *parser.DateTimeContext) {
w.enter(&dateTime{
loc: time.UTC,
})
}
func (w *walker) ExitDateTime(ctx *parser.DateTimeContext) {
dt := w.exit().(*dateTime)
w.res = time.Date(dt.year, ...)
}
As you can see, when the walker reaches a dateTime
node, it pushes a dateTime
object onto the stack:
w.enter(&dateTime{
loc: time.UTC,
})
and when it leaves a dateTime
node, it pops it off the stack,
converting it from interface{}
to the concrete type,
and uses the parsed dateTime
values like day, month, year etc
to construct a go time.Time
object to set the walker result:
dt := w.exit().(*dateTime)
w.res = time.Date(dt.year, ...)
These parsed values were discovered while the walker continued to walk across the date-time node.
Let's see how the walker discovers the year
.
Here is the abridged code of what happens when the walker enters a year
node:
type year struct {
value int
}
func (w *walker) EnterYear(ctx *parser.YearContext) {
var text string
for _, digit := range ctx.AllDigit() {
text += digit.GetText()
}
val, err := strconv.Atoi(text)
if err != nil {
w.err = err
}
w.enter(&year{
value: val,
})
}
When entering the year
node, it collects all the raw digits, which are strings, then
converts them to an integer, and sets that as the year's integer value while pushing it onto the stack.
When exiting, it pops the year off the stack and gives itself to the parent (now on the top of the stack).
It doesn't know what type of object the parent is, it just checks to see if anything above it on the stack
is expecting a year
node:
func (w *walker) ExitYear(ctx *parser.YearContext) {
type withYear interface {
withYear(*year)
}
res := w.exit().(*year)
if parent, ok := w.parent().(withYear); ok {
parent.withYear(res)
}
}
In our case, the date
is expecting a year
node because it implements withYear
,
func (dt *dateTime) withYear(year *year) {
dt.year = year.value
}
and that is how the dateTime
data members are collected.