Documentation ¶
Overview ¶
Copyright © 2024 AntoninoAdornetto
The c.go file is responsible for satisfying the `LexicalTokenizer` interface defined in the `lexer.go` file. Its methods implement a strict set of rules for handling single and multi-line comments in c-like languages. If an issue annotation is located, the result is a slice of tokens describing the action item contained in the comment. If a comment does not contain an issue annotation, all tokens produced from the remaining comment bytes are ignored and removed from the `DraftTokens` slice.
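For illustration, suppose the configured annotation is "@TODO" (the annotation text is supplied through the Lexer's Annotation field; the value here is only an example). The first comment below would yield tokens for the annotation, title, and description, while the second contains no annotation and its draft tokens are discarded:

	// @TODO resolve flaky timeout in the queue tests   <- annotated: tokens are kept
	// plain developer note, nothing actionable here    <- not annotated: draft tokens are dropped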
Copyright © 2024 AntoninoAdornetto ¶
The lexer.go file is responsible for creating a `Base` Lexer, consuming and iterating through bytes of source code, and determining which `Target` Lexer to use for the Tokenization process.
Base Lexer: The name Lexer may be a bit misleading for the `Base` Lexer, since there is no strict rule set baked into its receiver methods. However, the `Base` Lexer has the very important role of sharing byte consumption methods with `Target` Lexers. For example, we don't want to re-write .next(), .peek(), or .nextLexeme() for every Target Lexer, since the logic for those methods is not specific to any one Target Lexer and won't change.
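A minimal sketch of the idea behind these shared helpers, assuming they advance Current over Src. The real methods are unexported and their exact behavior may differ; this only illustrates why the logic can live on the `Base` Lexer once:

	// sketch only: illustrates shared byte consumption, not the actual implementation
	func (base *Lexer) peek() byte {
		if base.Current >= len(base.Src) {
			return 0 // treat end of input as a zero byte
		}
		return base.Src[base.Current] // inspect the current byte without consuming it
	}

	func (base *Lexer) next() byte {
		b := base.peek()
		base.Current++ // consume the byte by advancing the cursor
		return b
	}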
Target Lexer: Simply put, a `Target` Lexer is the Lexer that handles the Tokenization rule set. For this application, we are only concerned with tokenizing single and multi-line comments. More specifically, single and multi-line comments that contain an issue annotation.
`Target` Lexers are created via the `NewTargetLexer` function. The `Base` Lexer is passed in as input, via dependency injection, and is stored within each `Target` Lexer so that targets can access the shared byte consumption methods. `Target` Lexers must satisfy the methods contained in the `LexicalTokenizer` interface. Although we are only concerned with comments in source code, you will notice a requirement for a `String` method in the interface. We must account for strings to handle an edge case: if we are lexing a python string that contains a hash character "#" (the comment notation symbol), or a c or go string that contains one or more forward slashes "/", the lexer could mistake the string contents for comment notation. String tokens are not persisted; they are simply consumed until the closing delimiter is located.
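A hedged sketch of the idea behind the String requirement. The helper below is hypothetical (it is not the package's implementation) and reuses the peek/next helpers sketched above:

	// hypothetical helper, sketch only; assumes "errors" is imported.
	// Consume a string literal so a "#" or "//" inside it is never mistaken
	// for comment notation; nothing inside the string is tokenized.
	func skipString(base *Lexer, delim byte) error {
		for base.peek() != delim {
			if base.Current >= len(base.Src) {
				return errors.New("unterminated string literal")
			}
			base.next()
		}
		base.next() // step past the closing delimiter
		return nil
	}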
Lastly, it's important to mention how `Target` Lexers are selected. When instantiating a new `Base` Lexer, the source code file path is provided. This path is used to read the file extension. If the extension is .c, .go, .cpp, .h, etc., we return a Target Lexer that supports c-like comment syntax, since those languages all denote single and multi-line comments with the same notation. For .py files, we would return a PythonLexer, and so on.
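A hedged sketch of that extension-based dispatch. The real mapping lives in NewTargetLexer and likely covers more extensions; the function name below is hypothetical:

	// sketch only: assumes "fmt" and "path/filepath" are imported
	func newTargetLexerSketch(base *Lexer) (LexicalTokenizer, error) {
		switch filepath.Ext(base.FilePath) {
		case ".c", ".h", ".cpp", ".go":
			return &Clexer{Base: base}, nil // c-like comment notation
		case ".sh":
			return &ShellLexer{Base: base}, nil // hash-based comment notation
		default:
			return nil, fmt.Errorf("unsupported file extension %q", filepath.Ext(base.FilePath))
		}
	}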
Index ¶
Constants ¶
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Clexer ¶ added in v1.0.0
type Clexer struct {
	Base        *Lexer  // holds shared byte consumption methods
	DraftTokens []Token // Unvalidated tokens
	// contains filtered or unexported fields
}
func (*Clexer) AnalyzeToken ¶ added in v1.0.0
func (c *Clexer) AnalyzeToken() error
type Comment ¶
type Comment struct {
	TokenAnnotationIndex int
	Title, Description   string
	TokenStartIndex      int   // location of the first token
	TokenEndIndex        int   // location of the last token
	AnnotationPos        []int // start/end index of the annotation
	IssueNumber          int   // will contain a non 0 value if the comment has been reported
	LineNumber           int
	NotationStartIndex   int // index of where the comment starts
	NotationEndIndex     int // index of where the comment ends
}
type CommentManager ¶ added in v1.0.0
type CommentManager struct {
Comments []Comment
}
func BuildComments ¶ added in v1.0.0
func BuildComments(tokens []Token) (CommentManager, error)
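A hedged usage sketch, assuming tokens is the []Token returned by a prior call to (*Lexer).AnalyzeTokens:

	// assumes "fmt" and "log" are imported
	manager, err := BuildComments(tokens)
	if err != nil {
		log.Fatal(err)
	}
	for _, c := range manager.Comments {
		fmt.Printf("line %d: %s %s\n", c.LineNumber, c.Title, c.Description)
	}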
type Lexer ¶
type Lexer struct {
	FilePath   string
	FileName   string
	Src        []byte  // source code bytes
	Tokens     []Token // comment tokens after lexical analysis has been complete
	Start      int     // byte index
	Current    int     // byte index, used in conjunction with Start to construct tokens
	Line       int     // Line number
	Annotation []byte  // issue annotation to search for within comments
	// contains filtered or unexported fields
}
func (*Lexer) AnalyzeTokens ¶
func (base *Lexer) AnalyzeTokens(target LexicalTokenizer) ([]Token, error)
type LexicalTokenizer ¶ added in v1.0.0
type LexicalTokenizer interface {
	AnalyzeToken() error
	String(delim byte) error
	Comment() error
	// contains filtered or unexported methods
}
AnalyzeToken - checks the current byte from [Lexer.peek()] and determines how we should process the bytes that follow
String - tokens from the String method are not stored. It's needed to prevent lexing comment notation within a string
Comment - the bread and butter of our target lexers. Handles processing single and multi-line comments
processLexeme - transforms the lexeme into a token and appends it to the draft tokens contained in the target lexer struct
func NewTargetLexer ¶ added in v1.0.0
func NewTargetLexer(base *Lexer) (LexicalTokenizer, error)
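A hedged end-to-end sketch. The package may provide a dedicated constructor for the base Lexer; here the exported fields are populated directly, and "@TODO" is only an example annotation value:

	// assumes "log" and "os" are imported
	src, err := os.ReadFile("main.c")
	if err != nil {
		log.Fatal(err)
	}

	base := &Lexer{
		FilePath:   "main.c",
		FileName:   "main.c",
		Src:        src,
		Annotation: []byte("@TODO"), // example annotation to search for
	}

	target, err := NewTargetLexer(base) // a c-like target is selected from the .c extension
	if err != nil {
		log.Fatal(err)
	}

	tokens, err := base.AnalyzeTokens(target)
	if err != nil {
		log.Fatal(err)
	}
	_ = tokens // feed these into BuildComments (see above)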
type ShellLexer ¶ added in v1.2.0
type ShellLexer struct {
	Base        *Lexer  // holds shared byte consumption methods
	DraftTokens []Token // Unvalidated tokens
	// contains filtered or unexported fields
}
func (*ShellLexer) AnalyzeToken ¶ added in v1.2.0
func (sh *ShellLexer) AnalyzeToken() error
func (*ShellLexer) Comment ¶ added in v1.2.0
func (sh *ShellLexer) Comment() error
func (*ShellLexer) String ¶ added in v1.2.0
func (sh *ShellLexer) String(delim byte) error
type Token ¶
type TokenType ¶
type TokenType = uint16
const (
	TOKEN_SINGLE_LINE_COMMENT_START TokenType = 1 << iota
	TOKEN_SINGLE_LINE_COMMENT_END
	TOKEN_MULTI_LINE_COMMENT_START
	TOKEN_MULTI_LINE_COMMENT_END
	TOKEN_COMMENT_ANNOTATION
	TOKEN_ISSUE_NUMBER
	TOKEN_COMMENT_TITLE
	TOKEN_COMMENT_DESCRIPTION
	TOKEN_SINGLE_LINE_COMMENT
	TOKEN_MULTI_LINE_COMMENT
	TOKEN_OPEN_PARAN
	TOKEN_CLOSE_PARAN
	TOKEN_HASH
	TOKEN_UNKNOWN
	TOKEN_EOF
)
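Because the values are declared with 1 << iota, each TokenType is a distinct bit flag; a small hedged example of combining and testing them (this does not rely on the Token struct, whose fields are not shown above):

	// assumes "fmt" is imported
	tt := TOKEN_SINGLE_LINE_COMMENT_START | TOKEN_COMMENT_ANNOTATION
	if tt&TOKEN_COMMENT_ANNOTATION != 0 {
		fmt.Println("annotation flag is set")
	}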