Documentation ¶
Overview ¶
Author: Paul F. Dunn, https://github.com/paulfdunn/. Original source location: https://github.com/paulfdunn/go-parser. This code is licensed under the MIT license; please keep this attribution when replicating/copying/reusing the code.
Package parser was written to support parsing of log files that were written for human consumption and are generally difficult to parse. See the associated test file for comments and examples with output.
Index ¶
- Constants
- func Hash(input string, format HashFormat) (string, error)
- func Hash8(input string, format HashFormat) (string, error)
- func SortedHashMapCounts(inputMap map[string]int) []string
- type Extract
- type HashFormat
- type Inputs
- type Replacement
- type Scanner
- func (scnr *Scanner) Extract(row []string) ([]string, []error)
- func (scnr *Scanner) Filter(row string) bool
- func (scnr *Scanner) HashingEnabled() bool
- func (scnr *Scanner) OpenFileScanner(filePath string) (err error)
- func (scnr *Scanner) OpenIoReaderScanner(ior io.Reader)
- func (scnr *Scanner) Read(databuffer int, errorBuffer int) (<-chan string, <-chan error)
- func (scnr *Scanner) Replace(row string) string
- func (scnr *Scanner) Shutdown()
- func (scnr *Scanner) Split(row string) ([]string, error)
- func (scnr *Scanner) SplitsExcludeHashColumns(splits []string, hashFormat HashFormat) ([]string, error)
- func (scnr *Scanner) SplitsToSql(numColumns int, table string, splits []string, extracts []string) string
Examples ¶
Constants ¶
const (
	// Replacements whose regex matches this string will be replaced with unixmicro values to save
	// storage space.
	DATE_TIME_REGEX = "(\\d{4}-\\d{2}-\\d{2}[ -]\\d{2}:\\d{2}:\\d{2})"
)
Variables ¶
This section is empty.
Functions ¶
func Hash ¶
func Hash(input string, format HashFormat) (string, error)
Hash returns the hex string of the MD5 hash of the input. Call this on fields where values have been extracted in order to perform pareto analysis on the resulting hashes. This can also be used to reduce storage space when storing in a database by replacing multiple fields with a single hash, and keeping a separate table mapping hashes to original field values.
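The idea behind hashing for pareto analysis can be sketched without this package: hash the concatenation of the stable fields and use the hex digest as a message-type key, so rows differing only in extracted values collapse to the same key. This is an illustrative sketch, not the package's implementation; the helper name fieldsKey is hypothetical.

```go
package main

import (
	"crypto/md5"
	"fmt"
	"strings"
)

// fieldsKey is a hypothetical helper: it concatenates the chosen fields and
// returns the hex MD5 digest, so rows that differ only in extracted values
// produce the same key.
func fieldsKey(fields []string) string {
	sum := md5.Sum([]byte(strings.Join(fields, "")))
	return fmt.Sprintf("0x%x", sum)
}

func main() {
	a := fieldsKey([]string{"status", "info", "val={} flag={}"})
	b := fieldsKey([]string{"status", "info", "val={} flag={}"})
	fmt.Println(a == b) // identical fields yield identical hashes
}
```

Counting occurrences of each key then yields a pareto of message types, and a separate table mapping key to original fields recovers the full text.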
func Hash8 ¶ added in v1.0.8
func Hash8(input string, format HashFormat) (string, error)
Hash8 implements the djb2 hash described here: http://www.cse.yorku.ca/~oz/hash.html and returns only 8 bytes.
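The djb2 hash from the linked page is small enough to sketch inline: seed with 5381, then for each byte multiply by 33 and add the byte. A 64-bit accumulator rendered as 16 hex digits gives the 8 bytes mentioned above; the accumulator width and output formatting here are assumptions, and the package's exact output may differ.

```go
package main

import "fmt"

// djb2hex implements the djb2 hash (hash = hash*33 + c, seeded with 5381)
// and renders the 8-byte result as 16 hex digits.
func djb2hex(s string) string {
	var h uint64 = 5381
	for _, c := range []byte(s) {
		h = h*33 + uint64(c)
	}
	return fmt.Sprintf("%016x", h)
}

func main() {
	fmt.Println(djb2hex("abc"))
}
```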
func SortedHashMapCounts ¶
func SortedHashMapCounts(inputMap map[string]int) []string
SortedHashMapCounts is a convenience function to sort a map of hashes based on counts. It is used to help develop extracts and hashes in order to reduce the total number of hashes.
Types ¶
type Extract ¶
type Extract struct {
	Columns     []int
	RegexString string
	Submatch    int
	Token       string
	// contains filtered or unexported fields
}
Extract objects determine how extractions (Scanner.Extract) occur. The RegexString is converted to a regex and is run against the specified data columns (after Split). Submatch is used to index the submatches returned from regex.FindAllStringSubmatch(regex, -1), which are returned. The submatches are replaced with Token in the source data. Note on submatch indexing: the first item is the full match, so submatch indices start at 1, not zero (https://pkg.go.dev/regexp#Regexp.FindAllStringSubmatch).
type HashFormat ¶ added in v1.0.1
type HashFormat int
The hash can be output in a pure string format (i.e. "0xdeadbeef") or a format compatible with importing into Sqlite3 as a Blob (i.e. x'deadbeef').
const (
	HASH_FORMAT_STRING HashFormat = iota
	HASH_FORMAT_SQL
)
type Inputs ¶ added in v0.0.2
type Inputs struct {
	DataDirectory           string
	ExpectedFieldCount      int
	Extracts                []*Extract
	HashColumns             []int
	InputDelimiter          string
	NegativeFilter          string
	OutputDelimiter         string
	PositiveFilter          string
	ProcessedInputDirectory string
	Replacements            []*Replacement
	SqlQuoteColumns         []int
}
Inputs to parser. This object is just used for unmarshalling inputs from a file. The values are then stored with the scanner; see Scanner for details.
type Replacement ¶
type Replacement struct {
	Replacement string
	RegexString string
	// contains filtered or unexported fields
}
Replacement objects determine how replacements (Scanner.Replace) occur. The RegexString is converted to a regex and is run against the input row (unsplit), with matches being replaced by Replacement.
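A Replacement reduces to a compiled regex plus regexp.ReplaceAllString, which supports capture-group references such as ${1} in the replacement text. A minimal sketch, with the whitespace-collapsing pattern the package examples use:

```go
package main

import (
	"fmt"
	"regexp"
)

// applyReplacement mirrors what a Replacement describes: compile RegexString,
// then substitute every match in the row with the Replacement text.
func applyReplacement(regexString, replacement, row string) string {
	return regexp.MustCompile(regexString).ReplaceAllString(row, replacement)
}

func main() {
	// Collapse runs of two or more whitespace characters into a consistent
	// two-space delimiter.
	fmt.Printf("%q\n", applyReplacement(`\s\s+`, "  ", "a     b       c"))
}
```

In real use the regex would be compiled once and reused per row rather than recompiled on every call.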
type Scanner ¶
type Scanner struct {
	HashColumns     []int
	HashCounts      map[string]int
	HashMap         map[string]string
	OutputDelimiter string
	// contains filtered or unexported fields
}
Scanner is the main object of this package.

dataDirectory - Directory with input files.
expectedFieldCount - Expected number of fields after calling Split.
extract - Extract objects; used for extracting values from rows into their own fields.
hashColumns - Column indices (zero based) of Split data used to create the hash.
inputDelimiter - Regexp used by Split to split rows of data.
negativeFilter - Regex used for negative filtering; rows matching this value are excluded.
outDelimiter - String used to delimit parsed output data.
positiveFilter - Regex used for positive filtering; rows must match to be included.
processedInputDirectory - When Read completes, the file is moved to this directory; an empty string means the file is left in place.
replace - Replacement values used for performing regex replacements on input data.
sqlQuoteColumns - When using SQL output, these columns will be quoted.
Example (ReplaceAndSplit) ¶
ExampleScanner_replaceAndSplit shows how to use the Split function. In this case the data is then Join'ed back together just for output purposes. Note that the call to Split drops the error that ExpectedFieldCount was incorrect; callers can choose to enforce the error, or not. Also note that the DATE_TIME_REGEX creates an additional column when used on datetime values with fractional seconds, as the fractional seconds become an additional field. Storing epoch time reduces storage compared to a string, but converting back to an SQL DATETIME is easier with seconds in their own field.
delimiter := `\s\s`
delimiterString := "  "
rplc := []*Replacement{
	{RegexString: `\s\s+`, Replacement: delimiterString},
	{RegexString: DATE_TIME_REGEX},
	{RegexString: `\.([0-9]+)\s+`, Replacement: delimiterString + "${1}" + delimiterString},
}
defaultInputs, _ := NewInputs("./test/testInputs.json")
defaultInputs.InputDelimiter = delimiter
defaultInputs.ExpectedFieldCount = 8
defaultInputs.Replacements = rplc
scnr := openFileScanner(filepath.Join(testDataDirectory, "test_split.txt"), *defaultInputs)
dataChan, errorChan := scnr.Read(100, 100)
fullData := []string{}
splitData := []string{}
for row := range dataChan {
	fullData = append(fullData, row)
	splits, _ := scnr.Split(scnr.Replace(row))
	splitData = append(splitData, strings.Join(splits, "|"))
}
for err := range errorChan {
	fmt.Println(err)
}
fmt.Println("\nInput data:")
fmt.Printf("%+v", strings.Join(fullData, "\n"))
fmt.Println("\n\nSplit data:")
fmt.Printf("%+v", strings.Join(splitData, "\n"))
Output:

Input data:
2023-10-07 12:00:00 MDT 0 0 notification debug multi word type sw_a Debug SW message
2023-10-07 12:00:00 MDT 1 001 notification info SingleWordType sw_b Info SW message
2023-10-07 12:00:00.02 MDT 1 002 status info alphanumeric value sw_a Message with alphanumberic value abc123def
2023-10-07 12:00:00.03 MDT 1 003 status info alphanumeric value sw_a Message with extra delimiters

Split data:
1696680000 MDT|0|0|notification|debug|multi word type|sw_a|Debug SW message
1696680000 MDT|1|001|notification|info|SingleWordType|sw_b|Info SW message
1696680000|02|MDT|1|002|status|info|alphanumeric value|sw_a|Message with alphanumberic value abc123def
1696680000|03|MDT|1|003|status|info|alphanumeric value|sw_a|Message|with|extra|delimiters
func NewScanner ¶
NewScanner is a constructor for Scanners. See the Scanner definition for a description of inputs.
func (*Scanner) Extract ¶
Extract takes an input row slice (call Split to split a row on scnr.inputDelimiter) and applies the scnr.extract values to extract values from a column.
Example (AndHash) ¶
ExampleScanner_Extract_tosql shows how to extract data and hash a field, and also shows SQL output. The assumption with SQL output is that you create a table that can take the maximum number of extracts as NULLable strings. Note that the order of the extracts is based on the order of the extract expression evaluation, NOT the order of the data in the original string. Hash - Note that hashing a field after extracting unique data results in equal hashes. This is useful in order to calculate a pareto of message types regardless of some unique data.
delimiter := `\s\s+`
delimiterString := "  "
extracts := []*Extract{
	{
		// Capture a string that starts with an alpha or number, contains alpha, number, or [_.-:],
		// and is leading-space delimited.
		Columns:     []int{7},
		RegexString: "(^|\\s+)(([0-9]+[a-zA-Z_\\.-]|[a-zA-Z_\\.-]+[0-9])[a-zA-Z0-9\\.\\-_:]*)",
		Token:       "${1}{}",
		Submatch:    2,
	},
	{
		// Capture word or [\\._] preceded by 'word='.
		Columns:     []int{7},
		RegexString: "(^|\\s+)([\\w]+[:=])([\\w:\\._]+)",
		Token:       "${1}${2}{}",
		Submatch:    3,
	},
	{
		// Capture word or [\\.] in parentheses.
		Columns:     []int{7},
		RegexString: "(\\()([\\w:\\.]+)(\\))",
		Token:       "${1}{}${3}",
		Submatch:    2,
	},
	{
		// Capture hex number preceded by space.
		Columns:     []int{7},
		RegexString: "(^|\\s+)(0x[a-fA-F0-9]+)",
		Token:       "${1}{}",
		Submatch:    2,
	},
	{
		// Capture number and [\\.:_] preceded by space.
		Columns:     []int{7},
		RegexString: "(^|\\s+)([0-9\\.:_]+)",
		Token:       "${1}{}",
		Submatch:    2,
	},
}
defaultInputs, _ := NewInputs("./test/testInputs.json")
defaultInputs.NegativeFilter = `serial number`
defaultInputs.InputDelimiter = delimiter
defaultInputs.Replacements = []*Replacement{{RegexString: `\s\s+`, Replacement: delimiterString}}
defaultInputs.Extracts = extracts
defaultInputs.HashColumns = []int{3, 4, 5, 7}
defaultInputs.SqlQuoteColumns = []int{0, 4}
scnr := openFileScanner(filepath.Join(testDataDirectory, "test_extract.txt"), *defaultInputs)
dataChan, errorChan := scnr.Read(100, 100)
fullData := []string{}
extractData := []string{}
extractExcludeColumnsData := []string{}
sql := []string{}
sqlShort := []string{}
for row := range dataChan {
	if scnr.Filter(row) {
		continue
	}
	splits, _ := scnr.Split(row)
	fullData = append(fullData, strings.Join(splits, "|"))
	extracts, _ := scnr.Extract(splits)
	hd, _ := Hash(splits[3]+splits[4]+splits[5]+splits[7], HASH_FORMAT_STRING)
	extractData = append(extractData, strings.Join(splits, "|")+
		"|EXTRACTS|"+strings.Join(extracts, "|")+
		"| hash:"+hd)
	sehc, _ := scnr.SplitsExcludeHashColumns(splits, HASH_FORMAT_STRING)
	extractExcludeColumnsData = append(extractExcludeColumnsData, strings.Join(sehc, "|")+
		"|EXTRACTS|"+strings.Join(extracts, "|")+
		"| hash:"+hd)
	sql = append(sql, scnr.SplitsToSql(10, "parsed", sehc, extracts))
	sqlShort = append(sqlShort, scnr.SplitsToSql(7, "parsed", sehc, extracts))
}
for err := range errorChan {
	fmt.Println(err)
}
fmt.Printf("Hashing is enabled: %t", scnr.HashingEnabled())
fmt.Println("\nInput data:")
fmt.Printf("%+v", strings.Join(fullData, "\n"))
fmt.Println("\n\nExtract(ed) data:")
fmt.Printf("%+v", strings.Join(extractData, "\n"))
fmt.Println("\n\nExtract(ed) data excluding hashed columns:")
fmt.Printf("%+v", strings.Join(extractExcludeColumnsData, "\n"))
fmt.Println("\n\nSQL:")
fmt.Printf("%s", strings.Join(sql, "\n"))
fmt.Println("\n\nSQL with numColumns truncating extracts:")
fmt.Printf("%s", strings.Join(sqlShort, "\n"))
Output:

Hashing is enabled: true
Input data:
2023-10-07 12:00:00.00 MDT|0|0|notification|debug|multi word type|sw_a|Unit 12.Ab.34 message (789)
2023-10-07 12:00:00.01 MDT|1|001|notification|info|SingleWordType|sw_b|Info SW version = 1.2.34 release=a.1.1
2023-10-07 12:00:00.02 MDT|1|002|status|info|alphanumeric value|sw_a|Message with alphanumberic value abc123def
2023-10-07 12:00:00.03 MDT|1|003|status|info|alphanumeric value|sw_a|val:1 flag:x20 other:X30 on 127.0.0.1:8080
2023-10-07 12:00:00.04 MDT|1|004|status|info|alphanumeric value|sw_a|val=2 flag = 30 other 3.cd on (ABC.123_45)
2023-10-07 12:00:00.05 MDT|1|005|status|info|alphanumeric value|sw_a|val=3 flag = 40 other 4.ef on (DEF.678_90)
2023-10-07 12:00:00.06 MDT|1|006|status|info|alphanumeric value|sw_a|val=4 flag = 50 other 5.gh on (GHI.098_76)

Extract(ed) data:
2023-10-07 12:00:00.00 MDT|0|0|notification|debug|multi word type|sw_a|Unit {} message ({})|EXTRACTS|12.Ab.34|789| hash:'0xa5a3dba744d3c6f1372f888f54447553'
2023-10-07 12:00:00.01 MDT|1|001|notification|info|SingleWordType|sw_b|Info SW version = {} release={}|EXTRACTS|1.2.34|a.1.1| hash:'0x9bd3989cf85b232ddadd73a1a312b249'
2023-10-07 12:00:00.02 MDT|1|002|status|info|alphanumeric value|sw_a|Message with alphanumberic value {}|EXTRACTS|abc123def| hash:'0x7f0e8136c3aec6bbde74dfbad17aef1c'
2023-10-07 12:00:00.03 MDT|1|003|status|info|alphanumeric value|sw_a|val:{} flag:{} other:{} on {}|EXTRACTS|127.0.0.1:8080|1|x20|X30| hash:'0x4907fb17a4212e2e09897fafa1cb758a'
2023-10-07 12:00:00.04 MDT|1|004|status|info|alphanumeric value|sw_a|val={} flag = {} other {} on ({})|EXTRACTS|3.cd|2|ABC.123_45|30| hash:'0x1b7739c1e24d3a837e7821ecfb9a1be1'
2023-10-07 12:00:00.05 MDT|1|005|status|info|alphanumeric value|sw_a|val={} flag = {} other {} on ({})|EXTRACTS|4.ef|3|DEF.678_90|40| hash:'0x1b7739c1e24d3a837e7821ecfb9a1be1'
2023-10-07 12:00:00.06 MDT|1|006|status|info|alphanumeric value|sw_a|val={} flag = {} other {} on ({})|EXTRACTS|5.gh|4|GHI.098_76|50| hash:'0x1b7739c1e24d3a837e7821ecfb9a1be1'

Extract(ed) data excluding hashed columns:
2023-10-07 12:00:00.00 MDT|0|0|'0xa5a3dba744d3c6f1372f888f54447553'|sw_a|EXTRACTS|12.Ab.34|789| hash:'0xa5a3dba744d3c6f1372f888f54447553'
2023-10-07 12:00:00.01 MDT|1|001|'0x9bd3989cf85b232ddadd73a1a312b249'|sw_b|EXTRACTS|1.2.34|a.1.1| hash:'0x9bd3989cf85b232ddadd73a1a312b249'
2023-10-07 12:00:00.02 MDT|1|002|'0x7f0e8136c3aec6bbde74dfbad17aef1c'|sw_a|EXTRACTS|abc123def| hash:'0x7f0e8136c3aec6bbde74dfbad17aef1c'
2023-10-07 12:00:00.03 MDT|1|003|'0x4907fb17a4212e2e09897fafa1cb758a'|sw_a|EXTRACTS|127.0.0.1:8080|1|x20|X30| hash:'0x4907fb17a4212e2e09897fafa1cb758a'
2023-10-07 12:00:00.04 MDT|1|004|'0x1b7739c1e24d3a837e7821ecfb9a1be1'|sw_a|EXTRACTS|3.cd|2|ABC.123_45|30| hash:'0x1b7739c1e24d3a837e7821ecfb9a1be1'
2023-10-07 12:00:00.05 MDT|1|005|'0x1b7739c1e24d3a837e7821ecfb9a1be1'|sw_a|EXTRACTS|4.ef|3|DEF.678_90|40| hash:'0x1b7739c1e24d3a837e7821ecfb9a1be1'
2023-10-07 12:00:00.06 MDT|1|006|'0x1b7739c1e24d3a837e7821ecfb9a1be1'|sw_a|EXTRACTS|5.gh|4|GHI.098_76|50| hash:'0x1b7739c1e24d3a837e7821ecfb9a1be1'

SQL:
INSERT OR IGNORE INTO parsed VALUES('2023-10-07 12:00:00.00 MDT',0,0,'0xa5a3dba744d3c6f1372f888f54447553','sw_a','12.Ab.34','789',NULL,NULL,NULL);
INSERT OR IGNORE INTO parsed VALUES('2023-10-07 12:00:00.01 MDT',1,001,'0x9bd3989cf85b232ddadd73a1a312b249','sw_b','1.2.34','a.1.1',NULL,NULL,NULL);
INSERT OR IGNORE INTO parsed VALUES('2023-10-07 12:00:00.02 MDT',1,002,'0x7f0e8136c3aec6bbde74dfbad17aef1c','sw_a','abc123def',NULL,NULL,NULL,NULL);
INSERT OR IGNORE INTO parsed VALUES('2023-10-07 12:00:00.03 MDT',1,003,'0x4907fb17a4212e2e09897fafa1cb758a','sw_a','127.0.0.1:8080','1','x20','X30',NULL);
INSERT OR IGNORE INTO parsed VALUES('2023-10-07 12:00:00.04 MDT',1,004,'0x1b7739c1e24d3a837e7821ecfb9a1be1','sw_a','3.cd','2','ABC.123_45','30',NULL);
INSERT OR IGNORE INTO parsed VALUES('2023-10-07 12:00:00.05 MDT',1,005,'0x1b7739c1e24d3a837e7821ecfb9a1be1','sw_a','4.ef','3','DEF.678_90','40',NULL);
INSERT OR IGNORE INTO parsed VALUES('2023-10-07 12:00:00.06 MDT',1,006,'0x1b7739c1e24d3a837e7821ecfb9a1be1','sw_a','5.gh','4','GHI.098_76','50',NULL);

SQL with numColumns truncating extracts:
INSERT OR IGNORE INTO parsed VALUES('2023-10-07 12:00:00.00 MDT',0,0,'0xa5a3dba744d3c6f1372f888f54447553','sw_a','12.Ab.34','789');
INSERT OR IGNORE INTO parsed VALUES('2023-10-07 12:00:00.01 MDT',1,001,'0x9bd3989cf85b232ddadd73a1a312b249','sw_b','1.2.34','a.1.1');
INSERT OR IGNORE INTO parsed VALUES('2023-10-07 12:00:00.02 MDT',1,002,'0x7f0e8136c3aec6bbde74dfbad17aef1c','sw_a','abc123def',NULL);
INSERT OR IGNORE INTO parsed VALUES('2023-10-07 12:00:00.03 MDT',1,003,'0x4907fb17a4212e2e09897fafa1cb758a','sw_a','127.0.0.1:8080','1');
INSERT OR IGNORE INTO parsed VALUES('2023-10-07 12:00:00.04 MDT',1,004,'0x1b7739c1e24d3a837e7821ecfb9a1be1','sw_a','3.cd','2');
INSERT OR IGNORE INTO parsed VALUES('2023-10-07 12:00:00.05 MDT',1,005,'0x1b7739c1e24d3a837e7821ecfb9a1be1','sw_a','4.ef','3');
INSERT OR IGNORE INTO parsed VALUES('2023-10-07 12:00:00.06 MDT',1,006,'0x1b7739c1e24d3a837e7821ecfb9a1be1','sw_a','5.gh','4');
func (*Scanner) Filter ¶
Filter takes an input row and applies the scnr.negativeFilter and scnr.positiveFilter. True means the row should be filtered (dropped); false means keep the row.
Example (Negative) ¶
ExampleScanner_Filter_negative shows how to use the negative filter to remove lines matching a pattern. Note that the comment line and the line containing 'negative filter' are not included in the output.
// The '\s+' is used in the filter only to show that it is a regex; a space could have been used.
defaultInputs, _ := NewInputs("./test/testInputs.json")
defaultInputs.NegativeFilter = `#|negative\s+filter`
scnr := openFileScanner(filepath.Join(testDataDirectory, "test_filter.txt"), *defaultInputs)
dataChan, errorChan := scnr.Read(100, 100)
fullData := []string{}
filteredData := []string{}
for row := range dataChan {
	fullData = append(fullData, row)
	if !scnr.Filter(row) {
		filteredData = append(filteredData, row)
	}
}
for err := range errorChan {
	fmt.Println(err)
}
fmt.Println("\nInput data:")
fmt.Printf("%+v", strings.Join(fullData, "\n"))
fmt.Println("\n\nFiltered data:")
fmt.Printf("%+v", strings.Join(filteredData, "\n"))
Output:

Input data:
# Comment line
2023-10-07 12:00:00.00 MDT 0 0 notification debug will it filter sw_a Debug SW message
2023-10-07 12:00:00.01 MDT 1 001 notification info negative filter sw_b Info SW message
2023-10-07 12:00:00.02 MDT 1 002 status info will it filter sw_a Message with alphanumberic value abc123def

Filtered data:
2023-10-07 12:00:00.00 MDT 0 0 notification debug will it filter sw_a Debug SW message
2023-10-07 12:00:00.02 MDT 1 002 status info will it filter sw_a Message with alphanumberic value abc123def
Example (Positive) ¶
ExampleScanner_Filter_positive shows how to use the positive filter to include lines matching a pattern. Note that lines without a timestamp are not included in the output.
defaultInputs, _ := NewInputs("./test/testInputs.json")
defaultInputs.PositiveFilter = `\d{4}-\d{2}-\d{2}[ -]\d{2}:\d{2}:\d{2}\.\d{2}\s+[a-zA-Z]{2,5}`
scnr := openFileScanner(filepath.Join(testDataDirectory, "test_filter.txt"), *defaultInputs)
dataChan, errorChan := scnr.Read(100, 100)
fullData := []string{}
filteredData := []string{}
for row := range dataChan {
	fullData = append(fullData, row)
	if !scnr.Filter(row) {
		filteredData = append(filteredData, row)
	}
}
for err := range errorChan {
	fmt.Println(err)
}
fmt.Println("\nInput data:")
fmt.Printf("%+v", strings.Join(fullData, "\n"))
fmt.Println("\n\nFiltered data:")
fmt.Printf("%+v", strings.Join(filteredData, "\n"))
Output:

Input data:
# Comment line
2023-10-07 12:00:00.00 MDT 0 0 notification debug will it filter sw_a Debug SW message
2023-10-07 12:00:00.01 MDT 1 001 notification info negative filter sw_b Info SW message
2023-10-07 12:00:00.02 MDT 1 002 status info will it filter sw_a Message with alphanumberic value abc123def

Filtered data:
2023-10-07 12:00:00.00 MDT 0 0 notification debug will it filter sw_a Debug SW message
2023-10-07 12:00:00.01 MDT 1 001 notification info negative filter sw_b Info SW message
2023-10-07 12:00:00.02 MDT 1 002 status info will it filter sw_a Message with alphanumberic value abc123def
func (*Scanner) HashingEnabled ¶ added in v1.0.0
HashingEnabled returns true when the inputs specify that hashing is to be performed; false otherwise.
func (*Scanner) OpenFileScanner ¶
OpenFileScanner is a convenience function to open a file based scanner.
Example ¶
ExampleScanner_OpenFileScanner shows how to open a file for processing.
defaultInputs, _ := NewInputs("./test/testInputs.json")
scnr, err := NewScanner(*defaultInputs)
if err != nil {
	fmt.Printf("calling NewScanner: %s", err)
	return
}
scnr.OpenFileScanner(filepath.Join(testDataDirectory, "test_read.txt"))
defer scnr.Shutdown()
Output:
func (*Scanner) OpenIoReaderScanner ¶
OpenIoReaderScanner opens a scanner using the supplied io.Reader. Callers reading from a file should call OpenFileScanner instead of this function.
Example ¶
ExampleScanner_OpenIoReaderScanner shows how to open an io.Reader for processing. Note that a file is used for convenience in calling OpenIoReaderScanner. When processing files, use the OpenFileScanner convenience function.
file, err := os.Open(filepath.Join(testDataDirectory, "test_read.txt"))
if err != nil {
	fmt.Printf("calling os.Open: %s", err)
	return
}
defaultInputs, _ := NewInputs("./test/testInputs.json")
scnr, err := NewScanner(*defaultInputs)
if err != nil {
	fmt.Printf("calling NewScanner: %s", err)
	return
}
scnr.OpenIoReaderScanner(file)
defer scnr.Shutdown()
Output:
func (*Scanner) Read ¶
Read starts a goroutine to read data from the input scanner and returns channels from which the caller can pull data and errors. Both the data and error channels are buffered, with buffer sizes databuffer and errorBuffer.
Example ¶
ExampleScanner_Read shows how to read data, with no other processing.
defaultInputs, _ := NewInputs("./test/testInputs.json")
scnr := openFileScanner(filepath.Join(testDataDirectory, "test_read.txt"), *defaultInputs)
fmt.Println("Read all the test data")
dataChan, errorChan := scnr.Read(100, 100)
for row := range dataChan {
	fmt.Println(row)
}
for err := range errorChan {
	fmt.Println(err)
}
Output:

Read all the test data
2023-10-07 12:00:00.00 MDT 0 0 notification debug multi word type sw_a Debug SW message
2023-10-07 12:00:00.01 MDT 1 001 notification info SingleWordType sw_b Info SW message
2023-10-07 12:00:00.02 MDT 1 002 status info alphanumeric value sw_a Message with alphanumberic value abc123def
func (*Scanner) Replace ¶
Replace applies the scnr.replace values to the supplied input row of data. The special case where RegexString == DATE_TIME_REGEX uses a function to replace a date time string with Unix epoch.
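The date/time special case can be sketched with regexp.ReplaceAllStringFunc: match the timestamp, parse it, and substitute the epoch value. The layout string and UTC parsing here are assumptions for the sketch (and this version emits seconds, matching the example output below), not the package's exact conversion.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"time"
)

// dateTimeRe mirrors the space-separated form of DATE_TIME_REGEX.
var dateTimeRe = regexp.MustCompile(`\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}`)

// replaceDateTime swaps each matched date/time string for its Unix epoch.
func replaceDateTime(row string) string {
	return dateTimeRe.ReplaceAllStringFunc(row, func(m string) string {
		t, err := time.Parse("2006-01-02 15:04:05", m)
		if err != nil {
			return m // leave unparseable matches unchanged
		}
		return strconv.FormatInt(t.Unix(), 10)
	})
}

func main() {
	fmt.Println(replaceDateTime("2023-10-07 12:00:00  MDT  0  000  debug"))
}
```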
Example ¶
ExampleScanner_Replace shows how to use the Replace function to replace text that didn't include a delimiter with text that does have a delimiter. The delimiter in this example is two or more spaces. More than 2 consecutive spaces are also replaced with 2 spaces to enable splitting on a consistent delimiter. This also shows how to replace a datetime string with Unix epoch.
delimiter := `\s\s`
delimiterString := "  "
// Note the order of the Replacements may be important. In this example a string that didn't include
// delimiters is replaced with one that does. The next replacement is to replace more than 2
// consecutive spaces with the delimiter, which is 2 consecutive spaces. If the order of the
// Replacements is reversed, there will be more than 2 spaces separating the poorly delimited text.
rplc := []*Replacement{
	{RegexString: "(class poor delimiting)", Replacement: delimiterString + "${1}" + delimiterString},
	{RegexString: `\s\s+`, Replacement: delimiterString},
	{RegexString: DATE_TIME_REGEX},
	{RegexString: `\.([0-9]+)\s+`, Replacement: delimiterString + "${1}" + delimiterString},
}
defaultInputs, _ := NewInputs("./test/testInputs.json")
defaultInputs.InputDelimiter = delimiter
defaultInputs.Replacements = rplc
scnr := openFileScanner(filepath.Join(testDataDirectory, "test_replace.txt"), *defaultInputs)
dataChan, errorChan := scnr.Read(100, 100)
fullData := []string{}
replacedData := []string{}
for row := range dataChan {
	fullData = append(fullData, row)
	row = scnr.Replace(row)
	replacedData = append(replacedData, row)
}
for err := range errorChan {
	fmt.Println(err)
}
fmt.Println("\nInput data:")
fmt.Printf("%+v", strings.Join(fullData, "\n"))
fmt.Println("\n\nReplaced data:")
fmt.Printf("%+v", strings.Join(replacedData, "\n"))
Output:

Input data:
2023-10-07 12:00:00.01 MDT 0 000 class poor delimiting debug embedded values sw_a Message with embedded hex flag=0x01 and integer flag = 003

Replaced data:
1696680000 01 MDT 0 000 class poor delimiting debug embedded values sw_a Message with embedded hex flag=0x01 and integer flag = 003
func (*Scanner) Shutdown ¶
func (scnr *Scanner) Shutdown()
Shutdown performs an orderly shutdown of the scanner and is automatically called when Read completes. Callers should call Shutdown if a scanner is created but not used.
func (*Scanner) Split ¶
Split uses the scnr.inputDelimiter to split the input data row. An error is returned if the resulting number of splits is not equal to Inputs.ExpectedFieldCount. But the data is returned and callers can choose to ignore the error if that is appropriate.
Example ¶
ExampleScanner_Split shows how to use the Split function. In this case the data is then Join'ed back together just for output purposes. Note that the call to Split drops the error that ExpectedFieldCount was incorrect; callers can choose to enforce the error, or not.
delimiter := `\s\s+`
defaultInputs, _ := NewInputs("./test/testInputs.json")
defaultInputs.InputDelimiter = delimiter
defaultInputs.ExpectedFieldCount = 8
scnr := openFileScanner(filepath.Join(testDataDirectory, "test_split.txt"), *defaultInputs)
dataChan, errorChan := scnr.Read(100, 100)
fullData := []string{}
splitData := []string{}
for row := range dataChan {
	fullData = append(fullData, row)
	splits, _ := scnr.Split(row)
	splitData = append(splitData, strings.Join(splits, "|"))
}
for err := range errorChan {
	fmt.Println(err)
}
fmt.Println("\nInput data:")
fmt.Printf("%+v", strings.Join(fullData, "\n"))
fmt.Println("\n\nSplit data:")
fmt.Printf("%+v", strings.Join(splitData, "\n"))
Output:

Input data:
2023-10-07 12:00:00 MDT 0 0 notification debug multi word type sw_a Debug SW message
2023-10-07 12:00:00 MDT 1 001 notification info SingleWordType sw_b Info SW message
2023-10-07 12:00:00.02 MDT 1 002 status info alphanumeric value sw_a Message with alphanumberic value abc123def
2023-10-07 12:00:00.03 MDT 1 003 status info alphanumeric value sw_a Message with extra delimiters

Split data:
2023-10-07 12:00:00 MDT|0|0|notification|debug|multi word type|sw_a|Debug SW message
2023-10-07 12:00:00 MDT|1|001|notification|info|SingleWordType|sw_b|Info SW message
2023-10-07 12:00:00.02 MDT|1|002|status|info|alphanumeric value|sw_a|Message with alphanumberic value abc123def
2023-10-07 12:00:00.03 MDT|1|003|status|info|alphanumeric value|sw_a|Message|with|extra|delimiters
func (*Scanner) SplitsExcludeHashColumns ¶ added in v1.0.0
func (scnr *Scanner) SplitsExcludeHashColumns(splits []string, hashFormat HashFormat) ([]string, error)
SplitsExcludeHashColumns creates a version of the Split data that doesn't include the hash columns. It also calculates the hash of the splits and adds the hash to hashMap and hashCount.
func (*Scanner) SplitsToSql ¶ added in v1.0.3
func (scnr *Scanner) SplitsToSql(numColumns int, table string, splits []string, extracts []string) string
SplitsToSql takes the splits from a call to Split and converts them into an SQL INSERT INTO statement. All values are output as text. numColumns of VALUES will be provided, NULL padded. The table should be created with nullable text columns to receive as many extracts as might be produced. If the length of splits exceeds numColumns, the VALUES will be truncated. splits are quoted according to Scanner.SqlQuoteColumns; all extracts are quoted.
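The padding, truncation, and quoting rules above can be sketched as follows. The function, its signature, and the quote-column representation are assumptions made for this illustration, not the package's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// splitsToSql emits exactly numColumns VALUES: splits (quoted when listed in
// quoteColumns), then extracts (always quoted), padded with NULL or
// truncated to fit.
func splitsToSql(numColumns int, table string, splits, extracts []string, quoteColumns map[int]bool) string {
	values := make([]string, 0, numColumns)
	for i, s := range splits {
		if quoteColumns[i] {
			s = "'" + s + "'"
		}
		values = append(values, s)
	}
	for _, e := range extracts {
		values = append(values, "'"+e+"'")
	}
	for len(values) < numColumns {
		values = append(values, "NULL") // pad short rows
	}
	values = values[:numColumns] // truncate long rows
	return "INSERT OR IGNORE INTO " + table + " VALUES(" + strings.Join(values, ",") + ");"
}

func main() {
	fmt.Println(splitsToSql(5, "parsed", []string{"ts", "0"}, []string{"x"}, map[int]bool{0: true}))
}
```

Creating the target table with numColumns nullable text columns makes every generated statement insertable regardless of how many extracts a given row produced.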