Documentation ¶
Overview ¶
Package bigcsvreader offers a multi-threaded approach to reading a large CSV file, with the aim of reducing the time spent reading and processing it. It spawns multiple goroutines, each reading a piece of the file. Rows that have been read are put into channels, one channel per spawned goroutine, so that the processing of those rows can be parallelized as well.
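The chunked reading described above can be illustrated with the standard library alone. The following is a minimal sketch, not the package's actual implementation: the helper names splitOnLines, readChunks and countRows are hypothetical, the data is held in memory rather than read from a file, and rows are assumed to contain no newlines inside quoted fields.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"sync"
)

// splitOnLines cuts data into at most n chunks, moving each cut forward to
// just past the next newline so that no row is split between two chunks.
// Assumption: no row contains a newline inside a quoted field.
func splitOnLines(data []byte, n int) [][]byte {
	var chunks [][]byte
	start := 0
	for i := 1; i <= n && start < len(data); i++ {
		end := len(data) * i / n
		if end < start {
			end = start
		}
		if i == n {
			end = len(data)
		} else if nl := bytes.IndexByte(data[end:], '\n'); nl >= 0 {
			end += nl + 1
		} else {
			end = len(data)
		}
		if end > start {
			chunks = append(chunks, data[start:end])
		}
		start = end
	}
	return chunks
}

// readChunks parses every chunk in its own goroutine and returns one channel
// of rows per chunk; each channel is closed when its chunk is done.
func readChunks(chunks [][]byte) []<-chan []string {
	out := make([]<-chan []string, 0, len(chunks))
	for _, chunk := range chunks {
		ch := make(chan []string)
		out = append(out, ch)
		go func(chunk []byte, ch chan<- []string) {
			defer close(ch)
			rows, err := csv.NewReader(bytes.NewReader(chunk)).ReadAll()
			if err != nil {
				return // a real reader would report this on an error channel
			}
			for _, row := range rows {
				ch <- row
			}
		}(chunk, ch)
	}
	return out
}

// countRows drains all channels concurrently and returns the total row count.
func countRows(chans []<-chan []string) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int
	)
	for _, ch := range chans {
		wg.Add(1)
		go func(ch <-chan []string) {
			defer wg.Done()
			for range ch {
				mu.Lock()
				total++
				mu.Unlock()
			}
		}(ch)
	}
	wg.Wait()
	return total
}

func main() {
	data := []byte("1,apple\n2,pear\n3,plum\n4,fig\n5,kiwi\n")
	chunks := splitOnLines(data, 3)
	fmt.Println(len(chunks), countRows(readChunks(chunks)))
}
```

Aligning each cut on a line boundary is what lets every goroutine parse its piece independently; the order in which rows arrive across channels is not defined.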
Constants ¶
This section is empty.
Variables ¶
var ErrEmptyFile = errors.New("empty csv file")
ErrEmptyFile is the error returned if the CSV file is empty.
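A sentinel error such as this one is usually detected with errors.Is, which also matches when the error has been wrapped. The sketch below is self-contained and therefore uses a local stand-in, errEmptyFile, in place of bigcsvreader.ErrEmptyFile; the helper isEmptyFile is hypothetical, and whether the package delivers the error wrapped or as-is is not stated in this documentation.

```go
package main

import (
	"errors"
	"fmt"
)

// errEmptyFile is a local stand-in for bigcsvreader.ErrEmptyFile so that the
// sketch compiles without the package.
var errEmptyFile = errors.New("empty csv file")

// isEmptyFile reports whether err is, or wraps, the empty-file sentinel.
func isEmptyFile(err error) bool {
	return errors.Is(err, errEmptyFile)
}

func main() {
	wrapped := fmt.Errorf("reading products: %w", errEmptyFile)

	fmt.Println(isEmptyFile(errEmptyFile)) // the sentinel itself
	fmt.Println(isEmptyFile(wrapped))      // the sentinel wrapped with %w
	// A different error value with the same message does not match.
	fmt.Println(isEmptyFile(errors.New("empty csv file")))
}
```

In real code the comparison would be made against bigcsvreader.ErrEmptyFile directly.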
Functions ¶
This section is empty.
Types ¶
type CsvReader ¶
type CsvReader struct {
	// MaxGoroutinesNo is the maximum goroutines to start parsing the CSV file.
	// Minimum required bytes to start a new goroutine is 2048 bytes.
	// Defaults to [runtime.NumCPU].
	MaxGoroutinesNo int
	// FileHasHeader is a flag indicating if file's first row is the header (columns names).
	// If so, the header line is disregarded and not returned as a row.
	// Defaults to false.
	FileHasHeader bool
	// ColumnsCount is the number of columns the CSV file has.
	ColumnsCount int
	// ColumnsDelimiter is the delimiter char between columns. Defaults to comma.
	ColumnsDelimiter rune
	// BufferSize is used internally for [bufio.Reader] size. Has a default value of 4096.
	// If you have lines bigger than this value, adjust it not to get "buffer full" error.
	BufferSize int
	// Logger can be set to perform some debugging/error logging.
	// Defaults to a no-operation logger (no log is performed).
	// You can enable logging by passing a logger that implements [internal.Logger] contract.
	Logger internal.Logger
	// LazyQuotes is a flag used to allow quotes in an unquoted field and non-doubled quotes
	// in a quoted field
	LazyQuotes bool
	// contains filtered or unexported fields
}
CsvReader reads rows from a CSV file asynchronously. It does so by starting multiple goroutines, each of them handling a chunk of data from the file.
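The BufferSize comment in the struct mentions a "buffer full" error for lines longer than the buffer. The standard library behaviour behind that message can be reproduced in isolation. This is a sketch under the assumption that lines are read with something like bufio.Reader.ReadSlice; the helper readLine is hypothetical and the package's internals may differ.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"strings"
)

// readLine reads one line through a bufio.Reader of the given size.
func readLine(input string, bufferSize int) (string, error) {
	r := bufio.NewReaderSize(strings.NewReader(input), bufferSize)
	line, err := r.ReadSlice('\n')
	return string(line), err
}

func main() {
	row := "1,a product name that is longer than the buffer\n" // 48 bytes

	// The line does not fit into a 16-byte buffer, so ReadSlice gives up.
	_, err := readLine(row, 16)
	fmt.Println(errors.Is(err, bufio.ErrBufferFull))

	// With a buffer larger than the line, the whole row is returned.
	line, err := readLine(row, 64)
	fmt.Printf("%q %v\n", line, err)
}
```

By the same reasoning, a file whose longest row exceeds the 4096-byte default would need a larger BufferSize.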
Example ¶
package main

import (
	"context"
	"fmt"
	"strconv"
	"sync"

	"github.com/failingslapst/bigcsvreader"
)

const (
	columnProductID = iota
	columnProductName
	columnProductDescription
	columnProductPrice
	columnProductQty
)

const noOfColumns = 5

type Product struct {
	ID    int
	Name  string
	Desc  string
	Price float64
	Qty   int
}

func main() {
	// initialize the big csv reader
	bigCSV := bigcsvreader.New()
	bigCSV.SetFilePath("testdata/example_products.csv")
	bigCSV.ColumnsCount = noOfColumns
	bigCSV.MaxGoroutinesNo = 16

	ctx, cancelCtx := context.WithCancel(context.Background())
	defer cancelCtx()
	var wg sync.WaitGroup

	// start multi-thread reading
	rowsChans, errsChan := bigCSV.Read(ctx)

	// process rows and errors:
	for i := 0; i < len(rowsChans); i++ {
		wg.Add(1)
		go rowWorker(rowsChans[i], &wg)
	}
	wg.Add(1)
	go errWorker(errsChan, &wg)
	wg.Wait()
}

func rowWorker(rowsChan bigcsvreader.RowsChan, waitGr *sync.WaitGroup) {
	for row := range rowsChan {
		processRow(row)
	}
	waitGr.Done()
}

func errWorker(errsChan bigcsvreader.ErrsChan, waitGr *sync.WaitGroup) {
	for err := range errsChan {
		handleError(err)
	}
	waitGr.Done()
}

// processRow can be used to implement business logic
// like validation / converting to a struct / persisting row into a storage.
func processRow(row []string) {
	id, _ := strconv.Atoi(row[columnProductID])
	price, _ := strconv.ParseFloat(row[columnProductPrice], 64)
	qty, _ := strconv.Atoi(row[columnProductQty])
	name := row[columnProductName]
	desc := row[columnProductDescription]
	product := Product{
		ID:    id,
		Name:  name,
		Desc:  desc,
		Price: price,
		Qty:   qty,
	}
	fmt.Printf("%+v\n", product)
}

// handleError handles the error.
// errors can be fatal like file does not exist, or row related like a given row could not be parsed, etc...
func handleError(err error) {
	fmt.Println(err)
}
Output:
{ID:1 Name:Apple iPhone 13 Desc:Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc eleifend felis quis magna auctor, ut lacinia eros efficitur. Maecenas mattis dolor a pharetra gravida. Aenean at eros sed metus posuere feugiat in vitae libero. Morbi a diam volutpat, tempor lacus sed, sagittis velit. Donec eget dignissim mauris, sed aliquam ex. Duis eros dolor, vestibulum ac aliquam eget, viverra in enim. Aenean ut turpis quis purus porta lobortis. Etiam sollicitudin lectus vitae velit tincidunt, ut volutpat justo aliquam. Aenean vitae vehicula arcu. Interdum et malesuada fames ac ante ipsum primis in faucibus. Nunc viverra enim nec risus mollis elementum nec dictum ex. Nunc lorem eros, vulputate a rutrum nec, scelerisque non augue. Sed in egestas eros. Quisque felis lorem, vehicula ac venenatis vel, tristique id sapien. Morbi vitae odio eget orci facilisis suscipit. Cras sodales, augue vitae tincidunt tempus, diam turpis volutpat est, vitae fringilla augue leo semper augue. Integer scelerisque tempor mauris, ac posuere sem aenean Price:1025.99 Qty:100}
{ID:2 Name:Samsung Galaxy S22 Desc:Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc eleifend felis quis magna auctor, ut lacinia eros efficitur. Maecenas mattis dolor a pharetra gravida. Aenean at eros sed metus posuere feugiat in vitae libero. Morbi a diam volutpat, tempor lacus sed, sagittis velit. Donec eget dignissim mauris, sed aliquam ex. Duis eros dolor, vestibulum ac aliquam eget, viverra in enim. Aenean ut turpis quis purus porta lobortis. Etiam sollicitudin lectus vitae velit tincidunt, ut volutpat justo aliquam. Aenean vitae vehicula arcu. Interdum et malesuada fames ac ante ipsum primis in faucibus. Nunc viverra enim nec risus mollis elementum nec dictum ex. Nunc lorem eros, vulputate a rutrum nec, scelerisque non augue. Sed in egestas eros. Quisque felis lorem, vehicula ac venenatis vel, tristique id sapien. Morbi vitae odio eget orci facilisis suscipit. Cras sodales, augue vitae tincidunt tempus, diam turpis volutpat est, vitae fringilla augue leo semper augue. Integer scelerisque tempor mauris, ac posuere sem aenean Price:400.99 Qty:12}
{ID:3 Name:Apple MacBook Air Desc:Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc eleifend felis quis magna auctor, ut lacinia eros efficitur. Maecenas mattis dolor a pharetra gravida. Aenean at eros sed metus posuere feugiat in vitae libero. Morbi a diam volutpat, tempor lacus sed, sagittis velit. Donec eget dignissim mauris, sed aliquam ex. Duis eros dolor, vestibulum ac aliquam eget, viverra in enim. Aenean ut turpis quis purus porta lobortis. Etiam sollicitudin lectus vitae velit tincidunt, ut volutpat justo aliquam. Aenean vitae vehicula arcu. Interdum et malesuada fames ac ante ipsum primis in faucibus. Nunc viverra enim nec risus mollis elementum nec dictum ex. Nunc lorem eros, vulputate a rutrum nec, scelerisque non augue. Sed in egestas eros. Quisque felis lorem, vehicula ac venenatis vel, tristique id sapien. Morbi vitae odio eget orci facilisis suscipit. Cras sodales, augue vitae tincidunt tempus, diam turpis volutpat est, vitae fringilla augue leo semper augue. Integer scelerisque tempor mauris, ac posuere sem aenean Price:700.99 Qty:34}
{ID:4 Name:Lenovo ThinkPad X1 Desc:Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc eleifend felis quis magna auctor, ut lacinia eros efficitur. Maecenas mattis dolor a pharetra gravida. Aenean at eros sed metus posuere feugiat in vitae libero. Morbi a diam volutpat, tempor lacus sed, sagittis velit. Donec eget dignissim mauris, sed aliquam ex. Duis eros dolor, vestibulum ac aliquam eget, viverra in enim. Aenean ut turpis quis purus porta lobortis. Etiam sollicitudin lectus vitae velit tincidunt, ut volutpat justo aliquam. Aenean vitae vehicula arcu. Interdum et malesuada fames ac ante ipsum primis in faucibus. Nunc viverra enim nec risus mollis elementum nec dictum ex. Nunc lorem eros, vulputate a rutrum nec, scelerisque non augue. Sed in egestas eros. Quisque felis lorem, vehicula ac venenatis vel, tristique id sapien. Morbi vitae odio eget orci facilisis suscipit. Cras sodales, augue vitae tincidunt tempus, diam turpis volutpat est, vitae fringilla augue leo semper augue. Integer scelerisque tempor mauris, ac posuere sem aenean Price:550.99 Qty:90}
{ID:5 Name:Logitech Mouse G203 Desc:Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nunc eleifend felis quis magna auctor, ut lacinia eros efficitur. Maecenas mattis dolor a pharetra gravida. Aenean at eros sed metus posuere feugiat in vitae libero. Morbi a diam volutpat, tempor lacus sed, sagittis velit. Donec eget dignissim mauris, sed aliquam ex. Duis eros dolor, vestibulum ac aliquam eget, viverra in enim. Aenean ut turpis quis purus porta lobortis. Etiam sollicitudin lectus vitae velit tincidunt, ut volutpat justo aliquam. Aenean vitae vehicula arcu. Interdum et malesuada fames ac ante ipsum primis in faucibus. Nunc viverra enim nec risus mollis elementum nec dictum ex. Nunc lorem eros, vulputate a rutrum nec, scelerisque non augue. Sed in egestas eros. Quisque felis lorem, vehicula ac venenatis vel, tristique id sapien. Morbi vitae odio eget orci facilisis suscipit. Cras sodales, augue vitae tincidunt tempus, diam turpis volutpat est, vitae fringilla augue leo semper augue. Integer scelerisque tempor mauris, ac posuere sem aenean Price:30.5 Qty:35}
func New ¶
func New() *CsvReader
New instantiates a new CsvReader object with default values preset for some of its fields.
func (*CsvReader) Read ¶
Read extracts CSV rows asynchronously, with each started goroutine putting its rows into a RowsChan. Errors that occur during parsing are sent through ErrsChan.
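When a single consumer is preferred over one worker per channel, the row channels can be merged into one. The following fan-in sketch is self-contained: rowsChan is a hypothetical stand-in for bigcsvreader.RowsChan, assumed here to be a receive-only channel of []string, and mergeRows and feed are illustrative helpers that are not part of the package.

```go
package main

import (
	"fmt"
	"sync"
)

// rowsChan is a hypothetical stand-in for bigcsvreader.RowsChan, assumed to
// be a receive-only channel of rows.
type rowsChan <-chan []string

// mergeRows forwards rows from every input channel to a single output
// channel, which is closed once all inputs are drained.
func mergeRows(inputs []rowsChan) <-chan []string {
	out := make(chan []string)
	var wg sync.WaitGroup
	for _, in := range inputs {
		wg.Add(1)
		go func(in rowsChan) {
			defer wg.Done()
			for row := range in {
				out <- row
			}
		}(in)
	}
	// Close the output only after every forwarding goroutine has finished.
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// feed returns an already closed channel holding the given rows; it plays
// the role of one reading goroutine in this sketch.
func feed(rows ...[]string) rowsChan {
	ch := make(chan []string, len(rows))
	for _, r := range rows {
		ch <- r
	}
	close(ch)
	return ch
}

func main() {
	merged := mergeRows([]rowsChan{
		feed([]string{"1", "apple"}, []string{"2", "pear"}),
		feed([]string{"3", "plum"}),
	})
	n := 0
	for range merged {
		n++
	}
	fmt.Println(n)
}
```

Merging trades parallel row processing for simpler consumer code, so it suits cases where reading, not processing, is the bottleneck.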
func (*CsvReader) SetFilePath ¶
SetFilePath sets the CSV file path.