Documentation ¶
Overview ¶
Package bufio extends the standard library's bufio with support for \r as an end-of-line (EOL) marker. It wraps an io.Reader or io.Writer object, creating another object (Reader or Writer) that also implements the interface but provides buffering and some help for textual I/O.
The PDF specification (ISO 32000) defines the PDF end-of-line marker in section 4.20 as:
One or two character sequence marking the end of a line of text, consisting of a CARRIAGE RETURN character (0Dh) or a LINE FEED character (0Ah) or a CARRIAGE RETURN followed immediately by a LINE FEED.
The standard library's bufio provides two general ways to read a line:
- ReadString(delim byte), which eventually calls ReadSlice(delim byte).
- ReadLine(), which supports \n (0x0A) and \r\n (0x0D 0x0A) out of the box.
Besides \n and \r\n, PDF also accepts a bare \r as an EOL marker. In addition, real-world PDF files commonly mix EOL markers arbitrarily, even within a single file, in violation of the spec. bufio is therefore extended to handle PDF end-of-line markers in a flexible way.
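To illustrate the gap described above, here is a minimal sketch that uses only the standard library's bufio (not this package). The helper name stdlibLines is hypothetical. It shows that the standard ReadLine does not treat a bare \r as a line end.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// stdlibLines is a hypothetical helper that reads s line by line with the
// standard library's ReadLine. The isPrefix result is ignored because the
// inputs used here are far shorter than the default buffer.
func stdlibLines(s string) []string {
	r := bufio.NewReader(strings.NewReader(s))
	var lines []string
	for {
		line, _, err := r.ReadLine()
		if err != nil {
			return lines // io.EOF ends the loop
		}
		lines = append(lines, string(line))
	}
}

func main() {
	// The bare \r between "one" and "two" is not recognized as a line end,
	// so both words come back as a single line.
	fmt.Printf("%q\n", stdlibLines("one\rtwo\r\nthree\n")) // prints ["one\rtwo" "three"]
}
```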
Although ReadLine() may look like the obvious candidate for this extension, the decision was to enhance ReadSlice(delim byte) instead by accepting an "unknownDelimiter" (=0) as the delimiter argument.
Extending ReadSlice(delim byte) turned out to be the better alternative because it is called once per line, whereas ReadLine() has to be called in a loop and may return a line in chunks.
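The chunked behaviour of ReadLine can be seen with the standard library's bufio alone. The sketch below is illustrative only: readLongLine is a hypothetical helper, and the 16-byte buffer is chosen just to force a line that does not fit.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// readLongLine is a hypothetical helper. It reads the first line of s through
// a Reader with the given buffer size, gluing together the fragments that
// ReadLine returns while isPrefix is true. It reports the assembled line and
// the number of ReadLine calls that were needed.
func readLongLine(s string, bufSize int) (string, int) {
	r := bufio.NewReaderSize(strings.NewReader(s), bufSize)
	var line []byte
	calls := 0
	for {
		chunk, isPrefix, err := r.ReadLine()
		if err != nil {
			return string(line), calls
		}
		calls++
		line = append(line, chunk...) // copy: chunk is reused by the next call
		if !isPrefix {
			return string(line), calls
		}
	}
}

func main() {
	input := strings.Repeat("x", 40) + "\n"
	line, calls := readLongLine(input, 16)
	fmt.Println(len(line), calls)
}
```

With a 40-byte line and a 16-byte buffer the caller has to loop and copy; a single call per line avoids that bookkeeping as long as the line fits into the buffer.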
A further benefit of extending ReadSlice(delim byte) is that ReadString(delim byte) keeps working, because it passes its delimiter argument down to ReadSlice(delim byte).
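The line-splitting rule described above (any of \n, \r\n, or \r ends a line) can be sketched with the standard library. This is not this package's implementation: it is a stand-in built on bufio.Scanner, and the names scanPDFLines and splitPDFLines are hypothetical. Unlike ReadSlice, which returns the line including its delimiter, the sketch strips the EOL marker.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// scanPDFLines is a hypothetical bufio.SplitFunc that ends a token at the
// first \n, \r\n, or bare \r. The EOL marker is not part of the token.
func scanPDFLines(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	i := bytes.IndexAny(data, "\r\n")
	if i < 0 {
		if atEOF {
			return len(data), data, nil // final line without an EOL marker
		}
		return 0, nil, nil // request more data
	}
	if data[i] == '\n' {
		return i + 1, data[:i], nil
	}
	// data[i] is '\r': check whether a '\n' follows immediately.
	if i+1 < len(data) {
		if data[i+1] == '\n' {
			return i + 2, data[:i], nil
		}
		return i + 1, data[:i], nil
	}
	// '\r' is the last byte seen so far. Unless the input has ended, wait
	// for more data so that a following '\n' is not mistaken for a new line.
	if atEOF {
		return i + 1, data[:i], nil
	}
	return 0, nil, nil
}

// splitPDFLines is a hypothetical helper that applies scanPDFLines to s.
func splitPDFLines(s string) []string {
	sc := bufio.NewScanner(strings.NewReader(s))
	sc.Split(scanPDFLines)
	var lines []string
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	return lines
}

func main() {
	fmt.Printf("%q\n", splitPDFLines("a\rb\r\nc\nd")) // prints ["a" "b" "c" "d"]
}
```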
Based on go1.9.
Index ¶
- Variables
- type ReadWriter
- type Reader
- func (b *Reader) Buffered() int
- func (b *Reader) Discard(n int) (discarded int, err error)
- func (b *Reader) Peek(n int) ([]byte, error)
- func (b *Reader) Read(p []byte) (n int, err error)
- func (b *Reader) ReadByte() (byte, error)
- func (b *Reader) ReadBytes(delim byte) ([]byte, error)
- func (b *Reader) ReadLine() (line []byte, isPrefix bool, err error)
- func (b *Reader) ReadRune() (r rune, size int, err error)
- func (b *Reader) ReadSlice(delim byte) (line []byte, err error)
- func (b *Reader) ReadString(delim byte) (string, error)
- func (b *Reader) Reset(r io.Reader)
- func (b *Reader) UnreadByte() error
- func (b *Reader) UnreadRune() error
- func (b *Reader) WriteTo(w io.Writer) (n int64, err error)
- type Writer
- func (b *Writer) Available() int
- func (b *Writer) Buffered() int
- func (b *Writer) Flush() error
- func (b *Writer) ReadFrom(r io.Reader) (n int64, err error)
- func (b *Writer) Reset(w io.Writer)
- func (b *Writer) Write(p []byte) (nn int, err error)
- func (b *Writer) WriteByte(c byte) error
- func (b *Writer) WriteRune(r rune) (size int, err error)
- func (b *Writer) WriteString(s string) (int, error)
Constants ¶
This section is empty.
Variables ¶
Functions ¶
This section is empty.
Types ¶
type ReadWriter ¶
ReadWriter stores pointers to a Reader and a Writer. It implements io.ReadWriter.
func NewReadWriter ¶
func NewReadWriter(r *Reader, w *Writer) *ReadWriter
NewReadWriter allocates a new ReadWriter that dispatches to r and w.
type Reader ¶
type Reader struct {
// contains filtered or unexported fields
}
Reader implements buffering for an io.Reader object.
func NewReaderSize ¶
NewReaderSize returns a new Reader whose buffer has at least the specified size. If the argument io.Reader is already a Reader with large enough size, it returns the underlying Reader.
func (*Reader) Buffered ¶
Buffered returns the number of bytes that can be read from the current buffer.
func (*Reader) Discard ¶
Discard skips the next n bytes, returning the number of bytes discarded.
If Discard skips fewer than n bytes, it also returns an error. If 0 <= n <= b.Buffered(), Discard is guaranteed to succeed without reading from the underlying io.Reader.
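A minimal sketch of Discard follows. It uses the standard library's bufio, on the assumption that this package's Reader behaves the same way for this method; discardThenRead is a hypothetical helper.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// discardThenRead is a hypothetical helper. It skips n bytes of s and then
// reads up to and including the next '\n'.
func discardThenRead(s string, n int) (int, string, error) {
	r := bufio.NewReader(strings.NewReader(s))
	discarded, err := r.Discard(n)
	if err != nil {
		// Fewer than n bytes were available.
		return discarded, "", err
	}
	rest, _ := r.ReadString('\n') // error ignored: a short final line is fine here
	return discarded, rest, nil
}

func main() {
	n, rest, err := discardThenRead("%PDF-1.7\n", 5)
	fmt.Printf("%d %q %v\n", n, rest, err)

	// Asking for more bytes than the input holds yields a short count and an error.
	n, rest, err = discardThenRead("%PDF-1.7\n", 100)
	fmt.Printf("%d %q %v\n", n, rest, err)
}
```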
func (*Reader) Peek ¶
Peek returns the next n bytes without advancing the reader. The bytes stop being valid at the next read call. If Peek returns fewer than n bytes, it also returns an error explaining why the read is short. The error is ErrBufferFull if n is larger than b's buffer size.
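As a sketch of the behaviour described above, the following uses the standard library's bufio (assumed here to match this package's Peek). The helper peekThenRead is hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// peekThenRead is a hypothetical helper. It peeks at the first n bytes of s
// and then reads the first line, showing that Peek does not consume input.
func peekThenRead(s string, n int) (string, string, error) {
	r := bufio.NewReader(strings.NewReader(s))
	p, err := r.Peek(n)
	peeked := string(p) // copy now: p is only valid until the next read call
	line, _ := r.ReadString('\n')
	return peeked, line, err
}

func main() {
	peeked, line, err := peekThenRead("%PDF-1.7\n", 4)
	fmt.Printf("%q %q %v\n", peeked, line, err)

	// Peeking past the end of the input returns what is there plus an error.
	peeked, line, err = peekThenRead("ab", 4)
	fmt.Printf("%q %q %v\n", peeked, line, err)
}
```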
func (*Reader) Read ¶
Read reads data into p. It returns the number of bytes read into p. The bytes are taken from at most one Read on the underlying Reader, hence n may be less than len(p). At EOF, the count will be zero and err will be io.EOF.
func (*Reader) ReadByte ¶
ReadByte reads and returns a single byte. If no byte is available, it returns an error.
func (*Reader) ReadBytes ¶
ReadBytes reads until the first occurrence of delim in the input, returning a slice containing the data up to and including the delimiter. If ReadBytes encounters an error before finding a delimiter, it returns the data read before the error and the error itself (often io.EOF). ReadBytes returns err != nil if and only if the returned data does not end in delim. For simple uses, a Scanner may be more convenient.
func (*Reader) ReadLine ¶
ReadLine is a low-level line-reading primitive. Most callers should use ReadBytes('\n') or ReadString('\n') instead or use a Scanner.
ReadLine tries to return a single line, not including the end-of-line bytes. If the line was too long for the buffer then isPrefix is set and the beginning of the line is returned. The rest of the line will be returned from future calls. isPrefix will be false when returning the last fragment of the line. The returned buffer is only valid until the next call to ReadLine. ReadLine either returns a non-nil line or it returns an error, never both.
The text returned from ReadLine does not include the line end ("\r\n" or "\n"). No indication or error is given if the input ends without a final line end. Calling UnreadByte after ReadLine will always unread the last byte read (possibly a character belonging to the line end) even if that byte is not part of the line returned by ReadLine.
func (*Reader) ReadRune ¶
ReadRune reads a single UTF-8 encoded Unicode character and returns the rune and its size in bytes. If the encoded rune is invalid, it consumes one byte and returns unicode.ReplacementChar (U+FFFD) with a size of 1.
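The handling of invalid input can be sketched as follows, again with the standard library's bufio standing in for this package. The helper decodeRunes is hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
	"unicode"
)

// decodeRunes is a hypothetical helper. It reads s rune by rune and records
// each rune together with the number of bytes it occupied.
func decodeRunes(s string) ([]rune, []int) {
	r := bufio.NewReader(strings.NewReader(s))
	var runes []rune
	var sizes []int
	for {
		c, size, err := r.ReadRune()
		if err != nil {
			return runes, sizes
		}
		runes = append(runes, c)
		sizes = append(sizes, size)
	}
}

func main() {
	// U+00E9 takes two bytes; \xff is not valid UTF-8 on its own.
	runes, sizes := decodeRunes("\u00e9\xffz")
	fmt.Println(sizes)
	fmt.Println(runes[1] == unicode.ReplacementChar)
}
```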
func (*Reader) ReadSlice ¶
ReadSlice reads until the first occurrence of delim in the input, returning a slice pointing at the bytes in the buffer. For delim 0 it reads until the first occurrence of either \n, \r\n, or \r. The bytes stop being valid at the next read. If ReadSlice encounters an error before finding a delimiter, it returns all the data in the buffer and the error itself (often io.EOF). ReadSlice fails with error ErrBufferFull if the buffer fills without a delim. Because the data returned from ReadSlice will be overwritten by the next I/O operation, most clients should use ReadBytes or ReadString instead. ReadSlice returns err != nil if and only if line does not end in delim.
func (*Reader) ReadString ¶
ReadString reads until the first occurrence of delim in the input, returning a string containing the data up to and including the delimiter. If ReadString encounters an error before finding a delimiter, it returns the data read before the error and the error itself (often io.EOF). ReadString returns err != nil if and only if the returned data does not end in delim. For simple uses, a Scanner may be more convenient.
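The rule that err is non-nil exactly when the data does not end in delim is easiest to see on input without a trailing delimiter. This sketch uses the standard library's bufio with an explicit '\n' delimiter; readDelimited is a hypothetical helper, and the delim 0 extension of this package is not exercised.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// readDelimited is a hypothetical helper. It calls ReadString repeatedly and
// collects every non-empty result, returning the error that ended the loop.
func readDelimited(s string, delim byte) ([]string, error) {
	r := bufio.NewReader(strings.NewReader(s))
	var parts []string
	for {
		part, err := r.ReadString(delim)
		if part != "" {
			parts = append(parts, part)
		}
		if err != nil {
			return parts, err
		}
	}
}

func main() {
	// The last part has no trailing '\n', so it arrives together with io.EOF.
	parts, err := readDelimited("a\nb", '\n')
	fmt.Printf("%q %v\n", parts, err)
}
```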
func (*Reader) Reset ¶
Reset discards any buffered data, resets all state, and switches the buffered reader to read from r.
func (*Reader) UnreadByte ¶
UnreadByte unreads the last byte. Only the most recently read byte can be unread.
func (*Reader) UnreadRune ¶
UnreadRune unreads the last rune. If the most recent read operation on the buffer was not a ReadRune, UnreadRune returns an error. (In this regard it is stricter than UnreadByte, which will unread the last byte from any read operation.)
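The difference in strictness between UnreadByte and UnreadRune can be sketched like this, using the standard library's bufio as a stand-in. The helper unreadDemo is hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// unreadDemo is a hypothetical helper. It calls UnreadRune once after a
// ReadByte and once after a ReadRune, and returns both results together
// with the rune read before and after the successful unread.
func unreadDemo(s string) (afterByte, afterRune error, first, second rune) {
	r := bufio.NewReader(strings.NewReader(s))

	if _, err := r.ReadByte(); err != nil {
		return err, err, 0, 0
	}
	afterByte = r.UnreadRune() // the last operation was not ReadRune: an error

	first, _, _ = r.ReadRune()
	afterRune = r.UnreadRune() // the last operation was ReadRune: succeeds
	second, _, _ = r.ReadRune() // reads the same rune again

	return afterByte, afterRune, first, second
}

func main() {
	afterByte, afterRune, first, second := unreadDemo("abc")
	fmt.Println(afterByte != nil, afterRune == nil, string(first), string(second))
}
```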
type Writer ¶
type Writer struct {
// contains filtered or unexported fields
}
Writer implements buffering for an io.Writer object. If an error occurs writing to a Writer, no more data will be accepted and all subsequent writes, and Flush, will return the error. After all data has been written, the client should call the Flush method to guarantee all data has been forwarded to the underlying io.Writer.
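Why the final Flush matters can be sketched as follows, using the standard library's bufio as a stand-in for this package's Writer. The helper bufferedWrite is hypothetical.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// bufferedWrite is a hypothetical helper. It writes s through a buffered
// Writer and reports what the underlying sink holds before and after Flush,
// plus the number of bytes that were pending in the buffer.
func bufferedWrite(s string) (before, after string, pending int, err error) {
	var sink bytes.Buffer
	w := bufio.NewWriter(&sink)

	if _, err = w.WriteString(s); err != nil {
		return "", "", 0, err
	}
	before = sink.String() // nothing has reached the sink yet for small writes
	pending = w.Buffered()

	if err = w.Flush(); err != nil {
		return before, "", pending, err
	}
	after = sink.String()
	return before, after, pending, nil
}

func main() {
	before, after, pending, err := bufferedWrite("hello")
	fmt.Printf("%q %q %d %v\n", before, after, pending, err)
}
```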
func NewWriterSize ¶
NewWriterSize returns a new Writer whose buffer has at least the specified size. If the argument io.Writer is already a Writer with large enough size, it returns the underlying Writer.
func (*Writer) Buffered ¶
Buffered returns the number of bytes that have been written into the current buffer.
func (*Writer) Reset ¶
Reset discards any unflushed buffered data, clears any error, and resets b to write its output to w.
func (*Writer) Write ¶
Write writes the contents of p into the buffer. It returns the number of bytes written. If nn < len(p), it also returns an error explaining why the write is short.