Pipesore: A command-line text processor that nobody asked for
The name combines "pipe", because it works like Unix pipes, and "sore", because
the initial hackathon version of this project was an eyesore.
It was born from a proof of concept of using bitfield/script directly in the CLI.
Pipesore provides a number of text filters that you can pipe together to
process text. It takes input from stdin and writes the pipeline output to
stdout, so it can be used alongside Unix pipes.
Motivation
Pipesore isn't intended to replace any of the well-established CLI text
processing tools (see https://tldp.org/LDP/abs/html/textproc.html for a good
list and examples). These tools do a single job well and have many
powerful features to accomplish anything you might want to do.
On the other hand, there can be a bit of a learning curve to remember their
names, flags, and usage, even for basic tasks.
Pipesore is intended to be a single command that covers the most useful use
cases of these tools while being intuitive even to someone who has never seen
pipesore before.
Installation
Download a stable release or install the latest development version with:
$ go install github.com/dyson/pipesore/cmd/pipesore@main
Optionally alias pipesore to something quicker to type, e.g.:
$ echo 'alias sp="pipesore"' >> ~/.bash_profile
Basic Usage
A contrived example:
$ echo "cat cat cat dog bird bird bird bird" | pipesore 'Replace(" ", "\n") | Frequency() | First(1)'
4 bird
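For comparison, the same result can be sketched with the classic tools mentioned in the motivation. This is only an approximate equivalent written for illustration: the function name most_frequent_word is hypothetical, and it assumes a POSIX shell with tr, sort, uniq, head, and sed available.

```shell
# Hypothetical helper: approximates
#   Replace(" ", "\n") | Frequency() | First(1)
# using classic Unix tools instead of pipesore.
most_frequent_word() {
  tr ' ' '\n' |               # one word per line, like Replace(" ", "\n")
    sort |                    # group identical lines so uniq can count them
    uniq -c |                 # prefix each unique line with its count
    sort -k1,1nr -k2 |        # highest count first, ties in alphabetical order
    head -n 1 |               # keep only the first line, like First(1)
    sed 's/^[[:space:]]*//'   # strip the padding uniq adds before the count
}

echo "cat cat cat dog bird bird bird bird" | most_frequent_word
```

With the input above this should also print 4 bird, but it takes six commands and several flags to get there, which is the learning curve pipesore tries to avoid.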
Filters
All filters can be piped together with '|' in any order, although not every ordering makes sense.
All filter arguments are required. There are no assumptions about default values.
A filter prefixed with "!" returns the opposite result of the non-prefixed
filter of the same name. For example, First(1) returns only the first line of
the input, and !First(1) (read as "not first") skips the first line of the
input and returns all other lines.
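Readers who already know head and tail may find it helpful to see the First / !First pair expressed with those tools. The sketch below is an analogy only, not how pipesore is implemented; the helper names first_n and not_first_n are hypothetical.

```shell
# Hypothetical helpers that mimic First(n) and !First(n) with head and tail.
first_n() {
  head -n "$1"               # keep the first n lines
}

not_first_n() {
  tail -n "+$(( $1 + 1 ))"   # start output at line n+1, skipping the first n
}

printf 'a\nb\nc\n' | first_n 1       # a
printf 'a\nb\nc\n' | not_first_n 1   # b and c, each on its own line
```

The two helpers split the input into complementary parts: every line is returned by exactly one of them for a given n.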
| Filter | Description |
| --- | --- |
| Columns(delimiter string, columns string) | Returns the selected columns in order where columns is a 1-indexed comma separated list of column positions. Columns are defined by splitting with the 'delimiter'. |
| ColumnsCSV(delimiter string, columns string) | Returns the selected columns in order where columns is a 1-indexed comma separated list of column positions. Parsing is CSV-aware, so quoted columns that contain the delimiter are preserved when splitting. |
| CountLines() | Returns the line count. Lines are delimited by \r?\n. |
| CountRunes() | Returns the rune (Unicode code points) count. Erroneous and short encodings are treated as single runes of width 1 byte. |
| CountWords() | Returns the word count. Words are delimited by \t\|\n\|\v\|\f\|\r\| \|0x85\|0xA0. |
| First(n int) | Returns the first n lines where n is a positive integer. If the input has fewer than n lines, all lines are returned. |
| !First(n int) | Returns all but the first n lines where n is a positive integer. If the input has fewer than n lines, no lines are returned. |
| Frequency() | Returns a list of unique lines, each prefixed with its frequency, in descending order of frequency. Lines with equal frequency are sorted alphabetically. |
| Join(delimiter string) | Joins all lines together separated by delimiter. |
| Last(n int) | Returns the last n lines where n is a positive integer. If the input has fewer than n lines, all lines are returned. |
| !Last(n int) | Returns all but the last n lines where n is a positive integer. If the input has fewer than n lines, no lines are returned. |
| Match(substring string) | Returns all lines that contain substring. |
| !Match(substring string) | Returns all lines that don't contain substring. |
| MatchRegex(regex string) | Returns all lines that match the compiled regular expression 'regex'. The regex uses RE2 syntax. |
| !MatchRegex(regex string) | Returns all lines that don't match the compiled regular expression 'regex'. The regex uses RE2 syntax. |
| Replace(old string, replace string) | Replaces all non-overlapping instances of old with replace. |
| ReplaceRegex(regex string, replace string) | Replaces all matches of the compiled regular expression regex with replace. Inside replace, $ signs represent submatches. For example $1 represents the text of the first submatch. |
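To make a few of the rows above concrete, here are rough classic-tool counterparts for Match, !Match, and ReplaceRegex. They are illustrations under assumptions, not pipesore's implementation: the helper names are hypothetical, grep and sed use POSIX regular expressions rather than RE2, and sed writes submatches as \1 and \2 where the table uses $1 and $2.

```shell
# Hypothetical helpers; rough classic-tool counterparts of three filters.
match() {
  grep -F -- "$1"            # like Match(substring): fixed-string search
}

not_match() {
  grep -v -F -- "$1"         # like !Match(substring): invert the match
}

swap_pairs() {
  # In the spirit of ReplaceRegex with two submatches:
  # "name=123" becomes "123=name".
  sed -E 's/([a-z]+)=([0-9]+)/\2=\1/g'
}

printf 'apple\nbanana\ncherry\n' | match an       # banana
printf 'apple\nbanana\ncherry\n' | not_match an   # apple and cherry
echo 'port=8080' | swap_pairs                     # 8080=port
```

As with First and !First, match and not_match together return every input line exactly once.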
License
See LICENSE file.