Version: v0.0.0-...-df06595 (latest), published Feb 25, 2024. License: GPL-3.0

Repository

github.com/nnsgmsone/nexus

Nexus

Nexus is an interactive tool for versatile data analysis. Data manipulation tasks are expressed in SPL (Search Processing Language). Nexus is released under the GNU General Public License (GPL), so you are free to use, modify, and distribute its code.

With SPL, Nexus is well suited to parsing unformatted data. The syntax resembles a Unix pipeline: commands are chained with the pipe symbol (|), and each command in the pipeline performs one specific analysis task.

The SPL syntax follows this pattern:

| <spl commands> = | <spl command> | <spl command> | ...

Key Features

Fast import
Parsing data in arbitrary formats
Extending data parsing capabilities with Lua plugins
Data analysis through SPL
Interactive user interface

Introduction to Nexus

Nexus Architecture

Nexus streams all imported data as a byte stream to the extract command. The extract command parses the byte stream and, following its parsing logic, transforms it into a table similar to a database table. The resulting table is then passed to the next command. After all commands have been executed, the results are displayed on the screen.

In addition to the extract command, Nexus includes other commands such as dedup, where, eval, sort, limit, and stats. These commands manipulate the data. Unlike extract, which takes a byte stream as input, each of these commands takes a table as input and produces a table as output.
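The flow described above can be sketched as a chain of functions. This is illustrative Python, not Nexus code: the `Table` layout, the function names, and the single `line` column are all assumptions made for the example.

```python
from typing import Callable, Dict, List

# Hypothetical table layout: column name -> list of column values.
Table = Dict[str, List[str]]

def extract(raw: bytes) -> Table:
    """Stand-in for extract: bytes in, table out (one row per input line)."""
    return {"line": raw.decode("utf-8").splitlines()}

def where_contains(needle: str) -> Callable[[Table], Table]:
    """Stand-in for a table-to-table command that keeps matching rows."""
    def run(table: Table) -> Table:
        keep = [i for i, v in enumerate(table["line"]) if needle in v]
        return {name: [col[i] for i in keep] for name, col in table.items()}
    return run

def limit(n: int) -> Callable[[Table], Table]:
    """Stand-in for limit: keep at most the first n rows."""
    return lambda table: {name: col[:n] for name, col in table.items()}

def run_pipeline(raw: bytes, commands: List[Callable[[Table], Table]]) -> Table:
    table = extract(raw)       # byte stream -> table
    for command in commands:   # every later command maps table -> table
        table = command(table)
    return table

result = run_pipeline(b"GET /a\nPOST /b\nGET /c\nGET /d\n",
                      [where_contains("GET"), limit(2)])
print(result)  # {'line': ['GET /a', 'GET /c']}
```

Only the first stage sees raw bytes; every later stage has the same table-in, table-out shape, which is what lets commands be chained in any order.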

Compiling Nexus

make

Launching Nexus

# This command initiates an interactive interface to execute commands for data analysis
./nexus

Interactive Commands in Nexus

Nexus responds to several interactive commands:

quit/exit: Exit Nexus
clear: Clear the screen
Arrow keys: Navigate command history

Supported SPL statements

File Path

All file paths in Nexus are absolute paths.

Importing Data

Nexus leverages the import statement to import data, following this syntax:

IMPORT name [, name]...

Here are two examples:

| import "file1"; -- Import file1
| import "file1", "file2"; -- Import file1 and file2

Clearing Data

Nexus uses the clean statement to clear data, employing the following syntax:

CLEAN

Here is an example:

| clean

Parsing Data

Nexus parses data with the extract statement, which is the first command in a query. The syntax for extract is as follows:

EXTRACT [LUA = STRING | LUA_FILE = STRING] eval_list

Consider this example:

| extract LUA_FILE="1.lua" a = 0, b = 1, c = 2 | eval a = cast(a as float) + cast(b as float) | sort 10 by a;

The extract command uses Lua to implement the data parsing logic. How to write Lua scripts for different kinds of input is explained in the Lua Scripting section.

Deduplication

Data deduplication in Nexus is achieved through the dedup statement, utilizing the following syntax:

dedup name [, name]

Consider these examples:

... | dedup a
... | dedup a, b
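As a rough model of what dedup does, the Python sketch below removes rows whose key columns repeat. It assumes the first occurrence of each key is the one kept, which the text above does not specify, and the column-dictionary table layout is hypothetical.

```python
from typing import Dict, List

def dedup(table: Dict[str, List], keys: List[str]) -> Dict[str, List]:
    """Keep the first row for each distinct combination of the key columns.

    Assumption: first occurrence wins; Nexus may choose differently.
    """
    n_rows = len(next(iter(table.values()), []))
    seen = set()
    keep = []
    for i in range(n_rows):
        key = tuple(table[k][i] for k in keys)
        if key not in seen:
            seen.add(key)
            keep.append(i)
    return {name: [col[i] for i in keep] for name, col in table.items()}

t = {"a": [1, 1, 2, 1], "b": ["x", "x", "y", "z"]}
by_a = dedup(t, ["a"])         # like: ... | dedup a
by_a_b = dedup(t, ["a", "b"])  # like: ... | dedup a, b
print(by_a)    # {'a': [1, 2], 'b': ['x', 'y']}
print(by_a_b)  # {'a': [1, 2, 1], 'b': ['x', 'y', 'z']}
```

Adding more names to the dedup list makes the key more specific, so fewer rows are treated as duplicates.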

Filtering

Nexus filters data using the where statement, adhering to the syntax:

where expr

Consider these simple examples:

... | where a = 1
... | where like(ip, "198.*")

Projection

Nexus projects data through the eval statement, following the syntax:

eval name = expr [, name = expr]

Consider these simple examples:

... | eval a = b + c
... | eval a = cast(b as float)
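The Python sketch below models an eval such as `eval a = cast(a as float) + cast(b as float)`. It assumes extracted values arrive as strings, which is why the casts are needed, and the helper name and table layout are hypothetical.

```python
from typing import Dict, List

def eval_float_sum(table: Dict[str, List], out: str, left: str, right: str) -> Dict[str, List]:
    """Compute out = cast(left as float) + cast(right as float) for every row."""
    result = dict(table)  # other columns pass through unchanged
    result[out] = [float(x) + float(y) for x, y in zip(table[left], table[right])]
    return result

t = {"a": ["1", "2.5"], "b": ["10", "0.5"]}
r = eval_float_sum(t, out="a", left="a", right="b")
print(r)  # {'a': [11.0, 3.0], 'b': ['10', '0.5']}
```

The expression is evaluated once per row, and the named output column is either added or, as here, overwritten.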

Sorting

Data sorting in Nexus is achieved through the sort statement, with the syntax:

sort [int] by sort-field-list
sort-field-list = name [desc|asc] [, name [desc|asc]]

Consider these simple examples:

... | sort by uid
... | sort 10 by uid, date
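A possible reading of the sort statement is sketched below in Python. It assumes the optional integer keeps only the first N rows after sorting and that earlier fields take priority over later ones. Neither is stated above, and the function name and table layout are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

def sort_table(table: Dict[str, List],
               fields: List[Tuple[str, bool]],
               limit: Optional[int] = None) -> Dict[str, List]:
    """Sort rows by (column, descending) pairs, optionally keeping the first `limit` rows."""
    n_rows = len(next(iter(table.values()), []))
    order = list(range(n_rows))
    # Python's sort is stable, so sorting by the last field first and the
    # first field last yields a correct multi-field ordering.
    for name, descending in reversed(fields):
        order.sort(key=lambda i, col=table[name]: col[i], reverse=descending)
    if limit is not None:
        order = order[:limit]
    return {name: [col[i] for i in order] for name, col in table.items()}

t = {"uid": [2, 1, 2, 1], "date": ["b", "b", "a", "a"]}
top3 = sort_table(t, [("uid", False), ("date", False)], limit=3)  # like: sort 3 by uid, date
mixed = sort_table(t, [("uid", True), ("date", False)])           # like: sort by uid desc, date
print(top3)   # {'uid': [1, 1, 2], 'date': ['a', 'b', 'a']}
print(mixed)  # {'uid': [2, 2, 1, 1], 'date': ['a', 'b', 'a', 'b']}
```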

Limiting

Limiting data in Nexus is accomplished using the limit statement, following this syntax:

limit int

Consider this example:

... | limit 10

Grouping and Aggregation

Nexus employs the stats statement for grouping and aggregating data, with the syntax:

stats expr [as name] [, expr [as name]] [by name [, name]]

Currently supported aggregation functions for stats include:

count
sum
max
min
avg

Consider these examples:

... | stats count()
... | stats count() as cnt by ip
... | stats sum(a), avg(a) by b, c
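The grouping behaviour can be illustrated with a small Python sketch that computes count, sum, and avg of one column per group. The output column names and the table layout are assumptions for the example, not the names Nexus produces.

```python
from typing import Dict, List

def stats_by(table: Dict[str, List], value: str, by: List[str]) -> Dict[str, List]:
    """Group rows by the `by` columns and aggregate the `value` column."""
    groups: Dict[tuple, List[float]] = {}
    for i in range(len(table[value])):
        key = tuple(table[name][i] for name in by)
        groups.setdefault(key, []).append(table[value][i])

    out: Dict[str, List] = {name: [] for name in by}
    out.update({"count": [], "sum": [], "avg": []})
    for key, values in groups.items():  # one output row per group
        for name, part in zip(by, key):
            out[name].append(part)
        out["count"].append(len(values))
        out["sum"].append(sum(values))
        out["avg"].append(sum(values) / len(values))
    return out

t = {"ip": ["x", "y", "x"], "a": [1.0, 2.0, 3.0]}
result = stats_by(t, value="a", by=["ip"])
print(result)
# {'ip': ['x', 'y'], 'count': [2, 1], 'sum': [4.0, 2.0], 'avg': [2.0, 2.0]}
```

Without a by clause the whole table forms a single group, which corresponds to passing an empty `by` list in this sketch.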

Functions and Operators

Nexus currently supports the following functions and operators:

isnull(x): Checks if x is nil.
isnotnull(x): Checks if x is not nil.
cast(x as t): Converts x to type t.
replace(x, old, new): String replacement.
regexp_match(x, reg): Checks if x matches the regular expression reg.
regexp_extract(x, reg, idx): Extracts data that satisfies the conditions.
Arithmetic operators: +, -, *, /, %
Comparison operators: <, <=, >=, >, =, <>
Logical operators: and, or, not

Data Types

Nexus supports the following data types:

bool
long
double
string
NULL - Represents non-existence

Constants

The supported constants in Nexus are as follows:

true, false
1
1.2
"x", `x`

Nexus supports two kinds of string literals. In a double-quoted string such as "x", the content between the quotation marks is the string content and escape sequences are processed, so "\n" is treated as a newline character. In a backtick-quoted string such as `x`, the content between the backticks is taken as-is and no escape processing is applied.
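The difference between the two string forms can be sketched in Python as follows. Only the "\n" escape is confirmed by the text above; the other entries in `ESCAPES`, and the choice to keep unknown escapes verbatim, are assumptions made for the example.

```python
# Assumed escape table; only "\n" is confirmed by the README.
ESCAPES = {"n": "\n", "t": "\t", "\\": "\\", '"': '"'}

def unquote(literal: str) -> str:
    """Return the string value of a double-quoted or backtick-quoted literal."""
    quote, body = literal[0], literal[1:-1]
    if quote == "`":
        return body  # backtick string: content is taken verbatim
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            # Unknown escapes are kept verbatim (an assumption).
            out.append(ESCAPES.get(nxt, "\\" + nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

cooked = unquote(r'"a\nb"')  # escape processed: 3 characters
raw = unquote(r'`a\nb`')     # no processing: 4 characters
print(len(cooked), len(raw))  # 3 4
```

Backtick strings are convenient for regular expressions, where backslashes would otherwise need to be doubled.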

Lua Scripting

Nexus utilizes a Lua script to convert byte stream data into table data. The Lua script employs io.read() to continuously read data, parsing it into table data and sending data chunks to Nexus.

The Lua script outputs a table with m rows and n columns, represented as a two-dimensional array built from Lua tables. For example:

tbl = { {"x", "y", "z"}, {"a", "b", "c"}}

The table is stored column by column: each inner one-dimensional array holds one column. Each batch is sent to Nexus through writeResult. Note that all columns must contain the same number of rows, and the number of rows in a batch must not exceed 8192, so the script has to output data in manageable chunks.
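The batching rule can be illustrated outside Lua. The Python sketch below turns a stream of rows into column-wise batches of at most 8192 rows, padding short rows so that all columns stay the same length. The function name and the padding with empty strings are assumptions for the example. In a real script each yielded batch would be passed to writeResult.

```python
from typing import Iterable, Iterator, List

MAX_ROWS = 8192  # per-batch row limit stated above

def column_batches(rows: Iterable[List[str]], n_cols: int,
                   max_rows: int = MAX_ROWS) -> Iterator[List[List[str]]]:
    """Yield column-major batches: one inner list per column."""
    columns: List[List[str]] = [[] for _ in range(n_cols)]
    for row in rows:
        if len(columns[0]) >= max_rows:  # batch is full: hand it off
            yield columns
            columns = [[] for _ in range(n_cols)]
        for i in range(n_cols):
            # Pad missing fields so every column has the same row count.
            columns[i].append(row[i] if i < len(row) else "")
    if columns[0]:  # flush the final partial batch
        yield columns

batches = list(column_batches([["x", "a"], ["y", "b"], ["z"]], n_cols=2, max_rows=2))
print(batches)  # [[['x', 'y'], ['a', 'b']], [['z'], ['']]]
```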

Below are several examples of Lua scripts. The first script separates the input byte stream into lines, with each line containing only one column of data:

-- Maximum number of rows Nexus accepts in one batch
local MAX_ROWS = 8192
local row = 0
-- tbl[1] is the single output column
local tbl = {{}}
for line in io.stdin:lines() do
    -- Flush the current batch once it is full
    if row >= MAX_ROWS then
        writeResult(tbl)
        row = 0
        tbl = {{}}
    end
    row = row + 1
    tbl[1][row] = line
end
-- Flush the remaining rows, then signal the end of the data
if row > 0 then
    writeResult(tbl)
end
writeResult(nil)

All Lua scripts follow the same pattern as this simple example: they read data from io.stdin, build a two-dimensional array, call writeResult to return results in batches, and finish with writeResult(nil).

The next example is a more complex one designed to parse a simple CSV format. The Lua script is as follows:

-- csv.lua - parse and format csv
-- Usage: cat csv.txt | lua csv.lua
-- Input: csv.txt
-- Output: {field1, field2, field3..},{field1, field2, field3...}...
-- Author: nnsgmsone
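-- Split one CSV line on commas. Empty fields are kept so that
-- every column stays aligned across rows (quoted fields are not handled).
local function parseCSVLine(line)
    local fields = {}
    -- Append a trailing comma so each field, including the last, ends with one
    for v in string.gmatch(line .. ",", "([^,]*),") do
        table.insert(fields, v)
    end
    return fields
end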

local row = 0
local tbl = {{}}
for line in io.stdin:lines() do
    local record = parseCSVLine(line)
    local rows = #tbl[1] or 0
    if rows >= 8192 then
        writeResult(tbl)
        row = 0
        tbl = {{}}
    end
    for i, v in ipairs(record) do
        tbl[i] = tbl[i] or {}
        tbl[i][row+1] = v
    end
    row = row + 1
end
local rows = #tbl[1] or 0
if rows > 0 then
    writeResult(tbl)
end
writeResult(nil)

The difference from the first Lua script is that this script outputs multiple columns. It parses CSV, divides the CSV data into units of 8192 rows, and then returns the data in table format (column storage).

The last example is a more practical Lua script that parses a goroutine dump, converting each goroutine into three columns (creator information, call stack information, goroutine survival time). The Lua script is as follows:

-- goroutine.lua - parse and format goroutine dump
-- Usage: cat goroutine.txt | lua goroutine.lua
-- Input: goroutine.txt
-- Output: {created, stack, time},{created, stack, time}...
-- Author: nnsgmsone
local row = 0
local tbl = {{}}
local time = "0"
local stack = ""
local created = ""
tbl[1] = tbl[1] or {}
tbl[2] = tbl[2] or {}
tbl[3] = tbl[3] or {}
for line in io.stdin:lines() do
    local rows = #tbl[1] or 0
    if rows >= 8192 then
        writeResult(tbl)
        row = 0
        tbl = {{}}
        tbl[1] = tbl[1] or {}
        tbl[2] = tbl[2] or {}
        tbl[3] = tbl[3] or {}
    end
    if string.len(stack) == 0 then
        local find = string.match(line, "goroutine")
        if find then
            stack = line
            local t = string.match(line, "(%d+)%s*minutes")
            if t then
                time = t
            end
        end
    elseif string.len(created) > 0 then
        local find = string.match(line, "%S+")
        if find then
            tbl[1][row+1] = find
        else
            tbl[1][row+1] = ""
        end
        tbl[2][row+1] = stack .. "\n" .. line
        tbl[3][row+1] = time
        created = ""
        stack = ""
        time = "0"
        row = row + 1
    else
        local find = string.match(line, "created by")
        if find then
            created = find
        else
            stack = stack .. "\n" .. line
        end
    end
end
-- Flush the remaining rows only if there are any, as in the other scripts
if row > 0 then
    writeResult(tbl)
end
writeResult(nil)

These Lua scripts provide examples of reading and processing data in different formats.

License

This project is licensed under the GNU General Public License (GPL-3.0).

Directories

Path	Synopsis
cmd
pkg
container/batch
container/bitmap
container/indextable
container/types
container/vector
defines
encoding
spl/colexec/agg
spl/colexec/emit
spl/colexec/exchange
spl/colexec/filter
spl/colexec/group
spl/colexec/limit
spl/colexec/load
spl/colexec/order
spl/colexec/projection
spl/colexec/scan
spl/colexec/top
spl/compare
spl/compile
spl/expr
spl/lex
spl/parser
spl/plan
spl/tree
testutil
util
vfs
vm/engine
vm/engine/noah
vm/mheap
vm/pipeline
vm/process
