sum

Version: v1.2.1 (latest), Published: May 12, 2019, License: GPL-3.0

Repository

github.com/evilsocket/sum

README

SUM

Sum is a specialized database server for linear algebra and machine learning.

Installation

Download the latest binary release, then create the certificate used for authentication and channel encryption:

sudo mkdir -p /etc/sumd/creds
sudo openssl req -x509 -newkey rsa:4096 -keyout /etc/sumd/creds/key.pem -out /etc/sumd/creds/cert.pem -days 365 -nodes -subj '/CN=localhost'

Proceed to install the sumd binary as a systemd service:

cd /path/to/extracted/sumd
sudo mkdir -p /var/lib/sumd/data
sudo mkdir -p /var/lib/sumd/oracles
sudo mv sumd /usr/local/bin/
sudo mv sumcli /usr/local/bin/
sudo mv sumd.service /etc/systemd/system/
sudo systemctl daemon-reload

Compile from Source

Install the gRPC Go bindings, then run:

go get github.com/evilsocket/sum
cd $GOPATH/src/github.com/evilsocket/sum
make deps
make sumd
sudo make install

Usage

You can access your Sum instance with the sumcli client. Run sumcli -eval "help; q" to print the list of available commands. To get an idea of how the client side works, take a look at the example Python client code: it creates a few vectors on the server, defines an oracle, calls it for every vector and prints the similarities the server returns.

Why?

If you work with machine learning, you probably have a bunch of huge CSV files lying around that you keep using to train your models, run PCA on, or analyze in some other way. If so, you know the struggle of:

Parsing and loading the file with NumPy, TensorFlow or whatever.
Crossing your fingers that your laptop can actually hold those records in memory.
Running your algorithm.
... waiting ...

This project is an attempt to make these tedious tasks (and many others) simpler, if not completely automated. Sum is a database and high-performance gRPC service offering three main things:

Persistence for your vectors.
A simple CRUD system to create, read, update and delete them.
Oracles.

An oracle is a piece of JavaScript logic you want to run on your data. A client sends this code to the Sum server, where it is compiled and stored. It is then available for every client to use in order to "query" the data.
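The define-once, call-by-name lifecycle described above can be sketched in a few lines of plain JavaScript. This is only an illustration of the idea, not Sum's implementation: the `OracleRegistry` class, its `define` and `call` methods and the `double` oracle are hypothetical names invented for this sketch, and the real server works over gRPC rather than in-process calls.

```javascript
// Hypothetical in-memory registry illustrating "send, compile, store, call".
// None of these names come from Sum itself.
class OracleRegistry {
  constructor() {
    this.oracles = new Map(); // oracle name -> compiled function
  }

  // Compile the source once and store the resulting function under `name`.
  define(name, source) {
    const compiled = new Function(source + "\nreturn " + name + ";")();
    if (typeof compiled !== "function") {
      throw new Error("source does not define a function named " + name);
    }
    this.oracles.set(name, compiled);
  }

  // Any caller can now invoke the stored oracle by name.
  call(name, ...args) {
    const oracle = this.oracles.get(name);
    if (!oracle) {
      throw new Error("oracle " + name + " not found");
    }
    return oracle(...args);
  }
}

const registry = new OracleRegistry();
registry.define("double", "function double(x) { return x * 2; }");
const doubled = registry.call("double", 21);
```

The point of the design is that the code is compiled a single time when it is defined, so later calls only pay for the lookup and the execution.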

For instance, this is the findSimilar oracle definition:

// Given the vector with id=`id`, return a list of
// other vectors whose cosine similarity to the reference
// one is greater than or equal to the threshold.
// Results are given as a dictionary of:
//      `vector_id => similarity`
function findSimilar(id, threshold) {
    var v = records.Find(id);
    if( v.IsNull() ) {
        return ctx.Error("Vector " + id + " not found.");
    }

    var results = {};
    records.AllBut(v).forEach(function(record){
        var similarity = v.Cosine(record);
        if( similarity >= threshold ) {
            results[record.ID] = similarity;
        }
    });

    return results;
}

Once the oracle is defined on the Sum server, any client can execute calls like findSimilar("some-vector-id-here", 0.9). Such calls are evaluated on data held in memory in order to be as fast as possible, while the same data is persisted on disk as binary protobuf-encoded files.
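To try the oracle logic without a running server, the sketch below stubs the objects the oracle relies on (`records`, `ctx`, and vectors exposing `ID`, `IsNull` and `Cosine`) with minimal stand-ins. The stubs, the `makeVector` helper, the `data` field and the sample vectors are assumptions made for illustration only; they mimic just the calls used by findSimilar and are not the objects the Sum server actually provides.

```javascript
// Hypothetical stand-in for a stored vector; only the members used by
// findSimilar are implemented.
function makeVector(id, data) {
  return {
    ID: id,
    data: data,
    IsNull: function () { return false; },
    // Cosine similarity: dot(a, b) / (|a| * |b|).
    Cosine: function (other) {
      var dot = 0, normA = 0, normB = 0;
      for (var i = 0; i < data.length; i++) {
        dot += data[i] * other.data[i];
        normA += data[i] * data[i];
        normB += other.data[i] * other.data[i];
      }
      return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
  };
}

var nullVector = { IsNull: function () { return true; } };
var store = [
  makeVector("a", [1, 0]),
  makeVector("b", [2, 0]), // same direction as "a"
  makeVector("c", [0, 1])  // orthogonal to "a"
];

// Hypothetical stubs for the globals the oracle expects.
var records = {
  Find: function (id) {
    var hits = store.filter(function (r) { return r.ID === id; });
    return hits.length > 0 ? hits[0] : nullVector;
  },
  AllBut: function (v) {
    return store.filter(function (r) { return r.ID !== v.ID; });
  }
};
var ctx = { Error: function (message) { return { error: message }; } };

function findSimilar(id, threshold) {
  var v = records.Find(id);
  if (v.IsNull()) {
    return ctx.Error("Vector " + id + " not found.");
  }
  var results = {};
  records.AllBut(v).forEach(function (record) {
    var similarity = v.Cosine(record);
    if (similarity >= threshold) {
      results[record.ID] = similarity;
    }
  });
  return results;
}

var similar = findSimilar("a", 0.9);   // only "b" passes the threshold
var missing = findSimilar("zzz", 0.9); // unknown id yields the error value
```

With these sample vectors, "b" points in the same direction as "a" and so has a cosine similarity of 1, while "c" is orthogonal and scores 0, which is why only "b" appears in the result.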

One example use case is finding behaviourally similar malware samples given a reference executable.

Directories

Path	Synopsis
backend	Package backend provides an abstraction layer to the available BLAS backends.
cmd/sumcli
cmd/sumcli/handlers
cmd/sumd
proto
service	Package service contains the implementation of the SUM gRPC service.
storage	Package storage provides the basic data structures, indexing, persistence and low-level API for oracles and vectors.
wrapper	Package wrapper provides the functionalities for the objects being passed to oracles during evaluation.
