knowledge

command module

Version: v0.1.9 (latest). Published: Jul 10, 2024. License: Apache-2.0. Imports: 2. Imported by: 0.


Repository

github.com/gptscript-ai/knowledge


README

Knowledge API

A standalone knowledge tool for use with GPTScript and GPTStudio.

Build

Requires Go 1.22+

make build

Run

The knowledge tool can run in two modes: server and client. The client can either run standalone or talk to a remote server.

Client - Standalone

knowledge create-dataset foobar
knowledge ingest -d foobar README.md
knowledge retrieve -d foobar "Which filetypes are supported?"
knowledge delete-dataset foobar

Server & Client - Server Mode

knowledge server

export KNOW_SERVER_URL=http://localhost:8000/v1
knowledge create-dataset foobar
knowledge ingest -d foobar README.md
knowledge retrieve -d foobar "Which filetypes are supported?"
knowledge delete-dataset foobar
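The README does not say exactly how the client picks its mode. Presumably, exporting KNOW_SERVER_URL is what switches it from standalone to server mode. The sketch below shows that decision under exactly that assumption. The helper clientMode is hypothetical and is not part of the tool's code.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// clientMode is a hypothetical helper: it ASSUMES that a non-empty
// KNOW_SERVER_URL selects remote (server) mode and that an empty one
// selects standalone mode. getenv is injected so the logic is testable.
func clientMode(getenv func(string) string) (mode, serverURL string) {
	url := strings.TrimSpace(getenv("KNOW_SERVER_URL"))
	if url == "" {
		// No server configured: manage the vector and knowledge databases locally.
		return "standalone", ""
	}
	// Server configured: send requests to the remote REST API (trailing slash trimmed).
	return "remote", strings.TrimRight(url, "/")
}

func main() {
	mode, url := clientMode(os.Getenv)
	fmt.Println(mode, url)
}
```

With `export KNOW_SERVER_URL=http://localhost:8000/v1`, this sketch would report remote mode against that URL. Without the variable, it reports standalone mode.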

Supported File Types

.pdf
.html
.md
.txt
.docx
.odt
.rtf
.csv
.ipynb
.json
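A caller might want to filter files against the list above before ingesting them. The sketch below does that by case-insensitive extension matching. It assumes matching by extension is good enough, although the tool's own filetypes package may detect types differently (for example by content). The names supportedExtensions and isSupported are hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// supportedExtensions mirrors the list of supported file types above.
// Hypothetical: this is not the tool's own filetypes API.
var supportedExtensions = map[string]bool{
	".pdf": true, ".html": true, ".md": true, ".txt": true, ".docx": true,
	".odt": true, ".rtf": true, ".csv": true, ".ipynb": true, ".json": true,
}

// isSupported reports whether the file's extension, lowercased, is in the list.
// Files without an extension yield "" and are therefore unsupported.
func isSupported(path string) bool {
	return supportedExtensions[strings.ToLower(filepath.Ext(path))]
}

func main() {
	for _, f := range []string{"README.md", "report.PDF", "image.png"} {
		fmt.Printf("%s: %v\n", f, isSupported(f))
	}
}
```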

OpenAPI / Swagger

The API is documented using OpenAPI 2.0 (Swagger). The spec is generated automatically by swaggo/swag; regenerate it with make openapi.

GPTScript Examples

Note: The examples in the examples/ directory expect the knowledge binary to be in your $PATH.

Run

gptscript examples/client.gpt

Architecture & Components

The knowledge tool is composed of the following components, which are all run from the same executable:

knowledge client
- can run in two modes:
  - standalone (client/standalone): manages its own vector and knowledge databases locally
  - server/remote (client/default): interacts with a knowledge server over the network
knowledge server (server)
- runs a REST API server that interacts with the databases described below, so the client stays stateless and sends/receives data over the network
datastore (datastore)
- responsible for handling data ingestion and retrieval
  - ingestion includes
    - loading documents (extracting text)
    - splitting text into chunks
    - pre-processing (e.g. metadata extraction, content enrichment)
    - requesting embeddings (part of the vectorstore implementation)
    - storing embeddings and metadata in the vector database
    - registering the document in the knowledge database (index)
  - retrieval includes
    - query embedding
    - querying the vector database for similar embeddings (similarity search)
    - mapping the retrieved embeddings to document contents
    - (optional) post-processing (e.g. filtering, sorting, summarization)
    - returning document contents alongside their similarity scores
- consists of two databases:
  - vector database (vectorstore)
    - Current choice: chromem-go
    - used for storing and retrieving embeddings alongside the document contents
    - the implementation is responsible for
      - requesting the embeddings from a model (e.g. OpenAI's text-embedding-ada-002)
      - storing the embeddings together with metadata and document contents
      - doing similarity searches to retrieve embeddings
      - returning document contents alongside their similarity scores
  - knowledge database (index)
    - Current choice: sqlite3
    - used for
      - indexing knowledge bases (datasets) with the relationship dataset (1:n) files (1:n) documents, i.e. each dataset holds many files and each file holds many documents; this makes it easy to delete specific documents or files from a dataset and to get quick overviews of datasets without querying the vector database (which also holds this information in its metadata)
      - storing knowledge base metadata, e.g. attached ingestion flows

Retrieval Flows

The knowledge tool allows you to configure how sources are retrieved and how they should be treated before being returned to the caller (usually an LLM).

Here's what it looks like:

[Figure: Retrieval Flows]
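The package layout includes datastore/querymodifiers, datastore/retrievers and datastore/postprocessors, which suggests that a flow chains these three kinds of stages. The following is a minimal sketch of such a chain. The stage types, the Flow struct and the example stages (dedupe, minScore) are hypothetical and are not the tool's actual interfaces or flow configuration.

```go
package main

import (
	"fmt"
	"strings"
)

// Document is a retrieved source with its similarity score (hypothetical type).
type Document struct {
	Content string
	Score   float64
}

// Hypothetical stage signatures; the real packages may define different interfaces.
type QueryModifier func(query string) []string
type Retriever func(query string) []Document
type Postprocessor func(query string, docs []Document) []Document

// Flow chains query modifiers, a retriever and postprocessors.
type Flow struct {
	QueryModifiers []QueryModifier
	Retriever      Retriever
	Postprocessors []Postprocessor
}

// Run expands the query through each modifier, retrieves documents for every
// resulting query, then applies the postprocessors in order to the merged results.
func (f Flow) Run(query string) []Document {
	queries := []string{query}
	for _, m := range f.QueryModifiers {
		var next []string
		for _, q := range queries {
			next = append(next, m(q)...)
		}
		queries = next
	}
	var docs []Document
	for _, q := range queries {
		docs = append(docs, f.Retriever(q)...)
	}
	for _, p := range f.Postprocessors {
		docs = p(query, docs)
	}
	return docs
}

// minScore returns a postprocessor that drops documents scoring below threshold.
func minScore(threshold float64) Postprocessor {
	return func(_ string, docs []Document) []Document {
		var kept []Document
		for _, d := range docs {
			if d.Score >= threshold {
				kept = append(kept, d)
			}
		}
		return kept
	}
}

// dedupe drops documents whose content was already seen, keeping the first.
func dedupe(_ string, docs []Document) []Document {
	seen := map[string]bool{}
	var out []Document
	for _, d := range docs {
		if !seen[d.Content] {
			seen[d.Content] = true
			out = append(out, d)
		}
	}
	return out
}

func main() {
	corpus := []Document{{"Supported file types include .pdf and .md", 0.9}, {"Run make build", 0.2}}
	flow := Flow{
		// Toy modifier: also search with a lowercased variant of the query.
		QueryModifiers: []QueryModifier{func(q string) []string { return []string{q, strings.ToLower(q)} }},
		// Toy retriever: returns the whole corpus with fixed scores.
		Retriever:      func(string) []Document { return corpus },
		Postprocessors: []Postprocessor{dedupe, minScore(0.5)},
	}
	for _, d := range flow.Run("Which filetypes are supported?") {
		fmt.Printf("%.2f %s\n", d.Score, d.Content)
	}
}
```

Because query modifiers can produce several queries, the same document may be retrieved more than once. That is why this sketch runs a deduplicating postprocessor before the score filter.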

Documentation

There is no documentation for this package.

Source Files

main.go

Directories

Path	Synopsis
docs
gendocs
pkg
client
cmd
config
datastore
datastore/defaults
datastore/documentloader
datastore/filetypes
datastore/postprocessors	Package postprocessors is basically the same as package transformers, but used at a different stage of the RAG pipeline
datastore/querymodifiers
datastore/retrievers
datastore/store
datastore/textsplitter
datastore/textsplitter/markdown_basic
datastore/transformers
datastore/types
docs	Package docs Code generated by swaggo/swag.
env
flows
flows/config
index
llm
server
server/types
vectorstore
vectorstore/chromem
vectorstore/errors
version
