exporter

package
v1.1.3 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Dec 3, 2023 License: MIT Imports: 6 Imported by: 0

Documentation

Overview

Package exporter provides tools for extracting chat session data from JSON files and converting it into various formats such as CSV and JSON datasets. This package is designed to facilitate the analysis and processing of chat data, making it easier to perform tasks such as data visualization, reporting, or feeding the data into machine learning models.

The exporter package defines several types to represent chat sessions, messages, and associated metadata. It also includes functions to read chat session data from JSON files, convert sessions to CSV with different formatting options, create separate CSV files for sessions and messages, and extract sessions to a JSON format suitable for use with Hugging Face datasets.

Usage:

To read chat sessions from a JSON file and convert them to a CSV format:

store, err := exporter.ReadJSONFromFile("path/to/chat-sessions.json")
if err != nil {
    log.Fatal(err)
}
csvData, err := exporter.ConvertSessionsToCSV(store.ChatNextWebStore.Sessions, exporter.FormatOptionInline, "output.csv")
if err != nil {
    log.Fatal(err)
}
fmt.Println(csvData)

To create separate CSV files for sessions and messages:

err := exporter.CreateSeparateCSVFiles(store.ChatNextWebStore.Sessions, "sessions.csv", "messages.csv")
if err != nil {
    log.Fatal(err)
}

To extract chat sessions to a JSON dataset:

datasetJSON, err := exporter.ExtractToDataset(store.ChatNextWebStore.Sessions)
if err != nil {
    log.Fatal(err)
}
fmt.Println(datasetJSON)

The package supports handling of IDs and other fields that may be represented as either strings or integers in the source JSON by using the custom StringOrInt type.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func ConvertSessionsToCSV

func ConvertSessionsToCSV(sessions []Session, formatOption int, outputFilePath string) error

ConvertSessionsToCSV writes a slice of Session objects into a CSV file. It formats the CSV data in different ways based on the formatOption parameter. It returns an error if the format option is invalid or if writing the CSV data fails.

func CreateSeparateCSVFiles

func CreateSeparateCSVFiles(sessions []Session, sessionsFileName string, messagesFileName string) error

CreateSeparateCSVFiles creates two separate CSV files for sessions and messages from a slice of Session objects. It takes the file names as parameters and returns an error if the files cannot be created or if writing the data fails.

func ExtractToDataset

func ExtractToDataset(sessions []Session) (string, error)

ExtractToDataset converts a slice of Session objects into a JSON formatted string suitable for use as a dataset in machine learning applications. It returns an error if marshaling the sessions into JSON format fails.

Types

type ChatNextWebStore

type ChatNextWebStore struct {
	ChatNextWebStore Store `json:"chat-next-web-store"`
}

ChatNextWebStore is a wrapper for Store that aligns with the expected JSON structure for a chat-next-web-store object.

func ReadJSONFromFile

func ReadJSONFromFile(filePath string) (ChatNextWebStore, error)

ReadJSONFromFile reads a JSON file from the given file path and unmarshals it into a ChatNextWebStore struct. It returns an error if the file cannot be opened, the JSON is invalid, or the JSON format does not match the expected ChatNextWebStore format.

type Mask

type Mask struct {
	ID        StringOrInt `json:"id"` // Use the custom type for ID
	Avatar    string      `json:"avatar"`
	Name      string      `json:"name"`
	Lang      string      `json:"lang"`
	CreatedAt int64       `json:"createdAt"` // Assuming it's a Unix timestamp
}

Mask represents an anonymization mask for a participant in a chat session, including the participant's ID, avatar link, name, language, and creation timestamp.

type Message

type Message struct {
	ID      string `json:"id"`
	Date    string `json:"date"`
	Role    string `json:"role"`
	Content string `json:"content"`
}

Message represents a single message within a chat session, including metadata like the ID, date, role of the sender, and the content of the message itself.

type Session

type Session struct {
	ID                 string    `json:"id"`
	Topic              string    `json:"topic"`
	MemoryPrompt       string    `json:"memoryPrompt"`
	Stat               Stat      `json:"stat"`
	LastUpdate         int64     `json:"lastUpdate"` // Changed to int64 assuming it's a Unix timestamp
	LastSummarizeIndex int       `json:"lastSummarizeIndex"`
	Mask               Mask      `json:"mask"`
	Messages           []Message `json:"messages"`
}

Session represents a single chat session, including session metadata, statistics, messages, and the mask for the participant.

type Stat

type Stat struct {
	TokenCount int `json:"tokenCount"`
	WordCount  int `json:"wordCount"`
	CharCount  int `json:"charCount"`
}

Stat represents statistics for a chat session, such as the count of tokens, words, and characters.

type Store

type Store struct {
	Sessions []Session `json:"sessions"`
}

Store encapsulates a collection of chat sessions.

type StringOrInt

type StringOrInt string

StringOrInt is a custom type to handle JSON values that can be either strings or integers (Magic Golang 🎩 🪄). It implements the Unmarshaler interface to handle this mixed type when unmarshaling JSON data.

func (*StringOrInt) UnmarshalJSON

func (soi *StringOrInt) UnmarshalJSON(data []byte) error

UnmarshalJSON is a custom unmarshaler for StringOrInt that tries to unmarshal JSON data as a string, and if that fails, as an integer, which is then converted to a string.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL