Documentation ¶
Overview ¶
Package exporter provides tools for extracting and converting chat session data from JSON files into various formats, such as CSV and JSON datasets.
This package facilitates tasks like data visualization, reporting, and machine learning data preparation.
The exporter package defines types to represent chat sessions, messages, and associated metadata.
It includes functions to:
- Read chat session data from JSON files
- Convert sessions to CSV with different formatting options
- Create separate CSV files for sessions and messages
- Extract sessions to a JSON format for Hugging Face datasets
The package also handles fields in the source JSON that may be represented as either strings or integers by using the custom StringOrInt type.
ConvertSessionsToCSV also accepts a context.Context, so a long-running conversion can be given a deadline or cancelled by the caller.
Code:
func (soi *StringOrInt) UnmarshalJSON(data []byte) error {
	// Try unmarshalling into a string
	var s string
	if err := json.Unmarshal(data, &s); err != nil {
		// If there is an error, try unmarshalling into an int
		var i int64
		if err := json.Unmarshal(data, &i); err != nil {
			return err // Return the error if it is not a string or int
		}
		// Convert int to string and assign it to the custom type
		*soi = StringOrInt(strconv.FormatInt(i, 10))
		return nil
	}
	// If no error, assign the string value to the custom type
	*soi = StringOrInt(s)
	return nil
}
Usage examples:
To read chat sessions from a JSON file and convert them to a CSV format with context support:
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
defer cancel()

store, err := exporter.ReadJSONFromFile("path/to/chat-sessions.json")
if err != nil {
	log.Fatal(err)
}

err = exporter.ConvertSessionsToCSV(ctx, store.ChatNextWebStore.Sessions, exporter.FormatOptionInline, "output.csv")
if err != nil {
	log.Fatal(err)
}
To create separate CSV files for sessions and messages:
err = exporter.CreateSeparateCSVFiles(store.ChatNextWebStore.Sessions, "sessions.csv", "messages.csv")
if err != nil {
	log.Fatal(err)
}
To extract chat sessions to a JSON dataset:
datasetJSON, err := exporter.ExtractToDataset(store.ChatNextWebStore.Sessions)
if err != nil {
	log.Fatal(err)
}
fmt.Println(datasetJSON)
Copyright (c) 2023 H0llyW00dzZ
Index ¶
- Constants
- func ConvertSessionsToCSV(ctx context.Context, sessions []Session, formatOption int, ...) error
- func CreateSeparateCSVFiles(sessions []Session, sessionsFileName string, messagesFileName string) (err error)
- func ExtractToDataset(sessions []Session) (string, error)
- func WriteHeaders(csvWriter *csv.Writer, headers []string) error
- func WriteMessageData(csvWriter *csv.Writer, sessions []Session) error
- func WriteSessionData(csvWriter *csv.Writer, sessions []Session) error
- type ChatNextWebStore
- type Mask
- type Message
- type Session
- type Stat
- type Store
- type StringOrInt
Constants ¶
const (
	// FormatOptionInline specifies the format where messages are displayed inline.
	FormatOptionInline = iota + 1
	// FormatOptionPerLine specifies the format where each message is on a separate line.
	FormatOptionPerLine
	// FormatOptionJSON specifies the format where messages are encoded as JSON.
	FormatOptionJSON
	// OutputFormatSeparateCSVFiles specifies the option to create separate CSV files for sessions and messages.
	OutputFormatSeparateCSVFiles
)
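Because the block starts at iota + 1, the constants take the values 1 through 4 and the zero value of int is not a valid option. The sketch below copies the constants locally so it compiles on its own; formatName is a hypothetical helper for illustration and is not part of the package.

```go
package main

import "fmt"

// Local copy of the documented constants so the example compiles on its own.
const (
	FormatOptionInline = iota + 1
	FormatOptionPerLine
	FormatOptionJSON
	OutputFormatSeparateCSVFiles
)

// formatName is a hypothetical helper (not part of the package) that maps an
// option value to a short label and flags anything outside the known range.
func formatName(option int) string {
	switch option {
	case FormatOptionInline:
		return "inline"
	case FormatOptionPerLine:
		return "per-line"
	case FormatOptionJSON:
		return "json"
	case OutputFormatSeparateCSVFiles:
		return "separate-csv-files"
	default:
		return "invalid"
	}
}

func main() {
	// 0 and 5 fall outside the declared constants and are reported as invalid.
	for option := 0; option <= 5; option++ {
		fmt.Printf("%d: %s\n", option, formatName(option))
	}
}
```

Starting at 1 means an uninitialized formatOption can be told apart from a deliberate choice.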
Functions ¶
func ConvertSessionsToCSV ¶
func ConvertSessionsToCSV(ctx context.Context, sessions []Session, formatOption int, outputFilePath string) error
ConvertSessionsToCSV writes a slice of Session objects into a CSV file with support for context cancellation.
It delegates the writing of sessions to format-specific functions based on the formatOption provided.
The outputFilePath parameter specifies the path to the output CSV file.
It returns an error if the context is cancelled, the format option is invalid, or writing to the CSV fails.
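One common way to honour cancellation while writing CSV rows is to check ctx.Done() before each row. The following is a minimal sketch of that pattern, not the package's actual implementation; writeRows and renderCSV are hypothetical names, and the output goes to an in-memory buffer rather than a file.

```go
package main

import (
	"bytes"
	"context"
	"encoding/csv"
	"fmt"
)

// writeRows is a hypothetical helper: it writes rows to w and stops early,
// returning ctx.Err(), if the context has been cancelled.
func writeRows(ctx context.Context, w *csv.Writer, rows [][]string) error {
	for _, row := range rows {
		select {
		case <-ctx.Done():
			return ctx.Err()
		default:
		}
		if err := w.Write(row); err != nil {
			return err
		}
	}
	w.Flush()
	return w.Error()
}

// renderCSV runs writeRows against an in-memory buffer. When cancelled is
// true, the context is cancelled before any row is written.
func renderCSV(cancelled bool, rows [][]string) (string, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	if cancelled {
		cancel() // simulate a caller that gave up before writing started
	}

	var buf bytes.Buffer
	if err := writeRows(ctx, csv.NewWriter(&buf), rows); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	rows := [][]string{{"id", "topic"}, {"1", "hello"}}

	out, err := renderCSV(false, rows)
	fmt.Printf("%q %v\n", out, err)

	_, err = renderCSV(true, rows)
	fmt.Println(err)
}
```

Checking once per row keeps the overhead small while still bounding how much work happens after cancellation.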
func CreateSeparateCSVFiles ¶
func CreateSeparateCSVFiles(sessions []Session, sessionsFileName string, messagesFileName string) (err error)
CreateSeparateCSVFiles creates two separate CSV files for sessions and messages from a slice of Session objects.
It takes the file names as parameters and returns an error if the files cannot be created or if writing the data fails.
Errors from closing files or flushing data to the CSV writers are captured and will be returned after all operations are attempted.
Error messages are logged to the console.
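The named return value in the signature is what allows errors from deferred cleanup to reach the caller. The sketch below illustrates the idea for a single file, assuming a hypothetical helper writeCSVFile; it keeps only the first error, whereas the real function may combine or log errors differently.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"os"
	"path/filepath"
)

// writeCSVFile is a hypothetical helper. The named return value lets the
// deferred function report a Close error when nothing failed earlier.
func writeCSVFile(path string, rows [][]string) (err error) {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer func() {
		if cerr := f.Close(); cerr != nil && err == nil {
			err = cerr
		}
	}()

	w := csv.NewWriter(f)
	// WriteAll writes every row and flushes the writer before returning.
	if err = w.WriteAll(rows); err != nil {
		return err
	}
	return w.Error()
}

// roundTrip writes rows to a temporary file and returns the file's contents.
func roundTrip(rows [][]string) (string, error) {
	dir, err := os.MkdirTemp("", "exporter-sketch")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "sessions.csv")
	if err := writeCSVFile(path, rows); err != nil {
		return "", err
	}
	data, err := os.ReadFile(path)
	return string(data), err
}

func main() {
	out, err := roundTrip([][]string{{"id", "topic"}, {"1", "hello"}})
	fmt.Printf("%q %v\n", out, err)
}
```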
func ExtractToDataset ¶
func ExtractToDataset(sessions []Session) (string, error)
ExtractToDataset converts a slice of Session objects into a JSON formatted string suitable for use as a dataset in machine learning applications.
It returns an error if marshaling the sessions into JSON format fails.
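As a rough illustration of marshaling sessions into a JSON string, the sketch below uses trimmed local copies of Session and Message and a hypothetical helper toJSONString. The exact layout ExtractToDataset produces is not described here, so the output shape shown is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Trimmed local copies of the documented types, kept small for the example.
type Message struct {
	ID      string `json:"id"`
	Date    string `json:"date"`
	Role    string `json:"role"`
	Content string `json:"content"`
}

type Session struct {
	ID       string    `json:"id"`
	Topic    string    `json:"topic"`
	Messages []Message `json:"messages"`
}

// toJSONString is a hypothetical stand-in for ExtractToDataset; the real
// dataset layout may differ from a plain array of sessions.
func toJSONString(sessions []Session) (string, error) {
	data, err := json.Marshal(sessions)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

func main() {
	sessions := []Session{{
		ID:       "s1",
		Topic:    "greeting",
		Messages: []Message{{ID: "m1", Role: "user", Content: "hi"}},
	}}
	out, err := toJSONString(sessions)
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(out)
}
```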
func WriteHeaders ¶ added in v1.1.8
func WriteHeaders(csvWriter *csv.Writer, headers []string) error
WriteHeaders writes the provided headers to the csv.Writer.
func WriteMessageData ¶ added in v1.1.8
func WriteMessageData(csvWriter *csv.Writer, sessions []Session) error
WriteMessageData writes message data to the provided csv.Writer.
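The sketch below shows how a header row followed by one row per message could be written with encoding/csv. The column names and the helper messagesCSV are assumptions made for illustration; the columns the package actually writes are not listed in this documentation.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// Trimmed local types; only the fields used by the example are included.
type Message struct{ ID, Role, Content string }

type Session struct {
	ID       string
	Messages []Message
}

// messagesCSV is a hypothetical helper. It writes an assumed header row and
// then one row per message, tagging each row with its session ID.
func messagesCSV(sessions []Session) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)

	if err := w.Write([]string{"session_id", "message_id", "role", "content"}); err != nil {
		return "", err
	}
	for _, s := range sessions {
		for _, m := range s.Messages {
			if err := w.Write([]string{s.ID, m.ID, m.Role, m.Content}); err != nil {
				return "", err
			}
		}
	}
	w.Flush()
	return buf.String(), w.Error()
}

func main() {
	sessions := []Session{{
		ID:       "s1",
		Messages: []Message{{ID: "m1", Role: "user", Content: "hi, there"}},
	}}
	out, err := messagesCSV(sessions)
	fmt.Printf("%q %v\n", out, err)
}
```

The csv.Writer quotes fields that contain commas, so message content does not need to be escaped by hand.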
Types ¶
type ChatNextWebStore ¶
type ChatNextWebStore struct {
	ChatNextWebStore Store `json:"chat-next-web-store"`
}
ChatNextWebStore is a wrapper for Store that aligns with the expected JSON structure for a chat-next-web-store object.
func ReadJSONFromFile ¶
func ReadJSONFromFile(filePath string) (ChatNextWebStore, error)
ReadJSONFromFile reads a JSON file from the given file path and unmarshals it into a ChatNextWebStore struct.
It returns an error if the file cannot be opened, the JSON is invalid, or the JSON format does not match the expected ChatNextWebStore format.
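The struct tags on ChatNextWebStore and Store imply a source document shaped like {"chat-next-web-store": {"sessions": [...]}}. The sketch below decodes such a document from a string instead of a file; decodeStore is a hypothetical helper, and Session is trimmed to two fields for brevity.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Trimmed local copies of the documented types.
type Session struct {
	ID    string `json:"id"`
	Topic string `json:"topic"`
}

type Store struct {
	Sessions []Session `json:"sessions"`
}

type ChatNextWebStore struct {
	ChatNextWebStore Store `json:"chat-next-web-store"`
}

// decodeStore is a hypothetical helper that decodes the wrapper structure
// from an in-memory string rather than from a file on disk.
func decodeStore(raw string) (ChatNextWebStore, error) {
	var store ChatNextWebStore
	err := json.NewDecoder(strings.NewReader(raw)).Decode(&store)
	return store, err
}

func main() {
	raw := `{"chat-next-web-store": {"sessions": [{"id": "s1", "topic": "greeting"}]}}`
	store, err := decodeStore(raw)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for _, s := range store.ChatNextWebStore.Sessions {
		fmt.Printf("%s: %s\n", s.ID, s.Topic)
	}
}
```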
type Mask ¶
type Mask struct {
	ID        StringOrInt `json:"id"` // Use the custom type for ID
	Avatar    string      `json:"avatar"`
	Name      string      `json:"name"`
	Lang      string      `json:"lang"`
	CreatedAt int64       `json:"createdAt"` // Assuming it's a Unix timestamp
}
Mask represents an anonymization mask for a participant in a chat session, including the participant's ID, avatar link, name, language, and creation timestamp.
type Message ¶
type Message struct {
	ID      string `json:"id"`
	Date    string `json:"date"`
	Role    string `json:"role"`
	Content string `json:"content"`
}
Message represents a single message within a chat session, including metadata like the ID, date, role of the sender, and the content of the message itself.
type Session ¶
type Session struct {
	ID                 string    `json:"id"`
	Topic              string    `json:"topic"`
	MemoryPrompt       string    `json:"memoryPrompt"`
	Stat               Stat      `json:"stat"`
	LastUpdate         int64     `json:"lastUpdate"` // Changed to int64 assuming it's a Unix timestamp
	LastSummarizeIndex int       `json:"lastSummarizeIndex"`
	Mask               Mask      `json:"mask"`
	Messages           []Message `json:"messages"`
}
Session represents a single chat session, including session metadata, statistics, messages, and the mask for the participant.
type Stat ¶
type Stat struct {
	TokenCount int `json:"tokenCount"`
	WordCount  int `json:"wordCount"`
	CharCount  int `json:"charCount"`
}
Stat represents statistics for a chat session, such as the count of tokens, words, and characters.
type Store ¶
type Store struct {
	Sessions []Session `json:"sessions"`
}
Store encapsulates a collection of chat sessions.
type StringOrInt ¶
type StringOrInt string
StringOrInt is a custom type to handle JSON values that can be either strings or integers (Magic Golang 🎩 🪄).
It implements the json.Unmarshaler interface to handle this mixed type when unmarshaling JSON data.
func (*StringOrInt) UnmarshalJSON ¶
func (soi *StringOrInt) UnmarshalJSON(data []byte) error
UnmarshalJSON is a custom unmarshaler for StringOrInt that tries to unmarshal JSON data as a string, and if that fails, as an integer, which is then converted to a string.
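To see the behaviour end to end, the sketch below redeclares StringOrInt locally with the same string-then-integer logic and decodes three inputs: a string ID, an integer ID, and a boolean that should be rejected. The helper decodeID is hypothetical and exists only for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// StringOrInt mirrors the package's type so the example compiles on its own.
type StringOrInt string

// UnmarshalJSON tries a JSON string first and falls back to an integer,
// which is stored in its decimal string form.
func (soi *StringOrInt) UnmarshalJSON(data []byte) error {
	var s string
	if err := json.Unmarshal(data, &s); err == nil {
		*soi = StringOrInt(s)
		return nil
	}
	var i int64
	if err := json.Unmarshal(data, &i); err != nil {
		return err // neither a string nor an integer
	}
	*soi = StringOrInt(strconv.FormatInt(i, 10))
	return nil
}

// decodeID is a hypothetical helper that decodes {"id": ...} and returns
// the ID as a plain string.
func decodeID(raw string) (string, error) {
	var v struct {
		ID StringOrInt `json:"id"`
	}
	if err := json.Unmarshal([]byte(raw), &v); err != nil {
		return "", err
	}
	return string(v.ID), nil
}

func main() {
	for _, raw := range []string{`{"id": "abc"}`, `{"id": 100000}`, `{"id": true}`} {
		id, err := decodeID(raw)
		fmt.Printf("%s -> id=%q err=%v\n", raw, id, err)
	}
}
```

Storing the value as a string means callers such as Mask.ID never need to branch on the original JSON type.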