api

package
v0.0.0-...-9f00eee

This package is not in the latest version of its module.
Published: Apr 30, 2021 License: Apache-2.0 Imports: 1 Imported by: 0

Documentation

Constants

const (
	FissionRouterUrl   = "http://router.fission"
	StorageUrl         = "http://storage.kubeml"
	SchedulerUrl       = "http://scheduler.kubeml"
	ParameterServerUrl = "http://parameter-server.kubeml"
	ControllerUrl      = "http://controller.kubeml"
	MongoUrl           = "mongodb.kubeml"
	MongoPort          = 27017
	RedisUrl           = "redisai.kubeml"
	RedisPort          = 6379
)

Addresses of services
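
Note that MongoUrl and RedisUrl are bare host names with separate port constants, while the other addresses already carry an http:// scheme. A minimal sketch of combining host and port follows; the helper names hostPort and mongoURI are hypothetical, and prefixing the mongodb:// scheme is an assumption rather than something this package does. The constants are copied locally so the example compiles on its own.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// Local copies of constants documented above.
const (
	MongoUrl  = "mongodb.kubeml"
	MongoPort = 27017
	RedisUrl  = "redisai.kubeml"
	RedisPort = 6379
)

// hostPort (hypothetical helper) joins a bare host and a numeric port.
func hostPort(host string, port int) string {
	return net.JoinHostPort(host, strconv.Itoa(port))
}

// mongoURI (hypothetical helper) adds the mongodb:// scheme; this is an
// assumption about how a client might be configured.
func mongoURI() string {
	return "mongodb://" + hostPort(MongoUrl, MongoPort)
}

func main() {
	fmt.Println(hostPort(RedisUrl, RedisPort)) // redisai.kubeml:6379
	fmt.Println(mongoURI())                    // mongodb://mongodb.kubeml:27017
}
```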

const (
	MongoUrlDebug            = "mongodb://192.168.99.101:30074"
	StorageAddressDebug      = "http://192.168.99.102:9090"
	FissionRouterUrlDebug    = "http://192.168.99.101:32422"
	RedisAddressDebug        = "192.168.99.101"
	RedisPortDebug           = 30358
	DebugParallelism         = 2
	SchedulerPortDebug       = 10200
	ParameterServerPortDebug = 10300
	ControllerPortDebug      = 10100
	HostUrlDebug             = "http://localhost"
)

Debug values for the service addresses, ports, and parallelism
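
The debug block gives ports for the scheduler, parameter server, and controller alongside HostUrlDebug, which suggests they are meant to be combined. The sketch below shows one way to switch between the in-cluster and debug controller address; the controllerURL helper and its debug flag are hypothetical, and the constants are local copies.

```go
package main

import "fmt"

// Local copies of constants documented above.
const (
	ControllerUrl       = "http://controller.kubeml"
	HostUrlDebug        = "http://localhost"
	ControllerPortDebug = 10100
)

// controllerURL (hypothetical helper) picks the debug or in-cluster address.
func controllerURL(debug bool) string {
	if debug {
		return fmt.Sprintf("%s:%d", HostUrlDebug, ControllerPortDebug)
	}
	return ControllerUrl
}

func main() {
	fmt.Println(controllerURL(true))  // http://localhost:10100
	fmt.Println(controllerURL(false)) // http://controller.kubeml
}
```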

const DefaultParallelism = 5

Variables

This section is empty.

Functions

This section is empty.

Types

type Datapoint

type Datapoint struct {
	Features []float32 `json:"features"`
}

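Datapoint is a single datapoint. The original comment mentions a label, but the struct as published carries only the feature vector.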

type DatasetSummary

type DatasetSummary struct {
	Name         string `json:"name"`
	TrainSetSize int64  `json:"train_set_size"`
	TestSetSize  int64  `json:"test_set_size"`
}

DatasetSummary describes the contents of a kubeml dataset

type History

type History struct {
	Id   string       `bson:"_id" json:"id"`
	Task TrainRequest `json:"task"`
	Data JobHistory   `json:"data,omitempty"`
}

History is the train and validation history of a specific training job

type InferRequest

type InferRequest struct {
	ModelId string        `json:"model_id"`
	Data    []interface{} `json:"data"`
}

InferRequest is sent to obtain a prediction from a trained network
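
Because Data is a []interface{}, each element can be any JSON-encodable value. The sketch below encodes a request whose elements are feature vectors; the encodeInfer helper and the model ID are hypothetical, and the type is a local copy of the one documented above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// InferRequest mirrors the type documented above.
type InferRequest struct {
	ModelId string        `json:"model_id"`
	Data    []interface{} `json:"data"`
}

// encodeInfer (hypothetical helper) returns the JSON wire form of a request.
func encodeInfer(req InferRequest) (string, error) {
	b, err := json.Marshal(req)
	return string(b), err
}

func main() {
	req := InferRequest{
		ModelId: "example-model", // hypothetical ID
		Data:    []interface{}{[]float32{1, 2}, []float32{3, 4}},
	}
	out, err := encodeInfer(req)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // {"model_id":"example-model","data":[[1,2],[3,4]]}
}
```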

type JobHistory

type JobHistory struct {
	ValidationLoss []float64 `json:"validation_loss"`
	Accuracy       []float64 `json:"accuracy"`
	TrainLoss      []float64 `json:"train_loss"`
	Parallelism    []float64 `json:"parallelism"`
	EpochDuration  []float64 `json:"epoch_duration"`
}

JobHistory stores the intermediate results of the training process, epoch by epoch
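
Each field is a slice with one entry per epoch, so recording an epoch amounts to appending to all five slices together. A minimal sketch follows; the appendEpoch helper is hypothetical and is not part of this package.

```go
package main

import "fmt"

// JobHistory mirrors the type documented above.
type JobHistory struct {
	ValidationLoss []float64 `json:"validation_loss"`
	Accuracy       []float64 `json:"accuracy"`
	TrainLoss      []float64 `json:"train_loss"`
	Parallelism    []float64 `json:"parallelism"`
	EpochDuration  []float64 `json:"epoch_duration"`
}

// appendEpoch (hypothetical helper) records one epoch's results so that
// index i of every slice refers to the same epoch.
func appendEpoch(h *JobHistory, valLoss, acc, trainLoss, parallelism, duration float64) {
	h.ValidationLoss = append(h.ValidationLoss, valLoss)
	h.Accuracy = append(h.Accuracy, acc)
	h.TrainLoss = append(h.TrainLoss, trainLoss)
	h.Parallelism = append(h.Parallelism, parallelism)
	h.EpochDuration = append(h.EpochDuration, duration)
}

func main() {
	h := &JobHistory{}
	appendEpoch(h, 0.9, 55, 1.1, 2, 12.5) // illustrative values only
	appendEpoch(h, 0.7, 68, 0.8, 4, 9)
	fmt.Println(len(h.TrainLoss), h.Accuracy[1]) // 2 68
}
```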

type JobInfo

type JobInfo struct {
	JobId   string          `json:"id"`
	State   JobState        `json:"state"`
	Pod     *corev1.Pod     `json:"-"`
	Svc     *corev1.Service `json:"-"`
	Channel chan *JobState  `json:"-"`
}

JobInfo holds the information about the Job responsible for training the network

This includes training-specific parameters such as the elapsed time and parallelism, as well as lower-level information such as the job's pod and service definitions. The channel is also included for backwards compatibility with the thread deploying method; like the pod and service, it is tagged `json:"-"` so it is ignored during JSON serialization.
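
The effect of the `json:"-"` tag can be seen with a reduced copy of the type. In this sketch the Pod and Svc fields are left out to avoid the Kubernetes dependency, which is a simplification; the job ID is hypothetical. Without the tag, encoding/json would reject the channel field as an unsupported type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// JobState mirrors the type documented in this package.
type JobState struct {
	Parallelism int     `json:"parallelism"`
	ElapsedTime float64 `json:"elapsed_time"`
}

// JobInfo is a reduced local copy: Pod and Svc are omitted here.
type JobInfo struct {
	JobId   string         `json:"id"`
	State   JobState       `json:"state"`
	Channel chan *JobState `json:"-"` // skipped by encoding/json
}

// encodeJob (hypothetical helper) returns the JSON wire form of a job.
func encodeJob(j JobInfo) (string, error) {
	b, err := json.Marshal(j)
	return string(b), err
}

func main() {
	j := JobInfo{
		JobId:   "job-1", // hypothetical ID
		State:   JobState{Parallelism: 2, ElapsedTime: 1.5},
		Channel: make(chan *JobState),
	}
	out, err := encodeJob(j)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // {"id":"job-1","state":{"parallelism":2,"elapsed_time":1.5}}
}
```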

type JobState

type JobState struct {
	Parallelism int     `json:"parallelism"`
	ElapsedTime float64 `json:"elapsed_time"`
}

JobState holds the training specific variables of the job

type MetricUpdate

type MetricUpdate struct {
	ValidationLoss float64 `json:"validations_loss"`
	Accuracy       float64 `json:"accuracy"`
	TrainLoss      float64 `json:"train_loss"`
	Parallelism    float64 `json:"parallelism"`
	EpochDuration  float64 `json:"epoch_duration"`
}

MetricUpdate is received by the parameter server from the training jobs to refresh the metrics exposed to Prometheus
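
One detail worth noting when producing these updates: the JSON key for ValidationLoss is `validations_loss` here, whereas JobHistory uses `validation_loss`. The sketch below decodes a payload with each spelling to show the difference; the decodeMetrics helper and the payload values are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MetricUpdate mirrors the type documented above, including its tags.
type MetricUpdate struct {
	ValidationLoss float64 `json:"validations_loss"`
	Accuracy       float64 `json:"accuracy"`
	TrainLoss      float64 `json:"train_loss"`
	Parallelism    float64 `json:"parallelism"`
	EpochDuration  float64 `json:"epoch_duration"`
}

// decodeMetrics (hypothetical helper) parses a JSON metric update.
func decodeMetrics(payload string) (MetricUpdate, error) {
	var m MetricUpdate
	err := json.Unmarshal([]byte(payload), &m)
	return m, err
}

func main() {
	ok, _ := decodeMetrics(`{"validations_loss":0.5,"accuracy":90}`)
	miss, _ := decodeMetrics(`{"validation_loss":0.5,"accuracy":90}`)
	fmt.Println(ok.ValidationLoss)   // 0.5
	fmt.Println(miss.ValidationLoss) // 0, the key did not match the tag
}
```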

type TrainOptions

type TrainOptions struct {
	DefaultParallelism int  `json:"default_parallelism"`
	StaticParallelism  bool `json:"static_parallelism"`
	ValidateEvery      int  `json:"validate_every"`
	// K is the parameter of the K-avg algorithm, after how many
	// updates we sync with the PS
	K int `json:"k"`
	// GoalAccuracy accuracy objective, after which we'll stop the training
	GoalAccuracy float64 `json:"goal_accuracy"`
}

TrainOptions allows users to define extra configurations for the train job such as parallelism and validation options
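
The package also exports a DefaultParallelism constant with the value 5. One plausible use, shown in the sketch below, is as a fallback when a request leaves the option unset. Whether kubeml applies the default this way is an assumption, and the withDefaults helper is hypothetical.

```go
package main

import "fmt"

// DefaultParallelism is a local copy of the constant documented above.
const DefaultParallelism = 5

// TrainOptions mirrors the type documented above.
type TrainOptions struct {
	DefaultParallelism int     `json:"default_parallelism"`
	StaticParallelism  bool    `json:"static_parallelism"`
	ValidateEvery      int     `json:"validate_every"`
	K                  int     `json:"k"`
	GoalAccuracy       float64 `json:"goal_accuracy"`
}

// withDefaults (hypothetical helper) fills in the parallelism when it is
// unset or non-positive and leaves every other field untouched.
func withDefaults(o TrainOptions) TrainOptions {
	if o.DefaultParallelism <= 0 {
		o.DefaultParallelism = DefaultParallelism
	}
	return o
}

func main() {
	fmt.Println(withDefaults(TrainOptions{}).DefaultParallelism)                      // 5
	fmt.Println(withDefaults(TrainOptions{DefaultParallelism: 8}).DefaultParallelism) // 8
}
```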

type TrainRequest

type TrainRequest struct {
	ModelType    string       `json:"model_type"`
	BatchSize    int          `json:"batch_size"`
	Epochs       int          `json:"epochs"`
	Dataset      string       `json:"dataset"`
	LearningRate float32      `json:"lr"`
	FunctionName string       `json:"function_name"`
	Options      TrainOptions `json:"options,omitempty"`
}

TrainRequest is sent to the controller API to start a new training job. It is then embedded in the TrainTask that is used by the parameter server (PS).
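
A point to keep in mind when reading the `omitempty` tag on Options: encoding/json never treats a struct value as empty, so the `options` key is emitted even when no options are set. The sketch below checks this on local copies of the types; the encodedKeys helper is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TrainOptions and TrainRequest mirror the types documented in this package.
type TrainOptions struct {
	DefaultParallelism int     `json:"default_parallelism"`
	StaticParallelism  bool    `json:"static_parallelism"`
	ValidateEvery      int     `json:"validate_every"`
	K                  int     `json:"k"`
	GoalAccuracy       float64 `json:"goal_accuracy"`
}

type TrainRequest struct {
	ModelType    string       `json:"model_type"`
	BatchSize    int          `json:"batch_size"`
	Epochs       int          `json:"epochs"`
	Dataset      string       `json:"dataset"`
	LearningRate float32      `json:"lr"`
	FunctionName string       `json:"function_name"`
	Options      TrainOptions `json:"options,omitempty"`
}

// encodedKeys (hypothetical helper) marshals a request and returns the
// top-level JSON keys that were actually written.
func encodedKeys(req TrainRequest) (map[string]json.RawMessage, error) {
	b, err := json.Marshal(req)
	if err != nil {
		return nil, err
	}
	keys := map[string]json.RawMessage{}
	err = json.Unmarshal(b, &keys)
	return keys, err
}

func main() {
	keys, err := encodedKeys(TrainRequest{Dataset: "example", Epochs: 1})
	if err != nil {
		panic(err)
	}
	_, present := keys["options"]
	fmt.Println(present) // true, despite omitempty and zero-valued options
}
```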

type TrainTask

type TrainTask struct {
	Parameters TrainRequest `json:"request"`
	Job        JobInfo      `json:"job,omitempty"`
}

TrainTask associates the train request sent by the user with the kubeml-specific handler of the request, that is, the job. It is the main object exchanged by the scheduler and the parameter server to schedule new parallelism.
