katib

v0.9.0
Published: Jun 10, 2020 License: Apache-2.0

Katib is a Kubernetes-based system for Hyperparameter Tuning and Neural Architecture Search. Katib supports a number of ML frameworks, including TensorFlow, Apache MXNet, PyTorch, XGBoost, and others.

Getting Started

See the getting-started guide on the Kubeflow website.

Name

Katib means "secretary" in Arabic.

Concepts in Katib

For a detailed description of the concepts in Katib, hyperparameter tuning, and neural architecture search, see the Kubeflow documentation.

Katib has the concepts of Experiment, Suggestion, Trial, and Worker Job.

Experiment

Experiment represents a single optimization run over a feasible space. Each Experiment contains a configuration that describes:

  1. Objective: What we are trying to optimize.
  2. Search Space: Constraints for configurations describing the feasible space.
  3. Search Algorithm: How to find the optimal configurations.

Experiment is defined as a CRD. See the detailed guide to configuring and running a Katib experiment in the Kubeflow docs.
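As a rough illustration of these three parts, the sketch below writes an Experiment configuration as a plain Python dictionary. The values mirror the kubectl describe output for tfjob-example shown later in this README. The camelCase key spelling, the string-typed min/max values, and the validate_experiment helper are assumptions made for illustration; the helper is not part of Katib.

```python
# Hypothetical sketch: an Experiment configuration as a plain dictionary.
# Values follow the tfjob-example output in this README; key spelling is assumed.
experiment = {
    "apiVersion": "kubeflow.org/v1alpha3",
    "kind": "Experiment",
    "metadata": {"name": "tfjob-example", "namespace": "kubeflow"},
    "spec": {
        # 1. Objective: what we are trying to optimize.
        "objective": {
            "type": "maximize",
            "goal": 0.99,
            "objectiveMetricName": "accuracy_1",
        },
        # 2. Search space: the feasible range of each parameter.
        "parameters": [
            {"name": "--learning_rate", "parameterType": "double",
             "feasibleSpace": {"min": "0.01", "max": "0.05"}},
            {"name": "--batch_size", "parameterType": "int",
             "feasibleSpace": {"min": "100", "max": "200"}},
        ],
        # 3. Search algorithm: how to find the optimal configuration.
        "algorithm": {"algorithmName": "random"},
        "parallelTrialCount": 3,
        "maxTrialCount": 12,
        "maxFailedTrialCount": 3,
    },
}


def validate_experiment(exp):
    """Return a list of problems; an empty list means the basic shape looks fine."""
    problems = []
    spec = exp.get("spec", {})
    if spec.get("objective", {}).get("type") not in ("maximize", "minimize"):
        problems.append("objective.type must be maximize or minimize")
    if not spec.get("algorithm", {}).get("algorithmName"):
        problems.append("algorithm.algorithmName is missing")
    if not spec.get("parameters"):
        problems.append("parameters is empty")
    for p in spec.get("parameters", []):
        space = p.get("feasibleSpace", {})
        if "min" not in space or "max" not in space:
            problems.append(f"{p['name']}: feasibleSpace needs min and max")
        elif float(space["min"]) > float(space["max"]):
            problems.append(f"{p['name']}: min exceeds max")
    return problems


print(validate_experiment(experiment))
```

A check like this only catches obvious shape errors before submitting; the CRD definition itself remains the reference for what the cluster accepts.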

Suggestion

A Suggestion is a proposed solution to the optimization problem: one set of hyperparameter values, or a list of parameter assignments. A Trial is then created to evaluate the parameter assignments.

Suggestion is defined as a CRD.

Trial

A Trial is one iteration of the optimization process, which is one worker job instance with a list of parameter assignments (corresponding to a suggestion).

Trial is defined as a CRD.
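To make the relationship between a Suggestion and a Trial more concrete, here is a minimal sketch, assuming uniform random sampling as a stand-in for a search algorithm. It draws one value per parameter from its feasible space and formats the assignments as "name=value" arguments, similar to what the range loop in the trial template of tfjob-example produces. The suggest and trial_args functions and the simplified parameter dictionaries are hypothetical; this is not Katib's implementation.

```python
import random


def suggest(parameters, rng):
    """Draw one assignment per parameter, uniformly within its feasible space."""
    assignments = []
    for p in parameters:
        if p["type"] == "int":
            # randint is inclusive on both ends.
            value = rng.randint(int(p["min"]), int(p["max"]))
        else:
            value = rng.uniform(float(p["min"]), float(p["max"]))
        assignments.append({"name": p["name"], "value": value})
    return assignments


def trial_args(assignments):
    """Format assignments as "<name>=<value>" command-line arguments for a worker job."""
    return [f"{a['name']}={a['value']}" for a in assignments]


# Feasible space taken from the tfjob-example experiment in this README.
parameters = [
    {"name": "--learning_rate", "type": "double", "min": 0.01, "max": 0.05},
    {"name": "--batch_size", "type": "int", "min": 100, "max": 200},
]

rng = random.Random(0)  # fixed seed so repeated runs give the same suggestion
suggestion = suggest(parameters, rng)
print(trial_args(suggestion))
```

Each call to suggest corresponds to one proposed set of parameter assignments; evaluating those assignments in a worker job is what a Trial does.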

Worker Job

A Worker Job refers to a process responsible for evaluating a Trial and calculating its objective value.

The worker kind can be a Kubernetes Job, which is a non-distributed execution, or a Kubeflow TFJob or Kubeflow PyTorchJob, which are distributed executions. Thus, Katib supports multiple frameworks with the help of different job kinds.

Currently Katib supports the following exploration algorithms:

Hyperparameter Tuning

Components in Katib

Katib consists of several components as shown below. Each component runs on Kubernetes as a Deployment. The components communicate with each other via gRPC, and the API is defined at pkg/apis/manager/v1alpha3/api.proto.

  • Katib main components:
    • katib-db-manager: gRPC API server of Katib, which is the DB interface.
    • katib-mysql: Data storage backend of Katib, using MySQL.
    • katib-ui: User interface of Katib.
    • katib-controller: Controller for Katib CRDs in Kubernetes.

Web UI

Katib provides a Web UI. You can use it to visualize the general trend of the hyperparameter space and the training history of each trial. You can run random-example or other examples to generate a similar UI.

API documentation

See the Katib API reference docs.

Installation

For a standard installation of Katib with support for all job operators, install Kubeflow. See the Kubeflow documentation.

If you install Katib with other Kubeflow components, you can't submit Katib jobs in the Kubeflow namespace.

Alternatively, if you want to install Katib manually, follow these steps:

git clone git@github.com:kubeflow/manifests.git
Set `MANIFESTS_DIR` to the cloned folder.

TF operator

To install the TFJob operator, run the following:

cd "${MANIFESTS_DIR}/tf-training/tf-job-crds/base"
kustomize build . | kubectl apply -f -
cd "${MANIFESTS_DIR}/tf-training/tf-job-operator/base"
kustomize build . | kubectl apply -n kubeflow -f -

PyTorch operator

To install the PyTorch operator, run the following:

cd "${MANIFESTS_DIR}/pytorch-job/pytorch-job-crds/base"
kustomize build . | kubectl apply -f -
cd "${MANIFESTS_DIR}/pytorch-job/pytorch-operator/base/"
kustomize build . | kubectl apply -n kubeflow -f -

Katib

Finally, you can install Katib:

cd "${MANIFESTS_DIR}/katib/katib-crds/base"
kustomize build . | kubectl apply -f -
cd "${MANIFESTS_DIR}/katib/katib-controller/base"
kustomize build . | kubectl apply -f -

If your cluster doesn't have a StorageClass for dynamic volume provisioning, you have to create a persistent volume manually so that the persistent volume claim can be bound to it.

This is a sample YAML file for creating a persistent volume:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: katib-mysql
  labels:
    type: local
    app: katib
spec:
  storageClassName: katib
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /tmp/katib

Create this PV after deploying the Katib package.

Running examples

After deploying everything, you can run the examples to verify the installation.

This is an example for the TFJob operator:

kubectl create -f https://raw.githubusercontent.com/kubeflow/katib/master/examples/v1alpha3/tfjob-example.yaml

This is an example for the PyTorch operator:

kubectl create -f https://raw.githubusercontent.com/kubeflow/katib/master/examples/v1alpha3/pytorchjob-example.yaml

You can check the status of the experiment:

$ kubectl describe experiment tfjob-example -n kubeflow


Name:         tfjob-example
Namespace:    kubeflow
Labels:       <none>
Annotations:  <none>
API Version:  kubeflow.org/v1alpha3
Kind:         Experiment
Metadata:
  Creation Timestamp:  2019-10-06T12:25:44Z
  Generation:          1
  Resource Version:    2110410
  Self Link:           /apis/kubeflow.org/v1alpha3/namespaces/kubeflow/experiments/tfjob-example
  UID:                 6b2bef2d-e834-11e9-93ee-42010aa00075
Spec:
  Algorithm:
    Algorithm Name:        random
  Max Failed Trial Count:  3
  Max Trial Count:         12
  Metrics Collector Spec:
    Collector:
      Kind:  TensorFlowEvent
    Source:
      File System Path:
        Kind:  Directory
        Path:  /train
  Objective:
    Goal:                   0.99
    Objective Metric Name:  accuracy_1
    Type:                   maximize
  Parallel Trial Count:     3
  Parameters:
    Feasible Space:
      Max:           0.05
      Min:           0.01
    Name:            --learning_rate
    Parameter Type:  double
    Feasible Space:
      Max:           200
      Min:           100
    Name:            --batch_size
    Parameter Type:  int
  Trial Template:
    Go Template:
      Raw Template:  apiVersion: "kubeflow.org/v1"
kind: TFJob
metadata:
  name: {{.Trial}}
  namespace: {{.NameSpace}}
spec:
 tfReplicaSpecs:
  Worker:
    replicas: 1 
    restartPolicy: OnFailure
    template:
      spec:
        containers:
          - name: tensorflow 
            image: gcr.io/kubeflow-ci/tf-mnist-with-summaries:1.0
            imagePullPolicy: Always
            command:
              - "python"
              - "/var/tf_mnist/mnist_with_summaries.py"
              - "--log_dir=/train/metrics"
              {{- with .HyperParameters}}
              {{- range .}}
              - "{{.Name}}={{.Value}}"
              {{- end}}
              {{- end}}
Status:
  Completion Time:  2019-10-06T12:28:50Z
  Conditions:
    Last Transition Time:  2019-10-06T12:25:44Z
    Last Update Time:      2019-10-06T12:25:44Z
    Message:               Experiment is created
    Reason:                ExperimentCreated
    Status:                True
    Type:                  Created
    Last Transition Time:  2019-10-06T12:28:50Z
    Last Update Time:      2019-10-06T12:28:50Z
    Message:               Experiment is running
    Reason:                ExperimentRunning
    Status:                False
    Type:                  Running
    Last Transition Time:  2019-10-06T12:28:50Z
    Last Update Time:      2019-10-06T12:28:50Z
    Message:               Experiment has succeeded because Objective goal has reached
    Reason:                ExperimentSucceeded
    Status:                True
    Type:                  Succeeded
  Current Optimal Trial:
    Observation:
      Metrics:
        Name:   accuracy_1
        Value:  1
    Parameter Assignments:
      Name:          --learning_rate
      Value:         0.018532845700535087
      Name:          --batch_size
      Value:         109
  Start Time:        2019-10-06T12:25:44Z
  Trials:            4
  Trials Running:    2
  Trials Succeeded:  2
Events:              <none>

When the experiment's status.conditions contains a condition of type Succeeded with status True, the experiment is finished.
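The completion check can also be done programmatically. The sketch below is a minimal example, assuming the experiment is available as a dictionary in the shape that kubectl get experiment tfjob-example -n kubeflow -o json would return, with conditions under status.conditions and the condition status stored as the string "True" or "False". The experiment_succeeded helper is hypothetical; the sample data is copied from the output above.

```python
def experiment_succeeded(experiment):
    """Return True if any condition has type Succeeded and status "True"."""
    conditions = experiment.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Succeeded" and c.get("status") == "True"
        for c in conditions
    )


# Conditions copied from the describe output of tfjob-example.
experiment = {
    "status": {
        "conditions": [
            {"type": "Created", "status": "True", "reason": "ExperimentCreated"},
            {"type": "Running", "status": "False", "reason": "ExperimentRunning"},
            {"type": "Succeeded", "status": "True", "reason": "ExperimentSucceeded"},
        ]
    }
}

print(experiment_succeeded(experiment))
```

Checking both the type and the status matters because a condition can remain in the list with status False, as the Running condition does in this output.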

You can monitor your results in the Katib UI. Access the Katib UI via the Kubeflow dashboard if you used the standard installation, or port-forward the katib-ui service if you installed Katib manually.

kubectl -n kubeflow port-forward svc/katib-ui 8080:80

You can access the Katib UI using this URL: http://localhost:8080/katib/.

Katib SDK

  • Install the SDK

    pip install kubeflow-katib
    
  • Get the Katib SDK documents from here.

  • Follow the example here to use the Katib SDK to create and delete an experiment and to get its hyperparameter values.

Cleanups

Delete installed components using kubectl delete -f on the respective folders.

Quick Start

Please see the Quick Start Guide.

Who is using Katib?

Please see adopters.md.

CONTRIBUTING

Please feel free to test the system! developer-guide.md is a good starting point for developers.

Citation

If you use Katib in a scientific publication, we would appreciate citations to the following paper:

A Scalable and Cloud-Native Hyperparameter Tuning System, George et al., arXiv:2006.02085, 2020.

Bibtex entry:

@misc{george2020katib,
    title={A Scalable and Cloud-Native Hyperparameter Tuning System},
    author={Johnu George and Ce Gao and Richard Liu and Hou Gang Liu and Yuan Tang and Ramdoot Pydipaty and Amit Kumar Saha},
    year={2020},
    eprint={2006.02085},
    archivePrefix={arXiv},
    primaryClass={cs.DC}
}

Directories

Path Synopsis
cmd
katib-controller/v1alpha3
Katib-controller is a controller (operator) for Experiments and Trials
katib-controller/v1beta1
Katib-controller is a controller (operator) for Experiments and Trials
hack
pkg
apis/controller
Package apis contains Kubernetes API groups.
apis/controller/common
Package experiment contains experiment API versions
apis/controller/common/v1alpha3
Package v1alpha3 contains API Schema definitions for the common v1alpha3 API group
apis/controller/common/v1beta1
Package v1beta1 contains API Schema definitions for the common v1beta1 API group
apis/controller/experiments
Package experiments contains experiment API versions
apis/controller/suggestions
Package suggestions contains suggestion API versions
apis/controller/suggestions/v1alpha3
Package v1alpha3 contains API Schema definitions for the suggestion v1alpha3 API group
apis/controller/suggestions/v1beta1
Package v1beta1 contains API Schema definitions for the suggestion v1beta1 API group
apis/controller/trials
Package trials contains trial API versions
apis/controller/trials/v1alpha3
Package v1alpha3 contains API Schema definitions for the trial v1alpha3 API group
apis/controller/trials/v1beta1
Package v1beta1 contains API Schema definitions for the trial v1beta1 API group
apis/manager/health
Package grpc_health_v1 is a generated protocol buffer package.
apis/manager/v1alpha3
Package api_v1_alpha3 is a generated protocol buffer package.
apis/manager/v1beta1
Package api_v1_beta1 is a generated protocol buffer package.
client/controller/clientset/versioned
This package has the automatically generated clientset.
client/controller/clientset/versioned/fake
This package has the automatically generated fake clientset.
client/controller/clientset/versioned/scheme
This package contains the scheme of the automatically generated clientset.
client/controller/clientset/versioned/typed/common/v1alpha3
This package has the automatically generated typed clients.
client/controller/clientset/versioned/typed/common/v1alpha3/fake
Package fake has the automatically generated clients.
client/controller/clientset/versioned/typed/common/v1beta1
This package has the automatically generated typed clients.
client/controller/clientset/versioned/typed/common/v1beta1/fake
Package fake has the automatically generated clients.
client/controller/clientset/versioned/typed/experiments/v1alpha3
This package has the automatically generated typed clients.
client/controller/clientset/versioned/typed/experiments/v1alpha3/fake
Package fake has the automatically generated clients.
client/controller/clientset/versioned/typed/experiments/v1beta1
This package has the automatically generated typed clients.
client/controller/clientset/versioned/typed/experiments/v1beta1/fake
Package fake has the automatically generated clients.
client/controller/clientset/versioned/typed/suggestions/v1alpha3
This package has the automatically generated typed clients.
client/controller/clientset/versioned/typed/suggestions/v1alpha3/fake
Package fake has the automatically generated clients.
client/controller/clientset/versioned/typed/suggestions/v1beta1
This package has the automatically generated typed clients.
client/controller/clientset/versioned/typed/suggestions/v1beta1/fake
Package fake has the automatically generated clients.
client/controller/clientset/versioned/typed/trials/v1alpha3
This package has the automatically generated typed clients.
client/controller/clientset/versioned/typed/trials/v1alpha3/fake
Package fake has the automatically generated clients.
client/controller/clientset/versioned/typed/trials/v1beta1
This package has the automatically generated typed clients.
client/controller/clientset/versioned/typed/trials/v1beta1/fake
Package fake has the automatically generated clients.
metricscollector/v1alpha3/common
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License.
metricscollector/v1beta1/common
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License.
mock/v1alpha3/api
Package mock is a generated GoMock package.
mock/v1alpha3/db
Package mock is a generated GoMock package.
mock/v1alpha3/experiment/manifest
Package mock is a generated GoMock package.
mock/v1alpha3/experiment/suggestion
Package mock is a generated GoMock package.
mock/v1alpha3/trial/managerclient
Package mock is a generated GoMock package.
mock/v1alpha3/util/katibclient
Package mock is a generated GoMock package.
mock/v1beta1/api
Package mock is a generated GoMock package.
mock/v1beta1/db
Package mock is a generated GoMock package.
mock/v1beta1/experiment/manifest
Package mock is a generated GoMock package.
mock/v1beta1/experiment/suggestion
Package mock is a generated GoMock package.
mock/v1beta1/trial/managerclient
Package mock is a generated GoMock package.
mock/v1beta1/util/katibclient
Package mock is a generated GoMock package.
test
