xgboost-operator

v0.1.0
Published: May 14, 2020 License: Apache-2.0

XGBoost Operator

An incubating project for the XGBoost operator. The XGBoost operator makes it easy to run distributed XGBoost training and batch prediction jobs on a Kubernetes cluster.

The overall design can be found here.

Overview

This repository contains the specification and implementation of the XGBoostJob custom resource definition. Using this custom resource, users can create and manage XGBoost jobs just like built-in Kubernetes resources.

Prerequisites

Install XGBoost Operator

You can deploy the operator with default settings by running the following commands:

git clone https://github.com/kubeflow/xgboost-operator
cd xgboost-operator
kubectl create -f manifests/xgboost-operator/base/cluster-role.yaml
kubectl create -f manifests/xgboost-operator/base/crd.yaml
kubectl create -f manifests/xgboost-operator/base/service-account.yaml
kubectl create -f config/rbac/rbac_role_binding.yaml

Build XGBoost Operator

XGBoost Operator is developed based on Kubebuilder and Kubeflow Common.

You can follow the Kubebuilder installation guide to install the XGBoost operator into a Kubernetes cluster.

You can check whether the XGBoostJob custom resource has been installed via:

kubectl get crd

The output should include xgboostjobs.kubeflow.org like the following:

NAME                       CREATED AT
xgboostjobs.kubeflow.org   2019-06-14T06:49:45Z
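
The check above can also be scripted. The following is a minimal sketch, not part of the operator: the helper name has_xgboostjob_crd is hypothetical, and the CRD name is taken from the sample output above.

```shell
# Reads `kubectl get crd` output on stdin and exits 0 if the XGBoostJob CRD
# is listed in the first column. The helper name is hypothetical.
has_xgboostjob_crd() {
  awk '$1 == "xgboostjobs.kubeflow.org" { found = 1 } END { exit !found }'
}

# Example (requires access to a cluster):
#   kubectl get crd | has_xgboostjob_crd && echo "XGBoostJob CRD is installed"
```

Matching on the first column only avoids false positives from other CRDs whose names merely contain the same substring.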

If it is not included, you can build and install the operator as follows:

## set up the build environment
export GOPATH=$HOME/go
export PATH=$PATH:$GOPATH/bin
export GO111MODULE=on
cd "$GOPATH"
mkdir -p src/github.com/kubeflow
cd src/github.com/kubeflow

## clone the code
git clone git@github.com:kubeflow/xgboost-operator.git
cd xgboost-operator

## build and install the XGBoost operator
make
make install
make run

If the XGBoost Job operator is installed into the cluster successfully, you should see logs like the following:

{"level":"info","ts":1589406873.090652,"logger":"entrypoint","msg":"setting up client for manager"}
{"level":"info","ts":1589406873.0991302,"logger":"entrypoint","msg":"setting up manager"}
{"level":"info","ts":1589406874.2192929,"logger":"entrypoint","msg":"Registering Components."}
{"level":"info","ts":1589406874.219318,"logger":"entrypoint","msg":"setting up scheme"}
{"level":"info","ts":1589406874.219448,"logger":"entrypoint","msg":"Setting up controller"}
{"level":"info","ts":1589406874.2194738,"logger":"controller","msg":"Running controller in local mode, using kubeconfig file"}
{"level":"info","ts":1589406874.224564,"logger":"controller","msg":"gang scheduling is set: ","gangscheduling":false}
{"level":"info","ts":1589406874.2247412,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"xgboostjob-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1589406874.224958,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"xgboostjob-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1589406874.2251048,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"xgboostjob-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1589406874.225237,"logger":"entrypoint","msg":"setting up webhooks"}
{"level":"info","ts":1589406874.225247,"logger":"entrypoint","msg":"Starting the Cmd."}
{"level":"info","ts":1589406874.32791,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"xgboostjob-controller"}
{"level":"info","ts":1589406874.430336,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"xgboostjob-controller","worker count":1}
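
If you capture this output to a file, a small helper can confirm that the controller reached the final step shown above. This is only a sketch: the helper name operator_started and the file name operator.log are hypothetical, and it assumes that the "Starting workers" message marks a running controller, as the sample log suggests.

```shell
# Reads operator log lines on stdin and exits 0 if a "Starting workers"
# message from the xgboostjob-controller is present.
# Assumption: this message means the controller is up, as in the sample log.
operator_started() {
  grep '"msg":"Starting workers"' | grep -q '"controller":"xgboostjob-controller"'
}

# Example, assuming the output of `make run` was saved to operator.log:
#   operator_started < operator.log && echo "controller is running"
```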

Creating an XGBoost Training/Prediction Job

You can create an XGBoost training or batch prediction job by modifying the XGBoostJob config file. See the distributed XGBoost Job training and prediction example.
You can change the config file and the related Python file (i.e., train.py or predict.py) to suit your requirements.

Following the job configuration guide in the example, you can deploy an XGBoost Job to start training or prediction like this:

## For training job
cat config/samples/xgboost-dist/xgboostjob_v1alpha1_iris_train.yaml
kubectl create -f config/samples/xgboost-dist/xgboostjob_v1alpha1_iris_train.yaml

## For batch prediction job
cat config/samples/xgboost-dist/xgboostjob_v1alpha1_iris_predict.yaml
kubectl create -f config/samples/xgboost-dist/xgboostjob_v1alpha1_iris_predict.yaml

Monitor a distributed XGBoost Job

Once the XGBoost Job is created, you should be able to watch the related pods and services at work. A distributed XGBoost job trains by synchronizing state across the workers via XGBoost's Rabit library.
You can also monitor the job status:

kubectl get -o yaml XGBoostJob/xgboost-dist-iris-test-predict

Here is sample output, in kubectl describe format, for the training job xgboost-dist-iris-test after it has finished:

Name:         xgboost-dist-iris-test
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  xgboostjob.kubeflow.org/v1alpha1
Kind:         XGBoostJob
Metadata:
  Creation Timestamp:  2019-06-27T01:16:09Z
  Generation:          9
  Resource Version:    385834
  Self Link:           /apis/xgboostjob.kubeflow.org/v1alpha1/namespaces/default/xgboostjobs/xgboost-dist-iris-test
  UID:                 2565e99a-9879-11e9-bbab-080027dfbfe2
Spec:
  Run Policy:
    Clean Pod Policy:  None
  Xgb Replica Specs:
    Master:
      Replicas:        1
      Restart Policy:  Never
      Template:
        Metadata:
          Creation Timestamp:  <nil>
        Spec:
          Containers:
            Args:
              --job_type=Train
              --xgboost_parameter=objective:multi:softprob,num_class:3
              --n_estimators=10
              --learning_rate=0.1
              --model_path=autoAI/xgb-opt/2
              --model_storage_type=oss
              --oss_param=unknown
            Image:              docker.io/merlintang/xgboost-dist-iris:1.1
            Image Pull Policy:  Always
            Name:               xgboostjob
            Ports:
              Container Port:  9991
              Name:            xgboostjob-port
            Resources:
    Worker:
      Replicas:        2
      Restart Policy:  ExitCode
      Template:
        Metadata:
          Creation Timestamp:  <nil>
        Spec:
          Containers:
            Args:
              --job_type=Train
              --xgboost_parameter="objective:multi:softprob,num_class:3"
              --n_estimators=10
              --learning_rate=0.1
              --model_path="/tmp/xgboost_model"
              --model_storage_type=oss
            Image:              docker.io/merlintang/xgboost-dist-iris:1.1
            Image Pull Policy:  Always
            Name:               xgboostjob
            Ports:
              Container Port:  9991
              Name:            xgboostjob-port
            Resources:
Status:
  Completion Time:  2019-06-27T01:17:04Z
  Conditions:
    Last Transition Time:  2019-06-27T01:16:09Z
    Last Update Time:      2019-06-27T01:16:09Z
    Message:               xgboostJob xgboost-dist-iris-test is created.
    Reason:                XGBoostJobCreated
    Status:                True
    Type:                  Created
    Last Transition Time:  2019-06-27T01:16:09Z
    Last Update Time:      2019-06-27T01:16:09Z
    Message:               XGBoostJob xgboost-dist-iris-test is running.
    Reason:                XGBoostJobRunning
    Status:                False
    Type:                  Running
    Last Transition Time:  2019-06-27T01:17:04Z
    Last Update Time:      2019-06-27T01:17:04Z
    Message:               XGBoostJob xgboost-dist-iris-test is successfully completed.
    Reason:                XGBoostJobSucceeded
    Status:                True
    Type:                  Succeeded
  Replica Statuses:
    Master:
      Succeeded:  1
    Worker:
      Succeeded:  2
Events:
  Type    Reason                   Age                From                 Message
  ----    ------                   ----               ----                 -------
  Normal  SuccessfulCreatePod      102s               xgboostjob-operator  Created pod: xgboost-dist-iris-test-master-0
  Normal  SuccessfulCreateService  102s               xgboostjob-operator  Created service: xgboost-dist-iris-test-master-0
  Normal  SuccessfulCreatePod      102s               xgboostjob-operator  Created pod: xgboost-dist-iris-test-worker-1
  Normal  SuccessfulCreateService  102s               xgboostjob-operator  Created service: xgboost-dist-iris-test-worker-0
  Normal  SuccessfulCreateService  102s               xgboostjob-operator  Created service: xgboost-dist-iris-test-worker-1
  Normal  SuccessfulCreatePod      64s                xgboostjob-operator  Created pod: xgboost-dist-iris-test-worker-0
  Normal  ExitedWithCode           47s (x3 over 49s)  xgboostjob-operator  Pod: default.xgboost-dist-iris-test-worker-1 exited with code 0
  Normal  ExitedWithCode           47s                xgboostjob-operator  Pod: default.xgboost-dist-iris-test-master-0 exited with code 0
  Normal  XGBoostJobSucceeded      47s                xgboostjob-operator  XGBoostJob xgboost-dist-iris-test is successfully completed.
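
The Succeeded condition shown in the status above can also be checked from a script. The sketch below is an illustration rather than part of the operator: the helper name job_succeeded is hypothetical, and it assumes the JSON field names status.conditions[].type and status.conditions[].status, which are inferred from the describe output above.

```shell
# Exits 0 when the named XGBoostJob has a Succeeded condition with status True.
# Assumption: the conditions are exposed as .status.conditions[].type/.status.
job_succeeded() {
  job_name="$1"
  cond_status="$(kubectl get XGBoostJob "$job_name" \
    -o jsonpath='{.status.conditions[?(@.type=="Succeeded")].status}')"
  [ "$cond_status" = "True" ]
}

# Example (requires access to a cluster):
#   job_succeeded xgboost-dist-iris-test && echo "training finished"
```

Querying one condition through JSONPath keeps the check independent of the layout of the human-readable describe output.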

Docker Images

You can use the following Dockerfile to build the images yourself:

For your convenience, you can pull the existing image from the following:

Directories

Path                      Synopsis
cmd
pkg
apis                      Package apis contains Kubernetes API groups.
apis/xgboostjob           Package xgboostjob contains xgboostjob API versions
apis/xgboostjob/v1alpha1  Package v1alpha1 contains API Schema definitions for the xgboostjob v1alpha1 API group +k8s:openapi-gen=true +k8s:deepcopy-gen=package,register +k8s:conversion-gen=github.com/kubeflow/xgboost-operator/pkg/apis/xgboostjob +k8s:defaulter-gen=TypeMeta +groupName=xgboostjob.kubeflow.org
