ipvs

package
v1.11.0-alpha.1
Published: Apr 19, 2018 License: Apache-2.0 Imports: 29 Imported by: 0

README

IPVS

This document shows users:

  • what is IPVS
  • difference between IPVS and IPTABLES
  • how to run kube-proxy in ipvs mode and info on debugging

What is IPVS

IPVS (IP Virtual Server) implements transport-layer load balancing, usually called Layer 4 LAN switching, as part of the Linux kernel.

IPVS runs on a host and acts as a load balancer in front of a cluster of real servers. IPVS can direct requests for TCP and UDP-based services to the real servers, and make services of real servers appear as virtual services on a single IP address.

IPVS vs. IPTABLES

IPVS mode was introduced in Kubernetes v1.8 and went beta in v1.9. IPTABLES mode was added in v1.1 and became the default operating mode in v1.2. Both IPVS and IPTABLES are based on netfilter. The differences between IPVS mode and IPTABLES mode are as follows:

  1. IPVS provides better scalability and performance for large clusters.

  2. IPVS supports more sophisticated load balancing algorithms than iptables (least load, least connections, locality, weighted, etc.).

  3. IPVS supports server health checking and connection retries, etc.
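
To make the scheduling algorithms named in the list more concrete, here is a minimal Go sketch of round-robin and least-connections selection. It only illustrates the selection logic; it is not the kernel implementation, and the `realServer` type, the `roundRobin` type, and the `leastConn` function are hypothetical names introduced for this example.

```go
package main

import "fmt"

// realServer is a hypothetical stand-in for an IPVS real server (backend).
type realServer struct {
	addr        string
	activeConns int
}

// roundRobin cycles through the backends in order.
type roundRobin struct{ next int }

// pick returns the next backend in rotation, wrapping around at the end.
func (r *roundRobin) pick(servers []realServer) realServer {
	s := servers[r.next%len(servers)]
	r.next++
	return s
}

// leastConn returns the backend with the fewest active connections.
// Ties go to the backend that appears first in the slice.
func leastConn(servers []realServer) realServer {
	best := servers[0]
	for _, s := range servers[1:] {
		if s.activeConns < best.activeConns {
			best = s
		}
	}
	return best
}

func main() {
	servers := []realServer{
		{addr: "172.17.0.2:80", activeConns: 3},
		{addr: "172.17.0.3:80", activeConns: 1},
		{addr: "172.17.0.4:80", activeConns: 2},
	}
	rr := &roundRobin{}
	for i := 0; i < 4; i++ {
		fmt.Println("round robin ->", rr.pick(servers).addr)
	}
	fmt.Println("least connections ->", leastConn(servers).addr)
}
```

Round robin ignores backend load entirely, while least connections needs per-backend connection counts, which is the kind of state IPVS tracks in the kernel.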

When ipvs falls back to iptables

The IPVS proxier uses iptables for packet filtering, SNAT, and supporting NodePort type services. Specifically, the ipvs proxier falls back on iptables in the following four scenarios.

1. kube-proxy starts with --masquerade-all=true

If kube-proxy starts with --masquerade-all=true, the ipvs proxier will masquerade all traffic accessing the service Cluster IP, which is the same behavior as the iptables proxier. Suppose there is a service with Cluster IP 10.244.5.1 and port 8080; the iptables rules installed by the ipvs proxier should then look like the following.

# iptables -t nat -nL

Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination         
KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination         
KUBE-POSTROUTING  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes postrouting rules */

Chain KUBE-POSTROUTING (1 references)
target     prot opt source               destination         
MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service traffic requiring SNAT */ mark match 0x4000/0x4000

Chain KUBE-MARK-DROP (0 references)
target     prot opt source               destination         
MARK       all  --  0.0.0.0/0            0.0.0.0/0            MARK or 0x8000

Chain KUBE-MARK-MASQ (6 references)
target     prot opt source               destination         
MARK       all  --  0.0.0.0/0            0.0.0.0/0            MARK or 0x4000

Chain KUBE-SERVICES (2 references)
target     prot opt source               destination         
KUBE-MARK-MASQ  tcp  -- 0.0.0.0/0        10.244.5.1            /* default/foo:http cluster IP */ tcp dpt:8080
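
The `0x4000` mark in the output above is a single bit in the packet mark. As a rough sketch, assuming the mark value is derived as `1 << masqueradeBit` (with bit 14 producing the value shown), the relationship can be written in Go as follows. The helper names `masqueradeMark` and `isMarked` are hypothetical and exist only for this illustration.

```go
package main

import "fmt"

// masqueradeMark returns the "value/mask" string for a given mark bit.
// Assumption: the mark is 1 << bit, so bit 14 yields 0x4000/0x4000,
// matching the KUBE-POSTROUTING rule in the listing.
func masqueradeMark(bit uint) string {
	v := uint32(1) << bit
	return fmt.Sprintf("%#x/%#x", v, v)
}

// isMarked reports whether a packet mark has the given bit set,
// which is what "mark match 0x4000/0x4000" tests for bit 14.
func isMarked(mark uint32, bit uint) bool {
	return mark&(uint32(1)<<bit) != 0
}

func main() {
	fmt.Println(masqueradeMark(14))
	fmt.Println(isMarked(0x4000, 14), isMarked(0x8000, 14))
}
```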

2. Specify cluster CIDR in kube-proxy startup

If kube-proxy starts with --cluster-cidr=<cidr>, the ipvs proxier will masquerade off-cluster traffic accessing the service Cluster IP, which is the same behavior as the iptables proxier. Suppose kube-proxy is provided with the cluster CIDR 10.244.16.0/24, and the service Cluster IP is 10.244.5.1 and port is 8080; the iptables rules installed by the ipvs proxier should then look like the following.

# iptables -t nat -nL

Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination         
KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination         
KUBE-POSTROUTING  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes postrouting rules */

Chain KUBE-POSTROUTING (1 references)
target     prot opt source               destination         
MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service traffic requiring SNAT */ mark match 0x4000/0x4000

Chain KUBE-MARK-DROP (0 references)
target     prot opt source               destination         
MARK       all  --  0.0.0.0/0            0.0.0.0/0            MARK or 0x8000

Chain KUBE-MARK-MASQ (6 references)
target     prot opt source               destination         
MARK       all  --  0.0.0.0/0            0.0.0.0/0            MARK or 0x4000

Chain KUBE-SERVICES (2 references)
target     prot opt source               destination         
KUBE-MARK-MASQ  tcp  -- !10.244.16.0/24        10.244.5.1            /* default/foo:http cluster IP */ tcp dpt:8080
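
The negated source match `!10.244.16.0/24` in the last rule selects traffic whose source lies outside the cluster CIDR. A small Go sketch of the same test, using only the standard `net` package, might look like this. The function name `isOffCluster` is hypothetical.

```go
package main

import (
	"fmt"
	"net"
)

// isOffCluster reports whether src lies outside clusterCIDR, mirroring
// the "!10.244.16.0/24" source match in the KUBE-SERVICES rule.
func isOffCluster(src, clusterCIDR string) (bool, error) {
	ip := net.ParseIP(src)
	if ip == nil {
		return false, fmt.Errorf("invalid IP %q", src)
	}
	_, cidr, err := net.ParseCIDR(clusterCIDR)
	if err != nil {
		return false, err
	}
	return !cidr.Contains(ip), nil
}

func main() {
	for _, src := range []string{"10.244.16.7", "192.168.1.5"} {
		off, err := isOffCluster(src, "10.244.16.0/24")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s off-cluster (would be marked for masquerade): %v\n", src, off)
	}
}
```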

3. Load Balancer Source Ranges is specified for LB type service

When a service's LoadBalancerStatus.ingress.IP is not empty and its LoadBalancerSourceRanges is specified, the ipvs proxier will install iptables rules like those shown below.

Suppose service's LoadBalancerStatus.ingress.IP is 10.96.1.2 and service's LoadBalancerSourceRanges is 10.120.2.0/24.

# iptables -t nat -nL

Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination         
KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination         
KUBE-POSTROUTING  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes postrouting rules */

Chain KUBE-POSTROUTING (1 references)
target     prot opt source               destination         
MASQUERADE  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service traffic requiring SNAT */ mark match 0x4000/0x4000

Chain KUBE-MARK-DROP (0 references)
target     prot opt source               destination         
MARK       all  --  0.0.0.0/0            0.0.0.0/0            MARK or 0x8000

Chain KUBE-MARK-MASQ (6 references)
target     prot opt source               destination         
MARK       all  --  0.0.0.0/0            0.0.0.0/0            MARK or 0x4000

Chain KUBE-SERVICES (2 references)
target     prot opt source       destination         
ACCEPT  tcp  -- 10.120.2.0/24    10.96.1.2       /* default/foo:http loadbalancer IP */ tcp dpt:8080
DROP    tcp  -- 0.0.0.0/0        10.96.1.2       /* default/foo:http loadbalancer IP */ tcp dpt:8080
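
The two KUBE-SERVICES rules above rely on iptables evaluating rules in order: the ACCEPT rule for the allowed source range comes first, and the catch-all DROP rule only applies to traffic that did not match it. The following Go sketch models that first-match behavior for illustration only. The `rule` type and `firstMatch` function are hypothetical, and the model ignores protocol and port.

```go
package main

import (
	"fmt"
	"net"
)

// rule is a simplified, hypothetical model of one iptables rule:
// a target, a source CIDR, and a destination IP.
type rule struct {
	target string
	source string
	dest   string
}

// firstMatch walks the rules in order and returns the target of the
// first rule whose source CIDR and destination match the packet.
// It returns "" when no rule matches.
func firstMatch(rules []rule, src, dst string) string {
	srcIP := net.ParseIP(src)
	if srcIP == nil {
		return ""
	}
	for _, r := range rules {
		_, cidr, err := net.ParseCIDR(r.source)
		if err != nil {
			continue // skip malformed rules in this sketch
		}
		if r.dest == dst && cidr.Contains(srcIP) {
			return r.target
		}
	}
	return ""
}

func main() {
	rules := []rule{
		{target: "ACCEPT", source: "10.120.2.0/24", dest: "10.96.1.2"},
		{target: "DROP", source: "0.0.0.0/0", dest: "10.96.1.2"},
	}
	fmt.Println(firstMatch(rules, "10.120.2.7", "10.96.1.2"))
	fmt.Println(firstMatch(rules, "10.1.1.1", "10.96.1.2"))
}
```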

4. Support NodePort type service

To support NodePort type services, the ipvs proxier reuses the existing implementation in the iptables proxier. For example,

# kubectl describe svc nginx-service
Name:			nginx-service
...
Type:			NodePort
IP:			    10.101.28.148
Port:			http	3080/TCP
NodePort:		http	31604/TCP
Endpoints:		172.17.0.2:80
Session Affinity:	None

# iptables -t nat -nL

Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination         
KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */

Chain KUBE-SERVICES (2 references)
target     prot opt source               destination         
KUBE-MARK-MASQ  tcp  -- !172.16.0.0/16        10.101.28.148        /* default/nginx-service:http cluster IP */ tcp dpt:3080
KUBE-SVC-6IM33IEVEEV7U3GP  tcp  --  0.0.0.0/0            10.101.28.148        /* default/nginx-service:http cluster IP */ tcp dpt:3080
KUBE-NODEPORTS  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type LOCAL

Chain KUBE-NODEPORTS (1 references)
target     prot opt source               destination         
KUBE-MARK-MASQ  tcp  --  0.0.0.0/0            0.0.0.0/0            /* default/nginx-service:http */ tcp dpt:31604
KUBE-SVC-6IM33IEVEEV7U3GP  tcp  --  0.0.0.0/0            0.0.0.0/0            /* default/nginx-service:http */ tcp dpt:31604

Chain KUBE-SVC-6IM33IEVEEV7U3GP (2 references)
target     prot opt source               destination
KUBE-SEP-Q3UCPZ54E6Q2R4UT  all  --  0.0.0.0/0            0.0.0.0/0            /* default/nginx-service:http */

Chain KUBE-SEP-Q3UCPZ54E6Q2R4UT (1 references)
target     prot opt source               destination         
KUBE-MARK-MASQ  all  --  172.17.0.2           0.0.0.0/0            /* default/nginx-service:http */
DNAT       tcp  --  0.0.0.0/0            0.0.0.0/0            /* default/nginx-service:http */ tcp to:172.17.0.2:80

Run kube-proxy in ipvs mode

Currently, local-up scripts, GCE scripts and kubeadm support switching to IPVS proxy mode by exporting environment variables or specifying flags.

Prerequisite

Ensure the following kernel modules required by IPVS-based kube-proxy have been compiled into the node kernel (use lsmod to check):

ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack_ipv4

Packages such as ipset should also be installed on the node before using IPVS mode.

Kube-proxy will fall back to IPTABLES mode if those requirements are not met.
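
The prerequisite check can be scripted. The sketch below parses `lsmod`-style text (module name in the first column) and reports which of the modules listed above are absent. The function name `missingModules` is hypothetical, and the sample input in `main` is made-up illustrative data, not real output from any node.

```go
package main

import (
	"fmt"
	"strings"
)

// requiredModules lists the kernel modules named in the Prerequisite section.
var requiredModules = []string{"ip_vs", "ip_vs_rr", "ip_vs_wrr", "ip_vs_sh", "nf_conntrack_ipv4"}

// missingModules parses lsmod-style output, where the module name is the
// first whitespace-separated field on each line, and returns the required
// modules that do not appear.
func missingModules(lsmodOutput string) []string {
	loaded := map[string]bool{}
	for _, line := range strings.Split(lsmodOutput, "\n") {
		fields := strings.Fields(line)
		if len(fields) > 0 {
			loaded[fields[0]] = true
		}
	}
	var missing []string
	for _, m := range requiredModules {
		if !loaded[m] {
			missing = append(missing, m)
		}
	}
	return missing
}

func main() {
	// Illustrative input only; sizes and usage counts are invented.
	sample := "Module Size Used by\nip_vs 141092 6 ip_vs_rr\nip_vs_rr 12600 3\n"
	fmt.Println("missing:", missingModules(sample))
}
```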

Local UP Cluster

Kube-proxy will run in iptables mode by default in a local-up cluster.

To use IPVS mode, export the environment variable KUBE_PROXY_MODE=ipvs before starting the cluster:

# before running `hack/local-up-cluster.sh`
export KUBE_PROXY_MODE=ipvs

GCE Cluster

Similar to a local-up cluster, kube-proxy in clusters running on GCE runs in iptables mode by default. Users need to export the environment variable KUBE_PROXY_MODE=ipvs before starting a cluster:

# before running one of the commands chosen to start a cluster:
# curl -sS https://get.k8s.io | bash
# wget -q -O - https://get.k8s.io | bash
# cluster/kube-up.sh
export KUBE_PROXY_MODE=ipvs

Cluster Created by Kubeadm

Kube-proxy will run in iptables mode by default in a cluster deployed by kubeadm.

If you are using kubeadm with a configuration file, you can specify the ipvs mode by adding featureGates: SupportIPVSProxyMode=true and mode: ipvs under config below the kubeProxy field.

kind: MasterConfiguration
apiVersion: kubeadm.k8s.io/v1alpha1
...
kubeProxy:
  config:
    featureGates: SupportIPVSProxyMode=true
    mode: ipvs
...

before running

kubeadm init --config <path_to_configuration_file>

If you are using Kubernetes v1.8, you can also add the flag --feature-gates=SupportIPVSProxyMode=true (deprecated since v1.9) to the kubeadm init command

kubeadm init --feature-gates=SupportIPVSProxyMode=true

to specify the ipvs mode before deploying the cluster.

Notes

If ipvs mode is successfully enabled, you should see ipvs proxy rules (use ipvsadm) like

 # ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.0.0.1:443 rr persistent 10800
  -> 192.168.0.1:6443             Masq    1      1          0

or logs similar to the following in the kube-proxy logs (for example, /tmp/kube-proxy.log for a local-up cluster) when the local cluster is running:

Using ipvs Proxier.

If there are no ipvs proxy rules, or the following logs occur, kube-proxy has failed to use ipvs mode:

Can't use ipvs proxier, trying iptables proxier
Using iptables Proxier.

See the following section for more details on debugging.
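
The log messages quoted in these notes can be checked mechanically. As a small sketch, the Go function below scans log text for those exact messages and reports the proxy mode and whether a fallback occurred. The function name `proxyModeFromLog` is hypothetical, and real log lines may carry timestamps or other prefixes, which is why the sketch uses substring matching.

```go
package main

import (
	"fmt"
	"strings"
)

// proxyModeFromLog scans kube-proxy log text for the messages quoted in
// the notes. It returns the detected mode ("ipvs", "iptables", or
// "unknown") and whether the ipvs-to-iptables fallback message appeared.
func proxyModeFromLog(log string) (mode string, fellBack bool) {
	fellBack = strings.Contains(log, "Can't use ipvs proxier, trying iptables proxier")
	switch {
	case strings.Contains(log, "Using ipvs Proxier."):
		mode = "ipvs"
	case strings.Contains(log, "Using iptables Proxier."):
		mode = "iptables"
	default:
		mode = "unknown"
	}
	return mode, fellBack
}

func main() {
	log := "Can't use ipvs proxier, trying iptables proxier\nUsing iptables Proxier.\n"
	mode, fellBack := proxyModeFromLog(log)
	fmt.Printf("mode=%s fellBack=%v\n", mode, fellBack)
}
```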

Debug

Check IPVS proxy rules

Users can use the ipvsadm tool to check whether kube-proxy is maintaining IPVS rules correctly. For example, suppose we have the following services in the cluster:

 # kubectl get svc --all-namespaces
NAMESPACE     NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)         AGE
default       kubernetes   ClusterIP   10.0.0.1     <none>        443/TCP         1d
kube-system   kube-dns     ClusterIP   10.0.0.10    <none>        53/UDP,53/TCP   1d

We may get IPVS proxy rules like:

 # ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.0.0.1:443 rr persistent 10800
  -> 192.168.0.1:6443             Masq    1      1          0
TCP  10.0.0.10:53 rr
  -> 172.17.0.2:53                Masq    1      0          0
UDP  10.0.0.10:53 rr
  -> 172.17.0.2:53                Masq    1      0          0
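
To compare this output against the service list programmatically, the text can be parsed into virtual servers and their real servers. The Go sketch below handles only the layout shown above (a `TCP` or `UDP` line followed by `->` lines). The `virtualServer` type and `parseIpvsadm` function are hypothetical names, and other `ipvsadm` output variants are not covered.

```go
package main

import (
	"fmt"
	"strings"
)

// virtualServer is a hypothetical record for one entry in `ipvsadm -ln` output.
type virtualServer struct {
	proto       string
	addr        string
	scheduler   string
	realServers []string
}

// parseIpvsadm extracts virtual servers and their real servers from
// `ipvsadm -ln` text. Header lines are skipped because they neither start
// with TCP/UDP nor follow a virtual server line.
func parseIpvsadm(out string) []virtualServer {
	var vs []virtualServer
	for _, line := range strings.Split(out, "\n") {
		f := strings.Fields(line)
		switch {
		case len(f) >= 3 && (f[0] == "TCP" || f[0] == "UDP"):
			vs = append(vs, virtualServer{proto: f[0], addr: f[1], scheduler: f[2]})
		case len(f) >= 2 && f[0] == "->" && len(vs) > 0:
			last := &vs[len(vs)-1]
			last.realServers = append(last.realServers, f[1])
		}
	}
	return vs
}

func main() {
	out := `IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.0.0.1:443 rr persistent 10800
  -> 192.168.0.1:6443             Masq    1      1          0
TCP  10.0.0.10:53 rr
  -> 172.17.0.2:53                Masq    1      0          0
UDP  10.0.0.10:53 rr
  -> 172.17.0.2:53                Masq    1      0          0`
	for _, v := range parseIpvsadm(out) {
		fmt.Printf("%s %s (%s) -> %v\n", v.proto, v.addr, v.scheduler, v.realServers)
	}
}
```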

Why kube-proxy can't start IPVS mode

Use the following checklist to help you solve the problems:

1. Enable IPVS feature gate

For Kubernetes v1.10 and later, feature gate SupportIPVSProxyMode is set to true by default. However, you need to enable --feature-gates=SupportIPVSProxyMode=true explicitly for Kubernetes before v1.10.
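
The version rule in this step can be expressed as a small check. The sketch below is only an illustration of the rule as stated here (explicit gate needed before v1.10); the function name `needsExplicitGate` is hypothetical, and it assumes version strings of the form `vMAJOR.MINOR...`.

```go
package main

import "fmt"

// needsExplicitGate reports whether --feature-gates=SupportIPVSProxyMode=true
// must be set explicitly for the given Kubernetes version, following the
// rule stated in this checklist: versions before v1.10 need it.
// Anything after "vMAJOR.MINOR" in the string is ignored.
func needsExplicitGate(version string) (bool, error) {
	var major, minor int
	if _, err := fmt.Sscanf(version, "v%d.%d", &major, &minor); err != nil {
		return false, fmt.Errorf("cannot parse version %q: %v", version, err)
	}
	return major == 1 && minor < 10, nil
}

func main() {
	for _, v := range []string{"v1.9.3", "v1.10.0", "v1.11.0-alpha.1"} {
		need, err := needsExplicitGate(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s needs explicit feature gate: %v\n", v, need)
	}
}
```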

2. Specify proxy-mode=ipvs

Check whether the kube-proxy mode has been set to ipvs.

3. Install required kernel modules and packages

Check whether the kernel modules required by ipvs have been compiled into the kernel and the required packages are installed (see Prerequisite).

Documentation

Index

Constants

const (
	// MinIPSetCheckVersion is the min ipset version we need.  IPv6 is supported in ipset 6.x
	MinIPSetCheckVersion = "6.0"

	// KubeLoopBackIPSet is used to store endpoints dst ip:port, source ip for solving hairpin purpose.
	KubeLoopBackIPSet = "KUBE-LOOP-BACK"

	// KubeClusterIPSet is used to store service cluster ip + port for masquerade purpose.
	KubeClusterIPSet = "KUBE-CLUSTER-IP"

	// KubeExternalIPSet is used to store service external ip + port for masquerade and filter purpose.
	KubeExternalIPSet = "KUBE-EXTERNAL-IP"

	// KubeLoadBalancerSet is used to store service load balancer ingress ip + port, it is the service lb portal.
	KubeLoadBalancerSet = "KUBE-LOAD-BALANCER"

	// KubeLoadBalancerIngressLocalSet is used to store service load balancer ingress ip + port with externalTrafficPolicy=local.
	KubeLoadBalancerIngressLocalSet = "KUBE-LB-INGRESS-LOCAL"

	// KubeLoadBalancerSourceIPSet is used to store service load balancer ingress ip + port + source IP for packet filter purpose.
	KubeLoadBalancerSourceIPSet = "KUBE-LOAD-BALANCER-SOURCE-IP"

	// KubeLoadBalancerSourceCIDRSet is used to store service load balancer ingress ip + port + source cidr for packet filter purpose.
	KubeLoadBalancerSourceCIDRSet = "KUBE-LOAD-BALANCER-SOURCE-CIDR"

	// KubeNodePortSetTCP is used to store the nodeport TCP port for masquerade purpose.
	KubeNodePortSetTCP = "KUBE-NODE-PORT-TCP"

	// KubeNodePortLocalSetTCP is used to store the nodeport TCP port with externalTrafficPolicy=local.
	KubeNodePortLocalSetTCP = "KUBE-NODE-PORT-LOCAL-TCP"

	// KubeNodePortSetUDP is used to store the nodeport UDP port for masquerade purpose.
	KubeNodePortSetUDP = "KUBE-NODE-PORT-UDP"

	// KubeNodePortLocalSetUDP is used to store the nodeport UDP port with externalTrafficPolicy=local.
	KubeNodePortLocalSetUDP = "KUBE-NODE-PORT-LOCAL-UDP"
)
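
MinIPSetCheckVersion is an "X.Y" string, so checking an installed ipset version against it needs a numeric comparison rather than a string comparison (for example, "6.10" is newer than "6.9"). The following Go sketch shows one way to do that. The names `parseVersion` and `versionAtLeast` are hypothetical and are not part of this package.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minIPSetVersion mirrors the MinIPSetCheckVersion constant above.
const minIPSetVersion = "6.0"

// parseVersion splits an "X.Y" string into its numeric parts.
// Any components after the second are ignored.
func parseVersion(v string) (major, minor int, err error) {
	parts := strings.Split(strings.TrimSpace(v), ".")
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("version %q is not in X.Y form", v)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, err
	}
	if minor, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// versionAtLeast reports whether got is greater than or equal to minimum,
// comparing major first and then minor.
func versionAtLeast(got, minimum string) (bool, error) {
	gMaj, gMin, err := parseVersion(got)
	if err != nil {
		return false, err
	}
	mMaj, mMin, err := parseVersion(minimum)
	if err != nil {
		return false, err
	}
	if gMaj != mMaj {
		return gMaj > mMaj, nil
	}
	return gMin >= mMin, nil
}

func main() {
	for _, v := range []string{"6.38", "5.9", "7.1"} {
		ok, err := versionAtLeast(v, minIPSetVersion)
		fmt.Println(v, ok, err)
	}
}
```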
const (

	// KubeFireWallChain is the kubernetes firewall chain.
	KubeFireWallChain utiliptables.Chain = "KUBE-FIRE-WALL"

	// KubeMarkMasqChain is the mark-for-masquerade chain
	KubeMarkMasqChain utiliptables.Chain = "KUBE-MARK-MASQ"

	// KubeNodePortChain is the kubernetes node port chain
	KubeNodePortChain utiliptables.Chain = "KUBE-NODE-PORT"

	// KubeMarkDropChain is the mark-for-drop chain
	KubeMarkDropChain utiliptables.Chain = "KUBE-MARK-DROP"

	// KubeForwardChain is the kubernetes forward chain
	KubeForwardChain utiliptables.Chain = "KUBE-FORWARD"

	// DefaultScheduler is the default ipvs scheduler algorithm - round robin.
	DefaultScheduler = "rr"

	// DefaultDummyDevice is the default dummy interface which ipvs service address will bind to it.
	DefaultDummyDevice = "kube-ipvs0"
)
const EntryInvalidErr = "error adding entry %s to ipset %s"

EntryInvalidErr indicates if an ipset entry is invalid or not

Variables

This section is empty.

Functions

func CanUseIPVSProxier

func CanUseIPVSProxier(handle KernelHandler, ipsetver IPSetVersioner) (bool, error)

CanUseIPVSProxier returns true if we can use the ipvs Proxier. This is determined by checking if all the required kernel modules can be loaded. It may return an error if it fails to get the kernel modules information, in which case it will also return false.
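
Because CanUseIPVSProxier takes two small interfaces, its decision can be exercised with test doubles instead of a real kernel. The sketch below redeclares the two interfaces locally and implements a simplified stand-in, `canUseIPVS`, to show the shape of such a check. It is not the package's implementation: the fakes, the module list taken from the README, the choice to return an error naming a missing module, and the major-version-only ipset check are all assumptions of this example.

```go
package main

import (
	"errors"
	"fmt"
)

// Local copies of the interfaces from the signature above.
type KernelHandler interface {
	GetModules() ([]string, error)
}

type IPSetVersioner interface {
	GetVersion() (string, error)
}

// fakeKernel and fakeIPSet are hypothetical test doubles.
type fakeKernel struct {
	modules []string
	err     error
}

func (f fakeKernel) GetModules() ([]string, error) { return f.modules, f.err }

type fakeIPSet struct{ version string }

func (f fakeIPSet) GetVersion() (string, error) { return f.version, nil }

var errFake = errors.New("cannot list kernel modules")

var required = []string{"ip_vs", "ip_vs_rr", "ip_vs_wrr", "ip_vs_sh", "nf_conntrack_ipv4"}

// canUseIPVS is a simplified stand-in for CanUseIPVSProxier: it returns
// false with an error when module information is unavailable, when a
// required module is absent, or when the ipset version cannot be parsed,
// and false without an error when the ipset major version is below 6.
func canUseIPVS(h KernelHandler, v IPSetVersioner) (bool, error) {
	mods, err := h.GetModules()
	if err != nil {
		return false, fmt.Errorf("error getting kernel modules: %v", err)
	}
	have := map[string]bool{}
	for _, m := range mods {
		have[m] = true
	}
	for _, m := range required {
		if !have[m] {
			return false, fmt.Errorf("required kernel module %s is not loaded", m)
		}
	}
	ver, err := v.GetVersion()
	if err != nil {
		return false, err
	}
	var major, minor int
	if _, err := fmt.Sscanf(ver, "%d.%d", &major, &minor); err != nil {
		return false, err
	}
	return major >= 6, nil
}

func main() {
	fmt.Println(canUseIPVS(fakeKernel{modules: required}, fakeIPSet{version: "6.38"}))
	fmt.Println(canUseIPVS(fakeKernel{err: errFake}, fakeIPSet{version: "6.38"}))
}
```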

func CleanupLeftovers

func CleanupLeftovers(ipvs utilipvs.Interface, ipt utiliptables.Interface, ipset utilipset.Interface, cleanupIPVS bool) (encounteredError bool)

CleanupLeftovers cleans up all ipvs and iptables rules created by ipvs Proxier.

Types

type IPGetter

type IPGetter interface {
	NodeIPs() ([]net.IP, error)
}

IPGetter helps get node network interface IP

type IPSet added in v1.9.0

type IPSet struct {
	utilipset.IPSet
	// contains filtered or unexported fields
}

IPSet wraps util/ipset which is used by IPVS proxier.

func NewIPSet added in v1.9.0

func NewIPSet(handle utilipset.Interface, name string, setType utilipset.Type, isIPv6 bool) *IPSet

NewIPSet initializes a new IPSet struct.

type IPSetVersioner added in v1.9.0

type IPSetVersioner interface {
	// returns "X.Y"
	GetVersion() (string, error)
}

IPSetVersioner can query the current ipset version.

type KernelHandler added in v1.10.0

type KernelHandler interface {
	GetModules() ([]string, error)
}

KernelHandler can handle the current installed kernel modules.

type LinuxKernelHandler added in v1.10.0

type LinuxKernelHandler struct {
	// contains filtered or unexported fields
}

LinuxKernelHandler implements KernelHandler interface.

func NewLinuxKernelHandler added in v1.10.0

func NewLinuxKernelHandler() *LinuxKernelHandler

NewLinuxKernelHandler initializes LinuxKernelHandler with exec.

func (*LinuxKernelHandler) GetModules added in v1.10.0

func (handle *LinuxKernelHandler) GetModules() ([]string, error)

GetModules returns all installed kernel modules.

type NetLinkHandle added in v1.9.0

type NetLinkHandle interface {
	// EnsureAddressBind checks if address is bound to the interface and, if not, binds it.  If the address is already bound, return true.
	EnsureAddressBind(address, devName string) (exist bool, err error)
	// UnbindAddress unbinds the address from the interface.
	UnbindAddress(address, devName string) error
	// EnsureDummyDevice checks if the dummy device exists and, if not, creates one.  If the dummy device already exists, return true.
	EnsureDummyDevice(devName string) (exist bool, err error)
	// DeleteDummyDevice deletes the given dummy device by name.
	DeleteDummyDevice(devName string) error
	// GetLocalAddresses returns all unique local type IP addresses based on filter device interface.  If filter device is not given,
	// it will list all unique local type addresses.
	GetLocalAddresses(filterDev string) (sets.String, error)
}

NetLinkHandle is used to invoke the netlink interface.

func NewNetLinkHandle added in v1.9.0

func NewNetLinkHandle() NetLinkHandle

NewNetLinkHandle creates a new NetLinkHandle.

type Proxier

type Proxier struct {
	// contains filtered or unexported fields
}

Proxier is an ipvs based proxy for connections between a localhost:lport and services that provide the actual backends.

func NewProxier

func NewProxier(ipt utiliptables.Interface,
	ipvs utilipvs.Interface,
	ipset utilipset.Interface,
	sysctl utilsysctl.Interface,
	exec utilexec.Interface,
	syncPeriod time.Duration,
	minSyncPeriod time.Duration,
	masqueradeAll bool,
	masqueradeBit int,
	clusterCIDR string,
	hostname string,
	nodeIP net.IP,
	recorder record.EventRecorder,
	healthzServer healthcheck.HealthzUpdater,
	scheduler string,
	nodePortAddresses []string,
) (*Proxier, error)

NewProxier returns a new Proxier given an iptables and ipvs Interface instance. Because of the iptables and ipvs logic, it is assumed that there is only a single Proxier active on a machine. An error will be returned if it fails to update or acquire the initial lock. Once a proxier is created, it will keep iptables and ipvs rules up to date in the background and will not terminate if a particular iptables or ipvs call fails.

func (*Proxier) OnEndpointsAdd

func (proxier *Proxier) OnEndpointsAdd(endpoints *api.Endpoints)

OnEndpointsAdd is called whenever creation of new endpoints object is observed.

func (*Proxier) OnEndpointsDelete

func (proxier *Proxier) OnEndpointsDelete(endpoints *api.Endpoints)

OnEndpointsDelete is called whenever deletion of an existing endpoints object is observed.

func (*Proxier) OnEndpointsSynced

func (proxier *Proxier) OnEndpointsSynced()

OnEndpointsSynced is called once all the initial event handlers were called and the state is fully propagated to local cache.

func (*Proxier) OnEndpointsUpdate

func (proxier *Proxier) OnEndpointsUpdate(oldEndpoints, endpoints *api.Endpoints)

OnEndpointsUpdate is called whenever modification of an existing endpoints object is observed.

func (*Proxier) OnServiceAdd

func (proxier *Proxier) OnServiceAdd(service *api.Service)

OnServiceAdd is called whenever creation of new service object is observed.

func (*Proxier) OnServiceDelete

func (proxier *Proxier) OnServiceDelete(service *api.Service)

OnServiceDelete is called whenever deletion of an existing service object is observed.

func (*Proxier) OnServiceSynced

func (proxier *Proxier) OnServiceSynced()

OnServiceSynced is called once all the initial event handlers were called and the state is fully propagated to local cache.

func (*Proxier) OnServiceUpdate

func (proxier *Proxier) OnServiceUpdate(oldService, service *api.Service)

OnServiceUpdate is called whenever modification of an existing service object is observed.

func (*Proxier) Sync

func (proxier *Proxier) Sync()

Sync is called to synchronize the proxier state to iptables and ipvs as soon as possible.

func (*Proxier) SyncLoop

func (proxier *Proxier) SyncLoop()

SyncLoop runs periodic work. This is expected to run as a goroutine or as the main loop of the app. It does not return.
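
The OnServiceSynced and OnEndpointsSynced callbacks signal that the initial state has been delivered. One plausible way for a proxier to use them is to hold back rule synchronization until both have fired, so that rules are never written from a partial view. The Go sketch below illustrates that gating pattern only. The `gatedSyncer` type and its fields are hypothetical, and the sketch makes no claim about how Proxier is implemented internally.

```go
package main

import (
	"fmt"
	"sync"
)

// gatedSyncer is a hypothetical type that counts sync passes and skips
// them until both the service and endpoints caches report as synced.
type gatedSyncer struct {
	mu              sync.Mutex
	servicesSynced  bool
	endpointsSynced bool
	syncs           int
}

// OnServiceSynced records that the initial services have been received.
func (g *gatedSyncer) OnServiceSynced() {
	g.mu.Lock()
	g.servicesSynced = true
	g.mu.Unlock()
	g.Sync()
}

// OnEndpointsSynced records that the initial endpoints have been received.
func (g *gatedSyncer) OnEndpointsSynced() {
	g.mu.Lock()
	g.endpointsSynced = true
	g.mu.Unlock()
	g.Sync()
}

// Sync performs one sync pass, but only once both caches are ready.
func (g *gatedSyncer) Sync() {
	g.mu.Lock()
	defer g.mu.Unlock()
	if !g.servicesSynced || !g.endpointsSynced {
		return // not ready: writing rules now could drop valid services
	}
	g.syncs++
}

func main() {
	g := &gatedSyncer{}
	g.Sync()
	g.OnServiceSynced()
	fmt.Println("syncs before endpoints are ready:", g.syncs)
	g.OnEndpointsSynced()
	fmt.Println("syncs after both are ready:", g.syncs)
}
```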
