My Own Kind - Build

Create a Container Image

We will create a Docker container image based on CentOS 7 to use as a base image for all our nodes later on. I originally tried CentOS 8, which worked but is not officially supported, so I changed everything to CentOS 7. The Kubernetes requirements tell us which operating systems and versions are supported, along with some very important pieces of information, including these requirements:

  • Unique hostname, MAC address, and product_uuid for every node. See here for more details.
  • Swap disabled. You MUST disable swap in order for the kubelet to work properly.

Unfortunately swap shouldn't be disabled on your laptop/desktop machine, so we work around it. On production clusters swap must be disabled, as it degrades performance when used.
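To see whether swap is active on a machine, the kernel's list of swap areas can be read directly. The sketch below is only an illustration: the function names and the optional file argument are made up here, and /proc/swaps is assumed to be available, as it is on a normal Linux host.

```shell
# List active swap areas by reading a /proc/swaps style file.
# The first line of that file is a header, so it is skipped.
# swap_devices and swap_state are hypothetical helper names.
swap_devices() {
  local swaps_file="${1:-/proc/swaps}"
  awk 'NR > 1 { print $1 }' "$swaps_file"
}

# Print "enabled" when at least one swap area is active, "disabled" otherwise.
swap_state() {
  if [ -n "$(swap_devices "$@")" ]; then
    echo enabled
  else
    echo disabled
  fi
}
```

Running swap_state with no arguments reports on the current machine. On a development laptop it will most likely print enabled, which is exactly the situation the rest of this document works around.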

Let's go ahead and write a Dockerfile that will be our base image for all the ‘nodes’ of the cluster. Copy and paste the next block of code. It will make a directory mok-centos-7 and put some files in it.

The big block of base64 encoding is the entrypoint script, taken directly from the kind source code repository, encoded so we can start off just by copy/pasting.

{
  mkdir -p mok-centos-7
  cd mok-centos-7

  # Create the Dockerfile

  cat >Dockerfile <<EnD 
FROM centos:7
ENV container docker
RUN (cd /lib/systemd/system/sysinit.target.wants/; for i in *; do [ \$i == \\
systemd-tmpfiles-setup.service ] || rm -f \$i; done); \\
rm -f /lib/systemd/system/multi-user.target.wants/*; \\
rm -f /etc/systemd/system/*.wants/*; \\
rm -f /lib/systemd/system/local-fs.target.wants/*; \\
rm -f /lib/systemd/system/sockets.target.wants/*udev*; \\
rm -f /lib/systemd/system/sockets.target.wants/*initctl*; \\
rm -f /lib/systemd/system/basic.target.wants/*; \\
rm -f /lib/systemd/system/anaconda.target.wants/*;
COPY entrypoint /usr/local/bin
VOLUME [ "/sys/fs/cgroup" ]
ENTRYPOINT ["/usr/local/bin/entrypoint"]
CMD ["/usr/sbin/init"]
STOPSIGNAL SIGRTMIN+3
EnD

  # Steal fixes from KIND. Includes things like creating a unique product_uuid
  # by bind mounting a file on top of it. These are the fixes in kind:
  # fix_kmsg
  # fix_mount
  # fix_cgroup
  # fix_machine_id
  # fix_product_name
  # fix_product_uuid  <- kubernetes expects this and the MAC address
  #                      to be unique, but it's not usually unique
  #                      for containers on a single host.
  #
  # Take a look at the file named entrypoint after running the next command.

  # The entrypoint file was encoded to base 64 and pasted here.
  # Including that file in original form would have been difficult
  # to copy/paste.
  base64 -d >entrypoint <<EnD
IyEvYmluL2Jhc2gKCiMgQ29weXJpZ2h0IDIwMTkgVGhlIEt1YmVybmV0ZXMgQXV0aG9ycy4KIwoj
IExpY2Vuc2VkIHVuZGVyIHRoZSBBcGFjaGUgTGljZW5zZSwgVmVyc2lvbiAyLjAgKHRoZSAiTGlj
ZW5zZSIpOwojIHlvdSBtYXkgbm90IHVzZSB0aGlzIGZpbGUgZXhjZXB0IGluIGNvbXBsaWFuY2Ug
d2l0aCB0aGUgTGljZW5zZS4KIyBZb3UgbWF5IG9idGFpbiBhIGNvcHkgb2YgdGhlIExpY2Vuc2Ug
YXQKIwojICAgICBodHRwOi8vd3d3LmFwYWNoZS5vcmcvbGljZW5zZXMvTElDRU5TRS0yLjAKIwoj
IFVubGVzcyByZXF1aXJlZCBieSBhcHBsaWNhYmxlIGxhdyBvciBhZ3JlZWQgdG8gaW4gd3JpdGlu
Zywgc29mdHdhcmUKIyBkaXN0cmlidXRlZCB1bmRlciB0aGUgTGljZW5zZSBpcyBkaXN0cmlidXRl
ZCBvbiBhbiAiQVMgSVMiIEJBU0lTLAojIFdJVEhPVVQgV0FSUkFOVElFUyBPUiBDT05ESVRJT05T
IE9GIEFOWSBLSU5ELCBlaXRoZXIgZXhwcmVzcyBvciBpbXBsaWVkLgojIFNlZSB0aGUgTGljZW5z
ZSBmb3IgdGhlIHNwZWNpZmljIGxhbmd1YWdlIGdvdmVybmluZyBwZXJtaXNzaW9ucyBhbmQKIyBs
aW1pdGF0aW9ucyB1bmRlciB0aGUgTGljZW5zZS4KCnNldCAtbyBlcnJleGl0CnNldCAtbyBub3Vu
c2V0CnNldCAtbyBwaXBlZmFpbAoKZml4X21vdW50KCkgewogIGVjaG8gJ0lORk86IGVuc3VyaW5n
IHdlIGNhbiBleGVjdXRlIC9iaW4vbW91bnQgZXZlbiB3aXRoIHVzZXJucy1yZW1hcCcKICAjIG5l
Y2Vzc2FyeSBvbmx5IHdoZW4gdXNlcm5zLXJlbWFwIGlzIGVuYWJsZWQgb24gdGhlIGhvc3QsIGJ1
dCBoYXJtbGVzcwogICMgVGhlIGJpbmFyeSAvYmluL21vdW50IHNob3VsZCBiZSBvd25lZCBieSBy
b290IGFuZCBoYXZlIHRoZSBzZXR1aWQgYml0CiAgY2hvd24gcm9vdDpyb290IC9iaW4vbW91bnQK
ICBjaG1vZCAtcyAvYmluL21vdW50CgogICMgVGhpcyBpcyBhIHdvcmthcm91bmQgdG8gYW4gQVVG
UyBidWcgdGhhdCBtaWdodCBjYXVzZSBgVGV4dCBmaWxlCiAgIyBidXN5YCBvbiBgbW91bnRgIGNv
bW1hbmQgYmVsb3cuIFNlZSBtb3JlIGRldGFpbHMgaW4KICAjIGh0dHBzOi8vZ2l0aHViLmNvbS9t
b2J5L21vYnkvaXNzdWVzLzk1NDcKICBzeW5jCgogIGVjaG8gJ0lORk86IHJlbW91bnRpbmcgL3N5
cyByZWFkLW9ubHknCiAgIyBzeXN0ZW1kLWluLWEtY29udGFpbmVyIHNob3VsZCBoYXZlIHJlYWQg
b25seSAvc3lzCiAgIyBodHRwczovL3d3dy5mcmVlZGVza3RvcC5vcmcvd2lraS9Tb2Z0d2FyZS9z
eXN0ZW1kL0NvbnRhaW5lckludGVyZmFjZS8KICAjIGhvd2V2ZXIsIHdlIG5lZWQgb3RoZXIgdGhp
bmdzIGZyb20gYGRvY2tlciBydW4gLS1wcml2aWxlZ2VkYCAuLi4KICAjIGFuZCB0aGlzIGZsYWcg
YWxzbyBoYXBwZW5zIHRvIG1ha2UgL3N5cyBydywgYW1vbmdzdCBvdGhlciB0aGluZ3MKICBtb3Vu
dCAtbyByZW1vdW50LHJvIC9zeXMKCiAgZWNobyAnSU5GTzogbWFraW5nIG1vdW50cyBzaGFyZWQn
CiAgIyBmb3IgbW91bnQgcHJvcGFnYXRpb24KICBtb3VudCAtLW1ha2UtcnNoYXJlZCAvCn0KCmZp
eF9jZ3JvdXAoKSB7CiAgZWNobyAnSU5GTzogZml4IGNncm91cCBtb3VudHMgZm9yIGFsbCBzdWJz
eXN0ZW1zJwogICMgRm9yIGVhY2ggY2dyb3VwIHN1YnN5c3RlbSwgRG9ja2VyIGRvZXMgYSBiaW5k
IG1vdW50IGZyb20gdGhlIGN1cnJlbnQKICAjIGNncm91cCB0byB0aGUgcm9vdCBvZiB0aGUgY2dy
b3VwIHN1YnN5c3RlbS4gRm9yIGluc3RhbmNlOgogICMgICAvc3lzL2ZzL2Nncm91cC9tZW1vcnkv
ZG9ja2VyLzxjaWQ+IC0+IC9zeXMvZnMvY2dyb3VwL21lbW9yeQogICMKICAjIFRoaXMgd2lsbCBj
b25mdXNlIEt1YmVsZXQgYW5kIGNhZHZpc29yIGFuZCB3aWxsIGR1bXAgdGhlIGZvbGxvd2luZyBl
cnJvcgogICMgbWVzc2FnZXMgaW4ga3ViZWxldCBsb2c6CiAgIyAgIGBzdW1tYXJ5X3N5c19jb250
YWluZXJzLmdvOjQ3XSBGYWlsZWQgdG8gZ2V0IHN5c3RlbSBjb250YWluZXIgc3RhdHMgZm9yICIu
Li4va3ViZWxldC5zZXJ2aWNlImAKICAjCiAgIyBUaGlzIGlzIGJlY2F1c2UgYC9wcm9jLzxwaWQ+
L2Nncm91cGAgaXMgbm90IGFmZmVjdGVkIGJ5IHRoZSBiaW5kIG1vdW50LgogICMgVGhlIGZvbGxv
d2luZyBpcyBhIHdvcmthcm91bmQgdG8gcmVjcmVhdGUgdGhlIG9yaWdpbmFsIGNncm91cAogICMg
ZW52aXJvbm1lbnQgYnkgZG9pbmcgYW5vdGhlciBiaW5kIG1vdW50IGZvciBlYWNoIHN1YnN5c3Rl
bS4KICBsb2NhbCBkb2NrZXJfY2dyb3VwX21vdW50cwogIGRvY2tlcl9jZ3JvdXBfbW91bnRzPSQo
Z3JlcCAvc3lzL2ZzL2Nncm91cCAvcHJvYy9zZWxmL21vdW50aW5mbyB8IGdyZXAgZG9ja2VyIHx8
IHRydWUpCiAgaWYgW1sgLW4gIiR7ZG9ja2VyX2Nncm91cF9tb3VudHN9IiBdXTsgdGhlbgogICAg
bG9jYWwgZG9ja2VyX2Nncm91cCBjZ3JvdXBfc3Vic3lzdGVtcyBzdWJzeXN0ZW0KICAgIGRvY2tl
cl9jZ3JvdXA9JChlY2hvICIke2RvY2tlcl9jZ3JvdXBfbW91bnRzfSIgfCBoZWFkIC1uIDEgfCBj
dXQgLWQnICcgLWYgNCkKICAgIGNncm91cF9zdWJzeXN0ZW1zPSQoZWNobyAiJHtkb2NrZXJfY2dy
b3VwX21vdW50c30iIHwgY3V0IC1kJyAnIC1mIDUpCiAgICBlY2hvICIke2Nncm91cF9zdWJzeXN0
ZW1zfSIgfAogICAgd2hpbGUgSUZTPSByZWFkIC1yIHN1YnN5c3RlbTsgZG8KICAgICAgbWtkaXIg
LXAgIiR7c3Vic3lzdGVtfSR7ZG9ja2VyX2Nncm91cH0iCiAgICAgIG1vdW50IC0tYmluZCAiJHtz
dWJzeXN0ZW19IiAiJHtzdWJzeXN0ZW19JHtkb2NrZXJfY2dyb3VwfSIKICAgIGRvbmUKICBmaQp9
CgpmaXhfbWFjaGluZV9pZCgpIHsKICAjIERlbGV0ZXMgdGhlIG1hY2hpbmUtaWQgZW1iZWRkZWQg
aW4gdGhlIG5vZGUgaW1hZ2UgYW5kIGdlbmVyYXRlcyBhIG5ldyBvbmUuCiAgIyBUaGlzIGlzIG5l
Y2Vzc2FyeSBiZWNhdXNlIGJvdGgga3ViZWxldCBhbmQgb3RoZXIgY29tcG9uZW50cyBsaWtlIHdl
YXZlIG5ldAogICMgdXNlIG1hY2hpbmUtaWQgaW50ZXJuYWxseSB0byBkaXN0aW5ndWlzaCBub2Rl
cy4KICBlY2hvICdJTkZPOiBjbGVhcmluZyBhbmQgcmVnZW5lcmF0aW5nIC9ldGMvbWFjaGluZS1p
ZCcKICBybSAtZiAvZXRjL21hY2hpbmUtaWQKICBzeXN0ZW1kLW1hY2hpbmUtaWQtc2V0dXAKfQoK
Zml4X3Byb2R1Y3RfbmFtZSgpIHsKICAjIHRoaXMgaXMgYSBzbWFsbCBmaXggdG8gaGlkZSB0aGUg
dW5kZXJseWluZyBoYXJkd2FyZSBhbmQgZml4IGlzc3VlICM0MjYKICAjIGh0dHBzOi8vZ2l0aHVi
LmNvbS9rdWJlcm5ldGVzLXNpZ3Mva2luZC9pc3N1ZXMvNDI2CiAgaWYgW1sgLWYgL3N5cy9jbGFz
cy9kbWkvaWQvcHJvZHVjdF9uYW1lIF1dOyB0aGVuCiAgICBlY2hvICdJTkZPOiBmYWtpbmcgL3N5
cy9jbGFzcy9kbWkvaWQvcHJvZHVjdF9uYW1lIHRvIGJlICJraW5kIicKICAgIG1rZGlyIC1wIC9r
aW5kCiAgICBlY2hvICdraW5kJyA+IC9raW5kL3Byb2R1Y3RfbmFtZQogICAgbW91bnQgLW8gcm8s
YmluZCAva2luZC9wcm9kdWN0X25hbWUgL3N5cy9jbGFzcy9kbWkvaWQvcHJvZHVjdF9uYW1lCiAg
ZmkKfQoKZml4X3Byb2R1Y3RfdXVpZCgpIHsKICAjIFRoZSBzeXN0ZW0gVVVJRCBpcyB1c3VhbGx5
IHJlYWQgZnJvbSBETUkgdmlhIHN5c2ZzLCB0aGUgcHJvYmxlbSBpcyB0aGF0CiAgIyBpbiB0aGUg
a2luZCBjYXNlIHRoaXMgbWVhbnMgdGhhdCBhbGwgKGNvbnRhaW5lcikgbm9kZXMgc2hhcmUgdGhl
IHNhbWUKICAjIHN5c3RlbS9wcm9kdWN0IHV1aWQsIGFzIHRoZXkgc2hhcmUgdGhlIHNhbWUgRE1J
LgogICMgTm90ZTogVGhlIFVVSUQgaXMgcmVhZCBmcm9tIERNSSwgdGhpcyB0b29sIGlzIG92ZXJ3
cml0aW5nIHRoZSBzeXNmcyBmaWxlcwogICMgd2hpY2ggc2hvdWxkIGZpeCB0aGUgYXR0YWNoZWQg
aXNzdWUsIGJ1dCB0aGlzIHdvcmthcm91bmQgZG9lcyBub3QgYWRkcmVzcwogICMgdGhlIGlzc3Vl
IGlmIGEgdG9vbCBpcyByZWFkaW5nIGRpcmVjdGx5IGZyb20gRE1JLgogICMgaHR0cHM6Ly9naXRo
dWIuY29tL2t1YmVybmV0ZXMtc2lncy9raW5kL2lzc3Vlcy8xMDI3CiAgW1sgISAtZiAva2luZC9w
cm9kdWN0X3V1aWQgXV0gJiYgY2F0IC9wcm9jL3N5cy9rZXJuZWwvcmFuZG9tL3V1aWQgPiAva2lu
ZC9wcm9kdWN0X3V1aWQKICBpZiBbWyAtZiAvc3lzL2NsYXNzL2RtaS9pZC9wcm9kdWN0X3V1aWQg
XV07IHRoZW4KICAgIGVjaG8gJ0lORk86IGZha2luZyAvc3lzL2NsYXNzL2RtaS9pZC9wcm9kdWN0
X3V1aWQgdG8gYmUgcmFuZG9tJwogICAgbW91bnQgLW8gcm8sYmluZCAva2luZC9wcm9kdWN0X3V1
aWQgL3N5cy9jbGFzcy9kbWkvaWQvcHJvZHVjdF91dWlkCiAgZmkKICBpZiBbWyAtZiAvc3lzL2Rl
dmljZXMvdmlydHVhbC9kbWkvaWQvcHJvZHVjdF91dWlkIF1dOyB0aGVuCiAgICBlY2hvICdJTkZP
OiBmYWtpbmcgL3N5cy9kZXZpY2VzL3ZpcnR1YWwvZG1pL2lkL3Byb2R1Y3RfdXVpZCBhcyB3ZWxs
JwogICAgbW91bnQgLW8gcm8sYmluZCAva2luZC9wcm9kdWN0X3V1aWQgL3N5cy9kZXZpY2VzL3Zp
cnR1YWwvZG1pL2lkL3Byb2R1Y3RfdXVpZAogIGZpCn0KCmZpeF9rbXNnKCkgewogICMgSW4gZW52
aXJvbm1lbnRzIHdoZXJlIC9kZXYva21zZyBpcyBub3QgYXZhaWxhYmxlLCB0aGUga3ViZWxldCAo
MS4xNSspIHdvbid0CiAgIyBzdGFydCBiZWNhdXNlIGl0IGNhbm5vdCBvcGVuIC9kZXYva21zZyB3
aGVuIHN0YXJ0aW5nIHRoZSBrbXNncGFyc2VyIGluIHRoZQogICMgT09NIHBhcnNlci4KICAjIFRv
IHN1cHBvcnQgdGhvc2UgZW52aXJvbm1lbnRzLCB3ZSBsaW5rIC9kZXYva21zZyB0byAvZGV2L2Nv
bnNvbGUuCiAgIyBodHRwczovL2dpdGh1Yi5jb20va3ViZXJuZXRlcy1zaWdzL2tpbmQvaXNzdWVz
LzY2MgogIGlmIFtbICEgLWUgL2Rldi9rbXNnIF1dOyB0aGVuCiAgICBpZiBbWyAtZSAvZGV2L2Nv
bnNvbGUgXV07IHRoZW4KICAgICAgZWNobyAnV0FSTjogL2Rldi9rbXNnIGRvZXMgbm90IGV4aXN0
LCBzeW1saW5raW5nIC9kZXYvY29uc29sZScgPiYyCiAgICAgIGxuIC1zIC9kZXYvY29uc29sZSAv
ZGV2L2ttc2cKICAgIGVsc2UKICAgICAgZWNobyAnV0FSTjogL2Rldi9rbXNnIGRvZXMgbm90IGV4
aXN0LCBub3IgZG9lcyAvZGV2L2NvbnNvbGUhJyA+JjIKICAgIGZpCiAgZmkKfQoKY29uZmlndXJl
X3Byb3h5KCkgewogICMgZW5zdXJlIGFsbCBwcm9jZXNzZXMgcmVjZWl2ZSB0aGUgcHJveHkgc2V0
dGluZ3MgYnkgZGVmYXVsdAogICMgaHR0cHM6Ly93d3cuZnJlZWRlc2t0b3Aub3JnL3NvZnR3YXJl
L3N5c3RlbWQvbWFuL3N5c3RlbWQtc3lzdGVtLmNvbmYuaHRtbAogIG1rZGlyIC1wIC9ldGMvc3lz
dGVtZC9zeXN0ZW0uY29uZi5kLwogIGNhdCA8PEVPRiA+L2V0Yy9zeXN0ZW1kL3N5c3RlbS5jb25m
LmQvcHJveHktZGVmYXVsdC1lbnZpcm9ubWVudC5jb25mCltNYW5hZ2VyXQpEZWZhdWx0RW52aXJv
bm1lbnQ9IkhUVFBfUFJPWFk9JHtIVFRQX1BST1hZOi19IiAiSFRUUFNfUFJPWFk9JHtIVFRQU19Q
Uk9YWTotfSIgIk5PX1BST1hZPSR7Tk9fUFJPWFk6LX0iCkVPRgp9CgojIHJ1biBwcmUtaW5pdCBm
aXh1cHMKZml4X2ttc2cKZml4X21vdW50CmZpeF9jZ3JvdXAKZml4X21hY2hpbmVfaWQKZml4X3By
b2R1Y3RfbmFtZQpmaXhfcHJvZHVjdF91dWlkCmNvbmZpZ3VyZV9wcm94eQoKIyB3ZSB3YW50IHRo
ZSBjb21tYW5kIChleHBlY3RlZCB0byBiZSBzeXN0ZW1kKSB0byBiZSBQSUQxLCBzbyBleGVjIHRv
IGl0CmV4ZWMgIiRAIgo=
EnD

  chmod +x entrypoint

  docker build --rm -t local/mok-centos-7 .
}

That's it! We now have an image suitable for running containers. I chose CentOS 7 because it includes native support for running systemd inside containers, gives example command arguments to run it, and is a fully ‘supported’ use case.
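The base64 trick used for the entrypoint file works for any script that is awkward to paste. The following is a minimal sketch of the round trip; encode_file and decode_file are hypothetical helper names, not something taken from kind.

```shell
# Encode a file as base64 text that can be pasted into a heredoc.
# encode_file and decode_file are hypothetical helper names.
encode_file() {
  base64 <"$1"
}

# Read base64 text on stdin and write the decoded bytes to the named file.
decode_file() {
  base64 -d >"$1"
}
```

For example, `encode_file entrypoint | decode_file entrypoint.copy` should produce a byte-for-byte copy, which can be confirmed with cmp. A script starting with #!/bin/bash always encodes to text starting with IyEv, which is why the block above begins that way.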

Create a Single Node Kubernetes Cluster

Start the Node container

We will start a single node named 'master-1', which is the name that will be seen in docker ps, and save its container ID in the variable id for use in the next section.

If we don't mount the /lib/modules directory then kubeadm init will fail. Kubeadm inspects the kernel configuration in /lib/modules/<KERNEL VERSION>/config (take a look - it's a text file) to see what Operating System features it can use. Don't forget, containers don't have their own kernels running, hence the bind mount.

We also set the host name otherwise we get some random ID as the host name and it's confusing when looking at the journal logs.

The container also needs to be ‘privileged’ since it does many things an application container wouldn't normally do. For example, mounting filesystems - seen above in entrypoint, running iptables for firewalling and network address translation (NAT/MASQUERADE), creating bridges and network interfaces, and starting containers.

{
  id=$(docker run \
    --privileged \
    -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
    -v /lib/modules:/lib/modules:ro \
    --tmpfs /run \
    --tmpfs /tmp \
    --name master-1 \
    --hostname master-1 \
    --detach \
    local/mok-centos-7)
}
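To get a feel for the kernel configuration file that kubeadm inspects, an option can be looked up by hand. This is a sketch only: kernel_option is a hypothetical helper, CONFIG_NAMESPACES is just an example option, and it says nothing about how kubeadm itself reads the file.

```shell
# Report how a kernel option is set in a kernel config file.
# kernel_option is a hypothetical helper; by default it reads the config of
# the running kernel from the bind-mounted /lib/modules directory.
kernel_option() {
  local option="$1"
  local config="${2:-/lib/modules/$(uname -r)/config}"
  # Config lines look like CONFIG_NAMESPACES=y (built in) or =m (module).
  grep -E "^${option}=" "$config" || echo "${option} is not set"
}
```

Inside the node container, `kernel_option CONFIG_NAMESPACES` reads the same file as on the host, because the directory is bind mounted and the kernel is shared.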

Side Note: Just in case you don't know, all the cgroup and namespace features that are used to containerise an application are actually features of the Linux kernel. Many people think it's Docker that provides all that loveliness, but Docker is a wrapper around these kernel features. Before Docker the two leading projects were libvirt, which was used to control KVM but whose libvirt-lxc driver did the container part, and lxc, before it moved to Ubuntu. These projects put a great deal of work into the user-space side of these kernel features and were very active contributors to the Linux kernel. OpenVZ was there before everyone else though, with huge patches to the Linux kernel that could not be merged, and it also contributed many smaller patches to the Linux kernel. OpenVZ is still more secure than Linux kernel containerisation and remains the most secure and performant way to provide general purpose Linux containers to users, when used for hosting for instance.

Log in to the CentOS 7 container

We exec into the container to run all the remaining commands.

docker exec -ti $id bash

Correctly Set Iptables Bridge Routing

It is a requirement that the Linux node’s iptables correctly sees bridged traffic, so ensure that net.bridge.bridge-nf-call-iptables is set to 1 in the sysctl config.

{
  cat >/etc/sysctl.d/k8s.conf <<EnD
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EnD
  sysctl --system
}
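The settings can be read back from /proc/sys to confirm they took effect. The helper below is a hedged sketch: bridge_nf_settings is a made-up name, and the optional argument only exists so the function can be pointed at a different directory.

```shell
# Print the current value of the two bridge netfilter settings.
# bridge_nf_settings is a hypothetical helper; the optional argument lets it
# read from a directory other than /proc/sys.
bridge_nf_settings() {
  local root="${1:-/proc/sys}"
  local name
  for name in bridge-nf-call-iptables bridge-nf-call-ip6tables; do
    if [ -r "$root/net/bridge/$name" ]; then
      echo "net.bridge.$name = $(cat "$root/net/bridge/$name")"
    else
      # The files only exist once the br_netfilter kernel module is loaded.
      echo "net.bridge.$name is unavailable"
    fi
  done
}
```

If either line reports unavailable, the br_netfilter module is probably not loaded on the host.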

Get the Kubernetes CNI plugins

The easy way, from kubernetes-the-hard-way/12-configure-pod-networking.md at mmumshad:

{
  yum -y install wget
  mkdir -p /opt/cni/bin

  # CNI Plugins
  wget https://github.com/containernetworking/plugins/releases/download/v0.7.5/cni-plugins-amd64-v0.7.5.tgz

  # Extract to /opt/cni/bin
  tar -xzvf cni-plugins-amd64-v0.7.5.tgz --directory /opt/cni/bin/
}

The production ready way, according to github.com/cri-o/.../cni/README.md. This installs more plugins than the easier method above. Choose either one, but not both.

{
  yum -y install git golang
  cd
  git clone https://github.com/containernetworking/plugins
  cd plugins
  git checkout v0.8.1

  ./build_linux.sh # or build_windows.sh

  mkdir -p /opt/cni/bin
  cp bin/* /opt/cni/bin/
}
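Whichever method is used, it is worth checking that the plugins needed later are present. The sketch below assumes that bridge, host-local and loopback are the ones that matter, since those are the types named in the CRI-O CNI configuration files further down; missing_cni_plugins is a hypothetical helper.

```shell
# List plugins from a wanted set that are missing from a CNI bin directory.
# missing_cni_plugins is a hypothetical helper name.
missing_cni_plugins() {
  local dir="${1:-/opt/cni/bin}"
  local plugin
  # bridge and host-local are named in the crio bridge config; loopback is
  # used by the loopback config.
  for plugin in bridge host-local loopback; do
    [ -x "$dir/$plugin" ] || echo "$plugin"
  done
}
```

No output means all three plugins were found and are executable.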

Install CRI-O

For more information about crio see cri-o on GitHub and github.com/containernetworking/plugins.

{
  # The crio docs tell us that this is the way to do it
  CRIO_VERSION=1.17 # <- for kubernetes 1.17 - crio 1.18 is not available yet
  curl -L -o /etc/yum.repos.d/devel:kubic:libcontainers:stable.repo https://download.opensuse.org/repositories/devel:kubic:libcontainers:stable/CentOS_7/devel:kubic:libcontainers:stable.repo
  curl -L -o /etc/yum.repos.d/devel:kubic:libcontainers:stable:cri-o:$CRIO_VERSION.repo https://download.opensuse.org/repositories/devel:kubic:libcontainers:stable:cri-o:$CRIO_VERSION/CentOS_7/devel:kubic:libcontainers:stable:cri-o:$CRIO_VERSION.repo

  # Kubernetes needs the traffic control binary, tc.
  # You'll see an error in kubeadm later if 'tc' is not installed.
  yum -y install cri-o iptables iproute-tc

  # Comment out hard-coded path to conmon. It's already in the path so will be found
  # and the existing hard coded one is not right for our system
  sed -i 's/\(conmon = .*\)/#\1/' /etc/crio/crio.conf
  # Also change cgroup_manager to cgroupfs rather than systemd
  # I found this out because the systemd cgroup manager just didn't work
  sed -i 's/\(cgroup_manager =\).*/\1 "cgroupfs"/' /etc/crio/crio.conf

  # Write a new CNI crio bridge file without ipv6 enabled otherwise it breaks crio.
  cat >/etc/cni/net.d/100-crio-bridge.conf <<EnD
{
    "cniVersion": "0.3.1",
    "name": "crio-bridge",
    "type": "bridge",
    "bridge": "cni0",
    "isGateway": true,
    "ipMasq": true,
    "hairpinMode": true,
    "ipam": {
        "type": "host-local",
        "routes": [
            { "dst": "0.0.0.0/0" }

        ],
        "ranges": [
            [{ "subnet": "10.88.0.0/16" }]
        ]
    }
}
EnD

  # Start crio
  systemctl enable --now crio

  sleep 5

  # Check crio
  systemctl status crio
}
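The two sed edits to crio.conf can be previewed without touching the real file. This sketch reuses the same expressions as the block above but writes to stdout; preview_crio_edits is a hypothetical name.

```shell
# Apply the two crio.conf edits to stdin and write the result to stdout,
# leaving the real file untouched. preview_crio_edits is a hypothetical name.
preview_crio_edits() {
  sed -e 's/\(conmon = .*\)/#\1/' \
      -e 's/\(cgroup_manager =\).*/\1 "cgroupfs"/'
}
```

For example, `preview_crio_edits </etc/crio/crio.conf | diff /etc/crio/crio.conf -` shows exactly which lines would change. Settings such as conmon_cgroup are left alone, because the first pattern requires a space straight after the word conmon.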

Install crictl, a crio control binary

I first thought this step was optional and just nice to have for checking that crio is set up correctly. It is actually REQUIRED, because kubeadm uses crictl to pull the control plane's container images.

See also: github.com/kubernetes-sigs/.../crictl.md

{
  VERSION="v1.17.0"
  curl -L https://github.com/kubernetes-sigs/cri-tools/releases/download/$VERSION/crictl-${VERSION}-linux-amd64.tar.gz --output crictl-${VERSION}-linux-amd64.tar.gz
  tar zxvf crictl-$VERSION-linux-amd64.tar.gz -C /usr/local/bin
  rm -f crictl-$VERSION-linux-amd64.tar.gz
}

Test cri-o

This test needs crictl from the previous step. If things don't work later on then come back here and try these steps again.

When using crictl there is no way to run a pod without writing two config files - and this is how kubernetes will do it, essentially - it first creates the pod (pause container) then creates the containers which share many of the same namespaces (they are linked).

Also note that the pod-config.json can only be used once, regardless of whether the pod has been deleted. To run the pod again the uid field needs to be changed, but the error messages make this clear.

{
  crictl version
  # Output should be similar to:
  #Version:  0.1.0
  #RuntimeName:  cri-o
  #RuntimeVersion:  1.17.2
  #RuntimeApiVersion:  v1alpha1

  crictl pull busybox
  # Output should be similar to:
  #Image is up to date for docker.io/library/busybox@sha256:a2490cec4...

  cat >pod-config.json <<EnD
{
    "metadata": {
        "name": "nginx-sandbox",
        "namespace": "default",
        "attempt": 1,
        "uid": "hdishd83dpaidwnduwk28bcsb"
    },
    "log_directory": "/tmp",
    "linux": {
    }
}
EnD
  cat >container-config.json <<EnD
{
  "metadata": {
      "name": "busybox"
  },
  "image":{
      "image": "busybox"
  },
  "command": [
      "top"
  ],
  "log_path":"busybox.0.log",
  "linux": {
  }
}
EnD

  id=$(crictl run container-config.json pod-config.json)

  # Check if it's running
  crictl ps

  # Exec into it then ping google, it should work
  crictl exec -it $id ping -c 5 8.8.8.8

  # And delete the container
  crictl stop $id
  crictl rm $id
}

Output from the above should look similar to:

Version:  0.1.0
RuntimeName:  cri-o
RuntimeVersion:  1.17.2
RuntimeApiVersion:  v1alpha1
Image is up to date for docker.io/library/busybox@sha256:a2490cec4484ee6c1068ba3a05f89934010c85242f736280b35343483b2264b6
CONTAINER           IMAGE               CREATED                  STATE               NAME                ATTEMPT             POD ID
71cefb4720dde       busybox             Less than a second ago   Running             busybox             0                   b31a1c63e996a
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=55 time=5.898 ms
64 bytes from 8.8.8.8: seq=1 ttl=55 time=5.888 ms
64 bytes from 8.8.8.8: seq=2 ttl=55 time=6.043 ms
64 bytes from 8.8.8.8: seq=3 ttl=55 time=5.621 ms
64 bytes from 8.8.8.8: seq=4 ttl=55 time=5.854 ms

--- 8.8.8.8 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 5.621/5.860/6.043 ms
71cefb4720dde4d92199e9855ab4eca229add95241e2558bdfdd44eb9d76d6aa
71cefb4720dde4d92199e9855ab4eca229add95241e2558bdfdd44eb9d76d6aa

Also take a look at iptables -L and iptables -L -t nat to see the rules that have been added. Without the MASQUERADE target the ping to google would not work - at least, it would get to google but the response would be dropped by the kernel when it returns.

This suggests that CRI-O uses the CNI plugins to set up networking for the containers, rather than it being kubernetes, since no kubernetes services are running yet. I incorrectly thought it would be the kube-proxy that did that.
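Since pod-config.json can only be used once per uid, repeating the test means editing the file each time. A small helper can do that; the sketch below is an assumption-laden illustration: refresh_pod_uid is a hypothetical name, it expects the "uid" key and its value on one line as in the file written earlier, and it takes a random UUID from the kernel unless one is passed in.

```shell
# Replace the uid in a pod config file so the file can be used for another
# crictl run. refresh_pod_uid is a hypothetical helper; the second argument
# is optional and defaults to a random UUID supplied by the kernel.
refresh_pod_uid() {
  local file="$1"
  local uid="${2:-$(cat /proc/sys/kernel/random/uuid)}"
  local tmp
  tmp=$(mktemp)
  # Rewrite only the value of the "uid" key and keep everything else.
  sed "s/\"uid\": *\"[^\"]*\"/\"uid\": \"${uid}\"/" "$file" >"$tmp"
  cat "$tmp" >"$file"
  rm -f "$tmp"
}
```

Calling `refresh_pod_uid pod-config.json` before each crictl run should avoid the uid error.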

Install k8s core binaries

Kubeadm does not install kubelet or kubectl. So kubeadm, kubectl and kubelet must be installed manually. See: kubernetes.io/docs/.../#installing-kubeadm-kubelet-and-kubectl.

After running the following commands the kubelet's job is started but if you check it, with systemctl status kubelet, it is in an error state. This will be fixed by kubeadm when it creates a configuration for it.

{
  cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-\$basearch
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
exclude=kubelet kubeadm kubectl
EOF

  # Set SELinux in permissive mode (effectively disabling it)
  setenforce 0
  sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config

  yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes

  systemctl enable --now kubelet
}

The pod-network-cidr is defined by whichever CNI network plugin is used. The value used below is 10.244.0.0/16, which is the one for flannel. See kubernetes.io/docs/.../#pod-network.

The commands below are an attempt to stop kubeadm and kubelet failing when they see that swap is enabled. We don't want to turn swap off on our dev laptop so we have to tell them to ignore the swap.

# There's no point running this but no harm if you do

{
  mkdir -p /var/lib/kubelet
  cat >/var/lib/kubelet/config.yaml <<EnD
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
failSwapOn: false
EnD
}

Unfortunately kubeadm will overwrite that configuration. That is actually a good thing, because the file above is missing lots of options, but it means failSwapOn: false is lost and kubeadm won't be able to finish. Kind gets around this by setting a flag on the kubelet binary, but that flag has been deprecated, so let's not use it (it's easier though!). Instead we need to run kubeadm init in 'phases'. In the next section we will do precisely that.

Install the kubernetes components with kubeadm

Run kubeadm now and it will create all of our CA and certificates and start the kube-scheduler, kube-apiserver, kube-controller-manager and etcd.

As stated in the previous section, kubeadm needs to be run in phases.

Take a look at all the phases available using kubeadm init --help and read kubeadm init for more information. Now let's install kubernetes:

{
  # Run the preflight phase
  kubeadm init \
    --ignore-preflight-errors Swap \
    phase preflight

  # Set up the kubelet
  kubeadm init phase kubelet-start

  # Edit the kubelet configuration file
  echo "failSwapOn: false" >>/var/lib/kubelet/config.yaml

  # Tell kubeadm to carry on from here
  kubeadm init \
    --pod-network-cidr=10.244.0.0/16 \
    --ignore-preflight-errors Swap \
    --skip-phases=preflight,kubelet-start
}

A sample output can be seen at Creating a single control-plane cluster with kubeadm - Kubernetes.
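Appending failSwapOn: false with echo works on a first run, but running it again leaves the key in the file twice. A more careful variant is sketched below; set_fail_swap_off is a hypothetical helper and not part of kubeadm.

```shell
# Make sure a kubelet config file contains exactly one failSwapOn: false line.
# set_fail_swap_off is a hypothetical helper, not part of kubeadm.
set_fail_swap_off() {
  local file="${1:-/var/lib/kubelet/config.yaml}"
  local tmp
  tmp=$(mktemp)
  # Drop any existing top-level failSwapOn line, then append the wanted one.
  grep -v '^failSwapOn:' "$file" >"$tmp"
  echo 'failSwapOn: false' >>"$tmp"
  # Write back in place so the file keeps its owner and permissions.
  cat "$tmp" >"$file"
  rm -f "$tmp"
}
```

It would be called in place of the echo line, between the kubelet-start phase and the final kubeadm init.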

Inspect the system

Use ps -ef or similar and crictl ps to watch the static pods start over the next few minutes. Take another look at iptables -L and iptables -L -t nat as these are now quite a bit different than earlier - this would be kube-proxy this time (would it? TODO). Watch journalctl -xef to see warnings and errors. When everything is up use kubectl to inspect kubernetes.

For example:

# See the process tree
ps axf

# What crictl thinks is running
crictl ps

# iptables are quite different now
iptables -L
iptables -L -t nat

# System logs
journalctl -xef

# Kubectl pods
kubectl get pods -n kube-system --kubeconfig /etc/kubernetes/admin.conf
# NAME                               READY   STATUS    RESTARTS   AGE
# coredns-66bff467f8-2g9q2           1/1     Running   0          6m44s
# coredns-66bff467f8-gwwm9           1/1     Running   0          6m44s
# etcd-master-1                      1/1     Running   0          7m14s
# kube-apiserver-master-1            1/1     Running   0          7m9s
# kube-controller-manager-master-1   1/1     Running   0          7m22s
# kube-proxy-vhkk9                   1/1     Running   0          6m44s
# kube-scheduler-master-1            1/1     Running   0          7m18s

# Kubectl nodes
kubectl describe nodes --kubeconfig /etc/kubernetes/admin.conf
# ...
# Taints:             node-role.kubernetes.io/master:NoSchedule
# ...
# Kernel Version:             5.5.17-200.fc31.x86_64
# OS Image:                   CentOS Linux 7 (Core)
# Operating System:           linux
# Architecture:               amd64
# Container Runtime Version:  cri-o://1.17.2
# Kubelet Version:            v1.18.2
# Kube-Proxy Version:         v1.18.2
# ...

# To make kubectl shorter at this stage use an environment variable
# For example:
export KUBECONFIG=/etc/kubernetes/admin.conf
kubectl get nodes

It's nearly all there, but we don't have networking yet.
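While waiting for the pods to come up, it can be handy to list only the ones that are not yet Running. A minimal sketch, assuming the default column layout of kubectl get pods shown above; pods_not_running is a hypothetical helper.

```shell
# Read `kubectl get pods` output on stdin and print the names of pods whose
# STATUS column is not Running. pods_not_running is a hypothetical helper.
pods_not_running() {
  awk 'NR > 1 && $3 != "Running" { print $1 }'
}
```

Usage would be along the lines of `kubectl get pods -n kube-system --kubeconfig /etc/kubernetes/admin.conf | pods_not_running`. Empty output means everything in the namespace is Running.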

Install the flannel network plugin

Install flannel exactly as described in pod-networking. Flannel is a ‘simple’ Linux bridge.

# This won't work but go ahead and try it

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/2140ac876ef134e0ed5af15c65e414cf26827915/Documentation/kube-flannel.yml

A new pod will appear named kube-flannel-blah-blah, but with extra text in place of 'blah-blah'.

If you actually ran the previous kubectl apply then everything will be broken now, but why?

Well, cri-o, like docker, starts its own bridge named cni0. The docker bridge is named docker0 and you'll see it on many kubernetes servers (and in the diagram below). The network crio creates is for a single machine. If we used the crio defined network on a few nodes then we would need to manage the subnets manually and then add routes to the other servers in the cluster. Obviously this doesn't scale well, so we need to delete the CNI configuration files included with cri-o (they are in the rpm - use rpm -ql cri-o to see them). Once they are deleted the network will start up and flannel will tell CNI which subnet crio should use. We end up with two, yes two, bridges: cni0 for internal communication on each node, probably with a /24 network assigned, and a flannel.X bridge which handles routing between nodes. Bridged networks don't need extra routes to hosts because they discover and maintain a list of ARP addresses and the bridges talk to each other to know the topology.

TODO The following image shows how this works. View the original diagram at Kubernetes Is Hard: Why EKS Makes It Easier for Network and…

Image

So, after installing cri-o, the files it creates in /etc/cni/net.d/ should be deleted. crio is fine for our single node cluster though. It's only when we have more nodes that the subnet ranges will need to be managed. For crio this would need to be done manually, but Flannel manages this for us so it's best to use that from now, since the goal is to create a kubernetes cluster later on.

Let's delete the crio configuration files and use Flannel as we will add a single extra node to this single node cluster just to test this, before going on to create more complex clusters.

# Delete crio's cni config files
rm /etc/cni/net.d/100-crio-bridge.conf
rm /etc/cni/net.d/200-loopback.conf
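Once flannel is running, the subnet it has leased for the node can be read from the file it writes. The sketch below assumes flannel's usual location, /run/flannel/subnet.env, and its FLANNEL_SUBNET key; both are assumptions about flannel's defaults rather than anything configured in this document, and flannel_subnet is a hypothetical helper.

```shell
# Print the pod subnet flannel assigned to this node.
# flannel_subnet is a hypothetical helper; the default path is an assumption
# about where flannel writes its lease information.
flannel_subnet() {
  local env_file="${1:-/run/flannel/subnet.env}"
  # Lines look like FLANNEL_SUBNET=10.244.0.1/24
  sed -n 's/^FLANNEL_SUBNET=//p' "$env_file"
}
```

On this single node cluster the value should fall inside the 10.244.0.0/16 range passed to kubeadm init.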

Remove the taint

Remove the taint so pods can run on this node.

kubectl taint nodes --all node-role.kubernetes.io/master-

That's about it for the single node cluster. We can still join nodes to this single node cluster so we'll try that in a bit, but first we will test the node we've built.

Test the Single Node Cluster with Sonobuoy

Let's first just try to run a busybox container.

# Tell kubectl which config to use
export KUBECONFIG=/etc/kubernetes/admin.conf

# run a busybox container
kubectl run -ti --rm busybox --image=busybox sh
# Try ping 8.8.8.8 then exit the container

Now let's try sonobuoy as suggested in Creating a single control-plane cluster with kubeadm - What's Next.

By default, sonobuoy run runs the Kubernetes conformance tests. Kubeadm creates conformant clusters so let's find out!

# This has support for kubernetes 1.18
curl -LO https://github.com/vmware-tanzu/sonobuoy/releases/download/v0.18.0/sonobuoy_0.18.0_linux_amd64.tar.gz
tar xvfz sonobuoy_0.18.0_linux_amd64.tar.gz

# Quick test to make sure sonobuoy works
./sonobuoy run --wait --mode quick
# INFO[0000] created object name=sonobuoy namespace= resource=namespaces
# INFO[0000] created object name=sonobuoy-serviceaccount namespace=sonobuoy resource=serviceaccounts
# INFO[0000] created object name=sonobuoy-serviceaccount-sonobuoy namespace= resource=clusterrolebindings
# INFO[0000] created object name=sonobuoy-serviceaccount-sonobuoy namespace= resource=clusterroles
# INFO[0000] created object name=sonobuoy-config-cm namespace=sonobuoy resource=configmaps
# INFO[0000] created object name=sonobuoy-plugins-cm namespace=sonobuoy resource=configmaps
# INFO[0000] created object name=sonobuoy namespace=sonobuoy resource=pods
# INFO[0000] created object name=sonobuoy-master namespace=sonobuoy resource=services

# Delete sonobuoy test files
./sonobuoy delete

# Now for a full test
./sonobuoy run --wait

# In another terminal window check the status
./sonobuoy status
#          PLUGIN     STATUS   RESULT   COUNT
#             e2e    running                1
#    systemd-logs   complete                1
# 
# Sonobuoy is still running. Runs can take up to 60 minutes.
# - time to have a cup of tea, or a full meal! This took 70 minutes
# on my old laptop.

# Retrieve results
results=$(./sonobuoy retrieve)

# Inspect results
./sonobuoy results $results

Here are my results:

./sonobuoy results $results
Plugin: e2e
Status: failed
Total: 4992
Passed: 274
Failed: 1
Skipped: 4717

Failed tests:
[sig-apps] Daemon set [Serial] should rollback without unnecessary restarts [Conformance]

Plugin: systemd-logs
Status: passed
Total: 1
Passed: 1
Failed: 0
Skipped: 0

Just one test failed out of the 275 conformance tests that ran (274 passed), so we're non-conformant. There is a note about this: Conformance test "[sig-apps] Daemon set [Serial] should rollback without unnecessary restarts [Conformance] [It]" is skipped on some clusters · Issue #69601 · kubernetes/kubernetes · GitHub. Hopefully this problem will be rectified, as the GitHub issue suggests, when another node is added, which happens next!
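The number of tests that actually ran is Total minus Skipped, which for the e2e results shown is 4992 - 4717 = 275. The arithmetic can be scripted as below; tests_run is a hypothetical helper and it assumes the results text is laid out as in the output above, with the e2e plugin listed first.

```shell
# Read `sonobuoy results` text on stdin and print how many tests ran for the
# first plugin listed, which is Total minus Skipped.
# tests_run is a hypothetical helper name.
tests_run() {
  awk '
    $1 == "Total:"   && total == ""   { total = $2 }
    $1 == "Skipped:" && skipped == "" { skipped = $2 }
    END { print total - skipped }
  '
}
```

It could be used as `./sonobuoy results $results | tests_run`.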

Add a 2nd node for fun

We need to create a second 'node' and to do that we need to go through this whole document again right up to and including Install k8s core binaries.

Let's collect all the required commands together right here instead.

Step 1: Start a new 'node' and exec in:

id=$(docker run \
  --privileged \
  -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
  -v /lib/modules:/lib/modules:ro \
  --tmpfs /run \
  --tmpfs /tmp \
  --name node-1 \
  --hostname node-1 \
  --detach \
  local/mok-centos-7)

docker exec -ti $id bash
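Since the docker run options are the same for every node, they can be wrapped in a function. This is a sketch under stated assumptions: start_node and the DRY_RUN switch are invented for illustration, and the default image name is the one built at the start of this document.

```shell
# Start a node container with the same options as the docker run command above.
# start_node and the DRY_RUN switch are hypothetical; with DRY_RUN=1 the
# docker command is printed instead of executed.
start_node() {
  local name="$1"
  local image="${2:-local/mok-centos-7}"
  # Rebuild the positional parameters as the full docker command line.
  set -- docker run \
    --privileged \
    -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
    -v /lib/modules:/lib/modules:ro \
    --tmpfs /run \
    --tmpfs /tmp \
    --name "$name" \
    --hostname "$name" \
    --detach \
    "$image"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

For example, `id=$(start_node node-2)` would start a further node and capture its container ID, while `DRY_RUN=1 start_node node-2` only prints the command.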

Step 2: Copy the next commands and paste as one big command.

{
  cat >/etc/sysctl.d/k8s.conf <<EnD
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EnD
  sysctl --system

  yum -y install git golang
  cd
  git clone https://github.com/containernetworking/plugins
  cd plugins
  git checkout v0.8.1

  ./build_linux.sh # or build_windows.sh

  mkdir -p /opt/cni/bin
  cp bin/* /opt/cni/bin/

  CRIO_VERSION=1.17
  curl -L -o /etc/yum.repos.d/devel:kubic:libcontainers:stable.repo https://download.opensuse.org/repositories/devel:kubic:libcontainers:stable/CentOS_7/devel:kubic:libcontainers:stable.repo
  curl -L -o /etc/yum.repos.d/devel:kubic:libcontainers:stable:cri-o:$CRIO_VERSION.repo https://download.opensuse.org/repositories/devel:kubic:libcontainers:stable:cri-o:$CRIO_VERSION/CentOS_7/devel:kubic:libcontainers:stable:cri-o:$CRIO_VERSION.repo

  # Kubernetes needs the traffic control binary, tc.
  yum -y install cri-o iptables iproute-tc

  # Comment out hard-coded path to conmon. It's already in the path so will be found
  # and the existing hard coded one is not right for our system
  sed -i 's/\(conmon = .*\)/#\1/' /etc/crio/crio.conf
  # Also change cgroup_manager to cgroupfs rather than systemd
  sed -i 's/\(cgroup_manager =\).*/\1 "cgroupfs"/' /etc/crio/crio.conf

  # Write a new CNI crio bridge file without ipv6 enabled as it breaks crio.
  cat >/etc/cni/net.d/100-crio-bridge.conf <<EnD
{
    "cniVersion": "0.3.1",
    "name": "crio-bridge",
    "type": "bridge",
    "bridge": "cni0",
    "isGateway": true,
    "ipMasq": true,
    "hairpinMode": true,
    "ipam": {
        "type": "host-local",
        "routes": [
            { "dst": "0.0.0.0/0" }

        ],
        "ranges": [
            [{ "subnet": "10.88.0.0/16" }]
        ]
    }
}
EnD

  # Start crio
  systemctl enable --now crio

  CRICTL_VERSION="v1.17.0"
  curl -L https://github.com/kubernetes-sigs/cri-tools/releases/download/$CRICTL_VERSION/crictl-${CRICTL_VERSION}-linux-amd64.tar.gz --output crictl-${CRICTL_VERSION}-linux-amd64.tar.gz
  tar zxvf crictl-$CRICTL_VERSION-linux-amd64.tar.gz -C /usr/local/bin
  rm -f crictl-$CRICTL_VERSION-linux-amd64.tar.gz

  # CentOS 7 uses the el7 Kubernetes packages
  cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-\$basearch
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
exclude=kubelet kubeadm kubectl
EOF

  # Set SELinux in permissive mode (effectively disabling it)
  setenforce 0
  sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config

  yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes

  systemctl enable --now kubelet

  # Delete crio's cni config files
  rm -f /etc/cni/net.d/100-crio-bridge.conf
  rm -f /etc/cni/net.d/200-loopback.conf
}

Step 3: Get required information from the Master node.

The same problem exists as before with respect to kubelet refusing to start with swap set to on, so this needs to be done in phases again.

A few pieces of information are required before starting:

  1. A token. On the MASTER node, create a token with: kubeadm token create

  2. A hash of the master's CA. Get the hash, on the MASTER node, using:

openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | \
   openssl rsa -pubin -outform der 2>/dev/null | \
   openssl dgst -sha256 -hex | sed 's/^.* //'
  3. The IP address and port of the master node.

    Get the IP address of the master node. The port is 6443.
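As an alternative to collecting the three values one by one, kubeadm token create accepts a --print-join-command flag, which prints a complete kubeadm join line containing the address, the token and the CA hash. The sketch below pulls the values out of such a line. parse_join_command is a hypothetical helper name, and the parsing assumes the usual output shape: kubeadm join HOST:PORT --token TOKEN --discovery-token-ca-cert-hash sha256:HASH.

```shell
# Hypothetical helper: extract cp_host, token and hash from one line
# as printed by `kubeadm token create --print-join-command`.
# Sets the global variables cp_host, token and hash.
parse_join_command() {
  local line=$1 word prev=""
  # Split the line on whitespace and look at each word's predecessor
  for word in $line; do
    case $prev in
      join) cp_host=${word%:*} ;;                              # drop ":6443"
      --token) token=$word ;;
      --discovery-token-ca-cert-hash) hash=${word#sha256:} ;;  # drop "sha256:"
    esac
    prev=$word
  done
}

# Usage: on the MASTER, run
#   kubeadm token create --print-join-command
# then, on the NODE, paste the printed line in as a quoted string:
#   parse_join_command "kubeadm join ..."
#   echo "$token $hash $cp_host"
```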

Step 4: Join the node to the cluster.

On the NODE, node-1, set up the variables then join the node like so:

# Example values:
#token=8ewj1p.9r9hcjoqgajrj4gi
#hash=9b5f8ef25dd472209c230823d4ff6d587896b97b4e27f1ddb599eac367fe3276
#cp_host=172.17.0.7

# Fill in your own values:
token=
hash=
cp_host=
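It's easy to paste a truncated token or hash, and kubeadm then fails with a less than obvious error. A minimal sanity check, assuming the variables are named as above: check_join_vars is a hypothetical helper, the token pattern is the kubeadm bootstrap token format (six characters, a dot, sixteen characters), and the hash is expected to be 64 hex digits without the sha256: prefix.

```shell
# Hypothetical helper: check that token, hash and cp_host look plausible.
# Returns 0 if all three pass, 1 otherwise, printing a message per problem.
check_join_vars() {
  local rc=0
  local token_re='^[a-z0-9]{6}\.[a-z0-9]{16}$'
  local hash_re='^[0-9a-f]{64}$'

  if ! [[ $token =~ $token_re ]]; then
    echo "token does not look like a bootstrap token: '$token'" >&2
    rc=1
  fi
  if ! [[ $hash =~ $hash_re ]]; then
    echo "hash is not 64 hex digits: '$hash'" >&2
    rc=1
  fi
  if [[ -z $cp_host ]]; then
    echo "cp_host is empty" >&2
    rc=1
  fi
  return $rc
}

# Usage:
#   check_join_vars && echo "variables look OK"
```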

On the MASTER, run a watch command to be able to see exactly when the new node becomes available:

bash
# Tell kubectl which config to use
export KUBECONFIG=/etc/kubernetes/admin.conf

# run the watch command (Use Control-C to exit it later)
watch kubectl get nodes -o wide 

Copy and paste this block into the new NODE, node-1:

{
# Do the preflight tests (ignoring swap error)
kubeadm join \
  phase preflight \
  --token $token \
  --discovery-token-ca-cert-hash sha256:$hash \
  --ignore-preflight-errors Swap \
  $cp_host:6443

# Set up the kubelet
kubeadm join \
  phase kubelet-start \
  --token $token \
  --discovery-token-ca-cert-hash sha256:$hash \
  $cp_host:6443 &

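# Wait until kubeadm, running in the background, has written the kubelet configuration
while true; do
  [[ -e /var/lib/kubelet/config.yaml ]] && break
  sleep 1
done

# Edit the kubelet configuration file so the kubelet will run with swap enabled
echo "failSwapOn: false" >>/var/lib/kubelet/config.yaml

# Restart the kubelet so it picks up the changed configuration
systemctl restart kubelet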
}

After a few minutes the node will appear in the MASTER window, where watch is running. Also notice that flannel is automatically installed on the new node.
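For scripting, kubectl wait can block until the node reports Ready instead of watching by eye. A small sketch to run on the MASTER, with KUBECONFIG set as above: wait_node_ready is a hypothetical helper name and the 300 second default timeout is an arbitrary choice.

```shell
# Hypothetical helper: block until the named node has condition Ready,
# or fail once the timeout (default 300s) expires.
wait_node_ready() {
  local node=$1 timeout=${2:-300s}
  kubectl wait --for=condition=Ready "node/$node" --timeout="$timeout"
}

# Usage:
#   wait_node_ready node-1 && kubectl get nodes -o wide
```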

You may have noticed that this is a bit of a hack. Running kubelet with the --fail-swap-on flag is deprecated, but the kubeadm join phases aren't granular enough to change the kubelet configuration file before the kubelet starts. I can't find the 'right way' to do this so I have hacked it - the while loop is the hack. TODO! Ask on GitHub
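One weakness of the while loop is that it spins forever if kubeadm fails before writing the configuration file. A slightly safer sketch of the same hack gives up after a timeout. wait_for_file is a hypothetical helper name and the 120 second default is an arbitrary choice.

```shell
# Hypothetical helper: poll once a second until the file exists.
# Returns 0 when the file appears, 1 if it has not appeared after
# the given number of seconds (default 120).
wait_for_file() {
  local path=$1 timeout=${2:-120} waited=0
  until [[ -e $path ]]; do
    if (( waited >= timeout )); then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Usage, in place of the while loop in the join block:
#   wait_for_file /var/lib/kubelet/config.yaml 120 ||
#     echo "kubelet config never appeared" >&2
```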

Re-run the Sonobuoy Conformance Test

Run through Test the Single Node Cluster again. This time all the tests should pass - and they did - fantastic! This is a Kubernetes-conformant installation :)

What's Next

  • Packaging.

    Now we know what's involved we'll try to make it easier to create masters and nodes so we can build, delete, test, build, delete, ...

    We'll also try to make it easy to install different versions of kubernetes so we can practice upgrades or test different versions.