I was trying to enable the Kubernetes audit policy on my Kops cluster and ran into some problems during setup. Parts of this guide are specific to the audit policy problem, but it also works as a general guide to troubleshooting the Kubernetes masters: what to look at to figure out what the masters are doing and whether they are running correctly.
If you are troubleshooting the Kubernetes masters, you might not have
kubectl access, because the API server itself may be down. This means you will have to ssh into the Kubernetes master nodes. Yup,
back to Linux!! I’ll run you through what you should be looking for.
Here is the link to the doc: https://github.com/kubernetes/kops/blob/master/docs/cluster_spec.md#audit-logging
I copied this section into my cluster config:
spec:
  kubeAPIServer:
    auditLogPath: /var/log/kube-apiserver-audit.log
    auditLogMaxAge: 10
    auditLogMaxBackups: 1
    auditLogMaxSize: 100
    auditPolicyFile: /srv/kubernetes/audit.yaml
Then I followed the directions to load the policy file in via the fileAssets section:
spec:
  fileAssets:
  # https://github.com/kubernetes/kops/blob/master/docs/cluster_spec.md#audit-logging
  - name: apiserver-audit-policy
    path: /srv/kubernetes/audit.yaml
    roles: [Master]
    content: |
      apiVersion: audit.k8s.io/v1 # This is required.
      # apiVersion: audit.k8s.io/v1beta1
      kind: Policy
      # Don't generate audit events for all requests in RequestReceived stage.
      omitStages:
        - "RequestReceived"
      rules:
        # Log pod changes at RequestResponse level
        - level: RequestResponse
          resources:
          - group: ""
      ...
      ...
      ...
I was able to apply this, but none of the masters came back up or became reachable.
So now, it is time to ssh into the master to find out what is going on.
If you are like me, you have the entire cluster on private subnets with no public IPs. You will need a bastion host and jump from there to reach the Kube master. Here are the instructions on how to enable a bastion host in Kops: https://github.com/kubernetes/kops/blob/master/docs/bastion.md#configure-the-bastion-instance-group
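As a minimal sketch of that jump, OpenSSH clients from version 7.3 onward can hop through the bastion in a single command with the -J (ProxyJump) option. The function name, the host names, and the admin user on the bastion are assumptions for illustration; core matches the login shown in the prompts in this post, but your image may use a different user.

```shell
#!/bin/sh
# Sketch: open a shell on a private master by jumping through the bastion.
# Both arguments are placeholders; substitute your own bastion DNS name and
# the master's private IP. The "admin" bastion user is an assumption.
ssh_to_master() {
  bastion_host="$1"
  master_ip="$2"
  # -J (ProxyJump) tunnels the connection through the bastion, so the
  # private key never has to be copied onto the bastion itself.
  ssh -J "admin@${bastion_host}" "core@${master_ip}"
}
```

For example, `ssh_to_master bastion.example.com 172.16.30.135` would land on the master used in the rest of this post, assuming the bastion is reachable under that (hypothetical) name.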
Let’s take a look at what the kubelet is telling us:
core@ip-172-16-30-135 ~ $ journalctl -fu kubelet
Mar 26 02:56:56 ip-172-16-30-135.ec2.internal kubelet: I0326 02:56:56.266464 2341 kuberuntime_manager.go:757] checking backoff for container "kube-apiserver" in pod "kube-apiserver-ip-172-16-30-135.ec2.internal_kube-system(9725139d01a4b4c33809817a7f87b185)"
Mar 26 02:56:56 ip-172-16-30-135.ec2.internal kubelet: I0326 02:56:56.266637 2341 kuberuntime_manager.go:767] Back-off 2m40s restarting failed container=kube-apiserver pod=kube-apiserver-ip-172-16-30-135.ec2.internal_kube-system(9725139d01a4b4c33809817a7f87b185)
Mar 26 02:56:56 ip-172-16-30-135.ec2.internal kubelet: E0326 02:56:56.266668 2341 pod_workers.go:186] Error syncing pod 9725139d01a4b4c33809817a7f87b185 ("kube-apiserver-ip-172-16-30-135.ec2.internal_kube-system(9725139d01a4b4c33809817a7f87b185)"), skipping: failed to "StartContainer" for "kube-apiserver" with CrashLoopBackOff: "Back-off 2m40s restarting failed container=kube-apiserver pod=kube-apiserver-ip-172-16-30-135.ec2.internal_kube-system(9725139d01a4b4c33809817a7f87b185)"
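Because the kubelet stream is noisy, it can help to boil it down to the containers that are being backed off. The following is only a sketch: crashing_containers is a hypothetical helper name, and it assumes the log lines carry the same "Back-off ... container=NAME" wording seen above.

```shell
#!/bin/sh
# Sketch: reduce kubelet log lines (read from stdin) to the unique names of
# containers that kubelet reports as being in restart back-off.
crashing_containers() {
  grep 'Back-off' \
    | grep -o 'container=[^ ]*' \
    | sed 's/^container=//' \
    | sort -u
}
```

A possible invocation is `journalctl -u kubelet --no-pager --since "10 min ago" | crashing_containers`, which reads a bounded window of logs instead of following them forever.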
This is a lot of output, but the kubelet is telling us that the kube-apiserver
pod is crashing (CrashLoopBackOff). Let’s check the logs of this container by first listing the containers
running on this system:
core@ip-172-16-30-135 ~ $ docker ps -a | grep api
add61058922f d82b2643a56a "/bin/sh -c 'mkfifo …" 3 minutes ago Exited (1) 3 minutes ago k8s_kube-apiserver_kube-apiserver-ip-172-16-30-135.ec2.internal_kube-system_9725139d01a4b4c33809817a7f87b185_6
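If you only want the ID and status of the newest kube-apiserver container rather than the whole table, docker ps can filter and format its own output. This is a sketch with a hypothetical function name; it relies on docker ps listing the most recently created container first, which is worth verifying on your Docker version.

```shell
#!/bin/sh
# Sketch: print the ID and status of the most recently created
# kube-apiserver container, including ones that have already exited.
latest_apiserver_container() {
  # --filter name=... matches on a substring of the container name;
  # --format trims the table down to the two columns we care about.
  docker ps -a \
    --filter 'name=k8s_kube-apiserver' \
    --format '{{.ID}} {{.Status}}' \
    | head -n 1
}
```

The first field of the result is the container ID to pass to docker logs.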
The docker ps output tells us that the container exited with status 1. For a Docker container, that means its main process (PID 1) terminated with an error instead of staying up as it should. Let’s take a look at the logs:
core@ip-172-16-30-135 ~ $ docker logs add61058922f
...
...
I0326 03:02:46.509849 1 server.go:145] Version: v1.11.7
Error: loading audit policy file: failed decoding file "/srv/kubernetes/audit.yaml": no kind "Policy" is registered for version "audit.k8s.io/v1"
Usage:
  kube-apiserver [flags]
Flags:
...
...
I’ve omitted a bunch of the logs and am just showing what is pertinent here. It looks like
the Kube API server tried to load my audit.yaml file but failed on:
no kind "Policy" is registered for version "audit.k8s.io/v1"
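Since the API server prints its full usage text after a fatal error, the one useful line is easy to lose. A small filter like this sketch can pull it out; first_error_line is a hypothetical name, and it assumes the fatal message starts a line with "Error:" as it does in the output above.

```shell
#!/bin/sh
# Sketch: read container logs on stdin and print the first line that the
# process reported as a fatal "Error:".
first_error_line() {
  grep '^Error:' | head -n 1
}
```

For the container in this post, that would be `docker logs add61058922f 2>&1 | first_error_line`.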
Let’s Google this!
First page I got was: https://stackoverflow.com/questions/54238430/cant-create-policy-no-matches-for-kind-policy
Same problem!! Yay!
No solved answer =(
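However, one person did say he fixed it by changing the apiVersion of the policy.
Let’s give it a try!
Edit the file /srv/kubernetes/audit.yaml
and change the apiVersion from audit.k8s.io/v1 to audit.k8s.io/v1beta1 (the commented-out line in the policy above).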
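The apiVersion edit can also be scripted. The sketch below makes two assumptions that are not from the original steps: that audit.k8s.io/v1 is only served by Kubernetes 1.12 and later (which would explain why v1.11.7 rejects it), and that the policy file has a single uncommented apiVersion line. Both function names are hypothetical.

```shell
#!/bin/sh
# Sketch: choose an audit policy apiVersion for a Kubernetes version string
# such as "v1.11.7". Assumption: audit.k8s.io/v1 is served from 1.12 onward.
audit_api_version_for() {
  version="${1#v}"          # strip a leading "v"
  major="${version%%.*}"    # text before the first dot
  rest="${version#*.}"
  minor="${rest%%.*}"       # text between the first and second dot
  if [ "$major" -gt 1 ] || [ "$minor" -ge 12 ]; then
    echo "audit.k8s.io/v1"
  else
    echo "audit.k8s.io/v1beta1"
  fi
}

# Rewrite the active (uncommented) apiVersion line of a policy file in place,
# keeping a .bak copy and any trailing comment on that line.
set_audit_api_version() {
  policy_file="$1"
  api_version="$2"
  sed -E -i.bak \
    "s|^apiVersion:[[:space:]]*audit\.k8s\.io/v1[a-z0-9]*([[:space:]].*)?\$|apiVersion: ${api_version}\1|" \
    "$policy_file"
}
```

From a root shell on the master this could be run as `set_audit_api_version /srv/kubernetes/audit.yaml "$(audit_api_version_for v1.11.7)"`. Editing the file on the node is most likely a temporary fix; the same change presumably belongs in the fileAssets content of the cluster spec so that replacement masters pick it up.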
After the change, just wait a minute or so. The kubelet keeps retrying the crashed
pod, so it will restart the container with the new file on its own.
core@ip-172-16-30-135 ~ $ docker ps | grep api
8215a06872dd d82b2643a56a "/bin/sh -c 'mkfifo …" About a minute ago Up About a minute k8s_kube-apiserver_kube-apiserver-ip-172-16-30-135.ec2.internal_kube-system_9725139d01a4b4c33809817a7f87b185_8
It looks like the container has been up for a minute. That is good news. Taking
a look at the logs with docker logs shows that the server has started and is functioning.
I can now use kubectl to see if I can talk to the Kube API:
core@ip-172-16-30-135 ~ $ kubectl get nodes
NAME                            STATUS   ROLES    AGE   VERSION
ip-172-16-30-135.ec2.internal   Ready    master   55s   v1.11.7
This is looking good.
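On a cluster with more than one node, eyeballing the table gets tedious. As a sketch (not_ready_nodes is a hypothetical helper), the node list can be reduced to just the nodes that still need attention:

```shell
#!/bin/sh
# Sketch: read `kubectl get nodes --no-headers` output on stdin and print the
# names of nodes whose STATUS column is anything other than exactly "Ready".
# Note that a cordoned node ("Ready,SchedulingDisabled") is also printed.
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}
```

Used as `kubectl get nodes --no-headers | not_ready_nodes`, empty output means every node reports Ready.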
This post showed you how to fix this specific issue, but in general the workflow
is valid for other Kubernetes master issues that happen so early in the
process that the master doesn’t even start up. At that point, you will have to
ssh into the master node and start debugging from there. First look at the
kubelet logs to see if they tell you anything; from there you might have to
interact with docker directly to get more logs and details on what the problem is.