This is us eating our own dog food. How else are we going to find these use cases? =)
Our Prometheus alerting sent an alert to our Slack channel telling us that the deployment for one of our GCP Marketplace backend pods had been in a non-ready state for longer than an hour.
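The exact rule behind this alert isn't part of this write-up. A common way to express "non-ready for longer than an hour" with kube-state-metrics is to compare `kube_deployment_status_replicas_available` against `kube_deployment_spec_replicas` and add `for: 1h`, so the alert only fires once the condition has held continuously for an hour. The Python below is a hypothetical model of that `for` behavior, not Prometheus code: `Sample`, `non_ready_since`, and `is_firing` are illustrative names, and each sample stands in for one rule evaluation.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Optional


class Sample(NamedTuple):
    """One evaluation of a Deployment's replica counts (hypothetical shape)."""
    at: datetime
    desired: int    # plays the role of kube_deployment_spec_replicas
    available: int  # plays the role of kube_deployment_status_replicas_available


def non_ready_since(samples: List[Sample]) -> Optional[datetime]:
    """Return when the current unbroken non-ready streak began, or None if the latest sample is healthy."""
    start: Optional[datetime] = None
    for s in samples:
        if s.available < s.desired:
            if start is None:
                start = s.at  # streak begins (alert would be "pending")
        else:
            start = None      # any healthy sample resets the streak
    return start


def is_firing(samples: List[Sample], for_duration: timedelta = timedelta(hours=1)) -> bool:
    """Mimic `for: 1h`: fire only once the condition has held continuously for for_duration."""
    if not samples:
        return False
    start = non_ready_since(samples)
    if start is None:
        return False
    return samples[-1].at - start >= for_duration
```

For example, five evaluations 15 minutes apart with 0 of 1 replicas available span exactly 60 minutes, so `is_firing` returns `True`; drop the last one and the alert would still only be pending.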
From there, still in Slack, I asked the bot to list the pods.
Then I saw which pod was not ready, so I asked @k8sBot to describe that pod from the drop-down menu, and @k8sBot returned the description in Slack.
This told me that the pod was failing to pull its image, and k8sBot surfaced one specific event that gave a really good clue about what had happened:
unauthorized: incorrect username or password
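How k8sBot picks out this event isn't described here. As a rough, hypothetical sketch of the same idea, the Python below filters pod events (modeled as plain dicts with the `type`, `reason`, and `message` fields that Kubernetes Events carry) down to Warning events about image pulls, then flags messages that point at bad registry credentials. The function names, the hint strings, and the sample image name `example/backend:1.0` are all illustrative assumptions, not the bot's actual logic or the real events from this incident.

```python
from typing import Dict, List

# Hypothetical substrings that suggest a registry credential problem.
AUTH_HINTS = ("unauthorized", "incorrect username or password", "authentication required")


def pull_failure_messages(events: List[Dict[str, str]]) -> List[str]:
    """Return messages of Warning events that mention pulling an image."""
    return [
        e["message"]
        for e in events
        if e.get("type") == "Warning" and "pull" in e.get("message", "").lower()
    ]


def looks_like_bad_credentials(message: str) -> bool:
    """True if an event message contains one of the credential-related hints."""
    lowered = message.lower()
    return any(hint in lowered for hint in AUTH_HINTS)


# Illustrative events, shaped like what `kubectl describe pod` lists.
events = [
    {"type": "Normal", "reason": "Pulling", "message": 'Pulling image "example/backend:1.0"'},
    {"type": "Warning", "reason": "Failed",
     "message": 'Failed to pull image "example/backend:1.0": unauthorized: incorrect username or password'},
    {"type": "Warning", "reason": "BackOff", "message": 'Back-off pulling image "example/backend:1.0"'},
]

for msg in pull_failure_messages(events):
    print(looks_like_bad_credentials(msg), msg)
```

The generic back-off event says only that pulls keep failing; it's the `unauthorized` message that points straight at credentials, which is why that single event was the useful clue.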
That reminded me that Docker Hub had recently had one of its databases compromised and had sent out emails asking users to reset their passwords. So I did. However, that reset had downstream effects I didn't know about at the time, like this one. The final fix was to update the password used to pull these images, and we were back!
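Exactly where the pull password lived isn't stated above. Assuming it was a Kubernetes image pull secret (a Secret of type `kubernetes.io/dockerconfigjson` referenced from the pod's `imagePullSecrets`), `kubectl create secret docker-registry` is the usual way to regenerate it. The Python below is a hypothetical sketch of what that Secret contains; the secret name `regcred`, the namespace, and the credentials in the usage line are made up for illustration.

```python
import base64
import json
from typing import Any, Dict

# Registry key conventionally used for Docker Hub (kubectl's default --docker-server).
DOCKER_HUB = "https://index.docker.io/v1/"


def _b64(raw: str) -> str:
    """Base64-encode a UTF-8 string, as Secret data and the `auth` field expect."""
    return base64.b64encode(raw.encode("utf-8")).decode("ascii")


def pull_secret_manifest(name: str, namespace: str, username: str, password: str,
                         server: str = DOCKER_HUB) -> Dict[str, Any]:
    """Build a kubernetes.io/dockerconfigjson Secret holding registry credentials."""
    docker_config = {
        "auths": {
            server: {
                "username": username,
                "password": password,
                "auth": _b64(f"{username}:{password}"),  # "user:password", base64-encoded
            }
        }
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "kubernetes.io/dockerconfigjson",
        "data": {".dockerconfigjson": _b64(json.dumps(docker_config))},
    }


# Hypothetical usage: print the manifest and pipe it to `kubectl apply -f -`.
print(json.dumps(pull_secret_manifest("regcred", "default", "alice", "new-password"), indent=2))
```

Keeping the secret name unchanged means the Deployment's `imagePullSecrets` reference doesn't need to change; only the stored password does.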
GKE | Prometheus | k8sBot