Known Issues
The otomi (diff|apply|sync|template) commands are delegated to helmfile, which in turn delegates the deployment work to helm. Because it is not always clear whether an issue comes from Helm or from Helmfile, we address them together in this section.
State drift
Problem: otomi apply does not seem to change resources.
Solution: try otomi sync
Background info:
The otomi apply command uses helmfile's apply command, which combines its diff and sync commands. So it first runs a helmfile diff against Helm's bookkeeping, which resides in versioned secrets (e.g. sh.helm.release.v1.loki.v1). This is the cheapest approach, and it does not deploy a new release version when nothing has changed. However, when you change cluster resources outside the CLI (i.e. without going through Helm), those changes are not reflected in the secrets. helmfile diff then sees no changes, so it won't execute the subsequent helmfile sync. If you wish to force the desired state onto the cluster, run otomi sync -l name=$releaseName directly. Usually you do this for a single release only, so you don't force-sync all releases, which takes a lot of time.
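To see the bookkeeping that helmfile diff compares against, you can list a release's versioned secrets yourself. The sketch below assumes Helm 3's default secret storage driver, which labels each release secret with owner=helm and name=<release>. The function names are hypothetical helpers, not part of the otomi CLI, and setting KUBECTL to another command (for example one that only prints its arguments) turns it into a dry run.

```shell
# Hypothetical helpers for inspecting Helm 3 release bookkeeping.
# KUBECTL defaults to kubectl; override it for a dry run.

# Print the secret name Helm 3 stores for a release revision,
# e.g. "loki" revision 1 -> sh.helm.release.v1.loki.v1
helm_release_secret() {
  printf 'sh.helm.release.v1.%s.v%s\n' "$1" "$2"
}

# List every stored revision secret of a release in a namespace.
# Helm 3 labels these secrets with owner=helm and name=<release>.
list_release_secrets() {
  ns="$1"
  release="$2"
  ${KUBECTL:-kubectl} -n "$ns" get secrets -l "owner=helm,name=$release"
}
```

For example, list_release_secrets $NS loki shows the stored revisions of loki; if the cluster has drifted from what the latest one describes, otomi sync -l name=loki pushes the desired state again.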
Deployment errors/problems
Helmfile uses Helm 3 under the hood, and it will throw errors in certain situations:
1. It can't deploy a resource when it already exists
When a resource already exists but was not deployed by the chart before (it is alien to Helm), you can 'adopt' it beforehand by annotating and labeling it correctly:
kubectl -n $NS annotate --overwrite $KIND $NAME meta.helm.sh/release-name=$RELEASE
kubectl -n $NS annotate --overwrite $KIND $NAME meta.helm.sh/release-namespace=$NAMESPACE
kubectl -n $NS label --overwrite $KIND $NAME app.kubernetes.io/managed-by=Helm
This functionality exists in the stack in bin/upgrades/adopt-by-helm.sh, and is used in the upgrade scripts.
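The three adoption commands can be wrapped in one function. This is a minimal sketch, not the contents of bin/upgrades/adopt-by-helm.sh; the function name adopt_by_helm and its argument order are hypothetical. It stops at the first failing command, and KUBECTL can be overridden for a dry run.

```shell
# Hypothetical helper: let Helm adopt an existing resource.
# Usage: adopt_by_helm KIND NAME NAMESPACE RELEASE RELEASE_NAMESPACE
# KUBECTL defaults to kubectl; override it for a dry run.
adopt_by_helm() {
  kind="$1"
  name="$2"
  ns="$3"
  release="$4"
  release_ns="$5"
  kctl="${KUBECTL:-kubectl}"
  # Helm 3 reads these annotations to decide which release owns the object.
  $kctl -n "$ns" annotate --overwrite "$kind" "$name" "meta.helm.sh/release-name=$release" || return 1
  $kctl -n "$ns" annotate --overwrite "$kind" "$name" "meta.helm.sh/release-namespace=$release_ns" || return 1
  # ...and this label to confirm that Helm manages it.
  $kctl -n "$ns" label --overwrite "$kind" "$name" "app.kubernetes.io/managed-by=Helm"
}
```

For example: adopt_by_helm configmap my-config $NS $RELEASE $NAMESPACE, where my-config is a placeholder name.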
2. Release's latest state is failed
Error: "$releaseName" has no deployed releases
This may happen when installing a chart fails (usually on the first install), which leaves the release in the 'failed' state.
Solution:
- When this was the first install: destroy the release with otomi destroy -l name=$releaseName and then apply it again with otomi apply -l name=$releaseName.
- When it was successfully deployed before: remove the latest versioned Helm secret that is causing the blockage (e.g. sh.helm.release.v1.loki.v3).
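For the second case, the blocking secret can be found and removed with kubectl. This sketch assumes Helm 3's secret storage, which also labels each revision secret with status=<state>, so failed revisions can be listed before deleting one. The function names are hypothetical; confirm the revision numbers with helm -n $NS history $releaseName before deleting anything.

```shell
# Hypothetical helpers for clearing a failed Helm 3 revision.
# KUBECTL defaults to kubectl; override it for a dry run.

# List the revision secrets of a release that Helm marked as failed.
list_failed_revisions() {
  ns="$1"
  release="$2"
  ${KUBECTL:-kubectl} -n "$ns" get secrets -l "owner=helm,name=$release,status=failed" -o name
}

# Delete one revision secret, e.g. revision 3 of loki ->
# sh.helm.release.v1.loki.v3. Only do this when an earlier revision
# was deployed successfully.
remove_revision() {
  ns="$1"
  release="$2"
  revision="$3"
  ${KUBECTL:-kubectl} -n "$ns" delete secret "sh.helm.release.v1.${release}.v${revision}"
}
```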
3. Some resources couldn't be patched
Error: UPGRADE FAILED: failed to replace object: ... field is immutable
This usually happens when a manifest changes a field that cannot be patched in place, so the object needs to be replaced. Retry the failing release with otomi apply -l name=$releaseName --extraArgs='--force=true', which does exactly that.
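If you script this retry, it is safer to add --force only when the error really is about an immutable field, since forced replacement recreates objects. A minimal sketch, assuming the error output contains "field is immutable" as shown above; the function name is hypothetical, and OTOMI can be overridden for testing.

```shell
# Hypothetical helper: apply one release, and retry with --force only
# when the failure is an immutable-field error.
# OTOMI defaults to otomi; override it for testing.
apply_with_force_fallback() {
  release="$1"
  otomi_cmd="${OTOMI:-otomi}"
  # Capture the output so the error text can be inspected.
  if out="$($otomi_cmd apply -l "name=$release" 2>&1)"; then
    printf '%s\n' "$out"
    return 0
  fi
  printf '%s\n' "$out" >&2
  case "$out" in
    *"field is immutable"*)
      echo "immutable field in $release, retrying with --force" >&2
      $otomi_cmd apply -l "name=$release" --extraArgs='--force=true'
      ;;
    *)
      # Any other failure is left for manual inspection.
      return 1
      ;;
  esac
}
```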
4. Timeout
Problem: Sometimes the CLI will time out when operating on a Google cluster.
Cause: This happens when the containerized kubectl binary tries to refresh an access token but cannot find, inside the otomi docker container, the binary that was registered to do so.
Workaround: Retry the command. Before every invocation of the containerized kubectl binary, the CLI first runs kubectl version with the local binary to trigger a token refresh, so the config it mounts is up to date.
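Because a rerun usually succeeds once the local token refresh has happened, the retry can be automated. This is a minimal, hypothetical helper (not part of the otomi CLI) that runs a command up to N times, waiting RETRY_DELAY seconds between attempts; RETRY_DELAY is an assumed variable that defaults to 5.

```shell
# Hypothetical helper: run a command up to N times until it succeeds.
# Usage: retry N command [args...]
retry() {
  attempts="$1"
  shift
  n=1
  while :; do
    "$@" && return 0
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    echo "attempt $n/$attempts failed, retrying" >&2
    n=$((n + 1))
    sleep "${RETRY_DELAY:-5}"
  done
}
```

For example: retry 3 otomi apply -l name=$releaseName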
Background:
The CLI is a docker container with all the binaries it needs to deploy to these clusters. When a command runs, the local cloud configs are mounted into the container. These configs may configure a token refresh mechanism, including the name of a binary to execute with certain parameters. Because the binary is referenced by name, we can include it in the image and make it available via the known $PATH.
However, the Google Cloud SDK breaks that approach by hard-coding the absolute path to the local gcloud binary. Sample user section from $KUBECONFIG:
- name: gke_otomi-cloud_europe-west4_otomi-gke-demo
  user:
    auth-provider:
      config:
        access-token: xxxxxxxxx
        cmd-args: config config-helper --format=json
        cmd-path: /usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/bin/gcloud
        expiry: '2020-10-29T02:15:37Z'
        expiry-key: '{.credential.token_expiry}'
        token-key: '{.credential.access_token}'
      name: gcp
Unfortunately, this will not work with containerization. We also can't predict the path to this binary on the user's host computer, so we have to hope that Google fixes this some day. It seems they are not inclined to do so: https://issuetracker.google.com/issues/171493249
Maybe they will start to see the importance of this after getting more feedback ;)
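To check whether your own kubeconfig is affected, you can list its cmd-path entries and see which of them do not exist in the current environment (for example, inside a container). This is a minimal sketch that uses plain text matching with awk rather than a YAML parser, so it assumes cmd-path lines formatted like the sample above, paths without spaces, and a single kubeconfig file (a colon-separated KUBECONFIG list is not handled). The function names are hypothetical.

```shell
# Hypothetical helpers: inspect auth-provider cmd-path entries in a kubeconfig.
# Plain text matching, not YAML parsing; expects "cmd-path: <path>" lines.

# Print every cmd-path value from the given file
# (default: $KUBECONFIG, then ~/.kube/config).
auth_cmd_paths() {
  awk '$1 == "cmd-path:" { print $2 }' "${1:-${KUBECONFIG:-$HOME/.kube/config}}"
}

# Print only the cmd-path values that are not executable here,
# i.e. the binaries a container would fail to find.
missing_auth_binaries() {
  auth_cmd_paths "$@" | while IFS= read -r bin; do
    [ -x "$bin" ] || printf '%s\n' "$bin"
  done
}
```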
5. It can't deploy when another operation is in progress
Problem: The otomi apply or otomi sync command fails with the following error:
Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress
Cause: A given Helm release is stuck in the pending-upgrade state, e.g.:
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
keycloak keycloak 3 2021-03-22 13:50:22.5069506 +0000 UTC pending-upgrade keycloak-8.2.2 10.0.0
Solution: Find the Helm release that is in the pending-upgrade state:
helm list -a -A | grep -i pending
If a Helm release is in the pending state AND has more than one revision, roll back to the previous revision:
helm -n <namespace> rollback <release-name> <previous revision number>
Otherwise, if a Helm release is in the pending state AND has only one revision, uninstall that release:
helm uninstall -n <namespace> <release-name>
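The decision above can be scripted. This sketch parses the table printed by helm list -a -A, selects rows whose STATUS starts with pending, and prints the matching rollback or uninstall command for review instead of running it. It assumes the default table layout, where UPDATED spans four whitespace-separated fields so STATUS is the 8th, and it treats REVISION minus one as the previous revision, which you should confirm with helm -n <namespace> history <release-name>. The function name is hypothetical.

```shell
# Hypothetical helper: print the fix for every release stuck in a pending state.
# HELM defaults to helm; override it (e.g. with a function printing sample output).
fix_pending_releases() {
  # Skip the header; with the default layout STATUS is field 8 because
  # UPDATED ("2021-03-22 13:50:22.5069506 +0000 UTC") spans fields 4-7.
  ${HELM:-helm} list -a -A |
    awk 'NR > 1 && $8 ~ /^pending/ { print $1, $2, $3 }' |
    while read -r name ns rev; do
      if [ "$rev" -gt 1 ]; then
        # More than one revision: roll back to the previous one.
        echo "helm -n $ns rollback $name $((rev - 1))"
      else
        # Only one revision: remove the release entirely.
        echo "helm uninstall -n $ns $name"
      fi
    done
}
```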