Not today...

comments

Snippet

Bash base64 padding

Tagged bash

Disclaimer: I am not 100% percent sure of my formula. It works every time I use it but I may have misunderstood some concepts. Take this article with a grain of salt.

It happens that sometimes I have to process some base64 encoded strings which are not padded correctly (here is the wikipedia explanation of base64 padding). When using the base64 -d command I end up getting the following error: base64: invalid input and a non zero exit code.

Note: This only happen with the GNU version of the base64 utilitary. On BSD it seems to support invalid padding.

Since I am relying on the exit code I can’t ignore the error and need to fix the padding before processing the string. Also, I would like this fix to be rather short to insert it in a pipe flow without creating a monstruous line. After a few experimentations I came up with a satisfying solution relying on awk.

In this example I will show a real life use case I encountered. I had a JWT token stored in a file on my machine. This token come with an expiration time and I would like to check if the token has expired before requesting a new one. This will avoid going over the network if my token is still valid.

	cat my_jwt_token.txt \
		| awk -F'.' '{l=length($2)+2; print substr($2"==",1,l-l%4)}' \
		| base64 -d | jq -r '.exp' \
		| xargs test "$(date +%s)" -lt
  • The padding happens on the second line here is a quick explanation:
       ┌──── We are decoding a JWT token, base64 encoded fields are separated by a dot
       │                                    ┌─── We add two padding characters to the second field
       │                                    │     (the one we want to decode), which is the maximum possible padding
       │          ┌─ We calculate the new length with the additional padding
       │          │                         │
awk -F'.' '{l=length($2)+2; print substr($2"==",1,l-l%4)}'
  This is where we compute how much padding we need. ┘
  We calculate the modulo 4 of the string with extra padding and use it to
  truncate the length and remove the non required characters.
  The resulting string will contains the required padding characters to be base64 valid
  • The third line decode the string and we use jq to retrieve the proper field.
  • Fourth line is a test against current time to determine if we are past the expiration date.

Even if I am not sure of my formula I ran a few tests using the wiki article to validate it works with most of the cases:

decode() {
    awk -vstr="${1}" 'BEGIN {l=length(str)+2; print substr(str"==",1,l-l%4)}' \
        | base64 -d \
        | { cat; echo; }
}

decode "bGlnaHQgd29yay4"
decode "bGlnaHQgd29yaw"
decode "bGlnaHQgd29y"
decode "bGlnaHQgd28"
decode "bGlnaHQgdw"

I hope this could help someone out there, I am keeping it here for my future self to not forget about all those tricks.