Implementing Base64 Encoding in Several Languages
I love software. What I don't love is coming up with ideas for software projects. The truth is that any meaningful software problem has already been solved, at least any whose scope I'd be willing to undertake myself. Still, I take some pleasure in designing or implementing something with little to no frameworks or libraries, especially lower-level utilities that might otherwise be taken for granted. What better project, then, than implementing Base64 encoding and decoding the same way in a few languages and comparing their relative performance?
What is Base64 Encoding?
Sometimes it's useful to encode arbitrary data using only simple ASCII characters, namely alphanumerics. We can think of a standard byte, or octet, as being Base256: a single unit can be 1 of 256 possible values. The problem with representing a raw byte with an ASCII character is that many of the 256 values either aren't displayable characters or fall outside the range of alphanumerics and basic punctuation. To solve that, we can encode the data using a subset of ASCII values that are all human-readable characters. This comes at a loss of efficiency: the encoded representation of the data is larger than the data itself. The factor by which it is larger can be defined as log2(256)/log2(b), where b is the base of our encoding scheme.
A simple encoding scheme would be Base2, or "binary" colloquially.
log2(256)/log2(2) = 8/1 = 8
8 is a big factor to explode your data by.
The Base2 or "binary" representation of 79 is '01001111', which indeed uses 8 bytes to represent
the value of a single byte. While Base2 encoding is useful for visualizing what's going on at
bit-level, it's certainly not a good way to represent data compactly.
Another common scheme is Base16, or "hexadecimal" — "hex" for short.
log2(256)/log2(16) = 8/4 = 2
Not bad; the data will only be twice as large. The Base16 or "hex" representation of 79 is '4F', only two bytes to represent one, and perhaps the most visually appealing encoding scheme when analyzing raw data.
Enter Base64 encoding.
log2(256)/log2(64) = 8/6 = 1.333...
1.333 is even better than 2, but it's not an integer, so it will be much harder to "visualize" the underlying data. Furthermore, there will be some overhead when the number of bytes we're encoding isn't divisible by 3. So, what does Base64 mean exactly?
In Base2/binary we limited our bytes to 1 of 2 possible values, '0' or '1'.
In Base16/hex we allowed our bytes to be 1 of 16 possible values, '0-9' and 'A-F'.
In Base64, our bytes are assigned 1 of 64 (+1) possible values: 'A-Z', 'a-z', '0-9', '+', '/', and a 65th character, '=', for the cases where there's a remainder in the data to be encoded.
The input data is split into chunks of 3 bytes, where each chunk will be encoded into an output chunk of 4 characters.
Within each 3-byte chunk, chars 1 and 4 each depend on only one byte, whereas chars 2 and 3 each depend on portions of two bytes. Because we must store each char in memory as a byte, our encoded output will be 4 bytes for every 3 input bytes, confirming our data-exploding factor of 1.333.
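To make that concrete, here's the classic worked example of encoding the three ASCII bytes "Man": the 24 input bits are regrouped into four 6-bit values, each of which indexes into the table of 64 characters.

```
Input bytes:   M        a        n
Bits:          01001101 01100001 01101110
6-bit groups:  010011 | 010110 | 000101 | 101110
Indices:       19       22       5        46
Characters:    T        W        F        u       ->  "TWFu"
```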
The Code
If you didn't already know what Base64 encoding was, hopefully you do now. Let's get into the code. I started out implementing it in Java.
Java
The index table for encoding is created in one line by converting the string to a char array.
```java
static char[] indexTable = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".toCharArray();
```
For our encode function we'll be taking in a byte array and returning a String.
```java
public static String encode(byte[] data) {...}
```
I want to allocate exactly as much memory as I'll need for the encoded output. We get the remainder with a simple modulo operation. The length of the encoded output will then be 4/3 times the input length, plus 4 more if we have a remainder. Java has StringBuffer and StringBuilder classes, which are conveniences for appending chars to a string. StringBuffer makes for a nice wrapper around a preallocated char array in this case.
```java
int datLen = data.length;
int remainder = datLen % 3;
int encLen = (datLen / 3) * 4 + (remainder != 0 ? 4 : 0);
StringBuffer buffer = new StringBuffer(encLen);
```
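As a quick sanity check on that arithmetic: a 5-byte input gives remainder = 5 % 3 = 2 and encLen = (5 / 3) * 4 + 4 = 4 + 4 = 8 with integer division, which is exactly the two output chunks we'd expect.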
Now for the loop. We'll declare our indexing variable outside of the loop so we can reuse it for the remainder. In this loop, we process each complete 3-byte input chunk, saving the remainder for later. A handful of bitwise operations on the input chunk determine the values used to index into the encoding table. In Java, >>> is the unsigned right-shift operator, whereas >> is signed.
```java
int i;
for (i = 0; i < (datLen - remainder); i += 3) {
    // Note: Java bytes are signed, so this indexing assumes input bytes
    // below 0x80 (plain ASCII); arbitrary binary data would need a
    // (b & 0xFF) mask before shifting.
    buffer.append(indexTable[  data[i  ]         >>> 2                   ]); // top 6 bits of byte 1
    buffer.append(indexTable[ (data[i  ] & 0x03) <<  4 | data[i+1] >>> 4 ]); // low 2 of byte 1, high 4 of byte 2
    buffer.append(indexTable[ (data[i+1] & 0x0F) <<  2 | data[i+2] >>> 6 ]); // low 4 of byte 2, high 2 of byte 3
    buffer.append(indexTable[  data[i+2] & 0x3F                          ]); // low 6 bits of byte 3
}
```
Because it didn't feel right for the handling of the remainder to take up as many lines as the loop, I cheesed it into two statements with some goofy char arrays.
```java
if (remainder == 1) buffer.append(new char[] {
    indexTable[ data[i] >>> 2 ], indexTable[ (data[i] & 0x03) << 4 ], '=', '='
});
if (remainder == 2) buffer.append(new char[] {
    indexTable[ data[i] >>> 2 ], indexTable[ (data[i] & 0x03) << 4 | data[i+1] >>> 4 ],
    indexTable[ (data[i+1] & 0x0F) << 2 ], '='
});
```
Now we can put it all together and return the buffer's string.
```java
public static String encode(byte[] data) {
    int datLen = data.length;
    int remainder = datLen % 3;
    int encLen = (datLen / 3) * 4 + (remainder != 0 ? 4 : 0);
    StringBuffer buffer = new StringBuffer(encLen);
    int i;
    for (i = 0; i < (datLen - remainder); i += 3) {
        buffer.append(indexTable[  data[i  ]         >>> 2                   ]);
        buffer.append(indexTable[ (data[i  ] & 0x03) <<  4 | data[i+1] >>> 4 ]);
        buffer.append(indexTable[ (data[i+1] & 0x0F) <<  2 | data[i+2] >>> 6 ]);
        buffer.append(indexTable[  data[i+2] & 0x3F                          ]);
    }
    if (remainder == 1) buffer.append(new char[] {
        indexTable[ data[i] >>> 2 ], indexTable[ (data[i] & 0x03) << 4 ], '=', '='
    });
    if (remainder == 2) buffer.append(new char[] {
        indexTable[ data[i] >>> 2 ], indexTable[ (data[i] & 0x03) << 4 | data[i+1] >>> 4 ],
        indexTable[ (data[i+1] & 0x0F) << 2 ], '='
    });
    return buffer.toString();
}
```
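A quick sketch of how you might exercise it (hypothetical driver code, not from the repository); the expected outputs are the standard Base64 encodings:

```java
public static void main(String[] args) {
    System.out.println(encode("Man".getBytes())); // TWFu (no remainder)
    System.out.println(encode("Ma".getBytes()));  // TWE= (remainder of 2)
    System.out.println(encode("M".getBytes()));   // TQ== (remainder of 1)
}
```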
For our decode function, we will accept a String and return a byte array.
```java
public static byte[] decode(String data) {...}
```
Java also has a ByteBuffer, a wrapper around a byte array. To determine the length of the decoded data, we'll need the value of the remainder, which can be expressed verbally as follows: is the last character in our encoded string '='? If not, there's no remainder. Otherwise, if the second-to-last character is also '=', then remainder = 1; otherwise remainder = 2. With that, the decoded length is 3/4 that of the encoded string, minus 1 or 2 if we had a remainder of 2 or 1, respectively.
```java
int encLen = data.length();
int remainder = data.charAt(encLen - 1) == '=' ? (data.charAt(encLen - 2) == '=' ? 1 : 2) : 0;
int decLen = (encLen * 3) / 4 - (3 - remainder) % 3;
ByteBuffer buffer = ByteBuffer.allocate(decLen);
```
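For example, decoding "TWE=": encLen is 4, the last character is '=' but the second-to-last isn't, so remainder = 2 and decLen = (4 * 3) / 4 - (3 - 2) % 3 = 3 - 1 = 2, the two bytes of "Ma".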
For the loop, we perform the opposite bitwise operations and use a reverse index table to convert from the ASCII value of the char to the integer value from 0-63. We exit the loop before getting into the remainder territory.
```java
int i;
for (i = 0; i < (encLen - 4); i += 4) {
    buffer.put((byte)(       (revTable[data.charAt(i  )] << 2) |        (revTable[data.charAt(i+1)] >>> 4)));
    buffer.put((byte)(0xF0 & (revTable[data.charAt(i+1)] << 4) | 0x0F & (revTable[data.charAt(i+2)] >>> 2)));
    buffer.put((byte)(0xC0 & (revTable[data.charAt(i+2)] << 6) | 0x3F & (revTable[data.charAt(i+3)])));
}
```
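The reverse table itself isn't shown in this walkthrough; here's a minimal sketch of how it could be built from the index table (my assumption, not necessarily the repository's exact code):

```java
// Hypothetical construction: map each Base64 character's ASCII code
// back to its 6-bit value (0-63). Characters outside the alphabet
// are left as 0; this sketch does no validation.
static int[] revTable = new int[128];
static {
    for (int k = 0; k < indexTable.length; k++) {
        revTable[indexTable[k]] = k;
    }
}
```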
For the last iteration, outside the loop, we simply perform the same steps, exiting early if our remainder dictates it.
```java
buffer.put((byte)((revTable[data.charAt(i)] << 2) | (revTable[data.charAt(i+1)] >>> 4)));
if (remainder == 1) return buffer.array();
buffer.put((byte)(0xF0 & (revTable[data.charAt(i+1)] << 4) | 0x0F & (revTable[data.charAt(i+2)] >>> 2)));
if (remainder == 2) return buffer.array();
buffer.put((byte)(0xC0 & (revTable[data.charAt(i+2)] << 6) | 0x3F & (revTable[data.charAt(i+3)])));
return buffer.array();
```
Putting it all together:
```java
public static byte[] decode(String data) {
    int encLen = data.length();
    int remainder = data.charAt(encLen - 1) == '=' ? (data.charAt(encLen - 2) == '=' ? 1 : 2) : 0;
    int decLen = (encLen * 3) / 4 - (3 - remainder) % 3;
    ByteBuffer buffer = ByteBuffer.allocate(decLen);
    int i;
    for (i = 0; i < (encLen - 4); i += 4) {
        buffer.put((byte)(       (revTable[data.charAt(i  )] << 2) |        (revTable[data.charAt(i+1)] >>> 4)));
        buffer.put((byte)(0xF0 & (revTable[data.charAt(i+1)] << 4) | 0x0F & (revTable[data.charAt(i+2)] >>> 2)));
        buffer.put((byte)(0xC0 & (revTable[data.charAt(i+2)] << 6) | 0x3F & (revTable[data.charAt(i+3)])));
    }
    buffer.put((byte)((revTable[data.charAt(i)] << 2) | (revTable[data.charAt(i+1)] >>> 4)));
    if (remainder == 1) return buffer.array();
    buffer.put((byte)(0xF0 & (revTable[data.charAt(i+1)] << 4) | 0x0F & (revTable[data.charAt(i+2)] >>> 2)));
    if (remainder == 2) return buffer.array();
    buffer.put((byte)(0xC0 & (revTable[data.charAt(i+2)] << 6) | 0x3F & (revTable[data.charAt(i+3)])));
    return buffer.array();
}
```
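And a round trip to confirm the two halves agree (again, hypothetical driver code):

```java
System.out.println(new String(decode("TWFu")));                       // Man
System.out.println(new String(decode(encode("hello!".getBytes())))); // hello!
```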
C
I ported the Java implementation over to C. Without the conveniences of String/ByteBuffers, I had to keep track of two indexing variables for the loops. For the encode function I unfortunately couldn't cheese the remainder bit in two lines, but I did my best.
```c
#include <stdlib.h>  /* calloc */
#include <string.h>  /* strlen, strncpy */

char* encode(char* data) {
    int datLen = strlen(data);
    int remainder = datLen % 3;
    int encLen = (datLen / 3) * 4 + (remainder ? 4 : 0);
    char* buffer = calloc(encLen + 1, sizeof(char));
    int i, j;
    for (i = j = 0; i < (datLen - remainder); i += 3) {
        /* assumes ASCII input: with signed chars, bytes >= 0x80 would
           produce negative shifts and out-of-range indices */
        buffer[j++] = indexTable[  data[i  ]         >> 2                  ];
        buffer[j++] = indexTable[ (data[i  ] & 0x03) << 4 | data[i+1] >> 4 ];
        buffer[j++] = indexTable[ (data[i+1] & 0x0F) << 2 | data[i+2] >> 6 ];
        buffer[j++] = indexTable[  data[i+2] & 0x3F                        ];
    }
    if (remainder) {
        strncpy(buffer + j + 2, "==", 2);  /* write both pads up front;
                                              remainder == 2 overwrites one below */
        buffer[j++] = indexTable[  data[i] >> 2         ];
        buffer[j]   = indexTable[ (data[i] & 0x03) << 4 ];
        if (remainder == 2) {
            buffer[j++] = indexTable[ (data[i  ] & 0x03) << 4 | data[i+1] >> 4 ];
            buffer[j++] = indexTable[ (data[i+1] & 0x0F) << 2                  ];
        }
    }
    return buffer;
}
```
C#
The C# port looked almost identical to Java. I used a StringBuilder for encoding and a byte array for decoding. I had tried implementing decoding with MemoryStream, C#'s equivalent of Java's ByteBuffer, but it ran about 60% slower than the byte array, so I stuck with the byte array.
Python
Python looked a little closer to C since I had to track two indexing variables and couldn't cheese the remainder portion of the encode function. I had to use while loops for encoding and decoding as well since Python doesn't offer the traditional 3-clause for loop.
```python
i = 0
j = 0
while i < (dat_len - remainder):
    buffer[j    ] = index_table[  data[i  ]         >> 2                  ]
    buffer[j + 1] = index_table[ (data[i  ] & 0x03) << 4 | data[i+1] >> 4 ]
    buffer[j + 2] = index_table[ (data[i+1] & 0x0F) << 2 | data[i+2] >> 6 ]
    buffer[j + 3] = index_table[  data[i+2] & 0x3F                        ]
    i += 3
    j += 4
```
Javascript
While the decode function of my JS implementation was pretty similar to all the other languages, the encode function was quite a bit different. I didn't need to calculate a remainder or allocate a string. Instead, I checked the length of what Buffer.slice() returned in the condition of a while loop and used simple string concatenation to build the encoded value.
```javascript
const encode = (data) => {
  data = Buffer.from(data, 'utf8')
  let enc = ''
  let i = 0
  let chunk
  while ((chunk = data.slice(i, i + 3)).length == 3) {
    enc += index[ chunk[0] >>> 2]
         + index[(chunk[0] & 0x03) << 4 | chunk[1] >>> 4]
         + index[(chunk[1] & 0x0F) << 2 | chunk[2] >>> 6]
         + index[ chunk[2] & 0x3F]
    i += 3
  }
  if (chunk.length) {
    // check chunk.length rather than chunk[1] truthiness, so a zero byte
    // in the input isn't mistaken for a missing byte
    enc += index[chunk[0] >>> 2]
         + index[(chunk[0] & 0x03) << 4 | chunk[1] >>> 4]
         + (chunk.length > 1 ? index[(chunk[1] & 0x0F) << 2] : '=')
         + '='
  }
  return enc
}
```
Elixir
The Elixir implementation is quite a bit different from any of the other languages. The bitwise operations for the encode/decode functions are the same, which was the bare minimum I wanted all of my implementations to comply with. Believe it or not, Elixir does not have a simple array. It has (linked) lists, maps, and contiguous-in-memory tuples. I tried maps, tuples, and even strings for my index table(s), but they were all too slow. I ended up using a function that leverages Elixir's neat cond blocks. I also didn't need to care about remainders or allocating buffers; function pattern matching, bit-string concatenation, and recursion would have to suffice.
```elixir
def encode(data), do: encode(<<>>, data)

defp encode(h, <<>>), do: h
defp encode(h, <<a::8>>), do: h <> indx(a >>> 2) <> indx((a &&& 0x03) <<< 4) <> "=="
defp encode(h, <<a::8, b::8>>), do: h <> indx(a >>> 2) <> indx((a &&& 0x03) <<< 4 ||| b >>> 4) <> indx((b &&& 0x0F) <<< 2) <> "="

defp encode(h, <<a::8, b::8, c::8, t::binary>>) do
  h <> indx(a >>> 2) <>
       indx((a &&& 0x03) <<< 4 ||| b >>> 4) <>
       indx((b &&& 0x0F) <<< 2 ||| c >>> 6) <>
       indx(c &&& 0x3F)
  |> encode(t)
end
```
```elixir
def decode(data), do: decode(<<>>, data)

defp decode(h, <<>>), do: h
defp decode(h, <<a, b, ?=, ?=>>), do: h <> <<rev(a) <<< 2 ||| rev(b) >>> 4>>
defp decode(h, <<a, b, c, ?=>>), do: h <> <<rev(a) <<< 2 ||| rev(b) >>> 4, rev(b) <<< 4 &&& 0xF0 ||| rev(c) >>> 2 &&& 0x0F>>

defp decode(h, <<a, b, c, d, t::binary>>) do
  h <> <<rev(a) <<< 2 ||| rev(b) >>> 4,
         rev(b) <<< 4 &&& 0xF0 ||| rev(c) >>> 2 &&& 0x0F,
         rev(c) <<< 6 &&& 0xC0 ||| rev(d) &&& 0x3F>>
  |> decode(t)
end
```
Nim and Crystal
I've never programmed in Nim or Crystal, but my friend Ben Haney kindly contributed implementations in both languages. The Nim port is almost verbatim my Python; he simply copied it and changed the keywords and syntax around slightly to those of Nim. Nim compiles to C/C++ or Javascript and has a Python-like syntax. Very cool. Crystal is essentially statically type-checked, compiled Ruby. For the Crystal implementation, Ben opted not to special-case remainders, which gives two very short, clean functions. I think checking for the remainder condition on every iteration of the encode loop caused the performance to suffer. Also, I think the decode function pads the decoded buffer with 1 or 2 zero bytes in the event of a remainder, which might slightly corrupt your data if it wasn't a null-terminated string. Regardless, the functions are slick:
```crystal
def encode(data)
  String.build(data.size * 2) do |s|
    data.each_slice(3) do |d|
      s << (                   E_TBL[  (d[0] >> 2)                               ])
      s << (                   E_TBL[ ((d[0] & 0x03) << 4) | ((d[1]? || 0) >> 4) ])
      s << (d[1]?.nil? ? '=' : E_TBL[ ((d[1] & 0x0F) << 2) | ((d[2]? || 0) >> 6) ])
      s << (d[2]?.nil? ? '=' : E_TBL[  (d[2] & 0x3F)                             ])
    end
  end
end

def decode(data)
  buffer = Array(UInt8).new(data.size)
  data.each_byte.each_slice(4) do |d|
    buffer.push(((D_TBL[d[0]] << 2)       ) | ((D_TBL[d[1]] >> 4)       ))
    buffer.push(((D_TBL[d[1]] << 4) & 0xF0) | ((D_TBL[d[2]] >> 2) & 0x0F))
    buffer.push(((D_TBL[d[2]] << 6) & 0xC0) | ((D_TBL[d[3]]      ) & 0x3F))
  end
  buffer
end
```
Go
After initially writing this blog post, I implemented the algorithm in Go. I thought it would be most similar to C, but Go has some conveniences like strings.Builder and bytes.Buffer, making it look closer to Java. This meant I didn't need a second indexing variable for the hot loops. Both of those structs have a Grow(int) method, so after initializing the Builder/Buffer to their respective 'zero values', their capacity can be increased to exactly the amount needed. Unfortunately, Go doesn't have a ternary operator, so things that took one line in most languages took up to eight in Go. The performance of Go was exceptional.
Benchmark
With all these lovely implementations of the same Base64 encoding and decoding algorithm, let's see how they compare. The test for each language was to encode, decode, and encode-then-decode a few human readable strings to verify correct functionality, then record the time it takes to encode-then-decode a large Lorem Ipsum paragraph one million times.
```javascript
const big_string = `Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum mi purus, mollis et pulvinar quis, elementum non felis. Ut bibendum dolor ut mauris tempus, euismod ultricies odio vulputate. Nunc finibus elit non venenatis maximus. Maecenas in mollis ipsum, mattis laoreet purus. Sed lacus purus, tempus vel elementum sed, rutrum nec massa. Mauris mattis libero vitae nunc tempor, eget posuere ipsum molestie. Curabitur semper tempus diam. Morbi rutrum sollicitudin augue, rhoncus viverra velit volutpat vitae.\r\n`
```
Here's Ben flexing with Nim's Uniform Function Call Syntax.
echo encode("AAAAAAAAAAAA") echo "AAAAAAAAAAAAA".encode echo encode "AAAAAAAAAAAAAA" echo decode(encode("123")) echo decode encode "1234" echo "12345".encode.decode echo "123456".encode().decode() echo decode "QUJDYWJjMTIzWFlaeHl6"
And here's what the one-million-paragraph benchmark looked like in each language. Remember to free your buffers in C.
C
```c
gettimeofday(&timeStart, NULL);
for (i = 0; i < 1000000; i++) {
    largeTest_enc = (char*)encode(largeData);
    largeTest_dec = (char*)decode(largeTest_enc);
    free(largeTest_enc);
    free(largeTest_dec);
}
gettimeofday(&timeStop, NULL);
```
Java
```java
long timeStart = System.nanoTime();
for (int i = 0; i < 1000000; i++)
    largeTest = decode(encode(largeData));
long timeEnd = System.nanoTime();
```
Go
```go
t0 := time.Now()
for i := 0; i < 1000000; i++ {
    decode(encodeStr(largeData))
}
t1 := time.Since(t0)
```
C#
```csharp
Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 0; i < 1000000; i++)
    largeTest = Decode(Encode(largeData));
sw.Stop();
```
Nim
```nim
let time_start = epochTime()
for _ in (1 .. 1_000_000):
  largeTest = decode(encode(largeData))
let time_stop = epochTime()
```
Crystal
```crystal
elapsed_time = Time.measure do
  test = [] of UInt8
  1_000_000.times { test = decode(encode(LARGE_DATA)) }
end
```
Javascript
```javascript
let start = Date.now()
for (let i = 0; i < 1000000; i++)
  big_res = decode(encode(big_string))
let stop = Date.now()
```
Elixir
```elixir
t0 = System.system_time(:millisecond)

Task.async_stream(1..1_000_000, fn _x ->
  @big_string |> Base64.encode |> Base64.decode
end)
|> Enum.to_list

t1 = System.system_time(:millisecond)
```
Python
```python
time_start = time.time()
for _ in range(1000000):
    largeTest = decode(encode(largeData))
time_stop = time.time()
```
Results
Language | Time (s) | Comments | Encoding Struct | Decoding Struct |
---|---|---|---|---|
C | 1.078 | Compiled with gcc -O3 | char * | char * |
Java | 3.173 | Ran in Eclipse, Java 8 | StringBuffer | ByteBuffer |
Go | 3.374 | Go version go1.12.9 | strings.Builder | bytes.Buffer |
C# | 4.585 | Ran in VS 2015, .NET 4.6 | StringBuilder | byte[] |
Nim | 5.264 | Compiled with nim c -d:release | string | string |
Python | 15.52 | Ran with PyPy3 (JIT) | List | bytearray |
Crystal | 20.28 | Compiled with crystal build --release | String | UInt8[] |
Javascript | 22.04 | Ran with Node.js v10 | String | Buffer |
Elixir | 53.00 | Used Task.async_stream | bitstring | bitstring |
Python | 355.3 | Ran with Python3 (standard) | List | bytearray |
It should come as no surprise that optimized C beat out everything else by a long shot. Interpreted Python is comically slow, but JIT'd Python, with PyPy3, made it competitive by giving it a roughly 2200% speedup. Nim, whose implementation was identical to that of Python, was three times faster than PyPy3. Java is sitting comfortably in second place. For as much hate as Java receives, it did exceedingly well in this benchmark. C# wasn't too far behind it when run in release mode. Poor Elixir; though its built-in Base-N encode/decode functions make it competitive with the likes of my JS, Crystal, and PyPy3 Python implementations, my crusty algorithm didn't make Elixir look very good here. I think the main issue is that the function pattern matching forced it to check the remainder cases several times per iteration, something I explicitly avoided in the other languages.
Conclusion
Thank you for taking the time to read this, or at least for scrolling to the bottom of the page. Hopefully you've learned something about your favorite or least favorite languages. You should definitely use whatever native Base64 encoding library your language has to offer. If you want to check out the repository, or even port my algorithm to another language, feel free to make a PR. The repository can be found here.