How Base64 Encoding Works

Base64 protects binary data against corruption in ASCII-only transfers

Base64 encoding converts binary data to an ASCII string by representing every 6 bits of that data as one printable character. It is used when binary data, such as images or video, is transmitted over systems that are designed to carry plain text (ASCII).

Why Is Base64 Encoding Used?

The need for Base64 encoding comes from the problems that occur when media is transmitted in raw binary format to text-based systems.

Text-based systems (like email) interpret incoming bytes as characters, including special control characters. As a result, raw binary media sent through those systems is often misinterpreted and then lost or corrupted in transit.

One method of encoding this kind of binary data in a way that avoids such transmission problems is to send it as plain ASCII text in Base64 encoded format. This is one of the techniques employed by the MIME standard to send data other than plain text.

Many programming languages, such as PHP and JavaScript, include Base64 encoding and decoding functions so that programs can produce and interpret data transmitted in Base64 format.
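Python is another language with built-in support. The following is a minimal sketch using the `base64` module from Python's standard library; the sample input `b"Hello"` is only an illustration and does not come from any particular system.

```python
import base64

raw = b"Hello"                        # any bytes-like object works here
encoded = base64.b64encode(raw)       # bytes made up only of Base64 characters
decoded = base64.b64decode(encoded)   # recovers the original bytes

print(encoded.decode("ascii"))        # SGVsbG8=
print(decoded == raw)                 # True
```

The round trip returns exactly the bytes that went in, which is the property that makes Base64 safe for text-only channels.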

Base64 Encoding Logic

Base64 encoding takes binary data in groups of 3 full bytes (24 bits), splits each group into four 6-bit segments, and represents each segment as a printable ASCII character. It does that in essentially two steps.

The first step is to break the binary string down into 6-bit blocks. Base64 uses only 6 bits per character (corresponding to 2^6 = 64 characters) to ensure the encoded data is printable and human-readable. None of the control characters available in ASCII are used.

The 64 characters (hence the name Base64) are the 26 uppercase letters, the 26 lowercase letters, and the 10 digits, plus the plus sign (+) and the forward slash (/). There is also a 65th character known as a pad, which is the equal sign (=). This character is used when the last group of binary data doesn't contain a full 3 bytes.
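As an illustration, the character set can be assembled in a few lines of Python. This is a sketch only: the names `BASE64_ALPHABET` and `PAD_CHAR` are hypothetical, and the ordering (uppercase, lowercase, digits, then + and /) follows the encoding table in this article.

```python
import string

# Order matters: a character's position in this string is its 6-bit value.
BASE64_ALPHABET = (
    string.ascii_uppercase   # values 0-25
    + string.ascii_lowercase # values 26-51
    + string.digits          # values 52-61
    + "+/"                   # values 62 and 63
)
PAD_CHAR = "="  # the 65th character, used only for padding

print(len(BASE64_ALPHABET))  # 64
print(BASE64_ALPHABET[0], BASE64_ALPHABET[26], BASE64_ALPHABET[52], BASE64_ALPHABET[63])
```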

Base64 Encoding Example

For example, take the three byte values 155, 162, and 233. In binary these are 10011011, 10100010, and 11101001, which together form the binary stream 100110111010001011101001. A binary file, like an image, contains a binary stream running for tens or hundreds of thousands of zeroes and ones.

A Base64 encoder starts by chunking the binary stream into groupings of six bits: 100110 111010 001011 101001. These groupings translate to the numbers 38, 58, 11, and 41, respectively.
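The chunking step can be sketched in Python as follows. The helper name `six_bit_groups` is hypothetical, and the sketch assumes the input length is a multiple of three bytes so that every group is a full six bits.

```python
def six_bit_groups(data: bytes) -> list[str]:
    """Return the input as a list of 6-bit strings (input length assumed to be a multiple of 3)."""
    # Write every byte as exactly 8 binary digits and join them into one stream.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Slice the stream into consecutive 6-bit pieces.
    return [bits[i:i + 6] for i in range(0, len(bits), 6)]


groups = six_bit_groups(bytes([155, 162, 233]))
print(groups)                       # ['100110', '111010', '001011', '101001']
print([int(g, 2) for g in groups])  # [38, 58, 11, 41]
```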

A six-bit binary group converts from binary (base-2) to decimal (base-10) by adding up the power of two that belongs to each position holding a 1. Starting from the right and moving left, and starting with zero, the positions in the binary group represent 2^0, then 2^1, then 2^2, then 2^3, then 2^4, then 2^5.

Here's another way to look at it. Starting from the right, each position is worth 1, 2, 4, 8, 16, and 32. If the binary number has a 1 in the slot, you add that value; if it has a 0 in the slot, you don't. The binary string 100110 converts to the decimal number 38: 0*2^0 + 1*2^1 + 1*2^2 + 0*2^3 + 0*2^4 + 1*2^5 = 0+2+4+0+0+32 = 38.
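The same positional arithmetic can be written out as a short Python sketch. The function name `binary_to_decimal` is hypothetical; in practice Python's built-in `int(bits, 2)` gives the same result.

```python
def binary_to_decimal(bits: str) -> int:
    """Add 2**position for every '1', counting positions from the right, starting at zero."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        if bit == "1":
            total += 2 ** position
    return total


print(binary_to_decimal("100110"))  # 38
print(int("100110", 2))             # 38, using the built-in conversion
```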

Applied to all four groups of the example stream, this conversion yields the 6-bit values 38, 58, 11, and 41.

Finally, these numbers are converted to ASCII characters using the Base64 encoding table. The 6-bit values of this example translate to the ASCII sequence m6Lp.

Using the Base64 conversion table:

  • 38 is m
  • 58 is 6
  • 11 is L
  • 41 is p
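The table lookup can be sketched in Python by treating the 64 characters as one string and using each 6-bit value as an index into it. The name `ALPHABET` is hypothetical, and its ordering is assumed to match the encoding table in this article.

```python
import string

# Position in this string = 6-bit value: A-Z, a-z, 0-9, then + and /.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

values = [38, 58, 11, 41]  # the 6-bit values from the example
encoded = "".join(ALPHABET[value] for value in values)

print(encoded)  # m6Lp
```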

This two-step process is applied to the entire binary string that's encoded.

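To ensure the encoded data can be properly printed and does not exceed any mail server's line length limit, line breaks are inserted into the encoded output so that no line is longer than 76 characters. These inserted line breaks are not part of the data and are ignored when decoding. Newline characters in the original data, by contrast, are encoded like all other data.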
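Python's standard library exposes this MIME-style wrapping through `base64.encodebytes`, which inserts a newline after every 76 characters of output. The sketch below uses an arbitrary 256-byte sample, chosen only because it is long enough to span several lines.

```python
import base64

data = bytes(range(256))            # arbitrary sample data, long enough to wrap
wrapped = base64.encodebytes(data)  # Base64 text with a newline after every 76 characters

lines = wrapped.decode("ascii").splitlines()
longest = max(len(line) for line in lines)
restored = base64.decodebytes(wrapped)  # line breaks are skipped while decoding

print(longest)           # length of the longest encoded line
print(restored == data)  # whether the round trip preserved the data
```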

The entire purpose of Base64 encoding, from adding padding to preserve 3-byte binary segments to converting binary to text using the Base64 table, is to preserve the integrity of the transmitted binary information.

Base64 Encoding Table

The following table translates all 64 characters used in Base64 encoding.

Value Char   Value Char   Value Char   Value Char
0 A   16 Q   32 g   48 w
1 B   17 R   33 h   49 x
2 C   18 S   34 i   50 y
3 D   19 T   35 j   51 z
4 E   20 U   36 k   52 0
5 F   21 V   37 l   53 1
6 G   22 W   38 m   54 2
7 H   23 X   39 n   55 3
8 I   24 Y   40 o   56 4
9 J   25 Z   41 p   57 5
10 K   26 a   42 q   58 6
11 L   27 b   43 r   59 7
12 M   28 c   44 s   60 8
13 N   29 d   45 t   61 9
14 O   30 e   46 u   62 +
15 P   31 f   47 v   63 /

Solving the Endgame

At the end of the encoding process, there might be a problem. If the size of the original data in bytes is a multiple of three, everything works fine. If it is not, the final group contains only one or two bytes of data. For proper encoding, exactly 3 bytes of binary data are needed.

The solution is to append enough bytes with a value of 0 to create a 3-byte group. Two such bytes are appended if the final group holds only one byte of data, and one is appended if it holds two.

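Of course, these artificial trailing zeros cannot be represented using the encoding table above, where the value 0 already maps to A. Instead, each 6-bit group that consists entirely of padding is written as a 65th character. The Base64 padding character is the equal sign (=), and it is placed at the end of the encoded data: one leftover byte produces two characters followed by ==, and two leftover bytes produce three characters followed by =.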
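Putting the pieces together, a complete encoder with padding might look like the following Python sketch. It is an illustration of the steps described in this article, not the implementation used by any particular library; the names `ALPHABET` and `base64_encode` are hypothetical, and line wrapping is left out.

```python
import string

# Position in this string = 6-bit value: A-Z, a-z, 0-9, then + and /.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def base64_encode(data: bytes) -> str:
    """Encode bytes as Base64 text, padding the final group with '='."""
    out = []
    for i in range(0, len(data), 3):
        group = data[i:i + 3]
        missing = 3 - len(group)  # 0, 1, or 2 bytes short of a full group
        # Append zero bytes so the group is exactly 3 bytes (24 bits).
        chunk = int.from_bytes(group + b"\x00" * missing, "big")
        # Read the 24 bits as four 6-bit values, left to right.
        chars = [ALPHABET[(chunk >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Characters built purely from padding bytes become '='.
        if missing:
            chars[4 - missing:] = ["="] * missing
        out.extend(chars)
    return "".join(out)


print(base64_encode(bytes([155, 162, 233])))  # m6Lp
print(base64_encode(bytes([155, 162])))       # m6I=
print(base64_encode(bytes([155])))            # mw==
```

Dropping one byte from the example input adds one pad character each time, while the encoded length stays at four characters.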