Brief Introduction to URL Encoding

A website's URL, also commonly known as the "website address", is what someone enters into a web browser in order to access a specific website. When you pass information through a URL, you need to make sure it uses only the allowed characters. These include alphabetic characters, numerals, and a few special characters that have meaning in the URL string. Any other character that needs to appear in a URL should be encoded so that it doesn't cause problems when the browser locates the pages and resources you are looking for.

Encoding a URL

The most commonly encoded character in a URL string is the <space> character. Whenever you see a plus sign (+) in the query portion of a URL, it is usually standing in for a space. The most common place you'll see this is in a mailto link that includes a subject. If you want the subject to have spaces in it, you can encode them as pluses, so a link whose query string is subject=this+is+my+subject would transmit a subject of "this is my subject". Each "+" in the encoded text is replaced with an actual <space> when the link is processed. Note that some mail clients show the + literally in a mailto subject; encoding the space as %20 is the more widely supported choice there.
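
As a rough illustration of the mailto case described above, the following Python sketch builds the link both ways using the standard urllib.parse module. The address someone@example.com is a made-up placeholder, not something taken from this article.

```python
from urllib.parse import quote, quote_plus

subject = "this is my subject"
address = "someone@example.com"  # hypothetical address, for illustration only

# quote_plus() encodes each space as "+"
plus_link = f"mailto:{address}?subject={quote_plus(subject)}"

# quote() encodes each space as "%20"
percent_link = f"mailto:{address}?subject={quote(subject)}"

print(plus_link)     # mailto:someone@example.com?subject=this+is+my+subject
print(percent_link)  # mailto:someone@example.com?subject=this%20is%20my%20subject
```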

To encode a URL, you simply replace each special character with its encoding string. Apart from the + used for a space, an encoding string is a % character followed by the character's two-digit hexadecimal code.
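
A minimal sketch of that replacement in Python, assuming plain ASCII input: the standard library's quote() does the work, and the hand-rolled version beside it is only there to show where the hexadecimal digits come from. The sample string is invented for the example.

```python
from urllib.parse import quote, unquote

raw = "50% off & more?"  # sample text, not from the article

# safe="" asks quote() to encode every reserved character, including "/"
encoded = quote(raw, safe="")
print(encoded)  # 50%25%20off%20%26%20more%3F

# The same result by hand: "%" plus the two-digit hex code of each
# non-alphanumeric character (valid for ASCII text only).
manual = "".join(ch if ch.isalnum() else f"%{ord(ch):02X}" for ch in raw)

# Decoding reverses the process
decoded = unquote(encoded)
```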

Strictly speaking, you should always encode any special characters found in a URL. If you are feeling a bit intimidated by all this talk of encoding, keep in mind that you generally won't find special characters in a URL outside their normal context except in form data. Most URLs use only the simple characters that are always allowed, so no encoding is needed at all.

If you submit data to CGI scripts using the GET method, you should encode the data, because it will be sent as part of the URL. For instance, if you are writing a link that passes an RSS feed's address to a script, the feed's URL needs to be encoded before it is added to the script's URL.
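
One way to sketch this in Python is with urlencode(), which encodes each value and joins the name=value pairs with "&". The script address and the parameter names feed and title are assumptions made up for this example.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

script_url = "http://example.com/cgi-bin/subscribe"  # hypothetical script
params = {
    "feed": "http://example.com/rss.xml",  # hypothetical feed address
    "title": "My Feed",
}

# urlencode() percent-encodes reserved characters and turns spaces into "+"
query = urlencode(params)
link = f"{script_url}?{query}"
print(link)
# http://example.com/cgi-bin/subscribe?feed=http%3A%2F%2Fexample.com%2Frss.xml&title=My+Feed

# What the receiving script would see after decoding the query string
received = parse_qs(urlsplit(link).query)
```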

What Should Be Encoded?

Any character that is not a letter or a number needs to be encoded, unless it is a special character being used in its normal context. A special character that appears outside its normal context, such as a "?" inside a form value, needs to be encoded as well. Below is a table of common characters that could be found in a URL and their encodings.

Reserved Characters URL Encoding

Character   Purpose in URL                                  Encoding
:           Separate protocol (http) from address           %3A
/           Separate domain and directories                 %2F
#           Separate anchors                                %23
?           Separate query string                           %3F
&           Separate query elements                         %26
@           Separate username and password from domain      %40
%           Indicates an encoded character                  %25
+           Indicates a space                               %2B
<space>     Not recommended in URLs                         %20 or +
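
The percent-encodings for these characters can be checked with a few lines of Python. This is only a quick sketch using the standard library; the variable names are arbitrary.

```python
from urllib.parse import quote, quote_plus

reserved = ":/#?&@%+ "

# safe="" makes quote() encode every character listed, including "/"
encodings = {ch: quote(ch, safe="") for ch in reserved}

for ch, enc in encodings.items():
    print(repr(ch), enc)

# The alternative form for a space, as used in form data
space_as_plus = quote_plus(" ")
```

Running it prints %3A, %2F, %23, %3F, %26, %40, %25, %2B, and %20 in that order, and space_as_plus holds "+".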

Note that these encodings are different from what you use for HTML special characters. For example, if you need to put an ampersand (&) character into a URL as data, you would use %26, which is the URL encoding for that character. If you were writing out HTML and you wanted to add an ampersand to the text, you could not use %26. Instead, you would use either &amp; or &#38;, both of which would write out the & in the HTML page when rendered. This may seem confusing at first, but it is basically the difference between the text that appears on the page itself, which is part of the HTML code, and the URL string, which is a separate entity and therefore subject to different rules. The fact that "&" and many other characters can appear in both should not lead you to mix up the two sets of rules.
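
To make the distinction concrete, here is a small Python sketch that treats the same text both ways. The sample text and the search link are made up for illustration.

```python
from html import escape
from urllib.parse import quote

text = "Tom & Jerry"  # sample text, not from the article

# For display in an HTML page: the ampersand becomes an HTML entity
html_text = escape(text)
print(html_text)  # Tom &amp; Jerry

# For use as a value inside a URL: the ampersand is percent-encoded
url_value = quote(text, safe="")
print(url_value)  # Tom%20%26%20Jerry

# A URL written into an HTML attribute needs both steps: the value is
# URL-encoded first, then the "&" that separates query elements is
# written as &amp; in the markup. The "search" path is hypothetical.
href = f"search?q={url_value}&page=2"
html_href = escape(href)
print(html_href)  # search?q=Tom%20%26%20Jerry&amp;page=2
```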

Edited by Jeremy Girard.