What Characters Are Allowed in URLs?
URLs (Uniform Resource Locators) are designed to use a specific, limited set of characters to ensure universal compatibility and proper interpretation across the web. The characters permitted in a URL include alphabetic characters, numerals, and a defined set of symbols and punctuation marks, some of which carry special meanings within the URL structure itself.
Understanding URL Character Classes
The allowed characters can be broadly categorized as follows:
- Alphabetic Characters: Both uppercase (A-Z) and lowercase (a-z) letters are permitted. These are considered "unreserved" and can be used directly.
- Numeric Characters: All digits from 0 through 9 are allowed. Like alphabetic characters, these are also "unreserved."
- Specific Punctuation and Symbols: A precise set of symbols is allowed. Although these characters are "allowed," some of them serve as delimiters or have specific functions within a URL. They must be percent-encoded (%xx, where xx is the two-digit hexadecimal value of the byte) if they are intended to be part of data (like a query parameter value) rather than serving their structural role.
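As a minimal sketch of what the %xx form means, the snippet below converts a single character to its percent-encoded form by hand. The helper name percent_encode_char is hypothetical, and the sketch assumes the character is first encoded as UTF-8, which is the usual convention on the web.

```python
def percent_encode_char(ch: str) -> str:
    """Return the %xx form of a single character (hypothetical helper)."""
    # Assumption: the character is encoded as UTF-8 first; each resulting
    # byte then becomes one %xx triplet with uppercase hex digits.
    return "".join(f"%{byte:02X}" for byte in ch.encode("utf-8"))


# A few delimiters and a space, as they would appear inside a data value.
for ch in ["/", "?", "#", "&", "=", "+", " "]:
    print(repr(ch), "->", percent_encode_char(ch))
```

In practice a library routine would normally do this for whole strings; the manual version is only meant to show where the two hex digits come from.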
The specific symbols and punctuation marks allowed are:
;
,
/
?
:
@
&
=
+
$
-
_
.
!
~
*
'
(
)
#
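The list above can be turned into a quick check for characters that would need encoding. This is a rough sketch, not a full URL validator: the names ALLOWED and find_disallowed are made up for illustration, and the sketch assumes that % is also accepted so that existing %xx escapes are not flagged.

```python
import string

# Assumption: the allowed set is exactly the letters, digits, and symbols
# listed above, plus "%" so that existing %xx escapes pass the check.
ALLOWED = set(string.ascii_letters + string.digits + ";,/?:@&=+$-_.!~*'()#" + "%")


def find_disallowed(url: str) -> list:
    """Return characters in `url` outside ALLOWED, in order of first appearance."""
    seen = []
    for ch in url:
        if ch not in ALLOWED and ch not in seen:
            seen.append(ch)
    return seen


print(find_disallowed("example.com/search?q=hello~world"))  # nothing to encode
print(find_disallowed("example.com/my page"))               # the space is flagged
```

The check only looks at individual characters. It does not verify that a delimiter such as ? or # appears in a structurally valid position.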
Breakdown of Allowed Symbols
To clarify how these specific symbols function within a URL, they can be further grouped based on their general behavior:
Character | Category | Notes | Example/Usage Context |
---|---|---|---|
A-Z , a-z | Alphabetic (Unreserved) | Can be used freely without encoding. | example.com/mypage |
0-9 | Numeric (Unreserved) | Can be used freely without encoding. | example.com/item123 |
- , _ , . | Unreserved Punctuation | Can be used freely without encoding; commonly used in hostnames (- and .) and path segments. | my-website.com/user_data/doc.pdf |
~ , ! , * , ' , ( , ) | Unreserved Punctuation/Sub-delimiters | ~ is unreserved; the others are sub-delimiters in RFC 3986. Generally allowed without encoding, though some systems encode ( and ) for consistency. Often used in contexts like query parameters. | example.com/search?q=hello~world |
/ | General Delimiter | Separates path segments. Must be percent-encoded (%2F) if it is part of a data value (e.g., a filename containing a /) and not acting as a path separator. | example.com/path/to/resource |
? | General Delimiter | Separates the path from the query string. Must be percent-encoded (%3F) if used within a data value. | example.com/search?query=data |
# | General Delimiter | Separates the fragment identifier (hash) from the rest of the URL. Must be percent-encoded (%23) if used within a data value. | example.com/page#section1 |
: , @ | General Delimiters | : separates protocol from host (http://), or host from port (example.com:8080). @ separates user info from host in authentication (user:pass@). Must be percent-encoded if used as data. | http://user:pass@example.com/ |
& | Sub-delimiter (Reserved) | Separates query parameters. Must be percent-encoded (%26) if used as part of a parameter value. | example.com/search?q=term&category=books |
= | Sub-delimiter (Reserved) | Separates the key from the value in a query parameter. Must be percent-encoded (%3D) if used as part of a parameter value. | example.com/search?param=value |
+ | Sub-delimiter (Reserved) | Represents a space character in form-encoded query strings (though %20 is preferred for clarity). Must be percent-encoded (%2B) if a literal plus sign is intended. | example.com/search?query=hello+world (space) |
$ , ; , , | Sub-delimiters (Reserved) | Can be used in specific contexts (e.g., matrix parameters or older RFCs) but are generally percent-encoded if they appear in data. | example.com/browse;jsessionid=xyz or user=$username |
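To see the structural roles of these delimiters in action, the following sketch splits a URL into its components with Python's standard urllib.parse module. The URL is a made-up example that combines the usage contexts from the table; it is not taken from any real service.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL combining the delimiters described in the table.
url = "http://user:pass@example.com:8080/path/to/resource?q=term&category=books#section1"
parts = urlsplit(url)

print(parts.scheme)    # text before "://"
print(parts.username)  # user info before "@", split on ":"
print(parts.password)
print(parts.hostname)  # host, separated from the port by ":"
print(parts.port)
print(parts.path)      # segments separated by "/"
print(parts.query)     # everything between "?" and "#"
print(parts.fragment)  # everything after "#"

# "&" separates parameters and "=" separates each key from its value.
params = parse_qs(parts.query)
print(params)

# In a query string, parse_qs treats "+" as an encoded space.
plus_params = parse_qs("query=hello+world")
print(plus_params)
```

Because the parser relies on these characters to find component boundaries, any of them that appears inside a data value has to be percent-encoded first, or the value will be split in the wrong place.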
It is important to remember that characters not explicitly listed as allowed must be percent-encoded (e.g., a space becomes %20). This ensures that URLs are globally interoperable and correctly interpreted by web browsers and servers.
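In Python, the standard urllib.parse module handles this encoding. The sketch below is one possible approach, using invented sample values: quote with safe="" encodes everything except unreserved characters, quote_plus and urlencode apply the form-style convention where a space becomes +, and unquote reverses the encoding.

```python
from urllib.parse import quote, quote_plus, unquote, urlencode

# Hypothetical data value containing a "/", spaces, and an "&".
filename = "reports/Q1 & Q2.pdf"

# safe="" means even "/" is encoded, which suits a single data value.
encoded = quote(filename, safe="")
print(encoded)

# Form-style encoding: spaces become "+", and a literal "+" becomes %2B.
print(quote_plus("hello world"))
print(quote_plus("1+1=2"))

# urlencode builds a whole query string and uses quote_plus by default.
query = urlencode({"q": "rock & roll", "expr": "1+1=2"})
print(query)

# Decoding restores the original value.
decoded = unquote(encoded)
print(decoded)
```

Encoding each value separately, before the URL is assembled, keeps the delimiters that are meant to be structural distinct from those that are part of the data.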