URL Anatomy: Understanding Every Component of a Web Address
Dissect URLs into their components — scheme, authority, path, query, and fragment — and learn how encoding, redirects, and canonical URLs work.
The Anatomy of a URL
Every web developer works with URLs daily, yet few understand every component of this fundamental building block of the web. URLs (Uniform Resource Locators) are defined by RFC 3986 and follow a precise structure that determines how browsers, servers, and APIs locate and access resources.
The Full URL Structure
https://user:pass@api.example.com:8443/v2/users?role=admin&active=true#section-2
└─┬─┘   └───┬───┘ └──────┬──────┘ └┬─┘└───┬───┘ └─────────┬──────────┘ └───┬───┘
scheme  userinfo        host      port   path           query          fragment
        └─────────────┬──────────────┘
                  authority
Let us examine each component in detail.
1. Scheme (Protocol)
The scheme identifies the protocol used to access the resource:
- https:// - HTTP over TLS (encrypted). Expected for all modern websites.
- http:// - Unencrypted HTTP. Should only be used for local development.
- ftp:// - File Transfer Protocol.
- file:// - Local filesystem access.
- mailto: - Email addresses (note: no //).
- tel: - Phone numbers.
- data: - Inline data (Base64-encoded images, etc.).
Fun fact: The double slash (//) in URLs was a mistake. Tim Berners-Lee, the inventor of URLs, has publicly stated that the // after the colon was unnecessary and that he would remove it if he could redesign URLs.
2. Authority
The authority section identifies the server hosting the resource.
Userinfo (user:pass@): Rarely used today. RFC 3986 deprecates the user:password form because credentials embedded in a URL are easily exposed (for example in browser history or shared links). Chrome no longer displays user:pass@ in the address bar, to reduce phishing risk.
Host: Can be a domain name (api.example.com), an IPv4 address (192.168.1.1), or an IPv6 address in brackets ([::1]). Domain names are resolved to IP addresses via DNS.
Port: The TCP port number. Defaults are:
- HTTP: 80
- HTTPS: 443
- FTP: 21
If the default port is used, it can be omitted from the URL, and usually is. A non-default port (:8443) must be explicitly specified.
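As a minimal sketch of how the authority breaks down in practice, the snippet below parses the example URL from the diagram with the standard URL API. The variable names are illustrative only.

```javascript
// Parse the example URL and read each authority component.
const url = new URL('https://user:pass@api.example.com:8443/v2/users?role=admin&active=true#section-2');

const parts = {
  scheme: url.protocol,   // "https:" (the URL API includes the trailing colon)
  username: url.username, // "user"
  password: url.password, // "pass"
  hostname: url.hostname, // "api.example.com"
  port: url.port,         // "8443"
  host: url.host,         // "api.example.com:8443" (hostname plus port)
};

// A default port is dropped during parsing, so `port` is an empty string.
const defaultPort = new URL('https://example.com:443/').port; // ""

// IPv6 hosts keep their brackets in `hostname`.
const ipv6Host = new URL('http://[::1]:3000/').hostname; // "[::1]"

console.log(parts, defaultPort, ipv6Host);
```

Note that `host` and `hostname` differ: only `host` carries the port.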
3. Path
The path identifies the specific resource on the server. It is hierarchical and resembles a Unix filesystem path, with forward slashes as separators, although it does not have to map to real files on disk:
/v2/users/12345/profile
Important path concepts:
- Trailing slash matters: /users and /users/ may be different resources. Some servers redirect one to the other; some treat them as distinct.
- Path parameters: Some APIs use path segments as parameters: /users/:id/posts/:postId
- Dot segments: . (current directory) and .. (parent directory) are resolved before the request is sent. /a/b/../c becomes /a/c.
- URL-encoded paths: Spaces and special characters in paths must be percent-encoded: /my documents/ → /my%20documents/
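The path rules above can be checked with the URL API. This is a rough sketch: the `fillPath` helper is hypothetical, written only to illustrate the /users/:id/posts/:postId style of path parameter, and is not part of any framework.

```javascript
const base = 'https://example.com';

// Dot segments are resolved while parsing.
const resolved = new URL('/a/b/../c', base).pathname; // "/a/c"

// Spaces in the path are percent-encoded while parsing.
const encodedPath = new URL('/my documents/', base).pathname; // "/my%20documents/"

// /users and /users/ parse to different URLs; the parser does not merge them.
const sameUrl = new URL('/users', base).href === new URL('/users/', base).href; // false

// Hypothetical helper: substitute :name placeholders in a route template,
// encoding each value so it cannot introduce extra path segments.
function fillPath(template, values) {
  return template.replace(/:([A-Za-z_]\w*)/g, (_, name) => encodeURIComponent(values[name]));
}

const filled = fillPath('/users/:id/posts/:postId', { id: 12345, postId: 'a/b' });
// "/users/12345/posts/a%2Fb"

console.log(resolved, encodedPath, sameUrl, filled);
```

Encoding each value with encodeURIComponent means a slash inside a parameter stays data rather than becoming a new path separator.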
4. Query String
The query string begins with ? and contains key-value pairs separated by &:
?role=admin&active=true&page=3
Encoding rules:
- Spaces: %20 or + (+ means a space only in form-encoded query strings, so %20 is the more universal choice)
- Special characters: & → %26, = → %3D, ? → %3F
- Non-ASCII: UTF-8 encoded, then percent-encoded: ü → %C3%BC
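A short sketch of these encoding rules using URLSearchParams, which serializes in the form-encoded style (spaces become +). The parameter names and values are made up for illustration.

```javascript
const params = new URLSearchParams();
params.set('q', 'hello world'); // space
params.set('expr', 'a&b=c?');   // reserved characters used as data
params.set('city', 'Zürich');   // non-ASCII

const query = params.toString();
// "q=hello+world&expr=a%26b%3Dc%3F&city=Z%C3%BCrich"

// Both space encodings decode to the same value when parsed.
const fromPlus = new URLSearchParams('q=hello+world').get('q');      // "hello world"
const fromPercent = new URLSearchParams('q=hello%20world').get('q'); // "hello world"

console.log(query, fromPlus, fromPercent);
```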
Array parameters have no standard format, leading to inconsistencies:
?color=red&color=blue (repeated key: Express.js, URLSearchParams)
?color[]=red&color[]=blue (bracket notation: PHP, Rails)
?color=red,blue (comma-separated: custom APIs)
5. Fragment (Hash)
The fragment begins with # and identifies a specific section within the page:
https://example.com/docs#installation
Critical fact: The fragment is NEVER sent to the server. It is processed entirely by the browser. This makes it useful for:
- Page anchors (#section-name)
- Single-page app routing (#/users/profile)
- Tracking analytics without server-side logging
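To make the split concrete, the sketch below separates what an HTTP client would put in the request (path plus query) from the fragment that stays in the browser. The example URLs and variable names are illustrative.

```javascript
const pageUrl = new URL('https://example.com/docs?lang=en#installation');

// What goes into the request line: path and query only.
const requestTarget = pageUrl.pathname + pageUrl.search; // "/docs?lang=en"

// What stays on the client.
const fragment = pageUrl.hash; // "#installation"

// Hash-based routing: the client-side route is the fragment without its leading "#".
const route = new URL('https://example.com/app#/users/profile').hash.slice(1); // "/users/profile"

console.log(requestTarget, fragment, route);
```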
URL Encoding: The Essential Skill
URLs can only contain a limited set of ASCII characters. Any character outside this set, or any reserved character used as data, must be percent-encoded:
encodeURIComponent('hello world') // "hello%20world"
encodeURIComponent('price=50&tax=10') // "price%3D50%26tax%3D10"
encodeURIComponent('café') // "caf%C3%A9"
encodeURI vs. encodeURIComponent:
- encodeURI(): Encodes a complete URL, preserving the :, /, ?, #, &, and = characters
- encodeURIComponent(): Encodes a single URL component, including those delimiters; only letters, digits, and - _ . ! ~ * ' ( ) are left as-is
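A quick side-by-side sketch of the two functions, using made-up inputs:

```javascript
// encodeURI keeps the URL's structural delimiters and only escapes the spaces here.
const wholeUrl = encodeURI('https://example.com/my path?q=hello world&lang=en#top');
// "https://example.com/my%20path?q=hello%20world&lang=en#top"

// encodeURIComponent escapes delimiters too, so the value is safe inside one component.
const component = encodeURIComponent('a/b?c=d&e');
// "a%2Fb%3Fc%3Dd%26e"

// These punctuation characters are left untouched by encodeURIComponent.
const untouched = encodeURIComponent("-_.!~*'()");
// "-_.!~*'()"

console.log(wholeUrl, component, untouched);
```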
// ❌ Wrong: encodes the URL's own delimiters, so the result is no longer a usable URL
encodeURIComponent('https://example.com/path?q=hello world')
// "https%3A%2F%2Fexample.com%2Fpath%3Fq%3Dhello%20world"
// ✅ Right: encode only the value
'https://example.com/path?q=' + encodeURIComponent('hello world')
// "https://example.com/path?q=hello%20world"
Canonical URLs and SEO
Search engines may index the same content under multiple URLs:
https://example.com/products
https://example.com/products/
https://example.com/products?ref=homepage
https://www.example.com/products
The canonical URL tag tells search engines which version is the "official" one:
<link rel="canonical" href="https://example.com/products" />
Best practices:
- Always include a canonical tag on every page
- Use the HTTPS version with or without www (pick one and be consistent)
- Exclude tracking parameters from the canonical URL
- Use 301 redirects from non-canonical URLs to the canonical version
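One way to apply these practices in code is a small normalization function. The sketch below is only an illustration under stated assumptions: the `canonicalize` function and the `TRACKING_PARAMS` list are hypothetical, and it arbitrarily picks the non-www host and the no-trailing-slash form as canonical. Your site may choose differently.

```javascript
// Hypothetical list of tracking parameters; adjust for your own site.
const TRACKING_PARAMS = new Set(['ref', 'utm_source', 'utm_medium', 'utm_campaign']);

function canonicalize(input) {
  const url = new URL(input);
  url.protocol = 'https:';                           // always HTTPS
  url.hostname = url.hostname.replace(/^www\./, ''); // assumption: non-www is canonical
  // Copy the keys first so deleting while iterating is safe.
  for (const key of [...url.searchParams.keys()]) {
    if (TRACKING_PARAMS.has(key)) url.searchParams.delete(key);
  }
  // Assumption: the form without a trailing slash is canonical (root path excluded).
  if (url.pathname.length > 1 && url.pathname.endsWith('/')) {
    url.pathname = url.pathname.slice(0, -1);
  }
  url.hash = ''; // fragments never reach the server
  return url.toString();
}

const variants = [
  'https://example.com/products',
  'https://example.com/products/',
  'https://example.com/products?ref=homepage',
  'https://www.example.com/products',
].map(canonicalize);
// Every entry is "https://example.com/products"

console.log(variants);
```

The same function could feed both the canonical tag and the decision about whether to issue a 301 redirect.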
URL Security Considerations
Open redirect vulnerabilities: If your application redirects users based on a URL parameter:
https://yoursite.com/login?redirect=https://evil.com
An attacker can craft a link that redirects to a phishing site after login. Always validate redirect URLs against an allowlist of trusted domains.
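A minimal sketch of allowlist validation, assuming the redirect target arrives as a string. The `isSafeRedirect` function, the `ALLOWED_REDIRECT_HOSTS` set, and `BASE_ORIGIN` are hypothetical names chosen for this example.

```javascript
// Hypothetical allowlist and base origin for illustration.
const ALLOWED_REDIRECT_HOSTS = new Set(['yoursite.com', 'app.yoursite.com']);
const BASE_ORIGIN = 'https://yoursite.com';

function isSafeRedirect(target) {
  let url;
  try {
    // Resolving against the base lets relative targets such as "/dashboard" pass.
    url = new URL(target, BASE_ORIGIN);
  } catch {
    return false; // unparseable input is rejected
  }
  // Compare the parsed hostname exactly; substring checks are easy to bypass.
  return url.protocol === 'https:' && ALLOWED_REDIRECT_HOSTS.has(url.hostname);
}

console.log(isSafeRedirect('/dashboard'));                     // true
console.log(isSafeRedirect('https://evil.com'));               // false
console.log(isSafeRedirect('//evil.com/login'));               // false (protocol-relative)
console.log(isSafeRedirect('https://yoursite.com.evil.com/')); // false (lookalike suffix)
console.log(isSafeRedirect('https://yoursite.com@evil.com/')); // false (userinfo trick)
```

Parsing first and comparing the hostname afterwards is what defeats the protocol-relative and userinfo variants.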
Homograph attacks: Unicode characters can look identical to ASCII characters. The domain аpple.com (Cyrillic 'а') looks like apple.com (Latin 'a') but resolves to a different server. Modern browsers display the Punycode version (xn--pple-43d.com) for suspicious mixed-script domains like this one, which makes the difference visible.
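The Punycode form can be inspected with the URL API, which converts internationalized hostnames to their ASCII form during parsing. This sketch assumes a runtime with full internationalization support, such as a standard Node.js build or a browser.

```javascript
// "\u0430" is CYRILLIC SMALL LETTER A, not the Latin "a".
const lookalike = new URL('https://\u0430pple.com/');
const genuine = new URL('https://apple.com/');

console.log(lookalike.hostname); // "xn--pple-43d.com"
console.log(genuine.hostname);   // "apple.com"
console.log(lookalike.hostname === genuine.hostname); // false
```

Comparing parsed hostnames rather than the raw strings a user sees is a reasonable first defence in server-side checks.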
URL length limits: While RFC 3986 does not specify a maximum URL length, browsers and servers impose practical limits:
- Chrome: ~2MB
- Internet Explorer (legacy): 2,083 characters
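- Apache: about 8 KB for the request line (LimitRequestLine defaults to 8,190 bytes)
- Nginx: about 8 KB for the request line (large_client_header_buffers defaults to 4 buffers of 8k)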
Keep URLs under 2,000 characters for maximum compatibility.
The URL API in JavaScript
Modern JavaScript provides the URL and URLSearchParams APIs for safe URL manipulation:
const url = new URL('https://example.com/path?a=1&b=2#section');
url.protocol // "https:"
url.hostname // "example.com"
url.pathname // "/path"
url.searchParams.get('a') // "1"
url.hash // "#section"
// Modify safely
url.searchParams.set('page', '3');
url.searchParams.delete('b');
url.toString() // "https://example.com/path?a=1&page=3#section"
Always use the URL API instead of string manipulation for URL construction. String concatenation leads to encoding bugs, missing separators, and injection vulnerabilities.
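As a rough sketch of that advice, the hypothetical `buildUrl` helper below assembles a URL from parts and lets the URL API handle separators and encoding. It writes array values as repeated keys, which is only one of the array conventions listed earlier.

```javascript
// Hypothetical helper: build a URL from a base, a path, and a params object.
function buildUrl(base, path, params) {
  const url = new URL(path, base);
  for (const [key, value] of Object.entries(params)) {
    // Arrays become repeated keys; scalars become a single key.
    for (const v of Array.isArray(value) ? value : [value]) {
      url.searchParams.append(key, String(v));
    }
  }
  return url.toString();
}

const built = buildUrl('https://api.example.com', '/v2/users', {
  q: 'a&b=c',             // would corrupt the query if concatenated as-is
  color: ['red', 'blue'],
});
// "https://api.example.com/v2/users?q=a%26b%3Dc&color=red&color=blue"

// Reading repeated keys back.
const colors = new URL(built).searchParams.getAll('color'); // ["red", "blue"]

console.log(built, colors);
```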
Summary
URLs are more than simple web addresses — they are a precisely structured addressing system with specific rules for encoding, security, and behavior. Understanding the role of each component (scheme, authority, path, query, fragment) helps you build more robust web applications, avoid encoding bugs, implement proper canonical URLs for SEO, and prevent security vulnerabilities like open redirects and homograph attacks.