Thursday, February 26, 2009

Uniform Resource Locator (URL)

In computing, a Uniform Resource Locator (URL) is a type of Uniform Resource Identifier (URI) that specifies where an identified resource is available and the mechanism for retrieving it.[1] In popular usage and in many technical documents and verbal discussions it is often, imprecisely and confusingly, used as a synonym for uniform resource identifier. The confusion in usage stems from historically different interpretations of the semantics of the terms involved.[2] In popular language, a URL is also referred to as a Web address.

Syntax

Main article: URI scheme#Generic syntax

Every URL begins with the scheme name that defines its namespace, purpose, and the syntax of the remaining part of the URL. Most Web-enabled programs will try to dereference a URL according to the semantics of its scheme and a context-specific heuristic. For example, a Web browser will usually dereference the URL http://example.org/ by performing an HTTP request to the host example.org, at the default HTTP port (port 80). Dereferencing the URL mailto:bob@example.com will usually start an e-mail composer with the address bob@example.com in the To field.

example.com is a domain name; an IP address or other network address might be used instead. In addition, URLs that specify https as a scheme (such as https://example.com/) normally denote a secure website.
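As a rough illustration of the dereferencing step described above, the following sketch splits the example URLs with Python's standard urllib.parse module. The DEFAULT_PORTS table and the describe helper are hypothetical names introduced only for this example, and the table lists just the two schemes mentioned in the text.

```python
from urllib.parse import urlsplit

# Illustrative table of default ports; an assumption of this sketch,
# limited to the two schemes discussed in the text.
DEFAULT_PORTS = {"http": 80, "https": 443}

def describe(url):
    """Return (scheme, host, port), filling in a default port when none is given."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, parts.hostname, port

print(describe("http://example.org/"))
print(describe("https://example.com/"))

# A mailto URL has no host; the address is carried in the path component.
print(urlsplit("mailto:bob@example.com").path)
```

What a client then does with these components (open a connection, start a mail composer) depends on the scheme and on the client itself.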

The hostname portion of a URL, if present, is case insensitive (since the DNS is specified to ignore case); other parts are not required to be, but may be treated as case insensitive by some clients and servers, especially those that are based on Microsoft Windows. For example:

1. http://en.wikipedia.org/ and HTTP://EN.WIKIPEDIA.ORG/ will both open the same page.
2. http://en.wikipedia.org/wiki/URL is correct, but http://en.wikipedia.org/WIKI/URL/ will result in an HTTP 404 error page.
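The case rules above can be sketched with Python's standard library. The helper name normalize_case is hypothetical; it lower-cases only the scheme and the authority part and leaves the path alone, since the path may be case sensitive. Lower-casing the whole authority is a simplification that would also affect any user information embedded in it.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_case(url):
    """Lower-case the scheme and authority; leave path, query and fragment as they are."""
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),  # simplification: also lower-cases any userinfo
        parts.path,
        parts.query,
        parts.fragment,
    ))

print(normalize_case("HTTP://EN.WIKIPEDIA.ORG/"))
print(normalize_case("http://en.wikipedia.org/WIKI/URL/"))
```

After normalization the first pair of URLs compares equal, while the two paths /wiki/URL and /WIKI/URL/ remain distinct strings.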

URLs as locators

In its current strict technical meaning, a URL is a URI that, “in addition to identifying a resource, [provides] a means of locating the resource by describing its primary access mechanism (e.g., its network ‘location’).”[3]

Internet hostnames

Main article: Hostname

On the Internet, a hostname is a domain name assigned to a host computer. This is usually a combination of the host's local name with its parent domain's name. For example, "en.wikipedia.org" consists of a local hostname ("en") and the domain name "wikipedia.org". This kind of hostname is translated into an IP address via the local hosts file, or the Domain Name System (DNS) resolver. It is possible for a single host computer to have several hostnames; but generally the operating system of the host prefers to have one hostname that the host uses for itself.

Any domain name can also be a hostname, as long as it satisfies the restrictions that apply to hostnames. So, for example, both "en.wikimedia.org" and "wikimedia.org" are hostnames because they both have IP addresses assigned to them. The domain name "pmtpa.wikimedia.org" is not a hostname since it does not have an IP address, but "rr.pmtpa.wikimedia.org" is a hostname. All hostnames are domain names, but not all domain names are hostnames.
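The composition of a hostname described above can be sketched in a few lines of Python. The helper names split_hostname and resolve are hypothetical. The resolve function simply asks the operating system's resolver, which typically consults the local hosts file and DNS; it needs a working network configuration, so it is defined here but not called.

```python
import socket

def split_hostname(hostname):
    """Split a hostname into its local name and its parent domain name."""
    local, _, parent = hostname.partition(".")
    return local, parent

def resolve(hostname):
    """Translate a hostname into an IPv4 address via the system resolver."""
    return socket.gethostbyname(hostname)

print(split_hostname("en.wikipedia.org"))
```

A domain name for which resolve raises socket.gaierror has no address assigned through that resolver, which matches the informal distinction drawn above between domain names and hostnames.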

See also

* CURIE (Compact URI)
* Extensible Resource Identifier (XRI)
* Internationalized Resource Identifier (IRI)
* Uniform Resource Identifier (URI)
* URL normalization
* URI scheme

References

1. ^ RFC 1738
2. ^ RFC 3305
3. ^ Tim Berners-Lee, Roy T. Fielding, Larry Masinter. (January 2005). “Uniform Resource Identifier (URI): Generic Syntax”. Internet Society. RFC 3986; STD 66.

External links

* RFC 3986 Uniform Resource Identifier (URI): Generic Syntax


CURIE
From Wikipedia, the free encyclopedia

A CURIE (short for Compact URI) is an abbreviated URI expressed in CURIE syntax, and may be found in both XML and non-XML grammars. A CURIE may be considered a datatype.

An example of CURIE syntax: [isbn:0393315703]

The square brackets may be used to prevent ambiguities between CURIEs and regular URIs.

QNames (the namespace-prefixed names used in XML) are often used as CURIEs, and may be considered a type of CURIE. CURIEs, as defined by the W3C, are intended to be more precisely defined and may include checking. Unlike QNames, the part of a CURIE after the colon does not need to conform to the rules for element names.

The first W3C Working Draft of CURIE syntax was released 7 March 2007.[1]

Example

This example is based on one from the W3C Working Draft 7 March 2007, using a QName syntax within XHTML.


...

Find out more about biomes.

* The definition ("") is highlighted in yellow
* The CURIE ("[wiki:Biome]") is highlighted in green
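How a CURIE such as [wiki:Biome] might be expanded into a full URI can be sketched as follows. The function name expand_curie and the prefix table are assumptions made for illustration; in particular, binding the prefix wiki to http://en.wikipedia.org/wiki/ is a guess at the kind of definition the example relies on, not a value taken from the text above.

```python
def expand_curie(curie, prefixes):
    """Expand a CURIE into a URI using a prefix-to-URI mapping."""
    text = curie.strip()
    # Square brackets are optional; strip them when present.
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]
    prefix, sep, reference = text.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"cannot expand {curie!r}")
    return prefixes[prefix] + reference

# Hypothetical prefix binding, chosen only for this example.
PREFIXES = {"wiki": "http://en.wikipedia.org/wiki/"}

print(expand_curie("[wiki:Biome]", PREFIXES))
```

The sketch does plain string concatenation and performs no validation of the reference part, which is consistent with the remark that the part after the colon need not follow the rules for element names.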

References

1. ^ CURIE Syntax 1.0

External links

* www.w3.org/TR/curie




Extensible Resource Identifier

Extensible Resource Identifier (abbreviated XRI) is a scheme and resolution protocol for abstract identifiers compatible with Uniform Resource Identifiers and Internationalized Resource Identifiers, developed by the XRI Technical Committee at OASIS. The goal of XRI is a standard syntax and discovery format for abstract, structured identifiers that are domain-, location-, application-, and transport-independent, so they can be shared across any number of domains, directories, and interaction protocols.

The XRI 2.0 specifications narrowly failed to become OASIS standards due to the number of negative votes,[1] a failure attributed[2] to the intervention of the W3C Technical Architecture Group which made a statement recommending against using XRIs or taking the XRI specifications forward.[3] The core of the dispute is whether the widely interoperable HTTP URIs are capable of fulfilling the role of abstract, structured identifiers, as the TAG believes,[4] but whose limitations the XRI Technical Committee was formed specifically to address.[5]

With the growth of XML, Web services, and other ways of adapting the Web to automated, machine-to-machine communications, it is increasingly important to be able to identify a resource independent of any specific physical network path, location, or protocol in order to:

* Create structured identifiers with self-describing "tags" that can be understood across domains the same way XML documents provide a self-describing, domain-independent data format.
* Maintain a persistent link to the resource regardless of whether its network location changes.
* Delegate identifier management not just in the authority segment (the first segment following the "xxx://" scheme name) but anywhere in the identifier path.
* Map identifiers used to identify a resource in one domain to other synonyms used to identify the same resource in the same domain, or in other domains.

By early 2003, these requirements led to a resolution protocol based on HTTP(S) and simple XML documents called XRDS (Extensible Resource Descriptor Sequence).

Features

* URI- and IRI-compatibility — XRIs can be used wherever URIs or IRIs are called for.
* Cross-references — An XRI can contain another XRI (or a URI), to any level of nesting. This enables the construction of structured, "tagged" identifiers that enable identifier sharing across domains the same way XML enables data sharing across domains.
* Global context symbols — These are single-character symbols (=, @, +, $, or !) that provide a simple, human-friendly way to indicate the global context of an i-name or i-number. These are not required, but may be used within communities of interest that agree on their meaning and how they are resolved.
* Peer-to-peer addressing — XRI syntax supports the ability for any two network nodes to assign each other XRIs and perform cross-resolution. That is, a top-level namespace authority can be referred to by names assigned by other parties. This aids in federating namespaces between organizations or communities of interest.
* Decentralization — XRIs can be rooted in either centralized addressing systems (e.g., IP addresses or DNS domain names) or private/decentralized root authorities and peer-to-peer addressing.
* Delegation — Namespaces can be delegated to other namespace authorities.
* Federation — Namespaces defined separately at any level can be joined together in a hierarchical or polyarchical fashion, and made visible and resolvable.
* Persistence — The ability to express the intent that parts (or all) of an XRI are permanent identifiers that will never be reassigned.
* Human-friendly and machine-friendly formats — XRI provides syntax both for identifiers that can be created and understood by humans easily (i-names), and those that are optimized for machine structuring/parsing (i-numbers).
* Simple, extensible resolution — XRI offers a lightweight resolution scheme using HTTP and a simple XML document format called XRDS.
* Trusted resolution — the XRI resolution protocol includes three modes of trusted version: a) HTTPS, b) SAML assertions, and c) both.
* Multiple resolution options — XRI resolution can be independent of DNS.
* Fully internationalizable, leveraging Unicode and IRI specifications.
* Transport independent — XRIs are not bound to any specific transport protocols or mechanism.

Composition of an Extensible Resource Identifier

An XRI starting with "=" is taken to identify a person. An XRI starting with "@" identifies a company or organization. A leading "+" indicates a generic concept, subject, or topic.[6]

A "*" marks a delegation. For example, in "=family*name", "=family" delegates the resolution of its sub-XRI "name" to another resolver. This is analogous to DNS delegating subdomain resolution to other nameservers (for name.family.de, after de has been resolved, the nameserver responsible for de delegates to the family nameserver, which in turn delegates to the name nameserver).
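The composition rules above can be sketched as a small Python helper. The name describe_xri and the GLOBAL_CONTEXTS table are hypothetical, and the sketch covers only the three leading symbols and the "*" delimiter mentioned in this section; it ignores paths, cross-references, and every other part of the XRI syntax.

```python
# Meaning of the leading symbol, as described in the text above.
GLOBAL_CONTEXTS = {
    "=": "person",
    "@": "organization",
    "+": "generic concept",
}

def describe_xri(xri):
    """Return the kind of entity identified and the chain of delegated segments."""
    symbol, rest = xri[0], xri[1:]
    if symbol not in GLOBAL_CONTEXTS:
        raise ValueError(f"unsupported global context symbol in {xri!r}")
    # Each "*" hands resolution of the following segment to another resolver.
    return GLOBAL_CONTEXTS[symbol], rest.split("*")

print(describe_xri("=family*name"))
```

Read left to right, the segment list mirrors the delegation chain, much as the labels of a DNS name read right to left.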

Resolving an Extensible Resource Identifier

XRIs are resolved to XRDS documents using the HTTP(S) protocol, much as domain names are resolved to resource records using the DNS protocol. This lookup process can be configured by passing parameters.[7]

Proxy resolvers and the HXRI

An XRI can be transformed into a URI by prepending http(s)://xri.*/ to it. The resulting URI refers to a so-called proxy resolver, which resolves a URI of this kind to an XRDS document. The proxy resolver at http://xri.net, for example, can be used to resolve an XRI, so =example becomes http://xri.net/=example. This second form is called an HTTP XRI, or HXRI for short. The owner of the XRI =example can tell the proxy resolver what to do when the HXRI is requested. One possible reaction is an HTTP 302 redirect to a stored URI.

Further parameters that control the resolution can be appended to the HXRI, for example to get the whole XRDS document or to get service descriptions for this XRI. If ?_xrd_r=application/xrds+xml is appended to the HXRI, the whole XRDS document is returned, so http://xri.net/=example?_xrd_r=application/xrds+xml returns the whole XRDS for the XRI =example.
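The transformation from an XRI to an HXRI is simple string construction, sketched below. The helper name to_hxri and its parameters are hypothetical; the proxy address and the _xrd_r parameter are the ones quoted in the text, and the query string is appended verbatim rather than percent-encoded so that the result matches the example given there.

```python
def to_hxri(xri, proxy="http://xri.net", media_type=None):
    """Build an HTTP XRI (HXRI) by placing the XRI behind a proxy resolver address."""
    url = proxy.rstrip("/") + "/" + xri
    if media_type is not None:
        # Appended as-is, mirroring the form shown in the text.
        url += "?_xrd_r=" + media_type
    return url

print(to_hxri("=example"))
print(to_hxri("=example", media_type="application/xrds+xml"))
```

Requesting the resulting URL over HTTP is left out of the sketch, since the proxy's behavior (a redirect or an XRDS document) depends on what the owner of the XRI has configured.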

Examples of XRI cross-reference syntax

Say a library system uses URNs in the ISBN namespace to identify books and DNS subdomains to identify its library branches. HTTP URI syntax does not provide a standard way to express the URN for the book title in the context of the DNS name for the library branch. XRI cross-reference syntax solves this problem by allowing the library (and even automated programs running at the library) to programmatically construct the XRIs necessary to address any book at any branch. Examples:

xri://broadview.library.example.com/(urn:isbn:0-395-36341-1)
xri://shoreline.library.example.com/(urn:isbn:0-395-36341-1)
xri://northgate.library.example.com/(urn:isbn:0-395-36341-1)

This ability to create structured, self-describing identifiers can be extended to many other uses. For example, say the library wanted to indicate the type of each book available. By establishing a simple XRI dictionary of book types, it can now programmatically construct XRIs that include this metadata:

xri://broadview.library.example.com/(urn:isbn:0-395-36341-1)/(+hardcover)
xri://broadview.library.example.com/(urn:isbn:0-395-36341-1)/(+softcover)
xri://broadview.library.example.com/(urn:isbn:0-395-36341-1)/(+reference)
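The programmatic construction mentioned above might look like the following sketch. The function name book_xri and its parameters are hypothetical; the branch names, domain, ISBN, and book types are the ones used in the examples.

```python
def book_xri(branch, isbn, book_type=None, domain="library.example.com"):
    """Build an XRI that places an ISBN URN in the context of a library branch."""
    xri = f"xri://{branch}.{domain}/(urn:isbn:{isbn})"
    if book_type is not None:
        # The book type is added as a further cross-reference segment.
        xri += f"/(+{book_type})"
    return xri

for branch in ("broadview", "shoreline", "northgate"):
    print(book_xri(branch, "0-395-36341-1"))

print(book_xri("broadview", "0-395-36341-1", "hardcover"))
```

Each parenthesized segment is a cross-reference, so the same helper could embed other identifiers in place of the ISBN URN without changing its structure.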

Other examples of XRI 2.0 syntax

(Note that none of these show the prefix "xri://", which is optional in XRIs when they are not in URI normal form, i.e., when they have not undergone the specified transformation between XRI format and URI format.)

Example XRIs composed entirely of reassignable segments:

=Mary.Jones
@Jones.and.Company
+phone.number
+phone.number/(+area.code)
=Mary.Jones/(+phone.number)
@Jones.and.Company/(+phone.number)
@Jones.and.Company/((+phone.number)/(+area.code))

Example XRIs composed entirely of persistent segments:

=!13cf.4da5.9371.a7c5
@!280d.3822.17bf.ca48!78d2/!12

Example XRIs that mix persistent and reassignable segments (XRI allows any combination of the two):

=!13cf.4da5.9371.a7c5/(+phone.number)
@Jones.and.Company!78d2/!12/(+area.code)

Applications

Examples of applications being developed using XRI infrastructure include:

* OpenID 2.0 includes support for XRIs and uses XRDS for OpenID identifier discovery.
* The Higgins Project uses XRIs and XRDS to address and discover Higgins context providers.
* XDI.org I-name and I-number digital identity addressing services.
* The XDI data sharing protocol under development by the OASIS XDI Technical Committee.

Licensing

The XRI Technical Committee is chartered under the RF on Limited Terms Mode of the OASIS IPR policy (see http://www.oasis-open.org/committees/xri/ipr.php for more details).

Some people argue that the use of the technologies employed in XRI is subject to patent claims, and that the licensing rights to these patents have been vested in XDI.org, a non-profit organization that has in turn licensed a non-exclusive interest in the use of the patents to companies associated with the original patent holders, despite the above IPR statement.

References

1. ^ Failed OASIS
2. ^ Time for OASIS XRI TC and W3C TAG to Sit Down Together
3. ^ TAG recommends against XRI
4. ^ URNs, Namespaces and Registries
5. ^ Xri Solves Real Problems
6. ^ XRI and XDI Explained
7. ^ XRI in a Nutshell

See also

* I-names
* I-numbers
* XRDS
* XDI
* Dataweb
* Social Web
* Higgins project
* Project Xanadu

External links

* OASIS XRI Technical Committee specifications:
o XRI Syntax 2.0 Committee Specification
o XRI Resolution 2.0 Committee Specification
o XRI 2.0 FAQ
o XRI Requirements and Glossary 1.0
* W3C Internationalized Resource Identifier (IRI)
* XDI.org - public trust organization governing XRI global registry services
o XDI.org Global Services Specifications - website of XDI.org specifications for global registry services for public i-names and i-numbers
o XDI.org I-Services Specifications - website of XDI.org specifications for XRDS-enabled identity services.
* dev.xri.net - open public wiki on XRI and XRI open source projects
* Internet Identity Workshop One-Pager on XRI and XRDS
* FSF's Dispute with OASIS patent policies and on FSF's Support for OASIS RF on Limited Terms IPR Policy, which is used for ODF.
* EqualsDrummond - blog about XRI and Internet identifiers by Drummond Reed, co-chair of the OASIS XRI Technical Committee and Chief Architect at Cordance, currently under contract with XDI.org to operate XRI registr


Internationalized Resource Identifier

On the Internet, the Internationalized Resource Identifier (IRI) is a generalization of the Uniform Resource Identifier (URI), which is in turn a generalization of the Uniform Resource Locator (URL). While URIs are limited to a subset of the ASCII character set, IRIs may contain characters from the Universal Character Set (Unicode/ISO 10646), including Chinese characters, Japanese kanji, Korean, Cyrillic characters, and so forth.

It is defined by RFC 3987.
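Because URIs are restricted to ASCII, an IRI is usually mapped to a URI before it is sent over the network. The sketch below shows one simplified way to do this with Python's standard library: non-ASCII characters in the path, query, and fragment are percent-encoded as UTF-8, and the host is passed through Python's built-in idna codec. The helper name iri_to_uri is hypothetical, and the function ignores user information and IPv6 literals, so it is an approximation rather than a full implementation of RFC 3987.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def iri_to_uri(iri):
    """Map an IRI to an ASCII-only URI (simplified: no userinfo, no IPv6 literals)."""
    parts = urlsplit(iri)
    host = parts.hostname.encode("idna").decode("ascii") if parts.hostname else ""
    if parts.port:
        host += f":{parts.port}"
    # "%" is kept safe so existing percent-escapes are not encoded twice.
    path = quote(parts.path, safe="/:@!$&'()*+,;=%")
    query = quote(parts.query, safe="/?:@!$&'()*+,;=%")
    fragment = quote(parts.fragment, safe="/?:@!$&'()*+,;=%")
    return urlunsplit((parts.scheme, host, path, query, fragment))

print(iri_to_uri("http://example.org/ros\u00e9"))
```

An IRI that is already plain ASCII passes through unchanged, which reflects the statement that every URI is also an IRI.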

Advantages

There are reasons to display URIs in different languages: chiefly, it makes them easier to use for people who are unfamiliar with the Roman alphabet. Provided that it is not too difficult for users to enter arbitrary Unicode characters on their keyboards, this can make the URI system more international and accessible.

Disadvantages

Mixing IRIs and ASCII URIs can make it much easier to do phishing attacks which trick someone into believing they are on a site they really are not on. For example, one can replace the "a" in www.ebay.com or www.paypal.com with an internationalized look-alike "a" character, and point that IRI to a malicious site.
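The look-alike problem can be made concrete with a short sketch. The second hostname below replaces the Latin "a" with the Cyrillic letter U+0430, which many fonts render identically. The helper name non_ascii_characters is hypothetical; it only lists characters outside ASCII and is not a complete defence against spoofing.

```python
import unicodedata

def non_ascii_characters(hostname):
    """List each non-ASCII character in a hostname with its Unicode name."""
    return [(ch, unicodedata.name(ch, "UNKNOWN")) for ch in hostname if ord(ch) > 127]

genuine = "www.paypal.com"
lookalike = "www.p\u0430ypal.com"  # second letter is CYRILLIC SMALL LETTER A

print(genuine == lookalike)
print(non_ascii_characters(lookalike))
```

The two strings compare unequal even though they may look the same on screen, which is why showing the underlying characters can help expose such names.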

Additionally, it can be difficult for those with different language keyboards to access web resources in other languages; in contrast, open-source programming projects (and most programs) are almost exclusively written using the Roman alphabet to avoid this type of encoding incompatibility.

See also

* XRI (Extensible Resource Identifier)
* IDN (Internationalized Domain Name)
* Punycode

External links

* IRI
* Internationalized Resource Identifiers


Uniform Resource Identifier

In computing, a Uniform Resource Identifier (URI) is a string of characters used to identify or name a resource on the Internet. Such identification enables interaction with representations of the resource over a network, typically the World Wide Web, using specific protocols. URIs are defined in schemes specifying a specific syntax and associated protocols.

Relationship to URL and URN

[Figure: set diagram of URI scheme categories.] Schemes in the URL (locator) and URN (name) categories form subsets of URI and are, in general, disjoint sets. Technically, URLs and URNs both function as resource IDs; however, many schemes cannot be categorized as strictly one or the other, because all URIs can be treated as names, and some schemes embody aspects of both categories, or of neither.

Computer scientists may classify a URI as a locator (URL), or a name (URN), or both.

A Uniform Resource Name (URN) is like a person's name, while a Uniform Resource Locator (URL) resembles that person's street address. The URN defines an item's identity, while the URL provides a method for finding it.

The ISBN system for uniquely identifying books provides a typical example of the use of URNs. ISBN 0486275574 (urn:isbn:0-486-27557-4) cites unambiguously a specific edition of Shakespeare's play Romeo and Juliet. In order to gain access to this object and read the book, one would need its location: a URL address. A typical URL for this book on a Unix-like operating system is a file path, like file:///home/username/RomeoAndJuliet.pdf, identifying the electronic book saved on a local hard disk. So URNs and URLs have complementary purposes.

Technical view

A URL is a URI that, in addition to identifying a resource, provides means of acting upon or obtaining a representation of the resource by describing its primary access mechanism or network "location". For example, the URL http://www.wikipedia.org/ identifies a resource (Wikipedia's home page) and implies that a representation of that resource (such as the home page's current HTML code, as encoded characters) is obtainable via HTTP from a network host named www.wikipedia.org. A Uniform Resource Name (URN) is a URI that identifies a resource by name in a particular namespace. A URN can be used to talk about a resource without implying its location or how to access it. For example, the URN urn:isbn:0-395-36341-1 is a URI that specifies the identifier system, i.e. International Standard Book Number (ISBN), as well as the unique reference within that system and allows one to talk about a book, but doesn't suggest where and how to obtain an actual copy of it.
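The contrast between the two examples above can be sketched in Python. The helper name parse_urn is hypothetical; it splits a URN into the namespace identifier and the namespace-specific string, while the standard urlsplit function extracts the access-related parts of the URL.

```python
from urllib.parse import urlsplit

def parse_urn(urn):
    """Split a URN into (namespace identifier, namespace-specific string)."""
    scheme, namespace, name = urn.split(":", 2)
    if scheme.lower() != "urn":
        raise ValueError(f"not a URN: {urn!r}")
    return namespace, name

# The URN names a book within the ISBN system but says nothing about access.
print(parse_urn("urn:isbn:0-395-36341-1"))

# The URL carries an access mechanism (scheme) and a network location (host).
url = urlsplit("http://www.wikipedia.org/")
print(url.scheme, url.hostname)
```

Nothing in the URN's components corresponds to a host or a protocol, which is the practical sense in which it identifies a resource without locating it.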

Technical publications, especially standards produced by the IETF and the W3C, have long deprecated the term URL, as it is rarely necessary to distinguish between URLs and URIs. However, in nontechnical contexts and in software for the World Wide Web, the term URL remains widely used. Additionally, the term web address, which has no formal definition, is often used in nontechnical publications as a synonym for URL or URI, although it generally refers only to "http" and "https" URL schemes.

RFC 3305

Much of this discussion comes from RFC 3305, titled "Report from the Joint W3C/IETF URI Planning Interest Group: Uniform Resource Identifiers (URIs), URLs, and Uniform Resource Names (URNs): Clarifications and Recommendations". This RFC outlines the work of a joint W3C/IETF working group that was set up specifically to normalize the divergent views held within the IETF and W3C over what the relationship was between the various "UR*" terms and standards. While not published as a full standard by either organization, it has become the basis for the above common understanding and has informed many standards since then.

Syntax

The URI syntax is essentially a URI scheme name like "HTTP", "FTP", "mailto", "URN", "tel", "rtsp", "file", etc., followed by a colon character, and then a scheme-specific part. The specifications that govern the schemes determine the syntax and semantics of the scheme-specific part, although the URI syntax does force all schemes to adhere to a certain generic syntax that, among other things, reserves certain characters for special purposes, without always saying what those purposes are. The URI syntax also enforces restrictions on the scheme-specific part, in order to, for example, provide for a degree of consistency when the part has a hierarchical structure. Percent-encoding is an often-misunderstood aspect of URI syntax.
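The generic shape described above, a scheme name, a colon, and a scheme-specific part, can be sketched in Python together with a small percent-encoding round trip. The helper name split_uri is hypothetical; quote and unquote come from the standard urllib.parse module.

```python
from urllib.parse import quote, unquote

def split_uri(uri):
    """Split a URI at the first colon into (scheme, scheme-specific part)."""
    scheme, sep, rest = uri.partition(":")
    if not sep:
        raise ValueError(f"no scheme found in {uri!r}")
    return scheme, rest

print(split_uri("mailto:bob@example.com"))
print(split_uri("http://example.org/"))

# Percent-encoding replaces characters that are reserved or not allowed
# with "%" followed by two hexadecimal digits; decoding reverses it.
encoded = quote("a b&c")
print(encoded)
print(unquote(encoded))
```

How the scheme-specific part is interpreted after this first split is up to the specification of each scheme.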

History

Naming, addressing, and identifying resources

URIs and URLs have a shared history. Early in 1990, Tim Berners-Lee’s proposals for HyperText[2] implicitly introduced the idea of a URL as a short string representing a resource that is the target of a hyperlink. At the time, it was called a hypertext name or document name.[3]

Over the next three-and-a-half years, as the World Wide Web's core technologies of HTML (the HyperText Markup Language), HTTP, and Web browsers developed, a need to distinguish a string that provided an address for a resource from a string that merely named a resource emerged. Although not yet formally defined, the term Uniform Resource Locator came to represent the former, and the more contentious Uniform Resource Name came to represent the latter.

During the debate over how to best define URLs and URNs, it became evident that the two concepts embodied by the terms were merely aspects of the fundamental, overarching notion of resource identification. So, in June 1994, the IETF published Berners-Lee's RFC 1630: the first RFC that (in its non-normative text) acknowledged the existence of URLs and URNs, and, more importantly, defined a formal syntax for Universal Resource Identifiers — URL-like strings whose precise syntaxes and semantics depended on their schemes. In addition, this RFC attempted to summarize the syntaxes of URL schemes that were in use at the time. It also acknowledged, but did not standardize, the existence of relative URLs and fragment identifiers.

Refinement of specifications

In December 1994, RFC 1738 formally defined relative and absolute URLs, refined the general URL syntax, defined how relative URLs were to be resolved to absolute form, and better enumerated the URL schemes that were in use at the time. The definition and syntax of URNs was not settled upon until the publication of RFC 2141 in May 1997.

With the publication of RFC 2396 in August 1998, the URI syntax became a separate specification,[4] and most parts of RFCs 1630 and 1738 relating to URIs and URLs in general were revised and expanded. The new RFC changed the significance of the "U" in "URI": it came to represent "Uniform" rather than "Universal". The sections of RFC 1738 that summarized existing URL schemes were moved into a separate document.[1] IANA keeps a registry of those schemes;[2] the procedure to register them was first described in RFC 2717.

In December 1999, RFC 2732 provided a minor update to RFC 2396, allowing URIs to accommodate IPv6 addresses. Some time later, a number of shortcomings discovered in the two specifications led to the development of a number of draft revisions under the title rfc2396bis. This community effort, coordinated by RFC 2396 co-author Roy Fielding, culminated in the publication of RFC 3986 in January 2005. This RFC, as of 2009 the current version of the URI syntax recommended for use on the Internet, renders RFC 2396 obsolete. It does not, however, render the details of existing URL schemes obsolete; those are still governed by RFC 1738, except where otherwise superseded (RFC 2616, for example, refines the "http" scheme). The content of RFC 3986 was simultaneously published by the IETF as the full standard STD 66, reflecting the establishment of the URI generic syntax as an official Internet protocol.

In August 2002, RFC 3305 pointed out that the term URL has, despite its ubiquity in the vernacular of the Internet-aware public at large, faded into near-obsolescence. It now serves only as a reminder that some URIs act as addresses because they have schemes that imply some kind of network accessibility, regardless of whether systems actually use them for that purpose. As URI-based standards such as Resource Description Framework make evident, resource identification need not be coupled with the retrieval of resource representations over the Internet, nor does it need to be associated with network-bound resources at all.

On November 1, 2006, the W3C Technical Architecture Group published "On Linking Alternative Representations To Enable Discovery And Publishing", a guide to best practices and canonical URIs for publishing multiple versions of a given resource. For example, content might differ by language or by size to adjust for capacity or settings of the device used to access that content.

For the Semantic Web, the HTTP URI scheme can be used to identify both documents and concepts in the real world, which has caused confusion about how exactly to distinguish the two. The Technical Architecture Group (TAG) published an e-mail in June 2005 on how to solve this problem, which became known as the httpRange-14 resolution.[3] To explain this (rather brief) e-mail, the W3C published in March 2008 the Interest Group Note Cool URIs for the Semantic Web.[4] This note explains the use of content negotiation and the 303 redirect code in more detail.

URI reference

A URI reference is another type of string that represents a URI, and, in turn, the resource identified by that URI. Informal usage does not often maintain the distinction between a URI and a URI reference, but protocol documents should not allow for ambiguity.

A URI reference may take the form of a full URI, or just the scheme-specific portion of one, or even some trailing component thereof—even the empty string. An optional fragment identifier, preceded by "#", may be present at the end of a URI reference. The part of the reference before the "#" indirectly identifies a resource, and the fragment identifier identifies some portion of that resource.

In order to derive a URI from a URI reference, software converts the URI reference to "absolute" form by merging it with an absolute "base" URI, according to a fixed algorithm. The URI reference is considered to be relative to the base URI, although if the reference itself is absolute, then the base is irrelevant. The base URI is typically the URI that identifies the document containing the URI reference, although this can be overridden by declarations made within the document or as part of an external data transmission protocol. If a fragment identifier is present in the base URI, it is ignored during the merging process. If a fragment identifier is present in the URI reference, it is preserved during the merging process.
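The merging step can be sketched with `urljoin` from Python's standard `urllib.parse` module, which performs relative-reference resolution. The base URI and the references below are made-up values chosen for illustration; they are not taken from any specification.

```python
from urllib.parse import urljoin

# Hypothetical base URI; note that it carries its own fragment.
base = "http://example.org/docs/guide/index.html#intro"

# A relative reference is merged with the base. The fragment of the base
# is ignored, while the fragment of the reference is preserved.
relative = urljoin(base, "../images/logo.png#top")

# When the reference is already absolute, the base is irrelevant.
absolute = urljoin(base, "ftp://example.org/resource.txt")

print(relative)
print(absolute)
```

The first result keeps "#top" and discards "#intro", which mirrors the fragment rules described above.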

Web document markup languages frequently use URI references in places where there is a need to point to other resources, such as external documents or specific portions of the same logical document.

Uses of URI references in markup languages

* In HTML, the value of the src attribute of the img element is a URI reference, as is the value of the href attribute of the a or link element.
* In XML, the system identifier appearing after the SYSTEM keyword in a DTD is a fragmentless URI reference.
* In XSLT, the value of the href attribute of the xsl:import element/instruction is a URI reference, as is the first argument to the document() function.

Examples of absolute URIs

* http://example.org/absolute/URI/with/absolute/path/to/resource.txt
* ftp://example.org/resource.txt
* urn:issn:1535-3613

Examples of URI references

* http://en.wikipedia.org/wiki/URI#Examples_of_URI_references ("http" is the 'scheme' name, "en.wikipedia.org" is the 'authority', "/wiki/URI" the 'path' pointing to this article, and "#Examples_of_URI_references" is a 'fragment' pointing to this section.)
* http://example.org/absolute/URI/with/absolute/path/to/resource.txt
* /relative/URI/with/absolute/path/to/resource.txt
* relative/path/to/resource.txt
* ../../../resource.txt
* ./resource.txt#frag01
* resource.txt
* #frag01
* (empty string)
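As a rough illustration, the relative references in the list above can be resolved against a base URI with Python's `urllib.parse.urljoin`. The base URI used here is an assumption picked for the example; any absolute URI of a containing document would do.

```python
from urllib.parse import urljoin

# Assumed base URI of the document that contains the references.
base = "http://example.org/a/b/c/d.html"

references = [
    "/relative/URI/with/absolute/path/to/resource.txt",
    "relative/path/to/resource.txt",
    "../../../resource.txt",
    "./resource.txt#frag01",
    "resource.txt",
    "#frag01",
    "",  # the empty string is a valid URI reference
]

# Map every reference to its absolute form.
resolved = {ref: urljoin(base, ref) for ref in references}

for ref, target in resolved.items():
    print(f"{ref!r} -> {target}")
```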

URI resolution

To "resolve" a URI means either to convert a relative URI reference to absolute form, or to dereference a URI or URI reference by attempting to obtain a representation of the resource that it identifies. The "resolver" component in document processing software generally provides both services.

A URI reference can be a same-document reference: a reference to the document containing the URI reference itself. Document processing software is encouraged to use its current representation of the document to satisfy the resolution of a same-document reference; a new representation should not be fetched. This is only a recommendation, and document processing software is free to use other mechanisms to determine whether obtaining a new representation is warranted.

According to RFC 3986, the current URI specification as of 2009, a URI reference is a same-document reference if, when resolved to absolute form, it is identical to the base URI that is in effect for the reference. Typically, the base URI is the URI of the document containing the reference. XSLT 1.0, for example, has a document() function that, in effect, implements this functionality. RFC 3986 also formally defines URI equivalence, which can be used in order to determine that a URI reference, while not identical to the base URI, still represents the same resource and thus can be considered to be a same-document reference.

Same-document references were determined differently according to RFC 2396, which was made obsolete by RFC 3986 but still serves as the basis of many specifications and implementations. According to this specification, a URI reference is a same-document reference if it is an empty string or consists of only the "#" character followed by an optional fragment.
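The two definitions can be contrasted in a short Python sketch. The function names are hypothetical, and the RFC 3986 check compares the resolved reference with the base after setting fragments aside, which is an assumption based on the wording of RFC 3986 section 4.4; it uses plain string comparison and does not attempt the fuller URI equivalence rules.

```python
from urllib.parse import urljoin, urldefrag


def is_same_document_rfc2396(reference):
    """RFC 2396 style: empty, or '#' followed by an optional fragment."""
    return reference == "" or reference.startswith("#")


def is_same_document_rfc3986(reference, base):
    """RFC 3986 style: resolve first, then compare with the base URI.

    Fragments are stripped from both sides before comparing (assumption).
    """
    resolved = urljoin(base, reference)
    return urldefrag(resolved).url == urldefrag(base).url


# Hypothetical base URI of the containing document.
base = "http://example.org/dir/doc.html"

print(is_same_document_rfc2396("doc.html#sec"))        # purely lexical test
print(is_same_document_rfc3986("doc.html#sec", base))  # test after resolution
```

The reference "doc.html#sec" shows the difference: it fails the purely lexical RFC 2396 test but passes the resolution-based one.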

Relation to XML namespaces

XML has a concept of a namespace, an abstract domain to which a collection of element and attribute names can be assigned. An XML namespace is identified by a character string, the namespace name, which must adhere to the generic URI syntax. However, the namespace name is not considered to be a URI because the "URI-ness" of strings is, according to the URI specification, based on how they are intended to be used, not just their lexical components. A namespace name also does not necessarily imply any of the semantics of URI schemes; a namespace name beginning with "http:", for example, likely has nothing to do with the HTTP protocol. XML professionals have debated this intensively on the xml-dev electronic mailing list; some feel that a namespace name could be a URI, since the collection of names comprising a particular namespace could be considered to be a resource that is being identified, and since the Namespaces in XML specification says that the namespace name is a URI reference. But the consensus seems to suggest that a namespace name is just a string that happens to look like a URI, nothing more.
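The "just a string" view can be observed with Python's standard `xml.etree.ElementTree` parser, shown here as a minimal sketch. The two namespace names are invented for the example; they differ only in letter case, which would not matter for an HTTP host name but does matter for a string comparison. Parsing does not trigger any network access.

```python
import xml.etree.ElementTree as ET

# Two hypothetical namespace names that differ only in case.
document = (
    '<a:root xmlns:a="http://example.org/ns" '
    'xmlns:b="HTTP://EXAMPLE.ORG/ns">'
    "<a:item/><b:item/>"
    "</a:root>"
)

root = ET.fromstring(document)

# ElementTree expands each qualified name to "{namespace name}local-name".
tags = [child.tag for child in root]
print(tags)
```

The two item elements end up with different expanded names, so the parser treats them as belonging to different namespaces.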

Initially, the namespace name was allowed to match the syntax of any non-empty URI reference, but an erratum to the "Namespaces In XML Recommendation" later deprecated the use of relative URI references. A separate specification was issued for namespaces for XML 1.1, and allows IRI references, not just URI references, to be used as the basis for namespace names.

In order to mitigate the confusion that began to arise among newcomers to XML from the use of URIs (particularly HTTP URLs) for namespaces, a descriptive language called RDDL was developed, though the specification of RDDL (http://www.rddl.org/) has no official standing and has been neither considered nor approved by any organization (e.g., W3C). An RDDL document can provide machine- and human-readable information about a particular namespace and about the XML documents that use it. XML document authors were encouraged to put RDDL documents in locations such that if a namespace name in their document was somehow dereferenced, then an RDDL document would be obtained, thus satisfying the desire among many developers for a namespace name to point to a network-accessible resource.

See also

* .arpa - uri.arpa is for dynamic discovery
* Dereferenceable URI (an HTTP URI)
* History of the Internet
* IRI (Internationalized Resource Identifier)
* Namespace (programming)
* percent-encoding
* Persistent Uniform Resource Locator (PURL)
* Uniform Naming Convention (UNC), in computing
* URI scheme
* Uniform Resource Locator (URL)
* Uniform Resource Name (URN)
* Website
* XRI (Extensible Resource Identifier)

References

1. ^ This separate document is not explicitly linked, RFC 2717 and RFC 4395 point to the IANA registry as the official URI scheme registry.
2. ^ IANA registry of URI schemes[1]
3. ^ The httpRange-14 resolution consists of three bullet points and did not help much to reduce the confusion. http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039.html
4. ^ http://www.w3.org/TR/cooluris/

External links

* RFC 3986 / STD 66 (2005) – the current generic URI syntax specification
* RFC 2396 (1998) and RFC 2732 (1999) – obsolete, but widely implemented, version of the generic URI syntax
* RFC 1808 (1995) – obsolete companion to RFC 1738 covering relative URL processing
* RFC 1738 (1994) – mostly obsolete definition of URL schemes and generic URI syntax
* RFC 1630 (1994) – the first generic URI syntax specification; first acknowledgment of URLs in an Internet standard
* URI Schemes – IANA-maintained registry of URI Schemes
* URI Working Group – coordination center for development of URI standards
* Architecture of the World Wide Web, Volume One, §2: Identification – by W3C
* Example of discussion about names and addresses
* W3C materials related to Addressing
* W3C URI Clarification
* What's a URI and why does it matter? (2008) - from W3C
* The Self-Describing Web (2008) - from W3C



URL normalization
From Wikipedia, the free encyclopedia

URL normalization (or URL canonicalization) is the process by which URLs are modified and standardized in a consistent manner. The goal of the normalization process is to transform a URL into a normalized or canonical URL so it is possible to determine if two syntactically different URLs are equivalent.

Search engines employ URL normalization in order to assign importance to web pages and to reduce indexing of duplicate pages. Web crawlers perform URL normalization in order to avoid crawling the same resource more than once. Web browsers may perform normalization to determine if a link has been visited or to determine if a page has been cached.
Contents

* 1 Normalization process
* 2 Normalization based on URL lists
* 3 References
* 4 See also

Normalization process

There are several types of normalization that may be performed:

* Converting the scheme and host to lower case. The scheme and host components of the URL are case-insensitive. Most normalizers will convert them to lowercase. Example:

HTTP://www.Example.com/ → http://www.example.com/

* Adding a trailing slash. Directories are indicated with a trailing slash, which should be included in URLs. Example:

http://www.example.com → http://www.example.com/

* Removing directory index. Default directory indexes are generally not needed in URLs. Examples:

http://www.example.com/default.asp → http://www.example.com/
http://www.example.com/a/index.html → http://www.example.com/a/

* Converting the entire URL to lower case. Some web servers that run on top of case-insensitive file systems allow URLs to be case-insensitive. URLs from a case-insensitive web server may be converted to lowercase to avoid ambiguity. Example:

http://www.example.com/BAR.html → http://www.example.com/bar.html

* Capitalizing letters in escape sequences. All letters within a percent-encoding triplet (e.g., "%3A") are case-insensitive, and should be capitalized. Example:

http://www.example.com/a%c2%b1b → http://www.example.com/a%C2%B1b

* Removing the fragment. The fragment component of a URL is usually removed. Example:

http://www.example.com/bar.html#section1 → http://www.example.com/bar.html

* Removing the default port. The default port (port 80 for the “http” scheme) may be removed from (or added to) a URL. Example:

http://www.example.com:80/bar.html → http://www.example.com/bar.html

* Removing dot-segments. The segments “..” and “.” are usually removed from a URL according to the algorithm described in RFC 3986 (or a similar algorithm). Example:

http://www.example.com/../a/b/../c/./d.html → http://www.example.com/a/c/d.html

* Removing “www” as the first domain label. Some websites operate in two Internet domains: one whose least significant label is “www” and another whose name is the result of omitting the least significant label from the name of the first. For example, http://example.com/ and http://www.example.com/ may access the same website. Although many websites redirect the user to the non-www address (or vice versa), some do not. A normalizer may perform extra processing to determine if there is a non-www equivalent and then normalize all URLs to the non-www prefix. Example:

http://www.example.com/ → http://example.com/

* Sorting the variables of active pages. Some active web pages have more than one variable in the URL. A normalizer can remove all the variables with their data, sort them into alphabetical order (by variable name), and reassemble the URL. Example:

http://www.example.com/display?lang=en&article=fred → http://www.example.com/display?article=fred&lang=en

* Removing arbitrary querystring variables. An active page may expect certain variables to appear in the querystring; all unexpected variables should be removed. Example:

http://www.example.com/display?id=123&fakefoo=fakebar → http://www.example.com/display?id=123

* Removing default querystring variables. A default value in the querystring will render identically whether it is there or not. When a default value appears in the querystring, it should be removed. Example:

http://www.example.com/display?id=&sort=ascending → http://www.example.com/display

* Removing the "?" when the querystring is empty. When the querystring is empty, there is no need for the "?". Example:

http://www.example.com/display? → http://www.example.com/display
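Several of the steps listed above can be combined into one function. The following Python sketch is only one possible arrangement, not a reference implementation: the names `normalize`, `remove_dot_segments`, `upper_escapes` and the `keep_params` argument are hypothetical, the code assumes http-like URLs with an authority part, and site-specific steps (directory indexes, "www" removal, lowercasing the path, default query values) are left out on purpose.

```python
import re
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def remove_dot_segments(path):
    """Drop "." and ".." segments of an absolute path (cf. RFC 3986, 5.2.4)."""
    segments = path.split("/")
    output = []
    for segment in segments:
        if segment == ".":
            continue
        if segment == "..":
            if len(output) > 1:  # never pop the leading empty segment
                output.pop()
            continue
        output.append(segment)
    if segments[-1] in (".", ".."):
        output.append("")  # a trailing dot-segment leaves a trailing slash
    return "/".join(output)


def upper_escapes(text):
    """Capitalize the hex digits of percent-encoded triplets."""
    return re.sub(r"%[0-9a-fA-F]{2}", lambda m: m.group(0).upper(), text)


def normalize(url, keep_params=None):
    parts = urlsplit(url)
    scheme = parts.scheme        # urlsplit lowercases the scheme
    host = parts.hostname or ""  # the hostname property is lowercased
    if ":" in host:              # IPv6 literal: restore the brackets
        host = f"[{host}]"
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        host = f"{host}:{parts.port}"  # keep only non-default ports
    if "@" in parts.netloc:      # keep any userinfo untouched
        host = parts.netloc.rsplit("@", 1)[0] + "@" + host

    # An empty path becomes "/", dot-segments are removed.
    path = upper_escapes(remove_dot_segments(parts.path)) or "/"

    # Split the query into variables, optionally filter, then sort by name.
    pairs = [p for p in parts.query.split("&") if p]
    if keep_params is not None:
        pairs = [p for p in pairs if p.split("=", 1)[0] in keep_params]
    pairs.sort(key=lambda p: p.split("=", 1)[0])
    query = upper_escapes("&".join(pairs))

    # The fragment is dropped; "?" is only emitted for a non-empty query.
    result = f"{scheme}://{host}{path}"
    if query:
        result += "?" + query
    return result


print(normalize("HTTP://www.Example.com:80/../a/b/../c/./d.html#section1"))
```

Passing `keep_params={"id"}` keeps only the variables a page is known to expect, which corresponds to the removal of arbitrary querystring variables.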

Normalization based on URL lists

Some normalization rules may be developed for specific websites by examining URL lists obtained from previous crawls or web server logs. For example, if the URL

http://foo.org/story?id=xyz

appears in a crawl log several times along with

http://foo.org/story_xyz

we may assume that the two URLs are equivalent and can be normalized to one of the URL forms.
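A site-specific rule of this kind could be written as a simple rewrite. The sketch below is hypothetical: the regular expression, the function name `apply_site_rule`, and the choice of the story_xyz form as the canonical one are assumptions made for the example, and such a rule would only be trustworthy after the equivalence has been checked against the site.

```python
import re

# Hypothetical rule inferred from a crawl log for foo.org.
STORY_RULE = re.compile(r"^http://foo\.org/story\?id=([^&#]+)$")


def apply_site_rule(url):
    """Rewrite story?id=<x> URLs to the story_<x> form; leave others alone."""
    match = STORY_RULE.match(url)
    if match:
        return f"http://foo.org/story_{match.group(1)}"
    return url


print(apply_site_rule("http://foo.org/story?id=xyz"))
```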

Schonfeld et al. (2006) present a heuristic called DustBuster for detecting DUST (different URLs with similar text) rules that can be applied to URL lists. They showed that once the correct DUST rules were found and applied with a canonicalization algorithm, they were able to find up to 68% of the redundant URLs in a URL list.

References

* RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax

* Sang Ho Lee, Sung Jin Kim, and Seok Hoo Hong (2005). "On URL normalization". Proceedings of the International Conference on Computational Science and its Applications (ICCSA 2005): 1076-1085.

* Gautam Pant, Padmini Srinivasan, and Filippo Menczer (2004). "Crawling the Web". Web Dynamics: Adapting to Change in Content, Size, Topology and Use, edited by M. Levene and A. Poulovassilis: 153-178.

* Uri Schonfeld, Ziv Bar-Yossef, and Idit Keidar (2006). "Do not crawl in the DUST: different URLs with similar text". Proceedings of the 15th international conference on World Wide Web: 1015-1016.

* Uri Schonfeld, Ziv Bar-Yossef, and Idit Keidar (2007). "Do not crawl in the DUST: different URLs with similar text". Proceedings of the 16th international conference on World Wide Web: 111-120.

See also

* Web crawler
* Uniform Resource Locator



URI scheme

In the field of computer networking, a URI scheme is the top level of the Uniform Resource Identifier (URI) naming structure. All URIs and absolute URI references are formed with a scheme name, followed by a colon character (":"), and the remainder of the URI called (in the outdated RFCs 1738 and 2396, but not the current STD 66/RFC 3986) the scheme-specific part. The syntax and semantics of the scheme-specific part are left largely to the specifications governing individual schemes, subject to certain constraints such as reserved characters and how to "escape" them.

URI schemes are sometimes erroneously referred to as "protocols", or specifically as URI protocols or URL protocols, since most were originally designed to be used with a particular protocol, and often have the same name. The http scheme, for instance, is generally used for interacting with Web resources using HyperText Transfer Protocol. Today, URIs with that scheme are also used for other purposes, such as RDF resource identifiers and XML namespaces, that are not related to the protocol. Furthermore, some URI schemes are not associated with any specific protocol (e.g. "file") and many others do not use the name of a protocol as their prefix (e.g. "news").

URI schemes should be registered with IANA, although non-registered schemes are used in practice. RFC 4395 describes the procedures for registering new URI schemes.
Contents

* 1 Generic syntax
o 1.1 Examples
* 2 Official IANA-registered schemes
* 3 Unofficial but common URI schemes
* 4 External links

Generic syntax

Internet standard STD 66 (also RFC 3986) defines the generic syntax to be used in all URI schemes. Every URI is defined as consisting of four parts, as follows:

<scheme name> : <hierarchical part> [ ? <query> ] [ # <fragment> ]

The scheme name consists of a letter followed by any combination of letters, digits, and the plus ("+"), period ("."), or hyphen ("-") characters; and is terminated by a colon (":").
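The rule for scheme names translates directly into a regular expression. This is a minimal Python sketch; the function name `is_valid_scheme_name` is made up for the example, and the check covers only the syntax of the name, not whether the scheme is registered.

```python
import re

# A letter, followed by any combination of letters, digits, "+", "." or "-".
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme_name(name):
    """Return True if the name (without the trailing colon) is well formed."""
    return SCHEME_RE.fullmatch(name) is not None


for candidate in ["http", "z39.50r", "svn+ssh", "view-source", "1http", "my_scheme"]:
    print(candidate, is_valid_scheme_name(candidate))
```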

The hierarchical part of the URI is intended to hold identification information hierarchical in nature. Usually this part begins with a double forward slash ("//"), followed by an authority part and an optional path.

* The authority part holds an optional user information part terminated with "@" (e.g. username:password@), a hostname (i.e. domain name or IP address), and an optional port number preceded by a colon ":".

* The path part is a sequence of segments (conceptually similar to directories, though not necessarily representing them) separated by a forward slash ("/"). Each segment can contain parameters separated from it using a semicolon (";"), though this is rarely used in practice.

The query is an optional part separated with a question mark, which contains additional identification information which is not hierarchical in nature. The query string syntax is not generically defined, but is commonly organized as a sequence of <key>=<value> pairs separated by a semicolon[1][2][3] or separated by an ampersand, e.g. key1=value1&key2=value2&key3=value3 or key1=value1;key2=value2;key3=value3.
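Both separator conventions can be parsed with `parse_qsl` from Python's `urllib.parse`, as sketched below. The `separator` argument is assumed to be available; it exists in Python 3.10 and in later patch releases of some older versions, and the default separator is the ampersand.

```python
from urllib.parse import parse_qsl

# Ampersand-separated pairs (the default separator).
ampersand_pairs = parse_qsl("key1=value1&key2=value2&key3=value3")

# Semicolon-separated pairs need the separator to be named explicitly.
semicolon_pairs = parse_qsl("key1=value1;key2=value2;key3=value3", separator=";")

print(ampersand_pairs)
print(semicolon_pairs)
```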

The fragment is an optional part separated from the front parts by a hash ("#"). It holds additional identifying information that provides direction to a secondary resource, e.g. a section heading in an article identified by the remainder of the URI. When the primary resource is an HTML document, the fragment is often an id attribute of a specific element and web browsers will make sure this element is visible.

Examples

The following are two example URIs and their component parts (taken loosely from RFC 3986 — STD 66):

foo://username:password@example.com:8042/over/there/?name=ferret#nose
\_/   \_______________/ \_________/ \__/\__________/ \_________/ \__/
 |            |              |       |       |            |       |
scheme    userinfo       hostname   port    path        query fragment
      \________________________________/
                  authority

urn:example:animal:ferret:nose
\_/ \________________________/
 |              |
scheme         path
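The same decomposition can be reproduced with `urlsplit` from Python's `urllib.parse`, shown here as an illustrative sketch. The dictionary keys simply reuse the component names from the text; note that `urlsplit` reports the authority under the name `netloc`.

```python
from urllib.parse import urlsplit

parts = urlsplit(
    "foo://username:password@example.com:8042/over/there/?name=ferret#nose"
)

components = {
    "scheme": parts.scheme,
    "authority": parts.netloc,
    "userinfo": f"{parts.username}:{parts.password}",
    "hostname": parts.hostname,
    "port": parts.port,
    "path": parts.path,
    "query": parts.query,
    "fragment": parts.fragment,
}

# A URI without an authority part has only a scheme and a path.
urn = urlsplit("urn:example:animal:ferret:nose")

for name, value in components.items():
    print(f"{name:10} {value}")
print(urn.scheme, urn.path)
```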

1. ^ RFC 1866 section 8.2.1 : by Tim Berners-Lee in 1995 encourages CGI authors to support ';' in addition to '&'.
2. ^ HTML 4.01 Specification: Implementation, and Design Notes: "CGI implementors support the use of ";" in place of "&" to save authors the trouble of escaping "&" characters in this manner."
3. ^ Hypertext Markup Language - 2.0 "CGI implementors are encouraged to support the use of ';' in place of '&' "

Official IANA-registered schemes

The official URI schemes registered with the IANA follow.
Scheme Purpose Defined by General format Notes
aaa Diameter Protocol RFC 3588 aaa://[:][;transport=][;protocol=]

example:
aaa://host.example.com:1813;transport=udp;protocol=radius

aaas Secure equivalent of aaa RFC 3588 aaas://[:][;transport=][;protocol=]
acap Application Configuration Access Protocol RFC 2244 acap://[[;AUTH=]@][:]/ URL scheme used within the ACAP protocol for the "subdataset" attribute, referrals and inheritance
cap Calendar access protocol RFC 4324 generic syntax URL scheme used to designate both calendar stores and calendars accessible using the CAP protocol
cid Referencing individual parts of an SMTP/MIME message RFC 2392 cid: e.g. referencing an attached image within a formatted e-mail. (See also mid:)
crid TV-Anytime Content Reference Identifier RFC 4078 crid:/// Allow references to scheduled publications of broadcast media content.
data Inclusion of small data items inline RFC 2397 data:[;base64],
dav HTTP Extensions for Distributed Authoring (WebDAV) RFC 2518 dav: Used for internal identifiers only; WebDAV itself addresses resources using the http: and https: schemes. [1]
dict Dictionary service protocol RFC 2229 dict://;@:/d:::

dict://;@:/m::::
refer to definitions or word lists available using the DICT protocol
dns Domain Name System RFC 4501 dns:[//[:]/][?]

examples:
dns:example?TYPE=A;CLASS=IN
dns://192.168.1.1/ftp.example.org?type=A
designates a DNS resource record set, referenced by domain name, class, type, and, optionally, the authority
fax Used for telefacsimile numbers RFC 2806 fax: Seems to be deprecated in RFC 3966 in favour of tel:
file Addressing files on local or network file systems RFC 1738 generic syntax
(often appears as file:///path, the 3rd '/' is the final delimiter when no host (authority) is specified between) Unusual in not being bound to any network protocol, and not usable in an Internet context.
ftp FTP resources RFC 1738 generic syntax
go Common Name Resolution Protocol RFC 3368 go://[]?[]*[;=[,]] or
go:*[;=[,]]
gopher Used with Gopher protocol RFC 4266 gopher://://
h323 Used with H.323 multimedia communications RFC 3508 h323:[@][:][;]
http HTTP resources RFC 2616 generic syntax
https HTTP connections secured using SSL/TLS RFC 2817 generic syntax
icap Internet Content Adaptation Protocol RFC 3507
im Instant messaging protocol RFC 3860 RFC 4622 im:[@] Works as xmpp: URI for single user chat sessions.
imap Accessing e-mail resources through IMAP RFC 2192 imap://[[;AUTH=]@][:]/
info Information Assets with Identifiers in Public Namespaces RFC 4452
ipp Internet Printing Protocol RFC 3510
iris
iris.beep
iris.xpc
iris.xpcs
iris.lws Internet Registry Information Service RFC 3981 RFC 3983 RFC 4992 RFC 4992 RFC 4993
ldap LDAP directory request RFC 2255
RFC 4516 ldap://[[:]][/ [?[][?[][?[][?]]]]]

example:
ldap://ldap1.example.net:6666/o=University%20of%20Michigan, c=US??sub?(cn=Babs%20Jensen)

mailto SMTP e-mail addresses and default content RFC 2368 mailto:
[?=[&=]]

example:
mailto:jsmith@example.com?subject=A%20Test&body=My%20idea%20is%3A%20%0A
Headers are optional, but often include subject=; body= can be used to pre-fill the body of the message.
mid Referencing SMTP/MIME messages, or parts of messages. RFC 2392 mid:[/] (See also cid:)
modem modem RFC 3966
msrp
msrps Message Session Relay Protocol RFC 4975
mtqp Message Tracking Query Protocol RFC 3887
mupdate Mailbox Update Protocol RFC 3656
news (Usenet) newsgroups and postings RFC 1738 news: or
news: References a particular resource, regardless of location.
nfs Network File System resources RFC 2224 generic syntax
nntp Usenet NNTP RFC 1738 nntp://:// Referencing a specific host is often less useful than referencing the resource generically, as NNTP servers are not always publicly accessible
opaquelocktoken opaquelocktoken RFC 4918
pop Accessing mailbox through POP3 RFC 2384 pop://[[;AUTH=]@][:]
pres Used in Common Profile for Presence (CPP) to identify presence RFC 3859 pres:
[?=[&=]] Similar to "mailto:"
prospero Prospero Directory Service RFC 4157 Listed as "Historical" by IANA.
rtsp Real Time Streaming Protocol RFC 2326
service RFC 2609
shttp Secure HTTP RFC 2660 Largely superseded by HTTPS.
sip Used with Session Initiation Protocol (SIP) RFC 3969
RFC 3261 sip:[:]@[:][;][?]

examples:
sip:alice@atlanta.com?subject=project%20x&priority=urgent
sip:+1-212-555-1212:1234@gateway.com;user=phone

sips Secure equivalent of sip RFC 3969
RFC 3261 sips:[:]@[:][;][?]
snmp Simple Network Management Protocol RFC 4088 snmp://[user@]host[:port][/[[;]][/]]

examples:
snmp://example.com//1.3.6.1.2.1.1.3+
snmp://tester5@example.com:8161/bridge1;800002b804616263

soap.beep
soap.beeps RFC 3288
tag RFC 4151
tel Used for telephone numbers RFC 3966
RFC 2806 tel:
telnet Used with telnet RFC 4248 telnet://:@[:/]
tftp Trivial File Transfer Protocol RFC 3617
thismessage multipart/related relative reference resolution RFC 2557
tip Transaction Internet Protocol RFC 2371
tv TV Broadcasts RFC 2838
urn Uniform Resource Names RFC 2141 urn::
vemmi Versatile Multimedia Interface RFC 2122
wais Used with Wide area information server (WAIS) RFC 4156 wais://:/[?] or wais://:/// Listed as "Historical" by IANA.
xmlrpc.beep
xmlrpc.beeps RFC 3529
xmpp XMPP (Jabber) RFC 5122 xmpp:@[:]/[][?]
z39.50r Z39.50 retrieval RFC 2056 z39.50r://[:]/?[;esn=][;rs=]
z39.50s Z39.50 session RFC 2056 z39.50s://[:]/[][?][;esn=][;rs=]
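Because scheme-specific parts still follow the generic syntax, a generic parser can often take such URIs apart. The Python sketch below uses the mailto example from the table; treating the path as the recipient and the query as header fields is an interpretation for this example rather than a full implementation of the mailto scheme.

```python
from urllib.parse import urlsplit, parse_qs

parts = urlsplit(
    "mailto:jsmith@example.com?subject=A%20Test&body=My%20idea%20is%3A%20%0A"
)

recipient = parts.path           # everything between "mailto:" and "?"
headers = parse_qs(parts.query)  # percent-escapes are decoded here

print(parts.scheme)
print(recipient)
print(headers)
```

The decoded body ends with a space and a line feed, since %20 and %0A are the percent-encoded forms of those characters.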

Unofficial but common URI schemes
Scheme Purpose Defined by General format Notes
about Displaying product information and internal information Un-standardised
about:blank is commonly used to display a blank page. Widely used by web browsers, sometimes even providing interactive resources. The Opera web browser uses opera: instead.
adiumxtra Direct installation of Adium Xtras (plugins). The Adium Team adiumxtra://www.adiumxtras.com/download/0000 0000 refers to a specific Xtra
aim Controlling AOL Instant Messenger. AOL aim:? Functions include goim, addbuddy, and buddyicon.
afp Accessing Apple Filing Protocol shares IETF Draft over TCP/IP: afp://[@][:][/[]]

over AppleTalk: afp:/at/[@][:][/]

aw Link to an Active Worlds world Activeworlds Inc. aw://:/ Mostly found in HTTP referers when users open a website from within a Active Worlds world.
bolo Join an existing bolo game. bolo:/// Mostly passed via IRC or via tracker servers.
callto Launching Skype call (+And in Hungary the KLIP Software call too) (unofficial; see also skype:) Skype callto: or
callto: [2] Introduced with Microsoft NetMeeting. Works with current version of Skype with Firefox, Internet Explorer and Safari
chrome Specifies user interfaces built using XUL in Mozilla-based browsers. Mozilla chrome:///
/ (Where
is either "content", "skin" or "locale") Works only in Mozilla-based browsers such as Firefox, SeaMonkey and Netscape.
cvs Provides a link to a Concurrent Versions System (CVS) Repository Concurrent Versions System cvs://@/;[date=date to retrieve | tag=tag to retrieve]
ed2k Resources available using the eDonkey2000 network eDonkey2000 ed2k://|file||||/ or
ed2k://|server|||/ Links to servers are also possible, as are additional parameters. Official documentation from eDonkey2000 website at the Internet Archive
feed web feed subscription feed: or
feed://

examples:
feed://example.com/rss.xml
feed:https://example.com/rss.xml
See Feed URI scheme for a detailed overview of common implementations, supported software, and critics.
fish Accessing another computer's files using the SSH protocol fish KDE kioslave fish://[[:]@][:] See Files transferred over shell protocol for details about the protocol.
gg Starting chat with Gadu-Gadu user Gadu-Gadu gg:
gizmoproject Gizmo Project calling link. gizmoproject://call?id= May use sip:// instead of gizmoproject:// in recent versions of Gizmo.
iax2 Inter-Asterisk eXchange protocol version 2 IETF Draft iax2:[@][:][/[?]]

examples:
iax2:[2001:db8::1]:4569/alice?friends
iax2:johnQ@example.com/12022561414
irc Connecting to a server to join a channel. IETF Draft
Old IETF Draft irc://[:]/[[?]] Assuming the client knows a server associated with the name, "host" may optionally be an IRC network name.
ircs Secure equivalent of irc IETF Draft ircs://[:]/[[?]] See irc
itms Used for connecting to the iTunes Music Store Apple Inc itms:
jar Compressed archive member Java API jar:!/[] Works for any ZIP based file.
javascript Execute javascript code IETF Draft javascript: Works in any modern browser.
keyparc Keyparc encrypt/decrypt resource. keyparc://encrypt// or

keyparc://decrypt//
lastfm Connecting to a radio stream from Last.fm. Last.fm lastfm:// or lastfm://globaltags/ or
lastfm://user//
ldaps Secure equivalent of ldap ldaps://[[:]][/ [?[][?[][?[][?]]]]] Not an IETF standard, but commonly used in applications.
magnet "magnet links" Magnet-URI Project magnet:?xt=urn:sha1:&dn=
(other parameters are also possible) Used by various peer-to-peer clients, usually providing the hash of a file to be located on the network.
mms Windows streaming media mms://:/ Used by Windows Media Player to stream audio and/or video.
msnim Adding a contact, or starting a conversation in Windows Live Messenger Windows Live Messenger Add a contact to the buddy list

msnim:add?contact=nada@nowhere.com
Start a conversation with a contact
msnim:chat?contact=nada@nowhere.com
Start a voice conversation with a contact
msnim:voice?contact=nada@nowhere.com
Start a video conversation with a contact
msnim:video?contact=nada@nowhere.com
Can be invoked from a web page or via a run command or an ie browser URL (won't work with firefox 2.0.0.8). For web pages use this HTML:
Click to chat!
mvn Access Apache Maven repository artifacts OPS4J mvn:org.ops4j.pax.web.bundles/service/0.2.0-SNAPSHOT
mvn:http://user:password@repository.ops4j.org/maven2!org.ops4j.pax.web.bundles/service/0.2.0
notes Open a Lotus Notes document or database Lotus Notes notes://
Used by IBM Lotus Notes to refer to documents and databases stored within the Lotus Notes system. When clicked in a browser on a computer with Lotus Notes client installed, Notes will open the document link as if a Notes DocLink were clicked within Notes.
psyc Used to identify or locate a person, group, place or a service and specify its ability to communicate PSYC psyc:[//[:[][]]/[][#] -
paparazzi:http Used to launch and automatically take a screen shot using the application "Paparazzi" (Mac only) Derailer paparazzi:http:[//[:[][]]/ Official documentation from Paparazzi website
rmi Look up a Java object in an RMI registry. Sun rmi://[:]/ URI scheme understood by JNDI. Can be used to lookup a remote Java object within an RMI registry (typically for the purposes of RMI on that object). Host/port in the URI are of the rmiregistry process, not the remote object.
rsync rsync rsync://[:]/
secondlife Open the Map floater in Second Life application to teleport the resident to the location. Linden Lab secondlife:///// Used by SLurl.com. Knowledge base article.
sgn Social Graph Node Mapper Google example:

sgn://social-network.example.com/?ident=bob
Official documentation from sgnodemapper project.
skype Launching Skype call (official; see also callto:) Skype skype:[?[add|call|chat|sendfile|userinfo]] Official documentation from Skype website.
ssh SSH connections (like telnet:) and IETF Draft ssh://[[;fingerprint=]@][:]
sftp SFTP file transfers (not to be confused with FTPS (FTP/SSL)) IETF Draft sftp://[[;fingerprint=]@][:]//
smb Accessing SMB/CIFS shares IETF Draft smb://[@][:][/[]][?=[;=]] or
smb://[@][:][/]
sms Interact with SMS-capable devices for composing and sending messages. IETF draft sms:? Should be used as a subset of the tel: scheme.[citation needed]
soldat Joining servers Soldat soldat://:/

example:
soldat://127.0.0.1:23073/
Official note in Manual
steam Interact with Steam: install apps, purchase games, run games, etc. Steam, Valve Corporation steam: or
steam:/// Official documentation from Valve Developer Community website
svn Provides a link to a Subversion (SVN) source control repository Subversion (software) svn[+ssh]://@<:port>/
teamspeak Joining a server. TeamSpeak teamspeak://[:]/[?=[&=]] Official documentation from TeamSpeak Website
unreal Joining servers Unreal unreal://[:]/ Unreal legacy "protocol"
ut2004 Joining servers Unreal Tournament 2004 ut2004://[:][/?] Documentation from Unreal Developer Network
ventrilo Joining a server. Ventrilo ventrilo://[:]/[?=[&=]] Official documentation from Ventrilo Website
view-source Shows a web page as code 'in the raw'. Mozilla view-source:

example:
view-source:http://en.wikipedia.org/wiki/URI_scheme
See ??? for details.
webcal Subscribing to calendars in iCalendar format iCalendar webcal://

example:
webcal://example.com/calendar.ics
HTTP as a transport protocol is assumed.
See Webcal for details.
wtai Wireless Telephony Application Interface WAP Forum wtai://wp/mc/+18165551212 See Application Protocol Wireless Application Environment Specification Version 1.1 for details.
wyciwyg What You Cache Is What You Get WYCIWYG Mozilla wyciwyg:// See WYCIWYG for details.
xfire Adding friends and servers, joining servers, changing status text. Xfire xfire:[?=[&=]] Official documentation from Xfire website
xri eXtensible Resource Identifier (XRI) OASIS XRI Technical Committee xri://[/[]][?][#fragment] Official documentation from OASIS XRI Technical Committee
ymsgr Sending an instant message to a Yahoo! Contact. Yahoo! Messenger ymsgr:sendIM?
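
Several of the example URIs in the table above follow the generic URI syntax closely enough to be taken apart with an ordinary URL parser. The following is a minimal sketch using Python's standard urllib.parse module and the examples given for webcal, soldat, sgn and msnim. The helper names (webcal_to_http, host_and_port, query_params, msnim_action) are hypothetical and chosen for illustration only; they are not part of any of the applications listed. The webcal rewrite simply applies the note that HTTP is assumed as the transport.

```python
from urllib.parse import urlsplit, parse_qs


def webcal_to_http(uri):
    """Rewrite a webcal:// URI as http://, since HTTP transport is assumed."""
    parts = urlsplit(uri)
    if parts.scheme != "webcal":
        raise ValueError("not a webcal URI: %r" % uri)
    # Only the scheme changes; host, path and query are kept as they are.
    return parts._replace(scheme="http").geturl()


def host_and_port(uri):
    """Return (host, port) from a scheme://host:port/ style URI."""
    parts = urlsplit(uri)
    return parts.hostname, parts.port


def query_params(uri):
    """Return the query string of a URI as a dict of value lists."""
    return parse_qs(urlsplit(uri).query)


def msnim_action(uri):
    """Split an msnim: URI into its action name and its query parameters."""
    parts = urlsplit(uri)
    # msnim: URIs have no "//" authority part, so the action is the path.
    return parts.path, parse_qs(parts.query)


if __name__ == "__main__":
    print(webcal_to_http("webcal://example.com/calendar.ics"))
    print(host_and_port("soldat://127.0.0.1:23073/"))
    print(query_params("sgn://social-network.example.com/?ident=bob"))
    print(msnim_action("msnim:chat?contact=nada@nowhere.com"))
```

A generic parser only recovers the syntactic pieces. What an application does with an action such as chat or with a host and port pair is defined by the application that registers the scheme, not by the parser.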

External links

* Official IANA Registry of URI Schemes
* More information, including many more schemes

URI scheme
Official
aaa: · aaas: · acap: · cap: · cid: · crid: · data: · dav: · dict: · dns: · fax: · file: · ftp: · go: · gopher: · h323: · http: · https: · im: · imap: · info: · ldap: · mailto: · mid: · news: · nfs: · nntp: · pop: · pres: · rtsp: · sip: · sips: · snmp: · tel: · telnet: · urn: · wais: · xmpp:
Unofficial
about: · aim: · apt: · bolo: · bzr: · callto: · cel: · cvs: · daap: · ed2k: · feed: · fish: · gg: · git: · gizmoproject: · iax2: · irc: · ircs: · itms: · lastfm: · ldaps: · magnet: · mms: · msnim: · psyc: · rsync: · secondlife: · skype: · ssh: · svn: · sftp: · smb: · sms: · soldat: · steam: · unreal: · ut2004: · vzochat: · webcal: · xfire: · ymsgr: · wyciwyg: