How to Install SQL Server 2008
SQL Server 2008 is relatively easy to install, but it does take a little knowledge of the process and a little planning. For most shops the planning phase can be minimal, but there will be instances (like clustering) when you’ll need to plan quite a bit. This will not be one of those cases. Today we’re going to discuss a straightforward SQL Server 2008 install. Unlike some of the other Microsoft products, there are a lot of screens to go through when installing SQL Server 2008, but most of them aren’t that bad. You just need to know what choices to make.
Before we walk through the screens and the choices you’ll be making, let’s go over a couple things you’re going to need to know before you get started. First of all you need to make sure your system meets the minimum requirements for the version of SQL Server 2008 you’re installing. That really isn’t very difficult these days but it’s best to check anyway. You can find a list of the minimum requirements here.
Assuming that you’ll be installing this on a production system, it’s important to know that there is at least one system reboot required for this install. That’s because the first thing this install will do is upgrade your version of Windows Installer to 4.5. If you’ve already got Windows Installer 4.5 then setup will not require the reboot. Next it will install version 3.5 of the .NET Framework. On most boxes this shouldn’t require a reboot. I tell you this because if you’re installing SQL Server 2008 on a current production box, you’ll need to plan ahead for the reboot and perhaps do it a few days before you install SQL Server 2008 so you’ve got it out of the way when the time comes.
Ok, now that we’ve taken care of the preliminaries, let’s install SQL Server 2008.
1. The media will autorun and present you with this screen.
Notice how much info Microsoft gives you right away. Over on the left there are menu choices and the options on the right change as you go through the menus. You want to click on the Installation menu on the left to be presented with this screen.
2. You can see there are many options here, but in this case the only one that makes sense is to click the top option for a stand-alone install. Click that option.
3. Once you choose the stand-alone install, setup will install the setup support files. It does this every time you run the installer so don’t be surprised if you run it again and again and it installs them every time. At this point setup will also run some checks to make sure that your system is ready to run through the install process. If it’s not and something fails, you’ll see an error similar to the one below. In my case I have a reboot pending and setup will not continue until I reboot.
If you don’t have any problems with the checks, the screen will be all green like this:
I might also add that if you have to reboot, once your server comes back up setup should continue on its own and re-run this check and if all errors have been eliminated, it should present you with the above screen just as if there had never been a problem.
4. Click OK. Next you will be presented with the setup support rules page. This is another set of checks that tells you whether things will complete and work properly once setup completes. The screen looks like this.
And again, warnings can be ok, but if there are any errors you’ll need to resolve them before moving forward. I’ll talk more about the kinds of things that can go wrong with setup in another article.
5. Now it’s time to choose the installation type. Are you installing a new instance of SQL Server 2008, or adding to an existing instance?
6. Now enter your product key if you need to. There are times when it’ll be filled in automatically, for instance if you’re installing an MSDN version.
7. Next you need to agree to the license terms. For the sake of completion I’m going to go ahead and say that you should read all of the license terms so you know what you’re agreeing to. However, assuming you agree and actually want to install SQL Server 2008, then check the box and click next.
8. This is one of those really important parts of the installation: choosing features to install. Unfortunately, since I don’t know your goal I can’t really offer very much guidance here, but I can tell you that you’ll need to install Database Engine Services at a minimum if you want to run a SQL Server 2008 database on the server. Everything else is optional, but it is a good idea to install the documentation and the management tools. If you’re not sure whether you should install a feature or not there are two bits of advice I can offer. First, as you click on each option, a description comes up in the right pane that tells you what it’s all about so that can help you decide. And second, installing everything won’t hurt your server so if you’re not sure, then it’s cool to go with a full install. It’s still always best to snipe the features you’re after though.
You’ll notice that some features are grayed out. This is because I’ve already got SQL Server 2008 installed on my box, and some features can only be installed once across all instances. It doesn’t do any good to install multiple copies of the documentation, for example.
9. This next screen allows you to define whether you want to install a default or a named instance. You can of course only install one default instance, and all others must be named. As you can see here, I’m installing a named instance because I’ve already got a default instance on my box.
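The choice between a default and a named instance shows up later in the name clients use to connect. Here is a minimal Python sketch of that naming rule, assuming the usual convention that a default instance is reached by the host name alone and a named instance as HOST\INSTANCE. The host and instance names are made up for illustration.

```python
# Internal name SQL Server gives to the default instance.
DEFAULT_INSTANCE = "MSSQLSERVER"


def server_name(host: str, instance: str = DEFAULT_INSTANCE) -> str:
    """Return the name a client would use to reach an instance on `host`."""
    if instance.upper() == DEFAULT_INSTANCE:
        return host               # default instance: host name alone
    return f"{host}\\{instance}"  # named instance: HOST\INSTANCE


# Hypothetical host and instance names, for illustration only.
print(server_name("DBSERVER01"))               # default instance
print(server_name("DBSERVER01", "REPORTING"))  # named instance
```

This is why only one default instance can exist per box: there is only one bare host name to go around.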
10. This screen outlines the disk space requirements for all the options you’ve chosen. There’s nothing really to do here and I’ve never seen it fail since nobody puts a database on a server that can’t even hold the install.
11. Now you need to configure your service accounts. You can select the same account for all services or separate them into different accounts. And while detailed advice on that is beyond the scope of this article, I can say that either way is usually ok. The one piece of advice I can give you though is to make it a domain account instead of just a local Windows account. And there are special permissions that these accounts need so you should make sure they have those rights. For a detailed discussion on choosing authentication methods, go to: How to choose a SQL Server authentication method.
12. This screen takes a little explanation. It’s asking you which security model you want to run. It defaults to Windows authentication which should be fine for most shops. The deciding factor is whether you’ve got non-windows domains, or if you’ve got regular windows domains that don’t trust each other. This could be a situation where you’ve got external customers hitting your database and you don’t want to give them windows access, but rather access through some sort of portal. All the same, it’s best to use Windows authentication if your users have Windows accounts. It’s more secure and it’s easier for the users to connect to the database. This screen also wants you to define default data directories. You can change these at any time once SQL Server 2008 is installed, so you can just accept the defaults here if you like and then change it later if you need to.
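To make the difference between the two security models concrete, here is a rough Python sketch that builds an ODBC-style connection string for each. The keywords (Driver, Server, Database, Trusted_Connection, UID, PWD) are the standard ones for the SQL Server ODBC driver; the server, database, and login names are hypothetical, and SQL logins only work if you chose mixed mode on this screen.

```python
def odbc_connection_string(server, database, user=None, password=None):
    """Build an ODBC connection string for SQL Server.

    With no user given, Windows authentication is requested; otherwise
    a SQL Server login is used, which requires mixed mode on the server.
    """
    parts = ["Driver={SQL Server}", f"Server={server}", f"Database={database}"]
    if user is None:
        parts.append("Trusted_Connection=yes")  # Windows authentication
    else:
        parts.append(f"UID={user}")             # SQL Server login
        parts.append(f"PWD={password}")
    return ";".join(parts) + ";"


# Hypothetical names, for illustration only.
print(odbc_connection_string("DBSERVER01", "Sales"))
print(odbc_connection_string("DBSERVER01", "Sales", "portal_user", "secret"))
```

Notice that the Windows authentication string carries no password at all, which is a big part of why it is the safer default.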
13. This screen gets overlooked by a lot of people. It’s really a good idea to check both of these boxes. There’s absolutely no personal or identifying information sent to Microsoft and checking these boxes allows them to get automatic error and usage information so they can improve the product. And contrary to popular belief, they actually do look at every error report that comes across. So check both of these. It costs you nothing and it helps improve the product.
14. You’re going to learn to hate this screen. Here we have another set of checks that can stop your install and in my experience, most of the stuff that’s going to stop you will happen here… after you’ve already been through most of the screens. This topic is too detailed to go into right now, but just know that you can expect some showstoppers here if there are going to be any. The good news is that you mostly get stopped on upgrade so if you’re installing a fresh instance you’ll probably be ok. So once you get all greens you can go ahead.
15. Assuming you’ve made it this far you should be free to hit the Install button. There’s nothing really notable about this screen; it’s just a summary of all the options you’ve chosen. If you like you can look it over to make sure that you didn’t do anything stupid, but I’m usually ready to just start installing at this point so I just bull on ahead.
OK, you’ve just gone through the setup wizard to install SQL Server 2008. As you can see it’s fairly straightforward, but there are a lot of screens to click through. This is a much bigger topic than I could ever cover in a single article, but starting with a simple wizard-driven install is a good place to get your feet wet. In later articles I’ll go into more detail on many of these aspects and even get into upgrades and troubleshooting install errors.
Friday, June 11, 2010
What is the difference between an Ethernet hub and switch
What is the difference between an Ethernet hub and switch?
Although hubs and switches both glue the PCs in a network together, a switch is more expensive and a network built with switches is generally considered faster than one built with hubs. Why?
When a hub receives a packet (chunk) of data (a frame in Ethernet lingo) at one of its ports from a PC on the network, it transmits (repeats) the packet to all of its other ports and, thus, to all of the other PCs on the network. If two or more PCs on the network try to send packets at the same time, a collision is said to occur. When that happens, all of the PCs have to go through a routine to resolve the conflict. The process is prescribed in the Ethernet Carrier Sense Multiple Access with Collision Detection (CSMA/CD) protocol. Each Ethernet adapter has both a receiver and a transmitter. If the adapters didn't have to listen with their receivers for collisions, they would be able to send data at the same time they are receiving it (full duplex). Because they have to operate at half duplex (data flows one way at a time) and a hub retransmits data from one PC to all of the PCs, the maximum bandwidth is 100 Mbps and that bandwidth is shared by all of the PCs connected to the hub. The result is that when a person using a computer on a hub downloads a large file or group of files from another computer, the network becomes congested. In a 10 Mbps 10BASE-T network the effect is to slow the network to nearly a crawl. The effect on a small, 100 Mbps (million bits per second), 5-port network is not as significant.
Two computers can be connected directly together in an Ethernet with a crossover cable. A crossover cable doesn't have a collision problem. It hardwires the Ethernet transmitter on one computer to the receiver on the other. Most 100BASE-TX Ethernet adapters can detect when listening for collisions is not required, using a process known as auto-negotiation, and will operate in full duplex mode when it is permitted. The result is that a crossover cable doesn't have delays caused by collisions, data can be sent in both directions simultaneously, the maximum available bandwidth is 200 Mbps (100 Mbps each way), and there are no other PCs with which the bandwidth must be shared.
An Ethernet switch automatically divides the network into multiple segments, acts as a high-speed, selective bridge between the segments, and supports simultaneous connections of multiple pairs of computers which don't compete with other pairs of computers for network bandwidth. It accomplishes this by maintaining a table of each destination address and its port. When the switch receives a packet, it reads the destination address from the header information in the packet, establishes a temporary connection between the source and destination ports, sends the packet on its way, and then terminates the connection.
Picture a switch as making multiple temporary crossover cable connections between pairs of computers (the cables are actually straight-thru cables; the crossover function is done inside the switch). High-speed electronics in the switch automatically connect the end of one cable (source port) from a sending computer to the end of another cable (destination port) going to the receiving computer on a per-packet basis. Multiple connections like this can occur simultaneously. It's as simple as that. And like a crossover cable between two PCs, PCs on an Ethernet switch do not share the transmission media, do not experience collisions or have to listen for them, can operate in full-duplex mode, have bandwidth as high as 200 Mbps (100 Mbps each way), and do not share this bandwidth with other PCs on the switch. In short, a switch is "more better."
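The forwarding difference can be sketched in a few lines of Python. This is only a toy model: the class names, port numbers, and one-letter addresses are made up, and the way the switch fills its table (by remembering the port each sender was seen on, and flooding when the destination is still unknown) is an assumption about typical switch behavior rather than something spelled out above.

```python
class Hub:
    """Repeats every frame out of every port except the one it arrived on."""

    def __init__(self, ports):
        self.ports = list(ports)

    def receive(self, in_port, src, dst):
        return [p for p in self.ports if p != in_port]


class Switch:
    """Keeps an address-to-port table and forwards to a single port when it can."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.table = {}  # address -> port

    def receive(self, in_port, src, dst):
        self.table[src] = in_port         # remember where the sender lives
        if dst in self.table:
            return [self.table[dst]]      # known destination: one port only
        # Unknown destination: fall back to hub-like flooding.
        return [p for p in self.ports if p != in_port]


hub = Hub([1, 2, 3, 4])
switch = Switch([1, 2, 3, 4])
print(hub.receive(1, "A", "B"))     # every other port sees the frame
print(switch.receive(1, "A", "B"))  # B not known yet, so the frame is flooded
print(switch.receive(2, "B", "A"))  # A was learned on port 1
print(switch.receive(1, "A", "B"))  # B was learned on port 2
```

Once the table is populated, traffic between A and B never touches ports 3 and 4, so the PCs on those ports keep their full bandwidth.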
The difference between the firewall and the application level proxy server
Firewall: A firewall is a router (a computer that can forward packets between two or more networks) with some restriction rules applied. Most current routers can be used as a simple firewall, since most of them let you define restrictions; this applies, for example, to Cisco routers and Linux systems. A real firewall is more complicated, though. It implements mechanisms such as dynamically opening holes for incoming connections (for FTP sessions, for example).
Application level proxy server: An application proxy server is a computer that can handle requests in certain communication protocols (HTTP, FTP, SOCKS, ...). For each protocol in use, the appropriate proxy service must be enabled.

Firewall: A firewall works at the packet level. It can apply rules to packets (by checking the source/destination IP address, source/destination port, ...) to decide whether a packet will be forwarded or denied.
Application level proxy server: A proxy works at the application protocol level. It does not work at the packet level, so it cannot forward packets.

Firewall: The client stations have to be configured to use the firewall as their default gateway.
Application level proxy server: Applications on the client PC have to be configured to use the proxy server to access Internet servers.

Firewall: If you disable the firewall (so that only the router works), all LAN stations have direct and full Internet access. You can think of the firewall as a set of restrictive rules (everything is allowed when these rules are inactive), so you can remove or change some rules to open a hole, for example for a range of ports.
Application level proxy server: If you disable the proxy, there is no way to connect from the LAN to Internet servers.

Firewall: Services that use low-level TCP/IP protocols (ping, traceroute, ...) will work behind a firewall (if they are not disabled by firewall restrictions).
Application level proxy server: Services that use low-level TCP/IP protocols (ping, traceroute, ...) will not work behind a proxy.
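As a rough illustration of packet-level rules, here is a minimal Python sketch of a filter that checks source address, destination address, and destination port. It is an assumption-laden toy, not how any particular firewall product works: the Rule class, the first-match ordering, and the addresses are invented for the example. The default of allowing a packet when no rule matches mirrors the point above that everything is enabled when the rules are inactive.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass
class Rule:
    action: str                     # "allow" or "deny"
    src: str = "0.0.0.0/0"          # source network; default matches anything
    dst: str = "0.0.0.0/0"          # destination network
    dst_port: Optional[int] = None  # None matches any port

    def matches(self, src_ip, dst_ip, dst_port):
        return (ip_address(src_ip) in ip_network(self.src)
                and ip_address(dst_ip) in ip_network(self.dst)
                and (self.dst_port is None or self.dst_port == dst_port))


def filter_packet(rules, src_ip, dst_ip, dst_port):
    """Return the action of the first matching rule."""
    for rule in rules:
        if rule.matches(src_ip, dst_ip, dst_port):
            return rule.action
    return "allow"  # no rule matched: behave like a plain router


# Hypothetical policy: block outgoing telnet (port 23) from the LAN.
rules = [Rule("deny", src="192.168.1.0/24", dst_port=23)]
print(filter_packet(rules, "192.168.1.10", "203.0.113.5", 23))
print(filter_packet(rules, "192.168.1.10", "203.0.113.5", 80))
print(filter_packet([], "192.168.1.10", "203.0.113.5", 23))
```

A proxy has no equivalent of this loop, because it never sees individual packets; it only sees complete requests in the protocols it understands.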
Thursday, February 26, 2009
Uniform Resource Locator (URL)
In computing, a Uniform Resource Locator (URL) is a type of Uniform Resource Identifier (URI) that specifies where an identified resource is available and the mechanism for retrieving it.[1] In popular usage and in many technical documents and verbal discussions it is often, imprecisely and confusingly, used as a synonym for uniform resource identifier. The confusion in usage stems from historically different interpretations of the semantics of the terms involved.[2] In popular language, a URL is also referred to as a Web address.
Syntax
Every URL begins with the scheme name that defines its namespace, purpose, and the syntax of the remaining part of the URL. Most Web-enabled programs will try to dereference a URL according to the semantics of its scheme and a context-specific heuristic. For example, a Web browser will usually dereference the URL http://example.org/ by performing an HTTP request to the host example.org, at the default HTTP port (port 80). Dereferencing the URL mailto:bob@example.com will usually start an e-mail composer with the address bob@example.com in the To field.
example.com is a domain name; an IP address or other network address might be used instead. In addition, URLs that specify https as a scheme (such as https://example.com/) normally denote a secure website.
The hostname portion of a URL, if present, is case insensitive (since the DNS is specified to ignore case); other parts are not required to be, but may be treated as case insensitive by some clients and servers, especially those that are based on Microsoft Windows. For example:
1. http://en.wikipedia.org/ and HTTP://EN.WIKIPEDIA.ORG/ will both open the same page.
2. http://en.wikipedia.org/wiki/URL is correct, but http://en.wikipedia.org/WIKI/URL/ will result in an HTTP 404 error page.
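The pieces described above can be pulled apart with Python's standard urllib.parse module. The sketch below is only illustrative: the describe helper and the DEFAULT_PORTS table are made up for this example (port 80 for http comes from the text; 443 for https is added as the well-known default). It relies on urlsplit lowercasing the scheme and the hostname attribute while leaving the path untouched.

```python
from urllib.parse import urlsplit

# Default ports assumed for this example.
DEFAULT_PORTS = {"http": 80, "https": 443}


def describe(url):
    """Split a URL into the parts discussed above."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,    # lowercased by urlsplit
        "host": parts.hostname,    # lowercased; None if the URL has no host
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path,        # case is preserved
    }


print(describe("http://example.org/"))
print(describe("HTTP://EN.WIKIPEDIA.ORG/WIKI/URL"))
print(describe("mailto:bob@example.com"))
```

The second call shows why the uppercase host still works while the uppercase path may not: the host is normalized, but the path is passed to the server exactly as written.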
URLs as locators
In its current strict technical meaning, a URL is a URI that, “in addition to identifying a resource, [provides] a means of locating the resource by describing its primary access mechanism (e.g., its network ‘location’).”[3]
Internet hostnames
On the Internet, a hostname is a domain name assigned to a host computer. This is usually a combination of the host's local name with its parent domain's name. For example, "en.wikipedia.org" consists of a local hostname ("en") and the domain name "wikipedia.org". This kind of hostname is translated into an IP address via the local hosts file, or the Domain Name System (DNS) resolver. It is possible for a single host computer to have several hostnames; but generally the operating system of the host prefers to have one hostname that the host uses for itself.
Any domain name can also be a hostname, as long as the restrictions mentioned below are followed. So, for example, both "en.wikimedia.org" and "wikimedia.org" are hostnames because they both have IP addresses assigned to them. The domain name "pmtpa.wikimedia.org" is not a hostname since it does not have an IP address, but "rr.pmtpa.wikimedia.org" is a hostname. All hostnames are domain names, but not all domain names are hostnames.
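Here is a small Python sketch of the two ideas in this section: splitting a hostname into its local name and parent domain, and looking a name up in the hosts file before falling back to DNS. Both functions are hypothetical helpers written for illustration. The hosts table is a plain dictionary and the DNS lookup is a stand-in function, so nothing here touches the network, and the addresses come from the range reserved for documentation.

```python
def split_hostname(hostname):
    """Split a hostname into (local name, parent domain)."""
    local, _, parent = hostname.partition(".")
    return local, parent


def resolve(hostname, hosts, dns_lookup):
    """Try the local hosts table first, then fall back to the DNS resolver."""
    key = hostname.lower()  # hostnames are case insensitive
    if key in hosts:
        return hosts[key]
    return dns_lookup(key)


# Stand-ins for a hosts file and a DNS resolver (illustrative data only).
hosts_file = {"intranet.example.org": "192.0.2.10"}
fake_dns = {"en.wikipedia.org": "192.0.2.20"}.get

print(split_hostname("en.wikipedia.org"))
print(resolve("INTRANET.example.org", hosts_file, fake_dns))
print(resolve("en.wikipedia.org", hosts_file, fake_dns))
```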
See also
* CURIE (Compact URI)
* Extensible Resource Identifier (XRI)
* Internationalized Resource Identifier (IRI)
* Uniform Resource Identifier (URI)
* URL normalization
* URI scheme
References
1. ^ RFC 1738
2. ^ RFC 3305
3. ^ Tim Berners-Lee, Roy T. Fielding, Larry Masinter. (January 2005). “Uniform Resource Identifier (URI): Generic Syntax”. Internet Society. RFC 3986; STD 66.
External links
* RFC 3986 Uniform Resource Identifier (URI): Generic Syntax
CURIE
From Wikipedia, the free encyclopedia
A CURIE (short for Compact URI) is an abbreviated URI expressed in CURIE syntax, and may be found in both XML and non-XML grammars. A CURIE may be considered a datatype.
An example of CURIE syntax: [isbn:0393315703]
The square brackets may be used to prevent ambiguities between CURIEs and regular URIs.
QNames (the namespace prefixes used in XML) often are used as a CURIE, and may be considered a type of CURIE. CURIEs, as defined by the W3C, will be better defined and may include checking. Unlike QNames, the part of a CURIE after the colon does not need to conform to the rules for element names.
The first W3C Working Draft of CURIE syntax was released 7 March 2007.[1]
Example
This example is based on one from the W3C Working Draft 7 March 2007, using a QName syntax within XHTML.
...
* The definition ("") is highlighted in yellow
* The CURIE ("[wiki:Biome]") is highlighted in green
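To show how a CURIE such as [wiki:Biome] could be turned back into a full URI, here is a minimal Python sketch. It assumes that a prefix like "wiki" has been bound to a base URI and that expansion is simple concatenation of that base with the part after the colon. The expand_curie helper and the prefix binding below are assumptions made for illustration, not text taken from the Working Draft.

```python
def expand_curie(curie, prefixes):
    """Expand a CURIE into a full URI using a prefix-to-URI mapping."""
    text = curie.strip()
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]  # drop the optional square brackets
    prefix, sep, reference = text.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"cannot expand {curie!r}")
    return prefixes[prefix] + reference


# Assumed prefix binding, for illustration only.
prefixes = {"wiki": "http://en.wikipedia.org/wiki/"}
print(expand_curie("[wiki:Biome]", prefixes))
```

Because the part after the colon is simply appended, it does not have to follow the naming rules for XML elements, which is the main difference from QNames noted above.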
References
1. ^ CURIE Syntax 1.0
External links
* www.w3.org/TR/curie
Extensible Resource Identifier
From Wikipedia, the free encyclopedia
Extensible Resource Identifier (abbreviated XRI) is a scheme and resolution protocol for abstract identifiers compatible with Uniform Resource Identifiers and Internationalized Resource Identifiers, developed by the XRI Technical Committee at OASIS. The goal of XRI is a standard syntax and discovery format for abstract, structured identifiers that are domain-, location-, application-, and transport-independent, so they can be shared across any number of domains, directories, and interaction protocols.
The XRI 2.0 specifications narrowly failed to become OASIS standards due to the number of negative votes,[1] a failure attributed[2] to the intervention of the W3C Technical Architecture Group which made a statement recommending against using XRIs or taking the XRI specifications forward.[3] The core of the dispute is whether the widely interoperable HTTP URIs are capable of fulfilling the role of abstract, structured identifiers, as the TAG believes,[4] but whose limitations the XRI Technical Committee was formed specifically to address.[5]
With the growth of XML, Web services, and other ways of adapting the Web to automated, machine-to-machine communications, it is increasingly important to be able to identify a resource independent of any specific physical network path, location, or protocol in order to:
* Create structured identifiers with self-describing "tags" that can be understood across domains the same way XML documents provide a self-describing, domain-independent data format.
* Maintain a persistent link to the resource regardless of whether its network location changes.
* Delegate identifier management not just in the authority segment (the first segment following the "xxx://" scheme name) but anywhere in the identifier path.
* Map identifiers used to identify a resource in one domain to other synonyms used to identify the same resource in the same domain, or in other domains.
By early 2003, these requirements led to a resolution protocol based on HTTP(S) and simple XML documents called XRDS (Extensible Resource Descriptor Sequence).
Features
* URI- and IRI-compatibility — XRIs can be used wherever URIs or IRIs are called for.
* Cross-references — An XRI can contain another XRI (or a URI), to any level of nesting. This enables the construction of structured, "tagged" identifiers that enable identifier sharing across domains the same way XML enables data sharing across domains.
* Global context symbols — These are single-character symbols (=, @, +, $, or !) that provide a simple, human-friendly way to indicate the global context of an i-name or i-number. These are not required, but may be used within communities of interest that agree on their meaning and how they are resolved.
* Peer-to-peer addressing — XRI syntax supports the ability for any two network nodes to assign each other XRIs and perform cross-resolution. That is, a top-level namespace authority can be referred to by names assigned by other parties. This aids in federating namespaces between organizations or communities of interest.
* Decentralization — XRIs can be rooted in either centralized addressing systems (e.g., IP addresses or DNS domain names) or private/decentralized root authorities and peer-to-peer addressing.
* Delegation — Namespaces can be delegated to other namespace authorities.
* Federation — Namespaces defined separately at any level can be joined together in a hierarchical or polyarchical fashion, and made visible and resolvable.
* Persistence — The ability to express the intent that parts (or all) of an XRI are permanent identifiers that will never be reassigned.
* Human-friendly and machine-friendly formats — XRI provides syntax both for identifiers that can be created and understood by humans easily (i-names), and those that are optimized for machine structuring/parsing (i-numbers).
* Simple, extensible resolution — XRI offers a lightweight resolution scheme using HTTP and a simple XML document format called XRDS.
* Trusted resolution — the XRI resolution protocol includes three modes of trusted resolution: a) HTTPS, b) SAML assertions, and c) both.
* Multiple resolution options — XRI resolution can be independent of DNS.
* Fully internationalizable, leveraging Unicode and IRI specifications.
* Transport independent — XRIs are not bound to any specific transport protocols or mechanism.
Composition of an Extensible Resource Identifier
An XRI starting with "=" is understood to identify a person. An XRI starting with "@" identifies a company or organization. A leading "+" indicates a generic concept, subject, or topic [6].
A "*" marks a delegation. For example with "=family*name", "=family" delegates the resolving of its sub-XRI "name" to another resolver. This is analogous to DNS' delegating the subdomain resolution to other nameservers (name.family.de: after resolving de, the nameserver responsible for de delegates to the family nameserver, which delegates to the name nameserver).
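The leading-symbol and delegation conventions described above can be illustrated with a short Python sketch. The function name describe_xri and the labels in the lookup table are hypothetical, only the three symbols described in this section are covered, and splitting on "*" is a simplification that ignores cross-references in parentheses.

```python
# Illustrative only: maps the leading global context symbol of an XRI
# to the kind of entity it conventionally identifies (per the text above).
GLOBAL_CONTEXT = {
    "=": "person",
    "@": "company or organization",
    "+": "generic concept, subject, or topic",
}


def describe_xri(xri):
    """Return (entity kind, delegation segments) for a simple XRI."""
    kind = GLOBAL_CONTEXT.get(xri[:1], "unknown")
    # Each "*" marks a point where resolution is delegated further.
    segments = xri.split("*")
    return kind, segments


print(describe_xri("=family*name"))
```

For "=family*name" this reports a person-type XRI whose first segment, "=family", delegates resolution of "name".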
Resolving an Extensible Resource Identifier
XRIs are resolved to XRDS documents using the HTTP(S) protocol in the same way as URLs are resolved to Resource Records using the DNS protocol. This lookup process can be configured by passing parameters [7].
Proxy resolvers and the HXRI
An XRI can be transformed into a URI by prepending http(s)://xri.*/ to it. The resulting URI refers to a so-called proxy resolver, which resolves URIs of this kind to an XRDS document. For example, the proxy resolver at http://xri.net can be used to resolve an XRI, so =example becomes http://xri.net/=example. This second form is called an HTTP XRI, or HXRI for short. The owner of the XRI =example can tell the proxy resolver what to do when the HXRI is requested. One possible reaction is a 302 HTTP redirect to a stored URI.
Further parameters that control resolution can be appended to the HXRI, e.g. to get the whole XRDS document or to get service descriptions for this XRI. For example, appending ?_xrd_r=application/xrds+xml to the HXRI returns the whole XRDS document, so http://xri.net/=example?_xrd_r=application/xrds+xml returns the whole XRDS for the XRI =example.
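The construction of an HXRI can be sketched in a few lines of Python. This is a minimal illustration, not an implementation of the resolution specification: the function name to_hxri and its parameters are hypothetical, and the only facts taken from the text are the http://xri.net proxy and the _xrd_r query parameter.

```python
from typing import Optional


def to_hxri(xri: str, proxy: str = "http://xri.net",
            media_type: Optional[str] = None) -> str:
    """Build an HTTP XRI (HXRI) by placing the XRI after a proxy resolver URL.

    `media_type`, if given, is appended as the `_xrd_r` parameter exactly as
    in the example from the text (no extra escaping is attempted here).
    """
    hxri = proxy.rstrip("/") + "/" + xri
    if media_type is not None:
        hxri += "?_xrd_r=" + media_type
    return hxri


print(to_hxri("=example"))
print(to_hxri("=example", media_type="application/xrds+xml"))
```

The two calls reproduce the plain HXRI and the XRDS-requesting HXRI shown in the examples above.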
Examples of XRI cross-reference syntax
Say a library system uses URNs in the ISBN namespace to identify books and DNS subdomains to identify its library branches. HTTP URI syntax does not provide a standard way to express the URN for the book title in the context of the DNS name for the library branch. XRI cross-reference syntax solves this problem by allowing the library (and even automated programs running at the library) to programmatically construct the XRIs necessary to address any book at any branch. Examples:
xri://broadview.library.example.com/(urn:isbn:0-395-36341-1)
xri://shoreline.library.example.com/(urn:isbn:0-395-36341-1)
xri://northgate.library.example.com/(urn:isbn:0-395-36341-1)
This ability to create structured, self-describing identifiers can be extended to many other uses. For example, say the library wanted to indicate the type of each book available. By establishing a simple XRI dictionary of book types, it can now programmatically construct XRIs that include this metadata:
xri://broadview.library.example.com/(urn:isbn:0-395-36341-1)/(+hardcover)
xri://broadview.library.example.com/(urn:isbn:0-395-36341-1)/(+softcover)
xri://broadview.library.example.com/(urn:isbn:0-395-36341-1)/(+reference)
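The phrase "programmatically construct" can be made concrete with a small Python sketch. The helper name build_xri is hypothetical, and the code only concatenates strings in the pattern shown in the examples above; it does not validate XRI syntax.

```python
def build_xri(authority, *cross_references):
    """Compose an XRI from an authority and parenthesized cross-references."""
    path = "".join("/(" + ref + ")" for ref in cross_references)
    return "xri://" + authority + path


branches = ["broadview", "shoreline", "northgate"]
isbn = "urn:isbn:0-395-36341-1"

# One XRI per library branch for the same book.
for branch in branches:
    print(build_xri(branch + ".library.example.com", isbn))

# The same book at one branch, tagged with a book type.
print(build_xri("broadview.library.example.com", isbn, "+hardcover"))
```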
Other examples of XRI 2.0 syntax
(Note that none of these show the prefix "xri://", which is optional in XRIs when they are not in URI normal form, i.e., they have not undergone the specified transformation between XRI format and URI format.)
Example XRIs composed entirely of reassignable segments:
=Mary.Jones
@Jones.and.Company
+phone.number
+phone.number/(+area.code)
=Mary.Jones/(+phone.number)
@Jones.and.Company/(+phone.number)
@Jones.and.Company/((+phone.number)/(+area.code))
Example XRIs composed entirely of persistent segments:
=!13cf.4da5.9371.a7c5
@!280d.3822.17bf.ca48!78d2/!12
Example of XRIs with mixes of persistent and reassignable segments (XRI allows any combination of the two):
=!13cf.4da5.9371.a7c5/(+phone.number)
@Jones.and.Company!78d2/!12/(+area.code)
Applications
Examples of applications being developed using XRI infrastructure include:
* OpenID 2.0 includes support for XRIs and uses XRDS for OpenID identifier discovery.
* The Higgins Project uses XRIs and XRDS to address and discover Higgins context providers.
* XDI.org I-name and I-number digital identity addressing services.
* The XDI data sharing protocol under development by the OASIS XDI Technical Committee.
Licensing
The XRI Technical Committee is chartered under the RF on Limited Terms Mode of the OASIS IPR policy (See http://www.oasis-open.org/committees/xri/ipr.php for more details.)
Some people argue that the technologies employed in XRI are subject to patent claims, and that the licensing rights to these patents have been vested in XDI.org, a non-profit organization which has in turn licensed a non-exclusive interest in the use of the patents to companies associated with the original patent holders, despite the above IPR statement.
References
1. Failed OASIS
2. Time for OASIS XRI TC and W3C TAG to Sit Down Together
3. TAG recommends against XRI
4. URNs, Namespaces and Registries
5. Xri Solves Real Problems
6. XRI and XDI Explained
7. XRI in a Nutshell
See also
* I-names
* I-numbers
* XRDS
* XDI
* Dataweb
* Social Web
* Higgins project
* Project Xanadu
External links
* OASIS XRI Technical Committee specifications:
o XRI Syntax 2.0 Committee Specification
o XRI Resolution 2.0 Committee Specification
o XRI 2.0 FAQ
o XRI Requirements and Glossary 1.0
* W3C Internationalized Resource Identifier (IRI)
* XDI.org - public trust organization governing XRI global registry services
o XDI.org Global Services Specifications - website of XDI.org specifications for global registry services for public i-names and i-numbers
o XDI.org I-Services Specifications - website of XDI.org specifications for XRDS-enabled identity services.
* dev.xri.net - open public wiki on XRI and XRI open source projects
* Internet Identity Workshop One-Pager on XRI and XRDS
* FSF's Dispute with OASIS patent policies and on FSF's Support for OASIS RF on Limited Terms IPR Policy, which is used for ODF.
* EqualsDrummond - blog about XRI and Internet identifiers by Drummond Reed, co-chair of the OASIS XRI Technical Committee and Chief Architect at Cordance, currently under contract with XDI.org to operate XRI registr
Internationalized Resource Identifier
From Wikipedia, the free encyclopedia
On the Internet, the Internationalized Resource Identifier (IRI) is a generalization of the Uniform Resource Identifier (URI), which is in turn a generalization of the Uniform Resource Locator (URL). While URIs are limited to a subset of the ASCII character set, IRIs may contain characters from the Universal Character Set (Unicode/ISO 10646), including Chinese or Japanese kanji, Korean, Cyrillic characters, and so forth.
It is defined by RFC 3987.
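The relationship between an IRI and a plain ASCII URI can be illustrated by percent-encoding the UTF-8 bytes of every non-ASCII character. The Python sketch below is a simplification under stated assumptions: the function name iri_to_uri is hypothetical, the example address is made up, and internationalized host names (which would normally go through IDN/Punycode handling) are not treated specially.

```python
from urllib.parse import quote

# ASCII characters that already have a meaning in URI syntax are left alone;
# "%" is kept so that existing escapes are not encoded a second time.
_KEEP = ":/?#[]@!$&'()*+,;=%-._~"


def iri_to_uri(iri):
    """Percent-encode the UTF-8 bytes of all non-ASCII characters."""
    return quote(iri, safe=_KEEP)


print(iri_to_uri("http://example.org/ros\u00e9"))
```

Here the character "é" (U+00E9) becomes the two-byte escape %C3%A9, while the ASCII parts of the identifier are unchanged.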
Advantages
Displaying URIs in different languages mainly helps users who are unfamiliar with the Roman alphabet. Assuming it is not too difficult for anyone to enter arbitrary Unicode characters on their keyboards, this can make the URI system more worldly and accessible.
Disadvantages
Mixing IRIs and ASCII URIs can make it much easier to do phishing attacks which trick someone into believing they are on a site they really are not on. For example, one can replace the "a" in www.ebay.com or www.paypal.com with an internationalized look-alike "a" character, and point that IRI to a malicious site.
Additionally, it can be difficult for those with different language keyboards to access web resources in other languages; in contrast, open-source programming projects (and most programs) are almost exclusively written using the Roman alphabet to avoid this type of encoding incompatibility.
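The look-alike problem can be demonstrated directly. In this Python sketch the spoofed host name replaces the Latin "a" with U+0430 (CYRILLIC SMALL LETTER A); the helper name non_ascii_characters is hypothetical, and listing non-ASCII characters is only one simple way to surface such substitutions, not a complete defence.

```python
import unicodedata


def non_ascii_characters(hostname):
    """List (character, Unicode name) for every non-ASCII character."""
    return [(ch, unicodedata.name(ch, "UNKNOWN"))
            for ch in hostname if ord(ch) > 127]


real = "www.paypal.com"
spoof = "www.p\u0430ypal.com"  # second letter is Cyrillic, not Latin

print(real == spoof)                # the strings differ despite looking alike
print(non_ascii_characters(spoof))  # reveals the substituted character
```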
See also
* XRI (Extensible Resource Identifier)
* IDN (Internationalized Domain Name)
* Punycode
External links
* IRI
* Internationalized Resource Identifiers
Uniform Resource Identifier
In computing, a Uniform Resource Identifier (URI) is a string of characters used to identify or name a resource on the Internet. Such identification enables interaction with representations of the resource over a network, typically the World Wide Web, using specific protocols. URIs are defined in schemes specifying a specific syntax and associated protocols.
Relationship to URL and URN
Set diagram of URI scheme categories. Schemes in the URL (locator) and URN (name) categories form subsets of URI, and, generally, are also disjoint sets.
Technically, URLs and URNs both function as resource identifiers. However, many schemes cannot be categorized as strictly one or the other, because all URIs can be treated as names, and some schemes embody aspects of both categories, or neither.
Computer scientists may classify a URI as a locator (URL), or a name (URN), or both.
A Uniform Resource Name (URN) is like a person's name, while a Uniform Resource Locator (URL) resembles that person's street address. The URN defines an item's identity, while the URL provides a method for finding it.
The ISBN system for uniquely identifying books provides a typical example of the use of URNs. ISBN 0486275574 (urn:isbn:0-486-27557-4) unambiguously cites a specific edition of Shakespeare's play Romeo and Juliet. In order to gain access to this object and read the book, one would need its location: a URL. A typical URL for this book on a Unix-like operating system is a file path, like file:///home/username/RomeoAndJuliet.pdf, identifying the electronic book saved on a local hard disk. So URNs and URLs have complementary purposes.
Technical view
A URL is a URI that, in addition to identifying a resource, provides means of acting upon or obtaining a representation of the resource by describing its primary access mechanism or network "location". For example, the URL http://www.wikipedia.org/ identifies a resource (Wikipedia's home page) and implies that a representation of that resource (such as the home page's current HTML code, as encoded characters) is obtainable via HTTP from a network host named www.wikipedia.org. A Uniform Resource Name (URN) is a URI that identifies a resource by name in a particular namespace. A URN can be used to talk about a resource without implying its location or how to access it. For example, the URN urn:isbn:0-395-36341-1 is a URI that specifies the identifier system, i.e. International Standard Book Number (ISBN), as well as the unique reference within that system and allows one to talk about a book, but doesn't suggest where and how to obtain an actual copy of it.
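The difference between the two examples shows up when they are split into components. The sketch below uses Python's standard urllib.parse.urlsplit; it is only an illustration of the generic structure, and the variable names are arbitrary.

```python
from urllib.parse import urlsplit

url = urlsplit("http://www.wikipedia.org/")
urn = urlsplit("urn:isbn:0-395-36341-1")

# The URL names a network host from which a representation can be fetched.
print(url.scheme, url.netloc, url.path)

# The URN has no host part: everything after "urn:" is a name in a namespace.
print(urn.scheme, repr(urn.netloc), urn.path)
```

The URL yields a host (www.wikipedia.org) to contact over HTTP, whereas the URN yields an empty network location and only the namespaced name isbn:0-395-36341-1.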
Technical publications, especially standards produced by the IETF and the W3C, have long deprecated the term URL, as it is rarely necessary to distinguish between URLs and URIs. However, in nontechnical contexts and in software for the World Wide Web, the term URL remains widely used. Additionally, the term web address, which has no formal definition, is often used in nontechnical publications as a synonym for URL or URI, although it generally refers only to "http" and "https" URL schemes.
RFC 3305
Much of this discussion comes from RFC 3305, titled "Report from the Joint W3C/IETF URI Planning Interest Group: Uniform Resource Identifiers (URIs), URLs, and Uniform Resource Names (URNs): Clarifications and Recommendations". This RFC outlines the work of a joint W3C/IETF working group that was set up specifically to normalize the divergent views held within the IETF and W3C over what the relationship was between the various "UR*" terms and standards. While not published as a full standard by either organization, it has become the basis for the above common understanding and has informed many standards since then.
Syntax
The URI syntax is essentially a URI scheme name like "HTTP", "FTP", "mailto", "URN", "tel", "rtsp", "file", etc., followed by a colon character, and then a scheme-specific part. The specifications that govern the schemes determine the syntax and semantics of the scheme-specific part, although the URI syntax does force all schemes to adhere to a certain generic syntax that, among other things, reserves certain characters for special purposes, without always saying what those purposes are. The URI syntax also enforces restrictions on the scheme-specific part, in order to, for example, provide for a degree of consistency when the part has a hierarchical structure. Percent-encoding is an often-misunderstood aspect of URI syntax.
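As a rough illustration of the generic syntax, the Python sketch below separates the scheme name from the scheme-specific part at the first colon and shows percent-encoding of reserved characters. The helper name split_scheme and the sample strings are hypothetical; quote and unquote are from the standard urllib.parse module.

```python
from urllib.parse import quote, unquote


def split_scheme(uri):
    """Split a URI at the first colon into (scheme, scheme-specific part)."""
    scheme, _, rest = uri.partition(":")
    return scheme, rest


print(split_scheme("mailto:user@example.org"))
print(split_scheme("urn:isbn:0-395-36341-1"))

# Percent-encoding replaces a character with "%" plus two hex digits so that
# reserved characters can appear as data rather than as delimiters.
encoded = quote("a b/c?d", safe="")
print(encoded)
print(unquote(encoded))
```

Only the first colon is significant for finding the scheme; any later colons belong to the scheme-specific part.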
History
Naming, addressing, and identifying resources
URIs and URLs have a shared history. Early in 1990, Tim Berners-Lee’s proposals for HyperText [2] implicitly introduced the idea of a URL as a short string representing a resource that is the target of a hyperlink. At the time, it was called a hypertext name or document name[3].
Over the next three-and-a-half years, as the World Wide Web's core technologies of HTML (the HyperText Markup Language), HTTP, and Web browsers developed, a need to distinguish a string that provided an address for a resource from a string that merely named a resource emerged. Although not yet formally defined, the term Uniform Resource Locator came to represent the former, and the more contentious Uniform Resource Name came to represent the latter.
During the debate over how to best define URLs and URNs, it became evident that the two concepts embodied by the terms were merely aspects of the fundamental, overarching notion of resource identification. So, in June 1994, the IETF published Berners-Lee's RFC 1630: the first RFC that (in its non-normative text) acknowledged the existence of URLs and URNs, and, more importantly, defined a formal syntax for Universal Resource Identifiers — URL-like strings whose precise syntaxes and semantics depended on their schemes. In addition, this RFC attempted to summarize the syntaxes of URL schemes that were in use at the time. It also acknowledged, but did not standardize, the existence of relative URLs and fragment identifiers.
Refinement of specifications
In December 1994, RFC 1738 formally defined relative and absolute URLs, refined the general URL syntax, defined how relative URLs were to be resolved to absolute form, and better enumerated the URL schemes that were in use at the time. The definition and syntax of URNs was not settled upon until the publication of RFC 2141 in May 1997.
With the publication of RFC 2396 in August 1998, the URI syntax became a separate specification[4], and most parts of RFCs 1630 and 1738 relating to URIs and URLs in general were revised and expanded. The new RFC changed the significance of the "U" in "URI": it came to represent "Uniform" rather than "Universal". The sections of RFC 1738 that summarized existing URL schemes were moved into a separate document[1]. IANA keeps a registry of those schemes[2]; the procedure to register them was first described in RFC 2717.
In December 1999, RFC 2732 provided a minor update to RFC 2396, allowing URIs to accommodate IPv6 addresses. Some time later, a number of shortcomings discovered in the two specifications led to the development of a number of draft revisions under the title rfc2396bis. This community effort, coordinated by RFC 2396 co-author Roy Fielding, culminated in the publication of RFC 3986 in January 2005. This RFC, as of 2009 the current version of the URI syntax recommended for use on the Internet, renders RFC 2396 obsolete. It does not, however, render the details of existing URL schemes obsolete; those are still governed by RFC 1738, except where otherwise superseded. RFC 2616, for example, refines the "http" scheme. The content of RFC 3986 was simultaneously published by the IETF as the full standard STD 66, reflecting the establishment of the URI generic syntax as an official Internet protocol.
In August 2002, RFC 3305 pointed out that the term URL has, despite its ubiquity in the vernacular of the Internet-aware public at large, faded into near-obsolescence. It now serves only as a reminder that some URIs act as addresses because they have schemes that imply some kind of network accessibility, regardless of whether systems actually use them for that purpose. As URI-based standards such as Resource Description Framework make evident, resource identification need not be coupled with the retrieval of resource representations over the Internet, nor does it need to be associated with network-bound resources at all.
On November 1, 2006, the W3C Technical Architecture Group published "On Linking Alternative Representations To Enable Discovery And Publishing", a guide to best practices and canonical URIs for publishing multiple versions of a given resource. For example, content might differ by language or by size to adjust for capacity or settings of the device used to access that content.
For the Semantic Web, the HTTP URI scheme can be used to identify both documents and concepts in the real world, which has caused confusion about how exactly to distinguish the two. The Technical Architecture Group (TAG) published an e-mail in June 2005 on how to solve this problem, which became known as the httpRange-14 resolution[3]. To explain this (rather brief) e-mail, the W3C published the Interest Group Note Cool URIs for the Semantic Web[4] in March 2008. It explains the use of content negotiation and the 303 redirect code in more detail.
URI reference
A URI reference is another type of string that represents a URI, and, in turn, the resource identified by that URI. Informal usage does not often maintain the distinction between a URI and a URI reference, but protocol documents should not allow for ambiguity.
A URI reference may take the form of a full URI, or just the scheme-specific portion of one, or even some trailing component thereof—even the empty string. An optional fragment identifier, preceded by "#", may be present at the end of a URI reference. The part of the reference before the "#" indirectly identifies a resource, and the fragment identifier identifies some portion of that resource.
In order to derive a URI from a URI reference, software converts the URI reference to "absolute" form by merging it with an absolute "base" URI, according to a fixed algorithm. The URI reference is considered to be relative to the base URI, although if the reference itself is absolute, then the base is irrelevant. The base URI is typically the URI that identifies the document containing the URI reference, although this can be overridden by declarations made within the document or as part of an external data transmission protocol. If a fragment identifier is present in the base URI, it is ignored during the merging process. If a fragment identifier is present in the URI reference, it is preserved during the merging process.
Web document markup languages frequently use URI references in places where there is a need to point to other resources, such as external documents or specific portions of the same logical document.
Uses of URI references in markup languages
* In HTML, the value of the src attribute of the img element is a URI reference, as is the value of the href attribute of the a or link element.
* In XML, the system identifier appearing after the SYSTEM keyword in a DTD is a fragmentless URI reference.
* In XSLT, the value of the href attribute of the xsl:import element/instruction is a URI reference, as is the first argument to the document() function.
Examples of absolute URIs
* http://example.org/absolute/URI/with/absolute/path/to/resource.txt
* ftp://example.org/resource.txt
* urn:issn:1535-3613
Examples of URI references
* http://en.wikipedia.org/wiki/URI#Examples_of_URI_references ("http" is the 'scheme' name, "en.wikipedia.org" is the 'authority', "/wiki/URI" the 'path' pointing to this article, and "#Examples_of_URI_references" is a 'fragment' pointing to this section.)
* http://example.org/absolute/URI/with/absolute/path/to/resource.txt
* /relative/URI/with/absolute/path/to/resource.txt
* relative/path/to/resource.txt
* ../../../resource.txt
* ./resource.txt#frag01
* resource.txt
* #frag01
* (empty string)
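How such references are merged with a base URI can be tried out with urljoin from Python's standard urllib.parse module. The base URI below is a made-up example, and the results reflect the behaviour of that library function rather than a quotation from any specification.

```python
from urllib.parse import urljoin

# Hypothetical base URI: the document that contains the references.
BASE = "http://example.org/a/b/c/d/page.html"

references = [
    "http://example.org/absolute/URI/with/absolute/path/to/resource.txt",
    "/relative/URI/with/absolute/path/to/resource.txt",
    "relative/path/to/resource.txt",
    "../../../resource.txt",
    "./resource.txt#frag01",
    "resource.txt",
    "#frag01",
    "",
]

for ref in references:
    print(repr(ref), "->", urljoin(BASE, ref))

# A fragment on the base URI is not carried over into the merged result.
print(urljoin("http://example.org/a/page.html#top", "other.html"))
```

An absolute reference is returned unchanged, the empty reference resolves to the base itself, and a fragment in the reference is preserved in the result.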
URI resolution
To "resolve" a URI means either to convert a relative URI reference to absolute form, or to dereference a URI or URI reference by attempting to obtain a representation of the resource that it identifies. The "resolver" component in document processing software generally provides both services.
A URI reference can be a same-document reference: a reference to the document containing the URI reference itself. Document processing software is encouraged to use its current representation of the document to satisfy the resolution of a same-document reference; a new representation should not be fetched. This is only a recommendation, and document processing software is free to use other mechanisms to determine whether obtaining a new representation is warranted.
According to the current URI specification as of 2009, RFC 3986, a URI reference is a same-document reference if, when resolved to absolute form, it is identical to the base URI that is in effect for the reference. Typically, the base URI is the URI of the document containing the reference. XSLT 1.0, for example, has a document() function that, in effect, implements this functionality. RFC 3986 also formally defines URI equivalence, which can be used in order to determine that a URI reference, while not identical to the base URI, still represents the same resource and thus can be considered to be a same-document reference.
Same-document references were determined differently according to RFC 2396, which was made obsolete by RFC 3986 but still serves as the basis of many specifications and implementations. According to this specification, a URI reference is a same-document reference if it is an empty string or consists of only the "#" character followed by an optional fragment.
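The two definitions can be compared side by side in a short Python sketch. The function names are hypothetical, and one detail is an assumption based on a reading of RFC 3986 rather than on the text above: the comparison with the base URI is made with any fragment set aside. No URI equivalence rules beyond plain string comparison are applied.

```python
from urllib.parse import urldefrag, urljoin


def is_same_document_rfc3986(base, reference):
    """Resolve the reference, then compare with the base, ignoring fragments."""
    resolved = urljoin(base, reference)
    return urldefrag(resolved).url == urldefrag(base).url


def is_same_document_rfc2396(reference):
    """Older rule: empty string, or "#" followed by an optional fragment."""
    return reference == "" or reference.startswith("#")


base = "http://example.org/a/page.html"  # hypothetical base URI
for ref in ["", "#frag01", "page.html", "other.html"]:
    print(repr(ref), is_same_document_rfc3986(base, ref),
          is_same_document_rfc2396(ref))
```

The reference "page.html" shows the difference: it resolves to the base URI and so qualifies under the newer rule, but not under the purely lexical older one.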
Relation to XML namespaces
XML has a concept of a namespace, an abstract domain to which a collection of element and attribute names can be assigned. An XML namespace is identified by a character string, the namespace name, which must adhere to the generic URI syntax. However, the namespace name is not considered to be a URI because the "URI-ness" of strings is, according to the URI specification, based on how they are intended to be used, not just their lexical components. A namespace name also does not necessarily imply any of the semantics of URI schemes; a namespace name beginning with "http:", for example, likely has nothing to do with the HTTP protocol. XML professionals have debated this intensively on the xml-dev electronic mailing list; some feel that a namespace name could be a URI, since the collection of names comprising a particular namespace could be considered to be a resource that is being identified, and since the Namespaces in XML specification says that the namespace name is a URI reference. But the consensus seems to suggest that a namespace name is just a string that happens to look like a URI, nothing more.
Initially, the namespace name was allowed to match the syntax of any non-empty URI reference, but an erratum to the "Namespaces In XML Recommendation" later deprecated the use of relative URI references. A separate specification was issued for namespaces for XML 1.1, and allows IRI references, not just URI references, to be used as the basis for namespace names.
In order to mitigate the confusion that began to arise among newcomers to XML from the use of URIs (particularly HTTP URLs) for namespaces, a descriptive language called RDDL was developed, though the specification of RDDL (http://www.rddl.org/) has no official standing and has not been considered or approved by any organization (e.g., W3C). An RDDL document can provide machine- and human-readable information about a particular namespace and about the XML documents that use it. XML document authors were encouraged to put RDDL documents in locations such that if a namespace name in their document was somehow dereferenced, then an RDDL document would be obtained, thus satisfying the desire among many developers for a namespace name to point to a network-accessible resource.
See also
* .arpa - uri.arpa is for dynamic discovery
* Dereferenceable URI (an HTTP URI)
* History of the Internet
* IRI (Internationalized Resource Identifier)
* Namespace (programming)
* percent-encoding
* Persistent Uniform Resource Locator (PURL)
* Uniform Naming Convention (UNC), in computing
* URI scheme
* Uniform Resource Locator (URL)
* Uniform Resource Name (URN)
* Website
* XRI (Extensible Resource Identifier)
References
1. This separate document is not explicitly linked; RFC 2717 and RFC 4395 point to the IANA registry as the official URI scheme registry.
2. IANA registry of URI schemes
3. The httpRange-14 resolution consists of three bullet points and did not help much to reduce the confusion. http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039.html
4. http://www.w3.org/TR/cooluris/
External links
* RFC 3986 / STD 66 (2005) – the current generic URI syntax specification
* RFC 2396 (1998) and RFC 2732 (1999) – obsolete, but widely implemented, version of the generic URI syntax
* RFC 1808 (1995) – obsolete companion to RFC 1738 covering relative URL processing
* RFC 1738 (1994) – mostly obsolete definition of URL schemes and generic URI syntax
* RFC 1630 (1994) – the first generic URI syntax specification; first acknowledgment of URLs in an Internet standard
* URI Schemes – IANA-maintained registry of URI Schemes
* URI Working Group – coordination center for development of URI standards
* Architecture of the World Wide Web, Volume One, §2: Identification – by W3C
* Example of discussion about names and addresses
* W3C materials related to Addressing
* W3C URI Clarification
* What's a URI and why does it matter? (2008) - from W3C
* The Self-Describing Web (2008) - from W3C
URL normalization
URL normalization (or URL canonicalization) is the process by which URLs are modified and standardized in a consistent manner. The goal of the normalization process is to transform a URL into a normalized or canonical URL so it is possible to determine if two syntactically different URLs are equivalent.
Search engines employ URL normalization in order to assign importance to web pages and to reduce indexing of duplicate pages. Web crawlers perform URL normalization in order to avoid crawling the same resource more than once. Web browsers may perform normalization to determine if a link has been visited or to determine if a page has been cached.
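A few of the normalizations discussed in this article can be combined into a small Python sketch. It is a simplified illustration under stated assumptions, not a complete normalizer: the function names and the table of default ports are hypothetical, any user information in the URL is dropped, and the dot-segment handling is a condensed variant of the kind of algorithm RFC 3986 describes.

```python
import re
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}  # assumed defaults for this sketch


def remove_dot_segments(path):
    """Drop "." segments and resolve ".." segments in a slash-separated path."""
    segments = path.split("/")
    output = []
    for segment in segments:
        if segment == ".":
            continue
        if segment == "..":
            if len(output) > 1:  # never climb above the root
                output.pop()
            continue
        output.append(segment)
    if segments[-1] in (".", ".."):
        output.append("")  # keep the trailing slash of a directory
    return "/".join(output)


def normalize_url(url):
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Keep the port only when it is not the default for the scheme.
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        host = f"{host}:{parts.port}"
    path = remove_dot_segments(parts.path or "/")
    # Capitalize the hex digits of percent-encoding triplets.
    path = re.sub(r"%[0-9a-fA-F]{2}", lambda m: m.group(0).upper(), path)
    # The fragment is dropped; the query is passed through unchanged.
    return urlunsplit((scheme, host, path, parts.query, ""))


print(normalize_url("HTTP://www.Example.com:80/../a/b/../c/./d.html#section1"))
```

Two URLs can then be treated as equivalent, for the purposes of this sketch, when their normalized forms are equal.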
Normalization process
There are several types of normalization that may be performed:
* Converting the scheme and host to lower case. The scheme and host components of the URL are case-insensitive. Most normalizers will convert them to lowercase. Example:
HTTP://www.Example.com/ → http://www.example.com/
* Adding a trailing slash. Directories are indicated with a trailing slash, which should be included in URLs. Example:
http://www.example.com → http://www.example.com/
* Removing directory index. Default directory indexes are generally not needed in URLs. Examples:
http://www.example.com/default.asp → http://www.example.com/
http://www.example.com/a/index.html → http://www.example.com/a/
* Converting the entire URL to lower case. Some web servers that run on top of case-insensitive file systems allow URLs to be case-insensitive. URLs from a case-insensitive web server may be converted to lowercase to avoid ambiguity. Example:
http://www.example.com/BAR.html → http://www.example.com/bar.html
* Capitalizing letters in escape sequences. All letters within a percent-encoding triplet (e.g., "%3A") are case-insensitive, and should be capitalized. Example:
http://www.example.com/a%c2%b1b → http://www.example.com/a%C2%B1b
* Removing the fragment. The fragment component of a URL is usually removed. Example:
http://www.example.com/bar.html#section1 → http://www.example.com/bar.html
* Removing the default port. The default port (port 80 for the “http” scheme) may be removed from (or added to) a URL. Example:
http://www.example.com:80/bar.html → http://www.example.com/bar.html
* Removing dot-segments. The segments “..” and “.” are usually removed from a URL according to the algorithm described in RFC 3986 (or a similar algorithm). Example:
http://www.example.com/../a/b/../c/./d.html → http://www.example.com/a/c/d.html
* Removing “www” as the first domain label. Some websites operate in two Internet domains: one whose least significant label is “www” and another whose name is the result of omitting the least significant label from the name of the first. For example, http://example.com/ and http://www.example.com/ may access the same website. Although many websites redirect the user to the non-www address (or vice versa), some do not. A normalizer may perform extra processing to determine if there is a non-www equivalent and then normalize all URLs to the non-www prefix. Example:
http://www.example.com/ → http://example.com/
* Sorting the variables of active pages. Some active web pages have more than one variable in the URL. A normalizer can remove all the variables with their data, sort them into alphabetical order (by variable name), and reassemble the URL. Example:
http://www.example.com/display?lang=en&article=fred → http://www.example.com/display?article=fred&lang=en
* Removing arbitrary querystring variables. An active page may expect certain variables to appear in the querystring; all unexpected variables should be removed. Example:
http://www.example.com/display?id=123&fakefoo=fakebar → http://www.example.com/display?id=123
* Removing default querystring variables. A default value in the querystring will render identically whether it is there or not. When a default value appears in the querystring, it should be removed. Example:
http://www.example.com/display?id=&sort=ascending → http://www.example.com/display
* Removing the "?" when the querystring is empty. When the querystring is empty, there is no need for the "?". Example:
http://www.example.com/display? → http://www.example.com/display
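Several of the syntactic steps listed above can be combined into a single function. The following is a minimal sketch using Python's standard urllib.parse module, not a complete normalizer. The names normalize_url, remove_dot_segments, and DEFAULT_PORTS are illustrative assumptions, as is the choice to drop any user information and to sort query variables by name. Steps that depend on knowledge of the server (removing directory indexes, "www" labels, or default query variables) are deliberately left out.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Default ports per scheme; only these two schemes are assumed here.
DEFAULT_PORTS = {"http": 80, "https": 443}


def remove_dot_segments(path):
    """Drop "." and ".." segments from an absolute path (cf. RFC 3986, 5.2.4)."""
    segments = path.split("/")
    output = []
    for segment in segments:
        if segment == ".":
            continue
        if segment == "..":
            # Never pop the leading empty segment that represents the root.
            if len(output) > 1:
                output.pop()
            continue
        output.append(segment)
    # A path that ends in "." or ".." still refers to a directory.
    if segments[-1] in (".", ".."):
        output.append("")
    return "/".join(output)


def uppercase_escapes(text):
    """Capitalize the hex digits of every percent-encoding triplet."""
    return re.sub(r"%[0-9a-fA-F]{2}", lambda m: m.group(0).upper(), text)


def normalize_url(url):
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Keep the port only when it differs from the scheme's default.
    # Assumption: user information (user:password@) is dropped.
    netloc = host
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc = f"{host}:{parts.port}"
    # An empty path becomes "/"; dot-segments are then removed.
    path = uppercase_escapes(remove_dot_segments(parts.path or "/"))
    # Sort query variables by name; an empty query also drops the "?".
    pairs = [pair for pair in parts.query.split("&") if pair]
    pairs.sort(key=lambda pair: pair.split("=", 1)[0])
    query = uppercase_escapes("&".join(pairs))
    # The fragment is discarded by passing an empty string.
    return urlunsplit((scheme, netloc, path, query, ""))


print(normalize_url("HTTP://www.Example.com:80/../a/b/../c/./d.html#top"))
```

Two URLs can then be treated as equivalent when normalize_url returns the same string for both.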
Normalization based on URL lists
Some normalization rules may be developed for specific websites by examining URL lists obtained from previous crawls or web server logs. For example, if the URL
http://foo.org/story?id=xyz
appears in a crawl log several times along with
http://foo.org/story_xyz
we may assume that the two URLs are equivalent and can be normalized to one of the URL forms.
Schonfeld et al. (2006) present a heuristic called DustBuster for detecting DUST (different URLs with similar text) rules that can be applied to URL lists. They showed that once the correct DUST rules were found and applied with a canonicalization algorithm, they were able to find up to 68% of the redundant URLs in a URL list.
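Once a site-specific rule has been decided on, applying it to a URL list is straightforward. The sketch below hard-codes one hypothetical rule for the foo.org example above and uses it to collapse equivalent URLs. It does not implement DustBuster or any rule-discovery heuristic; the names RULES, apply_rules, and canonical_set are assumptions made for illustration.

```python
import re

# Hypothetical rewrite rule for the foo.org example:
# http://foo.org/story?id=X  ->  http://foo.org/story_X
RULES = [
    (re.compile(r"^(http://foo\.org/story)\?id=([^&#]+)$"), r"\1_\2"),
]


def apply_rules(url):
    """Rewrite a URL with every rule whose pattern matches it."""
    for pattern, replacement in RULES:
        url = pattern.sub(replacement, url)
    return url


def canonical_set(urls):
    """Collapse a URL list to the set of its canonical forms."""
    return {apply_rules(url) for url in urls}


crawl_log = [
    "http://foo.org/story?id=xyz",
    "http://foo.org/story_xyz",
    "http://foo.org/about",
]
print(sorted(canonical_set(crawl_log)))
```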
References
* RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax
* Sang Ho Lee, Sung Jin Kim, and Seok Hoo Hong (2005). "On URL normalization". Proceedings of the International Conference on Computational Science and its Applications (ICCSA 2005): 1076-1085.
* Gautam Pant, Padmini Srinivasan, and Filippo Menczer (2004). "Crawling the Web". Web Dynamics: Adapting to Change in Content, Size, Topology and Use, edited by M. Levene and A. Poulovassilis: 153-178.
* Uri Schonfeld, Ziv Bar-Yossef, and Idit Keidar (2006). "Do not crawl in the DUST: different URLs with similar text". Proceedings of the 15th international conference on World Wide Web: 1015-1016.
* Uri Schonfeld, Ziv Bar-Yossef, and Idit Keidar (2007). "Do not crawl in the DUST: different URLs with similar text". Proceedings of the 16th international conference on World Wide Web: 111-120.
See also
* Web crawler
* Uniform Resource Locator
URI scheme
In the field of computer networking, a URI scheme is the top level of the Uniform Resource Identifier (URI) naming structure. All URIs and absolute URI references are formed with a scheme name, followed by a colon character (":"), and the remainder of the URI called (in the outdated RFCs 1738 and 2396, but not the current STD 66/RFC 3986) the scheme-specific part. The syntax and semantics of the scheme-specific part are left largely to the specifications governing individual schemes, subject to certain constraints such as reserved characters and how to "escape" them.
URI schemes are sometimes erroneously referred to as "protocols", or specifically as URI protocols or URL protocols, since most were originally designed to be used with a particular protocol, and often have the same name. The http scheme, for instance, is generally used for interacting with Web resources using HyperText Transfer Protocol. Today, URIs with that scheme are also used for other purposes, such as RDF resource identifiers and XML namespaces, that are not related to the protocol. Furthermore, some URI schemes are not associated with any specific protocol (e.g. "file") and many others do not use the name of a protocol as their prefix (e.g. "news").
URI schemes should be registered with IANA, although non-registered schemes are used in practice. RFC 4395 describes the procedures for registering new URI schemes.
Generic syntax
Internet standard STD 66 (also RFC 3986) defines the generic syntax to be used in all URI schemes. Every URI is defined as consisting of four parts, as follows:
<scheme name> : <hierarchical part> [ ? <query> ] [ # <fragment> ]
The scheme name consists of a letter followed by any combination of letters, digits, and the plus ("+"), period ("."), or hyphen ("-") characters; and is terminated by a colon (":").
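The rule for scheme names translates directly into a regular expression. The sketch below is only an illustration: the names SCHEME_RE, is_valid_scheme, and split_scheme are assumptions, and the check covers the syntax of the name only, not whether the scheme is registered.

```python
import re

# A letter followed by any combination of letters, digits, "+", "." or "-".
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme(name):
    """Return True if name is syntactically a valid scheme name."""
    return SCHEME_RE.fullmatch(name) is not None


def split_scheme(uri):
    """Split a URI at the first colon into (scheme name, remainder)."""
    name, colon, remainder = uri.partition(":")
    if not colon or not is_valid_scheme(name):
        raise ValueError(f"no valid scheme name in {uri!r}")
    return name, remainder


print(split_scheme("urn:example:animal:ferret:nose"))
```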
The hierarchical part of the URI is intended to hold identification information hierarchical in nature. Usually this part begins with a double forward slash ("//"), followed by an authority part and an optional path.
* The authority part holds an optional user information part terminated with "@" (e.g. username:password@), a hostname (i.e. domain name or IP address), and an optional port number preceded by a colon ":".
* The path part is a sequence of segments (conceptually similar to directories, though not necessarily representing them) separated by a forward slash ("/"). Each segment can contain parameters separated from it using a semicolon (";"), though this is rarely used in practice.
The query is an optional part, separated from the preceding part by a question mark, that contains additional identification information which is not hierarchical in nature. The query string syntax is not generically defined, but it is commonly organized as a sequence of <key>=<value> pairs separated by a semicolon[1][2][3] or by an ampersand, e.g. key1=value1&key2=value2&key3=value3 or key1=value1;key2=value2;key3=value3.
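Because the query syntax is only a convention, a parser has to choose which separators it accepts. The following minimal sketch accepts both the ampersand and the semicolon; the function name parse_query is an assumption, and percent-decoding is applied to each key and value with urllib.parse.unquote.

```python
import re
from urllib.parse import unquote


def parse_query(query):
    """Split a query string into (key, value) pairs on "&" or ";"."""
    pairs = []
    for field in re.split(r"[&;]", query):
        if not field:
            continue  # skip empty fields such as a trailing separator
        key, _, value = field.partition("=")
        pairs.append((unquote(key), unquote(value)))
    return pairs


print(parse_query("key1=value1&key2=value2&key3=value3"))
print(parse_query("key1=value1;key2=value2;key3=value3"))
```

Both calls yield the same list of pairs, which is the point of supporting the two separators side by side.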
The fragment is an optional part separated from the front parts by a hash ("#"). It holds additional identifying information that provides direction to a secondary resource, e.g. a section heading in an article identified by the remainder of the URI. When the primary resource is an HTML document, the fragment is often an id attribute of a specific element and web browsers will make sure this element is visible.
Examples
The following are two example URIs and their component parts (taken loosely from RFC 3986 — STD 66):
foo://username:password@example.com:8042/over/there/?name=ferret#nose
\_/   \_______________/ \_________/ \__/\__________/ \_________/ \__/
 |            |              |       |       |            |       |
 |        userinfo       hostname   port    path        query fragment
 |    \________________________________/
scheme            authority
 |
 |            path
 |   ___________|____________
/ \ /                        \
urn:example:animal:ferret:nose
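The component breakdown of these two example URIs can be checked with a general-purpose parser. The sketch below uses urlsplit from Python's standard urllib.parse module, which is one possible tool and not part of the URI specification.

```python
from urllib.parse import urlsplit

uri = "foo://username:password@example.com:8042/over/there/?name=ferret#nose"
parts = urlsplit(uri)

# Top-level components of the generic syntax.
print(parts.scheme)    # scheme name
print(parts.netloc)    # authority
print(parts.path)      # path
print(parts.query)     # query
print(parts.fragment)  # fragment

# Sub-components of the authority.
print(parts.username, parts.password, parts.hostname, parts.port)

# A URI without "//" has no authority; everything after the scheme is path.
urn = urlsplit("urn:example:animal:ferret:nose")
print(urn.scheme, urn.netloc == "", urn.path)
```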
1. ^ RFC 1866 section 8.2.1 : by Tim Berners-Lee in 1995 encourages CGI authors to support ';' in addition to '&'.
2. ^ HTML 4.01 Specification: Implementation, and Design Notes: "CGI implementors support the use of ";" in place of "&" to save authors the trouble of escaping "&" characters in this manner."
3. ^ Hypertext Markup Language - 2.0 "CGI implementors are encouraged to support the use of ';' in place of '&' "
Official IANA-registered schemes
The official URI schemes registered with the IANA follow.
Scheme Purpose Defined by General format Notes
aaa Diameter Protocol RFC 3588 aaa://[:][;transport=][;protocol=]
example:
aaa://host.example.com:1813;transport=udp;protocol=radius
aaas Secure equivalent of aaa RFC 3588 aaas://[:][;transport=][;protocol=]
acap Application Configuration Access Protocol RFC 2244 acap://[[;AUTH=]@][:]/ URL scheme used within the ACAP protocol for the "subdataset" attribute, referrals and inheritance
cap Calendar access protocol RFC 4324 generic syntax URL scheme used to designate both calendar stores and calendars accessible using the CAP protocol
cid Referencing individual parts of an SMTP/MIME message RFC 2392 cid: e.g. referencing an attached image within a formatted e-mail. (See also mid:)
crid TV-Anytime Content Reference Identifier RFC 4078 crid:/// Allow references to scheduled publications of broadcast media content.
data Inclusion of small data items inline RFC 2397 data:[;base64],
dav HTTP Extensions for Distributed Authoring (WebDAV) RFC 2518 dav: Used for internal identifiers only; WebDAV itself addresses resources using the http: and https: schemes. [1]
dict Dictionary service protocol RFC 2229 dict://;@:/d:::
dict://;@:/m::::
refer to definitions or word lists available using the DICT protocol
dns Domain Name System RFC 4501 dns:[//[:]/][?]
examples:
dns:example?TYPE=A;CLASS=IN
dns://192.168.1.1/ftp.example.org?type=A
designates a DNS resource record set, referenced by domain name, class, type, and, optionally, the authority
fax Used for telefacsimile numbers RFC 2806 fax: Seems to be deprecated in RFC 3966 in favour of tel:
file Addressing files on local or network file systems RFC 1738 generic syntax
(often appears as file:///path, the 3rd '/' is the final delimiter when no host (authority) is specified between) Unusual in not being bound to any network protocol, and not usable in an Internet context.
ftp FTP resources RFC 1738 generic syntax
go Common Name Resolution Protocol RFC 3368 go://[]?[]*[;=[,]] or
go:*[;=[,]]
gopher Used with Gopher protocol RFC 4266 gopher://:/- /
h323 Used with H.323 multimedia communications RFC 3508 h323:[@][:][;]
http HTTP resources RFC 2616 generic syntax
https HTTP connections secured using SSL/TLS RFC 2817 generic syntax
icap Internet Content Adaptation Protocol RFC 3507
im Instant messaging protocol RFC 3860 RFC 4622 im:[@] Works as xmpp: URI for single user chat sessions.
imap Accessing e-mail resources through IMAP RFC 2192 imap://[[;AUTH=]@][:]/
info Information Assets with Identifiers in Public Namespaces RFC 4452
ipp Internet Printing Protocol RFC 3510
iris
iris.beep
iris.xpc
iris.xpcs
iris.lws Internet Registry Information Service RFC 3981 RFC 3983 RFC 4992 RFC 4992 RFC 4993
ldap LDAP directory request RFC 2255
RFC 4516 ldap://[[:]][/ [?[][?[][?[][?]]]]]
example:
ldap://ldap1.example.net:6666/o=University%20of%20Michigan, c=US??sub?(cn=Babs%20Jensen)
mailto SMTP e-mail addresses and default content RFC 2368 mailto:[?=[&=]]
example:
mailto:jsmith@example.com?subject=A%20Test&body=My%20idea%20is%3A%20%0A
Headers are optional, but often include subject=; body= can be used to pre-fill the body of the message.
mid Referencing SMTP/MIME messages, or parts of messages. RFC 2392 mid:[/] (See also cid:)
modem modem RFC 3966
msrp
msrps Message Session Relay Protocol RFC 4975
mtqp Message Tracking Query Protocol RFC 3887
mupdate Mailbox Update Protocol RFC 3656
news (Usenet) newsgroups and postings RFC 1738 news: or
news: References a particular resource, regardless of location.
nfs Network File System resources RFC 2224 generic syntax
nntp Usenet NNTP RFC 1738 nntp://:// Referencing a specific host is often less useful than referencing the resource generically, as NNTP servers are not always publicly accessible
opaquelocktoken opaquelocktoken RFC 4918
pop Accessing mailbox through POP3 RFC 2384 pop://[[;AUTH=]@][:]
pres Used in Common Profile for Presence (CPP) to identify presence RFC 3859 pres: [?=[&=]] Similar to "mailto:"
prospero Prospero Directory Service RFC 4157 Listed as "Historical" by IANA.
rtsp Real Time Streaming Protocol RFC 2326
service RFC 2609
shttp Secure HTTP RFC 2660 Largely superseded by HTTPS.
sip Used with Session Initiation Protocol (SIP) RFC 3969
RFC 3261 sip:[:]@[:][;][?]
examples:
sip:alice@atlanta.com?subject=project%20x&priority=urgent
sip:+1-212-555-1212:1234@gateway.com;user=phone
sips Secure equivalent of sip RFC 3969
RFC 3261 sips:[:]@[:][;][?]
snmp Simple Network Management Protocol RFC 4088 snmp://[user@]host[:port][/[[;]][/]]
examples:
snmp://example.com//1.3.6.1.2.1.1.3+
snmp://tester5@example.com:8161/bridge1;800002b804616263
soap.beep
soap.beeps RFC 3288
tag RFC 4151
tel Used for telephone numbers RFC 3966
RFC 2806 tel:
telnet Used with telnet RFC 4248 telnet://:@[:/]
tftp Trivial File Transfer Protocol RFC 3617
thismessage multipart/related relative reference resolution RFC 2557
tip Transaction Internet Protocol RFC 2371
tv TV Broadcasts RFC 2838
urn Uniform Resource Names RFC 2141 urn::
vemmi Versatile Multimedia Interface RFC 2122
wais Used with Wide area information server (WAIS) RFC 4156 wais://:/[?] or wais://:/// Listed as "Historical" by IANA.
xmlrpc.beep
xmlrpc.beeps RFC 3529
xmpp XMPP (Jabber) RFC 5122 xmpp:@[:]/[][?]
z39.50r Z39.50 retrieval RFC 2056 z39.50r://[:]/?[;esn=][;rs=]
z39.50s Z39.50 session RFC 2056 z39.50s://[:]/[][?][;esn=][;rs=]
Unofficial but common URI schemes
Scheme Purpose Defined by General format Notes
about Displaying product information and internal information Un-standardised
about:blank is commonly used to display a blank page. Widely used by web browsers, sometimes even providing interactive resources. The Opera web browser uses opera: instead.
adiumxtra Direct installation of Adium Xtras (plugins). The Adium Team adiumxtra://www.adiumxtras.com/download/0000 0000 refers to a specific Xtra
aim Controlling AOL Instant Messenger. AOL aim:? Functions include goim, addbuddy, and buddyicon.
afp Accessing Apple Filing Protocol shares IETF Draft over TCP/IP: afp://[@][:][/[]]
over AppleTalk: afp:/at/[@][:][/]
aw Link to an Active Worlds world Activeworlds Inc. aw://:/ Mostly found in HTTP referers when users open a website from within an Active Worlds world.
bolo Join an existing bolo game. bolo:/// Mostly passed via IRC or via tracker servers.
callto Launching Skype call (+And in Hungary the KLIP Software call too) (unofficial; see also skype:) Skype callto: or
callto: [2] Introduced with Microsoft NetMeeting. Works with current version of Skype with Firefox, Internet Explorer and Safari
chrome Specifies user interfaces built using XUL in Mozilla-based browsers. Mozilla chrome://// (Where is either "content", "skin" or "locale") Works only in Mozilla-based browsers such as Firefox, SeaMonkey and Netscape.
cvs Provides a link to a Concurrent Versions System (CVS) Repository Concurrent Versions System cvs://@/;[date=date to retrieve | tag=tag to retrieve]
ed2k Resources available using the eDonkey2000 network eDonkey2000 ed2k://|file||||/ or
ed2k://|server|||/ Links to servers are also possible, as are additional parameters. Official documentation from eDonkey2000 website at the Internet Archive
feed web feed subscription feed: or
feed://
examples:
feed://example.com/rss.xml
feed:https://example.com/rss.xml
See Feed URI scheme for a detailed overview of common implementations, supported software, and criticism.
fish Accessing another computer's files using the SSH protocol fish KDE kioslave fish://[[:]@][:] See Files transferred over shell protocol for details about the protocol.
gg Starting chat with Gadu-Gadu user Gadu-Gadu gg:
gizmoproject Gizmo Project calling link. gizmoproject://call?id= May use sip:// instead of gizmoproject:// in recent versions of Gizmo.
iax2 Inter-Asterisk eXchange protocol version 2 IETF Draft iax2:[@][:][/[?]]
examples:
iax2:[2001:db8::1]:4569/alice?friends
iax2:johnQ@example.com/12022561414
irc Connecting to a server to join a channel. IETF Draft
Old IETF Draft irc://[:]/[[?]] Assuming the client knows a server associated with the name, "host" may optionally be an IRC network name.
ircs Secure equivalent of irc IETF Draft ircs://[:]/[[?]] See irc
itms Used for connecting to the iTunes Music Store Apple Inc itms:
jar Compressed archive member Java API jar:!/[] Works for any ZIP based file.
javascript Execute javascript code IETF Draft javascript: Works in any modern browser.
keyparc Keyparc encrypt/decrypt resource. keyparc://encrypt// or
keyparc://decrypt//
lastfm Connecting to a radio stream from Last.fm. Last.fm lastfm:// or lastfm://globaltags/ or
lastfm://user//
ldaps Secure equivalent of ldap ldaps://[[:]][/ [?[][?[][?[][?]]]]] Not an IETF standard, but commonly used in applications.
magnet "magnet links" Magnet-URI Project magnet:?xt=urn:sha1:&dn=
(other parameters are also possible) Used by various peer-to-peer clients, usually providing the hash of a file to be located on the network.
mms Windows streaming media mms://:/ Used by Windows Media Player to stream audio and/or video.
msnim Adding a contact, or starting a conversation in Windows Live Messenger Windows Live Messenger Add a contact to the buddy list
msnim:add?contact=nada@nowhere.com
Start a conversation with a contact
msnim:chat?contact=nada@nowhere.com
Start a voice conversation with a contact
msnim:voice?contact=nada@nowhere.com
Start a video conversation with a contact
msnim:video?contact=nada@nowhere.com
Can be invoked from a web page or via a run command or an ie browser URL (won't work with firefox 2.0.0.8). For web pages use this HTML: Click to chat!
mvn Access Apache Maven repository artifacts OPS4J mvn:org.ops4j.pax.web.bundles/service/0.2.0-SNAPSHOT
mvn:http://user:password@repository.ops4j.org/maven2!org.ops4j.pax.web.bundles/service/0.2.0
notes Open a Lotus Notes document or database Lotus Notes notes:// Used by IBM Lotus Notes to refer to documents and databases stored within the Lotus Notes system. When clicked in a browser on a computer with Lotus Notes client installed, Notes will open the document link as if a Notes DocLink were clicked within Notes.
psyc Used to identify or locate a person, group, place or a service and specify its ability to communicate PSYC psyc:[//[:[][]]/[][#] -
paparazzi:http Used to launch and automatically take a screen shot using the application "Paparazzi" (Mac only) Derailer paparazzi:http:[//[:[][]]/ Official documentation from Paparazzi website
rmi Look up a Java object in an RMI registry. Sun rmi://[:]/ URI scheme understood by JNDI. Can be used to lookup a remote Java object within an RMI registry (typically for the purposes of RMI on that object). Host/port in the URI are of the rmiregistry process, not the remote object.
rsync rsync rsync://[:]/
secondlife Open the Map floater in Second Life application to teleport the resident to the location. Linden Lab secondlife:///// Used by SLurl.com. Knowledge base article.
sgn Social Graph Node Mapper Google example:
sgn://social-network.example.com/?ident=bob
Official documentation from sgnodemapper project.
skype Launching Skype call (official; see also callto:) Skype skype:[?[add|call|chat|sendfile|userinfo]] Official documentation from Skype website.
ssh SSH connections (like telnet:) and IETF Draft ssh://[[;fingerprint=]@][:]
sftp SFTP file transfers (not to be confused with FTPS (FTP/SSL)) IETF Draft sftp://[[;fingerprint=]@][:]//
smb Accessing SMB/CIFS shares IETF Draft smb://[@][:][/[]][?=[;=]] or
smb://[@][:][/]
sms Interact with SMS capable devices for composing and sending messages. IETF draft sms:? Should be used as a subset to the tel: schema.[citation needed]
soldat Joining servers Soldat soldat://:/
example:
soldat://127.0.0.1:23073/
Official note in Manual
steam Interact with Steam: install apps, purchase games, run games, etc. Steam, Valve Corporation steam: or
steam:/// Official documentation from Valve Developer Community website
svn Provides a link to a Subversion (SVN) source control repository Subversion (software) svn[+ssh]://@<:port>/
teamspeak Joining a server. TeamSpeak teamspeak://[:]/[?=[&=]] Official documentation from TeamSpeak Website
unreal Joining servers Unreal unreal://[:]/ Unreal legacy "protocol"
ut2004 Joining servers Unreal Tournament 2004 ut2004://[:][/
Syntax
Main article: URI scheme#Generic syntax
Every URL begins with the scheme name that defines its namespace, purpose, and the syntax of the remaining part of the URL. Most Web-enabled programs will try to dereference a URL according to the semantics of its scheme and a context-specific heuristic. For example, a Web browser will usually dereference the URL http://example.org/ by performing an HTTP request to the host example.org, at the default HTTP port (port 80). Dereferencing the URL mailto:bob@example.com will usually start an e-mail composer with the address bob@example.com in the To field.
example.com is a domain name; an IP address or other network address might be used instead. In addition, URLs that specify https as a scheme (such as https://example.com/) normally denote a secure website.
The hostname portion of a URL, if present, is case insensitive (since the DNS is specified to ignore case); other parts are not required to be, but may be treated as case insensitive by some clients and servers, especially those that are based on Microsoft Windows. For example:
1. http://en.wikipedia.org/ and HTTP://EN.WIKIPEDIA.ORG/ will both open the same page.
2. http://en.wikipedia.org/wiki/URL is correct, but http://en.wikipedia.org/WIKI/URL/ will result in an HTTP 404 error page.
URLs as locators
In its current strict technical meaning, a URL is a URI that, “in addition to identifying a resource, [provides] a means of locating the resource by describing its primary access mechanism (e.g., its network ‘location’).”[3]
Internet hostnames
Main article: Hostname
On the Internet, a hostname is a domain name assigned to a host computer. This is usually a combination of the host's local name with its parent domain's name. For example, "en.wikipedia.org" consists of a local hostname ("en") and the domain name "wikipedia.org". This kind of hostname is translated into an IP address via the local hosts file, or the Domain Name System (DNS) resolver. It is possible for a single host computer to have several hostnames; but generally the operating system of the host prefers to have one hostname that the host uses for itself.
Any domain name can also be a hostname, as long as the restrictions mentioned below are followed. So, for example, both "en.wikimedia.org" and "wikimedia.org" are hostnames because they both have IP addresses assigned to them. The domain name "pmtpa.wikimedia.org" is not a hostname since it does not have an IP address, but "rr.pmtpa.wikimedia.org" is a hostname. All hostnames are domain names, but not all domain names are hostnames.
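The split between a host's local name and its parent domain described above can be sketched in a few lines. The function name split_hostname is an assumption, and the sketch is purely textual: it performs no DNS lookup, so it cannot tell whether a given domain name actually has an IP address assigned.

```python
def split_hostname(hostname):
    """Split a hostname into (local name, parent domain) at the first dot."""
    local, _, parent = hostname.partition(".")
    return local, parent


# Hostnames are case-insensitive, so lower-case before comparing.
print(split_hostname("EN.Wikipedia.org".lower()))
```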
See also
* CURIE (Compact URI)
* Extensible Resource Identifier (XRI)
* Internationalized Resource Identifier (IRI)
* Uniform Resource Identifier (URI)
* URL normalization
* URI scheme
References
1. ^ RFC 1738
2. ^ RFC 3305
3. ^ Tim Berners-Lee, Roy T. Fielding, Larry Masinter. (January 2005). “Uniform Resource Identifier (URI): Generic Syntax”. Internet Society. RFC 3986; STD 66.
External links
* RFC 3986 Uniform Resource Identifier (URI): Generic Syntax
CURIE
A CURIE (short for Compact URI) is an abbreviated URI expressed in CURIE syntax, and may be found in both XML and non-XML grammars. A CURIE may be considered a datatype.
An example of CURIE syntax: [isbn:0393315703]
The square brackets may be used to prevent ambiguities between CURIEs and regular URIs.
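Expanding a CURIE amounts to replacing its prefix with the URI that the prefix is bound to. The sketch below illustrates the idea; the function name expand_curie and both prefix bindings are assumptions made for the example, since real bindings are declared by the host document.

```python
def expand_curie(curie, prefixes):
    """Expand a CURIE such as "[isbn:0393315703]" into a full URI."""
    text = curie.strip()
    # The square brackets are optional delimiters, not part of the CURIE.
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]
    prefix, colon, reference = text.partition(":")
    if not colon or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + reference


# Hypothetical prefix bindings, chosen only for illustration.
PREFIXES = {
    "isbn": "urn:isbn:",
    "wiki": "http://en.wikipedia.org/wiki/",
}

print(expand_curie("[isbn:0393315703]", PREFIXES))
print(expand_curie("[wiki:Biome]", PREFIXES))
```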
QNames (the namespace prefixes used in XML) often are used as a CURIE, and may be considered a type of CURIE. CURIEs, as defined by the W3C, will be better defined and may include checking. Unlike QNames, the part of a CURIE after the colon does not need to conform to the rules for element names.
The first W3C Working Draft of CURIE syntax was released 7 March 2007.[1]
Example
This example is based on one from the W3C Working Draft 7 March 2007, using a QName syntax within XHTML.
...
Find out more about biomes.
* The definition ("") is highlighted in yellow
* The CURIE ("[wiki:Biome]") is highlighted in green
References
1. ^ CURIE Syntax 1.0
External links
* www.w3.org/TR/curie
Extensible Resource Identifier
Extensible Resource Identifier (abbreviated XRI) is a scheme and resolution protocol for abstract identifiers compatible with Uniform Resource Identifiers and Internationalized Resource Identifiers, developed by the XRI Technical Committee at OASIS. The goal of XRI is a standard syntax and discovery format for abstract, structured identifiers that are domain-, location-, application-, and transport-independent, so they can be shared across any number of domains, directories, and interaction protocols.
The XRI 2.0 specifications narrowly failed to become OASIS standards due to the number of negative votes,[1] a failure attributed[2] to the intervention of the W3C Technical Architecture Group which made a statement recommending against using XRIs or taking the XRI specifications forward.[3] The core of the dispute is whether the widely interoperable HTTP URIs are capable of fulfilling the role of abstract, structured identifiers, as the TAG believes,[4] but whose limitations the XRI Technical Committee was formed specifically to address.[5]
With the growth of XML, Web services, and other ways of adapting the Web to automated, machine-to-machine communications, it is increasingly important to be able to identify a resource independent of any specific physical network path, location, or protocol in order to:
* Create structured identifiers with self-describing "tags" that can be understood across domains the same way XML documents provide a self-describing, domain-independent data format.
* Maintain a persistent link to the resource regardless of whether its network location changes.
* Delegate identifier management not just in the authority segment (the first segment following the "xxx://" scheme name) but anywhere in the identifier path.
* Map identifiers used to identify a resource in one domain to other synonyms used to identify the same resource in the same domain, or in other domains.
By early 2003, these requirements led to a resolution protocol based on HTTP(S) and simple XML documents called XRDS (Extensible Resource Descriptor Sequence).
[edit] Features
* URI- and IRI-compatibility — XRIs can be used wherever URIs or IRIs are called for.
* Cross-references — An XRI can contain another XRI (or a URI), to any level of nesting. This enables the construction of structured, "tagged" identifiers that enable identifier sharing across domains the same way XML enables data sharing across domains.
* Global context symbols — These are single-character symbols (=, @, +, $, or !) that provide a simple, human-friendly way to indicate the global context of an i-name or i-number. These are not required, but may be used within communities of interest that agree on their meaning and how they are resolved.
* Peer-to-peer addressing — XRI syntax supports the ability for any two network nodes to assign each other XRIs and perform cross-resolution. That is, a top-level namespace authority can be referred to by names assigned by other parties. This aids in federating namespaces between organizations or communities of interest.
* Decentralization — XRIs can be rooted in either centralized addressing systems (e.g., IP addresses or DNS domain names) or private/decentralized root authorities and peer-to-peer addressing.
* Delegation — Namespaces can be delegated to other namespace authorities.
* Federation — Namespaces defined separately at any level can be joined together in a hierarchical or polyarchical fashion, and made visible and resolvable.
* Persistence — The ability to express the intent that parts (or all) of an XRI are permanent identifiers that will never be reassigned.
* Human-friendly and machine-friendly formats — XRI provides syntax both for identifiers that can be created and understood by humans easily (i-names), and those that are optimized for machine structuring/parsing (i-numbers).
* Simple, extensible resolution — XRI offers a lightweight resolution scheme using HTTP and a simple XML document format called XRDS.
* Trusted resolution — the XRI resolution protocol includes three modes of trusted resolution: a) HTTPS, b) SAML assertions, and c) both.
* Multiple resolution options — XRI resolution can be independent of DNS.
* Fully internationalizable, leveraging Unicode and IRI specifications.
* Transport independent — XRIs are not bound to any specific transport protocols or mechanism.
Composition of an Extensible Resource Identifier
An XRI starting with "=" is thought of as identifying a person. An XRI starting with "@" identifies a company or organization. A leading "+" indicates a generic concept, subject or topic.[6]
A "*" marks a delegation. For example with "=family*name", "=family" delegates the resolving of its sub-XRI "name" to another resolver. This is analogous to DNS' delegating the subdomain resolution to other nameservers (name.family.de: after resolving de, the nameserver responsible for de delegates to the family nameserver, which delegates to the name nameserver).
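The conventions above can be illustrated with a small sketch. It is only a toy reading of the leading symbol and the "*" delegation marker, not a conforming XRI parser; the function name describe_xri and the labels it returns are assumptions made for this example.

```python
# Toy illustration only: maps the leading global context symbol to the
# meaning described in the text and splits the authority part on "*".
GLOBAL_CONTEXT = {
    "=": "person",
    "@": "company or organization",
    "+": "generic concept, subject or topic",
}


def describe_xri(xri):
    """Return (context label, list of delegated subsegments) for a simple XRI."""
    if xri.startswith("xri://"):          # the "xri://" prefix is optional
        xri = xri[len("xri://"):]
    context = GLOBAL_CONTEXT.get(xri[:1], "unknown")
    authority = xri.split("/", 1)[0]      # ignore any path after the first "/"
    subsegments = authority[1:].split("*")
    return context, subsegments


if __name__ == "__main__":
    print(describe_xri("=family*name"))
```

Here "=family*name" is read as a person context whose authority "family" delegates the sub-XRI "name", mirroring the DNS analogy in the text.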
Resolving an Extensible Resource Identifier
XRIs are resolved to XRDS documents using the HTTP(S) protocol, much as domain names are resolved to resource records using the DNS protocol. This lookup process can be configured by passing parameters.[7]
Proxy resolvers and the HXRI
An XRI can be transformed into a URI by prepending http(s)://xri.*/ to it. The resulting URI refers to a so-called proxy resolver, which resolves a URI of this kind to an XRDS document. The proxy resolver at http://xri.net, for example, can be used to resolve an XRI, so =example becomes http://xri.net/=example. The second form is called an HTTP XRI, or HXRI for short. The owner of the XRI =example can tell the proxy resolver what to do when the HXRI is requested. One possible reaction is a 302 HTTP redirect to a stored URI.
Further parameters that control the resolution can be appended to the HXRI, e.g. to get the whole XRDS document or to get service descriptions for this XRI. For example, if ?_xrd_r=application/xrds+xml is appended to the HXRI, the whole XRDS document is returned. So http://xri.net/=example?_xrd_r=application/xrds+xml returns the whole XRDS for the XRI =example.
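The transformation from an XRI to an HXRI is plain string construction, which the following sketch shows. The helper name to_hxri and its parameters are assumptions for illustration; only the proxy address http://xri.net and the _xrd_r parameter come from the text, and the code does not contact any resolver.

```python
def to_hxri(xri, proxy="http://xri.net", media_type=None):
    """Build an HXRI for `xri`, optionally requesting a given media type."""
    if xri.startswith("xri://"):          # drop the optional scheme prefix
        xri = xri[len("xri://"):]
    hxri = f"{proxy.rstrip('/')}/{xri}"
    if media_type:
        # Appended verbatim, matching the form shown in the text.
        hxri += f"?_xrd_r={media_type}"
    return hxri


if __name__ == "__main__":
    print(to_hxri("=example"))
    print(to_hxri("=example", media_type="application/xrds+xml"))
```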
Examples of XRI cross-reference syntax
Say a library system uses URNs in the ISBN namespace to identify books and DNS subdomains to identify its library branches. HTTP URI syntax does not provide a standard way to express the URN for the book title in the context of the DNS name for the library branch. XRI cross-reference syntax solves this problem by allowing the library (and even automated programs running at the library) to programmatically construct the XRIs necessary to address any book at any branch. Examples:
xri://broadview.library.example.com/(urn:isbn:0-395-36341-1)
xri://shoreline.library.example.com/(urn:isbn:0-395-36341-1)
xri://northgate.library.example.com/(urn:isbn:0-395-36341-1)
This ability to create structured, self-describing identifiers can be extended to many other uses. For example, say the library wanted to indicate the type of each book available. By establishing a simple XRI dictionary of book types, it can now programmatically construct XRIs that include this metadata:
xri://broadview.library.example.com/(urn:isbn:0-395-36341-1)/(+hardcover)
xri://broadview.library.example.com/(urn:isbn:0-395-36341-1)/(+softcover)
xri://broadview.library.example.com/(urn:isbn:0-395-36341-1)/(+reference)
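As a sketch of what "programmatically construct" could look like, the helper below assembles the XRIs shown above from a branch host name, an ISBN and an optional book type. The function name book_xri is hypothetical, and the code performs no validation of its inputs.

```python
def book_xri(branch_host, isbn, book_type=None):
    """Compose an XRI that cross-references an ISBN URN under a branch host."""
    xri = f"xri://{branch_host}/(urn:isbn:{isbn})"
    if book_type:
        xri += f"/(+{book_type})"         # e.g. hardcover, softcover, reference
    return xri


if __name__ == "__main__":
    for branch in ("broadview", "shoreline", "northgate"):
        print(book_xri(f"{branch}.library.example.com", "0-395-36341-1"))
```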
Other examples of XRI 2.0 syntax
(Note that none of these show the prefix "xri://", which is optional in XRIs when they are not in URI normal form, i.e., they have not undergone the specified transformation between XRI format and URI format.)
Example XRIs composed entirely of reassignable segments:
=Mary.Jones
@Jones.and.Company
+phone.number
+phone.number/(+area.code)
=Mary.Jones/(+phone.number)
@Jones.and.Company/(+phone.number)
@Jones.and.Company/((+phone.number)/(+area.code))
Example XRIs composed entirely of persistent segments:
=!13cf.4da5.9371.a7c5
@!280d.3822.17bf.ca48!78d2/!12
Example of XRIs with mixes of persistent and reassignable segments (XRI allows any combination of the two):
=!13cf.4da5.9371.a7c5/(+phone.number)
@Jones.and.Company!78d2/!12/(+area.code)
Applications
Examples of applications being developed using XRI infrastructure include:
* OpenID 2.0 includes support for XRIs and uses XRDS for OpenID identifier discovery.
* The Higgins Project uses XRIs and XRDS to address and discover Higgins context providers.
* XDI.org I-name and I-number digital identity addressing services.
* The XDI data sharing protocol under development by the OASIS XDI Technical Committee.
Licensing
The XRI Technical Committee is chartered under the RF on Limited Terms Mode of the OASIS IPR policy (See http://www.oasis-open.org/committees/xri/ipr.php for more details.)
Some people argue that the technologies employed in XRI are subject to patent claims, and that the licensing rights to these patents have been vested in XDI.org, a non-profit organization which has in turn licensed a non-exclusive interest in the use of the patents to companies associated with the original patent holders, despite the above IPR statement.
References
1. ^ Failed OASIS
2. ^ Time for OASIS XRI TC and W3C TAG to Sit Down Together
3. ^ TAG recommends against XRI
4. ^ URNs, Namespaces and Registries
5. ^ Xri Solves Real Problems
6. ^ XRI and XDI Explained
7. ^ XRI in a Nutshell
See also
* I-names
* I-numbers
* XRDS
* XDI
* Dataweb
* Social Web
* Higgins project
* Project Xanadu
External links
* OASIS XRI Technical Committee specifications:
o XRI Syntax 2.0 Committee Specification
o XRI Resolution 2.0 Committee Specification
o XRI 2.0 FAQ
o XRI Requirements and Glossary 1.0
* W3C Internationalized Resource Identifier (IRI)
* XDI.org - public trust organization governing XRI global registry services
o XDI.org Global Services Specifications - website of XDI.org specifications for global registry services for public i-names and i-numbers
o XDI.org I-Services Specifications - website of XDI.org specifications for XRDS-enabled identity services.
* dev.xri.net - open public wiki on XRI and XRI open source projects
* Internet Identity Workshop One-Pager on XRI and XRDS
* FSF's Dispute with OASIS patent policies and on FSF's Support for OASIS RF on Limited Terms IPR Policy, which is used for ODF.
* EqualsDrummond - blog about XRI and Internet identifiers by Drummond Reed, co-chair of the OASIS XRI Technical Committee and Chief Architect at Cordance, currently under contract with XDI.org to operate XRI registr
Internationalized Resource Identifier
From Wikipedia, the free encyclopedia
On the Internet, the Internationalized Resource Identifier (IRI) is a generalization of the Uniform Resource Identifier (URI), which is in turn a generalization of the Uniform Resource Locator (URL). While URIs are limited to a subset of the ASCII character set, IRIs may contain characters from the Universal Character Set (Unicode/ISO 10646), including Chinese or Japanese kanji, Korean, Cyrillic characters, and so forth.
It is defined by RFC 3987.
Advantages
Displaying URIs in different languages and scripts mainly benefits users who are unfamiliar with the Roman alphabet. Provided that entering arbitrary Unicode characters on a keyboard is not too difficult, this can make the URI system more international and accessible.
Disadvantages
Mixing IRIs and ASCII URIs can make it much easier to carry out phishing attacks that trick someone into believing they are on a site they are not really on. For example, one can replace the "a" in www.ebay.com or www.paypal.com with an internationalized look-alike "a" character, and point that IRI to a malicious site.
Additionally, it can be difficult for those with different language keyboards to access web resources in other languages; in contrast, open-source programming projects (and most programs) are almost exclusively written using the Roman alphabet to avoid this type of encoding incompatibility.
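The look-alike problem described above can be made concrete with a short sketch. It assumes the attacker substitutes the Cyrillic letter U+0430 for the Latin "a"; the helper name non_ascii_characters is hypothetical, and the example uses only Python's standard unicodedata module and built-in "idna" codec.

```python
import unicodedata


def non_ascii_characters(host):
    """List each non-ASCII character in a host name with its Unicode name."""
    return [(ch, unicodedata.name(ch)) for ch in host if ord(ch) > 127]


genuine = "paypal.com"
lookalike = "p\u0430ypal.com"             # second letter is U+0430, not "a"

if __name__ == "__main__":
    print(genuine == lookalike)            # the strings differ despite looking alike
    print(non_ascii_characters(lookalike))
    print(lookalike.encode("idna"))        # ASCII form used on the wire
```

Listing the non-ASCII characters, or showing the ASCII-compatible form of the host name, is one way software can expose the substitution to a user.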
See also
* XRI (Extensible Resource Identifier)
* IDN (Internationalized Domain Name)
* Punycode
External links
* IRI
* Internationalized Resource Identifiers
Uniform Resource Identifier
In computing, a Uniform Resource Identifier (URI) is a string of characters used to identify or name a resource on the Internet. Such identification enables interaction with representations of the resource over a network, typically the World Wide Web, using specific protocols. URIs are defined in schemes specifying a specific syntax and associated protocols.
Relationship to URL and URN
Set diagram of URI scheme categories. Schemes in the URL (locator) and URN (name) categories form subsets of URI, and, generally, are also disjoint sets.
Technically, URLs and URNs function as resource IDs; however, many schemes cannot be categorized as strictly one or the other, because all URIs can be treated as names, and some schemes embody aspects of both categories, or of neither.
Computer scientists may classify a URI as a locator (URL), or a name (URN), or both.
A Uniform Resource Name (URN) is like a person's name, while a Uniform Resource Locator (URL) resembles that person's street address. The URN defines an item's identity, while the URL provides a method for finding it.
The ISBN system for uniquely identifying books provides a typical example of the use of URNs. ISBN 0486275574 (urn:isbn:0-486-27557-4) unambiguously cites a specific edition of Shakespeare's play Romeo and Juliet. In order to gain access to this object and read the book, one would need its location: a URL. A typical URL for this book on a Unix-like operating system is a file path, like file:///home/username/RomeoAndJuliet.pdf, identifying the electronic book saved on a local hard disk. So URNs and URLs have complementary purposes.
Technical view
A URL is a URI that, in addition to identifying a resource, provides means of acting upon or obtaining a representation of the resource by describing its primary access mechanism or network "location". For example, the URL http://www.wikipedia.org/ identifies a resource (Wikipedia's home page) and implies that a representation of that resource (such as the home page's current HTML code, as encoded characters) is obtainable via HTTP from a network host named www.wikipedia.org. A Uniform Resource Name (URN) is a URI that identifies a resource by name in a particular namespace. A URN can be used to talk about a resource without implying its location or how to access it. For example, the URN urn:isbn:0-395-36341-1 is a URI that specifies the identifier system, i.e. International Standard Book Number (ISBN), as well as the unique reference within that system and allows one to talk about a book, but doesn't suggest where and how to obtain an actual copy of it.
Technical publications, especially standards produced by the IETF and the W3C, have long deprecated the term URL, as it is rarely necessary to distinguish between URLs and URIs. However, in nontechnical contexts and in software for the World Wide Web, the term URL remains widely used. Additionally, the term web address, which has no formal definition, is often used in nontechnical publications as a synonym for URL or URI, although it generally refers only to "http" and "https" URL schemes.
RFC 3305
Much of this discussion comes from RFC 3305, titled "Report from the Joint W3C/IETF URI Planning Interest Group: Uniform Resource Identifiers (URIs), URLs, and Uniform Resource Names (URNs): Clarifications and Recommendations". This RFC outlines the work of a joint W3C/IETF working group that was set up specifically to normalize the divergent views held within the IETF and W3C over what the relationship was between the various "UR*" terms and standards. While not published as a full standard by either organization, it has become the basis for the above common understanding and has informed many standards since then.
Syntax
The URI syntax is essentially a URI scheme name like "HTTP", "FTP", "mailto", "URN", "tel", "rtsp", "file", etc., followed by a colon character, and then a scheme-specific part. The specifications that govern the schemes determine the syntax and semantics of the scheme-specific part, although the URI syntax does force all schemes to adhere to a certain generic syntax that, among other things, reserves certain characters for special purposes, without always saying what those purposes are. The URI syntax also enforces restrictions on the scheme-specific part, in order to, for example, provide for a degree of consistency when the part has a hierarchical structure. Percent-encoding is an often-misunderstood aspect of URI syntax.
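The split into a scheme name and a scheme-specific part, and the role of percent-encoding, can be sketched as follows. The helper name split_scheme is an assumption for this example, and it deliberately ignores the finer rules of the generic syntax; quote and unquote are the standard functions from Python's urllib.parse.

```python
from urllib.parse import quote, unquote


def split_scheme(uri):
    """Split a URI at the first colon into (scheme, scheme-specific part)."""
    scheme, colon, rest = uri.partition(":")
    if not colon:
        raise ValueError("no scheme found: " + uri)
    return scheme.lower(), rest           # scheme names are case-insensitive


if __name__ == "__main__":
    print(split_scheme("mailto:user@example.org"))
    print(split_scheme("urn:isbn:0-395-36341-1"))
    # Percent-encoding protects characters that are reserved in the syntax.
    encoded = quote("a b/c?d", safe="")
    print(encoded, unquote(encoded))
```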
History
Naming, addressing, and identifying resources
URIs and URLs have a shared history. Early in 1990, Tim Berners-Lee’s proposals for HyperText [2] implicitly introduced the idea of a URL as a short string representing a resource that is the target of a hyperlink. At the time, it was called a hypertext name or document name.[3]
Over the next three-and-a-half years, as the World Wide Web's core technologies of HTML (the HyperText Markup Language), HTTP, and Web browsers developed, a need to distinguish a string that provided an address for a resource from a string that merely named a resource emerged. Although not yet formally defined, the term Uniform Resource Locator came to represent the former, and the more contentious Uniform Resource Name came to represent the latter.
During the debate over how to best define URLs and URNs, it became evident that the two concepts embodied by the terms were merely aspects of the fundamental, overarching notion of resource identification. So, in June 1994, the IETF published Berners-Lee's RFC 1630: the first RFC that (in its non-normative text) acknowledged the existence of URLs and URNs, and, more importantly, defined a formal syntax for Universal Resource Identifiers — URL-like strings whose precise syntaxes and semantics depended on their schemes. In addition, this RFC attempted to summarize the syntaxes of URL schemes that were in use at the time. It also acknowledged, but did not standardize, the existence of relative URLs and fragment identifiers.
Refinement of specifications
In December 1994, RFC 1738 formally defined relative and absolute URLs, refined the general URL syntax, defined how relative URLs were to be resolved to absolute form, and better enumerated the URL schemes that were in use at the time. The definition and syntax of URNs was not settled upon until the publication of RFC 2141 in May 1997.
With the publication of RFC 2396 in August 1998, the URI syntax became a separate specification[4], and most parts of RFCs 1630 and 1738 relating to URIs and URLs in general were revised and expanded. The new RFC changed the significance of the "U" in "URI": it came to represent "Uniform" rather than "Universal". The sections of RFC 1738 that summarized existing URL schemes were moved into a separate document[1]. IANA keeps a registry of those schemes[2]; the procedure to register them was first described in RFC 2717.
In December 1999, RFC 2732 provided a minor update to RFC 2396, allowing URIs to accommodate IPv6 addresses. Some time later, a number of shortcomings discovered in the two specifications led to the development of a number of draft revisions under the title rfc2396bis. This community effort, coordinated by RFC 2396 co-author Roy Fielding, culminated in the publication of RFC 3986 in January 2005. This RFC, as of 2009 the current version of the URI syntax recommended for use on the Internet, renders RFC 2396 obsolete. It does not, however, render the details of existing URL schemes obsolete; those are still governed by RFC 1738, except where otherwise superseded (RFC 2616, for example, refines the "http" scheme). The content of RFC 3986 was simultaneously published by the IETF as the full standard STD 66, reflecting the establishment of the URI generic syntax as an official Internet protocol.
In August 2002, RFC 3305 pointed out that the term URL has, despite its ubiquity in the vernacular of the Internet-aware public at large, faded into near-obsolescence. It now serves only as a reminder that some URIs act as addresses because they have schemes that imply some kind of network accessibility, regardless of whether systems actually use them for that purpose. As URI-based standards such as Resource Description Framework make evident, resource identification need not be coupled with the retrieval of resource representations over the Internet, nor does it need to be associated with network-bound resources at all.
On November 1, 2006, the W3C Technical Architecture Group published "On Linking Alternative Representations To Enable Discovery And Publishing", a guide to best practices and canonical URIs for publishing multiple versions of a given resource. For example, content might differ by language or by size to adjust for capacity or settings of the device used to access that content.
For the Semantic Web, the HTTP URI scheme can be used to identify both documents and concepts in the real world, which has caused confusion about how exactly to distinguish the two. The Technical Architecture Group (TAG) published an e-mail in June 2005 on how to solve this problem, known as the httpRange-14 resolution[3]. To explain this (rather brief) e-mail, the W3C published in March 2008 the Interest Group Note Cool URIs for the Semantic Web[4], which explains the use of content negotiation and the 303 redirect code in more detail.
URI reference
A URI reference is another type of string that represents a URI, and, in turn, the resource identified by that URI. Informal usage does not often maintain the distinction between a URI and a URI reference, but protocol documents should not allow for ambiguity.
A URI reference may take the form of a full URI, or just the scheme-specific portion of one, or even some trailing component thereof—even the empty string. An optional fragment identifier, preceded by "#", may be present at the end of a URI reference. The part of the reference before the "#" indirectly identifies a resource, and the fragment identifier identifies some portion of that resource.
In order to derive a URI from a URI reference, software converts the URI reference to "absolute" form by merging it with an absolute "base" URI, according to a fixed algorithm. The URI reference is considered to be relative to the base URI, although if the reference itself is absolute, then the base is irrelevant. The base URI is typically the URI that identifies the document containing the URI reference, although this can be overridden by declarations made within the document or as part of an external data transmission protocol. If a fragment identifier is present in the base URI, it is ignored during the merging process. If a fragment identifier is present in the URI reference, it is preserved during the merging process.
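The merging behaviour described above, including the treatment of fragment identifiers, can be tried out with urljoin from Python's standard library. The base URI and the wrapper name resolve are assumptions for this sketch, and urljoin is one implementation of reference resolution rather than the normative algorithm itself.

```python
from urllib.parse import urljoin

# Assumed base URI; note that it carries its own fragment.
BASE = "http://example.org/docs/guide.html#intro"


def resolve(reference, base=BASE):
    """Resolve a URI reference against an absolute base URI."""
    return urljoin(base, reference)


if __name__ == "__main__":
    print(resolve("chapter2.html"))                    # base fragment is dropped
    print(resolve("chapter2.html#s1"))                 # reference fragment is kept
    print(resolve("ftp://example.org/resource.txt"))   # absolute: base is irrelevant
```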
Web document markup languages frequently use URI references in places where there is a need to point to other resources, such as external documents or specific portions of the same logical document.
Uses of URI references in markup languages
* In HTML, the value of the src attribute of the img element is a URI reference, as is the value of the href attribute of the a or link element.
* In XML, the system identifier appearing after the SYSTEM keyword in a DTD is a fragmentless URI reference.
* In XSLT, the value of the href attribute of the xsl:import element/instruction is a URI reference, as is the first argument to the document() function.
Examples of absolute URIs
* http://example.org/absolute/URI/with/absolute/path/to/resource.txt
* ftp://example.org/resource.txt
* urn:issn:1535-3613
Examples of URI references
* http://en.wikipedia.org/wiki/URI#Examples_of_URI_references ("http" is the 'scheme' name, "en.wikipedia.org" is the 'authority', "/wiki/URI" the 'path' pointing to this article, and "#Examples_of_URI_references" is a 'fragment' pointing to this section.)
* http://example.org/absolute/URI/with/absolute/path/to/resource.txt
* /relative/URI/with/absolute/path/to/resource.txt
* relative/path/to/resource.txt
* ../../../resource.txt
* ./resource.txt#frag01
* resource.txt
* #frag01
* (empty string)
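To see what the references listed above denote, the sketch below splits the first example into its components and resolves the relative ones against a base. The base URI http://example.org/a/b/c/d.html and the function name resolve_all are assumptions chosen for illustration; urlsplit and urljoin come from Python's urllib.parse.

```python
from urllib.parse import urljoin, urlsplit

BASE = "http://example.org/a/b/c/d.html"   # assumed base document

REFERENCES = [
    "/relative/URI/with/absolute/path/to/resource.txt",
    "relative/path/to/resource.txt",
    "../../../resource.txt",
    "./resource.txt#frag01",
    "resource.txt",
    "#frag01",
]


def resolve_all(references, base=BASE):
    """Map each URI reference to its absolute form relative to `base`."""
    return {ref: urljoin(base, ref) for ref in references}


if __name__ == "__main__":
    parts = urlsplit("http://en.wikipedia.org/wiki/URI#Examples_of_URI_references")
    print(parts.scheme, parts.netloc, parts.path, parts.fragment)
    for ref, absolute in resolve_all(REFERENCES).items():
        print(f"{ref!r:55} -> {absolute}")
```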
URI resolution
To "resolve" a URI means either to convert a relative URI reference to absolute form, or to dereference a URI or URI reference by attempting to obtain a representation of the resource that it identifies. The "resolver" component in document processing software generally provides both services.
One can regard a URI reference as a same-document reference: a reference to the document containing the URI reference itself. Document processing software is encouraged to use its current representation of the document to satisfy the resolution of a same-document reference; a new representation should not be fetched. This is only a recommendation, and document processing software is free to use other mechanisms to determine whether obtaining a new representation is warranted.
According to the current URI specification as of 2009, RFC 3986, a URI reference is a same-document reference if, when resolved to absolute form, it is identical to the base URI that is in effect for the reference. Typically, the base URI is the URI of the document containing the reference. XSLT 1.0, for example, has a document() function that, in effect, implements this functionality. RFC 3986 also formally defines URI equivalence, which can be used in order to determine that a URI reference, while not identical to the base URI, still represents the same resource and thus can be considered to be a same-document reference.
Same-document references were determined differently according to RFC 2396, which was made obsolete by RFC 3986 but still serves as the basis of many specifications and implementations. According to this specification, a URI reference is a same-document reference if it is an empty string or consists of only the "#" character followed by an optional fragment.
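The two definitions can be compared side by side in a short sketch. Both function names are hypothetical, and the RFC 3986 check below makes the assumption that fragments are set aside before comparing the resolved reference with the base; it uses plain string comparison and does not implement URI equivalence.

```python
from urllib.parse import urldefrag, urljoin


def is_same_document_rfc2396(reference):
    """RFC 2396 rule: empty, or "#" followed by an optional fragment."""
    return reference == "" or reference.startswith("#")


def is_same_document_rfc3986(reference, base):
    """RFC 3986 rule: the resolved reference matches the base URI.

    Assumption: fragments are ignored on both sides before comparing.
    """
    resolved = urljoin(base, reference)
    return urldefrag(resolved).url == urldefrag(base).url


if __name__ == "__main__":
    base = "http://example.org/doc.html"
    for ref in ("", "#sec", "doc.html", "other.html"):
        print(repr(ref), is_same_document_rfc2396(ref),
              is_same_document_rfc3986(ref, base))
```

The reference "doc.html" is where the two rules disagree: it is not a same-document reference under the older lexical rule, but it resolves to the base URI under the newer one.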
Relation to XML namespaces
XML has a concept of a namespace, an abstract domain to which a collection of element and attribute names can be assigned. An XML namespace is identified by a character string, the namespace name, which must adhere to the generic URI syntax. However, the namespace name is not considered to be a URI because the "URI-ness" of strings is, according to the URI specification, based on how they are intended to be used, not just their lexical components. A namespace name also does not necessarily imply any of the semantics of URI schemes; a namespace name beginning with "http:", for example, likely has nothing to do with the HTTP protocol. XML professionals have debated this intensively on the xml-dev electronic mailing list; some feel that a namespace name could be a URI, since the collection of names comprising a particular namespace could be considered to be a resource that is being identified, and since the Namespaces in XML specification says that the namespace name is a URI reference. But the consensus seems to suggest that a namespace name is just a string that happens to look like a URI, nothing more.
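The view that a namespace name is "just a string" is visible in how a typical XML library handles it. The sketch below uses Python's xml.etree.ElementTree; the document and the namespace names in it are made up for illustration. The parser never dereferences the names, and two names that differ only in letter case denote different namespaces.

```python
import xml.etree.ElementTree as ET

# Made-up document: the two namespace names differ only in letter case.
DOC = (
    '<a:root xmlns:a="http://example.org/ns" xmlns:b="http://EXAMPLE.org/ns">'
    "<b:child/>"
    "</a:root>"
)


def qualified_tags(xml_text):
    """Return the expanded {namespace}local names of the root and its children."""
    root = ET.fromstring(xml_text)
    return [root.tag] + [child.tag for child in root]


if __name__ == "__main__":
    for tag in qualified_tags(DOC):
        print(tag)
```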
Initially, the namespace name was allowed to match the syntax of any non-empty URI reference, but an erratum to the "Namespaces In XML Recommendation" later deprecated the use of relative URI references. A separate specification was issued for namespaces for XML 1.1, and allows IRI references, not just URI references, to be used as the basis for namespace names.
In order to mitigate the confusion that began to arise among newcomers to XML from the use of URIs (particularly HTTP URLs) for namespaces, a descriptive language called RDDL was developed, though the specification of RDDL (http://www.rddl.org/) has no official standing and has not been considered nor approved by any organization (e.g., W3C). An RDDL document can provide machine- and human-readable information about a particular namespace and about the XML documents that use it. XML document authors were encouraged to put RDDL documents in locations such that if a namespace name in their document was somehow dereferenced, then an RDDL document would be obtained, thus satisfying the desire among many developers for a namespace name to point to a network-accessible resource.
See also
* .arpa - uri.arpa is for dynamic discovery
* Dereferenceable URI (an HTTP URI)
* History of the Internet
* IRI (Internationalized Resource Identifier)
* Namespace (programming)
* percent-encoding
* Persistent Uniform Resource Locator (PURL)
* Uniform Naming Convention (UNC), in computing
* URI scheme
* Uniform Resource Locator (URL)
* Uniform Resource Name (URN)
* Website
* XRI (Extensible Resource Identifier)
References
1. ^ This separate document is not explicitly linked, RFC 2717 and RFC 4395 point to the IANA registry as the official URI scheme registry.
2. ^ IANA registry of URI schemes[1]
3. ^ The httpRange-14 resolution consists of three bullet points and did not help much to reduce the confusion. http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039.html
4. ^ http://www.w3.org/TR/cooluris/
External links
* RFC 3986 / STD 66 (2005) – the current generic URI syntax specification
* RFC 2396 (1998) and RFC 2732 (1999) – obsolete, but widely implemented, version of the generic URI syntax
* RFC 1808 (1995) – obsolete companion to RFC 1738 covering relative URL processing
* RFC 1738 (1994) – mostly obsolete definition of URL schemes and generic URI syntax
* RFC 1630 (1994) – the first generic URI syntax specification; first acknowledgment of URLs in an Internet standard
* URI Schemes – IANA-maintained registry of URI Schemes
* URI Working Group – coordination center for development of URI standards
* Architecture of the World Wide Web, Volume One, §2: Identification – by W3C
* Example of discussion about names and addresses
* W3C materials related to Addressing
* W3C URI Clarification
* What's a URI and why does it matter? (2008) - from W3C
* The Self-Describing Web (2008) - from W3C
URL normalization
URL normalization (or URL canonicalization) is the process by which URLs are modified and standardized in a consistent manner. The goal of the normalization process is to transform a URL into a normalized or canonical URL so it is possible to determine if two syntactically different URLs are equivalent.
Search engines employ URL normalization in order to assign importance to web pages and to reduce indexing of duplicate pages. Web crawlers perform URL normalization in order to avoid crawling the same resource more than once. Web browsers may perform normalization to determine if a link has been visited or to determine if a page has been cached.
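A minimal sketch of such a normalizer, assuming only a handful of common steps: lowercasing the scheme and host, dropping a default port, removing dot-segments, capitalizing percent-escapes, adding a trailing slash to an empty path, and removing the fragment. The function names and the choice of steps are assumptions for illustration; real crawlers and browsers differ in which steps they apply, and this version ignores userinfo and IPv6 hosts.

```python
import re
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def remove_dot_segments(path):
    """Simplified removal of "." and ".." segments from an absolute path."""
    segments = path.split("/")
    output = []
    for segment in segments:
        if segment == ".":
            continue
        if segment == "..":
            if len(output) > 1:           # never pop the leading empty segment
                output.pop()
            continue
        output.append(segment)
    if segments[-1] in (".", ".."):       # keep the trailing slash of a directory
        output.append("")
    return "/".join(output)


def normalize_url(url):
    """Return a normalized form of an http(s) URL (userinfo is dropped)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = remove_dot_segments(parts.path) or "/"
    # Capitalize the hex digits of every percent-encoding triplet.
    path = re.sub(r"%[0-9a-fA-F]{2}", lambda m: m.group(0).upper(), path)
    return urlunsplit((scheme, netloc, path, parts.query, ""))


if __name__ == "__main__":
    print(normalize_url("HTTP://www.Example.com:80/../a/b/../c/./d.html#top"))
```

Two URLs can then be treated as equivalent when their normalized forms are equal strings.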
Normalization process
There are several types of normalization that may be performed:
* Converting the scheme and host to lower case. The scheme and host components of the URL are case-insensitive. Most normalizers will convert them to lowercase. Example:
HTTP://www.Example.com/ → http://www.example.com/
* Adding a trailing slash. Directories are indicated with a trailing slash, which should be included in URLs. Example:
http://www.example.com → http://www.example.com/
* Removing directory index. Default directory indexes are generally not needed in URLs. Examples:
http://www.example.com/default.asp → http://www.example.com/
http://www.example.com/a/index.html → http://www.example.com/a/
* Converting the entire URL to lower case. Some web servers that run on top of case-insensitive file systems allow URLs to be case-insensitive. URLs from a case-insensitive web server may be converted to lowercase to avoid ambiguity. Example:
http://www.example.com/BAR.html → http://www.example.com/bar.html
* Capitalizing letters in escape sequences. All letters within a percent-encoding triplet (e.g., "%3A") are case-insensitive, and should be capitalized. Example:
http://www.example.com/a%c2%b1b → http://www.example.com/a%C2%B1b
* Removing the fragment. The fragment component of a URL is usually removed. Example:
http://www.example.com/bar.html#section1 → http://www.example.com/bar.html
* Removing the default port. The default port (port 80 for the “http” scheme) may be removed from (or added to) a URL. Example:
http://www.example.com:80/bar.html → http://www.example.com/bar.html
* Removing dot-segments. The segments “..” and “.” are usually removed from a URL according to the algorithm described in RFC 3986 (or a similar algorithm). Example:
http://www.example.com/../a/b/../c/./d.html → http://www.example.com/a/c/d.html
* Removing “www” as the first domain label. Some websites operate in two Internet domains: one whose least significant label is “www” and another whose name is the result of omitting the least significant label from the name of the first. For example, http://example.com/ and http://www.example.com/ may access the same website. Although many websites redirect the user to the non-www address (or vice versa), some do not. A normalizer may perform extra processing to determine if there is a non-www equivalent and then normalize all URLs to the non-www prefix. Example:
http://www.example.com/ → http://example.com/
* Sorting the variables of active pages. Some active web pages have more than one variable in the URL. A normalizer can remove all the variables with their data, sort them into alphabetical order (by variable name), and reassemble the URL. Example:
http://www.example.com/display?lang=en&article=fred → http://www.example.com/display?article=fred&lang=en
* Removing arbitrary querystring variables. An active page may expect certain variables to appear in the querystring; all unexpected variables should be removed. Example:
http://www.example.com/display?id=123&fakefoo=fakebar → http://www.example.com/display?id=123
* Removing default querystring variables. A default value in the querystring will render identically whether it is there or not. When a default value appears in the querystring, it should be removed. Example:
http://www.example.com/display?id=&sort=ascending → http://www.example.com/display
* Removing the "?" when the querystring is empty. When the querystring is empty, there is no need for the "?". Example:
http://www.example.com/display? → http://www.example.com/display
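The purely syntactic steps in the list above can be sketched in a few lines of Python. This is a minimal illustration, not a complete normalizer: the function names and the DEFAULT_PORTS table are assumptions made for the example, URLs are assumed to carry no userinfo or IPv6 literal, and site-dependent steps (directory indexes, "www" removal, dropping unexpected variables) are left out on purpose because they need knowledge of the server.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumed default ports for the example; only "http" is named in the text.
DEFAULT_PORTS = {"http": 80, "https": 443}


def remove_dot_segments(path):
    """Drop "." and ".." segments, in the spirit of RFC 3986 section 5.2.4."""
    segments = path.split("/")
    out = []
    for seg in segments:
        if seg == ".":
            continue
        if seg == "..":
            if len(out) > 1:  # never pop the leading empty (root) segment
                out.pop()
            continue
        out.append(seg)
    if segments[-1] in (".", ".."):
        out.append("")  # keep the trailing slash implied by a final dot-segment
    return "/".join(out)


def upper_escapes(text):
    """Capitalize the hex digits of percent-encoded triplets."""
    return re.sub(r"%[0-9a-fA-F]{2}", lambda m: m.group(0).upper(), text)


def normalize(url):
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # userinfo is not preserved here
    netloc = host
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc = f"{host}:{parts.port}"
    path = upper_escapes(remove_dot_segments(parts.path)) or "/"
    # Sort "name=value" items by variable name without re-encoding them.
    items = [item for item in parts.query.split("&") if item]
    items.sort(key=lambda item: item.split("=", 1)[0])
    query = upper_escapes("&".join(items))
    # An empty query drops the "?", and the fragment is always removed.
    return urlunsplit((scheme, netloc, path, query, ""))
```

With these definitions, normalize("http://www.example.com:80/bar.html#section1") yields http://www.example.com/bar.html, matching the default-port and fragment examples in the list.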
Normalization based on URL lists
Some normalization rules may be developed for specific websites by examining URL lists obtained from previous crawls or web server logs. For example, if the URL
http://foo.org/story?id=xyz
appears in a crawl log several times along with
http://foo.org/story_xyz
we may assume that the two URLs are equivalent and can be normalized to one of the URL forms.
Schonfeld et al. (2006) present a heuristic called DustBuster for detecting DUST (different URLs with similar text) rules that can be applied to URL lists. They showed that once the correct DUST rules were found and applied with a canonicalization algorithm, they were able to find up to 68% of the redundant URLs in a URL list.
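Once a site-specific rule has been found, applying it is a simple rewrite. The sketch below is not the DustBuster heuristic itself (which discovers such rules); it only shows how one already-known rule could be applied to a URL list. The rule for foo.org is hypothetical and mirrors the example above, and the names RULES, apply_rules and dedupe are made up for illustration.

```python
import re

# Hypothetical rule: "story?id=<x>" and "story_<x>" on foo.org are assumed
# to name the same page, so the first form is rewritten to the second.
RULES = [
    (re.compile(r"^(http://foo\.org/story)\?id=([^&#]+)$"), r"\g<1>_\g<2>"),
]


def apply_rules(url, rules=RULES):
    """Rewrite a URL with every rule in turn and return the result."""
    for pattern, replacement in rules:
        url = pattern.sub(replacement, url)
    return url


def dedupe(urls):
    """Return the canonical forms of urls, keeping first occurrences only."""
    seen = set()
    unique = []
    for url in urls:
        canon = apply_rules(url)
        if canon not in seen:
            seen.add(canon)
            unique.append(canon)
    return unique
```

A crawler could run dedupe over its frontier before fetching, so that both spellings of the story URL are requested only once.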
References
* RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax
* Sang Ho Lee, Sung Jin Kim, and Seok Hoo Hong (2005). "On URL normalization". Proceedings of the International Conference on Computational Science and its Applications (ICCSA 2005): 1076-1085.
* Gautam Pant, Padmini Srinivasan, and Filippo Menczer (2004). "Crawling the Web". Web Dynamics: Adapting to Change in Content, Size, Topology and Use, edited by M. Levene and A. Poulovassilis: 153-178.
* Uri Schonfeld, Ziv Bar-Yossef, and Idit Keidar (2006). "Do not crawl in the DUST: different URLs with similar text". Proceedings of the 15th international conference on World Wide Web: 1015-1016.
* Uri Schonfeld, Ziv Bar-Yossef, and Idit Keidar (2007). "Do not crawl in the DUST: different URLs with similar text". Proceedings of the 16th international conference on World Wide Web: 111-120.
See also
* Web crawler
* Uniform Resource Locator
URI scheme
In the field of computer networking, a URI scheme is the top level of the Uniform Resource Identifier (URI) naming structure. All URIs and absolute URI references are formed with a scheme name, followed by a colon character (":"), and the remainder of the URI called (in the outdated RFCs 1738 and 2396, but not the current STD 66/RFC 3986) the scheme-specific part. The syntax and semantics of the scheme-specific part are left largely to the specifications governing individual schemes, subject to certain constraints such as reserved characters and how to "escape" them.
URI schemes are sometimes erroneously referred to as "protocols", or specifically as URI protocols or URL protocols, since most were originally designed to be used with a particular protocol, and often have the same name. The http scheme, for instance, is generally used for interacting with Web resources using HyperText Transfer Protocol. Today, URIs with that scheme are also used for other purposes, such as RDF resource identifiers and XML namespaces, that are not related to the protocol. Furthermore, some URI schemes are not associated with any specific protocol (e.g. "file") and many others do not use the name of a protocol as their prefix (e.g. "news").
URI schemes should be registered with IANA, although non-registered schemes are used in practice. RFC 4395 describes the procedures for registering new URI schemes.
Generic syntax
Internet standard STD 66 (also RFC 3986) defines the generic syntax to be used in all URI schemes. Every URI is defined as consisting of four parts, as follows:
<scheme name> : <hierarchical part> [ ? <query> ] [ # <fragment> ]
The scheme name consists of a letter followed by any combination of letters, digits, and the plus ("+"), period ("."), or hyphen ("-") characters; and is terminated by a colon (":").
The hierarchical part of the URI is intended to hold identification information hierarchical in nature. Usually this part begins with a double forward slash ("//"), followed by an authority part and an optional path.
* The authority part holds an optional user information part terminated with "@" (e.g. username:password@), a hostname (i.e. domain name or IP address), and an optional port number preceded by a colon ":".
* The path part is a sequence of segments (conceptually similar to directories, though not necessarily representing them) separated by a forward slash ("/"). Each segment can contain parameters separated from it using a semicolon (";"), though this is rarely used in practice.
The query is an optional part separated with a question mark, which contains additional identification information which is not hierarchical in nature. The query string syntax is not generically defined, but is commonly organized as a sequence of key=value pairs separated by an ampersand ("&") or a semicolon (";").[1][2][3]
The fragment is an optional part separated from the preceding parts by a hash ("#"). It holds additional identifying information that provides direction to a secondary resource, e.g. a section heading in an article identified by the remainder of the URI. When the primary resource is an HTML document, the fragment is often an id attribute of a specific element, and web browsers will make sure this element is visible.
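Two of the rules above are easy to check mechanically. The sketch below tests a scheme name against the character rule given for it, and splits a query string into key/value pairs, accepting both "&" and ";" as separators as the CGI notes cited in this article encourage. The function names are illustrative, and the query convention is only the common one, since the generic syntax does not define it.

```python
import re

# A letter, then any mix of letters, digits, "+", "." and "-".
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme(name):
    """True if name (without the trailing colon) is a legal scheme name."""
    return SCHEME_RE.fullmatch(name) is not None


def split_query(query):
    """Split a query string into (key, value) pairs on "&" or ";"."""
    pairs = []
    for item in re.split(r"[&;]", query):
        if not item:
            continue  # skip empty items such as a trailing separator
        key, _, value = item.partition("=")
        pairs.append((key, value))
    return pairs
```

Percent-decoding of the keys and values is deliberately left out; a caller would apply it after splitting.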
Examples
The following are two example URIs and their component parts (taken loosely from RFC 3986, STD 66):
foo://username:password@example.com:8042/over/there/?name=ferret#nose
\ /   \________________/\_________/ \__/\_________/  \_________/ \__/
 |            |              |       |       |            |       |
 |        userinfo       hostname   port   path         query fragment
 |    \_______________________________/
scheme            authority
 |
 |            path
 |   ___________|____________
/ \ /                        \
urn:example:animal:ferret:nose
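The same decomposition can be reproduced with Python's standard urllib.parse module. This is a small sketch; the describe helper and its dictionary keys are names chosen for the example, not part of any standard.

```python
from urllib.parse import urlsplit


def describe(uri):
    """Return the generic-syntax components of a URI as a dictionary."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        # Everything before the last "@" in the authority, if present.
        "userinfo": parts.netloc.rpartition("@")[0],
        "hostname": parts.hostname,  # None when there is no authority
        "port": parts.port,          # None when no port is given
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


foo = describe("foo://username:password@example.com:8042/over/there/?name=ferret#nose")
urn = describe("urn:example:animal:ferret:nose")
```

For the urn example there is no "//", so the authority is empty and everything after the scheme lands in the path.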
1. RFC 1866 section 8.2.1, by Tim Berners-Lee (1995), encourages CGI authors to support ';' in addition to '&'.
2. HTML 4.01 Specification, Implementation and Design Notes: "CGI implementors support the use of ";" in place of "&" to save authors the trouble of escaping "&" characters in this manner."
3. Hypertext Markup Language - 2.0: "CGI implementors are encouraged to support the use of ';' in place of '&' "
Official IANA-registered schemes
The official URI schemes registered with the IANA follow.
Scheme Purpose Defined by General format Notes
aaa Diameter Protocol RFC 3588 aaa://
example:
aaa://host.example.com:1813;transport=udp;protocol=radius
aaas Secure equivalent of aaa RFC 3588 aaas://
acap Application Configuration Access Protocol RFC 2244 acap://[
cap Calendar access protocol RFC 4324 generic syntax URL scheme used to designate both calendar stores and calendars accessible using the CAP protocol
cid Referencing individual parts of an SMTP/MIME message RFC 2392 cid:
crid TV-Anytime Content Reference Identifier RFC 4078 crid://
data Inclusion of small data items inline RFC 2397 data:
dav HTTP Extensions for Distributed Authoring (WebDAV) RFC 2518 dav: Used for internal identifiers only; WebDAV itself addresses resources using the http: and https: schemes. [1]
dict Dictionary service protocol RFC 2229 dict://
dict://
refer to definitions or word lists available using the DICT protocol
dns Domain Name System RFC 4501 dns:[//
examples:
dns:example?TYPE=A;CLASS=IN
dns://192.168.1.1/ftp.example.org?type=A
designates a DNS resource record set, referenced by domain name, class, type, and, optionally, the authority
fax Used for telefacsimile numbers RFC 2806 fax:
file Addressing files on local or network file systems RFC 1738 generic syntax
(often appears as file:///path, the 3rd '/' is the final delimiter when no host (authority) is specified between) Unusual in not being bound to any network protocol, and not usable in an Internet context.
ftp FTP resources RFC 1738 generic syntax
go Common Name Resolution Protocol RFC 3368 go://[
go:
gopher Used with Gopher protocol RFC 4266 gopher://
h323 Used with H.323 multimedia communications RFC 3508 h323:[
http HTTP resources RFC 2616 generic syntax
https HTTP connections secured using SSL/TLS RFC 2817 generic syntax
icap Internet Content Adaptation Protocol RFC 3507
im Instant messaging protocol RFC 3860 RFC 4622 im:
imap Accessing e-mail resources through IMAP RFC 2192 imap://[
info Information Assets with Identifiers in Public Namespaces RFC 4452
ipp Internet Printing Protocol RFC 3510
iris
iris.beep
iris.xpc
iris.xpcs
iris.lws Internet Registry Information Service RFC 3981 RFC 3983 RFC 4992 RFC 4992 RFC 4993
ldap LDAP directory request RFC 2255
RFC 4516 ldap://[
example:
ldap://ldap1.example.net:6666/o=University%20of%20Michigan, c=US??sub?(cn=Babs%20Jensen)
mailto SMTP e-mail addresses and default content RFC 2368 mailto:[?
example:
mailto:jsmith@example.com?subject=A%20Test&body=My%20idea%20is%3A%20%0A
Headers are optional, but often include subject=; body= can be used to pre-fill the body of the message.
mid Referencing SMTP/MIME messages, or parts of messages. RFC 2392 mid:
modem modem RFC 3966
msrp
msrps Message Session Relay Protocol RFC 4975
mtqp Message Tracking Query Protocol RFC 3887
mupdate Mailbox Update Protocol RFC 3656
news (Usenet) newsgroups and postings RFC 1738 news:
news:
nfs Network File System resources RFC 2224 generic syntax
nntp Usenet NNTP RFC 1738 nntp://
opaquelocktoken opaquelocktoken RFC 4918
pop Accessing mailbox through POP3 RFC 2384 pop://[
pres Used in Common Profile for Presence (CPP) to identify presence RFC 3859 pres:
prospero Prospero Directory Service RFC 4157 Listed as "Historical" by IANA.
rtsp Real Time Streaming Protocol RFC 2326
service RFC 2609
shttp Secure HTTP RFC 2660 Largely superseded by HTTPS.
sip Used with Session Initiation Protocol (SIP) RFC 3969
RFC 3261 sip:
examples:
sip:alice@atlanta.com?subject=project%20x&priority=urgent
sip:+1-212-555-1212:1234@gateway.com;user=phone
sips Secure equivalent of sip RFC 3969
RFC 3261 sips:
snmp Simple Network Management Protocol RFC 4088 snmp://[user@]host[:port][/[
examples:
snmp://example.com//1.3.6.1.2.1.1.3+
snmp://tester5@example.com:8161/bridge1;800002b804616263
soap.beep
soap.beeps RFC 3288
tag RFC 4151
tel Used for telephone numbers RFC 3966
RFC 2806 tel:
telnet Used with telnet RFC 4248 telnet://
tftp Trivial File Transfer Protocol RFC 3617
thismessage multipart/related relative reference resolution RFC 2557
tip Transaction Internet Protocol RFC 2371
tv TV Broadcasts RFC 2838
urn Uniform Resource Names RFC 2141 urn:
vemmi Versatile Multimedia Interface RFC 2122
wais Used with Wide area information server (WAIS) RFC 4156 wais://
xmlrpc.beep
xmlrpc.beeps RFC 3529
xmpp XMPP (Jabber) RFC 5122 xmpp:
z39.50r Z39.50 retrieval RFC 2056 z39.50r://
z39.50s Z39.50 session RFC 2056 z39.50s://
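As an illustration of how a scheme-specific part is interpreted, the mailto example from the table above can be taken apart with Python's standard library. This is a rough sketch: parse_mailto is a made-up helper, and it assumes the simple form shown in the table (comma-separated addresses, then optional header fields after "?").

```python
from urllib.parse import unquote, urlsplit


def parse_mailto(uri):
    """Return (recipients, headers) for a simple mailto URI."""
    parts = urlsplit(uri)
    if parts.scheme != "mailto":
        raise ValueError("not a mailto URI")
    recipients = [unquote(addr) for addr in parts.path.split(",") if addr]
    headers = {}
    for item in parts.query.split("&"):
        if item:
            name, _, value = item.partition("=")
            headers[unquote(name)] = unquote(value)  # decode %20, %3A, %0A ...
    return recipients, headers


recipients, headers = parse_mailto(
    "mailto:jsmith@example.com?subject=A%20Test&body=My%20idea%20is%3A%20%0A"
)
```

Here the subject decodes to "A Test" and the body to "My idea is: " followed by a line break, which is how a mail client would pre-fill the message.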
Unofficial but common URI schemes
Scheme Purpose Defined by General format Notes
about Displaying product information and internal information Un-standardised
about:blank is commonly used to display a blank page. Widely used by web browsers, sometimes even providing interactive resources. The Opera web browser uses opera: instead.
adiumxtra Direct installation of Adium Xtras (plugins). The Adium Team adiumxtra://www.adiumxtras.com/download/0000 0000 refers to a specific Xtra
aim Controlling AOL Instant Messenger. AOL aim:
afp Accessing Apple Filing Protocol shares IETF Draft over TCP/IP: afp://[
over AppleTalk: afp:/at/[
aw Link to an Active Worlds world Activeworlds Inc. aw://
bolo Join an existing bolo game. bolo://
callto Launching a Skype call; in Hungary, the KLIP software also handles it (unofficial; see also skype:) Skype callto:
callto:
chrome Specifies user interfaces built using XUL in Mozilla-based browsers. Mozilla chrome://
cvs Provides a link to a Concurrent Versions System (CVS) Repository Concurrent Versions System cvs://
ed2k Resources available using the eDonkey2000 network eDonkey2000 ed2k://|file|
ed2k://|server|
feed web feed subscription feed:
feed://
examples:
feed://example.com/rss.xml
feed:https://example.com/rss.xml
See Feed URI scheme for a detailed overview of common implementations, supported software, and critics.
fish Accessing another computer's files using the SSH protocol fish KDE kioslave fish://[
gg Starting chat with Gadu-Gadu user Gadu-Gadu gg:
gizmoproject Gizmo Project calling link. gizmoproject://call?id=
iax2 Inter-Asterisk eXchange protocol version 2 IETF Draft iax2:[
examples:
iax2:[2001:db8::1]:4569/alice?friends
iax2:johnQ@example.com/12022561414
irc Connecting to a server to join a channel. IETF Draft
Old IETF Draft irc://
ircs Secure equivalent of irc IETF Draft ircs://
itms Used for connecting to the iTunes Music Store Apple Inc itms:
jar Compressed archive member Java API jar:
javascript Execute javascript code IETF Draft javascript:
keyparc Keyparc encrypt/decrypt resource. keyparc://encrypt/
keyparc://decrypt/
lastfm Connecting to a radio stream from Last.fm. Last.fm lastfm://
lastfm://user/
ldaps Secure equivalent of ldap ldaps://[
magnet "magnet links" Magnet-URI Project magnet:?xt=urn:sha1:
(other parameters are also possible) Used by various peer-to-peer clients, usually providing the hash of a file to be located on the network.
mms Windows streaming media mms://
msnim Adding a contact, or starting a conversation in Windows Live Messenger Windows Live Messenger Add a contact to the buddy list
msnim:add?contact=nada@nowhere.com
Start a conversation with a contact
msnim:chat?contact=nada@nowhere.com
Start a voice conversation with a contact
msnim:voice?contact=nada@nowhere.com
Start a video conversation with a contact
msnim:video?contact=nada@nowhere.com
Can be invoked from a web page, via a run command, or from an Internet Explorer URL (won't work with Firefox 2.0.0.8). For web pages use this HTML:
mvn Access Apache Maven repository artifacts OPS4J mvn:org.ops4j.pax.web.bundles/service/0.2.0-SNAPSHOT
mvn:http://user:password@repository.ops4j.org/maven2!org.ops4j.pax.web.bundles/service/0.2.0
notes Open a Lotus Notes document or database Lotus Notes notes://
psyc Used to identify or locate a person, group, place or a service and specify its ability to communicate PSYC psyc:[//
paparazzi:http Used to launch and automatically take a screen shot using the application "Paparazzi" (Mac only) Derailer paparazzi:http:[//
rmi Look up a Java object in an RMI registry. Sun rmi://
rsync rsync rsync://
secondlife Open the Map floater in Second Life application to teleport the resident to the location. Linden Lab secondlife://
sgn Social Graph Node Mapper Google example:
sgn://social-network.example.com/?ident=bob
Official documentation from sgnodemapper project.
skype Launching Skype call (official; see also callto:) Skype skype:
ssh SSH connections (like telnet:) IETF Draft ssh://[
sftp SFTP file transfers (not to be confused with FTPS (FTP/SSL)) IETF Draft sftp://[
smb Accessing SMB/CIFS shares IETF Draft smb://[
smb://[
sms Interact with SMS capable devices for composing and sending messages. IETF draft sms:
soldat Joining servers Soldat soldat://
example:
soldat://127.0.0.1:23073/
Official note in Manual
steam Interact with Steam: install apps, purchase games, run games, etc. Steam, Valve Corporation steam:
steam://
svn Provides a link to a Subversion (SVN) source control repository Subversion (software) svn[+ssh]://
teamspeak Joining a server. TeamSpeak teamspeak://
unreal Joining servers Unreal unreal://
ut2004 Joining servers Unreal Tournament 2004 ut2004://
Tuesday, January 27, 2009
What is BIOS?
BIOS is an acronym for Basic Input/Output System. It is the boot firmware program on a PC, and controls the computer from the time you start it up until the operating system takes over. When you turn on a PC, the BIOS first conducts a basic hardware check, called a Power-On Self Test (POST), to determine whether all of the attachments are present and working. Then it loads the operating system into your computer's random access memory, or RAM.
The BIOS also manages data flow between the computer's operating system and attached devices such as the hard disk, video card, keyboard, mouse, and printer.
The BIOS stores the date, the time, and your system configuration information in a battery-powered, non-volatile memory chip, called a CMOS (Complementary Metal Oxide Semiconductor) after its manufacturing process.
Although the BIOS is standardized and should rarely require updating, some older BIOS chips may not accommodate new hardware devices. Before the early 1990s, you couldn't update the BIOS without removing and replacing its ROM chip. Contemporary BIOS resides on memory chips such as flash chips or EEPROM (Electrically Erasable Programmable Read-Only Memory), so that you can update the BIOS yourself if necessary.
220 manufacturers are listed on Wim's BIOS Page for BIOS and driver updates:
2 the Maxx
Aaeon
Ability
Abit
Acer
Achitec
Achme
Acma
Acorp
Adcom
ADI
ADLink Technology Inc.
Advanced Logic Research (ALR)
Advantech
Aeton Technology
AIR
Alaris
Albatron
ALD Technology
Alton
Amaquest
Amax
AMI
Amptron
AMS
Anigo
AOpen
Aotexin
Aprocom
Arima
Aristo
Arvida
ASK Technology
Asrock
AST
Asus
AT&T
Atrend
Austin Direct
AVT
Azza
BCM
BCom / ASI
BEK-Tronic Technology
Biostar
Boser
California Graphics
Chaintech
Chaplet
Chicony
Clevo
CMC
Commate
Compaq
Computrend
Daewoo
Darter
Dataexpert
Datavan International
Dell
Delta Electronics
DFI
Digicom
Digital
Domex (DTC)
DTK
Dual Tech
ECS (Elitegroup)
EFA
Elonex
ENMIC
ENPC
EPoX
ESPCo
Evalue Technology Inc
Expen Tech
Fastfame Technology Co., Ltd.
FIC (FICA)
Flagpoint
Flytech Group International
Ford Lian
Formosa Industrial Computing
Foxconn / Hoxtek (with FK)
Foxconn / Hoxtek (with FL)
Free Tech
Fugutech
Fujitsu
Full Yes
Gainward
Gateway
Gemlight
Giantec
Gigabyte
Global Circuit Technology (GCT)
Globe Legate
GVC
HighTech Information System
Holco Enterprise
Hope Vision
HP
IBM
ICP
Industrial Technology Research Institute
Informtech International
Inlog Microsystem
Intel
Inventa
Inventec
Iwill
J&J Technology
J-Bond
J-Mark
Jamicon
Jetta
Jetway
Joss Technology
Kaimei
Kapok
KINPO Electronic
Lanner Electronics
Leadtek
LeadTek/Foxconn
Lian Guan
LiPPERT Automationstechnik
LiPPERT Automationstechnik GmbH
Lucky Star
Lucky Tiger
Luckytech Technology
Macrotek
Matra
Matsonic
Maxtium Computer
Micro Leader Enterprises
Micron
Microstar
Mitac
Mustek
MyComp
Nature Worldwide Technology Corp
Nec
New Tech
Nexar
Nexcom
NMC
Ocean Office Automation
Packard Bell
Palmax
PC Chips (Hsin Tech)
PcPartner
PCWare
Pine Technology
Pionix
Polaris
Powercolor
Powertech
President (Formerly Wang Labs)
Procomp Informatics
Protech
Puretek
QDI
Quanta
Rectron
Redfox
Research Machines PLC
RioWorks Solutions
Rise Computer
RSAP Technology
S&D
San Li
Seanix
Seavo
Seritech
ShenZhen Zeling
Shuttle (Holco)
Smart D&M Technology
Soltek
Sono Computer
Sony
Sowah
Soyo
Spring Circle
Sukjung
Super Grace Electronics
Supermicro
SuperPower
T&W Electronics
Taeli (Techmedia)
Taken
Tatung
Tekram
Top Star
Toshiba
Totem Technology
Trangg Bow
Transcend
Trigem
Tsann Kuen (EUPA)
Twinhead
Tyan
U-Board
UHC Advanced Integration Research
Umax
Unicorn
Unitron
Universal Scientific Industrial
Uniwill
Vision Top
Vobis
VTech
Warp Speed
Weal Union Development
Well Join Industry
Win Lan
Winco
WinTechnologies (Edom)
Yamashita
YKM (Dayton Micron)
Zenith
Zida
Zillion
What is the difference between memory and disk storage?
Memory and disk storage both refer to internal storage space in a computer.
The term memory usually means RAM (Random Access Memory). To refer to hard drive storage, the terms disk space or storage are usually used. Typically, computers have much less memory than disk space, because RAM is much more expensive per megabyte than a hard disk.
Virtual memory is disk space that has been designated to act like RAM.
Computers also contain a small amount of ROM, or read-only memory, containing permanent or semi-permanent (firmware) instructions for checking hardware and starting up the computer. On a PC, this is called the BIOS.
What is RAM?
Overview
Random Access Memory (RAM) provides space for your computer to read and write data to be accessed by the CPU (central processing unit). When people refer to a computer's memory, they usually mean its RAM.
If you add more RAM to your computer, you reduce the number of times your CPU must read data from your hard disk. This usually allows your computer to work considerably faster, as RAM is many times faster than a hard disk.
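The effect described above can be illustrated with a toy model. The sketch below treats RAM as a fixed number of slots holding recently used disk blocks and counts how often a block has to be fetched from disk. The least-recently-used replacement policy, the function name and the workload are all assumptions for the example; real operating systems are far more elaborate.

```python
from collections import OrderedDict


def count_disk_reads(accesses, ram_slots):
    """Count disk reads for a sequence of block accesses with limited RAM."""
    ram = OrderedDict()  # blocks currently in RAM, least recently used first
    disk_reads = 0
    for block in accesses:
        if block in ram:
            ram.move_to_end(block)  # already in RAM: no disk access needed
        else:
            disk_reads += 1         # not in RAM: must be read from disk
            ram[block] = True
            if len(ram) > ram_slots:
                ram.popitem(last=False)  # evict the least recently used block
    return disk_reads


workload = [0, 1, 2, 3] * 5  # the same four blocks, used five times over
small = count_disk_reads(workload, ram_slots=2)
large = count_disk_reads(workload, ram_slots=4)
```

With only two slots every one of the 20 accesses goes to disk, while four slots are enough to hold the whole working set, so only the first four accesses do.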
RAM is volatile, so data stored in RAM stays there only as long as your computer is running. As soon as you turn the computer off, the data stored in RAM disappears.
When you turn your computer on again, your computer's boot firmware (called BIOS on a PC) uses instructions stored semi-permanently in ROM chips to read your operating system and related files from the disk and load them back into RAM.
SDR, DDR, DDR2, and DDR3 RAM
There are several types of RAM used in modern computers. Prior to 2002, most computers used single data rate (SDR) RAM. Most computers made since then use double data rate (DDR), DDR2, or DDR3 RAM. DDR2 achieves faster transfer rates than DDR, so that memory is less likely to limit your CPU's performance, and DDR3 takes these advancements even further.
Note that these RAM technologies are not interchangeable. One type of RAM will not function if installed with another type, and physical differences between the modules prevent them from even being inserted into the same slots.
What is ROM?
ROM is an acronym for Read-Only Memory. It refers to computer memory chips containing permanent or semi-permanent data. Unlike RAM, ROM is non-volatile; even after you turn off your computer, the contents of ROM will remain.
Almost every computer comes with a small amount of ROM containing the boot firmware. This consists of a few kilobytes of code that tell the computer what to do when it starts up, e.g., running hardware diagnostics and loading the operating system into RAM. On a PC, the boot firmware is called the BIOS.
Originally, ROM was actually read-only. To update the programs in ROM, you had to remove and physically replace your ROM chips. Contemporary versions of ROM allow some limited rewriting, so you can usually upgrade firmware such as the BIOS by using installation software. Rewritable ROM chips include PROMs (programmable read-only memory), EPROMs (erasable programmable read-only memory), EEPROMs (electrically erasable programmable read-only memory), and a common variation of EEPROMs called flash memory.
What is CMOS?
CMOS, short for Complementary Metal Oxide Semiconductor, is a low-power, low-heat semiconductor technology used in contemporary microchips, especially useful for battery-powered devices. The specific technology is explained in detail on this CMOS definition page.
Most commonly, though, the term CMOS is used to refer to small battery-powered configuration chips on system boards of personal computers, where the BIOS stores the date, the time, and system configuration details.
On a PC, how do I reset or remove the CMOS password?
A CMOS password, if present, is one that you must enter when the computer is booting up. It comes before a network or operating system password. You will not be able to run any programs, view files, or even start the operating system if you do not enter this password.
If you'd like to change the password, you can do so by restarting your computer and entering the CMOS setup. You usually enter CMOS setup by typing a certain key or combination of keys as your computer is booting up; F2, Del, and Ctrl-Enter are common possibilities. However, your BIOS should display a line during startup explaining which key(s) to type to enter setup, BIOS setup, or CMOS setup. There should be a password option in the setup program. If you are not able to find this option, you will need to consult the manual that came with the computer or motherboard.
If you have forgotten your password, you will need to clear it by physically changing jumpers on your motherboard to short out certain pins, thereby erasing the password. The location of these pins varies from motherboard to motherboard, so you will need to refer to your computer or motherboard's manual.
Some motherboards have a default password. For example, the AMI BIOS default password is AMI. Check your computer or motherboard manual for the default password. It's worth trying this password if you don't know or have forgotten the CMOS password.
For some older computers, you must have a reference disk to make changes to the CMOS settings, including the password.
What's the difference between BIOS and CMOS?
Many people use the terms BIOS (basic input/output system) and CMOS (complementary metal oxide semiconductor) to refer to the same thing. Though they are related, they are distinct and separate components of a computer. The BIOS is the program that starts a computer up, and the CMOS is where the BIOS stores the date, time, and system configuration details it needs to start the computer.
The BIOS is a small program that controls the computer from the time it powers on until the time the operating system takes over. The BIOS is firmware, which means it cannot store variable data.
CMOS is a type of memory technology, but most people use the term to refer to the chip that stores variable data for startup. A computer's BIOS will initialize and control components like the floppy and hard drive controllers and the computer's hardware clock, but the specific parameters for startup and initializing components are stored in the CMOS.
BIOS is an acronym for Basic Input/Output System. It is the boot firmware program on a PC, and controls the computer from the time you start it up until the operating system takes over. When you turn on a PC, the BIOS first conducts a basic hardware check, called a Power-On Self Test (POST), to determine whether all of the attachments are present and working. Then it loads the operating system into your computer's random access memory, or RAM.
The BIOS also manages data flow between the computer's operating system and attached devices such as the hard disk, video card, keyboard, mouse, and printer.
The BIOS stores the date, the time, and your system configuration information in a battery-powered, non-volatile memory chip, called a CMOS (Complementary Metal Oxide Semiconductor) after its manufacturing process.
Although the BIOS is standardized and should rarely require updating, some older BIOS chips may not accommodate new hardware devices. Before the early 1990s, you couldn't update the BIOS without removing and replacing its ROM chip. Contemporary BIOS resides on memory chips such as flash chips or EEPROM (Electrically Erasable Programmable Read-Only Memory), so that you can update the BIOS yourself if necessary.
220 manufacturers are listed on Wim's BIOS Page!computer for BIOS and driver updates
2 the Maxx
Aaeon
Ability
Abit
Acer
Achitec
Achme
Acma
Acorp
Adcom
ADI
ADLink Technology Inc.
Advanced Logic Research (ALR)
Advantech
Aeton Technology
AIR
Alaris
Albatron
ALD Technology
Alton
Amaquest
Amax
AMI
Amptron
AMS
Anigo
AOpen
Aotexin
Aprocom
Arima
Aristo
Arvida
ASK Technology
Asrock
AST
Asus
AT&T
Atrend
Austin Direct
AVT
Azza
BCM
BCom / ASI
BEK-Tronic Technology
Biostar
Boser
California Graphics
Chaintech
Chaplet
Chicony
Clevo
CMC
Commate
Compaq
Computrend
Daewoo
Darter
Dataexpert
Datavan International
Dell
Delta Electronics
DFI
Digicom
Digital
Domex (DTC)
DTK
Dual Tech
ECS (Elitegroup)
EFA
Elonex
ENMIC
ENPC
EPoX
ESPCo
Evalue Technology Inc
Expen Tech
Fastfame Technology Co., Ltd.
FIC (FICA)
Flagpoint
Flytech Group International
Ford Lian
Formosa Industrial Computing
Foxconn / Hoxtek (with FK)
Foxconn / Hoxtek (with FL)
Free Tech
Fugutech
Fujitsu
Full Yes
Gainward
Gateway
Gemlight
Giantec
Gigabyte
Global Circuit Technology (GCT)
Globe Legate
GVC
HighTech Information System
Holco Enterprise
Hope Vision
HP
IBM
ICP
Industrial Technology Research Institute
Informtech International
Inlog Microsystem
Intel
Inventa
Inventec
Iwill
J&J Technology
J-Bond
J-Mark
Jamicon
Jetta
Jetway
Joss Technology
Kaimei
Kapok
KINPO Electronic
Lanner Electronics
Leadtek
LeadTek/Foxconn
Lian Guan
LiPPERT Automationstechnik
LiPPERT Automationstechnik GmbH
Lucky Star
Lucky Tiger
Luckytech Technology
Macrotek
Matra
Matsonic
Maxtium Computer
Micro Leader Enterprises
Micron
Microstar
Mitac
Mustek
MyComp
Nature Worldwide Technology Corp
Nec
New Tech
Nexar
Nexcom
NMC
Ocean Office Automation
Packard Bell
Palmax
PC Chips (Hsin Tech)
PcPartner
PCWare
Pine Technology
Pionix
Polaris
Powercolor
Powertech
President (Formerly Wang Labs)
Procomp Informatics
Protech
Puretek
QDI
Quanta
Rectron
Redfox
Research Machines PLC
RioWorks Solutions
Rise Computer
RSAP Technology
S&D
San Li
Seanix
Seavo
Seritech
ShenZhen Zeling
Shuttle (Holco)
Smart D&M Technology
Soltek
Sono Computer
Sony
Sowah
Soyo
Spring Circle
Sukjung
Super Grace Electronics
Supermicro
SuperPower
T&W Electronics
Taeli (Techmedia)
Taken
Tatung
Tekram
Top Star
Toshiba
Totem Technology
Trangg Bow
Transcend
Trigem
Tsann Kuen (EUPA)
Twinhead
Tyan
U-Board
UHC Advanced Integration Research
Umax
Unicorn
Unitron
Universal Scientific Industrial
Uniwill
Vision Top
Vobis
VTech
Warp Speed
Weal Union Development
Well Join Industry
Win Lan
Winco
WinTechnologies (Edom)
Yamashita
YKM (Dayton Micron)
Zenith
Zida
Zillion
What is the difference between memory and disk storage?
Memory and disk storage both refer to internal storage space in a computer.
The term memory usually means RAM (Random Access Memory). To refer to hard drive storage, the terms disk space or storage are usually used. Typically, computers have much less memory than disk space, because RAM is much more expensive per megabyte than a hard disk.
Virtual memory is disk space that has been designated to act like RAM.
Computers also contain a small amount of ROM, or read-only memory, containing permanent or semi-permanent (firmware) instructions for checking hardware and starting up the computer. On a PC, this is called the BIOS.
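To see the difference between memory and disk space on your own machine, a short Python sketch like the one below can report both. It is an illustration only: the RAM figure relies on os.sysconf values that are available on Linux and some other Unix-like systems, so the function returns None where they are missing. The function names are hypothetical.

```python
import os
import shutil


def disk_total_bytes(path=None):
    """Total size of the disk (filesystem) that holds `path`."""
    if path is None:
        path = os.path.abspath(os.sep)  # root of the current drive
    return shutil.disk_usage(path).total


def ram_total_bytes():
    """Installed RAM in bytes, or None where sysconf cannot report it."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # Assumption: platforms without these sysconf names land here.
        return None


if __name__ == "__main__":
    gib = 1024 ** 3
    print(f"Disk space: {disk_total_bytes() / gib:.1f} GiB")
    ram = ram_total_bytes()
    if ram is None:
        print("RAM: not available on this platform")
    else:
        print(f"RAM: {ram / gib:.1f} GiB")
```

On a typical computer the disk figure is many times larger than the RAM figure.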
What is RAM?
Overview
Random Access Memory (RAM) provides space for your computer to read and write data to be accessed by the CPU (central processing unit). When people refer to a computer's memory, they usually mean its RAM.
If you add more RAM to your computer, you reduce the number of times your CPU must read data from your hard disk. This usually allows your computer to work considerably faster, as RAM is many times faster than a hard disk.
RAM is volatile, so data stored in RAM stays there only as long as your computer is running. As soon as you turn the computer off, the data stored in RAM disappears.
When you turn your computer on again, your computer's boot firmware (called BIOS on a PC) uses instructions stored semi-permanently in ROM chips to read your operating system and related files from the disk and load them back into RAM.
SDR, DDR, DDR2, and DDR3 RAM
There are several types of RAM used in modern computers. Prior to 2002, most computers used single data rate (SDR) RAM. Most computers made since then use double data rate (DDR), DDR2, or DDR3 RAM. DDR2 achieves faster transfer rates than DDR, so that memory is less likely to limit your CPU's performance, and DDR3 takes these advancements even further.
Note that these RAM technologies are not interchangeable. A module of one type will not work in a computer designed for another type, and physical differences between the modules prevent them from even being inserted into the wrong kind of slot.
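The arithmetic behind these transfer rates can be sketched in a few lines of Python. The peak rate is the bus clock multiplied by the number of transfers per clock cycle and by the width of the memory bus in bytes. The speed grades below and the 64-bit (8-byte) bus width are illustrative assumptions that do not come from the text above; they are commonly quoted examples, not a complete list.

```python
BUS_WIDTH_BYTES = 8  # assumption: a standard 64-bit memory module

# (name, I/O bus clock in MHz, transfers per clock cycle), illustrative grades
MODULES = [
    ("SDR PC133", 133, 1),   # single data rate: one transfer per cycle
    ("DDR-400", 200, 2),     # double data rate: two transfers per cycle
    ("DDR2-800", 400, 2),
    ("DDR3-1600", 800, 2),
]


def peak_mb_per_s(bus_mhz, transfers_per_clock, width_bytes=BUS_WIDTH_BYTES):
    """Theoretical peak transfer rate in megabytes per second."""
    return bus_mhz * transfers_per_clock * width_bytes


if __name__ == "__main__":
    for name, mhz, transfers in MODULES:
        print(f"{name}: {peak_mb_per_s(mhz, transfers)} MB/s peak")
```

Under these assumptions, DDR-400 works out to 3200 MB/s. DDR2 and DDR3 still make two transfers per cycle, so their higher rates in this sketch come from the faster bus clock.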
What is ROM?
ROM is an acronym for Read-Only Memory. It refers to computer memory chips containing permanent or semi-permanent data. Unlike RAM, ROM is non-volatile; even after you turn off your computer, the contents of ROM will remain.
Almost every computer comes with a small amount of ROM containing the boot firmware. This consists of a few kilobytes of code that tell the computer what to do when it starts up, e.g., running hardware diagnostics and loading the operating system into RAM. On a PC, the boot firmware is called the BIOS.
Originally, ROM was truly read-only. To update the programs in ROM, you had to remove and physically replace your ROM chips. Contemporary versions of ROM allow some limited rewriting, so you can usually upgrade firmware such as the BIOS by using installation software. Programmable ROM chips include PROMs (programmable read-only memory), which can be written only once, and the rewritable types: EPROMs (erasable programmable read-only memory), EEPROMs (electrically erasable programmable read-only memory), and a common variation of EEPROMs called flash memory.
What is CMOS?
CMOS, short for Complementary Metal Oxide Semiconductor, is a low-power, low-heat semiconductor technology used in contemporary microchips, especially useful for battery-powered devices. The specific technology is explained in detail on this CMOS definition page.
Most commonly, though, the term CMOS is used to refer to small battery-powered configuration chips on system boards of personal computers, where the BIOS stores the date, the time, and system configuration details.
On a PC, how do I reset or remove the CMOS password?
A CMOS password, if present, is one that you must enter when the computer is booting up. It comes before a network or operating system password. You will not be able to run any programs, view files, or even start the operating system if you do not enter this password.
If you'd like to change the password, you can do so by restarting your computer and entering the CMOS setup. You usually enter CMOS setup by pressing a certain key or combination of keys as your computer is booting up; F2, Del, and Ctrl-Enter are common possibilities. Your BIOS should display a line during startup explaining which key(s) to press to enter setup, BIOS setup, or CMOS setup. There should be a password option in the setup program. If you are not able to find this option, you will need to consult the manual that came with the computer or motherboard.
If you have forgotten your password, you will need to clear it by physically changing jumpers on your motherboard to short out certain pins, thereby erasing the password. The location of these pins varies from motherboard to motherboard, so you will need to refer to your computer or motherboard's manual.
Some motherboards have a default password. For example, the AMI BIOS default password is AMI. Check your computer or motherboard manual for the default password. It's worth trying this password if you don't know or have forgotten the CMOS password.
For some older computers, you must have a reference disk to make changes to the CMOS settings, including the password.
What's the difference between BIOS and CMOS?
Many people use the terms BIOS (basic input/output system) and CMOS (complementary metal oxide semiconductor) to refer to the same thing. Though they are related, they are distinct and separate components of a computer. The BIOS is the program that starts a computer up, and the CMOS is where the BIOS stores the date, time, and system configuration details it needs to start the computer.
The BIOS is a small program that controls the computer from the time it powers on until the time the operating system takes over. The BIOS is firmware, which means it cannot store variable data.
CMOS is a type of memory technology, but most people use the term to refer to the chip that stores variable data for startup. A computer's BIOS will initialize and control components like the floppy and hard drive controllers and the computer's hardware clock, but the specific parameters for startup and initializing components are stored in the CMOS.
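The division of labor can be pictured with a toy model in Python. This is only a conceptual sketch, not how real firmware is written: the names BIOS_DEFAULTS, Cmos, and boot are hypothetical, and treating a jumper reset as a return to defaults is a simplification. The fixed startup routine stands in for the BIOS, and the small editable store stands in for the CMOS chip.

```python
from types import MappingProxyType

# Firmware side: fixed at build time, modeled as a read-only mapping.
BIOS_DEFAULTS = MappingProxyType(
    {"boot_order": ("floppy", "hard disk"), "password": None}
)


class Cmos:
    """Small battery-backed store for settings that change over time."""

    def __init__(self):
        self.settings = dict(BIOS_DEFAULTS)

    def clear(self):
        """Simplified model of the clear jumper: back to firmware defaults."""
        self.settings = dict(BIOS_DEFAULTS)


def boot(cmos, entered_password=None):
    """The BIOS routine: fixed code whose behavior depends on CMOS data."""
    stored = cmos.settings["password"]
    if stored is not None and entered_password != stored:
        return "halted: wrong password"
    return "booting from " + cmos.settings["boot_order"][0]


if __name__ == "__main__":
    cmos = Cmos()
    cmos.settings["password"] = "secret"  # set through the setup program
    print(boot(cmos))                     # password required, none entered
    cmos.clear()                          # forgotten password: clear CMOS
    print(boot(cmos))
```

The boot function never changes, while everything it needs to remember between startups lives in the Cmos object, which is why clearing the CMOS removes a forgotten password without touching the BIOS itself.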