DOI Name Encoding Rules for URL Presentation
Hex-encoding must be used when presenting a DOI name as a Uniform Resource Locator (URL) if the DOI name contains characters that are not allowed, or that have other meanings, in the URL application context. Hex-encoding replaces the character with a percent sign followed by its two-digit hexadecimal value. For example, # becomes %23, and https://doi.org/10.1000/456#789 is encoded as https://doi.org/10.1000/456%23789. The browser then no longer encounters a bare #, which it would normally treat as the end of the URL and the start of a fragment, so it sends the entire string to the DOI network of servers for resolution instead of stopping at the #.
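The effect of a bare # can be observed with a generic URL parser. The following is a minimal sketch using Python's standard `urllib.parse.urlsplit`; it only illustrates how a typical parser splits the two URLs from the example above and is not a description of how any particular browser or the DOI proxy is implemented.

```python
from urllib.parse import urlsplit

raw = "https://doi.org/10.1000/456#789"
encoded = "https://doi.org/10.1000/456%23789"

raw_parts = urlsplit(raw)
encoded_parts = urlsplit(encoded)

# With a bare '#', everything after it is split off as the fragment,
# so only part of the DOI name remains in the path sent to the server.
print(raw_parts.path, "|", raw_parts.fragment)

# With '%23', the whole DOI name stays in the path and there is no fragment.
print(encoded_parts.path, "|", encoded_parts.fragment)

# The hex-encoding of a single ASCII character: '%' plus its two-digit hex value.
print("%{:02X}".format(ord("#")))
```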
The table below lists the mandatory and recommended hex-encoding rules. The recommendations are based on practical experience with current web browsers.
| Character | Encoding |
|---|---|
| Mandatory Rules | |
| % | %25 |
| " | %22 |
| # | %23 |
| SPACE | %20 |
| ? | %3F |
| Recommended Rules | |
| < | %3C |
| > | %3E |
| { | %7B |
| } | %7D |
| ^ | %5E |
| [ | %5B |
| ] | %5D |
| ` | %60 |
| \| | %7C |
| \ | %5C |
| + | %2B |
| , (only necessary in a Which RA service request context) | %2C |
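The rules in the table can be applied mechanically. The sketch below is one possible way to do so in Python; the function names `encode_doi_for_url` and `doi_to_url` and the `recommended` and `which_ra` parameters are hypothetical names introduced for illustration. It covers only the characters listed in the table and does not address other cases, such as non-ASCII characters.

```python
# Mandatory rules from the table.
MANDATORY = {"%": "%25", '"': "%22", "#": "%23", " ": "%20", "?": "%3F"}

# Recommended rules from the table (the comma is handled separately).
RECOMMENDED = {
    "<": "%3C", ">": "%3E", "{": "%7B", "}": "%7D", "^": "%5E",
    "[": "%5B", "]": "%5D", "`": "%60", "|": "%7C", "\\": "%5C",
    "+": "%2B",
}


def encode_doi_for_url(doi, recommended=True, which_ra=False):
    """Hex-encode the characters of a DOI name that the table lists."""
    rules = dict(MANDATORY)
    if recommended:
        rules.update(RECOMMENDED)
    if which_ra:
        # Assumption: the caller knows the URL is a Which RA service request.
        rules[","] = "%2C"
    # A single pass over the input: a '%' produced by the encoding itself
    # is never encoded a second time.
    return "".join(rules.get(ch, ch) for ch in doi)


def doi_to_url(doi, **options):
    """Present a DOI name as a doi.org URL."""
    return "https://doi.org/" + encode_doi_for_url(doi, **options)


print(doi_to_url("10.1000/456#789"))
```

The input is assumed to be a raw, unencoded DOI name. Passing an already encoded string would encode its percent signs again, because % is itself a mandatory rule.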
NOTE Web browsers can treat /./ and /../ inconsistently. It is recommended that one of the slashes be percent-encoded: for example, /./ is changed to /.%2F and /../ to /..%2F.
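The substitution described in this note can be sketched with a regular expression. The helper name `protect_dot_segments` is hypothetical. The sketch encodes the trailing slash of each /./ or /../ sequence, as in the examples given in the note; it does not handle a `.` or `..` at the very end of the string, which the note does not address.

```python
import re

# Matches '/./' or '/../' and captures the one or two dots.
_DOT_SEGMENT = re.compile(r"/(\.{1,2})/")


def protect_dot_segments(doi):
    """Percent-encode the slash that follows a '.' or '..' segment."""
    return _DOT_SEGMENT.sub(r"/\1%2F", doi)


print(protect_dot_segments("10.1000/a/./b"))
print(protect_dot_segments("10.1000/a/../b"))
```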
NOTE To enable the use of DOI names in workflows that have already standardized on URNs, the DOI proxy servers understand the substitution of a colon in place of the initial slash in a DOI name. DOI names may therefore be expressed as URNs in the doi.org domain by writing, for example, the DOI name 10.123/456 in the form https://doi.org/urn:doi:10.123:456. However, a DOI suffix is allowed to contain other slashes, and where these occur they must be hex-encoded rather than replaced with a colon: for example, the DOI name 10.123/456ABC/zyz would become https://doi.org/urn:doi:10.123:456ABC%2Fzyz, with the final slash character encoded as %2F.
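The URN form described in this note can be produced as follows. This is a minimal sketch; the function name `doi_to_urn_url` is hypothetical, and the sketch handles only the slash rules from the note. Any other characters that require hex-encoding under the table above would still need to be encoded separately.

```python
def doi_to_urn_url(doi):
    """Express a DOI name as a URN in the doi.org domain."""
    # Split at the initial slash, which separates prefix and suffix.
    prefix, slash, suffix = doi.partition("/")
    if not slash:
        raise ValueError("expected a DOI name of the form prefix/suffix")
    # The initial slash becomes a colon; any further slashes in the
    # suffix are hex-encoded as %2F rather than replaced with a colon.
    return "https://doi.org/urn:doi:" + prefix + ":" + suffix.replace("/", "%2F")


print(doi_to_urn_url("10.123/456"))
print(doi_to_urn_url("10.123/456ABC/zyz"))
```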