Peter Poeml <poeml@suse.de> wrote:
> > http://www.kuro5hin.org/?op=displaystory;sid=2000/10/20/24336/134
>
> Normally, the separator character between two arguments appended to a
> URL in a GET request is an ampersand (&).
>
> This leads me to saying that the URL you quoted is not correct.
No, it just means that the semicolon is not an argument separator.
There is one argument, whose name is "op" and whose value is
"displaystory;sid=2000/10/20/24336/134".
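To make the point concrete, here is a minimal Python sketch of the usual convention. It assumes that arguments are name=value pairs separated only by '&', and that '=' splits only at its first occurrence. `query_args` is a hypothetical helper, not a library function. The query is split by hand because some older Python releases of `urllib.parse.parse_qsl` also treated ';' as a separator.

```python
from urllib.parse import urlsplit, unquote_plus


def query_args(url, separator="&"):
    """Split the query-string of url into (name, value) pairs.

    Only `separator` delimits arguments; every other character,
    including ';', stays part of the name or value.
    """
    query = urlsplit(url).query
    args = []
    for field in query.split(separator):
        if not field:
            continue  # empty argument, e.g. from a trailing '&'
        name, _, value = field.partition("=")  # split at the first '=' only
        args.append((unquote_plus(name), unquote_plus(value)))
    return args


url = "http://www.kuro5hin.org/?op=displaystory;sid=2000/10/20/24336/134"
print(query_args(url))
# [('op', 'displaystory;sid=2000/10/20/24336/134')]
```

Calling `query_args(url, separator=";")` would instead yield `('op', 'displaystory')` and `('sid', '2000/10/20/24336/134')`. That is what a script would see if it chose ';' as its own separator. Nothing in the URL itself says which convention the server uses.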
Norman Walsh <ndw@nwalsh.com> wrote:
> RFC 1738 has been replaced by RFC 2396
Updated by, not replaced. RFC 2396 is the latest authority on generic
URI syntax, but RFC 1738 still has things to say about particular kinds
of URLs.
> which describes the semicolon character in section 3.3:
>
> The path may consist of a sequence of path segments separated by a
> single slash "/" character. Within a path segment, the characters
> "/", ";", "=", and "?" are reserved. Each path segment may include
> a sequence of parameters, indicated by the semicolon ";" character.
This is describing the role of semicolon in the path component, which is
irrelevant to the URL above, because in that URL the semicolon appears
in the query-string, not the path.
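Python's `urllib.parse.urlparse` shows where the semicolon falls. Note that `urlparse` splits ';' parameters off only the last path segment (the older RFC 1808 layout `path;params?query`). RFC 2396 allows parameters on every segment. Either way, `urlparse` never looks for parameters in the query. `path_params_query` is a hypothetical wrapper, and the second URL is made up for contrast.

```python
from urllib.parse import urlparse


def path_params_query(url):
    """Return (path, params, query) as split by urlparse."""
    parts = urlparse(url)
    return parts.path, parts.params, parts.query


# The URL from the thread: the ';' lands in the query, not the path.
print(path_params_query(
    "http://www.kuro5hin.org/?op=displaystory;sid=2000/10/20/24336/134"))
# ('/', '', 'op=displaystory;sid=2000/10/20/24336/134')

# A made-up URL whose ';' is in the path, for contrast.
print(path_params_query("http://example.com/docs/file;type=a?x=1"))
# ('/docs/file', 'type=a', 'x=1')
```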
Furthermore, the whole issue of which characters are reserved and what
they mean is irrelevant to this discussion. If a character is reserved,
it serves as a delimiter; if it's not reserved, it can appear as a
regular character; either way, it can appear in a URI.
RFC 2396 defines the set of characters allowed in URIs, but even that is
not terribly useful for finding URIs in plain text, because invalid URIs
containing disallowed ASCII characters are often used and they usually
work.
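As an illustration of the RFC 2396 character set, here is a rough checker. It assumes the section 2 classes: reserved, unreserved (alphanumerics plus the "mark" characters), and "%" escapes. It also accepts "#" because "#" separates the fragment in a URI reference. `disallowed_chars` is a hypothetical helper. As noted above, a non-empty result does not mean the URI will fail to work in practice.

```python
import re
import string

# Character classes from RFC 2396, section 2.
RESERVED = ";/?:@&=+$,"
MARK = "-_.!~*'()"
# '#' is not itself a uric, but it delimits the fragment in a URI reference.
ALLOWED = set(string.ascii_letters + string.digits + RESERVED + MARK + "#")

ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")  # a well-formed "%XX" escape


def disallowed_chars(uri):
    """Return the characters in uri that RFC 2396 does not allow unescaped."""
    bare = ESCAPE.sub("", uri)  # valid escapes are fine; a stray '%' is not
    return [c for c in bare if c not in ALLOWED]


print(disallowed_chars(
    "http://www.kuro5hin.org/?op=displaystory;sid=2000/10/20/24336/134"))
# []
print(disallowed_chars("http://example.com/a b|c%7E%zz"))
# [' ', '|', '%']
```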
There is no perfectly correct way to find URIs in plain text; you can
only use heuristics. Here are some empirical observations I've found
helpful:
URIs are often enclosed in single quotes, double quotes, angle brackets,
or parentheses.
URIs never contain single quotes, double quotes, or angle brackets (but
they do sometimes contain parentheses).
URIs never contain whitespace except when that whitespace includes a
newline (that is, when a long URI has been wrapped across lines).
Even when the beginning of the URI is left off, the URI almost always
begins with a letter.
URIs usually end in a letter, digit, slash, or underscore. When one ends
in a question mark (an empty query-string) or an ampersand (an empty
argument), losing that character will probably have no effect.
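The observations above can be combined into a rough extractor. This is only a sketch, and it makes several assumptions beyond the post:

- A candidate starts with a scheme followed by "://", or with "www." when the scheme was left off. Either way, it begins with a letter.
- Wrapped URIs are rejoined only inside angle brackets, where the closing ">" shows where the URI ends.
- A trailing ")" is kept only when it balances a "(" inside the URI.

All names here are hypothetical.

```python
import re

# Start of a candidate: "scheme://" or, with the scheme left off, "www.".
START = r"(?:[A-Za-z][A-Za-z0-9+.-]*://|www\.)"

# URIs never contain whitespace, quotes, or angle brackets.
CANDIDATE = re.compile(START + r"""[^\s"'<>]*""")

# Whitespace that includes a newline: a line break inside a wrapped URI.
WRAP_PAT = r"[^\S\n]*\n\s*"
WRAP = re.compile(WRAP_PAT)

# Inside <...>, non-space runs may be separated only by such line breaks.
BRACKETED = re.compile(
    r"<(" + START + r"[^\s<>]*(?:" + WRAP_PAT + r"[^\s<>]+)*)>")

# Usual final characters: letter, digit, slash, or underscore.
GOOD_END = re.compile(r"[A-Za-z0-9/_]")


def _unwrap(match):
    """Rejoin a bracketed URI that was broken across lines."""
    return "<" + WRAP.sub("", match.group(1)) + ">"


def _trim(uri):
    """Drop trailing characters that are probably surrounding punctuation."""
    while uri and not GOOD_END.fullmatch(uri[-1]):
        # Keep a ')' that closes a '(' inside the URI itself.
        if uri[-1] == ")" and uri.count("(") >= uri.count(")"):
            break
        uri = uri[:-1]  # e.g. '.', ',', an unbalanced ')', '?' or '&'
    return uri


def find_uris(text):
    """Return URI-like strings found in plain text, in order."""
    text = BRACKETED.sub(_unwrap, text)
    found = []
    for m in CANDIDATE.finditer(text):
        uri = _trim(m.group(0))
        if uri:
            found.append(uri)
    return found


text = ('See "http://www.kuro5hin.org/?op=displaystory;sid=2000/10/20/24336/134", or\n'
        '<http://example.com/a/very/long/\n'
        '   path/index.html>, or (www.example.org/wiki/Foo_(bar)).')
for uri in find_uris(text):
    print(uri)
```

The sketch trims the end of each match rather than restricting what the match may contain. This lets a URI keep internal parentheses or a query-string while dropping punctuation that usually belongs to the surrounding sentence.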
AMC
This archive was generated by hypermail 2b29 : Tue Oct 24 2000 - 19:09:30 CDT