#lang scribble/doc
@(require "common.ss"
          scribble/bnf
          scribble/eval
          (for-label net/url
                     net/uri-codec
                     net/uri-codec-unit
                     net/uri-codec-sig))

@(define uri-codec-eval (make-base-eval))
@interaction-eval[#:eval uri-codec-eval (require net/uri-codec)]

@title[#:tag "uri-codec"]{URI Codec: Encoding and Decoding URIs}

@defmodule[net/uri-codec]{The @schememodname[net/uri-codec] module
provides utilities for encoding and decoding strings using the URI
encoding rules given in RFC 2396 @cite["RFC2396"], and for encoding and
decoding name/value pairs using the
@tt{application/x-www-form-urlencoded} mimetype given in the HTML 4.0
specification. There are minor differences between the two encodings.}

The URI encoding allows a few characters to be represented as-is:
@litchar{a} through @litchar{z}, @litchar{A} through @litchar{Z},
@litchar{0} through @litchar{9}, @litchar{-}, @litchar{_}, @litchar{.},
@litchar{!}, @litchar{~}, @litchar{*}, @litchar{'}, @litchar{(} and
@litchar{)}. The remaining characters are encoded as
@litchar{%}@nonterm{xx}, where @nonterm{xx} is the two-character hex
representation of the integer value of the character (where the
mapping between characters and integers is determined by US-ASCII if
the integer is less than 128).

The encoding, in line with RFC 2396's recommendation, represents a
character as-is whenever possible. The decoding accepts any character
written as its hex value, and also tolerates characters that should
have been escaped but appear as-is.
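
As a rough illustration of the encoding and decoding behavior just
described, the following interaction applies @scheme[uri-encode] and
@scheme[uri-decode] to a few sample strings. The strings are
illustrative assumptions chosen for this sketch, and the results shown
are whatever the library produces when this document is built.

@examples[
#:eval uri-codec-eval
(code:line (uri-encode "abc-123_ok.txt") (code:comment @#,t{only characters from the as-is set}))
(code:line (uri-encode "a b/c?d") (code:comment @#,t{space, slash, and question mark are outside that set}))
(code:line (uri-decode "a%20b") (code:comment @#,t{a hex escape for a space}))
(code:line (uri-decode "a%62c") (code:comment @#,t{a hex escape for a letter that could have appeared as-is}))
]
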

The rules for the @tt{application/x-www-form-urlencoded} mimetype
given in the HTML 4.0 spec are:

@itemize[

  @item{Control names and values are escaped. Space characters are
  replaced by @litchar{+}, and then reserved characters are escaped as
  described in RFC 1738, section 2.2: Non-alphanumeric characters are
  replaced by @litchar{%}@nonterm{xx} representing the ASCII code of
  the character. Line breaks are represented as CRLF pairs:
  @litchar{%0D%0A}. Note that RFC 2396 supersedes RFC 1738
  @cite["RFC1738"].}

  @item{The control names/values are listed in the order they appear
  in the document. The name is separated from the value by @litchar{=}
  and name/value pairs are separated from each other by either
  @litchar{;} or @litchar{&}. When encoding, @litchar{&} is used as
  the separator by default. When decoding, both @litchar{;} and
  @litchar{&} are parsed as separators by default.}

]

These rules differ slightly from the straight encoding in RFC 2396 in
that @litchar{+} is allowed, and it represents a space. The
@schememodname[net/uri-codec] library follows this convention,
encoding a space as @litchar{+} and decoding @litchar{+} as a space.
In addition, since there appear to be some brain-dead decoders on the
web, the library also encodes @litchar{!}, @litchar{~}, @litchar{'},
@litchar{(}, and @litchar{)} using their hex representation, which is
the same choice as made by Java's @tt{URLEncoder}.
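
The difference between the two encodings can be seen by running the
same input through both encoders. This is only a sketch: the sample
strings are assumptions made for illustration, and the results shown
are whatever the library produces when this document is built.

@examples[
#:eval uri-codec-eval
(code:line (uri-encode "a b") (code:comment @#,t{plain URI encoding of a space}))
(code:line (form-urlencoded-encode "a b") (code:comment @#,t{form encoding of the same space}))
(code:line (form-urlencoded-decode "a+b") (code:comment @#,t{form decoding of a plus sign}))
(code:line (uri-encode "(hi!)") (code:comment @#,t{punctuation from the as-is set}))
(code:line (form-urlencoded-encode "(hi!)") (code:comment @#,t{the same punctuation under the form rules}))
]
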

@; ----------------------------------------

@section[#:tag "uri-codec-proc"]{Functions}

@defproc[(uri-encode [str string?]) string?]{

Encode a string using the URI encoding rules.}

@defproc[(uri-decode [str string?]) string?]{

Decode a string using the URI decoding rules.}

@defproc[(uri-path-segment-encode [str string?]) string?]{
Encodes a string according to the rules in @cite["RFC3986"] for path segments.
}

@defproc[(uri-path-segment-decode [str string?]) string?]{
Decodes a string according to the rules in @cite["RFC3986"] for path segments.
}

@defproc[(uri-userinfo-encode [str string?]) string?]{
Encodes a string according to the rules in @cite["RFC3986"] for the userinfo field.
}

@defproc[(uri-userinfo-decode [str string?]) string?]{
Decodes a string according to the rules in @cite["RFC3986"] for the userinfo field.
}
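
A minimal sketch of the path-segment and userinfo functions follows.
The input strings are hypothetical and chosen only for illustration;
which characters end up escaped depends on the @cite["RFC3986"] rules
for each component, so the results shown are whatever the library
produces when this document is built.

@examples[
#:eval uri-codec-eval
(code:line (uri-path-segment-encode "a b;c") (code:comment @#,t{encode one path segment}))
(code:line (uri-path-segment-decode (uri-path-segment-encode "a b;c")) (code:comment @#,t{round trip through the path-segment codec}))
(code:line (uri-userinfo-encode "user name:pw") (code:comment @#,t{encode a userinfo field}))
(code:line (uri-userinfo-decode (uri-userinfo-encode "user name:pw")) (code:comment @#,t{round trip through the userinfo codec}))
]
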

@defproc[(form-urlencoded-encode [str string?]) string?]{

Encode a string using the @tt{application/x-www-form-urlencoded}
encoding rules. The result string contains no non-ASCII characters.}

@defproc[(form-urlencoded-decode [str string?]) string?]{

Decode a string encoded using the
@tt{application/x-www-form-urlencoded} encoding rules.}

@defproc[(alist->form-urlencoded [alist (listof (cons/c symbol? string?))])
         string?]{

Encode an association list using the
@tt{application/x-www-form-urlencoded} encoding rules.

The @scheme[current-alist-separator-mode] parameter determines the
separator used in the result.}

@defproc[(form-urlencoded->alist [str string?])
         (listof (cons/c symbol? string?))]{

Decode a string encoded using the
@tt{application/x-www-form-urlencoded} encoding rules into an
association list. All keys are case-folded for conversion to symbols.

The @scheme[current-alist-separator-mode] parameter determines the way
that separators are parsed in the input.}

@defparam[current-alist-separator-mode mode
          (one-of/c 'amp 'semi 'amp-or-semi 'semi-or-amp)]{

A parameter that determines the separator used/recognized between
associations in @scheme[form-urlencoded->alist],
@scheme[alist->form-urlencoded], @scheme[url->string], and
@scheme[string->url].

The default value is @scheme['amp-or-semi], which means that both
@litchar{&} and @litchar{;} are treated as separators when parsing,
and @litchar{&} is used as a separator when encoding. The
@scheme['semi-or-amp] mode is similar, but @litchar{;} is used when
encoding. The other modes use/recognize only one of the separators.

@examples[
#:eval uri-codec-eval
(define ex '((x . "foo") (y . "bar") (z . "baz")))
(code:line (current-alist-separator-mode 'amp) (code:comment @#,t{try @scheme['amp]...}))
(form-urlencoded->alist "x=foo&y=bar&z=baz")
(form-urlencoded->alist "x=foo;y=bar;z=baz")
(alist->form-urlencoded ex)
(code:line (current-alist-separator-mode 'semi) (code:comment @#,t{try @scheme['semi]...}))
(form-urlencoded->alist "x=foo;y=bar;z=baz")
(form-urlencoded->alist "x=foo&y=bar&z=baz")
(alist->form-urlencoded ex)
(code:line (current-alist-separator-mode 'amp-or-semi) (code:comment @#,t{try @scheme['amp-or-semi]...}))
(form-urlencoded->alist "x=foo&y=bar&z=baz")
(form-urlencoded->alist "x=foo;y=bar;z=baz")
(alist->form-urlencoded ex)
(code:line (current-alist-separator-mode 'semi-or-amp) (code:comment @#,t{try @scheme['semi-or-amp]...}))
(form-urlencoded->alist "x=foo&y=bar&z=baz")
(form-urlencoded->alist "x=foo;y=bar;z=baz")
(alist->form-urlencoded ex)
]}