HTTP Parser
===========

This is a parser for HTTP messages written in C. It parses both requests and
responses. The parser is designed to be used in high-performance HTTP
applications. It makes no syscalls or allocations, it does not buffer data,
and it can be interrupted at any time. Depending on your architecture, it
requires only about 40 bytes of data per message stream (in a web server,
that is per connection).

Features:

  * No dependencies
  * Handles persistent streams (keep-alive)
  * Decodes chunked encoding
  * Upgrade support
  * Defends against buffer overflow attacks

The parser extracts the following information from HTTP messages:

  * Header fields and values
  * Content-Length
  * Request method
  * Response status code
  * Transfer-Encoding
  * HTTP version
  * Request path, query string, fragment
  * Message body


Usage
-----

One `http_parser` object is used per TCP connection. Initialize the struct
using `http_parser_init()` and set the callbacks. That might look something
like this for a request parser:

    http_parser_settings settings;
    memset(&settings, 0, sizeof(settings)); /* callbacks left unset must be NULL */
    settings.on_path = my_path_callback;
    settings.on_header_field = my_header_field_callback;
    /* ... */

    http_parser *parser = malloc(sizeof(http_parser));
    http_parser_init(parser, HTTP_REQUEST);
    parser->data = my_socket;

When data is received on the socket, execute the parser and check for errors.

    size_t len = 80*1024, nparsed;
    char buf[len];
    ssize_t recved;

    recved = recv(fd, buf, len, 0);

    if (recved < 0) {
      /* Handle error. */
    }

    /* Start up / continue the parser.
     * Note we pass recved==0 to signal that EOF has been received.
     */
    nparsed = http_parser_execute(parser, &settings, buf, recved);

    if (parser->upgrade) {
      /* handle new protocol */
    } else if (nparsed != (size_t)recved) {
      /* Handle error. Usually just close the connection. */
    }

The parser needs to know where the end of the stream is. For example, servers
sometimes send responses without Content-Length and expect the client to
consume input (for the body) until EOF. To tell `http_parser` about EOF, pass
`0` as the fourth parameter to `http_parser_execute()`. Callbacks and errors
can still be triggered during an EOF, so be prepared to handle them.

Scalar-valued message information such as `status_code`, `method`, and the
HTTP version is stored in the parser structure. This data is only
temporarily stored in `http_parser` and is reset on each new message. If
this information is needed later, copy it out of the structure during the
`headers_complete` callback.

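As a rough illustration of copying the scalar fields out, here is a minimal
sketch of a `headers_complete` callback. It is not taken from the library: the
`struct http_parser` below is a stand-in so the snippet compiles on its own
(real code would include `http_parser.h`), the `http_major`/`http_minor` field
names are assumed, and `struct message_info` and `my_headers_complete` are
hypothetical names. It also assumes `parser->data` points at the
application's record rather than at a socket.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the library's struct so this sketch compiles on its own.
 * Real code would #include "http_parser.h" instead. status_code, method and
 * data follow the README text; http_major/http_minor are assumed names. */
typedef struct http_parser {
  unsigned short http_major;
  unsigned short http_minor;
  unsigned short status_code;
  unsigned char method;
  void *data;
} http_parser;

/* Hypothetical per-connection record owned by the application. */
struct message_info {
  unsigned short http_major;
  unsigned short http_minor;
  unsigned short status_code;
  unsigned char method;
  int headers_done;
};

/* Notification callback, matching int (*http_cb)(http_parser *). */
int my_headers_complete(http_parser *parser) {
  struct message_info *info;
  assert(parser != NULL);
  info = parser->data;
  if (info == NULL) {
    return 1; /* non-zero reports an error to the parser */
  }
  /* Copy the values now; the parser resets them for the next message. */
  info->http_major = parser->http_major;
  info->http_minor = parser->http_minor;
  info->status_code = parser->status_code;
  info->method = parser->method;
  info->headers_done = 1;
  return 0;
}
```

With the real library, such a function would be assigned to
`settings.on_headers_complete` before calling `http_parser_execute()`.
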
The parser decodes the transfer-encoding for both requests and responses
transparently. That is, a chunked encoding is decoded before being sent to
the `on_body` callback.


The Special Problem of Upgrade
------------------------------

HTTP supports upgrading the connection to a different protocol. An
increasingly common example of this is the Web Socket protocol, which sends
a request like

        GET /demo HTTP/1.1
        Upgrade: WebSocket
        Connection: Upgrade
        Host: example.com
        Origin: http://example.com
        WebSocket-Protocol: sample

followed by non-HTTP data.

(See http://tools.ietf.org/html/draft-hixie-thewebsocketprotocol-75 for more
information about the Web Socket protocol.)

To support this, the parser treats the request as a normal HTTP message
without a body, issuing both the `on_headers_complete` and
`on_message_complete` callbacks. However, `http_parser_execute()` stops
parsing at the end of the headers and returns.

The user is expected to check whether `parser->upgrade` has been set to 1
after `http_parser_execute()` returns. The non-HTTP data begins in the
supplied buffer at the offset given by the return value of
`http_parser_execute()`.

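The offset arithmetic described above can be sketched as a small helper. This
is only an illustration: `struct leftover` and `upgrade_leftover` are
hypothetical names, not part of the library, and the helper assumes it is
called only after `parser->upgrade` was found to be set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of the bytes that follow the HTTP headers. */
struct leftover {
  const char *data; /* first non-HTTP byte, or NULL if there is none */
  size_t len;       /* number of non-HTTP bytes in this read */
};

/* buf and recved are what was passed to http_parser_execute();
 * nparsed is the value it returned. */
struct leftover upgrade_leftover(const char *buf, size_t recved, size_t nparsed) {
  struct leftover rest = { NULL, 0 };
  assert(buf != NULL);
  if (nparsed < recved) {
    rest.data = buf + nparsed;   /* non-HTTP data starts at this offset */
    rest.len = recved - nparsed; /* everything after the headers */
  }
  return rest;
}
```

The returned span would typically be handed to whatever code implements the
upgraded protocol, since the HTTP parser will not consume it.
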

Callbacks
---------

During the `http_parser_execute()` call, the callbacks set in
`http_parser_settings` will be executed. The parser maintains state and
never looks behind, so buffering the data is not necessary. If you need to
save certain data for later use, you can do that from the callbacks.

There are two types of callbacks:

* notification `typedef int (*http_cb) (http_parser*);`
    Callbacks: `on_message_begin`, `on_headers_complete`, `on_message_complete`.
* data `typedef int (*http_data_cb) (http_parser*, const char *at, size_t length);`
    Callbacks: (requests only) `on_path`, `on_query_string`, `on_uri`, `on_fragment`,
               (common) `on_header_field`, `on_header_value`, `on_body`.

Callbacks must return 0 on success. Returning a non-zero value indicates an
error to the parser, making it exit immediately.

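To make the return-value convention concrete, here is a minimal sketch of a
data callback that saves body bytes and returns non-zero once a size limit
would be exceeded. The `struct http_parser` below is a stand-in so the snippet
compiles on its own (real code would include `http_parser.h`); `MAX_BODY`,
`struct body_buf`, and `my_body_callback` are hypothetical names, and the
sketch assumes `parser->data` points at the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in so the sketch compiles on its own; real code would
 * #include "http_parser.h". Only the user-data pointer is modeled. */
typedef struct http_parser { void *data; } http_parser;

#define MAX_BODY 64 /* hypothetical limit chosen for the example */

/* Hypothetical per-connection body buffer. */
struct body_buf {
  char bytes[MAX_BODY];
  size_t used;
};

/* Data callback, matching
 * int (*http_data_cb)(http_parser *, const char *at, size_t length). */
int my_body_callback(http_parser *parser, const char *at, size_t length) {
  struct body_buf *body;
  assert(parser != NULL);
  body = parser->data;
  if (length > MAX_BODY - body->used) {
    return 1; /* non-zero: report an error so the parser stops */
  }
  /* at is only valid during this call, so copy the bytes out. */
  memcpy(body->bytes + body->used, at, length);
  body->used += length;
  return 0; /* success: parsing continues */
}
```

Because the callback may run several times for one body, it appends to the
buffer instead of overwriting it.
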
If you parse an HTTP message in chunks (e.g. `read()` the request line
from the socket, parse, read half of the headers, parse, etc.), your data
callbacks may be called more than once. Http-parser guarantees only that the
data pointer is valid for the lifetime of the callback. You can also `read()`
into a heap-allocated buffer to avoid copying memory around, if this fits
your application.

Reading headers can be tricky if you read and parse them partially.
Basically, you need to remember whether the last header callback was a field
or a value, and apply the following logic:

    (on_header_field and on_header_value shortened to on_h_*)
     ------------------------ ------------ --------------------------------------------
    | State (prev. callback) | Callback   | Description/action                         |
     ------------------------ ------------ --------------------------------------------
    | nothing (first call)   | on_h_field | Allocate new buffer and copy callback data |
    |                        |            | into it                                    |
     ------------------------ ------------ --------------------------------------------
    | value                  | on_h_field | New header started.                        |
    |                        |            | Copy current name,value buffers to headers |
    |                        |            | list and allocate new buffer for new name  |
     ------------------------ ------------ --------------------------------------------
    | field                  | on_h_field | Previous name continues. Reallocate name   |
    |                        |            | buffer and append callback data to it      |
     ------------------------ ------------ --------------------------------------------
    | field                  | on_h_value | Value for current header started. Allocate |
    |                        |            | new buffer and copy callback data to it    |
     ------------------------ ------------ --------------------------------------------
    | value                  | on_h_value | Value continues. Reallocate value buffer   |
    |                        |            | and append callback data to it             |
     ------------------------ ------------ --------------------------------------------

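The table above can be sketched as a pair of data callbacks. This is one
possible reading of it, not the library's own code: the `struct http_parser`
is a stand-in so the snippet compiles on its own (real code would include
`http_parser.h`), and `struct header_list`, `MAX_HEADERS`, and `append` are
hypothetical. As a simplification, names and values are grown directly inside
the list instead of being copied into it when the next header starts.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in so the sketch compiles on its own; real code would
 * #include "http_parser.h". Only the user-data pointer is modeled. */
typedef struct http_parser { void *data; } http_parser;

#define MAX_HEADERS 32 /* hypothetical cap for the example */

enum last_cb { NONE, FIELD, VALUE };

/* Hypothetical per-message header store, reached through parser->data. */
struct header_list {
  char *names[MAX_HEADERS];
  char *values[MAX_HEADERS];
  size_t count;      /* headers started so far */
  enum last_cb last; /* which header callback ran most recently */
};

/* Append len bytes to the NUL-terminated string *dst (which may be NULL).
 * Returns 0 on success, 1 if memory could not be allocated. */
static int append(char **dst, const char *at, size_t len) {
  size_t old = *dst ? strlen(*dst) : 0;
  char *p = realloc(*dst, old + len + 1);
  if (p == NULL) return 1;
  memcpy(p + old, at, len);
  p[old + len] = '\0';
  *dst = p;
  return 0;
}

int my_header_field_callback(http_parser *parser, const char *at, size_t length) {
  struct header_list *h;
  assert(parser != NULL);
  h = parser->data;
  if (h->last != FIELD) { /* first call, or the previous callback was a value */
    if (h->count == MAX_HEADERS) return 1;
    h->names[h->count] = NULL;
    h->values[h->count] = NULL;
    h->count++;           /* a new header starts here */
  }
  h->last = FIELD;
  /* Starts a new name, or continues the previous one. */
  return append(&h->names[h->count - 1], at, length);
}

int my_header_value_callback(http_parser *parser, const char *at, size_t length) {
  struct header_list *h;
  assert(parser != NULL);
  h = parser->data;
  if (h->count == 0) return 1; /* value before any field: treat as an error */
  h->last = VALUE;
  /* Starts the value for the current header, or continues it. */
  return append(&h->values[h->count - 1], at, length);
}
```

The caller is expected to zero the `header_list` before each message and to
`free()` the stored strings afterwards. A header whose value callback never
fires keeps a `NULL` value in this sketch.
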

See examples of reading in headers:

* [partial example](http://gist.github.com/155877) in C
* [from http-parser tests](http://github.com/ry/http-parser/blob/37a0ff8928fb0d83cec0d0d8909c5a4abcd221af/test.c#L403) in C
* [from Node library](http://github.com/ry/node/blob/842eaf446d2fdcb33b296c67c911c32a0dabc747/src/http.js#L284) in JavaScript