- HTTP Parser
- ===========
- This is a parser for HTTP messages written in C. It parses both requests and
- responses. The parser is designed to be used in high-performance HTTP
- applications. It does not make any syscalls or allocations, it does not
- buffer data, and it can be interrupted at any time. Depending on your
- architecture, it only requires about 40 bytes of data per message
- stream (in a web server, that is per connection).
- Features:
- * No dependencies
- * Handles persistent streams (keep-alive).
- * Decodes chunked encoding.
- * Upgrade support
- * Defends against buffer overflow attacks.
- The parser extracts the following information from HTTP messages:
- * Header fields and values
- * Content-Length
- * Request method
- * Response status code
- * Transfer-Encoding
- * HTTP version
- * Request URL
- * Message body
- Usage
- -----
- One `http_parser` object is used per TCP connection. Initialize the struct
- using `http_parser_init()` and set the callbacks. That might look something
- like this for a request parser:
- http_parser_settings settings;
- memset(&settings, 0, sizeof(settings)); /* unset callbacks must be NULL */
- settings.on_url = my_url_callback;
- settings.on_header_field = my_header_field_callback;
- /* ... */
- http_parser *parser = malloc(sizeof(http_parser));
- http_parser_init(parser, HTTP_REQUEST);
- parser->data = my_socket;
- When data is received on the socket, execute the parser and check for errors.
- size_t len = 80*1024, nparsed;
- char buf[len];
- ssize_t recved;
- recved = recv(fd, buf, len, 0);
- if (recved < 0) {
- /* Handle error. */
- }
- /* Start up / continue the parser.
- * Note we pass recved==0 to signal that EOF has been received.
- */
- nparsed = http_parser_execute(parser, &settings, buf, recved);
- if (parser->upgrade) {
- /* handle new protocol */
- } else if (nparsed != recved) {
- /* Handle error. Usually just close the connection. */
- }
- The parser needs to know where the end of the stream is. For example, servers
- sometimes send responses without Content-Length and expect the client to
- consume input (for the body) until EOF. To tell http_parser about EOF, pass
- `0` as the fourth parameter to `http_parser_execute()`. Callbacks and errors
- can still occur while handling EOF, so one must still be prepared
- to receive them.
- Scalar-valued message information such as `status_code`, `method`, and the
- HTTP version is stored in the parser structure. This data is only
- temporarily stored in `http_parser` and gets reset on each new message. If
- this information is needed later, copy it out of the structure during the
- `headers_complete` callback.
- The parser decodes the transfer-encoding for both requests and responses
- transparently. That is, a chunked encoding is decoded before being sent to
- the on_body callback.
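To illustrate what "decoded" means here, the following self-contained sketch decodes a complete chunked body held in a single buffer. It is not the library's implementation (the parser works incrementally and delivers the decoded pieces through `on_body`). `decode_chunked` is a hypothetical helper, and it assumes no chunk extensions and no trailers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of http_parser.
 * Decodes a complete chunked body in `in` into `out`.
 * Returns the decoded length, or -1 on malformed input or if `out` is too small. */
static long decode_chunked(const char *in, size_t in_len, char *out, size_t out_cap) {
  size_t i = 0, o = 0;
  assert(in != NULL && out != NULL);
  for (;;) {
    size_t size = 0;
    int digits = 0;
    /* chunk-size: one or more hex digits */
    while (i < in_len) {
      int c = (unsigned char)in[i], v;
      if (c >= '0' && c <= '9') v = c - '0';
      else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
      else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
      else break;
      size = size * 16 + (size_t)v;
      digits++;
      i++;
    }
    if (digits == 0 || digits > 8) return -1;
    /* CRLF after the chunk size */
    if (in_len - i < 2 || in[i] != '\r' || in[i + 1] != '\n') return -1;
    i += 2;
    if (size == 0) {
      /* last chunk: expect the terminating CRLF (trailers unsupported here) */
      if (in_len - i < 2 || in[i] != '\r' || in[i + 1] != '\n') return -1;
      return (long)o;
    }
    if (size > in_len - i || size > out_cap - o) return -1;
    memcpy(out + o, in + i, size);  /* chunk data */
    o += size;
    i += size;
    /* CRLF after the chunk data */
    if (in_len - i < 2 || in[i] != '\r' || in[i + 1] != '\n') return -1;
    i += 2;
  }
}
```

With the parser, none of this framing is visible to the application: `on_body` only ever sees the chunk data.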
- The Special Problem of Upgrade
- ------------------------------
- HTTP supports upgrading the connection to a different protocol. An
- increasingly common example of this is the Web Socket protocol, which sends
- a request like
- GET /demo HTTP/1.1
- Upgrade: WebSocket
- Connection: Upgrade
- Host: example.com
- Origin: http://example.com
- WebSocket-Protocol: sample
- followed by non-HTTP data.
- (See http://tools.ietf.org/html/draft-hixie-thewebsocketprotocol-75 for more
- information about the Web Socket protocol.)
- To support this, the parser treats the request as a normal HTTP message without a
- body, issuing both the `on_headers_complete` and `on_message_complete` callbacks. However,
- `http_parser_execute()` stops parsing at the end of the headers and returns.
- The user is expected to check whether `parser->upgrade` has been set to 1 after
- `http_parser_execute()` returns. The non-HTTP data begins in the supplied buffer
- at the offset given by the return value of `http_parser_execute()`.
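As a minimal sketch of that last step, the helper below computes where the non-HTTP data starts and how much of it there is. `struct leftover` and `upgrade_leftover` are hypothetical names, not part of http_parser. `buf` and `recved` stand for the buffer and length passed to `http_parser_execute()`, and `nparsed` for its return value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: a view of the bytes the HTTP parser did not consume. */
struct leftover {
  const char *data;
  size_t len;
};

/* Returns the part of buf that follows the nparsed bytes consumed as HTTP. */
static struct leftover upgrade_leftover(const char *buf, size_t recved, size_t nparsed) {
  struct leftover rest;
  assert(nparsed <= recved);  /* the parser never consumes more than it was given */
  rest.data = buf + nparsed;
  rest.len = recved - nparsed;
  return rest;
}
```

This only makes sense when `parser->upgrade` is set; the returned span would then be handed to whatever code implements the new protocol.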
- Callbacks
- ---------
- During the `http_parser_execute()` call, the callbacks set in
- `http_parser_settings` will be executed. The parser maintains state and
- never looks behind, so buffering the data is not necessary. If you need to
- save certain data for later usage, you can do that from the callbacks.
- There are two types of callbacks:
- * notification `typedef int (*http_cb) (http_parser*);`
- Callbacks: on_message_begin, on_headers_complete, on_message_complete.
- * data `typedef int (*http_data_cb) (http_parser*, const char *at, size_t length);`
- Callbacks: (requests only) on_url,
- (common) on_header_field, on_header_value, on_body.
- Callbacks must return 0 on success. Returning a non-zero value signals an
- error to the parser, making it exit immediately.
- If you parse an HTTP message in chunks (e.g. `read()` the request line
- from the socket, parse, read half of the headers, parse, etc.), your data callbacks
- may be called more than once for the same element. http-parser guarantees that the data pointer is only
- valid for the lifetime of the callback. You can also `read()` into a heap-allocated
- buffer to avoid copying memory around if this fits your application.
- Reading headers can be tricky if you read and parse them partially.
- Basically, you need to remember whether the last header callback was for a field or a value
- and apply the following logic:
- (on_header_field and on_header_value shortened to on_h_*)
- | State (prev. callback) | Callback   | Description/action |
- |------------------------|------------|--------------------|
- | nothing (first call)   | on_h_field | Allocate new buffer and copy callback data into it |
- | value                  | on_h_field | New header started. Copy current name,value buffers to headers list and allocate new buffer for new name |
- | field                  | on_h_field | Previous name continues. Reallocate name buffer and append callback data to it |
- | field                  | on_h_value | Value for current header started. Allocate new buffer and copy callback data to it |
- | value                  | on_h_value | Value continues. Reallocate value buffer and append callback data to it |
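The logic in the table can be sketched as follows. This is one possible arrangement under stated assumptions, not code from the library: `struct header_list`, `append`, `on_h_field`, `on_h_value`, `header_list_len`, and the `MAX_HEADERS` limit are all hypothetical. To stay self-contained, the callbacks take the state directly instead of an `http_parser*`; in a real program they would use the `http_data_cb` signature and reach the state through `parser->data`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_HEADERS 16  /* arbitrary limit for this sketch */

enum last_cb { LAST_NONE, LAST_FIELD, LAST_VALUE };

struct header { char *name; char *value; };

struct header_list {
  enum last_cb last;                  /* which callback ran previously */
  struct header items[MAX_HEADERS];   /* items[count] is the header in progress */
  size_t count;                       /* number of completed headers */
};

/* Append len bytes to the NUL-terminated heap string *dst (NULL means empty). */
static int append(char **dst, const char *src, size_t len) {
  size_t old = *dst ? strlen(*dst) : 0;
  char *p = realloc(*dst, old + len + 1);
  if (p == NULL) return -1;
  memcpy(p + old, src, len);
  p[old + len] = '\0';
  *dst = p;
  return 0;
}

static int on_h_field(struct header_list *h, const char *at, size_t length) {
  if (h->last == LAST_VALUE) {
    h->count++;  /* new header started: the previous name/value pair is complete */
  }
  if (h->count >= MAX_HEADERS) return -1;
  h->last = LAST_FIELD;
  /* first call or new header: allocates; previous name continues: reallocates */
  return append(&h->items[h->count].name, at, length);
}

static int on_h_value(struct header_list *h, const char *at, size_t length) {
  assert(h->last != LAST_NONE);  /* a value always follows a field */
  h->last = LAST_VALUE;
  /* value started: allocates; value continues: reallocates */
  return append(&h->items[h->count].value, at, length);
}

/* Number of headers seen so far, including the one in progress. */
static size_t header_list_len(const struct header_list *h) {
  return h->count + (h->last != LAST_NONE ? 1 : 0);
}
```

The state must be zero-initialized (for example with `memset`) before the first callback, and the accumulated strings are owned by the list and need to be freed by the caller. Returning -1 on allocation failure matches the rule that a non-zero return value stops the parser.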
- Parsing URLs
- ------------
- A simplistic zero-copy URL parser is provided as `http_parser_parse_url()`.
- Users of this library may wish to use it to parse URLs constructed from
- consecutive `on_url` callbacks.
- See examples of reading in headers:
- * [partial example](http://gist.github.com/155877) in C
- * [from http-parser tests](http://github.com/joyent/http-parser/blob/37a0ff8/test.c#L403) in C
- * [from Node library](http://github.com/joyent/node/blob/842eaf4/src/http.js#L284) in JavaScript