/tensorflow/tensorboard/http_api.md
Markdown | 344 lines | 281 code | 63 blank | 0 comment | 0 complexity | 8c26a3ae26e1806607721212bdb0f9d2 MD5 | raw file
- # Tensorboard client-server HTTP API
- ## Runs, Tags, and Tag Types
- TensorBoard data is organized around the concept of a `run`, which represents
- all the related data thrown off by a single execution of TensorFlow, a `tag`,
- which groups values of data that come from the same source within a TensorFlow
- run, and `tag types`, which are our way of distinguishing different types of
- data that have fundamentally different representations and should be processed
- on different code paths. For example, a "train" run may have a `scalars`
- tag that represents the learning rate, another `scalars` tag that
- represents the value of the objective function, a `histograms` tag that reveals
- information on weights in a particular layer over time, and an `images` tag that
- shows input images flowing into the system. The "eval" run might have an
- entirely different set of tag names, or some duplicated tag names.
- The currently supported tag types are `scalars`, `images`, `audio`,
- `histograms`, `graph` and `run_metadata`. Each tag type corresponds to a route
- (documented below) for retrieving tag data of that type.
- All of the data provided comes from TensorFlow events files ('\*.tfevents\*'),
- which are written using the SummaryWriter class
- (tensorflow/python/training/summary_writer.py), and the data is generated by
- summary ops (tensorflow/python/ops/summary_ops.py). The `scalars` come from the
- `ScalarSummary` op, the `histograms` from the `HistogramSummary`, the `audio`
- from the `AudioSummary`, and the `images` from `ImageSummary`. The tag type
- `graph` is special in that it is not a collection of tags of that type, but a
- boolean denoting if there is a graph definition associated with the run. The tag
- is provided to the summary op (usually as a constant).
- ## `data/runs`
- Returns a dictionary mapping from `run name` (quoted string) to dictionaries
- mapping from all available tagTypes to a list of tags of that type available for
- the run. Think of this as a comprehensive index of all of the data available
- from the TensorBoard server. Here is an example:
- {
- "train_run": {
- "histograms": ["foo_histogram", "bar_histogram"],
- "compressedHistograms": ["foo_histogram", "bar_histogram"],
- "scalars": ["xent", "loss", "learning_rate"],
- "images": ["input"],
- "audio": ["input_audio"],
- "graph": true,
- "run_metadata": ["forward prop", "inference"]
- },
- "eval": {
- "histograms": ["foo_histogram", "bar_histogram"],
- "compressedHistograms": ["foo_histogram", "bar_histogram"],
- "scalars": ["precision", "recall"],
- "images": ["input"],
- "audio": ["input_audio"],
- "graph": false,
- "run_metadata": []
- }
- }
- Note that the same tag may be present for many runs. It is not guaranteed that
- they will have the same meaning across runs. It is also not guaranteed that they
- will have the same tag type across different runs.
- ## '/data/scalars?run=foo&tag=bar'
- Returns an array of event_accumulator.SimpleValueEvents ([wall_time, step,
- value]) for the given run and tag. wall_time is seconds since epoch.
- Example:
- [
- [1443856985.705543, 1448, 0.7461960315704346], # wall_time, step, value
- [1443857105.704628, 3438, 0.5427092909812927],
- [1443857225.705133, 5417, 0.5457325577735901],
- ...
- ]
- If the format parameter is set to 'csv', the response will instead be in CSV
- format:
- Wall time,step,value
- 1443856985.705543,1448,0.7461960315704346
- 1443857105.704628,3438,0.5427092909812927
- 1443857225.705133,5417,0.5457325577735901
- ## '/data/scalars?[sample_count=10]'
- Without any parameters, returns a dictionary mapping from run name to a
- dictionary mapping from tag name to a sampled list of scalars from that run and
- tag. The values are given in the same format as when the run and tag are
- specified. For example:
- {
- "train_run": {
- "my_tag": [
- [1443856985.705543, 1448, 0.7461960315704346],
- [1443857105.704628, 3438, 0.5427092909812927],
- [1443857225.705133, 5417, 0.5457325577735901]
- ]
- }
- }
- The samples are distributed uniformly over the list of values. The sample_count
- parameter is optional and defaults to 10; it must be at least 2. The first and
- the last value will always be sampled.
- ## '/data/histograms?run=foo&tag=bar'
- Returns an array of event_accumulator.HistogramEvents ([wall_time, step,
- HistogramValue]) for the given run and tag. A HistogramValue is [min, max, num,
- sum, sum_squares, bucket_limit, bucket]. wall_time is seconds since epoch.
- Annotated Example: (note - real data is higher precision)
- [
- [
- 1443871386.185149, # wall_time
- 235166, # step
- [
- -0.66, # minimum value
- 0.44, # maximum value
- 8.0, # number of items in the histogram
- -0.80, # sum of items in the histogram
- 0.73, # sum of squares of items in the histogram
- [-0.68, -0.62, -0.292, -0.26, -0.11, -0.10, -0.08, -0.07, -0.05,
- -0.0525, -0.0434, -0.039, -0.029, -0.026, 0.42, 0.47, 1.8e+308],
- # the right edge of each bucket
- [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0,
- 1.0, 0.0] # the number of elements within each bucket
- ]
- ]
- ]
- ## '/data/compressedHistograms?run=foo&tag=bar'
- Returns an array of event_accumulator.CompressedHistogramEvents ([wall_time,
- step, CompressedHistogramValues]) for the given run and tag.
- CompressedHistogramValues is a list of namedtuples with each tuple specifying
- a basis point (bps) as well as an interpolated value of the histogram value
- at that basis point. A basis point is 1/100 of a percent.
- The current compression strategy is to choose basis points that correspond to
- the median and bands of 1SD, 2SD, and 3SDs around the median. Note that the
- current compression strategy does not work well for representing multimodal
- data -- this is something that will be improved in a later iteration.
- Annotated Example: (note - real data is higher precision)
- [
- [
- 1441154832.580509, # wall_time
- 5, # step
- [ [0, -3.67], # CompressedHistogramValue for 0th percentile
- [2500, -4.19], # CompressedHistogramValue for 25th percentile
- [5000, 6.29],
- [7500, 1.64],
- [10000, 3.67]
- ]
- ],
- ...
- ]
- ## `/data/images?run=foo&tag=bar`
- Gets a sample of ImageMetadatas for the given run and tag.
- Returns an array of objects containing information about available images,
- crucially including the query parameter that may be used to retrieve that image.
- (See /individualImage for details.)
- For example:
- {
- "width": 28, # width in pixels
- "height": 28, # height in pixels
- "wall_time": 1440210599.246, # time in seconds since epoch
- "step": 63702821, # number of steps that have passed
- "query": "index=0&tagname=input%2Fimage%2F2&run=train"
- # param for /individualImage
- }
- ## `/data/individualImage?{{query}}`
- Retrieves an individual image. The image query should not be generated by the
- frontend, but instead acquired from calling the /images route (the image
- metadata objects contain the query to use). The response is the image itself
- with mime-type 'image/png'.
- Note that the query is not guaranteed to always refer to the same image even
- within a single run, as images may be removed from the sampling reservoir and
- replaced with other images. (See Notes for details on the reservoir sampling.)
- An example call to this route would look like this:
- /individualImage?index=0&tagname=input%2Fimage%2F2&run=train
- ## `/audio?run=foo&tag=bar`
- Gets a sample of AudioMetadatas for the given run and tag.
- Returns an array of objects containing information about available audio,
- crucially including the query parameter that may be used to retrieve that audio.
- (See /individualAudio for details.)
- For example:
- {
- "wall_time": 1440210599.246, # time in seconds since epoch
- "step": 63702821, # number of steps that have passed
- "content_type": "audio/wav" # the MIME-type of the audio
- "query": "index=0&tagname=input%2Faudio%2F2&run=train"
- # param for /individualAudio
- }
- ## `/individualAudio?{{query}}`
- Retrieves an individual audio clip. The audio query should not be generated by
- the frontend, but instead acquired from calling the /audio route (the audio
- metadata objects contain the query to use). The response is the audio itself
- with an appropriate Content-Type header set.
- Note that the query is not guaranteed to always refer to the same clip even
- within a single run, as audio may be removed from the sampling reservoir and
- replaced with other clips. (See Notes for details on the reservoir sampling.)
- An example call to this route would look like this:
- /individualAudio?index=0&tagname=input%2Faudio%2F2&run=train
- ## `/data/graph?run=foo&limit_attr_size=1024&large_attrs_key=key`
- Returns the graph definition for the given run in gzipped pbtxt format. The
- graph is composed of a list of nodes, where each node is a specific TensorFlow
- operation which takes as inputs other nodes (operations).
- The query parameters `limit_attr_size` and `large_attrs_key` are optional.
- `limit_attr_size` specifies the maximum allowed size in bytes, before the
- attribute is considered large and filtered out of the graph. If specified,
- it must be an int and > 0. If not specified, no filtering is applied.
- `large_attrs_key` is the attribute key that will be used for storing
- attributes that are too large. The value of this key (list of strings)
- should be used by the client in order to determine which attributes
- have been filtered. Must be specified if `limit_attr_size` is specified.
- For the query `/graph?run=foo&limit_attr_size=1024&large_attrs_key=_too_large`,
- here is an example pbtxt response of a graph with 3 nodes, where the second
- node had two large attributes "a" and "b" that were filtered out (size > 1024):
- node {
- op: "Input"
- name: "A"
- }
- node {
- op: "Input"
- name: "B"
- attr {
- key: "small_attr"
- value: {
- s: "some string"
- }
- }
- attr {
- key: "_too_large"
- value {
- list {
- s: "a"
- s: "b"
- }
- }
- }
- }
- node {
- op: "MatMul"
- name: "C"
- input: "A"
- input: "B"
- }
- Prior to filtering, the original node "B" had the following content:
- node {
- op: "Input"
- name: "B"
- attr {
- key: "small_attr"
- value: {
- s: "some string"
- }
- }
- attr {
- key: "a"
- value { Very large object... }
- }
- attr {
- key: "b"
- value { Very large object... }
- }
- }
- ## `/data/run_metadata?run=foo&tag=bar`
- Given a run and tag, returns the metadata of a particular
- `session.run()` as a gzipped, pbtxt serialized [`RunMetadata`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/protobuf/config.proto)
- proto. For example:
- step_stats {
- dev_stats {
- device: "/job:localhost/replica:0/task:0/cpu:0"
- node_stats {
- node_name: "_SOURCE"
- all_start_micros: 1458337695775395
- op_start_rel_micros: 11
- op_end_rel_micros: 12
- all_end_rel_micros: 38
- memory {
- allocator_name: "cpu"
- }
- timeline_label: "_SOURCE = NoOp()"
- scheduled_micros: 1458337695775363
- }
- }
- }
- ## Notes
- All returned values, histograms, audio, and images are returned in the order
- they were written by Tensorflow (which should correspond to increasing
- `wall_time` order, but may not necessarily correspond to increasing step count
- if the process had to restart from a previous checkpoint).
- The returned values may be downsampled using reservoir sampling, which is
- configurable by the TensorBoard server. When downsampling occurs, the server
- guarantees that different tags will all sample at the same sequence of indices,
- so that if there are two tags `A` and `B` which are related so that `A[i] ~
- B[i]` for all `i`, then `D(A)[i] ~ D(B)[i]` for all `i`, where `D` represents
- the downsampling operation.
- The reservoir sampling puts an upper bound on the number of items that will be
- returned for a given run-tag combination, and guarantees that all items are
- equally likely to be in the final sample (ie it is a uniform distribution over
- the values), with the proviso that the most recent individual item is always
- included in the sample.
- The reservoir sizes are configurable on a per-tag type basis.