.\"
.\" Copyright (c) 2002 Kenneth D. Merry.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions, and the following disclaimer,
.\"    without modification, immediately at the beginning of the file.
.\" 2. The name of the author may not be used to endorse or promote products
.\"    derived from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
.\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd December 5, 2004
.Dt ZERO_COPY 9
.Os
.Sh NAME
.Nm zero_copy ,
.Nm zero_copy_sockets
.Nd "zero copy sockets code"
.Sh SYNOPSIS
.Cd "options ZERO_COPY_SOCKETS"
.Sh DESCRIPTION
The
.Fx
kernel includes a facility for eliminating data copies on
socket reads and writes.
.Pp
This code is collectively known as the zero copy sockets code, because during
normal network I/O, data will not be copied by the CPU at all.
Rather, it will be DMAed directly from the user's buffer to the NIC (for sends),
or DMAed from the NIC into a buffer that is then given to the user (for receives).
.Pp
The zero copy sockets code uses the standard socket read and write
semantics, and therefore has some limitations and restrictions that
programmers should be aware of when trying to take advantage of this
functionality.
.Pp
For sending data, the sending NIC needs no special capabilities.
The data written to the socket, however, must be
at least one page in size and page aligned in order to be mapped into the
kernel.
If it does not meet these size and alignment constraints, it
will be copied into the kernel, as is normally the case with socket I/O.
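.Pp
One way to meet these constraints is to allocate send buffers in whole pages.
The sketch below is illustrative only.
It assumes a POSIX system that provides
.Xr posix_memalign 3
and
.Xr sysconf 3 .
The helpers
.Fn zc_pagesize ,
.Fn zc_alloc_pages
and
.Fn zc_send_eligible
are hypothetical names, not part of any
.Fx
interface.
.Bd -literal -offset indent
```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* The page size that the size and alignment rules are measured against. */
static size_t
zc_pagesize(void)
{
	return ((size_t)sysconf(_SC_PAGESIZE));
}

/*
 * Nonzero if a write of len bytes starting at buf satisfies the rules
 * above: buf starts on a page boundary and len is at least one page.
 * Data that fails this test is simply copied into the kernel.
 */
static int
zc_send_eligible(const void *buf, size_t len, size_t pagesize)
{
	assert(pagesize > 0);
	return ((uintptr_t)buf % pagesize == 0 && len >= pagesize);
}

/*
 * Allocate npages pages of page-aligned memory for use as a send
 * buffer.  Returns NULL on failure; release the memory with free(3).
 */
static void *
zc_alloc_pages(size_t npages, size_t pagesize)
{
	void *p;

	/* Reject empty requests and sizes that would overflow size_t. */
	if (npages == 0 || npages > SIZE_MAX / pagesize)
		return (NULL);
	if (posix_memalign(&p, pagesize, npages * pagesize) != 0)
		return (NULL);
	return (p);
}
```
.Ed
A buffer that passes this check is only eligible for zero copy.
Whether its pages are actually mapped still depends on the kernel configuration.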
.Pp
The user should be careful not to overwrite a buffer that has been written
to the socket until the kernel has freed the data and cleared the
copy-on-write mapping.
If a buffer is overwritten before the kernel has released it,
the data will be copied, and no savings in CPU
utilization or memory bandwidth will be realized.
.Pp
The
.Xr socket 2
API does not give the user any indication of when the data has
actually been sent over the wire, or when it has been freed from
kernel buffers.
For protocols like TCP, the data is kept in the kernel until the
other side has acknowledged it, in case retransmission is required.
.Pp
From an application standpoint, the best way to be reasonably sure that the data has
been sent out over the wire and freed by the kernel (for TCP-based sockets)
is to set a socket buffer size (see the
.Dv SO_SNDBUF
socket option in the
.Xr setsockopt 2
manual page) appropriate for the application and network environment, and then
make sure you have sent out twice as much data as the socket buffer size
before reusing a buffer.
For TCP, the send and receive socket buffer sizes
generally correspond directly to the TCP window size.
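.Pp
This rule of thumb can be applied to a ring of reusable send buffers.
The following sketch rests on one conservative reading of the rule: a buffer is
reused only after at least twice the granted
.Dv SO_SNDBUF
size has been written from the other buffers in the ring.
The helpers
.Fn zc_set_sndbuf
and
.Fn zc_ring_slots
are hypothetical.
The value is read back with
.Xr getsockopt 2
because the kernel may round or clamp the requested size.
.Bd -literal -offset indent
```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Request a send buffer of want bytes on socket s and store the size the
 * kernel actually granted in *got.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
zc_set_sndbuf(int s, int want, int *got)
{
	socklen_t len = sizeof(*got);

	if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, &want, sizeof(want)) == -1)
		return (-1);
	return (getsockopt(s, SOL_SOCKET, SO_SNDBUF, got, &len));
}

/*
 * Number of bufsize-byte slots in a ring of send buffers such that at
 * least 2 * sndbuf bytes from the other slots are written after a slot
 * and before it is reused: ceil(2 * sndbuf / bufsize) other slots,
 * plus the slot itself.
 */
static size_t
zc_ring_slots(size_t sndbuf, size_t bufsize)
{
	size_t need = 2 * sndbuf;

	assert(bufsize > 0);
	return ((need + bufsize - 1) / bufsize + 1);
}
```
.Ed
For example, a 64KB send buffer and 16KB buffers would call for a ring of nine slots.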
.Pp
For receiving data, in order to take advantage of the zero copy receive
code, the user must have a NIC that is configured for an MTU greater than
the architecture page size.
(For example, the page size on i386 is 4KB.)
Additionally, in order for zero copy receive to work,
packet payloads must be at least a page in size and page aligned.
.Pp
Achieving page aligned payloads requires a NIC that can split an incoming
packet into multiple buffers.
It also generally requires enough programmability on the NIC to make sure
that the payload starts in its own buffer.
This is called
.Dq "header splitting" .
At the time of writing, the only NICs with
support for header splitting are Alteon Tigon 2-based boards running
slightly modified firmware.
The
.Fx
.Xr ti 4
driver includes modified firmware for Tigon 2 boards only.
Header
splitting code can be written, however, for any NIC that allows putting
received packets into multiple buffers and that has enough programmability
to determine that the header should go into one buffer and the payload into
another.
.Pp
You can also do a form of header splitting that does not require any NIC
firmware modifications, provided your NIC is at least capable of splitting
packets into multiple buffers.
This requires that you optimize the NIC driver for your
most common packet header size.
If that size (Ethernet + IP + TCP headers)
is typically 66 bytes, for instance, you would make the first buffer in the
set for each packet 66 bytes long, and each subsequent
buffer one page long.
For packets whose headers are
exactly 66 bytes long, the payload will then be page aligned.
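.Pp
The 66-byte figure is consistent with a 14-byte Ethernet header, a 20-byte
IPv4 header without options and a 32-byte TCP header carrying the 12-byte
timestamp option.
That breakdown is an assumption used for illustration.
The sketch below models the buffer layout a driver might use for this scheme.
The structure and function names are hypothetical.
The model also ignores the Ethernet frame check sequence and VLAN tags.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>

/*
 * One common way to arrive at a 66-byte header (an assumption, not a
 * requirement of the zero copy code): Ethernet II without a VLAN tag,
 * IPv4 without options, and TCP carrying the 12-byte timestamp option.
 */
enum {
	SPLIT_ETHER_HLEN = 14,
	SPLIT_IPV4_HLEN = 20,
	SPLIT_TCP_HLEN = 20 + 12,
	SPLIT_HDR_GUESS = SPLIT_ETHER_HLEN + SPLIT_IPV4_HLEN + SPLIT_TCP_HLEN
};

/* How one received frame lands in the driver's buffer set. */
struct split_layout {
	size_t	page_bufs;	/* page-sized buffers after the first one */
	int	aligned;	/* payload starts at the top of a page buffer */
};

/*
 * The first buffer holds first_len bytes of the frame and every later
 * buffer holds up to one page.  The payload is page aligned only when
 * the frame's actual header length equals first_len.
 */
static struct split_layout
split_frame(size_t frame_len, size_t hdr_len, size_t first_len,
    size_t pagesize)
{
	struct split_layout l;
	size_t rest = frame_len > first_len ? frame_len - first_len : 0;

	assert(pagesize > 0);
	l.page_bufs = (rest + pagesize - 1) / pagesize;
	l.aligned = (hdr_len == first_len);
	return (l);
}
```
.Ed
With a 9000-byte MTU and 4KB pages, a full-sized frame whose headers are exactly
66 bytes spills into three page-sized buffers with a page-aligned payload.
The same frame without TCP timestamps (a 54-byte header) is not aligned.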
.Pp
The other requirement for zero copy receive to work is that the destination
buffer for data read from a socket must be at least a page
in size and page aligned.
.Pp
In practice, the requirements for receive-side zero copy cannot be
met without NIC hardware that is programmable enough to do header
splitting of some sort.
Since most NICs are not that programmable, or their
manufacturers will not share the source code to their firmware, this approach
to zero copy receive is not widely useful.
.Pp
There are other approaches, such as RDMA and TCP offload, that may
help reduce the CPU overhead associated with copying data
out of the kernel.
Most known techniques require some sort of support at
the NIC level to work, and describing such techniques is beyond the scope
of this manual page.
.Pp
The zero copy send and zero copy receive code can be individually turned
off via the
.Va kern.ipc.zero_copy.send
and
.Va kern.ipc.zero_copy.receive
.Xr sysctl 8
variables, respectively.
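.Pp
A program can also query or change these knobs with the
.Fn sysctlbyname
function described in
.Xr sysctl 3 .
The sketch below assumes that both variables are integers.
The helper names are hypothetical, and on systems other than
.Fx
the helpers simply fail with
.Er ENOSYS .
Changing a value normally requires superuser privileges.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#if defined(__FreeBSD__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/*
 * Read an integer zero copy knob such as "kern.ipc.zero_copy.send".
 * Returns 0 and stores the value in *val, or -1 with errno set.  On a
 * kernel built without ZERO_COPY_SOCKETS the name is presumably absent,
 * so the lookup is expected to fail.
 */
static int
zc_get_knob(const char *name, int *val)
{
#if defined(__FreeBSD__)
	size_t len = sizeof(*val);

	return (sysctlbyname(name, val, &len, NULL, 0));
#else
	(void)name;
	(void)val;
	errno = ENOSYS;		/* not a FreeBSD system */
	return (-1);
#endif
}

/* Write a knob; setting it to 0 turns that zero copy path off. */
static int
zc_set_knob(const char *name, int val)
{
#if defined(__FreeBSD__)
	return (sysctlbyname(name, NULL, NULL, &val, sizeof(val)));
#else
	(void)name;
	(void)val;
	errno = ENOSYS;		/* not a FreeBSD system */
	return (-1);
#endif
}
```
.Ed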
.Sh SEE ALSO
.Xr sendfile 2 ,
.Xr socket 2 ,
.Xr ti 4
.Sh HISTORY
The zero copy sockets code first appeared in
.Fx 5.0 ,
although it has
been in existence in patch form since at least mid-1999.
.Sh AUTHORS
.An -nosplit
The zero copy sockets code was originally written by
.An Andrew Gallatin Aq gallatin@FreeBSD.org
and substantially modified and updated by
.An Kenneth Merry Aq ken@FreeBSD.org .