.\" Copyright (c) 1986, 1993
.\"	The Regents of the University of California.  All rights reserved.
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"	This product includes software developed by the University of
.\"	California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"	@(#)	8.1 (Berkeley) 8/14/93
.oh 'Introductory 4.4BSD IPC''PSD:20-%'
.eh 'PSD:20-%''Introductory 4.4BSD IPC'
.sp 2
.sz 14
.ft B
.ce 2
An Introductory 4.4BSD
Interprocess Communication Tutorial
.sz 10
.sp 2
.i "Stuart Sechrest"
.ce 4
Computer Science Research Group
Computer Science Division
Department of Electrical Engineering and Computer Science
University of California, Berkeley
.sp 2
Berkeley UNIX\(dg 4.4BSD offers several choices for interprocess communication.
To aid the programmer in developing programs that are composed of
cooperating processes, the different choices are discussed and a series
of example programs is presented.  These programs
demonstrate in a simple way the use of pipes, socketpairs, and sockets,
and the use of datagram and stream communication.  The intent of this
document is to present a few simple example programs, not to describe the
networking system in full.
 70.sp 2
 72\(dg\|UNIX is a trademark of AT&T Bell Laboratories.
 74.b 1 "Goals"
 78Facilities for interprocess communication (IPC) and networking
 79were a major addition to UNIX in the Berkeley UNIX 4.2BSD release.
 80These facilities required major additions and some changes
 81to the system interface.
 82The basic idea of this interface is to make IPC similar to file I/O.
 83In UNIX a process has a set of I/O descriptors, from which one reads
 84and to which one writes.
 85Descriptors may refer to normal files, to devices (including terminals),
 86or to communication channels.
 87The use of a descriptor has three phases: its creation,
 88its use for reading and writing, and its destruction.  By using descriptors
 89to write files, rather than simply naming the target file in the write
 90call, one gains a surprising amount of flexibility.  Often, the program that
 91creates a descriptor will be different from the program that uses the
 92descriptor.  For example the shell can create a descriptor for the output 
 93of the `ls'
 94command that will cause the listing to appear in a file rather than
 95on a terminal.
Pipes are another form of descriptor that has been used in UNIX
for some time.
Pipes allow one-way data transmission from one process
to another; the two processes and the pipe must be set up by a common
ancestor.
102The use of descriptors is not the only communication interface
103provided by UNIX.
104The signal mechanism sends a tiny amount of information from one 
105process to another.
106The signaled process receives only the signal type,
107not the identity of the sender,
108and the number of possible signals is small.
109The signal semantics limit the flexibility of the signaling mechanism
110as a means of interprocess communication.
The identification of IPC with I/O is quite longstanding in UNIX and
has proved quite successful.  At first, however, IPC was limited to
processes communicating within a single machine.  With Berkeley UNIX
4.2BSD this expanded to include IPC between machines.  This expansion
has necessitated some change in the way that descriptors are created.
117Additionally, new possibilities for the meaning of read and write have
118been admitted.  Originally the meanings, or semantics, of these terms
119were fairly simple.  When you wrote something it was delivered.  When
120you read something, you were blocked until the data arrived.
121Other possibilities exist,
122however.  One can write without full assurance of delivery if one can
123check later to catch occasional failures.  Messages can be kept as
124discrete units or merged into a stream. 
One can ask to read, but insist on not waiting if nothing is immediately
available.  These new possibilities are allowed in the Berkeley UNIX IPC
interface.
129Thus Berkeley UNIX 4.4BSD offers several choices for IPC.
130This paper presents simple examples that illustrate some of
131the choices.
132The reader is presumed to be familiar with the C programming language
133[Kernighan & Ritchie 1978],
134but not necessarily with the system calls of the UNIX system or with
135processes and interprocess communication.
136The paper reviews the notion of a process and the types of
137communication that are supported by Berkeley UNIX 4.4BSD.
A series of examples is presented that create processes that communicate
with one another.  The programs show different ways of establishing
channels of communication.
Finally, the calls that actually transfer data are reviewed.
To clearly present how communication can take place,
the example programs have been cleared of anything that
might be construed as useful work.
They can, therefore, serve as models
for the programmer trying to construct programs that are composed of
cooperating processes.
148.b 1 "Processes"
151A \fIprogram\fP is both a sequence of statements and a rough way of referring 
152to the computation that occurs when the compiled statements are run.
153A \fIprocess\fP can be thought of as a single line of control in a program.
154Most programs execute some statements, go through a few loops, branch in
155various directions and then end.  These are single process programs.
156Programs can also have a point where control splits into two independent lines,
157an action called \fIforking.\fP
In UNIX these lines can never join again.  A call to the system routine
\fIfork()\fP causes a process to split in this way.
The result of this call is that two independent processes will be
running, executing exactly the same code.
Memory values will be the same for all values set before the fork, but,
subsequently, each version will be able to change only the
value of its own copy of each variable.
Initially, the only difference between the two will be the value returned by
\fIfork().\fP  The parent will receive a process id for the child;
the child will receive a zero.
Calls to \fIfork(),\fP
therefore, typically precede, or are included in, an if-statement.
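The following is a minimal sketch of such an if-statement, not one of the
paper's figures.  The function name fork_and_wait is hypothetical; the
child simply exits with a chosen status so that the parent has something
to observe.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code`; the parent waits for it and
 * returns the exit status it observed, or -1 on error. */
int fork_and_wait(int code)
{
    pid_t pid = fork();
    int status;

    if (pid == -1)
        return -1;              /* the fork failed; no child exists */
    if (pid == 0)
        _exit(code);            /* child: fork() returned zero */

    /* parent: fork() returned the process id of the child */
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Both processes run the same text after the call; only the value returned
by fork() tells them apart.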
171A process views the rest of the system through a private table of descriptors.
172The descriptors can represent open files or sockets (sockets are communication
173objects that will be discussed below).  Descriptors are referred to
by their index numbers in the table.  The first three descriptors are often
known by special names, \fIstdin, stdout\fP and \fIstderr\fP.
These are the standard input, output and error.
177When a process forks, its descriptor table is copied to the child.
178Thus, if the parent's standard input is being taken from a terminal
179(devices are also treated as files in UNIX), the child's input will 
180be taken from the
181same terminal.  Whoever reads first will get the input.  If, before forking,
182the parent changes its standard input so that it is reading from a
183new file, the child will take its input from the new file.  It is
184also possible to take input from a socket, rather than from a file.
185.b 1 "Pipes"
189Most users of UNIX know that they can pipe the output of a 
190program ``prog1'' to the input of another, ``prog2,'' by typing the command
191\fI``prog1 | prog2.''\fP
This is called ``piping'' the output of one program
to another because the mechanism used to transfer the output is called a
\fIpipe.\fP
When the user types a command, the command is read by the shell, which
decides how to execute it.  If the command is simple, for example,
.i "``prog1,''"
the shell forks a process, which executes the program, prog1, and then dies.
The shell waits for this termination and then prompts for the next
command.
201If the command is a compound command,
202.i "``prog1 | prog2,''"
203the shell creates two processes connected by a pipe. One process
204runs the program, prog1, the other runs prog2.  The pipe is an I/O
205mechanism with two ends, or sockets.  Data that is written into one socket
206can be read from the other.  
208.ft CW pipe.c
211.ce 1
212Figure 1\ \ Use of a pipe
215Since a program specifies its input and output only by the descriptor table
216indices, which appear as variables or constants,
217the input source and output destination can be changed without
218changing the text of the program.
It is in this way that the shell is able to set up pipes.  Before executing
prog1, the process can close whatever is at \fIstdout\fP
and replace it with one
end of a pipe.  Similarly, the process that will execute prog2 can substitute
the opposite end of the pipe for \fIstdin.\fP
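The substitution can be sketched as follows.  This is an illustration
under stated assumptions rather than the shell's actual code: the name
capture_child_stdout is hypothetical, and dup2() is used to perform the
close-and-replace step in one call.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child makes the write end of a pipe its standard output and then
 * writes msg to descriptor 1; the parent reads the text from the pipe.
 * Returns the number of bytes read into buf, or -1 on error. */
ssize_t capture_child_stdout(const char *msg, char *buf, size_t len)
{
    int fds[2];
    pid_t pid;
    ssize_t n;

    if (pipe(fds) == -1)
        return -1;
    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);                   /* the child does not read */
        if (dup2(fds[1], STDOUT_FILENO) == -1)
            _exit(1);
        close(fds[1]);                   /* descriptor 1 now names the pipe */
        if (write(STDOUT_FILENO, msg, strlen(msg)) == -1)
            _exit(1);
        _exit(0);
    }
    close(fds[1]);                       /* the parent does not write */
    n = read(fds[0], buf, len);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n;
}
```

The code that writes to descriptor 1 need not know that its output goes
to a pipe; a real shell would call one of the exec routines at that point.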
226Let us now examine a program that creates a pipe for communication between
227its child and itself (Figure 1).
228A pipe is created by a parent process, which then forks.
229When a process forks, the parent's descriptor table is copied into 
230the child's.  
In Figure 1, the parent process makes a call to the system routine
\fIpipe().\fP
This routine creates a pipe and places descriptors for the sockets
for the two ends of the pipe in the process's descriptor table.
\fIPipe()\fP
is passed an array into which it places the index numbers of the
sockets it created.
239The two ends are not equivalent.  The socket whose index is
240returned in the low word of the array is opened for reading only,
241while the socket in the high end is opened only for writing.
242This corresponds to the fact that the standard input is the first
243descriptor of a process's descriptor table and the standard output
244is the second.  After creating the pipe, the parent creates the child 
245with which it will share the pipe by calling \fIfork().\fP
246Figure 2 illustrates the effect of a fork.
247The parent process's descriptor table points to both ends of the pipe.
248After the fork, both parent's and child's descriptor tables point to
249the pipe.
250The child can then use the pipe to send a message to the parent.
251.(z fig2.pic
253.ce 2
254Figure 2\ \ Sharing a pipe between parent and child
255.ce 0
258Just what is a pipe?
259It is a one-way communication mechanism, with one end opened
260for reading and the other end for writing.
261Therefore, parent and child need to agree on which way to turn
262the pipe, from parent to child or the other way around.
263Using the same pipe for communication both from parent to child and 
264from child to parent would be possible (since both processes have 
265references to both ends), but very complicated.
266If the parent and child are to have a two-way conversation,
267the parent creates two pipes, one for use in each direction.
268(In accordance with their plans, both parent and child in the example above
269close the socket that they will not use.  It is not required that unused
270descriptors be closed, but it is good practice.)
A pipe is also a \fIstream\fP communication mechanism; that
is, all messages sent through the pipe are placed in order
and reliably delivered.  When the reader asks for a certain
number of bytes from this
stream, the reader is given as many bytes as are available, up
to the amount of the request.  Note that these bytes may have come from
the same call to \fIwrite()\fP or from several calls to \fIwrite()\fP
that were concatenated.
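A small sketch makes the concatenation visible.  The function name
pipe_merges_writes is hypothetical, and for brevity both ends of the pipe
are held by a single process.

```c
#include <unistd.h>

/* Write two short messages into a pipe, then read once.  Returns the
 * byte count delivered by the single read(), or -1 on error. */
ssize_t pipe_merges_writes(char *buf, size_t len)
{
    int fds[2];
    ssize_t n = -1;

    if (pipe(fds) == -1)
        return -1;
    if (write(fds[1], "abc", 3) == 3 && write(fds[1], "def", 3) == 3)
        n = read(fds[0], buf, len);   /* no boundary separates the writes */
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

With a buffer of six bytes or more, the one read() returns the bytes of
both writes in order.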
279.b 1 "Socketpairs"
283Berkeley UNIX 4.4BSD provides a slight generalization of pipes.  A pipe is a
284pair of connected sockets for one-way stream communication.  One may
285obtain a pair of connected sockets for two-way stream communication
286by calling the routine \fIsocketpair().\fP
287The program in Figure 3 calls \fIsocketpair()\fP
288to create such a connection.  The program uses the link for
289communication in both directions.  Since socketpairs are
290an extension of pipes, their use resembles that of pipes. 
Figure 4 illustrates the result of a fork following a call to
\fIsocketpair().\fP
\fISocketpair()\fP
takes as
arguments a specification of a domain, a style of communication, and a
protocol.
These are the parameters shown in the example.
Domains and protocols will be discussed in the next section.
Briefly,
a domain is a space of names that may be bound
to sockets and implies certain other conventions.
302to sockets and implies certain other conventions.
303Currently, socketpairs have only been implemented for one
304domain, called the UNIX domain.
305The UNIX domain uses UNIX path names for naming sockets.  
306It only allows communication
307between sockets on the same machine.
Note that the header files
.i "<sys/socket.h>"
and
.i "<sys/types.h>"
are required in this program.
314The constants AF_UNIX and SOCK_STREAM are defined in 
315.i "<sys/socket.h>,"
316which in turn requires the file 
317.i "<sys/types.h>"
318for some of its definitions.
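The two-way use of a socketpair can be sketched as below.  This is not
the program of Figure 3; the name socketpair_round_trip and the one-byte
exchange are assumptions chosen to keep the example short.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* The parent sends the byte c to the child over a socketpair; the child
 * answers with c + 1 over the same connection.  Returns the reply, or
 * -1 on error. */
int socketpair_round_trip(char c)
{
    int sv[2];
    pid_t pid;
    char reply;
    int result = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    pid = fork();
    if (pid == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        char in;

        close(sv[0]);                 /* the child keeps sv[1] only */
        if (read(sv[1], &in, 1) != 1)
            _exit(1);
        in++;
        if (write(sv[1], &in, 1) != 1)
            _exit(1);
        _exit(0);
    }
    close(sv[1]);                     /* the parent keeps sv[0] only */
    if (write(sv[0], &c, 1) == 1 && read(sv[0], &reply, 1) == 1)
        result = (unsigned char)reply;
    close(sv[0]);
    waitpid(pid, NULL, 0);
    return result;
}
```

Each process both reads and writes on its single descriptor, which a
pipe would not permit.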
320.ft CW socketpair.c
323.ce 1
324Figure 3\ \ Use of a socketpair
326.(z fig3.pic
328.ce 1
329Figure 4\ \ Sharing a socketpair between parent and child
331.b 1 "Domains and Protocols"
335Pipes and socketpairs are a simple solution for communicating between
336a parent and child or between child processes.
337What if we wanted to have processes that have no common ancestor
338with whom to set up communication?
339Neither standard UNIX pipes nor socketpairs are
340the answer here, since both mechanisms require a common ancestor to
341set up the communication.
342We would like to have two processes separately create sockets
343and then have messages sent between them.  This is often the
344case when providing or using a service in the system.  This is
345also the case when the communicating processes are on separate machines.
346In Berkeley UNIX 4.4BSD one can create individual sockets, give them names and
347send messages between them.
349Sockets created by different programs use names to refer to one another;
350names generally must be translated into addresses for use.
351The space from which an address is drawn is referred to as a
352.i domain.
353There are several domains for sockets.
Two that will be used in the examples here are the UNIX domain (or AF_UNIX,
for Address Family UNIX) and the Internet domain (or AF_INET).
356UNIX domain IPC is an experimental facility in 4.2BSD and 4.3BSD.
357In the UNIX domain, a socket is given a path name within the file system
358name space.
359A file system node is created for the socket and other processes may 
360then refer to the socket by giving the proper pathname.
361UNIX domain names, therefore, allow communication between any two processes
362that work in the same file system.
363The Internet domain is the UNIX implementation of the DARPA Internet
364standard protocols IP/TCP/UDP.
365Addresses in the Internet domain consist of a machine network address
366and an identifying number, called a port.
367Internet domain names allow communication between machines.
369Communication follows some particular ``style.''
370Currently, communication is either through a \fIstream\fP
371or by \fIdatagram.\fP
372Stream communication implies several things.  Communication takes
373place across a connection between two sockets.  The communication
374is reliable, error-free, and, as in pipes, no message boundaries are
375kept. Reading from a stream may result in reading the data sent from
376one or several calls to \fIwrite()\fP
377or only part of the data from a single call, if there is not enough room
378for the entire message, or if not all the data from a large message
379has been transferred.
380The protocol implementing such a style will retransmit messages
381received with errors. It will also return error messages if one tries to
382send a message after the connection has been broken.
383Datagram communication does not use connections.  Each message is
384addressed individually.  If the address is correct, it will generally
385be received, although this is not guaranteed.  Often datagrams are
386used for requests that require a response from the 
387recipient.  If no response
388arrives in a reasonable amount of time, the request is repeated.
389The individual datagrams will be kept separate when they are read, that
390is, message boundaries are preserved.
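The preservation of message boundaries can be observed with a short
sketch.  As an assumption made for brevity, the two datagram sockets are
obtained with socketpair() in the UNIX domain rather than created and
named separately; the function name is hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send two datagrams, then receive twice into a buffer large enough to
 * hold both.  Stores the size of each received message and returns 0,
 * or returns -1 on error. */
int dgram_keeps_boundaries(ssize_t *first, ssize_t *second)
{
    int sv[2];
    char buf[16];
    int rc = -1;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1)
        return -1;
    if (send(sv[0], "abc", 3, 0) == 3 && send(sv[0], "de", 2, 0) == 2) {
        *first = recv(sv[1], buf, sizeof buf, 0);    /* first message only */
        *second = recv(sv[1], buf, sizeof buf, 0);   /* second message only */
        rc = 0;
    }
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

Each recv() returns exactly one message, even though the buffer has room
for both; a stream would have delivered the five bytes together.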
392The difference in performance between the two styles of communication is 
393generally less important than the difference in semantics.  The
394performance gain that one might find in using datagrams must be weighed
395against the increased complexity of the program, which must now concern
396itself with lost or out of order messages.  If lost messages may simply be 
397ignored, the quantity of traffic may be a consideration. The expense
398of setting up a connection is best justified by frequent use of the connection.
399Since the performance of a protocol changes as it is tuned for different
400situations, it is best to seek the most up-to-date information when
401making choices for a program in which performance is crucial.
403A protocol is a set of rules, data formats and conventions that regulate the
404transfer of data between participants in the communication.
405In general, there is one protocol for each socket type (stream,
406datagram, etc.) within each domain.
The code that implements a protocol
keeps track of the names that are bound to sockets,
sets up connections and transfers data between sockets,
perhaps sending the data across a network.
It is possible for several protocols, differing only in low level
details, to implement the same style of communication within
a particular domain.  Although it is possible to select
which protocol should be used, for nearly all uses it is sufficient to
request the default protocol.  This has been done in all of the example
programs.
419One specifies the domain, style and protocol of a socket when
420it is created.  For example, in Figure 5a the call to \fIsocket()\fP
421causes the creation of a datagram socket with the default protocol 
422in the UNIX domain.
423.b 1 "Datagrams in the UNIX Domain"
427.ft CW udgramread.c
430.ce 1
431Figure 5a\ \ Reading UNIX domain datagrams
434Let us now look at two programs that create sockets separately.
435The programs in Figures 5a and 5b use datagram communication
436rather than a stream.  
437The structure used to name UNIX domain sockets is defined
438in the file \fI<sys/un.h>.\fP
439The definition has also been included in the example for clarity.
441Each program creates a socket with a call to \fIsocket().\fP
442These sockets are in the UNIX domain.
443Once a name has been decided upon it is attached to a socket by the
444system call \fIbind().\fP
445The program in Figure 5a uses the name ``socket'',
446which it binds to its socket.
447This name will appear in the working directory of the program.
The program in Figure 5b uses its
socket only for sending messages.  It does not create a name for
the socket because no other process has to refer to it.
452.ft CW udgramsend.c
455.ce 1
Figure 5b\ \ Sending a UNIX domain datagram
459Names in the UNIX domain are path names.  Like file path names they may
460be either absolute (e.g. ``/dev/imaginary'') or relative (e.g. ``socket'').
461Because these names are used to allow processes to rendezvous, relative
462path names can pose difficulties and should be used with care.
463When a name is bound into the name space, a file (inode) is allocated in the
464file system.  If
465the inode is not deallocated, the name will continue to exist even after
466the bound socket is closed.  This can cause subsequent runs of a program
467to find that a name is unavailable, and can cause 
468directories to fill up with these
469objects.  The names are removed by calling \fIunlink()\fP or using
470the \fIrm\fP\|(1) command.
471Names in the UNIX domain are only used for rendezvous.  They are not used
472for message delivery once a connection is established.  Therefore, in
473contrast with the Internet domain, unbound sockets need not be (and are
474not) automatically given addresses when they are connected.  
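The steps described above (create, bind, send, receive, unlink) can be
sketched in one routine.  This is not the code of Figures 5a and 5b: the
name unix_dgram_demo is hypothetical, both sockets live in one process
for brevity, and the caller supplies the path name.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Bind a datagram socket to `path`, send msg to that name from a second,
 * unnamed socket, and receive it.  Returns the number of bytes received,
 * or -1 on error.  The name is removed before returning. */
ssize_t unix_dgram_demo(const char *path, const char *msg,
                        char *buf, size_t len)
{
    struct sockaddr_un name;
    int rsock, ssock;
    ssize_t n = -1;

    memset(&name, 0, sizeof name);
    name.sun_family = AF_UNIX;
    strncpy(name.sun_path, path, sizeof name.sun_path - 1);

    rsock = socket(AF_UNIX, SOCK_DGRAM, 0);   /* receiver, to be named */
    ssock = socket(AF_UNIX, SOCK_DGRAM, 0);   /* sender, stays unnamed */
    if (rsock != -1 && ssock != -1) {
        unlink(path);                         /* clear a stale name, if any */
        if (bind(rsock, (struct sockaddr *)&name, sizeof name) == 0) {
            if (sendto(ssock, msg, strlen(msg), 0,
                       (struct sockaddr *)&name, sizeof name) != -1)
                n = recv(rsock, buf, len, 0);
            unlink(path);                     /* remove the name again */
        }
    }
    if (rsock != -1)
        close(rsock);
    if (ssock != -1)
        close(ssock);
    return n;
}
```

Calling unlink() before bind() is one way to cope with a name left
behind by an earlier run; whether that is appropriate depends on the
program.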
476There is no established means of communicating names to interested
477parties.  In the example, the program in Figure 5b gets the
478name of the socket to which it will send its message through its
479command line arguments.  Once a line of communication has been created,
480one can send the names of additional, perhaps new, sockets over the link.
481Facilities will have to be built that will make the distribution of
482names less of a problem than it now is.
483.b 1 "Datagrams in the Internet Domain"
487.ft CW dgramread.c
490.ce 1
491Figure 6a\ \ Reading Internet domain datagrams
The examples in Figures 6a and 6b are very close to the previous example
except that the socket is in the Internet domain.
The structure of Internet domain addresses is defined in the file
\fI<netinet/in.h>.\fP
498Internet addresses specify a host address (a 32-bit number)
499and a delivery slot, or port, on that
500machine.  These ports are managed by the system routines that implement 
501a particular protocol.
502Unlike UNIX domain names, Internet socket names are not entered into 
503the file system and, therefore,
504they do not have to be unlinked after the socket has been closed.
505When a message must be sent between machines it is sent to
506the protocol routine on the destination machine, which interprets the
507address to determine to which socket the message should be delivered.
508Several different protocols may be active on 
509the same machine, but, in general, they will not communicate with one another.
510As a result, different protocols are allowed to use the same port numbers.
511Thus, implicitly, an Internet address is a triple including a protocol as
512well as the port and machine address.
513An \fIassociation\fP is a temporary or permanent specification
514of a pair of communicating sockets.
515An association is thus identified by the tuple
516<\fIprotocol, local machine address, local port,
517remote machine address, remote port\fP>.
518An association may be transient when using datagram sockets;
519the association actually exists during a \fIsend\fP operation.
521.ft CW dgramsend.c
524.ce 1
525Figure 6b\ \ Sending an Internet domain datagram
The protocol for a socket is chosen when the socket is created.  The
local machine address for a socket can be any valid network address of the
machine, if it has more than one, or it can be the wildcard value
INADDR_ANY.
The wildcard value is used in the program in Figure 6a.
533If a machine has several network addresses, it is likely
534that messages sent to any of the addresses should be deliverable to
535a socket.  This will be the case if the wildcard value has been chosen.
536Note that even if the wildcard value is chosen, a program sending messages
537to the named socket must specify a valid network address.  One can be willing
538to receive from ``anywhere,'' but one cannot send a message ``anywhere.''
539The program in Figure 6b is given the destination host name as a command
540line argument.
To determine a network address to which it can send the message, it looks
up
the host address by the call to \fIgethostbyname()\fP.
544The returned structure includes the host's network address,
545which is copied into the structure specifying the
546destination of the message.
548The port number can be thought of as the number of a mailbox, into
549which the protocol places one's messages.  Certain daemons, offering
550certain advertised services, have reserved
551or ``well-known'' port numbers.  These fall in the range
552from 1 to 1023.  Higher numbers are available to general users.
553Only servers need to ask for a particular number.
554The system will assign an unused port number when an address
555is bound to a socket.
556This may happen when an explicit \fIbind\fP
557call is made with a port number of 0, or
558when a \fIconnect\fP or \fIsend\fP
559is performed on an unbound socket.
560Note that port numbers are not automatically reported back to the user.
561After calling \fIbind(),\fP asking for port 0, one may call 
562\fIgetsockname()\fP to discover what port was actually assigned. 
563The routine \fIgetsockname()\fP
564will not work for names in the UNIX domain.
566The format of the socket address is specified in part by standards within the
567Internet domain.  The specification includes the order of the bytes in
568the address.  Because machines differ in the internal representation
569they ordinarily use
570to represent integers, printing out the port number as returned by 
571\fIgetsockname()\fP may result in a misinterpretation.  To
572print out the number, it is necessary to use the routine \fIntohs()\fP
573(for \fInetwork to host: short\fP) to convert the number from the
574network representation to the host's representation.  On some machines,
575such as 68000-based machines, this is a null operation.  On others,
576such as VAXes, this results in a swapping of bytes.  Another routine
577exists to convert a short integer from the host format to the network format,
578called \fIhtons()\fP; similar routines exist for long integers.
579For further information, refer to the
580entry for \fIbyteorder\fP in section 3 of the manual.
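The binding of port 0, the call to getsockname() and the byte-order
conversion can be sketched together.  This is an illustration, not the
code of Figures 6a and 6b: the name inet_dgram_demo is hypothetical, and
the message is sent to the loopback address so that no host name lookup
is needed.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a datagram socket to the wildcard address with port 0, discover
 * the assigned port, send msg to that port on the local machine and
 * receive it.  Stores the port in host byte order in *port and returns
 * the number of bytes received, or -1 on error. */
ssize_t inet_dgram_demo(const char *msg, char *buf, size_t len,
                        unsigned short *port)
{
    struct sockaddr_in name;
    socklen_t length = sizeof name;
    int rsock, ssock;
    ssize_t n = -1;

    rsock = socket(AF_INET, SOCK_DGRAM, 0);
    ssock = socket(AF_INET, SOCK_DGRAM, 0);
    memset(&name, 0, sizeof name);
    name.sin_family = AF_INET;
    name.sin_addr.s_addr = htonl(INADDR_ANY);   /* receive on any address */
    name.sin_port = htons(0);                   /* let the system choose */
    if (rsock != -1 && ssock != -1 &&
        bind(rsock, (struct sockaddr *)&name, sizeof name) == 0 &&
        getsockname(rsock, (struct sockaddr *)&name, &length) == 0) {
        *port = ntohs(name.sin_port);           /* network to host order */
        /* A sender must name a real address; the wildcard will not do. */
        name.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        if (sendto(ssock, msg, strlen(msg), 0,
                   (struct sockaddr *)&name, sizeof name) != -1)
            n = recv(rsock, buf, len, 0);
    }
    if (rsock != -1)
        close(rsock);
    if (ssock != -1)
        close(ssock);
    return n;
}
```

The port stored in the address structure stays in network order
throughout; only the copy handed back for printing is converted.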
581.b 1 "Connections"
585To send data between stream sockets (having communication style SOCK_STREAM),
586the sockets must be connected.
587Figures 7a and 7b show two programs that create such a connection.
The program in Figure 7a is relatively simple.
589To initiate a connection, this program simply creates
590a stream socket, then calls \fIconnect()\fP,
591specifying the address of the socket to which
592it wishes its socket connected.  Provided that the target socket exists and
593is prepared to handle a connection, connection will be complete,
594and the program can begin to send
595messages.  Messages will be delivered in order without message
596boundaries, as with pipes.  The connection is destroyed when either
597socket is closed (or soon thereafter).  If a process persists 
598in sending messages after the connection is closed, a SIGPIPE signal 
599is sent to the process by the operating system.  Unless explicit action
600is taken to handle the signal (see the manual page for \fIsignal\fP
601or \fIsigvec\fP),
602the process will terminate and the shell
603will print the message ``broken pipe.'' 
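One form of explicit action is to ignore the signal, in which case the
failed write is reported through its return value instead.  The sketch
below assumes a UNIX domain socketpair stands in for the connection; the
function name is hypothetical.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Write to a stream socket whose peer has been closed, with SIGPIPE
 * ignored so that the process survives.  Returns the errno value set by
 * the failed write, 0 if the write unexpectedly succeeded, or -1 if the
 * sockets could not be created. */
int write_after_close(void)
{
    int sv[2];
    int err = 0;

    signal(SIGPIPE, SIG_IGN);        /* do not terminate on a broken pipe */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    close(sv[1]);                    /* the connection is now broken */
    if (write(sv[0], "x", 1) == -1)
        err = errno;
    close(sv[0]);
    return err;
}
```

With the signal ignored, the write fails with the error EPIPE and the
program may decide for itself how to proceed.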
605.ft CW streamwrite.c
608.ce 1
609Figure 7a\ \ Initiating an Internet domain stream connection
612.ft CW streamread.c
615.ce 1
616Figure 7b\ \ Accepting an Internet domain stream connection
617.sp 2
618.ft CW strchkread.c
621.ce 1
622Figure 7c\ \ Using select() to check for pending connections
624.(z fig8.pic
627.ce 1
628Figure 8\ \ Establishing a stream connection
631Forming a connection is asymmetrical; one process, such as the
632program in Figure 7a, requests a connection with a particular socket,
633the other process accepts connection requests.
634Before a connection can be accepted a socket must be created and an address
635bound to it.  This
636situation is illustrated in the top half of Figure 8.  Process 2
637has created a socket and bound a port number to it.  Process 1 has created an
638unnamed socket.
639The address bound to process 2's socket is then made known to process 1 and, 
640perhaps to several other potential communicants as well.
641If there are several possible communicants,
642this one socket might receive several requests for connections.
643As a result, a new socket is created for each connection.  This new socket
644is the endpoint for communication within this process for this connection.
645A connection may be destroyed by closing the corresponding socket.
The program in Figure 7b is a rather trivial example of a server.  It
creates a socket to which it binds a name, which it then advertises.
(In this case it prints out the port number.)  The program then calls
650\fIlisten()\fP for this socket.  
651Since several clients may attempt to connect more or less
652simultaneously, a queue of pending connections is maintained in the system
653address space.  \fIListen()\fP
654marks the socket as willing to accept connections and initializes the queue.
655When a connection is requested, it is listed in the queue.  If the
656queue is full, an error status may be returned to the requester.
657The maximum length of this queue is specified by the second argument of
658\fIlisten()\fP; the maximum length is limited by the system.  
659Once the listen call has been completed, the program enters
660an infinite loop.  On each pass through the loop, a new connection is
661accepted and removed from the queue, and, hence, a new socket for the 
662connection is created.  The bottom half of Figure 8 shows the result of
663Process 1 connecting with the named socket of Process 2, and Process 2
664accepting the connection.  After the connection is created, the
665service, in this case printing out the messages, is performed and the
666connection socket closed.  The \fIaccept()\fP
667call will take a pending connection
668request from the queue if one is available, or block waiting for a request.
669Messages are read from the connection socket.
670Reads from an active connection will normally block until data is available.
671The number of bytes read is returned.  When a connection is destroyed,
672the read call returns immediately.  The number of bytes returned will
673be zero.
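The sequence of calls on both sides can be sketched in a single routine.
This is not the code of Figures 7a and 7b: the name stream_demo is
hypothetical, and, as an assumption made for brevity, the client and the
server share one process and use the loopback address.  The connect()
call completes before accept() is called because the request waits in
the queue set up by listen().

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Listen on a system-chosen port, connect to it, send msg, close the
 * sending socket, then accept the connection and read until read()
 * returns zero.  Returns the number of bytes read, or -1 on error. */
ssize_t stream_demo(const char *msg, char *buf, size_t len)
{
    struct sockaddr_in name;
    socklen_t length = sizeof name;
    int lsock, csock = -1, msgsock = -1;
    ssize_t n, total = -1;

    memset(&name, 0, sizeof name);
    name.sin_family = AF_INET;
    name.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    name.sin_port = htons(0);

    lsock = socket(AF_INET, SOCK_STREAM, 0);
    if (lsock == -1)
        return -1;
    if (bind(lsock, (struct sockaddr *)&name, sizeof name) == -1 ||
        getsockname(lsock, (struct sockaddr *)&name, &length) == -1 ||
        listen(lsock, 5) == -1)
        goto done;

    /* Client side: connect, send, and close. */
    csock = socket(AF_INET, SOCK_STREAM, 0);
    if (csock == -1 ||
        connect(csock, (struct sockaddr *)&name, sizeof name) == -1 ||
        write(csock, msg, strlen(msg)) == -1)
        goto done;
    close(csock);
    csock = -1;

    /* Server side: take the queued request; a new socket is returned. */
    msgsock = accept(lsock, NULL, NULL);
    if (msgsock == -1)
        goto done;
    total = 0;
    while ((size_t)total < len &&
           (n = read(msgsock, buf + total, len - total)) > 0)
        total += n;                  /* read() returns 0 at end of stream */
done:
    if (csock != -1)
        close(csock);
    if (msgsock != -1)
        close(msgsock);
    close(lsock);
    return total;
}
```

Note that data is read from the socket returned by accept(), never from
the listening socket itself.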
675The program in Figure 7c is a slight variation on the server in Figure 7b.
676It avoids blocking when there are no pending connection requests by 
677calling \fIselect()\fP
678to check for pending requests before calling \fIaccept().\fP
679This strategy is useful when connections may be received
680on more than one socket, or when data may arrive on other connected
681sockets before another connection request.
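The check for pending requests can be sketched as follows.  This is not
the code of Figure 7c; the names listen_on_loopback and
connection_pending are hypothetical, and the helper that prepares the
listening socket is included only to make the example self-contained.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a stream socket listening on a system-chosen loopback port.
 * Fills in *name with the bound address; returns the socket or -1. */
int listen_on_loopback(struct sockaddr_in *name)
{
    socklen_t length = sizeof *name;
    int sock = socket(AF_INET, SOCK_STREAM, 0);

    if (sock == -1)
        return -1;
    memset(name, 0, sizeof *name);
    name->sin_family = AF_INET;
    name->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    name->sin_port = htons(0);
    if (bind(sock, (struct sockaddr *)name, sizeof *name) == -1 ||
        getsockname(sock, (struct sockaddr *)name, &length) == -1 ||
        listen(sock, 5) == -1) {
        close(sock);
        return -1;
    }
    return sock;
}

/* Wait at most `secs` seconds for a connection request on lsock.
 * Returns 1 if accept() would not block, 0 if nothing is pending,
 * or -1 on error. */
int connection_pending(int lsock, long secs)
{
    fd_set ready;
    struct timeval to;

    FD_ZERO(&ready);
    FD_SET(lsock, &ready);
    to.tv_sec = secs;
    to.tv_usec = 0;
    if (select(lsock + 1, &ready, NULL, NULL, &to) == -1)
        return -1;
    return FD_ISSET(lsock, &ready) ? 1 : 0;
}
```

A server would call accept() only when connection_pending() returns 1,
and could place further descriptors in the same set to watch several
sockets at once.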
The programs in Figures 9a and 9b use stream communication
in the UNIX domain.  Streams in the UNIX domain can be used for this sort
685of program in exactly the same way as Internet domain streams, except for
686the form of the names and the restriction of the connections to a single
687file system.  There are some differences, however, in the functionality of 
688streams in the two domains, notably in the handling of 
689\fIout-of-band\fP data (discussed briefly below).  These differences
690are beyond the scope of this paper.
692.ft CW ustreamwrite.c
695.ce 1
696Figure 9a\ \ Initiating a UNIX domain stream connection
697.sp 2
698.ft CW ustreamread.c
701.ce 1
702Figure 9b\ \ Accepting a UNIX domain stream connection
704.b 1 "Reads, Writes, Recvs, etc."
UNIX 4.4BSD has several system calls for reading and writing information.
The simplest calls are \fIread()\fP and \fIwrite().\fP  \fIWrite()\fP
takes as arguments a descriptor, a pointer to a buffer
containing the data, and the size of the data.
The descriptor may indicate either a file or a connected socket.
``Connected'' can mean either a connected stream socket (as described
in Section 8) or a datagram socket for which a \fIconnect()\fP
call has provided a default destination (see the \fIconnect()\fP manual page).
\fIRead()\fP also takes a descriptor that indicates either a file or a socket.
\fIWrite()\fP requires a connected socket since no destination is
specified in the parameters of the system call.
\fIRead()\fP can be used for either a connected or an unconnected socket.
These calls are, therefore, quite flexible and may be used to
write applications that make no assumptions about the source of
their input or the destination of their output.
There are variations on \fIread()\fP and \fIwrite()\fP
that allow the input and output to use
several separate buffers, while retaining the flexibility to handle
both files and sockets.  These are \fIreadv()\fP and \fIwritev(),\fP
for read and write \fIvector.\fP
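As an illustration of the vector calls, the sketch below writes a header
and a body held in separate buffers with one call, and reads them back
into two buffers with one call.  The function names write_two() and
read_two() are hypothetical, and modern declarations are assumed.

```c
#include <sys/types.h>
#include <sys/uio.h>
#include <string.h>
#include <unistd.h>

/*
 * Write two strings with a single call.  The buffers need not be
 * adjacent in memory.  Returns the byte count from writev().
 */
ssize_t
write_two(int descriptor, const char *hdr, const char *body)
{
	struct iovec iov[2];

	iov[0].iov_base = (void *)hdr;
	iov[0].iov_len = strlen(hdr);
	iov[1].iov_base = (void *)body;
	iov[1].iov_len = strlen(body);
	return writev(descriptor, iov, 2);
}

/*
 * Read into two buffers with a single call.  The first buffer is
 * filled before any data is placed in the second.
 */
ssize_t
read_two(int descriptor, char *a, size_t alen, char *b, size_t blen)
{
	struct iovec iov[2];

	iov[0].iov_base = a;
	iov[0].iov_len = alen;
	iov[1].iov_base = b;
	iov[1].iov_len = blen;
	return readv(descriptor, iov, 2);
}
```

Because the descriptor may refer to a file, a pipe, or a connected socket,
the same two functions serve all three.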
It is sometimes necessary to send high priority data over a
connection that may have unread low priority data at the
other end.  For example, a user interface process may be interpreting
commands and sending them on to another process through a stream connection.
The user interface may have filled the stream with as yet unprocessed
requests when the user types
a command to cancel all outstanding requests.
Rather than have the high priority data wait
to be processed after the low priority data, it is possible to
send it as \fIout-of-band\fP
(OOB) data.  Notification of pending OOB data is delivered as
a SIGURG signal, if this signal has been enabled (see the manual
page for \fIsignal\fP or \fIsigvec\fP).
See [Leffler et al. 1986] for a more complete description of the OOB mechanism.
A pair of calls similar to \fIread\fP and \fIwrite\fP
allows options, including sending
and receiving OOB information; these are \fIsend()\fP
and \fIrecv().\fP
These calls are used only with sockets; specifying a descriptor for a file will
result in the return of an error status.  These calls also allow
\fIpeeking\fP at data in a stream.
That is, they allow a process to read data without removing the data from
the stream.  One use of this facility is to read ahead in a stream
to determine the size of the next item to be read.
When none of these options is used, these calls behave the same as
\fIread()\fP and \fIwrite().\fP
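Peeking can be sketched as follows.  The framing used here, in which the
first byte of each item holds the length of the data that follows, is an
assumption made only for this example, as is the function name
peek_length().

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Look at the first byte waiting on a connected stream socket
 * without removing it from the stream.  In this sketch that byte
 * is assumed to hold the length of the item that follows.
 * Returns the length, or -1 on error or end of stream.
 */
int
peek_length(int sock)
{
	unsigned char len;

	/* MSG_PEEK leaves the byte in place for the next recv() or read() */
	if (recv(sock, &len, 1, MSG_PEEK) != 1)
		return -1;
	return len;
}
```

A later recv() or read() on the same socket returns the length byte again,
followed by the item itself.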
To send datagrams, one must be allowed to specify the destination.
The call \fIsendto()\fP
takes a destination address as an argument and is therefore used for
sending datagrams.  The call \fIrecvfrom()\fP
is often used to read datagrams, since this call returns the address
of the sender, if it is available, along with the data.
If the identity of the sender does not matter, one may use \fIread()\fP
or \fIrecv().\fP
Finally, there is a pair of calls that allow the sending and
receiving of messages from multiple buffers, when the address of the
recipient must be specified.  These are \fIsendmsg()\fP and
\fIrecvmsg().\fP
These calls are actually quite general and have other uses,
including, in the UNIX domain, the transmission of a file descriptor from one
process to another.
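Passing a descriptor can be sketched as below.  This assumes the
control-message interface of current POSIX systems (msg_control,
SCM_RIGHTS and the CMSG macros); older systems used a different field
for access rights.  The names send_fd() and recv_fd() are hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <string.h>

/* Control buffer large enough for one descriptor, suitably aligned. */
union fdctl {
	struct cmsghdr hdr;
	char buf[CMSG_SPACE(sizeof(int))];
};

/* Send descriptor "fd" over UNIX domain socket "sock"; 0 on success. */
int
send_fd(int sock, int fd)
{
	struct msghdr msg;
	struct iovec iov;
	struct cmsghdr *cm;
	union fdctl ctl;
	char byte = 0;

	memset(&msg, 0, sizeof msg);
	memset(&ctl, 0, sizeof ctl);
	iov.iov_base = &byte;		/* one data byte carries the message */
	iov.iov_len = 1;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof ctl.buf;
	cm = CMSG_FIRSTHDR(&msg);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;	/* the payload is a descriptor */
	cm->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cm), &fd, sizeof(int));
	return (sendmsg(sock, &msg, 0) == 1) ? 0 : -1;
}

/* Receive a descriptor sent by send_fd(); returns it, or -1. */
int
recv_fd(int sock)
{
	struct msghdr msg;
	struct iovec iov;
	struct cmsghdr *cm;
	union fdctl ctl;
	char byte;
	int fd;

	memset(&msg, 0, sizeof msg);
	iov.iov_base = &byte;
	iov.iov_len = 1;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof ctl.buf;
	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cm = CMSG_FIRSTHDR(&msg);
	if (cm == NULL || cm->cmsg_level != SOL_SOCKET ||
	    cm->cmsg_type != SCM_RIGHTS)
		return -1;
	memcpy(&fd, CMSG_DATA(cm), sizeof(int));
	return fd;
}
```

The descriptor returned by recv_fd() is a new descriptor in the receiving
process that refers to the same open file or socket as the one sent.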
The various options for reading and writing are shown in Figure 10,
together with their parameters.  The parameters for each system call
reflect the differences in function of the different calls.
In the examples given in this paper, the calls \fIread()\fP and
\fIwrite()\fP have been used whenever possible.
.ft CW
	/*
	 * The variable descriptor may be the descriptor of either a file
	 * or of a socket.
	 */
	cc = read(descriptor, buf, nbytes)
	int cc, descriptor; char *buf; int nbytes;

	/*
	 * An iovec array can describe several separate buffers.
	 */
	cc = readv(descriptor, iov, iovcnt)
	int cc, descriptor; struct iovec *iov; int iovcnt;

	cc = write(descriptor, buf, nbytes)
	int cc, descriptor; char *buf; int nbytes;

	cc = writev(descriptor, iov, iovcnt)
	int cc, descriptor; struct iovec *iov; int iovcnt;

	/*
	 * The variable ``sock'' must be the descriptor of a socket.
	 * Flags may include MSG_OOB and MSG_PEEK.
	 */
	cc = send(sock, msg, len, flags)
	int cc, sock; char *msg; int len, flags;

	cc = sendto(sock, msg, len, flags, to, tolen)
	int cc, sock; char *msg; int len, flags;
	struct sockaddr *to; int tolen;

	cc = sendmsg(sock, msg, flags)
	int cc, sock; struct msghdr msg[]; int flags;

	cc = recv(sock, buf, len, flags)
	int cc, sock; char *buf; int len, flags;

	cc = recvfrom(sock, buf, len, flags, from, fromlen)
	int cc, sock; char *buf; int len, flags;
	struct sockaddr *from; int *fromlen;

	cc = recvmsg(sock, msg, flags)
	int cc, sock; struct msghdr msg[]; int flags;
.sp 1
.ce 1
Figure 10\ \ Varieties of read and write commands
.b 1 "Choices"
This paper has presented examples of some of the forms
of communication supported by
Berkeley UNIX 4.4BSD.  These have been presented in an order chosen for
ease of presentation.  It is useful to review these options emphasizing the
factors that make each attractive.
Pipes have the advantage of portability, in that they are supported in all
UNIX systems.  They are also relatively
simple to use.  Socketpairs share this simplicity and have the additional
advantage of allowing bidirectional communication.  The major shortcoming
of these mechanisms is that they require communicating processes to be
descendants of a common process.  Neither mechanism allows communication
between machines.
The two communication domains, UNIX and Internet, allow processes with no common
ancestor to communicate.
Of the two, only the Internet domain allows
communication between machines.
This makes the Internet domain a necessary
choice for processes running on separate machines.
The choice between datagrams and stream communication is best made by
carefully considering the semantic and performance
requirements of the application.
Streams have both advantages and disadvantages.  One disadvantage
is that a process is only allowed a limited number of open streams,
as there are usually only 64 entries available in the open descriptor
table.  This can cause problems if a single server must talk with a large
number of clients.
Another is that for delivering a short message the stream setup and
teardown time can be unnecessarily long.  Weighed against these is
the reliability built into streams.  This will often be the
deciding factor in favor of streams.
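A server that expects many clients can ask for the descriptor limit
actually in force rather than assume a particular figure.  The sketch
below assumes the \fIgetrlimit()\fP call with the RLIMIT_NOFILE resource
is available; the function name max_descriptors() is hypothetical.

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <limits.h>

/*
 * Return the current (soft) limit on open descriptors for this
 * process, LONG_MAX if the limit is unbounded, or -1 on error.
 */
long
max_descriptors(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) < 0)
		return -1;
	if (rl.rlim_cur == RLIM_INFINITY)
		return LONG_MAX;
	return (long)rl.rlim_cur;
}
```

Each open stream consumes one descriptor, so the value returned bounds
the number of clients a single process can serve over streams at once.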
.b 1 "What to do Next"
Many of the examples presented here can serve as models for multiprocess
programs and for programs distributed across several machines.
In developing a new multiprocess program, it is often easiest to
first write the code to create the processes and communication paths.
After this code is debugged, the code specific to the application can
be added.
An introduction to the UNIX system and programming using UNIX system calls
can be found in [Kernighan and Pike 1984].
Further documentation of the Berkeley UNIX 4.4BSD IPC mechanisms can be
found in [Leffler et al. 1986].
More detailed information about particular calls and protocols
is provided in sections
2, 3 and 4 of the
UNIX Programmer's Manual [CSRG 1986].
In particular the following manual pages are relevant:
l l.
creating and naming sockets	socket(2), bind(2)
establishing connections	listen(2), accept(2), connect(2)
transferring data	read(2), write(2), send(2), recv(2)
addresses	inet(4F)
protocols	tcp(4P), udp(4P).
I would like to thank Sam Leffler and Mike Karels for their help in
understanding the IPC mechanisms and all the people whose comments
have helped in writing and improving this report.
This work was sponsored by the Defense Advanced Research Projects Agency
(DoD), ARPA Order No. 4031, monitored by the Naval Electronics Systems
Command under contract No. N00039-C-0235.
The views and conclusions contained in this document are those of the
author and should not be interpreted as representing official policies,
either expressed or implied, of the Defense Advanced Research Projects Agency
or of the US Government.
.sp 1
B.W. Kernighan & R. Pike, 1984,
.i "The UNIX Programming Environment."
Englewood Cliffs, N.J.: Prentice-Hall.
.sp 1
B.W. Kernighan & D.M. Ritchie, 1978,
.i "The C Programming Language."
Englewood Cliffs, N.J.: Prentice-Hall.
.sp 1
S.J. Leffler, R.S. Fabry, W.N. Joy, P. Lapsley, S. Miller & C. Torek, 1986,
.i "An Advanced 4.4BSD Interprocess Communication Tutorial."
Computer Systems Research Group,
Department of Electrical Engineering and Computer Science,
University of California, Berkeley.
.sp 1
Computer Systems Research Group, 1986,
.i "UNIX Programmer's Manual, 4.4 Berkeley Software Distribution."
Computer Systems Research Group,
Department of Electrical Engineering and Computer Science,
University of California, Berkeley.