.\" Copyright (C) Caldera International Inc. 2001-2002. All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions are
.\" met:
.\"
.\" Redistributions of source code and documentation must retain the above
.\" copyright notice, this list of conditions and the following
.\" disclaimer.
.\"
.\" Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" All advertising materials mentioning features or use of this software
.\" must display the following acknowledgement:
.\"
.\" This product includes software developed or owned by Caldera
.\" International, Inc. Neither the name of Caldera International, Inc.
.\" nor the names of other contributors may be used to endorse or promote
.\" products derived from this software without specific prior written
.\" permission.
.\"
.\" USE OF THE SOFTWARE PROVIDED FOR UNDER THIS LICENSE BY CALDERA
.\" INTERNATIONAL, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR
.\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
.\" DISCLAIMED. IN NO EVENT SHALL CALDERA INTERNATIONAL, INC. BE LIABLE
.\" FOR ANY DIRECT, INDIRECT INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
.\" BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
.\" WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
.\" OR OTHERWISE) RISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN
.\" IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.\" @(#)p4 8.1 (Berkeley) 6/8/93
.\"
.NH
LOW-LEVEL I/O
.PP
This section describes the
bottom level of I/O on the
.UC UNIX
system.
The lowest level of I/O in
.UC UNIX
provides no buffering or any other services;
it is in fact a direct entry into the operating system.
You are entirely on your own,
but on the other hand,
you have the most control over what happens.
And since the calls and usage are quite simple,
this isn't as bad as it sounds.
.NH 2
File Descriptors
.PP
In the
.UC UNIX
operating system,
all input and output is done
by reading or writing files,
because all peripheral devices, even the user's terminal,
are files in the file system.
This means that a single, homogeneous interface
handles all communication between a program and peripheral devices.
.PP
In the most general case,
before reading or writing a file,
it is necessary to inform the system
of your intent to do so,
a process called
``opening'' the file.
If you are going to write on a file,
it may also be necessary to create it.
The system checks your right to do so
(Does the file exist?
Do you have permission to access it?),
and if all is well,
returns a small non-negative integer
called a
.ul
file descriptor.
Whenever I/O is to be done on the file,
the file descriptor is used instead of the name to identify the file.
(This is roughly analogous to the use of
.UC READ(5,...)
and
.UC WRITE(6,...)
in Fortran.)
All
information about an open file is maintained by the system;
the user program refers to the file
only
by the file descriptor.
.PP
The file pointers discussed in section 3
are similar in spirit to file descriptors,
but file descriptors are more fundamental.
A file pointer is a pointer to a structure that contains,
among other things, the file descriptor for the file in question.
.PP
Since input and output involving the user's terminal
are so common,
special arrangements exist to make this convenient.
When the command interpreter (the
``shell'')
runs a program,
it opens
three files, with file descriptors 0, 1, and 2,
called the standard input,
the standard output, and the standard error output.
All of these are normally connected to the terminal,
so if a program reads file descriptor 0
and writes file descriptors 1 and 2,
it can do terminal I/O
without worrying about opening the files.
.PP
If I/O is redirected
to and from files with
.UL <
and
.UL > ,
as in
.P1
prog <infile >outfile
.P2
the shell changes the default assignments for file descriptors
0 and 1
from the terminal to the named files.
Similar observations hold if the input or output is associated with a pipe.
Normally file descriptor 2 remains attached to the terminal,
so error messages can go there.
In all cases,
the file assignments are changed by the shell,
not by the program.
The program does not need to know where its input
comes from nor where its output goes,
so long as it uses file 0 for input and 1 and 2 for output.
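.PP
As a minimal sketch of this convention, assuming a present-day POSIX system,
the fragment below sends results to descriptor 1 and diagnostics to descriptor 2.
The names
.UL emit
and
.UL report
are hypothetical helpers introduced only for illustration.

```c
#include <string.h>
#include <unistd.h>

/* Write a NUL-terminated string to descriptor fd; returns write()'s result. */
long emit(int fd, const char *s)
{
    return (long)write(fd, s, strlen(s));
}

/* Results go to descriptor 1, diagnostics to descriptor 2. */
void report(const char *result, const char *diagnostic)
{
    emit(1, result);
    emit(2, diagnostic);
}
```

.PP
If a program calling
.UL report
is run with its output redirected by
.UL > ,
only the result lands in the file; the diagnostic should still reach the terminal,
because the shell left descriptor 2 alone.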
.NH 2
Read and Write
.PP
All input and output is done by
two functions called
.UL read
and
.UL write .
For both, the first argument is a file descriptor.
The second argument is a buffer in your program where the data is to
come from or go to.
The third argument is the number of bytes to be transferred.
The calls are
.P1
n_read = read(fd, buf, n);
n_written = write(fd, buf, n);
.P2
Each call returns a byte count
which is the number of bytes actually transferred.
On reading,
the number of bytes returned may be less than
the number asked for,
because fewer than
.UL n
bytes remained to be read.
(When the file is a terminal,
.UL read
normally reads only up to the next newline,
which is generally less than what was requested.)
A return value of zero bytes implies end of file,
and
.UL -1
indicates an error of some sort.
For writing, the returned value is the number of bytes
actually written;
it is generally an error if this isn't equal
to the number supposed to be written.
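.PP
On present-day systems a short count from
.UL write
on a pipe or socket is not always fatal.
The following sketch, which assumes a modern POSIX environment and is not part of the original text,
simply retries until every byte is out.
The name
.UL write_all
is hypothetical, and interrupted calls are treated as errors to keep the example short.

```c
#include <unistd.h>

/* Keep calling write() until all n bytes are out.
   Returns 0 on success, -1 on error.  A zero return from write()
   is treated as an error so the loop cannot spin forever. */
int write_all(int fd, const char *buf, size_t n)
{
    while (n > 0) {
        ssize_t w = write(fd, buf, n);
        if (w <= 0)
            return -1;
        buf += w;           /* skip the bytes already written */
        n -= (size_t)w;
    }
    return 0;
}
```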
.PP
The number of bytes to be read or written is quite arbitrary.
The two most common values are
1,
which means one character at a time
(``unbuffered''),
and
512,
which corresponds to a physical blocksize on many peripheral devices.
This latter size will be most efficient,
but even character at a time I/O
is not inordinately expensive.
.PP
Putting these facts together,
we can write a simple program to copy
its input to its output.
This program will copy anything to anything,
since the input and output can be redirected to any file or device.
.P1
#define BUFSIZE 512 /* best size for PDP-11 UNIX */
main() /* copy input to output */
{
    char buf[BUFSIZE];
    int n;
    while ((n = read(0, buf, BUFSIZE)) > 0)
        write(1, buf, n);
    exit(0);
}
.P2
If the file size is not a multiple of
.UL BUFSIZE ,
some
.UL read
will return a smaller number of bytes
to be written by
.UL write ;
the next call to
.UL read
after that
will return zero.
.PP
It is instructive to see how
.UL read
and
.UL write
can be used to construct
higher level routines like
.UL getchar ,
.UL putchar ,
etc.
For example,
here is a version of
.UL getchar
which does unbuffered input.
.P1
#define CMASK 0377 /* for making char's > 0 */
getchar() /* unbuffered single character input */
{
    char c;
    return((read(0, &c, 1) > 0) ? c & CMASK : EOF);
}
.P2
.UL c
.ul
must
be declared
.UL char ,
because
.UL read
accepts a character pointer.
The character being returned must be masked with
.UL 0377
to ensure that it is positive;
otherwise sign extension may make it negative.
(The constant
.UL 0377
is appropriate for the
.UC PDP -11
but not necessarily for other machines.)
.PP
The second version of
.UL getchar
does input in big chunks,
and hands out the characters one at a time.
.P1
#define CMASK 0377 /* for making char's > 0 */
#define BUFSIZE 512
getchar() /* buffered version */
{
    static char buf[BUFSIZE];
    static char *bufp = buf;
    static int n = 0;
    if (n == 0) { /* buffer is empty */
        n = read(0, buf, BUFSIZE);
        bufp = buf;
    }
    return((--n >= 0) ? *bufp++ & CMASK : EOF);
}
.P2
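.PP
The same buffering idea can be sketched for an arbitrary descriptor.
The version below is an illustration under modern C assumptions, not the original routine:
the state lives in a hypothetical
.UL "struct reader"
instead of static variables, and the buffer is
.UL "unsigned char" ,
which makes the mask unnecessary.

```c
#include <unistd.h>

#define RBUFSIZE 512

struct reader {
    int fd;                       /* descriptor to read from */
    int n;                        /* bytes left in buf */
    unsigned char *p;             /* next byte to hand out */
    unsigned char buf[RBUFSIZE];
};

void reader_init(struct reader *r, int fd)
{
    r->fd = fd;
    r->n = 0;
    r->p = r->buf;
}

/* Return the next byte as 0..255, or -1 at end of file or on error. */
int reader_getc(struct reader *r)
{
    if (r->n <= 0) {              /* buffer is empty: refill it */
        r->n = (int)read(r->fd, r->buf, RBUFSIZE);
        r->p = r->buf;
        if (r->n <= 0) {
            r->n = 0;
            return -1;
        }
    }
    r->n--;
    return *r->p++;
}
```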
.NH 2
Open, Creat, Close, Unlink
.PP
Other than the default
standard input, output and error files,
you must explicitly open files in order to
read or write them.
There are two system entry points for this,
.UL open
and
.UL creat
[sic].
.PP
.UL open
is rather like the
.UL fopen
discussed in the previous section,
except that instead of returning a file pointer,
it returns a file descriptor,
which is just an
.UL int .
.P1
int fd;
fd = open(name, rwmode);
.P2
As with
.UL fopen ,
the
.UL name
argument
is a character string corresponding to the external file name.
The access mode argument
is different, however:
.UL rwmode
is 0 for read, 1 for write, and 2 for read and write access.
.UL open
returns
.UL -1
if any error occurs;
otherwise it returns a valid file descriptor.
.PP
It is an error to
try to
.UL open
a file that does not exist.
The entry point
.UL creat
is provided to create new files,
or to re-write old ones.
.P1
fd = creat(name, pmode);
.P2
returns a file descriptor
if it was able to create the file
called
.UL name ,
and
.UL -1
if not.
If the file
already exists,
.UL creat
will truncate it to zero length;
it is not an error to
.UL creat
a file that already exists.
.PP
If the file is brand new,
.UL creat
creates it with the
.ul
protection mode
specified by
the
.UL pmode
argument.
In the
.UC UNIX
file system,
there are nine bits of protection information
associated with a file,
controlling read, write and execute permission for
the owner of the file,
for the owner's group,
and for all others.
Thus a three-digit octal number
is most convenient for specifying the permissions.
For example,
0755
specifies read, write and execute permission for the owner,
and read and execute permission for the group and everyone else.
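.PP
To make the nine bits concrete, here is a small sketch that spells out
an octal mode in the familiar letter form.
The function name
.UL mode_string
is hypothetical; only the bit layout comes from the text above.

```c
/* Fill out[10] with the "rwxr-xr-x" form of the low nine mode bits.
   Bit 0400 is owner read, 0200 owner write, and so on down to 01. */
void mode_string(unsigned mode, char out[10])
{
    const char *letters = "rwxrwxrwx";
    int i;

    for (i = 0; i < 9; i++)
        out[i] = (mode & (0400u >> i)) ? letters[i] : '-';
    out[9] = '\0';
}
```

.PP
With this sketch, 0755 comes out as rwxr-xr-x and 0644 as rw-r--r--.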
.PP
To illustrate,
here is a simplified version of
the
.UC UNIX
utility
.IT cp ,
a program which copies one file to another.
(The main simplification is that our version
copies only one file,
and does not permit the second argument
to be a directory.)
.P1
#define NULL 0
#define BUFSIZE 512
#define PMODE 0644 /* RW for owner, R for group, others */
main(argc, argv) /* cp: copy f1 to f2 */
int argc;
char *argv[];
{
    int f1, f2, n;
    char buf[BUFSIZE];
    if (argc != 3)
        error("Usage: cp from to", NULL);
    if ((f1 = open(argv[1], 0)) == -1)
        error("cp: can't open %s", argv[1]);
    if ((f2 = creat(argv[2], PMODE)) == -1)
        error("cp: can't create %s", argv[2]);
    while ((n = read(f1, buf, BUFSIZE)) > 0)
        if (write(f2, buf, n) != n)
            error("cp: write error", NULL);
    exit(0);
}
.P2
.P1
error(s1, s2) /* print error message and die */
char *s1, *s2;
{
    printf(s1, s2);
    printf("\en");
    exit(1);
}
.P2
.PP
As we said earlier,
there is a limit (typically 15-25)
on the number of files which a program
may have open simultaneously.
Accordingly, any program which intends to process
many files must be prepared to re-use
file descriptors.
The routine
.UL close
breaks the connection between a file descriptor
and an open file,
and frees the
file descriptor for use with some other file.
Termination of a program
via
.UL exit
or return from the main program closes all open files.
.PP
The function
.UL unlink(filename)
removes the file
.UL filename
from the file system.
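.PP
A short sketch of the create, write, close cycle follows, assuming a modern POSIX system
where
.UL creat
is declared in <fcntl.h>.
The helper name
.UL save
is hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* Create (or truncate) path, write n bytes, and close it again.
   Returns 0 on success, -1 on any failure. */
int save(const char *path, const char *data, size_t n)
{
    int fd = creat(path, 0644);

    if (fd == -1)
        return -1;
    if (write(fd, data, n) != (ssize_t)n) {
        close(fd);          /* free the descriptor even on failure */
        return -1;
    }
    return close(fd);
}
```

.PP
Once the file has been removed with
.UL unlink ,
a later attempt to
.UL open
the same name for reading is expected to fail with -1.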
.NH 2
Random Access \(em Seek and Lseek
.PP
File I/O is normally sequential:
each
.UL read
or
.UL write
takes place at a position in the file
right after the previous one.
When necessary, however,
a file can be read or written in any arbitrary order.
The
system call
.UL lseek
provides a way to move around in
a file without actually reading
or writing:
.P1
lseek(fd, offset, origin);
.P2
forces the current position in the file
whose descriptor is
.UL fd
to move to position
.UL offset ,
which is taken relative to the location
specified by
.UL origin .
Subsequent reading or writing will begin at that position.
.UL offset
is
a
.UL long ;
.UL fd
and
.UL origin
are
.UL int 's.
.UL origin
can be 0, 1, or 2 to specify that
.UL offset
is to be
measured from
the beginning, from the current position, or from the
end of the file respectively.
For example,
to append to a file,
seek to the end before writing:
.P1
lseek(fd, 0L, 2);
.P2
To get back to the beginning (``rewind''),
.P1
lseek(fd, 0L, 0);
.P2
Notice the
.UL 0L
argument;
it could also be written as
.UL (long)\ 0 .
.PP
With
.UL lseek ,
it is possible to treat files more or less like large arrays,
at the price of slower access.
For example, the following simple function reads any number of bytes
from any arbitrary place in a file.
.P1
get(fd, pos, buf, n) /* read n bytes from position pos */
int fd, n;
long pos;
char *buf;
{
    lseek(fd, pos, 0); /* get to pos */
    return(read(fd, buf, n));
}
.P2
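.PP
For comparison, a present-day rendering of the same idea might look like the sketch below.
It assumes POSIX types
.UL off_t
and
.UL ssize_t
and the symbolic origin
.UL SEEK_SET
(which has the value 0), and it checks the seek before reading.
The name
.UL get_at
is hypothetical.

```c
#include <sys/types.h>
#include <unistd.h>

/* Read up to n bytes starting at byte offset pos.
   Returns read()'s result, or -1 if the seek itself fails
   (for example, on a pipe). */
ssize_t get_at(int fd, off_t pos, void *buf, size_t n)
{
    if (lseek(fd, pos, SEEK_SET) == (off_t)-1)
        return -1;
    return read(fd, buf, n);
}
```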
.PP
In pre-version 7
.UC UNIX ,
the basic entry point to the I/O system
is called
.UL seek .
.UL seek
is identical to
.UL lseek ,
except that its
.UL offset
argument is an
.UL int
rather than a
.UL long .
Accordingly,
since
.UC PDP -11
integers have only 16 bits,
the
.UL offset
specified
for
.UL seek
is limited to 65,535;
for this reason,
.UL origin
values of 3, 4, 5 cause
.UL seek
to multiply the given offset by 512
(the number of bytes in one physical block)
and then interpret
.UL origin
as if it were 0, 1, or 2 respectively.
Thus to get to an arbitrary place in a large file
requires two seeks, first one which selects
the block, then one which
has
.UL origin
equal to 1 and moves to the desired byte within the block.
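.PP
The arithmetic behind the two-step seek can be sketched as follows.
The helper
.UL split_pos
is hypothetical; the old
.UL seek
call itself appears only in a comment, since it does not exist on current systems.

```c
#define BLOCK 512L   /* bytes in one physical block, as in the text */

/* Split a byte position into a block number and a byte offset
   within that block, the two values the old interface needed. */
void split_pos(long pos, int *block, int *within)
{
    *block = (int)(pos / BLOCK);
    *within = (int)(pos % BLOCK);
}

/* With the results for pos, the historical calls would have been:
 *     seek(fd, block, 3);    block * 512 bytes from the beginning
 *     seek(fd, within, 1);   then forward within the block
 */
```

.PP
For byte position 100000 this gives block 195 and offset 160,
since 195 times 512 is 99840.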
.NH 2
Error Processing
.PP
The routines discussed in this section,
and in fact all the routines which are direct entries into the system
can incur errors.
Usually they indicate an error by returning a value of \-1.
Sometimes it is nice to know what sort of error occurred;
for this purpose all these routines, when appropriate,
leave an error number in the external cell
.UL errno .
The meanings of the various error numbers are
listed
in the introduction to Section II
of the
.I
.UC UNIX
Programmer's Manual,
.R
so your program can, for example, determine if
an attempt to open a file failed because it did not exist
or because the user lacked permission to read it.
Perhaps more commonly,
you may want to print out the
reason for failure.
The routine
.UL perror
will print a message associated with the value
of
.UL errno ;
more generally,
.UL sys\_errlist
is an array of character strings which can be indexed
by
.UL errno
and printed by your program.