PageRenderTime 63ms CodeModel.GetById 32ms RepoModel.GetById 1ms app.codeStats 0ms

/doc/coding-style.txt

https://github.com/james-w/haproxy
Plain Text | 1251 lines | 960 code | 291 blank | 0 comment | 0 complexity | bf5a52d38e98059820857ee3c801c5bf MD5 | raw file
Possible License(s): LGPL-2.1, GPL-2.0
  1. 2011/12/30 - HAProxy coding style - Willy Tarreau <w@1wt.eu>
  2. ------------------------------------------------------------
  3. A number of contributors are often embarrassed with coding style issues, they
  4. don't always know if they're doing it right, especially since the coding style
  5. has elvoved along the years. What is explained here is not necessarily what is
  6. applied in the code, but new code should as much as possible conform to this
  7. style. Coding style fixes happen when code is replaced. It is useless to send
  8. patches to fix coding style only, they will be rejected, unless they belong to
  9. a patch series which needs these fixes prior to get code changes. Also, please
  10. avoid fixing coding style in the same patches as functional changes, they make
  11. code review harder.
  12. When modifying a file, you must accept the terms of the license of this file
  13. which is recalled at the top of the file, or is explained in the LICENSE file,
  14. or if not stated, defaults to LGPL version 2.1 or later for files in the
  15. 'include' directory, and GPL version 2 or later for all other files.
  16. When adding a new file, you must add a copyright banner at the top of the
  17. file with your real name, e-mail address and a reminder of the license.
  18. Contributions under incompatible licenses or too restrictive licenses might
  19. get rejected. If in doubt, please apply the principle above for existing files.
  20. All code examples below will intentionally be prefixed with " | " to mark
  21. where the code aligns with the first column, and tabs in this document will be
  22. represented as a series of 8 spaces so that it displays the same everywhere.
  23. 1) Indentation and alignment
  24. ----------------------------
  25. 1.1) Indentation
  26. ----------------
  27. Indentation and alignment are two completely different things that people often
  28. get wrong. Indentation is used to mark a sub-level in the code. A sub-level
  29. means that a block is executed in the context of another block (eg: a function
  30. or a condition) :
  31. | main(int argc, char **argv)
  32. | {
  33. | int i;
  34. |
  35. | if (argc < 2)
  36. | exit(1);
  37. | }
  38. In the example above, the code belongs to the main() function and the exit()
  39. call belongs to the if statement. Indentation is made with tabs (\t, ASCII 9),
  40. which allows any developer to configure their preferred editor to use their
  41. own tab size and to still get the text properly indented. Exactly one tab is
  42. used per sub-level. Tabs may only appear at the beginning of a line or after
  43. another tab. It is illegal to put a tab after some text, as it mangles displays
  44. in a different manner for different users (particularly when used to align
  45. comments or values after a #define). If you're tempted to put a tab after some
  46. text, then you're doing it wrong and you need alignment instead (see below).
  47. Note that there are places where the code was not properly indented in the
  48. past. In order to view it correctly, you may have to set your tab size to 8
  49. characters.
  50. 1.2) Alignment
  51. --------------
  52. Alignment is used to continue a line in a way to makes things easier to group
  53. together. By definition, alignment is character-based, so it uses spaces. Tabs
  54. would not work because for one tab there would not be as many characters on all
  55. displays. For instance, the arguments in a function declaration may be broken
  56. into multiple lines using alignment spaces :
  57. | int http_header_match2(const char *hdr, const char *end,
  58. | const char *name, int len)
  59. | {
  60. | ...
  61. | }
  62. In this example, the "const char *name" part is aligned with the first
  63. character of the group it belongs to (list of function arguments). Placing it
  64. here makes it obvious that it's one of the function's arguments. Multiple lines
  65. are easy to handle this way. This is very common with long conditions too :
  66. | if ((len < eol - sol) &&
  67. | (sol[len] == ':') &&
  68. | (strncasecmp(sol, name, len) == 0)) {
  69. | ctx->del = len;
  70. | }
  71. If we take again the example above marking tabs with "[-Tabs-]" and spaces
  72. with "#", we get this :
  73. | [-Tabs-]if ((len < eol - sol) &&
  74. | [-Tabs-]####(sol[len] == ':') &&
  75. | [-Tabs-]####(strncasecmp(sol, name, len) == 0)) {
  76. | [-Tabs-][-Tabs-]ctx->del = len;
  77. | [-Tabs-]}
  78. It is worth noting that some editors tend to confuse indentations and aligment.
  79. Emacs is notoriously known for this brokenness, and is responsible for almost
  80. all of the alignment mess. The reason is that Emacs only counts spaces, tries
  81. to fill as many as possible with tabs and completes with spaces. Once you know
  82. it, you just have to be careful, as alignment is not used much, so generally it
  83. is just a matter of replacing the last tab with 8 spaces when this happens.
  84. Indentation should be used everywhere there is a block or an opening brace. It
  85. is not possible to have two consecutive closing braces on the same column, it
  86. means that the innermost was not indented.
  87. Right :
  88. | main(int argc, char **argv)
  89. | {
  90. | if (argc > 1) {
  91. | printf("Hello\n");
  92. | }
  93. | exit(0);
  94. | }
  95. Wrong :
  96. | main(int argc, char **argv)
  97. | {
  98. | if (argc > 1) {
  99. | printf("Hello\n");
  100. | }
  101. | exit(0);
  102. | }
  103. A special case applies to switch/case statements. Due to my editor's settings,
  104. I've been used to align "case" with "switch" and to find it somewhat logical
  105. since each of the "case" statements opens a sublevel belonging to the "switch"
  106. statement. But indenting "case" after "switch" is accepted too. However in any
  107. case, whatever follows the "case" statement must be indented, whether or not it
  108. contains braces :
  109. | switch (*arg) {
  110. | case 'A': {
  111. | int i;
  112. | for (i = 0; i < 10; i++)
  113. | printf("Please stop pressing 'A'!\n");
  114. | break;
  115. | }
  116. | case 'B':
  117. | printf("You pressed 'B'\n");
  118. | break;
  119. | case 'C':
  120. | case 'D':
  121. | printf("You pressed 'C' or 'D'\n");
  122. | break;
  123. | default:
  124. | printf("I don't know what you pressed\n");
  125. | }
  126. 2) Braces
  127. ---------
  128. Braces are used to delimit multiple-instruction blocks. In general it is
  129. preferred to avoid braces around single-instruction blocks as it reduces the
  130. number of lines :
  131. Right :
  132. | if (argc >= 2)
  133. | exit(0);
  134. Wrong :
  135. | if (argc >= 2) {
  136. | exit(0);
  137. | }
  138. But it is not that strict, it really depends on the context. It happens from
  139. time to time that single-instruction blocks are enclosed within braces because
  140. it makes the code more symmetrical, or more readable. Example :
  141. | if (argc < 2) {
  142. | printf("Missing argument\n");
  143. | exit(1);
  144. | } else {
  145. | exit(0);
  146. | }
  147. Braces are always needed to declare a function. A function's opening brace must
  148. be placed at the beginning of the next line :
  149. Right :
  150. | int main(int argc, char **argv)
  151. | {
  152. | exit(0);
  153. | }
  154. Wrong :
  155. | int main(int argc, char **argv) {
  156. | exit(0);
  157. | }
  158. Note that a large portion of the code still does not conforms to this rule, as
  159. it took years to me to adapt to this more common standard which I now tend to
  160. prefer, as it avoids visual confusion when function declarations are broken on
  161. multiple lines :
  162. Right :
  163. | int foo(const char *hdr, const char *end,
  164. | const char *name, const char *err,
  165. | int len)
  166. | {
  167. | int i;
  168. Wrong :
  169. | int foo(const char *hdr, const char *end,
  170. | const char *name, const char *err,
  171. | int len) {
  172. | int i;
  173. Braces should always be used where there might be an ambiguity with the code
  174. later. The most common example is the stacked "if" statement where an "else"
  175. may be added later at the wrong place breaking the code, but it also happens
  176. with comments or long arguments in function calls. In general, if a block is
  177. more than one line long, it should use braces.
  178. Dangerous code waiting of a victim :
  179. | if (argc < 2)
  180. | /* ret must not be negative here */
  181. | if (ret < 0)
  182. | return -1;
  183. Wrong change :
  184. | if (argc < 2)
  185. | /* ret must not be negative here */
  186. | if (ret < 0)
  187. | return -1;
  188. | else
  189. | return 0;
  190. It will do this instead of what your eye seems to tell you :
  191. | if (argc < 2)
  192. | /* ret must not be negative here */
  193. | if (ret < 0)
  194. | return -1;
  195. | else
  196. | return 0;
  197. Right :
  198. | if (argc < 2) {
  199. | /* ret must not be negative here */
  200. | if (ret < 0)
  201. | return -1;
  202. | }
  203. | else
  204. | return 0;
  205. Similarly dangerous example :
  206. | if (ret < 0)
  207. | /* ret must not be negative here */
  208. | complain();
  209. | init();
  210. Wrong change to silent the annoying message :
  211. | if (ret < 0)
  212. | /* ret must not be negative here */
  213. | //complain();
  214. | init();
  215. ... which in fact means :
  216. | if (ret < 0)
  217. | init();
  218. 3) Breaking lines
  219. -----------------
  220. There is no strict rule for line breaking. Some files try to stick to the 80
  221. column limit, but given that various people use various tab sizes, it does not
  222. make much sense. Also, code is sometimes easier to read with less lines, as it
  223. represents less surface on the screen (since each new line adds its tabs and
  224. spaces). The rule is to stick to the average line length of other lines. If you
  225. are working in a file which fits in 80 columns, try to keep this goal in mind.
  226. If you're in a function with 120-chars lines, there is no reason to add many
  227. short lines, so you can make longer lines.
  228. In general, opening a new block should lead to a new line. Similarly, multiple
  229. instructions should be avoided on the same line. But some constructs make it
  230. more readable when those are perfectly aligned :
  231. A copy-paste bug in the following construct will be easier to spot :
  232. | if (omult % idiv == 0) { omult /= idiv; idiv = 1; }
  233. | if (idiv % omult == 0) { idiv /= omult; omult = 1; }
  234. | if (imult % odiv == 0) { imult /= odiv; odiv = 1; }
  235. | if (odiv % imult == 0) { odiv /= imult; imult = 1; }
  236. than in this one :
  237. | if (omult % idiv == 0) {
  238. | omult /= idiv;
  239. | idiv = 1;
  240. | }
  241. | if (idiv % omult == 0) {
  242. | idiv /= omult;
  243. | omult = 1;
  244. | }
  245. | if (imult % odiv == 0) {
  246. | imult /= odiv;
  247. | odiv = 1;
  248. | }
  249. | if (odiv % imult == 0) {
  250. | odiv /= imult;
  251. | imult = 1;
  252. | }
  253. What is important is not to mix styles. For instance there is nothing wrong
  254. with having many one-line "case" statements as long as most of them are this
  255. short like below :
  256. | switch (*arg) {
  257. | case 'A': ret = 1; break;
  258. | case 'B': ret = 2; break;
  259. | case 'C': ret = 4; break;
  260. | case 'D': ret = 8; break;
  261. | default : ret = 0; break;
  262. | }
  263. Otherwise, prefer to have the "case" statement on its own line as in the
  264. example in section 1.2 about alignment. In any case, avoid to stack multiple
  265. control statements on the same line, so that it will never be the needed to
  266. add two tab levels at once :
  267. Right :
  268. | switch (*arg) {
  269. | case 'A':
  270. | if (ret < 0)
  271. | ret = 1;
  272. | break;
  273. | default : ret = 0; break;
  274. | }
  275. Wrong :
  276. | switch (*arg) {
  277. | case 'A': if (ret < 0)
  278. | ret = 1;
  279. | break;
  280. | default : ret = 0; break;
  281. | }
  282. Right :
  283. | if (argc < 2)
  284. | if (ret < 0)
  285. | return -1;
  286. or Right :
  287. | if (argc < 2)
  288. | if (ret < 0) return -1;
  289. but Wrong :
  290. | if (argc < 2) if (ret < 0) return -1;
  291. When complex conditions or expressions are broken into multiple lines, please
  292. do ensure that alignment is perfectly appropriate, and group all main operators
  293. on the same side (which you're free to choose as long as it does not change for
  294. every block. Putting binary operators on the right side is preferred as it does
  295. not mangle with alignment but various people have their preferences.
  296. Right :
  297. | if ((txn->flags & TX_NOT_FIRST) &&
  298. | ((req->flags & BF_FULL) ||
  299. | req->r < req->lr ||
  300. | req->r > req->data + req->size - global.tune.maxrewrite)) {
  301. | return 0;
  302. | }
  303. Right :
  304. | if ((txn->flags & TX_NOT_FIRST)
  305. | && ((req->flags & BF_FULL)
  306. | || req->r < req->lr
  307. | || req->r > req->data + req->size - global.tune.maxrewrite)) {
  308. | return 0;
  309. | }
  310. Wrong :
  311. | if ((txn->flags & TX_NOT_FIRST) &&
  312. | ((req->flags & BF_FULL) ||
  313. | req->r < req->lr
  314. | || req->r > req->data + req->size - global.tune.maxrewrite)) {
  315. | return 0;
  316. | }
  317. If it makes the result more readable, parenthesis may even be closed on their
  318. own line in order to align with the opening one. Note that should normally not
  319. be needed because such code would be too complex to be digged into.
  320. The "else" statement may either be merged with the closing "if" brace or lie on
  321. its own line. The later is preferred but it adds one extra line to each control
  322. block which is annoying in short ones. However, if the "else" is followed by an
  323. "if", then it should really be on its own line and the rest of the if/else
  324. blocks must follow the same style.
  325. Right :
  326. | if (a < b) {
  327. | return a;
  328. | }
  329. | else {
  330. | return b;
  331. | }
  332. Right :
  333. | if (a < b) {
  334. | return a;
  335. | } else {
  336. | return b;
  337. | }
  338. Right :
  339. | if (a < b) {
  340. | return a;
  341. | }
  342. | else if (a != b) {
  343. | return b;
  344. | }
  345. | else {
  346. | return 0;
  347. | }
  348. Wrong :
  349. | if (a < b) {
  350. | return a;
  351. | } else if (a != b) {
  352. | return b;
  353. | } else {
  354. | return 0;
  355. | }
  356. Wrong :
  357. | if (a < b) {
  358. | return a;
  359. | }
  360. | else if (a != b) {
  361. | return b;
  362. | } else {
  363. | return 0;
  364. | }
  365. 4) Spacing
  366. ----------
  367. Correctly spacing code is very important. When you have to spot a bug at 3am,
  368. you need it to be clear. When you expect other people to review your code, you
  369. want it to be clear and don't want them to get nervous when trying to find what
  370. you did.
  371. Always place spaces around all binary or ternary operators, commas, as well as
  372. after semi-colons and opening braces if the line continues :
  373. Right :
  374. | int ret = 0;
  375. | /* if (x >> 4) { x >>= 4; ret += 4; } */
  376. | ret += (x >> 4) ? (x >>= 4, 4) : 0;
  377. | val = ret + ((0xFFFFAA50U >> (x << 1)) & 3) + 1;
  378. Wrong :
  379. | int ret=0;
  380. | /* if (x>>4) {x>>=4;ret+=4;} */
  381. | ret+=(x>>4)?(x>>=4,4):0;
  382. | val=ret+((0xFFFFAA50U>>(x<<1))&3)+1;
  383. Never place spaces after unary operators (&, *, -, !, ~, ++, --) nor cast, as
  384. they might be confused with they binary counterpart, nor before commas or
  385. semicolons :
  386. Right :
  387. | bit = !!(~len++ ^ -(unsigned char)*x);
  388. Wrong :
  389. | bit = ! ! (~len++ ^ - (unsigned char) * x) ;
  390. Note that "sizeof" is a unary operator which is sometimes considered as a
  391. langage keyword, but in no case it is a function. It does not require
  392. parenthesis so it is sometimes followed by spaces and sometimes not when
  393. there are no parenthesis. Most people do not really care as long as what
  394. is written is unambiguous.
  395. Braces opening a block must be preceeded by one space unless the brace is
  396. placed on the first column :
  397. Right :
  398. | if (argc < 2) {
  399. | }
  400. Wrong :
  401. | if (argc < 2){
  402. | }
  403. Do not add unneeded spaces inside parenthesis, they just make the code less
  404. readable.
  405. Right :
  406. | if (x < 4 && (!y || !z))
  407. | break;
  408. Wrong :
  409. | if ( x < 4 && ( !y || !z ) )
  410. | break;
  411. Language keywords must all be followed by a space. This is true for control
  412. statements (do, for, while, if, else, return, switch, case), and for types
  413. (int, char, unsigned). As an exception, the last type in a cast does not take
  414. a space before the closing parenthesis). The "default" statement in a "switch"
  415. construct is generally just followed by the colon. However the colon after a
  416. "case" or "default" statement must be followed by a space.
  417. Right :
  418. | if (nbargs < 2) {
  419. | printf("Missing arg at %c\n", *(char *)ptr);
  420. | for (i = 0; i < 10; i++) beep();
  421. | return 0;
  422. | }
  423. | switch (*arg) {
  424. Wrong :
  425. | if(nbargs < 2){
  426. | printf("Missing arg at %c\n", *(char*)ptr);
  427. | for(i = 0; i < 10; i++)beep();
  428. | return 0;
  429. | }
  430. | switch(*arg) {
  431. Function calls are different, the opening parenthesis is always coupled to the
  432. function name without any space. But spaces are still needed after commas :
  433. Right :
  434. | if (!init(argc, argv))
  435. | exit(1);
  436. Wrong :
  437. | if (!init (argc,argv))
  438. | exit(1);
  439. 5) Excess or lack of parenthesis
  440. --------------------------------
  441. Sometimes there are too many parenthesis in some formulas, sometimes there are
  442. too few. There are a few rules of thumb for this. The first one is to respect
  443. the compiler's advice. If it emits a warning and asks for more parenthesis to
  444. avoid confusion, follow the advice at least to shut the warning. For instance,
  445. the code below is quite ambiguous due to its alignment :
  446. | if (var1 < 2 || var2 < 2 &&
  447. | var3 != var4) {
  448. | /* fail */
  449. | return -3;
  450. | }
  451. Note that this code does :
  452. | if (var1 < 2 || (var2 < 2 && var3 != var4)) {
  453. | /* fail */
  454. | return -3;
  455. | }
  456. But maybe the author meant :
  457. | if ((var1 < 2 || var2 < 2) && var3 != var4) {
  458. | /* fail */
  459. | return -3;
  460. | }
  461. A second rule to put parenthesis is that people don't always know operators
  462. precedence too well. Most often they have no issue with operators of the same
  463. category (eg: booleans, integers, bit manipulation, assignment) but once these
  464. operators are mixed, it causes them all sort of issues. In this case, it is
  465. wise to use parenthesis to avoid errors. One common error concerns the bit
  466. shift operators because they're used to replace multiplies and divides but
  467. don't have the same precedence :
  468. The expression :
  469. | x = y * 16 + 5;
  470. becomes :
  471. | x = y << 4 + 5;
  472. which is wrong because it is equivalent to :
  473. | x = y << (4 + 5);
  474. while the following was desired instead :
  475. | x = (y << 4) + 5;
  476. It is generally fine to write boolean expressions based on comparisons without
  477. any parenthesis. But on top of that, integer expressions and assignments should
  478. then be protected. For instance, there is an error in the expression below
  479. which should be safely rewritten :
  480. Wrong :
  481. | if (var1 > 2 && var1 < 10 ||
  482. | var1 > 2 + 256 && var2 < 10 + 256 ||
  483. | var1 > 2 + 1 << 16 && var2 < 10 + 2 << 16)
  484. | return 1;
  485. Right (may remove a few parenthesis depending on taste) :
  486. | if ((var1 > 2 && var1 < 10) ||
  487. | (var1 > (2 + 256) && var2 < (10 + 256)) ||
  488. | (var1 > (2 + (1 << 16)) && var2 < (10 + (1 << 16))))
  489. | return 1;
  490. The "return" statement is not a function, so it takes no argument. It is a
  491. control statement which is followed by the expression to be returned. It does
  492. not need to be followed by parenthesis :
  493. Wrong :
  494. | int ret0()
  495. | {
  496. | return(0);
  497. | }
  498. Right :
  499. | int ret0()
  500. | {
  501. | return 0;
  502. | }
  503. Parenthesisis are also found in type casts. Type casting should be avoided as
  504. much as possible, especially when it concerns pointer types. Casting a pointer
  505. disables the compiler's type checking and is the best way to get caught doing
  506. wrong things with data not the size you expect. If you need to manipulate
  507. multiple data types, you can use a union instead. If the union is really not
  508. convenient and casts are easier, then try to isolate them as much as possible,
  509. for instance when initializing function arguments or in another function. Not
  510. proceeding this way causes huge risks of not using the proper pointer without
  511. any notification, which is especially true during copy-pastes.
  512. Wrong :
  513. | void *check_private_data(void *arg1, void *arg2)
  514. | {
  515. | char *area;
  516. |
  517. | if (*(int *)arg1 > 1000)
  518. | return NULL;
  519. | if (memcmp(*(const char *)arg2, "send(", 5) != 0))
  520. | return NULL;
  521. | area = malloc(*(int *)arg1);
  522. | if (!area)
  523. | return NULL;
  524. | memcpy(area, *(const char *)arg2 + 5, *(int *)arg1);
  525. | return area;
  526. | }
  527. Right :
  528. | void *check_private_data(void *arg1, void *arg2)
  529. | {
  530. | char *area;
  531. | int len = *(int *)arg1;
  532. | const char *msg = arg2;
  533. |
  534. | if (len > 1000)
  535. | return NULL;
  536. | if (memcmp(msg, "send(", 5) != 0)
  537. | return NULL;
  538. | area = malloc(len);
  539. | if (!area)
  540. | return NULL;
  541. | memcpy(area, msg + 5, len);
  542. | return area;
  543. | }
  544. 6) Ambiguous comparisons with zero or NULL
  545. ------------------------------------------
  546. In C, '0' has no type, or it has the type of the variable it is assigned to.
  547. Comparing a variable or a return value with zero means comparing with the
  548. representation of zero for this variable's type. For a boolean, zero is false.
  549. For a pointer, zero is NULL. Very often, to make things shorter, it is fine to
  550. use the '!' unary operator to compare with zero, as it is shorter and easier to
  551. remind or understand than a plain '0'. Since the '!' operator is read "not", it
  552. helps read code faster when what follows it makes sense as a boolean, and it is
  553. often much more appropriate than a comparison with zero which makes an equal
  554. sign appear at an undesirable place. For instance :
  555. | if (!isdigit(*c) && !isspace(*c))
  556. | break;
  557. is easier to understand than :
  558. | if (isdigit(*c) == 0 && isspace(*c) == 0)
  559. | break;
  560. For a char this "not" operator can be reminded as "no remaining char", and the
  561. absence of comparison to zero implies existence of the tested entity, hence the
  562. simple strcpy() implementation below which automatically stops once the last
  563. zero is copied :
  564. | void my_strcpy(char *d, const char *s)
  565. | {
  566. | while ((*d++ = *s++));
  567. | }
  568. Note the double parenthesis in order to avoid the compiler telling us it looks
  569. like an equality test.
  570. For a string or more generally any pointer, this test may be understood as an
  571. existence test or a validity test, as the only pointer which will fail to
  572. validate equality is the NULL pointer :
  573. | area = malloc(1000);
  574. | if (!area)
  575. | return -1;
  576. However sometimes it can fool the reader. For instance, strcmp() precisely is
  577. one of such functions whose return value can make one think the opposite due to
  578. its name which may be understood as "if strings compare...". Thus it is strongly
  579. recommended to perform an explicit comparison with zero in such a case, and it
  580. makes sense considering that the comparison's operator is the same that is
  581. wanted to compare the strings (note that current config parser lacks a lot in
  582. this regards) :
  583. strcmp(a, b) == 0 <=> a == b
  584. strcmp(a, b) != 0 <=> a != b
  585. strcmp(a, b) < 0 <=> a < b
  586. strcmp(a, b) > 0 <=> a > b
  587. Avoid this :
  588. | if (strcmp(arg, "test"))
  589. | printf("this is not a test\n");
  590. |
  591. | if (!strcmp(arg, "test"))
  592. | printf("this is a test\n");
  593. Prefer this :
  594. | if (strcmp(arg, "test") != 0)
  595. | printf("this is not a test\n");
  596. |
  597. | if (strcmp(arg, "test") == 0)
  598. | printf("this is a test\n");
  599. 7) System call returns
  600. ----------------------
  601. This is not directly a matter of coding style but more of bad habits. It is
  602. important to check for the correct value upon return of syscalls. The proper
  603. return code indicating an error is described in its man page. There is no
  604. reason to consider wider ranges than what is indicated. For instance, it is
  605. common to see such a thing :
  606. | if ((fd = open(file, O_RDONLY)) < 0)
  607. | return -1;
  608. This is wrong. The man page says that -1 is returned if an error occured. It
  609. does not suggest that any other negative value will be an error. It is possible
  610. that a few such issues have been left in existing code. They are bugs for which
  611. fixes are accepted, eventhough they're currently harmless since open() is not
  612. known for returning negative values at the moment.
  613. 8) Declaring new types, names and values
  614. ----------------------------------------
  615. Please refrain from using "typedef" to declare new types, they only obfuscate
  616. the code. The reader never knows whether he's manipulating a scalar type or a
  617. struct. For instance it is not obvious why the following code fails to build :
  618. | int delay_expired(timer_t exp, timer_us_t now)
  619. | {
  620. | return now >= exp;
  621. | }
  622. With the types declared in another file this way :
  623. | typedef unsigned int timer_t;
  624. | typedef struct timeval timer_us_t;
  625. This cannot work because we're comparing a scalar with a struct, which does
  626. not make sense. Without a typedef, the function would have been written this
  627. way without any ambiguity and would not have failed :
  628. | int delay_expired(unsigned int exp, struct timeval *now)
  629. | {
  630. | return now >= exp->tv_sec;
  631. | }
  632. Declaring special values may be done using enums. Enums are a way to define
  633. structured integer values which are related to each other. They are perfectly
  634. suited for state machines. While the first element is always assigned the zero
  635. value, not everybody knows that, especially people working with multiple
  636. languages all the day. For this reason it is recommended to explicitly force
  637. the first value even if it's zero. The last element should be followed by a
  638. comma if it is planned that new elements might later be added, this will make
  639. later patches shorter. Conversely, if the last element is placed in order to
  640. get the number of possible values, it must not be followed by a comma and must
  641. be preceeded by a comment :
  642. | enum {
  643. | first = 0,
  644. | second,
  645. | third,
  646. | fourth,
  647. | };
  648. | enum {
  649. | first = 0,
  650. | second,
  651. | third,
  652. | fourth,
  653. | /* nbvalues must always be placed last */
  654. | nbvalues
  655. | };
  656. Structure names should be short enough not to mangle function declarations,
  657. and explicit enough to avoid confusion (which is the most important thing).
  658. Wrong :
  659. | struct request_args { /* arguments on the query string */
  660. | char *name;
  661. | char *value;
  662. | struct misc_args *next;
  663. | };
  664. Right :
  665. | struct qs_args { /* arguments on the query string */
  666. | char *name;
  667. | char *value;
  668. | struct qs_args *next;
  669. | }
  670. When declaring new functions or structures, please do not use CamelCase, which
  671. is a style where upper and lower case are mixed in a single word. It causes a
  672. lot of confusion when words are composed from acronyms, because it's hard to
  673. stick to a rule. For instance, a function designed to generate an ISN (initial
  674. sequence number) for a TCP/IP connection could be called :
  675. - generateTcpipIsn()
  676. - generateTcpIpIsn()
  677. - generateTcpIpISN()
  678. - generateTCPIPISN()
  679. etc...
  680. None is right, none is wrong, these are just preferences which might change
  681. along the code. Instead, please use an underscore to separate words. Lowercase
  682. is preferred for the words, but if acronyms are upcased it's not dramatic. The
  683. real advantage of this method is that it creates unambiguous levels even for
  684. short names.
  685. Valid examples :
  686. - generate_tcpip_isn()
  687. - generate_tcp_ip_isn()
  688. - generate_TCPIP_ISN()
  689. - generate_TCP_IP_ISN()
  690. Another example is easy to understand when 3 arguments are involved in naming
  691. the function :
  692. Wrong (naming conflict) :
  693. | /* returns A + B * C */
  694. | int mulABC(int a, int b, int c)
  695. | {
  696. | return a + b * c;
  697. | }
  698. |
  699. | /* returns (A + B) * C */
  700. | int mulABC(int a, int b, int c)
  701. | {
  702. | return (a + b) * c;
  703. | }
  704. Right (unambiguous naming) :
  705. | /* returns A + B * C */
  706. | int mul_a_bc(int a, int b, int c)
  707. | {
  708. | return a + b * c;
  709. | }
  710. |
  711. | /* returns (A + B) * C */
  712. | int mul_ab_c(int a, int b, int c)
  713. | {
  714. | return (a + b) * c;
  715. | }
  716. Whenever you manipulate pointers, try to declare them as "const", as it will
  717. save you from many accidental misuses and will only cause warnings to be
  718. emitted when there is a real risk. In the examples below, it is possible to
  719. call my_strcpy() with a const string only in the first declaration. Note that
  720. people who ignore "const" are often the ones who cast a lot and who complain
  721. from segfaults when using strtok() !
  722. Right :
  723. | void my_strcpy(char *d, const char *s)
  724. | {
  725. | while ((*d++ = *s++));
  726. | }
  727. |
  728. | void say_hello(char *dest)
  729. | {
  730. | my_strcpy(dest, "hello\n");
  731. | }
  732. Wrong :
  733. | void my_strcpy(char *d, char *s)
  734. | {
  735. | while ((*d++ = *s++));
  736. | }
  737. |
  738. | void say_hello(char *dest)
  739. | {
  740. | my_strcpy(dest, "hello\n");
  741. | }
  742. 9) Getting macros right
  743. -----------------------
  744. It is very common for macros to do the wrong thing when used in a way their
  745. author did not have in mind. For this reason, macros must always be named with
  746. uppercase letters only. This is the only way to catch the developer's eye when
  747. using them, so that he double-checks whether he's taking risks or not. First,
  748. macros must never ever be terminated by a semi-colon, or they will close the
  749. wrong block once in a while. For instance, the following will cause a build
  750. error before the "else" due to the double semi-colon :
  751. Wrong :
  752. | #define WARN printf("warning\n");
  753. | ...
  754. | if (a < 0)
  755. | WARN;
  756. | else
  757. | a--;
  758. Right :
  759. | #define WARN printf("warning\n")
  760. If multiple instructions are needed, then use a do { } while (0) block, which
  761. is the only construct which respects *exactly* the semantics of a single
  762. instruction :
  763. | #define WARN do { printf("warning\n"); log("warning\n"); } while (0)
  764. | ...
  765. |
  766. | if (a < 0)
  767. | WARN;
  768. | else
  769. | a--;
  770. Second, do not put unprotected control statements in macros, they will
  771. definitely cause bugs :
  772. Wrong :
  773. | #define WARN if (verbose) printf("warning\n")
  774. | ...
  775. | if (a < 0)
  776. | WARN;
  777. | else
  778. | a--;
  779. Which is equivalent to the undesired form below :
  780. | if (a < 0)
  781. | if (verbose)
  782. | printf("warning\n");
  783. | else
  784. | a--;
  785. Right way to do it :
  786. | #define WARN do { if (verbose) printf("warning\n"); } while (0)
  787. | ...
  788. | if (a < 0)
  789. | WARN;
  790. | else
  791. | a--;
  792. Which is equivalent to :
  793. | if (a < 0)
  794. | do { if (verbose) printf("warning\n"); } while (0);
  795. | else
  796. | a--;
  797. Macro parameters must always be surrounded by parenthesis, and must never be
  798. duplicated in the same macro unless explicitly stated. Also, macros must not be
  799. defined with operators without surrounding parenthesis. The MIN/MAX macros are
  800. a pretty common example of multiple misuses, but this happens as early as when
  801. using bit masks. Most often, in case of any doubt, try to use inline functions
  802. instead.
  803. Wrong :
  804. | #define MIN(a, b) a < b ? a : b
  805. |
  806. | /* returns 2 * min(a,b) + 1 */
  807. | int double_min_p1(int a, int b)
  808. | {
  809. | return 2 * MIN(a, b) + 1;
  810. | }
  811. What this will do :
  812. | int double_min_p1(int a, int b)
  813. | {
  814. | return 2 * a < b ? a : b + 1;
  815. | }
  816. Which is equivalent to :
  817. | int double_min_p1(int a, int b)
  818. | {
  819. | return (2 * a) < b ? a : (b + 1);
  820. | }
  821. The first thing to fix is to surround the macro definition with parenthesis to
  822. avoid this mistake :
  823. | #define MIN(a, b) (a < b ? a : b)
  824. But this is still not enough, as can be seen in this example :
  825. | /* compares either a or b with c */
  826. | int min_ab_c(int a, int b, int c)
  827. | {
  828. | return MIN(a ? a : b, c);
  829. | }
  830. Which is equivalent to :
  831. | int min_ab_c(int a, int b, int c)
  832. | {
  833. | return (a ? a : b < c ? a ? a : b : c);
  834. | }
  835. Which in turn means a totally different thing due to precedence :
  836. | int min_ab_c(int a, int b, int c)
  837. | {
  838. | return (a ? a : ((b < c) ? (a ? a : b) : c));
  839. | }
  840. This can be fixed by surrounding *each* argument in the macro with parenthesis:
  841. | #define MIN(a, b) ((a) < (b) ? (a) : (b))
  842. But this is still not enough, as can be seen in this example :
  843. | int min_ap1_b(int a, int b)
  844. | {
  845. | return MIN(++a, b);
  846. | }
  847. Which is equivalent to :
  848. | int min_ap1_b(int a, int b)
  849. | {
  850. | return ((++a) < (b) ? (++a) : (b));
  851. | }
  852. Again, this is wrong because "a" is incremented twice if below b. The only way
  853. to fix this is to use a compound statement and to assign each argument exactly
  854. once to a local variable of the same type :
  855. | #define MIN(a, b) ({ typeof(a) __a = (a); typeof(b) __b = (b); \
  856. | ((__a) < (__b) ? (__a) : (__b)); \
  857. | })
  858. At this point, using static inline functions is much cleaner if a single type
  859. is to be used :
  860. | static inline int min(int a, int b)
  861. | {
  862. | return a < b ? a : b;
  863. | }
  864. 10) Includes
  865. ------------
  866. Includes are as much as possible listed in alphabetically ordered groups :
  867. - the libc-standard includes (those without any path component)
  868. - the includes more or less system-specific (sys/*, netinet/*, ...)
  869. - includes from the local "common" subdirectory
  870. - includes from the local "types" subdirectory
  871. - includes from the local "proto" subdirectory
  872. Each section is just visually delimited from the other ones using an empty
  873. line. The two first ones above may be merged into a single section depending on
  874. developer's preference. Please do not copy-paste include statements from other
  875. files. Having too many includes significantly increases build time and makes it
  876. hard to find which ones are needed later. Just include what you need and if
  877. possible in alphabetical order so that when something is missing, it becomes
  878. obvious where to look for it and where to add it.
  879. All files should include <common/config.h> because this is where build options
  880. are prepared.
  881. Header files are split in two directories ("types" and "proto") depending on
  882. what they provide. Types, structures, enums and #defines must go into the
  883. "types" directory. Function prototypes and inlined functions must go into the
  884. "proto" directory. This split is because of inlined functions which
  885. cross-reference types from other files, which cause a chicken-and-egg problem
  886. if the functions and types are declared at the same place.
  887. All headers which do not depend on anything currently go to the "common"
  888. subdirectory, but are equally well placed into the "proto" directory. It is
  889. possible that one day the "common" directory will disappear.
  890. Include files must be protected against multiple inclusion using the common
  891. #ifndef/#define/#endif trick with a tag derived from the include file and its
  892. location.
  893. 11) Comments
  894. ------------
  895. Comments are preferably of the standard 'C' form using /* */. The C++ form "//"
  896. are tolerated for very short comments (eg: a word or two) but should be avoided
  897. as much as possible. Multi-line comments are made with each intermediate line
  898. starting with a star aligned with the first one, as in this example :
  899. | /*
  900. | * This is a multi-line
  901. | * comment.
  902. | */
  903. If multiple code lines need a short comment, try to align them so that you can
  904. have multi-line sentences. This is rarely needed, only for really complex
  905. constructs.
  906. Do not tell what you're doing in comments, but explain why you're doing it if
  907. it seems not to be obvious. Also *do* indicate at the top of function what they
  908. accept and what they don't accept. For instance, strcpy() only accepts output
  909. buffers at least as large as the input buffer, and does not support any NULL
  910. pointer. There is nothing wrong with that if the caller knows it.
  911. Wrong use of comments :
  912. | int flsnz8(unsigned int x)
  913. | {
  914. | int ret = 0; /* initialize ret */
  915. | if (x >> 4) { x >>= 4; ret += 4; } /* add 4 to ret if needed */
  916. | return ret + ((0xFFFFAA50U >> (x << 1)) & 3) + 1; /* add ??? */
  917. | }
  918. | ...
  919. | bit = ~len + (skip << 3) + 9; /* update bit */
  920. Right use of comments :
  921. | /* This function returns the positoin of the highest bit set in the lowest
  922. | * byte of <x>, between 0 and 7. It only works if <x> is non-null. It uses
  923. | * a 32-bit value as a lookup table to return one of 4 values for the
  924. | * highest 16 possible 4-bit values.
  925. | */
  926. | int flsnz8(unsigned int x)
  927. | {
  928. | int ret = 0;
  929. | if (x >> 4) { x >>= 4; ret += 4; }
  930. | return ret + ((0xFFFFAA50U >> (x << 1)) & 3) + 1;
  931. | }
  932. | ...
  933. | bit = ~len + (skip << 3) + 9; /* (skip << 3) + (8 - len), saves 1 cycle */
  934. 12) Use of assembly
  935. -------------------
  936. There are many projects where use of assembly code is not welcome. There is no
  937. problem with use of assembly in haproxy, provided that :
  938. a) an alternate C-form is provided for architectures not covered
  939. b) the code is small enough and well commented enough to be maintained
  940. It is important to take care of various incompatibilities between compiler
  941. versions, for instance regarding output and cloberred registers. There are
  942. a number of documentations on the subject on the net. Anyway if you are
  943. fiddling with assembly, you probably know that already.
  944. Example :
  945. | /* gcc does not know when it can safely divide 64 bits by 32 bits. Use this
  946. | * function when you know for sure that the result fits in 32 bits, because
  947. | * it is optimal on x86 and on 64bit processors.
  948. | */
  949. | static inline unsigned int div64_32(unsigned long long o1, unsigned int o2)
  950. | {
  951. | unsigned int result;
  952. | #ifdef __i386__
  953. | asm("divl %2"
  954. | : "=a" (result)
  955. | : "A"(o1), "rm"(o2));
  956. | #else
  957. | result = o1 / o2;
  958. | #endif
  959. | return result;
  960. | }