docs/coding-style.html

   1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
   2 <HTML>
   3 <HEAD>
   4    <TITLE>GHC Style Guidelines for C code</TITLE>
   5 </HEAD>
   6 <BODY>
   7
   8 <H1>Coding suggestions for GHC/Hugs related code</h1>
   9
  10 <h2>Comments</h2>
  11
  12 NB These are just suggestions.  They're not set in stone.  Some of
  13 them are probably misguided.  If you disagree with them, feel free to
  14 modify this document (and make your commit message reasonably
  15 informative) or mail someone (eg. <a
  16 href="glasgow-haskell-users@haskell.org">The GHC mailing list</a>)
  17
  18 <h2>References</h2>
  19
  20 If you haven't read them already, you might like to check the following.
  21 Where they conflict with our suggestions, they're probably right.
  22
  23 <ul>
  24 <li>
  25 Writing Solid Code, Microsoft Press.  (Highly recommended.  Possibly
  26 the only Microsoft Press book that's worth reading.  SimonPJ has a
  27 copy.)
  28
  29 <p><li>
  30 Autoconf documentation (which doesn't seem to be on the web).
  31 See also <a href="http://peti.gmd.de/autoconf-archive/">The autoconf macro archive</a> and
  32 <a href="http://www.cyclic.com/cyclic-pages/autoconf.html">Cyclic Software's description</a>
  33
  34 <p><li> <a
  35 href="http://www.cs.umd.edu/users/cml/cstyle/indhill-cstyle.html">Indian
  36 Hill C Style and Coding Standards</a>.
  37
  38 <p><li>
  39 <a href="http://www.cs.umd.edu/users/cml/cstyle/">A list of C programming style links</a>
  40
  41 <p><li>
  42 <a href="http://www.lysator.liu.se/c/c-www.html">A very large list of C programming links</a>
  43
  44 <p><li>
  45 <a href="http://www.geek-girl.com/unix.html">A list of Unix programming links</a>
  46
  47 </ul>
  48
  49
  50 <h2>Portability issues</h2>
  51
  52 <ul>
  53 <p><li>
  54 We use ANSI C with some extensions.  In particular, we use:
  55 <ul>
  56 <li>enum
  57 <li>ANSI style prototypes
  58 <li>#elsif, #error, #warning, ## and other cpp features
  59 </ul>
  60
  61 <p><li>
  62 We use the following gcc extensions (see gcc documentation):
  63
  64 <ul>
  65 <p><li>zero length arrays (useful as the last field of a struct)
  66
  67 <p><li>inline annotations on functions (see later)
  68
  69 <p><li>Labeled elements in initialisers
  70
  71 <p><li>Function attributes (mostly just <code>no_return</code> and
  72 <code>unused</code>)
  73 </ul>
  74
  75 <p>Some of these gcc/ANSI features could be avoided (for example, we
  76 could use __inline__ instead of inline or use the usual PROTO((...))
  77 trick in function prototypes) - but some of them can't be avoided
  78 so we don't try with the others.</p>
  79
  80 <p>Most of these features are actually part of the new C9X standard,
  81 so we hope to have mostly conformant code at some point in the future.
  82
  83 <p><li>
  84 Please don't use C++ style comments, they aren't standard C.
  85
  86 <p><li>
  87 char can be signed or unsigned - always say which you mean
  88
  89 <p><li> Our POSIX policy: try to write code that only uses POSIX (IEEE
  90 Std 1003.1) interfaces and APIs.  When you include <code>Rts.h</code>,
  91 <code>POSIX_SOURCE</code> is automatically defined for you before any
  92 system headers are slurped in, unless you define
  93 <code>NON_POSIX_SOURCE</code> prior to including <code>Rts.h</code>.
  94 A good C library will use the <code>POSIX_SOURCE</code> define to
  95 eliminate non-posix types and function prototypes, so the compiler
  96 should complain if you venture outside the POSIX spec.</li>
  97
  98 <p><li>
  99 Some architectures have memory alignment constraints.
 100 Others don't have any constraints but go faster if you align things.
 101 These macros tell you which alignment to use
 102
 103 <pre>
 104   /* minimum alignment of unsigned int */
 105   #define ALIGNMENT_UNSIGNED_INT 4
 106
 107   /* minimum alignment of long */
 108   #define ALIGNMENT_LONG 4
 109
 110   /* minimum alignment of float */
 111   #define ALIGNMENT_FLOAT 4
 112
 113   /* minimum alignment of double */
 114   #define ALIGNMENT_DOUBLE 4
 115 </pre>
 116
 117 <p><li>
 118 Use StgInt, StgWord and StgPtr when reading/writing ints and ptrs to
 119 the stack or heap.  Note that, by definition, StgInt, StgWord and
 120 StgPtr are the same size and have the same alignment constraints
 121 even if sizeof(int) != sizeof(ptr) on that platform.
 122
 123 <p><li>
 124 Use StgInt8, StgInt16, etc when you need a certain minimum number of
 125 bits in a type.  Use int and nat when there's no particular
 126 constraint.  ANSI C only guarantees that ints are at least 16 bits but
 127 within GHC we assume they are 32 bits (do we? --SDM).
 128
 129 <p><li>
 130 Use StgFloat and StgDouble for floating point values which will go
 131 on/have come from the stack or heap.  Note that StgFloat may be the
 132 same as StgDouble on some architectures (eg Alphas) and that it might
 133 occupy many StgWords.
 134
 135 <p>
 136 Use PK_FLT(addr), PK_DBL(addr) to read StgFloat and
 137 StgDouble values from the stack/heap, and ASSIGN_FLT(val,addr)/
 138 ASSIGN_DBL(val,addr) to assign StgFloat/StgDouble values to heap/stack
 139 locations.  These macros take care of alignment restrictions.
 140
 141 <p>
 142 Heap/Stack locations are StgWord aligned; the alignment requirements
 143 of an StgDouble may be more than that of StgWord, but we don't pad
 144 misaligned StgDoubles (doing so would be too much hassle).
 145
 146 <p>
 147 Doing a direct assignment/read of an StgDouble to/from a mis-aligned
 148 location may not work, so we use the ASSIGN_DBL(,)/PK_DBL() macro,
 149 which goes via a temporary.
 150
 151 <p>
 152 Problem: if the architecture allows mis-aligned accesses, but prefers
 153 aligned accesses, these macros just add an extra level of indirection.
 154 We need to distinguish between an architecture that allows mis-aligned
 155 accesses and one that just imposes a performance penalty (this is most
 156 of them).  Perhaps have PREFERRED_ALIGNMENT and REQUIRED_ALIGMENT
 157 configurations?
 158
 159 <p><li>
 160 Avoid conditional code like this:
 161
 162 <pre>
 163   #ifdef solaris_HOST_OS
 164   // do something solaris specific
 165   #endif
 166 </pre>
 167
 168 Instead, add an appropriate test to the configure.in script and use
 169 the result of that test instead.
 170
 171 <pre>
 172   #ifdef HAVE_BSD_H
 173   // use a BSD library
 174   #endif
 175 </pre>
 176
 177 The problem is that things change from one version of an OS to another
 178 - things get added, things get deleted, things get broken, some things
 179 are optional extras.  Using "feature tests" instead of "system tests"
 180 makes things a lot less brittle.  Things also tend to get documented
 181 better.
 182
 183 </ul>
 184
 185 <h2>Debugging/robustness tricks</h2>
 186
 187
 188 Anyone who has tried to debug a garbage collector or code generator
 189 will tell you: "If a program is going to crash, it should crash as
 190 soon, as noisily and as often as possible."  There's nothing worse
 191 than trying to find a bug which only shows up when running GHC on
 192 itself and doesn't manifest itself until 10 seconds after the actual
 193 cause of the problem.
 194
 195 <p>
 196 The ideas in this section are mostly aimed at this issue:
 197
 198 <ul>
 199 <p><li>
 200 Use assertions.  Use lots of assertions.  If you write a comment
 201 that says "takes a +ve number" add an assertion.  If you're casting
 202 an int to a nat, add an assertion.  If you're casting an int to a char,
 203 add an assertion.
 204
 205 <p><li>
 206 Write special debugging code to check the integrity of your data structures.
 207 (Most of the runtime checking code is in <tt>src/Sanity.c</tt>)
 208 Add extra assertions which call this code at the start and end of any
 209 code that operates on your data structures.
 210
 211 <p><li>
 212 When you find a hard-to-spot bug, try to think of some assertions,
 213 sanity checks or whatever that would have made the bug easier to find.
 214
 215 <p><li>
 216 When defining an enumeration, it's a good idea not to use 0 for normal
 217 values.  Instead, make 0 raise an internal error.  The idea here is to
 218 make it easier to detect pointer-related errors on the assumption that
 219 random pointers are more likely to point to a 0 than to anything else.
 220
 221 <pre>
 222 typedef enum
 223     { i_INTERNAL_ERROR  /* Instruction 0 raises an internal error */
 224     , i_PANIC           /* irrefutable pattern match failed! */
 225     , i_ERROR           /* user level error */
 226
 227     ...
 228 </pre>
 229
 230 <p><li> Use #warning or #error whenever you write a piece of incomplete/broken code.
 231
 232 <p><li> When testing, try to make infrequent things happen often.
 233      For example, make a context switch/gc/etc happen every time a
 234      context switch/gc/etc can happen.  The system will run like a
 235      pig but it'll catch a lot of bugs.
 236
 237 </ul>
 238
 239 <h2>Syntactic details</h2>
 240
 241 <ul>
 242 <p><li><b>Important:</b> Put "redundant" braces or parens in your code.
 243 Omitting braces and parens leads to very hard to spot bugs -
 244 especially if you use macros (and you might have noticed that GHC does
 245 this a lot!)
 246
 247 <p>
 248 In particular:
 249 <ul>
 250 <p><li>
 251 Put braces round the body of for loops, while loops, if statements, etc.
 252 even if they "aren't needed" because it's really hard to find the resulting
 253 bug if you mess up.  Indent them any way you like but put them in there!
 254
 255 <p><li>
 256 When defining a macro, always put parens round args - just in case.
 257 For example, write:
 258 <pre>
 259   #define add(x,y) ((x)+(y))
 260 </pre>
 261 instead of
 262 <pre>
 263   #define add(x,y) x+y
 264 </pre>
 265
 266 <p><li>
 267 Don't define macros that expand to a list of statements.
 268 You could just use braces as in:
 269
 270 <pre>
 271   #define ASSIGN_CC_ID(ccID)              \
 272         {                                 \
 273         ccID = CC_ID;                     \
 274         CC_ID++;                          \
 275         }
 276 </pre>
 277
 278 but it's better to use the "do { ... } while (0)" trick instead:
 279
 280 <pre>
 281   #define ASSIGN_CC_ID(ccID)              \
 282         do {                              \
 283         ccID = CC_ID;                     \
 284         CC_ID++;                          \
 285         } while(0)
 286 </pre>
 287
 288 The following explanation comes from
 289 <a href="http://www.cs.umd.edu/users/cml/cstyle/code-std-disc.txt">The Usenet C programming FAQ</a>
 290 <pre>
 291 10.4:   What's the best way to write a multi-statement macro?
 292
 293 A:      The usual goal is to write a macro that can be invoked as if it
 294         were a statement consisting of a single function call.  This
 295         means that the "caller" will be supplying the final semicolon,
 296         so the macro body should not.  The macro body cannot therefore
 297         be a simple brace-enclosed compound statement, because syntax
 298         errors would result if it were invoked (apparently as a single
 299         statement, but with a resultant extra semicolon) as the if
 300         branch of an if/else statement with an explicit else clause.
 301
 302         The traditional solution, therefore, is to use
 303
 304                 #define MACRO(arg1, arg2) do {  \
 305                         /* declarations */      \
 306                         stmt1;                  \
 307                         stmt2;                  \
 308                         /* ... */               \
 309                         } while(0)      /* (no trailing ; ) */
 310
 311         When the caller appends a semicolon, this expansion becomes a
 312         single statement regardless of context.  (An optimizing compiler
 313         will remove any "dead" tests or branches on the constant
 314         condition 0, although lint may complain.)
 315
 316         If all of the statements in the intended macro are simple
 317         expressions, with no declarations or loops, another technique is
 318         to write a single, parenthesized expression using one or more
 319         comma operators.  (For an example, see the first DEBUG() macro
 320         in question 10.26.)  This technique also allows a value to be
 321         "returned."
 322
 323         References: H&S Sec. 3.3.2 p. 45; CT&P Sec. 6.3 pp. 82-3.
 324 </pre>
 325
 326 <p><li>
 327 Don't even write macros that expand to 0 statements - they can mess you
 328 up as well.  Use the doNothing macro instead.
 329 <pre>
 330   #define doNothing() do { } while (0)
 331 </pre>
 332 </ul>
 333
 334 <p><li>
 335 Use inline functions instead of macros if possible - they're a lot
 336 less tricky to get right and don't suffer from the usual problems
 337 of side effects, evaluation order, multiple evaluation, etc.
 338
 339 <ul>
 340 <p><li>Inline functions get the naming issue right.  E.g. they
 341   can have local variables which (in an expression context)
 342   macros can't.
 343
 344 <p><li> Inline functions have call-by-value semantics whereas macros
 345   are call-by-name.  You can be bitten by duplicated computation
 346   if you aren't careful.
 347
 348 <p><li> You can use inline functions from inside gdb if you compile with
 349   -O0 or -fkeep-inline-functions.  If you use macros, you'd better
 350   know what they expand to.
 351 </ul>
 352
 353 However, note that macros can serve as both l-values and r-values and
 354 can be "polymorphic" as these examples show:
 355 <pre>
 356   // you can use this as an l-value or an l-value
 357   #define PROF_INFO(cl) (((StgClosure*)(cl))->header.profInfo)
 358
 359   // polymorphic case
 360   // but note that min(min(1,2),3) does 3 comparisions instead of 2!!
 361   #define min(x,y) (((x)<=(y)) ? (x) : (y))
 362 </pre>
 363
 364 <p><li>
 365 Inline functions should be "static inline" because:
 366 <ul>
 367 <p><li>
 368 gcc will delete static inlines if not used or theyre always inlined.
 369
 370 <p><li>
 371   if they're externed, we could get conflicts between 2 copies of the
 372   same function if, for some reason, gcc is unable to delete them.
 373   If they're static, we still get multiple copies but at least they don't conflict.
 374 </ul>
 375
 376 OTOH, the gcc manual says this
 377 so maybe we should use extern inline?
 378
 379 <pre>
 380    When a function is both inline and `static', if all calls to the
 381 function are integrated into the caller, and the function's address is
 382 never used, then the function's own assembler code is never referenced.
 383 In this case, GNU CC does not actually output assembler code for the
 384 function, unless you specify the option `-fkeep-inline-functions'.
 385 Some calls cannot be integrated for various reasons (in particular,
 386 calls that precede the function's definition cannot be integrated, and
 387 neither can recursive calls within the definition).  If there is a
 388 nonintegrated call, then the function is compiled to assembler code as
 389 usual.  The function must also be compiled as usual if the program
 390 refers to its address, because that can't be inlined.
 391
 392    When an inline function is not `static', then the compiler must
 393 assume that there may be calls from other source files; since a global
 394 symbol can be defined only once in any program, the function must not
 395 be defined in the other source files, so the calls therein cannot be
 396 integrated.  Therefore, a non-`static' inline function is always
 397 compiled on its own in the usual fashion.
 398
 399    If you specify both `inline' and `extern' in the function
 400 definition, then the definition is used only for inlining.  In no case
 401 is the function compiled on its own, not even if you refer to its
 402 address explicitly.  Such an address becomes an external reference, as
 403 if you had only declared the function, and had not defined it.
 404
 405    This combination of `inline' and `extern' has almost the effect of a
 406 macro.  The way to use it is to put a function definition in a header
 407 file with these keywords, and put another copy of the definition
 408 (lacking `inline' and `extern') in a library file.  The definition in
 409 the header file will cause most calls to the function to be inlined.
 410 If any uses of the function remain, they will refer to the single copy
 411 in the library.
 412 </pre>
 413
 414 <p><li>
 415 This code
 416 <pre>
 417 int* p, q;
 418 </pre>
 419 looks like it declares two pointers but, in fact, only p is a pointer.
 420 It's safer to write this:
 421 <pre>
 422 int* p;
 423 int* q;
 424 </pre>
 425 You could also write this:
 426 <pre>
 427 int *p, *q;
 428 </pre>
 429 but it is preferrable to split the declarations.
 430
 431 <p><li>
 432 Try to use ANSI C's enum feature when defining lists of constants of
 433 the same type.  Among other benefits, you'll notice that gdb uses the
 434 name instead of its (usually inscrutable) number when printing values
 435 with enum types and gdb will let you use the name in expressions you
 436 type.
 437
 438 <p>
 439 Examples:
 440 <pre>
 441     typedef enum { /* N.B. Used as indexes into arrays */
 442      NO_HEAP_PROFILING,
 443      HEAP_BY_CC,
 444      HEAP_BY_MOD,
 445      HEAP_BY_GRP,
 446      HEAP_BY_DESCR,
 447      HEAP_BY_TYPE,
 448      HEAP_BY_TIME
 449     } ProfilingFlags;
 450 </pre>
 451 instead of
 452 <pre>
 453     # define NO_HEAP_PROFILING  0       /* N.B. Used as indexes into arrays */
 454     # define HEAP_BY_CC         1
 455     # define HEAP_BY_MOD        2
 456     # define HEAP_BY_GRP        3
 457     # define HEAP_BY_DESCR      4
 458     # define HEAP_BY_TYPE       5
 459     # define HEAP_BY_TIME       6
 460 </pre>
 461 and
 462 <pre>
 463     typedef enum {
 464      CCchar    = 'C',
 465      MODchar   = 'M',
 466      GRPchar   = 'G',
 467      DESCRchar = 'D',
 468      TYPEchar  = 'Y',
 469      TIMEchar  = 'T'
 470     } ProfilingTag;
 471 </pre>
 472 instead of
 473 <pre>
 474     # define CCchar    'C'
 475     # define MODchar   'M'
 476     # define GRPchar   'G'
 477     # define DESCRchar 'D'
 478     # define TYPEchar  'Y'
 479     # define TIMEchar  'T'
 480 </pre>
 481
 482 <p><li> Please keep to 80 columns: the line has to be drawn somewhere,
 483 and by keeping it to 80 columns we can ensure that code looks OK on
 484 everyone's screen.  Long lines are hard to read, and a sign that the
 485 code needs to be restructured anyway.
 486
 487 <p><li> We don't care too much about your indentation style but, if
 488 you're modifying a function, please try to use the same style as the
 489 rest of the function (or file).  If you're writing new code, a
 490 tab width of 4 is preferred.
 491
 492 <p>
 493 On which note:
 494 Hugs related pieces of code often start with the line:
 495 <pre>
 496   /* -*- mode: hugs-c; -*- */
 497 </pre>
 498 which helps Emacs mimic the indentation style used by Mark P Jones
 499 within Hugs.  Add this to your .emacs file.
 500 <pre>
 501   (defun hugs-c-mode ()
 502     "C mode with adjusted defaults for use with Hugs (based on linux-c-mode)"
 503     (interactive)
 504     (c-mode)
 505     (setq c-basic-offset 4)
 506     (setq indent-tabs-mode nil) ; don't use tabs to indent
 507     (setq c-recognize-knr-r nil)  ; no K&R here - don't pay the price
 508     ; no: (setq tab-width 4)
 509
 510     (c-set-offset 'knr-argdecl-intro    0)
 511     (c-set-offset 'case-label           0)
 512     (c-set-offset 'statement-case-intro '++)
 513     (c-set-offset 'statement-case-open  '+)
 514     )
 515 </pre>
 516
 517 </ul>
 518
 519 <h2>CVS issues</h2>
 520
 521 <ul>
 522 <p><li>
 523 Don't be tempted to reindent or reorganise large chunks of code - it
 524 generates large diffs in which it's hard to see whether anything else
 525 was changed.
 526 <p>
 527 If you must reindent or reorganise, don't do anything else in that commit
 528 and give advance warning that you're about to do it in case anyone else
 529 is changing that file.
 530 </ul>
 531
 532
 533
 534 </body>
 535 </html>