docs/coding-style.html

   1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
   2 <HTML>
   3 <HEAD>
   4    <TITLE>GHC Style Guidelines for C code</TITLE>
   5 </HEAD>
   6 <BODY>
   7
   8 <H1>Coding suggestions for GHC/Hugs related code</h1>
   9
  10 <h2>Comments</h2>
  11
  12 NB These are just suggestions.  They're not set in stone.  Some of
  13 them are probably misguided.  If you disagree with them, feel free to
  14 modify this document (and make your commit message reasonably
  15 informative) or mail someone (eg. <a
  16 href="glasgow-haskell-users@haskell.org">The GHC mailing list</a>)
  17
  18 <h2>References</h2>
  19
  20 If you haven't read them already, you might like to check the following.
  21 Where they conflict with our suggestions, they're probably right.
  22
  23 <ul>
  24 <li>
  25 Writing Solid Code, Microsoft Press.  (Highly recommended.  Possibly
  26 the only Microsoft Press book that's worth reading.  SimonPJ has a
  27 copy.)
  28
  29 <p><li>
  30 Autoconf documentation (which doesn't seem to be on the web).
  31 See also <a href="http://peti.gmd.de/autoconf-archive/">The autoconf macro archive</a> and
  32 <a href="http://www.cyclic.com/cyclic-pages/autoconf.html">Cyclic Software's description</a>
  33
  34 <p><li> <a
  35 href="http://www.cs.umd.edu/users/cml/cstyle/indhill-cstyle.html">Indian
  36 Hill C Style and Coding Standards</a>.
  37
  38 <p><li>
  39 <a href="http://www.cs.umd.edu/users/cml/cstyle/">A list of C programming style links</a>
  40
  41 <p><li>
  42 <a href="http://www.lysator.liu.se/c/c-www.html">A very large list of C programming links</a>
  43
  44 <p><li>
  45 <a href="http://www.geek-girl.com/unix.html">A list of Unix programming links</a>
  46
  47 </ul>
  48
  49
  50 <h2>Portability issues</h2>
  51
  52 <ul>
  53 <p><li>
  54 We use ANSI C with some extensions.  In particular, we use:
  55 <ul>
  56 <li>enum
  57 <li>ANSI style prototypes
  58 <li>#elsif, #error, ## and other cpp features
  59 </ul>
  60
  61 <p><li>
  62 We use the following gcc extensions (see gcc documentation):
  63
  64 <ul>
  65 <p><li>#warning
  66
  67 <p><li>zero length arrays (useful as the last field of a struct)
  68
  69 <p><li>inline annotations on functions (see later)
  70
  71 <p><li>Labeled elements in initialisers
  72
  73 <p><li>Function attributes (mostly just <code>no_return</code> and
  74 <code>unused</code>)
  75 </ul>
  76
  77 <p>Some of these gcc/ANSI features could be avoided (for example, we
  78 could use __inline__ instead of inline or use the usual PROTO((...))
  79 trick in function prototypes) - but some of them can't be avoided
  80 so we don't try with the others.</p>
  81
  82 <p>Most of these features are actually part of the new C9X standard,
  83 so we hope to have mostly conformant code at some point in the future.
  84
  85 <p><li>
  86 Please don't use C++ style comments, they aren't standard C.
  87
  88 <p><li>
  89 char can be signed or unsigned - always say which you mean
  90
  91 <p><li> Our POSIX policy: try to write code that only uses POSIX (IEEE
  92 Std 1003.1) interfaces and APIs.  When you include <code>Rts.h</code>,
  93 <code>POSIX_SOURCE</code> is automatically defined for you before any
  94 system headers are slurped in, unless you define
  95 <code>NON_POSIX_SOURCE</code> prior to including <code>Rts.h</code>.
  96 A good C library will use the <code>POSIX_SOURCE</code> define to
  97 eliminate non-posix types and function prototypes, so the compiler
  98 should complain if you venture outside the POSIX spec.</li>
  99
 100 <p><li>
 101 Some architectures have memory alignment constraints.
 102 Others don't have any constraints but go faster if you align things.
 103 These macros tell you which alignment to use
 104
 105 <pre>
 106   /* minimum alignment of unsigned int */
 107   #define ALIGNMENT_UNSIGNED_INT 4
 108
 109   /* minimum alignment of long */
 110   #define ALIGNMENT_LONG 4
 111
 112   /* minimum alignment of float */
 113   #define ALIGNMENT_FLOAT 4
 114
 115   /* minimum alignment of double */
 116   #define ALIGNMENT_DOUBLE 4
 117 </pre>
 118
 119 <p><li>
 120 Use StgInt, StgWord and StgPtr when reading/writing ints and ptrs to
 121 the stack or heap.  Note that, by definition, StgInt, StgWord and
 122 StgPtr are the same size and have the same alignment constraints
 123 even if sizeof(int) != sizeof(ptr) on that platform.
 124
 125 <p><li>
 126 Use StgInt8, StgInt16, etc when you need a certain minimum number of
 127 bits in a type.  Use int and nat when there's no particular
 128 constraint.  ANSI C only guarantees that ints are at least 16 bits but
 129 within GHC we assume they are 32 bits (do we? --SDM).
 130
 131 <p><li>
 132 Use StgFloat and StgDouble for floating point values which will go
 133 on/have come from the stack or heap.  Note that StgFloat may be the
 134 same as StgDouble on some architectures (eg Alphas) and that it might
 135 occupy many StgWords.
 136
 137 <p>
 138 Use PK_FLT(addr), PK_DBL(addr) to read StgFloat and
 139 StgDouble values from the stack/heap, and ASSIGN_FLT(val,addr)/
 140 ASSIGN_DBL(val,addr) to assign StgFloat/StgDouble values to heap/stack
 141 locations.  These macros take care of alignment restrictions.
 142
 143 <p>
 144 Heap/Stack locations are StgWord aligned; the alignment requirements
 145 of an StgDouble may be more than that of StgWord, but we don't pad
 146 misaligned StgDoubles (doing so would be too much hassle).
 147
 148 <p>
 149 Doing a direct assignment/read of an StgDouble to/from a mis-aligned
 150 location may not work, so we use the ASSIGN_DBL(,)/PK_DBL() macro,
 151 which goes via a temporary.
 152
 153 <p>
 154 Problem: if the architecture allows mis-aligned accesses, but prefers
 155 aligned accesses, these macros just add an extra level of indirection.
 156 We need to distinguish between an architecture that allows mis-aligned
 157 accesses and one that just imposes a performance penalty (this is most
 158 of them).  Perhaps have PREFERRED_ALIGNMENT and REQUIRED_ALIGMENT
 159 configurations?
 160
 161 <p><li>
 162 Avoid conditional code like this:
 163
 164 <pre>
 165   #ifdef solaris_HOST_OS
 166   // do something solaris specific
 167   #endif
 168 </pre>
 169
 170 Instead, add an appropriate test to the configure.in script and use
 171 the result of that test instead.
 172
 173 <pre>
 174   #ifdef HAVE_BSD_H
 175   // use a BSD library
 176   #endif
 177 </pre>
 178
 179 The problem is that things change from one version of an OS to another
 180 - things get added, things get deleted, things get broken, some things
 181 are optional extras.  Using "feature tests" instead of "system tests"
 182 makes things a lot less brittle.  Things also tend to get documented
 183 better.
 184
 185 </ul>
 186
 187 <h2>Debugging/robustness tricks</h2>
 188
 189
 190 Anyone who has tried to debug a garbage collector or code generator
 191 will tell you: "If a program is going to crash, it should crash as
 192 soon, as noisily and as often as possible."  There's nothing worse
 193 than trying to find a bug which only shows up when running GHC on
 194 itself and doesn't manifest itself until 10 seconds after the actual
 195 cause of the problem.
 196
 197 <p>
 198 The ideas in this section are mostly aimed at this issue:
 199
 200 <ul>
 201 <p><li>
 202 Use assertions.  Use lots of assertions.  If you write a comment
 203 that says "takes a +ve number" add an assertion.  If you're casting
 204 an int to a nat, add an assertion.  If you're casting an int to a char,
 205 add an assertion.
 206
 207 <p><li>
 208 Write special debugging code to check the integrity of your data structures.
 209 (Most of the runtime checking code is in <tt>src/Sanity.c</tt>)
 210 Add extra assertions which call this code at the start and end of any
 211 code that operates on your data structures.
 212
 213 <p><li>
 214 When you find a hard-to-spot bug, try to think of some assertions,
 215 sanity checks or whatever that would have made the bug easier to find.
 216
 217 <p><li>
 218 When defining an enumeration, it's a good idea not to use 0 for normal
 219 values.  Instead, make 0 raise an internal error.  The idea here is to
 220 make it easier to detect pointer-related errors on the assumption that
 221 random pointers are more likely to point to a 0 than to anything else.
 222
 223 <pre>
 224 typedef enum
 225     { i_INTERNAL_ERROR  /* Instruction 0 raises an internal error */
 226     , i_PANIC           /* irrefutable pattern match failed! */
 227     , i_ERROR           /* user level error */
 228
 229     ...
 230 </pre>
 231
 232 <p><li> Use #warning or #error whenever you write a piece of incomplete/broken code.
 233
 234 <p><li> When testing, try to make infrequent things happen often.
 235      For example, make a context switch/gc/etc happen every time a
 236      context switch/gc/etc can happen.  The system will run like a
 237      pig but it'll catch a lot of bugs.
 238
 239 </ul>
 240
 241 <h2>Syntactic details</h2>
 242
 243 <ul>
 244 <p><li><b>Important:</b> Put "redundant" braces or parens in your code.
 245 Omitting braces and parens leads to very hard to spot bugs -
 246 especially if you use macros (and you might have noticed that GHC does
 247 this a lot!)
 248
 249 <p>
 250 In particular:
 251 <ul>
 252 <p><li>
 253 Put braces round the body of for loops, while loops, if statements, etc.
 254 even if they "aren't needed" because it's really hard to find the resulting
 255 bug if you mess up.  Indent them any way you like but put them in there!
 256
 257 <p><li>
 258 When defining a macro, always put parens round args - just in case.
 259 For example, write:
 260 <pre>
 261   #define add(x,y) ((x)+(y))
 262 </pre>
 263 instead of
 264 <pre>
 265   #define add(x,y) x+y
 266 </pre>
 267
 268 <p><li>
 269 Don't define macros that expand to a list of statements.
 270 You could just use braces as in:
 271
 272 <pre>
 273   #define ASSIGN_CC_ID(ccID)              \
 274         {                                 \
 275         ccID = CC_ID;                     \
 276         CC_ID++;                          \
 277         }
 278 </pre>
 279
 280 but it's better to use the "do { ... } while (0)" trick instead:
 281
 282 <pre>
 283   #define ASSIGN_CC_ID(ccID)              \
 284         do {                              \
 285         ccID = CC_ID;                     \
 286         CC_ID++;                          \
 287         } while(0)
 288 </pre>
 289
 290 The following explanation comes from
 291 <a href="http://www.cs.umd.edu/users/cml/cstyle/code-std-disc.txt">The Usenet C programming FAQ</a>
 292 <pre>
 293 10.4:   What's the best way to write a multi-statement macro?
 294
 295 A:      The usual goal is to write a macro that can be invoked as if it
 296         were a statement consisting of a single function call.  This
 297         means that the "caller" will be supplying the final semicolon,
 298         so the macro body should not.  The macro body cannot therefore
 299         be a simple brace-enclosed compound statement, because syntax
 300         errors would result if it were invoked (apparently as a single
 301         statement, but with a resultant extra semicolon) as the if
 302         branch of an if/else statement with an explicit else clause.
 303
 304         The traditional solution, therefore, is to use
 305
 306                 #define MACRO(arg1, arg2) do {  \
 307                         /* declarations */      \
 308                         stmt1;                  \
 309                         stmt2;                  \
 310                         /* ... */               \
 311                         } while(0)      /* (no trailing ; ) */
 312
 313         When the caller appends a semicolon, this expansion becomes a
 314         single statement regardless of context.  (An optimizing compiler
 315         will remove any "dead" tests or branches on the constant
 316         condition 0, although lint may complain.)
 317
 318         If all of the statements in the intended macro are simple
 319         expressions, with no declarations or loops, another technique is
 320         to write a single, parenthesized expression using one or more
 321         comma operators.  (For an example, see the first DEBUG() macro
 322         in question 10.26.)  This technique also allows a value to be
 323         "returned."
 324
 325         References: H&S Sec. 3.3.2 p. 45; CT&P Sec. 6.3 pp. 82-3.
 326 </pre>
 327
 328 <p><li>
 329 Don't even write macros that expand to 0 statements - they can mess you
 330 up as well.  Use the doNothing macro instead.
 331 <pre>
 332   #define doNothing() do { } while (0)
 333 </pre>
 334 </ul>
 335
 336 <p><li>
 337 Use inline functions instead of macros if possible - they're a lot
 338 less tricky to get right and don't suffer from the usual problems
 339 of side effects, evaluation order, multiple evaluation, etc.
 340
 341 <ul>
 342 <p><li>Inline functions get the naming issue right.  E.g. they
 343   can have local variables which (in an expression context)
 344   macros can't.
 345
 346 <p><li> Inline functions have call-by-value semantics whereas macros
 347   are call-by-name.  You can be bitten by duplicated computation
 348   if you aren't careful.
 349
 350 <p><li> You can use inline functions from inside gdb if you compile with
 351   -O0 or -fkeep-inline-functions.  If you use macros, you'd better
 352   know what they expand to.
 353 </ul>
 354
 355 However, note that macros can serve as both l-values and r-values and
 356 can be "polymorphic" as these examples show:
 357 <pre>
 358   // you can use this as an l-value or an l-value
 359   #define PROF_INFO(cl) (((StgClosure*)(cl))->header.profInfo)
 360
 361   // polymorphic case
 362   // but note that min(min(1,2),3) does 3 comparisions instead of 2!!
 363   #define min(x,y) (((x)<=(y)) ? (x) : (y))
 364 </pre>
 365
 366 <p><li>
 367 Inline functions should be "static inline" because:
 368 <ul>
 369 <p><li>
 370 gcc will delete static inlines if not used or theyre always inlined.
 371
 372 <p><li>
 373   if they're externed, we could get conflicts between 2 copies of the
 374   same function if, for some reason, gcc is unable to delete them.
 375   If they're static, we still get multiple copies but at least they don't conflict.
 376 </ul>
 377
 378 OTOH, the gcc manual says this
 379 so maybe we should use extern inline?
 380
 381 <pre>
 382    When a function is both inline and `static', if all calls to the
 383 function are integrated into the caller, and the function's address is
 384 never used, then the function's own assembler code is never referenced.
 385 In this case, GNU CC does not actually output assembler code for the
 386 function, unless you specify the option `-fkeep-inline-functions'.
 387 Some calls cannot be integrated for various reasons (in particular,
 388 calls that precede the function's definition cannot be integrated, and
 389 neither can recursive calls within the definition).  If there is a
 390 nonintegrated call, then the function is compiled to assembler code as
 391 usual.  The function must also be compiled as usual if the program
 392 refers to its address, because that can't be inlined.
 393
 394    When an inline function is not `static', then the compiler must
 395 assume that there may be calls from other source files; since a global
 396 symbol can be defined only once in any program, the function must not
 397 be defined in the other source files, so the calls therein cannot be
 398 integrated.  Therefore, a non-`static' inline function is always
 399 compiled on its own in the usual fashion.
 400
 401    If you specify both `inline' and `extern' in the function
 402 definition, then the definition is used only for inlining.  In no case
 403 is the function compiled on its own, not even if you refer to its
 404 address explicitly.  Such an address becomes an external reference, as
 405 if you had only declared the function, and had not defined it.
 406
 407    This combination of `inline' and `extern' has almost the effect of a
 408 macro.  The way to use it is to put a function definition in a header
 409 file with these keywords, and put another copy of the definition
 410 (lacking `inline' and `extern') in a library file.  The definition in
 411 the header file will cause most calls to the function to be inlined.
 412 If any uses of the function remain, they will refer to the single copy
 413 in the library.
 414 </pre>
 415
 416 <p><li>
 417 This code
 418 <pre>
 419 int* p, q;
 420 </pre>
 421 looks like it declares two pointers but, in fact, only p is a pointer.
 422 It's safer to write this:
 423 <pre>
 424 int* p;
 425 int* q;
 426 </pre>
 427 You could also write this:
 428 <pre>
 429 int *p, *q;
 430 </pre>
 431 but it is preferrable to split the declarations.
 432
 433 <p><li>
 434 Try to use ANSI C's enum feature when defining lists of constants of
 435 the same type.  Among other benefits, you'll notice that gdb uses the
 436 name instead of its (usually inscrutable) number when printing values
 437 with enum types and gdb will let you use the name in expressions you
 438 type.
 439
 440 <p>
 441 Examples:
 442 <pre>
 443     typedef enum { /* N.B. Used as indexes into arrays */
 444      NO_HEAP_PROFILING,
 445      HEAP_BY_CC,
 446      HEAP_BY_MOD,
 447      HEAP_BY_GRP,
 448      HEAP_BY_DESCR,
 449      HEAP_BY_TYPE,
 450      HEAP_BY_TIME
 451     } ProfilingFlags;
 452 </pre>
 453 instead of
 454 <pre>
 455     # define NO_HEAP_PROFILING  0       /* N.B. Used as indexes into arrays */
 456     # define HEAP_BY_CC         1
 457     # define HEAP_BY_MOD        2
 458     # define HEAP_BY_GRP        3
 459     # define HEAP_BY_DESCR      4
 460     # define HEAP_BY_TYPE       5
 461     # define HEAP_BY_TIME       6
 462 </pre>
 463 and
 464 <pre>
 465     typedef enum {
 466      CCchar    = 'C',
 467      MODchar   = 'M',
 468      GRPchar   = 'G',
 469      DESCRchar = 'D',
 470      TYPEchar  = 'Y',
 471      TIMEchar  = 'T'
 472     } ProfilingTag;
 473 </pre>
 474 instead of
 475 <pre>
 476     # define CCchar    'C'
 477     # define MODchar   'M'
 478     # define GRPchar   'G'
 479     # define DESCRchar 'D'
 480     # define TYPEchar  'Y'
 481     # define TIMEchar  'T'
 482 </pre>
 483
 484 <p><li> Please keep to 80 columns: the line has to be drawn somewhere,
 485 and by keeping it to 80 columns we can ensure that code looks OK on
 486 everyone's screen.  Long lines are hard to read, and a sign that the
 487 code needs to be restructured anyway.
 488
 489 <p><li> We don't care too much about your indentation style but, if
 490 you're modifying a function, please try to use the same style as the
 491 rest of the function (or file).  If you're writing new code, a
 492 tab width of 4 is preferred.
 493
 494 <p>
 495 On which note:
 496 Hugs related pieces of code often start with the line:
 497 <pre>
 498   /* -*- mode: hugs-c; -*- */
 499 </pre>
 500 which helps Emacs mimic the indentation style used by Mark P Jones
 501 within Hugs.  Add this to your .emacs file.
 502 <pre>
 503   (defun hugs-c-mode ()
 504     "C mode with adjusted defaults for use with Hugs (based on linux-c-mode)"
 505     (interactive)
 506     (c-mode)
 507     (setq c-basic-offset 4)
 508     (setq indent-tabs-mode nil) ; don't use tabs to indent
 509     (setq c-recognize-knr-r nil)  ; no K&R here - don't pay the price
 510     ; no: (setq tab-width 4)
 511
 512     (c-set-offset 'knr-argdecl-intro    0)
 513     (c-set-offset 'case-label           0)
 514     (c-set-offset 'statement-case-intro '++)
 515     (c-set-offset 'statement-case-open  '+)
 516     )
 517 </pre>
 518
 519 </ul>
 520
 521 <h2>CVS issues</h2>
 522
 523 <ul>
 524 <p><li>
 525 Don't be tempted to reindent or reorganise large chunks of code - it
 526 generates large diffs in which it's hard to see whether anything else
 527 was changed.
 528 <p>
 529 If you must reindent or reorganise, don't do anything else in that commit
 530 and give advance warning that you're about to do it in case anyone else
 531 is changing that file.
 532 </ul>
 533
 534
 535
 536 </body>
 537 </html>