docs/coding-style.html

   1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
   2 <HTML>
   3 <HEAD>
   4    <TITLE>GHC Style Guidelines for C code</TITLE>
   5 </HEAD>
   6 <BODY>
   7
   8 <H1>GHC Style guidelines for C code</h1>
   9
  10 <h2>Comments</h2>
  11
  12 <p>These coding style guidelines are mainly intended for use in
  13 <tt>ghc/rts</tt> and <tt>ghc/includes</tt>.
  14
  15 <p>NB These are just suggestions.  They're not set in stone.  Some of
  16 them are probably misguided.  If you disagree with them, feel free to
  17 modify this document (and make your commit message reasonably
  18 informative) or mail someone (e.g. <a
  19 href="mailto:glasgow-haskell-users@haskell.org">the GHC mailing list</a>).
  20
  21 <h2>References</h2>
  22
  23 If you haven't read them already, you might like to check the following.
  24 Where they conflict with our suggestions, they're probably right.
  25
  26 <ul>
  27
  28 <li>
  29 The C99 standard.  One reasonable reference is <a
  30 href="http://home.tiscalinet.ch/t_wolf/tw/c/c9x_changes.html">here</a>.
  31
  32 <p><li>
  33 Writing Solid Code, Microsoft Press.  (Highly recommended.  Possibly
  34 the only Microsoft Press book that's worth reading.)
  35
  36 <p><li>
  37 Autoconf documentation.
  38 See also <a href="http://peti.gmd.de/autoconf-archive/">The autoconf macro archive</a> and
  39 <a href="http://www.cyclic.com/cyclic-pages/autoconf.html">Cyclic Software's description</a>
  40
  41 <p><li> <a
  42 href="http://www.cs.umd.edu/users/cml/cstyle/indhill-cstyle.html">Indian
  43 Hill C Style and Coding Standards</a>.
  44
  45 <p><li>
  46 <a href="http://www.cs.umd.edu/users/cml/cstyle/">A list of C programming style links</a>
  47
  48 <p><li>
  49 <a href="http://www.lysator.liu.se/c/c-www.html">A very large list of C programming links</a>
  50
  51 <p><li>
  52 <a href="http://www.geek-girl.com/unix.html">A list of Unix programming links</a>
  53
  54 </ul>
  55
  56
  57 <h2>Portability issues</h2>
  58
  59 <ul>
  60 <p><li> We try to stick to C99 where possible.  We use the following
  61 C99 features relative to C89, some of which were previously GCC
  62 extensions (possibly with different syntax):
  63
  64 <ul>
  65 <p><li>Flexible array members, i.e. an array of unspecified length as
  66 the last field of a struct.  GCC has a similar extension, but the syntax
  67 is slightly different: in GCC you would declare the array as
  68 <tt>arr[0]</tt>, whereas in C99 it is declared as <tt>arr[]</tt>.
  69
  70 <p><li>Inline annotations on functions (see later)
  71
  72 <p><li>Labeled elements in initialisers.  Again, GCC has a slightly
  73 different syntax from C99 here, and we stick with the GCC syntax until
  74 GCC implements the C99 proposal.
  75
  76 <p><li>C++-style comments.  These are part of the C99 standard, and we
  77 prefer to use them whenever possible.
  78 </ul>
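<p>As a minimal sketch of the first item in the list above, here is a C99
flexible array member in use.  The names <tt>IntVec</tt> and
<tt>intvec_new</tt> are made up for illustration and are not taken from
the RTS.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical example type: a length followed by that many ints. */
typedef struct _IntVec {
    size_t len;
    int    elems[];   // C99 flexible array member (old GCC syntax: elems[0])
} IntVec;

static IntVec *intvec_new(size_t len);

/* Allocate an IntVec with room for len elements, all set to zero.
 * Returns NULL if the allocation fails. */
static IntVec *intvec_new(size_t len)
{
    IntVec *v;
    size_t i;

    // sizeof(IntVec) does not include the flexible array itself
    v = malloc(sizeof(IntVec) + len * sizeof(int));
    if (v == NULL) {
        return NULL;
    }
    v->len = len;
    for (i = 0; i < len; i++) {
        v->elems[i] = 0;
    }
    return v;
}
```

<p>The caller releases the whole object, header and elements together,
with a single <tt>free</tt>.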
  79
  80 <p>In addition we use ANSI-C-style function declarations and
  81 prototypes exclusively.  Every function should have a prototype;
  82 static function prototypes may be placed near the top of the file in
  83 which they are declared, and external prototypes are usually placed in
  84 a header file with the same basename as the source file (although there
  85 are exceptions to this rule, particularly when several source files
  86 together implement a subsystem which is described by a single external
  87 header file).
  88
  89 <p><li>We use the following GCC extensions, but surround them with
  90 <tt>#ifdef __GNUC__</tt>:
  91
  92 <ul>
  93 <p><li>Function attributes (mostly just <code>noreturn</code> and
  94 <code>unused</code>)
  95 <p><li>Inline assembly.
  96 </ul>
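<p>One way to keep such extensions behind <tt>#ifdef __GNUC__</tt> is to
hide them in macros that expand to nothing on other compilers.  The
sketch below assumes that approach; the macro names
<tt>ATTR_NORETURN</tt> and <tt>ATTR_UNUSED</tt> and both functions are
hypothetical, not the names used in the RTS headers.

```c
#include <stdio.h>
#include <stdlib.h>

#ifdef __GNUC__
#define ATTR_NORETURN __attribute__((noreturn))
#define ATTR_UNUSED   __attribute__((unused))
#else
/* Other compilers simply see no annotation. */
#define ATTR_NORETURN
#define ATTR_UNUSED
#endif

static void fatal(const char *msg) ATTR_NORETURN;
static int  ignore_second(int a, int b);

/* Print a message to stderr and terminate; never returns. */
static void fatal(const char *msg)
{
    fprintf(stderr, "fatal: %s\n", msg);
    exit(EXIT_FAILURE);
}

/* The second parameter is deliberately unused; the attribute
 * silences the compiler warning where attributes are available. */
static int ignore_second(int a, int b ATTR_UNUSED)
{
    if (a < 0) {
        fatal("negative argument");
    }
    return a;
}
```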
  97
  98 <p><li>
  99 <tt>char</tt> can be signed or unsigned, so always say which you mean.
 100
 101 <p><li>Our POSIX policy: try to write code that only uses POSIX (IEEE
 102 Std 1003.1) interfaces and APIs.  We used to define
 103 <code>_POSIX_SOURCE</code> by default, but found that this caused more
 104 problems than it solved, so now we require any code that is
 105 POSIX-compliant to explicitly say so by having <code>#include
 106 "PosixSource.h"</code> at the top.  Try to do this whenever possible.
 107
 108 <p><li> Some architectures have memory alignment constraints.  Others
 109 don't have any constraints but go faster if you align things.  These
 110 macros (from <tt>config.h</tt>) tell you which alignment to use:
 111
 112 <pre>
 113   /* minimum alignment of unsigned int */
 114   #define ALIGNMENT_UNSIGNED_INT 4
 115
 116   /* minimum alignment of long */
 117   #define ALIGNMENT_LONG 4
 118
 119   /* minimum alignment of float */
 120   #define ALIGNMENT_FLOAT 4
 121
 122   /* minimum alignment of double */
 123   #define ALIGNMENT_DOUBLE 4
 124 </pre>
 125
 126 <p><li> Use <tt>StgInt</tt>, <tt>StgWord</tt> and <tt>StgPtr</tt> when
 127 reading/writing ints and ptrs to the stack or heap.  Note that, by
 128 definition, <tt>StgInt</tt>, <tt>StgWord</tt> and <tt>StgPtr</tt> are
 129 the same size and have the same alignment constraints even if
 130 <code>sizeof(int) != sizeof(void *)</code> on that platform.
 131
 132 <p><li> Use <tt>StgInt8</tt>, <tt>StgInt16</tt>, etc when you need a
 133 certain minimum number of bits in a type.  Use <tt>int</tt> and
 134 <tt>nat</tt> when there's no particular constraint.  ANSI C only
 135 guarantees that ints are at least 16 bits but within GHC we assume
 136 they are 32 bits.
 137
 138 <p><li> Use <tt>StgFloat</tt> and <tt>StgDouble</tt> for floating
 139 point values which will go on/have come from the stack or heap.  Note
 140 that <tt>StgDouble</tt> may occupy more than one <tt>StgWord</tt>, but
 141 it will always be a whole number multiple.
 142
 143 <p>
 144 Use <code>PK_FLT(addr)</code>, <code>PK_DBL(addr)</code> to read
 145 <tt>StgFloat</tt> and <tt>StgDouble</tt> values from the stack/heap,
 146 and <code>ASSIGN_FLT(val,addr)</code> /
 147 <code>ASSIGN_DBL(val,addr)</code> to assign StgFloat/StgDouble values
 148 to heap/stack locations.  These macros take care of alignment
 149 restrictions.
 150
 151 <p>
 152 Heap/Stack locations are always <tt>StgWord</tt> aligned; the
 153 alignment requirements of an <tt>StgDouble</tt> may be more than that
 154 of <tt>StgWord</tt>, but we don't pad misaligned <tt>StgDoubles</tt>
 155 because doing so would be too much hassle (see <code>PK_DBL</code> &amp;
 156 co. above).
 157
 158 <p><li>
 159 Avoid conditional code like this:
 160
 161 <pre>
 162   #ifdef solaris_HOST_OS
 163   // do something solaris specific
 164   #endif
 165 </pre>
 166
 167 Instead, add an appropriate test to the <tt>configure.ac</tt> script and
 168 use the result of that test:
 169
 170 <pre>
 171   #ifdef HAVE_BSD_H
 172   // use a BSD library
 173   #endif
 174 </pre>
 175
 176 <p>The problem is that things change from one version of an OS to another
 177 - things get added, things get deleted, things get broken, some things
 178 are optional extras.  Using "feature tests" instead of "system tests"
 179 makes things a lot less brittle.  Things also tend to get documented
 180 better.
 181
 182 </ul>
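<p>The list above says that <code>PK_DBL</code> and
<code>ASSIGN_DBL</code> take care of alignment restrictions when a
double lives at a location that is only word-aligned.  One portable way
to get that effect is to copy the bytes with <tt>memcpy</tt> rather than
dereferencing a <tt>double *</tt>.  The sketch below assumes that
approach and is not GHC's actual definition; <tt>Word</tt>,
<tt>pk_dbl</tt> and <tt>assign_dbl</tt> are hypothetical stand-ins.

```c
#include <stdint.h>
#include <string.h>

typedef uintptr_t Word;   /* hypothetical stand-in for StgWord */

/* Number of whole words one double occupies (rounded up). */
#define DOUBLE_WORDS ((sizeof(double) + sizeof(Word) - 1) / sizeof(Word))

static inline void   assign_dbl(Word *addr, double val);
static inline double pk_dbl(const Word *addr);

/* Store val at a word-aligned address without assuming that the
 * address also satisfies the alignment of double. */
static inline void assign_dbl(Word *addr, double val)
{
    memcpy(addr, &val, sizeof(val));
}

/* Read a double back from a word-aligned address. */
static inline double pk_dbl(const Word *addr)
{
    double val;

    memcpy(&val, addr, sizeof(val));
    return val;
}
```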
 183
 184 <h2>Debugging/robustness tricks</h2>
 185
 186
 187 Anyone who has tried to debug a garbage collector or code generator
 188 will tell you: "If a program is going to crash, it should crash as
 189 soon, as noisily and as often as possible."  There's nothing worse
 190 than trying to find a bug which only shows up when running GHC on
 191 itself and doesn't manifest itself until 10 seconds after the actual
 192 cause of the problem.
 193
 194 <p>We put all our debugging code inside <tt>#ifdef DEBUG</tt>.  The
 195 general policy is we don't ship code with debugging checks and
 196 assertions in it, but we do run with those checks in place when
 197 developing and testing.  Anything inside <tt>#ifdef DEBUG</tt> should
 198 not slow down the code by more than a factor of 2.
 199
 200 <p>We also have more expensive "sanity checking" code for hardcore
 201 debugging - this can slow down the code by a large factor, but is only
 202 enabled on demand by a command-line flag.  General sanity checking in
 203 the RTS is currently enabled with the <tt>-DS</tt> RTS flag.
 204
 205 <p>There are a number of RTS flags which control debugging output and
 206 sanity checking in various parts of the system when <tt>DEBUG</tt> is
 207 defined.  For example, to get the scheduler to be verbose about what
 208 it is doing, you would say <tt>+RTS -Ds -RTS</tt>.  See
 209 <tt>includes/RtsFlags.h</tt> and <tt>rts/RtsFlags.c</tt> for the full
 210 set of debugging flags.  To check one of these flags in the code,
 211 write:
 212
 213 <pre>
 214   IF_DEBUG(gc, fprintf(stderr, "..."));
 215 </pre>
 216
 217 This checks the <tt>gc</tt> flag before generating the output (and the
 218 code is removed altogether if <tt>DEBUG</tt> is not defined).
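<p>To make the mechanism concrete, here is a rough sketch of how a macro
in the style of <code>IF_DEBUG</code> could be built.  It is an
assumption for illustration only, not the definition in the RTS headers:
<tt>IF_DEBUG_SKETCH</tt>, <tt>debug_flags</tt> and <tt>note_gc</tt> are
hypothetical names, and <tt>DEBUG</tt> is defined inside the sketch only
so that the checking branch is exercised.

```c
#include <stdio.h>

#ifndef DEBUG
#define DEBUG 1   /* normally supplied by the build, e.g. -DDEBUG */
#endif

/* Hypothetical record of run-time debugging flags. */
struct debug_flags_sketch {
    int gc;
    int scheduler;
};

static struct debug_flags_sketch debug_flags = { 0, 0 };

#ifdef DEBUG
/* Run stmt only if the named flag is set at run time. */
#define IF_DEBUG_SKETCH(flag, stmt)      \
    do {                                 \
        if (debug_flags.flag != 0) {     \
            stmt;                        \
        }                                \
    } while (0)
#else
/* Without DEBUG the statement disappears from the compiled code. */
#define IF_DEBUG_SKETCH(flag, stmt) do { } while (0)
#endif

static int gc_messages = 0;   /* counts messages, for demonstration */

static void note_gc(void);

static void note_gc(void)
{
    IF_DEBUG_SKETCH(gc, fprintf(stderr, "gc: starting collection\n"));
    IF_DEBUG_SKETCH(gc, gc_messages++);
}
```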
 219
 220 <p>All debugging output should go to <tt>stderr</tt>.
 221
 222 <p>
 223 Particular guidelines for writing robust code:
 224
 225 <ul>
 226 <p><li>
 227 Use assertions.  Use lots of assertions.  If you write a comment
 228 that says "takes a +ve number" add an assertion.  If you're casting
 229 an int to a nat, add an assertion.  If you're casting an int to a char,
 230 add an assertion.  We use the <tt>ASSERT</tt> macro for writing
 231 assertions; it goes away when <tt>DEBUG</tt> is not defined.
 232
 233 <p><li>
 234 Write special debugging code to check the integrity of your data structures.
 235 (Most of the runtime checking code is in <tt>rts/Sanity.c</tt>)
 236 Add extra assertions which call this code at the start and end of any
 237 code that operates on your data structures.
 238
 239 <p><li>
 240 When you find a hard-to-spot bug, try to think of some assertions,
 241 sanity checks or whatever that would have made the bug easier to find.
 242
 243 <p><li>
 244 When defining an enumeration, it's a good idea not to use 0 for normal
 245 values.  Instead, make 0 raise an internal error.  The idea here is to
 246 make it easier to detect pointer-related errors on the assumption that
 247 random pointers are more likely to point to a 0 than to anything else.
 248
 249 <pre>
 250 typedef enum
 251     { i_INTERNAL_ERROR  /* Instruction 0 raises an internal error */
 252     , i_PANIC           /* irrefutable pattern match failed! */
 253     , i_ERROR           /* user level error */
 254
 255     ...
 256 </pre>
 257
 258 <p><li> Use <tt>#warning</tt> or <tt>#error</tt> whenever you write a
 259 piece of incomplete/broken code.
 260
 261 <p><li> When testing, try to make infrequent things happen often.
 262      For example, make a context switch/gc/etc happen every time a
 263      context switch/gc/etc can happen.  The system will run like a
 264      pig but it'll catch a lot of bugs.
 265
 266 </ul>
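<p>The advice on assertions above can be sketched as follows.  This is
an illustration under stated assumptions, not the RTS definition of
<tt>ASSERT</tt>: the macro <tt>ASSERT_SKETCH</tt>, the type
<tt>nat_sketch</tt> and the two conversion helpers are hypothetical.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef DEBUG
/* Report the failed predicate on stderr and stop immediately. */
#define ASSERT_SKETCH(predicate)                                \
    do {                                                        \
        if (!(predicate)) {                                     \
            fprintf(stderr, "ASSERTION FAILED: %s:%d: %s\n",    \
                    __FILE__, __LINE__, #predicate);            \
            abort();                                            \
        }                                                       \
    } while (0)
#else
/* Goes away entirely when DEBUG is not defined. */
#define ASSERT_SKETCH(predicate) do { } while (0)
#endif

typedef unsigned int nat_sketch;   /* hypothetical stand-in for nat */

static nat_sketch    int_to_nat(int i);
static unsigned char int_to_uchar(int i);

/* Casting an int to an unsigned type: assert that it is not negative. */
static nat_sketch int_to_nat(int i)
{
    ASSERT_SKETCH(i >= 0);
    return (nat_sketch)i;
}

/* Casting an int to a char-sized type: assert that it fits. */
static unsigned char int_to_uchar(int i)
{
    ASSERT_SKETCH(i >= 0 && i <= UCHAR_MAX);
    return (unsigned char)i;
}
```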
 267
 268 <h2>Syntactic details</h2>
 269
 270 <ul>
 271 <p><li><b>Important:</b> Put "redundant" braces or parens in your code.
 272 Omitting braces and parens leads to very hard to spot bugs -
 273 especially if you use macros (and you might have noticed that GHC does
 274 this a lot!)
 275
 276 <p>
 277 In particular:
 278 <ul>
 279 <p><li>
 280 Put braces round the body of for loops, while loops, if statements, etc.
 281 even if they "aren't needed" because it's really hard to find the resulting
 282 bug if you mess up.  Indent them any way you like but put them in there!
 283 </ul>
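<p>A small illustration of the kind of bug the braces rule prevents.
Both functions are hypothetical.  In the first, the indentation suggests
that the <tt>else</tt> belongs to the outer <tt>if</tt>, but C attaches
it to the nearest one.

```c
/* Without braces the else binds to the inner if (b > 0),
 * whatever the indentation suggests. */
static int classify_unbraced(int a, int b)
{
    int r;

    r = 0;
    if (a > 0)
        if (b > 0)
            r = 1;
    else
        r = 2;
    return r;
}

/* With braces the structure is explicit and matches the intent:
 * the else belongs to the outer if (a > 0). */
static int classify_braced(int a, int b)
{
    int r;

    r = 0;
    if (a > 0) {
        if (b > 0) {
            r = 1;
        }
    } else {
        r = 2;
    }
    return r;
}
```

<p>For <tt>a = -1, b = 5</tt> the unbraced version returns 0 while the
braced version returns 2.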
 284
 285 <p><li>
 286 When defining a macro, always put parens round args - just in case.
 287 For example, write:
 288 <pre>
 289   #define add(x,y) ((x)+(y))
 290 </pre>
 291 instead of
 292 <pre>
 293   #define add(x,y) x+y
 294 </pre>
 295
 296 <p><li> Don't declare and initialize variables at the same time.
 297 Separating the declaration and initialization takes more lines, but
 298 makes the code clearer.
 299
 300 <p><li>
 301 Use inline functions instead of macros if possible - they're a lot
 302 less tricky to get right and don't suffer from the usual problems
 303 of side effects, evaluation order, multiple evaluation, etc.
 304
 305 <ul>
 306 <p><li>Inline functions get the naming issue right.  E.g. they
 307   can have local variables which (in an expression context)
 308   macros can't.
 309
 310 <p><li> Inline functions have call-by-value semantics whereas macros
 311   are call-by-name.  With a macro, you can be bitten by duplicated
 312   computation if you aren't careful.
 313
 314 <p><li> You can use inline functions from inside gdb if you compile with
 315   -O0 or -fkeep-inline-functions.  If you use macros, you'd better
 316   know what they expand to.
 317 </ul>
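<p>The call-by-name point in the list above is easy to demonstrate.  In
this sketch (all names are hypothetical) the argument has a side effect,
so the macro evaluates it twice while the inline function evaluates it
once.

```c
#define MIN_MACRO(x,y) (((x)<=(y)) ? (x) : (y))

static inline int min_int(int x, int y);
static int next_value(void);
static int min_via_macro(void);
static int min_via_inline(void);

static inline int min_int(int x, int y)
{
    return (x <= y) ? x : y;
}

static int calls = 0;   /* how many times next_value() has run */

/* Returns 1, 2, 3, ... on successive calls. */
static int next_value(void)
{
    calls++;
    return calls;
}

/* Expands to (((next_value())<=(10)) ? (next_value()) : (10)),
 * so next_value() runs twice. */
static int min_via_macro(void)
{
    return MIN_MACRO(next_value(), 10);
}

/* The argument is evaluated once, before the call. */
static int min_via_inline(void)
{
    return min_int(next_value(), 10);
}
```

<p>Starting from <tt>calls == 0</tt>, <tt>min_via_macro()</tt> returns 2
after two calls to <tt>next_value</tt>, whereas
<tt>min_via_inline()</tt> returns 1 after a single call.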
 318
 319 However, note that macros can serve as both l-values and r-values and
 320 can be "polymorphic" as these examples show:
 321 <pre>
 322   // you can use this as an l-value or an r-value
 323   #define PROF_INFO(cl) (((StgClosure*)(cl))->header.profInfo)
 324
 325   // polymorphic case
 326   // but note that min(min(1,2),3) does 3 comparisons instead of 2!!
 327   #define min(x,y) (((x)<=(y)) ? (x) : (y))
 328 </pre>
 329
 330 <p><li>
 331 Inline functions should be "static inline" because:
 332 <ul>
 333 <p><li>
 334 gcc will delete static inlines if they're not used or are always inlined.
 335
 336 <p><li>
 337   if they're externed, we could get conflicts between 2 copies of the
 338   same function if, for some reason, gcc is unable to delete them.
 339   If they're static, we still get multiple copies but at least they don't conflict.
 340 </ul>
 341
 342 On the other hand, the gcc manual says the following,
 343 so maybe we should use <tt>extern inline</tt>?
 344
 345 <pre>
 346    When a function is both inline and `static', if all calls to the
 347 function are integrated into the caller, and the function's address is
 348 never used, then the function's own assembler code is never referenced.
 349 In this case, GNU CC does not actually output assembler code for the
 350 function, unless you specify the option `-fkeep-inline-functions'.
 351 Some calls cannot be integrated for various reasons (in particular,
 352 calls that precede the function's definition cannot be integrated, and
 353 neither can recursive calls within the definition).  If there is a
 354 nonintegrated call, then the function is compiled to assembler code as
 355 usual.  The function must also be compiled as usual if the program
 356 refers to its address, because that can't be inlined.
 357
 358    When an inline function is not `static', then the compiler must
 359 assume that there may be calls from other source files; since a global
 360 symbol can be defined only once in any program, the function must not
 361 be defined in the other source files, so the calls therein cannot be
 362 integrated.  Therefore, a non-`static' inline function is always
 363 compiled on its own in the usual fashion.
 364
 365    If you specify both `inline' and `extern' in the function
 366 definition, then the definition is used only for inlining.  In no case
 367 is the function compiled on its own, not even if you refer to its
 368 address explicitly.  Such an address becomes an external reference, as
 369 if you had only declared the function, and had not defined it.
 370
 371    This combination of `inline' and `extern' has almost the effect of a
 372 macro.  The way to use it is to put a function definition in a header
 373 file with these keywords, and put another copy of the definition
 374 (lacking `inline' and `extern') in a library file.  The definition in
 375 the header file will cause most calls to the function to be inlined.
 376 If any uses of the function remain, they will refer to the single copy
 377 in the library.
 378 </pre>
 379
 380 <p><li>
 381 Don't define macros that expand to a bare list of statements.
 382 At the very least, wrap the statements in braces, as in:
 383
 384 <pre>
 385   #define ASSIGN_CC_ID(ccID)              \
 386         {                                 \
 387         ccID = CC_ID;                     \
 388         CC_ID++;                          \
 389         }
 390 </pre>
 391
 392 (but it's usually better to use an inline function instead - see above).
 393
 394 <p><li>
 395 Don't even write macros that expand to 0 statements - they can mess you
 396 up as well.  Use the doNothing macro instead.
 397 <pre>
 398   #define doNothing() do { } while (0)
 399 </pre>
 400
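<p>A note on the two macro items above.  A brace-wrapped macro followed
by a semicolon still breaks directly before an <tt>else</tt>, because
the semicolon ends the <tt>if</tt> statement.  A common alternative,
shown here as a sketch and not as the form used in the RTS, is to wrap
the body in the same <tt>do { } while (0)</tt> that <tt>doNothing</tt>
uses.  <tt>ASSIGN_ID</tt>, <tt>next_id</tt> and <tt>take_id_if</tt> are
hypothetical names.

```c
#define doNothing() do { } while (0)

/* Same shape as a two-statement macro, but wrapped so that the
 * expansion plus the trailing semicolon is exactly one statement. */
#define ASSIGN_ID(dst, counter)   \
    do {                          \
        (dst) = (counter);        \
        (counter)++;              \
    } while (0)

static int next_id = 0;

static int take_id_if(int cond);

/* Returns a fresh id if cond is non-zero, otherwise -1. */
static int take_id_if(int cond)
{
    int id;

    id = -1;
    /* Left unbraced on purpose, to show that the macros are safe
     * even here; normal code should still use braces. */
    if (cond != 0)
        ASSIGN_ID(id, next_id);
    else
        doNothing();
    return id;
}
```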
 401
 402 <p><li>
 403 This code
 404 <pre>
 405 int* p, q;
 406 </pre>
 407 looks like it declares two pointers but, in fact, only p is a pointer.
 408 It's safer to write this:
 409 <pre>
 410 int* p;
 411 int* q;
 412 </pre>
 413 You could also write this:
 414 <pre>
 415 int *p, *q;
 416 </pre>
 417 but it is preferable to split the declarations.
 418
 419 <p><li>
 420 Try to use ANSI C's enum feature when defining lists of constants of
 421 the same type.  Among other benefits, you'll notice that gdb uses the
 422 name instead of its (usually inscrutable) number when printing values
 423 with enum types and gdb will let you use the name in expressions you
 424 type.
 425
 426 <p>
 427 Examples:
 428 <pre>
 429     typedef enum { /* N.B. Used as indexes into arrays */
 430      NO_HEAP_PROFILING,
 431      HEAP_BY_CC,
 432      HEAP_BY_MOD,
 433      HEAP_BY_GRP,
 434      HEAP_BY_DESCR,
 435      HEAP_BY_TYPE,
 436      HEAP_BY_TIME
 437     } ProfilingFlags;
 438 </pre>
 439 instead of
 440 <pre>
 441     # define NO_HEAP_PROFILING  0       /* N.B. Used as indexes into arrays */
 442     # define HEAP_BY_CC         1
 443     # define HEAP_BY_MOD        2
 444     # define HEAP_BY_GRP        3
 445     # define HEAP_BY_DESCR      4
 446     # define HEAP_BY_TYPE       5
 447     # define HEAP_BY_TIME       6
 448 </pre>
 449 and
 450 <pre>
 451     typedef enum {
 452      CCchar    = 'C',
 453      MODchar   = 'M',
 454      GRPchar   = 'G',
 455      DESCRchar = 'D',
 456      TYPEchar  = 'Y',
 457      TIMEchar  = 'T'
 458     } ProfilingTag;
 459 </pre>
 460 instead of
 461 <pre>
 462     # define CCchar    'C'
 463     # define MODchar   'M'
 464     # define GRPchar   'G'
 465     # define DESCRchar 'D'
 466     # define TYPEchar  'Y'
 467     # define TIMEchar  'T'
 468 </pre>
 469
 470 <p><li> Please keep to 80 columns: the line has to be drawn somewhere,
 471 and by keeping it to 80 columns we can ensure that code looks OK on
 472 everyone's screen.  Long lines are hard to read, and a sign that the
 473 code needs to be restructured anyway.
 474
 475 <p><li> When commenting out large chunks of code, use <code>#if 0
 476 ... #endif</code> rather than <code>/* ... */</code> because C doesn't
 477 have nested comments.
 478
 479 <p><li>When declaring a typedef for a struct, give the struct a name
 480 as well, so that other headers can forward-reference the struct name
 481 and it becomes possible to have opaque pointers to the struct.  Our
 482 convention is to name the struct the same as the typedef, but add a
 483 leading underscore.  For example:
 484
 485 <pre>
 486   typedef struct _Foo {
 487     ...
 488   } Foo;
 489 </pre>
 490
 491 <p><li>Do not use <tt>!</tt> instead of explicit comparison against
 492 <tt>NULL</tt> or <tt>'\0'</tt>;  the latter is much clearer.
 493
 494 <p><li> We don't care too much about your indentation style but, if
 495 you're modifying a function, please try to use the same style as the
 496 rest of the function (or file).  If you're writing new code, a
 497 tab width of 4 is preferred.
 498
 499 </ul>
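<p>The struct-naming convention in the list above is what makes forward
references possible.  A minimal sketch, using the hypothetical type
<tt>Counter</tt> and collapsing what would normally be two headers and a
source file into one block:

```c
#include <stdlib.h>

/* A client header only needs the struct name to declare
 * functions that take pointers to it. */
struct _Counter;
void counter_bump(struct _Counter *c);

/* The owning header follows the convention: the struct is named
 * like the typedef, with a leading underscore. */
typedef struct _Counter {
    int value;
} Counter;

Counter *counter_new(void);

/* Allocate a counter starting at zero; returns NULL on failure. */
Counter *counter_new(void)
{
    Counter *c;

    c = malloc(sizeof(Counter));
    if (c != NULL) {
        c->value = 0;
    }
    return c;
}

/* Increment the counter; a NULL pointer is ignored. */
void counter_bump(struct _Counter *c)
{
    if (c != NULL) {
        c->value++;
    }
}
```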
 500
 501 <h2>CVS issues</h2>
 502
 503 <ul>
 504 <p><li>
 505 Don't be tempted to reindent or reorganise large chunks of code - it
 506 generates large diffs in which it's hard to see whether anything else
 507 was changed.
 508 <p>
 509 If you must reindent or reorganise, don't include any functional
 510 changes in that commit, and give advance warning that you're about to do
 511 it in case anyone else is changing that file.
 512 </ul>
 513
 514
 515 </body>
 516 </html>