docs/coding-style.html

   1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
   2 <HTML>
   3 <HEAD>
   4    <TITLE>Access To The GHC CVS Repository</TITLE>
   5    <META NAME="GENERATOR" CONTENT="Mozilla/3.04Gold (X11; I; SunOS 5.5.1 i86pc) [Netscape]">
   6 </HEAD>
   7 <BODY>
   8
   9 <H1>Coding suggestions for GHC/Hugs related code</h1>
  10
  11 <h2>Comments</h2>
  12
  13 NB These are just suggestions.  They're not set in stone.  Some of
  14 them are probably misguided.  If you disagree with them, feel free to
  15 modify this document (and make your commit message reasonably informative)
  16 or mail someone (eg <a href="fp-cvs-fptools@dcs.gla.ac.uk">The FP-CVS mailing list</a> or
  17 <a
  18 href="mailto:reid-alastair@cs.yale.edu">reid-alastair@cs.yale.edu</a>).
  19
  20
  21 <h2>References</h2>
  22
  23 If you haven't read them already, you might like to check the following.
  24 Where they conflict with our suggestions, they're probably right.
  25
  26 <ul>
  27 <li>
  28 Writing Solid Code, Microsoft Press.  (Highly recommended.  Possibly
  29 the only Microsoft Press book that's worth reading.  SimonPJ has a
  30 copy.)
  31
  32 <li>
  33 Autoconf documentation (which doesn't seem to be on the web).
  34 See also <a href="http://peti.gmd.de/autoconf-archive/">The autoconf macro archive</a> and
  35 <a href="http://www.cyclic.com/cyclic-pages/autoconf.html">Cyclic Software's description</a>
  36
  37 <li>
  38 <a href="http://www.cs.umd.edu/users/cml/cstyle/indhill-cstyle.html">Indian Hill C Style and Coding Standards</a>.
  39
  40 <li>
  41 <a href="http://www.cs.umd.edu/users/cml/cstyle/">A list of C programming style links</a>
  42
  43 <li>
  44 <a href="http://www.lysator.liu.se/c/c-www.html">A very large list of C programming links</a>
  45
  46 <li>
  47 <a href="http://www.geek-girl.com/unix.html">A list of Unix programming links</a>
  48
  49
  50 </ul>
  51
  52
  53 <h2>Portability issues</h2>
  54
  55 <ul>
  56 <li>
  57 We use ANSI C with some extensions.  In particular, we use:
  58 <ul>
  59 <li>enum
  60 <li>ANSI style prototypes
  61 <li>#elsif, #error, #warning, ## and other cpp features
  62 </ul>
  63
  64 <li> Our POSIX policy: try to write code that only uses POSIX (IEEE
  65 Std 1003.1) interfaces and APIs.  When you include <code>Rts.h<code>,
  66 <code>POSIX_SOURCE</code> is automatically defined for you before any
  67 system headers are slurped in, unless you define
  68 <code>NON_POSIX_SOURCE</code> prior to including <code>Rts.h</code>.
  69 A good C library will use the <code>POSIX_SOURCE</code> define to
  70 eliminate non-posix types and function prototypes, so the compiler
  71 should complain if you venture outside the POSIX spec.</li>
  72
  73 <li>
  74 We use the following gcc extensions (see gcc documentation):
  75 <ul>
  76 <li>zero length arrays (useful as the last field of a struct)
  77 <li>inline annotations on functions (see later)
  78 <li>Labeled elements in initialisers
  79 <li>Function attributes (mostly just no_return)
  80 <li>Macro varargs (actually, we don't use them yet but I'm very tempted)
  81 <li>Alastair really likes to use C++ style comments - but
  82     he'll probably regret it later.
  83 <li>other stuff I've forgotten about...
  84 </ul>
  85
  86 Some of these gcc/ANSI features could be avoided (for example, we
  87 could use __inline__ instead of inline or use the usual PROTO((...))
  88 trick in function prototypes) - but some of them can't be avoided
  89 so we don't try with the others.
  90
  91 <li>
  92 char can be signed or unsigned - always say which you mean
  93
  94 <li>
  95 Some architectures have memory alignment constraints.
  96 Others don't have any constraints but go faster if you align things.
  97 These macros tell you which alignment to use
  98
  99 <pre>
 100   /* minimum alignment of unsigned int */
 101   #define ALIGNMENT_UNSIGNED_INT 4
 102
 103   /* minimum alignment of long */
 104   #define ALIGNMENT_LONG 4
 105
 106   /* minimum alignment of float */
 107   #define ALIGNMENT_FLOAT 4
 108
 109   /* minimum alignment of double */
 110   #define ALIGNMENT_DOUBLE 4
 111 </pre>
 112
 113 <li>
 114 Use StgInt, StgWord and StgPtr when reading/writing ints and ptrs to
 115 the stack or heap.  Note that, by definition, StgInt, StgWord and
 116 StgPtr are the same size and have the same alignment constraints
 117 even if sizeof(int) != sizeof(ptr) on that platform.
 118
 119 <li>
 120 Use StgInt8, StgInt16, etc when you need a certain minimum number of
 121 bits in a type.  Use int and nat when there's no particular
 122 constraint.  ANSI C only guarantees that ints are at least 16 bits but
 123 within GHC we assume they are 32 bits.  (I'm not sure if this is a
 124 good idea - ADR)
 125
 126 <li>
 127 Use StgFloat and StgDouble for floating point values which will go
 128 on/have come from the stack or heap.  Note that StgFloat may be the
 129 same as StgDouble on some architectures (eg Alphas) and that it might
 130 occupy many StgWords.
 131
 132 <p>
 133 Use PK_FLT(addr), PK_DBL(addr) to read StgFloat and
 134 StgDouble values from the stack/heap, and ASSIGN_FLT(val,addr)/
 135 ASSIGN_DBL(val,addr) to assign StgFloat/StgDouble values to heap/stack
 136 locations.  These macros take care of alignment restrictions.
 137
 138 <p>
 139 Heap/Stack locations are StgWord aligned; the alignment requirements
 140 of an StgDouble may be more than that of StgWord, but we don't pad
 141 misaligned StgDoubles (doing so would be too much hassle).
 142
 143 <p>
 144 Doing a direct assignment/read of an StgDouble to/from a mis-aligned
 145 location may not work, so we use the ASSIGN_DBL(,)/PK_DBL() macro,
 146 which goes via a temporary.
 147
 148 <p>
 149 Problem: if the architecture allows mis-aligned accesses, but prefers
 150 aligned accesses, these macros just add an extra level of indirection.
 151 We need to distinguish between an architecture that allows mis-aligned
 152 accesses and one that just imposes a performance penalty (this is most
 153 of them).  Perhaps have PREFERRED_ALIGNMENT and REQUIRED_ALIGMENT
 154 configurations?
 155
 156 <li>
 157 Avoid conditional code like this:
 158
 159 <pre>
 160   #ifdef solaris_HOST_OS
 161   // do something solaris specific
 162   #endif
 163 </pre>
 164
 165 Instead, add an appropriate test to the configure.in script and use
 166 the result of that test instead.
 167
 168 <pre>
 169   #ifdef HAVE_BSD_H
 170   // use a BSD library
 171   #endif
 172 </pre>
 173
 174 The problem is that things change from one version of an OS to another
 175 - things get added, things get deleted, things get broken, some things
 176 are optional extras.  Using "feature tests" instead of "system tests"
 177 makes things a lot less brittle.  Things also tend to get documented
 178 better.
 179
 180 </ul>
 181
 182 <h2>Debugging/robustness tricks</h2>
 183
 184
 185 Anyone who has tried to debug a garbage collector or code generator
 186 will tell you: "If a program is going to crash, it should crash as
 187 soon, as noisily and as often as possible."  There's nothing worse
 188 than trying to find a bug which only shows up when running GHC on
 189 itself and doesn't manifest itself until 10 seconds after the actual
 190 cause of the problem.
 191
 192 <p>
 193 The ideas in this section are mostly aimed at this issue:
 194
 195 <ul>
 196 <li>
 197 Use assertions.  Use lots of assertions.  If you write a comment
 198 that says "takes a +ve number" add an assertion.  If you're casting
 199 an int to a nat, add an assertion.  If you're casting an int to a char,
 200 add an assertion.
 201
 202 <li>
 203 Write special debugging code to check the integrity of your data structures.
 204 (Most of the runtime checking code is in <tt>src/Sanity.c</tt>)
 205 Add extra assertions which call this code at the start and end of any
 206 code that operates on your data structures.
 207
 208 <li>
 209 When you find a hard-to-spot bug, try to think of some assertions,
 210 sanity checks or whatever that would have made the bug easier to find.
 211
 212 <li>
 213 When defining an enumeration, it's a good idea not to use 0 for normal
 214 values.  Instead, make 0 raise an internal error.  The idea here is to
 215 make it easier to detect pointer-related errors on the assumption that
 216 random pointers are more likely to point to a 0 than to anything else.
 217
 218 <pre>
 219 typedef enum
 220     { i_INTERNAL_ERROR  /* Instruction 0 raises an internal error */
 221     , i_PANIC           /* irrefutable pattern match failed! */
 222     , i_ERROR           /* user level error */
 223
 224     ...
 225 </pre>
 226
 227 <li> Use #warning or #error whenever you write a piece of incomplete/broken code.
 228
 229 <li> When testing, try to make infrequent things happen often.
 230      For example, make a context switch/gc/etc happen every time a
 231      context switch/gc/etc can happen.  The system will run like a
 232      pig but it'll catch a lot of bugs.
 233
 234 </ul>
 235
 236 <h2>Syntactic details</h2>
 237
 238 <ul>
 239 <li><b>Important:</b> Put "redundant" braces or parens in your code.
 240 Omitting braces and parens leads to very hard to spot bugs -
 241 especially if you use macros (and you might have noticed that GHC does
 242 this a lot!)
 243
 244 <p>
 245 In particular:
 246 <ul>
 247 <li>
 248 Put braces round the body of for loops, while loops, if statements, etc.
 249 even if they "aren't needed" because it's really hard to find the resulting
 250 bug if you mess up.  Indent them any way you like but put them in there!
 251
 252 <li>
 253 When defining a macro, always put parens round args - just in case.
 254 For example, write:
 255 <pre>
 256   #define add(x,y) ((x)+(y))
 257 </pre>
 258 instead of
 259 <pre>
 260   #define add(x,y) x+y
 261 </pre>
 262
 263 <li>
 264 Don't define macros that expand to a list of statements.
 265 You could just use braces as in:
 266
 267 <pre>
 268   #define ASSIGN_CC_ID(ccID)              \
 269         {                                 \
 270         ccID = CC_ID;                     \
 271         CC_ID++;                          \
 272         }
 273 </pre>
 274
 275 but it's better to use the "do { ... } while (0)" trick instead:
 276
 277 <pre>
 278   #define ASSIGN_CC_ID(ccID)              \
 279         do {                              \
 280         ccID = CC_ID;                     \
 281         CC_ID++;                          \
 282         } while(0)
 283 </pre>
 284
 285 The following explanation comes from
 286 <a href="http://www.cs.umd.edu/users/cml/cstyle/code-std-disc.txt">The Usenet C programming FAQ</a>
 287 <pre>
 288 10.4:   What's the best way to write a multi-statement macro?
 289
 290 A:      The usual goal is to write a macro that can be invoked as if it
 291         were a statement consisting of a single function call.  This
 292         means that the "caller" will be supplying the final semicolon,
 293         so the macro body should not.  The macro body cannot therefore
 294         be a simple brace-enclosed compound statement, because syntax
 295         errors would result if it were invoked (apparently as a single
 296         statement, but with a resultant extra semicolon) as the if
 297         branch of an if/else statement with an explicit else clause.
 298
 299         The traditional solution, therefore, is to use
 300
 301                 #define MACRO(arg1, arg2) do {  \
 302                         /* declarations */      \
 303                         stmt1;                  \
 304                         stmt2;                  \
 305                         /* ... */               \
 306                         } while(0)      /* (no trailing ; ) */
 307
 308         When the caller appends a semicolon, this expansion becomes a
 309         single statement regardless of context.  (An optimizing compiler
 310         will remove any "dead" tests or branches on the constant
 311         condition 0, although lint may complain.)
 312
 313         If all of the statements in the intended macro are simple
 314         expressions, with no declarations or loops, another technique is
 315         to write a single, parenthesized expression using one or more
 316         comma operators.  (For an example, see the first DEBUG() macro
 317         in question 10.26.)  This technique also allows a value to be
 318         "returned."
 319
 320         References: H&S Sec. 3.3.2 p. 45; CT&P Sec. 6.3 pp. 82-3.
 321 </pre>
 322
 323 <li>
 324 Don't even write macros that expand to 0 statements - they can mess you
 325 up as well.  Use the doNothing macro instead.
 326 <pre>
 327   #define doNothing() do { } while (0)
 328 </pre>
 329 </ul>
 330
 331 <li>
 332 Use inline functions instead of macros if possible - they're a lot
 333 less tricky to get right and don't suffer from the usual problems
 334 of side effects, evaluation order, multiple evaluation, etc.
 335
 336 <ul>
 337 <li>Inline functions get the naming issue right.  E.g. they
 338   can have local variables which (in an expression context)
 339   macros can't.
 340
 341 <li> Inline functions have call-by-value semantics whereas macros
 342   are call-by-name.  You can be bitten by duplicated computation
 343   if you aren't careful.
 344
 345 <li> You can use inline functions from inside gdb if you compile with
 346   -O0 or -fkeep-inline-functions.  If you use macros, you'd better
 347   know what they expand to.
 348 </ul>
 349
 350 However, note that macros can serve as both l-values and r-values and
 351 can be "polymorphic" as these examples show:
 352 <pre>
 353   // you can use this as an l-value or an l-value
 354   #define PROF_INFO(cl) (((StgClosure*)(cl))->header.profInfo)
 355
 356   // polymorphic case
 357   // but note that min(min(1,2),3) does 3 comparisions instead of 2!!
 358   #define min(x,y) (((x)<=(y)) ? (x) : (y))
 359 </pre>
 360
 361 <li>
 362 Inline functions should be "static inline" because:
 363 <ul>
 364 <li>
 365 gcc will delete static inlines if not used or theyre always inlined.
 366
 367 <li>
 368   if they're externed, we could get conflicts between 2 copies of the
 369   same function if, for some reason, gcc is unable to delete them.
 370   If they're static, we still get multiple copies but at least they don't conflict.
 371 </ul>
 372
 373 OTOH, the gcc manual says this
 374 so maybe we should use extern inline?
 375
 376 <pre>
 377    When a function is both inline and `static', if all calls to the
 378 function are integrated into the caller, and the function's address is
 379 never used, then the function's own assembler code is never referenced.
 380 In this case, GNU CC does not actually output assembler code for the
 381 function, unless you specify the option `-fkeep-inline-functions'.
 382 Some calls cannot be integrated for various reasons (in particular,
 383 calls that precede the function's definition cannot be integrated, and
 384 neither can recursive calls within the definition).  If there is a
 385 nonintegrated call, then the function is compiled to assembler code as
 386 usual.  The function must also be compiled as usual if the program
 387 refers to its address, because that can't be inlined.
 388
 389    When an inline function is not `static', then the compiler must
 390 assume that there may be calls from other source files; since a global
 391 symbol can be defined only once in any program, the function must not
 392 be defined in the other source files, so the calls therein cannot be
 393 integrated.  Therefore, a non-`static' inline function is always
 394 compiled on its own in the usual fashion.
 395
 396    If you specify both `inline' and `extern' in the function
 397 definition, then the definition is used only for inlining.  In no case
 398 is the function compiled on its own, not even if you refer to its
 399 address explicitly.  Such an address becomes an external reference, as
 400 if you had only declared the function, and had not defined it.
 401
 402    This combination of `inline' and `extern' has almost the effect of a
 403 macro.  The way to use it is to put a function definition in a header
 404 file with these keywords, and put another copy of the definition
 405 (lacking `inline' and `extern') in a library file.  The definition in
 406 the header file will cause most calls to the function to be inlined.
 407 If any uses of the function remain, they will refer to the single copy
 408 in the library.
 409 </pre>
 410
 411 <li>
 412 This code
 413 <pre>
 414 int* p, q;
 415 </pre>
 416 looks like it declares two pointers but, in fact, only p is a pointer.
 417 It's safer to write this:
 418 <pre>
 419 int* p;
 420 int* q;
 421 </pre>
 422 You could also write this:
 423 <pre>
 424 int *p, *q;
 425 </pre>
 426 but Alastair prefers to split the declarations.
 427
 428 <li>
 429 Try to use ANSI C's enum feature when defining lists of constants of
 430 the same type.  Among other benefits, you'll notice that gdb uses the
 431 name instead of its (usually inscrutable) number when printing values
 432 with enum types and gdb will let you use the name in expressions you
 433 type.
 434
 435 <p>
 436 Examples:
 437 <pre>
 438     typedef enum { /* N.B. Used as indexes into arrays */
 439      NO_HEAP_PROFILING,
 440      HEAP_BY_CC,
 441      HEAP_BY_MOD,
 442      HEAP_BY_GRP,
 443      HEAP_BY_DESCR,
 444      HEAP_BY_TYPE,
 445      HEAP_BY_TIME
 446     } ProfilingFlags;
 447 </pre>
 448 instead of
 449 <pre>
 450     # define NO_HEAP_PROFILING  0       /* N.B. Used as indexes into arrays */
 451     # define HEAP_BY_CC         1
 452     # define HEAP_BY_MOD        2
 453     # define HEAP_BY_GRP        3
 454     # define HEAP_BY_DESCR      4
 455     # define HEAP_BY_TYPE       5
 456     # define HEAP_BY_TIME       6
 457 </pre>
 458 and
 459 <pre>
 460     typedef enum {
 461      CCchar    = 'C',
 462      MODchar   = 'M',
 463      GRPchar   = 'G',
 464      DESCRchar = 'D',
 465      TYPEchar  = 'Y',
 466      TIMEchar  = 'T'
 467     } ProfilingTag;
 468 </pre>
 469 instead of
 470 <pre>
 471     # define CCchar    'C'
 472     # define MODchar   'M'
 473     # define GRPchar   'G'
 474     # define DESCRchar 'D'
 475     # define TYPEchar  'Y'
 476     # define TIMEchar  'T'
 477 </pre>
 478 ToDo: at the time of writing, we still use the former.
 479
 480 <li>
 481 Alastair likes to use stgCast instead of C syntax.  He claims
 482 it's easier to write and easier to grep for. YMMV.
 483 <pre>
 484 #define stgCast(ty,e) ((ty)(e))
 485 </pre>
 486
 487 <li> Please keep to 80 columns: the line has to be drawn somewhere,
 488 and by keeping it to 80 columns we can ensure that code looks OK on
 489 everyone's screen.  Long lines are hard to read, and a sign that the
 490 code needs to be restructured anyway.
 491
 492 <li> We don't care too much about your indentation style but, if
 493 you're modifying a function, please try to use the same style as the
 494 rest of the function (or file).
 495
 496 <p>
 497 On which note:
 498 Hugs related pieces of code often start with the line:
 499 <pre>
 500   /* -*- mode: hugs-c; -*- */
 501 </pre>
 502 which helps Emacs mimic the indentation style used by Mark P Jones
 503 within Hugs.  Add this to your .emacs file.
 504 <pre>
 505   (defun hugs-c-mode ()
 506     "C mode with adjusted defaults for use with Hugs (based on linux-c-mode)"
 507     (interactive)
 508     (c-mode)
 509     (setq c-basic-offset 4)
 510     (setq indent-tabs-mode nil) ; don't use tabs to indent
 511     (setq c-recognize-knr-r nil)  ; no K&R here - don't pay the price
 512     ; no: (setq tab-width 4)
 513
 514     (c-set-offset 'knr-argdecl-intro    0)
 515     (c-set-offset 'case-label           0)
 516     (c-set-offset 'statement-case-intro '++)
 517     (c-set-offset 'statement-case-open  '+)
 518     )
 519 </pre>
 520
 521 </ul>
 522
 523 <h2>CVS issues</h2>
 524
 525 <ul>
 526 <li>
 527 Don't be tempted to reindent or reorganise large chunks of code - it
 528 generates large diffs in which it's hard to see whether anything else
 529 was changed.
 530 <p>
 531 If you must reindent or reorganise, don't do anything else in that commit
 532 and give advance warning that you're about to do it in case anyone else
 533 is changing that file.
 534 </ul>
 535
 536
 537
 538 </body>
 539 </html>