docs/coding-style.html

   1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
   2 <HTML>
   3 <HEAD>
   4    <TITLE>Access To The GHC CVS Repository</TITLE>
   5    <META NAME="GENERATOR" CONTENT="Mozilla/3.04Gold (X11; I; SunOS 5.5.1 i86pc) [Netscape]">
   6 </HEAD>
   7 <BODY>
   8
   9 <H1>Coding suggestions for GHC/Hugs related code</h1>
  10
  11 <h2>Comments</h2>
  12
  13 NB These are just suggestions.  They're not set in stone.  Some of
  14 them are probably misguided.  If you disagree with them, feel free to
  15 modify this document (and make your commit message reasonably informative)
  16 or mail someone (eg <a href="fp-cvs-fptools@dcs.gla.ac.uk">The FP-CVS mailing list</a> or
  17 <a
  18 href="mailto:reid-alastair@cs.yale.edu">reid-alastair@cs.yale.edu</a>).
  19
  20
  21 <h2>References</h2>
  22
  23 If you haven't read them already, you might like to check the following.
  24 Where they conflict with our suggestions, they're probably right.
  25
  26 <ul>
  27 <li>
  28 Writing Solid Code, Microsoft Press.  (Highly recommended.  Possibly
  29 the only Microsoft Press book that's worth reading.  SimonPJ has a
  30 copy.)
  31
  32 <li>
  33 Autoconf documentation (which doesn't seem to be on the web).
  34 See also <a href="http://peti.gmd.de/autoconf-archive/">The autoconf macro archive</a> and
  35 <a href="http://www.cyclic.com/cyclic-pages/autoconf.html">Cyclic Software's description</a>
  36
  37 <li>
  38 <a href="http://www.cs.umd.edu/users/cml/cstyle/indhill-cstyle.html">Indian Hill C Style and Coding Standards</a>.
  39
  40 <li>
  41 <a href="http://www.cs.umd.edu/users/cml/cstyle/">A list of C programming style links</a>
  42
  43 <li>
  44 <a href="http://www.lysator.liu.se/c/c-www.html">A very large list of C programming links</a>
  45
  46 <li>
  47 <a href="http://www.geek-girl.com/unix.html">A list of Unix programming links</a>
  48
  49
  50 </ul>
  51
  52
  53 <h2>Portability issues</h2>
  54
  55 <ul>
  56 <li>
  57 We use ANSI C with some extensions.  In particular, we use:
  58 <ul>
  59 <li>enum
  60 <li>ANSI style prototypes
  61 <li>#elsif, #error, #warning, ## and other cpp features
  62 </ul>
  63
  64 <li>
  65 We use the following gcc extensions (see gcc documentation):
  66 <ul>
  67 <li>zero length arrays (useful as the last field of a struct)
  68 <li>inline annotations on functions (see later)
  69 <li>Labeled elements in initialisers
  70 <li>Function attributes (mostly just no_return)
  71 <li>Macro varargs (actually, we don't use them yet but I'm very tempted)
  72 <li>Alastair really likes to use C++ style comments - but
  73     he'll probably regret it later.
  74 <li>other stuff I've forgotten about...
  75 </ul>
  76
  77 Some of these gcc/ANSI features could be avoided (for example, we
  78 could use __inline__ instead of inline or use the usual PROTO((...))
  79 trick in function prototypes) - but some of them can't be avoided
  80 so we don't try with the others.
  81
  82 <li>
  83 char can be signed or unsigned - always say which you mean
  84
  85 <li>
  86 Some architectures have memory alignment constraints.
  87 Others don't have any constraints but go faster if you align things.
  88 These macros tell you which alignment to use
  89
  90 <pre>
  91   /* minimum alignment of unsigned int */
  92   #define ALIGNMENT_UNSIGNED_INT 4
  93
  94   /* minimum alignment of long */
  95   #define ALIGNMENT_LONG 4
  96
  97   /* minimum alignment of float */
  98   #define ALIGNMENT_FLOAT 4
  99
 100   /* minimum alignment of double */
 101   #define ALIGNMENT_DOUBLE 4
 102 </pre>
 103
 104 <li>
 105 Use StgInt, StgWord and StgPtr when reading/writing ints and ptrs to
 106 the stack or heap.  Note that, by definition, StgInt, StgWord and
 107 StgPtr are the same size and have the same alignment constraints
 108 even if sizeof(int) != sizeof(ptr) on that platform.
 109
 110 <li>
 111 Use StgInt8, StgInt16, etc when you need a certain minimum number of
 112 bits in a type.  Use int and nat when there's no particular
 113 constraint.  ANSI C only guarantees that ints are at least 16 bits but
 114 within GHC we assume they are 32 bits.  (I'm not sure if this is a
 115 good idea - ADR)
 116
 117 <li>
 118 Use StgFloat and StgDouble for floating point values which will go
 119 on/have come from the stack or heap.  Note that StgFloat may be the
 120 same as StgDouble on some architectures (eg Alphas) and that it might
 121 occupy many StgWords.
 122
 123 <p>
 124 Use PK_FLT(addr), PK_DBL(addr) to read StgFloat and
 125 StgDouble values from the stack/heap, and ASSIGN_FLT(val,addr)/
 126 ASSIGN_DBL(val,addr) to assign StgFloat/StgDouble values to heap/stack
 127 locations.  These macros take care of alignment restrictions.
 128
 129 <p>
 130 Heap/Stack locations are StgWord aligned; the alignment requirements
 131 of an StgDouble may be more than that of StgWord, but we don't pad
 132 misaligned StgDoubles (doing so would be too much hassle).
 133
 134 <p>
 135 Doing a direct assignment/read of an StgDouble to/from a mis-aligned
 136 location may not work, so we use the ASSIGN_DBL(,)/PK_DBL() macro,
 137 which goes via a temporary.
 138
 139 <p>
 140 Problem: if the architecture allows mis-aligned accesses, but prefers
 141 aligned accesses, these macros just add an extra level of indirection.
 142 We need to distinguish between an architecture that allows mis-aligned
 143 accesses and one that just imposes a performance penalty (this is most
 144 of them).  Perhaps have PREFERRED_ALIGNMENT and REQUIRED_ALIGMENT
 145 configurations?
 146
 147 <li>
 148 Avoid conditional code like this:
 149
 150 <pre>
 151   #ifdef solaris_HOST_OS
 152   // do something solaris specific
 153   #endif
 154 </pre>
 155
 156 Instead, add an appropriate test to the configure.in script and use
 157 the result of that test instead.
 158
 159 <pre>
 160   #ifdef HAVE_BSD_H
 161   // use a BSD library
 162   #endif
 163 </pre>
 164
 165 The problem is that things change from one version of an OS to another
 166 - things get added, things get deleted, things get broken, some things
 167 are optional extras.  Using "feature tests" instead of "system tests"
 168 makes things a lot less brittle.  Things also tend to get documented
 169 better.
 170
 171 </ul>
 172
 173 <h2>Debugging/robustness tricks</h2>
 174
 175
 176 Anyone who has tried to debug a garbage collector or code generator
 177 will tell you: "If a program is going to crash, it should crash as
 178 soon, as noisily and as often as possible."  There's nothing worse
 179 than trying to find a bug which only shows up when running GHC on
 180 itself and doesn't manifest itself until 10 seconds after the actual
 181 cause of the problem.
 182
 183 <p>
 184 The ideas in this section are mostly aimed at this issue:
 185
 186 <ul>
 187 <li>
 188 Use assertions.  Use lots of assertions.  If you write a comment
 189 that says "takes a +ve number" add an assertion.  If you're casting
 190 an int to a nat, add an assertion.  If you're casting an int to a char,
 191 add an assertion.
 192
 193 <li>
 194 Write special debugging code to check the integrity of your data structures.
 195 (Most of the runtime checking code is in <tt>src/Sanity.c</tt>)
 196 Add extra assertions which call this code at the start and end of any
 197 code that operates on your data structures.
 198
 199 <li>
 200 When you find a hard-to-spot bug, try to think of some assertions,
 201 sanity checks or whatever that would have made the bug easier to find.
 202
 203 <li>
 204 When defining an enumeration, it's a good idea not to use 0 for normal
 205 values.  Instead, make 0 raise an internal error.  The idea here is to
 206 make it easier to detect pointer-related errors on the assumption that
 207 random pointers are more likely to point to a 0 than to anything else.
 208
 209 <pre>
 210 typedef enum
 211     { i_INTERNAL_ERROR  /* Instruction 0 raises an internal error */
 212     , i_PANIC           /* irrefutable pattern match failed! */
 213     , i_ERROR           /* user level error */
 214
 215     ...
 216 </pre>
 217
 218 <li> Use #warning or #error whenever you write a piece of incomplete/broken code.
 219
 220 <li> When testing, try to make infrequent things happen often.
 221      For example, make a context switch/gc/etc happen every time a
 222      context switch/gc/etc can happen.  The system will run like a
 223      pig but it'll catch a lot of bugs.
 224
 225 </ul>
 226
 227 <h2>Syntactic details</h2>
 228
 229 <ul>
 230 <li><b>Important:</b> Put "redundant" braces or parens in your code.
 231 Omitting braces and parens leads to very hard to spot bugs -
 232 especially if you use macros (and you might have noticed that GHC does
 233 this a lot!)
 234
 235 <p>
 236 In particular:
 237 <ul>
 238 <li>
 239 Put braces round the body of for loops, while loops, if statements, etc.
 240 even if they "aren't needed" because it's really hard to find the resulting
 241 bug if you mess up.  Indent them any way you like but put them in there!
 242
 243 <li>
 244 When defining a macro, always put parens round args - just in case.
 245 For example, write:
 246 <pre>
 247   #define add(x,y) ((x)+(y))
 248 </pre>
 249 instead of
 250 <pre>
 251   #define add(x,y) x+y
 252 </pre>
 253
 254 <li>
 255 Don't define macros that expand to a list of statements.
 256 You could just use braces as in:
 257
 258 <pre>
 259   #define ASSIGN_CC_ID(ccID)              \
 260         {                                 \
 261         ccID = CC_ID;                     \
 262         CC_ID++;                          \
 263         }
 264 </pre>
 265
 266 but it's better to use the "do { ... } while (0)" trick instead:
 267
 268 <pre>
 269   #define ASSIGN_CC_ID(ccID)              \
 270         do {                              \
 271         ccID = CC_ID;                     \
 272         CC_ID++;                          \
 273         } while(0)
 274 </pre>
 275
 276 The following explanation comes from
 277 <a href="http://www.cs.umd.edu/users/cml/cstyle/code-std-disc.txt">The Usenet C programming FAQ</a>
 278 <pre>
 279 10.4:   What's the best way to write a multi-statement macro?
 280
 281 A:      The usual goal is to write a macro that can be invoked as if it
 282         were a statement consisting of a single function call.  This
 283         means that the "caller" will be supplying the final semicolon,
 284         so the macro body should not.  The macro body cannot therefore
 285         be a simple brace-enclosed compound statement, because syntax
 286         errors would result if it were invoked (apparently as a single
 287         statement, but with a resultant extra semicolon) as the if
 288         branch of an if/else statement with an explicit else clause.
 289
 290         The traditional solution, therefore, is to use
 291
 292                 #define MACRO(arg1, arg2) do {  \
 293                         /* declarations */      \
 294                         stmt1;                  \
 295                         stmt2;                  \
 296                         /* ... */               \
 297                         } while(0)      /* (no trailing ; ) */
 298
 299         When the caller appends a semicolon, this expansion becomes a
 300         single statement regardless of context.  (An optimizing compiler
 301         will remove any "dead" tests or branches on the constant
 302         condition 0, although lint may complain.)
 303
 304         If all of the statements in the intended macro are simple
 305         expressions, with no declarations or loops, another technique is
 306         to write a single, parenthesized expression using one or more
 307         comma operators.  (For an example, see the first DEBUG() macro
 308         in question 10.26.)  This technique also allows a value to be
 309         "returned."
 310
 311         References: H&S Sec. 3.3.2 p. 45; CT&P Sec. 6.3 pp. 82-3.
 312 </pre>
 313
 314 <li>
 315 Don't even write macros that expand to 0 statements - they can mess you
 316 up as well.  Use the doNothing macro instead.
 317 <pre>
 318   #define doNothing() do { } while (0)
 319 </pre>
 320 </ul>
 321
 322 <li>
 323 Use inline functions instead of macros if possible - they're a lot
 324 less tricky to get right and don't suffer from the usual problems
 325 of side effects, evaluation order, multiple evaluation, etc.
 326
 327 <ul>
 328 <li>Inline functions get the naming issue right.  E.g. they
 329   can have local variables which (in an expression context)
 330   macros can't.
 331
 332 <li> Inline functions have call-by-value semantics whereas macros
 333   are call-by-name.  You can be bitten by duplicated computation
 334   if you aren't careful.
 335
 336 <li> You can use inline functions from inside gdb if you compile with
 337   -O0 or -fkeep-inline-functions.  If you use macros, you'd better
 338   know what they expand to.
 339 </ul>
 340
 341 However, note that macros can serve as both l-values and r-values and
 342 can be "polymorphic" as these examples show:
 343 <pre>
 344   // you can use this as an l-value or an l-value
 345   #define PROF_INFO(cl) (((StgClosure*)(cl))->header.profInfo)
 346
 347   // polymorphic case
 348   // but note that min(min(1,2),3) does 3 comparisions instead of 2!!
 349   #define min(x,y) (((x)<=(y)) ? (x) : (y))
 350 </pre>
 351
 352 <li>
 353 Inline functions should be "static inline" because:
 354 <ul>
 355 <li>
 356 gcc will delete static inlines if not used or theyre always inlined.
 357
 358 <li>
 359   if they're externed, we could get conflicts between 2 copies of the
 360   same function if, for some reason, gcc is unable to delete them.
 361   If they're static, we still get multiple copies but at least they don't conflict.
 362 </ul>
 363
 364 OTOH, the gcc manual says this
 365 so maybe we should use extern inline?
 366
 367 <pre>
 368    When a function is both inline and `static', if all calls to the
 369 function are integrated into the caller, and the function's address is
 370 never used, then the function's own assembler code is never referenced.
 371 In this case, GNU CC does not actually output assembler code for the
 372 function, unless you specify the option `-fkeep-inline-functions'.
 373 Some calls cannot be integrated for various reasons (in particular,
 374 calls that precede the function's definition cannot be integrated, and
 375 neither can recursive calls within the definition).  If there is a
 376 nonintegrated call, then the function is compiled to assembler code as
 377 usual.  The function must also be compiled as usual if the program
 378 refers to its address, because that can't be inlined.
 379
 380    When an inline function is not `static', then the compiler must
 381 assume that there may be calls from other source files; since a global
 382 symbol can be defined only once in any program, the function must not
 383 be defined in the other source files, so the calls therein cannot be
 384 integrated.  Therefore, a non-`static' inline function is always
 385 compiled on its own in the usual fashion.
 386
 387    If you specify both `inline' and `extern' in the function
 388 definition, then the definition is used only for inlining.  In no case
 389 is the function compiled on its own, not even if you refer to its
 390 address explicitly.  Such an address becomes an external reference, as
 391 if you had only declared the function, and had not defined it.
 392
 393    This combination of `inline' and `extern' has almost the effect of a
 394 macro.  The way to use it is to put a function definition in a header
 395 file with these keywords, and put another copy of the definition
 396 (lacking `inline' and `extern') in a library file.  The definition in
 397 the header file will cause most calls to the function to be inlined.
 398 If any uses of the function remain, they will refer to the single copy
 399 in the library.
 400 </pre>
 401
 402 <li>
 403 This code
 404 <pre>
 405 int* p, q;
 406 </pre>
 407 looks like it declares two pointers but, in fact, only p is a pointer.
 408 It's safer to write this:
 409 <pre>
 410 int* p;
 411 int* q;
 412 </pre>
 413 You could also write this:
 414 <pre>
 415 int *p, *q;
 416 </pre>
 417 but Alastair prefers to split the declarations.
 418
 419 <li>
 420 Try to use ANSI C's enum feature when defining lists of constants of
 421 the same type.  Among other benefits, you'll notice that gdb uses the
 422 name instead of its (usually inscrutable) number when printing values
 423 with enum types and gdb will let you use the name in expressions you
 424 type.
 425
 426 <p>
 427 Examples:
 428 <pre>
 429     typedef enum { /* N.B. Used as indexes into arrays */
 430      NO_HEAP_PROFILING,
 431      HEAP_BY_CC,
 432      HEAP_BY_MOD,
 433      HEAP_BY_GRP,
 434      HEAP_BY_DESCR,
 435      HEAP_BY_TYPE,
 436      HEAP_BY_TIME
 437     } ProfilingFlags;
 438 </pre>
 439 instead of
 440 <pre>
 441     # define NO_HEAP_PROFILING  0       /* N.B. Used as indexes into arrays */
 442     # define HEAP_BY_CC         1
 443     # define HEAP_BY_MOD        2
 444     # define HEAP_BY_GRP        3
 445     # define HEAP_BY_DESCR      4
 446     # define HEAP_BY_TYPE       5
 447     # define HEAP_BY_TIME       6
 448 </pre>
 449 and
 450 <pre>
 451     typedef enum {
 452      CCchar    = 'C',
 453      MODchar   = 'M',
 454      GRPchar   = 'G',
 455      DESCRchar = 'D',
 456      TYPEchar  = 'Y',
 457      TIMEchar  = 'T'
 458     } ProfilingTag;
 459 </pre>
 460 instead of
 461 <pre>
 462     # define CCchar    'C'
 463     # define MODchar   'M'
 464     # define GRPchar   'G'
 465     # define DESCRchar 'D'
 466     # define TYPEchar  'Y'
 467     # define TIMEchar  'T'
 468 </pre>
 469 ToDo: at the time of writing, we still use the former.
 470
 471 <li>
 472 Alastair likes to use stgCast instead of C syntax.  He claims
 473 it's easier to write and easier to grep for. YMMV.
 474 <pre>
 475 #define stgCast(ty,e) ((ty)(e))
 476 </pre>
 477
 478 <li> Please keep to 80 columns: the line has to be drawn somewhere,
 479 and by keeping it to 80 columns we can ensure that code looks OK on
 480 everyone's screen.  Long lines are hard to read, and a sign that the
 481 code needs to be restructured anyway.
 482
 483 <li> We don't care too much about your indentation style but, if
 484 you're modifying a function, please try to use the same style as the
 485 rest of the function (or file).
 486
 487 <p>
 488 On which note:
 489 Hugs related pieces of code often start with the line:
 490 <pre>
 491   /* -*- mode: hugs-c; -*- */
 492 </pre>
 493 which helps Emacs mimic the indentation style used by Mark P Jones
 494 within Hugs.  Add this to your .emacs file.
 495 <pre>
 496   (defun hugs-c-mode ()
 497     "C mode with adjusted defaults for use with Hugs (based on linux-c-mode)"
 498     (interactive)
 499     (c-mode)
 500     (setq c-basic-offset 4)
 501     (setq indent-tabs-mode nil) ; don't use tabs to indent
 502     (setq c-recognize-knr-r nil)  ; no K&R here - don't pay the price
 503     ; no: (setq tab-width 4)
 504
 505     (c-set-offset 'knr-argdecl-intro    0)
 506     (c-set-offset 'case-label           0)
 507     (c-set-offset 'statement-case-intro '++)
 508     (c-set-offset 'statement-case-open  '+)
 509     )
 510 </pre>
 511
 512
 513 </ul>
 514
 515 <h2>CVS issues</h2>
 516
 517 <ul>
 518 <li>
 519 Don't be tempted to reindent or reorganise large chunks of code - it
 520 generates large diffs in which it's hard to see whether anything else
 521 was changed.
 522 <p>
 523 If you must reindent or reorganise, don't do anything else in that commit
 524 and give advance warning that you're about to do it in case anyone else
 525 is changing that file.
 526 </ul>
 527
 528
 529
 530 </body>
 531 </html>