doc/archman.tex

   1 \documentclass[10pt,oneside]{article}
   2 \reversemarginpar
   3 \usepackage{palatino}
   4 \usepackage{wrapfig}
   5 \usepackage{epsfig}
   6 \usepackage{verbatim}
   7 \usepackage{parskip}
   8 \usepackage{register}
   9 \usepackage{bytefield}
  10 \usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref}
  11 \renewcommand{\ttdefault}{cmtt}
  12 \title{The FleetTwo Architecture Manual\\{\normalsize A Programmer's View of Fleet}}
  13 \begin{document}
  14 \maketitle
  15
  16 \begin{thebibliography}{[GDG01]}
  17 \bibitem[IES02]{ies02}
  18   Sutherland, Ivan,
  19   \href{http://research.cs.berkeley.edu/class/fleet/docs/people/ivan.e.sutherland/ies02-FLEET-A.Once.Instruction.Computer.pdf}{\it UCB-IES02: Fleet -- A One Instruction Computer}
  20 \bibitem[IES14]{ies14}
  21   Sutherland, Ivan,
  22   \href{http://research.cs.berkeley.edu/class/fleet/docs/people/ivan.e.sutherland/ies14-Fleet.Definition.pdf}{\it UCB-IES14: The Fleet Definition}
  23 \bibitem[IES44]{ies44}
  24   Sutherland, Ivan,
  25   \href{http://research.cs.berkeley.edu/class/fleet/docs/people/ivan.e.sutherland/ies44-Fleet.Definition.pdf}{\it UCB-IES44: The Fleet Definition}
  26 \bibitem[MS68]{WheelOfReincarnation}
  27   Myer, T. and Sutherland, Ivan, {\it On the Design of Display Processors}
  28   Communications of the ACM, Vol 11, No. 6, June 1968 .
  29 \end{thebibliography}
  30
  31 \vspace{3cm}
  32 \begin{abstract}
  33 This manual attempts to detail all those aspects of Fleet which are
  34 visible to a software programmer or compiler backend.
  35
  36 This document assumes that the reader is broadly familiar with the
  37 Fleet architecture and the motivation behind it.  For a far better
  38 introduction to these topics, the reader is referred to \cite{ies02}
  39 \cite{ies14} and \cite{ies44}.  Please bear in mind that many of the
  40 details in these documents have been changed, but they still serve as
  41 the authoritative introduction to the architecture.  After reviewing
  42 these introductory papers, the reader is invited to digest this
  43 architecture manual for the latest information on the details
  44 necessary to write software and compilers for Fleet.
  45 \end{abstract}
  46
  47
  48 \pagebreak
  49 \section*{Introduction (from \cite{ies02})}
  50
  51 During the past two and a half years, the Fleet architecture has gone
  52 through massive changes and improvements.  In spite of this, however,
  53 the fundamental idea behind Fleet remains unchanged.  As Sutherland
  54 writes in the original Berkeley Fleet description \cite{ies02},
  55
  56 {\it
  57 When computers were new, logic and storage were expensive and wires
  58 were relatively cheap.  Early computer designers avoided using logic
  59 wherever possible but were not greatly concerned with communication.
  60 They made design choices consistent with the costs of the day.  My
  61 favorite example is the jump instruction.  Early designers put jump
  62 instructions at the end of each block of code to avoid the expense of
  63 storing the address of the next block while executing the present one.
  64
  65 Today's chip fabrication methods invert the older cost balance.  In
  66 today's integrated circuits logic and memory are now almost free but
  67 communication costs dominate chip area, energy consumption and delay.
  68 In spite of these changes in the stuff from which we make computers,
  69 vestiges of the past remain in many of today's common microprocessor
  70 designs.  For example, jump instructions still appear at the end of
  71 each basic block of code in spite of the need to pre-fetch the next
  72 block while executing this one.  It seems better today to store a
  73 pointer to the next block and the length of the current block early in
  74 each block of code.
  75
  76 Instead of following the path of history, I'd like to listen carefully
  77 to what modern chip structures have to teach about how one might build
  78 a modern computer.  I see three major lessons.  First, simplicity can
  79 reduce cost.  Second, moving data will consume most of the time,
  80 energy and chip area of a modern computer. And third, the low cost of
  81 logic makes concurrency available if we can figure out how to use it.
  82
  83 \begin{itemize}
  84
  85 \item {\bf Simplicity}: The Fleet architecture seeks simplicity by treating
  86       all processing devices alike...
  87
  88 \item {\bf Communication}: The Fleet architecture seeks to control the
  89       cost of communication by putting it under direct programmer
  90       control.  Thus Fleet avoids instructions like {\sc ADD} or {\sc STORE} that
  91       include concealed communication to and from a register file...
  92
  93 \item {\bf Concurrency}: The Fleet architecture assumes concurrency
  94       nearly everywhere...
  95
  96 \item {\bf Asynchrony}: This provides enormous flexibility for
  97       implementers to improve the performance or reduce the cost of
  98       the processing devices in a Fleet system.  -- Synchronous
  99       implementations both of ships and of the switch fabric are
 100       possible provided they use validity or occupancy bits to achieve
 101       the arbitrary delays required at sources and destinations...
 102
 103 \end{itemize}
 104 }
 105
 106
 107 \pagebreak
 108 \section*{The Programmer's View of The Ship-Fabric Interface}
 109
 110 The role of the Fleet switch fabric is to carry {\it packets} from
 111 {\it sources} to {\it destinations}.
 112
 113 A packet consists of a {\it path} and a {\it payload}.  The path is
 114 some string of bits which tells the switch fabric how to route from
 115 the desired source to the desired destination.  The distinction
 116 between a {\it path} and a {\it destination} is extremely important!
 117 The payload is either a {\it token} or a {\it data item}.  If the
 118 payload is a data item, it includes one machine word of data.
 119
 120 The diagram below represents a {\it programmer's} conceptual view of
 121 the interface between ships and the switch fabric.  Actual
 122 implementation circuitry may differ substantially.  Sources and
 123 destinations which can send and recieve only tokens -- not data items
 124 -- are drawn as dashed lines.
 125
 126 \vspace{0.3cm}
 127 \begin{center}
 128 \epsfig{file=ports,width=4in}
 129 \end{center}
 130 \vspace{0.3cm}
 131
 132 The term {\it port} refers to each interface to the ship, the {\it
 133   valve} connecting it to the switch fabric, and the corresponding
 134 sources and destinations on the switch fabric.
 135
 136 Each valve consists of a {\it data latch}, which is as wide as a
 137 single machine word and a {\it pump}, which is a circular fifo of
 138 instruction-width latches.  The contents of the instruction fifo
 139 control the data latch, as future chapters will explain.
 140
 141 Note that the pump in each valve has a destination of its own, and
 142 that there is no buffering fifo guarding this destination.  Note that
 143 all other destinations are guarded by a buffering fifo; the size of
 144 this fifo is exposed to the software programmer so she can ensure that
 145 deadlock does not occur.
 146
 147
 148 \pagebreak
 149 \section*{Data Formats}
 150
 151 \subsection*{Packet Path (12 bits)}
 152
 153 These bits appear physically within the switch fabric, and have
 154 ``address bit timing.''  The {\tt T} bit is the ``tokenhood'' bit; if
 155 set, this packet represents a token and it does not cause the switch
 156 fabric data latches to fire.
 157
 158 {\tt\footnotesize
 159 \begin{bytefield}{49}
 160   \bitheader[b]{37,47,48}\\
 161   \bitbox{1}{T}
 162   \bitbox{11}{Path}
 163   \bitbox[l]{37}{}
 164 \end{bytefield}
 165 }
 166
 167 \subsection*{Data Word In Memory (37 bits)}
 168
 169 A word of data is 37 bits wide.
 170
 171 {\tt\footnotesize
 172 \begin{bytefield}{49}
 173   \bitheader[b]{0,36}\\
 174   \bitbox[r]{12}{}
 175   \bitbox{37}{Data Word}
 176 \end{bytefield}
 177 }
 178
 179 \subsection*{Packet In Flight (49 bits)}
 180
 181 {\tt\footnotesize
 182 \begin{bytefield}{49}
 183   \bitheader[b]{0,36,37,47,48}\\
 184   \bitbox{1}{T}
 185   \bitbox{11}{Packet Path}
 186   \bitbox{37}{Data Word}
 187 \end{bytefield}
 188 }
 189
 190 \subsection*{Instruction In Memory (37 bits)}
 191
 192 An instruction must be no wider than a memory word.  The next section
 193 explains the bits in greater detail.  The {\tt Instruction Path} is
 194 the path which the instruction itself will take in order to arrive at
 195 the desired pump destination.  The {\it Data/Token Path} is placed on
 196 any packet inserted into the switch fabric as a result of executing
 197 the instruction.
 198
 199 {\tt\tiny
 200 \begin{bytefield}{49}
 201   \bitheader[b]{0,10,11,17,18-26,36}\\
 202   \bitbox[r]{12}{}
 203   \bitbox{11}{Instruction Path}
 204   \bitbox{1}{1}
 205   \bitbox{1}{Ti}
 206   \bitbox{1}{Di}
 207   \bitbox{1}{Dc}
 208   \bitbox{1}{Do}
 209   \bitbox{1}{To}
 210   \bitbox{1}{Ig}
 211   \bitbox{1}{Rq}
 212   \bitbox{11}{Data/Token Path}
 213   \bitbox{7}{Count}
 214 \end{bytefield}
 215 }
 216
 217 \subsection*{Instruction Packet In Flight (49 bits)}
 218
 219 Note that Fleet simply copies the {\tt Instruction Path} field, bit
 220 for bit, into the packet {\tt Path} field in order to ``dispatch'' an
 221 instruction.
 222
 223 {\tt\tiny
 224 \begin{bytefield}{49}
 225   \bitheader[b]{0,10,11,17,18-25,37,47,48}\\
 226   \bitbox[r]{1}{0}
 227   \bitbox{11}{Instruction Path}
 228   \bitbox{11}{Instruction Path}
 229   \bitbox{1}{1}
 230   \bitbox{1}{Ti}
 231   \bitbox{1}{Di}
 232   \bitbox{1}{Dc}
 233   \bitbox{1}{Do}
 234   \bitbox{1}{To}
 235   \bitbox{1}{Ig}
 236   \bitbox{1}{Rq}
 237   \bitbox{11}{Data/Token Path}
 238   \bitbox{7}{Count}
 239 \end{bytefield}
 240 }
 241
 242 \setlength{\bitwidth}{5mm}
 243 \pagebreak
 244
 245 \section*{Instruction Formats}
 246 \begin{wrapfigure}{R}{2in}
 247 \epsfig{file=locations,width=2in}
 248 \vspace{-0.4in}
 249 \end{wrapfigure}
 250 Within the pump's instruction fifo, there are two points of particular
 251 interest.  The first point is the {\it insertion point}, which is the
 252 point in the ring where new instructions enter from the switch fabric.
 253 The other point of interest is the {\it execution point}, which is the
 254 point in the ring where instructions other than {\tt kill} and {\tt
 255   unclog} are performed.
 256
 257 The programmer may assume that the insertion point is the immediate
 258 successor of the insertion point.
 259
 260 \vspace{0.3cm}
 261 {\bf Kill}
 262
 263 {\tt
 264 \begin{bytefield}{26}
 265   \bitheader[b]{0,6,7,20-25}\\
 266   \bitbox{1}{0}
 267   \bitbox{1}{0}
 268   \bitbox{1}{1}
 269   \bitbox{1}{0}
 270   \bitbox{1}{1}
 271   \bitbox{14}{}
 272   \bitbox{7}{Count}
 273 \end{bytefield}}
 274 When a {\tt kill} instruction reaches the insertion point, it will
 275 wait there for a non-{\tt clog} instruction to reach the execution
 276 point.  When this occurs, the instruction at the execution point is
 277 retired and the count field of the {\tt kill} instruction is
 278 decremented.  If the {\tt kill} instruction's count was {\tt 0} before
 279 decrementing, the {\tt kill} instruction is retired.
 280
 281 \vspace{0.3cm}
 282 {\bf Clog}
 283
 284 {\tt
 285 \begin{bytefield}{26}
 286   \bitheader[b]{0,20-25}\\
 287   \bitbox{1}{0}
 288   \bitbox{1}{0}
 289   \bitbox{1}{1}
 290   \bitbox{1}{0}
 291   \bitbox{1}{0}
 292   \bitbox{21}{}
 293 \end{bytefield}}
 294 When a {\tt clog} instruction reaches the execution point, no more
 295 instructions will be executed until an {\tt unclog} is performed.
 296
 297 \vspace{0.3cm}
 298 {\bf UnClog}
 299
 300 {\tt
 301 \begin{bytefield}{26}
 302   \bitheader[b]{0,20-25}\\
 303   \bitbox{1}{0}
 304   \bitbox{1}{0}
 305   \bitbox{1}{0}
 306   \bitbox{1}{1}
 307   \bitbox{1}{0}
 308   \bitbox{21}{}
 309 \end{bytefield}}
 310 When an {\tt unclog} instruction reaches the insertion point, it will wait
 311 there until a {\tt clog} instruction is at the execution point.  When
 312 this occurs, both instructions retire. {\it Note:} this means
 313 that if an {\tt unclog} instruction enters the instruction fifo
 314 without a {\tt clog} ahead of it, the pump's insertion point will become
 315 irretrievably jammed.
 316
 317 \vspace{0.3cm}
 318 {\bf Literal}
 319
 320 {\tt
 321 \begin{bytefield}{26}
 322   \bitheader[b]{0,6,7,23-25}\\
 323   \bitbox{1}{0}
 324   \bitbox{1}{1}
 325   \bitbox{17}{Literal}
 326   \bitbox{7}{Count}
 327 \end{bytefield}}
 328 When a literal instruction reaches the execution point, its {\tt
 329   Literal} field is sign-extended to a full word length and captured
 330 in the data latch.
 331
 332
 333 \pagebreak
 334 {\bf Normal Instruction (at Execution Point)}
 335
 336 {\tt
 337 \begin{bytefield}{26}
 338   \bitheader[b]{0,6,7,16-25}\\
 339   \bitbox{1}{1}
 340   \bitbox{1}{Ti}
 341   \bitbox{1}{Di}
 342   \bitbox{1}{Dc}
 343   \bitbox{1}{Do}
 344   \bitbox{1}{To}
 345   \bitbox{1}{Ig}
 346   \bitbox{1}{Rq}
 347   \bitbox{11}{Dest}
 348   \bitbox{7}{Count}
 349 \end{bytefield}
 350 }
 351
 352 \begin{itemize}
 353
 354   \item [\tt Ti] ({\bf Token Input}) wait for a token and accept
 355                  it\footnote{{\tt Ti}=1,{\tt Di}=1 is invalid on inbox.}
 356
 357   \item [\tt Di] ({\bf Data Input}) wait for a datum and accept it.
 358
 359   \item [\tt Dc] ({\bf Data Capture}) capture (latch) the accepted
 360                  datum.  This bit is ignored if the incoming packet is
 361                  a token. \footnote{ Note that {\tt Di}=0,{\tt Dc}=1
 362                  is meaningless and therefore reserved for other
 363                  uses.}
 364
 365   \item [\tt Do] ({\bf Data Output}) emit a datum.
 366
 367   \item [\tt To] ({\bf Token Output}) emit a token.\footnote{ {\tt To}=1,{\tt
 368                  Do}=1 have special meaning on an outbox.}
 369
 370   \item [\tt Ig] ({\bf Ignore {\tt To} Until Last Iteration}) ignore
 371                  the {\tt To} bit unless {\tt Count=0} \footnote{{\tt
 372                  To}=0,{\tt Ig}=1 is invalid}
 373
 374   \item [\tt Rq] ({\bf ReQueue}) if set, instructions having nonzero
 375                  count are ``Re-Queued'' rather than RePeated.  See
 376                  {\tt Count} for more detail. \footnote{ An
 377                  instruction {\it in memory} may not have {\tt
 378                  Rq=1,Count=0} (use {\tt Rq=0,Count=0})}
 379
 380   \item  [\tt Count] ({\bf Count}) {\it After} executing:
 381 \begin{verbatim}
 382 if (Count==0) {
 383     discard this instruction
 384 } else {
 385     if Count < MAX_COUNT {
 386       decrement count
 387     }
 388     if Rq=1 or Literal {
 389        put this instruction back into the instruction fifo
 390     } else {
 391        execute this instruction again
 392     }
 393 }
 394 \end{verbatim}
 395 Note how a ``standing'' instruction is encoded as {\tt Count=1111111}
 396
 397 \item [\tt Dest] ({\bf Data/Token Destination})
 398    Normally, this field is copied into the address portion of any
 399    outgoing packet ({\tt Do} on an outbox or {\tt To}).
 400
 401    However, in the special case of an outbox, if {\tt Do=1,To=1}, then
 402    the {\it most significant} {\tt 11} bits of the value in the {\it
 403    data register} are used as a destination address instead.
 404    %\footnote{This
 405    %functionality eliminates the need for any sort of ``{\tt Execute}''
 406    %ship, and lets a {\tt Fifo} ship act as an extension of the
 407    %instruction queue in the pump.}
 408
 409 \end{itemize}
 410
 411
 412 \pagebreak
 413 \section*{Fleet Assembly Language}
 414
 415 The Fleet Assembly language is designed to be a human-readable
 416 depiction of the bits which comprise a Fleet program.  The formal
 417 syntax is given below:
 418
 419 \verbatiminput{fleet.g}
 420
 421 \pagebreak
 422 \section*{FleetDoc}
 423
 424 Inspired by JavaDoc, the Fleet software toolchain includes a tool
 425 called {\tt FleetDoc}, which processes {\it ship description files}
 426 (with the extension {\tt .ship}).  These files contain sections for:
 427
 428 \begin{itemize}
 429 \item The name of the ship
 430 \item A list of its ports
 431 \item A prose description of the ship
 432 \item A list of the constants associated with a ship
 433 \item Java source code for a software simulation of the ship
 434 \item Verilog source code for an FPGA simulation of the ship
 435 \item A test case for the ship
 436 \item A list of the contributors to all of the above
 437 \end{itemize}
 438
 439 Keeping all of this information in a single file ensures that changes
 440 to one item (such as the list of ports) are propagated to the other
 441 items (such as the verilog code for simulation purposes).
 442
 443 Keeping this information in {\it machine-readable} format makes it
 444 possible to automatically generate tools which depend on the precise
 445 details of ship ports (such as the Fleet assembler) without the risk
 446 inherent in manual synchronization of the spec with the
 447 implementation.
 448
 449 The latter half of this manual is generated automatically from the
 450 contents of the {\tt .ship} files.
 451
 452 \pagebreak
 453 \subsection*{An Example FleetDoc File}
 454 \begin{verbatim}
 455 ship: BitFifo
 456
 457 == Ports ===========================================================
 458 in:   in
 459 in:   inOp
 460   constant lsbFirst: ....................................1
 461   constant msbFirst: ....................................0
 462   constant drop:     .............................uuuuuu..
 463   constant take:     .......................uuuuuu........
 464 ...
 465
 466 == TeX ==============================================================
 467 The BitFifo internally stores some number of bits in a fifo structure.
 468 Its capacity is guaranteed to be at least two full machine words or
 469 more.
 470 ...
 471
 472 == Fleeterpreter ====================================================
 473 public void service() {
 474   if (box_inOp.dataReadyForShip() && box_in.dataReadyForShip()) {
 475   ...
 476
 477 == FPGA ==============================================================
 478 always @(posedge clk) begin
 479   if (!in_r    && in_a)     in_a    <= 0;
 480   if (!inOp_r  && inOp_a)   inOp_a  <= 0;
 481   ...
 482
 483 == Test ==============================================================
 484 #expect 1
 485 #expect 68719476736
 486 #ship debug        : Debug
 487 #ship bitfifo      : BitFifo
 488
 489 bitfifo.outOp:      literal BitFifo.outOp[take=37]; [*] deliver;
 490 ...
 491
 492 == Contributors =========================================================
 493 Amir Kamil <kamil@cs.berkeley.edu>
 494 Adam Megacz <megacz@cs.berkeley.edu>
 495 \end{verbatim}
 496
 497
 498 \pagebreak
 499 \section*{Java APIs}
 500 \subsection*{Representing Code}
 501 \subsection*{Representing Ships}
 502
 503 \pagebreak
 504 \section*{Misc}
 505
 506 \begin{verbatim}
 507 - sequencing guarantees
 508 - codebag format
 509 - behavior when token arrives at data port, vice versa
 510 - overlapping codebags
 511 \end{verbatim}
 512
 513 \pagebreak
 514 \section*{Future Directions}
 515
 516 \subsection*{The Role of the Count Field}
 517
 518 Looking back on the design of the pump, several things are now
 519 apparent which were not initially.  In particular, it seems that it
 520 could be useful to separate {\it loading the count register} from
 521 other types of instructions.  This would have a few advantages:
 522
 523 \begin{itemize}
 524 \item The size of the count field would not be a consideration in the
 525       ``instruction budget'' of normal execution instructions
 526
 527 \item It would be possible to have finitely-repeating,
 528       infinitely-requeueing instructions.  Various implementation
 529       details\footnote{in many implementations, the count field of an
 530       instruction is destroyed as it counts down; another register and
 531       mux would be needed to ``save'' this value so it could be
 532       restored prior to being requeued} make it awkward to support
 533       this unless the count register can be manipulated independently
 534       of instruction execution.
 535 \end{itemize}
 536
 537 With this in mind, we can refactor the ``essence of the pump'' into
 538 the following fields, which are basically orthogonal:
 539
 540 \begin{itemize}
 541 \item wait for a token
 542 \item wait for a datum
 543 \item load the data latch from the awaited datum
 544 \item load the data latch from a literal in the instruction
 545 \item send a datum to a destination indicated by the instruction
 546 \item send a datum to a destination indicated by the data latch value
 547 \item send a token to a destination indicated by the instruction
 548 \item send a token to a destination indicated by the data latch value
 549 \item set the count register to a literal value
 550 \item decrement the count register
 551 \item requeue this instruction if {\tt count>0}
 552 \item requeue this instruction unconditionally
 553 \item repeat this instruction if {\tt count>0}
 554 \item repeat this instruction unconditionally
 555 \end{itemize}
 556
 557 The crucial change here is the decoupling of the act of {\it loading
 558   the count register} from the act of {\it loading the next
 559   instruction}.
 560
 561 At this point, it boils down to a question of instruction bit
 562 budgeting.  Currently we have a separate instruction form for
 563 literals, so that the literal can use bits which are not otherwise
 564 relevant to literal instructions (for example, {\tt Di}).  Adding an
 565 additional instruction form for loading the count register would have
 566 similar advantages.
 567
 568
 569 \subsection*{For Just a Little Bit More...}
 570
 571 Compared to early revisions of the Fleet architecture, the valves of
 572 FleetTwo are substantially more powerful.  One might ask, why not add
 573 additional power to the valves, perhaps up to the point of making them
 574 tiny von Neumann machines in their own right?  Indeed, doing so would
 575 risk reliving the {\it Wheel of Reincarnation}
 576 \cite{WheelOfReincarnation}.
 577
 578 I believe, however, that there is a fundamental distinction to be
 579 made between devices whose control flow {\it can branch} and those
 580 whose control flow {\it cannot branch}, and that this distinction is
 581 founded in the geometry of VLSI, and indeed, Euclidean space.
 582 Ultimately, the instruction stream for any component of a processor
 583 must exist in some physical incarnation -- some geometric extent.  If
 584 the control flow of a program is linear, with no branching, then the
 585 instruction stream has an optimal spatial mapping which is quite
 586 efficient: it is simply laid out in FIFO fashion.
 587
 588 By contrast, however, a program whose instruction stream has
 589 unrestricted branching is logically a {\it tree} structure.  Quite
 590 unfortunately, general trees have no efficient mapping onto
 591 two-dimensional surfaces.  By this, I mean that in every possible
 592 mapping, there exists some pair of {\it logically adjacent}
 593 instructions whose {\it physical distance} is proportional to the size
 594 of the entire program, rather than some constant factor irrespective
 595 of the program size.  Therefore I conclude that the spatial mapping of
 596 a program with unrestricted branching is in an entirely different
 597 class than that of programs with linear control flow.  I choose this
 598 distinction as the basis for limiting our valves to linear control
 599 flow sequences.  A similar argument can be made for looping constructs
 600 in which two or more loops may be nested within a single parent loop.
 601
 602 With this distinction in mind, it would not be objectionable for
 603 future incarnations of Fleet to have {\it predicated} instructions at
 604 the valves.  This might take the form of:
 605
 606 \begin{itemize}
 607 \item Instructions to unconditionally set some ``state flags''
 608 \item Instructions to conditionally set some ``state flags'' based on the contents of the data latch
 609 \item Bits which cause an instruction to be ignored or retired based on the state flags
 610 \end{itemize}
 611
 612 I feel that this addition would not risk falling down the slippery
 613 slope -- instruction streams with predicated instructions (but not
 614 unrestricted branching) preserve the crucial property that they have
 615 an efficient spatial mapping.  Any steps beyond this, however, cross a
 616 fairly fundamental line.
 617
 618
 619