doc/archman.tex

   1 \documentclass[10pt,oneside]{article}
   2 \reversemarginpar
   3 \usepackage{emp}
   4 \DeclareGraphicsRule{*}{mps}{*}{}
   5
   6 \ifx\pdftexversion\undefined
   7 \usepackage[dvips]{graphicx}
   8 \DeclareGraphicsExtensions{.eps}\else
   9 \usepackage[pdftex]{graphicx}
  10 \DeclareGraphicsExtensions{.pdf,.png,.mps,.mp}
  11 \usepackage{epstopdf}
  12 \fi
  13 \usepackage{palatino}
  14 \usepackage{wrapfig}
  15 \usepackage{epsfig}
  16 \usepackage{verbatim}
  17 \usepackage{parskip}
  18 \usepackage{register}
  19 \usepackage{bytefield}
  20 \usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref}
  21 \renewcommand{\ttdefault}{cmtt}
  22 \title{The FleetTwo Architecture Manual\\{\normalsize A Programmer's View of Fleet}}
  23 \begin{document}
  24 \maketitle
  25
  26 \begin{thebibliography}{[GDG01]}
  27 \bibitem[IES02]{ies02}
  28   Sutherland, Ivan,
  29   \href{http://research.cs.berkeley.edu/class/fleet/docs/people/ivan.e.sutherland/ies02-FLEET-A.Once.Instruction.Computer.pdf}{\it UCB-IES02: Fleet -- A One Instruction Computer}
  30 \bibitem[IES14]{ies14}
  31   Sutherland, Ivan,
  32   \href{http://research.cs.berkeley.edu/class/fleet/docs/people/ivan.e.sutherland/ies14-Fleet.Definition.pdf}{\it UCB-IES14: The Fleet Definition}
  33 \bibitem[AM17]{am17}
  34   Megacz, Adam,
  35   \href{http://research.cs.berkeley.edu/class/fleet/docs/people/adam.megacz/am17-The.Choice.Ship.pdf}{\it UCB-AM17: The Choice Ship}
  36 \bibitem[IES44]{ies44}
  37   Sutherland, Ivan,
  38   \href{http://research.cs.berkeley.edu/class/fleet/docs/people/ivan.e.sutherland/ies44-Fleet.Definition.pdf}{\it UCB-IES44: The Fleet Definition}
  39 \bibitem[AM25]{am25}
  40   Megacz, Adam,
  41   \href{http://research.cs.berkeley.edu/class/fleet/docs/people/adam.megacz/am25.pdf}{\it UCB-AM25: Opcode Ports}
  42 \bibitem[AM27]{am27}
  43   Megacz, Adam,
  44   \href{http://research.cs.berkeley.edu/class/fleet/docs/people/adam.megacz/am27.pdf}{\it UCB-AM27: Bypass Paths}
  45 \bibitem[IES31]{ies31}
  46   Sutherland, Ivan,
  47   \href{http://research.cs.berkeley.edu/class/fleet/docs/people/ivan.e.sutherland/ies31-Some.SHIPs.pdf}{\it UCB-IES31: Some Ships}
  48 \bibitem[MS68]{WheelOfReincarnation}
  49   Myer, T. and Sutherland, Ivan, {\it On the Design of Display Processors}
  50   Communications of the ACM, Vol 11, No. 6, June 1968 .
  51 \end{thebibliography}
  52
  53 \vspace{1cm}
  54 \begin{abstract}
  55 This manual attempts to detail all those aspects of Fleet which are
  56 visible to a software programmer or compiler backend.
  57
  58 This document assumes that the reader is broadly familiar with the
  59 Fleet architecture and the motivation behind it.  For a far better
  60 introduction to these topics, the reader is referred to \cite{ies02}
  61 \cite{ies14} and \cite{ies44}.  Please bear in mind that many of the
  62 details in these documents have been changed, but they still serve as
  63 the authoritative introduction to the architecture.  After reviewing
  64 these introductory papers, the reader is invited to digest this
  65 architecture manual for the latest information on the details
  66 necessary to write software and compilers for Fleet.
  67 \end{abstract}
  68
  69
  70 \pagebreak
  71 \section*{Introduction (from \cite{ies02})}
  72
  73 During the past two and a half years, the Fleet architecture has gone
  74 through massive changes and improvements.  Nonetheless, the
  75 fundamental vision remains unchanged.  As Sutherland writes in the
  76 original Berkeley Fleet description \cite{ies02},
  77
  78 {\it
  79 When computers were new, logic and storage were expensive and wires
  80 were relatively cheap.  Early computer designers avoided using logic
  81 wherever possible but were not greatly concerned with communication.
  82 They made design choices consistent with the costs of the day.  My
  83 favorite example is the jump instruction.  Early designers put jump
  84 instructions at the end of each block of code to avoid the expense of
  85 storing the address of the next block while executing the present one.
  86
  87 Today's chip fabrication methods invert the older cost balance.  In
  88 today's integrated circuits logic and memory are now almost free but
  89 communication costs dominate chip area, energy consumption and delay.
  90 In spite of these changes in the stuff from which we make computers,
  91 vestiges of the past remain in many of today's common microprocessor
  92 designs.  For example, jump instructions still appear at the end of
  93 each basic block of code in spite of the need to pre-fetch the next
  94 block while executing this one.  It seems better today to store a
  95 pointer to the next block and the length of the current block early in
  96 each block of code.
  97
  98 Instead of following the path of history, I'd like to listen carefully
  99 to what modern chip structures have to teach about how one might build
 100 a modern computer.  I see three major lessons.  First, simplicity can
 101 reduce cost.  Second, moving data will consume most of the time,
 102 energy and chip area of a modern computer. And third, the low cost of
 103 logic makes concurrency available if we can figure out how to use it.
 104
 105 \begin{itemize}
 106
 107 \item {\bf Simplicity}: The Fleet architecture seeks simplicity by treating
 108       all processing devices alike...
 109
 110 \item {\bf Communication}: The Fleet architecture seeks to control the
 111       cost of communication by putting it under direct programmer
 112       control.  Thus Fleet avoids instructions like {\sc ADD} or {\sc STORE} that
 113       include concealed communication to and from a register file...
 114
 115 \item {\bf Concurrency}: The Fleet architecture assumes concurrency
 116       nearly everywhere...
 117
 118 \item {\bf Asynchrony}: This provides enormous flexibility for
 119       implementers to improve the performance or reduce the cost of
 120       the processing devices in a Fleet system.  -- Synchronous
 121       implementations both of ships and of the switch fabric are
 122       possible provided they use validity or occupancy bits to achieve
 123       the arbitrary delays required at sources and destinations...
 124
 125 \end{itemize}
 126 }
 127
 128
 129 \pagebreak
 130 \section*{The Programmer's View of The Ship-Fabric Interface}
 131
 132 The role of the Fleet switch fabric is to carry {\it packets} from
 133 {\it sources} to {\it destinations}.
 134
 135 A packet consists of a {\it path} and a {\it payload}.  The path is
 136 some string of bits which tells the switch fabric how to route from
 137 the desired source to the desired destination.  The distinction
 138 between a {\it path} and a {\it destination} is extremely important!
 139 The payload is either a {\it token} or a {\it data item}.  If the
 140 payload is a data item, it includes one machine word of data.
 141
 142 The diagram below represents a {\it programmer's} conceptual view of
 143 the interface between ships and the switch fabric.  Actual
 144 implementation circuitry may differ substantially.  Sources and
 145 destinations which can send and recieve only tokens -- not data items
 146 -- are drawn as dashed lines.
 147
 148 \vspace{0.3cm}
 149 \begin{center}
 150 \epsfig{file=ports,width=4in}
 151 \end{center}
 152 \vspace{0.3cm}
 153
 154 The term {\it port} refers to an interface to the ship, the {\it
 155   valve} connecting it to the switch fabric, and the corresponding
 156 sources and destinations on the switch fabric.
 157
 158 Each valve consists of a {\it data latch}, which is as wide as a
 159 single machine word and a {\it pump}, which is a circular fifo of
 160 instruction-width latches.  The values in the instruction fifo
 161 control the data latch, as future chapters will explain.
 162
 163 Note that the pump in each valve has a destination of its own, and
 164 that there is no buffering fifo guarding this destination.  Note that
 165 all other destinations are guarded by a buffering fifo; the size of
 166 this fifo is exposed to the software programmer so she can ensure that
 167 deadlock does not occur.
 168
 169
 170 \pagebreak
 171 \section*{Data Formats}
 172
 173 \subsection*{Path (12 bits)}
 174
 175 These bits appear physically within the switch fabric. \footnote{In
 176   the Sun Labs implementation, these bits have ``address bit
 177   timing.''}  The {\tt T} bit is the ``tokenhood'' bit; if set, this
 178 packet represents a token\footnote{In the Sun Labs
 179   implementation, this bit inhibits the switch fabric data latches
 180   from firing.}
 181
 182 {\tt\footnotesize
 183 \begin{bytefield}{49}
 184   \bitheader[b]{37,47,48}\\
 185   \bitbox{1}{T}
 186   \bitbox{11}{Path}
 187   \bitbox[l]{37}{}
 188 \end{bytefield}
 189 }
 190
 191 \subsection*{Data Word In Memory (37 bits)}
 192
 193 A word of data is 37 bits wide.
 194
 195 {\tt\footnotesize
 196 \begin{bytefield}{49}
 197   \bitheader[b]{0,36}\\
 198   \bitbox[r]{12}{}
 199   \bitbox{37}{Data Word}
 200 \end{bytefield}
 201 }
 202
 203 \subsection*{Packet In Flight (49 bits)}
 204
 205 {\tt\footnotesize
 206 \begin{bytefield}{49}
 207   \bitheader[b]{0,36,37,47,48}\\
 208   \bitbox{1}{T}
 209   \bitbox{11}{Path}
 210   \bitbox{37}{Data Word}
 211 \end{bytefield}
 212 }
 213
 214 \subsection*{Instruction In Memory (37 bits)}
 215
 216 An instruction must be no wider than a memory word.  The next section
 217 explains the bits in greater detail.  The {\tt Instruction Path} is
 218 the path which the instruction itself will take in order to arrive at
 219 the desired pump destination.  The {\it Data/Token Path} is placed on
 220 any packet inserted into the switch fabric as a result of executing
 221 the instruction.
 222
 223 {\tt\tiny
 224 \begin{bytefield}{49}
 225   \bitheader[b]{0,10,11,17,18-26,36}\\
 226   \bitbox[r]{12}{}
 227   \bitbox{11}{Instruction Path}
 228   \bitbox{1}{1}
 229   \bitbox{1}{Ti}
 230   \bitbox{1}{Di}
 231   \bitbox{1}{Dc}
 232   \bitbox{1}{Do}
 233   \bitbox{1}{To}
 234   \bitbox{1}{Ig}
 235   \bitbox{1}{Rq}
 236   \bitbox{11}{Data/Token Path}
 237   \bitbox{7}{Count}
 238 \end{bytefield}
 239 }
 240
 241 \subsection*{Instruction Packet In Flight (49 bits)}
 242
 243 Note that Fleet simply copies the {\tt Instruction Path} field, bit
 244 for bit, into the packet {\tt Path} field in order to ``dispatch'' an
 245 instruction.
 246
 247 {\tt\tiny
 248 \begin{bytefield}{49}
 249   \bitheader[b]{0,10,11,17,18-25,37,47,48}\\
 250   \bitbox[r]{1}{0}
 251   \bitbox{11}{Instruction Path}
 252   \bitbox{11}{Instruction Path}
 253   \bitbox{1}{1}
 254   \bitbox{1}{Ti}
 255   \bitbox{1}{Di}
 256   \bitbox{1}{Dc}
 257   \bitbox{1}{Do}
 258   \bitbox{1}{To}
 259   \bitbox{1}{Ig}
 260   \bitbox{1}{Rq}
 261   \bitbox{11}{Data/Token Path}
 262   \bitbox{7}{Count}
 263 \end{bytefield}
 264 }
 265
 266 \setlength{\bitwidth}{5mm}
 267 \pagebreak
 268
 269 \section*{Instruction Formats}
 270 \begin{wrapfigure}{R}{2in}
 271 \epsfig{file=locations,width=2in}
 272 \vspace{-0.4in}
 273 \end{wrapfigure}
 274 Within the pump's instruction fifo, there are two points of particular
 275 interest.  The first point is the {\it insertion point}, which is the
 276 point in the ring where new instructions enter from the switch fabric.
 277 The other point of interest is the {\it execution point}, which is the
 278 point in the ring where instructions other than {\tt kill} and {\tt
 279   unclog} are performed.
 280
 281 The programmer may assume that the insertion point is the immediate
 282 successor of the execution point.
 283
 284 \vspace{0.3cm}
 285 {\bf Kill}
 286
 287 {\tt
 288 \begin{bytefield}{26}
 289   \bitheader[b]{0,6,7,20-25}\\
 290   \bitbox{1}{0}
 291   \bitbox{1}{0}
 292   \bitbox{1}{1}
 293   \bitbox{1}{0}
 294   \bitbox{1}{1}
 295   \bitbox{14}{}
 296   \bitbox{7}{Count}
 297 \end{bytefield}}
 298 When a {\tt kill} instruction reaches the insertion point, it will
 299 wait there for a non-{\tt clog} instruction to reach the execution
 300 point.  When this occurs, the instruction at the execution point is
 301 retired and the count field of the {\tt kill} instruction is
 302 decremented.  If the {\tt kill} instruction's count was {\tt 0} before
 303 decrementing, the {\tt kill} instruction is retired.  The programmer
 304 is assured that a kill instruction with a count of $n$ will kill $n$
 305 {\it consecutive} instructions.
 306
 307 \vspace{0.3cm}
 308 {\bf Clog}
 309
 310 {\tt
 311 \begin{bytefield}{26}
 312   \bitheader[b]{0,20-25}\\
 313   \bitbox{1}{0}
 314   \bitbox{1}{0}
 315   \bitbox{1}{1}
 316   \bitbox{1}{0}
 317   \bitbox{1}{0}
 318   \bitbox{21}{}
 319 \end{bytefield}}
 320 When a {\tt clog} instruction reaches the execution point, no more
 321 instructions will be executed until an {\tt unclog} is performed.
 322
 323 \vspace{0.3cm}
 324 {\bf UnClog}
 325
 326 {\tt
 327 \begin{bytefield}{26}
 328   \bitheader[b]{0,20-25}\\
 329   \bitbox{1}{0}
 330   \bitbox{1}{0}
 331   \bitbox{1}{0}
 332   \bitbox{1}{1}
 333   \bitbox{1}{0}
 334   \bitbox{21}{}
 335 \end{bytefield}}
 336 When an {\tt unclog} instruction reaches the insertion point, it will wait
 337 there until a {\tt clog} instruction is at the execution point.  When
 338 this occurs, both instructions retire.
 339
 340 \vspace{0.3cm}
 341 {\bf Literal}
 342
 343 {\tt
 344 \begin{bytefield}{26}
 345   \bitheader[b]{0,6,7,23-25}\\
 346   \bitbox{1}{0}
 347   \bitbox{1}{1}
 348   \bitbox{17}{Literal}
 349   \bitbox{7}{Count}
 350 \end{bytefield}}
 351 When a literal instruction reaches the execution point, its {\tt
 352   Literal} field is sign-extended to a full word length and captured
 353 in the data latch.
 354
 355
 356 \pagebreak
 357 {\bf Normal Instruction}
 358
 359 {\tt
 360 \begin{bytefield}{26}
 361   \bitheader[b]{0,6,7,16-25}\\
 362   \bitbox{1}{1}
 363   \bitbox{1}{Ti}
 364   \bitbox{1}{Di}
 365   \bitbox{1}{Dc}
 366   \bitbox{1}{Do}
 367   \bitbox{1}{To}
 368   \bitbox{1}{Ig}
 369   \bitbox{1}{Rq}
 370   \bitbox{11}{Dest}
 371   \bitbox{7}{Count}
 372 \end{bytefield}
 373 }
 374
 375 \begin{itemize}
 376
 377   \item [\tt Ti] ({\bf Token Input}) wait for a token and acknowledge
 378                  it\footnote{{\tt Ti}=1,{\tt Di}=1 is invalid on input port.}
 379
 380   \item [\tt Di] ({\bf Data Input}) wait for a datum and acknowledge it.
 381
 382   \item [\tt Dc] ({\bf Data Capture}) capture (latch) the accepted
 383                  datum in the data latch.  This bit is ignored if the incoming packet is
 384                  a token. \footnote{ Note that {\tt Di}=0,{\tt Dc}=1
 385                  is meaningless and therefore reserved for other
 386                  uses.}
 387
 388   \item [\tt Do] ({\bf Data Output}) emit a datum.
 389
 390   \item [\tt To] ({\bf Token Output}) emit a token.\footnote{ {\tt To}=1,{\tt
 391                  Do}=1 have special meaning on an output port.}
 392
 393   \item [\tt Ig] ({\bf Ignore {\tt To} Until Last Iteration}) ignore
 394                  the {\tt To} bit unless {\tt Count=0} \footnote{{\tt
 395                  To}=0,{\tt Ig}=1 is invalid}
 396
 397   \item [\tt Rq] ({\bf ReQueue}) if set, instructions having nonzero
 398                  count are ``Re-Queued'' rather than RePeated.  See
 399                  {\tt Count} for more detail. \footnote{ An
 400                  instruction {\it in memory} may not have {\tt
 401                  Rq=1,Count=0} (use {\tt Rq=0,Count=0})}
 402
 403   \item  [\tt Count] ({\bf Count}) {\it After} executing:
 404 \begin{verbatim}
 405 if (Count==0) {
 406     discard this instruction
 407 } else {
 408     if Count < MAX_COUNT {
 409       decrement count
 410     }
 411     if Rq=1 or Literal {
 412        put this instruction back into the instruction fifo
 413     } else {
 414        execute this instruction again
 415     }
 416 }
 417 \end{verbatim}
 418 Note how a ``standing'' instruction is encoded as {\tt Count=MAX\_COUNT}
 419
 420 \item [\tt Dest] ({\bf Data/Token Destination})
 421    Normally, this field is copied into the address portion of any
 422    outgoing packet ({\tt Do} on an output port or {\tt To} on either).
 423
 424    However, in the special case of an output port, if {\tt Do=1,To=1}, then
 425    the {\it most significant} {\tt 11} bits of the value in the {\it
 426    data register} are used as a destination address instead.
 427    %\footnote{This
 428    %functionality eliminates the need for any sort of ``{\tt Execute}''
 429    %ship, and lets a {\tt Fifo} ship act as an extension of the
 430    %instruction queue in the pump.}
 431
 432 \end{itemize}
 433
 434
 435 \pagebreak
 436 \section*{Opcodes}
 437
 438 Opcodes, as specified in \cite{am25}, are not yet implemented, but
 439 should be.
 440
 441 \section*{Bypasses}
 442
 443 Bypasses, as specified in \cite{am27}, are not yet implemented, but
 444 should be.
 445
 446 \section*{Sequencing Guarantees}
 447
 448 If two packets leave the same source and have the same path, they will
 449 arrive at their common destination {\it in the same order in which
 450   they left the source}.  This assurance is referred to as the {\it
 451   source sequence guarantee}.
 452
 453 CodeBags in memory consist of an unordered set of {\it fibers}.  A
 454 fiber is an ordered set of instructions which all share the same
 455 instruction path, and therefore will be executed by the same pump.
 456 Any code which dispatches a codebag {\it must} ensure that the
 457 instructions within a fiber reach their pump in the order in which
 458 they appear in the fiber.  This is known as the {\it instruction
 459   sequence guarantee}.  Typically the software dispatching a codebag
 460 will take advantage of the source sequence guarantee in order to
 461 fulfill the instruction sequence guarantee.
 462
 463 \section*{Code Bags}
 464
 465 Code bags may overlap in memory.  If the assembler determines that two
 466 code bags share a set of fibers, it may feel free to re-order those
 467 fibers in such a way that the code bags overlap, sharing memory words
 468 for these fibers.  The codebag descriptors for the two bags will then
 469 have different addresses and/or sizes, but the ranges which they
 470 encompass will overlap.
 471
 472 \subsection*{Dispatching Code Bags}
 473
 474 A codebag is ``dispatched'' by causing its constitutent words to
 475 arrive at the data latch of some output port, and then causing that
 476 output port's pump to execute a {\tt send} (not {\tt sendto})
 477 instruction, which is encoded as {\tt Do=1,To=1}.  Because
 478 instructions are encoded in such a way that their {\tt Instruction
 479   Path} appears in the upper 11 bits, the {\tt send} mechanism will
 480 correctly route the instruction to the pump which ought to execute it.
 481
 482 Currently the {\tt inCBD} port of the {\tt Memory} ship is ideally
 483 suited for this purpose, although a similar effect can easily be
 484 achieved using the {\tt BitFifo} ship in conjunction with the {\tt
 485   Memory} ship.  The only disadvantage to this arrangement is that the
 486 {\tt out} port must execute {\tt send} a certain number of times, and
 487 that number of times is determined by the {\tt size} field of the
 488 codebag descriptor.  Unfortunately this value is not known at compile
 489 time, so it cannot be encoded as a repeat count on the instruction at
 490 the {\tt out} port.  The typical solution is to place a standing ({\tt
 491   count=$\infty$}) {\tt send} instruction at the {\tt out} port of one
 492 {\tt Memory} ship, and reserve that ship exclusively for dispatching
 493 code bags.
 494
 495 Note that instructions are encoded relative to a particular dispatch
 496 output port.  That is, the {\tt Instruction Path} field of the
 497 instruction specifies a path to the desired execution pump -- but this
 498 path is also specified with respect to some source.  Therefore the
 499 instruction can only be correctly dispatched from that particular
 500 source.  However, it is a fairly simple matter two write a software
 501 program which can ``relocate'' a codebag by replacing the {\tt
 502   Instruction Path} fields as needed.
 503
 504 \subsection*{Tokens and Data Items}
 505
 506 When a data item is sent to a destination which expects a token, such
 507 as an output port destination, the effect will be the same as if a token
 508 had been sent to that destination.
 509
 510 When a token is sent to a destination which expects a data item, such
 511 as a pump destination or an input port destination, the effect will be
 512 the same as if a data item {\it having an undefined value} were sent
 513 to that destination.  In other words, the programmer has no completely
 514 reliable mechanism for distinguishing between tokens and data items.
 515
 516
 517 \section*{Future Directions}
 518
 519 \subsection*{The Role of the Count Field}
 520
 521 Looking back on the design of the pump, several things are now
 522 apparent which were not initially.  In particular, it seems that it
 523 could be useful to separate {\it loading the count register} from
 524 other types of instructions.  This would have a few advantages:
 525
 526 \begin{itemize}
 527 \item The size of the count field would not be a consideration in the
 528       ``instruction budget'' of normal execution instructions
 529
 530 \item It would be possible to have finitely-repeating,
 531       infinitely-requeueing instructions.  Various implementation
 532       details\footnote{in many implementations, the count field of an
 533       instruction is destroyed as it counts down; another register and
 534       mux would be needed to ``save'' this value so it could be
 535       restored prior to being requeued} make it awkward to support
 536       this unless the count register can be manipulated independently
 537       of instruction execution.
 538 \end{itemize}
 539
 540 With this in mind, we can refactor the ``essence of the pump'' into
 541 the following actions, each of which may or may not be performed by a
 542 particular instruction:
 543
 544 \begin{itemize}
 545 \item await and acknowledge a token
 546 \item await and acknowledge a datum
 547 \item load the awaited datum into the data latch
 548 \item load the a literal into the data latch
 549 \item load a value indicated by the instruction into the path latch
 550 \item load the top 11 bits of the data latch into the path latch
 551 \item treat the values in the path latch and data latch as a packet and transmit it
 552 \item treat the value in the path latch as a token and transmit it
 553 \item set the count register to a literal value
 554 \item decrement the count register
 555 \item requeue this instruction if {\tt count>0}
 556 \item requeue this instruction unconditionally
 557 \item repeat this instruction if {\tt count>0}
 558 \item repeat this instruction unconditionally
 559 \end{itemize}
 560
 561 The crucial change here is the decoupling of the act of {\it loading
 562   the count register} from the act of {\it loading the next
 563   instruction}.  It also separates the act of loading the ``path
 564 latch'' from the act of actually performing the transmission.  This
 565 latter feature makes it possible to load the data and destination
 566 latches from two distinct data items, allowing ships to create,
 567 handle, and consume {\it packets} in the form of a pair of data items.
 568
 569 At this point, it boils down to a question of instruction bit
 570 budgeting.  Currently we have a separate instruction form for
 571 literals, so that the literal can use bits which are not otherwise
 572 relevant to literal instructions (for example, {\tt Di}).  Adding an
 573 additional instruction form for loading the count register would have
 574 similar advantages.
 575
 576
 577 \subsection*{For Just a Little Bit More...}
 578
 579 Compared to early revisions of the Fleet architecture, the valves of
 580 FleetTwo are substantially more powerful.  One might ask, why not add
 581 additional power to the valves, perhaps up to the point of making them
 582 tiny von Neumann machines in their own right?  Indeed, doing so would
 583 risk reliving the {\it Wheel of Reincarnation}
 584 \cite{WheelOfReincarnation}.
 585
 586 I believe, however, that there is a fundamental distinction to be
 587 made between devices whose control flow {\it can branch} and those
 588 whose control flow {\it cannot branch}, and that this distinction is
 589 founded in the geometry of VLSI, and indeed, Euclidean space.
 590 Ultimately, the instruction stream for any component of a processor
 591 must exist in some physical incarnation -- some geometric extent.  If
 592 the control flow of a program is linear, with no branching, then the
 593 instruction stream has an optimal spatial mapping which is quite
 594 efficient: it is simply laid out in FIFO fashion.
 595
 596 By contrast, however, a program whose instruction stream has
 597 unrestricted branching is logically a {\it tree} structure.  Quite
 598 unfortunately, general trees have no efficient mapping onto
 599 two-dimensional surfaces.  By this, I mean that in every possible
 600 mapping, there exists some pair of {\it logically adjacent}
 601 instructions whose {\it physical distance} is proportional to the size
 602 of the entire program, rather than some constant factor irrespective
 603 of the program size.  Therefore I conclude that the spatial mapping of
 604 a program with unrestricted branching is in an entirely different
 605 class than that of programs with linear control flow.  I choose this
 606 distinction as the basis for limiting our valves to linear control
 607 flow sequences.  A similar argument can be made for looping constructs
 608 in which two or more loops may be nested within a single parent loop.
 609
 610 With this distinction in mind, it would not be objectionable for
 611 future incarnations of Fleet to have {\it predicated} instructions at
 612 the valves.  This might take the form of:
 613
 614 \begin{itemize}
 615 \item Instructions to unconditionally set some ``state flags''
 616 \item Instructions to conditionally set some ``state flags'' based on the contents of the data latch
 617 \item Bits which cause an instruction to be ignored or retired based on the state flags
 618 \end{itemize}
 619
 620 I feel that this addition would not risk falling down the slippery
 621 slope -- instruction streams with predicated instructions (but not
 622 unrestricted branching) preserve the crucial property that they have
 623 an efficient spatial mapping.  Any steps beyond this, however, cross a
 624 fairly fundamental line.
 625
 626
 627