doc/archman.tex

   1 \documentclass[10pt,oneside]{article}
   2 \reversemarginpar
   3 \usepackage[titles]{tocloft}
   4 \usepackage{emp}
   5 \DeclareGraphicsRule{*}{mps}{*}{}
   6
   7 \ifx\pdftexversion\undefined
   8 \usepackage[dvips]{graphicx}
   9 \DeclareGraphicsExtensions{.eps}\else
  10 \usepackage[pdftex]{graphicx}
  11 \DeclareGraphicsExtensions{.pdf,.png,.mps,.mp}
  12 \usepackage{epstopdf}
  13 \fi
  14 \usepackage{palatino}
  15 \usepackage{wrapfig}
  16 \usepackage{epsfig}
  17 \usepackage{verbatim}
  18 \usepackage{parskip}
  19 \usepackage{register}
  20 \usepackage{bytefield}
  21 \usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref}
  22 \renewcommand{\ttdefault}{cmtt}
  23 \title{The FleetTwo Architecture Manual\\{\normalsize A Programmer's View of Fleet}}
  24 \begin{document}
  25 \maketitle
  26
  27 \begin{thebibliography}{[GDG01]}
  28 \bibitem[IES02]{ies02}
  29   Sutherland, Ivan,
  30   \href{http://research.cs.berkeley.edu/class/fleet/docs/people/ivan.e.sutherland/ies02-FLEET-A.Once.Instruction.Computer.pdf}{\it UCB-IES02: Fleet -- A One Instruction Computer}
  31 \bibitem[IES14]{ies14}
  32   Sutherland, Ivan,
  33   \href{http://research.cs.berkeley.edu/class/fleet/docs/people/ivan.e.sutherland/ies14-Fleet.Definition.pdf}{\it UCB-IES14: The Fleet Definition}
  34 \bibitem[AM17]{am17}
  35   Megacz, Adam,
  36   \href{http://research.cs.berkeley.edu/class/fleet/docs/people/adam.megacz/am17-The.Choice.Ship.pdf}{\it UCB-AM17: The Choice Ship}
  37 \bibitem[IES44]{ies44}
  38   Sutherland, Ivan,
  39   \href{http://research.cs.berkeley.edu/class/fleet/docs/people/ivan.e.sutherland/ies44-Fleet.Definition.pdf}{\it UCB-IES44: The Fleet Definition}
  40 \bibitem[AM25]{am25}
  41   Megacz, Adam,
  42   \href{http://research.cs.berkeley.edu/class/fleet/docs/people/adam.megacz/am25.pdf}{\it UCB-AM25: Opcode Ports}
  43 \bibitem[AM27]{am27}
  44   Megacz, Adam,
  45   \href{http://research.cs.berkeley.edu/class/fleet/docs/people/adam.megacz/am27.pdf}{\it UCB-AM27: Bypass Paths}
  46 \bibitem[IES31]{ies31}
  47   Sutherland, Ivan,
  48   \href{http://research.cs.berkeley.edu/class/fleet/docs/people/ivan.e.sutherland/ies31-Some.SHIPs.pdf}{\it UCB-IES31: Some Ships}
  49 \bibitem[MS68]{WheelOfReincarnation}
  50   Myer, T. and Sutherland, Ivan, {\it On the Design of Display Processors}
  51   Communications of the ACM, Vol 11, No. 6, June 1968 .
  52 \end{thebibliography}
  53
  54 \vspace{1cm}
  55 \begin{abstract}
  56 This manual attempts to detail all those aspects of Fleet which are
  57 visible to a software programmer or compiler backend.
  58
  59 This document assumes that the reader is broadly familiar with the
  60 Fleet architecture and the motivation behind it.  For a far better
  61 introduction to these topics, the reader is referred to \cite{ies02}
  62 \cite{ies14} and \cite{ies44}.  Please bear in mind that many of the
  63 details in these documents have been changed, but they still serve as
  64 the authoritative introduction to the architecture.  After reviewing
  65 these introductory papers, the reader is invited to digest this
  66 architecture manual for the latest information on the details
  67 necessary to write software and compilers for Fleet.
  68 \end{abstract}
  69
  70
  71 %\vfill
  72 %\fbox{\parbox{5in}{\footnotesize
  73 %\begin{center}
  74 %The software described in this memo may be obtained using
  75 %\href{http://www.darcs.net/}{\tt darcs}, with the command:
  76 %
  77 %{\tt darcs get http://research.cs.berkeley.edu/class/fleet/repos/fleet/}
  78 %\end{center}
  79 %}}
  80
  81 \pagebreak
  82 \tableofcontents
  83
  84 \pagebreak
  85 \section{Introduction (from \cite{ies02})}
  86
  87 During the past two and a half years, the Fleet architecture has gone
  88 through massive changes and improvements.  Nonetheless, the
  89 fundamental vision remains unchanged.  As Sutherland writes in the
  90 original Berkeley Fleet description \cite{ies02},
  91
  92 {\it
  93 When computers were new, logic and storage were expensive and wires
  94 were relatively cheap.  Early computer designers avoided using logic
  95 wherever possible but were not greatly concerned with communication.
  96 They made design choices consistent with the costs of the day.  My
  97 favorite example is the jump instruction.  Early designers put jump
  98 instructions at the end of each block of code to avoid the expense of
  99 storing the address of the next block while executing the present one.
 100
 101 Today's chip fabrication methods invert the older cost balance.  In
 102 today's integrated circuits logic and memory are now almost free but
 103 communication costs dominate chip area, energy consumption and delay.
 104 In spite of these changes in the stuff from which we make computers,
 105 vestiges of the past remain in many of today's common microprocessor
 106 designs.  For example, jump instructions still appear at the end of
 107 each basic block of code in spite of the need to pre-fetch the next
 108 block while executing this one.  It seems better today to store a
 109 pointer to the next block and the length of the current block early in
 110 each block of code.
 111
 112 Instead of following the path of history, I'd like to listen carefully
 113 to what modern chip structures have to teach about how one might build
 114 a modern computer.  I see three major lessons.  First, simplicity can
 115 reduce cost.  Second, moving data will consume most of the time,
 116 energy and chip area of a modern computer. And third, the low cost of
 117 logic makes concurrency available if we can figure out how to use it.
 118
 119 \begin{itemize}
 120
 121 \item {\bf Simplicity}: The Fleet architecture seeks simplicity by treating
 122       all processing devices alike...
 123
 124 \item {\bf Communication}: The Fleet architecture seeks to control the
 125       cost of communication by putting it under direct programmer
 126       control.  Thus Fleet avoids instructions like {\sc ADD} or {\sc STORE} that
 127       include concealed communication to and from a register file...
 128
 129 \item {\bf Concurrency}: The Fleet architecture assumes concurrency
 130       nearly everywhere...
 131
 132 \item {\bf Asynchrony}: This provides enormous flexibility for
 133       implementers to improve the performance or reduce the cost of
 134       the processing devices in a Fleet system.  -- Synchronous
 135       implementations both of ships and of the switch fabric are
 136       possible provided they use validity or occupancy bits to achieve
 137       the arbitrary delays required at sources and destinations...
 138
 139 \end{itemize}
 140 }
 141
 142
 143 \pagebreak
 144 \section{The Programmer's View of The Ship-Fabric Interface}
 145
 146 The role of the Fleet switch fabric is to carry {\it packets} from
 147 {\it sources} to {\it destinations}.
 148
 149 A packet consists of a {\it path} and a {\it payload}.  The path is
 150 some string of bits which tells the switch fabric how to route from
 151 the desired source to the desired destination.  The distinction
 152 between a {\it path} and a {\it destination} is extremely important!
 153 The payload is either a {\it token} or a {\it data item}.  If the
 154 payload is a data item, it includes one machine word of data.
 155
 156 The diagram below represents a {\it programmer's} conceptual view of
 157 the interface between ships and the switch fabric.  Actual
 158 implementation circuitry may differ substantially.  Sources and
 159 destinations which can send and receive only tokens -- not data items
 160 -- are drawn as dashed lines.
 161
 162 \vspace{0.3cm}
 163 \begin{center}
 164 \epsfig{file=ports,width=4in}
 165 \end{center}
 166 \vspace{0.3cm}
 167
 168 The term {\it port} refers to an interface to the ship, the {\it
 169   valve} connecting it to the switch fabric, and the corresponding
 170 sources and destinations on the switch fabric.
 171
 172 Each valve consists of a {\it data latch}, which is as wide as a
 173 single machine word and a {\it pump}, which is a circular fifo of
 174 instruction-width latches.  The values in the instruction fifo
 175 control the data latch, as future chapters will explain.
 176
 177 Note that the pump in each valve has a destination of its own, and
 178 that there is no buffering fifo guarding this destination.  Note that
 179 all other destinations are guarded by a buffering fifo; the size of
 180 this fifo is exposed to the software programmer so she can ensure that
 181 deadlock does not occur.
 182
 183
 184 \pagebreak
 185 \section{Data Formats}
 186
 187 \subsection{Path (12 bits)}
 188
 189 These bits appear physically within the switch fabric. \footnote{In
 190   the Sun Labs implementation, these bits have ``address bit
 191   timing.''}  The {\tt T} bit is the ``tokenhood'' bit; if set, this
 192 packet represents a token\footnote{In the Sun Labs
 193   implementation, this bit inhibits the switch fabric data latches
 194   from firing.}
 195
 196 {\tt\footnotesize
 197 \begin{bytefield}{49}
 198   \bitheader[b]{37,47,48}\\
 199   \bitbox{1}{T}
 200   \bitbox{11}{Path}
 201   \bitbox[l]{37}{}
 202 \end{bytefield}
 203 }
 204
 205 \subsection{Data Word In Memory (37 bits)}
 206
 207 A word of data is 37 bits wide.
 208
 209 {\tt\footnotesize
 210 \begin{bytefield}{49}
 211   \bitheader[b]{0,36}\\
 212   \bitbox[r]{12}{}
 213   \bitbox{37}{Data Word}
 214 \end{bytefield}
 215 }
 216
 217 \subsection{Packet In Flight (49 bits)}
 218
 219 {\tt\footnotesize
 220 \begin{bytefield}{49}
 221   \bitheader[b]{0,36,37,47,48}\\
 222   \bitbox{1}{T}
 223   \bitbox{11}{Path}
 224   \bitbox{37}{Data Word}
 225 \end{bytefield}
 226 }
 227
 228 \subsection{Instruction In Memory (37 bits)}
 229
 230 An instruction must be no wider than a memory word.  The next section
 231 explains the bits in greater detail.  The {\tt Instruction Path} is
 232 the path which the instruction itself will take in order to arrive at
 233 the desired pump destination.  The {\it Data/Token Path} is placed on
 234 any packet inserted into the switch fabric as a result of executing
 235 the instruction.
 236
 237 {\tt\tiny
 238 \begin{bytefield}{49}
 239   \bitheader[b]{0,10,11,17,18-26,36}\\
 240   \bitbox[r]{12}{}
 241   \bitbox{11}{Instruction Path}
 242   \bitbox{1}{1}
 243   \bitbox{1}{Ti}
 244   \bitbox{1}{Di}
 245   \bitbox{1}{Dc}
 246   \bitbox{1}{Do}
 247   \bitbox{1}{To}
 248   \bitbox{1}{Ig}
 249   \bitbox{1}{Rq}
 250   \bitbox{11}{Data/Token Path}
 251   \bitbox{7}{Count}
 252 \end{bytefield}
 253 }
 254
 255 \subsection{Instruction Packet In Flight (49 bits)}
 256
 257 Note that Fleet simply copies the {\tt Instruction Path} field, bit
 258 for bit, into the packet {\tt Path} field in order to ``dispatch'' an
 259 instruction.
 260
 261 {\tt\tiny
 262 \begin{bytefield}{49}
 263   \bitheader[b]{0,10,11,17,18-25,37,47,48}\\
 264   \bitbox[r]{1}{0}
 265   \bitbox{11}{Instruction Path}
 266   \bitbox{11}{Instruction Path}
 267   \bitbox{1}{1}
 268   \bitbox{1}{Ti}
 269   \bitbox{1}{Di}
 270   \bitbox{1}{Dc}
 271   \bitbox{1}{Do}
 272   \bitbox{1}{To}
 273   \bitbox{1}{Ig}
 274   \bitbox{1}{Rq}
 275   \bitbox{11}{Data/Token Path}
 276   \bitbox{7}{Count}
 277 \end{bytefield}
 278 }
 279
 280 \setlength{\bitwidth}{5mm}
 281 \pagebreak
 282
 283 \section{Instruction Formats}
 284 \begin{wrapfigure}{R}{2in}
 285 \epsfig{file=locations,width=2in}
 286 \vspace{-0.4in}
 287 \end{wrapfigure}
 288 Within the pump's instruction fifo, there are two points of particular
 289 interest.  The first point is the {\it insertion point}, which is the
 290 point in the ring where new instructions enter from the switch fabric.
 291 The other point of interest is the {\it execution point}, which is the
 292 point in the ring where instructions other than {\tt kill} and {\tt
 293   unclog} are performed.
 294
 295 The programmer may assume that the insertion point is the immediate
 296 successor of the execution point.
 297
 298 \vspace{0.3cm}
 299 {\bf Kill}
 300 \addcontentsline{toc}{subsection}{Kill}
 301
 302 {\tt
 303 \begin{bytefield}{26}
 304   \bitheader[b]{0,6,7,20-25}\\
 305   \bitbox{1}{0}
 306   \bitbox{1}{0}
 307   \bitbox{1}{1}
 308   \bitbox{1}{0}
 309   \bitbox{1}{1}
 310   \bitbox{14}{}
 311   \bitbox{7}{Count}
 312 \end{bytefield}}
 313 When a {\tt kill} instruction reaches the insertion point, it will
 314 wait there for a non-{\tt clog} instruction to reach the execution
 315 point.  When this occurs, the instruction at the execution point is
 316 retired and the count field of the {\tt kill} instruction is
 317 decremented.  If the {\tt kill} instruction's count was {\tt 0} before
 318 decrementing, the {\tt kill} instruction is retired.  The programmer
 319 is assured that a kill instruction with a count of $n$ will kill $n$
 320 {\it consecutive} instructions.
 321
 322 \vspace{0.3cm}
 323 {\bf Clog}
 324 \addcontentsline{toc}{subsection}{Clog}
 325
 326 {\tt
 327 \begin{bytefield}{26}
 328   \bitheader[b]{0,20-25}\\
 329   \bitbox{1}{0}
 330   \bitbox{1}{0}
 331   \bitbox{1}{1}
 332   \bitbox{1}{0}
 333   \bitbox{1}{0}
 334   \bitbox{21}{}
 335 \end{bytefield}}
 336 When a {\tt clog} instruction reaches the execution point, no more
 337 instructions will be executed until an {\tt unclog} is performed.
 338
 339 \vspace{0.3cm}
 340 {\bf UnClog}
 341 \addcontentsline{toc}{subsection}{UnClog}
 342
 343 {\tt
 344 \begin{bytefield}{26}
 345   \bitheader[b]{0,20-25}\\
 346   \bitbox{1}{0}
 347   \bitbox{1}{0}
 348   \bitbox{1}{0}
 349   \bitbox{1}{1}
 350   \bitbox{1}{0}
 351   \bitbox{21}{}
 352 \end{bytefield}}
 353 When an {\tt unclog} instruction reaches the insertion point, it will wait
 354 there until a {\tt clog} instruction is at the execution point.  When
 355 this occurs, both instructions retire.
 356
 357 \vspace{0.3cm}
 358 {\bf Literal}
 359 \addcontentsline{toc}{subsection}{Literal}
 360
 361 {\tt
 362 \begin{bytefield}{26}
 363   \bitheader[b]{0,6,7,23-25}\\
 364   \bitbox{1}{0}
 365   \bitbox{1}{1}
 366   \bitbox{17}{Literal}
 367   \bitbox{7}{Count}
 368 \end{bytefield}}
 369 When a literal instruction reaches the execution point, its {\tt
 370   Literal} field is sign-extended to a full word length and captured
 371 in the data latch.
 372
 373
 374 \pagebreak
 375 {\bf Normal Instruction}
 376 \addcontentsline{toc}{subsection}{Normal Instruction}
 377
 378 {\tt
 379 \begin{bytefield}{26}
 380   \bitheader[b]{0,6,7,16-25}\\
 381   \bitbox{1}{1}
 382   \bitbox{1}{Ti}
 383   \bitbox{1}{Di}
 384   \bitbox{1}{Dc}
 385   \bitbox{1}{Do}
 386   \bitbox{1}{To}
 387   \bitbox{1}{Ig}
 388   \bitbox{1}{Rq}
 389   \bitbox{11}{Dest}
 390   \bitbox{7}{Count}
 391 \end{bytefield}
 392 }
 393
 394 \begin{itemize}
 395
 396   \item [\tt Ti] ({\bf Token Input}) wait for a token and acknowledge
 397                  it\footnote{{\tt Ti}=1,{\tt Di}=1 is invalid on input port.}
 398
 399   \item [\tt Di] ({\bf Data Input}) wait for a datum and acknowledge it.
 400
 401   \item [\tt Dc] ({\bf Data Capture}) capture (latch) the accepted
 402                  datum in the data latch.  This bit is ignored if the incoming packet is
 403                  a token. \footnote{ Note that {\tt Di}=0,{\tt Dc}=1
 404                  is meaningless and therefore reserved for other
 405                  uses.}
 406
 407   \item [\tt Do] ({\bf Data Output}) emit a datum.
 408
 409   \item [\tt To] ({\bf Token Output}) emit a token.\footnote{ {\tt To}=1,{\tt
 410                  Do}=1 have special meaning on an output port.}
 411
 412   \item [\tt Ig] ({\bf Ignore {\tt To} Until Last Iteration}) ignore
 413                  the {\tt To} bit unless {\tt Count=0} \footnote{{\tt
 414                  To}=0,{\tt Ig}=1 is invalid}
 415
 416   \item [\tt Rq] ({\bf ReQueue}) if set, instructions having nonzero
 417                  count are ``Re-Queued'' rather than RePeated.  See
 418                  {\tt Count} for more detail. \footnote{ An
 419                  instruction {\it in memory} may not have {\tt
 420                  Rq=1,Count=0} (use {\tt Rq=0,Count=0})}
 421
 422   \item  [\tt Count] ({\bf Count}) {\it After} executing:
 423 \begin{verbatim}
 424 if (Count==0) {
 425     discard this instruction
 426 } else {
 427     if Count < MAX_COUNT {
 428       decrement count
 429     }
 430     if Rq=1 or Literal {
 431        put this instruction back into the instruction fifo
 432     } else {
 433        execute this instruction again
 434     }
 435 }
 436 \end{verbatim}
 437 Note how a ``standing'' instruction is encoded as {\tt Count=MAX\_COUNT}
 438
 439 \item [\tt Dest] ({\bf Data/Token Destination})
 440    Normally, this field is copied into the address portion of any
 441    outgoing packet ({\tt Do} on an output port or {\tt To} on either).
 442
 443    However, in the special case of an output port, if {\tt Do=1,To=1}, then
 444    the {\it most significant} {\tt 11} bits of the value in the {\it
 445    data register} are used as a destination address instead.
 446    %\footnote{This
 447    %functionality eliminates the need for any sort of ``{\tt Execute}''
 448    %ship, and lets a {\tt Fifo} ship act as an extension of the
 449    %instruction queue in the pump.}
 450
 451 \end{itemize}
 452
 453
 454 \pagebreak
 455 \section{Opcodes}
 456
 457 Opcodes, as specified in \cite{am25}, are not yet implemented, but
 458 should be.
 459
 460 \section{Bypasses}
 461
 462 Bypasses, as specified in \cite{am27}, are not yet implemented, but
 463 should be.
 464
 465 \section{Sequencing Guarantees}
 466
 467 \addcontentsline{toc}{subsection}{Source Sequence Guarantee}
 468 If two packets leave the same source and have the same path, they will
 469 arrive at their common destination {\it in the same order in which
 470   they left the source}.  This assurance is referred to as the {\it
 471   source sequence guarantee}.
 472
 473 \addcontentsline{toc}{subsection}{Instruction Sequence Guarantee}
 474 CodeBags in memory consist of an unordered set of {\it fibers}.  A
 475 fiber is an ordered set of instructions which all share the same
 476 instruction path, and therefore will be executed by the same pump.
 477 Any code which dispatches a codebag {\it must} ensure that the
 478 instructions within a fiber reach their pump in the order in which
 479 they appear in the fiber.  This is known as the {\it instruction
 480   sequence guarantee}.  Typically the software dispatching a codebag
 481 will take advantage of the source sequence guarantee in order to
 482 fulfill the instruction sequence guarantee.
 483
 484 \section{Code Bags}
 485
 486 Code bags may overlap in memory.  If the assembler determines that two
 487 code bags share a set of fibers, it may feel free to re-order those
 488 fibers in such a way that the code bags overlap, sharing memory words
 489 for these fibers.  The codebag descriptors for the two bags will then
 490 have different addresses and/or sizes, but the ranges which they
 491 encompass will overlap.
 492
 493 \subsection{Dispatching Code Bags}
 494
 495 A codebag is ``dispatched'' by causing its constituent words to
 496 arrive at the data latch of some output port, and then causing that
 497 output port's pump to execute a {\tt send} (not {\tt sendto})
 498 instruction, which is encoded as {\tt Do=1,To=1}.  Because
 499 instructions are encoded in such a way that their {\tt Instruction
 500   Path} appears in the upper 11 bits, the {\tt send} mechanism will
 501 correctly route the instruction to the pump which ought to execute it.
 502
 503 Currently the {\tt inCBD} port of the {\tt Memory} ship is ideally
 504 suited for this purpose, although a similar effect can easily be
 505 achieved using the {\tt BitFifo} ship in conjunction with the {\tt
 506   Memory} ship.  The only disadvantage to this arrangement is that the
 507 {\tt out} port must execute {\tt send} a certain number of times, and
 508 that number of times is determined by the {\tt size} field of the
 509 codebag descriptor.  Unfortunately this value is not known at compile
 510 time, so it cannot be encoded as a repeat count on the instruction at
 511 the {\tt out} port.  The typical solution is to place a standing ({\tt
 512   count=$\infty$}) {\tt send} instruction at the {\tt out} port of one
 513 {\tt Memory} ship, and reserve that ship exclusively for dispatching
 514 code bags.
 515
 516 Note that instructions are encoded relative to a particular dispatch
 517 output port.  That is, the {\tt Instruction Path} field of the
 518 instruction specifies a path to the desired execution pump -- but this
 519 path is also specified with respect to some source.  Therefore the
 520 instruction can only be correctly dispatched from that particular
 521 source.  However, it is a fairly simple matter two write a software
 522 program which can ``relocate'' a codebag by replacing the {\tt
 523   Instruction Path} fields as needed.
 524
 525 \subsection{Tokens and Data Items}
 526
 527 When a data item is sent to a destination which expects a token, such
 528 as an output port destination, the effect will be the same as if a token
 529 had been sent to that destination.
 530
 531 When a token is sent to a destination which expects a data item, such
 532 as a pump destination or an input port destination, the effect will be
 533 the same as if a data item {\it having an undefined value} were sent
 534 to that destination.  In other words, the programmer has no completely
 535 reliable mechanism for distinguishing between tokens and data items.
 536
 537
 538 \section{Future Directions}
 539
 540 \subsection{The Role of the Count Field}
 541
 542 Looking back on the design of the pump, several things are now
 543 apparent which were not initially.  In particular, it seems that it
 544 could be useful to separate {\it loading the count register} from
 545 other types of instructions.  This would have a few advantages:
 546
 547 \begin{itemize}
 548 \item The size of the count field would not be a consideration in the
 549       ``instruction budget'' of normal execution instructions
 550
 551 \item It would be possible to have finitely-repeating,
 552       infinitely-requeueing instructions.  Various implementation
 553       details\footnote{in many implementations, the count field of an
 554       instruction is destroyed as it counts down; another register and
 555       mux would be needed to ``save'' this value so it could be
 556       restored prior to being requeued} make it awkward to support
 557       this unless the count register can be manipulated independently
 558       of instruction execution.
 559 \end{itemize}
 560
 561 With this in mind, we can refactor the ``essence of the pump'' into
 562 the following actions, each of which may or may not be performed by a
 563 particular instruction:
 564
 565 \begin{itemize}
 566 \item await and acknowledge a token
 567 \item await and acknowledge a datum
 568 \item load the awaited datum into the path latch (top 11 bits)
 569 \item load the awaited datum into the data latch
 570 \item load a literal into the data latch
 571 \item load a literal into the path latch
 572 \item send the value in the data latch along the path in the path latch
 573 \item send a token along the path in the path latch
 574 \item set the count register to a literal value
 575 \item decrement the count register
 576 \item requeue this instruction if {\tt count>0}
 577 \item requeue this instruction unconditionally
 578 \item repeat this instruction if {\tt count>0}
 579 \item repeat this instruction unconditionally
 580 \end{itemize}
 581
 582 The crucial change here is the decoupling of the act of {\it loading
 583   the count register} from the act of {\it loading the next
 584   instruction}.  It also separates the act of loading the ``path
 585 latch'' from the act of actually performing the transmission.  This
 586 latter feature makes it possible to load the data and destination
 587 latches from two distinct data items, allowing ships to create,
 588 handle, and consume {\it packets} in the form of a pair of data items.
 589
 590 At this point, the remaining decisions are simply a question of
 591 instruction bit budgeting.  Currently we have a separate instruction
 592 form for literals, so that the literal can use bits which are not
 593 otherwise relevant to literal instructions (for example, {\tt Di}).
 594 Adding an additional instruction form for loading the count register
 595 would have similar advantages.
 596
 597
 598 \subsection{For Just a Little Bit More...}
 599
 600 Compared to early revisions of the Fleet architecture, the valves of
 601 FleetTwo are substantially more powerful.  One might ask, why not add
 602 additional power to the valves, perhaps up to the point of making them
 603 tiny von Neumann machines in their own right?  Indeed, doing so would
 604 risk reliving the {\it Wheel of Reincarnation}
 605 \cite{WheelOfReincarnation}.
 606
 607 I believe, however, that there is a fundamental distinction to be
 608 made between devices whose control flow {\it can branch} and those
 609 whose control flow {\it cannot branch}, and that this distinction is
 610 founded in the geometry of VLSI, and indeed, Euclidean space.
 611 Ultimately, the instruction stream for any component of a processor
 612 must exist in some physical incarnation -- some geometric extent.  If
 613 the control flow of a program is linear, with no branching, then the
 614 instruction stream has an optimal spatial mapping which is
 615 efficient: simply lay it out as a FIFO.
 616
 617 By contrast, however, a program whose instruction stream has
 618 unrestricted branching is logically a {\it tree} structure.
 619 Unfortunately, however, general trees have no efficient mapping onto
 620 two-dimensional surfaces.  By this, I mean that in every possible
 621 mapping, there exists some pair of {\it logically adjacent}
 622 instructions whose {\it physical distance} is proportional to the size
 623 of the entire program, rather than some constant factor irrespective
 624 of the program size.  Therefore I conclude that the spatial mapping of
 625 a program with unrestricted branching is in an entirely different
 626 class than that of programs with linear control flow, and I choose this
 627 distinction as the basis for limiting the valves to linear control
 628 flow sequences.  A similar argument can be made for looping constructs
 629 in which two or more loops may be nested within a single parent loop.
 630
 631 With this distinction in mind, it would not be objectionable for
 632 future incarnations of Fleet to have {\it predicated} instructions at
 633 the valves.  This might take the form of:
 634
 635 \begin{itemize}
 636 \item Instructions to unconditionally set some ``state flags''
 637 \item Instructions to conditionally set some ``state flags'' based on the contents of the data latch
 638 \item Bits which cause an instruction to be ignored or retired based on the state flags
 639 \end{itemize}
 640
 641 I feel that this addition would not risk falling down the slippery
 642 slope -- instruction streams with predicated instructions (but not
 643 unrestricted branching or nested looping) preserve the crucial
 644 property that they have an efficient spatial mapping.  Any steps
 645 beyond this, however, cross a fairly fundamental line.
 646
 647
 648 \addcontentsline{toc}{section}{Ship Descriptions}
 649
 650