doc/archman.tex

   1 \documentclass[10pt,oneside]{article}
   2 \reversemarginpar
   3 \usepackage[titles]{tocloft}
   4 \usepackage{emp}
   5 \usepackage{amsmath}
   6 \DeclareGraphicsRule{*}{mps}{*}{}
   7
   8 \newcommand{\footnoteremember}[2]{
   9   \footnote{#2}
  10   \newcounter{#1}
  11   \setcounter{#1}{\value{footnote}}
  12 } \newcommand{\footnoterecall}[1]{
  13   \footnotemark[\value{#1}]
  14 }
  15
  16 \ifx\pdftexversion\undefined
  17 \usepackage[dvips]{graphicx}
  18 \DeclareGraphicsExtensions{.eps}\else
  19 \usepackage[pdftex]{graphicx}
  20 \DeclareGraphicsExtensions{.pdf,.png,.mps,.mp}
  21 \usepackage{epstopdf}
  22 \fi
  23 \usepackage{palatino}
  24 \usepackage{wrapfig}
  25 \usepackage{epsfig}
  26 \usepackage{verbatim}
  27 \usepackage{parskip}
  28 \usepackage{register}
  29 \usepackage{bytefield}
  30 \usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref}
  31 \renewcommand{\ttdefault}{cmtt}
  32 \title{The FleetTwo Architecture Manual\\{\normalsize A Programmer's View of FleetTwo}}
  33 \begin{document}
  34 \maketitle
  35
  36 \begin{thebibliography}{[GDG01]}
  37 \bibitem[IES02]{ies02}
  38   Sutherland, Ivan,
  39   \href{http://research.cs.berkeley.edu/class/fleet/docs/people/ivan.e.sutherland/ies02-FLEET-A.Once.Instruction.Computer.pdf}{\it UCB-IES02: Fleet -- A One Instruction Computer}
  40 \bibitem[IES14]{ies14}
  41   Sutherland, Ivan,
  42   \href{http://research.cs.berkeley.edu/class/fleet/docs/people/ivan.e.sutherland/ies14-Fleet.Definition.pdf}{\it UCB-IES14: The Fleet Definition}
  43 \bibitem[AM17]{am17}
  44   Megacz, Adam,
  45   \href{http://research.cs.berkeley.edu/class/fleet/docs/people/adam.megacz/am17-The.Choice.Ship.pdf}{\it UCB-AM17: The Choice Ship}
  46 \bibitem[IES44]{ies44}
  47   Sutherland, Ivan,
  48   \href{http://research.cs.berkeley.edu/class/fleet/docs/people/ivan.e.sutherland/ies44-Fleet.Definition.pdf}{\it UCB-IES44: The Fleet Definition}
  49 \bibitem[AM25]{am25}
  50   Megacz, Adam,
  51   \href{http://research.cs.berkeley.edu/class/fleet/docs/people/adam.megacz/am25.pdf}{\it UCB-AM25: Opcode Ports}
  52 \bibitem[AM27]{am27}
  53   Megacz, Adam,
  54   \href{http://research.cs.berkeley.edu/class/fleet/docs/people/adam.megacz/am27.pdf}{\it UCB-AM27: Bypass Paths}
  55 \bibitem[IES31]{ies31}
  56   Sutherland, Ivan,
  57   \href{http://research.cs.berkeley.edu/class/fleet/docs/people/ivan.e.sutherland/ies31-Some.SHIPs.pdf}{\it UCB-IES31: Some Ships}
  58 \bibitem[MS68]{WheelOfReincarnation}
  59   Myer, T. and Sutherland, Ivan, {\it On the Design of Display Processors}
  60   Communications of the ACM, Vol 11, No. 6, June 1968 .
  61 \end{thebibliography}
  62
  63 \vspace{1cm}
  64 \begin{abstract}
  65 This manual attempts to detail all those aspects of Fleet which are
  66 visible to a software programmer or compiler backend.
  67
  68 This document assumes that the reader is broadly familiar with the
  69 Fleet architecture and the motivation behind it.  For a far better
  70 introduction to these topics, the reader is referred to \cite{ies02}
  71 \cite{ies14} and \cite{ies44}.  Please bear in mind that many of the
  72 details in these documents have been changed, but they still serve as
  73 the authoritative introduction to the architecture.  After reviewing
  74 these introductory papers, the reader is invited to digest this
  75 architecture manual for the latest information on the details
  76 necessary to write software and compilers for Fleet.
  77 \end{abstract}
  78
  79
  80 %\vfill
  81 %\fbox{\parbox{5in}{\footnotesize
  82 %\begin{center}
  83 %The software described in this memo may be obtained using
  84 %\href{http://www.darcs.net/}{\tt darcs}, with the command:
  85 %
  86 %{\tt darcs get http://research.cs.berkeley.edu/class/fleet/repos/fleet/}
  87 %\end{center}
  88 %}}
  89
  90 \pagebreak
  91 \tableofcontents
  92
  93 \pagebreak
  94 \section{Introduction (from \cite{ies02})}
  95
  96 During the past two and a half years, the Fleet architecture has gone
  97 through massive changes and improvements.  Nonetheless, the
  98 fundamental vision remains unchanged.  As Sutherland writes in the
  99 original Berkeley Fleet description \cite{ies02},
 100
 101 {\it
 102 When computers were new, logic and storage were expensive and wires
 103 were relatively cheap.  Early computer designers avoided using logic
 104 wherever possible but were not greatly concerned with communication.
 105 They made design choices consistent with the costs of the day.  My
 106 favorite example is the jump instruction.  Early designers put jump
 107 instructions at the end of each block of code to avoid the expense of
 108 storing the address of the next block while executing the present one.
 109
 110 Today's chip fabrication methods invert the older cost balance.  In
 111 today's integrated circuits logic and memory are now almost free but
 112 communication costs dominate chip area, energy consumption and delay.
 113 In spite of these changes in the stuff from which we make computers,
 114 vestiges of the past remain in many of today's common microprocessor
 115 designs.  For example, jump instructions still appear at the end of
 116 each basic block of code in spite of the need to pre-fetch the next
 117 block while executing this one.  It seems better today to store a
 118 pointer to the next block and the length of the current block early in
 119 each block of code.
 120
 121 Instead of following the path of history, I'd like to listen carefully
 122 to what modern chip structures have to teach about how one might build
 123 a modern computer.  I see three major lessons.  First, simplicity can
 124 reduce cost.  Second, moving data will consume most of the time,
 125 energy and chip area of a modern computer. And third, the low cost of
 126 logic makes concurrency available if we can figure out how to use it.
 127
 128 \begin{itemize}
 129
 130 \item {\bf Simplicity}: The Fleet architecture seeks simplicity by treating
 131       all processing devices alike...
 132
 133 \item {\bf Communication}: The Fleet architecture seeks to control the
 134       cost of communication by putting it under direct programmer
 135       control.  Thus Fleet avoids instructions like {\sc ADD} or {\sc STORE} that
 136       include concealed communication to and from a register file...
 137
 138 \item {\bf Concurrency}: The Fleet architecture assumes concurrency
 139       nearly everywhere...
 140
 141 \item {\bf Asynchrony}: This provides enormous flexibility for
 142       implementers to improve the performance or reduce the cost of
 143       the processing devices in a Fleet system.  -- Synchronous
 144       implementations both of ships and of the switch fabric are
 145       possible provided they use validity or occupancy bits to achieve
 146       the arbitrary delays required at sources and destinations...
 147
 148 \end{itemize}
 149 }
 150
 151
 152 \pagebreak
 153 \section{The Programmer's View of The Ship-Fabric Interface}
 154
 155 The role of the Fleet switch fabric is to carry {\it packets} from
 156 {\it sources} to {\it destinations}.
 157
 158 A packet consists of a {\it path} and a {\it payload}.  The path is
 159 some string of bits which tells the switch fabric how to route from
 160 the desired source to the desired destination.  The distinction
 161 between a {\it path} and a {\it destination} is extremely important!
 162 The payload is either a {\it token} or a {\it data item}.  If the
 163 payload is a data item, it includes one machine word of data.
 164
 165 The diagram below represents a {\it programmer's} conceptual view of
 166 the interface between ships and the switch fabric.  Actual
 167 implementation circuitry may differ substantially.  Sources and
 168 destinations which can send and receive only tokens -- not data items
 169 -- are drawn as dashed lines.
 170
 171 \vspace{0.3cm}
 172 \begin{center}
 173 \epsfig{file=ports,width=4in}
 174 \end{center}
 175 \vspace{0.3cm}
 176
 177 The term {\it port} refers to an interface to the ship, the {\it
 178   dock} connecting it to the switch fabric, and the corresponding
 179 sources and destinations on the switch fabric.
 180
 181 Each dock consists of a {\it data latch}, which is as wide as a
 182 single machine word and a {\it pump}, which is a circular fifo of
 183 instruction-width latches.  The values in the instruction fifo
 184 control the data latch, as future chapters will explain.
 185
 186 Note that the pump in each dock has a destination of its own, and
 187 that there is no buffering fifo guarding this destination.  Note that
 188 all other destinations are guarded by a buffering fifo; the size of
 189 this fifo is exposed to the software programmer so she can avoid
 190 deadlock.
 191
 192
 193 \pagebreak
 194 \section{Data Formats}
 195
 196 \subsection{Path (12 bits)}
 197
 198 These bits appear physically within the switch fabric. \footnote{In
 199   the Sun Labs implementation, these bits have ``address bit
 200   timing.''}  The {\tt T} bit is the ``tokenhood'' bit; if set, this
 201 packet represents a token\footnote{In the Sun Labs
 202   implementation, this bit inhibits the switch fabric data latches
 203   from firing.}
 204
 205 {\tt\footnotesize
 206 \begin{bytefield}{49}
 207   \bitheader[b]{37,47,48}\\
 208   \bitbox{1}{T}
 209   \bitbox{11}{Path}
 210   \bitbox[l]{37}{}
 211 \end{bytefield}
 212 }
 213
 214 \subsection{Data Word In Memory (37 bits)}
 215
 216 A word of data is 37 bits wide.
 217
 218 {\tt\footnotesize
 219 \begin{bytefield}{49}
 220   \bitheader[b]{0,36}\\
 221   \bitbox[r]{12}{}
 222   \bitbox{37}{Data Word}
 223 \end{bytefield}
 224 }
 225
 226 \subsection{Packet In Flight (49 bits)}
 227
 228 {\tt\footnotesize
 229 \begin{bytefield}{49}
 230   \bitheader[b]{0,36,37,47,48}\\
 231   \bitbox{1}{T}
 232   \bitbox{11}{Path}
 233   \bitbox{37}{Data Word}
 234 \end{bytefield}
 235 }
 236
 237 \subsection{Instruction In Memory (37 bits)}
 238
 239 An instruction must be no wider than a memory word.  The next section
 240 explains the bits in greater detail.  The {\tt Instruction Path} is
 241 the path which the instruction itself will take in order to arrive at
 242 the desired pump destination.  The {\it Data/Token Path} is placed on
 243 any packet inserted into the switch fabric as a result of executing
 244 the instruction.
 245
 246 {\tt\tiny
 247 \begin{bytefield}{49}
 248   \bitheader[b]{0,6,7,17,18-26,36}\\
 249   \bitbox[r]{12}{}
 250   \bitbox{11}{Instruction Path}
 251   \bitbox{1}{1}
 252   \bitbox{1}{Ti}
 253   \bitbox{1}{Di}
 254   \bitbox{1}{Dc}
 255   \bitbox{1}{Do}
 256   \bitbox{1}{To}
 257   \bitbox{1}{Ig}
 258   \bitbox{1}{Rq}
 259   \bitbox{11}{Data/Token Path}
 260   \bitbox{7}{Count}
 261 \end{bytefield}
 262 }
 263
 264 \subsection{Instruction Packet In Flight (49 bits)}
 265
 266 Note that Fleet simply copies the {\tt Instruction Path} field, bit
 267 for bit, into the packet {\tt Path} field in order to ``dispatch'' an
 268 instruction.
 269
 270 {\tt\tiny
 271 \begin{bytefield}{49}
 272   \bitheader[b]{0,6,7,17,18-25,37,47,48}\\
 273   \bitbox[r]{1}{0}
 274   \bitbox{11}{Instruction Path}
 275   \bitbox{11}{Instruction Path}
 276   \bitbox{1}{1}
 277   \bitbox{1}{Ti}
 278   \bitbox{1}{Di}
 279   \bitbox{1}{Dc}
 280   \bitbox{1}{Do}
 281   \bitbox{1}{To}
 282   \bitbox{1}{Ig}
 283   \bitbox{1}{Rq}
 284   \bitbox{11}{Data/Token Path}
 285   \bitbox{7}{Count}
 286 \end{bytefield}
 287 }
 288
 289 \setlength{\bitwidth}{5mm}
 290 \pagebreak
 291
 292 \section{Instruction Formats}
 293 \begin{wrapfigure}{R}{2in}
 294 \epsfig{file=locations,width=2in}
 295 \vspace{-0.4in}
 296 \end{wrapfigure}
 297 Within the pump's instruction fifo, there are two points of particular
 298 interest.  The first point is the {\it insertion point}, which is the
 299 point in the ring where new instructions enter from the switch fabric.
 300 The other point of interest is the {\it execution point}, which is the
 301 point in the ring where instructions other than {\tt kill} and {\tt
 302   unclog} are performed.
 303
 304 The programmer may assume that the insertion point is the immediate
 305 successor of the execution point.
 306
 307 \vspace{0.3cm}
 308 {\bf Kill}
 309 \addcontentsline{toc}{subsection}{Kill}
 310
 311 {\tt
 312 \begin{bytefield}{26}
 313   \bitheader[b]{0,6,7,20-25}\\
 314   \bitbox{1}{0}
 315   \bitbox{1}{0}
 316   \bitbox{1}{1}
 317   \bitbox{1}{0}
 318   \bitbox{1}{1}
 319   \bitbox{14}{don't care}
 320   \bitbox{7}{Count}
 321 \end{bytefield}}
 322 When a {\tt kill} instruction reaches the insertion point, it will
 323 wait there for a non-{\tt clog} instruction to reach the execution
 324 point.  When this occurs, the instruction at the execution point is
 325 retired and the count field of the {\tt kill} instruction is
 326 decremented.  If the {\tt kill} instruction's count was {\tt 0} before
 327 decrementing, the {\tt kill} instruction is retired.  The programmer
 328 is assured that a kill instruction with a count of $n$ will kill $n+1$
 329 {\it consecutive} instructions.
 330
 331 \vspace{0.3cm}
 332 {\bf Clog}
 333 \addcontentsline{toc}{subsection}{Clog}
 334
 335 {\tt
 336 \begin{bytefield}{26}
 337   \bitheader[b]{0,20-25}\\
 338   \bitbox{1}{0}
 339   \bitbox{1}{0}
 340   \bitbox{1}{1}
 341   \bitbox{1}{0}
 342   \bitbox{1}{0}
 343   \bitbox{21}{don't care}
 344 \end{bytefield}}
 345 When a {\tt clog} instruction reaches the execution point, it remains
 346 there and no more instructions will be executed until an {\tt unclog}
 347 is performed.
 348
 349 \vspace{0.3cm}
 350 {\bf UnClog}
 351 \addcontentsline{toc}{subsection}{UnClog}
 352
 353 {\tt
 354 \begin{bytefield}{26}
 355   \bitheader[b]{0,20-25}\\
 356   \bitbox{1}{0}
 357   \bitbox{1}{0}
 358   \bitbox{1}{0}
 359   \bitbox{1}{1}
 360   \bitbox{1}{0}
 361   \bitbox{21}{don't care}
 362 \end{bytefield}}
 363 When an {\tt unclog} instruction reaches the insertion point, it will wait
 364 there until a {\tt clog} instruction is at the execution point.  When
 365 this occurs, both instructions retire.
 366
 367 \vspace{0.3cm}
 368 {\bf Literal}
 369 \addcontentsline{toc}{subsection}{Literal}
 370
 371 {\tt
 372 \begin{bytefield}{26}
 373   \bitheader[b]{0,6,7,23-25}\\
 374   \bitbox{1}{0}
 375   \bitbox{1}{1}
 376   \bitbox{17}{Literal}
 377   \bitbox{7}{Count}
 378 \end{bytefield}}
 379 When a literal instruction reaches the execution point, its {\tt
 380   Literal} field is sign-extended to a full word length and captured
 381 in the data latch.
 382
 383
 384 \pagebreak
 385 {\bf Normal Instruction}
 386 \addcontentsline{toc}{subsection}{Normal Instruction}
 387
 388 {\tt
 389 \begin{bytefield}{26}
 390   \bitheader[b]{0,6,7,16-25}\\
 391   \bitbox{1}{1}
 392   \bitbox{1}{Ti}
 393   \bitbox{1}{Di}
 394   \bitbox{1}{Dc}
 395   \bitbox{1}{Do}
 396   \bitbox{1}{To}
 397   \bitbox{1}{Ig}
 398   \bitbox{1}{Rq}
 399   \bitbox{11}{Dest}
 400   \bitbox{7}{Count}
 401 \end{bytefield}
 402 }
 403
 404 \footnoteremember{tidi}{{\tt Ti}=1,{\tt Di}=1 is invalid on input port.}
 405 \footnoteremember{didc}{Note that {\tt Di}=0,{\tt Dc}=1
 406                  is meaningless and therefore reserved for other
 407                  uses.}
 408 \footnoteremember{todo}{ {\tt To}=1,{\tt Do}=1 have special meaning on an output port.}
 409 \footnoteremember{toig}{{\tt To}=0,{\tt Ig}=1 is invalid}
 410 \begin{itemize}
 411   \item [\tt Ti] ({\bf Token Input}) wait for a token and acknowledge it.
 412
 413   \item [\tt Di] ({\bf Data Input}) wait for a datum and acknowledge it.
 414                  \footnoterecall{tidi}\footnoterecall{didc}
 415
 416   \item [\tt Dc] ({\bf Data Capture}) capture (latch) the accepted
 417                  datum in the data latch.  This bit is ignored if the incoming packet is
 418                  a token. \footnoterecall{didc}
 419
 420   \item [\tt Do] ({\bf Data Output}) emit a datum.\footnoterecall{todo}
 421
 422   \item [\tt To] ({\bf Token Output}) emit a token.\footnoterecall{todo}\footnoterecall{toig}
 423
 424   \item [\tt Ig] ({\bf Ignore {\tt To} Until Last Iteration}) ignore
 425                  the {\tt To} bit unless {\tt Count=0}\footnoterecall{toig}
 426
 427   \item [\tt Rq] ({\bf ReQueue}) if set, instructions having nonzero
 428                  count are ``Re-Queued'' rather than RePeated.  See
 429                  {\tt Count} for more detail. \footnote{ An
 430                  instruction {\it in memory} may not have {\tt
 431                  Rq=1,Count=0} (use {\tt Rq=0,Count=0})}
 432
 433   \item  [\tt Count] ({\bf Count}) {\it After} executing:
 434 \begin{verbatim}
 435 if (Count==0) {
 436     discard this instruction
 437 } else {
 438     if Count < MAX_COUNT {
 439       decrement count
 440     }
 441     if Rq=1 or Literal {
 442        put this instruction back into the instruction fifo
 443     } else {
 444        execute this instruction again
 445     }
 446 }
 447 \end{verbatim}
 448 Note how a ``standing'' instruction is encoded as {\tt Count=MAX\_COUNT}
 449
 450 \item [\tt Dest] ({\bf Data/Token Destination})
 451    Normally, this field is copied into the path portion of any
 452    outgoing packet ({\tt Do} on an output port or {\tt To} on either).
 453
 454    However, in the special case of an output port, if {\tt Do=1,To=1}, then
 455    the {\it most significant} {\tt 11} bits of the value in the {\it
 456    data register} are used as a destination path instead.
 457    %\footnote{This
 458    %functionality eliminates the need for any sort of ``{\tt Execute}''
 459    %ship, and lets a {\tt Fifo} ship act as an extension of the
 460    %instruction queue in the pump.}
 461
 462 \end{itemize}
 463
 464
 465 \pagebreak
 466 \section{Opcodes}
 467
 468 Opcodes, as specified in \cite{am25}, are not yet implemented, but
 469 should be.
 470
 471 \section{Bypasses}
 472
 473 Bypasses, as specified in \cite{am27}, are not yet implemented, but
 474 should be.
 475
 476 \section{Sequencing Guarantees}
 477
 478 \addcontentsline{toc}{subsection}{Source Sequence Guarantee}
 479 If two packets leave the same source and have the same path, they will
 480 arrive at their common destination {\it in the same order in which
 481   they left the source}.  This assurance is referred to as the {\it
 482   source sequence guarantee}.
 483
 484 \addcontentsline{toc}{subsection}{Instruction Sequence Guarantee}
 485 CodeBags in memory consist of an unordered set of {\it fibers}.  A
 486 fiber is an ordered set of instructions which all share the same
 487 instruction path, and therefore will be executed by the same pump.
 488 Any code which dispatches a codebag {\it must} ensure that the
 489 instructions within a fiber reach their pump in the order in which
 490 they appear in the fiber.  This is known as the {\it instruction
 491   sequence guarantee}.  Typically the software dispatching a codebag
 492 will take advantage of the source sequence guarantee in order to
 493 fulfill the instruction sequence guarantee.
 494
 495 \section{Code Bags}
 496
 497 Code bags may overlap in memory.  If the assembler determines that two
 498 code bags share a set of fibers, it may feel free to re-order those
 499 fibers in such a way that the code bags overlap, sharing memory words
 500 for these fibers.  The codebag descriptors for the two bags will then
 501 have different addresses and/or sizes, but the ranges which they
 502 encompass will overlap.
 503
 504 \subsection{Dispatching Code Bags}
 505
 506 A codebag is ``dispatched'' by causing its constituent words to
 507 arrive at the data latch of some output port, and then causing that
 508 output port's pump to execute a {\tt send} (not {\tt sendto})
 509 instruction, which is encoded as {\tt Do=1,To=1}.  Because
 510 instructions are encoded in such a way that their {\tt Instruction
 511   Path} appears in the upper 11 bits, the {\tt send} mechanism will
 512 correctly route the instruction to a pump will execute it.
 513
 514 Currently the {\tt inCBD} port of the {\tt Memory} ship is ideally
 515 suited for this purpose, although a similar effect can easily be
 516 achieved using the {\tt BitFifo} ship in conjunction with the {\tt
 517   Memory} ship.  The only disadvantage to this arrangement is that the
 518 {\tt out} port must execute {\tt send} a certain number of times, and
 519 that number of times is determined by the {\tt size} field of the
 520 codebag descriptor.  Unfortunately this value is not known at compile
 521 time, so it cannot be encoded as a repeat count on the instruction at
 522 the {\tt out} port.  The typical solution is to place a standing ({\tt
 523   count=$\infty$}) {\tt send} instruction at the {\tt out} port of one
 524 {\tt Memory} ship, and reserve that ship exclusively for dispatching
 525 code bags.
 526
 527 Note that instructions are encoded relative to a particular dispatch
 528 output port.  That is, the {\tt Instruction Path} field of the
 529 instruction specifies a path to the desired execution pump -- but this
 530 path is also specified with respect to some source.  Therefore the
 531 instruction can be correctly dispatched only from that particular
 532 source.  However, it is fairly simple to write software which can
 533 ``relocate'' a codebag by replacing the {\tt Instruction Path} fields
 534 as needed.
 535
 536 \subsection{Tokens and Data Items}
 537
 538 When a data item is sent to a destination which expects a token, such
 539 as an output port destination, the effect will be the same as if a token
 540 had been sent to that destination, and the data value will be lost.
 541
 542 When a token is sent to a destination which expects a data item, such
 543 as a pump destination or an input port destination, the effect will be
 544 the same as if a data item {\it having an undefined value} were sent
 545 to that destination.  In other words, the programmer has no completely
 546 reliable mechanism for distinguishing between tokens and data items.
 547
 548
 549 \section{Future Directions}
 550
 551 \subsection{The Role of the Count Field}
 552
 553 Looking back on the design of the pump, several things are now
 554 apparent which were not initially.  In particular, it seems that it
 555 could be useful to separate {\it loading the count register} from
 556 other types of instructions.  This would have a few advantages:
 557
 558 \begin{itemize}
 559 \item The size of the count field would not be a consideration in the
 560       ``instruction budget'' of normal execution instructions
 561
 562 \item It would be possible to have finitely-repeating,
 563       infinitely-requeueing instructions.  Various implementation
 564       details\footnote{In many implementations, the count field of an
 565       instruction is destroyed as it counts down; another register and
 566       mux would be needed to ``save'' this value so it could be
 567       restored prior to being requeued} make it awkward to support
 568       this unless the count register can be manipulated independently
 569       of instruction execution.
 570 \end{itemize}
 571
 572 With this in mind, we can refactor the ``essence of the pump'' into
 573 the following actions, each of which may or may not be performed by a
 574 particular instruction:
 575
 576 \begin{itemize}
 577 \item await and acknowledge a token
 578 \item await and acknowledge a datum
 579 \item load the awaited datum into the path latch (top 11 bits)
 580 \item load the awaited datum into the data latch
 581 \item load a literal into the data latch
 582 \item load a literal into the path latch
 583 \item send the value in the data latch along the path in the path latch
 584 \item send a token along the path in the path latch
 585 \item set the count register to a literal value
 586 \item decrement the count register
 587 \item requeue this instruction if {\tt count>0}
 588 \item requeue this instruction unconditionally
 589 \item repeat this instruction if {\tt count>0}
 590 \item repeat this instruction unconditionally
 591 \end{itemize}
 592
 593 The crucial change here is the decoupling of the act of {\it loading
 594   the count register} from the act of {\it loading the next
 595   instruction}.  It also separates the act of loading the ``path
 596 latch'' from the act of actually performing the transmission.  This
 597 latter feature makes it possible to load the data and destination
 598 latches from two distinct data items, allowing ships to create,
 599 handle, and consume {\it packets} in the form of a pair of data items.
 600
 601 At this point, the remaining decisions are simply a question of
 602 instruction bit budgeting.  Currently we have a separate instruction
 603 form for literals, so that the literal can use bits which are not
 604 otherwise relevant to literal instructions (for example, {\tt Di}).
 605 Adding an additional instruction form for loading the count register
 606 would have similar advantages.
 607
 608
 609 \subsection{For Just a Little Bit More...}
 610
 611 Compared to early revisions of the Fleet architecture, the docks of
 612 FleetTwo are substantially more powerful.  One might ask, why not add
 613 additional power to the docks, perhaps up to the point of making them
 614 tiny von Neumann machines in their own right?  Indeed, doing so would
 615 risk reliving the {\it Wheel of Reincarnation}
 616 \cite{WheelOfReincarnation}.
 617
 618 I believe, however, that there is a fundamental distinction to be
 619 made between devices whose control flow {\it can branch} and those
 620 whose control flow {\it cannot branch}, and that this distinction is
 621 founded in the geometry of VLSI, and indeed, Euclidean space.
 622 Ultimately, the instruction stream for any component of a processor
 623 must exist in some physical incarnation -- some geometric extent.  If
 624 the control flow of a program is linear, with no branching, then the
 625 instruction stream has an optimal spatial mapping which is
 626 efficient: simply lay it out as a FIFO.
 627
 628 By contrast, however, a program whose instruction stream has
 629 unrestricted branching is logically a {\it tree} structure.
 630 Unfortunately, however, general trees have no efficient mapping onto
 631 two-dimensional surfaces.  By this, I mean that in every possible
 632 mapping, there exists some pair of {\it logically adjacent}
 633 instructions whose {\it physical distance} is proportional to the size
 634 of the entire program, rather than some constant factor irrespective
 635 of the program size.  Therefore I conclude that the spatial mapping of
 636 a program with unrestricted branching is in an entirely different
 637 class than that of programs with linear control flow, and I choose this
 638 distinction as the basis for limiting the docks to linear control
 639 flow sequences.  A similar argument can be made for looping constructs
 640 in which two or more loops may be nested within a single parent loop.
 641
 642 With this distinction in mind, it would not be objectionable for
 643 future incarnations of Fleet to have {\it predicated} instructions at
 644 the docks.  This might take the form of:
 645
 646 \begin{itemize}
 647 \item Instructions to unconditionally set some ``state flags''
 648 \item Instructions to conditionally set some ``state flags'' based on the contents of the data latch
 649 \item Bits which cause an instruction to be ignored or retired based on the state flags
 650 \end{itemize}
 651
 652 I feel that this addition would not risk falling down the slippery
 653 slope -- instruction streams with predicated instructions (but not
 654 unrestricted branching or nested looping) preserve the crucial
 655 property that they have an efficient spatial mapping.  Any steps
 656 beyond this, however, cross a fairly fundamental line.
 657
 658
 659 \addcontentsline{toc}{section}{Ship Descriptions}
 660
 661