1 \documentclass[10pt,oneside]{article}
3 \usepackage[titles]{tocloft}
6 \DeclareGraphicsRule{*}{mps}{*}{}
8 \newcommand{\footnoteremember}[2]{
11 \setcounter{#1}{\value{footnote}}
12 } \newcommand{\footnoterecall}[1]{
13 \footnotemark[\value{#1}]
16 \ifx\pdftexversion\undefined
17 \usepackage[dvips]{graphicx}
18 \DeclareGraphicsExtensions{.eps}\else
19 \usepackage[pdftex]{graphicx}
20 \DeclareGraphicsExtensions{.pdf,.png,.mps,.mp}
29 \usepackage{bytefield}
30 \usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref}
31 \renewcommand{\ttdefault}{cmtt}
32 \title{The FleetTwo Architecture Manual\\{\normalsize A Programmer's View of FleetTwo}}
36 \begin{thebibliography}{[GDG01]}
37 \bibitem[IES02]{ies02}
39 \href{http://research.cs.berkeley.edu/class/fleet/docs/people/ivan.e.sutherland/ies02-FLEET-A.Once.Instruction.Computer.pdf}{\it UCB-IES02: Fleet -- A One Instruction Computer}
40 \bibitem[IES14]{ies14}
42 \href{http://research.cs.berkeley.edu/class/fleet/docs/people/ivan.e.sutherland/ies14-Fleet.Definition.pdf}{\it UCB-IES14: The Fleet Definition}
45 \href{http://research.cs.berkeley.edu/class/fleet/docs/people/adam.megacz/am17-The.Choice.Ship.pdf}{\it UCB-AM17: The Choice Ship}
46 \bibitem[IES44]{ies44}
48 \href{http://research.cs.berkeley.edu/class/fleet/docs/people/ivan.e.sutherland/ies44-Fleet.Definition.pdf}{\it UCB-IES44: The Fleet Definition}
51 \href{http://research.cs.berkeley.edu/class/fleet/docs/people/adam.megacz/am25.pdf}{\it UCB-AM25: Opcode Ports}
54 \href{http://research.cs.berkeley.edu/class/fleet/docs/people/adam.megacz/am27.pdf}{\it UCB-AM27: Bypass Paths}
55 \bibitem[IES31]{ies31}
57 \href{http://research.cs.berkeley.edu/class/fleet/docs/people/ivan.e.sutherland/ies31-Some.SHIPs.pdf}{\it UCB-IES31: Some Ships}
58 \bibitem[MS68]{WheelOfReincarnation}
59 Myer, T. and Sutherland, Ivan, {\it On the Design of Display Processors}
60 Communications of the ACM, Vol 11, No. 6, June 1968 .
65 This manual attempts to detail all those aspects of Fleet which are
66 visible to a software programmer or compiler backend.
68 This document assumes that the reader is broadly familiar with the
69 Fleet architecture and the motivation behind it. For a far better
70 introduction to these topics, the reader is referred to \cite{ies02}
71 \cite{ies14} and \cite{ies44}. Please bear in mind that many of the
72 details in these documents have been changed, but they still serve as
73 the authoritative introduction to the architecture. After reviewing
74 these introductory papers, the reader is invited to digest this
75 architecture manual for the latest information on the details
76 necessary to write software and compilers for Fleet.
81 %\fbox{\parbox{5in}{\footnotesize
83 %The software described in this memo may be obtained using
84 %\href{http://www.darcs.net/}{\tt darcs}, with the command:
86 %{\tt darcs get http://research.cs.berkeley.edu/class/fleet/repos/fleet/}
94 \section{Introduction (from \cite{ies02})}
96 During the past two and a half years, the Fleet architecture has gone
97 through massive changes and improvements. Nonetheless, the
98 fundamental vision remains unchanged. As Sutherland writes in the
99 original Berkeley Fleet description \cite{ies02},
102 When computers were new, logic and storage were expensive and wires
103 were relatively cheap. Early computer designers avoided using logic
104 wherever possible but were not greatly concerned with communication.
105 They made design choices consistent with the costs of the day. My
106 favorite example is the jump instruction. Early designers put jump
107 instructions at the end of each block of code to avoid the expense of
108 storing the address of the next block while executing the present one.
110 Today's chip fabrication methods invert the older cost balance. In
111 today's integrated circuits logic and memory are now almost free but
112 communication costs dominate chip area, energy consumption and delay.
113 In spite of these changes in the stuff from which we make computers,
114 vestiges of the past remain in many of today's common microprocessor
115 designs. For example, jump instructions still appear at the end of
116 each basic block of code in spite of the need to pre-fetch the next
117 block while executing this one. It seems better today to store a
118 pointer to the next block and the length of the current block early in
121 Instead of following the path of history, I'd like to listen carefully
122 to what modern chip structures have to teach about how one might build
123 a modern computer. I see three major lessons. First, simplicity can
124 reduce cost. Second, moving data will consume most of the time,
125 energy and chip area of a modern computer. And third, the low cost of
126 logic makes concurrency available if we can figure out how to use it.
130 \item {\bf Simplicity}: The Fleet architecture seeks simplicity by treating
131 all processing devices alike...
133 \item {\bf Communication}: The Fleet architecture seeks to control the
134 cost of communication by putting it under direct programmer
135 control. Thus Fleet avoids instructions like {\sc ADD} or {\sc STORE} that
136 include concealed communication to and from a register file...
138 \item {\bf Concurrency}: The Fleet architecture assumes concurrency
141 \item {\bf Asynchrony}: This provides enormous flexibility for
142 implementers to improve the performance or reduce the cost of
143 the processing devices in a Fleet system. -- Synchronous
144 implementations both of ships and of the switch fabric are
145 possible provided they use validity or occupancy bits to achieve
146 the arbitrary delays required at sources and destinations...
153 \section{The Programmer's View of The Ship-Fabric Interface}
155 The role of the Fleet switch fabric is to carry {\it packets} from
156 {\it sources} to {\it destinations}.
158 A packet consists of a {\it path} and a {\it payload}. The path is
159 some string of bits which tells the switch fabric how to route from
160 the desired source to the desired destination. The distinction
161 between a {\it path} and a {\it destination} is extremely important!
162 The payload is either a {\it token} or a {\it data item}. If the
163 payload is a data item, it includes one machine word of data.
165 The diagram below represents a {\it programmer's} conceptual view of
166 the interface between ships and the switch fabric. Actual
167 implementation circuitry may differ substantially. Sources and
168 destinations which can send and receive only tokens -- not data items
169 -- are drawn as dashed lines.
173 \epsfig{file=ports,width=4in}
177 The term {\it port} refers to an interface to the ship, the {\it
178 dock} connecting it to the switch fabric, and the corresponding
179 sources and destinations on the switch fabric.
181 Each dock consists of a {\it data latch}, which is as wide as a
182 single machine word and a {\it pump}, which is a circular fifo of
183 instruction-width latches. The values in the instruction fifo
184 control the data latch, as future chapters will explain.
186 Note that the pump in each dock has a destination of its own, and
187 that there is no buffering fifo guarding this destination. Note that
188 all other destinations are guarded by a buffering fifo; the size of
189 this fifo is exposed to the software programmer so she can avoid
194 \section{Data Formats}
196 \subsection{Path (12 bits)}
198 These bits appear physically within the switch fabric. \footnote{In
199 the Sun Labs implementation, these bits have ``address bit
200 timing.''} The {\tt T} bit is the ``tokenhood'' bit; if set, this
201 packet represents a token\footnote{In the Sun Labs
202 implementation, this bit inhibits the switch fabric data latches
206 \begin{bytefield}{49}
207 \bitheader[b]{37,47,48}\\
214 \subsection{Data Word In Memory (37 bits)}
216 A word of data is 37 bits wide.
219 \begin{bytefield}{49}
220 \bitheader[b]{0,36}\\
222 \bitbox{37}{Data Word}
226 \subsection{Packet In Flight (49 bits)}
229 \begin{bytefield}{49}
230 \bitheader[b]{0,36,37,47,48}\\
233 \bitbox{37}{Data Word}
237 \subsection{Instruction In Memory (37 bits)}
239 An instruction must be no wider than a memory word. The next section
240 explains the bits in greater detail. The {\tt Instruction Path} is
241 the path which the instruction itself will take in order to arrive at
242 the desired pump destination. The {\it Data/Token Path} is placed on
243 any packet inserted into the switch fabric as a result of executing
247 \begin{bytefield}{49}
248 \bitheader[b]{0,6,7,17,18-26,36}\\
250 \bitbox{11}{Instruction Path}
259 \bitbox{11}{Data/Token Path}
264 \subsection{Instruction Packet In Flight (49 bits)}
266 Note that Fleet simply copies the {\tt Instruction Path} field, bit
267 for bit, into the packet {\tt Path} field in order to ``dispatch'' an
271 \begin{bytefield}{49}
272 \bitheader[b]{0,6,7,17,18-25,37,47,48}\\
274 \bitbox{11}{Instruction Path}
275 \bitbox{11}{Instruction Path}
284 \bitbox{11}{Data/Token Path}
289 \setlength{\bitwidth}{5mm}
292 \section{Instruction Formats}
293 \begin{wrapfigure}{R}{2in}
294 \epsfig{file=locations,width=2in}
297 Within the pump's instruction fifo, there are two points of particular
298 interest. The first point is the {\it insertion point}, which is the
299 point in the ring where new instructions enter from the switch fabric.
300 The other point of interest is the {\it execution point}, which is the
301 point in the ring where instructions other than {\tt kill} and {\tt
302 unclog} are performed.
304 The programmer may assume that the insertion point is the immediate
305 successor of the execution point.
309 \addcontentsline{toc}{subsection}{Kill}
312 \begin{bytefield}{26}
313 \bitheader[b]{0,6,7,20-25}\\
319 \bitbox{14}{don't care}
322 When a {\tt kill} instruction reaches the insertion point, it will
323 wait there for a non-{\tt clog} instruction to reach the execution
324 point. When this occurs, the instruction at the execution point is
325 retired and the count field of the {\tt kill} instruction is
326 decremented. If the {\tt kill} instruction's count was {\tt 0} before
327 decrementing, the {\tt kill} instruction is retired. The programmer
328 is assured that a kill instruction with a count of $n$ will kill $n+1$
329 {\it consecutive} instructions.
333 \addcontentsline{toc}{subsection}{Clog}
336 \begin{bytefield}{26}
337 \bitheader[b]{0,20-25}\\
343 \bitbox{21}{don't care}
345 When a {\tt clog} instruction reaches the execution point, it remains
346 there and no more instructions will be executed until an {\tt unclog}
351 \addcontentsline{toc}{subsection}{UnClog}
354 \begin{bytefield}{26}
355 \bitheader[b]{0,20-25}\\
361 \bitbox{21}{don't care}
363 When an {\tt unclog} instruction reaches the insertion point, it will wait
364 there until a {\tt clog} instruction is at the execution point. When
365 this occurs, both instructions retire.
369 \addcontentsline{toc}{subsection}{Literal}
372 \begin{bytefield}{26}
373 \bitheader[b]{0,6,7,23-25}\\
379 When a literal instruction reaches the execution point, its {\tt
380 Literal} field is sign-extended to a full word length and captured
385 {\bf Normal Instruction}
386 \addcontentsline{toc}{subsection}{Normal Instruction}
389 \begin{bytefield}{26}
390 \bitheader[b]{0,6,7,16-25}\\
404 \footnoteremember{tidi}{{\tt Ti}=1,{\tt Di}=1 is invalid on input port.}
405 \footnoteremember{didc}{Note that {\tt Di}=0,{\tt Dc}=1
406 is meaningless and therefore reserved for other
408 \footnoteremember{todo}{ {\tt To}=1,{\tt Do}=1 have special meaning on an output port.}
409 \footnoteremember{toig}{{\tt To}=0,{\tt Ig}=1 is invalid}
411 \item [\tt Ti] ({\bf Token Input}) wait for a token and acknowledge it.
413 \item [\tt Di] ({\bf Data Input}) wait for a datum and acknowledge it.
414 \footnoterecall{tidi}\footnoterecall{didc}
416 \item [\tt Dc] ({\bf Data Capture}) capture (latch) the accepted
417 datum in the data latch. This bit is ignored if the incoming packet is
418 a token. \footnoterecall{didc}
420 \item [\tt Do] ({\bf Data Output}) emit a datum.\footnoterecall{todo}
422 \item [\tt To] ({\bf Token Output}) emit a token.\footnoterecall{todo}\footnoterecall{toig}
424 \item [\tt Ig] ({\bf Ignore {\tt To} Until Last Iteration}) ignore
425 the {\tt To} bit unless {\tt Count=0}\footnoterecall{toig}
427 \item [\tt Rq] ({\bf ReQueue}) if set, instructions having nonzero
428 count are ``Re-Queued'' rather than RePeated. See
429 {\tt Count} for more detail. \footnote{ An
430 instruction {\it in memory} may not have {\tt
431 Rq=1,Count=0} (use {\tt Rq=0,Count=0})}
433 \item [\tt Count] ({\bf Count}) {\it After} executing:
436 discard this instruction
438 if Count < MAX_COUNT {
442 put this instruction back into the instruction fifo
444 execute this instruction again
448 Note how a ``standing'' instruction is encoded as {\tt Count=MAX\_COUNT}
450 \item [\tt Dest] ({\bf Data/Token Destination})
451 Normally, this field is copied into the path portion of any
452 outgoing packet ({\tt Do} on an output port or {\tt To} on either).
454 However, in the special case of an output port, if {\tt Do=1,To=1}, then
455 the {\it most significant} {\tt 11} bits of the value in the {\it
456 data register} are used as a destination path instead.
458 %functionality eliminates the need for any sort of ``{\tt Execute}''
459 %ship, and lets a {\tt Fifo} ship act as an extension of the
460 %instruction queue in the pump.}
468 Opcodes, as specified in \cite{am25}, are not yet implemented, but
473 Bypasses, as specified in \cite{am27}, are not yet implemented, but
476 \section{Sequencing Guarantees}
478 \addcontentsline{toc}{subsection}{Source Sequence Guarantee}
479 If two packets leave the same source and have the same path, they will
480 arrive at their common destination {\it in the same order in which
481 they left the source}. This assurance is referred to as the {\it
482 source sequence guarantee}.
484 \addcontentsline{toc}{subsection}{Instruction Sequence Guarantee}
485 CodeBags in memory consist of an unordered set of {\it fibers}. A
486 fiber is an ordered set of instructions which all share the same
487 instruction path, and therefore will be executed by the same pump.
488 Any code which dispatches a codebag {\it must} ensure that the
489 instructions within a fiber reach their pump in the order in which
490 they appear in the fiber. This is known as the {\it instruction
491 sequence guarantee}. Typically the software dispatching a codebag
492 will take advantage of the source sequence guarantee in order to
493 fulfill the instruction sequence guarantee.
497 Code bags may overlap in memory. If the assembler determines that two
498 code bags share a set of fibers, it may feel free to re-order those
499 fibers in such a way that the code bags overlap, sharing memory words
500 for these fibers. The codebag descriptors for the two bags will then
501 have different addresses and/or sizes, but the ranges which they
502 encompass will overlap.
504 \subsection{Dispatching Code Bags}
506 A codebag is ``dispatched'' by causing its constituent words to
507 arrive at the data latch of some output port, and then causing that
508 output port's pump to execute a {\tt send} (not {\tt sendto})
509 instruction, which is encoded as {\tt Do=1,To=1}. Because
510 instructions are encoded in such a way that their {\tt Instruction
511 Path} appears in the upper 11 bits, the {\tt send} mechanism will
512 correctly route the instruction to a pump will execute it.
514 Currently the {\tt inCBD} port of the {\tt Memory} ship is ideally
515 suited for this purpose, although a similar effect can easily be
516 achieved using the {\tt BitFifo} ship in conjunction with the {\tt
517 Memory} ship. The only disadvantage to this arrangement is that the
518 {\tt out} port must execute {\tt send} a certain number of times, and
519 that number of times is determined by the {\tt size} field of the
520 codebag descriptor. Unfortunately this value is not known at compile
521 time, so it cannot be encoded as a repeat count on the instruction at
522 the {\tt out} port. The typical solution is to place a standing ({\tt
523 count=$\infty$}) {\tt send} instruction at the {\tt out} port of one
524 {\tt Memory} ship, and reserve that ship exclusively for dispatching
527 Note that instructions are encoded relative to a particular dispatch
528 output port. That is, the {\tt Instruction Path} field of the
529 instruction specifies a path to the desired execution pump -- but this
530 path is also specified with respect to some source. Therefore the
531 instruction can be correctly dispatched only from that particular
532 source. However, it is fairly simple to write software which can
533 ``relocate'' a codebag by replacing the {\tt Instruction Path} fields
536 \subsection{Tokens and Data Items}
538 When a data item is sent to a destination which expects a token, such
539 as an output port destination, the effect will be the same as if a token
540 had been sent to that destination, and the data value will be lost.
542 When a token is sent to a destination which expects a data item, such
543 as a pump destination or an input port destination, the effect will be
544 the same as if a data item {\it having an undefined value} were sent
545 to that destination. In other words, the programmer has no completely
546 reliable mechanism for distinguishing between tokens and data items.
549 \section{Future Directions}
551 \subsection{The Role of the Count Field}
553 Looking back on the design of the pump, several things are now
554 apparent which were not initially. In particular, it seems that it
555 could be useful to separate {\it loading the count register} from
556 other types of instructions. This would have a few advantages:
559 \item The size of the count field would not be a consideration in the
560 ``instruction budget'' of normal execution instructions
562 \item It would be possible to have finitely-repeating,
563 infinitely-requeueing instructions. Various implementation
564 details\footnote{In many implementations, the count field of an
565 instruction is destroyed as it counts down; another register and
566 mux would be needed to ``save'' this value so it could be
567 restored prior to being requeued} make it awkward to support
568 this unless the count register can be manipulated independently
569 of instruction execution.
572 With this in mind, we can refactor the ``essence of the pump'' into
573 the following actions, each of which may or may not be performed by a
574 particular instruction:
577 \item await and acknowledge a token
578 \item await and acknowledge a datum
579 \item load the awaited datum into the path latch (top 11 bits)
580 \item load the awaited datum into the data latch
581 \item load a literal into the data latch
582 \item load a literal into the path latch
583 \item send the value in the data latch along the path in the path latch
584 \item send a token along the path in the path latch
585 \item set the count register to a literal value
586 \item decrement the count register
587 \item requeue this instruction if {\tt count>0}
588 \item requeue this instruction unconditionally
589 \item repeat this instruction if {\tt count>0}
590 \item repeat this instruction unconditionally
593 The crucial change here is the decoupling of the act of {\it loading
594 the count register} from the act of {\it loading the next
595 instruction}. It also separates the act of loading the ``path
596 latch'' from the act of actually performing the transmission. This
597 latter feature makes it possible to load the data and destination
598 latches from two distinct data items, allowing ships to create,
599 handle, and consume {\it packets} in the form of a pair of data items.
601 At this point, the remaining decisions are simply a question of
602 instruction bit budgeting. Currently we have a separate instruction
603 form for literals, so that the literal can use bits which are not
604 otherwise relevant to literal instructions (for example, {\tt Di}).
605 Adding an additional instruction form for loading the count register
606 would have similar advantages.
609 \subsection{For Just a Little Bit More...}
611 Compared to early revisions of the Fleet architecture, the docks of
612 FleetTwo are substantially more powerful. One might ask, why not add
613 additional power to the docks, perhaps up to the point of making them
614 tiny von Neumann machines in their own right? Indeed, doing so would
615 risk reliving the {\it Wheel of Reincarnation}
616 \cite{WheelOfReincarnation}.
618 I believe, however, that there is a fundamental distinction to be
619 made between devices whose control flow {\it can branch} and those
620 whose control flow {\it cannot branch}, and that this distinction is
621 founded in the geometry of VLSI, and indeed, Euclidean space.
622 Ultimately, the instruction stream for any component of a processor
623 must exist in some physical incarnation -- some geometric extent. If
624 the control flow of a program is linear, with no branching, then the
625 instruction stream has an optimal spatial mapping which is
626 efficient: simply lay it out as a FIFO.
628 By contrast, however, a program whose instruction stream has
629 unrestricted branching is logically a {\it tree} structure.
630 Unfortunately, however, general trees have no efficient mapping onto
631 two-dimensional surfaces. By this, I mean that in every possible
632 mapping, there exists some pair of {\it logically adjacent}
633 instructions whose {\it physical distance} is proportional to the size
634 of the entire program, rather than some constant factor irrespective
635 of the program size. Therefore I conclude that the spatial mapping of
636 a program with unrestricted branching is in an entirely different
637 class than that of programs with linear control flow, and I choose this
638 distinction as the basis for limiting the docks to linear control
639 flow sequences. A similar argument can be made for looping constructs
640 in which two or more loops may be nested within a single parent loop.
642 With this distinction in mind, it would not be objectionable for
643 future incarnations of Fleet to have {\it predicated} instructions at
644 the docks. This might take the form of:
647 \item Instructions to unconditionally set some ``state flags''
648 \item Instructions to conditionally set some ``state flags'' based on the contents of the data latch
649 \item Bits which cause an instruction to be ignored or retired based on the state flags
652 I feel that this addition would not risk falling down the slippery
653 slope -- instruction streams with predicated instructions (but not
654 unrestricted branching or nested looping) preserve the crucial
655 property that they have an efficient spatial mapping. Any steps
656 beyond this, however, cross a fairly fundamental line.
659 \addcontentsline{toc}{section}{Ship Descriptions}