1 \documentclass[10pt,oneside]{article}
3 \usepackage[titles]{tocloft}
5 \DeclareGraphicsRule{*}{mps}{*}{}
7 \ifx\pdftexversion\undefined
8 \usepackage[dvips]{graphicx}
9 \DeclareGraphicsExtensions{.eps}\else
10 \usepackage[pdftex]{graphicx}
11 \DeclareGraphicsExtensions{.pdf,.png,.mps,.mp}
20 \usepackage{bytefield}
21 \usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref}
22 \renewcommand{\ttdefault}{cmtt}
23 \title{The FleetTwo Architecture Manual\\{\normalsize A Programmer's View of Fleet}}
27 \begin{thebibliography}{[GDG01]}
28 \bibitem[IES02]{ies02}
30 \href{http://research.cs.berkeley.edu/class/fleet/docs/people/ivan.e.sutherland/ies02-FLEET-A.Once.Instruction.Computer.pdf}{\it UCB-IES02: Fleet -- A One Instruction Computer}
31 \bibitem[IES14]{ies14}
33 \href{http://research.cs.berkeley.edu/class/fleet/docs/people/ivan.e.sutherland/ies14-Fleet.Definition.pdf}{\it UCB-IES14: The Fleet Definition}
36 \href{http://research.cs.berkeley.edu/class/fleet/docs/people/adam.megacz/am17-The.Choice.Ship.pdf}{\it UCB-AM17: The Choice Ship}
37 \bibitem[IES44]{ies44}
39 \href{http://research.cs.berkeley.edu/class/fleet/docs/people/ivan.e.sutherland/ies44-Fleet.Definition.pdf}{\it UCB-IES44: The Fleet Definition}
42 \href{http://research.cs.berkeley.edu/class/fleet/docs/people/adam.megacz/am25.pdf}{\it UCB-AM25: Opcode Ports}
45 \href{http://research.cs.berkeley.edu/class/fleet/docs/people/adam.megacz/am27.pdf}{\it UCB-AM27: Bypass Paths}
46 \bibitem[IES31]{ies31}
48 \href{http://research.cs.berkeley.edu/class/fleet/docs/people/ivan.e.sutherland/ies31-Some.SHIPs.pdf}{\it UCB-IES31: Some Ships}
49 \bibitem[MS68]{WheelOfReincarnation}
50 Myer, T. and Sutherland, Ivan, {\it On the Design of Display Processors}
51 Communications of the ACM, Vol 11, No. 6, June 1968 .
56 This manual attempts to detail all those aspects of Fleet which are
57 visible to a software programmer or compiler backend.
59 This document assumes that the reader is broadly familiar with the
60 Fleet architecture and the motivation behind it. For a far better
61 introduction to these topics, the reader is referred to \cite{ies02}
62 \cite{ies14} and \cite{ies44}. Please bear in mind that many of the
63 details in these documents have been changed, but they still serve as
64 the authoritative introduction to the architecture. After reviewing
65 these introductory papers, the reader is invited to digest this
66 architecture manual for the latest information on the details
67 necessary to write software and compilers for Fleet.
72 %\fbox{\parbox{5in}{\footnotesize
74 %The software described in this memo may be obtained using
75 %\href{http://www.darcs.net/}{\tt darcs}, with the command:
77 %{\tt darcs get http://research.cs.berkeley.edu/class/fleet/repos/fleet/}
85 \section{Introduction (from \cite{ies02})}
87 During the past two and a half years, the Fleet architecture has gone
88 through massive changes and improvements. Nonetheless, the
89 fundamental vision remains unchanged. As Sutherland writes in the
90 original Berkeley Fleet description \cite{ies02},
93 When computers were new, logic and storage were expensive and wires
94 were relatively cheap. Early computer designers avoided using logic
95 wherever possible but were not greatly concerned with communication.
96 They made design choices consistent with the costs of the day. My
97 favorite example is the jump instruction. Early designers put jump
98 instructions at the end of each block of code to avoid the expense of
99 storing the address of the next block while executing the present one.
101 Today's chip fabrication methods invert the older cost balance. In
102 today's integrated circuits logic and memory are now almost free but
103 communication costs dominate chip area, energy consumption and delay.
104 In spite of these changes in the stuff from which we make computers,
105 vestiges of the past remain in many of today's common microprocessor
106 designs. For example, jump instructions still appear at the end of
107 each basic block of code in spite of the need to pre-fetch the next
108 block while executing this one. It seems better today to store a
109 pointer to the next block and the length of the current block early in
112 Instead of following the path of history, I'd like to listen carefully
113 to what modern chip structures have to teach about how one might build
114 a modern computer. I see three major lessons. First, simplicity can
115 reduce cost. Second, moving data will consume most of the time,
116 energy and chip area of a modern computer. And third, the low cost of
117 logic makes concurrency available if we can figure out how to use it.
121 \item {\bf Simplicity}: The Fleet architecture seeks simplicity by treating
122 all processing devices alike...
124 \item {\bf Communication}: The Fleet architecture seeks to control the
125 cost of communication by putting it under direct programmer
126 control. Thus Fleet avoids instructions like {\sc ADD} or {\sc STORE} that
127 include concealed communication to and from a register file...
129 \item {\bf Concurrency}: The Fleet architecture assumes concurrency
132 \item {\bf Asynchrony}: This provides enormous flexibility for
133 implementers to improve the performance or reduce the cost of
134 the processing devices in a Fleet system. -- Synchronous
135 implementations both of ships and of the switch fabric are
136 possible provided they use validity or occupancy bits to achieve
137 the arbitrary delays required at sources and destinations...
144 \section{The Programmer's View of The Ship-Fabric Interface}
146 The role of the Fleet switch fabric is to carry {\it packets} from
147 {\it sources} to {\it destinations}.
149 A packet consists of a {\it path} and a {\it payload}. The path is
150 some string of bits which tells the switch fabric how to route from
151 the desired source to the desired destination. The distinction
152 between a {\it path} and a {\it destination} is extremely important!
153 The payload is either a {\it token} or a {\it data item}. If the
154 payload is a data item, it includes one machine word of data.
156 The diagram below represents a {\it programmer's} conceptual view of
157 the interface between ships and the switch fabric. Actual
158 implementation circuitry may differ substantially. Sources and
159 destinations which can send and receive only tokens -- not data items
160 -- are drawn as dashed lines.
164 \epsfig{file=ports,width=4in}
168 The term {\it port} refers to an interface to the ship, the {\it
169 valve} connecting it to the switch fabric, and the corresponding
170 sources and destinations on the switch fabric.
172 Each valve consists of a {\it data latch}, which is as wide as a
173 single machine word and a {\it pump}, which is a circular fifo of
174 instruction-width latches. The values in the instruction fifo
175 control the data latch, as future chapters will explain.
177 Note that the pump in each valve has a destination of its own, and
178 that there is no buffering fifo guarding this destination. Note that
179 all other destinations are guarded by a buffering fifo; the size of
180 this fifo is exposed to the software programmer so she can ensure that
181 deadlock does not occur.
185 \section{Data Formats}
187 \subsection{Path (12 bits)}
189 These bits appear physically within the switch fabric. \footnote{In
190 the Sun Labs implementation, these bits have ``address bit
191 timing.''} The {\tt T} bit is the ``tokenhood'' bit; if set, this
192 packet represents a token\footnote{In the Sun Labs
193 implementation, this bit inhibits the switch fabric data latches
197 \begin{bytefield}{49}
198 \bitheader[b]{37,47,48}\\
205 \subsection{Data Word In Memory (37 bits)}
207 A word of data is 37 bits wide.
210 \begin{bytefield}{49}
211 \bitheader[b]{0,36}\\
213 \bitbox{37}{Data Word}
217 \subsection{Packet In Flight (49 bits)}
220 \begin{bytefield}{49}
221 \bitheader[b]{0,36,37,47,48}\\
224 \bitbox{37}{Data Word}
228 \subsection{Instruction In Memory (37 bits)}
230 An instruction must be no wider than a memory word. The next section
231 explains the bits in greater detail. The {\tt Instruction Path} is
232 the path which the instruction itself will take in order to arrive at
233 the desired pump destination. The {\it Data/Token Path} is placed on
234 any packet inserted into the switch fabric as a result of executing
238 \begin{bytefield}{49}
239 \bitheader[b]{0,10,11,17,18-26,36}\\
241 \bitbox{11}{Instruction Path}
250 \bitbox{11}{Data/Token Path}
255 \subsection{Instruction Packet In Flight (49 bits)}
257 Note that Fleet simply copies the {\tt Instruction Path} field, bit
258 for bit, into the packet {\tt Path} field in order to ``dispatch'' an
262 \begin{bytefield}{49}
263 \bitheader[b]{0,10,11,17,18-25,37,47,48}\\
265 \bitbox{11}{Instruction Path}
266 \bitbox{11}{Instruction Path}
275 \bitbox{11}{Data/Token Path}
280 \setlength{\bitwidth}{5mm}
283 \section{Instruction Formats}
284 \begin{wrapfigure}{R}{2in}
285 \epsfig{file=locations,width=2in}
288 Within the pump's instruction fifo, there are two points of particular
289 interest. The first point is the {\it insertion point}, which is the
290 point in the ring where new instructions enter from the switch fabric.
291 The other point of interest is the {\it execution point}, which is the
292 point in the ring where instructions other than {\tt kill} and {\tt
293 unclog} are performed.
295 The programmer may assume that the insertion point is the immediate
296 successor of the execution point.
300 \addcontentsline{toc}{subsection}{Kill}
303 \begin{bytefield}{26}
304 \bitheader[b]{0,6,7,20-25}\\
313 When a {\tt kill} instruction reaches the insertion point, it will
314 wait there for a non-{\tt clog} instruction to reach the execution
315 point. When this occurs, the instruction at the execution point is
316 retired and the count field of the {\tt kill} instruction is
317 decremented. If the {\tt kill} instruction's count was {\tt 0} before
318 decrementing, the {\tt kill} instruction is retired. The programmer
319 is assured that a kill instruction with a count of $n$ will kill $n$
320 {\it consecutive} instructions.
324 \addcontentsline{toc}{subsection}{Clog}
327 \begin{bytefield}{26}
328 \bitheader[b]{0,20-25}\\
336 When a {\tt clog} instruction reaches the execution point, no more
337 instructions will be executed until an {\tt unclog} is performed.
341 \addcontentsline{toc}{subsection}{UnClog}
344 \begin{bytefield}{26}
345 \bitheader[b]{0,20-25}\\
353 When an {\tt unclog} instruction reaches the insertion point, it will wait
354 there until a {\tt clog} instruction is at the execution point. When
355 this occurs, both instructions retire.
359 \addcontentsline{toc}{subsection}{Literal}
362 \begin{bytefield}{26}
363 \bitheader[b]{0,6,7,23-25}\\
369 When a literal instruction reaches the execution point, its {\tt
370 Literal} field is sign-extended to a full word length and captured
375 {\bf Normal Instruction}
376 \addcontentsline{toc}{subsection}{Normal Instruction}
379 \begin{bytefield}{26}
380 \bitheader[b]{0,6,7,16-25}\\
396 \item [\tt Ti] ({\bf Token Input}) wait for a token and acknowledge
397 it\footnote{{\tt Ti}=1,{\tt Di}=1 is invalid on input port.}
399 \item [\tt Di] ({\bf Data Input}) wait for a datum and acknowledge it.
401 \item [\tt Dc] ({\bf Data Capture}) capture (latch) the accepted
402 datum in the data latch. This bit is ignored if the incoming packet is
403 a token. \footnote{ Note that {\tt Di}=0,{\tt Dc}=1
404 is meaningless and therefore reserved for other
407 \item [\tt Do] ({\bf Data Output}) emit a datum.
409 \item [\tt To] ({\bf Token Output}) emit a token.\footnote{ {\tt To}=1,{\tt
410 Do}=1 have special meaning on an output port.}
412 \item [\tt Ig] ({\bf Ignore {\tt To} Until Last Iteration}) ignore
413 the {\tt To} bit unless {\tt Count=0} \footnote{{\tt
414 To}=0,{\tt Ig}=1 is invalid}
416 \item [\tt Rq] ({\bf ReQueue}) if set, instructions having nonzero
417 count are ``Re-Queued'' rather than RePeated. See
418 {\tt Count} for more detail. \footnote{ An
419 instruction {\it in memory} may not have {\tt
420 Rq=1,Count=0} (use {\tt Rq=0,Count=0})}
422 \item [\tt Count] ({\bf Count}) {\it After} executing:
425 discard this instruction
427 if Count < MAX_COUNT {
431 put this instruction back into the instruction fifo
433 execute this instruction again
437 Note how a ``standing'' instruction is encoded as {\tt Count=MAX\_COUNT}
439 \item [\tt Dest] ({\bf Data/Token Destination})
440 Normally, this field is copied into the address portion of any
441 outgoing packet ({\tt Do} on an output port or {\tt To} on either).
443 However, in the special case of an output port, if {\tt Do=1,To=1}, then
444 the {\it most significant} {\tt 11} bits of the value in the {\it
445 data register} are used as a destination address instead.
447 %functionality eliminates the need for any sort of ``{\tt Execute}''
448 %ship, and lets a {\tt Fifo} ship act as an extension of the
449 %instruction queue in the pump.}
457 Opcodes, as specified in \cite{am25}, are not yet implemented, but
462 Bypasses, as specified in \cite{am27}, are not yet implemented, but
465 \section{Sequencing Guarantees}
467 \addcontentsline{toc}{subsection}{Source Sequence Guarantee}
468 If two packets leave the same source and have the same path, they will
469 arrive at their common destination {\it in the same order in which
470 they left the source}. This assurance is referred to as the {\it
471 source sequence guarantee}.
473 \addcontentsline{toc}{subsection}{Instruction Sequence Guarantee}
474 CodeBags in memory consist of an unordered set of {\it fibers}. A
475 fiber is an ordered set of instructions which all share the same
476 instruction path, and therefore will be executed by the same pump.
477 Any code which dispatches a codebag {\it must} ensure that the
478 instructions within a fiber reach their pump in the order in which
479 they appear in the fiber. This is known as the {\it instruction
480 sequence guarantee}. Typically the software dispatching a codebag
481 will take advantage of the source sequence guarantee in order to
482 fulfill the instruction sequence guarantee.
486 Code bags may overlap in memory. If the assembler determines that two
487 code bags share a set of fibers, it may feel free to re-order those
488 fibers in such a way that the code bags overlap, sharing memory words
489 for these fibers. The codebag descriptors for the two bags will then
490 have different addresses and/or sizes, but the ranges which they
491 encompass will overlap.
493 \subsection{Dispatching Code Bags}
495 A codebag is ``dispatched'' by causing its constituent words to
496 arrive at the data latch of some output port, and then causing that
497 output port's pump to execute a {\tt send} (not {\tt sendto})
498 instruction, which is encoded as {\tt Do=1,To=1}. Because
499 instructions are encoded in such a way that their {\tt Instruction
500 Path} appears in the upper 11 bits, the {\tt send} mechanism will
501 correctly route the instruction to the pump which ought to execute it.
503 Currently the {\tt inCBD} port of the {\tt Memory} ship is ideally
504 suited for this purpose, although a similar effect can easily be
505 achieved using the {\tt BitFifo} ship in conjunction with the {\tt
506 Memory} ship. The only disadvantage to this arrangement is that the
507 {\tt out} port must execute {\tt send} a certain number of times, and
508 that number of times is determined by the {\tt size} field of the
509 codebag descriptor. Unfortunately this value is not known at compile
510 time, so it cannot be encoded as a repeat count on the instruction at
511 the {\tt out} port. The typical solution is to place a standing ({\tt
512 count=$\infty$}) {\tt send} instruction at the {\tt out} port of one
513 {\tt Memory} ship, and reserve that ship exclusively for dispatching
516 Note that instructions are encoded relative to a particular dispatch
517 output port. That is, the {\tt Instruction Path} field of the
518 instruction specifies a path to the desired execution pump -- but this
519 path is also specified with respect to some source. Therefore the
520 instruction can only be correctly dispatched from that particular
521 source. However, it is a fairly simple matter two write a software
522 program which can ``relocate'' a codebag by replacing the {\tt
523 Instruction Path} fields as needed.
525 \subsection{Tokens and Data Items}
527 When a data item is sent to a destination which expects a token, such
528 as an output port destination, the effect will be the same as if a token
529 had been sent to that destination.
531 When a token is sent to a destination which expects a data item, such
532 as a pump destination or an input port destination, the effect will be
533 the same as if a data item {\it having an undefined value} were sent
534 to that destination. In other words, the programmer has no completely
535 reliable mechanism for distinguishing between tokens and data items.
538 \section{Future Directions}
540 \subsection{The Role of the Count Field}
542 Looking back on the design of the pump, several things are now
543 apparent which were not initially. In particular, it seems that it
544 could be useful to separate {\it loading the count register} from
545 other types of instructions. This would have a few advantages:
548 \item The size of the count field would not be a consideration in the
549 ``instruction budget'' of normal execution instructions
551 \item It would be possible to have finitely-repeating,
552 infinitely-requeueing instructions. Various implementation
553 details\footnote{in many implementations, the count field of an
554 instruction is destroyed as it counts down; another register and
555 mux would be needed to ``save'' this value so it could be
556 restored prior to being requeued} make it awkward to support
557 this unless the count register can be manipulated independently
558 of instruction execution.
561 With this in mind, we can refactor the ``essence of the pump'' into
562 the following actions, each of which may or may not be performed by a
563 particular instruction:
566 \item await and acknowledge a token
567 \item await and acknowledge a datum
568 \item load the awaited datum into the path latch (top 11 bits)
569 \item load the awaited datum into the data latch
570 \item load a literal into the data latch
571 \item load a literal into the path latch
572 \item send the value in the data latch along the path in the path latch
573 \item send a token along the path in the path latch
574 \item set the count register to a literal value
575 \item decrement the count register
576 \item requeue this instruction if {\tt count>0}
577 \item requeue this instruction unconditionally
578 \item repeat this instruction if {\tt count>0}
579 \item repeat this instruction unconditionally
582 The crucial change here is the decoupling of the act of {\it loading
583 the count register} from the act of {\it loading the next
584 instruction}. It also separates the act of loading the ``path
585 latch'' from the act of actually performing the transmission. This
586 latter feature makes it possible to load the data and destination
587 latches from two distinct data items, allowing ships to create,
588 handle, and consume {\it packets} in the form of a pair of data items.
590 At this point, the remaining decisions are simply a question of
591 instruction bit budgeting. Currently we have a separate instruction
592 form for literals, so that the literal can use bits which are not
593 otherwise relevant to literal instructions (for example, {\tt Di}).
594 Adding an additional instruction form for loading the count register
595 would have similar advantages.
598 \subsection{For Just a Little Bit More...}
600 Compared to early revisions of the Fleet architecture, the valves of
601 FleetTwo are substantially more powerful. One might ask, why not add
602 additional power to the valves, perhaps up to the point of making them
603 tiny von Neumann machines in their own right? Indeed, doing so would
604 risk reliving the {\it Wheel of Reincarnation}
605 \cite{WheelOfReincarnation}.
607 I believe, however, that there is a fundamental distinction to be
608 made between devices whose control flow {\it can branch} and those
609 whose control flow {\it cannot branch}, and that this distinction is
610 founded in the geometry of VLSI, and indeed, Euclidean space.
611 Ultimately, the instruction stream for any component of a processor
612 must exist in some physical incarnation -- some geometric extent. If
613 the control flow of a program is linear, with no branching, then the
614 instruction stream has an optimal spatial mapping which is
615 efficient: simply lay it out as a FIFO.
617 By contrast, however, a program whose instruction stream has
618 unrestricted branching is logically a {\it tree} structure.
619 Unfortunately, however, general trees have no efficient mapping onto
620 two-dimensional surfaces. By this, I mean that in every possible
621 mapping, there exists some pair of {\it logically adjacent}
622 instructions whose {\it physical distance} is proportional to the size
623 of the entire program, rather than some constant factor irrespective
624 of the program size. Therefore I conclude that the spatial mapping of
625 a program with unrestricted branching is in an entirely different
626 class than that of programs with linear control flow, and I choose this
627 distinction as the basis for limiting the valves to linear control
628 flow sequences. A similar argument can be made for looping constructs
629 in which two or more loops may be nested within a single parent loop.
631 With this distinction in mind, it would not be objectionable for
632 future incarnations of Fleet to have {\it predicated} instructions at
633 the valves. This might take the form of:
636 \item Instructions to unconditionally set some ``state flags''
637 \item Instructions to conditionally set some ``state flags'' based on the contents of the data latch
638 \item Bits which cause an instruction to be ignored or retired based on the state flags
641 I feel that this addition would not risk falling down the slippery
642 slope -- instruction streams with predicated instructions (but not
643 unrestricted branching or nested looping) preserve the crucial
644 property that they have an efficient spatial mapping. Any steps
645 beyond this, however, cross a fairly fundamental line.
648 \addcontentsline{toc}{section}{Ship Descriptions}