1 \documentclass[10pt]{article}
6 \usepackage{bytefield1}
15 \bibliographystyle{alpha}
16 \pagestyle{fancyplain}
18 \definecolor{light}{gray}{0.7}
20 \newcommand{\footnoteremember}[2]{
23 \setcounter{#1}{\value{footnote}}
24 } \newcommand{\footnoterecall}[1]{
25 \footnotemark[\value{#1}]
33 %\oddsidemargin 0.25in
34 %\evensidemargin 0.25in
36 \def\to{\ $\rightarrow$\ }
46 \title{\vspace{-1cm}The FleetTwo Dock}
59 & added ``if its predicate is true'' to repeat count \\
62 & renamed loop+repeat to outer+inner (not in red) \\
63 & renamed {\tt Z} flag to {\tt L} flag (not in red) \\
64 & rewrote ``inner and outer loops'' section \\
65 & updated all diagrams \\
68 & Moved address bits to the LSB-side of a 37-bit instruction \\
69 & Added {\it micro-instruction} and {\it composite instruction} terms \\
70 & Removed the {\tt DL} field, added {\tt decrement} mode to {\tt loop} \\
71 & Created the {\tt Hold} field \\
72 & Changed how ReLooping works \\
73 & Removed {\tt clog}, {\tt unclog}, {\tt interrupt}, and {\tt massacre} \\
80 \epsfig{file=overview,width=1.5in}
81 \epsfig{file=indock,width=3in}
86 \section{Overview of Fleet}
88 A Fleet processor consists of a {\it switch fabric} with several
89 functional units called {\it ships} connected to it. At each
90 connection between a ship and the switch fabric lies a programmable
91 element known as the {\it dock}.
93 A {\it path} specifies a route through the switch fabric from a
94 particular {\it source} to a particular {\it destination}. The
95 combination of a path and a single word {\it payload} is called a {\it packet}. The
96 switch fabric carries packets from their sources to their
97 destinations. Each dock has two destinations: one for {\it
98 instructions} and one for {\it data}. A Fleet is programmed by
99 depositing packets into the switch fabric; these packets' paths lead
100 them to the instruction destinations of the docks.
102 When a packet arrives at the instruction destination of a dock, it is
103 enqueued for execution. Before the instruction executes, it may cause
104 the dock to wait for a packet to arrive at the dock's data destination
105 or for a value to be presented by the ship. It may present a data
106 value to the ship or transmit it for transmission to some other
109 When an instruction sends a packet into the switch fabric, it may
110 specify that the payload of the packet is irrelevant. Such packets
111 are known as {\it tokens}, and consume less energy than data packets.
112 From a programmer's perspective, a token packet is indistinguishable
113 from a data packet with a unknown payload.
116 \epsfig{file=overview,width=3in}\\
117 {\it Overview of a Fleet processor; gray shading represents a
118 packet-switched network fabric; blue lines carry data, red lines
125 \section{The FleetTwo Pump}
127 The diagram below represents a {\it programmer's} conceptual view of
128 the interface between ships and the switch fabric. Actual
129 implementation circuitry may differ substantially. Sources and
130 destinations that can send and receive only tokens -- not data items
131 -- are drawn as dashed lines.
134 \epsfig{file=indock,width=3.5in}\\
135 {\it an ``input'' dock}
137 \epsfig{file=outdock,width=3.5in}\\
138 {\it an ``output'' dock}
141 The term {\it port} refers to an interface to the ship, the {\it
142 dock} connecting it to the switch fabric, and the corresponding
143 sources and destinations on the switch fabric.
145 Each dock consists of a {\it data latch}, which is as wide as a single
146 machine word and a {\it pump}, which is a circular fifo of
147 instruction-width latches. The values in the pump control the data
150 Note that the pump in each dock has a destination of its own; this is
151 the {\it instruction destination} mentioned in the previous section.
152 Note that unlike all other destinations, there is no buffering fifo
153 guarding this one. The size of these fifos are exposed to the
154 software programmer so he can avoid deadlock.
157 \section{Instructions}
159 In order to cause an instruction to execute, the programmer must first
160 cause that instruction word to arrive in the data latch of some output
161 dock. For example, this might be the ``data read'' output dock of the
162 memory access ship or the output of a fifo ship. Once an instruction
163 has arrived at this output dock, it is {\it dispatched} by sending it
164 to the {\it instruction port} of the dock at which it is to execute.
166 Each instruction is 26 bits long, which makes it possible for an
167 instruction and an 11-bit path to fit in a single word of memory.
168 This path is the path from the {\it dispatching} dock to the {\it
171 \setlength{\bitwidth}{3.5mm}
173 \begin{bytefield}{37}
174 \bitheader[b]{0,10,11,36}\\
175 \bitbox{26}{instruction}
176 \bitbox{11}{dispatch path}
179 {\bf Note:} the instruction encodings below are simply ``something to
180 shoot at'' and a sanity check to make sure we haven't overrun our bit
181 budget. The final instruction encodings will probably be
184 All instruction words have the following format:
186 \setlength{\bitwidth}{3.5mm}
188 \begin{bytefield}{37}
189 \bitheader[b]{0,10,11,36}\\
196 \bitbox{11}{dispatch path}
200 Each instruction word is called a {\it micro instruction}.
201 Collections of one or more micro instruction are known as {\it
202 composite instructions}.
204 The {\tt A} bit stands for {\tt Armor}\footnote{this is to be
205 pronounced with a Boston accent (``AAHH-mir'')}.
206 The {\tt OL} bit indicates whether or not this instruction is part of
207 an outer loop. Both of the preceding bits are explained in the next section.
211 The abbreviation {\tt P} stands for {\it predicate}; this is a two-bit
212 code that indicates if the instruction should be executed or ignored.
217 \subsection{Life Cycle of an Instruction}
219 The diagram below shows an input dock for purposes of illustration
220 (behavior at an output dock is identical).
223 \epsfig{file=indock,width=3in}\\
227 Note the circle on the path between ``instr horn'' and ``instr fifo'';
228 this is known as ``the hatch''. The hatch has two states: sealed and
229 unsealed. When the machine powers up, the hatch is unsealed; it is
230 sealed by the {\tt tail} instruction and unsealed as described below.
232 When a non-{\tt torpedo} instruction arrives at the instruction horn,
233 it waits there until the hatch is in the unsealed state. The
234 instruction then enters the instruction fifo.
236 When an instruction emerges from the instruction fifo, it arrives at
237 the ``on deck'' stage and starts two processes {\it
243 If the instruction has the {\tt OL} bit set and the value of {\tt OC}
244 is nonzero, this process will wait for the hatch to be {\it sealed}
245 and then enqueue a duplicate copy of the instruction into the instruction fifo.
249 If the value of {\tt OC} is zero and the instruction at on-deck is
250 {\it not} an {\tt setOuter} instruction whose {\tt OL} bit is cleared,
251 this process will unseal the hatch, set the {\it inner} loop counter
252 to zero, and terminate.
254 Otherwise, the instruction will execute one or more times, as
255 determined by the flags, predicate, and inner loop counter (see
256 below). If the instruction's {\tt A}rmor bit is not
257 set\footnote{note: we need to say something about only {\tt send}
258 instructions being {\tt torpedo}able}, each execution attempt will
259 be arbitrated against the arrival of a {\tt torpedo} instruction at
260 the instruction horn. If the {\tt torpedo} wins, the {\it outer} loop
261 counter is set to zero and this process terminates immediately.
264 When both processes have completed, the on deck stage is vacated and
265 another instruction may enter it.
267 Note that when a {\tt torpedo} arrives at the instruction horn it will
268 wait there until on deck is occupied by an instruction whose {\tt
269 A}rmor bit is {\it not set}, but it does {\it not} wait for the
270 hatch to be unsealed.
274 \subsection{Inner and Outer Loops}
278 Using the mechanisms described above, a programmer can perform two
279 types of loops: {\it inner} loops of only one instruction and {\it
280 outer} loops of multiple instructions. Inner loops may be nested
281 within an outer loop, but no other nesting of loops is allowed. The
282 paths used by inner loops and outer loops are shown below:
285 \begin{minipage}{2in}
287 \epsfig{file=inner-loop,width=2in}\\
288 {\it inner loop (in red)}
291 \begin{minipage}{2in}
293 \epsfig{file=outer-loop,width=2in}\\
294 {\it outer loop (in red)}
299 Each type of loop has a counter associated with it: the {\tt IC}
300 counter for inner loops and the {\tt OC} counter for outer loops. The
301 inner loop counter applies only to {\tt send} instructions; all other
302 instructions ignore the inner loop counter. When a {\tt send}
303 instruction reaches the on deck position,
305 if its predicate is true
308 will execute at least once; the number of times it executes after that
309 is determined by the inner loop counter.
311 The outer loop counter applies to all instructions {\it except} the
312 instruction {\tt setOuter} with {\tt OL=0}, because such instructions
313 are needed to reset the outer loop counter after it becomes zero.
314 However, {\tt setOuter} with {\tt OL} set to {\it one} is useful for
315 resetting the loop counter in the middle of the execution of a loop.
321 The pump has four flags: {\tt A}, {\tt B}, {\tt L}, and {\tt S}. Of
322 these four, only the first two may be modified directly by
326 \item The {\tt A} and {\tt B} flags are general-purpose flags which
327 may be set and cleared by the programmer.
331 The {\tt L} flag, known as the {\it last} flag, is set whenever
332 the value in the outer counter ({\tt OC}) is one,
335 that the dock is in the midst of the last iteration of an
336 outer loop. This flag can be used to perform certain
337 operations (such as sending a completion token) only on the last
338 iteration of an outer loop.
340 \item The {\tt S} flag, known as the {\it summary} flag. Its value is
341 determined by the ship, but unless stated otherwise, it should
342 be assumed that whenever the 37th bit of the data ({\tt D})
343 latch is loaded, that same bit is also loaded into the {\tt S}
344 flag. This lets the ship make decisions based on whether or not
345 the top bit of the data latch is set; if two's complement
346 numbers are in use, this will indicate whether or not the
347 latched value is negative.
350 Many instruction fields are specified as two-bit {\it predicates}.
351 These fields contain one of four values, indicating if an action
352 should be taken unconditionally or conditionally on one of the {\tt A}
356 \item {\tt 00:} if {\tt A} is set
357 \item {\tt 10:} if {\tt B} is set
358 \item {\tt 01:} if {\tt L} is set ({\tt OC=1})
359 \item {\tt 11:} always
364 \subsection{{\tt send} (variants: {\tt sendto}, {\tt dispatch})}
366 \setlength{\bitwidth}{5mm}
368 \begin{bytefield}{26}
369 \bitheader[b]{12-16,19,21}\\
387 %\begin{bytefield}{26}
388 % \bitheader[b]{12-18}\\
389 % \bitbox[]{8}{\raggedleft Input Dock:}
396 %\begin{bytefield}{26}
397 % \bitheader[b]{12-18}\\
398 % \bitbox[]{8}{\raggedleft Output Dock:}
405 \begin{bytefield}{26}
406 \bitheader[b]{0,10,11}\\
407 \bitbox[1]{13}{\raggedleft {\tt sendto} ({\tt LiteralPath\to Path})}
410 \bitbox{11}{\tt LiteralPath}
413 \begin{bytefield}{26}
414 \bitheader[b]{10,11}\\
415 \bitbox[1]{13}{\raggedleft {\tt dispatch} ({\tt DP[37:27]\to Path})\ \ }
424 \begin{bytefield}{26}
425 \bitheader[b]{10,11}\\
426 \bitbox[1]{13}{\raggedleft {\tt send} ({\tt Path} unchanged):}
436 \item {\tt Ti} - Token Input: wait for the token predecessor to be full and drain it.
437 \item {\tt Di} - Data Input: wait for the data predecessor to be full and drain it.
438 \item {\tt Dc} - Data Capture: pulse the data latch.
439 \item {\tt Do} - Data Output: fill the data successor.
440 \item {\tt To} - Token Output: fill the token successor.
443 The data successor and token successor must both be empty in order for
444 a {\tt send} instruction to attempt execution.
446 The inner loop counter can hold a number {\tt 0..MAX} or a special
447 value $\infty$. If {\tt IC} is nonzero after execution of a {\tt
448 send} instruction, the instruction will execute again, and {\tt IC}
449 will be latched with {\tt (IC==$\infty$?$\infty$:max(IC-1, 0))}. When
450 the inner loop counter reaches zero, the instruction ceases executing.
454 \subsection{{\tt data}, {\tt datahi}, {\tt datalo}}
456 These instructions load part or all of the data latch ({\tt D}).
458 {\tt datahi: Literal[18:1]\to D[37:20]} (and {\tt Literal[18]\to S})
460 \setlength{\bitwidth}{5mm}
462 \begin{bytefield}{26}
463 \bitheader[b]{0,18,19,21}\\
477 {\tt datalo: Literal[19:1]\to D[19:1]}
479 \setlength{\bitwidth}{5mm}
481 \begin{bytefield}{26}
482 \bitheader[b]{0,18,19,21}\\
495 \setlength{\bitwidth}{5mm}
497 \begin{bytefield}{26}
498 \bitheader[b]{0,18,19,21}\\
510 \begin{tabular}{|r|c|c|c|}\hline
511 sel & D[37:20] & D[19:1] \\\hline
512 00 & Literal[18:1] & all 0 \\
513 01 & Literal[18:1] & all 1 \\
514 10 & all 0 & Literal[19:1] \\
515 11 & all 1 & Literal[19:1] \\
522 \subsection{{\tt flags}}
524 \setlength{\bitwidth}{5mm}
526 \begin{bytefield}{26}
527 \bitheader[b]{0,7,8,15,16-19,21}\\
541 The {\tt P} field is a predicate; if it does not hold, the instruction
542 is ignored. Otherwise the two flags ({\tt A} and {\tt B}) are updated
543 according to the {\tt nextA} and {\tt nextB} fields; each specifies
544 the new value as the logical {\tt OR} of zero or more inputs:
550 \bitbox{1}{${\text{\tt A}}$}
551 \bitbox{1}{$\overline{\text{\tt A}}$}
552 \bitbox{1}{${\text{\tt B}}$}
553 \bitbox{1}{$\overline{\text{\tt B}}$}
554 \bitbox{1}{${\text{\tt S}}$}
555 \bitbox{1}{$\overline{\text{\tt S}}$}
556 \bitbox{1}{${\text{\tt L}}$}
557 \bitbox{1}{$\overline{\text{\tt L}}$}
561 Each bit corresponds to one possible input; all inputs whose bits are
562 set are {\tt OR}ed together, and the resulting value is assigned to
563 the flag. Note that if none of the bits are set, the value assigned
564 is zero. Note also that it is possible to produce a {\tt 1} by {\tt
565 OR}ing any flag with its complement.
570 \subsection{{\tt setInner}}
572 This instruction loads the inner loop counter with either a literal
573 number, the special value $\infty$, or the contents of the {\tt data}
576 \setlength{\bitwidth}{5mm}
578 \begin{bytefield}{26}
579 \bitheader[b]{16-19,21}\\
594 \begin{bytefield}{26}
595 \bitbox[r]{18}{\raggedleft from data latch:\hspace{0.2cm}\ }
602 \begin{bytefield}{26}
603 \bitheader[b]{0,5,6,7}\\
604 \bitbox[r]{18}{\raggedleft from literal:\hspace{0.2cm}\ }
606 \bitbox{6}{\tt Literal}
609 \begin{bytefield}{26}
610 \bitheader[b]{0,5,6,7}\\
611 \bitbox[r]{18}{\raggedleft with $\infty$\ \ }
619 \subsection{{\tt setOuter}}
621 This instruction loads the outer loop counter {\tt OC} with either
622 {\tt max(0,OC-1)}, a literal or the contents of the {\tt data}
625 \setlength{\bitwidth}{5mm}
627 \begin{bytefield}{26}
628 \bitheader[b]{16-19,21,24}\\
644 \begin{bytefield}{26}
645 \bitbox[r]{19}{\raggedleft {\tt max(0,OC-1)}:\hspace{0.2cm}\ }
653 \begin{bytefield}{26}
654 \bitbox[r]{19}{\raggedleft from data latch:\hspace{0.2cm}\ }
661 \begin{bytefield}{26}
662 \bitheader[b]{0,5,6}\\
663 \bitbox[r]{19}{\raggedleft from literal:\hspace{0.2cm}\ }
665 \bitbox{6}{\tt Literal}
669 \subsection{{\tt takeOuterLoopCounter}}
671 \setlength{\bitwidth}{5mm}
673 \begin{bytefield}{26}
674 \bitheader[b]{16-19,21}\\
688 This instruction copies the value in the outer loop counter {\tt OC}
689 into the least significant bits of the data latch and leaves all other
690 bits of the data latch unchanged.
692 \subsection{{\tt takeInnerLoopCounter}}
694 \setlength{\bitwidth}{5mm}
696 \begin{bytefield}{26}
697 \bitheader[b]{16-19,21}\\
711 This instruction copies the value in the inner loop counter {\tt IC}
712 into the least significant bits of the data latch and leaves all other
713 bits of the data latch unchanged.
716 \subsection{{\tt torpedo}}
718 \setlength{\bitwidth}{5mm}
720 \begin{bytefield}{26}
721 \bitheader[b]{0,5,16-19,21}\\
733 When a {\tt torpedo} instruction reaches the instruction horn, it will
734 wait there until an instruction is on deck whose {\tt A}rmor bit is
735 not set. The {\tt torpedo} will then cause ``Process \#2'' of the on
736 deck instruction to terminate and will set the outer loop counter to zero.
738 \subsection{{\tt tail}}
740 \setlength{\bitwidth}{5mm}
742 \begin{bytefield}{26}
743 \bitheader[b]{0,5,16-19,21}\\
754 When a {\tt tail} instruction reaches {\tt IH}, it seals the hatch.
755 The {\tt tail} instruction does not enter the instruction fifo.
761 %\subsection{{\tt interrupt}}
763 %\setlength{\bitwidth}{5mm}
765 %\begin{bytefield}{26}
766 % \bitheader[b]{0,5,16-19,21}\\
777 %When an {\tt interrupt} instruction reaches {\tt IH}, it will wait
778 %there for the {\tt OD} stage to be full with an instruction that has
779 %the {\tt IM} bit set. When this occurs, the instruction at {\tt OD}
780 %{\it will not execute}, but {\it may reloop} if the conditions for
782 %\footnote{The ability to interrupt an instruction yet have it reloop is very
783 %useful for processing chunks of data with a fixed size header and/or
784 %footer and a variable length body.}
787 %\subsection{{\tt massacre}}
789 %\setlength{\bitwidth}{5mm}
791 %\begin{bytefield}{26}
792 % \bitheader[b]{16-19,21}\\
804 %When a {\tt massacre} instruction reaches {\tt IH}, it will wait there
805 %for the {\tt OD} stage to be full with an instruction that has the
806 %{\tt IM} bit set. When this occurs, all instructions in the
807 %instruction fifo (including {\tt OD}) are retired.
809 %\subsection{{\tt clog}}
811 %\setlength{\bitwidth}{5mm}
813 %\begin{bytefield}{26}
814 % \bitheader[b]{16-19,21}\\
826 %When a {\tt clog} instruction reaches {\tt OD}, it remains there and
827 %no more instructions will be executed until an {\tt unclog} is
830 %\subsection{{\tt unclog}}
832 %\setlength{\bitwidth}{5mm}
834 %\begin{bytefield}{26}
835 % \bitheader[b]{16-19,21}\\
841 % \bitbox[lrtb]{2}{11}
847 %When an {\tt unclog} instruction reaches {\tt IH}, it will wait there
848 %until a {\tt clog} instruction is at {\tt OD}. When this occurs, both
849 %instructions retire.
851 %Note that issuing an {\tt unclog} instruction to a dock which is not
852 %clogged and whose instruction fifo contains no {\tt clog} instructions
853 %will cause the dock to deadlock.
858 \epsfig{file=overview,height=5in,angle=90}
861 \subsection*{Input Dock}
862 \epsfig{file=indock,width=7in,angle=90}
865 \subsection*{Output Dock}
866 \epsfig{file=outdock,width=6.5in,angle=90}
870 %\epsfig{file=ports,height=5in,angle=90}
873 %\epsfig{file=best,height=5in,angle=90}