1 \documentclass[10pt]{article}
6 \usepackage{bytefield1}
18 \bibliographystyle{alpha}
19 \pagestyle{fancyplain}
21 \definecolor{light}{gray}{0.7}
23 \newcommand{\footnoteremember}[2]{
26 \setcounter{#1}{\value{footnote}}
27 } \newcommand{\footnoterecall}[1]{
28 \footnotemark[\value{#1}]
36 %\oddsidemargin 0.25in
37 %\evensidemargin 0.25in
39 \def\to{\ $\rightarrow$\ }
49 \title{\vspace{-1cm}The FleetTwo Dock}
62 & noted that {\tt setFlags} can be used as {\tt nop} \\
64 & removed the {\tt L} flag (epilogues can now do this) \\
65 & removed {\tt take\{Inner|Outer\}LoopCounter} instructions \\
66 & renamed {\tt data} instruction to {\tt literal} \\
67 & renamed {\tt send} instruction to {\tt move} \\
69 & added ``if its predicate is true'' to repeat count \\
70 & added note that red wires do not contact ships \\
71 & changed name of {\tt flags} instruction to {\tt setFlags} \\
72 & removed black dot from diagrams \\
73 & changed {\tt OL} (Outer Loop participant) to {\tt OS} (One Shot) and inverted polarity \\
74 & indicated that the death of the {\tt tail} instruction is what causes the hatch to be unsealed \\
75 & indicated that only {\tt send} instructions which wait for data are torpedoable \\
76 & added section ``Torpedo Details'' \\
77 & removed {\tt torpedo} instruction \\
80 & renamed loop+repeat to outer+inner (not in red) \\
81 & renamed {\tt Z} flag to {\tt L} flag (not in red) \\
82 & rewrote ``inner and outer loops'' section \\
83 & updated all diagrams \\
86 & Moved address bits to the LSB-side of a 37-bit instruction \\
87 & Added {\it micro-instruction} and {\it composite instruction} terms \\
88 & Removed the {\tt DL} field, added {\tt decrement} mode to {\tt loop} \\
89 & Created the {\tt Hold} field \\
90 & Changed how ReLooping works \\
91 & Removed {\tt clog}, {\tt unclog}, {\tt interrupt}, and {\tt massacre} \\
98 \epsfig{file=overview,width=1.5in}
99 \epsfig{file=indock,width=3in}
104 \section{Overview of Fleet}
106 A Fleet processor consists of a {\it switch fabric} with several
107 functional units called {\it ships} connected to it. At each
108 connection between a ship and the switch fabric lies a programmable
109 element known as the {\it dock}.
111 A {\it path} specifies a route through the switch fabric from a
112 particular {\it source} to a particular {\it destination}. The
113 combination of a path and a single word {\it payload} is called a {\it packet}. The
114 switch fabric carries packets from their sources to their
115 destinations. Each dock has two destinations: one for {\it
116 instructions} and one for {\it data}. A Fleet is programmed by
117 depositing packets into the switch fabric; these packets' paths lead
118 them to the instruction destinations of the docks.
120 When a packet arrives at the instruction destination of a dock, it is
121 enqueued for execution. Before the instruction executes, it may cause
122 the dock to wait for a packet to arrive at the dock's data destination
or for a value to be presented by the ship. It may present a data
value to the ship or send it into the switch fabric for transmission to some other
127 When an instruction sends a packet into the switch fabric, it may
128 specify that the payload of the packet is irrelevant. Such packets
129 are known as {\it tokens}, and consume less energy than data packets.
130 From a programmer's perspective, a token packet is indistinguishable
from a data packet with an unknown payload.
133 In the diagram below, the red wires carry instructions and the blue
134 wires carry data; the switch fabric (gray area) carries both. Notice
that the red (instruction) wires do not contact the ships. This is an
advantage: ships can be designed without regard to the
instructions used to program their docks.
140 \epsfig{file=overview,width=2.5in}\\
141 {\it Overview of a Fleet processor; gray shading represents a
142 packet-switched network fabric; blue lines carry data, red lines
149 \section{The FleetTwo Pump}
151 The diagram below represents a {\it programmer's} conceptual view of
152 the interface between ships and the switch fabric. Actual
153 implementation circuitry may differ substantially. Sources and
154 destinations that can send and receive only tokens -- not data items
155 -- are drawn as dashed lines.
158 \epsfig{file=indock,width=3.5in}\\
159 {\it an ``input'' dock}
161 \epsfig{file=outdock,width=3.5in}\\
162 {\it an ``output'' dock}
165 The term {\it port} refers to an interface to the ship, the {\it
166 dock} connecting it to the switch fabric, and the corresponding
167 sources and destinations on the switch fabric.
Each dock consists of a {\it data latch}, which is as wide as a single
machine word, and a {\it pump}, which is a circular fifo of
instruction-width latches. The values in the pump control the data
174 Note that the pump in each dock has a destination of its own; this is
175 the {\it instruction destination} mentioned in the previous section.
Note that, unlike all other destinations, there is no buffering fifo
guarding this one. The sizes of these fifos are exposed to the
software programmer so that deadlock can be avoided.
181 \section{Instructions}
183 In order to cause an instruction to execute, the programmer must first
184 cause that instruction word to arrive in the data latch of some output
185 dock. For example, this might be the ``data read'' output dock of the
186 memory access ship or the output of a fifo ship. Once an instruction
187 has arrived at this output dock, it is {\it dispatched} by sending it
188 to the {\it instruction port} of the dock at which it is to execute.
190 Each instruction is 26 bits long, which makes it possible for an
191 instruction and an 11-bit path to fit in a single word of memory.
192 This path is the path from the {\it dispatching} dock to the {\it
195 \setlength{\bitwidth}{3.5mm}
197 \begin{bytefield}{37}
198 \bitheader[b]{0,10,11,36}\\
199 \bitbox{26}{instruction}
200 \bitbox{11}{dispatch path}
203 {\bf Note:} the instruction encodings below are simply ``something to
204 shoot at'' and a sanity check to make sure we haven't overrun our bit
205 budget. The final instruction encodings will probably be
208 All instruction words have the following format:
210 \setlength{\bitwidth}{3.5mm}
212 \begin{bytefield}{37}
213 \bitheader[b]{0,10,11,36}\\
220 \bitbox{11}{dispatch path}
Each instruction word is called a {\it micro-instruction}.
Collections of one or more micro-instructions are known as {\it
composite instructions}.
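
As a sanity check on the bit budget, the word layout above can be
written arithmetically. This is only a sketch: the symbols $w$, $i$
and $p$ are introduced here for illustration, and the placement of the
dispatch path in bits 10..0 is read off the bit header of the diagram
rather than taken from a final encoding.

\[
  w = i \cdot 2^{11} + p, \qquad 0 \le i < 2^{26}, \quad 0 \le p < 2^{11}
\]

Here $i$ is the 26-bit instruction and $p$ is the 11-bit dispatch
path, so $w < 2^{26} \cdot 2^{11} = 2^{37}$ and the pair fits in a
single 37-bit word.
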
228 The {\tt I} bit stands for {\tt Interruptible}. The {\tt OS} (``One
229 Shot'') bit indicates whether or not this instruction is part of an
230 outer loop. Both of the preceding bits are explained in the next
235 The abbreviation {\tt P} stands for {\it predicate}; this is a two-bit
code that indicates whether the instruction should be executed or ignored.
241 \subsection{Life Cycle of an Instruction}
243 The diagram below shows an input dock for purposes of illustration
244 (behavior at an output dock is identical).
247 \epsfig{file=indock,width=3in}\\
251 Note the circle on the path between ``instr horn'' and ``instr fifo'';
252 this is known as ``the hatch''. The hatch has two states: sealed and
253 unsealed. When the machine powers up, the hatch is unsealed; it is
254 sealed by the {\tt tail} instruction and unsealed whenever the outer
255 loop counter is set to zero (for any reason\footnote{this
256 includes {\tt OLC} being decremented to zero, a {\tt setOuter} with
257 a literal field of zero, a {\tt setOuter} which copies a zero from
258 the data register to {\tt OLC}, or the occurrence of a
261 When an instruction arrives at the instruction horn, it waits there
262 until the hatch is in the unsealed state. The instruction then enters
263 the instruction fifo. When an instruction emerges from the
264 instruction fifo, it arrives at the ``on deck'' stage, where it may
267 \subsubsection{Inner and Outer Loops}
269 A programmer can perform two types of loops: {\it inner} loops of only
270 one micro-instruction and {\it outer} loops of multiple
271 micro-instructions. Inner loops may be nested within an outer loop,
272 but no other nesting of loops is allowed. The paths used by inner
273 loops and outer loops are shown below:
276 \begin{minipage}{2in}
278 \epsfig{file=inner-loop,width=2in}\\
279 {\it inner loop (in red)}
282 \begin{minipage}{2in}
284 \epsfig{file=outer-loop,width=2in}\\
285 {\it outer loop (in red)}
290 Each type of loop has a counter associated with it: the {\tt ILC}
291 counter for inner loops and the {\tt OLC} counter for outer loops.
292 The inner loop counter applies only to certain ``inner-looping''
293 instructions (see the table below for details). When such an
instruction reaches On Deck and its predicate is true, it executes
{\tt ILC+1} times and leaves {\tt ILC=0} when it finishes.
Non-inner-looping instructions and instructions whose
predicate is false do not decrement {\tt ILC}.
299 The outer loop counter applies to all instructions {\it except} the
300 instruction {\tt setOuter} with {\tt OS=1}, because such instructions
301 are needed to reset the outer loop counter after it becomes zero.
302 However, predicated {\tt setOuter} with {\tt OS=0} is useful for
303 resetting the loop counter in the middle of the execution of a loop.
305 \subsubsection{On Deck}
307 The table below lists the actions which may be taken when an
308 instruction arrives on deck:
311 \def\side#1{\begin{sideways}\parbox{15mm}{#1}\end{sideways}}
312 \begin{tabular}{|r|ccccc|cccccc|}\hline
313 %&\multicolumn{10}{c}{Predicate}&\\
314 %&\multicolumn{10}{c}{True}&\\\hline
315 &\multicolumn{5}{c}{Outer-Looping} &\multicolumn{5}{c}{One-Shot}&\\
316 &\multicolumn{5}{c}{{\tt (OS=0)}} &\multicolumn{5}{c}{{\tt (OS=1)}}&\\
318 &\side{{\tt literal}}
319 &\side{{\tt setFlags}}
320 &\side{{\tt setInner}}
321 &\side{{\tt setOuter}}
323 &\side{{\tt literal}}
324 &\side{{\tt setFlags}}
325 &\side{{\tt setInner}}
326 &\side{{\tt setOuter}}
329 Wait for hatch sealed & + & + & + & + & + & & & & & & \\
330 Fill IF0 w/ copy of self & + & + & + & + & + & & & & & & \\\hline
331 Request arbiter & P+$\star$ & & & & & P+$\star$ & & & & & \\
332 Potentially torpedoable & P+$\star$ & & & & & P+$\star$ & & & & & \\\hline
333 Execute & P+ & P+& P+& P+& P+ & ? & ? & ? & ? & P & \\
334 Inner-looping & P+ & & & & ? & P+ & & & & ? & \\
338 \begin{tabular}{|r|l|}\hline
+ & Only if {\tt OLC>0} (i.e.\ {\tt OLC} is positive) \\
340 P & Only if predicate is true \\
341 P+ & Only if predicate is true and {\tt OLC>0} \\
P+$\star$ & Only if predicate is true, {\tt OLC>0}, {\tt I=1}, and one of {\tt Ti}, {\tt Di}, {\tt Do} is true \\
343 ? & to discuss \\\hline
347 \subsubsection{Torpedo}
349 There is a small fifo (not shown) before the latch marked
350 ``Instruction Horn''; after the {\tt tail} instruction seals the
351 hatch, any subsequent instructions will queue up in this fifo until
352 the hatch is unsealed. This is typically used as storage for a ``loop
353 epilogue'' -- a sequence of instructions to be executed after a
354 torpedo arrives or the outer loop counter expires.
356 Each dock has a fourth connection to the switch fabric (not shown),
357 called its {\it torpedo destination}. Anything (even a token) sent to
358 this destination is treated as a torpedo. Note that because this is a
359 distinct destination, instructions or data queued up in the other
destination fifos will not prevent a torpedo from occurring.
362 When a data item or token arrives at the torpedo destination, it lies
363 there in wait until On Deck holds a potentially torpedoable
364 instruction (see previous table). Once this is the case, the torpedo
365 causes the inner and outer loop counters to be set to zero (and
therefore also unseals the hatch).\footnote{It is unspecified whether
the torpedoed instruction is requeued; this may or may not
occur, nondeterministically. It is the programmer's responsibility
to ensure that the program behaves the same whether this happens or
not. We think that this will not matter in most situations.}
377 The pump has three flags: {\tt A}, {\tt B}, and {\tt S}.
380 \item The {\tt A} and {\tt B} flags are general-purpose flags which
381 may be set and cleared by the programmer.
385 % The {\tt L} flag, known as the {\it last} flag, is set whenever
386 % the value in the outer counter ({\tt OLC}) is one,
389 % that the dock is in the midst of the last iteration of an
390 % outer loop. This flag can be used to perform certain
391 % operations (such as sending a completion token) only on the last
392 % iteration of an outer loop.
\item The {\tt S} flag is known as the {\it summary} flag. Its value is
395 determined by the ship, but unless stated otherwise, it should
396 be assumed that whenever the 37th bit of the data ({\tt D})
397 latch is loaded, that same bit is also loaded into the {\tt S}
398 flag. This lets the ship make decisions based on whether or not
399 the top bit of the data latch is set; if two's complement
400 numbers are in use, this will indicate whether or not the
401 latched value is negative.
404 Many instruction fields are specified as two-bit {\it predicates}.
These fields contain one of four values, indicating whether an action
406 should be taken unconditionally or conditionally on one of the {\tt A}
410 \item {\tt 00:} if {\tt A} is set
411 \item {\tt 10:} if {\tt B} is set
413 \item {\tt 11:} always
418 \section{Instructions}
420 Here is a list of the instructions supported by the dock:
423 \begin{tabular}{|l|}\hline
424 {\tt move} (variants: {\tt moveto}, {\tt dispatch}) \\
425 {\tt literal} (variants: {\tt literalhi}, {\tt literallo})\\
434 {\tt tail} {\it will probably become a bit on every instruction rather than
440 \subsection{{\tt move} (variants: {\tt moveto}, {\tt dispatch})}
442 \setlength{\bitwidth}{5mm}
444 \begin{bytefield}{26}
445 \bitheader[b]{12-16,19,21}\\
463 %\begin{bytefield}{26}
464 % \bitheader[b]{12-18}\\
465 % \bitbox[]{8}{\raggedleft Input Dock:}
472 %\begin{bytefield}{26}
473 % \bitheader[b]{12-18}\\
474 % \bitbox[]{8}{\raggedleft Output Dock:}
481 \begin{bytefield}{26}
482 \bitheader[b]{0,10,11}\\
483 \bitbox[1]{13}{\raggedleft {\tt moveto} ({\tt LiteralPath\to Path})}
486 \bitbox{11}{\tt LiteralPath}
489 \begin{bytefield}{26}
490 \bitheader[b]{10,11}\\
491 \bitbox[1]{13}{\raggedleft {\tt dispatch} ({\tt DP[37:27]\to Path})\ \ }
500 \begin{bytefield}{26}
501 \bitheader[b]{10,11}\\
502 \bitbox[1]{13}{\raggedleft {\tt move} ({\tt Path} unchanged):}
512 \item {\tt Ti} - Token Input: wait for the token predecessor to be full and drain it.
513 \item {\tt Di} - Data Input: wait for the data predecessor to be full and drain it.
514 \item {\tt Dc} - Data Capture: pulse the data latch.
515 \item {\tt Do} - Data Output: fill the data successor.
516 \item {\tt To} - Token Output: fill the token successor.
519 The data successor and token successor must both be empty in order for
520 a {\tt move} instruction to attempt execution.
522 The inner loop counter can hold a number {\tt 0..MAX} or a special
523 value $\infty$. If {\tt ILC} is nonzero after execution of a {\tt
524 move} instruction, the instruction will execute again, and {\tt ILC}
525 will be latched with {\tt (ILC==$\infty$?$\infty$:max(ILC-1, 0))}. When
526 the inner loop counter reaches zero, the instruction ceases executing.
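
The update rule can be restated as a recurrence and traced by hand.
This is a sketch; the subscript $k$ (the number of completed
executions) is notation introduced here and is not part of the
specification.

\[
  \mathtt{ILC}_{k+1} =
  \left\{
  \begin{array}{ll}
    \infty & \mbox{if } \mathtt{ILC}_k = \infty \\
    \max(\mathtt{ILC}_k - 1,\, 0) & \mbox{otherwise}
  \end{array}
  \right.
\]

For example, a {\tt move} that reaches On Deck with {\tt ILC=2} and a
true predicate sees the counter values $2 \to 1 \to 0$ and executes
three times in total (that is, {\tt ILC+1} times). With {\tt
ILC=}$\infty$ the counter never changes, so the instruction repeats
until something else sets the counter to zero, such as a torpedo if
the instruction is torpedoable.
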
530 \subsection{{\tt literal}, {\tt literalhi}, {\tt literallo}}
532 These instructions load part or all of the data latch ({\tt D}).
534 {\tt literalhi: Literal[18:1]\to D[37:20]} (and {\tt Literal[18]\to S})
536 \setlength{\bitwidth}{5mm}
538 \begin{bytefield}{26}
539 \bitheader[b]{0,18,19,21}\\
553 {\tt literallo: Literal[19:1]\to D[19:1]}
555 \setlength{\bitwidth}{5mm}
557 \begin{bytefield}{26}
558 \bitheader[b]{0,18,19,21}\\
571 \setlength{\bitwidth}{5mm}
573 \begin{bytefield}{26}
574 \bitheader[b]{0,18,19,21}\\
\begin{tabular}{|r|c|c|}\hline
587 sel & D[37:20] & D[19:1] \\\hline
588 00 & Literal[18:1] & all 0 \\
589 01 & Literal[18:1] & all 1 \\
590 10 & all 0 & Literal[19:1] \\
591 11 & all 1 & Literal[19:1] \\
598 \subsection{{\tt setFlags}}
600 \setlength{\bitwidth}{5mm}
602 \begin{bytefield}{26}
603 \bitheader[b]{0,7,8,15,16-19,21}\\
616 The {\tt P} field is a predicate; if it does not hold, the instruction
617 is ignored. Otherwise the flags are updated according to the {\tt
618 nextA}, {\tt nextB}, and {\tt nextS} fields; each specifies the new
619 value as the logical {\tt OR} of zero or more inputs:
625 \bitbox{1}{${\text{\tt A}}$}
626 \bitbox{1}{$\overline{\text{\tt A}}$}
627 \bitbox{1}{${\text{\tt B}}$}
628 \bitbox{1}{$\overline{\text{\tt B}}$}
629 \bitbox{1}{${\text{\tt S}}$}
630 \bitbox{1}{$\overline{\text{\tt S}}$}
634 Each bit corresponds to one possible input; all inputs whose bits are
635 set are {\tt OR}ed together, and the resulting value is assigned to
636 the flag. Note that if none of the bits are set, the value assigned
637 is zero. Note also that it is possible to produce a {\tt 1} by {\tt
638 OR}ing any flag with its complement.
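
As a worked illustration (the name $\mathtt{A}'$ for the new flag
value and the particular bit choices are hypothetical, chosen only for
this example), suppose the {\tt nextA} field has only the
$\overline{\mathtt{A}}$ and $\mathtt{B}$ bits set. The new value of
{\tt A} is then

\[
  \mathtt{A}' = \overline{\mathtt{A}} \vee \mathtt{B}
\]

so with $\mathtt{A}=1$ and $\mathtt{B}=0$ beforehand the flag is
cleared, and with $\mathtt{A}=0$ it is set regardless of $\mathtt{B}$.
Setting both the $\mathtt{A}$ and $\overline{\mathtt{A}}$ bits gives
$\mathtt{A}' = \mathtt{A} \vee \overline{\mathtt{A}} = 1$, and setting
only the $\mathtt{A}$ bit gives $\mathtt{A}' = \mathtt{A}$, which
leaves the flag unchanged.
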
640 Note that {\tt setFlags} can be used to create a {\tt nop} (no-op) by
641 setting each flag to itself.
647 \subsection{{\tt setInner}}
This instruction loads the inner loop counter with one of three
values: a literal number, the special value $\infty$, or the contents of the {\tt data}
653 \setlength{\bitwidth}{5mm}
655 \begin{bytefield}{26}
656 \bitheader[b]{16-19,21}\\
671 \begin{bytefield}{26}
672 \bitbox[r]{18}{\raggedleft from data latch:\hspace{0.2cm}\ }
679 \begin{bytefield}{26}
680 \bitheader[b]{0,5,6,7}\\
681 \bitbox[r]{18}{\raggedleft from literal:\hspace{0.2cm}\ }
683 \bitbox{6}{\tt Literal}
686 \begin{bytefield}{26}
687 \bitheader[b]{0,5,6,7}\\
688 \bitbox[r]{18}{\raggedleft with $\infty$\ \ }
696 \subsection{{\tt setOuter}}
This instruction loads the outer loop counter {\tt OLC} with one of three
values: {\tt max(0,OLC-1)}, a literal, or the contents of the {\tt data}
702 \setlength{\bitwidth}{5mm}
704 \begin{bytefield}{26}
705 \bitheader[b]{16-19,21,24}\\
721 \begin{bytefield}{26}
722 \bitbox[r]{19}{\raggedleft {\tt max(0,OLC-1)}:\hspace{0.2cm}\ }
730 \begin{bytefield}{26}
731 \bitbox[r]{19}{\raggedleft from data latch:\hspace{0.2cm}\ }
738 \begin{bytefield}{26}
739 \bitheader[b]{0,5,6}\\
740 \bitbox[r]{19}{\raggedleft from literal:\hspace{0.2cm}\ }
742 \bitbox{6}{\tt Literal}
746 %\subsection{{\tt torpedo}}
748 %\setlength{\bitwidth}{5mm}
750 %\begin{bytefield}{26}
751 % \bitheader[b]{0,5,16-19,21}\\
763 %When a {\tt torpedo} instruction reaches the instruction horn, it will
764 %wait there until an instruction is on deck whose {\tt A}rmor bit is
765 %not set. The {\tt torpedo} will then cause ``Process \#2'' of the on
766 %deck instruction to terminate and will set the outer loop counter to zero.
768 \subsection{{\tt tail}}
{\it This will probably become a bit on every instruction rather than
its own instruction. The only problem is that we have run out of bits
in the {\tt literal} instruction. Two possible solutions: (a) declare
that {\tt literal} cannot be the last instruction in a loop, or (b)
because {\tt literal} instructions cannot be torpedoed anyway, reuse
its {\tt I} bit for this purpose.}
773 \setlength{\bitwidth}{5mm}
775 \begin{bytefield}{26}
776 \bitheader[b]{0,5,16-19,21}\\
When a {\tt tail} instruction reaches the instruction horn ({\tt IH}), it seals the hatch.
788 The {\tt tail} instruction does not enter the instruction fifo.
792 %\subsection{{\tt takeOuterLoopCounter}}
794 %\setlength{\bitwidth}{5mm}
796 %\begin{bytefield}{26}
797 % \bitheader[b]{16-19,21}\\
811 %This instruction copies the value in the outer loop counter {\tt OLC}
812 %into the least significant bits of the data latch and leaves all other
813 %bits of the data latch unchanged.
815 %\subsection{{\tt takeInnerLoopCounter}}
817 %\setlength{\bitwidth}{5mm}
819 %\begin{bytefield}{26}
820 % \bitheader[b]{16-19,21}\\
834 %This instruction copies the value in the inner loop counter {\tt ILC}
835 %into the least significant bits of the data latch and leaves all other
836 %bits of the data latch unchanged.
841 %%\subsection{{\tt interrupt}}
843 %%\setlength{\bitwidth}{5mm}
845 %\begin{bytefield}{26}
846 % \bitheader[b]{0,5,16-19,21}\\
857 %When an {\tt interrupt} instruction reaches {\tt IH}, it will wait
858 %there for the {\tt OD} stage to be full with an instruction that has
859 %the {\tt IM} bit set. When this occurs, the instruction at {\tt OD}
860 %{\it will not execute}, but {\it may reloop} if the conditions for
862 %\footnote{The ability to interrupt an instruction yet have it reloop is very
863 %useful for processing chunks of data with a fixed size header and/or
864 %footer and a variable length body.}
867 %\subsection{{\tt massacre}}
869 %\setlength{\bitwidth}{5mm}
871 %\begin{bytefield}{26}
872 % \bitheader[b]{16-19,21}\\
884 %When a {\tt massacre} instruction reaches {\tt IH}, it will wait there
885 %for the {\tt OD} stage to be full with an instruction that has the
886 %{\tt IM} bit set. When this occurs, all instructions in the
887 %instruction fifo (including {\tt OD}) are retired.
889 %\subsection{{\tt clog}}
891 %\setlength{\bitwidth}{5mm}
893 %\begin{bytefield}{26}
894 % \bitheader[b]{16-19,21}\\
906 %When a {\tt clog} instruction reaches {\tt OD}, it remains there and
907 %no more instructions will be executed until an {\tt unclog} is
910 %\subsection{{\tt unclog}}
912 %\setlength{\bitwidth}{5mm}
914 %\begin{bytefield}{26}
915 % \bitheader[b]{16-19,21}\\
921 % \bitbox[lrtb]{2}{11}
927 %When an {\tt unclog} instruction reaches {\tt IH}, it will wait there
928 %until a {\tt clog} instruction is at {\tt OD}. When this occurs, both
929 %instructions retire.
931 %Note that issuing an {\tt unclog} instruction to a dock which is not
932 %clogged and whose instruction fifo contains no {\tt clog} instructions
933 %will cause the dock to deadlock.
938 \epsfig{file=overview,height=5in,angle=90}
941 \subsection*{Input Dock}
942 \epsfig{file=indock,width=7in,angle=90}
945 \subsection*{Output Dock}
946 \epsfig{file=outdock,width=6.5in,angle=90}
950 %\epsfig{file=ports,height=5in,angle=90}
953 %\epsfig{file=best,height=5in,angle=90}