1 \documentclass[10pt]{article}
6 \usepackage{bytefield1}
18 \bibliographystyle{alpha}
19 \pagestyle{fancyplain}
21 \definecolor{light}{gray}{0.7}
23 \newcommand{\footnoteremember}[2]{
26 \setcounter{#1}{\value{footnote}}
27 } \newcommand{\footnoterecall}[1]{
28 \footnotemark[\value{#1}]
36 %\oddsidemargin 0.25in
37 %\evensidemargin 0.25in
39 \def\to{\ $\rightarrow$\ }
49 \title{\vspace{-1cm}The FleetTwo Dock}
62 & removed the {\tt L} flag (epilogues can now do this) \\
63 & removed {\tt take\{Inner|Outer\}LoopCounter} instructions \\
64 & renamed {\tt data} instruction to {\tt literal} \\
65 & renamed {\tt send} instruction to {\tt move} \\
69 & added ``if its predicate is true'' to repeat count \\
70 & added note that red wires do not contact ships \\
71 & changed name of {\tt flags} instruction to {\tt setFlags} \\
72 & removed black dot from diagrams \\
73 & changed {\tt OL} (Outer Loop participant) to {\tt OS} (One Shot) and inverted polarity \\
74 & indicated that the death of the {\tt tail} instruction is what causes the hatch to be unsealed \\
75 & indicated that only {\tt send} instructions which wait for data are torpedoable \\
76 & added section ``Torpedo Details'' \\
77 & removed {\tt torpedo} instruction \\
80 & renamed loop+repeat to outer+inner (not in red) \\
81 & renamed {\tt Z} flag to {\tt L} flag (not in red) \\
82 & rewrote ``inner and outer loops'' section \\
83 & updated all diagrams \\
86 & Moved address bits to the LSB-side of a 37-bit instruction \\
87 & Added {\it micro-instruction} and {\it composite instruction} terms \\
88 & Removed the {\tt DL} field, added {\tt decrement} mode to {\tt loop} \\
89 & Created the {\tt Hold} field \\
90 & Changed how ReLooping works \\
91 & Removed {\tt clog}, {\tt unclog}, {\tt interrupt}, and {\tt massacre} \\
98 \epsfig{file=overview,width=1.5in}
99 \epsfig{file=indock,width=3in}
104 \section{Overview of Fleet}
106 A Fleet processor consists of a {\it switch fabric} with several
107 functional units called {\it ships} connected to it. At each
108 connection between a ship and the switch fabric lies a programmable
109 element known as the {\it dock}.
111 A {\it path} specifies a route through the switch fabric from a
112 particular {\it source} to a particular {\it destination}. The
113 combination of a path and a single word {\it payload} is called a {\it packet}. The
114 switch fabric carries packets from their sources to their
115 destinations. Each dock has two destinations: one for {\it
116 instructions} and one for {\it data}. A Fleet is programmed by
117 depositing packets into the switch fabric; these packets' paths lead
118 them to the instruction destinations of the docks.
120 When a packet arrives at the instruction destination of a dock, it is
121 enqueued for execution. Before the instruction executes, it may cause
122 the dock to wait for a packet to arrive at the dock's data destination
123 or for a value to be presented by the ship. It may present a data
value to the ship or hand it to the switch fabric for transmission to some other
127 When an instruction sends a packet into the switch fabric, it may
128 specify that the payload of the packet is irrelevant. Such packets
129 are known as {\it tokens}, and consume less energy than data packets.
130 From a programmer's perspective, a token packet is indistinguishable
from a data packet with an unknown payload.
134 In the diagram below, the red wires carry instructions and the blue
135 wires carry data; the switch fabric (gray area) carries both. Notice
136 that the red (instruction) wires do not contact the ships. This is an
137 advantage: ships are designed without any consideration for the
138 instructions used to program their docks.
142 \epsfig{file=overview,width=2.5in}\\
143 {\it Overview of a Fleet processor; gray shading represents a
144 packet-switched network fabric; blue lines carry data, red lines
151 \section{The FleetTwo Pump}
153 The diagram below represents a {\it programmer's} conceptual view of
154 the interface between ships and the switch fabric. Actual
155 implementation circuitry may differ substantially. Sources and
156 destinations that can send and receive only tokens -- not data items
157 -- are drawn as dashed lines.
160 \epsfig{file=indock,width=3.5in}\\
161 {\it an ``input'' dock}
163 \epsfig{file=outdock,width=3.5in}\\
164 {\it an ``output'' dock}
167 The term {\it port} refers to an interface to the ship, the {\it
168 dock} connecting it to the switch fabric, and the corresponding
169 sources and destinations on the switch fabric.
Each dock consists of a {\it data latch}, which is as wide as a single
machine word, and a {\it pump}, which is a circular fifo of
173 instruction-width latches. The values in the pump control the data
176 Note that the pump in each dock has a destination of its own; this is
177 the {\it instruction destination} mentioned in the previous section.
Unlike all other destinations, this one is not guarded by a buffering
fifo. The sizes of those buffering fifos are exposed to the
programmer so that deadlock can be avoided.
183 \section{Instructions}
185 In order to cause an instruction to execute, the programmer must first
186 cause that instruction word to arrive in the data latch of some output
187 dock. For example, this might be the ``data read'' output dock of the
188 memory access ship or the output of a fifo ship. Once an instruction
189 has arrived at this output dock, it is {\it dispatched} by sending it
190 to the {\it instruction port} of the dock at which it is to execute.
192 Each instruction is 26 bits long, which makes it possible for an
193 instruction and an 11-bit path to fit in a single word of memory.
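
As a check on the bit budget, the packing can be written as a small
worked equation. This is a sketch only: the symbols $w$, $i$ and $p$
are introduced here for illustration, and the layout assumes that the
dispatch path occupies the 11 least significant bits of the word, as
the revision notes suggest.
\[
  w = i \cdot 2^{11} + p, \qquad 0 \le i < 2^{26}, \qquad 0 \le p < 2^{11}
\]
Under this assumption
$0 \le w \le (2^{26}-1)\cdot 2^{11} + (2^{11}-1) = 2^{37}-1$, so the
pair fits in exactly 37 bits, and the two parts are recovered as
$p = w \bmod 2^{11}$ and $i = \lfloor w / 2^{11} \rfloor$.
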
194 This path is the path from the {\it dispatching} dock to the {\it
197 \setlength{\bitwidth}{3.5mm}
199 \begin{bytefield}{37}
200 \bitheader[b]{0,10,11,36}\\
201 \bitbox{26}{instruction}
202 \bitbox{11}{dispatch path}
205 {\bf Note:} the instruction encodings below are simply ``something to
206 shoot at'' and a sanity check to make sure we haven't overrun our bit
207 budget. The final instruction encodings will probably be
210 All instruction words have the following format:
212 \setlength{\bitwidth}{3.5mm}
214 \begin{bytefield}{37}
215 \bitheader[b]{0,10,11,36}\\
217 \bitbox{1}{\color{red}I\color{black}}
218 \bitbox{1}{\color{red}OS\color{black}}
222 \bitbox{11}{dispatch path}
Each instruction word is called a {\it micro-instruction}.
Collections of one or more micro-instructions are known as {\it
composite instructions}.
231 The {\tt I} bit stands for {\tt Interruptible}\color{black}. The \color{red}{\tt OS}
232 (``One Shot'')\color{black}\ bit indicates whether or not this instruction is part
233 of an outer loop. Both of the preceding bits are explained in the
238 The abbreviation {\tt P} stands for {\it predicate}; this is a two-bit
239 code that indicates if the instruction should be executed or ignored.
244 \subsection{Life Cycle of an Instruction}
246 The diagram below shows an input dock for purposes of illustration
247 (behavior at an output dock is identical).
250 \epsfig{file=indock,width=3in}\\
254 Note the circle on the path between ``instr horn'' and ``instr fifo'';
255 this is known as ``the hatch''. The hatch has two states: sealed and
256 unsealed. When the machine powers up, the hatch is unsealed; it is
257 sealed by the {\tt tail} instruction and unsealed \color{red}whenever
258 the outer loop counter is set to zero (for any
259 reason\footnote{\color{red}this includes {\tt OLC} being decremented
260 to zero, a {\tt setOuter} with a literal field of zero, a {\tt
261 setOuter} which copies a zero from the data register to {\tt OLC},
262 or the occurrence of a torpedo\color{black}}).\color{black}
264 When an instruction arrives at the instruction horn, it waits there
265 until the hatch is in the unsealed state. The instruction then enters
266 the instruction fifo. When an instruction emerges from the
267 instruction fifo, it arrives at the ``on deck'' stage, where it may
270 \subsubsection{Inner and Outer Loops}
272 A programmer can perform two types of loops: {\it inner} loops of only
273 one micro-instruction and {\it outer} loops of multiple
274 micro-instructions. Inner loops may be nested within an outer loop,
275 but no other nesting of loops is allowed. The paths used by inner
276 loops and outer loops are shown below:
279 \begin{minipage}{2in}
281 \epsfig{file=inner-loop,width=2in}\\
282 {\it inner loop (in red)}
285 \begin{minipage}{2in}
287 \epsfig{file=outer-loop,width=2in}\\
288 {\it outer loop (in red)}
295 Each type of loop has a counter associated with it: the {\tt ILC}
296 counter for inner loops and the {\tt OLC} counter for outer loops.
297 The inner loop counter applies only to certain ``inner-looping''
298 instructions (see the table below for details). When such an
299 instruction reaches On Deck, if its predicate is true it will execute
300 a number of times equal to {\tt ILC+1}, and leave {\tt ILC=0} after
301 executing. Non-inner-looping instructions and instructions whose
302 predicate is false do not decrement {\tt ILC}.
304 The outer loop counter applies to all instructions {\it except} the
305 instruction {\tt setOuter} with {\tt OS=1}, because such instructions
306 are needed to reset the outer loop counter after it becomes zero.
307 However, predicated {\tt setOuter} with {\tt OS=0} is useful for
308 resetting the loop counter in the middle of the execution of a loop.
312 \subsubsection{On Deck}
The table below lists the actions which may be taken when an
315 instruction arrives on deck:
318 \def\side#1{\begin{sideways}\parbox{15mm}{#1}\end{sideways}}
319 \begin{tabular}{|r|ccccc|cccccc|}\hline
320 %&\multicolumn{10}{c}{Predicate}&\\
321 %&\multicolumn{10}{c}{True}&\\\hline
322 &\multicolumn{5}{c}{Outer-Looping} &\multicolumn{5}{c}{One-Shot}&\\
323 &\multicolumn{5}{c}{{\tt (OS=0)}} &\multicolumn{5}{c}{{\tt (OS=1)}}&\\
325 &\side{{\tt literal}}
326 &\side{{\tt setFlags}}
327 &\side{{\tt setInner}}
328 &\side{{\tt setOuter}}
330 &\side{{\tt literal}}
331 &\side{{\tt setFlags}}
332 &\side{{\tt setInner}}
333 &\side{{\tt setOuter}}
336 Wait for hatch sealed & + & + & + & + & + & & & & & & \\
337 Fill IF0 w/ copy of self & + & + & + & + & + & & & & & & \\\hline
338 Request arbiter & P+$\star$ & & & & & P+$\star$ & & & & & \\
339 Potentially torpedoable & P+$\star$ & & & & & P+$\star$ & & & & & \\\hline
340 Execute & P+ & P+& P+& P+& P+ & ? & ? & ? & ? & P & \\
341 Inner-looping & P+ & & & & ? & P+ & & & & ? & \\
345 \begin{tabular}{|r|l|}\hline
+ & Only if {\tt OLC>0} (i.e., {\tt OLC} is positive) \\
347 P & Only if predicate is true \\
348 P+ & Only if predicate is true and {\tt OLC>0} \\
P+$\star$ & Only if predicate is true and {\tt OLC>0} and {\tt I=1} and one of {\tt Ti}, {\tt Di}, {\tt Do} is true \\
350 ? & to discuss \\\hline
354 \subsubsection{\color{red}Torpedo\color{black}}
357 There is a small fifo (not shown) before the latch marked
358 ``Instruction Horn''; after the {\tt tail} instruction seals the
359 hatch, any subsequent instructions will queue up in this fifo until
360 the hatch is unsealed. This is typically used as storage for a ``loop
361 epilogue'' -- a sequence of instructions to be executed after a
362 torpedo arrives or the outer loop counter expires.
364 Each dock has a fourth connection to the switch fabric (not shown),
365 called its {\it torpedo destination}. Anything (even a token) sent to
366 this destination is treated as a torpedo. Note that because this is a
367 distinct destination, instructions or data queued up in the other
destination fifos will not prevent a torpedo from occurring.
370 When a data item or token arrives at the torpedo destination, it lies
371 there in wait until On Deck holds a potentially torpedoable
372 instruction (see previous table). Once this is the case, the torpedo
373 causes the inner and outer loop counters to be set to zero (and
therefore also unseals the hatch).\footnote{It is unspecified whether
the torpedoed instruction is requeued; this may or may not
occur, nondeterministically. It is the programmer's responsibility
to ensure that the program behaves the same either way.
We expect this not to matter in most situations.}
385 The pump has \color{red}three\color{black}\ flags: {\tt A}, {\tt B},
389 \item The {\tt A} and {\tt B} flags are general-purpose flags which
390 may be set and cleared by the programmer.
394 % The {\tt L} flag, known as the {\it last} flag, is set whenever
395 % the value in the outer counter ({\tt OLC}) is one,
398 % that the dock is in the midst of the last iteration of an
399 % outer loop. This flag can be used to perform certain
400 % operations (such as sending a completion token) only on the last
401 % iteration of an outer loop.
\item The {\tt S} flag is known as the {\it summary} flag. Its value is
404 determined by the ship, but unless stated otherwise, it should
405 be assumed that whenever the 37th bit of the data ({\tt D})
406 latch is loaded, that same bit is also loaded into the {\tt S}
407 flag. This lets the ship make decisions based on whether or not
408 the top bit of the data latch is set; if two's complement
409 numbers are in use, this will indicate whether or not the
410 latched value is negative.
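
For the two's complement case, the relationship can be made concrete
as follows. This is an illustrative sketch, assuming a 37-bit data
latch whose bits are numbered 1 through 37 as elsewhere in this
document; the symbols $d$, $s$ and $v$ are not part of the dock's
definition. If $d$ is the latched word read as an unsigned integer,
then
\[
  s = \lfloor d / 2^{36} \rfloor, \qquad
  v = d - s \cdot 2^{37}, \qquad 0 \le d < 2^{37}
\]
where $s$ is the value loaded into {\tt S} and $v$ is the signed
interpretation of the word, so $v < 0$ exactly when $s = 1$.
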
413 Many instruction fields are specified as two-bit {\it predicates}.
414 These fields contain one of four values, indicating if an action
415 should be taken unconditionally or conditionally on one of the {\tt A}
419 \item {\tt 00:} if {\tt A} is set
420 \item {\tt 10:} if {\tt B} is set
421 \item {\tt 01:} \color{red}TBD\color{black}
422 \item {\tt 11:} always
427 \section{Instructions}
431 Here is a list of the instructions supported by the dock:
434 \begin{tabular}{|l|}\hline
435 {\tt move} (variants: {\tt moveto}, {\tt dispatch}) \\
436 {\tt literal} (variants: {\tt literalhi}, {\tt literallo})\\
445 {\tt tail} {\it will probably become a bit on every instruction rather than
451 \subsection{{\tt move} (variants: {\tt moveto}, {\tt dispatch})}
453 \setlength{\bitwidth}{5mm}
455 \begin{bytefield}{26}
456 \bitheader[b]{12-16,19,21}\\
474 %\begin{bytefield}{26}
475 % \bitheader[b]{12-18}\\
476 % \bitbox[]{8}{\raggedleft Input Dock:}
483 %\begin{bytefield}{26}
484 % \bitheader[b]{12-18}\\
485 % \bitbox[]{8}{\raggedleft Output Dock:}
492 \begin{bytefield}{26}
493 \bitheader[b]{0,10,11}\\
494 \bitbox[1]{13}{\raggedleft {\tt moveto} ({\tt LiteralPath\to Path})}
497 \bitbox{11}{\tt LiteralPath}
500 \begin{bytefield}{26}
501 \bitheader[b]{10,11}\\
502 \bitbox[1]{13}{\raggedleft {\tt dispatch} ({\tt DP[37:27]\to Path})\ \ }
511 \begin{bytefield}{26}
512 \bitheader[b]{10,11}\\
513 \bitbox[1]{13}{\raggedleft {\tt move} ({\tt Path} unchanged):}
523 \item {\tt Ti} - Token Input: wait for the token predecessor to be full and drain it.
524 \item {\tt Di} - Data Input: wait for the data predecessor to be full and drain it.
525 \item {\tt Dc} - Data Capture: pulse the data latch.
526 \item {\tt Do} - Data Output: fill the data successor.
527 \item {\tt To} - Token Output: fill the token successor.
530 The data successor and token successor must both be empty in order for
531 a {\tt move} instruction to attempt execution.
533 The inner loop counter can hold a number {\tt 0..MAX} or a special
534 value $\infty$. If {\tt ILC} is nonzero after execution of a {\tt
535 move} instruction, the instruction will execute again, and {\tt ILC}
536 will be latched with {\tt (ILC==$\infty$?$\infty$:max(ILC-1, 0))}. When
537 the inner loop counter reaches zero, the instruction ceases executing.
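
The update rule above can be written out as a recurrence. The
notation $c_k$, for the value of {\tt ILC} during the $k$-th execution
of the instruction, is introduced here only for illustration; $c_1$
is the value held when the instruction first executes. After the
$k$-th execution the instruction stops if $c_k = 0$; otherwise it
executes again with
\[
  c_{k+1} = \left\{
  \begin{array}{ll}
    \infty           & \mbox{if } c_k = \infty \\
    \max(c_k - 1, 0) & \mbox{otherwise}
  \end{array}
  \right.
\]
For example, $c_1 = 3$ gives the sequence $3, 2, 1, 0$ and therefore
four executions, which agrees with the {\tt ILC+1} count stated in the
section on inner and outer loops. With $c_1 = \infty$ the recurrence
never reaches zero on its own, so under this reading the instruction
repeats until something else zeroes the counter, such as a torpedo.
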
541 \subsection{{\tt literal}, {\tt literalhi}, {\tt literallo}}
543 These instructions load part or all of the data latch ({\tt D}).
545 {\tt literalhi: Literal[18:1]\to D[37:20]} (and {\tt Literal[18]\to S})
547 \setlength{\bitwidth}{5mm}
549 \begin{bytefield}{26}
550 \bitheader[b]{0,18,19,21}\\
564 {\tt literallo: Literal[19:1]\to D[19:1]}
566 \setlength{\bitwidth}{5mm}
568 \begin{bytefield}{26}
569 \bitheader[b]{0,18,19,21}\\
582 \setlength{\bitwidth}{5mm}
584 \begin{bytefield}{26}
585 \bitheader[b]{0,18,19,21}\\
\begin{tabular}{|r|c|c|}\hline
598 sel & D[37:20] & D[19:1] \\\hline
599 00 & Literal[18:1] & all 0 \\
600 01 & Literal[18:1] & all 1 \\
601 10 & all 0 & Literal[19:1] \\
602 11 & all 1 & Literal[19:1] \\
615 \setlength{\bitwidth}{5mm}
617 \begin{bytefield}{26}
618 \bitheader[b]{0,7,8,15,16-19,21}\\
633 The {\tt P} field is a predicate; if it does not hold, the instruction
634 is ignored. Otherwise the flags are updated according to the {\tt
635 nextA}, {\tt nextB}, and {\tt nextS} fields; each specifies the new
636 value as the logical {\tt OR} of zero or more inputs:
642 \bitbox{1}{${\text{\tt A}}$}
643 \bitbox{1}{$\overline{\text{\tt A}}$}
644 \bitbox{1}{${\text{\tt B}}$}
645 \bitbox{1}{$\overline{\text{\tt B}}$}
646 \bitbox{1}{${\text{\tt S}}$}
647 \bitbox{1}{$\overline{\text{\tt S}}$}
651 Each bit corresponds to one possible input; all inputs whose bits are
652 set are {\tt OR}ed together, and the resulting value is assigned to
653 the flag. Note that if none of the bits are set, the value assigned
654 is zero. Note also that it is possible to produce a {\tt 1} by {\tt
655 OR}ing any flag with its complement.
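
As an illustration of this rule, write $m_X$ for the selector bit of
input $X$ in one of the {\tt next} fields. This notation is an
assumption made here for the example; it covers only the six inputs
shown in the diagram above and implies nothing about where each bit
sits in the encoding. The new value of {\tt A}, written $A'$, would
then be
\[
  \begin{array}{rcl}
  A' & = & (m_{A} \wedge A) \vee (m_{\overline{A}} \wedge \overline{A})
           \vee (m_{B} \wedge B) \\
     &   & \vee\; (m_{\overline{B}} \wedge \overline{B})
           \vee (m_{S} \wedge S) \vee (m_{\overline{S}} \wedge \overline{S})
  \end{array}
\]
With only $m_A$ and $m_B$ set this reduces to $A' = A \vee B$; with
only $m_{\overline{B}}$ set it gives $A' = \overline{B}$; with $m_S$
and $m_{\overline{S}}$ both set it gives $A' = S \vee \overline{S} = 1$;
and with no bits set it gives $A' = 0$.
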
660 \subsection{{\tt setInner}}
662 This instruction loads the inner loop counter with either a literal
663 number, the special value $\infty$, or the contents of the {\tt data}
666 \setlength{\bitwidth}{5mm}
668 \begin{bytefield}{26}
669 \bitheader[b]{16-19,21}\\
684 \begin{bytefield}{26}
685 \bitbox[r]{18}{\raggedleft from data latch:\hspace{0.2cm}\ }
692 \begin{bytefield}{26}
693 \bitheader[b]{0,5,6,7}\\
694 \bitbox[r]{18}{\raggedleft from literal:\hspace{0.2cm}\ }
696 \bitbox{6}{\tt Literal}
699 \begin{bytefield}{26}
700 \bitheader[b]{0,5,6,7}\\
701 \bitbox[r]{18}{\raggedleft with $\infty$\ \ }
709 \subsection{{\tt setOuter}}
711 This instruction loads the outer loop counter {\tt OLC} with either
712 {\tt max(0,OLC-1)}, a literal or the contents of the {\tt data}
715 \setlength{\bitwidth}{5mm}
717 \begin{bytefield}{26}
718 \bitheader[b]{16-19,21,24}\\
734 \begin{bytefield}{26}
735 \bitbox[r]{19}{\raggedleft {\tt max(0,OLC-1)}:\hspace{0.2cm}\ }
743 \begin{bytefield}{26}
744 \bitbox[r]{19}{\raggedleft from data latch:\hspace{0.2cm}\ }
751 \begin{bytefield}{26}
752 \bitheader[b]{0,5,6}\\
753 \bitbox[r]{19}{\raggedleft from literal:\hspace{0.2cm}\ }
755 \bitbox{6}{\tt Literal}
759 %\subsection{{\tt torpedo}}
761 %\setlength{\bitwidth}{5mm}
763 %\begin{bytefield}{26}
764 % \bitheader[b]{0,5,16-19,21}\\
776 %When a {\tt torpedo} instruction reaches the instruction horn, it will
777 %wait there until an instruction is on deck whose {\tt A}rmor bit is
778 %not set. The {\tt torpedo} will then cause ``Process \#2'' of the on
779 %deck instruction to terminate and will set the outer loop counter to zero.
781 \subsection{{\tt tail}}
{\it This will probably become a bit on every instruction rather than
its own instruction. The only problem is that we have run out of bits in the {\tt literal} instruction. Two possible solutions: (a) declare that {\tt literal} cannot be the last instruction in a loop, or (b) because {\tt literal} instructions cannot be torpedoed anyway, reuse their {\tt I} bit for this purpose.}
788 \setlength{\bitwidth}{5mm}
790 \begin{bytefield}{26}
791 \bitheader[b]{0,5,16-19,21}\\
802 When a {\tt tail} instruction reaches {\tt IH}, it seals the hatch.
803 The {\tt tail} instruction does not enter the instruction fifo.
807 %\subsection{{\tt takeOuterLoopCounter}}
809 %\setlength{\bitwidth}{5mm}
811 %\begin{bytefield}{26}
812 % \bitheader[b]{16-19,21}\\
826 %This instruction copies the value in the outer loop counter {\tt OLC}
827 %into the least significant bits of the data latch and leaves all other
828 %bits of the data latch unchanged.
830 %\subsection{{\tt takeInnerLoopCounter}}
832 %\setlength{\bitwidth}{5mm}
834 %\begin{bytefield}{26}
835 % \bitheader[b]{16-19,21}\\
849 %This instruction copies the value in the inner loop counter {\tt ILC}
850 %into the least significant bits of the data latch and leaves all other
851 %bits of the data latch unchanged.
856 %%\subsection{{\tt interrupt}}
858 %%\setlength{\bitwidth}{5mm}
860 %\begin{bytefield}{26}
861 % \bitheader[b]{0,5,16-19,21}\\
872 %When an {\tt interrupt} instruction reaches {\tt IH}, it will wait
873 %there for the {\tt OD} stage to be full with an instruction that has
874 %the {\tt IM} bit set. When this occurs, the instruction at {\tt OD}
875 %{\it will not execute}, but {\it may reloop} if the conditions for
877 %\footnote{The ability to interrupt an instruction yet have it reloop is very
878 %useful for processing chunks of data with a fixed size header and/or
879 %footer and a variable length body.}
882 %\subsection{{\tt massacre}}
884 %\setlength{\bitwidth}{5mm}
886 %\begin{bytefield}{26}
887 % \bitheader[b]{16-19,21}\\
899 %When a {\tt massacre} instruction reaches {\tt IH}, it will wait there
900 %for the {\tt OD} stage to be full with an instruction that has the
901 %{\tt IM} bit set. When this occurs, all instructions in the
902 %instruction fifo (including {\tt OD}) are retired.
904 %\subsection{{\tt clog}}
906 %\setlength{\bitwidth}{5mm}
908 %\begin{bytefield}{26}
909 % \bitheader[b]{16-19,21}\\
921 %When a {\tt clog} instruction reaches {\tt OD}, it remains there and
922 %no more instructions will be executed until an {\tt unclog} is
925 %\subsection{{\tt unclog}}
927 %\setlength{\bitwidth}{5mm}
929 %\begin{bytefield}{26}
930 % \bitheader[b]{16-19,21}\\
936 % \bitbox[lrtb]{2}{11}
942 %When an {\tt unclog} instruction reaches {\tt IH}, it will wait there
943 %until a {\tt clog} instruction is at {\tt OD}. When this occurs, both
944 %instructions retire.
946 %Note that issuing an {\tt unclog} instruction to a dock which is not
947 %clogged and whose instruction fifo contains no {\tt clog} instructions
948 %will cause the dock to deadlock.
953 \epsfig{file=overview,height=5in,angle=90}
956 \subsection*{Input Dock}
957 \epsfig{file=indock,width=7in,angle=90}
960 \subsection*{Output Dock}
961 \epsfig{file=outdock,width=6.5in,angle=90}
965 %\epsfig{file=ports,height=5in,angle=90}
968 %\epsfig{file=best,height=5in,angle=90}