1 \documentclass[10pt]{article}
6 \usepackage{bytefield1}
15 \bibliographystyle{alpha}
16 \pagestyle{fancyplain}
18 \definecolor{light}{gray}{0.7}
20 \newcommand{\footnoteremember}[2]{
23 \setcounter{#1}{\value{footnote}}
24 } \newcommand{\footnoterecall}[1]{
25 \footnotemark[\value{#1}]
33 %\oddsidemargin 0.25in
34 %\evensidemargin 0.25in
36 \def\to{\ $\rightarrow$\ }
46 \title{\vspace{-1cm}The FleetTwo Dock}
58 & Moved address bits to the LSB-side of a 37-bit instruction \\
59 & Added {\it micro-instruction} and {\it composite instruction} terms \\
60 & Removed the {\tt DL} field, added {\tt decrement} mode to {\tt loop} \\
61 & Created the {\tt Hold} field \\
62 & Changed how ReLooping works \\
63 & Removed {\tt clog}, {\tt unclog}, {\tt interrupt}, and {\tt massacre} \\
71 \epsfig{file=overview,width=1.5in}
72 \epsfig{file=ports,width=1.5in}
73 \epsfig{file=best,width=1.5in}
78 \section{Overview of Fleet}
80 A Fleet processor consists of a {\it switch fabric} with several
81 functional units called {\it ships} connected to it. At each
82 connection between a ship and the switch fabric lies a programmable
83 element known as the {\it dock}.
85 A {\it path} specifies a route through the switch fabric from a
86 particular {\it source} to a particular {\it destination}. The
87 combination of a path and a single-word {\it payload} is called a {\it packet}. The
88 switch fabric carries packets from their sources to their
89 destinations. Each dock has two destinations: one for {\it
90 instructions} and one for {\it data}. A Fleet is programmed by
91 depositing packets into the switch fabric; these packets' paths lead
92 them to the instruction destinations of the docks.
94 When a packet arrives at the instruction destination of a dock, it is
95 enqueued for execution. Before the instruction executes, it may cause
96 the dock to wait for a packet to arrive at the dock's data destination
97 or for a value to be presented by the ship. It may present a data
98 value to the ship or send it into the switch fabric for delivery to some other
101 When an instruction sends a packet into the switch fabric, it may
102 specify that the payload of the packet is irrelevant. Such packets
103 are known as {\it tokens}, and consume less energy than data packets.
104 From a programmer's perspective, a token packet is indistinguishable
105 from a data packet with an unknown payload.
108 \epsfig{file=overview,width=4in}\\
109 {\it Overview of a Fleet processor}
114 \section{The Ship-Switch Fabric Interface}
116 The diagram below represents a {\it programmer's} conceptual view of
117 the interface between ships and the switch fabric. Actual
118 implementation circuitry may differ substantially. Sources and
119 destinations that can send and receive only tokens -- not data items
120 -- are drawn as dashed lines.
123 \epsfig{file=ports,width=4in}\\
124 {\it The interface between the switch fabric and the ship}
127 The term {\it port} refers to an interface to the ship, the {\it
128 dock} connecting it to the switch fabric, and the corresponding
129 sources and destinations on the switch fabric.
131 Each dock consists of a {\it data latch}, which is as wide as a
132 single machine word, and a {\it pump}, which is a circular fifo of
133 instruction-width latches. The values in the instruction fifo
134 control the data latch.
136 Note that the pump in each dock has a destination of its own; this is
137 the {\it instruction destination} mentioned in the previous section.
138 Note that unlike all other destinations, there is no buffering fifo
139 guarding this one. The sizes of the buffering fifos guarding the other destinations are exposed to the
140 software programmer so that she can avoid deadlock.
144 \section{The FleetTwo Pump}
146 The diagram below shows the datapath for the FleetTwo pump circuitry.
147 The square box marked {\tt D} on the output from the {\tt IH} latch is
148 the instruction decoder, which decodes word-width instructions into a
149 set of control signals suitable for operating the pump. The boxes
150 marked {\tt CD} are carry detectors. These detect zero values in the
151 count and also generate the partial differences used in the decrement
155 \epsfig{file=best,width=4in}\\
156 {\it The pump datapath}
159 The latches of primary interest here are:
161 \item {\tt IH}: Instruction Horn (leaf node; may be shared)
162 \item {\tt F0}: Fifo Stage 0 (first fifo stage)
163 \item {\tt OD}: On Deck
164 \item {\tt F}: Flags, {\tt NF}: Next Flags
165 \item {\tt P}: Path (the path to use for outbound data/tokens)
167 \item {\tt DP}: Data Predecessor (ship for output ports, switch fabric for input ports)
168 \item {\tt DS}: Data Successor (switch fabric for output ports, ship for input ports)
169 \item {\tt RC}: Repeat Count, {\tt NRC}: Next Repeat Count
170 \item {\tt LC}: Loop Count, {\tt NLC}: Next Loop Count
173 Each instruction that executes causes the latches of the pump to fire
174 in two phases, denoted as the ``left phase'' and the ``right phase''.
175 In the diagram, the left phase latches are those to the left of the
176 vertical line down the center, and the right phase latches are to the
177 right. Therefore each instruction execution requires two GasP
178 pipeline stages to complete.
182 The pump has four flags: {\tt A}, {\tt B}, {\tt S}, {\tt Z}. Of
183 these four, only the first two may be modified directly by
187 \item The {\tt A} and {\tt B} flags are general-purpose flags which
188 may be set and cleared by the programmer.
190 \item The {\tt S} flag is known as the {\it summary} flag. Its value is
191 determined by the ship, but unless stated otherwise, it should
192 be assumed that whenever the 37th bit of the data ({\tt D})
193 latch is loaded, that same bit is also loaded into the {\tt S}
194 flag. This lets the ship make decisions based on whether or not
195 the top bit of the data latch is set; if two's complement
196 numbers are in use, this will indicate whether or not the
197 latched value is negative.
199 \item The {\tt Z} flag, known as the {\it zero} flag, is set whenever
200 the value in the loop counter ({\tt LC}) is zero. This flag can
201 be used to perform certain operations (such as sending a
202 completion token) only on the last iteration of a loop.
205 Many instruction fields are specified as two-bit {\it predicates}.
206 These fields contain one of four values, indicating if an action
207 should be taken unconditionally or conditionally on one of the {\tt A}
211 \item {\tt 00:} if {\tt A} is set
212 \item {\tt 10:} if {\tt B} is set
213 \item {\tt 01:} if {\tt Z} is set ({\tt LC=0})
214 \item {\tt 11:} always
218 \section{Instructions}
220 In order to cause an instruction to execute, the programmer must first
221 cause that instruction word to arrive in the data latch of some output
222 dock. For example, this might be the ``data read'' output dock of the
223 memory access ship or the output of a fifo ship. Once an instruction
224 has arrived at this output dock, it is {\it dispatched} by sending it
225 to the {\it instruction port} of the dock at which it is to execute.
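The following Python sketch shows how one dispatched word can be packed and unpacked. It assumes the layout drawn in this section's bit diagram: a 26-bit instruction in bits 36..11 and an 11-bit dispatch path in bits 10..0, counting from 0 at the least significant end of the 37-bit word. The function names {\tt pack\_dispatch\_word} and {\tt unpack\_dispatch\_word} are hypothetical, and the instruction is treated as an opaque 26-bit value.

```python
from typing import Tuple

INSTR_BITS = 26                      # width of one instruction
PATH_BITS = 11                       # width of the dispatch path
WORD_BITS = INSTR_BITS + PATH_BITS   # 37-bit machine word


def pack_dispatch_word(instruction: int, path: int) -> int:
    """Place the instruction in bits 36..11 and the path in bits 10..0."""
    if not 0 <= instruction < (1 << INSTR_BITS):
        raise ValueError("instruction does not fit in 26 bits")
    if not 0 <= path < (1 << PATH_BITS):
        raise ValueError("path does not fit in 11 bits")
    return (instruction << PATH_BITS) | path


def unpack_dispatch_word(word: int) -> Tuple[int, int]:
    """Split a 37-bit word back into (instruction, dispatch path)."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("word does not fit in 37 bits")
    return word >> PATH_BITS, word & ((1 << PATH_BITS) - 1)


word = pack_dispatch_word(0x2ABCDEF, 0x5A5)
assert unpack_dispatch_word(word) == (0x2ABCDEF, 0x5A5)
```

Keeping the path on the least significant side matches the change-log entry that moved the address bits to the LSB side of the 37-bit word.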
227 Each instruction is 26 bits long, which makes it possible for an
228 instruction and an 11-bit path to fit in a single 37-bit word of memory.
229 This path is the path from the {\it dispatching} dock to the {\it
232 \setlength{\bitwidth}{3.5mm}
234 \begin{bytefield}{37}
235 \bitheader[b]{0,10,11,36}\\
236 \bitbox{26}{instruction}
238 \bitbox{11}{dispatch path}
242 {\bf Note:} the instruction encodings below are simply ``something to
243 shoot at'' and a sanity check to make sure we haven't overrun our bit
244 budget. The final instruction encodings will probably be
247 All instruction words have the following format:
249 \setlength{\bitwidth}{3.5mm}
251 \begin{bytefield}{37}
252 \bitheader[b]{0,10,11,36}\\
261 \bitbox{11}{dispatch path}
267 Each instruction word is called a {\it micro instruction}.
268 Collections of one or more micro instructions are known as {\it
269 composite instructions}. The {\tt Hold} field indicates how micro
270 instructions are gathered together into composite instructions:
273 \item {\tt 00:} {\tt solo} -- this word is not part of a composite instruction
274 \item {\tt 01:} {\tt soloT} -- like {\tt solo}, but {\tt torpedo}-able
275 \item {\tt 10:} {\tt body} -- this word is part of a composite instruction, but not the last
276 \item {\tt 11:} {\tt tail} -- this is the last micro instruction in a composite instruction
279 Solo instructions never reloop (described later); they are
280 ``one-shot'' instructions. Multiple solo instructions may be in the
281 instruction fifo simultaneously. A {\tt solo} instruction is immune
282 to {\tt torpedo}s (described later); a {\tt soloT} instruction is
283 not\footnote{The {\tt soloT} instruction is meant to be used for
284 ``standing repeating'' instructions}.
286 Composite instructions reloop until the loop counter is zero. When a
287 composite instruction is in the instruction fifo, no other
288 instructions may enter the fifo. A {\tt body} instruction is immune
289 to {\tt torpedo}s; a {\tt tail} instruction is not. \color{black}
291 The abbreviation {\tt P} stands for {\it predicate}; this is a two-bit
292 code that indicates if the instruction should be executed or ignored.
293 If an instruction is ignored, it might still reloop.
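As an illustration, the {\tt Hold} encodings above can be tabulated and used to split a stream of micro instructions into solo words and composite instructions. This is a minimal Python sketch: {\tt HOLD\_TABLE} and {\tt group\_micro\_instructions} are hypothetical names, and rejecting a solo word inside an unfinished composite instruction (or a composite instruction with no {\tt tail}) is our assumption about what counts as well formed.

```python
from typing import Dict, List, NamedTuple


class HoldInfo(NamedTuple):
    name: str
    may_reloop: bool    # composite (body/tail) words reloop; solo words never do
    torpedoable: bool   # may be annihilated by a torpedo at OD


# The four Hold encodings listed above.
HOLD_TABLE: Dict[int, HoldInfo] = {
    0b00: HoldInfo("solo", may_reloop=False, torpedoable=False),
    0b01: HoldInfo("soloT", may_reloop=False, torpedoable=True),
    0b10: HoldInfo("body", may_reloop=True, torpedoable=False),
    0b11: HoldInfo("tail", may_reloop=True, torpedoable=True),
}


def group_micro_instructions(codes: List[int]) -> List[List[str]]:
    """Gather Hold codes into solo words and body...tail composite instructions."""
    groups: List[List[str]] = []
    pending: List[str] = []            # body words of an unfinished composite
    for code in codes:
        name = HOLD_TABLE[code].name
        if name in ("solo", "soloT"):
            if pending:
                raise ValueError("solo word inside an unfinished composite instruction")
            groups.append([name])
        elif name == "body":
            pending.append(name)
        else:                          # a tail word closes the composite instruction
            groups.append(pending + [name])
            pending = []
    if pending:
        raise ValueError("composite instruction has no tail word")
    return groups


print(group_micro_instructions([0b00, 0b10, 0b10, 0b11, 0b01]))
# [['solo'], ['body', 'body', 'tail'], ['soloT']]
```

A lone {\tt tail} word forms a composite instruction of one micro instruction, consistent with composite instructions being collections of one or more micro instructions.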
296 \subsection{RePeating and ReLooping}
300 \begin{minipage}{3in}
303 \begin{tabular}{|r|c|c|}\hline
304 & RePeating? & ReLooping? \\\hline
305 {\tt send} & Y & Y \\\hline
306 {\tt literal} & N & Y \\\hline
307 {\tt flags} & N & Y \\\hline
308 {\tt repeat} & N & Y \\
310 \footnote{Note, however, that the decision whether to reloop is based on the value in the loop counter {\it before} the {\tt loop} instruction executes.}
312 {\tt takeLoopCounter} & N & Y \\
313 {\tt takeRepeatCounter} & N & Y \\
316 {\tt torpedo} \color{black} & n/a & n/a \\
318 %{\tt clog} & N & N \\
319 %{\tt unclog} & n/a & n/a \\
320 %{\tt interrupt} & n/a & n/a \\
321 %{\tt massacre} & n/a & n/a \\
326 \caption{classification of instructions}
330 An instruction will repeat if it is classified as a repeating
331 instruction and the repeat counter is nonzero.
332 Non-repeating instructions have no effect on the repeat
333 counter (except for {\tt repeat}, of course).
337 Solo instructions (both {\tt solo} and {\tt soloT})
338 completely ignore the loop counter; it has no effect on them.
340 If a {\tt body} or {\tt tail} instruction reaches the on deck stage
341 and the loop counter ({\tt LC}) is zero, the instruction dies
342 immediately without executing or relooping.
344 If a {\tt body} or {\tt tail} instruction reaches the on deck stage
345 and the loop counter ({\tt LC}) is nonzero, a (duplicate) copy of that
346 instruction is immediately enqueued at the head of the instruction
347 fifo; the original instruction then waits at {\tt OD} until either its
348 execution conditions are met or it is {\tt torpedo}ed.
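The on-deck rules above can be traced with a small Python simulation. It is a sketch under several assumptions: each instruction's execution conditions are met as soon as it reaches {\tt OD}, no {\tt torpedo} arrives, the duplicate copy re-enters the fifo behind the words still waiting (our reading of ``head''), and the hypothetical operation {\tt dec} stands in for a {\tt loop} instruction in decrement mode. The name {\tt run\_fifo} is likewise hypothetical.

```python
from collections import deque
from typing import List, Tuple


def run_fifo(words: List[Tuple[str, str]], loop_count: int) -> Tuple[List[str], int]:
    """Trace instructions reaching the on-deck (OD) stage one at a time.

    words: (hold, op) pairs; hold is "solo", "soloT", "body" or "tail".
    The hypothetical op "dec" acts like loop in decrement mode
    (LC <- max(0, LC-1)); every other op is only recorded.
    Assumes execution conditions are met at once and no torpedo arrives.
    """
    fifo = deque(words)
    lc = loop_count
    executed: List[str] = []
    while fifo:
        hold, op = fifo.popleft()        # this word is now at OD
        if hold in ("body", "tail"):
            if lc == 0:
                continue                 # dies without executing or relooping
            fifo.append((hold, op))      # duplicate re-enters the fifo first
        executed.append(op)              # solo/soloT words ignore LC entirely
        if op == "dec":
            lc = max(0, lc - 1)
    return executed, lc


print(run_fifo([("body", "send"), ("tail", "dec")], loop_count=2))
# (['send', 'dec', 'send', 'dec'], 0)
```

With {\tt LC=2}, the two-word composite instruction executes twice before both remaining copies die at {\tt OD}; the loop-counter check happens before the decrement executes. Without a decrementing word (or a {\tt torpedo}), a composite instruction with a nonzero loop counter relooops indefinitely, and so would this sketch.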
353 \subsection{{\tt send} (variants: {\tt sendto}, {\tt dispatch})}
355 \setlength{\bitwidth}{5mm}
357 \begin{bytefield}{26}
358 \bitheader[b]{12-16,19,21}\\
375 %\begin{bytefield}{26}
376 % \bitheader[b]{12-18}\\
377 % \bitbox[]{8}{\raggedleft Input Dock:}
384 %\begin{bytefield}{26}
385 % \bitheader[b]{12-18}\\
386 % \bitbox[]{8}{\raggedleft Output Dock:}
393 \begin{bytefield}{26}
394 \bitheader[b]{0,10,11}\\
395 \bitbox[1]{13}{\raggedleft {\tt sendto} ({\tt LiteralPath\to Path})}
398 \bitbox{11}{\tt LiteralPath}
401 \begin{bytefield}{26}
402 \bitheader[b]{10,11}\\
403 \bitbox[1]{13}{\raggedleft {\tt dispatch} ({\tt DP[11:1]\to Path})\ \ }
412 \begin{bytefield}{26}
413 \bitheader[b]{10,11}\\
414 \bitbox[1]{13}{\raggedleft {\tt send} ({\tt Path} unchanged):}
424 \item {\tt Ti} -- Token Input: wait for the token predecessor to be full and drain it.
425 \item {\tt Di} -- Data Input: wait for the data predecessor to be full and drain it.
426 \item {\tt Dc} -- Data Capture: pulse the data latch.
427 \item {\tt Do} -- Data Output: fill the data successor.
428 \item {\tt To} -- Token Output: fill the token successor.
431 The {\tt F0}, {\tt DS}, and {\tt TS} stages must all be empty in order for an
432 instruction to execute.
434 The repeat counter can hold a number {\tt 0..MAX} or a special value
435 $\infty$. If the repeat count ({\tt RC}) holds a value other than
436 $\infty$, it is latched with {\tt max(RC-1, 0)}. If the repeat
437 counter reaches zero, the instruction ceases executing and either
438 reloops or retires (see the section on RePeating and ReLooping for details).
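A minimal Python model of the repeat-counter rules above is sketched below. Representing $\infty$ with {\tt math.inf} is purely an illustrative choice (the hardware encoding is not given here), and the helper names are hypothetical. {\tt send} is the instruction marked as repeating in the classification table.

```python
import math

INF = math.inf  # illustrative stand-in for the special repeat count "infinity"


def next_repeat_count(rc: float) -> float:
    """RC after one execution: unchanged if infinite, else max(RC-1, 0)."""
    return rc if rc == INF else max(rc - 1, 0)


def will_repeat(is_repeating: bool, rc: float) -> bool:
    """A repeating instruction (such as send) repeats while RC is nonzero."""
    return is_repeating and rc != 0


def can_execute(f0_empty: bool, ds_empty: bool, ts_empty: bool) -> bool:
    """F0, DS and TS must all be empty before an instruction executes."""
    return f0_empty and ds_empty and ts_empty


print([next_repeat_count(x) for x in (3, 1, 0, INF)])  # [2, 0, 0, inf]
```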
442 \subsection{{\tt data}, {\tt datahi}, {\tt datalo}}
444 These instructions load part or all of the data latch ({\tt D}).
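The three variants described in this subsection can be modeled on a 37-bit integer, with {\tt D[37:20]} occupying 0-indexed bits 36..19 and {\tt D[19:1]} occupying bits 18..0. This Python sketch is hypothetical; in particular, setting {\tt S} from bit 37 whenever that bit is loaded follows the default behavior described for the {\tt S} flag and may differ for a given ship.

```python
from typing import Tuple

HI_BITS, LO_BITS = 18, 19          # D[37:20] and D[19:1]
HI_MASK = (1 << HI_BITS) - 1
LO_MASK = (1 << LO_BITS) - 1
TOP_BIT = HI_BITS + LO_BITS - 1    # 0-indexed position of D[37]


def datahi(d: int, literal: int) -> Tuple[int, int]:
    """Literal[18:1] -> D[37:20] and Literal[18] -> S; D[19:1] is kept."""
    hi = literal & HI_MASK
    return (hi << LO_BITS) | (d & LO_MASK), hi >> (HI_BITS - 1)


def datalo(d: int, literal: int) -> int:
    """Literal[19:1] -> D[19:1]; D[37:20] (and hence S) is kept."""
    return (d & (HI_MASK << LO_BITS)) | (literal & LO_MASK)


def data(sel: int, literal: int) -> Tuple[int, int]:
    """Load all of D as selected by sel; returns (D, S)."""
    if sel in (0b00, 0b01):            # literal in the high part
        hi = literal & HI_MASK
        lo = 0 if sel == 0b00 else LO_MASK
    else:                              # literal in the low part
        hi = 0 if sel == 0b10 else HI_MASK
        lo = literal & LO_MASK
    d = (hi << LO_BITS) | lo
    return d, d >> TOP_BIT             # assumed: S copies the newly loaded D[37]
```

Filling the unused half with all ones ({\tt sel} values {\tt 01} and {\tt 11}) is what lets a single {\tt data} instruction produce, for example, a negative two's complement value whose high bits are all set.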
446 {\tt datahi: Literal[18:1]\to D[37:20]} (and {\tt Literal[18]\to S})
448 \setlength{\bitwidth}{5mm}
450 \begin{bytefield}{26}
451 \bitheader[b]{0,18,19,21}\\
464 {\tt datalo: Literal[19:1]\to D[19:1]}
466 \setlength{\bitwidth}{5mm}
468 \begin{bytefield}{26}
469 \bitheader[b]{0,18,19,21}\\
481 \setlength{\bitwidth}{5mm}
483 \begin{bytefield}{26}
484 \bitheader[b]{0,18,19,21}\\
495 \begin{tabular}{|r|c|c|}\hline
496 sel & D[37:20] & D[19:1] \\\hline
497 00 & Literal[18:1] & all 0 \\
498 01 & Literal[18:1] & all 1 \\
499 10 & all 0 & Literal[19:1] \\
500 11 & all 1 & Literal[19:1] \\
507 \subsection{{\tt flags}}
509 \setlength{\bitwidth}{5mm}
511 \begin{bytefield}{26}
512 \bitheader[b]{0,7,8,15,16-19,21}\\
525 The {\tt P} field is a predicate; if it does not hold, the instruction
526 is ignored. Otherwise the two flags ({\tt A} and {\tt B}) are updated
527 according to the {\tt nextA} and {\tt nextB} fields; each specifies
528 the new value as the logical {\tt OR} of zero or more inputs:
534 \bitbox{1}{${\text{\tt A}}$}
535 \bitbox{1}{$\overline{\text{\tt A}}$}
536 \bitbox{1}{${\text{\tt B}}$}
537 \bitbox{1}{$\overline{\text{\tt B}}$}
538 \bitbox{1}{${\text{\tt S}}$}
539 \bitbox{1}{$\overline{\text{\tt S}}$}
540 \bitbox{1}{${\text{\tt Z}}$}
541 \bitbox{1}{$\overline{\text{\tt Z}}$}
545 Each bit corresponds to one possible input; all inputs whose bits are
546 set are {\tt OR}ed together, and the resulting value is assigned to
547 the flag. Note that if none of the bits are set, the value assigned
548 is zero. Note also that it is possible to produce a {\tt 1} by {\tt
549 OR}ing any flag with its complement.
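The predicate check and the {\tt OR}-of-inputs rule can be sketched in Python as follows. The sketch assumes that the leftmost box ({\tt A}) is the most significant bit of each 8-bit {\tt nextA} and {\tt nextB} field, and that both new flag values are computed from the flags as they were before the instruction; the function names are hypothetical.

```python
from typing import Dict

Flags = Dict[str, bool]

# The eight OR-inputs of nextA/nextB, leftmost box first.
INPUT_ORDER = ("A", "~A", "B", "~B", "S", "~S", "Z", "~Z")


def predicate_holds(p: int, flags: Flags) -> bool:
    """Two-bit predicate: 00 if A, 10 if B, 01 if Z (LC=0), 11 always."""
    return {0b00: flags["A"], 0b10: flags["B"], 0b01: flags["Z"], 0b11: True}[p]


def or_of_inputs(mask: int, flags: Flags) -> bool:
    """OR together every input whose bit is set; no bits set gives False."""
    result = False
    for position, name in enumerate(INPUT_ORDER):
        if mask & (1 << (7 - position)):        # assumed: leftmost box is the MSB
            value = flags[name.lstrip("~")]
            result = result or (not value if name.startswith("~") else value)
    return result


def execute_flags(p: int, next_a: int, next_b: int, flags: Flags) -> Flags:
    """Apply a flags instruction; both new values use the old flags."""
    if not predicate_holds(p, flags):
        return dict(flags)                      # predicate false: ignored
    updated = dict(flags)
    updated["A"] = or_of_inputs(next_a, flags)
    updated["B"] = or_of_inputs(next_b, flags)
    return updated


# Setting both the A and not-A bits always yields 1, whatever A holds.
print(or_of_inputs(0b11000000, {"A": False, "B": False, "S": False, "Z": True}))  # True
```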
554 \subsection{{\tt repeat}}
556 This instruction loads the repeat counter with either a literal
557 number, the special value $\infty$, or the contents of the {\tt data}
560 \setlength{\bitwidth}{5mm}
562 \begin{bytefield}{26}
563 \bitheader[b]{16-19,21}\\
577 \begin{bytefield}{26}
578 \bitbox[r]{18}{\raggedleft from data latch:\hspace{0.2cm}\ }
585 \begin{bytefield}{26}
586 \bitheader[b]{0,5,6,7}\\
587 \bitbox[r]{18}{\raggedleft from literal:\hspace{0.2cm}\ }
589 \bitbox{6}{\tt Literal}
592 \begin{bytefield}{26}
593 \bitheader[b]{0,5,6,7}\\
594 \bitbox[r]{18}{\raggedleft with $\infty$\ \ }
602 \subsection{{\tt loop}}
604 This instruction loads the loop counter {\tt LC} with either {\tt max(0,LC-1)}, a literal, or the
605 contents of the data latch ({\tt D}).
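A minimal Python sketch of the three ways {\tt loop} can load {\tt LC}, together with the {\tt Z} flag that follows from it, is given below. The 6-bit literal width comes from the encoding diagram in this subsection; which bits of the data latch are used is not specified here, so the caller passes that value in directly. The names are hypothetical.

```python
LITERAL_BITS = 6  # width of the Literal field in the loop encoding


def next_loop_count(lc: int, mode: str, value: int = 0) -> int:
    """LC after a loop instruction in "decrement", "literal" or "data" mode."""
    if mode == "decrement":
        return max(0, lc - 1)
    if mode == "literal":
        if not 0 <= value < (1 << LITERAL_BITS):
            raise ValueError("literal does not fit in 6 bits")
        return value
    if mode == "data":
        return value          # hypothetical: caller extracts the bits from D
    raise ValueError(f"unknown mode {mode!r}")


def z_flag(lc: int) -> bool:
    """Z is set whenever the loop counter is zero."""
    return lc == 0


print(z_flag(next_loop_count(1, "decrement")))  # True
```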
607 \setlength{\bitwidth}{5mm}
609 \begin{bytefield}{26}
610 \bitheader[b]{16-19,21,24}\\
625 \begin{bytefield}{26}
627 \bitbox[r]{19}{\raggedleft {\tt max(0,LC-1)}:\hspace{0.2cm}\ }
635 \begin{bytefield}{26}
636 \bitbox[r]{19}{\raggedleft from data latch:\hspace{0.2cm}\ }
643 \begin{bytefield}{26}
644 \bitheader[b]{0,5,6}\\
645 \bitbox[r]{19}{\raggedleft from literal:\hspace{0.2cm}\ }
647 \bitbox{6}{\tt Literal}
651 \subsection{{\tt takeLoopCounter}}
653 \setlength{\bitwidth}{5mm}
655 \begin{bytefield}{26}
656 \bitheader[b]{16-19,21}\\
669 The {\tt P} field is a predicate; if it does not hold, the instruction
670 is ignored (but may reloop). This instruction copies the value in the
671 loop counter {\tt LC} into the least significant bits of the data
672 latch and leaves all other bits of the data latch unchanged.
674 \subsection{{\tt takeRepeatCounter}}
676 \setlength{\bitwidth}{5mm}
678 \begin{bytefield}{26}
679 \bitheader[b]{16-19,21}\\
692 The {\tt P} field is a predicate; if it does not hold, the instruction
693 is ignored (but may reloop). This instruction copies the value in the
694 repeat counter {\tt RC} into the least significant bits of the data
695 latch and leaves all other bits of the data latch unchanged.
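Both {\tt takeLoopCounter} and {\tt takeRepeatCounter} perform the same partial write of the data latch, which a short Python sketch makes concrete. The counter width is not given in this section, so {\tt counter\_bits} is a hypothetical parameter, and the sketch does not handle the $\infty$ value of the repeat counter.

```python
def take_counter(d: int, counter: int, counter_bits: int,
                 predicate_holds: bool = True) -> int:
    """Copy a counter into the low counter_bits of D, keeping D's other bits.

    counter_bits is a hypothetical width; if the predicate does not hold,
    the instruction is ignored and D is returned unchanged.
    """
    if not predicate_holds:
        return d
    mask = (1 << counter_bits) - 1
    if not 0 <= counter <= mask:
        raise ValueError("counter does not fit in counter_bits")
    return (d & ~mask) | counter


print(bin(take_counter(0b1111_0000, 5, counter_bits=4)))  # 0b11110101
```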
699 \subsection{{\tt torpedo}}
701 \setlength{\bitwidth}{5mm}
703 \begin{bytefield}{26}
704 \bitheader[b]{0,5,16-19,21}\\
715 When a {\tt torpedo} instruction reaches {\tt IH}, it will wait there
716 until an instruction is on deck (at {\tt OD}) and that instruction's
717 {\tt Hold} field is {\tt tail} or {\tt soloT}. The {\tt torpedo} will then
718 annihilate the on-deck instruction {\it and set the loop counter to zero}.
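The {\tt torpedo} rule can be sketched in Python as a check that runs while the torpedo waits at {\tt IH}. The {\tt Pump} class and the {\tt try\_torpedo} function are hypothetical; the model holds only the {\tt Hold} value of the on-deck instruction and the loop counter.

```python
from dataclasses import dataclass
from typing import Optional

TORPEDOABLE = {"tail", "soloT"}   # Hold values a torpedo may annihilate


@dataclass
class Pump:
    on_deck: Optional[str]   # Hold value of the instruction at OD, or None if empty
    loop_count: int


def try_torpedo(pump: Pump) -> bool:
    """Return True if a torpedo waiting at IH strikes now.

    A strike annihilates the on-deck instruction and sets LC to zero;
    otherwise the torpedo keeps waiting and the pump is unchanged.
    """
    if pump.on_deck in TORPEDOABLE:
        pump.on_deck = None
        pump.loop_count = 0
        return True
    return False
```

Setting {\tt LC} to zero is what ends a composite instruction: any relooped copies still in the fifo die when they reach {\tt OD}, because a {\tt body} or {\tt tail} instruction that finds {\tt LC} equal to zero there neither executes nor reloops.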
723 %\subsection{{\tt interrupt}}
725 %\setlength{\bitwidth}{5mm}
727 %\begin{bytefield}{26}
728 % \bitheader[b]{0,5,16-19,21}\\
739 %When an {\tt interrupt} instruction reaches {\tt IH}, it will wait
740 %there for the {\tt OD} stage to be full with an instruction that has
741 %the {\tt IM} bit set. When this occurs, the instruction at {\tt OD}
742 %{\it will not execute}, but {\it may reloop} if the conditions for
744 %\footnote{The ability to interrupt an instruction yet have it reloop is very
745 %useful for processing chunks of data with a fixed size header and/or
746 %footer and a variable length body.}
749 %\subsection{{\tt massacre}}
751 %\setlength{\bitwidth}{5mm}
753 %\begin{bytefield}{26}
754 % \bitheader[b]{16-19,21}\\
766 %When a {\tt massacre} instruction reaches {\tt IH}, it will wait there
767 %for the {\tt OD} stage to be full with an instruction that has the
768 %{\tt IM} bit set. When this occurs, all instructions in the
769 %instruction fifo (including {\tt OD}) are retired.
771 %\subsection{{\tt clog}}
773 %\setlength{\bitwidth}{5mm}
775 %\begin{bytefield}{26}
776 % \bitheader[b]{16-19,21}\\
788 %When a {\tt clog} instruction reaches {\tt OD}, it remains there and
789 %no more instructions will be executed until an {\tt unclog} is
792 %\subsection{{\tt unclog}}
794 %\setlength{\bitwidth}{5mm}
796 %\begin{bytefield}{26}
797 % \bitheader[b]{16-19,21}\\
803 % \bitbox[lrtb]{2}{11}
809 %When an {\tt unclog} instruction reaches {\tt IH}, it will wait there
810 %until a {\tt clog} instruction is at {\tt OD}. When this occurs, both
811 %instructions retire.
813 %Note that issuing an {\tt unclog} instruction to a dock which is not
814 %clogged and whose instruction fifo contains no {\tt clog} instructions
815 %will cause the dock to deadlock.
820 \epsfig{file=overview,height=5in,angle=90}
823 \epsfig{file=ports,height=5in,angle=90}
826 \epsfig{file=best,height=5in,angle=90}