1 \documentclass[10pt]{article}
6 \usepackage{bytefield1}
15 \bibliographystyle{alpha}
16 \pagestyle{fancyplain}
18 \definecolor{light}{gray}{0.7}
20 \newcommand{\footnoteremember}[2]{
23 \setcounter{#1}{\value{footnote}}
24 } \newcommand{\footnoterecall}[1]{
25 \footnotemark[\value{#1}]
33 %\oddsidemargin 0.25in
34 %\evensidemargin 0.25in
36 \def\to{\ $\rightarrow$\ }
46 \title{\vspace{-1cm}The FleetTwo Dock}
58 \epsfig{file=overview,width=1.5in}
59 \epsfig{file=ports,width=1.5in}
60 \epsfig{file=best,width=1.5in}
65 \section{Overview of Fleet}
67 A Fleet processor consists of a {\it switch fabric} with several
68 functional units called {\it ships} connected to it. At each
69 connection between a ship and the switch fabric lies a programmable
70 element known as the {\it dock}.
72 A {\it path} specifies a route through the switch fabric from a
73 particular {\it source} to a particular {\it destination}. The
74 combination of a path and a single word {\it payload} is called a {\it packet}. The
75 switch fabric carries packets from their sources to their
76 destinations. Each dock has two destinations: one for {\it
77 instructions} and one for {\it data}. A Fleet is programmed by
78 depositing packets into the switch fabric; these packets' paths lead
79 them to the instruction destinations of the docks.
81 When a packet arrives at the instruction destination of a dock, it is
82 enqueued for execution. Before the instruction executes, it may cause
83 the dock to wait for a packet to arrive at the dock's data destination
84 or for a value to be presented by the ship. It may present a data
85 value to the ship or transmit it for transmission to some other
88 When an instruction sends a packet into the switch fabric, it may
89 specify that the payload of the packet is irrelevant. Such packets
90 are known as {\it tokens}, and consume less energy than data packets.
91 From a programmer's perspective, a token packet is indistinguishable
92 from a data packet with a unknown payload.
95 \epsfig{file=overview,width=4in}\\
96 {\it Overview of a Fleet processor}
101 \section{The Ship-Switch Fabric Interface}
103 The diagram below represents a {\it programmer's} conceptual view of
104 the interface between ships and the switch fabric. Actual
105 implementation circuitry may differ substantially. Sources and
106 destinations that can send and receive only tokens -- not data items
107 -- are drawn as dashed lines.
110 \epsfig{file=ports,width=4in}\\
111 {\it The interface betwen the switch fabric and the ship}
114 The term {\it port} refers to an interface to the ship, the {\it
115 dock} connecting it to the switch fabric, and the corresponding
116 sources and destinations on the switch fabric.
118 Each dock consists of a {\it data latch}, which is as wide as a
119 single machine word and a {\it pump}, which is a circular fifo of
120 instruction-width latches. The values in the instruction fifo
121 control the data latch.
123 Note that the pump in each dock has a destination of its own; this is
124 the {\it instruction destination} mentioned in the previous section.
125 Note that unlike all other destinations, there is no buffering fifo
126 guarding this one. The size of these fifos are exposed to the
127 software programmer so she can avoid deadlock.
131 \section{The FleetTwo Pump}
133 The diagram below shows the datapath for the FleetTwo pump circuitry.
134 The square box marked {\tt D} on the output from the {\tt IH} latch is
135 the instruction decoder, which decodes word-width instructions into a
136 set of control signals suitable for operating the pump. The boxes
137 marked {\tt CD} are carry detectors. These detect zero values in the
138 count and also generate the partial differences used in the decrement
142 \epsfig{file=best,width=4in}\\
143 {\it The pump datapath}
146 The latches of primary interest here are:
148 \item {\tt IH}: Instruction Horn (leaf node; may be shared)
149 \item {\tt F0}: Fifo Stage 0 (first fifo stage)
150 \item {\tt OD}: On Deck
151 \item {\tt F}: Flags, {\tt NF}: Next Flags
152 \item {\tt P}: Path (the path to use for outbound data/tokens)
154 \item {\tt DP}: Data Predecessor (ship for output ports, switch fabric for input ports)
155 \item {\tt DS}: Data Successor (switch fabric for output ports, ship for input ports)
156 \item {\tt RC}: Repeat Count, {\tt NRC}: Next Repeat Count
157 \item {\tt LC}: Loop Count, {\tt NLC}: Next Loop Count
160 Each instruction that executes causes the latches of the pump to fire
161 in two phases, denoted as the ``left phase'' and the ``right phase''.
162 In the diagram, the left phase latches are those to the left of the
163 vertical line down the center, and the right phase latches are to the
164 right. Therefore each instruction execution requires two GasP
165 pipeline stages to complete.
169 The pump has four flags: {\tt A}, {\tt B}, {\tt S}, {\tt Z}. Of
170 these four, only the first two may be modified directly by
174 \item The {\tt A} and {\tt B} flags are general-purpose flags which
175 may be set and cleared by the programmer.
177 \item The {\tt S} flag, known as the {\it summary} flag. Its value is
178 determined by the ship, but unless stated otherwise, it should
179 be assumed that whenever the 37th bit of the data ({\tt D})
180 latch is loaded, that same bit is also loaded into the {\tt S}
181 flag. This lets the ship make decisions based on whether or not
182 the top bit of the data latch is set; if two's complement
183 numbers are in use, this will indicate whether or not the
184 latched value is negative.
186 \item The {\tt Z} flag, known as the {\it zero} flag, is set whenever
187 the value in the loop counter ({\tt LC}) is zero. This flag can
188 be used to perform certain operations (such as sending a
189 completion token) only on the last iteration of a loop.
192 Many instruction fields are specified as two-bit {\it predicates}.
193 These fields contain one of four values, indicating if an action
194 should be taken unconditionally or conditionally on one of the {\tt A}
198 \item {\tt 00:} if {\tt A} is set
199 \item {\tt 10:} if {\tt B} is set
200 \item {\tt 01:} if {\tt Z} is set ({\tt LC=0})
201 \item {\tt 11:} always
205 \section{Instructions}
207 In order to cause an instruction to execute, the programmer must first
208 cause that instruction word to arrive in the data latch of some output
209 dock. For example, this might be the ``data read'' output dock of the
210 memory access ship or the output of a fifo ship. Once an instruction
211 has arrived at this output dock, it is {\it dispatched} by sending it
212 to the {\it instruction port} of the dock at which it is to execute.
214 Each instruction is 26 bits long, which makes it possible for an
215 instruction and an 11-bit path to fit in a single word of memory.
216 This path is the path from the {\it dispatching} dock to the {\it
219 \setlength{\bitwidth}{3.5mm}
221 \begin{bytefield}{37}
222 \bitheader[b]{0,25,26,36}\\
223 \bitbox{11}{dispatch path}
224 \bitbox{26}{instruction}
227 {\bf Note:} the instruction encodings below are simply ``something to
228 shoot at'' and a sanity check to make sure we haven't overrun our bit
229 budget. The final instruction encodings will probably be
232 All instructions other than {\tt interrupt}, {\tt massacre}, {\tt clog},
233 and {\tt unclog} have the following format:
235 \setlength{\bitwidth}{3.5mm}
237 \begin{bytefield}{37}
238 \bitheader[b]{0,21,22,23,24,25,26,36}\\
240 \bitbox{11}{dispatch path}
250 The abbreviation {\tt IM} stands for {\it Interruptible/Massacreable}.
251 An {\tt interrupt} or {\tt massacre} instruction will not execute
252 until an instruction with the {\tt IM} bit is present at the head of
253 the instruction fifo.
255 The abbreviation {\tt DL} stands for {\it Decrement Loop}; if this bit
256 is set, the loop counter decrements. Once an instruction has
257 finished executing (including repeating, if applicable), the
258 instruction will reloop if the loop count ({\tt LC}) value was
259 greater than zero {\it prior to decrementing}.
261 The abbreviation {\tt P} stands for {\it predicate}; this is a two-bit
262 code that indicates if the instruction should be executed or ignored.
263 If an instruction is ignored, it might still reloop.
266 \subsection{RePeating and ReLooping}
270 \begin{minipage}{3in}
273 \begin{tabular}{|r|c|c|}\hline
274 & RePeating? & ReLooping? \\\hline
275 {\tt send} & Y & Y \\\hline
276 {\tt literal} & N & Y \\\hline
277 {\tt flags} & N & Y \\\hline
278 {\tt repeat} & N & Y \\
280 \footnote{note, however, that the decision to reloop or not is based on the value in the loop counter {\it before} execution of the {\tt loop} instruction}
282 {\tt takeLoopCounter} & N & Y \\
283 {\tt takeRepeatCounter} & N & Y \\
285 {\tt clog} & N & N \\
286 {\tt unclog} & n/a & n/a \\
287 {\tt interrupt} & n/a & n/a \\
288 {\tt massacre} & n/a & n/a \\\hline
292 \caption{classification of instructions}
296 An instruction will repeat if it is classified as a repeating
297 instruction and the repeat counter is nonzero. Instructions
298 non-repeating instructions have no effect on the repeat
299 counter (except for {\tt repeat}, of course).
302 An instruction reloops if {\bf all} of the following are true:
304 \item The repeat counter has reached zero.
305 \item The instruction is a ``relooping instruction'' as shown below.
306 \item The loop counter is greater than zero. This check takes into
307 account the result of a {\tt literal} instruction, but does not
308 take into account the effect of the {\tt DL} bit.
310 {\it Important:} if an instruction is {\it not} going to reloop
311 (according to the rules above), it must execute even if the tail of
312 the instruction fifo is occupied (full).
316 \subsection{{\tt send} (variants: {\tt sendto}, {\tt dispatch})}
318 \setlength{\bitwidth}{5mm}
320 \begin{bytefield}{26}
321 \bitheader[b]{12-16,19,21}\\
339 %\begin{bytefield}{26}
340 % \bitheader[b]{12-18}\\
341 % \bitbox[]{8}{\raggedleft Input Dock:}
348 %\begin{bytefield}{26}
349 % \bitheader[b]{12-18}\\
350 % \bitbox[]{8}{\raggedleft Output Dock:}
357 \begin{bytefield}{26}
358 \bitheader[b]{0,10,11}\\
359 \bitbox[1]{13}{\raggedleft {\tt sendto} ({\tt LiteralPath\to Path})}
362 \bitbox{11}{\tt LiteralPath}
365 \begin{bytefield}{26}
366 \bitheader[b]{10,11}\\
367 \bitbox[1]{13}{\raggedleft {\tt dispatch} ({\tt DP[37:27]\to Path})\ \ }
376 \begin{bytefield}{26}
377 \bitheader[b]{10,11}\\
378 \bitbox[1]{13}{\raggedleft {\tt send} ({\tt Path} unchanged):}
388 \item {\tt Ti} - Token Input: wait for the token predecessor to be full and drain it.
389 \item {\tt Di} - Data Input: wait for the data predecessor to be full and drain it.
390 \item {\tt Dc} - Data Capture: pulse the data latch.
391 \item {\tt Do} - Data Output: fill the data successor.
392 \item {\tt To} - Token Output: fill the token successor.
395 The {\tt F0}, {\tt DS}, and {\tt TS} stages must all be empty in order for an
396 instruction to execute.
398 The repeat counter can hold a number {\tt 0..MAX} or a special value
399 $\infty$. If the repeat count ({\tt RC}) holds a value other than
400 $\infty$, it is latched with {\tt max(RC-1, 0)}. If the repeat
401 counter reaches zero, the instruction ceases executing and either
402 reloops or retires (see earlier section for details).
406 \subsection{{\tt data}, {\tt datahi}, {\tt datalo}}
408 These instructions load part or all of the data latch ({\tt DL}).
410 {\tt datahi: Literal[18:1]\to D[37:20]} (and {\tt Literal[18]\to S})
412 \setlength{\bitwidth}{5mm}
414 \begin{bytefield}{26}
415 \bitheader[b]{0,18,19,21}\\
429 {\tt datalo: Literal[19:1]\to D[19:1]}
431 \setlength{\bitwidth}{5mm}
433 \begin{bytefield}{26}
434 \bitheader[b]{0,18,19,21}\\
447 \setlength{\bitwidth}{5mm}
449 \begin{bytefield}{26}
450 \bitheader[b]{0,18,19,21}\\
462 \begin{tabular}{|r|c|c|c|}\hline
463 sel & D[37:20] & D[19:1] \\\hline
464 00 & Literal[18:1] & all 0 \\
465 01 & Literal[18:1] & all 1 \\
466 10 & all 0 & Literal[19:1] \\
467 11 & all 1 & Literal[19:1] \\
474 \subsection{{\tt flags}}
476 \setlength{\bitwidth}{5mm}
478 \begin{bytefield}{26}
479 \bitheader[b]{0,7,8,15,16-19,21}\\
493 The {\tt P} field is a predicate; if it does not hold, the instruction
494 is ignored. Otherwise the two flags ({\tt A} and {\tt B}) are updated
495 according to the {\tt nextA} and {\tt nextB} fields; each specifies
496 the new value as the logical {\tt OR} of zero or more inputs:
502 \bitbox{1}{${\text{\tt A}}$}
503 \bitbox{1}{$\overline{\text{\tt A}}$}
504 \bitbox{1}{${\text{\tt B}}$}
505 \bitbox{1}{$\overline{\text{\tt B}}$}
506 \bitbox{1}{${\text{\tt S}}$}
507 \bitbox{1}{$\overline{\text{\tt S}}$}
508 \bitbox{1}{${\text{\tt Z}}$}
509 \bitbox{1}{$\overline{\text{\tt Z}}$}
513 Each bit corresponds to one possible input; all inputs whose bits are
514 set are {\tt OR}ed together, and the resulting value is assigned to
515 the flag. Note that if none of the bits are set, the value assigned
516 is zero. Note also that it is possible to produce a {\tt 1} by {\tt
517 OR}ing any flag with its complement.
522 \subsection{{\tt repeat}}
524 This instruction loads the repeat counter with either a literal
525 number, the special value $\infty$, or the contents of the {\tt data}
528 \setlength{\bitwidth}{5mm}
530 \begin{bytefield}{26}
531 \bitheader[b]{16-19,21}\\
546 \begin{bytefield}{26}
547 \bitbox[r]{18}{\raggedleft from data latch:\hspace{0.2cm}\ }
554 \begin{bytefield}{26}
555 \bitheader[b]{0,5,6,7}\\
556 \bitbox[r]{18}{\raggedleft from literal:\hspace{0.2cm}\ }
558 \bitbox{6}{\tt Literal}
561 \begin{bytefield}{26}
562 \bitheader[b]{0,5,6,7}\\
563 \bitbox[r]{18}{\raggedleft with $\infty$\ \ }
571 \subsection{{\tt loop}}
573 This instruction loads the loop counter with either a literal or the
574 contents of the {\tt data} register.
576 \setlength{\bitwidth}{5mm}
578 \begin{bytefield}{26}
579 \bitheader[b]{16-19,21,24}\\
596 \begin{bytefield}{26}
597 \bitbox[r]{19}{\raggedleft from data latch:\hspace{0.2cm}\ }
604 \begin{bytefield}{26}
605 \bitheader[b]{0,5,6}\\
606 \bitbox[r]{19}{\raggedleft from literal:\hspace{0.2cm}\ }
608 \bitbox{6}{\tt Literal}
611 Note that the bit normally marked {\tt DL} (``decrement loop
612 counter'') {\bf must} be set to {\tt
616 \subsection{{\tt takeLoopCounter}}
618 \setlength{\bitwidth}{5mm}
620 \begin{bytefield}{26}
621 \bitheader[b]{16-19,21}\\
635 The {\tt P} field is a predicate; if it does not hold, the instruction
636 is ignored (but may reloop). This instruction copies the value in the
637 loop counter {\tt LC} into the least significant bits of the data
638 latch and leaves all other bits of the data latch unchanged.
640 \subsection{{\tt takeRepeatCounter}}
642 \setlength{\bitwidth}{5mm}
644 \begin{bytefield}{26}
645 \bitheader[b]{16-19,21}\\
659 The {\tt P} field is a predicate; if it does not hold, the instruction
660 is ignored (but may reloop). This instruction copies the value in the
661 repeat counter {\tt RC} into the least significant bits of the data
662 latch and leaves all other bits of the data latch unchanged.
665 \subsection{{\tt interrupt}}
667 \setlength{\bitwidth}{5mm}
669 \begin{bytefield}{26}
670 \bitheader[b]{0,5,16-19,21}\\
683 When an {\tt interrupt} instruction reaches {\tt IH}, it will wait
684 there for the {\tt OD} stage to be full with an instruction that has
685 the {\tt IM} bit set. When this occurs, the instruction at {\tt OD}
686 {\it will not execute}, but {\it may reloop} if the conditions for
688 \footnote{The ability to interrupt an instruction yet have it reloop is very
689 useful for processing chunks of data with a fixed size header and/or
690 footer and a variable length body.}
693 \subsection{{\tt massacre}}
695 \setlength{\bitwidth}{5mm}
697 \begin{bytefield}{26}
698 \bitheader[b]{16-19,21}\\
710 When a {\tt massacre} instruction reaches {\tt IH}, it will wait there
711 for the {\tt OD} stage to be full with an instruction that has the
712 {\tt IM} bit set. When this occurs, all instructions in the
713 instruction fifo (including {\tt OD}) are retired.
715 \subsection{{\tt clog}}
717 \setlength{\bitwidth}{5mm}
719 \begin{bytefield}{26}
720 \bitheader[b]{16-19,21}\\
732 When a {\tt clog} instruction reaches {\tt OD}, it remains there and
733 no more instructions will be executed until an {\tt unclog} is
736 \subsection{{\tt unclog}}
738 \setlength{\bitwidth}{5mm}
740 \begin{bytefield}{26}
741 \bitheader[b]{16-19,21}\\
753 When an {\tt unclog} instruction reaches {\tt IH}, it will wait there
754 until a {\tt clog} instruction is at {\tt OD}. When this occurs, both
757 Note that issuing an {\tt unclog} instruction to a dock which is not
758 clogged and whose instruction fifo contains no {\tt clog} instructions
759 will cause the dock to deadlock.
764 \epsfig{file=overview,height=5in,angle=90}
767 \epsfig{file=ports,height=5in,angle=90}
770 \epsfig{file=best,height=5in,angle=90}