1 \documentclass[10pt]{article}
6 \usepackage{bytefield1}
15 \bibliographystyle{alpha}
16 \pagestyle{fancyplain}
18 \definecolor{light}{gray}{0.7}
20 \newcommand{\footnoteremember}[2]{
23 \setcounter{#1}{\value{footnote}}
24 } \newcommand{\footnoterecall}[1]{
25 \footnotemark[\value{#1}]
33 %\oddsidemargin 0.25in
34 %\evensidemargin 0.25in
36 \def\to{\ $\rightarrow$\ }
46 \title{\vspace{-1cm}The FleetTwo Dock}
59 & renamed loop+repeat to outer+inner (not in red) \\
60 & renamed {\tt Z} flag to {\tt L} flag (not in red) \\
61 & rewrote ``inner and outer loops'' section \\
62 & updated all diagrams \\
65 & Moved address bits to the LSB-side of a 37-bit instruction \\
66 & Added {\it micro-instruction} and {\it composite instruction} terms \\
67 & Removed the {\tt DL} field, added {\tt decrement} mode to {\tt loop} \\
68 & Created the {\tt Hold} field \\
69 & Changed how ReLooping works \\
70 & Removed {\tt clog}, {\tt unclog}, {\tt interrupt}, and {\tt massacre} \\
77 \epsfig{file=overview,width=1.5in}
78 \epsfig{file=indock,width=3in}
83 \section{Overview of Fleet}
85 A Fleet processor consists of a {\it switch fabric} with several
86 functional units called {\it ships} connected to it. At each
87 connection between a ship and the switch fabric lies a programmable
88 element known as the {\it dock}.
90 A {\it path} specifies a route through the switch fabric from a
91 particular {\it source} to a particular {\it destination}. The
92 combination of a path and a single word {\it payload} is called a {\it packet}. The
93 switch fabric carries packets from their sources to their
94 destinations. Each dock has two destinations: one for {\it
95 instructions} and one for {\it data}. A Fleet is programmed by
96 depositing packets into the switch fabric; these packets' paths lead
97 them to the instruction destinations of the docks.
99 When a packet arrives at the instruction destination of a dock, it is
100 enqueued for execution. Before the instruction executes, it may cause
101 the dock to wait for a packet to arrive at the dock's data destination
102 or for a value to be presented by the ship. It may present a data
103 value to the ship or transmit it for transmission to some other
106 When an instruction sends a packet into the switch fabric, it may
107 specify that the payload of the packet is irrelevant. Such packets
108 are known as {\it tokens}, and consume less energy than data packets.
109 From a programmer's perspective, a token packet is indistinguishable
110 from a data packet with a unknown payload.
113 \epsfig{file=overview,width=3in}\\
115 {\it Overview of a Fleet processor; gray shading represents a
116 packet-switched network fabric; blue lines carry data, red lines
123 \section{The FleetTwo Pump}
125 The diagram below represents a {\it programmer's} conceptual view of
126 the interface between ships and the switch fabric. Actual
127 implementation circuitry may differ substantially. Sources and
128 destinations that can send and receive only tokens -- not data items
129 -- are drawn as dashed lines.
132 \epsfig{file=indock,width=3.5in}\\
133 {\it an ``input'' dock}
135 \epsfig{file=outdock,width=3.5in}\\
136 {\it an ``output'' dock}
139 The term {\it port} refers to an interface to the ship, the {\it
140 dock} connecting it to the switch fabric, and the corresponding
141 sources and destinations on the switch fabric.
143 Each dock consists of a {\it data latch}, which is as wide as a single
144 machine word and a {\it pump}, which is a circular fifo of
145 instruction-width latches. The values in the pump control the data
148 Note that the pump in each dock has a destination of its own; this is
149 the {\it instruction destination} mentioned in the previous section.
150 Note that unlike all other destinations, there is no buffering fifo
151 guarding this one. The size of these fifos are exposed to the
152 software programmer so \color{red}he\color{black}\ can avoid deadlock.
155 \section{Instructions}
157 In order to cause an instruction to execute, the programmer must first
158 cause that instruction word to arrive in the data latch of some output
159 dock. For example, this might be the ``data read'' output dock of the
160 memory access ship or the output of a fifo ship. Once an instruction
161 has arrived at this output dock, it is {\it dispatched} by sending it
162 to the {\it instruction port} of the dock at which it is to execute.
164 Each instruction is 26 bits long, which makes it possible for an
165 instruction and an 11-bit path to fit in a single word of memory.
166 This path is the path from the {\it dispatching} dock to the {\it
169 \setlength{\bitwidth}{3.5mm}
171 \begin{bytefield}{37}
172 \bitheader[b]{0,10,11,36}\\
173 \bitbox{26}{instruction}
174 \bitbox{11}{dispatch path}
177 {\bf Note:} the instruction encodings below are simply ``something to
178 shoot at'' and a sanity check to make sure we haven't overrun our bit
179 budget. The final instruction encodings will probably be
182 All instruction words have the following format:
184 \setlength{\bitwidth}{3.5mm}
186 \begin{bytefield}{37}
187 \bitheader[b]{0,10,11,36}\\
194 \bitbox{11}{dispatch path}
198 Each instruction word is called a {\it micro instruction}.
199 Collections of one or more micro instruction are known as {\it
200 composite instructions}.
204 The {\tt A} bit stands for {\tt Armor}\footnote{this is to be
205 pronounced with a Boston accent (``AAHH-mir'')}.
206 The {\tt OL} bit indicates whether or not this instruction is part of
207 an outer loop. Both of the preceding bits are explained in the next section.
211 The abbreviation {\tt P} stands for {\it predicate}; this is a two-bit
212 code that indicates if the instruction should be executed or ignored.
217 \subsection{Life Cycle of an Instruction}
219 The diagram below shows an input dock for purposes of illustration
220 (behavior at an output dock is identical).
223 \epsfig{file=indock,width=3in}\\
227 Note the circle on the path between ``instr horn'' and ``instr fifo'';
228 this is known as ``the hatch''. The hatch has two states: sealed and
229 unsealed. When the machine powers up, the hatch is unsealed; it is
230 sealed by the {\tt tail} instruction and unsealed as described below.
232 When a non-{\tt torpedo} instruction arrives at the instruction horn,
233 it waits there until the hatch is in the unsealed state. The
234 instruction then enters the instruction fifo.
236 When an instruction emerges from the instruction fifo, it arrives at
237 the ``on deck'' stage and starts two processes {\it
243 If the instruction has the {\tt OL} bit set and the value of {\tt OC}
244 is nonzero, this process will wait for the hatch to be {\it sealed}
245 and then enqueue a duplicate copy of the instruction into the instruction fifo.
249 If the value of {\tt OC} is zero and the instruction at on-deck is
250 {\it not} an {\tt setOuter} instruction whose {\tt OL} bit is cleared,
251 this process will unseal the hatch, set the {\it inner} loop counter
252 to zero, and terminate.
254 Otherwise, the instruction will execute one or more times, as
255 determined by the flags, predicate, and inner loop counter (see
256 below). If the instruction's {\tt A}rmor bit is not
257 set\footnote{note: we need to say something about only {\tt send}
258 instructions being {\tt torpedo}able}, each execution attempt will
259 be arbitrated against the arrival of a {\tt torpedo} instruction at
260 the instruction horn. If the {\tt torpedo} wins, the {\it outer} loop
261 counter is set to zero and this process terminates immediately.
264 When both processes have completed, the on deck stage is vacated and
265 another instruction may enter it.
267 Note that when a {\tt torpedo} arrives at the instruction horn it will
268 wait there until on deck is occupied by an instruction whose {\tt
269 A}rmor bit is {\it not set}, but it does {\it not} wait for the
270 hatch to be unsealed.
274 \subsection{Inner and Outer Loops}
278 Using the mechanisms described above, a programmer can perform two
279 types of loops: {\it inner} loops of only one instruction and {\it
280 outer} loops of multiple instructions. Inner loops may be nested
281 within an outer loop, but no other nesting of loops is allowed. The
282 paths used by inner loops and outer loops are shown below:
285 \begin{minipage}{2in}
287 \epsfig{file=inner-loop,width=2in}\\
288 {\it inner loop (in red)}
291 \begin{minipage}{2in}
293 \epsfig{file=outer-loop,width=2in}\\
294 {\it outer loop (in red)}
299 Each type of loop has a counter associated with it: the {\tt IC}
300 counter for inner loops and the {\tt OC} counter for outer loops.
301 The inner loop counter applies only to {\tt send} instructions; all
302 other instructions ignore the inner loop counter. When a {\tt send}
303 instruction reaches the on deck position, it will execute at least
304 once; the number of times it executes after that is determined by the
307 The outer loop counter applies to all instructions {\it except} the
308 instruction {\tt setOuter} with {\tt OL=0}, because such instructions
309 are needed to reset the outer loop counter after it becomes zero.
310 However, {\tt setOuter} with {\tt OL} set to {\it one} is useful for
311 resetting the loop counter in the middle of the execution of a loop.
317 The pump has four flags: {\tt A}, {\tt B}, {\tt S}, {\tt L}. Of
318 these four, only the first two may be modified directly by
322 \item The {\tt A} and {\tt B} flags are general-purpose flags which
323 may be set and cleared by the programmer.
327 The {\tt L} flag, known as the {\it last} flag, is set whenever
328 the value in the outer counter ({\tt OC}) is one,
331 that the dock is in the midst of the last iteration of an
332 outer loop. This flag can be used to perform certain
333 operations (such as sending a completion token) only on the last
334 iteration of an outer loop.
336 \item The {\tt S} flag, known as the {\it summary} flag. Its value is
337 determined by the ship, but unless stated otherwise, it should
338 be assumed that whenever the 37th bit of the data ({\tt D})
339 latch is loaded, that same bit is also loaded into the {\tt S}
340 flag. This lets the ship make decisions based on whether or not
341 the top bit of the data latch is set; if two's complement
342 numbers are in use, this will indicate whether or not the
343 latched value is negative.
346 Many instruction fields are specified as two-bit {\it predicates}.
347 These fields contain one of four values, indicating if an action
348 should be taken unconditionally or conditionally on one of the {\tt A}
352 \item {\tt 00:} if {\tt A} is set
353 \item {\tt 10:} if {\tt B} is set
354 \item {\tt 01:} if {\tt L} is set ({\tt OC=1})
355 \item {\tt 11:} always
360 \subsection{{\tt send} (variants: {\tt sendto}, {\tt dispatch})}
362 \setlength{\bitwidth}{5mm}
364 \begin{bytefield}{26}
365 \bitheader[b]{12-16,19,21}\\
383 %\begin{bytefield}{26}
384 % \bitheader[b]{12-18}\\
385 % \bitbox[]{8}{\raggedleft Input Dock:}
392 %\begin{bytefield}{26}
393 % \bitheader[b]{12-18}\\
394 % \bitbox[]{8}{\raggedleft Output Dock:}
401 \begin{bytefield}{26}
402 \bitheader[b]{0,10,11}\\
403 \bitbox[1]{13}{\raggedleft {\tt sendto} ({\tt LiteralPath\to Path})}
406 \bitbox{11}{\tt LiteralPath}
409 \begin{bytefield}{26}
410 \bitheader[b]{10,11}\\
411 \bitbox[1]{13}{\raggedleft {\tt dispatch} ({\tt DP[37:27]\to Path})\ \ }
420 \begin{bytefield}{26}
421 \bitheader[b]{10,11}\\
422 \bitbox[1]{13}{\raggedleft {\tt send} ({\tt Path} unchanged):}
432 \item {\tt Ti} - Token Input: wait for the token predecessor to be full and drain it.
433 \item {\tt Di} - Data Input: wait for the data predecessor to be full and drain it.
434 \item {\tt Dc} - Data Capture: pulse the data latch.
435 \item {\tt Do} - Data Output: fill the data successor.
436 \item {\tt To} - Token Output: fill the token successor.
439 The data successor and token successor must both be empty in order for
440 a {\tt send} instruction to attempt execution.
442 The inner loop counter can hold a number {\tt 0..MAX} or a special
443 value $\infty$. If {\tt IC} is nonzero after execution of a {\tt
444 send} instruction, the instruction will execute again, and {\tt IC}
445 will be latched with {\tt (IC==$\infty$?$\infty$:max(IC-1, 0))}. When
446 the inner loop counter reaches zero, the instruction ceases executing.
450 \subsection{{\tt data}, {\tt datahi}, {\tt datalo}}
452 These instructions load part or all of the data latch ({\tt D}).
454 {\tt datahi: Literal[18:1]\to D[37:20]} (and {\tt Literal[18]\to S})
456 \setlength{\bitwidth}{5mm}
458 \begin{bytefield}{26}
459 \bitheader[b]{0,18,19,21}\\
473 {\tt datalo: Literal[19:1]\to D[19:1]}
475 \setlength{\bitwidth}{5mm}
477 \begin{bytefield}{26}
478 \bitheader[b]{0,18,19,21}\\
491 \setlength{\bitwidth}{5mm}
493 \begin{bytefield}{26}
494 \bitheader[b]{0,18,19,21}\\
506 \begin{tabular}{|r|c|c|c|}\hline
507 sel & D[37:20] & D[19:1] \\\hline
508 00 & Literal[18:1] & all 0 \\
509 01 & Literal[18:1] & all 1 \\
510 10 & all 0 & Literal[19:1] \\
511 11 & all 1 & Literal[19:1] \\
518 \subsection{{\tt flags}}
520 \setlength{\bitwidth}{5mm}
522 \begin{bytefield}{26}
523 \bitheader[b]{0,7,8,15,16-19,21}\\
537 The {\tt P} field is a predicate; if it does not hold, the instruction
538 is ignored. Otherwise the two flags ({\tt A} and {\tt B}) are updated
539 according to the {\tt nextA} and {\tt nextB} fields; each specifies
540 the new value as the logical {\tt OR} of zero or more inputs:
546 \bitbox{1}{${\text{\tt A}}$}
547 \bitbox{1}{$\overline{\text{\tt A}}$}
548 \bitbox{1}{${\text{\tt B}}$}
549 \bitbox{1}{$\overline{\text{\tt B}}$}
550 \bitbox{1}{${\text{\tt S}}$}
551 \bitbox{1}{$\overline{\text{\tt S}}$}
552 \bitbox{1}{${\text{\tt L}}$}
553 \bitbox{1}{$\overline{\text{\tt L}}$}
557 Each bit corresponds to one possible input; all inputs whose bits are
558 set are {\tt OR}ed together, and the resulting value is assigned to
559 the flag. Note that if none of the bits are set, the value assigned
560 is zero. Note also that it is possible to produce a {\tt 1} by {\tt
561 OR}ing any flag with its complement.
566 \subsection{{\tt setInner}}
568 This instruction loads the inner loop counter with either a literal
569 number, the special value $\infty$, or the contents of the {\tt data}
572 \setlength{\bitwidth}{5mm}
574 \begin{bytefield}{26}
575 \bitheader[b]{16-19,21}\\
590 \begin{bytefield}{26}
591 \bitbox[r]{18}{\raggedleft from data latch:\hspace{0.2cm}\ }
598 \begin{bytefield}{26}
599 \bitheader[b]{0,5,6,7}\\
600 \bitbox[r]{18}{\raggedleft from literal:\hspace{0.2cm}\ }
602 \bitbox{6}{\tt Literal}
605 \begin{bytefield}{26}
606 \bitheader[b]{0,5,6,7}\\
607 \bitbox[r]{18}{\raggedleft with $\infty$\ \ }
615 \subsection{{\tt setOuter}}
617 This instruction loads the outer loop counter {\tt OC} with either
618 {\tt max(0,OC-1)}, a literal or the contents of the {\tt data}
621 \setlength{\bitwidth}{5mm}
623 \begin{bytefield}{26}
624 \bitheader[b]{16-19,21,24}\\
640 \begin{bytefield}{26}
641 \bitbox[r]{19}{\raggedleft {\tt max(0,OC-1)}:\hspace{0.2cm}\ }
649 \begin{bytefield}{26}
650 \bitbox[r]{19}{\raggedleft from data latch:\hspace{0.2cm}\ }
657 \begin{bytefield}{26}
658 \bitheader[b]{0,5,6}\\
659 \bitbox[r]{19}{\raggedleft from literal:\hspace{0.2cm}\ }
661 \bitbox{6}{\tt Literal}
665 \subsection{{\tt takeOuterLoopCounter}}
667 \setlength{\bitwidth}{5mm}
669 \begin{bytefield}{26}
670 \bitheader[b]{16-19,21}\\
684 This instruction copies the value in the outer loop counter {\tt OC}
685 into the least significant bits of the data latch and leaves all other
686 bits of the data latch unchanged.
688 \subsection{{\tt takeInnerLoopCounter}}
690 \setlength{\bitwidth}{5mm}
692 \begin{bytefield}{26}
693 \bitheader[b]{16-19,21}\\
707 This instruction copies the value in the inner loop counter {\tt IC}
708 into the least significant bits of the data latch and leaves all other
709 bits of the data latch unchanged.
712 \subsection{{\tt torpedo}}
714 \setlength{\bitwidth}{5mm}
716 \begin{bytefield}{26}
717 \bitheader[b]{0,5,16-19,21}\\
729 When a {\tt torpedo} instruction reaches the instruction horn, it will
730 wait there until an instruction is on deck whose {\tt A}rmor bit is
731 not set. The {\tt torpedo} will then cause ``Process \#2'' of the on
732 deck instruction to terminate and will set the outer loop counter to zero.
734 \subsection{{\tt tail}}
736 \setlength{\bitwidth}{5mm}
738 \begin{bytefield}{26}
739 \bitheader[b]{0,5,16-19,21}\\
750 When a {\tt tail} instruction reaches {\tt IH}, it seals the hatch.
751 The {\tt tail} instruction does not enter the instruction fifo.
757 %\subsection{{\tt interrupt}}
759 %\setlength{\bitwidth}{5mm}
761 %\begin{bytefield}{26}
762 % \bitheader[b]{0,5,16-19,21}\\
773 %When an {\tt interrupt} instruction reaches {\tt IH}, it will wait
774 %there for the {\tt OD} stage to be full with an instruction that has
775 %the {\tt IM} bit set. When this occurs, the instruction at {\tt OD}
776 %{\it will not execute}, but {\it may reloop} if the conditions for
778 %\footnote{The ability to interrupt an instruction yet have it reloop is very
779 %useful for processing chunks of data with a fixed size header and/or
780 %footer and a variable length body.}
783 %\subsection{{\tt massacre}}
785 %\setlength{\bitwidth}{5mm}
787 %\begin{bytefield}{26}
788 % \bitheader[b]{16-19,21}\\
800 %When a {\tt massacre} instruction reaches {\tt IH}, it will wait there
801 %for the {\tt OD} stage to be full with an instruction that has the
802 %{\tt IM} bit set. When this occurs, all instructions in the
803 %instruction fifo (including {\tt OD}) are retired.
805 %\subsection{{\tt clog}}
807 %\setlength{\bitwidth}{5mm}
809 %\begin{bytefield}{26}
810 % \bitheader[b]{16-19,21}\\
822 %When a {\tt clog} instruction reaches {\tt OD}, it remains there and
823 %no more instructions will be executed until an {\tt unclog} is
826 %\subsection{{\tt unclog}}
828 %\setlength{\bitwidth}{5mm}
830 %\begin{bytefield}{26}
831 % \bitheader[b]{16-19,21}\\
837 % \bitbox[lrtb]{2}{11}
843 %When an {\tt unclog} instruction reaches {\tt IH}, it will wait there
844 %until a {\tt clog} instruction is at {\tt OD}. When this occurs, both
845 %instructions retire.
847 %Note that issuing an {\tt unclog} instruction to a dock which is not
848 %clogged and whose instruction fifo contains no {\tt clog} instructions
849 %will cause the dock to deadlock.
854 \epsfig{file=overview,height=5in,angle=90}
857 \subsection*{Input Dock}
858 \epsfig{file=indock,width=7in,angle=90}
861 \subsection*{Output Dock}
862 \epsfig{file=outdock,width=6.5in,angle=90}
866 %\epsfig{file=ports,height=5in,angle=90}
869 %\epsfig{file=best,height=5in,angle=90}