1 \documentclass[10pt]{article}
6 \usepackage{bytefield1}
10 \bibliographystyle{alpha}
11 \pagestyle{fancyplain}
13 \definecolor{light}{gray}{0.7}
15 \title{\vspace{-1cm}AM42: The F2 Dock
32 - tokenhood as address bit
33 - signal/path boundary/etc
34 - Rename EPI and OD to something more meaningful
36 - get rid of shadow latch
38 - figure out C-flag / signal bit situation
39 - single "predicate" flag
40 - Suggestion that there should be a "T" flag
41 - Get rid of "shadow latch" for literals?
42 - unify flags and signal bit by saying that the dock can
43 see the upper X bits of a word?
44 - should have a way to set just the upper X bits of the word
45 - flags are actually part of the data latch!
46 - the signal bit(s) belong to the Destination (or is it the Path?)
48 - How do you get a runtime count value to an input dock?
49 - Simplify the whole c-flag/signal-bit situation
50 - tokenhood should be LITERALLY an address bit!
65 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
66 \section{Overview of Fleet}
68 A Fleet processor is organized around a {\it switch fabric}, which is
69 a packet-switched network with reliable in-order delivery. The switch
70 fabric is used to carry data between different functional units,
71 called {\it ships}. Each ship is connected to the switch fabric by
72 one or more programmable elements known as {\it docks}.
74 A {\it path} specifies a route through the switch fabric from a
75 particular {\it source} to a particular {\it destination}. The
76 combination of a path and a single word to be delivered is called a
77 {\it packet}. The switch fabric carries packets from their sources to
78 their destinations. Each dock has four\
79 destinations: one each for {\it instructions}, {\it
80 torpedoes}, {\it tokens},\ and {\it words}. A Fleet is
81 programmed by depositing instruction packets into the switch fabric
82 with paths that will lead them to instruction destinations of the
83 docks at which they are to execute.
85 When a packet arrives at the instruction destination of a dock, it is
86 enqueued for execution. Before the instruction executes, it may cause
87 the dock to wait for a packet to arrive at the dock's data destination
88 or for a value to be presented by the ship. When an instruction
89 executes, it may consume this data and may present a data value to the
90 ship or transmit a packet.
92 Packets sent to token and torpedo destinations carry no payload. Such
93 packets consume less energy than instruction packets or word packets.
96 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
98 \section{The FleetTwo Dock}
100 The diagram below represents a conceptual view of the interface
101 between ships and the switch fabric; actual implementation circuitry
106 Each dock consists of a {\it data latch}, which is as wide as a single
107 machine word, and a circular {\it instruction fifo} of
108 instruction-width latches. The values in the instruction fifo control
109 the data latch. The dock also includes a {\it path latch}, which
110 stores the path along which outgoing packets will be
113 Note that the instruction fifo in each dock has a destination of its
114 own; this is the {\it instruction destination} mentioned in the
115 previous section. A token sent to an instruction destination is
116 called a {\it torpedo}; it does not enter the instruction fifo, but
117 rather is held in a waiting area where it may interrupt certain
118 instructions (see the section on the {\tt move} instruction for further
121 From any source to any dock's data destination there are
122 two distinct paths which differ by a single bit. This bit is known as
123 the ``signal'' bit, and the routing of a packet is not affected by it;
124 the signal bit is used to pass control values between docks. Note that paths
125 terminating at an {\it instruction} destination need not have a signal
129 Source-sequence guarantee. Shared across instruction/torpedo (?) and
130 token/word destinations.
132 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
134 \section{Instructions}
136 In order to cause an instruction to execute, the programmer must first
137 arrange for that instruction word to arrive in the data latch of some
138 output dock. For example, this might be the ``data read'' output dock
139 of the memory access ship or the output of a fifo ship. Once an
140 instruction has arrived at this output dock, it is {\it dispatched} by
141 sending it to the {\it instruction destination} of the dock at which
144 Each instruction is 25\ bits long, which makes
145 it possible for an instruction and a 12-bit
146 path to fit in a single word of memory. This path is the path from
147 the {\it dispatching} dock to the {\it executing} dock.
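As an illustration of this layout, the following Python sketch packs a 25-bit instruction and a 12-bit dispatch path into a single 37-bit word and splits it again. It assumes the path occupies the most significant bits (bits 25--36), as in the field diagram for this word; the helper names {\tt pack\_dispatch\_word} and {\tt unpack\_dispatch\_word} are hypothetical and not part of any Fleet toolchain.

```python
# Hypothetical helpers for the 37-bit dispatch word described above.
# Assumed layout: bits 0-24 hold the instruction, bits 25-36 the dispatch path.
INSTR_BITS = 25
PATH_BITS = 12
WORD_BITS = INSTR_BITS + PATH_BITS  # 37, one machine word


def pack_dispatch_word(path: int, instruction: int) -> int:
    """Place a 12-bit dispatch path above a 25-bit instruction."""
    if not 0 <= path < (1 << PATH_BITS):
        raise ValueError("dispatch path must fit in 12 bits")
    if not 0 <= instruction < (1 << INSTR_BITS):
        raise ValueError("instruction must fit in 25 bits")
    return (path << INSTR_BITS) | instruction


def unpack_dispatch_word(word: int) -> tuple[int, int]:
    """Split a 37-bit word back into (dispatch path, instruction)."""
    instruction = word & ((1 << INSTR_BITS) - 1)
    path = (word >> INSTR_BITS) & ((1 << PATH_BITS) - 1)
    return path, instruction


# Round trip: the packed word always fits in one 37-bit machine word.
word = pack_dispatch_word(0xABC, 0x1234567)
assert word < (1 << WORD_BITS)
assert unpack_dispatch_word(word) == (0xABC, 0x1234567)
```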
151 \setlength{\bitwidth}{3.5mm}
153 \begin{bytefield}{37}
154 \bitheader[b]{0,24,25,36}\\
155 \bitbox{12}{dispatch path}
156 \bitbox{25}{instruction}
160 Note that the 12-bit {\tt dispatch path}
161 field is not the same width as the 13-bit {\tt Immediate} path field
162 in the {\tt move} instruction, which in turn may not be the same width
163 as the actual path latches in the switch fabric.
165 The algorithm for expanding a path to a wider width is specific to the
166 switch fabric implementation, and is not specified by this
167 document.\footnote{For the Marina experiment, the correct
168 algorithm is to sign-extend the path: the most significant bit of
169 the given path is used to fill the vacant bit of the latch.} In
170 particular, because the {\tt dispatch path} field is always used to
171 specify a path which terminates at an instruction destination (never a
172 data destination), and because instruction destinations ignore the
173 signal bit, certain optimizations may be possible.
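The sign-extension rule given for the Marina experiment can be sketched as follows. The widths are parameters: extending 12 bits to 13 mirrors the two path fields discussed here, but the real latch width depends on the switch fabric and is an assumption of this example, as is the hypothetical name {\tt sign\_extend\_path}.

```python
def sign_extend_path(path: int, from_bits: int, to_bits: int) -> int:
    """Widen a path by copying its most significant bit into the new high bits.

    Sketch of the Marina rule; the target width is an assumption, since the
    actual switch-fabric latch width is implementation specific.
    """
    if not 0 < from_bits <= to_bits:
        raise ValueError("target width must be at least the source width")
    path &= (1 << from_bits) - 1           # keep only the given path bits
    msb = (path >> (from_bits - 1)) & 1    # most significant bit of the path
    if msb:
        # Fill bits from_bits .. to_bits-1 with copies of the MSB (ones).
        path |= ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
    return path


# A 12-bit path with its top bit set gains a set 13th bit; otherwise it is unchanged.
assert sign_extend_path(0x800, 12, 13) == 0x1800
assert sign_extend_path(0x7FF, 12, 13) == 0x7FF
```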
176 \subsection{Loop Counter}
178 A programmer can perform two types of loops: {\it inner} loops
179 consisting of only one {\tt move} instruction and {\it outer} loops of
180 multiple instructions of any type. Inner loops may be nested within
181 an outer loop, but no other nesting of loops is allowed.
183 The dock has one loop counter, called {\tt LC}. It is the
184 same width as a word carried through the switch fabric (37 bits).
188 The dock has four flags: {\tt A}, {\tt B}, {\tt C}, and {\tt Z}.
191 \item The {\tt A} and {\tt B} flags are general-purpose flags which
192 may be set and cleared by the programmer.
194 \item The {\tt C} flag is known as the {\it control} flag, and may be
195 set by the {\tt move} instruction based on information from the
196 ship or from an inbound packet. See the {\tt move} instruction
199 \item The {\tt Z}\ flag is known as the
200 {\it zero}\ flag. The {\tt
201 Z}\ flag is {\it set} whenever the {\tt LC} is zero.
202 In an actual implementation the {\tt Z}\
203 flag need not occupy a latch of its own; it might simply be derived
204 from the ``zeroness'' of the {\tt LC}.
208 \subsection{Predication}
210 All instructions except for {\tt head} and {\tt tail} have a three-bit
211 field marked {\tt P}, which specifies a {\it predicate}.
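The predicate codes tabulated in this subsection amount to a small decision function over the {\tt A}, {\tt B}, and {\tt Z} flags. A minimal Python sketch follows; treating the unused code {\tt 100} as an error is an assumption of this sketch, and {\tt predicate\_holds} is a hypothetical name.

```python
def predicate_holds(code: int, a: bool, b: bool, z: bool) -> bool:
    """Return True if an instruction with 3-bit predicate `code` should execute.

    `a`, `b`, `z` are the current A, B and Z flags (Z is true when LC is zero).
    """
    if code == 0b000:
        return not z and not a
    if code == 0b001:
        return not z and a
    if code == 0b010:
        return not z and not b
    if code == 0b011:
        return not z and b
    if code == 0b101:
        return z
    if code == 0b110:
        return not z
    if code == 0b111:
        return True
    # 0b100 is listed as unused; this sketch rejects it (an assumption).
    raise ValueError(f"unused or invalid predicate code {code:03b}")


# An instruction predicated on "Z=0 and A=1" is ignored once LC reaches zero.
assert predicate_holds(0b001, a=True, b=False, z=False)
assert not predicate_holds(0b001, a=True, b=False, z=True)
```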
214 \setlength{\bitwidth}{5mm}
216 \begin{bytefield}{25}
217 \bitheader[b]{0,21,22,24}\\
224 The predicate determines which conditions must be true in order for
225 the instruction to execute; if they do not hold, the instruction is simply {\it
226 ignored}. The table below shows what conditions must be true in
227 order for an instruction to execute:
230 \begin{tabular}{|r|l|}\hline
231 Code & Execute if \\\hline
232 {\tt 000:} & {\tt Z=0}\ and {\tt A=0} \\
233 {\tt 001:} & {\tt Z=0}\ and {\tt A=1} \\
234 {\tt 010:} & {\tt Z=0}\ and {\tt B=0} \\
235 {\tt 011:} & {\tt Z=0}\ and {\tt B=1} \\
236 {\tt 100:} & Unused \\
237 {\tt 101:} & {\tt Z=1}\ \\
238 {\tt 110:} & {\tt Z=0}\ \\
239 {\tt 111:} & always \\
245 \subsection{The Requeue Stage}
247 The requeue stage has two inputs, which will be referred to as the
248 {\it enqueueing} input and the {\it recirculating} input. It has a
249 single output which feeds into the instruction fifo.
251 The requeue stage has two states: {\sc Updating} and {\sc
254 \subsubsection{The {\sc Updating} State}
256 On initialization, the dock is in the {\sc Updating} state. In this
257 state the requeue stage is performing three tasks:
259 \item it is draining the
260 previous loop's instructions (if any) from the fifo
261 \item it is executing any ``one
262 shot'' instructions which come between the previous loop's {\tt tail}
263 and the next loop's {\tt head}
264 \item it is loading the instructions of
265 the next loop into the fifo.
268 In the {\sc Updating} state, the requeue stage will accept any
269 instruction other than a {\tt tail} which arrives at its {\it
270 enqueueing} input, and pass this instruction to its output. Any
271 instruction other than a {\tt head} which arrives at the {\it
272 recirculating} input will be discarded.
274 Note that when a {\tt tail} instruction arrives at the {\it
275 enqueueing} input, it ``gets stuck'' there. Likewise, when a {\tt
276 head} instruction arrives at the {\it recirculating} input, it also
277 ``gets stuck''. When the requeue stage finds {\it both} a {\tt tail}
278 instruction stuck at the {\it enqueueing} input and a {\tt head}
279 instruction stuck at the {\it recirculating} input, the requeue stage
280 discards both the {\tt head} and {\tt tail} and transitions to the
281 {\sc Circulating} state.
283 \subsubsection{The {\sc Circulating} State}
285 In the {\sc Circulating} state, the dock repeatedly executes the set
286 of instructions that are in the instruction fifo.
288 In the {\sc Circulating} state, the requeue stage will not accept
289 items from its {\it enqueueing} input. Any item presented at the {\it
290 recirculating} input will be passed through to the requeue stage's
293 When an {\tt abort} instruction is executed, the requeue stage
294 transitions back to the {\sc Updating} state. Note that {\tt abort}
295 instructions include a predicate; an {\tt abort} instruction whose
296 predicate is not met will not cause this transition.
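The behavior of the requeue stage described above can be modeled as a two-state machine. The toy Python model below is only a sketch: instructions are plain strings, hardware timing and handshaking are ignored, {\tt abort()} stands for an executed {\tt abort} whose predicate held, and all class and method names are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    UPDATING = auto()
    CIRCULATING = auto()


class RequeueStage:
    """Toy model of the requeue stage; instructions are plain strings."""

    def __init__(self):
        self.state = State.UPDATING  # the dock starts in the Updating state
        self.stuck_tail = False      # a tail waiting at the enqueueing input
        self.stuck_head = False      # a head waiting at the recirculating input
        self.output = []             # items passed on to the instruction fifo

    def enqueue(self, instr: str) -> bool:
        """Offer `instr` at the enqueueing input.

        Returns True if the input took it (passed through or now stuck there).
        """
        if self.state is not State.UPDATING or self.stuck_tail:
            return False             # input is blocked
        if instr == "tail":
            self.stuck_tail = True   # a tail "gets stuck" at this input
            self._maybe_start_loop()
        else:
            self.output.append(instr)
        return True

    def recirculate(self, instr: str) -> None:
        """Offer `instr` at the recirculating input."""
        if self.state is State.CIRCULATING:
            self.output.append(instr)   # pass through unchanged
        elif instr == "head":
            self.stuck_head = True      # a head "gets stuck" at this input
            self._maybe_start_loop()
        # Any other instruction arriving while Updating is discarded.

    def _maybe_start_loop(self) -> None:
        if self.stuck_tail and self.stuck_head:
            # Discard both markers and start circulating the loop body.
            self.stuck_tail = self.stuck_head = False
            self.state = State.CIRCULATING

    def abort(self) -> None:
        """An executed abort (predicate met) returns to the Updating state."""
        self.state = State.UPDATING


# Load a one-instruction loop, let it circulate, then abort it.
rq = RequeueStage()
assert rq.enqueue("head") and rq.enqueue("move") and rq.enqueue("tail")
rq.recirculate("head")
assert rq.state is State.CIRCULATING
```

Note that both the {\tt head} and {\tt tail} markers are consumed by the transition, so only the loop body keeps circulating.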
300 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
302 \section{Instructions}
304 \subsection{{\tt move}}
306 \newcommand{\bitsMove}{\setlength{\bitwidth}{5mm}
308 \begin{bytefield}{25}
309 \bitheader[b]{14-21}\\
326 \begin{bytefield}{25}
327 \bitheader[b]{0,12,13}\\
328 \bitbox[1]{10}{\raggedleft {\tt moveto} ({\tt Immediate$\to$ Path})}
331 \bitbox{13}{\tt Immediate}
334 \begin{bytefield}{25}
335 \bitheader[b]{11,12,13}\\
336 \bitbox[1]{10}{\raggedleft {\tt dispatch} ({\footnotesize {\tt DataPredecessor[36:25]$\to$ Path}})\ \ }
344 \begin{bytefield}{25}
345 \bitheader[b]{11,12,13}\\
346 \bitbox[1]{10}{\raggedleft {\tt move} ({\tt Path} unchanged):}
358 \item {\tt Ti} - Token Input: wait for the token predecessor to be full and drain it.
359 \item {\tt Di} - Data Input: wait for the data predecessor to be full and drain it.
360 \item {\tt Dc} - Data Capture: pulse the data latch.
361 \item {\tt Do} - Data Output: fill the data successor.
362 \item {\tt To} - Token Output: fill the token successor.
365 The data successor and token successor must both be empty in order for
366 a {\tt move} instruction to attempt execution.
369 If the {\tt S} bit is set (not shown -- there is no space left!), the
370 {\tt move} instruction will subtract one from the {\tt LC} counter
371 each time it executes.
372 NOTE: the flavor of {\tt set} instruction which decrements the counter
373 is now unnecessary; we can simply use a ``do-nothing {\tt move}'' with
374 the {\tt S}-bit set for that.
376 If the {\tt R} bit is set, the {\tt move} instruction will execute
377 repeatedly until its predicate no longer holds (or a torpedo strikes).
378 An ``infinite'' or ``standing'' move can be achieved by setting the
379 {\tt R} bit and clearing the {\tt S} bit.
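The interaction of the {\tt R} and {\tt S} bits with the loop counter can be illustrated by counting how many times a repeating {\tt move} executes. This hypothetical Python sketch assumes the predicate depends only on {\tt Z} (for example code {\tt 110}, ``{\tt Z=0}''), and uses a step limit in place of the torpedo that would end a standing move.

```python
from typing import Callable


def run_repeating_move(lc: int, predicate: Callable[[bool], bool],
                       decrement: bool, max_steps: int = 1000) -> tuple[int, int]:
    """Count executions of a move whose R bit is set.

    `predicate` receives the Z flag (True when LC == 0) and says whether the
    move may execute.  `decrement` models the S bit: LC drops by one after
    every execution.  `max_steps` stands in for a torpedo, since a standing
    move (R set, S clear) would otherwise never stop.
    Returns (number of executions, final LC).
    """
    executions = 0
    while executions < max_steps and predicate(lc == 0):
        if decrement:
            if lc == 0:
                raise ValueError("decrementing LC below zero is not modeled")
            lc -= 1
        executions += 1
    return executions, lc


# With predicate "Z=0" and S set, the move runs exactly LC times.
assert run_repeating_move(5, lambda z: not z, decrement=True) == (5, 0)
# A standing move leaves LC untouched and only stops at the step limit.
assert run_repeating_move(5, lambda z: True, decrement=False, max_steps=10) == (10, 5)
```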
382 \subsection*{Torpedoes}
384 The {\tt I} bit stands for {\tt Immune}, and indicates whether the
385 instruction is immune to torpedoes. If a {\tt move} instruction which
386 is not immune is waiting to execute and a torpedo is lying in wait,
387 the torpedo {\it strikes}. When a torpedo strikes, the
388 {\tt move} instruction and the torpedo are both consumed and the {\tt
391 \subsection*{The C Flag}
393 Every time the {\tt move} instruction executes, the {\tt C} flag may
397 \item At an {\it input} dock the {\tt C} flag is set to the signal bit
398 of the incoming packet.
400 \item At an {\it output} dock the {\tt C} flag is set to a value
401 provided by the ship if the {\tt Dc} bit is set. If the {\tt
402 Dc} bit is not set, the {\tt C} flag is set to the signal bit of
407 \subsection*{Flushing}
409 The {\tt flush} instruction is a variant of {\tt move} which is valid
410 only at input docks. It has the same effect as {\tt deliver}, except
411 that it sets a special ``flushing'' indicator along with the data
414 \newcommand{\bitsFlush}{\setlength{\bitwidth}{5mm}
416 \begin{bytefield}{25}
417 \bitheader[b]{14-18}\\
418 \bitbox[r]{6}{\raggedleft{\tt flush\ \ }}
433 When a ship fires, it must examine the ``flushing'' indicators on the
434 input docks whose fullness was part of the firing condition. If all
435 of the input docks' flushing indicators are set, the ship must drain
436 all of their data successors and take no action. If some, but not
437 all, of the indicators are set, the ship must drain {\it only the data
438 successors of the docks whose indicators were {\bf not} set}, and
439 take no action. If none of the flushing indicators was set, the ship
445 \subsection{{\tt set}}
447 The {\tt set} command is used to set the data latch, the flags, or the
450 \newcommand{\bitsSet}{
451 {\tt\begin{bytefield}{25}
452 \bitheader[b]{19-21}\\
464 \begin{bytefield}{25}
465 \bitheader[b]{0,11-18}\\
466 \bitbox[1]{5}{\raggedleft {\tt Immediate}$\to${\tt LC}}
470 \bitbox{12}{\tt Immediate}
473 \begin{bytefield}{25}
474 \bitheader[b]{12-18}\\
475 \bitbox[1]{5}{\raggedleft {\tt Data Latch}$\to${\tt LC}}
482 \begin{bytefield}{25}
483 \bitheader[b]{0,13-18}\\
484 \bitbox[1]{5}{\raggedleft \footnotesize {\tt Sign-Extended Immediate}$\to${\tt Data Latch}}
487 \bitbox{1}{\begin{minipage}{0.5cm}{
494 \bitbox{14}{\tt Immediate}
497 \begin{bytefield}{25}
498 \bitheader[b]{0,5,6,11,15-18}\\
499 \bitbox[1]{5}{\raggedleft {\tt Update Flags}}
503 \bitbox{6}{\tt nextA}
504 \bitbox{6}{\tt nextB}
510 The FleetTwo implementation is likely to have an unarchitected
511 ``literal latch'' at the on deck ({\tt OD}) stage, which is loaded
512 with the possibly extended literal {\it at the time that the {\tt set}
513 instruction comes on deck}. This latch is then copied into the data
514 latch when a {\tt set Data Latch} instruction
517 The {\tt Sign-Extended Immediate} instruction copies the {\tt
518 Immediate} field into the least significant bits of the data latch.
519 All other bits of the data latch are filled with a copy of the
520 bit marked ``{\tt Sign}.''
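A sketch of the value produced by the {\tt Sign-Extended Immediate} form of {\tt set}, assuming a 37-bit data latch, the 14-bit {\tt Immediate} field shown in the encoding, and a separate one-bit {\tt Sign} field; the function name is hypothetical.

```python
WORD_BITS = 37   # data latch width (same as LC and switch-fabric words)
IMM_BITS = 14    # width of the Immediate field in this set variant


def sign_extended_immediate(immediate: int, sign: int) -> int:
    """Value loaded into the data latch by `set` with a sign-extended immediate.

    The immediate fills the least significant bits; every other bit of the
    latch becomes a copy of the separate Sign bit.
    """
    if not 0 <= immediate < (1 << IMM_BITS):
        raise ValueError("immediate must fit in 14 bits")
    if sign not in (0, 1):
        raise ValueError("sign must be a single bit")
    upper = ((1 << WORD_BITS) - 1) ^ ((1 << IMM_BITS) - 1)  # bits 14..36
    return immediate | (upper if sign else 0)


# With Sign clear the immediate is zero-filled; with Sign set, one-filled.
assert sign_extended_immediate(5, 0) == 5
assert sign_extended_immediate(0x3FFF, 1) == (1 << 37) - 1
```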
523 Each of the {\tt nextA} and {\tt nextB} fields has the following
524 structure, and indicates which old flag values should be logically
525 {\tt OR}ed together to produce the new flag value:
531 \bitbox{1}{${\text{\tt A}}$}
532 \bitbox{1}{$\overline{\text{\tt A}}$}
533 \bitbox{1}{${\text{\tt B}}$}
534 \bitbox{1}{$\overline{\text{\tt B}}$}
535 \bitbox{1}{${\text{{\tt C}\ }}$}
536 \bitbox{1}{$\overline{\text{{\tt C}\ }}$}
540 Each bit corresponds to one possible input; all inputs whose bits are
541 set are {\tt OR}ed together, and the resulting value is assigned to
542 the flag. Note that if none of the bits are set, the value assigned
543 is zero. Note also that it is possible to produce a {\tt 1} by {\tt
544 OR}ing any flag with its complement, and that {\tt set Flags} can
545 be used to create a {\tt nop} (no-op) by setting each flag to itself.
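The flag-update rule can be written as an {\tt OR} over selected literals. In this Python sketch the bit positions within each six-bit field ({\tt A} most significant, $\overline{\text{\tt C}}$ least significant, following the left-to-right order of the diagram) are an assumption, as are the function names. Both new flags are computed from the old flag values.

```python
# Assumed bit positions inside a 6-bit nextA / nextB field, MSB first.
A, NOT_A, B, NOT_B, C, NOT_C = 5, 4, 3, 2, 1, 0


def next_flag(field: int, a: bool, b: bool, c: bool) -> bool:
    """OR together every old flag value (or complement) selected in `field`.

    If no bit is set, the result is False (zero), as the text specifies.
    """
    inputs = {A: a, NOT_A: not a, B: b, NOT_B: not b, C: c, NOT_C: not c}
    return any(value for bit, value in inputs.items() if (field >> bit) & 1)


def update_flags(next_a: int, next_b: int,
                 a: bool, b: bool, c: bool) -> tuple[bool, bool]:
    """Return the new (A, B); both fields read the *old* flag values."""
    return next_flag(next_a, a, b, c), next_flag(next_b, a, b, c)


# Selecting A and its complement always yields 1.
assert next_flag((1 << A) | (1 << NOT_A), False, False, False)
# Setting each flag to itself is a no-op.
assert update_flags(1 << A, 1 << B, True, False, True) == (True, False)
```

Because both fields read the old values, a single instruction can also swap {\tt A} and {\tt B} (with {\tt nextA} selecting {\tt B} and {\tt nextB} selecting {\tt A}).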
551 \subsection{{\tt shift}}
553 \newcommand{\shiftImmediateSize}{19}
555 Each {\tt shift} instruction carries an immediate of \shiftImmediateSize\
556 bits. When a {\tt shift} instruction is executed, this immediate is copied
557 into the least significant \shiftImmediateSize\ bits of the data latch,
558 and the remaining most significant bits of the data latch are loaded
559 with the value formerly in the least significant bits of the data latch.
560 In this manner, large literals can be built up by ``shifting'' them
561 into the data latch \shiftImmediateSize\ bits at a time.
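The effect of {\tt shift} on a 37-bit data latch, and how two shifts suffice to load any 37-bit literal ($2 \times 19 = 38 \geq 37$), can be sketched in Python as follows. The word width is taken from the loop-counter section; the function names are hypothetical.

```python
WORD_BITS = 37
SHIFT_BITS = 19  # corresponds to \shiftImmediateSize in the text


def shift(data_latch: int, immediate: int) -> int:
    """Move the latch's low bits up by 19 and insert the immediate below."""
    if not 0 <= immediate < (1 << SHIFT_BITS):
        raise ValueError("immediate must fit in 19 bits")
    mask = (1 << WORD_BITS) - 1  # bits shifted past bit 36 are lost
    return ((data_latch << SHIFT_BITS) | immediate) & mask


def build_literal(value: int, old: int = 0) -> int:
    """Load a 37-bit literal with two shifts, whatever the latch held before."""
    if not 0 <= value < (1 << WORD_BITS):
        raise ValueError("literal must fit in 37 bits")
    high = value >> SHIFT_BITS             # upper 18 bits
    low = value & ((1 << SHIFT_BITS) - 1)  # lower 19 bits
    return shift(shift(old, high), low)


# The previous latch contents are completely shifted out after two steps.
assert build_literal(0x123456789, old=0xABCDE) == 0x123456789
```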
563 \newcommand{\bitsShift}{
564 \setlength{\bitwidth}{5mm}
566 \begin{bytefield}{25}
567 \bitheader[b]{0,18-21}\\
576 \bitbox{\shiftImmediateSize}{Immediate}
581 The FleetTwo implementation is likely to have an unarchitected
582 ``literal latch'' at the on deck ({\tt OD}) stage, which is loaded
583 with the literal {\it at the time that the {\tt shift} instruction
584 comes on deck}. This latch is then copied into the data latch when
585 the instruction executes.
589 \subsection{{\tt abort}}
590 \newcommand{\bitsAbort}{\setlength{\bitwidth}{5mm}
592 \begin{bytefield}{25}
593 \bitheader[b]{18-21}\\
608 An {\tt abort} instruction causes a loop to exit; see the section on
609 the Requeue Stage for further details.
611 \subsection{{\tt head}}
612 \newcommand{\bitsHead}{
613 \setlength{\bitwidth}{5mm}
615 \begin{bytefield}{25}
616 \bitheader[b]{18-21}\\
631 A {\tt head} instruction marks the start of a loop; see the section on
632 the Requeue Stage for further details.
635 \subsection{{\tt tail}}
636 \newcommand{\bitsTail}{
637 \setlength{\bitwidth}{5mm}
639 \begin{bytefield}{25}
640 \bitheader[b]{18-21}\\
655 A {\tt tail} instruction marks the end of a loop; see the section on
656 the Requeue Stage for further details.
658 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
660 \section*{Instruction Encoding Map}
663 \vspace{3mm}\hspace{-1cm}{\tt move}\hspace{1cm}\vspace{-6mm}\\
667 \vspace{3mm}\hspace{-1cm}{\tt shift}\hspace{1cm}\vspace{-6mm}\\
670 \vspace{3mm}\hspace{-1cm}{\tt set}\hspace{1cm}\vspace{-6mm}\\
673 \vspace{3mm}\hspace{-1cm}{\tt abort}\hspace{1cm}\vspace{-6mm}\\
676 \vspace{3mm}\hspace{-1cm}{\tt head}\hspace{1cm}\vspace{-6mm}\\
679 \vspace{3mm}\hspace{-1cm}{\tt tail}\hspace{1cm}\vspace{-6mm}\\