1 \documentclass[10pt]{article}
6 \usepackage{bytefield1}
10 \bibliographystyle{alpha}
11 \pagestyle{fancyplain}
13 \definecolor{light}{gray}{0.7}
15 \title{\vspace{-1cm}AM42: The F2 Dock
32 - tokenhood as address bit
33 - signal/path boundary/etc
34 - Rename EPI and OD to something more meaningful
36 - get rid of shadow latch
38 - figure out C-flag / signal bit situation
39 - Suggestion that there should be a "T" flag
40 - Get rid of "shadow latch" for literals?
41 - unify flags and signal bit by saying that the dock can
42 see the upper X bits of a word?
43 - should have a way to set just the upper X bits of the word
44 - flags are actually part of the data latch!
45 - the signal bit(s) belong to the Destination (or is it the Path?)
47 - How do you get a runtime count value to an input dock?
48 - Simplify the whole c-flag/signal-bit situation
49 - tokenhood should be LITERALLY an address bit!
52 - ship-to-ship data (words)
53 - dock-to-dock data (signal bits)
54 - ship-to-dock data (c-flag)
55 - dock-to-ship data (flushing/bonus bit)
58 - ship word (at output dock)
60 - packet word (at input dock)
76 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
77 \section{Overview of Fleet}
79 A Fleet processor is organized around a {\it switch fabric}, which is
80 a packet-switched network with reliable in-order delivery. The switch
81 fabric is used to carry data between different functional units,
82 called {\it ships}. Each ship is connected to the switch fabric by
83 one or more programmable elements known as {\it docks}.
85 A {\it path} specifies a route through the switch fabric from a
86 particular {\it source} to a particular {\it destination}. The
87 combination of a path and a single word to be delivered is called a
88 {\it packet}. The switch fabric carries packets from their sources to
89 their destinations. Each dock has four\
90 destinations: one each for {\it instructions}, {\it
91 torpedoes}, {\it tokens},\ and {\it words}. A Fleet is
92 programmed by depositing instruction packets into the switch fabric
93 with paths that will lead them to instruction destinations of the
94 docks at which they are to execute.
96 When a packet arrives at the instruction destination of a dock, it is
97 enqueued for execution. Before the instruction executes, it may cause
98 the dock to wait for a packet to arrive at the dock's data destination
99 or for a value to be presented by the ship. When an instruction
100 executes it may consume this data and may present a data value to the
101 ship or transmit a packet.
103 Packets sent to token and torpedo destinations carry no payload. Such
104 packets consume less energy than instruction packets or word packets.
107 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
109 \section{The FleetTwo Dock}
111 The diagram below represents a conceptual view of the interface
112 between ships and the switch fabric; actual implementation circuitry
Each dock consists of a {\it data latch}, which is as wide as a single
machine word, and a circular {\it instruction fifo} of
119 instruction-width latches. The values in the instruction fifo control
120 the data latch. The dock also includes a {\it path latch}, which
121 stores the path along which outgoing packets will be
124 Note that the instruction fifo in each dock has a destination of its
125 own; this is the {\it instruction destination} mentioned in the
126 previous section. A token sent to an instruction destination is
127 called a {\it torpedo}; it does not enter the instruction fifo, but
128 rather is held in a waiting area where it may interrupt certain
129 instructions (see the section on the {\tt move} instruction for further
132 From any source to any dock's data destination there are
133 two distinct paths which differ by a single bit. This bit is known as
134 the ``signal'' bit, and the routing of a packet is not affected by it;
135 the signal bit is used to pass control values between docks. Note that paths
136 terminating at an {\it instruction} destination need not have a signal
140 Source-sequence guarantee. Shared across instruction/torpedo (?) and
141 token/word destinations.
143 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
145 \section{Instructions}
147 In order to cause an instruction to execute, the programmer must first
148 arrange for that instruction word to arrive in the data latch of some
149 output dock. For example, this might be the ``data read'' output dock
150 of the memory access ship or the output of a fifo ship. Once an
151 instruction has arrived at this output dock, it is {\it dispatched} by
152 sending it to the {\it instruction destination} of the dock at which
Each instruction is 25 bits long, which makes
it possible for an instruction and a 12-bit
157 path to fit in a single word of memory. This path is the path from
158 the {\it dispatching} dock to the {\it executing} dock.
162 \setlength{\bitwidth}{3.5mm}
164 \begin{bytefield}{37}
165 \bitheader[b]{0,24,25,36}\\
166 \bitbox{12}{dispatch path}
167 \bitbox{25}{instruction}
Note that the 12-bit {\tt dispatch path}
field is not the same width as the 13-bit {\tt Immediate} path field
173 in the {\tt move} instruction, which in turn may not be the same width
174 as the actual path latches in the switch fabric.
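
As a rough illustration of the word layout above, the following Python
sketch packs a 12-bit {\tt dispatch path} and a 25-bit instruction into
one 37-bit word and splits them apart again. The names {\tt pack\_word}
and {\tt unpack\_word} are hypothetical and are not part of any Fleet
tooling; the bit positions (path in bits 36 down to 25, instruction in
bits 24 down to 0) are read off the diagram.

```python
INSTRUCTION_BITS = 25                      # instruction occupies bits 24..0
PATH_BITS = 12                             # dispatch path occupies bits 36..25
WORD_BITS = INSTRUCTION_BITS + PATH_BITS   # 37-bit machine word


def pack_word(dispatch_path, instruction):
    """Combine a dispatch path and an instruction into one memory word."""
    if not 0 <= dispatch_path < (1 << PATH_BITS):
        raise ValueError("dispatch path must fit in 12 bits")
    if not 0 <= instruction < (1 << INSTRUCTION_BITS):
        raise ValueError("instruction must fit in 25 bits")
    return (dispatch_path << INSTRUCTION_BITS) | instruction


def unpack_word(word):
    """Split a memory word back into (dispatch_path, instruction)."""
    instruction = word & ((1 << INSTRUCTION_BITS) - 1)
    dispatch_path = (word >> INSTRUCTION_BITS) & ((1 << PATH_BITS) - 1)
    return dispatch_path, instruction
```

Packing the largest path and the largest instruction yields a word with
all 37 bits set, which is a quick check that the two fields exactly
fill the word.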
176 The algorithm for expanding a path to a wider width is specific to the
177 switch fabric implementation, and is not specified by this
document.\footnote{For the Marina experiment, the correct
  algorithm is to sign-extend the path: the most significant bit of
  the given path is used to fill the vacant bit of the latch.} In
181 particular, because the {\tt dispatch path} field is always used to
182 specify a path which terminates at an instruction destination (never a
183 data destination), and because instruction destinations ignore the
184 signal bit, certain optimizations may be possible.
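
The sign-extension rule mentioned in the footnote can be sketched as
follows. This is only an illustration of that one rule, not of any
particular switch fabric; the function name {\tt sign\_extend\_path} is
hypothetical, and the default target width of 13 bits is an assumption
chosen to match the {\tt Immediate} path field rather than a documented
latch width.

```python
def sign_extend_path(path, from_bits=12, to_bits=13):
    """Widen a path by copying its most significant bit into the new bits.

    The default widths are assumptions made for illustration only.
    """
    if to_bits < from_bits:
        raise ValueError("cannot extend to a narrower width")
    if not 0 <= path < (1 << from_bits):
        raise ValueError("path does not fit in from_bits")
    msb = (path >> (from_bits - 1)) & 1
    # Mask covering only the newly added (vacant) high-order bits.
    vacant = ((1 << (to_bits - from_bits)) - 1) << from_bits
    return path | (vacant if msb else 0)
```

For example, the 12-bit path {\tt 0x801} (top bit set) becomes the
13-bit path {\tt 0x1801}, while {\tt 0x7FF} (top bit clear) is
unchanged.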
187 \subsection{Loop Counter}
189 A programmer can perform two types of loops: {\it inner} loops
190 consisting of only one {\tt move} instruction and {\it outer} loops of
191 multiple instructions of any type. Inner loops may be nested within
192 an outer loop, but no other nesting of loops is allowed.
194 The dock has one loop counter, called {\tt LC}. It is the
195 same width as a word carried through the switch fabric (37 bits).
The dock has five flags: {\tt A}, {\tt B}, {\tt C}, {\tt P}, and {\tt Z}.
202 \item The {\tt A} and {\tt B} flags are general-purpose flags which
203 may be set and cleared by the programmer.
205 \item The {\tt C} flag is known as the {\it control} flag, and may be
206 set by the {\tt move} instruction based on information from the
207 ship or from an inbound packet. See the {\tt move} instruction
210 \item The {\tt P} flag is used for predication; see the next section
211 for details. When a torpedo strikes or the counter is
212 decremented from any value to zero, the {\tt P} flag is cleared.
213 The {\tt P} flag may also be set and cleared by the {\tt set}
\item The {\tt Z} flag is known as the {\it zero} flag. The {\tt Z}
      flag is {\it set} whenever the {\tt LC} is zero. In an actual
      implementation the {\tt Z} flag might not require an actual
      latch; it might simply be derived from the ``zeroness'' of the
      {\tt LC}.
225 \subsection{Predication}
227 All instructions except for {\tt head} and {\tt tail} have a bit
228 marked {\tt U}, for {\it unconditional}. An instruction with the {\tt
229 U} bit set always executes. An instruction with the {\tt U} bit
230 cleared will execute {\it only if the {\tt P} flag is set}.
233 \setlength{\bitwidth}{5mm}
235 \begin{bytefield}{25}
236 \bitheader[b]{0,24}\\
243 \subsection{The Requeue Stage}
245 The requeue stage has two inputs, which will be referred to as the
246 {\it enqueueing} input and the {\it recirculating} input. It has a
247 single output which feeds into the instruction fifo.
249 The requeue stage has two states: {\sc Updating} and {\sc
252 \subsubsection{The {\sc Updating} State}
254 On initialization, the dock is in the {\sc Updating} state. In this
255 state the requeue stage is performing three tasks:
257 \item it is draining the
258 previous loop's instructions (if any) from the fifo
259 \item it is executing any ``one
260 shot'' instructions which come between the previous loop's {\tt tail}
261 and the next loop's {\tt head}
262 \item it is loading the instructions of
263 the next loop into the fifo.
266 In the {\sc Updating} state, the requeue stage will accept any
267 instruction other than a {\tt tail} which arrives at its {\it
268 enqueueing} input, and pass this instruction to its output. Any
269 instruction other than a {\tt head} which arrives at the {\it
270 recirculating} input will be discarded.
272 Note that when a {\tt tail} instruction arrives at the {\it
273 enqueueing} input, it ``gets stuck'' there. Likewise, when a {\tt
274 head} instruction arrives at the {\it recirculating} input, it also
275 ``gets stuck''. When the requeue stage finds {\it both} a {\tt tail}
276 instruction stuck at the {\it enqueueing} input and a {\tt head}
277 instruction stuck at the {\it recirculating} input, the requeue stage
278 discards both the {\tt head} and {\tt tail} and transitions to the
279 {\sc Circulating} state.
281 \subsubsection{The {\sc Circulating} State}
283 In the {\sc Circulating} state, the dock repeatedly executes the set
284 of instructions that are in the instruction fifo.
286 In the {\sc Circulating} state, the requeue stage will not accept
287 items from its {\it enqueueing} input. Any item presented at the {\it
288 recirculating} input will be passed through to the requeue stage's
When an {\tt abort} instruction is executed, the requeue stage
transitions back to the {\sc Updating} state. Note that {\tt abort}
instructions include a {\tt U} bit: an {\tt abort} instruction with
that bit cleared will not cause this transition when the {\tt P} flag
is cleared.
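
The two states described above can be pictured with a small Python
model. It is a toy sketch under several assumptions, not a description
of the circuit: instructions are plain strings, the class name {\tt
  DockModel} and the helper {\tt run} are hypothetical, at most one
instruction sits at the recirculating input at a time, and predication
is ignored (an {\tt abort} always takes effect).

```python
from collections import deque

UPDATING, CIRCULATING = "Updating", "Circulating"


class DockModel:
    """Toy model of the requeue stage plus instruction fifo (hypothetical)."""

    def __init__(self, incoming):
        self.state = UPDATING              # the dock initializes in Updating
        self.enqueueing = deque(incoming)  # instructions arriving from the fabric
        self.fifo = deque()                # fifo fed by the requeue stage output
        self.recirculating = None          # instruction that last left the fifo
        self.executed = []                 # trace of executed instructions

    def step(self):
        # Execution: pull the next instruction out of the fifo, if possible.
        if self.recirculating is None and self.fifo:
            insn = self.fifo.popleft()
            if insn != "head":             # head is only a marker in this model
                self.executed.append(insn)
            if insn == "abort":            # assumption: its predicate holds
                self.state = UPDATING
            self.recirculating = insn

        if self.state == CIRCULATING:
            # The enqueueing input is ignored; recirculating items pass through.
            if self.recirculating is not None:
                self.fifo.append(self.recirculating)
                self.recirculating = None
            return

        head_stuck = self.recirculating == "head"
        tail_stuck = bool(self.enqueueing) and self.enqueueing[0] == "tail"
        if head_stuck and tail_stuck:
            # Discard both markers and start circulating the loop body.
            self.recirculating = None
            self.enqueueing.popleft()
            self.state = CIRCULATING
            return
        if self.recirculating is not None and not head_stuck:
            self.recirculating = None      # discard anything other than a head
        if self.enqueueing and not tail_stuck:
            self.fifo.append(self.enqueueing.popleft())


def run(incoming, steps):
    """Create a model, advance it by the given number of steps, return it."""
    dock = DockModel(incoming)
    for _ in range(steps):
        dock.step()
    return dock
```

Feeding the model {\tt set}, {\tt head}, {\tt a}, {\tt b}, {\tt tail}
and running it for nine steps executes the one-shot {\tt set} once and
then begins repeating {\tt a}, {\tt b}: the loop body does not start
until the {\tt tail} has met the stuck {\tt head}.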
298 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
300 \section{Instructions}
302 \subsection{{\tt move}}
304 \newcommand{\bitsMove}{\setlength{\bitwidth}{5mm}
306 \begin{bytefield}{25}
307 \bitheader[b]{14-21}\\
325 \begin{bytefield}{25}
326 \bitheader[b]{0,12,13}\\
327 \bitbox[1]{10}{\raggedleft {\tt moveto} ({\tt Immediate$\to$ Path})}
330 \bitbox{13}{\tt Immediate}
333 \begin{bytefield}{25}
334 \bitheader[b]{11,12,13}\\
\bitbox[1]{10}{\raggedleft {\tt dispatch} ({\footnotesize {\tt DataPredecessor[36:25]$\to$ Path}})\ \ }
343 \begin{bytefield}{25}
344 \bitheader[b]{11,12,13}\\
345 \bitbox[1]{10}{\raggedleft {\tt move} ({\tt Path} unchanged):}
357 \item {\tt Fi} - Fabric input: wait for fabric predecessor to be full and drain it.
358 \item {\tt Fo} - Fabric output: wait for fabric successor to be empty and fill it.
359 \item {\tt Dc} - Data Capture: pulse the data latch.
360 \item {\tt Sh} - Ship: at an input/output dock, wait for the ship successor/predecessor to be empty/full and fill/drain it.
363 The fabric successor must be empty in order for a {\tt move}
364 instruction to attempt execution.
If the {\tt S} bit is set, the {\tt move} instruction will subtract
one from {\tt LC} each time it executes. An instruction
with only this bit set (and no other) takes the place of the dedicated
``decrement OLC'' instruction in previous designs.
371 If the {\tt R} bit is set, the {\tt move} instruction will execute
372 repeatedly until its predicate no longer holds (or a torpedo strikes).
373 An ``infinite'' or ``standing'' move can be achieved by setting the
374 {\tt R} bit and clearing the {\tt S} bit.
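
The interaction of the {\tt S} and {\tt R} bits with predication can be
sketched in Python. This is a simplified model resting on stated
assumptions: the {\tt P} flag is cleared when {\tt LC} is decremented to
zero (as described in the section on flags), torpedoes are not modeled,
decrementing an {\tt LC} that is already zero is assumed to do nothing,
and the hypothetical {\tt max\_iters} argument merely cuts off a
standing move so that the sketch terminates.

```python
def run_move(lc, repeat, decrement, unconditional=False, p_flag=True,
             max_iters=100):
    """Return (executions, final_lc, final_p_flag) for one move instruction.

    repeat        models the R bit, decrement models the S bit,
    unconditional models the U bit. All names are illustrative.
    """
    executions = 0
    # The instruction executes only while its predicate holds.
    while (unconditional or p_flag) and executions < max_iters:
        executions += 1
        if decrement and lc > 0:
            lc -= 1
            if lc == 0:
                p_flag = False   # decrementing to zero clears P
        if not repeat:
            break                # without R the move executes at most once
    return executions, lc, p_flag
```

With {\tt R} and {\tt S} both set and {\tt LC} initially 3, the model
executes the move three times and leaves {\tt P} cleared; with {\tt R}
set and {\tt S} clear it keeps executing until the artificial cutoff,
which corresponds to a standing move.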
376 The {\tt I} bit stands for {\tt Immune}, and indicates if the
377 instruction is immune to torpedoes. If a {\tt move} instruction which
378 is not immune is waiting to execute and a torpedo is lying in wait,
379 the torpedo {\it strikes}. When a torpedo strikes, the
380 {\tt move} instruction and the torpedo are both consumed and the {\tt
383 \subsection*{The C Flag}
385 Every time the {\tt move} instruction executes, the {\tt C} flag may
389 \item At an {\it input} dock the {\tt C} flag is set to the signal bit
390 of the incoming packet.
392 \item At an {\it output} dock the {\tt C} flag is set to a value
393 provided by the ship if the {\tt Dc} bit is set. If the {\tt
394 Dc} bit is not set, the {\tt C} flag is set to the signal bit of
399 \subsection*{Flushing}
401 The {\tt flush} instruction is a variant of {\tt move} which is valid
402 only at input docks. It has the same effect as {\tt deliver}, except
403 that it sets a special ``flushing'' indicator along with the data
406 \newcommand{\bitsFlush}{\setlength{\bitwidth}{5mm}
408 \begin{bytefield}{25}
409 \bitheader[b]{14-18}\\
410 \bitbox[r]{6}{\raggedleft{\tt flush\ \ }}
425 When a ship fires, it must examine the ``flushing'' indicators on the
426 input docks whose fullness was part of the firing condition. If all
427 of the input docks' flushing indicators are set, the ship must drain
428 all of their data successors and take no action. If some, but not
429 all, of the indicators are set, the ship must drain {\it only the data
430 successors of the docks whose indicators were {\bf not} set}, and
431 take no action. If none of the flushing indicators was set, the ship
437 \subsection{{\tt set}}
The {\tt set} instruction is used to set the data latch, the flags, or the
442 \newcommand{\bitsSet}{
443 {\tt\begin{bytefield}{25}
444 \bitheader[b]{19-21}\\
458 \begin{bytefield}{25}
459 \bitheader[b]{0,11-18}\\
460 \bitbox[1]{5}{\raggedleft {\tt Immediate}$\to${\tt LC}}
464 \bitbox{12}{\tt Immediate}
467 \begin{bytefield}{25}
468 \bitheader[b]{12-18}\\
469 \bitbox[1]{5}{\raggedleft {\tt Data Latch}$\to${\tt LC}}
476 \begin{bytefield}{25}
477 \bitheader[b]{0,13-18}\\
478 \bitbox[1]{5}{\raggedleft \footnotesize {\tt Sign-Extended Immediate}$\to${\tt Data Latch}}
481 \bitbox{1}{\begin{minipage}{0.5cm}{
488 \bitbox{14}{\tt Immediate}
491 \begin{bytefield}{25}
492 \bitheader[b]{0,5,6,11,15-18}\\
493 \bitbox[1]{5}{\raggedleft {\tt Update Flags}}
497 \bitbox{6}{\tt nextA}
498 \bitbox{6}{\tt nextB}
504 The FleetTwo implementation is likely to have an unarchitected
505 ``literal latch'' at the on deck ({\tt OD}) stage, which is loaded
506 with the possibly-extended literal {\it at the time that the {\tt set}
507 instruction comes on deck}. This latch is then copied into the data
508 latch when a {\tt set Data Latch} instruction
511 The {\tt Sign-Extended Immediate} instruction copies the {\tt
512 Immediate} field into the least significant bits of the data latch.
513 All other bits of the data latch are filled with a copy of the
514 bit marked ``{\tt Sign}.''
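
A minimal Python sketch of this fill rule follows. It assumes a 37-bit
data latch and the 14-bit {\tt Immediate} field shown in the diagram;
the function name {\tt set\_data\_latch} is hypothetical.

```python
WORD_BITS = 37   # width of the data latch (one machine word)
IMM_BITS = 14    # width of the Immediate field, as read from the diagram


def set_data_latch(immediate, sign):
    """Return the latch value for a Sign-Extended Immediate set."""
    if not 0 <= immediate < (1 << IMM_BITS):
        raise ValueError("immediate must fit in 14 bits")
    if sign not in (0, 1):
        raise ValueError("sign must be 0 or 1")
    # Every bit above the immediate is a copy of the Sign bit.
    upper = ((1 << (WORD_BITS - IMM_BITS)) - 1) << IMM_BITS
    return immediate | (upper if sign else 0)
```

With {\tt Sign} set and an immediate of {\tt 0x3FFE}, the latch holds
$2^{37}-2$, which is $-2$ when the word is read as a two's-complement
number.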
517 Each of the {\tt nextA} and {\tt nextB} fields has the following
518 structure, and indicates which old flag values should be logically
519 {\tt OR}ed together to produce the new flag value:
525 \bitbox{1}{${\text{\tt A}}$}
526 \bitbox{1}{$\overline{\text{\tt A}}$}
527 \bitbox{1}{${\text{\tt B}}$}
528 \bitbox{1}{$\overline{\text{\tt B}}$}
529 \bitbox{1}{${\text{{\tt C}\ }}$}
530 \bitbox{1}{$\overline{\text{{\tt C}\ }}$}
534 Each bit corresponds to one possible input; all inputs whose bits are
535 set are {\tt OR}ed together, and the resulting value is assigned to
536 the flag. Note that if none of the bits are set, the value assigned
537 is zero. Note also that it is possible to produce a {\tt 1} by {\tt
538 OR}ing any flag with its complement, and that {\tt set Flags} can
539 be used to create a {\tt nop} (no-op) by setting each flag to itself.
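
The {\tt OR}-of-selected-inputs rule can be sketched in Python as
follows. The function name {\tt next\_flag} is hypothetical, and the
mapping of mask bits to inputs (from most to least significant: {\tt
  A}, $\overline{\text{\tt A}}$, {\tt B}, $\overline{\text{\tt B}}$,
{\tt C}, $\overline{\text{\tt C}}$) is an assumption based on the
left-to-right order of the diagram.

```python
def next_flag(mask, a, b, c):
    """Compute a new flag value from a 6-bit nextA/nextB style mask.

    Assumed bit order, most significant first: A, not A, B, not B, C, not C.
    """
    if not 0 <= mask < (1 << 6):
        raise ValueError("mask must fit in 6 bits")
    inputs = [a, not a, b, not b, c, not c]
    # OR together every input whose mask bit is set; an empty mask gives False.
    return any(value for i, value in enumerate(inputs)
               if (mask >> (5 - i)) & 1)
```

Under this assumed ordering, the mask {\tt 0b110000} selects both {\tt
  A} and its complement and therefore always yields 1, while {\tt
  0b100000} used as {\tt nextA} leaves {\tt A} unchanged.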
545 \subsection{{\tt shift}}
547 \newcommand{\shiftImmediateSize}{19}
549 Each {\tt shift} instruction carries an immediate of \shiftImmediateSize\
550 bits. When a {\tt shift} instruction is executed, this immediate is copied
551 into the least significant \shiftImmediateSize\ bits of the data latch,
552 and the remaining most significant bits of the data latch are loaded
553 with the value formerly in the least significant bits of the data latch.
554 In this manner, large literals can be built up by ``shifting'' them
555 into the data latch \shiftImmediateSize\ bits at a time.
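
The effect of {\tt shift} on the data latch can be sketched in Python,
assuming a 37-bit latch and the 19-bit immediate described above. The
helper {\tt load\_literal} is a hypothetical illustration of building a
full-width literal from two {\tt shift} instructions.

```python
WORD_BITS = 37        # width of the data latch
SHIFT_IMM_BITS = 19   # width of the shift immediate


def shift(latch, immediate):
    """Model one shift: old low bits move up, immediate fills the low bits."""
    if not 0 <= immediate < (1 << SHIFT_IMM_BITS):
        raise ValueError("immediate must fit in 19 bits")
    return ((latch << SHIFT_IMM_BITS) | immediate) & ((1 << WORD_BITS) - 1)


def load_literal(value):
    """Build a 37-bit literal with two shifts (upper 18 bits, then lower 19)."""
    if not 0 <= value < (1 << WORD_BITS):
        raise ValueError("literal must fit in 37 bits")
    high = value >> SHIFT_IMM_BITS               # upper 18 bits
    low = value & ((1 << SHIFT_IMM_BITS) - 1)    # lower 19 bits
    latch = shift(0, high)
    return shift(latch, low)
```

Because $37 = 18 + 19$, two shifts are enough to replace every bit of
the latch: the first immediate ends up in the upper 18 bits and the
second in the lower 19.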
557 \newcommand{\bitsShift}{
558 \setlength{\bitwidth}{5mm}
560 \begin{bytefield}{25}
561 \bitheader[b]{0,18-21}\\
572 \bitbox{\shiftImmediateSize}{Immediate}
577 The FleetTwo implementation is likely to have an unarchitected
578 ``literal latch'' at the on deck ({\tt OD}) stage, which is loaded
579 with the literal {\it at the time that the {\tt shift} instruction
580 comes on deck}. This latch is then copied into the data latch when
581 the instruction executes.
585 \subsection{{\tt abort}}
586 \newcommand{\bitsAbort}{\setlength{\bitwidth}{5mm}
588 \begin{bytefield}{25}
589 \bitheader[b]{18-21}\\
606 An {\tt abort} instruction causes a loop to exit; see the section on
607 the Requeue Stage for further details.
609 \subsection{{\tt head}}
610 \newcommand{\bitsHead}{
611 \setlength{\bitwidth}{5mm}
613 \begin{bytefield}{25}
614 \bitheader[b]{18-21}\\
629 A {\tt head} instruction marks the start of a loop; see the section on
630 the Requeue Stage for further details.
633 \subsection{{\tt tail}}
634 \newcommand{\bitsTail}{
635 \setlength{\bitwidth}{5mm}
637 \begin{bytefield}{25}
638 \bitheader[b]{18-21}\\
653 A {\tt tail} instruction marks the end of a loop; see the section on
654 the Requeue Stage for further details.
656 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
658 \section*{Instruction Encoding Map}
661 \vspace{3mm}\hspace{-1cm}{\tt move}\hspace{1cm}\vspace{-6mm}\\
665 \vspace{3mm}\hspace{-1cm}{\tt shift}\hspace{1cm}\vspace{-6mm}\\
668 \vspace{3mm}\hspace{-1cm}{\tt set}\hspace{1cm}\vspace{-6mm}\\
671 \vspace{3mm}\hspace{-1cm}{\tt abort}\hspace{1cm}\vspace{-6mm}\\
674 \vspace{3mm}\hspace{-1cm}{\tt head}\hspace{1cm}\vspace{-6mm}\\
677 \vspace{3mm}\hspace{-1cm}{\tt tail}\hspace{1cm}\vspace{-6mm}\\