1 \documentclass[10pt]{article}
6 \usepackage{bytefield1}
10 \bibliographystyle{alpha}
11 \pagestyle{fancyplain}
13 \definecolor{light}{gray}{0.7}
15 \title{\vspace{-1cm}AM42: The F2 Dock
32 "Supertorpedo" for retrieving dock state.
34 THEREFORE: I propose that DockTwo only be able to load the counter from the data latch, not from the instruction word. This gets rid of an entire instruction. In exchange, I would like to propose that "data predecessor full is a condition of execution" and "drain data predecessor when you execute" be different opcode bits.
37 - debug capability? ability to kick the dock in the head to get it
38 to spit out its state?
40 - tokenhood as address bit
41 - signal/path boundary/etc
42 - Rename EPI and OD to something more meaningful
44 - get rid of shadow latch
46 - figure out C-flag / signal bit situation
47 - Suggestion that there should be a "T" flag
48 - Get rid of "shadow latch" for literals?
49 - unify flags and signal bit by saying that the dock can
50 see the upper X bits of a word?
51 - should have a way to set just the upper X bits of the word
52 - flags are actually part of the data latch!
53 - the signal bit(s) belong to the Destination (or is it the Path?)
55 - How do you get a runtime count value to an input dock?
56 - Simplify the whole c-flag/signal-bit situation
57 - tokenhood should be LITERALLY an address bit!
60 - ship-to-ship data (words)
61 - dock-to-dock data (signal bits)
62 - ship-to-dock data (c-flag)
63 - dock-to-ship data (flushing/bonus bit)
66 - ship word (at output dock)
68 - packet word (at input dock)
84 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
85 \section{Overview of Fleet}
87 A Fleet processor is organized around a {\it switch fabric}, which is
88 a packet-switched network with reliable in-order delivery. The switch
89 fabric is used to carry data between different functional units,
90 called {\it ships}. Each ship is connected to the switch fabric by
91 one or more programmable elements known as {\it docks}.
93 A {\it path} specifies a route through the switch fabric from a
94 particular {\it source} to a particular {\it destination}. The
95 combination of a path and a single word to be delivered is called a
96 {\it packet}. The switch fabric carries packets from their sources to
97 their destinations. Each dock has four\
98 destinations: one each for {\it instructions}, {\it
99 torpedoes}, {\it tokens},\ and {\it words}. A Fleet is
100 programmed by depositing instruction packets into the switch fabric
101 with paths that will lead them to instruction destinations of the
102 docks at which they are to execute.
104 When a packet arrives at the instruction destination of a dock, it is
105 enqueued for execution. Before the instruction executes, it may cause
106 the dock to wait for a packet to arrive at the dock's data destination
107 or for a value to be presented by the ship. When an instruction
108 executes it may consume this data and may present a data value to the
109 ship or transmit a packet.
111 Packets sent to token and torpedo destinations carry no payload. Such
112 packets consume less energy than instruction packets or word packets.
115 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
117 \section{The FleetTwo Dock}
119 The diagram below represents a conceptual view of the interface
120 between ships and the switch fabric; actual implementation circuitry
125 Each dock consists of a {\it data latch}, which is as wide as a single
126 machine word and a circular {\it instruction fifo} of
127 instruction-width latches. The values in the instruction fifo control
128 the data latch. The dock also includes a {\it path latch}, which
129 stores the path along which outgoing packets will be
132 Note that the instruction fifo in each dock has a destination of its
133 own; this is the {\it instruction destination} mentioned in the
134 previous section. A token sent to an instruction destination is
135 called a {\it torpedo}; it does not enter the instruction fifo, but
136 rather is held in a waiting area where it may interrupt certain
137 instructions (see the section on the {\tt move} instruction for further
140 From any source to any dock's data destination there are
141 two distinct paths which differ by a single bit. This bit is known as
142 the ``signal'' bit, and the routing of a packet is not affected by it;
143 the signal bit is used to pass control values between docks. Note that paths
144 terminating at an {\it instruction} destination need not have a signal
148 Source-sequence guarantee. Shared across instruction/torpedo (?) and
149 token/word destinations.
151 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
153 \section{Instructions}
155 In order to cause an instruction to execute, the programmer must first
156 arrange for that instruction word to arrive in the data latch of some
157 output dock. For example, this might be the ``data read'' output dock
158 of the memory access ship or the output of a fifo ship. Once an
159 instruction has arrived at this output dock, it is {\it dispatched} by
160 sending it to the {\it instruction destination} of the dock at which
163 Each instruction is 25\ bits long, which makes
164 it possible for an instruction and an 12-bit
165 path to fit in a single word of memory. This path is the path from
166 the {\it dispatching} dock to the {\it executing} dock.
170 \setlength{\bitwidth}{3.5mm}
172 \begin{bytefield}{37}
173 \bitheader[b]{0,24,25,36}\\
174 \bitbox{12}{dispatch path}
175 \bitbox{25}{instruction}
179 Note that the 12\ bit {\tt dispatch path}
180 field is not the same width as the 13 bit {\tt Immediate} path field
181 in the {\tt move} instruction, which in turn may not be the same width
182 as the actual path latches in the switch fabric.
184 The algorithm for expanding a path to a wider width is specific to the
185 switch fabric implementation, and is not specified by this
186 document.\footnote{for the Marina experiment, the correct
187 algorithm is to sign-extend the path; the most significant bit of
188 the given path is used to fill the vacant bit of the latch} In
189 particular, because the {\tt dispatch path} field is always used to
190 specify a path which terminates at an instruction destination (never a
191 data destination), and because instruction destinations ignore the
192 signal bit, certain optimizations may be possible.
195 \subsection{Loop Counter}
197 A programmer can perform two types of loops: {\it inner} loops
198 consisting of only one {\tt move} instruction and {\it outer} loops of
199 multiple instructions of any type. Inner loops may be nested within
200 an outer loop, but no other nesting of loops is allowed.
202 The dock has one loop counter, called {\tt LC}. It is the
203 same width as a word carried through the switch fabric (37 bits).
207 The dock has four flags: {\tt A}, {\tt B}, {\tt C}, and {\tt Z}.
210 \item The {\tt A} and {\tt B} flags are general-purpose flags which
211 may be set and cleared by the programmer.
213 \item The {\tt C} flag is known as the {\it control} flag, and may be
214 set by the {\tt move} instruction based on information from the
215 ship or from an inbound packet. See the {\tt move} instruction
218 \item The {\tt P} flag is used for predication; see the next section
219 for details. When a torpedo strikes or the counter is
220 decremented from any value to zero, the {\tt P} flag is cleared.
221 The {\tt P} flag may also be set and cleared by the {\tt set}
224 \item The {\tt Z} flag is known as the
225 {\it zero} flag. The {\tt
226 Z}\ flag is {\it set} whenever the {\tt LC} is zero.
227 In an actual implementation the {\tt Z}\
228 flag might require an actual latch; it might simply be derived
229 from the ``zeroness'' of the {\tt LC}.
233 \subsection{Predication}
235 All instructions except for {\tt head} and {\tt tail} have a bit
236 marked {\tt U}, for {\it unconditional}. An instruction with the {\tt
237 U} bit set always executes. An instruction with the {\tt U} bit
238 cleared will execute {\it only if the {\tt P} flag is set}.
241 \setlength{\bitwidth}{5mm}
243 \begin{bytefield}{25}
244 \bitheader[b]{0,24}\\
251 \subsection{The Requeue Stage}
253 The requeue stage has two inputs, which will be referred to as the
254 {\it enqueueing} input and the {\it recirculating} input. It has a
255 single output which feeds into the instruction fifo.
257 The requeue stage has two states: {\sc Updating} and {\sc
260 \subsubsection{The {\sc Updating} State}
262 On initialization, the dock is in the {\sc Updating} state. In this
263 state the requeue stage is performing three tasks:
265 \item it is draining the
266 previous loop's instructions (if any) from the fifo
267 \item it is executing any ``one
268 shot'' instructions which come between the previous loop's {\tt tail}
269 and the next loop's {\tt head}
270 \item it is loading the instructions of
271 the next loop into the fifo.
274 In the {\sc Updating} state, the requeue stage will accept any
275 instruction other than a {\tt tail} which arrives at its {\it
276 enqueueing} input, and pass this instruction to its output. Any
277 instruction other than a {\tt head} which arrives at the {\it
278 recirculating} input will be discarded.
280 Note that when a {\tt tail} instruction arrives at the {\it
281 enqueueing} input, it ``gets stuck'' there. Likewise, when a {\tt
282 head} instruction arrives at the {\it recirculating} input, it also
283 ``gets stuck''. When the requeue stage finds {\it both} a {\tt tail}
284 instruction stuck at the {\it enqueueing} input and a {\tt head}
285 instruction stuck at the {\it recirculating} input, the requeue stage
286 discards both the {\tt head} and {\tt tail} and transitions to the
287 {\sc Circulating} state.
289 \subsubsection{The {\sc Circulating} State}
291 In the {\sc Circulating} state, the dock repeatedly executes the set
292 of instructions that are in the instruction fifo.
294 In the {\sc Circulating} state, the requeue stage will not accept
295 items from its {\it enqueueing} input. Any item presented at the {\it
296 recirculating} input will be passed through to the requeue stage's
299 When an {\tt abort} instruction is executed, the requeue stage
300 transitions back to the {\sc Updating} state. Note that {\tt abort}
301 instructions include a {\tt U} bit -- an {\tt abort} instruction with that bit set
302 will not cause this transition when the {\tt P} flag is cleared.
306 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
308 \section{Instructions}
310 \subsection{{\tt move}}
312 \newcommand{\bitsMove}{\setlength{\bitwidth}{5mm}
314 \begin{bytefield}{25}
315 \bitheader[b]{14-21}\\
333 \begin{bytefield}{25}
334 \bitheader[b]{0,12,13}\\
335 \bitbox[1]{10}{\raggedleft {\tt moveto} ({\tt Immediate$\to$ Path})}
338 \bitbox{13}{\tt Immediate}
341 \begin{bytefield}{25}
342 \bitheader[b]{11,12,13}\\
343 \bitbox[1]{10}{\raggedleft {\tt dispatch} ({\footnotesize {\tt DataPredecessor[37:26]$\to$ Path}})\ \ }
351 \begin{bytefield}{25}
352 \bitheader[b]{11,12,13}\\
353 \bitbox[1]{10}{\raggedleft {\tt move} ({\tt Path} unchanged):}
365 \item {\tt Fi} - Fabric input: wait for fabric predecessor to be full and drain it.
366 \item {\tt Fo} - Fabric output: wait for fabric successor to be empty and fill it.
367 \item {\tt Dc} - Data Capture: pulse the data latch.
368 \item {\tt Sh} - Ship: at an input/output dock, wait for the ship successor/predecessor to be empty/full and fill/drain it.
371 Ability to await and capture, but not drain, data predecessor.
373 The fabric successor must be empty in order for a {\tt move}
374 instruction to attempt execution.
376 If the {\tt S} bit is set, the {\tt move} instruction will subtract
377 one from the {\tt LC} counter each time it executes. An instruction
378 with only this bit set (and no other) takes the place of the dedicated
379 ``decrement OLC'' instruction in previous designs.
381 If the {\tt R} bit is set, the {\tt move} instruction will execute
382 repeatedly until its predicate no longer holds (or a torpedo strikes).
383 An ``infinite'' or ``standing'' move can be achieved by setting the
384 {\tt R} bit and clearing the {\tt S} bit.
386 The {\tt I} bit stands for {\tt Immune}, and indicates if the
387 instruction is immune to torpedoes. If a {\tt move} instruction which
388 is not immune is waiting to execute and a torpedo is lying in wait,
389 the torpedo {\it strikes}. When a torpedo strikes, the
390 {\tt move} instruction and the torpedo are both consumed and the {\tt
393 \subsection*{The C Flag}
395 Every time the {\tt move} instruction executes, the {\tt C} flag may
399 \item At an {\it input} dock the {\tt C} flag is set to the signal bit
400 of the incoming packet.
402 \item At an {\it output} dock the {\tt C} flag is set to a value
403 provided by the ship if the {\tt Dc} bit is set. If the {\tt
404 Dc} bit is not set, the {\tt C} flag is set to the signal bit of
409 \subsection*{Flushing}
411 The {\tt flush} instruction is a variant of {\tt move} which is valid
412 only at input docks. It has the same effect as {\tt deliver}, except
413 that it sets a special ``flushing'' indicator along with the data
416 \newcommand{\bitsFlush}{\setlength{\bitwidth}{5mm}
418 \begin{bytefield}{25}
419 \bitheader[b]{14-18}\\
420 \bitbox[r]{6}{\raggedleft{\tt flush\ \ }}
435 When a ship fires, it must examine the ``flushing'' indicators on the
436 input docks whose fullness was part of the firing condition. If all
437 of the input docks' flushing indicators are set, the ship must drain
438 all of their data successors and take no action. If some, but not
439 all, of the indicators are set, the ship must drain {\it only the data
440 successors of the docks whose indicators were {\bf not} set}, and
441 take no action. If none of the flushing indicators was set, the ship
447 \subsection{{\tt set}}
449 The {\tt set} command is used to set the data latch, the flags, or the
452 \newcommand{\bitsSet}{
453 {\tt\begin{bytefield}{25}
454 \bitheader[b]{19-21}\\
468 \begin{bytefield}{25}
469 \bitheader[b]{12-18}\\
470 \bitbox[1]{5}{\raggedleft {\tt Data Latch}$\to${\tt LC}}
477 \begin{bytefield}{25}
478 \bitheader[b]{0,5,6,11,15-18}\\
479 \bitbox[1]{5}{\raggedleft {\tt Update Flags}}
483 \bitbox{6}{\tt nextA}
484 \bitbox{6}{\tt nextB}
490 The FleetTwo implementation is likely to have an unarchitected
491 ``literal latch'' at the on deck ({\tt OD}) stage, which is loaded
492 with the possibly-extended literal {\it at the time that the {\tt set}
493 instruction comes on deck}. This latch is then copied into the data
494 latch when a {\tt set Data Latch} instruction
497 The {\tt Sign-Extended Immediate} instruction copies the {\tt
498 Immediate} field into the least significant bits of the data latch.
499 All other bits of the data latch are filled with a copy of the
500 bit marked ``{\tt Sign}.''
503 Each of the {\tt nextA} and {\tt nextB} fields has the following
504 structure, and indicates which old flag values should be logically
505 {\tt OR}ed together to produce the new flag value:
511 \bitbox{1}{${\text{\tt A}}$}
512 \bitbox{1}{$\overline{\text{\tt A}}$}
513 \bitbox{1}{${\text{\tt B}}$}
514 \bitbox{1}{$\overline{\text{\tt B}}$}
515 \bitbox{1}{${\text{{\tt C}\ }}$}
516 \bitbox{1}{$\overline{\text{{\tt C}\ }}$}
520 Each bit corresponds to one possible input; all inputs whose bits are
521 set are {\tt OR}ed together, and the resulting value is assigned to
522 the flag. Note that if none of the bits are set, the value assigned
523 is zero. Note also that it is possible to produce a {\tt 1} by {\tt
524 OR}ing any flag with its complement, and that {\tt set Flags} can
525 be used to create a {\tt nop} (no-op) by setting each flag to itself.
531 \subsection{{\tt literal}}
533 \newcommand{\literalImmediateSize}{19}
535 Each {\tt literal} instruction carries an immediate of
536 \literalImmediateSize\ bits. When a {\tt literal} instruction is
537 executed, this immediate is copied into the least significant
538 \literalImmediateSize\ bits of the data latch, and the remaining most
539 significant bits of the data latch are loaded with the value formerly
540 in the least significant bits of the data latch. In this manner,
541 large literals can be built up by ``shifting'' them into the data
542 latch \literalImmediateSize\ bits at a time.
544 \newcommand{\bitsLiteral}{
545 \setlength{\bitwidth}{5mm}
547 \begin{bytefield}{25}
548 \bitheader[b]{0,18-21}\\
559 \bitbox{\literalImmediateSize}{Immediate}
564 Need two forms: one of them loads the bottom half and sign-extends it
565 into the top half. The other form loads just the top half.
568 \subsection{{\tt abort}}
569 \newcommand{\bitsAbort}{\setlength{\bitwidth}{5mm}
571 \begin{bytefield}{25}
572 \bitheader[b]{18-21}\\
589 An {\tt abort} instruction causes a loop to exit; see the section on
590 the Requeue Stage for further details.
592 \subsection{{\tt head}}
593 \newcommand{\bitsHead}{
594 \setlength{\bitwidth}{5mm}
596 \begin{bytefield}{25}
597 \bitheader[b]{18-21}\\
612 A {\tt head} instruction marks the start of a loop; see the section on
613 the Requeue Stage for further details.
616 \subsection{{\tt tail}}
617 \newcommand{\bitsTail}{
618 \setlength{\bitwidth}{5mm}
620 \begin{bytefield}{25}
621 \bitheader[b]{18-21}\\
636 A {\tt tail} instruction marks the end of a loop; see the section on
637 the Requeue Stage for further details.
639 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
641 \section*{Instruction Encoding Map}
644 \vspace{3mm}\hspace{-1cm}{\tt move}\hspace{1cm}\vspace{-6mm}\\
648 \vspace{3mm}\hspace{-1cm}{\tt shift}\hspace{1cm}\vspace{-6mm}\\
651 \vspace{3mm}\hspace{-1cm}{\tt set}\hspace{1cm}\vspace{-6mm}\\
654 \vspace{3mm}\hspace{-1cm}{\tt abort}\hspace{1cm}\vspace{-6mm}\\
657 \vspace{3mm}\hspace{-1cm}{\tt head}\hspace{1cm}\vspace{-6mm}\\
660 \vspace{3mm}\hspace{-1cm}{\tt tail}\hspace{1cm}\vspace{-6mm}\\