1 \documentclass[10pt]{article}
5 \usepackage[figureright]{rotating}
8 \usepackage{bytefield1}
11 \usepackage{subfigure}
20 \bibliographystyle{alpha}
21 \pagestyle{fancyplain}
23 \definecolor{light}{gray}{0.7}
25 \setlength{\marginparwidth}{1.2in}
26 \let\oldmarginpar\marginpar
27 \renewcommand\marginpar[1]{\-\oldmarginpar[\raggedleft\footnotesize #1]%
28 {\raggedright\footnotesize #1}}
31 \newcommand{\footnoteremember}[2]{
34 \setcounter{#1}{\value{footnote}}
35 } \newcommand{\footnoterecall}[1]{
36 \footnotemark[\value{#1}]
44 %\oddsidemargin 0.25in
45 %\evensidemargin 0.25in
47 \def\to{\ $\rightarrow$\ }
57 \title{\vspace{-1cm}AM33: The Marina Docks
69 This document describes the instruction format for the docks on the
70 Marina test chip, as well as their software-visible behavior. Two
71 subsequent memos will describe the chip's circuit design and
80 %& Added errata for Kessels counter on Marina test chip \\
82 %& Added errata for Marina test chip \\
84 %& Clarified setting of the {\tt C}-flag\color{black}\\
85 %& Removed {\tt OS} bit\color{black}\\
86 %& Changed instruction length from 26 bits to 25\color{black}\\
87 %& Updated which bits are used when the {\tt Path} latch captures from the data predecessor\color{black}\\
89 %& Fixed a one-word typo \\
91 %& Added {\tt head} instruction \\
92 %& Lengthened external encoding of {\tt tail} instruction by one bit \\
93 %& Added {\tt abort} instruction \\
94 %& Removed {\tt OS} field from instructions \\
95 %& Renamed the {\tt Z}-flag (olc {\bf Z}ero) to the {\tt D}-flag (loop {\bf D}one)\\
97 %& Updated diagram in section 3 to put dispatch path near MSB\\
98 %& Changed DP[37:25] to DP[37:27]\\
99 %& Added note on page 4 regarding previous\\
101 %& Roll back ``Distinguish {\tt Z}-flag from OLC=0'' \\
102 %& Clarify what ``{\tt X-Extended}'' means \\
103 %& Change C-bit source selector from {\tt Di} to {\tt Dc} \\
105 %& Distinguish {\tt Z}-flag from OLC=0\\
106 %& Add {\tt flush} instruction\\
%& Change {\tt I} bit from ``Interruptable'' to ``Immune''\\
109 %& Update hatch description to match \href{http://fleet.cs.berkeley.edu/docs/people/ivan.e.sutherland/ies50-Requeue.State.Diagram.pdf}{IES50} \\
111 %& Note that decision to requeue is based on value of OLC {\it before} execution\\
112 %& Note that decision to open the hatch is based on value of {\tt OS} bit\\
114 %& Added {\tt OLC=0} predicate \\
115 %& Eliminated {\tt TAPL} (made possible by previous change) \\
116 %& Expanded {\tt set} {\tt Immediate} field from 13 bits to 14 bits (made possible by previous change)\\
118 %& Fixed a few typos \\
119 %& Added {\tt DataLatch}\to{\tt TAPL} (Amir's request) \\
120 %& Eliminate ability to predicate directly on {\tt C}-flag (Ivan's request) \\
122 %& When a torpedo strikes, {\tt ILC} is set to {\tt 1} \\
123 %& Only {\tt move} can be torpedoed (removed {\tt I}-bit from {\tt set}/{\tt shift}) \\
125 %& Changed all uses of ``Payload'' to ``Immediate'' \color{black} (not in red) \\
126 %& Reworked encoding of {\tt set} instruction \\
129 %& Factored in Russell Kao's comments (thanks!)\\
130 %& Added mechanism for setting C-flag from fabric even on outboxes\\
132 %& Made {\tt OLC} test a predicate-controlled condition\\
133 %& Rewrote ``on deck'' section \\
134 %& Added ``{\tt unset}'' value for {\tt ILC}\\
135 %& Changed {\tt DP} to {\tt DataPredecessor} for clarity\\
138 %& added comment about address-to-path ship \\
139 %& changed {\tt DST} field of {\tt set} instruction from 2 bits to 3 \\
140 %& changed the order of instructions in the encoding map \\
142 %& added epilogue fifo to diagrams \\
143 %& indicated that a token sent to the instruction port is treated as a torpedo \\
145 %& replaced {\tt setInner}, {\tt setOuter}, {\tt setFlags} with unified {\tt set} instruction \\
146 %& replaced {\tt literal} with {\tt shift} instruction \\
148 %& Made all instructions except {\tt setOuter} depend on {\tt OLC>0} \\
149 %& Removed ability to manually set the {\tt C} flag \\
150 %& Expanded predicate field to three bits \\
151 %& New literals scheme (via shifting) \\
152 %& Instruction encoding changes made at Ivan's request (for layout purposes) \\
153 %& Added summary of instruction encodings on last page \\
155 %& removed ``+'' from ``potentially torpedoable'' row where it does not occur in Execute \\
157 %& extended {\tt LiteralPath} to 13 bits (impl need not use all of them) \\
158 %& update table 3.1.2 \\
159 %& rename {\tt S} flag to {\tt C} \\
160 %& noted that {\tt setFlags} can be used as {\tt nop} \\
162 %& removed the {\tt L} flag (epilogues can now do this) \\
163 %& removed {\tt take\{Inner|Outer\}LoopCounter} instructions \\
164 %& renamed {\tt data} instruction to {\tt literal} \\
165 %& renamed {\tt send} instruction to {\tt move} \\
167 %& added ``if its predicate is true'' to repeat count \\
168 %& added note that red wires do not contact ships \\
169 %& changed name of {\tt flags} instruction to {\tt setFlags} \\
170 %& removed black dot from diagrams \\
171 %& changed {\tt OL} (Outer Loop participant) to {\tt OS} (One Shot) and inverted polarity \\
172 %& indicated that the death of the {\tt tail} instruction is what causes the hatch to be unsealed \\
173 %& indicated that only {\tt send} instructions which wait for data are torpedoable \\
174 %& added section ``Torpedo Details'' \\
175 %& removed {\tt torpedo} instruction \\
178 %& renamed loop+repeat to outer+inner (not in red) \\
179 %& renamed {\tt Z} flag to {\tt L} flag (not in red) \\
180 %& rewrote ``inner and outer loops'' section \\
181 %& updated all diagrams \\
184 %& Moved address bits to the LSB-side of a 37-bit instruction \\
185 %& Added {\it micro-instruction} and {\it composite instruction} terms \\
186 %& Removed the {\tt DL} field, added {\tt decrement} mode to {\tt loop} \\
187 %& Created the {\tt Hold} field \\
188 %& Changed how ReLooping works \\
189 %& Removed {\tt clog}, {\tt unclog}, {\tt interrupt}, and {\tt massacre} \\
196 \epsfig{file=all,height=1.5in}
197 \epsfig{file=overview-new,height=1.5in}
202 \section{Overview of Fleet}
204 A Fleet processor is organized around a {\it switch fabric}, which is
205 a packet-switched network with reliable in-order delivery. The switch
206 fabric is used to carry data between different functional units,
207 called {\it ships}. Each ship is connected to the switch fabric by
208 one or more programmable elements known as {\it docks}.
210 A {\it path} specifies a route through the switch fabric from a
211 particular {\it source} to a particular {\it destination}. The
212 combination of a path and a single word to be delivered is called a
213 {\it packet}. The switch fabric carries packets from their sources to
214 their destinations. Each dock has two destinations: one for {\it
215 instructions} and one for {\it data}. A Fleet is programmed by
216 depositing instruction packets into the switch fabric with paths that
217 will lead them to the instruction destinations of the docks at which they
When a packet arrives at the instruction destination of a dock, it is
enqueued for execution. Before the instruction executes, it may cause
the dock to wait for a packet to arrive at the dock's data destination
or for a value to be presented by the ship. When an instruction
executes, it may consume this data and may present a data value to the
ship or transmit a packet.
227 When an instruction sends a packet into the switch fabric, it may
228 specify that the payload of the packet is irrelevant. Such packets
229 are known as {\it tokens}, and consume less energy than data packets.
233 \epsfig{file=overview-new,width=2.5in}\\
234 {\it Overview of a Fleet processor; dark gray shading represents the
235 switch fabric, ships are shown in light gray, and docks are shown in blue.}
241 \section{The Marina Dock}
243 The diagram below represents a conceptual view of the interface
244 between ships and the switch fabric; actual implementation circuitry
248 \epsfig{file=all,width=3.5in}\\
249 {\it An ``input'' dock and ``output'' dock connected to a ship. Solid
250 blue lines carry either tokens or data words, red lines carry either
251 instructions or torpedoes, and dashed lines carry only tokens.}
Each dock consists of a {\it data latch}, which is as wide as a single
machine word, and a circular {\it instruction fifo} of
256 instruction-width latches. The values in the instruction fifo control
257 the data latch. The dock also includes a {\it path latch}, which
258 stores the path along which outgoing packets will be
261 Note that the instruction fifo in each dock has a destination of its
262 own; this is the {\it instruction destination} mentioned in the
263 previous section. A token sent to an instruction destination is
264 called a {\it torpedo}; it does not enter the instruction fifo, but
265 rather is held in a waiting area where it may interrupt certain
266 instructions (see the section on the {\tt move} instruction for further
269 From any source to any dock's data destination there are
270 two distinct paths which differ by a single bit. This bit is known as
271 the ``signal'' bit, and the routing of a packet is not affected by it;
272 the signal bit is used to pass control values between docks. Note that paths
273 terminating at an {\it instruction} destination need not have a signal
277 \section{Instructions}
279 In order to cause an instruction to execute, the programmer must first
280 arrange for that instruction word to arrive in the data latch of some
281 output dock. For example, this might be the ``data read'' output dock
282 of the memory access ship or the output of a fifo ship. Once an
283 instruction has arrived at this output dock, it is {\it dispatched} by
284 sending it to the {\it instruction destination} of the dock at which
287 There are two instruction formats, an {\it external format} described
288 in this section and an {\it internal format} described in the last
289 section of this memo.
Each instruction is 25 bits long, which makes it
possible for an instruction and a 12-bit path to fit in
a single 37-bit word of memory. This path is the path from the {\it
dispatching} dock to the {\it executing} dock.
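
As a rough illustration of this packing, the following Python sketch
assembles a 37-bit memory word from a 12-bit dispatch path and a 25-bit
instruction. It assumes that the dispatch path occupies the most
significant bits (bits 36 down to 25) and the instruction the least
significant bits (bits 24 down to 0). The names {\tt pack\_word} and
{\tt unpack\_word} are hypothetical and are not part of any Fleet tool.

```python
INSTRUCTION_BITS = 25  # width of the external instruction format
PATH_BITS = 12         # width of the dispatch path field


def pack_word(dispatch_path: int, instruction: int) -> int:
    """Combine a dispatch path and an instruction into one 37-bit word."""
    if not 0 <= dispatch_path < (1 << PATH_BITS):
        raise ValueError("dispatch path does not fit in 12 bits")
    if not 0 <= instruction < (1 << INSTRUCTION_BITS):
        raise ValueError("instruction does not fit in 25 bits")
    # Assumed layout: path in bits 36..25, instruction in bits 24..0.
    return (dispatch_path << INSTRUCTION_BITS) | instruction


def unpack_word(word: int) -> tuple[int, int]:
    """Split a 37-bit word back into (dispatch_path, instruction)."""
    instruction = word & ((1 << INSTRUCTION_BITS) - 1)
    dispatch_path = (word >> INSTRUCTION_BITS) & ((1 << PATH_BITS) - 1)
    return dispatch_path, instruction
```

Packing and then unpacking a word returns the original pair, which is a
convenient sanity check when writing an assembler.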
298 \setlength{\bitwidth}{3.5mm}
300 \begin{bytefield}{37}
301 \bitheader[b]{0,24,25,36}\\
302 \bitbox{12}{dispatch path}
303 \bitbox{25}{instruction (external format)}
Note that the 12-bit {\tt dispatch path} field is not
the same width as the 13-bit {\tt Immediate} path field in the {\tt
move} instruction, which in turn may not be the same width as the
actual path latches in the switch fabric. The algorithm for expanding
a path to a wider width is specific to the switch fabric
implementation, and may vary from Fleet to Fleet. For the Marina
experiment, the correct algorithm is to sign-extend the path: the most
significant bit of the given path is used to fill each vacant bit of
the latch. Because the {\tt dispatch path} field is always used to
specify a path which terminates at an instruction destination (never a
data destination), and because instruction destinations ignore the
signal bit, certain optimizations may be possible.
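
The sign-extension rule can be sketched in Python as follows. This is
only an illustration: the function name {\tt sign\_extend} is
hypothetical, and the widths used in the usage note (12 bits widened to
13) are example values rather than the actual Marina latch width.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Widen an unsigned bit field by replicating its most significant bit."""
    if to_bits < from_bits:
        raise ValueError("cannot sign-extend to a narrower width")
    if not 0 <= value < (1 << from_bits):
        raise ValueError("value does not fit in from_bits")
    msb = (value >> (from_bits - 1)) & 1
    if msb:
        # Fill every vacant high-order bit with a copy of the MSB.
        fill = ((1 << (to_bits - from_bits)) - 1) << from_bits
        return value | fill
    return value
```

For example, {\tt sign\_extend(0x800, 12, 13)} yields {\tt 0x1800},
while {\tt sign\_extend(0x7FF, 12, 13)} leaves the value unchanged.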
320 %\subsection{Life Cycle of an Instruction}
322 %The diagram below shows an input dock for purposes of illustration:
325 %\epsfig{file=in,width=4in}\\
332 %\epsfig{file=out,width=4in}\\
333 %{\it an output dock}
336 %\subsection{Format of an Instruction}
338 %All instruction words have the following format:
340 \newcommand{\bitsHeader}{
344 \newcommand{\bitsHeaderNoI}{
350 %The {\tt P} bits are a {\it predicate}; this holds a code which
351 %indicates if the instruction should be executed or ignored depending
352 %on the state of flags in the dock. Note that {\tt head} and {\tt
353 %tail} instructions do not have {\tt P} fields.
356 \subsection{Loop Counters}
358 A programmer can perform two types of loops: {\it inner} loops
359 consisting of only one {\tt move} instruction and {\it outer} loops of
360 multiple instructions of any type. Inner loops may be nested within
361 an outer loop, but no other nesting of loops is allowed.
363 The dock has two loop counters, one for each kind of loop:
366 \item {\tt OLC} is the Outer Loop Counter
367 \item {\tt ILC} is the Inner Loop Counter
370 The {\tt OLC} applies to all instructions and can hold integers {\tt
The {\tt ILC} applies only to {\tt move} instructions and can hold
integers {\tt 0..MAX\_ILC} (63) as well as a special value: $\infty$. When
{\tt ILC=0} the next {\tt move} instruction executes zero times (i.e., it is
ignored). When {\tt ILC=$\infty$} the next {\tt move} instruction
executes until interrupted by a torpedo. After every {\tt move}
instruction the {\tt ILC} is reset to {\tt 1} (note that it is reset
to {\tt 1}, {\it not to 0}).
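
The following Python sketch models these rules for a single {\tt move}
instruction. It is a simplification under stated assumptions: the
function {\tt run\_move} and its {\tt torpedo\_after} parameter are
hypothetical, {\tt math.inf} stands in for the special value $\infty$,
and torpedoes are modelled only for the infinite case.

```python
import math

MAX_ILC = 63
INFINITY = math.inf  # stands in for the special ILC value


def run_move(ilc, torpedo_after=None):
    """Return (times_executed, new_ilc) for one move instruction.

    torpedo_after is a hypothetical knob: the number of executions
    completed before a torpedo strikes (None means no torpedo arrives).
    """
    if ilc != INFINITY and not 0 <= ilc <= MAX_ILC:
        raise ValueError("ILC out of range")
    if ilc == INFINITY:
        if torpedo_after is None:
            raise ValueError("an infinite move only ends when torpedoed")
        executed = torpedo_after
    else:
        # Simplification: torpedoes are ignored for finite counts.
        executed = ilc
    # The ILC is reset to 1 (not 0) after every move instruction.
    return executed, 1
```

Note that {\tt run\_move(0)} reports zero executions yet still leaves
the counter at {\tt 1}, so the following {\tt move} executes once
unless the {\tt ILC} is set again.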
384 The dock has four flags: {\tt A}, {\tt B},
385 {\tt C}, and {\tt D}.
388 \item The {\tt A} and {\tt B} flags are general-purpose flags which
389 may be set and cleared by the programmer.
393 % The {\tt L} flag, known as the {\it last} flag, is set whenever
394 % the value in the outer counter ({\tt OLC}) is one,
397 % that the dock is in the midst of the last iteration of an
398 % outer loop. This flag can be used to perform certain
399 % operations (such as sending a completion token) only on the last
400 % iteration of an outer loop.
402 \item The {\tt C} flag is known as the {\it control} flag, and may be
403 set by the {\tt move} instruction based on information from the
404 ship or from an inbound packet. See the {\tt move} instruction
407 \item The {\tt D} flag is known as the {\it done} flag. The {\tt D}
408 flag is {\it set} when the {\tt OLC} is zero immediately after
409 execution of a {\tt set olc} or {\tt decrement olc} instruction,
410 or when a torpedo strikes. The {\tt D} flag is {\it cleared}
411 when a {\tt set olc} instruction causes the {\tt OLC} to be
412 loaded with a nonzero value.
418 \subsection{Predication}
420 All instructions except for {\tt head} and {\tt tail} have a three-bit
421 field marked {\tt P}, which specifies a {\it predicate}.
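
A minimal Python sketch of how a three-bit predicate code might be
evaluated against the {\tt A}, {\tt B}, and {\tt D} flags, following
the predicate table in this subsection. The function name
{\tt predicate\_allows} is hypothetical, and raising an error for the
unused code {\tt 100} is an assumption; this document does not specify
what hardware does with that code.

```python
def predicate_allows(code: int, a: bool, b: bool, d: bool) -> bool:
    """Return True if an instruction with this predicate code executes."""
    conditions = {
        0b000: lambda: not d and not a,  # D=0 and A=0
        0b001: lambda: not d and a,      # D=0 and A=1
        0b010: lambda: not d and not b,  # D=0 and B=0
        0b011: lambda: not d and b,      # D=0 and B=1
        0b101: lambda: d,                # D=1
        0b110: lambda: not d,            # D=0
        0b111: lambda: True,             # always
    }
    if code not in conditions:
        # Code 100 is unused; rejecting it is an assumption of this sketch.
        raise ValueError("unused or invalid predicate code")
    return conditions[code]()
```

Every code other than {\tt 101} and {\tt 111} requires {\tt D=0}, so
most instructions are skipped once the done flag is set.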
424 \setlength{\bitwidth}{5mm}
426 \begin{bytefield}{25}
427 \bitheader[b]{0,20,21,23-24}\\
The predicate determines which conditions must be true in order for
the instruction to execute; an instruction whose predicate is not
satisfied is simply {\it ignored}. The table below shows what
conditions must be true in order for an instruction to execute:
440 \begin{tabular}{|r|l|}\hline
441 Code & Execute if \\\hline
442 {\tt 000:} & {\tt D=0}\ and {\tt A=0} \\
443 {\tt 001:} & {\tt D=0}\ and {\tt A=1} \\
444 {\tt 010:} & {\tt D=0}\ and {\tt B=0} \\
445 {\tt 011:} & {\tt D=0}\ and {\tt B=1} \\
446 {\tt 100:} & Unused \\
447 {\tt 101:} & {\tt D=1}\ \\
448 {\tt 110:} & {\tt D=0}\ \\
449 {\tt 111:} & always \\
455 \begin{wrapfigure}{r}{40mm}
457 \epsfig{file=requeue,height=1.5in}\\
459 \caption{{\it the requeue stage}}
462 \subsection{The Requeue Stage}
464 The requeue stage has two inputs, which will be referred to as the
465 {\it enqueueing} input and the {\it recirculating} input. It has a
466 single output which feeds into the instruction fifo.
468 The requeue stage has two states: {\sc Updating} and {\sc
471 \subsubsection{The {\sc Updating} State}
473 On initialization, the dock is in the {\sc Updating} state. In this
474 state the requeue stage is performing three tasks:
476 \item it is draining the
477 previous loop's instructions (if any) from the fifo
478 \item it is executing any ``one
479 shot'' instructions which come between the previous loop's {\tt tail}
480 and the next loop's {\tt head}
481 \item it is loading the instructions of
482 the next loop into the fifo.
485 In the {\sc Updating} state, the requeue stage will accept any
486 instruction other than a {\tt tail} which arrives at its {\it
487 enqueueing} input, and pass this instruction to its output. Any
488 instruction other than a {\tt head} which arrives at the {\it
489 recirculating} input will be discarded.
491 Note that when a {\tt tail} instruction arrives at the {\it
492 enqueueing} input, it ``gets stuck'' there. Likewise, when a {\tt
493 head} instruction arrives at the {\it recirculating} input, it also
494 ``gets stuck''. When the requeue stage finds {\it both} a {\tt tail}
495 instruction stuck at the {\it enqueueing} input and a {\tt head}
496 instruction stuck at the {\it recirculating} input, the requeue stage
497 discards both the {\tt head} and {\tt tail} and transitions to the
498 {\sc Circulating} state.
500 \subsubsection{The {\sc Circulating} State}
502 In the {\sc Circulating} state, the dock repeatedly executes the set
503 of instructions that are in the instruction fifo.
505 In the {\sc Circulating} state, the requeue stage will not accept
506 items from its {\it enqueueing} input. Any item presented at the {\it
507 recirculating} input will be passed through to the requeue stage's
510 When an {\tt abort} instruction is executed, the requeue stage
511 transitions back to the {\sc Updating} state. Note that {\tt abort}
512 instructions include a predicate; an {\tt abort} instruction whose
513 predicate is not met will not cause this transition.
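
The behavior of the requeue stage described in this section can be
sketched as a small state machine in Python. This is an illustrative
model only: the class and method names are hypothetical, instructions
are represented by their mnemonics, and the choice to handle at most
one instruction per step (draining before loading) is an assumption of
the sketch, not a statement about the circuit.

```python
from enum import Enum


class State(Enum):
    UPDATING = "updating"
    CIRCULATING = "circulating"


class RequeueStage:
    def __init__(self):
        self.state = State.UPDATING  # state on initialization

    def step(self, enqueueing, recirculating):
        """Handle the mnemonics waiting at each input (or None).

        Returns (output, took_enqueueing, took_recirculating).
        """
        if self.state is State.CIRCULATING:
            # Enqueueing input is blocked; recirculating items pass through.
            return recirculating, False, recirculating is not None
        if enqueueing == "tail" and recirculating == "head":
            # Both stuck instructions are discarded and the loop begins.
            self.state = State.CIRCULATING
            return None, True, True
        if recirculating is not None and recirculating != "head":
            return None, False, True  # discard the previous loop's instruction
        if enqueueing is not None and enqueueing != "tail":
            return enqueueing, True, False  # pass the instruction to the fifo
        return None, False, False  # head and/or tail stuck, nothing moves

    def abort_executed(self):
        """An abort whose predicate was met returns the stage to Updating."""
        self.state = State.UPDATING
```

In this model a {\tt tail} presented alone at the enqueueing input
simply waits until a {\tt head} reaches the recirculating input.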
519 \section{Instructions}
521 %The dock supports four instructions:
522 %{\tt move} (variants: {\tt moveto}, {\tt dispatch}),
529 \subsection{{\tt move}}
531 \newcommand{\bitsMove}{\setlength{\bitwidth}{5mm}
533 \begin{bytefield}{25}
534 \bitheader[b]{14-20}\\
548 \begin{bytefield}{25}
549 \bitheader[b]{0,12,13}\\
550 \bitbox[1]{10}{\raggedleft {\tt moveto} ({\tt Immediate\to Path})}
553 \bitbox{13}{\tt Immediate}
556 \begin{bytefield}{25}
557 \bitheader[b]{11,12,13}\\
558 \bitbox[1]{10}{\raggedleft {\tt dispatch} ({\footnotesize {\tt DataPredecessor[37:26\color{black}]\to Path}})\ \ }
567 \begin{bytefield}{25}
568 \bitheader[b]{11,12,13}\\
569 \bitbox[1]{10}{\raggedleft {\tt move} ({\tt Path} unchanged):}
580 \item {\tt Ti} - Token Input: wait for the token predecessor to be full and drain it.
581 \item {\tt Di} - Data Input: wait for the data predecessor to be full and drain it.
582 \item {\tt Dc} - Data Capture: pulse the data latch.
583 \item {\tt Do} - Data Output: fill the data successor.
584 \item {\tt To} - Token Output: fill the token successor.
587 The data successor and token successor must both be empty in order for
588 a {\tt move} instruction to attempt execution.
The {\tt I} bit stands for {\tt Immune}, and indicates whether the
instruction is immune to torpedoes.
593 Every time the {\tt move} instruction executes, the {\tt C} flag is
597 \item If the dock is an {\it output} and the instruction has the {\tt
598 Dc} bit set, the {\tt C} flag is set to a value provided by the
601 \item Otherwise, if {\tt Ti=1} at any kind of dock or {\tt Di=1} at an
602 input dock, the {\tt C} flag is set to the signal bit of the
\item Otherwise, the {\tt C} flag is set to an undefined value.
610 The {\tt flush} instruction is a variant of {\tt move} which is valid
611 only at input docks. It has the same effect as {\tt deliver}, except
612 that it sets a special ``flushing'' indicator along with the data
615 \newcommand{\bitsFlush}{\setlength{\bitwidth}{5mm}
617 \begin{bytefield}{25}
618 \bitheader[b]{14-18}\\
619 \bitbox[r]{6}{\raggedleft{\tt flush\ \ }}
631 When a ship fires, it must examine the ``flushing'' indicators on the
632 input docks whose fullness was part of the firing condition. If all
633 of the input docks' flushing indicators are set, the ship must drain
634 all of their data successors and take no action. If some, but not
635 all, of the indicators are set, the ship must drain {\it only the data
636 successors of the docks whose indicators were {\bf not} set}, and
637 take no action. If none of the flushing indicators was set, the ship
644 \subsection{{\tt set}}
The {\tt set} instruction is used to load the inner loop counter, the
outer loop counter, the data latch, or the flags, and to decrement the
outer loop counter.
649 \newcommand{\bitsSet}{
651 \begin{bytefield}{25}
652 \bitheader[b]{19-24}\\
663 \begin{bytefield}{25}
664 \bitheader[b]{0,5,12-18}\\
665 \bitbox[1]{5}{\raggedleft {\tt Immediate}\to{\tt OLC}}
667 \bitbox{4}{\tt 1000\color{black}}
670 \bitbox{6}{\tt Immediate}
673 \begin{bytefield}{25}
674 \bitheader[b]{12-18}\\
675 \bitbox[1]{5}{\raggedleft {\tt Data Latch}\to{\tt OLC}}
677 \bitbox{4}{\tt 1000\color{black}}
682 \begin{bytefield}{25}
683 \bitheader[b]{12-18}\\
684 \bitbox[1]{5}{\raggedleft {\tt OLC-1}\to{\tt OLC}}
686 \bitbox{4}{\tt 1000\color{black}}
691 \begin{bytefield}{25}
692 \bitheader[b]{0,5,6,12-18}\\
693 \bitbox[1]{5}{\raggedleft {\tt Immediate}\to{\tt ILC}}
695 \bitbox{4}{\tt 0100\color{black}}
699 \bitbox{6}{\tt Immediate}
702 \begin{bytefield}{25}
703 \bitheader[b]{6,12-18}\\
704 \bitbox[1]{5}{\raggedleft $\infty$\to{\tt ILC}}
706 \bitbox{4}{\tt 0100\color{black}}
713 \begin{bytefield}{25}
714 \bitheader[b]{12-18}\\
715 \bitbox[1]{5}{\raggedleft {\tt Data Latch}\to{\tt ILC}}
717 \bitbox{4}{\tt 0100\color{black}}
722 \begin{bytefield}{25}
723 \bitheader[b]{0,13-18}\\
724 \bitbox[1]{5}{\raggedleft \footnotesize {\tt Sign-Extended Immediate}\to{\tt Data Latch}}
726 \bitbox{4}{\tt 0010\color{black}}
727 \bitbox{1}{\begin{minipage}{0.5cm}{
734 \bitbox{14}{\tt Immediate}
737 \begin{bytefield}{25}
738 \bitheader[b]{0,5,6,11,15-18}\\
739 \bitbox[1]{5}{\raggedleft {\tt Update Flags}}
741 \bitbox{4}{\tt 0001\color{black}}
743 \bitbox{6}{\tt nextA}
744 \bitbox{6}{\tt nextB}
750 The Marina implementation has an unarchitected
751 ``literal latch'' at the on deck ({\tt OD}) stage, which is loaded
752 with the possibly-extended literal {\it at the time that the {\tt set}
753 instruction comes on deck}. This latch is then copied into the data
754 latch when a {\tt set Data Latch} instruction
757 The {\tt Sign-Extended Immediate} instruction copies the {\tt
758 Immediate} field into the least significant bits of the data latch.
759 All other bits of the data latch are filled with a copy of the
760 bit marked ``{\tt Sign}.''
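
As a sketch of this fill rule, the Python function below computes the
resulting data latch value. It assumes a 37-bit data latch (one machine
word) and treats the 14-bit {\tt Immediate} field and the {\tt Sign}
bit as separate inputs; the name {\tt set\_data\_latch} is
hypothetical.

```python
WORD_BITS = 37       # assumed data latch width (one machine word)
IMMEDIATE_BITS = 14  # width of the Immediate field


def set_data_latch(immediate: int, sign: int) -> int:
    """Return the data latch value produced by the immediate and Sign bit."""
    if not 0 <= immediate < (1 << IMMEDIATE_BITS):
        raise ValueError("immediate does not fit in 14 bits")
    if sign not in (0, 1):
        raise ValueError("sign must be 0 or 1")
    # Mask covering every latch bit above the immediate field.
    upper = ((1 << (WORD_BITS - IMMEDIATE_BITS)) - 1) << IMMEDIATE_BITS
    return immediate | (upper if sign else 0)
```

With {\tt sign=1} and an all-ones immediate the latch becomes all ones,
which corresponds to $-1$ in two's complement.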
763 Each of the {\tt nextA} and {\tt nextB} fields has the following
764 structure, and indicates which old flag values should be logically
765 {\tt OR}ed together to produce the new flag value:
771 \bitbox{1}{${\text{\tt A}}$}
772 \bitbox{1}{$\overline{\text{\tt A}}$}
773 \bitbox{1}{${\text{\tt B}}$}
774 \bitbox{1}{$\overline{\text{\tt B}}$}
775 \bitbox{1}{${\text{{\tt C}\ }}$}
776 \bitbox{1}{$\overline{\text{{\tt C}\ }}$}
780 Each bit corresponds to one possible input; all inputs whose bits are
781 set are {\tt OR}ed together, and the resulting value is assigned to
782 the flag. Note that if none of the bits are set, the value assigned
783 is zero. Note also that it is possible to produce a {\tt 1} by {\tt
784 OR}ing any flag with its complement, and that {\tt set Flags} can
785 be used to create a {\tt nop} (no-op) by setting each flag to itself.
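
The flag update can be sketched in Python as below. The bit order is an
assumption: the six selector bits are taken, most significant first, to
select {\tt A}, $\overline{\text{\tt A}}$, {\tt B},
$\overline{\text{\tt B}}$, {\tt C}, and $\overline{\text{\tt C}}$. The
function name {\tt next\_flag} is hypothetical.

```python
def next_flag(mask: int, a: bool, b: bool, c: bool) -> bool:
    """OR together the old flag values selected by a six-bit mask."""
    if not 0 <= mask < 64:
        raise ValueError("mask must fit in six bits")
    # Assumed order, most significant bit first: A, ~A, B, ~B, C, ~C.
    inputs = [a, not a, b, not b, c, not c]
    result = False
    for position, value in enumerate(inputs):
        selected = (mask >> (5 - position)) & 1
        if selected and value:
            result = True
    return result
```

Under this assumed order, the mask {\tt 110000} always produces
{\tt 1}, the mask {\tt 000000} always produces {\tt 0}, and the mask
{\tt 100000} applied to {\tt nextA} leaves the {\tt A} flag unchanged.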
791 \subsection{{\tt shift}}
793 \newcommand{\shiftImmediateSize}{19}
795 Each {\tt shift} instruction carries an immediate of \shiftImmediateSize\
796 bits. When a {\tt shift} instruction is executed, this immediate is copied
797 into the least significant \shiftImmediateSize\ bits of the data latch,
798 and the remaining most significant bits of the data latch are loaded
799 with the value formerly in the least significant bits of the data latch.
800 In this manner, large literals can be built up by ``shifting'' them
801 into the data latch \shiftImmediateSize\ bits at a time.
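
The effect of a {\tt shift} on the data latch can be sketched in Python
as follows, assuming a 37-bit data latch; the function name
{\tt shift} and the constants are illustrative only.

```python
WORD_BITS = 37             # assumed data latch width (one machine word)
SHIFT_IMMEDIATE_BITS = 19  # width of the shift immediate


def shift(data_latch: int, immediate: int) -> int:
    """Shift the latch left by 19 bits and insert the immediate below."""
    if not 0 <= data_latch < (1 << WORD_BITS):
        raise ValueError("data latch value does not fit in 37 bits")
    if not 0 <= immediate < (1 << SHIFT_IMMEDIATE_BITS):
        raise ValueError("immediate does not fit in 19 bits")
    mask = (1 << WORD_BITS) - 1
    # Former low-order bits move up; bits shifted past the top are lost.
    return ((data_latch << SHIFT_IMMEDIATE_BITS) | immediate) & mask
```

Under this assumption two consecutive {\tt shift} instructions suffice
to fill the whole latch: the first supplies the upper 18 bits and the
second supplies the lower 19.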
803 \newcommand{\bitsShift}{
804 \setlength{\bitwidth}{5mm}
806 \begin{bytefield}{25}
807 \bitheader[b]{0,18-20}\\
814 \bitbox{\shiftImmediateSize}{Immediate}
819 The Marina implementation has an unarchitected
820 ``literal latch'' at the on deck ({\tt OD}) stage, which is loaded
821 with the literal {\it at the time that the {\tt shift} instruction
822 comes on deck}. This latch is then copied into the data latch when
823 the instruction executes.
827 \subsection{{\tt abort}}
828 \newcommand{\bitsAbort}{\setlength{\bitwidth}{5mm}
830 \begin{bytefield}{25}
831 \bitheader[b]{17-20}\\
844 An {\tt abort} instruction causes a loop to exit; see the section on
845 the Requeue Stage for further details.
847 \subsection{{\tt head}}
848 \newcommand{\bitsHead}{
849 \setlength{\bitwidth}{5mm}
851 \begin{bytefield}{25}
852 \bitheader[b]{17-20}\\
865 A {\tt head} instruction marks the start of a loop; see the section on
866 the Requeue Stage for further details.
869 \subsection{{\tt tail}}
870 \newcommand{\bitsTail}{
871 \setlength{\bitwidth}{5mm}
873 \begin{bytefield}{25}
874 \bitheader[b]{17-20}\\
887 A {\tt tail} instruction marks the end of a loop; see the section on
888 the Requeue Stage for further details.
892 %\subsection{{\tt takeOuterLoopCounter}}
894 %\setlength{\bitwidth}{5mm}
896 %\begin{bytefield}{25}
897 % \bitheader[b]{16-19,21}\\
911 %This instruction copies the value in the outer loop counter {\tt OLC}
912 %into the least significant bits of the data latch and leaves all other
913 %bits of the data latch unchanged.
915 %\subsection{{\tt takeInnerLoopCounter}}
917 %\setlength{\bitwidth}{5mm}
919 %\begin{bytefield}{25}
920 % \bitheader[b]{16-19,21}\\
934 %This instruction copies the value in the inner loop counter {\tt ILC}
935 %into the least significant bits of the data latch and leaves all other
936 %bits of the data latch unchanged.
941 %%\subsection{{\tt interrupt}}
943 %%\setlength{\bitwidth}{5mm}
945 %\begin{bytefield}{25}
946 % \bitheader[b]{0,5,16-19,21}\\
957 %When an {\tt interrupt} instruction reaches {\tt IH}, it will wait
958 %there for the {\tt OD} stage to be full with an instruction that has
959 %the {\tt IM} bit set. When this occurs, the instruction at {\tt OD}
960 %{\it will not execute}, but {\it may reloop} if the conditions for
962 %\footnote{The ability to interrupt an instruction yet have it reloop is very
963 %useful for processing chunks of data with a fixed size header and/or
964 %footer and a variable length body.}
967 %\subsection{{\tt massacre}}
969 %\setlength{\bitwidth}{5mm}
971 %\begin{bytefield}{25}
972 % \bitheader[b]{16-19,21}\\
984 %When a {\tt massacre} instruction reaches {\tt IH}, it will wait there
985 %for the {\tt OD} stage to be full with an instruction that has the
986 %{\tt IM} bit set. When this occurs, all instructions in the
987 %instruction fifo (including {\tt OD}) are retired.
989 %\subsection{{\tt clog}}
991 %\setlength{\bitwidth}{5mm}
993 %\begin{bytefield}{25}
994 % \bitheader[b]{16-19,21}\\
1002 % \bitbox[tbr]{16}{}
1006 %When a {\tt clog} instruction reaches {\tt OD}, it remains there and
1007 %no more instructions will be executed until an {\tt unclog} is
1010 %\subsection{{\tt unclog}}
1012 %\setlength{\bitwidth}{5mm}
1014 %\begin{bytefield}{25}
1015 % \bitheader[b]{16-19,21}\\
1021 % \bitbox[lrtb]{2}{11}
1023 % \bitbox[tbr]{16}{}
1027 %When an {\tt unclog} instruction reaches {\tt IH}, it will wait there
1028 %until a {\tt clog} instruction is at {\tt OD}. When this occurs, both
1029 %instructions retire.
1031 %Note that issuing an {\tt unclog} instruction to a dock which is not
1032 %clogged and whose instruction fifo contains no {\tt clog} instructions
1033 %will cause the dock to deadlock.
1038 The following additional restrictions have been imposed on the dock in
1039 the Marina test chip:
1041 \subsection*{Both Docks}
1046 A Marina dock initializes with the {\tt ILC}, {\tt OLC}, and flags in
1047 an indeterminate state.
1050 The instruction immediately after a {\tt move} instruction must not be
1051 a {\tt set flags} instruction which utilizes the {\tt C}-flag (the
1052 value of the {\tt C}-flag is not stable for a brief time after a {\tt
If a {\tt move} instruction is torpedoable (i.e., it has the {\tt I} bit
set to {\tt 0}), it {\it must} have either the {\tt Ti} bit or {\tt
Di} bit set (or both). It is not permitted for a torpedoable {\tt
move} to have both bits cleared.
1064 \subsection*{Dock with Ivan's Counter (non-stretch)}
1070 A torpedoable {\tt move} instruction must not be followed immediately
1071 by a {\tt set olc} instruction or another torpedoable {\tt move}.
1075 This document specifies that when a torpedoable {\tt move} instruction
1076 executes successfully, the {\tt D} flag is unchanged. In Marina, when
1077 a torpedoable {\tt move} instruction executes successfully, it causes
1078 the {\tt D} flag to be set if the {\tt OLC} was zero and causes it to
1079 be cleared if the {\tt OLC} was nonzero. Thus, in the following
1080 instruction sequence:
1085 send token to self:i;
1087 [*] send token to self;
the {\tt D} flag will be left {\it set} on Marina, whereas a strict
implementation of this document would leave it cleared.
1096 In practice, this distinction rarely matters.
1100 \subsection*{Dock with Kessels Counter (``stretch'')}
With the Kessels counter, the {\tt D}-flag is set {\it if and only if}
the {\tt OLC} is zero; it cannot be ``out of sync'' with it.
1108 Every ``load OLC'' instruction must be predicated on the {\tt D}-flag
1109 being {\it set}. This is a sneaky way of forcing the programmer to
1110 ``run down'' the counter before loading it, because Kessels' counter
1111 does not support ``unloading.''
1114 Every ``decrement OLC'' instruction must be predicated on the {\tt
1115 D}-flag being {\it cleared}. This way we never have to check if the
1116 counter is already empty before decrementing.
The instruction after a torpedoable {\tt move} must not be predicated
on the {\tt D}-flag being {\it set} (it may be predicated on the {\tt
D}-flag being {\it cleared}). This is because, while the {\tt move}
instruction is waiting to execute, the {\tt D}-flag will be cleared,
and the predicate stage believes that it can skip the instruction even
though {\tt do[ins]} is still high (I think this is dumb).
1132 \section*{External Instruction Encoding Map\color{black}}
1135 \vspace{3mm}\hspace{-1cm}{\tt shift}\hspace{1cm}\vspace{-6mm}\\
1138 \vspace{3mm}\hspace{-1cm}{\tt set}\hspace{1cm}\vspace{-6mm}\\
1141 \vspace{3mm}\hspace{-1cm}{\tt move}\hspace{1cm}\vspace{-6mm}\\
1145 \vspace{3mm}\hspace{-1cm}{\tt abort}\hspace{1cm}\vspace{-6mm}\\
1148 \vspace{3mm}\hspace{-1cm}{\tt head}\hspace{1cm}\vspace{-6mm}\\
1151 \vspace{3mm}\hspace{-1cm}{\tt tail}\hspace{1cm}\vspace{-6mm}\\
1156 %\epsfig{file=all,height=5in,angle=90}
1159 %\subsection*{Input Dock}
1160 %\epsfig{file=in,width=8in,angle=90}
1163 %\subsection*{Output Dock}
1164 %\epsfig{file=out,width=8in,angle=90}
1168 %\epsfig{file=ports,height=5in,angle=90}
1171 %\epsfig{file=best,height=5in,angle=90}
1174 \section*{Internal Instruction Encoding Map\color{black}}
Marina instructions in main memory occupy 37 bits. Of these, 12 bits
give the path to the dock which is to execute the instruction; thus,
only 25 of these bits are interpreted by the dock.

It is easiest to design the OD and EX stages of the dock if the
control bits supplied there are mostly one-hot encoded. Moreover, due
to layout considerations there is very little cost associated with
making the instruction fifo 36 bits wide rather than 25 bits wide.

Due to these two considerations, all 25-bit
binary-coded-control instructions are expanded into 36-bit
unary-coded-control instructions upon entry to the instruction fifo.
This section documents the 36-bit unary-coded-control format.
1190 \subsection*{Predicate Field}
1192 The {\tt Predicate} field, common to many instructions, consists of a
1193 six-bit wide, one-hot encoded field. The instruction will be {\bf
1194 skipped} (not executed) if {\bf any} condition corresponding to a
1195 bit whose value is one is met.
1197 \setlength{\bitwidth}{3.5mm}
1198 {\footnotesize\tt\begin{bytefield}{36}
1199 \bitheader[b]{0,29-35}\\
1211 For example, if bits 31 and 34 are set, the instruction will be
1212 skipped if either the {\tt B} flag is cleared or the {\tt A} flag is
1213 set. Equivalently, it will be executed iff the {\tt B} flag is set
1214 and the {\tt A} flag is cleared.
1216 \subsection*{Set Flags}
1218 Each of the {\tt FlagA} and {\tt FlagB} fields in the Set Flags
1219 instruction gives a truth table; the new value of the flag is the
1220 logical OR of the inputs whose bits are set to {\tt 1}.
1222 \setlength{\bitwidth}{5mm}
1223 {\tt\begin{bytefield}{6}
1224 \bitheader[b]{0-5}\\
1234 \newcommand{\common}{%
1235 \bitbox{6}{Predicate}%
1244 \oddsidemargin 0.9in
1247 \begin{sidewaysfigure}[h!]
1250 \setlength{\bitwidth}{5mm}
1254 {\tt\begin{bytefield}{36}
1255 \bitheader[b]{0,18,19,21-30,35}\\
1270 \bitbox{19}{immediate}
1273 {\tt\begin{bytefield}{36}
1274 \bitheader[b]{0,13,14,15,21-30,35}\\
1293 \bitbox{14}{immediate to sign ext}
1294 \end{bytefield}} \\\hline
1296 Move, Immediate$\rightarrow$Path &
1297 {\tt\begin{bytefield}{36}
1298 \bitheader[b]{0,13,14-20,21-30,35}\\
1321 \bitbox{13}{Immediate}
1323 Move, DP[37:26]$\rightarrow$Path &
1324 {\tt\begin{bytefield}{36}
1325 \bitheader[b]{0,12-13,14-20,21-30,35}\\
1352 Move, Path unchanged &
1353 {\tt\begin{bytefield}{36}
1354 \bitheader[b]{0,11-13,14-20,21-30,35}\\
1378 \bitbox{1}{F$\dagger$}
1386 {\tt\begin{bytefield}{36}
1387 \bitheader[b]{0,11,12,21-30,35}\\
1403 \end{bytefield}} \\\hline
1406 {\tt\begin{bytefield}{36}
1407 \bitheader[b]{0,20-30,35}\\
1426 {\tt\begin{bytefield}{36}
1427 \bitheader[b]{0,19-30,35}\\
1449 {\tt\begin{bytefield}{36}
1450 \bitheader[b]{0,5,19-30,35}\\
1470 \bitbox{6}{Immediate}
1471 \end{bytefield}} \\\hline
1474 {\tt\begin{bytefield}{36}
1475 \bitheader[b]{0,19,21-30,35}\\
1495 {\tt\begin{bytefield}{36}
1496 \bitheader[b]{0,5,7,19,21-30,35}\\
1516 \bitbox{1}{0${}^\star$}
1520 \bitbox{6}{Immediate}
1523 {\tt\begin{bytefield}{36}
1524 \bitheader[b]{0,7,21-30,35}\\
1543 \bitbox{1}{1${}^\star$}
1548 \end{bytefield}} \\\hline
1551 {\tt\begin{bytefield}{36}
1563 {\tt\begin{bytefield}{36}
1566 \bitbox{6}{Predicate}
1592 {\tt\begin{bytefield}{36}
1605 $\star$ -- Bit 8 is the ``infinity'' bit \\
1606 $\dagger$ -- When a ``Move, Path unchanged'' is performed, bit 12 is copied to the ``flushing latch''. \\
1607 .\hspace{0.5cm} When a ship fires, it examines the ``flushing latches'' of all of its inboxes as part of its decision about what to do. \\
1608 $1$ -- The encoding of the {\tt abort} instruction was chosen in order to make it look like a {\tt set flags} instruction which does not change the flags. \\
1609 Tp\ \ = Torpedoable (1=Torpedoable, 0=Not-Torpedoable) \\
1610 rD\ \ = recompute D-flag (1=recompute, 0=leave unchanged)
1612 \end{sidewaysfigure}
1615 \section*{Marina Dock Block Diagram}
1616 This diagram was produced by Ivan Sutherland.
1618 \epsfig{file=blockDiagram,width=8in,angle=90}