-\documentclass[10pt]{book}
+\documentclass[10pt,oneside]{article}
+\reversemarginpar
+\usepackage[titles]{tocloft}
+\usepackage{emp}
\usepackage{amsmath}
+\DeclareGraphicsRule{*}{mps}{*}{}
+
+\newcommand{\footnoteremember}[2]{
+ \footnote{#2}
+ \newcounter{#1}
+ \setcounter{#1}{\value{footnote}}
+} \newcommand{\footnoterecall}[1]{
+ \footnotemark[\value{#1}]
+}
+
+\ifx\pdftexversion\undefined
+\usepackage[dvips]{graphicx}
+\DeclareGraphicsExtensions{.eps}\else
+\usepackage[pdftex]{graphicx}
+\DeclareGraphicsExtensions{.pdf,.png,.mps,.mp}
+\usepackage{epstopdf}
+\fi
\usepackage{palatino}
+\usepackage{wrapfig}
+\usepackage{epsfig}
+\usepackage{verbatim}
\usepackage{parskip}
\usepackage{register}
\usepackage{bytefield}
+\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref}
\renewcommand{\ttdefault}{cmtt}
-\title{The FleetTwo Architecture Manual}
+\title{The FleetTwo Architecture Manual\\{\normalsize A Programmer's View of FleetTwo}}
\begin{document}
\maketitle
+\begin{thebibliography}{[GDG01]}
+\bibitem[IES02]{ies02}
+ Sutherland, Ivan,
+ \href{http://research.cs.berkeley.edu/class/fleet/docs/people/ivan.e.sutherland/ies02-FLEET-A.Once.Instruction.Computer.pdf}{\it UCB-IES02: Fleet -- A One Instruction Computer}
+\bibitem[IES14]{ies14}
+ Sutherland, Ivan,
+ \href{http://research.cs.berkeley.edu/class/fleet/docs/people/ivan.e.sutherland/ies14-Fleet.Definition.pdf}{\it UCB-IES14: The Fleet Definition}
+\bibitem[AM17]{am17}
+ Megacz, Adam,
+ \href{http://research.cs.berkeley.edu/class/fleet/docs/people/adam.megacz/am17-The.Choice.Ship.pdf}{\it UCB-AM17: The Choice Ship}
+\bibitem[IES44]{ies44}
+ Sutherland, Ivan,
+ \href{http://research.cs.berkeley.edu/class/fleet/docs/people/ivan.e.sutherland/ies44-Fleet.Definition.pdf}{\it UCB-IES44: The Fleet Definition}
+\bibitem[AM25]{am25}
+ Megacz, Adam,
+ \href{http://research.cs.berkeley.edu/class/fleet/docs/people/adam.megacz/am25.pdf}{\it UCB-AM25: Opcode Ports}
+\bibitem[AM27]{am27}
+ Megacz, Adam,
+ \href{http://research.cs.berkeley.edu/class/fleet/docs/people/adam.megacz/am27.pdf}{\it UCB-AM27: Bypass Paths}
+\bibitem[IES31]{ies31}
+ Sutherland, Ivan,
+ \href{http://research.cs.berkeley.edu/class/fleet/docs/people/ivan.e.sutherland/ies31-Some.SHIPs.pdf}{\it UCB-IES31: Some Ships}
+\bibitem[MS68]{WheelOfReincarnation}
+ Myer, T. and Sutherland, Ivan, {\it On the Design of Display Processors}
+ Communications of the ACM, Vol 11, No. 6, June 1968 .
+\end{thebibliography}
+
+\vspace{1cm}
+\begin{abstract}
+This manual attempts to detail all those aspects of Fleet which are
+visible to a software programmer or compiler backend.
+
+This document assumes that the reader is broadly familiar with the
+Fleet architecture and the motivation behind it. For a far better
+introduction to these topics, the reader is referred to \cite{ies02}
+\cite{ies14} and \cite{ies44}. Please bear in mind that many of the
+details in these documents have been changed, but they still serve as
+the authoritative introduction to the architecture. After reviewing
+these introductory papers, the reader is invited to digest this
+architecture manual for the latest information on the details
+necessary to write software and compilers for Fleet.
+\end{abstract}
+
+
+%\vfill
+%\fbox{\parbox{5in}{\footnotesize
+%\begin{center}
+%The software described in this memo may be obtained using
+%\href{http://www.darcs.net/}{\tt darcs}, with the command:
+%
+%{\tt darcs get http://research.cs.berkeley.edu/class/fleet/repos/fleet/}
+%\end{center}
+%}}
+
+\pagebreak
+\tableofcontents
+
\pagebreak
-\section*{Data Formats}
+\section{Introduction (from \cite{ies02})}
+
+During the past two and a half years, the Fleet architecture has gone
+through massive changes and improvements. Nonetheless, the
+fundamental vision remains unchanged. As Sutherland writes in the
+original Berkeley Fleet description \cite{ies02},
+
+{\it
+When computers were new, logic and storage were expensive and wires
+were relatively cheap. Early computer designers avoided using logic
+wherever possible but were not greatly concerned with communication.
+They made design choices consistent with the costs of the day. My
+favorite example is the jump instruction. Early designers put jump
+instructions at the end of each block of code to avoid the expense of
+storing the address of the next block while executing the present one.
+
+Today's chip fabrication methods invert the older cost balance. In
+today's integrated circuits logic and memory are now almost free but
+communication costs dominate chip area, energy consumption and delay.
+In spite of these changes in the stuff from which we make computers,
+vestiges of the past remain in many of today's common microprocessor
+designs. For example, jump instructions still appear at the end of
+each basic block of code in spite of the need to pre-fetch the next
+block while executing this one. It seems better today to store a
+pointer to the next block and the length of the current block early in
+each block of code.
+
+Instead of following the path of history, I'd like to listen carefully
+to what modern chip structures have to teach about how one might build
+a modern computer. I see three major lessons. First, simplicity can
+reduce cost. Second, moving data will consume most of the time,
+energy and chip area of a modern computer. And third, the low cost of
+logic makes concurrency available if we can figure out how to use it.
+
+\begin{itemize}
+
+\item {\bf Simplicity}: The Fleet architecture seeks simplicity by treating
+ all processing devices alike...
-\subsection*{Packet Destination Address (12 bits)}
+\item {\bf Communication}: The Fleet architecture seeks to control the
+ cost of communication by putting it under direct programmer
+ control. Thus Fleet avoids instructions like {\sc ADD} or {\sc STORE} that
+ include concealed communication to and from a register file...
+
+\item {\bf Concurrency}: The Fleet architecture assumes concurrency
+ nearly everywhere...
+
+\item {\bf Asynchrony}: This provides enormous flexibility for
+ implementers to improve the performance or reduce the cost of
+ the processing devices in a Fleet system. -- Synchronous
+ implementations both of ships and of the switch fabric are
+ possible provided they use validity or occupancy bits to achieve
+ the arbitrary delays required at sources and destinations...
+
+\end{itemize}
+}
+
+
+\pagebreak
+\section{The Programmer's View of The Ship-Fabric Interface}
+
+The role of the Fleet switch fabric is to carry {\it packets} from
+{\it sources} to {\it destinations}.
+
+A packet consists of a {\it path} and a {\it payload}. The path is
+some string of bits which tells the switch fabric how to route from
+the desired source to the desired destination. The distinction
+between a {\it path} and a {\it destination} is extremely important!
+The payload is either a {\it token} or a {\it data item}. If the
+payload is a data item, it includes one machine word of data.
+
+The diagram below represents a {\it programmer's} conceptual view of
+the interface between ships and the switch fabric. Actual
+implementation circuitry may differ substantially. Sources and
+destinations which can send and receive only tokens -- not data items
+-- are drawn as dashed lines.
+
+\vspace{0.3cm}
+\begin{center}
+\epsfig{file=ports,width=4in}
+\end{center}
+\vspace{0.3cm}
+
+The term {\it port} refers to an interface to the ship, the {\it
+ dock} connecting it to the switch fabric, and the corresponding
+sources and destinations on the switch fabric.
+
+Each dock consists of a {\it data latch}, which is as wide as a
+single machine word and a {\it pump}, which is a circular fifo of
+instruction-width latches. The values in the instruction fifo
+control the data latch, as future chapters will explain.
+
+Note that the pump in each dock has a destination of its own, and
+that there is no buffering fifo guarding this destination. Note that
+all other destinations are guarded by a buffering fifo; the size of
+this fifo is exposed to the software programmer so she can avoid
+deadlock.
-These bits appear physically within the switch fabric, and have
-``address bit timing.'' The {\tt T} bit is the ``tokenhood'' bit; if
-set, this packet represents a token and it does not cause the switch
-fabric data latches to fire.
+
+\pagebreak
+\section{Data Formats}
+
+\subsection{Path (12 bits)}
+
+These bits appear physically within the switch fabric. \footnote{In
+ the Sun Labs implementation, these bits have ``address bit
+ timing.''} The {\tt T} bit is the ``tokenhood'' bit; if set, this
+packet represents a token\footnote{In the Sun Labs
+ implementation, this bit inhibits the switch fabric data latches
+ from firing.}
{\tt\footnotesize
\begin{bytefield}{49}
\bitheader[b]{37,47,48}\\
\bitbox{1}{T}
- \bitbox{11}{Destination Address}
+ \bitbox{11}{Path}
\bitbox[l]{37}{}
\end{bytefield}
}
-\subsection*{Data Word In Memory (37 bits)}
+\subsection{Data Word In Memory (37 bits)}
-A word of memory is 37 bits wide. For convenience, we assume that the
-memory word width is also the width of a pointer as well as the width
-of all on-chip data item registers.
+A word of data is 37 bits wide.
{\tt\footnotesize
\begin{bytefield}{49}
\end{bytefield}
}
-\subsection*{Data Packet In Flight (49 bits)}
-
-A {\it data packet} is a data item in the switch fabric, on its way to
-some destination.
+\subsection{Packet In Flight (49 bits)}
{\tt\footnotesize
\begin{bytefield}{49}
\bitheader[b]{0,36,37,47,48}\\
\bitbox{1}{T}
- \bitbox{11}{Destination Address}
+ \bitbox{11}{Path}
\bitbox{37}{Data Word}
\end{bytefield}
}
-\subsection*{Instruction In Memory (37 bits)}
+\subsection{Instruction In Memory (37 bits)}
An instruction must be no wider than a memory word. The next section
-explains the bits in greater detail.
+explains the bits in greater detail. The {\tt Instruction Path} is
+the path which the instruction itself will take in order to arrive at
+the desired pump destination. The {\it Data/Token Path} is placed on
+any packet inserted into the switch fabric as a result of executing
+the instruction.
{\tt\tiny
\begin{bytefield}{49}
- \bitheader[b]{0,10,11,17,18-26,36}\\
+ \bitheader[b]{0,6,7,17,18-26,36}\\
\bitbox[r]{12}{}
- \bitbox{11}{Instruction Register Address}
- \bitbox{1}{K}
- \bitbox{1}{L}
+ \bitbox{11}{Instruction Path}
+ \bitbox{1}{1}
\bitbox{1}{Ti}
\bitbox{1}{Di}
- \bitbox{1}{Ld}
+ \bitbox{1}{Dc}
\bitbox{1}{Do}
\bitbox{1}{To}
+ \bitbox{1}{Ig}
\bitbox{1}{Rq}
+ \bitbox{11}{Data/Token Path}
\bitbox{7}{Count}
- \bitbox{11}{Data/Token Destination}
\end{bytefield}
}
-\subsection*{Instruction Packet In Flight (49 bits)}
+\subsection{Instruction Packet In Flight (49 bits)}
-A {\it instruction packet} is an instruction in the instruction horn
-(which may or may not be the same thing as the data horn), on its way
-to some instruction register (BenkoBox).
+Note that Fleet simply copies the {\tt Instruction Path} field, bit
+for bit, into the packet {\tt Path} field in order to ``dispatch'' an
+instruction.
{\tt\tiny
\begin{bytefield}{49}
- \bitheader[b]{0,10,11,17,18-25,37,47,48}\\
+ \bitheader[b]{0,6,7,17,18-25,37,47,48}\\
\bitbox[r]{1}{0}
- \bitbox{11}{Instruction Register Address}
- \bitbox[lr]{11}{}
- \bitbox{1}{K}
- \bitbox{1}{L}
+ \bitbox{11}{Instruction Path}
+ \bitbox{11}{Instruction Path}
+ \bitbox{1}{1}
\bitbox{1}{Ti}
\bitbox{1}{Di}
- \bitbox{1}{Ld}
+ \bitbox{1}{Dc}
\bitbox{1}{Do}
\bitbox{1}{To}
+ \bitbox{1}{Ig}
\bitbox{1}{Rq}
+ \bitbox{11}{Data/Token Path}
\bitbox{7}{Count}
- \bitbox{11}{Data/Token Destination}
\end{bytefield}
}
-
+\setlength{\bitwidth}{5mm}
\pagebreak
-\section*{Instruction Format Detail}
+\section{Instruction Formats}
+\begin{wrapfigure}{R}{2in}
+\epsfig{file=locations,width=2in}
+\vspace{-0.4in}
+\end{wrapfigure}
+Within the pump's instruction fifo, there are two points of particular
+interest. The first point is the {\it insertion point}, which is the
+point in the ring where new instructions enter from the switch fabric.
+The other point of interest is the {\it execution point}, which is the
+point in the ring where instructions other than {\tt kill} and {\tt
+ unclog} are performed.
+
+The programmer may assume that the insertion point is the immediate
+successor of the execution point.
+
+\vspace{0.3cm}
+{\bf Kill}
+\addcontentsline{toc}{subsection}{Kill}
+
+{\tt
+\begin{bytefield}{26}
+ \bitheader[b]{0,6,7,20-25}\\
+ \bitbox{1}{0}
+ \bitbox{1}{0}
+ \bitbox{1}{1}
+ \bitbox{1}{0}
+ \bitbox{1}{1}
+ \bitbox{14}{don't care}
+ \bitbox{7}{Count}
+\end{bytefield}}
+When a {\tt kill} instruction reaches the insertion point, it will
+wait there for a non-{\tt clog} instruction to reach the execution
+point. When this occurs, the instruction at the execution point is
+retired and the count field of the {\tt kill} instruction is
+decremented. If the {\tt kill} instruction's count was {\tt 0} before
+decrementing, the {\tt kill} instruction is retired. The programmer
+is assured that a kill instruction with a count of $n$ will kill $n+1$
+{\it consecutive} instructions.
+
+\vspace{0.3cm}
+{\bf Clog}
+\addcontentsline{toc}{subsection}{Clog}
+
+{\tt
+\begin{bytefield}{26}
+ \bitheader[b]{0,20-25}\\
+ \bitbox{1}{0}
+ \bitbox{1}{0}
+ \bitbox{1}{1}
+ \bitbox{1}{0}
+ \bitbox{1}{0}
+ \bitbox{21}{don't care}
+\end{bytefield}}
+When a {\tt clog} instruction reaches the execution point, it remains
+there and no more instructions will be executed until an {\tt unclog}
+is performed.
+
+\vspace{0.3cm}
+{\bf UnClog}
+\addcontentsline{toc}{subsection}{UnClog}
-\setlength{\bitwidth}{5mm}
{\tt
\begin{bytefield}{26}
- \bitheader[b]{0,10,11,17,18-25}\\
- \bitbox{1}{K}
- \bitbox{1}{L}
+ \bitheader[b]{0,20-25}\\
+ \bitbox{1}{0}
+ \bitbox{1}{0}
+ \bitbox{1}{0}
+ \bitbox{1}{1}
+ \bitbox{1}{0}
+ \bitbox{21}{don't care}
+\end{bytefield}}
+When an {\tt unclog} instruction reaches the insertion point, it will wait
+there until a {\tt clog} instruction is at the execution point. When
+this occurs, both instructions retire.
+
+\vspace{0.3cm}
+{\bf Literal}
+\addcontentsline{toc}{subsection}{Literal}
+
+{\tt
+\begin{bytefield}{26}
+ \bitheader[b]{0,6,7,23-25}\\
+ \bitbox{1}{0}
+ \bitbox{1}{1}
+ \bitbox{17}{Literal}
+ \bitbox{7}{Count}
+\end{bytefield}}
+When a literal instruction reaches the execution point, its {\tt
+ Literal} field is sign-extended to a full word length and captured
+in the data latch.
+
+
+\pagebreak
+{\bf Normal Instruction}
+\addcontentsline{toc}{subsection}{Normal Instruction}
+
+{\tt
+\begin{bytefield}{26}
+ \bitheader[b]{0,6,7,16-25}\\
+ \bitbox{1}{1}
\bitbox{1}{Ti}
\bitbox{1}{Di}
- \bitbox{1}{Ld}
+ \bitbox{1}{Dc}
\bitbox{1}{Do}
\bitbox{1}{To}
+ \bitbox{1}{Ig}
\bitbox{1}{Rq}
- \bitbox{7}{C}
\bitbox{11}{Dest}
+ \bitbox{7}{Count}
\end{bytefield}
}
+\footnoteremember{tidi}{{\tt Ti}=1,{\tt Di}=1 is invalid on input port.}
+\footnoteremember{didc}{Note that {\tt Di}=0,{\tt Dc}=1
+ is meaningless and therefore reserved for other
+ uses.}
+\footnoteremember{todo}{ {\tt To}=1,{\tt Do}=1 have special meaning on an output port.}
+\footnoteremember{toig}{{\tt To}=0,{\tt Ig}=1 is invalid}
\begin{itemize}
+ \item [\tt Ti] ({\bf Token Input}) wait for a token and acknowledge it.
- \item [\tt K] ({\bf Kill}) if set, this instruction is a kill;
- ignore all further directions below.
-
- \item [\tt L] ({\bf Literal}) if set, use all bits below except
- {\tt Count} (17 bits) as a literal, sign extend them,
- and load into the data register; ignore all further
- directions below except {\tt Count}.
+ \item [\tt Di] ({\bf Data Input}) wait for a datum and acknowledge it.
+ \footnoterecall{tidi}\footnoterecall{didc}
- \item [\tt Ti] ({\bf Token Input}) wait for a token and acknowledge
- it; {\tt Ti}=1,{\tt Di}=1 is invalid on inbox
+ \item [\tt Dc] ({\bf Data Capture}) capture (latch) the accepted
+ datum in the data latch. This bit is ignored if the incoming packet is
+ a token. \footnoterecall{didc}
- \item [\tt Di] ({\bf Data Input}) wait for a datum and acknowledge
- it
+ \item [\tt Do] ({\bf Data Output}) emit a datum.\footnoterecall{todo}
- \item [\tt Dc] ({\bf Data Capture}) capture (latch) a datum; {\tt
- Di}=0,{\tt Dc}=1 is invalid. This bit is ignored if
- the {\tt T} (token) bit is set on the incoming
- packet.
+ \item [\tt To] ({\bf Token Output}) emit a token.\footnoterecall{todo}\footnoterecall{toig}
- \item [\tt Do] ({\bf Data Output}) emit a datum
-
- \item [\tt To] ({\bf Token Output}) emit a token; {\tt To}=1,{\tt
- Do}=1 is invalid on outbox
-
-% \item [\tt Fi] ({\bf Final Iteration}) emit a token when {\tt
-% Count=0} (ie the last iteration of a requeueing
-% loop); {\tt Tl}=1,{\tt To}=1 is invalid.
+ \item [\tt Ig] ({\bf Ignore {\tt To} Until Last Iteration}) ignore
+ the {\tt To} bit unless {\tt Count=0}\footnoterecall{toig}
\item [\tt Rq] ({\bf ReQueue}) if set, instructions having nonzero
- count are ``Re-Queued'' in the instruction queue
- after execution (see below); otherwise instructions
- with nonzero count are executed again immediately.
+ count are ``Re-Queued'' rather than RePeated. See
+ {\tt Count} for more detail. \footnote{ An
+ instruction {\it in memory} may not have {\tt
+ Rq=1,Count=0} (use {\tt Rq=0,Count=0})}
- \item [\tt C] ({\bf Count}) {\it After} executing:
+ \item [\tt Count] ({\bf Count}) {\it After} executing:
\begin{verbatim}
-if Count==0 { discard this instruction }
-else {
- if Count<1111111 { decrement count }
- if Rq=1 and L=0 {
+if (Count==0) {
+ discard this instruction
+} else {
+ if Count < MAX_COUNT {
+ decrement count
+ }
+ if Rq=1 or Literal {
put this instruction back into the instruction fifo
} else {
execute this instruction again
}
}
-\end{verbatim}
+\end{verbatim}
+Note how a ``standing'' instruction is encoded as {\tt Count=MAX\_COUNT}
\item [\tt Dest] ({\bf Data/Token Destination})
- Any packets (token or datum) emitted {\it to the switch fabric}
- will be emitted with this address in the packet's destination
- field.
+ Normally, this field is copied into the path portion of any
+ outgoing packet ({\tt Do} on an output port or {\tt To} on either).
+
+ However, in the special case of an output port, if {\tt Do=1,To=1}, then
+ the {\it most significant} {\tt 11} bits of the value in the {\it
+ data register} are used as a destination path instead.
+ %\footnote{This
+ %functionality eliminates the need for any sort of ``{\tt Execute}''
+ %ship, and lets a {\tt Fifo} ship act as an extension of the
+ %instruction queue in the pump.}
+
\end{itemize}
-\subsection*{Notes}
+
+\pagebreak
+\section{Opcodes}
+
+Opcodes, as specified in \cite{am25}, are not yet implemented, but
+should be.
+
+\section{Bypasses}
+
+Bypasses, as specified in \cite{am27}, are not yet implemented, but
+should be.
+
+\section{Sequencing Guarantees}
+
+\addcontentsline{toc}{subsection}{Source Sequence Guarantee}
+If two packets leave the same source and have the same path, they will
+arrive at their common destination {\it in the same order in which
+ they left the source}. This assurance is referred to as the {\it
+ source sequence guarantee}.
+
+\addcontentsline{toc}{subsection}{Instruction Sequence Guarantee}
+CodeBags in memory consist of an unordered set of {\it fibers}. A
+fiber is an ordered set of instructions which all share the same
+instruction path, and therefore will be executed by the same pump.
+Any code which dispatches a codebag {\it must} ensure that the
+instructions within a fiber reach their pump in the order in which
+they appear in the fiber. This is known as the {\it instruction
+ sequence guarantee}. Typically the software dispatching a codebag
+will take advantage of the source sequence guarantee in order to
+fulfill the instruction sequence guarantee.
+
+\section{Code Bags}
+
+Code bags may overlap in memory. If the assembler determines that two
+code bags share a set of fibers, it may feel free to re-order those
+fibers in such a way that the code bags overlap, sharing memory words
+for these fibers. The codebag descriptors for the two bags will then
+have different addresses and/or sizes, but the ranges which they
+encompass will overlap.
+
+\subsection{Dispatching Code Bags}
+
+A codebag is ``dispatched'' by causing its constituent words to
+arrive at the data latch of some output port, and then causing that
+output port's pump to execute a {\tt send} (not {\tt sendto})
+instruction, which is encoded as {\tt Do=1,To=1}. Because
+instructions are encoded in such a way that their {\tt Instruction
+ Path} appears in the upper 11 bits, the {\tt send} mechanism will
+correctly route the instruction to a pump will execute it.
+
+Currently the {\tt inCBD} port of the {\tt Memory} ship is ideally
+suited for this purpose, although a similar effect can easily be
+achieved using the {\tt BitFifo} ship in conjunction with the {\tt
+ Memory} ship. The only disadvantage to this arrangement is that the
+{\tt out} port must execute {\tt send} a certain number of times, and
+that number of times is determined by the {\tt size} field of the
+codebag descriptor. Unfortunately this value is not known at compile
+time, so it cannot be encoded as a repeat count on the instruction at
+the {\tt out} port. The typical solution is to place a standing ({\tt
+ count=$\infty$}) {\tt send} instruction at the {\tt out} port of one
+{\tt Memory} ship, and reserve that ship exclusively for dispatching
+code bags.
+
+Note that instructions are encoded relative to a particular dispatch
+output port. That is, the {\tt Instruction Path} field of the
+instruction specifies a path to the desired execution pump -- but this
+path is also specified with respect to some source. Therefore the
+instruction can be correctly dispatched only from that particular
+source. However, it is fairly simple to write software which can
+``relocate'' a codebag by replacing the {\tt Instruction Path} fields
+as needed.
+
+\subsection{Tokens and Data Items}
+
+When a data item is sent to a destination which expects a token, such
+as an output port destination, the effect will be the same as if a token
+had been sent to that destination, and the data value will be lost.
+
+When a token is sent to a destination which expects a data item, such
+as a pump destination or an input port destination, the effect will be
+the same as if a data item {\it having an undefined value} were sent
+to that destination. In other words, the programmer has no completely
+reliable mechanism for distinguishing between tokens and data items.
+
+
+\section{Future Directions}
+
+\subsection{The Role of the Count Field}
+
+Looking back on the design of the pump, several things are now
+apparent which were not initially. In particular, it seems that it
+could be useful to separate {\it loading the count register} from
+other types of instructions. This would have a few advantages:
+
\begin{itemize}
-\item A "standing" instruction is encoded as {\tt Count}=$\text{\tt 1111111}_\text{two}$
+\item The size of the count field would not be a consideration in the
+ ``instruction budget'' of normal execution instructions
+
+\item It would be possible to have finitely-repeating,
+ infinitely-requeueing instructions. Various implementation
+ details\footnote{In many implementations, the count field of an
+ instruction is destroyed as it counts down; another register and
+ mux would be needed to ``save'' this value so it could be
+ restored prior to being requeued} make it awkward to support
+ this unless the count register can be manipulated independently
+ of instruction execution.
+\end{itemize}
-\item A {\tt clog} is encoded as a "standing" {\tt nop} ({\tt Ti},
- {\tt Di}, {\tt Dc}, {\tt Do}, {\tt To} all cleared)
+With this in mind, we can refactor the ``essence of the pump'' into
+the following actions, each of which may or may not be performed by a
+particular instruction:
-\item An {\tt unclog} is encoded as {\tt K=1},{\tt Ti=1}
+\begin{itemize}
+\item await and acknowledge a token
+\item await and acknowledge a datum
+\item load the awaited datum into the path latch (top 11 bits)
+\item load the awaited datum into the data latch
+\item load a literal into the data latch
+\item load a literal into the path latch
+\item send the value in the data latch along the path in the path latch
+\item send a token along the path in the path latch
+\item set the count register to a literal value
+\item decrement the count register
+\item requeue this instruction if {\tt count>0}
+\item requeue this instruction unconditionally
+\item repeat this instruction if {\tt count>0}
+\item repeat this instruction unconditionally
+\end{itemize}
+The crucial change here is the decoupling of the act of {\it loading
+ the count register} from the act of {\it loading the next
+ instruction}. It also separates the act of loading the ``path
+latch'' from the act of actually performing the transmission. This
+latter feature makes it possible to load the data and destination
+latches from two distinct data items, allowing ships to create,
+handle, and consume {\it packets} in the form of a pair of data items.
+
+At this point, the remaining decisions are simply a question of
+instruction bit budgeting. Currently we have a separate instruction
+form for literals, so that the literal can use bits which are not
+otherwise relevant to literal instructions (for example, {\tt Di}).
+Adding an additional instruction form for loading the count register
+would have similar advantages.
+
+
+\subsection{For Just a Little Bit More...}
+
+Compared to early revisions of the Fleet architecture, the docks of
+FleetTwo are substantially more powerful. One might ask, why not add
+additional power to the docks, perhaps up to the point of making them
+tiny von Neumann machines in their own right? Indeed, doing so would
+risk reliving the {\it Wheel of Reincarnation}
+\cite{WheelOfReincarnation}.
+
+I believe, however, that there is a fundamental distinction to be
+made between devices whose control flow {\it can branch} and those
+whose control flow {\it cannot branch}, and that this distinction is
+founded in the geometry of VLSI, and indeed, Euclidean space.
+Ultimately, the instruction stream for any component of a processor
+must exist in some physical incarnation -- some geometric extent. If
+the control flow of a program is linear, with no branching, then the
+instruction stream has an optimal spatial mapping which is
+efficient: simply lay it out as a FIFO.
+
+By contrast, however, a program whose instruction stream has
+unrestricted branching is logically a {\it tree} structure.
+Unfortunately, however, general trees have no efficient mapping onto
+two-dimensional surfaces. By this, I mean that in every possible
+mapping, there exists some pair of {\it logically adjacent}
+instructions whose {\it physical distance} is proportional to the size
+of the entire program, rather than some constant factor irrespective
+of the program size. Therefore I conclude that the spatial mapping of
+a program with unrestricted branching is in an entirely different
+class than that of programs with linear control flow, and I choose this
+distinction as the basis for limiting the docks to linear control
+flow sequences. A similar argument can be made for looping constructs
+in which two or more loops may be nested within a single parent loop.
+
+With this distinction in mind, it would not be objectionable for
+future incarnations of Fleet to have {\it predicated} instructions at
+the docks. This might take the form of:
+
+\begin{itemize}
+\item Instructions to unconditionally set some ``state flags''
+\item Instructions to conditionally set some ``state flags'' based on the contents of the data latch
+\item Bits which cause an instruction to be ignored or retired based on the state flags
\end{itemize}
+
+I feel that this addition would not risk falling down the slippery
+slope -- instruction streams with predicated instructions (but not
+unrestricted branching or nested looping) preserve the crucial
+property that they have an efficient spatial mapping. Any steps
+beyond this, however, cross a fairly fundamental line.
+
+
+\addcontentsline{toc}{section}{Ship Descriptions}
+
+