\documentclass[10pt,oneside]{article}
\reversemarginpar
+\usepackage[titles]{tocloft}
\usepackage{emp}
+\usepackage{amsmath}
\DeclareGraphicsRule{*}{mps}{*}{}
+\newcommand{\footnoteremember}[2]{
+ \footnote{#2}
+ \newcounter{#1}
+ \setcounter{#1}{\value{footnote}}
+} \newcommand{\footnoterecall}[1]{
+ \footnotemark[\value{#1}]
+}
+
\ifx\pdftexversion\undefined
\usepackage[dvips]{graphicx}
\DeclareGraphicsExtensions{.eps}\else
\usepackage{bytefield}
\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref}
\renewcommand{\ttdefault}{cmtt}
-\title{The FleetTwo Architecture Manual\\{\normalsize A Programmer's View of Fleet}}
+\title{The FleetTwo Architecture Manual\\{\normalsize A Programmer's View of FleetTwo}}
\begin{document}
\maketitle
\end{abstract}
+%\vfill
+%\fbox{\parbox{5in}{\footnotesize
+%\begin{center}
+%The software described in this memo may be obtained using
+%\href{http://www.darcs.net/}{\tt darcs}, with the command:
+%
+%{\tt darcs get http://research.cs.berkeley.edu/class/fleet/repos/fleet/}
+%\end{center}
+%}}
+
\pagebreak
-\section*{Introduction (from \cite{ies02})}
+\tableofcontents
+
+\pagebreak
+\section{Introduction (from \cite{ies02})}
During the past two and a half years, the Fleet architecture has gone
through massive changes and improvements. Nonetheless, the
\pagebreak
-\section*{The Programmer's View of The Ship-Fabric Interface}
+\section{The Programmer's View of The Ship-Fabric Interface}
The role of the Fleet switch fabric is to carry {\it packets} from
{\it sources} to {\it destinations}.
The diagram below represents a {\it programmer's} conceptual view of
the interface between ships and the switch fabric. Actual
implementation circuitry may differ substantially. Sources and
-destinations which can send and recieve only tokens -- not data items
+destinations which can send and receive only tokens -- not data items
-- are drawn as dashed lines.
\vspace{0.3cm}
\vspace{0.3cm}
The term {\it port} refers to an interface to the ship, the {\it
- valve} connecting it to the switch fabric, and the corresponding
+ dock} connecting it to the switch fabric, and the corresponding
sources and destinations on the switch fabric.
-Each valve consists of a {\it data latch}, which is as wide as a
+Each dock consists of a {\it data latch}, which is as wide as a
single machine word and a {\it pump}, which is a circular fifo of
instruction-width latches. The values in the instruction fifo
control the data latch, as future chapters will explain.
-Note that the pump in each valve has a destination of its own, and
+Note that the pump in each dock has a destination of its own, and
that there is no buffering fifo guarding this destination. Note that
all other destinations are guarded by a buffering fifo; the size of
-this fifo is exposed to the software programmer so she can ensure that
-deadlock does not occur.
+this fifo is exposed to the software programmer so she can avoid
+deadlock.
\pagebreak
-\section*{Data Formats}
+\section{Data Formats}
-\subsection*{Path (12 bits)}
+\subsection{Path (12 bits)}
These bits appear physically within the switch fabric. \footnote{In
the Sun Labs implementation, these bits have ``address bit
\end{bytefield}
}
-\subsection*{Data Word In Memory (37 bits)}
+\subsection{Data Word In Memory (37 bits)}
A word of data is 37 bits wide.
\end{bytefield}
}
-\subsection*{Packet In Flight (49 bits)}
+\subsection{Packet In Flight (49 bits)}
{\tt\footnotesize
\begin{bytefield}{49}
\end{bytefield}
}
-\subsection*{Instruction In Memory (37 bits)}
+\subsection{Instruction In Memory (37 bits)}
An instruction must be no wider than a memory word. The next section
explains the bits in greater detail. The {\tt Instruction Path} is
{\tt\tiny
\begin{bytefield}{49}
- \bitheader[b]{0,10,11,17,18-26,36}\\
+ \bitheader[b]{0,6,7,17,18-26,36}\\
\bitbox[r]{12}{}
\bitbox{11}{Instruction Path}
\bitbox{1}{1}
\end{bytefield}
}
-\subsection*{Instruction Packet In Flight (49 bits)}
+\subsection{Instruction Packet In Flight (49 bits)}
Note that Fleet simply copies the {\tt Instruction Path} field, bit
for bit, into the packet {\tt Path} field in order to ``dispatch'' an
{\tt\tiny
\begin{bytefield}{49}
- \bitheader[b]{0,10,11,17,18-25,37,47,48}\\
+ \bitheader[b]{0,6,7,17,18-25,37,47,48}\\
\bitbox[r]{1}{0}
\bitbox{11}{Instruction Path}
\bitbox{11}{Instruction Path}
\setlength{\bitwidth}{5mm}
\pagebreak
-\section*{Instruction Formats}
+\section{Instruction Formats}
\begin{wrapfigure}{R}{2in}
\epsfig{file=locations,width=2in}
\vspace{-0.4in}
\vspace{0.3cm}
{\bf Kill}
+\addcontentsline{toc}{subsection}{Kill}
{\tt
\begin{bytefield}{26}
\bitbox{1}{1}
\bitbox{1}{0}
\bitbox{1}{1}
- \bitbox{14}{}
+ \bitbox{14}{don't care}
\bitbox{7}{Count}
\end{bytefield}}
When a {\tt kill} instruction reaches the insertion point, it will
retired and the count field of the {\tt kill} instruction is
decremented. If the {\tt kill} instruction's count was {\tt 0} before
decrementing, the {\tt kill} instruction is retired. The programmer
-is assured that a kill instruction with a count of $n$ will kill $n$
+is assured that a kill instruction with a count of $n$ will kill $n+1$
{\it consecutive} instructions.
\vspace{0.3cm}
{\bf Clog}
+\addcontentsline{toc}{subsection}{Clog}
{\tt
\begin{bytefield}{26}
\bitbox{1}{1}
\bitbox{1}{0}
\bitbox{1}{0}
- \bitbox{21}{}
+ \bitbox{21}{don't care}
\end{bytefield}}
-When a {\tt clog} instruction reaches the execution point, no more
-instructions will be executed until an {\tt unclog} is performed.
+When a {\tt clog} instruction reaches the execution point, it remains
+there and no more instructions will be executed until an {\tt unclog}
+is performed.
\vspace{0.3cm}
{\bf UnClog}
+\addcontentsline{toc}{subsection}{UnClog}
{\tt
\begin{bytefield}{26}
\bitbox{1}{0}
\bitbox{1}{1}
\bitbox{1}{0}
- \bitbox{21}{}
+ \bitbox{21}{don't care}
\end{bytefield}}
When an {\tt unclog} instruction reaches the insertion point, it will wait
there until a {\tt clog} instruction is at the execution point. When
\vspace{0.3cm}
{\bf Literal}
+\addcontentsline{toc}{subsection}{Literal}
{\tt
\begin{bytefield}{26}
\pagebreak
{\bf Normal Instruction}
+\addcontentsline{toc}{subsection}{Normal Instruction}
{\tt
\begin{bytefield}{26}
\end{bytefield}
}
+\footnoteremember{tidi}{{\tt Ti}=1,{\tt Di}=1 is invalid on input port.}
+\footnoteremember{didc}{Note that {\tt Di}=0,{\tt Dc}=1
+ is meaningless and therefore reserved for other
+ uses.}
+\footnoteremember{todo}{ {\tt To}=1,{\tt Do}=1 have special meaning on an output port.}
+\footnoteremember{toig}{{\tt To}=0,{\tt Ig}=1 is invalid}
\begin{itemize}
-
- \item [\tt Ti] ({\bf Token Input}) wait for a token and acknowledge
- it\footnote{{\tt Ti}=1,{\tt Di}=1 is invalid on input port.}
+ \item [\tt Ti] ({\bf Token Input}) wait for a token and acknowledge it.
\item [\tt Di] ({\bf Data Input}) wait for a datum and acknowledge it.
+ \footnoterecall{tidi}\footnoterecall{didc}
\item [\tt Dc] ({\bf Data Capture}) capture (latch) the accepted
datum in the data latch. This bit is ignored if the incoming packet is
- a token. \footnote{ Note that {\tt Di}=0,{\tt Dc}=1
- is meaningless and therefore reserved for other
- uses.}
+ a token. \footnoterecall{didc}
- \item [\tt Do] ({\bf Data Output}) emit a datum.
+ \item [\tt Do] ({\bf Data Output}) emit a datum.\footnoterecall{todo}
- \item [\tt To] ({\bf Token Output}) emit a token.\footnote{ {\tt To}=1,{\tt
- Do}=1 have special meaning on an output port.}
+ \item [\tt To] ({\bf Token Output}) emit a token.\footnoterecall{todo}\footnoterecall{toig}
\item [\tt Ig] ({\bf Ignore {\tt To} Until Last Iteration}) ignore
- the {\tt To} bit unless {\tt Count=0} \footnote{{\tt
- To}=0,{\tt Ig}=1 is invalid}
+ the {\tt To} bit unless {\tt Count=0}\footnoterecall{toig}
\item [\tt Rq] ({\bf ReQueue}) if set, instructions having nonzero
count are ``Re-Queued'' rather than RePeated. See
Note how a ``standing'' instruction is encoded as {\tt Count=MAX\_COUNT}
\item [\tt Dest] ({\bf Data/Token Destination})
- Normally, this field is copied into the address portion of any
+ Normally, this field is copied into the path portion of any
outgoing packet ({\tt Do} on an output port or {\tt To} on either).
However, in the special case of an output port, if {\tt Do=1,To=1}, then
the {\it most significant} {\tt 11} bits of the value in the {\it
- data register} are used as a destination address instead.
+ data register} are used as a destination path instead.
%\footnote{This
%functionality eliminates the need for any sort of ``{\tt Execute}''
%ship, and lets a {\tt Fifo} ship act as an extension of the
\pagebreak
-\section*{Opcodes}
+\section{Opcodes}
Opcodes, as specified in \cite{am25}, are not yet implemented, but
should be.
-\section*{Bypasses}
+\section{Bypasses}
Bypasses, as specified in \cite{am27}, are not yet implemented, but
should be.
-\section*{Sequencing Guarantees}
+\section{Sequencing Guarantees}
+\addcontentsline{toc}{subsection}{Source Sequence Guarantee}
If two packets leave the same source and have the same path, they will
arrive at their common destination {\it in the same order in which
they left the source}. This assurance is referred to as the {\it
source sequence guarantee}.
+\addcontentsline{toc}{subsection}{Instruction Sequence Guarantee}
CodeBags in memory consist of an unordered set of {\it fibers}. A
fiber is an ordered set of instructions which all share the same
instruction path, and therefore will be executed by the same pump.
will take advantage of the source sequence guarantee in order to
fulfill the instruction sequence guarantee.
-\section*{Code Bags}
+\section{Code Bags}
Code bags may overlap in memory. If the assembler determines that two
code bags share a set of fibers, it may feel free to re-order those
have different addresses and/or sizes, but the ranges which they
encompass will overlap.
-\subsection*{Dispatching Code Bags}
+\subsection{Dispatching Code Bags}
-A codebag is ``dispatched'' by causing its constitutent words to
+A codebag is ``dispatched'' by causing its constituent words to
arrive at the data latch of some output port, and then causing that
output port's pump to execute a {\tt send} (not {\tt sendto})
instruction, which is encoded as {\tt Do=1,To=1}. Because
instructions are encoded in such a way that their {\tt Instruction
Path} appears in the upper 11 bits, the {\tt send} mechanism will
-correctly route the instruction to the pump which ought to execute it.
+correctly route the instruction to a pump will execute it.
Currently the {\tt inCBD} port of the {\tt Memory} ship is ideally
suited for this purpose, although a similar effect can easily be
output port. That is, the {\tt Instruction Path} field of the
instruction specifies a path to the desired execution pump -- but this
path is also specified with respect to some source. Therefore the
-instruction can only be correctly dispatched from that particular
-source. However, it is a fairly simple matter two write a software
-program which can ``relocate'' a codebag by replacing the {\tt
- Instruction Path} fields as needed.
+instruction can be correctly dispatched only from that particular
+source. However, it is fairly simple to write software which can
+``relocate'' a codebag by replacing the {\tt Instruction Path} fields
+as needed.
-\subsection*{Tokens and Data Items}
+\subsection{Tokens and Data Items}
When a data item is sent to a destination which expects a token, such
as an output port destination, the effect will be the same as if a token
-had been sent to that destination.
+had been sent to that destination, and the data value will be lost.
When a token is sent to a destination which expects a data item, such
as a pump destination or an input port destination, the effect will be
reliable mechanism for distinguishing between tokens and data items.
-\section*{Future Directions}
+\section{Future Directions}
-\subsection*{The Role of the Count Field}
+\subsection{The Role of the Count Field}
Looking back on the design of the pump, several things are now
apparent which were not initially. In particular, it seems that it
\item It would be possible to have finitely-repeating,
infinitely-requeueing instructions. Various implementation
- details\footnote{in many implementations, the count field of an
+ details\footnote{In many implementations, the count field of an
instruction is destroyed as it counts down; another register and
mux would be needed to ``save'' this value so it could be
restored prior to being requeued} make it awkward to support
\begin{itemize}
\item await and acknowledge a token
\item await and acknowledge a datum
+\item load the awaited datum into the path latch (top 11 bits)
\item load the awaited datum into the data latch
-\item load the a literal into the data latch
-\item load a value indicated by the instruction into the path latch
-\item load the top 11 bits of the data latch into the path latch
-\item treat the values in the path latch and data latch as a packet and transmit it
-\item treat the value in the path latch as a token and transmit it
+\item load a literal into the data latch
+\item load a literal into the path latch
+\item send the value in the data latch along the path in the path latch
+\item send a token along the path in the path latch
\item set the count register to a literal value
\item decrement the count register
\item requeue this instruction if {\tt count>0}
latches from two distinct data items, allowing ships to create,
handle, and consume {\it packets} in the form of a pair of data items.
-At this point, it boils down to a question of instruction bit
-budgeting. Currently we have a separate instruction form for
-literals, so that the literal can use bits which are not otherwise
-relevant to literal instructions (for example, {\tt Di}). Adding an
-additional instruction form for loading the count register would have
-similar advantages.
+At this point, the remaining decisions are simply a question of
+instruction bit budgeting. Currently we have a separate instruction
+form for literals, so that the literal can use bits which are not
+otherwise relevant to literal instructions (for example, {\tt Di}).
+Adding an additional instruction form for loading the count register
+would have similar advantages.
-\subsection*{For Just a Little Bit More...}
+\subsection{For Just a Little Bit More...}
-Compared to early revisions of the Fleet architecture, the valves of
+Compared to early revisions of the Fleet architecture, the docks of
FleetTwo are substantially more powerful. One might ask, why not add
-additional power to the valves, perhaps up to the point of making them
+additional power to the docks, perhaps up to the point of making them
tiny von Neumann machines in their own right? Indeed, doing so would
risk reliving the {\it Wheel of Reincarnation}
\cite{WheelOfReincarnation}.
Ultimately, the instruction stream for any component of a processor
must exist in some physical incarnation -- some geometric extent. If
the control flow of a program is linear, with no branching, then the
-instruction stream has an optimal spatial mapping which is quite
-efficient: it is simply laid out in FIFO fashion.
+instruction stream has an optimal spatial mapping which is
+efficient: simply lay it out as a FIFO.
By contrast, however, a program whose instruction stream has
-unrestricted branching is logically a {\it tree} structure. Quite
-unfortunately, general trees have no efficient mapping onto
+unrestricted branching is logically a {\it tree} structure.
+Unfortunately, however, general trees have no efficient mapping onto
two-dimensional surfaces. By this, I mean that in every possible
mapping, there exists some pair of {\it logically adjacent}
instructions whose {\it physical distance} is proportional to the size
of the entire program, rather than some constant factor irrespective
of the program size. Therefore I conclude that the spatial mapping of
a program with unrestricted branching is in an entirely different
-class than that of programs with linear control flow. I choose this
-distinction as the basis for limiting our valves to linear control
+class than that of programs with linear control flow, and I choose this
+distinction as the basis for limiting the docks to linear control
flow sequences. A similar argument can be made for looping constructs
in which two or more loops may be nested within a single parent loop.
With this distinction in mind, it would not be objectionable for
future incarnations of Fleet to have {\it predicated} instructions at
-the valves. This might take the form of:
+the docks. This might take the form of:
\begin{itemize}
\item Instructions to unconditionally set some ``state flags''
I feel that this addition would not risk falling down the slippery
slope -- instruction streams with predicated instructions (but not
-unrestricted branching) preserve the crucial property that they have
-an efficient spatial mapping. Any steps beyond this, however, cross a
-fairly fundamental line.
+unrestricted branching or nested looping) preserve the crucial
+property that they have an efficient spatial mapping. Any steps
+beyond this, however, cross a fairly fundamental line.
+
+\addcontentsline{toc}{section}{Ship Descriptions}