X-Git-Url: http://git.megacz.com/?a=blobdiff_plain;f=doc%2Farchman.tex;h=7019dd8f94f1ce4d5aeef1b8dc0e9b0614431249;hb=aee85d3dd0435554e14e03b97f268752843a441f;hp=5d79cc82ca0a5dc8e2152c293b60ebc1511f3f88;hpb=a94fe3639c24442e029b4a468b64f28aea7a76e0;p=fleet.git diff --git a/doc/archman.tex b/doc/archman.tex index 5d79cc8..7019dd8 100644 --- a/doc/archman.tex +++ b/doc/archman.tex @@ -1,83 +1,219 @@ -\documentclass[10pt,oneside]{book} +\documentclass[10pt,oneside]{article} \reversemarginpar +\usepackage[titles]{tocloft} +\usepackage{emp} +\usepackage{amsmath} +\DeclareGraphicsRule{*}{mps}{*}{} + +\newcommand{\footnoteremember}[2]{ + \footnote{#2} + \newcounter{#1} + \setcounter{#1}{\value{footnote}} +} \newcommand{\footnoterecall}[1]{ + \footnotemark[\value{#1}] +} + +\ifx\pdftexversion\undefined +\usepackage[dvips]{graphicx} +\DeclareGraphicsExtensions{.eps}\else +\usepackage[pdftex]{graphicx} +\DeclareGraphicsExtensions{.pdf,.png,.mps,.mp} +\usepackage{epstopdf} +\fi \usepackage{palatino} +\usepackage{wrapfig} \usepackage{epsfig} +\usepackage{verbatim} \usepackage{parskip} \usepackage{register} \usepackage{bytefield} +\usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref} \renewcommand{\ttdefault}{cmtt} -\title{The FleetTwo Architecture Manual} +\title{The FleetTwo Architecture Manual\\{\normalsize A Programmer's View of FleetTwo}} \begin{document} \maketitle + +\begin{thebibliography}{[GDG01]} +\bibitem[IES02]{ies02} + Sutherland, Ivan, + \href{http://research.cs.berkeley.edu/class/fleet/docs/people/ivan.e.sutherland/ies02-FLEET-A.Once.Instruction.Computer.pdf}{\it UCB-IES02: Fleet -- A One Instruction Computer} +\bibitem[IES14]{ies14} + Sutherland, Ivan, + \href{http://research.cs.berkeley.edu/class/fleet/docs/people/ivan.e.sutherland/ies14-Fleet.Definition.pdf}{\it UCB-IES14: The Fleet Definition} +\bibitem[AM17]{am17} + Megacz, Adam, + \href{http://research.cs.berkeley.edu/class/fleet/docs/people/adam.megacz/am17-The.Choice.Ship.pdf}{\it UCB-AM17: The Choice Ship} +\bibitem[IES44]{ies44} + Sutherland, Ivan, + \href{http://research.cs.berkeley.edu/class/fleet/docs/people/ivan.e.sutherland/ies44-Fleet.Definition.pdf}{\it UCB-IES44: The Fleet Definition} +\bibitem[AM25]{am25} + Megacz, Adam, + \href{http://research.cs.berkeley.edu/class/fleet/docs/people/adam.megacz/am25.pdf}{\it UCB-AM25: Opcode Ports} +\bibitem[AM27]{am27} + Megacz, Adam, + \href{http://research.cs.berkeley.edu/class/fleet/docs/people/adam.megacz/am27.pdf}{\it UCB-AM27: Bypass Paths} +\bibitem[IES31]{ies31} + Sutherland, Ivan, + \href{http://research.cs.berkeley.edu/class/fleet/docs/people/ivan.e.sutherland/ies31-Some.SHIPs.pdf}{\it UCB-IES31: Some Ships} +\bibitem[MS68]{WheelOfReincarnation} + Myer, T. and Sutherland, Ivan, {\it On the Design of Display Processors} + Communications of the ACM, Vol 11, No. 6, June 1968 . +\end{thebibliography} + +\vspace{1cm} +\begin{abstract} +This manual attempts to detail all those aspects of Fleet which are +visible to a software programmer or compiler backend. + +This document assumes that the reader is broadly familiar with the +Fleet architecture and the motivation behind it. For a far better +introduction to these topics, the reader is referred to \cite{ies02} +\cite{ies14} and \cite{ies44}. Please bear in mind that many of the +details in these documents have been changed, but they still serve as +the authoritative introduction to the architecture. After reviewing +these introductory papers, the reader is invited to digest this +architecture manual for the latest information on the details +necessary to write software and compilers for Fleet. +\end{abstract} + + +%\vfill +%\fbox{\parbox{5in}{\footnotesize +%\begin{center} +%The software described in this memo may be obtained using +%\href{http://www.darcs.net/}{\tt darcs}, with the command: +% +%{\tt darcs get http://research.cs.berkeley.edu/class/fleet/repos/fleet/} +%\end{center} +%}} + +\pagebreak +\tableofcontents + \pagebreak -\section*{Glossary} +\section{Introduction (from \cite{ies02})} + +During the past two and a half years, the Fleet architecture has gone +through massive changes and improvements. Nonetheless, the +fundamental vision remains unchanged. As Sutherland writes in the +original Berkeley Fleet description \cite{ies02}, + +{\it +When computers were new, logic and storage were expensive and wires +were relatively cheap. Early computer designers avoided using logic +wherever possible but were not greatly concerned with communication. +They made design choices consistent with the costs of the day. My +favorite example is the jump instruction. Early designers put jump +instructions at the end of each block of code to avoid the expense of +storing the address of the next block while executing the present one. + +Today's chip fabrication methods invert the older cost balance. In +today's integrated circuits logic and memory are now almost free but +communication costs dominate chip area, energy consumption and delay. +In spite of these changes in the stuff from which we make computers, +vestiges of the past remain in many of today's common microprocessor +designs. For example, jump instructions still appear at the end of +each basic block of code in spite of the need to pre-fetch the next +block while executing this one. It seems better today to store a +pointer to the next block and the length of the current block early in +each block of code. + +Instead of following the path of history, I'd like to listen carefully +to what modern chip structures have to teach about how one might build +a modern computer. I see three major lessons. First, simplicity can +reduce cost. Second, moving data will consume most of the time, +energy and chip area of a modern computer. And third, the low cost of +logic makes concurrency available if we can figure out how to use it. + +\begin{itemize} + +\item {\bf Simplicity}: The Fleet architecture seeks simplicity by treating + all processing devices alike... + +\item {\bf Communication}: The Fleet architecture seeks to control the + cost of communication by putting it under direct programmer + control. Thus Fleet avoids instructions like {\sc ADD} or {\sc STORE} that + include concealed communication to and from a register file... -Port\\ -Inlet\\ -Outlet\\ -Pump\\ -Packet\\ -Data Item\\ -Opcode Port\\ -Bypass\\ +\item {\bf Concurrency}: The Fleet architecture assumes concurrency + nearly everywhere... + +\item {\bf Asynchrony}: This provides enormous flexibility for + implementers to improve the performance or reduce the cost of + the processing devices in a Fleet system. -- Synchronous + implementations both of ships and of the switch fabric are + possible provided they use validity or occupancy bits to achieve + the arbitrary delays required at sources and destinations... + +\end{itemize} +} \pagebreak -\section*{Programmer's View of The Ship-Fabric Interface} +\section{The Programmer's View of The Ship-Fabric Interface} -The diagram below represents a {\it programmer's} conceptual view of -the interface between ships and the switch fabric. Actual -implementations may differ radically, as long as such differences are -not visible to software programs. +The role of the Fleet switch fabric is to carry {\it packets} from +{\it sources} to {\it destinations}. -%\begin{wrapfigure}{R}{2in} -\epsfig{file=ports,width=6in} -%\end{wrapfigure} +A packet consists of a {\it path} and a {\it payload}. The path is +some string of bits which tells the switch fabric how to route from +the desired source to the desired destination. The distinction +between a {\it path} and a {\it destination} is extremely important! +The payload is either a {\it token} or a {\it data item}. If the +payload is a data item, it includes one machine word of data. -The term {\it port} refers to each interface to the ship (either -inbound or outbound) and the machinery required to manage this -interface. The machinery consists of a {\it latch}, which is as wide -as a single machine word, a {\it pump}, which is a circular fifo of -instruction-width latches, and a number of {\it sources} or {\it - destinations}. Sources and destinations which can only send or -recieve tokens (rather than data items) are drawn as dashed lines. -Buffering fifos are drawn where they appear. +The diagram below represents a {\it programmer's} conceptual view of +the interface between ships and the switch fabric. Actual +implementation circuitry may differ substantially. Sources and +destinations which can send and receive only tokens -- not data items +-- are drawn as dashed lines. -Note in particular that every pump is a destination capable of -recieving a data word. This is how instructions are dispatched to -pumps -- they are inserted into the switch fabric as {\it packets} -whose destination addresses +\vspace{0.3cm} +\begin{center} +\epsfig{file=ports,width=4in} +\end{center} +\vspace{0.3cm} -Note that addresses are actually paths. +The term {\it port} refers to an interface to the ship, the {\it + dock} connecting it to the switch fabric, and the corresponding +sources and destinations on the switch fabric. -Define packets. +Each dock consists of a {\it data latch}, which is as wide as a +single machine word and a {\it pump}, which is a circular fifo of +instruction-width latches. The values in the instruction fifo +control the data latch, as future chapters will explain. +Note that the pump in each dock has a destination of its own, and +that there is no buffering fifo guarding this destination. Note that +all other destinations are guarded by a buffering fifo; the size of +this fifo is exposed to the software programmer so she can avoid +deadlock. \pagebreak -\section*{Data Formats} +\section{Data Formats} -\subsection*{Packet Destination Address (12 bits)} +\subsection{Path (12 bits)} -These bits appear physically within the switch fabric, and have -``address bit timing.'' The {\tt T} bit is the ``tokenhood'' bit; if -set, this packet represents a token and it does not cause the switch -fabric data latches to fire. +These bits appear physically within the switch fabric. \footnote{In + the Sun Labs implementation, these bits have ``address bit + timing.''} The {\tt T} bit is the ``tokenhood'' bit; if set, this +packet represents a token\footnote{In the Sun Labs + implementation, this bit inhibits the switch fabric data latches + from firing.} {\tt\footnotesize \begin{bytefield}{49} \bitheader[b]{37,47,48}\\ \bitbox{1}{T} - \bitbox{11}{Destination Address} + \bitbox{11}{Path} \bitbox[l]{37}{} \end{bytefield} } -\subsection*{Data Word In Memory (37 bits)} +\subsection{Data Word In Memory (37 bits)} -A word of memory is 37 bits wide. For convenience, we assume that the -memory word width is also the width of a pointer as well as the width -of all on-chip data item registers. +A word of data is 37 bits wide. {\tt\footnotesize \begin{bytefield}{49} @@ -87,91 +223,90 @@ of all on-chip data item registers. \end{bytefield} } -\subsection*{Data Packet In Flight (49 bits)} - -A {\it data packet} is a data item in the switch fabric, on its way to -some destination. +\subsection{Packet In Flight (49 bits)} {\tt\footnotesize \begin{bytefield}{49} \bitheader[b]{0,36,37,47,48}\\ \bitbox{1}{T} - \bitbox{11}{Destination Address} + \bitbox{11}{Path} \bitbox{37}{Data Word} \end{bytefield} } -\subsection*{Instruction In Memory (37 bits)} +\subsection{Instruction In Memory (37 bits)} An instruction must be no wider than a memory word. The next section -explains the bits in greater detail. +explains the bits in greater detail. The {\tt Instruction Path} is +the path which the instruction itself will take in order to arrive at +the desired pump destination. The {\it Data/Token Path} is placed on +any packet inserted into the switch fabric as a result of executing +the instruction. {\tt\tiny \begin{bytefield}{49} - \bitheader[b]{0,10,11,17,18-26,36}\\ + \bitheader[b]{0,6,7,17,18-26,36}\\ \bitbox[r]{12}{} - \bitbox{11}{Instruction Register Address} - \bitbox{1}{K} - \bitbox{1}{L} + \bitbox{11}{Instruction Path} + \bitbox{1}{1} \bitbox{1}{Ti} \bitbox{1}{Di} \bitbox{1}{Dc} \bitbox{1}{Do} \bitbox{1}{To} + \bitbox{1}{Ig} \bitbox{1}{Rq} + \bitbox{11}{Data/Token Path} \bitbox{7}{Count} - \bitbox{11}{Data/Token Destination} \end{bytefield} } -\subsection*{Instruction Packet In Flight (49 bits)} +\subsection{Instruction Packet In Flight (49 bits)} -A {\it instruction packet} is an instruction in the instruction horn -(which may or may not be the same thing as the data horn), on its way -to some instruction register (Valve). +Note that Fleet simply copies the {\tt Instruction Path} field, bit +for bit, into the packet {\tt Path} field in order to ``dispatch'' an +instruction. {\tt\tiny \begin{bytefield}{49} - \bitheader[b]{0,10,11,17,18-25,37,47,48}\\ + \bitheader[b]{0,6,7,17,18-25,37,47,48}\\ \bitbox[r]{1}{0} - \bitbox{11}{Instruction Register Address} - \bitbox[lr]{11}{} - \bitbox{1}{K} - \bitbox{1}{L} + \bitbox{11}{Instruction Path} + \bitbox{11}{Instruction Path} + \bitbox{1}{1} \bitbox{1}{Ti} \bitbox{1}{Di} \bitbox{1}{Dc} \bitbox{1}{Do} \bitbox{1}{To} + \bitbox{1}{Ig} \bitbox{1}{Rq} + \bitbox{11}{Data/Token Path} \bitbox{7}{Count} - \bitbox{11}{Data/Token Destination} \end{bytefield} } - -\pagebreak - -\section*{Instruction Formats} - -Instructions can be grouped into two categories: {\it killing} -instructions, which are acted upon as soon as they leave the -instruction horn, and {\it executing} instructions, which pass through -the instruction queue before being acted upon. - -Blank fields below are reserved for future use and must be set to -zero. - -Note that the arbiter is requested whenever {\it any of the first - three bits is {\tt 1}}. If the arbiter is not requested, - - - \setlength{\bitwidth}{5mm} +\pagebreak -\subsection*{Killing Instructions} - -Kill (kill anything other than a Clog) +\section{Instruction Formats} +\begin{wrapfigure}{R}{2in} +\epsfig{file=locations,width=2in} +\vspace{-0.4in} +\end{wrapfigure} +Within the pump's instruction fifo, there are two points of particular +interest. The first point is the {\it insertion point}, which is the +point in the ring where new instructions enter from the switch fabric. +The other point of interest is the {\it execution point}, which is the +point in the ring where instructions other than {\tt kill} and {\tt + unclog} are performed. + +The programmer may assume that the insertion point is the immediate +successor of the execution point. + +\vspace{0.3cm} +{\bf Kill} +\addcontentsline{toc}{subsection}{Kill} {\tt \begin{bytefield}{26} @@ -181,38 +316,57 @@ Kill (kill anything other than a Clog) \bitbox{1}{1} \bitbox{1}{0} \bitbox{1}{1} - \bitbox{14}{} + \bitbox{14}{don't care} \bitbox{7}{Count} \end{bytefield}} - -UnClog (kill a Clog) +When a {\tt kill} instruction reaches the insertion point, it will +wait there for a non-{\tt clog} instruction to reach the execution +point. When this occurs, the instruction at the execution point is +retired and the count field of the {\tt kill} instruction is +decremented. If the {\tt kill} instruction's count was {\tt 0} before +decrementing, the {\tt kill} instruction is retired. The programmer +is assured that a kill instruction with a count of $n$ will kill $n+1$ +{\it consecutive} instructions. + +\vspace{0.3cm} +{\bf Clog} +\addcontentsline{toc}{subsection}{Clog} {\tt \begin{bytefield}{26} \bitheader[b]{0,20-25}\\ \bitbox{1}{0} \bitbox{1}{0} - \bitbox{1}{0} \bitbox{1}{1} \bitbox{1}{0} - \bitbox{21}{} + \bitbox{1}{0} + \bitbox{21}{don't care} \end{bytefield}} +When a {\tt clog} instruction reaches the execution point, it remains +there and no more instructions will be executed until an {\tt unclog} +is performed. -\subsection*{Executing Instructions} -Clog +\vspace{0.3cm} +{\bf UnClog} +\addcontentsline{toc}{subsection}{UnClog} {\tt \begin{bytefield}{26} \bitheader[b]{0,20-25}\\ \bitbox{1}{0} \bitbox{1}{0} - \bitbox{1}{1} \bitbox{1}{0} + \bitbox{1}{1} \bitbox{1}{0} - \bitbox{21}{} + \bitbox{21}{don't care} \end{bytefield}} +When an {\tt unclog} instruction reaches the insertion point, it will wait +there until a {\tt clog} instruction is at the execution point. When +this occurs, both instructions retire. -Literal (sign extended, implicit {\tt Rq=1}) +\vspace{0.3cm} +{\bf Literal} +\addcontentsline{toc}{subsection}{Literal} {\tt \begin{bytefield}{26} @@ -222,34 +376,19 @@ Literal (sign extended, implicit {\tt Rq=1}) \bitbox{17}{Literal} \bitbox{7}{Count} \end{bytefield}} - - -Normal - -{\tt -\begin{bytefield}{26} - \bitheader[b]{0,6,7,17-25}\\ - \bitbox{1}{1} - \bitbox{1}{Ti} - \bitbox{1}{Di} - \bitbox{1}{Dc} - \bitbox{1}{Do} - \bitbox{1}{To} - \bitbox{1}{Ig} - \bitbox{1}{Rq} - \bitbox{11}{Dest} - \bitbox{7}{Count} -\end{bytefield}} - - +When a literal instruction reaches the execution point, its {\tt + Literal} field is sign-extended to a full word length and captured +in the data latch. \pagebreak -\subsection*{Field Descriptions} +{\bf Normal Instruction} +\addcontentsline{toc}{subsection}{Normal Instruction} + {\tt \begin{bytefield}{26} \bitheader[b]{0,6,7,16-25}\\ - \bitbox{1}{0} + \bitbox{1}{1} \bitbox{1}{Ti} \bitbox{1}{Di} \bitbox{1}{Dc} @@ -262,27 +401,28 @@ Normal \end{bytefield} } +\footnoteremember{tidi}{{\tt Ti}=1,{\tt Di}=1 is invalid on input port.} +\footnoteremember{didc}{Note that {\tt Di}=0,{\tt Dc}=1 + is meaningless and therefore reserved for other + uses.} +\footnoteremember{todo}{ {\tt To}=1,{\tt Do}=1 have special meaning on an output port.} +\footnoteremember{toig}{{\tt To}=0,{\tt Ig}=1 is invalid} \begin{itemize} + \item [\tt Ti] ({\bf Token Input}) wait for a token and acknowledge it. - \item [\tt Ti] ({\bf Token Input}) wait for a token and accept - it\footnote{{\tt Ti}=1,{\tt Di}=1 is invalid on inbox.} - - \item [\tt Di] ({\bf Data Input}) wait for a datum and accept it. + \item [\tt Di] ({\bf Data Input}) wait for a datum and acknowledge it. + \footnoterecall{tidi}\footnoterecall{didc} \item [\tt Dc] ({\bf Data Capture}) capture (latch) the accepted - datum. This bit is ignored if the incoming packet is - a token. \footnote{ Note that {\tt Di}=0,{\tt Dc}=1 - is meaningless and therefore reserved for other - uses.} + datum in the data latch. This bit is ignored if the incoming packet is + a token. \footnoterecall{didc} - \item [\tt Do] ({\bf Data Output}) emit a datum. + \item [\tt Do] ({\bf Data Output}) emit a datum.\footnoterecall{todo} - \item [\tt To] ({\bf Token Output}) emit a token.\footnote{ {\tt To}=1,{\tt - Do}=1 have special meaning on an outbox.} + \item [\tt To] ({\bf Token Output}) emit a token.\footnoterecall{todo}\footnoterecall{toig} \item [\tt Ig] ({\bf Ignore {\tt To} Until Last Iteration}) ignore - the {\tt To} bit unless {\tt Count=0} \footnote{{\tt - To}=0,{\tt Ig}=1 is invalid} + the {\tt To} bit unless {\tt Count=0}\footnoterecall{toig} \item [\tt Rq] ({\bf ReQueue}) if set, instructions having nonzero count are ``Re-Queued'' rather than RePeated. See @@ -305,24 +445,111 @@ if (Count==0) { } } \end{verbatim} -Note how a ``standing'' instruction is encoded as {\tt Count=1111111} +Note how a ``standing'' instruction is encoded as {\tt Count=MAX\_COUNT} \item [\tt Dest] ({\bf Data/Token Destination}) - Normally, this field is copied into the address portion of any - outgoing packet ({\tt Do} on an outbox or {\tt To}). + Normally, this field is copied into the path portion of any + outgoing packet ({\tt Do} on an output port or {\tt To} on either). - However, in the special case of an outbox, if {\tt Do=1,To=1}, then + However, in the special case of an output port, if {\tt Do=1,To=1}, then the {\it most significant} {\tt 11} bits of the value in the {\it - data register} are used as a destination address instead. \footnote{This - functionality eliminates the need for any sort of ``{\tt Execute}'' - ship, and lets a {\tt Fifo} ship act as an extension of the - instruction queue in the pump.} + data register} are used as a destination path instead. + %\footnote{This + %functionality eliminates the need for any sort of ``{\tt Execute}'' + %ship, and lets a {\tt Fifo} ship act as an extension of the + %instruction queue in the pump.} \end{itemize} + \pagebreak +\section{Opcodes} + +Opcodes, as specified in \cite{am25}, are not yet implemented, but +should be. + +\section{Bypasses} + +Bypasses, as specified in \cite{am27}, are not yet implemented, but +should be. + +\section{Sequencing Guarantees} + +\addcontentsline{toc}{subsection}{Source Sequence Guarantee} +If two packets leave the same source and have the same path, they will +arrive at their common destination {\it in the same order in which + they left the source}. This assurance is referred to as the {\it + source sequence guarantee}. + +\addcontentsline{toc}{subsection}{Instruction Sequence Guarantee} +CodeBags in memory consist of an unordered set of {\it fibers}. A +fiber is an ordered set of instructions which all share the same +instruction path, and therefore will be executed by the same pump. +Any code which dispatches a codebag {\it must} ensure that the +instructions within a fiber reach their pump in the order in which +they appear in the fiber. This is known as the {\it instruction + sequence guarantee}. Typically the software dispatching a codebag +will take advantage of the source sequence guarantee in order to +fulfill the instruction sequence guarantee. + +\section{Code Bags} + +Code bags may overlap in memory. If the assembler determines that two +code bags share a set of fibers, it may feel free to re-order those +fibers in such a way that the code bags overlap, sharing memory words +for these fibers. The codebag descriptors for the two bags will then +have different addresses and/or sizes, but the ranges which they +encompass will overlap. + +\subsection{Dispatching Code Bags} + +A codebag is ``dispatched'' by causing its constituent words to +arrive at the data latch of some output port, and then causing that +output port's pump to execute a {\tt send} (not {\tt sendto}) +instruction, which is encoded as {\tt Do=1,To=1}. Because +instructions are encoded in such a way that their {\tt Instruction + Path} appears in the upper 11 bits, the {\tt send} mechanism will +correctly route the instruction to a pump will execute it. + +Currently the {\tt inCBD} port of the {\tt Memory} ship is ideally +suited for this purpose, although a similar effect can easily be +achieved using the {\tt BitFifo} ship in conjunction with the {\tt + Memory} ship. The only disadvantage to this arrangement is that the +{\tt out} port must execute {\tt send} a certain number of times, and +that number of times is determined by the {\tt size} field of the +codebag descriptor. Unfortunately this value is not known at compile +time, so it cannot be encoded as a repeat count on the instruction at +the {\tt out} port. The typical solution is to place a standing ({\tt + count=$\infty$}) {\tt send} instruction at the {\tt out} port of one +{\tt Memory} ship, and reserve that ship exclusively for dispatching +code bags. + +Note that instructions are encoded relative to a particular dispatch +output port. That is, the {\tt Instruction Path} field of the +instruction specifies a path to the desired execution pump -- but this +path is also specified with respect to some source. Therefore the +instruction can be correctly dispatched only from that particular +source. However, it is fairly simple to write software which can +``relocate'' a codebag by replacing the {\tt Instruction Path} fields +as needed. + +\subsection{Tokens and Data Items} + +When a data item is sent to a destination which expects a token, such +as an output port destination, the effect will be the same as if a token +had been sent to that destination, and the data value will be lost. + +When a token is sent to a destination which expects a data item, such +as a pump destination or an input port destination, the effect will be +the same as if a data item {\it having an undefined value} were sent +to that destination. In other words, the programmer has no completely +reliable mechanism for distinguishing between tokens and data items. + + \section{Future Directions} +\subsection{The Role of the Count Field} + Looking back on the design of the pump, several things are now apparent which were not initially. In particular, it seems that it could be useful to separate {\it loading the count register} from @@ -331,31 +558,104 @@ other types of instructions. This would have a few advantages: \begin{itemize} \item The size of the count field would not be a consideration in the ``instruction budget'' of normal execution instructions + \item It would be possible to have finitely-repeating, - infinitely-requeueing instructions (FIXME). + infinitely-requeueing instructions. Various implementation + details\footnote{In many implementations, the count field of an + instruction is destroyed as it counts down; another register and + mux would be needed to ``save'' this value so it could be + restored prior to being requeued} make it awkward to support + this unless the count register can be manipulated independently + of instruction execution. \end{itemize} -\begin{verbatim} -essence of the pump: +With this in mind, we can refactor the ``essence of the pump'' into +the following actions, each of which may or may not be performed by a +particular instruction: -ti -di +\begin{itemize} +\item await and acknowledge a token +\item await and acknowledge a datum +\item load the awaited datum into the path latch (top 11 bits) +\item load the awaited datum into the data latch +\item load a literal into the data latch +\item load a literal into the path latch +\item send the value in the data latch along the path in the path latch +\item send a token along the path in the path latch +\item set the count register to a literal value +\item decrement the count register +\item requeue this instruction if {\tt count>0} +\item requeue this instruction unconditionally +\item repeat this instruction if {\tt count>0} +\item repeat this instruction unconditionally +\end{itemize} -dc (data capture) -dl (data literal, take from instruction bits) +The crucial change here is the decoupling of the act of {\it loading + the count register} from the act of {\it loading the next + instruction}. It also separates the act of loading the ``path +latch'' from the act of actually performing the transmission. This +latter feature makes it possible to load the data and destination +latches from two distinct data items, allowing ships to create, +handle, and consume {\it packets} in the form of a pair of data items. + +At this point, the remaining decisions are simply a question of +instruction bit budgeting. Currently we have a separate instruction +form for literals, so that the literal can use bits which are not +otherwise relevant to literal instructions (for example, {\tt Di}). +Adding an additional instruction form for loading the count register +would have similar advantages. + + +\subsection{For Just a Little Bit More...} + +Compared to early revisions of the Fleet architecture, the docks of +FleetTwo are substantially more powerful. One might ask, why not add +additional power to the docks, perhaps up to the point of making them +tiny von Neumann machines in their own right? Indeed, doing so would +risk reliving the {\it Wheel of Reincarnation} +\cite{WheelOfReincarnation}. + +I believe, however, that there is a fundamental distinction to be +made between devices whose control flow {\it can branch} and those +whose control flow {\it cannot branch}, and that this distinction is +founded in the geometry of VLSI, and indeed, Euclidean space. +Ultimately, the instruction stream for any component of a processor +must exist in some physical incarnation -- some geometric extent. If +the control flow of a program is linear, with no branching, then the +instruction stream has an optimal spatial mapping which is +efficient: simply lay it out as a FIFO. + +By contrast, however, a program whose instruction stream has +unrestricted branching is logically a {\it tree} structure. +Unfortunately, however, general trees have no efficient mapping onto +two-dimensional surfaces. By this, I mean that in every possible +mapping, there exists some pair of {\it logically adjacent} +instructions whose {\it physical distance} is proportional to the size +of the entire program, rather than some constant factor irrespective +of the program size. Therefore I conclude that the spatial mapping of +a program with unrestricted branching is in an entirely different +class than that of programs with linear control flow, and I choose this +distinction as the basis for limiting the docks to linear control +flow sequences. A similar argument can be made for looping constructs +in which two or more loops may be nested within a single parent loop. + +With this distinction in mind, it would not be objectionable for +future incarnations of Fleet to have {\it predicated} instructions at +the docks. This might take the form of: -di -doi (data output, destination taken from instruction) -dod (data output, destination taken from Data register) +\begin{itemize} +\item Instructions to unconditionally set some ``state flags'' +\item Instructions to conditionally set some ``state flags'' based on the contents of the data latch +\item Bits which cause an instruction to be ignored or retired based on the state flags +\end{itemize} -set count -decrement count +I feel that this addition would not risk falling down the slippery +slope -- instruction streams with predicated instructions (but not +unrestricted branching or nested looping) preserve the crucial +property that they have an efficient spatial mapping. Any steps +beyond this, however, cross a fairly fundamental line. -requeue if count>0 -requeue unconditionally -repeat if count>0 -repeat unconditionally -\end{verbatim} +\addcontentsline{toc}{section}{Ship Descriptions}