X-Git-Url: http://git.megacz.com/?a=blobdiff_plain;f=doc%2Farchman.tex;h=7019dd8f94f1ce4d5aeef1b8dc0e9b0614431249;hb=9482376b15db2da4bcb09a452d007db460b70c01;hp=9617822f98636978bdb284121ad96ccb1531dcf7;hpb=7a9896ade2db76ca658f10fc42020d0fbacbc4e2;p=fleet.git diff --git a/doc/archman.tex b/doc/archman.tex index 9617822..7019dd8 100644 --- a/doc/archman.tex +++ b/doc/archman.tex @@ -1,8 +1,18 @@ \documentclass[10pt,oneside]{article} \reversemarginpar +\usepackage[titles]{tocloft} \usepackage{emp} +\usepackage{amsmath} \DeclareGraphicsRule{*}{mps}{*}{} +\newcommand{\footnoteremember}[2]{ + \footnote{#2} + \newcounter{#1} + \setcounter{#1}{\value{footnote}} +} \newcommand{\footnoterecall}[1]{ + \footnotemark[\value{#1}] +} + \ifx\pdftexversion\undefined \usepackage[dvips]{graphicx} \DeclareGraphicsExtensions{.eps}\else @@ -19,7 +29,7 @@ \usepackage{bytefield} \usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref} \renewcommand{\ttdefault}{cmtt} -\title{The FleetTwo Architecture Manual\\{\normalsize A Programmer's View of Fleet}} +\title{The FleetTwo Architecture Manual\\{\normalsize A Programmer's View of FleetTwo}} \begin{document} \maketitle @@ -67,8 +77,21 @@ necessary to write software and compilers for Fleet. \end{abstract} +%\vfill +%\fbox{\parbox{5in}{\footnotesize +%\begin{center} +%The software described in this memo may be obtained using +%\href{http://www.darcs.net/}{\tt darcs}, with the command: +% +%{\tt darcs get http://research.cs.berkeley.edu/class/fleet/repos/fleet/} +%\end{center} +%}} + \pagebreak -\section*{Introduction (from \cite{ies02})} +\tableofcontents + +\pagebreak +\section{Introduction (from \cite{ies02})} During the past two and a half years, the Fleet architecture has gone through massive changes and improvements. Nonetheless, the @@ -127,7 +150,7 @@ logic makes concurrency available if we can figure out how to use it. \pagebreak -\section*{The Programmer's View of The Ship-Fabric Interface} +\section{The Programmer's View of The Ship-Fabric Interface} The role of the Fleet switch fabric is to carry {\it packets} from {\it sources} to {\it destinations}. @@ -142,7 +165,7 @@ payload is a data item, it includes one machine word of data. The diagram below represents a {\it programmer's} conceptual view of the interface between ships and the switch fabric. Actual implementation circuitry may differ substantially. Sources and -destinations which can send and recieve only tokens -- not data items +destinations which can send and receive only tokens -- not data items -- are drawn as dashed lines. \vspace{0.3cm} @@ -152,25 +175,25 @@ destinations which can send and recieve only tokens -- not data items \vspace{0.3cm} The term {\it port} refers to an interface to the ship, the {\it - valve} connecting it to the switch fabric, and the corresponding + dock} connecting it to the switch fabric, and the corresponding sources and destinations on the switch fabric. -Each valve consists of a {\it data latch}, which is as wide as a +Each dock consists of a {\it data latch}, which is as wide as a single machine word and a {\it pump}, which is a circular fifo of instruction-width latches. The values in the instruction fifo control the data latch, as future chapters will explain. -Note that the pump in each valve has a destination of its own, and +Note that the pump in each dock has a destination of its own, and that there is no buffering fifo guarding this destination. Note that all other destinations are guarded by a buffering fifo; the size of -this fifo is exposed to the software programmer so she can ensure that -deadlock does not occur. +this fifo is exposed to the software programmer so she can avoid +deadlock. \pagebreak -\section*{Data Formats} +\section{Data Formats} -\subsection*{Path (12 bits)} +\subsection{Path (12 bits)} These bits appear physically within the switch fabric. \footnote{In the Sun Labs implementation, these bits have ``address bit @@ -188,7 +211,7 @@ packet represents a token\footnote{In the Sun Labs \end{bytefield} } -\subsection*{Data Word In Memory (37 bits)} +\subsection{Data Word In Memory (37 bits)} A word of data is 37 bits wide. @@ -200,7 +223,7 @@ A word of data is 37 bits wide. \end{bytefield} } -\subsection*{Packet In Flight (49 bits)} +\subsection{Packet In Flight (49 bits)} {\tt\footnotesize \begin{bytefield}{49} @@ -211,7 +234,7 @@ A word of data is 37 bits wide. \end{bytefield} } -\subsection*{Instruction In Memory (37 bits)} +\subsection{Instruction In Memory (37 bits)} An instruction must be no wider than a memory word. The next section explains the bits in greater detail. The {\tt Instruction Path} is @@ -222,7 +245,7 @@ the instruction. {\tt\tiny \begin{bytefield}{49} - \bitheader[b]{0,10,11,17,18-26,36}\\ + \bitheader[b]{0,6,7,17,18-26,36}\\ \bitbox[r]{12}{} \bitbox{11}{Instruction Path} \bitbox{1}{1} @@ -238,7 +261,7 @@ the instruction. \end{bytefield} } -\subsection*{Instruction Packet In Flight (49 bits)} +\subsection{Instruction Packet In Flight (49 bits)} Note that Fleet simply copies the {\tt Instruction Path} field, bit for bit, into the packet {\tt Path} field in order to ``dispatch'' an @@ -246,7 +269,7 @@ instruction. {\tt\tiny \begin{bytefield}{49} - \bitheader[b]{0,10,11,17,18-25,37,47,48}\\ + \bitheader[b]{0,6,7,17,18-25,37,47,48}\\ \bitbox[r]{1}{0} \bitbox{11}{Instruction Path} \bitbox{11}{Instruction Path} @@ -266,7 +289,7 @@ instruction. \setlength{\bitwidth}{5mm} \pagebreak -\section*{Instruction Formats} +\section{Instruction Formats} \begin{wrapfigure}{R}{2in} \epsfig{file=locations,width=2in} \vspace{-0.4in} @@ -283,6 +306,7 @@ successor of the execution point. \vspace{0.3cm} {\bf Kill} +\addcontentsline{toc}{subsection}{Kill} {\tt \begin{bytefield}{26} @@ -292,7 +316,7 @@ successor of the execution point. \bitbox{1}{1} \bitbox{1}{0} \bitbox{1}{1} - \bitbox{14}{} + \bitbox{14}{don't care} \bitbox{7}{Count} \end{bytefield}} When a {\tt kill} instruction reaches the insertion point, it will @@ -301,11 +325,12 @@ point. When this occurs, the instruction at the execution point is retired and the count field of the {\tt kill} instruction is decremented. If the {\tt kill} instruction's count was {\tt 0} before decrementing, the {\tt kill} instruction is retired. The programmer -is assured that a kill instruction with a count of $n$ will kill $n$ +is assured that a kill instruction with a count of $n$ will kill $n+1$ {\it consecutive} instructions. \vspace{0.3cm} {\bf Clog} +\addcontentsline{toc}{subsection}{Clog} {\tt \begin{bytefield}{26} @@ -315,13 +340,15 @@ is assured that a kill instruction with a count of $n$ will kill $n$ \bitbox{1}{1} \bitbox{1}{0} \bitbox{1}{0} - \bitbox{21}{} + \bitbox{21}{don't care} \end{bytefield}} -When a {\tt clog} instruction reaches the execution point, no more -instructions will be executed until an {\tt unclog} is performed. +When a {\tt clog} instruction reaches the execution point, it remains +there and no more instructions will be executed until an {\tt unclog} +is performed. \vspace{0.3cm} {\bf UnClog} +\addcontentsline{toc}{subsection}{UnClog} {\tt \begin{bytefield}{26} @@ -331,7 +358,7 @@ instructions will be executed until an {\tt unclog} is performed. \bitbox{1}{0} \bitbox{1}{1} \bitbox{1}{0} - \bitbox{21}{} + \bitbox{21}{don't care} \end{bytefield}} When an {\tt unclog} instruction reaches the insertion point, it will wait there until a {\tt clog} instruction is at the execution point. When @@ -339,6 +366,7 @@ this occurs, both instructions retire. \vspace{0.3cm} {\bf Literal} +\addcontentsline{toc}{subsection}{Literal} {\tt \begin{bytefield}{26} @@ -355,6 +383,7 @@ in the data latch. \pagebreak {\bf Normal Instruction} +\addcontentsline{toc}{subsection}{Normal Instruction} {\tt \begin{bytefield}{26} @@ -372,27 +401,28 @@ in the data latch. \end{bytefield} } +\footnoteremember{tidi}{{\tt Ti}=1,{\tt Di}=1 is invalid on input port.} +\footnoteremember{didc}{Note that {\tt Di}=0,{\tt Dc}=1 + is meaningless and therefore reserved for other + uses.} +\footnoteremember{todo}{ {\tt To}=1,{\tt Do}=1 have special meaning on an output port.} +\footnoteremember{toig}{{\tt To}=0,{\tt Ig}=1 is invalid} \begin{itemize} - - \item [\tt Ti] ({\bf Token Input}) wait for a token and acknowledge - it\footnote{{\tt Ti}=1,{\tt Di}=1 is invalid on input port.} + \item [\tt Ti] ({\bf Token Input}) wait for a token and acknowledge it. \item [\tt Di] ({\bf Data Input}) wait for a datum and acknowledge it. + \footnoterecall{tidi}\footnoterecall{didc} \item [\tt Dc] ({\bf Data Capture}) capture (latch) the accepted datum in the data latch. This bit is ignored if the incoming packet is - a token. \footnote{ Note that {\tt Di}=0,{\tt Dc}=1 - is meaningless and therefore reserved for other - uses.} + a token. \footnoterecall{didc} - \item [\tt Do] ({\bf Data Output}) emit a datum. + \item [\tt Do] ({\bf Data Output}) emit a datum.\footnoterecall{todo} - \item [\tt To] ({\bf Token Output}) emit a token.\footnote{ {\tt To}=1,{\tt - Do}=1 have special meaning on an output port.} + \item [\tt To] ({\bf Token Output}) emit a token.\footnoterecall{todo}\footnoterecall{toig} \item [\tt Ig] ({\bf Ignore {\tt To} Until Last Iteration}) ignore - the {\tt To} bit unless {\tt Count=0} \footnote{{\tt - To}=0,{\tt Ig}=1 is invalid} + the {\tt To} bit unless {\tt Count=0}\footnoterecall{toig} \item [\tt Rq] ({\bf ReQueue}) if set, instructions having nonzero count are ``Re-Queued'' rather than RePeated. See @@ -418,12 +448,12 @@ if (Count==0) { Note how a ``standing'' instruction is encoded as {\tt Count=MAX\_COUNT} \item [\tt Dest] ({\bf Data/Token Destination}) - Normally, this field is copied into the address portion of any + Normally, this field is copied into the path portion of any outgoing packet ({\tt Do} on an output port or {\tt To} on either). However, in the special case of an output port, if {\tt Do=1,To=1}, then the {\it most significant} {\tt 11} bits of the value in the {\it - data register} are used as a destination address instead. + data register} are used as a destination path instead. %\footnote{This %functionality eliminates the need for any sort of ``{\tt Execute}'' %ship, and lets a {\tt Fifo} ship act as an extension of the @@ -433,23 +463,25 @@ Note how a ``standing'' instruction is encoded as {\tt Count=MAX\_COUNT} \pagebreak -\section*{Opcodes} +\section{Opcodes} Opcodes, as specified in \cite{am25}, are not yet implemented, but should be. -\section*{Bypasses} +\section{Bypasses} Bypasses, as specified in \cite{am27}, are not yet implemented, but should be. -\section*{Sequencing Guarantees} +\section{Sequencing Guarantees} +\addcontentsline{toc}{subsection}{Source Sequence Guarantee} If two packets leave the same source and have the same path, they will arrive at their common destination {\it in the same order in which they left the source}. This assurance is referred to as the {\it source sequence guarantee}. +\addcontentsline{toc}{subsection}{Instruction Sequence Guarantee} CodeBags in memory consist of an unordered set of {\it fibers}. A fiber is an ordered set of instructions which all share the same instruction path, and therefore will be executed by the same pump. @@ -460,7 +492,7 @@ they appear in the fiber. This is known as the {\it instruction will take advantage of the source sequence guarantee in order to fulfill the instruction sequence guarantee. -\section*{Code Bags} +\section{Code Bags} Code bags may overlap in memory. If the assembler determines that two code bags share a set of fibers, it may feel free to re-order those @@ -469,15 +501,15 @@ for these fibers. The codebag descriptors for the two bags will then have different addresses and/or sizes, but the ranges which they encompass will overlap. -\subsection*{Dispatching Code Bags} +\subsection{Dispatching Code Bags} -A codebag is ``dispatched'' by causing its constitutent words to +A codebag is ``dispatched'' by causing its constituent words to arrive at the data latch of some output port, and then causing that output port's pump to execute a {\tt send} (not {\tt sendto}) instruction, which is encoded as {\tt Do=1,To=1}. Because instructions are encoded in such a way that their {\tt Instruction Path} appears in the upper 11 bits, the {\tt send} mechanism will -correctly route the instruction to the pump which ought to execute it. +correctly route the instruction to a pump will execute it. Currently the {\tt inCBD} port of the {\tt Memory} ship is ideally suited for this purpose, although a similar effect can easily be @@ -496,16 +528,16 @@ Note that instructions are encoded relative to a particular dispatch output port. That is, the {\tt Instruction Path} field of the instruction specifies a path to the desired execution pump -- but this path is also specified with respect to some source. Therefore the -instruction can only be correctly dispatched from that particular -source. However, it is a fairly simple matter two write a software -program which can ``relocate'' a codebag by replacing the {\tt - Instruction Path} fields as needed. +instruction can be correctly dispatched only from that particular +source. However, it is fairly simple to write software which can +``relocate'' a codebag by replacing the {\tt Instruction Path} fields +as needed. -\subsection*{Tokens and Data Items} +\subsection{Tokens and Data Items} When a data item is sent to a destination which expects a token, such as an output port destination, the effect will be the same as if a token -had been sent to that destination. +had been sent to that destination, and the data value will be lost. When a token is sent to a destination which expects a data item, such as a pump destination or an input port destination, the effect will be @@ -514,9 +546,9 @@ to that destination. In other words, the programmer has no completely reliable mechanism for distinguishing between tokens and data items. -\section*{Future Directions} +\section{Future Directions} -\subsection*{The Role of the Count Field} +\subsection{The Role of the Count Field} Looking back on the design of the pump, several things are now apparent which were not initially. In particular, it seems that it @@ -529,7 +561,7 @@ other types of instructions. This would have a few advantages: \item It would be possible to have finitely-repeating, infinitely-requeueing instructions. Various implementation - details\footnote{in many implementations, the count field of an + details\footnote{In many implementations, the count field of an instruction is destroyed as it counts down; another register and mux would be needed to ``save'' this value so it could be restored prior to being requeued} make it awkward to support @@ -544,12 +576,12 @@ particular instruction: \begin{itemize} \item await and acknowledge a token \item await and acknowledge a datum +\item load the awaited datum into the path latch (top 11 bits) \item load the awaited datum into the data latch -\item load the a literal into the data latch -\item load a value indicated by the instruction into the path latch -\item load the top 11 bits of the data latch into the path latch -\item treat the values in the path latch and data latch as a packet and transmit it -\item treat the value in the path latch as a token and transmit it +\item load a literal into the data latch +\item load a literal into the path latch +\item send the value in the data latch along the path in the path latch +\item send a token along the path in the path latch \item set the count register to a literal value \item decrement the count register \item requeue this instruction if {\tt count>0} @@ -566,19 +598,19 @@ latter feature makes it possible to load the data and destination latches from two distinct data items, allowing ships to create, handle, and consume {\it packets} in the form of a pair of data items. -At this point, it boils down to a question of instruction bit -budgeting. Currently we have a separate instruction form for -literals, so that the literal can use bits which are not otherwise -relevant to literal instructions (for example, {\tt Di}). Adding an -additional instruction form for loading the count register would have -similar advantages. +At this point, the remaining decisions are simply a question of +instruction bit budgeting. Currently we have a separate instruction +form for literals, so that the literal can use bits which are not +otherwise relevant to literal instructions (for example, {\tt Di}). +Adding an additional instruction form for loading the count register +would have similar advantages. -\subsection*{For Just a Little Bit More...} +\subsection{For Just a Little Bit More...} -Compared to early revisions of the Fleet architecture, the valves of +Compared to early revisions of the Fleet architecture, the docks of FleetTwo are substantially more powerful. One might ask, why not add -additional power to the valves, perhaps up to the point of making them +additional power to the docks, perhaps up to the point of making them tiny von Neumann machines in their own right? Indeed, doing so would risk reliving the {\it Wheel of Reincarnation} \cite{WheelOfReincarnation}. @@ -590,26 +622,26 @@ founded in the geometry of VLSI, and indeed, Euclidean space. Ultimately, the instruction stream for any component of a processor must exist in some physical incarnation -- some geometric extent. If the control flow of a program is linear, with no branching, then the -instruction stream has an optimal spatial mapping which is quite -efficient: it is simply laid out in FIFO fashion. +instruction stream has an optimal spatial mapping which is +efficient: simply lay it out as a FIFO. By contrast, however, a program whose instruction stream has -unrestricted branching is logically a {\it tree} structure. Quite -unfortunately, general trees have no efficient mapping onto +unrestricted branching is logically a {\it tree} structure. +Unfortunately, however, general trees have no efficient mapping onto two-dimensional surfaces. By this, I mean that in every possible mapping, there exists some pair of {\it logically adjacent} instructions whose {\it physical distance} is proportional to the size of the entire program, rather than some constant factor irrespective of the program size. Therefore I conclude that the spatial mapping of a program with unrestricted branching is in an entirely different -class than that of programs with linear control flow. I choose this -distinction as the basis for limiting our valves to linear control +class than that of programs with linear control flow, and I choose this +distinction as the basis for limiting the docks to linear control flow sequences. A similar argument can be made for looping constructs in which two or more loops may be nested within a single parent loop. With this distinction in mind, it would not be objectionable for future incarnations of Fleet to have {\it predicated} instructions at -the valves. This might take the form of: +the docks. This might take the form of: \begin{itemize} \item Instructions to unconditionally set some ``state flags'' @@ -619,9 +651,11 @@ the valves. This might take the form of: I feel that this addition would not risk falling down the slippery slope -- instruction streams with predicated instructions (but not -unrestricted branching) preserve the crucial property that they have -an efficient spatial mapping. Any steps beyond this, however, cross a -fairly fundamental line. +unrestricted branching or nested looping) preserve the crucial +property that they have an efficient spatial mapping. Any steps +beyond this, however, cross a fairly fundamental line. + +\addcontentsline{toc}{section}{Ship Descriptions}