doc/archman.tex

   1 \documentclass[10pt,oneside]{article}
   2 \reversemarginpar
   3 \usepackage{palatino}
   4 \usepackage{epsfig}
   5 \usepackage{verbatim}
   6 \usepackage{parskip}
   7 \usepackage{register}
   8 \usepackage{bytefield}
   9 \usepackage[colorlinks=true, pdfstartview=FitV, linkcolor=blue, citecolor=blue, urlcolor=blue]{hyperref}
  10 \renewcommand{\ttdefault}{cmtt}
  11 \title{The FleetTwo Architecture Manual\\{\normalsize A Programmer's View of Fleet}}
  12 \begin{document}
  13 \maketitle
  14
  15 \begin{thebibliography}{[GDG01]}
  16 \bibitem[IES02]{ies02}
  17   Sutherland, Ivan,
  18   \href{http://research.cs.berkeley.edu/class/fleet/docs/people/ivan.e.sutherland/ies02-FLEET-A.Once.Instruction.Computer.pdf}{\it UCB-IES02: Fleet -- A One Instruction Computer}
  19 \bibitem[IES14]{ies14}
  20   Sutherland, Ivan,
  21   \href{http://research.cs.berkeley.edu/class/fleet/docs/people/ivan.e.sutherland/ies14-Fleet.Definition.pdf}{\it UCB-IES14: The Fleet Definition}
  22 \bibitem[IES44]{ies44}
  23   Sutherland, Ivan,
  24   \href{http://research.cs.berkeley.edu/class/fleet/docs/people/ivan.e.sutherland/ies44-Fleet.Definition.pdf}{\it UCB-IES44: The Fleet Definition}
  25 \end{thebibliography}
  26
  27 \vspace{3cm}
  28 \begin{abstract}
  29 This manual attempts to detail all those aspects of Fleet which are
  30 visible to a software programmer or compiler backend.
  31
  32 This document assumes that the reader is broadly familiar with the
  33 Fleet architecture and the motivation behind it.  For a far better
  34 introduction to these topics, the reader is referred to \cite{ies02}
  35 \cite{ies14} and \cite{ies44}.  Please bear in mind that many of the
  36 details in these documents have been changed, but they still serve as
  37 the authoritative introduction to the architecture.  After reviewing
  38 these introductory papers, the reader is invited to digest this
  39 architecture manual for the latest information on the details
  40 necessary to write software and compilers for Fleet.
  41 \end{abstract}
  42
  43
  44 \pagebreak
  45 \section*{Introduction (from \cite{ies02})}
  46
  47 In \cite{ies02}, Sutherland writes:
  48
  49 {\it
  50 When computers were new, logic and storage were expensive and wires
  51 were relatively cheap.  Early computer designers avoided using logic
  52 wherever possible but were not greatly concerned with communication.
  53 They made design choices consistent with the costs of the day.  My
  54 favorite example is the jump instruction.  Early designers put jump
  55 instructions at the end of each block of code to avoid the expense of
  56 storing the address of the next block while executing the present one.
  57
  58 Today's chip fabrication methods invert the older cost balance.  In
  59 today's integrated circuits logic and memory are now almost free but
  60 communication costs dominate chip area, energy consumption and delay.
  61 In spite of these changes in the stuff from which we make computers,
  62 vestiges of the past remain in many of today's common microprocessor
  63 designs.  For example, jump instructions still appear at the end of
  64 each basic block of code in spite of the need to pre-fetch the next
  65 block while executing this one.  It seems better today to store a
  66 pointer to the next block and the length of the current block early in
  67 each block of code.
  68
  69 Instead of following the path of history, I'd like to listen carefully
  70 to what modern chip structures have to teach about how one might build
  71 a modern computer.  I see three major lessons.  First, simplicity can
  72 reduce cost.  Second, moving data will consume most of the time,
  73 energy and chip area of a modern computer. And third, the low cost of
  74 logic makes concurrency available if we can figure out how to use it.
  75
  76 \begin{itemize}
  77
  78 \item {\bf Simplicity}: The Fleet architecture seeks simplicity by treating
  79       all processing devices alike...
  80
  81 \item {\bf Communication}: The Fleet architecture seeks to control the
  82       cost of communication by putting it under direct programmer
  83       control.  Thus Fleet avoids instructions like {\sc ADD} or {\sc STORE} that
  84       include concealed communication to and from a register file...
  85
  86 \item {\bf Concurrency}: The Fleet architecture assumes concurrency
  87       nearly everywhere...
  88
  89 \item {\bf Asynchrony}: This provides enormous flexibility for
  90       implementers to improve the performance or reduce the cost of
  91       the processing devices in a Fleet system.  -- Synchronous
  92       implementations both of ships and of the switch fabric are
  93       possible provided they use validity or occupancy bits to achieve
  94       the arbitrary delays required at sources and destinations...
  95
  96 \end{itemize}
  97 }
  98
  99
 100 \pagebreak
 101 \section*{The Programmer's View of The Ship-Fabric Interface}
 102
 103 The role of the Fleet switch fabric is to carry {\it packets} from
 104 {\it sources} to {\it destinations}.
 105
 106 A packet consists of a {\it path} and a {\it payload}.  The path is
 107 some string of bits which tells the switch fabric how to route from
 108 the desired source to the desired destination.  The distinction
 109 between a {\it path} and a {\it destination} is extremely important!
 110 The payload is either a {\it token} or a {\it data item}.  If the
 111 payload is a data item, it includes one machine word of data.
 112
 113 The diagram below represents a {\it programmer's} conceptual view of
 114 the interface between ships and the switch fabric.  Actual
 115 implementation circuitry may differ substantially.  Sources and
 116 destinations which can send and recieve only tokens -- not data items
 117 -- are drawn as dashed lines.
 118
 119 \vspace{0.3cm}
 120 \begin{center}
 121 \epsfig{file=ports,width=4in}
 122 \end{center}
 123 \vspace{0.3cm}
 124
 125 The term {\it port} refers to each interface to the ship, the {\it
 126   valve} connecting it to the switch fabric, and the corresponding
 127 sources and destinations on the switch fabric.
 128
 129 Each valve consists of a {\it data latch}, which is as wide as a
 130 single machine word and a {\it pump}, which is a circular fifo of
 131 instruction-width latches.  The contents of the instruction fifo
 132 control the data latch, as future chapters will explain.
 133
 134 Note that the pump in each valve has a destination of its own, and
 135 that there is no buffering fifo guarding this destination.  Note that
 136 all other destinations are guarded by a buffering fifo; the size of
 137 this fifo is exposed to the software programmer so she can ensure that
 138 deadlock does not occur.
 139
 140
 141 \pagebreak
 142 \section*{Data Formats}
 143
 144 \subsection*{Packet Path (12 bits)}
 145
 146 These bits appear physically within the switch fabric, and have
 147 ``address bit timing.''  The {\tt T} bit is the ``tokenhood'' bit; if
 148 set, this packet represents a token and it does not cause the switch
 149 fabric data latches to fire.
 150
 151 {\tt\footnotesize
 152 \begin{bytefield}{49}
 153   \bitheader[b]{37,47,48}\\
 154   \bitbox{1}{T}
 155   \bitbox{11}{Path}
 156   \bitbox[l]{37}{}
 157 \end{bytefield}
 158 }
 159
 160 \subsection*{Data Word In Memory (37 bits)}
 161
 162 A word of data is 37 bits wide.
 163
 164 {\tt\footnotesize
 165 \begin{bytefield}{49}
 166   \bitheader[b]{0,36}\\
 167   \bitbox[r]{12}{}
 168   \bitbox{37}{Data Word}
 169 \end{bytefield}
 170 }
 171
 172 \subsection*{Packet In Flight (49 bits)}
 173
 174 {\tt\footnotesize
 175 \begin{bytefield}{49}
 176   \bitheader[b]{0,36,37,47,48}\\
 177   \bitbox{1}{T}
 178   \bitbox{11}{Packet Path}
 179   \bitbox{37}{Data Word}
 180 \end{bytefield}
 181 }
 182
 183 \subsection*{Instruction In Memory (37 bits)}
 184
 185 An instruction must be no wider than a memory word.  The next section
 186 explains the bits in greater detail.  The {\tt Instruction Path} is
 187 the path which the instruction itself will take in order to arrive at
 188 the desired pump destination.  The {\it Data/Token Path} is placed on
 189 any packet inserted into the switch fabric as a result of executing
 190 the instruction.
 191
 192 {\tt\tiny
 193 \begin{bytefield}{49}
 194   \bitheader[b]{0,10,11,17,18-26,36}\\
 195   \bitbox[r]{12}{}
 196   \bitbox{11}{Instruction Path}
 197   \bitbox{1}{K}
 198   \bitbox{1}{L}
 199   \bitbox{1}{Ti}
 200   \bitbox{1}{Di}
 201   \bitbox{1}{Dc}
 202   \bitbox{1}{Do}
 203   \bitbox{1}{To}
 204   \bitbox{1}{Rq}
 205   \bitbox{7}{Count}
 206   \bitbox{11}{Data/Token Path}
 207 \end{bytefield}
 208 }
 209
 210 \subsection*{Instruction Packet In Flight (49 bits)}
 211
 212 Note that Fleet simply copies the {\tt Instruction Path} field, bit
 213 for bit, into the packet {\tt Path} field in order to ``dispatch'' an
 214 instruction.
 215
 216 {\tt\tiny
 217 \begin{bytefield}{49}
 218   \bitheader[b]{0,10,11,17,18-25,37,47,48}\\
 219   \bitbox[r]{1}{0}
 220   \bitbox{11}{Instruction Path}
 221   \bitbox{11}{Instruction Path}
 222   \bitbox{1}{K}
 223   \bitbox{1}{L}
 224   \bitbox{1}{Ti}
 225   \bitbox{1}{Di}
 226   \bitbox{1}{Dc}
 227   \bitbox{1}{Do}
 228   \bitbox{1}{To}
 229   \bitbox{1}{Rq}
 230   \bitbox{7}{Count}
 231   \bitbox{11}{Data/Token Path}
 232 \end{bytefield}
 233 }
 234
 235
 236 \pagebreak
 237
 238 \section*{Instruction Formats}
 239
 240 Instructions can be grouped into two categories: {\it killing}
 241 instructions, which are acted upon as soon as they leave the
 242 instruction horn, and {\it executing} instructions, which pass through
 243 the instruction queue before being acted upon.
 244
 245 Blank fields below are reserved for future use and must be set to
 246 zero.
 247
 248 Note that the arbiter is requested whenever {\it any of the first
 249   three bits is {\tt 1}}.  If the arbiter is not requested,
 250
 251
 252
 253 \setlength{\bitwidth}{5mm}
 254
 255 \subsection*{Killing Instructions}
 256
 257 Kill (kill anything other than a Clog)
 258
 259 {\tt
 260 \begin{bytefield}{26}
 261   \bitheader[b]{0,6,7,20-25}\\
 262   \bitbox{1}{0}
 263   \bitbox{1}{0}
 264   \bitbox{1}{1}
 265   \bitbox{1}{0}
 266   \bitbox{1}{1}
 267   \bitbox{14}{}
 268   \bitbox{7}{Count}
 269 \end{bytefield}}
 270
 271 UnClog (kill a Clog)
 272
 273 {\tt
 274 \begin{bytefield}{26}
 275   \bitheader[b]{0,20-25}\\
 276   \bitbox{1}{0}
 277   \bitbox{1}{0}
 278   \bitbox{1}{0}
 279   \bitbox{1}{1}
 280   \bitbox{1}{0}
 281   \bitbox{21}{}
 282 \end{bytefield}}
 283
 284 \subsection*{Executing Instructions}
 285 Clog
 286
 287 {\tt
 288 \begin{bytefield}{26}
 289   \bitheader[b]{0,20-25}\\
 290   \bitbox{1}{0}
 291   \bitbox{1}{0}
 292   \bitbox{1}{1}
 293   \bitbox{1}{0}
 294   \bitbox{1}{0}
 295   \bitbox{21}{}
 296 \end{bytefield}}
 297
 298 Literal (sign extended, implicit {\tt Rq=1})
 299
 300 {\tt
 301 \begin{bytefield}{26}
 302   \bitheader[b]{0,6,7,23-25}\\
 303   \bitbox{1}{0}
 304   \bitbox{1}{1}
 305   \bitbox{17}{Literal}
 306   \bitbox{7}{Count}
 307 \end{bytefield}}
 308
 309
 310 Normal
 311
 312 {\tt
 313 \begin{bytefield}{26}
 314   \bitheader[b]{0,6,7,17-25}\\
 315   \bitbox{1}{1}
 316   \bitbox{1}{Ti}
 317   \bitbox{1}{Di}
 318   \bitbox{1}{Dc}
 319   \bitbox{1}{Do}
 320   \bitbox{1}{To}
 321   \bitbox{1}{Ig}
 322   \bitbox{1}{Rq}
 323   \bitbox{11}{Dest}
 324   \bitbox{7}{Count}
 325 \end{bytefield}}
 326
 327
 328
 329
 330 \pagebreak
 331 \subsection*{Field Descriptions}
 332 {\tt
 333 \begin{bytefield}{26}
 334   \bitheader[b]{0,6,7,16-25}\\
 335   \bitbox{1}{0}
 336   \bitbox{1}{Ti}
 337   \bitbox{1}{Di}
 338   \bitbox{1}{Dc}
 339   \bitbox{1}{Do}
 340   \bitbox{1}{To}
 341   \bitbox{1}{Ig}
 342   \bitbox{1}{Rq}
 343   \bitbox{11}{Dest}
 344   \bitbox{7}{Count}
 345 \end{bytefield}
 346 }
 347
 348 \begin{itemize}
 349
 350   \item [\tt Ti] ({\bf Token Input}) wait for a token and accept
 351                  it\footnote{{\tt Ti}=1,{\tt Di}=1 is invalid on inbox.}
 352
 353   \item [\tt Di] ({\bf Data Input}) wait for a datum and accept it.
 354
 355   \item [\tt Dc] ({\bf Data Capture}) capture (latch) the accepted
 356                  datum.  This bit is ignored if the incoming packet is
 357                  a token. \footnote{ Note that {\tt Di}=0,{\tt Dc}=1
 358                  is meaningless and therefore reserved for other
 359                  uses.}
 360
 361   \item [\tt Do] ({\bf Data Output}) emit a datum.
 362
 363   \item [\tt To] ({\bf Token Output}) emit a token.\footnote{ {\tt To}=1,{\tt
 364                  Do}=1 have special meaning on an outbox.}
 365
 366   \item [\tt Ig] ({\bf Ignore {\tt To} Until Last Iteration}) ignore
 367                  the {\tt To} bit unless {\tt Count=0} \footnote{{\tt
 368                  To}=0,{\tt Ig}=1 is invalid}
 369
 370   \item [\tt Rq] ({\bf ReQueue}) if set, instructions having nonzero
 371                  count are ``Re-Queued'' rather than RePeated.  See
 372                  {\tt Count} for more detail. \footnote{ An
 373                  instruction {\it in memory} may not have {\tt
 374                  Rq=1,Count=0} (use {\tt Rq=0,Count=0})}
 375
 376   \item  [\tt Count] ({\bf Count}) {\it After} executing:
 377 \begin{verbatim}
 378 if (Count==0) {
 379     discard this instruction
 380 } else {
 381     if Count < MAX_COUNT {
 382       decrement count
 383     }
 384     if Rq=1 or Literal {
 385        put this instruction back into the instruction fifo
 386     } else {
 387        execute this instruction again
 388     }
 389 }
 390 \end{verbatim}
 391 Note how a ``standing'' instruction is encoded as {\tt Count=1111111}
 392
 393 \item [\tt Dest] ({\bf Data/Token Destination})
 394    Normally, this field is copied into the address portion of any
 395    outgoing packet ({\tt Do} on an outbox or {\tt To}).
 396
 397    However, in the special case of an outbox, if {\tt Do=1,To=1}, then
 398    the {\it most significant} {\tt 11} bits of the value in the {\it
 399    data register} are used as a destination address instead.  \footnote{This
 400    functionality eliminates the need for any sort of ``{\tt Execute}''
 401    ship, and lets a {\tt Fifo} ship act as an extension of the
 402    instruction queue in the pump.}
 403
 404 \end{itemize}
 405
 406 \pagebreak
 407 \section*{Fleet Assembly Language}
 408
 409 The Fleet Assembly language is designed to be a human-readable
 410 depiction of the bits which comprise a Fleet program.  The formal
 411 syntax is given below:
 412
 413 \verbatiminput{fleet.g}
 414
 415 \pagebreak
 416 \section*{Java APIs}
 417 \subsection*{Representing Code}
 418 \subsection*{Representing Ships}
 419
 420 \pagebreak
 421 \section*{Misc}
 422
 423 \begin{verbatim}
 424 - sequencing guarantees
 425 - codebag format
 426 - behavior when token arrives at data port, vice versa
 427 - overlapping codebags
 428 \end{verbatim}
 429
 430 \pagebreak
 431 \section*{Future Directions}
 432
 433 Looking back on the design of the pump, several things are now
 434 apparent which were not initially.  In particular, it seems that it
 435 could be useful to separate {\it loading the count register} from
 436 other types of instructions.  This would have a few advantages:
 437
 438 \begin{itemize}
 439 \item The size of the count field would not be a consideration in the
 440       ``instruction budget'' of normal execution instructions
 441 \item It would be possible to have finitely-repeating,
 442       infinitely-requeueing instructions (FIXME).
 443 \end{itemize}
 444
 445 \begin{verbatim}
 446 essence of the pump:
 447
 448 ti
 449 di
 450
 451 dc (data capture)
 452 dl (data literal, take from instruction bits)
 453
 454 di
 455 doi (data output, destination taken from instruction)
 456 dod (data output, destination taken from Data register)
 457
 458 set count
 459 decrement count
 460
 461 requeue if count>0
 462 requeue unconditionally
 463 repeat  if count>0
 464 repeat  unconditionally
 465
 466 \end{verbatim}
 467
 468