ship: Memory

== Ports ===========================================================
data  in:    inCBD
data  in:    inAddrRead
data  in:    inAddrWrite
data  in:    inDataWrite

data  out:   out

== TeX ==============================================================

The {\tt Memory} ship provides an interface to a storage space that
can be read from and written to.  This storage space might be a fast
on-chip cache, off-chip DRAM, or perhaps even a disk drive.

Generally, distinct {\tt Memory} ships do not access the same backing
storage, although this is not strictly prohibited.

Each {\tt Memory} ship may have multiple {\it interfaces}, numbered
starting with {\tt 0}.  Each interface may have any subset of the
following docks: {\tt inCBD}, {\tt inAddrRead}, {\tt inAddrWrite},
{\tt inDataWrite}, and {\tt out}.  If {\tt inCBD} or {\tt inAddrRead}
is present on an interface, then {\tt out} must be present as well.
If {\tt inAddrWrite} is present then {\tt inDataWrite} must be present
as well.

Each interface serializes the operations presented to it; this means
that an interface with both read and write capabilities will not be
able to read and write concurrently.  Instead, a {\tt Memory} ship
with the ability to read and write concurrently should have two
interfaces, one which is read-only and one which is write-only.

There may be multiple {\tt Memory} ships which interface to the same
physical storage space.  An implementation of Fleet must provide
additional documentation to the programmer indicating which {\tt
Memory} ships correspond to which storage spaces.  A single {\tt
Memory} ship may also access a ``virtual storage space'' formed by
concatenating multiple physical storage spaces.

\subsection*{Code Bag Fetch}

When a word appears at the {\tt inCBD} port, it is treated as a {\it
code bag descriptor}, as shown below:

\begin{center}
\setlength{\bitwidth}{3mm}
{\tt
\begin{bytefield}{37}
  \bitheader[b]{36,6,5,0}\\
  \bitbox{31}{Address} 
  \bitbox{6}{size} 
\end{bytefield}
}
\end{center}

The descriptor is then processed as a memory read with {\tt
inAddrRead=Address}, {\tt inStride=1}, and {\tt inCount=size}; that
is, {\tt size} consecutive words are read starting at {\tt Address}.
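
As an illustration of the layout above (a sketch, not a normative
definition; the symbol $w$ for the descriptor word is introduced here
only for the example), the two fields can be recovered arithmetically:

\[
  \mbox{\tt Address} = \left\lfloor w / 2^{6} \right\rfloor ,
  \qquad
  \mbox{\tt size} = w \bmod 2^{6} .
\]

For instance, $w = 323 = 5 \cdot 2^{6} + 3$ gives {\tt Address}$=5$ and
{\tt size}$=3$, so the fetch reads the words at addresses 5, 6, and 7.
This corresponds to the shift by 6 and the mask with {\tt 0x3f} in the
Fleeterpreter section.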

\subsection*{Reading}

When a word is delivered to {\tt inAddrRead}, the word residing in
memory at that address is provided at {\tt out}.  The {\tt c-flag} at
the {\tt out} port is set to zero.

\subsection*{Writing}

When a word has been delivered to each of {\tt inAddrWrite} and {\tt
inDataWrite}, the word at {\tt inDataWrite} is written to the address
specified by {\tt inAddrWrite}.  Once the word is successfully
committed to memory, the value {\tt inAddrWrite+inStride} is provided
at {\tt out} (that is, the address of the next word to be written).
The {\tt c-flag} at the {\tt out} port is set to one.

\subsection*{To Do}

Stride and count are not implemented.

We need a way to do an ``unordered fetch'' -- a way to tell the memory
unit to retrieve some block of words in any order it likes.  This can
considerably accelerate fetches when the first word of the region is
not cached, but other parts are cached.  This can also be used for
dispatching codebags efficiently -- but how will we make sure that
instructions destined for a given pump are dispatched in the correct
order (source sequence guarantee)?

A more advanced form would be ``unordered fetch of ordered records''
-- the ability to specify a record size (in words), the offset of the
first record, and the number of records to be fetched.  The memory
unit would then fetch the records in any order it likes, but would be
sure to return the words comprising a record in the order in which
they appear in memory.  This feature could be used to solve the source
sequence guarantee problem mentioned in the previous paragraph.
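
One way to make this ordering constraint precise (a sketch only; the
symbols $b$, $r$, and $n$ are hypothetical names for the offset of the
first record, the record size in words, and the number of records, and
records are assumed to be stored contiguously): word $j$ of record $k$
would reside at

\[
  a(k,j) = b + k\,r + j , \qquad 0 \le k < n , \quad 0 \le j < r .
\]

The memory unit could then visit the records $k$ in any order, but for
each fixed $k$ it would return $a(k,0), a(k,1), \ldots, a(k,r-1)$ in
that order.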

== Fleeterpreter ====================================================
    private long[] mem = new long[0];
    public long readMem(int addr) { return addr >= mem.length ? 0 : mem[addr]; }
    public void writeMem(int addr, long val) {
        if (addr >= mem.length) {
            long[] newmem = new long[addr * 2 + 1];
            System.arraycopy(mem, 0, newmem, 0, mem.length);
            mem = newmem;
        }
        mem[addr] = val;
    }

    private long stride = 0;
    private long count = 0;
    private long addr = 0;
    private boolean writing = false;

    private Queue<Long> toDispatch = new LinkedList<Long>();
    public void service() {

        if (toDispatch.size() > 0) {
            //if (!box_out.readyForDataFromShip()) return;
            //box_out.addDataFromShip(toDispatch.remove());
            getInterpreter().dispatch(getInterpreter().readInstruction(toDispatch.remove(), getDock("out")));
        }

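        if (box_inCBD.dataReadyForShip() && box_out.readyForDataFromShip()) {
            // code bag descriptor: upper bits are the address, low 6 bits the size
            long val  = box_inCBD.removeDataForShip();
            long base = val >> 6;     // named "base" to avoid shadowing the addr field
            long size = val & 0x3f;
            for(int i=0; i<size; i++)
              toDispatch.add(readMem((int)(base+i)));
        }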
        if (count > 0) {
            if (writing) {
              if (box_inDataWrite.dataReadyForShip() && box_out.readyForDataFromShip()) {
                 writeMem((int)addr, box_inDataWrite.removeDataForShip());
                 box_out.addDataFromShip(0);
                 count--;
                 addr += stride;
              }
            } else {
              if (box_out.readyForDataFromShip()) {
                 box_out.addDataFromShip(readMem((int)addr));
                 count--;
                 addr += stride;
              }
            }

        } else if (box_inAddrRead.dataReadyForShip()) {
            addr = box_inAddrRead.removeDataForShip();
            stride = 0;
            count = 1;
            writing = false;

        } else if (box_inAddrWrite.dataReadyForShip()) {
            addr = box_inAddrWrite.removeDataForShip();
            stride = 0;
            count = 1;
            writing = true;
        }
    }

== FleetSim ==============================================================

== FPGA ==============================================================

  wire [(`WORDWIDTH-1):0] out1;
  wire [(`WORDWIDTH-1):0] out2;

  reg [(`CODEBAG_SIZE_BITS-1):0]   counter;
  reg [(`BRAM_ADDR_WIDTH-1):0]     cursor;
  initial cursor = 0;
  initial counter = 0;

  reg                              write_flag;
  reg                              out_w;
  reg                              dispatching_cbd;
  initial write_flag = 0;
  initial dispatching_cbd = 0;

  wire [(`BRAM_ADDR_WIDTH-1):0]   addr1;
  assign addr1 = write_flag ? inAddrWrite_d[(`WORDWIDTH-1):0] : inAddrRead_d[(`WORDWIDTH-1):0];
  bram14 mybram(clk, rst, write_flag, addr1, cursor, inDataWrite_d, out1, out2);

  assign out_d_ = { out_w , (dispatching_cbd ? out2 : out1) };

  always @(posedge clk) begin

    write_flag <= 0;

    if (!rst) begin
      `reset
      cursor  <= 0;
      counter <= 0;
      write_flag <= 0;
      dispatching_cbd <= 0;
    end else begin
      `flush
      `cleanup
      write_flag <= 0;

      // assumes we never want a zero-length codebag
      if (`inCBD_full && `out_empty) begin
        if (!dispatching_cbd) begin
          cursor          <= inCBD_d[(`WORDWIDTH-1):(`CODEBAG_SIZE_BITS)];
          counter         <= 0;
          dispatching_cbd <= 1;
        end
        `fill_out
        out_w <= 0;
      end else if (`inCBD_full && `out_draining) begin
        if (counter != inCBD_d[(`CODEBAG_SIZE_BITS-1):0]) begin
          cursor  <= cursor + 1;
          counter <= counter + 1;
        end else begin
          `drain_inCBD
          counter <= 0;
          dispatching_cbd <= 0;
        end
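      end else if (!dispatching_cbd && `out_empty && `inAddrRead_full) begin
        `drain_inAddrRead
        `fill_out
        // c-flag is zero for reads; without this, out_w could stay at 1
        // after an earlier write
        out_w <= 0;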

      end else if (!dispatching_cbd && `out_empty && `inAddrWrite_full && `inDataWrite_full) begin
        // timing note: it's okay to drain here because *_d will still
        // be valid on the *very next* cycle, which is all we care about
        `drain_inAddrWrite
        `drain_inDataWrite
        `fill_out
        write_flag      <= 1;
        out_w           <= 1;
      end
    end
  end
    

== Test ==============================================================
// FIXME: test c-flag at out dock
// FIXME: rename to inCBD0, inAddrWrite0, etc

// expected output
#expect 12
#expect 13
#expect 14

// ships required in order to run this code
#ship debug          : Debug
#ship memory         : Memory

// instructions not in any codebag are part of the "root codebag"
// which is dispatched when the code is loaded

memory.out:
  set ilc=*;  collect packet, send;

memory.inCBD:
  set word= BOB;
  deliver;

BOB: {
  debug.in:
    set word= 12; deliver;
    set word= 13; deliver;
    set word= 14; deliver;
}


== Constants ========================================================

== Contributors =========================================================
Adam Megacz <megacz@cs.berkeley.edu>