+v v v v v v v
+\end{verbatim}}
+*************
+\item {\tt -falign-functions}
+Normally a function's location in memory has no effect on its execution
+speed. However, the NestedVM binary translator splits the .text segment on
+power-of-two boundaries. If a function is unlucky enough to start just
+before one of these boundaries, a performance-critical part of the function
+could end up spanning two methods. Switching between two methods carries
+significant overhead, so this must be avoided at all costs. Telling GCC to
+align all functions to the boundary on which the .text segment is split
+significantly reduces the chance that a critical part of a function spans
+two methods.
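+The effect of alignment can be seen with simple address arithmetic. The
+sketch below assumes, purely for illustration, that each chunk of $2^k$
+bytes of .text becomes one Java method, with $k$ given by the hypothetical
+parameter {\tt log2Chunk}; the class {\tt TextSplit} is likewise
+hypothetical and does not reflect NestedVM's actual chunk size. A byte range
+crosses a method boundary exactly when its first and last bytes fall in
+different chunks, and rounding a function's start up to a chunk boundary
+removes that possibility for any function shorter than one chunk.

```java
// Hypothetical model: .text is split into methods of 2^log2Chunk bytes each.
final class TextSplit {
    private TextSplit() {}

    // Index of the Java method that would hold the instruction at addr.
    static int methodIndex(int addr, int log2Chunk) {
        return addr >>> log2Chunk;
    }

    // True if the byte range [start, start + length) crosses a method boundary.
    static boolean crossesMethodBoundary(int start, int length, int log2Chunk) {
        int last = start + length - 1;
        return methodIndex(start, log2Chunk) != methodIndex(last, log2Chunk);
    }

    // Round addr up to the next multiple of 2^log2Align, as function alignment does.
    static int alignUp(int addr, int log2Align) {
        int mask = (1 << log2Align) - 1;
        return (addr + mask) & ~mask;
    }
}
```

+For example, with 1\,KB chunks ({\tt log2Chunk = 10}), a 64-byte function
+starting at 0x3F0 crosses into the next method, while the same function
+aligned up to 0x400 does not.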
+^ ^ ^ ^ ^ ^ ^
+
+v v v v v v v
+These two methods can even be combined. MIPS code can call Java through the
+{\tt CALL\_JAVA} syscall, and that Java code can in turn invoke a MIPS
+function in the binary with the {\tt call()} method.
+*************
+\item {\tt -fno-rename-registers}
+Some processors schedule code better when registers are not reused for
+two different purposes. By default, GCC tries to use as many registers as
+possible. This liberal use of registers only confuses the JIT compilers
+that compile the output of the binary translator. All the JIT compilers we
+tested perform much better with a few frequently used registers.
+^ ^ ^ ^ ^ ^ ^
+
+v v v v v v v
+Users preferring a simpler communication mechanism can also use Java
+streams and file descriptors. {\tt Runtime} provides a simple interface for
+mapping a Java {\tt InputStream} or {\tt OutputStream} to a file descriptor.
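+To make the idea concrete, the following is a minimal sketch of such a
+mapping: a table from small integers to Java streams, with POSIX-style
+{\tt read}, {\tt write}, and {\tt close} operations that return $-1$ on
+error. The class {\tt FdTable} and its methods are hypothetical and are not
+the actual {\tt Runtime} API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not NestedVM's Runtime API: maps small integers to Java streams.
final class FdTable {
    private final List<Object> slots = new ArrayList<>();

    // Hand out the lowest free descriptor, as POSIX open() does.
    private int allocate(Object stream) {
        for (int fd = 0; fd < slots.size(); fd++) {
            if (slots.get(fd) == null) {
                slots.set(fd, stream);
                return fd;
            }
        }
        slots.add(stream);
        return slots.size() - 1;
    }

    int add(InputStream in) { return allocate(in); }

    int add(OutputStream out) { return allocate(out); }

    private Object lookup(int fd) {
        return (fd >= 0 && fd < slots.size()) ? slots.get(fd) : null;
    }

    // read(2)-style: bytes read, 0 at end of stream, -1 for a bad fd or I/O error.
    int read(int fd, byte[] buf, int off, int len) {
        Object s = lookup(fd);
        if (!(s instanceof InputStream)) return -1;
        try {
            int n = ((InputStream) s).read(buf, off, len);
            return n < 0 ? 0 : n; // Java reports EOF as -1; read(2) returns 0
        } catch (IOException e) {
            return -1;
        }
    }

    // write(2)-style: bytes written, or -1 for a bad fd or I/O error.
    int write(int fd, byte[] buf, int off, int len) {
        Object s = lookup(fd);
        if (!(s instanceof OutputStream)) return -1;
        try {
            ((OutputStream) s).write(buf, off, len);
            return len;
        } catch (IOException e) {
            return -1;
        }
    }

    // Free the slot and close the underlying stream; -1 if fd was not open.
    int close(int fd) {
        Object s = lookup(fd);
        if (s == null) return -1;
        slots.set(fd, null);
        try {
            if (s instanceof InputStream) ((InputStream) s).close();
            else ((OutputStream) s).close();
            return 0;
        } catch (IOException e) {
            return -1;
        }
    }
}
```

+Because descriptors are handed out lowest-first, a closed slot is reused by
+the next call to {\tt add}.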
+*************
+\item {\tt -fno-delayed-branch}
+The MIPS CPU has a delay slot (see above). Earlier versions of NestedVM did
+not emulate delay slots efficiently. This option causes GCC to avoid using
+delay slots for anything (a NOP is simply placed in the delay slot), which
+gave a small performance benefit. However, recent versions of NestedVM
+emulate delay slots with no performance overhead, so this option now has
+little effect. Nonetheless, delay slots provide no benefit under NestedVM
+either, so they are still avoided with this option.
+^ ^ ^ ^ ^ ^ ^
+
+v v v v v v v
+%Java source code can create a copy of the translated binary by
+%instantiating the corresponding class, which extends {\tt Runtime}.
+%Invoking the {\tt main()} method on this class is equivalent to
+%calling the {\tt main()} function within the binary; the {\tt String}
+%arguments to this function are copied into the binary's memory space
+%and made available as {\tt **argv} and {\tt argc}.
+*************
+\item {\tt -fno-schedule-insns}
+Load operations in the MIPS ISA also have a delay slot: the result of a
+load is not available for use until one instruction later. Several other
+instructions have similar delay slots. By default, GCC tries to do useful
+work while waiting for the results of these operations. However, this, like
+register renaming, tends to confuse JIT compilers. This option prevents GCC
+from going out of its way to exploit these delay slots and makes the code
+generated by NestedVM easier for JIT compilers to handle.
+^ ^ ^ ^ ^ ^ ^
+
+v v v v v v v
+%The translated binary communicates with the rest of the VM by
+%executing MIPS {\tt SYSCALL} instructions, which are translated into
+%invocations of the {\tt syscall()} method. This calls back to the
+%native Java world, which can manipulate the binary's environment by
+%reading and writing to its memory space, checking its exit status,
+%pausing the VM, and restarting the VM.
+*************
+\item {\tt -mmemcpy}
+GCC sometimes has to copy fairly large regions of memory; the most common
+example is assigning one struct to another. Memory can be copied far more
+efficiently in Java than in translated MIPS code. The binary translator
+treats calls to the libc {\tt memcpy()} function specially, turning them
+into calls to a {\tt memcpy} method in {\tt Runtime}. The {\tt -mmemcpy}
+option causes GCC to call libc's {\tt memcpy()} whenever it needs to copy a
+region of memory, rather than generating its own inline copying code. That
+call is then turned into a call to the Java {\tt memcpy} method, which is
+significantly faster than the MIPS implementation.
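+The sketch below illustrates why a Java-side copy can be faster. It
+assumes a hypothetical flat memory of big-endian 32-bit words held in a
+single {\tt int[]}; the class {\tt WordMemory} is illustrative only, and
+NestedVM's real memory layout may differ. When source and destination are
+both word-aligned, the bulk of the copy becomes one {\tt System.arraycopy}
+call instead of a translated load and store per word. Only unaligned copies
+and trailing bytes fall back to a byte loop.

```java
// Hypothetical flat memory model: byte address a lives in word mem[a >>> 2], big-endian.
final class WordMemory {
    final int[] mem;

    WordMemory(int sizeInWords) {
        mem = new int[sizeInWords];
    }

    int loadByte(int addr) {
        int shift = (3 - (addr & 3)) << 3; // big-endian: byte 0 is the high byte
        return (mem[addr >>> 2] >>> shift) & 0xff;
    }

    void storeByte(int addr, int value) {
        int shift = (3 - (addr & 3)) << 3;
        int i = addr >>> 2;
        mem[i] = (mem[i] & ~(0xff << shift)) | ((value & 0xff) << shift);
    }

    // memcpy(dst, src, n): whole words via System.arraycopy when both ends are word-aligned.
    void memcpy(int dst, int src, int n) {
        if (((dst | src) & 3) == 0) {
            int words = n >>> 2;
            System.arraycopy(mem, src >>> 2, mem, dst >>> 2, words);
            int done = words << 2;
            dst += done;
            src += done;
            n -= done;
        }
        // Unaligned copies, and any tail bytes after the word copy, go byte by byte.
        for (int k = 0; k < n; k++) {
            storeByte(dst + k, loadByte(src + k));
        }
    }
}
```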
+^ ^ ^ ^ ^ ^ ^
+
+v v v v v v v
+*************
+\item {\tt -ffunction-sections -fdata-sections}
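+These two options are used in conjunction with the {\tt --gc-sections}
+linker option. The compiler options place each function and each data
+object in its own section, which lets the linker discard any section that
+is never referenced. Together, the three options aggressively remove unused
+functions and data, which in some cases leads to significantly smaller
+binaries.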
+^ ^ ^ ^ ^ ^ ^
+
+%\subsection{Virtualization}
+
+%The {\tt Runtime} class implements the majority of the standard {\tt
+%libc} syscalls, providing a complete interface to the filesystem,
+%network socket library, time of day, (Brian: what else goes here?).
+
+v v v v v v v
+%\begin{itemize}
+*************
+\begin{itemize}
+^ ^ ^ ^ ^ ^ ^
+
+\item Better use of local variables in the binary-to-binary compiler -- we
+need to do data flow analysis to find how and when registers are used and
+avoid the costly load/restore when it isn't necessary.
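+As a rough illustration of the analysis this item calls for, the sketch
+below handles a single basic block. A register needs loading into a local
+only if the block reads it before writing it, and needs storing back only if
+the block writes it. The {\tt Insn} and {\tt BlockRegs} classes are
+hypothetical. A full solution would propagate liveness across the whole
+control-flow graph so that dead values are not stored either.

```java
import java.util.BitSet;
import java.util.List;

// Hypothetical instruction summary: the register it writes (-1 if none) and those it reads.
final class Insn {
    final int def;
    final int[] uses;

    Insn(int def, int... uses) {
        this.def = def;
        this.uses = uses;
    }
}

// Per-basic-block sketch of which MIPS registers must be loaded and stored back.
final class BlockRegs {
    // Registers read before being written in the block: load these into locals on entry.
    final BitSet mustLoad = new BitSet(32);
    // Registers written in the block: only these need storing back on exit.
    final BitSet mustStore = new BitSet(32);

    static BlockRegs analyze(List<Insn> block) {
        BlockRegs r = new BlockRegs();
        for (Insn insn : block) {
            for (int u : insn.uses) {
                // $0 is hardwired to zero, so it never needs loading.
                if (u != 0 && !r.mustStore.get(u)) r.mustLoad.set(u); // upward-exposed use
            }
            if (insn.def > 0) r.mustStore.set(insn.def); // skip "no def" (-1) and writes to $0
        }
        return r;
    }
}
```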
+
+\item More advanced Runtime support -- support more syscalls. This will
+allow running large applications such as GCC under NestedVM.
+
+\item World domination