+\subsubsection{Compiler Flags}
+
+Although NestedVM perfectly emulates a MIPS R2000 CPU, its performance
+profile is nothing like that of actual silicon. In particular, {\tt
+gcc} makes several optimizations that increase performance on an
+actual MIPS CPU but decrease the performance of
+NestedVM-generated bytecode. We found that the following compiler
+options improve performance:
+
+\begin{itemize}
+
+\item {\tt -falign-functions}
+
+ Normally a function's location in memory has no effect on its
+ execution speed. However, in the NestedVM binary translator,
+ the {\tt .text} segment is split on power-of-two boundaries. If
+ a function starts just before one of these boundaries, a
+ performance-critical part of the function can wind up spanning two
+ Java methods. Telling {\tt gcc} to align all functions along
+ these boundaries decreases the chance of this sort of splitting.
+
+\item {\tt -fno-rename-registers}
+
+ On an actual silicon chip, using additional registers carries no
+ performance penalty (as long as none are spilled to the stack).
+ However, when generating bytecode, using {\it fewer}
+ ``registers'' helps the JVM optimize the machine code it
+ generates by simplifying the constraints it needs to deal with.
+ Disabling register renaming has this effect.
+
+\item {\tt -fno-schedule-insns}
+
+ Results of MIPS load operations are not available until {\it
+ two} instructions after the load. Without the {\tt
+ -fno-schedule-insns} option, {\tt gcc} will attempt to
+ reorder instructions to do other useful work during this period
+ of unavailability. NestedVM is under no such constraint, so
+ removing this reordering typically generates simpler machine
+ code.
+
+\item {\tt -mmemcpy}
+
+ Enabling this option causes {\tt gcc} to use the system
+ {\tt memcpy()} routine instead of generating loads and stores.
+ As explained in the next section, the NestedVM runtime
+ implements {\tt memcpy()} using {\tt System.arraycopy()}, which
+ is substantially more efficient.
+
+NestedVM has two primary ways of executing code: the interpreter and the binary translators. Both the interpreter and the output of the binary translators sit on top of a Runtime class, which provides the public interface to both.
+
+The Runtime class does the work that the operating system usually does.
+Conceptually, the Runtime class can be thought of as the operating system and
+its subclasses (translated binaries and the interpreter) as the CPU. The
+Runtime fulfills five primary goals:
+
+\item {\tt -fno-delayed-branch} The MIPS CPU has a delay slot (see
+ above). Earlier versions of NestedVM did not emulate delay slots
+ efficiently. This option causes {\tt gcc} to avoid using delay
+ slots for anything (a NOP is simply placed in the delay slot),
+ which gave a small performance benefit. Recent versions of
+ NestedVM emulate delay slots with no performance overhead, so
+ this option now has little effect. Nonetheless, delay slots
+ provide no benefit under NestedVM, so we still use this option
+ to avoid them.
+
+\item Provide a consistent external interface - The method of actually executing the code (currently only translated binaries and the interpreter) can be changed without any changes to the caller's code because only Runtime exposes a public interface.
+
+\item Provide an easy-to-use interface - The interpreter and the output of the binary translators only know how to execute code. The Runtime class provides an easy-to-use interface to that code. It contains methods to pass arguments to the main() function, read from and write to memory, and call individual functions in the binary.
+
+\item Manage the process's memory - The Runtime class contains large int[] arrays that represent the process's entire memory space. Subclasses read and write to these arrays as required by the instructions they are executing. Subclasses can expand their memory space using the sbrk syscall.
+
+\item Provide access to the file system and streams - Subclasses access the file system through standard UNIX syscalls (read, write, open, etc). The Runtime manages the file descriptor table that maps UNIX file descriptors to Java RandomAccessFiles, InputStreams, OutputStreams, and sockets.
+
+\item Miscellaneous other syscalls - In addition to those mentioned above, the Runtime class implements a variety of other syscalls (sleep, gettimeofday, getpagesize, sysconf, fcntl, etc).
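+
+The memory-management goal and the {\tt -mmemcpy} option can be
+illustrated with a minimal Java sketch. It is not NestedVM's actual
+Runtime class: the class name {\tt SketchMemory}, its methods, and the
+single flat {\tt int[]} array are assumptions made for illustration
+only. It shows process memory held as 32-bit words, an sbrk-style call
+that grows the array, and a word-aligned copy delegated to {\tt
+System.arraycopy()}.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch only; not NestedVM's real Runtime class.
 * Process memory is modelled as one flat int[] of 32-bit words.
 */
final class SketchMemory {
    private int[] words; // backing store, one int per 32-bit word
    private int brk;     // current program break, in bytes

    SketchMemory(int initialBytes) {
        words = new int[(initialBytes + 3) >>> 2];
        brk = initialBytes;
    }

    /** Reads the word at a 4-byte-aligned byte address. */
    int readWord(int addr) {
        checkAligned(addr);
        return words[addr >>> 2];
    }

    /** Writes the word at a 4-byte-aligned byte address. */
    void writeWord(int addr, int value) {
        checkAligned(addr);
        words[addr >>> 2] = value;
    }

    /** sbrk-style growth: extends memory by incr bytes, returns the old break. */
    int sbrk(int incr) {
        if (incr < 0) {
            throw new IllegalArgumentException("shrinking is not supported in this sketch");
        }
        int oldBrk = brk;
        int neededWords = (brk + incr + 3) >>> 2;
        if (neededWords > words.length) {
            words = Arrays.copyOf(words, neededWords); // new words are zero-filled
        }
        brk += incr;
        return oldBrk;
    }

    /** Word-aligned memcpy: a single bulk copy instead of a load/store loop. */
    void memcpyWords(int dst, int src, int nbytes) {
        checkAligned(dst);
        checkAligned(src);
        checkAligned(nbytes);
        System.arraycopy(words, src >>> 2, words, dst >>> 2, nbytes >>> 2);
    }

    private static void checkAligned(int value) {
        if ((value & 3) != 0) {
            throw new IllegalArgumentException("not 4-byte aligned: " + value);
        }
    }
}
```

+In this sketch a caller would write a few words, call {\tt sbrk()} to
+obtain more space, and then call {\tt memcpyWords()} to copy into the
+new region. A real implementation would also need to handle unaligned
+and byte-granular copies, which are omitted here.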