switch charts to gnuplot

[nestedvm.git] / doc / ivme04.tex
diff --git a/doc/ivme04.tex b/doc/ivme04.tex

index 731022a..c7d544e 100644 (file)
--- a/doc/ivme04.tex
+++ b/doc/ivme04.tex
@@ -40,25 +40,22 @@ Complete Translation of Unsafe Native Code to Safe Bytecode
  Existing techniques for using code written in an unsafe language
  within a safe virtual machine generally involve transformations from
  one source code language (such as C, Pascal, or Fortran) to another
-(such as Java) which is the compiled into virtual machine bytecodes.
+(such as Java) which is then compiled into virtual machine bytecodes.
  
  We present an alternative approach which translate MIPS binaries
  produced by any compiler into safe virtual machine bytecodes.  This
-approach offers four key advantages over existing techniques:
-
-\begin{itemize}
-\item Language-agnostic
-\item Bug-for-bug compiler compatability
-\item No post-translation human intervention
-\item No build process modifications
-\end{itemize}
+approach offers four key advantages over existing techniques: it is
+language agnostic, it offers bug-for-bug compiler compatability,
+requires no post-translation human intervention, and introduces no
+build process modifications.
  
  We also present NestedVM, an implementation of this technique, and
-discuss six examples: LINPACK (Fortran), which was used as one of our
-performance tests, \TeX\ (Pascal), which was used to typeset this
-document, {\tt libjpeg}, {\tt libmspack}, and FreeType (all C source),
-which are currently in production use as part of the Ibex Project, and
-{\tt gcc}, which was used to compile all of the aforementioned.
+discuss its application to six software packages: LINPACK (Fortran),
+which was used as one of our performance tests, \TeX\ (Pascal), which
+was used to typeset this document, {\tt libjpeg}, {\tt libmspack}, and
+FreeType (all C source), which are currently in production use as part
+of the Ibex Project, and {\tt gcc}, which was used to compile all of
+the aforementioned.
  
  Performance measurements indicate a best case performance within 3x of
  native code and worst case typically within 10x, making it an
@@ -70,11 +67,11 @@ attractive solution for code which is not performance-critical.
  
  Unsafe languages such as C \cite{KR} and C++ \cite{soustroup} have
  been in use much longer than any of today's widely accepted safe
-languages such as Java \cite{java} and C\# \cite{csharp}.  Consequently, there is
-a huge library of software written in these languages.  Although safe
-languages offer substantial benefits, their comparatively young age
-often puts them at a disadvantage when breadth of existing support
-code is an important criterion.
+languages such as Java \cite{java} and C\# \cite{csharp}.
+Consequently, there is a huge library of software written in these
+languages.  Although safe languages offer substantial benefits, their
+comparatively young age often puts them at a disadvantage when breadth
+of existing support code is an important criterion.
  
  The typical solution to this dilemma is to use a native interface such
  as JNI \cite{jni} or CNI \cite{cni} to invoke unsafe code from within a
@@ -83,29 +80,43 @@ are a number of situations in which this is not an acceptable
  solution.  These situations can be broadly classified into two
  categories: {\it security concerns} and {\it portability concerns}.
  
-Using Java as an example, JNI and CNI are prohibited in a number of
-contexts, including applets environments and servlet containers with a
+Security is often a major concern when integrating native code.  Using
+Java as an example, JNI and CNI are prohibited in a number of
+contexts, including applet environments and servlet containers with a
  {\tt SecurityManager}.  Additionally, even in the context of trusted
  code, {\tt native} methods invoked via JNI are susceptible to buffer
  overflow and heap corruption attacks which are not a concern for
-verified bytecode.
+verified, bounds-checked bytecode.
  
  The second class of disadvantages revolves around portability
  concerns; native interfaces require the native library to be compiled
-ahead of time, for every architecture on which they will be
-deployed.  This is unworkable for situations in which the full set of
-target architectures is not known at deployment time.  Additionally,
-some JVM platform variants such as J2ME \cite{j2me} simply do not offer
-support for native code.
+ahead of time for every architecture on which it will be deployed.
+This is unacceptable for scenarios in which the full set of target
+architectures is not known at deployment time.  Additionally, some JVM
+platform variants such as J2ME \cite{j2me} simply do not offer support
+for native code.
  
  The technique we present here uses typical compiler to compile unsafe
  code into a MIPS binary, which is then translated on an
  instruction-by-instruction basis into Java bytecode.  The technique
  presented here is general; we anticipate that it can be applied to
-other secure virtual machines such as Microsoft's .NET \cite{msil}, Perl
-Parrot \cite{parrot}, or Python bytecode \cite{python}.
+other secure virtual machines such as Microsoft's .NET \cite{msil},
+Perl Parrot \cite{parrot}, or Python bytecode \cite{python}.
+
+The remainder of this paper is divided as follows: in the next section
+we review the relevant set of program representations (safe source,
+unsafe source, binary, and bytecode) and survey existing work for
+performing transformations between them.  In the third section we
+introduce NestedVM and cover its two primary translation modes in
+detail.  Section four describes the NestedVM runtime, which plays the
+role of the OS kernel.  Section five adresses the optimizations we
+employ and quantifies NestedVM's performance.  Section six reviews our
+experiences in applying NestedVM to various popular software packages.
+We conclude with an analysis of NestedVM's weaknesses and potential
+for future improvements.
  
-\section{Approaches to Translation}
+
+\section{Existing Work}
  
  The four program representations of interest in this context, along
  with their specific types in the C-to-JVM instantiation of the
@@ -154,17 +165,10 @@ translations performed by a few familiar tools:
  \endpsmatrix
  \end{pdfpic}
  
-Techniques for translating unsafe code into VM bytecode generally fall
-into four categories, which we expand upon in the next two sections:
-
-\begin{itemize}
-\item source-to-source translation
-\item source-to-binary translation
-\item binary-to-source translation
-\item binary-to-binary translation
-\end{itemize}
-
-\section{Existing Work}
+Existing techniques for translating unsafe code into VM bytecode
+generally fall into two categories, which we expand upon in the
+remainder of this section: source-to-source translation and
+source-to-binary translation.
  
  \subsection{Source-to-Source Translation}
  
@@ -198,38 +202,36 @@ translation which is completed by a human, and those which perform a
  total translation but fail (yield an error) on a large class of input
  programs.
  
-
-\subsubsection{Incomplete Translation}
+\subsubsection{Human-Assisted Translation}
  
  Jazillian \cite{jazillian} is a commercial solution which produces
  extremely readable Java source code from C source code, but ony
  translates a small portion of the C language.  Jazillian is unique in
  that in addition to {\it language migration}, it also performs {\it
-API migration}; for example, Jazillian is intelligent enough
-to translate {\tt char*~s1~=~strcpy(s2)} into {\tt String~s1~=~s2}.
-
-Unfortunately such deep analysis is intractible for most of the C
-language and standard library; Jazillian's documentation notes that
-{\it ``This is not your father's language translator.  It's not
-generating ugly code that's guaranteed to work out of the
-box... Jazillian does not always produce code that works correctly.''}
-
-MoHCA-Java \cite{mohca} is the other major tool in this category, and steps
-beyond Jazillian by providing tools for analysis of the source C++
-abstract syntax tree.  Additionally, MoHCA-Java's analysis engine is
-extensible, making it a platform for constructing application-specific
-translators rather than a single translation tool.  However,
-MoHCA-Java does not always generate complete Java code for all of the C++
-programs which it accepts.
-
-
-\subsubsection{Partial Domain Translation}
-
-The c2j \cite{c2j}, c2j++ \cite{c2jpp}, Cappucinno \cite{capp},
-and Ephedra \cite{ephedra} systems each provide support for complete
-translation of a {\it subset} of the source language (C or C++).  Each
-of the four tools supports a progressively greater subset than the one
-preceding it; however none covers the entire input language.
+API migration}; for example, Jazillian is intelligent enough to
+translate ``{\tt char*~s1~=~strcpy(s2)}'' into ``{\tt
+String~s1~=~s2}''.  Unfortunately such deep analysis is intractible
+for most of the C language and standard library; indeed, Jazillian's
+documentation notes that {\it ``This is not your father's language
+translator... Jazillian does not always produce code that works
+correctly.''}
+
+MoHCA-Java \cite{mohca} is the other major tool in this category, and
+steps beyond Jazillian by providing tools for analysis of the source
+C++ abstract syntax tree.  Additionally, MoHCA-Java's analysis engine
+is extensible, making it a platform for constructing
+application-specific translators rather than a single translation
+tool.  However, MoHCA-Java does not always generate complete Java code
+for all of the C++ programs which it accepts.
+
+\subsubsection{Partial-Domain Translation}
+
+The {\tt c2j} \cite{c2j}, {\tt c2j++} \cite{c2jpp}, Cappucinno
+\cite{capp}, and Ephedra \cite{ephedra} systems each provide support
+for complete translation of a {\it subset} of the source language (C
+or C++).  Each of the four tools supports a progressively greater
+subset than the one preceding it; however none covers the entire input
+language.
  
  Ephedra, the most advanced of the four, supports most of the C++
  language, and claims to produce ``human readable'' Java code as
@@ -272,40 +274,36 @@ language which has been modified to emit safe bytecode.
  \endpsmatrix
  \end{pdfpic}
  
-The primary occupant of this category is {\tt egcs-jvm}
-\cite{egcsjvm}, an experimental ``JVM backend'' for the GNU Compiler
-Collection ( {\tt gcc} ) \cite{gcc}.  Since {\tt gcc} employs a highly
-modular architecture, it {\it is} possible to add RTL code generators
-for nonstandard processors.  However, {\tt gcc}'s parsing, RTL
-generation, and optimization layers make fundamental assumptions (such
-as the availability of pointer math) which cannot be directly
-supported; thus the compiler still fails for a substantial class of
-input programs.
+An experimental ``JVM backend'' for the {\tt gcc} compiler, known as
+{\tt egcs-jvm} \cite{egcsjvm}, attempts this approach.  Since {\tt
+gcc} employs a highly modular architecture, it {\it is} possible to
+add RTL code generators for nonstandard processors.  However, {\tt
+gcc}'s parsing, RTL generation, and optimization layers make
+fundamental assumptions (such as the availability of pointer math)
+which cannot be directly supported; thus the compiler still fails for
+a substantial class of input programs.
  
  A Java backend for the {\tt lcc} [CITE] compiler, known as {\tt
-lcc-java} [CITE], but was not completed.  {\tt lcc-java} also lacks
-any form of system library ({\tt libc}), so very few C programs will
-run without custom modification, which would cause them to diverge
-from the upstream sources.  Finally, {\tt lcc-java}'s memory model is
-much more restricted; it uses a fixed-size array to represent all
-memory, and expands the array by allocating a new array and copying,
-which is extremely inefficient.  No attempt is made to take advantage
-of {\tt NullPointerException} checking (which costs nothing if the
-exception is not thrown since most JVMs use the MMU to detect this).
-Finally, {\tt lcc-java} targets Java source code, which places the
-vast majority of NestedVM's optimizations beyond its reach, and
-severely restricts the maximum program size {\tt lcc-java} can handle.
-
-Finally, {\tt lcc-java} maintains a separate memory area for the
-stack, which appears to limit the exchange of stack pointers and heap
-pointers.  It is unclear from the documentation how this is handled.
+lcc-java} [CITE], is also available.  Although this system is quite
+clean and elegantly designed, it lacks any form of system library
+({\tt libc}), so very few C programs will run without custom
+modification (which would cause them to diverge from the upstream
+sources).  The memory model employed by {\tt lcc-java} is also
+somewhat awkward; a separate {\tt int[]} is maintained for the stack
+and heap, leading to difficulties mingling pointers to these two
+memory regions.  Additionally, the heap is a single {\tt int[]}, which
+means that it must be copied in order to be expanded, and prevents
+{\tt lcc-java} from taking advantage of {\tt NullPointerException}
+checking, which costs nothing in the ``common case'' since nearly all
+JVMs use the host CPU's MMU to detect this condition.
  
  
  \section{NestedVM}
  
  The principal difference between NestedVM and other approaches is that
-NestedVM {\it does not} attempt to deal with source code as an input.
-This leads immediately to three advantages:
+NestedVM {\it does not} attempt to deal with source code as an input,
+instead opting for {\it binary-to-source} and {\it binary-to-binary}
+translation.  This offers three immediate advantages:
  
  \begin{itemize}
  \item {\bf Language Agnostic}
@@ -456,83 +454,53 @@ produce bytecode files:
  \begin{multicols}{2}
  {\footnotesize\begin{verbatim}
  private final static int r0 = 0;
-private int r1, r2, r3,...,r30;
+private int r1, r2, r3, /* ... */ r30;
  private int r31 = 0xdeadbeef;
  private int pc = ENTRY_POINT;
  
  public void run() {
-    for(;;) {
+    while (true)
          switch(pc) {
-            case 0x10000:
-                r29 = r29 - 32;
-            case 0x10004:
-                r1 = r4 + r5;
-            case 0x10008:
-                if(r1 == r6) {
-                    /* delay slot */
-                    r1 = r1 + 1;
-                    pc = 0x10018:
-                    continue;
-                }
-            case 0x1000C:
-                r1 = r1 + 1;
-            case 0x10010:
-                r31 = 0x10018;
-                pc = 0x10210;
-                continue;
-            case 0x10014:
-                /* nop */
-            case 0x10018:
-                pc = r31;
-                continue;
+            case 0x10000: r29 = r29 - 32;
+            case 0x10004: r1 = r4 + r5;
+            case 0x10008: if (r1 == r6) {
+                            /* delay slot */
+                            r1 = r1 + 1;
+                            pc = 0x10018:
+                            continue; }
+            case 0x1000C: r1 = r1 + 1;
+            case 0x10010: r31 = 0x10018;
+                          pc = 0x10210;
+                          continue;
+            case 0x10014: /* nop */
+            case 0x10018: pc = r31; continue;
              ...
-            case 0xdeadbeef:
-                System.err.println(``Exited.'');
-                System.exit(1);
-        }
-    }
-}
+            case 0xdeadbeef:  System.exit(1);
+...
  \end{verbatim}}
  \vspace{1in}
  {\footnotesize\begin{verbatim}
  public void run_0x10000() {
-    for(;;) {
-    switch(pc) {
-        case 0x10000:
-            ...
-        case 0x10004:
-            ...
-        ...
-        case 0x10010:
-            r31 = 0x10018;
-            pc = 0x10210;
-            return;
-        ...
-    }
-    }
-}
+    while (true) switch(pc) {
+        case 0x10000: ...
+        case 0x10004: ...
+        case 0x10010: r31 = 0x10018;
+                      pc = 0x10210;
+                      continue;
+....
  
  pubic void run_0x10200() {
-    for(;;) {
-    switch(pc) {
-        case 0x10200:
-            ...
-        case 0x10204:
-            ...
-    }
-    }
-}
+    while (true) switch(pc) {
+        case 0x10200: ...
+        case 0x10204: ...
+....
  
  public void trampoline() {
-    for(;;) {
-    switch(pc&0xfffffe00) {
-            case 0x10000: run_0x10000(); break;
-            case 0x10200: run_0x10200(); break;
-            case 0xdeadbe00:
-                ...
-        }
-    }
-}
+    while (true) switch(pc&0xfffffe00) {
+        case 0x10000:    run_0x10000(); break;
+        case 0x10200:    run_0x10200(); break;
+        case 0xdeadbe00: ...
+....
  \end{verbatim}}
  \end{multicols}
  \end{minipage}
@@ -561,7 +529,161 @@ Translating unsafe code for use within a JVM proceeds as follows:
  
  \end{enumerate}
  
-\subsubsection{Optimizations}
+
+\subsection{Binary-to-Binary}
+
+After implementing the binary-to-source compiler, a binary-to-binary
+translation mode was added.
+
+\begin{pdfpic}
+\newlength{\MyLength}
+\settowidth{\MyLength}{xmachine codex}
+\newcommand{\MyBox}[1]{\makebox[\MyLength]{#1}}
+\psmatrix[colsep=2,rowsep=0,nrot=:U]
+  & \\[0pt]
+  [name=s0]\MyBox{unsafe source} & [name=s1]\MyBox{safe source}   \\[0pt]
+  & \\[0pt]
+  & \\[0pt]
+  & \\[0pt]
+  & \\[0pt]
+  & \\[0pt]
+  [name=b0]\MyBox{machine code}  & [name=b1]\MyBox{safe bytecode} \\[0pt]
+  & \\[0pt]
+  \psset{nodesep=5pt,arrows=->}
+  \ncline{s0}{b0}\bput{:U}{\tt gcc}
+  \ncline{b0}{b1}\naput{\tt NestedVM}
+\endpsmatrix
+\end{pdfpic}
+
+This mode has several advantages:
+
+\begin{itemize}
+      
+\item There are quite a few interesting bytecode sequences that cannot
+      be generated as a result of compiling Java source code.
+
+\item Directly generating {\tt .class} files Eliminates the
+      time-consuming {\tt javac} step.
+
+\item Direct compilation to {\tt .class} files opens up the
+      interesting possibility of dynamically translating MIPS binaries
+      and loading them via {\tt ClassLoader.fromBytes()} {\it at
+      deployment time}, eliminating the need to compile binaries ahead
+      of time.
+
+\end{itemize}
+
+\section{The NestedVM Runtime}
+
+In addition to binary-to-source and binary-to-binary translation,
+NestedVM also includes a MIPS binary interpreter.  All three
+translation approaches expose the same API to both the translated
+binary and the surrounding VM (including peer Java code).
+
+The NestedVM Runtime (various subclasses of {\tt
+org.ibex.nestedvm.Runtime}) fill the role of an OS Kernel.
+Communication between MIPS code and the outside world is via the MIPS
+{\tt SYSCALL} instruction, just as the {\tt libc}-kernel interface is
+on real MIPS implementations.
+
+Two implemenations of the runtime are offered; a simple runtime with
+the minimum support required to comply with ANSI C, and a more
+sophisticated runtime which emulates a large portion of the POSIX API.
+
+\subsection{The ANSI C Runtime}
+
+The ANSI C runtime offers typical file I/O operations including {\tt
+open()}, {\tt close()}, {\tt read()}, {\tt write()}, and {\tt seek()}.
+File descriptors are implemented much as they are in OS kernels; a
+table of open files is maintained and descriptors act as an index into
+that table.  Each file is represented as a Java {\tt RandomAccessFile}
+in order to match the semantics of {\tt seek()}.
+
+Process-level memory management is done through the {\tt sbrk()}
+system call, which extends the process heap by adding more pages to
+the memory page table.  Fast memory clear and copy operations can be
+performed with {\tt memset()} and {\tt memcpy()}, which invoke the
+Java {\tt System.arraycopy()} method, which is generally much faster
+than a {\tt for()} loop.
+
+The {\tt exit()} call records the exit status, marks the VM instance
+as terminated and halts execution.  The {\tt pause()} syscall
+implements a crude form of Java-MIPS communication by returning
+control to the Java code which spawned the MIPS process.
+
+\subsection{The Unix Runtime}
+
+The Unix runtime extends the simple ANSI file I/O model to include a
+single-root filesystem, and device nodes, as well as {\tt fcntl()}
+APIs to manipulate these.  Device nodes are generally simulated by
+mapping reads, writes, and {\tt fcntl()}s on the device to the
+appropriate Java API.
+
+MIPS processes can ``mount'' other filesystems within the virtual
+filesystem made visible to the MIPS process.  Each filesystem is
+implemented by a Java class, which could, for example, offer access to
+the host filesystem (including {\tt state()}, {\tt lstat()}, {\tt
+mkdir}, and {\tt unlink()}, and {\tt getdents()}), the contents of a
+zip archive, or even a remote HTTP server.
+
+MIPS processes can also spawn subprocesses using the {\tt fork()} and
+{\tt exec()} calls, which create new Java threads to run the process.
+The {\tt fork()} call -- which is supposed to duplicate the memory
+image of a process -- is implemented in an elegant manner by calling
+the Java {\tt clone()} method (deep copy) on the VM object itself.
+Copy-on-write is not currently implemented.  The new instance is added
+to a static process table to facilitate interprocess communication.
+
+The {\tt waitpid()} API allows a parent process to block pending the
+completion of a child process, which is done quite easily with the
+Java {\tt wait()} method.
+
+The {\tt exec()} method actually loads a MIPS binary image from the
+filesystem, feeds it to the MIPS-to-bytecode translator, and then
+loads the resulting bytecode on the fly using {\tt
+ClassLoader.loadBytes()}.  The {\tt pipe()} system call permits
+parent-to-child IPC just as on a normal Unix system.
+
+Simple networking support is provided by the {\tt opensocket()}, {\tt
+listensocket()}, and {\tt accept()} methods, which are not currently
+compatible with the usual Berkeley sockets API.
+
+
+\subsection{Security Concerns}
+
+RuntimeExceptions don't escape, they care caught and turned into
+checked exceptions filesystem access does though security manager
+(NestedVM Runtime.SecurityManager, and the JVM's)
+
+
+\subsection{Threading}
+
+The NestedVM runtime currently does not support threading.  Providing
+robust support for ``true threads'', whereby each MIPS thread maps to
+a Java thread is probably not possible as the Java Memory Model
+[CITE], since all MIPS memory is stored in a set of {\tt int[]}'s and
+the Java Memory Model does not permit varying treatment or coherency
+policies at the granularity of a single array element.
+
+While this presents a major barrier for applications that use
+sophisticated locking schemes (such as {\it hash synchronization})
+which depend on atomic memory operations, it is probably possible to
+apply this threading model to ``well behaved'' multithreaded
+applications which depend only on OS-provided semaphores and mutexes
+for synchronization.
+
+Complex synchronization and incorrectly synchronized applications can
+be supported by implementing a variant of {\it user threads} within a
+single Java thread by providing a timer interrupt (via a Java
+asynchronous exception).  Unfortunately this requires that the
+compiled binary be able to restart from any arbitrary instruction
+address, which would require a {\tt case} statement for every
+instruction (rather than every jump target), which would degrade
+performance and increase the size of the resulting class file.
+
+\section{Optimization and Performance}
+
+\subsection{Binary-to-Source mode}
  
  Generating Java source code instead of bytecode frees NestedVM from
  having to perform simple constant propagation optimizations, as most
@@ -618,9 +740,9 @@ methods invoked from the trampoline.  Trial and error led to the
  onclusion that HotSpot \cite{hotspot} -- the most widely deployed JVM
  -- performs best when 128 MIPS instructions are mapped to each method.
  
-\epsfig{file=chart5,width=3in}
+\epsfig{file=charts/chart5,width=3in}
  
-\epsfig{file=chart6,width=3in}
+\epsfig{file=charts/chart6,width=3in}
  
  This phenomenon is due to two factors:
  
@@ -697,55 +819,13 @@ degredation and a similarly small increase in the size of the {\tt
  directly to {\tt .class} files (without the intermediate {\tt .java}
  file) eliminates this problem entirely.
  
-
-\subsection{Binary-to-Binary}
-
-After implementing the binary-to-source compiler, a binary-to-binary
-translation mode was added.
-
-\begin{pdfpic}
-\newlength{\MyLength}
-\settowidth{\MyLength}{xmachine codex}
-\newcommand{\MyBox}[1]{\makebox[\MyLength]{#1}}
-\psmatrix[colsep=2,rowsep=0,nrot=:U]
-  & \\[0pt]
-  [name=s0]\MyBox{unsafe source} & [name=s1]\MyBox{safe source}   \\[0pt]
-  & \\[0pt]
-  & \\[0pt]
-  & \\[0pt]
-  & \\[0pt]
-  & \\[0pt]
-  [name=b0]\MyBox{machine code}  & [name=b1]\MyBox{safe bytecode} \\[0pt]
-  & \\[0pt]
-  \psset{nodesep=5pt,arrows=->}
-  \ncline{s0}{b0}\bput{:U}{\tt gcc}
-  \ncline{b0}{b1}\naput{\tt NestedVM}
-\endpsmatrix
-\end{pdfpic}
-
-This mode has several advantages:
-
-\begin{itemize}
-      
-\item There are quite a few interesting bytecode sequences that cannot
-      be generated as a result of compiling Java source code.
-
-\item Directly generating {\tt .class} files Eliminates the
-      time-consuming {\tt javac} step.
-
-\item Direct compilation to {\tt .class} files opens up the
-      interesting possibility of dynamically translating MIPS binaries
-      and loading them via {\tt ClassLoader.fromBytes()} {\it at
-      deployment time}, eliminating the need to compile binaries ahead
-      of time.
-
-\end{itemize}
+\subsection{Binary-to-Binary mode}
  
  Most of the performance improvemen where made where in the handling of
  branch instructions and in taking advantage of the JVM stack to
  eliminate unnecessary {\tt LOAD}s and {\tt STORE}s to local variables.
  
-\epsfig{file=chart7,width=3in}
+\epsfig{file=charts/chart7,width=3in}
  
  The first optimization gained by direct bytecode generation came from
  the use of the JVM {\tt GOTO} instruction.  Despite the fact that the
@@ -825,7 +905,7 @@ ALOAD\_0} to prepare for a {\tt INVOKESPECIAL} call.  By simply
  lifting this bytecode outside of the {\tt switch} statement, each {\tt
  case} arm shrinks by one instruction.
  
-\subsubsection{Compiler Flags}
+\subsection{Compiler Flags}
  
  Although NestedVM perfectly emulates a MIPS R2000 CPU, its performance
  profile is nothing like that of actual silicon.  In particular, {\tt
@@ -884,11 +964,15 @@ could be used to improve performance:
  The effects of the various optimizations presented in this chapter are
  summarized in the table below.
  
-\epsfig{file=chart4,width=3in}
+\epsfig{file=charts/chart4,width=3in}
+
+\epsfig{file=charts/chart3,width=3in}
+
+\epsfig{file=charts/chart8,width=3in}
  
-\epsfig{file=chart3,width=3in}
+\epsfig{file=charts/chart9,width=3in}
  
-\section{Experiences}
+\section{Sample Applications}
  
  \subsection{FreeType, {\tt libmspack}, and {\tt libjpeg}}
  
@@ -953,119 +1037,16 @@ approximation.
  
  
  
-\section{The NestedVM Runtime}
-
-In addition to binary-to-source and binary-to-binary translation,
-NestedVM also includes a MIPS binary interpreter.  All three
-translation approaches expose the same API to both the translated
-binary and the surrounding VM (including peer Java code).
-
-The NestedVM Runtime (various subclasses of {\tt
-org.ibex.nestedvm.Runtime}) fill the role of an OS Kernel.
-Communication between MIPS code and the outside world is via the MIPS
-{\tt SYSCALL} instruction, just as the {\tt libc}-kernel interface is
-on real MIPS implementations.
-
-Two implemenations of the runtime are offered; a simple runtime with
-the minimum support required to comply with ANSI C, and a more
-sophisticated runtime which emulates a large portion of the POSIX API.
-
-\subsection{The ANSI C Runtime}
-
-The ANSI C runtime offers typical file I/O operations including {\tt
-open()}, {\tt close()}, {\tt read()}, {\tt write()}, and {\tt seek()}.
-File descriptors are implemented much as they are in OS kernels; a
-table of open files is maintained and descriptors act as an index into
-that table.  Each file is represented as a Java {\tt RandomAccessFile}
-in order to match the semantics of {\tt seek()}.
-
-Process-level memory management is done through the {\tt sbrk()}
-system call, which extends the process heap by adding more pages to
-the memory page table.  Fast memory clear and copy operations can be
-performed with {\tt memset()} and {\tt memcpy()}, which invoke the
-Java {\tt System.arraycopy()} method, which is generally much faster
-than a {\tt for()} loop.
-
-The {\tt exit()} call records the exit status, marks the VM instance
-as terminated and halts execution.  The {\tt pause()} syscall
-implements a crude form of Java-MIPS communication by returning
-control to the Java code which spawned the MIPS process.
-
-\subsection{The Unix Runtime}
-
-The Unix runtime extends the simple ANSI file I/O model to include a
-single-root filesystem, and device nodes, as well as {\tt fcntl()}
-APIs to manipulate these.  Device nodes are generally simulated by
-mapping reads, writes, and {\tt fcntl()}s on the device to the
-appropriate Java API.
-
-MIPS processes can ``mount'' other filesystems within the virtual
-filesystem made visible to the MIPS process.  Each filesystem is
-implemented by a Java class, which could, for example, offer access to
-the host filesystem (including {\tt state()}, {\tt lstat()}, {\tt
-mkdir}, and {\tt unlink()}, and {\tt getdents()}), the contents of a
-zip archive, or even a remote HTTP server.
-
-MIPS processes can also spawn subprocesses using the {\tt fork()} and
-{\tt exec()} calls, which create new Java threads to run the process.
-The {\tt fork()} call -- which is supposed to duplicate the memory
-image of a process -- is implemented in an elegant manner by calling
-the Java {\tt clone()} method (deep copy) on the VM object itself.
-Copy-on-write is not currently implemented.  The new instance is added
-to a static process table to facilitate interprocess communication.
-
-The {\tt waitpid()} API allows a parent process to block pending the
-completion of a child process, which is done quite easily with the
-Java {\tt wait()} method.
-
-The {\tt exec()} method actually loads a MIPS binary image from the
-filesystem, feeds it to the MIPS-to-bytecode translator, and then
-loads the resulting bytecode on the fly using {\tt
-ClassLoader.loadBytes()}.  The {\tt pipe()} system call permits
-parent-to-child IPC just as on a normal Unix system.
-
-Simple networking support is provided by the {\tt opensocket()}, {\tt
-listensocket()}, and {\tt accept()} methods, which are not currently
-compatible with the usual Berkeley sockets API.
-
-
-\subsection{Security Concerns}
  
-RuntimeExceptions don't escape, they care caught and turned into
-checked exceptions filesystem access does though security manager
-(NestedVM Runtime.SecurityManager, and the JVM's)
-
-
-\subsection{Threading}
-
-The NestedVM runtime currently does not support threading.  Providing
-robust support for ``true threads'', whereby each MIPS thread maps to
-a Java thread is probably not possible as the Java Memory Model
-[CITE], since all MIPS memory is stored in a set of {\tt int[]}'s and
-the Java Memory Model does not permit varying treatment or coherency
-policies at the granularity of a single array element.
  
-While this presents a major barrier for applications that use
-sophisticated locking schemes (such as {\it hash synchronization})
-which depend on atomic memory operations, it is probably possible to
-apply this threading model to ``well behaved'' multithreaded
-applications which depend only on OS-provided semaphores and mutexes
-for synchronization.
+\section{Conclusion}
  
-Complex synchronization and incorrectly synchronized applications can
-be supported by implementing a variant of {\it user threads} within a
-single Java thread by providing a timer interrupt (via a Java
-asynchronous exception).  Unfortunately this requires that the
-compiled binary be able to restart from any arbitrary instruction
-address, which would require a {\tt case} statement for every
-instruction (rather than every jump target), which would degrade
-performance and increase the size of the resulting class file.
+\subsection{Theoretical Limitations}
  
  Theoretical limitations: threads, reading from code, self-modifying
  code, COW?
  
-
-\section{Future Directions}
+\subsection{Future Directions}
  
  Although we have only implemented it for the Java Virtual Machine, our
  technique generalizes to other safe bytecode architectures.  In
@@ -1076,17 +1057,7 @@ Additionally, we would like to explore other uses for dynamic loading
  of translated MIPS binaries by combining NestedVM (which itself is
  written in Java) and the {\tt ClassLoader.fromBytes()} mechanism.
  
-
-\section{Conclusion}
-
-We have presented a novel technique for using libraries written in
-unsafe languages within a safe virtual machine without resorting to
-native interfaces.  We have implemented this technique in NestedVM,
-which is currently used by the Ibex project\footnote{{\tt
-http://www.ibex.org}} to perform font rasterization (via {\tt
-libfreetype}), JPEG decoding (via {\tt libjpeg}), and CAB archive
-extraction (via {\tt libmspack}), three libraries for which no
-equivalent Java classes exist.
+\subsection{Availability}
  
  NestedVM is available under an open source license, and can be
  obtained from
@@ -1094,7 +1065,7 @@ obtained from
      http://nestedvm.ibex.org
  \end{verbatim}
  
-
+\appendix
  \section{Appendix: Testing Methodology}
  
  All times are measured in seconds. These were all run on a dual 1Ghz