1 \documentclass{acmconf}
4 \usepackage{amssymb,amsmath,epsfig,alltt}
15 \bibliographystyle{amsplain}
17 \title{\textbf{\textsf{
18 Complete Translation of Unsafe Native Code to Safe Bytecode
21 \author{\begin{tabular}{@{}c@{}}
22 {\em {Brian Alliet}} \\
23 {Rochester Institute of Technology}\\
24 {\tt bja8464@cs.rit.edu}
25 \end{tabular}\hskip 1in\begin{tabular}{@{}c@{}}
26 {\em {Adam Megacz}} \\
27 {University of California, Berkeley} \\
28 {\tt megacz@cs.berkeley.edu}
34 {\it This document was typeset using D. E. Knuth's original \TeX 89,
35 which was both compiled and executed entirely within a Java
36 Virtual Machine without the use of native code.}
40 FIXME: mention unsigned arithmetic, threading, memory model.
42 Existing techniques for using code written in an unsafe language
43 within a safe virtual machine generally involve transformations from
44 one source code language (such as C, Pascal, or Fortran) to another
45 (such as Java) which is the compiled into virtual machine bytecodes.
47 We present an alternative approach which translate MIPS binaries
48 produced by any compiler into safe virtual machine bytecodes. This
49 approach offers four key advantages over existing techniques:
52 \item Language-agnostic
53 \item Bug-for-bug compiler compatability
54 \item No post-translation human intervention
55 \item No build process modifications
58 We also present NestedVM, an implementation of this technique, and
59 discuss six examples: LINPACK (Fortran), which was used as one of our
60 performance tests, \TeX\ (Pascal), which was used to typeset this
61 document, {\tt libjpeg}, {\tt libmspack}, and FreeType (all C source),
62 which are currently in production use as part of the Ibex Project, and
63 {\tt gcc}, which was used to compile all of the aforementioned.
65 Performance measurements indicate a best case performance within 3x of
66 native code and worst case typically within 10x, making it an
67 attractive solution for code which is not performance-critical.
71 \section{Introduction}
73 Unsafe languages such as C \cite{KR} and C++ \cite{soustroup} have
74 been in use much longer than any of today's widely accepted safe
75 languages such as Java \cite{java} and C\# \cite{csharp}. Consequently, there is
76 a huge library of software written in these languages. Although safe
77 languages offer substantial benefits, their comparatively young age
78 often puts them at a disadvantage when breadth of existing support
79 code is an important criterion.
81 The typical solution to this dilemma is to use a native interface such
82 as JNI \cite{jni} or CNI \cite{cni} to invoke unsafe code from within a
83 virtual machine or otherwise safe environment. Unfortunately, there
84 are a number of situations in which this is not an acceptable
85 solution. These situations can be broadly classified into two
86 categories: {\it security concerns} and {\it portability concerns}.
88 Using Java as an example, JNI and CNI are prohibited in a number of
89 contexts, including applets environments and servlet containers with a
90 {\tt SecurityManager}. Additionally, even in the context of trusted
91 code, {\tt native} methods invoked via JNI are susceptible to buffer
92 overflow and heap corruption attacks which are not a concern for
95 The second class of disadvantages revolves around portability
96 concerns; native interfaces require the native library to be compiled
97 ahead of time, for every architecture on which they will be
98 deployed. This is unworkable for situations in which the full set of
99 target architectures is not known at deployment time. Additionally,
100 some JVM platform variants such as J2ME \cite{j2me} simply do not offer
101 support for native code.
103 The technique we present here uses typical compiler to compile unsafe
104 code into a MIPS binary, which is then translated on an
105 instruction-by-instruction basis into Java bytecode. The technique
106 presented here is general; we anticipate that it can be applied to
107 other secure virtual machines such as Microsoft's .NET \cite{msil}, Perl
108 Parrot \cite{parrot}, or Python bytecode \cite{python}.
110 \section{Approaches to Translation}
112 The four program representations of interest in this context, along
113 with their specific types in the C-to-JVM instantiation of the
114 problem are shown in the following diagram:
117 \newlength{\MyLength}
118 \settowidth{\MyLength}{machine code}
119 \newcommand{\MyBox}[1]{\makebox[\MyLength][c]{#1}}
120 \begin{psmatrix}[colsep=2,rowsep=0]
122 [name=s0]\MyBox{unsafe source} & [name=s1]\MyBox{safe source} \\[0pt]
123 [name=s00]\MyBox{\tt (.c)} & [name=s11]\MyBox{\tt (.java)} \\[0pt]
127 [name=b0]\MyBox{machine code} & [name=b1]\MyBox{safe bytecode} \\[0pt]
128 [name=b00]\MyBox{\tt (.o)} & [name=b11]\MyBox{\tt (.class)} \\
130 \psset{nodesep=5pt,arrows=->}
134 To illustrate the context of this diagram, the following arcs show the
135 translations performed by a few familiar tools:
138 \newlength{\MyLength}
139 \settowidth{\MyLength}{xmachine codex}
140 \newcommand{\MyBox}[1]{\makebox[\MyLength]{#1}}
141 \psmatrix[colsep=2,rowsep=0,nrot=:D]
143 [name=s0]\MyBox{unsafe source} & [name=s1]\MyBox{safe source} \\[0pt]
149 [name=b0]\MyBox{machine code} & [name=b1]\MyBox{safe bytecode} \\[0pt]
151 \psset{nodesep=5pt,arrows=->}
152 \ncline{s0}{b0}\bput{:U}{\tt gcc}
153 \ncline{s1}{b0}\bput{:D}{\tt gcj}
154 \ncline{s1}{b1}\aput{:U}{\tt javac}
155 \ncline{b1}{b0}\aput{:D}{\tt gcj}\bput{:D}{JITs}
159 Techniques for translating unsafe code into VM bytecode generally fall
160 into four categories, which we expand upon in the next two sections:
163 \item source-to-source translation
164 \item source-to-binary translation
165 \item binary-to-source translation
166 \item binary-to-binary translation
169 \section{Existing Work}
171 \subsection{Source-to-Source Translation}
173 The most common technique employed is partial translation from unsafe
174 source code to safe source code:
177 \newlength{\MyLength}
178 \settowidth{\MyLength}{xmachine codex}
179 \newcommand{\MyBox}[1]{\makebox[\MyLength]{#1}}
180 \psmatrix[colsep=2,rowsep=0,nrot=:U]
183 [name=s0]\MyBox{unsafe source} & [name=s1]\MyBox{safe source} \\[0pt]
189 [name=b0]\MyBox{machine code} & [name=b1]\MyBox{safe bytecode} \\[0pt]
191 \psset{nodesep=5pt,arrows=->}
192 \ncline{s0}{s1}\aput{:U}{source-to}\bput{:U}{source}
193 \ncline{s1}{b1}\aput{:U}{\tt javac}
197 A number of existing systems employ this technique; they can
198 be divided into two categories: those which perform a partial
199 translation which is completed by a human, and those which perform a
200 total translation but fail (yield an error) on a large class of input
204 \subsubsection{Incomplete Translation}
206 Jazillian \cite{jazillian} is a commercial solution which produces
207 extremely readable Java source code from C source code, but ony
208 translates a small portion of the C language. Jazillian is unique in
209 that in addition to {\it language migration}, it also performs {\it
210 API migration}; for example, Jazillian is intelligent enough
211 to translate {\tt char*~s1~=~strcpy(s2)} into {\tt String~s1~=~s2}.
213 Unfortunately such deep analysis is intractible for most of the C
214 language and standard library; Jazillian's documentation notes that
215 {\it ``This is not your father's language translator. It's not
216 generating ugly code that's guaranteed to work out of the
217 box... Jazillian does not always produce code that works correctly.''}
219 MoHCA-Java \cite{mohca} is the other major tool in this category, and steps
220 beyond Jazillian by providing tools for analysis of the source C++
221 abstract syntax tree. Additionally, MoHCA-Java's analysis engine is
222 extensible, making it a platform for constructing application-specific
223 translators rather than a single translation tool. However,
224 MoHCA-Java does not always generate complete Java code for all of the C++
225 programs which it accepts.
228 \subsubsection{Partial Domain Translation}
230 The c2j \cite{c2j}, c2j++ \cite{c2jpp}, Cappucinno \cite{capp},
231 and Ephedra \cite{ephedra} systems each provide support for complete
232 translation of a {\it subset} of the source language (C or C++). Each
233 of the four tools supports a progressively greater subset than the one
234 preceding it; however none covers the entire input language.
236 Ephedra, the most advanced of the four, supports most of the C++
237 language, and claims to produce ``human readable'' Java code as
238 output. Notable omissions from the input domain include support for
239 fully general pointer arithmetic, casting between unrelated types, and
240 reading from a {\tt union} via a different member than the one most
243 Unfortunately, when the program being translated is large and complex,
244 it is quite likely that it will use an unsupported feature in at least
245 one place. In the absence of a programmer who understands the source
246 program, a single anomoly is often enough to render the entire
247 translation process useless. As a result, these tools are mainly
248 useful as an {\it aid} to programmers who could normally perform the
249 conversion themselves, but want to save time by automating most of the
253 \subsection{Source-to-Binary Translation}
255 Source-to-binary translation involves a compiler for the unsafe
256 language which has been modified to emit safe bytecode.
259 \newlength{\MyLength}
260 \settowidth{\MyLength}{xmachine codex}
261 \newcommand{\MyBox}[1]{\makebox[\MyLength]{#1}}
262 \psmatrix[colsep=2,rowsep=0,nrot=:U]
264 [name=s0]\MyBox{unsafe source} & [name=s1]\MyBox{safe source} \\[0pt]
270 [name=b0]\MyBox{machine code} & [name=b1]\MyBox{safe bytecode} \\[0pt]
272 \psset{nodesep=5pt,arrows=->}
273 \ncline{s0}{b1}\bput{:U}{source-to-binary}
277 The primary occupant of this category is {\tt egcs-jvm}
278 \cite{egcsjvm}, an experimental ``JVM backend'' for the GNU Compiler
279 Collection ( {\tt gcc} ) \cite{gcc}. Since {\tt gcc} employs a highly
280 modular architecture, it {\it is} possible to add RTL code generators
281 for nonstandard processors. However, {\tt gcc}'s parsing, RTL
282 generation, and optimization layers make fundamental assumptions (such
283 as the availability of pointer math) which cannot be directly
284 supported; thus the compiler still fails for a substantial class of
287 A Java backend for the {\tt lcc} [CITE] compiler, known as {\tt
288 lcc-java} [CITE], but was not completed. {\tt lcc-java} also lacks
289 any form of system library ({\tt libc}), so very few C programs will
290 run without custom modification, which would cause them to diverge
291 from the upstream sources. Finally, {\tt lcc-java}'s memory model is
292 much more restricted; it uses a fixed-size array to represent all
293 memory, and expands the array by allocating a new array and copying,
294 which is extremely inefficient. No attempt is made to take advantage
295 of {\tt NullPointerException} checking (which costs nothing if the
296 exception is not thrown since most JVMs use the MMU to detect this).
297 Finally, {\tt lcc-java} targets Java source code, which places the
298 vast majority of NestedVM's optimizations beyond its reach, and
299 severely restricts the maximum program size {\tt lcc-java} can handle.
301 Finally, {\tt lcc-java} maintains a separate memory area for the
302 stack, which appears to limit the exchange of stack pointers and heap
303 pointers. It is unclear from the documentation how this is handled.
308 The principal difference between NestedVM and other approaches is that
309 NestedVM {\it does not} attempt to deal with source code as an input.
310 This leads immediately to three advantages:
313 \item {\bf Language Agnostic}
315 Because NestedVM does not attempt to implement the parsing and
316 code generation steps of compilation, it is freed from the
317 extremely complex task of faithfully implementing languages
318 which are often not fully or formally specified (such as C and
319 C++), and is able to support any langage for which a
320 MIPS-targeted compiler exists.
322 \item {\bf Bug-for-bug compiler compatability}
324 Since NestedVM uses the compiler's {\it output} as its own {\it
325 input}, it ensures that programs which are inadvertently
326 dependent on the vagaries of a particular compiler can still be
329 \item {\bf No build process modifications}
331 NestedVM does not modify existing build processes, which can be
332 extremely complex and dependent on strange preprocessor usage as
333 well as the complex interplay between compiler switches and
334 header file locations.
338 NestedVM's approach carries a fourth benefit as well, arising from its
342 \item {\bf No post-translation human intervention}
344 NestedVM offers total support for all non-privileged
345 instructions, registers, and resources found on a MIPS {\tt
346 R2000} CPU, including the add/multiply unit and floating point
347 coprocessor. As such, it constitutes a total function mapping
348 from the entire domain of (non-kernel-mode) programs onto (a
349 subset of) the semantics of the Java Virtual Machine. This
350 ensures that the translation process is fully automated and
351 always succeeds for valid input binaries.
354 This is a much more important factor than is obvious at first glance.
355 If post-translation human intervention is required, then the {\it
356 human becomes part of the build process}. This means that if a third
357 party library used in the project needs to be upgraded, {\it a human
358 must intervene} in the rebuild process. In addition to slowing the
359 process and introducing opportunities for error, this task often
360 requires specialized knowledge which becomes tied to the particular
361 individual performing this task, rather than being encoded in build
362 scripts which persist throughout the lifetime of the project.
364 \subsection{Why MIPS?}
366 We chose MIPS as a source format for three reasons: the availability
367 of tools to compile legacy code into MIPS binaries, the close
368 similarity (FIXME: explain) between the MIPS ISA and the Java Virtual
369 Machine, and the relatively high degree of program structure that can
370 be inferred from ABI-adherent binaries.
372 The MIPS architecture has been around for quite some time, and is well
373 supported by the GNU Compiler Collection, which is capable of
374 compiling C, C++, Java, Fortran, Pascal, and Objective C
377 The MIPS R2000 ISA bears a striking similarity to the Java Virtual
382 \item Most of the instructions in the original MIPS ISA operate only
383 on 32-bit aligned memory locations. This allows NestedVM to
384 represent memory as a Java {\tt int[]} array without introducing
385 additional overhead. The remaining non-aligned memory load
386 instructions are only rarely emitted by most compilers since
387 they carry a performance penalty on physical MIPS
390 \item Unlike its predecessor, the R2000 supports 32-bit by 32-bit
391 multiply and divide instructions as well as a single and double
392 precision floating point unit. These capabilities map nicely
393 onto Java's arithmetic instructions.
395 \item Although MIPS offers unsigned arithmetic and Java does not, few
396 MIPS instructions actually depend on non-two's-complement
397 handling of integer math. Furthermore, since most high-level
398 languages such as C do not expose access to arithmetic-overflow
399 exceptions, these instructions are rarely found except in
400 hand-coded assembler. In the few situations where these
401 instructions {\it are} encountered, the {\tt unsigned int} is
402 cast (bitwise) to a Java {\tt long}, the operation is performed,
403 and the result is cast back. On architectures offering 64-bit
404 integer math this conversion carries no overhead.
408 Finally, the MIPS ISA and ABI convey quite a bit of information about
409 program structure. This information can be used for optimization
414 \item The structure of MIPS branching and jump instructions make it
415 easy to infer the set of likely target instructions.
417 \item The MIPS ABI specifies particular registers as caller-save and
418 callee-save, as well as designating a register for the return
419 address after a function call. This allows NestedVM to optimize
420 many operations for the common case of ABI-adherent binaries.
422 \item All MIPS instructions are exactly 32 bits long.
428 \subsection{Binary-to-Source}
430 The simplest operational mode for NestedVM is binary-to-source
431 translation. In this mode, NestedVM translates MIPS binaries into
432 Java source code, which is then fed to a Java compiler in order to
433 produce bytecode files:
436 \newlength{\MyLength}
437 \settowidth{\MyLength}{xmachine codex}
438 \newcommand{\MyBox}[1]{\makebox[\MyLength]{#1}}
439 \psmatrix[colsep=2,rowsep=0,nrot=:U]
442 [name=s0]\MyBox{unsafe source} & [name=s1]\MyBox{safe source} \\[0pt]
448 [name=b0]\MyBox{machine code} & [name=b1]\MyBox{safe bytecode} \\[0pt]
449 \psset{nodesep=5pt,arrows=->}
450 \ncline{s0}{b0}\bput{:U}{\tt gcc}
451 \ncline{s1}{b1}\aput{:U}{\tt javac}
452 \ncline{b0}{s1}\naput{\tt NestedVM}
457 \begin{minipage}[c]{7in}%
459 {\footnotesize\begin{verbatim}
460 private final static int r0 = 0;
461 private int r1, r2, r3,...,r30;
462 private int r31 = 0xdeadbeef;
463 private int pc = ENTRY_POINT;
492 System.err.println(``Exited.'');
499 {\footnotesize\begin{verbatim}
500 public void run_0x10000() {
517 pubic void run_0x10200() {
528 public void trampoline() {
530 switch(pc&0xfffffe00) {
531 case 0x10000: run_0x10000(); break;
532 case 0x10200: run_0x10200(); break;
541 \caption{\label{code1} Trampoline transformation necessitated by Java's 64kb method size limit}
544 Translating unsafe code for use within a JVM proceeds as follows:
548 \item Compile the source code to a statically linked binary, targeting
549 the MIPS R2000 ISA. Typically this will involve linking against
550 {\tt libc}, which translates system requests (such as {\tt
551 open()}, {\tt read()}, or {\tt write()}) into appropriate
552 invocations of the MIPS {\tt SYSCALL} instruction. Any other
553 libraries which are not tied to a particular OS kernel can be
554 linked in (even in binary form) using standard linker commands.
556 \item Invoke {\tt NestedVM} on the statically linked binary.
558 \item Compile the resulting {\tt .java} code using {\tt jikes}
559 \cite{jikes} or {\tt javac}.
561 \item From java code, invoke the {\tt run()} method on the generated
562 class. This is equivalent to the {\tt main()} entry point.
566 \subsubsection{Optimizations}
568 Generating Java source code instead of bytecode frees NestedVM from
569 having to perform simple constant propagation optimizations, as most
570 Java compilers already do this. A recurring example is the treatment
571 of the {\tt r0} register, which is fixed as {\tt 0} in the MIPS ISA.
573 Lacking the ability to generate specially optimized bytecode
574 sequences, a straightforward mapping of the general purpose hardware
575 registers to 32 {\tt int} fields turned out to be optimal.
578 \epsfig{file=charts/chart1,width=3in}
580 Unfortunately, Java imposes a 64kb limit on the size of the bytecode
581 for a single method. This presents a problem for NestedVM, and
582 necessitates a {\it trampoline transformation}, as shown in
583 Figure~\ref{code1}. With this trampoline in place, large binaries can
584 be handled without much difficulty -- fortunately, there is no
585 corresponding limit on the size of a classfile as a whole.
587 One difficulty that arose as a result of using the trampoline
588 transformation was the fact that {\tt javac} and {\tt jikes} are
589 unable to properly optimize its switch statements. For example, the
590 following code is compiled into a comparatively inefficient {\tt
595 switch(pc&0xffffff00) {
596 case 0x00000100: run_100(); break;
597 case 0x00000200: run_200(); break;
598 case 0x00000300: run_300(); break;
602 Whereas the next block of code code optimized into a {\tt
614 This problem was surmounted by switching on a denser set of {\tt case}
615 values, which is more amenable to the {\tt TABLESWITCH} structure.
616 This change alone nearly doubled the speed of the compiled binary.
618 The next performance improvement came from tuning the size of the
619 methods invoked from the trampoline. Trial and error led to the
620 onclusion that HotSpot \cite{hotspot} -- the most widely deployed JVM
621 -- performs best when 128 MIPS instructions are mapped to each method.
623 \epsfig{file=chart5,width=3in}
625 \epsfig{file=chart6,width=3in}
627 This phenomenon is due to two factors:
631 \item While the trampoline method's {\tt switch} statement can be
632 coded as a {\tt TABLESWITCH}, the {\tt switch} statement
633 within the individual methods is to sparse to encode this way.
635 \item Hybrid Interpretive-JIT compilers such as HotSpot generally
636 favor smaller methods since they are easier to compile and are
637 better candidates for compilation in ``normal'' programs (unlike
642 After tuning method sizes, our next performance boost came from
643 eliminating exraneous case branches. Having case statements before
644 each instruction prevents JIT compilers from being able to optimize
645 across instruction boundaries, since control flow can enter the body
646 of a {\tt switch} statement at any of the {\tt case}s. In order to
647 eliminate unnecessary case statements we needed to identify all
648 possible jump targets. Jump targets can come from three sources:
652 \item {\bf The {\tt .text} segment}
654 Every instruction in the text segment is scanned, and every
655 branch instruction's destination is added to the list of
656 possible branch targets. In addition, any function that sets
657 the link register is added to the list \footnote{actually {\tt addr+8}}.
658 Finally, combinations of {\tt LUI} (Load Upper Immediate) and
659 {\tt ADDIU} (Add Immediate Unsigned) are scanned for possible
660 addresses in the {\tt .text} segment since this combination of
661 instructions is often used to load a 32-bit word into a
664 \item {\bf The {\tt .data} segment}
666 When compiling {\tt switch} statements, compilers often use a
667 jump table stored in the {\tt .data} segment. Unfortunately
668 they typically do not identify these jump tables in any way.
669 Therefore, the entire {\tt .data} segment is conservatively
670 scanned for possible addresses in the {\tt .text} segment.
672 \item {\bf The symbol table}
674 This is mainly used as a backup. Scanning the {\tt .text} and
675 {\tt .data} segments should identify any possible jump targets;
676 however, adding all function symbols in the ELF symbol table
677 also catches functions that are never called directly from the
678 MIPS binary, such as those invoked only via the NestedVM
679 runtime's {\tt call()} method.
683 Eliminating unnecessary {\tt case} statements provided a 10-25\% speed
686 Despite all the above optimizations, one insurmountable obstacle
687 remained: the Java {\tt .class} file format limits the constant pool
688 to 65535 entries. Every integer literal greater than {\tt 32767}
689 requires an entry in this pool, and each branch instruction generates
692 One suboptimal solution was to express constants as offsets from a few
693 central values; for example ``{\tt pc~=~N\_0x00010000~+~0x10}'' (where
694 {\tt N\_0x000100000} is a non-final field to prevent {\tt javac} from
695 inlining it). This was sufficient to get reasonably large binaries to
696 compile, and caused only a small (approximately 5\%) performance
697 degredation and a similarly small increase in the size of the {\tt
698 .class} file. However, as we will see in the next section, compiling
699 directly to {\tt .class} files (without the intermediate {\tt .java}
700 file) eliminates this problem entirely.
703 \subsection{Binary-to-Binary}
705 After implementing the binary-to-source compiler, a binary-to-binary
706 translation mode was added.
709 \newlength{\MyLength}
710 \settowidth{\MyLength}{xmachine codex}
711 \newcommand{\MyBox}[1]{\makebox[\MyLength]{#1}}
712 \psmatrix[colsep=2,rowsep=0,nrot=:U]
714 [name=s0]\MyBox{unsafe source} & [name=s1]\MyBox{safe source} \\[0pt]
720 [name=b0]\MyBox{machine code} & [name=b1]\MyBox{safe bytecode} \\[0pt]
722 \psset{nodesep=5pt,arrows=->}
723 \ncline{s0}{b0}\bput{:U}{\tt gcc}
724 \ncline{b0}{b1}\naput{\tt NestedVM}
728 This mode has several advantages:
732 \item There are quite a few interesting bytecode sequences that cannot
733 be generated as a result of compiling Java source code.
735 \item Directly generating {\tt .class} files Eliminates the
736 time-consuming {\tt javac} step.
738 \item Direct compilation to {\tt .class} files opens up the
739 interesting possibility of dynamically translating MIPS binaries
740 and loading them via {\tt ClassLoader.fromBytes()} {\it at
741 deployment time}, eliminating the need to compile binaries ahead
746 Most of the performance improvemen where made where in the handling of
747 branch instructions and in taking advantage of the JVM stack to
748 eliminate unnecessary {\tt LOAD}s and {\tt STORE}s to local variables.
750 \epsfig{file=chart7,width=3in}
752 The first optimization gained by direct bytecode generation came from
753 the use of the JVM {\tt GOTO} instruction. Despite the fact that the
754 Java {\it language} does not have a {\tt goto} keyword, the VM does in
755 fact have a corresponding instruction which is used quite heavily by
756 {\tt javac}. NestedVM's binary-to-binary mode exploits this
757 instruction to avoid emitting inefficient {\tt switch..case}
760 Related to the {\tt GOTO} instruction is branch statement
761 optimization. When emitting source code, NestedVM translates branches
762 into Java source code like this:
764 {\footnotesize\begin{verbatim}
771 This requires a branch in the JVM {\it regardless} of whether the MIPS
772 branch is actually taken. If {\tt condition} is false the JVM has to
773 jump over the code to set {\tt pc} and go back to the {\tt switch}
774 statemenmt; if {\tt condition} is true the JVM has to jump to the {\tt
775 switch} block. By generating bytecode directly, NestedVM is able to
776 emit a JVM bytecode branching directly to the address corresponding to
777 the target of the MIPS branch. In the case where the branch is not
778 taken the JVM doesn't branch at all.
780 A side effect of the previous two optimizations is a solution to the
781 excess constant pool entries problem. When jumps are implemented as
782 {\tt GOTO}s and branches are taken directly, the {\tt pc} field does
783 not need to be set. This eliminates a huge number of constant pool
784 entries. The {\tt .class} file constant pool size limit is still
785 present, but it is less likely to be encountered.
787 Implementation of the MIPS delay slot offers another opportunity for
788 bytecode-level optimization. In order to take advantage of
789 instructions already in the pipeline, the MIPS ISA specifies that the
790 instruction after a jump or branch is always executed, even if the
791 jump/branch is taken. This instruction is referred to as the ``delay
792 slot\footnote{Newer MIPS CPUs have pipelines that are much larger than
793 early MIPS CPUs, so they have to discard instructions anyways}.'' The
794 instruction in the delay slot is actually executed {\it before} the
795 branch is taken. To further complicate matters, values from the
796 register file are loaded {\it before} the delay slot is executed.
798 Fortunately there is a very elegent solution to this problem which can
799 be expressed in JVM bytecode. When a branch instruction is
800 encountered, the registers needed for the comparison are pushed onto
801 the stack to prepare for the JVM branch instruction. Then, {\it
802 after} the values are on the stack the delay slot instruction is
803 emitted, followed by the actual JVM branch instruction. Because the
804 values were pushed to the stack before the delay slot was executed, any
805 changes the delay slot made to the registers are not visible to the
808 One final advantage that generating bytecode directly allows is a
809 reduction in the size of the ultimate {\tt .class} file. All the
810 optimizations above lead to more compact bytecode as a beneficial side
811 effect; in addition, NestedVM performs a few additional optimizations.
813 When encountering the following {\tt switch} block, both {\tt javac}
814 and {\tt jikes} generate redundant bytecode.
816 {\footnotesize\begin{verbatim}
818 case 0x1: run_1(); break;
819 case 0x2: run_2(); break
821 case 0x100: run_100(); break;
825 The first bytecode in each case arm in the switch statement is {\tt
826 ALOAD\_0} to prepare for a {\tt INVOKESPECIAL} call. By simply
827 lifting this bytecode outside of the {\tt switch} statement, each {\tt
828 case} arm shrinks by one instruction.
830 \subsubsection{Compiler Flags}
832 Although NestedVM perfectly emulates a MIPS R2000 CPU, its performance
833 profile is nothing like that of actual silicon. In particular, {\tt
834 gcc} makes several optimizations that increase performance on an
835 actually MIPS CPU but actually decrease the performance of
836 NestedVM-generated bytecode. We found the following compiler options
837 could be used to improve performance:
841 \item {\tt -falign-functions}
843 Normally a function's location in memory has no effect on its
844 execution speed. However, in the NestedVM binary translator,
845 the {\tt .text} segment is split on power-of-two boundaries. If
846 a function starts near the end of one of these boundaries, a
847 performance critical part of the function winds up spanning two
848 Java methods. Telling {\tt gcc} to align all functions along
849 these boundaries decreases the chance of this sort of splitting.
851 \item {\tt -fno-rename-registers}
853 On an actual silicon chip, using additional registers carries no
854 performance penalty (as long as none are spilled to the stack).
855 However, when generating bytecode, using {\it fewer}
856 ``registers'' helps the JVM optimize the machine code it
857 generates by simplifying the constraints it needs to deal with.
858 Disabling register renaming has this effect.
860 \item {\tt -fno-schedule-insns}
862 Results of MIPS load operations are not available until {\it
863 two} instructions after the load. Without the {\tt
864 -fno-schedule-insns} instruction, {\tt gcc} will attempt to
865 reorder instructions to do other useful work during this period
866 of unavailability. NestedVM is under no such constraint, so
867 removing this reordering typically generates simpler machine
872 Enabling this instruction causes {\tt gcc} to use the system
873 {\tt memcpy()} routine instead of generating loads and stores.
874 As explained in the next section, the NestedVM runtime
875 implements {\tt memcpy()} using {\tt System.arraycopy()}, which
876 is substantially more efficient.
878 \item {\tt -ffunction-sections -fdata-sections}
880 These two options are used in conjunction with the {\tt
881 --gc-section} linker option, prompting the linker to more
882 aggressively prune dead code.
886 The effects of the various optimizations presented in this chapter are
887 summarized in the table below.
889 \epsfig{file=chart4,width=3in}
891 \epsfig{file=chart3,width=3in}
893 \section{Experiences}
895 \subsection{FreeType, {\tt libmspack}, and {\tt libjpeg}}
897 The Ibex Project utilizes three libraries for which no Java-only
898 equivalent exists. The first is the FreeType font library, which
899 parses, hints, and rasterizes TrueType and Postscript fonts with
900 exceptional quality. The project also needed an open source JPEG
901 decompressor; surprisingly, none exist for Java. While encoders are
904 These three libraries make minimal use of the standard library and OS
905 services, and are all written in very portable ANSI C code, which made
906 them easy targets for initial development.
908 \subsection{The GNU Compiler Collection}
910 Our next target, {\tt gcc}, was chosen in order to relieve developers
911 from the time-consuming and complex task of building a compiler
912 themselves. The Ibex Project requires a specially configured and
913 patched version of {\tt gcc} and its ahead-of-time Java compiler ({\tt
914 gcj}) which is not distributed in binary form.
916 GCC was the first ``major'' application NestedVM was used on, and
917 drove the development of most of the system library interface
918 development; particularly support for {\tt fork()} and {\tt exec()},
919 which require the NestedVM Runtime to perform binary-to-bytecode
920 translation on the fly.
922 GCC also makes extensive use of 64-bit integers ({\tt long long}),
923 which -- for performance reasons -- are typically manipulated using
924 nonobvious instruction sequences on the 32-bit MIPS architecture.
925 Dealing with these operations uncovered a number of bugs in the
928 Despite our original goal, we have not yet been able to translate the
929 {\tt C++} or Java front-ends, as the resulting binary produces a
930 trampoline which exceeds the maximum size of a single method. Future
931 work will explore a multi-level trampoline to address this issue.
935 \subsection{\TeX and LINPACK}
937 In order to distinguish NestedVM from other single-language
938 translators for the JVM, we undertook the task of translating \TeX 89
939 (written in Pascal) and the Fortran source code for LINPACK into Java
942 Although actually producing the initial MIPS binaries from the \TeX\
943 source code turned out to be an exceptionally tedious and frustrating
944 task, the resulting binary translated and executed perfectly on the
945 first run, as did LINPACK. Our reward for this effort was typesetting
946 our presentation of NestedVM using NestedVM itself. We have also had
947 initial successes running \TeX\ in a Java Applet, and intend to
948 produce a {\tt jar} for embedding \TeX\ code (``\TeX lets'') in web
949 pages without the use of a post-processing tool.
951 The LINPACK benchmark called our attention to Java's lack of an API
952 for checking the ``cpu time'' of a process. Unfortunately we had to
953 substitute wall-clock time on an otherwise-quiescent machine as an
958 \section{The NestedVM Runtime}
960 In addition to binary-to-source and binary-to-binary translation,
961 NestedVM also includes a MIPS binary interpreter. All three
962 translation approaches expose the same API to both the translated
963 binary and the surrounding VM (including peer Java code).
965 \subsection{The Runtime Class}
967 The runtime fulfills four roles:
971 \item It provides a simple, consistent external interface. The method
972 of actually executing the code (currently only translated
973 binaries and the interpreter) can be changed without any code
974 changes to the caller because only runtime exposes a public
975 interface. This includes methods to pass arguments to the
976 binary's {\tt main()} function, read and write from memory, and
977 call individual functions in the binary.
979 \item It manages the process's memory. The runtime class contains
980 large {\tt int[]} arrays that represent the process`s entire
981 memory space. Subclasses read and write to these arrays as
982 required by the instructions they are executing, and can expand
983 their memory space using the {\tt sbrk} system call.
985 \item The runtime provides access to the host file system and standard
986 I/O streams. Subclasses of {\tt runtime} can access the file
987 system through standard UNIX syscalls ({\tt read()}, {\tt
988 write()}, {\tt open()}, etc). The runtime manages the file
989 descriptor table that maps UNIX file descriptors to Java {\tt
990 RandomAccessFile}s, {\tt InputStream}s, {\tt OutputStream}s, and
993 \item It provides general OS services, including {\tt sleep()}, {\tt
994 gettimeofday()}, {\tt getpagesize()}, {\tt sysconf()}, {\tt
999 \section{Future Directions}
1001 Although we have only implemented it for the Java Virtual Machine, our
1002 technique generalizes to other safe bytecode architectures. In
1003 particular we would like to demonstrate this generality by retargeting
1004 the translator to the Microsoft Intermediate Language \cite{msil}.
1006 Additionally, we would like to explore other uses for dynamic loading
1007 of translated MIPS binaries by combining NestedVM (which itself is
1008 written in Java) and the {\tt ClassLoader.fromBytes()} mechanism.
1011 \section{Conclusion}
1013 We have presented a novel technique for using libraries written in
1014 unsafe languages within a safe virtual machine without resorting to
1015 native interfaces. We have implemented this technique in NestedVM,
1016 which is currently used by the Ibex project\footnote{{\tt
1017 http://www.ibex.org}} to perform font rasterization (via {\tt
1018 libfreetype}), JPEG decoding (via {\tt libjpeg}), and CAB archive
1019 extraction (via {\tt libmspack}), three libraries for which no
1020 equivalent Java classes exist.
1022 NestedVM is available under an open source license, and can be
1025 http://nestedvm.ibex.org
1029 \section{Appendix: Testing Methodology}
1031 All times are measured in seconds. These were all run on a dual 1Ghz
1032 Macintosh G4 running Apple's latest JVM (Sun HotSpot JDK 1.4.1). Each
1033 test was run 8 times within a single VM. The highest and lowest times
1034 were removed and the remaining 6 were averaged. In each case only the
1035 first run differed significantly from the rest.
1037 The {\tt libjpeg} test consisted of decoding a 1280x1024 jpeg and
1038 writing a tga. The {\tt mspack} test consisted of extracting all
1039 members from {\tt arial32.exe}, {\tt comic32.exe}, {\tt times32.exe},
1040 and {\tt verdan32.exe}. The {\tt libfreetype} test consisted of
1041 rendering ASCII characters 32-127 of {\tt Comic.TTF} at sizes from 8
1042 to 48 incrementing by 4 for a total of 950 glyphs.
1044 \bibliography{nestedvm}