doc/nestedvm.ivme04.tex

   1 \documentclass{acmconf}
   2 \usepackage{graphicx}
   3 \usepackage{multicol}
   4 \usepackage{amssymb,amsmath,epsfig,alltt}
   5 \sloppy
   6 \usepackage{palatino}
   7 \usepackage{pdftricks}
   8 \begin{psinputs}
   9   \usepackage{pstricks}
  10   \usepackage{pst-node}
  11 \end{psinputs}
  12 \usepackage{parskip}
  13 \usepackage{tabularx}
  14 \usepackage{alltt}
  15 \bibliographystyle{amsplain}
  16
  17 \title{\textbf{\textsf{
  18 Complete Translation of Unsafe Native Code to Safe Bytecode
  19 }}}
  20 \date{}
  21 \author{\begin{tabular}{@{}c@{}}
  22         {\em {Brian Alliet}} \\
  23         {Rochester Institute of Technology}\\
  24         {\tt bja8464@cs.rit.edu}
  25    \end{tabular}\hskip 1in\begin{tabular}{@{}c@{}}
  26         {\em {Adam Megacz}} \\
  27         {University of California, Berkeley} \\
  28         {\tt megacz@cs.berkeley.edu}
  29 \end{tabular}}
  30 \begin{document}
  31
  32 \maketitle
  33
  34 \begin{abstract}
  35
  36 Most existing techniques for using code written in an unsafe language
  37 within a safe virtual machine involve transformations from one source
  38 code language (such as C) to another (such as Java) and then to
  39 virtual machine bytecodes.  We present an alternative approach which
  40 uses a standard compiler to turn unsafe source code into unsafe MIPS
  41 binaries, which are then translated into safe virtual machine
  42 bytecodes.  This approach offers four key advantages over existing
  43 techniques:
  44
  45 \begin{itemize}
  46 \item Total coverage of all language features
  47 \item No post-translation human intervention
  48 \item No build process modifications
  49 \item Bug-for-bug compiler compatability
  50 \end{itemize}
  51
  52 We have implemented this technique in NestedVM, a binary-to-source and
  53 binary-to-binary translator targeting the Java Virtual Machine.
  54 NestedVM-translated versions of the {\tt libfreetype}, {\tt libjpeg},
  55 and {\tt libmspack} libraries are currently in production use.
  56 Performance measurements indicate a best case performance within 3x of
  57 native code and worst case typically within 10x, making it an
  58 attractive solution for code which is not performance-critical.
  59
  60 \end{abstract}
  61
  62 \section{Introduction}
  63
  64 Unsafe languages such as C \cite{KR} and C++ \cite{soustroup} have
  65 been in use much longer than any of today's widely accepted safe
  66 languages such as Java \cite{java} and C\# \cite{csharp}.  Consequently, there is
  67 a huge library of software written in these languages.  Although safe
  68 languages offer substantial benefits, their comparatively young age
  69 often puts them at a disadvantage when breadth of existing support
  70 code is an important criterion.
  71
  72 The typical solution to this dilemma is to use a native interface such
  73 as JNI \cite{jni} or CNI \cite{cni} to invoke unsafe code from within a
  74 virtual machine or otherwise safe environment.  Unfortunately, there
  75 are a number of situations in which this is not an acceptable
  76 solution.  These situations can be broadly classified into two
  77 categories: {\it security concerns} and {\it portability concerns}.
  78
  79 Using Java as an example, JNI and CNI are prohibited in a number of
  80 contexts, including applets environments and servlet containers with a
  81 {\tt SecurityManager}.  Additionally, even in the context of trusted
  82 code, {\tt native} methods invoked via JNI are susceptible to buffer
  83 overflow and heap corruption attacks which are not a concern for
  84 verified bytecode.
  85
  86 The second class of disadvantages revolves around portability
  87 concerns; native interfaces require the native library to be compiled
  88 ahead of time, for every architecture on which they will be
  89 deployed.  This is unworkable for situations in which the full set of
  90 target architectures is not known at deployment time.  Additionally,
  91 some JVM platform variants such as J2ME \cite{j2me} simply do not offer
  92 support for native code.
  93
  94 The technique we present here uses typical compiler to compile unsafe
  95 code into a MIPS binary, which is then translated on an
  96 instruction-by-instruction basis into Java bytecode.  The technique
  97 presented here is general; we anticipate that it can be applied to
  98 other secure virtual machines such as Microsoft's .NET \cite{msil}, Perl
  99 Parrot \cite{parrot}, or Python bytecode \cite{python}.
 100
 101 \section{Approaches to Translation}
 102
 103 The four program representations of interest in this context, along
 104 with their specific types in the C-to-JVM instantiation of the
 105 problem are shown in the following diagram:
 106
 107 \begin{pdfpic}
 108 \newlength{\MyLength}
 109 \settowidth{\MyLength}{machine code}
 110 \newcommand{\MyBox}[1]{\makebox[\MyLength][c]{#1}}
 111 \begin{psmatrix}[colsep=2,rowsep=0]
 112   & \\[0pt]
 113   [name=s0]\MyBox{unsafe source} & [name=s1]\MyBox{safe source}   \\[0pt]
 114   [name=s00]\MyBox{\tt (.c)} & [name=s11]\MyBox{\tt (.java)}   \\[0pt]
 115   & \\[0pt]
 116   & \\[0pt]
 117   & \\[0pt]
 118   [name=b0]\MyBox{machine code}  & [name=b1]\MyBox{safe bytecode} \\[0pt]
 119   [name=b00]\MyBox{\tt (.o)}  & [name=b11]\MyBox{\tt (.class)} \\
 120   & \\[0pt]
 121   \psset{nodesep=5pt,arrows=->}
 122 \end{psmatrix}
 123 \end{pdfpic}
 124
 125 To illustrate the context of this diagram, the following arcs show the
 126 translations performed by a few familiar tools:
 127
 128 \begin{pdfpic}
 129 \newlength{\MyLength}
 130 \settowidth{\MyLength}{xmachine codex}
 131 \newcommand{\MyBox}[1]{\makebox[\MyLength]{#1}}
 132 \psmatrix[colsep=2,rowsep=0,nrot=:D]
 133   & \\[0pt]
 134   [name=s0]\MyBox{unsafe source} & [name=s1]\MyBox{safe source}   \\[0pt]
 135   & \\[0pt]
 136   & \\[0pt]
 137   & \\[0pt]
 138   & \\[0pt]
 139   & \\[0pt]
 140   [name=b0]\MyBox{machine code}  & [name=b1]\MyBox{safe bytecode} \\[0pt]
 141   & \\[0pt]
 142   \psset{nodesep=5pt,arrows=->}
 143   \ncline{s0}{b0}\bput{:U}{\tt gcc}
 144   \ncline{s1}{b0}\bput{:D}{\tt gcj}
 145   \ncline{s1}{b1}\aput{:U}{\tt javac}
 146   \ncline{b1}{b0}\aput{:D}{\tt gcj}\bput{:D}{JITs}
 147 \endpsmatrix
 148 \end{pdfpic}
 149
 150 Techniques for translating unsafe code into VM bytecode generally fall
 151 into four categories, which we expand upon in the next two sections:
 152
 153 \begin{itemize}
 154 \item source-to-source translation
 155 \item source-to-binary translation
 156 \item binary-to-source translation
 157 \item binary-to-binary translation
 158 \end{itemize}
 159
 160 \section{Existing Work}
 161
 162 \subsection{Source-to-Source Translation}
 163
 164 The most common technique employed is partial translation from unsafe
 165 source code to safe source code:
 166
 167 \begin{pdfpic}
 168 \newlength{\MyLength}
 169 \settowidth{\MyLength}{xmachine codex}
 170 \newcommand{\MyBox}[1]{\makebox[\MyLength]{#1}}
 171 \psmatrix[colsep=2,rowsep=0,nrot=:U]
 172   & \\[0pt]
 173   & \\[0pt]
 174   [name=s0]\MyBox{unsafe source} & [name=s1]\MyBox{safe source}   \\[0pt]
 175   & \\[0pt]
 176   & \\[0pt]
 177   & \\[0pt]
 178   & \\[0pt]
 179   & \\[0pt]
 180   [name=b0]\MyBox{machine code}  & [name=b1]\MyBox{safe bytecode} \\[0pt]
 181   & \\[0pt]
 182   \psset{nodesep=5pt,arrows=->}
 183   \ncline{s0}{s1}\aput{:U}{source-to}\bput{:U}{source}
 184   \ncline{s1}{b1}\aput{:U}{\tt javac}
 185 \endpsmatrix
 186 \end{pdfpic}
 187
 188 A number of existing systems employ this technique; they can
 189 be divided into two categories: those which perform a partial
 190 translation which is completed by a human, and those which perform a
 191 total translation but fail (yield an error) on a large class of input
 192 programs.
 193
 194
 195 \subsubsection{Incomplete Translation}
 196
 197 Jazillian \cite{jazillian} is a commercial solution which produces
 198 extremely readable Java source code from C source code, but ony
 199 translates a small portion of the C language.  Jazillian is unique in
 200 that in addition to {\it language migration}, it also performs {\it
 201 API migration}; for example, Jazillian is intelligent enough
 202 to translate {\tt char*~s1~=~strcpy(s2)} into {\tt String~s1~=~s2}.
 203
 204 Unfortunately such deep analysis is intractible for most of the C
 205 language and standard library; Jazillian's documentation notes that
 206 {\it ``This is not your father's language translator.  It's not
 207 generating ugly code that's guaranteed to work out of the
 208 box... Jazillian does not always produce code that works correctly.''}
 209
 210 MoHCA-Java \cite{mohca} is the other major tool in this category, and steps
 211 beyond Jazillian by providing tools for analysis of the source C++
 212 abstract syntax tree.  Additionally, MoHCA-Java's analysis engine is
 213 extensible, making it a platform for constructing application-specific
 214 translators rather than a single translation tool.  However,
 215 MoHCA-Java does not always generate complete Java code for all of the C++
 216 programs which it accepts.
 217
 218
 219 \subsubsection{Partial Domain Translation}
 220
 221 The c2j \cite{c2j}, c2j++ \cite{c2jpp}, Cappucinno \cite{capp},
 222 and Ephedra \cite{ephedra} systems each provide support for complete
 223 translation of a {\it subset} of the source language (C or C++).  Each
 224 of the four tools supports a progressively greater subset than the one
 225 preceding it; however none covers the entire input language.
 226
 227 Ephedra, the most advanced of the four, supports most of the C++
 228 language, and claims to produce ``human readable'' Java code as
 229 output.  Notable omissions from the input domain include support for
 230 fully general pointer arithmetic, casting between unrelated types, and
 231 reading from a {\tt union} via a different member than the one most
 232 recently written.
 233
 234 Unfortunately, when the program being translated is large and complex,
 235 it is quite likely that it will use an unsupported feature in at least
 236 one place.  In the absence of a programmer who understands the source
 237 program, a single anomoly is often enough to render the entire
 238 translation process useless.  As a result, these tools are mainly
 239 useful as an {\it aid} to programmers who could normally perform the
 240 conversion themselves, but want to save time by automating most of the
 241 process.
 242
 243
 244 \subsection{Source-to-Binary Translation}
 245
 246 Source-to-binary translation involves a compiler for the unsafe
 247 language which has been modified to emit safe bytecode.
 248
 249 \begin{pdfpic}
 250 \newlength{\MyLength}
 251 \settowidth{\MyLength}{xmachine codex}
 252 \newcommand{\MyBox}[1]{\makebox[\MyLength]{#1}}
 253 \psmatrix[colsep=2,rowsep=0,nrot=:U]
 254   & \\[0pt]
 255   [name=s0]\MyBox{unsafe source} & [name=s1]\MyBox{safe source}   \\[0pt]
 256   & \\[0pt]
 257   & \\[0pt]
 258   & \\[0pt]
 259   & \\[0pt]
 260   & \\[0pt]
 261   [name=b0]\MyBox{machine code}  & [name=b1]\MyBox{safe bytecode} \\[0pt]
 262   & \\[0pt]
 263   \psset{nodesep=5pt,arrows=->}
 264   \ncline{s0}{b1}\bput{:U}{source-to-binary}
 265 \endpsmatrix
 266 \end{pdfpic}
 267
 268 The primary occupant of this category is {\tt egcs-jvm}
 269 \cite{egcsjvm}, an experimental ``JVM backend'' for the GNU Compiler
 270 Collection ( {\tt gcc} ) \cite{gcc}.  Since {\tt gcc} employs a highlym
 271 odular architecture, it {\it is} possible to add RTL code generators
 272 for nonstandard processors.  However, {\tt gcc}'s parsing, RTL
 273 generation, and optimization layers make fundamental assumptions (such
 274 as the availability of pointer math) which cannot be directly
 275 supported; thus the compiler still fails for a substantial class of
 276 input programs.
 277
 278
 279
 280 \section{NestedVM}
 281
 282 The principal difference between NestedVM and other approaches is that
 283 NestedVM {\it does not} attempt to deal with source code as an input.
 284 This leads immediately to three advantages:
 285
 286 \begin{itemize}
 287 \item {\bf Total coverage of all language features}
 288
 289       Because NestedVM does not attempt to implement the parsing and
 290       code generation steps of compilation, it is freed from the
 291       extremely complex task of faithfully implementing languages
 292       which are often not fully or formally specified (such as C and
 293       C++).
 294
 295 \item {\bf No build process modifications}
 296
 297       NestedVM does not modify existing build processes, which can be
 298       extremely complex and dependent on strange preprocessor usage as
 299       well as the complex interplay between compiler switches and
 300       header file locations.
 301
 302 \item {\bf Bug-for-bug compiler compatability}
 303
 304       Since NestedVM uses the compiler's {\it output} as its own {\it
 305       input}, it ensures that programs which are inadvertently
 306       dependent on the vagaries of a particular compiler can still be
 307       used.
 308
 309 \end{itemize}
 310
 311 NestedVM's approach carries a fourth benefit as well, arising from its
 312 totality:
 313
 314 \begin{itemize}
 315 \item {\bf No post-translation human intervention}
 316
 317       NestedVM offers total support for all non-privileged
 318       instructions, registers, and resources found on a MIPS {\tt
 319       R2000} CPU, including the add/multiply unit and floating point
 320       coprocessor.  As such, it constitutes a total function mapping
 321       from the entire domain of (non-kernel-mode) programs onto (a
 322       subset of) the semantics of the Java Virtual Machine.  This
 323       ensures that the translation process is fully automated and
 324       always succeeds for valid input binaries.
 325 \end{itemize}
 326
 327 This is a much more important factor than is obvious at first glance.
 328 If post-translation human intervention is required, then the {\it
 329 human becomes part of the build process}.  This means that if a third
 330 party library used in the project needs to be upgraded, {\it a human
 331 must intervene} in the rebuild process.  In addition to slowing the
 332 process and introducing opportunities for error, this task often
 333 requires specialized knowledge which becomes tied to the particular
 334 individual performing this task, rather than being encoded in build
 335 scripts which persist throughout the lifetime of the project.
 336
 337 \subsection{Why MIPS?}
 338
 339 We chose MIPS as a source format for three reasons: the availability
 340 of tools to compile legacy code into MIPS binaries, the close
 341 similarity between the MIPS ISA and the Java Virtual Machine, and the
 342 relatively high degree of program structure that can be inferred from
 343 ABI-adherent binaries.
 344
 345 The MIPS architecture has been around for quite some time, and is well
 346 supported by the GNU Compiler Collection, which is capable of
 347 compiling C, C++, Java, Fortran, Pascal, and Objective C
 348 into MIPS binaries.
 349
 350 The MIPS R2000 ISA bears a striking similarity to the Java Virtual
 351 Machine:
 352
 353 \begin{itemize}
 354
 355 \item Most of the instructions in the original MIPS ISA operate only
 356       on 32-bit aligned memory locations. This allows NestedVM to
 357       represent memory as a Java {\tt int[]} array without introducing
 358       additional overhead.  The remaining non-aligned memory load
 359       instructions are only rarely emitted by most compilers since
 360       they carry a performance penalty on physical MIPS
 361       implementations.
 362
 363 \item Unlike its predecessor, the R2000 supports 32-bit by 32-bit
 364       multiply and divide instructions as well as a single and double
 365       precision floating point unit.  These capabilities map nicely
 366       onto Java's arithmetic instructions.
 367
 368 \end{itemize}
 369
 370 Finally, the MIPS ISA and ABI convey quite a bit of information about
 371 program structure.  This information can be used for optimization
 372 purposes:
 373
 374 \begin{itemize}
 375
 376 \item The structure of MIPS branching and jump instructions make it
 377       easy to infer the set of likely target instructions.
 378
 379 \item The MIPS ABI specifies particular registers as caller-save and
 380       callee-save, as well as designating a register for the return
 381       address after a function call.  This allows NestedVM to optimize
 382       many operations for the common case of ABI-adherent binaries.
 383
 384 \item All MIPS instructions are exactly 32 bits long.
 385
 386 \end{itemize}
 387
 388
 389
 390 \subsection{Binary-to-Source}
 391
 392 The simplest operational mode for NestedVM is binary-to-source
 393 translation.  In this mode, NestedVM translates MIPS binaries into
 394 Java source code, which is then fed to a Java compiler in order to
 395 produce bytecode files:
 396
 397 \begin{pdfpic}
 398 \newlength{\MyLength}
 399 \settowidth{\MyLength}{xmachine codex}
 400 \newcommand{\MyBox}[1]{\makebox[\MyLength]{#1}}
 401 \psmatrix[colsep=2,rowsep=0,nrot=:U]
 402   & \\[0pt]
 403   & \\[0pt]
 404   [name=s0]\MyBox{unsafe source} & [name=s1]\MyBox{safe source}   \\[0pt]
 405   & \\[0pt]
 406   & \\[0pt]
 407   & \\[0pt]
 408   & \\[0pt]
 409   & \\[0pt]
 410   [name=b0]\MyBox{machine code}  & [name=b1]\MyBox{safe bytecode} \\[0pt]
 411   \psset{nodesep=5pt,arrows=->}
 412   \ncline{s0}{b0}\bput{:U}{\tt gcc}
 413   \ncline{s1}{b1}\aput{:U}{\tt javac}
 414   \ncline{b0}{s1}\naput{\tt NestedVM}
 415 \endpsmatrix
 416 \end{pdfpic}
 417
 418 \begin{figure*}[t]
 419 \begin{minipage}[c]{7in}%
 420 \begin{multicols}{2}
 421 {\footnotesize\begin{verbatim}
 422 private final static int r0 = 0;
 423 private int r1, r2, r3,...,r30;
 424 private int r31 = 0xdeadbeef;
 425 private int pc = ENTRY_POINT;
 426
 427 public void run() {
 428     for(;;) {
 429         switch(pc) {
 430             case 0x10000:
 431                 r29 = r29 - 32;
 432             case 0x10004:
 433                 r1 = r4 + r5;
 434             case 0x10008:
 435                 if(r1 == r6) {
 436                     /* delay slot */
 437                     r1 = r1 + 1;
 438                     pc = 0x10018:
 439                     continue;
 440                 }
 441             case 0x1000C:
 442                 r1 = r1 + 1;
 443             case 0x10010:
 444                 r31 = 0x10018;
 445                 pc = 0x10210;
 446                 continue;
 447             case 0x10014:
 448                 /* nop */
 449             case 0x10018:
 450                 pc = r31;
 451                 continue;
 452             ...
 453             case 0xdeadbeef:
 454                 System.err.println(``Exited.'');
 455                 System.exit(1);
 456         }
 457     }
 458 }
 459 \end{verbatim}}
 460 \vspace{1in}
 461 {\footnotesize\begin{verbatim}
 462 public void run_0x10000() {
 463     for(;;) {
 464     switch(pc) {
 465         case 0x10000:
 466             ...
 467         case 0x10004:
 468             ...
 469         ...
 470         case 0x10010:
 471             r31 = 0x10018;
 472             pc = 0x10210;
 473             return;
 474         ...
 475     }
 476     }
 477 }
 478
 479 pubic void run_0x10200() {
 480     for(;;) {
 481     switch(pc) {
 482         case 0x10200:
 483             ...
 484         case 0x10204:
 485             ...
 486     }
 487     }
 488 }
 489
 490 public void trampoline() {
 491     for(;;) {
 492     switch(pc&0xfffffe00) {
 493             case 0x10000: run_0x10000(); break;
 494             case 0x10200: run_0x10200(); break;
 495             case 0xdeadbe00:
 496                 ...
 497         }
 498     }
 499 }
 500 \end{verbatim}}
 501 \end{multicols}
 502 \end{minipage}
 503 \caption{\label{code1} Trampoline transformation necessitated by Java's 64kb method size limit}
 504 \end{figure*}
 505
 506 Translating unsafe code for use within a JVM proceeds as follows:
 507
 508 \begin{enumerate}
 509
 510 \item Compile the source code to a statically linked binary, targeting
 511       the MIPS R2000 ISA.  Typically this will involve linking against
 512       {\tt libc}, which translates system requests (such as {\tt
 513       open()}, {\tt read()}, or {\tt write()}) into appropriate
 514       invocations of the MIPS {\tt SYSCALL} instruction.
 515
 516 \item Invoke {\tt NestedVM} on the statically linked binary.
 517
 518 \item Compile the resulting {\tt .java} code using {\tt jikes}
 519       \cite{jikes} or {\tt javac}.
 520
 521 \item From java code, invoke the {\tt run()} method on the generated
 522       class.  This is equivalent to the {\tt main()} entry point.
 523
 524 \end{enumerate}
 525
 526 \subsubsection{Optimizations}
 527
 528 Generating Java source code instead of bytecode frees NestedVM from
 529 having to perform simple constant propagation optimizations, as most
 530 Java compilers already do this.  A recurring example is the treatment
 531 of the {\tt r0} register, which is fixed as {\tt 0} in the MIPS ISA.
 532
 533 Lacking the ability to generate specially optimized bytecode
 534 sequences, a straightforward mapping of the general purpose hardware
 535 registers to 32 {\tt int} fields turned out to be optimal.
 536
 537
 538 \epsfig{file=chart1,width=3in}
 539
 540 Unfortunately, Java imposes a 64kb limit on the size of the bytecode
 541 for a single method.  This presents a problem for NestedVM, and
 542 necessitates a {\it trampoline transformation}, as shown in
 543 Figure~\ref{code1}.  With this trampoline in place, large binaries can
 544 be handled without much difficulty -- fortunately, there is no
 545 corresponding limit on the size of a classfile as a whole.
 546
 547 One difficulty that arose as a result of using the trampoline
 548 transformation was the fact that {\tt javac} and {\tt jikes} are
 549 unable to properly optimize its switch statements.  For example, the
 550 following code is compiled into a comparatively inefficient {\tt
 551 LOOKUPSWITCH}:
 552
 553 {\footnotesize
 554 \begin{verbatim}
 555     switch(pc&0xffffff00) {
 556         case 0x00000100: run_100(); break;
 557         case 0x00000200: run_200(); break;
 558         case 0x00000300: run_300(); break;
 559     }
 560 \end{verbatim}}
 561
 562 \begin{figure}
 563 {\footnotesize\begin{verbatim}
 564 switch(pc>>>8) {
 565     case 0x1: run_100();
 566     case 0x2: run_200();
 567     case 0x3: run_300();
 568 }
 569 \end{verbatim}}
 570
 571 Javac isn't smart enough to see the pattern in the case values and
 572 generates very suboptimal bytecode. Manually doing the shifts
 573 convinces javac to emit a tableswitch statement, which is
 574 significantly faster. This change alone nearly doubled the speed of
 575 the compiled binary.
 576
 577 Finding the optimal method size lead to the next big performance
 578 increase.  It was determined through experimentation that the optimal
 579 number of MIPS instructions per method is 128 (considering only power
 580 of two options). Going above or below that lead to performance
 581 decreases. This is most likely due to a combination of two factors.
 582
 583 \epsfig{file=chart5,width=3in}
 584
 585 \epsfig{file=chart6,width=3in}
 586
 587 This phenomenon is due to two factors:
 588
 589 \begin{itemize}
 590
 591 \item While the trampoline method's {\tt switch} statement can be
 592       coded as a {\tt TABLESWITCH}, the {\tt switch} statement
 593       within the individual methods is to sparse to encode this way.
 594
 595 \item Hybrid Interpretive-JIT compilers such as HotSpot generally
 596       favor smaller methods since they are easier to compile and are
 597       better candidates for compilation in ``normal'' programs (unlike
 598       NestedVM programs).
 599
 600 \end{itemize}
 601
 602 After tuning method sizes, our next performance boost came from
 603 eliminating exraneous case branches.  Having case statements before
 604 each instruction prevents JIT compilers from being able to optimize
 605 across instruction boundaries, since control flow can enter the body
 606 of a {\tt switch} statement at any of the {\tt case}s.  In order to
 607 eliminate unnecessary case statements we needed to identify all
 608 possible jump targets.  Jump targets can come from three sources:
 609
 610 The next big optimization was eliminating unnecessary case
 611 statements. Having case statements before each instruction prevents
 612 JIT compilers from being able to optimize across instruction
 613 boundaries. In order to eliminate unnecessary case statements every
 614 possible address that could be jumped to directly needed to be
 615 identified. The sources for possible jump targets come from 3 places.
 616
 617 \begin{itemize}
 618
 619 \item The .text segment - Every instruction in the text segment is
 620       scanned for jump targets. Every branch instruction (BEQ, JAL,
 621       etc) has its destination added to the list of possible branch
 622       targets. In addition, functions that set the link register have
 623       theirpc+8 added to the list (the address that would've been put
 624       to the link register). Finally, combinations of LUI (Load Upper
 625       Immediate) of ADDIU (Add Immediate Unsigned) are scanned for
 626       possible addresses in the text segment. This combination of
 627       instructions is often used to load a 32-bit word into a
 628       register.
 629
 630 \item The .data segment - When GCC generates switch() statements it
 631       often uses a jump table stored in the .data
 632       segment. Unfortunately gcc doesn't identify these jump tables in
 633       any way. Therefore, the entire .data segment is conservatively
 634       scanned for possible addresses in the .text segment.
 635
 636 \item The symbol table - This is mainly used as a backup. Scanning the
 637       .text and .data segments should identify any possible jump
 638       targets but adding every function in the symbol table in the ELF
 639       binary doesn't hurt. This will also catch functions that are
 640       never called directly from the MIPS binary (for example,
 641       functions called with the call() method in the runtime).
 642
 643       This is mainly used as a backup.  Scanning the {\tt .text} and
 644       {\tt .data} segments should identify any possible jump targets;
 645       however, adding all function symbols in the ELF symbol table
 646       also catches functions that are never called directly from the
 647       MIPS binary, such as those invoked only via the NestedVM
 648       runtime's {\tt call()} method.
 649
 650 \end{itemize}
 651
 652 Eliminating unnecessary {\tt case} statements provided a 10-25\% speed
 653 increase.
 654
 655 Despite all the above optimizations, one insurmountable obstacle
 656 remained: the Java {\tt .class} file format limits the constant pool
 657 to 65535 entries.  Every integer literal greater than {\tt 32767}
 658 requires an entry in this pool, and each branch instruction generates
 659 one of these.
 660
 661 One suboptimal solution was to express constants as offsets from a few
 662 central values; for example ``{\tt pc~=~N\_0x00010000~+~0x10}'' (where
 663 {\tt N\_0x000100000} is a non-final field to prevent {\tt javac} from
 664 inlining it).  This was sufficient to get reasonably large binaries to
 665 compile, and caused only a small (approximately 5\%) performance
 666 degredation and a similarly small increase in the size of the {\tt
 667 .class} file.  However, as we will see in the next section, compiling
 668 directly to {\tt .class} files (without the intermediate {\tt .java}
 669 file) eliminates this problem entirely.
 670
 671
 672 \subsection{Binary-to-Binary}
 673
 674 After implementing the binary-to-source compiler, a binary-to-binary
 675 translation mode was added.
 676
 677 \begin{pdfpic}
 678 \newlength{\MyLength}
 679 \settowidth{\MyLength}{xmachine codex}
 680 \newcommand{\MyBox}[1]{\makebox[\MyLength]{#1}}
 681 \psmatrix[colsep=2,rowsep=0,nrot=:U]
 682   & \\[0pt]
 683   [name=s0]\MyBox{unsafe source} & [name=s1]\MyBox{safe source}   \\[0pt]
 684   & \\[0pt]
 685   & \\[0pt]
 686   & \\[0pt]
 687   & \\[0pt]
 688   & \\[0pt]
 689   [name=b0]\MyBox{machine code}  & [name=b1]\MyBox{safe bytecode} \\[0pt]
 690   & \\[0pt]
 691   \psset{nodesep=5pt,arrows=->}
 692   \ncline{s0}{b0}\bput{:U}{\tt gcc}
 693   \ncline{b0}{b1}\naput{\tt NestedVM}
 694 \endpsmatrix
 695 \end{pdfpic}
 696
 697 This mode has several advantages:
 698
 699 \begin{itemize}
 700
 701 \item There are little tricks that can be done in JVM bytecode that
 702       can't be done in Java source code.
 703
 704 \item Directly generating {\tt .class} files Eliminates the
 705       time-consuming {\tt javac} step.
 706
 707 \item Direct compilation to {\tt .class} files opens up the
 708       interesting possibility of dynamically translating MIPS binaries
 709       and loading them via {\tt ClassLoader.fromBytes()} {\it at
 710       deployment time}, eliminating the need to compile binaries ahead
 711       of time.
 712
 713 \end{itemize}
 714
 715 Most of the performance improvemen where made where in the handling of
 716 branch instructions and in taking advantage of the JVM stack to
 717 eliminate unnecessary {\tt LOAD}s and {\tt STORE}s to local variables.
 718
 719 The first obvious optimization that generating bytecode allows for is the
 720 use of GOTO. Despite the fact that java doesn't have a GOTO keyword a GOTO
 721 bytecode does exist and is used heavily in the code generates by javac.
 722 Unfortunately the java language doesn't provide any way to take advantage of
 723 this. As result of this, jumps within a method were implemented in the
 724 binary-to-source compiler by setting the PC field to the new address and
 725 making a trip back to the initial switch statement.  In the classfile
 726 compiler these jumps are implemented as GOTOs directly to the target
 727 instruction. This saves a costly trip back through the LOOKUPSWITCH
 728 statement and is a huge win for small loops within a method.
 729
 730 The first optimization gained by direct bytecode generation came from
 731 the use of the JVM {\tt GOTO} instruction.  Despite the fact that the
 732 Java {\it language} does not have a {\tt goto} keyword, the VM does in
 733 fact have a corresponding instruction which is used quite heavily by
 734 {\tt javac}.  NestedVM's binary-to-binary mode exploits this
 735 instruction to avoid emitting inefficient {\tt switch..case}
 736 structures.
 737
 738 Related to the {\tt GOTO} instruction is branch statement
 739 optimization.  When emitting source code, NestedVM translates branches
 740 into Java source code like this:
 741
 742 {\footnotesize\begin{verbatim}
 743     if (condition) {
 744         pc = TARGET;
 745         continue;
 746     }
 747 \end{verbatim}}
 748
 749 This requires a branch in the JVM regardless of whether the MIPS
 750 branch is actually taken. If condition is false the JVM has to jump
 751 over the code to set the PC and go back to the switch block. If
 752 condition is true the JVM has to jump to the switch block. By
 753 generating bytecode directly we can make the target of the JVM branch
 754 statement the actual bytecode of the final destination. In the case
 755 where the branch isn't taken the JVM doesn't need to branch at all.
 756
 757 A side affect of the above two optimizations is a solution to the
 758 excess constant pool entries problem. When jumps are implemented as
 759 GOTOs and direct branches to the target the PC field doesn't need to
 760 be set. This eliminates many of the constant pool entries the java
 761 source compiler requires. The limit is still there however, and given
 762 a large enough binary it will still be reached.
 763
 764 Implementation of the MIPS delay slot offers another opportunity for
 765 bytecode-level optimization.  In order to take advantage of
 766 instructions already in the pipeline, the MIPS ISA specifies that the
 767 instruction after a jump or branch is always executed, even if the
 768 jump/branch is taken.  This instruction is referred to as the ``delay
 769 slot\footnote{Newer MIPS CPUs have pipelines that are much larger than
 770 early MIPS CPUs, so they have to discard instructions anyways}.''  The
 771 instruction in the delay slot is actually executed {\it before} the
 772 branch is taken.  To further complicate matters, values from the
 773 register file are loaded {\it before} the delay slot is executed.
 774
 775 Fortunately there is a very elegent solution to this problem which can
 776 be expressed in JVM bytecode.  When a branch instruction is
 777 encountered, the registers needed for the comparison are pushed onto
 778 the stack to prepare for the JVM branch instruction.  Then, {\it
 779 after} the values are on the stack the delay slot instruction is
 780 emitted, followed by the actual JVM branch instruction.  Because the
 781 values were pushed to the stack before the delay slot was executed, any
 782 changes the delay slot made to the registers are not visible to the
 783 branch bytecode.
 784
 785 One final advantage that generating bytecode directly allows is a
 786 reduction in the size of the ultimate {\tt .class} file.  All the
 787 optimizations above lead to more compact bytecode as a beneficial side
 788 effect; in addition, NestedVM performs a few additional optimizations.
 789
 790 When encountering the following {\tt switch} block, both {\tt javac}
 791 and {\tt jikes} generate redundant bytecode.
 792
 793 {\footnotesize\begin{verbatim}
 794     switch(pc>>>8) {
 795         case 0x1: run_1(); break;
 796         case 0x2: run_2(); break
 797         ...
 798         case 0x100: run_100(); break;
 799     }
 800 \end{verbatim}}
 801
 802 The first bytecode in each case arm in the switch statement is {\tt
 803 ALOAD\_0} to prepare for a {\tt INVOKESPECIAL} call.  By simply
 804 lifting this bytecode outside of the {\tt switch} statement, each {\tt
 805 case} arm shrinks by one instruction.
 806
 807 \subsubsection{Compiler Flags}
 808
 809 Although NestedVM perfectly emulates a MIPS R2000 CPU, its performance
 810 profile is nothing like that of actual silicon.  In particular, {\tt
 811 gcc} makes several optimizations that increase performance on an
 812 actually MIPS CPU but actually decrease the performance of
 813 NestedVM-generated bytecode.  We found the following compiler options
 814 could be used to improve performance:
 815
 816 \begin{itemize}
 817
 818 \item {\tt -falign-functions}
 819
 820       Normally a function's location in memory has no effect on its
 821       execution speed.  However, in the NestedVM binary translator,
 822       the {\tt .text} segment is split on power-of-two boundaries.  If
 823       a function starts near the end of one of these boundaries, a
 824       performance critical part of the function winds up spanning two
 825       Java methods.  Telling {\tt gcc} to align all functions along
 826       these boundaries decreases the chance of this sort of splitting.
 827
 828 \item {\tt -fno-rename-registers}
 829
 830       On an actual silicon chip, using additional registers carries no
 831       performance penalty (as long as none are spilled to the stack).
 832       However, when generating bytecode, using {\it fewer}
 833       ``registers'' helps the JVM optimize the machine code it
 834       generates by simplifying the constraints it needs to deal with.
 835       Disabling register renaming has this effect.
 836
 837 \item {\tt -fno-schedule-insns}
 838
 839       Results of MIPS load operations are not available until {\it
 840       two} instructions after the load.  Without the {\tt
 841       -fno-schedule-insns} instruction, {\tt gcc} will attempt to
 842       reorder instructions to do other useful work during this period
 843       of unavailability.  NestedVM is under no such constraint, so
 844       removing this reordering typically generates simpler machine
 845       code.
 846
 847 \item {\tt -mmemcpy}
 848
 849       Enabling this instruction causes {\tt gcc} to use the system
 850       {\tt memcpy()} routine instead of generating loads and stores.
 851       As explained in the next section, the NestedVM runtime
 852       implements {\tt memcpy()} using {\tt System.arraycopy()}, which
 853       is substantially more efficient.
 854
 855 NestedVM has two primary ways of executing code, the interpreter, and the binary translators. Both the interpreter and the output from the binary translators sit on top of a Runtime class. This class provides the public interface to both the interpreter and the translated binaries.
 856
 857 The Runtime class does the work that the operating system usually does.
 858 Conceptually the Runtime class can be thought of as the operating system and
 859 its subclasses (translated binaries and the interpreter) the CPU. The
 860 Runtime fulfills 5 primary goals:
 861
 862 The Runtime class does the work that the operating system usually does. Conceptually the Runtime class can be thought of as the operating system and itÕs subclasses (translated binaries and the interpreter) the CPU. The Runtime fulfills 5 primary goals:
 863
 864 \item {\tt -fno-delayed-branch} The MIPS CPU has a delay slot (see
 865       above). Earlier versions of NestedVM didn't efficiently emulate
 866       delay slots. This option causes GCC to avoid using delay slots
 867       for anything (a NOP is simply placed in the delay slot). This
 868       had a small performance benefit. However, recent versions of
 869       NestedVM emulate delay slots with no performance overhead so
 870       this options has little effect. Nonetheless, these delay slots
 871       provide no benefit under NestedVM either so they are avoided
 872       with this option.
 873
 874 \item Provides a consistent external interface - The method of actually executing the code (currently only translated binaries and the interpreter) can be changed without any code changes to the caller because only Runtime exposes a public interface.
 875
 876 \item Provide an easy to use interface - The interpreter and the output from the binary translators only know how to execute code. The Runtime class provides an easy to use interface to the code. It contains methods to pass arguments to the main() function, read and write from memory, and call individual functions in the binary.
 877
 878 \item Manage the processÕs memory - The Runtime class contains large int[] arrays that represent the process`s entire memory space.  Subclasses read and write to these arrays as required by the instructions they are executing.  Subclasses can expend their memory space using the sbrk syscall.
 879
 880 \item Provide access to the file system and streams - Subclasses access the file system through standard UNIX syscalls (read, write,  open, etc). The Runtime manages the file descriptor table that maps UNIX file descriptors to Java RandomAccessFiles, InputStreams, OutputStreams, and sockets.
 881
 882 \item Miscellaneous other syscalls - In additions to those mentioned above the Runtime class implements a variety of other syscalls (sleep, gettimeofday, getpagesize, sysconf, fcntl, etc).
 883
 884 In addition to binary-to-source and binary-to-binary translation,
 885 NestedVM also includes a MIPS binary interpreter.  All three
 886 translation approaches expose the same API to both the translated
 887 binary and the surrounding VM (including peer Java code).
 888
 889 \subsection{The Runtime Class}
 890
 891 The runtime fulfills four roles:
 892
 893 \begin{itemize}
 894
 895 \item It provides a simple, consistent external interface.  The method
 896       of actually executing the code (currently only translated
 897       binaries and the interpreter) can be changed without any code
 898       changes to the caller because only runtime exposes a public
 899       interface.  This includes methods to pass arguments to the
 900       binary's {\tt main()} function, read and write from memory, and
 901       call individual functions in the binary.
 902
 903 \item It manages the process's memory.  The runtime class contains
 904       large {\tt int[]} arrays that represent the process`s entire
 905       memory space.  Subclasses read and write to these arrays as
 906       required by the instructions they are executing, and can expand
 907       their memory space using the {\tt sbrk} system call.
 908
 909 \item The runtime provides access to the host file system and standard
 910       I/O streams.  Subclasses of {\tt runtime} can access the file
 911       system through standard UNIX syscalls ({\tt read()}, {\tt
 912       write()}, {\tt open()}, etc).  The runtime manages the file
 913       descriptor table that maps UNIX file descriptors to Java {\tt
 914       RandomAccessFile}s, {\tt InputStream}s, {\tt OutputStream}s, and
 915       {\tt Socket}s.
 916
 917 \item It provides general OS services, including {\tt sleep()}, {\tt
 918       gettimeofday()}, {\tt getpagesize()}, {\tt sysconf()}, {\tt
 919       fcntl()}, and so on.
 920
 921 \end{itemize}
 922
 923 \section{Future Directions}
 924
 925 Java source code can create a copy of the translated binary by instantiating the class generated by the binary translator  or instantiating the interpreter. It can then interact with the process through the many facilities provided by the Runtime interface.  Invoking the run() method of the Runtime interface will load the given arguments into the processÕs memory as invoke the binaries entry point (typically \_start() in crt0.o). This will pass control on to the main() function which will have the arguments passed to run() loaded into argv and argc.
 926
 927 As the binary executes it often passes control back to the Runtime class through the MIPS {\tt SYSCALL} instruction. The interpreter and translated binaries invoke the {\tt syscall()} method of the Runtime class when the {\tt SYSCALL} instruction is executed. The Runtime class then can manipulate the processÕs environment (read and write to memory, modify the file descriptor table, etc) and interact with the rest of the JVM on behalf of the process (read and write to a file or stream, etc). There is even a syscall to pause the VM and temporarily return control to the caller.
 928
 929 In addition to the interfaces provided by NestedVM, users can create their own interfaces between  the MIPS and Java world. The Runtime provides a method called call() that will call a function by name in the MIPS binary. The call() method looks up the function name in the binaryÕs ELF symbol table and manipulating the stack and registers accordingly to execute the given function. This allows Java code to seamlessly invoke functions in the binary.
 930
 931 \section{Conclusion}
 932
 933 The MIPS binaries can also invoke a special method of Runtime called callJava().When the MIPS binary invokes the {\tt CALL\_JAVA}  syscall (usually done through the {\tt \_call\_java()} function provided by the NestedVM support library) the callJava() method in Runtime is invoked with the arguments passes to the syscall.
 934
 935 NestedVM is available under an open source license, and can be
 936 obtained from
 937 \begin{verbatim}
 938     http://nestedvm.ibex.org
 939 \end{verbatim}
 940
 941
 942 \section{Appendix: Testing Methodology}
 943
 944 The MIPS binaries can also invoke a special method of Runtime called
 945 callJava().When the MIPS binary invokes the {\tt CALL\_JAVA} syscall
 946 (usually done through the {\tt \_call\_java()} function provided by
 947 the NestedVM support library) the callJava() method in Runtime is
 948 invoked with the arguments passes to the syscall.
 949
 950 {\footnotesize\begin{verbatim}
 951
 952 // Java
 953 private Runtime rt = new MyBinary() {
 954     pubilc int callJava(int a, int b, int c, int d) {
 955         System.err.println("Got " + a + " " + b);
 956     }
 957 };
 958 public void foo() { rt.run(); }
 959
 960 /* C */
 961 void main(int argc, char **argv) {
 962     \_call\_java(1,2);
 963 }
 964
 965 \end{verbatim}}
 966
 967 These two methods can even be combined. MIPS can call Java through the CALL\_JAVA syscall, which can in turn invoke a MIPS function in the binary with the call() method.\r\r
 968 Users preferring a simpler communication mechanism can also use Java StreamÕs and file descriptors. Runtime provides a simple interface for mapping a Java Input or OutputStream to a File Descriptor.
 969
 970 %Java source code can create a copy of the translated binary by
 971 %instantiating the corresponding class, which extends {\tt Runtime}.
 972 %Invoking the {\tt main()} method on this class is equivalent to
 973 %calling the {\tt main()} function within the binary; the {\tt String}
 974 %arguments to this function are copied into the binary's memory space
 975 %and made available as {\tt **argv} and {\tt argc}.
 976
 977 %The translated binary communicates with the rest of the VM by
 978 %executing MIPS {\tt SYSCALL} instructions, which are translated into
 979 %invocations of the {\tt syscall()} method.  This calls back to the
 980 %native Java world, which can manipulate the binary's environment by
 981 %reading and writing to its memory space, checking its exit status,
 982 %pausing the VM, and restarting the VM.
 983
 984
 985 %\subsection{Virtualization}
 986
 987 %The {\tt Runtime} class implements the majority of the standard {\tt
 988 %libc} syscalls, providing a complete interface to the filesystem,
 989 %network socket library, time of day, (Brian: what else goes here?).
 990
 991 %\begin{itemize}
 992
 993 %\item ability to provide the same interface to CNI code and
 994       % NestedVMified code
 995
 996 %\item security advantages (chroot the {\tt fork()}ed process)
 997       %
 998 %\end{itemize}
 999
1000
1001 \section{Future Directions}
1002
1003 \section{Conclusion}
1004
1005 \section{Appendix A: Testing Environment}
1006
1007 All times are measured in seconds. These were all run on a dual 1ghz
1008 G4 running OS X 10.3.1 with Apple's latest VM (JDK 1.4.1\_01-27). Each
1009 *************
1010 /* C */
1011 void do_work(int n) {
1012     int i;
1013     int ret=0;
1014     for(i=0;i<n;i++) ret += i;
1015     return n;
1016 }
1017
1018 \end{verbatim}}
1019
1020 The MIPS binaries can also invoke a special method of Runtime called
1021 callJava().When the MIPS binary invokes the {\tt CALL\_JAVA} syscall
1022 (usually done through the {\tt \_call\_java()} function provided by
1023 the NestedVM support library) the callJava() method in Runtime is
1024 invoked with the arguments passes to the syscall.
1025
1026 Although NestedVM perfectly emulates a MIPS R2000 CPU its performance characteristics aren't anything like an actual MIPS R2000 CPU. GCC makes several optimizations that increase performance on an actually MIPS CPU but actually decrease performance when run through the NestedVM binary translator. Fortunately, GCC provides many options to customize its code generations and eliminate these optimizations. GCC also has optimization options that aren't helpful on a real MIPS CPU but are very helpful under NestedVM
1027
1028 // Java
1029 private Runtime rt = new MyBinary() {
1030     pubilc int callJava(int a, int b, int c, int d) {
1031         System.err.println("Got " + a + " " + b);
1032     }
1033 };
1034 public void foo() { rt.run(); }
1035
1036 /* C */
1037 void main(int argc, char **argv) {
1038     _call_java(1,2);
1039 }
1040
1041 \item {\tt -falign-functions}
1042 Normally a function's location in memory has no effect on its execution speed. However, in the NestedVM binary translator, the .text segment is split up on power of two boundaries. If a function is unlucky enough to start near the end of one of these boundaries a performance critical part of the function could end up spanning two methods. There is a significant amount of overhead in switching between two methods so this must be avoided at all costs. By telling GCC to align all functions to the boundary that the .text segment is split on the chances of a critical part of a function spanning two methods is significantly reduced.
1043
1044 \item {\tt -fno-rename-registers}
1045 Some processors can better schedule code when registers aren't reused for two different purposes. By default GCC will try to use as many registers as possibly when it can. This excess use of registers just confuses JIT's trying to compile the output from the binary translator. All the JIT compilers we tested do much better with a few frequently used registers.
1046
1047 \item {\tt -fno-delayed-branch}
1048 The MIPS CPU has a delay slot (see above). Earlier versions of NestedVM didn't efficiently emulate delay slots. This option causes GCC to avoid using delay slots for anything (a NOP is simply placed in the delay slot). This had a small performance benefit. However, recent versions of NestedVM emulate delay slots with no performance overhead so this options has little effect. Nonetheless, these delay slots provide no benefit under NestedVM either so they are avoided with this option.
1049
1050 \item {\tt -fno-schedule-insns}
1051 Load operations in the MIPS ISA also have a delay slot. The results of a load operation are not available for use until one instruction later. Several other instructions also have similar delay slots. GCC tries to do useful work wile waiting for the results of one of these operations by default. However, this, like register renaming, tends to confuse JIT compilers. This option prevents GCC from going out of its way to take advantage of these delay slots and makes the code generated by NestedVM easier for JIT compilers to handle.
1052
1053 \item {\tt -mmemcpy}
1054 GCC sometimes has to copy somewhat large areas of memory. The most common example of this is assigning one struct to another. Memory copying can be done far more efficiently in Java than under NestedVM. Calls to the memcpy libc function are treated specially by the binary translator. They are turned into calls to a memcpy method in Runtime. The {\tt -mmemcpy} option causes GCC to invoke libc's memcpy() function when it needs to copy a region of memory rather than generating its own memcpy code. This call in then turned into a call to this Java memcpy function which is significantly faster than the MIPS implementation.
1055
1056 \item {\tt -ffunction-sections -fdata-sections}
1057 These two options are used in conjunction with the {\tt --gc-section} linker option. These three options cause the linker to aggressively discard unused functions and data sections. In some cases this leads to significantly smaller binaries.
1058
1059 %\subsection{Virtualization}
1060
1061 %The {\tt Runtime} class implements the majority of the standard {\tt
1062 %libc} syscalls, providing a complete interface to the filesystem,
1063 %network socket library, time of day, (Brian: what else goes here?).
1064
1065 World domination.
1066
1067 \item Better use of local variables in binary-to-binary compiler -- need to
1068 do data flow analysis to find how how and when registers are used and avoid
1069 the costly load/restore when it isn't necessary.
1070
1071 \item More advanced Runtime support -- support more syscalls. This will
1072 allow running large applications such as GCC under NestedVM.
1073
1074 \item World domination
1075
1076 \end{itemize}
1077
1078 %\item ability to provide the same interface to CNI code and
1079       % NestedVMified code
1080
1081 %\item security advantages (chroot the {\tt fork()}ed process)
1082       %
1083 %\end{itemize}
1084
1085
1086 \section{Future Directions}
1087
1088 \section{Conclusion}
1089
1090 \section{Appendix A: Testing Environment}
1091
1092 All times are measured in seconds. These were all run on a dual 1ghz
1093 G4 running OS X 10.3.1 with Apple's latest VM (JDK 1.4.1\_01-27). Each
1094 *************
1095 All times are measured in seconds. These were all run on a dual 1Ghz
1096 Macintosh G4 running Apple's latest JVM (Sun HotSpot JDK 1.4.1). Each
1097 ^ ^ ^ ^ ^ ^ ^
1098 test was run 8 times within a single VM. The highest and lowest times
1099 were removed and the remaining 6 were averaged. In each case only the
1100 first run differed significantly from the rest.
1101
1102 The libjpeg test consisted of decoding a 1280x1024 jpeg
1103 (thebride\_1280.jpg) and writing a tga. The mspack test consisted of
1104 extracting all members from arial32.exe, comic32.exe, times32.exe, and
1105 verdan32.exe. The freetype test consisted of rendering characters
1106 32-127 of Comic.TTF at sizes from 8 to 48 incrementing by 4. (That is
1107 about 950 individual glyphs).
1108
1109 \section{References}
1110
1111 \end{document}
1112