-This requires a branch in the JVM regardless of whether the MIPS
-branch is actually taken. If condition is false the JVM has to jump
-over the code to set the PC and go back to the switch block. If
-condition is true the JVM as to jump to the switch block. By
-generating bytecode directly we can make the target of the JVM branch
-statement the actual bytecode of the final destination. In the case
-where the branch isn't taken the JVM doesn't need to branch at all.
-
-A side affect of the above two optimizations is a solution to the
-excess constant pool entries problem. When jumps are implemented as
-GOTOs and direct branches to the target the PC field doesn't need to
-be set. This eliminates many of the constant pool entries the java
-source compiler requires. The limit is still there however, and given
-a large enough binary it will still be reached.
-
-Delay slots are another area where things are done somewhat
-inefficiently in the Java source compiler. In order to take advantage
-of instructions already in the pipeline MIPS cpu have a "delay
-slot". That is, an instruction after a branch or jump instruction that
-is executed regardless of whether the branch is taken. This is done
-because by the time the branch or jump instruction is finished being
-processes the next instruction is already ready to be executed and it
-is wasteful to discard it. (However, newer MIPS CPUs have pipelines
-that are much larger than early MIPS CPUs so they have to discard many
-instructions anyway.) As a result of this the instruction in the delay
-slot is actually executed BEFORE the branch is taken. To make things
-even more difficult, values from the register file are loaded BEFORE
-the delay slot is executed. Here is a small piece of MIPS assembly:
+This requires a branch in the JVM {\it regardless} of whether the MIPS
+branch is actually taken. If {\tt condition} is false the JVM has to
+jump over the code to set {\tt pc} and go back to the {\tt switch}
+statement; if {\tt condition} is true the JVM has to jump to the {\tt
+switch} block. By generating bytecode directly, NestedVM is able to
+emit a JVM branch instruction that jumps directly to the bytecode
+corresponding to the target of the MIPS branch. In the case where the
+branch is not taken the JVM doesn't branch at all.
+
+A side effect of the previous two optimizations is a solution to the
+excess constant pool entries problem. When jumps are implemented as
+{\tt GOTO}s and branches are taken directly, the {\tt pc} field does
+not need to be set. This eliminates a huge number of constant pool
+entries. The {\tt .class} file constant pool size limit is still
+present, but it is less likely to be encountered.
+
+Implementation of the MIPS delay slot offers another opportunity for
+bytecode-level optimization. In order to take advantage of
+instructions already in the pipeline, the MIPS ISA specifies that the
+instruction after a jump or branch is always executed, even if the
+jump/branch is taken. This instruction is referred to as the ``delay
+slot.''\footnote{Newer MIPS CPUs have pipelines that are much longer
+than those of early MIPS CPUs, so they have to discard instructions
+anyway.} The instruction in the delay slot is actually executed {\it
+before} the branch is taken. To further complicate matters, the
+register values tested by the branch are read {\it before} the delay
+slot is executed.
+
+Fortunately, there is an elegant solution to this problem that can
+be expressed in JVM bytecode. When a branch instruction is
+encountered, the registers needed for the comparison are pushed onto
+the stack to prepare for the JVM branch instruction. Then, {\it
+after} the values are on the stack the delay slot instruction is
+emitted, followed by the actual JVM branch instruction. Because the
+values were pushed to the stack before the delay slot was executed, any
+changes the delay slot made to the registers are not visible to the
+branch bytecode.
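
The ordering described above can be modelled in a few lines of Java. This is only an illustrative sketch, not NestedVM's actual emitter: the class {\tt DelaySlotSketch}, its methods, the use of an {\tt int[]} as the register file, and the choice of an add-immediate as the delay slot instruction are all assumptions made for the example. An explicit {\tt Deque} stands in for the JVM operand stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class DelaySlotSketch {
    /**
     * Hypothetical model of a MIPS "branch if equal" on regs[rs] and regs[rt],
     * whose delay slot adds imm to regs[rd]. Returns whether the branch is taken.
     */
    static boolean branchTaken(int[] regs, int rs, int rt, int rd, int imm) {
        Deque<Integer> stack = new ArrayDeque<>(); // stand-in for the JVM operand stack

        // 1. Push the branch operands first.
        stack.push(regs[rs]);
        stack.push(regs[rt]);

        // 2. Execute the delay slot; it may overwrite a register the branch reads.
        regs[rd] += imm;

        // 3. The branch compares the values saved on the stack, not the registers.
        int b = stack.pop();
        int a = stack.pop();
        return a == b;
    }

    /** Incorrect ordering for contrast: the delay slot runs before the operands are read. */
    static boolean branchTakenNaive(int[] regs, int rs, int rt, int rd, int imm) {
        regs[rd] += imm;
        return regs[rs] == regs[rt];
    }

    public static void main(String[] args) {
        // Registers 1 and 2 start out equal; the delay slot modifies register 1.
        System.out.println(branchTaken(new int[] {0, 7, 7}, 1, 2, 1, 1));
        System.out.println(branchTakenNaive(new int[] {0, 7, 7}, 1, 2, 1, 1));
    }
}
```

When the delay slot writes a register that the branch also reads, the two methods disagree, which is exactly the case the push-first ordering is meant to handle.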
+
+One final advantage of generating bytecode directly is a
+reduction in the size of the resulting {\tt .class} file. All the
+optimizations above lead to more compact bytecode as a beneficial side
+effect; NestedVM also performs a few further optimizations.
+
+When encountering the following {\tt switch} block, both {\tt javac}
+and {\tt jikes} generate redundant bytecode.