Understanding Bytecode in Depth

hmtomyang 2012-03-30


II. Bytecode

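1. What is bytecode?
A C/C++ compiler compiles source code to assembly (and ultimately to native machine code), whereas the Java compiler compiles Java source code to bytecode.
Java's cross-platform support comes down to implementing a virtual machine for each platform against the same bytecode specification; once a Java program is compiled to bytecode, it can run on any of those platforms.
The .NET framework has IL (intermediate language), assembly serves as the intermediate representation of C/C++ programs, and bytecode can be regarded as the intermediate language of the Java platform.
Knowing Java bytecode helps with debugging, performance tuning, and building advanced language extensions or frameworks.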

2. Generating a bytecode listing with javap
The javap tool that ships with the JDK (javap.exe on Windows) disassembles bytecode. Let's look at an example:
Test.java:

Java code

public class Test {
    public static void main(String[] args) {
        int i = 10000;
        System.out.println("Hello Bytecode! Number = " + i);
    }
}

The compiled Test.class, viewed as text:

Java code

漱壕 1 +
<init> ()V Code LineNumberTable main ([Ljava/lang/String;)V
SourceFile Test.java
! " java/lang/StringBuilder Hello Bytecode! Number = # $ # % & ' ( ) * Test java/lang/Object java/lang/System out Ljava/io/PrintStream; append -(Ljava/lang/String;)Ljava/lang/StringBuilder; (I)Ljava/lang/StringBuilder; toString ()Ljava/lang/String; java/io/PrintStream println (Ljava/lang/String;)V !
* > '< Y

Test.bytecode, generated with javap -c Test > Test.bytecode:

Java code

Compiled from "Test.java"
public class Test extends java.lang.Object{
public Test();
  Code:
   0: aload_0
   1: invokespecial #1; //Method java/lang/Object."<init>":()V
   4: return

public static void main(java.lang.String[]);
  Code:
   0: sipush 10000
   3: istore_1
   4: getstatic #2; //Field java/lang/System.out:Ljava/io/PrintStream;
   7: new #3; //class java/lang/StringBuilder
   10: dup
   11: invokespecial #4; //Method java/lang/StringBuilder."<init>":()V
   14: ldc #5; //String Hello Bytecode! Number =
   16: invokevirtual #6; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
   19: iload_1
   20: invokevirtual #7; //Method java/lang/StringBuilder.append:(I)Ljava/lang/StringBuilder;
   23: invokevirtual #8; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
   26: invokevirtual #9; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   29: return
}

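The JVM is a stack-based machine. Each thread has its own JVM stack that stores frames, and a new frame is created every time a method is invoked.
A frame consists of a local variable array (the local variable table), a LIFO operand stack, and a reference to the run-time constant pool.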
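
To make the frame model concrete, here is a minimal sketch of a frame with a local variable array and an operand stack, executing a few of the instructions from the listing. MiniFrame and MiniFrameDemo are hypothetical names invented for illustration; the run-time constant pool reference is left out, and a real JVM is implemented very differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified model of one JVM frame:
// a local variable array plus a LIFO operand stack.
class MiniFrame {
    final int[] locals;
    final Deque<Integer> operandStack = new ArrayDeque<>();

    MiniFrame(int maxLocals) {
        locals = new int[maxLocals];
    }

    // sipush: push a signed 16-bit value, widened to int
    void sipush(int value) {
        operandStack.push((int) (short) value);
    }

    // istore_<n>: pop an int from the operand stack into local variable n
    void istore(int index) {
        locals[index] = operandStack.pop();
    }

    // iload_<n>: push local variable n onto the operand stack
    void iload(int index) {
        operandStack.push(locals[index]);
    }

    // iadd: pop two ints and push their sum
    void iadd() {
        int b = operandStack.pop();
        int a = operandStack.pop();
        operandStack.push(a + b);
    }
}

class MiniFrameDemo {
    // Mirrors the Java source "int i = 10000; return i + i;"
    static int run() {
        MiniFrame frame = new MiniFrame(2);
        frame.sipush(10000);
        frame.istore(1);
        frame.iload(1);
        frame.iload(1);
        frame.iadd();
        return frame.operandStack.pop();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 20000
    }
}
```

Every instruction only talks to the operand stack and the local variable array, which is why bytecode needs no register names.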

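Let's briefly analyze the generated bytecode instructions:
The "a" and "i" prefixes of aload and iload denote an object reference and the int type respectively; likewise "b" stands for byte, "c" for char, "d" for double, and so on.
Here aload_0 pushes the value at index 0 of the local variable table onto the operand stack; iload_1 does the same for index 1.
invokespecial invokes methods that need special handling, here the instance initializer <init> (the constructor); return returns from a void method.
sipush pushes the value 10000 onto the operand stack; its operand is a signed 16-bit short that is sign-extended to an int.
getstatic reads a static field.
invokevirtual invokes an instance method, dispatched on the runtime class of the object.
Each instruction starts with a one-byte opcode. The JVM has only ever defined around 200 opcodes (about 202 by this count); see the Java bytecode specification for details.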
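
The offsets in the javap listing (0, 3, 4, ...) follow from how instructions are encoded: one opcode byte plus zero or more operand bytes. The sketch below decodes a hand-written byte sequence for a handful of instructions. MiniDisassembler is a hypothetical name, the opcode values are the ones given in the JVM specification, and only five opcodes are handled, so this is an illustration rather than a usable disassembler.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini-disassembler that understands only five opcodes.
class MiniDisassembler {
    static List<String> disassemble(byte[] code) {
        List<String> lines = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF;
            switch (opcode) {
                case 0x11: { // sipush: opcode + signed 16-bit operand (big-endian)
                    int value = (short) (((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF));
                    lines.add(pc + ": sipush " + value);
                    pc += 3;
                    break;
                }
                case 0x1B: // iload_1: no operand bytes
                    lines.add(pc + ": iload_1");
                    pc += 1;
                    break;
                case 0x2A: // aload_0
                    lines.add(pc + ": aload_0");
                    pc += 1;
                    break;
                case 0x3C: // istore_1
                    lines.add(pc + ": istore_1");
                    pc += 1;
                    break;
                case 0xB1: // return
                    lines.add(pc + ": return");
                    pc += 1;
                    break;
                default:
                    throw new IllegalArgumentException(
                            "opcode not handled in this sketch: 0x" + Integer.toHexString(opcode));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // sipush 10000 (0x2710), istore_1, iload_1, return
        byte[] code = {0x11, 0x27, 0x10, 0x3C, 0x1B, (byte) 0xB1};
        disassemble(code).forEach(System.out::println);
    }
}
```

Because sipush occupies three bytes, the next instruction starts at offset 3, just as in the listing for main.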

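We can see that Test.class does not consist solely of binary instructions; parts of it are recognizable text. That is because package names, class names, method names and descriptors, and string constants are not compiled into bytecode instructions but are stored as UTF-8 text in the constant pool of the class file.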
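
As a rough illustration of where that readable text lives, the following sketch walks the constant pool of a class file and collects its UTF-8 entries. ConstantPoolStrings is a hypothetical helper written for this article; the tag numbers and entry sizes follow the class file format in the JVM specification, and the default file name Test.class is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the CONSTANT_Utf8 entries of a class file.
class ConstantPoolStrings {
    static List<String> utf8Entries(byte[] classFile) {
        List<String> strings = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            in.readUnsignedShort(); // minor version
            in.readUnsignedShort(); // major version
            int count = in.readUnsignedShort(); // entries are numbered 1 .. count-1
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1: // Utf8: u2 length + modified UTF-8 bytes
                        strings.add(in.readUTF());
                        break;
                    case 5: case 6: // Long, Double: 8 bytes, and they use two slots
                        in.readFully(new byte[8]);
                        i++;
                        break;
                    case 15: // MethodHandle
                        in.readFully(new byte[3]);
                        break;
                    case 7: case 8: case 16: case 19: case 20: // Class, String, ...
                        in.readFully(new byte[2]);
                        break;
                    case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                        in.readFully(new byte[4]); // Integer, Float, member refs, ...
                        break;
                    default:
                        throw new IllegalArgumentException("unknown constant pool tag " + tag);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return strings;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "Test.class"; // assumed file name
        utf8Entries(Files.readAllBytes(Paths.get(path))).forEach(System.out::println);
    }
}
```

Run against the Test.class above, the output should include entries such as Code, main, and java/lang/StringBuilder, the same fragments that are legible in the raw dump.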

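3. Experiencing the magic of bytecode enhancement
Hibernate and Spring, both staples of J2EE development, use dynamic bytecode modification to change the behavior of classes.
Let's look at some of the methods of ASM's org.objectweb.asm.MethodWriter class to understand how ASM modifies bytecode:

Java code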

class MethodWriter implements MethodVisitor {

    // (excerpt: other fields and methods omitted)
    private ByteVector code = new ByteVector();

    public void visitIntInsn(final int opcode, final int operand) {
        // Label currentBlock = this.currentBlock;
        if (currentBlock != null) {
            if (compute == FRAMES) {
                currentBlock.frame.execute(opcode, operand, null, null);
            } else if (opcode != Opcodes.NEWARRAY) {
                // updates current and max stack sizes only for BIPUSH and SIPUSH
                // (stack size variation = 0 for NEWARRAY)
                int size = stackSize + 1;
                if (size > maxStackSize) {
                    maxStackSize = size;
                }
                stackSize = size;
            }
        }
        // adds the instruction to the bytecode of the method
        if (opcode == Opcodes.SIPUSH) {
            code.put12(opcode, operand);
        } else { // BIPUSH or NEWARRAY
            code.put11(opcode, operand);
        }
    }

    public void visitMethodInsn(
        final int opcode,
        final String owner,
        final String name,
        final String desc)
    {
        boolean itf = opcode == Opcodes.INVOKEINTERFACE;
        Item i = cw.newMethodItem(owner, name, desc, itf);
        int argSize = i.intVal;
        // Label currentBlock = this.currentBlock;
        if (currentBlock != null) {
            if (compute == FRAMES) {
                currentBlock.frame.execute(opcode, 0, cw, i);
            } else {
                /*
                 * computes the stack size variation. In order not to recompute
                 * several times this variation for the same Item, we use the
                 * intVal field of this item to store this variation, once it
                 * has been computed. More precisely this intVal field stores
                 * the sizes of the arguments and of the return value
                 * corresponding to desc.
                 */
                if (argSize == 0) {
                    // the above sizes have not been computed yet,
                    // so we compute them...
                    argSize = getArgumentsAndReturnSizes(desc);
                    // ... and we save them in order
                    // not to recompute them in the future
                    i.intVal = argSize;
                }
                int size;
                if (opcode == Opcodes.INVOKESTATIC) {
                    size = stackSize - (argSize >> 2) + (argSize & 0x03) + 1;
                } else {
                    size = stackSize - (argSize >> 2) + (argSize & 0x03);
                }
                // updates current and max stack sizes
                if (size > maxStackSize) {
                    maxStackSize = size;
                }
                stackSize = size;
            }
        }
        // adds the instruction to the bytecode of the method
        if (itf) {
            if (argSize == 0) {
                argSize = getArgumentsAndReturnSizes(desc);
                i.intVal = argSize;
            }
            code.put12(Opcodes.INVOKEINTERFACE, i.index).put11(argSize >> 2, 0);
        } else {
            code.put12(opcode, i.index);
        }
    }
}
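
The stack-size arithmetic in visitMethodInsn is easier to follow with numbers. The sketch below is a hypothetical re-implementation, not ASM's own code: DescriptorSizes and both of its methods are invented names. It assumes, as the "+ 1" for INVOKESTATIC in the excerpt suggests, that the packed value holds the argument size including the implicit this reference in the upper bits (argSize >> 2) and the return size in the lowest two bits (argSize & 0x03).

```java
// Hypothetical sketch of the packed "arguments and return sizes" value.
class DescriptorSizes {

    // Returns (argumentSlots << 2) | returnSlots. argumentSlots counts one
    // slot for the implicit "this"; long and double take two slots each.
    static int argumentsAndReturnSizes(String desc) {
        int slots = 1; // implicit "this"
        int i = 1;     // skip the opening '('
        while (desc.charAt(i) != ')') {
            char c = desc.charAt(i);
            if (c == '[') {
                // an array is a reference: one slot whatever the element type
                while (desc.charAt(i) == '[') {
                    i++;
                }
                if (desc.charAt(i) == 'L') {
                    i = desc.indexOf(';', i);
                }
                i++;
                slots += 1;
            } else if (c == 'L') {
                i = desc.indexOf(';', i) + 1; // skip "Lsome/Class;"
                slots += 1;
            } else {
                slots += (c == 'J' || c == 'D') ? 2 : 1;
                i++;
            }
        }
        char ret = desc.charAt(i + 1);
        int returnSlots = ret == 'V' ? 0 : (ret == 'J' || ret == 'D') ? 2 : 1;
        return (slots << 2) | returnSlots;
    }

    // Operand stack size after the call, using the same formula as the excerpt.
    static int stackSizeAfter(int stackSize, String desc, boolean isStatic) {
        int argSize = argumentsAndReturnSizes(desc);
        int size = stackSize - (argSize >> 2) + (argSize & 0x03);
        return isStatic ? size + 1 : size; // a static call pops no "this"
    }

    public static void main(String[] args) {
        // println(String): pops the PrintStream and the String, pushes nothing
        System.out.println(stackSizeAfter(2, "(Ljava/lang/String;)V", false)); // prints 0
        // append(int): pops the StringBuilder and the int, pushes a StringBuilder
        System.out.println(stackSizeAfter(2, "(I)Ljava/lang/StringBuilder;", false)); // prints 1
    }
}
```

Caching the packed value in the item, as the excerpt's comment describes, means each descriptor string is scanned only once.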

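From the comments we can get a rough idea of what visitIntInsn and visitMethodInsn do.
For example, visitIntInsn first updates the stack size bookkeeping, then checks whether the opcode is SIPUSH or one of BIPUSH and NEWARRAY, and calls the matching method to write the bytes (put12 for SIPUSH, whose operand takes two bytes, and put11 otherwise).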
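
To show what those two write methods amount to, here is a simplified stand-in for the byte buffer. MiniByteVector and MiniByteVectorDemo are hypothetical names; the sketch only assumes that put11 appends two single bytes and put12 appends one byte followed by a big-endian 16-bit value, which is what the method names and the excerpt imply.

```java
import java.util.Arrays;

// Hypothetical, simplified stand-in for a growable bytecode buffer.
class MiniByteVector {
    private byte[] data = new byte[8];
    private int length;

    // Appends two single bytes (for example an opcode and a one-byte operand).
    MiniByteVector put11(int b1, int b2) {
        ensureCapacity(2);
        data[length++] = (byte) b1;
        data[length++] = (byte) b2;
        return this;
    }

    // Appends one byte followed by a 16-bit value, high byte first.
    MiniByteVector put12(int b, int s) {
        ensureCapacity(3);
        data[length++] = (byte) b;
        data[length++] = (byte) (s >>> 8);
        data[length++] = (byte) s;
        return this;
    }

    private void ensureCapacity(int extra) {
        if (length + extra > data.length) {
            data = Arrays.copyOf(data, Math.max(data.length * 2, length + extra));
        }
    }

    byte[] toByteArray() {
        return Arrays.copyOf(data, length);
    }
}

class MiniByteVectorDemo {
    static final int BIPUSH = 0x10; // opcode values from the JVM specification
    static final int SIPUSH = 0x11;

    // Same branching as the end of visitIntInsn in the excerpt.
    static byte[] encodePush(int opcode, int operand) {
        MiniByteVector code = new MiniByteVector();
        if (opcode == SIPUSH) {
            code.put12(opcode, operand);
        } else {
            code.put11(opcode, operand);
        }
        return code.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encodePush(SIPUSH, 10000))); // prints [17, 39, 16]
        System.out.println(Arrays.toString(encodePush(BIPUSH, 100)));   // prints [16, 100]
    }
}
```

The three bytes 17, 39, 16 are 0x11 0x27 0x10, that is, sipush followed by 10000 in two bytes, matching the first instruction of main in the javap listing.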