EDIT: OP is asking about JS, I made my answer kinda Java-specific. Meh.
A little out of order in answering...
Hi,
Thank you for response, but I do not understand this.
... It feels like my mental picture of the process has some hole in it that i'm not seeing...
No problem. This is pretty subtle actually.
By 'results' i do mean the output of path1.
...
So even though a VM does read/execute the bytecode, the VM has to translate and output to x86 cpu for something to happen? So why not cache this VM output?
OK. I think I understand your current perception. The following is what I think you think -- tell me if I'm wrong.
Compiler translates source to bytecode. VM translates bytecode to machine code (e.g., x86), which then runs natively.
That is what happens only with a JIT-based approach. Not all VMs will JIT all code, nor will all VMs JIT at all. What else can they do? See below.
Why do you say 'VM doesn't make machine code' ? If you're running javascript on a intel cpu, at some point the VM has to create instructions that the intel cpu can understand, x86 machine code?
I say 'VM doesn't make machine code' because not all VMs make machine code.
What is bytecode? Bytecode is a sequence of instructions that the VM understands, and most CPUs do not. What are the ways that a program represented in bytecode can be executed?
1. Translate the bytecode to machine code, and run the machine code (the JIT approach).
2. Read the bytecode and execute each command on behalf of the bytecode within the VM (interpretation).
Here's an example of #2. Consider a bytecode language with four symbols:
o - Open file
c - Close file
w - Write file
r - Read file
Clearly these operations cannot be executed directly by a CPU: CPUs operate well below the file abstraction, each of these operations involves the OS, and so on. But they are perfectly legal operations for any language, including a bytecode language.
So consider this string of bytecode:
o "foo.txt" "rw" r buffer 64 w "Hello, world!" 13 c
In human language: open file foo.txt in rw mode. Read up to 64 bytes from the front of the file into buffer. Then write the 13 bytes of "Hello, world!" into the file. Close the file.
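For concreteness, here is a sketch of how a VM's bytecode reader might step through that instruction stream: split it into tokens, then group each command with its arguments. The token format and the arity table are assumptions for this toy language, not something any real VM uses.

```java
import java.util.*;

// Toy bytecode reader: tokenizes a stream like
//   o "foo.txt" "rw" r buffer 64 w "Hello, world!" 13 c
// into (command, args...) instructions.
public class BytecodeReader {
    // How many arguments each command consumes (an assumption of this toy language).
    static final Map<String, Integer> ARITY = Map.of("o", 2, "r", 2, "w", 2, "c", 0);

    public static List<String[]> read(String stream) {
        // Tokens are either quoted strings or bare words.
        List<String> tokens = new ArrayList<>();
        var m = java.util.regex.Pattern.compile("\"([^\"]*)\"|(\\S+)").matcher(stream);
        while (m.find()) tokens.add(m.group(1) != null ? m.group(1) : m.group(2));

        // Group each command with the arguments it consumes.
        List<String[]> insns = new ArrayList<>();
        for (int i = 0; i < tokens.size(); ) {
            String cmd = tokens.get(i++);
            String[] insn = new String[ARITY.get(cmd) + 1];
            insn[0] = cmd;
            for (int a = 1; a < insn.length; a++) insn[a] = tokens.get(i++);
            insns.add(insn);
        }
        return insns;
    }
}
```

The output of this step is exactly what the interpreter loop below consumes: a list of instructions, each a command plus its operands.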
Still, this sequence cannot be executed directly by the CPU. But it
can be read and executed by a VM, by some code that looks like this:
Pseudocode
Code:
forever {
    cmd = readBytecode();
    switch (cmd) {
        case 'o':
            filename = arg1;
            mode = arg2;
            f = open_file( filename, mode );
            break;
        case 'r':
            buf = arg1;
            len = arg2;
            read_file( f, buf, len );
            break;
        case 'w':
            data = arg1;
            len = arg2;
            write_file( f, data, len );
            break;
        case 'c':
            close_file( f );
            break;
    }
}
The above code is an interpreting VM. The VM does not execute the bytecode itself; it reads the bytecode and performs the commands encoded within it. An interpreter need not generate any machine code for the target CPU, because the interpreter knows how to execute the entire language all by itself.
Notice that there aren't any left-overs from interpretation like there are from JIT-ing. There's no machine code to cache, because no machine code was generated.
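To make that interpreter loop concrete, here's a runnable toy version. It's a sketch, not a real VM: the "files" live in an in-memory map instead of going through the OS, and the bytecode arrives pre-tokenized as (command, arg1, arg2) instructions.

```java
import java.util.*;

// Toy interpreter for the o/r/w/c bytecode from the example above.
public class ToyVm {
    private final Map<String, StringBuilder> disk = new HashMap<>();
    private StringBuilder f;        // the currently open "file"
    private String buffer = "";     // destination of the last read

    public void run(List<String[]> bytecode) {
        for (String[] insn : bytecode) {          // cmd = readBytecode();
            switch (insn[0]) {
                case "o":                         // open(filename, mode)
                    f = disk.computeIfAbsent(insn[1], k -> new StringBuilder());
                    break;
                case "r": {                       // read(len) into buffer
                    int len = Integer.parseInt(insn[2]);
                    buffer = f.substring(0, Math.min(len, f.length()));
                    break;
                }
                case "w": {                       // write(data, len)
                    int len = Integer.parseInt(insn[2]);
                    f.append(insn[1], 0, Math.min(len, insn[1].length()));
                    break;
                }
                case "c":                         // close
                    f = null;
                    break;
            }
        }
    }

    public String fileContents(String name) {
        return disk.get(name).toString();
    }

    // The example program from this post, tokenized by hand.
    public static List<String[]> sampleProgram() {
        return Arrays.asList(
            new String[]{"o", "foo.txt", "rw"},
            new String[]{"r", "buffer", "64"},
            new String[]{"w", "Hello, world!", "13"},   // 13 = length of the data
            new String[]{"c"}
        );
    }

    public static void main(String[] args) {
        ToyVm vm = new ToyVm();
        vm.run(sampleProgram());
        System.out.println(vm.fileContents("foo.txt")); // prints: Hello, world!
    }
}
```

Note that at no point did any machine code get generated for the bytecode program itself; the only machine code involved is the (already compiled) interpreter.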
When to prefer interpretation to native
Lots of languages are interpreted: shell scripts, Python, and many more. Lots are compiled to native: C, C++, D, etc. Some can go either way: Java, Python, JavaScript, etc.
Java is one of those interesting languages that can go either way (actually, Java's intermediate form, bytecode, can go either way). So, what are the advantages and disadvantages of each?
Pro native
- Native code is faster, once it has been compiled
Con native
- Compilation to native takes a long time, but only needs to be done once
Pro interpreted
- Almost no compilation step
Con interpreted
- Runs quite a bit slower than native
The simplest JVMs only interpret. That's entirely legal in the Java execution model. They never JIT, because decent JIT compilers are hard to write and pretty large.
The smartest and fastest JVMs out there usually interpret code on the first run. When they detect they are running the same section of code over and over again, they pause execution and JIT the code to native. That allows the execution to pick up a lot of speed, after the compilation is done.
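That interpret-first, JIT-when-hot policy can be sketched in a few lines. This is purely illustrative: the names (HOT_THRESHOLD, the fake compile() step) are made up, and real JVMs track invocation and loop counters per method and hand hot code to a genuine optimizing compiler.

```java
import java.util.function.IntUnaryOperator;

// Toy sketch of "interpret first, JIT when hot".
public class HotSpotSketch {
    static final int HOT_THRESHOLD = 1000;      // invocations before we "JIT"
    private int invocations = 0;
    private IntUnaryOperator compiled = null;   // our one-entry code cache

    // The "interpreter": compute x*x slowly, by repeated addition.
    private int interpret(int x) {
        int acc = 0;
        for (int i = 0; i < Math.abs(x); i++) acc += Math.abs(x);
        return acc;
    }

    // The "JIT": replace the interpreted loop with a direct operation.
    private IntUnaryOperator compile() {
        return v -> v * v;
    }

    public int execute(int x) {
        if (compiled != null) return compiled.applyAsInt(x);   // cached compiled code
        if (++invocations >= HOT_THRESHOLD) compiled = compile();
        return interpret(x);                                   // still cold: interpret
    }

    public boolean isCompiled() { return compiled != null; }
}
```

Every call gets the same answer either way; the only difference after the threshold is how fast the answer is produced, which is the whole point of the trade-off.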
Because compilation is expensive, JVMs keep a compiled-code cache (sometimes just called a code cache), so they need not repeat compilation if they execute the same code again. But when running interpreted, there are no 'results' of interpretation that can be safely cached -- there is only a set of effects on the outside world (the output of the interpretation), and those can change from run to run depending on inputs.