Disclaimer: The V8 engine is a huge project and inevitably complex. Going through everything takes a lot of time and cannot be done in a reasonably sized blogpost, so here I am just scratching the surface. Hopefully I can show you why I find trekking through V8 a delightful journey.
Let's get familiar with some V8 lingo: Orinoco is the garbage collector, Liftoff is the WebAssembly (WASM) compiler, and then comes the duo that makes V8 the fastest engine on the planet: Ignition and Turbofan. So why is V8 so fast? Great credit goes to Ignition and Turbofan: Ignition is an interpreter and Turbofan is an optimizing compiler. When you bring up a web page you want to see it loaded fast, so you JIT (compile just in time). Later, as you keep running some JS function, you ask yourself: instead of executing VM bytecodes for this function over and over again, wouldn't it be better to have faster machine code instead? And that is the trick they pull: if some code is deemed 'hot' because you keep calling it, it gets optimized by being compiled to machine code via Turbofan. But the story doesn't stop there: V8 also gathers type feedback to speed things up. Recall that JS is an untyped language, which makes even a simple addition painful to execute: is it an addition of numbers? Of strings? You just do not know, so you have to generate a lot of additional code for the cases where the typing is unclear. Type feedback guesses the types: after you have called a function with a number as the argument a few times, it hints to Turbofan that the argument is going to be a number. What happens if this intuition proves wrong and you call it with a string? That is called deoptimization: the specialized machine code is thrown away, and Ignition takes the reins again.
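The speculate-then-deoptimize loop can be sketched in plain C++ (a toy model, not V8's actual machinery; all the names here are invented): a call site starts out on the generic path, watches the argument types it sees, specializes once it is "hot", and falls back to the generic path the moment the speculation fails.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Toy sketch of type feedback (not V8 code): a JS-like value is either
// a number or a string, and an "add" call site records what it observes.
using Value = std::variant<double, std::string>;

struct AddSite {
  bool specialized_for_numbers = false;  // "optimized code" exists
  int observed_number_calls = 0;         // the gathered type feedback

  Value Add(const Value& a, const Value& b) {
    if (specialized_for_numbers) {
      // Fast path: speculate both operands are numbers.
      if (std::holds_alternative<double>(a) && std::holds_alternative<double>(b)) {
        return std::get<double>(a) + std::get<double>(b);
      }
      // Speculation failed: "deoptimize", throw the fast path away.
      specialized_for_numbers = false;
      observed_number_calls = 0;
    }
    // Generic path: handle every combination, and gather feedback.
    if (std::holds_alternative<double>(a) && std::holds_alternative<double>(b)) {
      if (++observed_number_calls >= 3) specialized_for_numbers = true;  // "hot"
      return std::get<double>(a) + std::get<double>(b);
    }
    // JS-style fallback: anything involving a string concatenates.
    auto as_string = [](const Value& v) {
      return std::holds_alternative<std::string>(v)
                 ? std::get<std::string>(v)
                 : std::to_string(std::get<double>(v));
    };
    return as_string(a) + as_string(b);
  }
};
```

Real deoptimization is far more involved (it has to reconstruct interpreter state mid-function), but the shape of the bet is the same: pay for a guard check, win big when the guess holds.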
Furthermore, every piece is crafted with performance in mind. For example the scanner (the part in charge of transforming a stream of characters into tokens) builds tables of flags for each ASCII character, marking whether that character can start an identifier or continue one, so that on consuming each character we can check with a single lookup whether we are still inside an identifier. It also uses perfect hashing on the length and the first two characters to tell whether an identifier is a keyword or not.
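Here is a rough sketch of both tricks in C++ (heavily simplified; the flag values, helper names, and the tiny keyword list are mine, not V8's):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

// Per-ASCII-character flags, precomputed once: classifying a character
// while scanning becomes a single table lookup instead of a comparison chain.
enum CharFlags : unsigned { kIdStart = 1, kIdContinue = 2 };

static std::array<unsigned, 128> MakeFlagTable() {
  std::array<unsigned, 128> table{};
  for (int c = 0; c < 128; ++c) {
    bool alpha = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    if (alpha || c == '_' || c == '$') table[c] |= kIdStart | kIdContinue;
    if (c >= '0' && c <= '9') table[c] |= kIdContinue;
  }
  return table;
}
static const std::array<unsigned, 128> kCharFlags = MakeFlagTable();

static unsigned FlagsFor(char ch) {
  unsigned char c = static_cast<unsigned char>(ch);
  return c < 128 ? kCharFlags[c] : 0;  // a real scanner handles non-ASCII too
}

// Scan one identifier starting at `pos`; returns its length (0 if none).
size_t ScanIdentifier(std::string_view src, size_t pos) {
  if (pos >= src.size() || !(FlagsFor(src[pos]) & kIdStart)) return 0;
  size_t end = pos + 1;
  while (end < src.size() && (FlagsFor(src[end]) & kIdContinue)) ++end;
  return end - pos;
}

// V8 perfect-hashes on (length, first two characters); this toy version
// just dispatches on the first character to narrow the candidates.
bool IsKeyword(std::string_view id) {
  if (id.empty()) return false;
  switch (id[0]) {
    case 'i': return id == "if" || id == "in";
    case 'r': return id == "return";
    case 'v': return id == "var" || id == "void";
    default:  return false;
  }
}
```

The point of both tables is the same: move work from scan time, which runs once per character of every script, to engine build time.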
The required tools to build V8 are git (which you probably have) and depot_tools (git is included in depot_tools on Windows). Once the depot_tools directory is in your path, execute "gclient", which updates depot_tools itself. Then cd into the directory you wish, execute "fetch v8" and grab a coffee; this will take some time. Before building, executing gclient sync will get you the build dependencies and you will be ready to build. tools/dev/gm.py x64.release builds V8, and you can access the built binaries in the out/x64.release subdirectory. Go on, take a look at the contents of this directory.
icudtl.dat, for example, is the bundled ICU (International Components for Unicode) data. If you open this file in an editor and choose a Unicode encoding, you can see, e.g., names of countries in your language of choice [here Iran can be seen in Farsi].
There are two directories here. gen, as the name suggests, contains auto-generated stuff: you can find header and source files here (e.g. generated by Torque), among ninja metadata and cache files. Besides that there are some dll and lib files (or the equivalent shared libraries on *nix systems).
Reading through the source code and trying to make out how things fit together is good practice for C++ enthusiasts, as the code conforms to good practices (well, mostly). But as the project is huge, one cannot just swallow it all in one bite; rather, take a subsystem of the project, say the scanner, focus on that first, and later, once the smaller pieces are worked out, look into how they integrate into the greater picture. For example, in the scanner, the token types are defined using X macros, a cool trick leveraging the preprocessor that I find to be underrated and underused.
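The charm of the X-macro trick is that one master list expands into both an enum and a parallel array of strings, so the two can never drift out of sync. A minimal sketch (the token names below are invented, not V8's actual token list):

```cpp
#include <cassert>
#include <cstring>

// The master list: each entry is a (name, printable string) pair.
// Invoking TOKEN_LIST with different definitions of T "stamps out"
// different views of the same data.
#define TOKEN_LIST(T)   \
  T(kLeftParen, "(")    \
  T(kRightParen, ")")   \
  T(kAdd, "+")          \
  T(kIdentifier, "identifier")

// Expansion 1: the enum of token kinds.
enum class Token {
#define T(name, string) name,
  TOKEN_LIST(T)
#undef T
  kCount
};

// Expansion 2: a parallel array of printable names, in the same order.
static const char* const kTokenNames[] = {
#define T(name, string) string,
  TOKEN_LIST(T)
#undef T
};

const char* TokenName(Token t) { return kTokenNames[static_cast<int>(t)]; }
```

Add a token to TOKEN_LIST and both the enum and the name table pick it up automatically; forget-to-update bugs simply cannot happen.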
Now onwards to the executables.
First up is mksnapshot. This tool takes in a JS file and outputs a C++ source file (by default snapshot.cc). But what snapshot? What is it doing? The JS file you put in is initialization code for an instance of the V8 engine, and the idea is that to improve start-up times, you can "capture" the state of the V8 engine into a snapshot using this tool. As you might guess, you can also pass optional V8 arguments to this tool to customize the snapshot you get. Afterwards you just copy the generated snapshot to the /src directory and build again. To recap: normally V8 would create the stuff it needs on the fly on the heap (loading math libraries, for example); with a snapshot you put it all in a blob and later deserialize it back. Way faster! Now that we are here, let's also talk about isolates. An isolate is an isolated instance of the V8 engine, meaning that objects of one isolate should not be used in another isolate; each isolate has its own separate, independent state. This technology for sandboxing has gained traction and has been used, for example, by Cloudflare. It is about taking the idea of containers to the extreme: with containers we set out to take VMs a step towards more performance by sharing the OS kernel across containers instead of everyone bringing their own OS kernel with them. With an isolate, we share even more [here, the code of the V8 engine, doing all the heavy work of a JS runtime] across isolate instances. Here is how you make an isolate instance from a snapshot file (which you have made using mksnapshot):
int len = 0; // for storing the length of the file
byte* snapshot = ReadBytes(snap_filename, &len);
if (snapshot) { // make sure the file was read successfully
  SnapshotData data(Vector<const byte>(snapshot, len));
  Isolate* isolate = Isolate::New(); // our isolate instance (V8-internal type)
  // ... deserialize `data` into `isolate` ...
  v8::Isolate* v8_isolate = reinterpret_cast<v8::Isolate*>(isolate);
}
There are so many things you can learn from V8, and so many ways to have fun with it. The V8 docs have good material, and I also recommend this post by Franziska Hinkelmann regarding the bytecode. Once you begin to make sense of V8, you will also be ready to grasp SpiderMonkey, as the concepts are quite similar; for example, the idea of optimizing hot code using type feedback can be seen in Mozilla's flavor, the tracing JIT.
Post by: Iman Hosseini