Tuesday, December 20, 2011

New Emscripten tutorial: C/C++ to JavaScript now easier than ever with "emcc"

A new compiler frontend for Emscripten, emcc, has recently landed. emcc can be used essentially as a drop-in replacement for gcc, which makes compiling C and C++ into JavaScript much easier. For example,

  emcc src.cpp

will generate a.out.js, and

  emcc src.cpp -o src.html

will generate a complete HTML file with the compiled code as embedded JavaScript, including SDL support so the code can render to a Canvas element. Optimizing code is now easy as well. For example,

  emcc -O2 src.cpp

will generate optimized code (optimizations are applied in LLVM, in the Emscripten compiler itself, by the Closure Compiler, and by the Emscripten JavaScript optimizer). There is also an even faster setting, -O3; see the docs for more.

emcc is presented in more detail in the new Emscripten Tutorial. Check it out! Feedback is welcome :)

Saturday, December 10, 2011

Typed Arrays by Default in Emscripten

Emscripten can compile code into JavaScript in several ways; for example, it can use typed arrays or not (for more, see Code Generation Modes). I just merged the 'ta2 by default' branch into master in Emscripten, which makes one of the typed array modes the default. Here I'll explain the reasons for that, and its results.

Originally Emscripten did not use typed arrays. When I began writing it, typed arrays were supported only in Firefox and Chrome, and even there they were of limited benefit because they were not yet well optimized and their implementations were incomplete. Perhaps more importantly, it was not clear whether they would ever be supported in all browsers. So to generate code that truly runs everywhere, Emscripten did not use typed arrays; instead, it generated "plain vanilla" JavaScript.

However, that has changed. Firefox and Chrome now have mature, well-performing implementations of typed arrays, and Opera and Safari are very close to that point. Importantly, Microsoft has said that IE10 will support typed arrays. So typed arrays are becoming ubiquitous and have a bright future.

The main benefits of using typed arrays are speed and code compatibility. The speed benefit comes simply from JS engines being able to optimize typed arrays better than normal arrays, both in how they are laid out in memory and in how they are accessed. Compatibility comes from the fact that, by using typed arrays with a shared buffer, you get the same memory behavior as C: for example, you can read an 8-bit byte from the middle of a 32-bit int and get the same result C would. It's possible to do that without typed arrays, but it would be much, much slower. (There is, however, a downside to such C-like memory access: if your code was not 100% portable in the first place, it may depend on CPU endianness.)
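To make the shared-buffer idea concrete, here is a small TypeScript sketch: two typed array views over one ArrayBuffer, so a 32-bit store is visible byte by byte. It is an illustration under assumptions, not Emscripten's generated code; the names HEAPU8, HEAP32, storeI32 and loadU8 and the 1024-byte heap size are chosen here for the example.

```typescript
// One ArrayBuffer acts as the C heap; several typed array views alias it.
const buffer = new ArrayBuffer(1024); // arbitrary heap size for this sketch
const HEAPU8 = new Uint8Array(buffer);
const HEAP32 = new Int32Array(buffer);

// True if the host stores the least significant byte at the lowest address.
const LITTLE_ENDIAN = new Uint8Array(new Uint32Array([1]).buffer)[0] === 1;

// Like *(int32_t *)ptr = value in C; ptr must be a multiple of 4.
function storeI32(ptr: number, value: number): void {
  HEAP32[ptr >> 2] = value;
}

// Like *(uint8_t *)ptr in C; any byte address works.
function loadU8(ptr: number): number {
  return HEAPU8[ptr];
}

storeI32(8, 0x11223344);
// Address 9 is the second byte of that int in memory.
const middle = loadU8(9); // 0x33 on little-endian hosts, 0x22 on big-endian ones
console.log(middle.toString(16), LITTLE_ENDIAN ? "(little-endian)" : "(big-endian)");
```

On little-endian hardware such as x86 this prints `33 (little-endian)`; a big-endian host would read `22` from the same address, which is exactly the endianness dependence mentioned above.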

Because of those benefits, I worked towards using typed arrays by default. To get there, I had to fix various problems with accessing 64-bit values. These problems only arise with C-like memory access, because unaligned 64-bit reads and writes do not work (due to how the typed arrays API is structured: an element of a Float64Array always starts at a multiple of 8 bytes). The settings I64_MODE and DOUBLE_MODE control how those 64-bit values are accessed: when set to 1, they are read and written safely as two 32-bit parts.
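One way to picture the DOUBLE_MODE=1 approach, as a sketch under assumptions rather than the code Emscripten actually emits: a double stored at an address that is only 4-byte aligned is moved through a small scratch buffer as two 32-bit words. The scratch buffer and the helpers storeF64 and loadF64 are hypothetical names for this illustration; I64_MODE=1 splits 64-bit integers into two 32-bit parts in the same spirit.

```typescript
// A 1 KB "heap" shared by several views (the size is arbitrary).
const buffer = new ArrayBuffer(1024);
const HEAP32 = new Int32Array(buffer);
// HEAPF64[i] covers bytes 8*i .. 8*i+7, so it can only reach 8-byte-aligned addresses.
const HEAPF64 = new Float64Array(buffer);

// 8-byte scratch space used to convert between a double and two 32-bit words.
const scratch = new ArrayBuffer(8);
const scratchI32 = new Int32Array(scratch);
const scratchF64 = new Float64Array(scratch);

// Store a double at a 4-byte-aligned address as two 32-bit writes.
function storeF64(ptr: number, value: number): void {
  scratchF64[0] = value;
  HEAP32[ptr >> 2] = scratchI32[0];
  HEAP32[(ptr + 4) >> 2] = scratchI32[1];
}

// Load a double from a 4-byte-aligned address as two 32-bit reads.
function loadF64(ptr: number): number {
  scratchI32[0] = HEAP32[ptr >> 2];
  scratchI32[1] = HEAP32[(ptr + 4) >> 2];
  return scratchF64[0];
}

// Address 4 is not a multiple of 8: HEAPF64[4 >> 3] would touch bytes 0..7 instead.
storeF64(4, Math.PI);
console.log(loadF64(4) === Math.PI); // true
```

Because the 32-bit copies move raw bits, the result is identical to an aligned 64-bit access; the price is two reads or writes instead of one.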

Another complication is that typed arrays cannot be resized. So when sbrk() is called and the requested memory exceeds the current maximum size, we can't easily enlarge the typed arrays we are using. The current implementation creates new, larger typed arrays and copies the old values into them, which works but is potentially slow.
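A minimal sketch of that grow-and-copy strategy, with its assumptions labeled: the sbrk and enlargeMemory functions, the tiny starting size and the doubling policy below are illustrative choices, not Emscripten's actual implementation. The key point is that every typed array view must be recreated over the new buffer after the copy.

```typescript
// Heap state. Typed arrays have a fixed length, so growing means replacing them.
let buffer = new ArrayBuffer(16); // deliberately tiny starting size
let HEAPU8 = new Uint8Array(buffer);
let HEAP32 = new Int32Array(buffer);
let dynamicTop = 0; // current program break, in bytes

// Replace the heap with a larger one (at least minSize bytes) and copy the old contents.
function enlargeMemory(minSize: number): void {
  let size = buffer.byteLength;
  while (size < minSize) size *= 2; // doubling keeps the number of copies low
  const newBuffer = new ArrayBuffer(size);
  const newHEAPU8 = new Uint8Array(newBuffer);
  newHEAPU8.set(HEAPU8); // copy every old byte: cost proportional to the old size
  buffer = newBuffer;
  HEAPU8 = newHEAPU8;
  HEAP32 = new Int32Array(newBuffer); // every other view must be rebuilt too
}

// Hypothetical sbrk: move the break by `increment` bytes and return the old break.
function sbrk(increment: number): number {
  const oldTop = dynamicTop;
  const newTop = oldTop + increment;
  if (newTop > buffer.byteLength) enlargeMemory(newTop);
  dynamicTop = newTop;
  return oldTop;
}

const p = sbrk(8); // fits in the initial 16 bytes
HEAP32[p >> 2] = 1234;
sbrk(100); // needs 108 bytes, so the heap grows to 128
console.log(buffer.byteLength, HEAP32[p >> 2]); // 128 1234
```

Doubling is one common way to amortize the copying cost; growing by exactly the requested amount would copy the whole heap on every enlargement.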

Typed arrays have worked in Emscripten for a long time (in two modes, in fact: with shared and with non-shared buffers), but the issues described in the previous two paragraphs limited their use in some areas. So the recent work has been to fill in the missing pieces and make typed arrays ready to be the default mode.
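The difference between the two typed array modes can be sketched as follows. This assumes, as a simplification for illustration, that the non-shared mode keeps an independent typed array per type, each indexed by the same address, while the shared mode aliases all views onto one ArrayBuffer; the objects nonShared and shared are hypothetical names.

```typescript
// Non-shared buffers: each type has its own storage, indexed directly by address.
const nonShared = {
  i32: new Int32Array(64),
  u8: new Uint8Array(64),
};

// Shared buffer: both views alias the same 64 bytes.
const sharedBuffer = new ArrayBuffer(64);
const shared = {
  i32: new Int32Array(sharedBuffer),
  u8: new Uint8Array(sharedBuffer),
};

const ptr = 8; // a 4-byte-aligned byte address

nonShared.i32[ptr] = 0x11223344; // address used directly as the index
shared.i32[ptr >> 2] = 0x11223344; // byte address converted to an int index

// Read the byte at the same address back through the byte view.
const fromNonShared = nonShared.u8[ptr]; // 0: the int store never touched this array
const fromShared = shared.u8[ptr]; // 0x44 on little-endian hosts, 0x11 on big-endian ones

console.log(fromNonShared, fromShared.toString(16));
```

Only the shared version reproduces C behavior when code reinterprets memory, such as reading one byte of an int.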

After the merge, the default in Emscripten is to use typed arrays (in mode 2, with a shared buffer, that is, C-like memory access), with all related settings set to safe values (for example, I64_MODE and DOUBLE_MODE are both 1). This means that all code that worked out of the box before will continue to work, and additional code will now work out of the box as well. Note that these are just the defaults: if your makefile sets all the Emscripten settings itself (such as whether to use typed arrays), nothing will change.

The main thing to keep in mind with this change is that, by default, running the generated code requires typed arrays. If you want your code to run in as many places as possible right now, set USE_TYPED_ARRAYS to 0 to disable typed arrays. Another possible issue is that not all JS console environments support typed arrays: recent versions of SpiderMonkey and Node.js do, but the V8 shell has some issues (this is only a problem in the command-line shell, not in Chrome), so generated code will not work if you test it with d8. For now, test it in a browser, in Node.js, or in the SpiderMonkey shell instead.

Monday, December 5, 2011

Emscripten in node.js and on the web

Until now, to use Emscripten to compile LLVM to JavaScript, you had to install a JavaScript engine shell (like SpiderMonkey's or V8's), both to run Emscripten itself and to run the generated code. This meant getting the latest source code of one of those shells and building it, which isn't hard but isn't super convenient either. So over the weekend I landed support for running Emscripten itself in node.js and in web browsers, as well as support for running the generated code in node.js (it has always run in browsers).

What this means is that if you have node.js, Python and Clang, you have everything you need to use Emscripten. For more, see the updated Getting Started page. (Regarding running Emscripten itself in a web browser, see src/compiler.html. This isn't really intended as a serious way to use it, but it has some interesting use cases, or will have.)

It is still strongly recommended to install the JavaScript engine shells themselves, though. One reason is that the trunk engine shells contain the very latest code, so you should use them to see the maximum speed at which code can run. Also, some tests require the SpiderMonkey shell, because the other environments do not yet fully support the latest typed arrays spec. But if you already have node.js installed anyway, Emscripten is now easier to use, because you can just use that.