Wednesday, February 22, 2012

Unicode source code

As Frank Salter wrote in a recent comment on case-sensitivity, most scientific fields require several dozen symbols and alphabets that go beyond the Latin symbols usable in plain old ASCII, a standard that is almost half a century old (1968).
For example my field - civil and structural engineering - uses almost the entire Greek alphabet in the Eurocodes.
There is a widespread need to go beyond case-sensitivity and ASCII.
I already stated at the beginning of our efforts that Unicode source code is a design choice; shame on me for not having found the time necessary to implement what is clear in our minds.
We shall - at least - write source code in Unicode and lay down a style guide for its usage.

First of all we need more liberal infix and prefix operators. By “liberal” I mean allowing, for example, Unicode math characters, such as the many mathematical symbols the Unicode standard defines.
Unicode libraries available as “logiciel libre” (free-as-in-freedom software) - for example GLib - allow us to know whether a given character is a symbol (e.g.
g_unichar_type (a_char) = G_UNICODE_MATH_SYMBOL; see also this table).
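To make the idea concrete, here is a minimal sketch of the same check in Python rather than GLib, using the standard unicodedata module. The general category "Sm" (Symbol, math) is the counterpart of G_UNICODE_MATH_SYMBOL; the function name is_math_symbol is made up for illustration and is not part of Liberty or GLib.

```python
import unicodedata


def is_math_symbol(ch: str) -> bool:
    """Return True if ch is one character whose Unicode general
    category is "Sm" (Symbol, math).

    Hypothetical helper, written only to illustrate the classification.
    """
    return len(ch) == 1 and unicodedata.category(ch) == "Sm"


if __name__ == "__main__":
    # Show the general category of a few candidate operator characters.
    for ch in "×+∧*a":
        print(repr(ch), unicodedata.category(ch), is_math_symbol(ch))
```

Note that some familiar ASCII operators do not fall in this category: "*" and "/" are classified as punctuation (Po), "-" as dash punctuation (Pd) and "^" as a modifier symbol (Sk), so a rule based only on "Sm" would have to list them separately.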
Of course I didn’t mean to allow code like «class PERSON… feature infix "open"»; otherwise we would drive the parser crazy and end up with something Perl-like.
This way we would get rid of the usual rant from people coming from languages with overloading (e.g. C++, Java), who say:
“why do I have to write
my_matrix.multiply (another_matrix).scalar_vector(a_vector)
instead of (my_matrix*another_matrix)*a_vector?”
Because a mathematician would rather have written
"(my_matrix × another_matrix) ^ a_vector":
scalar and matrix multiplications are not arithmetic multiplication, and in fact they have different symbols in "real" math.
The infix-prefix name rule could therefore be expressed using the Unicode classification of characters.
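A rough sketch of how such a rule might look, written in Python only for illustration: an operator name is accepted when every character is either a Unicode math symbol (category "Sm") or one of a few ASCII operator characters that Unicode classifies otherwise. The allow-list and the function names are assumptions of mine, not anything Liberty currently implements.

```python
import unicodedata

# Hypothetical allow-list: ASCII operator characters that Unicode does not
# classify as math symbols ("*" and "/" are Po, "-" is Pd, "^" is Sk).
ASCII_OPERATOR_CHARS = frozenset("*/-^")


def is_operator_char(ch: str) -> bool:
    """True for a Unicode math symbol or an allow-listed ASCII operator."""
    return unicodedata.category(ch) == "Sm" or ch in ASCII_OPERATOR_CHARS


def is_valid_operator_name(name: str) -> bool:
    """Accept a non-empty name made only of operator characters.

    Alphabetic names such as "open" are rejected, which keeps
    operators and identifiers lexically distinct for the parser.
    """
    return len(name) > 0 and all(is_operator_char(ch) for ch in name)


if __name__ == "__main__":
    for candidate in ["×", "<=", "+-", "^", "open", "x×"]:
        print(repr(candidate), is_valid_operator_name(candidate))
```

With a rule of this kind the parser never has to guess whether a word is an identifier or an operator, since the two draw their characters from disjoint sets.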
Actually I would like to write LaTeX-like code such as `y := { -b +- sqrt { b^2 - 4 a c} } / {2a}` or `A_x := int_0^1 f_m(x)dx`, in a way similar to what ASCIIMathML does for HTML pages. But this is currently a dream.

JavaScript backend?

Did anyone see me writing about an Eiffel-to-JavaScript compiler?
Well, indeed I was pondering using Liberty as a language for Web 2.0, AJAX, and anything that has a JavaScript interpreter.
Is there any recent appliance that does not have a JavaScript interpreter built-in these days?
We cannot afford to be cut out of this!
We need a JavaScript backend, so that Liberty actually becomes an Eiffel-to-JavaScript compiler!
Yet we have to deal with resource limitations, most notably a chronic lack of time.
Enter Emscripten, the LLVM-to-JavaScript compiler.
As written elsewhere in this blog we were already targeting LLVM, so Emscripten is really a blessing from heaven above: it compiles LLVM bitcode to JavaScript. Bingo!
Compiling to LLVM will automagically make Liberty an Eiffel-to-JavaScript compiler.
You may get the source code from https://github.com/kripken/emscripten/wiki and read about it at http://syntensity.blogspot.com/2011/04/emscripten-10.html (on http://syntensity.blogspot.com/ ), a blog continued at http://mozakai.blogspot.com/

Actually there are some enjoyable considerations about what such a tool should be called:
  • Q. Is this really a compiler? Isn't it better described as a translator?
    A. Well, a compiler is usually defined as a program that transforms source code written in one programming language into another, which is what Emscripten does. A translator is a more specific term that is usually used for compilation between high-level languages, which isn't exactly applicable. On the other hand a decompiler is something that translates a low-level language to a higher-level one, so that might technically be a valid description, but it sounds odd since we aren't going back to the original language we compiled from (C/C++, most likely) but into something else (JavaScript).