Friday, June 7, 2013

The importance of being GNU

More than a year has passed since the last article published on this blog.
Yet we haven't been idle: we kept working on the code hosted on GitHub and discussing on the mailing list.
I am now delighted and honoured to inform you that Liberty Eiffel has been officially accepted as a GNU project and that it has been blessed as the continuation of SmartEiffel.
Please visit our page on Savannah, the facility where most GNU projects host their development.
We also expect to release a first version of Liberty Eiffel this summer.

Monday, April 2, 2012

Adding øMQ and LLVM...

Pick an Eiffel compiler, cut it in small pieces, add a cup of socket library - it will work as a concurrency framework - and a cup of a low level virtual machine. Blend it all in the mixer for several revisions until all the tests are passed then pour into a repository.
Recipe from "Recipes for a successful evening with friends"
I would have liked to title this "LEC: Liberty/LLVM Eiffel Compiler".
Currently SmartEiffel is a monolithic piece of code in many senses.
It was considered fast, perhaps the fastest Eiffel compiler available. But that was true only under precise conditions, more specifically when building projects from scratch on a single-core 32-bit processor.
Now that «the times they are a-changin'», almost none of those assumptions hold anymore:
  • most of the time the programmer will rebuild a project after some small changes,
  • multi-core processors are the norm even in phones; widespread machines easily have 4, 6, 8 or even more cores. Even the Ubuntu-certified Asus Eee PC TM Seashell 1015PXB I bought last week at 199€ is seen by the kernel as a four-core machine,
  • most of those processors are 64-bit.
Like ISE Eiffel, SmartEiffel compiles to C and then invokes a C compiler to actually produce the binaries; that phase has always been parallelized to put all the cores of the machine to work. I initially planned to start by parallelizing the parsing phase, but after a few days of study I discovered that SmartEiffel's design gives me an easier start from the back-end. My idea is simple: replace the original C back-end with one that outputs LLVM bytecode, one compilation unit per class. After the original code has done all the parsing, decoding and syntactic analysis I just wrote (some comments and check instructions removed):

Obviously all the work is done by an LLVM_WORKER, which runs as a fork of the main compiler process; that way it has all the data structures ready in its address space. Each worker starts listening on a socket path for commands:


class LLVM_WORKER
inherit
    POSIX_PROCESS
    CLASS_TEXT_VISITOR -- To access CLASS_TEXT.feature_dictionary

insert
    EXCEPTIONS
    GLOBALS
creation
    communicating_over

feature {} -- Creation
    communicating_over (an_endpoint: ABSTRACT_STRING) is
        -- Start a new worker process that will wait for commands over a øMQ socket connected to `an_endpoint'
    require an_endpoint/=Void
    do
        endpoint := an_endpoint
        start -- this will fork a new process and invoke `run'
    end

feature -- Commands
    run is
        local command: ZMQ_STRING_MESSAGE
        do
            pid := process_id;
            ("Worker #(1) listening on '#(2)'%N" # & pid # endpoint).print_on(std_output)
            create context
            socket := context.new_pull_socket
            from socket.connect(endpoint)
            until socket.is_unsuccessful loop 
                create command
                socket.wait_for(command)
                if socket.is_successful then 
                    process(command)
                else throw(socket.zmq_exception)
                end
            end
            ("Worker #(1) ending%N" # & pid ).print_on(std_error)
        end

feature {} -- Implementation
    process (a_command: ZMQ_STRING_MESSAGE) is
        require a_command/=Void
        local words: COLLECTION[STRING]; index: INTEGER; cluster: CLUSTER
        do
            words := a_command.split
            if words/=Void and then words.count=2 and then
                words.first.is_equal("compile-cluster") and then
                words.last.is_integer then
                index := words.last.to_integer
                cluster := ace.cluster_at(index)
                ("Worker process #(1) starts compiling cluster '#(2)' (##(3))%N" # &pid # cluster.name # &index).print_on(std_output)
                cluster.for_all(agent visit_class_text)
                -- Report completion only for a well-formed command: outside this
                -- branch `cluster' is still Void and `cluster.name' would fail.
                ("Cluster '#(2)' (##(3)) compiled by worker #(1)%N" # &pid # cluster.name # &index).print_on(std_output)
            end
        end
.....




LLVM_WORKER does not yet use the PROCESS_POSIX provided by SmartEiffel: I wanted the quickest'n'dirtiest way to use fork(), as this is primarily a test of the øMQ bindings. After all, the quick'n'dirty approach sometimes proves to be exceptionally successful...
People coming from loosely typed languages may argue that I could have written the process command like this:


    process (a_command: ZMQ_STRING_MESSAGE) is
        local words: COLLECTION[STRING]; index: INTEGER; cluster: CLUSTER
        do
            if a_command.split.first.is_equal("compile-cluster") then
                index := a_command.split.last.to_integer
                cluster := ace.cluster_at(index)
                ("Worker process #(1) starts compiling cluster '#(2)' (##(3))%N" # &pid # cluster.name # &index).print_on(std_output)
                cluster.for_all(agent visit_class_text)
            end
            ("Cluster '#(2)' (##(3)) compiled by worker #(1)%N" # &pid # cluster.name # &index).print_on(std_output)
        end


While this may be true now, I ideally want this to scale at least to a local network - for the messaging part it's just a matter of adding socket.bind("tcp://localhost:9999") after the first bind - so assuming anything about a received message is plainly wrong; we may receive garbage. And when you process garbage all you get is garbage.
Nasty readers or innocent C++ programmers may have noticed that I haven't used threads, so I couldn't have used the real zero-copy in-process messaging. Any POSIX programmer worth his/her salt knows that threads are evil... Joking aside, I would really like to have real zero-copy IPC; yet our current compiler is not thread-safe. I think I should rather implement auto-relative references and share a memory region between processes. I actually had a somewhat working prototype of such a class, modelled after auto-relative pointers, but it was so brittle to use that I was ashamed to commit it anywhere...

Tuesday, March 20, 2012

Successful languages have bearded designers


Bertrand Meyer (source Wikipedia)
Disclaimer: this post is a joke.
I have been an avid Slashdot reader since I got on the net when I started university. Yesterday I was naively reading "Why new programming languages succeed or fail" when I stumbled upon the first comment: «Everyone knows it's the Amount of Facial hair», which led me to this "ancient" blog entry.
Please read that blog entry, from which I'm taking many of the following photos.
See some designers of successful, widespread languages: C, Basic (entire generations started with it), C++, Python, Ruby, Java, PHP.
[Photos of the designers of the languages listed above]
The student will usefully discover the name of each designer and his language.

Now let's see the creators of famous languages that are not so widely used, or not so widely used anymore, together with recent photos of the creators of once-fashionable languages that are losing ground: Fortran, Ada, Simula and C++ (in recent years).

[Photos of the creators of the languages listed above]

Now some photos of Bertrand Meyer.
I'm pretty sure you will notice a striking similarity with one of the previous groups.
Oh, have you really seen it?
So please, Bertrand, grow a beard or put a properly-bearded computer scientist in charge of Eiffel's future.

To end this blog entry with style here's a virtual exchange of spicy quotes:
«There are only two things wrong with C++: The initial concept and the implementation.» and «C++ is the only current language making COBOL look good.» – Bertrand Meyer
«There are only two kinds of languages: the ones people complain about and the ones nobody uses.» – Bjarne Stroustrup

Live long and prosper, dear Meyer! Thanks for Eiffel and OOSC!

Don't cross commands and queries: it would be bad. Me, now, paraphrasing Egon Spengler

Wednesday, February 22, 2012

Unicode source code

As Frank Salter wrote in a recent comment on case-sensitivity, most scientific fields require several dozen symbols and alphabets that go beyond the Latin symbols usable in plain old ASCII, a standard almost half a century old (1968).
For example my field - civil and structural engineering - uses almost the entire Greek alphabet in the Eurocodes.
There is a widespread need to go beyond case-sensitivity and ASCII.
At the beginning of our efforts I already stated that Unicode source code was a design choice; shame on me for not having found the time necessary to implement what is clear in our minds.
We shall - at least - write source code in Unicode and lay down some style guide for its usage.

First of all we need more liberal infix and prefix operators. By “liberal” I mean allowing, for example, the many mathematical symbols that the Unicode standard defines.
Unicode libraries available as “logiciel libre” (free-as-in-freedom software), for example GLib, let you check whether a given character is a symbol (e.g. g_unichar_type (a_char) = G_UNICODE_MATH_SYMBOL; see also this table).
Of course I don't mean to allow code like «class PERSON… feature infix "open"», otherwise we would drive the parser crazy and end up with something Perl-like.
This way we would get rid of the usual rant from people coming from languages with overloading (e.g. C++, Java), who say:
“why do I have to write
my_matrix.multiply (another_matrix).scalar_vector(a_vector)
instead of (my_matrix*another_matrix)*a_vector?”
Because the mathematician would rather have written
"(my_matrix × another_matrix) ^ a_vector":
scalar and matrix multiplications are not arithmetic multiplication, and in fact they have different symbols in "real" math.
The infix-prefix name rule could be therefore expressed using Unicode classification of characters.
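To make the operator discussion concrete, here is a minimal sketch of how infix operators are declared today in a SmartEiffel-style class, still limited to ASCII symbols. VECTOR_2D and all its features are hypothetical names invented for this illustration, not classes from our library; under the Unicode-based rule described above, the dot product could presumably get a dedicated math symbol instead of reusing "*".

```eiffel
class VECTOR_2D
    -- Hypothetical example class: a two-dimensional vector.
creation make

feature {ANY}
    x, y: REAL

    make (an_x, a_y: REAL) is
            -- Set both coordinates.
        do
            x := an_x
            y := a_y
        end

    infix "+" (other: VECTOR_2D): VECTOR_2D is
            -- Component-wise sum of Current and `other'.
        require
            other /= Void
        do
            create Result.make(x + other.x, y + other.y)
        end

    infix "*" (other: VECTOR_2D): REAL is
            -- Dot product of Current and `other'.
        require
            other /= Void
        do
            Result := x * other.x + y * other.y
        end
end
```

A client holding three vectors a, b and c can then write (a + b) * c instead of chaining named feature calls.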
Actually I would like to write LaTeX-like code such as `y := { -b +- sqrt { b^2 - 4 a c} } / {2a}` or `A_x := int_0^1 f_m(x)dx`, in a way similar to what ASCIIMathML does for HTML pages. But this is currently a dream.

JavaScript backend?

Did anyone see me writing about an Eiffel-to-JavaScript compiler?
Well indeed I was pondering about using Liberty as a language for the Web 2.0, AJAX, anything having a JavaScript interpreter.
Is there any recent appliance that does not have a JavaScript interpreter built-in these days?
We cannot afford to be cut out of this!
We need a JavaScript backend, so that Liberty actually becomes an Eiffel-to-Javascript compiler!
Yet we have to deal with resource limitations, most notably a chronic lack of time.
Here enters Emscripten, the LLVM-to-JavaScript compiler.
As written elsewhere in this blog we were already targeting LLVM, so Emscripten is really a blessing from heaven above: it compiles LLVM bytecode to JavaScript. Bingo!
Compiling to LLVM will automagically make Liberty an Eiffel-to-Javascript compiler.
You may get the source code from https://github.com/kripken/emscripten/wiki and read about it at http://syntensity.blogspot.com/2011/04/emscripten-10.html (in http://syntensity.blogspot.com/ ), a blog continued at http://mozakai.blogspot.com/

Actually there are some enjoyable considerations about how such a tool shall be called:
  • Q. Is this really a compiler? Isn't it better described as a translator?
    A. Well, a compiler is usually defined as a program that transforms source code written in one programming language into another, which is what Emscripten does. A translator is a more specific term that is usually used for compilation between high-level languages, which isn't exactly applicable. On the other hand a decompiler is something that translates a low-level language to a higher-level one, so that might technically be a valid description, but it sounds odd since we aren't going back to the original language we compiled from (C/C++, most likely) but into something else (JavaScript).

Friday, January 20, 2012

I'm almost a literate program, please format me with a proportional font

Many long-time Eiffel programmers and experts always knew it: Eiffel should be formatted with a proportional font. Looking for the original reasoning of Bertrand Meyer about it I found this link in Wikipedia:
Proportional font: Unlike most programming languages, Eiffel is not normally displayed in a monospaced typeface. The recommended display style is to use a proportional typeface. Keywords are displayed in bold, user-defined identifiers and constants are displayed in italics. Standard upright (roman) style is used for comments, operators, and punctuation marks. This is essentially the same concept as syntax coloring which has become very popular in IDEs for many languages, except that Eiffel's recommendations also extend to print media.
I tried to follow these rules to write a little style for SyntaxHighlighter (I'll put it online as soon as possible to properly format our sources here). I would like to know your opinion about it; also please feel free to participate!
My personal opinion is that this stylistic rule is deeply yet unconsciously linked to Literate Programming, since valid Eiffel source code often resembles a natural language like English, at least much more than C, C++, Python, Java and so on.
Actually an Eiffel program is not "written as an uninterrupted exposition of logic in an ordinary human language", yet the source code of an Eiffel class, with its emphasis on documentation, its preconditions, its postconditions and its invariant, strikingly resembles an explanation in a natural language, such as English, of the design of the type it represents.
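As a small illustration of that resemblance, here is a sketch of a class whose contracts can almost be read aloud. BANK_ACCOUNT and its features are hypothetical names made up for this post, written in the SmartEiffel-style syntax used elsewhere on this blog.

```eiffel
class BANK_ACCOUNT
    -- Hypothetical example class: an account that can never be overdrawn.
creation make

feature {ANY}
    balance: INTEGER
            -- Money currently available, in cents.

    make is
            -- Open an empty account.
        do
            balance := 0
        ensure
            empty: balance = 0
        end

    deposit (an_amount: INTEGER) is
            -- Add `an_amount' cents to the account.
        require
            positive_amount: an_amount > 0
        do
            balance := balance + an_amount
        ensure
            balance_increased: balance = old balance + an_amount
        end

    withdraw (an_amount: INTEGER) is
            -- Take `an_amount' cents from the account.
        require
            positive_amount: an_amount > 0
            enough_money: an_amount <= balance
        do
            balance := balance - an_amount
        ensure
            balance_decreased: balance = old balance - an_amount
        end

invariant
    never_overdrawn: balance >= 0
end
```

Read top to bottom, the labels alone (positive_amount, enough_money, never_overdrawn) already tell most of the story of the type.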
Eiffel source code looks so much like the pseudocode typically used to explain algorithms when teaching computer science that, when Bertrand Meyer wrote Object Oriented Software Construction, he used Eiffel from page one without telling his readers, pretending it was pseudocode. Throughout the book he justified each and every design choice; then in the epilogue he revealed that the pseudocode wasn't pseudocode at all but real, effective Eiffel code.

Eiffel as seen by Ada programmers

Some weeks ago I stumbled upon this entry of the Ada FAQ: Isn't Ada less "elegant" than Eiffel? (also readable as a thread in the Lisaac mailing list).
Now that I think I have finally found an actual way to re-implement deep_twin and is_deep_equal with a pure Eiffel, reentrant, thread-safe design (beware that I still have to test it extensively before saying it's good enough even to be labelled "alpha-quality"), I must answer those issues, since most of them don't hold anymore.
In particular, although I like the assertion stuff in Eiffel, I think the language has a number of "inelegant" aspects. For example:
  1. exception handlers only at the top level of a routine, with the only way to "handle" an exception being by retrying the whole routine.

    As far as I remember this has been a deliberate design choice of Meyer. The rationale behind it is that if a feature, either a command or a query, is longer than 20-30 lines it becomes far more difficult to understand; it becomes easier for the reader if it is broken into several independent pieces. Therefore the need for intra-feature exception handlers fades away: when you need to handle an exception in a specific section of a feature, this is the distinguishing mark that that piece could be made an independent feature. If you don't want to expose it to your clients, just hide it behind a "feature {} -- Implementation of foo feature".

  2. No way to return from a routine in the middle. This makes it a pain in the neck to search through a list for something in a loop, and then return immediately when you find what you want. (I have never found the addition of extra boolean control variable a help to the understanding of an algorithm.)

    This is actually something I sometimes find myself longing for.

  3. Namespace control handled by a separate sublanguage, and no real higher level concept of "module" or "subsystem."

  4. An obscure notation like "!!" being used for an important and frequent operation (construction).

    This was a good point. In fact the "!!" notation has been (somehow) deprecated; while it has never been phased out of the language itself, I have never read it in code written in this century. Nowadays all Eiffel programmers worth their salt use one of the following syntaxes:

    • "foo: LINKED_LIST[STRING] .... create foo", used when the created object is an instance of exactly the declared type, with the "default_create" feature initializing it;
    • "foo: SET[STRING] .... create foo.make_with_capacity(120)", the most widespread usage, where you tell the compiler which creation feature shall be used to initialize the newly created object;
    • "foo: COLLECTION[STRING] .... create {TWO_WAY_LINKED_LIST[STRING]} foo", used when you want to create an object of a subtype of the declared type, either because the declared type is deferred (a.k.a. virtual for you people coming from the C++/Java world) or because you want to use a subclass that better fits the task;
    • manifest notation like "foo := {TWO_WAY_LINKED_LIST[STRING] << "Some", "strings", "in", "a collection">> }" or "{RING_ARRAY[STRING] 1, << "These", "strings", "live in", "a collection", "which is actually a RING_ARRAY">> }"; you may read further examples in the MANIFEST_NOTATION tutorial;
    • in-place creation, in the middle of the arguments of a feature: "produce_and_ship_with(12, "Chocolate cakes", create {CAR}.with_plate_and_label(clients_car_plate, "The car of client "|client_name))"; by the way this line shows the usage of one of the three concatenation operators we introduced in strings. I should really write a blog entry about these... (they are infix "+", infix "|" and infix "&"; see ABSTRACT_STRING for a brief description).
    I think it may be useful to make the compiler emit a warning when the "!!" syntax is still used. Perhaps we may be daring enough to just drop it, freeing the exclamation point for free operators: it is a good thing that people learning Eiffel are not exposed to that "archaeological" syntax.

  5. No way to conveniently "use" another abstraction without inheriting from it.

    This is exactly what both Eiffel standards (ECMA and GNU/Smart) implemented in this decade: non-conforming inheritance; now when you need to use another abstraction without being a type conforming to it you "insert" it.

  6. No strong distinctions between integer types used for array indexing.

    I think this is a non-issue, since array access does not have any special status in Eiffel; actually you may just define your arrays to accept different types as indexes. I suspect that this

  7. Using the same operator ":=" for both (aliasing) pointer assignment, and for value assignment, depending on whether the type is "expanded." (Simula's solution was far preferable, IMHO).

    This was actually confusing some years ago, since you could decide in the definition of each feature whether an argument should be expanded or not. Now in GNU/SmartEiffel this is not confusing, in my humble opinion, since the difference between assignment by reference and by value (expanded values) is established class by class; when you need a reference to an expanded class you just use REFERENCE[FOO], with FOO being an expanded class. By the way, you may read the definition of REFERENCE and discover that it is a simple, plain and normal generic class that does not require any special support from the compiler at all. I think that deciding once and for all whether a class is a reference or an expanded value makes code far easier for the reader to understand, easier for the designer to conceive and easier for the compiler to parse and optimize.


    And most critically:

  8. No separate interface for an abstraction. You can view a interface by running a tool, but this misses completely the importance of having a physical module that represents the interface, and acts as a contract between the specifier or user of an abstraction and its implementor. In Eiffel, one might not even be truly aware when one is changing the interface to an abstraction, because there is no particular physical separation between interface and implementation.
    Again this is a deliberate consequence of an explicit design choice; it has been extensively explained by Meyer in Object Oriented Software Construction. This would be a nice topic for a future blog entry, as it's a sin not to explain it to people on the net. I wish Meyer published OOSC directly on the net, as I'm sure that the profits from the advertising he would put there would more than compensate for his lost revenues.
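Going back to point 1 of the list, the idea of moving the exception-prone step into its own small, hidden feature can be sketched as follows. This is only an illustration under my own assumptions: RETRYING_READER, risky_read and max_attempts are hypothetical names, and risky_read is an empty placeholder for whatever operation may actually fail.

```eiffel
class RETRYING_READER
    -- Hypothetical example class, not part of any library.

feature {ANY}
    load is
            -- Short, client-visible command: the risky step lives in
            -- its own feature, so no intra-feature handler is needed.
        do
            read_with_retry
        end

feature {} -- Implementation of load
    max_attempts: INTEGER is 3

    read_with_retry is
            -- Run `risky_read', retrying this whole (small) routine on failure.
        local
            attempts: INTEGER
        do
            -- Locals are not reset by `retry', so the counter survives.
            attempts := attempts + 1
            risky_read
        rescue
            if attempts < max_attempts then
                retry
            end
            -- Otherwise the exception propagates to the caller of `load'.
        end

    risky_read is
            -- Placeholder for an operation that may raise an exception.
        do
        end
end
```

Since the retried routine is only a few lines long, "retrying the whole routine" costs nothing in readability, which is exactly the rationale recalled in the reply to point 1.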