Friday, September 17, 2010

I was told Eiffel wasn't like C++...

Many Eiffel enthusiasts - I shall put myself in the group - often write about the many pros of our beloved language when compared to C++.
One of those aspects that I was sure Eiffel handled better than C++ was templates, known as generics in Eiffel.
C++ templates has been often accused to lead to code bloat and large executable, since the compiler generates specialized code for each type used in a template.
As an example when you use QList as QList<int>, QList<MyClass*>, QList<AnotherClass*> you will end up having three specialized copies of each and every function defined in QList, a class that has more than seventy functions. 
This justify why C++ executables are often considered "fat".
I don't know why but I have been always convinced that generics in Eiffel didn't show this pattern.
SmartEiffel proved me wrong.
I suspected it compiling my own wrappers-generator and getting  a 5,2 Mb executable from more or less 4450 lines of code as reported by command "wc".
Ehi! This is more than 1200 bytes of binary code for each and every Eiffel line, empty lines and comments included!
No, my coding style can't be so awesome and this tool is nothing special.
It just shall not be this big.
So I dived into generated source code.
At the beginning of wrappers_generator.id which contains a description of the compiled classes from the C and Eiffel compiler point of view I can read something like:

478 "HASHED_DICTIONARY[XML_DTD_ATTRIBUTE,UNICODE_STRING]" l
#
555 "HASHED_DICTIONARY[COMPOSED_NODE,UNICODE_STRING]" l
#
404 "HASHED_DICTIONARY[WEAK_REFERENCE[ANY_LINKED_LIST_NODE],STRING]" l
#
467 "HASHED_DICTIONARY[RECYCLING_POOL[PROTOCOL],STRING]" l
#
365 "HASHED_DICTIONARY[POINTER,STRING]" l


So I looked into wrappers_generator_T478.c wrappers_generator_T555.c  wrappers_generator_T467.c wrappers_generator_T365.c which are the C code containing the translations for those.
Please note that with the exception of POINTER, each and every classes referred in those incarnations of HASHED_DICTIONARY are reference classes that gets converted into a type-less C pointer ("void*" for the C-fond).
Now I looked into the function called Txxxcreate_with_capacity ... I wasn't too surprised to find a couple of pages of machine-generated C code that is exactly the same as all thos Txxx pointers are actually void pointers.
So in the end SmartEiffel actually implements generics in the very same (bloated?) way as C++ implements templates.
Now I guess that people smarter than me have made many researches on the topic but let me wonder whenever Liberty may avoid this.
And I know it is avoidable for reference classes.
Think a little about GLib collections and how they implemented generic object-oriented containers in C. They do not have the compiler to generate templates or generics for them, so they will end up with exactly one "binary" implementation of "replace" (g_hash_table_replace) for every kind of hash table.
That is how I would like to implement generic classes.
The only tricky part is that when you will invoke feature "foo" of the parametric type (ITEM in class LIST[ITEM]) you need to query the runtime for the address of ITEM_foo, so you will need a full fledged object-oriented type system.
I'm thinking about eventual implementation of multiple-inheritance runtime. I'm investigating a variant of Gobject interfaces. In fact if we turn each and every attribute query into an actual function call multiple inheritance looks somehow like using multiple interface; this will have a deep impact of performance since every access to any structure field will be turned into a deferred/virtual/indirect function call; actually I suspect that the link-time function in-lining of LLVM may be the silver bullet. Otherwise it won't be feasible, except if we want to "degradate" Liberty to an interpreted language.
A little ending note: when dealing with "expanded" classes, or with objects passed by value "template instantiation" is the only feasible way to go, so C++ was right.
Luckily (Smart)Eiffel prescribe that reference classes are always passed by reference and expanded always by value. This greatly simplify the work for the compiler to translate source code but most importantly greatly simplify things for our neurons.
Please feel free to correct any wrong deduction of this little note.

8 comments:

  1. EiffelStudio only generate one version for all generic derivation which are references, and for optimization purpose one generic derivation for each expanded actual generic parameter.

    ReplyDelete
  2. That's the most desirable implementation. Thanks for the hint.

    ReplyDelete
  3. Mmmm. Not sure about that. Always the same trade-off: size vs. performance.

    Note that a unique implementation for all reference classes means that there is an "if" in each and every call site (or a VFT, whatever, it's not a direct call). Does llvm correctly optimize such cases?

    Anyway, does it matter that much? (SmartEiffel people say yes, but I have no hard figures)

    Also note, currently the Liberty internals produce a distinct type for each generic class derivation, so it is ready for SmartEiffel-like compilation.

    ReplyDelete
  4. PS- founding articles on the subject:

    http://www.loria.fr/~colnet/publis/jmlc97.ps.gz

    http://www.loria.fr/~colnet/publis/oopsla97.ps.gz

    http://www.loria.fr/~colnet/publis/ismm98.ps.gz

    ReplyDelete
  5. As far as I can say LLVM can inline functions at link-time, i.e. just before runtime.
    I once got some little C prototype that using C weak linking directive allowed to redefine a function. So each class, also generics may be compiled separately thinking it will use "deferred calls", i.e. pointer to functions or Vtables while each and every object linked later on can redefine it handling various implementations of deferred or redefined calls. This include the final executable which is able to use the same static analysis approach of SmartEiffel.
    I would try to follow GHashTable approach: the compiler would pick all the features of parametric type used in the generic class and build a "inner", low level creation function having as parameters the pointers to such features, like GHashTable ask for pointers to functions used to compute the hash and to compare items. Making those creation functions weak linkable symbols we allow further libraries to provide more specialized version of those features, removing the usage of pointers to functions.

    ReplyDelete
  6. This days i'm working with project based on micro kernels so C++ is essential one for me

    ReplyDelete
  7. No any templates - compiling of simple "Hello world" in EiffelStudio. Generated .exe file (Windows) is 24Mb!!! No any generics :) How to "strip" this exe?

    ReplyDelete