Friday, September 17, 2010

I was told Eiffel wasn't like C++...

Many Eiffel enthusiasts - I shall put myself in the group - often write about the many pros of our beloved language when compared to C++.
One of those aspects that I was sure Eiffel handled better than C++ was templates, known as generics in Eiffel.
C++ templates has been often accused to lead to code bloat and large executable, since the compiler generates specialized code for each type used in a template.
As an example when you use QList as QList<int>, QList<MyClass*>, QList<AnotherClass*> you will end up having three specialized copies of each and every function defined in QList, a class that has more than seventy functions. 
This justify why C++ executables are often considered "fat".
I don't know why but I have been always convinced that generics in Eiffel didn't show this pattern.
SmartEiffel proved me wrong.
I suspected it compiling my own wrappers-generator and getting  a 5,2 Mb executable from more or less 4450 lines of code as reported by command "wc".
Ehi! This is more than 1200 bytes of binary code for each and every Eiffel line, empty lines and comments included!
No, my coding style can't be so awesome and this tool is nothing special.
It just shall not be this big.
So I dived into generated source code.
At the beginning of wrappers_generator.id which contains a description of the compiled classes from the C and Eiffel compiler point of view I can read something like:

478 "HASHED_DICTIONARY[XML_DTD_ATTRIBUTE,UNICODE_STRING]" l
#
555 "HASHED_DICTIONARY[COMPOSED_NODE,UNICODE_STRING]" l
#
404 "HASHED_DICTIONARY[WEAK_REFERENCE[ANY_LINKED_LIST_NODE],STRING]" l
#
467 "HASHED_DICTIONARY[RECYCLING_POOL[PROTOCOL],STRING]" l
#
365 "HASHED_DICTIONARY[POINTER,STRING]" l


So I looked into wrappers_generator_T478.c wrappers_generator_T555.c  wrappers_generator_T467.c wrappers_generator_T365.c which are the C code containing the translations for those.
Please note that with the exception of POINTER, each and every classes referred in those incarnations of HASHED_DICTIONARY are reference classes that gets converted into a type-less C pointer ("void*" for the C-fond).
Now I looked into the function called Txxxcreate_with_capacity ... I wasn't too surprised to find a couple of pages of machine-generated C code that is exactly the same as all thos Txxx pointers are actually void pointers.
So in the end SmartEiffel actually implements generics in the very same (bloated?) way as C++ implements templates.
Now I guess that people smarter than me have made many researches on the topic but let me wonder whenever Liberty may avoid this.
And I know it is avoidable for reference classes.
Think a little about GLib collections and how they implemented generic object-oriented containers in C. They do not have the compiler to generate templates or generics for them, so they will end up with exactly one "binary" implementation of "replace" (g_hash_table_replace) for every kind of hash table.
That is how I would like to implement generic classes.
The only tricky part is that when you will invoke feature "foo" of the parametric type (ITEM in class LIST[ITEM]) you need to query the runtime for the address of ITEM_foo, so you will need a full fledged object-oriented type system.
I'm thinking about eventual implementation of multiple-inheritance runtime. I'm investigating a variant of Gobject interfaces. In fact if we turn each and every attribute query into an actual function call multiple inheritance looks somehow like using multiple interface; this will have a deep impact of performance since every access to any structure field will be turned into a deferred/virtual/indirect function call; actually I suspect that the link-time function in-lining of LLVM may be the silver bullet. Otherwise it won't be feasible, except if we want to "degradate" Liberty to an interpreted language.
A little ending note: when dealing with "expanded" classes, or with objects passed by value "template instantiation" is the only feasible way to go, so C++ was right.
Luckily (Smart)Eiffel prescribe that reference classes are always passed by reference and expanded always by value. This greatly simplify the work for the compiler to translate source code but most importantly greatly simplify things for our neurons.
Please feel free to correct any wrong deduction of this little note.