1 Introduction
In the early days, Operating Systems lacked many services or features needed by applications. Instead of focusing on the core business, an application programmer had to provide everything from graphical interfaces to multitasking, networking, etc.
As Operating Systems became more powerful and complete, applications tended to trust many of their basic functions to them. It is unlikely that a new application developed today would implement a communication or graphical framework of its own.
When Smalltalk was first introduced, Operating Systems were small, the set of tools they provided were rather small, every application built had to do everything by itself. Operating Systems today are very powerful and have a rich set of features to offer. Modern applications delegate basic functionality to the lower layers. Developers concentrate their efforts on the core business rather than wasting precious time on peripheral functionality.
Huemul is a modern Smalltalk. It relays entirely on the Operating System (and hardware) to implement the basic behavior. Huemul has wrappers to that functionality in order to adapt them and provide the behavior that a Smalltalk user would expect, but, since those wrappers are a small layer of code, Huemul is very fast.
This concept is repeated everywhere in Huemul. The first example of this principle is reflected in the compilation process. Instead of implementing a new virtual machine, Huemul uses the fastest possible virtual machine for an architecture, the very self CPU.
Another major approach to the concept of re usability of the Operating System is the graphical user interface (GUI). Instead of drawing the GUI by itself, Huemul relies on existing technologies or frameworks like GTK [1]. This gives the possibility to evolve with the framework, instead of having to make the framework evolve.
In the next chapters, it is described a thorough explanation of how the re usability paradigm is applied in Huemul.
2 Virtual Machine
Smalltalk has taken off because it is a decent programming language and because it is implemented by compiling to bytecodes. However, interpreting bytecodes makes a Smalltalk program several times slower than comparable C or C++ programs [13]. In Smalltalk, the execution phase is handled by a classic Virtual Machine (VM) that reads in and executes the bytecodes. Think of the VM as a simulator for a machine whose instruction set is Smalltalk bytecodes.
Using an interpreter (simulator) adds quite a bit of execution overhead. A common solution for high-performance Virtual Machines is to use dynamic translation or just-in-time (JIT) compilers. In that case, the runtime system will notice a method has been called enough times to make it worthwhile to generate machine code for that method on the fly. Future calls to the method will execute the machine code directly. This can provide substantial speed-up, but it is still slower than C or C++.
There are two main problems with the JIT approach compared to conventional compilers: the startup time and poor optimization [14].
The compilation is done every time the application is executed, which increases start-up times substantially. It takes time to compile a method, especially if you want to do any optimization. If you decide to compile only the methods most often executed, then you have the overhead of measuring those.
The JIT compiler has to run fast, and therefore cannot do any substantial optimization. This is specially true if it is expected a better version of the method, each time that it is run. This leads to an increasingly potential bottleneck.
Huemul uses the CPU as its Virtual Machine. Instead of making an early compilation to bytecodes, Huemul's early compilation is done directly to machine code. In this way a first approach to speed is gained, and no startup time is wasted, methods in Huemul are run fast from the very first execution. Huemul could still benefit from most of the techniques available to speed up dynamic object oriented languages, like PIC and automatic recompilation and inlining of code.
2.1 Object Memory
Huemul inherits much of its functionality from Squeak [2]. One of the main bottlenecks of Squeak is it's Object Memory. It is fairly complex to build a good just in time compiler because most of the CPU time is spend in circumventing this complexity [10]. Huemul's Object Memory, on the other hand is designed with simplicity in mind.
Every Object (except small integers) in Huemul has the same header:
|
Class |
||
|
Size in bytes |
||
|
Information |
||
|
First instance variable or unnamed element |
|
|
|
Second instance variable or unnamed element |
|
|
|
... |
|
|
So every object in Huemul has an overhead of 3 words for every object. It is a fairly big overhead in terms of size, but it is a big advantage in terms of execution speed. Class, size or information access is done in a very straightforward way, and this allows an easy inlining of most of the basic functionality of the Smalltalk language.
2.2 Tagged integers
Smalltalk paradigm instructs that everything is an object. But, as shown in the Blue Book [9], a simple optimization for size is to encapsulate small integers. With this mechanism, we avoid using a header for Small Integers, and doing so, we gain a lot of space, since Small Integers are used a lot in a Smalltalk system. A drawback of this approach is that we have two methods of obtaining an object's class, first we have to examine if it is a Small Integer, in case if it is not a Small Integer, then we must lookup the object's class.
The encapsulation of Small Integers as defined in the Blue Book, is done by using the least significant bit of the word as a signal that tells if this word is a Small Integer or if it is a pointer. If it is a Small Integer, the signal is a 1, if it is a pointer, the signal is a 0.
|
Small Integer |
1 |
|
Object pointer |
0 |
If implemented in this way, accessing an object pointer is a direct reference, as long as every object is aligned. But the problem arises with Small Integers, as we have to detag, do normal arithmetic and retag again. Optimizers like Exupery [10] make a good job circumventing this problem.
But there is another alternative, called 0 tagged integers as opposed to what is explained before. With this approach, Small Integers get the 0 flag, and object pointers get the 1 flag.
|
Small Integer |
0 |
|
Object pointer |
1 |
This time we can not access a pointer directly, because it is pointing to the next byte of where the object actually is. This is not really a problem, because in CPU's it is the same (in terms of speed) to do:
load the contents of pointer X
and:
load the contents of pointer X – 1.
because, load the contents of pointer X is in fact, load the contents of pointer X + 0.
But the real gaining in speed is obtained by the 0 tagged integer, because we don't need to detag for most of the frequently used arithmetic. The most common operation is Small Integer addition, and this can be done without the detagging/tagging mechanism.
Huemul uses 0 tagged integers instead of 1 tagged integers. Changing from 1 tagged to 0 tagged lead to substantial improvement in execution speed, and it also lead to better inlining and optimization.
2.3 Processes
Smalltalk units of execution are defined by instances of the Process class. This Processes are implemented mostly as application threads. This means that the Virtual Machine is in charge of the life cycle of the processes. So every Smalltalk VM has embedded a scheduler to run this mechanism. The problem here is that poorly written schedulers don't scale [15]. For a simple desktop application this is not a problem, but for web based, server applications with thousands of clients connected, it rapidly becomes a problem. There is still another drawback, multi core CPU's are becoming a common default, with large scaling systems, schedulers are stressed a lot [16].
Unix pthreads are a standard multi thread framework. It is a simple framework, scales well and it is fast [17]. Huemul uses pthreads as the back end for the Process class. In this way, Huemul let the Operating System do the magic. Every instance of the Process call generates a UNIX pthread. The Operating System take control of the execution and scheduling of the threads, so there is no scheduler in Huemul. A direct advantage of this behavior is that modern multi CPUs, multi core systems are automatically and efficiently used by Huemul.
2.4 Contexts
Every process in Smalltalk is further divided in smaller activation records called contexts. This contexts keep information about the execution of a chunk of code. When a new method is invoked, a new context is created (or a block's context is activated). This context has references for the temporaries, arguments, stack linkage, and every piece of information needed to run the method. Huemul doesn't use contexts. At least not the formal Smalltalk specification of contexts. Huemul uses the standard C++ Calling Convention, which is easy to implement, fast and standardized. Huemul methods could be viewed by standard debuggers like gdb, and it is easy to mix Huemul code with C code, and even call linked libraries.
Activation record example
Consider the following piece of Smalltalk code:
^ x foo: y
It is translated to this (uncached):
push y
push x
mov #foo:, %eax
call getMethodIP
call *%eax
add $12,%esp
The function getMethodIP return the IP of the starting address for the method in %eax, and the last pushed value (the receiver). If the IP is cached, it is a very fast function. If it is not cached, it has to search through the classes the normal and slow way. The function getMethodIP always returns a valid IP, either for the find method or for #doesNotUnderstand.
When we reach #foo: we will normally do:
push %ebp
mov %esp, %ebp
sub $[numTemps+4], %esp
....
leave
ret
Which is the same peace of code that a normal C compiler would add. So, with this calling convention, variables are accessed the same way as normal C code would, which is as a displacement from the %ebp register.
Huemul Blocks are implemented with the standard setjmp/longjmp C functions. Whenever a peace of code encounters a returning code, it issues a setjmp in the calling method. This helps the remote returning block to return home by issuing a longjmp. This setjmp/longjmp mechanism is also used in the unwinding procedure, for exceptions and ensured blocks.
3 Compilation Process
Huemul behaves like a normal Smalltalk system in terms of the development process. It has an internal compiler that translates to it's native code. The difference is that the native code of Huemul is CPU native code instead of bytecodes. When a method is accepted it has to be compiled to become available for execution.
The compilation of code is executed in phases. The first phase is the parsing of the sourcecode. It is handled by the SmaCC [4] . It analyses the sourcecode, check for syntax errors, and if everything is O.K. it produces a tree that follows the grammar of the Smalltalk language.
Then the Refactory Browser [5] is used to rebuild the tree and convert it to a suitable source for Exupery [Exupery ]. This phase does code optimization by inlining common expressions like #ifTrue:ifFalse, and basic arithmetic and comparison of small integers. The original Exupery translates code directly from bytecodes, and it does not use Squeak's New Compiler [3].
The last phase is handled by Exupery, it optimizes and produces x86 machine code. Right now Exupery generates just x86 machine code, but it is prepared to be extended to any other hardware architecture.
Apart from normal information, such as the number of temporaries or arguments, literals, and the actual bytecodes (x86 machine instructions), compiled method holds relocation and debug info. Relocation info is used to allow method to point to globals and standard functions directly. Debug info is used to implement the internal debugger.
4 Traits
Traits [11] are a simple composition mechanism for structuring object-oriented programs. A Trait is essentially a parameterized set of methods; it serves as a behavioral building block for classes and is the primitive unit of code reuse. With Traits, classes are still organized in a single inheritance hierarchy, but they can make use of Traits to specify the incremental difference in behavior with respect to their superclasses. Unlike mixins and multiple inheritance, Traits do not employ inheritance as the composition operator. Instead, Trait composition is based on a set of composition operators that are complementary to single inheritance and result in better composition properties.
There are two major approaches of implementation, static and dynamic. With the first approach, Trait compositions are flattened at compile time, and Trait methods appear as normal methods to the classes that use them. With the dynamic approach, there is no physical inclusion of the Trait at compile time, the binding of the method is done at runtime. While Squeak uses static Traits because it doesn't require any change in Squeak's Virtual Machine, Huemul uses the dynamic approach. It doesn't have the complex flattening mechanism required by the static approach, instead it uses a simple and cached extended search mechanism, that binds the method at run time.
5 Libraries
Huemul is designed to be easy to connect to standard UNIX libraries. Huemul has a wrapper around UNIX libraries dynamic loader. I also has a set of classes to access standard C structures and data types. With this features calling a normal C function is as easy as:
(LibC default functionNamed: 'fread') invokeAnswering: returnOop with: destination with: (ExternalAddress fromInteger: 1)
Which is the Smalltalk equivalent of the following C code:
returnOop = fread( destination, 1 );
Everything is done at runtime, no need to recompile anything.
6 Graphical User Interface
Huemul does not have a self implemented graphics interface. It includes a GTK wrapper ported from the GTK Wrapper [8].
By using an existing GUI package, all of its features come for free to the hand of the Smalltalk programmer [18]. Many useful features like localization, theming, extensibility, etc. are standard.
Huemul has built in a set of tools that resemble the original ones. But they are build with the GTK framework. In this way, Huemul implements an intuitive user interface that is easy to adopt for the experienced user and the novice.
7 The system now
Huemul system is pretty stable right now, but it still needs Squeak as the external development environment. Development effort is focused in making Huemul standalone. This task includes making Huemul standalone, evolve the tools, and develop a good image footprint and a snapshot mechanism.
Huemul still lacks of a garbage collector (GC) . Different GC strategies are still being evaluated. The mostly parallel garbage collector [12] seems to be the best choice.
8 Future Work
Many new technologies are still being considered. Future releases may include these technologies
Namespaces. It would be desirable to implement some kind of name demangling and resolution for class names, but it should also be implemented in all kind of global variables. This would lead to a better scalability of the language, adding a big amount of clarity to the design
Better Compiler. There are two directions being studied for the compiler. The first one, to evolve the compiler using Exupery, the second one, by developing a front end for gcc [7]. Current compiler uses Exupery. It is a fairly simple compiler, but not very powerful. The main advantage with this approach is that everything is done in Smalltalk. Developing a gcc front end is a very difficult task in itself, but gcc is a very good compiler, and is very well tested too. The problem with gcc is that it would be used as an external tool, which is not a very “Smalltalk” approach.
Dynamic Module Loading. Smalltalk environments tend to become very large and complex. It would be desirable to have some sort of packaging, that packages could be loaded and unloaded at runtime, and be easily distributable or shared with other images.
Multiuser and ACL Security. Smalltalk historically lacked from a uniform multi user design. An image is mainly a mono user program that could be extended to provide some kind of authentication to the final user of a Smalltalk application. But there are not object or class protection embedded in the image. Some sort of security at the object level would provide the protection needed to provide a multi user environment at the language level.
Web Awareness. Seaside [6] and other frameworks proved that Smalltalk can also be used in the Web. Huemul is in the process of implementing network connectivity, but that is not enough. The idea is to provide a solid, fast and coherence infrastructure for those frameworks.
9 Acknowledgments
Thanks to Klaus D. Witzel and Sebastian Sastre for proof reading a draft version of this paper.
10 References
[3] http://smallwiki.unibe.ch/NewCompiler
[4] http://www.refactory.com/Software/SmaCC/index.html
[5] http://www.refactory.com/RefactoringBrowser/index.html
[6] http://www.seaside.st/
[7] http://gcc.gnu.org/
[8] http://squeakgtk.pbwiki.com/
[9] Adele Goldberg and David Robson, Smalltalk-80: The Language and Its Implementation, 1983.
[10] Bryce Kampjes, “Exupery – A Native Code Compiler for Squeak”, http://goran.krampe.se/exuperyDesign.pdf
[11] Lienhard, A. 2004. “Bootstrapping Traits”. M.S. thesis, University of Bern.
[12] Hans-J. Boehm, Alan J. Demers, Scott Shenker, 1991 “Mostly parallel garbage collectors”
[13] Craig Chambers, David Ungar. Customization: Optimizing compiler technology for SELF, a dynamically-typed object-oriented programming language. In ACM SIGPLAN '89 Conference on Programming Language Dessign and implementation, pages 146-160, June 1989.
[14] Matthew Arnold, Adam Welc, V. T. Rajan, Improving virtual machine performance using a cross-run profile repository, Proceedings of the 20th annual ACM SIGPLAN conference on Object oriented programming systems languages and applications, October 2005.
[15] A. Borg. Avoiding blocking system calls in a user-level thread scheduler for shared memory multiprocessors. B.Sc. I.T. Final Year Project, Department of Computer Science and Artificial Intelligence, University of Malta, June 2001
[16] Saisanthosh Balakrishnan, Ravi Rajwar, Mike Upton, The Impact of Performance Asymmetry in Emerging Multicore Architectures, 32nd Annual International Symposium on Computer Architecture (ISCA'05), pages 506-517, 2005.
[17] Bradford Nichols, Dick Buttlar, Jacqueline Proulx Farrell, Pthreads Programming, A POSIX Standard for Better Multiprocessing, September 1996
[18] Michael Babcock, The Importance of the GUI in Cross Platform Development, Linux Journal, Volume 1998 , Issue 49es (May 1998) Article No. 5, 1998
