Wednesday, April 1, 2009

Separate Implementation and Interface

I've been chafing at the limitations of OO programming, particularly v-table style. Interfaces are one of the problem areas, because I want to re-use implementations.

I'm reading "The Paradoxical Success of Aspect-Oriented Programming" by Friedrich Steimann, and he talks about modularity. Parnas [1] defined modularity as restricting the number of programmers that have to know about some implementation [2]. That's right: not a textual or language issue. A person issue. This leads Steimann to talk about interfaces, and leads me to distinguish interface from implementation.

First, "class" should be the public interface (the API), and inheritance should be on the basis of interfaces only, because we want Liskov substitution [3]. The interface is how you can use an object, or, if you prefer, how it behaves. Typically, the interface is a list of method signatures. There is no implementation. Interestingly, some of the functional languages define class this way (e.g. Clean, and I think Haskell). Unfortunately, while some languages like Java can support this, it is a bit tedious, though much of Java's standard library is done this way. Even PHP has interfaces for things like the standard Array behavior.
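A minimal sketch of "class as interface only", written in Python purely for illustration. The names Stack, ListStack, and drain are hypothetical and not from any library; typing.Protocol is used here as one way to state a list of method signatures with no implementation behind it.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Stack(Protocol):
    """The interface: method signatures only, no data and no code."""

    def push(self, item: int) -> None: ...
    def pop(self) -> int: ...
    def __len__(self) -> int: ...


class ListStack:
    """One implementation. It does not inherit from Stack; it only conforms."""

    def __init__(self):
        self._items = []  # the data structure, private to this implementation

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def __len__(self):
        return len(self._items)


def drain(stack: Stack) -> list:
    """Client code that knows the interface and nothing else."""
    out = []
    while len(stack):
        out.append(stack.pop())
    return out
```

Any object with these three methods can be handed to drain, which is the substitution property the interface is supposed to buy. Note that the runtime isinstance check only confirms the methods exist, not that they behave correctly.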

This use of "interface" is different from the historic understanding related to modules, according to Steimann [4] ("class" being a restricted form of module). We should remember that this use is a shorthand for "class interface," or some such.

Second, an "implementation" is a data-structure plus procedures that provide the interface [5]. All this, I would say, in a module, since we want to localize the implementation and let other programmers be oblivious. You don't inherit from implementations, though you should be able to re-use them.

The StdC++Lib is built this way. If you haven't studied this library (particularly the original STL documents), you should. It is interesting that a language with poor support for the separation of interface/implementation should have the premier example. Note that part of the interface, e.g. complexity guarantees, is not expressible in C++.

Any algorithm that is polymorphic could be provided as a default implementation in the interface. For example, "map", "filter", and "foreach" can be written in terms of "fold" without caring about the actual data-structure. An implementation can specialize any of these algorithms for optimization, or whatever. Clean (and Haskell?) do this.
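As a rough sketch of that idea in Python (chosen only for illustration; Foldable, Tree, and ListBag are hypothetical names), the interface below requires just fold and supplies map, filter, and foreach as defaults written in terms of it. For simplicity the defaults return plain lists rather than rebuilding the original structure.

```python
from abc import ABC, abstractmethod


class Foldable(ABC):
    """Interface: an implementation only has to supply fold()."""

    @abstractmethod
    def fold(self, step, initial):
        """Left fold: step(...step(step(initial, x0), x1)..., xn)."""

    # Default implementations, unaware of the underlying data structure.
    def map(self, f):
        def step(acc, x):
            acc.append(f(x))
            return acc
        return self.fold(step, [])

    def filter(self, keep):
        def step(acc, x):
            if keep(x):
                acc.append(x)
            return acc
        return self.fold(step, [])

    def foreach(self, action):
        def step(acc, x):
            action(x)
            return acc
        self.fold(step, None)


class Tree(Foldable):
    """A binary tree; it gets map/filter/foreach for free."""

    def __init__(self, left, value, right):
        self.left, self.value, self.right = left, value, right

    def fold(self, step, initial):
        acc = initial
        if self.left is not None:
            acc = self.left.fold(step, acc)
        acc = step(acc, self.value)
        if self.right is not None:
            acc = self.right.fold(step, acc)
        return acc


class ListBag(Foldable):
    """A list-backed implementation that specializes map as an optimization."""

    def __init__(self, items):
        self.items = list(items)

    def fold(self, step, initial):
        acc = initial
        for x in self.items:
            acc = step(acc, x)
        return acc

    def map(self, f):
        return [f(x) for x in self.items]
```

Tree never mentions map or filter, yet both work on it in in-order traversal order; ListBag overrides only the one default it can do more cheaply.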

Why this distinction between implementation and interface? Because I can now do what we all want to do: re-use implementations. I believe that's what 90% of OO programmers do when they use inheritance. And they get frustrated when they want to re-use implementation, but it's not compatible with class-inheritance. It is now clear that they are two different things. This separation is more natural in a multi-method language, since methods don't have to belong to the class's module, and polymorphic methods are more common (and easier).

When I re-use implementation by common class-inheritance, I want the vast majority of the implementation to stay the same, and change a few bits of it. For example, you make Model objects by inheriting from ActiveRecord. And in one case, I needed to customize the PK behaviors.

Or, more radically, haven't you been tempted to use multiple inheritance to get the implementation of something (say a hash/dict) but to use it as extending some class? I know, "delegation" or some-such is the purported solution here, but it's tedious and annoying.
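To make the tedium concrete, here is a small Python sketch of the delegation approach (Inventory is a hypothetical class invented for this example). It re-uses a dict's implementation without inheriting from dict, at the cost of forwarding each re-used operation by hand.

```python
class Inventory:
    """Re-uses a dict's implementation without being a dict."""

    def __init__(self):
        self._counts = {}  # the re-used implementation, held privately

    # The tedious part: every re-used operation is forwarded explicitly.
    def __getitem__(self, name):
        return self._counts[name]

    def __contains__(self, name):
        return name in self._counts

    def __len__(self):
        return len(self._counts)

    # The only behavior that is actually new.
    def add(self, name, qty=1):
        self._counts[name] = self._counts.get(name, 0) + qty
```

Three forwarding methods for one new method; the ratio gets worse as more of the dict's behavior is wanted.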

Third, there is one more level of distinction. When an algorithm for a method is complex enough, you'll naturally factor it into helper functions. Someone re-using your implementation should be able to override relevant helper-functions. And that is dangerous, because there is no agreement that the implementation is made up of so-and-so helper-functions, interacting in such-and-such an order. Our assumption is that the implementation is changeable (e.g. RoR's ActiveRecord changed between versions).

We need an analog of "interface" for algorithms. Which means we need something like the idea of "module" for the method and its helpers, with a declaration about behavior and what can be overridden [6]. Steimann calls this the protocol, "the sequence in which procedures can be called." [7]

For example, imagine some data from a web-form for the fields of a Model object _and_ some of its related Model objects. Highly normalized Person and Address comes to mind. So, represented as a dictionary/tree you would have something like:

    Person =>
        first_name => "Bob"
        last_name => "Jones"
        Address =>
            street => "123 Main St."
            city => "New York"
            ...

I wrote an implementation to populate Model objects from this, handling foreign-keys for the relations, etc. Naturally, it's a bit more than 10 lines of code, and wants to be factored into a set of helper functions.

And, naturally, there is some warped Model object that wants to mess with the algorithm. So, I'd like to declare what pieces of this algorithm are available (and reliable) for overriding. We already do this, of course, in documentation. Which is fine, and is the evidence for this last distinction of "algorithm interface."

Documentation and code have a hard time staying in sync, so some language support would be nice (at least annotations). Something like annotations in the interface-method, saying "these helpers can be tweaked, they are used in this way." Essentially a description of the method. Forcing strategies (functional style) can help express this too [8].
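One way such an annotation might look, sketched in Python under stated assumptions: the overridable decorator, the Populator class, and its helpers are all hypothetical, and the check is a convention enforced at class-definition time, not a language feature. The idea is that the implementation declares which helpers are supported extension points and rejects overrides of anything else.

```python
def overridable(func):
    """Hypothetical annotation: marks a helper as a supported extension point."""
    func.__overridable__ = True
    return func


class Populator:
    """populate() is the interface method; its helpers form the 'protocol'."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        for name in vars(cls):
            if name.startswith("__"):
                continue  # ignore Python's own bookkeeping attributes
            inherited = getattr(Populator, name, None)
            if callable(inherited) and not getattr(inherited, "__overridable__", False):
                raise TypeError(f"{cls.__name__} may not override {name}()")

    def populate(self, data):
        # Fixed sequence: clean every field, then build. Not an extension point.
        cleaned = {k: self.clean_field(k, v) for k, v in data.items()}
        return self.build(cleaned)

    @overridable
    def clean_field(self, name, value):
        """Called once per field, before build()."""
        return value.strip()

    def build(self, cleaned):
        # Internal helper: free to change between versions.
        return dict(cleaned)


class UpperCaseNames(Populator):
    """A 'warped' model that tweaks only the declared helper."""

    def clean_field(self, name, value):
        value = super().clean_field(name, value)
        return value.upper() if name.endswith("_name") else value
```

A subclass that tries to redefine build or populate fails with a TypeError when the class is defined, so the documented list of overridable helpers and the code cannot silently drift apart.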

I would like to program more using these distinctions, but most languages make it gruesome (Java, I'm looking at you). What disciplines/strategies have you used in a language to gain the benefit of these distinctions?

[1] Go look it up. What? Am I your research assistant?
[2] So says Steimann. http://onward-conference.org/files/steimannessay.pdf
[3] Complexity could be another characteristic of the interface, or any other trait about behavior.
[4] Steimann [2], p. 8, footnotes 13 and 14. See "Gauthier & Pont," and, of course, Parnas.
[5] And, perhaps, like StdC++Lib, complexity descriptions.
[6] This all applies to the polymorphic methods that are the default implementations mentioned above.
[7] Steimann [2], p. 8, footnote 14.
[8] I read some blog entry on this, which I now can't find.

Sunday, March 15, 2009

OO Terminology, or V-Tables Are So 1980

What should be a focused idea will often lead me to some major (essential) digressions in fundamentals. For example, while writing notes for a small essay [1], I had to think hard about a couple of things, one of which was the idea of objects. I have spent some effort over the years trying to understand what OO really means, and thought I had a good handle on it [2].

The capability-security-model uses objects [3] (and is now properly known as the object-capability-model, aka ocaps), and the issue I was exploring led me into some difficulties. I have a prejudice for generic functions, and thus the separation of type and behavior [4]. Simula-like (e.g. Java) languages package the two together. I found it easy to think about ocap issues with Simula-like objects, but confusing with generic methods.

I wanted to use the correct terminology in my notes, and ultimately in any resulting writings. That would help me keep the ideas straight, and my readers wouldn't have to decode my private language. However, I believed that the current usage was terribly confused.

Specifically, the popular understanding of "class" is wrong, and makes it difficult to talk about generic functions. I managed to restrain my impulse to correct the wayward and impose my meanings. I tried to look it up.

Apparently, Armstrong, "The Quarks of Object-Oriented Development" [5], thinks OO means Simula-like objects, and wants "class" to mean type+methods. I believe this is a typical hierarchical approach to definitions, and I've found it more fruitful to consider orthogonal elements. Armstrong's apparent archetype is only one popular assemblage of the elements.

Rees, "JAR on Object Oriented" [6], takes the orthogonal, or Chinese-menu, approach to definition and slyly notes that the Simula-style OOP is impoverished, yet widely considered the definition [7].

On c2.com, there is a lovely concept to bring in focus the meaning of OO: "that an expression in a program, consisting of a function identifier and one or more arguments, can stand for multiple implementations." Similarly, "the basic idiom is to identify an object and then act upon it," [8] though that excludes multi-dispatch.
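A small Python illustration of the first quote, using the standard library's functools.singledispatch (the Circle, Rect, and area names are invented for this example). The expression area(shape) is one function identifier standing for several implementations, and those implementations live outside the types they act on. Note the limitation: singledispatch selects on the first argument only, so this is not the multi-dispatch that CLOS-style generic functions provide.

```python
import math
from dataclasses import dataclass
from functools import singledispatch


# Types: data only, no methods attached.
@dataclass
class Circle:
    radius: float


@dataclass
class Rect:
    width: float
    height: float


# Behavior: a generic function defined separately from the types.
@singledispatch
def area(shape):
    raise NotImplementedError(f"no area() for {type(shape).__name__}")


@area.register
def _(shape: Circle):
    return math.pi * shape.radius ** 2


@area.register
def _(shape: Rect):
    return shape.width * shape.height
```

New implementations of area can be registered from any module without touching Circle or Rect, which is the separation of type and behavior that a v-table style class bundles together.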

There seems to be a conflict between two camps: the v-table crowd (C++/Java), which tends to be reactionary ("that's not OO!"), and the crowd with wider experience (e.g. CLOS) that sees the v-table style as a special, and impoverished, case. In the professional community, the v-table crowd dominates to the point of exclusion.

So, while there is some support for the taxonomies I prefer (cf. "wider" above), nobody will understand me if I use them. I'll have to use the popular definitions and complicate the explanation with qualifications and exceptions. And, probably have side-bars to explain multi-methods, etc.

I haven't found a good entrée into the academic discourse, nor do I want to read pointless, hyperfocused papers on irrelevant minutiae.

[1] An essay on managing capability-security-model graphs when trying to share data: e.g. the file-system. Aka "Deep Attenuation."
[2] I thought I had identified all the orthogonal pieces of OO.
[3] Name change claimed by http://c2.com/cgi/wiki?ObjectCapabilityModel as of 2009.03.03
[4] Cf. the dualism of single-dispatch vs generics and monkey-patching...
[5] See http://wiki.gsi.de/pub/Personalpages/RootTips/RClasses_2.ppt, and http://portal.acm.org/citation.cfm?id=1113040.
[6] See http://mumble.net/~jar/articles/oo.html
[7] If you only know Simula-style OOP, you are hereby required to prefix all of your mentions of OO with "Simula-style" or "v-table hack style."
[8] The author of that statement intended it to contradict the previous quote, but I see them as equivalent: "coherent objects that interact .... The rest is stylistic elaboration." http://c2.com/cgi/wiki?DiscussAlternateObjectOrientedProgrammingView, as of 2009.02.15

References http://en.wikipedia.org/wiki/Object-oriented_programming