Why Object-Oriented Languages Need Tail Calls

This blog post is a mirror of a post on the Project Fortress Community Blog by Guy Steele. The original posting is no longer available. I retrieved this copy of the text from the Wayback Machine.

Because the snapshot that the Internet Archive took of the original post was made on December 6, 2009, the only comments that appear are those that were made leading up to that date.

November 30, 2009: Why Object-Oriented Languages Need Tail Calls

[Note: The original blog post used the term “tail-recursion”, but in response to feedback on December 1, I have replaced this term with “tail calls” to emphasize that the issue has to do with multiple objects calling each others’ methods, not just the limited case of an object invoking its own methods or the even more limited case of a single method invoking itself (whether on the same object or another object). Also, some minor errors in the code examples have been corrected as of December 3.]

We recommend an interesting essay from the recent ACM OOPSLA Conference:

William R. Cook. On understanding data abstraction, revisited. Onward! Essays, 2009 ACM OOPSLA. SIGPLAN Notices 44, 10 (Oct. 2009), 557-572. DOI=http://doi.acm.org/10.1145/1639949.1640133 (Or here is the draft version.)

Cook makes an important distinction between abstract data types and object-oriented programming, and argues that they have related but complementary characteristics.

In this blog post we extend one of his examples in order to make a completely different point: object-oriented programming languages need tail calls correctly implemented, not just as a “trivial and optional” optimization of the use of stack space that could be achieved by using iteration statements, but in order to preserve object-oriented abstractions.

Following Cook, we begin by considering an abstraction of “sets of integer values”. At a minimum, we would like to be able to (a) obtain an empty set, (b) adjoin an integer to a set to create a new set, (c) construct a set that is the union of two given sets, (d) inquire whether a set is empty, and (e) inquire whether a set contains a specified integer. We can describe this abstraction as a Fortress API:

api IntSetAPI

trait IntSet
  isEmpty: Boolean
  adjoin(x: ZZ): IntSet
  union(other: IntSet): IntSet
  contains(x: ZZ): Boolean
end

(*) Example: `Empty.adjoin(1).adjoin(2).union(Empty.adjoin(3)).contains(2)` should be `true`.

Empty: IntSet

end IntSetAPI

(To facilitate comparison to the examples in Cook’s paper and also to the way one might code these examples in the Java programming language, we use dotted methods throughout these examples. A more Fortress-like style would be to define operators for the “union” and “contains” operations:

trait IntSet   (*) Alternate interface definition #1
  isEmpty: Boolean
  adjoin(x: ZZ): IntSet
  opr UNION(self, other: IntSet): IntSet
  opr IN(x: ZZ, self): Boolean
end

(*) Example: `2 IN (Empty.adjoin(1).adjoin(2) UNION Empty.adjoin(3))` should be `true`.

Indeed, an even more Fortress-like style would be to eliminate the adjoin operation from the interface and replace it with a constructor of singleton sets:

trait IntSet   (*) Alternate interface definition #2
  isEmpty: Boolean
  opr UNION(self, other: IntSet): IntSet
  opr IN(x: ZZ, self): Boolean
end

opr {x: ZZ}: IntSet

(*) Example: `2 IN ({1} UNION {2} UNION {3})` should be `true`.

But we digress.)

One possible implementation for the original API is as follows:

component IntSetOO
export IntSetAPI

trait IntSet
  getter isEmpty(): Boolean = false
  adjoin(x: ZZ): IntSet = AdjoinObject(self, x)
  union(other: IntSet): IntSet = UnionObject(self, other)
  contains(y: ZZ): Boolean
end

object AdjoinObject(s: IntSet, x: ZZ) extends IntSet
  contains(y: ZZ) = (y = x) OR: s.contains(x)
end

object UnionObject(s1: IntSet, s2: IntSet) extends IntSet
  isEmpty: Boolean = s1.isEmpty AND s2.isEmpty
  contains(y: ZZ) = s1.contains(x) OR: s2.contains(x)
end

object EmptyObject extends IntSet
  isEmpty: Boolean = true
  contains(_: ZZ) = false
end

Empty = EmptyObject

end IntSetOO

We use several different sorts of object to represent sets. Each implements the same abstract interface. One of Cook’s points is that, in a truly object-oriented style of programming, the implementation of each object uses only the abstract interface to communicate with other objects, allowing the implementations to be quite independent of each other, and allowing new implementations to be added easily at any time. For example, Cook shows that infinite sets can easily be represented using the same abstract interface, and we can readily generalize his example and translate it into Fortress code as follows:

object IntegersMod(n: ZZ) extends IntSet
  contains(y: ZZ) = (y MOD n = 0)
end

Now we can write an expression such as

IntegersMod(2).union(IntegersMod(3)).adjoin(7)

and expect it to behave as if it contained all integers that are a multiple of either 2 or 3 (or both), as well as the integer 7.

Now, let’s suppose that we were to construct a set of the first million prime numbers using sets implemented in this way:

do
  s: IntSet := Empty
  k: ZZ := 2
  n: ZZ := 0
  while (n < 1'000'000) do
    if isPrime(k) then
      s := s.adjoin(k)
      n += 1
    end
    k += 1
  end
  s.contains(13)
end

Yes, we know lots of ways this code can be optimized, but let us please focus on the behavior of the constructed set, which is a linear chain of one million objects of type AdjoinObject. When we ask whether the set s contains 13, the first AdjoinObject will check its x value, see that is it not 13, and then pass the buck to the next AdjoinObject. Observe that this “passing of the buck” is a tail call; to see this, note that the short-circuit Boolean OR operator is equivalent to the use of an if statement:

object AdjoinObject(s: Intset, x: ZZ) extends IntSet
  contains(y: ZZ) = if (y = x) then true else s.contains(x) end
end

if the underlying implementation fails to implement tail calls without unnecessary growth of stack, and also is unable to support nearly a million nested stack frames, then execution of this query will fail.

Many programmers, including those familiar with the style of the Java programming language, will object that the original design of the implementation was wrong-headed: if we want iterative behavior, we should use an iterative statement such as while rather than method calls. This approach is easily captured in Fortress for these AdjoinObject searches:

object AdjoinObject(s: Intset, x: ZZ) extends IntSet
  contains(y: ZZ) = do
    u: IntSet := self
    (*) The next statement has an interesting idea but is erroneous.
    while (NOT u.isEmpty) AND: (y =/= u.x) do u := u.s end
    NOT u.isEmpty
  end
end

But there is a problem here; our implementation uses not only AdjoinObject instances but also instances of UnionObject and IntegerMod. In order to maintain the iterative structure of this implementation of the contains method, we will need to make it handle these other kinds of objects as well. This requires the implementation of AdjoinObject to be aware of the implementation details of these other kinds of objects, violating our object-oriented abstractions. Every time we want to invent a new implementation of IntSet, we will also need to add a case to this while loop in AdjoinObject. (In point of fact, the while statement shown above contains static type errors. These could be remedied by using typecase. Trying to satisfy the typecase without using an else clause would then point up the need to handle all the other possible implementations of IntSet.)

We could try to abstract the iterative process: If only our sets supported Java-style iterators, we could just enumerate all the elements of the set—oops, can’t do that, because some of our represented sets are infinite. And Java-style iterators have problems of their own, not least that they rely upon side effects and therefore introduce synchronization issues in parallel code.

Even without the other kinds of objects, we can see that this latest implementation of contains for AdjoinObject violates object-oriented abstraction: the implementation inspects the field x not only for the instance on which the contains method was invoked, but the field x of other instances as well. Thus instances of AdjoinObject are abstract with respect to “the outside world”, but not with respect to each other. As Cook points out, this is typical of an abstract data type: there is conceptually a single implementation of the data type, and the code for the abstract data type is privy to these details for purposes of accessing all instances of the data type. This has its advantages; in particular, it caters to certain kinds of optimizations.

In contrast, the object-oriented style makes every object fully abstract, even to other instances of the same “kind” of object. This also has its advantages; in particular, it easily supports multiple implementations and code refactoring. If we want this latter sort of abstraction and want to code iterations over chained objects in a fully abstract and modular manner, then proper support of tail calls is indispensable. This is even more important for the implementation of transitions in an object-oriented finite-state machine of indefinitely long lifetime.

For another example of the importance of tail calls, see Automata via Macros by Shriram Krishnamurthi.

The paper A Tail-Recursive Machine with Stack Inspection by John Clements and Matthias Felleisen demonstrates that correct implementation of tail calls is not incompatible with security mechanisms that rely on stack inspection.

Posted: 2009-11-30 14:56 (Updated: 2009-12-03 22:28)
Author: gls
Categories: Implementation TailCalls TailRecursion

Comments
  1. Faré – 2009-12-01 13:24

    In other words, Proper Tail Calls make control modular, whereas iteration requires omniscience.

    I tried to reply as much to Guido van Rossum when he used absurd arguments against providing PTC http://fare.livejournal.com/142410.html

  2. matthias – 2009-12-01 17:00

    Thank you, Guy, for emphasizing the importance of proper tail-call implementation (let’s call it PITCH here) for software design and refactoring. When I delivered my ECOOP keynote in 2004, I made the PITCH point slightly differently using the Design Patterns book. I developed some code using the patterns from the book in Java. I requested and received confirmation that the code was truly OO oriented and truly according to Their Book. Then I tested it and it worked fine. Then I stress-tested it (on a not so large input), and it failed with stack overflow.

    The next step was to demo the same code in the PLT Scheme class system, which looked very similar to the Java code. And it ran great on those stress tests and even larger ones. So I asked the 300-400 people in the audience what the problem was and why this slow, interpreted language Scheme did it right and Java, C#, Eiffel, Python, etc got it wrong.

    Not one person had any insight so after a minute of playing with them, I told them about the story of “Smalltalk to Actors to Scheme and how PITCH had ‘come along for the ride’” (you may recall our email exchange in the spring of 2004 to confirm the story of ‘apply’). And I did reinforce how much knowledge the OO research community has forgotten over the years.

    Sadly, the reaction was surprisingly hostile from some corners in the room. One person was especially outspoken in arguing that we all know that we just need three or four special constructs – and they may have to be map or for-each or such – and we’d be in business. Given that it was a public exchange, it was difficult to point out to him how messed up his argument was.

    While I have received many positive emails about the talk (my slides are on-line and people find them time and again), this message about PITCH deserves to be repeated and repeated and repeated for OO audiences. Let’s keep going at it.

  3. gls – 2009-12-03 10:44

    Yes, Faré, the third paragraph of your post is quite pertinent. I especially liked this remark:

    [I]n programs for which proper tail calls matters, the choice is conspicuously not between proper tail calls and improper tail calls, it is a choice between proper tail calls and explicit central stateful loops. And the absence of debugging information is constant when you transform tail calls into stateful loops.

  4. timbray – 2009-12-03 11:03

    I’m wondering if Clojure’s “recur” and “trampoline” forms - see http://clojure.org/special_forms#toc10 and http://clojure.org/api#toc569 - constitute another plausible path to the same goal. Clojure has a commendable focus on immutability.

  5. Faré – 2009-12-03 12:37

    Tim, recur is only for self-recursion, which is nice but doesn’t allow for modular design.

    And trampoline provide you full PTC at the cost of a (local) encoding that must be followed by every single bit of software that you want to use PTC with. Admittedly, though, you can probably statically determine which pieces of software may be involved in higher-order or otherwise modular tail recursion, and systematically encode those parts of the software with trampolines. Kindof sucks, but works. Now why should the user have to be doing manually what the computer could automatically do a better job at?

    In any case, if you want to use third-party libraries and they failed to enforce that convention, you lose. Whereas if the use of trampolines is to become universal – you may as well put PTC in the language implementation and save users the hassle of dealing with trampolines.

  6. jyasskin – 2009-12-03 13:50

    It’s really not convincing to show that a set implementation nobody would ever use works better with TCO. Further, your “even more Fortress-like style” with a union operator replacing adjoin isn’t helped by TCO: build the set in the wrong order, and you still blow out the stack.

    The objection to TCO that you’re trying to answer is that it’s rarely helpful in the real world. So you have to find real-world examples to refute that. An impractically-slow set isn’t a real-world example.

  7. gls – 2009-12-03 14:09

    jyasskin, your point is well-taken, and I will even take it a step further: it is even more in the spirit of Fortress to use data structures such as relatively balanced trees that are amenable to parallel traversal than to use long linear-chain structures that require sequential traversal. To that end, a “real” implementation of UNION would automatically rebalance the UnionObject tree. (See Jan Maessen’s earlier posts in this blog about implementing treaps.)

    On the other hand, I believe that the overheads of parallel processing are inherently large enough that there will always be a place for sequential processing, if only near the low-level leaves of the trees, and I believe that there will almost certainly be a sequential aspect to the highest level of any real-world program, if only “Wanna do it again?” I believe that tail calls are important for sequential programs that are organized according to certain principles of modular structure.

    I wish I could fit a real-world example onto a blog page. The best I can do is to exhibit a “toy” example that illustrates the basic idea.

  8. ccshan – 2009-12-03 16:56

    jyasskin– Most operating systems make tail calls between user code and kernel code. The context switch is usually implemented with a bit of assembly code, precisely because C doesn’t have tail calls (but machine code does). All these operating systems are real-world examples of the necessity of tail calls.

  9. rsc – 2009-12-03 22:07

    ccshan– It’s certainly a thought-provoking analogy, but I don’t think it withstands scrutiny.

    The user → kernel transition is almost never a tail call: there are results coming back from the kernel. The exception is the exit system call, the final switch a process makes.

    The kernel → user transition is a little like tail call, but more like a return from a function call (not a tail return, just a return). The exception is process creation, the first switch the kernel makes.

    Regardless of how well the analogy works, OS context switches would still be implemented in assembly even if C had tail calls, because C doesn’t give you access to the instructions necessary to change the processor privilege level and virtual memory state (but machine code does). I wouldn’t use that as an example of the necessity of exposing those very low-level hardware details, so I don’t think it makes sense to use it as an example of the necessity of tail calls either.

  10. metageek – 2009-12-04 07:07

    gls wrote:

    I wish I could fit a real-world example onto a blog page.

    I bet you can find something in the graphics world to do it. For example, given a tree of non-overlapping bounding volumes, find the smallest volume that matches a given point. That involves walking down the tree, asking each child, “does your bounding volume contain this point?”; when one replies yes, you recurse into it. Since no more than one child can say yes, that’s a tail call.

    I think this is a plausible scenario from video games, used to find the game-world object at a given point. However, I’m not sure how often they actually do that, as opposed to finding the game-world object that overlaps with a given volume (i.e., collision detection).

  11. ggchappell – 2009-12-04 09:04

    Guy, this is a thought-provoking article that makes an excellent point.

    On the other hand, I think that jyasskin’s point about UNION is also a good one, and your response is not really adequate. Doesn’t the proposed tree-rebalancing implementation of UNION have exactly the same problem as the iterative implementation of ADJOIN? I.e., it requires knowledge of the internal structure of other objects. So if we added a new union-like object to the mix, then the new & old union objects could not rebalance each other.

    Perhaps we can summmarize all this by saying that TCO permits the use of black-box abstractions when a property of an object depends on a property of another object; however, TCO is not sufficient to handle the case when a property of an object depends on properties of two other objects.

    Thus, your example neatly illustrates not only why TCO is a Good Thing, but also its limitations.

  12. Derek Elkins – 2009-12-04 09:30

    I have two other examples here. The first example is perfectly reasonable code, though not common (mainstream) object-oriented style, though arguably it should be. The second is just a sketch but is a very pragmatic example. Hopefully, this (Guy’s) blog post provides the explanation that my post left implicit.

  13. dasht – 2009-12-04 18:13

    Just for kicks… and while I like the article I’m replying to… I have an argument for the conclusion that gls is mistaken here - his argument doesn’t (quite) hold up. It’s over on LtU:

    http://lambda-the-ultimate.org/node/3702#comment-52777

    “Steele’s argument bugs me. I don’t think it is quite right: We can construct a Cooke style OO language that does not have TCO yet which can solve the integer set problem Steele poses. [….]”

  14. ccshan – 2009-12-04 20:02

    rsc– I was talking about calls made to a continuation. Think about how fork() and exit() work. It is because exit() is a tail call that all my finished processes aren’t sitting around waiting for their calls to exit() to return.

Comments (closed)
dr2chase 00:28, 2 Oct 2011

You only missed one (I'm the guy who mostly ran the site; we have live backups, including one on the laptop I use):

Faré -- 2009-12-07 01:56

As the devil's advocate, I'll give *valid* arguments why PTC might not be as important in practice as it is in theory:

1- Indeed with explicit trampolines, you can actually achieve PTC at the cost of merely an order of magnitude in performance and in syntactic complexity.

2- In current languages, programmers are already so burdened into low-level considerations that they don't pay a huge premium to do without PTC. Just like they used to do fine without GC, they can do fine without PTC.

3- As part of (manually) parallelizing algorithms, programmers already spend time refactoring their control and data structures to avoid deep towers of sequential computations, instead balancing things into shallow trees that can be readily parallelized.

4- If we eventually get more declarative languages then sure an SSC (sufficiently smart compiler) will handle PTC, but that might be the least (and least interesting) of its magic.

As to why the move (to java.net), it has a lot to do with standardizing infrastructure, increasing security, and sufficient maturity in the Kenai infrastructure to host our stuff. But it didn't do blogs the way we were accustomed to, and lacked the mighty-fine Fortress display macro (a Trac plugin that invoked emacs, latex, and ghostscript behind the scenes).  And we're really really busy trying to get the compiler whipped into shape.

Tony Garnock-Jones 00:30, 2 Oct 2011 (in reply to this comment)

Thanks very much for posting that. I'm glad you have the content of the old site backed up; best of luck with the remainder of the site move!