Notes on Self techniques
authorTony Garnock-Jones <>
Sun, 15 Jul 2018 13:43:34 +0100
changeset 384 2d82efe83d59
parent 383 bcad96e920f0
child 385 0d3839af02db
Notes on Self techniques
--- a/experiments/little-smalltalk/	Sun Jul 15 11:43:19 2018 +0100
+++ b/experiments/little-smalltalk/	Sun Jul 15 13:43:34 2018 +0100
@@ -37,3 +37,97 @@
     19476567 bytecodes/sec; 3422322 sends/sec
 2018-07-15 11:12:26 tonyg on hop: 0 tinyBenchmarks 7816316 bytecodes/sec; 1313400 sends/sec
+2018-07-15 12:30:53 Reading some of "the classics" [1,2] to refresh my
+memory. The first of the two [1] presents the techniques of
+*customization*, *inlining*, and *splitting*:
+ - *Customization* means compiling a specific version of each method
+   for each distinct concrete receiver class. In
+   `jit-SmallWorld-2015.rkt`, this would correspond to including a
+   `class` argument to `compile-native-proc`.
+   Their example of `#min:` is instructive. Consider `#min:` inherited
+   by, say, `SmallInt` and `String`:
+       min: x
+         ^ (self < x) ifTrue: [self] ifFalse: [x]
+   By using customization, the system can directly link to (at least)
+   or inline (better) the `SmallInt`- and `String`-specific
+   implementations of `#<`.
+ - *Inlining* comprises both inlining of a method implementation into
+   a call site, where the receiver class of the inlined method is
+   statically known, and open-coding of primitives at call sites. In
+   `jit-SmallWorld-2015.rkt`, primitives are open-coded, but other
+   methods are not yet inlined, because no type information is
+   propagated inside the compiler.
+ - *Splitting*—properly *message splitting*—is the hoisting of pieces
+   of a continuation into alternative branches leading up to that
+   continuation. It corresponds to a kind of
+   CPS-transformation-induced binding-time improvement: it's the kind
+   of thing that happens if you do the CPS transformation, and then
+   specialize it ("customize" it!) based on type information gleaned
+   from each branch.
+   In the `#min:` example, after inlining and open-coding the
+   primitive `#<` for `SmallInt` receivers, we have two branches: one
+   where the comparison yielded the `true` object, and one where it
+   yielded `false`. Without splitting, the subsequent
+   `#ifTrue:ifFalse:` can't be inlined effectively (though [IDEA] a
+   partial evaluator able to represent *alternatives*—*unions*—might
+   be able to do well! The result would be (U True False) and a simple
+   cascade of tests could recover quite a bit of efficiency). By
+   splitting, hoisting the `#ifTrue:ifFalse:` send into the branches
+   producing the `true` and `false` values, we get to inline the
+   constant blocks that are its arguments.
+They also mention [1] other techniques:
+ - "type prediction" based on hand-coded heuristics about common uses
+   of certain selectors. Our `jit-SmallWorld-2015.rkt` (and the pure
+   interpreter it descends from, and the SmallWorld implementations of
+   Budd and Allen that *that* descends from) all benefit from this:
+   the `SendUnary` and `SendBinary` opcodes (10 and 11) include
+   special checks for `SmallInt` instances. Reading the introduction
+   to [2] suggests that this kind of type prediction will be obviated
+   by dynamic type feedback. We shall see.
+ - "extended splitting" is essentially even more like CPS-based
+   binding-time improvement: whole chains of operations, larger
+   segments of continuations, are hoisted above merge nodes in the
+   control flow graph.
+ - with some small additional effort (?), the optimizations combine to
+   hoist some loop invariants from a loop body into its header,
+   apparently.
+They describe their "adaptive recompilation" technique of the time.
+Two compilers are used; the non-optimizing compiler adds code to
+increment a per-method (presumably, a *customized* method?) counter on
+each entry to the unoptimized routine. Later checking of the counters
+identifies hot spots.
+QUESTIONS: How does it figure out which receiver-type to optimize for?
+How many counters per implementation are there - one per
+receivertype/implementation pair, or just one for the implementation
+On to the second paper [2]. At the end of the introduction, they claim
+that their new system runs 1.5 times faster than their old system,
+despite the simplicity of the new system, because of the power of type
+ - The built-in PICs
+[1] Ungar, David, Randall B. Smith, Craig Chambers, and Urs Hölzle.
+“Object, Message, and Performance: How They Coexist in Self.” Computer
+25, no. 10 (1992): 53–64.
+[2] Hölzle, Urs, and David Ungar. “Optimizing Dynamically-Dispatched
+Calls with Run-Time Type Feedback.” In Proceedings of the ACM SIGPLAN
+1994 Conference on Programming Language Design and Implementation,
+p:326–336. ACM New York, NY, USA, 1994.