
(Copyright 2006 Sriram Srinivasan)

Kilim IFAQ: Infrequently Asked Questions. Kilim v 1.0
-- sriram srinivasan (Kilim _at_

Why is multi-threaded programming considered so hard?

It is relatively easy to get thread programming correct (to a first
approximation) by synchronizing all your shared data structures and
taking locks in the right order.

You could have one giant lock and just do things one at a time (like
the current Python interpreter with its Global Interpreter Lock).
Clearly, this is not efficient. Increasing concurrent access to a
data structure (by using finer-grained locks) is what makes it
error-prone and hard to debug.

Kilim uses kernel threads. Where do tasks and threads meet?

Kilim's tasks are cooperatively scheduled on a kernel thread pool.

Tasks are needed when you want to split up your workflow into small
stages and write code as if it is blocking (instead of writing a
callback and having to jump to that function when it gets called).
Ideally, tasks should not make thread-blocking calls, although if you
*have* to call one, it is not the end of the world. That's what other
threads are for .. they'll take care of the other tasks meanwhile.

A Kilim task is owned and managed by a scheduler, which manages the
thread pool. When a task needs to pause, it removes itself from the
thread by popping its call stack (remembering enough about each
activation frame to rebuild the stack and resume at a later
point). The scheduler then reuses that thread for some other task.

You can have more than one scheduler (read: thread pool) and assign
each task to a particular scheduler. See the bench directory for
examples.
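
The pause-and-resume cycle above can be sketched in plain Java (this is
NOT Kilim's API; CooperativeDemo, runStage and the single-integer
"activation frame" are hypothetical stand-ins): a task gives up its
thread by returning, after saving just enough state to continue later,
and the scheduler's pool thread is immediately free for other tasks.

```java
import java.util.concurrent.*;

// A plain-Java analogy of cooperative task scheduling (not Kilim's API).
// A "task" pauses by returning from its stage and re-submitting itself;
// the pool thread is then free to run other tasks.
public class CooperativeDemo {
    final ExecutorService pool = Executors.newFixedThreadPool(2);
    final CountDownLatch done = new CountDownLatch(1);
    final StringBuilder log = new StringBuilder();

    void runStage(int stage) {
        log.append("stage").append(stage).append(' ');
        if (stage < 3) {
            // "Pause": remember only the next stage (our tiny activation
            // frame) and hand the thread back to the scheduler.
            pool.submit(() -> runStage(stage + 1));
        } else {
            done.countDown();   // task finished
        }
    }

    String run() throws InterruptedException {
        pool.submit(() -> runStage(1));
        done.await();           // wait for the task to complete
        pool.shutdown();
        return log.toString().trim();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(new CooperativeDemo().run());  // stage1 stage2 stage3
    }
}
```

Kilim's weaver automates exactly the tedious part of this sketch: it
captures and restores the activation frames for you, so the code reads
as ordinary blocking calls.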

How lightweight is "lightweight"?

The amount of memory occupied by a task is:

1. The Java object that represents the task.

2. If paused, an array of activation frames. The Kilim
   weaver performs data flow, live variable and constant analysis
   (intra-procedurally) to ensure that it captures only as
   much as is needed to resume.

3. The contents of all mailboxes that the task is receiving on.

Clearly, all these depend on your application.

The depth of the task stack is limited only by the thread's stack; no
memory is preallocated. Note that when written in the message-passing
style, stacks tend not to be too deep, because each task is like a
stage in a workflow, with its own stack.

What's the difference between channels in Ada, CSP/Occam, Newsqueak,
Alef etc. and Kilim's mailboxes?

Most of these languages use synchronous channels as their basic
construct, where a sending task can proceed only after the receiver
has received (or vice-versa).

1. Synchronous channels are easier to reason about because there is
   automatic flow control; the sender does not proceed unless the
   recipient drains the channel. Tony Hoare, Robin Milner, Rob Pike
   and John Reppy have all written extensively about synchronous
   programming, so I will take their word for it. However, I still
   find asynchronous programming (through buffering) a better default
   choice for practical reasons:

2. Context switching has a cost, however inexpensive Kilim's tasks are
   to create and context-switch (unlike the Occam/transputer world
   with its hardware-assisted switching). Although Kilim's mailboxes
   can be configured to be synchronous, that is not the default. There
   are many cases where you want to send messages to multiple
   recipients before waiting to collect replies. I find the CSP
   approach of spawning a task to avoid blocking while sending tedious.

3. I like having the same interface for both concurrent and distributed
   programming (although support for distributed programming is yet to
   be bundled with Kilim). Synchronous _distributed_ programming is
   horribly inefficient .. every put has to be acked when a
   corresponding _get_ is done.

This is why I have followed Erlang's example and prefer buffered
channels (called mailboxes) as the default choice.
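
The fan-out pattern in point 2 can be sketched with standard library
queues standing in for buffered mailboxes (this is not Kilim's Mailbox
class; the worker/reply queue names are invented for illustration): a
buffered put() returns immediately, so one task can post requests to
several recipients before blocking on any reply.

```java
import java.util.*;
import java.util.concurrent.*;

// Buffered "mailboxes" sketched with java.util.concurrent queues
// (an analogy, not Kilim's API). With buffering, the sender fans out
// to two workers first and only then waits for replies.
public class BufferedMailboxDemo {
    static String run() throws Exception {
        BlockingQueue<String> worker1 = new LinkedBlockingQueue<>();
        BlockingQueue<String> worker2 = new LinkedBlockingQueue<>();
        BlockingQueue<String> replies = new LinkedBlockingQueue<>();

        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> { worker1.take(); replies.put("reply-1"); return null; });
        pool.submit(() -> { worker2.take(); replies.put("reply-2"); return null; });

        // Buffered put() returns at once: send both requests first...
        worker1.put("req");
        worker2.put("req");
        // ...then block to collect the replies (arrival order varies,
        // so normalize with a sorted set).
        Set<String> got = new TreeSet<>(Arrays.asList(replies.take(), replies.take()));
        pool.shutdown();
        return got.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());  // [reply-1, reply-2]
    }
}
```

With a synchronous channel (e.g. a SynchronousQueue) the first put()
would block until its recipient arrived, forcing the sends to be
serialized or pushed into extra helper tasks, which is exactly the CSP
tedium described above.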

Erlang vs. Kilim

Kilim is an ode to Erlang, and strives to bring
some of its features into the more familiar Java world.

The term "Erlang", like Perl, refers to both the language and the sole
available implementation. Comparisons have to be made on these two
axes separately.

The Erlang language is a soft-typed, purely functional language and
has many of the goodies of a functional setting: higher-order
functions, beautifully simple syntax and pattern matching on terms,
features that I'd love to see in Java. However, programming in a purely
functional style is not everyone's cup of tea, and there is no reason
that higher-order functions and pattern matching can't be made
available in an imperative setting (see Scala, JMatch, Tom (from INRIA)
etc.). If you have to have types, it is better to have OCaml-style
types (or even Smalltalk's); but compared to Java-style types, I prefer
the simplicity of Erlang's soft types.

The argument for Java lies not in the language, but in the incredible
JIT compilers, the JDK, an enormous open code base and community, excellent
IDEs, and good network, database, GUI and systems support. Why throw away
all that?

The Erlang *environment* (not the language) offers lightweight
processes, fast messaging, a uniform abstraction for concurrency and
distribution, and many, many systemic features (process monitoring,
automatic restart, process isolation, failure isolation etc.). These can be
built atop Kilim as well.

The idea behind Kilim is that one can have all the features of the
Erlang environment without having to move to the Erlang language.

Kilim vs. Transactional Memory

Hardware/Software Transactional Memory is currently the new hope and
an alternative for concurrent programming in the shared-memory
world. It is appropriate in a mostly functional setting where most
objects are immutable and side effects are rare or contained. In an
imperative setting, I have my doubts about TM's scalability; hotspots
are expensive. Atomic sections can't be too big, otherwise they risk
getting retried all over again. And the part of the code that retries had
better not have any side effects that the TM doesn't know about or
control, such as sending messages on the network.

I think the task and mailbox approach is a more understandable model:
it has nice run-to-completion semantics, has convenient graphical
representations (dataflow diagrams, workflow diagrams, Petri nets), and
brings the interaction with other processes out in the open. It also allows
batched and efficient communication.

That said, there is absolutely no reason not to use TM facilities
internally inside Kilim. I intend to use non-blocking data structures
when they perform well (currently, Java's data structures aren't
as fast as I'd like them to be).

What's the relation between CCS/pi-calculus and Kilim?

The notion that the Mailbox itself is a first-class message datatype
and can be sent as part of a message is inspired by Prof. Robin
Milner's pi-calculus. This allows the topology to change with time:
A can send a mailbox in a message to B, B can forward that message to C,
and C and D can share that mailbox.
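
Channel mobility can be sketched with standard queues standing in for
mailboxes (not Kilim's Mailbox class; Request, serviceA and serviceB are
invented names): the reply channel travels inside the message itself, so
whoever ends up holding the message, however it was forwarded, knows
where to answer.

```java
import java.util.concurrent.*;

// Pi-calculus-style channel mobility sketched with BlockingQueue
// (an analogy, not Kilim's API): a message carries the mailbox on
// which its reply should be sent.
public class MobileChannel {
    static class Request {
        final String body;
        final BlockingQueue<String> replyTo;  // a first-class channel
        Request(String body, BlockingQueue<String> replyTo) {
            this.body = body;
            this.replyTo = replyTo;
        }
    }

    static String run() throws Exception {
        BlockingQueue<Request> serviceA = new LinkedBlockingQueue<>();
        BlockingQueue<Request> serviceB = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(2);

        // A forwards the whole request, reply channel included, to B.
        pool.submit(() -> { serviceB.put(serviceA.take()); return null; });
        // B answers on whatever mailbox the message carries.
        pool.submit(() -> {
            Request r = serviceB.take();
            r.replyTo.put("handled: " + r.body);
            return null;
        });

        BlockingQueue<String> myReplies = new LinkedBlockingQueue<>();
        serviceA.put(new Request("ping", myReplies));
        String reply = myReplies.take();
        pool.shutdown();
        return reply;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());  // handled: ping
    }
}
```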
Beyond that, CCS, like CSP, is a modeling and specification language,
and uses synchronous interaction between processes. At a practical
level, this is terribly inefficient (esp. in Java).

RMI vs. Kilim

We need to distinguish between RMI implementations and the concept.

RMI implementations block the Java thread. That's a no-no for
scalability. They are also incredibly heavyweight -- I/O
serialization is always used, even in a concurrent setting, to
ensure isolation. The request-response paradigm doesn't allow many
other patterns of communication: fork/join, flow control, rate
control, timeouts, streaming etc.

Kilim, in a concurrent (local) setting, is at least 100x faster than
Java RMI on even the simplest benchmarks. In a distributed setting,
the Kilim approach is better because asynchronous messaging is much
more scalable. Combine this with automatic stack management and you
get a far easier programming model.

What are Continuations and what is Continuation Passing Style (CPS)?

There is so much doubt and misinformation on the topic that a few
words are in order.

Simply put, CPS is a style of programming in which a "return" keyword is
not needed.

The notion of procedures calling procedures by building up a stack
has been burnt into our collective programming consciousness. If a()
calls b() calls c(), we think, the stack must be three deep.

Suppose a() has nothing more to do after calling b(). It (that is, a()) really
doesn't need b() to return to it, so there is no use pushing a return
address on the stack. In other words, the flow of control _continues_
from a to b, never to return. Most respectable code generators
recognize this special case and prevent the stack from building up
("tail call optimization"). It is a pity this isn't available on
the standard JVMs. Even GCC doesn't do it all the time.

Now consider,

   a() {
      do stuff
      b()
      do more stuff
   }
   b() {
      ...
      return
   }
Now you need a stack and you want b() to return in order to "do more
stuff". However, this bit of code can be transformed to ensure that b
doesn't return; instead it continues on to another procedure that
performs the "do more stuff" bit.
  a() {
     do stuff
     b("c") // pass a reference to c()
  }

  b(nextProc) {
     ...
     call nextProc
  }

  c() {
    do more stuff
  }
The "do more stuff" part has now been separated out into c(). Now,
a() chains on to b, supplying it the name of the next call to
make. For its part, b _continues_ to the procedure referred to by its
nextProc parameter, instead of returning.

This transformation ensures that you never need the "return" keyword
... you always continue onwards to the parameter supplied.

What if "do more stuff" needed to refer to local variables in a()'s
stack frame? Well, the transformation ensures that a() packages the
values of those variables along with a reference to the next proc to
call. Now, instead of "nextProc", we have an _object_ (with state and
a procedure) called a continuation.
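
The a()/b()/c() pseudocode above can be made concrete in Java, where a
lambda plays the role of the continuation (CpsDemo and its method names
are invented for illustration): the lambda closes over a()'s local
variable, packaging exactly the state that would otherwise have lived
in a()'s stack frame.

```java
import java.util.function.Consumer;

// CPS in plain Java: b() takes a continuation instead of returning a
// value. The lambda passed by a() captures a()'s local variable,
// playing the role of the saved stack frame.
public class CpsDemo {
    static StringBuilder out = new StringBuilder();

    static void a() {
        String local = "more stuff";   // lives on inside the closure
        out.append("stuff ");
        // Chain to b(), handing it the "rest of a()" as a continuation.
        b(result -> out.append("do ").append(local).append(" with ").append(result));
    }

    static void b(Consumer<String> nextProc) {
        // ... b's own work, then continue onwards instead of returning.
        nextProc.accept("b-result");
    }

    public static void main(String[] args) {
        a();
        System.out.println(out);  // stuff do more stuff with b-result
    }
}
```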

The obvious question is: why bother? The stack worked well, didn't it?
Why dispense with it? Yes, the stack works incredibly well, which is
why CPUs and compilers have special support for it. However, the
continuation-passing style allows for other forms of transfer of
control very simply. C++ and Java provide two forms of "return", one
normal and another using exceptions. If we had CPS, we wouldn't need
these special cases.

Instead of a() installing an exception handler, it would pass in two
continuation objects to b() that know what to do under normal and
under exceptional conditions. b() simply chains on to the appropriate
object as its last move.
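
This two-continuation pattern looks as follows in plain Java (the class
and parameter names are invented): the callee receives a normal
continuation and an error continuation, and chains to exactly one of
them, so no exception machinery is needed.

```java
import java.util.function.Consumer;

// Exceptional control flow without exceptions: b() is given a normal
// continuation and an error continuation and chains to exactly one.
public class TwoContinuations {
    static void b(int x, Consumer<Integer> onOk, Consumer<String> onError) {
        if (x >= 0) onOk.accept(x * 2);                       // normal path
        else onError.accept("negative input: " + x);          // error path
    }

    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        b(21, v -> log.append("ok=").append(v), e -> log.append("err=").append(e));
        b(-1, v -> log.append(" ok=").append(v), e -> log.append(" err=").append(e));
        System.out.println(log);  // ok=42 err=negative input: -1
    }
}
```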

As another example, you can have tasks that pass control to a
scheduler that in turn passes control to another task, all without
having to return to whoever called it.

In a programming language with explicit support for continuations (ML,
Lisp, Haskell), one can have the "return" keyword merely as syntactic
sugar (like a macro). Internally, the compiler CPS-transforms the
entire code, so no procedure returns to its caller.

Are there any disadvantages to continuations? Oh yes. Machines are
so well optimized for stack usage (and lack tail calls), so the
system is biased against continuations, performance-wise. The
continuation object has to be allocated from the heap and depends
on garbage collection. This is one reason why OCaml doesn't use CPS.

That said, with the current crop of garbage collectors the amortized
cost of garbage collection often matches that of stack-based
allocation, and continuations are simply too powerful a feature to
ignore.

Where does Kilim fit into all this?

Kilim's transformation is similar to CPS, but it needs to live within
a JVM that does not even support tail calls. It also needs to live
with the Java verifier, which doesn't allow random gotos to be inserted
into the code willy-nilly. More details are in the paper "A Thread of One's
Own" (included in the docs directory).