HomePage RecentChanges ToDo Reading Projects Categories Notes Sand Box New

HomePage

Hi there,

this is a somewhat dead blog of Jaroslav Sevcik.

This is a research and work oriented wiki:


2009-07-17 Research

On irrelevant read introduction and multi-threading

Hmm, let me talk a bit about a technical problem that was bothering me for more than a year now.

Irrelevant read introduction is a seemingly harmless optimisation that introduces a useless read to a program. Let me illustrate this on a simple example (with a single thread): Take a program

  r = 1
  print r

Now let us rewrite the program to:

  r = x
  r = 1
  print r

This is an obviously correct transformation for our little program because the new program outputs 1, just like the original program.

Even if there were other threads (assuming that r is a thread local register and x is a shared-memory location) and we had sequential consistency or Sun TSO or Itanium memory model, there would be no problem with this transformation. Intuitively, this is because the irrelevant read cannot affect my thread (because we throw away its value) and it cannot affect other threads (because reads are invisible to other threads in all of the above memory models in the sense that removing a read from an execution can only break intra-thread consistency of the read's thread but nothing else).

Hmm, so what is the problem, then? To explain this I will have to do a brief detour to memory models: you might know that programming languages do not give you sequential consistency or even a strong memory model, such as TSO, because many optimisations break strong models. Furtunately, most optimisations play nice with data race free programs, which are programs that never touch the same memory at the same time (or at least mark all memory accessed concurrently as volatile). This the model that will be adopted by C++, and it is the current model of Ada. The situation is a bit more complicated in Java, but for practical purposes, one can assume that Java guarantees sequential consistency for data race free program as well. Such a guarantee is called the DRF guarantee. The problem with irrelevant read introduction is that it can break data race freedom. This is not a problem per se, but there can be subtle interference with other optimisations.

More precisely, the problem is that some optimisations that are perfectly legal in the DRF guarantee do not play nice with the irrelevant read introduction. First note that an optimising compiler with a DRF guarantee can perform the following removal of redundant read:

  r1 = x; lock m; r2 = x; unlock m; ==> r1 = x; lock m; r2 = r1; unlock m;

because it cannot introduce new behaviours for any data race free context (if it could then there would be a data race in the original program).

Now, have a look at the following program with 2 threads (all variables are zero-initialized)

  Thread 1: lock m; x = 1; r1 = y; print r1; unlock m; 
  Thread 2: lock m; y = 1; r2 = x; print r2; unlock m; 

and observe that the program is data race free and it can never output two zeroes. Then, consider an irrelevant read introduction:

  Thread 1: r3 = y; lock m; x = 1; r1 = y; print r1; unlock m; 
  Thread 2: r4 = x; lock m; y = 1; r2 = x; print r2; unlock m; 

and then redundant read removal shown above:

  Thread 1: r3 = y; lock m; x = 1; r1 = r3; print r1; unlock m; 
  Thread 2: r4 = x; lock m; y = 1; r2 = r4; print r2; unlock m; 

Suddenly, the program can output two zeroes. This is very annoying…

I have some other (less convincing) evidence showing that irrelevant reads are not easy to explain, but I do not have any example that would show bad interference with standard optimisations (common subexpression elimination, reordering of independent statements…). In fact, it might actually be the case that irrelevant reads are fine in combination with standard optimisation. In either case, it would be very useful to know…


2008-02-06 Research

Ok, let me settle on half year update period.

The main thing from the last post is that I have submitted a paper on Transformations in the Java Memory Model. The main message is that the people behind Java should revise the memory model again because it cannot really explain even simple transformations that compilers do.

Currently I am working on a memory model that starts from transformations. I define several 'simple' transformations and then prove that any composition of these transformations on your data race free program yields a program that has the same behaviours as the original one. This is actually what my PhD? is all about.


2007-07-17 Research

When giving my talk about the Java Memory Model in Cambrdige, Peter Sewell suggested that I should look at the optimisations that real JVMs do.

Since I have no idea how to look at the results of a JIT compilation I've started experimenting a bit with GCJ compiler. Some of the results are summarised in OptimisationsGCJ.

Unfortunatelty, while testing this, I've found out that GCJ does not implement the Java Memory Model correctly (version gcj (GCC) 4.1.2 20070626 (Red Hat 4.1.2-13), IA32 architecture, command line switches "--main=Tester -O"):

  public class Tester extends Thread
  {
 	public static int x = 0;
	public static volatile int y = 0;
  
    public static volatile int z = 0;
  
    public static void main(String [ ] args)
    	{
            new Tester().start();
  
            int r1 = x;
            y = 1;
            
            for(int i=0; i<100000000;i++) z=i;
  
            if (y == 2) 
            {
              if (x == 0) System.out.println("!!!");
            }
    	}
  
    public void run()
    {
        for(int i=0; i<50000000;i++) z=i;
  
        if (y == 1)
        {
            x = 2;
            y = 2;
        }
    }
  }
    

By the Java Memory Model, the program should never print the exclamation marks, because y is volatile and the read of x in the if statement must see the write of 2 into x.

In fact, the program is correctly synchronized, thus it should have only sequentially consistent behaviours. Therefore, 'println' should never be invoked.

However, as GCJ optimizes away the read of x and uses the value obtained from the assignment r1=x, the invocation of println is possible (and actually happens).


2007-03-28 Research

Optimisations in the JMM

I have an informal proof of the independent statement reordering with our weaker definition of legality.

While doing this, I realised that even redundant write optimisation might be an optimisation that is legal (i.e. does not change observational behaviour). Well, looks like I will have to go through the tedious proof once again (and make better notes this time :-)).


2007-03-21 Research

I've been busy writing a paper about our formalisation of the Java Memory Model in Isabelle/Isar theorem prover. Now it should be (almost) ready for its Friday submission to TPHOLs'2007. I am a bit afraid that it should aim somewhere else, because the formalisation is not the most interesting part of it. The interesting part is that we were able to find and fix some flaws in the memory model. Some of them are not obvious at all when you read the memory model specification. What is even less obvious are the fixes. The formalisation makes it very easy to change definitions and see how they affect the proofs. So we were able to come up with fixes for the flaws and we have proved that these fixes do not break important guarantees of the memory model.

Now one of these flaws:

One more flaw in the Java Memory Model

As I've been trying to justify the causality test cases for the JMM, I found another flaw in the memory model (I believe). It's described in the paper. Basically, test cases 17-20 cannot be justified. E.g. test 20:

  Thread 1:
  join thread 3
  r1 = x
  y = r1
  
  Thread 2:
  r2 = y
  x = r2
  
  Thread 3:
  r3 = x
  if (r3 == 0)
    x = 42

The causality tests say that the behaviour r1 == r2 == r3 == 42 should be allowed.

A (maybe too) quick explanation why this is not legal: the problem is the rule 7 of legality which says that when you are committing a read, both the write that it sees in the justifying execution and the write it sees in the final execution must be committed. This prevents us from justifying the read 'r1 = x'.

What is the fix? Simple - weaken the rule 7 to require only the write in the final execution to be committed. Then DRF works and the out-of-thin-air too.

I even have some proof of some out of thin air, but I will keep it for later (maybe write a short paper about it).


2007-03-07 Research

Bug in the Java Memory Model

Pietro Cenciarelli, Alexander Knapp and Eleonora Sibilio have found a bug in the Java Memory Model in their paper.

The problem with the JMM is that reordering of independent statements does not always preserve the meaning of the program.

They give this example:

   Thread 1     | Thread 2
   -------------------------
   r3 = z       | r1 = x
   if (r3 == 1) | r2 = y
   { x = 1      | if (r1 == 1 &&
     y = 1 }    |     r2 == 1)
   else         |    z = 1  
   { y = 1      |
     x = 1 }    |

In this example the behaviour r1 == r2 == r3 == 1 is not legal in the Java Memory Model. The main reason is that in legal executions, if actions are committed, they have to preserve their happens-before ordering. Hence, if we commit the writes y = 1 and x = 1 in the else branch, these writes cannot be generated by the then branch, because then they would not have the same happens-before order.

However, if we swap the writes to x and y in any of the branches, the behaviour above becomes legal, because now the writes have the same happens-before order in both branches.

This implies that there is a mistake in the proof of the theorem guaranteeing legality of reordering. Can we fix the bug in the JMM and formalise the proof properly?

So far I have formalised a change in the formalisation that only requires to preserve happens-before order for read-write pairs to the same variable. I have also reproved the DRF guarantee and it works. This definitely fixes the example of Cenciarelli et al; however, the guarantee of any ordering must be examined in greater detail…


2007-03-04 Research

Sequential Consistency vesus W(r) happens-before r

Manson et al. have proved that whenever p is correctly synchronised then for any execution e if every read sees a write that happens-before it in e, then e is sequentially consistent.

This raises some questions:

Q: Is the second implication an equivalence?

A: YES. Why? If e is sequentially consistent exec of correctly synchronised program, then it does not have a race. In particular r and W(r) cannot be a race for any r, which means that they have to be ordered by happens-before. From well-formedness of execution they cannot be ordered r < W(r), hence they must be ordered W(r) < r.

Q: What if the program is NOT correctly synchronised? Is there still the equivalance or at least an implication in any direction?

A: Unfortunately none of these. If the program is not correctly synchronised, the world of its executions is wild and both notions are incomparable, meaning that we can find:

  • a program and its sequentially consistent execution such that there is a read that does not happen before the write it sees, and
  • a program and its execution such that every read in it sees a write that happens-before it, but it is not sequentially consistent.

First example is easy:

        Intially, x = 0
   -------------------------
   Thread 1   | Thread 2
   -------------------------
   x = 1      | r = x

Now look at the execution where r == 1. It is sequentially consistent, but the read of x does not see a write that happens-before it.

The other example is a bit more difficult. Consider the program:

   -------------------------
   Thread 1   | Thread 2
   -------------------------
   lock m1    | lock m2
   y = 1      | y = 2
   x = 1      | x = 2 
   unlock m1  | unlock m2
   lock m2    | lock m1
   r1 = x     | r2 = x
   r3 = y     | r4 = y
   unlock m2  | unlock m1

Now consider any legal execution where r1 == r4 == 1 and r2 == r3 == 2 and unlock m1 in thread 1 synchronises-with lock m1 in thread 2 and unlock m2 in thread 2 synchronises with lock m2 in thread 1.

Then all four reads see writes that happen-before them. However, there is no sequentially consistent execution with this result (just try to construct it!).


2007-02-28 Research

On impossibility of having a global committing store in JMM

An interesting question is whether for any legal execution there is a committing sequence of actions (not necessarily consistent with po!) such that each read sees the most recent write in this sequence. This would give a possiblity of committing actions (not necessarily in program order) into some global store.

However, that is not possible.

Just look at:

         intially, x = y = 0
   -----------------------------------------------------------------------
   x = 1  | if (x == 1) | r1 = x             | if (y == 1)  | if (y == 1)
          |   x = 2     | r2 = x             |   r3 = x     |    r4 = x
          |             | if (r1 == 1 &&     |              |
          |             |     r2 == 2)       |              |
          |             |   y = 1            |              |

Let's look at an execution that could give r3 == 1 and r4 == 2.

This is definitely legal, just commit in sequence all actions in thread 1, thread 2,… thread 5, seeing the appropriate values.

Now note that to commit r3 == 1 and r4 == 2 we really have to commit x = 1 before x = 2 (before y = 1) before r3 = x and r4 = x. As a result, r3 does not see the most recent write.


2007-02-27 Research

Important questions about the Java Memory Model

It seems to me that many people consider the JMM being a closed topic now, because it is specified in the Java language Specification.

I believe that this is just a beginning. Below are my questions about the Java Memory Model and I intend to answer some of them (preferably all :-) in my Phd thesis.

  1. DRF guarantee is whole program property. One data race and we have no guarantee at all. Considering that there is a data race in the hash-code function of string… What are compositional guarantees of the JMM? Can we describe benign data races?
  2. What is out-of-thin-air anyway? Can we have somthing more formal than just a bunch of examples that should be prohibited?
  3. Can we write down a sensible operational semantics rules for the JMM?
  4. What is a denotational semantics of a piece of code?
  5. What about program logic for the JMM?