Sunday, December 26, 2010

Maven Surefire 2.7.1

Today's release of surefire 2.7.1 is an important milestone for me. Let's start out with what the 2.7 series has to offer:

  • Multiple run-orders for tests now supported
    The runOrder attribute lets you specify alphabetical, reversealphabetical, random, hourly (alphabetical on even hours, reverse alphabetical on odd hours) and filesystem. Odd/even for hourly is determined at the time of scanning the classpath, meaning it could change during a multi-module build.
  • Faster, smaller
    About 1/3 of the total download size of 2.6. I get quite a lot of feedback saying it's significantly faster too; your mileage will vary.
  • Parallel JUnit
    Surefire is now totally self-contained and no longer uses ConfigurableParallelComputer for anything. Real execution times per test are also reported. It won't get much better than this.
  • Severe memory/resource leak fixed for those of you who have console output.
    The more your tests were writing to stdout/stderr, the worse the problem was. This one has been here since 2.4 ;)
  • Pluggable/Selectable providers
    Surefire is a framework for forking and reporting with a few additional features, such as directory scanning services. Until 2.7, this has all been wired together in one monolithic slab of code, with test-providers (TestNG, junit3/4/4.7) seemingly independent but in reality all welded together by massive dependencies and strange divisions of labor. No more. 2.7.X makes it possible to write your own providers. Best of all, there's really not much work you need to do to create one.
    Need a "fork every 20 tests" provider? Fork one of the existing ones and make it yourself on github, probably in less than an hour. This also means we will be closing some of the more exotic requests as won't fix, since you can just do it yourself.
    Read about it here and here for the api
  • JDK 1.3 fork-compatibility restored.
    I ended up doing this just for the sheer heck of it; kind of a challenge. Pardon the language.

There are other issues fixed too, but these are the highlights.
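As a rough illustration of the run-order feature from the list above, a POM fragment could look something like this. It is a sketch, assuming runOrder is set as an ordinary element inside the plugin's configuration block; the value "random" is just one of the options named above.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <version>2.7.1</version>
  <configuration>
    <!-- Assumed placement: one of alphabetical, reversealphabetical,
         random, hourly or filesystem, as listed in the post. -->
    <runOrder>random</runOrder>
  </configuration>
</plugin>
```

Running with "random" for a while is probably a cheap way to flush out tests that silently depend on each other's ordering.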

* Why ?
I started programming on my C64 when I was 13 and I've been coding passionately ever since. I went to university and when I finished developing software became my day job. And although I keep my code clean, professional software development is also a lot about making deliveries with trade-offs and sometimes compromises. And we move on. If it doesn't come back to haunt us, it was probably good enough - no matter how frustrated you felt when making it.

Not so with my Open Source work. There I will only do stuff that somehow is the most excellent work I am capable of. It's the hobbyist computing returned, but my capacity is oh so infinitely different from when I was a teenager. Several OSS companies have offered me jobs, which I have turned down. If I were to take one, I somehow feel I'd be turning the hobby into work once more.

So how does all of this relate to surefire? Until recently, surefire has been real messy code. While the basic design contains some interesting and pretty sophisticated stuff, it was basically a big mess. Huge classes with 400-line-long methods were the trending topic. It was code without any discernible shape or form; mutable state mutating at every opportunity. All this was mostly due to different people working on it at different points in time.

I've seen this happen, even with the best of people working together, and I will not dwell further as to the reasons for this. But I love working with the structure of code, and I think programming is communication; me talking to you - disconnected in time and space.

So I refactor. And I think. Sometimes I can think about a change for a week, code it in a few hours, only to realize it was wrong and throw it all away. Sometimes I can discover midway that there is a more important angle on solving the problem; something totally different is more important and will have a bigger impact on future change. And I just stash away everything and do The Thing I Now Know Is Right. Repeat until satisfied.

I spent months of spare time working with surefire 2.7 this way. I think the results are pretty amazing; the plugin is basically transformed into clean code. Every year around Christmas time, I tend to bring this book to the fireplace. It's still one of the most amazing books written about how to think code. Kent has rewritten this book for Java several times, but I still recommend the Smalltalk version - even if you never programmed smalltalk. And take a look at surefire trunk. There's still work to do. Make a patch. Merry xmas.


Friday, February 19, 2010

Concurrency in maven ?

Within the maven community, there has been a push towards parallelizing maven itself to achieve better build performance within multi-module reactor builds. A number of strategies have been tried, and two main ones have emerged. Parallel reactor mode uses the module dependency graph to schedule modules that can be built concurrently, side by side. The other strategy is known as "weave" mode, and it traverses the modules phase-by-phase instead of module-by-module (you can read about it here).

Both have "fully" functional implementations available, and weave mode is quite a lot faster than parallel mode. The code is available at http://github.com/krosenvold/maven3. Just build and run with -Dmaven.threads.experimental=4

So what is this post about ? I am the primary author of "weave" mode, and for the last few weeks I've been chasing an elusive goal: 1000 consecutive green builds of 1 single project on my CI environment.

Initially I was quite afraid of the thread safety issues within maven; after all, retrofitting concurrency onto any non-concurrent code can be a daunting task. Fortunately there is a lot of state that is /copied/ in maven reactor mode. From a concurrency perspective, this saves the day.

So why am I not getting my 1000 greens ? Every 300-400 builds it would fail, with strange errors. I asked a few questions (and this one) on stackoverflow.com. It's the file system. The java file system has no guarantees of /anything/ when it comes to concurrency. The only thing you can be sure about is that the single thread that wrote the file can also read it afterwards.

javac uses the file system. And I was quite baffled by this; in weave mode the javacs are invoked on a pretty tight schedule; they typically come within just a few ms of each other. Every now and then the downstream javac would complain about "bad class files" from the upstream javac. But the scheduling is done properly, and the first javac /was/ done. How to solve it ? Turn on "forkMode" in maven for javac.

I had a chat with the nice folks at #kernel and they told me that all contents of a file should be concurrently visible to /everyone/ upon close() in a modern linux kernel. When I turned on forkMode in javac the problem went away, because forkMode=true basically delegates the visibility issues to the OS.
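For reference, this is roughly what forking javac looks like in a POM. It is a sketch under an assumption: the post calls the switch "forkMode", but the maven-compiler-plugin parameter I know of for this is the boolean fork, so that is what is shown here.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <configuration>
    <!-- Assumption: "forkMode" in the text corresponds to the compiler
         plugin's boolean "fork" parameter. javac then runs in a separate
         OS process instead of inside the maven JVM. -->
    <fork>true</fork>
  </configuration>
</plugin>
```

The trade-off is process startup cost per compilation, which is probably noticeable on builds with many small modules.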

You /can/ try this yourself if you check out from github and try to build a project. It works best if you do a "mvn -Dmaven.threads.experimental=4 clean install", since that'll write a lot of files.

I'm still scratching my head about what to do with this; given that forking delegates visibility to the underlying os one could just fork everything all the time. Or find some other option.. Suggestions ?

Thursday, January 14, 2010

Run your junit tests concurrently with maven, junit (and maybe also spring3) in 5 minutes or less!

Surefire 2.5 is released, and it contains the concurrent junit patches. In this post I'll give the quick rundown of how to try out your current maven based build in a concurrent fashion. Just a few initial thoughts:

Will my tests run concurrently ?


Probably not, as is. Most existing test fixtures that people use have singletons and shared state that needs to be fixed first. This may be everything from a shared file, shared TCP/IP port or a static member variable in some base class that can't be static any more.

It took me about a day to fix these things in my current project. My project is large and complex, using all sorts of dark spring magic. Your mileage may vary.

What performance gain can I expect ?


For an IO-bound test (integration test/selenium test etc), the sky's the limit ;)

For a fairly optimized unit-test set, expect little or no gain - maybe 15-20%.
The reason for this is that until jdk7, all classloading and all classpath bound resource access is synchronized across the VM. Running unit tests is mostly about classloading and classpath resources :( If your unit-tests have other kinds of I/O bindings you may be luckier.

If you're using spring inside your test fixtures (like we do), you can probably squeeze a few drops of speed out of it by making lazy-init=true the global default. You're going to test it all anyway, right ?
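A minimal sketch of global lazy initialization, assuming your test context is defined in Spring XML. The default-lazy-init attribute on the beans element is how I would read "globally" here; the bean itself is hypothetical and only there to make the file complete.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans-3.0.xsd"
       default-lazy-init="true">

  <!-- Hypothetical bean: with default-lazy-init="true" it is only
       instantiated when a test actually asks for it. -->
  <bean id="exampleService" class="com.example.ExampleService"/>

</beans>
```

Individual beans can still opt out with lazy-init="false" if something really has to start eagerly.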

How to do it!


Make sure you upgrade to surefire-2.5:


<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <version>2.5</version>
</plugin>


Make sure you upgrade to junit 4.8.1 (still not in maven central repo, install locally):

<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>4.8.1</version>
  <scope>test</scope>
</dependency>


(You /can/ use 4.7 but it still has a couple of concurrency bugs in it that were fixed in 4.8.1)

Fix your surefire configuration:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    ....
    <parallel>classes</parallel>
  </configuration>
</plugin>


Legal values for parallel are "classes", "methods" or "both". Now you can run, but you should probably read the rest of this post first...

Classes, methods or both ?


From a concurrency perspective, "classes" is probably the easiest to start with, since it will probably run into the smallest number of troubles in your test fixtures. I really recommend you try to get both working, since you'll probably need to identify any additional issues raised by using "methods". You want a test-suite you can trust, right ?

The fine print


One drawback of the current default concurrency implementation in junit is that it does not allow you to constrain threads, which is desirable for almost all use cases I am aware of. But junit is extendable, and I have made a supplemental add-on that allows threads to be configured too. Surefire knows to invoke this one if it's present on your classpath, meaning you'll get additional options available:

The configurable-parallel-computer still has a few issues that are unsolved, so you may consider dropping by the issue tracker to check if these will bother you before adding it to your project:

git clone http://github.com/krosenvold/configurable-parallel-computer.git
cd configurable-parallel-computer
mvn install


<dependency>
  <groupId>org.jdogma.junit</groupId>
  <artifactId>configurable-parallel-computer</artifactId>
  <version>1.5</version>
  <scope>test</scope>
</dependency>




<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    ....
    <parallel>classes</parallel>
    <threadCount>2</threadCount>
  </configuration>
</plugin>


You can also use the setting "perCoreThreadCount", which scales threadcount per CPU core.
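A sketch of how that might be combined with threadCount, assuming perCoreThreadCount is a boolean switch that turns threadCount into a per-core multiplier. On a 4-core machine the fragment below would then ask for 8 threads.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <parallel>classes</parallel>
    <threadCount>2</threadCount>
    <!-- Assumption: boolean flag; when true, threadCount is multiplied
         by the number of CPU cores available to the JVM. -->
    <perCoreThreadCount>true</perCoreThreadCount>
  </configuration>
</plugin>
```

This is probably the friendlier choice for builds shared between laptops and beefier CI machines, since the POM does not hard-code an absolute thread count.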

The configurable-parallel-computer project also contains a better output demultiplexer than the one in surefire (which has a ketchup-bottle behaviour). Surefire knows about this one too, so adding configurable-parallel-computer to your classpath will give you smooth project output. I'm sure we'll get the improved version into surefire at some later time.


Spring


If you're using spring you need to use a forked version of spring-test. The patch I submitted to spring is targeted for 3.1, whatever that may mean in practice:

git clone http://github.com/krosenvold/org.springframework.test.git
cd org.springframework.test
mvn install

You need to replace your dependency on "spring-test" with the forked version:


<dependency>
  <groupId>org.rosenvold.springframework</groupId>
  <artifactId>spring-test</artifactId>
  <version>3.0.0.RELEASE</version>
  <scope>test</scope>
</dependency>

Remember to also suppress the transitively dependent spring-test artifacts, like this:

<dependency>
  <groupId>org.springframework</groupId>
  <artifactId>spring-aspects</artifactId>
  <version>3.0.0.RELEASE</version>
  <exclusions>
    <exclusion>
      <groupId>org.springframework</groupId>
      <artifactId>spring-test</artifactId>
    </exclusion>
  </exclusions>
</dependency>


Make sure mvn dependency:tree does not contain the spring version of spring-test. Any inclusion of the 3.0 artifact WILL get you into trouble.

If you're using a Mock Session Scope based on the Spring Session Scope (with a custom ContextLoader), you need to make sure you override the synchronized methods in the base class with your own implementations, or your tests may deadlock on the session scope. This is a real bug in spring, but one that probably does not occur too often in real-life scenarios; it does occur quite a lot when running concurrent tests.

Off you go !