Archive of articles classified as "metaprogramming"

Better Grails Batch Import Performance with Redis and Jesque

2011/10/13

A couple of years ago, I put up a well-received blog post on tuning Batch Import Performance with Grails and MySQL.

I’ve recently needed to revisit some batch importing procedures and have acquired a few extra tools in my Grails utility belt since writing that post: Grails Redis and Grails Jesque.

Redis is a very fast key/value store, where the values are not just strings, but are data structures like lists, sets, and hash maps. I’m the main author of the grails redis plugin, and it’s my favorite pragmatic technology of the past few years. If you’re new to Redis, check out the presentation slides I gave at this year’s gr8conf.

Jesque is a Java implementation of Resque, a Redis-backed message queueing system for creating background jobs. Resque was written in Ruby by the folks at GitHub. The Jesque plugin is fully integrated with Grails and allows you to create worker jobs that are spring injected and have an active hibernate session.

This combination makes parallelizing work very easy, as most of the pain of trying to spin off threads in grails is handled for you by Jesque. Yes, there’s GPars, but the threads that it creates aren’t spring injected and don’t have hibernate sessions.

Using Jesque is as simple as:

  1. create a Job class that implements a perform method.
  2. tell Jesque to start up 1..n worker threads that monitor a queue and use your Job to process work
  3. enqueue work on the queue so workers can pick it up

I’ve created a bitbucket repository with all of the source code from the original Batch Import post, as well as with the enhancements below.

The example problem is that there is a Library class that produces metadata for 100,000 books that we want to persist in the database as Book domain objects.

package com.naleid.example
 
class Book {
    String title
    String isbn
    Integer edition
 
    static constraints = {
    }
 
    static mapping = {
        isbn column:'isbn', index:'book_isbn_idx'
    }
}

The naive way of doing this takes Grails ~3 hours to do the inserts. The original batch performance post showed how to improve this time from 3 hours to 3 minutes with a few Grails and MySQL tweaks.

Using Redis + Jesque to parallelize the task, I’m able to cut that time in half again to a little over 90 seconds on my MacBook Air.

On real-world imports, where there is quite a bit more data and potentially other linked domain objects that can be memoized with the redis-plugin, I’ve seen a >100x speed improvement over the original serial import, even with the tuning tips from my original post.

Install redis and clone the test project from bitbucket to try it yourself. Just run grails run-app, go to the running app on localhost, and click on the link to the SerialBookController to see the original version, or the ParallelBookController to see the faster Redis+Jesque version. Each will display how long the insert took once it’s done.

The ParallelBookController calls bookService.parallelImportBooksInLibrary(). That method spins up a number of worker threads, iterates through the books in the Library and enqueues each one on a Jesque queue. When it’s done iterating through the Library, it tells all the threads to end when they’re done processing all the work:

    def parallelImportBooksInLibrary(library) {
        Integer workerCount = 10
        String queueName = "import:book"
        withWorkers(queueName, BookConsumerJob, workerCount) {
            library.each { Map bookValueMap ->
                String bookValueMapJson = (bookValueMap as JSON).toString()
                jesqueService.enqueue(queueName, BookConsumerJob.simpleName, bookValueMapJson)
            }
        }
    }
 
    void withWorkers(String queueName, Class jobClass, Integer workerCount = 5, Closure closure) {
        def workers = []
        def fullQueueName = "resque:queue:$queueName"
        try {
            workers = (1..workerCount).collect { jesqueService.startWorker(queueName, jobClass.simpleName, jobClass) }
            closure()
            // wait for all the work we've generated to be pulled off the queue
            while (redisService.exists(fullQueueName)) sleep(500)
        } finally {
            // all work is off the queue, tell each worker to kill themselves when they're finished
            workers*.end(false)
        }
    }
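The lifecycle in withWorkers (start the workers, enqueue the work, wait for the queue to drain, then ask the workers to finish) can be sketched without Redis or Jesque. This is only an illustration that uses plain java.util.concurrent classes as a stand-in; the names queue, processed, and shouldEnd are made up for the example, and it says nothing about how Jesque is implemented internally:

```groovy
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.LinkedBlockingQueue
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicBoolean

// In-memory stand-ins (hypothetical): a work queue and a place to record results
def queue = new LinkedBlockingQueue<String>()
def processed = new ConcurrentLinkedQueue<String>()
def shouldEnd = new AtomicBoolean(false)

// Start 3 workers that keep polling the queue until they are told to end
def workers = (1..3).collect {
    Thread.start {
        while (true) {
            def item = queue.poll(50, TimeUnit.MILLISECONDS)
            if (item != null) {
                processed << item.toUpperCase()  // the "job" for this sketch
            } else if (shouldEnd.get()) {
                break                            // queue is empty and we were asked to stop
            }
        }
    }
}

// Enqueue the work
(1..20).each { queue.put("book-$it".toString()) }

// Wait for all the work to be pulled off the queue
while (!queue.isEmpty()) sleep(10)

// Tell each worker to end once it has finished its current item, then wait for them
shouldEnd.set(true)
workers*.join()

assert processed.size() == 20
```

As in the Jesque version, an empty queue only means the work has been picked up, not that it has been completed, which is why the workers are asked to end gracefully instead of being killed.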

The workers that persist the Book domain objects in the database are very simple Jesque Job artefacts that are spring injected and have an active hibernate session. They can be of any class type. The only requirement is that they have a method named perform, which is called and passed an item of work from the queue.

Here’s the example BookConsumerJob class that persists a Book to the database:

package com.naleid.example
 
import grails.converters.JSON
 
class BookConsumerJob {
    def bookService
 
    void perform(String bookJson) {
        bookService.updateOrInsertBook(JSON.parse(bookJson))
    }
}

You can see how simple the BookConsumerJob class is. It also calls out to the same bookService method that the serial batch import calls to import a Book.

One other neat thing about using Jesque is that it adheres to the Resque conventions for what gets stored in Redis. This means that you can gem install resque-web and then launch resque-web to get a nice monitoring platform for your Jobs and to see errors, or how much work is left in the queue.

Dynamically setting Grails Log4J levels with the Console Plugin

2011/09/23

If you’ve got Burt Beckwith’s great Grails Console Plugin installed, it’s easy to tweak the logging levels dynamically in your grails application.

The quick and dirty way to switch your logging level dynamically, if you know the name of the logger, is just to do this in your console window:

import org.apache.log4j.*
Logger.getLogger("org.springframework").level = Level.DEBUG

Sometimes a few helper methods can help you see what the current config is (especially if you’ve changed some things), as well as figure out which loggers are the right ones to tweak. This sample script can be used in a grails console to make it easy to view and change the logging level to whatever you want. Just cut and paste it into your application’s console window (in dev it defaults to: http://localhost:8080/yourAppName/console):

import org.apache.log4j.Logger
import org.apache.log4j.Level
import static org.apache.log4j.Level.*
 
def getRootLogger() { Logger.rootLogger }
def getAllLoggers() { rootLogger.loggerRepository.currentLoggers.toList().sort { it.name } }
def getActiveLoggers() { allLoggers.findAll { it.level } }
def getLogger(String logName) { Logger.getLogger(logName) }
def setLevel(String logName, Level level) { Logger.getLogger(logName).level = level }
 
def printLogger(logger) { println "${logger.name} -> ${logger.level}" }
def printAllLoggers() { allLoggers.each { printLogger(it) } }
def printActiveLoggers() { activeLoggers.each { printLogger(it) } }

This makes it easy to see what logs are currently active (those with a log level set):

printActiveLoggers()

prints something like:

grails.app.filters.LoggingFilters -> DEBUG
grails.app.filters.SecurityFilters -> DEBUG
grails.app.service.grails.plugin.redis.RedisService -> WARN
grails.app.task -> DEBUG
org.apache.cxf -> DEBUG
...

You can also list all loggers, which adds in those loggers whose log level is currently `null`:

printAllLoggers()

prints:

grails.app -> DEBUG
grails.app.bootstrap.BootStrap -> null
grails.app.bootstrap.QuartzBootStrap -> null
grails.app.codec.org.codehaus.groovy.grails.plugins.codecs.Base64Codec -> null
grails.app.codec.org.codehaus.groovy.grails.plugins.codecs.HTMLCodec -> null
grails.app.codec.org.codehaus.groovy.grails.plugins.codecs.HexCodec -> null
...

You can also dynamically grab/create a logger and set its logging level to something more or less verbose than its current value:

def logger = getLogger("grails.app.service.grails.plugin.redis.RedisService")
printLogger(logger)   // initially WARN
logger.level = INFO  
printLogger(logger)   // prints INFO

prints:

grails.app.service.grails.plugin.redis.RedisService -> WARN
grails.app.service.grails.plugin.redis.RedisService -> INFO

It’d be easy to turn this into a simple gsp/controller that accepts changes and can list things out. There are also a number of other plugins out there that let you view/change logging levels (including another one of Burt’s plugins, app info), but if you don’t have those installed, this is a quick way to see what’s going on with your application.

Creating New Instances of Spring “Singleton” Beans with Grails BeanBuilder

2011/03/07

When I’m integration testing Grails service classes, I often want to mock out part of the class so that the test doesn’t follow a complicated code branch that I’m not trying to test.

Grails will helpfully inject fully autowired Spring service beans into my test if I ask for them. Unfortunately, if I change the metaClass of the injected service, that change persists beyond where we want it to:

class MyService {
    def myMethod() { "unmodified" }
}
 
class MyServiceTests extends GroovyTestCase {
    def myService  // injected automatically by spring/grails
 
    void testOne() {
        myService.metaClass.myMethod = {-> "modified" }
        assertEquals "modified", myService.myMethod()
    } 
 
    void testTwo() {
        assertEquals  "unmodified", myService.myMethod() // WTF!  Returns "modified", pollution from first test
    }
}

Grails services are Spring “singleton” objects. They’re not true singletons, though; “singletons” are just cached in the application context and returned whenever getBean is called.
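Here’s a minimal sketch of that caching behavior in plain Groovy, with no Spring involved. TinyContext and GreetingService are hypothetical stand-ins invented for this illustration (a real application context does far more than this), but they show why a metaClass change on the cached instance leaks into every later lookup:

```groovy
class GreetingService {
    def greet() { "unmodified" }
}

// Hypothetical stand-in for an application context: one cached instance per bean name
class TinyContext {
    private Map cache = [:]

    def getBean(String name, Class clazz) {
        if (!cache.containsKey(name)) {
            cache[name] = clazz.newInstance()
        }
        cache[name]
    }
}

def context = new TinyContext()

// "testOne": grab the bean and change its per-instance metaClass
def first = context.getBean("greetingService", GreetingService)
first.metaClass.greet = {-> "modified" }

// "testTwo": asking the context again returns the very same cached object
def second = context.getBean("greetingService", GreetingService)
assert second.is(first)
assert second.greet() == "modified"

// a freshly constructed instance is unaffected
assert new GreetingService().greet() == "unmodified"
```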

Historically, if I wanted to mock part of my service manually, I’d need to “new” up my own instance of the service and manually inject any dependencies that the service might need to function. This is both painful and fragile: if the service adds or removes dependencies, chances are that the tests are going to break.

I realized that if I could ask Grails/Spring for a new instance of the “singleton” service that I could muck with it all I wanted in my test without worrying about polluting other tests with my changes. After some digging into the grails spring support, I came up with the following method that could be added to an integration test (or integration test base class):

// spring "singleton" objects really aren't; they're just cached by their application context
def getNewSingletonInstanceOf(Class clazz) {
    String beanName = "prototype${clazz.name}"
    BeanBuilder beanBuilder = new BeanBuilder(ApplicationHolder.application.mainContext)
 
    beanBuilder.beans {
        "$beanName"(clazz) { bean ->
            bean.autowire = 'byName'
        }
    }
 
    beanBuilder.createApplicationContext().getBean(beanName)
}

This method uses the BeanBuilder to construct a temporary ApplicationContext with the Grails mainContext as a parent so that other dependencies can be resolved. This method won’t work if your service has changes to how it’s wired up and configured by spring, but the majority of Grails service classes are simply autowired byName.

Here’s a more detailed example of use. Given this service:

package com.example
 
class MyService {
    def injectedService
 
    def serviceMethod() {
        return otherMethod()
    }
 
    def otherMethod() {
        return "original value"
    }
}

I’m able to generate per-test autowired instances of my service and mock out otherMethod without polluting other tests (or the Spring injected version of the bean):

package com.example
 
import grails.spring.BeanBuilder
import org.codehaus.groovy.grails.commons.ApplicationHolder
 
class MyServiceTests extends GroovyTestCase {
    def myServiceInstance  // our new, per-test instance (built in setUp)
    def myService // spring injected version
 
    protected void setUp() {
        super.setUp()
        myServiceInstance = getNewSingletonInstanceOf(MyService)
    }
 
    protected void tearDown() {
        super.tearDown()
    }
 
    // spring "singleton" objects really aren't; they're just cached by their application context
    def getNewSingletonInstanceOf(Class clazz) {
        String beanName = "prototype${clazz.name}"
        BeanBuilder beanBuilder = new BeanBuilder(ApplicationHolder.application.mainContext)
 
        beanBuilder.beans {
            "$beanName"(clazz) { bean ->
                bean.autowire = 'byName'
            }
        }
 
        beanBuilder.createApplicationContext().getBean(beanName)
    }
 
    void testNewSingletonInstance() {
        assertNotNull myServiceInstance // created uniquely for this test in setUp
        assertNotNull myService         // spring injected into integration test
 
        assertNotSame myService, myServiceInstance
 
        // we've got unique instances of MyService, but both are injected with the same singleton dependencies
        assertSame myService.injectedService, myServiceInstance.injectedService
    }
 
    void testMessingWithMetaClassDoesNotAffectOriginalSingleton() {
        myServiceInstance.metaClass.otherMethod = {-> "new value" }
 
        assertEquals "new value", myServiceInstance.serviceMethod()
        assertEquals "original value", myService.serviceMethod()
 
        def anotherInstance = getNewSingletonInstanceOf(MyService)
 
        assertEquals "original value", anotherInstance.serviceMethod()
    }
}

Using this technique lets you leverage Spring’s autowiring in your tests, but also gives you the flexibility to override areas not under test to improve test readability and maintainability.

Using the Grails BeanBuilder to Set Arbitrary Properties From an External Config

2011/02/25

I’m working with an existing library (Jedis, a Redis client library) that has a fairly complicated connection pool config with a large variety of potential properties that could be worth setting depending on the environment that my Grails app is running in.

I wanted the ability to define the set of properties that I wanted to override in the config file without having to call them all out explicitly in the Spring resources.groovy file. If I missed one, or if the client library that I’m using added a new one that I don’t notice, I don’t want to have to release a new version of the code just to set it.

Grails allows you to load external config files simply by defining a reference to them in Config.groovy (this code is even commented out in the default Config.groovy file that gets generated automatically with a new grails app):

grails.config.locations = ["file:${userHome}/.grails/${appName}-config.groovy"]

After a little playing around with the BeanBuilder syntax, I was able to come up with a solution that lets me set whatever values I want in the Config file and have them set on the bean that I have Spring/Grails build.

If you have a config like this:

foo {
    foo = "bar"
    baz = 4
}

You can populate your resources.groovy with something like this to set whatever values are set in your config file:

beans = {
    def fooMap = application.config?.foo
 
    fooBean(Foo) {
        fooMap?.each { key, value ->
            delegate.setProperty(key, value)
        }
    }
}

This will make a bean that has its `foo` set to “bar” and its `baz` set to 4.

Later, if I find that I need to set the `baz` property on the fooBean in production, I just add that in my config file and everything works without any code changes.
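The same property-copying idea can be tried outside of Grails. This is a rough sketch under a few assumptions: PoolSettings is a hypothetical class made up for the example, and ConfigSlurper parses an inline string in place of an external config file. It is not the BeanBuilder itself, just the “loop over the config and set each property by name” part:

```groovy
// Hypothetical settings class with defaults; only some values get overridden
class PoolSettings {
    Integer maxActive = 8
    Integer maxIdle = 8
    Boolean testOnBorrow = false
}

// Stand-in for the external config file
def config = new ConfigSlurper().parse('''
pool {
    maxActive = 20
    testOnBorrow = true
}
''')

def settings = new PoolSettings()

// Copy whatever keys the config happens to define onto the object
config.pool.each { key, value ->
    settings.setProperty(key, value)
}

assert settings.maxActive == 20
assert settings.testOnBorrow == true
assert settings.maxIdle == 8   // not in the config, keeps its default
```

In this plain-Groovy version, a config key that doesn’t match a property on the target object fails with a MissingPropertyException, so a typo in the config file shows up immediately.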

Introduction to Using Redis with Groovy

2010/12/28

I’m more excited about Redis than just about any other technology right now.

Redis is an insanely fast key/value store, in some ways similar to memcached, but the values it stores aren’t just dumb blobs of data, but can also be hashes, lists, sets and sorted sets. It provides a number of atomic operations on each of those data types (ex: union and intersection methods on sets) and it has been called a “data structure server”.

It’s used in production today by a number of very popular websites including Craigslist, GitHub, The Guardian, and Digg.

Redis is of particular note for Groovy/Grails developers because its development is financially supported by VMware/SpringSource. Redis was also the first non-relational data store to be officially supported by core Grails developers.

It has excellent documentation and a super simple wire protocol that has made it easy for a ton of client libraries for just about every language to pop up. You could probably write a simple client library in a day. Once you know the commands, you can even interact with it through telnet (though you don’t have to :).

Installing Redis

Read the rest of this article »

Grails issue ‘not processed by flush()’ root causes

2010/11/21

I’ve learned some painful lessons recently around hibernate sessions in grails that are pretty obvious in retrospect. I wanted to get some quick notes down as there’s not a lot of good information out there around this in grails that I could find. Here are some quick bullet points about what I’ve learned around grails management of the hibernate session and when it gets flushed.

  • Grails normally manages when the session gets flushed. Sometimes this happens when you might not be expecting it, and it isn’t always caused by a save/delete. It can be caused by a read-only command too.
  • If you have an object that you’ve made changes to (it’s dirty) and you make any calls to the database involving that object (such as a findBy using the object), grails will automatically flush the session so that the query that you make is consistent.
  • If it didn’t do this, your query wouldn’t have the dirty information available to it and would return results inconsistent with reality in the session you’re using. So if you have a parent object that’s dirty because you added a child object, and before saving it you call Children.findAllByParent(parent), the parent object gets flushed to the database so that the findAll is correct.
  • If that’s not a big deal to you (you’re querying something that doesn’t have anything to do with the dirty fields), you can use a Foo.withNewSession { } closure to make the database call without flushing the current session.
  • If it is a big deal to you, then let the session flush and let the object get persisted.
  • But if, in the process of flushing (before the findBy/query gets run), additional objects are brought into the session, you’ll hit the dreaded ‘collection [com.foo.bar.Baz.quxs] was not processed by flush()’.
  • One place where this could happen is if you have a custom validator that does a query to the database and pulls back any related objects that aren’t already in the session (ex: a validator on Parent that asserts that all children are younger than the parent).
  • You can use Foo.withNewSession in your custom validators to try to get around this. But beware a StackOverflowError (or a red zone memory exception on OS X because of a JVM bug) where it gets into a loop of flushing out sessions.
  • If you do use Foo.withNewSession, make sure you understand that the database calls that you make will be completely unaware of any changes to the current object. You also won’t be able to use the instance of the object passed to the custom validator; instead, you’ll have to do Foo.load(object.id) and then use the result of that to query the DB. The performance of all of this probably sucks (but that might not matter for your use case).
  • Chances are that what you should really be doing, though, is using a service/transaction and actually saving the object first before calling the dynamic finder on the dirty object. Then, when it’s persisted, you can safely use the clean object in your finder.

Adding Logging around all of the Methods of a Class with Groovy

2010/09/11

An interesting question came up on Stack Overflow: “In Groovy Is there a way to decorate every class to add tracing”. I came up with the following solution using groovy’s invokeMethod.

The invokeMethod method gets called by groovy’s MOP for every method call that happens on an object. If we add our own version, we need to make sure we keep a reference to the original metaClass so that within our invokeMethod, we can still get access to the other methods that we want to delegate to.

class Foo {
    def bar() {
        println "in bar"
    }
 
    def baz(String name) {
        println "in baz with $name"
    }
}
 
 
 
def decorateMethodsWithLogging(clazz) {
    def mc = clazz.metaClass
 
    mc.invokeMethod = { String name, args ->
        println "before $name, args = $args"
        def result = mc.getMetaMethod(name, args).invoke(delegate, args)
        println "after $name"
        return result
    }
}
 
 
decorateMethodsWithLogging(Foo.class)
 
def f = new Foo()
f.bar()
f.baz("qux")

prints

before bar, args = []
in bar
after bar
before baz, args = [qux]
in baz with qux
after baz
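One thing the version above doesn’t handle is a call to a method that doesn’t exist: getMetaMethod returns null in that case, so the decorated class fails with a NullPointerException instead of the usual MissingMethodException. Here’s a hedged variation on the same technique that records calls in a list (instead of printing) and restores the normal error. Calculator and the calls list are made up for the example:

```groovy
class Calculator {
    def add(a, b) { a + b }
}

def calls = []                      // hypothetical call log for this sketch
def mc = Calculator.metaClass       // keep a reference so we can still find the real methods

mc.invokeMethod = { String name, args ->
    calls << name
    def method = mc.getMetaMethod(name, args)
    if (method == null) {
        // no such method: fail the way an undecorated class would
        throw new MissingMethodException(name, delegate.class, args)
    }
    method.invoke(delegate, args)
}

def calc = new Calculator()
assert calc.add(1, 2) == 3
assert "add" in calls

try {
    calc.subtract(2, 1)
    assert false // shouldn't get here, there is no subtract method
} catch (MissingMethodException e) {
    assert e.method == "subtract"
}
```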

Groovy Each Iterator with Peek-ahead at Next Collection Value

2010/06/15

Groovy closures combined with iterators make it simple to create our own enhanced iterators that let us process a collection how we want to.

I write my own custom iterators all the time and name them something descriptive. This makes the code much more readable. Rather than trying to decipher what a for loop is trying to do, we wrap up all of that iteration logic into a meaningful name and we cleanly separate that iteration from the processing that we’re doing with each element.

This kind of design is a core concept in Uncle Bob’s Clean Code, one of my favorite programming books in the last few years.

This example iterates over a collection and calls the passed in closure until we hit a value greater than 5.

def eachUntilGreaterThanFive = { collection, closure ->
    for ( value in collection ) {
        if ( value  > 5 ) break
        closure(value)
    }
}
 
def a = [1, 2, 3, 4, 5, 6, 7]
 
eachUntilGreaterThanFive(a) {
    println it
}

prints:

1
2
3
4
5

This code makes it obvious what the iterator is doing (looping till we hit a condition) as well as what will happen with each element iterated over (print it out).
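If you find yourself writing several of these, the stop condition can be passed in as a closure too. This is just a sketch of that generalization; eachUntil and its parameter names are my own invention here, not anything built into Groovy:

```groovy
// Calls closure for each value until stopCondition returns true for one of them
def eachUntil = { collection, stopCondition, closure ->
    for (value in collection) {
        if (stopCondition(value)) break
        closure(value)
    }
}

def seen = []
eachUntil([1, 2, 3, 4, 5, 6, 7], { it > 5 }) {
    seen << it
}
assert seen == [1, 2, 3, 4, 5]
```

The trade-off is that the call site is a little less self-describing than a purpose-named iterator like eachUntilGreaterThanFive, so a descriptive name can still be worth it for conditions that are used in more than one place.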

For a real-life example, I needed to iterate over a list of values where I needed both the current object and a peek at the next object in the list.

Doing this in Java is a bit of a pain, but groovy makes it easy to write and (hopefully) to read. We can also add it directly onto the Collection metaClass so that it’s available for all of our Collection instances:

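Collection.metaClass.eachWithPeek = { closure ->
    def last = null
    def seenFirst = false  // explicit flag so falsy elements (0, "", null) aren't skipped
    delegate?.each { current ->
        if (seenFirst) closure(last, current)
        last = current
        seenFirst = true
    }
    if (seenFirst) closure(last, null)
}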

These test cases show that as we iterate through the collection, we can see the current item and peek at the next one (if any). If the collection is empty, we don’t execute the closure, and if we’re at the end of the list there isn’t anything to peek at:

[].eachWithPeek { current, peek ->
    assert false // shouldn't get here, nothing to iterate through
}
 
[1].eachWithPeek { current, peek ->
    assert current == 1
    assert peek == null  // only 1 element, nothing to peek at
}
 
def results = []
[1, 2, 3, 4, 5].eachWithPeek { current, peek ->
    results << [current, peek]
}
assert results == [[1, 2], [2, 3], [3, 4], [4, 5], [5, null]]
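For the simple pairwise case, Groovy’s built-in collate can produce a similar sliding window without touching the metaClass. This is an alternative sketch, not a replacement for eachWithPeek, and it assumes a Groovy version that includes collate (1.8.6 or later, as far as I know):

```groovy
def results = []

// collate(2, 1) yields overlapping windows of size 2, stepping by 1;
// the final window holds only the last element, so pair[1] is null there
[1, 2, 3, 4, 5].collate(2, 1).each { pair ->
    results << [pair[0], pair[1]]
}

assert results == [[1, 2], [2, 3], [3, 4], [4, 5], [5, null]]
assert [].collate(2, 1) == []
```

Unlike eachWithPeek, this builds the intermediate list of windows up front, so it’s better suited to small collections.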

Grails build-test-data plugin version 1.0 released

2010/03/17

I’ve finally released version 1.0 of the grails build-test-data plugin.

If you’re not familiar with build-test-data, the quick summary is that it puts a build() method on all grails domain objects. Calling that method will automatically construct and save an instance of that domain object that conforms to all of the domain’s constraints. It also allows you to override the values that you want to explicitly set. It makes your tests much cleaner and less fragile as you only need to specify the values that actually matter to a particular test method instead of building a huge graph of objects just to satisfy constraints.

The plugin has been quite stable for the past 6 months or so, and has survived upgrades from grails 1.0.X through grails 1.2.1 with minimal changes. Because of this, I’ve decided to move the version from 0.2.3 to 1.0 to indicate that the plugin is stable and ready to be used. I’ve also added the LICENSE file releasing the plugin under the Apache 2.0 open source license (the same license as grails). It was always open source, but I had neglected to add the official license file in the past.

If you’re not familiar with the build-test-data plugin, the documentation on the wiki is thorough. I’ve also given a presentation on build-test-data that explains why it’s better than other existing data generation technologies.

The biggest changes for this release compared to the last are a number of bugfixes around making sure that both sides of a one-to-many relationship get populated correctly and that there isn’t a need to refresh() from the database.

Previously, if you had an Author that hasMany Books, and each Book belongsTo an Author, you’d need to refresh the Author if you tried to build a new book with an existing author:

Author eap = Author.findByName("Edgar Allan Poe")
Book b = Book.build(author: eap, title: "The Tell-Tale Heart")
 
assertEquals eap.name, b.author.name  // works, linked in OK previously
assertEquals 1, eap.books.size() // FAILED previously, WORKS now: the book wasn't being added properly to the author side of things

The previous workaround was to call eap.refresh() to reload the author from the database, or to have the user manually addToBooks(b). Both solutions were ugly and kludgy and this issue has now been fixed.

Interrogating Arbitrary Groovy Closures for Values

2010/01/24

Inspired by this question on stackoverflow, I decided to create a utility class that allowed me to determine generically what calls a closure makes (without actually letting it make any calls). This lets me see what it’s trying to do before letting it actually do it.
Read the rest of this article »
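The general idea can be sketched by running a copy of the closure against a delegate that records method calls instead of performing them. This is only an illustration of the technique, not the utility class from the full article; CallRecorder and interrogate are hypothetical names, and the sketch only captures method calls made directly in the closure body (not property access or calls on other objects):

```groovy
// Records every method call it receives instead of doing anything
class CallRecorder {
    List calls = []

    def methodMissing(String name, args) {
        calls << [name: name, args: args.toList()]
        null
    }
}

// Runs a copy of the closure so that its method calls go only to the recorder
List interrogate(Closure closure) {
    def recorder = new CallRecorder()
    def copy = closure.clone()
    copy.delegate = recorder
    copy.resolveStrategy = Closure.DELEGATE_ONLY
    copy.call()
    recorder.calls
}

def calls = interrogate {
    sendEmail("a@example.com")
    deleteEverything()
}

assert calls*.name == ["sendEmail", "deleteEverything"]
assert calls[0].args == ["a@example.com"]
```

Neither sendEmail nor deleteEverything exists anywhere; the recorder just notes that the closure tried to call them and with which arguments.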
