Better Grails Batch Import Performance With Redis and Jesque


A couple of years ago, I put up a well-received blog post on tuning batch import performance with Grails and MySQL.

I recently needed to revisit some batch import procedures. Since writing that post, I've added a few extra tools to my Grails utility belt: the Grails Redis and Grails Jesque plugins.

Redis is a very fast key/value store whose values are not just strings but data structures such as lists, sets, and hash maps. I'm the main author of the Grails Redis plugin, and it's my favorite pragmatic technology of the past few years. If you're new to Redis, check out the slides from the presentation I gave at this year's GR8Conf.

Jesque is a Java implementation of Resque, a Redis-backed message queueing system for creating background jobs that was written in Ruby by the folks at GitHub. The Jesque plugin is fully integrated with Grails and lets you create worker jobs that are Spring-injected and have an active Hibernate session.

This combination makes parallelizing work very easy, because Jesque handles most of the pain of spinning off threads in Grails for you. Yes, there's GPars, but the threads it creates aren't Spring-injected and don't have Hibernate sessions.

Using Jesque is as simple as:

  1. Create a Job class that implements a perform method.
  2. Tell Jesque to start up 1..n worker threads that monitor a queue and use your Job to process work.
  3. Enqueue work on the queue so the workers can pick it up.
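To make the three steps concrete without Redis or Grails, here is a rough in-memory analogy in plain Groovy. A ConcurrentLinkedQueue stands in for the Redis list, plain threads stand in for Jesque workers, and CountingJob is a hypothetical job class. None of this is Jesque's API. Shutdown follows the same idea the post uses with Jesque: wait until the queue has been drained, then tell the workers to stop once they run out of work.

```groovy
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.atomic.AtomicBoolean
import java.util.concurrent.atomic.AtomicInteger

// Step 1: a job is just a class with a perform method (hypothetical example)
class CountingJob {
    static final AtomicInteger processed = new AtomicInteger()

    void perform(String item) {
        processed.incrementAndGet()
    }
}

def queue = new ConcurrentLinkedQueue<String>()   // stand-in for the Redis list
def ending = new AtomicBoolean(false)            // "please stop when idle" flag

// Step 2: start n workers that poll the queue and hand each item to a job
int workerCount = 4
def workers = (1..workerCount).collect {
    Thread.start {
        def job = new CountingJob()
        while (true) {
            String item = queue.poll()
            if (item != null) {
                job.perform(item)
            } else if (ending.get()) {
                break       // queue is empty and we've been told to end
            } else {
                sleep(10)   // no work yet; back off briefly
            }
        }
    }
}

// Step 3: enqueue work for the workers to pick up
(1..1000).each { queue.add("book-$it".toString()) }

// Wait until everything has been pulled off the queue, then let the
// workers finish whatever they are processing and exit
while (!queue.isEmpty()) sleep(50)
ending.set(true)
workers*.join()

println "processed ${CountingJob.processed.get()} items"
```

The real version swaps the in-memory queue for a Redis list. Jesque then takes care of starting workers that are Spring-injected and have a Hibernate session.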

I've created a Bitbucket repository with all of the source code from the original batch import post, as well as the enhancements below.

The example problem: a Library class produces metadata for 100,000 books, and we want to persist that metadata in the database as Book domain objects.

package com.naleid.example

class Book {
    String title
    String isbn
    Integer edition

    static constraints = {
    }

    static mapping = {
        isbn column:'isbn', index:'book_isbn_idx'
    }
}
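The post doesn't show the Library class itself. To experiment outside the example project, a hypothetical stand-in only needs to be iterable and yield one metadata map per book, with keys matching Book's properties. The generated titles, ISBNs, and editions below are placeholders, not the example project's data:

```groovy
// Hypothetical stand-in for the example's Library: lazily yields one
// metadata map per book without holding all 100,000 maps in memory
class Library implements Iterable<Map> {
    final int size

    Library(int size = 100000) { this.size = size }

    @Override
    Iterator<Map> iterator() {
        new LibraryIterator(size)
    }

    private static class LibraryIterator implements Iterator<Map> {
        private final int size
        private int index = 0

        LibraryIterator(int size) { this.size = size }

        boolean hasNext() { index < size }

        Map next() {
            if (!hasNext()) throw new NoSuchElementException()
            int n = index++
            // placeholder values shaped like Book's title/isbn/edition
            [title: "Book ${n}".toString(), isbn: String.format('%013d', n), edition: n % 5 + 1]
        }

        void remove() { throw new UnsupportedOperationException() }
    }
}
```

Code such as `library.each { Map bookValueMap -> ... }` then works against it directly, because Groovy's each is available on any Iterable.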

Done naively, the inserts take Grails about 3 hours. The original batch performance post showed how to bring that down from 3 hours to 3 minutes with a few Grails and MySQL tweaks.

Using Redis + Jesque to parallelize the task, I’m able to cut that time in half again to a little over 90 seconds on my MacBook Air.

On real-world imports, which have quite a bit more data and potentially other linked domain objects that can be memoized with the Redis plugin, I've seen a speedup of more than 100x over the original serial import, even with the tuning tips from my original post applied.

Install Redis and clone the test project from Bitbucket to try it yourself. Run grails run-app, open the running app on localhost, and click the link to SerialBookController to see the original version, or ParallelBookController to see the faster Redis + Jesque version. When each one finishes, it displays how long the insert took.

The ParallelBookController calls bookService.parallelImportBooksInLibrary(). That method spins up a number of worker threads, iterates through the books in the Library, and enqueues each one on a Jesque queue. When it's done iterating, it polls Redis until the queue is empty. Redis deletes a list's key once its last element is removed, so redisService.exists() returning false means every job has been picked up. It then tells each worker to end once it has finished its current job:

    def parallelImportBooksInLibrary(library) {
        Integer workerCount = 10
        String queueName = "import:book"
        withWorkers(queueName, BookConsumerJob, workerCount) {
            library.each { Map bookValueMap ->
                String bookValueMapJson = (bookValueMap as JSON).toString()
                jesqueService.enqueue(queueName, BookConsumerJob.simpleName, bookValueMapJson)
            }
        }
    }

    void withWorkers(String queueName, Class jobClass, Integer workerCount = 5, Closure closure) {
        def workers = []
        def fullQueueName = "resque:queue:$queueName"
        try {
            workers = (1..workerCount).collect { jesqueService.startWorker(queueName, jobClass.simpleName, jobClass) }
            closure()
            // wait for all the work we've generated to be pulled off the queue
            while (redisService.exists(fullQueueName)) sleep(500)
        } finally {
            // runs even if enqueueing failed; end(false) lets each worker
            // finish its current job before stopping
            workers*.end(false)
        }
    }
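Everything a Jesque job receives has to make a trip through Redis. That's why the method above hands Jesque each book as a JSON string, and BookConsumerJob parses it back into a map. You can check the round trip outside Grails with Groovy's built-in groovy.json classes, swapped in here for grails.converters.JSON. The book values are made up for illustration:

```groovy
import groovy.json.JsonOutput
import groovy.json.JsonSlurper

// One book's metadata, shaped like the maps the Library produces (made-up values)
Map bookValueMap = [title: 'Example Book', isbn: '9780000000002', edition: 2]

// Producer side: flatten the map to a JSON string before enqueueing it
String bookValueMapJson = JsonOutput.toJson(bookValueMap)

// Consumer side: parse the string back into a map, as perform() does
Map parsed = new JsonSlurper().parseText(bookValueMapJson) as Map

println bookValueMapJson
println parsed
```

Passing a plain string keeps the queued payload simple and readable. The job decides how to turn it back into domain data.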

The workers that persist the Book domain objects in the database run very simple Jesque Job artefacts, which are Spring-injected and have an active Hibernate session. A job can be of any class type. The only requirement is a method named perform, which is called with an item of work from the queue.

Here’s the example BookConsumerJob class that persists a Book to the database:

package com.naleid.example

import grails.converters.JSON

class BookConsumerJob {
    def bookService

    void perform(String bookJson) {
        bookService.updateOrInsertBook(JSON.parse(bookJson))
    }
}

You can see how simple the BookConsumerJob class is. It calls the same bookService method that the serial batch import uses to import a Book.
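A job only needs a perform method, so the worker side just has to map the enqueued job name (BookConsumerJob.simpleName in the code above) back to a class and call perform on an instance of it. The sketch below illustrates that duck-typed dispatch with a hypothetical registry and a hypothetical EchoJob. It is not Jesque's actual internals:

```groovy
// Hypothetical job: no base class or interface, just a perform method
class EchoJob {
    static List<String> seen = []

    void perform(String payload) {
        seen << payload
    }
}

// Hypothetical registry from job name to class, similar in spirit to passing
// jobClass.simpleName together with jobClass when starting a worker
Map<String, Class> jobTypes = [(EchoJob.simpleName): EchoJob]

// Dispatch one queued item: look up the class by name, instantiate it,
// and call perform with the item's arguments via Groovy's dynamic dispatch
def dispatch = { String jobName, List args ->
    Class jobClass = jobTypes[jobName]
    if (jobClass == null) throw new IllegalArgumentException("No job registered for $jobName")
    jobClass.newInstance().perform(*args)
}

dispatch('EchoJob', ['{"title":"Example Book"}'])
```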

One other neat thing about Jesque is that it follows the Resque conventions for what gets stored in Redis. That means you can gem install resque-web and then launch resque-web to get a nice monitoring dashboard for your jobs, where you can see errors and how much work is left in the queue.
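Here is a rough sketch of the Resque-style layout that resque-web reads. Pending jobs sit in a Redis list named resque:queue:<queue name>, the same key withWorkers polls. Each list element is a JSON payload naming the job class and its arguments. This assumes the basic Resque payload shape; Jesque may store extra fields alongside class and args. The book values are made up:

```groovy
import groovy.json.JsonOutput

// Assumed Resque-style naming: Resque also records queue names in the
// 'resque:queues' set, and failed jobs end up in 'resque:failed'
String queueName = 'import:book'
String queueKey = "resque:queue:${queueName}"   // Redis list of pending jobs

// One book, already serialized to a JSON string before enqueueing
String bookJson = '{"title":"Example Book","isbn":"9780000000002","edition":2}'

// One element of that list: the job name plus its argument list
String payload = JsonOutput.toJson(['class': 'BookConsumerJob', args: [bookJson]])

println queueKey
println payload
```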
