Catch the Batch!

Are you a developer?

Are you facing PHP timeouts or memory exhaustion while dealing with huge data sets?

YES? ... WELCOME! You are at the right place. Since this is such a common problem, Drupal must have a solution in store for us. Let's find out who the life saver is this time...

Drupal's Batch API.

'Serious' Wikipedia 'boringly' explains batch processing in general as the execution of a series of programs on a computer without manual intervention, and the same definition holds true in Drupal as well.

Performing a series of operations on large data sets often results in timeouts or memory-exhausted errors, so how does Drupal's Batch API manage all this?

What's the mystery behind Drupal's Batch API that lets it handle such situations without errors?

Well, breaking long tasks into smaller batches and spreading them out over several page requests is the key. This ensures that our processing is not interrupted by PHP timeouts or memory exhaustion. The batch page uses AJAX requests to trigger each successive pass, so the work carries on across several requests without any manual intervention.

Batch API also provides a progress bar while the process is running, to keep the end user informed of where they are in the process.

If we don't need the feedback and want to run batches in parallel, Drupal also provides us with the Queue API. For now, let's restrict our discussion to the Batch API.

So, the mystery behind Drupal's Batch API is revealed. Let's check out how we implement it.

For the sake of understanding, let's focus on the very basic, minimal requirements for setting up a batch process in Drupal 7.

1. Implement hook_menu() to register the path for the batch process.

The page callback here can be either

the Drupal form handler ('page callback' => 'drupal_get_form', with the form ID passed in 'page arguments') or
a custom function ('page callback' => 'custom_batch_function').

Going with a Drupal form requires us to define the form builder function and the corresponding submit handler as well.
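
To make step 1 concrete, here is a minimal sketch that registers both variants in one hook_menu() implementation. The module name my_module, the two paths, the form ID my_module_batch_form and the choice of permission are placeholders of mine for illustration; only custom_batch_function comes from the text above.

```php
<?php

/**
 * Implements hook_menu().
 *
 * Sketch only: 'my_module', both paths and the form ID are hypothetical
 * placeholders. A real site would normally pick just one of the two variants.
 */
function my_module_menu() {
  $items = array();

  // Variant 1: a form whose submit handler sets up the batch.
  $items['admin/config/my-module/batch-form'] = array(
    'title' => 'Run batch (form)',
    'page callback' => 'drupal_get_form',
    // drupal_get_form() receives the form ID as its argument.
    'page arguments' => array('my_module_batch_form'),
    'access arguments' => array('administer site configuration'),
  );

  // Variant 2: a custom function that calls batch_set() and batch_process().
  $items['admin/config/my-module/batch-run'] = array(
    'title' => 'Run batch (direct)',
    'page callback' => 'custom_batch_function',
    'access arguments' => array('administer site configuration'),
  );

  return $items;
}
```

Remember to clear the menu cache after adding or changing a hook_menu() implementation, otherwise the new paths will not be picked up.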

2. In the custom function or the form's submit handler, depending on which method we go with, set up the batch process by calling batch_set() with an associative array:

$batch = array(
  // Title to display while running.
  'title' => t('Updating ...'),
  // Callback functions, with their arguments, that perform the batch
  // operations (update, import etc.).
  'operations' => array(
    array('batch_process_operation', array($arg1, $arg2)),
    array('batch_process_another_operation', array($arg3)),
  ),
  // Function to call at the end of the batch process.
  'finished' => 'batch_process_finished',
  // Message to display while initializing.
  'init_message' => t('Initializing...'),
  // Message to track progress.
  'progress_message' => t('Fixed @current out of @total.'),
  // Message to inform about errors.
  'error_message' => t('Batch process encountered an error.'),
  // Location of the file in which the callback functions mentioned above are
  // written. Not required if they are in the same file, i.e. the .module file.
  'file' => drupal_get_path('module', 'my_module_name') . '/batch_file.inc',
);
batch_set($batch);
// Needed only when not in a form submit handler.
batch_process('<path-to-redirect-upon-batch-completion>');

So, we just added a batch process. I hope to hold your attention while I take you through the defining stage.

3. With the declaration part done, let's move on to defining our callback functions. So, who's taking the lead here?

It's the '$context' variable, which contains the Batch API data.

It is always passed as the last parameter of the operation callbacks. Because $context is passed by reference, it retains its values as the batch progresses.
function batch_process_operation($arg1, $arg2, &$context) {
  // $context['sandbox'] stores information about progress between successive
  // calls. Initialize it on the first call.
  if (!isset($context['sandbox']['progress'])) {
    $context['sandbox']['progress'] = 0;
    $context['sandbox']['current_element'] = 0;
    // Maximum number of actions to be performed; here we query for the count.
    $context['sandbox']['max'] = db_query('SELECT COUNT(element_id) FROM {my_table}')->fetchField();
  }
  // Number of elements to process per call. 50 is only an example; set it to
  // what the server can process in one go without timeouts.
  $limit = 50;
  // Fetch the next set of elements to be processed.
  $result = db_select('my_table', 'm')
    ->fields('m')
    ->condition('element_id', $context['sandbox']['current_element'], '>')
    ->orderBy('element_id')
    ->range(0, $limit)
    ->execute();
  foreach ($result as $row) {
    // Perform the actual processing here, based on the requirement.
    // Store some value (content title etc.) in $context['results'] for the
    // 'finished' callback; the element ID is used here as an example.
    $context['results'][] = $row->element_id;
    // Update the progress information.
    $context['sandbox']['progress']++;
    $context['sandbox']['current_element'] = $row->element_id;
  }
  // If we have not reached the maximum yet, report the completion level.
  if ($context['sandbox']['progress'] != $context['sandbox']['max']) {
    $context['finished'] = $context['sandbox']['progress'] / $context['sandbox']['max'];
  }
}

Well, for a closer look at the components of the $context variable ('sandbox', 'results', 'message' and 'finished'), please refer to the Batch API documentation. And with this, we are done with the batch operations part too.
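
To see why setting $context['finished'] to a fraction makes the operation run again, here is a toy simulation in plain PHP, with no Drupal involved. toy_operation() and toy_run_batch() are made-up names, and the loop is only my simplified stand-in for the real engine: each iteration represents one call of the operation, whereas Drupal spreads those calls over several HTTP requests and persists the sandbox between them.

```php
<?php

/**
 * Toy operation callback: processes up to $limit items per call.
 */
function toy_operation(array $items, $limit, &$context) {
  if (!isset($context['sandbox']['progress'])) {
    $context['sandbox']['progress'] = 0;
    $context['sandbox']['max'] = count($items);
  }
  // Pick up where the previous call stopped.
  $chunk = array_slice($items, $context['sandbox']['progress'], $limit);
  foreach ($chunk as $item) {
    $context['results'][] = strtoupper($item);
    $context['sandbox']['progress']++;
  }
  if ($context['sandbox']['progress'] != $context['sandbox']['max']) {
    $context['finished'] = $context['sandbox']['progress'] / $context['sandbox']['max'];
  }
}

/**
 * Toy stand-in for the batch engine: calls the operation until it is done.
 */
function toy_run_batch($callback, array $args) {
  $sandbox = array();
  $results = array();
  $passes = 0;
  do {
    // The operation counts as done unless it lowers 'finished' below 1.
    $finished = 1;
    $context = array(
      'sandbox' => &$sandbox,
      'results' => &$results,
      'finished' => &$finished,
    );
    call_user_func_array($callback, array_merge($args, array(&$context)));
    $passes++;
  } while ($finished < 1);
  return array('results' => $results, 'passes' => $passes);
}

// Five items, two per call: the operation is called three times.
$run = toy_run_batch('toy_operation', array(array('a', 'b', 'c', 'd', 'e'), 2));
print $run['passes'] . " passes\n";
```

The sandbox survives from call to call while 'finished' starts at 1 every time, which is why the operation only has to set it when there is work left.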

Are you still with me? Well... that's inspiring!!! Here comes the valediction for you.

4. Define the 'finished' callback of the batch process.

// @param $success: TRUE if the batch completed without errors, else FALSE.
// @param $results: Contains information about the processed data.
// @param $operations: Contains the operations that remained unprocessed.
function batch_process_finished($success, $results, $operations) {
  // Display the final message to the end user.
}
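
As one possible way to fill in that stub, the sketch below builds the final message in a small helper and hands it to drupal_set_message(). The helper my_module_batch_summary() and the message wording are my own invention, and I am assuming that each entry of $operations is an array whose first element is the callback name, as in the $batch array above.

```php
<?php

/**
 * Builds the final status message (hypothetical helper).
 *
 * Returns array(message, type), where type is 'status' or 'error'.
 * A real module would wrap the strings in t() or format_plural().
 */
function my_module_batch_summary($success, array $results, array $operations) {
  if ($success) {
    return array(count($results) . ' items processed.', 'status');
  }
  // On failure, $operations holds the operations that were not completed.
  // Assumption: the first entry is the one that was running at the time.
  $failed = reset($operations);
  $name = is_array($failed) ? $failed[0] : 'unknown';
  return array('An error occurred while running ' . $name . '.', 'error');
}

/**
 * 'finished' callback, as named in the $batch array.
 */
function batch_process_finished($success, $results, $operations) {
  list($message, $type) = my_module_batch_summary($success, $results, $operations);
  // drupal_set_message() is Drupal 7 core; the message is shown on the page
  // the user lands on after the batch.
  drupal_set_message($message, $type);
}
```

Keeping the message logic in a plain function makes it easy to check outside of a running batch; the callback itself stays a thin wrapper.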

That's it!

Yes, with these four simple steps we are able to perform jobs involving huge amounts of data. So far so good, but what if, while we are updating 1000 nodes of our site, 10 of them don't get updated for one reason or another? That's where we get stuck.

Batch API is an abstraction layer for performing tasks, and in such a process we lose track of what is actually happening at the other end. So, how do we go about debugging this?

We have the indispensable Devel module to our rescue. This time we go with dd(), which logs a variable to drupal_debug.txt in the site's temp directory. With this, we need not wait for the batch process to complete to know what's happening below the layer, and we are able to keep track of the ongoing actions.
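
A small sketch of how that could look inside an operation callback follows. The wrapper my_module_batch_debug() and the labels are hypothetical; dd() itself comes from Devel, so the wrapper checks for it first and does nothing on a site where Devel is not enabled.

```php
<?php

/**
 * Logs one processed row to drupal_debug.txt if Devel is available.
 *
 * Hypothetical helper; returns TRUE when something was logged.
 */
function my_module_batch_debug($row, array $context) {
  // dd() is provided by the Devel module; skip silently when it is missing.
  if (!function_exists('dd')) {
    return FALSE;
  }
  // The second argument is a label written next to the dumped variable.
  dd($row, 'Row being processed');
  dd($context['sandbox'], 'Sandbox after this row');
  return TRUE;
}
```

Calling my_module_batch_debug($row, $context) inside the foreach loop of batch_process_operation() lets us follow the file while the batch is still running and spot the rows that misbehave.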

That was a brief overview of my understanding of Drupal's Batch API. I hope it was helpful and leaves you with no more fear of timeouts and memory exhaustion, thanks to the Batch API.

Feedback and corrections are highly appreciated in the comments below.
