
Catch the Batch!

Are you a developer?

Are you running into PHP timeouts or memory exhaustion while dealing with huge data sets?

Yes? Welcome, you are at the right place. If this is such a common problem, Drupal must have a solution in store for us. Let's find out who the lifesaver is this time: Drupal's Batch API.

'Serious' Wikipedia 'boringly' explains general batch processing as the execution of a series of programs on a computer without manual intervention, and the same definition holds true in Drupal as well.

Performing a series of operations on large data sets often results in timeouts or memory-exhausted errors, so how does Drupal's Batch API manage all of this? What's the mystery that lets it handle such situations without errors?

Well, breaking long tasks into smaller batches and spreading them over different page requests is the key. This ensures that our processing is not interrupted by PHP timeouts or memory exhaustion. The batch page uses Ajax requests to trigger one short request after another, so the work carries on across several requests instead of one long one.

Batch API also provides a progress bar while the process is running, to keep the end user informed of where they are in the process.
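The principle can be sketched in plain PHP, outside Drupal. This is only an illustration of the idea, not how Batch API is implemented: `process_slice()` and the serialized `$stored` state are hypothetical names standing in for one page request and for whatever storage carries the progress from one request to the next.

```php
<?php
// Hypothetical stand-in for ONE page request: it restores the saved state,
// processes at most $limit items, and returns the state to be stored again.
function process_slice($stored, array $items, $limit) {
  $state = unserialize($stored);
  $slice = array_slice($items, $state['offset'], $limit);
  foreach ($slice as $item) {
    // The real work on one item would happen here.
    $state['done'][] = strtoupper($item);
  }
  $state['offset'] += count($slice);
  $state['finished'] = $state['offset'] >= count($items);
  return serialize($state);
}

$items = array('a', 'b', 'c', 'd', 'e');
$stored = serialize(array('offset' => 0, 'done' => array(), 'finished' => FALSE));
$requests = 0;

// Each loop iteration plays the role of a separate, short page request.
do {
  $stored = process_slice($stored, $items, 2);
  $state = unserialize($stored);
  $requests++;
  printf("Request %d: %d of %d done\n", $requests, $state['offset'], count($items));
} while (!$state['finished']);
```

No single call ever handles more than two items, so the cost of each "request" stays small no matter how long the list grows.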

If we don't need the feedback and want to run batches in parallel, Drupal also provides the Queue API. For now, let's restrict our discussion to the Batch API.

So, the mystery behind Drupal's Batch API is revealed. Let's check out how to implement it.

For the sake of understanding, let's focus on the bare minimum needed to set up a batch process in Drupal 7.

1. Implement hook_menu() to register the path for the batch process.

The page callback here can be either

a Drupal form ('page callback' => 'drupal_get_form') or
a custom function ('page callback' => 'custom_batch_function').

Going with a Drupal form requires us to write a form builder function and the corresponding submit handler as well.
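As a rough sketch, both variants could be registered like this in Drupal 7. The module name `my_module`, the two paths, the form ID `my_module_batch_form` and the permission are placeholders chosen for illustration; `custom_batch_function` is the callback name used above.

```php
<?php
/**
 * Implements hook_menu().
 */
function my_module_menu() {
  $items = array();

  // Variant 1: the page shows a form; its submit handler starts the batch.
  $items['admin/my-module/batch-form'] = array(
    'title' => 'Run batch (form)',
    'page callback' => 'drupal_get_form',
    // Name of the form builder function (hypothetical).
    'page arguments' => array('my_module_batch_form'),
    'access arguments' => array('administer site configuration'),
  );

  // Variant 2: visiting the page calls a custom function that starts the batch.
  $items['admin/my-module/batch-run'] = array(
    'title' => 'Run batch (direct)',
    'page callback' => 'custom_batch_function',
    'access arguments' => array('administer site configuration'),
  );

  return $items;
}
```

Drupal caches menu definitions, so the caches need to be cleared before a newly added path responds.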

2. In the custom function or the form's submit handler, depending on which method we chose, set up the batch process by calling batch_set() with an associative array:

$batch = array(
  // Title to display while running.
  'title' => t('Updating ...'),
  // Callback functions, with their arguments, that perform the batch
  // operations (update, import, etc.).
  'operations' => array(
    array('batch_process_operation', array($arg1, $arg2)),
    array('batch_process_another_operation', array($arg3)),
  ),
  // Function to call at the end of the batch process.
  'finished' => 'batch_process_finished',
  // Message to display while initializing.
  'init_message' => t('Initializing...'),
  // Message to track progress.
  'progress_message' => t('Fixed @current out of @total.'),
  // Message to inform about errors.
  'error_message' => t('Batch process encountered an error.'),
  // Location of the file containing the callback functions mentioned above.
  // Not required if they are in the same file, i.e. the .module file.
  'file' => drupal_get_path('module', 'my_module_name') . '/batch_file.inc',
);
batch_set($batch);
// Needed only when not in a form submit handler.
batch_process('<path-to-redirect-upon-batch-completion>');

So, we have just added a batch process. I hope I still have your attention as we move on to the defining stage.
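For the form route, a minimal sketch of the form builder and its submit handler might look like the following. It assumes a Drupal 7 site; `my_module_batch_form` is a hypothetical form ID and the operation arguments are dummy values, while the operation and finished callback names are the ones used in the array above.

```php
<?php
/**
 * Form builder (hypothetical form ID): a single button that starts the batch.
 */
function my_module_batch_form($form, &$form_state) {
  $form['submit'] = array(
    '#type' => 'submit',
    '#value' => t('Start batch'),
  );
  return $form;
}

/**
 * Submit handler: defines the batch and hands it to Batch API.
 */
function my_module_batch_form_submit($form, &$form_state) {
  $batch = array(
    'title' => t('Updating ...'),
    'operations' => array(
      // Dummy arguments; replace them with whatever the operation needs.
      array('batch_process_operation', array('first', 'second')),
    ),
    'finished' => 'batch_process_finished',
  );
  batch_set($batch);
  // No batch_process() call here: the Form API starts the batch itself
  // once the submit handlers have run.
}
```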

3. With the declaration part done, let's move on to defining our callback functions. So, who's taking the lead here?

It's the variable $context, which contains Batch API data.

It is always passed as the last parameter of the operation callbacks. Because $context is passed by reference, it retains its values as the batch progresses.
function batch_process_operation($arg1, $arg2, &$context) {
  // $context['sandbox'] persists between successive calls of this operation,
  // so we use it to store information about progress.
  if (!isset($context['sandbox']['progress'])) {
    $context['sandbox']['progress'] = 0;
    $context['sandbox']['current_element'] = 0;
    // Max number of elements to process, counted once with a query.
    $context['sandbox']['max'] = db_select('my_table', 't')
      ->countQuery()
      ->execute()
      ->fetchField();
  }
  // Number of elements per call: pick a value the server can process in one
  // go without timeouts (50 is only an example).
  $limit = 50;
  // Fetch the next set of elements to process. The 'title' column is an
  // example; select whatever the processing needs.
  $result = db_select('my_table', 't')
    ->fields('t', array('element_id', 'title'))
    ->condition('element_id', $context['sandbox']['current_element'], '>')
    ->orderBy('element_id')
    ->range(0, $limit)
    ->execute();
  foreach ($result as $row) {
    // Perform the actual processing of $row here.
    // Store a value in $context['results'] for the 'finished' callback.
    $context['results'][] = $row->title;
    // Update the progress information.
    $context['sandbox']['progress']++;
    $context['sandbox']['current_element'] = $row->element_id;
  }
  // If max has not been reached yet, report the completion level (a value
  // between 0 and 1) so that Batch API calls this operation again.
  if ($context['sandbox']['progress'] != $context['sandbox']['max']) {
    $context['finished'] = $context['sandbox']['progress'] / $context['sandbox']['max'];
  }
}

For a closer look at the components of the $context variable, please refer to the Batch API documentation. And with this, we are done with the batch operations part too.
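To see why the operation only sets $context['finished'] while work remains, it helps to picture the calling side. The loop below is a simplified stand-in written in plain PHP, not Drupal's actual code: `run_operation()` and `count_to()` are hypothetical, and the real Batch API spreads these calls over several page requests. It mirrors one behaviour of Batch API: 'finished' counts as 1 (done) on every call unless the operation lowers it.

```php
<?php
// Simplified, hypothetical stand-in for the calling side of Batch API.
function run_operation($callback, array $args) {
  $sandbox = array();
  $results = array();
  $calls = 0;
  do {
    // 'finished' starts at 1 on every call; the operation lowers it to ask
    // for another call. Sandbox and results survive from call to call.
    $context = array(
      'sandbox' => &$sandbox,
      'results' => &$results,
      'finished' => 1,
      'message' => '',
    );
    call_user_func_array($callback, array_merge($args, array(&$context)));
    $calls++;
    $finished = $context['finished'];
    unset($context);
  } while ($finished < 1);
  return array('calls' => $calls, 'results' => $results);
}

// Hypothetical operation: counts up to $max, at most $limit steps per call.
function count_to($max, $limit, &$context) {
  if (!isset($context['sandbox']['progress'])) {
    $context['sandbox']['progress'] = 0;
  }
  $stop = min($context['sandbox']['progress'] + $limit, $max);
  while ($context['sandbox']['progress'] < $stop) {
    $context['sandbox']['progress']++;
    $context['results'][] = $context['sandbox']['progress'];
  }
  if ($context['sandbox']['progress'] != $max) {
    $context['finished'] = $context['sandbox']['progress'] / $max;
  }
}

$outcome = run_operation('count_to', array(5, 2));
printf("%d calls, %d results\n", $outcome['calls'], count($outcome['results']));
```

With a maximum of 5 and a limit of 2, the operation is called three times before 'finished' is left at 1.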

Are you still with me? Well, that's inspiring! Here comes the last step for you.

4. Defining the batch process finished callback.

// @param $success: TRUE if the batch completed successfully, else FALSE.
// @param $results: Contains information about the processed data.
// @param $operations: Contains the operations that remained unprocessed.
function batch_process_finished($success, $results, $operations) {
  // Display the final message to the end user.
}
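The empty callback above could be filled in along these lines. This is a sketch assuming a Drupal 7 site; the message texts are made up, and it reports only the first unprocessed operation when something goes wrong.

```php
<?php
/**
 * 'finished' callback: reports the outcome of the batch to the end user.
 */
function batch_process_finished($success, $results, $operations) {
  if ($success) {
    // $results holds whatever the operations stored in $context['results'].
    $message = format_plural(count($results), 'One element processed.', '@count elements processed.');
    drupal_set_message($message);
  }
  else {
    // On failure, $operations holds the operations that were not completed.
    $error_operation = reset($operations);
    drupal_set_message(t('An error occurred while running @operation.', array('@operation' => $error_operation[0])), 'error');
  }
}
```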

That's it!

Yes, with these four simple steps we are able to perform jobs involving huge amounts of data. So far so good, but what if we are updating 1000 nodes of our site and 10 of them fail to update for one reason or another? Here we get stuck.

Batch API is an abstraction layer for performing tasks, and in such a process we lose track of what is actually happening at the other end. So, how do we go about debugging this?

We have the indispensable Devel module to our rescue. This time we go with dd(), which logs a variable to drupal_debug.txt in the site's temp directory. With this we need not wait until the batch process completes to know what's happening below the layer, and we can keep track of ongoing actions.
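A minimal sketch of how this could look, assuming the Devel module is enabled. `log_batch_step()` is a hypothetical helper; dd() takes the value to log plus an optional label, and the function_exists() check keeps the code from failing on sites where Devel is switched off.

```php
<?php
// Hypothetical helper: logs one processed element and the progress so far.
function log_batch_step($element_id, array $context) {
  if (function_exists('dd')) {
    // Each call appends to drupal_debug.txt in the site's temp directory.
    dd($element_id, 'Processed element');
    dd($context['sandbox'], 'Sandbox after this element');
  }
}
```

Calling log_batch_step($row->element_id, $context) at the end of the foreach loop in the operation callback leaves an entry per element, so the elements that misbehave can be spotted while the batch is still running.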

That was a brief overview of my understanding of Drupal's Batch API. I hope it was helpful and leaves you with no more fear of timeouts and memory exhaustion, thanks to the Batch API.

Feedback and corrections are highly appreciated in the comments below.
