Are you a developer?
Are you facing PHP timeouts or memory exhaustion while dealing with huge data sets?
Yes? Welcome! You are at the right place. Since this is such a common problem, Drupal
must have a solution in store for us. Let's find out who the lifesaver is this time...
Drupal's Batch API.
'Serious' Wikipedia 'boringly' describes batch processing in general as the execution of a series
of programs on a computer without manual intervention, and the same definition holds true in Drupal as well.
Performing a series of operations on large data sets often results in timeouts or
memory-exhausted errors, so how does Drupal's Batch API manage all this?
What's the mystery behind Drupal's Batch API that lets it handle such situations without errors?
Well, breaking long tasks into smaller batches and spreading them over several page requests
is the key. This ensures that our processing is not interrupted by PHP timeouts or
memory exhaustion. On the batch page, the browser fires successive AJAX requests (with a
page-refresh fallback when JavaScript is unavailable), and each request processes the next part of the job.
Batch API also provides a progress bar while the process is running, to keep the end user
informed of where they are in the process.
If we don't require the feedback and want to run jobs in parallel, Drupal also provides us
the Queue API. For now, let's restrict our discussion to the Batch API only.
So, the mystery behind Drupal's Batch API is revealed. Let's check out how we implement it.
To understand it, let's focus on the bare minimum required to set up a batch process
in Drupal 7.
1. Implement hook_menu() to register the path for the batch process.
The page callback here can be either
a Drupal form ('page callback' => 'drupal_get_form') or
a custom function ('page callback' => 'custom_batch_function').
Going with a Drupal form requires us to implement the form builder function and the
corresponding submit handler as well.
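As a rough sketch of step 1, the two options could be registered as shown here. The module name my_module, both paths, the form ID my_module_batch_form and the permission choice are assumptions made for illustration only; adapt them to your own module.

```php
<?php
/**
 * Implements hook_menu().
 *
 * Registers one path per option. 'my_module' and the paths are hypothetical.
 */
function my_module_menu() {
  $items = array();

  // Option 1: a form page. The batch is set up in the form's submit handler.
  $items['admin/config/my-module/batch-form'] = array(
    'title' => 'Run batch update',
    'page callback' => 'drupal_get_form',
    'page arguments' => array('my_module_batch_form'),
    'access arguments' => array('administer site configuration'),
  );

  // Option 2: a custom function that sets up and starts the batch itself.
  $items['admin/config/my-module/batch-run'] = array(
    'title' => 'Run batch update directly',
    'page callback' => 'custom_batch_function',
    'access arguments' => array('administer site configuration'),
  );

  return $items;
}

/**
 * Form builder for option 1: a single button that starts the batch.
 *
 * Drupal looks for my_module_batch_form_submit() as its submit handler.
 */
function my_module_batch_form($form, &$form_state) {
  $form['submit'] = array(
    '#type' => 'submit',
    '#value' => t('Start'),
  );
  return $form;
}
```

With the form option, the submit handler only has to call batch_set(); Drupal starts the batch once form submission is complete.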
2. In the custom function or the form's submit handler, depending on which method we chose,
set up the batch process by invoking batch_set() with an associative array:
$batch = array(
  // Title to display while running.
  'title' => t('Updating ...'),
  // Callback functions with arguments for performing batch operations like update, import etc.
  'operations' => array(
    array('batch_process_operation', array($arg1, $arg2)),
    array('batch_process_another_operation', array($arg3)),
  ),
  // Function to call at the end of the batch process.
  'finished' => 'batch_process_finished',
  // Message to display while initializing.
  'init_message' => t('Initializing...'),
  // Message to track progress.
  'progress_message' => t('Fixed @current out of @total.'),
  // Message to inform about errors.
  'error_message' => t('Batch process encountered an error.'),
  // Location of the file in which the callback functions mentioned above are written.
  // Not required if they are in the same file, i.e. the .module file.
  'file' => drupal_get_path('module', 'my_module_name') . '/batch_file.inc',
);
batch_set($batch);
// Needed only when not inside a form submit handler.
batch_process('<path-to-redirect-upon-batch-completion>');
So, we just added a batch process. I hope I still have your attention as we move on to the
defining stage.
3. Done with the declaration part, let's move on to defining our callback functions. So, who's taking
the lead here?
It's the $context variable, which contains Batch API data.
It is always passed as the last parameter of the operation callbacks. Because $context is passed
by reference, it retains its values as the batch progresses.
function batch_process_operation($arg1, $arg2, &$context) {
  // $context['sandbox'] persists between successive calls of this operation,
  // so we can use it to store information about progress. Something like ...
  if (!isset($context['sandbox']['progress'])) {
    $context['sandbox']['progress'] = 0;
    $context['sandbox']['current_element'] = 0;
    // Max no. of actions to be performed; here we query the table for its row count.
    $context['sandbox']['max'] = db_select('my_table', 't')
      ->countQuery()
      ->execute()
      ->fetchField();
  }

  // Set the limit to some number which we think our server can process in one go
  // without timeouts. 50 is only an example value.
  $limit = 50;

  // Fetch the next set of elements to be processed. Something similar to ...
  $result = db_select('my_table', 't')
    ->fields('t')
    ->condition('element_id', $context['sandbox']['current_element'], '>')
    ->orderBy('element_id')
    ->range(0, $limit)
    ->execute();

  foreach ($result as $row) {
    // Here we perform our processing based on the requirement.

    // Store some value (an ID, a content title etc.) in $context['results']
    // for processing in the 'finished' callback.
    $context['results'][] = $row->element_id;

    // Update progress information.
    $context['sandbox']['progress']++;
    $context['sandbox']['current_element'] = $row->element_id;
  }

  // If we have not reached max yet, set the completion level (a value between 0 and 1).
  // Batch API calls this function again until 'finished' is 1, which is its default.
  if ($context['sandbox']['progress'] < $context['sandbox']['max']) {
    $context['finished'] = $context['sandbox']['progress'] / $context['sandbox']['max'];
  }
}
Well, for a closer look at the components of the $context variable ('sandbox', 'results',
'message' and 'finished'), please refer to the Batch API documentation. And with
this, we are done with the batch operations part too.
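To make the role of $context['finished'] more tangible, here is a small plain-PHP simulation that runs outside Drupal. The functions demo_operation() and demo_run_batch() are made up for this sketch; the loop only mimics, in a simplified way, how Batch API keeps calling an operation until it reports completion. In real Drupal these calls are spread over several page requests.

```php
<?php
/**
 * Hypothetical operation callback: handles at most 3 items per call.
 */
function demo_operation(array $items, array &$context) {
  // First call: initialise the progress information in the sandbox.
  if (!isset($context['sandbox']['progress'])) {
    $context['sandbox']['progress'] = 0;
    $context['sandbox']['max'] = count($items);
  }

  // Process the next slice, starting where the previous call stopped.
  $limit = 3;
  $chunk = array_slice($items, $context['sandbox']['progress'], $limit);
  foreach ($chunk as $item) {
    $context['results'][] = strtoupper($item);
    $context['sandbox']['progress']++;
  }

  // Not done yet: report the completion level as a fraction below 1.
  if ($context['sandbox']['progress'] < $context['sandbox']['max']) {
    $context['finished'] = $context['sandbox']['progress'] / $context['sandbox']['max'];
  }
}

/**
 * Simplified stand-in for the Batch API runner.
 */
function demo_run_batch(array $items) {
  $context = array('sandbox' => array(), 'results' => array());
  $calls = 0;
  do {
    // Completion is assumed before every call; the callback lowers the value
    // when more work remains.
    $context['finished'] = 1;
    demo_operation($items, $context);
    $calls++;
  } while ($context['finished'] < 1);

  return array('calls' => $calls, 'results' => $context['results']);
}

// Seven items at three per call take three calls.
$outcome = demo_run_batch(array('a', 'b', 'c', 'd', 'e', 'f', 'g'));
```

Because the sandbox survives between calls, the callback never needs to hold the whole job in memory or finish it within a single request.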
Are you still with me? Well... that's inspiring! Here comes the last step for you.
4. Define the batch process 'finished' callback.
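// @param $success: TRUE if all operations completed without errors, else FALSE.
// @param $results: Contains information of processed data, i.e. $context['results'].
// @param $operations: Contains the operations that remained unprocessed.
function batch_process_finished($success, $results, $operations) {
  // Display final message to end user.
  if ($success) {
    drupal_set_message(format_plural(count($results), '1 item processed.', '@count items processed.'));
  }
  else {
    drupal_set_message(t('The batch finished with an error.'), 'error');
  }
}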
That's it!
Yes, with these four simple steps we are able to perform jobs involving huge amounts of data. So
far so good, but what if we are updating 1000 nodes of our site and 10 of them fail to
update for some reason? This is where we get stuck.
Batch API is an abstraction layer for performing tasks, and in such a process we lose track of what
is actually happening at the other end. So, how do we go about debugging this?
We have the indispensable Devel module to our rescue. This time we go with dd(), which
logs a variable to drupal_debug.txt in the site's temporary directory. With this we need not wait
for the batch process to complete to know what's happening beneath the layer, and we can keep
track of ongoing actions.
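A minimal sketch of how dd() might be wired in is shown here. The wrapper my_module_debug() is a made-up helper, not part of Devel; it only guards the call so the batch does not fail with a fatal error on a site where Devel is disabled.

```php
<?php
/**
 * Hypothetical wrapper around Devel's dd().
 *
 * Returns TRUE when the data was handed to dd(), FALSE when Devel is not
 * available and nothing was logged.
 */
function my_module_debug($data, $label) {
  // dd() only exists while the Devel module is enabled.
  if (function_exists('dd')) {
    dd($data, $label);
    return TRUE;
  }
  return FALSE;
}

// Usage inside the operation's foreach loop, for example when an update fails:
// my_module_debug(array('element_id' => $row->element_id, 'status' => 'failed'), 'batch_process_operation');
```

Since dd() writes to the file straight away, we can keep drupal_debug.txt open and watch the entries arrive while the progress bar is still moving.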
That was a brief overview of my understanding of Drupal's Batch API. I hope it was helpful and leaves
you with no more fear of timeouts and memory exhaustion, thanks to the Batch API.
Feedback and corrections are highly appreciated in the comments below.