Let’s say you wanted to fetch the public data for VG.no and tech.vg.no from Facebook’s Graph API. One might use file_get_contents, or a standard cURL call:
[code language="php"]
<?php
$urls = array(
    'http://graph.facebook.com/http://tech.vg.no',
    'http://graph.facebook.com/http://www.vg.no',
);

foreach ($urls as $url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    echo curl_exec($ch);

    curl_close($ch);
}
[/code]
The problem with this approach is obviously that it waits for each request to return before proceeding to the next. If each request takes 5 seconds, we’d have to wait 10 seconds before we could process all our data and return it to the user.
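To see the cost concretely, here’s a minimal sketch where sleep() stands in for a blocking curl_exec() call (the two-second delays are made up for illustration):

```php
<?php
// Each sleep() stands in for a blocking curl_exec() call
$delays = array(2, 2); // hypothetical per-request latency, in seconds

$start = microtime(true);
foreach ($delays as $delay) {
    sleep($delay); // the script blocks here, just like a synchronous request
}
$elapsed = microtime(true) - $start;

// Sequential execution: the total time is the SUM of the delays,
// not just the largest one
printf("Total: %.1f seconds\n", $elapsed); // ~4 seconds for two 2-second "requests"
```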
So, how do we make these requests run in parallel? Enter the curl_multi_* functions. They have been around since PHP 5, but I’m sure many (like me) have not come across them before. Let’s take a look at how we could optimize our code using these functions:
[code language="php"]
<?php
$urls = array(
    'http://graph.facebook.com/http://tech.vg.no',
    'http://graph.facebook.com/http://www.vg.no',
);

$multi = curl_multi_init();
$channels = array();

// Loop through the URLs, create curl-handles
// and attach the handles to our multi-request
foreach ($urls as $url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    curl_multi_add_handle($multi, $ch);

    $channels[$url] = $ch;
}

// While we're still active, execute curl
$active = null;
do {
    $mrc = curl_multi_exec($multi, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);

while ($active && $mrc == CURLM_OK) {
    // Wait for activity on any curl-connection.
    // On some systems curl_multi_select() can return -1
    // immediately, so back off briefly to avoid busy-looping
    if (curl_multi_select($multi) == -1) {
        usleep(100);
        continue;
    }

    // Continue to exec until curl is ready to
    // give us more data
    do {
        $mrc = curl_multi_exec($multi, $active);
    } while ($mrc == CURLM_CALL_MULTI_PERFORM);
}

// Loop through the channels and retrieve the received
// content, then remove the handle from the multi-handle
foreach ($channels as $channel) {
    echo curl_multi_getcontent($channel);
    curl_multi_remove_handle($multi, $channel);
}

// Close the multi-handle
curl_multi_close($multi);
[/code]
Using this technique, we’ve reduced the total time down to roughly that of the slowest request in the batch. Nice!
However, I’m sure many (like me) are looking at this code and thinking: “Wow. That is a lot of code, and far from readable”. I agree. Thanks to a wonderful PHP HTTP client called Guzzle, we can achieve the same result with much more readable code:
[code language="php"]
<?php
use Guzzle\Http\Client,
    Guzzle\Common\Exception\MultiTransferException;

$client = new Client('http://graph.facebook.com');

try {
    $responses = $client->send(array(
        $client->get('/' . urlencode('http://tech.vg.no')),
        $client->get('/' . urlencode('http://www.vg.no')),
    ));

    foreach ($responses as $response) {
        echo $response->getBody();
    }
} catch (MultiTransferException $e) {
    echo 'The following exceptions were encountered:' . PHP_EOL;
    foreach ($e as $exception) {
        echo $exception->getMessage() . PHP_EOL;
    }
}
[/code]
I’ve put together a simple demo repository showing the difference in speed between these approaches. It’s available on GitHub for anyone who wants to take a look.
Happy HTTP’ing!