Do Web Requests With Guzzle

Guzzle is something of a Swiss Army knife for making web requests from PHP code. It ships with thirty bees by default, so there is no need to test for its presence, and it is thirty bees' preferred tool for web requests.

Advantages over PHP's built-in tools:

  • Same interface for GET, POST and PUT requests.
  • Allows multiple requests over one connection.
  • Supports multiple parallel requests.
  • Can handle cookie sessions automatically.
  • Good error handling capabilities.

Simple File Download

Let's start with a simple task, downloading a remote file synchronously.

Download Into a PHP Variable

For starters, let's look at a somewhat verbose code example for a simple download:

use \GuzzleHttp\Exception\RequestException;
 
$body = false;
$guzzle = new \GuzzleHttp\Client([
    'base_uri'    => 'https://translations.thirtybees.com/packs/1.0.4/',
    'http_errors' => true,
    'verify'      => _PS_TOOL_DIR_.'cacert.pem',
    'timeout'     => 20,
]);
try {
    $body = $guzzle->get('en.json')->getBody();
} catch (RequestException $e) {
    $errorCode = $e->getCode();
    $errorMessage = $e->getMessage();
    if ($e->hasResponse()) {
        // Can happen with 'http_errors' set to true, only.
        $httpMessage = \GuzzleHttp\Psr7\str($e->getResponse());
    }
}
 
if ($body) {
    // Three alternative ways to get the response as a string (pick one):
    print($body);                     // Implicit cast.
    $content = (string) $body;        // Explicit cast.
    $content = $body->getContents();  // Reads the remainder of the stream.
}

Options in detail:

'base_uri'

This is the common part of the URIs to be requested. The URI for each request is built by joining this string with the parameter given to get().
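As a hedged illustration of how this joining behaves: Guzzle resolves the request parameter against 'base_uri' following the usual URI resolution rules, so a leading slash in the parameter makes a difference. The sketch below uses the UriResolver class from the guzzlehttp/psr7 package, assuming a version recent enough to include it; the example.com addresses are made up for illustration.

```php
// Sketch only: assumes Guzzle's autoloader is already loaded.
use GuzzleHttp\Psr7\Uri;
use GuzzleHttp\Psr7\UriResolver;

$base = new Uri('https://example.com/packs/1.0.4/');

// Relative path: appended after the last slash of the base URI.
$relative = (string) UriResolver::resolve($base, new Uri('en.json'));
// https://example.com/packs/1.0.4/en.json

// Leading slash: replaces the whole path of the base URI.
$absolute = (string) UriResolver::resolve($base, new Uri('/en.json'));
// https://example.com/en.json
```

For this reason, a 'base_uri' pointing to a directory should end with a slash, and request parameters should usually not start with one.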

'http_errors'

Defaults to true, which causes Guzzle to throw an exception on HTTP 4xx and 5xx responses (e.g. 'page not found') instead of returning the response. Set it to false if such responses are expected answers. Networking errors, e.g. timeouts, always throw an exception.
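A minimal sketch of the 'http_errors' => false case, where the caller checks the status code instead of catching an exception. The helper name fetchIfAvailable() is hypothetical and not part of thirty bees; the option is given per request here, which overrides the client default.

```php
use GuzzleHttp\Client;

/**
 * Returns the response body on HTTP 200, or null for any other status.
 * Hypothetical helper for illustration only.
 */
function fetchIfAvailable(Client $guzzle, $path)
{
    // With 'http_errors' off, 4xx and 5xx responses are returned, not thrown.
    // Networking errors (e.g. timeouts) still throw an exception.
    $response = $guzzle->get($path, ['http_errors' => false]);

    if ($response->getStatusCode() === 200) {
        return (string) $response->getBody();
    }

    return null;
}
```

Usage with the client created above would be something like $content = fetchIfAvailable($guzzle, 'en.json');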

'verify'

thirty bees comes with its own set of certificates, and these should be used. Setting the option to boolean true asks Guzzle to use the default CA bundle provided by the operating system. Setting it to false disables certificate verification, which is insecure. For details, see the --cacert option for curl. Not to be confused with the Guzzle option 'cert', which matches --cert in curl.

'timeout'

Timeout for the whole request operation, in seconds. Defaults to 0, which means no timeout. Without a timeout, if things go wrong, the PHP script times out before the request does, so the malfunction can't be reported to the user. Set it to a generous value, but well below the minimum PHP script timeout (30 seconds). 20 seconds is fine; lower values won't make the request faster. There are also the options 'connect_timeout' and 'read_timeout' to split this total timeout up, which is usually pointless.

Note about redirects

By default, Guzzle follows up to 5 redirects. To change this default, see the documentation on the option 'allow_redirects'.
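As a short sketch, these are two common settings for 'allow_redirects', written as option arrays for the second parameter of get(). The variable names are made up for illustration.

```php
// Option sets to pass as the second parameter of get(), post(), etc.

// Follow at most 2 redirects. Exceeding the limit raises a
// TooManyRedirectsException, which extends RequestException.
$fewRedirects = ['allow_redirects' => ['max' => 2]];

// Do not follow redirects at all; the 3xx response is returned as-is.
$noRedirects = ['allow_redirects' => false];

// Usage, with $guzzle being a client as created above:
// $response = $guzzle->get('en.json', $noRedirects);
```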

Note about large downloads

thirty bees only guarantees a PHP script timeout of 30 seconds. Accordingly, large downloads that might take longer than these 30 seconds are not possible within a single script run. They have to be split into smaller chunks, spanning multiple PHP script invocations.
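One possible way to split a download into chunks is HTTP range requests. This is an assumption for illustration, not a method prescribed by thirty bees, and it only works if the remote server supports the Range header. The hypothetical helper below computes the header value for the next chunk from the size of the partial file already on disk.

```php
/**
 * Builds the HTTP Range header value for the next chunk of a download,
 * based on how many bytes are already stored in the partial file.
 * Hypothetical helper; name and parameters are illustrative only.
 */
function nextRangeHeader($partFile, $chunkSize)
{
    // File sizes are cached by PHP, so clear the cache before asking.
    clearstatcache(true, $partFile);
    $have = is_file($partFile) ? filesize($partFile) : 0;

    // Byte ranges are inclusive on both ends.
    return 'bytes='.$have.'-'.($have + $chunkSize - 1);
}
```

Each script invocation would then request one chunk, for example with $guzzle->get($path, ['headers' => ['Range' => nextRangeHeader($partFile, 1048576)]]), and append the response body to the partial file. A server honoring the range answers with status 206; a status of 200 means it sent the complete file instead.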

Download Into a File

Downloading a resource directly into a file is simple: Guzzle requests accept an option defining the target location. For example, change this line in the example above:

$body = $guzzle->get('en.json')->getBody();

to this:

$guzzle->get('en.json', ['sink' => _PS_CACHE_DIR_.'en.json']);

One can even combine both, defining a sink as well as getting the result with getBody().
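A minimal sketch of combining both, assuming a client configured as in the first example. The helper name downloadToFile() is hypothetical.

```php
use GuzzleHttp\Client;

/**
 * Stores a download in the file $target and also returns it as a string.
 * Hypothetical helper for illustration only.
 */
function downloadToFile(Client $guzzle, $path, $target)
{
    $response = $guzzle->get($path, ['sink' => $target]);

    // The body is still readable although it was written to the file.
    return (string) $response->getBody();
}
```

Usage would be something like $content = downloadToFile($guzzle, 'en.json', _PS_CACHE_DIR_.'en.json');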

Custom HTTP Headers

Requesting a resource with custom HTTP headers is straightforward as well; one simply defines them in the 'headers' option:

$guzzle = new \GuzzleHttp\Client([
    'base_uri'    => 'https://translations.thirtybees.com/packs/1.0.4/',
    'http_errors' => true,
    'verify'      => _PS_TOOL_DIR_.'cacert.pem',
    'timeout'     => 20,
    'headers'     => [
        'Accept'        => 'application/json',
        'Content-Type'  => 'application/json;charset=UTF-8',
        'User-Agent'    => 'thirty bees '._TB_VERSION_,
    ],
]);
try {
[...]

A POST Request

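Similar to defining a sink, one can add a parameter with form data to a request. The 'form_params' option sends the fields URL-encoded, as an HTML form would; Guzzle's 'json' option sends JSON instead: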

[...]
try {
    $body = $guzzle->post('foo.php', [
        'form_params' => [
            'message' => 'It\'s nice!',
            'action'  => 'record-it',
        ],
    ])->getBody();
[...]

Note the usage of post() rather than get() here. Actually, there are methods for each of the HTTP request types: get(), delete(), head(), options(), patch(), post(), put(), and even a generic one for usage like this: request('GET', '<path>').
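The generic request() method is handy when the HTTP method is only known at runtime. A small sketch, with sendForm() being a hypothetical helper name:

```php
use GuzzleHttp\Client;

/**
 * Sends form fields with an HTTP method chosen at runtime.
 * Hypothetical helper for illustration only.
 */
function sendForm(Client $guzzle, $method, $path, array $fields)
{
    // With $method being 'POST', this is equivalent to
    // $guzzle->post($path, ['form_params' => $fields]).
    return $guzzle->request($method, $path, ['form_params' => $fields]);
}
```

The POST example above could then be written as sendForm($guzzle, 'POST', 'foo.php', ['message' => 'It\'s nice!', 'action' => 'record-it'])->getBody().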

Downloading Multiple Requests

With the code sample given above one can download multiple files, of course. However, there are more efficient ways to do so.

Multiple Synchronous Requests

When making multiple requests synchronously, e.g. to keep the server load low, it still improves performance quite a bit to reuse one Guzzle instance for all requests. Guzzle keeps the connection open after a request is done, so subsequent requests don't need to negotiate connection parameters again:

$guzzle = new \GuzzleHttp\Client([
    [...]
]);
try {
    $bodyOne = $guzzle->get('en.json')->getBody();
} catch (RequestException $e) {
    [...]
}
try {
    $bodyTwo = $guzzle->get('de.json')->getBody();
} catch (RequestException $e) {
    [...]
}
[...]
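For more than a handful of files, the repeated try/catch blocks can be folded into a loop over one shared client. The following is a sketch under the assumption that files are named after language codes, as in the examples above; fetchPacks() is a hypothetical helper name.

```php
use GuzzleHttp\Client;
use GuzzleHttp\Exception\RequestException;

/**
 * Downloads '<lang>.json' for each language code, reusing one client.
 * Returns two arrays, bodies and error messages, both keyed by language.
 * Hypothetical helper for illustration only.
 */
function fetchPacks(Client $guzzle, array $languages)
{
    $bodies = [];
    $errors = [];
    foreach ($languages as $lang) {
        try {
            $bodies[$lang] = (string) $guzzle->get($lang.'.json')->getBody();
        } catch (RequestException $e) {
            // A failed download does not stop the remaining ones.
            $errors[$lang] = $e->getMessage();
        }
    }

    return [$bodies, $errors];
}
```

Usage would be something like list($bodies, $errors) = fetchPacks($guzzle, ['en', 'de']);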

Multiple Parallel Requests

Here it becomes a bit more tricky. One can start multiple requests asynchronously, then wait for all of them to be completed.

$content = [];
$guzzle = new \GuzzleHttp\Client([
    'base_uri'    => 'https://translations.thirtybees.com/packs/1.0.4/',
    'verify'      => _PS_TOOL_DIR_.'cacert.pem',
    'timeout'     => 20,
]);
 
// Initiate each request but do not block.
$promises = [
    'en'  => $guzzle->getAsync('en.json'),
    'de'  => $guzzle->getAsync('de.json'),
];
 
// Wait for the requests to complete, even if some of them fail.
$results = \GuzzleHttp\Promise\settle($promises)->wait();
 
foreach ($results as $lang => $download) {
    if ($download['state'] === 'fulfilled') {
        $content[$lang] = (string) $download['value']->getBody();
    }
    if ($download['state'] === 'rejected') {
        $errorCode = $download['reason']->getCode();
        $errorMessage = $download['reason']->getMessage();
    }
}

As one can see, dealing with these requests works a bit differently.

  • For each promise, there's an array with two items inside.
  • Item 'state' gives the request status and can be 'pending', 'fulfilled' or 'rejected'.
  • After wait() has returned, no promise should be in state 'pending' any longer.
  • Depending on the state, the other item in the array is:
      • 'value' for state 'fulfilled', which is a normal Guzzle Response object, answering to getHeaders(), getBody(), and so on.
      • 'reason' for state 'rejected', which is the same kind of exception object (e.g. a RequestException) as handled in the catch{} clause in the simpler cases above.

Needless to say, one can start a single asynchronous request as well. If there are lengthy computations ahead, it's a good idea to start the request early and deal with its result later.
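A minimal sketch of this pattern for a single request. The helper fetchWhileWorking() and its $work callback are hypothetical; wait() blocks until the request has finished and throws if it failed.

```php
use GuzzleHttp\Client;
use GuzzleHttp\Exception\RequestException;

/**
 * Starts a request, runs $work, then collects the response.
 * Returns the body (or null on failure) and the result of $work.
 * Hypothetical helper for illustration only.
 */
function fetchWhileWorking(Client $guzzle, $path, callable $work)
{
    // Returns a promise immediately instead of a response.
    $promise = $guzzle->getAsync($path);

    // Stand-in for the lengthy computations.
    $workResult = $work();

    try {
        $content = (string) $promise->wait()->getBody();
    } catch (RequestException $e) {
        $content = null;
    }

    return [$content, $workResult];
}
```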

do_web_requests_with_guzzle.txt · Last modified: 2019/01/03 15:22 by Traumflug