How a wrong carrier implementation causes a server outage

Sometimes a single wrong line of code can break your site. In this post I describe a mistake in a Magento 2 custom carrier implementation that massively overloads server resources (CPU, RAM, DB processes) and can even cause an outage of your Magento store.

The one line of code

The following line of code causes the problem when it is used in the carrier class's collectRates() method, or in any method called from collectRates():

$quote = $this->checkoutSession->getQuote();

In other words, inside collectRates() you must not obtain the quote object globally via the checkout session.

The reason

The method \Magento\Checkout\Model\Session::getQuote() triggers loading of the quote when it is called for the first time. Now look at the method \Magento\Quote\Model\Quote::_afterLoad():

    /**
     * Trigger collect totals after loading, if required
     *
     * @return $this
     */
    protected function _afterLoad()
    {
        // collect totals and save me, if required
        if (1 == $this->getTriggerRecollect()) {
            $this->collectTotals()->save();
            $this->setTriggerRecollect(0);
        }
        return parent::_afterLoad();
    }

We can see that collectTotals() is called for every quote whose trigger_recollect field (also a DB column) is set to 1.

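An attentive reader will already have noticed what goes wrong here. It's an infinite loop! Quote::collectTotals() triggers the shipping carriers' collectRates() method, which calls getQuote() on the checkout session again, and that is where the loop closes. Because _afterLoad() resets the flag only after collectTotals()->save() has returned, every nested load still sees trigger_recollect set to 1.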
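The cycle can be reproduced outside Magento with a few stand-in classes. The following is only a minimal sketch: FakeSession, FakeQuote, FakeCarrier, LoopDetected and countLoads() are hypothetical names, not Magento classes, and the load limit exists purely so that the demo terminates. The real code path described in this post has no such limit and runs until max_execution_time.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-ins that only model the call chain
// getQuote() -> _afterLoad() -> collectTotals() -> collectRates().

final class LoopDetected extends RuntimeException
{
}

final class FakeSession
{
    public int $loads = 0;
    public ?FakeCarrier $carrier = null;
    private ?FakeQuote $quote = null;
    private int $maxLoads;

    public function __construct(int $maxLoads)
    {
        $this->maxLoads = $maxLoads;
    }

    public function getQuote(): FakeQuote
    {
        if ($this->quote === null) {
            // Demo-only guard so that the sketch stops.
            if ($this->loads >= $this->maxLoads) {
                throw new LoopDetected('Aborted after ' . $this->maxLoads . ' nested loads');
            }
            $this->loads++;
            $quote = new FakeQuote($this->carrier, 1);
            $quote->afterLoad();
            // The quote is cached only after loading has finished.
            $this->quote = $quote;
        }
        return $this->quote;
    }
}

final class FakeQuote
{
    private FakeCarrier $carrier;
    private int $triggerRecollect;

    public function __construct(FakeCarrier $carrier, int $triggerRecollect)
    {
        $this->carrier = $carrier;
        $this->triggerRecollect = $triggerRecollect;
    }

    public function afterLoad(): void
    {
        if ($this->triggerRecollect === 1) {
            $this->collectTotals();
            // Not reached while the carrier keeps re-entering the session.
            $this->triggerRecollect = 0;
        }
    }

    public function collectTotals(): void
    {
        $this->carrier->collectRates($this);
    }
}

final class FakeCarrier
{
    private FakeSession $session;
    private bool $useSession;

    public function __construct(FakeSession $session, bool $useSession)
    {
        $this->session = $session;
        $this->useSession = $useSession;
    }

    public function collectRates(FakeQuote $quoteFromRequest): FakeQuote
    {
        // true: the problematic pattern, false: use the quote that was passed in.
        return $this->useSession ? $this->session->getQuote() : $quoteFromRequest;
    }
}

// Returns how many times the session started loading a quote.
function countLoads(bool $useSession): int
{
    $session = new FakeSession(5);
    $session->carrier = new FakeCarrier($session, $useSession);
    try {
        $session->getQuote();
    } catch (LoopDetected $e) {
        // The demo guard fired: the recursion would not have ended by itself.
    }
    return $session->loads;
}
```

In this sketch countLoads(false) returns 1, because the carrier works with the quote it was given. countLoads(true) returns 5, which is simply the demo limit: each nested getQuote() call starts another load before the previous one has been cached.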

The trigger_recollect flag is set in Magento:

  • for quotes depending on catalog price rules
  • for quotes containing products which were updated (e.g. in Admin or via API)

In my case there were many such quotes because of frequent product updates.

The result was overloaded CPUs and RAM, a full MySQL process list and several outages, because each infinite loop kept running until PHP's max_execution_time was reached.

How to avoid this

The shipping carrier's collectRates() method receives an object of the class \Magento\Quote\Model\Quote\Address\RateRequest, and the already loaded quote object should be obtained from it (if needed). Unfortunately, the RateRequest class has no getQuote() method. The following snippet shows one way of obtaining the quote correctly:

        /**
         * Do not use checkoutSession->getQuote()!!! it will cause infinite loop for
         * quotes with trigger_recollect = 1, see Quote::_afterLoad()
         */
        $items = $request->getAllItems();
        if (empty($items)) {
            return false;
        }

        /** @var \Magento\Quote\Model\Quote\Item $firstItem */
        $firstItem = reset($items);
        if (!$firstItem) {
            return false;
        }

        $quote = $firstItem->getQuote();
        if (!($quote instanceof \Magento\Quote\Model\Quote)) {
            return false;
        }

I hope this post spares you and your team some trouble. Feel free to leave a comment.
