Took a long time, at least 2 weeks, and a wasted day at blitz.io before I figured this one out.
Occasionally, any web page on the site would randomly break: the first click would have no effect, only part of the page would load, or, most commonly, only the background would be visible. It seemed to occur about 5% of the time.
This issue was especially problematic because it contributed to higher bounce rates and, of course, wasted advertising spend.
For the exact times when the issue occurred, the nginx logs showed only "readv() failed (104: Connection reset by peer) while reading upstream", which implied simply that the PHP backend had dropped the connection, i.e. that something had gone wrong in PHP. Google searches turned up no solutions that applied to my case. Even so, blaming PHP didn't make much sense to me, because the failure seemed to occur after PHP had already sent output to the browser (we would get the site background, after all).
I wondered if maybe SPDY support was broken in Google Chrome (some Google discussions seemed to suggest Google servers had similar trouble in 2011), or if my version of nginx had broken SPDY code. Upgrading nginx to a version that had SPDY bugfixes didn't help, and everything I read about the Chrome issue suggested it was only an issue with Google servers and only during that period in 2011.
So after spending 6 hours messing with nginx, PHP, and TCP timeouts on my server, I was ready to give up.
Since we had trouble earlier with Zend Opcache and the cart (discussed here), I wondered if maybe Zend Opcache was also related to this. Finally I tried disabling Zend Opcache entirely, and surprisingly, I discovered that I could no longer reproduce the problem.
I read through the Opcache docs hoping to find a configuration directive I had turned on (or had failed to turn on) that might be contributing to this problem. I really did not want to go back to XCache; after all, Zend Opcache has proved to be almost 40% faster at times. Finally I narrowed it down to:
Code:
opcache.fast_shutdown = 1
I turned that setting off, and with Zend Opcache turned on, no longer had any ERR_SPDY_PROTOCOL_ERRORs or random connection drops. Thankfully, disabling fast_shutdown did not appear to have any major impact on performance (perhaps 1ms was added).
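For anyone wanting to try the same change, here is a minimal sketch of the relevant php.ini lines. It assumes the Opcache extension is already loaded and that opcache.enable is how you have it switched on; where the ini file lives depends on your install. Note that opcache.fast_shutdown only exists through PHP 7.1 (it was removed in PHP 7.2), so this does not apply to newer versions.
Code:
; keep Zend Opcache itself enabled
opcache.enable = 1
; turn off the fast shutdown sequence (this was the setting causing the drops for me)
opcache.fast_shutdown = 0
After editing, PHP needs to be restarted (PHP-FPM, if that is what sits behind nginx) for the new value to take effect.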