To start, I have been watching top for a while this morning.
In general, a simple wiki page is on par with a simple forum thread, in terms of CPU usage. The more complicated a wiki page becomes (length + templates), the higher the spike, and there are situations where both a complex wiki page and a complex forum thread (such as a thread with 10 XF 2.3 embeds) lead to high usage. Another page I have seen with higher usage is the Feeds index (the page that shows entries from multiple feeds -- usually previews or full wiki pages for about 10 pages on screen at once). I suppose if many visitors were accessing these same pages at once it could lead to a spike about half as high as yours.
In all likelihood, a combination of factors is contributing to the increased load.
Now that about a day has passed, would you say your CPU usage graph is consistent with your graph in 4.1.7?
Possible Factor: Sudden Wiki EMBEDs
Existing URL links to wiki pages on the forum are now being converted to XF 2.3 EMBEDs. Each one is actually a partial nested wiki page that gets rendered on every view. From what I understand, XF provides no direct means to disable EMBED for testing, but you can try modifying renderTagEmbed in src/addons/vw/vw/XF/BbCode/Renderer/Html23.php to return early, for testing.
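For reference, an early return could look something like this (treat it as a sketch; the exact method signature in your version of the file may differ, so match whatever the existing method declares):
Code:
// Sketch only: assumes the method follows XF's usual renderer signature.
// A bare return at the top of the existing method makes every EMBED render
// as nothing, so you can see whether CPU usage drops without them.
public function renderTagEmbed(array $children, $option, array $tag, array $options)
{
	return ''; // early return for testing

	// ...the method's original body stays below, unreached...
}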
If you are using a version of XF 2.3 older than 2.3.2, upgrade right away. Early versions had various issues, depending on the version, not least of which was the possibility of EMBEDs recursively rendering each other.
Possible Factor: AutoComplete
The XF 2.3 integration of auto-complete in search. In general, auto-complete results include a small preview of each result. For wiki pages, that means we have to render the content when the page has no summary filled out. If your visitors are actively searching with autocomplete, and there are a lot of wiki results with no summaries, this could lead to an increase in CPU activity.
To test this, you can edit src/addons/vw/vw/Handler/Search/Data/AutoCompletableTrait.php. Find:
Code:
if ($desc === '' AND isset($item['pagetext']))
BEFORE it, add:
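The exact line doesn't matter much for a test; anything that makes that condition false will do. As a sketch, pre-filling $desc with a placeholder skips the preview render entirely:
Code:
// Testing sketch only: with $desc pre-filled, the check below it never
// renders the wiki page content for the auto-complete preview.
$desc = ' ';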
Possible Factor: SVG Icons
The XF 2.3 change to SVG icons, instead of a single icon font. Since VW uses a lot of icons, this could contribute to a spike in HTTP activity from downloading many individual icons rather than one font file. I doubt it would account for the full spike. Making sure the XF SVGs are being served from a CDN might reduce this.
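For example (and this is an assumption about your setup, namely that the compiled SVG sprites are served out of XF's data directory), pointing the public data URL at your CDN in src/config.php would cover them:
Code:
// src/config.php (sketch): assumes the compiled SVG icon sprites are served
// from XF's data directory, and that you already have a CDN pull zone
// mirroring <forum root>/data. The hostname below is a placeholder.
$config['externalDataUrl'] = 'https://cdn.example.com/data';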
Possible Factor: Thumbnail Generation
In top, the only time I have noticed spikes remotely in the range of your graph (at or above 20%) is when loading a page that has wiki images that still need thumbnails. The on-the-fly thumbnail generation causes a spike. When I reload the page and the same now-thumbnailed images appear, there is no spike.
In 4.1.8, thumbnails are added to a queue on the fly when needed. While an image waits in the queue, the full-size image is used. Visitors cache whichever version they get, so they won't request it again.
In 4.1.7 and below, thumbnails were generated all at once, on the fly, when needed, and the process tended to crash when too many were triggered at once. After a crash, all the thumbnails that were attempted were marked failed or had corrupt results. Failed thumbnails would result in the full-size image being used. Visitors cached whichever version they got, so they wouldn't request it again.
The 4.1.7 behavior had a single CPU spike, a crash, and then no more CPU spike. In 4.1.8, the CPU usage is spread over time until all queued thumbnails are done. This results in lower spikes that don't crash, but more noticeable usage on the graph nonetheless.
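To illustrate the difference between the two behaviors (this is only an illustration, not VaultWiki's actual code, and the helper names are made up):
Code:
<?php
// Illustration only: the two helpers below are made-up stand-ins.
function createThumbnailNow(int $imageId): void { /* decode + resize + save: CPU-heavy */ }
function enqueueThumbnailJob(int $imageId): void { /* insert a queue entry: CPU-cheap */ }

// 4.1.7 and below: every missing thumbnail is generated during the page
// request itself, so one busy page can produce a tall CPU spike (and a crash
// if too many run at once).
function thumbUrlEager(array $image): string
{
	if (!$image['has_thumb'])
	{
		createThumbnailNow($image['id']);
	}
	return $image['thumb_url'];
}

// 4.1.8: the request only queues the work and serves the full-size image in
// the meantime, so the same CPU cost is spread out as lower, longer-lasting
// usage until the queue drains.
function thumbUrlQueued(array $image): string
{
	if (!$image['has_thumb'])
	{
		enqueueThumbnailJob($image['id']);
		return $image['full_url'];
	}
	return $image['thumb_url'];
}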
Depending on how many wiki images your site has, and how actively your wiki is viewed (which is what queues them up), the queue might take a long time. If you don't really have enough images to fill your graph for so many hours (I'm betting you don't), then there are likely other factors at play.
Possible Factor: Rogue Background Process
After upgrading VaultWiki, old background processes from earlier versions that failed get requeued, in case the new version made a change that allows them to succeed this time. If you had an insane number of particularly intense processes that failed but are now able to run, this could cause a spike. Generally, such a spike should only last until those processes have completed. But it is hypothetically feasible that one or more of them could be stuck in an infinite loop for whatever reason, which could lead to spikes that never go away.
This possibility would really require someone to look at your installation and debug the background process activity.
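If you want to take a first look yourself before anyone digs in, listing what is currently queued can at least show whether the same VaultWiki job keeps reappearing. A rough sketch (assuming a stock XF 2.x install, where pending jobs live in the xf_job table), saved as a small script in the forum root and run from the command line:
Code:
<?php
// Rough sketch, assuming a stock XF 2.x install: list pending jobs so you can
// see whether a particular VaultWiki job keeps re-triggering. The file name
// you save this under is up to you; run it with CLI PHP from the forum root.
require __DIR__ . '/src/XF.php';

\XF::start(__DIR__);
$app = \XF::setupApp('XF\Cli\App');

$jobs = $app->db()->fetchAll(
	'SELECT execute_class, trigger_date FROM xf_job ORDER BY trigger_date'
);
foreach ($jobs AS $job)
{
	echo date('Y-m-d H:i:s', $job['trigger_date']) . "\t" . $job['execute_class'] . "\n";
}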
Possible Factor: Something Else
There's always the possibility that something else is at play. We did profile VaultWiki not too long ago, and we did not encounter any bottlenecks attributable to anything unexpected. Autolinks were a bottleneck. Templates, especially nested templates, were a bottleneck, but when we drilled down, it was nothing beyond the inherent cost of those features. I'll make sure we profile it again in the coming few days, just in case.