These statistics are fetched not only by the statistics page but also by a number of variables in Parser Extensions, which could be added to any number of posts. Again, this is only a "could be".
Setting aside any argument about reporting overhead versus distributed overhead, given the method currently used to gather this information, four near-table-scans on every statistics load are far more likely to cause trouble than a single extra update query when an article is edited.
A similar argument was made for this issue in the past:
http://www.crackedeggstudios.com/issues/702/
But from a scalability standpoint, the larger a wiki grows, the more strain these particular functions place on the server, especially once we begin to process several thousand rows of information several times over on a single page load. Compared to the single row that would need to be maintained (with simple +/-1 changes), the trade-off seems worth it in the long run.
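To make that alternative concrete, here is a minimal sketch of the single-row approach. The vault_stats table and its columns are assumptions for the sake of the example, not VaultWiki's actual schema:
Code:
-- Hypothetical one-row counters table, created once at install time:
-- CREATE TABLE vault_stats (article_count INT NOT NULL, revision_count INT NOT NULL);

-- Run once when an article is created (and the inverse, - 1, when one is deleted):
UPDATE vault_stats SET article_count = article_count + 1;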
This query, in particular, troubles me, because it must count every row in the revision table, and that table grows with every revision of every article:
Code:
SELECT COUNT(revisionid) AS revision_count
FROM " . TABLE_PREFIX . "vault_revision
Even more troubling is this new code, which has yet to be seen by public eyes. It was added as part of a change that dramatically reduced VaultWiki's memory footprint, but in this case it doubles it:
Code:
$nsarray = $vault->fetch_article('stats', $nsid, LANGUAGEID, 'dump');
This returns the IDs of every article in the wiki, which are then used to count thousands of records multiple times.
However, it's also possible to do this without the added footprint simply by changing the key we use. Rather than fetching the article list for each namespace and counting the records, we can fetch the statistics fields that are already maintained for the forum cache, and reconstruct the per-namespace totals from that data alone.
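Roughly, the reconstruction looks like the sketch below. This is not the actual patch: it assumes vBulletin's forum cache, which carries per-forum counters such as threadcount, and fetch_namespace_from_forum() is a hypothetical helper standing in for whatever mapping VaultWiki really uses:
Code:
// Rebuild per-namespace totals from forum rows that are already
// cached in memory, instead of dumping every article ID and
// counting records for each namespace separately.
$ns_stats = array();
foreach ($vbulletin->forumcache AS $forumid => $forum)
{
    // Hypothetical helper: maps a forum to its wiki namespace, if any.
    $nsid = fetch_namespace_from_forum($forumid);
    if ($nsid !== false)
    {
        if (!isset($ns_stats[$nsid]))
        {
            $ns_stats[$nsid] = 0;
        }
        // threadcount is already maintained by the forum software,
        // so no per-article query or ID list is needed here.
        $ns_stats[$nsid] += $forum['threadcount'];
    }
}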
This approach saved SECONDS of generation time: about 10 fewer queries were issued, about 3000 fewer items were held in memory, and about 1000 fewer loop iterations were performed. For the remaining queries, more generic indexes were used (around 50 possible values on a secondary key rather than 1000 possible values against the primary key), resulting in lower seek times per item. At least, that's my understanding: the first 999 items would be discovered faster because the matching criteria are laxer.
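If my understanding is right, the change in query shape is roughly the following (the column names and counts here are illustrative only, not the actual schema):
Code:
-- Before: up to ~1000 article IDs, fetched beforehand, checked against the primary key
SELECT COUNT(revisionid) FROM vault_revision WHERE articleid IN (/* ~1000 IDs */);

-- After: one value matched on a broader secondary key (~50 possible values)
SELECT COUNT(revisionid) FROM vault_revision WHERE nsid = 5;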
After all of this, statistics performance was improved, just not in the way put forth in the first post. For now (until we hear that someone's server is complaining about these queries), this should suffice.