
    Issue: Improve Statistics Performance

    1. July 20, 2009 12:11 AM
       pegasus (VaultWiki Team)

      Certain information about the wiki is relatively server-intensive to gather - statistics such as the number of pages, redirects, and edits. As a solution, for example, rather than dumping an entire namespace to memory just to count its pages, keep a single statistics record for that namespace with the count already stored in the database.
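      A minimal sketch of the idea, assuming a hypothetical vault_stats table and $db handle (these names are illustrative, not the actual VaultWiki schema or API):
      Code:
      // Sketch only: vault_stats, its columns, and $db are assumptions,
      // not VaultWiki's actual schema or API. When an article is created
      // in namespace $nsid, adjust the pre-aggregated counter instead of
      // re-counting the whole namespace:
      $db->query("
          UPDATE " . TABLE_PREFIX . "vault_stats
          SET page_count = page_count + 1
          WHERE namespaceid = " . intval($nsid) . "
      ");
      // A matching page_count - 1 runs on deletion, so the statistics
      // page can read one row instead of scanning the namespace.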
    Issue Details
    Issue Number: 700
    Issue Type: Task
    Project: VaultWiki 3.x Series
    Category: Namespaces
    Status: Completed
    Priority: 5 - Minor Bugs / Small Tweaks
    Target Version: 2.3.2
    Resolved Version: 2.5.0
    Milestone: VaultWiki 2.5.x
    Software Dependency: Any
    Votes to perform: 0
    Votes not to perform: 0
    Attachments: 0
    Assigned Users: (none)
    Tags: (none)

    1. July 20, 2009 11:58 PM
       tommythejoat (Regular Member)
      I think the perception is correct, but I would recommend a trial calculation of the cost of maintaining the record on a site with a few hundred simultaneous users, where one might reasonably assume that only a small percentage of them would ever request the report.

      You are trading the report overhead for the distributed overhead.
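      To make the trade-off concrete, a back-of-the-envelope comparison (every number here is invented purely for illustration):
      Code:
      // Illustration only - all figures are made up:
      // 500 edits/hour       -> 500 single-row UPDATEs/hour (distributed overhead)
      // 20 report views/hour -> 20 x 4 near-table-scans/hour (report overhead)
      // Whether the trade is worth it depends on the site's edit-to-view
      // ratio, which is exactly what the trial calculation above measures.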
    2. July 21, 2009 3:26 AM
       pegasus (VaultWiki Team)
      These statistics are called not only by the statistics page but also by a number of variables in Parser Extensions, which could be added to many posts. Again, this is only a "could be".

      Whatever the report-overhead vs. distributed-overhead argument, consider the method currently used to gather this information: four almost-table-scans per request are more likely to cause problems than a single extra update query when an article is edited.

      A similar argument was made for this issue in the past: http://www.crackedeggstudios.com/issues/702/

      But from a scalability standpoint, it would seem that the larger a wiki grows, the more strain these particular functions put on the server, especially once we begin to process several thousand rows of information several times over on a single page load. Compared to the single row that would need to be maintained (with simple +/-1 changes), the trade-off seems worth it in the long run.

      This query, in particular, troubles me:
      Code:
      SELECT COUNT(revisionid) AS revision_count
      FROM " . TABLE_PREFIX . "vault_revision
      As that table grows with every edit to every wiki article, the query will only get worse. Even more troubling is new code that has yet to be seen by public eyes: it was added to dramatically reduce VaultWiki's memory footprint, but in this case it doubles it:
      Code:
      $nsarray = $vault->fetch_article('stats', $nsid, LANGUAGEID, 'dump');
      This returns the ID of every article in the wiki, which is then used to count thousands of records multiple times.

      However, it's also possible to do this without the added footprint simply by changing the key we use. Rather than fetching the article list for each namespace and counting the records, we can read the statistics fields already maintained in the forum cache and reconstruct the per-namespace counts from that data alone.
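      A rough sketch of that change (the forumcache property and its field names are assumptions for illustration, not VaultWiki's actual structures):
      Code:
      // Sketch only: $vault->forumcache and its fields are assumed names.
      // Instead of dumping every article ID with fetch_article(..., 'dump')
      // and counting them, sum the per-forum counters the cache already has:
      $total = 0;
      foreach ($vault->forumcache as $forum)
      {
          if ($forum['namespaceid'] == $nsid)
          {
              $total += $forum['article_count'];
          }
      }
      // $total now holds the namespace's page count without loading
      // thousands of article IDs into memory.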

      This approach saved SECONDS of generation time: about 10 fewer queries were issued, 3,000 fewer items were held in memory, and 1,000 fewer loop iterations were performed. For the queries, more generic indexes were used (roughly 50 candidate rows on a secondary key rather than 1,000 individual probes against the primary key), resulting in lower seek times per item - at least that's my understanding: the bulk of the items are found faster because the match criteria are laxer.
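      For what it's worth, the difference between the two access paths can be checked with EXPLAIN; the table and column names below are assumptions for illustration:
      Code:
      -- Illustration only: names are assumed, not the real schema.
      -- Old path: probe many individual IDs against the primary key.
      EXPLAIN SELECT articleid FROM vault_article
      WHERE articleid IN (1, 2, 3, 4, 5); -- imagine ~1000 values here

      -- New path: one lookup on a laxer secondary key covers the set.
      EXPLAIN SELECT articleid FROM vault_article
      WHERE namespaceid = 5;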

      After all of this, Statistics performance was improved, just not in the way put forth in the first post. For now (until we hear that someone's server complained about the queries), this should suffice.