
    Issue: Improve Statistics Performance

    1. July 20, 2009 12:11 AM
       pegasus (VaultWiki Team)

      Certain information about the wiki is relatively server-intensive to gather - statistics such as the number of pages, redirects, and edits. As a solution, for example, rather than dumping an entire namespace to memory just to count its pages, keep a single statistics record for that namespace with the count already stored in the database.
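      A minimal sketch of the idea, assuming a hypothetical vault_stats table and $db handle (these names are illustrative, not the actual VaultWiki schema or API):
      Code:
      // Sketch only: vault_stats, its columns, and $db are assumptions,
      // not VaultWiki's actual schema or API. When an article is created
      // in namespace $nsid, adjust the pre-aggregated counter instead of
      // re-counting the whole namespace:
      $db->query("
          UPDATE " . TABLE_PREFIX . "vault_stats
          SET page_count = page_count + 1
          WHERE namespaceid = " . intval($nsid) . "
      ");
      // A matching page_count - 1 runs on deletion, so the statistics
      // page can read one row instead of scanning the namespace.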
    Issue Details
    Issue Number: 700
    Issue Type: Task
    Project: VaultWiki 3.x Series
    Category: Namespaces
    Status: Completed
    Priority: 5 - Minor Bugs / Small Tweaks
    Target Version: 2.3.2
    Resolved Version: 2.5.0
    Milestone: VaultWiki 2.5.x
    Software Dependency: Any
    Votes to perform: 0
    Votes not to perform: 0
    Attachments: 0
    Assigned Users: (none)
    Tags: (none)

    1. July 20, 2009 11:58 PM
       tommythejoat (Regular Member)
      I think the perception is correct, but I would recommend a trial calculation of the cost of maintaining the record on a site with a few hundred simultaneous users, where one might reasonably assume that only a small percentage of them would ever request the report.

      You are trading the report overhead for the distributed overhead.
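      To make the trade-off concrete, a back-of-the-envelope comparison (every number here is invented purely for illustration):
      Code:
      // Illustration only - all figures are made up:
      // 500 edits/hour       -> 500 single-row UPDATEs/hour (distributed overhead)
      // 20 report views/hour -> 20 x 4 near-table-scans/hour (report overhead)
      // Whether the trade is worth it depends on the site's edit-to-view
      // ratio, which is exactly what the trial calculation above measures.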
    2. July 21, 2009 3:26 AM
       pegasus (VaultWiki Team)
      These statistics are called not only by the statistics page but also by a number of variables in Parser Extensions, which could be added to many posts. Again, this is only a "could be".

      Whatever the report-overhead vs. distributed-overhead argument, consider the method currently used to gather this information: four almost-table-scans per request are more likely to cause problems than a single extra update query when an article is edited.

      A similar argument was made for this issue in the past: http://www.crackedeggstudios.com/issues/702/

      But from a scalability standpoint, it would seem that the larger a wiki grows, the more strain these particular functions put on the server, especially once we begin to process several thousand rows of information several times over on a single page load. Compared to the single row that would need to be maintained (with simple +/-1 changes), the trade-off seems worth it in the long run.

      This query, in particular, troubles me:
      Code:
      SELECT COUNT(revisionid) AS revision_count
      FROM " . TABLE_PREFIX . "vault_revision
      As that table grows with every edit to every wiki article, the query will only get worse. Even more troubling is new code that has yet to be seen by public eyes: it was added to dramatically reduce VaultWiki's memory footprint, but in this case it doubles it:
      Code:
      $nsarray = $vault->fetch_article('stats', $nsid, LANGUAGEID, 'dump');
      This returns the ID of every article in the wiki, which is then used to count thousands of records multiple times.

      However, it's also possible to do this without the added footprint simply by changing the key we use. Rather than fetching the article list for each namespace and counting the records, we can read the statistics fields already maintained in the forum cache and reconstruct the per-namespace counts from that data alone.
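      A rough sketch of that change (the forumcache property and its field names are assumptions for illustration, not VaultWiki's actual structures):
      Code:
      // Sketch only: $vault->forumcache and its fields are assumed names.
      // Instead of dumping every article ID with fetch_article(..., 'dump')
      // and counting them, sum the per-forum counters the cache already has:
      $total = 0;
      foreach ($vault->forumcache as $forum)
      {
          if ($forum['namespaceid'] == $nsid)
          {
              $total += $forum['article_count'];
          }
      }
      // $total now holds the namespace's page count without loading
      // thousands of article IDs into memory.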

      This approach saved SECONDS of generation time: about 10 fewer queries were issued, 3,000 fewer items were held in memory, and 1,000 fewer loop iterations were performed. For the queries, more generic indexes were used (roughly 50 candidate rows on a secondary key rather than 1,000 individual probes against the primary key), resulting in lower seek times per item - at least that's my understanding: the bulk of the items are found faster because the match criteria are laxer.
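      For what it's worth, the difference between the two access paths can be checked with EXPLAIN; the table and column names below are assumptions for illustration:
      Code:
      -- Illustration only: names are assumed, not the real schema.
      -- Old path: probe many individual IDs against the primary key.
      EXPLAIN SELECT articleid FROM vault_article
      WHERE articleid IN (1, 2, 3, 4, 5); -- imagine ~1000 values here

      -- New path: one lookup on a laxer secondary key covers the set.
      EXPLAIN SELECT articleid FROM vault_article
      WHERE namespaceid = 5;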

      After all of this, Statistics performance was improved, just not in the way put forth in the first post. For now (until we hear that someone's server complained about the queries), this should suffice.