    Issue: MediaWiki import issues with image files: control character affixed to some urls, first (incorrect) file version always displayed

    1. issueid=4948 March 3, 2017 8:15 AM
      ACL
      Regular Member

      Issue 1: Some (not all) imported articles with images include the left-to-right mark (LRM) UTF-8 control character in the link to the file:
      Code:
      %E2%80%8E
      
      E.g. File:name-of-file-jpg%E2%80%8E
      I have one instance of an article where this problem applies to selected included images only. While said article contains 13 images in total, just 3 of them have the LRM control character added.
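
For reference, the escape sequence above can be checked by decoding it. This is a minimal sketch in PHP that reuses the placeholder file name from the example; the variable names are illustrative only.

```php
<?php
// Placeholder link target, modelled on the example above.
$encoded = 'File:name-of-file-jpg%E2%80%8E';

// rawurldecode() turns each %XX escape back into a raw byte.
$decoded = rawurldecode($encoded);

// The last three bytes should be the UTF-8 encoding of U+200E (LEFT-TO-RIGHT MARK).
$tail    = substr($decoded, -3);
$tailHex = strtoupper(bin2hex($tail));
$isLrm   = ($tail === "\u{200E}");

echo $tailHex, PHP_EOL;                     // E2808E
echo $isLrm ? 'LRM' : 'other', PHP_EOL;     // LRM
```

The character is invisible when rendered, which is why the stray suffix only becomes obvious in the percent-encoded URL.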


      Issue 2: When an image has had multiple versions on MediaWiki, VaultWiki displays the original version of an image instead of the latest version.

      This problem applies both to the image file page itself and to any articles that the image appears in.

      Upon inspecting the file history of an image with 2 or more versions, each revision is listed correctly with its corresponding image, so the newer file versions are being imported; the problem lies elsewhere.

      Related Issue: vw_file_usage_none phrase not loaded when viewing "Pages showing this file in their content" for a file not used anywhere
    Issue Details
    Issue Number 4948
    Issue Type Bug
    Project VaultWiki 4.x Series
    Category Importing
    Status Fixed
    Priority 3 - Loss of Functionality
    Affected Version 4.0.16
    Fixed Version 4.0.17
    Milestone (none)
    Software Dependency XenForo 1.x
    License Type Paid
    Users able to reproduce bug 0
    Users unable to reproduce bug 0
    Attachments 2
    Assigned Users (none)
    Tags (none)




    1. March 3, 2017 9:31 AM
      pegasus
      VaultWiki Team
      #1 If this is the case, at first glance, it seems that the LRM character was placed in the wrong location in the source material (before a | rather than after it); however, without a snippet of the entire link where this occurs, I cannot really know what is happening. Please post the problematic link markup in its entirety.

      #2 I am reading the code, and I am having a hard time seeing how this would happen. Are you sure that it is the original version in every case? Or is it the second-to-last version (excluding "deleted" versions)?

      #3 The phrase is confirmed missing and will be included in the next version.
    2. March 3, 2017 10:01 AM
      ACL
      Regular Member
      #1: Editing an article with the LRM character problem, this is the code for a particular broken image link:
      Code:
      [[Image:WS2811_strobe_string.jpg‎]]
      When viewing the article, that same included image appears as text "File:WS2811_strobe_string.jpg‎" with a hyperlink pointing to "/path/to/wiki/File:WS2811_strobe_string-jpg%E2%80%8E".
      Browsing to that file page presents an error, although the LRM is not present in the error message:
      "A wiki node with the title File:WS2811_strobe_string-jpg‎ could not be found".

      Incidentally, on the source wiki these images live inside a table written in MediaWiki syntax. The broken image links containing the LRM appear to have a space between the table cell character "|" and the start of the [[Image: ...]] wiki code. E.g. a problem image (copied from MediaWiki, other table cells removed for simplicity):
      Code:
      {| class="wikitable"
      |-
      | [[Image:WS2811_strobe_string.jpg‎]]
      |-
      |}
      The images that display correctly do not have a space between "|" and the start of the [[Image: ...]] wiki code (copied from MediaWiki, other table cells removed for simplicity):
      Code:
      {| class="wikitable"
      |-
      | Rowspan="5"|[[Image:6803_Strip.jpg]]
      |-
      |}
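
Whatever the cause turns out to be, affected links can be located by scanning the source markup for image links whose target contains an invisible character. The following is only a sketch: `find_suspect_image_links()` is a hypothetical helper, not part of VaultWiki or MediaWiki, and the sample markup is assembled from the two snippets above.

```php
<?php
/**
 * Return the targets of [[Image:...]] / [[File:...]] links that contain
 * a control (Cc) or format (Cf) character.
 * Hypothetical helper for illustration; not VaultWiki code.
 */
function find_suspect_image_links(string $markup): array
{
    $suspect = [];
    // Capture the link target up to the first "|" or the closing "]]".
    preg_match_all('/\[\[(?:Image|File):([^\]|]+)/u', $markup, $matches);
    foreach ($matches[1] as $target) {
        if (preg_match('/[\p{Cc}\p{Cf}]/u', $target)) {
            $suspect[] = $target;
        }
    }
    return $suspect;
}

// Sample table: the first link ends in U+200E, the second is clean.
$markup = "{| class=\"wikitable\"\n"
    . "|-\n"
    . "| [[Image:WS2811_strobe_string.jpg\u{200E}]]\n"
    . "|-\n"
    . "| Rowspan=\"5\"|[[Image:6803_Strip.jpg]]\n"
    . "|}";

$suspect = find_suspect_image_links($markup);

// Print each hit with the invisible bytes made visible.
foreach ($suspect as $target) {
    echo rawurlencode($target), PHP_EOL;
}
```

Percent-encoding the hits on output makes the otherwise invisible suffix easy to spot.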

      #2:
      You are kind of correct. On my wiki there are quite a number of images with two versions (i.e. updated once) but not many with more than two. I've checked 2 images with >2 versions, and in both cases the image chosen is actually one version behind the latest. Apologies for the misinformation.
    3. March 3, 2017 11:19 AM
      pegasus
      VaultWiki Team
      #1 For the LRM issue, please try this edit to vault/core/model/encode/vw.php. Find:
      Code:
      else if ($is_uni OR preg_match('/[A-Z0-9\-_:]+/i', $char, $match))
      Replace with:
      Code:
      else if (($is_uni AND !preg_match('/^\p{Cc}$/u', $char, $match)) OR (!$is_uni AND preg_match('/[A-Z0-9\-_:]+/i', $char, $match)))
      You may need to add &nocache=1 to the URL to see the effect on an uncached page.
    4. March 3, 2017 11:47 AM
      pegasus
      VaultWiki Team
      #2 Are the timestamps for the non-current edits wrong? Do they appear to be stamped during the import? Including the current edit, are the edits listed in the wrong order? I may have found the problem, but the current edit would be listed as the original edit, with the real original edit being listed with a more recent timestamp.

      In vault/core/controller/import/handle/mw/attach/vw.php, find:
      Code:
      $edit['dateline'] = $this->convert_dateline($edit['dateline']);
      After it, add:
      Code:
      $edit['edit_dateline'] = $edit['dateline'];
      Find:
      Code:
      $dm = vw_Hard_Core::controller('DM')->create('Attach', 'SILENT');
      After it, add:
      Code:
      $dm->set_info('custom_dateline', 1);
      Apparently the actual timestamps were not being used when importing non-current edits. Instead, the timestamp for when the importer was running (now) was being used. This is because edits require custom_dateline to be set in order for a non-now edit_dateline to be applied. Additionally, the importer was setting the incorrect dateline field (which is only used when creating the attachment) vs the correct edit_dateline field.
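
The behaviour described above can be modelled in a few lines. This is a hypothetical sketch, not VaultWiki's actual code: `resolve_edit_timestamp()` and its parameters are invented for illustration, and only the names `dateline`, `edit_dateline` and `custom_dateline` come from the edits above. The two dates are taken from values reported in this thread and are treated as UTC purely for the example.

```php
<?php
/**
 * Hypothetical model: an edit keeps its own timestamp only when the
 * custom_dateline flag is set AND edit_dateline is supplied.
 * Otherwise the current time ("now") is used.
 */
function resolve_edit_timestamp(array $edit, bool $customDateline, int $now): int
{
    if ($customDateline && isset($edit['edit_dateline'])) {
        return (int) $edit['edit_dateline'];
    }
    return $now;
}

$revisionTime = gmmktime(18, 28, 0, 5, 30, 2013); // when the revision was made
$importTime   = gmmktime(0, 17, 0, 2, 20, 2017);  // when the importer ran

// Before the fix: only 'dateline' is set and the flag is off.
$before = resolve_edit_timestamp(['dateline' => $revisionTime], false, $importTime);

// After the fix: 'edit_dateline' is copied from 'dateline' and the flag is on.
$after = resolve_edit_timestamp(
    ['dateline' => $revisionTime, 'edit_dateline' => $revisionTime],
    true,
    $importTime
);

echo gmdate('Y-m-d H:i', $before), PHP_EOL; // 2017-02-20 00:17
echo gmdate('Y-m-d H:i', $after), PHP_EOL;  // 2013-05-30 18:28
```

In this model both changes are needed: supplying `edit_dateline` without the flag, or setting the flag without the field, still falls back to the import time.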
    5. March 3, 2017 12:41 PM
      ACL
      Regular Member
      Quote Originally Posted by pegasus
      #1 For the LRM issue, please try this edit to vault/core/model/encode/vw.php.
      You may need to add &nocache=1 to the URL to see the effect on an uncached page.
      #1: No change in behaviour with this code change, unfortunately! I'm using friendly urls so had to adjust the URL slightly but that shouldn't have mattered, e.g. path/to/wiki/Article-Name?nocache=1

      Quote Originally Posted by pegasus
      #2 Are the timestamps for the non-current edits wrong? Do they appear to be stamped during the import? Including the current edit, are the edits listed in the wrong order? I may have found the problem, but the current edit would be listed as the original edit, with the real original edit being listed with a more recent timestamp.
      #2: Actually, yes re the import timestamp. I've found an image with 8 file versions with the last edit from 2013.
      The 'current' version as far as MediaWiki is concerned is timestamped in VaultWiki as May 30, 2013, 6:28 PM - correct.
      The 'original' version as far as MediaWiki is concerned is timestamped in VaultWiki as May 23, 2013, 8:30 PM - correct.
      All 6 other file versions are timestamped in VaultWiki as Feb 20, 2017, 12:17 AM, which is indeed the date I last ran the importer.

      Thus the order from bottom to top is: 1. the initial file timestamp; 2. the latest file timestamp; 3-8. the remaining file versions, all with the importer timestamp. Oddly, though, the file attached to the entry with the 'oldest' timestamp is not actually the original revision.

      The true latest file version in the history view is also showing as deleted so when trying to view it via the history page I get a permission error. The file version listed one above the true latest version shows the message "Username may have made changes that were omitted from this report to save resources". I'll attach a screenshot which is likely to make more sense.

      I've applied those two edits to vault/core/controller/import/handle/mw/attach/vw.php, but won't be able to confirm whether the problem is fixed until the next time I run an import (which will be at least a few days away).
    6. March 3, 2017 4:48 PM
      pegasus
      VaultWiki Team
      I tested it here: Control Characters in Links.

      LRM seems to be classed as an invisible formatting character and not as a control character per se. This changes the regex above from:
      Code:
      !preg_match('/^\p{Cc}$/u', $char, $match)
      To:
      Code:
      !preg_match('/^[\p{Cc}\p{Cf}]$/u', $char, $match)
      Marking this issue as fixed.
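
The difference between the two patterns can be checked in isolation. A minimal sketch, assuming only PHP's PCRE functions with the `u` modifier; the variable names are illustrative and the snippet is independent of VaultWiki:

```php
<?php
$lrm = "\u{200E}"; // LEFT-TO-RIGHT MARK, Unicode general category Cf (format)
$ctl = "\u{0001}"; // a C0 control character, general category Cc

// The first pattern only recognises true control characters.
$lrmIsCc = preg_match('/^\p{Cc}$/u', $lrm) === 1;
$ctlIsCc = preg_match('/^\p{Cc}$/u', $ctl) === 1;

// The revised pattern also covers format characters such as LRM.
$lrmIsCcOrCf    = preg_match('/^[\p{Cc}\p{Cf}]$/u', $lrm) === 1;
$letterIsCcOrCf = preg_match('/^[\p{Cc}\p{Cf}]$/u', 'a') === 1;

var_dump($lrmIsCc);        // bool(false)
var_dump($ctlIsCc);        // bool(true)
var_dump($lrmIsCcOrCf);    // bool(true)
var_dump($letterIsCcOrCf); // bool(false)
```

This is consistent with the first edit having no visible effect: `\p{Cc}` alone never matches LRM.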
    7. March 4, 2017 3:13 AM
      ACL
      Regular Member
      #1: I've altered vault/core/model/encode/vw.php once more with the new regex and the LRM issue is now fixed with images displaying correctly. Thanks!