    Issue: First part of Unicode is removed by importer

    1. issueid=5034 April 29, 2017 1:52 PM
      Alfa1
      Distinguished Member

      It seems that in many (but not all) cases, when the importer found a character whose Unicode code point was above a certain range, it simply cut off the upper part of the value. So when

      U+067E : ARABIC LETTER PEH

      got turned into this:

      U+007E : TILDE

      ...it's like the upper part of the number (06) got cut off, and turned into 00. You can see it happen again and again:

      U+0646 : ARABIC LETTER NOON
      U+0046 : LATIN CAPITAL LETTER F
      ^ missing 06

      U+062F : ARABIC LETTER DAL
      U+002F : SOLIDUS {slash, virgule}
      ^ missing 06

      U+015A : LATIN CAPITAL LETTER S WITH ACUTE
      U+005A : LATIN CAPITAL LETTER Z
      ^ missing 01

      ...it's as if it reduced every number in a chart to two hex digits, even where the original had four, so in those cases it just chopped off the first two digits. But that metaphor doesn't neatly explain all the cases. My guess is that the importer assumed it only had to deal with a limited Unicode character set, so when it hit a character from a more extended set, it gave the closest result it could: either a chopped-off value or just garbage.
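The four pairs listed above are consistent with keeping only the low 8 bits of each code point. The Python sketch below assumes that model purely for illustration; `truncate_to_low_byte` is a hypothetical name and is not the importer's actual code. As the post says, this model does not explain every case.

```python
import unicodedata


def truncate_to_low_byte(ch: str) -> str:
    """Hypothetical model of the bug: keep only the low 8 bits of the code point."""
    return chr(ord(ch) & 0xFF)


# The four code points reported in this issue.
for cp in (0x067E, 0x0646, 0x062F, 0x015A):
    original = chr(cp)
    damaged = truncate_to_low_byte(original)
    print(
        f"U+{cp:04X} {unicodedata.name(original)}"
        f" -> U+{ord(damaged):04X} {unicodedata.name(damaged)}"
    )
```

Each printed line pairs an original character with the character that shares its low byte, which is the substitution seen after the import.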
    Issue Details
    Issue Number 5034
    Issue Type Bug
    Project VaultWiki 4.x Series
    Category Importing
    Status Fixed
    Priority 1 - Security / Login / Data Loss
    Affected Version 4.0.17
    Fixed Version 4.0.18
    Milestone (none)
    Software Dependency XenForo 1.x
    License Type Paid
    Users able to reproduce bug 0
    Users unable to reproduce bug 0
    Attachments 0
    Assigned Users (none)
    Tags (none)




    1. May 7, 2017 11:48 AM
      pegasus
      VaultWiki Team
      Fixed in the next release. The conversion of some multi-byte HTML entities back into their UTF-8 code points was adding or subtracting extra bits, or using invalid ranges for those code points. The function that did this was based on a vBulletin function that does the same thing; the only explanation is that this bug already existed in vBulletin. Switching to the XenForo version of the same function makes these entities convert correctly.
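For illustration only, the kind of conversion described in this reply can be sketched in Python as below. This is not the vBulletin or XenForo function; `decode_numeric_entities` is a hypothetical name, and the sketch assumes the input uses numeric entities such as `&#1662;` or `&#x67E;`. Rather than assembling UTF-8 bytes by hand with bit arithmetic, it turns each entity into a code point and lets the language's encoder produce the bytes.

```python
import re

# Matches decimal (&#1662;) and hexadecimal (&#x67E;) numeric entities.
_ENTITY = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def decode_numeric_entities(text: str) -> str:
    """Replace numeric HTML entities with the characters they name."""

    def replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        cp = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Leave entities that name invalid code points untouched:
        # anything above U+10FFFF, or a UTF-16 surrogate.
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            return match.group(0)
        return chr(cp)

    return _ENTITY.sub(replace, text)


decoded = decode_numeric_entities("&#1662;&#x15A;")
utf8_bytes = decoded.encode("utf-8")  # encoding is delegated to the runtime
```

Here `&#1662;` decodes to U+067E (ARABIC LETTER PEH) and `&#x15A;` to U+015A, with the full code point preserved instead of only its low byte.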
    2. June 10, 2017 7:23 PM
      Alfa1
      Distinguished Member
      I will try it with the new version.
      Should the importer charset be defined as latin1?
    3. June 11, 2017 10:43 AM
      pegasus
      VaultWiki Team
      It would not hurt.
    All times are GMT -4.