    Issue: Special Characters imported as (???????)

    1. issueid=4976 April 1, 2017 11:18 PM
      Alfa1
      Distinguished Member
      Special Characters imported as (???????)

      νηπενθές and πράμνιον have been imported as ????????
    Issue Details
    Issue Number 4976
    Issue Type Bug
    Project VaultWiki 4.x Series
    Category Importing
    Status Duplicate
    Priority 1 - Security / Login / Data Loss
    Affected Version 4.0.17
    Fixed Version (none)
    Milestone (none)
    Software Dependency XenForo 1.x
    License Type Paid
    Users able to reproduce bug 0
    Users unable to reproduce bug 0
    Attachments 0
    Assigned Users (none)
    Tags (none)




    1. April 2, 2017 12:08 PM
      pegasus
      VaultWiki Team
      VaultWiki will not single out only some special characters to convert. If the characters exist in the original character set and the target character set, they are converted. Have other special characters been imported fine? If so, then there is not an issue with the importer, but rather the source character sets for some articles were defined in a nonsensical way (as was possible in VW3, and one of the main things we worked to resolve with VW4).

      ? occurs if the final code points after conversion are not valid UTF-8. If the character set used to submit the wiki articles originally did not match the character set that was detected for the article, it is possible to have invalid code points. This is especially likely if your source forum had multiple language packs installed and UTF-8 Mode was never turned on in the wiki, or the built-in multi-language support was not used in the wiki.
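To illustrate one way literal question marks can arise, here is a minimal Python sketch. It is not the importer's code. It only assumes that some step in the pipeline passes text through a character set that lacks the characters and substitutes instead of raising an error; `latin-1` stands in for that character set.

```python
# Hypothetical illustration, not VaultWiki code.
text = "νηπενθές"

# Latin-1 has no Greek letters, so every character is replaced by "?".
replaced = text.encode("latin-1", errors="replace")
print(replaced)

# Characters that Latin-1 does contain survive the same step unchanged.
kept = "äß".encode("latin-1", errors="replace").decode("latin-1")
print(kept)
```

Once the substitution has happened, the original characters cannot be recovered from the question marks; the source text has to be read again with the right character set.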

      If these code points were successfully imported during your test import, then this is not a bug. In this case, one possibility is that you set the incorrect value for the 'charset' in the importer config. If this was left blank (per the default vB setting) for an import from VW3, this would default to 'latin1', the expected database connection charset for vBulletin 3. However, if you manually entered something like 'utf8', this would make the database read the characters incorrectly. In this case, I would expect the issue to affect at most the two most recent edits of a page; older edits should read normally, because they were compressed in the database.
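The effect of a wrong connection character set can be sketched in Python. This is an illustration only, assuming the stored bytes are in one encoding while the reader declares another; it does not reproduce the importer or MySQL's own conversion.

```python
# Hypothetical illustration of a charset mismatch between writer and reader.
original = "ä"

# Bytes written as UTF-8 but read as Latin-1: two wrong characters appear.
utf8_read_as_latin1 = original.encode("utf-8").decode("latin-1")
print(utf8_read_as_latin1)

# Bytes written as Latin-1 but read as UTF-8: the single byte is not a
# valid UTF-8 sequence, so it becomes the replacement character U+FFFD.
latin1_read_as_utf8 = original.encode("latin-1").decode("utf-8", errors="replace")
print(repr(latin1_read_as_utf8))
```

Either direction damages the text, which is why the declared charset has to match what the source database actually holds.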
    2. April 2, 2017 8:48 PM
      Alfa1
      Distinguished Member
      Characters like ä or ß were imported fine, but I don't know if they fall under the definition of 'special'.

      There was no value in the charset field because, IIRC, there was another problem with filling in utf8 here. I remember that I needed to leave this blank, but I am not sure if this is still the case.
      If it applied to two edits per article, that would amount to 2500 edits.

      The source site does contain the aforementioned special characters, but the target site does not.
      Is there anything I can do to prevent this issue?
    3. April 3, 2017 10:57 AM
      pegasus
      VaultWiki Team
      ä and ß fall within the Latin-1 (ISO-8859-1) range. It would be difficult for them to be misidentified. The characters in your OP are good characters for testing; when testing special character support, I generally use Cyrillic. By the way, these characters pass unimpeded when used directly: https://www.vaultwiki.org/xf-wiki/in...ράμνιον

      If the original content was saved with mixed character sets, as was easy to accomplish in VaultWiki 3, then there is almost no way to guarantee that all characters will be safe.

      The importer for VaultWiki 3 assumes that the database connection character set is correct (for a default source vBulletin, blank is correct). When determining the source character set for an article:
      1. It compiles a list of all forum languages and their defined character sets and orders them from least-inclusive to most-inclusive
      2. It loops through this list and checks for errors in the source text. If there are any errors, it tries the next language.
      3. If no languages were successful, the most inclusive character set UTF-8 is assumed.
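The three steps above can be sketched in Python. This is a rough approximation, not the importer's actual code: the function name `guess_source_charset`, the candidate list, and the use of a strict decode as the "error check" are all assumptions made for illustration, and the candidates are assumed to be ordered from least to most inclusive already.

```python
# Hypothetical sketch of the detection loop described above.
def guess_source_charset(raw: bytes, candidates: list[str], fallback: str = "utf-8") -> str:
    """Return the first candidate charset that decodes raw without errors."""
    for charset in candidates:
        try:
            raw.decode(charset)  # strict decode: raises on invalid bytes
        except UnicodeDecodeError:
            continue  # errors found, try the next language's charset
        return charset
    # No candidate was successful: assume the most inclusive charset.
    return fallback


candidates = ["ascii", "cp1253"]  # assumed order: least to most inclusive

plain = guess_source_charset(b"plain text", candidates)
greek = guess_source_charset("πράμνιον".encode("cp1253"), candidates)
# 0x81 is undefined in Python's cp1252, so neither candidate accepts it.
unknown = guess_source_charset(b"\x81", ["ascii", "cp1252"])
print(plain, greek, unknown)
```

A loop like this can only report the first charset that produces no errors. It cannot tell whether that charset is the one the text was really written in.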

      This means that in order for an article to be imported with any ???, the source article probably contains a character that is invalid for the character set the rest of the article is written in. Such an article has multiple character sets (was probably edited by users who had different forum languages active) and it is not possible to import correctly unless the source article is fixed.
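A small Python sketch shows why an article with mixed character sets cannot be read correctly as a whole. The byte string here is constructed by hand for illustration and is an assumption about what such an article might look like, not data from a real import.

```python
# Hypothetical article body: one part saved as Latin-1, another as UTF-8.
mixed = "ä".encode("latin-1") + "ν".encode("utf-8")


def decodes_cleanly(raw: bytes, charset: str) -> bool:
    """True if raw decodes under charset without any errors."""
    try:
        raw.decode(charset)
    except UnicodeDecodeError:
        return False
    return True


# Strict UTF-8 rejects the Latin-1 byte.
print(decodes_cleanly(mixed, "utf-8"))

# Latin-1 accepts every byte, but the Greek letter turns into two wrong characters.
as_latin1 = mixed.decode("latin-1")
print(as_latin1)
```

No single character set recovers both characters, so the source article has to be repaired before it can be imported correctly.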

      If VaultWiki 3's Multi-Language Support was not active for the entire life of your wiki but your forum contained multiple language packs, the likelihood that such articles exist is high.

      But even with Multi-Language Support, any article that was moved from one language to another at any time, or any character that was pasted into an article but that did not exist in the article's current character set, might make the source character set indeterminable and default to UTF-8 (which is also wrong).

      However, there is no way for me to know whether any of this even applies to you or where the problem is in your case for this article without knowing which article it is in your source forum and without access to the importer. Even if you were to paste the source text here and it looks the same as on your source forum, the data will not be encoded in the same way here as your importer is receiving it.
    4. April 3, 2017 6:56 PM
      Alfa1
      Distinguished Member
      Is there any way to search for wiki articles with ???????? in them?
    5. April 4, 2017 10:18 AM
      pegasus
      VaultWiki Team
      If the ???? appears in the title, you can search with a MySQL query in VaultWiki 4:
      Code:
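      -- Find pages with at least one revision whose title contains a literal '?'
      -- ('?' is not a LIKE wildcard; only % and _ are).
      -- MIN() returns one matching title per page, so this also runs with ONLY_FULL_GROUP_BY.
      SELECT pageid, MIN(title) AS title
      FROM vw_revision
      WHERE title LIKE '%?%'
      GROUP BY pageid;
      Titles may have survived the import better than body text, though, because vBulletin also HTML-encoded the source title when it was saved.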

      If the ??? appears in the body text, I don't think XenForo's search engine would have indexed those characters, and there is no way to query them directly via MySQL due to the data being compressed. A custom search would need to be used. I am working on one, but it is not present in the current version.
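Until such a search exists, the general idea can be sketched in Python: decompress each stored body and look for runs of question marks. Everything specific here is an assumption, since the thread does not say how the data is compressed or stored: zlib compression, UTF-8 text, and rows shaped as `(pageid, blob)` pairs are all hypothetical, as is the name `find_damaged`.

```python
import re
import zlib

# Three or more consecutive question marks are treated as suspicious.
QUESTION_RUN = re.compile(r"\?{3,}")


def find_damaged(rows):
    """Yield the page id of each row whose decompressed text has a run of '?'."""
    for pageid, blob in rows:
        text = zlib.decompress(blob).decode("utf-8", errors="replace")
        if QUESTION_RUN.search(text):
            yield pageid


# Stand-in data; a real scan would read (pageid, blob) pairs from the database.
rows = [
    (1, zlib.compress("An article about ???????? wine".encode("utf-8"))),
    (2, zlib.compress("Is this fine? Yes.".encode("utf-8"))),
]
damaged = list(find_damaged(rows))
print(damaged)
```

Requiring a run of several question marks avoids flagging ordinary sentences that end in a single one, at the cost of missing isolated damaged characters.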
    6. April 4, 2017 12:21 PM
      Alfa1
      Distinguished Member
      Thanks! That's good news. I suspect that if we are able to search for such articles, it will be easy to correct the problem after the import.
    7. April 6, 2017 7:43 PM
      Alfa1
      Distinguished Member
      It also happens for α and δ, even though these characters can be found on the keyboard.
    8. April 12, 2017 2:36 PM
      Alfa1
      Distinguished Member
      Please see the following bug reports on my live site:
      300113
      300117
    9. May 7, 2017 2:35 PM
      pegasus
      VaultWiki Team
      I can't imagine that this isn't due to one of the following:
      https://www.vaultwiki.org/issues/5003/
      https://www.vaultwiki.org/issues/5034/

      Even though this report was first, the others had more detail and were resolved first, so this is marked as a duplicate.
    All times are GMT -4.