First part of Unicode code point is removed by importer
It seems like in many (but not all) cases, if the importer found a character with a Unicode value above a certain range, it just cut off the upper part of the value. So when
U+067E : ARABIC LETTER PEH
got turned into this:
U+007E : TILDE
...it's like the upper part of the number (06) got cut off, and turned into 00. You can see it happen again and again:
U+0646 : ARABIC LETTER NOON
U+0046 : LATIN CAPITAL LETTER F
^ missing 06
U+062F : ARABIC LETTER DAL
U+002F : SOLIDUS {slash, virgule}
^ missing 06
U+015A : LATIN CAPITAL LETTER S WITH ACUTE
U+005A : LATIN CAPITAL LETTER Z
^ missing 01
...sort of like it translated all the numbers in a chart into two-digit numbers, even if the original had four-digit numbers, so in those cases it just chopped off the first two digits. (These are hex digits, so the two that survive are the low byte of the code point.) But that metaphor doesn't neatly explain all the cases. I would guess that the importer assumed it only had to deal with a limited Unicode character set, so when it hit a character outside that set, it gave the closest result it could: either a chopped-off value or just garbage.
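One way to check the "chopped off" theory against the examples above is to model it as keeping only the low byte of each code point. This is a minimal sketch in Python. The name `drop_high_byte` is hypothetical, and the masking is only an assumption about what the importer might be doing, not its actual code:

```python
import unicodedata


def drop_high_byte(ch: str) -> str:
    """Hypothetical model of the bug: keep only the low 8 bits of the code point."""
    return chr(ord(ch) & 0xFF)


# (original character, what the importer produced) pairs from the examples above
observed = [
    ("\u067E", "\u007E"),  # ARABIC LETTER PEH  -> TILDE
    ("\u0646", "\u0046"),  # ARABIC LETTER NOON -> LATIN CAPITAL LETTER F
    ("\u062F", "\u002F"),  # ARABIC LETTER DAL  -> SOLIDUS
    ("\u015A", "\u005A"),  # LATIN CAPITAL LETTER S WITH ACUTE -> LATIN CAPITAL LETTER Z
]

for original, imported in observed:
    guess = drop_high_byte(original)
    print(
        f"U+{ord(original):04X} {unicodedata.name(original, '<unnamed>')}"
        f" -> U+{ord(guess):04X} {unicodedata.name(guess, '<unnamed>')}"
        f" match={guess == imported}"
    )
```

For these four pairs the mask reproduces what the importer produced. It says nothing about the cases that came out as garbage, so it would only account for part of the behavior.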