unacceptable character '?' (0xFFFD) special characters are not allowed

Discussion in 'Plugin Development' started by Don Redhorse, Apr 4, 2012.

Thread Status:
Not open for further replies.
  1. Offline

    Don Redhorse

    Hi,

    I'm getting the following errors on Linux systems (some of them) when I load config YAML files via the built-in config classes:

    Code:
    [WARNING] [DeathTpPlus] [DeathTpPlus] Error in the deathmessages configuration
    org.bukkit.configuration.InvalidConfigurationException: unacceptable character '?' (0xFFFD) special characters are not allowed
    in "<string>", position 11310
    as you can see in these supplied config files http://dev.bukkit.org/media/attachments/25/852/DeathTpPlus.zip

    the file in question is UTF-8

    Strangely, the others aren't, and they don't cause those issues.

    Another strange thing is that I save the files as UTF-8

    Code:
                OutputStream outputStream = new FileOutputStream(pluginPath + deathMessageFileName);
                stream = new PrintWriter(new OutputStreamWriter(outputStream, "utf-8"));
    Any idea on how to fix this? Google is full of reports of this, and it looks like an issue with the YAML processor.
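
    For illustration, a minimal sketch of writing and reading a config file with an explicit charset on both sides, so the platform default never gets involved. It uses only the Java standard library; the class name Utf8ConfigIo and the sample message are hypothetical, not taken from DeathTpPlus. The assumption is that the string returned by load() could then be handed to a YAML parser instead of letting the parser open the file itself.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of DeathTpPlus or Bukkit.
final class Utf8ConfigIo {

    // Write the text as UTF-8 regardless of the platform default charset.
    static void save(Path file, String yamlText) throws IOException {
        Files.write(file, yamlText.getBytes(StandardCharsets.UTF_8));
    }

    // Read the raw bytes and decode them as UTF-8, again ignoring the default.
    static String load(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("deathmessages", ".yml");
        try {
            // Sample text with a German umlaut (o with diaeresis).
            String text = "message: \"Du wurdest get\u00F6tet\"\n";
            save(file, text);
            // True when the text survives the round trip unchanged.
            System.out.println(text.equals(load(file)));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

    Saving as UTF-8 only helps if the reading side also decodes as UTF-8, which is why both directions name the charset here.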
     
  2. Offline

    bergerkiller

    Don Redhorse my guess is that the '?' character is not supported in the UTF-8 charset, though that is a bit strange. I am pretty sure ? can be read/written from file...

    Maybe it is different on Linux systems?
     
  3. It's flipping over your 'ä' on line 302. Remove that and it would probably work.

    I don't know the cause of this, as that character is a valid UTF-8 character.
     
  4. Offline

    Don Redhorse

    Well, that is the point... the file is valid but still causes issues. The file I posted above uses special characters, but this is now even happening with my stock default English messages, only on Linux though.

    On Windows it works, even with special characters like ä, but only if I use UTF-8.

    Well, the ? is just a placeholder for a character in the file which can't be parsed by the YAML configuration classes. I guess the issue would go away if the file were saved without UTF-8 on Linux, but UTF-8 is required for Windows systems.
     
  5. Offline

    B3NW

    Save it as Unix ANSI to make it work on both Linux and Windows, AFAIK.
     
  6. Offline

    Don Redhorse

    Will that support foreign characters like öäü (German), or Danish, French, etc.?
     
  7. Offline

    Whynot

    I have the same problem but don't know how to fix it.
    I'm using Linux everywhere, so I have no WordPad or anything like that.
    But I need to "convert" from UTF-8 to ANSI, right?
    But how?
    I don't understand where or how I can do this.

    I have the problem with AuthDB: it shows me mysterious letters (an a with a line over it ^^),
    and after that there's the default text. There's no way to translate it from English to German... I really have no idea what's wrong.
     
  8. 0xFFFD is the Unicode replacement character, which a decoder substitutes for bytes it cannot decode, so it is not actually a character in your file: http://www.unicodemap.org/details/0xFFFD/index.html
    And indeed I can't find the character in your yml files, so I assume it's a charset issue. Let me check the Bukkit sources... Okay, as far as I can see, Bukkit doesn't load with a specific charset. That means Java falls back to the system's charset. Execute the following command on the affected machines to confirm that UTF-8 is not their system charset:
    locale
    Example output from a machine using UTF-8:
    Code:
    LANG=de_DE.UTF-8
    LANGUAGE=de_DE:en
    LC_CTYPE="de_DE.UTF-8"
    LC_NUMERIC="de_DE.UTF-8"
    LC_TIME="de_DE.UTF-8"
    LC_COLLATE="de_DE.UTF-8"
    LC_MONETARY="de_DE.UTF-8"
    LC_MESSAGES="de_DE.UTF-8"
    LC_PAPER="de_DE.UTF-8"
    LC_NAME="de_DE.UTF-8"
    LC_ADDRESS="de_DE.UTF-8"
    LC_TELEPHONE="de_DE.UTF-8"
    LC_MEASUREMENT="de_DE.UTF-8"
    LC_IDENTIFICATION="de_DE.UTF-8"
    LC_ALL=
    
    //EDIT: I almost forgot the fix... I think you load the files (from within your jar), then save them (into your data folder). So tell Java to load them as UTF-8, but do not tell it which charset to use when you save them. That way it will convert the file from UTF-8 to whatever charset the system uses, and it should be loadable by Bukkit's YAML API later. :)
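
    To make the charset fallback concrete, here is a small sketch (standard library only, class name CharsetMismatchDemo is hypothetical) that encodes an 'ä' as UTF-8 and then decodes the bytes with a different charset. US-ASCII stands in for the default charset of a machine whose locale is not UTF-8; that choice is an assumption for the demo, not something read from Bukkit.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Bukkit.
final class CharsetMismatchDemo {

    // Encode the text as UTF-8, then decode the bytes with another charset,
    // which is what happens when a UTF-8 file is read with the wrong default.
    static String decodeUtf8BytesAs(String text, Charset otherCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, otherCharset);
    }

    public static void main(String[] args) {
        // 'a' with diaeresis takes two bytes in UTF-8; neither is valid US-ASCII.
        String broken = decodeUtf8BytesAs("\u00E4", StandardCharsets.US_ASCII);
        // The undecodable bytes come back as the replacement character U+FFFD.
        System.out.println(broken.indexOf('\uFFFD') >= 0);

        // Decoding with the matching charset returns the original text.
        String intact = decodeUtf8BytesAs("\u00E4", StandardCharsets.UTF_8);
        System.out.println(intact.equals("\u00E4"));
    }
}
```

    The replacement character produced by the mismatched decode is the same 0xFFFD that the YAML parser then rejects.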
     
  9. Offline

    Whynot

    If I run "locale", the output looks wrong ^^
    Code:
    LANG=
    LANGUAGE=
    LC_CTYPE="POSIX"
    LC_NUMERIC="POSIX"
    LC_TIME="POSIX"
    LC_COLLATE="POSIX"
    LC_MONETARY="POSIX"
    LC_MESSAGES="POSIX"
    LC_PAPER="POSIX"
    LC_NAME="POSIX"
    LC_ADDRESS="POSIX"
    LC_TELEPHONE="POSIX"
    LC_MEASUREMENT="POSIX"
    LC_IDENTIFICATION="POSIX"
    LC_ALL=
    
    That must be really wrong :D But how do I tell Java to use UTF-8?
    Or how do I install UTF-8 on Ubuntu?
    I'm asking aunt Google, perhaps she can help me...

    Changed to UTF-8 with this: "update-locale LANG=de_DE.UTF-8 LC_MESSAGES=POSIX".
    Rebooting now and testing it again :D
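
    One way to check what Java itself ended up with after a locale change is to print the default charset from inside the JVM. A minimal sketch using only the standard library (the class name DefaultCharsetReport is hypothetical):

```java
import java.nio.charset.Charset;

// Hypothetical helper for inspecting the JVM's charset settings.
final class DefaultCharsetReport {

    // Report the file.encoding system property and the resulting default charset.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", default charset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

    The file.encoding system property is also commonly set on the command line, for example "java -Dfile.encoding=UTF-8 -jar server.jar" (server.jar is a placeholder name). Whether the JVM honours it depends on the Java version, so treat that as something to test rather than a guaranteed fix.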
     
  10. Offline

    monir

    I have the same problem with some plugins. I'm on Linux; how can I fix this?
    Code:
    org.bukkit.configuration.InvalidConfigurationException: unacceptable character '�' (0xFFFD) special characters are not allowed
    in "<string>", position 44
     
  11. Offline

    desht

    The iconv utility on Linux will convert files to UTF-8 (or pretty much any encoding).
     
  12. Offline

    monir

    So I need to use the iconv utility, is that what you mean? I really don't have a clue how to fix this.
     
  13. Offline

    desht

    You're most likely seeing that error ("unacceptable character '�' (0xFFFD)") because the file you're trying to load isn't either plain ASCII or UTF-8 encoded (and ASCII is a subset of UTF-8 anyway). You can convert files to UTF-8 encoding with iconv, like this:
    Code:
    $ iconv -f ISO8859-1 -t UTF-8 file.txt > newfile.txt
    
    Of course, you need to know what encoding your existing file is (ISO8859-1 is a common encoding for western European languages). "file file.txt" might be able to tell you. If you're OK with sharing the file, feel free to upload it to pastebin or equivalent and I'll take a look.
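
    If you would rather check from Java whether a file is valid UTF-8 before loading it, a strict decoder can tell you. This is a minimal sketch with the standard library only; the class name Utf8Check is hypothetical. It cannot say which encoding the file really is, only whether it decodes cleanly as UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Bukkit.
final class Utf8Check {

    // Returns true only if the bytes decode as UTF-8 without any error.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    // Fail instead of silently inserting U+FFFD.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The same two letters, encoded once as ISO-8859-1 and once as UTF-8.
        byte[] latin1 = "\u00E4r".getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = "\u00E4r".getBytes(StandardCharsets.UTF_8);
        System.out.println(isValidUtf8(latin1)); // false
        System.out.println(isValidUtf8(utf8));   // true
    }
}
```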
     
  14. Offline

    Sehales

    btw, for German language support, instead of the characters 'Ä', 'ä', 'Ö', 'ö', 'Ü', 'ü' you can use '\u00C4', '\u00E4', '\u00D6', '\u00F6', '\u00DC', '\u00FC'. That will "solve" the problem too!
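
    A rough sketch of doing that replacement automatically, assuming the escaped text ends up inside a double-quoted YAML string (YAML only interprets these escapes in double quotes). The class name YamlEscapes is hypothetical and the helper only uses the standard library.

```java
// Hypothetical helper, not part of Bukkit or SnakeYAML.
final class YamlEscapes {

    // Replace every non-ASCII char with a backslash-u escape of four hex digits.
    // Works char by char, so characters outside the Basic Multilingual Plane
    // come out as two escapes (a surrogate pair); this sketch does not handle
    // them specially.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Input contains a 'u' with diaeresis; output is pure ASCII.
        System.out.println(escapeNonAscii("Tsch\u00FCss"));
    }
}
```

    The result is pure ASCII, so it reads the same no matter which charset the server uses to load the file.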
     
  15. If it's really a "german file" it may be encoded in ISO8859-15.

    //EDIT: Or some windows encoding like Windows-1252.
     
  16. Offline

    toothplck1

    If I remember correctly, there was a recent-ish (a few months ago) bug in the way Bukkit uses SnakeYAML. I am not sure if it has been fixed since, but I was told that bug was the cause of a problem I had that was nearly identical to yours. Also, I am positive it only affects UTF-8 files.
     
  17. Offline

    monir

    Sorry, maybe a stupid question: which file do I need to pastebin?
     
  18. Show us the full stacktrace and we may be able to tell. But from that snippet it almost sounds like it's hardcoded into some plugin you use.
     
  19. Offline

    desht

    It's probably coming from here: https://github.com/Bukkit/Bukkit/bl...configuration/file/YamlConfiguration.java#L55 in the loadFromString() method, which is also used when you load an external configuration file.

    monir it's a YAML file that you need to post. It might be your config.yml, or does your plugin load other YAML files?
     
  20. Offline

    monir

    Here is the file. It's a YAML file for Modifyworld (for PermissionsEx). When I change the config I get this character error, but this is the original config, and that one works.

    http://pastebin.com/9a8EFFDc
     
  21. monir Now the question is: how should we identify the charset of the wrongly encoded characters and convert them to the right one if you give us the file without them? :oops:

    Also, I think it would be better to upload to a binary file hoster, as Pastebin and co. may correct the character issues automatically.
     
  22. Offline

    Don Redhorse

    Guys, could you make sure to add any findings to the issue about this on the Bukkit bug tracker? I will try to get the issue number, but at the moment that is not possible.
     
  23. Offline

    monir

    The chatsensor folder contains the config as I edited it; after a restart it gets reset when I get the error.
    Here is the pastebin: http://pastebin.com/Bxk78pzb
    yml file and jar: http://speedy.sh/C7knb/chatsensor.rar

    I have this problem with 3 different plugins.
     
  24. Offline

    desht

    OK, your ChatCensor2/config.yml is in ISO8859-1 encoding, as I suspected - in particular, this string: "Det ordet är blockerat!"

    (ChatCensor/config.yml is plain ASCII so shouldn't be a problem)

    If you convert that config.yml with iconv like this:
    Code:
    $ cp config.yml config.yml.SAFE
    $ iconv -f ISO8859-1 -t UTF-8 config.yml.SAFE -o config.yml
    
    that should fix your problem.
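
    The same conversion can be sketched in Java for anyone who wants to do it from within a plugin or a small tool. This is only an illustration with the standard library; the class name Transcode is hypothetical, and you still have to know the source encoding, just as with iconv.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Bukkit.
final class Transcode {

    // Decode the source file with one charset and write it out with another.
    static void convert(Path source, Charset from, Path target, Charset to)
            throws IOException {
        String text = new String(Files.readAllBytes(source), from);
        Files.write(target, text.getBytes(to));
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempFile("config", ".yml.SAFE");
        Path target = Files.createTempFile("config", ".yml");
        try {
            // Create an ISO-8859-1 file containing the string from the thread.
            Files.write(source, "message: Det ordet \u00E4r blockerat!\n"
                    .getBytes(StandardCharsets.ISO_8859_1));
            convert(source, StandardCharsets.ISO_8859_1,
                    target, StandardCharsets.UTF_8);
            // Reading the target as UTF-8 now yields the intended text.
            String result = new String(Files.readAllBytes(target),
                    StandardCharsets.UTF_8);
            System.out.println(result.contains("\u00E4r"));
        } finally {
            Files.deleteIfExists(source);
            Files.deleteIfExists(target);
        }
    }
}
```

    Reading from a backup copy and writing to a separate target keeps the original intact if the guessed source encoding turns out to be wrong.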
     
  25. Offline

    Scorpion_vn

    Hi, I just wish to add my two cents on the bug:
    I have an Ubuntu 11.10 laptop and my UTF-8 files were working great.
    Then, after the setup, I sent the files to an Ubuntu Server and I got that error.
    So the problem is in the environment.
    I list my 'locale' outputs here:

    Laptop - works great:
    Code:
    mine@tonigh:~/MCServer$ locale
    LANG=bg_BG.UTF-8
    LANGUAGE=bg:en
    LC_CTYPE=bg_BG.UTF-8
    LC_NUMERIC="bg_BG.UTF-8"
    LC_TIME="bg_BG.UTF-8"
    LC_COLLATE=bg_BG.UTF-8
    LC_MONETARY="bg_BG.UTF-8"
    LC_MESSAGES=bg_BG.UTF-8
    LC_PAPER="bg_BG.UTF-8"
    LC_NAME="bg_BG.UTF-8"
    LC_ADDRESS="bg_BG.UTF-8"
    LC_TELEPHONE="bg_BG.UTF-8"
    LC_MEASUREMENT="bg_BG.UTF-8"
    LC_IDENTIFICATION="bg_BG.UTF-8"
    LC_ALL=
    The server - throws error:

    Code:
    root@Server27:~# locale
    locale: Cannot set LC_CTYPE to default locale: No such file or directory
    locale: Cannot set LC_MESSAGES to default locale: No such file or directory
    locale: Cannot set LC_ALL to default locale: No such file or directory
    LANG=bg_BG.UTF-8
    LANGUAGE=
    LC_CTYPE=bg_BG.UTF-8
    LC_NUMERIC="bg_BG.UTF-8"
    LC_TIME="bg_BG.UTF-8"
    LC_COLLATE=bg_BG.UTF-8
    LC_MONETARY="bg_BG.UTF-8"
    LC_MESSAGES=bg_BG.UTF-8
    LC_PAPER="bg_BG.UTF-8"
    LC_NAME="bg_BG.UTF-8"
    LC_ADDRESS="bg_BG.UTF-8"
    LC_TELEPHONE="bg_BG.UTF-8"
    LC_MEASUREMENT="bg_BG.UTF-8"
    LC_IDENTIFICATION="bg_BG.UTF-8"
    LC_ALL=
    Hope that helps
     