[BLUEJ-508] System.in does not honour terminal encoding setting
The default charset for System.in is not the project charset but rather the system default charset, which leads to some nasty surprises for those outside the cosy world of the Latin-1 charset.
New projects have project.charset=UTF-8 set by default, whereas old projects from earlier BlueJ versions use the system default charset (e.g. project.charset=cp1252 on Windows). When project.charset=cp1252, it is impossible to use characters outside that charset, such as the Cyrillic or Hebrew alphabets, in string constants in a source file. Such characters can be used to construct a string using Tools -> Use Library Class, or even typed into the editor (although the file can't be saved correctly), but when the resulting string is inspected or displayed, BlueJ uses the project charset (cp1252) and shows nonsense. This is true even if the system charset is changed to UTF-8 before launching BlueJ, as BlueJ uses the project.charset value in preference to the system charset.
However, the reverse is true when it comes to System.in. If you write this:
bq. Scanner scan = new Scanner(System.in);
the charset used is not the project charset, which is used in all other situations, but the system charset (cp1252 on Windows). This means that a program can display a prompt in Cyrillic or Hebrew but cannot correctly read a response in the same alphabet. This surprises most people, as they assume that project.charset applies to all aspects of the project, including standard input and output, and not just the file encoding. The bluej.terminal.encoding setting in bluej.defs also fails to affect this behaviour.
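The effect can be reproduced outside BlueJ with a minimal sketch. It assumes, purely for illustration, that the typed text arrives as UTF-8 bytes and is decoded once with UTF-8 and once with windows-1252; the class and method names are hypothetical, and a ByteArrayInputStream stands in for System.in.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class CharsetMismatchDemo {

    // Reads the first line from the given bytes, decoding with the named charset.
    static String readLine(byte[] input, String charsetName) {
        try (Scanner scan = new Scanner(new ByteArrayInputStream(input), charsetName)) {
            return scan.nextLine();
        }
    }

    public static void main(String[] args) {
        // A six-letter Cyrillic word, written with Unicode escapes so that
        // the example does not depend on the encoding of this source file.
        String typed = "\u041f\u0440\u0438\u0432\u0435\u0442";

        // Assumption: the input arrives as UTF-8 bytes.
        byte[] bytes = (typed + "\n").getBytes(StandardCharsets.UTF_8);

        String matching = readLine(bytes, "UTF-8");
        String mismatched = readLine(bytes, "windows-1252");

        System.out.println("UTF-8 decode matches input:        " + matching.equals(typed));
        System.out.println("windows-1252 decode matches input: " + mismatched.equals(typed));
    }
}
```

Decoding with the matching charset round-trips the text, while the mismatched decode yields a different string, which is the kind of garbling described above.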
It is of course easily (if unpleasantly) worked around: add "-Dfile.encoding=UTF-8" to bluej.vm.args in bluej.defs (or set it in JAVA_TOOL_OPTIONS), or specify the charset as a parameter when constructing a scanner:
bq. Scanner scan = new Scanner(System.in,"UTF-8");
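A slightly more defensive version of that second workaround might look like the sketch below. The helper class, the method name, and the demo.input.charset system property are all hypothetical; the only library calls relied on are Charset.forName, Charset.defaultCharset, and the Scanner(InputStream, String) constructor.

```java
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.Scanner;

public class ConsoleInput {

    // Builds a Scanner that decodes with the named charset. Falls back to the
    // JVM default charset when the name is null, malformed, or unsupported.
    static Scanner open(InputStream in, String charsetName) {
        Charset charset = Charset.defaultCharset();
        if (charsetName != null) {
            try {
                charset = Charset.forName(charsetName);
            } catch (IllegalArgumentException e) {
                // Covers both IllegalCharsetNameException and
                // UnsupportedCharsetException; keep the default charset.
            }
        }
        return new Scanner(in, charset.name());
    }

    public static void main(String[] args) {
        // "demo.input.charset" is a made-up property used only by this sketch.
        String name = System.getProperty("demo.input.charset", "UTF-8");
        Scanner scan = open(System.in, name);
        if (scan.hasNextLine()) {
            System.out.println("Read: " + scan.nextLine());
        }
    }
}
```

Keeping the charset choice in one helper means a program only has to change it in one place, but it still has to be done in every project, which is why this remains a workaround rather than a fix.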
However, the least surprising approach would seem to be to use the project.charset as the default charset when launching the user VM so that the same charset is used both for input and output, and is thus the charset for the entire project as implied by the property name.
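As a rough illustration of that suggestion, the sketch below derives a -Dfile.encoding argument from the project.charset property when assembling the user VM arguments. It is not BlueJ's actual launch code: the class, the method, and the way the project properties are obtained are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class UserVmArgs {

    static final String ENCODING_FLAG = "-Dfile.encoding=";

    // Returns a copy of baseArgs with -Dfile.encoding=<project.charset> appended,
    // unless an explicit file.encoding argument is already present or the
    // project defines no charset.
    static List<String> withProjectCharset(List<String> baseArgs, Properties projectProps) {
        List<String> args = new ArrayList<>(baseArgs);
        for (String arg : baseArgs) {
            if (arg.startsWith(ENCODING_FLAG)) {
                return args; // respect an explicit user setting
            }
        }
        String charset = projectProps.getProperty("project.charset");
        if (charset != null && !charset.trim().isEmpty()) {
            args.add(ENCODING_FLAG + charset.trim());
        }
        return args;
    }

    public static void main(String[] args) {
        // Hypothetical project properties; in practice these would be loaded
        // from the project's own settings.
        Properties project = new Properties();
        project.setProperty("project.charset", "UTF-8");

        List<String> vmArgs = withProjectCharset(Arrays.asList("-Xmx256m"), project);
        System.out.println(vmArgs);
    }
}
```

Leaving an existing file.encoding argument untouched keeps the bluej.vm.args workaround working for anyone who already relies on it.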
Issue metadata
- Issue type: Task
- Priority: Medium
- Fix versions: 3.1.4