feat: support charset #442

Merged
SunsetTechuila merged 13 commits into main from encoding on Jan 21, 2026

Conversation

@SunsetTechuila (Member) commented Jun 2, 2025

resolves #35

Please fill in this template.

  • Use a meaningful title for the pull request.
  • Use meaningful commit messages.
  • Run tsc w/o errors (same as npm run build).
  • Run npm run lint w/o errors.

@SunsetTechuila SunsetTechuila requested a review from xuhdev June 2, 2025 22:08
@SunsetTechuila

This comment was marked as resolved.

@SunsetTechuila

This comment was marked as outdated.

@marcoburato

@SunsetTechuila Could you please add the latin1 encoding? Looks like it's missing.

@SunsetTechuila
Member Author

SunsetTechuila commented Sep 17, 2025

instead of reading the file as if it's in the target encoding, convert it to the target encoding

This should be implemented on the VS Code side

@marcoburato

instead of reading the file as if it's in the target encoding, convert it to the target encoding

Could you clarify what you mean by this? My understanding is that the charset option indicates the encoding to use when reading/writing the file, which is important if it can’t be guessed correctly by VS Code. This is the problem I have: VS Code guesses UTF-8 as the encoding for old projects that use ISO 8859-1, and I can’t change it. So, I would create a .editorconfig with charset=latin1.
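For reference, the kind of .editorconfig I have in mind would be something like this (a minimal sketch; the top-level glob is just an example):

```ini
# Assumed usage: force Latin-1 for a legacy project that VS Code
# would otherwise open as UTF-8.
[*]
charset = latin1
```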

I guess the conversion would make sense if the file is UTF16 and the .editorconfig charset is set to UTF8 or vice versa. The problem is that, if there is no BOM, there is no bulletproof way to determine the current file encoding (which is the whole point of the charset option).

So, maybe the logic should be to use:

  • when reading, the charset value when there is no BOM, otherwise the UTF encoding signaled by the BOM
  • when writing, always the charset value
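That rule could be sketched roughly like this (a non-authoritative sketch; `sniffBom`, `encodingForRead`, and `encodingForWrite` are illustrative names, not part of this extension):

```typescript
type Encoding = "utf8" | "utf8bom" | "utf16le" | "utf16be" | "latin1";

// Return the encoding signaled by a UTF BOM at the start of the file, if any.
function sniffBom(bytes: Uint8Array): Encoding | null {
  if (bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) return "utf8bom";
  if (bytes[0] === 0xff && bytes[1] === 0xfe) return "utf16le";
  if (bytes[0] === 0xfe && bytes[1] === 0xff) return "utf16be";
  return null;
}

// Reading: prefer the BOM's encoding; fall back to the charset value.
function encodingForRead(bytes: Uint8Array, charset: Encoding): Encoding {
  return sniffBom(bytes) ?? charset;
}

// Writing: always use the charset value.
function encodingForWrite(charset: Encoding): Encoding {
  return charset;
}
```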

@SunsetTechuila
Member Author

Could you clarify what you mean with this?

Opening with autodetection and then switching to the specified encoding

This is the problem I have: VS Code guesses UTF-8 as the encoding for old projects that use ISO 8859-1, and I can’t change it.

For this use case, the behavior I described would be problematic.

Maybe we should add an option to disable encoding conversion. Or, should we just not introduce it at all? @xuhdev, what's your opinion? How do other plugins handle this?

@styx3r

styx3r commented Dec 20, 2025

Hi,

is there any progress on this PR?

It would be very useful to have this feature.

@SunsetTechuila
Member Author

SunsetTechuila commented Dec 22, 2025

is there any progress on this PR?

No, first I need to check how other editors and plugins handle encoding - do they simply read files in the target encoding, or detect the file's actual encoding and re-save it in the target one?

Feel free to share info on this if you'd like to help speed up merging the PR.

@styx3r

styx3r commented Dec 27, 2025

As far as I know, QtCreator, for example, displays a warning if the specified encoding is not the same as the one used in the files.

Neovim, for example, opens the file in the target encoding and saves it in the target encoding.

IMO the safest way is to ASSUME that the file is in the target encoding, because detecting encodings is always a gamble.

@marcoburato

@styx3r If a UTF BOM is present, there's no guesswork: it indicates the encoding the file uses. So, if a UTF BOM is present and charset specifies a different encoding, then using what charset dictates is guaranteed to pick the wrong encoding. This is why I say it makes no sense.

I believe the only sensible thing to do is to open the file with the UTF encoding indicated by the BOM, if present, and otherwise use what charset indicates.

@SunsetTechuila I think the main issue here is that the .editorconfig specification doesn't define what the charset property actually does. It only lists the possible values. IMHO, the right thing to do is to improve the spec to define what it's supposed to do. It should state what to do when a UTF BOM is present and whether a file should be converted (read with encoding X and re-written with encoding Y) when it's not in the expected encoding.

Personally, I would be concerned about silently converting a file's encoding without the user's explicit consent.

I did a code search on this project to see what other plugins do and it looks like most of them don't support it at all. So there doesn't seem to be a de-facto standard to follow.

@styx3r

styx3r commented Dec 28, 2025

@marcoburato I agree with the BOM argument. BUT you could always end up opening a binary file which starts with those three bytes, and then you would tinker with the binary file.

IMO it should always be the user's responsibility to ensure correct encoding in the used repository. This is IMO also the reason why there is no official definition of how to implement the charset feature.

Plus, if the encoding is changed, it should be caught during PR review anyhow.

@marcoburato

@marcoburato I agree with the BOM argument. BUT you could always end up opening a binary file which starts with those three bytes, and then you would tinker with the binary file.

Sorry, I don't understand this point... A binary file is by definition not a text file; there's no correct text encoding to use to read it. It makes no sense to open a binary file in a text editor.

Of course, one can always open a binary file by mistake. In that case, we should avoid corrupting it by attempting an automatic text encoding conversion when the file is opened.

This is a good example of why an automatic conversion could be problematic. In my opinion, we should never attempt to convert a file when it's just opened. If anything, it should be done when saving it. But again, I think it would cause more harm than good.

Anyway, it would be appropriate to have a test case for this scenario. So, it's good that you brought it up.

IMO it should always be the user's responsibility to ensure correct encoding in the used repository.

I agree; it's better to do whatever we can to leave the files as they are and let the users deal with conversions.

This is IMO also the reason why there is no official definition of how to implement the charset feature.

Well, if the spec doesn't fully define how it works, what's the point of the spec? I could understand if this were some kind of edge case, like very old encodings not officially supported by .editorconfig, but the issues we're discussing feel pretty substantial. Perhaps this is why virtually no plugin actually supports the charset property.

I think there's not much point in implementing it a certain way in VS Code without improving the spec. Otherwise, other plugins could eventually be implemented with incompatible behaviour.
Any thoughts, @xuhdev? I've seen several tickets related to charset in the editorconfig repo, all of which seem to be unresolved.

@styx3r

styx3r commented Jan 4, 2026

Of course, one can always open a binary file by mistake. In that case, we should avoid corrupting it by attempting an automatic text encoding conversion when the file is opened.

This is a good example of why an automatic conversion could be problematic. In my opinion, we should never attempt to convert a file when it's just opened. If anything, it should be done when saving it. But again, I think it would cause more harm than good.

@marcoburato that's what I meant. One could open a binary file by accident and then change the encoding without any manual interaction.

Well, if the spec doesn't fully define how it works, what's the point of the spec? I could understand if this was some kind of edge case, like very old encodings not officially supported by .editorconfig, but the issues we're discussing feel pretty substantial. Perhaps this is why virtually no plugin actually supports the charset property.

As already mentioned, nvim supports it by default: https://neovim.io/doc/user/plugins.html#editorconfig

charset editorconfig.charset

One of "utf-8", "utf-8-bom", "latin1", "utf-16be", or
"utf-16le". Sets the 'fileencoding' and 'bomb' options.

where it states

When 'fileencoding' is not UTF-8, conversion will be done when writing the file.
When reading a file 'fileencoding' will be set from 'fileencodings'. To read a file in a certain encoding it won't work by setting 'fileencoding', use the ++enc argument.

So, as far as I understand it, nvim does NOT convert the file to UTF-8 while reading it. It only converts it when it's saved to disk.

IMO VS Code could do it similarly.
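The nvim-style behaviour described above could be sketched like this (a minimal sketch using the standard TextDecoder/TextEncoder; `openFile` and `saveFile` are illustrative names, and a real implementation would need a library like iconv-lite for non-UTF-8 write targets):

```typescript
// On open: no conversion; just decode with whatever encoding was detected
// (or, per the earlier proposal, the charset value when there is no BOM).
function openFile(bytes: Uint8Array, detectedEncoding: string): string {
  return new TextDecoder(detectedEncoding).decode(bytes);
}

// On save: conversion happens only here. TextEncoder always emits UTF-8,
// so this sketch only covers a UTF-8 target charset.
function saveFile(text: string): Uint8Array {
  return new TextEncoder().encode(text);
}
```

So a Latin-1 byte like 0xE9 would be decoded to "é" on open without touching the file, and only re-encoded (here to UTF-8) when the user actually saves.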

@SunsetTechuila SunsetTechuila marked this pull request as ready for review January 20, 2026 08:40
@SunsetTechuila SunsetTechuila requested review from xuhdev and removed request for xuhdev January 20, 2026 08:40
@SunsetTechuila SunsetTechuila merged commit d90a98c into main Jan 21, 2026
3 checks passed
@SunsetTechuila SunsetTechuila deleted the encoding branch January 21, 2026 09:48


Development

Successfully merging this pull request may close these issues.

add support for charset

3 participants