feat: support charset #442

Merged
SunsetTechuila merged 13 commits into main from encoding on Jan 21, 2026

Conversation

@SunsetTechuila (Member) commented Jun 2, 2025

resolves #35

Please fill in this template.

  • Use a meaningful title for the pull request.
  • Use meaningful commit messages.
  • Run tsc w/o errors (same as npm run build).
  • Run npm run lint w/o errors.

@SunsetTechuila SunsetTechuila requested a review from xuhdev June 2, 2025 22:08
@SunsetTechuila

This comment was marked as resolved.

@SunsetTechuila

This comment was marked as outdated.

@marcoburato

@SunsetTechuila Could you please add the latin1 encoding? Looks like it's missing.

@SunsetTechuila
Member Author

SunsetTechuila commented Sep 17, 2025

instead of reading the file as if it's in the target encoding, convert it to the target encoding

This should be implemented on the VS Code side

@marcoburato

instead of reading the file as if it's in the target encoding, convert it to the target encoding

Could you clarify what you mean by this? My understanding is that the charset option indicates the encoding to use when reading/writing the file, which is important if it can’t be guessed correctly by VS Code. This is the problem I have: VS Code guesses UTF-8 as the encoding for old projects that use ISO 8859-1, and I can’t change it. So, I would create a .editorconfig with charset=latin1.
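For reference, the kind of .editorconfig I have in mind would be something like this (a minimal sketch; the top-level glob is just an example):

```ini
# Assumed usage: force Latin-1 for a legacy project that VS Code
# would otherwise open as UTF-8.
[*]
charset = latin1
```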

I guess the conversion would make sense if the file is UTF16 and the .editorconfig charset is set to UTF8 or vice versa. The problem is that, if there is no BOM, there is no bulletproof way to determine the current file encoding (which is the whole point of the charset option).

So, maybe the logic should be to use:

  • when reading, the charset value when there is no BOM, otherwise the UTF encoding signaled by the BOM
  • when writing, always the charset value
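That rule could be sketched roughly like this (a non-authoritative sketch; `sniffBom`, `encodingForRead`, and `encodingForWrite` are illustrative names, not part of this extension):

```typescript
type Encoding = "utf8" | "utf8bom" | "utf16le" | "utf16be" | "latin1";

// Return the encoding signaled by a UTF BOM at the start of the file, if any.
function sniffBom(bytes: Uint8Array): Encoding | null {
  if (bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) return "utf8bom";
  if (bytes[0] === 0xff && bytes[1] === 0xfe) return "utf16le";
  if (bytes[0] === 0xfe && bytes[1] === 0xff) return "utf16be";
  return null;
}

// Reading: prefer the BOM's encoding; fall back to the charset value.
function encodingForRead(bytes: Uint8Array, charset: Encoding): Encoding {
  return sniffBom(bytes) ?? charset;
}

// Writing: always use the charset value.
function encodingForWrite(charset: Encoding): Encoding {
  return charset;
}
```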

@SunsetTechuila
Member Author

Could you clarify what you mean with this?

Opening with autodetection and then switching to the specified encoding

This is the problem I have: VS Code guesses UTF-8 as the encoding for old projects that use ISO 8859-1, and I can’t change it.

For this use case, the behavior I described would be problematic.

Maybe we should add an option to disable encoding conversion. Or, should we just not introduce it at all? @xuhdev, what's your opinion? How do other plugins handle this?

@styx3r

styx3r commented Dec 20, 2025

Hi,

is there any progress on this PR?

It would be very useful to have this feature.

@SunsetTechuila
Member Author

SunsetTechuila commented Dec 22, 2025

is there any progress on this PR?

No, first I need to check how other editors and plugins handle encoding - do they simply read files in the target encoding, or detect the file's actual encoding and re-save it in the target one?

Feel free to share info on this if you'd like to help speed up merging the PR.

@styx3r

styx3r commented Dec 27, 2025

As far as I know, QtCreator, for example, displays a warning if the specified encoding is not the same as the one used in the files.

Neovim, for example, opens the file in the target encoding and saves it in the target encoding.

IMO the safest way is to ASSUME that the file is in the target encoding, because detecting encodings is always a gamble.

@marcoburato

@styx3r If a UTF BOM is present, there's no guesswork: it indicates the encoding the file uses. So, if a UTF BOM is present and charset specifies a different encoding, then using what charset dictates is guaranteed to pick the wrong encoding. This is why I say it makes no sense.

I believe the only sensible thing to do is to open the file with the UTF encoding indicated by the BOM, if present, and otherwise use what charset indicates.

@SunsetTechuila I think the main issue here is that the .editorconfig specification doesn't define what the charset property actually does. It only lists the possible values. IMHO, the right thing to do is to improve the spec to define what it's supposed to do. It should state what to do when a UTF BOM is present and whether a file should be converted (read with encoding X and re-written with encoding Y) when it's not in the expected encoding.

Personally, I would be concerned about silently converting a file's encoding without the user's explicit consent.

I did a code search on this project to see what other plugins do and it looks like most of them don't support it at all. So there doesn't seem to be a de-facto standard to follow.

@styx3r

styx3r commented Dec 28, 2025

@marcoburato I agree with the BOM argument. BUT you could always end up opening a binary file which starts with those three bytes, and then you would tinker with the binary file.

IMO it should always be the user's responsibility to ensure correct encoding in the used repository. This is IMO also the reason why there is no official definition of how to implement the charset feature.

Plus, if the encoding is changed, it should be caught during PR review anyhow.

@marcoburato

@marcoburato I agree with the BOM argument. BUT you could always end up opening a binary file which starts with those three bytes, and then you would tinker with the binary file.

Sorry, I don't understand this point... A binary file is by definition not a text file; there's no correct text encoding to use to read it. It makes no sense to open a binary file in a text editor.

Of course, one can always open a binary file by mistake. In that case, we should avoid corrupting it by attempting an automatic text encoding conversion when the file is opened.

This is a good example of why an automatic conversion could be problematic. In my opinion, we should never attempt to convert a file when it's just opened. If anything, it should be done when saving it. But again, I think it would cause more harm than good.

Anyway, it would be appropriate to have a test case for this scenario. So, it's good that you brought it up.

IMO it should always be the user's responsibility to ensure correct encoding in the used repository.

I agree; it's better to do whatever we can to leave the files as they are and let the users deal with conversions.

This is IMO also the reason why there is no official definition of how to implement the charset feature.

Well, if the spec doesn't fully define how it works, what's the point of the spec? I could understand if this were some kind of edge case, like very old encodings not officially supported by .editorconfig, but the issues we're discussing feel pretty substantial. Perhaps this is why virtually no plugin actually supports the charset property.

I think there's not much point in implementing it a certain way in VS Code without improving the spec. Otherwise, other plugins could eventually be implemented with incompatible behaviour.
Any thoughts, @xuhdev? I've seen several tickets related to charset in the editorconfig repo, all of which seem to be unresolved.

@styx3r

styx3r commented Jan 4, 2026

Of course, one can always open a binary file by mistake. In that case, we should avoid corrupting it by attempting an automatic text encoding conversion when the file is opened.

This is a good example of why an automatic conversion could be problematic. In my opinion, we should never attempt to convert a file when it's just opened. If anything, it should be done when saving it. But again, I think it would cause more harm than good.

@marcoburato that's what I meant. One could open a binary file by accident and then change the encoding without any manual interaction.

Well, if the spec doesn't fully define how it works, what's the point of the spec? I could understand if this was some kind of edge case, like very old encodings not officially supported by .editorconfig, but the issues we're discussing feel pretty substantial. Perhaps this is why virtually no plugin actually supports the charset property.

As already mentioned, nvim supports it by default: https://neovim.io/doc/user/plugins.html#editorconfig

charset editorconfig.charset

One of "utf-8", "utf-8-bom", "latin1", "utf-16be", or
"utf-16le". Sets the 'fileencoding' and 'bomb' options.

where it states

When 'fileencoding' is not UTF-8, conversion will be done when writing the file.
When reading a file 'fileencoding' will be set from 'fileencodings'. To read a file in a certain encoding it won't work by setting 'fileencoding', use the ++enc argument.

So, as far as I understand it, nvim does NOT convert the file to UTF-8 while reading it. It only converts it when it's saved to disk.

IMO VS Code could do it similarly.
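The nvim-style behaviour described above could be sketched like this (a minimal sketch using the standard TextDecoder/TextEncoder; `openFile` and `saveFile` are illustrative names, and a real implementation would need a library like iconv-lite for non-UTF-8 write targets):

```typescript
// On open: no conversion; just decode with whatever encoding was detected
// (or, per the earlier proposal, the charset value when there is no BOM).
function openFile(bytes: Uint8Array, detectedEncoding: string): string {
  return new TextDecoder(detectedEncoding).decode(bytes);
}

// On save: conversion happens only here. TextEncoder always emits UTF-8,
// so this sketch only covers a UTF-8 target charset.
function saveFile(text: string): Uint8Array {
  return new TextEncoder().encode(text);
}
```

So a Latin-1 byte like 0xE9 would be decoded to "é" on open without touching the file, and only re-encoded (here to UTF-8) when the user actually saves.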

@SunsetTechuila SunsetTechuila marked this pull request as ready for review January 20, 2026 08:40
@SunsetTechuila SunsetTechuila requested review from xuhdev and removed request for xuhdev January 20, 2026 08:40
@SunsetTechuila SunsetTechuila merged commit d90a98c into main Jan 21, 2026
3 checks passed
@SunsetTechuila SunsetTechuila deleted the encoding branch January 21, 2026 09:48


Development

Successfully merging this pull request may close these issues.

add support for charset

3 participants