I am trying to localize an application that was originally written only for English-speaking users.
There is one area where user-submitted HTML files are parsed. Before parsing, the application must determine the charset/encoding the HTML file was created with. If there is no
<meta charset... tag, the application then tries to convert the document from the server default encoding to UTF-8 with this expression:
var text = Encoding.UTF8.GetString(Encoding.Convert(Encoding.Default, Encoding.UTF8, bytes));
bytes is a byte array from the input file.
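As far as I can tell, this expression is just a roundabout way of decoding the bytes with the source encoding, so the result depends entirely on what Encoding.Default is. Below is a minimal sketch of that equivalence. It uses ISO-8859-1 as a stand-in for Encoding.Default (an assumption, only so the result is reproducible), and DecodeSketch, DecodeViaConvert and DecodeDirectly are illustrative names, not part of the application.

```csharp
using System;
using System.Text;

public static class DecodeSketch
{
    // Same shape as the expression above, with the source encoding passed in.
    public static string DecodeViaConvert(Encoding source, byte[] bytes)
    {
        return Encoding.UTF8.GetString(Encoding.Convert(source, Encoding.UTF8, bytes));
    }

    // Decodes the bytes directly with the source encoding, skipping the UTF-8 round trip.
    public static string DecodeDirectly(Encoding source, byte[] bytes)
    {
        return source.GetString(bytes);
    }

    public static void Main()
    {
        // Assumption: ISO-8859-1 stands in for Encoding.Default here.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        byte[] bytes = { 0x63, 0x61, 0x66, 0xE9 }; // "café" encoded as ISO-8859-1

        string viaConvert = DecodeViaConvert(latin1, bytes);
        string direct = DecodeDirectly(latin1, bytes);

        // Both paths are expected to produce the same string.
        Console.WriteLine(viaConvert == direct);
    }
}
```

If that holds, then wrong output should mean the bytes were not actually written in the encoding that Encoding.Default represents on this machine.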
This encoding conversion is not converting non-English letters like
On my machine, Encoding.Default is an Encoding.SBCSCodePageEncoding. I am trying to find out which code page this encoding actually uses, because from its source code (https://referencesource.microsoft.com/#mscorlib/system/text/sbcscodepageencoding.cs) it looks like it can behave differently depending on the operating system or machine settings. How can I tell what this encoding is actually doing?
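One thing I could try is printing the public properties of Encoding.Default, since CodePage and WebName should identify the code page regardless of the concrete class. This is only a sketch of that idea. EncodingInspector and Describe are names made up for illustration, and the output will differ per machine and runtime.

```csharp
using System;
using System.Text;

public static class EncodingInspector
{
    // Builds a one-line summary from public properties of an Encoding instance.
    public static string Describe(Encoding encoding)
    {
        return string.Format(
            "type={0}, codePage={1}, webName={2}, singleByte={3}",
            encoding.GetType().FullName,
            encoding.CodePage,
            encoding.WebName,
            encoding.IsSingleByte);
    }

    public static void Main()
    {
        // The machine-dependent encoding in question.
        Console.WriteLine(Describe(Encoding.Default));

        // A known encoding for comparison (code page 65001, web name "utf-8").
        Console.WriteLine(Describe(Encoding.UTF8));
    }
}
```

Running this on both my machine and the server should show whether the two defaults differ.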