Programming Language · 2017-12-12

.NET – How can I determine the actual code page of System.Text.Encoding.SBCSCodePageEncoding


I am trying to localize an application that was originally written only for English-speaking users.

There is one area where user-submitted HTML files are parsed. Before parsing, the application must determine the charset/encoding the HTML file was created with. If there is no <meta charset... tag, the application falls back to converting the document from the server's default encoding to UTF-8 with this expression:

var text = Encoding.UTF8.GetString(
    Encoding.Convert(Encoding.Default, Encoding.UTF8, bytes));

Where bytes is a byte array from the input file.

This encoding conversion is not converting non-English letters like á,é,í,ó,ú,ñ properly.
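To make the symptom concrete, here is a minimal sketch of how the same bytes decode differently depending on which source encoding is assumed. The DecodeDemo class, the Decode helper, and the sample bytes are made up for illustration; they assume the file was written in a single-byte encoding such as ISO-8859-1, which may not be what the real input files use.

```csharp
using System;
using System.Text;

public static class DecodeDemo
{
    // Hypothetical helper: decodes raw bytes with an explicitly chosen source encoding.
    public static string Decode(byte[] bytes, Encoding source)
    {
        return source.GetString(bytes);
    }

    public static void Main()
    {
        // Assumed sample input: "café" with é stored as the single byte 0xE9.
        byte[] bytes = { 0x63, 0x61, 0x66, 0xE9 };

        // Decoding with the encoding the bytes were written in recovers the letter.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        Console.WriteLine(Decode(bytes, latin1) == "caf\u00E9");

        // Decoding the same bytes as UTF-8 cannot recover it, because a lone
        // 0xE9 is not a complete UTF-8 sequence.
        Console.WriteLine(Decode(bytes, Encoding.UTF8) == "caf\u00E9");
    }
}
```

Note that the Encoding.Convert call followed by Encoding.UTF8.GetString in my expression should give the same string as Encoding.Default.GetString(bytes), so the result depends entirely on whether Encoding.Default matches the encoding the file was actually saved in.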

On my machine, Encoding.Default is Encoding.SBCSCodePageEncoding. I am trying to find out what code page is actually being used by this encoding, because judging from its source code, it looks like it can behave differently depending on operating system or machine settings. How can I tell what this encoding is actually doing?
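One way I could check this is to print the identifying properties that every Encoding instance exposes. The sketch below uses the public CodePage, WebName, and EncodingName properties; the EncodingInspector class and Describe helper are names invented for this example, and the output will vary by machine and runtime.

```csharp
using System;
using System.Text;

public static class EncodingInspector
{
    // Hypothetical helper: summarizes the identifying properties of an encoding.
    public static string Describe(Encoding encoding)
    {
        return string.Format(
            "{0}: CodePage={1}, WebName={2}, EncodingName={3}",
            encoding.GetType().Name,
            encoding.CodePage,
            encoding.WebName,
            encoding.EncodingName);
    }

    public static void Main()
    {
        // The result depends on the OS, the machine's regional settings,
        // and the .NET runtime in use.
        Console.WriteLine(Describe(Encoding.Default));
    }
}
```

Running this on the server rather than on a development machine should show which code page the fallback conversion is really assuming there.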


