.NET – How can I determine the actual code page of System.Text.Encoding.SBCSCodePageEncoding

Question:

I am trying to localize an application that was originally written only for English-speaking users.

One part of the application parses user-submitted HTML files. Before parsing, it must determine the charset/encoding the HTML file was created with. If there is no <meta charset... tag, the application tries to convert the document from the server's default encoding to UTF-8 with this expression:

var text = Encoding.UTF8.GetString(
    Encoding.Convert(Encoding.Default, Encoding.UTF8, bytes));

Where bytes is a byte array from the input file.

This conversion does not handle non-English letters such as á, é, í, ó, ú, ñ properly.
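One way this symptom can arise, sketched under an assumption the question does not confirm: the file is actually UTF-8, but the bytes are decoded with a single-byte code page. The helper name DecodeAs is hypothetical, and ISO-8859-1 stands in here for whatever Encoding.Default resolves to on the server.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Hypothetical helper: the same conversion as in the question, but with
    // the source encoding passed in instead of hard-coded Encoding.Default.
    public static string DecodeAs(Encoding assumed, byte[] bytes)
    {
        return Encoding.UTF8.GetString(
            Encoding.Convert(assumed, Encoding.UTF8, bytes));
    }

    public static void Main()
    {
        // "á" (U+00E1) is two bytes in UTF-8: 0xC3 0xA1.
        byte[] bytes = Encoding.UTF8.GetBytes("\u00E1");

        // Assumption: a single-byte encoding stands in for Encoding.Default.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        // Each byte becomes its own character: U+00C3 followed by U+00A1.
        string wrong = DecodeAs(latin1, bytes);
        // Decoding with the encoding the bytes were written in round-trips.
        string right = DecodeAs(Encoding.UTF8, bytes);

        Console.WriteLine(wrong.Length); // 2
        Console.WriteLine(right.Length); // 1
    }
}
```

If the input files turn out to use some other encoding, the same helper can be called with that encoding to compare results.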

On my machine, Encoding.Default is Encoding.SBCSCodePageEncoding. I am trying to find out what code page is actually being used by this encoding, because from its source code (https://referencesource.microsoft.com/#mscorlib/system/text/sbcscodepageencoding.cs) it looks like it can behave differently depending on operating system or machine settings. How can I tell what this encoding is actually doing?
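As a starting point for inspecting the encoding, the public properties on System.Text.Encoding (CodePage, WebName, EncodingName, IsSingleByte) can be printed for whatever instance Encoding.Default returns. This is a minimal sketch. The Describe helper is a hypothetical name, and the values printed for Encoding.Default depend on the runtime and the machine's settings.

```csharp
using System;
using System.Text;

public static class EncodingInfoDemo
{
    // Hypothetical helper: summarises the public properties that identify
    // an encoding, regardless of its concrete (possibly internal) type.
    public static string Describe(Encoding encoding)
    {
        return string.Format(
            "{0}: CodePage={1}, WebName={2}, EncodingName={3}, IsSingleByte={4}",
            encoding.GetType().Name,
            encoding.CodePage,
            encoding.WebName,
            encoding.EncodingName,
            encoding.IsSingleByte);
    }

    public static void Main()
    {
        // Output varies by machine and runtime, so none is shown here.
        Console.WriteLine(Describe(Encoding.Default));

        // For comparison: UTF-8 always reports code page 65001.
        Console.WriteLine(Describe(Encoding.UTF8));
    }
}
```

Running this on the server itself, rather than on a development machine, shows which code page the conversion uses there.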

Original source:

https://stackoverflow.com/questions/47740873/net-how-can-i-determine-the-actual-code-page-of-system-text-encoding-sbcscode
