Change the encoding scheme while parsing an XML file
Hi there,
I have a problem where I am parsing an XML file using the SAX API. The XML file comes live from the internet. The problem is that the file does not declare any encoding, so the parser falls back to the default encoding.
When special characters appear (for example, characters from special fonts), the parser throws a UTFDataFormatException at runtime. But I found that if I change the encoding to ISO-8859-1, no error is thrown.
So my question is: can I specify, at runtime, the encoding to be used for the XML file being parsed?
If so, how?
Thanks
Mum
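One possible answer, as a minimal sketch: the SAX InputSource class has a setEncoding method that tells the parser how to decode a byte stream that carries no encoding declaration. The class name SaxEncodingSketch, the helper parseWithEncoding, and the sample <msg> document are made up for illustration; ISO-8859-1 is taken from the post above and may not be the feed's real encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEncodingSketch {

    /** Collects character data so the parsed text can be inspected. */
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Parses the byte stream, telling the parser which encoding the bytes use. */
    static String parseWithEncoding(InputStream in, String encoding) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        InputSource source = new InputSource(in);
        source.setEncoding(encoding); // hypothetical choice; use the feed's real encoding
        TextCollector handler = new TextCollector();
        parser.parse(source, handler);
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Sample document with no XML declaration; the accented e is the single byte 0xE9.
        byte[] xml = "<msg>caf\u00e9</msg>".getBytes("ISO-8859-1");
        String text = parseWithEncoding(new ByteArrayInputStream(xml), "ISO-8859-1");
        System.out.println(text.length());
    }
}
```

Wrapping the stream in an InputStreamReader with the same charset name and passing that Reader to InputSource is an alternative that should have the same effect.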
Help...
How do you read in a page and write a modified version with the correct encoding? I am doing this, but it gives me an error:
// new URL(...) needs a protocol, otherwise it throws MalformedURLException
URL urlAddress = new URL("http://www.mywebsite.com/index.html");
HttpURLConnection httpConnect = (HttpURLConnection) urlAddress.openConnection();
httpConnect.setDoOutput(false);
httpConnect.setDoInput(true);
httpConnect.setAllowUserInteraction(false);
BufferedReader bufferPage = new BufferedReader(new InputStreamReader(httpConnect.getInputStream(),"UTF-8"));
StringBuffer sbPage = new StringBuffer(); // accumulates the page text
String sTemp;
while ((sTemp = bufferPage.readLine()) != null) {
    sbPage.append(sTemp);
}
bufferPage.close();
String theHTMLString = sbPage.toString();
First of all, theHTMLString is null... why?
If I don't pass "UTF-8" to the InputStreamReader, it is OK,
but I get an error when I use theHTMLString in the XML parser, like this:
StringBuffer newFile = new StringBuffer();
newFile.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
newFile.append(theHTMLString);
ByteArrayInputStream byteSource = new ByteArrayInputStream(newFile.toString().getBytes());
public static Document parse(ByteArrayInputStream bytesource) throws SAXException, IOException
{
    DOMParser parser = new DOMParser();
    try
    {
        InputSource source = new InputSource(bytesource);
        parser.parse(source);
    }
    catch (SAXParseException spe)
    {
        // GOT ERROR HERE! HELP!
    }
    return parser.getDocument();
}
Error message:
Can't open file: due to: java.io.UTFDataFormatException: invalid byte 1 of 1-byte UTF-8 sequence (0x92)
Please tell me what I am doing wrong.
I am reading a stream of HTML over HTTP and trying to parse it with DOMParser.
Please help!
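A likely explanation, offered as a sketch rather than a confirmed diagnosis: byte 0x92 is the curly apostrophe in windows-1252, and on its own it is not valid UTF-8. The code above prepends a declaration that claims UTF-8 but calls getBytes() with no argument, which encodes with the platform default charset, so the declaration and the actual bytes can disagree. The sketch below assumes the page really is windows-1252 (an assumption; the Content-Type header of the response is the place to check), decodes it with that charset, and re-encodes it as UTF-8 before parsing. It uses the standard JAXP DocumentBuilder in place of the Xerces DOMParser, and the names RecodeSketch and parseRecoded are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class RecodeSketch {

    // Assumed source encoding of the downloaded page; adjust to the real one.
    static final Charset SOURCE = Charset.forName("windows-1252");

    /** Decodes the raw page bytes, then parses them as UTF-8 XML. */
    static Document parseRecoded(byte[] pageBytes) throws Exception {
        // Decode with the encoding the bytes were actually written in.
        String page = new String(pageBytes, SOURCE);
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" + page;
        // Encode explicitly as UTF-8 so the bytes match the declaration.
        byte[] utf8 = xml.getBytes(StandardCharsets.UTF_8);
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new ByteArrayInputStream(utf8)));
    }

    public static void main(String[] args) throws Exception {
        // "<p>it?s</p>" where ? is the single byte 0x92 (curly apostrophe in windows-1252).
        byte[] page = {'<', 'p', '>', 'i', 't', (byte) 0x92, 's', '<', '/', 'p', '>'};
        Document doc = parseRecoded(page);
        System.out.println(doc.getDocumentElement().getTextContent().length());
    }
}
```

This only helps when the page is well-formed XML. Ordinary HTML often is not, and the parser would then fail for reasons unrelated to encoding.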