Change the encoding scheme while parsing an XML file

Hi there,

I have a problem parsing an XML file with the SAX API. The XML file comes live from the internet. The problem is that the file does not declare an encoding, so the parser falls back to the default encoding.

When special characters (say, from special fonts) appear in it, the parser throws a UTFDataFormatException at runtime. But I found that if I change the encoding to ISO-8859-1, no error is thrown.

So my question is: can I specify, at runtime, the encoding to be used for the XML file being parsed?

If so, how?

Thanks

Mum

[606 byte] By [mumtazalam2k] at [2007-9-19]
# 1

The default encoding for an XML file is UTF-8, which is probably not the default Java encoding for your system. The first "default" is a rule of XML, the second is a rule of Java. If changing the encoding of the XML file to ISO-8859-1 fixes the problem, then that means that whatever is producing the XML file is in error. You need to report that to whoever is sending you those files.
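
For the original question of choosing the encoding at runtime, here is a minimal sketch, assuming the bytes really are ISO-8859-1. It hands the parser a character stream instead of a byte stream, so the decoding is done by an InputStreamReader with the charset you pick. The class name ParseWithEncoding, the method parseText, and the sample document are made up for illustration. InputSource also has a setEncoding method for byte streams, which may be enough depending on the parser.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ParseWithEncoding {

    /** Decodes the stream with the given charset, then parses it with SAX and returns the text content. */
    static String parseText(InputStream in, String encoding) throws Exception {
        final StringBuilder text = new StringBuilder();
        // A character stream is read directly by the parser, so the
        // charset chosen here decides how the bytes are decoded.
        InputSource source = new InputSource(new InputStreamReader(in, encoding));
        SAXParserFactory.newInstance().newSAXParser().parse(source, new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical input: byte 0xE9 is e-acute in ISO-8859-1 but is not valid UTF-8 on its own.
        byte[] xml = "<doc>caf\u00e9</doc>".getBytes("ISO-8859-1");
        System.out.println(parseText(new ByteArrayInputStream(xml), "ISO-8859-1"));
    }
}
```

This only works around the problem on the receiving side. If the sender's files carry no declaration and are not UTF-8, they are still malformed XML.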

DrClap at 2007-7-4 > top of java,Enterprise & Remote Computing,Enterprise Technologies...
# 2
Hi, I am facing the same problem. So, DrClap, is there any way to do this? I can report it back, but to be on the safe side I want to change the XML file's encoding myself. I am also using SAX to parse. Thanks in advance, Shaan
Shaaan at 2007-7-4
# 3
Write a program that reads it in and writes out a modified version with the correct encoding. Or better still, harass the person who is sending you malformed XML files and get them to do that.
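
A rough sketch of such a program, assuming the incoming bytes are ISO-8859-1 and that the file carries no XML declaration naming a different encoding (if it does, the declaration would also need rewriting). The class name Transcode and the method transcode are invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Transcode {

    /** Reads bytes in sourceEncoding and writes the same characters out as UTF-8. */
    static void transcode(InputStream in, OutputStream out, String sourceEncoding) throws IOException {
        Reader reader = new InputStreamReader(in, sourceEncoding);
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        char[] buffer = new char[4096];
        int count;
        while ((count = reader.read(buffer)) != -1) {
            writer.write(buffer, 0, count);
        }
        writer.flush(); // push any buffered characters through to the output stream
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical input: "<a>", byte 0xE9 (e-acute in ISO-8859-1), "</a>"
        byte[] latin1 = {'<', 'a', '>', (byte) 0xE9, '<', '/', 'a', '>'};
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transcode(new ByteArrayInputStream(latin1), out, "ISO-8859-1");
        System.out.println(out.size() + " bytes written as UTF-8");
    }
}
```

The output can then be given to the parser as ordinary UTF-8, which is what an XML file without a declaration is expected to be.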
DrClap at 2007-7-4
# 4

Help...

How do you read in and write a modified version with the correct encoding? I am doing this, but it gives me an error:

URL urlAddress = new URL("http://www.mywebsite.com/index.html");
HttpURLConnection httpConnect = (HttpURLConnection) urlAddress.openConnection();
httpConnect.setDoOutput(false);
httpConnect.setDoInput(true);
httpConnect.setAllowUserInteraction(false);

BufferedReader bufferPage = new BufferedReader(new InputStreamReader(httpConnect.getInputStream(), "UTF-8"));
String sTemp;
while ((sTemp = bufferPage.readLine()) != null) {
    sbPage.append(sTemp);
}
bufferPage.close();

String theHTMLString = sbPage.toString();

First of all, theHTMLString is null... why?

If I don't pass "UTF-8" to the InputStreamReader, it is OK, but then I get an error when I use theHTMLString in the XML parser, like this:

StringBuffer newFile = new StringBuffer();
newFile.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
newFile.append(theHTMLString);
ByteArrayInputStream byteSource = new ByteArrayInputStream(newFile.toString().getBytes());

public static Document parse(ByteArrayInputStream bytesource) throws SAXException, IOException
{
    DOMParser parser = new DOMParser();
    try
    {
        InputSource source = new InputSource(bytesource);
        parser.parse(source);
    }
    catch (SAXParseException spe)
    {
        // GOT ERROR HERE! HELP!
    }
    return parser.getDocument();
}

Error message:

Can't open file: due to: java.io.UTFDataFormatException: invalid byte 1 of 1-byte UTF-8 sequence (0x92)

Please tell me what I am doing wrong.

I am reading a stream of HTML over HTTP and trying to parse it with DOMParser.

Please help!

stormlover23 at 2007-7-4
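
A possible explanation for the 0x92 error above, offered as a guess: getBytes() with no argument encodes the string in the platform default charset, while the prepended declaration says UTF-8. On a machine whose default is windows-1252, a right single quotation mark (U+2019) becomes the single byte 0x92, which is not valid UTF-8. The sketch below encodes the bytes with the same charset the declaration names. It uses the standard JAXP DocumentBuilder rather than the Xerces DOMParser from the post, and MatchEncoding, parseString, and the sample markup are made-up names. It also assumes the body is well-formed XML, which arbitrary HTML often is not.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class MatchEncoding {

    /** Prepends a UTF-8 declaration and encodes the bytes as UTF-8 so the two agree. */
    static Document parseString(String body) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" + body;
        // Explicit charset: do not rely on the platform default here.
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical body containing U+2019, the character behind byte 0x92 in windows-1252.
        Document doc = parseString("<p>it\u2019s</p>");
        System.out.println(doc.getDocumentElement().getTextContent().length() + " characters parsed");
    }
}
```

The same care applies when reading the page: the charset given to InputStreamReader should be the one the server actually used, not necessarily UTF-8.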