utf-8 encoding, SAX readers and currency symbols

Hi,

Do SAX readers have a problem with UTF-8 encoding?

I have a text box where the user enters details. If I enter £ or €, when the XML is written out (DOM writers) the symbols appear as Â£ and ?. The DOM reader can convert them back to £ and € when the field is repopulated, but the SAX readers cannot. We use DOM for an offline Swing application, and SAX readers for the online form and for uploading files created offline.

Can anyone help?

[466 byte] By [087643759] at [2007-9-19]
# 1
It appears that the DOM writer is producing correct UTF-8 encoding, based on your example. If your "SAX reader" is having a problem, what exactly is that problem? It throws an exception? It appears to handle the data incorrectly?
DrClap at 2007-7-4
# 2

Hi, sorry, perhaps I should have been more specific:

There are two text areas on the form where the user is free to enter alphanumerics and a selection of other characters, including £ and €.

On entering these characters the XML file will look like this:

<?xml version="1.0" encoding="UTF-8"?>// This is at the top of the XML file produced by the form

.. // various elements and subelements

..

<Indicators details="pound is Â£ and euro is ?" />

When the user reopens the page on the form in the offline application, it repopulates the field with

"pound is £ and euro is €". This application uses a DOM reader to convert the XML into objects.

In the online application the field repopulates with

"pound is Â£ and euro is ?". It uses a SAX reader to repopulate the fields.

No exception is thrown, and the website simply prevents progress because the angstrom (Â) is not a valid character for the field. Is it that the SAX reader, which extends DocumentHandler, cannot convert the UTF-8 encoding for the currency signs back into £ and € before it is converted into a String object for repopulation?

Thanks

087643759 at 2007-7-4
# 3

>Is it that the SAX reader - which extends DocumentHandler - cannot reconvert the utf-8 encoding for the currency signs back into £ and € before it is converted into a string object for repopulation.

Depends where the SAX reader is actually getting its data from. Your Java program will ALWAYS be working with Strings, which are never in any encoding. The encoding only describes how the data is stored outside Java. So it may be that you have a file that is encoded in UTF-8, and correctly says so in the XML header. But if you read that file in through some component -- for example a FileReader -- without specifying UTF-8 as the encoding, then you will get the result you are seeing.
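To illustrate, here is a minimal sketch of that effect. It is not the poster's code: the class name SaxEncodingDemo, the DetailsHandler class and the helper methods are made up for the example, and it assumes a SAX2 DefaultHandler rather than the older DocumentHandler. The same UTF-8 bytes are handed to the parser three ways: as a raw byte stream, as a Reader decoding UTF-8, and as a Reader decoding ISO-8859-1 (standing in for a FileReader on a machine whose default encoding is not UTF-8).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEncodingDemo {

    // Hypothetical handler: remembers the "details" attribute of <Indicators>.
    static class DetailsHandler extends DefaultHandler {
        String details;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            if ("Indicators".equals(qName)) {
                details = atts.getValue("details");
            }
        }
    }

    static String parse(InputSource source) throws Exception {
        DetailsHandler handler = new DetailsHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        return handler.details;
    }

    // The parser receives raw bytes and decodes them itself,
    // using the encoding named in the XML declaration.
    static String parseFromBytes(byte[] xml) throws Exception {
        return parse(new InputSource(new ByteArrayInputStream(xml)));
    }

    // The bytes are decoded with the given charset BEFORE the parser sees
    // them; with a character stream the XML declaration's encoding is ignored.
    static String parseFromReader(byte[] xml, Charset charset) throws Exception {
        Reader reader = new InputStreamReader(new ByteArrayInputStream(xml), charset);
        return parse(new InputSource(reader));
    }

    public static void main(String[] args) throws Exception {
        // \u00A3 is the pound sign, \u20AC the euro sign.
        String doc = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<Indicators details=\"pound is \u00A3 and euro is \u20AC\" />";
        byte[] utf8 = doc.getBytes(StandardCharsets.UTF_8);

        // What gets printed also depends on the console's own encoding.
        System.out.println(parseFromBytes(utf8));
        System.out.println(parseFromReader(utf8, StandardCharsets.UTF_8));
        System.out.println(parseFromReader(utf8, StandardCharsets.ISO_8859_1));
    }
}
```

In the third call the two UTF-8 bytes of the pound sign are decoded as two separate characters, so the String already contains Â followed by £ before any handler code runs. If the real application builds its InputSource from a FileReader or something similar, passing an InputStream, or a Reader with the charset given explicitly, may be enough to fix it.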

So it's probably not the SAX reader that is the problem, it's whatever is feeding the SAX reader. And that we have no information about.

DrClap at 2007-7-4