Parsing a Text File
Hi,
Does anybody have proven experiments or records on the complexity of parsing a text file in Java?
In detail, I want some flexibility in retrieving a string or element of my choice from my text file through my external application. In other words, how much of what XML parsing offers can be achieved with simple plain text files?
If anybody has code examples and is willing to share them, that would be a great help.
Thanks & Regards
Sridharan
Text file parsing can be complex or simple. It depends on how the data in the text file is created and delimited.
If you have a delimiter character (such as "|") then you can easily use StringTokenizer.
Otherwise you have to customize your solution. Is it dataname = datavalue, or some other layout?
There are many different considerations. I would need more info to actually assist.
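For the delimiter case mentioned above, a minimal sketch could look like the following. The class name PipeLineParser and the sample record are made up for illustration; only StringTokenizer itself comes from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class PipeLineParser {
    // Splits one line on '|' and returns the trimmed fields in order.
    // Note: StringTokenizer skips empty fields, so "a||b" yields [a, b].
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line, "|");
        while (tokens.hasMoreTokens()) {
            fields.add(tokens.nextToken().trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical record: application name, logging type, message.
        System.out.println(parseLine("MyApp|ERROR|disk full"));
    }
}
```

If empty fields must be preserved, String.split with a negative limit is probably a better fit than StringTokenizer.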
Hi,
Thanks for your response.
I'll tell you the rationale of our system. Our application has to write an output log in a specified format. We first started with an XML structure, as it gives us the flexibility of storing and retrieving data in a more meaningful and complex style. But on the other side, XML is quite space consuming (about 3 times the size of a flat file), so we are forced to go for text file logging.
Now what I'm looking for is how complex a design we could give this output text file, so that we achieve flexibility closer to the XML approach.
In XML, our structure is somewhat like this:
<Appln Name>
<Type of Logging >
<Message = Message1>
... tags corresponding to this message (with one to more attributes)
</Message>
<Message = Message2>
... tags corresponding to this message (with one to more attributes)
</Message>
........
</Type of Logging >
</Appln Name>
With this structure in place I can insert a particular tag into the structure or retrieve it from there. I want to achieve a similar design with the flat file.
I have a problem achieving this kind of structure with java.util.Properties, as it won't allow duplicate keys in the property list.
I'm doing some research on regular expressions in Java too.
I hope this gives a clear picture of my problem now. I would highly appreciate it if you could suggest a good design methodology to solve my issue.
Thanks & Regards
Sridharan
Hi,
Well, I don't know your entire requirement, in particular how you would be retrieving data from the text file.
Could you follow something like this?
1)
I assume all tags of the XML are compulsory.
Using BufferedWriter, write the value of each tag on a separate line (BufferedWriter.newLine()).
That way you know the format of the file, i.e. which line number represents what and where each record ends. Retrieve accordingly.
2)
Instead of java.util.Properties use java.util.ArrayList.
Add all the values, maybe as Strings (using delimiters for the attributes of a tag), in the order you want. You can also write the ArrayList to the file if persistence is required.
3) Have a delimiter for each tag and write the entries to the file so that you can retrieve them based on the delimiters.
Also, XML parsing would probably be slower than reading a flat file.
Hi Raghavendra,
Thanks for your response.
I think I was not clear in my explanation. Sorry about that.
Well, I need to design a text file structure which would give me the kind of flexibility that the XML format (the one in my earlier mail) is giving. Meaning, I won't have any tag-like structure anymore; I am looking for the text alternative to that.
The retrieval logic should, for example, fetch the value of a particular attribute in a specific Message tag under a particular LoggingType, and so on. Assume a kind of front end where I would enter a particular combination of the higher level tags (Appln Name, LoggingType, etc.) and require the values of the attributes corresponding to the combination which matches the input. (Remember, I'm describing all this with the XML architecture in mind. It needs to be achieved using a text file.)
And you are right, the tags I mentioned are compulsory and I need to map them to the equivalent text representation now.
I hope I'm making sense and being clearer now.
Thanks & Regards
Sridharan
If I get it right, you need a structured text file with a simpler structure than a whole XML file.
Why don't you use a property file and give a structure to the property names?
Something like:
main=<value>
main.firstelement=<value>
main.secondelement=<value>
main.firstelement.firstsecondlevelelement=<value>
and so on.
Then you just need a simple method to parse the property name and get its position in the virtual structure.
It is just as simple to retrieve a specific element in the file once you know its position in the virtual structure.
Just an idea; I've used it in a few apps and it works fast and clean.
Hope it helps
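As a rough sketch of that idea, assuming dotted names like the ones above: the class HierarchicalProps and its method names are hypothetical, while Properties.load and stringPropertyNames are standard JDK calls.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

class HierarchicalProps {
    // Loads properties from text in the standard key=value format.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns the names of the direct children of the given parent path,
    // derived purely from the dotted property names.
    static Set<String> childrenOf(Properties props, String parent) {
        String prefix = parent + ".";
        Set<String> children = new TreeSet<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(prefix)) {
                String rest = key.substring(prefix.length());
                int dot = rest.indexOf('.');
                children.add(dot < 0 ? rest : rest.substring(0, dot));
            }
        }
        return children;
    }
}
```

Since Properties does not allow duplicate keys, repeated elements such as several messages would need something like an index in the name (for example message.1, message.2). That naming is only a suggestion, not part of the original idea.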
Hi Irio,
Thanks for your reply.
But won't that be a kind of hardcoding of the combination? I see that the effort would grow if we change the combination at the front end or at the business level.
I understand that we can't have the exact XML functionality in a text file. But I'm still looking for a more flexible file format.
I would be glad to get more ideas on this issue, guys.
Thanks & Regards
Sridharan
Why can you not use XML functionality in the text file? Is it used by another application? Or is there another issue? It sounds like storing your data in XML format is appropriate for your application as described.
Export your XML with no white space to save space.
Does your log need to be read easily in a text editor? Some additional information would be helpful. Sorry, I got very busy yesterday and just got back to the forums.
Text parsing is a very extensive field of programming, with the best approach highly dependent on the format you're looking at. The virtue of using XML is that the parser you want already exists off the shelf.
For a more DIY approach I suggest you take a look at the StreamTokenizer class. This may well do the lexical analysis part of the job for you (with proper configuration).
Don't forget the regular expression pattern matching in J2SE 1.4! See the javadoc for java.util.regex for more info. Not necessarily optimal for XML parsing, but if you have a general text file, it works like a dream.
Cheers,
Colin
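To give a feel for the java.util.regex route, here is a small sketch that picks "name = value" pairs out of a text. The line format, the class name KeyValueRegex and the sample keys are assumptions for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValueRegex {
    // name: word characters or dots; value: the rest of the line, trimmed.
    private static final Pattern PAIR =
            Pattern.compile("\\s*([\\w.]+)\\s*=\\s*(.*?)\\s*");

    // Returns the pairs in file order; lines that do not match are skipped.
    static Map<String, String> parse(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : text.split("\r?\n")) {
            Matcher m = PAIR.matcher(line);
            if (m.matches()) {
                result.put(m.group(1), m.group(2));
            }
        }
        return result;
    }
}
```

A LinkedHashMap keeps the order of the file, but like Properties it holds only one value per name, so repeated names would overwrite each other.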
You can define your own format with some delimiters. For example, you can define one delimiter to mark the start of an application name, one to mark the start of the logging type name, and one to mark the start of a message, maybe with the next two bytes storing the message number. If each delimiter is one byte wide, you are not wasting much space. Let's say your three delimiters are some unprintable ASCII values X, Y and Z respectively. Then your file format will look like this:
Xapplication nameYlogging typeZ01some messageZ02some other messageXanother applicationYsome logging typeZ01some other message.
It would be very straightforward to write your own parser for this.
I guess using one file per application will make it easier to write new logs.
Hope it helps.
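A parser for such a format might be sketched as below. The concrete control characters (\u0001, \u0002, \u0003 standing in for X, Y, Z), the class DelimitedLogParser and the Entry type are assumptions; the format itself is the one described above, with a two-digit message number after Z.

```java
import java.util.ArrayList;
import java.util.List;

class DelimitedLogParser {
    // Assumed marker characters standing in for X, Y and Z.
    static final char APP = '\u0001';
    static final char TYPE = '\u0002';
    static final char MSG = '\u0003';

    // One message together with the application and logging type it belongs to.
    static final class Entry {
        final String app;
        final String type;
        final int number;
        final String text;

        Entry(String app, String type, int number, String text) {
            this.app = app;
            this.type = type;
            this.number = number;
            this.text = text;
        }
    }

    // Walks the data marker by marker; assumes well-formed input
    // (every message body starts with a two-digit number).
    static List<Entry> parse(String data) {
        List<Entry> entries = new ArrayList<>();
        String app = null;
        String type = null;
        int i = 0;
        while (i < data.length()) {
            char marker = data.charAt(i);
            int end = nextMarker(data, i + 1);
            String body = data.substring(i + 1, end);
            if (marker == APP) {
                app = body;
                type = null; // a new application resets the logging type
            } else if (marker == TYPE) {
                type = body;
            } else if (marker == MSG) {
                int number = Integer.parseInt(body.substring(0, 2));
                entries.add(new Entry(app, type, number, body.substring(2)));
            }
            i = end;
        }
        return entries;
    }

    // Index of the next marker character at or after 'from', or the end of the data.
    private static int nextMarker(String data, int from) {
        for (int i = from; i < data.length(); i++) {
            char c = data.charAt(i);
            if (c == APP || c == TYPE || c == MSG) {
                return i;
            }
        }
        return data.length();
    }
}
```

The retrieval described earlier in the thread (all messages for a given application and logging type) then becomes a simple filter over the returned list.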
Hi There,
This seems to be a different but good approach. Thanks, Zakariah. I'll look into it more deeply to see how it fits my needs.
As I mentioned already, we are bound to go with the text file approach, though XML would be best in our case.
And I don't clearly understand what you mean by "Exporting XML with no white spaces"; could you throw some more light on this, please?
Thanks & Regards
Sridharan
>
> And I'm not getting it clearly, what do you mean by
> "Exporting XML with no white spaces"; could you
> through some more light on this please.
>
Extend the output stream class and add a buffer that removes all whitespace. Or, if your information itself contains spaces, remove only the whitespace between tags and leave the text inside them alone. This effectively compresses your output.
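As a simpler stand-in for a custom output stream, the same effect can be sketched on a complete string with a regular expression. The class name XmlWhitespace is made up, and this is only safe under the assumption that whitespace between two tags never carries meaning in your log (no CDATA sections or whitespace-only values).

```java
class XmlWhitespace {
    // Removes whitespace that sits only between a closing '>' and the next '<',
    // leaving text inside elements untouched.
    static String compact(String xml) {
        return xml.replaceAll(">\\s+<", "><").trim();
    }

    public static void main(String[] args) {
        System.out.println(compact("<a>\n  <b>some text</b>\n</a>\n"));
    }
}
```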
As was mentioned above, there is StringTokenizer, where every delimiter is a single character. You can pass a set of delimiters, but each one is used independently, as in StringTokenizer(str, "@|") with "This|isTokenized@properly". Then decide whether the delimiters themselves should be returned as tokens (the third constructor argument, returnDelims).
I don't know much about the new method mentioned above. I have not had cause to use it yet.
I'm interested to know how it works out. Personally, I like XML because JAXB makes it so easy to use the data as classes.
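The delimiter-set behaviour can be tried out with a small helper like this one; MultiDelimiterDemo is a made-up name, and the three-argument StringTokenizer constructor is the standard JDK one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class MultiDelimiterDemo {
    // Collects all tokens; every character in 'delims' acts as its own delimiter.
    // With returnDelims = true the delimiters are returned as one-character tokens.
    static List<String> tokens(String text, String delims, boolean returnDelims) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, delims, returnDelims);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [This, isTokenized, properly]
        System.out.println(tokens("This|isTokenized@properly", "@|", false));
        // prints [This, |, isTokenized, @, properly]
        System.out.println(tokens("This|isTokenized@properly", "@|", true));
    }
}
```

Returning the delimiters is useful when different delimiters mean different things, because the parser can then see which one ended each token.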
Here's another idea.
We make extensive use of a simple list structure, expressed in text. There are only a few delimiters:
{},"
What we usually find ourselves doing is defining trees as nested lists. An example:
{ k1 = v1, k2 = v2, l3 = {k11 = v11}}
This is a list with 3 named entries; the last entry has as its value another list, which has one named entry.
For a particular application we typically establish some standard entry naming. (Note that entries need not be named. A list might just consist of a bunch of comma-separated values. Note also that I haven't mentioned the quotes; typically we try to distinguish numeric vs string values.)
We parse this structure into related TreeNode objects. A list is a TreeNode. A TreeNode has attributes, and since those attributes - as shown above - may include lists as values, those lists are parsed into TreeNodes.
Opinions:
o Light-weight XML
o Better than ini format because you don't have to think up a name for everything. Particularly useful for repeating data elements.
Facts:
o A Java parser can easily be written by hand or with a tokenizer such as StringTokenizer, and will probably be half a page of code.
o If you give TreeNode a robust toString() method, then you can easily make programmatic changes and write the results back out to a file.
o Easily reimplemented in just about any language as required.
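To make the list format concrete, here is one possible hand-written parser. It is a sketch under assumptions, not the poster's implementation: the TreeNode fields, the ListParser class and the exact grammar (entries separated by commas, optional "name =", double-quoted strings kept with their quotes) are guesses based on the example above.

```java
import java.util.ArrayList;
import java.util.List;

class TreeNode {
    final String name;   // null for unnamed entries
    final String value;  // null when this node is a list
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name, String value) {
        this.name = name;
        this.value = value;
    }

    // Writes the node back out in the same text format.
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        if (name != null) sb.append(name).append(" = ");
        if (value != null) return sb.append(value).toString();
        sb.append("{");
        for (int i = 0; i < children.size(); i++) {
            if (i > 0) sb.append(", ");
            sb.append(children.get(i));
        }
        return sb.append("}").toString();
    }
}

class ListParser {
    private final String src;
    private int pos;

    private ListParser(String src) { this.src = src; }

    static TreeNode parse(String text) {
        return new ListParser(text).parseList(null);
    }

    // list := '{' [ entry { ',' entry } ] '}'
    private TreeNode parseList(String name) {
        skipSpaces();
        if (next() != '{') throw new IllegalArgumentException("Expected '{'");
        TreeNode node = new TreeNode(name, null);
        skipSpaces();
        if (peek() == '}') { pos++; return node; }
        while (true) {
            node.children.add(parseEntry());
            skipSpaces();
            char c = next();
            if (c == '}') return node;
            if (c != ',') throw new IllegalArgumentException("Expected ',' or '}'");
        }
    }

    // entry := [ name '=' ] ( list | token )
    private TreeNode parseEntry() {
        skipSpaces();
        if (peek() == '{') return parseList(null);
        String first = parseToken();
        skipSpaces();
        if (peek() != '=') return new TreeNode(null, first);
        pos++; // consume '='
        skipSpaces();
        return peek() == '{' ? parseList(first) : new TreeNode(first, parseToken());
    }

    // Reads a double-quoted string (quotes kept) or a bare token up to a delimiter.
    private String parseToken() {
        int start = pos;
        if (peek() == '"') {
            pos = src.indexOf('"', pos + 1) + 1;
            if (pos == 0) throw new IllegalArgumentException("Unterminated string");
            return src.substring(start, pos);
        }
        while (pos < src.length() && "{},=".indexOf(src.charAt(pos)) < 0) pos++;
        return src.substring(start, pos).trim();
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    private char next() {
        if (pos >= src.length()) throw new IllegalArgumentException("Unexpected end");
        return src.charAt(pos++);
    }
}
```

Because toString() emits the same format that parse() reads, a tree can be changed in memory and written back out, as the second "fact" above suggests.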
Hi Sridhar, I found your requirement while I was searching for something. I need some help from you.
1) My requirement is like this. I have written a method which takes four parameters and writes them to a text file with the delimiter "|". Now we have to do some manipulation with that log file, so we thought we could write it in XML format, so that we can easily grab the fields.
My log file looks like this:
2345|1|23|34|
Now I want to change it to XML format, as you did for your log file. If you have code for this, it would be helpful for me. It's urgent. Can you send me the code?
2) And if you know how to update or delete a line in a text file without using a temp file, then please suggest how.
Thanks in advance
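A conversion of one such record to XML could be sketched like this. The element names record and field1, field2, ... are invented here, since the thread does not say what the four values mean; PipeToXml is likewise a hypothetical class name.

```java
class PipeToXml {
    // Converts one '|'-delimited line into a single <record> element.
    // Trailing empty fields (after the last '|') are dropped by split().
    static String toXml(String line) {
        StringBuilder sb = new StringBuilder("<record>");
        String[] fields = line.split("\\|");
        for (int i = 0; i < fields.length; i++) {
            String tag = "field" + (i + 1);
            sb.append('<').append(tag).append('>')
              .append(escape(fields[i]))
              .append("</").append(tag).append('>');
        }
        return sb.append("</record>").toString();
    }

    // Escapes the characters that are not allowed in XML element text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(toXml("2345|1|23|34|"));
    }
}
```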
Hi JJJavac,
Sorry for this delayed (too long) reply. I was away and only came back today. Well,
I need some more input.
1. Are the numbers delimited by | all at the same level, or do they have a parent-child relationship?
2. If so, could you please specify it?
3. By code, do you mean the code to convert text to XML, or something else?
In my case, we defined the XML structure first and then decided to achieve the same functionality with a text file. My XML structure is what I've already posted in this thread.
Regarding your second question,
1. Adding to the end of the file is simple: set the append flag to true in the FileWriter constructor. That only appends new lines, though; it does not change existing ones.
In fact, I have myself posted a query in the forum about inserting a line between existing lines of text.
Sorry again for the delay.
Thanks & Regards
Sridharan
Hi Sridhar, thanks for your reply. Here are some more details so that you can see the exact problem.
As I said earlier, my file will look like this:
2345|1|23|34|abcd|
2345|1|23|34|efgh|
I have to read this text file and find abcd or efgh. Once I have found it, I have to change the record's value and update the text file without using another file. OR
after finding the particular record, I have to delete the entire record without using another file.
I have gone through so many forums but I didn't find anything. Please help if you have time.
Thanks in advance
If possible, please write a small piece of code for me that does all of this.
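One way to do this without a second file is to read all lines into memory, change the list, and write it back over the same file. The sketch below assumes the file is small enough to fit in memory and that the value to look for is the last field of a record; RecordFile and its method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

class RecordFile {
    // Returns a copy of 'lines' where records ending in oldValue end in newValue instead.
    static List<String> updateLastField(List<String> lines, String oldValue, String newValue) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String[] f = line.split("\\|");
            if (f.length > 0 && f[f.length - 1].equals(oldValue)) {
                f[f.length - 1] = newValue;
                out.add(String.join("|", f) + "|");
            } else {
                out.add(line);
            }
        }
        return out;
    }

    // Returns a copy of 'lines' without the records whose last field equals 'value'.
    static List<String> deleteByLastField(List<String> lines, String value) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String[] f = line.split("\\|");
            if (f.length == 0 || !f[f.length - 1].equals(value)) {
                out.add(line);
            }
        }
        return out;
    }

    // Reads the whole file, applies the change in memory and overwrites the same file.
    static void rewrite(Path file, UnaryOperator<List<String>> change) {
        try {
            List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
            Files.write(file, change.apply(lines), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A call might look like RecordFile.rewrite(Paths.get("app.log"), lines -> RecordFile.updateLastField(lines, "abcd", "wxyz")), where app.log is a placeholder name. Note that if the program dies in the middle of the write, the file can be left incomplete, which is the usual reason for writing to a temp file and renaming it instead.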