
how to extract value from text?

I saved some HTML code into a text file. How can I get the values I want from these lines?

<TD BGCOLOR=#FFFFFF>2 Months</TD><TD BGCOLOR=#FFFFFF ALIGN=CENTER>3.10</TD></FONT></TR>
<TR><font face="Verdana, Arial, Helvetica, sans-serif" size=2><TD BGCOLOR=#FFCCCC>3 Months</TD><TD BGCOLOR=#FFCCCC ALIGN=CENTER>3.20</TD>

How can I get the values 2 Months, 3.10, 3 Months, 3.20?
Please help me, thank you.
I am using C#.
[514 byte] By [shanny] at [2007-11-11 8:10:29]
# 1 Re: how to extract value from text?
I suggest you create a custom class. If you don't want to do that, you may check the following links.

StringFilter for C#
http://www.devhood.com/tools/tool_details.aspx?tool_id=186

Removing HTML from the text in ASP
http://www.codeproject.com/asp/removehtml.asp

I think you may get some good ideas from them.
Sync at 2007-11-11 21:48:00 >
# 2 Re: how to extract value from text?
Thank you, your post did help me, but I have one more question.
After all the tags have been removed, my output looks like this: 2 months3.10 3months3.20
What method should I use so that I can get 3.10 when I search for 2 months?
Thank you.
shanny at 2007-11-11 21:48:55 >
# 3 Re: how to extract value from text?
Here is a method that might work for you. It's structured pretty strictly around your given inputs, though.

Dim HTMLText As String = [HTML String]

'remove all breaks
HTMLText = System.Text.RegularExpressions.Regex.Replace(HTMLText, vbCrLf, "")
'remove all the tags
HTMLText = System.Text.RegularExpressions.Regex.Replace(HTMLText, "(?:<[^>]*?>)", " ")
'remove extra spaces
HTMLText = System.Text.RegularExpressions.Regex.Replace(HTMLText, " +", " ").Trim
'parse leftover data
For Each match As System.Text.RegularExpressions.Match In System.Text.RegularExpressions.Regex.Matches(HTMLText, "(\d+ *\w*) *(\d+\.?\d*)")
MsgBox(match.Groups(1).Value & " " & match.Groups(2).Value)
Next
AdamP at 2007-11-11 21:49:53 >
# 4 Re: how to extract value from text?
I just realized you are using C#, sorry about that. If you cannot translate it, let me know.
AdamP at 2007-11-11 21:50:54 >
# 5 Re: how to extract value from text?
I really can't translate it to C#, because certain functions are not provided in C#. For example:

For Each match As System.Text.RegularExpressions.Match In System.Text.RegularExpressions.Regex.Matches(HTMLText, "(\d+ *\w*) *(\d+\.?\d*)")

This causes an error.
Would you mind translating it for me?
shanny at 2007-11-11 21:51:56 >
# 6 Re: how to extract value from text?
Here you go:

string HTMLText = this.textBox1.Text;

//remove all breaks (handles both \r\n and \n line endings)
HTMLText = System.Text.RegularExpressions.Regex.Replace(HTMLText, "\r?\n", "");
//remove all the tags
HTMLText = System.Text.RegularExpressions.Regex.Replace(HTMLText, "(?:<[^>]*?>)", " ");
//remove extra spaces
HTMLText = System.Text.RegularExpressions.Regex.Replace(HTMLText, " +", " ").Trim();
//parse leftover data
foreach (System.Text.RegularExpressions.Match match in System.Text.RegularExpressions.Regex.Matches(HTMLText, "(\\d+ *\\w*) *(\\d+\\.?\\d*)"))
{
    MessageBox.Show(match.Groups[1].Value + " " + match.Groups[2].Value);
}
AdamP at 2007-11-11 21:53:00 >
# 7 Re: how to extract value from text?
For some reason a space is placed in HTMLText, you'll have to remove that.
AdamP at 2007-11-11 21:54:04 >
# 8 Re: how to extract value from text?
Wow, thanks, the code works pretty nicely.
But can you help me solve one more problem?
The HTML file I get has many other unwanted words that are not inside the tags. For example, "fixed deposit
int count=0;"
1 month3.10
2 months3.20
What can I do to get only the data 1 month3.10 2month3.20?

With the above code I do get the value 1 month 3.10, but it also returns data I don't need.
Hope you can understand what I mean.
shanny at 2007-11-11 21:55:02 >
# 9 Re: how to extract value from text?
Or is there a pattern that matches only the "months" entries and returns their values?
This is the output I get after using the above code:
9999 9
000 0
0 6
1month 3.10
2month 3.20
3month 3.20
60 0
200 4

I only want the part that contains "month". Is there any way to do that?
Any help is much appreciated.
shanny at 2007-11-11 21:56:06 >
# 10 Re: how to extract value from text?
If you know the word Months will always appear, then change the last expression to:

(\\d+ *Months) *(\\d+\\.?\\d*)

and give it a try.

The line would then look like this:

foreach (System.Text.RegularExpressions.Match match in System.Text.RegularExpressions.Regex.Matches(HTMLText, "(\\d+ *Months) *(\\d+\\.?\\d*)"))
AdamP at 2007-11-11 21:57:10 >
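The sample output in post #9 shows lowercase "month" without an "s", while the pattern above looks for "Months" exactly. A minimal sketch of a more tolerant variant follows, assuming the text has already been stripped of tags as in post #6. The class and method names (MonthFilter, Extract) are made up for illustration and are not from the thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MonthFilter
{
    // Hypothetical helper: returns "period rate" strings for entries that mention month(s).
    public static string[] Extract(string text)
    {
        // IgnoreCase lets "Months", "months" and "month" all match;
        // "s?" makes the plural optional.
        MatchCollection matches = Regex.Matches(
            text, @"(\d+ *months?) *(\d+\.?\d*)", RegexOptions.IgnoreCase);

        string[] result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            result[i] = matches[i].Groups[1].Value + " " + matches[i].Groups[2].Value;
        }
        return result;
    }
}
```

Number pairs such as "9999 9" or "60 0" are skipped because they contain no "month" between the two numbers.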
# 11 Re: how to extract value from text?
Thanks for your post, it helps a lot.
shanny at 2007-11-11 21:58:08 >
# 12 Re: how to extract value from text?
One more question.
How should the regex be written if I want to substitute a value into it?
For example:
(\\w\\d+ *Months) *(\\d+\\.?\\d*)

I want to replace the word Months with a specific string,
like string subsitude="2Months";
so that the regex would look like

(\\w\\d+ *subsitude) *(\\d+\\.?\\d*)

What is the proper way to write this, since the above code doesn't work?
Kindly tell me, please. Thank you.
shanny at 2007-11-11 21:59:12 >
# 13 Re: how to extract value from text?
I'm not following what you are asking. Do you want to search for something other than the word "Months", or do you want to replace the word "Months" with something else?
AdamP at 2007-11-11 22:00:08 >
# 14 Re: how to extract value from text?
Erm, I mean how can I use a variable in the search instead of the literal word "month"?
For example, this is the original one: (\\w\\d+ *month) *(\\d+\\.?\\d*)
But now I want to replace "month" with a variable instead of putting the word itself in the pattern.

This is what I want:
string x = "month";
(\\w\\d+ *x) *(\\d+\\.?\\d*)

Is there any way to do it like this?
shanny at 2007-11-11 22:01:08 >
# 15 Re: how to extract value from text?
Is there any way to do this?
Or is my question silly because there is no such thing in regex?
Kindly tell me, please. Thank you.
shanny at 2007-11-11 22:02:12 >
# 16 Re: how to extract value from text?
Try this:

string str_variable = "Months";
"(\\w\\d+ *" + str_variable + ") *(\\d+\\.?\\d*)"

Be careful though, not every string will work! In fact it's possible to put a string in there that will crash the application, because an invalid pattern makes the regex throw an exception. So if you intend to leave that word up to your users, your best bet is to allow only numbers and letters in str_variable.
AdamP at 2007-11-11 22:03:11 >
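An alternative to restricting the input to letters and digits is to escape it before it goes into the pattern. The sketch below uses .NET's Regex.Escape for that. It is only an illustration under the assumption that the user text should be matched literally; the names SafePattern and Build are hypothetical and not from the thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SafePattern
{
    // Hypothetical helper: builds the search pattern around user-supplied text.
    public static string Build(string userText)
    {
        // Regex.Escape turns regex metacharacters such as ( [ * + . into literals,
        // so unexpected input cannot produce an invalid pattern.
        return "(\\d+ *" + Regex.Escape(userText) + ") *(\\d+\\.?\\d*)";
    }
}
```

For example, SafePattern.Build("Months(") yields a valid pattern that looks for a literal "(", whereas concatenating "Months(" unescaped would make the Regex constructor throw.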
# 17 Re: how to extract value from text?
Sorry to trouble you.
Is this the correct way? I can't get the value I want.

Or what if I want to do it like this?
string str_variable="2 Months";
"(" + str_variable + ") *(\\d+\\.?\\d*)"

What I actually intend to do is let users choose the number of months. The system then takes the value "2 Months" and matches it against the HTML file to find the rate (for example 3.10).

Hope you understand what I mean.
This one (which I edited myself) doesn't work:
string str_variable="2 Months";
"(" + str_variable + ") *(\\d+\\.?\\d*)"

Hope you can help. Thank you.
shanny at 2007-11-11 22:04:11 >
# 18 Re: how to extract value from text?
I tried using this code and it works, but it returns the values for both 2 months and 12 months.
May I know why it doesn't match the string exactly? Why is the value for 12 months returned as well?
string str_variable="2 Months";
"("+ str_variable +") *(\\d+\\.?\\d*)"
shanny at 2007-11-11 22:05:16 >
# 19 Re: how to extract value from text?
I think I understand what you mean. What I would do is go into the HTML file, grab all the usable data, and make a list of it. Then filter the list for the user depending on whether they choose 2 Months, 3 Months, or whatever else.
AdamP at 2007-11-11 22:06:19 >
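One way to read the "grab everything, then filter" suggestion is to parse every period/rate pair into a dictionary keyed by the number of months and look up the user's choice afterwards. The following is a rough sketch under that assumption, working on the tag-stripped text from post #6. RateTable and Parse are hypothetical names, and the rate is kept as text rather than converted to a number.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RateTable
{
    // Hypothetical helper: maps the number of months to its rate text, e.g. 2 -> "3.10".
    public static Dictionary<int, string> Parse(string text)
    {
        Dictionary<int, string> rates = new Dictionary<int, string>();
        foreach (Match m in Regex.Matches(
            text, @"(\d+) *months? *(\d+\.?\d*)", RegexOptions.IgnoreCase))
        {
            // Assumes the month count is small enough to fit in an int.
            rates[int.Parse(m.Groups[1].Value)] = m.Groups[2].Value;
        }
        return rates;
    }
}
```

A lookup could then be written as: string rate; bool found = RateTable.Parse(HTMLText).TryGetValue(2, out rate); Because the key is the whole number, 2 and 12 are separate entries.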
# 20 Re: how to extract value from text?
Yes, this is how I am doing it.
The problem I face now is matching "2 Months" and returning the value for that period. I tried this code, but it returns two values, one for 2 months and one for 12 months. May I know why, and how to modify it?

string str_variable="2 Months";
"("+ str_variable +") *(\\d+\\.?\\d*)"

I really hope you can help, because this is the last problem I am facing. Thank you.
shanny at 2007-11-11 22:07:10 >
# 21 Re: how to extract value from text?
Well you can try this if you want:

string str_variable = "2 Months";
"(?:\\A|[^\\d])(" + str_variable + ") *(\\d+\\.?\\d*)"
AdamP at 2007-11-11 22:08:12 >
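The reason "2 Months" also matched the 12-month row is that a regex may start matching anywhere in the text, including at the "2" inside "12 Months". The prefix (?:\\A|[^\\d]) only allows a match at the start of the text or after a non-digit character. A negative lookbehind expresses the same idea without consuming the preceding character. The sketch below is an illustration only; ExactPeriod and FindRate are hypothetical names, and Regex.Escape is added on the assumption that the label comes from user input.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExactPeriod
{
    // Hypothetical helper: returns the rate that follows an exact period label, or null.
    public static string FindRate(string text, string period)
    {
        // (?<!\d) refuses a match when a digit sits directly before the label,
        // so "2 Months" no longer matches inside "12 Months".
        string pattern = @"(?<!\d)" + Regex.Escape(period) + @" *(\d+\.?\d*)";
        Match m = Regex.Match(text, pattern);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```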
# 22 Re: how to extract value from text?
Wow, it works really nicely. You are so kind and so brilliant :D
Thank you very much, you really helped me a lot ;)
shanny at 2007-11-11 22:09:21 >
# 23 Re: how to extract value from text?
I'm glad I was able to help!
AdamP at 2007-11-11 22:10:16 >