
Extract information from another website

How can I extract information from another website and display it on my webpage?
And which tool offers the best solution?
[128 byte] By [BhagyaLakshmi] at [2007-11-11 11:57:06]
# 1 Re: Extract information from another website
What information are you after and from what web site?
Hack at 2007-11-11 23:34:00 >
# 2 Re: Extract information from another website
If you just want to display a whole page, use frames, a frameset, or an iframe. But if you want specific data, I think you need some parsing.
Amahdy at 2007-11-11 23:35:11 >
# 3 Re: Extract information from another website
We will know exactly what he needs and from where when he posts back.
Hack at 2007-11-11 23:36:10 >
# 4 Re: Extract information from another website
I have a travel agency project.

A customer purchases train tickets through the agency. Each ticket has a specific ID, which can be used to track its status on the official website of the Railways.

I want the flexibility for the customer to enter the same ID on the agency's website, while in the background we check with the official website, extract the information, and present it on our website.

I hope I am clear.
BhagyaLakshmi at 2007-11-11 23:37:04 >
# 5 Re: Extract information from another website
OK. So do you need to read a page on the official Railways website, or query its database?
Hack at 2007-11-11 23:38:10 >
# 6 Re: Extract information from another website
Query its database.
BhagyaLakshmi at 2007-11-11 23:39:15 >
# 7 Re: Extract information from another website
Question number one: do you have the necessary access rights to connect to and query this database?

In other words, is this OK with the keeper of the Official website of Railways?
Hack at 2007-11-11 23:40:08 >
# 8 Re: Extract information from another website
I think it's OK, because I have seen similar websites.

Also, I only want information that is generally available anyway.
BhagyaLakshmi at 2007-11-11 23:41:11 >
# 9 Re: Extract information from another website
I don't think it's OK. The information is available, but nobody will give you access to the database unless you make a deal with the database owner.
I think you have to read the webpage, parse it, and then extract the necessary data from it.
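
As a rough sketch of what "read the webpage, parse it, extract the data" can look like, here is a Python example that uses only the standard library. The URL, the `pnr` query parameter, and the `id="status"` element are hypothetical placeholders, not the real Railways site's markup; you would have to inspect the real page (and check that the site allows this) to find the right values.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical address, used only for illustration.
STATUS_URL = "https://railways.example/status"


class ElementTextById(HTMLParser):
    """Collects the text inside the first element whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.tag = None      # tag name of the matched element
        self.depth = 0       # > 0 while we are inside the matched element
        self.found = False
        self.done = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0:
            if dict(attrs).get("id") == self.target_id:
                self.tag, self.depth, self.found = tag, 1, True
        elif tag == self.tag:
            self.depth += 1  # nested element with the same tag name

    def handle_endtag(self, tag):
        if self.depth > 0 and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_text_by_id(html, target_id):
    """Returns the element's text with whitespace collapsed, or None if absent."""
    parser = ElementTextById(target_id)
    parser.feed(html)
    parser.close()
    if not parser.found:
        return None
    return " ".join("".join(parser.chunks).split())


def fetch_status(ticket_id):
    """Downloads the (hypothetical) status page and extracts the status text."""
    url = STATUS_URL + "?" + urlencode({"pnr": ticket_id})
    with urlopen(url, timeout=10) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_text_by_id(html, "status")
```

The parsing step can be tried without any network access, for example `extract_text_by_id('<span id="status">Confirmed</span>', "status")`. Keeping the download and the parsing in separate functions makes it easier to adjust the parsing when the other site's markup changes.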
Amahdy at 2007-11-11 23:42:13 >
# 10 Re: Extract information from another website
I agree that Database access can bring legal issues.

But then how do I do it? I need to develop this webpage which can display the information from another website. Please help.

I don't know about parsing.
BhagyaLakshmi at 2007-11-11 23:43:14 >
# 11 Re: Extract information from another website
It's not a question of whether it's legal or not; the problem is that you really can't access the database at all. In any company, the database is usually the most secured part of the company's IT.

If you just want to display the output and not use the information yourself, what about using an IFRAME?

<iframe src="ANOTHER_WEB_SITE" width="SOME_WIDTH" height="SOME_HEIGHT"></iframe>
Amahdy at 2007-11-11 23:44:12 >
# 12 Re: Extract information from another website
Quoting BhagyaLakshmi:
"I agree that Database access can bring legal issues.

But then how do I do it? I need to develop this webpage which can display the information from another website. Please help."

You do it by following Amahdy's advice and contacting the database owners as step number one.
Hack at 2007-11-11 23:45:16 >
# 13 Re: Extract information from another website
There are ways to get the information you want.
Try googling "screen scraping".
An example using Active Server Pages follows:
======================================================
<%
' Screen-scrapes the current temperature for a ZIP code from the
' Weather Underground forecast page. The markers searched for below
' depend on that page's markup and stop matching if the markup changes.
Function CurrentTemperature(iZipCode)
    Dim sHTML, beginpos, endpos, srvXmlHttp, URL
    CurrentTemperature = "(not available)" ' default
    Set srvXmlHttp = Server.CreateObject("MSXML2.ServerXMLHTTP.3.0")
    URL = "http://www.wunderground.com/cgi-bin/findweather/getForecast?query=" & iZipCode
    srvXmlHttp.open "GET", URL, False
    srvXmlHttp.send
    If srvXmlHttp.status = 200 Then
        ' grab the HTML source for the entire page
        sHTML = srvXmlHttp.responseText
        ' find code that occurs just before the current temperature
        beginpos = InStr(sHTML, "<td class=""full"" id=""message2"">")
        If beginpos > 0 Then
            ' throw away everything before this
            sHTML = Mid(sHTML, beginpos)
            ' find code that occurs just after the temperature
            endpos = InStr(sHTML, "°")
            If endpos > 0 Then
                ' throw away everything after it
                sHTML = Left(sHTML, endpos)
                ' with what's left, find the tag just ahead of the temp
                beginpos = InStr(sHTML, "<b>")
                If beginpos > 0 Then
                    sHTML = Mid(sHTML, beginpos + 3)
                    ' with what's left, find the tag just after the temp
                    endpos = InStr(sHTML, "</b>")
                    ' grab the temp from between the tags and add a degree symbol
                    If endpos > 0 Then CurrentTemperature = Left(sHTML, endpos - 1) & "°"
                End If
            End If
        End If
    End If
    Set srvXmlHttp = Nothing
End Function ' CurrentTemperature
%>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<BODY>
<p>
<i>Currently in Houston...</i> <% =CurrentTemperature(77001) %>
</p>
</BODY>
</HTML>
bschaettle at 2007-11-11 23:46:12 >
# 14 Re: Extract information from another website
Yes, this was the second suggestion in my first post: some parsing. But I don't recommend this method, because it's not stable at all. What if the owner changes just a tag or a space? This happens frequently on most webpages, so how would you track it? Try to make a deal with the owner; maybe he will agree to create a limited user for you, although if it were me I wouldn't agree to reveal even the database location. This is very risky.
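
If you scrape anyway, one way to at least notice a changed tag is to make the scraper fail loudly instead of returning garbage. The Python sketch below is only an illustration: it reuses the two markers from the screen-scraping example above, and `LayoutChangedError`, `text_between`, and `extract_temperature` are hypothetical names made up for this post.

```python
import re


class LayoutChangedError(Exception):
    """Raised when the fetched page no longer matches the expected layout."""


def text_between(html, start_marker, end_marker):
    """Returns the text between two markers, or raises if either is missing."""
    start = html.find(start_marker)
    if start == -1:
        raise LayoutChangedError("start marker not found: %r" % start_marker)
    start += len(start_marker)
    end = html.find(end_marker, start)
    if end == -1:
        raise LayoutChangedError("end marker not found: %r" % end_marker)
    return html[start:end]


def extract_temperature(html):
    """Extracts the temperature and checks that it really looks like a number."""
    raw = text_between(html, '<td class="full" id="message2">', "°")
    value = re.sub(r"<[^>]+>", "", raw).strip()  # drop any remaining tags
    if not re.fullmatch(r"-?\d+(\.\d+)?", value):
        raise LayoutChangedError("unexpected value: %r" % value)
    return value + "°"
```

The calling page can catch `LayoutChangedError`, show "(not available)" to the visitor, and log the error so that you find out the other site changed. It does not make the method stable; it only makes the breakage visible.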
Amahdy at 2007-11-11 23:47:14 >