Data Scraping Help

@HarleyDavidson, Nov 02.2009

Hello. I am new here and have limited knowledge of HTML. While I have made several different web pages over the years, I went from using Notepad to using Dreamweaver for my web design. I found it made things easier in the long run, but as a result I have lost some of my basic HTML reading/writing ability.

I have an older mobile phone which doesn't surf the web very well unless the website contains simple text. Since I started day trading on the stock market, I want to check current stock quotes on my phone, but its limitations make that difficult at best. I want to make a simple web page that displays specific stock quotes in real time as plain text: no graphs, Flash, images or other fancy stuff. I feel that data scraping from otcbb.com would be a fantastic way to get the quotes I am looking for. The problem is I have NO IDEA how to do this. I've read several tutorials and visited many websites that claim to dumb this down, but I must be awfully dumb, as I can't seem to figure it out. PHP and stuff like that is foreign to me.

Help?

Thanks!!
– Dan


HTML

16 Comments

@HarleyDavidson (author), Nov 02.2009

Perhaps I should simplify things to start with...

I want to go to http://www.otcbb.com/asp/watchlist.asp, scrape some data from that page and display it as plain text on my own page. Namely, I want to take the following data:

Stock Ticker (SDVI)

Current Price ($0.0000)

Can this be done? If so, what is the best way for a complete N00B to go about this? For those of you who wish to help, I work best with example code, but feel free to direct me however you wish.

Currently, I am using Dreamweaver 3 to build my web pages. I'm not sure that is relevant at all... But for now I am off to work. I'll check back in 9-10 hours.

Cheers!

- Dan

@opifex, Nov 02.2009

You would be better off using the XML feed from the company you trade through. That would give you more accurate data.

@HarleyDavidson (author), Nov 03.2009

Am I asking the impossible? Is it a dumb question? I'm not sure how to simplify my question any more than I already have... Could it be that there is no easy answer, and perhaps this is not a suitable challenge for a beginner?

I can find several open-source examples on the internet, but there is a good chance that I won't know what to do with the code and therefore won't be able to make it work.

One more thing, in response to the previous poster: thank you for your input, but I do my trading through my bank's online service. For now, and until I manage to get this job sorted out, I would prefer to scrape from the URL listed above.

Cheers!

- Dan

@criterion9, Nov 03.2009

Does the site you listed above provide XML data? Using XML data would be the most efficient option, since with screen scraping you'll have to update your code every time something changes on that page and breaks your scrape.
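If a site does offer XML, reading it takes very little code. A minimal sketch using PHP's built-in SimpleXML; the feed layout (`<quote><symbol>…</symbol><last>…</last></quote>`), the function name `parse_quote_xml` and the sample values are all hypothetical, since a real provider's element names will differ:

```php
<?php
// Minimal sketch: read a quote from an XML feed with SimpleXML.
// The feed layout (<quote><symbol>..</symbol><last>..</last></quote>) is
// hypothetical; a real provider's element names will differ.
function parse_quote_xml(string $xml): ?array
{
    // Buffer libxml errors so malformed input returns null quietly.
    $previous = libxml_use_internal_errors(true);
    $doc = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if ($doc === false) {
        return null;
    }
    return [
        'symbol' => (string) $doc->symbol,
        'price'  => (string) $doc->last,
    ];
}

// Made-up sample feed, for illustration only.
$quote = parse_quote_xml('<quote><symbol>SDVI</symbol><last>0.0123</last></quote>');
echo $quote['symbol'] . ': $' . $quote['price'] . "\n";
```

Because the values are looked up by element name, the code keeps working as long as the feed's structure stays the same, which is the stability advantage over scraping a page's layout.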

@HarleyDavidson (author), Nov 03.2009

Hmmm... Honestly, I don't know, nor do I know how to determine that... I suppose I am hopeless, huh? LoL

Here is the URL of the quote page for the specific stock I wish to use as an example: http://www.otcbb.com/asp/quote_module.asp?qm_page=77018&symbol=SDVI
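The ticker is just the `symbol` query-string parameter in that URL, so the address for another stock can be built rather than typed. A minimal sketch with PHP's `http_build_query()`, assuming `qm_page=77018` is not specific to SDVI (unverified); the function name `otcbb_quote_url` is made up for illustration:

```php
<?php
// Hypothetical helper: builds the OTCBB quote-module URL for a ticker.
// Assumption: qm_page=77018 works for symbols other than SDVI.
function otcbb_quote_url(string $symbol): string
{
    $params = [
        'qm_page' => 77018,
        'symbol'  => strtoupper(trim($symbol)),
    ];
    // http_build_query() URL-encodes each key and value; the explicit '&'
    // avoids hosts that set arg_separator.output to '&amp;'.
    return 'http://www.otcbb.com/asp/quote_module.asp?' . http_build_query($params, '', '&');
}
```

For example, `otcbb_quote_url('sdvi')` produces the SDVI URL above.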

@rnd_me, Nov 03.2009

Then you probably don't have access to XML.

Depending on how you scrape the HTML, you might not have to update your scraping code every time the content changes. I used to scrape with string functions and regexes, and I was always having to update my files. Now I use the HTML DOM instead. That way, you can treat the HTML as if it were XML, and your algorithm is a lot more reliable.
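A minimal sketch of the DOM approach using PHP's built-in `DOMDocument` and `DOMXPath` classes. It assumes the price sits in an element with `id="lastsale"`; that id and the function name `extract_by_id` are hypothetical, so check the real page's source for the element that actually holds the price:

```php
<?php
// Minimal sketch: scrape one value by treating the page as a DOM tree.
// Assumption: the value lives in an element whose id attribute is known;
// "lastsale" in the example is a hypothetical id, not taken from the real page.
function extract_by_id(string $html, string $id): ?string
{
    if ($html === '') {
        return null; // DOMDocument::loadHTML() rejects empty input
    }
    $dom = new DOMDocument();
    // Real-world HTML is rarely valid; buffer libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }
    $xpath = new DOMXPath($dom);
    // $id is inserted into the query as-is, so it must not contain double quotes.
    $nodes = $xpath->query('//*[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}
```

Because the lookup targets one element rather than the text around it, layout changes elsewhere on the page don't break it; it still fails if the site renames or removes that element.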

@HarleyDavidson (author), Nov 05.2009

I'm about ready to give up... I've been beating my head against the wall trying to get a handle on this, but I think it's out of my league. I've switched from trying to scrape the URL I posted above to trying to scrape from Google Finance. I don't know what I'm doing wrong (or right, for that matter). I'm getting nowhere fast!

Without code begging, I'm curious whether somebody might put together a simple page for me that scrapes the CURRENT stock price from Google Finance and displays it as simple text on an HTML page. Perhaps if I could look at somebody else's working code, I could learn from their example. Just curious... Like I said, I don't want to seem like I am begging for code...

If anyone is interested, let's use Google Finance as the data source, and specifically Google Finance's search result for the stock ticker TTWO. This is the ticker for the company Take-Two Interactive.

Any takers? Should somebody choose to put something together for me as an example, I'll provide my email address so that it can be emailed.

Cheers!!

- Dan

@opifex, Nov 05.2009

The best Google solution is the Google Finance API: you set up what you are following, and you won't need to "scrape" anything.

@Richard_William, Nov 11.2009

I would use the script WebPageToTable at http://www.biterscripting.com/SS_WebPageToCSV.html. It extracts a quote from a web page. You will find some examples on the net.

@wdunnhtmltip23, Nov 11.2009

How do I get my hands on an HTML parser? Suggestions, please. And how much would it cost?

@Richard_William, Nov 11.2009

[QUOTE]how to i get my hands on a html parser ?[/QUOTE]

It depends on what type of things you need to "parse out". As was already mentioned, Google Finance API (http://code.google.com/apis/finance/) is good for financial stuff. Biterscripting (http://www.biterscripting.com) is good for general purpose parsing.

[QUOTE]and how much would it cost ?[/QUOTE]

Most of these things are free.

@donatello, Nov 13.2009

Here is a screenscraper for you:

[code=php]
<?php
// Page to scrape: the OTCBB quote module for SDVI.
$url = "http://www.otcbb.com/asp/quote_module.asp?qm_page=77018&symbol=SDVI";

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the page as a string instead of printing it
// GoDaddy hosting requires this proxy; remove these two lines on other hosts.
curl_setopt($ch, CURLOPT_PROXYTYPE, CURLPROXY_HTTP);
curl_setopt($ch, CURLOPT_PROXY, "http://64.202.165.130:3128");
curl_setopt($ch, CURLOPT_TIMEOUT, 120);
$response = curl_exec($ch);
// curl_exec() returns false on failure (never an int), so test for false.
if ($response === false) {
    die("Errors: " . curl_errno($ch) . " : " . curl_error($ch));
}
curl_close($ch);

print $response;
?>
[/code]

I have it set to work with GoDaddy, which requires the proxy you see in the code. You can take that out, and strip out anything you don't want with preg_replace().
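Rather than deleting everything else with `preg_replace()`, it is often simpler to match just the piece you want with `preg_match()`. A minimal sketch, assuming the first `$n.nnnn`-style amount on the page is the current price (not verified against the real OTCBB page); `first_dollar_amount` is a hypothetical name:

```php
<?php
// Minimal sketch: pull the first dollar amount (e.g. "$0.0000") out of scraped HTML.
// Assumption: the first such amount on the page is the current price. On a real
// page you would anchor the pattern to nearby text such as a label, which is
// also why regex scrapers break when the page layout changes.
function first_dollar_amount(string $html): ?string
{
    // \$ matches a literal dollar sign; the capture group holds the number only.
    if (preg_match('/\$\s*(\d+(?:\.\d+)?)/', $html, $m)) {
        return $m[1];
    }
    return null; // no amount found
}
```

For example, `echo first_dollar_amount($response);` placed after the cURL call above would print just the number, or nothing if no match was found.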

@HarleyDavidson (author), Nov 13.2009

donatello, thank you for your code. I very much appreciate it. Currently, though, I don't know what to do with it. So far I have created an .html file and pasted your code inside. When I upload this file and browse to it, I get a blank page. I'm POSITIVE that this is due to my ignorance and not your code. I simply don't know what to do with this code yet... I'll bang my head against the wall for a while. I may be able to figure it out... LoL... Maybe...

Thanks for the help all the same though! Cheers!!

Here is the source code for my .html file so far:

[CODE]
<html>
<head>
<title>SDVI Quote</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>

<body bgcolor="#FFFFFF">

<?php
$url = "http://www.otcbb.com/asp/quote_module.asp?qm_page=77018&symbol=SDVI";

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_PROXYTYPE, CURLPROXY_HTTP);
curl_setopt($ch, CURLOPT_PROXY, "http://64.202.165.130:3128");
curl_setopt($ch, CURLOPT_TIMEOUT, 120);
$response = curl_exec($ch);
if ($response === false) {
    die("Errors: " . curl_errno($ch) . " : " . curl_error($ch));
}
curl_close($ch);

print $response;
?>

</body>
</html>
[/CODE]

@donatello, Nov 13.2009

IT IS NOT AN HTML FILE!

Do not name it scraper.html

instead name it

scraper.php

That is all.

This will simply scrape the page you want. Then you can remove the bits you don't want with simple stripping scripts which are easy to learn.

--------------------------------------------------------

So, in summary... save the following as screenscraper.php

[code=php]
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>ScreenScraper</title>
</head>

<body>
<?php
$url = "http://www.otcbb.com/asp/quote_module.asp?qm_page=77018&symbol=SDVI";

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the page instead of printing it
// GoDaddy proxy; remove these two lines on other hosts.
curl_setopt($ch, CURLOPT_PROXYTYPE, CURLPROXY_HTTP);
curl_setopt($ch, CURLOPT_PROXY, "http://64.202.165.130:3128");
curl_setopt($ch, CURLOPT_TIMEOUT, 120);
$response = curl_exec($ch);
// curl_exec() returns false on failure, so test for false rather than is_int().
if ($response === false) {
    die("Errors: " . curl_errno($ch) . " : " . curl_error($ch));
}
curl_close($ch);

print $response;
?>
</body>
</html>
[/code]

NOT: screenscraper.html

YES: screenscraper.php
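For the "simple stripping scripts" mentioned above, one minimal sketch (not donatello's code; the function name `html_to_plain_text` is made up) reduces scraped HTML to plain text with PHP's built-in `strip_tags()` and `html_entity_decode()`:

```php
<?php
// Minimal sketch of a "stripping" step: turn scraped HTML into plain text.
function html_to_plain_text(string $html): string
{
    // Remove <script> and <style> blocks; strip_tags() alone would keep their contents.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', '', $html);
    // Put a space before every tag so text from adjacent cells doesn't run together
    // (this can also split a word that has a tag inside it, e.g. "Pri<b>ce</b>").
    $html = str_replace('<', ' <', $html);
    $text = strip_tags($html);
    // Decode entities such as &amp; and &nbsp;, then treat non-breaking spaces as spaces.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    $text = str_replace("\u{00A0}", ' ', $text);
    // Collapse runs of whitespace into single spaces.
    $text = preg_replace('/\s+/', ' ', $text);
    return trim($text);
}
```

Running `echo html_to_plain_text($response);` in place of `print $response;` would show the whole page as one line of text; combining it with a targeted match is what narrows it down to just the ticker and price.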

@donatello, Nov 13.2009

If you are not sure whether your server is PHP-enabled, see the thread here:

http://www.webdeveloper.com/forum/showthread.php?t=219883

There are also tips there about how to make PHP run in .html pages.
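A quick way to check is a small test page. This is a minimal sketch, not taken from the linked thread; `check.php` and `php_environment_report()` are made-up names. Upload it as check.php and open it in a browser: if you see the PHP source or a blank page instead of a short report, the server is not running PHP for that file. `extension_loaded()` also shows whether the cURL and DOM extensions that scraping code typically relies on are installed.

```php
<?php
// Minimal sketch of a server check page (hypothetical check.php).
function php_environment_report(): string
{
    $lines = ['PHP version: ' . PHP_VERSION];
    // cURL is used by the screen scraper above; DOM is needed for DOMDocument parsing.
    foreach (['curl', 'dom'] as $ext) {
        $lines[] = $ext . ': ' . (extension_loaded($ext) ? 'available' : 'MISSING');
    }
    return implode("\n", $lines);
}

// Plain text output, so even a basic phone browser shows it as-is.
header('Content-Type: text/plain');
echo php_environment_report(), "\n";
```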

@HarleyDavidson (author), Nov 14.2009

OK... My forehead is bloody again and I am beyond frustrated. I had all but thrown in the towel until you threw me a nugget, donatello. I have spent hours on this, and obviously I don't know enough about it to even write an educated search query. I do truly appreciate your help, donatello, but I'm afraid I may need you to hold my hand here. I have created a file "scraper.php" that contains your code, but I simply can't figure out how to use it... I have tried searching for examples of "simple stripping scripts", but all I seem to find are ways to remove HTML tags from code...

I guess I have a few questions... I hope I am following closely enough for them to make sense. Bear with me, as I truly am trying. My questions are as follows:

(1) Is it correct to assume that the "scraper.php" file is going to be called or run from a separate "webpage.html" file?

(2) Assuming that "scraper.php" is going to be a separate file from "webpage.html", is it safe to assume that only these two files will be required to achieve my goal (a webpage.html page that scrapes a stock quote and displays it as plain text)?

(3) How do I go about using the "scraper.php" file now that you've created it for me? I guess what I mean is, how do I call it from "webpage.html" (assuming my first question is accurate)?
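One common way to handle questions (1) to (3), shown as a minimal sketch rather than the thread's answer: skip the separate webpage.html and make the page itself a .php file, so the server runs the PHP and the phone receives only finished HTML. The file name quote.php, the function `render_quote_page()` and the sample values are hypothetical; in practice the price would come from scraping code like donatello's.

```php
<?php
// Hypothetical quote.php helper: renders a quote as a bare-bones page that an
// old phone browser can display. htmlspecialchars() keeps scraped text from
// being interpreted as markup.
function render_quote_page(string $symbol, string $price): string
{
    $s = htmlspecialchars($symbol, ENT_QUOTES, 'UTF-8');
    $p = htmlspecialchars($price, ENT_QUOTES, 'UTF-8');
    // "\$" is a literal dollar sign; "$p" is the escaped price.
    return "<html><head><title>$s Quote</title></head>"
         . "<body><p>$s: \$$p</p></body></html>";
}
```

quote.php could then end with something like `echo render_quote_page('SDVI', $price);`, where `$price` comes from the scraping step. To keep the scraping code in scraper.php instead, rewrite it to define a function and pull it in with `require 'scraper.php';` at the top of quote.php.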
