Web scraping in a smart way, making a “Today in History” object in PHP

12 thoughts on “Web scraping in a smart way, making a “Today in History” object in PHP”

  1. In line with Aurelian’s comments, using DOM/XPath could really save you some hassle. With XPath you can query on the attributes of given tags and so on, which makes your script a bit less dependent on the page content (you can’t get zero dependence, but it minimizes it). You’d also find your script a bit shorter.

    Another suggestion would be to cache the results using Zend_Cache or PEAR’s Cache_Lite.

  2. @Tony, Aurelian

    True. And sometimes DOM is more efficient than scraping like this. But did you ever try to make a scraper that parses all the LinkedIn pages after simulating a POST? Or some of the popular job sites? You will understand the real pain. In those cases, this approach works best.

    And yeah, I forgot to mention caching. We should cache the page to avoid a DoS on that service. Thanks.

  3. Hi there Hasin. Thanks for posting the script, as I’ve been searching for an example like it to scrape a specific element within a page. The script was written two years ago, and when I scrape with it now, all I retrieve is the date, with nothing else echoed out. I think Scopesys have changed their source code; apart from that, the code seems to work fine.
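
For readers following the thread, here is a minimal sketch of the DOM/XPath approach suggested in the first comment. It is not the script from the post: the sample markup, the `event` class name, and the `extractEvents()` helper are all assumptions made up for illustration, and the real page will almost certainly be structured differently. It only shows how querying on an attribute avoids matching on exact surrounding text.

```php
<?php
// Hypothetical sample markup standing in for the scraped page.
// The real site's structure and class names are unknown here.
$html = <<<HTML
<html><body>
<div id="header">Today in History</div>
<ul class="events">
  <li class="event">1969 - Apollo 11 lands on the Moon</li>
  <li class="event">1976 - Viking 1 lands on Mars</li>
</ul>
</body></html>
HTML;

/**
 * Hypothetical helper: return the text of every <li class="event">.
 */
function extractEvents(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely well formed; collect parser
    // warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $events = [];
    // Select on the attribute, not on the text around the tag.
    foreach ($xpath->query('//li[@class="event"]') as $node) {
        $events[] = trim($node->textContent);
    }
    return $events;
}

print_r(extractEvents($html));
```

If the site changes its layout, as the third comment reports, usually only the XPath expression needs updating. The result could also be cached, as suggested above, so the remote page is not fetched on every request.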
