
Page scraping question

I am working on a site that requires me to pull a small bit of information from another page of my site, and have figured it out somewhat. Here’s my code so far. I set up a simple page as an example of the page I am pulling info from.

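[CODE]<?php
// Fetch the page and capture the text between "Star Wars" and "A New Hope".
$data = file_get_contents('http://www.egnited.net/site.php/');
$regex = '/Star Wars (.+?) A New Hope/';
preg_match($regex, $data, $match);
echo $match[1]; // first capture group
?>[/CODE]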

This code pulls the text between “Star Wars” and “A New Hope” (the text being “Episode IV:”) and prints it on the page that contains the above code:

[url]http://egnited.net/test.php[/url]

My question is, how exactly would I need to alter my code to include:

  1. The entire line, “Star Wars Episode IV: A New Hope”; and
  2. Multiple lines (i.e., from “Star Wars” down to “Jaws”)?

Can someone please show me what changes I need to make to make the above happen?

Thanks for any help; I’m sorry if this doesn’t make sense.

    @Phill_Pafford, Jun 12, 2008 — I have used this site in the past to develop a screen-scraping tool; hope this helps:

    http://www.tgreer.com/class_http_php.html

    @legendx, Jun 12, 2008 — For #2:

    $regex = '/Star Wars (.+?) Jaws/m';


    The m at the end represents the multi-line modifier. Give it a shot.
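
For reference, in PCRE the `m` modifier only changes where `^` and `$` match; it is the `s` modifier that lets `.` match a newline. The sketch below compares the two on a made-up string. The sample text is an assumption, since the real page markup is not shown in the thread, and the space before `Jaws` is dropped from the pattern because `Jaws` starts its own line in this sample.

```php
<?php
// Hypothetical stand-in for the fetched page (not the real site.php content).
$data = "Star Wars Episode IV: A New Hope\nRaiders of the Lost Ark\nJaws";

// With /m, "." still stops at "\n", so the group cannot cross a line break.
$withM = preg_match('/Star Wars (.+?)Jaws/m', $data, $m1);

// With /s, "." also matches "\n", so the lazy group can span several lines.
$withS = preg_match('/Star Wars (.+?)Jaws/s', $data, $m2);

var_dump($withM); // int(0): no match
var_dump($withS); // int(1): match
echo $m2[1];      // everything between "Star Wars " and "Jaws", newlines included
```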

    @Egnited (author), Jun 12, 2008 — Thank you both.

    [QUOTE]For #2:

    $regex = '/Star Wars (.+?) Jaws/m';

    The m at the end represents the multi-line modifier. Give it a shot.[/QUOTE]

    Hmm, this isn't working.

    @legendx, Jun 12, 2008 — Maybe try removing newlines from $data before it hits preg_match().

    Something like:

    [code=php]
    $data = file_get_contents('http://www.egnited.net/site.php/');
    $data = str_replace(chr(10), "", $data); // Ascii decimal 10 = newline
    ...
    [/code]


    I don't know a lot about regular expressions. There is probably a better way to do it with regex, but I work with the knowledge I've got :p
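
A minimal sketch covering both parts of the original question, assuming sample text rather than the real page. `preg_match()` stores the whole matched text in `$match[0]`, so the full line is available without changing the pattern. For the multi-line case, the `s` modifier lets `.` cross line breaks. Note that `chr(10)` removes only `"\n"`; if the page uses `"\r\n"` line endings, the `"\r"` characters stay behind, so the last step replaces both. The variable names `$wholeLine`, `$between`, `$block`, and `$flat` are made up for illustration.

```php
<?php
// Hypothetical sample with "\r\n" line endings (not the real site.php content).
$data = "Star Wars Episode IV: A New Hope\r\nRaiders of the Lost Ark\r\nJaws\r\n";

// #1: index 0 is the entire match, index 1 is the first capture group.
preg_match('/Star Wars (.+?) A New Hope/', $data, $match);
$wholeLine = $match[0]; // "Star Wars Episode IV: A New Hope"
$between   = $match[1]; // "Episode IV:"

// #2: the s modifier lets "." match line breaks, so the match can run
// from "Star Wars" down to "Jaws".
preg_match('/Star Wars.+?Jaws/s', $data, $span);
$block = $span[0];

// Optional: collapse every kind of line break into a single space.
$flat = str_replace(array("\r\n", "\r", "\n"), ' ', $block);

echo $wholeLine, "\n", $flat, "\n";
```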