
Parse remote HTML document in PHP

Dear users,

I want to fetch an HTML document from a remote website using curl and parse it with PHP.

I know how to do the curl part, but once I have the HTML document in a PHP string, how can I parse it to get certain elements (something like JavaScript's querySelector)?

Thanks

PHP

7 Comment(s)

@ginerjm, Jun 20, 2019: A quick search of the php.net manual turned up this function. I'm not sure it gives you what you want, but it may be a start:

DOMDocument::loadHTML() (part of the DOMDocument class)
@NogDog, Jun 20, 2019: https://php.net/DOM is the most robust approach, though it's a bit intimidating when you first look at the documentation. :) Ultimately it's similar to parsing the DOM in JavaScript/jQuery, though.
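
To make the JavaScript comparison concrete, here is a minimal sketch of how DOMDocument and DOMXPath can stand in for querySelector / querySelectorAll. The sample markup, the variable names, and the chosen selectors are assumptions for illustration only; DOMXPath takes XPath expressions, not CSS selectors, so each selector has to be translated by hand.

```php
<?php
// Hypothetical sample markup standing in for the string returned by curl.
$html = '<html><head><title>Demo</title></head>'
      . '<body><div id="main"><p class="intro">Hello</p><p>World</p></div></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);

// Roughly querySelector('#main p.intro'): take the first match, or null if none.
$intro = $xpath->query(
    '//*[@id="main"]//p[contains(concat(" ", normalize-space(@class), " "), " intro ")]'
)->item(0);
$introText = $intro !== null ? $intro->textContent : null;

// Roughly querySelectorAll('#main p'): iterate over every match.
$paragraphs = [];
foreach ($xpath->query('//*[@id="main"]//p') as $p) {
    $paragraphs[] = $p->textContent;
}

echo $introText, "\n";                 // prints: Hello
echo implode(', ', $paragraphs), "\n"; // prints: Hello, World
```

The class test looks verbose because XPath 1.0 has no direct equivalent of a CSS class selector; for simple cases, getElementById() and getElementsByTagName() on the DOMDocument may be enough.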
@redagr (author), Jun 20, 2019: Can you please post an example of what I want to do? I need something like querySelector, but I can't find it in the documentation.

Thanks
@redagr (author), Jul 01, 2019: I can't get it to work. I get many warnings like these:

Warning: DOMDocument::loadHTML(): Tag main invalid in Entity, line: 66 in /home/www.site.com/z.php on line 509

Warning: DOMDocument::loadHTML(): Tag section invalid in Entity, line: 67 in /home/www.site.com/z.php on line 509

Warning: DOMDocument::loadHTML(): Tag section invalid in Entity, line: 69 in /home/www.site.com/z.php on line 509

Warning: DOMDocument::loadHTML(): Tag aside invalid in Entity, line: 70 in /home/www.site.com/z.php on line 509

Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity, line: 72 in /home/www.site.com/z.php on line 509

And many other errors.

I only want to parse some <meta> tags in the head of the HTML document and extract their contents.

Can you help me?

Thanks
@Sempervivum, Jul 01, 2019: DOMDocument's parser doesn't know about the new semantic tags of HTML5 (yet?). However, in my experience the document is parsed successfully in spite of these errors, and these tags can still be accessed.
@NogDog, Jul 01, 2019: Yeah, I usually have to add the optional parameter that suppresses warnings. :)

e.g.:
$dom->loadHTMLFile($myFile, LIBXML_NOWARNING);
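
As a sketch of the two points above (the parser complains about HTML5 tags, yet the tags remain accessible), the following uses libxml_use_internal_errors() to collect the messages instead of emitting PHP warnings. The sample markup and variable names are assumptions for illustration; this is one possible approach, not necessarily what the posters above use.

```php
<?php
// Hypothetical HTML5 sample containing tags the parser may report as invalid.
$html = '<!DOCTYPE html><html><head><title>Demo</title></head>'
      . '<body><main><section>First</section><aside>Side</aside></main></body></html>';

// Collect libxml messages internally instead of raising PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);

// Optional: keep a readable copy of what the parser complained about.
$messages = [];
foreach (libxml_get_errors() as $error) {
    $messages[] = trim($error->message) . ' (line ' . $error->line . ')';
}

// Empty the error buffer and restore the previous error-handling mode.
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The HTML5 elements are still present in the resulting tree.
$sectionText = $dom->getElementsByTagName('section')->item(0)->textContent;
$asideText   = $dom->getElementsByTagName('aside')->item(0)->textContent;

echo $sectionText, ' / ', $asideText, "\n";
```

If messages still show up with LIBXML_NOWARNING alone, combining it with LIBXML_NOERROR (LIBXML_NOERROR | LIBXML_NOWARNING) may help, on the assumption that some of these messages are reported at error level rather than warning level; that is worth checking against your own PHP and libxml versions.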
@Sempervivum, Jul 01, 2019: PS:
Quoting @redagr: "I want only to parse some <meta> tags in head of html document to withdraw the contents."
Check whether the function get_meta_tags can do this:

https://www.php.net/manual/en/function.get-meta-tags.php
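
A minimal sketch of both routes for the <meta> case, assuming the HTML is already in a string. get_meta_tags() takes a filename or URL rather than a string, so the sketch writes the string to a temporary file first; as far as I know it only picks up tags with a name attribute. The extractMeta() helper, the sample markup, and the variable names are hypothetical.

```php
<?php
// Hypothetical sample page; in practice this would be the curl response body.
$html = '<html><head>'
      . '<meta name="description" content="Example page">'
      . '<meta name="keywords" content="php, dom">'
      . '<meta property="og:title" content="Example">'
      . '</head><body></body></html>';

// Route 1: get_meta_tags() reads from a file or URL, so use a temp file.
$tmp = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($tmp, $html);
$named = get_meta_tags($tmp);
unlink($tmp);

// Route 2: hypothetical DOM-based helper that works on the string directly
// and also picks up property="..." tags such as Open Graph entries.
function extractMeta(string $html): array
{
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach ($dom->getElementsByTagName('meta') as $meta) {
        // Prefer the name attribute, fall back to property.
        $key = $meta->getAttribute('name') ?: $meta->getAttribute('property');
        if ($key !== '') {
            $result[$key] = $meta->getAttribute('content');
        }
    }
    return $result;
}

$all = extractMeta($html);

echo $named['description'], "\n"; // prints: Example page
echo $all['og:title'], "\n";      // prints: Example
```

Passing the remote URL straight to get_meta_tags() would skip curl entirely, but that depends on allow_url_fopen being enabled on the server.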