
[RESOLVED] Regular Expression problem.

I have a problem with my PHP. Here's the regular expression:

[code=php]preg_match_all('#<kwd>(.)<kwd>#', file_get_contents('./Data/keywords.xml'), $keywords);[/code]

Here is the file it’s searching through:

[code=html]<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE kwds [
<!ELEMENT kwds (kwd+)>
<!ELEMENT kwd (#PCDATA)>
]>
<kwds>
<kwd>Comics</kwd>
<kwd>Chaos Theory/String Theory</kwd>
<kwd>Email</kwd>
<kwd>Flash</kwd>
<kwd>Forums</kwd>
<kwd>Furcadia</kwd>
<kwd>Furry</kwd>
<kwd>Games</kwd>
<kwd>HTML</kwd>
<kwd>Knowledge</kwd>
<kwd>LiveJournal</kwd>
<kwd>Local</kwd>
<kwd>Music</kwd>
</kwds>
[/code]

For some reason, I’m not getting any of the contents of the <kwd> elements returned.

@NogDog (Feb 07, 2009): Try this:
[code=php]'#<kwd>([^<]+)</kwd>#'[/code]

Alternatively, if there is any possibility of a "<" within the captured text, use:
[code=php]'#<kwd>(.+?)</kwd>#'[/code]
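For anyone wondering why the original pattern returned nothing: `(.)` captures exactly one character, and the closing tag in the pattern was missing its slash. Here is a minimal sketch comparing the patterns, run against a small hypothetical sample string rather than the real keywords.xml:

```php
<?php
// Hypothetical sample standing in for the contents of keywords.xml.
$xml = "<kwds>\n<kwd>Comics</kwd>\n<kwd>Email</kwd>\n</kwds>";

// Original pattern: "(.)" is a single character and the second "<kwd>"
// is not a closing tag, so nothing in the file can match.
$broken = preg_match_all('#<kwd>(.)<kwd>#', $xml, $m1);

// Corrected patterns: one or more characters, then a real closing tag.
$fixed = preg_match_all('#<kwd>([^<]+)</kwd>#', $xml, $m2);
$lazy  = preg_match_all('#<kwd>(.+?)</kwd>#', $xml, $m3);

echo $broken, "\n";   // match count for the original pattern
print_r($m2[1]);      // captured keyword text from the first fix
print_r($m3[1]);      // captured keyword text from the lazy variant
```

The captured text ends up in index 1 of the matches array, since index 0 holds the full match including the tags.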
@Mr_Initial_Man (author, Feb 07, 2009): Thank you.


EDIT: (*Sticks some keywords in here in case anyone else needs this...*)

XML Regular Expression PHP Parse File
@NogDog (Feb 07, 2009): Just FYI, another more robust option (if a bit more coding work) would be to use the [url=http://www.php.net/dom]DOM functions[/url] to parse the XML and then pick out the data you want.
@Mr_Initial_Man (author, Feb 07, 2009): What do you mean by "Robust"?

And will the PHP phpinfo() function let me know if I've got it installed?

Oh, and this isn't QUITE resolved yet. What character also includes "newline"? Because I have another XML file to work with.
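On the newline question: in PHP's PCRE functions, `.` does not match a newline by default, but adding the `s` pattern modifier after the closing delimiter changes that. A negated class such as `[^<]` already matches newlines. Here is a small sketch with a hypothetical element whose text spans two lines:

```php
<?php
// Hypothetical element whose text spans two lines.
$xml = "<kwd>Chaos Theory/\nString Theory</kwd>";

// Without "s", "." stops at the newline, so the pattern cannot match.
$without = preg_match('#<kwd>(.+?)</kwd>#', $xml, $a);

// With "s" (PCRE_DOTALL), "." also matches newline characters.
$with = preg_match('#<kwd>(.+?)</kwd>#s', $xml, $b);

var_dump($without, $with, $b[1]);
```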
@NogDog (Feb 07, 2009): "Robust" in that if the XML being parsed is well formed, the DOM parser can be expected to parse it correctly regardless of how intricate it may be. For example, the regexp we used would fail for anything where a <kwd> tag had an attribute (e.g. <kwd id="1">). In other words, as long as the basic XML schema does not change, your code would still work with the DOM functions, but a regexp-based solution might fail if your pattern does not account for all possibilities.

The DOM functions are part of the PHP core if you are running PHP5; there is nothing to install. If running PHP4 (and shame on you if you are :p ), you would have to use the DOM XML extension instead, which is not part of the PHP core.
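To illustrate the attribute case, here is a minimal sketch that runs both approaches over a hypothetical snippet in which one <kwd> carries an `id` attribute. The snippet is made up for the comparison and is not the poster's file:

```php
<?php
// Hypothetical XML where one <kwd> carries an attribute.
$xml = '<kwds><kwd id="1">Comics</kwd><kwd>Email</kwd></kwds>';

// The regexp only matches the attribute-free opening tag.
preg_match_all('#<kwd>([^<]+)</kwd>#', $xml, $m);
$viaRegex = $m[1];

// The DOM parser finds both elements regardless of attributes.
$dom = new DOMDocument();
$dom->loadXML($xml);
$viaDom = array();
foreach ($dom->getElementsByTagName('kwd') as $node) {
    $viaDom[] = $node->nodeValue;
}

print_r($viaRegex);   // keywords found by the regexp
print_r($viaDom);     // keywords found by the DOM
```

The pattern could be widened to allow attributes, but every such variation has to be anticipated by hand, which is the point being made above.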
@Mr_Initial_Man (author, Feb 07, 2009): Um... PHP 3.SomethingOrOther.

(Sorry, couldn't resist)

Yeah, both me and my host are running PHP 5.2.x
@NogDog (Feb 07, 2009): Just because I don't feel like working on what I should be, here's an example:
[code=php]
<?php
$xml = <<<EOD
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE kwds [
<!ELEMENT kwds (kwd+)>
<!ELEMENT kwd (#PCDATA)>
]>
<kwds>
<kwd>Comics</kwd>
<kwd>Chaos Theory/String Theory</kwd>
<kwd>Email</kwd>
<kwd>Flash</kwd>
<kwd>Forums</kwd>
<kwd>Furcadia</kwd>
<kwd>Furry</kwd>
<kwd>Games</kwd>
<kwd>HTML</kwd>
<kwd>Knowledge</kwd>
<kwd>LiveJournal</kwd>
<kwd>Local</kwd>
<kwd>Music</kwd>
</kwds>
EOD;

// Use the DOM to get the kwd values:
$dom = new DOMDocument();
$dom->loadXML($xml);
$list = $dom->getElementsByTagName('kwd');
foreach($list as $item)
{
echo $item->nodeValue . "<br />\n";
}
[/code]
@Mr_Initial_Man (author, Feb 07, 2009): What do these little arrows do?

BTW, this is step 1. The other XML file's a good deal more complicated.

This is what I have so far, and it works well:

[code=php]<?php
header('Content-type: application/xhtml+xml; charset=utf-8');

if(empty($_GET['keyword'])){
$kwd = 'Welcome';
} else {
$kwd = $_GET['keyword'];
}

class PageLink{
var $href;
var $title;
var $secs = Array();
}


//Other Stuff.
$t3 = "\t\t\t";
$n = "\r\n";

?>
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xml:lang="en" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<meta http-equiv="Content-Script-Type" content="application/javascript" />
<title>HomePage</title>
<link rel="stylesheet" type="text/css" href="../Formatting/homepage.css" />
</head>
<body>
<h1>Homepage</h1>
<ul id="Menu">
<?php
//Keyword XML

$key_dom = new DOMDocument();
$key_dom -> loadXML(file_get_contents('./Data/keywords.xml'));
$key_list = $key_dom -> getElementsByTagName('kwd');
foreach($key_list as $keyword){
$kword = $keyword -> nodeValue;
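// urlencode() keeps spaces and slashes in the keyword from breaking the query string;
// htmlspecialchars() keeps characters like & from breaking the XHTML output.
echo ($t3 . '<li><a href="index.php?keyword=' . urlencode($kword) . '">' . htmlspecialchars($kword) . '</a></li>' . $n);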
}
?>
</ul>
<div id="Main">
<h2><?php echo htmlspecialchars($kwd); ?></h2>
<?php
if($kwd=='Welcome'){
echo $t3 . '<p>All Systems Are Go.</p>' . $n;
} else {

}
?>
<p>You have 0 matches</p>
<pre>
<?php
?>
</pre>
</div>
</body>
</html>[/code]


Of course, using this file:

[code=html]<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE kwds [
<!ELEMENT kwds (kwd+)>
<!ELEMENT kwd (#PCDATA)>
]>
<!-- This file contains keywords of my page links. -->
<kwds>
<kwd>Comics</kwd>
<kwd>Chaos Theory/String Theory</kwd>
<kwd>Email</kwd>
<kwd>Flash</kwd>
<kwd>Forums</kwd>
<kwd>Furcadia</kwd>
<kwd>Furry</kwd>
<kwd>Games</kwd>
<kwd>HTML</kwd>
<kwd>Knowledge</kwd>
<kwd>LiveJournal</kwd>
<kwd>Local</kwd>
<kwd>Music</kwd>
</kwds>
[/code]


The following is the next file to deal with:

[code=html]<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE links [
<!ELEMENT links (link+)>
<!ELEMENT link (name, url, kwds, secs?)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT url (#PCDATA)>
<!ELEMENT kwds (kwd+)>
<!ELEMENT kwd (#PCDATA)>
<!ELEMENT secs (sec+)>
<!ELEMENT sec (kwd, s+)>
<!ELEMENT s (#PCDATA)>
]>
<!-- This page contains information about my links -->
<links>
<link>
<name>Acropolis</name>
<url>furc://Acropolis</url>
<kwds>
<kwd>Furcadia</kwd>
</kwds>
<secs>
<sec>
<kwd>Furcadia</kwd>
<s>Main Maps</s>
</sec>
</secs>
</link>
<link>
<name>Albert "Gene Catlow" Temple</name>
<url>http://genecatlow.livejournal.com</url>
<kwds>
<kwd>LiveJournal</kwd>
</kwds>
</link>
<link>
<name>Allegria Island</name>
<url>furc://allegriaisland</url>
<kwds>
<kwd>Furcadia</kwd>
</kwds>
<secs>
<sec>
<kwd>Furcadia</kwd>
<s>Main Maps</s>
</sec>
</secs>
</link>
<link>
<name>AIM Email</name>
<url>http://mail.aim.com</url>
<kwds>
<kwd>Email</kwd>
</kwds>
</link>
<link>
<name>Albino Black Sheep</name>
<url>http://www.albinoblacksheep.com</url>
<kwds>
<kwd>Flash</kwd>
</kwds>
</link>
<link>
<name>Babael</name>
<url>http://babael.livejournal.com/</url>
<kwds>
<kwd>LiveJournal</kwd>
</kwds>
</link>
<link>
<name>Belfry WebComics Index</name>
<url>http://www.belfry.com/comics</url>
<kwds>
<kwd>Comics</kwd>
<kwd>Furry</kwd>
</kwds>
<secs>
<sec>
<kwd>Furry</kwd>
<s>Comics</s>
</sec>
</secs>
</link>
<link>
<name>Bob DayWalker</name>
<url>http://spiritboi.livejournal.com/</url>
<kwds>
<kwd>LiveJournal</kwd>
</kwds>
</link>
<link>
<name>Brian Gelfand</name>
<url>http://barach.livejournal.com/</url>
<kwds>
<kwd>LiveJournal</kwd>
</kwds>
</link>
<link>
<name>Challenges</name>
<url>furc://Challenges</url>
<kwds>
<kwd>Furcadia</kwd>
</kwds>
<secs>
<sec>
<kwd>Furcadia</kwd>
<s>Main Maps</s>
</sec>
</secs>
</link>
<link>
<name>Chaos Theory/String Theory Facebook Group</name>
<url>http://www.facebook.com/group.php?gid=4069048319</url>
<kwds>
<kwd>Chaos Theory/String Theory</kwd>
</kwds>
</link>
<link>
<name>Chaos Theory/String Theory Google Group</name>
<url>http://groups.google.com/group/ctst</url>
<kwds>
<kwd>Chaos Theory/String Theory</kwd>
</kwds>
</link>
<link>
<name>Chaos Theory/String Theory Homepage</name>
<url>http://geocities.com/kharismaspade/</url>
<kwds>
<kwd>Chaos Theory/String Theory</kwd>
</kwds>
</link>
<link>
<name>Chaos Theory/String Theory Wiki</name>
<url>http://ctst.bluwiki.com</url>
<kwds>
<kwd>Chaos Theory/String Theory</kwd>
</kwds>
</link>
<link>
<name>Coach Random</name>
<url>http://localhost:8080/Websites/Coach_Random</url>
<kwds>
<kwd>Local</kwd>
</kwds>
</link>
<link>
<name>Cobo</name>
<url>http://cobo4231.livejournal.com/</url>
<kwds>
<kwd>LiveJournal</kwd>
</kwds>
</link>
<link>
<name>Database Journals</name>
<url>http://forums.databasejournal.com/</url>
<kwds>
<kwd>Forums</kwd>
<kwd>HTML</kwd>
</kwds>
<secs>
<sec>
<kwd>Forums</kwd>
<s>HTML</s>
</sec>
<sec>
<kwd>HTML</kwd>
<s>Forums</s>
</sec>
</secs>
</link>
<link>
<name>Delphi Forums</name>
<url>http://www.delphiforums.com</url>
<kwds>
<kwd>Forums</kwd>
<kwd>Music</kwd>
</kwds>
<secs>
<sec>
<kwd>Forums</kwd>
<s>Music</s>
</sec>
<sec>
<kwd>Music</kwd>
<s>Forums</s>
</sec>
</secs>
</link>
<link>
<name>DevPPL Forum</name>
<url>http://www.devppl.com</url>
<kwds>
<kwd>Forums</kwd>
<kwd>HTML</kwd>
</kwds>
<secs>
<sec>
<kwd>Forums</kwd>
<s>HTML</s>
</sec>
<sec>
<kwd>HTML</kwd>
<s>Forums</s>
</sec>
</secs>
</link>
<link>
<name>DeviantArt</name>
<url>http://www.deviantart.com</url>
<kwds>
<kwd>Furry</kwd>
</kwds>
<secs>
<sec>
<kwd>Furry</kwd>
<s>Art</s>
</sec>
</secs>
</link>
<link>
<name>Doctypes</name>
<url>http://localhost:8080/Local_Use_Only/HTML_Info/doctypes.xhtml</url>
<kwds>
<kwd>Local</kwd>
</kwds>
</link>
<link>
<name>Dog's Days Of Summer</name>
<url>http://community.livejournal.com/dogsdays/</url>
<kwds>
<kwd>LiveJournal</kwd>
</kwds>
<secs>
<sec>
<kwd>LiveJournal</kwd>
<s>Communities</s>
</sec>
</secs>
</link>
</links>[/code]


What I want to do with this one is, say, if the keyword is "LiveJournal", then list ALL links that match <kwd>LiveJournal</kwd>. Yes, let's start with that. Then we get weird later.
@NogDog (Feb 07, 2009): The "->" arrows are part of the object-oriented PHP syntax, indicating that the expression on the right is a member of the object on the left. It's basically the same as the "." in JavaScript, e.g.: [b]document.getElementById('kwd')[/b].
[code=php]
<?php
$xmlFile = "test.xml";
$search = "LiveJournal";
$linksArray = array();

$dom = new DOMDocument();
$dom->load($xmlFile);
$links = $dom->getElementsByTagName('link');
foreach($links as $link)
{
$kwds = $link->getElementsByTagName('kwd');
foreach($kwds as $kwd)
{
if($kwd->nodeValue == $search)
{
$url = $link->getElementsByTagName('url');
$name = $link->getElementsByTagName('name');
$linksArray[] = array('url' => $url->item(0)->nodeValue, 'name' => $name->item(0)->nodeValue);
break;
}
}
}
// show the results:
echo "<pre>".print_r($linksArray,1)."</pre>";
[/code]
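As a possible alternative to the nested loops, the same selection can be sketched with DOMXPath, which is also part of PHP 5's DOM extension. The inline XML below is a hypothetical two-entry sample following the structure of the links file, and the search term is a fixed literal. A user-supplied term would need quoting or validation before being placed in the expression.

```php
<?php
// Hypothetical inline sample following the structure of the links file.
$xml = <<<EOD
<links>
  <link><name>Babael</name><url>http://babael.livejournal.com/</url>
    <kwds><kwd>LiveJournal</kwd></kwds></link>
  <link><name>AIM Email</name><url>http://mail.aim.com</url>
    <kwds><kwd>Email</kwd></kwds></link>
</links>
EOD;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Select every <link> that has a <kwds>/<kwd> child equal to the search term.
$matches = $xpath->query('//link[kwds/kwd="LiveJournal"]');

$linksArray = array();
foreach ($matches as $link) {
    $linksArray[] = array(
        'url'  => $link->getElementsByTagName('url')->item(0)->nodeValue,
        'name' => $link->getElementsByTagName('name')->item(0)->nodeValue,
    );
}
print_r($linksArray);
```

Because the predicate looks only at `kwds/kwd`, a <kwd> nested inside <secs> does not count as a match, which the `getElementsByTagName('kwd')` loop would not distinguish.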
@Mr_Initial_Man (author, Feb 07, 2009): Yeah, this is looking a LOT like JavaScript. Is this deliberate?
@NogDog (Feb 07, 2009): The basic structure is similar to many object-oriented languages. Then we're using the DOM which is used a lot in JavaScript, so it does start to look familiar, eh? (Though for me it's sort of the inverse, since I know PHP better than I do JavaScript.)
@Mr_Initial_Man (author, Feb 07, 2009): I've never used PHP in this area. So it's all new to me.

Now here's where things take their final step into the weird:

The relevant code:
[code=php]<?php
$link_arr = Array();
$link_tick = 0;
$link_dom = new DOMDocument();
$link_dom -> loadXML(file_get_contents('./Data/links.xml'));
$links = $link_dom -> getElementsByTagName ('link');

foreach ($links as $link){
$link_kwds = $link -> getElementsByTagName ('kwd');
foreach ($link_kwds as $link_kwd){
if($link_kwd->nodeValue == $kwd){
$url = $link->getElementsByTagName('url');
$name = $link->getElementsByTagName('name');
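// htmlspecialchars() escapes &, < and quotes so URLs and names cannot break the XHTML output.
$link_arr[$link_tick] = ('<li><a href="' . htmlspecialchars($url->item(0)->nodeValue) . '">' . htmlspecialchars($name->item(0)->nodeValue) . '</a></li>');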
$link_tick++;
break;
}
}
}
?>
<p>You have <?php echo count($link_arr); ?> matches</p>
<ul>
<?php for ($i=0; $i < count($link_arr); $i++){echo ($t4 . $link_arr[$i] . $n);}?>
</ul>
</div>[/code]


Sample links:
[code=html]<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE links [
<!ELEMENT links (link+)>
<!ELEMENT link (name, url, kwds, secs?)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT url (#PCDATA)>
<!ELEMENT kwds (kwd+)>
<!ELEMENT kwd (#PCDATA)>
<!ELEMENT secs (sec+)>
<!ELEMENT sec (kword, s+)>
<!ELEMENT kword (#PCDATA)>
<!ELEMENT s (#PCDATA)>
]>
<!-- This page contains information about my links -->
<links>
<link>
<name>Acropolis</name>
<url>furc://Acropolis</url>
<kwds>
<kwd>Furcadia</kwd>
</kwds>
<secs>
<sec>
<kword>Furcadia</kword>
<s>Main Maps</s>
</sec>
</secs>
</link>
<link>
<name>Allegria Island</name>
<url>furc://allegriaisland</url>
<kwds>
<kwd>Furcadia</kwd>
</kwds>
<secs>
<sec>
<kword>Furcadia</kword>
<s>Main Maps</s>
</sec>
</secs>
</link>
<link>
<name>Challenges</name>
<url>furc://Challenges</url>
<kwds>
<kwd>Furcadia</kwd>
</kwds>
<secs>
<sec>
<kword>Furcadia</kword>
<s>Main Maps</s>
</sec>
</secs>
</link>
<link>
<name>Festival</name>
<url>furc://festival/</url>
<kwds>
<kwd>Furcadia</kwd>
</kwds>
<secs>
<sec>
<kword>Furcadia</kword>
<s>Main Maps</s>
</sec>
</secs>
</link>
<link>
<name>Furcadia</name>
<url>http://www.furcadia.com</url>
<kwds>
<kwd>Furcadia</kwd>
</kwds>
</link>
<link>
<name>Furrabian Nights</name>
<url>furc://furrabiannights</url>
<kwds>
<kwd>Furcadia</kwd>
</kwds>
<secs>
<sec>
<kword>Furcadia</kword>
<s>Main Maps</s>
</sec>
</secs>
</link>
<link>
<name>Haunted Clock Tower, The</name>
<url>http://www.freewebs.com/thehauntedclocktower/</url>
<kwds>
<kwd>Furcadia</kwd>
</kwds>
<secs>
<sec>
<kword>Furcadia</kword>
<s>Dream Homepages</s>
</sec>
</secs>
</link>
<link>
<name>Hawthorn</name>
<url>furc://hawthorn</url>
<kwds>
<kwd>Furcadia</kwd>
</kwds>
<secs>
<sec>
<kword>Furcadia</kword>
<s>Main Maps</s>
</sec>
</secs>
</link>
<link>
<name>Imaginarium</name>
<url>furc://imaginarium</url>
<kwds>
<kwd>Furcadia</kwd>
</kwds>
<secs>
<sec>
<kword>Furcadia</kword>
<s>Main Maps</s>
</sec>
</secs>
</link>
<link>
<name>Last One Standing (Fighting Arenas)</name>
<url>furc://lastonestanding:fightingarenas//</url>
<kwds>
<kwd>Furcadia</kwd>
</kwds>
<secs>
<sec>
<kword>Furcadia</kword>
<s>Roleplaying Dream</s>
</sec>
</secs>
</link>
<link>
<name>Meovanni Village</name>
<url>furc://meovannivillage/</url>
<kwds>
<kwd>Furcadia</kwd>
</kwds>
<secs>
<sec>
<kword>Furcadia</kword>
<s>Main Maps</s>
</sec>
</secs>
</link>
<link>
<name>Naia Green</name>
<url>furc://naiagreen/</url>
<kwds>
<kwd>Furcadia</kwd>
</kwds>
<secs>
<sec>
<kword>Furcadia</kword>
<s>Main Maps</s>
</sec>
</secs>
</link>
<link>
<name>Tanglewood Forest, The</name>
<url>http://www.freewebs.com/tanglewoodforest/</url>
<kwds>
<kwd>Furcadia</kwd>
</kwds>
<secs>
<sec>
<kword>Furcadia</kword>
<s>Dream Homepages</s>
</sec>
</secs>
</link>
<link>
<name>Vinca, The</name>
<url>furc://vinca</url>
<kwds>
<kwd>Furcadia</kwd>
</kwds>
<secs>
<sec>
<kword>Furcadia</kword>
<s>Main Maps</s>
</sec>
</secs>
</link>
</links>
[/code]


Notice the content of the <secs> element and, most importantly, the <s> element. I need to group these links by the content of the <s> element, which describes which group they're in. Oh, and one link omits the <secs> element (and so its <s>) altogether. How can I gather the unique <s> names, and set up an array for each one?
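One way this grouping might be approached is to use the <s> text as an array key, so each distinct name gets exactly one sub-array. This is a rough sketch only. The inline XML is a hypothetical three-entry sample following the structure above, and the "Other" label for links without a matching <s> is an assumption, not something from the thread.

```php
<?php
// Hypothetical inline sample following the structure of the sample links.
$xml = <<<EOD
<links>
  <link><name>Acropolis</name><url>furc://Acropolis</url>
    <kwds><kwd>Furcadia</kwd></kwds>
    <secs><sec><kword>Furcadia</kword><s>Main Maps</s></sec></secs></link>
  <link><name>Furcadia</name><url>http://www.furcadia.com</url>
    <kwds><kwd>Furcadia</kwd></kwds></link>
  <link><name>Vinca, The</name><url>furc://vinca</url>
    <kwds><kwd>Furcadia</kwd></kwds>
    <secs><sec><kword>Furcadia</kword><s>Main Maps</s></sec></secs></link>
</links>
EOD;

$kwd = 'Furcadia';
$ungrouped = 'Other';   // assumed label for links with no matching <s>
$groups = array();

$dom = new DOMDocument();
$dom->loadXML($xml);

foreach ($dom->getElementsByTagName('link') as $link) {
    // Skip links that are not tagged with the keyword at all.
    $tagged = false;
    foreach ($link->getElementsByTagName('kwd') as $k) {
        if ($k->nodeValue == $kwd) { $tagged = true; break; }
    }
    if (!$tagged) { continue; }

    $name = $link->getElementsByTagName('name')->item(0)->nodeValue;

    // Collect the <s> names from every <sec> whose <kword> matches.
    $found = array();
    foreach ($link->getElementsByTagName('sec') as $sec) {
        $kword = $sec->getElementsByTagName('kword')->item(0);
        if ($kword !== null && $kword->nodeValue == $kwd) {
            foreach ($sec->getElementsByTagName('s') as $s) {
                $found[] = $s->nodeValue;
            }
        }
    }
    if (!$found) { $found[] = $ungrouped; }

    // Using the <s> text as the array key keeps group names unique.
    foreach ($found as $group) {
        $groups[$group][] = $name;
    }
}
print_r($groups);
```

`array_keys($groups)` then gives the list of unique <s> names, and each `$groups[...]` entry holds the link names for that group. Storing the URL alongside the name would work the same way.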