
Java: searching HTML

I want to use Java to search HTML tags. More specifically, I want to know whether a web page has an RSS feed. If it does, I can see it in a <link> tag in the page's head. Then I want to download the RSS link and store it in a database. To visit the pages I use a Java crawler.

Can anyone help me?
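On the "store it in a database" step: a feed href is not guaranteed to be absolute (the example head later in this thread uses relative paths such as /ast/css/s24_022.css for other links), so a crawler usually resolves each href against the URL of the page it came from before saving it. The following is a minimal sketch using java.net.URI.resolve; the class and method names are hypothetical, and the actual database insert is left out because it depends on your schema and JDBC driver.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

public class FeedUrlNormalizer {

    /**
     * Hypothetical helper: resolves every feed href against the URL of the page
     * it was found on and drops duplicates while keeping first-seen order.
     * Throws IllegalArgumentException if pageUrl or an href is not a valid URI.
     */
    public static Set<String> normalize(String pageUrl, Collection<String> hrefs) {
        URI base = URI.create(pageUrl);
        Set<String> unique = new LinkedHashSet<>();
        for (String href : hrefs) {
            // Relative hrefs such as "/svc/rss/topNews/" become absolute;
            // absolute hrefs come back unchanged.
            unique.add(base.resolve(href.trim()).toString());
        }
        return unique;
    }

    public static void main(String[] args) {
        Set<String> urls = normalize("http://www.sport24.gr/",
                Arrays.asList("/svc/rss/topNews/",
                        "http://www.sport24.gr/svc/rss/topNews/",
                        "http://www.sport24.gr/svc/rss/lastNews/"));
        // Each remaining URL is what you would insert into the database
        for (String url : urls) {
            System.out.println(url);
        }
    }
}
```

Using a LinkedHashSet means the same feed found through a relative and an absolute href is stored only once.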

Java

4 Comments

@athanach (author), Nov 09, 2007: Do you need any clarification? My English isn't very good.

Can someone please tell me?
@chazzy, Nov 11, 2007: Couldn't you use the DOM to parse the web page?
@athanach (author), Nov 11, 2007: Is there a good DOM API?

This is what I want to do: find the word "rss" and store the URL. Here is the head of an example page:

<head>
<title>SPORT 24</title>
<meta http-equiv="content-type" content="text/html; charset=windows-1253"/>
<link href="/ast/css/s24_022.css" type="text/css" rel="stylesheet"/>
<script language="javascript" src="/ast/js/s24_007.js"></script>
<link rel="Shortcut Icon" href="/favicon.ico" >
<link rel="icon" href="/favicon.png" type="image/png">
<link rel="alternate" type="application/rss+xml" href="http://www.sport24.gr/svc/rss/topNews/" title="SPORT 24 RSS: &#931;&#951;&#956;&#945;&#957;&#964;&#953;&#954;&#972;&#964;&#949;&#961;&#949;&#962; &#949;&#953;&#948;&#942;&#963;&#949;&#953;&#962;" />
<link rel="alternate" type="application/rss+xml" href="http://www.sport24.gr/svc/rss/lastNews/" title="SPORT 24 RSS: &#929;&#959;&#942; &#949;&#953;&#948;&#942;&#963;&#949;&#969;&#957;, &#972;&#955;&#949;&#962; &#959;&#953; &#954;&#945;&#964;&#951;&#947;&#959;&#961;&#943;&#949;&#962;" />
</head>

For example, I want to use the type attribute to find this tag:

<link rel="alternate" type="application/rss+xml" href="http://www.sport24.gr/svc/rss/topNews/" title="SPORT 24 RSS: &#931;&#951;&#956;&#945;&#957;&#964;&#953;&#954;&#972;&#964;&#949;&#961;&#949;&#962; &#949;&#953;&#948;&#942;&#963;&#949;&#953;&#962;" />

and store the URL from its href attribute:

href="http://www.sport24.gr/svc/rss/topNews/"


Any help?
@chazzy, Nov 11, 2007: Maybe the org.w3c.dom package? It's what I always use.
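For reference, here is a minimal sketch of the org.w3c.dom route: the JDK's DocumentBuilder builds the DOM tree and an XPath query picks out the feed links. It assumes the page is well-formed XHTML with no default namespace declaration; ordinary HTML like the head posted above is not well-formed (for example, <link rel="icon" ...> is never closed), so the XML parser would reject it. The class name is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DomFeedFinder {

    // href of every RSS <link>; attribute values are matched exactly (case-sensitive)
    private static final String FEED_XPATH =
            "//link[@rel='alternate' and @type='application/rss+xml']/@href";

    /** Parses well-formed XHTML into a DOM tree and returns the RSS feed hrefs. */
    public static List<String> findFeedUrls(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Don't download a DOCTYPE's external DTD (feature of the JDK's built-in parser)
            factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hrefs = (NodeList) xpath.evaluate(FEED_XPATH, doc, XPathConstants.NODESET);

            List<String> result = new ArrayList<>();
            for (int i = 0; i < hrefs.getLength(); i++) {
                result.add(hrefs.item(i).getNodeValue());
            }
            return result;
        } catch (SAXException e) {
            // What you typically get for HTML that is not well-formed XML
            throw new IllegalArgumentException("Page is not well-formed XML", e);
        } catch (ParserConfigurationException | IOException | XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><head><title>SPORT 24</title>"
                + "<link rel=\"alternate\" type=\"application/rss+xml\""
                + " href=\"http://www.sport24.gr/svc/rss/topNews/\" title=\"Top news\"/>"
                + "</head><body/></html>";
        System.out.println(findFeedUrls(page));
    }
}
```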

A quick Google search turned up this:

http://htmlparser.sourceforge.net/
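If you'd rather not add a third-party library, the JDK also ships a lenient, callback-based HTML parser (javax.swing.text.html.parser.ParserDelegator) that copes with unclosed tags like the ones in the example head. A minimal sketch under that assumption follows; the class name is hypothetical, it only inspects <link> tags, and it compares rel and type case-insensitively.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class RssLinkFinder {

    /** Collects the href of each <link rel="alternate" type="application/rss+xml">. */
    public static List<String> findFeedUrls(String html) {
        final List<String> feeds = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // <link> has no end tag, so the parser reports it as a "simple" tag
                if (tag != HTML.Tag.LINK) {
                    return;
                }
                Object rel = attrs.getAttribute(HTML.Attribute.REL);
                Object type = attrs.getAttribute(HTML.Attribute.TYPE);
                Object href = attrs.getAttribute(HTML.Attribute.HREF);
                if (rel != null && type != null && href != null
                        && "alternate".equalsIgnoreCase(rel.toString().trim())
                        && "application/rss+xml".equalsIgnoreCase(type.toString().trim())) {
                    feeds.add(href.toString().trim());
                }
            }
        };
        try {
            // true = ignore the charset in <meta http-equiv>; the String is already decoded
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return feeds;
    }

    public static void main(String[] args) {
        String head = "<head><title>SPORT 24</title>"
                + "<link rel=\"Shortcut Icon\" href=\"/favicon.ico\" >"
                + "<link rel=\"alternate\" type=\"application/rss+xml\""
                + " href=\"http://www.sport24.gr/svc/rss/topNews/\" />"
                + "<link rel=\"alternate\" type=\"application/rss+xml\""
                + " href=\"http://www.sport24.gr/svc/rss/lastNews/\" />"
                + "</head>";
        for (String url : findFeedUrls(head)) {
            System.out.println(url);
        }
    }
}
```

For a crawled page you would pass the downloaded page text (or a Reader over it) instead of the hard-coded string, then resolve and store the returned hrefs.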