
Find "additem" in an HTML web site

I’m looking for a way to find <input type="hidden" name="ADDITEM" value="____"> in a web site that contains over 1,000 pages. I need it to sort our database out. I'm kinda looking for scripts, programs, or software?

HTML

25 Comments

@Charles, Jan 23.2007: Myself, I would write a spider in Perl using the LWP::Simple and HTML::TreeBuilder modules. We're talking half a dozen lines of code at the most.
@felgall, Jan 23.2007: If you have a copy of the site on your computer, then you could just use the Search or Find function built into your operating system to do it.
@tbriscoe (author), Jan 23.2007: Charles, is there another way? I'm not too familiar with using Perl.

felgall, it does not work when you're trying to find HTML code. Thanks anyway.
@Charles, Jan 23.2007: What are you familiar with?
@tbriscoe (author), Jan 23.2007: I'm a noob when it comes to scripts.
@tracknut, Jan 23.2007:
[QUOTE]Felgal- it does not work when you trying to find HTML code thanks anyway[/QUOTE]

What operating system are you using?

Dave
@Charles, Jan 23.2007: Well, then you're going to have to learn one of them, and it might as well be Perl.

But if you describe the problem [i]in full[/i] over at the Perl forum you might get someone to write the half a dozen lines of code that you need. We'll need to know where the files are, what they are named, what OS and platform you are running and what the output needs to look like at a minimum.
@felgall, Jan 23.2007: Finding any text in files on your own computer just requires that you search the correct directory where all of the pages are stored (including sub-directories if necessary) and search for the correct text. If you can't get it to work, then you are simply not doing it the right way.

If you let us know what operating system you are using then someone can provide you with more complete instructions on how to run searches of directories/folders for that operating system.

You only need to use a scripting language such as Perl if you want to do a global replace of what is found in all of the different files. You don't need to write a script just to retrieve a list of all the files that contain particular text.
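A script is not required for this search. If you would rather have it scripted, for example to save the list to a file, here is a minimal sketch that uses only core Perl (File::Find). It assumes the local copy of the site is in a folder passed on the command line. The helper name `files_containing` and the script name `find_additem.pl` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper (not from any module): return the full paths of all
# .htm/.html files under $dir whose text contains $needle, ignoring case.
sub files_containing {
    my ($dir, $needle) = @_;
    my @hits;
    find(sub {
        return unless -f $_ && /\.html?$/i;    # regular HTML files only
        open my $fh, '<', $_ or return;        # skip files we cannot read
        my $text = do { local $/; <$fh> };     # slurp the whole file
        close $fh;
        push @hits, $File::Find::name
            if index(lc $text, lc $needle) >= 0;
    }, $dir);
    return sort @hits;
}

# Usage: perl find_additem.pl "C:\path\to\site\copy"
# Searching for name="ADDITEM" rather than the whole tag means the match
# does not depend on the order of the other attributes.
if (@ARGV) {
    print "$_\n" for files_containing($ARGV[0], 'name="ADDITEM"');
}
```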
@tbriscoe (author), Jan 23.2007: I am running XP Pro SP2. I'll try to describe what I need to do as best I can. We have a web site (www.poolcenter.com) with about 1,000 pages, all in HTML format. We are also using a database, Dydacom's MOM (Mail Order Manager) with Sitelink. We sell over 2,500 products on our web site and have over 5,000 in MOM. I am trying to figure out a way to list all the products on our site and compare them to the ones in MOM.
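Once you have both lists (item codes taken from the site and item codes exported from MOM), the comparison is a set difference. Here is a minimal sketch with plain Perl hashes. It assumes each list is just item codes such as e2006 and that case does not matter. Exporting codes from MOM is not covered. The name `compare_items` and the sample codes other than e2006 are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: given two array refs of item codes, return two
# array refs: codes found only on the web site, and codes found only in MOM.
sub compare_items {
    my ($site, $mom) = @_;
    my %in_site = map { uc($_) => 1 } @$site;   # normalise case before comparing
    my %in_mom  = map { uc($_) => 1 } @$mom;
    my @site_only = sort { $a cmp $b } grep { !$in_mom{$_} }  keys %in_site;
    my @mom_only  = sort { $a cmp $b } grep { !$in_site{$_} } keys %in_mom;
    return (\@site_only, \@mom_only);
}

my ($site_only, $mom_only) =
    compare_items([qw(e2006 A100 B200)], [qw(E2006 B200 C300)]);
print "On site only: @$site_only\n";   # On site only: A100
print "In MOM only:  @$mom_only\n";    # In MOM only:  C300
```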
@felgall, Jan 23.2007:
  1. Open the folder on your computer where you have the copy of the files from your web site.
  2. Click on the Search button at the top of the window.
  3. In the "A word or phrase in the file" field, enter:
     <input type="hidden" name="ADDITEM"
  4. Press the Search button at the bottom.

  If it isn't searching sub-folders, select "More Advanced Options" and check the appropriate option before pressing Search.

    This should return a list of all of the files that contain a reference to a hidden field named ADDITEM. If only a few files are returned, then you can just check each one manually to see what values they are using. If it returns a large number of files, then you may want to set up an automated process using a script to extract the values.
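If the search returns many files, a short script can pull the values out. Here is a minimal sketch that uses a regular expression instead of an HTML parser. That makes it simpler, but it assumes values are quoted and that no '>' appears inside an attribute value. The parser-based scripts later in the thread do not have that limitation. The helper `additem_values` and the script name are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the value of every <input> tag named ADDITEM.
# This is a regular-expression shortcut, not a real HTML parser.
sub additem_values {
    my ($html) = @_;
    my @values;
    while ($html =~ /<input\b([^>]*)>/gi) {            # each <input ...> tag
        my $attrs = $1;
        # Keep only tags whose name attribute is ADDITEM (any case),
        # so name="ADDITEMX" or name="ADDITEM-2" is not matched.
        next unless $attrs =~ /(?:^|\s)name\s*=\s*["']?ADDITEM(?=["'\s\/]|$)/i;
        # Capture the quoted value attribute, wherever it appears in the tag.
        push @values, $1 if $attrs =~ /(?:^|\s)value\s*=\s*["']([^"']*)["']/i;
    }
    return @values;
}

# Usage: perl additem_values.pl page1.htm page2.htm ...
# Prints one "file,value" line per hidden field found.
for my $file (@ARGV) {
    open my $fh, '<', $file or do { warn "Cannot read $file: $!\n"; next };
    my $html = do { local $/; <$fh> };
    close $fh;
    print "$file,$_\n" for additem_values($html);
}
```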
    @Charles, Jan 23.2007:
    [QUOTE]I am running XP pro sp2, I'll try describe what i need to do the best i can, we have a web (www.poolcenter.com) it has about 1,000 pages it all in HTML format, we are also using a database dydacom's MOM (mail order manger) with sitelink. we sell like over 2,500 products on our website and we over 5,000 in MOM. i am trying to figure outa way list all the products on our site and compare it to the ones in MOM.[/QUOTE]
    Yup, it sounds like Perl would do the trick. But you still haven't answered my questions above.
    @tbriscoe (author), Jan 23.2007: Which questions are they?
    @Charles, Jan 23.2007: We'll still need a list of all of the files, or some idea of how to find them. Do you have a page on the site that lists them? If not, we'll need you to have a copy of the site on your hard drive, and then we'll need to know which directories the files are in. We'll also need to know how you want the output formatted.
    @tbriscoe (author), Jan 23.2007: I don't know how you would get the files, but if you go to www.poolcenter.com/cleaners_pol280_poolstor.htm and look at the source, you'll find "additem". Right there is <input type="hidden" name="ADDITEM" value="e2006">. Where it says value="e2006", the part in quotes is what I'm looking for.

    I would like the output in Excel format.
    @Charles, Jan 24.2007: Excel understands comma-separated values, so that part is a snap. And getting the values you want is as easy as:

    #!c:\perl\bin\perl.exe

    use strict;
    use HTML::TreeBuilder;
    use LWP::Simple;

    # Pages to check; add more URLs to this list.
    my @page = qw(http://www.poolcenter.com/cleaners_pol280_poolstor.htm);

    foreach (@page) {
        # Fetch the page, parse it, and take the value of the first element named ADDITEM.
        my $item = HTML::TreeBuilder->new_from_content(get $_)
                                    ->look_down('name', 'ADDITEM')
                                    ->attr('value');
        # One comma-separated line per page: URL,value
        print "$_,$item\n";
    }

    But without a way of knowing what all of the files are called, you are kind of dead in the water, unless you can run that thing in the same file system as your site.
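Excel reads comma-separated values. However, a field that itself contains a comma, a double quote, or a line break must be wrapped in double quotes, with any inner quotes doubled (the RFC 4180 convention). The plain `print "$_,$item\n"` above is fine for values like e2006, but not for fields like those. Here is a minimal core-Perl sketch. The helper `csv_line` is hypothetical. The CPAN module Text::CSV handles this more completely.

```perl
use strict;
use warnings;

# Hypothetical helper: join fields into one CSV line, quoting only the
# fields that contain a comma, a double quote, or a line break.
sub csv_line {
    my @out;
    for my $field (@_) {
        my $f = defined $field ? $field : '';   # treat undef as an empty cell
        if ($f =~ /[",\r\n]/) {
            $f =~ s/"/""/g;                     # double any embedded quotes
            $f = qq{"$f"};                      # then wrap the field in quotes
        }
        push @out, $f;
    }
    return join(',', @out) . "\n";
}

print csv_line('cleaners_pol280_poolstor.htm', 'e2006');
# cleaners_pol280_poolstor.htm,e2006
print csv_line('a, b', 'say "hi"');
# "a, b","say ""hi"""
```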
    @tbriscoe (author), Jan 24.2007: I've run ActivePerl and I got a few errors:

    String found where operator expected at - line 7, near "value"); print ""

    Scalar found where operator expected at - line 7, near ""); print"$_" (Missing operator before $_?)

    Backslash found where operator expected at - line 7, near "$item" (Missing operator before ?)

    I really don't know what all that means.
    @tbriscoe (author), Jan 24.2007: I have a backup copy of the site on my hard drive. Maybe I could use C:\Documents and Settings\Briscoe\My Documents\My Web Sites\SEARCHABLE that way, instead of http://www.poolcenter.com/cleaners_pol280_poolstor.htm?
    @Charles, Jan 24.2007: Now we're cooking with gas! All we need to do is step through each and every file with a ".htm" extension.

    The Perl I posted works just fine for me. Perhaps you have a cut-and-paste error going on there.

    You are welcome to go ahead and expand my example above on your own. You'll want to use HTML::TreeBuilder->new_from_file and eliminate the LWP module.

    I'm a little busy at the moment, but check back in 12 hours or so if you haven't solved the problem yourself.
    @tbriscoe (author), Jan 24.2007: I'm sure you are a busy man. I did run the script and didn't get any errors, but I don't think it did anything; all the cursor did was jump down to the next line.
    @tbriscoe (author), Jan 24.2007: I downloaded a program called dzscript Perl editor. It worked, but it only comes up with one "additem" on the page, and there are a couple of them on there. Any suggestions?
    @Charles, Jan 24.2007: In list context, look_down returns all of the matching elements below the node; in scalar context, it returns just the first one. Something like:

    foreach (@page) {
        my $root = HTML::TreeBuilder->new_from_file($_);
        # List context: collect every element named ADDITEM, not just the first.
        my @additems = $root->look_down('name', 'ADDITEM');
        foreach (@additems) {
            print $_->attr('value'), "\n";
        }
        # Free this tree before parsing the next file.
        $root->delete();
    }
    If you are parsing more than one file you need to delete the tree after each one. http://search.cpan.org/~petek/HTML-Tree-3.23/lib/HTML/Element.pm#%24h-%3Edelete()
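The difference Charles describes is Perl's calling context. The same call returns every match when assigned to an array, and only the first match when assigned to a scalar. Here is a core-Perl sketch of that behaviour. It uses a hypothetical `find_by_name` sub built on `wantarray`. This is not HTML::Element's actual code, only the same idea. The second value, e2007, is made-up sample data.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for parsed <input> elements.
my @inputs = (
    { name => 'ADDITEM', value => 'e2006' },
    { name => 'QTY',     value => '1' },
    { name => 'ADDITEM', value => 'e2007' },
);

# Like look_down: all matches in list context, the first (or undef) in scalar context.
sub find_by_name {
    my ($name, @elements) = @_;
    my @hits = grep { $_->{name} eq $name } @elements;
    return wantarray ? @hits : $hits[0];
}

my @all   = find_by_name('ADDITEM', @inputs);   # list context
my $first = find_by_name('ADDITEM', @inputs);   # scalar context
print join(',', map { $_->{value} } @all), "\n";   # e2006,e2007
print $first->{value}, "\n";                       # e2006
```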
    @tbriscoe (author), Jan 24.2007: Well, that one didn't work.
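    @Charles, Jan 24.2007: Try:

    #!c:\perl\bin\perl.exe

    use strict;
    use File::Find;
    use HTML::TreeBuilder;

    # Walk the local copy of the site; File::Find calls callback() once per file.
    find \&callback, 'C:\Documents and Settings\Briscoe\My Documents\My Web Sites\SEARCHABLE';

    sub callback {
        # $_ is the current file name (find() has already changed into its directory).
        if (-f $_ && /\.html?$/i) {
            my $root = HTML::TreeBuilder->new_from_file($_);
            # List context: every element whose name attribute is ADDITEM.
            my @additems = $root->look_down('name', 'ADDITEM');
            foreach (@additems) {
                print $_->attr('value'), "\n";
            }
            # Free the parse tree before moving on to the next file.
            $root->delete();
        }
    }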
    @Charles, Jan 25.2007: But HTML::TreeBuilder is a bit more power and overhead than you need. We can use the parser directly; here is a more confusing but much more lightweight version:

    #!c:\perl\bin\perl.exe
    {
        # A minimal HTML::Parser subclass: start() is called for every opening tag.
        package MyParser;
        use base 'HTML::Parser';

        sub start {
            my ($self, $tagname, $attr, $attrseq, $origtext) = @_;
            # Print the value of any <input> whose name attribute is ADDITEM.
            if ($tagname eq 'input' && $$attr{name} && $$attr{name} eq 'ADDITEM') {
                print "$$attr{value}\n";
            }
        }
    }

    use strict;
    use File::Find;

    my $p = MyParser->new;

    find \&callback, 'C:\Documents and Settings\Briscoe\My Documents\My Web Sites\SEARCHABLE';

    sub callback {
        # Parse every .htm/.html file found under the site folder.
        $p->parse_file($_) if -f $_ && /\.html?$/i;
    }
    And this thread should have been moved to the Perl forum long ago.
    @tbriscoe (author), Jan 25.2007: Cool, it worked! Thank you very much. I think I might study up on Perl; it looks kinda interesting.

    Thank you again.