HowProgOne: perl extract text between html tags using regex

Note: Do not use regular expressions to parse HTML.

The first option uses HTML::TreeBuilder, one of many HTML parsers available. Its documentation covers the details and includes further examples.

use strict;
use warnings;
use HTML::TreeBuilder;

my $str 
   = "<ul>"
   . "<li>hello</li>"
   . "<li>there</li>"
   . "<li>everyone</li>"
   . "</ul>"
   ;

# Now create a new tree to parse the HTML from String $str
my $tr = HTML::TreeBuilder->new_from_content($str);

# And now find all <li> tags and create an array with the values.
my @lists = 
      map { $_->content_list } 
      $tr->find_by_tag_name('li');

# And loop through the array returning our values.
foreach my $val (@lists) {
   print $val, "\n";
}
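
One caveat with content_list: when an <li> contains nested markup, it returns element objects mixed in with the plain strings, so printing the values directly no longer gives just text. A minimal sketch of one way around that, assuming you only want the text of each item (the sample HTML here is made up for illustration), is to call as_text on each element instead:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical input: the second item contains nested markup.
my $html = '<ul><li>hello</li><li><b>there</b> everyone</li></ul>';

my $tree = HTML::TreeBuilder->new_from_content($html);

# as_text gathers all text beneath each <li>, nested tags included.
my @items = map { $_->as_text } $tree->look_down(_tag => 'li');

print "$_\n" for @items;

# Release the tree's memory once finished with it.
$tree->delete;
```

look_down(_tag => 'li') is used here in place of find_by_tag_name('li'); for a plain tag lookup like this, either should work.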

If you decide to use a regular expression here anyway (I don't recommend it), you could do something like this:

my $str
   = "<ul>"
   . "<li>hello</li>"
   . "<li>there</li>"
   . "<li>everyone</li>"
   . "</ul>"
   ;

my @matches;
while ($str =~/(?<=<li>)(.*?)(?=<\/li>)/g) {
  push @matches, $1;
}

foreach my $m (@matches) {
   print $m, "\n";
}

Output:

hello
there
everyone
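
To illustrate why the note at the top warns against regular expressions, here is a rough sketch using core Perl only. The input is a hypothetical variation of the list above in which one <li> has an attribute and another spans two lines. The second pattern is my own slightly more tolerant variant, not a real parser, and it will still break on nested lists, comments, and similar cases.

```perl
use strict;
use warnings;

# Hypothetical input: one <li> carries an attribute, one contains a newline.
my $html = join '',
    '<ul>',
    '<li>hello</li>',
    '<li class="x">there</li>',
    "<li>every\none</li>",
    '</ul>';

# The lookaround pattern from above: only matches a bare <li> on a single line.
my @strict = $html =~ /(?<=<li>)(.*?)(?=<\/li>)/g;

# A more tolerant pattern: allows attributes, ignores case (/i),
# and lets . match newlines (/s). Still only an approximation.
my @loose = $html =~ /<li\b[^>]*>(.*?)<\/li>/gis;

printf "strict pattern found %d item(s)\n", scalar @strict;   # 1
printf "loose pattern found %d item(s)\n",  scalar @loose;    # 3
```

The strict pattern silently drops two of the three items, which is the kind of failure a parser such as HTML::TreeBuilder avoids.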
