Here's the problem. For this blog, I want to be able to put code (inside pre tags) right inside of my articles, which are stored in the database. When I display the articles, I want to run the php function htmlentities() on everything between my pre tags so that all the code is properly escaped.
Here's our imaginary article source code (pulled from the database):
<h1>An article with PHP code</h1>
<pre>
<?php echo 'hi'; ?>
<br/>
<?php echo 'oh hey there'; ?>
</pre>
<p>
The above and below pre tags will be rendered as code on the screen
</p>
<pre>
<?php echo 'hello for a second time'; ?>
<br/>
<?php echo 'yep, here we are again'; ?>
</pre>
<p>
Thanks for reading!
</p>
To make the code (everything between pre tags) appear correctly when output, we need to run it through the htmlentities function. In other words, we need to isolate all content that comes between the <pre> and </pre> tags. Here's how we do it.
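// $content holds your raw content
// An anonymous function (closure) is used as the callback; the older
// create_function() was deprecated in PHP 7.2 and removed in PHP 8.0.
$content_processed = preg_replace_callback(
    '#\<pre\>(.+?)\<\/pre\>#s',
    function ($matches) {
        // $matches[1] is the text captured by (.+?) between the tags
        return '<pre>' . htmlentities($matches[1]) . '</pre>';
    },
    $content
);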
Your $content_processed variable now holds the processed version of your article. That's it!
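To check the result, you can run the same callback on a small sample. The sample string below is made up for illustration; only the contents of the pre block should come out escaped.

```php
<?php
// Hypothetical sample article: only the pre block should be escaped
$content = "<p>Intro</p>\n<pre>\n<br/> & more\n</pre>\n<p>Bye</p>";

$content_processed = preg_replace_callback(
    '#\<pre\>(.+?)\<\/pre\>#s',
    function ($matches) {
        // Escape only the captured text between the tags
        return '<pre>' . htmlentities($matches[1]) . '</pre>';
    },
    $content
);

echo $content_processed, "\n";
```

The <p> tags pass through untouched, while the <br/> and & inside the pre block become &lt;br/&gt; and &amp;.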
How it all Works
The above code runs everything between each pair of pre tags through the htmlentities function. This is just one way you may need to process your content. Let's look more closely at how this works.
The pattern that matches our pre tags and their content is:
#\<pre\>(.+?)\<\/pre\>#s
If you're relatively new to regular expressions, the two pound (#) symbols may look strange, but they're harmless. PHP's preg_* functions use Perl-compatible regular expressions (PCRE), and every pattern must start and end with a delimiter. The # symbol is used here, but / is probably the most common delimiter. The delimiters wrap the pattern itself, not the text being searched. Any characters after the closing delimiter are pattern modifiers. In this case, the 's' after the final # is the "dotall" modifier: it makes the dot (.) match newline characters too, so a match can span multiple lines. Without it, the content inside our pre tags would all need to be on the same line to match.
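A small sketch of what the 's' modifier changes, using a made-up multi-line pre block. The last call shows the same pattern written with / as the delimiter.

```php
<?php
// A pre block whose contents span several lines
$html = "<pre>\nline one\nline two\n</pre>";

// Without the s modifier, "." never matches "\n", so nothing is found
$without_s = preg_match('#\<pre\>(.+?)\<\/pre\>#', $html);

// With s (dotall), "." matches "\n" too and the whole block is found
$with_s = preg_match('#\<pre\>(.+?)\<\/pre\>#s', $html, $matches);

// Same pattern with / as the delimiter; the / in </pre> must be escaped
$slash = preg_match('/\<pre\>(.+?)\<\/pre\>/s', $html);

var_dump($without_s, $with_s, $slash); // int(0) int(1) int(1)
```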
One more very important piece of our match is the (.+?) portion. Apart from the question mark (?), this is straightforward regex: .+ means "one or more of any character", and the parentheses capture whatever it matches. This is the portion of our pattern that captures the contents between our pre tags. The question mark (?) is very important. By default, quantifiers like + are "greedy": they match as much text as possible, so the match runs to the LAST closing tag that still lets the pattern succeed. In this case, if you have multiple pre tags, it'll match everything from the first opening pre tag to the last closing pre tag:
#\<pre\>(.+)\<\/pre\>#s (without the question mark) matches ALL of the following:
<pre> <?php echo 'hi'; ?> <br/> <?php echo 'oh hey there'; ?> </pre> <p> The above and below pre tags will be rendered as code on the screen </p> <pre> <?php echo 'hello for a second time'; ?> <br/> <?php echo 'yep, here we are again'; ?> </pre>
but #\<pre\>(.+?)\<\/pre\>#s matches the 2 following pieces individually:
<pre> <?php echo 'hi'; ?> <br/> <?php echo 'oh hey there'; ?> </pre>
AND
<pre> <?php echo 'hello for a second time'; ?> <br/> <?php echo 'yep, here we are again'; ?> </pre>
Obviously, the second result is what we want, because the first (without the '?') matches all of the text between the 2 pre blocks in addition to the pre blocks themselves. Placing the '?' after the + quantifier makes it lazy (non-greedy), meaning that it stops at the first closing pre tag it can, not the last. In other words, if you neglect the '?', you'll match too much (greedy). The 's' modifier has nothing to do with greediness; it only lets the dot match newlines.
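The difference between (.+) and (.+?) is easy to see by counting matches with preg_match_all() on a short made-up string containing two pre blocks:

```php
<?php
$html = "<pre>a</pre>\n<p>between</p>\n<pre>b</pre>";

// Greedy: (.+) runs to the LAST </pre>, producing one big match
$greedy = preg_match_all('#\<pre\>(.+)\<\/pre\>#s', $html, $g);

// Lazy: (.+?) stops at the FIRST </pre> after each <pre>
$lazy = preg_match_all('#\<pre\>(.+?)\<\/pre\>#s', $html, $l);

echo $greedy, "\n"; // 1
echo $lazy, "\n";   // 2
```

With the greedy pattern, $g[1][0] contains "a</pre>\n<p>between</p>\n<pre>b", while the lazy pattern yields the two separate captures "a" and "b" in $l[1].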
$content = preg_replace_callback(
    '#\<pre\>(.+?)\<\/pre\>#s',
    function ($matches) {
        return '<pre>' . htmlentities($matches[1]) . '</pre>';
    },
    $content
);
The rest of the code is fairly simple. To process the text between our pre tags, we pass preg_replace_callback() an anonymous function as its callback (older code used create_function(), which was removed in PHP 8.0). preg_replace_callback() calls this function once for every match, passing it an array in which $matches[0] is the whole match and $matches[1] is the text captured by (.+?). Whatever the function returns replaces the match, so each pre block becomes a pre block whose contents have been run through htmlentities.
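To see exactly what the callback receives, this sketch records each $matches array and uses the optional $limit and $count arguments of preg_replace_callback() to report how many replacements were made. The $seen array is only there for illustration.

```php
<?php
$content = "<pre>x < y</pre> and <pre>a & b</pre>";
$seen = []; // illustration only: collects every $matches array

$result = preg_replace_callback(
    '#\<pre\>(.+?)\<\/pre\>#s',
    // Called once per match: $matches[0] is the whole match,
    // $matches[1] is the text captured by (.+?)
    function ($matches) use (&$seen) {
        $seen[] = $matches;
        return '<pre>' . htmlentities($matches[1]) . '</pre>';
    },
    $content,
    -1,     // no limit on the number of replacements
    $count  // set to the number of replacements made
);

echo $result, "\n";
```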
Summing Up
If you need a regular expression that will match the content between a pair of HTML tags (pre tags in this example), use the following. Afterwards, $matches[1] holds the contents of the first pre block:
preg_match('#\<pre\>(.+?)\<\/pre\>#s', $html_content, $matches);
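preg_match() stops after the first match. If an article has several pre blocks, preg_match_all() collects all of them; the sample string here is made up.

```php
<?php
$html_content = "<pre>first</pre><p>text</p><pre>second</pre>";

// preg_match() stops after the first match
preg_match('#\<pre\>(.+?)\<\/pre\>#s', $html_content, $matches);
echo $matches[1], "\n"; // first

// preg_match_all() finds every match; group 1 captures are in $all[1]
preg_match_all('#\<pre\>(.+?)\<\/pre\>#s', $html_content, $all);
echo implode(', ', $all[1]), "\n"; // first, second
```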

some contents
It will stop parsing as follows
some contents
-Ryan
How could I replace any text located between HTML tags?
For example, if I want to replace the word "php" with "asp" in the following text:
myphp best php website PhP!!! php and myphp or phpme - php!!
How can I create the following result?
myphp best asp website asp!!! asp and myphp or phpme - asp!!
Thank you in advance,
Roi
Yes, you could do that with a regex that looks something like this:
#\<([^\>]+)\>(.+?)\<\/([^\>]+)\>#s
The problem you're going to run into is if you have any embedded HTML tags inside the tag you're trying to match. The above expression will stop at the first closing tag that it finds.
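For the word replacement in Roi's example, a word boundary (\b) together with the case-insensitive i modifier appears to be enough: it replaces "php" and "PhP" but leaves "myphp" and "phpme" alone. The second half is a sketch, not the reply's exact pattern: it limits the replacement to tag contents and uses a backreference (\1) so the closing tag must have the same name as the opening one. The tag-name pattern is an assumption, and like the reply's version it does not handle nested tags.

```php
<?php
// Replace the whole word "php" (any case) with "asp".
// \b prevents matches inside words such as "myphp" or "phpme".
$text = 'myphp best php website PhP!!! php and myphp or phpme - php!!';
$plain = preg_replace('/\bphp\b/i', 'asp', $text);
echo $plain, "\n";
// myphp best asp website asp!!! asp and myphp or phpme - asp!!

// Restrict the same replacement to text between an opening tag and a
// closing tag with the same name; \1 refers back to the captured name.
$html = '<b>php rocks</b> but php outside stays';
$inside = preg_replace_callback(
    '#<([a-z][a-z0-9]*)>(.+?)</\1>#is',
    function ($m) {
        $body = preg_replace('/\bphp\b/i', 'asp', $m[2]);
        return '<' . $m[1] . '>' . $body . '</' . $m[1] . '>';
    },
    $html
);
echo $inside, "\n";
// <b>asp rocks</b> but php outside stays
```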
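I used your code to write a search-and-replace function for tag contents:
function tagreplace($content, $tag, $search, $replace) {
    // preg_quote() escapes any regex metacharacters in the tag name
    $t = preg_quote($tag, '#');
    $content_processed = preg_replace_callback(
        '#\<' . $t . '\>(.+?)\<\/' . $t . '\>#s',
        // The closure takes $tag, $search and $replace via "use" instead
        // of splicing them into a create_function() code string
        function ($matches) use ($tag, $search, $replace) {
            return '<' . $tag . '>' . str_replace($search, $replace, $matches[1]) . '</' . $tag . '>';
        },
        $content
    );
    return $content_processed;
}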
Please help, I used your code above.
Check your regular expression: you're probably just missing the closing "#" at the end of it. Every pattern must begin and end with a delimiter, normally the same character on both ends (bracket pairs such as {foo} are the exception). Like
#foo#
or
/foo/
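A quick sketch of the delimiter rule: the same pattern works with #, / or a {} bracket pair, while a pattern missing its closing delimiter makes preg_match() emit a warning and return false. The @ operator is used here only to silence that warning for the demo.

```php
<?php
// The same pattern written with three different delimiters
$hash  = preg_match('#foo#', 'a foo b');
$slash = preg_match('/foo/', 'a foo b');
$brace = preg_match('{foo}', 'a foo b');

// Missing the closing "#": PHP warns and preg_match() returns false
$broken = @preg_match('#foo', 'a foo b');

var_dump($hash, $slash, $brace, $broken); // int(1) int(1) int(1) bool(false)
```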