PHP Tricks for Identifying Duplicate Words in a String

Today we are going to learn how to identify duplicate words in a string using PHP. When we fetch content, news, or feeds from other websites, we often find duplicate words and need a convenient way to remove them. Yesterday, while working on a project, I faced a similar issue and eventually found a technique to solve it. So, here it goes.

Example:

<?php
// define the string
$string = "hello world nice hello world";

// trim the whitespace at the ends of the string
$string = trim($string);

// compress the whitespace in the middle of the string
// (preg_replace() instead of ereg_replace(), which was removed in PHP 7.0)
$string = preg_replace('/\s+/', ' ', $string);

// decompose the string into an array of "words"
$words = explode(' ', $string);

// iterate over the array, count the occurrences of each word
// (lowercased, so "Hello" and "hello" are counted together)
// and save the stats to another array
$wordStats = array();
foreach ($words as $word) {
	$key = strtolower($word);
	$wordStats[$key] = isset($wordStats[$key]) ? $wordStats[$key] + 1 : 1;
}

// print all duplicate words, one per line
// result: "hello" and "world"
foreach ($wordStats as $k => $v) {
	if ($v >= 2) {
		print "$k\n";
	}
}

Output:

hello
world


– Step 01:

First, trim the whitespace from both ends of the string and collapse any run of multiple spaces in the middle into a single space, so that the individual words are separated by exactly one space.
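A minimal sketch of this first step on its own, assuming PHP 5.3 or newer. The function name normalizeWhitespace() is hypothetical, not part of PHP; it combines trim() with preg_replace() and the \s class, which matches spaces, tabs, and newlines alike:

```php
<?php
// Hypothetical helper (not a PHP built-in): trims both ends, then
// collapses every run of whitespace (spaces, tabs, newlines) into one space.
function normalizeWhitespace($text)
{
    $text = trim($text);
    return preg_replace('/\s+/', ' ', $text);
}

echo normalizeWhitespace("  hello   world\tnice \n hello world  "), "\n";
// prints "hello world nice hello world"
```

Using \s rather than a literal space means tabs and line breaks pulled in from fetched feeds are normalized too.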

– Step 02:

Next, decompose the sentence into words with explode(), using a single space as the delimiter.
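To show why the whitespace cleanup has to come first, the sketch below compares explode() on an un-normalized string with preg_split(), which can treat any run of whitespace as one delimiter and, with the standard PREG_SPLIT_NO_EMPTY flag, drop empty pieces. The variable names are only illustrative:

```php
<?php
// explode() splits on every single space, so a double space
// leaves an empty string between the two words.
$exploded = explode(' ', "hello  world");
echo count($exploded), "\n"; // 3 ("hello", "", "world")

// preg_split() splits on runs of whitespace; PREG_SPLIT_NO_EMPTY
// discards the empty pieces produced by leading/trailing spaces.
$split = preg_split('/\s+/', "  hello  world ", -1, PREG_SPLIT_NO_EMPTY);
echo implode(',', $split), "\n"; // hello,world
```

preg_split() effectively folds the trimming, collapsing, and splitting into one call; the explode() approach keeps each step explicit, which is easier to follow in a tutorial.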

– Step 03:

Then, a new associative array, $wordStats, is initialized, and a key is created in it for every distinct word in the original string (lowercased with strtolower(), so "Hello" and "hello" share a key). Each time a word is encountered, the value under its key is incremented by 1, so a word that occurs more than once ends up with a count greater than 1.
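PHP also has a built-in that does this counting: array_count_values() returns an associative array mapping each value to its number of occurrences. The sketch below is an alternative to the hand-written foreach loop, not a description of it; it lowercases the words first with array_map('strtolower', ...) so the keys match what the loop produces:

```php
<?php
$words = explode(' ', 'Hello world nice hello world');

// Lowercase every word so "Hello" and "hello" share one key.
$lower = array_map('strtolower', $words);

// Build the word => count map in a single call.
$wordStats = array_count_values($lower);

print_r($wordStats); // hello => 2, world => 2, nice => 1
```

Keys appear in the order each word is first seen, just as with the manual loop.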

– Step 04:

Once all the words in the string have been processed, the $wordStats array will contain a list of unique words from the original string, together with a number indicating each word’s frequency. It is now a simple matter to isolate those keys with values greater than 1, and print the corresponding words as a list of duplicates.
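The isolation step can also be written without an explicit if: array_filter() keeps only the entries whose count exceeds 1, and array_keys() returns the surviving words. A minimal sketch, assuming $wordStats already holds the word => count map from step 3 (hard-coded here so the snippet runs on its own):

```php
<?php
// word => count map as produced by step 3 (hard-coded for illustration)
$wordStats = array('hello' => 2, 'world' => 2, 'nice' => 1);

// Keep only words that occur more than once, then take their keys.
$duplicates = array_keys(array_filter($wordStats, function ($count) {
    return $count > 1;
}));

echo implode("\n", $duplicates), "\n";
// prints:
// hello
// world
```

array_filter() preserves the original keys, which is what lets array_keys() recover the words afterwards.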

Important Note:

The ereg_* functions were deprecated in PHP 5.3, where they emit E_DEPRECATED notices, and were removed entirely in PHP 7.0, where calling them is a fatal error. They should not be used in new code; use the preg_* (PCRE) functions instead.
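For whitespace handling the migration is mostly mechanical: PCRE patterns need delimiters, but they accept the same POSIX character classes inside brackets, so a call like ereg_replace('[[:space:]]+', ' ', $string) translates almost word for word. A sketch, assuming PHP 5.3 or newer:

```php
<?php
$string = "hello   world \t nice";

// Removed in PHP 7.0:
// $string = ereg_replace('[[:space:]]+', ' ', $string);

// preg_* equivalent: the same POSIX class, wrapped in /.../ delimiters.
$viaPosixClass = preg_replace('/[[:space:]]+/', ' ', $string);

// \s is the more common PCRE shorthand for whitespace.
$viaShorthand = preg_replace('/\s+/', ' ', $string);

echo $viaPosixClass, "\n"; // hello world nice
```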

Enjoy!

Cool Ajax - Web Development Tutorial Blog