PHP Tricks for Identifying Duplicate Words in a String

January 3, 2012 (updated February 26, 2020) | Mahbub Alam Khan | Tutorials

Today we are going to learn how to identify duplicate words in a string using PHP. When we fetch content, news or feeds from other websites, we often find duplicate words and need a convenient way to remove them. Yesterday I was working on a project, faced a similar issue, and eventually found a technique to solve it. So, here goes the technique.

Example:

<?php
// define the string
$string = "hello world nice hello world";

// trim the whitespace at the ends of the string
$string = trim($string);

// compress runs of whitespace in the middle of the string into one space
// (preg_replace() replaces the deprecated ereg_replace())
$string = preg_replace('/\s+/', ' ', $string);

// decompose the string into an array of "words"
$words = explode(' ', $string);

// iterate over the array, count occurrences of each word
// (case-insensitively) and save the stats to another array
$wordStats = array();
foreach ($words as $word) {
	$key = strtolower($word);
	if (!isset($wordStats[$key])) {
		$wordStats[$key] = 0;
	}
	$wordStats[$key]++;
}

// print all duplicate words, one per line
// result: "hello" and "world"
foreach ($wordStats as $k => $v) {
	if ($v >= 2) {
		print "$k\n";
	}
}

Output:

hello
world

– Step 01:

First, trim the string and collapse runs of whitespace into a single space, so that the individual words in the sentence or paragraph are separated by exactly one space.
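Step 01 can be tried in isolation. The sketch below assumes PHP 5.3 or newer and uses preg_replace() with the \s+ pattern in place of ereg_replace(); \s matches spaces, tabs and line breaks, so text fetched from another site with stray tabs or newlines is normalized as well. The function name normalizeWhitespace() is hypothetical, chosen only for this example.

```php
<?php
// Hypothetical helper: trim both ends, then collapse every run of
// whitespace (spaces, tabs, newlines) into a single space.
function normalizeWhitespace($text)
{
    $text = trim($text);
    return preg_replace('/\s+/', ' ', $text);
}

$raw = "  hello \t world\n\nnice   hello world  ";
echo normalizeWhitespace($raw), "\n"; // hello world nice hello world
```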

– Step 02:

Next, decompose the sentence into words with explode(), using a single space as the delimiter.
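Steps 01 and 02 can also be merged into one call. The sketch below swaps in preg_split() for the trim() + whitespace compression + explode() sequence: splitting on \s+ with the PREG_SPLIT_NO_EMPTY flag drops the empty strings that leading or trailing whitespace would otherwise produce. The helper name splitWords() is hypothetical.

```php
<?php
// Hypothetical helper: split on any run of whitespace and discard empty
// pieces, so no separate trim() or whitespace-compression step is needed.
function splitWords($text)
{
    return preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

// Prints the five words: hello, world, nice, hello, world
print_r(splitWords("  hello world   nice hello world "));
```

For an empty or all-whitespace string this returns an empty array, whereas explode(' ', '') returns an array containing one empty string.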

– Step 03:

Then a new associative array, $wordStats, is initialized, and a key is created in it for every distinct word in the original string (lowercased, so that "Hello" and "hello" count as the same word). Each time a word occurs, the value stored under that word's key is incremented by 1, so the value ends up equal to the number of times the word appears.
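PHP's built-in array_count_values() performs the same tally as the foreach loop in Step 03: it returns an associative array mapping each value to the number of times it occurs. The sketch below assumes the words have already been split as in Step 02, and lowercases them with array_map('strtolower', ...) first so the counting stays case-insensitive like the original loop.

```php
<?php
$words = explode(' ', "Hello world nice hello World");

// Lowercase every word, then count occurrences of each distinct word.
$wordStats = array_count_values(array_map('strtolower', $words));

// counts: hello => 2, world => 2, nice => 1
print_r($wordStats);
```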

– Step 04:

Once all the words in the string have been processed, the $wordStats array will contain a list of unique words from the original string, together with a number indicating each word’s frequency. It is now a simple matter to isolate those keys with values greater than 1, and print the corresponding words as a list of duplicates.
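All four steps can be wrapped into one reusable function. The sketch below is one possible arrangement, not the article's exact code: it combines the preg_split(), array_count_values() and array_filter() built-ins, and the function name findDuplicateWords() is hypothetical. array_filter() keeps only the entries whose count is greater than 1, and array_keys() then returns those words.

```php
<?php
// Hypothetical helper: return the words (lowercased) that occur more than
// once in $text, in the order of their first appearance.
function findDuplicateWords($text)
{
    // Steps 01 and 02: split on runs of whitespace, ignoring empty pieces.
    $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    // Step 03: case-insensitive frequency table.
    $wordStats = array_count_values(array_map('strtolower', $words));

    // Step 04: keep only words whose count is greater than 1.
    $duplicates = array_filter($wordStats, function ($count) {
        return $count > 1;
    });

    return array_keys($duplicates);
}

foreach (findDuplicateWords("hello world nice hello world") as $word) {
    print "$word\n";
}
```

Words that look like integers (for example "2012") come back as PHP integers, because PHP converts such array keys to int.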

Important Note:

The ereg_* functions were deprecated in PHP 5.3, where calling them raises an E_DEPRECATED warning, and were removed entirely in PHP 7.0, where calling them is a fatal error. They should therefore not be used (and new tutorials should not promote them). Use the preg_* functions instead.
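As a sketch of that migration, the whitespace-compression line from the example translates almost directly. PCRE, the engine behind the preg_* functions, also understands POSIX bracket classes such as [[:space:]], so the main change is wrapping the pattern in delimiters (here /.../).

```php
<?php
$string = "hello   world \t nice";

// Deprecated in PHP 5.3, removed in PHP 7.0:
// $string = ereg_replace('[[:space:]]+', ' ', $string);

// PCRE equivalent: same POSIX class, wrapped in / delimiters.
$string = preg_replace('/[[:space:]]+/', ' ', $string);

echo $string, "\n"; // hello world nice
```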

Enjoy!
