Today we are going to learn how to identify duplicate words in a string using PHP. When we fetch content, news, or feeds from other websites, we often find duplicate words in the text and need a convenient way to detect and remove them. Yesterday I was working on a project and faced a similar issue, and eventually found a technique to solve it. So, here is the technique.
Example:
// define string
$string = "hello world nice hello world";

// trim the whitespace at the ends of the string
$string = trim($string);

// compress the whitespace in the middle of the string
// (preg_replace() is used because ereg_replace() no longer exists in PHP 7+)
$string = preg_replace('/\s+/', ' ', $string);

// decompose the string into an array of "words"
$words = explode(' ', $string);

// iterate over the array
// count occurrences of each word (case-insensitively)
// save stats to another array
$wordStats = array();
foreach ($words as $word) {
    $key = strtolower($word);
    // start at 1 on first sight to avoid an "undefined index" notice
    $wordStats[$key] = isset($wordStats[$key]) ? $wordStats[$key] + 1 : 1;
}

// print all duplicate words
// result: "hello world "
foreach ($wordStats as $k => $v) {
    if ($v >= 2) {
        print "$k ";
    }
}
Output:
hello world
– Step 01:
First, prepare the sentence or paragraph so that the individual words can be identified: trim the whitespace at both ends of the string and collapse any run of multiple spaces into a single space.
– Step 02:
Decompose the sentence into words with explode(), using a single space as the delimiter.
– Step 03:
Next, a new associative array, $wordStats, is initialized, and a key is created in it for every word in the original string. Words are lowercased first, so "Hello" and "hello" share one key. Each time a word occurs, the value stored under that word's key is incremented by 1.
– Step 04:
Once all the words in the string have been processed, the $wordStats array will contain a list of unique words from the original string, together with a number indicating each word’s frequency. It is now a simple matter to isolate those keys with values greater than 1, and print the corresponding words as a list of duplicates.
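The four steps above can also be collapsed into a few built-in calls. What follows is only a minimal sketch, not the one right way to do it: the function name find_duplicate_words() is made up for illustration, and the manual counting loop is swapped for PHP's array_count_values().

```php
<?php
// Hypothetical helper (the name is an assumption, not part of the example above).
// Returns the lowercased words that occur more than once in $string.
function find_duplicate_words($string)
{
    // Steps 01 and 02: split on runs of whitespace and drop empty pieces,
    // which also takes care of leading and trailing whitespace.
    $words = preg_split('/\s+/', $string, -1, PREG_SPLIT_NO_EMPTY);

    // Step 03: lowercase every word, then count the occurrences of each one.
    $wordStats = array_count_values(array_map('strtolower', $words));

    // Step 04: keep only the words that were seen at least twice.
    $duplicates = array();
    foreach ($wordStats as $word => $count) {
        if ($count >= 2) {
            $duplicates[] = $word;
        }
    }
    return $duplicates;
}

print implode(' ', find_duplicate_words("hello world nice hello world"));
// prints: hello world
```

Returning an array instead of printing inside the loop makes it easier to reuse the result, for example to strip the repeated words from the fetched text afterwards.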
Important Note:
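The ereg_* functions were deprecated in PHP 5.3, where they raise deprecation notices, and were removed entirely in PHP 7.0, where calling them is a fatal error. They should not be used, and new tutorials should not promote them. Use the preg_* functions instead, for example preg_replace('/\s+/', ' ', $string) to compress whitespace.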
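As a minimal sketch of that swap, the whitespace-compression step can be written with preg_replace() as shown here. The helper name normalize_whitespace() is hypothetical, and the pattern \s+ is used as a rough PCRE counterpart of the POSIX expression [[:space:]]+.

```php
<?php
// Hypothetical helper: trims the string and collapses internal whitespace runs.
function normalize_whitespace($string)
{
    // Unlike the old ereg_* patterns, preg_* patterns need delimiters (here "/").
    return preg_replace('/\s+/', ' ', trim($string));
}

var_dump(normalize_whitespace("  hello \t world\n\nnice  "));
// string(16) "hello world nice"
```

Tabs and newlines are collapsed along with plain spaces, so the later explode(' ', ...) call only ever sees single-space separators.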
Enjoy!