Count Words Algorithm
The Count Words Algorithm is a basic text-processing technique that counts how often each word occurs in a given text. It is a common building block in text analysis, information retrieval, and machine learning applications, where word frequencies serve as a rough signal of which terms matter within a document. The algorithm tokenizes the input text into individual words, usually by splitting on whitespace or punctuation marks, and then counts the number of occurrences of each unique word. The output is typically represented as a dictionary (map), with words as keys and their respective frequencies as values.
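The tokenize-then-count procedure described above can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the implementation that appears later on this page: the class name `WordFrequencySketch` and the method `countWords` are hypothetical, words are lower-cased before counting, and any run of characters that are not letters or digits is treated as a delimiter.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch class; the name is not part of the original program.
class WordFrequencySketch {

    // Returns a map from each distinct word to its number of occurrences.
    // Assumption: case is ignored and every non-alphanumeric run is a delimiter.
    static Map<String, Integer> countWords(String text) {
        // LinkedHashMap keeps words in order of first appearance.
        Map<String, Integer> counts = new LinkedHashMap<>();
        if (text == null) {
            return counts;
        }
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+");
        for (String token : tokens) {
            // A leading delimiter (or empty input) produces an empty token; skip it.
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords("The cat and the hat."));
        // prints {the=2, cat=1, and=1, hat=1}
    }
}
```

One consequence of this delimiter choice is that a contraction such as "don't" is split into two tokens. Whether that is acceptable depends on the application, so the delimiter pattern is the first thing to adjust.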
Beyond counting word occurrences, the Count Words Algorithm can be extended to related text-analytics tasks, such as counting common phrases (n-grams), filtering out stop words (very frequent words that carry little meaning, such as "the" or "of"), and normalizing word frequencies by text length so that documents of different sizes can be compared. It can also be combined with stemming or lemmatization, which map inflected forms such as "run" and "running" to a common base form so that they are counted together. These counts are a common starting point in text mining for finding patterns in large volumes of unstructured text. The Java program below implements the simplest variant: it reports the total number of words in a line of input rather than a per-word frequency table.
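Three of the extensions mentioned above (stop-word filtering, length-normalized frequencies, and n-grams with n = 2) can be sketched as follows. Everything here is illustrative: the class `WordCountExtensionsSketch`, its method names, and the six-word stop list are assumptions made for this example, and `Set.of` requires Java 9 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch class; none of these names come from the original program.
class WordCountExtensionsSketch {

    // Tiny illustrative stop-word list (an assumption); real lists are much longer.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "the", "of", "in");

    // Lower-cases the text, splits on non-alphanumeric runs, and drops stop words.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        if (text == null) {
            return tokens;
        }
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Relative frequency of each word: its count divided by the total token count.
    static Map<String, Double> relativeFrequencies(List<String> tokens) {
        Map<String, Double> freq = new LinkedHashMap<>();
        for (String t : tokens) {
            freq.merge(t, 1.0, Double::sum);
        }
        // For an empty token list the map is empty, so no division takes place.
        freq.replaceAll((word, count) -> count / tokens.size());
        return freq;
    }

    // Counts bigrams, i.e. pairs of adjacent tokens joined by a single space.
    static Map<String, Integer> bigramCounts(List<String> tokens) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            counts.merge(tokens.get(i) + " " + tokens.get(i + 1), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("The cat sat in the hat and the cat sat");
        System.out.println(tokens);                      // [cat, sat, hat, cat, sat]
        System.out.println(relativeFrequencies(tokens)); // {cat=0.4, sat=0.4, hat=0.2}
        System.out.println(bigramCounts(tokens));        // {cat sat=2, sat hat=1, hat cat=1}
    }
}
```

Because stop words are removed before the bigrams are formed, pairs such as "sat hat" join words that were not adjacent in the original sentence. If true adjacency matters, the bigrams should be built from the unfiltered token list.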
package Others;

import java.util.Scanner;

/**
 * Reads a line of text from standard input and reports how many words it
 * contains.
 *
 * @author Marcus
 */
public class CountWords {
    public static void main(String[] args) {
        Scanner input = new Scanner(System.in);
        System.out.println("Enter your text: ");
        String str = input.nextLine();
        System.out.println("Your text has " + wordCount(str) + " word(s)");
        System.out.println("Your text has " + secondaryWordCount(str) + " word(s)");
        input.close();
    }

    /**
     * Counts the whitespace-separated words in a string.
     *
     * @param s String: sentence with word(s)
     * @return int: number of words
     */
    private static int wordCount(String s) {
        if (s == null) {
            return 0;
        }
        String trimmed = s.trim();
        // Splitting an empty string yields one empty token, so a blank or
        // whitespace-only input must be handled before the split.
        if (trimmed.isEmpty()) {
            return 0;
        }
        return trimmed.split("\\s+").length;
    }

    /**
     * Counts the number of words in a sentence but ignores all
     * non-alphanumeric characters that do not represent a word. Runs in O(n)
     * where n is the length of s.
     *
     * @param s String: sentence with word(s)
     * @return int: number of words
     */
    private static int secondaryWordCount(String s) {
        if (s == null) {
            return 0;
        }
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                sb.append(c);
            } else if (Character.isWhitespace(c)) {
                // Keep word boundaries; dropping whitespace would merge all
                // words into a single token.
                sb.append(' ');
            }
        }
        return wordCount(sb.toString());
    }
}