The replaceAll() method can help you remove punctuation from a string in Java. But how, exactly? This tutorial will give you the answer.
Remove Punctuation From String In Java With replaceAll()
The replaceAll() method of the String class searches for every substring that matches a pattern and replaces each occurrence with a substitute. Its syntax is as follows:
replaceAll(regex, repl)
You will need to provide the method with a regular expression (regex) representing the pattern that needs to be replaced. The repl parameter is a string that replaceAll() will substitute for every occurrence of that pattern.
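As a quick illustration of the two parameters, here is a minimal sketch that replaces every run of digits with a placeholder. The class name, method name, sample text, and placeholder are made up for this example.

```java
class ReplaceAllDemo {
    // Replaces every match of the regex "[0-9]+" (one or more digits) with "#".
    static String maskNumbers(String text) {
        return text.replaceAll("[0-9]+", "#");
    }

    public static void main(String[] args) {
        // Prints: Order # shipped in # days
        System.out.println(maskNumbers("Order 66 shipped in 3 days"));
    }
}
```

Note that replaceAll() returns a new string; the original is left unchanged because Java strings are immutable.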
From this description, we can think of a straightforward algorithm: supply the replaceAll() method with a regular expression that matches every punctuation character and tell it to replace each match with an empty string.
To target all punctuation characters (!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~), this is a regular expression many might use:
!|"|#|\$|%|&|'|\(|\)|\*|\+|,|-|\.|/|:|;|<|=|>|\?|@|\[|\\|\]|\^|_|`|{|\||}|~|
At first glance, there should be no problem. In fact, a test on a site like Regexr.com appears to confirm this: the expression matches every punctuation character there.
However, there are several problems with using this long expression in Java. It has terrible readability, making debugging and reusing the code much harder. But most importantly, you will run into trouble trying to pass this regular expression into the replaceAll() method:
import java.util.regex.Pattern
public class RemovePunctuation {
public static void main(String args[]) {
String str1 = "#Welcome to ITTutoria.net!# Thi^s is a* [test] ~string fo_r our examp|le.";
String str2 = str1.replaceAll("!|"|#|\$|%|&|'|\(|\)|\*|\+|,|-|\.|/|:|;|<|=|>|\?|@|\[|\\|\]|\^|_|`|{|\||}|~|", "");
System.out.println("Original string:\n" + str1);
System.out.println("Replacement string:\n" + str2);
}
}
Output:
/RemovePunctuation.java:1: error: ';' expected
import java.util.regex.Pattern
^
/RemovePunctuation.java:6: error: illegal character: '#'
String str2 = str1.replaceAll("!|"|#|\$|%|&|'|\(|\)|\*|\+|,|-|\.|/|:|;|<|=|>|\?|@|\[|\\|\]|\^|_|`|{|\||}|~|", "");
^
/RemovePunctuation.java:6: error: illegal character: ''
String str2 = str1.replaceAll("!|"|#|\$|%|&|'|\(|\)|\*|\+|,|-|\.|/|:|;|<|=|>|\?|@|\[|\\|\]|\^|_|`|{|\||}|~|", "");
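Why does the compiler report these errors? The first one is unrelated to the regex: the import statement is missing its semicolon (and the import is not needed here at all). The remaining errors come from the regular expression, which contains characters that have a special meaning inside a Java string literal. For instance, from the compiler’s perspective, the first double-quote character inside the expression marks the end of the string. This puts the remainder of the expression outside the literal, where characters such as # are illegal. The backslashes are a second problem: in a string literal, each one must be doubled before the regex engine ever sees it.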
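For comparison, here is a sketch of how a hand-written pattern could be made to compile: the double quote is escaped for the string literal and every regex backslash is doubled. It is written as a character class rather than the alternation above, a substitution made here only to keep the literal shorter. The class, constant, and method names are hypothetical.

```java
class ManualPunctuationRegex {
    // The 32 ASCII punctuation characters as one character class.
    // In the literal, \" is an escaped quote and \\ is one regex backslash,
    // so the regex engine receives: [!"#$%&'()*+,\-./:;<=>?@\[\\\]^_`{|}~]
    static final String PUNCT = "[!\"#$%&'()*+,\\-./:;<=>?@\\[\\\\\\]^_`{|}~]";

    static String strip(String text) {
        return text.replaceAll(PUNCT, "");
    }

    public static void main(String[] args) {
        // Prints: Welcome to ITTutorianet This is a test
        System.out.println(strip("#Welcome to ITTutoria.net!# Thi^s is a* [test]"));
    }
}
```

Even in this corrected form, the literal is hard to read and easy to get wrong, which is the readability problem described earlier.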
Instead of dealing with such a hassle, have a look at the concept of POSIX character classes in regular expressions.
These predefined classes represent different sets of characters with bracket expressions. The POSIX standard defines those classes, such as:
- [:alpha:] (alphabetic characters)
- [:digit:] (digits)
- [:alnum:] (numeric and alphabetic characters)
- [:lower:] (lowercase letters)
- [:punct:] (punctuation)
While Java doesn’t support these bracket expressions, you can specify the same character classes with the \p{...} syntax. For instance, \p{Lower} is equivalent to the [:lower:] POSIX character class.
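To make the mapping concrete, the sketch below removes the characters of a few of these classes from a sample string. The helper method, class name, and sample text are assumptions made for illustration.

```java
class PosixClassDemo {
    // Removes every character matched by the given \p{...} class name.
    static String remove(String className, String text) {
        return text.replaceAll("\\p{" + className + "}", "");
    }

    public static void main(String[] args) {
        String sample = "Room 42B";
        System.out.println(remove("Lower", sample)); // prints "R 42B"
        System.out.println(remove("Digit", sample)); // prints "Room B"
        System.out.println(remove("Alpha", sample)); // prints " 42"
    }
}
```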
To match punctuation characters in Java, you can use the \p{Punct} shorthand:
public class RemovePunctuation {
    public static void main(String[] args) {
        String str1 = "#Welcome to ITTutoria.net!# Thi^s is a* [test] ~string fo_r our examp|le.";
        String str2 = str1.replaceAll("\\p{Punct}", "");
        System.out.println("Original string:\n" + str1);
        System.out.println("Replacement string:\n" + str2);
    }
}
Output:
Original string:
#Welcome to ITTutoria.net!# Thi^s is a* [test] ~string fo_r our examp|le.
Replacement string:
Welcome to ITTutorianet This is a test string for our example
In the example above, we put several punctuation symbols in a test string and pass "\\p{Punct}" to the replaceAll() method as the regular expression.
The first backslash only escapes the second one inside the Java string literal, so the regex engine receives \p{Punct}. This is the punctuation character class (!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~). Since the second parameter is an empty string, the method removes every character of this class from the resulting string.
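The sketch below shows what the string literal actually contains and wraps the same pattern in a precompiled Pattern. String.replaceAll() compiles its regex on every call, so a precompiled pattern is one possible choice when the same pattern is applied many times. The class and method names are hypothetical. The last line illustrates a limit worth knowing: by default, \p{Punct} covers only the ASCII punctuation characters listed above.

```java
import java.util.regex.Pattern;

class PunctuationStripper {
    // Compiled once so the pattern can be reused across calls.
    private static final Pattern PUNCT = Pattern.compile("\\p{Punct}");

    static String strip(String text) {
        return PUNCT.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // The literal holds nine characters: one backslash, then p{Punct}
        System.out.println("\\p{Punct}");          // prints \p{Punct}
        System.out.println("\\p{Punct}".length()); // prints 9
        System.out.println(strip("fo_r our examp|le.")); // prints for our example
        // U+201C and U+201D are curly quotes, which fall outside the ASCII set,
        // so only the exclamation mark is removed here.
        System.out.println(strip("\u201CHi!\u201D"));
    }
}
```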
Summary
You can remove punctuation from a string in Java with the replaceAll() method. The \p{Punct} shorthand comes in handy to help you search for every punctuation symbol at once.