If there are a number of predetermined patterns (substrings) to search for,
ACMatcher
will outperform the same search using a
SuffixArray
. If multiple searches are performed on the same
string, with the pattern or substring not known in advance, a
SuffixArray
is a better choice.
The implementation is based on one contributed to the Geeks for
Geeks website by Princi Singh as an example of
an Aho Corasick algorithm implementation. The modifications included
using a BitSet
instead of an integer (which restricted the
number of patterns to 32), a dynamically configured
alphabet, and methods to provide iterators and streams. The
implementation in this package can be configured to be either case
sensitive or case insensitive. Finally, the sequence being
searched can be represented as either a string or an array of
characters, and the search can be restricted to a subsequence of
either.
The patterns to search for are passed to a constructor. The methods used to search are named
- iterator. This provides an
Iterator
. - iterableOver. This provides an
Iterable
for use in a 'for' loop. - stream. This provides a
Stream
.-
These are overloaded so that the text being searched can be either a
string or a character array, and so that all the text can be searched
or just a subsequence of the text.
-
Nested Class Summary
Nested Classes -
Constructor Summary
ConstructorsConstructorDescriptionConstructor specifying if the matcher is case sensitive or case insensitive.Constructor using a variable number of arguments and indicating if the search is case sensitive or case insensitive.Constructor using an array of pattern specifications and specifying if the matcher is case sensitive.Constructor.Constructor using a variable number of arguments.Constructor using an array of pattern specifications. -
Method Summary
Modifier and TypeMethodDescriptionString[]
Get the patterns (search strings) for this matcher.iterableOver
(char[] text) Get anIterable
that searcher the given text, provided as a character array, for the patterns used to configure this matcher.iterableOver
(char[] text, int start, int end) Get anIterable
that searcher's a portion of the given text, provided as a character array, for the patterns used to configure this matcher.iterableOver
(String text) Get anIterable
that searches the given text, provided as a string, for the patterns used to configure this matcher.iterableOver
(String text, int start, int end) Get anIterable
that starch's a portion of the given text, provided as a string, for the patterns used to configure this matcher.iterator
(char[] text) Get an iterator that will scan a given character array and match that sequence against the patterns used to configure this matcher.iterator
(char[] text, int start, int end) Get an iterator that will scan a portion of a given character array and match that sequence against the patterns used to configure this matcher.Get an iterator that will scan a given string and match that string against the patterns used to configure this matcher.Get an iterator that will scan a portion of a given string and match that string against the patterns used to configure this matcher.int
size()
Get the number of patterns for this matcher.stream
(char[] text) Get a stream of matches for the given text, provided as a character array.stream
(char[] text, int start, int end) Get a stream of matches for a portion the given text, provided as a character array.Get a stream of matches for the given text, provided as a string.Get a stream of matches for a portion of the given text, provided as a string.
-
Constructor Details
-
ACMatcher
Constructor using a variable number of arguments. The search is case sensitive.- Parameters:
first
- the first search stringrest
- the remaining search strings
-
ACMatcher
Constructor using a variable number of arguments and indicating if the search is case sensitive or case insensitive.- Parameters:
ignoreCase
- true if the search is case insensitive; false if the search is case sensitivefirst
- the first search stringrest
- the remaining search strings
-
ACMatcher
Constructor using an array of pattern specifications. A function maps each pattern specification to the corresponding pattern. This can be used to associate each pattern with a enumeration, which can make the use of aswitch
statement more reliable when new cases are added. For example,static enum SpecType { TYPE1, TYPE2, ... } static class Spec { SpecType type String pattern; public Spec(specType type, string pattern) { this.type = type; this.pattern = pattern; } ... Spec specs[] = { new Spec(SpecType.TYPE1, "foo"), new Spec(SpecType.TYPE2, "bar"), ... }; ACMatcher matcher = new ACMatcher((spec) -> {return spec.pattern;}, specs); for (ACMatcher.MatchResult mr: matcher.interatorOver(text)) { int index = mr.getIndex(); switch (spec[index].type) { case TYPE1: ... } }
- Type Parameters:
T
- the type of the objects used as a pattern specifications- Parameters:
f
- a function that maps a pattern specification to a patternspecs
- an array containing pattern specifications
-
ACMatcher
Constructor using an array of pattern specifications and specifying if the matcher is case sensitive. A function maps each pattern specification to the corresponding pattern. This can be used to associate each pattern with a enumeration, which can make the use of aswitch
statement more reliable when new cases are added. For example,static enum SpecType { TYPE1, TYPE2, ... } static class Spec { SpecType type String pattern; public Spec(specType type, string pattern) { this.type = type; this.pattern = pattern; } ... Spec specs[] = { new Spec(SpecType.TYPE1, "foo"), new Spec(SpecType.TYPE2, "bar"), ... }; ACMatcher matcher = new ACMatcher((spec) -> {return spec.pattern;}, specs); for (ACMatcher.MatchResult mr: matcher.interatorOver(text)) { int index = mr.getIndex(); switch (spec[index].type) { case TYPE1: ... } }
- Type Parameters:
T
- the type of the objects used as a pattern specifications- Parameters:
ignoreCase
- true if the search is case insensitive; false if the search is case sensitivef
- a function that maps a pattern specification to a patternspecs
- an array containing pattern specifications
-
ACMatcher
Constructor. The search is case sensitive.- Parameters:
strings
- the strings to match
-
ACMatcher
Constructor specifying if the matcher is case sensitive or case insensitive.- Parameters:
ignoreCase
- true if the search is case insensitive; false if the search is case sensitivestrings
- the strings to match
-
-
Method Details
-
size
public int size()Get the number of patterns for this matcher.- Returns:
- the number of patterns
-
getPatterns
Get the patterns (search strings) for this matcher. If the search is case insensitive, the patterns will use lower case, regardless of the cases used in a constructor.- Returns:
- the patterns
-
iterableOver
Get anIterable
that searches the given text, provided as a string, for the patterns used to configure this matcher.This is a convenience method, added to allow an instance of this class to be used in a 'for' loop:
String patterns[] = {...}; ACMatcher matcher = new ACMatcher(patterns); String text = "..."; for (ACMatcher.MatchResult mr: matcher.iterableOver(text)) { ... }
- Parameters:
text
- the text to search- Returns:
- the iterable
-
iterableOver
Get anIterable
that starch's a portion of the given text, provided as a string, for the patterns used to configure this matcher.This is a convenience method, added to allow an instance of this class to be used in a 'for' loop:
String patterns[] = {...}; ACMatcher matcher = new ACMatcher(patterns); String text = "..."; int start = ...; int end = ...; for (ACMatcher.MatchResult mr: matcher.iterableOver(text, start, end)) { ... }
- Parameters:
text
- the text to searchstart
- the starting index (inclusive) for the text being scannedend
- the ending index (exclusive) for the text being scanned- Returns:
- the iterable
-
iterableOver
Get anIterable
that searcher the given text, provided as a character array, for the patterns used to configure this matcher.This is a convenience method, added to allow an instance of this class to be used in a 'for' loop:
String patterns[] = {...}; ACMatcher matcher = new ACMatcher(patterns); String text = "..."; for (ACMatcher.MatchResult mr: matcher.iterableOver(text)) { ... }
- Parameters:
text
- the text to search- Returns:
- the iterable
-
iterableOver
Get anIterable
that searcher's a portion of the given text, provided as a character array, for the patterns used to configure this matcher.This is a convenience method, added to allow an instance of this class to be used in a 'for' loop:
String patterns[] = {...}; ACMatcher matcher = new ACMatcher(patterns); String text = "..."; int start = ...; int end = ...; for (ACMatcher.MatchResult mr: matcher.iterableOver(text, start, end)) { ... }
- Parameters:
text
- the text to searchstart
- the starting index (inclusive) for the text being scannedend
- the ending index (exclusive) for the text being scanned- Returns:
- the iterable
-
stream
Get a stream of matches for the given text, provided as a string.- Parameters:
text
- the text to scan- Returns:
- the stream
-
stream
Get a stream of matches for a portion of the given text, provided as a string.- Parameters:
text
- the text to scanstart
- the starting index (inclusive) for the text being scannedend
- the ending index (exclusive) for the text being scanned- Returns:
- the stream
-
stream
Get a stream of matches for the given text, provided as a character array.- Parameters:
text
- the text to scan- Returns:
- the stream
-
stream
Get a stream of matches for a portion the given text, provided as a character array.- Parameters:
text
- the text to scanstart
- the starting index (inclusive) for the text being scannedend
- the ending index (exclusive) for the text being scanned- Returns:
- the stream
-
iterator
Get an iterator that will scan a given string and match that string against the patterns used to configure this matcher.- Parameters:
text
- the string to scan- Returns:
- an iterator that will enumerate the matches
- See Also:
-
iterator
Get an iterator that will scan a portion of a given string and match that string against the patterns used to configure this matcher.- Parameters:
text
- the string to scanstart
- the starting index (inclusive) for the text being scannedend
- the ending index (exclusive) for the text being scanned- Returns:
- an iterator that will enumerate the matches
- See Also:
-
iterator
Get an iterator that will scan a given character array and match that sequence against the patterns used to configure this matcher.- Parameters:
text
- the character array to scan- Returns:
- an iterator that will enumerate the matches
- See Also:
-
iterator
Get an iterator that will scan a portion of a given character array and match that sequence against the patterns used to configure this matcher.- Parameters:
text
- the character array to scanstart
- the starting index (inclusive) for the text being scannedend
- the ending index (exclusive) for the text being scanned- Returns:
- an iterator that will enumerate the matches
- See Also:
-