- Enclosing class:
- SuffixArray
String
instead of an array of bytes.
The methods that use least common prefixes compute lengths in
bytes rather than the number of UTF-8 characters in a prefix.
This class is provided because of the widespread use of UTF-8.
-
Nested Class Summary
Nested classes/interfaces inherited from class org.bzdev.util.SuffixArray
SuffixArray.Array<T>, SuffixArray.Byte, SuffixArray.Char, SuffixArray.Integer, SuffixArray.Iterator, SuffixArray.Range, SuffixArray.Short, SuffixArray.String, SuffixArray.UnsignedByte, SuffixArray.UnsignedShort, SuffixArray.UTF
-
Field Summary
-
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescriptionint
findInstance
(String subsequence) Find the index into the sequence associated with a suffix array for an arbitrary instance of a subsequence, with the subsequence specified by a string.Find all instances of a subsequence, with the subsequence specified by a string.int
findSubsequence
(String subsequence) Find the suffix-array index of an arbitrary instance of a subsequence with the subsequence specified by a string.int
findSubsequence
(String subsequence, boolean keyflag) Find a subsequence that is specified by a string.Methods inherited from class org.bzdev.util.SuffixArray.UnsignedByte
commonPrefixLength, fillLCPArray, findInstance, findInstance, findRange, findRange, findSubsequence, findSubsequence, findSubsequence, findSubsequence, findSubsequence, getBWT, getSequence, inverseBWT
Methods inherited from class org.bzdev.util.SuffixArray
clearCachedInverse, clearCachedLCP, clearCachedLCPLR, countUniqueSubsequences, getArray, getInverse, getLCP, hasInverse, hasLCP, hasLCPLR, lcpLength, subsequences, useInverse, useLCP, useLCPLR
-
Constructor Details
-
UTF
public UTF(byte[] sequence) Constructor. The sequence must not be changed after this constructor is called as long as this suffix array is used.- Parameters:
sequence
- a sequence of bytes whose values form a sequence of UTF-8-encoded characters
-
UTF
Constructor for precomputed suffix arrays. The maximum reasonable value of n should be no higher than the length of the sequence and frequently much lower. The sequence must not be changed after this constructor is called as long as this suffix array is used.The suffix array passed as the third argument must have one more element than the sequence.
This constructor is intended for the case where a suffix array was previously computed and saved in a file. The caller must assure that the suffix array was computed for the sequence and that the alphabets are the same.
- Parameters:
sequence
- a sequence of integers whose values are in the range [0,n)sarray
- the suffix array- Throws:
IllegalArgumentException
- n is out of range or the suffix array is not compatible with the sequence.
-
-
Method Details
-
findInstance
Find the index into the sequence associated with a suffix array for an arbitrary instance of a subsequence, with the subsequence specified by a string.Using an LCP-LR table (created by calling
SuffixArray.useLCPLR()
) will change with time complexity of this method from O(m log n) to O(m + log n), where m is the length of a subsequence and n is the length of the sequence array. These, however, are worst-case numbers: while it can take m steps for a comparison function to determine that two suffixes differ, the comparison will stop at the first step at which the suffixes actually differ: the difference in running time in practice is data-set dependent.If the argument has a length of 0, the value returned is the length of the sequence.
- Parameters:
subsequence
- the subsequence.- Returns:
- the index into the sequence; -1 if the subsequence cannot be found
- See Also:
-
findSubsequence
Find the suffix-array index of an arbitrary instance of a subsequence with the subsequence specified by a string. This method finds the index into a suffix array corresponding to a subsequence.- Parameters:
subsequence
- the subsequence.- Returns:
- the index into the suffix array; -1 if the subsequence cannot be found
- See Also:
-
findSubsequence
Find a subsequence that is specified by a string. This method finds the index into a suffix array corresponding to a subsequence.- Parameters:
subsequence
- the subsequence.keyflag
- true if the highest index should be returned; false if the lowest index should be returned.- Returns:
- the index into the suffix array; -1 if the subsequence cannot be found
- See Also:
-
findRange
Find all instances of a subsequence, with the subsequence specified by a string.- Parameters:
subsequence
- the subsequence- Returns:
- the subsequences corresponding to a range in the suffix array
- See Also:
-