Package org.bzdev.io

Class CSVReader

java.lang.Object
java.io.Reader
org.bzdev.io.CSVReader
All Implemented Interfaces:
Closeable, AutoCloseable, Readable

public class CSVReader extends Reader
Reader for CSV (Comma Separated Values) input streams. The CSV format is described in RFC 4180. Essentially, it consists of a series of rows providing a field for each of N columns, using a comma to separate columns. A quoting convention allows a quoted field to contain commas, new lines, or double quotes. All rows must have the same number of columns, although the length of a field may be zero. Unfortunately, there are a variety of implementations that are not completely compatible with each other. The classes CSVWriter and CSVReader follow RFC 4180 but provide options to aid in conversions between variants of this format. For example, RFC 4180, as written, assumes that each field contains printable 7-bit ASCII characters (excluding, for example, control characters). CSVWriter and CSVReader will allow any character to appear in a field. Options allow a choice of line separators (the default is CRLF, the one used in RFC 4180).
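The quoting convention can be illustrated with a minimal sketch that uses only the JDK. It is not the CSVReader implementation; the class name Rfc4180Sketch and the method parseRecord are hypothetical names chosen for this illustration. The sketch splits one complete record into fields, treating a doubled double quote inside a quoted field as a literal double quote.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of RFC 4180 quoting; not part of org.bzdev.io.
final class Rfc4180Sketch {

    // Split one complete record (without its terminating line separator)
    // into fields, honoring quoted fields.
    public static List<String> parseRecord(String record) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < record.length() && record.charAt(i + 1) == '"') {
                        field.append('"');   // doubled quote: literal quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    field.append(c);         // commas and line breaks kept as is
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());        // last field, possibly empty
        return fields;
    }

    public static void main(String[] args) {
        // Five fields: the last one has zero length.
        String record = "plain,\"a,b\",\"say \"\"hi\"\"\",\"two\r\nlines\",";
        for (String f : parseRecord(record)) {
            System.out.println("[" + f + "]");
        }
    }
}
```

The sketch assumes well-formed input and does not check that every row has the same number of columns.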

The constructors' arguments indicate

  • the reader providing the input. This reader should be configured to use the character set appropriate for the input.
  • whether or not the first line of the input is a header.
  • optionally, a delimiter used as a line separator when a row is split between multiple lines. This delimiter is used to join adjacent lines to form a single row. If it is null, the system's line separator is used. If carriage returns and line feeds should be reproduced "as is" when lines are joined to construct rows, one should specify a delimiter that matches the end-of-line convention used in the input.
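The role of the delimiter can be sketched with the JDK alone. The following is an illustration of the idea, not the CSVReader implementation, and the names RowJoinSketch, countQuotes, and readRows are hypothetical. It assumes that a physical line containing an odd number of double quotes opens or closes a quoted field, so lines are joined with the chosen delimiter until the quotes balance.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of joining split lines; not part of org.bzdev.io.
final class RowJoinSketch {

    // Number of double-quote characters in a physical line.
    public static int countQuotes(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '"') n++;
        }
        return n;
    }

    // Read physical lines and join those belonging to one row,
    // inserting 'delimiter' where the original line break was.
    public static List<String> readRows(Reader in, String delimiter) {
        List<String> rows = new ArrayList<>();
        StringBuilder row = new StringBuilder();
        boolean open = false;  // true while inside a quoted field
        try (BufferedReader br = new BufferedReader(in)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (open) row.append(delimiter);
                row.append(line);
                // An odd quote count toggles the "inside quotes" state;
                // escaped quotes ("") come in pairs and leave it unchanged.
                if (countQuotes(line) % 2 != 0) open = !open;
                if (!open) {
                    rows.add(row.toString());
                    row.setLength(0);
                }
            }
            if (open) rows.add(row.toString());  // unterminated quoted field
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        String input = "a,\"x\ny\",b\nc,d,e\n";
        // Join with CRLF although the input uses LF.
        for (String r : readRows(new StringReader(input), "\r\n")) {
            System.out.println(r.replace("\r", "\\r").replace("\n", "\\n"));
        }
    }
}
```

Choosing a delimiter equal to the input's own end-of-line sequence reproduces the quoted field exactly; choosing a different one normalizes the line breaks inside fields.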

For the simplest use of this class, one will first call a constructor. If the constructor indicates that the first line is a header, the method getHeaders() will return an array containing the headers in column order. Calling nextRow() will similarly return the fields of the next row in the input, finally returning null when the end of the input is reached. For example,


     Reader in = new FileReader("input.csv");
     CSVReader r = new CSVReader(in, true);
     String[] headers = r.getHeaders();
     String[] fields;
     while ((fields = r.nextRow()) != null) {
         ...
     }
     r.close();
 
One may also use the Reader read methods to read a field, followed by a call to nextField() to go to the next field.

To specify a character encoding, the Reader passed as the first argument of a constructor should be an instance of InputStreamReader or a Reader that contains an instance of InputStreamReader, possibly with several intermediate readers.
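As a minimal sketch of this arrangement, the following JDK-only example wraps an InputStreamReader configured for UTF-8 in a BufferedReader. The class name CharsetReaderSketch and the helper methods openWithCharset and readAll are hypothetical names used for illustration; the resulting Reader is the kind of object one would pass as the first constructor argument.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not part of org.bzdev.io.
final class CharsetReaderSketch {

    // Build a Reader that decodes 'bytes' with an explicit character set.
    // The BufferedReader is an intermediate reader around the InputStreamReader.
    public static Reader openWithCharset(InputStream bytes, Charset cs) {
        return new BufferedReader(new InputStreamReader(bytes, cs));
    }

    // Drain a Reader into a String (used here only to show the decoding).
    public static String readAll(Reader r) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        try {
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Sample CSV bytes containing non-ASCII characters, encoded as UTF-8.
        byte[] data = "name,city\r\nZo\u00eb,Z\u00fcrich\r\n"
            .getBytes(StandardCharsets.UTF_8);
        try (Reader in = openWithCharset(new ByteArrayInputStream(data),
                                         StandardCharsets.UTF_8)) {
            // With the library on the class path, 'in' could instead be
            // passed to a constructor, e.g. new CSVReader(in, true).
            System.out.print(readAll(in));
        }
    }
}
```

By contrast, FileReader's single-argument constructor uses the platform default character set, which may not match the file.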

  • Constructor Details

    • CSVReader

      public CSVReader(Reader in, boolean hasHeader) throws IOException
      Constructor. When the first row is classified as a header, it is skipped but can be retrieved by calling getHeaders().
      Parameters:
      in - the input
      hasHeader - true if the first line (or row) is a header; false otherwise
      Throws:
      IOException - an IO Exception was thrown
    • CSVReader

      public CSVReader(Reader in, boolean hasHeader, LineReader.Delimiter delimiter) throws IOException
      Constructor specifying a delimiter. When the first row is classified as a header, it is skipped but can be retrieved by calling getHeaders(). If a field (which must be a quoted one in this case) is split between two lines, the specified delimiter will be inserted when the lines are joined to create a single row.
      Parameters:
      in - the input
      hasHeader - true if the first line (or row) is a header; false otherwise
      delimiter - the delimiter (LineReader.Delimiter.LF for a new line, LineReader.Delimiter.CR for a carriage return, or LineReader.Delimiter.CRLF for a carriage return followed by a new line); null for the system-defined line separator ("\n" on Unix systems, "\r\n" on Microsoft Windows, and "\r" on the original MacOS systems)
      Throws:
      IOException - an IO Exception was thrown
  • Method Details

    • getDelimiter

      public LineReader.Delimiter getDelimiter()
      Get the delimiter. The specified delimiter is used to join lines when a row is split into multiple lines.
      Returns:
      the delimiter; null for the system-defined line separator ("\n" on Unix systems, "\r\n" on Microsoft Windows, and "\r" on the original MacOS systems).
    • getHeaders

      public String[] getHeaders()
      Get the headers.
      Returns:
      the headers; null if the constructor indicates that the first row does not contain headers
    • nextRow

      public String[] nextRow() throws IOException
      Get the fields that make up the next row.
      Returns:
      an array containing the fields in column order; null if the end of the input has been reached
      Throws:
      IOException - an IO Exception was thrown
    • close

      public void close() throws IOException
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Specified by:
      close in class Reader
      Throws:
      IOException
    • ready

      public boolean ready() throws IOException

      This method always returns true because read(char[],int,int) returns -1 when the end of a field is reached. To obtain more characters to read, nextField() should be called. That method may block until the next row is read.

      Overrides:
      ready in class Reader
      Throws:
      IOException
    • read

      public int read(char[] cbuf, int off, int len) throws IOException

      This method returns -1 when the end of a field is reached. To read more data, one should call nextField(), which may block when the next row has to be read.

      Specified by:
      read in class Reader
      Throws:
      IOException
    • nextField

      public boolean nextField() throws IOException
      Get the next field.
      Returns:
      true if the next field has been read; false if there are no more fields to read
      Throws:
      IOException - an IO Exception was thrown