Package org.bzdev.io

Class CSVWriter

java.lang.Object
java.io.Writer
org.bzdev.io.CSVWriter
All Implemented Interfaces:
Closeable, Flushable, Appendable, AutoCloseable

public class CSVWriter extends Writer
Writer for CSV (Comma Separated Values) output streams. The CSV format is described in RFC 4180. Essentially, it consists of a series of rows providing a field for each of N columns, using a comma to separate columns. A quoting convention allows a quoted field to contain commas, new lines, or double quotes. All rows must have the same number of columns, although the length of a field may be zero. Unfortunately, there are a variety of implementations that are not completely compatible with each other. The classes CSVWriter and CSVReader follow RFC 4180 but provide options to aid in conversions between variants of this format. For example, RFC 4180, as written, assumes that each field contains printable 7-bit ASCII characters (excluding, for example, control characters). CSVWriter and CSVReader will allow any character to appear in a field. Options allow a choice of line separators (the default is CRLF, the one used in RFC 4180).
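
The quoting convention mentioned above can be illustrated with a minimal, standalone sketch. It is not the CSVWriter implementation: the class CsvQuoteSketch and its methods needsQuoting and encode are hypothetical names introduced only for this illustration, and the sketch assumes the RFC 4180 rule that a double quote inside a quoted field is written as two double quotes.

```java
final class CsvQuoteSketch {

    /** True if the field contains a comma, a double quote, a CR, or an LF. */
    static boolean needsQuoting(String field) {
        for (int i = 0; i < field.length(); i++) {
            char c = field.charAt(i);
            if (c == ',' || c == '"' || c == '\r' || c == '\n') {
                return true;
            }
        }
        return false;
    }

    /**
     * Encode one field (hypothetical helper, not a library method).
     * When alwaysQuote is false, the field is quoted only if necessary.
     */
    static String encode(String field, boolean alwaysQuote) {
        if (!alwaysQuote && !needsQuoting(field)) {
            return field;
        }
        // Per RFC 4180, an embedded double quote is doubled.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(encode("plain", false));       // left unquoted
        System.out.println(encode("a,b", false));         // quoted: contains a comma
        System.out.println(encode("say \"hi\"", false));  // quoted, inner quotes doubled
        System.out.println(encode("plain", true));        // quoted on request
    }
}
```

A field such as a,b is therefore written as "a,b", which is what lets a reader tell a comma inside a field from a column separator.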

The constructors provide the number of columns (i.e., the number of fields per row). One can then create a CSV file by calling writeField(String) repeatedly, adding fields for columns 0 to (n-1) before proceeding to the next row. If a row is not complete, one can call nextRow() to pad the row with empty fields. The method nextRowIfNeeded() can be used to finish a row, but will not add a row that has only empty fields.
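
The difference between nextRow() and nextRowIfNeeded() may be easier to see in a toy model. The class below is a sketch built only from the description above, not library code: RowPaddingModel and its output() method are hypothetical, the model ignores quoting, and it assumes CRLF as the row terminator.

```java
final class RowPaddingModel {
    private final int n;          // number of columns, assumed to be at least 1
    private int index = 0;        // current column index, in the range [0, n)
    private final StringBuilder out = new StringBuilder();

    RowPaddingModel(int n) {
        this.n = n;
    }

    /** Append one field; the row is terminated after the n-th field. */
    void writeField(String field) {
        if (index > 0) {
            out.append(',');
        }
        out.append(field);
        index++;
        if (index == n) {
            out.append("\r\n");
            index = 0;
        }
    }

    /** Pad with empty fields until the row ends, even if the row is empty. */
    void nextRow() {
        do {
            writeField("");
        } while (index != 0);
    }

    /** Pad and end the row only if at least one field was written. */
    void nextRowIfNeeded() {
        if (index != 0) {
            nextRow();
        }
    }

    String output() {
        return out.toString();
    }

    public static void main(String[] args) {
        RowPaddingModel m = new RowPaddingModel(3);
        m.writeField("a");
        m.nextRow();          // pads the row: a,,
        m.nextRowIfNeeded();  // the current row is empty, so nothing is written
        m.nextRow();          // at the start of a row: writes a row of empty fields
        System.out.print(m.output());
    }
}
```

In this model, nextRowIfNeeded() is safe to call after a loop that may or may not have ended on a row boundary, whereas nextRow() always emits a row.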

Alternatively, one can use the Writer methods Writer.write(char[]), Writer.write(char[],int,int), Writer.write(int), Writer.write(String) or Writer.write(String,int,int) to store the character sequence making up a field, followed by nextField() to create the field, and clear the input for the next field.

For example,


   int ncols = 5;
   Writer out = new PrintWriter("output.csv", "US-ASCII");
   CSVWriter w = new CSVWriter(out, ncols);
   w.writeRow("col1", "col2", "col3", "col4", "col5");
   w.writeRow(...)
   ...
   w.close();
 
Alternatively, one can write the rows as follows:

   int nrows = 20;
   int ncols = 5;
   String data[][] = {
     {"col1", "col2", "col3", "col4", "col5"},
     ... // 20 rows of data
   };
   Writer out = new PrintWriter("output.csv", "US-ASCII");
   CSVWriter w = new CSVWriter(out, ncols);
   for (int i = 0; i < nrows; i++) {
      for (int j = 0; j < ncols; j++) {
          // writing the last field of a row starts the next row
          w.writeField(data[i][j]);
      }
   }
   w.close();
 
The call to writeField(String) can be replaced with a series of calls to the write methods defined by the Writer class, followed by a call to nextField().

To specify a character encoding, the Writer passed as the first argument of a constructor should be an instance of OutputStreamWriter or a Writer that contains an instance of OutputStreamWriter, possibly with several intermediate writers.
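
A writer chain of this kind might look like the following sketch, which uses only JDK classes. The class EncodingChainSketch and the method encodeUtf8 are hypothetical names for this illustration; the output goes to an in-memory byte array so the effect of the encoding is visible, and the comment marks where the chained Writer would be handed to a CSVWriter constructor.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class EncodingChainSketch {

    /** Write text through a BufferedWriter that wraps a UTF-8 OutputStreamWriter. */
    static byte[] encodeUtf8(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // The OutputStreamWriter fixes the character encoding;
        // the BufferedWriter is an intermediate writer in the chain.
        try (Writer out = new BufferedWriter(
                 new OutputStreamWriter(bytes, StandardCharsets.UTF_8))) {
            // With the library on the class path, 'out' would be passed
            // as the first argument of a CSVWriter constructor instead.
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // U+00E9 takes two bytes in UTF-8, so this row occupies six bytes.
        byte[] encoded = encodeUtf8("\u00e9,1\r\n");
        System.out.println(encoded.length);
    }
}
```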

  • Constructor Details

    • CSVWriter

      public CSVWriter(Writer out, int n)
      Constructor.
      Parameters:
      out - the output for this writer
      n - the number of columns
    • CSVWriter

      public CSVWriter(Writer out, int n, boolean alwaysQuote)
      Constructor specifying whether all fields are quoted. Fields must be quoted if a field contains a comma, a double quotation mark, a carriage return, or a new line character.
      Parameters:
      out - the output for this writer
      n - the number of columns
      alwaysQuote - true if each field should be quoted; false if fields should be quoted only when necessary
    • CSVWriter

      public CSVWriter(Writer out, int n, boolean alwaysQuote, LineReader.Delimiter delimiter)
      Constructor specifying whether all fields are quoted and specifying an end-of-line (EOL) delimiter for terminating rows. Fields will be quoted if a field contains a comma, a double quotation mark, a carriage return, a new line character, or the system-defined EOL sequence. If alwaysQuote is true, all fields will be quoted. The delimiter indicates the end-of-line sequence that terminates each row. If a field contains the system end-of-line sequence, that sequence will be replaced with the delimiter.
      Parameters:
      out - the output for this writer
      n - the number of columns
      alwaysQuote - true if each field should be quoted; false if fields should be quoted only when necessary
      delimiter - the end-of-line sequence used to terminate a row (LineReader.Delimiter.LF, LineReader.Delimiter.CR, or LineReader.Delimiter.CRLF); null for the system default (or LineReader.Delimiter.CRLF if the system default is not LineReader.Delimiter.LF, LineReader.Delimiter.CR, or LineReader.Delimiter.CRLF)
  • Method Details

    • currentIndex

      public int currentIndex()
      Get the current index. The current index is a number in the range [0, n), where n is the number of columns.
      Returns:
      the current index
    • getDelimiter

      public LineReader.Delimiter getDelimiter()
      Get the delimiter. Fields will be quoted only if quoting is necessary (i.e., if the field contains a double quote, a comma, or a character that could be part of a line separator). The delimiter is used to terminate rows or lines.
      Returns:
      the delimiter
    • close

      public void close() throws IOException

      If a field was partially created, nextField() will be called to create that field, and if a new row was not started, the current row will be padded with empty fields and terminated with an end-of-line sequence as specified by the delimiter.

      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Specified by:
      close in class Writer
      Throws:
      IOException
    • flush

      public void flush() throws IOException
      Specified by:
      flush in interface Flushable
      Specified by:
      flush in class Writer
      Throws:
      IOException
    • write

      public void write(char[] cbuf, int off, int len)
      Specified by:
      write in class Writer
    • writeField

      public void writeField(String string) throws IOException
      Write a single field. A call to nextField() is implicit.
      Parameters:
      string - the field
      Throws:
      IOException - an IO error occurred
    • writeRow

      public void writeRow(String... fields) throws IOException, IllegalArgumentException
      Write a row. If IO operations created the text for a field, but nextField() was not yet called, a call to nextField() will be inserted. If the number of fields is less than the number of columns specified in a constructor, the line will be padded with empty fields. A new line sequence, as specified by the delimiter, will be added at the end.
      Parameters:
      fields - the fields to write
      Throws:
      IllegalArgumentException - the number of fields is larger than the number of columns specified in the constructor
      IOException - an IO error occurred
    • nextField

      public void nextField() throws IOException
      Start a new field. If the field is the last field in a row, a new row will be started automatically.
      Throws:
      IOException - an IO error occurred
    • nextRow

      public void nextRow() throws IOException
      End the current row. This method will pad the current row with empty fields if necessary. If a field was partially written but nextField() was not called, an implicit call to nextField() will be added. If called at the start of a row, a row with empty fields will be written.
      Throws:
      IOException - an IO error occurred
    • nextRowIfNeeded

      public void nextRowIfNeeded() throws IOException
      Terminate the current row unless the current row is empty. The row will be padded with empty fields if it is too short.
      Throws:
      IOException - an IO error occurred
    • getMediaType

      public static String getMediaType(boolean hasHeader, String charsetName)
      Get the media type for the output created by this CSV writer. The character-set names are those supported by the IETF. A list of valid names is provided by IANA. Examples include "US-ASCII" and "UTF-8".

      When hasHeader is true, the media type will contain a parameter named "header" that indicates that the first row of the contents represents a header - a set of fields with text describing or labeling each column.

      Parameters:
      hasHeader - true if the first line of the contents contains a header; false if it does not
      charsetName - the name of the character set used to encode an output stream; null if a charset parameter is not included in the media type
      Returns:
      the media type
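
The shape of such a media type can be sketched as follows. This is an assumption-laden illustration, not the library's code: the exact string returned by getMediaType, the order of its parameters, and whether a header parameter is emitted when hasHeader is false are not stated here. The sketch follows the RFC 4180 registration of text/csv, which defines the optional parameters charset and header (with the values present and absent). The class CsvMediaTypeSketch and the method mediaType are hypothetical names.

```java
final class CsvMediaTypeSketch {

    /**
     * Build a text/csv media type string (hypothetical helper).
     * Assumption: the header parameter is simply omitted when hasHeader is false.
     */
    static String mediaType(boolean hasHeader, String charsetName) {
        StringBuilder sb = new StringBuilder("text/csv");
        if (charsetName != null) {
            sb.append("; charset=").append(charsetName);
        }
        if (hasHeader) {
            sb.append("; header=present");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(mediaType(true, "UTF-8"));
        System.out.println(mediaType(false, null));
    }
}
```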