Add
the file csvdriver.jar to your classpath or extract the jar file in the directory of the
application.
Description |
Classes |
Driver class (JDBC API v1.0) |
jstels.jdbc.csv.CsvDriver2 |
Data Source class (JDBC API v2.0) |
jstels.jdbc.csv.CsvDataSource2 |
Connection Pool Data Source class (JDBC API v2.0) |
jstels.jdbc.csv.CsvConnectionPoolDataSource2 |
The connection URL is jdbc:jstels:csv:csvdir, where csvdir may be the following:
absolute or relative path to the local directory containing text files, e.g.:
jdbc:jstels:csv:c:/mydir/csvfiles jdbc:jstels:csv:csvfiles |
path to the directory entry within the zip (jar) file, e.g.:
jdbc:jstels:csv:zip://c:/myArchive.zip/csvfiles jdbc:jstels:csv:zip://myApp.jar/csvfiles |
path to the resource directory within the CLASSPATH, e.g.:
jdbc:jstels:csv:classpath://resources |
FTP URL to the FTP-server directory (syntax: ftp://user:password@hostname[:port]/[dirpath/]), e.g.:
jdbc:jstels:csv:ftp://login:password@somesite.com:21/csvfiles |
FTP URL to a ZIP archive containing CSV/text files (syntax: ftp://user:password@hostname[:port]/[dirpath/]zipfile), e.g.:
jdbc:jstels:csv:zip://ftp://login:password@somesite.com:21/archives/csvfiles.zip |
SFTP URL to the SFTP-server directory (syntax: sftp://user:password@hostname[:port]/[dirpath/], also required third-party libraries Commons VFS, Commons Logging and JSch for this protocol), e.g.:
jdbc:jstels:csv:sftp://login:password@somesite.com:22/csvfiles |
HTTP URL to the web-server directory, e.g.:
jdbc:jstels:csv:http://www.somesite.com/csvfiles |
HTTP URL to the dynamic server page (JSP, ASP, PHP, CGI, etc) that returns a CSV/text file as output. For more information please see useWebParam driver property.
jdbc:jstels:csv:http://www.somesite.com/out.jsp?param1=value1??useWebParam=tablename |
SMB/CIFS URL to a directory located on a SMB/CIFS server (e.g.: MS Windows share or Samba server, syntax: smb://[user:password@]hostname/share/[dirpath/]):
jdbc:jstels:csv:smb://your_server/your_share/your_folder jdbc:jstels:csv:smb://login:password@your_server/your_share/your_folder |
Note: a path to some particular CSV/text file can be specified directly in a SQL query, e.g.:
absolute file path to a file, e.g.:
SELECT * FROM "c:/csvfiles/test.txt" |
relative path to a file:
SELECT * FROM "subfolder/test.txt" |
other URL path:
SELECT * FROM "http://www.somesite.com/csvfiles/test.txt" SELECT * FROM "zip://c:/csvfiles.zip/dir/test.txt" SELECT * FROM "ftp://login:password@somesite.com:21/csvfiles/test.txt" |
The
driver supports a number of parameters that change default behavior of the driver.
These
properties are the following:
charset is used to specify a different than the default charset encoding for a CSV/text file (Default is the JVM default charset)
commentLine is used to specify a string denoting comment line (By default is not-defined)
dateFormat is used to specify a format for date/time values. A value of this property is a sequence of date/time formats separated by the '|' character, e.g: "dd.MM.yy | dd.MM | dd". (Default is "yyyy-MM-dd HH:mm:ss.SSS | yyyy-MM-dd HH:mm:ss | yyyy-MM-dd | HH:mm:ss.SSS | HH:mm:ss"). For more details about date/time format used in the driver please see the chapter "time format syntax" in the java.util.SimpleDateFormat class documentation.
defaultColumnType is used to specify the default data type for columns for a CSV/text file (Default is "Varchar").
decimalFormatInput, decimalFormatOutput are used to specify input and output formats for floating point values in a CSV/text file. For instance, you can use these formats to specify currency values, e.g.: "###,###.##$". For more information about patterns used in "decimalFormat" please see the documentation for java.text.DecimalFormat class.
dbInMemory, dbPath are used to set a driver mode. For more details see driver modes.
fileExtension is used to specify a different file extension (Default is ".txt")
If extension is set to ".txt", then both "myTable.txt" and myTable are valid.
Note: You should stick to one of these table naming types in the SQL queries, i.e. either "myTable.txt" or myTable.
ignoreCase. The property sets the case sensitivity for text columns. By default the driver are not case sensitive, i.e. the default value for this property is true.
rowDelimiter is used to specify a custom row delimiter for a CSV/text file. (Default is a line separator specified by JVM environment variable "line.separator", e.g. for Windows - "\r\n", for Linux - "\r")
schema is used to specify a path to the schema file. It can be absolute or relative to a CSV directory path, e.g.: "c:/schemas/schema1.xml" or "schemas/schema2.xml". (Default is "schema.xml")
separator
is
used to specify a different column separator (Default is '\t' (tab))
suppressHeaders is used to specify if the first line contains column header information (Default is false; column header are on first line).
watchFileModifications tells the driver to start an additional thread that checks regularly (every second once by default) if the file was modified by some external process. If the file was modified, the driver refreshes it in the cache. (Default value is false). This property is recommended, if your files are modified by an external program / process and you want the changes to be immediately available to the driver. Also, use the property "checkPeriod" to set regularity of the file check. By default it is 1000 (1000 ms)
Advanced Properties
escapeEOLInQuotes is used to escape row delimiters within double quotes. (Default is false).
escapeSeparatorInQuotes is used to escape column separators within double quotes. (Default is true).
logPath is used to set a file path to the log file.
emptyStringAsNull. If 'emptyStringAsNull' is set to 'true', empty strings (i.e. "") are treated as NULL values. (By default is true).
nullString is a string value that is treated as the NULL value. Not case sensitive. (By default is "NULL").
paddingChar - padding char for fixed-length files. (By default is space).
preSQL is used to specify a path to a SQL script that will be executed after creating a driver connection, e.g. c:/sql_script.txt. It may be useful for creating table indexes and other preparations.
singletonConnection. If "singletonConnection" is set to "true", the driver creates only one java.sql.Connection instance for each unique JDBC URL and reuses that connection for all threads. It allows the driver to increase up performance and reduce memory usage in multithreading environments like web/application servers Tomcat, GlassFish, WebSphere, etc. For example JDBC URL with this property may look like the following: jdbc:jstels:csv:c:/mydir/csvfiles?property1=value1&property2=value2&singletonConnection=true. Default value is false.
trimBlanks. If 'trimBlanks' is set to 'true', the driver trims leading and trailing spaces for string values when reading a text file. (By default is true).
useWebParam is used to specify the name of the web parameter that will be used to pass a CSV/text file name to the dynamic server page. For instance, if you specify the following value: 'tablename', the driver will access server page using the following HTTP URL:
http://www.site.com/out.jsp?tablename=sometable
(where 'sometable' is the table specified in the SQL query or the schema file)
Note: If you specify driver properties directly in the driver URL and
server page has its own parameters as well, you should separate them with '??':
jdbc:jstels:csv:http://www.site.com/out.jsp?param1=value1??useWebParam=tablename&suppressHeaders=true
quoteString is used to
enable/disable writing of double quotes for string values while inserting and
updating records in a CSV/text file (Default is true).
To set the driver properties, you can use one of the following ways:
1) using Properties class:
java.util.Properties props = new java.util.Properties();
props.put("separator", "|"); // separator is a bar props.put("suppressHeaders", "false"); // column headers are on the first line props.put("fileExtension", ".txt"); // file extension is .txt props.put("charset", "ISO-8859-2"); // file encoding is "ISO-8859-2" props.put("commentLine", "--"); // string denoting comment line is "--" // date/time format props.put("dateFormat", "yyyy-MM-dd HH:mm | dd/MM/yyyy");
Connection conn = DriverManager.getConnection("jdbc:jstels:csv:c:/csvfiles", props); |
2) using a data source class jstels.jdbc.csv.CsvDataSource2 class:
CsvDataSource2 csvDS = new CsvDataSource2();
csvDS.setPath("c:/csvfiles"); // path to the CSV directory csvDS.setSeparator("|"); // separator is a bar csvDS.setSuppressHeaders(false); // column headers are on the first line csvDS.setFileExtension(".txt"); // file extension is .txt
Connection conn = csvDS.getConnection(); |
3) appending the properties to the URL (multiple properties should be separated by '&' or '!' character):
Connection conn =
DriverManager.getConnection(
"jdbc:jstels:csv:c:/csvfiles?suppressHeaders=true&dateFormat=yyyy-MM-dd HH:mm | dd/MM/yyyy&dbInMemory=false"); |
4) using local properties specified in the schema for some particular file(s).
<schema>
<table
name = "my_table1.txt">
<column name = "my_field1"
type = "Integer"/>
<!--
this tag assigns the SQL data type to the column by its name in the text file-->
<column pos = "2" type = "Integer"/>
<!--
this tag assigns the SQL data type to the column by its position in the text
file-->
<column name = "my_field3" pos = "3" type = "Integer"/>
<!--
if you do not use the column header in the text file, you should set column names by using
both the 'pos' and the 'name' attributes-->
</table>
<!--
you
can also use file templates, if your files have the same format-->
<table
name = "*.csv">
<!-- all files with the template "*.csv" --><!-- the wildcard '*' denotes any string of zero or more characters -->
<column name = "col1"
type = "Varchar"/>
<column name = "col2"
type = "Integer"/>
</table>
<table
name = "file????.*">
<!-- all files with the template "file????.*" --><!-- the wildcard '?' denotes any single character --> <column name = "id"
type = "Integer"/>
<column name = "descr"
type = "Varchar"
size = "50"/>
<column name = "num"
type = "Decimal"
size = "15" decimalCount = "2"/>
<!-- the 'size' attribute specifies the maximum number of characters for the VARCHAR type or total number of digits that can be stored for the DECIMAL type --><!-- the 'decimalCount' attribute specifies the maximum number of digits that can be stored to the right of the decimal separator --></table><!-- absolute path to CSV/text files with the template '*' " --> <table
name = "c:/csvfiles/*.txt">
<column name = "id"
type = "Integer"/>
<column name = "name"
type = "Varchar"
size = "50"/>
</table>
<!-- absolute path to a particular CSV/text file" --> <table
name = "zip://c:/csvfiles.zip/dir/test.txt"
constraint = "PRIMARY KEY(product_id, region_id), UNIQUE (min_price), CHECK
min_price > 100">
<!-- the 'constraint' attribute specifies a table constraint like a primary key, unique value, etc -->
<column name = "product_id"
type = "Integer"/>
<column name = "region_id"
type = "Integer"/>
<column name = "min_price"
type = "Integer"/>
<column name = "max_price"
type = "Integer"/>
</table><!-- you can also set local properties for each table by using attributes of the same name " -->
<table
name = "my_table2.txt"
charset = "ISO-8859-2"
suppressHeaders =
"true"
commentLine = "--"
separator = ","
dateFormat = "dd/MM/yyyy">
<column name = "my_field1"
pos = "1" type = "Integer"/>
<column name = "my_field2" pos = "2" type = "Integer"/> <column name = "my_field3"
pos = "3" type = "Date"/>
</table>
<!-- If you do not use the column header in a text file ("suppressHeaders" is true), you should set column names by using both the 'pos' and the 'name' attributes --> <table
name = "file_without_header.txt"
suppressHeaders =
"true">
<column
name
= "my_field1"
pos
= "1" type = "Integer"/>
<column name = "my_field2" pos = "2" type = "Integer"/> <column
name
= "my_field3"
pos
= "3" type = "Date"/>
</table>
<!-- The driver also supports the fixed-length files. For this, you should set the "fixed" value for the "separator" parameter. -->
<table
name =" fixed_length_file.txt"
separator = "fixed">
<column name = "id"
begin
= "1"
end
= "5" type = "Integer"/>
<!-- The "begin" attribute sets the initial position of a column in a text file and the "end" attribute sets its final position. --><column name = "name" begin = "6" end = "20" type = "Varchar"/><column name = "birthdate" begin = "21" end = "28" type = "Date"/> </table>
</schema> |
Column names specified in the schema don't override column names in the header of a text file ("suppressHeaders" is false).
Local driver properties specified in the schema override global properties.
Data Type |
JDBC returned type (java.sql.Types.*) |
Java class used |
AUTOINCREMENT |
java.sql.Types.BIGINT |
java.lang.Long |
Int, Integer, Tinyint, Smallint, SHORT |
java.sql.Types.INTEGER |
java.lang.Integer |
Long, Bigint |
java.sql.Types.BIGINT |
java.lang.Long |
Float, Real |
java.sql.Types.FLOAT |
java.lang.Float |
Double |
java.sql.Types.DOUBLE |
java.lang.Double |
BIGDECIMAL, DECIMAL, NUMERIC (recommended for storing currency values) |
java.sql.Types.DECIMAL |
java.math.BigDecimal |
String, Char, Varchar |
java.sql.Types.VARCHAR |
java.lang.String |
Datetime, Date, Time, Timestamp |
java.sql.Types.TIMESTAMP |
java.util.Date |
BOOLEAN, LOGICAL, BIT |
java.sql.Types.BOOLEAN |
java.lang.Boolean |
Notes:
DOUBLE and FLOAT values should not be used to represent currency values, because of rounding problems.
The DECIMAL (NUMERIC) type is slower and requires more storage than the FLOAT and DOUBLE types.
Parsing and formatting takes longer for the DATETIME (TIMESTAMP) type than the numeric types.
Text types are slower to read,
write, and compare than numeric types and generally require more storage.
An SQL query must meet the following conditions:
It must correspond to the SQL syntax supported by the driver. For detailed information about supported SQL syntax please see the specification here.
À column using an SQL reserved word as a name or containing spaces and other delimiters must be quoted in double quotes in a query, e.g.: SELECT "DATE", "My integer-column" FROM "test.txt"
To use single quotes (') within a string constant you should duplicate them, e.g.: SELECT 'a''bcd''efgh'
// ---- SELECT queries ---
SELECT
SUM(a)
AS col1, MAX(b) /
MAX(c) AS col2
FROM "test.txt" GROUP
BY a HAVING AVG(a) >
30
|
See also:
Connection Example
This example code shows how the driver is used. You can download it here.
import java.sql.*;
public class DriverTest { public static void main(String[] args) { try { // load the driver into memory Class.forName("jstels.jdbc.csv.CsvDriver2");
// create a connection. The first command line parameter is assumed to // be the directory in which the .csv files are held Connection conn = DriverManager.getConnection("jdbc:jstels:csv:" + args[0] );
// create a Statement object to execute the query with Statement stmt = conn.createStatement();
// execute a query ResultSet rs = stmt.executeQuery("SELECT * FROM \"test.txt\"");
// read the data and put it to the console for (int j = 1; j <= rs.getMetaData().getColumnCount(); j++) { System.out.print(rs.getMetaData().getColumnName(j) + "\t"); } System.out.println();
while (rs.next()) { for(int j=1; j <= rs.getMetaData().getColumnCount(); j++){ System.out.print(rs.getObject(j)+ "\t"); } System.out.println(); }
// close the objects rs.close(); stmt.close(); conn.close(); } catch(Exception e) { e.printStackTrace(); } } }
|
The driver works in the following way: it loads
data from files to an intermediate database (further referred to as "synchrobase") that is used for running SQL queries and synchronizing changes between this
database and external CSV/text files.
The driver can work in the following three modes:
1) With a temporary synchrobase in RAM. The synchrobase is
created in RAM and is removed from it after the connection is closed. It is a
default mode.
2) With a temporary synchrobase on the hard drive. The
synchrobase is created on the HDD every time a connection is opened and deleted
after it’s closed. To use this mode, set the driver property dbInMemory to
false.
Connection conn = DriverManager.getConnection("jdbc:jstels:csv:c:/csvfiles?dbInMemory=false&tempPath=c:/tempfiles"); |
3) With a persistent synchrobase on the hard drive.
The synchrobase is created just once and is re-used afterwards. To use this mode
set the property dbPath to the file path where is synchrobase will be stored.
Connection conn = DriverManager.getConnection("jdbc:jstels:csv:c:/csvfiles?dbPath=c:/synchrobases/syncro_db_name&tempPath=c:/tempfiles "); |
In the first mode, all synchrobase data are stored in the system RAM, which
ensures maximum performance. Keep in mind that Java Virtual Machine must have
enough free memory allotted for storing large tables (use -Xms and -Xmx JVM
options). You can also use the DROP TABLE <table name> FROM CACHE command to
force tables to be removed from the cache.
In the second case, a temporary synchrobase is
created on the hard drive. This mode is used for processing large files, since
it uses a minimum of RAM. The synchrobase is deleted after the connection is
closed
In the third mode, the synchrobase is created once and is reused afterwards.
This is the optimal mode, but it requires additional files to store a synchrobase on the hard drive.
Note: if your files are modified by an external program / process and you want
the changes to be immediately available to the driver, consider using
watchFileModifications property.
There are also some properties for configuring these modes:
tempPath - directory where temporary files will be created (by default it is a OS temporary directory, specified by JVM environment variable "java.io.tmpdir"). It is recommended to set your own value.
Example:
Properties props = new java.util.Properties();
props.setProperty("dbInMemory",
"false");
// switch to the second mode
(a temporary synchrobase on the hard drive) |
You can use your own SQL functions in the driver. To use this feature, you should do the following:
1) Create a static method that will act as an SQL function. Mind that:
The class and method must have the public modifier
The number of arguments in the method is arbitrary, but they must belong to one of the java classes specified in the data types table
It is advisable to process the null values for arguments in the method code
The method can return the null value as well
For example:
package my_pack; public class MyFuncs{ // user-defined SQL function that formats the given argument into // a date/time string with specified format
public static
String format_date(java.util.Date d, String format) {
if (d ==
null || format == null)
return
sdf.format(d); |
2) Register the function by executing CREATE ALIAS ... FOR command. Use an unique name for each function.
For example:
CREATE ALIAS IF NOT EXISTS format_date FOR "my_pack.MyFuncs.format_date(java.util.Date, java.lang.String)" |
Also, you can use the driver property function:<my_func>.
For example:
Properties props = new java.util.Properties(); props.put("function:formate_date","my_pack.MyFuncs.format_date"); Connection conn = DriverManager.getConnection("jdbc:jstels:csv:c:/csvfiles",props);
// or append this property to the URL
Connection conn2 = DriverManager.getConnection("jdbc:jstels:csv:c:/csvfiles"
+ "?function:formate_date=my_pack.MyFuncs.format_date"); |
3) Call the function in an SQL query
For example:
Statement st = conn.createStatement(); st.execute("SELECT format_date(date_column, 'yyyy-MM-dd') FROM \"test.txt\""); |
Wherever possible use the java.sql.PreparedStatement class for SELECT, INSERT, UPDATE and DELETE operations
For bulk INSERT, UPDATE and DELETE operations use batching, i.e.:
Statement st = connection.createStatement(); st.addBatch("INSERT INTO test VALUES(1, 'string 1')"); // ... other INSERT, UPDATE and DELETE operations st.executeBatch(); |
Describe your tables in "schema.xml" and specify data types and sizes for columns.
Create indexes using CREATE INDEX operation or use the constraint attribute in the "table" tag.
Use the same connection (java.sql.Connection instance) for multiple threads.
Set the property singletonConnection to "true" for web/application servers like Tomcat, GlassFish, WebSphere, etc.
If your tables are big (~> 100 MB), use modes with a syncrobase on the hard drive. For more information see Driver modes.
Do not forget to close java.sql.ResultSet, java.sql.Statement and java.sql.Connection instances.
Use a modern JVM. Upgrading to the latest version of your JVM can provide a "free" boost to performance.