ProjectTemplate

File formats

ProjectTemplate can automatically load a variety of text-based file formats, including comma-separated value (CSV) files, tab-separated value (TSV) files and generic whitespace-separated value (WSV) files. In addition, automatic data loading is supported for several binary file formats, including the RData, SPSS, Stata and SAS formats.

Beyond those file formats, several ad hoc file types support the loading of data sets that are accessible over HTTP or contained in SQL databases, such as MySQL and SQLite.

Please note that several file formats have not been tested yet, including Weka files, DBF files, EPIInfo files, MTP files, Octave files, Systat files and SAS files. Because ProjectTemplate simply wraps the ‘foreign’ package for these formats, they are expected to work, but we have not confirmed that yet. Your mileage may vary.

Supported File Extensions

.csv: CSV files that use a comma separator.
.csv.bz2: CSV files that use a comma separator and are compressed using bzip2.
.csv.zip: CSV files that use a comma separator and are compressed using zip.
.csv.gz: CSV files that use a comma separator and are compressed using gzip.
.csv2: CSV files that use a semicolon separator.
.csv2.bz2: CSV files that use a semicolon separator and are compressed using bzip2.
.csv2.zip: CSV files that use a semicolon separator and are compressed using zip.
.csv2.gz: CSV files that use a semicolon separator and are compressed using gzip.
.tsv: CSV files that use a tab separator.
.tsv.bz2: CSV files that use a tab separator and are compressed using bzip2.
.tsv.zip: CSV files that use a tab separator and are compressed using zip.
.tsv.gz: CSV files that use a tab separator and are compressed using gzip.
.tab: CSV files that use a tab separator.
.tab.bz2: CSV files that use a tab separator and are compressed using bzip2.
.tab.zip: CSV files that use a tab separator and are compressed using zip.
.tab.gz: CSV files that use a tab separator and are compressed using gzip.
.wsv: CSV files that use an arbitrary whitespace separator.
.wsv.bz2: CSV files that use an arbitrary whitespace separator and are compressed using bzip2.
.wsv.zip: CSV files that use an arbitrary whitespace separator and are compressed using zip.
.wsv.gz: CSV files that use an arbitrary whitespace separator and are compressed using gzip.
.dat: CSV files that use an arbitrary whitespace separator.
.dat.bz2: CSV files that use an arbitrary whitespace separator and are compressed using bzip2.
.dat.zip: CSV files that use an arbitrary whitespace separator and are compressed using zip.
.dat.gz: CSV files that use an arbitrary whitespace separator and are compressed using gzip.
.txt: CSV files that use an arbitrary whitespace separator.
.txt.bz2: CSV files that use an arbitrary whitespace separator and are compressed using bzip2.
.txt.zip: CSV files that use an arbitrary whitespace separator and are compressed using zip.
.txt.gz: CSV files that use an arbitrary whitespace separator and are compressed using gzip.
.RData: .RData binary files produced by save().
.rda: .RData binary files produced by save().
.rds: .RDS binary files produced by saveRDS().
.R: R source code files.
.r: R source code files.
.url: A DCF file that contains an HTTP URL and a type specification for a remote dataset.
.sql: A DCF file that contains connection information for a SQL database, such as MySQL.
.xls: XLS files.
.xlsx: XLSX files.
.sav: Binary file format generated by SPSS.
.dta: Binary file format generated by Stata.
.arff: Weka’s ARFF files.
.dbf: DBF files.
.rec: EPIInfo .rec files.
.mtp: MTP files.
.m: Octave files.
.sys: Systat files.
.syd: Systat files.
.sas: SAS Xport files.
.xport: SAS Xport files.
.xpt: SAS Xport files.
.db: A SQLite3 database in binary format.
.file: A DCF file describing the location of another file that should be loaded.
.mp3: MP3 audio files. (Uses the tuneR package.)
.ppm: PPM image files. (Uses the pixmap package.)
.feather: Feather files for data frames in the Apache Arrow format. (Uses the feather package.)
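
As a rough illustration of what the separator descriptions above mean in practice, the sketch below reads small temporary files with base R functions. This is not ProjectTemplate's own loader: the pairing of each extension with a particular base R call is an assumption made for illustration, and the file names and contents are invented.

```r
# Illustrative only: base R calls that match the separators described above.
tmp <- tempfile()
dir.create(tmp)

# Comma separator (.csv)
csv_path <- file.path(tmp, "example.csv")
writeLines(c("id,score", "1,2.5", "2,3.5"), csv_path)
csv_data <- read.csv(csv_path)

# Semicolon separator (.csv2)
csv2_path <- file.path(tmp, "example.csv2")
writeLines(c("id;score", "1;2.5", "2;3.5"), csv2_path)
csv2_data <- read.csv(csv2_path, sep = ";")

# Tab separator (.tsv, .tab)
tsv_path <- file.path(tmp, "example.tsv")
writeLines(c("id\tscore", "1\t2.5", "2\t3.5"), tsv_path)
tsv_data <- read.delim(tsv_path)

# Arbitrary whitespace separator (.wsv, .dat, .txt)
wsv_path <- file.path(tmp, "example.wsv")
writeLines(c("id   score", "1  2.5", "2 \t 3.5"), wsv_path)
wsv_data <- read.table(wsv_path, header = TRUE)

# gzip-compressed comma-separated file (.csv.gz);
# base R detects the compression when it opens the file for reading
gz_path <- file.path(tmp, "example.csv.gz")
con <- gzfile(gz_path, "w")
writeLines(c("id,score", "1,2.5", "2,3.5"), con)
close(con)
gz_data <- read.csv(gz_path)
```

All five objects hold the same two-row data frame; only the on-disk separator or compression differs.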

Ad Hoc File Types

URL Files

You can access CSV files over HTTP using the .url file extension. Inside of the .url file, you must place DCF that describes your data sources. An example file is shown below:

url: http://www.johnmyleswhite.com/ProjectTemplate/sample_data.csv
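
To show how a DCF file like this one is structured, the following sketch writes the example to a temporary file and parses it with base R's read.dcf(). It only illustrates the file format; the variable names are made up here, and the final download step is left commented out because it needs network access.

```r
# Write the example .url file to a temporary location
url_file <- tempfile(fileext = ".url")
writeLines(
  "url: http://www.johnmyleswhite.com/ProjectTemplate/sample_data.csv",
  url_file
)

# read.dcf() returns a character matrix with one column per DCF field
url_info <- read.dcf(url_file)
remote_url <- unname(url_info[1, "url"])

# The remote CSV could then be fetched with, for example:
# sample_data <- read.csv(remote_url)
```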

SQL Files

ProjectTemplate supports access to many of the most common databases. All databases use the .sql file extension. Inside of the .sql file, you must place DCF that describes the connection protocol for your database. Example files for the supported databases are shown below.

MySQL:

type: mysql
user: sample_user
password: sample_password
host: localhost
dbname: sample_database
table: sample_table

SQLite:

type: sqlite
dbname: /path/to/sample_database
table: sample_table

type: sqlite
dbname: /path/to/sample_database
query: SELECT * FROM users WHERE user_active == 1
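
The two SQLite examples differ only in whether they name a whole table or supply a query. The sketch below parses such a file with base R's read.dcf() and derives a single SQL statement from it. The helper sql_statement() is hypothetical, and treating table as shorthand for selecting every row is an assumption for illustration, not a description of ProjectTemplate's internals.

```r
# Hypothetical helper: turn the DCF fields of a .sql file into one SQL statement.
sql_statement <- function(dcf_lines) {
  # Write the lines to a temporary file so read.dcf() can parse them
  path <- tempfile(fileext = ".sql")
  on.exit(unlink(path))
  writeLines(dcf_lines, path)
  fields <- read.dcf(path)

  if ("query" %in% colnames(fields)) {
    # An explicit query is used as written
    unname(fields[1, "query"])
  } else {
    # Assumption: a table entry means "load the whole table"
    paste("SELECT * FROM", unname(fields[1, "table"]))
  }
}

by_table <- sql_statement(c(
  "type: sqlite",
  "dbname: /path/to/sample_database",
  "table: sample_table"
))

by_query <- sql_statement(c(
  "type: sqlite",
  "dbname: /path/to/sample_database",
  "query: SELECT * FROM users WHERE user_active == 1"
))
```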

PostgreSQL:

type: postgres
user: sample_user
password: sample_password
host: localhost
dbname: sample_database
table: sample_table

ODBC:

type: odbc
dsn: sample_dsn
user: sample_user
password: sample_password
dbname: sample_database
query: SELECT * FROM sample_table

Oracle:

type: oracle
user: sample_user
password: sample_password
dbname: sample_database
table: sample_table

JDBC:

type: jdbc
class: oracle.jdbc.OracleDriver
classpath: /path/to/ojdbc5.jar (or set in CLASSPATH)
user: scott
password: tiger
url: jdbc:oracle:thin:@myhost:1521:orcl
query: SELECT * FROM emp

Heroku PostgreSQL:

This is a special case of the JDBC driver. It requires the current PostgreSQL JDBC jar file.

type: heroku
classpath: /path/to/jdbc4.jar (or set in CLASSPATH)
user: scott
password: tiger
host: heroku.postgres.url
port: 1234
dbname: herokudb
query: select * from emp

.file Files

You can load data that is not stored in the current project using a .file file. You must specify the path and the extension that the file would have, if it were being loaded by the standard ProjectTemplate auto-loader. An example is shown below that would load an SQLite3 database stored in a separate location:

path: /path/to/sample_database
extension: db
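
To make the roles of the two fields concrete, the sketch below parses the example with base R's read.dcf() and reports which file would be loaded and under which extension. The variable names are illustrative, and the code only inspects the fields; it does not open the database.

```r
# Write the example .file description to a temporary location
file_spec <- tempfile(fileext = ".file")
writeLines(c("path: /path/to/sample_database", "extension: db"), file_spec)

# Parse the DCF fields
spec <- read.dcf(file_spec)
target_path <- unname(spec[1, "path"])
target_ext <- unname(spec[1, "extension"])

# Report how the external file would be treated
message("Would load ", target_path, " as if it had the extension .", target_ext)
```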

Future Support For Data Sources

It is possible to provide support for new data sources by hooking into ProjectTemplate. The ElasticSearch reader is a working example of how to achieve this. We are looking forward to linking to your custom readers for new data sources, such as SQL Server, MongoDB or CouchDB. Please use the mailing list to get in touch with us.