IBM SPSS Statistics Data File Driver Guide
Note
Before using this information and the product it supports, read the information in “Notices” on page
25.
Product Information
This edition applies to version 28, release 0, modification 0 of IBM® SPSS® Statistics and to all subsequent releases and
modifications until otherwise indicated in new editions.
© Copyright International Business Machines Corporation .
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with
IBM Corp.
Contents
Chapter 1. Overview
Notices
Trademarks
Index
Chapter 1. Overview
The IBM SPSS Statistics data file driver allows you to read IBM SPSS Statistics (.sav and .zsav) data files
in applications that support Open Database Connectivity (ODBC) or Java Database Connectivity (JDBC).
IBM SPSS Statistics itself supports ODBC in the Database Wizard, providing you with the ability to
leverage the Structured Query Language (SQL) when reading .sav and .zsav data files in IBM SPSS
Statistics.
There are three flavors of the IBM SPSS Statistics data file driver, all of which are available for Windows,
UNIX, and Linux:
• Standalone driver. The standalone driver provides ODBC support without requiring installation of
additional components. After the standalone driver is installed, you can immediately set up an ODBC
data source and use it to read .sav and .zsav files.
• Service driver. The service driver provides both ODBC and JDBC support. The service driver handles
data requests from the service client driver, which may be installed on the same computer or on one or
more remote computers. Thus you can configure one service driver that may be used by many clients. If
you put your data files on the same computer on which the service driver is installed, the service driver
can reduce network traffic because all the queries occur on the server. Only the resulting cases are sent
to the service client. If the server has a faster processor or more RAM compared to service client
machines, there may also be performance improvements.
• Service client driver. The service client driver provides an interface between the client application that
needs to read the .sav or .zsav data file and the service driver that handles the request for the data.
Unlike the standalone driver, it supports both ODBC and JDBC. The operating system of the service
client driver does not need to match the operating system of the service driver. For example, you can
install the service driver on a UNIX machine and the service client driver on a Windows machine.
Using the standalone and service client drivers is similar to connecting to a database with any other ODBC
or JDBC driver. After configuring the driver, creating data sources, and connecting to the IBM SPSS
Statistics data file, you will see that the data file is represented as a collection of tables. In other words,
the data file looks like a database source. For information about installing and configuring the drivers, see
Chapter 2, “Installation and Configuration,” on page 3. For information about the tables and table
relationships, see Chapter 3, “Database Schema Reference,” on page 15.
Chapter 2. Installation and Configuration
This section provides information for installing the standalone driver, the service driver, and the service
client driver.
What Do I Install?
Accessing data files through ODBC. If you want to access data files through ODBC, the easiest solution
is to install the standalone driver. However, the standalone driver works only with ODBC. If you need to
access the data file through JDBC, you must install both the service driver and the service client driver on
the same computer.
Accessing data files through JDBC. If you want to access data files through JDBC, you must install the
service driver on the remote computer. Then you install the service client driver on the computer that
needs to access the data on the remote computer. The service driver also supports ODBC, so it has the
added advantage of handling both ODBC and JDBC.
Reducing network traffic and increasing performance. You may also want to install the service driver
and the service client driver if you want to reduce network traffic and/or improve performance. If you put
your data files on the same computer on which the service driver is installed, the service driver can reduce
network traffic because all the queries occur on the server. Only the resulting cases are sent to the service
client. If the server has a faster processor or more RAM compared to service client machines, there may
also be performance improvements.
For information about installing the standalone driver, see “Installing and Configuring the Standalone
Driver” on page 3. For information about installing the service driver, see “Installing and Configuring
the Service Driver” on page 5. For information about installing the service client driver, see “Installing
and Configuring the Service Client” on page 7.
-or-
gunzip statistics_datadrv_standalone_linux32.tar.Z
tar -xvf statistics_datadrv_standalone_linux32.tar
3. For Red Hat Linux 7.x or 8.x, run the following commands to download and install required libraries:
wget http://mirror.centos.org/centos/7/os/x86_64/Packages/compat-libstdc++-33-3.2.3-72.el7.x86_64.rpm
yum localinstall compat-libstdc++-33-3.2.3-72.el7.x86_64.rpm
yum install libquadmath.x86_64
5. Answer the prompts. Press Enter to accept the default for any of the prompts.
2. If you modified an existing odbc.ini file, be sure to remove the IBM SPSS Statistics data sources.
Upgrading
If you are installing a new version of the service driver on a computer on which an older version of the
service driver is installed, you need to do one of the following, depending on whether you want to keep
the older version:
• Uninstall the old version of the service driver before installing the new version. Older service clients
should be able to connect to the newer service driver. However, users requesting data from a JDBC
source will need to ensure that the URL is correct. The URL has changed in some versions.
or
• If you want to keep the old version, install the new version with a different port number. Be sure to
communicate the port number to other users so they know which port number to use with the service
client.
• If you are installing the service driver on AIX or HP-UX, you cannot install from an NFS-formatted mount
directory. Copy the installer file to a local disk before proceeding.
1. Open a terminal application.
2. Uncompress and untar the installer file. For example:
tar -xvvzf statistics_datadrv_service_linux64.tar.Z
-or-
gunzip statistics_datadrv_service_linux64.tar.Z
tar -xvf statistics_datadrv_service_linux64.tar
4. Answer the displayed prompts. Press Enter to accept the default for a particular prompt. The default
host name is localhost. If remote users are going to access the service, be sure to change the
default to the server computer's network name or IP address. Also, if the default port is in use by
another application, be sure to change it.
2. If the driver daemon is not running, open a terminal application and change directories to the admin
subdirectory within the installation directory.
3. Run the startup script:
./startStatisticsDataDriverService.sh
The service is now ready to accept connections from the service client driver. Note that the admin
subdirectory also contains a script for stopping the daemon (stopStatisticsDataDriverService.sh).
• <hostname> is the host name or IP address of the computer on which the service driver is running.
• <port> is the port number on which the service is listening for connections.
• <path_to_SAV_file> is the full path to the data file, relative to the computer on which the service
driver is running. This path cannot contain an equals sign (=) or semicolon (;).
• The UserMissingIsNull part of the connect string is optional and specifies the treatment of user-
defined missing values. 0 indicates that user-defined missing values are read as valid values. 1
indicates user-defined missing values are set to system-missing for numeric variables and blank for
string variables. If UserMissingIsNull is not specified, it is set to a default value of 1.
• The MissingDoubleValueAsNAN part of the connect string is optional and specifies the treatment
of missing numeric values. 0 indicates that user missing values are displayed with the original
missing value in the data file. 1 indicates that user missing values are read as not a number (NaN).
For JDBC, UserMissingIsNull should always be set to 1.
• DBUID and DBPWD are optional and specify the user name and password of a password-protected
SAV file. If the file is not password protected, these properties are unnecessary.
Following is a complete example (this should be entered on one line):
jdbc:spssstatistics://localhost:18886;ServerDatasource=SAVDB;
CustomProperties=(CONNECT_STRING=/home/user/data/Employee data.sav;UserMissingIsNull=1;
MissingDoubleValueAsNAN=1)
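The pieces of the URL described above can be assembled programmatically. The following is a minimal Python sketch, not part of the driver: the function name build_jdbc_url and its parameters are hypothetical, and the URL layout is inferred from the complete example rather than from a formal grammar. Which ServerDatasource value applies to password-protected files over JDBC is not stated here, so it is left as a caller-supplied parameter.

```python
def build_jdbc_url(host, port, sav_path, user_missing_is_null=1,
                   missing_double_as_nan=1, datasource="SAVDB",
                   user=None, password=None):
    """Assemble a one-line JDBC URL shaped like the example in this guide.

    All names here are illustrative; only the URL keywords come from the guide.
    """
    # The guide states that the path cannot contain '=' or ';'.
    if "=" in sav_path or ";" in sav_path:
        raise ValueError("SAV path cannot contain '=' or ';'")

    props = [
        f"CONNECT_STRING={sav_path}",
        f"UserMissingIsNull={user_missing_is_null}",
        f"MissingDoubleValueAsNAN={missing_double_as_nan}",
    ]
    # DBUID and DBPWD are optional and only needed for protected files.
    if user is not None and password is not None:
        props += [f"DBUID={user}", f"DBPWD={password}"]

    return (f"jdbc:spssstatistics://{host}:{port};"
            f"ServerDatasource={datasource};"
            f"CustomProperties=({';'.join(props)})")


url = build_jdbc_url("localhost", 18886, "/home/user/data/Employee data.sav")
print(url)
```

Building the string in one place makes it easier to keep the URL on a single line and to reject paths that the driver cannot accept.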
Installing and Configuring the Service Client Driver on UNIX and Linux
• If you are installing the service driver on AIX or HP-UX, you cannot install from an NFS-formatted mount
directory. Copy the installer file to a local disk before proceeding.
1. Open a terminal application.
2. Uncompress and untar the installer file. For example:
tar -xvvzf statistics_datadrv_service_client_linux64.tar.Z
-or-
gunzip statistics_datadrv_service_client_linux64.tar.Z
tar -xvf statistics_datadrv_service_client_linux64.tar
How to Configure the Service Client Driver on UNIX and Linux for ODBC
For use with an ODBC DSN, configuration of the driver on UNIX and Linux requires updating the odbc.ini
file and setting environment variables. You can also use the driver without a DSN. See the topic “Using
ODBC Without Using a Data Source Name” on page 11 for more information.
In the installation directory, you can find an example odbc.ini file with a default data source. You will also
find a shell script named savodbc.sh. The shell script includes the commands for setting up your
environment correctly. You can run the file directly (for example, . savodbc.sh) or you can copy the
contents of the shell script for use elsewhere. For example, you could copy the contents of the shell script
and paste them into the statistics shell script located in the bin subdirectory of the IBM SPSS Statistics
installation directory. Doing so will allow IBM SPSS Statistics to take advantage of the configured ODBC
data sources.
See the odbc.ini file for an example of how you can add IBM SPSS Statistics Data File ODBC sources.
Following are descriptions of the fields for each data source.
Driver
This points to the ivoa22.so file located in the lib subdirectory of the installation directory.
Host
The host name or IP address of the computer on which the service driver is running. If the service is
running on the same machine as the client, you can use the default setting of localhost.
Port
The port number on which the service is listening for connections. The default port number is
automatically entered. Unless the port number was explicitly changed, keep the default.
ServerDataSource
This specifies the type of data source.
SAVDB
A SAV file that is not password protected.
PASSWORD-PROTECTED-SAVDB
A SAV file that is password protected.
CustomProperties
This is always set to
CONNECT_STRING=/path/to/sav/file;UserMissingIsNull=<0|1>;MissingDoubleValueAsNAN=<0|1>.
For PASSWORD-PROTECTED-SAVDB data sources, this string can also include
;DBUID=<user_name>;DBPWD=<password> to specify the user name and password for the
password-protected SAV file.
• The path to the SAV file is relative to the computer on which the service is running.
• The path to the SAV file cannot contain an equals sign (=) or semicolon (;).
• The UserMissingIsNull part of the connect string is optional and specifies the treatment of user-
defined missing values. 0 indicates that user-defined missing values are read as valid values. 1
indicates user-defined missing values are set to system-missing for numeric variables and blank for
string variables.
• The MissingDoubleValueAsNAN part of the connect string is optional and specifies the treatment
of missing numeric values. 0 indicates that user missing values are displayed with the original
missing value in the data file. 1 indicates that user missing values are read as not a number (NaN).
For ODBC, UserMissingIsNull should always be set to 0.
• If UserMissingIsNull or MissingDoubleValueAsNAN is not specified, it is set to a default
value of 1.
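The fields above can be combined into a single odbc.ini data source entry. The following Python sketch generates such an entry with the standard configparser module. It is an illustration only: the function name odbc_ini_entry, the data source name demo_sav, and the installation directory /opt/datadriver are hypothetical, the port 18886 is taken from the examples in this guide rather than from a stated default, and the example odbc.ini file shipped with the driver remains the authoritative template.

```python
import configparser
import io


def odbc_ini_entry(dsn, install_dir, sav_path, host="localhost", port=18886,
                   user_missing_is_null=0, missing_double_as_nan=None):
    """Return the text of one odbc.ini section for a SAV file without a password."""
    # The guide states that the path cannot contain '=' or ';'.
    if "=" in sav_path or ";" in sav_path:
        raise ValueError("SAV path cannot contain '=' or ';'")

    props = [f"CONNECT_STRING={sav_path}",
             f"UserMissingIsNull={user_missing_is_null}"]
    # Omitting MissingDoubleValueAsNAN leaves the driver default in effect.
    if missing_double_as_nan is not None:
        props.append(f"MissingDoubleValueAsNAN={missing_double_as_nan}")

    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the mixed-case field names as written
    parser[dsn] = {
        "Driver": f"{install_dir}/lib/ivoa22.so",
        "Host": host,
        "Port": str(port),
        "ServerDataSource": "SAVDB",
        "CustomProperties": ";".join(props),
    }
    buf = io.StringIO()
    parser.write(buf, space_around_delimiters=False)
    return buf.getvalue()


print(odbc_ini_entry("demo_sav", "/opt/datadriver", "/data/demo.sav"))
```

The generated text can be reviewed and then pasted into an existing odbc.ini file; the sketch does not modify any file itself.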
2. If you modified an existing odbc.ini file, be sure to remove the IBM SPSS Statistics data sources.
GET DATA
/TYPE=ODBC
• DRIVER. Instead of specifying a DSN (data source name), the CONNECT statement specifies the driver
name. You could define DSNs for each IBM SPSS Statistics data file that you want to access with the
ODBC driver (using the ODBC Data Source Administrator on Windows), but specifying the driver and all
other parameters on the CONNECT statement makes it easier to reuse and modify the same basic syntax
for different data files. The driver name is always IBM SPSS Statistics <version> Data File
Driver - Standalone, where <version> is the product version number.
• SDSN. This is set to PASSWORD-PROTECTED-SAVDB to indicate a password-protected data file. If the
file were not password protected, this would be set to SAVDB.
• HST. This specifies the location of the oadm.ini file. It is located in the cfg sub-directory of the driver
installation directory.
• PRT. This is always set to StatisticsSAVDriverStandalone.
• CP_CONNECT_STRING. The full path and name of the IBM SPSS Statistics data file. This path cannot
contain an equals sign (=) or semicolon (;).
• CP_UserMissingIsNull. This specifies the treatment of user-defined missing values. If it is set to 0,
user-defined missing values are read as valid values. If it is set to 1, user-defined missing values are set
to system-missing for numeric variables and blank for string variables. In this example, the user-defined
missing values will be read as valid values and then the original user-missing definitions will be
reapplied with APPLY DICTIONARY.
• CP_DBUID. The user name for the password-protected data file.
• CP_DBPWD. The password for the data file.
• SQL. The SQL subcommand uses standard SQL syntax to specify the variables (fields) to include, the
name of the database table, and the case (record) selection rules.
• SELECT specifies the subset of variables (fields) to read. In this example, the variables age, marital,
inccat, and gender.
• FROM specifies the database table to read. The prefix is the name of the IBM SPSS Statistics data file.
The Cases table contains the case data values.
• WHERE specifies the criteria for selecting cases (records). In this example, males over 40 years of age.
• APPLY DICTIONARY applies the dictionary information (variable labels, value labels, missing value
definitions, and so forth) from the original IBM SPSS Statistics data file. When you use
GET DATA /TYPE=ODBC to read IBM SPSS Statistics data files, the dictionary information is not included, but this is
easily restored with APPLY DICTIONARY.
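The CONNECT parameters described above form a semicolon-separated list of KEY=value pairs. As a rough illustration, the following Python sketch assembles such a string for the standalone driver. The function name standalone_connect_string, the version number 28, and the paths are hypothetical placeholders; only the keywords (DRIVER, SDSN, HST, PRT, and the CP_ properties) come from the descriptions above.

```python
def standalone_connect_string(version, oadm_ini_path, sav_path,
                              user_missing_is_null=0,
                              user=None, password=None):
    """Build a DSN-less ODBC connect string for the standalone driver."""
    # The guide states that the path cannot contain '=' or ';'.
    if "=" in sav_path or ";" in sav_path:
        raise ValueError("SAV path cannot contain '=' or ';'")

    protected = user is not None and password is not None
    parts = [
        ("DRIVER", f"IBM SPSS Statistics {version} Data File Driver - Standalone"),
        ("SDSN", "PASSWORD-PROTECTED-SAVDB" if protected else "SAVDB"),
        ("HST", oadm_ini_path),  # location of oadm.ini in the cfg subdirectory
        ("PRT", "StatisticsSAVDriverStandalone"),
        ("CP_CONNECT_STRING", sav_path),
        ("CP_UserMissingIsNull", str(user_missing_is_null)),
    ]
    if protected:
        parts += [("CP_DBUID", user), ("CP_DBPWD", password)]
    return ";".join(f"{key}={value}" for key, value in parts)


print(standalone_connect_string(
    28, "/opt/datadriver/cfg/oadm.ini", "/examples/data/demo.sav"))
```

Keeping the pairs in a list preserves their order and makes it simple to switch between protected and unprotected files.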
Using the Service Client ODBC Driver Without a Data Source Name
This example uses the service client ODBC driver to select a subset of variables and cases when reading a
data file in IBM SPSS Statistics format into IBM SPSS Statistics.
GET DATA
/TYPE=ODBC
/CONNECT=
"DRIVER=IBM SPSS Statistics 19 Data File Driver - Service Client;"
"SDSN=SAVDB;"
"HST=myserver;"
"PRT=18886;"
"CP_CONNECT_STRING=C:\examples\data\demo.sav;"
"CP_UserMissingIsNull=0"
/SQL="SELECT age, marital, inccat, gender FROM demo.Cases "
"WHERE (age > 40 AND gender = 'm')".
CACHE.
EXECUTE.
APPLY DICTIONARY FROM '/examples/data/demo.sav'.
Chapter 3. Database Schema Reference
This section describes the database schema for the IBM SPSS Statistics data file.
Tables
There are several tables that may be associated with the IBM SPSS Statistics data file. The tables provide
detailed information about variables, cases, attributes, multiple response sets, and variable sets. In many
situations, you can use the CasesView table by itself. This table retrieves all cases and displays data value
labels if available.
Properties Table
The Properties table specifies the general properties for the IBM SPSS Statistics data file.
Variables Table
The Variables table defines the variables in the IBM SPSS Statistics data file. If a specific variable has any
defined value labels, the Variables table is linked to one or more VLVAR<var_name> tables. The
ValueLabelTableName column identifies the specific VLVAR<var_name> table for each variable with
defined value labels.
1 - A
2 - AHEX
3 - COMMA
4 - DOLLAR
5 - F
6 - IB
7 - IBHEX
8 - P
9 - PIB
10 - PK
11 - RB
12 - RBHEX
15 - Z
16 - N
17 - E
20 - DATE
21 - TIME
22 - DATETIME
23 - ADATE
24 - JDATE
25 - DTIME
26 - WKDAY
27 - MONTH
28 - MOYR
29 - QYR
30 - WKYR
31 - PERCENT
32 - DOT
33 - CCA
34 - CCB
35 - CCC
36 - CCD
37 - CCE
38 - EDATE
39 - SDATE
0 - Left
1 - Right
2 - Center
0 - Unknown
1 - Nominal
2 - Ordinal
3 - Scale
4 - Flag
5 - Typeless
Role SMALLINT A number indicating the predefined role for the variable.
0 - Input
1 - Target
2 - Both
3 - None
4 - Partition
5 - Split
6 - Frequency
7 - Record ID
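Because the Role column stores a numeric code, client code usually translates it into a readable label after querying the Variables table. The following Python sketch shows one way to do this, using the codes listed above. The names ROLE_LABELS and role_label are hypothetical, and the fallback text for unlisted codes is an assumption.

```python
# Role codes as listed for the Role column of the Variables table.
ROLE_LABELS = {
    0: "Input",
    1: "Target",
    2: "Both",
    3: "None",
    4: "Partition",
    5: "Split",
    6: "Frequency",
    7: "Record ID",
}


def role_label(code):
    """Translate a Role code into its label; unlisted codes are reported as such."""
    return ROLE_LABELS.get(code, f"Unrecognized role code {code}")


for code in (0, 1, 7):
    print(code, role_label(code))
```

The same lookup pattern applies to the other code lists in this table.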
VLVAR<var_name> Table
There can be more than one VLVAR<var_name> table. Each VLVAR<var_name> table defines value labels
for a specific variable. The ValueLabelTableName column in the Variables table identifies the associated
VLVAR<var_name> table for each variable with defined value labels.
Attributes Table
The Attributes table identifies the defined data file attributes. The AttributeTableId column is linked to the
AttributeTableId column in the AttributeValues table.
VarAttributes Table
The VarAttributes table identifies the defined variable attributes. The VarName column is linked to the
VarName column in the Variables table. The AttributeTableId column is linked to the AttributeTableId
column in the AttributeValues table.
MrSets Table
The MrSets table identifies the multiple response sets in the data file. The TableId column is linked to the
TableId column in the MrSetVariables table.
1 - Multiple category
2 - Multiple dichotomy
MrSetVariables Table
The MrSetVariables table identifies the variables in the multiple response sets. The TableId column is
linked to the TableId column in the MrSets table. The VarName column links to the VarName column in
the Variables table.
Cases Table
The Cases table identifies the cases and values in the data file. Except for the RECORD_NUM column, the
columns in the Cases table correspond to the unique VarName values in the Variables table. The column
types and sizes are based on the value of the Type column in the Variables table.
CasesView Table
The CasesView table identifies the cases and value labels in the data file. The RECORD_NUM column is
linked to the RECORD_NUM column in the Cases table.
Except for the RECORD_NUM column, the columns in the CasesView table correspond to the unique
VarName values in the Variables table. The column types and sizes are based on the value of the Type
column in the Variables table. This table automatically extracts value labels from the VLVAR<var_name>
tables and includes the value labels as strings. If there is no VLVAR<var_name> table for a specific
variable, the original formatted value is included.
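The substitution that CasesView performs can be pictured with a small Python sketch that works on in-memory data rather than on the driver. The function name cases_view and the sample data are hypothetical. The sketch also assumes that a value without a matching label is passed through unchanged, which the description above does not state explicitly.

```python
def cases_view(cases, value_labels):
    """Mimic CasesView: replace values with their labels where labels exist.

    cases        -- list of dicts, one per case, keyed by variable name
    value_labels -- dict mapping a variable name to its {value: label} pairs,
                    standing in for the VLVAR<var_name> tables
    """
    view = []
    for case in cases:
        row = {}
        for name, value in case.items():
            # Variables without a label table keep their original value.
            labels = value_labels.get(name, {})
            row[name] = labels.get(value, value)
        view.append(row)
    return view


cases = [
    {"RECORD_NUM": 1, "gender": "m", "age": 45},
    {"RECORD_NUM": 2, "gender": "f", "age": 52},
]
value_labels = {"gender": {"m": "Male", "f": "Female"}}

for row in cases_view(cases, value_labels):
    print(row)
```

Here gender is shown with its labels while age, which has no label table, keeps its original values.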
CasesElapsedTimeView Table
The CasesElapsedTimeView table identifies the cases and elapsed time variables in the data file. The
RECORD_NUM column is linked to the RECORD_NUM column in the Cases table. The
CasesElapsedTimeView table exists only if there are elapsed time variables in the data file.
Except for the RECORD_NUM column, the columns in the CasesElapsedTimeView table correspond to the
unique elapsed time VarName values in the Variables table. Note that elapsed time is formatted as a
string (VARCHAR) in this table.
VarSets Table
The VarSets table identifies the variable sets in the data file. The VarSets table is linked to one or more
VARSETCASES<set_name> and VARSETCASESVIEW<set_name> tables. The TableName column
identifies the specific VARSETCASES<set_name> table for each variable set, and the ViewTableName
column identifies the specific VARSETCASESVIEW<set_name>.
VARSETCASES<set_name> Table
There can be more than one VARSETCASES<set_name> table. Each VARSETCASES<set_name> table
identifies the variables in specific variable sets. The TableName column in the VarSets table identifies the
associated VARSETCASES<set_name> table for each variable set. The RECORD_NUM column is also
linked to the RECORD_NUM column in the Cases table.
Except for the RECORD_NUM column, the columns in the VARSETCASES<set_name> table correspond to
the unique VarName values in the Variables table. The column types and sizes are based on the value of
the Type column in the Variables table.
VARSETCASESVIEW<set_name> Table
There can be more than one VARSETCASESVIEW<set_name> table. Each
VARSETCASESVIEW<set_name> table identifies the variables in specific variable sets. The
ViewTableName column in the VarSets table identifies the associated VARSETCASESVIEW<set_name>
table for each variable set. The RECORD_NUM column is also linked to the RECORD_NUM column in the
Cases table.
Except for the RECORD_NUM column, the columns in the VARSETCASESVIEW<set_name> table
correspond to the unique VarName values in the Variables table. The column types and sizes are based on
the value of the Type column in the Variables table. This table automatically extracts value labels from
the VLVAR<var_name> tables and includes the value labels as strings. If there is no VLVAR<var_name>
table for a specific variable, the original formatted value is included.
Extensions Table
The Extensions table stores any extensions associated with the data file. Except for data file comments
(created with the DOCUMENT command), extensions are typically reserved for internal features of IBM
SPSS Statistics.
TrendsInfo Table
The TrendsInfo table defines the Trends date variables in the data set.
0 - None
1 - Cycle
2 - Year
3 - Quarter
4 - Month
5 - Week
6 - Day
7 - Hour
8 - Minute
9 - Second
10 - Observation
11 - Date
Period INTEGER The periodicity of the Trends date variable. This value
depends on Type. If Type is 5 (week), a value of 2 equals
2 weeks.
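The relationship between Type and Period can be illustrated with a short Python sketch. It assumes that the numbered list above gives the codes for the Type column, which this section does not state outright. The names TRENDS_TYPES and describe_period are hypothetical, and the plural wording is only for display.

```python
# Type codes for Trends date variables, following the numbered list above.
TRENDS_TYPES = {
    0: "None", 1: "Cycle", 2: "Year", 3: "Quarter", 4: "Month", 5: "Week",
    6: "Day", 7: "Hour", 8: "Minute", 9: "Second", 10: "Observation",
    11: "Date",
}


def describe_period(type_code, period):
    """Render a Period value in the unit named by its Type code."""
    unit = TRENDS_TYPES[type_code].lower()
    suffix = "" if period == 1 else "s"
    return f"{period} {unit}{suffix}"


print(describe_period(5, 2))
```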
For license inquiries regarding double-byte (DBCS) information, contact the IBM Intellectual Property
Department in your country or send inquiries, in writing, to:
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business
Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be
trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at
"Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.
Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or
trademarks of Adobe Systems Incorporated in the United States, and/or other countries.
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon,
Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or
its subsidiaries in the United States and other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the
United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or
its affiliates.