VECTOR STREAM LIBRARY AND TOOLS

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

0. About This Document

This document describes how to use the Vector Stream library and tools, which are distributed under the GPLv3 by Ichiroh Kanaya.

1. Vector Stream File Format

Vector Stream provides file I/O APIs for vectors (sequences of numbers) in standard C. The Vector Stream library writes C-style value arrays (float and double are currently supported) in its own format to a file specified by a pointer to a FILE structure, and reads them back.

The file format provided by Vector Stream (the vector file format) is simple and human-readable, yet allows efficient streaming file I/O.

Before presenting the format itself, let us see how the Vector Stream library works in C code.

1.1 What is the Vector Stream library?

By using the Vector Stream library, you can read and write arrays of numbers from and to a file with little effort. For example, if you have an array of floats such as

	#define N 10
	float x[N] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

then you can write it to a file as follows.

	vec_put_float_vector_to_file(N, x, 1, stdout);

The format of the file created here is called the "vector file format". Once the vector file has been created, you can read it back as follows.

	size_t n;
	float *y;
	vec_new_float_vector_from_file(&n, &y, stdin);
	some_function(y[0], y[1], y[2]);
	/* ... */

Memory for y is allocated automatically. After using y, you must free the memory yourself.

	/* ... */
	vec_delete_float_vector(y);

This is C.

1.2 What is the Vector Stream file format?

The Vector Stream file format is simple and human-readable, yet it allows an efficient implementation of a streaming library. The key idea of the vector file format is to impose only the minimum requirements needed to hold numerical arrays.

First, a vector file is a text file. It may contain only the following tokens.

Whitespace: [ \t]
Newline: \n
Real number in text: -?[0-9]*([.]?[0-9]*)?([eE]-?[0-9]+)?
Special keyword: nil
Comment: %.*$

Comments may contain ASCII (and ASCII-compatible) characters. Because the parser simply searches for the next \n character once it enters a comment, you can probably also use JIS, EUC, and other ASCII-like encodings in comments.

Second, a Vector Stream file can contain only a one-dimensional array. Remember that multi-dimensional arrays can be avoided in most scientific fields: they are needed only if their elements have variable (or multiple) lengths. For that purpose, you may want to use an object collection mechanism such as those found in Objective-C or Java.

Third, you must write the number of elements before the elements of the array themselves. For example, for the array [0 1 2 3] you must write

	4 0 1 2 3

into the file. This greatly reduces memory allocation cost in the library, because the library knows the array size before it reads the elements.

Comments may appear anywhere. For example, the following is a valid Vector Stream file.

	% Comment starting at the beginning of the line
	3 % Number of elements
	1 % The first element
	  % This line is otherwise blank.
	2 3 % The second and third elements.
	% File end.
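To illustrate the rules above (whitespace, newlines, % comments, nil, and the element count written first), here is a minimal parser sketch. It is not the library's implementation: it parses an in-memory string rather than a FILE stream, and the name parse_vector_text is hypothetical. Note how knowing the count first lets it allocate the array exactly once.

```c
#include <stdlib.h>
#include <string.h>

/* Skip whitespace, newlines, and % comments (which run to end of line). */
static const char *skip_blank(const char *p) {
    for (;;) {
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;
        if (*p != '%')
            return p;
        while (*p != '\0' && *p != '\n')
            p++;
    }
}

/* Hypothetical parser for one vector held in memory as text.
   On success returns 0, stores the element count in *n and a malloc'd
   array in *v (NULL when the vector is empty); the caller frees *v.
   Returns -1 on a syntax error or allocation failure. */
int parse_vector_text(const char *text, size_t *n, double **v) {
    const char *p = skip_blank(text);
    char *end;
    unsigned long count, i;

    *n = 0;
    *v = NULL;
    if (strncmp(p, "nil", 3) == 0)   /* format 1.2: nil means empty */
        return 0;
    if (*p == '-')                   /* the count cannot be negative */
        return -1;
    count = strtoul(p, &end, 10);    /* the count always comes first */
    if (end == p)
        return -1;
    if (count == 0)
        return 0;
    *v = malloc(count * sizeof **v); /* one allocation, size known up front */
    if (*v == NULL)
        return -1;
    p = end;
    for (i = 0; i < count; i++) {
        p = skip_blank(p);
        (*v)[i] = strtod(p, &end);
        if (end == p) {              /* missing or malformed element */
            free(*v);
            *v = NULL;
            return -1;
        }
        p = end;
    }
    *n = (size_t)count;
    return 0;
}
```

A caller would pass the whole file contents as text and free the returned array when done, much as vec_delete_double_vector is used with the real API.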

Fourth, you are encouraged to start every vector file with the following header.

	%!VCTR
	%-format=1.2

The author also recommends that the filenames of all vector files end with the ".v" suffix.

Fifth, you can embed hints and messages in a vector file by using special forms of comments, as follows.

	%?Hello, world.
	%*option=123

A message starting with %? may be printed on the console by Vector Stream tools during processing, and may be passed on to the next vector file when the tools are combined in a pipeline. An option string starting with %* may be analyzed by Vector Stream tools; for example, an option string can hold the stride of the vector, as in the following sample, which represents the three vectors [1 2 3], [10 20 30], and [100 200 300].

	%!VCTR
	%-format=1.2
	%*dimension=3
	%*cardinality=3
	%*minimum=1:2:3
	%*maximum=100:200:300
	%*average=37:74:111
	9 % Number of elements (3-dimensional vector x 3)
	1 2 3 % The first vector
	10 20 30 % The second vector
	100 200 300 % The third vector
	% End of file.
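The %* values in this sample can be checked against the data. The sketch below assumes the elements are read as consecutive vectors of the given dimension and computes the per-component minimum, maximum, and average. It is an illustration only, not the code of the statistics tool, and stats_by_dimension is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical helper: treat x[0..n-1] as n / dim vectors of dimension
   dim, stored one after another, and compute per-component minimum,
   maximum and average. min, max and avg must each hold dim values.
   Returns the number of vectors (the "cardinality"), or 0 if dim is 0
   or n < dim. */
size_t stats_by_dimension(size_t n, const double *x, size_t dim,
                          double *min, double *max, double *avg) {
    size_t count, i, k;

    if (dim == 0 || n < dim)
        return 0;
    count = n / dim;                    /* a trailing partial vector is ignored */
    for (k = 0; k < dim; k++) {
        min[k] = max[k] = x[k];
        avg[k] = 0.0;
        for (i = 0; i < count; i++) {
            double e = x[i * dim + k];  /* component k of vector i */
            if (e < min[k]) min[k] = e;
            if (e > max[k]) max[k] = e;
            avg[k] += e;
        }
        avg[k] /= (double)count;
    }
    return count;
}
```

Applied to the nine elements of the sample with dimension 3, it yields a cardinality of 3, minimum 1:2:3, maximum 100:200:300, and average 37:74:111, matching the hints.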

1.3 Revision history of the vector file format

An early version of the vector file format was proposed by the author in 2002 for his research. In 2008 the format was slightly modified while keeping backward compatibility with previous versions.

Format version 1.2

The vector file format version 1.2 was designed in 2008.

The %* metadata field was introduced.
The %? message field was introduced.
You can write nil for an empty vector instead of a zero-length vector (0).

Format version 1.1

The vector file format version 1.1 was designed in 2002. Version 1.1 of the format has the following features.

The %!VCTR header was introduced.
The %-format=1.1 format identifier was introduced.
Comments may appear anywhere, starting with this version.
No metadata other than comments is supported.

Format version 1.0

The vector file format was originally introduced in 2002. Version 1.0 of the format has the following restrictions.

Only numbers, and comments that start exactly at the beginning of a line, are allowed.
Blank lines are not allowed; you must insert an empty comment line instead.

2. Vector Stream Library

The Vector Stream library currently provides a C89/C90 API and a C++03 API. The C API is also accessible from C++, where all of its functions are placed in the namespace pid.

2.1 C API

The Vector Stream library provides the following functions, declared in the <vec.h> header. C++ users can include <vec++.hh> instead to use short-cut names for the C functions.

2.1.1 Writing header

	extern int vec_put_header_to_file(FILE *fout);

This function writes a valid header (i.e. %!VCTR ...) to fout.

2.1.2 Writing and reading vectors

	extern int vec_put_nil_to_file(FILE *fout);

This function writes nil to fout.

	extern int vec_put_float_vector_to_file(size_t n, const float *v, size_t s, FILE *fout);

vec_put_float_vector_to_file writes the value array v of size n to file fout. If s is greater than 1, this function inserts a line feed after every s elements; otherwise it writes each element on its own line. C++ users can use this function with the short-cut name pid::put_vector.
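The line-breaking rule for s can be sketched with standard I/O as follows. This is a self-contained approximation, not the library's code: put_float_vector is a hypothetical name, and the exact layout (count on its own line, elements separated by spaces) and the 0/-1 return convention are assumptions, so the real library's whitespace may differ.

```c
#include <stdio.h>

/* Hypothetical writer: element count first, then the elements,
   s per line (one per line when s <= 1).
   Returns 0 on success and -1 on an I/O error. */
int put_float_vector(size_t n, const float *v, size_t s, FILE *fout) {
    size_t i, per_line = (s > 1) ? s : 1;

    if (fprintf(fout, "%lu\n", (unsigned long)n) < 0)
        return -1;
    for (i = 0; i < n; i++) {
        /* end the line after every per_line elements and after the last */
        int end_of_line = ((i + 1) % per_line == 0) || (i + 1 == n);
        /* %.9g prints enough digits to read the same float back */
        if (fprintf(fout, "%.9g%c", (double)v[i], end_of_line ? '\n' : ' ') < 0)
            return -1;
    }
    return 0;
}
```

With x = {0, 1, 2, 3} and s = 2, this sketch writes 4, then 0 1, then 2 3 on three lines.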

	extern int vec_new_float_vector_from_file(size_t *n, float **v, FILE *fin);

vec_new_float_vector_from_file reads a vector from file fin, storing its size in *n and its elements in *v. Memory for *v is allocated automatically. C++ users can use this function with the short-cut name pid::new_vector.

	extern void vec_delete_float_vector(float *v);

When you are done with a vector, free its memory with this function. C++ users can use this function with the short-cut name pid::delete_vector.

	extern int vec_put_double_vector_to_file(size_t n, const double *v, size_t s, FILE *fout);
	extern int vec_new_double_vector_from_file(size_t *n, double **v, FILE *fin);
	extern void vec_delete_double_vector(double *v);

The functions above are the double versions of the put/new/delete functions. Their C++ short-cut names are the same.

2.1.3 Writing hint and message

	extern int vec_put_hint_to_file(const char *hint, int parameter, FILE *fout);

This function writes hint to the file fout in the form %*hint=parameter.

	extern int vec_put_message_to_file(const char *message, FILE *fout);

This function writes message to the file fout in the form %?message. Any \n characters in message are ignored.

	extern int vec_scan_messages_from_file_and_put_to_file(FILE *fin, FILE *fout);

This function scans file fin for messages (lines that start with %?) and writes them to file fout.
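The comment forms handled by these functions are simple enough to sketch with standard I/O. The functions below (put_header, put_hint, put_message, scan_messages) are hypothetical stand-ins, not the library's implementation; the header text follows section 1.2, the 0/-1 return values are an assumption, and scan_messages recognizes %? only at the very start of a line.

```c
#include <stdio.h>
#include <string.h>

/* Write the recommended header lines. */
int put_header(FILE *fout) {
    return fputs("%!VCTR\n%-format=1.2\n", fout) == EOF ? -1 : 0;
}

/* Write a hint line of the form %*hint=parameter. */
int put_hint(const char *hint, int parameter, FILE *fout) {
    return fprintf(fout, "%%*%s=%d\n", hint, parameter) < 0 ? -1 : 0;
}

/* Write a message line of the form %?message, dropping any '\n'. */
int put_message(const char *message, FILE *fout) {
    if (fputs("%?", fout) == EOF)
        return -1;
    for (; *message != '\0'; message++)
        if (*message != '\n' && fputc(*message, fout) == EOF)
            return -1;
    return fputc('\n', fout) == EOF ? -1 : 0;
}

/* Copy every line of fin that starts with %? to fout. Long lines are
   read in pieces; only the first piece of a line decides whether the
   whole line is copied. */
int scan_messages(FILE *fin, FILE *fout) {
    char buf[256];
    int at_line_start = 1, copying = 0;

    while (fgets(buf, sizeof buf, fin) != NULL) {
        size_t len = strlen(buf);
        if (at_line_start)
            copying = (buf[0] == '%' && buf[1] == '?');
        if (copying && fputs(buf, fout) == EOF)
            return -1;
        at_line_start = (len > 0 && buf[len - 1] == '\n');
    }
    return ferror(fin) ? -1 : 0;
}
```

For example, put_message("Hello,\n world.", f) writes the single line %?Hello, world. to f.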

2.1.4 Vector operations

	extern int vec_slice_double_vector(double *a, const double *v, size_t offset, size_t length, size_t stride);

This function slices vector v and stores the result in a. The memory for a must be allocated before calling this function. The slice is specified by offset, length, and stride, as with C++'s std::slice, for which this function is a C alternative.
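Following the std::slice convention, element i of the result is v[offset + i * stride]. A minimal sketch of that rule, under the hypothetical name slice_vector (not the library function itself):

```c
#include <stddef.h>

/* Hypothetical helper: copy length elements of v, starting at offset and
   stepping by stride, into a, so that a[i] = v[offset + i * stride].
   Like the library signature, it takes no size for v: the caller must
   ensure every index touched lies inside v and that a has room for
   length elements. Returns 0. */
int slice_vector(double *a, const double *v,
                 size_t offset, size_t length, size_t stride) {
    size_t i;
    for (i = 0; i < length; i++)
        a[i] = v[offset + i * stride];
    return 0;
}
```

For v = {0, 1, ..., 9}, offset 1, length 3, and stride 3 select {1, 4, 7}.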

	extern int vec_add_double_multi_vector_to_multi_vector(double *a, size_t s, size_t n1, const double *v1, size_t n2, const double *v2);
	extern int vec_add_double_single_vector_to_multi_vector(double *a, size_t s, size_t n1, const double *v1, size_t n2, const double *v2);

These functions calculate the sum of two vectors v1 (size n1) and v2 (size n2) and store the result in a. The memory for the result vector a must be allocated before calling these functions. The former function takes vectors v1 and v2 of the same size and adds them element by element, storing each sum at the corresponding position of a. The latter function takes the smaller vector as v1 and repeats it while adding it to v2.
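The two addition rules can be sketched as below. These are hypothetical helpers, not the library functions: the role of the library's s parameter is not described here, so the sketch omits it, and modelling the repetition as a[i] = v1[i % n1] + v2[i] with n2 a multiple of n1 is an assumption.

```c
#include <stddef.h>

/* Element-wise sum of two vectors of equal size n: a[i] = v1[i] + v2[i]. */
void add_multi_to_multi(double *a, size_t n,
                        const double *v1, const double *v2) {
    size_t i;
    for (i = 0; i < n; i++)
        a[i] = v1[i] + v2[i];
}

/* Add the short vector v1 (size n1) repeatedly along v2 (size n2):
   a[i] = v1[i % n1] + v2[i]. a must have room for n2 elements.
   Returns -1 unless n1 > 0 and n2 is a multiple of n1. */
int add_single_to_multi(double *a, size_t n1, const double *v1,
                        size_t n2, const double *v2) {
    size_t i;
    if (n1 == 0 || n2 % n1 != 0)
        return -1;
    for (i = 0; i < n2; i++)
        a[i] = v1[i % n1] + v2[i];
    return 0;
}
```

For v1 = {1, 2} and v2 = {10, 20, 30, 40}, add_single_to_multi gives {11, 22, 31, 42}.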

	extern int vec_multiply_double_multi_matrix_to_multi_vector(double *a, size_t s, size_t nm, const double *m, size_t nv, const double *v, int transpose);
	extern int vec_multiply_double_single_matrix_to_multi_vector(double *a, size_t s, size_t nm, const double *m, size_t nv, const double *v, int transpose);

These functions calculate products of matrices m (size nm, stride s) and vectors v (size nv, stride s), and store the result in a. The memory for the result vector a must be allocated before calling these functions. The former function takes equal numbers of matrices in m and vectors in v; each matrix is multiplied by the corresponding vector and the product is stored at the corresponding position of a. The latter function takes a single matrix as m and multiplies it by each vector in v.
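The single-matrix case can be sketched as follows, under several assumptions the description above does not confirm: s is the vector dimension, the matrix is s-by-s in row-major order, consecutive groups of s elements in v form the vectors, and a nonzero transpose selects the transposed matrix. The name multiply_single_matrix is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: multiply one s-by-s matrix m (row-major) by each
   of the nv / s vectors packed in v, writing nv results to a. If
   transpose is nonzero, the transpose of m is used instead (assumed
   meaning). Returns -1 if s is 0 or nv is not a multiple of s. */
int multiply_single_matrix(double *a, size_t s, const double *m,
                           size_t nv, const double *v, int transpose) {
    size_t j, r, c;
    if (s == 0 || nv % s != 0)
        return -1;
    for (j = 0; j < nv; j += s) {           /* j: start of the current vector */
        for (r = 0; r < s; r++) {
            double sum = 0.0;
            for (c = 0; c < s; c++) {
                /* entry (r, c) of m, or of its transpose */
                double mrc = transpose ? m[c * s + r] : m[r * s + c];
                sum += mrc * v[j + c];
            }
            a[j + r] = sum;
        }
    }
    return 0;
}
```

With m = [[1 2] [3 4]] and the vectors (1, 0) and (0, 1), the sketch produces (1, 3) and (2, 4), the columns of m; with transpose set it produces (1, 2) and (3, 4).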

2.1.5 Error handler

	typedef int (*vec_error_handler_t)(int error_type, const char *error_message);
	extern vec_error_handler_t vec_set_error_handler(vec_error_handler_t new_error_handler);

If an error occurs in the Vector Stream library, the error handler is invoked. The default error handler prints an error message on stderr and calls exit(1) if the error is fatal; otherwise it returns. You can change this behavior by installing your own handler with vec_set_error_handler (for example, to throw an exception when using C++). This function returns the current error handler.
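The handler mechanism follows a common C pattern: a global function pointer, a setter that swaps it, and a reporting function that calls it. The self-contained sketch below imitates that pattern with hypothetical names (set_error_handler, report_error, counting_handler); it is not the library's code. Here the setter returns the handler it replaces so that a caller can restore it later, and treating a nonzero error_type as fatal is an assumption made only for this sketch.

```c
#include <stdio.h>
#include <stdlib.h>

/* Same shape as the documented vec_error_handler_t. */
typedef int (*error_handler_t)(int error_type, const char *error_message);

/* Assumption for this sketch only: a nonzero error_type is fatal. */
static int default_handler(int error_type, const char *error_message) {
    fprintf(stderr, "error: %s\n", error_message);
    if (error_type != 0)
        exit(1);
    return error_type;
}

static error_handler_t current_handler = default_handler;

/* Install new_handler and return the handler it replaces. */
error_handler_t set_error_handler(error_handler_t new_handler) {
    error_handler_t previous = current_handler;
    current_handler = new_handler;
    return previous;
}

/* Library code would call this wherever it detects an error. */
int report_error(int error_type, const char *error_message) {
    return current_handler(error_type, error_message);
}

/* Example replacement handler: count errors instead of exiting. */
int error_count = 0;
int counting_handler(int error_type, const char *error_message) {
    (void)error_message;
    error_count++;
    return error_type;
}
```

A program would install counting_handler around a risky section and then pass the saved handler back to the setter to restore the default behavior.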

2.2 C++ API

The Vector Stream library provides C++ APIs on top of the C APIs. The functions and templates are provided through the <vec++.hh> header.

2.2.1 Short-cuts

The C++ API of Vector Stream provides short-cut names for the C API functions. All C functions and their short-cut versions are declared in the pid namespace.

2.2.2 vector_loader class

The C++ API provides class template vector_loader. The following example shows how to use this template.

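	#include <iostream>
	#include <map>
	#include <string>
	#include <valarray>
	#include <vector>
	#include <vec++.hh>

	int main(int argc, char **argv) {
	  if (argc < 2) {
	    std::cerr << "usage: " << argv[0] << " input.v\n";
	    return 1;
	  }
	  // Load the vector file named on the command line.
	  pid::vector_loader<double> *vl = new pid::vector_loader<double>(argv[1]);
	  std::valarray<double> *v = vl->values();   // the numeric elements
	  std::cerr << v->size() << " elements\n";
	  // Print every %? message found in the file.
	  std::vector<std::string> *m = vl->messages();
	  for (std::vector<std::string>::const_iterator i = m->begin(); i != m->end(); ++i)
	    std::cerr << *i << '\n';
	  // Look up the %*dimension hint.
	  std::map<std::string, std::string> *h = vl->hints();
	  std::cerr << (*h)["dimension"] << '\n';
	  delete vl;
	  return 0;
	}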

Unfortunately, the current implementation of vector_loader scans the input file twice, which negates the efficiency advantage of the Vector Stream file format. If speed is critical, use the C API to load files.

2.2.3 Parameter analyzer

The following function template helps split colon-separated values into a valarray.

	template <typename T, typename C> void parse_multiple_parameters(std::valarray<T> **v, C converter, char *s);

The converter is a functor (function-like object) and must provide T operator()(const std::string &). For example, if the parameter s is "1:2:3", then

	std::valarray<double> *parameters = 0;
	pid::parse_multiple_parameters(&parameters, pid::string_to_double(), s);

will give you a new std::valarray<double> of size 3 containing 1.0, 2.0, and 3.0. This example uses the predefined functor string_to_double; the functors string_to_string and string_to_int are also available.
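For C programs, the same splitting can be sketched with strtod. The function below, parse_colon_doubles, is a hypothetical helper and not part of the library; like the C++ template it hands back a newly allocated array, here released with free().

```c
#include <stdlib.h>

/* Hypothetical helper: split a colon-separated string such as "1:2:3"
   into doubles. On success returns 0, stores a malloc'd array in
   *values and its length in *n; the caller frees *values.
   Returns -1 on a malformed field or allocation failure. */
int parse_colon_doubles(const char *s, double **values, size_t *n) {
    size_t count = 1, i;
    const char *p;
    char *end;

    for (p = s; *p != '\0'; p++)     /* one more field than colons */
        if (*p == ':')
            count++;
    *n = 0;
    *values = malloc(count * sizeof **values);
    if (*values == NULL)
        return -1;
    for (i = 0, p = s; i < count; i++) {
        (*values)[i] = strtod(p, &end);
        /* each field must be a number followed by ':' or the end */
        if (end == p || (*end != ':' && *end != '\0')) {
            free(*values);
            *values = NULL;
            return -1;
        }
        p = end + 1;                 /* step past the ':' */
    }
    *n = count;
    return 0;
}
```

For instance, parse_colon_doubles("37:74:111", &v, &n) would recover the average hint from the sample in section 1.2 as three doubles.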

3. Vector Stream Tools

Vector Stream provides the following command-line tools. Each command prints a help message if no arguments are given.

3.1 vectorize

The vectorize command reads a text file and writes it to the standard output in Vector Stream file format.
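Because the element count must precede the elements, a converter like vectorize has to see every number before it can write anything. The sketch below shows one way to do this, assuming the input is plain whitespace-separated numbers; vectorize_stream is a hypothetical name and this is not the tool's source. It stops reading at the first token that is not a number.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical converter: read whitespace-separated numbers from fin and
   write them to fout in vector file format. All values are buffered
   first because the count must be written before the elements.
   Returns 0 on success, -1 on error. */
int vectorize_stream(FILE *fin, FILE *fout) {
    size_t n = 0, cap = 16, i;
    double x;
    double *buf = malloc(cap * sizeof *buf);

    if (buf == NULL)
        return -1;
    while (fscanf(fin, "%lf", &x) == 1) {
        if (n == cap) {              /* grow the buffer geometrically */
            double *bigger = realloc(buf, 2 * cap * sizeof *buf);
            if (bigger == NULL) {
                free(buf);
                return -1;
            }
            buf = bigger;
            cap *= 2;
        }
        buf[n++] = x;
    }
    fprintf(fout, "%%!VCTR\n%%-format=1.2\n%lu\n", (unsigned long)n);
    for (i = 0; i < n; i++)
        fprintf(fout, "%.17g\n", buf[i]);  /* %.17g round-trips a double */
    free(buf);
    return ferror(fout) ? -1 : 0;
}
```

Given the input 1.5 2 -3e2, the sketch writes the header, the count 3, and the values 1.5, 2, and -300, one per line.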

3.2 vcat

The vcat command reads a Vector Stream file and writes it to the standard output in Vector Stream file format. If the -u option is given, it writes plain text instead.

3.3 slice/gslice

The command slice -ooffset -llength -sstride input.v slices the input vector input.v using the given offset, length, and stride. If length is 0, the length is calculated automatically.

The command gslice -ooffset -Llength1:length2[:...] -Sstride1:stride2[:...] input.v slices the input vector input.v using offset offset, lengths length1, length2, ..., and strides stride1, stride2, .... If length1 is 0, it is calculated automatically.
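One natural reading of "calculated automatically" is the largest length whose last index still falls inside the input. The sketch below computes that value; it is an assumption about the tools' behavior, and auto_slice_length is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical helper: how many indices offset, offset + stride, ...
   stay below n (the assumed automatic slice length). Returns 0 when
   stride is 0 or offset is already past the end. */
size_t auto_slice_length(size_t n, size_t offset, size_t stride) {
    if (stride == 0 || offset >= n)
        return 0;
    return (n - offset + stride - 1) / stride;  /* ceiling division */
}
```

For a 10-element input, offset 1, and stride 3, it returns 3 (indices 1, 4, and 7).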

3.4 splice

The command splice input1.v input2.v interleaves two input vectors. The first element of input1.v is copied to the output, followed by the first element of input2.v; then the second element of input1.v is copied, followed by the second element of input2.v, and so on. You can give the -sstride option to specify a stride.
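The interleaving order can be sketched as follows. How the tool interprets -sstride is not spelled out above; this sketch assumes it means copying stride elements from each input in turn, and splice_vectors is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical helper: interleave two vectors of n elements each, taking
   stride elements from v1, then stride elements from v2, and so on.
   a must have room for 2 * n elements.
   Returns -1 unless stride > 0 and n is a multiple of stride. */
int splice_vectors(double *a, size_t n, const double *v1, const double *v2,
                   size_t stride) {
    size_t i, k, out = 0;
    if (stride == 0 || n % stride != 0)
        return -1;
    for (i = 0; i < n; i += stride) {
        for (k = 0; k < stride; k++)   /* next group from the first input */
            a[out++] = v1[i + k];
        for (k = 0; k < stride; k++)   /* matching group from the second */
            a[out++] = v2[i + k];
    }
    return 0;
}
```

With inputs {1, 2, 3, 4} and {10, 20, 30, 40}, stride 1 gives 1 10 2 20 3 30 4 40 and stride 2 gives 1 2 10 20 3 4 30 40.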

3.5 add/multiply

The command add input1.v input2.v calculates the sum of the two vectors input1.v and input2.v and writes it to the standard output.

3.6 statistics

The statistics command reports the maximum, minimum, and average values, among other statistics, of the input vectors.

4. Install

Vector Stream is distributed as source code, so you must compile the library and tools yourself after obtaining the source.

4.1 How to get

Visit sourceforge.net to download the source code. The source code is managed with Subversion (SVN).

4.2 How to install

The Vector Stream library provides a configure script. To install the library and supporting tools, run ./configure, then make, and then make install.

4.3 How to contact the author

You will be able to contact the author at kanaya (at) users (dot) sourceforge (dot) net.

5. References

To be written.

6. Acknowledgement

This program/library has been developed with the support of:

Prof. Kazuo Kawasaki, The PiD Lab., Graduate School of Engineering, Osaka University, Japan
Prof. Kosuke Sato, The SENS Lab., Graduate School of Engineering Science, Osaka University, Japan
Dr. Mark Lehner and Mr. Yukinori Kawae, Ancient Egypt Research Association, US

7. Revision History

2008-06-22: First version of this document.

$Id: index.html 36 2008-06-30 15:22:51Z kanaya $
