teapot-spreadsheet/src/common/csv.c

163 lines
3.4 KiB
C
Raw Normal View History

#ifndef NO_POSIX_SOURCE
#undef _POSIX_SOURCE
#define _POSIX_SOURCE 1
#undef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 2
#endif
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#ifdef OLD_REALLOC
#define realloc(s,l) myrealloc(s,l)
#endif
#ifdef DMALLOC
#include "dmalloc.h"
#endif
#include "csv.h"
Streamline internals of teapot The primary change is to add a “funcall” token, so that an entire expression can be encapsulated as a single token. This change is used to allow a cell to include simply a selection of appropriate semantic tokens. I.e., the content and iterative content are now each a single token like the value and the result value. Token vectors are used only as intermediate results in scanning and parsing. Not this means the cells are now in effect storing parse trees, so computation should be slightly faster and future extensions (like #56) should be facilitated. This commit also takes the opportunity while internals are being altered to add another token to a cell for future use for computed attributes, cf #22, and to change the internal numerical values from double and ints to long doubles and long longs. However, the change attempts to encapsulate that choice so it would be easy to change back or change to another representation. Note that these changes break savexdr(), as the internal binary format of a cell is now different. Rather than reimplement it, it is deprecated as the world does not need another binary spreadsheet format. Hence, the ascii format for teapot spreadsheets becomes the primary file format. Loading of old xdr files is still supported for backward compatibility. Closes #59. Also along the way, various other slight fixes and enhancements crept in, a partial but probably not exhaustive list of which follows: Fixes #31. Further revisions and improvements to documentation. Make the approximate comparison of floating point values scale more accurately with the size of the doubles being compared. Further extensions of absolute and relative cell addressing. Addition of (circle constant) tau function/constant. Modified string conversion to simply use internal printing routines, and to take "scientific" and "decimal" keywords. Allowed n() function to take a list of values, or just a single location defaulting to the current location. Added floor, ceil, trunc, and round functions, and allowed them to be keywords controlling the int() integer conversion function as well. Allowed substr() to drop its last argument to go to the end of the string. Provided an enum of built-in functions to preserve legacy function identifiers, allowing the large table inside func.c to be reorganized in a clearer fashion. Added additional annotation of properties of the built-in functions, including precedence. All operators are now also accessible as built-in functions. Made precedence of unary - lower than ^ to match python. Avoided inadvertently using FLTK @symbol abbreviations for formulas with "@" in them.
2019-08-23 19:12:06 +00:00
#include "scanner.h"
static int semicol=0;
/* csv_setopt -- set language */ /*{{{C}}}*//*{{{*/
void csv_setopt(int sem)
{
semicol=sem;
}
/*}}}*/
#if 0
/* csv_date -- convert string "year<sep>month<sep>day<sep>" to date */ /*{{{*/
static struct CSV_Date csv_date(const char *s, const char **end)
{
struct CSV_Date date;
assert(s!=(const char*)0);
assert(end!=(const char**)0);
*end=s;
if (*s=='"')
{
++s;
if (isdigit(*s))
{
for (date.year=0; isdigit(*s); date.year=10*date.year+(*s-'0'),++s);
if (*s)
{
++s;
if (isdigit(*s))
{
for (date.month=0; isdigit(*s); date.month=10*date.month+(*s-'0'),++s);
if (*s)
{
++s;
if (isdigit(*s))
{
for (date.day=0; isdigit(*s); date.day=10*date.day+(*s-'0'),++s);
if (*s=='"')
{
++s;
*end=s;
}
}
}
}
}
}
}
return date;
}
/*}}}*/
#endif
/* csv_separator -- convert string \t to nothing */ /*{{{*/
void csv_separator(const char *s, const char **end)
{
assert(s!=(const char*)0);
assert(end!=(const char**)0);
*end=s+(*s=='\t' || (semicol && *s==';') || (!semicol && *s==','));
}
/*}}}*/
/* csv_long -- convert string [0[x]]12345 to long */ /*{{{*/
Streamline internals of teapot The primary change is to add a “funcall” token, so that an entire expression can be encapsulated as a single token. This change is used to allow a cell to include simply a selection of appropriate semantic tokens. I.e., the content and iterative content are now each a single token like the value and the result value. Token vectors are used only as intermediate results in scanning and parsing. Not this means the cells are now in effect storing parse trees, so computation should be slightly faster and future extensions (like #56) should be facilitated. This commit also takes the opportunity while internals are being altered to add another token to a cell for future use for computed attributes, cf #22, and to change the internal numerical values from double and ints to long doubles and long longs. However, the change attempts to encapsulate that choice so it would be easy to change back or change to another representation. Note that these changes break savexdr(), as the internal binary format of a cell is now different. Rather than reimplement it, it is deprecated as the world does not need another binary spreadsheet format. Hence, the ascii format for teapot spreadsheets becomes the primary file format. Loading of old xdr files is still supported for backward compatibility. Closes #59. Also along the way, various other slight fixes and enhancements crept in, a partial but probably not exhaustive list of which follows: Fixes #31. Further revisions and improvements to documentation. Make the approximate comparison of floating point values scale more accurately with the size of the doubles being compared. Further extensions of absolute and relative cell addressing. Addition of (circle constant) tau function/constant. Modified string conversion to simply use internal printing routines, and to take "scientific" and "decimal" keywords. Allowed n() function to take a list of values, or just a single location defaulting to the current location. Added floor, ceil, trunc, and round functions, and allowed them to be keywords controlling the int() integer conversion function as well. Allowed substr() to drop its last argument to go to the end of the string. Provided an enum of built-in functions to preserve legacy function identifiers, allowing the large table inside func.c to be reorganized in a clearer fashion. Added additional annotation of properties of the built-in functions, including precedence. All operators are now also accessible as built-in functions. Made precedence of unary - lower than ^ to match python. Avoided inadvertently using FLTK @symbol abbreviations for formulas with "@" in them.
2019-08-23 19:12:06 +00:00
IntT csv_long(const char *s, const char **end)
{
Streamline internals of teapot The primary change is to add a “funcall” token, so that an entire expression can be encapsulated as a single token. This change is used to allow a cell to include simply a selection of appropriate semantic tokens. I.e., the content and iterative content are now each a single token like the value and the result value. Token vectors are used only as intermediate results in scanning and parsing. Not this means the cells are now in effect storing parse trees, so computation should be slightly faster and future extensions (like #56) should be facilitated. This commit also takes the opportunity while internals are being altered to add another token to a cell for future use for computed attributes, cf #22, and to change the internal numerical values from double and ints to long doubles and long longs. However, the change attempts to encapsulate that choice so it would be easy to change back or change to another representation. Note that these changes break savexdr(), as the internal binary format of a cell is now different. Rather than reimplement it, it is deprecated as the world does not need another binary spreadsheet format. Hence, the ascii format for teapot spreadsheets becomes the primary file format. Loading of old xdr files is still supported for backward compatibility. Closes #59. Also along the way, various other slight fixes and enhancements crept in, a partial but probably not exhaustive list of which follows: Fixes #31. Further revisions and improvements to documentation. Make the approximate comparison of floating point values scale more accurately with the size of the doubles being compared. Further extensions of absolute and relative cell addressing. Addition of (circle constant) tau function/constant. Modified string conversion to simply use internal printing routines, and to take "scientific" and "decimal" keywords. Allowed n() function to take a list of values, or just a single location defaulting to the current location. Added floor, ceil, trunc, and round functions, and allowed them to be keywords controlling the int() integer conversion function as well. Allowed substr() to drop its last argument to go to the end of the string. Provided an enum of built-in functions to preserve legacy function identifiers, allowing the large table inside func.c to be reorganized in a clearer fashion. Added additional annotation of properties of the built-in functions, including precedence. All operators are now also accessible as built-in functions. Made precedence of unary - lower than ^ to match python. Avoided inadvertently using FLTK @symbol abbreviations for formulas with "@" in them.
2019-08-23 19:12:06 +00:00
IntT value;
const char *t;
assert(s!=(const char*)0);
assert(end!=(const char**)0);
if (*s=='\t') { *end=s; return 0L; };
Streamline internals of teapot The primary change is to add a “funcall” token, so that an entire expression can be encapsulated as a single token. This change is used to allow a cell to include simply a selection of appropriate semantic tokens. I.e., the content and iterative content are now each a single token like the value and the result value. Token vectors are used only as intermediate results in scanning and parsing. Not this means the cells are now in effect storing parse trees, so computation should be slightly faster and future extensions (like #56) should be facilitated. This commit also takes the opportunity while internals are being altered to add another token to a cell for future use for computed attributes, cf #22, and to change the internal numerical values from double and ints to long doubles and long longs. However, the change attempts to encapsulate that choice so it would be easy to change back or change to another representation. Note that these changes break savexdr(), as the internal binary format of a cell is now different. Rather than reimplement it, it is deprecated as the world does not need another binary spreadsheet format. Hence, the ascii format for teapot spreadsheets becomes the primary file format. Loading of old xdr files is still supported for backward compatibility. Closes #59. Also along the way, various other slight fixes and enhancements crept in, a partial but probably not exhaustive list of which follows: Fixes #31. Further revisions and improvements to documentation. Make the approximate comparison of floating point values scale more accurately with the size of the doubles being compared. Further extensions of absolute and relative cell addressing. Addition of (circle constant) tau function/constant. Modified string conversion to simply use internal printing routines, and to take "scientific" and "decimal" keywords. Allowed n() function to take a list of values, or just a single location defaulting to the current location. Added floor, ceil, trunc, and round functions, and allowed them to be keywords controlling the int() integer conversion function as well. Allowed substr() to drop its last argument to go to the end of the string. Provided an enum of built-in functions to preserve legacy function identifiers, allowing the large table inside func.c to be reorganized in a clearer fashion. Added additional annotation of properties of the built-in functions, including precedence. All operators are now also accessible as built-in functions. Made precedence of unary - lower than ^ to match python. Avoided inadvertently using FLTK @symbol abbreviations for formulas with "@" in them.
2019-08-23 19:12:06 +00:00
value = STRTOINT(s, (char**)end, 0);
if (s!=*end)
{
t=*end; csv_separator(t,end);
if (t!=*end || *t=='\0' || *t=='\n') return value;
}
*end=s; return 0L;
}
/*}}}*/
/* csv_double -- convert string 123.4e5 to double */ /*{{{*/
Streamline internals of teapot The primary change is to add a “funcall” token, so that an entire expression can be encapsulated as a single token. This change is used to allow a cell to include simply a selection of appropriate semantic tokens. I.e., the content and iterative content are now each a single token like the value and the result value. Token vectors are used only as intermediate results in scanning and parsing. Not this means the cells are now in effect storing parse trees, so computation should be slightly faster and future extensions (like #56) should be facilitated. This commit also takes the opportunity while internals are being altered to add another token to a cell for future use for computed attributes, cf #22, and to change the internal numerical values from double and ints to long doubles and long longs. However, the change attempts to encapsulate that choice so it would be easy to change back or change to another representation. Note that these changes break savexdr(), as the internal binary format of a cell is now different. Rather than reimplement it, it is deprecated as the world does not need another binary spreadsheet format. Hence, the ascii format for teapot spreadsheets becomes the primary file format. Loading of old xdr files is still supported for backward compatibility. Closes #59. Also along the way, various other slight fixes and enhancements crept in, a partial but probably not exhaustive list of which follows: Fixes #31. Further revisions and improvements to documentation. Make the approximate comparison of floating point values scale more accurately with the size of the doubles being compared. Further extensions of absolute and relative cell addressing. Addition of (circle constant) tau function/constant. Modified string conversion to simply use internal printing routines, and to take "scientific" and "decimal" keywords. Allowed n() function to take a list of values, or just a single location defaulting to the current location. Added floor, ceil, trunc, and round functions, and allowed them to be keywords controlling the int() integer conversion function as well. Allowed substr() to drop its last argument to go to the end of the string. Provided an enum of built-in functions to preserve legacy function identifiers, allowing the large table inside func.c to be reorganized in a clearer fashion. Added additional annotation of properties of the built-in functions, including precedence. All operators are now also accessible as built-in functions. Made precedence of unary - lower than ^ to match python. Avoided inadvertently using FLTK @symbol abbreviations for formulas with "@" in them.
2019-08-23 19:12:06 +00:00
FltT csv_double(const char *s, const char **end)
{
Streamline internals of teapot The primary change is to add a “funcall” token, so that an entire expression can be encapsulated as a single token. This change is used to allow a cell to include simply a selection of appropriate semantic tokens. I.e., the content and iterative content are now each a single token like the value and the result value. Token vectors are used only as intermediate results in scanning and parsing. Not this means the cells are now in effect storing parse trees, so computation should be slightly faster and future extensions (like #56) should be facilitated. This commit also takes the opportunity while internals are being altered to add another token to a cell for future use for computed attributes, cf #22, and to change the internal numerical values from double and ints to long doubles and long longs. However, the change attempts to encapsulate that choice so it would be easy to change back or change to another representation. Note that these changes break savexdr(), as the internal binary format of a cell is now different. Rather than reimplement it, it is deprecated as the world does not need another binary spreadsheet format. Hence, the ascii format for teapot spreadsheets becomes the primary file format. Loading of old xdr files is still supported for backward compatibility. Closes #59. Also along the way, various other slight fixes and enhancements crept in, a partial but probably not exhaustive list of which follows: Fixes #31. Further revisions and improvements to documentation. Make the approximate comparison of floating point values scale more accurately with the size of the doubles being compared. Further extensions of absolute and relative cell addressing. Addition of (circle constant) tau function/constant. Modified string conversion to simply use internal printing routines, and to take "scientific" and "decimal" keywords. Allowed n() function to take a list of values, or just a single location defaulting to the current location. Added floor, ceil, trunc, and round functions, and allowed them to be keywords controlling the int() integer conversion function as well. Allowed substr() to drop its last argument to go to the end of the string. Provided an enum of built-in functions to preserve legacy function identifiers, allowing the large table inside func.c to be reorganized in a clearer fashion. Added additional annotation of properties of the built-in functions, including precedence. All operators are now also accessible as built-in functions. Made precedence of unary - lower than ^ to match python. Avoided inadvertently using FLTK @symbol abbreviations for formulas with "@" in them.
2019-08-23 19:12:06 +00:00
FltT value;
const char *t;
assert(s!=(const char*)0);
assert(end!=(const char**)0);
if (*s=='\t') { *end=s; return 0.0; };
Streamline internals of teapot The primary change is to add a “funcall” token, so that an entire expression can be encapsulated as a single token. This change is used to allow a cell to include simply a selection of appropriate semantic tokens. I.e., the content and iterative content are now each a single token like the value and the result value. Token vectors are used only as intermediate results in scanning and parsing. Not this means the cells are now in effect storing parse trees, so computation should be slightly faster and future extensions (like #56) should be facilitated. This commit also takes the opportunity while internals are being altered to add another token to a cell for future use for computed attributes, cf #22, and to change the internal numerical values from double and ints to long doubles and long longs. However, the change attempts to encapsulate that choice so it would be easy to change back or change to another representation. Note that these changes break savexdr(), as the internal binary format of a cell is now different. Rather than reimplement it, it is deprecated as the world does not need another binary spreadsheet format. Hence, the ascii format for teapot spreadsheets becomes the primary file format. Loading of old xdr files is still supported for backward compatibility. Closes #59. Also along the way, various other slight fixes and enhancements crept in, a partial but probably not exhaustive list of which follows: Fixes #31. Further revisions and improvements to documentation. Make the approximate comparison of floating point values scale more accurately with the size of the doubles being compared. Further extensions of absolute and relative cell addressing. Addition of (circle constant) tau function/constant. Modified string conversion to simply use internal printing routines, and to take "scientific" and "decimal" keywords. Allowed n() function to take a list of values, or just a single location defaulting to the current location. Added floor, ceil, trunc, and round functions, and allowed them to be keywords controlling the int() integer conversion function as well. Allowed substr() to drop its last argument to go to the end of the string. Provided an enum of built-in functions to preserve legacy function identifiers, allowing the large table inside func.c to be reorganized in a clearer fashion. Added additional annotation of properties of the built-in functions, including precedence. All operators are now also accessible as built-in functions. Made precedence of unary - lower than ^ to match python. Avoided inadvertently using FLTK @symbol abbreviations for formulas with "@" in them.
2019-08-23 19:12:06 +00:00
value = STRTOFLT(s,(char**)end);
if (s!=*end)
{
t=*end; csv_separator(t,end);
if (t!=*end || *t=='\0' || *t=='\n') return value;
}
*end=s; return 0.0;
}
/*}}}*/
Streamline internals of teapot The primary change is to add a “funcall” token, so that an entire expression can be encapsulated as a single token. This change is used to allow a cell to include simply a selection of appropriate semantic tokens. I.e., the content and iterative content are now each a single token like the value and the result value. Token vectors are used only as intermediate results in scanning and parsing. Not this means the cells are now in effect storing parse trees, so computation should be slightly faster and future extensions (like #56) should be facilitated. This commit also takes the opportunity while internals are being altered to add another token to a cell for future use for computed attributes, cf #22, and to change the internal numerical values from double and ints to long doubles and long longs. However, the change attempts to encapsulate that choice so it would be easy to change back or change to another representation. Note that these changes break savexdr(), as the internal binary format of a cell is now different. Rather than reimplement it, it is deprecated as the world does not need another binary spreadsheet format. Hence, the ascii format for teapot spreadsheets becomes the primary file format. Loading of old xdr files is still supported for backward compatibility. Closes #59. Also along the way, various other slight fixes and enhancements crept in, a partial but probably not exhaustive list of which follows: Fixes #31. Further revisions and improvements to documentation. Make the approximate comparison of floating point values scale more accurately with the size of the doubles being compared. Further extensions of absolute and relative cell addressing. Addition of (circle constant) tau function/constant. Modified string conversion to simply use internal printing routines, and to take "scientific" and "decimal" keywords. Allowed n() function to take a list of values, or just a single location defaulting to the current location. Added floor, ceil, trunc, and round functions, and allowed them to be keywords controlling the int() integer conversion function as well. Allowed substr() to drop its last argument to go to the end of the string. Provided an enum of built-in functions to preserve legacy function identifiers, allowing the large table inside func.c to be reorganized in a clearer fashion. Added additional annotation of properties of the built-in functions, including precedence. All operators are now also accessible as built-in functions. Made precedence of unary - lower than ^ to match python. Avoided inadvertently using FLTK @symbol abbreviations for formulas with "@" in them.
2019-08-23 19:12:06 +00:00
/* csv_string -- convert almost any string to string */ /*{{{*/
char *csv_string(const char *s, const char **end)
{
static char *string;
static int strings,stringsz;
assert(s!=(const char*)0);
assert(end!=(const char**)0);
strings=0;
stringsz=0;
string=(char*)0;
if (!isprint((int)*s) || (!semicol && *s==',') || (semicol && *s==';')) return (char*)0;
if (*s=='"')
{
++s;
while (*s!='\0' && *s!='\n' && *s!='"')
{
if ((strings+2)>=stringsz)
{
string=realloc(string,stringsz+=32);
}
string[strings]=*s;
++strings;
++s;
}
if (*s=='"') ++s;
}
else
{
while (*s!='\0' && *s!='\n' && *s!='\t' && ((!semicol && *s!=',') || (semicol && *s!=';')))
{
if ((strings+2)>=stringsz)
{
string=realloc(string,stringsz+=32);
}
string[strings]=*s;
++strings;
++s;
}
}
string[strings]='\0';
*end=s;
csv_separator(s,end);
return string;
}
/*}}}*/