Path: icaen!news.uiowa.edu!uunet!dziuxsolim.rutgers.edu!pilot.njin.net!not-for-mail From: comp-sources-apple2@pilot.njin.net Newsgroups: comp.sources.apple2 Subject: v001SRC094: AWK -- 16-bit Port of AT&T AWK (GNO) 01/06 Date: 1 Jan 1995 17:39:06 -0500 Organization: Rutgers University Lines: 1519 Sender: jac@pilot.njin.net Approved: jac@pilot.njin.net Distribution: world Message-ID: <3e7aua$d5t@pilot.njin.net> NNTP-Posting-Host: pilot.njin.net Submitted-By: Jawaid Bazyar (bazyar@netcom.com) Posting-number: Volume 1, Source 94 Archive-Name: gno/util/awk.01 Architecture: 2gs,UNIX Version-Number: 1.00 AWK is a powerful string processing language that is widely used in the Unix world. This particular version of AWK is a port of the actual AT&T AWK source code, which AT&T has graciously made available to the general public. The changes necessary to make AWK function on the 16-bit Apple //GS will likely be of interest to anyone trying to port AWK to MS-DOS or some other 16-bit machine. This package requires the GNO/ME 2.0 and ORCA/C 2.0.1. Parts 1 through 4 contain the actual source; Parts 5 and 6 contain the output of YACC and LEX so you can compile the source even if you do not have these programs. The contents of Parts 5 and 6 are fully rebuildable from the source. Packed in AAF. Enjoy. ******************************************************************************** =Manifest -FIXES -Manifest -README -README.gno -awk.1 -awk.g.y -awk.h -awk.lx.l -b.c -lex.yy.c -lib.c -main.c -makefile -makefile.gno -maketab.c -parse.c -proctab.c -proto.h -run.c -tran.c -y.tab.c -y.tab.h =README.gno - -AWK - -AWK is a powerful string processing language that is widely used in -the Unix world. There has been a longstanding request for a GNO/ME -version. - -This particular version of AWK is a port of the actual AT&T AWK source -code, which AT&T has graciously made available to the general public. 
-The changes necessary to make AWK function on the 16-bit Apple //GS -will likely be of interest to anyone trying to port AWK to MS-DOS or -some other 16-bit machine. - -While I have done some reasonable testing of AWK, its extensive feature -set makes any sort of exhaustive testing very difficult. While I have -not been able to give it a full-fledged test, I am confident enough -that it works to release it. - -It is certainly conceivable that bugs were introduced by the -porting process; if you find one, please let me know. - -This package requires GNO/ME 2.0 and ORCA/C 2.0.1. - -========= -Compiling -========= -To build this package, you will need GNO/ME 2.0 and ORCA/C 2.0.1. -An ORCA-compatible Makefile is included. - -The AWK executable should be placed in your /usr/bin directory. -The awk.1 manual page should be placed in /usr/man/man1/. - -================================== -Changes To Make AWK Work Under GNO -================================== -There were numerous changes made to this version of AWK to allow it to -be ported to the Apple IIGS. The vast majority of these changes replace -the use of large stack-based arrays and data structures with a call to -malloc() at the beginning of the function and a call to free() at the -end. These changes will likely be of interest to anyone trying to port -AWK to MS-DOS or some other 16-bit machine. - -The changes involved are: - (1) Reduce the size of the data structure. In the case of the array - of possible open file pointers, I reduced the system's MAX_OPEN - from 32768 to 40 by redefining it (in "run.c"). - (2) Allocate all large local structures via malloc() and release - them with free() (in "run.c"). - (3) Set the IIGS OMF load segment names, since AWK is bigger than - 64K and the code must therefore be segmented. - (4) In main.c, signal(SIGFPE,fpecatch) was removed, because the - IIGS floating point libraries don't send signals on floating - point exceptions.
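Change (2) in the list above can be pictured with a minimal sketch. The function name scratch_len and the size RECSIZE are hypothetical and do not come from the AWK source; only the malloc()/free() pattern and the __ORCAC__ guard are taken from this README.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define RECSIZE 3072    /* hypothetical buffer size, for illustration only */

/* Hypothetical helper: copies s into a large scratch buffer and returns
 * the length of the (possibly truncated) copy, or -1 if the buffer
 * could not be allocated. */
long scratch_len(const char *s)
{
    long n;
#ifdef __ORCAC__
    char *buf;                  /* heap copy: keeps the stack frame small */
#else
    char buf[RECSIZE];          /* original form: large stack-based array */
#endif

    assert(s != NULL);
#ifdef __ORCAC__
    buf = malloc(RECSIZE);      /* malloc() at the beginning of the function */
    if (buf == NULL)
        return -1;
#endif

    strncpy(buf, s, RECSIZE - 1);
    buf[RECSIZE - 1] = '\0';
    n = (long)strlen(buf);

#ifdef __ORCAC__
    free(buf);                  /* free() at the end */
#endif
    return n;
}
```

Both branches behave the same from the caller's side; the only difference is where the buffer lives, which is the property a small-stack 16-bit port needs.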
- -All changes are marked with: - #ifdef __ORCAC__ or #ifndef __ORCAC__ - ... - #else - .. - #endif - -The original code still remains in all cases. - -================== -Author Information -================== -Jawaid Bazyar -bazyar@netcom.com -Procyon, Inc. -P.O Box 620334 -Littleton, CO 80162-0334 -303-781-3273 - -Version 1.00 -December 1994 =README -/**************************************************************** -Copyright (C) AT&T 1993 -All Rights Reserved - -Permission to use, copy, modify, and distribute this software and -its documentation for any purpose and without fee is hereby -granted, provided that the above copyright notice appear in all -copies and that both that the copyright notice and this -permission notice and warranty disclaimer appear in supporting -documentation, and that the name of AT&T or any of its entities -not be used in advertising or publicity pertaining to -distribution of the software without specific, written prior -permission. - -AT&T DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, -INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. -IN NO EVENT SHALL AT&T OR ANY OF ITS ENTITIES BE LIABLE FOR ANY -SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES -WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER -IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, -ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF -THIS SOFTWARE. -****************************************************************/ - -This is the version of awk described in "The AWK Programming Language", -by A. V. Aho, B. W. Kernighan, and P. J. Weinberger -(Addison-Wesley, 1988, ISBN 0-201-07981-X). -Changes, mostly bug fixes, are listed in FIXES. -If you distribute this code further, please please please -distribute FIXES with it. If you find errors, please report -them to bwk@research.att.com. Thanks. 
- -The program itself is created by - make -which should produce a longish sequence of -messages roughly like this: - - yacc -d awk.g.y - - conflicts: 43 shift/reduce, 85 reduce/reduce - cc -g -c y.tab.c - rm y.tab.c - mv y.tab.o awk.g.o - cmp -s y.tab.h prevy.tab.h || (cp y.tab.h prevy.tab.h; echo change maketab) - prevy.tab.h: No such file or directory - change maketab - lex awk.lx.l - cc -g -c lex.yy.c - rm lex.yy.c - mv lex.yy.o awk.lx.o - cc -g -c b.c - cc -g -c main.c - cc -g -c parse.c - cc maketab.c -o maketab - ./maketab >proctab.c - cc -g -c proctab.c - cc -g -c tran.c - cc -g -c lib.c - cc -g -c run.c - cc -g awk.g.o awk.lx.o b.o main.o parse.o proctab.o tran.o lib.o run.o -lm - -This produces an executable a.out; you will eventually -want to move this to some place like /usr/bin/awk. - -The -g option (which generates symbol table information -for debuggers) can be disabled by removing the line - CFLAGS = -g -from the makefile. Alternatively, you might replace -it by - CFLAGS = -O -if your compiler does significant optimization. - -NOTE: This version uses ANSI C, as you should also. =FIXES -/**************************************************************** -Copyright (C) AT&T 1993 -All Rights Reserved - -Permission to use, copy, modify, and distribute this software and -its documentation for any purpose and without fee is hereby -granted, provided that the above copyright notice appear in all -copies and that both that the copyright notice and this -permission notice and warranty disclaimer appear in supporting -documentation, and that the name of AT&T or any of its entities -not be used in advertising or publicity pertaining to -distribution of the software without specific, written prior -permission. - -AT&T DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, -INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. 
-IN NO EVENT SHALL AT&T OR ANY OF ITS ENTITIES BE LIABLE FOR ANY -SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES -WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER -IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, -ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF -THIS SOFTWARE. -****************************************************************/ - -Sep 12, 1987: - Very long printf strings caused core dump; - fixed aprintf, asprintf, format to catch them. - Can still get a core dump in printf itself. - -Sep 17, 1987: - Error-message printer had printf(s) instead of - printf("%s",s); got core dumps when the message - included a %. - -Oct xx, 1987: - Reluctantly added toupper and tolower functions. - Subject to rescinding without notice. - -Dec 2, 1987: - Newer C compilers apply a strict scope rule to extern - declarations within functions. Two extern declarations in - lib.c and tran.c have been moved to obviate this problem. - -Mar 25, 1988: - main.c fixed to recognize -- as terminator of command- - line options. Illegal options flagged. - Error reporting slightly cleaned up. - -May 10, 1988: - Fixed lib.c to permit _ in commandline variable names. - -May 22, 1988: - Removed limit on depth of function calls. - -May 28, 1988: - srand returns seed value it's using. - see 1/18/90 - -June 1, 1988: - check error status on close - -July 2, 1988: - performance bug in b.c/cgoto(): not freeing some sets of states. - partial fix only right now, and the number of states increased - to make it less obvious. - -July 2, 1988: - flush stdout before opening file or pipe - -July 24, 1988: - fixed egregious error in toupper/tolower functions. - still subject to rescinding, however. - -Aug 23, 1988: - setting FILENAME in BEGIN caused core dump, apparently - because it was freeing space not allocated by malloc. 
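The Sep 17, 1987 entry above describes a classic format-string mistake. The following is only a sketch of that class of bug, not the code from the AWK source; format_message is a hypothetical name, and snprintf (C99) is used here so the result can be inspected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies msg into out without interpreting any
 * '%' characters that msg contains. Returns what snprintf returns,
 * i.e. the length of msg when it fits in out. */
int format_message(char *out, size_t outsize, const char *msg)
{
    assert(out != NULL && msg != NULL && outsize > 0);
    /* The unsafe form would be snprintf(out, outsize, msg): a '%' in
     * msg would then be read as a conversion specification and could
     * fetch arguments that were never passed. */
    return snprintf(out, outsize, "%s", msg);
}
```

Passing the message as an argument to a fixed "%s" format means the message text is never parsed as a format.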
- -Sep 30, 1988: - Now guarantees to evaluate all arguments of built-in - functions, as in C; the appearance is that arguments - are evaluated before the function is called. Places - affected are sub (gsub was ok), substr, printf, and - all the built-in arithmetic functions in bltin(). - A warning is generated if a bltin() is called with - the wrong number of arguments. - - This requires changing makeprof on p167 of the book. - -Oct 12, 1988: - Fixed bug in call() that freed local arrays twice. - - Fixed to handle deletion of non-existent array right; - complains about attempt to delete non-array element. - -Oct 20, 1988: - Fixed %c: if expr is numeric, use numeric value; - otherwise print 1st char of string value. still - doesn't work if the value is 0 -- won't print \0. - - Added a few more checks for running out of malloc. - -Oct 30, 1988: - Fixed bug in call() that failed to recover storage. - - A warning is now generated if there are more arguments - in the call than in the definition (in lieu of fixing - another storage leak). - -Nov 27, 1988: - With fear and trembling, modified the grammar to permit - multiple pattern-action statements on one line without - an explicit separator. By definition, this capitulation - to the ghost of ancient implementations remains undefined - and thus subject to change without notice or apology. - DO NOT COUNT ON IT. - -Dec 7, 1988: - Added a bit of code to error printing to avoid printing nulls. - (Not clear that it actually would.) - -Dec 17, 1988: - Catches some more commandline errors in main. - Removed redundant decl of modf in run.c (confuses some compilers). - Warning: there's no single declaration of malloc, etc., in awk.h - that seems to satisfy all compilers. - -Jan 9, 1989: - Fixed bug that caused tempcell list to contain a duplicate. - The fix is kludgy. - -Apr 9, 1989: - Changed grammar to prohibit constants as 3rd arg of sub and gsub; - prevents class of overwriting-a-constant errors. (Last one?) 
- This invalidates the "banana" example on page 43 of the book. - - Added \a ("alert"), \v (vertical tab), \xhhh (hexadecimal), - as in ANSI, for strings. Rescinded the sloppiness that permitted - non-octal digits in \ooo. Warning: not all compilers and libraries - will be able to deal with \x correctly. - -Apr 26, 1989: - Debugging output now includes a version date, - if one compiles it into the source each time. - -Apr 27, 1989: - Line number now accumulated correctly for comment lines. - -Jun 4, 1989: - ENVIRON array contains environment: if shell variable V=thing, - ENVIRON["V"] is "thing" - - multiple -f arguments permitted. error reporting is naive. - (they were permitted before, but only the last was used.) - - fixed a really stupid botch in the debugging macro dprintf - - fixed order of evaluation of commandline assignments to match - what the book claims: an argument of the form x=e is evaluated - at the time it would have been opened if it were a filename (p 63). - this invalidates the suggested answer to ex 4-1 (p 195). - - removed some code that permitted -F (space) fieldseparator, - since it didn't quite work right anyway. (restored aug 2) - -Jun 14, 1989: - added some missing ansi printf conversion letters: %i %X %E %G. - no sensible meaning for h or L, so they may not do what one expects. - - made %* conversions work. - - changed x^y so that if n is a positive integer, it's done - by explicit multiplication, thus achieving maximum accuracy. - (this should be done by pow() but it seems not to be locally.) - done to x ^= y as well. - -Jun 23, 1989: - add newline to usage message. - -Jul 10, 1989: - fixed ref-thru-zero bug in environment code in tran.c - -Jul 30, 1989: - added -v x=1 y=2 ... for immediate commandline variable assignment; - done before the BEGIN block for sure. they have to precede the - program if the program is on the commandline. - Modified Aug 2 to require a separate -v for each assignment. 
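The Jun 14, 1989 entry above says that x^y is computed by explicit multiplication when the exponent is a positive integer. A minimal sketch of one way to do that (square-and-multiply) follows; the name ipow_sketch is hypothetical and this is not claimed to be the routine in run.c.

```c
#include <assert.h>

/* Hypothetical sketch: raises x to the positive integer power n using
 * multiplications only, with no call to pow(). */
double ipow_sketch(double x, unsigned n)
{
    double result = 1.0;

    assert(n > 0);
    while (n > 0) {
        if (n & 1)          /* low bit set: fold the current power in */
            result *= x;
        n >>= 1;
        if (n > 0)          /* square only if more bits remain */
            x *= x;
    }
    return result;
}
```

A caller would use this path only after checking that the exponent is a positive whole number, and fall back to pow() otherwise.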
- -Aug 2, 1989: - restored -F (space) separator - -Aug 11, 1989: - fixed bug: commandline variable assignment has to look like - var=something. (consider the man page for =, in file =.1) - - changed number of arguments to functions to static arrays - to avoid repeated malloc calls. - -Aug 24, 1989: - removed redundant relational tests against nullnode if parse - tree already had a relational at that point. - -Oct 11, 1989: - FILENAME is now defined in the BEGIN block -- too many old - programs broke. - - "-" means stdin in getline as well as on the commandline. - - added a bunch of casts to the code to tell the truth about - char * vs. unsigned char *, a right royal pain. added a - setlocale call to the front of main, though probably no one - has it usefully implemented yet. - -Oct 18, 1989: - another try to get the max number of open files set with - relatively machine-independent code. - - small fix to input() in case of multiple reads after EOF. - -Jan 5, 1990: - fix potential problem in tran.c -- something was freed, - then used in freesymtab. - -Jan 18, 1990: - srand now returns previous seed value (0 to start). - -Feb 9, 1990: - fixed null pointer dereference bug in main.c: -F[nothing]. sigh. - - restored srand behavior: it returns the current seed. - -May 6, 1990: - AVA fixed the grammar so that ! is uniformly of the same precedence as - unary + and -. This renders illegal some constructs like !x=y, which - now has to be parenthesized as !(x=y), and makes others work properly: - !x+y is (!x)+y, and x!y is x !y, not two pattern-action statements. - (These problems were pointed out by Bob Lenk of Posix.) - - Added \x to regular expressions (already in strings). - Limited octal to octal digits; \8 and \9 are not octal. - Centralized the code for parsing escapes in regular expressions. - Added a bunch of tests to T.re and T.sub to verify some of this. 
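The May 6, 1990 entry above limits octal escapes to octal digits. As a rough illustration only, assuming the usual C rule of at most three digits, such a scanner might look like this; parse_octal is a hypothetical name, not a function from the AWK source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: reads up to three octal digits starting at *pp,
 * advances *pp past them, and returns their value. Returns -1 and
 * leaves *pp unchanged if the first character is not an octal digit,
 * so \8 and \9 are never treated as octal. */
int parse_octal(const char **pp)
{
    const char *p;
    int n = 0, count = 0;

    assert(pp != NULL && *pp != NULL);
    p = *pp;
    while (count < 3 && *p >= '0' && *p <= '7') {
        n = 8 * n + (*p - '0');
        p++;
        count++;
    }
    if (count == 0)
        return -1;
    *pp = p;
    return n;
}
```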
- -Jun 26, 1990: - changed struct rrow (awk.h) to use long instead of int for lval, - since cfoll() stores a pointer in it. now works better when int's - are smaller than pointers! - -Aug 24, 1990: - changed NCHARS to 256 to handle 8-bit characters in strings - presented to match(), etc. - -Oct 8, 1990: - fixed horrible bug: types and values were not preserved in - some kinds of self-assignment. (in assign().) - -Oct 14, 1990: - fixed the bug on p. 198 in which it couldn't deduce that an - argument was an array in some contexts. replaced the error - message in intest() by code that damn well makes it an array. - -Oct 29, 1990: - fixed sleazy buggy code in lib.c that looked (incorrectly) for - too long input lines. - -Nov 2, 1990: - fixed sleazy test for integrality in getsval; use modf. - -Jan 11, 1991: - failed to set numeric state on $0 in cmd|getline context in run.c. - -Jan 28, 1991: - awk -f - reads the program from stdin. - -Feb 10, 1991: - check error status on all writes, to avoid banging on full disks. - -May 6, 1991: - fixed silly bug in hex parsing in hexstr(). - removed an apparently unnecessary test in isnumber(). - warn about weird printf conversions. - fixed unchecked array overwrite in relex(). - - changed for (i in array) to access elements in sorted order. - then unchanged it -- it really does run slower in too many cases. - left the code in place, commented out. - -May 13, 1991: - removed extra arg on gettemp, tempfree. minor error message rewording. - -Jun 2, 1991: - better defense against very long printf strings. - made break and continue illegal outside of loops. - -Jun 30, 1991: - better test for detecting too-long output record. - -Jul 21, 1991: - fixed so that in self-assignment like $1=$1, side effects - like recomputing $0 take place. (this is getting subtle.) - -Jul 27, 1991: - allow newline after ; in for statements. - -Aug 18, 1991: - enforce variable name syntax for commandline variables: has to - start with letter or _. 
- -Sep 24, 1991: - increased buffer in gsub. a very crude fix to a general problem. - and again on Sep 26. - -Nov 12, 1991: - cranked up some fixed-size arrays in b.c, and added a test for - overflow in penter. thanks to mark larsen. - -Nov 19, 1991: - use RAND_MAX instead of literal in builtin(). - -Nov 30, 1991: - fixed storage leak in freefa, failing to recover [N]CCL. - thanks to Bill Jones (jones@skorpio.usask.ca) - -Dec 2, 1991: - die-casting time: converted to ansi C, installed that. - -Feb 20, 1992: - recompile after abortive changes; should be unchanged. - -Apr 12, 1992: - added explicit check for /dev/std(in,out,err) in redirection. - unlike gawk, no /dev/fd/n yet. - - added fflush(file/pipe) builtin. hard to test satisfactorily. - not posix. - -Apr 24, 1992: - remove redundant close of stdin when using -f -. - - got rid of core dump with -d; awk -d just prints date. - -May 31, 1992: - added -mr N and -mf N options: more record and fields. - these really ought to adjust automatically. - - cleaned up some error messages; "out of space" now means - malloc returned NULL in all cases. - - changed rehash so that if it runs out, it just returns; - things will continue to run slow, but maybe a bit longer. - -Nov 28, 1992: - deleted yyunput and yyoutput from proto.h; - different versions of lex give these different declarations. - -Jul 23, 1993: - cosmetic changes: increased sizes of some arrays, - reworded some error messages. - - added CONVFMT as in posix (just replaced OFMT in getsval) - - FILENAME is now "" until the first thing that causes a file - to be opened. 
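The Apr 12, 1992 entry above mentions an explicit check for /dev/std(in,out,err) in redirection. One plausible shape for that check is sketched here; std_stream is a hypothetical name and the code is not taken from run.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: maps the three special names to the matching
 * standard stream. Returns NULL for any other name, in which case the
 * caller would open the file in the normal way. */
FILE *std_stream(const char *name)
{
    assert(name != NULL);
    if (strcmp(name, "/dev/stdin") == 0)
        return stdin;
    if (strcmp(name, "/dev/stdout") == 0)
        return stdout;
    if (strcmp(name, "/dev/stderr") == 0)
        return stderr;
    return NULL;    /* includes /dev/fd/n, which the entry says is unsupported */
}
```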
=awk.g.y -/**************************************************************** -Copyright (C) AT&T 1993 -All Rights Reserved - -Permission to use, copy, modify, and distribute this software and -its documentation for any purpose and without fee is hereby -granted, provided that the above copyright notice appear in all -copies and that both that the copyright notice and this -permission notice and warranty disclaimer appear in supporting -documentation, and that the name of AT&T or any of its entities -not be used in advertising or publicity pertaining to -distribution of the software without specific, written prior -permission. - -AT&T DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, -INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. -IN NO EVENT SHALL AT&T OR ANY OF ITS ENTITIES BE LIABLE FOR ANY -SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES -WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER -IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, -ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF -THIS SOFTWARE. -****************************************************************/ - -%{ -#include -#include "awk.h" -yywrap(void) { return(1); } - -Node *beginloc = 0; -Node *endloc = 0; -int infunc = 0; /* = 1 if in arglist or body of func */ -int inloop = 0; /* = 1 if in while, for, do */ -uchar *curfname = 0; /* current function name */ -Node *arglist = 0; /* list of args for current function */ -%} - -%union { - Node *p; - Cell *cp; - int i; - uchar *s; -} - -%token FIRSTTOKEN /* must be first */ -%token

<p> PROGRAM PASTAT PASTAT2 XBEGIN XEND -%token NL ',' '{' '(' '|' ';' '/' ')' '}' '[' ']' -%token ARRAY -%token MATCH NOTMATCH MATCHOP -%token FINAL DOT ALL CCL NCCL CHAR OR STAR QUEST PLUS -%token AND BOR APPEND EQ GE GT LE LT NE IN -%token ARG BLTIN BREAK CLOSE CONTINUE DELETE DO EXIT FOR FUNC -%token SUB GSUB IF INDEX LSUBSTR MATCHFCN NEXT -%token ADD MINUS MULT DIVIDE MOD -%token ASSIGN ASGNOP ADDEQ SUBEQ MULTEQ DIVEQ MODEQ POWEQ -%token PRINT PRINTF SPRINTF -%token <p>
ELSE INTEST CONDEXPR -%token POSTINCR PREINCR POSTDECR PREDECR -%token VAR IVAR VARNF CALL NUMBER STRING FIELD -%token REGEXPR - -%type <p>
pas pattern ppattern plist pplist patlist prarg term re -%type <p>
pa_pat pa_stat pa_stats -%type reg_expr -%type <p>
simple_stmt opt_simple_stmt stmt stmtlist -%type <p>
var varname funcname varlist -%type <p>
for if while -%type pst opt_pst lbrace rparen comma nl opt_nl and bor -%type subop print - -%right ASGNOP -%right '?' -%right ':' -%left BOR -%left AND -%left GETLINE -%nonassoc APPEND EQ GE GT LE LT NE MATCHOP IN '|' -%left ARG BLTIN BREAK CALL CLOSE CONTINUE DELETE DO EXIT FOR FIELD FUNC -%left GSUB IF INDEX LSUBSTR MATCHFCN NEXT NUMBER -%left PRINT PRINTF RETURN SPLIT SPRINTF STRING SUB SUBSTR -%left REGEXPR VAR VARNF IVAR WHILE '(' -%left CAT -%left '+' '-' -%left '*' '/' '%' -%left NOT UMINUS -%right POWER -%right DECR INCR -%left INDIRECT -%token LASTTOKEN /* must be last */ - -%% - -program: - pas { if (errorflag==0) - winner = (Node *)stat3(PROGRAM, beginloc, $1, endloc); } - | error { yyclearin; bracecheck(); ERROR "bailing out" SYNTAX; } - ; - -and: - AND | and NL - ; - -bor: - BOR | bor NL - ; - -comma: - ',' | comma NL - ; - -do: - DO | do NL - ; - -else: - ELSE | else NL - ; - -for: - FOR '(' opt_simple_stmt ';' opt_nl pattern ';' opt_nl opt_simple_stmt rparen {inloop++;} stmt - { --inloop; $$ = stat4(FOR, $3, notnull($6), $9, $12); } - | FOR '(' opt_simple_stmt ';' ';' opt_nl opt_simple_stmt rparen {inloop++;} stmt - { --inloop; $$ = stat4(FOR, $3, NIL, $7, $10); } - | FOR '(' varname IN varname rparen {inloop++;} stmt - { --inloop; $$ = stat3(IN, $3, makearr($5), $8); } - ; - -funcname: - VAR { setfname($1); } - | CALL { setfname($1); } - ; - -if: - IF '(' pattern rparen { $$ = notnull($3); } - ; - -lbrace: - '{' | lbrace NL - ; - -nl: - NL | nl NL - ; - -opt_nl: - /* empty */ { $$ = 0; } - | nl - ; - -opt_pst: - /* empty */ { $$ = 0; } - | pst - ; - - -opt_simple_stmt: - /* empty */ { $$ = 0; } - | simple_stmt - ; - -pas: - opt_pst { $$ = 0; } - | opt_pst pa_stats opt_pst { $$ = $2; } - ; - -pa_pat: - pattern { $$ = notnull($1); } - ; - -pa_stat: - pa_pat { $$ = stat2(PASTAT, $1, stat2(PRINT, rectonode(), NIL)); } - | pa_pat lbrace stmtlist '}' { $$ = stat2(PASTAT, $1, $3); } - | pa_pat ',' pa_pat { $$ = pa2stat($1, $3, stat2(PRINT, rectonode(), 
NIL)); } - | pa_pat ',' pa_pat lbrace stmtlist '}' { $$ = pa2stat($1, $3, $5); } - | lbrace stmtlist '}' { $$ = stat2(PASTAT, NIL, $2); } - | XBEGIN lbrace stmtlist '}' - { beginloc = linkum(beginloc, $3); $$ = 0; } - | XEND lbrace stmtlist '}' - { endloc = linkum(endloc, $3); $$ = 0; } - | FUNC funcname '(' varlist rparen {infunc++;} lbrace stmtlist '}' - { infunc--; curfname=0; defn((Cell *)$2, $4, $8); $$ = 0; } - ; - -pa_stats: - pa_stat - | pa_stats opt_pst pa_stat { $$ = linkum($1, $3); } - ; - -patlist: - pattern - | patlist comma pattern { $$ = linkum($1, $3); } - ; - -ppattern: - var ASGNOP ppattern { $$ = op2($2, $1, $3); } - | ppattern '?' ppattern ':' ppattern %prec '?' - { $$ = op3(CONDEXPR, notnull($1), $3, $5); } - | ppattern bor ppattern %prec BOR - { $$ = op2(BOR, notnull($1), notnull($3)); } - | ppattern and ppattern %prec AND - { $$ = op2(AND, notnull($1), notnull($3)); } - | ppattern MATCHOP reg_expr { $$ = op3($2, NIL, $1, (Node*)makedfa($3, 0)); } - | ppattern MATCHOP ppattern - { if (constnode($3)) - $$ = op3($2, NIL, $1, (Node*)makedfa(strnode($3), 0)); - else - $$ = op3($2, (Node *)1, $1, $3); } - | ppattern IN varname { $$ = op2(INTEST, $1, makearr($3)); } - | '(' plist ')' IN varname { $$ = op2(INTEST, $2, makearr($5)); } - | ppattern term %prec CAT { $$ = op2(CAT, $1, $2); } - | re - | term - ; - -pattern: - var ASGNOP pattern { $$ = op2($2, $1, $3); } - | pattern '?' pattern ':' pattern %prec '?' 
- { $$ = op3(CONDEXPR, notnull($1), $3, $5); } - | pattern bor pattern %prec BOR - { $$ = op2(BOR, notnull($1), notnull($3)); } - | pattern and pattern %prec AND - { $$ = op2(AND, notnull($1), notnull($3)); } - | pattern EQ pattern { $$ = op2($2, $1, $3); } - | pattern GE pattern { $$ = op2($2, $1, $3); } - | pattern GT pattern { $$ = op2($2, $1, $3); } - | pattern LE pattern { $$ = op2($2, $1, $3); } - | pattern LT pattern { $$ = op2($2, $1, $3); } - | pattern NE pattern { $$ = op2($2, $1, $3); } - | pattern MATCHOP reg_expr { $$ = op3($2, NIL, $1, (Node*)makedfa($3, 0)); } - | pattern MATCHOP pattern - { if (constnode($3)) - $$ = op3($2, NIL, $1, (Node*)makedfa(strnode($3), 0)); - else - $$ = op3($2, (Node *)1, $1, $3); } - | pattern IN varname { $$ = op2(INTEST, $1, makearr($3)); } - | '(' plist ')' IN varname { $$ = op2(INTEST, $2, makearr($5)); } - | pattern '|' GETLINE var { $$ = op3(GETLINE, $4, (Node*)$2, $1); } - | pattern '|' GETLINE { $$ = op3(GETLINE, (Node*)0, (Node*)$2, $1); } - | pattern term %prec CAT { $$ = op2(CAT, $1, $2); } - | re - | term - ; - -plist: - pattern comma pattern { $$ = linkum($1, $3); } - | plist comma pattern { $$ = linkum($1, $3); } - ; - -pplist: - ppattern - | pplist comma ppattern { $$ = linkum($1, $3); } - ; - -prarg: - /* empty */ { $$ = rectonode(); } - | pplist - | '(' plist ')' { $$ = $2; } - ; - -print: - PRINT | PRINTF - ; - -pst: - NL | ';' | pst NL | pst ';' - ; - -rbrace: - '}' | rbrace NL - ; - -re: - reg_expr - { $$ = op3(MATCH, NIL, rectonode(), (Node*)makedfa($1, 0)); } - | NOT re { $$ = op1(NOT, notnull($2)); } - ; - -reg_expr: - '/' {startreg();} REGEXPR '/' { $$ = $3; } - ; - -rparen: - ')' | rparen NL - ; - -simple_stmt: - print prarg '|' term { $$ = stat3($1, $2, (Node *) $3, $4); } - | print prarg APPEND term { $$ = stat3($1, $2, (Node *) $3, $4); } - | print prarg GT term { $$ = stat3($1, $2, (Node *) $3, $4); } - | print prarg { $$ = stat3($1, $2, NIL, NIL); } - | DELETE varname '[' patlist ']' { $$ = 
stat2(DELETE, makearr($2), $4); } - | DELETE varname { yyclearin; ERROR "you can only delete array[element]" SYNTAX; $$ = stat1(DELETE, $2); } - | pattern { $$ = exptostat($1); } - | error { yyclearin; ERROR "illegal statement" SYNTAX; } - ; - -st: - nl | ';' opt_nl - ; - -stmt: - BREAK st { if (!inloop) ERROR "break illegal outside of loops" SYNTAX; - $$ = stat1(BREAK, NIL); } - | CLOSE pattern st { $$ = stat1(CLOSE, $2); } - | CONTINUE st { if (!inloop) ERROR "continue illegal outside of loops" SYNTAX; - $$ = stat1(CONTINUE, NIL); } - | do {inloop++;} stmt {--inloop;} WHILE '(' pattern ')' st - { $$ = stat2(DO, $3, notnull($7)); } - | EXIT pattern st { $$ = stat1(EXIT, $2); } - | EXIT st { $$ = stat1(EXIT, NIL); } - | for - | if stmt else stmt { $$ = stat3(IF, $1, $2, $4); } - | if stmt { $$ = stat3(IF, $1, $2, NIL); } - | lbrace stmtlist rbrace { $$ = $2; } - | NEXT st { if (infunc) - ERROR "next is illegal inside a function" SYNTAX; - $$ = stat1(NEXT, NIL); } - | RETURN pattern st { $$ = stat1(RETURN, $2); } - | RETURN st { $$ = stat1(RETURN, NIL); } - | simple_stmt st - | while {inloop++;} stmt { --inloop; $$ = stat2(WHILE, $1, $3); } - | ';' opt_nl { $$ = 0; } - ; - -stmtlist: - stmt - | stmtlist stmt { $$ = linkum($1, $2); } - ; - -subop: - SUB | GSUB - ; - -term: - term '+' term { $$ = op2(ADD, $1, $3); } - | term '-' term { $$ = op2(MINUS, $1, $3); } - | term '*' term { $$ = op2(MULT, $1, $3); } - | term '/' term { $$ = op2(DIVIDE, $1, $3); } - | term '%' term { $$ = op2(MOD, $1, $3); } - | term POWER term { $$ = op2(POWER, $1, $3); } - | '-' term %prec UMINUS { $$ = op1(UMINUS, $2); } - | '+' term %prec UMINUS { $$ = $2; } - | NOT term %prec UMINUS { $$ = op1(NOT, notnull($2)); } - | BLTIN '(' ')' { $$ = op2(BLTIN, (Node *) $1, rectonode()); } - | BLTIN '(' patlist ')' { $$ = op2(BLTIN, (Node *) $1, $3); } - | BLTIN { $$ = op2(BLTIN, (Node *) $1, rectonode()); } - | CALL '(' ')' { $$ = op2(CALL, valtonode($1,CVAR), NIL); } - | CALL '(' patlist ')' { $$ = 
op2(CALL, valtonode($1,CVAR), $3); } - | DECR var { $$ = op1(PREDECR, $2); } - | INCR var { $$ = op1(PREINCR, $2); } - | var DECR { $$ = op1(POSTDECR, $1); } - | var INCR { $$ = op1(POSTINCR, $1); } - | GETLINE var LT term { $$ = op3(GETLINE, $2, (Node *)$3, $4); } - | GETLINE LT term { $$ = op3(GETLINE, NIL, (Node *)$2, $3); } - | GETLINE var { $$ = op3(GETLINE, $2, NIL, NIL); } - | GETLINE { $$ = op3(GETLINE, NIL, NIL, NIL); } - | INDEX '(' pattern comma pattern ')' - { $$ = op2(INDEX, $3, $5); } - | INDEX '(' pattern comma reg_expr ')' - { ERROR "index() doesn't permit regular expressions" SYNTAX; - $$ = op2(INDEX, $3, (Node*)$5); } - | '(' pattern ')' { $$ = $2; } - | MATCHFCN '(' pattern comma reg_expr ')' - { $$ = op3(MATCHFCN, NIL, $3, (Node*)makedfa($5, 1)); } - | MATCHFCN '(' pattern comma pattern ')' - { if (constnode($5)) - $$ = op3(MATCHFCN, NIL, $3, (Node*)makedfa(strnode($5), 1)); - else - $$ = op3(MATCHFCN, (Node *)1, $3, $5); } - | NUMBER { $$ = valtonode($1, CCON); } - | SPLIT '(' pattern comma varname comma pattern ')' /* string */ - { $$ = op4(SPLIT, $3, makearr($5), $7, (Node*)STRING); } - | SPLIT '(' pattern comma varname comma reg_expr ')' /* const /regexp/ */ - { $$ = op4(SPLIT, $3, makearr($5), (Node*)makedfa($7, 1), (Node *)REGEXPR); } - | SPLIT '(' pattern comma varname ')' - { $$ = op4(SPLIT, $3, makearr($5), NIL, (Node*)STRING); } /* default */ - | SPRINTF '(' patlist ')' { $$ = op1($1, $3); } - | STRING { $$ = valtonode($1, CCON); } - | subop '(' reg_expr comma pattern ')' - { $$ = op4($1, NIL, (Node*)makedfa($3, 1), $5, rectonode()); } - | subop '(' pattern comma pattern ')' - { if (constnode($3)) - $$ = op4($1, NIL, (Node*)makedfa(strnode($3), 1), $5, rectonode()); - else - $$ = op4($1, (Node *)1, $3, $5, rectonode()); } - | subop '(' reg_expr comma pattern comma var ')' - { $$ = op4($1, NIL, (Node*)makedfa($3, 1), $5, $7); } - | subop '(' pattern comma pattern comma var ')' - { if (constnode($3)) - $$ = op4($1, NIL, 
(Node*)makedfa(strnode($3), 1), $5, $7); - else - $$ = op4($1, (Node *)1, $3, $5, $7); } - | SUBSTR '(' pattern comma pattern comma pattern ')' - { $$ = op3(SUBSTR, $3, $5, $7); } - | SUBSTR '(' pattern comma pattern ')' - { $$ = op3(SUBSTR, $3, $5, NIL); } - | var - ; - -var: - varname - | varname '[' patlist ']' { $$ = op2(ARRAY, makearr($1), $3); } - | FIELD { $$ = valtonode($1, CFLD); } - | IVAR { $$ = op1(INDIRECT, valtonode($1, CVAR)); } - | INDIRECT term { $$ = op1(INDIRECT, $2); } - ; - -varlist: - /* nothing */ { arglist = $$ = 0; } - | VAR { arglist = $$ = valtonode($1,CVAR); } - | varlist comma VAR { arglist = $$ = linkum($1,valtonode($3,CVAR)); } - ; - -varname: - VAR { $$ = valtonode($1, CVAR); } - | ARG { $$ = op1(ARG, (Node *) $1); } - | VARNF { $$ = op1(VARNF, (Node *) $1); } - ; - - -while: - WHILE '(' pattern rparen { $$ = notnull($3); } - ; - -%% - -void setfname(Cell *p) -{ - if (isarr(p)) - ERROR "%s is an array, not a function", p->nval SYNTAX; - else if (isfunc(p)) - ERROR "you can't define function %s more than once", p->nval SYNTAX; - curfname = p->nval; -} - -constnode(Node *p) -{ - return isvalue(p) && ((Cell *) (p->narg[0]))->csub == CCON; -} - -uchar *strnode(Node *p) -{ - return ((Cell *)(p->narg[0]))->sval; -} - -Node *notnull(Node *n) -{ - switch (n->nobj) { - case LE: case LT: case EQ: case NE: case GT: case GE: - case BOR: case AND: case NOT: - return n; - default: - return op2(NE, n, nullnode); - } -} =awk.1 -.TH AWK 1 -.CT 1 files prog_other -.SH NAME -awk \- pattern-directed scanning and processing language -.SH SYNOPSIS -.B awk -[ -.BI -F -.I fs -] -[ -.BI -v -.I var=value -] -[ -.I 'prog' -| -.BI -f -.I progfile -] -[ -.I file ... -] -.SH DESCRIPTION -.I Awk -scans each input -.I file -for lines that match any of a set of patterns specified literally in -.IR prog -or in one or more files -specified as -.B -f -.IR progfile . 
-With each pattern -there can be an associated action that will be performed -when a line of a -.I file -matches the pattern. -Each line is matched against the -pattern portion of every pattern-action statement; -the associated action is performed for each matched pattern. -The file name -.L - -means the standard input. -Any -.IR file -of the form -.I var=value -is treated as an assignment, not a filename, -and is executed at the time it would have been opened if it were a filename. -The option -.B -v -followed by -.I var=value -is an assignment to be done before -.I prog -is executed; -any number of -.B -v -options may be present. -The -.B -F -.IR fs -option defines the input field separator to be the regular expression -.IR fs. -.PP -An input line is normally made up of fields separated by white space. -(This default can be changed by using the FS built-in variable or the -.B -F -.IR fs -option.) -The fields are denoted -.BR $1 , -.BR $2 , -\&..., while -.B $0 -refers to the entire line. -.PP -A pattern-action statement has the form -.IP -.IB pattern " { " action " } -.PP -A missing -.BI { " action " } -means print the line; -a missing pattern always matches. -Pattern-action statements are separated by newlines or semicolons. -.PP -An action is a sequence of statements. -A statement can be one of the following: -.PP -.EX -.ta \w'\f(CWdelete array[expression]'u -.RS -.nf -if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP -while(\fI expression \fP)\fI statement\fP -for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP -for(\fI var \fPin\fI array \fP)\fI statement\fP -do\fI statement \fPwhile(\fI expression \fP) -break -continue -{\fR [\fP\fI statement ... 
\fP\fR] \fP} -\fIexpression\fP #\fR commonly\fP\fI var = expression\fP -print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP -printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP -return\fR [ \fP\fIexpression \fP\fR]\fP -next #\fR skip remaining patterns on this input line\fP -delete\fI array\fP[\fI expression \fP] #\fR delete an array element\fP -exit\fR [ \fP\fIexpression \fP\fR]\fP #\fR exit immediately; status is \fP\fIexpression\fP -.fi -.RE -.EE -.DT -.PP -Statements are terminated by -semicolons, newlines or right braces. -An empty -.I expression-list -stands for -.BR $0 . -String constants are quoted \&\f(CW"\ "\fR, -with the usual C escapes recognized within. -Expressions take on string or numeric values as appropriate, -and are built using the operators -.B + - * / % ^ -(exponentiation), and concatenation (indicated by a blank). -The operators -.B -! ++ -- += -= *= /= %= ^= > >= < <= == != ?: -are also available in expressions. -Variables may be scalars, array elements -(denoted -.IB x [ i ] ) -or fields. -Variables are initialized to the null string. -Array subscripts may be any string, -not necessarily numeric; -this allows for a form of associative memory. -Multiple subscripts such as -.B [i,j,k] -are permitted; the constituents are concatenated, -separated by the value of -.BR SUBSEP . -.PP -The -.B print -statement prints its arguments on the standard output -(or on a file if -.BI > file -or -.BI >> file -is present or on a pipe if -.BI | cmd -is present), separated by the current output field separator, -and terminated by the output record separator. -.I file -and -.I cmd -may be literal names or parenthesized expressions; -identical string values in different statements denote -the same open file. -The -.B printf -statement formats its expression list according to the format -(see -.IR printf (3)) . -The built-in function -.BI close( expr ) -closes the file or pipe -.IR expr . 
-.PP -The mathematical functions -.BR exp , -.BR log , -.BR sqrt , -.BR sin , -.BR cos , -and -.BR atan2 -are built in. -Other built-in functions: -.TF length -.TP -.B length -the length of its argument -taken as a string, -or of -.B $0 -if no argument. -.TP -.B rand -random number on (0,1) -.TP -.B srand -sets seed for -.B rand -and returns the previous seed. -.TP -.B int -truncates to an integer value -.TP -.BI substr( s , " m" , " n\fB) -the -.IR n -character -substring of -.I s -that begins at position -.IR m -counted from 1. -.TP -.BI index( s , " t" ) -the position in -.I s -where the string -.I t -occurs, or 0 if it does not. -.TP -.BI match( s , " r" ) -the position in -.I s -where the regular expression -.I r -occurs, or 0 if it does not. -The variables -.B RSTART -and -.B RLENGTH -are set to the position and length of the matched string. -.TP -.BI split( s , " a" , " fs\fB) -splits the string -.I s -into array elements -.IB a [1] , -.IB a [2] , -\&..., -.IB a [ n ] , -and returns -.IR n . -The separation is done with the regular expression -.I fs -or with the field separator -.B FS -if -.I fs -is not given. -.TP -.BI sub( r , " t" , " s\fB) -substitutes -.I t -for the first occurrence of the regular expression -.I r -in the string -.IR s . -If -.I s -is not given, -.B $0 -is used. -.TP -.B gsub -same as -.B sub -except that all occurrences of the regular expression -are replaced; -.B sub -and -.B gsub -return the number of replacements. -.TP -.BI sprintf( fmt , " expr" , " ...\fB ) -the string resulting from formatting -.I expr ... -according to the -.IR printf (3) -format -.I fmt -.TP -.BI system( cmd ) -executes -.I cmd -and returns its exit status -.PD -.PP -The ``function'' -.B getline -sets -.B $0 -to the next input record from the current input file; -.B getline -.BI < file -sets -.B $0 -to the next record from -.IR file . -.B getline -.I x -sets variable -.I x -instead.
-Finally, -.IB cmd " | getline -pipes the output of -.I cmd -into -.BR getline ; -each call of -.B getline -returns the next line of output from -.IR cmd . -In all cases, -.B getline -returns 1 for a successful input, -0 for end of file, and \-1 for an error. -.PP -Patterns are arbitrary Boolean combinations -(with -.BR "! || &&" ) -of regular expressions and -relational expressions. -Regular expressions are as in -.IR egrep ; -see -.IR grep (1). -Isolated regular expressions -in a pattern apply to the entire line. -Regular expressions may also occur in -relational expressions, using the operators -.BR ~ -and -.BR !~ . -.BI / re / -is a constant regular expression; -any string (constant or variable) may be used -as a regular expression, except in the position of an isolated regular expression -in a pattern. -.PP -A pattern may consist of two patterns separated by a comma; -in this case, the action is performed for all lines -from an occurrence of the first pattern -through an occurrence of the second. -.PP -A relational expression is one of the following: -.IP -.I expression matchop regular-expression -.br -.I expression relop expression -.br -.IB expression " in " array-name -.br -.BI ( expr , expr,... ") in " array-name -.PP -where a relop is any of the six relational operators in C, -and a matchop is either -.B ~ -(matches) -or -.B !~ -(does not match). -A conditional is an arithmetic expression, -a relational expression, -or a Boolean combination -of these. -.PP -The special patterns -.B BEGIN -and -.B END -may be used to capture control before the first input line is read -and after the last. -.B BEGIN -and -.B END -do not combine with other patterns. -.PP -Variable names with special meanings: -.TF FILENAME -.TP -.B FS -regular expression used to separate fields; also settable -by option -.BI -F fs.
-.TP -.BR NF -number of fields in the current record -.TP -.B NR -ordinal number of the current record -.TP -.B FNR -ordinal number of the current record in the current file -.TP -.B FILENAME -the name of the current input file -.TP -.B RS -input record separator (default newline) -.TP -.B OFS -output field separator (default blank) -.TP -.B ORS -output record separator (default newline) -.TP -.B OFMT -output format for numbers (default -.BR "%.6g" ) -.TP -.B SUBSEP -separates multiple subscripts (default 034) -.TP -.B ARGC -argument count, assignable -.TP -.B ARGV -argument array, assignable; -non-null members are taken as filenames -.TP -.B ENVIRON -array of environment variables; subscripts are names. -.PD -.PP -Functions may be defined (at the position of a pattern-action statement) thus: -.IP -.L -function foo(a, b, c) { ...; return x } -.PP -Parameters are passed by value if scalar and by reference if array name; -functions may be called recursively. -Parameters are local to the function; all other variables are global. -Thus local variables may be created by providing excess parameters in -the function definition. -.SH EXAMPLES -.TP -.L -length > 72 -Print lines longer than 72 characters. -.TP -.L -{ print $2, $1 } -Print first two fields in opposite order. -.PP -.EX -BEGIN { FS = ",[ \et]*|[ \et]+" } - { print $2, $1 } -.EE -.ns -.IP -Same, with input fields separated by comma and/or blanks and tabs. -.PP -.EX -.nf - { s += $1 } -END { print "sum is", s, " average is", s/NR } -.fi -.EE -.ns -.IP -Add up first column, print sum and average. -.TP -.L -/start/, /stop/ -Print all lines between start/stop pairs. -.PP -.EX -.nf -BEGIN { # Simulate echo(1) - for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i] - printf "\en" - exit } -.fi -.EE -.SH SEE ALSO -.IR lex (1), -.IR sed (1) -.br -A. V. Aho, B. W. Kernighan, P. J. Weinberger, -.I -The AWK Programming Language, -Addison-Wesley, 1988. -.SH BUGS -There are no explicit conversions between numbers and strings. 
-To force an expression to be treated as a number add 0 to it; -to force it to be treated as a string concatenate -\&\f(CW""\fP to it. -.br -The scope rules for variables in functions are a botch; -the syntax is worse. - + END OF ARCHIVE