1 Tue Apr 25 10:51:27 1995 Sigbjorn Finne <sof@dcs.gla.ac.uk>
3 * Merged in the regex.c and regex.h of gawk-2.15.6, following a
4 suggestion on gnu.utils.bugs
6 * regex.h: Added defines for Perl syntax, RE_PERL_MULTILINE_SYNTAX
7 and RE_PERL_SINGLELINE_SYNTAX
9 * regex.c (regex_compile): Added handling of Perl operators,
10 nothing exciting - just different syntax for common operators.
12 Fri Apr 2 17:31:59 1993 Jim Blandy (jimb@totoro.cs.oberlin.edu)
14 * Released version 0.12.
16 * regex.c (regerror): If errcode is zero, that's not a valid
17 error code, according to POSIX, but return "Success."
19 * regex.c (regerror): Remember to actually fetch the message
22 * regex.c (regex_compile): Don't use the trick for ".*\n" on
23 ".+\n". Since the latter involves laying an extra choice
24 point, the backward jump isn't adjusted properly.
26 Thu Mar 25 21:35:18 1993 Jim Blandy (jimb@totoro.cs.oberlin.edu)
28 * regex.c (regex_compile): In the handle_open and handle_close
29 sections, clear pending_exact to zero.
31 Tue Mar 9 12:03:07 1993 Jim Blandy (jimb@wookumz.gnu.ai.mit.edu)
33 * regex.c (re_search_2): In the loop which searches forward
34 using fastmap, don't forget to cast the character from the
35 string to an unsigned before using it as an index into the
38 Thu Jan 14 15:41:46 1993 David J. MacKenzie (djm@kropotkin.gnu.ai.mit.edu)
40 * regex.h: Never define const; let the callers do it.
41 configure.in: Don't define USING_AUTOCONF.
43 Wed Jan 6 20:49:29 1993 Jim Blandy (jimb@geech.gnu.ai.mit.edu)
45 * regex.c (regerror): Abort if ERRCODE is out of range.
47 Sun Dec 20 16:19:10 1992 Jim Blandy (jimb@totoro.cs.oberlin.edu)
49 * configure.in: Arrange to #define USING_AUTOCONF.
50 * regex.h: If USING_AUTOCONF is #defined, don't mess with
51 `const' at all; autoconf has taken care of it.
53 Mon Dec 14 21:40:39 1992 David J. MacKenzie (djm@kropotkin.gnu.ai.mit.edu)
55 * regex.h (RE_SYNTAX_AWK): Fix typo. From Arnold Robbins.
57 Sun Dec 13 20:35:39 1992 Jim Blandy (jimb@totoro.cs.oberlin.edu)
59 * regex.c (compile_range): Fetch the range start and end by
60 casting the pattern pointer to an `unsigned char *' before
63 Sat Dec 12 09:41:01 1992 Jim Blandy (jimb@totoro.cs.oberlin.edu)
65 * regex.c: Undo change of 12/7/92; it's better for Emacs to
66 #define HAVE_CONFIG_H.
68 Fri Dec 11 22:00:34 1992 Jim Meyering (meyering@hal.gnu.ai.mit.edu)
70 * regex.c: Define and use isascii-protected ctype.h macros.
72 Fri Dec 11 05:10:38 1992 Jim Blandy (jimb@totoro.cs.oberlin.edu)
74 * regex.c (re_match_2): Undo Karl's November 10th change; it
75 keeps the group in :\(.*\) from matching :/ properly.
77 Mon Dec 7 19:44:56 1992 Jim Blandy (jimb@wookumz.gnu.ai.mit.edu)
79 * regex.c: #include config.h if either HAVE_CONFIG_H or emacs
82 Tue Dec 1 13:33:17 1992 David J. MacKenzie (djm@goldman.gnu.ai.mit.edu)
84 * regex.c [HAVE_CONFIG_H]: Include config.h.
86 Wed Nov 25 23:46:02 1992 David J. MacKenzie (djm@goldman.gnu.ai.mit.edu)
88 * regex.c (regcomp): Add parens around bitwise & for clarity.
89 Initialize preg->allocated to prevent segv.
91 Tue Nov 24 09:22:29 1992 David J. MacKenzie (djm@goldman.gnu.ai.mit.edu)
93 * regex.c: Use HAVE_STRING_H, not USG.
94 * configure.in: Check for string.h, not USG.
96 Fri Nov 20 06:33:24 1992 Karl Berry (karl@cs.umb.edu)
98 * regex.c (SIGN_EXTEND_CHAR) [VMS]: Back out of this change,
99 since Roland Roberts now says it was a localism.
101 Mon Nov 16 07:01:36 1992 Karl Berry (karl@cs.umb.edu)
103 * regex.h (const) [!HAVE_CONST]: Test another cpp symbol (from
104 Autoconf) before zapping const.
106 Sun Nov 15 05:36:42 1992 Jim Blandy (jimb@wookumz.gnu.ai.mit.edu)
108 * regex.c, regex.h: Changes for VMS from Roland B Roberts
109 <roberts@nsrl31.nsrl.rochester.edu>.
111 Thu Nov 12 11:31:15 1992 Karl Berry (karl@cs.umb.edu)
113 * Makefile.in (distfiles): Include INSTALL.
115 Tue Nov 10 09:29:23 1992 Karl Berry (karl@cs.umb.edu)
117 * regex.c (re_match_2): At maybe_pop_jump, if at end of string
118 and pattern, just quit the matching loop.
120 * regex.c (LETTER_P): Rename to `WORDCHAR_P'.
122 * regex.c (AT_STRINGS_{BEG,END}): Take `d' as an arg; change
125 * regex.c (re_match_2) [!emacs]: In wordchar and notwordchar
128 Wed Nov 4 15:43:58 1992 Karl Berry (karl@hal.gnu.ai.mit.edu)
130 * regex.h (const) [!__STDC__]: Don't define if it's already defined.
132 Sat Oct 17 19:28:19 1992 Karl Berry (karl@cs.umb.edu)
134 * regex.c (bcmp, bcopy, bzero): Only #define if they are not
137 * configure.in: Use AC_CONST.
139 Thu Oct 15 08:39:06 1992 Karl Berry (karl@cs.umb.edu)
141 * regex.h (const) [!const]: Conditionalize.
143 Fri Oct 2 13:31:42 1992 Karl Berry (karl@cs.umb.edu)
145 * regex.h (RE_SYNTAX_ED): New definition.
147 Sun Sep 20 12:53:39 1992 Karl Berry (karl@cs.umb.edu)
149 * regex.[ch]: remove traces of `longest_p' -- dumb idea to put
150 this into the pattern buffer, as it means parallelism loses.
152 * Makefile.in (config.status): use sh to run configure --no-create.
154 * Makefile.in (realclean): OK, don't remove configure.
156 Sat Sep 19 09:05:08 1992 Karl Berry (karl@hayley)
158 * regex.c (PUSH_FAILURE_POINT, POP_FAILURE_POINT) [DEBUG]: keep
159 track of how many failure points we push and pop.
160 (re_match_2) [DEBUG]: declare variables for that, and print results.
161 (DEBUG_PRINT4): new macro.
163 * regex.h (re_pattern_buffer): new field `longest_p' (to
164 eliminate backtracking if the user doesn't need it).
165 * regex.c (re_compile_pattern): initialize it (to 1).
166 (re_search_2): set it to zero if register information is not needed.
167 (re_match_2): if it's set, don't backtrack.
169 * regex.c (re_search_2): update fastmap only after checking that
170 the pattern is anchored.
172 * regex.c (re_match_2): do more debugging at maybe_pop_jump.
174 * regex.c (re_search_2): cast result of TRANSLATE for use in
177 Thu Sep 17 19:47:16 1992 Karl Berry (karl@geech.gnu.ai.mit.edu)
181 Wed Sep 16 08:17:10 1992 Karl Berry (karl@hayley)
183 * regex.c (INIT_FAIL_STACK): rewrite as statements instead of a
184 complicated comma expr, to avoid compiler warnings (and also
186 (re_compile_fastmap, re_match_2): change callers.
188 * regex.c (POP_FAILURE_POINT): cast pop of regstart and regend
189 to avoid compiler warnings.
191 * regex.h (RE_NEWLINE_ORDINARY): remove this syntax bit, and
193 * regex.c (at_{beg,end}line_loc_p): go the last mile: remove
194 the RE_NEWLINE_ORDINARY case which made the ^ in \n^ be an anchor.
196 Tue Sep 15 09:55:29 1992 Karl Berry (karl@hayley)
198 * regex.c (at_begline_loc_p): new fn.
199 (at_endline_loc_p): simplify at_endline_op_p.
200 (regex_compile): in ^/$ cases, call the above.
202 * regex.c (POP_FAILURE_POINT): rewrite the fn as a macro again,
203 as lord's profiling indicates the function is 20% of the time.
204 (re_match_2): callers changed.
206 * configure.in (AC_MEMORY_H): remove, since we never use memcpy et al.
208 Mon Sep 14 17:49:27 1992 Karl Berry (karl@hayley)
210 * Makefile.in (makeargs): include MFLAGS.
212 Sun Sep 13 07:41:45 1992 Karl Berry (karl@hayley)
214 * regex.c (regex_compile): in \1..\9 case, make it always
215 invalid to use \<digit> if there is no preceding <digit>th subexpr.
216 * regex.h (RE_NO_MISSING_BK_REF): remove this syntax bit.
218 * regex.c (regex_compile): remove support for invalid empty groups.
219 * regex.h (RE_NO_EMPTY_GROUPS): remove this syntax bit.
221 * regex.c (FREE_VARIABLES) [!REGEX_MALLOC]: define as alloca (0),
224 * regex.h (RE_SYNTAX_POSIX_SED): don't bother with this.
226 Sat Sep 12 13:37:21 1992 Karl Berry (karl@hayley)
228 * README: incorporate emacs.diff.
230 * regex.h (_RE_ARGS) [!__STDC__]: define as empty parens.
232 * configure.in: add AC_ALLOCA.
234 * Put test files in subdir test, documentation in subdir doc.
235 Adjust Makefile.in and configure.in accordingly.
237 Thu Sep 10 10:29:11 1992 Karl Berry (karl@hayley)
239 * regex.h (RE_SYNTAX_{POSIX_,}SED): new definitions.
241 Wed Sep 9 06:27:09 1992 Karl Berry (karl@hayley)
245 Tue Sep 8 07:32:30 1992 Karl Berry (karl@hayley)
247 * xregex.texinfo: put the day of month into the date.
249 * Makefile.in (realclean): remove Texinfo-generated files.
250 (distclean): remove empty sorted index files.
251 (clean): remove dvi files, etc.
253 * configure.in: test for more Unix variants.
255 * fileregex.c: new file.
256 Makefile.in (fileregex): new target.
258 * iregex.c (main): move variable decls to smallest scope.
260 * regex.c (FREE_VARIABLES): free reg_{,info_}dummy.
261 (re_match_2): check that the allocation for those two succeeded.
263 * regex.c (FREE_VAR): replace FREE_NONNULL with this.
264 (FREE_VARIABLES): call it.
265 (re_match_2) [REGEX_MALLOC]: initialize all our vars to NULL.
267 * tregress.c (do_match): generalize simple_match.
268 (SIMPLE_NONMATCH): new macro.
269 (SIMPLE_MATCH): change from routine.
271 * Makefile.in (regex.texinfo): make file readonly, so we don't
274 * many files (re_default_syntax): rename to `re_syntax_options';
275 call re_set_syntax instead of assigning to the variable where
278 Mon Sep 7 10:12:16 1992 Karl Berry (karl@hayley)
280 * syntax.skel: don't use prototypes.
282 * {configure,Makefile}.in: new files.
284 * regex.c: include <string.h> `#if USG || STDC_HEADERS'; remove
285 obsolete test for `POSIX', and test for BSTRING.
286 Include <strings.h> if we are not USG or STDC_HEADERS.
287 Do not include <unistd.h>. What did we ever need that for?
289 * regex.h (RE_NO_EMPTY_ALTS): remove this.
290 (RE_SYNTAX_AWK): remove from here, too.
291 * regex.c (regex_compile): remove the check.
292 * xregex.texinfo (Alternation Operator): update.
293 * other.c (test_others): remove tests for this.
295 * regex.h (RE_DUP_MAX): undefine if already defined.
297 * regex.h (RE_SYNTAX_POSIX*): redo to allow more operators, and
298 define new syntaxes with the minimal set.
300 * syntax.skel (main): used sscanf instead of scanf.
302 * regex.h (RE_SYNTAX_*GREP): new definitions from mike.
304 * regex.c (regex_compile): initialize the upper bound of
305 intervals at the beginning of the interval, not the end.
306 (From pclink@qld.tne.oz.au.)
308 * regex.c (handle_bar): rename to `handle_alt', for consistency.
310 * regex.c ({store,insert}_{op1,op2}): new routines (except the last).
311 ({STORE,INSERT}_JUMP{,2}): macros to replace the old routines,
312 which took arguments in different orders, and were generally weird.
314 * regex.c (PAT_PUSH*): rename to `BUF_PUSH*' -- we're not
315 appending info to the pattern!
317 Sun Sep 6 11:26:49 1992 Karl Berry (karl@hayley)
319 * regex.c (regex_compile): delete the variable
320 `following_left_brace', since we never use it.
322 * regex.c (print_compiled_pattern): don't print the fastmap if
325 * regex.c (re_compile_fastmap): handle
326 `on_failure_keep_string_jump' like `on_failure_jump'.
328 * regex.c (re_match_2): in `charset{,_not}' case, cast the bit
329 count to unsigned, not unsigned char, in case we have a full
332 * tregress.c (simple_match): remove.
333 (simple_test): rename as `simple_match'.
334 (simple_compile): print the error string if the compile failed.
336 * regex.c (DO_RANGE): rewrite as a function, `compile_range', so
337 we can debug it. Change pattern characters to unsigned char
338 *'s, and change the range variable to an unsigned.
339 (regex_compile): change calls.
341 Sat Sep 5 17:40:49 1992 Karl Berry (karl@hayley)
343 * regex.h (_RE_ARGS): new macro to put in argument lists (if
344 ANSI) or omit them (if K&R); don't declare routines twice.
346 * many files (obscure_syntax): rename to `re_default_syntax'.
348 Fri Sep 4 09:06:53 1992 Karl Berry (karl@hayley)
350 * GNUmakefile (extraclean): new target.
351 (realclean): delete the info files.
353 Wed Sep 2 08:14:42 1992 Karl Berry (karl@hayley)
357 Sun Aug 23 06:53:15 1992 Karl Berry (karl@hayley)
359 * regex.[ch] (re_comp): no const in the return type (from djm).
361 Fri Aug 14 07:25:46 1992 Karl Berry (karl@hayley)
363 * regex.c (DO_RANGE): declare variables as unsigned chars, not
364 signed chars (from jimb).
366 Wed Jul 29 18:33:53 1992 Karl Berry (karl@claude.cs.umb.edu)
370 * GNUmakefile (distclean): do not remove regex.texinfo.
371 (realclean): remove it here.
373 * tregress.c (simple_test): initialize buf.buffer.
375 Sun Jul 26 08:59:38 1992 Karl Berry (karl@hayley)
377 * regex.c (push_dummy_failure): new opcode and corresponding
378 case in the various routines. Pushed at the end of
381 * regex.c (jump_past_next_alt): rename to `jump_past_alt', for
383 (no_pop_jump): rename to `jump'.
385 * regex.c (regex_compile) [DEBUG]: terminate printing of pattern
390 * tregress.c (simple_{compile,match,test}): routines to simplify all
393 * tregress.c: test for matching as much as possible.
395 Fri Jul 10 06:53:32 1992 Karl Berry (karl@hayley)
399 Wed Jul 8 06:39:31 1992 Karl Berry (karl@hayley)
401 * regex.c (SIGN_EXTEND_CHAR): #undef any previous definition, as
402 ours should always work properly.
404 Mon Jul 6 07:10:50 1992 Karl Berry (karl@hayley)
406 * iregex.c (main) [DEBUG]: conditionalize the call to
407 print_compiled_pattern.
409 * iregex.c (main): initialize buf.buffer to NULL.
410 * tregress.c (test_regress): likewise.
412 * regex.c (alloca) [sparc]: #if on HAVE_ALLOCA_H instead.
414 * tregress.c (test_regress): didn't have jla's test quite right.
416 Sat Jul 4 09:02:12 1992 Karl Berry (karl@hayley)
418 * regex.c (re_match_2): only REGEX_ALLOCATE all the register
419 vectors if the pattern actually has registers.
420 (match_end): new variable to avoid having to use best_regend[0].
422 * regex.c (IS_IN_FIRST_STRING): rename to FIRST_STRING_P.
424 * regex.c: doc fixes.
426 * tregress.c (test_regress): new fastmap test forwarded by rms.
428 * tregress.c (test_regress): initialize the fastmap field.
430 * tregress.c (test_regress): new test from jla that aborted
433 Fri Jul 3 09:10:05 1992 Karl Berry (karl@hayley)
435 * tregress.c (test_regress): add tests for translating charsets,
438 * GNUmakefile (common): add alloca.o.
439 * alloca.c: new file, copied from bison.
441 * other.c (test_others): remove var `buf', since it's no longer used.
443 * Below changes from ro@TechFak.Uni-Bielefeld.DE.
445 * tregress.c (test_regress): initialize buf.allocated.
447 * regex.c (re_compile_fastmap): initialize `succeed_n_p'.
449 * GNUmakefile (regex): depend on $(common).
451 Wed Jul 1 07:12:46 1992 Karl Berry (karl@hayley)
455 * regex.c: doc fixes.
457 Mon Jun 29 08:09:47 1992 Karl Berry (karl@fosse)
459 * regex.c (pop_failure_point): change string vars to
460 `const char *' from `unsigned char *'.
462 * regex.c: consolidate debugging stuff.
463 (print_partial_compiled_pattern): avoid enum clash.
465 Mon Jun 29 07:50:27 1992 Karl Berry (karl@hayley)
467 * xmalloc.c: new file.
468 * GNUmakefile (common): add it.
470 * iregex.c (print_regs): new routine (from jimb).
473 Sat Jun 27 10:50:59 1992 Jim Blandy (jimb@pogo.cs.oberlin.edu)
475 * xregex.c (re_match_2): When we have accepted a match and
476 restored d from best_regend[0], we need to set dend
477 appropriately as well.
479 Sun Jun 28 08:48:41 1992 Karl Berry (karl@hayley)
481 * tregress.c: rename from regress.c.
483 * regex.c (print_compiled_pattern): improve charset case to ease
485 Also, don't distinguish between Emacs and non-Emacs
486 {not,}wordchar opcodes.
488 * regex.c (print_fastmap): move here.
490 * regex.c (print_{{partial,}compiled_pattern,double_string}):
491 rename from ..._printer. Change calls here and in test.c.
493 * regex.c: create from xregex.c and regexinc.c for once and for
494 all, and change the debug fns to be extern, instead of static.
495 * GNUmakefile: remove traces of xregex.c.
496 * test.c: put in externs, instead of including regexinc.c.
498 * xregex.c: move interactive main program and scanstring to iregex.c.
499 * iregex.c: new file.
500 * upcase.c, printchar.c: new files.
502 * various doc fixes and other cosmetic changes throughout.
504 * regexinc.c (compiled_pattern_printer): change variable name,
506 (partial_compiled_pattern_printer): print other info about the
507 compiled pattern, besides just the opcodes.
508 * xregex.c (regex_compile) [DEBUG]: print the compiled pattern
511 * xregex.c (re_compile_fastmap): in the duplicate case, set
512 `can_be_null' and return.
513 Also, set `bufp->can_be_null' according to a new variable,
515 Also, rewrite main while loop to not test `p != NULL', since
516 we never set it that way.
517 Also, eliminate special `can_be_null' value for the endline case.
518 (re_search_2): don't test for the special value.
519 * regex.h (struct re_pattern_buffer): remove the definition.
521 Sat Jun 27 15:00:40 1992 Karl Berry (karl@hayley)
523 * xregex.c (re_compile_fastmap): remove the `RE_' from
524 `REG_RE_MATCH_NULL_AT_END'.
525 Also, assert the fastmap in the pattern buffer is non-null.
526 Also, reset `succeed_n_p' after we've
527 paid attention to it, instead of every time through the loop.
528 Also, in the `anychar' case, only clear fastmap['\n'] if the
529 syntax says to, and don't return prematurely.
530 Also, rearrange cases in some semblance of a rational order.
531 * regex.h (REG_RE_MATCH_NULL_AT_END): remove the `RE_' from the name.
533 * other.c: take bug reports from here.
534 * regress.c: new file for them.
535 * GNUmakefile (test): add it.
536 * main.c (main): new possible test.
537 * test.h (test_type): new value in enum.
539 Thu Jun 25 17:37:43 1992 Karl Berry (karl@hayley)
541 * xregex.c (scanstring) [test]: new function from jimb to allow some
543 (main) [test]: call it (on the string, not the pattern).
545 * xregex.c (main): make return type `int'.
547 Wed Jun 24 10:43:03 1992 Karl Berry (karl@hayley)
549 * xregex.c (pattern_offset_t): change to `int', for the benefit
550 of patterns which compile to more than 2^15 bytes.
552 * xregex.c (GET_BUFFER_SPACE): remove spurious braces.
554 * xregex.texinfo (Using Registers): put in a stub to ``document''
556 * regex.h (re_set_registers) [!__STDC__]: declare.
557 * xregex.c (re_set_registers): declare K&R style (also move to a
558 different place in the file).
560 Mon Jun 8 18:03:28 1992 Jim Blandy (jimb@pogo.cs.oberlin.edu)
562 * regex.h (RE_NREGS): Doc fix.
564 * xregex.c (re_set_registers): New function.
565 * regex.h (re_set_registers): Declaration for new function.
567 Fri Jun 5 06:55:18 1992 Karl Berry (karl@hayley)
569 * main.c (main): `return 0' instead of `exit (0)'. (From Paul Eggert)
571 * regexinc.c (SIGN_EXTEND_CHAR): cast to unsigned char.
572 (extract_number, EXTRACT_NUMBER): don't bother to cast here.
574 Tue Jun 2 07:37:53 1992 Karl Berry (karl@hayley)
578 * Change copyrights to `1985, 89, ...'.
580 * regex.h (REG_RE_MATCH_NULL_AT_END): new macro.
581 * xregex.c (re_compile_fastmap): initialize `can_be_null' to
582 `p==pend', instead of in the test at the top of the loop (as
583 it was, it was always being set).
584 Also, set `can_be_null'=1 if we would jump to the end of the
585 pattern in the `on_failure_jump' cases.
586 (re_search_2): check if `can_be_null' is 1, not nonzero. This
587 was the original test in rms' regex; why did we change this?
589 * xregex.c (re_compile_fastmap): rename `is_a_succeed_n' to
592 Sat May 30 08:09:08 1992 Karl Berry (karl@hayley)
594 * xregex.c (re_compile_pattern): declare `regnum' as `unsigned',
595 not `regnum_t', for the benefit of those patterns with more
598 * xregex.c: rename `failure_stack' to `fail_stack', for brevity;
599 likewise for `match_nothing' to `match_null'.
601 * regexinc.c (REGEX_REALLOCATE): take both the new and old
602 sizes, and copy only the old bytes.
603 * xregex.c (DOUBLE_FAILURE_STACK): pass both old and new.
604 * This change from Thorsten Ohl.
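The REGEX_REALLOCATE change above (take both sizes, copy only the old bytes) can be sketched roughly as follows. This is an illustration only: `grow_copy_old' is a hypothetical name, and it uses malloc for portability, whereas other entries in this log indicate the real macro is built on alloca.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the idea behind REGEX_REALLOCATE: get a
   block of NEW_SIZE bytes and copy over only the OLD_SIZE bytes that
   the old block actually holds.  Copying NEW_SIZE bytes would read
   past the end of the old block.  Returns NULL if allocation fails.
   Uses malloc rather than alloca, so the caller must free the result.  */
static void *
grow_copy_old (const void *old, size_t old_size, size_t new_size)
{
  void *fresh = malloc (new_size);

  if (fresh != NULL && old != NULL)
    memcpy (fresh, old, old_size < new_size ? old_size : new_size);
  return fresh;
}
```

A caller doubling a stack would pass the current size as OLD_SIZE and twice that as NEW_SIZE, which matches the DOUBLE_FAILURE_STACK item above passing both old and new sizes.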
606 Fri May 29 11:45:22 1992 Karl Berry (karl@hayley)
608 * regexinc.c (SIGN_EXTEND_CHAR): define as `(signed char) c'
609 instead of relying on __CHAR_UNSIGNED__, to work with
610 compilers other than GCC. From Per Bothner.
612 * main.c (main): change return type to `int'.
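A minimal sketch of the SIGN_EXTEND_CHAR idea from the entry above, written as a function rather than a macro so it is easy to try. The name `sign_extend_char' is hypothetical. Converting an out-of-range value to `signed char' is implementation-defined in ISO C, so this assumes the usual two's-complement behavior.

```c
/* Hypothetical helper: treat the low byte of C as a signed quantity,
   without depending on whether plain `char' is signed on this compiler
   (which is what testing __CHAR_UNSIGNED__ tried to detect under GCC).
   Assumes the implementation-defined conversion wraps modulo 256.  */
static int
sign_extend_char (int c)
{
  return (signed char) c;
}
```

With this, a stored byte such as 0xFF reads back as a negative value, which is what a compiled pattern needs when the byte is the high half of a backward jump offset.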
614 Mon May 18 06:37:08 1992 Karl Berry (karl@hayley)
616 * regex.h (RE_SYNTAX_AWK): typo in RE_RE_UNMATCHED...
618 Fri May 15 10:44:46 1992 Karl Berry (karl@hayley)
622 Sun May 3 13:54:00 1992 Karl Berry (karl@hayley)
624 * regex.h (struct re_pattern_buffer): now it's just `regs_allocated'.
625 (REGS_UNALLOCATED, REGS_REALLOCATE, REGS_FIXED): new constants.
626 * xregex.c (regexec, re_compile_pattern): set the field appropriately.
627 (re_match_2): and use it. bufp can't be const any more.
629 Fri May 1 15:43:09 1992 Karl Berry (karl@hayley)
631 * regexinc.c: unconditionally include <sys/types.h>, first.
633 * regex.h (struct re_pattern_buffer): rename
634 `caller_allocated_regs' to `regs_allocated_p'.
635 * xregex.c (re_compile_pattern): same change here.
637 (re_match_2): reallocate registers if necessary.
639 Fri Apr 10 07:46:50 1992 Karl Berry (karl@hayley)
641 * regex.h (RE_SYNTAX{_POSIX,}_AWK): new definitions from Arnold.
643 Sun Mar 15 07:34:30 1992 Karl Berry (karl at hayley)
645 * GNUmakefile (dist): versionize regex.{c,h,texinfo}.
647 Tue Mar 10 07:05:38 1992 Karl Berry (karl at hayley)
651 * xregex.c (PUSH_FAILURE_POINT): always increment the failure id.
652 (DEBUG_STATEMENT) [DEBUG]: execute the statement even if `debug'==0.
654 * xregex.c (pop_failure_point): if the saved string location is
655 null, keep the current value.
656 (re_match_2): at fail, test for a dummy failure point by
657 checking the restored pattern value, not string value.
658 (re_match_2): new case, `on_failure_keep_string_jump'.
659 (regex_compile): output this opcode in the .*\n case.
660 * regexinc.c (re_opcode_t): define the opcode.
661 (partial_compiled_pattern_printer): add the new case.
663 Mon Mar 9 09:09:27 1992 Karl Berry (karl at hayley)
665 * xregex.c (regex_compile): optimize .*\n to output an
666 unconditional jump to the ., instead of pushing failure points
667 each time through the loop.
669 * xregex.c (DOUBLE_FAILURE_STACK): compute the maximum size
670 ourselves (and correctly); change callers.
672 Sun Mar 8 17:07:46 1992 Karl Berry (karl at hayley)
674 * xregex.c (failure_stack_elt_t): change to `const char *', to
677 * regex.h (re_set_syntax): declare this.
679 * xregex.c (pop_failure_point) [DEBUG]: conditionally pass the
680 original strings and sizes; change callers.
682 Thu Mar 5 16:35:35 1992 Karl Berry (karl at claude.cs.umb.edu)
684 * xregex.c (regnum_t): new type for register/group numbers.
685 (compile_stack_elt_t, regex_compile): use it.
687 * xregex.c (regexec): declare len as `int' to match re_search.
689 * xregex.c (re_match_2): don't declare p1 twice.
691 * xregex.c: change `while (1)' to `for (;;)' to avoid silly
694 * regex.h [__STDC__]: use #if, not #ifdef.
696 * regexinc.c (REGEX_REALLOCATE): cast the result of alloca to
697 (char *), to avoid warnings.
699 * xregex.c (regerror): declare variable as const.
701 * xregex.c (re_compile_pattern, re_comp): define as returning a const
703 * regex.h (re_compile_pattern, re_comp): likewise.
705 Thu Mar 5 15:57:56 1992 Karl Berry (karl@hal)
707 * xregex.c (regcomp): declare `syntax' as unsigned.
709 * xregex.c (re_match_2): try to avoid compiler warnings about
710 unsigned comparisons.
712 * GNUmakefile (test-xlc): new target.
714 * regex.h (reg_errcode_t): remove trailing comma from definition.
715 * regexinc.c (re_opcode_t): likewise.
717 Thu Mar 5 06:56:07 1992 Karl Berry (karl at hayley)
719 * GNUmakefile (dist): add version numbers automatically.
720 (versionfiles): new variable.
721 (regex.{c,texinfo}): don't add version numbers here.
722 * regex.h: put in placeholder instead of the version number.
724 Fri Feb 28 07:11:33 1992 Karl Berry (karl at hayley)
726 * xregex.c (re_error_msg): declare const, since it is.
728 Sun Feb 23 05:41:57 1992 Karl Berry (karl at fosse)
730 * xregex.c (PAT_PUSH{,_2,_3}, ...): cast args to avoid warnings.
731 (regex_compile, regexec): return REG_NOERROR, instead
733 (boolean): define as char, and #define false and true.
734 * regexinc.c (STREQ): cast the result.
736 Sun Feb 23 07:45:38 1992 Karl Berry (karl at hayley)
738 * GNUmakefile (test-cc, test-hc, test-pcc): new targets.
740 * regexinc.c (extract_number, extract_number_and_incr) [DEBUG]:
741 only define if we are debugging.
743 * xregex.c [_AIX]: do #pragma alloca first if necessary.
744 * regexinc.c [_AIX]: remove the #pragma from here.
746 * regex.h (reg_syntax_t): declare as unsigned, and redo the enum
747 as #define's again. Some compilers do stupid things with enums.
749 Thu Feb 20 07:19:47 1992 Karl Berry (karl at hayley)
753 * xregex.c, regex.h (newline_anchor_match_p): rename to
754 `newline_anchor'; dumb idea to change the name.
756 Tue Feb 18 07:09:02 1992 Karl Berry (karl at hayley)
758 * regexinc.c: go back to original, i.e., don't include
759 <string.h> or define strchr.
760 * xregex.c (regexec): don't bother with adding characters after
761 newlines to the fastmap; instead, just don't use a fastmap.
762 * xregex.c (regcomp): set the buffer and fastmap fields to zero.
764 * xregex.texinfo (GNU r.e. compiling): have to initialize more
767 * regex.h (struct re_pattern_buffer): rename `newline_anchor' to
768 `newline_anchor_match_p', as we're back to two cases.
769 * xregex.c (regcomp, re_compile_pattern, re_comp): change
771 (re_match_2): at begline and endline, POSIX is not a special
772 case anymore; just check newline_anchor_match_p.
774 Thu Feb 13 16:29:33 1992 Karl Berry (karl at hayley)
776 * xregex.c (*empty_string*): rename to *null_string*, for brevity.
778 Wed Feb 12 06:36:22 1992 Karl Berry (karl at hayley)
780 * xregex.c (re_compile_fastmap): at endline, don't set fastmap['\n'].
781 (re_match_2): rewrite the begline/endline cases to take account
782 of the new field newline_anchor.
784 Tue Feb 11 14:34:55 1992 Karl Berry (karl at hayley)
786 * regexinc.c [!USG etc.]: include <strings.h> and define strchr
789 * xregex.c (re_search_2): when searching backwards, declare `c'
790 as a char and use casts when using it as an array subscript.
792 * xregex.c (regcomp): if REG_NEWLINE, set
793 RE_HAT_LISTS_NOT_NEWLINE. Set the `newline_anchor' field
795 (regex_compile): compile [^...] as matching a \n according to
797 (regexec): if doing REG_NEWLINE stuff, compile a fastmap and add
798 characters after any \n's to the newline.
799 * regex.h (RE_HAT_LISTS_NOT_NEWLINE): new syntax bit.
800 (struct re_pattern_buffer): rename `posix_newline' to
801 `newline_anchor', define constants for its values.
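The re_search_2 item above, about casting `c' before using it as an array subscript, guards against a common C pitfall. A rough sketch, assuming a 256-entry fastmap indexed by byte value; `fastmap_has' is a hypothetical helper, not a function in regex.c.

```c
/* Hypothetical helper: report whether byte CH is marked in a 256-entry
   fastmap.  Where plain `char' is signed, a byte of 0x80 or above is
   negative, and using it directly as a subscript would index before
   the start of the array.  The cast to unsigned char maps it back into
   the range 0..255.  */
static int
fastmap_has (const char *fastmap, char ch)
{
  return fastmap[(unsigned char) ch] != 0;
}
```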
803 Mon Feb 10 07:22:50 1992 Karl Berry (karl at hayley)
805 * xregex.c (re_compile_fastmap): combine the code at the top and
806 bottom of the loop, as it's essentially identical.
808 Sun Feb 9 10:02:19 1992 Karl Berry (karl at hayley)
810 * xregex.texinfo (POSIX Translate Tables): remove this, as it
811 doesn't match the spec.
813 * xregex.c (re_compile_fastmap): if we finish off a path, go
814 back to the top (to set can_be_null) instead of returning
817 * xregex.texinfo: changes from bob.
819 Sat Feb 1 07:03:25 1992 Karl Berry (karl at hayley)
821 * xregex.c (re_search_2): doc fix (from rms).
823 Fri Jan 31 09:52:04 1992 Karl Berry (karl at hayley)
825 * xregex.texinfo (GNU Searching): clarify the range arg.
827 * xregex.c (re_match_2, at_endline_op_p): add extra parens to
828 get rid of GCC 2's (silly, IMHO) warning about && within ||.
830 * xregex.c (common_op_match_empty_string_p): use
831 MATCH_NOTHING_UNSET_VALUE, not -1.
833 Thu Jan 16 08:43:02 1992 Karl Berry (karl at hayley)
835 * xregex.c (SET_REGS_MATCHED): only set the registers from
838 * regexinc.c (MIN): new macro.
839 * xregex.c (re_match_2): only check min (num_regs,
840 regs->num_regs) when we set the returned regs.
842 * xregex.c (re_match_2): set registers after the first
843 num_regs to -1 before we return.
845 Tue Jan 14 16:01:42 1992 Karl Berry (karl at hayley)
847 * xregex.c (re_match_2): initialize max (RE_NREGS, re_nsub + 1)
848 registers (from rms).
850 * xregex.c, regex.h: don't abbreviate `19xx' to `xx'.
852 * regexinc.c [!emacs]: include <sys/types.h> before <unistd.h>.
853 (from ro@thp.Uni-Koeln.DE).
855 Thu Jan 9 07:23:00 1992 Karl Berry (karl at hayley)
857 * xregex.c (*unmatchable): rename to `match_empty_string_p'.
858 (CAN_MATCH_NOTHING): rename to `REG_MATCH_EMPTY_STRING_P'.
860 * regexinc.c (malloc, realloc): remove prototypes, as they can
861 cause clashes (from rms).
863 Mon Jan 6 12:43:24 1992 Karl Berry (karl at claude.cs.umb.edu)
867 Sun Jan 5 10:50:38 1992 Karl Berry (karl at hayley)
869 * xregex.texinfo: bring more or less up-to-date.
870 * GNUmakefile (regex.texinfo): generate from regex.h and
872 * include.awk: new file.
874 * xregex.c: change all calls to the fn extract_number_and_incr
877 * xregex.c (re_match_2) [emacs]: in at_dot, use PTR_CHAR_POS + 1,
878 instead of bf_* and sl_*. Cast d to unsigned char *, to match
879 the declaration in Emacs' buffer.h.
880 [emacs19]: in before_dot, at_dot, and after_dot, likewise.
882 * regexinc.c: unconditionally include <sys/types.h>.
884 * regexinc.c (alloca) [!alloca]: Emacs config files sometimes
885 define this, so don't define it if it's already defined.
887 Sun Jan 5 06:06:53 1992 Karl Berry (karl at fosse)
889 * xregex.c (re_comp): fix type conflicts with regex_compile (we
890 haven't been compiling this).
892 * regexinc.c (SIGN_EXTEND_CHAR): use `__CHAR_UNSIGNED__', not
895 * regexinc.c (NULL) [!NULL]: define it (as zero).
897 * regexinc.c (extract_number): remove the temporaries.
899 Sun Jan 5 07:50:14 1992 Karl Berry (karl at hayley)
901 * regex.h (regerror) [!__STDC__]: return a size_t, not a size_t *.
903 * xregex.c (PUSH_FAILURE_POINT, ...): declare `destination' as
904 `char *' instead of `void *', to match alloca declaration.
906 * xregex.c (regerror): use `size_t' for the intermediate values
907 as well as the return type.
909 * xregex.c (regexec): cast the result of malloc.
911 * xregex.c (regexec): don't initialize `private_preg' in the
912 declaration, as old C compilers can't do that.
914 * xregex.c (main) [test]: declare printchar void.
916 * xregex.c (assert) [!DEBUG]: define this to do nothing, and
917 remove #ifdef DEBUG's from around asserts.
919 * xregex.c (re_match_2): remove error message when not debugging.
921 Sat Jan 4 09:45:29 1992 Karl Berry (karl at hayley)
923 * other.c: test the bizarre duplicate case in re_compile_fastmap
926 * test.c (general_test): don't test registers beyond the end of
927 correct_regs, as well as regs.
929 * xregex.c (regex_compile): at handle_close, don't assign to
930 *inner_group_loc if we didn't push a start_memory (because the
931 group number was too big). In fact, don't push or pop the
932 inner_group_offset in that case.
934 * regex.c: rename to xregex.c, since it's not the whole thing.
935 * regex.texinfo: likewise.
936 * GNUmakefile: change to match.
938 * regex.c [DEBUG]: only include <stdio.h> if debugging.
940 * regexinc.c (SIGN_EXTEND_CHAR) [CHAR_UNSIGNED]: if it's already
941 defined, don't redefine it.
943 * regex.c: define _GNU_SOURCE at the beginning.
944 * regexinc.c (isblank) [!isblank]: define it.
945 (isgraph) [!isgraph]: change conditional to this, and remove the
948 * regex.c (regex_compile): add `blank' character class.
950 * regex.c (regex_compile): don't use a uchar variable to loop
951 through all characters.
953 * regex.c (regex_compile): at '[', improve logic for checking
954 that we have enough space for the charset.
956 * regex.h (struct re_pattern_buffer): declare translate as `char *'
957 again. We only use it as an array subscript once, I think.
959 * regex.c (TRANSLATE): new macro to cast the data character
961 (num_internal_regs): rename to `num_regs'.
963 Fri Jan 3 07:58:01 1992 Karl Berry (karl at hayley)
965 * regex.h (struct re_pattern_buffer): declare `allocated' and
966 `used' as unsigned long, since these are never negative.
968 * regex.c (compile_stack_element): rename to compile_stack_elt_t.
969 (failure_stack_element): similarly.
971 * regexinc.c (TALLOC, RETALLOC): new macros to simplify
972 allocation of arrays.
974 * regex.h (re_*) [__STDC__]: don't declare string args unsigned
975 char *; that makes them incompatible with string constants.
976 (struct re_pattern_buffer): declare the pattern and translate
977 table as unsigned char *.
978 * regex.c (most routines): use unsigned char vs. char consistently.
980 * regex.h (re_compile_pattern): do not declare the length arg as
982 * regex.c (re_compile_pattern): likewise.
984 * regex.c (POINTER_TO_REG): rename to `POINTER_TO_OFFSET'.
986 * regex.h (re_registers): declare `start' and `end' as
987 `regoff_t', instead of `int'.
989 * regex.c (regexec): if either of the malloc's for the register
990 information fail, return failure.
992 * regex.h (RE_NREGS): define this again, as 30 (from jla).
993 (RE_ALLOCATE_REGISTERS): remove this.
994 (RE_SYNTAX_*): remove it from definitions.
995 (re_pattern_buffer): remove `return_default_num_regs', add
996 `caller_allocated_regs'.
997 * regex.c (re_compile_pattern): clear no_sub and
998 caller_allocated_regs in the pattern.
999 (regcomp): set caller_allocated_regs.
1000 (re_match_2): do all register allocation at the end of the
1001 match; implement new semantics.
1003 * regex.c (MAX_REGNUM): new macro.
1004 (regex_compile): at handle_open and handle_close, if the group
1005 number is too large, don't push the start/stop memory.
1007 Thu Jan 2 07:56:10 1992 Karl Berry (karl at hayley)
1009 * regex.c (re_match_2): if the back reference is to a group that
1010 never matched, then goto fail, not really_fail. Also, don't
1011 test if the pattern can match the empty string. Why did we
1013 (really_fail): this label no longer needed.
1015 * regexinc.c [STDC_HEADERS]: use only this to test if we should
1018 * regex.c (DO_RANGE, regex_compile): translate in all cases
1019 except the single character after a \.
1021 * regex.h (RE_AWK_CLASS_HACK): rename to
1022 RE_BACKSLASH_ESCAPE_IN_LISTS.
1023 * regex.c (regex_compile): change use.
1025 * regex.c (re_compile_fastmap): do not translate the characters
1026 again; we already translated them at compilation. (From ylo@ngs.fi.)
1028 * regex.c (re_match_2): in case for at_dot, invert sense of
1029 comparison and find the character number properly. (From
1030 worley@compass.com.)
1031 (re_match_2) [emacs]: remove the cases for before_dot and
1032 after_dot, since there's no way to specify them, and the code
1033 is wrong (judging from this change).
1035 Wed Jan 1 09:13:38 1992 Karl Berry (karl at hayley)
1037 * psx-{interf,basic,extend}.c, other.c: set `t' as the first
1038 thing, so that if we run them in succession, general_test's
1039 kludge to see if we're doing POSIX tests works.
1041 * test.h (test_type): add `all_test'.
1042 * main.c: add case for `all_test'.
1044 * regexinc.c (partial_compiled_pattern_printer,
1045 double_string_printer): don't print anything if we're passed null.
1047 * regex.c (PUSH_FAILURE_POINT): do not scan for the highest and
1048 lowest active registers.
1049 (re_match_2): compute lowest/highest active regs at start_memory and
1051 (NO_{LOW,HIGH}EST_ACTIVE_REG): new sentinel values.
1052 (pop_failure_point): return the lowest/highest active reg values
1053 popped; change calls.
1055 * regex.c [DEBUG]: include <assert.h>.
1056 (various routines) [DEBUG]: change conditionals to assertions.
1058 * regex.c (DEBUG_STATEMENT): new macro.
1059 (PUSH_FAILURE_POINT): use it to increment num_regs_pushed.
1060 (re_match_2) [DEBUG]: only declare num_regs_pushed if DEBUG.
1062 * regex.c (*can_match_nothing): rename to *unmatchable.
1064 * regex.c (re_match_2): at stop_memory, adjust argument reading.
1066 * regex.h (re_pattern_buffer): declare `can_be_null' as a 2-bit
1069 * regex.h (re_pattern_buffer): declare `buffer' unsigned char *;
1070 no, dumb idea. The pattern can have signed numbers.
1072 * regex.c (re_match_2): in maybe_pop_jump case, skip over the
1073 right number of args to the group operators, and don't do
1074 anything with endline if posix_newline is not set.
1076 * regex.c, regexinc.c (all the things we just changed): go back
1077 to putting the inner group count after the start_memory,
1078 because we need it in the on_failure_jump case in re_match_2.
1079 But leave it after the stop_memory also, since we need it
1080 there in re_match_2, and we don't have any way of getting back
1081 to the start_memory.
1083 * regexinc.c (partial_compiled_pattern_printer): adjust argument
1084 reading for start/stop_memory.
1085 * regex.c (re_compile_fastmap, group_can_match_nothing): likewise.
1087 Tue Dec 31 10:15:08 1991 Karl Berry (karl at hayley)
1089 * regex.c (bits list routines): remove these.
1090 (re_match_2): get the number of inner groups from the pattern,
1091 instead of keeping track of it at start and stop_memory.
1092 Put the count after the stop_memory, not after the
1094 (compile_stack_element): remove `fixup_inner_group' member,
1095 since we now put it in when we can compute it.
1096 (regex_compile): at handle_open, don't push the inner group
1097 offset, and at handle_close, don't pop it.
1099 * regex.c (level routines): remove these, and their uses in
1100 regex_compile. This was another manifestation of having to find
1101 $'s that were endlines.
1103 * regex.c (regexec): this does searching, not matching (a
1104 well-disguised part of the standard). So rewrite to use
1105 `re_search' instead of `re_match'.
1106 * psx-interf.c (test_regexec): add tests to, uh, match.
1108 * regex.h (RE_TIGHT_ALT): remove this; nobody uses it.
1109 * regex.c: remove the code that was supposed to implement it.
1111 * other.c (test_others): ^ and $ never match newline characters;
1112 RE_CONTEXT_INVALID_OPS doesn't affect anchors.
1114 * psx-interf.c (test_regerror): update for new error messages.
1116 * psx-extend.c: it's now ok to have an alternative be just a $,
1117 so remove all the tests which supposed that was invalid.
1119 Wed Dec 25 09:00:05 1991 Karl Berry (karl at hayley)
1121 * regex.c (regex_compile): in handle_open, don't skip over ^ and
1122 $ when checking for an empty group. POSIX has changed the
1124 * psx-extend.c (test_posix_extended): thus, move (^$) tests to
1127 * regexinc.c (boolean): move from here to test.h and regex.c.
1128 * test files: declare verbose, omit_register_tests, and
1129 test_should_match as boolean.
1131 * psx-interf.c (test_posix_c_interface): remove the `c_'.
1134 * psx-basic.c (test_posix_basic): ^ ($) is an anchor after
1135 (before) an open (close) group.
1137 * regex.c (re_match_2): in endline, correct precedence of
1138 posix_newline condition.
1140 Tue Dec 24 06:45:11 1991 Karl Berry (karl at hayley)
1142 * test.h: incorporate private-tst.h.
1143 * test files: include test.h, not private-tst.h.
1145 * test.c (general_test): set posix_newline to zero if we are
1146 doing POSIX tests (unfortunately, it's difficult to call
1147 regcomp in this case, which is what we should really be doing).
1149 * regex.h (reg_syntax_t): make this an enumeration type which
1150 defines the syntax bits; renames re_syntax_t.
1152 * regex.c (at_endline_op_p): don't preincrement p; then if it's
1153 not an empty string op, we lose.
1155 * regex.h (reg_errcode_t): new enumeration type of the error
1157 * regex.c (regex_compile): return that type.
1159 * regex.c (regex_compile): in [, initialize
1160 just_had_a_char_class to false; somehow I had changed this to
1163 * regex.h (RE_NO_CONSECUTIVE_REPEATS): remove this, since we
1164 don't use it, and POSIX doesn't require this behavior anymore.
1165 * regex.c (regex_compile): remove it from here.
1167 * regex.c (regex_compile): remove the no_op insertions for
1168 verify_and_adjust_endlines, since that doesn't exist anymore.
1170 * regex.c (regex_compile) [DEBUG]: use printchar to print the
1171 pattern, so unprintable bytes will print properly.
1173 * regex.c: move re_error_msg back.
1174 * test.c (general_test): print the compile error if the pattern
1177 Mon Dec 23 08:54:53 1991 Karl Berry (karl at hayley)
1179 * regexinc.c: move re_error_msg here.
1181 * regex.c (re_error_msg): the ``message'' for success must be
1182 NULL, to keep the interface to re_compile_pattern the same.
1183 (regerror): if the msg is null, use "Success".
1185 * rename most test files for consistency. Change Makefile
1188 * test.c (most routines): add casts to (unsigned char *) when we
1189 call re_{match,search}{,_2}.
1191 Sun Dec 22 09:26:06 1991 Karl Berry (karl at hayley)
1193 * regex.c (re_match_2): declare string args as unsigned char *
1194 again; don't declare non-pointer args const; declare the
1195 pattern buffer const.
1196 (re_match): likewise.
1197 (re_search_2, re_search): likewise, except don't declare the
1198 pattern const, since we make a fastmap.
1199 * regex.h [__STDC__]: change prototypes.
1201 * regex.c (regex_compile): return an error code, not a string.
1202 (re_err_list): new table to map from error codes to string.
1203 (re_compile_pattern): return an element of re_err_list.
1204 (regcomp): don't test all the strings.
1205 (regerror): just use the list.
1206 (put_in_buffer): remove this.
1208 * regex.c (equivalent_failure_points): remove this.
1210 * regex.c (re_match_2): don't copy the string arguments into
1211 non-const pointers. We never alter the data.
1213 * regex.c (re_match_2): move assignment to `is_a_jump_n' out of
1214 the main loop. Just initialize it right before we do
1217 * regex.[ch] (re_match_2): don't declare the int parameters const.
1219 Sat Dec 21 08:52:20 1991 Karl Berry (karl at hayley)
1221 * regex.h (re_syntax_t): new type; declare to be unsigned
1222 (previously we used int, but since we do bit operations on
1223 this, unsigned is better, according to H&S).
1224 (obscure_syntax, re_pattern_buffer): use that type.
1225 * regex.c (re_set_syntax, regex_compile): likewise.
1227 * regex.h (re_pattern_buffer): new field `posix_newline'.
1228 * regex.c (re_comp, re_compile_pattern): set to zero.
1229 (regcomp): set to REG_NEWLINE.
1230 * regex.h (RE_HAT_LISTS_NOT_NEWLINE): remove this (we can just
1231 check `posix_newline' instead).
1233 * regex.c (op_list_type, op_list, add_op): remove these.
1234 (verify_and_adjust_endlines): remove this.
1235 (pattern_offset_list_type, *pattern_offset* routines): and these.
1236 These things all implemented the nonleading/nontrailing position
1237 code, which was very long, had a few remaining problems, and
1238 is no longer needed. So...
1240 * regexinc.c (STREQ): new macro to abbreviate strcmp(,)==0, for
1241 brevity. Change various places in regex.c to use it.
1243 * regex{,inc}.c (enum regexpcode): change to a typedef
1244 re_opcode_t, for brevity.
1246 * regex.h (re_syntax_table) [SYNTAX_TABLE]: remove this; it
1247 should only be in regex.c, I think, since we don't define it
1248 in this case. Maybe it should be conditional on !SYNTAX_TABLE?
1250 * regexinc.c (partial_compiled_pattern_printer): simplify and
1251 distinguish the emacs/not-emacs (not)wordchar cases.
1253 Fri Dec 20 08:11:38 1991 Karl Berry (karl at hayley)
1255 * regexinc.c (regexpcode) [emacs]: only define the Emacs opcodes
1256 if we are ifdef emacs.
1258 * regex.c (BUF_PUSH*): rename to PAT_PUSH*.
1260 * regex.c (regex_compile): in $ case, go back to essentially the
1261 original code for deciding endline op vs. normal char.
1262 (at_endline_op_p): new routine.
1263 * regex.h (RE_ANCHORS_ONLY_AT_ENDS, RE_CONTEXT_INVALID_ANCHORS,
1264 RE_REPEATED_ANCHORS_AWAY, RE_NO_ANCHOR_AT_NEWLINE): remove
1265 these. POSIX has simplified the rules for anchors in draft
1267 (RE_NEWLINE_ORDINARY): new syntax bit.
1268 (RE_CONTEXT_INDEP_ANCHORS): change description to be compatible
1270 * regex.texinfo (Syntax Bits): remove the descriptions.
1272 Mon Dec 16 08:12:40 1991 Karl Berry (karl at hayley)
1274 * regex.c (re_match_2): in jump_past_next_alt, unconditionally
1275 goto no_pop. The only register we were finding was one which
1276 enclosed the whole alternative expression, not one around an
1277 individual alternative. So we were never doing what we
1278 thought we were doing, and this way makes (|a) against the
1281 * regex.c (regex_compile): remove `highest_ever_regnum', and
1282 don't restore regnum from the stack; just put it into a
1283 temporary to put into the stop_memory. Otherwise, groups
1284 aren't numbered consecutively.
1286 * regex.c (is_in_compile_stack): rename to
1287 `group_in_compile_stack'; remove unnecessary test for the
1290 * regex.c (re_match_2): in on_failure_jump, skip no_op's before
1291 checking for the start_memory, in case we were called from
1294 Sun Dec 15 16:20:48 1991 Karl Berry (karl at hayley)
1296 * regex.c (regex_compile): in duplicate case, use
1297 highest_ever_regnum instead of regnum, since the latter is
1298 reverted at stop_memory.
1300 * regex.c (re_match_2): in on_failure_jump, if the * applied to
1301 a group, save the information for that group and all inner
1302 groups (by making it active), even though we're not inside it
1305 Sat Dec 14 09:50:59 1991 Karl Berry (karl at hayley)
1307 * regex.c (PUSH_FAILURE_ITEM, POP_FAILURE_ITEM): new macros.
1308 Use them instead of copying the stack manipulating a zillion
1311 * regex.c (PUSH_FAILURE_POINT, pop_failure_point) [DEBUG]: save
1312 and restore a unique identification value for each failure point.
1314 * regexinc.c (partial_compiled_pattern_printer): don't print an
1315 extra / after duplicate commands.
1317 * regex.c (regex_compile): in back-reference case, allow a back
1318 reference to register `regnum'. Otherwise, even `\(\)\1'
1319 fails, since regnum is 1 at the back-reference.
1321 * regex.c (re_match_2): in fail, don't examine the pattern if we
1324 * test_private.h: rename to private_tst.h. Change includes.
1326 * regex.c (extend_bits_list): compute existing size for realloc
1327 in bytes, not blocks.
1329 * regex.c (re_match_2): in jump_past_next_alt, the for loop was
1330 missing its (empty) statement. Even so, some register tests
1331 still fail, although in a different way than in the previous change.
1333 Fri Dec 13 15:55:08 1991 Karl Berry (karl at hayley)
1335 * regex.c (re_match_2): in jump_past_next_alt, unconditionally
1336 goto no_pop, since we weren't properly detecting if the
1337 alternative matched something anyway. No, we need to not jump
1338 to keep the register values correct; just change to not look at
1339 register zero and not test RE_NO_EMPTY_ALTS (which is a
1340 compile-time thing).
1342 * regex.c (SET_REGS_MATCHED): start the loop at 1, since we never
1343 care about register zero until the very end. (I think.)
1345 * regex.c (PUSH_FAILURE_POINT, pop_failure_point): go back to
1346 pushing and popping the active registers, instead of only doing
1347 the registers before a group: (fooq|fo|o)*qbar against fooqbar
1348 fails, since we restore back into the middle of group 1, yet it
1349 isn't active, because the previous restore clobbered the active flag.
1351 Thu Dec 12 17:25:36 1991 Karl Berry (karl at hayley)
1353 * regex.c (PUSH_FAILURE_POINT): do not call
1354 `equivalent_failure_points' after all; it causes the registers
1355 to be ``wrong'' (according to POSIX), and an infinite loop on
1356 `((a*)*)*' against `ab'.
1358 * regex.c (re_compile_fastmap): don't push `pend' on the failure
1361 Tue Dec 10 10:30:03 1991 Karl Berry (karl at hayley)
1363 * regex.c (PUSH_FAILURE_POINT): if pushing same failure point that
1364 is on the top of the stack, fail.
1365 (equivalent_failure_points): new routine.
1367 * regex.c (re_match_2): add debug statements for every opcode we
1370 * regex.c (regex_compile/handle_close): restore
1371 `fixup_inner_group_count' and `regnum' from the stack.
1373 Mon Dec 9 13:51:15 1991 Karl Berry (karl at hayley)
1375 * regex.c (PUSH_FAILURE_POINT): declare `this_reg' as int, so
1376 unsigned arithmetic doesn't happen when we don't want to save
1379 Tue Dec 3 08:11:10 1991 Karl Berry (karl at hayley)
1381 * regex.c (extend_bits_list): divide size by bits/block.
1383 * regex.c (init_bits_list): remove redundant assignment to
1386 * regexinc.c (partial_compiled_pattern_printer): don't do *p++
1387 twice in the same expr.
1389 * regex.c (re_match_2): at on_failure_jump, use the correct
1390 pattern positions for getting the stuff following the start_memory.
1392 * regex.c (struct register_info): remove the bits_list for the
1393 inner groups; make that a separate variable.
1395 Mon Dec 2 10:42:07 1991 Karl Berry (karl at hayley)
1397 * regex.c (PUSH_FAILURE_POINT): don't pass `failure_stack' as an
1398 arg; change callers.
1400 * regex.c (PUSH_FAILURE_POINT): print items in order they are
1402 (pop_failure_point): likewise.
1404 * regex.c (main): prompt for the pattern and string.
1406 * regex.c (FREE_VARIABLES) [!REGEX_MALLOC]: declare as nothing;
1407 remove #ifdefs from around calls.
1409 * regex.c (extract_number, extract_number_and_incr): declare static.
1411 * regex.c: remove the canned main program.
1413 * Makefile (COMMON): add main.o.
1415 Tue Sep 24 06:26:51 1991 Kathy Hargreaves (kathy at fosse)
1417 * regex.c (re_match_2): Made `pend' and `dend' not register variables.
1418 Only set string2 to string1 if string1 isn't null.
1419 Send address of p, d, regstart, regend, and reg_info to
1421 Put in more debug statements.
1423 * regex.c [debug]: Added global variable.
1424 (DEBUG_*PRINT*): Only print if `debug' is true.
1425 (DEBUG_DOUBLE_STRING_PRINTER): Changed DEBUG_STRING_PRINTER's
1427 Changed some comments.
1428 (PUSH_FAILURE_POINT): Moved and added some debugging statements.
1429 Was saving regstart on the stack twice instead of saving both
1430 regstart and regend; remedied this.
1431 [NUM_REGS_ITEMS]: Changed from 3 to 4, as now save lowest and
1432 highest active registers instead of highest used one.
1433 [NUM_NON_REG_ITEMS]: Changed name of NUM_OTHER_ITEMS to this.
1434 (NUM_FAILURE_ITEMS): Use active registers instead of number 0
1435 through highest used one.
1436 (re_match_2): Have pop_failure_point put things in the variables.
1437 (pop_failure_point): Have it do what the fail case in re_match_2
1438 did with the failure stack, instead of throwing away the stuff
1439 popped off. re_match_2 can ignore results when it doesn't
1443 Thu Sep 5 13:23:28 1991 Kathy Hargreaves (kathy at fosse)
1445 * regex.c (banner): Changed copyright years to be separate.
1447 * regex.c [CHAR_UNSIGNED]: Put __ at both ends of this name.
1448 [DEBUG, debug_count, *debug_p, DEBUG_PRINT_1, DEBUG_PRINT_2,
1449 DEBUG_COMPILED_PATTERN_PRINTER, DEBUG_STRING_PRINTER]:
1450 defined these for debugging.
1451 (extract_number): Added this (debuggable) routine version of
1452 the macro EXTRACT_NUMBER. Ditto for EXTRACT_NUMBER_AND_INCR.
1453 (re_compile_pattern): Set return_default_num_regs if the
1454 syntax bit RE_ALLOCATE_REGISTERS is set.
1455 [REGEX_MALLOC]: Renamed USE_ALLOCA to this.
1456 (BUF_POP): Got rid of this, as don't ever use it.
1457 (regex_compile): Made the type of `pattern' not be register.
1458 If DEBUG, print the pattern to compile.
1459 (re_match_2): If had a `$' in the pattern before a `^' then
1460 don't record the `^' as an anchor.
1461 Put (enum regexpcode) before references to b, as suggested
1462 [RE_NO_BK_BRACES]: Changed RE_NO_BK_CURLY_BRACES to this.
1463 (remove_pattern_offset): Removed this unused routine.
1464 (PUSH_FAILURE_POINT): Changed to only save active registers.
1465 Put in debugging statements.
1466 (re_compile_fastmap): Made `pattern' not a register variable.
1467 Use routine for extracting numbers instead of macro.
1468 (re_match_2): Made `p', `mcnt' and `mcnt2' not register variables.
1469 Added `num_regs_pushed' for debugging.
1470 Only malloc registers if the syntax bit RE_ALLOCATE_REGISTERS is set.
1471 Put in debug statements.
1472 Put the macro NOTE_INNER_GROUP's code inline, as it was
1473 only called in one place.
1474 For debugging, extract numbers using routines instead of macros.
1475 In case fail: only restore pushed active registers, and added
1476 debugging statements.
1477 (pop_failure_point): Test for underfull stack.
1478 (group_can_match_nothing, common_op_can_match_nothing): For
1479 debugging, extract numbers using routines instead of macros.
1480 (regexec): Changed formal parameters to not be prototypes.
1481 Don't initialize `regs' or `private_preg' in their declarations.
1483 Tue Jul 23 18:38:36 1991 Kathy Hargreaves (kathy at hayley)
1485 * regex.h [RE_CONTEXT_INDEP_OPS]: Moved the anchor stuff out of
1487 [RE_UNMATCHED_RIGHT_PAREN_ORD]: Defined this bit.
1488 [RE_CONTEXT_INVALID_ANCHORS]: Defined this bit.
1489 [RE_CONTEXT_INDEP_ANCHORS]: Defined this bit.
1490 Added RE_CONTEXT_INDEP_ANCHORS to all syntaxes which had
1491 RE_CONTEXT_INDEP_OPS.
1492 Took RE_ANCHORS_ONLY_AT_ENDS out of the POSIX basic syntax.
1493 Added RE_UNMATCHED_RIGHT_PAREN_ORD to the POSIX extended
1495 Took RE_REPEATED_ANCHORS_AWAY out of the POSIX extended syntax.
1496 Defined REG_NOERROR (which will probably have to go away again).
1497 Changed the type `off_t' to `regoff_t'.
1499 * regex.c: Changed some comments.
1500 (regex_compile): Added variable `had_an_endline' to keep track
1501 of if hit a `$' since the beginning of the pattern or the last
1502 alternative (if any).
1503 Changed RE_CONTEXT_INVALID_OPS and RE_CONTEXT_INDEP_OPS to
1504 RE_CONTEXT_INVALID_ANCHORS and RE_CONTEXT_INDEP_ANCHORS where
1506 Put a `no_op' in the pattern if a repeat is only zero or one
1507 times; in this case and if it is many times (whereupon a jump
1508 backwards is pushed instead), keep track of the operator for
1509 verify_and_adjust_endlines.
1510 If RE_UNMATCHED_RIGHT_PAREN_ORD is set, make an unmatched
1511 close-group operator match `)'.
1512 Changed all error exits to exit (1).
1513 (remove_pattern_offset): Added this routine, but don't use it.
1514 (verify_and_adjust_endlines): At top of routine, if initialize
1515 routines run out of memory, return true after setting
1516 enough_memory false.
1517 At end of endline, et al. case, don't set *p to no_op.
1518 Repetition operators also set the level and active groups'
1519 match statuses, unless RE_REPEATED_ANCHORS_AWAY is set.
1520 (get_group_match_status): Put a return in front of call to get_bit.
1521 (re_compile_fastmap): Changed is_a_succeed_n to a boolean.
1522 If at end of pattern, then if the failure stack isn't empty,
1523 go back to the failure point.
1524 In *jump* case, only pop the stack if what's on top of it is
1525 where we've just jumped to.
1526 (re_search_2): Return -2 instead of val if val is -2.
1527 (group_can_match_nothing, alternative_can_match_nothing,
1528 common_op_can_match_nothing): Now pass in reg_info for the
1530 (re_match_2): Don't skip over the next alternative also if
1531 empty alternatives aren't allowed.
1532 In fail case, if failed to a backwards jump that's part of a
1533 repetition loop, pop the current failure point and use the
1535 (pop_failure_point): Check that there are as many register items
1536 on the failure stack as the stack says there are.
1537 (common_op_can_match_nothing): Added variables `ret' and
1538 `reg_no' so can set reg_info for the group encountered.
1539 Also break without doing anything if hit a no_op or the other
1540 kinds of `endline's.
1541 If not done already, set reg_info in start_memory case.
1542 Put in no_pop_jump for an optimized succeed_n of zero repetitions.
1543 In succeed_n case, if the number isn't zero, then return false.
1544 Added `duplicate' case.
1546 Sat Jul 13 11:27:38 1991 Kathy Hargreaves (kathy at hayley)
1548 * regex.h (REG_NOERROR): Added this error code definition.
1550 * regex.c: Took some redundant parens out of macros.
1551 (enum regexpcode): Added jump_past_next_alt.
1552 Wrapped some macros in `do..while (0)'.
1553 Changed some comments.
1554 (regex_compile): Use `fixup_alt_jump' instead of `fixup_jump'.
1555 Use `maybe_pop_jump' instead of `maybe_pop_failure_jump'.
1556 Use `jump_past_next_alt' instead of `no_pop_jump' when at the
1557 end of an alternative.
1558 (re_match_2): Used REGEX_ALLOCATE for the registers stuff.
1559 In stop_memory case: Add more boolean tests to see if the
1561 Added jump_past_next_alt case, which doesn't jump over the
1562 next alternative if the last one didn't match anything.
1563 Unfortunately, to make this work with, e.g., `(a+?*|b)*'
1564 against `bb', I also had to pop the alternative's failure
1565 point, which in turn broke backtracking!
1566 In fail case: Detect a dummy failure point by looking at
1567 failure_stack.avail - 2, not stack[-2].
1568 (pop_failure_point): Only pop if the stack isn't empty; don't
1569 give an error if it is. (Not sure yet this is correct.)
1570 (group_can_match_nothing): Make it return a boolean instead of int.
1571 Make it take an argument indicating the end of where it should look.
1572 If find a group that can match nothing, set the pointer
1573 argument to past the group in the pattern.
1574 Took out cases which can share with alternative_can_match_nothing
1575 and call common_op_can_match_nothing.
1576 Took ++ out of switch, so could call common_op_can_match_nothing.
1577 Wrote lots more for on_failure_jump case to handle alternatives.
1578 Main loop now doesn't look for matching stop_memory, but
1579 rather the argument END; return true if hit the matching
1580 stop_memory; this way can call itself for inner groups.
1581 (alternative_can_match_nothing): Added for alternatives.
1582 (common_op_can_match_nothing): Added for previous two routines'
1584 (regerror): Returns a message saying there's no error if gets
1587 Wed Jul 3 10:43:15 1991 Kathy Hargreaves (kathy at hayley)
1589 * regex.c: Removed unnecessary enclosing parens from several macros.
1590 Put `do..while (0)' around a few.
1591 Corrected some comments.
1592 (INIT_FAILURE_STACK_SIZE): Deleted in favor of using
1594 (INIT_FAILURE_STACK, DOUBLE_FAILURE_STACK, PUSH_PATTERN_OP,
1595 PUSH_FAILURE_POINT): Made routines of the same name (but with all
1596 lowercase letters) into these macros, so could use `alloca'
1597 when USE_ALLOCA is defined. The reason is stated below for
1598 bits lists. Deleted analogous routines.
1599 (re_compile_fastmap): Added variable void *destination for
1601 (re_match_2): Added variable void *destination for REGEX_REALLOCATE.
1602 Used the failure stack macros in place of the routines.
1603 Detected a dummy failure point by inspecting the failure stack's
1604 (avail - 2)th element, not failure_stack.stack[-2]. This bug
1605 arose when used the failure stack macros instead of the routines.
1607 * regex.c [USE_ALLOCA]: Put this conditional around previous
1608 alloca stuff and defined these to work differently depending
1609 on whether or not USE_ALLOCA is defined:
1610 (REGEX_ALLOCATE): Uses either `alloca' or `malloc'.
1611 (REGEX_REALLOCATE): Uses either `alloca' or `realloc'.
1612 (INIT_BITS_LIST, EXTEND_BITS_LIST, SET_BIT_TO_VALUE): Defined
1613 macro versions of routines with the same name (only with all
1614 lowercase letters) so could use `alloca' in re_match_2. This
1615 is to prevent core leaks when C-g is used in Emacs and to make
1616 things faster and avoid storage fragmentation. These things
1617 have to be macros because the results of `alloca' go away with
1618 the routine by which it's called.
1619 (BITS_BLOCK_SIZE, BITS_BLOCK, BITS_MASK): Moved to above the
1620 above-mentioned macros instead of before the routines defined
1621 below regex_compile.
1622 (set_bit_to_value): Compacted some code.
1623 (reg_info_type): Changed inner_groups field to be bits_list_type
1624 so could be arbitrarily long and thus handle arbitrary nesting.
1625 (NOTE_INNER_GROUP): Put `do...while (0)' around it so could
1627 Changed code to use bits lists.
1628 Added variable void *destination for REGEX_REALLOCATE (whose call
1629 is several levels in).
1630 Changed variable name of `this_bit' to `this_reg'.
1631 (FREE_VARIABLES): Only define and use if USE_ALLOCA is defined.
1632 (re_match_2): Use REGEX_ALLOCATE instead of malloc.
1633 Instead of setting INNER_GROUPS of reg_info to zero, have to
1634 use INIT_BITS_LIST and return -2 (and free variables if
1635 USE_ALLOCA isn't defined) if it fails.
1637 Fri Jun 28 13:45:07 1991 Karl Berry (karl at hayley)
1639 * regex.c (re_match_2): set value of `dend' when we restore `d'.
1641 * regex.c: remove declaration of alloca.
1643 * regex.c (MISSING_ISGRAPH): rename to `ISGRAPH_MISSING'.
1645 * regex.h [_POSIX_SOURCE]: remove these conditionals; always
1647 * regex.c (_POSIX_SOURCE): change conditionals to use `POSIX'
1650 Sat Jun 1 16:56:50 1991 Kathy Hargreaves (kathy at hayley)
1652 * regex.*: Changed RE_CONTEXTUAL_* to RE_CONTEXT_*,
1653 RE_TIGHT_VBAR to RE_TIGHT_ALT, RE_NEWLINE_OR to
1654 RE_NEWLINE_ALT, and RE_DOT_MATCHES_NEWLINE to RE_DOT_NEWLINE.
1656 Wed May 29 09:24:11 1991 Karl Berry (karl at hayley)
1658 * regex.texinfo (POSIX Pattern Buffers): cross-reference the
1659 correct node name (Match-beginning-of-line, not ..._line).
1660 (Syntax Bits): put @code around all syntax bits.
1662 Sat May 18 16:29:58 1991 Karl Berry (karl at hayley)
1664 * regex.c (global): add casts to keep broken compilers from
1665 complaining about malloc and realloc calls.
1667 * regex.c (isgraph) [MISSING_ISGRAPH]: change test to this,
1668 instead of `#ifndef isgraph', since broken compilers can't
1669 have both a macro and a symbol by the same name.
1671 * regex.c (re_comp, re_exec) [_POSIX_SOURCE]: do not define.
1672 (regcomp, regfree, regexec, regerror) [_POSIX_SOURCE && !emacs]:
1673 only define in this case.
1675 Mon May 6 17:37:04 1991 Kathy Hargreaves (kathy at hayley)
1677 * regex.h (re_search, re_search_2): Changed BUFFER to not be const.
1679 * regex.c (re_compile_pattern): `^' is in a leading position if
1680 it precedes a newline.
1681 (various routines): Added or changed header comments.
1682 (double_pattern_offsets_list): Changed name from
1683 `extend_pattern_offsets_list'.
1684 (adjust_pattern_offsets_list): Changed return value from
1686 (verify_and_adjust_endlines): Now returns `true' and `false'
1688 `$' is in a leading position if it follows a newline.
1689 (set_bit_to_value, get_bit_value): Exit with error if POSITION < 0
1690 so now calling routines don't have to.
1691 (init_failure_stack, inspect_failure_stack_top,
1692 pop_failure_stack_top, push_pattern_op, double_failure_stack):
1693 Now return value unsigned instead of boolean.
1694 (re_search, re_search_2): Changed BUFP to not be const.
1695 (re_search_2): Added variable const `private_bufp' to send to
1697 (push_failure_point): Made return value unsigned instead of boolean.
1699 Sat May 4 15:32:22 1991 Kathy Hargreaves (kathy at hayley)
1701 * regex.h (re_compile_fastmap): Added extern for this.
1702 Changed some comments.
1704 * regex.c (re_compile_pattern): In case handle_bar: put invalid
1705 pattern test before levels matching stuff.
1706 Changed some comments.
1707 Added optimizing test for detecting an empty alternative that
1708 ends with a trailing '$' at the end of the pattern.
1709 (re_compile_fastmap): Moved failure_stack stuff to before this
1710 so could use it. Made its stack dynamic.
1711 Made it return an int so that it could return -2 if its stack
1712 couldn't be allocated.
1713 Added to header comment (about the return values).
1714 (init_failure_stack): Wrote so both re_match_2 and
1715 re_compile_fastmap could use similar stacks.
1716 (double_failure_stack): Added for above reasons.
1717 (push_pattern_op): Wrote for re_compile_fastmap.
1718 (re_search_2): Now return -2 if re_compile_fastmap does.
1719 (re_match_2): Made regstart and regend type failure_stack_element*.
1720 (push_failure_point): Made pattern_place and string_place type
1721 failure_stack_element*.
1722 Call double_failure_stack now.
1723 Return true instead of 1.
1725 Wed May 1 12:57:21 1991 Kathy Hargreaves (kathy at hayley)
1727 * regex.c (remove_intervening_anchors): Avoid erroneously making
1728 ops into no_op's by making them no_op only when they're beglines.
1729 (verify_and_adjust_endlines): Don't make '$' a normal character
1730 if it's before a newline.
1731 Look for the endline op in *p, not p[1].
1732 (failure_stack_element): Added this declaration.
1733 (failure_stack_type): Added this declaration.
1734 (INIT_FAILURE_STACK_SIZE, FAILURE_STACK_EMPTY,
1735 FAILURE_STACK_PTR_EMPTY, REMAINING_AVAIL_SLOTS): Added for
1737 (FAILURE_ITEM_SIZE, PUSH_FAILURE_POINT): Deleted.
1738 (FREE_VARIABLES): Now free failure_stack.stack instead of stackb.
1739 (re_match_2): deleted variables `initial_stack', `stackb',
1740 `stackp', and `stacke' and added `failure_stack' to replace them.
1741 Replaced calls to PUSH_FAILURE_POINT with those to
1743 (push_failure_point): Added for re_match_2.
1744 (pop_failure_point): Rewrote to use a failure_stack_type of stack.
1745 (can_match_nothing): Moved definition to below re_match_2.
1746 (bcmp_translate): Moved definition to below re_match_2.
1748 Mon Apr 29 14:20:54 1991 Kathy Hargreaves (kathy at hayley)
1750 * regex.c (enum regexpcode): Added codes endline_before_newline
1751 and repeated_endline_before_newline so could detect these
1752 types of endlines in the intermediate stages of a compiled
1754 (INIT_FAILURE_ALLOC): Renamed NFAILURES to this and set it to 5.
1755 (BUF_PUSH): Put `do {...} while 0' around this.
1756 (BUF_PUSH_2): Defined this to cut down on expansion of EXTEND_BUFFER.
1757 (regex_compile): Changed some comments.
1758 Now push endline_before_newline if find a `$' before a newline
1760 If a `$' might turn into an ordinary character, set laststart
1762 In '^' case, if syntax bit RE_TIGHT_VBAR is set, then for `^'
1763 to be in a leading position, it must be first in the pattern.
1764 Don't have to check in one of the else clauses that it's not set.
1765 If RE_CONTEXTUAL_INDEP_OPS isn't set but RE_ANCHORS_ONLY_AT_ENDS
1766 is, make '^' a normal character if it isn't first in the pattern.
1767 Can only detect at the end if a '$' after an alternation op is a
1768 trailing one, so can't immediately detect empty alternatives
1769 if a '$' follows a vbar.
1770 Added a picture of the ``success jumps'' in alternatives.
1771 Have to set bufp->used before calling verify_and_adjust_endlines.
1772 Also do it before returning all error strings.
1773 (remove_intervening_anchors): Now replaces the anchor with
1774 repeated_endline_before_newline if it's an endline_before_newline.
1775 (verify_and_adjust_endlines): Deleted SYNTAX parameter (could
1776 use bufp's) and added GROUP_FORWARD_MATCH_STATUS so could
1777 detect back references referring to empty groups.
1778 Added variable `bend' to point past the end of the pattern buffer.
1779 Added variable `previous_p' so wouldn't have to reinspect the
1780 pattern buffer to see what op we just looked at.
1781 Added endline_before_newline and repeated_endline_before_newline
1783 When checking if in a trailing position, added case where '$'
1784 has to be at the pattern's end if either of the syntax bits
1785 RE_ANCHORS_ONLY_AT_ENDS or RE_TIGHT_VBAR are set.
1786 Since `endline' can have the intermediate form `endline_in_repeat',
1787 have to change it to `endline' if RE_REPEATED_ANCHORS_AWAY
1789 Now disallow empty alternatives with trailing endlines in them
1790 if RE_NO_EMPTY_ALTS is set.
1791 Now don't make '$' an ordinary character if it precedes a newline.
1792 Don't make it an ordinary character if it's before a newline.
1793 Back references now affect the level matching something only if
1794 they refer to nonempty groups.
1795 (can_match_nothing): Now increment p1 in the switch, which
1796 changes many of the cases, but makes the code more like what
1797 it was derived from.
1798 Adjust the return statement to reflect above.
1799 (struct register_info): Made `can_match_nothing' field an int
1800 instead of a bit so could have -1 in it if never set.
1801 (MAX_FAILURE_ITEMS): Changed name from MAX_NUM_FAILURE_ITEMS.
1802 (FAILURE_ITEM_SIZE): Defined how much space a failure item uses.
1803 (PUSH_FAILURE_POINT): Changed variable `last_used_reg's name
1804 to `highest_used_reg'.
1805 Added variable `num_stack_items' and changed `len's name to
1807 Test failure stack limit in terms of number of items in it, not
1808 in terms of its length. rms' fix tested length against number
1809 of items, which was a misunderstanding.
1810 Use `realloc' instead of `alloca' to extend the failure stack.
1811 Use shifts instead of multiplying by 2.
1812 (FREE_VARIABLES): Free `stackb' instead of `initial_stack', as
1813 it may have been reallocated.
1814 (re_match_2): When mallocing `initial_stack', now multiply
1815 the number of items wanted (what was there before) by
1817 (pop_failure_point): Need this procedure form of the macro of
1818 the same name for debugging, so left it in and deleted the
1820 (regcomp): Don't free the pattern buffer's translate field.
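The `do {...} while 0' wrapper mentioned for BUF_PUSH in this entry can be illustrated as follows. This is only a sketch of the idiom: the fixed-size `buffer', the counter `used' and the function push_if are hypothetical, and the real macro extends the pattern buffer rather than dropping bytes.

```c
#include <stddef.h>

static unsigned char buffer[8];
static size_t used;

/* The do/while (0) wrapper makes the multi-statement macro behave
   as a single statement, so it is safe in an unbraced if/else.  */
#define BUF_PUSH(c)                             \
  do {                                          \
    if (used < sizeof buffer)                   \
      buffer[used++] = (unsigned char) (c);     \
  } while (0)

/* Without the wrapper, the `else' below would bind to the macro's
   inner `if' or fail to compile, depending on the expansion.  */
static void
push_if (int condition, int a, int b)
{
  if (condition)
    BUF_PUSH (a);
  else
    BUF_PUSH (b);
}
```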
1822 Mon Apr 15 09:47:47 1991 Kathy Hargreaves (kathy at hayley)
1824 * regex.h (RE_DUP_MAX): Moved to outside of #ifdef _POSIX_SOURCE.
1825 * regex.c (#include <sys/types.h>): Removed #ifdef _POSIX_SOURCE
1827 (malloc, realloc): Made return type void* #ifdef __STDC__.
1828 (enum regexpcode): Added endline_in_repeat for the compiler's
1829 use; this never ends up on the final compiled pattern.
1830 (INIT_PATTERN_OFFSETS_LIST_SIZE): Initial size for
1831 pattern_offsets_list_type.
1832 (pattern_offset_type): Type for pattern offsets.
1833 (pattern_offsets_list_type): Type for keeping a list of
1835 (anchor_list_type): Changed to above type.
1836 (PATTERN_OFFSETS_LIST_PTR_FULL): Tests if a pattern offsets
1838 (ANCHOR_LIST_PTR_FULL): Changed to above.
1839 (BIT_BLOCK_SIZE): Changed to BITS_BLOCK_SIZE and moved to
1840 above bits list routines below regex_compile.
1841 (op_list_type): Defined to be pattern_offsets_list_type.
1842 (compile_stack_type): Changed offsets to be
1843 pattern_offset_type instead of unsigned.
1844 (pointer): Changed the name of all structure fields from this
1846 (COMPILE_STACK_FULL): Changed so the stack is full if `avail'
1847 is equal to `size' instead of `size' - 1.
1848 (GET_BUFFER_SPACE): Changed `>=' to `>' in the while statement.
1849 (regex_compile): Added variable `enough_memory' so could check
1850 that routine that verifies '$' positions could return an
1852 (group_count): Deleted this variable, as `regnum' already does
1854 (op_list): Added this variable to keep track of operations
1855 needed for verifying '$' positions.
1856 (anchor_list): Now initialize using routine
1857 `init_pattern_offsets_list'.
1858 Consolidated the three bits_list initializations.
1859 In case '$': Instead of trying to go past constructs which can
1860 follow '$', merely detect the special case where it has to be
1861 at the pattern's end, fix up any fixup jumps if necessary,
1862 record the anchor if necessary and add an `endline' (and
1863 possibly two `no-op's) to the pattern; will call a routine at
1864 the end to verify if it's in a valid position or not.
1865 (init_pattern_offsets_list): Added to initialize pattern
1867 (extend_anchor_list): Renamed this extend_pattern_offsets_list
1868 and renamed parameters and internal variables appropriately.
1869 (add_pattern_offset): Added this routine which both
1870 record_anchor_position and add_op call.
1871 (adjust_pattern_offsets_list): Add this routine to adjust by
1871 some increment all the pattern offsets in a list of such after a
1874 (record_anchor_position): Now send in offset instead of
1875 calculating it and just call add_pattern_offset.
1876 (adjust_anchor_list): Replaced by above routine.
1877 (remove_intervening_anchors): If the anchor is an `endline'
1878 then replace it with `endline_in_repeat' instead of `no_op'.
1879 (add_op): Added this routine to call in regex_compile
1880 wherever push something relevant to verifying '$' positions.
1881 (verify_and_adjust_endlines): Added routine to (1) verify that
1882 '$'s in a pattern buffer (represented by `endline') were in
1883 valid positions and (2) whether or not they were anchors.
1884 (BITS_BLOCK_SIZE): Renamed BIT_BLOCK_SIZE and moved to right
1885 above bits list routines.
1886 (BITS_BLOCK): Defines which array element of a bits list the
1887 bit corresponding to a given position is in.
1888 (BITS_MASK): Has a 1 where the bit (in a bit list array element)
1889 for a given position is.
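As a rough illustration of BITS_BLOCK and BITS_MASK as described in this entry, the sketch below assumes a bits list is a plain array of `unsigned'. The macro bodies are a guess at the usual formulation, not copied from regex.c, and unlike the real set_bit_to_value this version does not extend the list.

```c
#include <limits.h>

/* Number of bits in one array element of a bits list.  */
#define BITS_BLOCK_SIZE (sizeof (unsigned) * CHAR_BIT)

/* Which array element holds the bit for POSITION.  */
#define BITS_BLOCK(position) ((position) / BITS_BLOCK_SIZE)

/* A 1 where the bit for POSITION is within that element.  */
#define BITS_MASK(position) (1u << ((position) % BITS_BLOCK_SIZE))

/* Set or clear the bit for POSITION.  Assumes BITS is big enough.  */
static void
set_bit_to_value (unsigned *bits, unsigned position, unsigned value)
{
  if (value)
    bits[BITS_BLOCK (position)] |= BITS_MASK (position);
  else
    bits[BITS_BLOCK (position)] &= ~BITS_MASK (position);
}

/* Return 1 if the bit for POSITION is set, 0 otherwise.  */
static unsigned
get_bit (const unsigned *bits, unsigned position)
{
  return (bits[BITS_BLOCK (position)] & BITS_MASK (position)) != 0;
}
```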
1891 Mon Apr 1 12:09:06 1991 Kathy Hargreaves (kathy at hayley)
1893 * regex.c (BIT_BLOCK_SIZE): Defined this for using with
1894 bits_list_type, abstracted from level_list_type so could use
1895 for more things than just the level match status.
1896 (regex_compile): Renamed `level_list' variable to
1897 `level_match_status'.
1898 Added variable `group_match_status' of type bits_list_type.
1899 Kept track of whether or not for all groups any of them
1900 matched other than the empty string, so detect if a back
1901 reference in front of a '^' made it nonleading or not.
1902 Do this by setting a match status bit for all active groups
1903 whenever leave a group that matches other than the empty string.
1904 Could detect which groups are active by going through the
1905 stack each time, but or-ing a bits list of active groups with
1906 a bits list of group match status is faster, so make a bits
1907 list of active groups instead.
1908 Have to check that '^' isn't in a leading position before
1909 going to normal_char.
1910 Whenever set level match status of the current level, also set
1911 the match status of all active groups.
1912 Increase the group count and make that group active whenever
1914 When close a group, only set the next level down if the
1915 current level matches other than the empty string, and make
1916 the current group inactive.
1917 At a back reference, only set a level's match status if the
1918 group to which the back reference refers matches other than
1920 (init_bits_list): Added to initialize a bits list.
1921 (get_level_value): Deleted this. (Made into
1922 get_level_match_status.)
1923 (extend_bits_list): Added to extend a bits list. (Made this
1924 from deleted routine `extend_level_list'.)
1925 (get_bit): Added to get a bit value from a bits list. (Made
1926 this from deleted routine `get_level_value'.)
1927 (set_bit_to_value): Added to set a bit in a bits list. (Made
1928 this from deleted routine `set_level_value'.)
1929 (get_level_match_status): Added this to get the match status
1930 of a given level. (Made from get_level_value.)
1931 (set_this_level, set_next_lower_level): Made all routines
1932 which set bits extend the bits list if necessary, thus they
1933 now return an unsigned value to indicate whether or not the
1934 reallocation failed.
1935 (increase_level): No longer extends the level list.
1936 (make_group_active): Added to mark as active a given group in
1937 an active groups list.
1938 (make_group_inactive): Added to mark as inactive a given group
1939 in an active groups list.
1940 (set_match_status_of_active_groups): Added to set the match
1941 status of all currently active groups.
1942 (get_group_match_status): Added to get a given group's match status.
1943 (no_levels_match_anything): Removed the parameter LEVEL.
1944 (PUSH_FAILURE_POINT): Added rms' bug fix and changed RE_NREGS
1945 to num_internal_regs.
1947 Sun Mar 31 09:04:30 1991 Kathy Hargreaves (kathy at hayley)
1949 * regex.h (RE_ANCHORS_ONLY_AT_ENDS): Added syntax so could
1950 constrain '^' and '$' to only be anchors if at the beginning
1951 and end of the pattern.
1952 (RE_SYNTAX_POSIX_BASIC): Added the above bit.
1954 * regex.c (enum regexpcode): Changed `unused' to `no_op'.
1955 (this_and_lower_levels_match_nothing): Deleted forward reference.
1956 (regex_compile): case '^': if the syntax bit RE_ANCHORS_ONLY_AT_ENDS
1957 is set, then '^' is only an anchor if at the beginning of the
1958 pattern; only record anchor position if the syntax bit
1959 RE_REPEATED_ANCHORS_AWAY is set; the '^' is a normal char if
1960 the syntax bit RE_ANCHORS_ONLY_AT_ENDS is set and we're not at
1961 the beginning of the pattern (and neither RE_CONTEXTUAL_INDEP_OPS
1962 nor RE_CONTEXTUAL_INDEP_OPS syntax bits are set).
1963 Only adjust the anchor list if the syntax bit
1964 RE_REPEATED_ANCHORS_AWAY is set.
1966 * regex.c (level_list_type): Use to detect when '^' is
1967 in a leading position.
1968 (regex_compile): Added level_list_type level_list variable in
1969 which we keep track of whether or not a grouping level (in its
1970 current or most recent incarnation) matches anything besides the
1971 empty string. Set the bit for the i-th level when detect it
1972 should match something other than the empty string and the bit
1973 for the (i-1)-th level when leave the i-th group. Clear all
1974 bits for the i-th and higher levels if none of 0--(i - 1)-th's
1975 bits are set when encounter an alternation operator on that
1976 level. If no levels are set when hit a '^', then it is in a
1977 leading position. We keep track of which level we're at by
1978 increasing a variable current_level whenever we encounter an
1979 open-group operator and decreasing it whenever we encounter a
1980 close-group operator.
1981 Have to adjust the anchor list contents whenever insert
1982 something ahead of them (such as on_failure_jump's) in the
1984 (adjust_anchor_list): Adjusts the offsets in an anchor list by
1985 a given increment starting at a given start position.
1986 (get_level_value): Returns the bit setting of a given level.
1987 (set_level_value): Sets the bit of a given level to a given value.
1988 (set_this_level): Sets (to 1) the bit of a given level.
1989 (set_next_lower_level): Sets (to 1) the bit of (LEVEL - 1) for a
1991 (clear_this_and_higher_levels): Clears the bits for a given
1992 level and any higher levels.
1993 (extend_level_list): Adds sizeof(unsigned) more bits to a level list.
1994 (increase_level): Increases by 1 the value of a given level variable.
1995 (decrease_level): Decreases by 1 the value of a given level variable.
1996 (lower_levels_match_nothing): Checks if any levels lower than
1997 the given one match anything.
1998 (no_levels_match_anything): Checks if any levels match anything.
1999 (re_match_2): At case wordbeg: before looking at d-1, check that
2000 we're not at the string's beginning.
2001 At case wordend: Added some illuminating parentheses.
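The level-list bookkeeping described in this entry for deciding whether a '^' is in a leading position can be sketched as a toy scanner. Everything here is an assumption made for illustration: the function caret_is_leading is hypothetical, the pattern uses bare `(', `)' and `|' as operators with no escapes, every other character is taken to match something nonempty, nesting is shallower than the width of `unsigned', and a group only marks the next lower level when it matched something itself.

```c
#include <stddef.h>

/* Return 1 if the '^' at PATTERN[CARET] is in a leading position,
   i.e. no grouping level has matched anything nonempty before it.  */
static int
caret_is_leading (const char *pattern, size_t caret)
{
  unsigned levels = 0;         /* Bit i set: level i matches nonempty.  */
  unsigned current_level = 0;
  unsigned lower;
  size_t i;

  for (i = 0; i < caret; i++)
    switch (pattern[i])
      {
      case '(':
        current_level++;
        break;

      case ')':
        /* Leaving the i-th group: set the (i-1)-th level's bit.  */
        if (current_level > 0)
          {
            if (levels & (1u << current_level))
              levels |= 1u << (current_level - 1);
            current_level--;
          }
        break;

      case '|':
        /* If no lower level is set, the new alternative starts
           fresh: clear this level's bit and all higher ones.  */
        lower = (1u << current_level) - 1;
        if ((levels & lower) == 0)
          levels &= lower;
        break;

      case '^':
      case '$':
        /* Anchors match only the empty string.  */
        break;

      default:
        levels |= 1u << current_level;
        break;
      }

  return levels == 0;
}
```

Under these assumptions the '^' in "a|^b" and "(a|^b)" is leading, while the one in "a(b|^c)" and "(a)^b" is not.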
2003 Mon Mar 25 13:58:51 1991 Kathy Hargreaves (kathy at hayley)
2005 * regex.h (RE_NO_ANCHOR_AT_NEWLINE): Changed syntax bit name
2006 from RE_ANCHOR_NOT_NEWLINE because an anchor never matches the
2007 newline itself, just the empty string either before or after it.
2008 (RE_REPEATED_ANCHORS_AWAY): Added this syntax bit for ignoring
2009 anchors inside groups which are operated on by repetition
2011 (RE_DOT_MATCHES_NEWLINE): Added this bit so the match-any-character
2012 operator could match a newline when it's set.
2013 (RE_SYNTAX_POSIX_BASIC): Set RE_DOT_MATCHES_NEWLINE in this.
2014 (RE_SYNTAX_POSIX_EXTENDED): Set RE_DOT_MATCHES_NEWLINE and
2015 RE_REPEATED_ANCHORS_AWAY in this.
2016 (regerror): Changed prototypes to new POSIX spec.
2018 * regex.c (anchor_list_type): Added so could null out anchors inside
2020 (ANCHOR_LIST_PTR_FULL): Added for above type.
2021 (compile_stack_element): Changed name from stack_element.
2022 (compile_stack_type): Changed name from compile_stack.
2023 (INIT_COMPILE_STACK_SIZE): Changed name from INIT_STACK_SIZE.
2024 (COMPILE_STACK_EMPTY): Changed name from STACK_EMPTY.
2025 (COMPILE_STACK_FULL): Changed name from STACK_FULL.
2026 (regex_compile): Changed SYNTAX parameter to non-const.
2027 Changed variable name `stack' to `compile_stack'.
2028 If syntax bit RE_REPEATED_ANCHORS_AWAY is set, then naively put
2029 anchors in a list when encounter them and then set them to
2030 `unused' when detect they are within a group operated on by a
2031 repetition operator. Need something more sophisticated than
2032 this, as they should only get set to `unused' if they are in
2033 positions where they would be anchors. Also need a better way to
2034 detect contextually invalid anchors.
2035 Changed some comments.
2036 (is_in_compile_stack): Changed name from `is_in_stack'.
2037 (extend_anchor_list): Added to do anchor stuff.
2038 (record_anchor_position): Added to do anchor stuff.
2039 (remove_intervening_anchors): Added to do anchor stuff.
2040 (re_match_2): Now match a newline with the match-any-character
2041 operator if RE_DOT_MATCHES_NEWLINE is set.
2042 Compacted some code.
2043 (regcomp): Added new POSIX newline information to the header
2045 If REG_NEWLINE cflag is set, then now unset RE_DOT_MATCHES_NEWLINE
2047 (put_in_buffer): Added to do new POSIX regerror spec. Called
2049 (regerror): Changed to take a pattern buffer, error buffer and
2050 its size, and return type `size_t', the size of the full error
2051 message, and the first ERRBUF_SIZE - 1 characters of the full
2052 error message in the error buffer.
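The regerror contract described in this entry (return the size of the full message, store at most ERRBUF_SIZE - 1 characters) can be sketched with a standalone helper. The name copy_error_message is hypothetical, and counting the terminating null in the returned size follows the POSIX convention, which is an assumption about this version of the code.

```c
#include <stddef.h>
#include <string.h>

/* Copy as much of MSG as fits into ERRBUF, always null-terminating
   when ERRBUF_SIZE is nonzero.  Return the size needed to hold the
   whole message, including the terminating null.  */
static size_t
copy_error_message (const char *msg, char *errbuf, size_t errbuf_size)
{
  size_t msg_len = strlen (msg);

  if (errbuf_size > 0)
    {
      size_t n = msg_len < errbuf_size - 1 ? msg_len : errbuf_size - 1;

      memcpy (errbuf, msg, n);
      errbuf[n] = '\0';
    }
  return msg_len + 1;
}
```

A caller can compare the return value with its buffer size to detect truncation and retry with a larger buffer.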
2054 Wed Feb 27 16:38:33 1991 Kathy Hargreaves (kathy at hayley)
2056 * regex.h (#include <sys/types.h>): Removed this as new POSIX
2057 standard has the user include it.
2058 (RE_SYNTAX_POSIX_BASIC and RE_SYNTAX_POSIX_EXTENDED): Removed
2059 RE_HAT_LISTS_NOT_NEWLINE as new POSIX standard has the cflag
2060 REG_NEWLINE now set this. Similarly, added syntax bit
2061 RE_ANCHOR_NOT_NEWLINE as this is now unset by REG_NEWLINE.
2062 (RE_SYNTAX_POSIX_BASIC): Removed syntax bit
2063 RE_NO_CONSECUTIVE_REPEATS as POSIX now allows them.
2065 * regex.c (#include <sys/types.h>): Added this as new POSIX
2066 standard has the user include it instead of us putting it in
2068 (extern char *re_syntax_table): Made into an extern so the
2069 user could allocate it.
2070 (DO_RANGE): If don't find a range end, now goto invalid_range_end
2071 instead of unmatched_left_bracket.
2072 (regex_compile): Made variable SYNTAX non-const.????
2073 Reformatted some code.
2074 (re_compile_fastmap): Moved is_a_succeed_n's declaration to
2076 Compacted some code.
2077 (SET_NEWLINE_FLAG): Removed and put inline.
2078 (regcomp): Made variable `syntax' non-const so can unset
2079 RE_ANCHOR_NOT_NEWLINE syntax bit if cflag REG_NEWLINE is set.
2080 If cflag REG_NEWLINE is set, set the RE_HAT_LISTS_NOT_NEWLINE
2081 syntax bit and unset RE_ANCHOR_NOT_NEWLINE one of `syntax'.
2083 Wed Feb 20 16:33:38 1991 Kathy Hargreaves (kathy at hayley)
2085 * regex.h (RE_NO_CONSECUTIVE_REPEATS): Changed name from
2086 RE_NO_CONSEC_REPEATS.
2087 (REG_ENESTING): Deleted this POSIX return value, as the stack
2089 (struct re_pattern_buffer): Changed some comments.
2090 (re_compile_pattern): Changed a comment.
2091 Deleted check on stack upper bound and corresponding error.
2092 Now when there's no interval contents and it's the end of the
2093 pattern, go to unmatched_left_curly_brace instead of end_of_pattern.
2094 Removed nesting_too_deep error, as the stack is now unbounded.
2095 (regcomp): Removed REG_ENESTING case, as the stack is now unbounded.
2096 (regerror): Removed REG_ENESTING case, as the stack is now unbounded.
2098 * regex.c (MAX_STACK_SIZE): Deleted because don't need upper
2099 bound on array indexed with an unsigned number.
2101 Sun Feb 17 15:50:24 1991 Kathy Hargreaves (kathy at hayley)
2103 * regex.h: Changed and added some comments.
2105 * regex.c (init_syntax_once): Made `_' a word character.
2106 (re_compile_pattern): Added a comment.
2107 (re_match_2): Redid header comment.
2108 (regexec): With header comment about PMATCH, corrected and
2109 removed details found in regex.h, adding a reference.
2111 Fri Feb 15 09:21:31 1991 Kathy Hargreaves (kathy at hayley)
2113 * regex.c (DO_RANGE): Removed argument parentheses.
2114 Now get untranslated range start and end characters and set
2115 list bits for the translated (if at all) versions of them and
2116 all characters between them.
2117 (re_match_2): Now use regs->num_regs instead of num_regs_wanted
2119 (regcomp): Now build case-fold translate table using isupper
2120 and tolower facilities so will work on foreign language characters.
2122 Sat Feb 9 16:40:03 1991 Kathy Hargreaves (kathy at hayley)
2124 * regex.h (RE_HAT_LISTS_NOT_NEWLINE): Changed syntax bit name
2125 from RE_LISTS_NOT_NEWLINE as it only affects nonmatching lists.
2126 Changed all references to the match-beginning-of-string
2127 operator to match-beginning-of-line operator, as this is what
2129 (RE_NO_CONSEC_REPEATS): Added this syntax bit.
2130 (RE_SYNTAX_POSIX_BASIC): Added above bit to this.
2131 (REG_PREMATURE_END): Changed name to REG_EEND.
2132 (REG_EXCESS_NESTING): Changed name to REG_ENESTING.
2133 (REG_TOO_BIG): Changed name to REG_ESIZE.
2134 (REG_INVALID_PREV_RE): Deleted this return POSIX value.
2135 Added and changed some comments.
2137 * regex.c (re_compile_pattern): Now sets the pattern buffer's
2138 `return_default_num_regs' field.
2139 (typedef struct stack_element, stack_type, INIT_STACK_SIZE,
2140 MAX_STACK_SIZE, STACK_EMPTY, STACK_FULL): Added for regex_compile.
2141 (INIT_BUF_SIZE): Changed value from 28 to 32.
2142 (BUF_PUSH): Changed name from BUFPUSH.
2143 (MAX_BUF_SIZE): Added so could use in many places.
2144 (IS_CHAR_CLASS_STRING): Replaced is_char_class with this.
2145 (regex_compile): Added a stack which could grow dynamically
2146 and which has struct elements.
2147 Go back to initializing `zero_times_ok' and `many_times_ok' to
2148 0 and |=ing them inside the loop.
2149 Now disallow consecutive repetition operators if the syntax
2150 bit RE_NO_CONSEC_REPEATS is set.
2151 Now detect trailing backslash when the compiler is expecting a
2153 Changed calls to GET_BUFFER_SPACE which asked for 6 to ask for
2154 3, as that's all they needed.
2155 Now check for trailing backslash inside lists.
2156 Now disallow an empty alternative right before an end-of-line
2158 Now get buffer space before leaving space for a fixup jump.
2159 Now check if at pattern end when at open-interval operator.
2160 Added some comments.
2161 Now check if non-interval repetition operators follow an
2162 interval one if the syntax bit RE_NO_CONSEC_REPEATS is set.
2163 Now only check if what precedes an interval repetition
2164 operator isn't a regular expression which matches one
2165 character if the syntax bit RE_NO_CONSEC_REPEATS is set.
2166 Now return "Unmatched [ or [^" instead of "Unmatched [".
2167 (is_in_stack): Added to check if a given register number is in
2169 (re_match_2): If initial variable allocations fail, return -2,
2171 Now set reg's `num_regs' field when allocating regs.
2172 Now before allocating them, free regs->start and end if they
2173 aren't NULL and return -2 if either allocation fails.
2174 Now use regs->num_regs instead of num_regs_wanted to control
2176 Now increment past the newline when matching it with an
2177 end-of-line operator.
2178 (regcomp): Added to the header comment.
2179 Now return REG_ESUBREG if regex_compile returns "Unmatched [
2180 or [^" instead of doing so if it returns "Unmatched [".
2181 Now return REG_BADRPT if in addition to returning "Missing
2182 preceding regular expression", regex_compile returns "Invalid
2183 preceding regular expression".
2184 Now return new return value names (see regex.h changes).
2185 (regexec): Added to header comment.
2186 Initialize regs structure.
2187 Now match whole string.
2188 Now always free regs.start and regs.end instead of just when
2190 (regerror): Now return "Regex error: Unmatched [ or [^.\n"
2191 instead of "Regex error: Unmatched [.\n".
2192 Now return "Regex error: Preceding regular expression either
2193 missing or not simple.\n" instead of "Regex error: Missing
2194 preceding regular expression.\n".
2195 Removed REG_INVALID_PREV_RE case (it got subsumed into the
2198 Thu Jan 17 09:52:35 1991 Kathy Hargreaves (kathy at hayley)
2200 * regex.h: Changed a comment.
2202 * regex.c: Changed and added large header comments.
2203 (re_compile_pattern): Now if detect that `laststart' for an
2204 interval points to a byte code for a regular expression which
2205 matches more than one character, make it an internal error.
2206 (regerror): Return error message, don't print it.
2208 Tue Jan 15 15:32:49 1991 Kathy Hargreaves (kathy at hayley)
2210 * regex.h (regcomp return codes): Added GNU ones.
2211 Updated some comments.
2213 * regex.c (DO_RANGE): Changed `obscure_syntax' to `syntax'.
2214 (regex_compile): Added `following_left_brace' to keep track of
2215 where pseudo interval following a valid interval starts.
2216 Changed some instances that returned "Invalid regular
2217 expression" to instead return error strings coinciding with
2219 Changed some comments.
2220 Now consider only things between `[:' and `:]' to be possible
2221 character class names.
2222 Now a character class expression can't end a pattern; at
2223 least a `]' must close the list.
2224 Now if the syntax bit RE_NO_BK_CURLY_BRACES is set, then a
2225 valid interval must be followed by yet another to get an error
2226 for preceding an interval (in this case, the second one) with
2227 a regular expression that matches more than one character.
2228 Now if what follows a valid interval begins with an open
2229 interval operator but doesn't begin a valid interval, then set
2230 following_left_brace to it, put it in C and go to
2232 Added some comments.
2233 Return "Invalid character class name" instead of "Invalid
2235 (regerror): Return messages for all POSIX error codes except
2236 REG_ECOLLATE and REG_ENEWLINE, along with all GNU error codes.
2237 Added `break's after all cases.
2238 (main): Call re_set_syntax instead of setting `obscure_syntax'
2241 Sat Jan 12 13:37:59 1991 Kathy Hargreaves (kathy at hayley)
2243 * regex.h (Copyright): Updated date.
2244 (#include <sys/types.h>): Include unconditionally.
2245 (RE_CANNOT_MATCH_NEWLINE): Deleted this syntax bit.
2246 (RE_SYNTAX_POSIX_BASIC, RE_SYNTAX_POSIX_EXTENDED): Removed
2247 setting the RE_ANCHOR_NOT_NEWLINE syntax bit from these.
2248 Changed and added some comments.
2249 (struct re_pattern_buffer): Changed some flags from chars to bits.
2250 Added field `syntax'; holds which syntax pattern was compiled with.
2251 Added bit flag `return_default_num_regs'.
2252 (externs for GNU and Berkeley UNIX routines): Added `const's to
2253 parameter types to be compatible with POSIX.
2254 (#define const): Added to support old C compilers.
2256 * regex.c (Copyright): Updated date.
2257 (enum regexpcode): Deleted `newline'.
2258 (regex_compile): Renamed re_compile_pattern to this, added a
2259 syntax parameter so it can set the pattern buffer's `syntax'
2261 Made `pattern', and `size' `const's so could pass to POSIX
2262 interface routines; also made `const' whatever interval
2263 variables had to be to make this work.
2264 Changed references to `obscure_syntax' to new parameter `syntax'.
2265 Deleted putting `newline' in buffer when see `\n'.
2266 Consider invalid character classes which have nothing wrong
2267 except the character class name; if so, return character-class error.
2268 (is_char_class): Added routine for regex_compile.
2269 (re_compile_pattern): added a new one which calls
2270 regex_compile with `obscure_syntax' as the actual parameter
2271 for the formal `syntax'.
2272 Gave this the old routine's header comments.
2273 Made `pattern', and `size' `const's so could use POSIX interface
2275 (re_search, re_search_2, re_match, re_match_2): Changed
2277 (re_search_2, re_match_2): Changed `mstop' to `stop'.
2278 (re_search, re_search_2): Made all parameters except `regs'
2279 `const's so could use POSIX interface routines parameters.
2280 (re_search_2): Added private copies of `const' parameters so
2281 could change their values.
2282 (re_match_2): Made all parameters except `regs' `const's so
2283 could use POSIX interface routines parameters.
2284 Changed `size1' and `size2' parameters to `size1_arg' and
2285 `size2_arg' and so could change; added local `size1' and
2286 `size2' and set to these.
2287 Added some comments.
2288 Deleted `newline' case.
2289 `begline' can also possibly match if `d' contains a newline;
2290 if it does, we have to increment d to point past the newline.
2291 Replaced references to `obscure_syntax' with `bufp->syntax'.
2292 (re_comp, re_exec): Made parameter `s' a `const' so could use POSIX
2293 interface routines parameters.
2294 Now call regex_compile, passing `obscure_syntax' via the
2296 (re_exec): Made local `len' a `const' so could pass to re_search.
2297 (regcomp): Added header comment.
2298 Added local `syntax' to set and pass to regex_compile rather
2299 than setting global `obscure_syntax' and passing it.
2300 Call regex_compile with its `syntax' parameter rather than
2302 Return REG_ECTYPE if character-class error.
2303 (regexec): Don't initialize `regs' to anything.
2304 Made `private_preg' a nonpointer so could set to what the
2305 constant `preg' points.
2306 Initialize `private_preg's `return_default_num_regs' field to
2307 zero because want to return `nmatch' registers, not however
2308 many there are subexpressions in the pattern.
2309 Also test if `nmatch' > 0 to see if should pass re_match `regs'.
2311 Tue Jan 8 15:57:17 1991 Kathy Hargreaves (kathy at hayley)
2313 * regex.h (struct re_pattern_buffer): Reworded comment.
2315 * regex.c (EXTEND_BUFFER): Also reset beg_interval.
2316 (re_search_2): Return val if val = -2.
2317 (NUM_REG_ITEMS): Listed items in comment.
2318 (NUM_OTHER_ITEMS): Defined this for using in > 1 definition.
2319 (MAX_NUM_FAILURE_ITEMS): Replaced `+ 2' with NUM_OTHER_ITEMS.
2320 (NUM_FAILURE_ITEMS): As with definition above and added to
2322 (PUSH_FAILURE_POINT): Replaced `* 2's with `<< 1's.
2323 (re_match_2): Test with equality with 1 to see pbufp->bol and
2326 Fri Jan 4 15:07:22 1991 Kathy Hargreaves (kathy at hayley)
2328 * regex.h (struct re_pattern_buffer): Reordered some fields.
2329 Updated some comments.
2330 Added not_bol and not_eol fields.
2331 (extern regcomp, regexec, regerror): Added return types.
2332 (extern regfree): Added `extern'.
2334 * regex.c (min): Deleted unused macro.
2335 (re_match_2): Compacted some code.
2336 Removed call to macro `min' from `for' loop.
2337 Fixed so unused registers get filled with -1's.
2338 Fail if the pattern buffer's `not_bol' field is set and
2339 encounter a `begline'.
2340 Fail if the pattern buffer's `not_eol' field is set and
2341 encounter an `endline'.
2342 Deleted redundant check for empty stack in fail case.
2343 Don't free pattern buffer's components in re_comp.
2344 (regexec): Initialize variable regs.
2345 Added `private_preg' pattern buffer so could set `not_bol' and
2346 `not_eol' fields and hand to re_match.
2347 Deleted naive attempt to detect anchors.
2348 Set private pattern buffer's `not_bol' and `not_eol' fields
2349 according to eflags value.
2350 `nmatch' must also be > 0 for us to bother allocating
2351 registers to send to re_match and filling pmatch
2352 with their results after the call to re_match.
2353 Send private pattern buffer instead of argument to re_match.
2354 If use the registers, always free them and then set them to NULL.
2355 (regerror): Added this Posix routine.
2356 (regfree): Added this Posix routine.
2358 Tue Jan 1 15:02:45 1991 Kathy Hargreaves (kathy at hayley)
2360 * regex.h (RE_NREGS): Deleted this definition, as now the user
2361 can choose how many registers to have.
2362 (REG_NOTBOL, REG_NOTEOL): Defined these Posix eflag bits.
2363 (REG_NOMATCH, REG_BADPAT, REG_ECOLLATE, REG_ECTYPE,
2364 REG_EESCAPE, REG_ESUBREG, REG_EBRACK, REG_EPAREN, REG_EBRACE,
2365 REG_BADBR, REG_ERANGE, REG_ESPACE, REG_BADRPT, REG_ENEWLINE):
2366 Defined these return values for Posix's regcomp and regexec.
2367 Updated some comments.
2368 (struct re_pattern_buffer): Now typedef this as regex_t
2369 instead of the other way around.
2370 (struct re_registers): Added num_regs field. Made start and
2371 end fields pointers to char instead of fixed size arrays.
2372 (regmatch_t): Added this Posix register type.
2373 (regcomp, regexec, regerror, regfree): Added externs for these
2376 * regex.c (enum boolean): Typedefed this.
2377 (re_pattern_buffer): Reformatted some comments.
2378 (re_compile_pattern): Updated some comments.
2379 Always push start_memory and its attendant number whenever
2380 encounter a group, not just when its number is less than the
2381 previous maximum number of registers; same for stop_memory.
2382 Get 4 bytes of buffer space instead of 2 when pushing a
2384 (can_match_nothing): Added this to elaborate on and replace
2386 (reg_info_type): Made can_match_nothing field a bit instead of int.
2387 (MIN): Added for re_match_2.
2388 (re_match_2 macros): Changed all `for' loops which used
2389 RE_NREGS to now use num_internal_regs as upper bounds.
2390 (MAX_NUM_FAILURE_ITEMS): Use num_internal_regs instead of RE_NREGS.
2391 (POP_FAILURE_POINT): Added check for empty stack.
2392 (FREE_VARIABLES): Added this to free (and set to NULL)
2393 variables allocated in re_match_2.
2394 (re_match_2): Rearranged parameters to be in order.
2395 Added variables num_regs_wanted (how many registers the user wants)
2396 and num_internal_regs (how many groups there are).
2397 Allocated initial_stack, regstart, regend, old_regstart,
2398 old_regend, reginfo, best_regstart, and best_regend---all of
2399 which used to be fixed size arrays. Free them all and return
2401 Free above variables if starting position pos isn't valid.
2402 Changed all `for' loops which used RE_NREGS to now use
2403 num_internal_regs as upper bounds---except for the loops which
2404 fill regs; then use num_regs_wanted.
2405 Allocate regs if the user has passed it and wants more than 0
2407 Set regs->start[i] and regs->end[i] to -1 if either
2408 regstart[i] or regend[i] equals -1, not just the first.
2409 Free allocated variables before returning.
2410 Updated some comments.
2411 (regcomp): Return REG_ESPACE, REG_BADPAT, REG_EPAREN when
2413 Free translate array.
2414 (regexec): Added this Posix interface routine.
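The Posix entry points named in the entry above can be exercised roughly as follows. This is only a sketch against the standard <regex.h> interface; the helper name demo_match is hypothetical and is not part of regex.c.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if PATTERN (Posix extended syntax)
   matches somewhere in TEXT, 0 if it does not, and -1 if PATTERN
   does not compile.  EFLAGS is handed straight to regexec, so the
   caller can pass REG_NOTBOL and/or REG_NOTEOL.  */
static int
demo_match (const char *pattern, const char *text, int eflags)
{
  regex_t preg;
  int status;

  if (regcomp (&preg, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;

  /* nmatch is 0 and pmatch is NULL, so no registers are requested.  */
  status = regexec (&preg, text, 0, NULL, eflags);
  regfree (&preg);

  return status == 0 ? 1 : 0;
}
```

With REG_NOTBOL the start of TEXT is not treated as the beginning of a line, so `^abc' stops matching "abc"; REG_NOTEOL does the same for `$' at the end.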
2416 Mon Dec 24 14:21:13 1990 Kathy Hargreaves (kathy at hayley)
2418 * regex.h: If _POSIX_SOURCE is defined then #include <sys/types.h>.
2419 Added syntax bit RE_CANNOT_MATCH_NEWLINE.
2420 Defined Posix cflags: REG_EXTENDED, REG_NEWLINE, REG_ICASE, and
2422 Added fields re_nsub and no_sub to struct re_pattern_buffer.
2423 Typedefed regex_t to be `struct re_pattern_buffer'.
2425 * regex.c (CHAR_SET_SIZE): Defined this to be 256 and replaced
2426 incidences of this value with this constant.
2427 (re_compile_pattern): Added switch case for `\n' and put
2428 `newline' into the pattern buffer when encounter this.
2429 Increment the pattern_buffer's `re_nsub' field whenever open a
2431 (re_match_2): Match a newline with `newline'---provided the
2432 syntax bit RE_CANNOT_MATCH_NEWLINE isn't set.
2433 (regcomp): Added this Posix interface routine.
2434 (enum test_type): Added interface_test tag.
2435 (main): Added Posix interface test.
2437 Tue Dec 18 12:58:12 1990 Kathy Hargreaves (kathy at hayley)
2439 * regex.h (struct re_pattern_buffer): reformatted so would fit
2440 in texinfo documentation.
2442 Thu Nov 29 15:49:16 1990 Kathy Hargreaves (kathy at hayley)
2444 * regex.h (RE_NO_EMPTY_ALTS): Added this bit.
2445 (RE_SYNTAX_POSIX_EXTENDED): Added above bit.
2447 * regex.c (re_compile_pattern): Disallow empty alternatives only
2448 when RE_NO_EMPTY_ALTS is set, not when RE_CONTEXTUAL_INVALID_OPS is.
2449 Changed RE_NO_BK_CURLY_BRACES to RE_NO_BK_PARENS when testing
2450 for empty groups at label handle_open.
2451 At label handle_bar: disallow empty alternatives if RE_NO_EMPTY_ALTS
2453 Rewrote some comments.
2455 (re_compile_fastmap): cleaned up code.
2457 (re_search_2): Rewrote comment.
2459 (struct register_info): Added field `inner_groups'; it records
2460 which groups are inside of the current one.
2461 Added field can_match_nothing; it's set if the current group
2463 Added field ever_matched_something; it's set if current group
2464 ever matched something.
2466 (INNER_GROUPS): Added macro to access inner_groups field of
2467 struct register_info.
2469 (CAN_MATCH_NOTHING): Added macro to access can_match_nothing
2470 field of struct register_info.
2472 (EVER_MATCHED_SOMETHING): Added macro to access
2473 ever_matched_something field of struct register_info.
2475 (NOTE_INNER_GROUP): Defined macro to record that a given group
2476 is inside of all currently active groups.
2478 (re_match_2): Added variables *p1 and mcnt2 (multipurpose).
2479 Added old_regstart and old_regend arrays to hold previous
2480 register values if they need be restored.
2481 Initialize added fields and variables.
2482 case start_memory: Find out if the group can match nothing.
2483 Save previous register values in old_regstart and old_regend.
2484 Record that current group is inside of all currently active
2486 If the group is inside a loop and it ever matched anything,
2487 restore its registers to values before the last failed match.
2488 Restore the registers for the inner groups, too.
2489 case duplicate: Can back reference to a group that never
2490 matched if it can match nothing.
2492 Thu Nov 29 11:12:54 1990 Karl Berry (karl at hayley)
2494 * regex.c (bcopy, ...): define these if either _POSIX_SOURCE or
2495 STDC_HEADERS is defined; same for including <stdlib.h>.
2497 Sat Oct 6 16:04:55 1990 Kathy Hargreaves (kathy at hayley)
2499 * regex.h (struct re_pattern_buffer): Changed field comments.
2501 * regex.c (re_compile_pattern): Allow a `$' to precede an
2502 alternation operator (`|' or `\|').
2503 Disallow `^' and/or `$' in empty groups if the syntax bit
2504 RE_NO_EMPTY_GROUPS is set.
2505 Wait until have parsed a valid `\{...\}' interval expression
2506 before testing RE_CONTEXTUAL_INVALID_OPS to see if it's
2507 invalidated by that.
2508 Don't use RE_NO_BK_CURLY_BRACES to test whether or not a validly
2509 parsed interval expression is invalid if it has no preceding re;
2510 rather, use RE_CONTEXTUAL_INVALID_OPS.
2511 If an interval parses, but there is no preceding regular
2512 expression, yet the syntax bit RE_CONTEXTUAL_INDEP_OPS is set,
2513 then that interval can match the empty regular expression; if
2514 the bit isn't set, then the characters in the interval
2515 expression are parsed as themselves (sans the backslashes).
2516 In unfetch_interval case: Moved PATFETCH to above the test for
2517 RE_NO_BK_CURLY_BRACES being set, which would force a goto
2518 normal_backslash; the code at both normal_backsl and normal_char
2519 expect a character in `c.'
2521 Sun Sep 30 11:13:48 1990 Kathy Hargreaves (kathy at hayley)
2523 * regex.h: Changed some comments to use the terms used in the
2525 (RE_CONTEXTUAL_INDEP_OPS): Changed name from `RE_CONTEXT_INDEP_OPS'.
2526 (RE_LISTS_NOT_NEWLINE): Changed name from `RE_HAT_NOT_NEWLINE.'
2527 (RE_ANCHOR_NOT_NEWLINE): Added this syntax bit.
2528 (RE_NO_EMPTY_GROUPS): Added this syntax bit.
2529 (RE_NO_HYPHEN_RANGE_END): Deleted this syntax bit.
2530 (RE_SYNTAX_...): Reformatted.
2531 (RE_SYNTAX_POSIX_BASIC, RE_SYNTAX_EXTENDED): Added syntax bits
2532 RE_ANCHOR_NOT_NEWLINE and RE_NO_EMPTY_GROUPS, and deleted
2533 RE_NO_HYPHEN_RANGE_END.
2534 (RE_SYNTAX_POSIX_EXTENDED): Added syntax bit RE_DOT_NOT_NULL.
2536 * regex.c (bcopy, bcmp, bzero): Define if _POSIX_SOURCE is defined.
2537 (_POSIX_SOURCE): ifdef this, #include <stdlib.h>
2538 (#ifdef emacs): Changed comment of the #endif for its #else
2539 clause to be `not emacs', not `emacs.'
2540 (no_pop_jump): Changed name from `jump'.
2541 (pop_failure_jump): Changed name from `finalize_jump.'
2542 (maybe_pop_failure_jump): Changed name from `maybe_finalize_jump'.
2543 (no_pop_jump_n): Changed name from `jump_n.'
2544 (EXTEND_BUFFER): Use shift instead of multiplication to double
2546 (DO_RANGE, recompile_pattern): Added macro to set the list bits
2548 (re_compile_pattern): Fixed grammar problems in some comments.
2549 Checked that RE_NO_BK_VBAR is set to make `$' valid before a `|'
2550 and not set to make it valid before a `\|'.
2551 Checked that RE_NO_BK_PARENS is set to make `$' valid before a ')'
2552 and not set to make it valid before a `\)'.
2553 Disallow ranges starting with `-', unless the range is the
2554 first item in a list, rather than disallowing ranges which end
2556 Disallow empty groups if the syntax bit RE_NO_EMPTY_GROUPS is set.
2557 Disallow nothing preceding `{' and `\{' if they represent the
2558 open-interval operator and RE_CONTEXTUAL_INVALID_OPS is set.
2559 (register_info_type): typedef-ed this using `struct register_info.'
2560 (SET_REGS_MATCHED): Compacted the code.
2561 (re_match_2): Made it fail if back reference a group which we've
2563 Made `^' not match a newline if the syntax bit
2564 RE_ANCHOR_NOT_NEWLINE is set.
2565 (really_fail): Added this label so could force a final fail that
2566 would not try to use the failure stack to recover.
2568 Sat Aug 25 14:23:01 1990 Kathy Hargreaves (kathy at hayley)
2570 * regex.h (RE_CONTEXTUAL_OPS): Changed name from RE_CONTEXT_OPS.
2571 (global): Rewrote comments and rebroke some syntax #define lines.
2573 * regex.c (isgraph): Added definition for sequents.
2574 (global): Now refer to character set lists as ``lists.''
2575 Rewrote comments containing ``\('' or ``\)'' to now refer to
2577 (RE_CONTEXTUAL_OPS): Changed name from RE_CONTEXT_OPS.
2579 (re_compile_pattern): Expanded header comment.
2581 Sun Jul 15 14:50:25 1990 Kathy Hargreaves (kathy at hayley)
2583 * regex.h (RE_CONTEXT_INDEP_OPS): the comment's sense got turned
2584 around when we changed how it read; changed it to be correct.
2586 Sat Jul 14 16:38:06 1990 Kathy Hargreaves (kathy at hayley)
2588 * regex.h (RE_NO_EMPTY_BK_REF): changed name to
2589 RE_NO_MISSING_BK_REF, as this describes it better.
2591 * regex.c (re_compile_pattern): changed RE_NO_EMPTY_BK_REF
2592 to RE_NO_MISSING_BK_REF, as above.
2594 Thu Jul 12 11:45:05 1990 Kathy Hargreaves (kathy at hayley)
2596 * regex.h (RE_NO_EMPTY_BRACKETS): removed this syntax bit, as
2597 bracket expressions should *never* be empty regardless of the
2598 syntax. Removes this bit from RE_SYNTAX_POSIX_BASIC and
2599 RE_SYNTAX_POSIX_EXTENDED.
2601 * regex.c (SET_LIST_BIT): in the comment, now refer to character
2602 sets as (non)matching sets, as bracket expressions can now match
2603 other things in addition to characters.
2604 (re_compile_pattern): refer to groups as such instead of `\(...\)'
2605 or somesuch, because groups can now be enclosed in either plain
2606 parens or backslashed ones, depending on the syntax.
2607 In the '[' case, added a boolean just_had_a_char_class to detect
2608 whether or not a character class begins a range (which is invalid).
2609 Restore way of breaking out of a bracket expression to original way.
2610 Add way to detect a range if the last thing in a bracket
2611 expression was a character class.
2612 Took out check for c != ']' at the end of a character class in
2613 the else clause, as it had already been checked in the if part
2614 that also checked the validity of the string.
2615 Set or clear just_had_a_char_class as appropriate.
2616 Added some comments. Changed references to character sets to
2617 ``(non)matching lists.''
2619 Sun Jul 1 12:11:29 1990 Karl Berry (karl at hayley)
2621 * regex.h (BYTEWIDTH): moved back to regex.c.
2623 * regex.h (re_compile_fastmap): removed declaration; this
2624 shouldn't be advertised.
2626 Mon May 28 15:27:53 1990 Kathy Hargreaves (kathy at hayley)
2628 * regex.c (ifndef Sword): Made comments more specific.
2629 (global): include <stdio.h> so can write fatal messages on
2630 standard error. Replaced calls to assert with fprintfs to
2631 stderr and exit (1)'s.
2632 (PREFETCH): Reformatted to make more readable.
2633 (AT_STRINGS_BEG): Defined to test if we're at the beginning of
2634 the virtual concatenation of string1 and string2.
2635 (AT_STRINGS_END): Defined to test if at the end of the virtual
2636 concatenation of string1 and string2.
2637 (AT_WORD_BOUNDARY): Defined to test if are at a word boundary.
2638 (IS_A_LETTER(d)): Defined to test if the contents of the pointer D
2640 (re_match_2): Rewrote the wordbound, notwordbound, wordbeg, wordend,
2641 begbuf, and endbuf cases in terms of the above four new macros.
2642 Called SET_REGS_MATCHED in the matchsyntax, matchnotsyntax,
2643 wordchar, and notwordchar cases.
2645 Mon May 14 14:49:13 1990 Kathy Hargreaves (kathy at hayley)
2647 * regex.c (re_search_2): Fixed RANGE to not ever take STARTPOS
2648 outside of virtual concatenation of STRING1 and STRING2.
2649 Updated header comment as to this.
2650 (re_match_2): Clarified comment about MSTOP in header.
2652 Sat May 12 15:39:00 1990 Kathy Hargreaves (kathy at hayley)
2654 * regex.c (re_search_2): Checked for out-of-range STARTPOS.
2656 When searching backwards, not only get the character with which
2657 to compare to the fastmap from string2 if the starting position
2658 >= size1, but also if size1 is zero; this is so won't get a
2659 segmentation fault if string1 is null.
2660 Reformatted code at label advance.
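The rule described above for picking a character out of the virtual concatenation of string1 and string2 might look like the sketch below. The function demo_char_at is hypothetical; it is written as a function for clarity and is not the code in re_search_2.

```c
#include <stddef.h>

/* Hypothetical helper: return the character at offset POS in the
   virtual concatenation of STRING1 (SIZE1 bytes) and STRING2 (SIZE2
   bytes).  The caller must ensure 0 <= POS < SIZE1 + SIZE2.
   STRING1 may be a null pointer when SIZE1 is zero.  */
static unsigned char
demo_char_at (const char *string1, int size1,
              const char *string2, int size2, int pos)
{
  (void) size2;                 /* only needed by the caller's check */

  /* The explicit size1 == 0 test mirrors the wording of the entry:
     never touch string1 when it is empty or null.  */
  if (size1 == 0 || pos >= size1)
    return (unsigned char) string2[pos - size1];
  return (unsigned char) string1[pos];
}
```

The result is returned as unsigned char so that it can be used directly as an index into a 256-entry table such as a fastmap.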
2662 Thu Apr 12 20:26:21 1990 Kathy Hargreaves (kathy at hayley)
2664 * regex.h: Added #pragma once and #ifdef...endif __REGEXP_LIBRARY.
2665 (RE_EXACTN_VALUE): Added for search.c to use.
2666 Reworded some comments.
2668 regex.c: Punctuated some comments correctly.
2669 (NULL): Removed this.
2670 (RE_EXACTN_VALUE): Added for search.c to use.
2671 (<ctype.h>): Moved this include to top of file.
2672 (<assert.h>): Added this include.
2673 (struct regexpcode): Assigned 0 to unused and 1 to exactn
2674 because of RE_EXACTN_VALUE.
2676 (various macros): Lined up backslashes near end of line.
2677 (insert_jump): Cleaned up the header comment.
2678 (re_search): Corrected the header comment.
2679 (re_search_2): Cleaned up and completed the header comment.
2680 (re_max_failures): Updated comment.
2681 (struct register_info): Constructed as bits so as to save space
2682 on the stack when pushing register information.
2683 (IS_ACTIVE): Macro for struct register_info.
2684 (MATCHED_SOMETHING): Macro for struct register_info.
2685 (NUM_REG_ITEMS): How many register information items for each
2686 register we have to push on the stack at each failure.
2687 (MAX_NUM_FAILURE_ITEMS): If push all the registers on failure,
2688 this is how many items we push on the stack.
2689 (PUSH_FAILURE_POINT): Now pushes whether or not the register is
2690 currently active, and whether or not it matched something.
2691 Checks that there's enough space allocated to accommodate all the
2692 items we currently want to push. (Before, a test for an empty
2693 stack sufficed because we always pushed and popped the same
2695 Replaced ``2'' with MAX_NUM_FAILURE_ITEMS when ``2'' refers
2696 to how many things get pushed on the stack each time.
2697 When copy the stack into the newly allocated storage, now only copy
2700 (POP_FAILURE_POINT): Defined to use in places where put number
2701 of registers on the stack into a variable before using it to
2702 decrement the stack, so as to not confuse the compiler.
2703 (IS_IN_FIRST_STRING): Defined to check if a pointer points into
2705 (SET_REGS_MATCHED): Changed to use the struct register_info
2706 bits; also set the matched-something bit to false if the
2707 register isn't currently active. (This is a redundant setting.)
2708 (re_match_2): Cleaned up and completed the header comment.
2709 Updated the failure stack comment.
2710 Replaced the ``2'' with MAX_NUM_FAILURE_ITEMS in the static
2711 allocation of initial_stack, because now more than two (now up
2712 to MAX_NUM_FAILURE_ITEMS) items get pushed on the failure stack each
2715 Trashed regstart_seg1, regend_seg1, best_regstart_seg1, and
2716 best_regend_seg1 because they could have erroneous information
2717 in them, such as when matching ``a'' (in string1) and ``ab'' (in
2718 string2) with ``(a)*ab''; before using IS_IN_FIRST_STRING to see
2719 whether or not the register starts or ends in string1,
2720 regstart[1] pointed past the end of string1, yet regstart_seg1
2722 Added variable reg_info of type struct register_info to keep
2723 track of currently active registers and whether or not they
2724 currently match anything.
2725 Commented best_regs_set.
2726 Trashed reg_active and reg_matched_something and put the
2727 information they held into reg_info; saves space on the stack.
2728 Replaced NULL with '\000'.
2729 In begline case, compacted the code.
2730 Used assert to exit if had an internal error.
2731 In begbuf case, because now force the string we're working on
2732 into string2 if there aren't two strings, now allow d == string2
2733 if there is no string1 (and the check for that is size1 == 0!);
2734 also now succeeds if there aren't any strings at all.
2735 (main, ifdef canned): Put test type into a variable so could
2736 change it while debugging.
2738 Sat Mar 24 12:24:13 1990 Kathy Hargreaves (kathy at hayley)
2740 * regex.c (GET_UNSIGNED_NUMBER): Deleted references to num_fetches.
2741 (re_compile_pattern): Deleted num_fetches because could keep
2742 track of the number of fetches done by saving a pointer into the
2744 Added variable beg_interval to be used as a pointer, as above.
2745 Assert that beg_interval points to something when it's used as above.
2746 Initialize succeed_n's to lower_bound because re_compile_fastmap
2748 (re_compile_fastmap): Deleted unnecessary variable is_a_jump_n.
2750 (re_match_2): Put number of registers on the stack into a
2751 variable before using it to decrement the stack, so as to not
2752 confuse the compiler.
2754 Used error routine instead of printf and exit.
2755 In exactn case, restored longer code from ``original'' regex.c
2756 which doesn't test translate inside a loop.
2758 * regex.h: Moved #define NULL and the enum regexpcode definition
2759 to regex.c. Changed some comments.
2761 regex.c (global): Updated comments about compiling and for the
2762 re_compile_pattern jump routines.
2763 Added #define NULL and the enum regexpcode definition (from
2765 (enum regexpcode): Added set_number_at to reset the n's of
2766 succeed_n's and jump_n's.
2767 (re_set_syntax): Updated its comment.
2768 (re_compile_pattern): Moved its heading comment to after its macros.
2769 Moved its include statement to the top of the file.
2770 Commented or added to comments of its macros.
2771 In start_memory case: Push laststart value before adding
2772 start_memory and its register number to the buffer, as they
2773 might not get added.
2774 Added code to put a set_number_at before each succeed_n and one
2775 after each jump_n; rewrote code in what seemed a more
2776 straightforward manner to put all these things in the pattern so
2777 the succeed_n's would correctly jump to the set_number_at's of
2778 the matching jump_n's, and so the jump_n's would correctly jump
2779 to after the set_number_at's of the matching succeed_n's.
2780 Initialize succeed_n n's to -1.
2781 (insert_op_2): Added this to insert an operation followed by
2783 (re_compile_fastmap): Added set_number_at case.
2784 (re_match_2): Moved heading comment to after macros.
2785 Added mention of REGS to heading comment.
2786 No longer turn a succeed_n with n = 0 into an on_failure_jump,
2787 because n needs to be reset each time through a loop.
2788 Check to see if a succeed_n's n is set by its set_number_at.
2789 Added set_number_at case.
2790 Updated some comments.
2791 (main): Added another main to run posix tests, which is compiled
2792 ifdef both test and canned. (Old main is still compiled ifdef
2795 Tue Mar 19 09:22:55 1990 Kathy Hargreaves (kathy at hayley)
2797 * regex.[hc]: Change all instances of the word ``legal'' to
2798 ``valid'' and all instances of ``illegal'' to ``invalid.''
2800 Sun Mar 4 12:11:31 1990 Kathy Hargreaves (kathy at hayley)
2802 * regex.h: Added syntax bit RE_NO_EMPTY_RANGES which is set if
2803 an ending range point has to collate higher or equal to the
2804 starting range point.
2805 Added syntax bit RE_NO_HYPHEN_RANGE_END which is set if a hyphen
2806 can't be an ending range point.
2807 Set the two above bits in RE_SYNTAX_POSIX_BASIC and
2808 RE_SYNTAX_POSIX_EXTENDED.
2810 regex.c: (re_compile_pattern): Don't allow empty ranges if the
2811 RE_NO_EMPTY_RANGES syntax bit is set.
2812 Don't let a hyphen be a range end if the RE_NO_HYPHEN_RANGE_END
2814 (ESTACK_PUSH_2): renamed this PUSH_FAILURE_POINT and made it
2815 push all the used registers on the stack, as well as the number
2816 of the highest numbered register used, and (as before) the two
2818 (re_match_2): Fixed up comments.
2819 Added arrays best_regstart[], best_regstart_seg1[], best_regend[],
2820 and best_regend_seg1[] to keep track of the best match so far
2821 whenever reach the end of the pattern but not the end of the
2822 string, and there are still failure points on the stack with
2823 which to backtrack; if so, do the saving and force a fail.
2824 If reach the end of the pattern but not the end of the string,
2825 but there are no more failure points to try, restore the best
2826 match so far, set the registers and return.
2827 Compacted some code.
2828 In stop_memory case, if the subexpression we've just left is in
2829 a loop, push onto the stack the loop's on_failure_jump failure
2830 point along with the current pointer into the string (d).
2831 In finalize_jump case, in addition to popping the failure
2832 points, pop the saved registers.
2833 In the fail case, restore the registers, as well as the failure
2836 Sun Feb 18 15:08:10 1990 Kathy Hargreaves (kathy at hayley)
2838 * regex.c: (global): Defined a macro GET_BUFFER_SPACE which
2839 makes sure you have a specified number of buffer bytes
2841 Redefined the macro BUFPUSH to use this.
2844 (re_compile_pattern): Call GET_BUFFER_SPACE before storing or
2845 inserting any jumps.
2847 (re_match_2): Set d to string1 + pos and dend to end_match_1
2848 only if string1 isn't null.
2849 Force exit from a loop if it's around empty parentheses.
2850 In stop_memory case, if found some jumps, increment p2 before
2851 extracting address to which to jump. Also, don't need to know
2852 how many more times can jump_n.
2853 In begline case, d must equal string1 or string2, in that order,
2854 only if they are not null.
2855 In maybe_finalize_jump case, skip over start_memorys' and
2856 stop_memorys' register numbers, too.
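The idea behind GET_BUFFER_SPACE and BUFPUSH in the entry above (reserve room first, then store) can be sketched as below. The struct and function names are hypothetical, the starting size of 16 is arbitrary, and doubling is an assumption about the growth policy; the real macros operate on the pattern buffer directly.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable byte buffer; the names are illustrative.  */
struct demo_buffer
{
  unsigned char *data;          /* start of allocated storage */
  size_t used;                  /* bytes currently in use */
  size_t allocated;             /* bytes allocated */
};

/* Make sure at least N more bytes are available, doubling the
   allocation as often as needed.  Return 0 on success, -1 if memory
   is exhausted (the buffer is then left unchanged).  */
static int
demo_get_buffer_space (struct demo_buffer *b, size_t n)
{
  size_t want = b->allocated ? b->allocated : 16;
  unsigned char *p;

  while (want - b->used < n)
    {
      if (want > SIZE_MAX / 2)
        return -1;              /* doubling would overflow */
      want <<= 1;
    }
  if (want == b->allocated)
    return 0;                   /* already enough room */

  p = realloc (b->data, want);
  if (p == NULL)
    return -1;
  b->data = p;
  b->allocated = want;
  return 0;
}

/* Append one byte, growing the buffer first if necessary.  */
static int
demo_buf_push (struct demo_buffer *b, unsigned char c)
{
  if (demo_get_buffer_space (b, 1) != 0)
    return -1;
  b->data[b->used++] = c;
  return 0;
}
```

Reserving space before storing or inserting a jump means the write itself never has to check for a full buffer.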
2858 Thu Feb 15 15:53:55 1990 Kathy Hargreaves (kathy at hayley)
2860 * regex.c (BUFPUSH): off by one goof in deciding whether to
2863 Wed Jan 24 17:07:46 1990 Kathy Hargreaves (kathy at hayley)
2865 * regex.h: Moved definition of NULL to here.
2866 Got rid of ``In other words...'' comment.
2867 Added to some comments.
2869 regex.c: (re_compile_pattern): Tried to bulletproof some code,
2870 i.e., checked if backward references (e.g., p[-1]) were within
2871 the range of pattern.
2873 (re_compile_fastmap): Fixed a bug in succeed_n part where was
2874 getting the amount to jump instead of how many times to jump.
2876 (re_search_2): Changed the name of the variable ``total'' to
2878 Condensed some code.
2880 (re_match_2): Moved the comment about duplicate from above the
2881 start_memory case to above duplicate case.
2883 (global): Rewrote some comments.
2884 Added commandline arguments to testing.
2886 Wed Jan 17 11:47:27 1990 Kathy Hargreaves (kathy at hayley)
2888 * regex.c: (global): Defined a macro STORE_NUMBER which stores a
2889 number into two contiguous bytes. Also defined STORE_NUMBER_AND_INCR
2890 which does the same thing and then increments the pointer to the
2891 storage place to point after the number.
2892 Defined a macro EXTRACT_NUMBER which extracts a number from two
2893 contiguous bytes. Also defined EXTRACT_NUMBER_AND_INCR which
2894 does the same thing and then increments the pointer to the
2895 source to point to after where the number was.
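What the four macros in the entry above do can be sketched with ordinary functions. The demo_ names are hypothetical, and storing the low-order byte first with a signed 16-bit range is an assumption made for this sketch, not something the entry states.

```c
/* Hypothetical function versions of the store/extract helpers.  */

/* Store NUMBER, which must fit in 16 bits, into two contiguous
   bytes at DEST, low-order byte first (an assumption).  */
static void
demo_store_number (unsigned char *dest, int number)
{
  unsigned int u = (unsigned int) number;

  dest[0] = (unsigned char) (u & 0xff);
  dest[1] = (unsigned char) ((u >> 8) & 0xff);
}

/* Same, then advance *DEST to point just after the number.  */
static void
demo_store_number_and_incr (unsigned char **dest, int number)
{
  demo_store_number (*dest, number);
  *dest += 2;
}

/* Read back a signed 16-bit number stored by demo_store_number.  */
static int
demo_extract_number (const unsigned char *src)
{
  int value = src[0] | (src[1] << 8);

  if (value & 0x8000)           /* sign-extend from 16 bits */
    value -= 0x10000;
  return value;
}

/* Same, then advance *SRC to point just after the number.  */
static int
demo_extract_number_and_incr (const unsigned char **src)
{
  int value = demo_extract_number (*src);

  *src += 2;
  return value;
}
```

Negative values matter because backward jumps are stored as negative relative offsets in this kind of encoding.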
2897 Tue Jan 16 12:09:19 1990 Kathy Hargreaves (kathy at hayley)
2899 * regex.h: Incorporated rms' changes.
2900 Defined RE_NO_BK_REFS syntax bit which is set when want to
2901 interpret back reference patterns as literals.
2902 Defined RE_NO_EMPTY_BRACKETS syntax bit which is set when want
2903 empty bracket expressions to be illegal.
2904 Defined RE_CONTEXTUAL_ILLEGAL_OPS syntax bit which is set when want
2905 it to be illegal for *, +, ? and { to be first in an re or come
2906 immediately after a | or a (, and for ^ not to appear in a
2907 nonleading position and $ in a nontrailing position (outside of
2908 bracket expressions, that is).
2909 Defined RE_LIMITED_OPS syntax bit which is set when want +, ?
2910 and | to always be literals instead of ops.
2911 Fixed up the Posix syntax.
2912 Changed the syntax bit comments from saying, e.g., ``0 means...''
2913 to ``If this bit is set, it means...''.
2914 Changed the syntax bit defines to use shifts instead of integers.
2916 * regex.c: (global): Incorporated rms' changes.
2918 (re_compile_pattern): Incorporated rms' changes
2919 Made it illegal for a $ to appear anywhere but inside a bracket
2920 expression or at the end of an re when RE_CONTEXTUAL_ILLEGAL_OPS
2921 is set. Made the same hold for ^ except it has to be at the
2922 beginning of an re instead of the end.
2923 Made the re "[]" illegal if RE_NO_EMPTY_BRACKETS is set.
2924 Made it illegal for | to be first or last in an re, or immediately
2925 follow another | or a (.
2926 Added and embellished some comments.
2927 Allowed \{ to be interpreted as a literal if RE_NO_BK_CURLY_BRACES
2929 Made it illegal for *, +, ?, and { to appear first in an re, or
2930 immediately follow a | or a ( when RE_CONTEXTUAL_ILLEGAL_OPS is set.
2931 Made back references interpreted as literals if RE_NO_BK_REFS is set.
2932 Made recursive intervals either illegal (if RE_NO_BK_CURLY_BRACES
2933 isn't set) or interpreted as literals (if is set), if RE_INTERVALS
2935 Made it treat +, ? and | as literals if RE_LIMITED_OPS is set.
2936 Cleaned up some code.
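Defining syntax bits with shifts instead of integer literals, as the entry above describes, could look like this sketch. The DEMO_ names are hypothetical and are not the bits regex.h defines.

```c
/* Hypothetical bit names; each is the previous one shifted left,
   so inserting a new bit never requires renumbering by hand.  */
#define DEMO_BIT_FIRST   1L
#define DEMO_BIT_SECOND  (DEMO_BIT_FIRST << 1)
#define DEMO_BIT_THIRD   (DEMO_BIT_SECOND << 1)

/* A syntax is the inclusive OR of the bits it sets.  */
#define DEMO_SYNTAX  (DEMO_BIT_FIRST | DEMO_BIT_THIRD)

/* Return nonzero if BIT is set in SYNTAX.  */
static int
demo_bit_is_set (long syntax, long bit)
{
  return (syntax & bit) != 0;
}
```

Phrasing each test as ``if this bit is set'' then reads the same way in the comments and in the code.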
2938 Thu Dec 21 15:31:32 1989 Kathy Hargreaves (kathy at hayley)
2940 * regex.c: (global): Moved RE_DUP_MAX to regex.h and made it
2941 equal 2^15 - 1 instead of 1000.
2942 Defined NULL to be zero.
2943 Moved the definition of BYTEWIDTH to regex.h.
2944 Made the global variable obscure_syntax nonstatic so the tests in
2945 another file could use it.
2947 (re_compile_pattern): Defined a maximum length (CHAR_CLASS_MAX_LENGTH)
2948 for character class strings (i.e., what's between the [: and the
2950 Defined a macro SET_LIST_BIT(c) which sets the bit for C in a
2952 Took out comments that EXTEND_BUFFER clobbers C.
2953 Made the string "^" match itself, if not RE_CONTEXT_IND_OPS.
2954 Added character classes to bracket expressions.
2955 Change the laststart pointer saved with the start of each
2956 subexpression to point to start_memory instead of after the
2957 following register number. This is because the subexpression
2959 Added comments and compacted some code.
2960 Made intervals only work if preceded by an re matching a single
2961 character or a subexpression.
2962 Made back references to nonexistent subexpressions illegal if
2964 Made intervals work on the last preceding character of a
2965 concatenation of characters, e.g., ab{0,} matches abbb, not abab.
2966 Moved macro PREFETCH to outside the routine.
2968 (re_compile_fastmap): Added succeed_n to work analogously to
2969 on_failure_jump if n is zero and jump_n to work analogously to
2970 the other backward jumps.
2972 (re_match_2): Defined macro SET_REGS_MATCHED to set which
2973 current subexpressions had matches within them.
2974 Changed some comments.
2975 Added reg_active and reg_matched_something arrays to keep track
2976 of which subexpressions have currently matched something.
2977 Defined MATCHING_IN_FIRST_STRING and replaced ``dend == end_match_1''
2978 with it to make code easier to understand.
2979 Fixed so can apply * and intervals to arbitrarily nested
2980 subexpressions. (Lots of previous bugs here.)
2981 Changed so won't match a newline if syntax bit RE_DOT_NOT_NULL is set.
2982 Made the upcase array nonstatic so the testing file could use it also.
2984 (main.c): Moved the tests out to another file.
2986 (tests.c): Moved all the testing stuff here.
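The interval binding noted in the entry above (ab{0,} matches abbb, not abab) can be checked through the standard Posix interface, which binds an interval to the preceding single character in the same way. This is a sketch; demo_full_match is a hypothetical helper and uses Posix extended syntax rather than the syntax bits of that date.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if PATTERN matches TEXT, 0 if not,
   and -1 if PATTERN does not compile.  Callers anchor PATTERN with
   ^ and $ to require that the whole of TEXT matches.  */
static int
demo_full_match (const char *pattern, const char *text)
{
  regex_t preg;
  int status;

  if (regcomp (&preg, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  status = regexec (&preg, text, 0, NULL, 0);
  regfree (&preg);
  return status == 0 ? 1 : 0;
}
```

With the pattern "^ab{0,}$" the interval applies to `b' alone, so "abbb" and "a" match while "abab" does not.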
2988 Sat Nov 18 19:30:30 1989 Kathy Hargreaves (kathy at hayley)
2990 * regex.c: (re_compile_pattern): Defined RE_DUP_MAX, the maximum
2991 number of times an interval can match a pattern.
2992 Added macro GET_UNSIGNED_NUMBER (used to get below):
2993 Added variables lower_bound and upper_bound for lower and upper
2994 bounds of intervals.
2995 Added variable num_fetches so intervals could do backtracking.
2996 Added code to handle '{' and "\{" and intervals.
2999 (store_jump_n): (Added) Stores a jump with a number following the
3000 relative address (for intervals).
3002 (insert_jump_n): (Added) Inserts a jump_n.
3004 (re_match_2): Defined a macro ESTACK_PUSH_2 for the error stack;
3005 it checks for overflow and reallocates if necessary.
3007 * regex.h: Added bits (RE_INTERVALS and RE_NO_BK_CURLY_BRACES)
3008 to obscure syntax to indicate whether or not
3009 a syntax handles intervals and recognizes either \{ and
3010 \} or { and } as operators. Also added two syntaxes
3011 RE_SYNTAX_POSIX_BASIC and RE_SYNTAX_POSIX_EXTENDED and two command codes
3012 to the enumeration regexpcode; they are succeed_n and jump_n.
3014 Sat Nov 18 19:30:30 1989 Kathy Hargreaves (kathy at hayley)
3016 * regex.c: (re_compile_pattern): Defined INIT_BUFF_SIZE to get rid
3017 of repeated constants in code. Tested with value 1.
3018 Renamed PATPUSH as BUFPUSH, since it pushes things onto the
3019 buffer, not the pattern. Also made this macro extend the buffer
3020 if it's full (so could do the following):
3021 Took out code at top of loop that checks to see if buffer is going
3022 to be full after 10 additions (and reallocates if necessary).
3024 (insert_jump): Rearranged declaration lines so comments would read
3027 (re_match_2): Compacted exactn code and added more comments.
3029 (main): Defined macros TEST_MATCH and MATCH_SELF to do
3030 testing; took out loop so could use these instead.
3032 Tue Oct 24 20:57:18 1989 Kathy Hargreaves (kathy at hayley)
3034 * regex.c (re_set_syntax): Gave argument `syntax' a type.
3035 (store_jump, insert_jump): made them void functions.
3040 version-control: never