moderated >> pattern matching dynamic strings w/ regex ending in $ problem

by rader » Tue, 03 May 2005 04:31:59 GMT

I'm seeing a simple, ah, problem with perl 5.8.0...

chive(rader): perl -v | head -2 | tail -1
This is perl, v5.8.0 built for i386-linux-thread-multi

chive(rader): cat crud
#!/usr/bin/perl
$a = 'fuBar';
$b = 'Bar$';
$a_new = `echo -n $a`;
$b_new = `echo -n $b`;
if ( $a_new =~ /$b_new/ ) {
   print "match\n";
} else {
   print "no match?!\n";
}

chive(rader): ./crud
no match?!

Anybody know of a workaround? The code above behaves as expected
with 5.6.x. Could this actually be a bug?

steve
- - -
systems & network manager
high energy physics
university of wisconsin


by wInuX » Tue, 03 May 2005 14:13:15 GMT


r> I'm seeing a simple, ah, problem with perl 5.8.0...

r> chive(rader): perl -v | head -2 | tail -1
r> This is perl, v5.8.0 built for i386-linux-thread-multi

r> chive(rader): cat crud
r> #!/usr/bin/perl
r> $a = 'fuBar';
r> $b = 'Bar$';
r> $a_new = `echo -n $a`;
r> $b_new = `echo -n $b`;
r> if ( $a_new =~ /$b_new/ ) {
r> print "match\n";
r> } else {
r> print "no match?!\n";
r> }

r> chive(rader): ./crud
r> no match?!

r> Anybody know of a workaround? The code above behaves as expected
r> with 5.6.x. Could this actually be a bug?


$ perl -v
This is perl, v5.8.0 built for MSWin32-x86-multi-thread

$ perl crud.pl
match

The problem may be in `echo -n $a`. Try printing $a_new and $b_new to stdout.

by Andrei Voropaev » Tue, 03 May 2005 15:32:29 GMT


Have you tried printing out the values that you get from the shell? Since
your variable contains '$', it might conflict with your shell, so that
$b_new ends up containing something other than Bar$.

Anyway, this particular script runs with all perl versions that I have
(5.8.1, 5.8.5) and all shells that I use (bash, zsh).
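
A minimal sketch of that check, assuming a POSIX-ish shell whose echo
accepts -n. The helper name hex_dump and the variable names are made up
for illustration. It prints each string as hex values, so a stray
newline, a missing $, or anything else the shell changed shows up at once:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# hex_dump is a hypothetical helper: one two-digit hex value per character.
sub hex_dump {
    my ($s) = @_;
    return join ' ', map { sprintf '%02x', ord($_) } split //, $s;
}

my $pat     = 'Bar$';
my $pat_new = `echo -n $pat`;    # same shell round trip as in the original script

print "literal: ", hex_dump($pat),     "\n";
print "shell:   ", hex_dump($pat_new), "\n";
print "the shell changed the string\n" if $pat ne $pat_new;
```

If the two dumps are identical, the shell round trip can be ruled out and
the regex match itself is the thing to look at.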

--
Minds, like parachutes, function best when open

by rader » Wed, 04 May 2005 00:14:28 GMT


Nope, it's neither an echo -n problem nor a $-in-echo problem. And, btw, it
works with 5.004 and 5.6.1 (sol7 and rhl73), but not with 5.8.0 on rhel3.

steve
- - -

chive(rader): perl -v | head -2 | tail -1
This is perl, v5.8.0 built for i386-linux-thread-multi

chive(rader): cat crud
#!/usr/bin/perl -w
use strict;
my $a = 'fuBar'; my $a_new = `echo -n $a`;
my $b = 'Bar$'; my $b_new = `echo -n $b`;
if ( $a ne $a_new || $b ne $b_new ) { print "oops!\n"; }
print "$a_new =~ /$b_new/...\n";
if ( $a_new =~ /$b_new/ ) {
print "match\n";
} else {
print "no match?!\n";
}

chive(rader): ./crud
fuBar =~ /Bar$/...
no match?!


ginseng(rader): perl -v | head -2 | tail -1
This is perl, v5.6.1 built for i386-linux

ginseng(rader): cat crud
#!/usr/bin/perl -w
use strict;
my $a = 'fuBar'; my $a_new = `echo -n $a`;
my $b = 'Bar$'; my $b_new = `echo -n $b`;
if ( $a ne $a_new || $b ne $b_new ) { print "oops!\n"; }
print "$a_new =~ /$b_new/...\n";
if ( $a_new =~ /$b_new/ ) {
print "match\n";
} else {
print "no match?!\n";
}

ginseng(rader): ./crud
fuBar =~ /Bar$/...
match

by Ernest Lergon » Sat, 07 May 2005 20:24:21 GMT

Perl 5.8.0 had some problems with regex. You should upgrade your Perl.

See: http://guest: XXXX@XXXXX.COM /rt3/index.html?q=19767
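
If upgrading is not an option right away, one possible workaround,
assuming the dynamic pattern is really just a literal suffix followed by
$, is to skip the regex engine and compare the tail of the string
directly. This is a sketch under that assumption, not a confirmed fix for
the 5.8.0 behaviour; ends_with is a made-up helper name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ends_with is a hypothetical helper: true if $string ends in the literal $suffix.
sub ends_with {
    my ($string, $suffix) = @_;
    return 1 if $suffix eq '';
    return 0 if length($suffix) > length($string);
    return substr($string, -length($suffix)) eq $suffix ? 1 : 0;
}

my $a_new = 'fuBar';
my $b_new = 'Bar$';

# Strip the trailing end-of-string anchor to get the literal suffix.
(my $suffix = $b_new) =~ s/\$\z//;

print ends_with($a_new, $suffix) ? "match\n" : "no match\n";
```

Note that a regex $ also matches just before a trailing newline, while
this comparison does not, so chomp the input first if that matters.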

Ernest

Similar Threads

1. pattern matching dynamic strings w/ regex ending in $ problem

On 2005-05-02, rader scribbled these
curious markings:

[Posting to both clpmod and clpmisc is largely superfluous. Following up
to and F'ups set to clpmisc.]

> chive(rader): cat crud
> #!/usr/bin/perl
> $a = 'fuBar';
> $b = 'Bar$';
> $a_new = `echo -n $a`;
> $b_new = `echo -n $b`;
> if ( $a_new =~ /$b_new/ ) {
>    print "match\n";
> } else {
>    print "no match?!\n";
> }
>
> chive(rader): ./crud
> no match?!

Make sure that the results of the echo commands are actually what you
expect. FWIW, it WFM on 5.8.6 i386-freebsd-thread-multi-64int.

Best Regards,
Christopher Nehren
-- 
I abhor a system designed for the "user", if that word is a coded
pejorative meaning "stupid and unsophisticated". -- Ken Thompson
If you ask the wrong questions, you get answers like "42" and "God".
Unix is user friendly. However, it isn't idiot friendly.

2. howto simplfy this regex if ($string =~/^$match | $match | $match$/g){ - Perl

3. Regex testing and UTF8 awarenes or Regex and numeric pattern matching

Reading through the pods for info on utf8 and possible integer matching,
and setting up numerous tests, I inadvertently discovered what utf really is
in its entirety.

Unfortunately, only utf-8 is allowed within Perl (my version is 5.8.6). All the
gates and entry points are covered. Internally, it's as the documentation says:
pure utf8. The BOM (byte order mark) is different for utf16/32.

If you try to force an internal variant, say utf32, you get malformed or
utf-16 surrogate errors. It almost seems impossible, then, that you could
convert external utf32 (no BOM) or utf16 to internal utf8. But then, how
could you test it internally when there are no conversion functions?
It does no good internally because it is not an entry point. You are inside
Perl, which doesn't understand anything other than utf8 or byte demotion.
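
For what it's worth, the Encode module that ships with the 5.8 series
does provide conversion at the boundary: decode turns external UTF-16 or
UTF-32 bytes into Perl's internal character strings, and encode goes the
other way. A small sketch, with made-up sample code points:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Three code points packed as little-endian 32-bit integers,
# i.e. UTF-32LE without a BOM (sample values are arbitrary).
my @code_points = (0x41, 0x263A, 0x4E20);
my $bytes       = pack 'V*', @code_points;
my $byte_count  = length($bytes);

# External bytes -> internal characters.
my $chars = decode('UTF-32LE', $bytes);

printf "%d bytes in, %d characters out\n", $byte_count, length($chars);
print join(' ', map { sprintf 'U+%04X', ord($_) } split //, $chars), "\n";

# Internal characters -> external bytes again.
my $round_trip = encode('UTF-32LE', $chars);
print "round trip ok\n" if $round_trip eq pack('V*', @code_points);
```

Once decoded, the string holds ordinary characters, so ord, length and
the regex engine all see code points rather than bytes.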

Not a very good strategy. This leaves holes if one wanted to do utf32 character
processing inside a regular expression. Of course, I don't want to do that; I
want to process binary 32-bit integers with some of the niceties of the regex engine.

If the regex engine is so nice as to process some intermittent ranges of 32-bit
integers encoded as characters (utf8), perhaps it's almost there on integer
pattern matching. Although the constructs would need to be changed a little, it
would be a powerful binary parser. Don't you agree?

Below are the minutiae of the trial-and-error spaghetti code I've tried.
Within certain ranges, encoding 32-bit integers for basic pattern matching works well.
Of course, it is very slow in character classes, as opposed to groups, but sometimes
putting a few characters from the 0-256 range in a class won't cause it to crash, whereas
in groups it will. Above that, some ranges give surrogate or malformed utf8 errors.

Outside of the problem ranges (BOM) it works flawlessly in groups, and it's really fast.
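
A stripped-down sketch of the technique described above, using made-up
values well away from the surrogate range: pack the integers as code
points with pack 'U*', build the pattern from \x{...} escapes, and turn
the captures back into integers with ord. The helper names ints_of and
match_ints are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn a character string back into the list of integers it encodes.
sub ints_of {
    my ($s) = @_;
    return map { ord($_) } split //, $s;
}

# Look for three consecutive integers starting at $cur, followed later by $cur + 5.
sub match_ints {
    my ($str, $cur) = @_;
    my $pattern = sprintf '(\x{%x}\x{%x}\x{%x}).*?(\x{%x})',
        $cur, $cur + 1, $cur + 2, $cur + 5;
    return unless $str =~ /$pattern/s;    # /s so that . also matches code point 10
    return ([ ints_of($1) ], [ ints_of($2) ]);
}

my @ints = (20000 .. 20010);
my $str  = pack 'U*', @ints;              # each integer becomes one character

my ($head, $tail) = match_ints($str, 20003);
print "head: @$head\ntail: @$tail\n" if $head;
```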

So, my question is: why is Perl so short-sighted in this regard? With just some apparently
simple adjustments it could be a high-grade binary processor.

The junk code is below. If you haven't tried it or can't explain it,
don't bother replying. I've read everything on unicode there is in the pods and understand it
completely.

-sln

------------------------------------------------------------------------------
##
use warnings;
use strict;

printf ">>>>>>> \n%d %d %d %d \n<<<<<<<<<<<\n", 0xdf20,0xdf21,0xdf22,0xdf23,;

binmode STDOUT, ':utf8';

#my @ar = (120000,21,22,23,24,25,26,27,28,ord('a'),30);

#my @ar = (20000,20001,20002,0,20003,20004,20005,23336,20007,20008,20009,30000);

#my @ar = (0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15);

my @ar = ();
#push @ar, $_ for (0 .. 280);
#push @ar, $_ for (240 .. 280);
push @ar, $_ for (0 .. 70000);

push @ar, 0xdf20;
push @ar, 0xdf21;
push @ar, 0xdf21;
push @ar, 0xdf22;

my $str = pack 'U*', @ar;
#print "\nstr = ",$str,"\nlength = ",length($str),"\n";


foreach my $cur (@ar)  # here $cur is the frame position
{


#	my $pattern = sprintf "(\\x{%x}\\x{%x}\\x{%x})(.{0,5})\\x{%x}", $cur,$cur+1,$cur+2,$cur+5;

#	my $pattern = sprintf "(\\x{%x}\\x{%x}\\x{%x})(\$)", $cur,$cur+1,$cur+2;  # GETS 3 at end of string

# 3? >>	my $pattern = sprintf "(\\x{%x}\\x{%x}\\x{%x}.{0,8})([\\x{%x}-\\x{%x}])", $cur,$cur+1,$cur+2, $cur+4,$cur+10;

# 2 >>	my $pattern = sprintf "(\\x{%x}\\x{%x}\\x{%x}).{0,8}([\\x{%x}-\\x{%x}])", $cur,$cur+1,$cur+2, $cur+7,$cur+10;

# --	my $pattern = sprintf "(\\x{%x}\\x{%x}\\x{%x}).*?(\\x{%x})", $cur,$cur+1,$cur+2, $cur+4;

#	my $pattern = sprintf "(%c%c%c).*?(%c)", $cur,$cur+1,$cur+2, $cur+5;
#	my $pattern = sprintf "(\\%c\\%c\\%c).*?(\\%c)", $cur,$cur+1,$cur+2, $cur+5;


#my $pattern = sprintf "(\\x{%x}\\x{%x}\\x{%x}).*?(\\x{%x})", $cur,$cur+1,$cur+2, $cur+5;
my $pattern = sprintf "(%c%c%c).*?(%c)", $cur,$cur+1,$cur+2, $cur+5;
if ($cur < 256)
{
	$pattern = sprintf "([\\x{%x}][\\x{%x}][\\x{%x}]).*?([\\x{0%x}])", $cur,$cur+1,$cur+2,$cur+5;
}


#	my $pattern = sprintf "(\\x{%x}\\x{%x})(.*?)(\\x{%x})", $cur,$cur+1,$cur+5;

#	my $pattern = sprintf "(\\x{%x}\\x{%x}\\x{%x}).*?(\\x{%x})", $cur,$cur+1,$cur+2,$cur+5;

#	my $pattern = sprintf "(\\x{0%x}\\x{0%x}\\x{0%x})[^\\x{0%x}]*?(\\x{0%x})", $cur,$cur+1,$cur+2,$cur+5,$cur+5;

#	my $pattern = sprintf "([\\x{0%x}\\x{0%x}\\x{0%x}]{3}).*?([\\x{0%x}])", $cur,$cur+1,$cur+2,$cur+4;

### apparently \\x{%x} must exist in char class
### and here [\\%c] won't work because of unknown escaped chars like \J
##

#my $pattern = sprintf "([\\x{%x}\\x{%x}\\x{%x}]).*?([\\x{0%x}])", $cur,$cur+1,$cur+2,$cur+5;

#	my $pattern = sprintf "[\\x{%x}][\\x{%x}][\\x{%x}].*?[\\x{0%x}]", $cur,$cur+1,$cur+2,$cur+5;


#my $s1 = sprintf "%c%c%c",$cur,$cur+1,$cur+2;
#my $s2 = sprintf "%c",$cur+5;
#$s1 = quotemeta($s1);
#$s2 = quotemeta($s2);
#my $pattern = sprintf "(%s).*?(%s)", $s1,$s2;


#	my $pattern = sprintf "(\\%c\\%c\\%c).*?(\\%c)", $cur,$cur+1,$cur+2, $cur+5;


#	my $pattern = sprintf "([\\x{%x}][\\x{%x}][\\x{%x}]).*?([\\x{0%x}])", $cur,$cur+1,$cur+2,$cur+5;

#	my $pattern = sprintf "(\\x{0%x}\\x{0%x})(.*?)(\\x{0%x})", $cur,$cur+1,$cur+4;

# -->	my $pattern = sprintf "(\\%c\\%c\\%c)[^\\%c]*?(\\0%c)", $cur,$cur+1,$cur+2, $cur+5, $cur+5;

#	my $pattern = sprintf "(\\x{%x}\\x{%x}\\x{%x}).*?(\\x{%x})", $cur,$cur+1,$cur+2, $cur+4;

#	my $pattern = sprintf "(%s).{0,5}([\\x{%x}-\\x{%x}])", $test, $cur+7,$cur+10;

#	my $pattern = sprintf "(%c%c%c).{0,5}([%c-%c])", ($cur,$cur+1,$cur+2), $cur+7,$cur+10;

#>>	my $pattern = sprintf "([%c-%c]{3}).{0,5}([%c-%c])", ($cur,$cur+2), $cur+7,$cur+10;


#	print "\n----------------------------\ncur = $cur\n";
#	print "pattern = $pattern\n";

#$str =~ /($pattern)/s;

#my @p = unpack ('U*',$pattern);
#my @p = map {ord $_} split '',$pattern;
#print "pat = @p\n";

	if ( $str =~ /($pattern)/s)  ### NEED '/s' BECAUSE '.*?' WON'T MATCH '\n' WITHOUT IT
	{
		#print "$cur\n";
		print "$cur\n" if ($cur % 1000 == 0);
		next;  # progress output only; the capture dump below is skipped while this 'next' is here

		my @m1 = unpack ('U*',$1);
		my @m2 = unpack ('U*',$2);
		my @m3 = unpack ('U*',$3);

		print "matched:\n  1 = '@m1',  length = ".length($1).
				"\n  2 = '@m2',  length = ".length($2).
				"\n  3 = '@m3',  length = ".length($3)."\n";

		printf "\$3 = %d\n",ord $3;
	}
	else
	{
		print STDERR "didn't match $cur\n";
		print "didn't match $cur\n";
	}

}

4. Dynamic pattern matching using RE - Perl

5. Dynamic pattern matching?


I've got a data file whose entries, for the most part, look like this (the
last 3 columns are data points):

LKG_535   P10X0.6         -2.00E-09   0.00E+00  amps     -3.800E-13   -3.920E-12   -7.800E-13
VT_GM     L0.8H40         -1.15E+00  -7.50E-01  volts    -1.104E+00   -1.111E+00   -1.110E+00
IDSAT_5   Y0.8N20         -5.80E-03  -3.00E-03  amps     -5.036E-06   -5.001E-06   -4.853E-06
VT_GP     P0.8X.6         -1.15E+00  -7.50E-01  volts    -1.018E+00   -9.966E-01   -1.012E+00
LOGU_I    I2.00.6          6.00E-03   1.00E-02  amps      8.992E-03    8.939E-03    8.903E-03

which I match with the following:

# RE for a valid floating point number
$fp = qr/[+-]?\ *(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?/;

# Case for 3 data points
if ($line =~
    /(.{9})\s+(.{10})\s+.{4}\s+($fp)\s+($fp)\s+(.{8})\s+($fp)\s+($fp)\s+($fp)\s+$/o)
{
  $datapts = 3;
  #Insert matched vars into Class::Struct array...
  ...
}

But once in a while there might be a line that looks like the one below.
(This case shows 3 extra columns [data points], but in reality there could
be 1, 3 or 5 more columns.)

HGYPG5    M1_LG       OT   0.00E+00   2.00E-08  amps      1.000E-06    4.000E-11    2.000E-11    6.000E-11    4.000E-11    8.000E-11

I know I can write an if() clause to match every possible case, but I'm
wondering if there is a more general approach that would allow me to
dynamically match a varying number of extra columns within a single
expression.
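
One possible approach, sketched under a few assumptions: each record is
really one physical line (the wrapping above comes from posting), fields
are whitespace-separated, the third column is optional, and there is no
space between a sign and its digits (so the "\ *" from $fp is dropped
here). The fixed columns are matched once, the data points are captured
as a single run of $fp fields, and split then yields however many there
are. parse_line and its hash keys are made-up names:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Floating point number, non-capturing so it does not disturb $1, $2, ...
my $fp = qr/[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?/;

# Returns a hash ref for a matching line, or nothing if the line does not match.
sub parse_line {
    my ($line) = @_;
    my ($name, $device, $lo, $hi, $units, $tail) =
        $line =~ /^(\S+)\s+(\S+)\s+(?:\S+\s+)?($fp)\s+($fp)\s+(\S+)((?:\s+$fp)+)\s*$/
        or return;
    return {
        name   => $name,
        device => $device,
        lo     => $lo,
        hi     => $hi,
        units  => $units,
        points => [ split ' ', $tail ],    # one element per data point
    };
}

my @lines = (
    'LKG_535   P10X0.6         -2.00E-09   0.00E+00  amps     -3.800E-13   -3.920E-12   -7.800E-13',
    'HGYPG5    M1_LG       OT   0.00E+00   2.00E-08  amps      1.000E-06    4.000E-11    2.000E-11    6.000E-11    4.000E-11    8.000E-11',
);

for my $line (@lines) {
    my $rec = parse_line($line) or next;
    my $n   = @{ $rec->{points} };
    print "$rec->{name}: $n data points: @{ $rec->{points} }\n";
}
```

The number of data points is then just the size of the points array, so
no separate if() clause per column count is needed.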

Thanks,
-Dan   

6. Dynamic data within regex pattern? - Perl

7. Pattern matching : not matching problem

8. regex not matching high end - Perl