Unofficial empeg BBS

#213403 - 16/04/2004 10:15 Using a regexp to tokenize?
Dylan
addict

Registered: 23/09/2000
Posts: 498
Loc: Virginia, USA
Is there a way to write a regular expression so that it produces a match for each of a series of repeating tokens?

I'm trying to parse a query string into a set of (name, value) matches. These are my test input cases:

a = b
&a=b&
a=b& c=d
a =b&c=d&
a=b&c= d&e=f

I'd like to produce matches of (a,b) or (a,b,c,d) or (a,b,c,d,e,f). It's obvious for a human to see what the desired matches are.

The following regexp will match against the first token of each example and give the correct matching substrings.

[& ]*([^= ]*)[ ]*=[ ]*([^&= ]*)[& ]*

But is there a way to make the pattern iterate over the entire input and return multiple matches? I could write controlling logic to walk through the input string, but it would be easier if I could make the regexp engine do it.

Thanks.

#213404 - 16/04/2004 10:36 Re: Using a regexp to tokenize? [Re: Dylan]
siberia37
old hand

Registered: 09/01/2002
Posts: 702
Loc: Tacoma,WA
I don't think the regexp can do all the work in the way you're thinking, but why not use a replacing regular expression to replace all the matches with commas (",") and then enclose the resulting string in parentheses?

#213405 - 17/04/2004 02:16 Re: Using a regexp to tokenize? [Re: Dylan]
canuckInOR
carpal tunnel

Registered: 13/02/2002
Posts: 3212
Loc: Portland, OR
What language are you using? This is fairly trivial using perl:
        perl -e'while(<>){@m=/(\w+)\s*=\s*(\w+)/g;print "@m\n"}'
It gets a bit trickier (but not a whole lot) if you want to allow more than word characters (alphanumerics and underscore) as your a and b:
        @m=/\s*([^&=]+?)\s*=\s*([^&]+)/g
And, if you're sure you'll always get pairs, then you can assign that to a hash, as well, so you automatically have key => value pairs.

Edit: note that the while() only loops over stdin so that you can type in your test cases; it is not what loops the regex over the input. That part is taken care of by the /g modifier.

#213406 - 17/04/2004 10:02 Re: Using a regexp to tokenize? [Re: canuckInOR]
Dylan
addict

Registered: 23/09/2000
Posts: 498
Loc: Virginia, USA
My God. Is that a Martian dialect?

I'm writing this in C using libpcre as the regexp engine. In isolation, it wouldn't be difficult to write the surrounding logic to walk through the string, evaluating the regexp one token at a time. But for various long and boring reasons it would fit into our app better if it could all be controlled by run time regexp configuration.

Thanks for the reply, though. I really should become facile with Perl. There are so many times I find myself needing a quick scripting solution, but I end up using C because it's what I'm comfortable with.
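
For reference, the "surrounding logic" mentioned above can be fairly small. What follows is only a minimal sketch, not the code from the app in question: it assumes POSIX regex.h as a stand-in for libpcre (the same walk should be possible with libpcre by passing an advancing start offset), and the names parse_pairs, struct pair, PAIR_PATTERN, MAX_PAIRS and MAX_TOKEN are hypothetical. The pattern is the one from the first post with `*` changed to `+` on the two captures and `&` excluded from the name, which is an adjustment made here so that empty tokens are not matched. The pattern is passed in as a string, so it could still come from run-time configuration; only the generic loop is fixed in C.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

#define MAX_PAIRS 16   /* hypothetical limits, chosen for the sketch */
#define MAX_TOKEN 64

/* Hypothetical container for one name/value pair. */
struct pair {
    char name[MAX_TOKEN];
    char value[MAX_TOKEN];
};

/* Pattern from the first post, adjusted so name and value are non-empty. */
static const char PAIR_PATTERN[] = "[& ]*([^&= ]+) *= *([^&= ]+)[& ]*";

/* Copy one capture group into a fixed-size buffer, truncating if needed. */
static void copy_group(char *dst, const char *base, const regmatch_t *m)
{
    size_t len = 0;
    if (m->rm_so >= 0)                       /* -1 means the group did not match */
        len = (size_t)(m->rm_eo - m->rm_so);
    if (len >= MAX_TOKEN)
        len = MAX_TOKEN - 1;
    if (len > 0)
        memcpy(dst, base + m->rm_so, len);
    dst[len] = '\0';
}

/*
 * Apply the pattern repeatedly, restarting just past each match.
 * Returns the number of pairs stored, or -1 if the pattern does not
 * compile or has fewer than two capture groups.
 */
int parse_pairs(const char *pattern, const char *input,
                struct pair *out, int max_pairs)
{
    regex_t re;
    regmatch_t m[3];            /* whole match, name, value */
    const char *cursor = input;
    int count = 0;

    assert(pattern != NULL && input != NULL && out != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (re.re_nsub < 2) {
        regfree(&re);
        return -1;
    }

    while (count < max_pairs && regexec(&re, cursor, 3, m, 0) == 0) {
        copy_group(out[count].name, cursor, &m[1]);
        copy_group(out[count].value, cursor, &m[2]);
        count++;
        if (m[0].rm_eo == 0)    /* guard: an empty match would never advance */
            break;
        cursor += m[0].rm_eo;   /* offsets are relative to cursor */
    }

    regfree(&re);
    return count;
}
```

With the test inputs from the first post, a call such as parse_pairs(PAIR_PATTERN, "a=b&c= d&e=f", pairs, MAX_PAIRS) should return 3 and fill in (a,b), (c,d) and (e,f). The loop itself knows nothing about query strings, so a different tokenization would only need a different pattern with two capture groups.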

#213407 - 18/04/2004 03:13 Re: Using a regexp to tokenize? [Re: Dylan]
andy
carpal tunnel

Registered: 10/06/1999
Posts: 5916
Loc: Wivenhoe, Essex, UK
My God. Is that a Martian dialect?

You should see what people can do with Perl when they are trying to write obfuscated code:

@P=split//,".URRUU\c8R";@d=split//,"\nrekcah xinU / lreP rehtona tsuJ";sub p{
@p{"r$p","u$p"}=(P,P);pipe"r$p","u$p";++$p;($q*=2)+=$f=!fork;map{$P=$P[$f^ord
($p{$_})&6];$p{$_}=/ ^$P/ix?$P:close$_}keys%p}p;p;p;p;p;map{$p{$_}=~/^[P.]/&&
close$_}%p;wait until$?;map{/^r/&&<$_>}%p;$_=$d[$q];sleep rand(2)if/\S/;print

http://perl.plover.com/obfuscated/
_________________________
Remind me to change my signature to something more interesting someday

#213408 - 18/04/2004 09:49 Re: Using a regexp to tokenize? [Re: andy]
wfaulk
carpal tunnel

Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
That one's way too obvious. It's obviously got source material in there, just backwards.
_________________________
Bitt Faulk

#213409 - 18/04/2004 09:56 Re: Using a regexp to tokenize? [Re: wfaulk]
andy
carpal tunnel

Registered: 10/06/1999
Posts: 5916
Loc: Wivenhoe, Essex, UK
It might be obvious what it does, but it is far from obvious how it does it.

It spawns a separate process to print each character of the message. To synchronise the processes, it opens pipes between them and tracks the state of each process.

Something like that anyway.
_________________________
Remind me to change my signature to something more interesting someday

#213410 - 18/04/2004 09:59 Re: Using a regexp to tokenize? [Re: andy]
wfaulk
carpal tunnel

Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
I didn't even begin to try to figure that out. I just prefer the ones where there's no obvious source material at all, so that it seems to generate output from nowhere.
_________________________
Bitt Faulk
