Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

ruby - How do I split a string by commas, except inside parentheses, using a regular expression?

I want to split a string on commas:

"a,s".split ','  # => ['a', 's']

However, I don't want to split inside a substring that is wrapped in parentheses. For example:

"a,s(d,f),g,h"

should yield:

['a', 's(d,f)', 'g', 'h']

Any suggestions?

To handle nested parentheses, you can use a recursive pattern:

txt = "a,s(d,f(4,5)),g,h"
pattern = Regexp.new('((?:[^,(]+|(\((?>[^()]+|\g<-1>)*\)))+)')
puts txt.scan(pattern).map(&:first)
# prints a, s(d,f(4,5)), g and h, one per line
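A minimal sketch that wraps the same pattern in a reusable method. The name `split_outside_parens` is hypothetical, not a Ruby built-in. One behavioral difference from `String#split` is worth knowing: because every match must consume at least one character, the scan-based approach silently drops empty fields.

```ruby
# Hypothetical helper: split on commas that are not inside (possibly nested) parentheses.
# Same recursive pattern as above; \g<-1> re-enters the second capturing group.
SPLIT_PATTERN = /((?:[^,(]+|(\((?>[^()]+|\g<-1>)*\)))+)/

def split_outside_parens(str)
  # scan yields one [group1, group2] pair per match; group 1 is the whole field
  str.scan(SPLIT_PATTERN).map(&:first)
end

p split_outside_parens("a,s(d,f),g,h")       # => ["a", "s(d,f)", "g", "h"]
p split_outside_parens("a,s(d,f(4,5)),g,h")  # => ["a", "s(d,f(4,5))", "g", "h"]
p split_outside_parens("a,,b")               # => ["a", "b"] (empty field dropped)
```

If empty fields matter, this pattern is not a drop-in replacement for `split(',')`.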

Pattern details:

(                        # first capturing group
    (?:                  # open a non-capturing group
        [^,(]+           # all characters except , and (
      |                  # or
        (                # open the second capturing group
           \(            # a literal (
            (?>          # open an atomic group
                [^()]+   # all characters except parentheses
              |          # or
                \g<-1>   # the nearest group opened to the left, i.e. the second one (you can also write \g<2>)
            )*           # close the atomic group and repeat it zero or more times
           \)            # a literal )
        )                # close the second capturing group
    )+                   # close the non-capturing group and repeat it
)                        # close the first capturing group
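The commented breakdown can itself be used as a pattern: Ruby's `/x` (extended) flag ignores whitespace and `#` comments inside a regexp literal. A sketch, assuming it behaves like the one-line version; the constant name `PATTERN_X` is arbitrary. The comments deliberately avoid `/` and `#{`, which would end or interpolate into the literal.

```ruby
PATTERN_X = /
  (                        # first capturing group: one whole field
    (?:                    # non-capturing group
      [^,(]+               # a run of characters other than comma or opening paren
    |                      # or
      (                    # second capturing group: a balanced block
        \(                 # literal opening paren
        (?>                # atomic group
          [^()]+           # a run of non-paren characters
        |                  # or
          \g<-1>           # recurse into the second capturing group
        )*                 # repeated zero or more times
        \)                 # literal closing paren
      )                    # end of the second group
    )+                     # repeat the non-capturing group
  )                        # end of the first group
/x

txt = "a,s(d,f(4,5)),g,h"
p txt.scan(PATTERN_X).map(&:first)  # => ["a", "s(d,f(4,5))", "g", "h"]
```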

The second capturing group describes a pair of parentheses whose content is any mix of non-parenthesis characters and further occurrences of the second capturing group itself. This self-reference makes the pattern recursive, so it handles any nesting depth.
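To see the recursion in isolation, here is a sketch of the second group on its own, wrapped in `\A...\z` so that the whole string must be a single balanced block. It becomes group 1 here, so it calls itself with `\g<1>`; the constant name `BALANCED` is arbitrary.

```ruby
# Matches a string that is exactly one balanced parenthesized block.
# Group 1 calls itself via \g<1>, so the nesting depth is unlimited.
BALANCED = /\A(\((?>[^()]+|\g<1>)*\))\z/

p BALANCED.match?("(d,f)")         # => true
p BALANCED.match?("(d,f(4,(5)))")  # => true
p BALANCED.match?("(d,f(4,5)")     # => false (unclosed parenthesis)
p BALANCED.match?("(d)(f)")        # => false (two blocks, not one)
```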

Inside the pattern, you can refer to a capturing group by its number (\g<2> for the second capturing group), by its relative position (\g<-1> is the nearest group opened to the left of the current position in the pattern), or by its name if you use named capturing groups.
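A sketch of the same splitting pattern written with named groups. Ruby does not let a pattern with named groups use numbered calls such as `\g<1>`, so the recursion is written as `\g<paren>`; the names `field` and `paren` are arbitrary choices.

```ruby
# Same splitting pattern, using named groups and a named subexpression call.
NAMED = /(?<field>(?:[^,(]+|(?<paren>\((?>[^()]+|\g<paren>)*\)))+)/

txt = "a,s(d,f(4,5)),g,h"
# scan yields [field, paren] for each match; keep the field
p txt.scan(NAMED).map(&:first)  # => ["a", "s(d,f(4,5))", "g", "h"]
```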

Note: you can also tolerate unbalanced parentheses by adding |[()] as the last alternative inside the non-capturing group. Then "a,b(,c" gives ['a', 'b(', 'c'].
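The lenient variant from the note, as a sketch (the constant name `LENIENT` is arbitrary). Because `[()]` is the last alternative, it is only tried after the balanced-group alternative has failed, so well-formed input splits exactly as before.

```ruby
# Original pattern with |[()] appended inside the non-capturing group.
LENIENT = /((?:[^,(]+|(\((?>[^()]+|\g<-1>)*\))|[()])+)/

p "a,b(,c".scan(LENIENT).map(&:first)             # => ["a", "b(", "c"]
p "a,s(d,f(4,5)),g,h".scan(LENIENT).map(&:first)  # => ["a", "s(d,f(4,5))", "g", "h"]
```

Without `|[()]`, the stray `(` in "a,b(,c" is not matched at all and simply disappears from the result, giving ["a", "b", "c"].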

...