Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
274 views
in Technique[技术] by (71.8m points)

Python regex find all overlapping matches?

I'm trying to find every 10 digit series of numbers within a larger series of numbers using re in Python 2.6.

I'm easily able to grab no overlapping matches, but I want every match in the number series. Eg.

in "123456789123456789"

I should get the following list:

[1234567891,2345678912,3456789123,4567891234,5678912345,6789123456,7891234567,8912345678,9123456789]

I've found references to a "lookahead", but the examples I've seen only show pairs of numbers rather than larger groupings and I haven't been able to convert them beyond the two digits.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Use a capturing group inside a lookahead. The lookahead captures the text you're interested in, but the actual match is technically the zero-width substring before the lookahead, so the matches are technically non-overlapping:

import re 
s = "123456789123456789"
matches = re.finditer(r'(?=(d{10}))',s)
results = [int(match.group(1)) for match in matches]
# results: 
# [1234567891,
#  2345678912,
#  3456789123,
#  4567891234,
#  5678912345,
#  6789123456,
#  7891234567,
#  8912345678,
#  9123456789]

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...