# python – How to find all occurrences of a substring?

## The Question

405 people think this question is useful

Python has string.find() and string.rfind() to get the index of a substring in a string.

I’m wondering whether there is something like string.find_all() which can return all found indexes (not only the first from the beginning or the first from the end).

For example:

string = "test test test test"

print(string.find('test'))  # 0
print(string.rfind('test'))  # 15

# this is the goal
print(string.find_all('test'))  # [0, 5, 10, 15]


• What should 'ttt'.find_all('tt') return?
• It should return ‘0’. Of course, in a perfect world there would also be 'ttt'.rfind_all('tt'), which should return ‘1’.
• Seems like a duplicate of this stackoverflow.com/questions/3873361/…

574 people think this answer is useful

There is no simple built-in string function that does what you’re looking for, but you could use the more powerful regular expressions:

import re
[m.start() for m in re.finditer('test', 'test test test test')]
#[0, 5, 10, 15]



If you want to find overlapping matches, lookahead will do that:

[m.start() for m in re.finditer('(?=tt)', 'ttt')]
#[0, 1]



If you want a reverse find-all without overlaps, you can combine positive and negative lookahead into an expression like this:

search = 'tt'
[m.start() for m in re.finditer('(?=%s)(?!.{1,%d}%s)' % (search, len(search)-1, search), 'ttt')]
#[1]



re.finditer returns an iterator, so you could change the [] in the above to () to get a generator expression instead of a list, which will be more efficient if you’re only iterating through the results once.
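As a rough sketch of the generator form described above: the helper name `iter_starts` is hypothetical, and the call to `re.escape` is an added assumption so that a substring containing regex metacharacters is matched literally.

```python
import re

def iter_starts(sub, text):
    """Lazily yield the start index of each non-overlapping match of sub in text."""
    # re.escape makes regex metacharacters in sub (such as '.') match literally.
    return (m.start() for m in re.finditer(re.escape(sub), text))

starts = iter_starts('a.b', 'a.b axb a.b')
first = next(starts)   # 0; the remaining matches have not been searched for yet
rest = list(starts)    # [8]; 'axb' is skipped because '.' is escaped
print(first, rest)
```

Because the results are produced on demand, stopping after the first few matches avoids scanning the rest of the string.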

124 people think this answer is useful
>>> help(str.find)
Help on method_descriptor:

find(...)
    S.find(sub [,start [,end]]) -> int



Thus, we can build it ourselves:

def find_all(a_str, sub):
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1: return
        yield start
        start += len(sub) # use start += 1 to find overlapping matches

list(find_all('spam spam spam spam', 'spam')) # [0, 5, 10, 15]



No temporary strings or regexes required.
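The comment in the generator above suggests editing the step to get overlapping matches. One possible variation, shown only as a sketch, exposes that choice as a keyword argument (the `overlapping` parameter is an assumption, not part of the original answer) and rejects an empty substring, which would otherwise never advance the search position.

```python
def find_all(a_str, sub, overlapping=False):
    """Yield each index at which sub occurs in a_str."""
    if not sub:
        # An empty substring matches at every position and the step would be 0.
        raise ValueError("sub must be a non-empty string")
    step = 1 if overlapping else len(sub)
    start = a_str.find(sub)
    while start != -1:
        yield start
        start = a_str.find(sub, start + step)

print(list(find_all('ttt', 'tt')))                    # [0]
print(list(find_all('ttt', 'tt', overlapping=True)))  # [0, 1]
```

Since this is a generator, the `ValueError` is raised when iteration starts rather than when `find_all` is called.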

54 people think this answer is useful

Here’s a (very inefficient) way to get all (i.e. even overlapping) matches:

>>> string = "test test test test"
>>> [i for i in range(len(string)) if string.startswith('test', i)]
[0, 5, 10, 15]



31 people think this answer is useful

Again, old thread, but here’s my solution using a generator and plain str.find.

def findall(p, s):
    '''Yields all the positions of
    the pattern p in the string s.'''
    i = s.find(p)
    while i != -1:
        yield i
        i = s.find(p, i+1)



### Example

x = 'banananassantana'
[(i, x[i:i+2]) for i in findall('na', x)]



returns

[(2, 'na'), (4, 'na'), (6, 'na'), (14, 'na')]



22 people think this answer is useful

You can use re.finditer() for non-overlapping matches.

>>> import re
>>> aString = 'this is a string where the substring "is" is repeated several times'
>>> print([(a.start(), a.end()) for a in re.finditer('is', aString)])
[(2, 4), (5, 7), (38, 40), (42, 44)]



but it won’t find overlapping matches, for example:

In [1]: aString="ababa"

In [2]: print([(a.start(), a.end()) for a in re.finditer('aba', aString)])
Output: [(0, 3)]
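For the "ababa" case above, a zero-width lookahead is one way to recover the overlapping match. This is a sketch; the variable names are illustrative and `re.escape` is added on the assumption that the substring should be matched literally.

```python
import re

a_string = "ababa"
sub = "aba"

# The lookahead matches an empty string at every position where sub begins,
# so the end index is computed from len(sub) instead of read from m.end().
spans = [(m.start(), m.start() + len(sub))
         for m in re.finditer('(?=' + re.escape(sub) + ')', a_string)]
print(spans)  # [(0, 3), (2, 5)]
```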



19 people think this answer is useful

Come, let us recurse together.

def locations_of_substring(string, substring):
    """Return a list of locations of a substring."""

    substring_length = len(substring)
    def recurse(locations_found, start):
        location = string.find(substring, start)
        if location != -1:
            return recurse(locations_found + [location], location+substring_length)
        else:
            return locations_found

    return recurse([], 0)

print(locations_of_substring('this is a test for finding this and this', 'this'))
# prints [0, 27, 36]



No need for regular expressions this way.
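One caveat with the recursive version: every match adds a stack frame, so a string with many matches can exceed the interpreter's recursion limit (1000 by default in CPython). An iterative equivalent, sketched here under the hypothetical name `locations_of_substring_iter`, returns the same non-overlapping locations without that limit.

```python
def locations_of_substring_iter(string, substring):
    """Return a list of non-overlapping locations of a non-empty substring."""
    locations = []
    start = string.find(substring)
    while start != -1:
        locations.append(start)
        # Skip past the whole match, as the recursive version does.
        start = string.find(substring, start + len(substring))
    return locations

print(locations_of_substring_iter('this is a test for finding this and this', 'this'))
# prints [0, 27, 36]
```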

12 people think this answer is useful

If you’re just looking for a single character, this would work:

from functools import reduce  # reduce is no longer a builtin in Python 3

string = "dooobiedoobiedoobie"
match = 'o'
reduce(lambda count, char: count + 1 if char == match else count, string, 0)
# produces 7



Also,

string = "test test test test"
match = "test"
len(string.split(match)) - 1
# produces 4



My hunch is that neither of these (especially #2) is terribly performant.
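Both snippets above count occurrences rather than locate them, and the built-in `str.count` gives the same number directly. The sketch below compares the approaches and also shows that `str.count` ignores overlapping occurrences; the variable names are illustrative.

```python
string = "test test test test"
match = "test"

# str.count reports the number of non-overlapping occurrences directly.
count_builtin = string.count(match)          # 4
count_split = len(string.split(match)) - 1   # 4, the same value via split

# Overlapping occurrences are not counted by str.count.
non_overlapping = "ttt".count("tt")          # 1
# Counting every start position, overlaps included, needs an explicit scan.
overlapping = sum(1 for i in range(len("ttt")) if "ttt".startswith("tt", i))  # 2

print(count_builtin, count_split, non_overlapping, overlapping)
```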

10 people think this answer is useful

This is an old thread, but I got interested and wanted to share my solution.

def find_all(a_string, sub):
    result = []
    k = 0
    while k < len(a_string):
        k = a_string.find(sub, k)
        if k == -1:
            return result
        else:
            result.append(k)
            k += 1 # change to k += len(sub) to skip overlapping results
    return result



It should return a list of positions where the substring was found. Please comment if you see an error or room for improvement.

6 people think this answer is useful

This does the trick for me using re.finditer

import re

text = 'This is sample text to test if this pythonic '\
'program can serve as an indexing platform for '\
'finding words in a paragraph. It can give '\
'values as to where the word is located with the '\
'different examples as stated'

# find all occurrences of the word 'as' in the above text

find_the_word = re.finditer('as', text)

for match in find_the_word:
    print('start {}, end {}, search string \'{}\''.
          format(match.start(), match.end(), match.group()))



5 people think this answer is useful

This thread is a little old but this worked for me:

numberString = "onetwothreefourfivesixseveneightninefiveten"
testString = "five"

marker = 0
while marker < len(numberString):
    try:
        # index() raises ValueError once no further match exists
        position = numberString.index(testString, marker)
        print(position)
        marker = position + 1
    except ValueError:
        marker = len(numberString)



5 people think this answer is useful

You can try :

>>> string = "test test test test"
>>> for index, value in enumerate(string):
...     if string[index:index+(len("test"))] == "test":
...         print(index)
...
0
5
10
15



2 people think this answer is useful

The solutions provided by others all rely on find() or other built-in methods.

What is the core basic algorithm to find all the occurrences of a substring in a string?

def find_all(string, substring):
    """
    Function: Return all the indexes of substring in a string
    Arguments: The string and the search string
    Return: A list of indexes
    """
    length = len(substring)
    c = 0
    indexes = []
    while c < len(string):
        if string[c:c+length] == substring:
            indexes.append(c)
        c = c + 1
    return indexes



You can also subclass str and add this function as a method of the new class:

class newstr(str):
    def find_all(string, substring):
        """
        Function: Return all the indexes of substring in a string
        Arguments: The string and the search string
        Return: A list of indexes
        """
        length = len(substring)
        c = 0
        indexes = []
        while c < len(string):
            if string[c:c+length] == substring:
                indexes.append(c)
            c = c + 1
        return indexes



Calling the method
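A call might look like the sketch below. It repeats a condensed version of the subclass so the block runs on its own; the name `NewStr` and the use of `self` for the first parameter are illustrative choices, not the original author's code.

```python
class NewStr(str):
    """str subclass with a find_all method (condensed, hypothetical variant)."""

    def find_all(self, substring):
        """Return a list of every index at which substring starts in self."""
        length = len(substring)
        return [c for c in range(len(self)) if self[c:c + length] == substring]

text = NewStr("test test test test")
print(text.find_all("test"))  # [0, 5, 10, 15]

# Built-in str methods return plain str objects, so the result of
# text.upper() no longer has a find_all method.
print(type(text.upper()).__name__)  # str
```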

2 people think this answer is useful

This function does not look at every position inside the string, so it does not waste compute resources. My try:

def findAll(string, word):
    all_positions = []
    next_pos = -1
    while True:
        next_pos = string.find(word, next_pos + 1)
        if next_pos < 0:
            break
        all_positions.append(next_pos)
    return all_positions



To use it, call it like this:

result=findAll('this word is a big word man how many words are there?','word')



1 people think this answer is useful

When looking for a large number of keywords in a document, use flashtext:

from flashtext import KeywordProcessor
words = ['test', 'exam', 'quiz']
txt = 'this is a test'
kwp = KeywordProcessor()
kwp.add_keywords_from_list(words)  # register the keywords before extracting
result = kwp.extract_keywords(txt, span_info=True)



Flashtext runs faster than regex on a large list of search words.

0 people think this answer is useful
src = input() # we will find substring in this string
sub = input() # substring

res = []
pos = src.find(sub)
while pos != -1:
    res.append(pos)
    pos = src.find(sub, pos + 1)



0 people think this answer is useful

This is a solution to a similar question from HackerRank. I hope it helps.

import re
a = input()
b = input()
if b not in a:
    print((-1, -1))
else:
    # collect the start index of every match, including overlapping ones;
    # re.escape keeps regex metacharacters in b from being interpreted
    start_indc = [m.start() for m in re.finditer('(?=' + re.escape(b) + ')', a)]
    for i in range(len(start_indc)):
        print((start_indc[i], start_indc[i]+len(b)-1))



Output:

aaadaa
aa
(0, 1)
(1, 2)
(4, 5)



0 people think this answer is useful
def find_index(string, let):
    enumerated = [place for place, letter in enumerate(string) if letter == let]
    return enumerated



For example:

find_index("hey doode find d", "d")



returns:

[4, 7, 13, 15]



-1 people think this answer is useful

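You can easily use the following, although it returns only the number of non-overlapping occurrences, not their indexes:

string.count('test')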



https://www.programiz.com/python-programming/methods/string/count

Cheers!

-1 people think this answer is useful

By slicing, we build every possible substring, append each one to a list, and use the count function to find how many times the search string occurs:

s = input()
n = len(s)
l = []
f = input()
print(s[0])
for i in range(0, n):
    for j in range(1, n+1):
        l.append(s[i:j])
if f in l:
    print(l.count(f))



-2 people think this answer is useful

#!/usr/bin/env python
# coding:utf-8
'''黄哥Python'''

def get_substring_indices(text, s):
    result = [i for i in range(len(text)) if text.startswith(s, i)]
    return result

if __name__ == '__main__':
    text = "How much wood would a wood chuck chuck if a wood chuck could chuck wood?"
    s = 'wood'
    print(get_substring_indices(text, s))



-2 people think this answer is useful

The pythonic way would be:

mystring = 'Hello World, this should work!'
find_all = lambda c,s: [x for x in range(c.find(s), len(c)) if c[x] == s]

# s is the character to search for (this only works for a single character)
# c is the string to search in

find_all(mystring, 'o')    # returns all positions of 'o'

# [4, 7, 20, 26]