# python – How to find all occurrences of a substring?

## The Question :

Python has string.find() and string.rfind() to get the index of a substring in a string.

I’m wondering whether there is something like string.find_all() which can return all found indexes (not only the first from the beginning or the first from the end).

For example:

string = "test test test test"

print(string.find('test'))   # 0
print(string.rfind('test'))  # 15

# this is the goal
print(string.find_all('test'))  # [0, 5, 10, 15]


• What should 'ttt'.find_all('tt') return?
• It should return ‘0’. Of course, in a perfect world there would also be 'ttt'.rfind_all('tt'), which should return ‘1’.
• Seems like a duplicate of this stackoverflow.com/questions/3873361/…

There is no simple built-in string function that does what you’re looking for, but you could use the more powerful regular expressions:

import re
[m.start() for m in re.finditer('test', 'test test test test')]
#[0, 5, 10, 15]



If you want to find overlapping matches, lookahead will do that:

[m.start() for m in re.finditer('(?=tt)', 'ttt')]
#[0, 1]



If you want a reverse find-all without overlaps, you can combine positive and negative lookahead into an expression like this:

search = 'tt'
[m.start() for m in re.finditer('(?=%s)(?!.{1,%d}%s)' % (search, len(search)-1, search), 'ttt')]
#[1]



re.finditer returns an iterator, so you could change the [] in the above to () to get a generator expression instead of a list, which will be more efficient if you’re only iterating through the results once.
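The patterns above are plain words, but a substring containing regex metacharacters (such as `.` or `(`) needs escaping before it is used as a pattern. A minimal sketch, assuming a hypothetical helper named `find_all_re` that is not part of the original answer:

```python
import re

def find_all_re(text, sub, overlapping=False):
    """Return the start indexes of sub in text (hypothetical helper)."""
    # re.escape makes characters such as '.' or '(' match literally.
    pattern = re.escape(sub)
    if overlapping:
        # A lookahead consumes no characters, so matches may overlap.
        pattern = '(?=%s)' % pattern
    return [m.start() for m in re.finditer(pattern, text)]

print(find_all_re('a.b a.b axb', 'a.b'))           # [0, 4]
print(find_all_re('ttt', 'tt', overlapping=True))  # [0, 1]
```

Without the escape, the pattern `a.b` would also match `axb` at index 8.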

>>> help(str.find)
Help on method_descriptor:

find(...)
S.find(sub [,start [,end]]) -> int



Thus, we can build it ourselves:

def find_all(a_str, sub):
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1: return
        yield start
        start += len(sub) # use start += 1 to find overlapping matches

list(find_all('spam spam spam spam', 'spam')) # [0, 5, 10, 15]



No temporary strings or regexes required.
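One caveat with this generator: an empty `sub` makes `start += len(sub)` add zero, so the loop never advances and the generator never ends. A sketch of a guarded variant follows; the name `find_all_safe` and the `overlapping` parameter are illustrative and not from the original answer:

```python
def find_all_safe(a_str, sub, overlapping=False):
    """Yield each index at which sub occurs in a_str."""
    if not sub:
        # Assumption: reject an empty substring rather than loop forever.
        raise ValueError('sub must not be empty')
    step = 1 if overlapping else len(sub)
    start = a_str.find(sub)
    while start != -1:
        yield start
        start = a_str.find(sub, start + step)

print(list(find_all_safe('spam spam spam spam', 'spam')))  # [0, 5, 10, 15]
print(list(find_all_safe('ttt', 'tt', overlapping=True)))  # [0, 1]
```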

Here’s a (very inefficient) way to get all (i.e. even overlapping) matches:

>>> string = "test test test test"
>>> [i for i in range(len(string)) if string.startswith('test', i)]
[0, 5, 10, 15]



Again, old thread, but here’s my solution using a generator and plain str.find.

def findall(p, s):
    '''Yields all the positions of
    the pattern p in the string s.'''
    i = s.find(p)
    while i != -1:
        yield i
        i = s.find(p, i+1)



### Example

x = 'banananassantana'
[(i, x[i:i+2]) for i in findall('na', x)]



returns

[(2, 'na'), (4, 'na'), (6, 'na'), (14, 'na')]



You can use re.finditer() for non-overlapping matches.

>>> import re
>>> aString = 'this is a string where the substring "is" is repeated several times'
>>> print([(a.start(), a.end()) for a in re.finditer('is', aString)])
[(2, 4), (5, 7), (38, 40), (42, 44)]



but it misses overlapping matches:

In [1]: aString="ababa"

In [2]: print([(a.start(), a.end()) for a in re.finditer('aba', aString)])
Output: [(0, 3)]
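For overlapping input like this, a zero-width lookahead wrapped around a capture group can still report each match's span. A small sketch under that assumption:

```python
import re

a_string = 'ababa'
# The lookahead matches wherever 'aba' starts without consuming it;
# group 1 captures the text so that its span can be reported.
spans = [m.span(1) for m in re.finditer('(?=(aba))', a_string)]
print(spans)  # [(0, 3), (2, 5)]
```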



Come, let us recurse together.

def locations_of_substring(string, substring):
    """Return a list of locations of a substring."""

    substring_length = len(substring)
    def recurse(locations_found, start):
        location = string.find(substring, start)
        if location != -1:
            return recurse(locations_found + [location], location+substring_length)
        else:
            return locations_found

    return recurse([], 0)

print(locations_of_substring('this is a test for finding this and this', 'this'))
# prints [0, 27, 36]



No need for regular expressions this way.
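Each match adds one stack frame, so a string with more matches than Python's recursion limit (1000 frames by default in CPython) would raise a RecursionError. A sketch of an iterative equivalent; the name `locations_iterative` is hypothetical:

```python
def locations_iterative(string, substring):
    """Return a list of non-overlapping locations of substring.

    Assumes substring is non-empty.
    """
    locations = []
    start = string.find(substring)
    while start != -1:
        locations.append(start)
        # Continue just past the current match, as the recursive version does.
        start = string.find(substring, start + len(substring))
    return locations

print(locations_iterative('this is a test for finding this and this', 'this'))
# prints [0, 27, 36]
print(len(locations_iterative('ab' * 5000, 'ab')))
# prints 5000
```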

If you just want to count the occurrences of a single character, this would work:

from functools import reduce  # reduce is no longer a built-in in Python 3

string = "dooobiedoobiedoobie"
match = 'o'
reduce(lambda count, char: count + 1 if char == match else count, string, 0)
# produces 7



Also,

string = "test test test test"
match = "test"
len(string.split(match)) - 1
# produces 4



My hunch is that neither of these (especially #2) is terribly performant.

This is an old thread, but I got interested and wanted to share my solution.

def find_all(a_string, sub):
    result = []
    k = 0
    while k < len(a_string):
        k = a_string.find(sub, k)
        if k == -1:
            return result
        else:
            result.append(k)
            k += 1 #change to k += len(sub) to not search overlapping results
    return result



It should return a list of positions where the substring was found. Please comment if you see an error or room for improvement.

This does the trick for me using re.finditer

import re

text = 'This is sample text to test if this pythonic '\
'program can serve as an indexing platform for '\
'finding words in a paragraph. It can give '\
'values as to where the word is located with the '\
'different examples as stated'

# find all occurrences of the word 'as' in the above text

find_the_word = re.finditer('as', text)

for match in find_the_word:
    print('start {}, end {}, search string \'{}\''.
          format(match.start(), match.end(), match.group()))



This thread is a little old but this worked for me:

numberString = "onetwothreefourfivesixseveneightninefiveten"
testString = "five"

marker = 0
while marker < len(numberString):
    try:
        # index() raises ValueError once no further match exists
        position = numberString.index(testString, marker)
        print(position)
        marker = position + 1
    except ValueError:
        marker = len(numberString)



You can try :

>>> string = "test test test test"
>>> for index in range(len(string)):
...     if string[index:index + len("test")] == "test":
...         print(index)

0
5
10
15



The solutions provided by others all rely on find() or other built-in methods.

So what is the basic algorithm for finding all occurrences of a substring in a string?

def find_all(string, substring):
    """
    Function: Return all indexes of substring in string
    Arguments: the string and the search string
    Return: a list of indexes
    """
    length = len(substring)
    c = 0
    indexes = []
    while c < len(string):
        if string[c:c+length] == substring:
            indexes.append(c)
        c = c + 1
    return indexes



You can also subclass str and add the function below as a method.

class newstr(str):
    def find_all(self, substring):
        """
        Function: Return all indexes of substring in the string
        Arguments: the search string
        Return: a list of indexes
        """
        length = len(substring)
        c = 0
        indexes = []
        while c < len(self):
            if self[c:c+length] == substring:
                indexes.append(c)
            c = c + 1
        return indexes



Calling the method
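The call itself is missing from the answer as posted. A minimal sketch of how such a subclass could be used, assuming the `newstr` class above (repeated here in condensed form so the snippet runs on its own; the sample text is illustrative):

```python
class newstr(str):
    def find_all(self, substring):
        """Return a list of every index at which substring occurs."""
        length = len(substring)
        return [c for c in range(len(self)) if self[c:c + length] == substring]

text = newstr('test test test test')
print(text.find_all('test'))  # [0, 5, 10, 15]
```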

This function does not inspect every position in the string, so it does not waste compute resources. My try:

def findAll(string, word):
    all_positions = []
    next_pos = -1
    while True:
        next_pos = string.find(word, next_pos + 1)
        if next_pos < 0:
            break
        all_positions.append(next_pos)
    return all_positions



To use it, call it like this:

result=findAll('this word is a big word man how many words are there?','word')
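Note that a plain substring search also reports the `word` inside `words`. If only whole words should count, a regex with `\b` word boundaries is one option; this sketch is an assumption about the intent and not part of the original answer:

```python
import re

sentence = 'this word is a big word man how many words are there?'

# A plain substring search also matches the 'word' inside 'words'.
substring_hits = [m.start() for m in re.finditer('word', sentence)]
print(substring_hits)   # [5, 19, 37]

# \b anchors the pattern to word boundaries, so 'words' is skipped.
whole_word_hits = [m.start() for m in re.finditer(r'\bword\b', sentence)]
print(whole_word_hits)  # [5, 19]
```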



When looking for a large number of keywords in a document, use flashtext:

from flashtext import KeywordProcessor
words = ['test', 'exam', 'quiz']
txt = 'this is a test'
kwp = KeywordProcessor()
kwp.add_keywords_from_list(words)  # register the keywords to search for
result = kwp.extract_keywords(txt, span_info=True)



Flashtext runs faster than regex on a large list of search words.

src = input() # the string to search in
sub = input() # the substring to find

res = []
pos = src.find(sub)
while pos != -1:
    res.append(pos)
    pos = src.find(sub, pos + 1)



This is a solution to a similar question from HackerRank. I hope it helps.

import re
a = input()
b = input()
if b not in a:
    print((-1, -1))
else:
    # a lookahead finds overlapping matches; re.escape treats b literally
    start_indc = [m.start() for m in re.finditer('(?=' + re.escape(b) + ')', a)]
    for start in start_indc:
        print((start, start + len(b) - 1))



Output:

aaadaa
aa
(0, 1)
(1, 2)
(4, 5)



def find_index(string, let):
    enumerated = [place for place, letter in enumerate(string) if letter == let]
    return enumerated



for example :

find_index("hey doode find d", "d")



returns:

[4, 7, 13, 15]



If you only need the number of occurrences, you can simply use:

string.count('test')



https://www.programiz.com/python-programming/methods/string/count

Cheers!
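Keep in mind that `str.count` counts non-overlapping occurrences and returns a number rather than positions. A short illustration:

```python
string = 'test test test test'
total = string.count('test')
print(total)  # 4 (a count, not a list of indexes)

# Occurrences are counted without overlap.
overlap_total = 'ttt'.count('tt')
print(overlap_total)  # 1
```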

By slicing, we collect every possible substring in a list and then use count() to find how many times the target occurs:

s = input()
n = len(s)
l = []
f = input()
for i in range(0, n):
    for j in range(1, n+1):
        l.append(s[i:j])  # every slice s[i:j], including empty ones
if f in l:
    print(l.count(f))



#!/usr/bin/env python
# coding:utf-8
'''黄哥Python'''

def get_substring_indices(text, s):
    result = [i for i in range(len(text)) if text.startswith(s, i)]
    return result

if __name__ == '__main__':
    text = "How much wood would a wood chuck chuck if a wood chuck could chuck wood?"
    s = 'wood'
    print(get_substring_indices(text, s))



A compact way to do this for a single character would be:

mystring = 'Hello World, this should work!'
find_all = lambda c,s: [x for x in range(c.find(s), len(c)) if c[x] == s]

# c is the string to search in
# s is the single character to look for

find_all(mystring,'o')    # will return all positions of 'o'

[4, 7, 20, 26]
