# python – Is there a simple way to remove multiple spaces in a string?

## The Question :

444 people think this question is useful

Suppose this string:

The   fox jumped   over    the log.



Turning into:

The fox jumped over the log.



What is the simplest way (1-2 lines) to achieve this, without splitting and going into lists?

• What is your aversion to lists? They are an integral part of the language, and " ".join(list_of_words) is one of the core idioms for making a list of strings into a single space-delimited string.
• @Tom/@Paul: For simple strings, (string) join would be simple and sweet. But it gets more complex if there is other whitespace that one does NOT want to disturb… in which case “while” or regex solutions would be best. I’ve posted below a string-join that would be “correct”, with timed test results for three ways of doing this.

613 people think this answer is useful
>>> import re
>>> re.sub(' +', ' ', 'The     quick brown    fox')
'The quick brown fox'



600 people think this answer is useful

foo is your string:

" ".join(foo.split())



Be warned though this removes “all whitespace characters (space, tab, newline, return, formfeed)” (thanks to hhsaffar, see comments). I.e., "this is \t a test\n" will effectively end up as "this is a test".
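To make the difference concrete, here is a small sketch comparing the split/join idiom with a spaces-only regex on a string that contains a tab and a newline. The helper names `collapse_all_whitespace` and `collapse_spaces_only` are made up for illustration and do not come from the answers.

```python
import re

def collapse_all_whitespace(text):
    # split() with no argument splits on any run of whitespace
    # and drops leading/trailing whitespace entirely.
    return " ".join(text.split())

def collapse_spaces_only(text):
    # Only runs of the space character are collapsed;
    # tabs and newlines are left untouched.
    return re.sub(r" +", " ", text)

sample = "this is  \t a test\n"
print(repr(collapse_all_whitespace(sample)))  # 'this is a test'
print(repr(collapse_spaces_only(sample)))     # 'this is \t a test\n'
```

Which of the two is "right" depends on whether tabs and newlines in the input carry meaning.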

98 people think this answer is useful
import re
s = "The   fox jumped   over    the log."
re.sub(r"\s\s+" , " ", s)



or

re.sub(r"\s\s+", " ", s)



since the space before comma is listed as a pet peeve in PEP 8, as mentioned by user Martin Thoma in the comments.

55 people think this answer is useful

Regexes using “\s” and a plain string.split() will also remove other whitespace, such as newlines, carriage returns, and tabs. In case that is not desired and only runs of multiple spaces should be collapsed, I present these examples.

I used 11 paragraphs, 1000 words, 6665 bytes of Lorem Ipsum to get realistic time tests and used random-length extra spaces throughout:

original_string = ''.join(word + (' ' * random.randint(1, 10)) for word in lorem_ipsum.split(' '))



The simple join one-liner essentially strips any leading/trailing spaces, whereas proper_join below preserves a leading/trailing space (but only ONE ;-).

# setup = '''

import re

def while_replace(string):
    while '  ' in string:
        string = string.replace('  ', ' ')

    return string

def re_replace(string):
    return re.sub(r' {2,}', ' ', string)

def proper_join(string):
    split_string = string.split(' ')

    # To account for leading/trailing spaces that would simply be removed
    beg = ' ' if not split_string[ 0] else ''
    end = ' ' if not split_string[-1] else ''

    # versus simply ' '.join(item for item in string.split(' ') if item)
    return beg + ' '.join(item for item in split_string if item) + end

original_string = """Lorem    ipsum        ... no, really, it kept going...          malesuada enim feugiat.         Integer imperdiet    erat."""

assert while_replace(original_string) == re_replace(original_string) == proper_join(original_string)

#'''



# while_replace_test
new_string = original_string[:]

new_string = while_replace(new_string)

assert new_string != original_string



# re_replace_test
new_string = original_string[:]

new_string = re_replace(new_string)

assert new_string != original_string



# proper_join_test
new_string = original_string[:]

new_string = proper_join(new_string)

assert new_string != original_string



NOTE: The “while version” makes a copy of original_string because, I believe, once the string has been modified on the first run, successive runs would be faster (if only by a bit). As the copy adds time, I added the same string copy to the other two tests so that the timings differ only in the logic. Keep in mind that the setup of a timeit instance is executed only once, while the main stmt is executed repeatedly; the way I originally did this, the while loop worked on the same label, original_string, so on the second run there would be nothing to do. The way it’s set up now, calling a function and using two different labels, that isn’t a problem. I’ve added assert statements to all the workers to verify that we change something on every iteration (for those who may be dubious). E.g., change to this and it breaks:

# while_replace_test
new_string = original_string[:]

new_string = while_replace(new_string)

assert new_string != original_string # will break the 2nd iteration

while '  ' in original_string:
    original_string = original_string.replace('  ', ' ')



Tests run on a laptop with an i5 processor running Windows 7 (64-bit).

timeit.Timer(stmt = test, setup = setup).repeat(7, 1000)
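For readers who want to reproduce this kind of measurement, here is a minimal, self-contained sketch of a timing harness. It is not the exact harness used for the numbers in this answer: it passes a callable to `timeit.Timer` instead of `stmt`/`setup` strings, uses a short made-up `sample` string, and runs fewer repeats.

```python
import re
import timeit

def while_replace(string):
    while '  ' in string:
        string = string.replace('  ', ' ')
    return string

def re_replace(string):
    return re.sub(r' {2,}', ' ', string)

# Hypothetical short input; the original tests used a much longer text.
sample = 'The   fox jumped   over    the log.'

for func in (while_replace, re_replace):
    # repeat(3, 1000): three timing runs of 1000 calls each
    times = timeit.Timer(lambda: func(sample)).repeat(3, 1000)
    print(f'{func.__name__}: min={min(times):.6f}s')
```

Because the functions return a new string and never rebind `sample`, every call does the same amount of work, which avoids the "nothing left to do on the second run" problem described above.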

test_string = 'The   fox jumped   over\n\t    the log.' # trivial

Python 2.7.3, 32-bit, Windows
test                 |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
while_replace_test   |   0.001066 |   0.001260 |   0.001128 |   0.001092
re_replace_test      |   0.003074 |   0.003941 |   0.003357 |   0.003349
proper_join_test     |   0.002783 |   0.004829 |   0.003554 |   0.003035

Python 2.7.3, 64-bit, Windows
test                 |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
while_replace_test   |   0.001025 |   0.001079 |   0.001052 |   0.001051
re_replace_test      |   0.003213 |   0.004512 |   0.003656 |   0.003504
proper_join_test     |   0.002760 |   0.006361 |   0.004626 |   0.004600

Python 3.2.3, 32-bit, Windows
test                 |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
while_replace_test   |   0.001350 |   0.002302 |   0.001639 |   0.001357
re_replace_test      |   0.006797 |   0.008107 |   0.007319 |   0.007440
proper_join_test     |   0.002863 |   0.003356 |   0.003026 |   0.002975

Python 3.3.3, 64-bit, Windows
test                 |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
while_replace_test   |   0.001444 |   0.001490 |   0.001460 |   0.001459
re_replace_test      |   0.011771 |   0.012598 |   0.012082 |   0.011910
proper_join_test     |   0.003741 |   0.005933 |   0.004341 |   0.004009



test_string = lorem_ipsum
# Thanks to http://www.lipsum.com/
# "Generated 11 paragraphs, 1000 words, 6665 bytes of Lorem Ipsum"

Python 2.7.3, 32-bit
test                 |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
while_replace_test   |   0.342602 |   0.387803 |   0.359319 |   0.356284
re_replace_test      |   0.337571 |   0.359821 |   0.348876 |   0.348006
proper_join_test     |   0.381654 |   0.395349 |   0.388304 |   0.388193

Python 2.7.3, 64-bit
test                 |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
while_replace_test   |   0.227471 |   0.268340 |   0.240884 |   0.236776
re_replace_test      |   0.301516 |   0.325730 |   0.308626 |   0.307852
proper_join_test     |   0.358766 |   0.383736 |   0.370958 |   0.371866

Python 3.2.3, 32-bit
test                 |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
while_replace_test   |   0.438480 |   0.463380 |   0.447953 |   0.446646
re_replace_test      |   0.463729 |   0.490947 |   0.472496 |   0.468778
proper_join_test     |   0.397022 |   0.427817 |   0.406612 |   0.402053

Python 3.3.3, 64-bit
test                 |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
while_replace_test   |   0.284495 |   0.294025 |   0.288735 |   0.289153
re_replace_test      |   0.501351 |   0.525673 |   0.511347 |   0.508467
proper_join_test     |   0.422011 |   0.448736 |   0.436196 |   0.440318



For the trivial string, it would seem that a while-loop is the fastest, followed by the Pythonic string-split/join, and regex pulling up the rear.

For non-trivial strings, seems there’s a bit more to consider. 32-bit 2.7? It’s regex to the rescue! 2.7 64-bit? A while loop is best, by a decent margin. 32-bit 3.2, go with the “proper” join. 64-bit 3.3, go for a while loop. Again.

In the end, one can improve performance if/where/when needed, but it’s always best to remember the mantra:

1. Make It Work
2. Make It Right
3. Make It Fast

IANAL, YMMV, Caveat Emptor!

47 people think this answer is useful

I have to agree with Paul McGuire’s comment. To me,

' '.join(the_string.split())



is vastly preferable to whipping out a regex.

My measurements (Linux and Python 2.5) show the split-then-join to be almost five times faster than doing the “re.sub(…)”, and still three times faster if you precompile the regex once and do the operation multiple times. And it is by any measure easier to understand — much more Pythonic.
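For reference, "precompiling the regex once" can look like the sketch below. The constant name `MULTI_SPACE` and the sample lines are made up for illustration, and no timing claim is attached to this snippet.

```python
import re

# Compile once, reuse many times (the pattern name is illustrative).
MULTI_SPACE = re.compile(r" {2,}")

lines = ["The   fox", "jumped   over", "the    log."]
cleaned = [MULTI_SPACE.sub(" ", line) for line in lines]
print(cleaned)  # ['The fox', 'jumped over', 'the log.']
```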

15 people think this answer is useful

Similar to the previous solutions, but more specific: replace two or more whitespace characters with a single space:

>>> import re
>>> s = "The   fox jumped   over    the log."
>>> re.sub(r'\s{2,}', ' ', s)
'The fox jumped over the log.'



13 people think this answer is useful

A simple solution:

>>> import re
>>> s = "The   fox jumped   over    the log."
>>> print(re.sub(r'\s+', ' ', s))
The fox jumped over the log.



10 people think this answer is useful

You can also use the string splitting technique in a Pandas DataFrame without needing to use .apply(..), which is useful if you need to perform the operation quickly on a large number of strings. Here it is on one line:

df['message'] = (df['message'].str.split()).str.join(' ')



7 people think this answer is useful
import re
string = re.sub('[ \t\n]+', ' ', 'The     quick brown                \n\n             \t        fox')



This will replace every run of tabs, newlines, and spaces with a single space.

7 people think this answer is useful

I have tried the following method and it even works with the extreme case like:

str1='          I   live    on    earth           '

' '.join(str1.split())



But if you prefer a regular expression it can be done as:

re.sub(r'\s+', ' ', str1)



However, the leading and trailing spaces are only reduced to one space each, so some extra processing is needed to remove them.
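One possible way to handle the ends with the regex approach is to call `strip()` on the result. This is a sketch of one option, not necessarily what the author had in mind:

```python
import re

str1 = '          I   live    on    earth           '

# Collapse every whitespace run to one space, then trim the ends.
result = re.sub(r'\s+', ' ', str1).strip()
print(repr(result))  # 'I live on earth'

# For comparison, the split/join idiom gives the same result here.
assert result == ' '.join(str1.split())
```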

4 people think this answer is useful

One line of code to remove all extra spaces before, after, and within a sentence:

sentence = "  The   fox jumped   over    the log.  "
sentence = ' '.join(filter(None,sentence.split(' ')))



Explanation:

1. Split the entire string into a list.
2. Filter empty elements from the list.
3. Rejoin the remaining elements* with a single space

*The remaining elements should be words or words with punctuation, etc. I did not test this extensively, but this should be a good starting point. All the best!
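The three steps can be followed one at a time in this small sketch (the intermediate names `parts` and `words` are introduced only for illustration):

```python
sentence = "  The   fox jumped   over    the log.  "

# 1. Split on single spaces; consecutive spaces yield empty strings.
parts = sentence.split(' ')
# 2. filter(None, ...) drops the falsy (empty) strings.
words = list(filter(None, parts))
# 3. Rejoin with a single space.
result = ' '.join(words)

print(words)   # ['The', 'fox', 'jumped', 'over', 'the', 'log.']
print(result)  # The fox jumped over the log.
```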

4 people think this answer is useful

This also seems to work:

while "  " in s:
    s = s.replace("  ", " ")



Where the variable s represents your string.

3 people think this answer is useful

In some cases it’s desirable to replace consecutive occurrences of every whitespace character with a single instance of that character. You’d use a regular expression with backreferences to do that.

(\s)\1{1,} matches any whitespace character, followed by one or more occurrences of that character. Now, all you need to do is specify the first group (\1) as the replacement for the match.

Wrapping this in a function:

import re

def normalize_whitespace(string):
    return re.sub(r'(\s)\1{1,}', r'\1', string)


>>> normalize_whitespace('The   fox jumped   over    the log.')
'The fox jumped over the log.'
>>> normalize_whitespace('First    line\t\t\t \n\n\nSecond    line')
'First line\t \nSecond line'



3 people think this answer is useful

Another alternative:

>>> import re
>>> s = 'this is a            string with    multiple spaces and    tabs'
>>> s = re.sub('[ \t]+', ' ', s)
>>> print(s)
this is a string with multiple spaces and tabs



3 people think this answer is useful

Solution for Python developers:

import re

text1 = 'Python      Exercises    Are   Challenging Exercises'
print("Original string: ", text1)
print("Without extra spaces: ", re.sub(' +', ' ', text1))



Output:

Original string:  Python      Exercises    Are   Challenging Exercises
Without extra spaces:  Python Exercises Are Challenging Exercises

3 people think this answer is useful

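Quite surprising that no one has posted a simple function like this, which I expect to be much faster than ALL the other posted solutions. Here it goes:

def compactSpaces(s):
    os = ""
    for c in s:
        # Keep c unless it is a space that follows another space (or leads the string)
        if c != " " or (os and os[-1] != " "):
            os += c
    return os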



2 people think this answer is useful

The fastest you can get for user-generated strings is:

if '  ' in text:
    while '  ' in text:
        text = text.replace('  ', ' ')



The short circuiting makes it slightly faster than pythonlarry’s comprehensive answer. Go for this if you’re after efficiency and are strictly looking to weed out extra whitespaces of the single space variety.

1 people think this answer is useful
def unPretty(S):
    # Given a dictionary, JSON, list, float, int, or even a string...
    # return a string stripped of CR, LF replaced by space, with multiple spaces reduced to one.
    return ' '.join(str(S).replace('\n', ' ').replace('\r', '').split())



1 people think this answer is useful
import re

Text = " You can select below trims for removing white space!!   BR Aliakbar     "
# trims all white spaces
print('Remove all space:',re.sub(r"\s+", "", Text), sep='')
# trims left space
print('Remove leading space:', re.sub(r"^\s+", "", Text), sep='')
# trims right space
print('Remove trailing spaces:', re.sub(r"\s+$", "", Text), sep='')
# trims both
print('Remove leading and trailing spaces:', re.sub(r"^\s+|\s+$", "", Text), sep='')
# replace more than one white space in the string with one white space
print('Remove more than one space:',re.sub(' +', ' ',Text), sep='')



Result:

Remove all space:Youcanselectbelowtrimsforremovingwhitespace!!BRAliakbar
Remove leading space:You can select below trims for removing white space!!   BR Aliakbar
Remove trailing spaces: You can select below trims for removing white space!!   BR Aliakbar
Remove leading and trailing spaces:You can select below trims for removing white space!!   BR Aliakbar
Remove more than one space: You can select below trims for removing white space!! BR Aliakbar

1 people think this answer is useful

" ".join(foo.split()) is not quite correct with respect to the question asked because it also entirely removes single leading and/or trailing white spaces. So, if they shall also be replaced by 1 blank, you should do something like the following:

" ".join(('*' + foo + '*').split()) [1:-1]



Of course, it’s less elegant.

1 people think this answer is useful

Because @pythonlarry asked, here are the missing generator-based versions.

The groupby join is easy. groupby groups consecutive elements that share the same key and returns pairs of a key and an iterator over the elements of each group. So when the key is a space, a single space is returned; otherwise the entire group is returned.

from itertools import groupby
def group_join(string):
    return ''.join(' ' if char == ' ' else ''.join(times) for char, times in groupby(string))
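To see what groupby hands to the join, this small sketch lists the groups for a short string (the sample text is made up for illustration):

```python
from itertools import groupby

text = "a  b   c"
# Each group is (key, iterator over the consecutive equal characters).
groups = [(key, len(list(run))) for key, run in groupby(text)]
print(groups)  # [('a', 1), (' ', 2), ('b', 1), (' ', 3), ('c', 1)]
```

Each run of spaces arrives as one group, so emitting a single space per space-keyed group collapses the run.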



The groupby variant is simple but very slow. So now for the generator variant. Here we consume an iterator, the string, and yield every character except spaces that follow a space.

def generator_join_generator(string):
    last = False
    for c in string:
        if c == ' ':
            if not last:
                last = True
                yield ' '
        else:
            last = False
            yield c

def generator_join(string):
    return ''.join(generator_join_generator(string))



So I measured the timings with some other lorem ipsum text.

• while_replace 0.015868543065153062
• re_replace 0.22579886706080288
• proper_join 0.40058281796518713
• group_join 5.53206754301209
• generator_join 1.6673167790286243

With Hello and World separated by 64KB of spaces

• while_replace 2.991308711003512
• re_replace 0.08232860406860709
• proper_join 6.294375243945979
• group_join 2.4320066600339487
• generator_join 6.329648651066236

And not to forget the original sentence:

• while_replace 0.002160938922315836
• re_replace 0.008620491018518806
• proper_join 0.005650000995956361
• group_join 0.028368217987008393
• generator_join 0.009435956948436797

Interestingly, for strings that consist almost entirely of spaces, group_join is not that much worse. The timings shown are always the median of seven runs of a thousand iterations each.

0 people think this answer is useful

If it’s whitespace you’re dealing with, splitting on None will not include an empty string in the returned value.

5.6.1. String Methods, str.split()
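A quick illustration of the difference between splitting on None (the default) and splitting on an explicit space, using a made-up sample string:

```python
text = "  The   fox  "

# Default separator (None): runs of whitespace count as one separator,
# and no empty strings appear in the result.
print(text.split())     # ['The', 'fox']

# Explicit ' ' separator: every single space is a separator,
# so consecutive spaces produce empty strings.
print(text.split(' '))  # ['', '', 'The', '', '', 'fox', '', '']
```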

0 people think this answer is useful
string = 'This is a             string full of spaces          and taps'
string = string.split(' ')
while '' in string:
    string.remove('')
string = ' '.join(string)
print(string)



Results:

This is a string full of spaces and taps

0 people think this answer is useful

To remove white space, considering leading, trailing and extra white space in between words, use:

(?<=\s) +|^ +(?=\s)| (?= +[\n\0])



The first or deals with leading white space, the second or deals with start of string leading white space, and the last one deals with trailing white space.

For proof of use, this link will provide you with a test.

https://regex101.com/r/meBYli/4

This is to be used with the re.split function.

0 people think this answer is useful

I’ve got a simple method without splitting:

a = "Lorem   Ipsum Darum     Diesrum!"
while True:
    count = a.find("  ")
    # find() returns -1 when there is no match; index 0 is a valid match
    if count >= 0:
        a = a.replace("  ", " ")
        continue
    else:
        break

print(a)



0 people think this answer is useful
sentence = "The   fox jumped   over    the log."
word = sentence.split()
result = ""
for string in word:
    result += string + " "
print(result.rstrip())  # drop the trailing space added after the last word



-1 people think this answer is useful

I haven’t read a lot into the other examples, but I have just created this method for consolidating multiple consecutive space characters.

It does not use any libraries, and whilst it is relatively long in terms of script length, it is not a complex implementation:

def spaceMatcher(command):
    """
    Function defined to consolidate multiple consecutive space characters
    in a string to a single space
    """
    new_command = ""
    # Counter to flag more than one consecutive space character
    space_match = 0
    for char in command:
        if char == " ":
            space_match += 1
            # Keep only the first space of each run
            if space_match == 1:
                new_command += char
        else:
            space_match = 0
            new_command += char
    return new_command

command = None
command = str(input("Please enter a command ->"))
print(spaceMatcher(command))
print(list(spaceMatcher(command)))



-1 people think this answer is useful

I have my simple method which I have used in college.

line = "I     have            a       nice    day."

end = 1000
while end != 0:
    line = line.replace("  ", " ")  # str.replace returns a new string, so assign it back
    end -= 1



This replaces every double space with a single space and does so 1000 times, so it still works even with 2000 extra spaces. 🙂
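As a rough check on how many passes are actually needed, here is a small sketch. It assumes the result of `replace` is assigned back to the string on every pass, and the helper name `passes_needed` is made up for illustration. Each pass replaces non-overlapping pairs, so a run of spaces is roughly halved every time.

```python
def passes_needed(run_length):
    """Count replace('  ', ' ') passes until a run of spaces collapses to one."""
    line = "a" + " " * run_length + "b"
    passes = 0
    while "  " in line:
        line = line.replace("  ", " ")
        passes += 1
    return passes

print(passes_needed(2000))  # 11
```

So a fixed count of 1000 passes is far more than typical input requires; looping while a double space is still present stops as soon as the work is done.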

# python... 3.x