What is the best way (or are the various ways) to pretty print XML in Python?
Pretty printing XML in Python
The Question :
The Answer 1
import xml.dom.minidom dom = xml.dom.minidom.parse(xml_fname) # or xml.dom.minidom.parseString(xml_string) pretty_xml_as_string = dom.toprettyxml()
The Answer 2
lxml is recent, updated, and includes a pretty print function
import lxml.etree as etree x = etree.parse("filename") print etree.tostring(x, pretty_print=True)
Check out the lxml tutorial: http://lxml.de/tutorial.html
The Answer 3
Another solution is to borrow this indent
function, for use with the ElementTree library that’s built in to Python since 2.5.
Here’s what that would look like:
from xml.etree import ElementTree def indent(elem, level=0): i = "\n" + level*" " j = "\n" + (level-1)*" " if len(elem): if not elem.text or not elem.text.strip(): elem.text = i + " " if not elem.tail or not elem.tail.strip(): elem.tail = i for subelem in elem: indent(subelem, level+1) if not elem.tail or not elem.tail.strip(): elem.tail = j else: if level and (not elem.tail or not elem.tail.strip()): elem.tail = j return elem root = ElementTree.parse('/tmp/xmlfile').getroot() indent(root) ElementTree.dump(root)
The Answer 4
Here’s my (hacky?) solution to get around the ugly text node problem.
uglyXml = doc.toprettyxml(indent=' ') text_re = re.compile('>\n\s+([^<>\s].*?)\n\s+</', re.DOTALL) prettyXml = text_re.sub('>\g<1></', uglyXml) print prettyXml
The above code will produce:
<?xml version="1.0" ?> <issues> <issue> <id>1</id> <title>Add Visual Studio 2005 and 2008 solution files</title> <details>We need Visual Studio 2005/2008 project files for Windows.</details> </issue> </issues>
Instead of this:
<?xml version="1.0" ?> <issues> <issue> <id> 1 </id> <title> Add Visual Studio 2005 and 2008 solution files </title> <details> We need Visual Studio 2005/2008 project files for Windows. </details> </issue> </issues>
Disclaimer: There are probably some limitations.
The Answer 5
BeautifulSoup has a easy to use prettify()
method.
It indents one space per indentation level. It works much better than lxml’s pretty_print and is short and sweet.
from bs4 import BeautifulSoup bs = BeautifulSoup(open(xml_file), 'xml') print bs.prettify()
The Answer 6
As others pointed out, lxml has a pretty printer built in.
Be aware though that by default it changes CDATA sections to normal text, which can have nasty results.
Here’s a Python function that preserves the input file and only changes the indentation (notice the strip_cdata=False
). Furthermore it makes sure the output uses UTF-8 as encoding instead of the default ASCII (notice the encoding='utf-8'
):
from lxml import etree def prettyPrintXml(xmlFilePathToPrettyPrint): assert xmlFilePathToPrettyPrint is not None parser = etree.XMLParser(resolve_entities=False, strip_cdata=False) document = etree.parse(xmlFilePathToPrettyPrint, parser) document.write(xmlFilePathToPrettyPrint, pretty_print=True, encoding='utf-8')
Example usage:
prettyPrintXml('some_folder/some_file.xml')
The Answer 7
If you have xmllint
you can spawn a subprocess and use it. xmllint --format <file>
pretty-prints its input XML to standard output.
Note that this method uses an program external to python, which makes it sort of a hack.
def pretty_print_xml(xml): proc = subprocess.Popen( ['xmllint', '--format', '/dev/stdin'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, ) (output, error_output) = proc.communicate(xml); return output print(pretty_print_xml(data))
The Answer 8
I tried to edit “ade”s answer above, but Stack Overflow wouldn’t let me edit after I had initially provided feedback anonymously. This is a less buggy version of the function to pretty-print an ElementTree.
def indent(elem, level=0, more_sibs=False): i = "\n" if level: i += (level-1) * ' ' num_kids = len(elem) if num_kids: if not elem.text or not elem.text.strip(): elem.text = i + " " if level: elem.text += ' ' count = 0 for kid in elem: indent(kid, level+1, count < num_kids - 1) count += 1 if not elem.tail or not elem.tail.strip(): elem.tail = i if more_sibs: elem.tail += ' ' else: if level and (not elem.tail or not elem.tail.strip()): elem.tail = i if more_sibs: elem.tail += ' '
The Answer 9
If you’re using a DOM implementation, each has their own form of pretty-printing built-in:
# minidom # document.toprettyxml() # 4DOM # xml.dom.ext.PrettyPrint(document, stream) # pxdom (or other DOM Level 3 LS-compliant imp) # serializer.domConfig.setParameter('format-pretty-print', True) serializer.writeToString(document)
If you’re using something else without its own pretty-printer — or those pretty-printers don’t quite do it the way you want — you’d probably have to write or subclass your own serialiser.
The Answer 10
I had some problems with minidom’s pretty print. I’d get a UnicodeError whenever I tried pretty-printing a document with characters outside the given encoding, eg if I had a β in a document and I tried doc.toprettyxml(encoding='latin-1')
. Here’s my workaround for it:
def toprettyxml(doc, encoding): """Return a pretty-printed XML document in a given encoding.""" unistr = doc.toprettyxml().replace(u'<?xml version="1.0" ?>', u'<?xml version="1.0" encoding="%s"?>' % encoding) return unistr.encode(encoding, 'xmlcharrefreplace')
The Answer 11
from yattag import indent pretty_string = indent(ugly_string)
It won’t add spaces or newlines inside text nodes, unless you ask for it with:
indent(mystring, indent_text = True)
You can specify what the indentation unit should be and what the newline should look like.
pretty_xml_string = indent( ugly_xml_string, indentation = ' ', newline = '\r\n' )
The doc is on http://www.yattag.org homepage.
The Answer 12
I wrote a solution to walk through an existing ElementTree and use text/tail to indent it as one typically expects.
def prettify(element, indent=' '): queue = [(0, element)] # (level, element) while queue: level, element = queue.pop(0) children = [(level + 1, child) for child in list(element)] if children: element.text = '\n' + indent * (level+1) # for child open if queue: element.tail = '\n' + indent * queue[0][0] # for sibling open else: element.tail = '\n' + indent * (level-1) # for parent close queue[0:0] = children # prepend so children come before siblings
The Answer 13
Here’s a Python3 solution that gets rid of the ugly newline issue (tons of whitespace), and it only uses standard libraries unlike most other implementations.
import xml.etree.ElementTree as ET import xml.dom.minidom import os def pretty_print_xml_given_root(root, output_xml): """ Useful for when you are editing xml data on the fly """ xml_string = xml.dom.minidom.parseString(ET.tostring(root)).toprettyxml() xml_string = os.linesep.join([s for s in xml_string.splitlines() if s.strip()]) # remove the weird newline issue with open(output_xml, "w") as file_out: file_out.write(xml_string) def pretty_print_xml_given_file(input_xml, output_xml): """ Useful for when you want to reformat an already existing xml file """ tree = ET.parse(input_xml) root = tree.getroot() pretty_print_xml_given_root(root, output_xml)
I found how to fix the common newline issue here.
The Answer 14
XML pretty print for python looks pretty good for this task. (Appropriately named, too.)
An alternative is to use pyXML, which has a PrettyPrint function.
The Answer 15
You can use popular external library xmltodict, with unparse
and pretty=True
you will get best result:
xmltodict.unparse( xmltodict.parse(my_xml), full_document=False, pretty=True)
full_document=False
against <?xml version="1.0" encoding="UTF-8"?>
at the top.
The Answer 16
As of Python 3.9 (still a release candidate as of 12 Aug 2020), there is a new xml.etree.ElementTree.indent()
function for pretty-printing XML trees.
Sample usage:
import xml.etree.ElementTree as ET element = ET.XML("<html><body>text</body></html>") ET.indent(element)
The upside is that it does not require any additional libraries. For more information check https://bugs.python.org/issue14465 and https://github.com/python/cpython/pull/15200
The Answer 17
Take a look at the vkbeautify module.
It is a python version of my very popular javascript/nodejs plugin with the same name. It can pretty-print/minify XML, JSON and CSS text. Input and output can be string/file in any combinations. It is very compact and doesn’t have any dependency.
Examples:
import vkbeautify as vkb vkb.xml(text) vkb.xml(text, 'path/to/dest/file') vkb.xml('path/to/src/file') vkb.xml('path/to/src/file', 'path/to/dest/file')
The Answer 18
You can try this variation…
Install BeautifulSoup
and the backend lxml
(parser) libraries:
user$ pip3 install lxml bs4
Process your XML document:
from bs4 import BeautifulSoup with open('/path/to/file.xml', 'r') as doc: for line in doc: print(BeautifulSoup(line, 'lxml-xml').prettify())
The Answer 19
An alternative if you don’t want to have to reparse, there is the xmlpp.py library with the get_pprint()
function. It worked nice and smoothly for my use cases, without having to reparse to an lxml ElementTree object.
The Answer 20
I had this problem and solved it like this:
def write_xml_file (self, file, xml_root_element, xml_declaration=False, pretty_print=False, encoding='unicode', indent='\t'): pretty_printed_xml = etree.tostring(xml_root_element, xml_declaration=xml_declaration, pretty_print=pretty_print, encoding=encoding) if pretty_print: pretty_printed_xml = pretty_printed_xml.replace(' ', indent) file.write(pretty_printed_xml)
In my code this method is called like this:
try: with open(file_path, 'w') as file: file.write('<?xml version="1.0" encoding="utf-8" ?>') # create some xml content using etree ... xml_parser = XMLParser() xml_parser.write_xml_file(file, xml_root, xml_declaration=False, pretty_print=True, encoding='unicode', indent='\t') except IOError: print("Error while writing in log file!")
This works only because etree by default uses two spaces
to indent, which I don’t find very much emphasizing the indentation and therefore not pretty. I couldn’t ind any setting for etree or parameter for any function to change the standard etree indent. I like how easy it is to use etree, but this was really annoying me.
The Answer 21
For converting an entire xml document to a pretty xml document
(ex: assuming you’ve extracted [unzipped] a LibreOffice Writer .odt or .ods file, and you want to convert the ugly “content.xml” file to a pretty one for automated git version control and git difftool
ing of .odt/.ods files, such as I’m implementing here)
import xml.dom.minidom file = open("./content.xml", 'r') xml_string = file.read() file.close() parsed_xml = xml.dom.minidom.parseString(xml_string) pretty_xml_as_string = parsed_xml.toprettyxml() file = open("./content_new.xml", 'w') file.write(pretty_xml_as_string) file.close()
References:
– Thanks to Ben Noland’s answer on this page which got me most of the way there.
The Answer 22
from lxml import etree import xml.dom.minidom as mmd xml_root = etree.parse(xml_fiel_path, etree.XMLParser()) def print_xml(xml_root): plain_xml = etree.tostring(xml_root).decode('utf-8') urgly_xml = ''.join(plain_xml .split()) good_xml = mmd.parseString(urgly_xml) print(good_xml.toprettyxml(indent=' ',))
It’s working well for the xml with Chinese!
The Answer 23
If for some reason you can’t get your hands on any of the Python modules that other users mentioned, I suggest the following solution for Python 2.7:
import subprocess def makePretty(filepath): cmd = "xmllint --format " + filepath prettyXML = subprocess.check_output(cmd, shell = True) with open(filepath, "w") as outfile: outfile.write(prettyXML)
As far as I know, this solution will work on Unix-based systems that have the xmllint
package installed.
The Answer 24
I found this question while looking for “how to pretty print html”
Using some of the ideas in this thread I adapted the XML solutions to work for XML or HTML:
from xml.dom.minidom import parseString as string_to_dom def prettify(string, html=True): dom = string_to_dom(string) ugly = dom.toprettyxml(indent=" ") split = list(filter(lambda x: len(x.strip()), ugly.split('\n'))) if html: split = split[1:] pretty = '\n'.join(split) return pretty def pretty_print(html): print(prettify(html))
When used this is what it looks like:
html = """\ <div class="foo" id="bar"><p>'IDK!'</p><br/><div class='baz'><div> <span>Hi</span></div></div><p id='blarg'>Try for 2</p> <div class='baz'>Oh No!</div></div> """ pretty_print(html)
Which returns:
<div class="foo" id="bar"> <p>'IDK!'</p> <br/> <div class="baz"> <div> <span>Hi</span> </div> </div> <p id="blarg">Try for 2</p> <div class="baz">Oh No!</div> </div>
The Answer 25
Use etree.indent
and etree.tostring
import lxml.etree as etree root = etree.fromstring('<html><head></head><body><h1>Welcome</h1></body></html>') etree.indent(root, space=" ") xml_string = etree.tostring(root, pretty_print=True).decode() print(xml_string)
output
<html> <head/> <body> <h1>Welcome</h1> </body> </html>
Removing namespaces and prefixes
import lxml.etree as etree def dump_xml(element): for item in element.getiterator(): item.tag = etree.QName(item).localname etree.cleanup_namespaces(element) etree.indent(element, space=" ") result = etree.tostring(element, pretty_print=True).decode() return result root = etree.fromstring('<cs:document xmlns:cs="http://blabla.com"><name>hello world</name></cs:document>') xml_string = dump_xml(root) print(xml_string)
output
<document> <name>hello world</name> </document>
The Answer 26
I solved this with some lines of code, opening the file, going trough it and adding indentation, then saving it again. I was working with small xml files, and did not want to add dependencies, or more libraries to install for the user. Anyway, here is what I ended up with:
f = open(file_name,'r') xml = f.read() f.close() #Removing old indendations raw_xml = '' for line in xml: raw_xml += line xml = raw_xml new_xml = '' indent = ' ' deepness = 0 for i in range((len(xml))): new_xml += xml[i] if(i<len(xml)-3): simpleSplit = xml[i:(i+2)] == '><' advancSplit = xml[i:(i+3)] == '></' end = xml[i:(i+2)] == '/>' start = xml[i] == '<' if(advancSplit): deepness += -1 new_xml += '\n' + indent*deepness simpleSplit = False deepness += -1 if(simpleSplit): new_xml += '\n' + indent*deepness if(start): deepness += 1 if(end): deepness += -1 f = open(file_name,'w') f.write(new_xml) f.close()
It works for me, perhaps someone will have some use of it 🙂