Issue
I am trying to remove a specific tag from an XML file, but only if it is empty.
file:
<?xml version="1.0" encoding="utf-8"?>
<parent>
  <child>
    <value1>Foo</value1>
    <value2>Bar</value2>
    <value3>Hello World</value3>
    <value3/>
    <value3/>
    <value3/>
  </child>
</parent>
expected output:
<?xml version="1.0" encoding="utf-8"?>
<parent>
  <child>
    <value1>Foo</value1>
    <value2>Bar</value2>
    <value3>Hello World</value3>
  </child>
</parent>
I am having problems reading a file and parsing it with lxml, so I am open to any other Python 3 methods/modules.
Ideally, I would like the code to do something like the following:
def remove_empty_tag(tag, data):
    ...

with open("file.xml", encoding="utf-8") as f:
    data = f.read()
new_xml = remove_empty_tag(tag="value3", data=data)
print(new_xml)
But I'm open to any help, or even just a pointer in the right direction.
Solution
You shouldn't need to open() the file for reading or writing; use lxml's parse() to parse the file and write() to write the new one.
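If you do want to keep the string-based interface from the question, one lxml behavior is worth knowing, and it may be related to the trouble reading the file (an assumption; the question doesn't show the actual error): lxml rejects a Python str that still carries an encoding declaration, which is what open("file.xml").read() returns for a file like this one. A minimal sketch showing the failure and a bytes-based workaround; the helper name parse_xml_text is hypothetical:

```python
from lxml import etree

XML_TEXT = '<?xml version="1.0" encoding="utf-8"?>\n<parent><child><value3/></child></parent>'


def parse_xml_text(text):
    """Hypothetical helper: parse XML that was read into a str.

    lxml rejects str input carrying an encoding declaration, so encode to
    bytes first. Assumes the declaration says utf-8, as in the question.
    """
    return etree.fromstring(text.encode("utf-8"))


# Passing the str straight to lxml raises ValueError because of the declaration.
str_error = None
try:
    etree.fromstring(XML_TEXT)
except ValueError as exc:
    str_error = exc
print("str input rejected:", str_error)

root = parse_xml_text(XML_TEXT)
print(root.tag)  # parent
```

Opening the file in binary mode ("rb") avoids the problem as well, and etree.parse() sidesteps it entirely because it reads the file itself.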
You should also be able to use the self:: XPath axis instead of a Python if statement to check the tag name.
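To illustrate the self:: point, here is a small sketch on a made-up fragment that compares the XPath-only selection with an equivalent Python-side filter; both should pick out the two empty <value3/> elements. The variable names are illustrative only:

```python
from lxml import etree

doc = etree.fromstring(
    b"<parent><child>"
    b"<value1>Foo</value1><value3>Hello World</value3><value3/><value3/>"
    b"</child></parent>"
)
tag = "value3"

# XPath only: self:: checks the element name inside the predicate,
# and not(node()) requires that the element has no child nodes at all.
via_xpath = doc.xpath(f".//*[self::{tag} and not(node())]")

# Python-side equivalent: filter by tag name with a Python if.
# In lxml, len() counts child elements, comments and processing instructions.
via_python = []
for el in doc.iter():
    if el.tag == tag and len(el) == 0 and el.text is None:
        via_python.append(el)

tree = doc.getroottree()
print([tree.getpath(el) for el in via_xpath])
```

For a fixed tag name, the shorter expression .//value3[not(node())] selects the same elements.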
Example...
XML Input (old.xml)
<parent>
  <child>
    <value1>Foo</value1>
    <value2>Bar</value2>
    <value3>Hello World</value3>
    <value3/>
    <value3/>
    <value3/>
  </child>
</parent>
Python
from lxml import etree


def remove_empty_tag(tag, original_file, new_file):
    root = etree.parse(original_file)

    # Find every <tag> element with no child nodes at all (no text, no
    # elements, no comments) and remove it from its parent.
    for element in root.xpath(f".//*[self::{tag} and not(node())]"):
        element.getparent().remove(element)

    # Serialize "root" and create a new tree using an XMLParser to clean up
    # formatting caused by removing elements.
    parser = etree.XMLParser(remove_blank_text=True)
    tree = etree.fromstring(etree.tostring(root), parser=parser)

    # Write to new file.
    etree.ElementTree(tree).write(new_file, pretty_print=True, xml_declaration=True, encoding="utf-8")


remove_empty_tag("value3", "old.xml", "new.xml")
XML Output (new.xml)
<?xml version='1.0' encoding='UTF-8'?>
<parent>
  <child>
    <value1>Foo</value1>
    <value2>Bar</value2>
    <value3>Hello World</value3>
  </child>
</parent>
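Note that not(node()) only matches elements with no child nodes at all, so a whitespace-only element such as <value3> </value3> is kept. If whitespace-only elements should also count as empty, one possible variant of the predicate is sketched below (the function name remove_blank_tag is hypothetical). It removes <tag> elements that have no child elements and no non-whitespace text:

```python
from lxml import etree


def remove_blank_tag(root, tag):
    """Hypothetical variant: remove <tag> elements that have no child
    elements and whose text content is empty or whitespace only."""
    # not(*): no child elements; not(normalize-space()): no non-whitespace text.
    for element in root.xpath(f".//*[self::{tag} and not(*) and not(normalize-space())]"):
        element.getparent().remove(element)
    return root


child = etree.fromstring(
    b"<child><value3>Hello World</value3><value3/><value3>  </value3><value3>\n</value3></child>"
)
remove_blank_tag(child, "value3")
print(etree.tostring(child))
```

Because normalize-space() looks only at text content, an element containing nothing but a comment would also be removed by this variant.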
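Note: Serializing and creating a new tree is not strictly necessary. You could just do this instead:
root.write(new_file, pretty_print=True, xml_declaration=True, encoding="utf-8")
but the formatting of the output will be slightly different. Notice the extra indentation of the </child> end tag. lxml removes an element's tail text (the whitespace that follows it) together with the element, so </child> is left with the indentation that originally preceded the first empty <value3/>, and pretty_print does not re-indent whitespace that is already in the tree: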
<?xml version='1.0' encoding='UTF-8'?>
<parent>
  <child>
    <value1>Foo</value1>
    <value2>Bar</value2>
    <value3>Hello World</value3>
    </child>
</parent>
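Since the question was open to other Python 3 modules, here is a sketch using only the standard library's xml.etree.ElementTree that keeps the remove_empty_tag(tag, data) shape from the question (string in, string out). Assumptions: Python 3.9+ for ET.indent(), and "empty" meaning no child elements and no text. That is close to, but not identical with, not(node()), because ElementTree's default parser discards comments. Unlike the lxml version, no XML declaration is written:

```python
import xml.etree.ElementTree as ET


def remove_empty_tag(tag: str, data: str) -> str:
    """Remove every <tag> element that has no child elements and no text.

    Standard-library sketch (Python 3.9+ for ET.indent).
    """
    root = ET.fromstring(data)
    # ElementTree has no getparent(), so walk the parents instead.
    # Snapshot both iterations with list() because we mutate while looping.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == tag and len(child) == 0 and not child.text:
                parent.remove(child)
    # Removing an element also drops its tail (the whitespace after it), so
    # re-indent the whole tree instead of keeping the stale whitespace.
    ET.indent(root)
    return ET.tostring(root, encoding="unicode")
```

To use it with the file from the question, read file.xml with open("file.xml", encoding="utf-8") and pass the resulting string, e.g. print(remove_empty_tag("value3", data)). Re-indenting with ET.indent() plays the same role as the remove_blank_text reparse in the lxml version.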
Answered By - Daniel Haley Answer Checked By - Terry (PHPFixing Volunteer)