Friday, October 28, 2022

[FIXED] How to remove a specific tag that could be empty in an xml file

Tags: is-empty, lxml, python-3.x, tags, xml

Issue

I am trying to remove a specific tag from an xml file but only if it is empty.

file:

<?xml version="1.0" encoding="utf-8"?>
<parent>
  <child>
    <value1>Foo</value1>
    <value2>Bar</value2>
    <value3>Hello World</value3>
    <value3/>
    <value3/>
    <value3/>
  </child>
</parent>

expected output:

<?xml version="1.0" encoding="utf-8"?>
<parent>
  <child>
    <value1>Foo</value1>
    <value2>Bar</value2>
    <value3>Hello World</value3>
  </child>
</parent>

I am having problems reading a file and parsing it with lxml, so I am open to any other Python 3 methods or modules. Ideally I would like the code to do something like the following:

def remove_empty_tag(tag, data):
    # Return the XML in "data" with every empty <tag> element removed.
    ...

with open("file.xml") as f:
    data = f.read()
new_xml = remove_empty_tag(tag="value3", data=data)
print(new_xml)

But I am open to any help, or even a pointer in the right direction.


Solution

You shouldn't need to open() the file for reading or writing; use lxml's parse() to parse the file and write() to write the new one.

You should also be able to use the XPath self:: axis instead of a Python if statement to check the tag name.
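As a rough illustration of that second point, the sketch below selects the same elements both ways: once by filtering with a Python if, and once with the self:: axis inside the XPath predicate. The inline XML string and the variable names (with_if, with_self) are made up for this illustration and are not part of the original answer.

```python
from lxml import etree

# Hypothetical sample document, small enough to inspect by eye.
xml = b"<parent><child><value3>Hello World</value3><value3/><other/></child></parent>"
root = etree.fromstring(xml)
tag = "value3"

# Filtering in Python: select every element without child nodes,
# then check the tag name with an if.
with_if = [el for el in root.xpath(".//*[not(node())]") if el.tag == tag]

# Filtering in XPath: self::value3 tests the element's own name,
# so the name check and the emptiness check share one predicate.
with_self = root.xpath(f".//*[self::{tag} and not(node())]")

print([el.tag for el in with_if])
print([el.tag for el in with_self])
```

Both lists should contain only the empty value3 element. Note that not(node()) matches elements with no child nodes at all, so an element containing only whitespace or a comment would not be treated as empty here.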

Example...

XML Input (old.xml)

<parent>
  <child>
    <value1>Foo</value1>
    <value2>Bar</value2>
    <value3>Hello World</value3>
    <value3/>
    <value3/>
    <value3/>
  </child>
</parent>

Python

from lxml import etree


def remove_empty_tag(tag, original_file, new_file):
    root = etree.parse(original_file)
    for element in root.xpath(f".//*[self::{tag} and not(node())]"):
        element.getparent().remove(element)

    # Serialize "root" and create a new tree using an XMLParser to clean up
    # formatting caused by removing elements.
    parser = etree.XMLParser(remove_blank_text=True)
    tree = etree.fromstring(etree.tostring(root), parser=parser)
    # Write to new file.
    etree.ElementTree(tree).write(new_file, pretty_print=True, xml_declaration=True, encoding="utf-8")


remove_empty_tag("value3", "old.xml", "new.xml")

XML Output (new.xml)

<?xml version='1.0' encoding='UTF-8'?>
<parent>
  <child>
    <value1>Foo</value1>
    <value2>Bar</value2>
    <value3>Hello World</value3>
  </child>
</parent>

Note: Serializing and creating a new tree is not strictly necessary. You could just do this instead:

root.write(new_file, pretty_print=True, xml_declaration=True, encoding="utf-8")

but the formatting of the output will be slightly different (notice the extra indent of the child end tag):

<?xml version='1.0' encoding='UTF-8'?>
<parent>
  <child>
    <value1>Foo</value1>
    <value2>Bar</value2>
    <value3>Hello World</value3>
    </child>
</parent>
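
If a string-in, string-out function closer to the one sketched in the question is preferred, something along these lines might work. It is only a sketch under a few assumptions: the input is passed as bytes, the (tag, data) signature and the sample document are illustrative, and remove_blank_text is applied when parsing rather than in a second pass.

```python
from lxml import etree


def remove_empty_tag(tag, data):
    """Return the XML in `data` (bytes) as a str, without empty <tag> elements."""
    # Dropping ignorable whitespace at parse time lets pretty_print
    # re-indent the tree cleanly after elements are removed.
    parser = etree.XMLParser(remove_blank_text=True)
    root = etree.fromstring(data, parser=parser)
    for element in root.xpath(f".//*[self::{tag} and not(node())]"):
        element.getparent().remove(element)
    # encoding="unicode" returns a str and omits the XML declaration.
    return etree.tostring(root, pretty_print=True, encoding="unicode")


# Hypothetical sample input, modelled on old.xml.
sample = b"""<?xml version="1.0" encoding="utf-8"?>
<parent>
  <child>
    <value1>Foo</value1>
    <value3>Hello World</value3>
    <value3/>
    <value3/>
  </child>
</parent>
"""
new_xml = remove_empty_tag("value3", sample)
print(new_xml)
```

When reading from disk, open the file in binary mode, for example open("file.xml", "rb").read(). lxml raises a ValueError if etree.fromstring() is given a str that contains an encoding declaration.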


Answered By - Daniel Haley
Answer Checked By - Terry (PHPFixing Volunteer)