/content/posts/2019-12-17-xml-pretty-print.md

https://gitlab.com/jamietanna/jvt.me
---
title: "Pretty Printing XML on the Command-Line"
description: "How to use `xmllint` to pretty-print XML/HTML files."
tags:
- blogumentation
- html
- xml
- pretty-print
- command-line
license_code: Apache-2.0
license_prose: CC-BY-NC-SA-4.0
date: 2019-12-17T21:47:31+0000
slug: "xml-pretty-print"
---
A couple of times recently I've needed to pretty-print XML, be that HTML or actual XML, but hadn't found a great way to do it from the command-line.

Fortunately, [DuckDuckGo has an HTML Beautify setup](https://duckduckgo.com/?q=html+beautify&ia=answer), but pasting into a web tool isn't safe for proprietary content, nor does it help with automating the pretty-printing.

But as I found today, the [`xmllint` command](https://linux.die.net/man/1/xmllint) can save us here.

For instance, take the following XML file that I've purposefully uglified:

```xml
<?xml version="1.0" encoding="UTF-8"?> <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> <modelVersion>4.0.0</modelVersion> <groupId>me.jvt.www</groupId> <artifactId>www-api</artifactId> <version>0.5.0-SNAPSHOT</version> <modules> <module>www-api-web</module> <module>www-api-acceptance</module> <module>www-api-core</module> <module>indieauth-spring-security</module> </modules> <packaging>pom</packaging> </project>
```

If we feed this through `xmllint --format`:

```sh
$ xmllint --format in.xml
```

We then get the following pretty-printed XML:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>me.jvt.www</groupId>
  <artifactId>www-api</artifactId>
  <version>0.5.0-SNAPSHOT</version>
  <modules>
    <module>www-api-web</module>
    <module>www-api-acceptance</module>
    <module>www-api-core</module>
    <module>indieauth-spring-security</module>
  </modules>
  <packaging>pom</packaging>
</project>
```

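Since part of the goal is automating this, here is a minimal sketch of a few variations on the command above. It assumes `xmllint` (from libxml2) is on your `PATH`; `sample.xml` and `out.xml` are hypothetical file names used only for illustration.

```sh
# Create a tiny uglified file so the sketch is self-contained (hypothetical content)
printf '<?xml version="1.0" encoding="UTF-8"?><a><b>1</b></a>' > sample.xml

# Read from stdin by passing - as the file name, handy at the end of a pipeline
cat sample.xml | xmllint --format -

# Write the result to a file instead of stdout
xmllint --format --output out.xml sample.xml

# XMLLINT_INDENT overrides the indentation string (the default is two spaces)
XMLLINT_INDENT='    ' xmllint --format sample.xml
```

Reading from stdin means the same command can sit after `curl` or any other tool that emits XML, without needing a temporary file.
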
If you are using this to pretty-print HTML, you can use:

```sh
$ xmllint --html --format in.html
```

But note that it will not silently ignore non-standard HTML elements, or at least any elements it doesn't understand, so you may see parser errors reported for them.
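
If those complaints get in the way, one option is to separate them from the formatted output. This is a rough sketch, assuming the parser reports problems on stderr (libxml2's default behaviour) and that `xmllint` is on your `PATH`; `sample.html`, `pretty.html` and `errors.log` are hypothetical names, and whether `<nav>` triggers a complaint depends on your libxml2 version.

```sh
# Hypothetical sample using an HTML5 element that the parser may not recognise
printf '<html><body><nav><p>hello</p></nav></body></html>' > sample.html

# Pretty-printed HTML goes to stdout and parser complaints go to stderr,
# so redirect them separately to keep the output clean
xmllint --html --format sample.html > pretty.html 2> errors.log

# Inspect anything the parser reported (this may be empty on some versions)
cat errors.log
```

Keeping the log around, rather than discarding it with `2>/dev/null`, makes it easier to tell a harmless unknown element apart from genuinely broken markup.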