Or, how to parse XML documents from the shell.
This is part two of a three-part series exploring how to automate alerting your sysadmin that a new version of PaperCut MF is available:
PART 1 - Automate your PaperCut MF upgrade review process - Alec shows you how to discover which version of PaperCut MF is currently running on your server
PART 3 - Upgrade Automation: Putting it all together - Alec ties it all together so that a review ticket gets assigned to the correct application administrator for action if a new release appears.
In a previous post I talked about how to discover which version of PaperCut MF is running on your server. Now that we have that information, we need to find the latest released version on the PaperCut website to see whether there is a new release to install.
Please note that this post is all about how to parse XML data, specifically an Atom feed (using PowerShell or a POSIX shell) rather than anything PaperCut specific.
How we publish information on new PaperCut MF releases
You can discover the latest release of PaperCut MF by looking at our Atom feed here:
http://www.papercut.com/products/mf/release-history.atom
(If you are using PaperCut NG, use the URL http://www.papercut.com/products/ng/release-history.atom instead.)
If you use a feed reader it will look pretty.
But in reality it’s ugly XML, which we need to dive into.
Let’s dump out the PaperCut XML document as text so we can explore it (this works in both PowerShell 7 on Windows and a POSIX shell on Linux or macOS).
Note: Instead of saying “POSIX shell” again and again, I am going to refer to the Dash shell.
# PaperCut MF specific URL
curl -sL http://www.papercut.com/products/mf/release-history.atom | more
Which delivers a lot of XML output! (lines truncated to fit article)
<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><id>urn:uuid:a020f74f-08
<div class="type--align-center margin--bottom--xxl">
<h2 class="type--h1 position--relative release--version"><a class="anchor" id="v
<h3 class="type--h4 margin--top--xxs release--build">Build 62842</h3>
<h4 class="type--h4-large margin--top--xxs">Print Provider version 109.2.0.5261<
<p class="type--subhead2 margin--top--xxs release--date">12 Jul, 2022</p>
</div>
<h3 id="copier--device-integration">Copier / Device Integration:</h3>
<ul>
<li><strong>Sharp:</strong> Support for Sharp CR5 Atlas and Titan models under P
</ul>
</div></div></content><link href="https://www.papercut.com/products/mf/release-h
<div class="type--align-center margin--bottom--xxl">
<h2 class="type--h1 position--relative release--version"><a class="anchor" id="v
<h3 class="type--h4 margin--top--xxs release--build">Build 62695</h3>
...
An Atom feed is an XML document with a simple structure, but the raw output is hard to parse visually. Let’s see if it’s easier once we reformat the XML.
If you look hard enough you can see an <entry> element that contains an <id> element. We will come back to them.
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
  <id>urn:uuid:a020f74f-0816-11dc-88ea-0015f2a54b0c</id>
  <title>PaperCut MF Releases</title>
  <updated>2022-07-12T02:00:00+00:00</updated>
  <author>
    <name>The PaperCut Team</name>
    <uri>https://www.papercut.com/support</uri>
  </author>
  <link href="https://www.papercut.com/products/mf/release-history/" rel="alternate"/>
  <link href="https://www.papercut.com/products/mf/release-history.atom" rel="self"/>
  <generator uri="https://lkiesow.github.io/python-feedgen" version="0.9.0">python-feedgen</generator>
  <logo>https://cdn.papercut.com/web/img/logo.png</logo>
  <entry>
    <id>tag:papercut.com,2022-07-12:mf/releases/v22-0-2</id>
    <title>PaperCut MF 22.0.2 (Build 62842)</title>
    <updated>2022-07-12T02:00:00+00:00</updated>
    <author>
      <name>The PaperCut Team</name>
      <uri>https://www.papercut.com/support</uri>
    </author>
    <content type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml">
        <div class="release">
          <div class="type--align-center margin--bottom--xxl">
            <h2 class="type--h1 position--relative release--version"><a class="anchor" id="v22-0-2"/>22.0.2</h2>
            <h3 class="type--h4 margin--top--xxs release--build">Build 62842</h3>
            <h4 class="type--h4-large margin--top--xxs">Print Provider version 109.2.0.5261</h4>
            <p class="type--subhead2 margin--top--xxs release--date">12 Jul, 2022</p>
          </div>
          <h3 id="copier--device-integration">Copier / Device Integration:</h3>
          <ul>
...
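A reformatted view like the one above can be produced in several ways. One option, assuming xmllint is installed (it is introduced in the next section), is its --format option. The sketch below pretty-prints a tiny one-line stand-in feed, made up for illustration and much shorter than the real document, so that it runs without network access.

```shell
#!/bin/sh
# A one-line stand-in for the real feed (illustrative only, not the full PaperCut document).
ONE_LINE='<?xml version="1.0" encoding="UTF-8"?><feed xmlns="http://www.w3.org/2005/Atom"><title>PaperCut MF Releases</title><entry><id>tag:papercut.com,2022-07-12:mf/releases/v22-0-2</id></entry></feed>'

# "-" tells xmllint to read the document from standard input;
# --format re-indents the XML so nested elements sit on their own lines.
PRETTY=$(printf '%s\n' "$ONE_LINE" | xmllint --format -)
printf '%s\n' "$PRETTY"
```

Against the real feed, the equivalent would be to pipe the curl command shown earlier into xmllint --format - and then into more.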
If you are not familiar with XML or Atom, you may care to read the Atom introduction at the World Wide Web Consortium.
Before we start let’s download a copy of the current XML document to a local file and save repeated web access during our experiments.
(This command works in both Dash and PowerShell.)
curl -sL http://www.papercut.com/products/mf/release-history.atom > atom.xml
Exploring XML via the shell
We can explore the XML text to discover information of interest with a tool like xmllint, using its interactive shell feature. The following walk-through assumes you have xmllint installed in either the Dash shell or PowerShell. On Windows I installed xmllint via Chocolatey, and on Debian Linux I installed libxml2-utils.
1. Run xmllint in interactive shell mode on our XML file:
   xmllint --shell atom.xml
   We are now at the xmllint shell prompt and can start typing xmllint commands.
2. This XML content needs an explicit namespace. Set the namespace with:
   setns atom=http://www.w3.org/2005/Atom
3. We know (from reading the Atom documentation) that an Atom Feed Document consists of a feed element and multiple entry elements. Let’s look at the 1st entry element (positions start at 1):
   ls /atom:feed/atom:entry[1]
   (Notice that we need to prefix each element name with the correct namespace; see step 2 above.)
   --- 1 id
   --- 1 title
   --- 1 updated
   --- 2 author
   -a- 1 content
   -a- 0 link
   --- 1 published
4. The id element looks interesting. Let’s dump the contents:
   cat /atom:feed/atom:entry[1]/atom:id
   On my system, at the time of writing, it displays:
   <id>tag:papercut.com,2022-07-12:mf/releases/v22-0-2</id>
5. Type quit to exit.
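The interactive session above can also be replayed non-interactively by feeding the same commands to xmllint --shell on standard input, which is handy for checking a query before scripting it. This is only a minimal sketch: it assumes xmllint is installed, and it uses a small made-up stand-in file rather than the real atom.xml so that it runs offline.

```shell
#!/bin/sh
# Small stand-in feed, shaped like the real one (illustrative data only).
SAMPLE=$(mktemp)
cat > "$SAMPLE" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>tag:papercut.com,2022-07-12:mf/releases/v22-0-2</id>
    <title>PaperCut MF 22.0.2 (Build 62842)</title>
  </entry>
</feed>
EOF

# Send the walk-through commands to the xmllint shell and capture everything it prints.
SESSION=$(xmllint --shell "$SAMPLE" <<'EOF'
setns atom=http://www.w3.org/2005/Atom
ls /atom:feed/atom:entry[1]
cat /atom:feed/atom:entry[1]/atom:id
quit
EOF
)
printf '%s\n' "$SESSION"
rm -f "$SAMPLE"
```

The captured output includes the shell prompts and separators as well as the results, so it suits exploration more than parsing.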
So each <entry> element has an <id> element that contains the version information. However, the semver information (MAJOR, MINOR, PATCH) contained in the string will need to be extracted and reformatted, so that v22-0-2 becomes 22.0.2. Furthermore, we can assume that the latest release will always be the 1st <entry> element (array element [1]).
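The extraction described above can be tried on the sample id string using nothing but built-in shell features. This is a minimal sketch rather than the approach used later in this post (which relies on sed), and the variable names are illustrative.

```shell
#!/bin/sh
# The <id> value shown in the walk-through above.
ID='tag:papercut.com,2022-07-12:mf/releases/v22-0-2'

# Strip everything up to and including the last "/v", leaving "22-0-2".
SLUG=${ID##*/v}

# Split on "-" into the three semver fields.
IFS=- read -r MAJOR MINOR PATCH <<EOF
$SLUG
EOF

# Re-join the fields with "." as the separator.
VERSION="$MAJOR.$MINOR.$PATCH"
printf '%s\n' "$VERSION"   # prints 22.0.2
```

Keeping MAJOR, MINOR, and PATCH in separate variables may also be convenient if the numbers need to be compared individually later.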
Dash solution
xmllint can use an XPath query (option --xpath) to extract the information we need. Working out the correct XPath query string is rather involved, so I’ll just give it to you without explanation:
//*[local-name()='feed']/*[local-name()='entry']/*[local-name()='id']/text()
As part of a complete command
xmllint --xpath "//*[local-name()='feed']/*[local-name()='entry'][1]/*[local-name()='id']/text()" atom.xml
which returns
tag:papercut.com,2022-07-12:mf/releases/v22-0-2
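For readers wondering why the query leans on local-name(): the feed declares a default namespace (xmlns="http://www.w3.org/2005/Atom"), and in XPath 1.0 an unprefixed name such as feed only matches elements that are in no namespace at all. Comparing local-name() sidesteps the namespace. The sketch below contrasts the two styles on a small stand-in file (made up for illustration); it assumes xmllint is installed.

```shell
#!/bin/sh
# Small stand-in feed, shaped like the real one (illustrative data only).
SAMPLE=$(mktemp)
cat > "$SAMPLE" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>tag:papercut.com,2022-07-12:mf/releases/v22-0-2</id></entry>
  <entry><id>second-entry-placeholder</id></entry>
</feed>
EOF

# Unprefixed names do not match elements in the Atom namespace, so this selects nothing.
PLAIN=$(xmllint --xpath "/feed/entry[1]/id/text()" "$SAMPLE" 2>/dev/null)

# local-name() compares only the element name and ignores the namespace; [1] picks the first entry.
FIRST_ID=$(xmllint --xpath "//*[local-name()='feed']/*[local-name()='entry'][1]/*[local-name()='id']/text()" "$SAMPLE")

printf 'plain query:      [%s]\n' "$PLAIN"
printf 'local-name query: [%s]\n' "$FIRST_ID"
rm -f "$SAMPLE"
```

With xmllint available, the plain query should come back empty while the local-name() query returns the first id string, the same shape of result as the output shown above for the real feed.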
That’s better but we still have a few things to fix:
- We only need the major, minor, and patch numbers, not all the preceding text.
- The semver separator used here is the “-”, rather than the “.” we need.
With Dash, the sed tool is handy for doing this type of text manipulation. These sed expressions are quite complicated, so rather than explain the details I’ll just give you the final solution again.
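# Product code as used in the feed URL: "mf" for PaperCut MF, "ng" for PaperCut NG
PRODUCT="mf"
# Fetch the feed, take the <id> of the first entry, and rewrite vMAJOR-MINOR-PATCH as MAJOR.MINOR.PATCH
CURRENT_RELEASE=$(curl -sL "http://www.papercut.com/products/$PRODUCT/release-history.atom" |
  xmllint --xpath "//*[local-name()='feed']/*[local-name()='entry'][1]/*[local-name()='id']/text()" - |
  sed -Ene 's/tag:papercut.com,[0-9]{4}-[0-9]{2}-[0-9]{2}:'"$PRODUCT"'\/releases\/v([0-9]+)-([0-9]+)-([0-9]+)/\1.\2.\3/gp')
echo "$CURRENT_RELEASE"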
Which gave me the answer of 22.0.2. We have a winner!
(If you are not familiar with the regular expressions used above, you might find this video useful.)
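For anyone who would like an explanation anyway, here is a hedged breakdown of that sed expression, run against a hard-coded copy of the id string so that it works offline. The function name to_version is illustrative and not part of the original script; the pattern itself is unchanged.

```shell
#!/bin/sh
PRODUCT="mf"

# to_version (illustrative name): reads id strings on stdin, prints MAJOR.MINOR.PATCH.
#   -E   extended regular expressions, so ( ) { } + work without backslashes
#   -n   suppress automatic printing; only the "p" flag prints, and only
#        for lines where the substitution matched
#   -e   the next argument is the sed script
#   tag:papercut.com,             the fixed prefix of every id
#   [0-9]{4}-[0-9]{2}-[0-9]{2}:   the release date, e.g. 2022-07-12, then ":"
#   \/releases\/v                 literal "/releases/v" ("/" is escaped because it delimits s/.../.../)
#   ([0-9]+)-([0-9]+)-([0-9]+)    three captured numbers separated by "-"
#   \1.\2.\3                      the captures re-joined with "."
to_version() {
  sed -Ene 's/tag:papercut.com,[0-9]{4}-[0-9]{2}-[0-9]{2}:'"$PRODUCT"'\/releases\/v([0-9]+)-([0-9]+)-([0-9]+)/\1.\2.\3/gp'
}

VERSION=$(printf '%s\n' 'tag:papercut.com,2022-07-12:mf/releases/v22-0-2' | to_version)
NO_MATCH=$(printf '%s\n' 'not a release id' | to_version)

printf 'version:  [%s]\n' "$VERSION"    # prints version:  [22.0.2]
printf 'no match: [%s]\n' "$NO_MATCH"   # prints no match: []
```

Because of -n, input that does not look like a release id produces no output at all, so an empty result is a reasonable signal that the feed format has changed.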
PowerShell solution
If you use PowerShell, your life is a little simpler.
1. PowerShell already knows about XML, and will parse the XML <entry> elements into .NET XmlElement objects. We don’t need any additional tools to execute an XPath query, and the attribute selection is much simpler. To access this power we use the Invoke-RestMethod cmdlet instead of curl. For example:
   (Invoke-RestMethod -uri http://www.papercut.com/products/mf/release-history.atom).id[0]
   If you are curious about how this works, the following two commands might give you some hints:
   (Invoke-RestMethod -uri http://www.papercut.com/products/mf/release-history.atom) | get-member
   (Invoke-RestMethod -uri http://www.papercut.com/products/mf/release-history.atom)
   Notice how in PowerShell the array index starts at 0, instead of the 1 used in the XPath query.
2. We can then apply the -replace operator to extract the major, minor, and patch numbers in the correct format.
   $PRODUCT="mf"
   $CURRENT_RELEASE=((Invoke-RestMethod -uri http://www.papercut.com/products/$PRODUCT/release-history.atom).id[0] `
       -replace `
       "^tag:papercut.com,[0-9]{4}-[0-9]{2}-[0-9]{2}:$PRODUCT\/releases\/v(\d+)-(\d+)-(\d+)$", `
       '$1.$2.$3')
   echo $CURRENT_RELEASE
Which gave me the answer of 22.0.2. We have another winner!
So from the previous post we know which PaperCut MF version is running on our local server, and now we also know what the most current release version is. In the final post we will tie these pieces together to create the final automation script. Hint: it’s not as simple as just comparing the installed version against the latest release.