Detecting errors in GPX by validation
Always being one keen to live on the bleeding edge, I’ve been using the beta software for my Garmin Colorado. It has been great for driving around and recording tracklogs for the OpenStreetMap project.
One of the downsides of using beta software, though, is that you can be exposed to bugs. I discovered this recently when I tried to open some of my recent GPX tracklogs and the software just refused to open them. After a bit of hunting, I found a relatively easy means of detecting the errors and fixing them.
The tool to use is a validating XML parser called Xerces, produced by the Apache Software Foundation. On a Mac, I downloaded the appropriate binary package, then copied the binary files in xerces/bin to /usr/local/bin and the libraries from xerces/lib to /usr/local/lib. You can then run the program SAXCount, which counts the number of elements in an XML file. The side benefit that we’re after is that it is good at detecting and reporting errors that many GPX applications are not capable of.
After working through a few minor problems on the NZ GPS forums, I had Xerces up and validating GPX, including Garmin’s extensions. Note that you may get an error when SAXCount tries to connect to Garmin’s server to download the schema, for example:
Fatal Error at file , line 0, char 0
Message: An exception occurred! Type:NetAccessorException, Message:Could not open file: http://www.garmin.com/xmlschemas/GpxExtensions/v3/GpxExtensionsv3.xsd
I believe this is a combination of Garmin redirecting the original link to a new location, and SAXCount not handling the redirect very well. If you strike this problem, this post in the forums has the fix. I’d recommend keeping a copy of the fixed Garmin header ready to cut and paste into each GPX so that SAXCount can actually download each xsd. I’ve been using this one, which points the two Garmin schema locations at www8.garmin.com.
<?xml version="1.0" encoding="UTF-8" standalone="no"?><gpx xmlns="http://www.topografix.com/GPX/1/1" xmlns:gpxx="http://www.garmin.com/xmlschemas/GpxExtensions/v3" xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1" creator="Colorado 300" version="1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd http://www.garmin.com/xmlschemas/GpxExtensions/v3 http://www8.garmin.com/xmlschemas/GpxExtensions/v3/GpxExtensionsv3.xsd http://www.garmin.com/xmlschemas/TrackPointExtension/v1 http://www8.garmin.com/xmlschemas/TrackPointExtensionv1.xsd">
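If pasting the header by hand gets tedious, the same change could probably be scripted. The following is only a sketch: it assumes the fix amounts to swapping the host name from www.garmin.com to www8.garmin.com on the schema URLs (the ones ending in .xsd), as the header above suggests, and it leaves the namespace URIs alone. The function name fix_schema_hosts is made up for this example.

```sh
#!/bin/sh
# Hypothetical helper: rewrite Garmin schema *locations* (URLs ending in .xsd)
# to the www8 host, leaving the namespace URIs untouched.
# Assumption: swapping the host name is all the fix requires.
fix_schema_hosts() {
    sed 's|http://www\.garmin\.com/\([^ "]*\.xsd\)|http://www8.garmin.com/\1|g'
}

# Usage (reads stdin, writes stdout):
#   fix_schema_hosts < original.gpx > fixed.gpx
```

URLs that already point at www8.garmin.com do not match the pattern, so running it twice does no harm.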
From there it is a simple step to validate.
SAXCount -v=always -n -s -f test.gpx
All going well, you’ll get something back similar to this.
test.gpx: 24 ms (7478 elems, 2498 attrs, 0 spaces, 38613 chars)
This means that everything checked out OK. Otherwise, it will tell you which lines have errors, making it quick and easy to open the file in a text editor and edit or delete the corrupt elements. One catch is that, by default, a Colorado GPX has no line breaks in it. The trick here is to search for /trkpt><trkpt and replace it with /trkpt>\r<trkpt. This inserts a line break so that each trkpt element starts on a new line and SAXCount can refer to it by line number for easier identification.
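The same search and replace can be done from the command line instead of a text editor. This is a rough sketch using sed; the function name split_trkpts is just something I made up, and the backslash followed by a real line break is the portable way to put a newline in a sed replacement.

```sh
#!/bin/sh
# Hypothetical helper: put each trkpt element on its own line so that
# validator messages point at a single trackpoint.
# The second line of the sed expression must start in column 0.
split_trkpts() {
    sed 's|/trkpt><trkpt|/trkpt>\
<trkpt|g'
}

# Usage (reads stdin, writes stdout):
#   split_trkpts < test.gpx > test-lines.gpx
#   SAXCount -v=always -n -s -f test-lines.gpx
```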
Protecting your privacy uploading tracklogs to public sites
I have become interested in the ways that you can protect potentially private or sensitive information that may be contained in tracklogs uploaded to any public site. I am primarily writing this article from an OSM perspective, but it is really valid for any site that you may upload a tracklog to.
A GPX tracklog consists of a lot of sections of code that look something like this – a trackpoint.
<trkpt lat="-43.502053000" lon="172.576317000">
<ele>16.480000</ele>
<time>2008-05-06T08:37:46Z</time>
</trkpt>
A trackpoint contains two key pieces of information – the time (in UTC – the Z after the time refers to this), and the location in latitude and longitude. A whole pile of these trackpoints are then added together to produce a tracklog. This of course presents a privacy risk as anyone that has access to the tracklog might be able to assume that the person that uploaded the tracklog was at that location at the time specified. And with GPS, this can be recorded to a high level of accuracy.
So, what we need to do is look at ways to protect some of this information. I’ll write here about two techniques that I have used to protect information in tracklogs by editing them before uploading them to public websites. For most public websites, the most important information is location, and time is less important. So we need to take a two-pronged approach to tracklog privacy protection.
1. Delete track points that we might have privacy concerns with.
2. Remove timestamps where we don’t want people to know the time we were there.
Deleting Trackpoints
1. Protecting it manually. I have been using the free GPSTrackMaker to load and edit tracklogs before uploading them to OSM. This is a manual and sometimes laborious process. I use it to remove any trackpoints around the final locations of puzzles/multi-caches that I have visited, and also to remove trackpoints close to home/work/friends etc. I also use it to tidy up the tracklogs, for example in areas such as the urban canyons in Wellington where meaningless trackpoints are sprayed around a wide area. This can result in quite a ‘rich’ tracklog, especially if you delete those areas where the trackpoints are not that accurate due to GPS signal error.
2. Automating deletion of trackpoints. There are also a number of locations that one may always want to remove from a tracklog before making it publicly available. Locations such as home and work spring to mind. I was looking for a way to automate the removal of these locations using GPSBabel. Using nothing more than co-ordinates near your home and a radius, you can easily set up a filter to remove all points that fall within the circle using the following GPSBabel command. Note that the command is more complex than it should be, because a little workaround is required to use the radius filter on trackpoints: you have to convert tracks to waypoints, do the radius filter on waypoints, and then convert the waypoints back to tracks. Ugly, but it works.
gpsbabel -t -i gpx -f in.gpx -x transform,wpt=trk -x nuketypes,tracks -x radius,distance=0.3K,lat=-43.0,lon=172.5,exclude,nosort -x transform,trk=wpt -x nuketypes,waypoints -x track,pack,split=30m,title="LOG %Y%m%d" -o gpx -F out.gpx
It is possible to build a batch file that removes multiple locations such as home, work and friends, that requires very little input. Note that this process does not destroy the original tracklog that you keep, rather it creates a new tracklog with the sensitive data removed.
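As a rough idea of what such a batch file could look like, here is a sketch that chains one radius filter per location onto the command above. The places file, its "lat lon distance" layout, and the function names radius_filters and clean_track are all assumptions of mine for illustration; the GPSBabel options themselves are the ones already used above.

```sh
#!/bin/sh
# Print one "-x radius,..." filter per input line.
# Assumed input format: "lat lon distance", e.g. "-43.0 172.5 0.3K".
# Blank lines and lines starting with # are skipped.
radius_filters() {
    while read -r lat lon dist; do
        case "$lat" in ''|'#'*) continue ;; esac
        printf '%s\n' "-x radius,distance=$dist,lat=$lat,lon=$lon,exclude,nosort"
    done
}

# clean_track IN.gpx OUT.gpx PLACES.txt
# Runs the same tracks -> waypoints -> tracks workaround as above, with one
# radius filter per place. The original file is left untouched.
clean_track() {
    # The unquoted $(...) is deliberate: it splits into separate arguments.
    gpsbabel -t -i gpx -f "$1" \
        -x transform,wpt=trk -x nuketypes,tracks \
        $(radius_filters < "$3") \
        -x transform,trk=wpt -x nuketypes,waypoints \
        -x track,pack,split=30m,title="LOG %Y%m%d" \
        -o gpx -F "$2"
}

# Usage:
#   clean_track in.gpx out.gpx places.txt
```

Keeping the locations in a separate file means adding a new place is a one-line change rather than an edit to the command itself.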
Removing Timestamps
For whatever reason, it makes some sense to also remove timestamp information from tracklogs (I won’t go into the reasons here). Here is a little unix script that I use to change the timestamp information. Usually I don’t mind people knowing what day I was somewhere, but I’m not that keen on them always knowing the time. So I will remove either the minutes and seconds, or the hours, minutes and seconds so that every timestamp appears as midnight.
If you want to set it so that all times are set to the start of the hour e.g. hh:00:00, use this.
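#!/bin/sh
# Round every timestamp down to the start of the hour (hh:00:00).
for f in *.gpx; do
  case "$f" in *-clean.gpx) continue ;; esac  # skip output from earlier runs
  sed 's/:[0-9][0-9]:[0-9][0-9]Z/:00:00Z/g' < "$f" > "${f%.gpx}-clean.gpx"
done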
If you want to set it so that all times are set to midnight e.g. 00:00:00, use this.
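#!/bin/sh
# Set every timestamp to midnight (00:00:00), keeping only the date.
for f in *.gpx; do
  case "$f" in *-clean.gpx) continue ;; esac  # skip output from earlier runs
  sed 's/T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]Z/T00:00:00Z/g' < "$f" > "${f%.gpx}-clean.gpx"
done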
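To see what the two sed expressions actually do, here they are applied to the timestamp from the sample trackpoint earlier in this post. The function names to_hour and to_midnight are only labels for this illustration.

```sh
#!/bin/sh
# The same two sed expressions as in the scripts, wrapped as filters.
to_hour()     { sed 's/:[0-9][0-9]:[0-9][0-9]Z/:00:00Z/g'; }
to_midnight() { sed 's/T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]Z/T00:00:00Z/g'; }

sample='<time>2008-05-06T08:37:46Z</time>'

printf '%s\n' "$sample" | to_hour       # <time>2008-05-06T08:00:00Z</time>
printf '%s\n' "$sample" | to_midnight   # <time>2008-05-06T00:00:00Z</time>
```

Both patterns only match timestamps written as hh:mm:ssZ, which is the form shown in the trackpoint example.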
Naturally, this isn’t the easiest to do, but it is getting easier. It would be great if someone was able to write a tool/webpage that was able to do this sort of cleaning of tracklog data before uploading it to public websites.