I have a peeve. I HATE it when people share URLs with me without trimming off all the marketing and tracking cruft. I get it. People don’t normally pay a lot of attention to the URLs of the web pages they visit. This common state of oblivion is what makes phishing so easy and is a delight to bad actors and Black Hats everywhere.
Needless to say, I’m not normal. I spent years working as a web developer before I switched over to cybersecurity, so I paid a lot of attention to URLs long before I knew anything about phishing, SQL injection, or XSS. I knew that every character in a URL has meaning, and I knew how much meaning can be packed into a very short string.
To see what I mean, let’s take a look at a web link I received in an email newsletter.
There is a lot going on in that URL, so let’s chunk it down.
|scmagazine.com||Second-level and top-level domains|
|?||Separator delimiting file path from query parameters|
|utm_source||Source identifier (newsletter)|
|&||Use these to separate the key/value pairs|
|utm_medium||Advertising or marketing medium (email)|
|utm_campaign||Individual campaign name, slogan, promo code, etc.|
|…||I have no idea|
|email_hash=75aa69764ab2d99999cd218434208ec9||Hash of the email address the email newsletter was sent to|
|…||I have no idea|
The first three segments in the breakdown above are the “meat” of the URL. They relate to the blog article itself, and they are all I need to share with friends, family, and random internet passersby. What matters for this discussion, however, is everything that follows the question mark (?).
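If you would rather let a script do the chunking, here is a minimal sketch using Python’s standard `urllib.parse` module. The URL in it is a made-up stand-in on example.com (the path, campaign name, and hash are all assumptions for illustration), not the link from my newsletter.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical URL for illustration only; every value here is invented.
url = (
    "https://www.example.com/blog/some-article/"
    "?utm_source=newsletter&utm_medium=email"
    "&utm_campaign=spring_promo&email_hash=0123456789abcdef"
)

parts = urlsplit(url)
print("scheme:", parts.scheme)   # https
print("host:  ", parts.netloc)   # www.example.com
print("path:  ", parts.path)     # /blog/some-article/

# Everything after the "?" is a series of key/value pairs joined by "&".
pairs = parse_qsl(parts.query, keep_blank_values=True)
for key, value in pairs:
    print(f"  {key} = {value}")
```

The scheme, host, and path are the part worth sharing. The list of pairs is the part this post is about.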
Web app penetration testers are very familiar with the ? as it is used in URLs, and with the use of key/value pairs to send queries to the database(s) behind a website. In this case, we aren’t sending a query to a product database in the hope of finding a discounted copy of the Encyclopedia Galactica. The key/value pairs in the URL above are there to track my interactions with a specific email marketing campaign.
For those who are not familiar with Google Analytics campaigns and conversion tracking, the keys that begin with “utm_” are all Google-related. Google Analytics uses five parameters for ad campaign tracking: utm_source, utm_medium, utm_campaign, utm_term, and utm_content. The first three are pretty self-explanatory and are described above; the last two tend to appear in URLs created by folks with slightly more advanced tracking-fu.
- utm_term identifies paid search keywords. This is helpful for organizations that pay to have their ads appear when specific keywords are used in Google searches.
- utm_content is used to differentiate similar content, or links within the same ad. For example, if you have two call-to-action links within an email message, you can use utm_content and set different values for each of them to track which version got more clicks.
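To make the five parameters concrete, here is a small sketch that pulls only the Google Analytics campaign keys out of a link. The function name `utm_params` and both example links are hypothetical; the two links mimic the “two call-to-action links in one email” case, differing only in utm_content.

```python
from urllib.parse import urlsplit, parse_qsl

# The five campaign parameters named in the Google Analytics documentation.
UTM_KEYS = ("utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content")


def utm_params(url: str) -> dict:
    """Return only the Google Analytics campaign parameters found in url."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return {key: value for key, value in pairs if key.lower() in UTM_KEYS}


# Hypothetical links: same campaign, two different buttons in the same email.
base = "https://www.example.com/sale?utm_source=newsletter&utm_medium=email&utm_campaign=spring_promo"
link_a = base + "&utm_content=top_button"
link_b = base + "&utm_content=footer_link"

print(utm_params(link_a)["utm_content"])  # top_button
print(utm_params(link_b)["utm_content"])  # footer_link
```

Whoever runs the campaign can then count clicks per utm_content value to see which link performed better.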
Google is not the only enemy when it comes to cruft stuffed into URLs, though. As you can see from the URL we are analyzing, the key/value pairs used can include a hash of your email address, the words used to perform a search, and who knows what all else. And that is over and above query parameters used to interact with a product database on a shopping site.
Why does any of this matter?
I’m so glad you asked!
If I go to a website like Amazon to find a product, Amazon stuffs the URLs of the pages I visit with the keywords I used in my search and a whole bunch of other stuff, including a query ID. If I then share the unsanitized URL on social media, every person who clicks that link becomes associated with my original query. Similarly, if I share an unsanitized link like the one dissected above, I am sharing data about me and about that marketing campaign with everyone who sees or interacts with that URL.
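If trimming by hand sounds tedious, the job is easy to automate. What follows is a minimal sketch, not a complete tracker list: the blocklist (anything starting with “utm_”, plus the email_hash key from the breakdown above) is an assumption made for this example, as are the function name `strip_tracking` and the example.com URLs.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: a deliberately small, non-exhaustive blocklist for this example.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"email_hash"}


def strip_tracking(url: str) -> str:
    """Return url with known tracking parameters removed from the query string."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_KEYS
    ]
    # Rebuild the URL; an empty query drops the trailing "?" as well.
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Hypothetical URLs for illustration only.
print(strip_tracking(
    "https://www.example.com/blog/some-article/"
    "?utm_source=newsletter&utm_medium=email&email_hash=0123456789abcdef"
))
# https://www.example.com/blog/some-article/

print(strip_tracking("https://www.example.com/search?q=cats&utm_source=newsletter"))
# https://www.example.com/search?q=cats
```

A blocklist keeps parameters a page may actually need (the `q=cats` above) while dropping the ones it recognizes as tracking. For a plain article link, cutting everything from the question mark onward is usually simpler. Note that `urlencode` re-encodes the surviving values, so their escaping may differ slightly from the original string.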
To recap, here is the original, tracking-cruft-laden URL:
And here is the sanitized version:
And for you visual learners, here is a graphic breakdown, similar to the breakdown table above.
So, now that you know, take a moment to trim the tracking crud before you share that link to your favorite cute cat pictures!
Google. (n.d.). Collect campaign data with custom URLs. Retrieved August 9, 2019, from https://support.google.com/analytics/answer/1033863?hl=en
MDN Web Docs (Mozilla). (n.d.). What is a URL? Retrieved August 9, 2019, from https://developer.mozilla.org/en-US/docs/Learn/Common_questions/What_is_a_URL
Wikipedia. (2001, May 16). URL. Retrieved August 9, 2019, from https://en.wikipedia.org/wiki/URL#Syntax
Wikipedia. (2015, September 27). UTM parameters. Retrieved August 9, 2019, from https://en.wikipedia.org/wiki/UTM_parameters
 Some of the values have been changed for security purposes. Did you really think I wouldn’t sanitize things? 😉