URL Hacking (or “How to Sanitize Your URLs”)

I have a peeve. I HATE it when people share URLs with me without trimming off all the marketing and tracking cruft. I get it. People don’t normally pay a lot of attention to the URLs of the web pages they visit. This common state of oblivion is what makes phishing so easy and is a delight to bad actors and Black Hats everywhere.

Needless to say, I’m not normal. I spent years working as a web developer before I switched over to cybersecurity, so I paid a lot of attention to URLs long before I knew anything about phishing, SQL injection, or XSS. I knew that every character in a URL has meaning, and I knew how much meaning can be packed into a very short string.

To see what I mean, let’s take a look at a web link I received in an email newsletter.

https://www.scmagazine.com/home/security-news/vulnerabilities/bug-bounty-program-set-up-101-dont-be-afraid-its-all-good/?utm_source=newsletter&utm_medium=email&utm_campaign=SCUS_Newswire_20190809&hmSubId=Wv0jNMPpFa01&email_hash=75aa69764ab2d99999cd218434208ec9&mpweb=1325-8069-1031450

There is a lot going on in that URL, so let’s chunk it down.

String Fragment[1] | Use/Function/Meaning
https:// | Scheme (the protocol used to fetch the page)
www | Subdomain
scmagazine.com | Second-level and top-level domains
/home/security-news/vulnerabilities/bug-bounty-program-set-up-101-dont-be-afraid-its-all-good/ | Path
? | Separator delimiting the path from the query parameters
utm_source=newsletter | Tracking code: source identifier (newsletter)
& | Query delimiter; use these to separate the key/value pairs
utm_medium=email | Tracking code: advertising or marketing medium (email)
utm_campaign=SCUS_Newswire_20190809 | Tracking code: individual campaign name, slogan, promo code, etc.
hmSubId=Wv0jNMPpFa01 | Tracking code: I have no idea
email_hash=75aa69764ab2d99999cd218434208ec9 | Hash of the email address the newsletter was sent to
mpweb=1325-8069-1031450 | Tracking code: I have no idea

The first four segments in the breakdown above (everything up to, but not including, the question mark) are the “meat” of the URL. They relate to the blog article itself, and they are all I need to share with friends, family, and random internet passersby. What matters for this discussion, however, is everything that follows the question mark (?).
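If you would rather let a program do the chunking, here is a minimal sketch using Python's standard urllib.parse module. It splits the newsletter URL into the same pieces as the breakdown above; the variable names are my own, and nothing here is specific to any one site.

```python
from urllib.parse import urlsplit, parse_qsl

# The newsletter link from above, wrapped across lines for readability.
url = (
    "https://www.scmagazine.com/home/security-news/vulnerabilities/"
    "bug-bounty-program-set-up-101-dont-be-afraid-its-all-good/"
    "?utm_source=newsletter&utm_medium=email"
    "&utm_campaign=SCUS_Newswire_20190809&hmSubId=Wv0jNMPpFa01"
    "&email_hash=75aa69764ab2d99999cd218434208ec9&mpweb=1325-8069-1031450"
)

parts = urlsplit(url)

print(parts.scheme)    # https
print(parts.hostname)  # www.scmagazine.com
print(parts.path)      # the path to the article itself

# Everything after the "?" is a list of key/value pairs separated by "&".
for key, value in parse_qsl(parts.query):
    print(f"{key} = {value}")
```

The loop prints six key/value pairs, and not one of them is needed to reach the article.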

Web app penetration testers are very familiar with the ? as it is used in URLs, and with the use of key/value pairs to send queries to the database(s) behind a website. In this case, we aren’t sending a query to a product database in the hope of finding a discounted copy of the Encyclopedia Galactica. The key/value pairs in the URL above are there to track my interactions with a specific email marketing campaign.

For those who are not familiar with Google Analytics campaigns and conversion tracking, the keys that begin with “utm_” are all Google related. Google Analytics uses five different parameters for ad campaign tracking: utm_source, utm_medium, utm_campaign, utm_term, and utm_content. The first three are pretty self-explanatory and are described above, but the last two appear in URLs created by folks with slightly more advanced tracking-fu.

  • utm_term identifies paid search keywords. This is helpful for organizations that pay to have their ads appear when specific keywords are used in Google searches.
  • utm_content is used to differentiate similar content, or links within the same ad. For example, if you have two call-to-action links within an email message, you can use utm_content and set different values for each of them to track which version got more clicks.
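To make the five parameters concrete, here is a small sketch that pulls whichever utm_ keys are present out of a link. The example.com URL and its values are made up for illustration, and campaign_params is a hypothetical helper name, not part of any Google tooling.

```python
from urllib.parse import urlsplit, parse_qs

# The five campaign parameters described above.
UTM_KEYS = ("utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content")

def campaign_params(url):
    """Return the utm_ parameters found in a URL as a dict."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs maps each key to a list of values; keep the first one.
    return {key: query[key][0] for key in UTM_KEYS if key in query}

# Hypothetical link with two of the less common parameters filled in.
example = (
    "https://example.com/offer?utm_source=newsletter&utm_medium=email"
    "&utm_campaign=spring&utm_content=top_button&page=2"
)

print(campaign_params(example))
```

Note that page=2 is left out of the result: it is an ordinary query parameter, not campaign tracking.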

All five Google tracking codes depend on the JavaScript embedded in sites using Google Analytics to capture and relay the tracking data back to the Google mothership. Even without campaign tracking key/value pairs, Google Analytics captures a scary amount of data about your online behaviors. The tracking codes make that behavior tracking even more detailed.

Google is not the only enemy when it comes to cruft stuffed into URLs, though. As you can see from the URL we are analyzing, the key/value pairs used can include a hash of your email address, the words used to perform a search, and who knows what all else. And that is over and above query parameters used to interact with a product database on a shopping site.
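A hashed email address may look anonymous, but it is not. The sketch below assumes the hash is a plain MD5 of the lowercased address. That is purely my guess based on the 32 hex characters in the value; I do not know how this particular sender computes it. Under that assumption, anyone holding a list of candidate addresses can hash each one and look for a match. The addresses and the email_hash helper are made up for illustration.

```python
import hashlib

def email_hash(address):
    """Hash an email address the way a tracker might (assumed: MD5 of the lowercased address)."""
    normalized = address.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

# A list of addresses someone already has (hypothetical).
known_addresses = ["alice@example.com", "bob@example.com"]
lookup = {email_hash(address): address for address in known_addresses}

# A hash value lifted from a shared URL (simulated here).
hash_from_url = email_hash("Alice@Example.com ")

print(lookup.get(hash_from_url))  # alice@example.com
```

The hash acts as a stable identifier: the same address always produces the same value, so it can be matched across campaigns and across anyone who receives the link.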

Why does any of this matter?

I’m so glad you asked!

If I go to a website like Amazon to find a product, Amazon stuffs the URLs of the pages I visit with the keywords I used in my search and a whole bunch of other stuff, including a query ID. If I then share the unsanitized URL on social media, every person who clicks that link becomes associated with my original query. Similarly, if I share an unsanitized link like the one dissected above, I am sharing data about me and about that marketing campaign with everyone who sees or interacts with that URL.

To recap, here is the original, tracking-cruft-laden URL:

https://www.scmagazine.com/home/security-news/vulnerabilities/bug-bounty-program-set-up-101-dont-be-afraid-its-all-good/?utm_source=newsletter&utm_medium=email&utm_campaign=SCUS_Newswire_20190809&hmSubId=Wv0jNMPpFa01&email_hash=75aa69764ab2d99999cd218434208ec9&mpweb=1325-8069-1031450

And here is the sanitized version:

https://www.scmagazine.com/home/security-news/vulnerabilities/bug-bounty-program-set-up-101-dont-be-afraid-its-all-good/
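Trimming by hand works fine, but the same cleanup can be sketched in a few lines of Python. This version drops any key starting with utm_ plus the three extra keys seen in the example URL, and keeps everything else so that legitimate query parameters (a search term, a page number) survive. The key list is an assumption drawn from this one link, not a complete catalog of tracking parameters, and sanitize is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed tracking keys, taken from the example URL above.
TRACKING_KEYS = {"hmSubId", "email_hash", "mpweb"}
TRACKING_PREFIXES = ("utm_",)

def sanitize(url):
    """Return the URL with known tracking parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_KEYS and not key.startswith(TRACKING_PREFIXES)
    ]
    # An empty query string makes urlunsplit drop the "?" as well.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )

dirty = (
    "https://www.scmagazine.com/home/security-news/vulnerabilities/"
    "bug-bounty-program-set-up-101-dont-be-afraid-its-all-good/"
    "?utm_source=newsletter&utm_medium=email"
    "&utm_campaign=SCUS_Newswire_20190809&hmSubId=Wv0jNMPpFa01"
    "&email_hash=75aa69764ab2d99999cd218434208ec9&mpweb=1325-8069-1031450"
)

print(sanitize(dirty))
print(sanitize("https://example.com/search?q=cats&utm_source=newsletter"))
```

The first call prints the sanitized article URL shown above; the second keeps q=cats and drops only the tracking key. One caveat: urlencode re-encodes the values it keeps, so unusual characters in a surviving parameter may come out escaped differently than they went in.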

Before I became a tracking-cruft-cutting evangelist, I tended to trim the fat off of links I shared because doing so makes them cleaner and less frightening. Then I went into cybersecurity, and it was all over! Now I’m on a mission to help people take back a smidgen of their web surfing anonymity. I know it’s a losing battle, courtesy of all the cookies, JavaScript, and 1×1 pixel tracking bugs in everything from web pages to emails, but it’s a battle I’m willing to fight because the more people look at the URLs they share with others, the more aware they become of the links they click in general. And that’s an outcome I can get behind.

And for you visual learners, here is a graphic breakdown, similar to the breakdown table above.

So, now that you know, take a moment to trim the tracking crud before you share that link to your favorite cute cat pictures!

Cute kitteh wearing a rose wreath.
Isn’t she lovely?


References

Google. (n.d.). Collect campaign data with custom URLs. Retrieved August 9, 2019, from https://support.google.com/analytics/answer/1033863?hl=en

MDN Web Docs (Mozilla). (n.d.). What is a URL? Retrieved August 9, 2019, from https://developer.mozilla.org/en-US/docs/Learn/Common_questions/What_is_a_URL

Wikipedia. (2001, May 16). URL. Retrieved August 9, 2019, from https://en.wikipedia.org/wiki/URL#Syntax

Wikipedia. (2015, September 27). UTM parameters. Retrieved August 9, 2019, from https://en.wikipedia.org/wiki/UTM_parameters


[1] Some of the values have been changed for security purposes. Did you really think I wouldn’t sanitize things? 😉
