This is the second of a two-part series on Google’s spam filtering system and what the new messages teach us. In the first column I explained what Google has changed and looked at the phishing/forgery disposition and how to avoid it.
This week we’ll examine the spam disposition. These are messages that Google has tagged as “Spam” not because they’re fraudulent or forged (though they may be that) but because Google considers them to be unwanted by their intended recipient.
As with the phishing/forgery disposition, Google gives just a few reasons for treating a message as spam. However, those few reasons have some important implications.
Messages that indicate this disposition include:
- It contains content that’s typically used in spam messages.
- It’s similar to messages that were detected by our spam filters.
- You previously marked messages from firstname.lastname@example.org as spam.
1. It contains content that’s typically used in spam messages. For a while there, content filtering was virtually dead. Spammers became so adept at avoiding keywords and phrases and at obfuscating their content that content filters often caught more innocent bystanders than actual spam.
Industry professionals have been saying for a while, though, that content filtering has seen a renaissance, and here’s proof. But this isn’t the content filtering of yore. It’s not just simplistic filtering of keywords and phrases but much more complex statistical analysis.
What that means is that you’re not likely to get filtered just for using a trigger word or phrase, and you can’t predict whether a message will be filtered by checking it against a list of trigger words or phrases. Unexpected combinations of content may trigger filtering, and which combinations do so may vary over time. So you must test your messages to see where they actually end up.
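To make the difference between a keyword list and statistical analysis concrete, here is a minimal sketch of a toy naive Bayes scorer. It is not Google’s method, which is not public; the training messages, function names, and equal-prior assumption are all invented for illustration. The point is only that a score comes from the whole combination of words, so a “trigger word” such as “free” does not decide the outcome by itself.

```python
import math
from collections import Counter

# Hypothetical, hand-made training data. A real filter learns from
# vastly more mail and many more signals than word counts.
SPAM = [
    "free offer click now",
    "limited offer click here free",
    "win cash now click",
]
HAM = [
    "free shipping on your order",
    "your invoice for march",
    "meeting notes and next steps",
]


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


def train(messages):
    """Count how often each word appears across a set of messages."""
    counts = Counter()
    for message in messages:
        counts.update(tokenize(message))
    return counts


spam_counts = train(SPAM)
ham_counts = train(HAM)
vocab = set(spam_counts) | set(ham_counts)


def log_likelihood(tokens, counts):
    """Sum of log word probabilities with add-one (Laplace) smoothing."""
    total = sum(counts.values())
    return sum(
        math.log((counts[t] + 1) / (total + len(vocab))) for t in tokens
    )


def spam_score(text):
    """Log-odds of spam versus ham, assuming equal priors.

    Positive values lean toward spam, negative values toward ham.
    """
    tokens = tokenize(text)
    return log_likelihood(tokens, spam_counts) - log_likelihood(tokens, ham_counts)


if __name__ == "__main__":
    for subject in ("free shipping on your order", "free offer click now"):
        print(f"{spam_score(subject):+.2f}  {subject}")
```

Both sample subjects contain “free”, yet with this toy data the first scores as legitimate and the second as spam, because the surrounding words shift the balance. That is also why a static list of words to avoid cannot stand in for testing real messages.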
2. It’s similar to messages that were detected by our spam filters. One could call this guilt by association. Google doesn’t provide great detail as to what similarities it looks at but one can reasonably presume the usual suspects are involved.
Source. The Internet Protocol (IP) address the message came from. This may be considered with surrounding addresses (your Internet neighborhood). If other messages from the same address (or neighborhood) look like spam, yours may be classified that way too. This is particularly problematic for small senders on shared servers.
Headers. Particularly the From and Reply-To addresses and domains. If you split your email across multiple IP addresses, or even across email service providers (ESPs), your domains and addresses may still give you away.
Content. Beyond just keywords and phrases, links can be a cause of filtering. If the URLs in your message are the same as or similar to those in spam emails, you may find your messages filtered.
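As a rough illustration of how link overlap could be measured, the sketch below extracts the hostnames linked in two message bodies and computes their Jaccard similarity (shared hostnames divided by all hostnames). Google does not say how it compares messages, so treat this as one plausible way to check your own mail against a sample you already know was filtered. The function names, the URL pattern, and the example domains are assumptions.

```python
import re
from urllib.parse import urlparse

# Simplified pattern for http(s) links; real message parsing is messier.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def link_domains(body):
    """Return the set of hostnames linked in a message body."""
    hosts = set()
    for url in URL_RE.findall(body):
        # Drop punctuation that commonly trails a URL in running text.
        host = urlparse(url.rstrip(".,;:)")).hostname
        if host:
            hosts.add(host)
    return hosts


def jaccard(a, b):
    """Jaccard similarity of two sets: 0.0 = disjoint, 1.0 = identical."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Hypothetical message bodies using reserved example domains.
    mine = "Visit https://shop.example.com/sale or http://track.example.net/c?id=1"
    known_spam = "Cheap stuff http://track.example.net/x https://bad.example.org/"

    shared = link_domains(mine) & link_domains(known_spam)
    print("shared hosts:", sorted(shared))
    print("similarity:", round(jaccard(link_domains(mine), link_domains(known_spam)), 2))
```

In this made-up case the only overlap is a shared tracking host, which is exactly the kind of link a sender might not think of as “theirs” when auditing a message.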
3. You previously marked messages from email@example.com as spam. On the surface this is an obvious item, but there is some subtlety here that bears consideration.
As a recipient, of course you expect that all future email from a sender you marked as a spammer will be hidden from you. As a sender, though, you may never learn about the complaint: perhaps you weren’t registered for a feedback loop or failed to process it properly, the complaint notification was never sent, or the ISP offers no notification at all. Either way, you will forevermore be marked as spam for that recipient.
Once a relatively small number of users start marking your email as spam, the data shows that much or all of your email will start being bulked, that is, delivered to the spam folder. Now consider the relationship of this rule with the similarity rule in item two and the negative impact these users will have on your engagement metrics. The more recipients on your list who have marked your messages as spam in the past, the harder it’s going to be for you to stay out of the spam folder in the future.
This may be a good reason to consider sunsetting long-term inactive users.
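A sunsetting pass can be as simple as the sketch below, which splits a list into subscribers to keep mailing and subscribers to retire. The record fields (`last_engaged`, `complained`), the function name, and the 180-day window are all assumptions chosen for illustration; the right window depends on how often you send and how your audience behaves.

```python
from datetime import date, timedelta


def split_by_activity(subscribers, today, max_inactive_days=180):
    """Split subscribers into (active, sunset) lists.

    A subscriber is sunset if they have ever complained, have never
    engaged, or last engaged more than max_inactive_days ago.
    The 180-day default is an arbitrary example, not a recommendation.
    """
    cutoff = today - timedelta(days=max_inactive_days)
    active, sunset = [], []
    for sub in subscribers:
        last = sub.get("last_engaged")  # date of last open/click, or None
        recently_engaged = last is not None and last >= cutoff
        if recently_engaged and not sub.get("complained", False):
            active.append(sub)
        else:
            sunset.append(sub)
    return active, sunset


if __name__ == "__main__":
    # Hypothetical list records.
    subscribers = [
        {"email": "a@example.com", "last_engaged": date(2024, 6, 1)},
        {"email": "b@example.com", "last_engaged": date(2023, 1, 15)},
        {"email": "c@example.com", "last_engaged": None},
        {"email": "d@example.com", "last_engaged": date(2024, 6, 20), "complained": True},
    ]
    active, sunset = split_by_activity(subscribers, today=date(2024, 6, 30))
    print("keep:  ", [s["email"] for s in active])
    print("sunset:", [s["email"] for s in sunset])
```

Treating a known complaint as grounds for removal regardless of recent activity reflects the point made above: a recipient who has marked you as spam will keep counting against you for as long as you keep mailing them.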
It’s also interesting to note that while these reasons tell us a fair bit about Google’s spam filtering, much is left out. There is, for example, no mention of reputation, even though we know reputation is heavily used by almost all major Internet service providers (ISPs). Clearly Google is trying to keep the explanations simple and accessible to the layman, but it’s also keeping very quiet about the details of how it makes decisions in order to prevent spammers from gaming the system.
All the same, if you’re getting filtered at Gmail, consider the reasons given and you may be able to more quickly remedy the situation.
Until next time,