Twitter’s algorithm ranking factors: A definitive guide
A review of many of Twitter’s patents points to ranking criteria that are influential but not readily apparent.
Chris Silver Smith on July 1, 2022 at 8:00 am | Reading time: 37 minutes
Twitter patents and other publications reveal likely aspects of how tweets become promoted in the timeline feeds of users.
Some of Twitter’s timeline ranking factors are very surprising, and adjusting your approach to Tweeting may help you to gain greater visibility of your Tweets.
Based upon a number of key patents and other sources, I have outlined a number of probable ranking factors for Twitter’s algorithm herein.
The Twitter timeline
Twitter first began using an algorithm-based timeline back in 2016 when it switched from what was purely a chronological feed of Tweets from all the accounts one followed. The change ranked users’ timelines to allow them to see “the best Tweets first.” Twitter has since experimented with variations of this up to the present.
A feed-based algorithm for social media is not unusual. Facebook and other social media platforms have done the same.
The reasons for this change to an algorithmic mix of timeline Tweets are pretty clear. A purely personal, chronological timeline composed of only the accounts one has followed is very siloed and therefore limited – while introducing posts from accounts beyond one’s direct connections has the potential to increase the time one spends on the platform, which in turn increases overall stickiness, which in turn increases the worth of the service to advertisers and data partners.
The interest classifications Twitter assigns to users, along with the interest topics associated with their accounts and Tweets, further enable advertisement targeting based upon user demographics and content topics.
Twitter power users may have developed some intuitions about various Tweet factors that can result in greater visibility within the algorithm.
A reminder about patents
Corporations register patents all the time for inventions that they do not actually use in live service. When I worked at Verizon, I personally wrote a number of patent drafts for various inventions that my colleagues and I developed in the course of our work – including things that we did not end up using in production.
So, the fact that Twitter has patents that mention ideas for how things could work does not at all guarantee that that is how things do work.
Also, patents typically contain multiple embodiments, which are essentially various ways in which an invention could be implemented – patents attempt to describe the key elements of an invention as broadly as possible in order to claim any possible use that could be attributed to it.
Finally, just as with the famous PageRank algorithm patent that was the foundation of Google’s search engine, in instances where Twitter has used an embodiment from one of their patents, it is highly likely that they have changed and refined the simple, broad inventions described, and will continue to do so.
Despite all this typical vagueness and uncertainty, I found a number of very interesting concepts in the Twitter patent descriptions, many of which are highly likely to be incorporated within their system.
Twitter and Deep Learning
One additional caveat before I proceed involves how Twitter’s timeline algorithm has incorporated Deep Learning into its DNA, coupled with various levels of human supervision, making it a frequently, if not constantly, self-evolving beast.
This means that both large changes and small, incremental changes can and will occur in how it ranks content. Further, this machine learning approach can lead to conditions where Twitter’s own engineers may not know precisely why some content is displayed or outranks other content, because the ranking models produced are so abstract. This is similar to what I described when writing about models produced by Google’s quality ranking through machine learning.
Despite the complexity and sophistication of how Twitter’s algorithm is functioning, understanding the factors that likely go into the black box can still reveal what influences rankings.
Twitter’s original timeline was simply composed of all the Tweets from the accounts one has followed since one’s last visit, which were collected and displayed in reverse-chronological order with the most recent Tweets shown first, and each earlier Tweet shown one after another as one scrolled downward.
The current algorithm is still largely composed of that same reverse-chronological listing of Tweets, but Twitter performs a re-ranking to try to display the most-interesting Tweets first and foremost out of recent Tweets.
In the background, the Tweets have been assigned a ranking score by a relevance model that predicts how interesting each Tweet is likely to be to you, and this score value dictates the ranking order.
The Tweets with highest scores are shown first in your timeline list, with the remainder of most-recent Tweets shown further down. It is notable that interspersed in your timeline are now also Tweets from accounts you are not following, as well as a few advertisement Tweets.
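The re-ranking described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Tweet` fields, the `build_timeline` function, and the `top_n` cutoff are hypothetical names, and the score is simply taken as given rather than produced by a real relevance model.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    tweet_id: str
    posted_at: float  # Unix timestamp in seconds
    score: float      # hypothetical relevance-model output


def build_timeline(tweets, top_n=2):
    """Show the top_n highest-scoring Tweets first, then the rest newest-first."""
    by_score = sorted(tweets, key=lambda t: t.score, reverse=True)
    top = by_score[:top_n]
    # The remainder keeps the familiar reverse-chronological order.
    rest = sorted(by_score[top_n:], key=lambda t: t.posted_at, reverse=True)
    return top + rest
```

The point of the sketch is that ranking does not replace the reverse-chronological list; it only promotes a handful of high-scoring Tweets above it.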
Twitter’s connection graph
One of the most influential aspects of the Twitter timeline is that Twitter now displays Tweets based not only on your direct connections but on what is essentially your unique social graph, which Twitter refers to in patents as a “connection graph”.
The connection graph represents accounts as nodes and relationships as lines (“edges”) connecting one or more nodes. A relationship may refer to associations between Twitter accounts.
For example, following, subscribing (such as via Twitter’s Super Follows program or, potentially, for Twitter’s announced subscription feature for keyword queries), liking, tagging, etc. – all of these create relationships.
Relationships in one’s connection graph may be unidirectional (e.g., I follow you) or bidirectional (e.g., we both follow each other). If I follow you, but you do not follow me, I would have a greater expectation of seeing your Tweets and Retweets appearing in my timeline, but you would not necessarily expect to see mine.
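As a rough illustration of unidirectional versus bidirectional relationships, a connection graph can be modeled as directed, labeled edges between account nodes. The class and method names below are hypothetical and are not drawn from Twitter’s patents.

```python
from collections import defaultdict


class ConnectionGraph:
    """Accounts are nodes; each relationship is a directed, labeled edge."""

    def __init__(self):
        # source account -> set of (relationship, target account)
        self._edges = defaultdict(set)

    def add_edge(self, source, relationship, target):
        self._edges[source].add((relationship, target))

    def has_edge(self, source, relationship, target):
        # .get avoids creating empty entries for accounts never seen before.
        return (relationship, target) in self._edges.get(source, set())

    def is_mutual(self, a, b, relationship="follow"):
        """True when the relationship holds in both directions."""
        return self.has_edge(a, relationship, b) and self.has_edge(b, relationship, a)
```

With `graph.add_edge("me", "follow", "you")` alone, the relationship is unidirectional; it only becomes bidirectional once the reverse edge is added as well.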
Simply based on the connection graph, you are likely to see Tweets and Retweets from those you have followed, as well as Tweets your connections have Liked or Replied to.
The Twitter algorithm has expanded the Tweets you may see beyond those from accounts that you have directly interacted with. The Tweets you may see in your timeline now also include Tweets from others who are posting about topics you have followed, Tweets similar in some ways to Tweets you have previously Liked, and Tweets based on topics that the algorithm predicts you might like.
Even among these expanded types of Tweets you may get, the algorithm’s ranking system applies – you are not receiving all Tweets matching your topics, likes, and predicted interests – you are receiving a list curated through Twitter’s algorithm.
Within the DNA of a number of Twitter’s patents and algorithm for ranking Tweets is the concept of “interestingness.”
This was quite likely inspired by a patent granted to Yahoo in 2006 called “Interestingness ranking of media objects”, which described the ranking methods used in the algorithm for Flickr (the dominant social media photo-sharing service that has been subsequently eclipsed by Instagram and Pinterest).
That earlier algorithm for Flickr bears a great many similarities to Twitter’s contemporary patents. It used similar and even identical factors for computing interestingness. These included:
- Location info.
- Content metadata.
- User access patterns.
- Signals of interest (such as tagging, commenting, favoriting).
One could easily describe Twitter’s algorithm as taking the Flickr interestingness algorithm, expanding upon some of the factors involved, computing it through a more sophisticated machine learning process, interpreting content based upon natural language processing (NLP), and incorporating a number of additional variations to enable rapidity for presentation in near real-time for a gargantuan number of users simultaneously.
Twitter ranking and spam
It is also worth looking at the methods Twitter uses to detect spam and spam user accounts, and to demote or suppress spam Tweets from view.
The policing for disinformation, other policy-violating content, and harassment is likewise intense, but that does not necessarily converge as much with ranking evaluations.
Some of the spam detection patents are interesting because I frequently see users running afoul of Twitter’s spam suppression processes quite unintentionally, and there are a number of things one may do that end up sandbagging one’s own efforts to promote to and interact with Twitter’s audience. Twitter has had to build aggressive watchdog processes to police and remove spam, and even the most prominent users can trip these processes from time to time.
Thus, an understanding of Twitter’s spam factors can be important as they can cause one’s Tweets to get deductions from interestingness they would otherwise have, and this loss in the relevancy scores can reduce the visibility and distribution power of your Tweets.
Twitter ranking factors
So, what are the factors mentioned in Twitter’s patents for assessing “interest”, and which influence how Twitter scores Tweets for rankings?
Recency of the Tweet posting
More recent Tweets are generally much preferred. Aside from specific keyword and other types of searches, most Tweets would be from the last few hours. Some “in case you missed it” Tweets may also be included, which appear to range primarily over the last day or two.
Images or Video
In general, Google and other platforms have indicated that users tend to prefer images and video media more, so a Tweet containing either might get a higher score.
Twitter specifically cites image and video cards. This refers to websites that have implemented Twitter Cards, which enable Twitter to easily display richer preview snippets when Tweets contain links to webpages with the card markup.
Tweets with links that show images and video are generally more engaging to users, but there may be an additional advantage for Tweets linking to pages that have the card markup for displaying the card content.
Interactions with the Tweet
Twitter cites Likes and Retweets, but additional metrics related to the Tweet would also potentially apply here. Interactions include:
- Clicks to links that may be in the Tweet
- Clicks to hashtags in the Tweet
- Clicks to Twitter accounts mentioned in the Tweet
- Detail Expands – clicks to view details about the Tweet, such as to view who Liked it, or Retweeted it.
- New Follows – how many people hovered over the username and then clicked to follow the account.
- Profile visits – how many people clicked the avatar or username to visit the poster’s profile.
- Shares – how many times the Tweet was shared via the share button.
- Replies to the Tweet
While most impressions come from the display of the Tweet in timelines, some impressions are derived when Tweets are shared through embedding in webpages. It is possible that those impressions numbers might also affect the interestingness score for the Tweet.
Likelihood of Interactions
One Twitter patent describes computing a score for a Tweet that represents how likely it is that followers of the Tweet’s Author will interact with the message. The score is based on the deviation between the observed interaction level of the Author’s Followers and their expected interaction level.
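The patent language does not spell out a formula, so the following is only one plausible reading of an “interaction level deviation”. It assumes the expected level is the average interaction count on the Author’s past Tweets and expresses the deviation relative to that baseline; both function names are hypothetical.

```python
from statistics import mean


def expected_level(past_interaction_counts):
    """Assumed baseline: the mean interaction count on the Author's past Tweets."""
    return mean(past_interaction_counts)


def interaction_deviation(observed, expected):
    """Relative deviation of observed interactions from the expected level.

    Positive values mean Followers engaged more than usual with this Tweet.
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    return (observed - expected) / expected
```

Under this reading, a Tweet that draws 30 interactions from an Author who usually gets 20 would score 0.5, while one that draws only 10 would score -0.5.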
Length of Tweet
One type of classification is the length of the text contained in the Tweet, which could be classified as a numerical value (e.g. 103 characters), or it could be designated as one of a few categories (e.g., short, medium, or long).
Depending on the topics a Tweet involves, its length might make it more or less interesting: for some topics, short might be more beneficial, and for other topics, medium or long length might make the Tweet more interesting.
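A sketch of length classification combined with topic-dependent weighting might look like the following. The character thresholds, topic names, and weight values are invented for illustration and do not come from any Twitter source.

```python
def length_bucket(text, short_max=70, medium_max=180):
    """Classify Tweet text as short, medium, or long (thresholds are assumed)."""
    n = len(text)
    if n <= short_max:
        return "short"
    if n <= medium_max:
        return "medium"
    return "long"


# Hypothetical per-topic multipliers for each length bucket.
TOPIC_LENGTH_WEIGHTS = {
    "breaking_news": {"short": 1.2, "medium": 1.0, "long": 0.8},
    "tutorials": {"short": 0.8, "medium": 1.0, "long": 1.2},
}


def length_weight(topic, text):
    """Return the multiplier for this topic and length; 1.0 when no rule applies."""
    return TOPIC_LENGTH_WEIGHTS.get(topic, {}).get(length_bucket(text), 1.0)
```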
Previous Author Interactions
Past interactions with the author of a Tweet will increase the likelihood (and ranking score in one’s timeline) that one will see other Tweets by that same author.
These social graph interaction metrics can include scoring by the origin of the relationship.
So, a past history of replying-to, liking, or Retweeting an author’s Tweets, even if one does not follow that account, can increase the likelihood one will see their latest Tweets.
There is a likelihood that the recency of one’s interactions with a Tweet author may also factor into this, so if you have not interacted with one of their Tweets for a long time, potential visibility of their newer Tweets may decrease for you.
In the context of the algorithm, “author” and “account” are essentially used to mean the same thing, so Tweets from a corporate account are treated the same as Tweets from an individual.
Author Credibility Rating
This score can be calculated by an author’s relationships and interactions with other users.
The example given in the patent is that an author followed by multiple high profile or prolific accounts would have a high credibility score.
While one set of rating values cited is “low”, “medium”, and “high”, the patent also suggests a scale of rating values from 1 to 10, and it can include a qualitative and/or quantitative factor.
I would guess that a range like 1 to 10 is much more likely. It seems likely that some of the spam assessment values could be used to subtract from an Author Credibility Rating. More on potential spam assessment factors in the latter portion of this article.
It is possible that authors that are assessed to be more relevant for a particular topic may have a higher Author Relevancy value. Also, mentions of an Author may make them more relevant in the context of the Tweets mentioning them.
The patents also speak about associating Authors with topics, so it is possible that Authors that Tweet involving specific topics on a frequent basis, along with good engagement rates, may be deemed to have higher relevancy when their Tweets involve that topic.
Tweets may be classified based on properties of the Author. These metrics may influence the relative interestingness of the Author’s messages. Such Author Metrics include:
- Location of the Author (such as City or Country)
- Age (based upon the birthdate that can be given in account details)
- Number of Followers
- Number of Accounts the Author Follows
- Ratio of Number of Followers to Accounts Followed, as a larger number of Followers compared to Followed conveys greater popularity along with the raw Followers number. A ratio closer to 1 would indicate a quid pro quo following philosophy on the part of the Author, making it less possible to infer popularity and lending an appearance of artificial popularity.
- Number of Tweets Posted by the Author per Time Period (for example: per-day, or per-week).
- Age of the Account (months since account opened, for instance) – with accounts that have been set up very recently given much lower weight.
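Several of the Author Metrics listed above are simple derived values. The sketch below shows how the follower ratio and posting rate could be computed; the `AuthorMetrics` class and its field names are hypothetical, and how Twitter actually weights these values is not known.

```python
from dataclasses import dataclass


@dataclass
class AuthorMetrics:
    followers: int
    following: int
    tweets_posted: int
    account_age_days: int

    @property
    def follower_ratio(self):
        """Followers per followed account; values near 1 hint at follow-for-follow."""
        return self.followers / max(self.following, 1)

    @property
    def tweets_per_day(self):
        """Average posting rate over the life of the account."""
        return self.tweets_posted / max(self.account_age_days, 1)
```

The `max(..., 1)` guards only avoid division by zero for accounts that follow nobody or were created today.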
Tweets get classified according to the topics they involve. There are some very sophisticated algorithms involved in classifying the Tweets.
Twitter users often have selected topics to be associated with their accounts, and you will obviously be shown popular Tweets from the topics you have selected. But Twitter also automatically creates topics based on keywords found in Tweets.
Based on your interactions with Tweets and the accounts you follow, Twitter is also predicting topics that you would likely be interested in, and showing you some Tweets from those topics despite you not formally subscribing to the topics.
Twitter’s system is highly complex, and allows custom ranking models to potentially be applied to Tweets for particular topics and when particular phrases are present.
Twitter has a large staff that works to develop models for particular “customer journeys”, and this would appear to coincide with patent descriptions of how editors could set rules on topic-oriented posts and keywords or phrases in posts.
For instance, posts containing text about “hiring now” or “will be on TV” might be considered boring for a topic, while phrases like “fresh”, “on sale”, or “today only” might be given greater weight as they could be predicted to be more interesting.
This could be quite difficult to cater to, as there is a huge field of potential topics and custom weightings that could be applied.
One recent job posting at Twitter for a Staff Product Designer, Customer Journey described how the position would help:
“Whether you’re looking for Ariana Grande fanart, #herpetology, or extreme unicycling, it’s all happening on Twitter. Our team is responsible for helping new members navigate the diverse array of public conversations happening on Twitter and quickly find a sense of belonging…”
“Gather insights from data and qualitative research, develop hypotheses, sketch solutions with prototypes, and test ideas with our research team and in experiments.”
“Document detailed interaction models and UI specifications.”
“Experience designing for machine-learning, rich taxonomies, and / or interest graphs.”
This description sounds very similar to what’s described in Twitter’s patent for “System and method for determining relevance of social content” where:
“Editors might set rules on classifying certain phrases as more or less interesting…”
“…an editor may decide that some phrases and attributes are interesting in all content, regardless of the category of place that authors the content. For instance, the phrase ‘on sale’ or ‘event’ may be interesting in all cases and a positive weight may be applied.”
One patent describes how Tweets detected to have commercial language could be assigned a lower score than Tweets that did not have commercial language. (Contrarily, such weights could be flipped if the user was conducting searches indicating an interest in purchasing something, so that Tweets containing commercial language could be given a higher weight.)
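The editor-rule and commercial-language ideas above could be approximated as follows. This is a naive sketch: the phrase list echoes the examples in this article, but the weight values, the substring matching, and both function names are assumptions made for illustration.

```python
# Hypothetical editor-assigned weights; phrases are matched case-insensitively.
PHRASE_WEIGHTS = {
    "on sale": 0.3,
    "today only": 0.3,
    "fresh": 0.2,
    "hiring now": -0.3,
    "will be on tv": -0.3,
}


def phrase_adjustment(text, weights=PHRASE_WEIGHTS):
    """Sum the weights of every listed phrase found in the Tweet text."""
    lowered = text.lower()
    return sum(w for phrase, w in weights.items() if phrase in lowered)


def commercial_adjustment(is_commercial, shopping_intent, magnitude=0.25):
    """Penalize commercial language unless the viewer shows purchase intent."""
    if not is_commercial:
        return 0.0
    return magnitude if shopping_intent else -magnitude
```

Plain substring matching would also fire on words like “refresh”, so a real system would presumably tokenize text before applying rules like these.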
Time of Day
Time of day can be used to impact relevancy. For instance, a rule could be implemented to lend more weight to Tweets mentioning “Coffee” between 8:00am and 10:00am, and/or to Tweets posted by coffee shops.
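The coffee example can be written as a small rule. The sketch assumes the rule is evaluated against the viewer’s local time and uses an arbitrary boost value; the function name and parameters are hypothetical.

```python
from datetime import time


def coffee_boost(text, local_time, start=time(8, 0), end=time(10, 0), boost=0.2):
    """Return a boost for Tweets mentioning coffee inside the morning window."""
    in_window = start <= local_time < end
    mentions_coffee = "coffee" in text.lower()
    return boost if in_window and mentions_coffee else 0.0
```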
Patents describe how “place references” in Tweets could invoke greater weight for Tweets about a place, and/or for accounts associated with the place reference versus other accounts that merely mention the place. Also, geographic proximity between the location of a user’s device and the location associated with content items (the Tweet text, image, video, and/or Author) can increase or decrease potential relevancy.
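Geographic proximity is commonly measured with the haversine great-circle distance, which is used here as a stand-in since the patents do not specify a method. The decay function and its 50 km scale are assumptions chosen only to show how distance could translate into a weight.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def proximity_weight(distance_km, scale_km=50.0):
    """Assumed decay: 1.0 at zero distance, 0.5 at scale_km, approaching 0 far away."""
    return 1.0 / (1.0 + distance_km / scale_km)
```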
Language of the Tweet can be classified (e.g., English, French, etc.).
The language may be determined automatically using various automated language assessment tools.
A Tweet in a particular language would be of more interest to speakers of the language and of less interest to others.
Tweets can be classified based on whether they are replies to previous Tweets. A Tweet that is a reply to a previous Tweet may be deemed less interesting than a Tweet concerning a new topic.
In one patent description, the topic of a Tweet could determine whether the Tweet will be designated to be displayed to another account or included in other accounts’ message streams.
When you are viewing your timeline, there are instances where some of a Tweet’s replies are also displayed with the main Tweet, such as when the Reply Tweets are posted by accounts you follow. In most cases, the Reply Tweets will only be viewable when one clicks to view the thread, or clicks the Tweet to view all the Replies.
Blessed Accounts are an odd concept that I believe might not be in production.
Twitter describes Blessed Accounts as being identified within a particular conversation’s graph, where the original Author in a conversation would be deemed “blessed”, and out of the subsequent replies to the original post, any of the Replies that is subsequently replied-to by the blessed account becomes “blessed” as well.
Those Tweets posted by Blessed Accounts in the conversation would be given increased relevance scores.
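The propagation rule described above can be sketched as a fixed-point loop over the conversation. This is my interpretation of the description rather than the patent’s implementation: it tracks blessed accounts (not individual Tweets), and the input format and function name are hypothetical.

```python
def blessed_accounts(tweets, root_id):
    """Return the set of accounts considered blessed in one conversation.

    tweets maps tweet_id -> (author, parent_id), with parent_id None for the root.
    """
    blessed = {tweets[root_id][0]}  # the original Author starts out blessed
    changed = True
    while changed:
        changed = False
        for author, parent_id in tweets.values():
            if parent_id is None or author not in blessed:
                continue
            # A blessed account replied to this parent, so its author is blessed too.
            parent_author = tweets[parent_id][0]
            if parent_author not in blessed:
                blessed.add(parent_author)
                changed = True
    return blessed
```

In a thread where Alice posts, Bob replies, and Alice answers Bob, both Alice and Bob end up blessed, while a replier Alice never answers does not.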
A reputation score for linked websites is not mentioned in Twitter patents, but it makes too much sense in context of all the other factors they have mentioned to pass up.
A lot of major content websites frequently have their links shared on Twitter, and Twitter could easily create a website profile reputation/popularity score that could also factor into the rankings of Tweets when links to content on those websites are posted.
News sites, information resources, entertainment sites – all of these could have scores developed from the same factors used to assess Twitter accounts. Tweets from better-liked and better-engaged-with websites could be given greater weight than relatively unknown and less-interacted-with websites.
Yes, if you suspected the blue badge next to usernames conveys preferential treatment, there is specific verbiage in one of Twitter’s patents that confirms they have at least considered this.
Since Verified accounts often already have various other popularity indicators associated with them, it is not readily apparent if this factor is in-use or not. Tweets posted by an account that is Verified may be given a higher relevance score, enabling them to appear more than unverified accounts’ Tweets.
Here is the patent description:
“In one or more embodiments of the invention, the conversation module (120) includes functionality to apply a relevance filter to increase the relevance scores of one or more authoring accounts of the conversation graph which are identified in a whitelist of verified accounts. For example, the whitelist of verified accounts can be a list of accounts which are high-profile accounts which are susceptible to impersonation. In this example, celebrity and business accounts would be verified by the messaging platform (100) in order to notify users of the messaging platform (100) that the accounts are authentic. In one or more embodiments of the invention, the conversation module (120) is configured to increase the relevance scores of verified authoring accounts by a predefined amount/percentage.”
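The quoted embodiment amounts to multiplying the relevance score of whitelisted accounts by a fixed factor. A minimal sketch, with a hypothetical function name and an arbitrary default percentage:

```python
def apply_verified_boost(scores, verified_whitelist, boost_pct=10.0):
    """Raise relevance scores of whitelisted accounts by a predefined percentage.

    scores maps account -> relevance score; other accounts are left unchanged.
    """
    factor = 1.0 + boost_pct / 100.0
    return {
        account: score * factor if account in verified_whitelist else score
        for account, score in scores.items()
    }
```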
A Tweet can also carry a binary trending flag indicating whether it has been identified as containing a topic that was trending at the time the message was broadcast.
App Detected Gender, Sexual Orientation & Interests
Twitter may be able to use an account holder’s mobile device information to infer Gender of the account holder, or infer interests in topics such as News, Sports, Weight Training, and other topics.
Some mobile devices provide information about other apps loaded on the phone for purposes of diagnosing potential application programming conflicts. Thus, some Tweets matching your Gender, Sexual Orientation, and Topical Interests could be given more interestingness points simply based upon inferences made from your phone’s apps. (See: https://screenrant.com/android-apps-collecting-app-data/ )
And more ranking factors
Twitter states that:
“Our list of considered features and their varied interactions keeps growing, informing our models of ever more nuanced behavior patterns.”
So this list of factors is likely something of an underrepresentation of the factors they may be using, and their list may be expanding.
Also imagine that a custom combination of some of the above factors may be applied as models for Tweets associated with particular topics, lending a large potential complexity to rankings through machine learning methods. (Again, the machine learning applied to create rank weighting models custom to particular queries or topics is very similar to methods that are likely in use with Google.)
Twitter has stated that the scoring of Tweets happens each time one visits Twitter, and each time one refreshes their timeline. Considering some of the complex factors involved, that is very fast!
Twitter uses A/B testing of weightings of ranking factors, and other algorithm alterations, and determines whether a proposed change is an improvement based on engagement and time viewing/interacting with a Tweet. This is used to train ranking models.
The involvement of machine learning in this process suggests that ranking models could be produced for many specific scenarios, and potentially specific to particular topics and types of users. Once developed, the model can get tested, and if it improves engagement, it can get rapidly rolled-out to all users.
How marketers can use this information
There are a lot of inferences that can be drawn from the list of potential ranking factors, and which can be used by marketers in order to improve their Tweeting tactics.
A Twitter account that only posts announcements about its products and promotional information about its company will likely not have as much visibility as accounts that are more interactive with their community, because interactions produce more ranking signals and potential benefits.
Social media experts have long recommended an approach of blending types of posts rather than merely publishing self-referential promotion – these strategies include “The Rule of Thirds”, “The 80/20 Rule”, and others.
The Twitter ranking factors likely support these theories, as eliciting more interactions from a larger number of Twitter users is likelier to increase an account’s visibility.
For instance, a large company account with many followers could post an interesting poll to get advice on what features to add to its product. The votes and comments posted by users will make it such that the respondents will be much more likely to see the company’s next posting due to the recent interactions, and that next posting could be promoting or announcing something new. And, the respondents’ followers might also be more likely to see the company’s next posting, since Twitter appears to factor-in that users with similar interests may be more open to seeing content matching their interests.
Also, the factors suggest a number of potentially beneficial approaches.
When posting a Tweet promoting a product or making an announcement, combining something to elicit a response from one’s followers could easily expand exposure on the platform as each respondent’s replies to your Tweet may increase the odds that their direct followers may see the original Tweet and their connection’s reply Tweet.
Leveraging the social graph aspect of Twitter’s algorithm can help to increase the interestingness of your Tweets, and can increase exposure of your Tweets for other users.
Spam factors can negatively impact tweet rankings
Spam detection algorithms can negatively impact Tweet ranking ability.
For one thing, Twitter is very fast to suspend accounts that are blatantly spamming, and in cases where it is obvious and unequivocal, one can expect the account to get terminated abruptly, causing all of its Tweets to disappear from conversation graphs and timelines, and causing the account profile to be no longer available to view.
In other instances where it is not as clear whether an account is spamming, the account’s Tweets could simply be demoted by application of negative rank weight scores, or the account could get locked or suspended until the account holder takes a corrective action or verifies their identity.
For example, a Twitter account with a long history of good Tweets might abruptly begin posting Viagra ads or links to malware, such as if an established account became hacked. Twitter might temporarily suspend the account until corrective actions were taken, such as passing a CAPTCHA verification, or receiving a verification code via cellphone and changing passwords. Another example could be a new user that accidentally passes over some threshold of following too many accounts within a short timeframe, or posting a little too frequently.
Twitter employs a number of methods for detecting spam and sidelining it so users see it less.
Much of the automated detecting relies upon detecting a combination of account profile characteristics, account Tweeting behaviors, and content found in the account’s Tweets.
Twitter has developed a number of characteristic spam “fingerprints” in order to perform rapid pattern detection. One Twitter patent describes how:
“Spam is determined by comparing characteristics of identified spam accounts, and building a ‘similarity graph’ that can be compared with other accounts suspected of spam.”
Tweets identified as potentially containing spam could be flagged with a binary value like “yes” or “no”, and then Tweets that are flagged can get filtered out of timelines.
It is equally possible for there to be a scale of spamminess, computed from multiple factors, and once a Tweet or account surpasses a threshold, it then suffers demotion. I think it is worthwhile to include mention of these as Twitter users may not understand the implications of how they use the platform. For example, posting one overly-aggressive Tweet might negatively impact an account’s subsequent Tweets for some period of time. Repeated edgy behavior could result in worse, such as complete account deletion, with no opportunity to recover.
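A scale of spamminess with a demotion threshold could be sketched as a weighted sum of flags. The flag names, weights, and threshold below are all invented for illustration; Twitter does not publish its actual factors or values.

```python
# Hypothetical spam signals and weights.
SPAM_WEIGHTS = {
    "new_account": 0.3,
    "bot_flag": 0.3,
    "offensive_flag": 0.2,
    "nsfw_flag": 0.2,
}


def spam_score(flags, weights=SPAM_WEIGHTS):
    """Sum the weights of every signal that is set for this account or Tweet."""
    return sum(w for name, w in weights.items() if flags.get(name))


def is_demoted(flags, threshold=0.5):
    """Demote only once the combined score reaches the threshold."""
    return spam_score(flags) >= threshold
```

Under these assumed values, a new account alone stays below the threshold, but a new account that also carries a bot flag crosses it.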
I will add a few factors here that are not specifically mentioned in Twitter patents or blog posts because Twitter does not reveal all spam identification factors for obvious reasons. But, some spam and spam account characteristics seem so obvious that I am adding a few from personal observations or from well-regarded research sources to provide a wider understanding of what can incur spam demotions.
Spam factors & other negative ranking factors
- Tweets containing a commercial message that are posted without a follower/followee relationship, or within a unidirectional relationship (the Tweet’s Author is following the account it is mentioning but the receiving account does not follow the Author), where the accounts have not had previous interactions, begin to seem suspicious. If this is done many times with similar or identical text, it will not take long for this to be deemed spam activity, especially for newer accounts.
- Account Age – where the age shows the account has been set up very recently. (SparkToro’s recent research on Twitter spam suggests account age of 90 days or less.)
- Account NSFW Flag – the account has a flag indicating it has been identified for linking to websites documented in a blacklist of potentially offensive sites (such as sites having porn, explicit materials, gore, etc).
- Offensive Flag – the Tweet has been identified as containing one or more terms from a blacklist of offensive terms.
- Potentially Fake Account – the account is suspected of impersonating a real person or organization, and has not been verified.
- Account Posting Frequent Copyright Infringement
- Blacklisting – One patent suggests use of a blacklist that will apply a relevance filter to decrease the relevance scores of accounts that can include but are not limited to: spammers, potentially fake accounts, accounts with a potential or history of posting adult content, accounts with a potential or history of posting illegal content, accounts flagged by other users, and/or meeting any other criteria for flagging accounts.
- Account Bot Flag – identifying that the account broadcasting the Tweet has been IDed as potentially being operated by a software application instead of by a human. This particular criteria has a number of implications involved, particularly for those accounts that have used types of scheduling applications for posting Tweets, or other software that generates automated Tweets. For instance, scheduling too many Tweets to be posted per time period through an app like Hootsuite or Sprout Social can result in the user account getting suspended, or its app access via the Twitter API to get suspended. This can be particularly galling, as if the same number of Tweets per time period were posted manually, the account would not run into issues. There has long been a believe among marketers on Facebook as well as Twitter that the respective algorithms might dumb-down visibility for posts published through software versus via manually, and this component suggests that that very well could be the case with Twitter.
- Tweets containing offensive language may see their interestingness score eroded.
- Tweets posted via Twitter’s APIs, such as through social media management tools that rely upon Twitter’s API, are generally subject to greater scrutiny. As Twitter has described it: “The problem may be exacerbated when a content sharing service opens its application programming interface (API) to developers.” My observation is that accounts that rely solely upon third-party posting applications and APIs – particularly newer accounts – may see their distribution ability somewhat held back. Newer accounts should work to become established through human usage for an initial period before relying more upon scheduling and posting applications, and even established accounts may see greater distribution potential if they mix some manual posting with their scheduled, automated, or third-party-application posts.
- Accounts Dormant for a Long Period – Accounts that have not posted for a long time, and then suddenly spring to life do not immediately have the ranking ability they otherwise might. The reason for this is that spammers sometimes may successfully hijack inactive accounts in order to subvert a previously bona fide account into posting spam.
- Device Profile Associated With Spammer or Other Policy Violator – Essentially, patents suggest that Twitter is using browser fingerprinting and device fingerprinting to detect spammers and other bad actors. Fingerprinting enables tech services to combine data such as IP address, device ID, user agent, browser plugins, device platform model and version, and app downloads into unique “fingerprints” that identify specific devices. A major takeaway from this is that if you use two or more Twitter accounts on the same phone or browser and you perform abusive Tweeting through one of them, there is a very real possibility that it could impair rankings for a more “professional” account you operate on the same device. In a worst-case scenario, what you do on one account could even get you locked out of both. This has pretty serious implications for companies and agencies whose employees post professional Tweets and may also switch to posting personal Tweets on the same device. Some types of Tweets that could cause issues include: spam, harassment, false or misleading info, threats, repeated copyright infringement, malware links, and likely more. While I theorize that a personal account could also get a professional account suspended on the same device, I would hazard a guess that the suspension might apply only to that particular device holder, and the professional account could subsequently be accessed through a different device.
- Lack of other app usage data – It is very possible that Twitter may be able to receive data from mobile devices that indicates if the device operator has downloaded or recently used other apps on the device beyond just the Twitter app. (See: https://screenrant.com/android-apps-collecting-app-data/ ) A common spam account characteristic is that they do not reflect other app usage because the device is primarily dedicated to spamming Twitter and is not showing human usage characteristics. Or, the account is hosted on a webserver instead of a mobile device, and is attempting to imitate the usage profile of a human user.
- Blocks – accounts that other users have blocked numerous times, or accounts that have been blocked over a particular time frame can be indicative of a spam account.
- Frequency of Tweets – if the number of Tweets sent from the same account in a given time frame exceeds a threshold amount, then that account may be flagged as spam and blocked from sending subsequent Tweets. This is not a hard-and-fast rule, or at least it is variable in application, because there are larger, corporate accounts with many staff members handling posting of Tweets to a large customer base, such as in the case of American Airlines. Accounts such as this are added to whitelists to avoid automatic suspension due to the large volumes of Tweets they may post within short time frames.
- High Volume of Tweets with the Same Hashtag or Mentions of the Same @Username – Obviously, high-volume Tweets are risky, and increasing your volume within short timeframes will inch your account closer and closer to being deemed that of a spammer. Thus, attempting to overwhelm the timeline of a particular Hashtag will be deemed annoying and potentially spammy. Likewise, insisting upon gaining the attention of a particular account by mentioning them repeatedly will begin to appear annoying, unnecessary, harassing, and/or spammy.
- CAPTCHA – If suspected of spam, the service may prevent a Tweet from being written-to or published, requiring the user account to first pass a CAPTCHA challenge to establish that the account is operated by a human. (My agency has encountered this as we have set up new accounts on behalf of clients. This is more likely to happen when the computer that is used to set up the account has been used recently to set up other accounts, and the account is set up using free email service accounts instead of through mobile phones. Twitter also often requires sending a mobile text message to confirm a phone number before unblocking the account.)
- Bulk-Follow of Verified Accounts – Spam accounts will often bulk-follow prominent and/or Verified accounts in order to establish a foothold in the social graph. When setting up Twitter accounts for real, human users in the past, we would follow a handful of the Verified accounts suggested by Twitter during the signup process. Oddly enough, this behavior alone can cause an account to get suspended until a CAPTCHA or other verification is passed. So, the takeaway here is to avoid following many of the accounts suggested to you in the signup process if you are setting up a new account. Definitely do not use one of the automated follow services that were popular years ago, or your account could get downgraded in relevancy or suspended.
- Few Followers – Spam accounts are often newer, and because they often do not promote themselves in ways beneficial to the community, they attract very few followers. So, a low follower count can be one factor, in combination with others, used to identify a potentially spammy user.
- Irrelevant Hashtags in Reply Tweets – Hashtags in Tweets that do not involve the original Tweet’s topic.
- Tweets Containing Affiliate Links – self explanatory.
- Frequent Requests to Befriend Users in a Short Time Frame
- Reposting Duplicate Content Across Multiple Accounts – Especially duplicate content posted close in time.
- Accounts that Tweet Only URLs
- Posting Irrelevant or Misleading Content to Trending Topics/Hashtags
- Erroneous or Fictitious Profile Location – For example, a profile location showing “Poughkeepsie, NY”, but the user’s IP is China, would produce an apparent mismatch indicating a potential scammer or spammer account.
- Account IP Address Matching Abuser Account Ranges, or Country Locations that Originate Greater Amounts of Abuse – For example, Russia. Likewise, commonly known proxied IP addresses are easily detectable by Twitter, and are flagged as suspect.
- Default Profile Image – Human users are more likely to set up customized account images (“avatars”), so not setting one up and continued use of Twitter’s default profile image is a red flag.
- Duplicated Profile Image – A profile image duplicated across many accounts is a red flag.
- Default Cover Image – Failure to set up a custom cover image in the profile’s masthead is not as suspicious as continued use of a default profile image, but use of a custom masthead image is more representative of a real account.
- Nonresolving URL in Profile – SparkToro suggests this, and it does align with many spam accounts. Sometimes this is because spammers may be more likely to set up websites that are likely to be suspended, or typosquatting domains intended to create Trojan horse websites which can also get suspended.
- Profile Descriptions Matching Spammer Keywords/Patterns
- Display Usernames Conform To Spam Patterns – Usernames that are meaningless alphanumeric sequences, or proper names followed by multiple numeric digits reflect a lack of imagination upon the part of spammers who may be attempting to register hundreds of accounts in bulk, with each name generated randomly, or each username generated by adding the next number in a sequence. Example: John32168762 is the sort of username that most humans find undesirable.
- Patterns – Profile and Tweet patterns used by spammers often reveal spammer accounts. For instance, if numbers of accounts with default Twitter profile pics and similar patterned display usernames all Tweet out links to a particular page or domain, those accounts all become extremely easy to identify and sideline.
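To make a few of the rule-based signals above more concrete, here is a minimal Python sketch of three of them: a spam-pattern username check, a Tweet-frequency threshold with a whitelist, and a blacklist relevance filter. This is purely illustrative and is not Twitter’s implementation. The regular expression, the threshold values, the penalty multiplier, and every function and class name are hypothetical assumptions, chosen only to show the general shape such checks could take.

```python
import re
from collections import deque

# Assumption: a "generated" username is letters followed by 5+ digits,
# e.g. "John32168762". The real patterns Twitter uses are not public.
SPAMMY_USERNAME = re.compile(r"^[A-Za-z]+\d{5,}$")


def username_looks_generated(username: str) -> bool:
    """Return True if the username matches the hypothetical spam pattern."""
    return bool(SPAMMY_USERNAME.match(username))


class TweetRateMonitor:
    """Sliding-window counter that flags accounts exceeding a Tweet threshold.

    Whitelisted accounts (hypothetical stand-ins for large corporate
    accounts) are never flagged, no matter how often they post.
    """

    def __init__(self, max_tweets, window_seconds, whitelist=frozenset()):
        self.max_tweets = max_tweets
        self.window_seconds = window_seconds
        self.whitelist = whitelist
        self._times = {}  # account name -> deque of recent Tweet timestamps

    def record(self, account: str, timestamp: float) -> bool:
        """Record one Tweet; return True if the account is over the limit."""
        times = self._times.setdefault(account, deque())
        times.append(timestamp)
        # Drop timestamps that have fallen out of the time window.
        while timestamp - times[0] > self.window_seconds:
            times.popleft()
        if account in self.whitelist:
            return False
        return len(times) > self.max_tweets


def apply_blacklist_filter(relevance, account, blacklist, penalty=0.5):
    """Scale down the relevance score of a blacklisted account.

    The 0.5 multiplier is an arbitrary illustrative value.
    """
    return relevance * penalty if account in blacklist else relevance
```

For example, `TweetRateMonitor(max_tweets=3, window_seconds=60)` would flag an account on its fourth Tweet within a minute, while the same account posting again several minutes later would not be flagged. In practice, signals like these would more likely feed into a combined score than act as standalone rules.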
Simply listing out spam identification factors sharply understates Twitter’s sophisticated systems used for spam identification and spam management.
Major Silicon Valley tech companies have fought spam for years now, and the struggle has been described as a sort of arms race.
The tech company will create a method to detect the spam, and the spammers then evolve their processes to elude detection, and then the cycle repeats again, and again.
Twitter’s patents illustrate considerable sophistication in employing components of Artificial Intelligence, social graph analysis, and methods that combine synchronous and asynchronous processing in order to deliver content extremely rapidly.
The AI components include:
- Neural networks.
- Natural language processing.
- Circumflex calculation.
- Markov modeling.
- Logistic regression.
- Decision tree analysis.
- Random forest analysis.
- Supervised and unsupervised machine learning.
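As a rough illustration of how one of these techniques, logistic regression, can combine several spam signals into a single probability, consider the small Python sketch below. It is not drawn from any Twitter patent: the signal names, weights, and bias are hypothetical values invented for the example, and a real system would learn its weights from labeled data rather than hard-code them.

```python
import math

# Hypothetical weights for a handful of binary spam signals.
# A real model would learn these from labeled training data.
WEIGHTS = {
    "new_account": 1.2,
    "default_profile_image": 0.9,
    "few_followers": 0.6,
    "high_tweet_frequency": 1.5,
}
BIAS = -3.0  # Assumed baseline: most accounts are presumed not to be spam.


def spam_probability(signals: dict) -> float:
    """Combine binary signals into a probability using the logistic function."""
    z = BIAS + sum(w for name, w in WEIGHTS.items() if signals.get(name))
    return 1.0 / (1.0 + math.exp(-z))
```

With these made-up weights, an account showing none of the signals scores well below 0.5, while an account showing all four scores above it. No single factor is decisive here; each one only nudges the combined score.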
As the ranking determinations can be based upon unique, abstracted, machine learning models according to specific phrases, topics, and interest profiling, what works for one area of interest may work a little differently for other areas of interest.
Even so, I think that looking at these many potential ranking factors that have been described in Twitter patents can be useful for marketers who want to attain greater exposure on Twitter’s platform.
I served this year as an expert witness in an arbitration between Twitter and a company that sued it for unfair trade practices, and the case was amicably settled recently.
As an expert witness, I am often privy to secret information, including private communications such as employee emails within major corporations, as well as other key documents that can include data, reports, presentations, employee depositions and other information.
In such cases, I am bound by legal protective orders and agreements not to disclose information that was revealed to me in order to be sufficiently informed on the matters I am asked to opine upon, and this was no exception.
I have not disclosed any information covered by the protective order in this article from my recently-resolved case.
I have gained a greater understanding and insights into some aspects of how Twitter functions from context, observations of Twitter in public use, logical projections based on their various algorithm descriptions and from reading Twitter’s patents and other public disclosures subsequent to the resolution of the case I served upon, including the following sources:
- Identifying relevant messages in a conversation graph
- Providing content for broadcast by a messaging platform
- Promoting content in a real-time messaging platform
- System and method for determining relevance of social content
- Systems and methods for establishing or maintaining a personalized trusted social network
- Displaying relevant messages of a conversation graph
- Search infrastructure
- Visibility filtering
- Prioritizing Messages Within a Message Network
- Application graph builder
- Using Deep Learning at Scale in Twitter’s Timelines
- Multi-tiered anti-spamming systems and methods
- Detecting scripted or otherwise anomalous interactions with social media platform
- How Twitter is fighting spam and malicious automation
- Suspended Accounts in Retrospect: An Analysis of Twitter Spam
- Twitter Analysis: 19.42% of Active Accounts Are Fake or Spam
Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land.