Has Facebook really done something deviant with their emotional feed experiment? It was despicable and unethical, certainly, but it wasn’t truly different from what social sites do every day. The truth is that all social media, and even search engines, filter what we see. From thousands, even millions, of possible posts or articles, only a tiny few are selected for us to view. As long as you don’t understand the filtering, you are being manipulated.
I’d like to look at why this problem exists from a technical view: why subjective filtering is unavoidable. I’ll also look at how the business model welcomes trolls and encourages the production of a poor-quality product.
The technical problem
Volume is the key problem. So many accounts. So many posts. There is simply no way to store, let alone present, a full list to users. It’s hard to even know what users actually want to see.
Grouping and filtering of data has to start right at the collection phase. Collection includes direct user posts, posts from apps, scanning of other services, and any other source of data. To deal with the volume, the data must be segmented by group, topic, user, or other means. Some of the data will be discarded immediately, either because it is of low interest or because it is junk.
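To make this concrete, here is a minimal sketch of collection-phase segmentation: incoming posts are bucketed by topic and low-interest or junk items are dropped immediately. The topic keywords, the junk heuristic, and the post structure are all invented for illustration; real services use far more elaborate (and secret) pipelines.

```python
# Toy collection pipeline: segment incoming posts by topic, discard junk.
# Topic keywords and the junk test are illustrative assumptions only.
from collections import defaultdict

TOPIC_KEYWORDS = {
    "sports": {"game", "score", "team"},
    "tech": {"phone", "app", "software"},
}

def is_junk(post):
    # Toy heuristic: discard empty or link-only posts at collection time.
    text = post["text"].strip()
    return not text or text.startswith("http")

def collect(posts):
    buckets = defaultdict(list)
    for post in posts:
        if is_junk(post):
            continue  # discarded immediately, never stored
        words = set(post["text"].lower().split())
        for topic, keywords in TOPIC_KEYWORDS.items():
            if words & keywords:
                buckets[topic].append(post)
    return buckets

feed = collect([
    {"text": "Great game by the home team"},
    {"text": "http://spam.example"},
    {"text": "New phone app released"},
])
```

Note that even this trivial version makes subjective calls: which keywords define a topic, and what counts as junk. Those choices are exactly the ones the article argues are opaque.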
All of this is highly subjective. There is no correct way to store and collate such data. The actual methods are also treated as trade secrets. Partly this is a mechanism to prevent spamming: the mechanisms simply aren’t strong enough to be fully transparent and still prevent spam. Secrecy also allows service changes to optimize hardware, deal with load, and provide new features. There is a claim of business value as well: the algorithms used are not necessarily trivial and could be of significant interest to competitors.
I think most users can appreciate these technical problems. The user may reasonably expect the service is working on a best-effort basis to cope with the limitations and provide relevant, timely, and accurate results. Unfortunately, there is no incentive for the service to actually do that.
You are the product
Social providers, and search engines, have one product: you. They earn money off you, but you aren’t paying them. This is what creates a divide between expectations and results.
Advertising is the most obvious way to monetize users. Ads and promoted links are presented alongside, or mixed directly into, the results. The more you click, the more you see, the more money the service earns. The incentive for the business is to maximize this revenue.
The posts, or results, we see are one continuous A/B test. The service continually tries different sourcing techniques, filters, and collation strategies. The key performance indicator is revenue.
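A revenue-driven A/B test of this kind can be sketched in a few lines. Everything here is hypothetical: the strategy names, click-through rates, and per-click values are invented, and users are assigned to buckets by a toy hash rather than a real experiment framework.

```python
# Sketch of a revenue-as-KPI A/B test over feed-ranking strategies.
# Strategy names, CTRs, and click values are illustrative assumptions.
import random

random.seed(0)  # deterministic simulation

def assign_bucket(user_id, strategies):
    # Stable assignment of each user to one ranking strategy.
    return strategies[user_id % len(strategies)]

def run_experiment(users, strategies, arm_params):
    revenue = {s: 0.0 for s in strategies}
    for user in users:
        arm = assign_bucket(user, strategies)
        # Simulate whether this user clicks an ad under this strategy.
        if random.random() < arm_params[arm]["ctr"]:
            revenue[arm] += arm_params[arm]["value"]
    return revenue

strategies = ["chronological", "engagement"]
arm_params = {
    "chronological": {"ctr": 0.02, "value": 0.10},
    "engagement": {"ctr": 0.05, "value": 0.10},
}
revenue = run_experiment(range(1000), strategies, arm_params)
winner = max(revenue, key=revenue.get)
```

Whichever arm earns more is kept, and the cycle repeats. Nothing in the loop measures whether users were informed, only whether they clicked.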
The metric products
This leads to the second, less obvious, revenue source: user metrics. Facebook and Twitter are essentially large databases of user activity. By combing through this data one can assemble a portrait of community opinion and gauge the response to major, or minor, events.
It’s not even very hard to get useful data. Simple keyword searches are enough to reveal trending topics and basic sentiment.
For example, consider a fast-food chain, say Burger King, since they appear to have a strong social media presence. They host some event, make a video of it, and then use viral marketing to spread it. They set up a “Burger King” keyword filter at DataSift and start grabbing all of the posts about them, everywhere. These results can be sorted into simple good and bad categories using keywords such as “hate”, “stupid”, “like”, “awesome”, etc.
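The keyword sort described above really is this simple. The following is a toy version, not a real DataSift query: the cue-word lists and sample posts are invented for illustration.

```python
# Toy keyword-sentiment bucketing of brand mentions.
# Cue lists and sample posts are illustrative assumptions.
GOOD = {"like", "love", "awesome", "great"}
BAD = {"hate", "stupid", "awful", "gross"}

def classify(post):
    words = set(post.lower().replace(",", " ").split())
    if words & BAD:
        return "bad"
    if words & GOOD:
        return "good"
    return "neutral"

posts = [
    "I love the new Burger King ad",
    "That promo was stupid",
    "Burger King opened a new store",
]
buckets = {"good": [], "bad": [], "neutral": []}
for p in posts:
    buckets[classify(p)].append(p)
```

Crude as it is, run over millions of posts this already yields trend and sentiment curves that a marketing team can act on.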
Two key observations follow from this. One, your posts matter. All that noise people complain about on Twitter and Facebook has value. You may never read it. It may not even appear in your feed. But rest assured somebody is deriving value from it.
The second observation is that viral marketing doesn’t fail. Even a terrible campaign yields all sorts of interesting data. The company simply takes this into account, modifies its campaign approach, and tries again. It’s also not hard to do limited releases and gauge the response in smaller communities, which lets a company avoid a big viral push with a truly bad promotion.
This product is of huge value to the social firms. Motivating you to post thus becomes a key goal. As much as you may dislike reading them, all those asinine posts improve the product. It’s the reason why no effort is made to stop trolling. Trolls drive commentary, and thus they are research gold.
We come back now to the technical limitations. The claims of trade secrets and spam fighting give these companies a shield to hide behind. Lacking a gold standard for what should be done, it’s easy for them to pretend the results are good. So long as services don’t reveal what they are doing, the results are completely open to manipulation.
The only way to have an accountable service is to have the collection, ranking, and collation algorithms explained openly. Without knowing what filters are in place, you have no idea if results are being censored. Without seeing placement details, you have no idea if the results are legitimate or paid promotion. Without being able to review spam, you have no assurance that it really was spam.
Might this cause a problem with spam? Maybe. Maybe not. There’s already so much spam that it’s hard to say whether spam filters are doing a good job now. At some point we have to admit the basic solutions are no longer working and look for something new anyway.
We can probably also ignore the trade-secret claims. Most of the core algorithms are widely researched. And while competing services may not be as good, it’s not as if they’re terrible either. Often it’s a matter of resources and user base, not clever algorithms.
For many people social services are a key part of their lives. And we must include search engines here as well, as Google is probably worse for transparency than Facebook is. If we don’t demand that these services give details about how they work, we’ll be forever subject to their manipulation.