Parasites of the Blogosphere: an AdDense problem

April 10, 2006

This morning I was reading the Veoh Vs Video Bloggers post on Om Malik’s blog; it is about the unauthorized acquisition and republishing of videoblog content. I followed the link to the We The Media website, where the story is told in more detail, and I recommend reading it: it gives you enough information to start making up your mind about this rising phenomenon.
What phenomenon? Om Malik called it credit-less remixing or Wholesale Blog Plagiarism or, in other terms, the activity of the parasites of the blogosphere. A parasite is defined as

  1. Biology. An organism that grows, feeds, and is sheltered on or in a different organism while contributing nothing to the survival of its host.
  2. a. One who habitually takes advantage of the generosity of others without making any useful return.
     b. One who lives off and flatters the rich; a sycophant.

which fits the observed behavior very nicely.

Again, reading the posts on GigaOM and following the links therein will give you a better picture.

Am I not a parasite myself? I try not to be. I do feed on a number of blogs, but I try to contribute something and to give credit. It may be that someone lands on my blog (maybe using the Next Blog » thing of WordPress) and gets to know about this story from here, but then she is likely to go to the sources and proceed from there. Next time she will look first at the sources, at the guy at the bottom.

It is not just a matter of rightness and intellectual property; it is also a matter of money. Om Malik writes:

Clearly, these sites ONLY exist because they can make money from Google AdSense.

How all this embroils Google is further discussed here and here and somewhere else.

How to spot para-sites?
If someone is stealing content in the hope of making money with Google AdSense, she may rearrange the text to make it look different, and that can make it hard for a computer program to detect a counterfeit that would be obvious to a human reader (at least to one who knows the original). But synthesis ought to be as complex as analysis; that is to say, changes in the text (content) will cause changes both in Google’s search results and in what AdSense understands about the meaning of the content. As Google puts it:

Google’s complex, automated methods make human tampering with our results extremely difficult.

So it is likely that the keywords that determine how Google grasps the meaning of the content will still be there (in the counterfeit).
Let us call O the original content and C its counterfeit, and say they are AdSense Equivalent; that is, O and C generate the same set of eligible advertisers, A, on their respective ad spaces. In other terms, AdSense(O) = AdSense(C) = A, and the set of all contents is partitioned into AdSense equivalence classes.
Now imagine that, given a set of eligible advertisers B, it were possible to determine the set of contents C_B that generate B, that is, AdSense(c) = B for each c in C_B. We may call it the reverse AdSense, or AdDense: AdDense(B) = AdSense⁻¹(B) = C_B.
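To make the idea of AdSense equivalence and its inverse a bit more concrete, here is a toy sketch. Everything in it is an assumption made for illustration: the keyword table, the advertiser names and the word-matching rule are invented, and nothing here reflects how the real AdSense selects advertisers.

```python
# Hypothetical keyword -> advertiser table (invented for this example).
KEYWORD_ADVERTISERS = {
    "video": "VideoHostCo",
    "camera": "CameraShop",
    "blog": "BlogToolsInc",
}

def adsense(content: str) -> frozenset:
    """Toy stand-in for AdSense: advertisers whose keyword occurs in the content."""
    words = set(content.lower().split())
    return frozenset(adv for kw, adv in KEYWORD_ADVERTISERS.items() if kw in words)

def addense(contents: list, advertisers: frozenset) -> list:
    """Toy AdDense: every known content that generates exactly `advertisers`."""
    return [c for c in contents if adsense(c) == advertisers]

original = "My video blog about a camera"
counterfeit = "A camera review on this video blog"  # same keywords, rearranged
unrelated = "Cooking recipes for the weekend"
corpus = [original, counterfeit, unrelated]

# The original and its rearranged copy fall into the same equivalence class.
same_class = addense(corpus, adsense(original))
```

In this toy model the inverse is computed by brute force over a known corpus; doing the same over the whole web is exactly the part that remains a dream.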
Finally, consider a (new type of) PageRank™ that ranks identities or consumer-generated URLs (and not just the hosting site). Let’s call it a URLRank.
Now we have all we need (so to say) to determine and rank the guys at the bottom of an ad.
The process would flow as follows (user U drops by URL U1, where content C is hosted):

  1. AdSense generates a list of eligible advertisers, A, for the ad space on (U1,C).
  2. AdDense generates the equivalence class for A: ECA = AdDense(A).
  3. URLRank ranks all contents in ECA.
  4. The set of URLRank-ranked contents {C, ECA} compete for authority on A.
  5. Monetization of ads by Google on U1 is computed in real time based on the ranking calculated in step 4: the less authority (U1,C) has on A, the lower its monetization is set.
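The five steps above can be sketched in a few lines. This is only an illustration under loud assumptions: the advertiser sets per URL and the URLRank scores are taken as given inputs, and "authority" is arbitrarily defined here as a page's URLRank relative to the best-ranked page in its equivalence class. The post does not say how authority should be measured, so that formula is mine.

```python
def monetization(visited_url: str, ads_by_url: dict, url_rank: dict,
                 base_rate: float = 1.0) -> float:
    """Scale the payout of `visited_url` by its authority on its advertiser set."""
    # Step 1: the eligible advertisers A for the visited page (assumed precomputed).
    a = ads_by_url[visited_url]
    # Step 2: the equivalence class ECA, i.e. all URLs whose content generates A.
    eca = [url for url, ads in ads_by_url.items() if ads == a]
    # Step 3: rank the class by the hypothetical URLRank score.
    ranked = sorted(eca, key=lambda url: url_rank[url], reverse=True)
    # Step 4: authority relative to the top-ranked page (an assumed definition).
    top = url_rank[ranked[0]]
    authority = url_rank[visited_url] / top if top > 0 else 0.0
    # Step 5: the less authority, the lower the monetization.
    return base_rate * authority

A = frozenset({"VideoHostCo", "CameraShop"})
ads_by_url = {
    "orig.example/post": A,
    "copy.example/post": A,                     # counterfeit, same advertisers
    "other.example/post": frozenset({"BookShop"}),
}
url_rank = {"orig.example/post": 8.0, "copy.example/post": 2.0,
            "other.example/post": 5.0}

original_rate = monetization("orig.example/post", ads_by_url, url_rank)
copy_rate = monetization("copy.example/post", ads_by_url, url_rank)
```

With these made-up scores the original keeps the full rate while the copy earns a quarter of it.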

The naive idea is that AdSense should generate lower earnings on counterfeit content by introducing (fair) competition among the potential owners of an ad space.
A zero-sum game variation could also be considered, where the earnings of the current AdSense program would be distributed across the set {C, ECA} (see above) according to authority. If things work well, para-sites may start contributing something to the survival of their host.
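A minimal sketch of the zero-sum variation, assuming the split is simply proportional to each page's authority score. The proportional rule, the function name and the numbers are my own assumptions for illustration.

```python
def split_earnings(earnings: float, authority: dict) -> dict:
    """Distribute `earnings` across pages in proportion to their authority."""
    total = sum(authority.values())
    if total == 0:
        # Nobody has any authority: nothing to distribute.
        return {url: 0.0 for url in authority}
    return {url: earnings * score / total for url, score in authority.items()}

# An ad shown on the counterfeit page earns 10.0; most of it flows back to the host.
shares = split_earnings(10.0, {"orig.example/post": 8.0, "copy.example/post": 2.0})
```

Whatever the ad earns on the copy, the total paid out stays the same; only who receives it changes.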

Ok, stop dreaming and back to work.
