I’ve been building out a bit of a news aggregator, basically just trying to develop for myself an ongoing collection of articles that give me hope in the future of humanity. As I’ve been indexing articles, I’ve noticed a peculiar thing that I had hoped was already resolved: duplicate content.
As my aggregator churned through results, I noticed that the same article title kept coming through from different sites. I investigated further and found that the articles themselves were exactly the same as well, and these were pretty well-known websites, not just some dude in his basement stealing other people’s content. The interesting thing is that when I did a Google search, I found not only that additional sites were running the same article, but that all of these copies were ranked far above the original source, which several of the sites referenced.
Honestly, my little endeavor to build something that would make me feel better about things did the opposite. For years I had heard that duplicate content (taking content from another site and posting it on your own, or even having two pieces of your own content at different URLs) would negatively impact your site, but here Google is ranking multiple sites that have taken content from another site above the person who actually created it.
Yeah, yeah, perhaps the author gave the OK for these other sites to use it, and most of the other sites included a link to the source at the bottom of the article, but this is still wrong. On the first point, Google should still rank the original creator above the duplicators. On the second, just because you’re citing the original creator doesn’t mean you can just steal their work.
With Google’s power, they can likely tell who first authored a piece of content based on their indexing, but even if they can’t, several of the websites listed the source, so Google should still give the higher ranking to that source. It would also probably be pretty easy to tell which sites are guilty of taking others’ content: track how much of a site’s content is posted on multiple other sites, watch content creation dates, and see what this or other sites are listing as the original source. With those signals I think you could pretty easily figure out whether a site is pulling content from elsewhere, and then penalize the site across the board in the ranking algorithm.
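To make that a bit more concrete, here is a rough Python sketch of those three signals working together. This is only my own illustration, not how Google or any real search engine works. The `Article` record, the `cited_source` field, and the function names are all made up for the example, and it only catches copies whose text matches exactly after stripping case and punctuation.

```python
import hashlib
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Article:
    # Hypothetical record for one indexed article.
    site: str
    body: str
    published: date
    cited_source: Optional[str] = None  # site credited as the original, if any


def fingerprint(body: str) -> str:
    """Hash the body after lowercasing and dropping punctuation and extra spaces."""
    words = re.findall(r"[a-z0-9]+", body.lower())
    return hashlib.sha256(" ".join(words).encode("utf-8")).hexdigest()


def likely_original(copies: list[Article]) -> str:
    """Prefer the site most often credited as the source; else the earliest date."""
    votes: dict[str, int] = defaultdict(int)
    for article in copies:
        if article.cited_source:
            votes[article.cited_source] += 1
    if votes:
        return max(votes, key=votes.get)
    return min(copies, key=lambda a: a.published).site


def copied_ratio(articles: list[Article]) -> dict[str, float]:
    """Fraction of each site's articles that duplicate someone else's original."""
    groups: dict[str, list[Article]] = defaultdict(list)
    for article in articles:
        groups[fingerprint(article.body)].append(article)

    total: dict[str, int] = defaultdict(int)
    copied: dict[str, int] = defaultdict(int)
    for copies in groups.values():
        # Only text that appears on more than one site counts as duplicated.
        shared = len({a.site for a in copies}) > 1
        original = likely_original(copies) if shared else None
        for article in copies:
            total[article.site] += 1
            if original is not None and article.site != original:
                copied[article.site] += 1
    return {site: copied[site] / total[site] for site in total}
```

A ranking algorithm could then push down any site whose ratio is high. A real system would also need near-duplicate matching (for example, comparing overlapping word sequences instead of an exact hash), since copiers often tweak a sentence or two.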
I’m saddened to see that others are making money off of other people’s hard work, and that the largest search engine in the world (by far) is completely fine with it.
I often dream of building out my own search engine. I honestly think about it far more than I should, since there is no way that it could really compete with the other offerings out there, but someday it may be a fun thing to work on. I’ll throw these thoughts into the mix when building it, to do my best to ensure that those who create good quality content on the web are rewarded for it.