What I learned building a fact-checking startup – TechCrunch


After the 2016 US election, I set out to build a product that could tackle the scourge of fake news online. My initial hypothesis was simple: build a semi-automatic fact-checking algorithm that could highlight any false or dubious claim and suggest the best-quality contextual facts for it. Our thesis was clear, though perhaps utopian: if technology could drive people to seek truth, facts, statistics, and data to make their decisions, we could build an online discourse of reason and rationality instead of hyperbole.

After five years of hard work, Factmata has had some successes. But for this space to truly thrive, there are a host of barriers, from economic to technological, that still need to be overcome.

Key challenges

We quickly realized that automated fact verification is an extremely difficult research problem. The first challenge was defining what facts we were checking. Next, we had to think about how we could build and maintain up-to-date databases of facts that would allow us to assess the accuracy of a given claim. For example, the commonly used Wikidata knowledge base was an obvious choice, but it updates too slowly to check claims about rapidly changing events.
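The staleness problem can be illustrated with a minimal sketch. It assumes a tiny in-memory fact store standing in for a real knowledge base; the `Fact` record, the `check_claim` function, the example entry, and the one-year freshness cutoff are all hypothetical and are not Factmata's or Wikidata's actual design.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Fact:
    value: str
    last_updated: date


# Hypothetical stand-in for a knowledge base, keyed by (subject, property).
KNOWLEDGE_BASE = {
    ("ExampleCorp", "chief executive"): Fact("Alice Example", date(2019, 3, 1)),
}


def check_claim(subject: str, prop: str, claimed_value: str,
                claim_date: date, max_age_days: int = 365) -> str:
    """Compare a claim with the stored fact, refusing to judge on stale data."""
    fact = KNOWLEDGE_BASE.get((subject, prop))
    if fact is None:
        return "unknown"  # nothing stored to compare against
    if (claim_date - fact.last_updated).days > max_age_days:
        return "stale"  # stored fact is too old for a claim made at claim_date
    return "supported" if fact.value == claimed_value else "contradicted"
```

The point of the sketch is the "stale" branch: a claim about a fast-moving event can be neither confirmed nor refuted when the stored fact predates it by too long, which is the limitation a slowly updated knowledge base runs into.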

We also found that being a for-profit fact-checking company was a hurdle. Most fact-checking and journalism networks are non-profit, and social media platforms prefer to work with non-profit organizations to avoid accusations of bias.

Beyond these factors, building a business that can rate what is “good” is inherently complex and nuanced. The definitions are infinitely debatable. For example, what people called “fake news” often turned out to be extreme hyperpartisanship, and what people proclaimed as “misinformation” was often just contrary opinion.

Therefore, we concluded that detecting what was “bad” (toxic, obscene, threatening, or hateful) was a much easier route from a business point of view. Specifically, we decided to detect harmful “gray area” text: content that a platform is not sure should be removed but that needs additional context. To achieve this, we created an API that rates the harmfulness of comments, posts, and news articles by their level of hyperpartisanship, controversy, objectivity, hatred, and 15 other signals.
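As a rough illustration of how per-signal scores might be turned into a “gray area” decision, here is a minimal sketch. The field names mirror four of the signals named above, but the score ranges, the thresholds, the use of a simple maximum, and the `triage` function are assumptions for illustration only and do not describe Factmata's actual API.

```python
from dataclasses import dataclass

# Hypothetical cutoffs; a real system would tune these per platform.
REMOVE_THRESHOLD = 0.9
REVIEW_THRESHOLD = 0.5


@dataclass
class SignalScores:
    """Per-signal scores, each assumed to lie in [0, 1]."""
    hyperpartisanship: float
    controversy: float
    objectivity: float  # higher means more objective, so it is inverted below
    hate: float


def harm_score(s: SignalScores) -> float:
    """Return the strongest harm signal, treating low objectivity as harm."""
    return max(s.hyperpartisanship, s.controversy, 1.0 - s.objectivity, s.hate)


def triage(s: SignalScores) -> str:
    """Map scores to 'remove', 'needs_context' (the gray area), or 'allow'."""
    score = harm_score(s)
    if score >= REMOVE_THRESHOLD:
        return "remove"
    if score >= REVIEW_THRESHOLD:
        return "needs_context"
    return "allow"
```

Under these assumptions, a post scoring 0.7 on hyperpartisanship and low on everything else would land in the middle band: not clearly removable, but flagged as needing additional context.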

We found it valuable to track all the claims evolving online about issues relevant to companies. So, beyond our API, we created a SaaS platform that tracks rumors and evolving “narratives” on any topic, whether it be about a brand’s products, government policy, or COVID-19 vaccines.

If this sounds complicated, that is because it is. One of the most important lessons we learned was how little $1 million in seed funding buys in this space. Building training data around validated hate speech and false claims is no ordinary tagging task; it requires subject matter expertise and careful deliberation, neither of which is cheap.

In fact, creating the tools we needed, including various browser extensions, website demos, a data tagging platform, a social news comment platform, and real-time live dashboards of our AI output, was similar to creating several new companies at the same time.

To further complicate matters, finding product-market fit was a very difficult journey. After many years of building, Factmata has pivoted toward brand safety and brand reputation. We sell our technology to online advertising platforms looking to clean up their ad inventory, brands looking for reputation management and optimization, and smaller-scale platforms looking for content moderation. It took us a long time to reach this business model, but in the last year we have finally seen several clients sign up for trials and contracts each month, and we are on track to reach $1 million in recurring revenue by mid-2022.

What to do

Our journey demonstrates the myriad barriers to building a social impact business in the media space. As long as virality and attention-grabbing are the metrics of online advertising, search engines, and news sources, change will be difficult. And small businesses can’t do it by themselves; they will need both regulatory and financial support.

Regulators must step up and start enacting strong laws. Facebook and Twitter have made great strides, but online advertising systems are far behind and emerging platforms have no incentive to evolve differently. At this time, there is no incentive for companies to moderate any non-illegal speech on their platforms: damage to reputation or fear of losing users is not enough. Even the most ardent advocates of free speech, like myself, recognize the need to create financial incentives and bans so that platforms really take action and start spending money to reduce harmful content and promote ecosystem health.

What would an alternative look like? Bad content will always exist, but we can create a system that promotes better content.

No matter how flawed, algorithms have an important role to play; they have the potential to automatically evaluate online content for its “goodness” or quality. These “quality scores” could be the basis for creating new social media platforms that are not based on ads, but that instead promote (and pay for) content that is beneficial to society.

Given the scope of the problem, immense resources will be needed to build these new scoring algorithms; even the most innovative startups will struggle without tens, if not hundreds, of millions of dollars in funding. It will require several companies and nonprofits, providing different versions that can be integrated into people’s news sources.

The government can help in a number of ways. First, it should define the rules around “quality”; companies trying to solve this problem should not be expected to develop their own policies.

The government should also provide funding. Government funding would allow these companies to avoid diluting their objectives. It would also encourage companies to open their technologies to public scrutiny and create transparency around flaws and biases. Companies could even be encouraged to release their technologies for free public use, ultimately for public benefit.

Lastly, we must embrace emerging technologies. Platforms have made positive strides in seriously investing in the deep technology required to perform content moderation effectively and sustainably. The advertising industry, four years on, has also advanced in adopting new brand safety algorithms such as those from Factmata, the Global Disinformation Index, and NewsGuard.

Although initially skeptical, I am also optimistic about the potential for cryptocurrency and the token economy to introduce a new form of funding and to encourage good-quality, verified media to prevail and be distributed at scale. For example, in tokenized systems, “experts” can be incentivized to verify claims and to efficiently scale data tagging for AI content moderation systems, without companies needing large up-front investments to pay for tagging.

I don’t know if the original vision I proposed for Factmata, as the technological component of a fact-based world, will ever come true. But I am proud to have tried and am hopeful that our experiences can help others chart a healthier direction in the ongoing battle against misinformation and disinformation.

Read more from TechCrunch's Global Affairs Project

