The Meta project wants to upgrade Wikipedia with artificial intelligence

A mind-bogglingly large, multi-language encyclopedia with millions of articles, Wikipedia is one of the largest collaborative projects in human history, involving more than 100,000 volunteer human editors. Each month, Wikipedia adds upwards of 17,000 new articles while tweaking and modifying its existing corpus. A Wiki article can be edited thousands of times, reflecting the very latest research, insights, and up-to-date information.

The challenge, of course, lies in accuracy. Wikipedia is proof positive that large numbers of people can come together to create something positive. But to be genuinely useful, and not a sprawling graffiti wall of unsubstantiated claims, Wikipedia articles must be backed up by facts. Citations play a crucial role here.

Wikipedia users and editors can confirm facts by adding or clicking hyperlinks that trace statements back to their sources - and for the most part, this works very well. As an example, let's say I want to confirm the statement on President Barack Obama's Wikipedia page that he visited Europe and Kenya in 1988, where he met many of his paternal relatives.

When I look at the citations for the sentence, I see that three separate book references seem to confirm the fact. The hyperlinks do not support my alternative facts at all, but rather link to unrelated pages on Digital Trends. Yet the 99.9 percent of readers who have never met me may leave this article with a variety of false impressions, not the least of which is the surprisingly low barrier to entry into modeling.

Citations themselves appear to be factual endorsements in a world of hyperlinked information overload, in which we increasingly swim in what Nicholas Carr calls "The Shallows." But what if Wikipedia editors add citations even when the linked pages don't support the claims?

Take Joe Hipp as an example. A recent Wikipedia article described Hipp as the first Native American boxer to challenge for the WBA World Heavyweight title and provided a link as a citation. Despite this, neither Joe Hipp nor boxing is mentioned on the page in question.

Meta believes it has a solution to this problem. Working in partnership with the Wikimedia Foundation, Meta AI, the social media giant's AI research and development lab, has developed what it claims is the first machine learning model that can scan hundreds of thousands of citations at once to check whether they support the corresponding claims.

This is not the first bot Wikipedia has used, but it may be the most impressive. Trained on 4 million Wikipedia citations, Meta's new tool can analyze the information linked to a citation and cross-reference it with the supporting evidence. And this isn't just a simple text string comparison.

"The lexical similarity between the claim and the source is important, but that's an easy case," Petroni said. "This model builds an index of all these web pages by chunking them into passages and providing accurate representations for each passage. This is not representing the passage word-by-word, but its meaning.

"That means two chunks of text with similar meanings will sit in very close proximity in the resulting n-dimensional space where all these passages are stored."

At least in theory, this could extend to multimedia as well as text-based content. The system may be able to direct users to a great documentary on YouTube. It is possible that an image somewhere online contains the answer to a particular claim.
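The indexing approach Petroni describes (chunk pages into passages, represent each passage as a vector, then look for passages close to the claim) can be sketched in a few lines of Python. This is a minimal illustration, not Meta's implementation: the function names, the passage size, and the sample pages are all hypothetical, and the `embed` function is a toy word-count vector standing in for a learned neural encoder. As a result, this sketch only captures the "easy case" of lexical overlap; swapping in a real sentence encoder is what would place passages with similar meaning close together.

```python
import math
from collections import Counter

def chunk(text, size=8):
    """Split a page's text into passages of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(passage):
    """Toy stand-in for a learned encoder: a sparse word-count vector.
    ASSUMPTION: a real system would return a dense vector from a neural model."""
    return Counter(w.strip(".,").lower() for w in passage.split())

def cosine(a, b):
    """Cosine similarity between two sparse vectors (1.0 = same direction)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def best_passage(claim, pages):
    """Index every passage of every page, then return the (url, passage)
    whose vector lies closest to the claim's vector."""
    index = [(url, p, embed(p)) for url, text in pages.items() for p in chunk(text)]
    query = embed(claim)
    url, passage, _ = max(index, key=lambda item: cosine(query, item[2]))
    return url, passage

# Hypothetical pages keyed by made-up identifiers.
pages = {
    "page-a": "Joe Hipp was a heavyweight boxer who fought for the WBA title",
    "page-b": "The recipe calls for flour sugar and two eggs",
}
url, passage = best_passage("Joe Hipp challenged for the WBA heavyweight title", pages)
print(url, "|", passage)
```

A citation checker built along these lines would flag a citation when even the best-matching passage on the linked page scores poorly against the claim.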

That is not the only challenge. At present, no attempt has been made to independently grade the quality of the sources cited, which is a thorny issue in its own right. For example, would a brief, throwaway mention of a subject in the New York Times be a more appropriate, higher-quality reference than a more comprehensive treatment in a less reputable source?

Should a mainstream publication rank higher than a non-mainstream one? Google's trillion-dollar PageRank algorithm, the most famous algorithm ever based on citations, built this into its model by equating high-quality sources with those that have a large number of links pointing to them. Meta's AI does not have anything like this at the moment.
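To make the PageRank idea concrete, here is a minimal sketch of the textbook power-iteration version of the algorithm, in which a page's score grows with the number and score of the pages linking to it. It is an illustration only, not Google's production system and not part of Meta's tool; the three-page link graph is made up, and the damping factor of 0.85 is simply the commonly cited default.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.
    `links` maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with a uniform score
    for _ in range(iterations):
        # Every page keeps a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[p] / n
        rank = new_rank
    return rank

# Hypothetical graph: pages "a" and "b" both link to "c"; "c" links back to "a".
scores = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

In this toy graph, "c" ends up with the highest score because two pages point to it, while "b", which nothing links to, ends up with the lowest.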

A tool like that would be necessary for this AI to function effectively. Imagine attempting to "prove" the most egregious, reprehensible opinion for inclusion on a Wikipedia page. If all that needs to be shown is that similar sentiments have been published elsewhere online, any claim could technically be "proven" correct, no matter how wrong it may be.

"[...] modeling explicitly the trustworthiness of a source, the trustworthiness of a domain," Petroni said. "Wikipedia already has a list of domains that are considered trustworthy and domains that are not. We should find a way to promote these algorithmically rather than having a fixed list."
