August 4, 2020

Artificial Intelligence learns Spanish

By Narrativa Staff

Spanish is still considered a foreign language when it comes to artificial intelligence. Although the most widespread virtual assistants (Siri, Cortana, Alexa, etc.) do speak the language of Cervantes, it is not their mother tongue. English dominates in the world of tech, and artificial intelligence is no exception.

The Barcelona National Supercomputing Center (BSC) is putting its MareNostrum supercomputer to work to address this. Over the last few months, the center has been collecting a huge amount of data from the Spanish National Library’s web archive, including .es websites, blogs, images, videos, documents and forums. The archive itself aims to preserve and provide access to Spanish documentary heritage on the Internet.

Currently, the database holds more than 45 terabytes of files (and the transfer is still not complete).

What is the objective?

The main aim of all this data collection is to generate a language model for Spanish, a fundamental building block for artificial intelligence. A language model reproduces the use of language and, in the words of Quim Moré, researcher from the CASE department of the BSC, “allows us to know the real meaning of words, even entire sentences, because the data is in context and has more information, so it makes more sense. In other words, the objective is not only for AI to understand what we are saying but how we are saying it, deciphering the different linguistic twists and turns in the way we express ourselves”.

The project was commissioned by the Secretary of State for Digital Advancement, within the framework of the Plan for the Promotion of Language Technologies, in order to generate a new Spanish language model using natural language processing technologies.

A comparable tool already exists for English in the form of Google’s BERT. Based on artificial intelligence, it allows Google’s algorithms to better understand the language that people use when searching for something. It looks not only at the key search terms but also at their context. For example, when entering the word ‘book’, the user could be looking for something to read, or they could be looking to make a hotel reservation.
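To make the ‘book’ example concrete, here is a minimal sketch in Python of what using context to pick a meaning can look like. It is a toy, not how BERT works: BERT learns context from large amounts of text with a neural network, whereas this sketch simply counts overlaps with hand-written cue words. The `SENSE_CUES` table and the `guess_sense` function are hypothetical names invented for this illustration.

```python
# Toy illustration only. BERT learns context from data; here the "context"
# is a hypothetical, hand-written set of cue words for each sense of 'book'.
SENSE_CUES = {
    "something to read": {"read", "novel", "author", "library", "chapter"},
    "make a reservation": {"hotel", "room", "flight", "table", "night"},
}


def guess_sense(query: str) -> str:
    """Pick the sense of 'book' whose cue words overlap most with the query."""
    words = set(query.lower().split())
    # Score each sense by how many of its cue words appear in the query.
    scores = {sense: len(words & cues) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    # With no cue words at all, the query is ambiguous.
    return best if scores[best] > 0 else "unknown"


print(guess_sense("book a hotel room for one night"))
print(guess_sense("best book by a Spanish author"))
print(guess_sense("book"))
```

The first query is matched to the reservation sense and the second to the reading sense, while the bare word ‘book’ stays ambiguous because nothing around it helps decide. A real language model replaces the hand-written cue words with patterns learned from very large text collections, which is why a large body of Spanish text matters.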

Narrativa – A pioneer in the use of Spanish

According to the newspaper El País, Spanish still represents less than 30% of the world market for natural language processing technologies. Against this backdrop, Narrativa stands out as a pioneering startup that has been generating news in Spanish since it began. While Gabriele’s automatic content can be written in any language, our AI has a special predilection for Spanish.

But it’s not only the written word that stands to change under this new model. Virtual assistants will soon be speaking almost like humans. Today they can play music or set alarms, but in the future they will be taking notes from your History class or even buying you a shirt.

The new model is also expected to bring benefits in a number of areas, including machine translation, content description and cybersecurity.

Why does artificial intelligence need to learn Spanish?

Spanish is a fast-growing language. While it is a relative newcomer to the world of science and technology when compared to English, the data seems to indicate that things are changing. To illustrate this, we will leave you with some interesting figures about the Spanish language compiled by the Cervantes Institute:

Spanish in the world:

  • Almost 483 million people speak Spanish as a first language
  • Currently, 7.6% of the world’s population is Spanish-speaking
  • Spanish as a first language is second only to Mandarin Chinese in terms of number of speakers

Spanish on the Internet:

  • Spanish is the third most used language on the Internet, after English and Chinese
  • 8.1% of internet communication occurs in Spanish
  • It is the second most used language on Wikipedia, Facebook, Twitter and LinkedIn. Of the 580 million users of LinkedIn, 55 million use Spanish to some extent

With these figures in mind, it is impossible to discount Spanish as a key player in the world of science and technology. In a few years, this will become clearer.


Book a demo to learn more about how our Generative AI content automation platform can transform your business.
