November 18, 2022
Presenting Narralegal, the largest language model for legal texts in Spanish
By Sofía Sánchez González
Law volumes and tomes are not exactly two-page pamphlets. The number of words and paragraphs we have just to describe a law is staggering. No wonder lawyers can talk and talk for hours with hardly any interruption. But of course, this is a problem if we want our artificial intelligence model to process a legal text. This is why we’re presenting Narralegal, the largest language model for legal texts in Spanish.
Too many tokens for so little model
Most of the artificial intelligence models created to date for legal texts can process at most 512 tokens at a time. What is the problem with this? Legal texts are far longer than that: a thousand tokens is the minimum, and a token is often only a fragment of a word. What could be a solution? Divide the texts into several parts. But this solution is very inefficient, because each part is processed without the context of the others.
At Narrativa, our NLP engineer Manuel Romero set out to solve this problem so that texts would not have to be cut up every time we fed them to the model. This is how Narralegal was born, immediately becoming the most powerful model in Spanish for legal texts.
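To make the cost of splitting concrete, here is a minimal sketch of the divide-into-parts workaround. It is not Narrativa's code: the function name, the 64-token overlap, and the use of whitespace-separated words as a stand-in for real subword tokens are all assumptions made for illustration.

```python
def split_into_windows(tokens, max_len=512, overlap=64):
    """Split a token list into windows of at most max_len tokens.

    Consecutive windows share `overlap` tokens so that a sentence cut at a
    boundary still appears whole in one of them. Both defaults are
    illustrative assumptions, not values taken from any specific model.
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be >= 0 and smaller than max_len")
    step = max_len - overlap
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the document
    return windows


# Whitespace-separated words stand in for real subword tokens here.
document = ("artículo " * 3000).split()

short_windows = split_into_windows(document, max_len=512, overlap=64)
long_windows = split_into_windows(document, max_len=4096, overlap=64)

print(len(short_windows), "windows at 512 tokens")
print(len(long_windows), "window(s) at 4,096 tokens")
```

With these assumed numbers, the 3,000-token stand-in document needs seven separate passes at 512 tokens, each blind to most of the text, while a 4,096-token limit takes it in a single pass.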
Narralegal to the rescue
The Narrativa model adapts to the nature of legal texts. It can process texts of up to 4,096 tokens (yes, you read that correctly), compared to the 512 that other models could handle. As a result, the model can take in more context at once. This is a very important advance, because long-context models make even more sense for legal texts than for other domains.
This model can be used for several types of tasks:
- Summarization
- Question answering
- Translation
- Semantic search
- Legal entity recognition
If you run a law firm and want to streamline your company’s processes, give it a try! And if your company works with legal texts on a day-to-day basis (as is the case with large corporations), this model could also be useful.
Where did we get the dataset to train it?
The dataset comes from the Barcelona Supercomputing Center. This organization holds a corpus collected from various websites related to law and legal articles. In fact, they had already created a model from it, RoBERTalex, but with the inconvenient limitation of 512 tokens.
Building on that, we have created a more powerful model. Narralegal has been retrained on the same corpus, using only its longest documents. That’s why Narralegal is the largest language model for legal texts in Spanish.
If you want to try it, all our open-source models are available on our Hugging Face account. Here is the link, where you will also find some details about Narralegal’s training.
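As a rough illustration of what "using only the longest documents" could look like, here is a small sketch. The post does not say how the cutoff was chosen, so the function name, the 512-token threshold, and the whitespace-based token count are all hypothetical.

```python
def select_long_documents(documents, min_tokens):
    """Keep only documents with at least min_tokens whitespace-separated words.

    Word counts are a stand-in for real tokenizer counts (an assumption).
    """
    return [doc for doc in documents if len(doc.split()) >= min_tokens]


# Toy corpus: one short text and two long ones.
corpus = [
    "Ley breve.",
    "palabra " * 600,
    "palabra " * 2000,
]

# Hypothetical threshold: keep what would not fit a 512-token model.
long_docs = select_long_documents(corpus, min_tokens=512)
print(len(long_docs), "of", len(corpus), "documents kept")
```

The idea is simply that documents short enough for a 512-token model add little when the goal is to teach the model to use a longer context.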
Artificial intelligence in Spanish
At Narrativa we are committed to two main causes:
- Democratization of artificial intelligence. This technology has to be in the hands of the general population, not only those of large corporations. Initiatives like Hugging Face are essential for this.
- Promoting Spanish. At Narrativa we want people from all parts of the world and all cultures to be able to benefit from natural language processing models. But artificial intelligence still has to learn Spanish. More models have to be developed to close the gap between English and Spanish.
About Narrativa
Narrativa is an internationally recognized content services company that uses its proprietary artificial intelligence and machine learning platforms to build and deploy digital content solutions for enterprises. Its technology suite consists of data extraction, data analysis, natural language processing (NLP) and natural language generation (NLG) tools that work together seamlessly to power a lineup of smart content creation, automated business intelligence reporting and process optimization products for a variety of industries.
Contact us to learn more about our solutions!