November 18, 2022

Presenting Narralegal, the largest language model for legal texts in Spanish

By Sofía Sánchez González

Law volumes and tomes are not exactly two-page pamphlets. The number of words and paragraphs needed just to describe a single law is staggering. No wonder lawyers can talk for hours with hardly any interruption. But of course, this is a problem if we want an artificial intelligence model to process a legal text. This is why we’re presenting Narralegal, the largest language model for legal texts in Spanish.

Too many tokens for so little model

Most of the artificial intelligence models created to date for legal tasks can process at most 512 tokens at a time (a token is roughly a word or a piece of a word). What is the problem with this? Legal texts are far longer than that: a thousand tokens at the very least, and often many more. What could be a solution? Divide the texts into several parts. But this is very inefficient, because each part is processed on its own and the model loses the context that connects them.
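To make the cost of splitting concrete, here is a minimal Python sketch. It assumes whitespace-separated words as a rough stand-in for model tokens (real tokenizers usually produce more tokens than words), and the function name split_into_chunks is hypothetical, not part of any Narrativa tool.

```python
def split_into_chunks(tokens, max_len=512):
    """Split a token list into consecutive windows of at most max_len tokens."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]


# Assumption: whitespace-separated words stand in for model tokens.
document = " ".join(f"word{i}" for i in range(1200))
tokens = document.split()

chunks = split_into_chunks(tokens)
print(len(chunks))               # 3
print([len(c) for c in chunks])  # [512, 512, 176]
```

Each chunk has to be sent to the model separately, so a clause that lands in the third chunk cannot see a definition given in the first.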

At Narrativa, our NLP engineer Manuel Romero set out to solve this problem so that content would not have to be cut up every time we fed text to the model. This is how Narralegal was born, immediately becoming the most powerful Spanish-language model for legal texts.

Narralegal to the rescue

The Narrativa model adapts to the nature of legal texts. It can process texts of up to 4,096 tokens (yes, you read that correctly), compared with the 512 that other models could handle, so it can take in much more context at once. This is a very important advance, because long-context models make even more sense in the legal domain than in most others.
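As a rough illustration of the difference, the Python sketch below counts how many separate windows a document needs under each limit. The document lengths are made-up examples, and windows_needed is a hypothetical helper, not something from Narralegal itself.

```python
import math


def windows_needed(num_tokens, context_size):
    """Return how many non-overlapping windows are needed to cover num_tokens."""
    return math.ceil(num_tokens / context_size)


for num_tokens in (1_000, 4_096, 10_000):  # illustrative lengths only
    small_ctx = windows_needed(num_tokens, 512)
    large_ctx = windows_needed(num_tokens, 4_096)
    print(f"{num_tokens} tokens: {small_ctx} windows at 512, {large_ctx} at 4096")
```

A document that exactly fills the 4,096-token window would need eight separate passes under the 512-token limit.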

This model can be used for several types of tasks:

Summarization
Question answering
Translation
Semantic search
Legal entity recognition

If you run a law firm and want to streamline your firm’s processes, give it a try! And if your company works with legal texts on a day-to-day basis, as large corporations do, this model could be useful to you as well.

Where did we get the dataset to train it?

The dataset comes from the Barcelona Supercomputing Center. This organization holds a corpus gathered from various websites related to law and legal articles. In fact, they had already created a model with it, RoBERTalex, but with the inconvenient limitation of 512 tokens.

On that basis, we have created a more powerful model. Narralegal was retrained on that corpus using only the longest documents. That’s why Narralegal is the largest language model for legal texts in Spanish.
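The post does not say how "longest" was defined, so the following Python sketch shows only one plausible reading: keep the documents whose length exceeds a threshold. The threshold value, the whitespace word count, and the function name keep_longest_documents are all assumptions made for illustration, not details of Narralegal’s training.

```python
def keep_longest_documents(corpus, min_tokens=512):
    """Keep only documents with more than min_tokens whitespace-separated words."""
    return [doc for doc in corpus if len(doc.split()) > min_tokens]


# Toy corpus: three synthetic "documents" of 100, 600 and 3,000 words.
corpus = [
    "artículo " * 100,
    "sentencia " * 600,
    "ley " * 3000,
]

long_docs = keep_longest_documents(corpus)
print([len(doc.split()) for doc in long_docs])  # [600, 3000]
```

Filtering this way means the model mostly sees examples that actually exercise its longer context window.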

If you want to try it, you can find all our open-source models in our Hugging Face account. Here is the link, where you will also find some details about Narralegal’s training.

Artificial intelligence in Spanish

At Narrativa we are committed to two main causes:

Democratization of artificial intelligence. This technology has to be in the hands not only of large corporations but also of the general population. Initiatives like Hugging Face are essential for this.
Promoting Spanish. At Narrativa we want people from all parts of the world and all cultures to be able to benefit from natural language processing models. But artificial intelligence still has to learn Spanish. More models have to be developed to close the gap between English and Spanish.

About Narrativa

Narrativa is an internationally recognized content services company that uses its proprietary artificial intelligence and machine learning platforms to build and deploy digital content solutions for enterprises. Its technology suite consists of data extraction, data analysis, natural language processing (NLP) and natural language generation (NLG) tools, which work together seamlessly to power a lineup of smart content creation, automated business intelligence reporting and process optimization products for a variety of industries.

Contact us to learn more about our solutions!
