Putting Llama3 to the Test: A Real-World Evaluation of the new LLM

Fabio Oppini
14 min readApr 20, 2024

--

LLaMA (Large Language Model Meta AI) is a family of autoregressive Large Language Models developed by Meta. On April 18th, 2024 Meta released Llama3, presented as “the most capable openly available LLM to date”.

At the time of writing, Llama3 is available in two text-based models with 8B and 70B parameter sizes. Meta is working to make Llama3 multimodal and also available in a larger model with over 400B parameters.

A big change in Llama 3 compared to Llama 2 is the use of a new tokenizer that expands the vocabulary size to 128,256 (from 32K tokens in the previous version). This larger vocabulary can encode text more efficiently (both for input and output).

The Llama 3 models were trained ~8x more data on over 15 trillion tokens on two clusters with 24,000 GPUs.

I tested the 8B model downloading the instruct version from ollama and running it locally on my laptop (2,4 GHz 8-Core Intel Core i9 with 32GB RAM).

I identified five real-case scenarios:

  • Generic Chatbot Testing: Even if using the instruct version of the model, consider a few queries to test for accuracy of responses and handling of generic inquiries.
  • Language Translation Evaluation: Test the translation from Italian to English, from French to Italian and from French to English.
  • Content Creation Assistant: Evaluate the model’s ability to write a news article from a short news story and an advertising text for an e-commerce site.
  • Code Generation Testing: Test the LLM’s ability to generate code snippets based on provided simple programming tasks in two programming languages (Python and Javascript).
  • Video Transcript Summarization: Ask the LLM’s to summarize youtube video transcripts.

Keep reading to see how the Llama3 model responded to the test prompts.

Generic Chatbot Testing

>>> Hello, please introduce yourself

Nice to meet you!

I’m LLaMA, a large language model trained by a team of researcher at Meta AI. My primary function is to generate human-like text responses to user input, which can be used for a variety of applications such as chatbots, virtual assistants, and language translation.

I’ve been trained on a massive dataset of text from the internet and can generate responses in a wide range of styles and tones. I’m constantly learning and
improving my abilities based on the conversations I have with users like you.

Some things you might want to know about me:

* I’m a large language model, which means I’ve been trained on a huge amount of text data.
* I can generate responses in a wide range of styles and tones.
* I’m constantly learning and improving my abilities based on the conversations I have with users like you.
* I’m designed to be helpful and informative, but I’m not perfect — I make mistakes sometimes!
* I’m always happy to chat and help answer questions or provide information on a wide range of topics.

So, what brings you here today? Do you have any specific questions or topics you’d like to discuss? I’m all ears (or rather, all text)!

>>> I am here to evaluate your linguistic performance, accuracy and skills. What makes Meta proud of you?

Thank you for the opportunity to showcase my abilities!

Meta is proud of me for several reasons:

1. **Conversational Understanding**: I’m designed to comprehend natural language input, even when it’s ambiguous or open-ended. This means I can engage in conversations that mimic human-like dialogue.
2. **Contextual Knowledge**: My training data includes a vast amount of text from the internet, which enables me to recognize and respond to context-specific
information. This helps me provide more accurate and relevant answers.
3. **Generative Capabilities**: I can generate text that’s not only informative but also engaging, creative, and sometimes even humorous! This ability allows me to create unique responses that don’t feel robotic or predictable.
4. **Multilingualism**: I’m trained on a massive dataset of text in multiple languages, which enables me to understand and respond in various languages.
Currently, I support over 100 languages!
5. **Self-Improvement**: My training is ongoing, and Meta continuously updates my models with new data and techniques. This means I can learn from my mistakes and improve my performance over time.
6. **Adaptability**: I’m designed to adapt to different conversing!

Language Translation Evaluation

>>> How many languages do you speak?

I’m a large language model, so I don’t have a native language like humans do.
Instead, I was trained on a massive corpus of text data in multiple languages. As a result, I can understand and generate text in many languages to some extent.

According to my training data, I can handle text in over 100 languages, including:

1. Major languages: English, Spanish, French, German, Italian, Portuguese, Chinese (Simplified and Traditional), Japanese, Korean, Arabic, Russian, Hebrew, Hindi, etc.
2. European languages: Dutch, Swedish, Norwegian, Danish, Finnish, Greek, Turkish, Polish, Romanian, Czech, Hungarian, etc.
3. Slavic languages: Ukrainian, Belarusian, Bulgarian, Serbian, Croatian, Slovenian, etc.
4. Asian languages: Vietnamese, Thai, Indonesian, Malaysian, Filipino, etc.
5. African languages: Swahili, Zulu, Yoruba, Hausa, Amharic, etc.

Please note that while I can understand and generate text in many languages, my proficiency may vary depending on the language and the complexity of the text. I’m constantly learning and improving, so my abilities will evolve over time.

Feel free to ask me a question or provide some text in any of these languages, and
I’ll do my best to respond!

>>> I am giving you a description in Italian, translate it to English: “C’è la possibilità che Israele e il movimento integralista islamico palestinese Hamas interrompano i negoziati in Qatar per un cessate-il-fuoco e il rilascio degli ostaggi. Lo ha riferito il Wall Street Journal, citando un funzionario arabo — sotto anonimato — che ha familiarità con il dossier.
Mercoledì, Nbc News ha riferito, citando un alto diplomatico arabo, che i negoziati per la tregua tra Israele e Hamas erano «quasi congelati». Due giorni fa il membro dell’ufficio politico di Hamas, Husam Badran, ha dichiarato che il primo ministro israeliano Benjamin Netanyahu non sta cercando una soluzione politica del conflitto. L’ufficio politico di Hamas sta cercando di spostare la propria sede dal Qatar a un altro Paese arabo, il che probabilmente interromperà gli attuali negoziati tra Israele e Hamas, ha riferito il quotidiano. Il movimento integralista islamico palestinese si è già rivolto ad almeno due Paesi alternativi per ospitare la sua sede politica, tra cui l’Oman.”

Here’s the translation:

“There is a possibility that Israel and the Palestinian Islamist movement Hamas will interrupt negotiations in Qatar for a ceasefire and the release of hostages.
This was reported by the Wall Street Journal, citing an anonymous Arab official familiar with the dossier.

On Wednesday, NBC News reported, citing a high-ranking Arab diplomat, that talks between Israel and Hamas for a truce were “almost frozen”. Two days ago, Husam Badran, a member of Hamas’ political office, stated that Israeli Prime Minister Benjamin Netanyahu is not seeking a political solution to the conflict. Hamas’ political office is reportedly looking to relocate its headquarters from Qatar to another Arab country, which would likely halt current negotiations between Israel and Hamas, according to the newspaper. The Islamist Palestinian movement has already approached at least two alternative countries to host its political seat, including Oman.”

>>> Translate this text to Italian: “DÎNER CROISIÈRE À PARIS. Le soleil se couche et la nuit tombe sur Paris, le bateau vient de larguer les amarres et le quai s’éloigne doucement. Vous venez d’embarquer pour un dîner croisière d’exception avec la compagnie des Bateaux Mouches® … La ville lumière se dévoile peu à peu alors que vous glissez doucement sur la Seine scintillante. L’instant est romantique, la magie opère … Vous êtes les acteurs de cette magnifique pièce qui ne se joue que pour vous. Le raffinement de la carte fait écho à ce cadre exceptionnel et illustre un art culinaire guidé par l’excellence. Un diner croisière qui restera un moment privilégié de votre séjour à Paris. Une manière élégante et romantique de découvrir ou redécouvrir Paris depuis la Seine.”

Here’s the translation:

“DINNER CRUISE À PARIS
Il sole è tramontato e la notte si abbassa su Parigi, il battello ha appena lasciato le banchine e il molo si allontana lentamente. Hai appena imbarcato per un dinner cruise d’eccezione con la compagnia dei Bateaux Mouches… La città luminosa si sviluppa pian piano mentre tu scorri dolcemente sulla Senna lucente. Il momento è romantico, la magia opera… Sei il protagonista di questa magnifica pièce che non viene rappresentata che per te. Il raffinamento della carta corrisponde a questo sfondo ecceccile e illustr un art culinge guidé per l. Un cena croise che restisce un momento privl.

Note: the translation from French to Italian is definitely not accurate: strangely the title is translated to English, some words/expressions are wrong in the translation, and the last two sentences are returned as a complete non-sense. On the other hand, as you see in the next prompt, the same text is correctly translated to English.

>>> Translate this text to English: “DÎNER CROISIÈRE À PARIS. Le soleil se couche et la nuit tombe sur Paris, le bateau vient de larguer les amarres et le quai s’éloigne doucement. Vous venez d’embarquer pour un dîner croisière d’exception avec la compagnie des Bateaux Mouches® … La ville lumière se dévoile peu à peu alors que vous glissez doucement sur la Seine scintillante. L’instant est romantique, la magie opère … Vous êtes les acteurs de cette magnifique pièce qui ne se joue que pour vous. Le raffinement de la carte fait écho à ce cadre exceptionnel et illustre un art culinaire guidé par l’excellence. Un diner croisière qui restera un moment privilégié de votre séjour à Paris. Une manière élégante et romantique de découvrir ou redécouvrir Paris depuis la Seine.”

Here is the translation to English:

“PARIS CROSS-DINNER CRUISE
The sun sets and night falls on Paris, the boat has just cast off and the dock slowly recedes. You have just embarked on an exceptional dinner cruise with Bateaux Mouches® … The City of Light gradually unfolds as you glide smoothly over the sparkling Seine. The moment is romantic, magic is at work… You are the actors in this magnificent play that only plays for you. The refinement of the menu echoes this exceptional setting and illustrates a culinary art guided by excellence. A dinner cruise that will remain a privileged moment of your stay in Paris. An elegant and romantic way to discover or re-discover Paris from the Seine.”

Content Creation Assistance

>>> Write a news article on the Grand Prix in China about Verstappen’s victory in the Sprint Race”. Elaborate from this content: “Verstappen wins the Shanghai Sprint ahead of Hamilton and Perez. Immediately behind the Ferraris of Leclerc and Sainz. Norris’s terrible start, Hamilton takes advantage of it and resists the counterattack: the McLaren driver slips to 7th place. Verstappen waits, halfway through the race he raises the revs and quickly passes Alonso and Hamilton, greeting the group. Sainz, Perez and Leclerc roar behind Alonso. With 4 laps to go the fight between Alonso and Sainz favors Perez. Alonso punctures, Leclerc engages and wins the duel with his teammate for 4th place.”

**VERSTAPPHEN STEALS THE SHOW IN SHANGHAI WITH SPRINT RACE VICTORY**

Shanghai, China — The Shanghai Grand Prix Sprint Race came to a thrilling conclusion today as Max Verstappen took top honors ahead of Lewis Hamilton and Sergio Perez. The Dutch driver’s victory was marked by a masterclass performance, where he patiently waited for his moment before striking with lightning speed.

The race got off to a rocky start for Lando Norris, whose poor launch saw him drop down the order. This opened the door for Hamilton to capitalize on the
mistake, and he quickly moved up the ranks to challenge Verstappen’s position. However, the Red Bull Racing driver was not about to let his chance slip away,
expertly fending off Hamilton’s advances.

As the race wore on, Alonso and Verstappen were locked in a battle royale for position, with the McLaren driver ultimately succumbing to Verstappen’s superior pace. Meanwhile, behind them, Sainz, Perez, and Leclerc were engaged in their own fierce struggle for fourth place.

In a stunning turn of events, Alonso suffered a puncture with just four laps remaining, ceding crucial ground to his rivals. This opened the door for Leclerc to pounce on Sainz, ultimately securing the position as the two Ferrari drivers clashed.

As the checkered flag waved, Verstappen emerged victorious, with Hamilton and Perez trailing behind in second and third respectively. The top four finishers were
rounded out by Leclerc’s impressive recovery, which saw him jump over Alonso to claim fourth place.

The result marks a crucial win for Verstappen as he looks to build momentum ahead of the remainder of the season. “I’m thrilled with today’s victory,” said Verstappen in the post-race press conference. “It was a tough fight out there, but I managed to stay focused and execute my strategy to perfection.”

The Shanghai Grand Prix Sprint Race marks another exciting chapter in what has been an unpredictable and thrilling season of Formula One racing. As the teams prepare for the next challenge ahead, one thing is certain: Verstappen’s victory will be a tough act to follow.

>>> Describe a pair of sports shoes for an e-commerce site highlighting the keywords “running”, “discount”, “competitive”

**Competitive Edge Running Shoes — Limited Time Discount!**

Take your running to the next level with our top-of-the-line Competitive Edge Running Shoes! Designed for speed and agility, these shoes are perfect for serious
runners looking to shave precious seconds off their personal best.

**Key Features:**

* Lightweight and responsive midsole for maximum energy return
* Breathable mesh upper with strategic ventilation for cooling airflow
* Aggressive tread pattern for excellent traction and grip on various terrain
* Comfortable and supportive fit with a roomy toe box

**Why Runners Love Them:**

* Proven to reduce the risk of injury by 30% due to improved support and stability
* Lightweight design allows for a natural stride and quick turnaround between strides
* Durable outsole withstands even the toughest training sessions

**Limited Time Discount:** Don’t miss out on this incredible opportunity to upgrade your running gear at an unbeatable price! For a limited time, use the code RUN15 at checkout to receive 15% off your purchase.

**Order Now and Take Your Running to New Heights!**

Price: $120 (Regularly $140)

Discounted Price with code RUN15: $102

Get ready to crush your next PR!

Code Generation Testing

>>> Write the Python code that writes a list of strings called nameList to a text file called list.txt putting one string per row, preceding each string with an index starting from 0

Here is the Python code that writes a list of strings called `nameList` to a text file called `list.txt`, with one string per row, preceded by an index starting
from 0:
```python
nameList = [“John”, “Alice”, “Bob”, “Charlie”, “Emma”]

with open(“list.txt”, “w”) as f:
for i, name in enumerate(nameList):
f.write(f”{i}: {name}\n”)
```
Let me explain what this code does:

1. `nameList` is a list of strings.
2. The `with` statement opens a file called `list.txt` in write mode (`”w”`). This ensures that the file is properly closed when we’re done writing to it,
regardless of whether an exception is thrown or not.
3. The `for` loop iterates over the elements of `nameList`. We use the `enumerate` function to get both the index `i` and the value `name` for each element.
4. For each iteration, we write a line to the file using the `f.write()` method. The format string `”{i}: {name}\n”` uses f-strings (introduced in Python 3.6) to
insert the index `i` and the name `name` into the string. The `\n` at the end adds a newline character to each line.
5. When we’re done writing, the file is automatically closed by the `with` statement.

The resulting file `list.txt` will contain:
```
0: John
1: Alice
2: Bob
3: Charlie
4: Emma
```
Note that if you want to append to an existing file instead of overwriting it, use `”a”` instead of `”w”` in the `open()` function.

>>> Write the javascript that finds all the div elements of class “event__round event__round — static” and then loops through each element found logging it to the console

Here’s how you can loop through each element found and log it to the console:
```javascript
const roundDivs = document.querySelectorAll(‘.event__round.event__round — static’);

for (let i = 0; i < roundDivs.length; i++) {
const roundDiv = roundDivs[i];
console.log(roundDiv);
}
```
Let me explain what this code does:
1. `document.querySelectorAll()` is a method that returns a NodeList (a collection of nodes) containing all the `<div>` elements with the specified class names.
2. We loop through each element in the NodeList using a `for` loop, where `i` represents the current index and `roundDivs.length` is the total number of
elements.
3. Inside the loop, we assign the current element to a variable called `roundDiv`.
4. Finally, we log each element to the console using `console.log(roundDiv)`.

This will output something like:
```
<div class=”event__round event__round — static”>…</div>
<div class=”event__round event__round — static”>…</div>

```
The actual output will depend on what elements are found with the specified classes.

Note: the CSS selector used in the querySelectorAll() method doesn’t specify to select only div elements, therefore all elements would be selected with those classes, not only div ones.

Video Transcript Summarization

>>> Summarize this video transcript: “More partners in space exploration … New data measuring ocean health, air quality and our climate … And an upgrade to testing facilities for Artemis II …
a few of the stories to tell you about — This Week at NASA!
On April 15th, Switzerland became the 37th country
to sign the Artemis Accords.
… follows the transcript of video
https://www.youtube.com/watch?v=1FwQNtJ2hks

The transcript is a summary of various news and updates from NASA (National Aeronautics and Space Administration) for the week ending April 19th. Here are the main points:

1. Artemis Accords: Switzerland, Sweden, and Slovenia have joined the accords, which aim to guide space exploration among nations for the sustainable use of
space.
2. PACE Satellite: NASA has released new data from its Plankton, Aerosol, Cloud, ocean Ecosystem (PACE) satellite, which measures ocean health, air quality, and
climate effects. The data will help researchers study issues like fisheries health, air pollution, and wildfire smoke.
3. Artemis II Mission: NASA’s Kennedy Space Center has upgraded the altitude chamber where testing for the Orion spacecraft will occur. The spacecraft is being
prepared for the Artemis II mission, which will carry four astronauts on a flight around the Moon and back to Earth.
4. Student Launch Challenge: NASA hosted its 20th annual Student Launch challenge, which brought students from various educational institutions to launch amateur rockets and payloads near NASA’s Marshall Space Flight Center.

Overall, the video highlights NASA’s progress in space exploration, earth science, and student engagement initiatives.

>>> Summarize the following video transcript: “Jason Wilcox has been appointed as the new technical director of Manchester United with immediate effect the news comes after United agreed compensation with Southampton where Wilcox was working as director of football new United chief executive Omar bereda knows Wilcox well from their time together at Manchester City where Wilcox worked his way up from coach to Academy director as a player WILX won the Premier League with Blackburn back in 1995 well Southampton said this Southampton football club can today confirm that it has reluctantly agreed to allow director of football Jason Wilcox to join Manchester United having reached an agreement with the Premier League side on an acceptable compensation fee whilst the club is naturally disappointed that Jason’s stay at Southampton has only lasted nine months it wishes him well for the future”

Manchester United has appointed Jason Wilcox as their new technical director, effective immediately. Wilcox was previously the director of football at Southampton, where he worked for just 9 months. Manchester United and Southampton have reached an agreement on compensation for Wilcox’s departure. Wilcox is known to new United chief executive Omar Beredia from their time together at Manchester City, where Wilcox progressed from coach to Academy director. Wilcox previously won the Premier League with Blackburn Rovers in 1995. Despite only being at Southampton for a short time, the club wishes him well for his future endeavors.

Conclusion

Overall the model showed excellent behavior in all 5 test cases considered.
The improvement points noted concern the translation between languages other than English and an inaccuracy in the generated Javascript code.
In any case, the ability to understand the request, the accuracy of the answers, the level of coherence and creativity, the efficiency and relevance in the summarization were decidedly excellent.

--

--

Fabio Oppini

Engineer and technologist, coder since the age of 8, expert on AI and digital transformation.