Llama 3 - A cost analysis

Outline

Llama 3 dropped on April 18th, 2024 with two available versions (8B and 70B) and a third, larger model (400B) on the way. While you can self-host these models (especially the 8B version), the compute required to run them at reasonable speed is substantial. Thankfully, there are cloud providers that can help you run these models at scale. Below is a cost analysis of running Llama 3 on Google Vertex AI, Amazon SageMaker, Azure ML, and the Groq API.

Cost Analysis

For context, these prices were pulled on April 20th, 2024 and are subject to change. The figures assume running Llama 3 24/7 for a month at 10,000 chats per day. Some providers, like Google and Amazon, charge for the instance type you provision, while others, like Azure and Groq, charge per token processed. Note that, at the time of writing, Vertex AI's one-click deployments cannot scale down to zero instances, so you are charged for the instance even when it is not processing any requests. Amazon only charges while the instance is actively in use, which may make it a more cost-effective option under a variable load. Groq is both the cheapest and the fastest option, boasting ~870 tokens/second, although the provider is quite new to the game compared to the others.

While the price varies a bit between providers, another factor is whether you already use some of a provider's other services: you may be able to get committed-use discounts and bundles.
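One way to compare fixed instance pricing against per-token pricing is a break-even volume. As a rough sketch using the 8B figures from the table below (and assuming a 30-day month, which the original figures may not use):

```python
# Rough break-even sketch: at what daily chat volume does Azure's
# per-token pricing for Llama 3 8B overtake a dedicated Vertex AI
# g2-standard-8 instance? Assumes a 30-day month and 6K tokens per
# chat (5.5K input, 0.5K output), per the figures in this post.

VERTEX_8B_MONTHLY = 623.15                          # $ per instance-month
AZURE_8B_PER_CHAT = 5.5 * 0.00110 + 0.5 * 0.00037   # $ per chat

break_even_chats_per_day = VERTEX_8B_MONTHLY / 30 / AZURE_8B_PER_CHAT
```

This works out to roughly 3,300 chats/day: below that volume, per-token pricing is cheaper; above it (including the 10,000 chats/day assumed here), a dedicated instance wins on raw price.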

| Service Detail | Llama 3 - 8B | Llama 3 - 70B |
| --- | --- | --- |
| **Google Vertex AI** (us-central1) | | |
| Instance Type | g2-standard-8 | g2-standard-96 |
| # CPU | 8 | 96 |
| RAM (GB) | 32 | 384 |
| Accelerator | NVIDIA_L4 | NVIDIA_L4 |
| # Accelerators | 1 | 8 |
| $ Cost/Instance/Month | $623.15 | $5,842.43 |
| $ Cost/Instance/Month (3-year commitment) | $280.42 | $2,629.09 |
| **Amazon SageMaker** (us-east-2) | | |
| Instance Type | ml.g5.2xlarge | ml.p4d.24xlarge |
| # CPU | 8 | 96 |
| RAM (GB) | 32 | 1152 |
| # Accelerators | 1 | 8 |
| $ Cost/Instance/Month (24/7) | $1,054.44 | $26,230.85 |
| **Azure ML** (On Demand; assumes 6K tokens per chat: 5.5K input, 0.5K output) | | |
| $/1,000 Input Tokens | $0.00110 | $0.01134 |
| $/1,000 Output Tokens | $0.00037 | $0.00378 |
| # Chats/Day | 10,000 | 10,000 |
| $ Cost/Day | $62.35 | $642.60 |
| $ Cost/Month | $1,808.15 | $18,635.40 |
| **Groq API** (On Demand; assumes 6K tokens per chat: 5.5K input, 0.5K output) | | |
| $/1,000 Input Tokens | $0.00005 | $0.00059 |
| $/1,000 Output Tokens | $0.00010 | $0.00079 |
| # Chats/Day | 10,000 | 10,000 |
| $ Cost/Day | $3.25 | $36.40 |
| $ Cost/Month | $94.25 | $1,055.60 |
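The per-token daily figures above are straightforward to reproduce. A minimal sketch, assuming 5,500 input and 500 output tokens per chat at 10,000 chats/day as stated in the table (`daily_cost` is a hypothetical helper, not a provider API):

```python
# Sketch: reproduce the per-token daily costs from the table above.

def daily_cost(chats_per_day, in_price_per_1k, out_price_per_1k,
               in_tokens=5_500, out_tokens=500):
    """Daily cost in dollars under per-token pricing."""
    per_chat = (in_tokens / 1_000) * in_price_per_1k \
             + (out_tokens / 1_000) * out_price_per_1k
    return chats_per_day * per_chat

# Azure ML, Llama 3 8B: $0.00110 / 1K input, $0.00037 / 1K output
azure_8b = daily_cost(10_000, 0.00110, 0.00037)   # ~ $62.35/day

# Groq API, Llama 3 8B: $0.00005 / 1K input, $0.00010 / 1K output
groq_8b = daily_cost(10_000, 0.00005, 0.00010)    # ~ $3.25/day
```

Swap in the 70B per-token prices from the table to reproduce the 70B rows the same way.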

Conclusion

The new Llama 3 models are honestly quite impressive, with the 70B model beating Claude 3 Sonnet on several metrics. If you know you will have a consistent load and are already integrated into one of these providers, check out Google or Amazon. If you need speed, check out Groq (personal pick). Azure seems a bit high all around, so I would only recommend them if you are already using their services.

Happy chatting!
