Image by Author | ideogram.ai
# Introduction
With the surge of large language models (LLMs) in recent years, many LLM-powered applications are emerging. LLM implementation has introduced features that were previously non-existent.
As time goes on, many LLM models and products have become available, each with its pros and cons. Unfortunately, there is still no standard way to access all these models, as each company can develop its own framework. That is why an open-source tool such as LiteLLM is useful when you need standardized access to the models behind your LLM apps without any additional cost.
In this article, we will explore why LiteLLM is beneficial for building LLM applications.
Let’s get into it.
# Benefit 1: Unified Access
LiteLLM's biggest advantage is its compatibility with different model providers. The tool supports over 100 different LLM services through standardized interfaces, allowing us to access them regardless of the model provider we use. It is especially useful if your applications use multiple models that need to work interchangeably.
A few examples of the major model providers that LiteLLM supports include:
- OpenAI and Azure OpenAI, like GPT-4.
- Anthropic, like Claude.
- AWS Bedrock & SageMaker, supporting models like Amazon Titan and Claude.
- Google Vertex AI, like Gemini.
- Hugging Face Hub and Ollama for open-source models like LLaMA and Mistral.
The standardized format follows OpenAI's framework, using its chat/completions schema. This means that we can switch models easily without needing to understand the original model provider's schema.
For example, here is the Python code to use Google's Gemini model with LiteLLM.
from litellm import completion
prompt = "YOUR-PROMPT-FOR-LITELLM"
api_key = "YOUR-API-KEY-FOR-LLM"
# The "gemini/" prefix selects the provider; the rest is the model name.
response = completion(
    model="gemini/gemini-1.5-flash-latest",
    messages=[{"content": prompt, "role": "user"}],
    api_key=api_key,
)
# The response follows the OpenAI chat/completions layout.
response['choices'][0]['message']['content']
You only need to obtain the model name and the respective API keys from the model provider to access them. This flexibility makes LiteLLM ideal for applications that use multiple models or for performing model comparisons.
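To make the model-comparison idea concrete, here is a minimal sketch that sends one prompt to several models through the same call. The helper names `build_messages` and `compare_models` are made up for illustration, and the sketch assumes each provider's API key is already available to LiteLLM (for example through environment variables). The `complete` parameter exists only so the loop can be tried with a stub instead of a real API call.

```python
from typing import Callable, Dict, List, Optional


def build_messages(prompt: str) -> List[dict]:
    """Wrap a prompt in the OpenAI-style chat message list."""
    return [{"content": prompt, "role": "user"}]


def compare_models(
    models: List[str],
    prompt: str,
    complete: Optional[Callable[..., dict]] = None,
) -> Dict[str, str]:
    """Send the same prompt to each model and collect the text replies."""
    if complete is None:
        # Imported lazily so the helper can also be exercised with a stub.
        from litellm import completion as complete
    replies = {}
    for model in models:
        response = complete(model=model, messages=build_messages(prompt))
        replies[model] = response["choices"][0]["message"]["content"]
    return replies
```

Because every provider is reached through the same schema, adding another model to the comparison should only mean adding another model string to the list.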
# Benefit 2: Cost Tracking and Optimization
When working with LLM applications, it is important to track token usage and spending for each model you implement and across all integrated providers, especially in real-time scenarios.
LiteLLM enables users to maintain a detailed log of model API call usage, providing all the necessary information to control costs effectively. For example, the response from the `completion` call above includes information about the token usage, as shown below.
usage=Usage(completion_tokens=10, prompt_tokens=8, total_tokens=18, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetailsWrapper(audio_tokens=None, cached_tokens=None, text_tokens=8, image_tokens=None))
Accessing the response's hidden parameters through its `_hidden_params` attribute also provides more detailed information, including the cost.
The output is similar to the one below:
{'custom_llm_provider': 'gemini',
'region_name': None,
'vertex_ai_grounding_metadata': [],
'vertex_ai_url_context_metadata': [],
'vertex_ai_safety_results': [],
'vertex_ai_citation_metadata': [],
'optional_params': {},
'litellm_call_id': '558e4b42-95c3-46de-beb7-9086d6a954c1',
'api_base': 'https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-latest:generateContent',
'model_id': None,
'response_cost': 4.8e-06,
'additional_headers': {},
'litellm_model_name': 'gemini/gemini-1.5-flash-latest'}
There is a lot of information, but the most important piece is `response_cost`, as it estimates the actual charge you will incur for that call, although it could still be offset if the model provider offers free access. Users can also define custom pricing for models (per token or per second) to calculate costs accurately.
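The arithmetic behind per-token pricing is simple enough to sketch. The function below is an illustration only, not LiteLLM's internal calculation. The parameter names `input_cost_per_token` and `output_cost_per_token` and the two rates are assumptions chosen for the example. The token counts are taken from the usage output shown earlier.

```python
def estimate_cost(
    prompt_tokens: int,
    completion_tokens: int,
    input_cost_per_token: float,
    output_cost_per_token: float,
) -> float:
    """Return the estimated charge for one call under per-token pricing."""
    return (
        prompt_tokens * input_cost_per_token
        + completion_tokens * output_cost_per_token
    )


# Token counts from the usage output above; both rates are hypothetical.
cost = estimate_cost(
    prompt_tokens=8,
    completion_tokens=10,
    input_cost_per_token=2e-07,
    output_cost_per_token=6e-07,
)
print(f"Estimated cost: {cost:.2e} USD")
```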
A more advanced cost-tracking implementation also allows users to set a spending budget and limit, and to connect the LiteLLM cost and usage information to an analytics dashboard so it can be aggregated more easily. It is also possible to provide custom label tags to help attribute costs to certain uses or departments.
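The budget-and-tag idea can be illustrated with a small standalone ledger. This is a sketch of the concept rather than LiteLLM's own budget feature. The class names `CostLedger` and `BudgetExceeded` and the tag values are hypothetical, and `response_cost` stands for the per-call cost reported by the library.

```python
from collections import defaultdict
from typing import Dict


class BudgetExceeded(Exception):
    """Raised when recording a call would push spending past the budget."""


class CostLedger:
    """Accumulate per-call costs by tag and enforce a total budget."""

    def __init__(self, max_budget: float) -> None:
        self.max_budget = max_budget
        self.spend_by_tag: Dict[str, float] = defaultdict(float)

    @property
    def total_spend(self) -> float:
        """Total spending across all tags."""
        return sum(self.spend_by_tag.values())

    def record(self, tag: str, response_cost: float) -> None:
        """Add one call's cost under `tag`, refusing it if over budget."""
        if self.total_spend + response_cost > self.max_budget:
            raise BudgetExceeded(
                f"recording {response_cost} would exceed {self.max_budget}"
            )
        self.spend_by_tag[tag] += response_cost


ledger = CostLedger(max_budget=1.0)
ledger.record("marketing", 0.4)   # hypothetical department tags
ledger.record("research", 0.5)
print(dict(ledger.spend_by_tag))
```

Keeping spend keyed by tag is what makes it possible to attribute costs to a department later without re-reading the raw call logs.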
By providing detailed cost and usage data, LiteLLM helps users and organizations optimize their LLM application costs and budget more effectively.
# Benefit 3: Ease of Deployment
LiteLLM is designed for easy deployment, whether you use it for local development or in a production environment. The Python library needs only modest resources to install, so we can run LiteLLM on a local laptop or host it in a containerized deployment with Docker, without complex additional configuration.
Speaking of configuration, we can set up LiteLLM more efficiently by using a YAML config file to list all the necessary information, such as the model name, API keys, and any essential custom settings for your LLM apps. You can also use a backend database such as SQLite or PostgreSQL to store its state.
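As a rough picture of what such a config might carry, the sketch below builds the structure as a Python dictionary and prints it as JSON, which YAML parsers generally accept. The field names (`model_list`, `model_name`, `litellm_params`) and the `os.environ/...` key reference reflect my understanding of LiteLLM's config layout and should be treated as assumptions to check against the official documentation. The alias `gemini-flash` is hypothetical.

```python
import json

# Hypothetical deployment settings; the alias and key name are examples only.
config = {
    "model_list": [
        {
            "model_name": "gemini-flash",  # alias that applications would request
            "litellm_params": {
                "model": "gemini/gemini-1.5-flash-latest",
                # Points at an environment variable instead of a literal key.
                "api_key": "os.environ/GEMINI_API_KEY",
            },
        }
    ]
}

# Print the structure as text that could be saved to a config file.
print(json.dumps(config, indent=2))
```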
For data privacy, you are responsible for your own privacy as a user deploying LiteLLM yourself, but this approach is more secure since the data never leaves your controlled environment except when it is sent to the LLM providers. For enterprise users, LiteLLM also provides Single Sign-On (SSO), role-based access control, and audit logs if your application needs a more secure environment.
Overall, LiteLLM provides flexible deployment options and configuration while keeping the data secure.
# Benefit 4: Resilience Features
Resilience is crucial when building LLM apps, as we want our application to remain operational even in the face of unexpected issues. To promote resilience, LiteLLM provides many features that are useful in application development.
One such feature is built-in caching, where users can cache LLM prompts and responses so that identical requests do not incur repeated costs or latency. It is a useful feature if our application frequently receives the same queries. The caching system is flexible, supporting both in-memory and remote caching, such as with a vector database.
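The caching idea can be shown with a few lines of standard-library Python. This is a conceptual sketch, not LiteLLM's cache implementation. The class `InMemoryCache` and its `get_or_call` method are hypothetical names, and `complete` stands for any completion function that accepts `model` and `messages`.

```python
import hashlib
import json
from typing import Callable, Dict, List


class InMemoryCache:
    """Cache responses keyed by a hash of the model name and messages."""

    def __init__(self) -> None:
        self._store: Dict[str, dict] = {}
        self.hits = 0

    @staticmethod
    def _key(model: str, messages: List[dict]) -> str:
        # sort_keys makes the key independent of dictionary key order.
        payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(
        self, model: str, messages: List[dict], complete: Callable[..., dict]
    ) -> dict:
        """Return a cached response, calling `complete` only on a miss."""
        key = self._key(model, messages)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        response = complete(model=model, messages=messages)
        self._store[key] = response
        return response
```

An exact-match key like this only helps with identical requests. Matching similar but not identical prompts is where a remote store such as a vector database would come in.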
Another feature of LiteLLM is automatic retries, which lets users configure requests to be retried automatically when they fail due to errors such as timeouts or rate limits. It is also possible to set up additional fallback mechanisms, such as using another model once a request has hit the retry limit.
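The retry-then-fallback flow described above can be sketched without any library support. The function `complete_with_fallback` and its parameters are hypothetical and do not mirror LiteLLM's own configuration options. `complete` stands for any completion function, and `TimeoutError` is used as a stand-in for whatever transient errors the provider raises.

```python
import time
from typing import Callable, List, Optional, Tuple, Type


def complete_with_fallback(
    models: List[str],
    messages: List[dict],
    complete: Callable[..., dict],
    num_retries: int = 2,
    retry_on: Tuple[Type[Exception], ...] = (TimeoutError,),
    base_delay: float = 0.5,
) -> dict:
    """Try each model in order, retrying transient errors before falling back."""
    last_error: Optional[Exception] = None
    for model in models:
        for attempt in range(num_retries + 1):
            try:
                return complete(model=model, messages=messages)
            except retry_on as error:
                last_error = error
                if attempt < num_retries:
                    # Exponential backoff: base_delay, then 2x, then 4x, ...
                    time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("all models failed") from last_error
```

The first model in `models` is treated as the primary, and later entries are only reached after the earlier ones have used up their retries.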
Finally, we can set rate limits, defined as requests per minute (RPM) or tokens per minute (TPM), to cap the usage level. It is a great way to limit specific model integrations in order to prevent failures and respect application infrastructure requirements.
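To show what an RPM cap means in practice, here is a minimal sliding-window limiter. It is a conceptual sketch, not LiteLLM's rate-limiting code. The class name `RpmLimiter` is hypothetical, and the injectable `clock` is there only to make the behavior easy to test.

```python
import time
from collections import deque
from typing import Callable, Deque


class RpmLimiter:
    """Allow at most `rpm` requests in any 60-second window."""

    def __init__(self, rpm: int, clock: Callable[[], float] = time.monotonic) -> None:
        self.rpm = rpm
        self.clock = clock
        self._timestamps: Deque[float] = deque()

    def allow(self) -> bool:
        """Return True and record the request if it fits under the limit."""
        now = self.clock()
        # Drop requests that have left the 60-second window.
        while self._timestamps and now - self._timestamps[0] >= 60.0:
            self._timestamps.popleft()
        if len(self._timestamps) < self.rpm:
            self._timestamps.append(now)
            return True
        return False
```

A TPM limit would follow the same pattern, with each entry carrying a token count and the check comparing the window's token sum against the cap.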
# Conclusion
In the era of LLM product growth, it has become much easier to build LLM applications. However, with so many model providers out there, it becomes hard to establish a standard for LLM implementation, especially in the case of multi-model system architectures. This is why LiteLLM can help us build LLM apps efficiently.
I hope this has helped!
Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and writing media. Cornellius writes on a variety of AI and machine learning topics.
