Deploying an LLM Application as an API Endpoint using FastAPI in Docker

1. Introduction to FastAPI and LLMs​

Overview of FastAPI​

FastAPI is a modern, high-performance web framework for building APIs with Python. It has become the go-to framework for developers due to its speed, ease of use, and scalability. FastAPI allows you to build RESTful APIs, interact with models (such as LLMs), and deploy them in production with minimal effort. Its key features include:

  • Asynchronous programming: FastAPI uses async/await for non-blocking, concurrent operations.
  • Automatic validation: It automatically validates request data based on Python type hints, which simplifies error handling.
  • Interactive documentation: FastAPI generates interactive Swagger UI and ReDoc for easy API testing.
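
For instance, a minimal app (purely illustrative, not part of the project we build below) shows how type hints drive validation and how an async endpoint is declared:

```python
from typing import Optional

from fastapi import FastAPI

app = FastAPI()

@app.get("/items/{item_id}")
async def read_item(item_id: int, q: Optional[str] = None):
    # item_id is parsed and validated as an int from the path;
    # a non-integer value returns a 422 validation error automatically
    return {"item_id": item_id, "q": q}
```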

Introduction to Large Language Models (LLMs)​


LLMs such as GPT-4, along with earlier transformer models such as BERT, are machine learning models capable of understanding and generating human-like text. These models have revolutionized fields like natural language processing (NLP), chatbot development, and content generation. LLMs can be fine-tuned for various tasks such as:

  • Text completion and generation
  • Sentiment analysis
  • Summarization
  • Question answering

With frameworks like Hugging Face Transformers, it’s possible to load these models and serve them via APIs.
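
As a quick illustration of how little code this takes (a minimal sketch; the exact model and scores will vary), a sentiment-analysis pipeline looks like this:

```python
from transformers import pipeline

# Load a sentiment-analysis pipeline with the library's default pre-trained model
# (the weights are downloaded on first use)
classifier = pipeline("sentiment-analysis")

print(classifier("FastAPI makes serving models straightforward."))
# Example output: [{'label': 'POSITIVE', 'score': 0.99}]
```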

Why FastAPI for LLM Applications?​

FastAPI’s asynchronous nature and low-latency performance make it an ideal framework for serving LLMs. Given the computational demands of LLMs, FastAPI's ability to efficiently handle multiple requests concurrently is crucial in ensuring that the service remains responsive under load.
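
One common pattern for this (a sketch only, separate from the project code below, and assuming Python 3.9+ for asyncio.to_thread) is to offload the blocking model call to a worker thread so the event loop keeps serving other requests:

```python
import asyncio

from fastapi import FastAPI

app = FastAPI()

def run_model(prompt: str) -> str:
    # Placeholder for a blocking LLM call (e.g. a transformers pipeline)
    return f"generated text for: {prompt}"

@app.post("/generate-async/")
async def generate_async(prompt: str):
    # prompt arrives as a query parameter here just to keep the sketch short
    result = await asyncio.to_thread(run_model, prompt)
    return {"generated_text": result}
```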

2. Setting up the Development Environment​

Prerequisites​

Before we begin, make sure you have the following tools installed:

  • Python 3.7+
  • FastAPI (pip install fastapi)
  • Uvicorn (ASGI server) (pip install uvicorn)
  • Hugging Face Transformers (pip install transformers)
  • Docker (For containerization)

Installing Python and FastAPI​

  1. Install Python 3.7 or higher: Make sure Python is installed. You can verify it with:

```bash
python --version
```

  2. Install FastAPI: Use pip to install FastAPI:

```bash
pip install fastapi
```

  3. Install Uvicorn: Uvicorn is an ASGI server that allows FastAPI to handle asynchronous requests. Install it using:

```bash
pip install uvicorn
```

Installing and Configuring Hugging Face Transformers​

Hugging Face Transformers provides an easy way to load pre-trained models such as GPT-2, BERT, and many more from the Model Hub. To install it:

```bash
pip install transformers
```

We will use a pre-trained model from the Hugging Face Model Hub to demonstrate deploying an LLM with FastAPI.

3. Developing an API with FastAPI to Serve the LLM​

Structuring the FastAPI Project​

A typical FastAPI project consists of:

  • main.py: This is the entry point of your application where you define the API endpoints.
  • models.py: Defines the data models for request validation and responses.
  • services.py: Contains business logic (e.g., LLM interaction).
  • requirements.txt: Lists the Python dependencies installed inside the container.
  • Dockerfile: Used to containerize the application later.

Project Structure:​

```text
my_fastapi_app/
├── main.py
├── models.py
├── services.py
├── requirements.txt
└── Dockerfile
```
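
The models.py file can hold the request schema used for validation. A minimal sketch (the PromptRequest name and its single prompt field are our own choices; the endpoint later in this section uses them):

```python
from pydantic import BaseModel

class PromptRequest(BaseModel):
    # JSON body for the text-generation endpoint
    prompt: str
```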

Loading the Pre-trained LLM Model​

In services.py, we load a pre-trained language model using Hugging Face Transformers. The pipeline is created once and reused, so the model is not reloaded on every request.

```python
from transformers import pipeline

# Cache the pipeline so the model is loaded only once, not on every request
_model = None

def load_model():
    global _model
    if _model is None:
        # Load a text-generation pipeline with a pre-trained GPT-2 model
        _model = pipeline("text-generation", model="gpt2")
    return _model

def generate_text(prompt: str):
    # Generate text based on a prompt
    model = load_model()
    return model(prompt, max_length=50, num_return_sequences=1)
```
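
Before wiring this into the API, you can sanity-check the service layer from a Python shell (the output is illustrative; the first call downloads the GPT-2 weights, so it will be slow):

```python
from services import generate_text

result = generate_text("Once upon a time")
print(result[0]["generated_text"])
# e.g. "Once upon a time, there was a ..."
```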

Writing FastAPI Endpoints to Interact with the LLM​

In main.py, we define the API endpoints that interact with the LLM. Here, we create a POST endpoint that accepts a JSON body containing a text prompt (validated by the PromptRequest model from models.py) and returns the generated text.

```python
from fastapi import FastAPI

from models import PromptRequest
from services import generate_text

app = FastAPI()

@app.post("/generate/")
def generate(request: PromptRequest):
    # A plain (non-async) endpoint: FastAPI runs it in a worker thread,
    # so the blocking model call does not stall the event loop
    result = generate_text(request.prompt)
    return {"generated_text": result[0]["generated_text"]}
```

This API will allow users to send a text prompt, and the FastAPI backend will generate text using the LLM.

4. Testing the FastAPI Application Locally​

To run the FastAPI application locally, use Uvicorn, the ASGI server:

```bash
uvicorn main:app --reload
```

  • Testing the API: You can now navigate to http://127.0.0.1:8000/docs to view the auto-generated Swagger UI documentation and interact with the API.

Try sending a POST request to /generate/ with a JSON body like:

```json
{
  "prompt": "Once upon a time"
}
```

You should receive a response containing the generated text from the LLM.
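
The same request can be sent from the command line with curl (assuming the server is running locally on port 8000):

```bash
curl -X POST "http://127.0.0.1:8000/generate/" \
     -H "Content-Type: application/json" \
     -d '{"prompt": "Once upon a time"}'
```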

5. Introduction to Docker​

Docker is a platform for developing, shipping, and running applications inside containers. Containers encapsulate the application along with its dependencies, ensuring consistency across environments.

Installing Docker​

You can install Docker from the official website:

  • Windows/Mac: Download and install Docker Desktop from Docker's official website.
  • Linux: Use the package manager for your distribution (e.g., apt-get for Ubuntu).

6. Dockerizing the FastAPI Application​

Dockerizing a FastAPI application involves creating a Dockerfile, which specifies the steps to build a Docker image for the application.

Writing a Dockerfile​

Here’s an example Dockerfile for our FastAPI LLM application:

```Dockerfile
# Use the official Python image from Docker Hub
FROM python:3.9-slim

# Set the working directory
WORKDIR /app

# Copy the current directory contents into the container at /app
COPY . /app

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Expose the port that FastAPI will run on
EXPOSE 8000

# Run the FastAPI application with Uvicorn
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

Creating the requirements.txt File​

We also need a requirements.txt file to install the necessary dependencies. Note that transformers needs a deep learning backend to run the model, so torch is included as well:

```txt
fastapi
uvicorn
transformers
torch
```

Building a Docker Image for the FastAPI Application​

In the terminal, navigate to the root of the project directory and run:

```bash
docker build -t fastapi-llm .
```

This command builds the Docker image, tagging it as fastapi-llm.

7. Testing the Dockerized Application Locally​

Once the image is built, you can run it as a Docker container:

```bash
docker run -d -p 8000:8000 fastapi-llm
```

This will start the FastAPI application in a Docker container, mapping port 8000 on your local machine to port 8000 in the container.

You can now test the API in the same way as before by visiting http://127.0.0.1:8000/docs.

8. Deploying Docker Containers​

Pushing Your Docker Image to Docker Hub​

To deploy your Docker container to a cloud service, you first need to push it to a container registry like Docker Hub.

  1. Log in to Docker Hub:

```bash
docker login
```

  2. Tag your image:

```bash
docker tag fastapi-llm <your-dockerhub-username>/fastapi-llm:latest
```

  3. Push the image to Docker Hub:

```bash
docker push <your-dockerhub-username>/fastapi-llm:latest
```

Choosing a Cloud Provider​

Once your image is in a registry, you can deploy it to a cloud platform like AWS, Google Cloud, or Azure.

9. Deploying to Production​

For a production setup, you may want to use Docker Compose to manage multiple containers or Kubernetes for scaling. Here is a basic setup with Docker Compose.
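
A minimal docker-compose.yml for this single service might look like the following (the service name and options here are illustrative, not prescribed):

```yaml
version: "3.8"

services:
  fastapi-llm:
    # Build from the local Dockerfile (or reference the image pushed to Docker Hub)
    build: .
    image: fastapi-llm
    ports:
      - "8000:8000"
    restart: unless-stopped
```

You can then start the service with docker compose up -d.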


Conclusion​

Deploying an LLM with FastAPI and Docker is a robust, scalable approach to serving machine learning models. It makes the model accessible through a simple HTTP API and ensures it can be deployed consistently across environments, from a local machine to the cloud.