UV Data Science Project Template

Welcome to the documentation for the UV Data Science Project Template. This project demonstrates how to set up a data science environment using Docker, uv, and FastAPI, along with other tools for developing Python projects.

Tutorial project for 1) data science in a dev container and 2) a machine learning application in production, using Docker, uv, and FastAPI.

This guide provides instructions on how to use Docker Compose with uv to create a container for a machine learning Python project that uses FastAPI for a production setup.

Project Documentation

Overview

This project demonstrates how to set up a machine learning application using FastAPI, Docker, and Docker Compose. The project uses uv to manage dependencies and the virtual Python environment inside the container.

Docker and Docker Compose

Dockerfile

The Dockerfile is used to build the Docker image for the project. It includes the following steps:

  1. Define build-time arguments for the base container images and workspace name.
  2. Use a Python image with uv pre-installed.
  3. Set the working directory.
  4. Enable bytecode compilation for faster startup.
  5. Copy and install dependencies without installing the project.
  6. Copy the application source code and install it.
  7. Add executables and source to environment paths.
  8. Set the default command to run the FastAPI application.
# filepath: Dockerfile
# ...existing code...

Multi-Stage Dockerfile

To build an optimized final image that does not include uv, use the multi-stage build defined in multistage.Dockerfile.

Docker Compose

The docker-compose.yml file is used to define and run multi-container Docker applications. It includes the following configurations:

  1. Build the image from the Dockerfile.
  2. Define the image name.
  3. Host the FastAPI application on port 8000.
  4. Mount the current directory to the app directory in the container.
  5. Set environment variables.
  6. Define the default command to start the FastAPI application.
# filepath: docker-compose.yml
# ...existing code...

Using uv to Manage the Project

uv is a tool that simplifies the management of Python projects and virtual environments. It handles dependency installation, virtual environment creation, and other project configurations. In this project, uv is used to manage dependencies and the virtual environment inside the Docker container, ensuring a consistent and reproducible setup.

pyproject.toml

The pyproject.toml file includes the following sections:

  1. Project metadata (name, version, description, etc.).
  2. Dependencies required for the project.
  3. Dependency groups for development and linting.
  4. Configuration for pylint and tomlsort.
# filepath: pyproject.toml
# ...existing code...

Custom Code in src Folder

The src folder contains the custom code for the machine learning project. The main components include:

lit_auto_encoder.py

This file defines the LitAutoEncoder class, a LightningModule that implements an autoencoder using PyTorch Lightning. The LitAutoEncoder class includes:

  1. An __init__ method to initialize the encoder and decoder.
  2. A training_step method to define a single step of the training loop.
  3. A configure_optimizers method to set up the optimizer.
# filepath: src/dev_container_uv_datascience/lit_auto_encoder.py
# ...existing code...

train_autoencoder.py

This file defines the training function train_litautoencoder to initialize and train the model on the MNIST dataset using PyTorch Lightning.

# filepath: src/dev_container_uv_datascience/train_autoencoder.py
# ...existing code...

FastAPI Application

The FastAPI application is defined in the app_fastapi_autoencoder.py file. It includes the following endpoints:

  1. GET /: Root endpoint that provides a welcome message and instructions.
  2. POST /train: Endpoint to train the autoencoder model.
  3. POST /embed: Endpoint to embed fake images using the trained autoencoder.

app_fastapi_autoencoder.py

This file defines the FastAPI application and the endpoints. It includes:

  1. Importing necessary libraries and modules.
  2. Defining global variables for the encoder, decoder, and model training status.
  3. A NumberFakeImages class for input validation.
  4. A train_litautoencoder function to initialize and train the autoencoder.
  5. A read_root function to handle the root endpoint.
  6. A train_model function to handle the model training endpoint.
  7. An embed function to handle the embedding endpoint.
  8. The application entry point to run the FastAPI application.

main.py

This file starts the uvicorn server that runs the FastAPI autoencoder application and its endpoints. It includes:

  1. Importing necessary libraries and modules, including the source code of the project.
  2. The application entry point to run the FastAPI application.
# filepath: main.py
# ...existing code...

Production Setup for the Machine Learning FastAPI App Hosted in the Docker Container

  • Build the Docker image and start a container:

To build all services defined in docker-compose.yml ("app" and "app-optimized-docker"), run the command below. Note that in the given example both services use the same port, so only one service should run at a time.

docker-compose up --build

To build and start a single service only, either "app" or "app-optimized-docker":

docker-compose up --build app
docker-compose up --build app-optimized-docker
  • Test the endpoint with curl:

  • Welcome root endpoint

curl -X GET http://0.0.0.0:8000/
  • Get docs of the request options of the FastAPI app:
curl -X GET http://0.0.0.0:8000/docs
  • Test the machine learning endpoints with curl by training the model first, then requesting embeddings for n fake images:
curl -X POST http://0.0.0.0:8000/train
curl -X POST http://0.0.0.0:8000/embed -H "Content-Type: application/json" -d '{"n_fake_images": 4}'

Development in Dev Container

  • Run the server: uv run /workspace/main.py
  • Test the standard endpoints with curl:
  • Get docs of the request options of the FastAPI app
curl -X GET http://localhost:8000/docs
  • Welcome root request of the FastAPI app, providing an app description
curl -X GET http://localhost:8000/
  • Test the machine learning endpoints with curl:
curl -X POST http://localhost:8000/train
curl -X POST http://localhost:8000/embed -H "Content-Type: application/json" -d '{"n_fake_images": 1}'
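
The same train-then-embed sequence can also be scripted from Python, which may be convenient in a notebook or a smoke test. The sketch below uses only the standard library. The helper names (build_train_request, build_embed_request, send) and the timeout are hypothetical, and it assumes the server answers with JSON, which is FastAPI's default.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same address as in the curl calls above


def build_train_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """POST /train with an empty body."""
    return urllib.request.Request(f"{base_url}/train", data=b"", method="POST")


def build_embed_request(n_fake_images: int, base_url: str = BASE_URL) -> urllib.request.Request:
    """POST /embed with the JSON body used in the curl example."""
    body = json.dumps({"n_fake_images": n_fake_images}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 600.0) -> dict:
    """Send the request and decode the JSON response (assumed format)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, calling `send(build_train_request())` followed by `send(build_embed_request(1))` mirrors the two curl commands. The generous timeout is there because training runs inside the request.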

Post-Create and Post-Start Commands

The devcontainer.json file includes post-create and post-start commands to configure the development environment.

# filepath: .devcontainer/devcontainer.json
# ...existing code snippet...

This guide provides a comprehensive overview of setting up and running the machine learning FastAPI project using Docker Compose and uv. Follow the instructions to build and run the application in both development and production environments.