It is common to find codebases (monorepos) in which logic is repeated across multiple applications. We evaluate three ways to structure such codebases for ease of development, version control, and deployment.
2024-03-17
6 min read
By: Abhilaksh Singh Reen
In this blog post, we cover three ways to structure a multi-application codebase with files shared between the applications.
We evaluate the different methods on the basis of:
1) Ease of development
2) Ease of version control
3) Ease of building a Docker image and deploying
I have a very simple demo application: a producer and a consumer that produce and consume tasks for the Huey task queue. The producer and consumer are two separate applications, but they have some files in common, such as the redis_client.py file that sets up the connection to Redis, or the tasks.py file that defines the tasks to be produced and consumed.
Once the development of the applications is complete, we will also be building two Docker images for the producer and consumer and pushing them to Docker Hub.
Both applications have two files in common. The first one is the redis_client.py file that contains our Redis connection:
from os import environ

from redis import Redis

REDIS_HOST = environ["REDIS_HOST"]
REDIS_PORT = int(environ["REDIS_PORT"])
REDIS_DB = int(environ["REDIS_DB"])
REDIS_PASSWORD = environ.get("REDIS_PASSWORD", None)

redis_client = Redis(host=REDIS_HOST, port=REDIS_PORT, db=REDIS_DB, password=REDIS_PASSWORD, decode_responses=True)
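Since the module reads its configuration from the environment, these variables must be set before anything imports it. The values below are placeholders of my own, not from the original setup:

```shell
# example environment for redis_client.py -- all values are placeholders
export REDIS_HOST=localhost
export REDIS_PORT=6379
export REDIS_DB=0
# REDIS_PASSWORD is optional; the module falls back to None if it is unset
export REDIS_PASSWORD=changeme
```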
The second file is the tasks.py file that contains the tasks registered in Huey as well as some utility functions:
from json import dumps as json_dumps, loads as json_loads
from os import environ
from time import time

from huey import RedisHuey

from .redis_client import redis_client

REDIS_HOST = environ["REDIS_HOST"]

huey = RedisHuey("entrypoint", host=REDIS_HOST)

task_status_map_key = "long_tasks"
task_length = 100_000_000
task_num_intervals = 100


def validate_task_id(task_id):
    progress_str = redis_client.hget(task_status_map_key, task_id)
    return progress_str is not None


def get_task_progress(task_id):
    progress_str = redis_client.hget(task_status_map_key, task_id)
    if progress_str is None:
        return {
            'message': "Task not found.",
        }

    return json_loads(progress_str)


def update_task_progress(task_id, time_elapsed, progress_percentage, status="Processing"):
    task_progress = {
        'status': status,
        'timeElapsed': str(round(time_elapsed, 4)),
        'progress': str(round(progress_percentage, 4)),
    }
    task_progress_str = json_dumps(task_progress)

    redis_client.hset(task_status_map_key, task_id, task_progress_str)


def mark_task_as_failed(task_id, time_elapsed=None, progress_percentage=None):
    task_progress = get_task_progress(task_id)

    if time_elapsed is not None:
        task_progress['timeElapsed'] = str(round(time_elapsed, 4))
    if progress_percentage is not None:
        task_progress['progress'] = str(round(progress_percentage, 4))

    task_progress['status'] = "Failed"
    task_progress['failedAt'] = task_progress['progress']

    task_progress_str = json_dumps(task_progress)
    redis_client.hset(task_status_map_key, task_id, task_progress_str)


@huey.task()
def long_task(task_id: str):
    progress_update_interval = task_length // task_num_intervals

    start_time = time()
    update_task_progress(task_id, 0, 0)

    count = 0
    for i in range(1, task_length + 1):
        try:
            count += 1

            if i % progress_update_interval == 0:
                time_elapsed = time() - start_time
                progress_percentage = (count / task_length) * 100
                update_task_progress(task_id, time_elapsed, progress_percentage)
        except Exception:
            time_elapsed = time() - start_time
            progress_percentage = (count / task_length) * 100
            mark_task_as_failed(task_id, time_elapsed, progress_percentage)
            return

    time_elapsed = time() - start_time
    update_task_progress(task_id, time_elapsed, 100, status="Completed")
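The records stored in the Redis hash are plain JSON strings. A minimal sketch of their shape (the helper name build_task_progress is mine; it mirrors update_task_progress without the Redis round trip):

```python
from json import dumps, loads


def build_task_progress(time_elapsed, progress_percentage, status="Processing"):
    # mirrors update_task_progress: numbers are rounded to 4 places and stored as strings
    return dumps({
        'status': status,
        'timeElapsed': str(round(time_elapsed, 4)),
        'progress': str(round(progress_percentage, 4)),
    })


record = loads(build_task_progress(1.23456, 42.0))
print(record)  # {'status': 'Processing', 'timeElapsed': '1.2346', 'progress': '42.0'}
```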
To produce tasks for the Huey worker, we'll create a simple server with the following two endpoints:
1) To add a task to the queue: randomly generates the task id, adds it to the queue, and returns it in a JSON response.
2) To get the status of a task: returns the status (progress, success, etc) of a task in a JSON response.
To make this work, we'll need two more files apart from the common ones. The first is utils.py, which contains a simple utility function:
from uuid import uuid4
from time import time


def generate_task_id():
    return str(uuid4()) + "---" + str(time())
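Each generated id is a UUID4 and a Unix timestamp joined by a "---" separator. A quick self-contained demonstration (the parsing code after the function is mine):

```python
from time import time
from uuid import UUID, uuid4


def generate_task_id():
    return str(uuid4()) + "---" + str(time())


task_id = generate_task_id()
uuid_part, _, timestamp_part = task_id.partition("---")

# the first segment parses back into a UUID, the second into a float timestamp
print(UUID(uuid_part), float(timestamp_part))
```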
The second file is app.py, which defines our FastAPI server:
from os import environ

from fastapi import FastAPI
from fastapi.responses import JSONResponse
import prometheus_client

from .tasks import get_task_progress, long_task, update_task_progress, validate_task_id
from .utils import generate_task_id

METRICS_PORT = int(environ["METRICS_PORT"])
prometheus_client.start_http_server(METRICS_PORT)

app = FastAPI()


@app.post("/api/tasks/add")
def add_task():
    new_task_id = generate_task_id()
    update_task_progress(new_task_id, 0, 0, status="Queued")

    long_task(new_task_id)

    return JSONResponse(status_code=200, content={
        'success': True,
        'result': {
            'id': new_task_id,
            'message': "Task queued.",
        },
    })


@app.get("/api/tasks/status/{task_id}")
async def get_task_status(task_id: str):
    if not validate_task_id(task_id):
        return JSONResponse(status_code=404, content={
            'success': False,
            'error': {
                'message': "Task not found.",
            },
        })

    task_progress = get_task_progress(task_id)

    return JSONResponse(status_code=200, content={
        'success': True,
        'result': {
            'progress': task_progress,
        },
    })
The consumer needs no files other than the two common ones. In a virtual environment, install its dependencies:
pip install redis huey
To start the consumer, run the following command:
huey_consumer.py tasks.huey
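The post doesn't show the Dockerfiles themselves; a minimal sketch of what the consumer's Dockerfile might look like (the base image, working directory, and install step are my assumptions, not from the original repository):

```dockerfile
FROM python:3.11-slim

WORKDIR /app

RUN pip install --no-cache-dir redis huey

# the two common files are all the consumer needs
COPY redis_client.py tasks.py ./

CMD ["huey_consumer.py", "tasks.huey"]
```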
The first method is the most straightforward: we keep two separate folders for the two applications, and the common files are duplicated in both.
Here's the directory structure:
├───consumer
│ .dockerignore
│ Dockerfile
│ docker_build.sh
│ redis_client.py
│ tasks.py
│
└───producer
.dockerignore
app.py
Dockerfile
docker_build.sh
redis_client.py
tasks.py
utils.py
But there's an obvious problem here: we have to manage the same files in two places, and version control will contain duplicates. A workaround is to update these files only in the consumer folder and modify producer/docker_build.sh to copy the latest versions from there before building the Docker image. We'd also need a run.sh script that performs the same copy when running the server in development.
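Under that workaround, producer/docker_build.sh might look like the following sketch (the image name matches the one used later in the post; the consumer folder holds the canonical copies):

```shell
#!/bin/bash
# refresh the common files from the consumer folder before building
cp ../consumer/redis_client.py ../consumer/tasks.py .

docker build -t your-docker-hub-username/task-producer .
```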
But, if we are copying files, why not just move all the common files into a separate directory?
Let's create a new directory called common on the same level as the producer and consumer directories and move the redis_client.py and tasks.py files into this directory. Now, we should have the following directory structure:
├───common
│ redis_client.py
│ tasks.py
│
├───consumer
│ .dockerignore
│ Dockerfile
│ docker_build.sh
│ run.sh
│
└───producer
.dockerignore
app.py
Dockerfile
docker_build.sh
run.sh
utils.py
Let's take a look at producer/docker_build.sh:
cp -r ../common/* .
docker build -t your-docker-hub-username/task-producer .
rm redis_client.py
rm tasks.py
In this configuration, my IDE's IntelliSense does not work for the functions imported from the common files.
Another obvious problem is that we need a separate rm command for each common file. It is possible to move the entire common folder into producer/common instead, but that would require us to change the imports in app.py.
A possible solution is to move the common files into the producer folder and leave them there, changing their permissions so that they cannot be edited from the IDE. But, once again, this means that if we make changes to a file in the common folder, those changes will not be reflected in our IDE's IntelliSense.
For our current requirement, i.e. the producer and consumer applications, I find this third method the most suitable: a single directory holds the code for both applications, along with:
1) Two run scripts: run_producer.sh and run_consumer.sh.
2) Two docker_build scripts: docker_build_producer.sh and docker_build_consumer.sh.
3) Two Dockerfiles: Dockerfile.producer and Dockerfile.consumer.
4) Two dockerignores: .dockerignore.producer and .dockerignore.consumer.
Here is the directory structure:
│ .dockerignore.consumer
│ .dockerignore.producer
│ Dockerfile.consumer
│ Dockerfile.producer
│ docker_build_consumer.sh
│ docker_build_producer.sh
│ run_consumer.sh
│ run_producer.sh
│
└───src
app.py
redis_client.py
tasks.py
utils.py
The run scripts are pretty simple; here's run_producer.sh:
uvicorn src.app:app --port=8000 --host=0.0.0.0
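run_consumer.sh is presumably the one-line counterpart, using the same command the consumer was started with earlier, with the module path adjusted for the src layout:

```shell
#!/bin/bash
huey_consumer.py src.tasks.huey
```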
The docker_build scripts, on the other hand, are a little more convoluted. We have to perform two operations here:
1) Currently, the docker build command does not support passing a custom dockerignore file; it defaults to the file named .dockerignore in the build context. So, while building the producer, we copy .dockerignore.producer to .dockerignore and delete the copy once the build has completed.
2) The second thing we have to take care of is passing the right Dockerfile to the docker build command. Fortunately, docker build does accept a custom Dockerfile via the -f argument.
The docker_build_producer.sh script should look something like the following:
#!/bin/bash
cp .dockerignore.producer .dockerignore
docker build -f Dockerfile.producer -t your-docker-hub-username/task-producer .
rm .dockerignore
This method lets us use our IDE's IntelliSense and avoids duplicate files in version control, but at the cost of maintaining two copies of every build- and development-related file. With a large number of applications, this could quickly turn into a mess.
A solution would be to use a simple bash script that lets us run or build either the producer or the consumer. Create a new file called manager.sh in the project's root directory:
#!/bin/bash

if [ $# -lt 2 ]; then
    echo "Usage: $0 <command> <target>"
    exit 1
fi

COMMAND=$1
TARGET=$2

valid_targets=("producer" "consumer")

run_target() {
    local target_app="$1"

    if [ "$target_app" == "producer" ]; then
        uvicorn src.app:app --port=8000 --host=0.0.0.0
    elif [ "$target_app" == "consumer" ]; then
        huey_consumer.py src.tasks.huey
    fi
}

build_target() {
    local target_app="$1"
    local target_docker_dir="docker/$target_app"

    cp -r "$target_docker_dir"/. .
    docker build -t "your-docker-hub-username/task-$target_app" .
    rm Dockerfile
    rm .dockerignore
}

if [[ ! " ${valid_targets[*]} " =~ " $TARGET " ]]; then
    echo "Invalid target. Valid targets are: 'producer' and 'consumer'."
    exit 1
fi

if [ "$COMMAND" == "run" ]; then
    run_target "$TARGET"
elif [ "$COMMAND" == "build" ]; then
    build_target "$TARGET"
else
    echo "Invalid command specified. Valid commands are: 'run' and 'build'."
    exit 1
fi
The script accepts one of two commands, run or build, and one of two targets, producer or consumer, e.g. ./manager.sh run producer or ./manager.sh build consumer.
The directory structure would now look like the following:
│ manager.sh
│
├───docker
│ ├───consumer
│ │ .dockerignore
│ │ Dockerfile
│ │
│ └───producer
│ .dockerignore
│ Dockerfile
│
└───src
app.py
redis_client.py
tasks.py
utils.py
In this way, we can manage multiple applications without an overwhelming number of files.
We've gone over three methods of structuring our code so that we can build multiple applications out of the same code base. Which method is the best? Well, as with most things in software engineering, it depends on your use case.
See you next time :)