It is common to find codebases (monorepos) in which logic is repeated across multiple applications. We evaluate three ways to structure such codebases for ease of development, version control, and deployment.
2024-03-17
6 min read
By: Abhilaksh Singh Reen
In this blog post, we cover three ways to structure a multi-application codebase with files shared between the applications.
We evaluate the different methods on the basis of:
1) Ease of development
2) Ease of version control
3) Ease of building a Docker image and deploying
I have a very simple demo application: a producer and a consumer that produce and consume tasks for the Huey task queue. The producer and consumer are two separate applications, but they have some files in common, such as the redis_client.py file that sets up the connection to Redis, or the tasks.py file that defines the tasks to be produced and consumed.
Once the development of the applications is complete, we will also be building two Docker images for the producer and consumer and pushing them to Docker Hub.
Both applications have two files in common. The first one is the redis_client.py file that contains our Redis connection:
from os import environ

from redis import Redis

REDIS_HOST = environ["REDIS_HOST"]
REDIS_PORT = int(environ["REDIS_PORT"])
REDIS_DB = int(environ["REDIS_DB"])
REDIS_PASSWORD = environ.get("REDIS_PASSWORD", None)

redis_client = Redis(host=REDIS_HOST, port=REDIS_PORT, db=REDIS_DB, password=REDIS_PASSWORD, decode_responses=True)
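Since the module reads its configuration from the environment, these variables must be set before anything imports it. The values below are placeholders of my own, not from the original setup:

```shell
# example environment for redis_client.py -- all values are placeholders
export REDIS_HOST=localhost
export REDIS_PORT=6379
export REDIS_DB=0
# REDIS_PASSWORD is optional; the module falls back to None if it is unset
export REDIS_PASSWORD=changeme
```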
The second file is the tasks.py file that contains the tasks registered in Huey as well as some utility functions:
from json import dumps as json_dumps, loads as json_loads
from os import environ
from time import time

from huey import RedisHuey

from .redis_client import redis_client

REDIS_HOST = environ["REDIS_HOST"]

huey = RedisHuey("entrypoint", host=REDIS_HOST)

task_status_map_key = "long_tasks"
task_length = 100_000_000
task_num_intervals = 100


def validate_task_id(task_id):
    progress_str = redis_client.hget(task_status_map_key, task_id)
    return progress_str is not None


def get_task_progress(task_id):
    progress_str = redis_client.hget(task_status_map_key, task_id)
    if progress_str is None:
        return {
            'message': "Task not found.",
        }

    return json_loads(progress_str)


def update_task_progress(task_id, time_elapsed, progress_percentage, status="Processing"):
    task_progress = {
        'status': status,
        'timeElapsed': str(round(time_elapsed, 4)),
        'progress': str(round(progress_percentage, 4)),
    }
    task_progress_str = json_dumps(task_progress)

    redis_client.hset(task_status_map_key, task_id, task_progress_str)


def mark_task_as_failed(task_id, time_elapsed=None, progress_percentage=None):
    task_progress = get_task_progress(task_id)

    if time_elapsed is not None:
        task_progress['timeElapsed'] = str(round(time_elapsed, 4))
    if progress_percentage is not None:
        task_progress['progress'] = str(round(progress_percentage, 4))

    task_progress['status'] = "Failed"
    task_progress['failedAt'] = task_progress['progress']

    task_progress_str = json_dumps(task_progress)
    redis_client.hset(task_status_map_key, task_id, task_progress_str)


@huey.task()
def long_task(task_id: str):
    progress_update_interval = task_length // task_num_intervals

    start_time = time()
    update_task_progress(task_id, 0, 0)

    count = 0
    for i in range(1, task_length + 1):
        try:
            count += 1

            if i % progress_update_interval == 0:
                time_elapsed = time() - start_time
                progress_percentage = (count / task_length) * 100
                update_task_progress(task_id, time_elapsed, progress_percentage)
        except Exception:
            time_elapsed = time() - start_time
            progress_percentage = (count / task_length) * 100
            mark_task_as_failed(task_id, time_elapsed, progress_percentage)
            return

    time_elapsed = time() - start_time
    update_task_progress(task_id, time_elapsed, 100, status="Completed")
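The records stored in the Redis hash are plain JSON strings. A minimal sketch of their shape (the helper name build_task_progress is mine; it mirrors update_task_progress without the Redis round trip):

```python
from json import dumps, loads


def build_task_progress(time_elapsed, progress_percentage, status="Processing"):
    # mirrors update_task_progress: numbers are rounded to 4 places and stored as strings
    return dumps({
        'status': status,
        'timeElapsed': str(round(time_elapsed, 4)),
        'progress': str(round(progress_percentage, 4)),
    })


record = loads(build_task_progress(1.23456, 42.0))
print(record)  # {'status': 'Processing', 'timeElapsed': '1.2346', 'progress': '42.0'}
```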
To produce tasks for the Huey worker, we'll create a simple server with the following two endpoints:
1) To add a task to the queue: randomly generates the task id, adds it to the queue, and returns it in a JSON response.
2) To get the status of a task: returns the status (progress, success, etc) of a task in a JSON response.
To make this work, we'll need two more files apart from the common ones. The first is utils.py, which contains a simple utility function:
from uuid import uuid4
from time import time


def generate_task_id():
    return str(uuid4()) + "---" + str(time())
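Each generated id is a UUID4 and a Unix timestamp joined by a "---" separator. A quick self-contained demonstration (the parsing code after the function is mine):

```python
from time import time
from uuid import UUID, uuid4


def generate_task_id():
    return str(uuid4()) + "---" + str(time())


task_id = generate_task_id()
uuid_part, _, timestamp_part = task_id.partition("---")

# the first segment parses back into a UUID, the second into a float timestamp
print(UUID(uuid_part), float(timestamp_part))
```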
The second file is app.py, which defines our FastAPI server:
from os import environ

from fastapi import FastAPI
from fastapi.responses import JSONResponse
import prometheus_client

from .tasks import get_task_progress, long_task, update_task_progress, validate_task_id
from .utils import generate_task_id

METRICS_PORT = int(environ["METRICS_PORT"])
prometheus_client.start_http_server(METRICS_PORT)

app = FastAPI()


@app.post("/api/tasks/add")
def add_task():
    new_task_id = generate_task_id()
    update_task_progress(new_task_id, 0, 0, status="Queued")

    long_task(new_task_id)

    return JSONResponse(status_code=200, content={
        'success': True,
        'result': {
            'id': new_task_id,
            'message': "Task queued.",
        },
    })


@app.get("/api/tasks/status/{task_id}")
async def get_task_status(task_id: str):
    if not validate_task_id(task_id):
        return JSONResponse(status_code=404, content={
            'success': False,
            'error': {
                'message': "Task not found.",
            },
        })

    task_progress = get_task_progress(task_id)

    return JSONResponse(status_code=200, content={
        'success': True,
        'result': {
            'progress': task_progress,
        },
    })
The consumer needs no files other than the two common ones. In a virtual environment, install its dependencies:
pip install redis huey
To start the consumer, run the following command:
huey_consumer.py tasks.huey
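The post doesn't show the Dockerfiles themselves; a minimal sketch of what the consumer's Dockerfile might look like (the base image, working directory, and install step are my assumptions, not from the original repository):

```dockerfile
FROM python:3.11-slim

WORKDIR /app

RUN pip install --no-cache-dir redis huey

# the two common files are all the consumer needs
COPY redis_client.py tasks.py ./

CMD ["huey_consumer.py", "tasks.huey"]
```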
The first method is the most straightforward: we keep two separate folders for the two applications, and the common files are duplicated in both.
Here's the directory structure:
├───consumer
│ .dockerignore
│ Dockerfile
│ docker_build.sh
│ redis_client.py
│ tasks.py
│
└───producer
.dockerignore
app.py
Dockerfile
docker_build.sh
redis_client.py
tasks.py
utils.py
But there's an obvious problem here: we have to manage the same files in two places, and version control will contain duplicates. A workaround is to update these files only in the consumer folder and modify producer/docker_build.sh to copy the latest versions from there before building the Docker image. We'd also need a run.sh script that performs the same copy when running the server in development.
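Under that workaround, producer/docker_build.sh might look like the following sketch (the image name matches the one used later in the post; the consumer folder holds the canonical copies):

```shell
#!/bin/bash
# refresh the common files from the consumer folder before building
cp ../consumer/redis_client.py ../consumer/tasks.py .

docker build -t your-docker-hub-username/task-producer .
```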
But, if we are copying files, why not just move all the common files into a separate directory?
Let's create a new directory called common on the same level as the producer and consumer directories and move the redis_client.py and tasks.py files into this directory. Now, we should have the following directory structure:
├───common
│ redis_client.py
│ tasks.py
│
├───consumer
│ .dockerignore
│ Dockerfile
│ docker_build.sh
│ run.sh
│
└───producer
.dockerignore
app.py
Dockerfile
docker_build.sh
run.sh
utils.py
Let's take a look at producer/docker_build.sh:
cp -r ../common/* .
docker build -t your-docker-hub-username/task-producer .
rm redis_client.py
rm tasks.py
In this configuration, my IDE's IntelliSense does not work for the functions imported from the common files.
Another obvious problem is that we need a separate rm command for each common file. It is possible to move the entire common folder into producer/common instead, but that would require us to change the imports in app.py.
A possible solution is to move the common files into the producer folder and leave them there, changing their permissions so that they cannot be edited from the IDE. But, once again, this means that if we make changes to a file in the common folder, those changes will not be reflected in our IDE's IntelliSense.
For our current requirement, i.e. the producer and consumer applications, I find this third method the most suitable: a single directory holds the code for both applications, along with:
1) Two run scripts: run_producer.sh and run_consumer.sh.
2) Two docker_build scripts: docker_build_producer.sh and docker_build_consumer.sh.
3) Two Dockerfiles: Dockerfile.producer and Dockerfile.consumer.
4) Two dockerignores: .dockerignore.producer and .dockerignore.consumer.
Here is the directory structure:
│ .dockerignore.consumer
│ .dockerignore.producer
│ Dockerfile.consumer
│ Dockerfile.producer
│ docker_build_consumer.sh
│ docker_build_producer.sh
│ run_consumer.sh
│ run_producer.sh
│
└───src
app.py
redis_client.py
tasks.py
utils.py
The run scripts are pretty simple; here's run_producer.sh:
uvicorn src.app:app --port=8000 --host=0.0.0.0
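run_consumer.sh is presumably the one-line counterpart, using the same command the consumer was started with earlier, with the module path adjusted for the src layout:

```shell
#!/bin/bash
huey_consumer.py src.tasks.huey
```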
The docker_build scripts, on the other hand, are a little more convoluted. We have to perform two operations here:
1) Currently, the docker build command does not support passing a custom dockerignore file; it defaults to the file named .dockerignore in the build context. So, while building the producer, we copy .dockerignore.producer to .dockerignore and delete the copy once the build has completed.
2) The second thing we have to take care of is passing the right Dockerfile to the docker build command. Fortunately, docker build does accept a custom Dockerfile via the -f argument.
The docker_build_producer.sh script should look something like the following:
#!/bin/bash
cp .dockerignore.producer .dockerignore
docker build -f Dockerfile.producer -t your-docker-hub-username/task-producer .
rm .dockerignore
This method lets us use our IDE's IntelliSense and avoids duplicate files in version control, but at the cost of maintaining two copies of every build- and development-related file. With a large number of applications, this could quickly turn into a mess.
A solution would be to use a simple bash script that lets us run or build either the producer or the consumer. Create a new file called manager.sh in the project's root directory:
#!/bin/bash

if [ $# -lt 2 ]; then
    echo "Usage: $0 <command> <target>"
    exit 1
fi

COMMAND=$1
TARGET=$2

valid_targets=("producer" "consumer")

run_target() {
    local target_app="$1"

    if [ "$target_app" == "producer" ]; then
        uvicorn src.app:app --port=8000 --host=0.0.0.0
    elif [ "$target_app" == "consumer" ]; then
        huey_consumer.py src.tasks.huey
    fi
}

build_target() {
    local target_app="$1"
    local target_docker_dir="docker/$target_app"

    cp -r "$target_docker_dir"/. .
    docker build -t "your-docker-hub-username/task-$target_app" .
    rm Dockerfile
    rm .dockerignore
}

if [[ ! " ${valid_targets[*]} " =~ " $TARGET " ]]; then
    echo "Invalid target. Valid targets are: 'producer' and 'consumer'."
    exit 1
fi

if [ "$COMMAND" == "run" ]; then
    run_target "$TARGET"
elif [ "$COMMAND" == "build" ]; then
    build_target "$TARGET"
else
    echo "Invalid command specified. Valid commands are: 'run' and 'build'."
    exit 1
fi
The script accepts one of two commands, run or build, and one of two targets, producer or consumer, e.g. ./manager.sh run producer or ./manager.sh build consumer.
The directory structure would now look like the following:
│ manager.sh
│
├───docker
│ ├───consumer
│ │ .dockerignore
│ │ Dockerfile
│ │
│ └───producer
│ .dockerignore
│ Dockerfile
│
└───src
app.py
redis_client.py
tasks.py
utils.py
In this way, we can manage multiple applications without an overwhelming number of files.
We've gone over three methods of structuring our code so that we can build multiple applications out of the same code base. Which method is the best? Well, as with most things in software engineering, it depends on your use case.
See you next time :)