The Art of Problem Decomposition in Computer Science

An Illustrative Case Study of a Mildly Complex Problem - Part 1

Posted by Sultan M. Babar, MD on March 5, 2024

Introduction

In the realm of computer science, the concept of problem decomposition stands as a cornerstone technique for problem-solving. This approach involves breaking down a complex problem into smaller, more manageable parts. This is done to make them easier to understand, analyze, and solve. While I have previously discussed the decomposition of simpler problems in another blog post, today, we delve into the intricacies of dissecting a slightly more complex issue. The journey through this case study will not only highlight the practical application of decomposition but also underscore its universality across different domains. Even though this example does not involve clinical informatics, the approach is nearly identical.

The Problem

Although this case is not clinical informatics, it involves a problem that we faced in a clinical setting. The physicians in the Department of Emergency Medicine received their work schedules through emails, specifically in Excel sheets. This method might seem straightforward, but it had significant drawbacks. The main issue was that the Excel sheets were organized by shift and date, which meant that physicians had to comb through the sheet to find their individual schedules. This process was not only time-consuming but also prone to mistakes, as the volume of information could easily overwhelm anyone. Additionally, once they found their schedules, the staff had to manually input this information into their personal calendars. This extra step was not just tedious; it was also prone to errors, which can disrupt personal plans or professional responsibilities. A sample of what this schedule could look like is shown in figure 1.

A sample of a schedule in an Excel sheet
Figure 1: A sample of a schedule in an Excel sheet

Furthermore, every time there was a schedule change, the whole process of distributing, finding, and manually entering schedules had to be redone, adding to the frustration and taking time away from patient care. This situation underscores the importance of solving operational problems effectively to maintain a smooth clinical operation, ensure staff satisfaction, and uphold the quality of care provided to patients.

The Solution: Automate the Process

Given the complexity of the problem, we realized that the solution had to be multi-faceted. The solution was to automate the process of distributing schedules to the physicians. This involved creating a script application that would automate the entire process. The application would also send them notifications of upcoming shifts and perform some other related tasks.

Decomposing the Problem/Solution: High Level

Before diving into the details of the solution, it was important to break down the problem into smaller, more manageable parts. The high-level decomposition of the proposed solution to the problem involved the following steps:

  1. Download Excel sheets from the email inbox
  2. Extract the relevant information from the Excel sheets
  3. Automatically add it to schedules for the physicians

Decomposing the Problem/Solution: 1. Downloading Excel Sheets

Downloading the Excel sheets from the email inbox was the first critical step in our automation process. While this task might seem straightforward for humans, computers require explicit, step-by-step instructions to accomplish it. This step was divided into several sub-steps, each addressing a key aspect of the process:

  1. Google APIs

    What is an API?

    Before we delve into Google APIs, it's crucial to understand what an API (Application Programming Interface) is. An API is a set of instructions, codes, routines, and tools that allows two programs to interact with each other without human intervention. It specifies how software components should interact and can greatly simplify programming by abstracting complex code into simple, actionable functions.

    A screenshot of code used to make a gmail api
    Figure 2: A screenshot of code used to make a gmail api

    The above code, when configured with the appropriate files and scopes, enables my program to authenticate and interact with Gmail, performing functions within the specified scopes. CREATE_SERVICE is a function imported directly from a Google library. For full details on the Gmail API, visit the official documentation. The above function returns what is referred to as a "service." For our purposes, we will assume that it is the key that allows us to perform further functions.

  2. Identifying the Email to Download

    Now we will use the service and a specific query to search for relevant e-mails

    email_messages = search_mail(gmail_service, query_string, ['INBOX'])

    Here, the search_mail function is a custom function that takes the service we created in the previous step and a query string as input and returns the email messages that match the query. The query string is a string that follows the Gmail search syntax. For example, the query string "from: sender has:Attachment after: time -term that will exclude the emails." This snippet returns a list of messages that meet the criteria specified in the variable query_string. Note here that the computer does not accept time in human format. It expects time here in what is called epoch time or UNIX time. Epoch time (or UNIX time) is an integer that basically represents the number of seconds that have passed since 00:00:00 UTC on 1 January 1970. This is another piece of information I had to learn as I was breaking down and solving this problem.

    This function is a foundational step in our process. It authenticates our application with Google, providing the necessary permissions to send, read, and compose emails via the Gmail API. By setting up this service, we can programmatically manage email interactions, which is essential for automating the download of Excel sheets containing the physicians' schedules.

  3. Downloading and saving the attachments

    Once the correct emails are identified, the next step is to download and process these messages. We leverage the Google Gmail API to programmatically access and download email content, including attachments, using the following code:

    A screenshot of code used to process message found.
    Figure 3: A screenshot of code used to process message found.

    The above code will download the attachments and save them as downloaded_attachment_{n}.xlsx where n increases continuously. This function also returns the list of files that were downloaded.

Summary

This marks the initial phase in this somewhat complex process of automating the distribution of work schedules to physicians—a crucial step towards enhancing operational efficiency in a clinical setting. This level of breakdown is necessary just to download Excel sheets from email inboxes. The real work starts once these Excel sheets are securely downloaded. The next installment of this series will explore the critical steps that follow, including the extraction and accurate interpretation of the data from each cell of the excel sheet, and the process to update the individual physician calendars accordingly.

Disclaimer

This post serves as a guide on how to break down and solve a problem, rather than focusing on coding or Python best practices. As an amateur programmer, my approach to coding and the overall design of the program might not align with professional standards. I am sure that seasoned professionals might find aspects of my methodology and coding practices amusing or unconventional. My primary aim is to share the thought process and problem-solving strategies that I found effective, hoping they might be useful to others facing similar challenges, regardless of their coding proficiency.