Sustainable Cloud Operations for Research (SCORE): A guideline for cost-efficient and carbon-aware data pipelines design in low- and middle-income countries research
Nigel Thuku Kamotho, Allan Zablon, Stephen Wong, Amos Bunde, Antony Kagure, Thembi Kiiru, Anthony Ngugi, Amina Abubakar, Akbar Waljee, and Farhana Alarakhiya
As cloud computing becomes a pillar of modern data-intensive research, especially in fields such as global health, many organizations in low- and middle-income countries (LMICs) remain hindered by operational constraints. These include limited DevOps capability, unreliable internet connectivity, fixed grant ceilings, and growing demands for environmental sustainability. Standard cloud architectures assume stable infrastructure and high technical capacity, which makes them neither cost-effective nor sustainable in LMIC environments.
This report introduces the Sustainable Cloud Operations for Research (SCORE) guidelines, a set of context-specific cloud architecture models that help LMIC research groups optimize architecture decisions for cost, performance, and carbon emissions. The guidelines consist of three phases: Assessment, Selection, and Optimization. Each phase offers actionable steps to guide pipeline design and resource management under constrained operating conditions.
The guidelines were validated using the National Institutes of Health Chest X-ray dataset (42 GB). Three Azure-native ingestion strategies derived from the guidelines were compared: Synapse Pipelines (no-code), Azure Functions (serverless), and Synapse Notebooks (code-based). Execution time, cost, and emissions were tracked using the Azure Monitor, Cost Management, and Emissions Insights tools. The serverless approach (Azure Functions) achieved the lowest cost (USD 0.01), the lowest carbon emissions (0.00003 kg CO₂e), and the shortest execution time (1.12 hours), whereas the code-based approach (Synapse Notebooks) incurred the highest cost (USD 15.20) and emissions (0.16901 kg CO₂e), primarily because it required a dedicated Spark pool. Additionally, geo-replication accounted for approximately 85% of storage costs, highlighting the need for a clearer understanding of cloud pricing structures.
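To put the reported gap in perspective, the following minimal Python sketch computes how many times larger the Synapse Notebooks figures are than the Azure Functions figures. It uses only the cost and emissions values stated above; the dictionary layout and the `ratio` helper are illustrative assumptions, not part of the SCORE guidelines. Synapse Pipelines is omitted because its figures are not given here.

```python
# Cost and emissions figures as reported in this abstract.
# The data structure and helper below are illustrative assumptions.
reported = {
    "Azure Functions (serverless)": {"cost_usd": 0.01, "kg_co2e": 0.00003},
    "Synapse Notebooks (code-based)": {"cost_usd": 15.20, "kg_co2e": 0.16901},
}


def ratio(metric: str, high: str, low: str) -> float:
    """Return how many times larger `high` is than `low` for a metric."""
    return reported[high][metric] / reported[low][metric]


notebooks = "Synapse Notebooks (code-based)"
functions = "Azure Functions (serverless)"

cost_ratio = ratio("cost_usd", notebooks, functions)
emissions_ratio = ratio("kg_co2e", notebooks, functions)

print(f"Cost ratio (Notebooks / Functions): {cost_ratio:,.0f}x")
print(f"Emissions ratio (Notebooks / Functions): {emissions_ratio:,.0f}x")
```

Under these figures, both ratios span more than three orders of magnitude, which is consistent with the dedicated Spark pool being the dominant driver for the code-based approach.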
SCORE addresses a critical gap among existing cloud infrastructure guidelines. Rather than offering broad best practices, it provides systematic guidance in a practical, iterative model that addresses the fiscal, technical, and sustainability constraints that LMIC institutions face. The guidelines are designed to be practical, scalable, and consistently applicable across a range of research environments where cloud integration must be efficient and sustainable.