Unlocking Big Data Projects with Temporal Orchestration

Case study - Health & Wellness

This case study demonstrates how Temporal's workflow orchestration provided the control and insight needed to launch a multi-tiered serverless architecture project built around a complex data pipeline.

Background

Data insights can be hard to come by in the race management industry, as the existing race management platforms share no agreed-upon standard.

Event directors want to better understand how their data compares to other marathons and races, but doing so means solving an industry-wide challenge.

TechFabric’s partner Athlinks (a Life Time company) set out to change things by building a new industry-wide Business Intelligence (BI) platform. With it, directors can not only see their own data broken down in helpful ways, but they can put things in perspective by comparing themselves against similar “cohort” events and industry averages. Added to this, the platform offers a series of potent data-driven tools to help boost registration sales and manage events.  

To make this happen, they first needed to build a data pipeline capable of stitching together upcoming and historic event data, one powerful enough to follow the trail of each individual athlete across a variety of sources.

In the past, a project of this complexity would have been reserved for only the largest companies capable of big data analysis. However, new technologies and development approaches now make this kind of work significantly more accessible.

Challenges

The challenges here add up quickly:  

  • Serverless Microservice Architecture: Serverless infrastructure is an expedient way to build applications, and it is often more economical and scalable than alternatives such as managed services or virtual machines, but it comes with constraints of its own.
  • Serverless Limitations: Serverless functions have execution time limits that make long-running processes difficult; AWS Lambda, for example, caps a single invocation at 15 minutes.
  • Data Ambiguity: Matching event records to actual people, over time and across disparate events, can be troublesome because the data is typically inconsistent, incomplete, or overly broad. For instance, most result records lack email addresses and provide only an age range rather than a specific date of birth. On top of that, over years of participation, athletes may change their name, phone number, or address.
  • N² Problem: With over 400 million records, the matching/deduplication process can quickly exceed processing limits if it is not properly curated and controlled; comparing every record against every other record means on the order of N² comparisons, and the work collapses under the sheer weight of the data. (See the blocking sketch after the figure below.)
(Figure: the N² problem visualized)
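
The case study does not include the matching code itself, but a common way to tame this quadratic blow-up in deduplication work is "blocking": records are grouped under a coarse key and compared only within their group. The sketch below is purely illustrative; the field names (last_name, birth_year, record_id) and the key choice are assumptions, not the actual Athlinks schema.

```python
# Hypothetical sketch of "blocking" for deduplication: compare records only
# within coarse buckets instead of across the entire corpus.
from collections import defaultdict
from itertools import combinations


def blocking_key(record: dict) -> str:
    # Coarse key: normalized last name plus a 5-year birth-year bucket
    # (result records often supply only an age range, so the year is approximate).
    return f"{record['last_name'].strip().lower()}|{record['birth_year'] // 5}"


def candidate_pairs(records: list[dict]):
    """Yield only within-bucket pairs instead of all N*(N-1)/2 pairs."""
    buckets: dict[str, list[dict]] = defaultdict(list)
    for rec in records:
        buckets[blocking_key(rec)].append(rec)
    for bucket in buckets.values():
        yield from combinations(bucket, 2)


if __name__ == "__main__":
    sample = [
        {"record_id": 1, "last_name": "Lopez", "birth_year": 1987},
        {"record_id": 2, "last_name": "Lopez ", "birth_year": 1986},
        {"record_id": 3, "last_name": "Chen", "birth_year": 1990},
    ]
    for a, b in candidate_pairs(sample):
        print(a["record_id"], "<->", b["record_id"])  # only 1 <-> 2 is compared
```

With well-chosen keys, each bucket stays small, so the total number of comparisons grows roughly linearly with the number of records rather than quadratically.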

Solution

  • Workflow Orchestration: A series of Temporal workflows was created to oversee the data pipeline, giving developers granular control where desired without having to micromanage every aspect of the system. The high-level business logic is centralized within the workflows rather than spread across the system, which is significantly less labor-intensive and more efficient than choreography. Temporal also proved more convenient for development and less limited than alternatives such as AWS Step Functions. (A minimal workflow sketch follows this list.)
  • Segmented Business Rules: Each tier or step in the data pipeline is configured with distinct business rules. This greatly increases matching accuracy because it contextualizes the data appropriately for each step.
  • Data Reduction: Multiple avenues were used to condense and reduce the data to make it less unwieldy.
  • Retry Mechanism: Temporal seamlessly handles retries and ensures no sub-step is skipped before the next step is initiated.
  • Visibility: Temporal quickly identifies problems and pushes them, along with a preview of the failures, to Slack for real-time visibility.
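
The production workflow code is not shown in the case study, but Temporal's SDKs (Go, Java, TypeScript, Python, and others) all express the same pattern. The Python sketch below illustrates the ideas from the list above under stated assumptions: the activity names, the batch_id parameter, the timeouts, and the Slack message are hypothetical, not the Athlinks implementation. The workflow centralizes the step ordering, each activity carries a retry policy so no sub-step is skipped before the next begins, and failures are pushed to Slack before the workflow re-raises.

```python
# Hypothetical sketch of a Temporal workflow orchestrating two pipeline steps
# with automatic retries and a Slack alert on failure (Python SDK).
from datetime import timedelta

from temporalio import activity, workflow
from temporalio.common import RetryPolicy
from temporalio.exceptions import ActivityError


@activity.defn
async def ingest_event_data(batch_id: str) -> int:
    ...  # pull upcoming and historic event records for this batch
    return 0


@activity.defn
async def match_athletes(batch_id: str) -> int:
    ...  # apply this tier's matching/deduplication rules
    return 0


@activity.defn
async def notify_slack(message: str) -> None:
    ...  # post a failure preview to a Slack webhook


@workflow.defn
class RacePipelineWorkflow:
    @workflow.run
    async def run(self, batch_id: str) -> int:
        retries = RetryPolicy(maximum_attempts=5, backoff_coefficient=2.0)
        try:
            # Each step starts only after the previous one has fully succeeded.
            await workflow.execute_activity(
                ingest_event_data,
                batch_id,
                start_to_close_timeout=timedelta(minutes=15),
                retry_policy=retries,
            )
            return await workflow.execute_activity(
                match_athletes,
                batch_id,
                start_to_close_timeout=timedelta(hours=2),
                retry_policy=retries,
            )
        except ActivityError as err:
            # Surface the failure to Slack for real-time visibility, then re-raise.
            await workflow.execute_activity(
                notify_slack,
                f"Pipeline batch {batch_id} failed: {err}",
                start_to_close_timeout=timedelta(minutes=1),
            )
            raise
```

A separate worker process would register the workflow and activities with the Temporal service; that wiring is omitted here for brevity.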

Results

Because the team decided during the prototyping phase to pair serverless AWS architecture with Temporal, building the new application and data pipeline has gone smoothly, especially given the inherent complexity of this project.

Not only does Temporal provide the visibility and orchestration layers needed to build and debug the system, but it also reduces build time, as it automatically provides retries and other mechanisms that would otherwise have to be baked into the product.

Bottom line – implementing Temporal with well-designed orchestration can make complex data projects significantly more accessible.

Let’s build something amazing.
