With so many “something-as-a-service” options at our disposal for infrastructure, in 2017 we opted for function-as-a-service (FaaS) to build our platform of e-commerce services. Our mission was, and remains, to build for e-commerce services what AWS built for web services, or what we call the commerce fabric of the internet.
Building with FaaS, and paying only for function execution time rather than for constantly running servers (i.e., serverless), allowed us to serve customers early on without incurring high infrastructure costs. These early customers were large household names, referred to us by our founding team, who had led the digital transformation at Staples. Part of that transformation involved moving from IBM WebSphere, a monolithic e-commerce platform, to a service-oriented architecture built with open-source and custom software.
E-commerce veterans who knew our founding team wanted to transform digital at their own companies, which led us to build our platform of modular commerce services in 2017. Serverless FaaS on AWS Lambda ran these services reliably and efficiently, and it continued to do so through 2020, when we raised our $9.5M seed round and attracted customers outside our circle of friends.
But as we have raised more money (most recently our $100M Series B) and onboarded global brands like GNC, life on the serverless backend is not as easy as it once was. Challenges that were inconsequential early on are now more pronounced. In this post, I will explain how we address these challenges and show you how serverless FaaS powers our platform of commerce services.
Our Serverless Architecture
The current iteration of the fabric platform can be hosted on AWS, GCP, Azure, or even Knative, since all of these are supported by the Serverless Framework we use. However, when we adopted serverless in 2017, AWS had the most mature offering, so choosing Lambda for FaaS was simple.
Fast forward to today: we still use Lambda, and we sparingly route requests through AWS Fargate, a container-as-a-service (CaaS) environment. We mainly use Fargate when cold starts prevent us from meeting latency service level agreements (SLAs) with Lambda. Since Lambda powers 95% of all processes within our platform, however, we've found ways to address these issues, which I'll talk about below.
Our products are structured in layers: a data store layer, with a business layer on top of it where most logic is programmed as it would be in any other backend. Lambda orchestrates event-driven workflows at both the business and data store layers.
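The Lambda-or-Fargate decision can be expressed as a simple per-route rule: if the worst-case Lambda latency (cold start plus handler time) would breach the route's SLA, serve that route from Fargate. This is a hypothetical sketch, not fabric's actual routing logic; the RouteProfile fields, the chooseBackend function, and the latency numbers are all illustrative assumptions.

```typescript
// Hypothetical per-route latency profile; real numbers would come from monitoring.
interface RouteProfile {
  route: string;
  slaMs: number; // latency SLA for the route
  coldStartMs: number; // observed Lambda cold start overhead (e.g., p99)
  handlerMs: number; // typical handler execution time
}

type Backend = "lambda" | "fargate";

// Use Fargate only when a cold start would push the request past its SLA.
function chooseBackend(p: RouteProfile): Backend {
  const worstCaseLambdaMs = p.coldStartMs + p.handlerMs;
  return worstCaseLambdaMs > p.slaMs ? "fargate" : "lambda";
}

// Illustrative profiles (hypothetical routes and numbers).
const profiles: RouteProfile[] = [
  { route: "GET /products/{id}", slaMs: 300, coldStartMs: 900, handlerMs: 60 },
  { route: "POST /orders", slaMs: 2000, coldStartMs: 900, handlerMs: 250 },
];

for (const p of profiles) {
  console.log(`${p.route} -> ${chooseBackend(p)}`);
}
```

With these illustrative numbers, the product lookup (960 ms worst case against a 300 ms SLA) goes to Fargate, while order creation (1,150 ms against 2,000 ms) stays on Lambda, which matches the "sparingly" pattern described above.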
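One step of such an event-driven workflow can be sketched as a business-layer function that receives an event from the data store layer and returns the follow-on events for the next step. The event names (order.written, inventory.reserve), their shapes, and the onOrderWritten function are hypothetical, and publishing the resulting events (for example, to a queue) is left out.

```typescript
// Hypothetical event published by the data store layer after an order document is written.
interface OrderWrittenEvent {
  type: "order.written";
  orderId: string;
  items: { sku: string; quantity: number }[];
}

// Hypothetical follow-on event consumed by an inventory function downstream.
interface InventoryReserveEvent {
  type: "inventory.reserve";
  orderId: string;
  sku: string;
  quantity: number;
}

// Business-layer step: a pure function from one event to the next events in the
// workflow. It keeps no state between calls, so any Lambda environment can run it.
export function onOrderWritten(event: OrderWrittenEvent): InventoryReserveEvent[] {
  return event.items
    .filter((item) => item.quantity > 0)
    .map((item): InventoryReserveEvent => ({
      type: "inventory.reserve",
      orderId: event.orderId,
      sku: item.sku,
      quantity: item.quantity,
    }));
}

// Lambda entry point (Node.js runtimes accept async handlers).
export const handler = async (event: OrderWrittenEvent): Promise<InventoryReserveEvent[]> =>
  onOrderWritten(event);
```

Keeping the business logic in a pure function and the Lambda entry point as a thin wrapper makes each workflow step easy to test outside AWS.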
The actual product offerings consist of commerce applications like a product information manager (PIM) and order management system (OMS), referred to as Co-Pilot Apps in the diagram below. (Co-Pilot is the name of the UI that merchandisers, marketers, and other business users interact with.) We also offer e-commerce storefronts that fast-track storefront development. Each type of offering has a separate yet connected business layer.
At the top level, every layer runs on Lambda in a serverless manner. API Gateway routes connect the layers, and the logic layer connects to the external services layer through integrations.
Every action, process module, and element of the platform is isolated into a standalone Lambda function. The resulting layout achieves full separation of concerns, with each gateway route connected to exactly one layer action or process. This avoids the most common resource-sharing issues and allows each action or module to scale independently.
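The one-route-to-one-function rule can be checked mechanically. The sketch below uses a hypothetical route table (the route paths and function names are illustrative, not fabric's) and reports any function bound to more than one route; route keys are already unique because they are object keys.

```typescript
// Hypothetical route table: each API Gateway route maps to exactly one function.
const routeTable: Record<string, string> = {
  "GET /pim/products/{id}": "pimGetProduct",
  "POST /pim/products": "pimCreateProduct",
  "POST /oms/orders": "omsCreateOrder",
  "GET /oms/orders/{id}": "omsGetOrder",
};

// Return every function that serves more than one route, which would break
// the isolation that lets each function scale on its own.
function findSharedFunctions(table: Record<string, string>): string[] {
  const counts: Record<string, number> = {};
  for (const route of Object.keys(table)) {
    const fn = table[route];
    counts[fn] = (counts[fn] || 0) + 1;
  }
  return Object.keys(counts).filter((fn) => counts[fn] > 1);
}

console.log(findSharedFunctions(routeTable)); // no shared functions
```

A check like this could run in CI against the functions defined in the service configuration, so a shared handler is caught before it is deployed.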
Engineers have numerous tools for managing serverless setups like the one I've described so far. We chose the Serverless Framework to manage our serverless infrastructure as code (IaC), mainly because it was one of the earliest tools on the market.
Adopting the Serverless Framework as our infrastructure modeling and standardization tool lets us spin up environments quickly. Further, the framework's vendor-agnostic nature leaves room to migrate toward a cloud-agnostic, client-requested offering down the road. With the Serverless Framework, we could support such an offering while keeping the same variables across a growing number of fabric platform instances.
Beyond these benefits, we found getting started with the Serverless Framework incredibly easy, even for a platform of applications and APIs as large as ours. If you want to try it yourself, open a terminal and install the framework with the Node package manager (npm install -g serverless). Running the serverless command then sets up a project that you configure and manage through the framework's CLI. That project includes a serverless.yml file, which is where it all started for us in 2017!
Below is what a typical serverless.yml configuration looks like with the Serverless Framework, in this case for a Node.js app running on a MongoDB data store layer. The file is written in YAML, and this setup is similar to our own.
The provider section of the config file is where you choose a cloud provider. We continue to use AWS as our go-to provider because we've found that other providers cause capacity bottlenecks as we support more of the world's largest retailers and brands. With AWS selected, each function in the service is associated with a handler and its specific API Gateway routes.
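The Serverless Framework also accepts a serverless.ts file that exports the same structure as serverless.yml. A minimal sketch in that form, assuming a hypothetical service name, handler paths, and MONGODB_URI environment variable; the local types cover only the fields used here (the @serverless/typescript package provides a complete AWS type).

```typescript
// Minimal local types for the parts of the configuration used in this sketch.
interface HttpEvent {
  http: { method: string; path: string };
}
interface FunctionConfig {
  handler: string;
  events: HttpEvent[];
}
interface ServerlessConfig {
  service: string;
  frameworkVersion: string;
  provider: {
    name: string;
    runtime: string;
    region: string;
    stage: string;
    environment: Record<string, string>;
  };
  functions: Record<string, FunctionConfig>;
}

const serverlessConfiguration: ServerlessConfig = {
  service: "commerce-pim", // hypothetical service name
  frameworkVersion: "3",
  provider: {
    name: "aws",
    runtime: "nodejs18.x",
    region: "us-east-1",
    stage: "dev",
    environment: {
      // Plain string on purpose: the Framework resolves ${env:...} at deploy time.
      MONGODB_URI: "${env:MONGODB_URI}",
    },
  },
  functions: {
    // One function per action, each bound to its own API Gateway route.
    getProduct: {
      handler: "src/products.get", // hypothetical module path and export
      events: [{ http: { method: "get", path: "products/{id}" } }],
    },
    createProduct: {
      handler: "src/products.create",
      events: [{ http: { method: "post", path: "products" } }],
    },
  },
};

// In an actual serverless.ts, this object is exported; the Framework's
// aws-nodejs-typescript template uses `module.exports = serverlessConfiguration`.
```

The same keys (service, provider, functions, handler, events) appear in serverless.yml, so the YAML version of this file is a line-for-line translation.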
Using this template has dramatically improved the accuracy and speed of infrastructure configurations as we scale with serverless.
As good as the story sounds so far, we have experienced our fair share of challenges with serverless and continue to experiment with ways to maintain business efficiencies while improving capacity and performance.
While our current setup has enough capacity to support existing customers, there's always concern about having enough serverless infrastructure to support more, larger customers—especially during holiday shopping bursts. To get ahead of this, we are experimenting with augmenting Lambda with Amazon EKS using serverless containers with AWS Fargate.
Cloud-agnostic support is an option we are exploring as well. But before allowing customers to dictate their cloud provider, we need to better understand capacity with GCP and Azure. With customers calling the shots for infrastructure, capacity issues could also arise with AWS. For instance, if a customer selects a deployment region outside the main US East and US West regions, Lambda's burst concurrency diminishes.
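A rough worked example of why the region matters, assuming the burst model AWS documented before it changed Lambda's scaling behavior in late 2023: an initial burst of 3,000 concurrent executions in US East (N. Virginia), US West (Oregon), and Europe (Ireland); 1,000 in US East (Ohio), Europe (Frankfurt), and Asia Pacific (Tokyo); 500 in other regions; then 500 additional instances per minute. The figures are illustrative, minutesToReach is a hypothetical helper, and the sketch ignores the account-level concurrency quota.

```typescript
// Initial burst concurrency by region under AWS Lambda's earlier scaling model.
// Illustrative only: check current AWS documentation for today's behavior.
const initialBurstByRegion: Record<string, number> = {
  "us-east-1": 3000, // US East (N. Virginia)
  "us-west-2": 3000, // US West (Oregon)
  "eu-west-1": 3000, // Europe (Ireland)
  "us-east-2": 1000, // US East (Ohio)
  "eu-central-1": 1000, // Europe (Frankfurt)
  "ap-northeast-1": 1000, // Asia Pacific (Tokyo)
};
const DEFAULT_BURST = 500; // all other regions
const SCALE_PER_MINUTE = 500; // extra instances per minute after the initial burst

// Minutes needed to reach a target concurrency (e.g., a holiday traffic spike).
function minutesToReach(region: string, targetConcurrency: number): number {
  const burst = initialBurstByRegion[region] ?? DEFAULT_BURST;
  if (targetConcurrency <= burst) return 0;
  return Math.ceil((targetConcurrency - burst) / SCALE_PER_MINUTE);
}
```

Under these assumptions, a surge to 5,000 concurrent executions takes 4 minutes to absorb in us-east-1 but 9 minutes in a region limited to a 500 burst, such as sa-east-1; requests arriving in the meantime are throttled.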
What makes serverless efficient for your business can also hurt application performance, particularly with Lambda functions. The slowdown occurs during a cold start, when Lambda downloads your code and starts a new execution environment before it can handle the request.
To reduce this overhead, we have tested and implemented several performance optimizations. One workaround for avoiding cold starts is sending periodic warmup events, which keep execution environments from being shut down after sitting idle. We are also exploring Lambda@Edge (Lambda + Amazon CloudFront) to offset latency caused by Lambda. Throughout, we are careful not to erode the cost advantage that led us to choose Lambda in the first place.
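A minimal sketch of a warmup-aware handler, assuming a Serverless Framework schedule event (for example, rate(5 minutes)) whose input sets a hypothetical warmup field. The handler returns immediately for warmup pings, and the module-scope flag shows which invocation paid for the cold start. The event shape and handleEvent function are illustrative; real API Gateway events carry many more fields.

```typescript
// Hypothetical event shape for this sketch.
interface HandlerEvent {
  warmup?: boolean; // hypothetical marker set by the scheduled warmup rule's input
  pathParameters?: { id?: string };
}

interface HandlerResult {
  statusCode: number;
  body: string;
}

// Module scope runs once per execution environment, i.e., during a cold start.
// Expensive setup (SDK clients, database connections) belongs here so that
// warm invocations can reuse it.
let coldStart = true;

export function handleEvent(event: HandlerEvent): HandlerResult {
  const wasCold = coldStart;
  coldStart = false;

  if (event.warmup === true) {
    // Warmup ping: keep the environment alive and skip the real work.
    return { statusCode: 200, body: JSON.stringify({ warmup: true, wasCold }) };
  }

  const id = event.pathParameters?.id ?? "unknown";
  return { statusCode: 200, body: JSON.stringify({ id, wasCold }) };
}

// Lambda entry point for a Node.js runtime.
export const handler = async (event: HandlerEvent): Promise<HandlerResult> =>
  handleEvent(event);
```

A single ping keeps roughly one execution environment warm, so bursts of concurrent requests beyond the warm environments can still hit cold starts; the pings themselves are billed invocations, which is the cost trade-off mentioned above.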
Stick with Serverless?
Despite these challenges, we remain loyal to serverless infrastructure and to putting product development ahead of server management. As one of the fastest-growing tech startups in e-commerce this year, we have been well served by serverless, which has let us support new customers while quickly iterating on new commerce services like Subscriptions and Loyalty Management.
While AWS Lambda is not yet perfect, we have noticed that, as fabric grows, so too does Lambda in terms of capacity. This gives us breathing room to test iterations of Lambda and other serverless offerings from AWS while continuing conversations and experimentation around matching the right architecture with the right infrastructure.