The growth of SPS Commerce has remained strong, even amidst the recent global pandemic, as we work to enable essential services. Demand for SPS services and products continues to grow, and our architectural patterns must continue to mature alongside that growth. Like many organizations that started with a small cloud footprint and then experienced exponential growth, part of our growing pain stems from the boundaries of our AWS Account structure. Having hundreds of engineers working in a single AWS Account for development simply doesn’t scale effectively. When we started with a single account per environment, AWS was far less mature in its capabilities for managing multiple AWS Accounts within a single Organization. If you are starting out fresh today, definitely heed their warning:
“A well-defined AWS account structure that your teams agree on will help you understand and optimize costs. As with tagging, it is important that you implement a deliberate account strategy early on and allow it to evolve in response to changing needs” (AWS Documentation).
A few of the advantages of working with multiple accounts that are absolute requirements for continued maturity:

- Security isolation and a smaller blast radius when something goes wrong in one account.
- Clear cost visibility and attribution per team, product, or environment.
- Separate service quotas and API rate limits, so one workload cannot starve another.
- Simpler least-privilege boundaries for teams and their workloads.
Managing multiple AWS Accounts inside AWS Organizations is well beyond the intent of this article, but recognizing the advantages of a multi-account organization, and its eventual necessity, is key. That said, there is a fundamental difference between building and deploying into a single shared AWS Account and building and deploying into multiple AWS Accounts. The simplicity of writing IAM permissions in your single AWS Account, where your application has implicit access to everything it needs, suddenly becomes much more confusing and difficult to reason about when your application straddles multiple AWS Accounts. Reasoning aside, the complexities of infrastructure design also begin leaking much more heavily into your codebases, wrapped together with business logic, as specific AWS Account IDs and ARNs start to pollute your code. In an effort to keep that leaking of infrastructure knowledge as far to the boundary of our app domain as possible, let us take a look at patterns we can use to work in an AWS multi-account world, with the day-to-day developer experience top of mind. I think you will find that there are a few generic options to enable and simplify cross-account access, but using IAM Roles associated with EKS and Kubernetes Service Accounts is the cleanest, and really abstracts the cross-account leaks into the infrastructure.
As SPS Engineering teams begin moving their AWS Resources to isolated team- and product-based accounts, they are also moving container-based compute workloads in the opposite direction: centralizing our container workloads onto a new platform built on top of EKS and other AWS services. SPS centralizes containerized compute both to increase compute density in the clusters, lowering costs, and to increase operational efficiency by operating far fewer compute clusters. Through internal custom deployment patterns and architecture, deploying containers to a cluster is relatively simple, as is deploying AWS Resources with CloudFormation to a different account. But the unanticipated complexity of cross-account AWS Resource access has caused some churn among our engineering teams as we work to standardize and “keep things simple” where possible. Implicit same-account access is definitely a luxury of the single-account world that is difficult to leave behind, but leaving it behind is absolutely necessary, as mentioned above, for growth and for security.
The example below sets up our scenario and the starting point we need to migrate from: the standard setup of our two resources, followed by an IAM Role with access to those resources in the same account. Generally speaking, you would be more restrictive with wildcard usage, but this works well for demonstration and simplicity.
# CloudFormation AWS Resources we want to access
Resources:
  AppDemoSecret:
    Type: AWS::SecretsManager::Secret
    Properties:
      Name: AppDemoSecret
      Description: AWS Secret Deployed via CFN
      KmsKeyId: '{{resolve:ssm:/organizations/sip32/sip32-resources-v0-KeyARN:1}}' # KMS Key we have created and are resolving the ARN for

  AppDemoTable:
    Type: AWS::DynamoDB::Table
    Properties:
      TableName: AppDemoTable
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
        - AttributeName: "Code"
          AttributeType: "S"
      KeySchema:
        - AttributeName: "Code"
          KeyType: "HASH"

  # CloudFormation - AWS IAM Role to enable our application code to read these resources.
  AppRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: AppRole
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service: ec2.amazonaws.com
            Action:
              - sts:AssumeRole
      Policies:
        - PolicyName: SecretAccess
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action:
                  - kms:Decrypt
                Resource: '*'
              - Effect: Allow
                Action:
                  - secretsmanager:Describe*
                  - secretsmanager:GetSecretValue
                Resource: '*'
        - PolicyName: DynamoDBAccess
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action:
                  - dynamodb:Get*
                  - dynamodb:Scan # the example code below uses Scan, which dynamodb:Get* does not cover
                Resource: '*'
Applying your created IAM Role to your container implicitly is beyond the scope of this post; it is assumed that you have that capability already. Whether you are using ECS with IAM task roles, Kube2Iam, kiam, or something else, the general principles in this discussion are very similar, if not the same.
Let’s dive in and evaluate some of the available patterns.
The first obvious approach is to assume a different role that exists inside the new AWS Account where the resources are.
This requires creating two IAM Roles, one in each account, via two different CloudFormation stacks deployed to their respective accounts. The IAM Role created in Account B only has permission to assume the role created in Account A, and the role in Account A must provide an explicit “AssumeRolePolicyDocument” that allows it to be assumed from the other account.
# CloudFormation - AWS IAM Role in SAME ACCOUNT B - this only has the ability to ASSUME the other Role
AppRole:
  Type: AWS::IAM::Role
  Properties:
    RoleName: AppRole
    AssumeRolePolicyDocument:
      Version: "2012-10-17"
      Statement:
        - Effect: Allow
          Principal:
            Service: ec2.amazonaws.com
          Action:
            - sts:AssumeRole
    Policies:
      - PolicyName: NewAccountRoleAccess
        PolicyDocument:
          Version: "2012-10-17"
          Statement:
            - Effect: Allow
              Action:
                - sts:AssumeRole
              Resource: "arn:aws:iam::*:role/AppRole*"

## NOTE: THESE WOULD EXIST IN DIFFERENT CLOUDFORMATION TEMPLATES, DEPLOYED TO THEIR RESPECTIVE ACCOUNTS.

# CloudFormation - AWS IAM Role in NEW ACCOUNT A - this has the ability to consume the resources
AppRole:
  Type: AWS::IAM::Role
  Properties:
    RoleName: AppRole
    AssumeRolePolicyDocument:
      Version: "2012-10-17"
      Statement:
        # CORE CHANGE FOR THE ROLE IN THE NEW ACCOUNT IS TO ALLOW IT TO BE ASSUMED FROM THE OLD ACCOUNT.
        # WARNING: Demonstration only; in production you'll want to provide more restrictive access than :root.
        - Effect: Allow
          Action:
            - sts:AssumeRole
          Principal:
            AWS: !Sub
              - 'arn:aws:iam::${AccountId}:root'
              - { AccountId: 'YOUR-OLD-ACCOUNT-ID-HERE' }
    Policies:
      ... SAME AS POLICIES FROM INITIAL EXAMPLE TO READ RESOURCES ...
With permissions in place, your application must also be updated to assume the new role as needed. Here is what this glue code might look like if we wanted to read a record from the DynamoDB table:
// Required namespaces for this snippet: Amazon.SecurityToken, Amazon.SecurityToken.Model,
// Amazon.DynamoDBv2, System.Collections.Generic, System.Linq.

// **NEW** To access the DynamoDB table in the other account, we first assume the role
// allowed in ACCOUNT A (this code is executing in ACCOUNT B inside the container).
Credentials creds;
using (var client = new AmazonSecurityTokenServiceClient())
{
    var request = new AssumeRoleRequest
    {
        RoleArn = "arn:aws:iam::{AccountId}:role/AppRole", // THIS MUST BE THE FULL IAM ROLE ARN FROM ACCOUNT A
        DurationSeconds = 900, // minimum 900 (15 min); maximum defaults to 3600 (1 hour) unless the role's max session duration is raised
        RoleSessionName = "RoleFromMyApp"
    };
    var result = await client.AssumeRoleAsync(request);
    creds = result.Credentials;
}

// **SAME** Normal access to the DynamoDB table (this is the same code as before; the only
// difference is passing the assumed-role credentials into the AmazonDynamoDBClient).
using (var client = new AmazonDynamoDBClient(creds))
{
    var response = await client.ScanAsync("AppDemoTable", new List<string> { "Code" });
    var item = response.Items.SingleOrDefault();
    return $"Value from Dynamo: {item["Code"].S}";
}
In this contrived and simple example, you can see that the code to assume the necessary role is longer than the original code that just accessed the table directly in the same account. The duplication of multiple roles, and having to assume the proper role before using a resource, definitely adds pain and requires changes to any codebase moving to the multi-account world. In general, I think there are a series of small pain points that make this less than ideal:

- Two IAM Roles and two CloudFormation stacks must be created and kept in sync for every application.
- Every codebase needs glue code to assume the correct role before touching a resource.
- The temporary credentials expire, so long-running processes must manage refreshing them.
- Full role ARNs, including specific AWS Account IDs, leak into application code and configuration.
Assuming a cross-account role will definitely work. In fact, this is a great escape hatch and technique that may be necessary as you migrate to multiple AWS Accounts, or perhaps as you decompose an existing monolithic application and need to piece together more than one account or resource (i.e. assume different roles for different features), but it is not the destination we wanted to land on for our developer experience.
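If you do take this path, some of the glue-code pain can at least be contained. As a minimal sketch (reusing the role ARN placeholder and table from the examples above), the AWS SDK for .NET ships a refreshing credentials type, AssumeRoleAWSCredentials, that calls STS for you and renews the session before it expires, so the assume-role plumbing can live in one place at client construction rather than scattered through business logic:

// Sketch only: wrap the cross-account role in a self-refreshing credentials object.
// Requires the AWSSDK.SecurityToken package to be available at runtime.
using Amazon.DynamoDBv2;
using Amazon.Runtime;

var crossAccountCreds = new AssumeRoleAWSCredentials(
    FallbackCredentialsFactory.GetCredentials(), // credentials of the role attached in ACCOUNT B
    "arn:aws:iam::{AccountId}:role/AppRole",     // full ARN of the role in ACCOUNT A
    "RoleFromMyApp");

// The client can be constructed once and shared; every call runs as the ACCOUNT A role,
// and the SDK renews the temporary credentials before they expire.
var dynamoClient = new AmazonDynamoDBClient(crossAccountCreds);

This does not remove the account-specific ARN, but it confines it to one place (ideally configuration) and removes the need to manage credential expiry by hand.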
Resource-based policies are a newer capability that natively supports accessing resources between accounts without having to assume a role. A Resource-based policy is attached to the resource you want to access and explicitly defines who has permission to use it. This is different from the more familiar Identity-based policies, so be sure to review the differences. I think of a Resource-based policy as similar to the “AssumeRolePolicyDocument” that defines who can assume a role; instead, it defines exactly who has permission to use the resource. Once you add a Resource policy to an AWS Resource, it applies both to cross-account access and to access within the same account (interesting behavior that has uses in other security discussions as well). Much like any trust relationship definition, principals in the Resource policy MUST NOT use wildcarding; they must be full references to the ARNs wanting access from other accounts. These ARNs are also validated and will fail deployment if they do not exist. Resource policies have become very popular for giving cross-account (or even third-party) access to S3 buckets, and AWS Secrets Manager resource policies are often used to share secrets across an organization with multiple accounts.
Let us take a look at what this does to our CloudFormation templates. This time we only need one IAM Role, in Account B, but we will need to define a Resource policy on our AWS Secrets Manager resource.
# CLOUDFORMATION RESOURCES - DEPLOYED TO ACCOUNT A
Resources:
  AppDemoSecret:
    Type: AWS::SecretsManager::Secret
    Properties:
      Name: AppDemoSecret
      Description: AWS Secret Deployed via CFN
      KmsKeyId: '{{resolve:ssm:/organizations/sip32/sip32-resources-v0-KeyARN:1}}' # KMS Key we have created and are resolving the ARN for

  AppDemoSecretResourcePolicy:
    Type: AWS::SecretsManager::ResourcePolicy
    Properties:
      SecretId: !Ref 'AppDemoSecret' # Note: at this time you can only assign one Secret, making this policy NOT REUSABLE, NOOOO :(
      ResourcePolicy:
        Version: "2012-10-17"
        Statement:
          # Allow access to this secret from our application deployed in the compute or legacy account.
          - Effect: Allow
            Principal:
              AWS: !Sub
                - 'arn:aws:iam::${AccountId}:role/AppRole' # THIS MUST BE THE FULL IAM ROLE ARN (no wildcarding)
                - { AccountId: 'ACCOUNT-ID-FROM-ACCOUNT-B-TO-GIVE-ACCESS' }
            Action:
              - secretsmanager:Describe*
              - secretsmanager:GetSecretValue
            Resource: '*'

# CLOUDFORMATION APP ROLE - DEPLOYED TO ACCOUNT B
Resources:
  AppRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: AppRole
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service: ec2.amazonaws.com
            Action:
              - sts:AssumeRole
      Policies:
        - PolicyName: SecretAccess
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action:
                  - kms:Decrypt
                Resource: '*'
              - Effect: Allow
                Action:
                  - secretsmanager:Describe*
                  - secretsmanager:GetSecretValue
                Resource:
                  # SINCE THIS IS CROSS-ACCOUNT NOW, RESOURCE: '*' CANNOT BE USED. We have to be explicit about the account / region.
                  # You can still use a wildcard here without having the full ARN to the secret.
                  - !Sub
                    - 'arn:aws:secretsmanager:${AWS::Region}:${AccountId}:secret:AppDemoSecret*'
                    - { AccountId: 'ACCOUNT-ID-WHERE-SECRETS-EXIST' }
This effectively enables us to make a cross-account request to access an AWS Secrets Manager value. The only code change needed is to request the Secret by its FULL ARN (super annoying, but obviously necessary in a multi-account world):
using (var client = new AmazonSecretsManagerClient())
{
    var request = new GetSecretValueRequest();
    // For cross-account access the full ARN is required; "AppDemoSecret" alone only resolves in the same account.
    // Note: the complete ARN of a secret includes the random six-character suffix Secrets Manager appends.
    request.SecretId = "arn:aws:secretsmanager:us-east-1:<AWS-ACCOUNT-ID-A>:secret:AppDemoSecret";
    var secretValue = await client.GetSecretValueAsync(request);
}
You’ll notice that this simplifies the application code, and we only require one IAM Role now. That being said, there are several caveats:

- Not all AWS services support Resource-based policies, so this cannot be your only pattern.
- The CloudFormation ResourcePolicy for Secrets Manager attaches to a single secret, so the policy cannot be reused across secrets.
- Principals must be full, existing ARNs with no wildcarding, which couples deployment ordering between the two accounts.
- Your code must reference the resource by full ARN, so account-specific details still creep into code or configuration.
- The secret must be encrypted with a customer-managed KMS key (as in our template); the default aws/secretsmanager key cannot be used for cross-account access.
While the Resource policy pattern can definitely simplify some of your code, it makes your CloudFormation dramatically more complex, in my opinion. The fact that you cannot use Resource policies for all AWS Resources, plus the usage limits above, makes this pattern fall short in some cases as a one-stop pattern for working with multiple accounts. It may be preferable to standardize on role assumption instead, which is compatible with all resources.
Sometimes it is easy to get blinded by the way you have always done things. It is deeply ingrained for engineers working in AWS to think about associating IAM Roles with compute by assuming that compute means EC2 (definitely speaking about myself here!). Even using EKS or ECS, you automatically think about worker node permissions and how to associate an IAM Role with the node. With EKS, an alternative exists: associating IAM Roles directly with Kubernetes Service Accounts. This provides an entirely different option for achieving our centralized compute with isolated AWS Accounts for AWS Resources, and a much more Kube-native experience for associating granular permissions and IAM Roles. Since it takes advantage of the AssumeRoleWithWebIdentity API that is part of STS, we can use this pattern regardless of the AWS Account we are in by setting up the OpenID Connect (OIDC) Providers on a per-cluster basis. Find all the technical details for setup in the “IAM Roles for Service Accounts Technical Overview“.
While the provided walkthrough clearly shows how to set up the provider, IAM Role, and Kube Service Account, the following is the only change necessary inside our CloudFormation template. It is pretty close to the original single-account scenario we started with, with one key difference in the Role’s “AssumeRolePolicyDocument”: giving the OIDC Provider access to retrieve temporary credentials to assume the role. All the AWS Resources exist in AWS Account A, while all the Kube Resources exist in the cluster in AWS Account B.
The particulars of the OIDC Provider and the Kube Service Accounts are well explained in the technical documentation and are intentionally left out here, to keep the focus on the day-to-day experience of an Engineering team adding a new service that must use multiple AWS Accounts.
Resources:
  AppDemoSecret:
    ... NO DIFFERENCE IN SECRET CREATION ...
  AppDemoTable:
    ... NO DIFFERENCE IN TABLE CREATION ...
  AppRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: AppRole
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service: ec2.amazonaws.com
            Action:
              - sts:AssumeRole
          # ALLOW THE OIDC PROVIDER CREATED TO ASSUME THIS ROLE
          - Effect: Allow
            Principal:
              # ID Provided by the OIDC Provider
              Federated: !Sub arn:aws:iam::${AWS::AccountId}:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/AFF2087DBFD0EF4528F23B057B9F1C21
            Action:
              - sts:AssumeRoleWithWebIdentity
            Condition:
              # THIS CONDITION IS NECESSARY TO ENSURE THIS ROLE CAN ONLY BE USED BY THE PROVIDED SERVICE ACCOUNT, AND NOT BY EVERY SERVICE ACCOUNT
              StringEquals:
                "oidc.eks.us-east-1.amazonaws.com/id/AFF2087DBFD0EF4528F23B057B9F1C21:sub": "system:serviceaccount:your-namespace:service-account-name"
      Policies:
        ... NO DIFFERENCE FOR POLICY CREATION ...
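For orientation, the counterpart on the cluster side boils down to a single annotation on the Service Account. A minimal sketch, reusing the namespace and service account names from the condition above (the account ID is a placeholder; eks.amazonaws.com/role-arn is the documented IRSA annotation):

# Kubernetes Service Account in the EKS cluster (ACCOUNT B);
# the annotation points at the IAM Role created in ACCOUNT A.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: service-account-name
  namespace: your-namespace
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<AWS-ACCOUNT-ID-A>:role/AppRole

Pods running under this Service Account automatically receive a projected web identity token that the AWS SDKs know how to exchange for temporary credentials.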
Additionally, recognize that in this pattern no code changes are necessary: the role is not assumed explicitly in code, but attached directly from inside AWS Account A. Full ARNs are not required in your code, and the retrieval of temporary credentials via AssumeRoleWithWebIdentity is handled for you automatically.
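To make that concrete, here is a sketch of the same DynamoDB read from the very first example, completely unchanged, assuming the pod runs under the annotated Service Account; the SDK’s default credential chain discovers the mounted web identity token and performs the AssumeRoleWithWebIdentity exchange behind the scenes:

// Identical to the single-account code: no explicit AssumeRole call and no full ARNs.
// (For .NET, the AWSSDK.SecurityToken package must be referenced so the default
// credential chain can perform the web identity exchange.)
using (var client = new AmazonDynamoDBClient())
{
    var response = await client.ScanAsync("AppDemoTable", new List<string> { "Code" });
    var item = response.Items.SingleOrDefault();
    return $"Value from Dynamo: {item["Code"].S}";
}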
Using IAM Roles with Kubernetes Service Accounts is a fantastic option that, with a bit of work, solves the main developer experience requirements we were trying to attain, by making the cross-account capability something engineers mostly do not have to think about. That being said, many workloads and compute infrastructures are not container-based or are not easily portable to EKS from other platforms. While we can target a simpler, cleaner future with EKS as our centralized compute platform, existing or legacy code migrating to multiple AWS Accounts has options as well, albeit requiring a bit more work and forward planning with role assumption and resource policies. Additionally, depending on your service and logic, you may require a hybrid of some or all of these options working together.
Moving forward at SPS Commerce, we are working to simplify how our Engineers build for multiple AWS Accounts by using our in-house “BDP Core” deploy service to orchestrate the deployment of Kubernetes charts alongside automated IAM Role creation, linking each role to the appropriate Service Account and OIDC Provider automatically. A story for next time, when it is up and running…
*Note: Special thanks to the SPS Commerce Cloud Operations & Site Reliability Engineering teams for their work evaluating and building proofs of concept of the patterns outlined here.