Amazon S3 + CloudFront: Multi-page support

Published on: Sun Dec 11 2022

Introduction

When I first tried hosting a multi-page site on Amazon S3 and CloudFront, I was surprised that it does not work out of the box.

After some research, I found out that this is another one of those AWS gotchas.

Well, not really a gotcha but a small misunderstanding of how S3 actually works.

When working with Amazon S3 and CloudFront, you can specify a default root object.

For example, for a static site, you may choose to use index.html at the root of the bucket.

However, what happens when you have a multi-page site, such as one built with a framework like Astro?

Illustrations of the Astro build outputs

When I was working on the Astro technical series, I came across this problem.

So, this article will highlight the problem, the why and also propose a solution to fix this!

Breaking down the problems

Let’s first break down the problem.

S3 and the default root object

When you set up S3 with CloudFront, there is an option to provide a default root object.

My understanding is that this option tells CloudFront to fetch a particular S3 object when the root path (/) is requested.

For example, let’s say we set the default root object to index.html.

In our example of a site built using Astro, CloudFront will automatically map the root path to that default S3 object.

It will notice the root path (/) and direct that request to fetch the index.html object in the S3 bucket.
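To make the scope of this setting concrete, here is a minimal sketch that models it as a plain function. The function name and parameters are hypothetical and only mirror the behavior described above; this is not CloudFront's actual implementation.

```typescript
// Hypothetical model of the default root object setting.
// Only the root path is rewritten; every other path passes through unchanged.
function resolveWithDefaultRootObject(uri: string, defaultRootObject: string): string {
  return uri === "/" ? "/" + defaultRootObject : uri;
}

const rootPath = resolveWithDefaultRootObject("/", "index.html");         // "/index.html"
const nestedPath = resolveWithDefaultRootObject("/about/", "index.html"); // "/about/" (unchanged)
```

Note that the nested path is left as-is, which is where the trouble with multi-page sites begins.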

Illustration of fetching the default root object

Well, what happens when we have multiple pages?

Nested folders

In multi-page sites, it’s common to have nested folder structures.

However, the problem with the S3 and CloudFront setup is that when you request a folder, it doesn’t know what to do.

Illustration of failure when requesting a folder in S3

This is actually not S3’s fault. It was designed that way.

To better understand why this fails, we need to revisit the following: What does a folder in S3 really mean?

S3 Folders

Amazon S3, as a blob store, may give the impression that it has a hierarchical structure, but technically it does not.

In reality, it has a flat structure, similar to a key-value store.

Behind the scenes, each object is stored under its full key, and the slashes in that key are what create the appearance of a filesystem-like hierarchy.

Amazon S3 groups these blobs, or objects, by “prefixes”: the leading part of a key acts like a namespace that groups related objects together.

For example:

  • "assets/image1.jpeg"

  • "assets/image2.jpeg"

In our example, both of these files share the prefix "assets/", so the folder is really just a key prefix that groups these files together.
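As a rough illustration of the flat structure, the sketch below models a bucket as a plain key-value object and "lists a folder" by filtering on a key prefix. The bucket contents and the helper name are made up for this example, and this is a toy model rather than the S3 API.

```typescript
// A bucket modelled as a flat map from full object key to content (toy data).
const bucket: Record<string, string> = {
  "index.html": "<h1>Home</h1>",
  "about/index.html": "<h1>About</h1>",
  "assets/image1.jpeg": "(binary)",
  "assets/image2.jpeg": "(binary)",
};

// "Listing a folder" is just filtering keys that start with a prefix.
function listKeysWithPrefix(objects: Record<string, string>, prefix: string): string[] {
  return Object.keys(objects)
    .filter((key) => key.indexOf(prefix) === 0)
    .sort();
}

// Holds the two image keys; there is no separate "assets" entry anywhere.
const assetKeys = listKeysWithPrefix(bucket, "assets/");
```

Nothing in the map represents the folder itself; "assets/" only exists as the shared beginning of two keys.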

Illustration of how S3 uses folder prefixes

⚠️ Important:

You only need to perform this “remapping” if you are using CloudFront with an Origin Access Identity (OAI) to access S3.

This is because when CloudFront + OAI accesses S3, it uses the S3 REST API endpoint (the GetObject operation, permission s3:GetObject) to fetch the resources.

This endpoint does not support resolving a folder path to a default index document, so the request returns an error.

On the other hand, if you are just using S3 static website hosting (i.e. <bucket-name>.s3-website-<AWS-region>.amazonaws.com), this resolution happens by default.

So, a request for /about will resolve to /about/index.html.

It is a subtle distinction between the two, but an important one to point out.

How does this relate to multi-page support?

This means that when we request a folder, we are essentially making a GetObject request for a key that is only a prefix.

This leads to an error because, technically, there is no S3 object stored under that key, so S3 has nothing to return.
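The failure can be sketched with the same key-value mental model: an object lookup is an exact match on the full key, so a bare folder name finds nothing. The data and the `getObject` helper below are hypothetical stand-ins for illustration, not the real S3 client.

```typescript
// Toy bucket: note there is no object whose key is exactly "about".
const siteObjects: Record<string, string> = {
  "index.html": "<h1>Home</h1>",
  "about/index.html": "<h1>About</h1>",
};

// Exact-key lookup, standing in for fetching an object by its key.
function getObject(objects: Record<string, string>, key: string): string | undefined {
  return Object.prototype.hasOwnProperty.call(objects, key) ? objects[key] : undefined;
}

const folderLookup = getObject(siteObjects, "about");            // undefined: only a prefix
const pageLookup = getObject(siteObjects, "about/index.html");   // the About page content
```

In the real setup, the "undefined" case surfaces as an error response from S3 instead of a page.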

Knowing all of this, what can we do about it in our multi-page site?

Remapping the S3 object path

One way to solve this problem is by remapping the S3 object path.

We can leverage the power of edge functions on CloudFront to do this.

This can be done with a viewer request function on CloudFront, which is a type of function that runs when CloudFront receives a request from the viewer, before the request is served from the cache or forwarded to S3.

That way, if any request does not have an extension or a filename, we can direct it to an S3 object of our choice.

In our case, we can do the following:

  • Remap every request URI with no filename (a trailing slash) to the index.html object (/*/ -> /*/index.html)

  • Remap every request URI with no extension to the index.html object (/* -> /*/index.html)

This should fulfill all our needs for the multi-page site!

After adding this change, when we make a request for /about, it will remap the URI to /about/index.html.
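The two remapping rules above can be sketched as a small viewer request handler. This is a minimal sketch, not the exact function from the tutorial: the interfaces only cover the `request.uri` field of the event, the `remapUri` helper is a name introduced here, and since CloudFront Functions run JavaScript, the type annotations would need to be stripped or compiled away before deploying.

```typescript
// Simplified shapes covering only the part of the event this sketch touches.
interface ViewerRequest {
  uri: string;
}
interface ViewerRequestEvent {
  request: ViewerRequest;
}

// Apply the two rules: trailing slash, or last path segment without an extension.
function remapUri(uri: string): string {
  if (uri.charAt(uri.length - 1) === "/") {
    return uri + "index.html"; // "/about/" -> "/about/index.html"
  }
  const lastSegment = uri.substring(uri.lastIndexOf("/") + 1);
  if (lastSegment.indexOf(".") === -1) {
    return uri + "/index.html"; // "/about" -> "/about/index.html"
  }
  return uri; // already points at a file, e.g. "/assets/image1.jpeg"
}

// Viewer request entry point: rewrite the URI, then pass the request along.
function handler(event: ViewerRequestEvent): ViewerRequest {
  const request = event.request;
  request.uri = remapUri(request.uri);
  return request;
}
```

Checking only the last path segment for a dot is a design choice in this sketch, so that a folder name containing a dot earlier in the path does not prevent the rewrite.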

Illustration of remapping the url

Conclusion

We covered quite a bit in this article, so let’s do a quick recap.

Takeaways:

  • Think of Amazon S3 the way you’d think about a key-value store

  • Use folders in Amazon S3 as a way to group related objects together to create a hierarchy (like a filesystem)

  • A solution to get around requesting an S3 folder is to remap the paths to a default object (i.e. index.html)

Coming up... we’ll cover how to set this up in our infrastructure using Terraform!

And... that’s all for now, stay tuned for more!

I hope you enjoyed this guide and you learned something new!

If you did, please do share this article with a friend or co-worker 🙏❤️ (Thanks!)

Want to get hands-on?

Check out my tutorial on this here 👉 Astro: Adding multi page support with Amazon S3 + Cloudfront (Tutorial)

Jerry Chang 2023. All rights reserved.