DynamoDB Secondary Indexes | Rockset

January 11, 2025

Introduction

Indexes are a crucial part of proper data modeling for all databases, and DynamoDB is no exception. DynamoDB's secondary indexes are a powerful tool for enabling new access patterns for your data.

In this post, we'll look at DynamoDB secondary indexes. First, we'll start with some conceptual points about how to think about DynamoDB and the problems that secondary indexes solve. Then, we'll look at some practical tips for using secondary indexes effectively. Finally, we'll close with some thoughts on when you should use secondary indexes and when you should look for other solutions.

Let's get started.

What is DynamoDB, and what are DynamoDB secondary indexes?

Before we get into use cases and best practices for secondary indexes, we should first understand what DynamoDB secondary indexes are. And to do that, we should understand a bit about how DynamoDB works.

This assumes some basic understanding of DynamoDB. We'll cover the basic points you need to know to understand secondary indexes, but if you're new to DynamoDB, you may want to start with a more basic introduction.

The Bare Minimum You Need to Know about DynamoDB

DynamoDB is a unique database. It's designed for OLTP workloads, meaning it's great for handling a high volume of small operations: think of things like adding an item to a shopping cart, liking a video, or adding a comment on Reddit. In that way, it can handle similar applications as other databases you might have used, like MySQL, PostgreSQL, MongoDB, or Cassandra.

DynamoDB's key promise is its guarantee of consistent performance at any scale. Whether your table has 1 megabyte of data or 1 petabyte of data, DynamoDB wants to have the same latency for your OLTP-like requests. This is a big deal: many databases will see reduced performance as you increase the amount of data or the number of concurrent requests. However, providing these guarantees requires some tradeoffs, and DynamoDB has some unique characteristics that you need to understand to use it effectively.

First, DynamoDB horizontally scales your databases by spreading your data across multiple partitions under the hood. These partitions are not visible to you as a user, but they are at the core of how DynamoDB works. You will specify a primary key for your table (either a single element, called a 'partition key', or a combination of a partition key and a sort key), and DynamoDB will use that primary key to determine which partition your data lives on. Any request you make will go through a request router that will determine which partition should handle the request. These partitions are small, generally 10GB or less, so they can be moved, split, replicated, and otherwise managed independently.
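To make the routing idea concrete, here is a toy sketch in Python. DynamoDB's real hash function and partition management are internal and not exposed, so the MD5 hash, the modulo step, and the function name below are assumptions chosen purely for illustration.

```python
import hashlib


def route_to_partition(partition_key: str, num_partitions: int) -> int:
    """Toy model: map a partition key to a partition number with a stable hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    # Treat the first 8 bytes of the digest as an integer, then wrap to the partition count.
    return int.from_bytes(digest[:8], "big") % num_partitions
```

The point of the sketch is only that the same partition key always lands on the same partition, which is why every request has to carry the partition key.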

Horizontal scalability via sharding is interesting but is by no means unique to DynamoDB. Many other databases, both relational and non-relational, use sharding to horizontally scale. However, what is unique to DynamoDB is how it forces you to use your primary key to access your data. Rather than relying on a query planner that translates your requests into a series of queries, DynamoDB has you address your data by primary key. You are essentially getting a directly addressable index for your data.

The API for DynamoDB reflects this. There are a series of operations on individual items (GetItem, PutItem, UpdateItem, DeleteItem) that allow you to read, write, and delete individual items. Additionally, there is a Query operation that allows you to retrieve multiple items with the same partition key. If you have a table with a composite primary key, items with the same partition key will be grouped together on the same partition. They will be ordered according to the sort key, allowing you to handle patterns like “Fetch the most recent Orders for a User” or “Fetch the last 10 Sensor Readings for an IoT Device”.

For example, let's imagine a SaaS application that has a table of Users, where every User belongs to a single Group.

We're using a composite primary key with a partition key of 'Group' and a sort key of 'Username'. This allows us to do operations to fetch or update an individual User by providing their Group and Username. We can also fetch all of the Users for a single Group by providing just the Group to a Query operation.
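As a rough sketch, the two access patterns could be expressed as request parameters in Python. The dictionaries follow the shape that the low-level boto3 DynamoDB client accepts for `get_item` and `query`; the table name `Users` and the helper function names are assumptions for illustration.

```python
def get_user_request(group: str, username: str) -> dict:
    """Parameters for a GetItem call that fetches one User by its full primary key."""
    return {
        "TableName": "Users",  # assumed table name
        "Key": {"Group": {"S": group}, "Username": {"S": username}},
    }


def list_users_request(group: str) -> dict:
    """Parameters for a Query call that fetches every User in one Group."""
    return {
        "TableName": "Users",  # assumed table name
        # GROUP is a DynamoDB reserved word, so it is referenced through a placeholder.
        "KeyConditionExpression": "#g = :g",
        "ExpressionAttributeNames": {"#g": "Group"},
        "ExpressionAttributeValues": {":g": {"S": group}},
    }
```

With boto3 installed and credentials configured, these could be passed along as `client.get_item(**get_user_request("acme", "alice"))` and `client.query(**list_users_request("acme"))`.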

What are secondary indexes, and how do they work?

With some basics in mind, let's now look at secondary indexes. The best way to understand the need for secondary indexes is to understand the problem they solve. We've seen how DynamoDB partitions your data according to your primary key and how it pushes you to use the primary key to access your data. That's all well and good for some access patterns, but what if you need to access your data in a different way?

In our example above, we had a table of users that we accessed by their group and username. However, we may also need to fetch a single user by their email address. This pattern doesn't fit with the primary key access pattern that DynamoDB pushes us toward. Because our table is partitioned by different attributes, there's no clear way to access our data in the way we want. We could do a full table scan, but that's slow and inefficient. We could duplicate our data into a separate table with a different primary key, but that adds complexity.

This is where secondary indexes come in. A secondary index is basically a fully managed copy of your data with a different primary key. You will specify a secondary index on your table by declaring the primary key for the index. As writes come into your table, DynamoDB will automatically replicate the data to your secondary index.

Note: Everything in this section applies to global secondary indexes. DynamoDB also provides local secondary indexes, which are a bit different. In almost all cases, you will want a global secondary index. For more details on the differences, check out this article on choosing a global or local secondary index.

In this case, we'll add a secondary index to our table with a partition key of “Email”.

The index holds the same data as the main table; it has just been reorganized with a different primary key. Now, we can efficiently look up a user by their email address.
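A minimal sketch of what that might look like as request parameters, again in the shape the low-level boto3 client accepts for `create_table` and `query`. The table name `Users`, the index name `EmailIndex`, and the choice of on-demand billing are assumptions, not details from the example above.

```python
def create_users_table_request() -> dict:
    """CreateTable parameters: Group/Username primary key plus a GSI keyed on Email."""
    return {
        "TableName": "Users",  # assumed table name
        "BillingMode": "PAY_PER_REQUEST",  # assumed; avoids provisioned throughput settings
        "AttributeDefinitions": [
            {"AttributeName": "Group", "AttributeType": "S"},
            {"AttributeName": "Username", "AttributeType": "S"},
            {"AttributeName": "Email", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "Group", "KeyType": "HASH"},
            {"AttributeName": "Username", "KeyType": "RANGE"},
        ],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "EmailIndex",  # assumed index name
                "KeySchema": [{"AttributeName": "Email", "KeyType": "HASH"}],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
    }


def find_user_by_email_request(email: str) -> dict:
    """Query parameters that read from the index instead of the main table."""
    return {
        "TableName": "Users",
        "IndexName": "EmailIndex",
        "KeyConditionExpression": "#email = :email",
        "ExpressionAttributeNames": {"#email": "Email"},
        "ExpressionAttributeValues": {":email": {"S": email}},
    }
```

The only difference from a query against the main table is the `IndexName` parameter and the key used in the condition.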

In some ways, this is very similar to an index in other databases. Both provide a data structure that is optimized for lookups on a particular attribute. But DynamoDB's secondary indexes are different in a few key ways.

First, and most importantly, DynamoDB's indexes live on entirely different partitions than your main table. DynamoDB wants every lookup to be efficient and predictable, and it wants to provide linear horizontal scaling. To do this, it needs to reshard your data by the attributes you'll use to query it.

Other distributed databases generally don't reshard your data for the secondary index. They'll usually just maintain the secondary index for all data on the shard. However, if your indexes don't use the shard key, you're losing some of the benefits of horizontally scaling your data, as a query without the shard key will need to do a scatter-gather operation across all shards to find the data you're looking for.

A second way that DynamoDB's secondary indexes are different is that they (often) copy the entire item to the secondary index. For indexes on a relational database, the index will often contain a pointer to the primary key of the item being indexed. After locating a relevant record in the index, the database will then need to go fetch the full item. Because DynamoDB's secondary indexes are on different nodes than the main table, they want to avoid a network hop back to the original item. Instead, you'll copy as much data as you need into the secondary index to handle your read.

Secondary indexes in DynamoDB are powerful, but they have some limitations. First off, they are read-only: you can't write directly to a secondary index. Rather, you will write to your main table, and DynamoDB will handle the replication to your secondary index. Second, you are charged for the write operations to your secondary indexes. Thus, adding a secondary index to your table will often double the total write costs for your table.

Tips for using secondary indexes

Now that we understand what secondary indexes are and how they work, let's talk about how to use them effectively. Secondary indexes are a powerful tool, but they can be misused. Here are some tips for using secondary indexes effectively.

Try to have read-only patterns on secondary indexes

The first tip seems obvious: secondary indexes can only be used for reads, so you should aim to have read-only patterns on your secondary indexes! And yet, I see this mistake all the time. Developers will first read from a secondary index, then write to the main table. This results in extra cost and extra latency, and you can often avoid it with some upfront planning.

If you've read anything about DynamoDB data modeling, you probably know that you should think of your access patterns first. It's not like a relational database where you first design normalized tables and then write queries to join them together. In DynamoDB, you should think about the actions your application will take, and then design your tables and indexes to support those actions.

When designing my table, I like to start with the write-based access patterns first. With my writes, I'm often maintaining some type of constraint: uniqueness on a username or a maximum number of members in a group. I want to design my table in a way that makes this straightforward, ideally without using DynamoDB Transactions or using a read-modify-write pattern that could be subject to race conditions.

As you work through these, you'll generally find that there's a 'primary' way to identify your item that matches up with your write patterns. This will end up being your primary key. Then, adding in additional, secondary read patterns is easy with secondary indexes.

In our Users example before, every User request will likely include the Group and the Username. This will allow me to look up the individual User record as well as authorize specific actions by the User. The email address lookup may be for less prominent access patterns, like a 'forgot password' flow or a 'search for a user' flow. These are read-only patterns, and they fit well with a secondary index.

Use secondary indexes when your keys are mutable

A second tip for using secondary indexes is to use them for mutable values in your access patterns. Let's first understand the reasoning behind it, and then look at situations where it applies.

DynamoDB allows you to update an existing item with the UpdateItem operation. However, you cannot change the primary key of an item in an update. The primary key is the unique identifier for an item, and changing the primary key is basically creating a new item. If you want to change the primary key of an existing item, you'll need to delete the old item and create a new one. This two-step process is slower and more costly. Often you'll need to read the original item first, then use a transaction to delete the original item and create a new one in the same request.

On the other hand, if you have this mutable value in the primary key of a secondary index, then DynamoDB will handle this delete + create process for you during replication. You can issue a simple UpdateItem request to change the value, and DynamoDB will handle the rest.

I see this pattern come up in two main situations. The first, and most common, is when you have a mutable attribute that you want to sort on. The canonical examples here are a leaderboard for a game where people are continually racking up points, or a continually updating list of items where you want to display the most recently updated items first. Think of something like Google Drive, where you can sort your files by 'last modified'.

A second pattern where this comes up is when you have a mutable attribute that you want to filter on. Here, you can think of an ecommerce store with a history of orders for a user. You may want to allow the user to filter their orders by status: show me all my orders that are 'shipped' or 'delivered'. You can build this into your partition key or the beginning of your sort key to allow exact-match filtering. As the item changes status, you can update the status attribute and lean on DynamoDB to group the items correctly in your secondary index.

In both of these situations, moving this mutable attribute to your secondary index will save you time and money. You'll save time by avoiding the read-modify-write pattern, and you'll save money by avoiding the extra write costs of the transaction.
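Here is a hedged sketch of the order-status case as request parameters in the shape the low-level boto3 client accepts for `update_item` and `query`. The table name `Orders`, the index name `StatusIndex`, and the attribute names are assumptions; the sketch also assumes the index uses `UserId` as its partition key and `Status` as its sort key.

```python
def update_order_status_request(user_id: str, order_id: str, new_status: str) -> dict:
    """UpdateItem parameters: the item is addressed by its stable main-table key."""
    return {
        "TableName": "Orders",  # assumed table name
        "Key": {"UserId": {"S": user_id}, "OrderId": {"S": order_id}},
        # STATUS is a DynamoDB reserved word, so it is referenced through a placeholder.
        "UpdateExpression": "SET #status = :status",
        "ExpressionAttributeNames": {"#status": "Status"},
        "ExpressionAttributeValues": {":status": {"S": new_status}},
    }


def orders_by_status_request(user_id: str, status: str) -> dict:
    """Query parameters that filter a user's orders by status through the index."""
    return {
        "TableName": "Orders",
        "IndexName": "StatusIndex",  # assumed index name
        "KeyConditionExpression": "UserId = :user AND #status = :status",
        "ExpressionAttributeNames": {"#status": "Status"},
        "ExpressionAttributeValues": {
            ":user": {"S": user_id},
            ":status": {"S": status},
        },
    }
```

The write never mentions the index: it changes one attribute on the main-table item, and the item's position in the index follows from replication.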

Additionally, note that this pattern fits well with the previous tip. It's unlikely you will identify an item for writing based on a mutable attribute like its previous score, its previous status, or the last time it was updated. Rather, you'll update by a more persistent value, like the user's ID, the order ID, or the file's ID. Then, you'll use the secondary index to sort and filter based on the mutable attribute.

Avoid the 'fat' partition

We saw above that DynamoDB divides your data into partitions based on the primary key. DynamoDB aims to keep these partitions small, 10GB or less, and you should aim to spread requests across your partitions to get the benefits of DynamoDB's scalability.

This generally means you should use a high-cardinality value in your partition key. Think of something like a username, an order ID, or a sensor ID. There are large numbers of values for these attributes, and DynamoDB can spread the traffic across your partitions.

Often, I see people understand this principle in their main table, but then completely forget about it in their secondary indexes. Typically, they want ordering across the entire table for a type of item. If they want to retrieve users alphabetically, they'll use a secondary index where all users have USERS as the partition key and the username as the sort key. Or, if they want ordering of the most recent orders in an ecommerce store, they'll use a secondary index where all orders have ORDERS as the partition key and the timestamp as the sort key.

This pattern can work for small-traffic applications where you won't come close to the DynamoDB partition throughput limits, but it's a dangerous pattern for a high-traffic application. All of your traffic may be funneled to a single physical partition, and you can quickly hit the write throughput limits for that partition.
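The effect is easy to see in a toy simulation. The hashing below is a stand-in, not DynamoDB's actual placement logic, and the partition count and key values are made up for illustration.

```python
import hashlib
from collections import Counter


def partition_load(partition_keys, num_partitions):
    """Count how many writes land on each (toy) partition."""
    load = Counter()
    for key in partition_keys:
        digest = hashlib.md5(key.encode("utf-8")).digest()
        load[int.from_bytes(digest[:8], "big") % num_partitions] += 1
    return load


# Every item shares one index partition key, so all writes hit a single partition.
fat = partition_load(["USERS"] * 1000, num_partitions=4)

# A high-cardinality key (hypothetical usernames) spreads the writes out.
spread = partition_load([f"user-{i}" for i in range(1000)], num_partitions=4)
```

In the first case a single partition absorbs all 1000 writes; in the second, the same volume is shared across partitions.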

Further, and most dangerously, this can cause problems for your main table. If your secondary index is getting write throttled during replication, the replication queue will back up. If this queue backs up too much, DynamoDB will start rejecting writes on your main table.

This is designed to help you: DynamoDB wants to limit the staleness of your secondary index, so it will prevent you from having a secondary index with a large amount of lag. However, it can be a surprising situation that pops up when you're least expecting it.

Use sparse indexes as a global filter

People often think of secondary indexes as a way to replicate all of their data with a new primary key. However, you don't need all of your data to end up in a secondary index. If you have an item that doesn't match the index's key schema, it won't be replicated to the index.

This can be really useful for providing a global filter on your data. The canonical example I use for this is a message inbox. In your main table, you might store all the messages for a particular user ordered by the time they were created.

But if you're like me, you've got a lot of messages in your inbox. Further, you might treat unread messages as a 'todo' list, like little reminders to get back to someone. Accordingly, I usually only want to see the unread messages in my inbox.

You could use your secondary index to provide this global filter where unread == true. Perhaps your secondary index partition key is something like ${userId}#UNREAD, and the sort key is the timestamp of the message. When you create the message initially, it will include the secondary index partition key value and thus will be replicated to the unread messages secondary index. Later, when a user reads the message, you can change the status to READ and delete the secondary index partition key value. DynamoDB will then remove it from your secondary index.
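A small sketch of the two steps, assuming a table named `Messages` keyed by `UserId` and `MessageId`, and an index whose partition key is a hypothetical `UnreadPK` attribute. The dictionaries follow the shape the low-level boto3 client accepts for `put_item` items and `update_item` parameters.

```python
def new_message_item(user_id: str, message_id: str, created_at: str, body: str) -> dict:
    """Item for a new message; the UnreadPK attribute makes it appear in the sparse index."""
    return {
        "UserId": {"S": user_id},
        "MessageId": {"S": message_id},
        "CreatedAt": {"S": created_at},
        "Body": {"S": body},
        "Status": {"S": "UNREAD"},
        "UnreadPK": {"S": f"{user_id}#UNREAD"},  # hypothetical index partition key attribute
    }


def mark_read_request(user_id: str, message_id: str) -> dict:
    """UpdateItem parameters: set the status and drop the index key in one request."""
    return {
        "TableName": "Messages",  # assumed table name
        "Key": {"UserId": {"S": user_id}, "MessageId": {"S": message_id}},
        # Removing UnreadPK means the item no longer matches the index's key schema.
        "UpdateExpression": "SET #status = :read REMOVE UnreadPK",
        "ExpressionAttributeNames": {"#status": "Status"},
        "ExpressionAttributeValues": {":read": {"S": "READ"}},
    }
```

Because the index key attribute is removed rather than set to another value, read messages simply drop out of the index.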

I use this trick all the time, and it's remarkably effective. Further, a sparse index will save you money. Any updates to read messages will not be replicated to the secondary index, and you'll save on write costs.

Narrow your secondary index projections to reduce index size and/or writes

For our last tip, let's take the previous point a little further. We just saw that DynamoDB won't include an item in your secondary index if the item doesn't have the primary key elements for the index. This trick can be used for not only primary key elements but also for non-key attributes in the data!

When you create a secondary index, you can specify which attributes from the main table you want to include in the secondary index. This is called the projection of the index. You can choose to include all attributes from the main table, only the primary key attributes, or a subset of the attributes.
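The three projection choices map onto the `Projection` setting of an index definition. The helper below is a sketch that builds one entry for the `GlobalSecondaryIndexes` list of a CreateTable request; the helper name, index name, and attribute names are assumptions for illustration.

```python
def gsi_definition(index_name, partition_key, include_attributes=None, keys_only=False):
    """Build one GlobalSecondaryIndexes entry with the requested projection."""
    if keys_only:
        projection = {"ProjectionType": "KEYS_ONLY"}
    elif include_attributes:
        projection = {
            "ProjectionType": "INCLUDE",
            "NonKeyAttributes": list(include_attributes),
        }
    else:
        projection = {"ProjectionType": "ALL"}
    return {
        "IndexName": index_name,
        "KeySchema": [{"AttributeName": partition_key, "KeyType": "HASH"}],
        "Projection": projection,
    }


# A narrow index that carries only one extra (hypothetical) attribute.
narrow_index = gsi_definition("EmailIndex", "Email", include_attributes=["DisplayName"])
```

Key attributes of the table and of the index are always carried along, so `NonKeyAttributes` only needs to list the extra attributes your reads require.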

While it's tempting to include all attributes in your secondary index, this can be a costly mistake. Remember that every write to your main table that changes the value of a projected attribute will be replicated to your secondary index. A single secondary index with full projection effectively doubles the write costs for your table. Each additional fully projected index then increases your total write costs by a fraction of 1/(N + 1), where N is the number of secondary indexes before the new one: going from zero indexes to one adds 100%, going from one to two adds 50%, and so on.

Additionally, your write costs are calculated based on the size of your item. Each 1KB of data written to your table uses a WCU. If you're copying a 4KB item to your secondary index, you'll be paying the full 4 WCUs on both your main table and your secondary index.

Thus, there are two ways that you can save money by narrowing your secondary index projections. First, you can avoid certain writes altogether. If you have an update operation that doesn't touch any attributes in your secondary index projection, DynamoDB will skip the write to your secondary index. Second, for those writes that do replicate to your secondary index, you can save money by reducing the size of the item that is replicated.
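The arithmetic can be sketched in a few lines of Python. This is a simplification that assumes standard (non-transactional) writes, that every write is replicated to every index, and that the index key itself does not change; the 300-byte projected size is a made-up figure.

```python
import math


def write_units(item_size_bytes: int) -> int:
    """Standard write cost: one WCU per 1KB, rounded up, with a minimum of one."""
    return max(1, math.ceil(item_size_bytes / 1024))


def total_write_units(item_size_bytes: int, projected_sizes_bytes: list) -> int:
    """Main-table write plus one replicated write per secondary index."""
    return write_units(item_size_bytes) + sum(
        write_units(size) for size in projected_sizes_bytes
    )


# A 4KB item with one fully projected index: 4 WCUs on the table, 4 on the index.
full_projection = total_write_units(4096, [4096])

# The same item with an index that only projects about 300 bytes: 4 WCUs plus 1.
narrow_projection = total_write_units(4096, [300])
```

Under these assumptions the full projection costs 8 WCUs per write and the narrow projection costs 5.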

This can be a tricky balance to get right. Secondary index projections are not alterable after the index is created. If you find that you need additional attributes in your secondary index, you'll need to create a new index with the new projection and then delete the old index.

Should you use a secondary index?

Now that we've explored some practical advice around secondary indexes, let's take a step back and ask a more fundamental question: should you use a secondary index at all?

As we've seen, secondary indexes help you access your data in a different way. However, this comes at the cost of the additional writes. Thus, my rule of thumb for secondary indexes is:

Use secondary indexes when the reduced read costs outweigh the increased write costs.

This seems obvious when you say it, but it can be counterintuitive as you're modeling. It seems so easy to say “Throw it in a secondary index” without thinking about other approaches.

To bring this home, let's look at two situations where secondary indexes might not make sense.

Lots of filterable attributes in small item collections

With DynamoDB, you generally want your primary keys to do your filtering for you. It irks me a little whenever I use a Query in DynamoDB but then perform my own filtering in my application. Why couldn't I just build that into the primary key?

Despite my visceral reaction, there are some situations where you might want to over-read your data and then filter in your application.

The most common place you'll see this is when you want to provide a lot of different filters on your data for your users, but the relevant data set is bounded.

Think of a workout tracker. You might want to allow users to filter on a lot of attributes, such as type of workout, intensity, duration, date, and so on. However, the number of workouts a user has is going to be manageable: even a power user will take a while to exceed 1000 workouts. Rather than putting indexes on all of these attributes, you can just fetch all of the user's workouts and then filter in your application.
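Application-side filtering of this kind can be very small. The sketch below assumes the workouts have already been fetched with a single Query on the user's partition key and converted to plain dictionaries; the attribute names and sample data are illustrative only.

```python
def filter_workouts(workouts, **criteria):
    """Keep workouts whose attributes equal every supplied criterion."""
    return [
        workout
        for workout in workouts
        if all(workout.get(attr) == value for attr, value in criteria.items())
    ]


# Illustrative data standing in for the result of one Query.
workouts = [
    {"type": "run", "intensity": "high", "duration_min": 30},
    {"type": "run", "intensity": "low", "duration_min": 60},
    {"type": "swim", "intensity": "high", "duration_min": 45},
]

high_runs = filter_workouts(workouts, type="run", intensity="high")
```

Any combination of optional filters works without a dedicated index for each one, which is the appeal when the collection is bounded.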

This is where I recommend doing the math. DynamoDB makes it easy to calculate these two options and get a sense of which one will work better for your application.
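As a back-of-the-envelope sketch, the comparison might look like the following. It assumes eventually consistent reads billed at 0.5 RCU per 4KB of data read by a Query, standard writes billed at 1 WCU per 1KB, and a made-up workload of 1000 workouts of roughly 400 bytes each with four candidate indexes.

```python
import math


def query_read_units(total_bytes_read: int, strongly_consistent: bool = False) -> float:
    """Read cost of one Query: bytes read rounded up to 4KB blocks."""
    blocks = math.ceil(total_bytes_read / 4096)
    return blocks * (1.0 if strongly_consistent else 0.5)


def extra_index_write_units(item_size_bytes: int, num_indexes: int) -> int:
    """Extra WCUs per write from replicating the item to each secondary index."""
    return num_indexes * max(1, math.ceil(item_size_bytes / 1024))


# Option 1: over-read all 1000 workouts (about 400 bytes each) on every filtered view.
over_read_cost = query_read_units(1000 * 400)

# Option 2: maintain four indexes, paying extra on every workout written.
per_write_index_cost = extra_index_write_units(400, num_indexes=4)
```

With these numbers, each over-read costs 49 RCUs, while the indexes add 4 WCUs to every write. Which side wins depends on how often users filter versus how often they log workouts, and on current pricing for reads and writes.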

Lots of filterable attributes in large item collections

Let's change our situation a bit: what if our item collection is large? What if we're building a workout tracker for a gym, and we want to allow the gym owner to filter on all of the attributes we mentioned above for all the users in the gym?

This changes the situation. Now we're talking about hundreds or even thousands of users, each with hundreds or thousands of workouts. It won't make sense to over-read the entire item collection and do post-hoc filtering on the results.

But secondary indexes don't really make sense here either. Secondary indexes are good for known access patterns where you can count on the relevant filters being present. If we want our gym owner to be able to filter on a variety of attributes, all of which are optional, we'd need to create a large number of indexes to make this work.

We talked about the potential downsides of query planners before, but query planners have an upside too. In addition to allowing for more flexible queries, they can also do things like index intersections to look at partial results from multiple indexes in composing these queries. You can do the same thing with DynamoDB, but it's going to result in a lot of back and forth with your application, along with some complex application logic to figure it out.
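To give a sense of what that application logic involves, here is a minimal sketch of a client-side index intersection. It assumes each index has already been queried separately and that each result has been reduced to the items' main-table primary keys; the key values are hypothetical.

```python
def intersect_index_results(*result_sets):
    """Return the primary keys that appear in every per-index result set."""
    if not result_sets:
        return set()
    common = set(result_sets[0])
    for keys in result_sets[1:]:
        common &= set(keys)
    return common


# Hypothetical (user, workout) keys returned by two separate index queries.
by_type = [("u1", "w1"), ("u1", "w2"), ("u2", "w7")]
by_intensity = [("u1", "w2"), ("u2", "w7"), ("u3", "w9")]

matches = intersect_index_results(by_type, by_intensity)
```

The intersection itself is trivial; the cost is in the round trips, since each index must be queried (and paginated) in full before the sets can be compared, and the matching items may still need to be fetched afterward.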

When I have these types of problems, I generally look for a tool better suited to this use case. Rockset and Elasticsearch are my go-to recommendations here for providing flexible, secondary-index-like filtering across your dataset.

Conclusion

In this post, we learned about DynamoDB secondary indexes. First, we looked at some conceptual bits to understand how DynamoDB works and why secondary indexes are needed. Then, we reviewed some practical tips to understand how to use secondary indexes effectively and to learn their specific quirks. Finally, we looked at how to think about secondary indexes to see when you should use other approaches.

Secondary indexes are a powerful tool in your DynamoDB toolbox, but they're not a silver bullet. As with all DynamoDB data modeling, make sure you carefully consider your access patterns and count the costs before you jump in.

Learn more about how you can use Rockset for secondary-index-like filtering in Alex DeBrie's blog DynamoDB Filtering and Aggregation Queries Using SQL on Rockset.
