
Design for query patterns, not data structures

How to choose the right database for your application's needs


Rethinking data storage

Like many engineers, I reached for a relational database by default on almost any greenfield project. It’s ingrained in our education (go beavs) and tooling - we design entity-relationship diagrams, normalize data into neat tables, and assume SQL will handle whatever queries we throw at it.

The reality is that relational modeling dominates how we (me included) logically represent data, largely by habit. And to be honest, for good reason: relational databases are proven workhorses that can handle “99% of real world applications” just fine.

But, over the years building my own side projects, I started to question this automatic choice.

In practice, I often found myself bending over backwards with complex joins and mapping code to make data from those perfect tables usable in my app’s features.

What I eventually learned is this: Don’t design for the data structure. Design for the query patterns.

In other words

Shape your data model around how your application will actually query and use the data, rather than the academically “pure” way to organize that data. This simple insight opened my eyes to the power of NoSQL document databases in scenarios I previously assumed were SQL territory.

In this post, I’ll share how this mindset shift came about, drawing on real experiences from my projects Muse and TalkToBeavs, and discuss when a document store can be a better fit than a relational database. We’ll touch on technologies like DynamoDB (with its single-table design mentality), MongoDB, and even PostgreSQL - not diving into performance or ops minutiae, but focusing on modeling and querying data in a way that makes development simpler.


The relational mindset (and where it stumbles)

Relational databases encourage a structure-first design approach. You create schemas upfront - tables for each entity and relationships between them - without necessarily knowing how the application will use the data. In fact, traditional database design tells us that if we model the data correctly and normalize it, we can query it for any scenario later.

Users

| id | name | email |
| --- | --- | --- |
| 1 | Alice | alice@example.com |
| 2 | Bob | bob@example.com |

Orders

| id | user_id | product_id | quantity |
| --- | --- | --- | --- |
| 101 | 1 | 201 | 1 |
| 102 | 1 | 202 | 2 |
| 103 | 2 | 201 | 5 |

Products

| id | name | price |
| --- | --- | --- |
| 201 | Laptop | 1200 |
| 202 | Mouse | 25 |
```sql
SELECT
  u.name,
  p.name AS product_name,
  o.quantity
FROM orders o
JOIN users u ON o.user_id = u.id
JOIN products p ON o.product_id = p.id
WHERE u.name = 'Alice';
```

Flexibility in queries is the promise, and indeed SQL is very powerful and flexible. It’s common to forge ahead with an RDBMS design “without thinking about access patterns”; you focus on avoiding redundancy and capturing all data relations. This way, query optimization is considered a separate concern from schema design - you trust indexes and SQL planners to handle performance, while you keep the schema logical and normalized.

This works up to a point - until the query patterns of your application start to conflict with that pristine normalized structure. In my own experience, that conflict often surfaced as overly complex joins or brittle multi-query workflows in the application code. For example, early on I might design a schema for a feature by identifying the nouns (Users, Orders, Products, etc.) and normalizing their relationships. But then to render a single screen or API response, I’d frequently have to join five or six tables, or perform sequential queries to gather all the needed pieces of information.

Each join or extra round-trip added complexity and slowed down development. Sure, the data was consistent and no value was ever duplicated - but I was designing for the data structure, not for the actual usage pattern in the app.

Telltale signs you're fighting your schema

When you start denormalizing on top of your carefully normalized schema to improve performance or simplicity - perhaps by adding a cache table, a materialized view, or even storing JSON blobs in a relational table to avoid another join - it's often a sign that your query patterns conflict with your data model.

I’ve seen teams add an extra_info JSON column in PostgreSQL to stuff in data that didn’t fit their original schema, essentially treating Postgres as a pseudo-document store. While powerful, heavy use of such features can indicate you’re fighting the relational model.

One such case

In my case, while building Muse - an audio downloader app - I initially attempted a normalized design for user libraries, song metadata, downloads, etc. Very soon I realized that nearly every meaningful query in the app needed data from multiple tables: user info plus their playlist plus song details from an external API. I was writing complex SQL or multiple queries just to assemble what felt like a basic object. The relational design was logically clean, but it wasn’t aligned with how the application actually accessed data.

“Design for the Query Patterns”

The turning point was embracing the idea that data modeling should start with the application’s needs. What questions will we ask of the data? What does the app actually do with it? In the NoSQL world, this is a well-known principle.

The DynamoDB Principle

Amazon’s DynamoDB documentation explicitly states: “you shouldn’t start designing your schema for DynamoDB until you know the questions it will need to answer.”1 This reverses the traditional relational approach. Instead of designing the schema and then writing queries, you figure out the access patterns first, then model the data to fit those patterns.

In practice, this means thinking in terms of aggregates or use-case-centric data structures. museisfun.com frequently needs to show a Playlist and all its Songs together - scattering that data across separate normalized tables at query time is a pain.

MongoDB encourages storing data this way - here, a Playlist and its Songs live in a single document, or at least in the same collection, making that query trivial. DynamoDB's "single-table design" philosophy takes the same idea even further, and should be an illuminating example for you too.
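Here's a minimal sketch of what that embedded shape could look like in TypeScript. The field names and the in-memory `Map` standing in for a MongoDB collection are illustrative assumptions, not Muse's actual schema:

```typescript
// Hypothetical document shape for a Playlist with embedded Songs.
interface Song {
  id: string;
  title: string;
  artist: string;
}

interface PlaylistDoc {
  playlistId: string;
  name: string;
  songs: Song[]; // embedded: one read returns the whole playlist view
}

// In-memory stand-in for a collection keyed by playlistId.
const playlists = new Map<string, PlaylistDoc>([
  ["pl-1", {
    playlistId: "pl-1",
    name: "Gym Mix",
    songs: [
      { id: "song-1", title: "Top Off", artist: "Gunna" },
      { id: "song-2", title: "Low Down", artist: "Lil Baby" },
    ],
  }],
]);

// "Show a playlist and all its songs" becomes a single lookup - no joins.
function getPlaylistView(id: string): PlaylistDoc | undefined {
  return playlists.get(id);
}

console.log(getPlaylistView("pl-1")?.songs.length); // 2
```

The point is the shape, not the storage engine: the document mirrors exactly what the screen needs.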

You identify all your core query patterns upfront, and then denormalize and organize your data such that each pattern can be served with as few requests as possible - ideally one. (Look familiar? pssst, GraphQL).

As AWS puts it, in DynamoDB you design your schema “to make the most common and important queries as fast and inexpensive as possible,” tailoring data structures to your business use cases1. This is the essence of designing for query patterns over normalized structure.
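To make the single-table mentality concrete, here's a sketch using the common `pk`/`sk` key convention. The entity layout is made up for illustration; real DynamoDB access goes through the AWS SDK, which I'm replacing with a plain array here:

```typescript
// One table holds every entity type, distinguished by key structure.
type Item = { pk: string; sk: string; [attr: string]: string };

const table: Item[] = [
  { pk: "USER#nyuma", sk: "PROFILE", name: "Nyuma" },
  { pk: "USER#nyuma", sk: "SONG#song-1", title: "Top Off" },
  { pk: "USER#nyuma", sk: "SONG#song-2", title: "Low Down" },
];

// The access pattern "get a user and their library" becomes one
// partition-key query instead of joins across several tables.
function queryPartition(pk: string): Item[] {
  return table.filter((item) => item.pk === pk);
}

console.log(queryPartition("USER#nyuma").length); // 3 items: profile + 2 songs
```

Everything the "user library" screen needs lives under one partition key, which is precisely what makes that common query fast and cheap.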

I found a great explanation of this mindset on Stack Overflow, where a user explains MongoDB: “The design of the application comes first, then design the database schema to match the application usage (query, writes, updates, etc).”2. In other words, don’t start with an ER diagram in isolation. Start with how the data flows through your app.

  • Which pieces of data belong together from the app’s perspective?
  • Which queries are critical and run frequently?
  • What data do you need to fetch together to render a screen or respond to an API call?

By focusing on those, you might decide to store data that way from the get-go. Document databases shine here because they impose far fewer constraints on how you structure your data. You can put related information in one document (even if that means nesting or repeating some data) without the rigid table boundaries of SQL. As one commenter succinctly noted, in NoSQL, data is stored to serve specific queries; redundancy is added for read performance (at the cost of extra work on writes or some consistency trade-offs)2. In other words, optimize for the read/query path that matters most, rather than optimizing purely for storage efficiency.

When joins got out of hand

To ground this in a real scenario, let me recount my experience with Muse. My initial data-model reflex was to use a relational database (PostgreSQL) because the data seemed straightforwardly relational: a table for Users, a table for Songs, maybe a join table for a user’s saved songs (since it’s a many-to-many relationship between users and songs), perhaps another table for Artists, and so on. Classic normalized design.

It all looked good on paper, but when I started implementing features, the friction became sooo apparent. For example, just to show a user’s music library screen in the app, I needed each song’s details and artist info. In the normalized schema, that meant: find all songs in the user’s SavedSongs join-table, then join with the Songs table, and join again with the Artists table (and perhaps an external lookup if the song data wasn’t fully in our DB).

🤔 You see?

That’s a three or four-way join on what conceptually is “get me everything in Nyuma's library.” Worse, if I wanted to display that along with user-specific data (say, the timestamp when the user saved the song, or their personal rating of it), that was yet another table or join. Writing the query was one challenge; ensuring it was efficient and that Prisma was handling it optimally was another.

And any change in what I wanted to show - like now needing to also show whether the song is available for offline play - meant altering the schema (add a column here, another join there) and the query.

The schema was perfectly normalized, but the query pattern was clunky - an example of a conflict between the data model and the query patterns.
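That clunky read path, sketched in app terms - the table and column names are assumptions, and in-memory arrays stand in for the SQL tables, but the stitching-together is exactly what the code had to do per library render:

```typescript
// Normalized tables, modeled as plain arrays for illustration.
const savedSongs = [{ userId: "u1", songId: "s1", savedAt: "2023-01-01" }];
const songs = [{ id: "s1", title: "Top Off", artistId: "a1" }];
const artists = [{ id: "a1", name: "Gunna" }];

// Rendering one library screen requires walking three "tables":
// the join table, then Songs, then Artists.
function libraryView(userId: string) {
  return savedSongs
    .filter((row) => row.userId === userId)
    .map((row) => {
      const song = songs.find((s) => s.id === row.songId)!;
      const artist = artists.find((a) => a.id === song.artistId)!;
      return { title: song.title, artist: artist.name, savedAt: row.savedAt };
    });
}

console.log(libraryView("u1")[0].artist); // "Gunna"
```

Every new field the screen needed meant threading yet another lookup through this assembly code.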

Halfway through, I stepped back and reconsidered: what if I stored user libraries in a more natural way? Essentially, what I wanted was a document (User) that contains an array of objects (Songs) - with maybe just the fields needed for the library view.

```json
{
  "userId": "nyuma-id",
  "songs": [
    { "id": "song-1", "title": "Top Off", "artist": "Gunna", "favorited": true },
    { "id": "song-2", "title": "Low Down", "artist": "Lil Baby", "favorited": false }
  ],
  "preferences": { "offlineMode": true }
}
```

Now, to get a user’s entire music library, I just fetch this one document by userId - no joins, no multi-round trips. The query pattern (“get user library with songs and preferences”) directly matches the data structure stored. This was a revelation in terms of simplicity. Migrating to this meant the code to retrieve and display a library went from yucky SQL operations across multiple tables to a single key-value lookup by userId.

It also meant if I wanted to add a new piece of data (say, each song also stores some album_cover URL), I could just start storing it in the document without painful schema migrations across multiple tables.

What About Redundancy and Consistency?

Naysayers (🗣️): Isn’t this denormalized approach dangerous?

It’s true that if the same song appears in 1000 users’ libraries, its title and artist are stored 1000 times. In a pure relational design, that data would live once in the Songs table.

But I asked myself: how often does a song's title or artist change? Almost never. Yeah, in Muse, users can modify their own library, but that's local-only, and at scale, not a problem. At the end of the day, when working with rarely-mutated data, the cost of duplication is often outweighed by the benefit of simpler, faster queries (Take that 'DRY' principle).

This is a key trade-off with document stores: optimize for common read patterns and accept some redundancy. In Muse, the common case (reading a library) far outweighed the rare case of a song’s metadata changing.
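The write-side cost of that trade-off is easy to see in a sketch. Here, renaming a song has to touch every user document that embeds it (the document shapes are hypothetical; in MongoDB this would be a multi-document update):

```typescript
// Each user's library embeds its own copy of the song metadata.
interface LibraryDoc {
  userId: string;
  songs: { id: string; title: string }[];
}

const libraries: LibraryDoc[] = [
  { userId: "u1", songs: [{ id: "song-1", title: "Top Off" }] },
  { userId: "u2", songs: [{ id: "song-1", title: "Top Off" }] },
];

// The rare write path: rewrite the title everywhere it's duplicated.
function renameSong(songId: string, title: string): number {
  let touched = 0;
  for (const lib of libraries) {
    for (const song of lib.songs) {
      if (song.id === songId) {
        song.title = title;
        touched++;
      }
    }
  }
  return touched;
}

console.log(renameSong("song-1", "Top Off (Remix)")); // 2 documents touched
```

With metadata that almost never changes, paying this fan-out on the rare write is a fine price for a one-lookup read path.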

Tip

If you say something along these lines in a system-design interview when asked why you chose a document store over a relational database, trust me, you'll look like a pro if you can go this deep.

Flexibility and speed of iteration

Another project that solidified my appreciation for document stores was TalkToBeavs, a real-time social app my friends and I built for our university community. Think of it as an OSU-only Omegle/YikYak hybrid - real-time text and video chat, plus a social feed where students could post.

We knew from the start that this project would evolve rapidly: it was a class project with a tight deadline, and we’d likely be changing features on the fly. We chose MongoDB as our database, not just for its free-tier generosity, but because we needed a schema that could adapt quickly. It turned out to be a great fit.

Deep in the weeds

One core feature of TTB was the real-time chat rooms (randomly pairing students to chat). We had a Room document for each active chat. That Room document contained an array of messages and a list of the users in the room. In a normalized SQL design, you might have a Messages table and a RoomUsers join table (with a foreign key to RoomID).

But, because our typical access pattern was “fetch the entire chat history when a user joins a room,” we chose to embed the messages directly in the Room document. This way, joining a chat meant one database call to get the Room document, which already held the last N messages to display.

We didn’t have to issue a separate query to load messages for that room ID - embedding made the read path very fast and straightforward. The catch is that writing a new message means updating the Room document (which could contend if many writes happen concurrently, but our scale was modest and MongoDB can handle atomic updates on a single document). We favored the fast read (since chat history is read every time someone joins) over absolute write scalability.
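A stripped-down sketch of that pattern - the Room shape is an assumption about what we stored, and plain object mutation stands in for what would be an atomic `$push` on the Room document in MongoDB:

```typescript
// One document per active chat room: users and history together.
interface RoomDoc {
  roomId: string;
  users: string[];
  messages: { from: string; text: string }[];
}

const room: RoomDoc = { roomId: "r1", users: ["a", "b"], messages: [] };

// Sending a message is one single-document write - no Messages table.
function sendMessage(r: RoomDoc, from: string, text: string): void {
  r.messages.push({ from, text });
}

// Joining a room is one read that already includes the full history.
function joinRoom(r: RoomDoc): RoomDoc {
  return r;
}

sendMessage(room, "a", "hey");
sendMessage(room, "b", "go beavs");
console.log(joinRoom(room).messages.length); // 2
```

The read path the app hits constantly (join a room, render history) is one fetch; the write path pays a small, bounded cost.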

This design choice - “favor embedding, unless there’s a reason not to” - is, funnily enough, the approach MongoDB’s own data-modeling guidance recommends. We found it to be quite true. By embedding messages, we avoided the complexity of join queries in a performance-critical part of the app. And if a message had to be deleted or edited, it was just a matter of updating that one document.

Hopefully, this example shows you how small wins in your data model can lead to big wins in your code and development speed.

The trade-off

Of course, like everything else in engineering, there are some catches. Here, it's inflexibility when you need query patterns you didn't anticipate. In a DynamoDB single-table design, adding a new type of query might require refactoring the data model or backfilling data to fit the new access pattern.

In a traditional SQL database, you have more ad-hoc query flexibility: you can usually write a new query joining existing tables in a new way - it may be slow, but it will function. That contrast is the flip side of the key revelation of this post.

If you know you’ll always fetch X then Y, you can model X and Y together. But if your application is exploratory or requires super-flexible reporting and ad-hoc slicing and dicing of data, a relational model (or a multi-model approach) might serve you better.

For a lot of applications though - especially the operational, user-facing ones - the core access patterns are known up front.

  • In an e-commerce site, you know you’ll show product pages with reviews together, user profiles with their order history, etc.
  • In a game app, you know you’ll fetch leaderboards or player stats together.
  • In a social media app, you know you’ll fetch a user’s profile with their posts together.

These are scenarios where designing the data by usage makes a ton of sense. In Mongo, it might be about deciding which sub-documents to embed versus which to keep separate (based on whether they’re needed together). In a relational database, you might even end up creating a consolidated view or using JSON fields to mimic this if needed (though at that point, you’re halfway into document store territory). The bottom line is that thinking “query-first” leads to a more purpose-built schema, which often means simpler application code and faster queries for those specific patterns.

It’s easier to reason about the data as a cohesive object. In DynamoDB’s single-table world, this is even more pronounced - you effectively have one big collection of everything, but partitioned in such a way that each entity group is isolated. The AWS team recommends using as few tables as possible in DynamoDB - often just one - to simplify management and scalability. In Mongo, you might not always go to one collection, but you typically have far fewer collections than the number of tables you’d have in a fully normalized SQL design for the same app. Fewer moving parts in the schema can lead to fewer join bugs, fewer foreign key constraints to manage, and so on.

Relational Databases Still Have Their Place

I’m not declaring that relational databases are obsolete - far from it. They are incredibly powerful and often the right choice for scenarios requiring strong ACID guarantees, complex ad-hoc queries for analytics, or managing highly interconnected data where you need to traverse relationships arbitrarily.

The goal is to choose the right tool for the job, not to replace one default with another.

In fact, one of the downsides I’ve encountered with document-focused designs is when you suddenly need a new data access pattern that wasn’t anticipated. For example, in the Muse scenario, if I one day wanted to find “all users who have song X in their library,” that’s easy in a normalized SQL schema (just query the join table by songId). In my denormalized user-centric model, it’s harder - I might have to scan all user documents to find that song in their array. This is the flip side of not designing for flexibility: you gain efficiency on the known queries but pay for it on the unplanned ones.
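That unplanned query looks like this in sketch form - the document shape is hypothetical, and the full scan is the point (without a secondary index, nothing maps songs back to users):

```typescript
// User-centric documents: songs are only reachable through each user.
interface UserDoc {
  userId: string;
  songs: { id: string }[];
}

const users: UserDoc[] = [
  { userId: "u1", songs: [{ id: "song-1" }, { id: "song-2" }] },
  { userId: "u2", songs: [{ id: "song-2" }] },
];

// "Who has song X?" means scanning every document - O(users × songs).
// The join-table version is a single indexed lookup by songId.
function usersWithSong(songId: string): string[] {
  return users
    .filter((u) => u.songs.some((s) => s.id === songId))
    .map((u) => u.userId);
}

console.log(usersWithSong("song-2")); // returns ["u1", "u2"]
```

If that query ever becomes common, you'd add a secondary index or a reverse-mapping collection - paying a modeling cost the relational schema gave you for free.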

So, when should you use a document store over a relational database? Based on my experience, whenever your data can be naturally divided into discrete aggregates that match your usage patterns. If you can identify clear boundaries where one document can satisfy a whole query or command in your app, that’s a strong signal. Also, if you anticipate a lot of requirement changes or you’re in an early phase where you need agility, a schema-flexible store will smooth out your path. If you find yourself doing tons of joins, or caching joined data to serve your UI, that’s essentially your application telling you “put these things together, I always need them together.” Listen to that and consider a document model for them.

As one Stack Overflow answer noted in the context of MongoDB: with a flexible schema, you design the database after the application’s needs; different applications can leverage different models. In other words, let the way you query guide the way you store.

To the senior engineers reading this, I’d say: challenge the reflexive decision of SQL-first. We have so many great NoSQL options now - from MongoDB for general document storage, to DynamoDB for massive scale with single-digit millisecond access (if you design it right), to CouchDB, Firebase, etc. Each has its own nuances, but they all encourage thinking beyond tables and relations. The next time you’re architecting a new service or feature, try starting with the question, “How will I need to access this data?” and sketch your model from there. You might find that a document-oriented approach not only meets your needs but actually makes the whole development process more enjoyable. My mantra these days is: design your data model to answer questions, not just to store facts. Often that leads to a document store - and a sigh of relief when I realize my app logic for those questions has become much simpler.

Relational databases are one way to model data, but they are not the only way, and not always the best way for an application’s particular access patterns. Document stores align your data with your code’s view of the world, which in many cases is exactly what you want. Embrace that when it makes sense.

There will always be a place for SQL, but document databases have earned a much larger place in my toolbox than I ever expected - and I suspect they might in yours too, once you start designing around how your data is used. So when you find yourself contorting SQL to fit your app, or writing code to glue together normalized data for the hundredth time, take a step back and consider: maybe it’s time to pick a database that already speaks the language of your application’s needs. Chances are, you’ll be glad you did.

Footnotes

  1. Amazon Web Services - NoSQL Design for DynamoDB: guidance on modeling DynamoDB tables based on access patterns.

  2. Stack Overflow - Schema design Q&A: emphasizes application-driven schema design and favoring embedded documents by default.