~/nyuma.dev

How I've been storing data on the web

Because localStorage was never meant to be a database.

22 mins read

To no surprise, there are tons of ways to store bytes on the web. From .mp3 files in Muse’s cloud system to the .mdx files behind this very website, it’s a lot of data. Let's see how we got here.

Humble beginnings

LocalStorage

Persistence for most web applications started with this popular API. localStorage is simple, synchronous, and persists across sessions. You can store small amounts of data as key-value pairs, like so:

localStorage.setItem("theme", "dark")
const theme = localStorage.getItem("theme")

Nice, right? Until it wasn’t. When building production-level applications, the data model is rarely so simple. We often need complex objects, nested data, and sometimes even binary blobs. In these cases, localStorage quickly becomes a bottleneck.

More importantly, it's synchronous. Every read and write blocks the main thread, so a large read on page load can freeze the UI for hundreds of milliseconds. This is fine for small data, but as soon as values grow well beyond a few kilobytes, it becomes a performance nightmare - you're storing serialized objects in string land, after all.
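To make the "string land" point concrete, here is a minimal sketch of a JSON wrapper around a storage object. The names readJSON, writeJSON, and createMemoryStorage are made up for this example, and the in-memory storage only exists so the snippet runs outside a browser; in a real page you would pass window.localStorage, which has the same three methods.

```typescript
// Minimal subset of the Web Storage interface, so the sketch also runs outside a browser.
interface KeyValueStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Read a JSON value, falling back when the key is missing or the stored string is corrupt.
function readJSON<T>(storage: KeyValueStorage, key: string, fallback: T): T {
  const raw = storage.getItem(key);
  if (raw === null) return fallback;
  try {
    return JSON.parse(raw) as T;
  } catch {
    return fallback;
  }
}

// Write a JSON value; returns false instead of throwing (for example when the quota is exceeded).
function writeJSON<T>(storage: KeyValueStorage, key: string, value: T): boolean {
  try {
    storage.setItem(key, JSON.stringify(value));
    return true;
  } catch {
    return false;
  }
}

// Hypothetical in-memory stand-in used for the demo below.
function createMemoryStorage(): KeyValueStorage {
  const data = new Map<string, string>();
  return {
    getItem: (key) => data.get(key) ?? null,
    setItem: (key, value) => { data.set(key, value); },
    removeItem: (key) => { data.delete(key); },
  };
}

const demoStorage = createMemoryStorage();
writeJSON(demoStorage, "player", { volume: 0.8, queue: ["track-1", "track-2"] });
const savedPlayer = readJSON(demoStorage, "player", { volume: 1, queue: [] as string[] });
```

Every read pays for a full JSON.parse and every write for a full JSON.stringify, which is exactly why this approach stops scaling once the value gets large.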

LocalStorage done right

Web storage is not a single tool. It’s a toolbox. Some are sharp, others brittle. Some will cut you if you’re not careful, so we as engineers have the responsibility to choose the right tool for the job.

I say this because it (local storage) still has its place. Seleneo uses it (via Zustand + persist) to save lightweight state like theme, current draft ID, and canvas config. It works well for tiny, UI-first state that doesn’t need to sync or be deeply structured.

import { create } from 'zustand'
import { persist } from 'zustand/middleware'

const useUI = create(
  persist(
    (set) => ({
      theme: 'light',
      setTheme: (t) => set({ theme: t })
    }),
    { name: 'seleneo-ui' }
  )
)

But beyond 1000KB or so, it gets scary.

RIP WebSQL (and why we can't have nice things)

Right before the influx of the modern reactive web, there was WebSQL. To put it shortly: SQLite in the browser. People back then were writing real SQL queries, and although I wasn't around in 2010 to use it in everyday workflows - it looked glorious. You could fuck around and pull this off in the client:

CREATE TABLE tracks (id INTEGER PRIMARY KEY, title TEXT, artist TEXT);
INSERT INTO tracks (title, artist) VALUES ('Song 1', 'Artist 1');

Every now and then I daydream about a world where WebSQL didn’t die. But it did. Mainly because:

  • It never became a real standard (every implementation was basically a wrapper over SQLite)
    • Although, the community at the time loved it
  • Firefox refused to implement it
    • Mozilla felt a spec would just codify the quirks of the SQLite implementation
  • W3C deprecated it in favor of IndexedDB

WebSQL was too good to live, and IndexedDB is what we got instead.

IndexedDB

The long-term vision for Muse is for it to run almost entirely on IndexedDB, but through Dexie. Native IndexedDB's API is not the most dev-friendly, so we use Dexie for its more convenient API and type safety. For what it's worth, T3 Chat used Dexie too (later migrating to Convex).

import Dexie from 'dexie';
import { z } from "zod";

const TrackSchema = z.object({
  id: z.string(),
  title: z.string(),
  artist: z.string(),
  audioBlob: z.instanceof(Blob).optional(),
  favorite: z.boolean().default(false),
  playCount: z.number().default(0),
});

type Track = z.infer<typeof TrackSchema>;

class MusicDB extends Dexie {
  tracks!: Dexie.Table<Track, string>;

  constructor() {
    super('muse');

    // 'favorite' is left out of the index list: booleans are not valid IndexedDB keys
    this.version(1).stores({
      tracks: 'id, artist, playCount'
    });
  }

  // validate untrusted input before it reaches the database
  async addTrack(trackData: unknown) {
    const validatedTrack = TrackSchema.parse(trackData);
    return this.tracks.add(validatedTrack);
  }
}

const db = new MusicDB();

async function getFavoriteTracks() {
  // filter in JS, since a boolean field cannot be queried through an index
  return db.tracks.filter((track) => track.favorite).toArray();
}

Note

Your UX is only as fast as your slowest dependency. IndexedDB isn't RAM-fast, but for indexed reads it's usually fast enough not to be that dependency.

Browser storage isn’t infinite. If you don’t keep track of how much space you’ve got, sooner or later you'll hit the quota or get evicted under storage pressure.

const { usage, quota } = await navigator.storage.estimate()
console.log(`Used: ${usage}, Quota: ${quota}`)

In Muse, we check this when users start saving a lot of tracks. If you’re bumping against your quota, we warn you. If not, we carry on.
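The warn-or-carry-on check could be sketched like this. It is not Muse's actual code: classifyQuota and the 80% threshold are assumptions picked for illustration, and the function takes the estimate as a plain object so it can run without a browser.

```typescript
// Shape returned by navigator.storage.estimate(); both fields may be missing.
interface StorageEstimateLike {
  usage?: number;
  quota?: number;
}

type QuotaStatus = "ok" | "warn" | "unknown";

// Classify an estimate. The 0.8 threshold is an arbitrary assumption for this sketch.
function classifyQuota(estimate: StorageEstimateLike, warnRatio = 0.8): QuotaStatus {
  const { usage, quota } = estimate;
  if (usage === undefined || quota === undefined || quota <= 0) return "unknown";
  return usage / quota >= warnRatio ? "warn" : "ok";
}

// Browser usage (not executed here):
//   const quotaStatus = classifyQuota(await navigator.storage.estimate());
//   if (quotaStatus === "warn") showLowStorageBanner(); // hypothetical UI helper

const nearlyFull = classifyQuota({ usage: 900, quota: 1000 });
const plentyLeft = classifyQuota({ usage: 100, quota: 1000 });
```

Treating a missing usage or quota as "unknown" rather than "ok" keeps the caller honest: the numbers are estimates, and some browsers may not report them at all.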

We also request persistent storage:

await navigator.storage.persist()

This tells the browser: 'Hey, don’t wipe me unless you absolutely have to.' The browser can still say no (the promise resolves to true or false), and if you skip this step, your app data is fair game.
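Since the request can be refused, it helps to check the outcome. Below is a small sketch under the assumption that you only want to ask when storage is not already persistent; ensurePersistence and PersistenceApi are names invented here, while persisted() and persist() are the real StorageManager methods it expects.

```typescript
// The two StorageManager methods this sketch relies on (both exist on navigator.storage).
interface PersistenceApi {
  persisted(): Promise<boolean>;
  persist(): Promise<boolean>;
}

type PersistenceResult = "already-persistent" | "granted" | "denied";

// Ask for persistent storage only if we do not have it yet, and report what happened.
async function ensurePersistence(api: PersistenceApi): Promise<PersistenceResult> {
  if (await api.persisted()) return "already-persistent";
  return (await api.persist()) ? "granted" : "denied";
}

// Browser usage (not executed here):
//   const outcome = await ensurePersistence(navigator.storage);
//   if (outcome === "denied") { /* keep syncing important data to the server */ }
```

A "denied" result is not an error. It just means the data stays in best-effort mode, so anything you cannot afford to lose still needs a copy somewhere else.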

😬 Eviction is silent

Users don’t see a modal. You don’t get a callback. Your storage just disappears.

PouchDB: offline dreams

I played around with PouchDB for Muse’s library sync. It’s a CouchDB-compatible client that uses IndexedDB under the hood. What you get:

  • structured docs
  • live sync
  • attachment support (for blobs)

We didn’t ship it because of its size, its performance, and the need for a full CouchDB backend. But the dream is real. PouchDB makes offline-first feel native.

Storage strategy at scale (Muse edition)

Muse's storage system, as I mentioned, runs across a few layers. Let's break that down further with the context of its API and features:

  1. Dexie for structured metadata:

    • What's stored? Think playlists (from /api/playlists), individual song metadata (title, artist, album, duration, from /api/songs), user's favorite songs and playlists (from /users/me/favorites/songs and /users/me/favorites/playlists), and potentially user statistics (from /users/me/stats).
    • Why Dexie? It allows for rich querying. For example, "show me all songs by artist X in playlist Y, sorted by title" or "list all my favorite playlists." This makes the local cache highly effective for browsing the library without constant network calls.
    • UX Impact: Snappy UI. When you open Muse, your library, playlists, and favorites can load almost instantly from Dexie, even before the first network request for updates completes.
  2. Audio blobs in IndexedDB (via Dexie):

    • The Challenge: Audio files are large. Storing them requires careful management.
    • Process: When a user downloads a song (perhaps after getting a stream URL from /api/songs/{id}/stream and deciding to save it), the audio data is saved as a blob in an IndexedDB store.
    • Management: This is where navigator.storage.estimate() becomes vital. Muse would need to monitor available space and potentially implement a Least Recently Used (LRU) cache eviction strategy or allow users to manage downloads if space gets low. The navigator.storage.persist() call is also key to signal to the browser that this data is important.
  3. Storage API to monitor quotas: As above, crucial for managing large audio files. The app should proactively inform the user if they're nearing their quota.

  4. Persistent flag to request durability: Essential for an app like Muse where users expect their downloaded music to stick around.

  5. Cloudflare R2 for synced storage across devices:

    • What's synced? Primarily user-specific metadata: playlist definitions, liked songs/playlists, play counts (perhaps via /api/playlists/{id}/play or /api/songs/{id}/listen), and user settings. Audio blobs themselves are kept local to the device due to size and cost.
    • Strategy: This could involve a sync on app load, a periodic background sync if the app is open, or sync on specific actions (e.g., creating a new playlist). The goal is that if a user logs into Muse on a new device, their core library structure and preferences are rehydrated from R2.
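The LRU eviction idea mentioned for the audio blob layer can be sketched as a pure function. This is an illustration, not Muse's implementation: DownloadEntry and its fields are hypothetical bookkeeping that would have to be stored next to each blob.

```typescript
// Hypothetical bookkeeping record for one downloaded track.
interface DownloadEntry {
  id: string;
  sizeBytes: number;
  lastPlayedAt: number; // epoch milliseconds
}

// Pick the least recently played downloads until at least `bytesToFree` would be reclaimed.
function pickEvictions(entries: DownloadEntry[], bytesToFree: number): string[] {
  // Copy before sorting so the caller's array is left untouched.
  const oldestFirst = [...entries].sort((a, b) => a.lastPlayedAt - b.lastPlayedAt);
  const evicted: string[] = [];
  let freed = 0;
  for (const entry of oldestFirst) {
    if (freed >= bytesToFree) break;
    evicted.push(entry.id);
    freed += entry.sizeBytes;
  }
  return evicted;
}

const downloads: DownloadEntry[] = [
  { id: "a", sizeBytes: 5000000, lastPlayedAt: 300 },
  { id: "b", sizeBytes: 4000000, lastPlayedAt: 100 },
  { id: "c", sizeBytes: 6000000, lastPlayedAt: 200 },
];
const toEvict = pickEvictions(downloads, 8000000); // ["b", "c"]
```

The returned ids could then be handed to Dexie's bulkDelete() on the blob table. Keeping the selection logic free of database calls makes it easy to test and to swap for a different policy later.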

We only sync metadata (not audio) to R2 — your downloads live locally. But if you log in from another device, we rehydrate your likes and playlists.

We chose this because:

  • Audio is big
  • Syncing blobs is expensive
  • Most people stream anyway

So: local-first for audio, cloud-first for state. It’s worked surprisingly well.

Seleneo and the lightweight case

Seleneo, the design tool, has simpler needs for now. It doesn’t deal with media blobs or require extensive offline capabilities for its core offering. We use localStorage + Zustand to store UI preferences (like theme), active draft IDs, and sidebar state.

import { create } from 'zustand'
import { persist, createJSONStorage } from 'zustand/middleware'

interface SeleneoUISettings {
  theme: 'light' | 'dark' | 'system';
  activeDraftId: string | null;
  isToolPanelOpen: boolean;
  lastUsedTool: string | null;
  // ... other UI specific settings

  // actions live in the same store type so TypeScript accepts them below
  setTheme: (theme: SeleneoUISettings['theme']) => void;
  setActiveDraftId: (id: string | null) => void;
  toggleToolPanel: () => void;
  setLastUsedTool: (tool: string | null) => void;
}

// the extra () after create<...> is Zustand's curried form for TypeScript + middleware
const useSeleneoUIStore = create<SeleneoUISettings>()(
  persist(
    (set) => ({
      theme: 'system',
      activeDraftId: null,
      isToolPanelOpen: true,
      lastUsedTool: null,
      setTheme: (theme) => set({ theme }),
      setActiveDraftId: (id) => set({ activeDraftId: id }),
      toggleToolPanel: () => set((state) => ({ isToolPanelOpen: !state.isToolPanelOpen })),
      setLastUsedTool: (tool) => set({ lastUsedTool: tool }),
    }),
    {
      name: 'seleneo-ui-settings', // unique name
      storage: createJSONStorage(() => localStorage), // specify localStorage
    }
  )
)

This works because:

  • The data is small and not overly complex.
  • It's primarily for enhancing the immediate user experience on that specific browser (e.g., remembering UI state).
  • Synchronous access is acceptable for these small, quick reads/writes.

However, if Seleneo were to evolve to support, say, offline editing of complex vector designs or storing large embedded image assets locally for a draft, localStorage would quickly hit its limits in terms of both storage capacity and performance. At that point, migrating draft storage to IndexedDB would be the logical next step, potentially keeping localStorage for truly ephemeral UI state.

LocalStorage vs. IndexedDB: A Quick Rule of Thumb

  • LocalStorage: Good for small amounts (less than 5MB, ideally much less) of simple key-value data, like user preferences or UI state, where synchronous access is okay. Think Seleneo's current UI settings.
  • IndexedDB: Best for larger, structured data, blobs (like Muse's audio files), needing asynchronous access, complex queries, or transactions. Think Muse's entire music library metadata and downloaded tracks.

That’s kind of the point. You don’t have to reach for the most advanced option. Pick the thing that makes sense for the scale you’re working at.

Nuggets from the field

| Nugget/Tip | Explanation & Recommendation |
| --- | --- |
| IndexedDB Performance | Performance is generally good, but opening numerous stores simultaneously in a cold tab can be slow. Bundle your read operations if possible. |
| Reactive Offline State | Dexie.js with liveQuery() offers a straightforward way to achieve reactive data that updates your UI automatically when underlying IndexedDB data changes. |
| LocalStorage Usage | Avoid using localStorage for critical application state due to its synchronous nature and size limitations. It's better suited for persisting simple UI preferences. |
| PouchDB Considerations | PouchDB is powerful for full database sync capabilities (especially with CouchDB backends) but can be heavy. Evaluate if its feature set is essential for your needs. |
| navigator.storage API | This API is often overlooked. Use navigator.storage.estimate() to check storage quotas and navigator.storage.persist() to request persistent storage, reducing the likelihood of data eviction. |
| Derived Data | Avoid storing data that can be derived from a source of truth. Recalculate it when needed to save space and maintain consistency. |
| Storage Speed | Never assume client-side storage is fast. Always measure performance, especially with large datasets or frequent operations. |
| Data Permanence | Don't assume data stored on the client is permanent. Browsers can evict data under storage pressure. Implement strategies for data backup or sync if permanence is critical. |

Diving Deeper into IndexedDB

While Dexie.js abstracts away much of IndexedDB's verbosity, understanding its core concepts can help you build more robust and performant applications.

Schema Versioning and Migrations

As your application evolves, so will your data structures. IndexedDB handles schema changes through database versions. When you open a database with a higher version number, an upgradeneeded event is fired, allowing you to modify the database structure (create/delete object stores, create/delete indexes).

Important

Properly managing schema versions and migrations is crucial to prevent data loss or application errors for your users when you deploy updates.

Here's a conceptual Dexie example of a migration:

import Dexie from 'dexie';

const db = new Dexie('MyMusicAppDB');

db.version(1).stores({
  tracks: '++id, title, artist',
  playlists: '++id, name, trackIds',
});

// version 2: add album to tracks and a new settings store
db.version(2).stores({
  tracks: '++id, title, artist, album', // 'album' is now indexed
  playlists: '++id, name, trackIds',
  settings: 'key, value' // store for, say, global app settings
}).upgrade(tx => {
  // runs only for clients upgrading from a version below 2
  return tx.table('tracks').toCollection().modify(track => {
    // existing tracks were saved without an album, so give them a default
    if (typeof track.album === 'undefined') {
      track.album = 'Unknown Album';
    }
  });
});

// version 3: add genre to tracks
db.version(3).stores({
  tracks: '++id, title, artist, album, genre',
  playlists: '++id, name, trackIds',
  settings: 'key, value'
}).upgrade(tx => {
  return tx.table('tracks').toCollection().modify(track => {
    track.genre = track.genre || 'Unknown Genre';
  });
});

export default db;

In this example, each .version(N) call defines the schema for that version. The .upgrade() callback provides a transaction (tx) to safely modify existing data to fit the new schema.
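One way to reason about this is that upgrades chain: as I understand Dexie's behaviour, a client still on version 1 runs every upgrade callback above its version in order, while a client on version 2 only runs the last one. The sketch below models that idea with plain functions. It is a conceptual stand-in, not Dexie's internals, and migrate, Migration, and TrackRecord are names made up for the example.

```typescript
// A record whose shape changes over time; kept loose on purpose for the sketch.
type TrackRecord = Record<string, unknown>;

interface Migration {
  toVersion: number;
  upgrade(record: TrackRecord): TrackRecord;
}

// Two steps in the spirit of a "default album" and a "default genre" upgrade.
const migrationSteps: Migration[] = [
  { toVersion: 2, upgrade: (r) => ({ ...r, album: r["album"] ?? "Unknown Album" }) },
  { toVersion: 3, upgrade: (r) => ({ ...r, genre: r["genre"] || "Unknown Genre" }) },
];

// Apply every migration newer than `fromVersion`, in ascending version order.
function migrate(record: TrackRecord, fromVersion: number, steps: Migration[]): TrackRecord {
  return [...steps]
    .sort((a, b) => a.toVersion - b.toVersion)
    .filter((step) => step.toVersion > fromVersion)
    .reduce((current, step) => step.upgrade(current), record);
}

const upgradedFromV1 = migrate({ title: "Song 1", artist: "Artist 1" }, 1, migrationSteps);
const upgradedFromV2 = migrate({ title: "Song 2", artist: "Artist 2", album: "LP" }, 2, migrationSteps);
```

The practical consequence is that each upgrade step should be written against the shape the previous version left behind, never against "whatever the data looks like today".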

Indexing Strategies

Indexes are the backbone of fast queries in IndexedDB.

  • Choose wisely: Only index fields you'll frequently query or sort by. Too many indexes can slow down write operations.
  • Compound indexes: IndexedDB supports compound indexes through array key paths, and Dexie exposes them with its [artist+album] schema syntax. Concatenating properties into a single indexed string (e.g., artistAndAlbum: "ArtistName_AlbumName") is an older workaround that is rarely needed now.
  • unique constraint: Use this when a field must have unique values (e.g., a username). In a Dexie schema this is the & prefix.
  • multiEntry: Useful for indexing arrays. If a track has an array of tags ['rock', '90s', 'alternative'], a multiEntry index on tags (the * prefix in Dexie) would allow you to efficiently find all tracks tagged 'rock'.
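If multiEntry feels abstract, it may help to picture it as one index entry per array element. The snippet below builds that structure by hand in memory. It is only a mental model (the browser maintains the real index for you), and buildTagIndex and TaggedTrack are names invented for the illustration.

```typescript
interface TaggedTrack {
  id: string;
  tags: string[];
}

// Build one index entry per array element, which is the idea behind a multiEntry index.
function buildTagIndex(tracks: TaggedTrack[]): Map<string, string[]> {
  const index = new Map<string, string[]>();
  for (const track of tracks) {
    // Drop duplicate tags so a track is listed at most once per tag.
    const uniqueTags = track.tags.filter((tag, i, all) => all.indexOf(tag) === i);
    for (const tag of uniqueTags) {
      const ids = index.get(tag) ?? [];
      ids.push(track.id);
      index.set(tag, ids);
    }
  }
  return index;
}

const tagIndex = buildTagIndex([
  { id: "t1", tags: ["rock", "90s", "alternative"] },
  { id: "t2", tags: ["rock", "live"] },
  { id: "t3", tags: ["jazz"] },
]);
const rockTracks = tagIndex.get("rock") ?? []; // ["t1", "t2"]
```

Looking up a tag is a single key read instead of a scan over every track, which is the whole appeal. The cost is that every write has to update one entry per tag.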

Bulk Operations

For performance, always prefer bulk operations over multiple individual operations when dealing with many records. Dexie provides:

  • bulkAdd(): Adds an array of objects.
  • bulkPut(): Adds or replaces an array of objects.
  • bulkDelete(): Deletes an array of records by their primary keys.

These are significantly faster as they perform operations within a single transaction, reducing overhead.
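For very large imports, one option is to feed the bulk method in batches so that no single transaction grows without bound. The sketch below assumes that trade-off is acceptable (batches are not atomic as a group). chunk and putInBatches are hypothetical helpers, and the bulkPut callback is injected so the code runs without a database.

```typescript
// Split a large array into fixed-size batches.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Write batches one after another and report how many items were written.
async function putInBatches<T>(
  items: T[],
  batchSize: number,
  bulkPut: (batch: T[]) => Promise<unknown>,
): Promise<number> {
  let written = 0;
  for (const batch of chunk(items, batchSize)) {
    await bulkPut(batch);
    written += batch.length;
  }
  return written;
}

// With Dexie this could be wired up as (not executed here):
//   await putInBatches(tracks, 500, (batch) => db.tracks.bulkPut(batch));
```

The batch size of 500 in the comment is a guess, not a recommendation. Measure with your own record sizes before settling on a number.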

IndexedDB and Web Workers

For very large datasets or computationally intensive data processing, consider offloading IndexedDB operations to a Web Worker. This keeps the main thread free, ensuring your UI remains responsive.

Responsive UI with Web Workers

Moving IndexedDB logic to a Web Worker can dramatically improve perceived performance by preventing database operations from blocking UI updates, especially on initial data hydration or large data writes.

While direct Dexie usage in a worker requires some setup (as Dexie instances are not directly transferable), you can communicate with a worker that handles its own Dexie instance or uses the native IndexedDB API.
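The "worker owns the database, UI sends messages" setup boils down to a small request/response protocol. Here is one possible shape for it. The message names, TrackRow, and TrackStore are all assumptions for this sketch, and the handler takes its store as a parameter so it can be exercised without a real worker.

```typescript
interface TrackRow {
  id: string;
  title: string;
}

// Messages the UI thread may send to the storage worker.
type WorkerRequest =
  | { kind: "putTrack"; requestId: number; track: TrackRow }
  | { kind: "getTrack"; requestId: number; id: string };

// Replies carry the requestId back so the UI can match them to pending calls.
type WorkerResponse =
  | { requestId: number; ok: true; track: TrackRow | null }
  | { requestId: number; ok: false; error: string };

// What the worker needs from its database; a Dexie table could sit behind this.
interface TrackStore {
  put(track: TrackRow): Promise<void>;
  get(id: string): Promise<TrackRow | undefined>;
}

async function handleRequest(req: WorkerRequest, store: TrackStore): Promise<WorkerResponse> {
  try {
    if (req.kind === "putTrack") {
      await store.put(req.track);
      return { requestId: req.requestId, ok: true, track: req.track };
    }
    const found = await store.get(req.id);
    return { requestId: req.requestId, ok: true, track: found ?? null };
  } catch (err) {
    // Errors are turned into plain data, since exceptions do not cross postMessage.
    const message = err instanceof Error ? err.message : String(err);
    return { requestId: req.requestId, ok: false, error: message };
  }
}

// Inside the worker this could be wired up as (not executed here):
//   self.onmessage = async (event) => self.postMessage(await handleRequest(event.data, store));
```

Only plain, structured-cloneable data crosses the boundary, which is why the Dexie instance itself stays inside the worker.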

Bridging the Gap: Real-world Data Synchronization Patterns

Storing data locally is one piece of the puzzle; keeping it in sync with a server or across multiple devices is another. Both Muse and Seleneo (if it were to expand its cloud features) would employ synchronization strategies.

Optimistic Updates

This pattern dramatically improves perceived performance. The UI updates as if the operation was successful immediately, while the actual network request happens in the background.

Example: Favoriting a song in Muse

  1. User Action: Clicks the "favorite" icon for a song.
  2. UI Update: Icon changes state instantly.
  3. Local Data Update (Dexie): The song's isFavorite flag is set to true in the local IndexedDB.
    async function toggleFavoriteSong(songId: string, currentStatus: boolean) {
      const newStatus = !currentStatus;
      // update the local DB immediately
      await db.tracks.update(songId, { isFavorite: newStatus });
      updateSongInUI(songId, { isFavorite: newStatus });

      try {
        const response = await fetch(`/api/users/me/favorites/songs`, {
          method: 'POST',
          headers: { /* ... */ },
          body: JSON.stringify({ songId, favorite: newStatus }),
        });
        // fetch only rejects on network errors, so treat HTTP errors as failures too
        if (!response.ok) throw new Error(`Sync failed with status ${response.status}`);
      } catch (error) {
        // revert local change & notify user if something goes wrong
        console.error("Failed to sync favorite status:", error);
        await db.tracks.update(songId, { isFavorite: currentStatus });
        updateSongInUI(songId, { isFavorite: currentStatus });
      }
    }
  4. Network Request: A POST request is made to /users/me/favorites/songs (as per Muse API docs).
  5. Outcome:
    • Success: All good. The local state already matches the server.
    • Failure: The local change must be reverted, and the user notified. This is crucial.

Background Sync vs. Immediate Sync

  • Immediate Sync: For critical actions like login, registration, or a payment transaction. The user usually waits for the server response.
  • Background Sync (the Background Sync API, driven by a Service Worker): For non-critical data like analytics, logging, or less urgent updates. The Service Worker can wait for a stable connection to send the data. Muse might use this for syncing play counts if immediate accuracy isn't paramount. Browser support is limited (mostly Chromium-based browsers at the time of writing), so keep a fallback.
  • Periodic Sync (the Periodic Background Sync API): For regularly updating content, like fetching new playlist recommendations for Muse in the background. Support here is narrower still.
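Whichever trigger you use, background sync usually rests on an "outbox": queued mutations that get replayed when the network cooperates. A minimal sketch of the replay step follows. OutboxItem and flushOutbox are hypothetical, the queue would normally live in IndexedDB, and the send function is injected so the logic runs without a network.

```typescript
// A queued mutation waiting to reach the server (shape is an assumption for this sketch).
interface OutboxItem {
  id: number;
  path: string;
  body: unknown;
}

// Try to send every queued item in order. Stop at the first failure so ordering is
// preserved, and return whatever still needs to be retried later.
async function flushOutbox(
  queue: OutboxItem[],
  send: (item: OutboxItem) => Promise<boolean>,
): Promise<OutboxItem[]> {
  let sent = 0;
  for (const item of queue) {
    let delivered = false;
    try {
      delivered = await send(item);
    } catch {
      delivered = false; // a thrown error (e.g. offline) counts as "not delivered"
    }
    if (!delivered) break;
    sent += 1;
  }
  return queue.slice(sent);
}

// Possible triggers (not executed here): a service worker 'sync' event where the
// Background Sync API is available, or a plain 'online' event listener as a fallback.
```

Stopping at the first failure is a deliberate choice here: it keeps "favorite, then unfavorite" from arriving in the wrong order. If your mutations are independent, skipping failed items and continuing would also be reasonable.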

Conflict Resolution

When data can be changed on multiple clients (e.g., two browser tabs editing the same Seleneo design, or Muse on two devices managing the same playlist), conflicts can arise.

  • Last Write Wins (LWW): Simplest. The last update received by the server overwrites previous ones. Common but can lead to data loss.
  • Timestamp-based: Each piece of data has a timestamp; the newest one wins.
  • CRDTs (Conflict-free Replicated Data Types): More complex, but mathematically guarantee convergence. Overkill for many apps but powerful for collaborative tools.
  • Manual Resolution: Prompt the user to resolve the conflict.

Muse's R2 sync for metadata likely uses a timestamp-based LWW or a server-side merge logic for things like playlist contents. Seleneo, if syncing complex designs, would need a robust strategy here.
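For reference, timestamp-based last-write-wins is only a few lines. This sketch is not Muse's sync code: Versioned, its updatedAt field, and the tie-breaking rule are assumptions chosen for the example.

```typescript
// A synced record carrying the time of its last modification (field names are assumptions).
interface Versioned<T> {
  id: string;
  updatedAt: number; // epoch milliseconds
  value: T;
}

// Merge two replicas record by record; for each id the newer `updatedAt` wins.
// Ties go to the remote (server) copy so the outcome does not depend on the client.
function mergeLastWriteWins<T>(local: Versioned<T>[], remote: Versioned<T>[]): Versioned<T>[] {
  const merged = new Map<string, Versioned<T>>();
  for (const record of local) merged.set(record.id, record);
  for (const record of remote) {
    const existing = merged.get(record.id);
    if (existing === undefined || record.updatedAt >= existing.updatedAt) {
      merged.set(record.id, record);
    }
  }
  const result: Versioned<T>[] = [];
  merged.forEach((record) => result.push(record));
  // Sort by id so the output order is deterministic.
  return result.sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
}

const mergedPlaylists = mergeLastWriteWins(
  [
    { id: "p1", updatedAt: 100, value: "Road trip" },
    { id: "p2", updatedAt: 300, value: "Focus (local edit)" },
  ],
  [
    { id: "p1", updatedAt: 200, value: "Road trip v2" },
    { id: "p2", updatedAt: 250, value: "Focus" },
    { id: "p3", updatedAt: 50, value: "Gym" },
  ],
);
// p1 comes from remote, p2 from local, p3 from remote
```

The weakness is visible in the code: whichever side loses is dropped wholesale, and the comparison trusts device clocks. That is fine for a "liked" flag and risky for something like playlist contents, where a merge of both edits is usually what the user wanted.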

Service Workers and the Cache API: True Offline Power

Beyond storing structured data in IndexedDB, Service Workers and the Cache API are essential for building truly robust offline-first web applications.

A Service Worker is a script that your browser runs in the background, separate from a web page, opening the door to features that don't need a web page or user interaction.

Key Capabilities:

  • Network Proxying: Intercept network requests and decide how to respond (e.g., serve from cache, fetch from network).
  • Caching Assets: Store your app shell (HTML, CSS, JS) and static assets for instant loading and offline availability.
  • Caching API Responses: Store responses from your backend APIs, making data available even when offline.
  • Background Sync: Defer actions until the user has stable connectivity.
  • Push Notifications: Engage users even when they aren't actively using the site.

Caching Strategies with the Cache API

The Cache API allows you to create and manage named caches of request/response pairs. Common strategies include:

  1. Cache First (Offline First):
    • Check the cache for a response.
    • If found, serve it.
    • If not, fetch from the network, cache the response, and then serve it.
    • Best for app shell assets and data that doesn't change frequently.
  2. Network First, then Cache:
    • Try to fetch from the network.
    • If successful, cache the response and serve it.
    • If the network fails (e.g., offline), serve from the cache as a fallback.
    • Good for data that changes frequently but should still be available offline (e.g., user's latest feed).
  3. Stale-While-Revalidate:
    • Serve from cache immediately (if available) for a fast response.
    • Simultaneously, fetch from the network in the background.
    • If the network fetch is successful, update the cache for the next request.
    • Offers a good balance of speed and freshness.
  4. Cache Only: Only serve from the cache. Useful for assets you know are versioned and won't change until the next app update.
  5. Network Only: Always bypass the cache and go to the network. For non-critical requests or things that must always be live.
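Of the strategies above, stale-while-revalidate is the one that is easiest to get subtly wrong, so here is a sketch of its control flow. It uses a generic async cache and fetcher instead of the real Cache API so it can run anywhere. AsyncCache, Fetcher, and staleWhileRevalidate are names made up for this example.

```typescript
// Minimal async cache and fetcher shapes so the sketch runs without a service worker.
interface AsyncCache<V> {
  get(key: string): Promise<V | undefined>;
  set(key: string, value: V): Promise<void>;
}
type Fetcher<V> = (key: string) => Promise<V>;

interface SwrResult<V> {
  value: V;                   // what the caller gets right away
  revalidated: Promise<void>; // settles once the background refresh has finished
}

async function staleWhileRevalidate<V>(
  key: string,
  cache: AsyncCache<V>,
  fetcher: Fetcher<V>,
): Promise<SwrResult<V>> {
  const cached = await cache.get(key);
  // Always start a network fetch and store its result for the next request.
  const refresh = fetcher(key).then((fresh) => cache.set(key, fresh).then(() => fresh));
  if (cached !== undefined) {
    // Cache hit: answer immediately and swallow refresh failures,
    // so a flaky network never breaks a response we could already serve.
    return { value: cached, revalidated: refresh.then(() => undefined, () => undefined) };
  }
  // Cache miss: there is nothing stale to serve, so wait for the network.
  const fresh = await refresh;
  return { value: fresh, revalidated: Promise.resolve() };
}

// In a service worker the same flow maps onto caches.open(), cache.match(), cache.put(),
// event.respondWith() for the response and event.waitUntil() for the refresh.
```

The important detail is that the refresh outlives the response. In a real service worker that is what event.waitUntil() is for, since the worker may otherwise be stopped before the cache is updated.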

Service Worker Lifecycle

Service workers have a distinct lifecycle (installing, waiting, activating, activated). Understanding this is key to avoiding common pitfalls, like not seeing updated content immediately after deploying a new service worker: by default the new worker waits until every tab controlled by the old one has closed. Tools like Workbox can simplify this.

For an application like Muse, a service worker could:

  • Cache the core application shell (HTML, JS, CSS).
  • Cache album art and track metadata fetched from an API.
  • Potentially even cache recently played audio files (though this needs careful quota management). This would allow users to open Muse, see their library, and play recently listened-to tracks even if they are completely offline.

Web Storage Security: Handle with Care

Client-side storage is inherently less secure than server-side storage. It's crucial to be aware of the risks.

localStorage and Cross-Site Scripting (XSS)

localStorage is synchronous and accessible via JavaScript on the same origin. If your site has an XSS vulnerability, an attacker can execute script that reads everything in your localStorage.

Sensitive Data in localStorage

Never store sensitive information like session tokens, API keys, or personal user data in localStorage. An XSS attack could lead to complete account takeover. Use HttpOnly cookies for session tokens.

IndexedDB Security

IndexedDB is also origin-bound, meaning only scripts from the same origin can access a specific database. This provides a good baseline of security against other sites. However, like localStorage, if your own site has an XSS vulnerability, malicious scripts running on your origin can access and manipulate IndexedDB data.

Storing Sensitive Information

Generally, avoid storing highly sensitive, unencrypted data on the client. If you must store data that needs some level of protection client-side (e.g., user preferences that are sensitive but not critical secrets):

  • Consider the threat model: What are you protecting against?
  • Web Crypto API: For actual encryption/decryption tasks, the Web Crypto API provides cryptographic primitives. However, client-side encryption is complex. If the encryption key itself is derived from something accessible to client-side JavaScript (e.g., a user password typed into the app), an XSS attacker could potentially intercept that key.
    Client-Side Encryption is Hard

    While the Web Crypto API provides the tools, managing keys securely on the client-side is a significant challenge. Don't assume it makes data impervious if your site is compromised.

  • Server-side is safer: For truly sensitive data, always prefer server-side storage and processing.

Choosing Your Storage Weapon: A Practical Decision Framework

Selecting the right client-side storage isn't just about picking the newest API; it's about matching the tool to the task, considering your application's specific needs. Here's a framework to guide your decision, using Muse and Seleneo as examples:

| Consideration | Key Questions | localStorage (Seleneo - UI) | IndexedDB (Muse - Library & Audio) | Cache API (Muse - Assets/Offline) |
| --- | --- | --- | --- | --- |
| Data Size & Complexity | How much data? Simple key-value or complex structured objects/blobs? | Small (less than 5MB, ideally less than 100KB). Simple JSON-serializable objects. (e.g., Seleneo's theme, active draft ID). | Large (MBs to GBs). Complex objects, arrays, blobs. (e.g., Muse's track metadata, playlists, downloaded MP3s). | Request/Response pairs. Good for app shell, static assets, API responses. (e.g., Muse's JS/CSS, album art). |
| Querying Needs | Need to search/filter/sort data by specific fields? | No. Only key-based retrieval. | Yes. Supports indexes for efficient querying, sorting, and range scans. (e.g., "Find all songs by artist X"). | No. Cache key (Request object) based retrieval. |
| Performance (Access Mode) | Synchronous or Asynchronous access? Impact on main thread? | Synchronous. Can block main thread if data is large or operations are frequent. Fine for tiny, quick reads/writes. | Asynchronous. Non-blocking, uses event callbacks natively and Promises through libraries like Dexie. Essential for large data or frequent operations. | Asynchronous. Non-blocking. |
| Offline Requirements | Does data need to be available offline? How critical is it? | Yes, for its specific use (UI persistence). But not for core app functionality if it were more complex. | Yes, critical for Muse's offline playback of downloaded music and library browsing. | Yes, primary purpose is to enable offline access to cached assets and API responses. |
| Transactions | Need to perform multiple operations as a single atomic unit? | No. | Yes. Supports transactions for data integrity across multiple operations. | No direct transaction model like IndexedDB, but operations are atomic. |
| Data Sync Needs | Does this data need to be synced with a server or across devices? | Not inherently. Seleneo's localStorage is device-specific. Sync would be a separate concern. | Often, yes. Muse syncs metadata (not blobs) via R2. IndexedDB is the local cache/source of truth. | Can cache API responses that are part of a sync strategy, but doesn't handle sync logic itself. |
| Ease of Use | API complexity? Need for libraries? | Very simple API. | Complex native API. Libraries like Dexie.js are highly recommended. | Relatively straightforward Promise-based API. Workbox can simplify further. |
| Security Context | What kind of data is it? What are the risks if exposed via XSS? | Avoid sensitive data. Risk of XSS exposure. (Seleneo: UI prefs - low risk). | Avoid highly sensitive unencrypted data. Risk of XSS exposure. (Muse: Track metadata - moderate risk if user data is detailed; audio blobs - IP risk). | Can store API responses containing user data. Same-origin policy applies. Risk of MIME sniffing if Content-Type is not set correctly. |

🚀 Start Simple, Evolve as Needed

Many applications start with localStorage for basic persistence. As features grow and data needs become more complex (like Seleneo potentially adding offline design storage), they might graduate to IndexedDB. Muse likely started with simpler storage and evolved its sophisticated multi-layered approach. Don't over-engineer from day one, but be prepared to refactor and adopt more powerful tools when your requirements demand it.

Final note

Web storage isn’t about choosing the “best” API — it’s about knowing your constraints. The right answer depends on what you're building, who you're building it for, and whether you care if it works offline.

I care. I think we all should.

Whether you're building a cloud music player or a drag-and-drop design studio, you’ll eventually ask yourself:

Where should this data live?

Make sure you have an answer.

And if you don’t yet — you will.

Happy storing, happy building 💽