Blockchain data is public, but that does not mean it is easy to use. Anyone who has tried to build a wallet dashboard, DeFi analytics page, NFT activity feed, or onchain game backend runs into the same wall: raw chain data is messy, expensive to query, and painfully slow to turn into product-ready information.
That is exactly where SubQuery becomes useful. Instead of hitting RPC endpoints over and over, parsing events manually, and building your own indexing pipeline from scratch, SubQuery gives teams a way to transform blockchain data into something application-friendly: structured, queryable, and fast.
For founders and crypto builders, this matters for one simple reason. Speed of product development often depends less on smart contract complexity and more on how quickly you can make data accessible to your frontend, internal dashboards, and users. If your app cannot reliably answer questions like “what happened?”, “who owns what?”, or “which transactions matter?”, your user experience breaks down fast.
This guide walks through how to use SubQuery for indexing crypto data, where it fits in a modern crypto stack, how the workflow actually looks in practice, and when it is the right choice versus when it creates unnecessary complexity.
Why blockchain apps break when they rely only on RPC calls
At first, many teams assume they can query everything directly from a node or third-party RPC provider. That works for small prototypes. It stops working the moment your product needs historical views, filtered event streams, cross-contract relationships, or user-specific summaries.
Here is the underlying problem: blockchains are optimized for consensus and execution, not for rich application queries. Nodes can return blocks, logs, and transaction receipts, but they are not built to filter, join, or aggregate that data into answers to higher-level product questions.
For example:
- A DeFi app may need to display every liquidity action for a wallet across multiple pools.
- An NFT marketplace may need to show collection-level trading history and rarity-linked ownership changes.
- A gaming app may need to track player actions, asset states, and reward claims over time.
Trying to do that directly from RPC often leads to rate limits, inconsistent performance, duplicate processing, and backend code that becomes harder to maintain than the smart contracts themselves.
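To put rough numbers on that, here is a minimal sketch of the request count behind a naive history backfill. It assumes the provider caps each log query at a fixed block span; the 2,000-block cap and the function names are illustrative assumptions, not any specific provider's limits.

```typescript
// Split an inclusive block range into chunks no larger than maxSpan blocks.
// maxSpan is an assumed per-request limit; real providers differ.
function chunkBlockRange(from: number, to: number, maxSpan: number): Array<[number, number]> {
  if (maxSpan < 1 || to < from) return [];
  const chunks: Array<[number, number]> = [];
  for (let start = from; start <= to; start += maxSpan) {
    chunks.push([start, Math.min(start + maxSpan - 1, to)]);
  }
  return chunks;
}

// One log request per chunk, per contract being scanned.
function requestsNeeded(from: number, to: number, maxSpan: number, contracts: number): number {
  return chunkBlockRange(from, to, maxSpan).length * contracts;
}

// 3 contracts over blocks 1..1,000,000 at 2,000 blocks per request: 1500 requests
console.log(requestsNeeded(1, 1_000_000, 2_000, 3));
```

If that scan runs on every page load rather than once in an indexer, the same cost is paid by every user, which is where rate limits and slow pages come from.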
SubQuery solves this by letting you index blockchain events and state changes into a queryable data model. In practical terms, it acts as an ETL (extract, transform, load) layer between the chain and your application.
Where SubQuery fits in a modern crypto data stack
SubQuery is an open, flexible data indexing framework designed for Web3 applications. It lets developers define what onchain data they care about, how that data should be transformed, and how it should be stored for fast querying.
The basic pattern looks like this:
- Connect SubQuery to a supported blockchain network.
- Specify the smart contracts, events, or blocks you want to monitor.
- Write mapping logic that transforms raw chain data into structured entities.
- Store those entities in a database.
- Expose them through GraphQL APIs for your app or analytics layer.
That means your frontend no longer needs to reconstruct history in real time. Instead, it asks a purpose-built index questions such as “give me all swaps by this user,” “show all recent mints in this collection,” or “return daily snapshots of protocol TVL (total value locked).”
For builders, the big shift is this: you stop treating blockchain data as raw infrastructure and start treating it as application data.
The core building blocks that make SubQuery work
Your project manifest defines the indexing scope
Every SubQuery project starts by defining what should be indexed. This usually includes the network endpoint, the contracts or runtime modules being tracked, the events or calls of interest, and the mapping handlers that process them.
This is where you make the first strategic decision: index only what your product truly needs. Teams often over-index because they assume more data equals more flexibility later. In reality, extra indexing increases sync time, storage cost, and maintenance burden.
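As a rough illustration of what "indexing scope" means, the sketch below models the pieces listed above as plain TypeScript types and adds a small sanity check. The type names, fields, endpoint, and contract address are all hypothetical; this is not SubQuery's actual manifest format, which you should take from the official documentation.

```typescript
// Hypothetical shapes that mirror what the text says a manifest contains.
// NOT SubQuery's real manifest schema.
interface EventHandlerSpec {
  event: string;   // event name or signature to watch
  handler: string; // name of the mapping function that processes it
}

interface DataSourceSpec {
  contractAddress: string;
  startBlock: number;
  handlers: EventHandlerSpec[];
}

interface IndexingScope {
  endpoint: string;
  dataSources: DataSourceSpec[];
}

// Return a list of human-readable problems; an empty list means the scope looks sane.
function validateScope(scope: IndexingScope): string[] {
  const problems: string[] = [];
  if (scope.endpoint.trim() === "") problems.push("endpoint is empty");
  for (const ds of scope.dataSources) {
    if (!Number.isInteger(ds.startBlock) || ds.startBlock < 0) {
      problems.push(`${ds.contractAddress}: startBlock must be a non-negative integer`);
    }
    if (ds.handlers.length === 0) {
      problems.push(`${ds.contractAddress}: no handlers, so nothing would be indexed`);
    }
  }
  return problems;
}

// A deliberately narrow scope: one contract, one event.
const scope: IndexingScope = {
  endpoint: "https://rpc.example.invalid", // placeholder endpoint
  dataSources: [
    {
      contractAddress: "0xPoolContract", // placeholder address
      startBlock: 1_000_000,
      handlers: [{ event: "Swap", handler: "handleSwap" }],
    },
  ],
};

console.log(validateScope(scope)); // empty array: no problems found
```

Keeping the scope this explicit makes over-indexing visible: every extra data source or handler is a line someone had to justify.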
Your schema turns chain activity into product-ready entities
SubQuery uses a schema to define the data models your app will query. Think of this as the bridge between low-level blockchain mechanics and high-level application logic.
Instead of storing vague event payloads, you can model meaningful entities such as:
- User
- Swap
- PoolPosition
- NFTTransfer
- RewardClaim
- DailyVolumeSnapshot
This is one of the biggest advantages of using an indexing layer. You are not just collecting data; you are designing a usable data product.
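SubQuery schemas are written in GraphQL, but the modelling idea can be sketched in TypeScript. The fields below are assumptions chosen for illustration, and the `Map` is only a stand-in for a database table.

```typescript
// Hypothetical entity shapes for two of the models named above.
interface Swap {
  id: string;        // for example `${txHash}-${logIndex}`
  userId: string;    // wallet address
  poolId: string;
  amountIn: bigint;  // raw token units
  timestamp: number; // unix seconds
}

interface DailyVolumeSnapshot {
  id: string;   // `${poolId}-${day}`
  poolId: string;
  day: string;  // UTC date, YYYY-MM-DD
  volume: bigint;
}

// Convert a unix timestamp in seconds to a UTC calendar day.
function utcDay(timestampSeconds: number): string {
  return new Date(timestampSeconds * 1000).toISOString().slice(0, 10);
}

// Fold one swap into the snapshot for its pool and day, creating it if needed.
function addToSnapshot(
  snapshots: Map<string, DailyVolumeSnapshot>,
  swap: Swap
): DailyVolumeSnapshot {
  const day = utcDay(swap.timestamp);
  const id = `${swap.poolId}-${day}`;
  const current = snapshots.get(id) ?? { id, poolId: swap.poolId, day, volume: 0n };
  const updated = { ...current, volume: current.volume + swap.amountIn };
  snapshots.set(id, updated);
  return updated;
}
```

Because the snapshot is maintained at index time, a "volume by day" chart becomes a simple lookup instead of an aggregation over every swap.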
Mapping functions are where the business logic happens
When SubQuery processes a block, event, or transaction, it runs your mapping handlers. These functions take raw onchain input and transform it into entities that match your schema.
For example, if your contract emits a token purchase event, your mapping function might:
- Extract buyer address, token ID, and amount paid
- Create a purchase record
- Update total sales volume for the collection
- Recalculate buyer activity stats
This is where product thinking matters. Good mapping logic keeps your app queries simple. Bad mapping logic pushes complexity downstream and forces your frontend or API layer to do extra work later.
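The token purchase steps above can be sketched as a single handler. This is a simplified illustration, not SubQuery's handler API: the event shape, the entity types, and the in-memory `Store` class are all hypothetical stand-ins for what your project and its store would provide.

```typescript
// Hypothetical decoded event and entity shapes.
interface PurchaseEvent {
  txHash: string;
  logIndex: number;
  buyer: string;
  collection: string;
  tokenId: string;
  amountPaid: bigint; // raw token units
}
interface Purchase { id: string; buyer: string; collection: string; tokenId: string; amountPaid: bigint }
interface CollectionStats { id: string; totalVolume: bigint; saleCount: number }
interface BuyerStats { id: string; purchaseCount: number; totalSpent: bigint }

// In-memory stand-in for the indexer's entity store.
class Store {
  purchases = new Map<string, Purchase>();
  collections = new Map<string, CollectionStats>();
  buyers = new Map<string, BuyerStats>();
}

function handlePurchase(store: Store, ev: PurchaseEvent): void {
  // A deterministic ID makes reprocessing the same log a no-op.
  const id = `${ev.txHash}-${ev.logIndex}`;
  if (store.purchases.has(id)) return;

  // 1. Extract and normalize the fields the product cares about.
  const buyer = ev.buyer.toLowerCase();

  // 2. Create the purchase record.
  store.purchases.set(id, {
    id, buyer, collection: ev.collection, tokenId: ev.tokenId, amountPaid: ev.amountPaid,
  });

  // 3. Update total sales volume for the collection.
  const c = store.collections.get(ev.collection)
    ?? { id: ev.collection, totalVolume: 0n, saleCount: 0 };
  store.collections.set(ev.collection, {
    ...c, totalVolume: c.totalVolume + ev.amountPaid, saleCount: c.saleCount + 1,
  });

  // 4. Recalculate buyer activity stats.
  const b = store.buyers.get(buyer) ?? { id: buyer, purchaseCount: 0, totalSpent: 0n };
  store.buyers.set(buyer, {
    ...b, purchaseCount: b.purchaseCount + 1, totalSpent: b.totalSpent + ev.amountPaid,
  });
}
```

Doing the aggregation here is the design choice the paragraph argues for: the frontend reads `CollectionStats` directly instead of summing purchases on every request.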
How to set up a SubQuery project for crypto indexing
The exact setup depends on the chain you are indexing, but the workflow follows a predictable pattern.
1. Start with a clear data question
Before touching code, define the product question you need to answer. Not “we want indexing.” Something concrete, like:
- Show all staking actions for a wallet
- Track protocol lending volume by market
- Build a real-time transaction feed for a token
- Display ownership history for an NFT collection
This keeps your project focused and avoids building a data pipeline that is technically impressive but commercially irrelevant.
2. Initialize the project and configure the network
Set up your SubQuery project using the CLI and point it to the blockchain network you want to index. This involves selecting the chain endpoint and defining the starting block.
Choosing the right start block is more important than many teams realize. If you start too early, sync time gets unnecessarily long. If you start too late, you miss historical context. A sensible default is the block in which the contract was deployed, since no earlier block can contain its events. In production, founders should treat this as a product decision, not just an engineering detail.
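If the deployment block is not already known from a block explorer, one way to find it is a binary search over block height. The sketch below assumes that once the contract has code it keeps it (it was never destroyed) and that you can check historical state, for example by wrapping an `eth_getCode` call against an archive node. The function name is hypothetical, and the predicate is synchronous only to keep the sketch short; a real check would be asynchronous.

```typescript
// Find the first block at which hasCodeAt(block) is true, or null if the
// contract has no code even at the latest block.
// Assumes hasCodeAt is monotonic: false before deployment, true from then on.
function findDeploymentBlock(
  hasCodeAt: (block: number) => boolean,
  latest: number
): number | null {
  if (!hasCodeAt(latest)) return null;
  let lo = 0;
  let hi = latest; // invariant: hasCodeAt(hi) is true
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (hasCodeAt(mid)) {
      hi = mid;     // deployed at or before mid
    } else {
      lo = mid + 1; // deployed after mid
    }
  }
  return lo;
}

// Stand-in predicate for a contract deployed at block 1,234.
const deployedAt = 1_234;
console.log(findDeploymentBlock((block) => block >= deployedAt, 10_000)); // 1234
```

The search needs only about log2(latest) checks, so even on a chain with tens of millions of blocks it takes a few dozen calls.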
3. Define entities around user value, not contract structure
It is tempting to mirror the smart contract exactly in your schema. That is usually a mistake. Your app users do not care about internal contract mechanics; they care about outcomes.
For instance, if your contract has multiple low-level events that together describe a completed user action, your schema should probably capture the completed action itself. This makes your API cleaner and your frontend much easier to build.
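A minimal sketch of that idea follows, using three invented low-level events that together make up one trade. The event names and fields are hypothetical, and the sketch assumes at most one trade per transaction; a real contract may need a finer grouping key.

```typescript
// Hypothetical low-level events emitted within the same transaction.
type LowLevelEvent =
  | { kind: "OrderMatched"; txHash: string; buyer: string; seller: string }
  | { kind: "AssetTransferred"; txHash: string; tokenId: string }
  | { kind: "FeePaid"; txHash: string; feeWei: string }; // raw amount as a decimal string

// The user-facing entity: one row per completed trade.
interface CompletedTrade {
  id: string; // transaction hash
  buyer: string;
  seller: string;
  tokenId: string;
  feeWei: string;
}

function buildCompletedTrades(events: LowLevelEvent[]): CompletedTrade[] {
  // Collect the pieces of each trade, keyed by transaction hash.
  const drafts = new Map<string, Partial<CompletedTrade>>();
  for (const ev of events) {
    const draft: Partial<CompletedTrade> = drafts.get(ev.txHash) ?? { id: ev.txHash };
    switch (ev.kind) {
      case "OrderMatched":
        draft.buyer = ev.buyer;
        draft.seller = ev.seller;
        break;
      case "AssetTransferred":
        draft.tokenId = ev.tokenId;
        break;
      case "FeePaid":
        draft.feeWei = ev.feeWei;
        break;
    }
    drafts.set(ev.txHash, draft);
  }

  // Only emit trades for which every piece was observed.
  const trades: CompletedTrade[] = [];
  drafts.forEach((d) => {
    if (
      d.id !== undefined && d.buyer !== undefined && d.seller !== undefined &&
      d.tokenId !== undefined && d.feeWei !== undefined
    ) {
      trades.push({ id: d.id, buyer: d.buyer, seller: d.seller, tokenId: d.tokenId, feeWei: d.feeWei });
    }
  });
  return trades;
}
```

The frontend then queries `CompletedTrade` rows and never needs to know that three events were involved.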
4. Write handlers for the events that matter most
Focus first on the events that drive product functionality. If you are building a DEX analytics tool, that may be swaps, liquidity adds, and liquidity removals. If you are building a staking app, it may be delegate, undelegate, reward claim, and slash events.
Start narrow. Get a reliable index running. Then expand coverage if needed.
5. Run locally and validate against chain reality
One of the most common indexing mistakes is assuming your logic works because it compiles. In reality, you need to compare your indexed output with actual contract activity and block explorer records.
Test edge cases like:
- Failed transactions
- Contract upgrades
- Reorg-sensitive events
- Duplicate logs
- Large-volume wallets
Indexing is not only about correctness in theory. It is about correctness under production conditions.
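One simple way to run that comparison is to reconcile the log identifiers your index stored against a reference list, such as an export from a block explorer. The sketch below is a minimal illustration with hypothetical names; it assumes every log can be identified by its transaction hash and log index.

```typescript
interface LogRef {
  txHash: string;
  logIndex: number;
}

// Stable key for a log; hashes are lowercased so casing differences do not matter.
const logKey = (l: LogRef): string => `${l.txHash.toLowerCase()}-${l.logIndex}`;

interface ReconcileReport {
  missing: string[];    // in the reference, absent from the index
  unexpected: string[]; // in the index, absent from the reference
  duplicates: string[]; // indexed more than once
}

function reconcile(indexed: LogRef[], reference: LogRef[]): ReconcileReport {
  const seen = new Set<string>();
  const duplicates: string[] = [];
  for (const l of indexed) {
    const k = logKey(l);
    if (seen.has(k)) {
      duplicates.push(k);
    } else {
      seen.add(k);
    }
  }
  const expected = new Set<string>(reference.map(logKey));
  const missing = Array.from(expected).filter((k) => !seen.has(k));
  const unexpected = Array.from(seen).filter((k) => !expected.has(k));
  return { missing, unexpected, duplicates };
}
```

An empty report for a sampled block range is decent evidence that the handlers match chain reality; any non-empty field points at a specific log to investigate.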
6. Deploy and connect your app through GraphQL
Once your project is syncing reliably, deploy it and expose the indexed data through GraphQL. This is where the payoff shows up. Your frontend or backend can now query a clean, fast interface instead of reconstructing data from RPC responses.
For startup teams, this often shortens development cycles significantly. Product engineers can work with stable query patterns instead of chain-specific parsing logic every time they need a new feature.
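For a sense of what the consuming side can look like, here is a hedged sketch of a client helper for a "swaps by user" query. The query shape (a `swaps` field with `filter`, `first`, `orderBy`, and `nodes`) and all field names are assumptions for illustration; the real names depend on your schema and on the query service you deploy, so check the generated API before relying on them. The HTTP call is injected as a function so the sketch stays independent of any particular client library.

```typescript
interface GraphQLRequest {
  query: string;
  variables: Record<string, unknown>;
}

// Assumed query shape; adjust field and argument names to your generated API.
const SWAPS_BY_USER = `
  query SwapsByUser($user: String!, $first: Int!) {
    swaps(filter: { userId: { equalTo: $user } }, first: $first, orderBy: TIMESTAMP_DESC) {
      nodes { id poolId amountIn timestamp }
    }
  }
`;

// Build the request body for one wallet, using variables instead of string interpolation.
function buildSwapsRequest(wallet: string, first: number = 20): GraphQLRequest {
  if (!Number.isInteger(first) || first < 1) {
    throw new RangeError("first must be a positive integer");
  }
  return { query: SWAPS_BY_USER, variables: { user: wallet.toLowerCase(), first } };
}

// Hypothetical transport: posts a JSON body and resolves with the parsed response.
type PostJson = (url: string, body: string) => Promise<{ data?: unknown; errors?: unknown[] }>;

async function fetchSwaps(post: PostJson, endpoint: string, wallet: string): Promise<unknown> {
  const res = await post(endpoint, JSON.stringify(buildSwapsRequest(wallet)));
  if (res.errors && res.errors.length > 0) {
    throw new Error("GraphQL query returned errors");
  }
  return res.data;
}
```

Passing the wallet as a variable rather than splicing it into the query string keeps the query text constant, which is easier to cache and avoids injection mistakes.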
A practical example: indexing NFT transfer activity
Let’s say you are building an NFT portfolio tracker. Your users want to see:
- Current ownership
- Transfer history
- Mint events
- Wallet-level activity across a collection
Using SubQuery, your workflow might look like this:
- Track the NFT contract’s Transfer events
- Create entities for Token, Owner, and TransferRecord
- Update current token owner whenever a transfer happens
- Flag mint events when the sender is the zero address
- Store timestamps, transaction hashes, and collection metadata for queryable history
Now your app can ask questions like:
- Which tokens does this wallet currently own?
- What is the transfer history of token #482?
- How many mints happened in the last 24 hours?
- Which wallets are most active in this collection?
That is a much better developer and user experience than pulling logs manually every time a page loads.
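The workflow above can be sketched in a few dozen lines. This is an in-memory illustration, not SubQuery's API: the `NftIndex` class, the event shape, and the entity fields are hypothetical, and a real project would persist these entities through the indexer's store and answer the questions through GraphQL.

```typescript
const ZERO_ADDRESS = "0x" + "0".repeat(40);

// Hypothetical decoded Transfer event plus block metadata.
interface TransferEvent {
  txHash: string; logIndex: number;
  from: string; to: string; tokenId: string;
  timestamp: number; // unix seconds
}
interface Token { id: string; ownerId: string }
interface Owner { id: string; transferCount: number }
interface TransferRecord {
  id: string; tokenId: string; from: string; to: string;
  isMint: boolean; timestamp: number; txHash: string;
}

class NftIndex {
  readonly tokens = new Map<string, Token>();
  readonly owners = new Map<string, Owner>();
  readonly transfers = new Map<string, TransferRecord>();

  handleTransfer(ev: TransferEvent): void {
    const id = `${ev.txHash}-${ev.logIndex}`;
    if (this.transfers.has(id)) return; // ignore duplicate deliveries
    const from = ev.from.toLowerCase();
    const to = ev.to.toLowerCase();
    const isMint = from === ZERO_ADDRESS; // a mint is a transfer from the zero address
    this.transfers.set(id, {
      id, tokenId: ev.tokenId, from, to, isMint, timestamp: ev.timestamp, txHash: ev.txHash,
    });
    this.tokens.set(ev.tokenId, { id: ev.tokenId, ownerId: to }); // current owner
    const parties = from === to ? [from] : [from, to];
    for (const wallet of parties) {
      if (wallet === ZERO_ADDRESS) continue;
      const owner = this.owners.get(wallet) ?? { id: wallet, transferCount: 0 };
      this.owners.set(wallet, { ...owner, transferCount: owner.transferCount + 1 });
    }
  }

  tokensOwnedBy(wallet: string): string[] {
    const w = wallet.toLowerCase();
    return Array.from(this.tokens.values()).filter((t) => t.ownerId === w).map((t) => t.id);
  }

  historyOf(tokenId: string): TransferRecord[] {
    return Array.from(this.transfers.values())
      .filter((r) => r.tokenId === tokenId)
      .sort((a, b) => a.timestamp - b.timestamp);
  }

  mintsSince(timestamp: number): number {
    return Array.from(this.transfers.values())
      .filter((r) => r.isMint && r.timestamp >= timestamp).length;
  }
}
```

Each query method maps to one of the questions above; the "most active wallets" question would read the `transferCount` kept on each `Owner`.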
Where SubQuery is especially strong for startups
SubQuery is most valuable when your product depends on repeatable, structured access to onchain data. That includes analytics products, wallets, portfolio apps, DeFi dashboards, DAO tooling, gaming backends, and any app where history matters as much as current state.
It is also useful when your team wants more control than a fully managed analytics platform can provide. If your data model is unique to your app, a custom indexer often becomes a strategic advantage.
Another strength is speed of iteration. Once the indexing layer is designed properly, shipping product features becomes easier because the data is already shaped around user needs.
Where SubQuery can become the wrong tool
SubQuery is not automatically the best choice for every crypto product.
If your app only needs a handful of simple real-time reads from a smart contract, direct RPC or contract calls may be enough. Adding an indexer too early can introduce operational complexity before you actually need it.
It can also be overkill for teams that lack backend ownership. Indexing pipelines require monitoring, schema changes, reindexing decisions, and care around data consistency. Founders should be honest about whether their team can support that operationally.
Another trade-off is that indexed data is only as good as its design. A rushed schema or weak mapping layer can leave you with technical debt that is harder to fix than the contracts themselves. Reindexing large datasets is possible, but it is not free.
Expert Insight from Ali Hajimohamadi
Founders often think of indexing as a developer convenience. That is too narrow. In a serious crypto product, indexing is part of your data strategy. It determines how quickly you can ship analytics, personalize user experiences, detect valuable patterns, and build retention loops around onchain behavior.
The strongest strategic use case for SubQuery is when onchain activity is central to your product experience, but raw blockchain infrastructure is not where you want your application team spending time. In that scenario, SubQuery helps you create a clean separation between chain ingestion and product logic.
I would strongly consider using it if:
- Your startup depends on historical blockchain data, not just current contract state
- You need app-friendly APIs for wallets, dashboards, feeds, or analytics
- Your frontend team is moving slowly because data access is too fragmented
- You expect product iteration around data views, user segmentation, or activity insights
I would avoid it, or at least delay it, if:
- You are still validating whether users care about the product at all
- Your onchain reads are simple and low-frequency
- You do not yet have someone who can own backend data quality
- You are indexing broad datasets without a clear product question
A common founder mistake is assuming “more indexed data” creates defensibility. Usually it just creates cost. Defensibility comes from how you model, interpret, and apply data to product decisions. Another misconception is that indexers are purely technical plumbing. In reality, they shape what metrics you can trust, what user behaviors you can detect, and how quickly you can make strategic decisions.
If I were advising an early-stage Web3 startup, I would say this: build the smallest indexing layer that unlocks a better product experience. Do not aim for a perfect blockchain data warehouse on day one. Aim for a fast, reliable path from chain events to user value.
The real trade-off: flexibility versus operational overhead
SubQuery gives teams control, and control is powerful. But control always comes with responsibility. You need to think about sync speed, database performance, schema migrations, and how your index reacts when contracts evolve.
This makes SubQuery a good fit for teams that view data infrastructure as part of the product. It is a weaker fit for teams looking for a completely hands-off analytics solution.
The right question is not “Can SubQuery index this?” It usually can. The better question is: Will owning this indexing layer help us move faster where it matters?
Key Takeaways
- SubQuery helps turn raw blockchain events into structured, queryable application data.
- It is especially useful for wallets, DeFi dashboards, NFT products, gaming backends, and analytics-heavy crypto apps.
- The most important step is defining the product question before designing the index.
- A strong schema should reflect user value, not just raw contract structure.
- Mapping functions are where your application logic and data quality are won or lost.
- SubQuery is powerful, but it adds operational overhead and should not be adopted blindly.
- For startups, the best approach is to build a narrow indexing layer first, then expand based on product traction.
SubQuery at a glance
| Category | Summary |
|---|---|
| Primary role | Indexes blockchain data and exposes it through queryable APIs |
| Best for | Crypto apps needing historical, structured, app-friendly onchain data |
| Typical users | Founders, Web3 developers, backend teams, analytics builders |
| Main advantage | Reduces reliance on raw RPC calls and simplifies frontend/backend data access |
| Core components | Manifest, schema, mapping handlers, database, GraphQL API |
| Strong use cases | DeFi analytics, NFT history, wallet activity feeds, staking dashboards, gaming state tracking |
| Main trade-off | Operational complexity, maintenance, and the need for thoughtful schema design |
| When to avoid | Very early validation stages or apps with only simple, low-volume contract reads |