Do You Actually Need a Time Series Database?
Time series databases have gained significant mindshare over the past few years. InfluxDB, TimescaleDB, Prometheus, and others promise optimized storage and queries for time-stamped data. The question is whether you actually need a specialized database, or whether a traditional relational database would serve you just as well.
Time series data is characterized by being timestamped, append-mostly, and queried with time-based ranges and aggregations. Sensor readings, application metrics, stock prices, server logs—these all fit the time series pattern.
But just because your data has timestamps doesn’t automatically mean you need a specialized database. Regular databases handle timestamped data perfectly well for many use cases.
Let’s start with scale. If you’re collecting a few hundred or even a few thousand data points per second, PostgreSQL or MySQL can handle that without breaking a sweat. A properly indexed timestamp column performs well for range queries. Aggregation queries work fine with standard SQL functions.
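As a minimal sketch of what that looks like in practice (using SQLite as a stand-in for PostgreSQL or MySQL; the table and column names are hypothetical, but the SQL is standard):

```python
import sqlite3
from datetime import datetime, timedelta

# In-memory SQLite stands in for PostgreSQL/MySQL here; the SQL is
# standard. Table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE metrics (
        ts    TEXT NOT NULL,   -- ISO-8601 timestamp
        name  TEXT NOT NULL,
        value REAL NOT NULL
    )
""")
# A B-tree index on the timestamp column makes range scans cheap.
conn.execute("CREATE INDEX idx_metrics_ts ON metrics (ts)")

base = datetime(2024, 1, 1)
rows = [((base + timedelta(seconds=i)).isoformat(), "cpu_load", i % 10)
        for i in range(3600)]
conn.executemany("INSERT INTO metrics VALUES (?, ?, ?)", rows)

# A typical time-range aggregation: average over one 10-minute window.
(avg,) = conn.execute(
    "SELECT AVG(value) FROM metrics WHERE name = ? AND ts >= ? AND ts < ?",
    ("cpu_load", base.isoformat(), (base + timedelta(minutes=10)).isoformat()),
).fetchone()
print(f"10-minute average: {avg:.2f}")
```

Nothing exotic is needed: an ordinary index plus standard aggregate functions covers this workload at moderate scale.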
The tipping point is usually when you hit tens of thousands to hundreds of thousands of data points per second, sustained over long periods. At that scale, the write amplification from indexing and the query planning overhead of general-purpose databases start becoming problematic.
Time series databases optimize for this high-throughput append-heavy workload. They use compression techniques specific to time series data patterns, store data in time-ordered chunks, and skip traditional indexing in favor of structures optimized for time-range queries.
Data retention is another consideration. If you’re keeping metrics for a few days or weeks, a regular database is fine. If you’re storing years of high-resolution metrics, the storage efficiency of time series databases becomes valuable.
InfluxDB, for example, can achieve 10x or better compression compared to storing the same data in PostgreSQL. That’s not because PostgreSQL is bad at compression—it’s because time series databases use compression algorithms specifically designed for the patterns in timestamped metrics.
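A simplified sketch of why those algorithms work so well: regularly spaced timestamps have a nearly constant delta, so the delta-of-delta stream is almost all zeros, which bit-packs into a couple of bits per point. This illustrates the idea behind codecs like the one in Facebook's Gorilla paper; it is not InfluxDB's actual implementation.

```python
# Regularly spaced timestamps compress extremely well because the
# delta-of-delta stream is almost all zeros. Real engines bit-pack
# these; here we just count the zeros to show the pattern.
timestamps = [1_700_000_000 + 10 * i for i in range(1000)]  # one point / 10 s

deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
delta_of_deltas = [b - a for a, b in zip(deltas, deltas[1:])]

zeros = sum(1 for d in delta_of_deltas if d == 0)
print(f"{zeros}/{len(delta_of_deltas)} delta-of-deltas are zero")
```

General-purpose row or page compression cannot exploit this structure, which is where most of the 10x gap comes from.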
Query patterns matter. If your queries are mostly “give me the average value of this metric over the last hour” or “show me the 95th percentile response time for the past day,” time series databases optimize exactly for those patterns.
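A percentile query of that shape can be expressed in plain SQL too. A hedged sketch (SQLite has no percentile built-in, so this uses ORDER BY with an OFFSET; PostgreSQL would use `percentile_cont(0.95)` instead; the schema is hypothetical):

```python
import sqlite3

# 95th-percentile response time via ordered OFFSET -- a workable
# stand-in where the database lacks a percentile aggregate.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (ts INTEGER, ms REAL)")
conn.executemany("INSERT INTO responses VALUES (?, ?)",
                 [(i, float(i % 100)) for i in range(1000)])

(p95,) = conn.execute("""
    SELECT ms FROM responses
    ORDER BY ms
    LIMIT 1
    OFFSET (SELECT CAST(COUNT(*) * 0.95 AS INTEGER) - 1 FROM responses)
""").fetchone()
print(f"p95 latency: {p95} ms")
```

The difference is not expressiveness but efficiency: a time series engine answers this from pre-sorted, time-partitioned chunks rather than a full sort.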
But if you’re doing complex joins across multiple tables, or if you need full ACID transactions, or if you’re running ad-hoc analytical queries that don’t follow time-series patterns, a relational database is often better.
TimescaleDB is interesting because it’s built as a PostgreSQL extension. You get time-series optimizations while retaining PostgreSQL’s query capabilities and ecosystem. It’s a middle ground that works well when you need both time series performance and relational database features.
Operational complexity is a real cost. Introducing a time series database means another system to deploy, monitor, backup, and maintain. If your existing relational database meets your performance needs, adding another database increases operational burden without clear benefit.
This is particularly true for smaller teams. If you have one person managing databases, doubling the number of database systems they need to understand and maintain is a significant cost.
Ecosystem and tooling also matter. Prometheus has become the de facto standard for infrastructure monitoring largely because of its ecosystem. Grafana integration, alert manager, service discovery, and the broader cloud-native tooling all expect Prometheus.
If you’re building in that ecosystem, using Prometheus is the path of least resistance regardless of whether your scale technically requires it: the integration benefits usually outweigh the cost of running another specialized system.
Conversely, if your application already uses PostgreSQL and you have strong PostgreSQL expertise on your team, storing metrics in PostgreSQL (possibly with TimescaleDB) might be simpler than introducing InfluxDB or Prometheus.
Data model flexibility is worth considering. Relational databases let you easily join time series data with other tables. If your metrics need to be correlated with user accounts, or product catalogs, or any other relational data, doing that in PostgreSQL is straightforward. Doing it across separate systems requires application-level joins, which is messier.
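A sketch of that relational advantage, again with SQLite standing in and a hypothetical schema:

```python
import sqlite3

# Correlating metrics with business data in a single SQL join --
# trivial in one database, an application-level join across two
# systems otherwise. Schema and data are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, plan TEXT);
    CREATE TABLE api_calls (ts INTEGER, user_id INTEGER, latency_ms REAL);

    INSERT INTO users VALUES (1, 'free'), (2, 'pro');
    INSERT INTO api_calls VALUES
        (100, 1, 120.0), (101, 1, 80.0),
        (100, 2, 40.0),  (102, 2, 60.0);
""")

# Average latency per billing plan, in one query.
for plan, avg_ms in conn.execute("""
    SELECT u.plan, AVG(c.latency_ms)
    FROM api_calls c JOIN users u ON u.id = c.user_id
    GROUP BY u.plan
"""):
    print(plan, avg_ms)
```

With metrics in InfluxDB and users in PostgreSQL, the same question means two queries and a merge in application code.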
Time series databases generally have limited data modeling capabilities. You have tags and fields, which map roughly to indexed and non-indexed columns, but you don’t have foreign keys, complex constraints, or the full relational model.
Downsampling and retention policies are areas where time series databases shine. InfluxDB and Prometheus have built-in support for automatically aggregating high-resolution data into lower-resolution summaries and deleting old data based on age.
You can implement this in a relational database with scheduled jobs and manual aggregation queries, but it’s more work and more opportunity for mistakes. If you need sophisticated retention and downsampling, that’s a point in favor of specialized databases.
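A minimal sketch of such a scheduled job, with illustrative table names and retention window (you would run something like this from cron or pg_cron):

```python
import sqlite3

# Roll raw points up into hourly averages, then delete raw rows older
# than a cutoff. Table names and the retention window are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE metrics_raw (ts INTEGER, value REAL);        -- unix seconds
    CREATE TABLE metrics_hourly (hour INTEGER, avg_value REAL);
""")
conn.executemany("INSERT INTO metrics_raw VALUES (?, ?)",
                 [(i * 60, float(i)) for i in range(240)])    # 4 h of minutes

cutoff = 2 * 3600  # keep the last two hours of raw data

# Roll up everything older than the cutoff into hourly buckets...
conn.execute("""
    INSERT INTO metrics_hourly
    SELECT ts / 3600 AS hour, AVG(value)
    FROM metrics_raw WHERE ts < ? GROUP BY hour
""", (cutoff,))
# ...then drop the raw rows that were summarized.
conn.execute("DELETE FROM metrics_raw WHERE ts < ?", (cutoff,))
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM metrics_raw").fetchone()[0])     # 120
print(conn.execute("SELECT COUNT(*) FROM metrics_hourly").fetchone()[0])  # 2
```

The logic is simple, but you now own its scheduling, failure handling, and the edge cases around partially-summarized windows, all of which a TSDB's retention policies handle for you.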
Analytics and machine learning workloads have different requirements. If you’re doing real-time anomaly detection on metrics streams, some time series databases offer built-in functions for that. But if you’re doing complex statistical analysis or training ML models, exporting to a data warehouse or analytical database might be necessary anyway.
Cost is often overlooked. Managed time series database services can be expensive at scale. Running your own InfluxDB or Prometheus cluster requires infrastructure and operational effort. If your PostgreSQL instance already exists and has spare capacity, using it for time series data has minimal incremental cost.
Query language is another consideration. InfluxQL and PromQL are powerful for time series queries but have learning curves. If your team already knows SQL, staying within SQL might be more productive than learning another query language.
Integration with existing monitoring and alerting systems matters. If you already use a monitoring platform that expects data in a specific format, you might need the database that integrates naturally with that platform.
Write durability requirements vary. Prometheus explicitly trades durability for performance—it’s designed for monitoring where losing a few seconds of metrics during a crash is acceptable. If you need guaranteed durability for every data point, you need a different solution.
For financial data, sensor readings in industrial systems, or other scenarios where data loss is unacceptable, you need proper write-ahead logging and replication. Not all time series databases provide the same guarantees.
Cardinality is a classic pitfall with time series databases. If you have metrics with highly variable tag values (like user IDs or session IDs), cardinality explodes and performance degrades. This is a known issue with Prometheus and requires careful data modeling to avoid.
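The arithmetic behind the explosion is worth seeing. Each distinct combination of tag values is a separate series the database must track, so one unbounded tag multiplies the series count by its cardinality. The numbers below are hypothetical but typical:

```python
# Each distinct tag-value combination is its own series.
tags = {
    "host":     200,  # bounded set: fine
    "endpoint": 50,   # bounded set: fine
    "status":   5,    # bounded set: fine
}
series = 1
for n in tags.values():
    series *= n
print(f"bounded tags: {series:,} series")

# Add one unbounded tag and the count multiplies by its cardinality.
series_with_user_id = series * 1_000_000  # e.g. a user_id tag
print(f"with user_id: {series_with_user_id:,} series")
```

This is why the standard advice is to keep identifiers like user or session IDs out of tags/labels and in fields or logs instead.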
Relational databases handle high cardinality better in some ways because they’re designed for arbitrary indexed columns. But they perform worse at the high write rates that often accompany high cardinality time series data.
Real-time aggregation requirements influence the decision. If you need to query real-time moving averages, percentiles, or other streaming aggregations, some time series databases offer this natively. Implementing it efficiently in a relational database requires materialized views or application-level state management.
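For the moving-average case, standard SQL window functions get you reasonably far before you need streaming machinery. A sketch (SQLite 3.25+ supports window functions; the schema is hypothetical):

```python
import sqlite3  # window functions need SQLite >= 3.25

# A 3-point moving average in plain SQL via a window function --
# the kind of rolling aggregation some TSDBs provide natively.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (ts INTEGER, value REAL)")
conn.executemany("INSERT INTO metrics VALUES (?, ?)",
                 [(i, float(i)) for i in range(10)])

rows = conn.execute("""
    SELECT ts,
           AVG(value) OVER (
               ORDER BY ts
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS moving_avg
    FROM metrics
""").fetchall()
print(rows[-1])  # (9, 8.0): average of values 7, 8, 9
```

This recomputes on every query, though; truly continuous aggregation over a live stream is where materialized views or native TSDB support earn their keep.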
Multi-datacenter replication is simpler with some relational databases than with time series databases. If you need to replicate metrics across regions, check whether your chosen time series database supports that use case well.
The honest answer is that many applications use time series databases when they don’t strictly need them, and many applications use relational databases for time series data when a specialized database would be better.
If you’re collecting high volumes of metrics, need efficient compression and retention, and your queries follow typical time series patterns, a specialized database makes sense. If your volumes are moderate, you need complex queries or joins, and you want to minimize operational complexity, stick with your relational database.
There’s no hard cutoff point. It’s a continuum based on your specific requirements, scale, query patterns, operational capacity, and existing infrastructure. Start with what you know and what you already have. Migrate to specialized databases when you have concrete evidence that you need them, not based on what seems trendy.
The database choice should follow from requirements, not the other way around.