The data industry has a standard playbook: before you can use AI, you need to modernize. Migrate to a cloud data warehouse. Set up an ETL pipeline. Build a transformation layer. Create a semantic model. Then — and only then — can you start asking questions.
This playbook costs $2M and takes 18 months. And 73% of these projects overrun on both budget and timeline.
We think the playbook is wrong.
## The modernization industrial complex
There's a $47B annual market in data modernization. Snowflake, Databricks, Fivetran, dbt, Looker, Sigma — each tool is excellent at what it does. Together, they form a stack that requires dedicated data engineers to build and maintain.
For companies with 50+ engineers and $100M+ in revenue, this makes sense. The investment pays for itself in operational efficiency and self-serve analytics.
But for a SaaS company at $10M ARR with 15 engineers? Spending $200K+ and 6 months on data infrastructure — before getting a single insight — is not a reasonable trade-off. Especially when the insights they need (accurate MRR, churn prediction, entity deduplication) are well-defined and could be delivered in days.
## The 80% who can't use AI
McKinsey reported that 80% of companies that want to use AI on operational data can't — because their data isn't "ready." But what does "ready" mean?
Usually, it means the data isn't in one place. It's scattered across Stripe, Postgres, HubSpot, Zendesk, and a dozen other tools. The traditional answer is to centralize it. Our answer is: don't.
## Virtual integration vs. physical migration
Physical migration means copying all your data into a warehouse, transforming it, and querying the warehouse. Virtual integration means connecting to your data where it lives and computing what you need on the fly.
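The difference can be sketched in a few lines. This is a toy illustration, not a real connector: the `fetch_*` functions below are stand-ins for read-only calls to Stripe and Postgres, and the join happens in memory at query time instead of inside a warehouse copy.

```python
# Toy sketch of virtual integration: query each source where it lives
# and join on the fly. Nothing is copied or persisted.

def fetch_stripe_subscriptions():
    # Stand-in for a read-only Stripe API call (illustrative data).
    return [
        {"customer": "acme", "plan": "annual", "amount": 12000},
        {"customer": "globex", "plan": "monthly", "amount": 500},
    ]

def fetch_postgres_usage():
    # Stand-in for a read-only query against a production replica.
    return {"acme": 42, "globex": 3}

def customer_view():
    """Join billing and usage in memory at request time."""
    usage = fetch_postgres_usage()
    return [
        {**sub, "active_users": usage.get(sub["customer"], 0)}
        for sub in fetch_stripe_subscriptions()
    ]
```

The point is the shape, not the code: the "warehouse" here is a transient in-memory join, recomputed from live sources each time it's asked for.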
The trade-offs are real:
| | Physical migration | Virtual integration |
|---|---|---|
| **Setup time** | 3–6 months | Hours |
| **Cost** | $100K–500K/year | $500–5K/month |
| **Data freshness** | Minutes to hours (ETL lag) | Real-time |
| **Flexibility** | Very high (arbitrary SQL) | Focused (predefined metrics) |
| **Maintenance** | Ongoing engineering effort | Managed |
Virtual integration is worse for exploratory analytics ("let me write arbitrary SQL against all my data"). But it's better for operational intelligence ("tell me my MRR, flag anomalies, predict churn") — which is what 90% of SaaS companies actually need.
## What you actually need
Most SaaS companies between $2M and $20M ARR need exactly five things from their data:
1. **Accurate MRR** — including proper normalization of annual plans, multi-currency handling, and deduplication
2. **Churn analysis** — not just "who churned" but "why" and "who's likely to churn next"
3. **Customer 360** — one view that combines billing, product usage, and CRM data
4. **Anomaly detection** — automatic alerts when metrics deviate from baseline
5. **Daily brief** — a Slack message every morning with the key numbers and any flags
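To make the first item concrete, here is a minimal sketch of MRR normalization. The FX rates and subscription fields are illustrative assumptions, not real data or a real rate feed:

```python
# Hedged sketch of MRR normalization: annual plans contribute 1/12 of
# their amount per month, and non-USD amounts are converted at assumed
# illustrative rates.

ASSUMED_FX_TO_USD = {"USD": 1.0, "EUR": 1.08}  # illustrative only

def monthly_usd(amount, currency, interval):
    usd = amount * ASSUMED_FX_TO_USD[currency]
    return usd / 12 if interval == "annual" else usd

def mrr(subscriptions):
    return sum(
        monthly_usd(s["amount"], s["currency"], s["interval"])
        for s in subscriptions
    )

subs = [
    {"amount": 1200, "currency": "USD", "interval": "annual"},
    {"amount": 50, "currency": "EUR", "interval": "monthly"},
]
```

Here `mrr(subs)` yields 154.0: the $1,200 annual plan normalizes to $100/month, and the €50 monthly plan converts to $54.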
None of these require a data warehouse. They require read-only access to your existing systems, entity resolution to connect records across sources, and a metrics engine that knows how to compute SaaS KPIs.
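The entity-resolution step can also be sketched simply. This toy version assumes email domain is a good-enough join key between billing and CRM records; real systems use fuzzier matching on names, addresses, and identifiers:

```python
# Toy entity resolution: link billing and CRM records that refer to
# the same company by a normalized key (here, email domain).

def domain(email):
    return email.split("@")[-1].lower()

def resolve(stripe_rows, crm_rows):
    """Attach the matching CRM record (or None) to each billing row."""
    crm_by_domain = {domain(r["email"]): r for r in crm_rows}
    return [
        {"billing": s, "crm": crm_by_domain.get(domain(s["email"]))}
        for s in stripe_rows
    ]
```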
## The alternative path
Here's what the non-warehouse path looks like: read-only connections to the systems you already run, entity resolution across them, and metrics computed on demand. No warehouse. No ETL. No dbt models. No 6-month project plan.
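Even the anomaly-detection piece of this path can start simply: flag a metric when it deviates from its trailing baseline. A hedged sketch, with an illustrative threshold:

```python
# Baseline anomaly detection sketch: flag today's value if it sits
# more than k standard deviations from the trailing mean. The k=3
# threshold is illustrative, not a recommendation.

from statistics import mean, stdev

def is_anomaly(history, today, k=3.0):
    """history: recent daily values; today: the value to check."""
    mu, sigma = mean(history), stdev(history)
    return sigma > 0 and abs(today - mu) > k * sigma
```

A function like this, run daily against each KPI, is enough to power the Slack alerts described above; no warehouse sits between the source systems and the check.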
## When you do need a warehouse
To be clear: data warehouses are the right choice for companies that need ad-hoc analytics across dozens of data sources, have dedicated data teams, and want to build custom dashboards and models.
If you have a VP of Data Engineering and 3+ analysts, modernize away. The tooling is excellent and the long-term benefits are real.
But if you're a CTO at a $10M ARR SaaS company trying to understand why churn spiked last month, and the alternative is a 6-month warehouse project or a 3-hour Vesh AI pilot — the choice is straightforward.
## The real insight
The modernization trap isn't that warehouses are bad. They're great. The trap is believing you need one before you can get any value from your data. You don't. You need the right questions, the right connections, and an engine that can compute answers from the data you already have.