There is a specific kind of engineering regret that does not show up in the postmortem. It does not come from a bad hire, a botched launch, or a product bet that missed. It comes from a decision made in week one that nobody wrote down, defended in year one by everyone who touched it, and then quietly absorbed six person-months of engineering time in year three while the roadmap sat still.
The pattern is so consistent it almost has a grammar. A team picks Auth0 because the docs are excellent and the free tier covers them for now. Or they pick Algolia because the search results are genuinely good on day one and the integration takes an afternoon. Or they pick a transactional email provider because it was the first result and the API looked clean. None of these is a bad decision in isolation. The problem is that none of them was made with any attention to what it would cost to leave.
Why year one feels fine and year three does not
Joel Spolsky’s 2002 essay “Fire and Motion” is partly about competitive strategy, and one of its points applies here: a competitor’s steady stream of change works as cover fire, keeping you busy reacting so that you cannot advance. Vendor lock-in works the same way. It does not hurt you while you are small. It hurts you the moment you need to move.
In year one, the team is small, the data is thin, and the vendor’s defaults fit. Auth0’s user management is fine for ten thousand users. Algolia’s index is fast and the relevance tuning is good enough. SendGrid handles transactional email without complaint. The DX is the whole point: someone evaluated these tools on how quickly they could get to working code, and they succeeded.
By year three, the conditions that made the choice easy have changed. The user count is higher, which means the Auth0 invoice has climbed through several tiers. The search index has grown to a size where Algolia’s per-record and per-search pricing is a line item that finance has started asking about. The email volume has moved from negligible to meaningful. And somewhere in the stack, a product requirement has arrived that the vendor either cannot support or can only support on a higher plan that resets the economics entirely.
The team now faces a choice: pay the new price, work around the limitation, or migrate. All three options consume engineering time. The third option consumes the most, but it is often the one that makes the most sense over a three-to-five year horizon. And it is almost always harder than anyone estimates.
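One rough way to compare the three options is to put them on the same horizon and count both the vendor bill and the engineering time. The sketch below does that in Python. Every figure in it (the loaded cost of a person-month, the monthly bills, the migration estimate) is a hypothetical placeholder, and the names Option, total_cost, and cheapest are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    upfront_person_months: float               # one-time engineering effort
    monthly_vendor_cost: float                 # recurring bill in dollars
    monthly_upkeep_person_months: float = 0.0  # ongoing engineering drag


def total_cost(option: Option, horizon_months: int, person_month_cost: float) -> float:
    """Dollars spent on this option over the whole horizon."""
    person_months = (
        option.upfront_person_months
        + option.monthly_upkeep_person_months * horizon_months
    )
    return person_months * person_month_cost + option.monthly_vendor_cost * horizon_months


def cheapest(options: list[Option], horizon_months: int, person_month_cost: float) -> Option:
    """Return the option with the lowest total cost over the horizon."""
    return min(options, key=lambda o: total_cost(o, horizon_months, person_month_cost))


# Every figure below is a hypothetical placeholder, not vendor data.
PERSON_MONTH_COST = 15_000.0
OPTIONS = [
    Option("pay the new price", 0.0, 6_000.0),
    Option("work around the limitation", 1.5, 3_000.0, 0.1),
    Option("migrate", 6.0, 500.0, 0.05),
]

for horizon in (12, 36, 60):
    best = cheapest(OPTIONS, horizon, PERSON_MONTH_COST)
    print(f"{horizon} months: {best.name}")
```

With these placeholder inputs, paying the new price is cheapest over 12 months and migrating is cheapest over 36 and 60. The point is not the specific numbers but that the ranking flips with the horizon, which is why the estimate is worth writing down before the decision is forced.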
What public migration data shows
The honest version of this conversation requires specific examples. The most useful ones are the migrations companies have written about themselves — not because the published account is complete (it almost never is) but because it lets you see the shape of the work and the period of time it consumes.
Notion’s Postgres sharding migration. Notion’s October 2021 engineering post, “Herding elephants: lessons learned from sharding Postgres at Notion,” describes a multi-year effort. By mid-2020 their five-year-old Postgres monolith was hitting hard limits: engineers woken by CPU spikes, VACUUM stalls threatening transaction-ID wraparound, schema migrations becoming unsafe. They sharded application-side into 480 logical shards across 32 physical Postgres databases (15 logical shards per database), using dual-write periods and dark reads to verify correctness before cutover. The underlying mistake was not picking Postgres; it was picking a monolithic database without planning the sharding strategy that horizontal scale would eventually require.
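To make the shape of that work concrete, here is a minimal Python sketch of application-side shard routing and a dark read. Only the 480 and 32 figures come from Notion’s post. The hash-based routing, the contiguous packing of logical shards onto databases, and the function names are assumptions for illustration, not Notion’s implementation.

```python
import hashlib
import uuid
from typing import Callable

LOGICAL_SHARDS = 480
PHYSICAL_DATABASES = 32
SHARDS_PER_DATABASE = LOGICAL_SHARDS // PHYSICAL_DATABASES  # 15


def logical_shard(workspace_id: uuid.UUID) -> int:
    """Map a workspace to a stable logical shard (the hash choice is an assumption)."""
    digest = hashlib.sha256(workspace_id.bytes).digest()
    return int.from_bytes(digest[:8], "big") % LOGICAL_SHARDS


def physical_database(shard: int) -> int:
    """Assume logical shards are packed contiguously onto physical databases."""
    return shard // SHARDS_PER_DATABASE


def dark_read(
    key: str,
    read_old: Callable[[str], object],
    read_new: Callable[[str], object],
    mismatches: list[str],
) -> object:
    """Serve from the old store and compare against the new one on the side."""
    old_value = read_old(key)
    if read_new(key) != old_value:
        mismatches.append(key)  # record for investigation; never fail the request
    return old_value
```

Keeping many more logical shards than physical databases means a later rebalance moves whole logical shards between machines rather than re-hashing every row.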
Figma’s vertical partitioning and custom sharding. Figma published “The growing pains of database architecture” in 2023. By 2020 they were running on AWS’s largest single RDS Postgres instance. Rather than migrate off Postgres (they considered and rejected NoSQL), they vertically partitioned by table. The final partitioning operation in October 2022 involved 50 tables and produced about 30 seconds of partial availability impact (around 2% of requests dropped). Horizontal sharding came later: they built a custom Go DBProxy and shipped the first horizontally sharded table in September 2023. Staying on RDS Postgres and building their own tooling was the deliberate choice: a bigger migration was the riskier path, not the safer one.
What you notice across both is that the companies willing to publish the technical narrative almost never publish the full cost in engineering hours. They publish the architecture. The actual cost in person-months tends to stay internal, surfacing only in founder conversations and in the HN threads where someone is venting at midnight.
The pricing drift problem
One factor that makes migrations harder to justify in year one and easier to justify in year three is that vendor pricing has moved — and not always in the direction the marketing pages suggest.
Auth0. Okta acquired Auth0 in May 2021 for $6.5 billion, and the pricing matured under Okta ownership. Paid tiers now start at $35 per month for Essentials (B2C, 500 monthly active users), $150 for Essentials B2B, and $240 for Professional. In September 2024, Auth0 expanded the free tier to cover up to 25,000 monthly active users — a meaningful improvement on the volume side. But the production-grade features most growth-stage applications need — MFA, role-based access control, audit log streaming to Datadog or Splunk, separate development and production environments — sit behind the paid tiers. The free tier is generous on volume and restrictive on capability. The shape that emerges is a structural pull toward Essentials and Professional once a product is past prototyping, even when MAU counts would otherwise fit under the free ceiling.
Algolia. Algolia restructured its pricing on March 30, 2023, with a public announcement titled “Algolia Introduces New Developer-Friendly ‘Build’ Pricing Plan with One Million Free Records.” The new Build tier raised free records from 10,000 to 1 million — a 100x increase — and the new Grow tier cut search-request pricing by 50% and record pricing by 60%. Grow now runs $0.50 per 1,000 search requests and $0.40 per 1,000 records, on a pay-as-you-go basis. The 2023 change is the most customer-favorable pricing move in the search-as-a-service category in years. The lock-in dynamic, however, is unchanged: index format, relevance configuration, and API conventions are not portable. Cheaper-to-stay does not mean easier-to-leave.
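The Grow rates quoted above are simple enough to project by hand. The Python sketch below does the arithmetic for a hypothetical workload and for the same workload at 10x. It deliberately ignores any free allowance, rounding, or other billing rules, so treat it as a back-of-envelope estimate and check the current pricing page before relying on it.

```python
SEARCH_PRICE_PER_1K = 0.50  # dollars per 1,000 search requests (Grow, as quoted above)
RECORD_PRICE_PER_1K = 0.40  # dollars per 1,000 records (Grow, as quoted above)


def grow_monthly_estimate(search_requests: int, records: int) -> float:
    """Rough monthly bill; ignores any free allowance or billing rules."""
    search_cost = (search_requests / 1000) * SEARCH_PRICE_PER_1K
    record_cost = (records / 1000) * RECORD_PRICE_PER_1K
    return search_cost + record_cost


# Hypothetical workload, then the same workload at 10x.
today = grow_monthly_estimate(search_requests=2_000_000, records=500_000)
at_10x = grow_monthly_estimate(search_requests=20_000_000, records=5_000_000)
print(f"today: ${today:,.2f}/month, at 10x: ${at_10x:,.2f}/month")
```

Under those assumptions the hypothetical workload comes to about $1,200 per month today and about $12,000 per month at 10x. Usage-based pricing scales linearly, so the unit price cut does not change the slope of the line.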
Stripe. Stripe’s headline pricing has been stable for years at 2.9% plus $0.30 per transaction for standard online card processing in the United States. What has changed is the surface area. Teams that started on a clean Stripe integration and have since added Subscriptions, Connect, Radar, and Stripe Tax have a much more complex dependency than they originally signed up for, even if the per-transaction rate has not moved.
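The headline rate itself is easy to model, which is part of why it says so little about the real dependency. A minimal Python sketch, assuming only the 2.9% plus $0.30 figure above; the half-up rounding to the cent is an assumption, and fees for any add-on product are out of scope.

```python
from decimal import Decimal, ROUND_HALF_UP

PERCENT_FEE = Decimal("0.029")  # 2.9% of the charge
FIXED_FEE = Decimal("0.30")     # plus 30 cents per transaction


def standard_card_fee(amount: Decimal) -> Decimal:
    """Fee on one charge at the headline rate (rounding rule is an assumption)."""
    fee = amount * PERCENT_FEE + FIXED_FEE
    return fee.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def effective_rate(amount: Decimal) -> Decimal:
    """Fee as a fraction of the charge; the fixed part dominates small tickets."""
    return standard_card_fee(amount) / amount


for amount in (Decimal("10.00"), Decimal("100.00")):
    print(amount, standard_card_fee(amount), effective_rate(amount))
```

On a $100 charge the fee is $3.20, an effective 3.2%. On a $10 charge it is $0.59, an effective 5.9%. None of this captures the cost that actually grows over time, which is the number of Stripe products the codebase has come to depend on.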
The pattern across all three is the same: the tool that was easy to adopt is harder to leave than it was to join, and the cost of staying, whether measured in dollars, in product surface, or in operational entanglement, has risen since the day the team signed up.
Why teams make this mistake
The day-one developer experience problem is structural, not a failure of judgment. The people evaluating a tool are almost always the people who will use it first, and the criteria they apply are the criteria that matter most to them right now: how fast can I get to working code, how good are the docs, how clean is the API, does this solve my immediate problem. Exit cost is not a criterion that feels real on day one because exit cost is not a day-one problem. Nobody integrating Auth0 for the first time is thinking about what it will take to move to Clerk or to a self-hosted Keycloak instance in three years. They are thinking about OAuth flows and JWT validation and whether the dashboard is comprehensible.
There is also a motivated reasoning problem: once a team has integrated a vendor deeply, the people who did the integration have a stake in the decision having been correct. Questioning the vendor choice is, implicitly, questioning their judgment. This is not cynical; it is just how people work. The result is that the “this is fine” assessment of a vendor’s limitations tends to persist longer than it should, right up until the moment when it clearly is not fine and the migration is unavoidable.
The vendors most likely to lock you in
Not all vendor lock-in is equal. Some tools are easy to replace; others are load-bearing in ways that are not obvious until you try to remove them.
Auth providers are among the highest-lock-in categories. Auth0, Okta, and Cognito all store user credentials in a format and location that requires careful migration. Password hashes, MFA configurations, and session management are deeply embedded in how users experience the product. A user who cannot log in during a migration cutover is not just a technical problem; it is a customer service crisis. Clerk has published migration tooling specifically designed to reduce this friction. Supabase Auth is open-source and self-hostable, which changes the exit calculus entirely. If you are choosing an auth provider today and have any reason to believe you will outgrow the free or low-cost tier within two years, the migration cost of Auth0 or Cognito is worth pricing in explicitly.
Search-as-a-service is similarly sticky. Algolia’s index format, relevance configuration, and API conventions are not portable. Moving to Typesense, Meilisearch, or a self-hosted Elasticsearch cluster means rebuilding the index, re-tuning relevance, and rewriting the integration layer. Typesense has positioned itself as the lower-lock-in alternative, with an API intentionally similar to Algolia’s and a self-hosting option that removes the per-search pricing entirely. For teams where search is a core product feature rather than a secondary utility, the build-vs-buy question deserves more time than it usually gets.
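A thin adapter does not remove the index rebuild or the relevance re-tuning, but it can confine the integration rewrite to one class. The Python sketch below shows the idea with a toy in-memory backend. SearchBackend, InMemoryBackend, and ProductSearch are hypothetical names, and no real vendor SDK calls are shown.

```python
from typing import Protocol


class SearchBackend(Protocol):
    def upsert(self, doc_id: str, text: str) -> None: ...
    def search(self, query: str, limit: int = 10) -> list[str]: ...


class InMemoryBackend:
    """Toy backend; a real adapter would wrap one vendor's SDK here."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}

    def upsert(self, doc_id: str, text: str) -> None:
        self._docs[doc_id] = text.lower()

    def search(self, query: str, limit: int = 10) -> list[str]:
        needle = query.lower()
        hits = [doc_id for doc_id, text in self._docs.items() if needle in text]
        return hits[:limit]


class ProductSearch:
    """Application code depends only on the SearchBackend interface."""

    def __init__(self, backend: SearchBackend) -> None:
        self._backend = backend

    def add_product(self, sku: str, title: str) -> None:
        self._backend.upsert(sku, title)

    def find(self, query: str) -> list[str]:
        return self._backend.search(query, limit=5)


catalog = ProductSearch(InMemoryBackend())
catalog.add_product("sku-1", "Red Running Shoes")
catalog.add_product("sku-2", "Blue Jacket")
print(catalog.find("shoes"))
```

The trade-off is that an interface narrow enough to be portable cannot expose every vendor-specific relevance feature, so the adapter works best when it is drawn before those features spread through the codebase.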
Transactional email is lower-stakes than auth but higher-stakes than it looks. SendGrid, Mailgun, and Postmark all have their own template formats, suppression list management, and webhook schemas. Switching providers means migrating templates, re-verifying domains, and rebuilding any automation that listens to delivery events. Postmark has a strong reputation for deliverability and a clean API; it is not the cheapest option, but engineers who have migrated between providers often report landing on it and staying.
Payment processors are the highest-stakes category and, paradoxically, often the most defensible choice. Stripe is genuinely hard to migrate away from, but the reasons to do so are narrower than they are for auth or search. Stripe’s pricing has been stable, its feature set is comprehensive, and the alternatives — Adyen for enterprise, Tap or HyperPay for MENA, Braintree for specific use cases — each have their own switching costs. The case for moving off Stripe is usually either geographic (needing local payment methods Stripe does not support well) or pricing (at very high volume, the per-transaction fee becomes meaningful enough to negotiate or route around). For most teams, Stripe is the right choice and the lock-in is acceptable because the alternatives are not clearly better.
A pre-commitment checklist for any vendor over $5,000 per year
The framework does not need to be complex. Before committing to any vendor that will cost more than $5,000 per year at current or projected scale, or that will be integrated into a core product flow, answer these five questions in writing.
What does this vendor store that we cannot easily export? User credentials, trained models, proprietary index formats, and customer data in vendor-specific schemas are the categories that create hard exits. If the answer is “nothing we cannot export,” the lock-in risk is low. If the answer is “user password hashes” or “our entire search index configuration,” the lock-in risk is high.
What does the pricing look like at 10x our current scale? Most vendor pricing pages make this easy to calculate. Do the math now, not when you are already at 10x. If the number is uncomfortable, it is worth knowing before you are committed.
What would a migration to the next-best alternative actually require? Name the alternative. Estimate the work in rough person-weeks. If you cannot name an alternative, that is a signal worth sitting with.
Is there a self-hosted or open-source option that covers 80% of the use case? For auth: Keycloak, Supabase Auth. For search: Typesense, Meilisearch. For email: a self-hosted Postal instance or a simpler SMTP relay. The self-hosted option is not always the right answer, but knowing it exists and what it would take to run it changes the negotiating position with the vendor.
Who on the team will own the relationship with this vendor in year three? This question sounds soft but it is not. Vendor relationships degrade when the person who made the original decision has moved on and nobody has context for why the choice was made. Assigning ownership at the point of commitment, even just a name in a decision log, makes the year-three conversation less likely to start from scratch.
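The five answers fit comfortably in a small structured record kept in a decision log. The Python sketch below is one possible shape, not a prescribed format. The field names, the warnings method, and the example values are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class VendorDecision:
    vendor: str
    decided_on: date
    owner: str                               # question 5
    non_exportable_data: list[str]           # question 1
    annual_cost_now: float                   # dollars
    annual_cost_at_10x: float                # question 2
    next_best_alternative: Optional[str]     # question 3
    migration_person_weeks: Optional[float]  # question 3
    self_hosted_options: list[str] = field(default_factory=list)  # question 4

    def warnings(self, cost_ceiling: float) -> list[str]:
        """Flag the answers that suggest a hard or expensive exit."""
        found = []
        if self.non_exportable_data:
            found.append("hard exit: vendor holds " + ", ".join(self.non_exportable_data))
        if self.annual_cost_at_10x > cost_ceiling:
            found.append("projected cost at 10x exceeds the ceiling")
        if self.next_best_alternative is None:
            found.append("no alternative named")
        if not self.owner:
            found.append("no owner assigned")
        return found


# Hypothetical example entry.
decision = VendorDecision(
    vendor="ExampleAuth",
    decided_on=date(2024, 1, 15),
    owner="",
    non_exportable_data=["user password hashes"],
    annual_cost_now=4_200.0,
    annual_cost_at_10x=48_000.0,
    next_best_alternative=None,
    migration_person_weeks=None,
)
for warning in decision.warnings(cost_ceiling=30_000.0):
    print(warning)
```

The value is less in the code than in the habit: a record like this, written at the point of commitment, is the context that is usually missing when the year-three conversation starts.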
The decisions least likely to compound badly
Some categories are genuinely low-lock-in and do not need this level of scrutiny.
Feature flags via LaunchDarkly or Unleash are easy to migrate between; the integration surface is thin and the data model is simple. Monitoring and observability tools like Datadog or Grafana Cloud have high switching costs in terms of dashboard configuration but low costs in terms of data portability. Error tracking via Sentry is easy to replace; the integration is a few lines and the historical data is rarely critical to preserve. Analytics via Mixpanel or Amplitude is annoying to migrate but not catastrophic; the data can be exported and the integration is not load-bearing in the same way auth or payments are.
The categories that deserve the most scrutiny are the ones closest to the user’s identity and money: auth, payments, and any vendor that stores data that users created and expect to get back. Everything else is a nuisance to migrate. These three are the ones that can stop a roadmap.
The week-one decision that nobody wrote down is sitting in your stack right now. It is probably fine. It will probably stay fine for another year. The question worth asking is not whether it will become a problem, but whether you will have the context and the runway to deal with it when it does.