Your platform team isn't a cost center, it's a competitive advantage, and you're leaving money on the table.
We're all familiar with the usual framing: product engineering is what's truly important. Operations is just table stakes, a cost to be borne. "Everyone knows" and (like many things everyone knows) it's wrong.[1] I've been watching this pattern for nearly a decade. I've seen what happens when companies accept the framing. I've witnessed first-hand the predictable fire-fighting, the last-minute scrambling to get a release out the door, the burnout and churn of valuable engineers.
I've also seen what happens when companies reject it, and watched that choice become a competitive advantage worth hundreds of millions.
Let me show you what that alternative looks like.
The Problem
The cost-center framing creates predictable, preventable failure modes. Tell me if this sounds familiar: product engineering comes to you at the last minute, looking for your blessing to deploy the latest feature to production. But their solution won't scale, or costs will be prohibitive, or the toil it will impose will be immense. You're in an impossible position: allow a flawed release you'll be paying technical debt on for months (years? forever?) or push back, and become the "bad guys" who delayed the release of "critical business priorities".
Platform teams do complex and critical work which is often invisible until it's on fire. They're given no seat at the table, little to no voice in planning. Then something breaks. It's an emergency and you're expected to fix problems you could have prevented from the start. Infrastructure concerns surface late in the game instead of during design when they could be built around.
Even the most excellent platform engineers, the truly customer-focused ones, are flying blind about what matters - to the customers and thus the business. Their priorities often lack a clear connection to the workflows customers depend on. While operational dashboards show green, customers are missing critical functionality.[2]
When the fires start, executives notice, and your team gets fleeting attention. Best case, you get promises of resources and prioritization that quickly fade: "de-prioritize that work, we have revenue to pursue". Until the next fire. Worst case, your team bears the brunt of the blame and is directed to fix the problem with no more resources than you had to begin with.
The tension is adversarial. We're all familiar with the "Ops vs. Dev" divide, one that the promises of "DevOps" failed to bridge.[3] Stability and scaling fight for resources with feature delivery. You know what this feels like. You sense it can be better, and you're not sure how.
Changing The Narrative
This requires a mindset shift - and not just from leadership.
Starting with the platform teams: your dashboards aren't the source of truth you want them to be - customer outcomes are. The shift from "are the servers/services healthy?" to "are customers getting what they need" changes what you measure, how you measure it, and what you fight for. It also gives you powerful tools to negotiate for the resources you need. You come to the table speaking the language of executives and product managers. Your role is no longer to say "no", it's to productively highlight the operational and cost concerns. To be the voice in the room that asks "how will this scale?", "what will this cost?", "how can we deliver this in a way that safeguards our customers' trust?" You raise these questions when there's time to make eyes-open tradeoffs, not late enough to be the bearer of bad news.
Imagine if you have a seat at the product table. You bring data about costs and operational metrics. You balance the enthusiasm of product for new features against realistic understanding of what it takes to bring those features to production in your environment. You provide the key information to leadership to help them drive decisions. You don't walk away from that table with everything you want (nobody does), but you walk away with alignment. You walk away knowing that leadership has all the information. Most importantly, you didn't spend your time in that conversation saying "no".
The power of the shift from "no" to "yes, however" is striking.
Product and leadership: reliability, scalability, operational excellence - these aren't just how you "keep the lights on." They're potential differentiators. While your company scales efficiently, your competitors are burning cash on AWS. While your platform just works, earning customer trust that compounds over time, your competitors are burning trust on endless incidents. Operational excellence generates revenue through customer retention. Operational concerns aren't just table stakes, they're a key component of your strategy... if you let them be.
Imagine if you give your platform team a seat at the product table. You invite them to inform your strategy. New features come in and product and platform give you all the information you need to make eyes-open decisions. You get to strike the balance between speed of delivery and stability out of the gate. You have a realistic view of the road ahead. Both teams feel heard, and walk away aligned and ready to pull together to meet your objectives.
Seeing It In Action
These scenarios aren't just fantasies. I've seen it work.
Early in 2012, I joined Krux Digital, a startup Data Management Platform.[4] I was the third engineer to join Krux' operations team. One of the key draws for the role was that Ops at Krux had "control our infrastructure costs" as an explicit part of the team charter. Our team had authority and discretion, baked into the culture from the start - we weren't second-class engineers, we were a critical part of the development team.
That authority, backed by the CTO, meant we became more than gatekeepers to production. We consulted on the design of new features. We pushed back on decisions that would have significant scaling implications. We proposed engineering projects that weren't (directly) "revenue generating" but led to significant improvements to our efficiency - and those projects became part of the roadmap.
This wasn't without friction. When we proposed rewriting our primary beacon endpoint[5] from Python/Tornado to a custom Nginx module, product resisted. "It doesn't generate revenue." But we could make the case: we were burning AWS instances to handle ~500k requests/second. Nginx would be dramatically more efficient, and thus cost-effective. Backed by data and our charter, we won that argument. Other times, we had to compromise - at a startup sometimes delivery speed is more critical than efficiency. The key was: the whole team knew when we were making trade-offs and why. We stayed aligned.
Like many operations teams, we displayed our primary operational signals on prominent wall displays where the whole engineering team could see them at a glance. Unlike most operations teams, our most visible display was our cost dashboard: AWS spend over time, cost breakdown by service, and cost per impression. Our team had alerts for when the slope of our cost curve changed outside of a certain tolerance. Cost efficiency was a first-class operational metric. Cost wasn't a quarterly financial surprise, and we had no sudden, reactive cost-cutting scrambles. We responded in real-time, and maintained our edge.
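To make the idea of alerting on the slope of a cost curve concrete, here is a minimal sketch in Python. It is not the code we ran at Krux. The function names, the 7-day window, and the absolute tolerance (in currency units per day) are all assumptions chosen for illustration. It fits a least-squares line to the previous window and to the most recent window of daily spend, then flags the case where the two slopes differ by more than the tolerance. A small helper shows cost per impression as a unit-cost signal.

```python
def least_squares_slope(values):
    """Slope of the best-fit line through (0, v0), (1, v1), ... in units per sample."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a slope")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    return sxy / sxx


def cost_per_impression(spend, impressions):
    """Unit cost: total spend divided by impressions served in the same period."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return spend / impressions


def cost_slope_alert(daily_spend, window=7, tolerance=5.0):
    """Return True when the recent spend slope drifts from the baseline slope.

    window and tolerance are hypothetical defaults: the baseline is the
    window before the most recent one, and tolerance is the allowed
    absolute difference between the two slopes (currency units per day).
    """
    if len(daily_spend) < 2 * window:
        raise ValueError("need at least two full windows of data")
    baseline = least_squares_slope(daily_spend[-2 * window:-window])
    recent = least_squares_slope(daily_spend[-window:])
    return abs(recent - baseline) > tolerance


# Spend growing steadily by 2/day: the slope is unchanged, so no alert.
steady = [100 + 2 * day for day in range(14)]

# Spend growing by 2/day for a week, then by 10/day: the slope changed.
bending = [100 + 2 * day for day in range(7)] + [122 + 10 * day for day in range(7)]

steady_alert = cost_slope_alert(steady)
bending_alert = cost_slope_alert(bending)
```

Comparing slopes rather than absolute spend is the point of the design: spend that rises in step with traffic is expected, while a bend in the curve is the early signal worth a conversation.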
This paid off handsomely.
We didn't truly appreciate the effectiveness of our cost control until the due diligence phase of Krux' acquisition by Salesforce. They'd seen other startups, with exponential cost-per-user curves. Our cost curve was logarithmic. When Salesforce acquired the company for ~$700M, they weren't just buying the tech. They were buying the cloud expertise.
A Better Way
When platform teams understand the product they can anticipate business needs, not just react to fires. When product understands operational constraints, they can incorporate scalability and cost from the start. When working in concert, both can provide the critical information leadership needs to make eyes-open decisions.
The natural tensions between "ship faster" and "keep it running" never go away - nor should they! When everyone has a seat at the table, that tension produces better outcomes than either perspective could alone. The tension becomes healthy, generative; no longer "platform vs. product" but "platform and product" pushing against each other productively in ways that make the business stronger.
A motivated platform team plugged into product is magic. When platform engineers can draw a steel thread between their work and customer outcomes, they're not just keeping the lights on anymore. They're invested. They bring ideas and push back productively. They catch things that might otherwise slip through. And they burn out less frequently.
Those of us who have lived and breathed platform engineering know the weight of an environment where our best work is invisible and our mistakes invite scrutiny from the highest levels. When we have a seat at the table, when the work is visible, valued, and connected to outcomes, that weight lifts. We're not just preventing disasters, we're building something tangible. We're directly supporting the mission. Our work matters.
The Takeaway
Your platform team isn't a cost center. You can keep treating it as "keeping the lights on." You can leave the team reactive, invisible, fighting for scraps. Or you can give them a seat at the table, authority and accountability over costs, visibility into customer outcomes and business objectives, and watch what happens.
Ask yourself: Does your platform team have a seat at the table, or do they find out about scaling problems at release time? Is cost a real-time signal or an unpleasant quarterly surprise? Do your platform engineers know how their work connects to customer outcomes, or are they just keeping the lights on?
You know the framing. It doesn't have to be this way. Unlock some of your best engineers. Prevent the disasters before they happen. Capture the competitive advantages you've been leaving on the table.
Footnotes
[1] Charity Majors put it nicely: "It's an abstraction, a brute simplification... but it's also a lie"
[2] At Heroku we had sophisticated observability into the state of our logging pipeline: running processes, disk space, memory, disk and network I/O. All the usual suspects. But when a product change delayed customer log delivery by nearly 10 minutes, we had no signal - we were monitoring the wrong things. While our board showed green, customers bombarded support. We couldn't correlate the product change to the customer impact, because we weren't monitoring what customers actually cared about: how soon they saw their log events.
[3] Fight me. ⚔️
[4] "Data Management Platform" is an ad-tech term for a system that ingests, organizes, and analyzes data from publisher websites.
[5] "Beacon" was what we called our data ingestion endpoint. It was designed to return a 204 No Content HTTP response as quickly as possible to ingest data while avoiding introducing latency into our customers' sites.