We see many of our large clients respond to the costs and inconsistencies of running multiple local instances of an online service by setting up a single global instance that everyone must use; examples include build tooling, such as Jenkins, and issue tracking, such as Jira.

But often those common services aren't good enough. Remember your frustration at being stuck on hold because your call isn't "important" enough for the company to hire more operators? For many employees, that's their entire workday.

Why you want common services

There are very good reasons for establishing a global instance, especially for commodity services.

First, it supports the establishment of a centre of expertise, a dedicated team who really understand the tooling and can make sure that it’s appropriately supported. This should free up the staff who have been looking after the many local instances to either join the core team or work on more customer-facing activities.

Second, it gives the organisation a chance to find out who amongst its tens of thousands of employees and vendors is actually using this tool, and to shut down the many zombie instances that are no longer active or are decaying. 

Third, it allows the organisation to bring some consistency to the service. This includes making sure that critical properties, such as resilience and security, are actively supported. A service failure can cripple an entire team. 

Why you don’t want common services

Unfortunately, this ideal is often not realised because of organisational misalignment. The most common failure we see is under-investment, both in terms of infrastructure and staff. 

While the headline costs of centralisation might be lower, especially when responsibility is transferred to staff in a low-cost location, this doesn’t account for the increased drag on the users of the service—which might include thousands of developers around the world and, in turn, their customers.

Problems we’ve seen in practice include:

  • Being unable to commit to version control in the afternoon, because that’s when the US comes online and swamps the servers
  • Some staff struggling to synchronise code at any time because of slow connectivity in their location
  • Taking four days to change a line in a configuration file because support was in a different time zone and lacked the skills to interpret the request
  • Difficulty finding approved libraries because the repository hosting is underpowered for search

Apart from the direct time lost, it's hard for staff to stay motivated and focussed in a broken environment. How badly does anyone want this product if we're working with broken tools?

This type of failure can be traced directly to under-investment, which is a particular risk when the justification for centralisation is direct cost saving rather than effectiveness. Everyone is doing their best to meet their goals, of course, but under-investment is encouraged by the disconnect between the targets set for the service and its effects on its users.

Sharing is not free

There are less obvious costs, even for services that are supported appropriately. Common services introduce external dependencies that lock together groups with different goals.

When a task requires changes to multiple common services, each based in a different region, the coordination costs can overwhelm the actual work to be done. The simplest change can take days of round-the-world correspondence. To make progress, local teams will eventually either develop informal social networks to expedite changes, or break the rules and implement their own solution on an old laptop under a desk. As Gregor Hohpe wrote, “Most enterprises run a vibrant bootleg market, which is often the only way to get anything done.”

Even where there are no changes required, any service has to balance between the needs of different clients. Again, we see two common failings:

  • client teams developing brittle workarounds for “standard” configurations that are too restrictive; and
  • client teams spending time (mis)understanding complex “standard” configurations because the service includes everything any user has ever asked for

There’s an implicit setup cost. Any commitment this large will require significant time and effort to propose, approve, initiate, design, and build. Once the initiative has been approved, a likely consequence is that investment stops on all the existing implementations because the new service will make them redundant. This can leave teams stranded, without features they need, while waiting for the global implementation to be released and stabilised. And often that global roll-out can take many months.

Finally, there’s an organisational risk. Once an entity is established, it gains a life of its own, where one of its goals is its own continued survival. The initial excitement dies down, the founding team moves on, and the group starts to look inward. This can prove a major impediment to experimentation and moving to the next generation of the tooling—or even to upgrading to a current version.

Common services are a multiplier

What many large organisations don’t sufficiently take into account is how the effects of common services are multiplied across thousands of staff around the world. Those extra minutes per day that every employee spends waiting for system responses, plus the secondary costs of lost focus and demotivation, add up to a huge overhead that can swamp any direct saving. This, in turn, triggers another level of loss, from customer business that is delayed or cancelled because the organisation cannot reliably deliver new functionality.
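The scale of that multiplier is easy to underestimate. As a rough sketch, with entirely hypothetical numbers, even ten minutes of waiting per developer per day compounds quickly:

```python
# Back-of-the-envelope cost of a slow common service.
# All figures are illustrative assumptions, not measurements.
staff = 5_000                # developers using the service worldwide
wasted_minutes_per_day = 10  # waiting for builds, commits, searches
working_days_per_year = 220
cost_per_hour = 75           # fully loaded hourly staff cost

hours_lost_per_year = staff * wasted_minutes_per_day / 60 * working_days_per_year
annual_cost = hours_lost_per_year * cost_per_hour

print(f"{hours_lost_per_year:,.0f} hours lost per year, "
      f"costing roughly {annual_cost:,.0f}")
```

With these assumed figures the waste comes to over 180,000 hours a year, before counting the secondary losses from demotivation and delayed delivery.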

And that’s the core of the problem: costs and benefits that are localised in organisational silos, rather than viewed across the whole organisation. These implicit costs rarely appear on any spreadsheet, but they still exist.

Common services have a multiplier impact across an organisation, but there’s no guarantee as to whether that impact is up or down.

Common services are critical infrastructure

So, how does an organisation make a common service work?

First, the top priority of the initiative has to be to make the work of its users easier by taking on some of their load, not to enforce compliance or to save costs. The real savings are in the improved productivity of the organisation, not in reduced staff or decommissioned servers. A service is a product and needs to be run as such, including active product management. Unlike an external product, there are no sales revenues to track, so success criteria must be based on how well it helps its users succeed in turn.

Second, do you really need to implement all of your services internally rather than using one of the standard publicly hosted services? They’ve built their company on supporting their tool, they have the best engineers in that business, they probably have better security than you do because they don’t have your legacy, and you can tell exactly how much it costs. Every organisation has artefacts that are deeply sensitive but, for much of what we do, good authorisation with a reputable provider is appropriate.

Third, if you decide to run a service in-house, give it the necessary status and funding. The service levels you promise will guide hardware and infrastructure choices. But you also need to allocate your best staff to grow and run the systems. The point of running an internal implementation is so that you can adapt it for your organisation, which requires taste, judgement, and skill. That's why well-known software service companies still have offices in expensive software clusters: they know how hard it is to do this sort of thing well and they need access to the right people.

Finally, for really large organisations, not every service needs to be completely centralised. Some, such as authorisation and email, should be because they’re well-understood commodities. Others, such as build systems or (the inevitable) Jira, are highly configurable and often should be tuned to the needs of a particular programme. We regularly see Jira installations so complicated, because they had to meet every team’s requests, that no-one knows which options to select and any reporting is meaningless. A per-programme installation might appear less efficient, but allows for a better-tuned local setup so that staff can get more done. After all, some corporate programmes are larger than most companies’ entire software organisations.

Steve Freeman, Distinguished Consultant

Steve Freeman, author of Growing Object Oriented Software, Guided by Tests (Addison-Wesley), was a pioneer of Agile software development in the UK. He has developed software in many organisations, from small vendors to multinational institutions. Prior to his Zühlke engagement, he has worked as an independent consultant, for research centres in Palo Alto and Grenoble, and for software houses, earned a PhD, written shrink-wrap applications for IBM, and taught at several universities.