Inside any organization, once the headcount crosses a certain threshold, accumulating knowledge starts to break down. With two or three people, consensus aligns itself through casual conversation, and the knowledge base is more or less an extension of everyone’s daily work. Add a few more people, and the same concept starts sprouting three or four versions. You can no longer tell which document to trust. Eventually the cost of maintenance outruns the benefit of use, and either a single person is deputized as the lone maintainer, or everyone goes their own way, and the whole thing slowly rots.
Internal wikis have gone down this road. Internal CLIs have. Shared coding conventions have. Some internal BI tools have. Now it’s the turn of AI skill libraries. The precursor to this essay, Context Infrastructure, discussed how to assemble your personal cognition into an asset that can be fed to an AI, so the AI produces analysis that goes beyond consensus. But every mechanism in that essay assumed you were serving yourself alone. How do you extend the same idea to a team?
The skill system works at an individual level because it reflects one person’s judgment. How you understand a particular table, how you define a particular metric, which join path you habitually reach for when you run into a particular type of question — taken together, these are what let the AI produce non-consensus analysis. If a skill isn’t personalized, it collapses back into a generic document, and the AI reverts to its consensus default.
So skills must be personalized. That part isn’t negotiable.
But inside a team, this is exactly where the trouble starts. Two analysts can have two equally defensible but different ways of using the same orders table. One defines active users on a seven-day window, another on thirty. Neither is wrong, but the queries the Agent generates will drift. If everyone's skills are checked into the same directory, the AI will bump into contradictory instructions when loading them. If no one checks anything in, nothing accumulates at the team level, and each new hire starts from scratch.
That is the core tension: without personalization the AI is useless; with personalization people collide. Individual perspective and team-level accumulation end up working against each other under the current mechanics.
Faced with this tension, two responses come to mind immediately.
The first is to build a base skill library that everyone agrees on — covering the core tables, shared definitions, and cross-team business logic — and let each person extend it however they like. This moves fast and doesn’t require any review process. The cost is that the knowledge still sits with individuals, there’s basically no team-level accumulation, and new hires still depend on senior members to hand-hold them.
The second is to designate a person, or a small group, to collect, review, and merge contributions from everyone and maintain one authoritative version. This can build a systematic knowledge asset, with consistency guaranteed. The cost is that the review overhead becomes absurd, which contradicts the original motivation for the tool (efficiency), and in a team where no one is willing to maintain schema documentation in the first place, you won’t find anyone willing to do this long term.
The two look opposite, but they rest on the same hidden assumption: team knowledge should eventually converge to one authoritative version. The first admits that this is unachievable and gives up on accumulation. The second insists on achieving it and pays a cost that cannot be sustained.
The data engineering world actually has a mature answer here, called the semantic layer. Tools like dbt, Looker, and Holistics all follow the same idea: pull metric definitions up into a central contract stored in Git, and force all downstream queries to go through that layer. This works well for large data teams. But it carries an implicit prerequisite: you need a platform team to steward the contract, run CI/CD, and review every change. In other words, a team that can’t get schema docs written is not a team that can stand up a semantic layer. This path is real, but it leads somewhere other than the scenario we’re discussing.
So the right question to ask is different: is a single authoritative version really the only reasonable endpoint?
Once you loosen the “must converge” assumption, a different shape appears.
Concretely, each person maintains their own skill collection, allowed to overlap with or even contradict others. The team doesn’t require everyone to use the same definition; instead, it lets consensus emerge through actual use. When five people’s skills all contain a nearly identical explanation of the same table, that fact itself is evidence that the explanation has become team knowledge. When a join pattern one person wrote is copied by three others, it has become a baseline in fact.
Consensus that emerges this way doesn’t depend on central review or on any one person maintaining it long term. What it does require is two things: everyone can see what others have written; and an AI periodically compares these skills across people and tells everyone where they overlap and where they diverge.
Here’s how this breaks down into four components.
The first step is to separate storage from loading.
Everyone puts their own skills in a shared location (a Git repo is the most natural choice; a network drive or shared directory also works), with each person owning a subfolder and maintaining their own piece. The shared space allows skills written by different people to contradict each other; it only stores.
Loading is controlled by each person’s own INDEX file. INDEX is a short list of skills the person currently endorses and wants the AI to load. When the AI does work, it only reads this person’s INDEX; other people’s skills in the shared pool that aren’t referenced simply don’t enter the context. So what others wrote in the pool, and whether it contradicts what you wrote, has no effect on your Agent’s behavior.
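To make the two layers concrete, here is a hypothetical layout — every name and path below is illustrative, not a prescribed convention:

```text
skills-pool/                  # shared Git repo: storage only
  alice/
    orders-table.md
    active-users-7d.md
  bob/
    orders-table.md           # overlaps with alice's version; the pool doesn't care
  BASELINE-INDEX.md           # the team's recommended default set

# alice's personal INDEX: the only list her Agent loads from
- alice/orders-table.md
- alice/active-users-7d.md
```

Bob's contradictory `orders-table.md` sits in the same repo, but since it never appears in alice's INDEX, it never enters her Agent's context.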
This is really just an extension of the progressive disclosure that the skill system already has. The AI never loads all skills at once anyway — it loads on demand. The personal INDEX extends “on demand” from the per-task level to the per-person level.
The upshot: each person’s Agent behavior remains fully under their control, and the shared pool accumulates all the team’s raw skill material. Two needs that used to undermine each other can now coexist.
A new hire can’t realistically pick skills from scratch out of the shared pool. So the pool also needs to carry a baseline INDEX — essentially, the team’s recommended default skill set. New hires start from that, and gradually replace it with their own versions.
The baseline can be maintained by the team lead or on rotation. The key point is that it isn’t mandatory. Anyone can edit their local INDEX however they like; adopting the baseline is optional. It plays the same role as things like oh-my-zsh or vim-sensible in editor communities: a sensible starting point, nothing more.
How the baseline gets updated depends on the next mechanism.
The shared pool and personal INDEX solve storage and loading, but not accumulation. A third component is needed: a periodic scan job, fully delegated to the AI.
The scan itself is simple. The AI periodically walks through everyone's skills and uses semantic search or string matching to find entries that talk about the same table, the same metric, or the same type of analysis pattern, and builds a relevance graph. When the graph shows high overlap between a few entries, the AI sends a note to each of the authors involved: you and so-and-so have about 80% overlap on the orders table, with the main difference in how you define active users — want to take a look at each other's versions?
Note that the AI never forces a merge in this process. It only does two things: spot the overlap, and notify the authors. Whether to adopt anything, whether to merge, and what the merged version should look like are entirely up to the authors.
The design maps onto two real sources of motivation for accumulation. The first is personal: I look at someone else’s version, find it illuminating, and voluntarily copy it in or revise my own. Most accumulation actually happens this way. The second is organizational: a team lead sees that several skills have converged heavily in actual practice, and promotes one into the baseline INDEX for new hires to inherit by default. What gets promoted isn’t a version that was pushed through by force, but one that was validated through daily use.
Neither source depends on a full-time maintainer, and neither strips away individual autonomy over skills. But added together, they’re enough to make team knowledge accumulate year over year.
There are two ways to reference someone else’s skill in your INDEX. One is to copy their content into your own file; from that point on, the file is detached from the source and whatever the source does doesn’t affect you — semantically it’s indistinguishable from a skill you wrote yourself. The other is to put the relative path of the other person’s file directly in your INDEX, so the AI reads the source file at load time. The upside of the second form is that you automatically benefit from improvements the other person makes; the downside is that their changes can silently shift your AI’s behavior without you noticing.
This calls for a fourth component: a periodic review task, again delegated to the AI. The shared pool lives in Git, so which of other people's files you reference is plain to see, and what changed is captured by git log and git diff. The AI periodically inspects each person's cross-author references, checks whether the source files have changed, and classifies the risk of each change.
Two tiers are enough. A low-risk change is one where the skill's success criteria haven't changed; the author has just hit a few more pitfalls and added some workarounds. Semantically, the thing you reference still has the same purpose; it's just more mature. The AI mentions this in a daily digest and doesn't interrupt you. A high-risk change is one where, say, the skill has grown large enough that the author splits it into two files, so the original file's scope has shrunk. In that case the AI reaches out through a stronger channel like email and suggests referencing both new files in your INDEX, so you don't silently lose coverage of cases the original file used to handle. The criterion separating the two tiers is straightforward: does the change cause any behavior you were relying on to be lost or shifted?
The baseline INDEX is a special entry that many people reference, so any change to it naturally goes through the high-risk channel; it doesn’t need separate treatment.
At this point an obvious question arises: isn't this encouraging duplication? An engineer's reflex is DRY — duplicated code means bugs get fixed twice, interfaces get changed in two places, and duplication is debt.
But a skill isn’t code; it’s a prompt. A skill tweaked by ten different people doesn’t create technical debt and doesn’t cause “interface incompatibility” problems. It actually preserves ten perspectives. For an analysis task that crosses multiple business lines, ten perspectives are more useful than one “correct” version, because what you actually need is to see how different authors carve up the same metric differently.
Code reuse optimizes for machine-level consistency. The relationship between skills should optimize for human inspiration. The objective functions are different, and porting the code-world habit over will cost the skill library its most important property.
Zoom out one more level. For a team to use a skill library seriously, the components above (shared pool, personal INDEX, baseline, heartbeat, review) still need someone to set up, someone to maintain, someone to watch the outcomes. Who does that?
There’s a useful analogy from the history of Platform Engineering. Early on, every product team stood up its own database and managed its own deployments. Only after scaling up did dedicated Infra groups appear to consolidate the shared work. Context Infrastructure is likely heading down the same path.
What’s different is that most of the grunt work for Context Infra can be handed to the AI. Scanning, comparing, tracking reference changes, building relevance graphs, notifying authors — the AI can do all of that. What’s left for humans is more meta: setting the cadence for heartbeat and review, writing a few meta-skills that teach the AI how to judge overlap and risk tiers, reviewing merge suggestions it surfaces, and deciding how the baseline INDEX should evolve. So this group doesn’t have to be large — three to five people, or even one or two, is enough.
The challenge, as with every Infra group, is adoption. Whether such a group succeeds has never been a technical question; it's an adoption question. Well-run groups raise the ceiling for the whole company; poorly run ones get routed around by every product team. Context Infra will face the same test.
I’m not writing this section to suggest every company should spin up a Context Infra group. The real point is: if you’re reading this and already thinking about how to bring context infrastructure into your team, you’re the first version of this role. There’s no existing job description and no existing playbook; the shape is being defined by you.
Back to the tension from the start: individual perspective and team accumulation work against each other under the current mechanics. It’s not an inherent contradiction; it’s an artifact of the assumption that “knowledge must converge to one authoritative version.”
Loosen that assumption, and the contradiction dissolves. The shared pool lets overlap happen freely. The personal INDEX keeps each person's Agent behavior under control. The baseline gives new hires a reasonable starting point without forcing their hand. The heartbeat lets consensus surface through daily use. The review watches changes that propagate through cross-author references. Not one of these components requires a full-time maintainer, not one needs code review, and not one asks team members to surrender their own perspective.
Looking back, this essay and its predecessor are two sides of the same coin; the underlying principle is identical. There’s only one core mechanism at work: distill axioms from a large body of raw material, where the selection criterion is stability. The earlier essay used temporal stability (judgments that the same person makes across different contexts and times). This one uses spatial stability (identical content that different people arrive at independently). In both cases, the distillation is driven by heartbeat-scheduled AI. What gets distilled ultimately composes into a context infrastructure that can be loaded dynamically and is progressively disclosed. The dimensions differ; the engine and the resulting structure are the same.
With that infrastructure in place, the AI is no longer an off-the-shelf generalist; it becomes an informed practitioner, and with enough domain accumulation, a proprietary insightful analyst. That’s the real goal of context infrastructure.