Fluency First: The Right Order of Operations for AI in Procurement

A surprising amount of my job is being a professional CPO meet-and-greeter. Conferences, customer visits, intro calls -- I spend my weeks talking to procurement and supply chain leaders, and for the past two years one question has beaten every other question, at every table, in every accent: what should we actually be doing about AI? Not the vendor-pitch version. The real one -- what do I do for my team, my org, my own career.

I have given the answer enough times now that it has hardened into a shape, and the shape surprises people: it is not a technology answer. The companies getting AI right are not the ones that picked the best model. They are the ones that solved three different problems, in the right order -- personal fluency first, organizational systems second, organizational design last.

The companies stuck in pilot purgatory almost always skipped a step.

The gap nobody wants to own

Two numbers describe the moment. Eighty-eight percent of organizations now use AI in at least one business function, per McKinsey's latest State of AI survey. Yet when MIT researchers analyzed 300 enterprise deployments, 95% of generative AI pilots showed no measurable P&L impact.

Everyone is using it. Almost nobody is getting paid for it. And before you blame the models, the same MIT analysis traced the failures to organizational causes: pilots that never touched core systems, tools that never learned the business -- and a budget detail this audience should sit with. About half of AI spending chased sales and marketing, while the most dramatic documented savings came from back-office automation. Procurement is the back office. The money is being spent where the demos are flashy, not where the returns are.

So here is the three-layer answer I give at all those tables, with the evidence behind each layer.

Layer one: start in the playground

The cheapest capability investment available to a procurement leader is a $20-to-$30-a-month frontier model subscription for every person on the team. ChatGPT, Claude, Gemini, Copilot -- which one matters far less than starting. The assignment is simple: use it on real work, small then bigger. Summarize a supplier's 10-K before the negotiation. Draft the RFQ clarification questions. Role-play the other side of a price discussion. The assistants now live inside Excel, where your spend data already sits, which removes the last excuse.

Why start here, when the org-level prize is so much bigger? Because of what the research calls the jagged frontier. In a field experiment with 758 BCG consultants, people using AI finished 12.2% more work at higher quality on tasks inside the model's competence -- and were 19 percentage points more likely to get the answer wrong on tasks just outside it. Same tool, same people, opposite outcomes. The frontier between those zones is invisible. It does not follow job titles or intuition. The only way to learn where AI is brilliant and where it confidently fails is repeated, low-stakes contact.

Personal use is the right venue because it is a playground: you read every output, mistakes cost minutes, and nothing touches a system of record. Shopify's CEO pushed this to its logical end in a memo declaring AI usage "now a baseline expectation" -- teams must show AI cannot do a job before asking for headcount. You do not need a mandate that aggressive. But you cannot opt out personally, either: the build-versus-buy and guardrail calls in layer two cannot be delegated to people who have never used the tools. I have watched executives evaluate AI platforms the way I would evaluate figure skating -- confidently, and with no relevant experience. The subscription is how you earn the judgment.

One rule applies even in the playground: provision official accounts. MIT found that 90% of employees already use personal AI tools at work, while only 40% of companies have purchased an official subscription. Your team is already playing. The only question is whether your supplier pricing went with them, on a personal account, into a consumer product's training data.

Layer two: the organization is not a bigger playground

Here is where most companies stumble, and the stumble has a structure. Individual AI use is forgiving because a human reads every output. Organizational AI is different for one reason: nobody is reading every output anymore. An agent that contacts suppliers or updates records inherits whatever foundation it stands on -- and in Deloitte's 2025 Global CPO Survey, 57% of CPOs named siloed working practices a top barrier to value delivery. An agent pointed at contradictory data does not fail safely. It acts on it.

The differences are worth seeing side by side:

	The playground (you + a chatbot)	Production (agents in your org)
Who checks the output	You, every time	Often nobody
Cost of a mistake	Minutes of your time	A wrong PO, a damaged supplier relationship
Data it touches	Whatever you paste in	Your systems of record
What it needs to work	A subscription	Clean data, encoded business rules, a system of action
Permissions	Not applicable	Read-only / write-with-approval / forbidden zones
The right builder	You, experimenting	Tested vendors who absorb the frontier for you

So the build order matters. First, a data foundation: item, supplier, spend, and contract records that agree. Second, business logic: approval thresholds, preferred suppliers, payment terms -- the encoded rules of how your company actually buys. Third, a system of action, where sourcing events, approvals, and purchase orders execute. Only then the AI layer.

[Figure: The enterprise AI stack: what has to exist before agents can act]

Scope that top layer by permissions, in writing. Read-only agents summarize risk and find duplicate spend. Write-with-approval agents draft the supplier email and build the requisition; a person owns the commit. Fully autonomous writes should stay rare and thresholded, and some domains belong off-limits entirely: payment execution, banking detail changes, anything that moves money without a human in the loop. (If this layered picture sounds like an engineering control system, that is exactly what it is -- I wrote a whole nerdy companion post on it.)
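One way to put those permission tiers in writing is a policy table that every agent action is checked against before it runs. What follows is a minimal sketch, not any vendor's API: the action names, the `Tier` enum, and the `authorize` function are all hypothetical, and treating unlisted actions as forbidden is my own assumption. The rare, thresholded autonomous-write tier is left out to keep the example short.

```python
from enum import Enum


class Tier(Enum):
    READ_ONLY = "read_only"                      # agent may look, never change
    WRITE_WITH_APPROVAL = "write_with_approval"  # agent drafts, a person commits
    FORBIDDEN = "forbidden"                      # agent may not act at all


# Hypothetical action names, mapped to the tiers described in the text.
POLICY = {
    "summarize_supplier_risk": Tier.READ_ONLY,
    "find_duplicate_spend": Tier.READ_ONLY,
    "draft_supplier_email": Tier.WRITE_WITH_APPROVAL,
    "build_requisition": Tier.WRITE_WITH_APPROVAL,
    "execute_payment": Tier.FORBIDDEN,
    "change_bank_details": Tier.FORBIDDEN,
}


def authorize(action: str, human_approved: bool = False) -> bool:
    """Return True only if the written policy lets the agent proceed."""
    # Anything not written down is treated as forbidden (assumption of this sketch).
    tier = POLICY.get(action, Tier.FORBIDDEN)
    if tier is Tier.READ_ONLY:
        return True
    if tier is Tier.WRITE_WITH_APPROVAL:
        return human_approved
    return False  # forbidden zones stay closed even with an approval flag
```

The design choice worth noticing is the last line: in a forbidden zone, a human approval flag does not unlock the action. Those changes go through a person working outside the agent entirely.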

Then the question I hear most at those conference tables: build or buy? The evidence is unusually one-sided. In MIT's sample, external partnerships reached deployment about 67% of the time, against roughly 33% for internal builds -- twice the success rate. The reasons are structural, not flattering to vendors: a vendor amortizes hard lessons across hundreds of deployments, so you learn from everyone else's rough edges instead of only your own. And the frontier moves too fast to chase in-house -- METR finds that the length of tasks AI agents can complete autonomously is doubling roughly every seven months. A system hand-built around today's model ages out before it pays back; a vendor tracking the frontier absorbs the upgrades for you.

Buy carefully all the same -- Gartner expects over 40% of agentic AI projects to be canceled by the end of 2027 and warns that thousands of vendors are relabeling old automation as agents. Demand action logs, data portability, model-swap options, and reference customers. Save internal builds for the thin proprietary edge no vendor serves. This is the same make-versus-buy logic procurement applies to everything else -- applied, for once, to procurement's own tooling.
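To see why a seven-month doubling time makes in-house builds age quickly, here is the arithmetic as a small sketch. It assumes the doubling continues at a steady rate, which is an extrapolation rather than something METR's measurement guarantees. The function name and the payback periods are my own illustrative choices.

```python
def growth_factor(months: float, doubling_months: float = 7.0) -> float:
    """How many times longer a task agents can handle after `months`,
    assuming steady doubling every `doubling_months` (an extrapolation)."""
    return 2 ** (months / doubling_months)


# Illustrative payback periods for a hypothetical internal build.
for payback in (7, 14, 24):
    print(f"{payback} months: about {growth_factor(payback):.1f}x")
```

Under that assumption, a build with a two-year payback would be competing against agents that handle tasks roughly ten times longer than the ones it was designed around.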

[Figure: External partnerships reach deployment about twice as often as internal builds -- MIT GenAI Divide, 2025]

Layer three: the org chart catches up last

When every buyer runs a fleet of AI workers, the scarce human contributions become strategy and exception handling. The exposed roles are not the people with judgment; they are the ones who collect status, move information between systems, or check work that software now checks better. I wrote a full ranking of procurement and supply chain jobs by AI exposure -- the pattern there is the same one here.

The flattening is already underway. Gusto's payroll data shows small-business managers now oversee nearly six people, up from about three in 2019, and in September 2024 Amazon set a goal of raising its ratio of individual contributors to managers by at least 15% by the end of the first quarter of 2025.

But treat those numbers as direction, not permission, because the cautionary data is just as strong. Korn Ferry's survey of 15,000 workers found 41% had seen management layers cut -- and 37% of them felt directionless afterward. Klarna automated roughly 700 service roles, watched quality slide, and resumed hiring humans in 2025. Flattening is an output of layers one and two working. It is not a shortcut to them, and the companies that restructure first are running the sequence backward.

What emerges, run forward, is a structure I would describe as senior leaders setting strategy and risk appetite, a thinner management layer that handles exceptions and designs the systems agents operate in, and what I have started calling super ICs: experienced individual contributors who each direct a portfolio of AI workflows. Picture one person owning resin-market intelligence across business units -- agents watching index moves and supplier capacity, the human stepping into negotiations only on exceptions. (People at the center is not a slogan in this picture; it is the job description.)

Staff it by retraining, not replacement. The World Economic Forum expects 39% of core job skills to change by 2030, and 85% of employers plan to upskill for it. Here is the asymmetry that should drive every retraining decision: your senior buyer who knows which supplier quotes aggressively and slips on tooling timelines is the best agent operator you will ever hire. Prompting can be taught in months. Supplier judgment takes a decade.

[Figure: A toy excavator and the real thing -- the playground and production are the same capability at different scales]

What I tell the room

At LightSource we live inside layer two every day -- our customers are challenger manufacturers who consolidated their sourcing data precisely so that AI, theirs and ours, has clean signals to act on, with normalized bids and BOM-level costs as the foundation rather than the afterthought. That vantage point is exactly why I keep giving the unglamorous answer.

The sequence is the strategy. Most of the 95% ran it backward: agents before the data foundation, restructuring before anyone could tell good output from confident nonsense. Run it forward and each layer de-risks the next -- the playground builds the judgment, the judgment picks the systems, the systems earn the reorganization. The subscriptions cost less per person than a supplier lunch. The stack takes quarters. The org chart moves last, and by then it is catching up to a team that already knows exactly what the machines are good for.

That is the whole answer. It fits in a conference hallway, and it has not failed me yet.

Sources

McKinsey, The State of AI (November 2025) -- 88% of organizations use AI in at least one function; 39% report any EBIT impact; high performers are 2.8x more likely to have fundamentally redesigned workflows
Fortune: MIT report finds 95% of generative AI pilots at companies are failing (Aug. 18, 2025) -- coverage of the MIT NANDA GenAI Divide findings
MIT NANDA, The GenAI Divide: State of AI in Business 2025 (report PDF) -- external partnerships reach deployment ~67% vs ~33% for internal builds (p. 19); 50% of AI budgets flow to sales/marketing while back-office automation delivers the most dramatic savings (p. 20); 90% of employees use personal AI tools while 40% of companies purchased subscriptions (p. 8)
Harvard Business School working paper 24-013: Navigating the Jagged Technological Frontier (2023) -- 758 BCG consultants; +12.2% tasks, 25.1% faster, ~40% higher quality inside the frontier; 19 percentage points worse outside it
Microsoft Learn: Copilot in Microsoft 365 apps with Anthropic models (2026) -- frontier AI assistants selectable inside Excel
CNBC: Shopify CEO says staffers must prove AI can't do a job before asking for headcount (April 7, 2025)
Deloitte, 2025 Global Chief Procurement Officer Survey -- 57% of CPOs cite siloed working practices as a top barrier
Gartner press release (June 25, 2025): Over 40% of agentic AI projects will be canceled by end of 2027
METR: Measuring AI Ability to Complete Long Tasks (March 2025) -- autonomous task length doubling roughly every 7 months
Axios: Middle managers in decline as "flattening" spreads, AI advances (July 8, 2025) -- Gusto span-of-control data
About Amazon: Andy Jassy's update on manager-to-contributor ratios (Sept. 2024)
Korn Ferry, Workforce 2025: Power Shifts -- survey of 15,000 workers: 41% report delayering; 37% of those feel directionless
Fortune: Klarna plans to hire humans again (May 9, 2025)
World Economic Forum, The Future of Jobs Report 2025 -- 39% of key skills expected to change by 2030; 85% of employers plan to prioritize upskilling

Frequently Asked Questions

How much should a procurement team budget for individual AI tools?

Frontier model subscriptions (ChatGPT Plus, Claude Pro, Gemini, Microsoft Copilot) run $20 to $30 per person per month. For a 20-person procurement team that is under $8,000 a year -- a rounding error against most category budgets, and the highest-information-per-dollar spend available because it builds the judgment every later AI decision depends on.
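The arithmetic behind that figure is simple enough to check yourself. This is a minimal sketch using the price range and team size quoted above; the function name is mine, and actual vendor pricing (team plans, annual discounts, taxes) will differ.

```python
def annual_subscription_cost(team_size: int, monthly_price: float) -> float:
    """Yearly cost of one subscription per person at a flat monthly price."""
    return team_size * monthly_price * 12


low = annual_subscription_cost(20, 20)   # 20 people at $20/month
high = annual_subscription_cost(20, 30)  # 20 people at $30/month
print(f"${low:,.0f} to ${high:,.0f} per year")  # prints "$4,800 to $7,200 per year"
```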

Where should companies forbid AI from acting autonomously?

The clearest forbidden zones are payment execution, supplier banking and master data changes, and any action that moves money without a human in the loop. The working principle: AI can draft and recommend in high-stakes domains, but a person owns the commit. Write these zones down as policy; unwritten guardrails do not survive turnover or vendor demos.

Should procurement organizations build or buy AI capabilities?

MIT's 2025 GenAI Divide research found AI pilots built through external partnerships reached deployment about 67% of the time, versus roughly 33% for fully internal builds -- twice the success rate. Buy the platform layer from vendors who track frontier model releases, and reserve internal building for narrow, proprietary problems no vendor serves. Whatever you buy, require action logs, model-swap options, and data portability so you are not locked in.

Will AI eliminate procurement managers?

It thins the middle rather than eliminating it. Spans of control are widening -- Gusto data shows small-business managers overseeing nearly twice as many people as in 2019 -- but Korn Ferry found 37% of employees in delayered companies feel directionless, and Klarna had to rehire after over-automating. The managers who remain shift toward exception handling, coaching, and designing the systems AI workers operate in.

What skills should procurement professionals develop for the AI era?

Daily fluency with AI assistants, enough data literacy to question a model's output, and the supplier judgment that only comes from negotiations and launches. The World Economic Forum expects 39% of core skills to change by 2030. The defensible human skills in procurement are relationship depth and exception judgment -- knowing when a quote, a clause, or an agent's action is wrong despite looking right.

How do you know if your organization is ready for AI agents?

Run a simple data test: pull the same supplier's spend, open orders, and contract terms from your systems and see whether the numbers agree. If a human analyst cannot trust the records without manual reconciliation, an autonomous agent cannot either. Readiness means clean foundational data, encoded business rules, and a defined system of action -- before the first agent gets write access.
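The data test described above can be sketched in a few lines. This is an illustration under stated assumptions, not a real integration: the supplier IDs, spend figures, and the two dictionaries standing in for system extracts are invented, and the 1% tolerance is an arbitrary choice you would set yourself.

```python
# Hypothetical extracts: annual spend per supplier as each system reports it.
erp_spend = {"SUP-001": 125_000.00, "SUP-002": 48_300.00, "SUP-003": 9_900.00}
sourcing_spend = {"SUP-001": 125_000.00, "SUP-002": 51_000.00}


def find_disagreements(a: dict, b: dict, tolerance: float = 0.01) -> dict:
    """Return suppliers whose figures differ by more than `tolerance`
    (as a fraction of the larger value) or that appear in only one system."""
    issues = {}
    for supplier in sorted(set(a) | set(b)):
        x, y = a.get(supplier), b.get(supplier)
        if x is None or y is None:
            issues[supplier] = "missing in one system"
        elif abs(x - y) > tolerance * max(abs(x), abs(y)):
            issues[supplier] = f"mismatch: {x:,.2f} vs {y:,.2f}"
    return issues


for supplier, problem in find_disagreements(erp_spend, sourcing_spend).items():
    print(supplier, "->", problem)
```

With this sample data the check flags SUP-002 (the two systems disagree on spend) and SUP-003 (present in only one system). An empty result is the signal you want before any agent gets write access.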
