
What a Project Manager Needs to Understand to Lead AI Projects Well

  • Writer: Kenneth Linnebjerg
  • Mar 27
  • 11 min read


Artificial intelligence projects are now moving beyond experimentation. Many organizations no longer ask whether they should use AI. They ask where it creates value, how fast they can deploy it, and how to control the risks once it is connected to real data, systems, and business decisions.


The result is that the role of the project manager is changing. AI delivery is not just another software project. It introduces new patterns: agents, retrieval, multimodal interfaces, observability, governance, model choice, and fast-moving platform dependencies. Official platform documentation now frames AI delivery around agents, tools, governance, and scale, which is a strong signal that enterprise AI has entered a more operational phase.


For project and program managers, that means success depends less on deep coding skill and more on understanding the control points that determine whether an AI initiative becomes useful, governable, and scalable. In my view, there are seven main areas that an AI project manager needs to understand and manage well.


Intelligent Systems See Patterns
Being a project manager in the AI age is still about mastering tasks and activities. Those tasks now turn into functions we need to manage, and understanding how these new AI functions work is fundamental to structuring and running healthy projects.

Use-case shaping and business value

The first responsibility is not technical. It is to define the right problem. AI projects often fail because the use case is vague, over-ambitious, or disconnected from measurable business value. A good AI project manager must force clarity: what task is being improved, who benefits, what process is affected, what data is required, and what measurable outcome is expected.


This matters even more in AI because many solutions look impressive in demos before anyone has proven they solve a real operational problem. The strongest use cases are usually narrow enough to measure, but important enough to matter. Typical examples include knowledge search, service assistance, document handling, proposal generation, workflow automation, and decision support. In practice, a PM must translate enthusiasm into a scoped hypothesis with success criteria, cost boundaries, and adoption targets.


The project manager therefore needs to own the discipline of separating “interesting AI” from “valuable AI.” That means defining objectives, baseline metrics, target users, expected business effect, and rollout logic before the team disappears into model experimentation.
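One way to enforce that discipline is to capture the use case as a structured charter that must be complete before build starts. The sketch below is illustrative, not a standard; all field names are assumptions.

```python
# A minimal "scoped hypothesis" record: a use case is only buildable when
# objectives, baselines, cost boundaries, and adoption targets are filled in.
# Field names are illustrative, not an industry standard.
from dataclasses import dataclass, fields

@dataclass
class UseCaseCharter:
    task_improved: str
    target_users: str
    baseline_metric: str
    target_metric: str
    cost_ceiling_usd_month: float
    adoption_target_pct: int

def is_buildable(charter):
    # "Valuable AI", not just "interesting AI": every field concretely filled.
    return all(getattr(charter, f.name) not in ("", None) for f in fields(charter))

charter = UseCaseCharter(
    task_improved="first-line answers to HR policy questions",
    target_users="HR service desk agents",
    baseline_metric="avg. handling time 9 min",
    target_metric="avg. handling time 6 min",
    cost_ceiling_usd_month=2000.0,
    adoption_target_pct=60,
)
```

A blank field is a scoping conversation that has not happened yet, which is exactly the signal a PM wants before the team disappears into model experimentation.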



Agent and workflow design understanding

A second core area is understanding agentic workflows. Modern AI systems increasingly do more than answer prompts. They can use tools, retrieve information, call APIs, hand work to specialized agents, and maintain traces of what happened.


OpenAI’s Agents SDK explicitly describes agents as systems that can use context and tools, hand off to other specialized agents, and keep a full trace of execution. MCP, the Model Context Protocol, is also becoming important because it standardizes how AI applications connect to external systems, tools, and workflows.


A project manager does not need to build these components personally, but does need to understand the design implications. Which actions should the AI be allowed to take? Where is human approval required? Which tools are read-only and which are destructive? What happens when the agent fails, loops, or chooses the wrong action? Which actions must be logged? These are delivery and governance questions as much as technical questions.


In other words, the AI PM must be able to structure an agent project the same way a strong PM would structure a complex integration project: define roles, boundaries, escalation paths, failure handling, decision rights, and operational controls.
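Those control points can be made concrete in a few lines. The sketch below is not any specific SDK's API; the tool names, risk labels, and approval rule are assumptions chosen to show where a PM's decisions land in an agent design: read-only versus destructive tools, human approval gates, and a full trace of what happened.

```python
# Minimal agent control loop: a tool allowlist with risk labels, a human
# approval gate for destructive actions, and a trace of every attempt.
# All tool names and the approval rule are illustrative assumptions.

TOOLS = {
    "search_kb":    {"destructive": False, "fn": lambda q: f"results for {q!r}"},
    "delete_order": {"destructive": True,  "fn": lambda oid: f"deleted {oid}"},
}

def run_action(tool_name, arg, trace, approved_by=None):
    """Execute one agent action, enforcing approval and logging a trace entry."""
    tool = TOOLS[tool_name]
    if tool["destructive"] and approved_by is None:
        trace.append({"tool": tool_name, "arg": arg,
                      "status": "blocked: needs human approval"})
        return None
    result = tool["fn"](arg)
    trace.append({"tool": tool_name, "arg": arg, "status": "ok",
                  "approved_by": approved_by})
    return result

trace = []
run_action("search_kb", "refund policy", trace)                   # read-only: runs
run_action("delete_order", "A-1001", trace)                       # destructive: blocked
run_action("delete_order", "A-1001", trace, approved_by="ops_lead")  # approved: runs
```

The questions in the paragraph above map directly onto this structure: allowed actions are the allowlist, approval boundaries are the gate, and logging requirements are the trace.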



Data grounding, retrieval, and knowledge quality

One of the most important enterprise AI patterns is retrieval-augmented generation, or RAG. This is the pattern where a model is grounded in current, private, or domain-specific information rather than relying only on its training.


Microsoft’s Azure AI Search documentation now explicitly distinguishes between classic RAG and newer agentic retrieval approaches, and highlights the importance of chunking, citations, hybrid search, and relevance for reliable grounded answers. Databricks similarly positions vector search as a production component for storing embedded data with metadata and syncing it with enterprise tables.


For the project manager, this means the data workstream cannot be treated as a side note. AI quality depends heavily on source quality, freshness, access control, document structure, and retrieval design. If the content is poor, fragmented, outdated, or inaccessible, the AI experience will be poor no matter how advanced the model is.


A capable AI PM must therefore ask the right questions: which documents or records are the source of truth, how often are they updated, who owns them, what permission model applies, how will stale content be handled, and how will answers be cited or grounded? This area is often where enterprise AI projects become real. It is also where many proofs of concept stall when they meet messy operational data.
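To make the retrieval pattern tangible, here is a deliberately simplistic sketch: documents are split into chunks, scored against a query, and the best chunk is returned with a citation. Real systems use embeddings and hybrid search rather than term overlap; the document names and scoring are assumptions for illustration only.

```python
# Toy retrieval-augmented lookup: chunk documents, score chunks by term
# overlap with the query, and return the best chunk with a citation.
# Production RAG replaces the overlap score with embeddings / hybrid search.

def chunk(doc_id, text, size=8):
    words = text.split()
    return [(doc_id, i, " ".join(words[i:i + size]))
            for i in range(0, len(words), size)]

def retrieve(query, chunks):
    q = set(query.lower().split())
    scored = [(len(q & set(text.lower().split())), doc, pos, text)
              for doc, pos, text in chunks]
    score, doc, pos, text = max(scored)
    return {"answer_context": text, "citation": f"{doc}#chunk{pos}", "score": score}

docs = {
    "hr_policy.md": "Employees accrue 25 vacation days per year and may carry over 5 days",
    "it_policy.md": "Passwords must be rotated every 90 days and use multi factor auth",
}
chunks = [c for doc_id, text in docs.items() for c in chunk(doc_id, text)]
hit = retrieve("how many vacation days per year", chunks)
```

Even this toy version shows why source quality dominates: if `hr_policy.md` were outdated, the model would be grounded in the wrong answer with a confident citation.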



Evaluation, observability, and quality management

Traditional software can often be tested against deterministic expected outcomes. AI systems are different. Output quality is probabilistic, context-sensitive, and vulnerable to drift when prompts, models, tools, or data sources change. That is why evaluation and observability are no longer optional.


Microsoft’s Foundry observability documentation emphasizes monitoring performance, safety, and quality metrics across the lifecycle, while its agent tracing material highlights the need to capture inputs, outputs, tool usage, retries, latencies, and costs to understand complex runs. Microsoft also provides dedicated RAG evaluators to assess groundedness and answer quality.


This has major implications for project leadership. The AI PM must ensure the team defines what good looks like before rollout: answer relevance, groundedness, task completion rate, escalation rate, latency, token cost, error patterns, user trust, and fallback behavior. Without that, AI projects become endless debates driven by anecdotes and demo impressions.


In practical terms, this means treating evaluation as a managed workstream. Test cases need to be built. Quality thresholds need to be agreed. Regression checks need to be run when prompts, models, indexes, or tools change. Dashboards need to show whether the solution is improving or degrading in production. AI delivery without observability is not serious delivery.
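A minimal version of that workstream can be sketched as a fixed test set, an agreed threshold, and a pass/fail gate run on every change. The grader below is a stand-in for real evaluators (groundedness, relevance, task completion); the test cases and threshold are illustrative assumptions.

```python
# Evaluation as a managed workstream: a fixed test set, an agreed quality
# threshold, and a regression check run whenever prompts, models, indexes,
# or tools change. The substring check stands in for real evaluators.

TEST_CASES = [
    {"question": "How many vacation days?", "must_contain": "25"},
    {"question": "Password rotation period?", "must_contain": "90"},
]
QUALITY_THRESHOLD = 1.0  # fraction of cases that must pass before rollout

def fake_assistant(question):
    # Placeholder for the system under test.
    answers = {"How many vacation days?": "You accrue 25 days per year.",
               "Password rotation period?": "Passwords rotate every 90 days."}
    return answers[question]

def run_regression(assistant):
    passed = sum(case["must_contain"] in assistant(case["question"])
                 for case in TEST_CASES)
    score = passed / len(TEST_CASES)
    return {"score": score, "release_ok": score >= QUALITY_THRESHOLD}

report = run_regression(fake_assistant)
```

The value is not the scoring logic but the gate: no prompt, model, or index change ships unless `release_ok` holds, which replaces anecdote-driven debate with a measurable bar.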



Governance, security, and risk control

This is the area many organizations underestimate until late in the program. AI introduces new risks around privacy, incorrect outputs, unsafe tool use, access to sensitive information, insecure integrations, and poorly governed autonomous behavior. NIST’s AI Risk Management Framework is explicit that organizations need structured ways to incorporate trustworthiness into the design, development, use, and evaluation of AI systems. OWASP’s current work on LLM application risks similarly highlights prompt injection and other vulnerabilities across the lifecycle of generative AI systems.


For a project manager, governance is not just a legal or security topic. It is a delivery topic. Someone must define approval boundaries, logging requirements, data classification rules, access models, review forums, and risk ownership. In agent-based solutions, this becomes even more important because the system may take actions, not just generate text.


A good AI PM therefore ensures that governance is designed into the project from the start. That includes security reviews, risk logs, human-in-the-loop controls, auditability, change approval, and clear ownership between business, architecture, security, legal, and operations.
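One of those controls, a data-classification gate on retrieval, can be sketched in a few lines: before a document reaches the model, the requester's clearance is checked against the document's label and the decision is audit-logged. The labels, documents, and clearance levels are illustrative assumptions.

```python
# Data-classification gate with an audit trail: a retrieval tool only
# returns a document if the requester's clearance covers its label, and
# every decision (allowed or denied) is logged for review.
# Labels, levels, and document names are illustrative assumptions.

CLEARANCE = {"public": 0, "internal": 1, "confidential": 2}

DOCS = {
    "press_release.md": "public",
    "salary_bands.xlsx": "confidential",
}

AUDIT_LOG = []

def fetch_for_model(doc_id, user_level):
    label = DOCS[doc_id]
    allowed = user_level >= CLEARANCE[label]
    AUDIT_LOG.append({"doc": doc_id, "label": label, "allowed": allowed})
    return f"<contents of {doc_id}>" if allowed else None

fetch_for_model("press_release.md", CLEARANCE["internal"])   # allowed
fetch_for_model("salary_bands.xlsx", CLEARANCE["internal"])  # denied, but logged
```

The audit log is the delivery artifact here: governance reviews, incident handling, and risk ownership all depend on being able to show who asked for what and what the system decided.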



Platform, vendor, and operating model management

AI solutions are built on fast-moving platforms. OpenAI, Microsoft, Google Cloud, Databricks, and others are continuously changing APIs, capabilities, deployment models, and operating patterns. OpenAI now recommends the Responses API for new projects as the newer primitive for agentic integrations.


Microsoft Foundry describes itself as an “AI app and agent factory” for building, optimizing, and governing AI apps and agents at scale. Google positions Vertex AI as a unified platform for building, deploying, and scaling generative AI and machine learning applications, while Vertex AI Agent Engine focuses specifically on deploying and managing AI agents in production.


This means the project manager must understand platform strategy and vendor dependency. Which services are core? Which components are experimental? What is locked to a vendor? How will changes in model pricing, API behavior, quotas, or deprecations affect the roadmap? How will environments, support, incident handling, and cost management be organized after go-live?


These are classic program management concerns, but they now apply to models, vector stores, observability layers, agent runtimes, and tool connectivity as well. The PM must help the organization move from prototype thinking to operating model thinking.
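A common way to contain that vendor dependency is to route all model calls through one internal interface, so a change in provider, model name, or pricing becomes a config edit rather than a codebase rewrite. The provider names, models, and prices below are placeholders, not real offerings.

```python
# Insulating the roadmap from platform churn: one internal routing layer
# in front of all model calls. Providers, model names, and prices are
# illustrative placeholders.
from dataclasses import dataclass

@dataclass
class ModelRoute:
    provider: str
    model: str
    usd_per_1k_tokens: float

ROUTES = {
    "default": ModelRoute("provider_a", "model-large", 0.0100),
    "cheap":   ModelRoute("provider_b", "model-small", 0.0005),
}

def complete(prompt, route="default"):
    r = ROUTES[route]
    # Real code would dispatch to the chosen provider's SDK here; the cost
    # estimate makes token spend visible per call.
    cost = len(prompt.split()) / 1000 * r.usd_per_1k_tokens
    return {"provider": r.provider, "model": r.model, "est_cost_usd": cost}

out = complete("summarize the incident report", route="cheap")
```

When a model is deprecated or repriced, the PM's question changes from "how long will the migration take?" to "which route entries do we update, and what does regression testing show?"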



Cross-functional delivery and adoption

Finally, AI projects are deeply cross-functional. They sit between business operations, subject matter experts, architects, data engineers, application teams, security, legal, and change management. The project manager must be able to align these actors around one delivery path. That includes stakeholder management, phased rollout, user onboarding, process redesign, and benefit tracking.


This is especially important because AI changes work, not just systems. If the project introduces a copilot, agent, or knowledge assistant, users may need new ways of working, new escalation paths, and new trust habits. Adoption cannot be assumed simply because the tool is clever.


The strongest AI PMs therefore manage both delivery and organizational fit. They make sure the business process, the operating model, and the solution design evolve together. They also understand that adoption is part of delivery quality. A technically functioning AI solution that no one trusts or uses is still a failed project.



Conclusion

A strong AI project manager does not need to become a machine learning engineer. But they do need to understand the seven management areas where AI programs succeed or fail: business value, agent design, data grounding, evaluation, governance, platform strategy, and cross-functional adoption.


That combination is what makes AI leadership credible. The market no longer needs people who can merely say that AI is important. It needs delivery leaders who can turn AI into structured, governed, measurable change.


For project and program managers, that is the opportunity. AI is not replacing the need for management discipline. It is increasing it.



READING LIST:


1. OpenAI – Agents SDK

Why it is relevant: This is one of the clearest official references for understanding what an AI agent actually is in practice: tools, handoffs, traces, and orchestration. It is directly relevant to the “agent and workflow design” area of this article.

What you get from reading it: You get a practical understanding of how modern agentic systems are structured, what components they use, and why AI projects are moving beyond simple chat interfaces into tool-using workflows.


2. OpenAI – How to Build an Agent

Why it is relevant: This guide is useful because it explains the design process behind agent systems, not just the SDK itself. It is a good source for describing how AI initiatives need workflow thinking, not only model selection.

What you get from reading it: You get a conceptual view of how to frame an agent project: goal definition, workflow construction, tool selection, and system composition. That helps a project manager understand scope and architecture at a meaningful level.


Why it is relevant: This is valuable because it moves from theory into patterns and examples. For a PM, examples are important because they reveal the kinds of projects organizations are actually building.

What you get from reading it: You get concrete use cases and implementation patterns that help you see what “agent projects” look like in real life, which makes discussions of them more grounded and credible.


Why it is relevant: MCP is becoming an important standard for connecting AI applications to external tools, data sources, and workflows. It is highly relevant to the sections above on agent design and platform integration.

What you get from reading it: You get a clear mental model of why AI applications increasingly need standardized access to systems and data, and why that matters for enterprise delivery.


Why it is relevant: This goes one step deeper than the intro and explains the core components and boundaries in the protocol.

What you get from reading it: You get a stronger structural understanding of hosts, clients, servers, tools, resources, and protocol layers, which is useful when writing about governance, integrations, and controlled access.


Why it is relevant: This is relevant because many enterprise AI projects are really about exposing capabilities safely to an AI system.

What you get from reading it: You get insight into how AI applications are connected to real capabilities such as files, databases, calendars, or collaboration tools. That helps explain why AI PMs must understand permission boundaries and integration risk.


Why it is relevant: Microsoft’s framing is useful because it presents AI delivery as an enterprise factory for AI apps and agents, which fits an audience of project and program managers.

What you get from reading it: You get a platform-level perspective on how enterprises are organizing AI development, deployment, and governance at scale. That is useful for the “platform, vendor, and operating model” section above.


Why it is relevant: Observability is one of the strongest and most overlooked PM topics in AI. This source explains how quality, safety, reliability, latency, token usage, and production monitoring are handled.

What you get from reading it: You get the language and structure needed to explain why AI projects require active monitoring and measurement, not just delivery of a feature. This is central to the evaluation and quality management section above.


Why it is relevant: This source is helpful because it makes evaluation concrete rather than abstract. It shows that AI applications and agents are tested against datasets and scored with built-in or custom evaluators.

What you get from reading it: You get practical evidence for the argument that AI PMs must define what good looks like and manage evaluation as a workstream, not as an afterthought.


Why it is relevant: This is directly relevant to the data grounding and retrieval section, because it explains how groundedness, relevance, and completeness are assessed in RAG systems.

What you get from reading it: You get a practical understanding of what “good retrieval quality” means and how teams evaluate whether a grounded AI system is actually trustworthy.


Why it is relevant: This is one of the best official explanations of enterprise RAG and grounded retrieval. It is especially useful because it covers both classic RAG patterns and more agentic retrieval approaches.

What you get from reading it: You get the concepts needed to explain chunking, retrieval, citations, grounding, relevance, and search quality in a business-relevant way.


Why it is relevant: This is one of the strongest primary sources for governance, trustworthiness, and risk management in AI systems, and it lends the topic authority and seriousness.

What you get from reading it: You get a structured way to talk about AI risk beyond buzzwords: governance, mapping risk, measurement, and management of trustworthy AI. This is ideal background for the governance and control section above.


Why it is relevant: This source is useful because it translates AI security risk into concrete categories that project managers can understand and act on.

What you get from reading it: You get awareness of prompt injection, data leakage, insecure tool use, and other practical risks that should shape approvals, controls, and testing.


Why it is relevant: This is a strong source for explaining how enterprise retrieval is implemented in a modern data platform. It supports the section above on grounding, retrieval, and platform design.

What you get from reading it: You get an operational picture of vector indexes, retrieval architecture, and how AI systems are tied to enterprise data platforms rather than standing alone.


Why it is relevant: This source is especially useful because it focuses on improving retrieval quality rather than merely describing the technology.

What you get from reading it: You get practical insight into how teams improve relevance and search quality in RAG systems, which helps you speak about quality management in a more mature way.


Why it is relevant: Vertex AI is a good reference for the platform and vendor management side of AI programs because it presents a unified enterprise AI platform covering model access, deployment, and scale.

What you get from reading it: You get a good view of how a major cloud provider positions the AI stack from experimentation through production, which supports the section above on platform strategy and operating model.


Why it is relevant: This is directly relevant to current agentic delivery patterns because it focuses specifically on deploying, managing, and scaling agents in production.

What you get from reading it: You get a production-oriented perspective on agent runtime, sessions, observability, and scaling, which helps explain why AI programs need more than prototype-level thinking.



LINNFOSS Consulting ApS - info@linnfoss.com - +45 4116 6770

INCUBA Katrinebjerg - Åbogade 15 - DK-8200 Aarhus - Denmark - ©2018 by LINNFOSS
