Artifact-Driven Development: Making It Possible to Query Large Analytics and AI Projects

Jun 22, 2026

By: David Katz

A practical introduction to making complex project structure explicit for humans and AI, with examples from predictive analytics and enterprise ML.

Large analytics and AI projects contain more than source code. Predictive analytics and enterprise ML projects make this especially visible: they contain intermediate datasets, derived tables, feature definitions, model inputs, evaluation results, decisions, workflow state, dependencies, business rules, and operational constraints. Much of this structure is often left implicit: buried in notebooks, scripts, SQL, conventions, chat history, or in one person’s head.

Artifact-driven development is a simple response: treat important intermediate products as explicit artifacts.

In this article, I focus on predictive analytics, predictive AI, and enterprise ML projects because teams working on them must coordinate data, features, models, evaluation results, assumptions, deployment constraints, and business decisions. But the underlying idea is broader: artifact-driven development can apply to any large project where important intermediate products, dependencies, decisions, and workflow state need to be made explicit, inspectable, and reusable.

LLMs and AI assistants matter because they make it more practical to query and use project structure, but the central problem is broader than generative AI: complex project work becomes hard to maintain when important intermediate products and decisions remain implicit.

An artifact might be a derived table, a semantic view, a workflow state, a dependency summary, a design decision, a task definition, or a compact record of how one output depends on others. The point is not just to save outputs. The point is to make project structure more visible, inspectable, and reusable.

Talk to your project

Artifact-driven development makes a large project structured enough to query. Instead of treating a repository as a pile of files, it represents important project elements as artifacts: documents, scripts, generated outputs, decisions, assumptions, validation records, and semantic views. The examples in this article are drawn mainly from predictive analytics and ML, but the pattern is not limited to those domains.

This makes it possible to ask project-level questions: what explains this concept, what depends on this output, what context should an AI assistant read before editing this file, and what may be stale?

The goal is not to replace documentation. The goal is to make it possible to query documentation, code, decisions, and generated outputs as part of the project structure.
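Two of these questions, "what depends on this output" and "what may be stale", can be sketched with very little machinery. The snippet below is a minimal illustration, not part of any existing tool: the `Artifact` record, the integer `updated` field, and the staleness rule (an artifact is suspect if a direct input is newer, and so is everything downstream of it) are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Artifact:
    name: str
    updated: int  # hypothetical logical timestamp; larger means more recent
    depends_on: list[str] = field(default_factory=list)


def dependents(catalog: dict[str, Artifact], name: str) -> set[str]:
    """Return every artifact that depends on `name`, directly or indirectly."""
    found: set[str] = set()
    frontier = [name]
    while frontier:
        current = frontier.pop()
        for art in catalog.values():
            if current in art.depends_on and art.name not in found:
                found.add(art.name)
                frontier.append(art.name)
    return found


def possibly_stale(catalog: dict[str, Artifact]) -> set[str]:
    """Flag artifacts older than a direct input, plus everything downstream."""
    stale = {
        art.name
        for art in catalog.values()
        if any(catalog[dep].updated > art.updated for dep in art.depends_on)
    }
    for name in list(stale):
        stale |= dependents(catalog, name)
    return stale


# Illustrative catalog; names and timestamps are made up for the sketch.
catalog = {
    a.name: a
    for a in [
        Artifact("raw_events", updated=5),
        Artifact("deduplicated_events", updated=3, depends_on=["raw_events"]),
        Artifact("feature_table", updated=4, depends_on=["deduplicated_events"]),
    ]
}

print(sorted(dependents(catalog, "raw_events")))
print(sorted(possibly_stale(catalog)))
```

Here `raw_events` was refreshed after `deduplicated_events` was built, so both downstream artifacts are flagged. A real project would probably use content hashes or run records rather than a single counter.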

Why artifact-driven development?

Artifact-driven development is motivated by two related goals:

Context selection for leaner models – instead of sending an entire repository or a large undifferentiated prompt, artifact-driven development (ADD) helps identify the artifacts most relevant to a question or task.

Making project structure possible to query – by making artifacts, dependencies, and semantic views explicit, the project becomes easier for humans and agents to interrogate, navigate, and modify.

ADD is not a replacement for ordinary documentation such as README files, design notes, flowcharts, architecture diagrams, or data dictionaries. Those remain useful.

The difference is that ADD asks documentation and project structure to do a more operational job: make it possible to query the project. A flowchart might explain a pipeline to a human reader; an ADD artifact graph should also identify the artifacts, their paths, purposes, dependencies, downstream consumers, semantic meaning, and relevance to common questions.

The core idea

Traditional project development often leaves important context outside the working system. Humans reconstruct it from memory, code, and notes. AI systems try to reconstruct it from prompts, files, and chat history.

Both do better when the important intermediate products are made explicit.

Artifact-driven development can be understood as an explicit abstraction layer over a project. Instead of asking humans or AI systems to infer structure from scattered files, scripts, notes, and conversations, it gives the project a layer of named artifacts with purposes, dependencies, and relationships.

Artifact-driven development treats these intermediate products as first-class objects; that is:

they are named
they can be inspected
they can be reused
they can be connected by dependencies
they can help explain what exists and why

Together, these artifacts make the system easier to understand, explain, and change. Instead of treating the project as a mass of code, data, and notes, they give both humans and AI systems concrete objects to inspect, discuss, reuse, and modify.

Minimal vocabulary

Here is a small practical vocabulary.

Operational artifacts are things the system actively uses while running or producing outputs. Examples: tables, views, task records, workflow states, checkpoints, budgets, approvals.

Descriptive artifacts are things that help explain or organize the system. Examples: design decisions, dependency summaries, semantic definitions, structured documentation.

Dependencies describe how one artifact relies on another.

Artifact-driven development means making important project structure explicit through artifacts rather than leaving it implicit in code, prompts, or convention.

These categories are not rigid. A useful artifact can be partly operational and partly descriptive.
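As a rough illustration, this vocabulary could be written down as a small data structure. The sketch below is one possible shape, not a prescribed schema: the class names, the `purpose` field, and the three example artifacts are assumptions. An artifact carries a set of kinds rather than a single kind, to reflect that the categories are not rigid.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    OPERATIONAL = "operational"  # used while the system runs or produces outputs
    DESCRIPTIVE = "descriptive"  # explains or organizes the system


@dataclass(frozen=True)
class Artifact:
    name: str
    kinds: frozenset[Kind]  # a set, because one artifact can be both kinds
    purpose: str
    depends_on: tuple[str, ...] = ()  # names of artifacts this one relies on


# Hypothetical examples of each category.
feature_table = Artifact(
    name="feature_table",
    kinds=frozenset({Kind.OPERATIONAL}),
    purpose="Model inputs computed from event data",
    depends_on=("normalized_events",),
)

dedup_decision = Artifact(
    name="decision_dedup_before_features",
    kinds=frozenset({Kind.DESCRIPTIVE}),
    purpose="Records why duplicate cleaning happens before feature extraction",
)

release_approval = Artifact(
    name="release_approval",
    kinds=frozenset({Kind.OPERATIONAL, Kind.DESCRIPTIVE}),
    purpose="Gates a release and documents who signed off",
    depends_on=("feature_table",),
)

for art in (feature_table, dedup_decision, release_approval):
    labels = ", ".join(sorted(k.value for k in art.kinds))
    print(f"{art.name}: {labels}")
```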

A small example

Suppose a predictive analytics pipeline computes features from raw event data.

A conventional implementation might have this logic hidden in code:

clean duplicates
normalize timestamps
extract features
build summary tables

A more artifact-driven version might make several intermediate products explicit:

raw_events
deduplicated_events
normalized_events
feature_table
summary_table
decision: duplicate cleaning happens before feature extraction

That last item is important. A design choice that would otherwise be hidden in code or chat becomes an explicit artifact.

The result is not just better documentation. It is a system whose structure is easier to inspect and reason about.

A human can see what exists and how outputs are formed.

An AI system can focus on the relevant artifacts instead of reconstructing everything from scratch.
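The example can be written down as data. In the sketch below, the dependency edges are an assumption read off the order of the steps listed above (each artifact is taken to depend on the one before it), and the `decisions` record format is hypothetical. Python's standard `graphlib` module derives a build order, and the recorded decision is then checked against that order instead of being trusted as a comment.

```python
from graphlib import TopologicalSorter

# Each key depends on the artifacts in its set. These edges are assumed from
# the order of the steps in the example; a real project would record its own.
depends_on = {
    "raw_events": set(),
    "deduplicated_events": {"raw_events"},
    "normalized_events": {"deduplicated_events"},
    "feature_table": {"normalized_events"},
    "summary_table": {"feature_table"},
}

# Decisions are artifacts too. This one constrains the order of two others.
decisions = [
    {
        "name": "dedup_before_features",
        "text": "duplicate cleaning happens before feature extraction",
        "before": "deduplicated_events",
        "after": "feature_table",
    }
]

# static_order() yields each artifact after everything it depends on.
build_order = list(TopologicalSorter(depends_on).static_order())
print(build_order)

decision_status = {}
for d in decisions:
    ok = build_order.index(d["before"]) < build_order.index(d["after"])
    decision_status[d["name"]] = "respected" if ok else "violated"
    print(f"{d['name']}: {decision_status[d['name']]}")
```

Because the graph is a single chain, the build order simply follows the list from `raw_events` to `summary_table`. If someone later rewired `feature_table` to read from `raw_events` directly and dropped its link to the deduplicated data, the check would no longer be guaranteed to pass, which is the point of making the decision explicit.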

Why this helps with AI-assisted project work

AI is increasingly capable at coding, data analysis, and system design. But it still works best when the project exposes clear structure.

Without explicit artifacts, important context is often scattered across:

source files
notebooks
shell scripts
SQL queries
naming conventions
issue threads
chat history
unwritten assumptions

With explicit artifacts, more of that context becomes available in reusable form.

That helps AI systems:

understand the current state of a project
work from stable intermediate products
trace dependencies
compare alternative designs
make smaller, better-scoped changes

In a larger system, this also helps an AI focus on a few relevant artifacts instead of a mass of loosely connected files and conversation.

Using artifacts as AI context

One practical benefit of making artifacts explicit is that they help with context selection.

Context selection means choosing the subset of project material that is relevant to the current question or task. When working with an AI system, it is often unclear which files, notes, examples, schemas, decisions, or intermediate outputs should be included in the prompt.

Artifact-driven development makes this easier by giving the project named objects with purposes and dependencies. Instead of sending “the whole project,” you can select the artifacts most relevant to the task. This saves context space and often produces better results.

For example, if the question is why duplicate removal happens before feature extraction, the relevant context might include the raw input artifact, the deduplicated artifact, the normalized artifact, the recorded decision about ordering, and the artifact catalog describing their dependencies.
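One simple way to approximate this kind of selection is to start from the artifacts named in the question and walk their dependencies outward, nearest first, until a size budget is reached. The sketch below assumes a hypothetical catalog with a short `summary` per artifact and uses character count as a stand-in for context size; both are illustrative choices, and the summaries are invented.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Artifact:
    name: str
    summary: str
    depends_on: list[str] = field(default_factory=list)


def select_context(catalog, targets, max_chars):
    """Collect targets and their upstream artifacts, nearest first, within a budget."""
    selected, seen, used = [], set(), 0
    queue = deque(targets)
    while queue:
        name = queue.popleft()  # breadth-first, so closer artifacts come first
        if name in seen:
            continue
        seen.add(name)
        art = catalog[name]
        queue.extend(art.depends_on)  # keep exploring even if this one is too big
        if used + len(art.summary) <= max_chars:
            used += len(art.summary)
            selected.append(art)
    return selected


# Hypothetical catalog for the pipeline example.
catalog = {
    a.name: a
    for a in [
        Artifact("raw_events", "Raw event data as received."),
        Artifact("deduplicated_events", "Events with duplicates removed.", ["raw_events"]),
        Artifact("normalized_events", "Events with normalized timestamps.", ["deduplicated_events"]),
        Artifact("feature_table", "Features extracted from normalized events.", ["normalized_events"]),
        Artifact("summary_table", "Summary tables built from features.", ["feature_table"]),
        Artifact(
            "decision_dedup_first",
            "Duplicate cleaning happens before feature extraction.",
            ["deduplicated_events", "feature_table"],
        ),
    ]
}

chosen = select_context(catalog, ["decision_dedup_first"], max_chars=1000)
prompt_context = "\n".join(f"[{a.name}] {a.summary}" for a in chosen)
print(prompt_context)
```

With a generous budget, the question about deduplication pulls in the decision and everything upstream of it, while `summary_table` is left out because nothing relevant depends on it. A production version might rank by relevance to the question rather than by graph distance alone.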

Beyond documentation

Artifact-driven development is not just “write better docs.”

It broadens the effective system boundary.

Instead of treating important context as something outside the system, it brings more of that context inside the working structure of the project. Decisions, dependencies, workflow state, and operational constraints can all become explicit artifacts.

This matters because explicit artifacts are easier to inspect, revise, test, govern, and reuse than hidden conventions.

Why this matters for autonomous agents

This idea also matters for AI governance.

As AI systems become more autonomous, good behavior depends not only on model capability but on system structure. Goals, budgets, tasks, approvals, permissions, and audit trails should not live only in prompts or operator intuition. They should be explicit artifacts inside the system.

When those structures are explicit, autonomous behavior becomes easier to constrain, inspect, interrupt, and review.
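To make this concrete, here is a minimal sketch of budgets, approvals, and an audit trail held as explicit objects that an agent's actions must pass through. All class names, task names, and costs are hypothetical, and the policy (check approval, then budget) is only one of many reasonable orderings.

```python
from dataclasses import dataclass, field


@dataclass
class Budget:
    limit: float
    spent: float = 0.0


@dataclass
class Task:
    name: str
    cost: float
    requires_approval: bool = False


@dataclass
class Governance:
    budget: Budget
    approvals: set[str] = field(default_factory=set)  # names of approved tasks
    audit_log: list[str] = field(default_factory=list)

    def authorize(self, task: Task) -> bool:
        """Allow a task only if it is approved (when required) and within budget."""
        if task.requires_approval and task.name not in self.approvals:
            self.audit_log.append(f"DENIED {task.name}: missing approval")
            return False
        if self.budget.spent + task.cost > self.budget.limit:
            self.audit_log.append(f"DENIED {task.name}: over budget")
            return False
        self.budget.spent += task.cost
        self.audit_log.append(f"ALLOWED {task.name}: cost {task.cost}")
        return True


gov = Governance(Budget(limit=10.0))
retrain = Task("retrain_model", cost=6.0)
deploy = Task("deploy_model", cost=1.0, requires_approval=True)
backfill = Task("backfill_features", cost=5.0)

first_pass = [gov.authorize(t) for t in (retrain, deploy, backfill)]

gov.approvals.add("deploy_model")  # a human grants approval
second_try = gov.authorize(deploy)

for line in gov.audit_log:
    print(line)
```

Every decision, allowed or denied, leaves a record, so a reviewer can later see what the agent attempted and why it was stopped.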

In that sense, artifact-driven development is relevant not only to productivity, but also to alignment and governance.

Relation to LLM-maintained knowledge systems

A related recent pattern is the idea of an LLM-maintained wiki or knowledge layer, such as Andrej Karpathy’s LLM Wiki gist.

The most important idea behind “LLM wiki” is not “let the model write a wiki.” It is “let context accumulate into explicit artifacts.” Raw sources remain the source of truth, while maintained intermediate artifacts capture synthesis, cross-references, logs, and operating conventions.

Artifact-driven development generalizes this idea beyond wikis. It treats decisions, dependencies, summaries, indexes, validation records, and workflow state as first-class artifacts too. That broader framing is useful because it preserves the compounding benefits of maintained context while making more room for provenance, human review, role-specific context, and stronger structures such as typed relationships or graphs when a simple markdown layer is no longer enough.

Limitations: opaque components

Artifact-driven development does not make every part of a system transparent. Some components, especially learned models such as neural networks, may remain internally opaque even when the surrounding project structure is explicit.

In those cases, the method is still useful, but in a different way. The opaque component can be treated as a bounded artifact, while the surrounding evidence is made explicit: training data, evaluation results, behavioral tests, counterexamples, monitoring outputs, interpretability attempts, and decision records.

This does not turn the model’s internal representations into a human-readable DAG. It does, however, make the system’s knowledge about the model more visible, inspectable, and revisable.
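A small sketch of what treating an opaque component as a bounded artifact might look like: the model itself stays a black box, and the record holds what is known about it. The model name, data reference, and evidence entries below are invented placeholders, not real results.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Evidence:
    kind: str  # e.g. "evaluation", "behavioral_test", "counterexample"
    summary: str
    passed: Optional[bool] = None  # None for evidence that is not pass/fail


@dataclass
class OpaqueModel:
    name: str
    training_data: str  # reference to the dataset artifact, not the data itself
    evidence: list[Evidence] = field(default_factory=list)

    def open_concerns(self) -> list[Evidence]:
        """Return evidence items that record a failure."""
        return [e for e in self.evidence if e.passed is False]


# Hypothetical model and evidence, for illustration only.
model = OpaqueModel(
    name="churn_model",
    training_data="feature_table",
    evidence=[
        Evidence("evaluation", "Holdout evaluation recorded", passed=True),
        Evidence("behavioral_test", "Prediction unchanged when row order is shuffled", passed=True),
        Evidence("counterexample", "Fails on accounts with no event history", passed=False),
        Evidence("decision_record", "Use for ranking only, not automated actions"),
    ],
)

for concern in model.open_concerns():
    print(f"{model.name}: {concern.kind}: {concern.summary}")
```

Nothing here explains the model's internals; it only makes the surrounding knowledge queryable, which is the more modest claim the section makes.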

Where to start

See the full repository: artifact-driven-development

A simple claim

Artifact-driven development does not require a new programming language or a fully automated system.

It starts with a smaller and more practical move:

Make important project structure explicit.

That helps humans work with more clarity, and it gives AI systems a better chance of being useful, reliable, and governable.

About the author

David Katz recently retired after a long career in predictive analytics, enterprise software, and applied data systems, including extensive work as Principal Consultant in the TIBCO/Spotfire analytics ecosystem. His work has spanned analytics applications, data pipelines, modeling workflows, and customer-facing technical consulting in fields from marketing to hi-tech manufacturing and energy. He is now focused on writing, speaking, and selected consulting projects around predictive analytics, AI-assisted development, and making complex project structure more explicit, queryable, and governable.

The Machine Learning Times © 2026 • 1221 State Street • Suite 12, 91940 • Santa Barbara, CA 93190
Produced by: Rising Media & Prediction Impact
