AutoCoC: Automating Chain-of-Concepts Prompting for Domain-Specific LLMs
🧠 AutoCoC is a prompt-based knowledge injection method that automatically converts domain documentation into a structured concept tree and then turns that tree into a sequence of prompts for large language models. The goal is to improve domain-specific reasoning and structured generation without fine-tuning, external retrieval infrastructure, or manual concept engineering.
Problem Setting
Large language models perform well on many general NLP tasks, but their performance often degrades in specialized domains such as industrial engineering, healthcare, legal analysis, and formal knowledge modeling. The core issue is not language fluency, but the absence of reliable domain-specific conceptual structure.
In practice, vanilla LLMs face three recurring problems in these settings:
- They lack proprietary or highly specialized knowledge that was never part of pre-training.
- They may hallucinate plausible but semantically wrong answers when the task depends on precise concept hierarchies or domain constraints.
- They struggle with structured generation tasks, such as ontology authoring, where outputs must satisfy both conceptual and formal requirements.
This creates a knowledge gap between what general-purpose LLMs know and what real-world domain tasks require. The thesis addresses this gap through explicit knowledge injection at inference time.
Background: Knowledge Injection for LLMs
A useful way to frame the problem is through the broader landscape of knowledge injection methods. These methods differ in when knowledge is introduced and how strongly it modifies the model.
1. Dynamic knowledge injection
This family injects external knowledge at inference time, typically through retrieval systems. It is flexible and easy to update, but depends heavily on retrieval quality, external infrastructure, and latency-sensitive orchestration.
2. Static knowledge embedding
This family integrates domain knowledge through continued pre-training or fine-tuning. It can produce strong task performance, but it is expensive to train, harder to update, and less suitable for low-resource or rapidly changing domains.
3. Modular adapters
Adapter-based approaches add trainable components to a frozen backbone model. They are more parameter-efficient than full fine-tuning, but still require architecture choices, training data, and domain-specific optimization.
4. Prompt optimization
Prompt-based methods inject knowledge through carefully designed instructions, examples, and reasoning scaffolds without changing model parameters. This makes them attractive for closed-source models, low-resource domains, and fast iteration cycles.
The thesis focuses on prompt optimization because it offers the lightest-weight path to domain adaptation. It avoids retraining, works with frontier APIs, and can be updated as soon as the underlying documentation changes.
Related Work and Research Gap
The work is motivated by two strands of literature.
First, research on prompting has shown that LLM behavior can be improved substantially through structured inference-time guidance, including few-shot prompting, chain-of-thought prompting, and related reasoning scaffolds. These methods demonstrate that models can be guided toward better reasoning without modifying parameters.
Second, prior work on Chain of Concepts introduced the idea of injecting domain knowledge through explicit conceptual structures. Rather than supplying isolated facts, the method provides concepts and their relations in an order that mirrors abstraction levels. This is appealing because many domain tasks depend not only on facts, but on understanding conceptual dependencies.
However, prior concept-based prompting methods depend heavily on manually curated concept hierarchies or expert-authored DAGs. That leads to three practical limitations:
- Accuracy risk: manually curated structures can omit concepts or misrepresent relations.
- Reproducibility risk: different experts organize the same domain differently.
- Scalability risk: creating and maintaining concept structures across domains is labor-intensive.
The core research gap, therefore, is not whether concept-grounded prompting is useful, but whether it can be automated in a way that remains semantically coherent, reproducible, and practically deployable.
Main Idea
The central idea of the thesis is to replace manual concept curation with an automated pipeline that extracts domain concepts from documentation, organizes them into a concept tree, enriches them with examples, and converts the resulting structure into a pedagogically ordered prompt sequence.
This method is called Automated Chain of Concepts (AutoCoC).
At a high level, the pipeline has two stages:
- Concept tree construction from domain documentation
- Structured prompt generation from the resulting tree
The resulting prompt sequence acts as an explicit conceptual scaffold for the model. Instead of asking the LLM to infer the full domain structure implicitly, AutoCoC provides the conceptual backbone directly.
Method Design
Concept tree as the intermediate representation
The thesis deliberately uses a tree rather than a general graph or DAG as the main representation.
This is an important architectural choice.
- A knowledge graph is expressive, but difficult to construct, traverse, serialize, and evaluate.
- A DAG supports richer multi-parent dependencies, but introduces ambiguity during traversal and prompt linearization.
- A tree is less expressive, but computationally simpler, deterministic to traverse, easier to serialize into prompts, and more tractable for experimentation.
The method therefore makes two explicit assumptions:
- domain knowledge can be approximated hierarchically
- a single primary concept tree is sufficient as the conceptual backbone for a domain
This simplification is not merely a matter of convenience: it is a deliberate design choice to reduce structural degrees of freedom and make the pipeline more reproducible.
Node structure
Each concept node stores both semantic content and structural context. In the thesis, nodes include:
- concept identity
- type
- semantic description
- relation to parent
- relation description
- parent relevance score
- domain relevance score
- supporting example
- child nodes
This schema enables the same structure to support extraction, ranking, traversal, pruning, and downstream prompt generation.
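A minimal sketch of this node schema, assuming a Python implementation; the field names are illustrative, not taken verbatim from the thesis:

```python
from dataclasses import dataclass, field

@dataclass
class ConceptNode:
    """One node of the concept tree (illustrative schema)."""
    name: str                  # concept identity
    concept_type: str          # e.g. "process", "artifact", "role"
    description: str           # semantic description of the concept
    relation_to_parent: str    # e.g. "is-a", "part-of"
    relation_description: str  # why the relation holds
    parent_score: float        # strength of the parent-child relation
    domain_score: float        # centrality to the target domain
    example: str               # supporting example
    children: list["ConceptNode"] = field(default_factory=list)
```

Keeping both scores and the relation description on the node means the same object can drive ranking, pruning, and later prompt generation without extra lookups.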
Recursive top-down construction
AutoCoC builds the tree recursively from a root concept. For each parent node, the system asks the model for candidate child concepts together with relationship information and confidence-like scores.
The construction algorithm applies several controls:
- a depth limit to prevent uncontrolled expansion
- a call budget to bound computational cost
- edge-strength filtering to reject weak relations
- cycle prevention to preserve tree validity
- selective expansion so only the most domain-relevant nodes are explored further
This gives the method a bounded, policy-driven search procedure rather than unconstrained concept generation.
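The controls above can be sketched as a bounded recursive search. Here `propose_children` stands in for the LLM call that returns candidate child concepts with confidence-like scores; the thresholds and limits are illustrative values, not the thesis's actual settings:

```python
MAX_DEPTH = 4            # depth limit: prevent uncontrolled expansion
CALL_BUDGET = 50         # call budget: bound computational cost
EDGE_THRESHOLD = 0.6     # edge-strength filtering: reject weak relations
EXPAND_THRESHOLD = 0.7   # selective expansion: explore only central concepts

def build_tree(root, propose_children):
    """Bounded, policy-driven concept tree construction (sketch).

    Nodes are dicts with "name", "parent_score", "domain_score", "children".
    """
    seen = {root["name"]}          # cycle prevention: each concept appears once
    budget = {"calls": CALL_BUDGET}

    def expand(node, depth):
        if depth >= MAX_DEPTH or budget["calls"] <= 0:
            return
        budget["calls"] -= 1
        for cand in propose_children(node):
            if cand["parent_score"] < EDGE_THRESHOLD:
                continue                      # weak edge: reject
            if cand["name"] in seen:
                continue                      # duplicate: keep the structure a tree
            seen.add(cand["name"])
            node["children"].append(cand)
            if cand["domain_score"] >= EXPAND_THRESHOLD:
                expand(cand, depth + 1)       # only domain-central nodes go deeper

    expand(root, 0)
    return root
```

Note how the two scores gate two different decisions: `parent_score` decides whether a candidate is attached at all, while `domain_score` decides whether the attached node is expanded further.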
Dual scoring system
A key design contribution is the separation of two different decisions:
- parent relevance score: how strong the parent-child relation is
- domain relevance score: how central the concept is to the overall target domain
This is useful because a concept can be a valid child of a parent without being central enough to deserve deeper expansion. Separating structural fit from global importance allows the algorithm to maintain both coherence and focus.
Dynamic reparenting
Another important mechanism is dynamic reparenting. During recursive expansion, the same concept may be discovered from different parts of the tree. A concept initially attached under one parent may later be found to fit better under another.
Instead of freezing early decisions, AutoCoC allows the concept to be detached and reattached when a stronger parent-child relationship is identified. This makes the construction process self-correcting and reduces the impact of order-dependent early mistakes.
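The detach-and-reattach step can be sketched as follows, using dict-shaped nodes with `"name"`, `"parent_score"`, and `"children"` keys as an illustrative data model (a full implementation would also verify that the new parent is not a descendant of the moved node):

```python
def reparent_if_stronger(tree, concept_name, new_parent, new_score):
    """Move `concept_name` under `new_parent` if the new relation is stronger."""

    def find_with_parent(node, parent=None):
        if node["name"] == concept_name:
            return node, parent
        for child in node["children"]:
            found = find_with_parent(child, node)
            if found:
                return found
        return None

    found = find_with_parent(tree)
    if found is None:
        return False
    node, old_parent = found
    if old_parent is None or node["parent_score"] >= new_score:
        return False                          # existing attachment is as good or better
    old_parent["children"].remove(node)       # detach from the weaker parent
    node["parent_score"] = new_score
    new_parent["children"].append(node)       # reattach under the stronger parent
    return True
```

Because the move only happens when the new score strictly exceeds the old one, early attachment mistakes can be corrected without oscillating between equally scored parents.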
Prompt generation
Once the concept tree is built, the system traverses it breadth-first and generates a sequence of prompts from more abstract concepts to more concrete ones.
This ordering matters. It creates a teaching-like progression in which foundational concepts appear before specialized sub-concepts. The output can then be used as a prompt sequence for downstream tasks.
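The breadth-first linearization can be sketched in a few lines; the prompt template here is a hypothetical stand-in, not the one used in the thesis:

```python
from collections import deque

def tree_to_prompts(root):
    """Breadth-first traversal: abstract concepts first, then sub-concepts.

    Nodes are dicts with "name", "description", and "children" (illustrative).
    """
    prompts = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        prompts.append(
            f"Concept: {node['name']}. {node['description']} "
            f"Keep this concept in mind for the following task."
        )
        queue.extend(node["children"])
    return prompts
```

Breadth-first order is what produces the teaching-like progression: every concept at one abstraction level is introduced before any concept one level deeper.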
Architecture Options and Design Trade-offs
A major contribution of the thesis is not only the final method, but the explicit design space around it.
Option A: Retrieval-first architectures
A retrieval-first system could ground the model by fetching relevant passages on demand. This is flexible and often powerful, but it introduces infrastructure complexity and does not automatically create an explicit conceptual hierarchy.
Option B: Fine-tuned domain models
A fine-tuned system can absorb domain knowledge into parameters, but requires data, training cost, and maintenance. It is less appropriate when documentation changes frequently.
Option C: Adapter-based specialization
Adapters reduce training cost relative to full fine-tuning, but still require domain supervision and introduce additional model-management complexity.
Option D: Structured prompting with explicit concept organization
AutoCoC belongs here. The design goal is to keep the injection mechanism lightweight while increasing semantic control. This makes it especially attractive for industrial settings where documentation exists, but labeled datasets or training budgets are limited.
Representation choice: graph vs DAG vs tree
Within structured prompting, the thesis also studies representation choices.
- Graphs maximize expressiveness.
- DAGs balance hierarchy and cross-links.
- Trees maximize determinism and simplicity.
The thesis chooses the tree representation because it is the best fit for controlled prompt sequencing, recursive construction, and reproducible evaluation.
Downstream Use Case: Ontology Generation
To test whether concept-grounded prompting actually improves task performance, the thesis evaluates AutoCoC on OWL ontology generation in Turtle format.
This is a strong benchmark because ontology generation requires:
- correct concept hierarchies
- well-formed relations
- explicit semantic constraints
- valid formal syntax
In other words, it is not enough for the model to produce fluent text. It must generate structurally and semantically correct artifacts.
This use case is particularly appropriate because it exposes weaknesses that general LLM prompting often fails to solve reliably: incomplete taxonomies, invalid relations, and syntax errors in formal output.
Evaluation Summary
The thesis evaluates AutoCoC along two dimensions.
1. Intrinsic evaluation: concept tree consistency
Repeated generation experiments show that AutoCoC is highly stable at the concept level, even though some structural variation remains in parent-child arrangements. This suggests that the semantic backbone is reproducible, while multiple valid hierarchies may still emerge.
2. Extrinsic evaluation: downstream ontology quality
Across ten domains, AutoCoC achieved the strongest overall balance between semantic correctness and structural richness.
Key findings reported in the thesis include:
- highest mean meta-property confidence: 0.893
- highest mean OntoClean compliance rate: 90.59%
- richest generated ontologies: 21.7 classes per file on average
- concept-level stability: 0.889 across repeated runs
The main trade-off is computational cost. AutoCoC produces larger and semantically richer outputs, but evaluation time increases by roughly 40% relative to lighter baselines.
Why This Matters
The broader significance of the thesis is that it reframes concept structuring as an algorithmic problem rather than a manual bottleneck.
Instead of relying on domain experts to handcraft concept hierarchies, AutoCoC shows that documentation can be transformed automatically into a usable conceptual scaffold for LLMs. This makes concept-grounded prompting more scalable, more reproducible, and easier to adapt across domains.
More generally, the work argues for a middle layer between raw documentation and LLM inference: an explicit, dynamically constructed conceptual representation. That representation can improve semantic control without requiring full retrieval stacks or costly training pipelines.
Limitations and Future Directions
The thesis also makes clear that AutoCoC is not the final answer, but a strong starting point.
The main open directions include:
- improving relation extraction and parent-child stability
- moving beyond a single-tree assumption toward DAGs or multiple coordinated trees
- testing the method on additional downstream tasks such as code generation, specification writing, and domain QA
- comparing the approach more directly against retrieval, fine-tuning, and adapter-based alternatives under matched constraints
Takeaway
AutoCoC demonstrates that structured concept-based prompting can be automated effectively. By building concept trees from documentation and converting them into prompt sequences, the method injects domain knowledge into LLMs in a way that is lightweight, reproducible, and useful for structured reasoning tasks.
For domain-specific AI systems, this points to a practical design principle: when the model lacks the right conceptual structure, do not only retrieve more text or train more parameters. Build and inject the structure explicitly.