What Is a Large Language Model (LLM)?
A large language model (LLM) is a type of artificial intelligence built on a transformer-based neural network, trained on hundreds of billions of tokens of text to learn statistical patterns in language. The result is a model that can generate coherent text, answer questions, summarize documents, write code, and hold multi-turn conversations — all without being explicitly programmed for any individual task.
LLMs are the foundation of systems like GPT-4, Claude, Gemini, and Llama. They have moved from research curiosity to enterprise infrastructure in under three years, powering everything from customer service bots to internal knowledge assistants to automated data pipeline documentation. Understanding what they are — and what they are not — is now a prerequisite for anyone building or governing enterprise AI.
A large language model is a neural network trained on massive text datasets to predict and generate language. LLMs excel at synthesis, reasoning, and content generation but have hard limits: they don't have real-time knowledge, they can hallucinate, and they need structured context — like that provided by a knowledge graph or semantic layer — to give accurate, enterprise-grade answers.
How LLMs Work
LLMs are built on the transformer architecture introduced by Google in 2017. At their core, they work by predicting what comes next. Given a sequence of tokens (words, subwords, or characters), the model learns to predict the most probable continuation. Trained across enough text with enough compute, this simple objective produces a model capable of remarkable generalization.
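The next-token objective can be made concrete with a toy sketch. The snippet below "trains" by counting which token follows which in a tiny corpus, then generates the most probable continuation. This lookup table stands in for what a real LLM learns with a neural network over billions of tokens; the corpus and function names are purely illustrative.

```python
from collections import Counter, defaultdict

# Toy next-token predictor: count continuations in a tiny corpus,
# then pick the most probable one. Real LLMs optimize the same
# objective with a transformer, not a lookup table.
corpus = "the model predicts the next token and the next token after that".split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(token):
    """Return the most frequent continuation seen in the corpus."""
    return follows[token].most_common(1)[0][0]

print(predict_next("the"))  # "next" follows "the" twice, "model" once
```

Scaled up with enough text and compute, this same predict-the-continuation objective is what yields the generalization described above.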
Tokens, Attention, and Context Windows
Text is broken into tokens — roughly 0.75 words per token on average. The model processes tokens through layers of self-attention, which lets each token relate to every other token in the current input. This attention mechanism is what allows LLMs to track long-range dependencies (the subject of a sentence 400 tokens ago, a reference made three paragraphs back) that earlier models like LSTMs struggled with.
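The attention mechanism itself reduces to a short computation. Below is a minimal sketch of scaled dot-product attention with NumPy: each token's query is compared against every token's key, and the resulting weights mix the value vectors. Shapes and random values are illustrative; real models add learned projections and many parallel attention heads.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: every token attends to every other token."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V  # weighted mix of value vectors

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))  # 4 tokens, 8-dimensional queries
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))

out = attention(Q, K, V)
print(out.shape)  # (4, 8): one context-aware vector per token
```

Because the weights span the whole input, a token 400 positions back contributes to the output exactly as directly as the previous one, which is the long-range-dependency advantage over LSTMs.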
The total number of tokens the model can process at once is the context window. The original GPT-3 had a 2,048-token context window. State-of-the-art models in 2026 handle 128K–1M tokens. A larger context window means the model can hold more of a conversation, document, or codebase in working memory — which directly affects the quality of enterprise RAG applications.
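Using the ~0.75 words-per-token average mentioned above, a quick back-of-the-envelope check tells you whether a document fits a given window. Exact counts vary by tokenizer and language, so treat this as an estimate and use the provider's own tokenizer for real budgeting.

```python
# Rough token budgeting from word counts, assuming ~0.75 words per token.
def estimate_tokens(word_count, words_per_token=0.75):
    return int(word_count / words_per_token)

def fits_in_context(word_count, context_window):
    return estimate_tokens(word_count) <= context_window

print(estimate_tokens(3000))           # 4000: a 3,000-word report
print(fits_in_context(3000, 4096))     # True: fits a small 4K window
print(fits_in_context(120_000, 4096))  # False: needs a long-context model
```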
Pre-training and Fine-tuning
LLM development has two main phases. Pre-training is expensive and done by foundation model labs (OpenAI, Anthropic, Google, Meta): train on internet-scale text, produce a general-purpose model. Fine-tuning is cheaper and done by enterprises: take a pre-trained model and adapt it to a specific domain, task, or behavior using a smaller curated dataset. Fine-tuning can teach the model a company's terminology, tone, or specialized knowledge without retraining from scratch.
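What a fine-tuning dataset looks like in practice can be sketched briefly. The snippet below builds one training example in the widely used chat-style JSONL format, where each line demonstrates the behavior the adapted model should learn. The company name, definitions, and exact field schema here are assumptions for illustration; check your provider's documentation for the required format.

```python
import json

# One hypothetical fine-tuning example in chat-style JSONL: each line
# teaches the model company-specific terminology and tone.
examples = [
    {"messages": [
        {"role": "system", "content": "You answer using ACME Corp terminology."},
        {"role": "user", "content": "What is churn rate?"},
        {"role": "assistant", "content": "At ACME, churn rate is the share of "
         "active accounts cancelled in a calendar month."},
    ]},
]

# Write one JSON object per line, the conventional JSONL layout.
with open("finetune_train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

A few hundred to a few thousand such curated examples is typically enough to shift terminology and tone — far cheaper than the internet-scale pre-training phase.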
Training Data and Scale
The "large" in large language model refers to two dimensions: the number of parameters (weights) in the neural network, and the volume of training data. GPT-3 had 175 billion parameters. Modern frontier models are reported to reach the trillion-parameter range, though exact counts are rarely disclosed. This scale is what enables emergent capabilities — behaviors that appear at large scale and weren't explicitly trained for, like multi-step reasoning, in-context learning, and code generation.
Training data quality matters at least as much as quantity. Models trained on poorly curated data learn to replicate its flaws — biases, factual errors, inconsistencies. The "garbage in, garbage out" principle applies with multiplicative force when the model has memorized its training distribution and will confidently reproduce it.
The Data Provenance Problem
For enterprise AI teams, training data provenance is increasingly a legal and governance concern. Questions that now land on data teams: What data did this model train on? Are we allowed to use it for commercial output? Do we understand what biases it may have absorbed? These are data governance questions — the same discipline that governs enterprise data assets now needs to extend to the AI systems that consume and generate data.
Capabilities and Limits
LLMs are genuinely good at a cluster of tasks: synthesizing information across many sources, generating coherent long-form text, translating, summarizing, answering questions with context, writing and explaining code, and reformatting structured data. These capabilities transfer across domains — a model trained on general text can discuss medical literature, legal contracts, and financial disclosures without separate training on each.
Hard Limits That Won't Be Fixed by Scale
Three limits are structural — they don't disappear just by making models bigger:
- Knowledge cutoff. LLMs are trained on a static snapshot of text. They don't know what happened yesterday, don't update continuously, and will confidently answer questions about events after their cutoff as if they had seen the data. This is why enterprise deployments almost always pair LLMs with retrieval systems.
- Hallucination. LLMs generate plausible-sounding text by predicting likely tokens. When asked about something outside their training distribution, they don't say "I don't know" — they generate a confident-sounding answer that may be entirely fabricated. See the separate entry on AI hallucination for the full treatment.
- No private knowledge by default. A foundation LLM knows nothing about your company's data, your internal systems, your customers, or your processes — unless you give that context in the prompt or through a retrieval mechanism. This is why RAG and fine-tuning exist, and why the quality of the context you provide is often the limiting factor in enterprise AI quality.
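The third limit is why retrieval sits in front of the model in nearly every enterprise deployment. The sketch below shows the core RAG move: fetch governed context first, then place it in the prompt so the model answers from private, current knowledge. The keyword lookup and knowledge-base contents are toy stand-ins; real systems use vector or graph search over a governed catalog.

```python
# Toy RAG sketch: retrieve governed context, then assemble a grounded prompt.
# The knowledge base and matching logic are illustrative placeholders.
knowledge_base = {
    "churn rate": "Churn rate: % of active accounts cancelled per month. Owner: Finance.",
    "revenue report": "Monthly revenue report: fed by datasets sales_fact and fx_rates.",
}

def retrieve(question):
    """Toy keyword retrieval; production systems use vector or graph search."""
    return [text for term, text in knowledge_base.items() if term in question.lower()]

def build_prompt(question):
    context = "\n".join(retrieve(question)) or "No governed context found."
    return (
        "Answer using ONLY the context below. Say 'I don't know' otherwise.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

prompt = build_prompt("What feeds the revenue report?")
print(prompt)
```

The instruction to answer only from the supplied context is how retrieval also mitigates the first two limits: stale knowledge is replaced by current sources, and the model is steered away from fabricating answers.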
The LLM is not the hard part. For most enterprise AI deployments, the foundation model is a commodity. The hard parts are: what context you give it, how you verify its outputs, and whether your data governance infrastructure can support trustworthy AI at scale.
LLMs in the Enterprise
Enterprise LLM deployment typically takes one of three architectures:
- API access to foundation models — connect to OpenAI, Anthropic, Google via API. Fast to deploy, no training overhead, but all prompts go to third-party infrastructure. Data residency and privacy concerns apply.
- Self-hosted open models — run Llama 3, Mistral, or similar open-weight models on your own infrastructure. Data stays in-house, but requires ML engineering capacity for deployment, inference optimization, and updates.
- Fine-tuned domain models — adapt a base model to enterprise-specific vocabulary, tone, and domain knowledge. Highest performance on specialized tasks, highest upfront cost.
All three architectures share the same need: retrieval-augmented generation to ground the model in current, private, enterprise-specific knowledge. The LLM without an information retrieval layer is a general-purpose text generator. With a well-governed retrieval layer, it becomes an enterprise intelligence system.
Data Governance for LLMs
LLMs introduce new requirements for data governance that most existing frameworks don't address. Three areas are most critical:
Input Governance: What Context You Feed the Model
Every prompt carries data. If employees use an LLM to answer questions about customers, patient records, or financial projections, the sensitive data in those prompts may leave the corporate network. Input governance — classifying what data can go to which models, under what conditions — is a new operational requirement for data teams.
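Operationally, input governance often starts as a gate that classifies a prompt before it leaves the network. The sketch below blocks two obvious sensitive patterns; the patterns, policy, and function names are hypothetical, and a production screen would implement the organization's actual data classification rules.

```python
import re

# Hypothetical input-governance gate: block prompts containing obvious
# sensitive patterns before they reach an external model.
BLOCKED_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def screen_prompt(prompt):
    """Return (allowed, reasons) for sending this prompt to an external model."""
    hits = [name for name, pat in BLOCKED_PATTERNS.items() if pat.search(prompt)]
    return (len(hits) == 0, hits)

print(screen_prompt("Summarize our Q3 roadmap"))                # (True, [])
print(screen_prompt("Email jane.doe@example.com the invoice"))  # (False, ['email'])
```

Regex screens catch only the easy cases; the classification tiers behind such a gate (which data classes may go to which models) are the real governance work.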
Output Governance: Verifying Generated Content
LLM outputs can't be trusted as-is. They require verification: Does this answer match our metadata? Is this summary accurate? Did the model cite real sources? Organizations that have invested in a data catalog and knowledge graph can point the LLM at governed, authoritative sources — making verification tractable. Without that infrastructure, each output is a manual fact-check.
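One output check that a catalog makes tractable is citation verification: does every source the model names actually exist in the governed inventory? The sketch below assumes a square-bracket citation convention and a toy catalog, both placeholders for illustration.

```python
import re

# Toy citation check against a governed catalog. The catalog contents and
# the [source_name] citation format are assumptions for illustration.
catalog = {"sales_fact", "fx_rates", "customer_dim"}

def verify_citations(answer):
    """Return cited sources that are NOT in the governed catalog."""
    cited = set(re.findall(r"\[(\w+)\]", answer))
    return sorted(cited - catalog)

answer = "Revenue is computed from [sales_fact] and [fx_rates], enriched by [legacy_feed]."
print(verify_citations(answer))  # ['legacy_feed']: flag for review or rejection
```

A check like this turns "manually fact-check every answer" into "review only the citations that fail lookup" — exactly the shift from manual to tractable verification described above.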
Model Governance: Version, Drift, and Accountability
Like software, models have versions. Like data, they can drift. A model that performed well in March may perform differently in June — the same input producing different outputs after a provider update. Model governance tracks which version is deployed, monitors for performance drift, and maintains an audit trail of what the model was used for. This is the domain of LLMOps.
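A minimal audit trail can be sketched as a record of which model version produced which output, with hashes of the input and output for traceability. Field names here are illustrative, not a standard schema; production LLMOps tooling would add evaluation scores, latency, and policy context.

```python
import hashlib
import json
from datetime import datetime, timezone

# Illustrative model-governance audit record: who generated what, with
# which model version, traceable without storing raw sensitive text.
def audit_record(model, version, prompt, output):
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "version": version,
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "output_sha256": hashlib.sha256(output.encode()).hexdigest(),
    }

rec = audit_record("assistant-model", "2026-03-01", "define churn", "Churn is ...")
print(json.dumps(rec, indent=2))
```

Comparing records for the same prompt hash across versions is also a cheap first signal of drift: identical inputs producing different output hashes after a provider update.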
How Dawiso Supports LLM Workflows
Dawiso's role in the enterprise LLM stack is to be the governed context layer that makes LLM answers trustworthy. When a user asks an AI assistant "what does churn rate mean in our context?" or "which datasets feed the monthly revenue report?", the answer should come from a governed, version-controlled source — not from an LLM's general training data.
The Business Glossary provides the canonical definitions that LLMs should retrieve when asked about business terms. The Data Catalog provides the authoritative inventory of datasets, owners, and quality metrics. The Dawiso MCP Server exposes this governed knowledge base to LLM agents via the Model Context Protocol, so external AI systems can query Dawiso's graph for reliable, up-to-date enterprise context before generating answers.
The result: LLMs that answer enterprise questions from governed sources, with traceability back to the data assets that informed every response.
Conclusion
Large language models are a genuinely transformative technology — but their enterprise value depends entirely on the infrastructure around them. An LLM with no reliable context is a confident text generator prone to hallucination. An LLM connected to a governed knowledge graph, a maintained business glossary, and well-cataloged data assets is the foundation of enterprise intelligence. The model provides the language capability; data governance provides the trustworthiness.