AI-Powered Document Intelligence: Combining WOPI, Office Add-Ins, and LLMs

Enterprise document processing has been a persistent challenge. Organisations generate, receive, and process vast amounts of documents — contracts, invoices, reports, correspondence, regulatory filings — and the intelligence locked within those documents remains largely inaccessible to automated systems.

Traditional approaches to document processing (OCR, rule-based extraction, keyword search) handle structured documents reasonably well but struggle with the variability, nuance, and context that real-world documents contain. These earlier systems were explicitly programmed to follow human-coded rules, making them inflexible when faced with new formats or language. A contract does not always put the termination clause in the same place. An invoice from one vendor looks nothing like an invoice from another. A regulatory filing uses domain-specific language that generic systems misinterpret.

Large language models change this equation. LLMs can read, classify, extract from, and summarise documents with a degree of accuracy and flexibility that rule-based systems could not reach, and they can do so at scale. When combined with WOPI for document access and Office add-ins for in-application user interfaces, the result is enterprise AI document intelligence that works where people work: inside Microsoft Office.

The term ‘artificial intelligence’ dates back to the 1956 Dartmouth workshop, for which John McCarthy coined it, but applying AI to everyday enterprise documents has only recently become practical. This article explores the architecture patterns for building AI-powered document intelligence systems that combine WOPI, Office add-ins, and LLMs.

The Three-Layer Architecture

Enterprise AI document intelligence systems built on Microsoft technologies typically follow a three-layer architecture:

Layer 1: Document Access (WOPI)

The WOPI protocol provides the document access layer. Through WOPI, the system can:

  • Retrieve documents from any WOPI-enabled storage system — SharePoint, custom document management systems, SaaS platforms with WOPI integrations.

  • Open documents in Office for the web, enabling users to view and edit documents within their browser.

  • Access document content programmatically for AI processing, using the GetFile operation to retrieve the document’s binary content.

  • Update documents with AI-processed results, using the PutFile operation to write modified content back to the storage system.

WOPI serves as the universal connector between the AI system and wherever documents live in the organisation.
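As a minimal sketch of the GetFile and PutFile operations above, the helpers below only build the HTTP request descriptions; they do not send anything. The `WopiRequest` type and function names are hypothetical, and the sketch assumes the AI service is permitted to call the host's WOPI endpoints with a valid access token and, for PutFile, the ID of the lock currently held on the file.

```typescript
// Hypothetical request descriptor for illustration; not part of any WOPI SDK.
interface WopiRequest {
  method: "GET" | "POST";
  url: string;
  headers: Record<string, string>;
}

// wopiSrc is the file's WOPI URL, e.g. https://dms.example.com/wopi/files/abc123
function contentsUrl(wopiSrc: string, accessToken: string): string {
  return `${wopiSrc}/contents?access_token=${encodeURIComponent(accessToken)}`;
}

// GetFile: a GET on the file's /contents endpoint returns the binary content.
function buildGetFileRequest(wopiSrc: string, accessToken: string): WopiRequest {
  return { method: "GET", url: contentsUrl(wopiSrc, accessToken), headers: {} };
}

// PutFile: a POST on /contents with X-WOPI-Override set to PUT.
// X-WOPI-Lock carries the lock ID that must match the lock on the file.
function buildPutFileRequest(
  wopiSrc: string,
  accessToken: string,
  lockId: string
): WopiRequest {
  return {
    method: "POST",
    url: contentsUrl(wopiSrc, accessToken),
    headers: { "X-WOPI-Override": "PUT", "X-WOPI-Lock": lockId },
  };
}
```

Keeping request construction separate from the HTTP call makes the protocol details easy to unit test without a live WOPI host.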

Layer 2: User Interface (Office Add-Ins)

Office add-ins provide the user interface layer, embedding AI capabilities directly within the applications where users work. These add-ins are essentially a type of web application built using web technologies like HTML, CSS, and JavaScript, running inside Office. Add-ins can be developed for a specific application such as Word, Excel, or Outlook, allowing tailored solutions that extend the functionality of each Office product:

  • Word add-ins for contract analysis, document summarisation, and content classification.

  • Excel add-ins for financial data extraction, data validation, and automated reporting.

  • Outlook add-ins for email triage, response drafting, and attachment analysis.

The add-in task pane provides a dedicated panel where AI results are presented, user input is collected, and workflow actions are triggered — without the user leaving their Office application. The add-in can also display web-based content, such as charts or media, directly on a page within the Office application to enhance user experience.

Layer 3: Intelligence (LLMs)

The LLM layer provides the analytical intelligence. It applies language models to specific document processing goals, and its prompts and pipelines can be adapted as requirements and workflows change:

  • Document classification: Determining what type of document it is (contract, invoice, report, letter) and routing it to the appropriate processing pipeline.

  • Information extraction: Pulling structured data from unstructured text, such as dates, parties, amounts, obligations, and key terms. Prompts or models tuned for a specific extraction task generally give the most accurate results.

  • Summarisation: Generating concise summaries of long documents, highlighting key points and action items to speed up document review.

  • Analysis: Identifying risks, obligations, anomalies, or patterns within document content.

  • Generation: Drafting responses, creating reports, or producing new content based on document analysis.

The system can also improve over time if reviewer corrections and processing outcomes are fed back into its prompts, retrieval data, or fine-tuning, making it more efficient and accurate at these tasks.
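The classification-and-routing step in the list above might be sketched as follows. The prompt wording, the pipeline names, and the fallback to manual review are all assumptions for illustration; the actual call to the model is left out.

```typescript
const DOCUMENT_TYPES = ["contract", "invoice", "report", "letter"] as const;
type DocumentType = (typeof DOCUMENT_TYPES)[number];

// Builds the instruction sent to the model; the wording is illustrative only.
function buildClassificationPrompt(documentText: string): string {
  return [
    `Classify the document as one of: ${DOCUMENT_TYPES.join(", ")}.`,
    "Reply with the single label only.",
    "",
    documentText,
  ].join("\n");
}

// Normalises the model's reply; anything unrecognised becomes "unknown".
function parseDocumentType(reply: string): DocumentType | "unknown" {
  const label = reply.trim().toLowerCase().replace(/[^a-z]/g, "");
  const match = DOCUMENT_TYPES.find((type) => type === label);
  return match ?? "unknown";
}

// Hypothetical pipeline names; unknown documents go to a human reviewer.
const PIPELINES: Record<DocumentType | "unknown", string> = {
  contract: "contract-review",
  invoice: "invoice-extraction",
  report: "summarisation",
  letter: "correspondence",
  unknown: "manual-review",
};

function routeDocument(type: DocumentType | "unknown"): string {
  return PIPELINES[type];
}
```

Treating any unexpected reply as "unknown" keeps a malformed model response from silently entering the wrong pipeline.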

Use Case: Automated Contract Review in Word

Let us walk through a concrete implementation: an AI-powered contract review system built as a Word add-in.

User Experience
  1. A user opens a contract in Word (either locally or via WOPI from a document management system).

  2. They click the “Review Contract” button in the add-in task pane.

  3. The add-in extracts the document content and sends it to the AI service.

  4. Within seconds, the task pane displays:

    • Contract type: Service agreement, NDA, licence agreement, etc.

    • Key parties: Names and roles of the contracting parties.

    • Key dates: Effective date, expiration date, renewal dates, notice periods.

    • Financial terms: Contract value, payment terms, penalties.

    • Risk flags: Unusual clauses, missing standard terms, unfavourable provisions.

    • Summary: A plain-English summary of the contract’s key terms and obligations.

  5. The user can click on any extracted item to navigate to the relevant section in the document.

  6. They can request deeper analysis: “What are the termination conditions?” or “Compare this contract’s liability cap to our standard terms.”

Technical Architecture
[Word Document] → [Word Add-In (Task Pane)]
                        ↓
                  [API Gateway]
                        ↓
                  [AI Processing Service]
                   ↓            ↓
            [LLM API]    [Document Parser]
                   ↓            ↓
            [Classification,  [Text Extraction,
             Extraction,       Section Detection,
             Analysis]         Structure Analysis]
                        ↓
                  [Results API]
                        ↓
                  [Word Add-In (Display Results)]

The Word add-in uses the Office JavaScript API to extract the document content:

async function getDocumentContent(): Promise<string> {
  // Word.run resolves with the callback's return value and rejects on
  // failure, so no manual Promise wrapper is needed.
  return Word.run(async (context) => {
    const body = context.document.body;
    body.load("text");
    await context.sync();
    return body.text;
  });
}

The AI processing service receives the document text, applies the LLM for classification and extraction, and returns structured results to the add-in.
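One way the service could turn a model reply into structured results is sketched below. The `ContractReview` shape is a hypothetical subset of the fields shown in the task pane, and `complete` is a stand-in for whichever LLM client the service uses; neither comes from a specific library.

```typescript
// Hypothetical result shape; a real service would carry more fields.
interface ContractReview {
  contractType: string;
  parties: string[];
  riskFlags: string[];
  summary: string;
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((v) => typeof v === "string");
}

// Returns null when the reply is not valid JSON or lacks the expected shape.
function parseContractReview(reply: string): ContractReview | null {
  let data: unknown;
  try {
    data = JSON.parse(reply);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const record = data as Record<string, unknown>;
  const contractType = record["contractType"];
  const parties = record["parties"];
  const riskFlags = record["riskFlags"];
  const summary = record["summary"];
  if (typeof contractType !== "string" || typeof summary !== "string") return null;
  if (!isStringArray(parties) || !isStringArray(riskFlags)) return null;
  return { contractType, parties, riskFlags, summary };
}

// `complete` is an assumed stand-in for the LLM client call.
type CompleteFn = (prompt: string) => Promise<string>;

async function reviewContract(
  documentText: string,
  complete: CompleteFn
): Promise<ContractReview | null> {
  const prompt =
    "Return JSON with keys contractType, parties, riskFlags, summary " +
    "for this contract:\n\n" + documentText;
  return parseContractReview(await complete(prompt));
}
```

Validating the reply before returning it means the add-in only ever renders results that match the expected shape, and a null result can prompt a retry or a manual review.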

Use Case: Financial Data Extraction in Excel

Financial teams frequently receive data in documents — PDF invoices, bank statements, financial reports — that needs to be entered into Excel for analysis. AI-powered extraction automates this.

Implementation
  1. The user opens an Excel workbook and opens the add-in task pane.

  2. They drag a PDF invoice into the add-in (or select it from a document management system via WOPI).

  3. The AI service processes the PDF:

    • Extracts the document text (using OCR if necessary).

    • Identifies it as an invoice.

    • Extracts line items: description, quantity, unit price, total, tax.

    • Extracts header information: vendor name, invoice number, date, payment terms.

  4. The add-in presents the extracted data for review.

  5. The user confirms the extraction, and the add-in populates the Excel worksheet:

// Assumed shape of the extracted invoice, inferred from the fields used below.
interface InvoiceLineItem {
  description: string;
  quantity: number;
  unitPrice: number;
  total: number;
  tax: number;
}

interface InvoiceData {
  vendor: string;
  invoiceNumber: string;
  lineItems: InvoiceLineItem[];
}

async function populateInvoiceData(invoiceData: InvoiceData): Promise<void> {
  await Excel.run(async (context) => {
    const sheet = context.workbook.worksheets.getActiveWorksheet();

    // Write header information in a single range assignment
    sheet.getRange("A1:B2").values = [
      ["Vendor", invoiceData.vendor],
      ["Invoice Number", invoiceData.invoiceNumber],
    ];

    // Write the column headers and all line items as one block
    const startRow = 5;
    const rows = [
      ["Description", "Quantity", "Unit Price", "Total", "Tax"],
      ...invoiceData.lineItems.map((item) => [
        item.description,
        item.quantity,
        item.unitPrice,
        item.total,
        item.tax,
      ]),
    ];
    const endRow = startRow + rows.length - 1;
    sheet.getRange(`A${startRow}:E${endRow}`).values = rows;

    await context.sync();
  });
}
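Before the user confirms the extraction, the add-in could run a simple arithmetic check to highlight rows that deserve a closer look. This is a sketch under assumptions: the `ExtractedLineItem` name and the tolerance are hypothetical, and it assumes each line total excludes tax.

```typescript
// Hypothetical subset of a line item needed for the check.
interface ExtractedLineItem {
  description: string;
  quantity: number;
  unitPrice: number;
  total: number; // assumed to exclude tax
}

// Returns the indexes of line items whose total differs from
// quantity × unit price by more than the tolerance.
function findInconsistentLineItems(
  items: ExtractedLineItem[],
  tolerance = 0.01
): number[] {
  const flagged: number[] = [];
  items.forEach((item, index) => {
    const expected = item.quantity * item.unitPrice;
    if (Math.abs(expected - item.total) > tolerance) {
      flagged.push(index);
    }
  });
  return flagged;
}
```

Flagged rows could be highlighted in the review step so that OCR or extraction errors are caught before the data reaches the worksheet.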

Use Case: Email Triage and Response Drafting in Outlook

Customer-facing teams handle hundreds or thousands of emails daily. AI-powered triage and response drafting in Outlook transforms this workflow. The AI analyzes emails written in human language to understand intent and context.

  • Automatic categorization: AI sorts incoming emails by urgency, topic, or customer segment.

  • Suggested response: Generative AI applications create draft responses tailored to the email’s content, saving time and ensuring consistency.

  • Summaries and action items: Generative AI can also produce other content from an email thread, such as summaries or action-item lists, to further streamline communication.

  • Analysis: The AI recognizes complex patterns in email content, identifying trends, sentiment, and potential issues for escalation.

Implementation
  1. The user selects an email in Outlook.

  2. The add-in automatically analyses the email and displays:

    • Classification: Enquiry, complaint, order, technical support, billing.

    • Priority: Based on content analysis, sender history, and urgency signals.

    • Suggested response: A draft response tailored to the email’s content and classification.

    • Relevant information: Links to related customer records, previous correspondence, and knowledge base articles.

  3. The user reviews the suggested response, makes any necessary edits, and sends.

The Outlook add-in uses the Mail APIs to access the email content:

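function getEmailContent(item: Office.MessageRead): Promise<string> {
  return new Promise((resolve, reject) => {
    item.body.getAsync(Office.CoercionType.Text, (result) => {
      // Surface API failures instead of resolving with an undefined body
      if (result.status === Office.AsyncResultStatus.Succeeded) {
        resolve(result.value);
      } else {
        reject(new Error(result.error.message));
      }
    });
  });
}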

LLM Integration Patterns

The AI processing service can integrate with LLMs through several patterns, each with trade-offs. LLMs are deep neural networks, a machine learning architecture whose many layers extract hierarchical features from text. Training and running them is computationally intensive, which is a key factor in deciding where the model runs and how much of each document is sent to it.

Direct API Call

Send the document content directly to an LLM API (GPT-4, Claude, etc.) with instructions for the analysis task. This is the simplest pattern and works well for moderate-length documents.

Advantages: Simple implementation, no infrastructure to manage.

Limitations: Document length is limited by the model’s context window. Cost scales roughly linearly with document size.
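A rough pre-flight check for the context window limit might look like the sketch below. The four-characters-per-token figure is a common rule of thumb for English text, not a real tokenizer, and the function names and parameters are hypothetical.

```typescript
// Rough heuristic only: about four characters per token for English text.
// A production service should count tokens with the model's own tokenizer.
const APPROX_CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / APPROX_CHARS_PER_TOKEN);
}

// reservedTokens covers the instructions and the space left for the reply.
function fitsInContextWindow(
  documentText: string,
  contextWindowTokens: number,
  reservedTokens: number
): boolean {
  return estimateTokens(documentText) + reservedTokens <= contextWindowTokens;
}
```

When the check fails, the service can fall back to a chunked or retrieval-based pattern instead of sending a truncated document.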

Retrieval-Augmented Generation (RAG)

For large document collections or when the AI needs to reference external knowledge:

  1. Chunk the document into sections.

  2. Create embeddings for each section.

  3. Store embeddings in a vector database.

  4. When processing a query, retrieve relevant sections and include them in the LLM prompt.

Advantages: Handles very large documents and document collections. Can reference external knowledge bases.

Limitations: More complex infrastructure. Retrieval quality affects analysis quality.
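The four steps above can be sketched in memory as follows. This is a toy illustration: word counts stand in for a real embedding model, an array stands in for the vector database, and splitting on blank lines is only one possible chunking strategy. All names are hypothetical.

```typescript
type Vector = Map<string, number>;

// Toy stand-in for an embedding model: a bag-of-words count vector.
function embed(text: string): Vector {
  const vector: Vector = new Map();
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    vector.set(word, (vector.get(word) ?? 0) + 1);
  }
  return vector;
}

function cosineSimilarity(a: Vector, b: Vector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((value, key) => {
    dot += value * (b.get(key) ?? 0);
    normA += value * value;
  });
  b.forEach((value) => {
    normB += value * value;
  });
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Step 1: chunk the document, here by splitting on blank lines.
function chunkDocument(text: string): string[] {
  return text
    .split(/\n\s*\n/)
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length > 0);
}

// Steps 2 and 3: embed each chunk and keep it in an in-memory index.
interface IndexedChunk {
  text: string;
  embedding: Vector;
}

function buildIndex(chunks: string[]): IndexedChunk[] {
  return chunks.map((text) => ({ text, embedding: embed(text) }));
}

// Step 4: return the topK chunks most similar to the query.
function retrieve(index: IndexedChunk[], query: string, topK: number): string[] {
  const queryEmbedding = embed(query);
  return index
    .map((chunk) => ({
      text: chunk.text,
      score: cosineSimilarity(queryEmbedding, chunk.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((scored) => scored.text);
}
```

The retrieved chunks would then be placed in the LLM prompt alongside the user's question; swapping in a real embedding model and vector store changes `embed` and `buildIndex` but not the overall flow.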

Hybrid (On-Premises SLM + Cloud LLM)

For organisations with data privacy requirements:

  1. Use an on-premises small language model for initial classification and extraction of non-sensitive metadata.

  2. Escalate to a cloud LLM only for complex analysis tasks, with sensitive data redacted or anonymised.

Advantages: Balances privacy with capability. Most document processing stays on-premises.

Limitations: More complex architecture. Redaction must be thorough to be effective.
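A minimal sketch of the routing and redaction steps is shown below. The task names, target names, and the two regular expressions are illustrative assumptions; real redaction needs far broader coverage (names, addresses, account numbers) and is often handled by a dedicated service.

```typescript
type Task = "classification" | "metadata-extraction" | "complex-analysis";
type Target = "on-premises-slm" | "cloud-llm";

// Only complex analysis is escalated to the cloud model.
function chooseTarget(task: Task): Target {
  return task === "complex-analysis" ? "cloud-llm" : "on-premises-slm";
}

// Illustrative patterns only: email addresses and currency amounts.
const REDACTION_PATTERNS: Array<{ label: string; pattern: RegExp }> = [
  { label: "EMAIL", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { label: "AMOUNT", pattern: /[£$€]\s?\d[\d,]*(?:\.\d+)?/g },
];

// Replaces each match with a bracketed placeholder before text leaves the premises.
function redact(text: string): string {
  return REDACTION_PATTERNS.reduce(
    (current, { label, pattern }) => current.replace(pattern, `[${label}]`),
    text
  );
}
```

Placeholders such as `[AMOUNT]` keep the sentence structure intact, so the cloud model can still reason about the clause without seeing the underlying value.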

Algorithmic Bias and Mitigation in Document Intelligence

Identifying and Addressing Bias in AI-Powered Document Workflows

As AI systems become increasingly central to enterprise document intelligence, the challenge of algorithmic bias has come to the forefront. AI models, especially those used in document classification, extraction, and summarization, can inadvertently reflect and amplify biases present in their training data or underlying algorithms. For organizations relying on AI tools to analyze data, make hiring decisions, or automate repetitive tasks, unchecked bias can lead to unfair outcomes, regulatory risks, and a loss of trust.

AI researchers and software developers have identified several sources of bias in document intelligence workflows. These include imbalances in training data, the use of non-representative examples, and the influence of human biases during data labeling or model design. For example, if an AI agent is trained primarily on contracts from a single industry or region, it may misclassify documents from other sectors, leading to skewed results.

To address these challenges, the field of computer science has developed a range of techniques for bias mitigation. Data preprocessing is a foundational step, involving the careful cleaning and balancing of training data to ensure it accurately represents the diversity of real-world documents. Feature engineering allows AI models to focus on relevant attributes while minimizing the impact of potentially biased features. Model selection is also critical—choosing deep learning architectures or generative AI tools that are robust against bias and can adapt to new data and other tasks.

Explainability is another key pillar. By designing AI systems that provide transparent insights into their decision-making processes, organizations can identify and address potential biases before they impact business outcomes. For instance, Office add-ins developed for Word or Excel can include features that highlight why a document was classified a certain way, or flag sections where algorithmic bias may have influenced the analysis.

Advanced AI research continues to push the boundaries of bias mitigation. Techniques such as adversarial training help AI models become more resilient to biased inputs, while transfer learning enables the use of pre-trained models that can be fine-tuned for specific document intelligence tasks, although fine-tuned models can inherit bias from their pre-training data and still need evaluation. Ensemble methods, which combine the outputs of multiple AI models, can further enhance fairness and accuracy by balancing out individual model weaknesses.

In practical terms, integrating these bias mitigation strategies into enterprise document workflows is essential. For example, an AI-powered Office add-in can show the evidence behind each classification or extraction in its task pane and route low-confidence or flagged results to a human reviewer, so that potential bias is caught before it affects a business decision.

Ultimately, building trustworthy AI-powered document intelligence requires a multidisciplinary approach. By combining advances in deep learning, natural language processing, and knowledge representation with domain expertise and rigorous testing, organizations can develop AI applications that are not only powerful but also fair, transparent, and explainable. This commitment to fairness is essential for unlocking the full potential of AI in document intelligence and for ensuring that AI systems deliver accurate, fair results across the enterprise.

The McKenna Advantage

What makes McKenna Consultants uniquely positioned for AI document intelligence is the combination of three deep specialisms:

  • WOPI expertise: We understand how to build robust, scalable document access through the WOPI protocol, including authentication, lock management, and coauthoring.

  • Office add-in development: We have built dozens of enterprise Office add-ins across Word, Excel, and Outlook, with deep knowledge of the JavaScript APIs, deployment models, and user experience patterns.

  • AI implementation: We design and build AI processing pipelines using LLMs, including RAG architecture, prompt engineering, and model selection.

No single-competence consultancy can deliver the complete AI document intelligence solution. A WOPI specialist cannot build the add-in UI. An add-in developer cannot design the AI pipeline. An AI consultancy cannot integrate with the document access layer.

If you are exploring AI-powered document intelligence for your enterprise, contact us to discuss how we can help.
