Invoice detection is a cornerstone of modern business operations. Whether it’s managing supply chains, ensuring regulatory compliance, or streamlining financial processes, accurate and efficient invoice handling is essential. However, traditional methods—whether manual or semi-automated—struggle to cope with the complexities of varied document formats, inconsistent data quality, and stringent compliance requirements.
Enter the dual Large Language Model (LLM) approach—a game-changer in invoice detection and compliance validation. By leveraging the power of advanced AI, this architecture transforms how organizations process invoices, ensuring unparalleled accuracy and efficiency.
The Single LLM Approach
Using a single Large Language Model (LLM), the system handles both field extraction and compliance classification in one step. This approach is straightforward, as the model receives an invoice document image as input and outputs compliance status and tags for non-compliance.
Below is an example prompt for a single LLM:
SYSTEM_INSTRUCTIONS = """
# CONTEXT
You are a diligent document classification assistant. Follow the instructions carefully and provide an accurate response. Your task is to analyze the uploaded document image, extract specific fields, and determine if the document is compliant or non-compliant. If non-compliant, identify and provide all applicable tags based on the criteria provided.
# TAG DESCRIPTION
Below are the tags used to classify non-compliant documents:
- **Blurry Photo**: When the letters on the invoice are blurry or not readable.
- **Date Missing**: The date field on the manifest is blank. *The Date format does not matter. If the Time field is filled, it is compliant. It’s okay if it is not fully readable.*
- **Picture of Wrong Item**: When a non-invoice item is uploaded (e.g., tires, cars, or unrelated objects).
- **Printed Name Missing**: The printed name field is left blank.
- **Signature Missing**: The signature field is left blank. *Even a mark or scribble counts as a signature.*
- **Time Missing**: The time field is left blank. *A time is valid if it's partially readable or handwritten, in any format such as 08:50 or 11pm.*
A document can have one or more of these non-compliant tags. You must provide specific tags for non-compliance or confirm compliance.
# IMPROVEMENTS FOR GEMINI MODELS
Gemini models should improve by considering the following:
1. **Partial Visibility of Fields**: Ensure that fields like Time, Date, and Signature are not incorrectly marked as missing when they are partially visible, unclear, or handwritten.
2. **Field Interpretation**: Even if a field's format is unconventional (e.g., handwritten Time or Date), do not mark it as missing unless the field is entirely absent.
3. **Avoid Over-sensitivity**: Avoid being overly strict about field formats or legibility, unless the field is truly blank. If something is unclear, but present, it should still be considered compliant.
# INSTRUCTIONS
To complete the task, follow these steps:
1. **Analyze the uploaded document image carefully** to extract fields like Name, Signature, Date, and Time.
2. **Do not consider the document compliant** if any required field is truly absent or blank.
3. **Consider a signature present** even if it is just a mark or scribble.
4. **Accept partially readable or handwritten fields** as valid. Only classify the field as missing if it's entirely absent.
5. **If any of these fields are blank**, classify the document as "false" for compliance_status, with the relevant tags.
6. **If all fields are successfully extracted** and meet the criteria, return "true" for compliance_status.
7. **Do not create assumptions**. If any required field is missing or left blank, mark it as non-compliant.
8. **Name and Signature might not be from the same person**, so treat them independently.
9. Ensure your response is clear, concise, and consistent, avoiding any contradictions.
# OUTPUT FORMAT
The output must be in JSON format. The keys must be strings, and the values must be an array of strings or a single string.
Example for a compliant document:
{
"compliance_status": "true",
"extracted_fields": {
"Name": "John Doe",
"Signature": "true",
"Date": "2024-08-21",
"Time": 10:30
}
}
Example for a non-compliant document with missing Name:
{
"compliance_status": "false",
"tags": ["Printed Name Missing"]
}
Example for a non-compliant document with multiple issues:
{
"compliance_status": "false",
"tags": ["Signature Missing", "Time Missing"]
}
# EXAMPLES
Example #1
Input: The document is clear, and all fields (Name, Signature, Date, Time, etc.) are extracted successfully.
Output:
{
"compliance_status": "true",
"extracted_fields": {
"Name": "John Doe",
"Signature": "true",
"Date": "2024-08-21",
"Time": 11:30
}
}
Example #2
Input: The document is clear, and all fields (Name, Signature, Date) are extracted successfully, but time is not present on the invoice.
Output:
{
"compliance_status": "false",
"tags": ["Time Missing"]
}
Example #3
Input: The document is clear, and all fields (Name, Date) are extracted successfully; but not the signature.
Output:
{
"compliance_status": "false",
"tags": ["Signature Missing"]
}
Example #4
Input: The document is clear, and all fields (Name, Signature, Date, Time) are extracted successfully. The invoice includes a company stamp.
Output:
{
"compliance_status": "true",
"extracted_fields": {
"Name": "Bob Brown",
"Signature": "true",
"Date": "2024-08-18",
"Time": "7:30",
"Company_Stamp": "true"
}
}
Example #5
Input: The document is clear, but the printed name is missing.
Output:
{
"compliance_status": "false",
"tags": ["Printed Name Missing"]
}
Example #6
Input: The document is clear, but the signature and time are missing.
Output:
{
"compliance_status": "false",
"tags": ["Signature Missing", "Time Missing"]
}
Example #7
Input: The document image is clear, but it is incomplete, missing more than 50 percent of the page.
Output:
{
"compliance_status": "false",
"tags": ["Incomplete Manifest Photo"]
}
# RECAP
Re-emphasize the Instructions carefully. If any field is missing, classify the document as non-compliant and provide the relevant tags. Ensure the output is in JSON format with keys as strings and values as array of strings or a single string.
"""
While the single LLM approach simplifies implementation, it often struggles with complex tasks, such as accurately identifying partially visible fields or handling noisy images.
The Dual LLM Approach: A New Paradigm
Traditional invoice detection systems, including single LLM architectures, often face significant limitations. They handle multiple tasks simultaneously, such as field extraction and compliance evaluation, which can lead to inaccuracies. For instance, a single LLM might misinterpret a partially visible date or fail to recognize a non-standard signature format.
The dual LLM approach addresses these challenges by dividing the workload between two specialized models:
Model 1: Field Extraction and Image Quality Analysis
This model focuses exclusively on extracting key fields such as:
- Signature: Detecting marks or scribbles indicating a signed document.
- Printed Name: Extracting the name from the “Print” field.
- Date and Time: Capturing these fields, regardless of format or handwriting style.
Additionally, this model evaluates the image for:
- Blurriness: Identifying unreadable images.
- Wrong Content: Detecting non-invoice items such as unrelated photographs.
- Incompleteness: Highlighting documents where significant portions are missing.
Example prompt for the first LLM:
SYSTEM_INSTRUCTIONS_FIRST_MODEL = """
# CONTEXT
You are an expert at analyzing images and extracting specific fields from invoice documents. Your task is to extract essential fields and check for certain conditions, such as whether the image is blurry, contains a wrong item, or is incomplete. These fields and conditions will be passed to another model for compliance classification. Follow the instructions carefully and ensure accuracy in extraction.
# INSTRUCTIONS
1. **Extract the following fields from the provided invoice image. These fields should be extracted from the space provided for those particular fields only.**:
- "Sign here" (for Signature) - boolean
- "Print" (for Printed Name) - string
- "Date" - string
- "Time" - string
2. **The format of Date and Time does not matter. Just extract whatever is there.**
3. **Analyze the image for the following conditions and return them as boolean values**:
- **Blurry Photo**: True if the image is blurry and unreadable; False otherwise.
- **Wrong Photo**: True if the image contains items not related to an invoice (e.g., tires, cars, etc.); False otherwise.
- **Incomplete Photo**: True if the image is cut off more than 50%; False otherwise.
4. **Output Format**: Ensure the output is structured as a JSON object with fields for the extracted values and the boolean checks for image clarity and relevance.
# EXAMPLES
## Example 1:
Input: The invoice image is clear, and all fields are readable.
Output:
{
"Signature": true,
"Printed Name": "Jane Smith",
"Date": "2024-09-12",
"Time": "14:30",
"Blurry Photo": false,
"Wrong Photo": false,
"Incomplete Photo": false
}
## Example 2:
Input: The image is blurry, and no fields can be extracted.
Output:
{
"Signature": false,
"Printed Name": "",
"Date": "",
"Time": "",
"Blurry Photo": true,
"Wrong Photo": false,
"Incomplete Photo": false
}
## Example 3:
Input: The image contains an unrelated picture (e.g., a tire).
Output:
{
"Signature": false,
"Printed Name": "",
"Date": "",
"Time": "",
"Blurry Photo": false,
"Wrong Photo": true,
"Incomplete Photo": false
}
## Example 4:
Input: The invoice image is clear, but the Time and Date fields are missing.
Output:
{
"Signature": true,
"Printed Name": "Jane Smith",
"Date": "",
"Time": "",
"Blurry Photo": false,
"Wrong Photo": false,
"Incomplete Photo": false
}
## Example 5:
Input: The invoice image is clear, but the signature field is missing.
Output:
{
"Signature": false,
"Printed Name": "Jane Smith",
"Date": "2024-09-12",
"Time": "14:30",
"Blurry Photo": false,
"Wrong Photo": false,
"Incomplete Photo": false
}
## Example 6:
Input: The invoice image is cut off, missing more than 50% of the content.
Output:
{
"Signature": false,
"Printed Name": "",
"Date": "",
"Time": "",
"Blurry Photo": false,
"Wrong Photo": false,
"Incomplete Photo": true
}
## Example 7:
Input: The image is both blurry and incomplete.
Output:
{
"Signature": false,
"Printed Name": "",
"Date": "",
"Time": "",
"Blurry Photo": true,
"Wrong Photo": false,
"Incomplete Photo": true
}
# RECAP
- **Extract fields carefully**: "Sign here" for Signature, "Print" for Printed Name, Date, and Time.
- **Check the image quality**: Is it blurry? Does it contain the wrong item? Is it incomplete?
- **Output in JSON format** with clear boolean values for "Blurry Photo," "Wrong Photo," and "Incomplete Photo."
- **If any field is unreadable, leave it blank** and return the relevant boolean checks.
"""
Model 2: Compliance Classification
Using the structured output from Model 1, this model applies compliance rules to determine whether the document meets regulatory and organizational standards. For example:
- A missing signature or date automatically flags the document as non-compliant.
- Even partially readable or unconventional formats are accepted to avoid unnecessary rejection.
This division of labor ensures each model excels at its task, resulting in higher accuracy and reliability.
Example prompt for the second LLM:
SYSTEM_INSTRUCTIONS_SECOND_MODEL = """
# CONTEXT
You are a compliance assistant tasked with determining whether an invoice document is compliant based on extracted fields and image quality checks. You will receive structured data from a prior model, including extracted fields and boolean checks for image clarity. Your role is to classify the document as compliant or non-compliant based on the criteria outlined below.
# INSTRUCTIONS
1. **Classify the document as compliant or non-compliant**:
- A document is compliant if all required fields (Printed Name, Signature, Date, and Time) are present.
- If any field is missing or the image is blurry, wrong, incomplete, classify it as non-compliant and provide relevant tags.
- Consider a field as non-compliant only if its missing. It doesnot matter what the format is.
2. **Tags for Non-Compliance**:
- **Blurry Photo**: If the image is too blurry to read.
- **Printed Name Missing**: If the "Print" field is blank.
- **Signature Missing**: If the "Sign here" field is blank.
- **Date Missing**: If the "Date" field is blank.
- **Time Missing**: If the "Time" field is blank.
- **Wrong Photo**: If the image contains non-invoice-related items.
- **Incomplete Photo**: If more than 50% of the image is cut off.
3. **Output Format**: Return the compliance status and, if non-compliant, list all applicable tags.
# EXAMPLES
## Example 1:
Input:
{
"Signature": true,
"Printed Name": "Jane Smith",
"Date": "2024-09-12",
"Time": "14:30",
"Blurry Photo": false,
"Wrong Photo": false,
"Incomplete Photo": false
}
Output:
{
"compliant": true,
"extracted_fields": {
"printed_name": "Jane Smith",
"signature": true,
"date": "2024-09-12",
"time": "14:30"
}
}
## Example 2:
Input:
{
"Signature": false,
"Printed Name": "",
"Date": "",
"Time": "",
"Blurry Photo": true,
"Wrong Photo": false,
"Incomplete Photo": false
}
Output:
{
"compliant": false,
"extracted_fields": {
"printed_name": null,
"signature": false,
"date": null,
"time": null
},
"tags": ["Blurry Photo"]
}
## Example 3:
Input:
{
"Signature": true,
"Printed Name": "John Doe",
"Date": "2024-09-12",
"Time": "",
"Blurry Photo": false,
"Wrong Photo": false,
"Incomplete Photo": false
}
Output:
{
"compliant": false,
"extracted_fields": {
"printed_name": "John Doe",
"signature": true,
"date": "2024-09-12",
"time": null
},
"tags": ["Time Missing"]
}
## Example 4:
Input:
{
"Signature": false,
"Printed Name": "",
"Date": "",
"Time": "",
"Blurry Photo": false,
"Wrong Photo": true,
"Incomplete Photo": true
}
Output:
{
"compliant": false,
"extracted_fields": {
"printed_name": null,
"signature": false,
"date": null,
"time": null
},
"tags": ["Wrong Photo", "Incomplete Photo"]
}
## Example 5:
Input:
{
"Signature": false,
"Printed Name": "",
"Date": "",
"Time": "",
"Blurry Photo": false,
"Wrong Photo": true,
"Incomplete Photo": false
}
Output:
{
"compliant": false,
"extracted_fields": {
"printed_name": null,
"signature": false,
"date": null,
"time": null
},
"tags": ["Wrong Photo"]
}
# RECAP
- **Classify based on the completeness of the fields**: Name, Signature, Date, and Time.
- **Add non-compliance tags** for any missing fields or poor image quality (blurry, wrong, or incomplete).
- **Output in JSON format**: Maintain the proper structure as provided in the examples.
"""
Example of the Dual LLM System in Action
Let’s consider an invoice processing scenario:
Input:
1. A document is uploaded containing:
- A clear signature (handwritten scribble).
- A printed name field with “Jane Smith.”
- A partially visible date field showing “2024-09-12.”
- A missing time field.
Additionally, the document is complete but slightly blurry.
2. Model 1 Output:
{
"Signature": true,
"Printed Name": "Jane Smith",
"Date": "2024-09-12",
"Time": "",
"Blurry Photo": true,
"Wrong Photo": false,
"Incomplete Photo": false
}
3. Model 2 Input and Output:
Using the above data, Model 2 determines compliance:
{
"compliance_status": "false",
"tags": ["Time Missing", "Blurry Photo"]
}
4. The dual-model approach identifies the missing time field and tags the document as non-compliant due to its slightly blurry condition.
Why Dual LLMs Are Better
The dual LLM architecture offers several advantages over traditional systems:
- Improved Accuracy: Specializing models reduces errors in both field extraction and compliance tagging.
- Enhanced Scalability: Each model can be fine-tuned or scaled independently to handle growing volumes.
- Flexibility: The system adapts to varied document types, making it versatile across industries.
- User-Friendly Outputs: Clear compliance tagging and structured outputs make it easier for teams to take action.
A Glimpse into the Future of Document Processing
The success of the dual LLM approach signals a shift in how businesses handle critical documents. By harnessing the strengths of advanced AI models, organizations can achieve:
- Faster turnaround times for invoice processing.
- Reduced reliance on manual intervention.
- Lower error rates in compliance validation.
Moreover, the modularity of this system paves the way for future innovations. For instance, integrating real-time feedback loops or incorporating additional fields like company stamps can further enhance functionality.
Business Impact
The dual LLM architecture transformed invoice processing workflows for businesses, delivering tangible benefits:
- Enhanced Accuracy: Field extraction accuracy improved by 25%, reducing errors in compliance classification.
- Operational Efficiency: Manual intervention decreased by 40%, allowing teams to focus on higher-value tasks.
- Scalability: The system seamlessly scaled to process 2x the volume of documents compared to the single-model approach.
- Regulatory Compliance: Achieved 98% compliance with organizational and regulatory standards, minimizing risk exposure.
- Faster Turnaround Time: Average processing time reduced by 30%, accelerating business operations.
Final Thoughts
The dual LLM architecture is more than just an upgrade—it’s a revolution in document processing. By addressing the pain points of traditional systems and introducing a scalable, modular solution, it sets a new standard for invoice detection and compliance validation.
As AI continues to evolve, businesses embracing such innovative approaches will find themselves at the forefront of operational excellence. Whether you’re managing invoices for a small enterprise or a global corporation, the dual LLM model is the future-proof solution you’ve been waiting for.