This repository covers supplemental material for Lab 6 of the course. Document processing can surface information from across files and images by parsing, extracting and splitting data. In Lab 5 of the course, you built a local pipeline for document processing with LandingAI. Here you will learn to build the same pipeline in the cloud with AWS. In particular, you will build a chatbot for conducting deep research with several AWS services. Those services include Lambda, S3, IAM, and Bedrock. To learn more about cloud computing on AWS, please check out the following resources:
- Documentation
- Libraries
The pipeline consists of three components:
- S3 Bucket: Stores uploaded PDF documents
- Lambda Function: Automatically triggered on file upload to S3
- LandingAI ADE:
  - Processes documents and extracts chunks with bounding boxes
  - Creates individual JSON files for each document chunk
- Storage:
  - output/medical/: Markdown files
  - output/medical_grounding/: Grounding data with bounding boxes
  - output/medical_chunks/: Individual chunk JSON files for Knowledge Base
  - output/medical_chunk_images/: Dynamically generated cropped chunk images
- AWS Bedrock Knowledge Base: Indexes individual chunk JSON files
- Metadata: Maintains chunk type, page number, and bounding box coordinates
- Strands Agent Framework: Orchestrates conversation flow
- Bedrock Memory Service: Maintains conversation context
- Visual Grounding:
  - Extracts and crops specific chunk regions from PDFs
  - Adds red border highlighting around chunks
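The chunk metadata and visual grounding items above can be sketched as follows. This is a minimal illustration rather than the lab's implementation: the field names (`chunk_id`, `chunk_type`, `page`, `box`) and the helper `box_to_pixels` are hypothetical, and the sketch assumes each bounding box is stored as fractions of the page width and height.

```python
import json


def box_to_pixels(box, page_width, page_height, padding=0):
    """Convert a normalized box (fractions of the page) to integer pixel bounds.

    `box` is assumed to hold left/top/right/bottom values between 0 and 1.
    The result is clamped to the page so a padded crop never leaves the image.
    """
    left = max(0, round(box["left"] * page_width) - padding)
    top = max(0, round(box["top"] * page_height) - padding)
    right = min(page_width, round(box["right"] * page_width) + padding)
    bottom = min(page_height, round(box["bottom"] * page_height) + padding)
    return left, top, right, bottom


# Hypothetical chunk record carrying the metadata the Knowledge Base keeps.
chunk = {
    "chunk_id": "example-chunk-0",
    "text": "Example chunk text.",
    "chunk_type": "text",
    "page": 0,
    "box": {"left": 0.25, "top": 0.5, "right": 0.75, "bottom": 0.75},
}

# One JSON document per chunk, ready to be written under output/medical_chunks/.
chunk_json = json.dumps(chunk, indent=2)

# Pixel bounds for a page rendered at 1000 x 2000 pixels, with 5 pixels of padding.
crop_box = box_to_pixels(chunk["box"], page_width=1000, page_height=2000, padding=5)
```

The returned tuple uses the (left, upper, right, lower) order that Pillow's `Image.crop` expects, and a red border could then be drawn with `ImageDraw.rectangle(..., outline="red")`.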
To replicate the lab, you must configure your own AWS account.
- Python
  - Use version 3.10
- OS
  - x86_64 is recommended
- AWS
  - Get an AWS account with permissions for the following services:
    - Lambda
    - S3
    - IAM
    - Bedrock
    - CloudWatch Logs
  - In your account, set up the following resources:
    - S3 Bucket
    - Bedrock Knowledge Base
- LandingAI
  - Vision Agent API Key
  - Remember that you can create a free account at LandingAI.
sc-landingai/
├── L6.ipynb # Main lab notebook
├── ade_s3_handler.py # Lambda function for document processing
├── lambda_helpers.py # Helper functions for Lambda deployment
├── visual_grounding_helper.py # Functions for creating cropped chunk images
├── medical/ # Sample medical PDF documents
│ ├── Common_cold_clinincal_evidence.pdf
│ ├── CT_Study_of_the_Common_Cold.pdf
│ ├── Evaluation_of_echinacea_for_the_prevention_and_treatment_of_the_common_cold.pdf
│ ├── Prevention_and_treatment_of_the_common_cold.pdf
│ ├── The_common_cold_a_review_of_the_literature.pdf
│ ├── Understanding_the_symptoms_of_the_common_cold_and_influenza.pdf
│ ├── Viruses_and_Bacteria_in_the_Etiology_of_the_Common_Cold.pdf
│ └── Vitamin_C_for_Preventing_and_Treating_the_Common_Cold.pdf
└── README.md # This file
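ade_s3_handler.py is listed above as the Lambda function for document processing. A handler of this kind usually starts by reading the bucket and object key from the S3 event that triggered it. The sketch below illustrates that first step only; the function names and the `input/` and `.pdf` filters are assumptions, not the contents of the actual file.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record.get("s3", {})
        bucket = s3_info.get("bucket", {}).get("name")
        key = s3_info.get("object", {}).get("key")
        if bucket and key:
            # S3 URL-encodes keys in event payloads (a space arrives as "+").
            objects.append((bucket, unquote_plus(key)))
    return objects


def handler(event, context):
    """Hypothetical Lambda entry point: keep only PDFs uploaded under input/."""
    pdf_keys = [
        key
        for _, key in extract_s3_objects(event)
        if key.startswith("input/") and key.lower().endswith(".pdf")
    ]
    # Document parsing and writing to output/ would happen here.
    return {"statusCode": 200, "processed": pdf_keys}


# Trimmed-down example of the payload S3 sends to Lambda.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "your-bucket-name"},
                "object": {"key": "input/medical/CT+Study.pdf"}}},
        {"s3": {"bucket": {"name": "your-bucket-name"},
                "object": {"key": "output/medical/CT_Study.md"}}},
    ]
}
result = handler(sample_event, None)
```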
- Make two folders in your S3 bucket called input/ and output/
- Connect the Bedrock Knowledge Base to the folder
Create a .env file with your credentials:
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=us-west-2
S3_BUCKET=your-bucket-name
VISION_AGENT_API_KEY=your_landingai_api_key
BEDROCK_MODEL_ID=us.anthropic.claude-sonnet-4-5-20250929-v1:0
BEDROCK_KB_ID=your_knowledge_base_id
Install the required packages:
pip install boto3 python-dotenv Pillow PyMuPDF landingai-ade typing-extensions
pip install bedrock-agentcore strands-agents pandas
Open Lab-6.ipynb in Jupyter and follow the step-by-step instructions to:
- Deploy the Lambda function
- Set up S3 triggers
- Process medical documents (creates chunks automatically)
- Configure Bedrock Knowledge Base to index output/medical_chunks/
- Test chunk-based search with search_medical_chunks()
- Launch the interactive chatbot
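For the S3 trigger step, the notebook and lambda_helpers.py handle the setup. As a rough sketch of what such a trigger looks like, the hypothetical helper below builds the notification configuration that S3 accepts. The ARN is a placeholder, and the `input/` prefix and `.pdf` suffix filters are assumptions based on the folder layout described above.

```python
def build_s3_trigger_config(lambda_arn, prefix="input/", suffix=".pdf"):
    """Build an S3 notification configuration that invokes a Lambda on new objects."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            {"Name": "prefix", "Value": prefix},
                            {"Name": "suffix", "Value": suffix},
                        ]
                    }
                },
            }
        ]
    }


# Placeholder ARN for illustration only.
config = build_s3_trigger_config(
    "arn:aws:lambda:us-west-2:123456789012:function:example-function"
)

# Applying it (requires boto3, credentials, and permission for S3 to invoke the function):
#   import boto3
#   boto3.client("s3").put_bucket_notification_configuration(
#       Bucket="your-bucket-name", NotificationConfiguration=config)
```

Restricting the trigger to the input/ prefix matters when the function writes its results to output/ in the same bucket, since an unfiltered trigger would fire again on every file the function itself creates.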
Monitor Lambda execution in AWS CloudWatch:
- Processing status for each document
- Error messages and stack traces
- Performance metrics and duration
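The same information can be pulled programmatically from CloudWatch Logs. The sketch below assumes a boto3 `logs` client and the default Lambda log group name `/aws/lambda/<function name>`. `recent_error_messages` is a hypothetical helper, not part of the lab code, and the stub class only lets the example run without AWS.

```python
def recent_error_messages(logs_client, function_name, limit=20):
    """Return messages of log events containing 'ERROR' for one Lambda function."""
    response = logs_client.filter_log_events(
        logGroupName=f"/aws/lambda/{function_name}",
        filterPattern="ERROR",
        limit=limit,
    )
    return [event["message"].rstrip() for event in response.get("events", [])]


class _StubLogs:
    """Stand-in for boto3.client("logs") returning one canned event."""

    def filter_log_events(self, **kwargs):
        self.last_call = kwargs
        return {"events": [{"message": "ERROR: could not parse document\n"}]}


stub = _StubLogs()
errors = recent_error_messages(stub, "example-function")

# With a real client (requires boto3 and AWS credentials):
#   import boto3
#   recent_error_messages(boto3.client("logs"), "your-function-name")
```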
Check processed outputs:
# List all processed files
stats = monitor_lambda_processing(logs_client, s3_client, bucket_name)
Verify document ingestion:
response = bedrock_agent.start_ingestion_job(
    knowledgeBaseId=BEDROCK_KB_ID,
    dataSourceId=DATA_SOURCE_ID
)
- Lambda Timeout: Increase timeout in deployment (default: 900s, which is also the maximum Lambda allows)
- Memory Errors: Increase Lambda memory (default: 1024MB)
- IAM Permissions: Ensure role has S3 and CloudWatch access
- Python Version Mismatch: Use Python 3.10 for compatibility
- Knowledge Base Not Found: Verify KB ID and region settings
# Check Lambda logs
monitor_lambda_processing(logs_client, s3_client, bucket)
# Verify S3 outputs
s3_client.list_objects_v2(Bucket=bucket, Prefix='output/')
# Test chunk-based search
results = search_medical_chunks("test query", s3_client, bucket)
# Test knowledge base search
test_result = search_knowledge_base("test query")
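A single `list_objects_v2` call returns at most 1,000 keys, so a bucket with many chunk files needs pagination to list everything under output/. The sketch below follows continuation tokens and counts files per output folder. `count_outputs_by_folder` is a hypothetical helper, and the stub client returns two canned pages so the example runs without AWS.

```python
from collections import Counter


def count_outputs_by_folder(s3_client, bucket, prefix="output/"):
    """Count objects under `prefix`, grouped by the first folder below it."""
    counts = Counter()
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        page = s3_client.list_objects_v2(**kwargs)
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip zero-byte folder placeholder objects
            parts = key[len(prefix):].split("/", 1)
            folder = parts[0] if len(parts) > 1 else ""
            counts[folder] += 1
        if not page.get("IsTruncated"):
            break
        kwargs["ContinuationToken"] = page["NextContinuationToken"]
    return dict(counts)


class _StubS3:
    """Stand-in for boto3.client("s3") that serves two canned result pages."""

    pages = [
        {"Contents": [{"Key": "output/medical/a.md"},
                      {"Key": "output/medical_chunks/a_0.json"}],
         "IsTruncated": True, "NextContinuationToken": "1"},
        {"Contents": [{"Key": "output/medical_chunks/a_1.json"}],
         "IsTruncated": False},
    ]

    def list_objects_v2(self, **kwargs):
        return self.pages[int(kwargs.get("ContinuationToken", "0"))]


counts = count_outputs_by_folder(_StubS3(), "your-bucket-name")
```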