AWS Lambda Demo on File Processing

VAIBHAV KURKUTE
3 min read · May 21, 2023

A) Resource Creation — Lambda

i)

ii)

iii)

iv) Create a Role for S3 Write Access

v) Create a New Policy that allows only the specific S3 operations needed
(Under Resource, select the specific S3 bucket and provide its ARN)
(Also add the "AWSLambdaBasicExecutionRole" policy for CloudWatch logs)
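As a rough illustration of the policy in step v, the sketch below builds a policy document that allows only `s3:PutObject` on one bucket. The bucket name `my-demo-bucket`, the `files/` prefix and the helper name `build_s3_write_policy` are assumptions made for this example, not values from the original setup; adjust the actions to the operations your function really performs.

```python
import json


def build_s3_write_policy(bucket_name, prefix="files/"):
    """Return an IAM policy document (as a dict) limited to writing
    objects under one prefix of one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                # Object-level ARN: bucket name, then the key pattern
                "Resource": f"arn:aws:s3:::{bucket_name}/{prefix}*",
            }
        ],
    }


if __name__ == "__main__":
    # Print JSON that can be pasted into the JSON tab of the policy editor
    print(json.dumps(build_s3_write_policy("my-demo-bucket"), indent=2))
```

If the function calls other services as well (the conversion code in this post calls Textract), the role would need matching permissions for those too.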

vi) Add Policy to Role and Then Role to Lambda
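The same wiring can also be scripted instead of clicked through the console. This is only a sketch under assumptions: the function names, the inline policy name `s3-write-access` and the idea of passing in the clients are mine, and `iam` / `lambda_client` are expected to be boto3 clients created by the caller (for example `boto3.client("iam")` and `boto3.client("lambda")`).

```python
import json

# Trust policy that lets the Lambda service assume the role
LAMBDA_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# AWS managed policy that allows writing logs to CloudWatch
BASIC_EXECUTION_ARN = (
    "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
)


def create_lambda_role(iam, role_name, s3_policy):
    """Create the role, add both policies, and return the role ARN."""
    role = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(LAMBDA_TRUST_POLICY),
    )
    # Inline policy carrying the bucket-specific S3 permissions
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName="s3-write-access",  # hypothetical policy name
        PolicyDocument=json.dumps(s3_policy),
    )
    iam.attach_role_policy(RoleName=role_name, PolicyArn=BASIC_EXECUTION_ARN)
    return role["Role"]["Arn"]


def assign_role_to_function(lambda_client, function_name, role_arn):
    """Point an existing Lambda function at the role."""
    lambda_client.update_function_configuration(
        FunctionName=function_name, Role=role_arn
    )
```

Passing the clients in as arguments keeps the helpers easy to try out with a stub before running them against a real account.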

vii) After the Lambda is created, add a trigger
(You will get the URL under the trigger section of the Lambda Configuration)

viii) Here is sample code to check that file upload works and a file is present

def lambda_handler(event, context):
    # Check if a file is present in the event
    if 'file' not in event:
        return {
            # 400 (Bad Request) rather than 301, which signals a redirect
            'statusCode': 400,
            'body': 'No file present in the request.'
        }

    file = event['file']
    # Extract the original file name and extension
    # (assumes the file name contains at least one dot)
    filename, extension = file['filename'].rsplit('.', 1)

    return {
        'statusCode': 200,
        'body': 'File extension: {}'.format(extension)
    }
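Before deploying, the handler can be exercised locally with hand-made events. The sketch below is a hypothetical variant, not the code above verbatim: it uses `os.path.splitext` so that names without a dot do not raise an error, and it answers a missing file with status 400. The event shape (`{'file': {'filename': ...}}`) is assumed from the sample; a real trigger may wrap the payload differently.

```python
import os


def lambda_handler(event, context):
    # Reject events that carry no file entry
    if "file" not in event:
        return {"statusCode": 400, "body": "No file present in the request."}

    # splitext returns an empty extension when the name has no dot
    _, extension = os.path.splitext(event["file"]["filename"])
    return {
        "statusCode": 200,
        "body": "File extension: {}".format(extension.lstrip(".")),
    }


if __name__ == "__main__":
    # Simulated events for a quick local check
    print(lambda_handler({"file": {"filename": "report.docx"}}, None))
    print(lambda_handler({"file": {"filename": "README"}}, None))
    print(lambda_handler({}, None))
```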

ix) Here is the code for Doc to PDF conversion
(Replace <>BUCKET_NAME_HERE<> with your bucket name)

import boto3
from flask import Flask, request, jsonify

app = Flask(__name__)
s3 = boto3.client('s3')
textract_client = boto3.client('textract')

MAX_FILE_SIZE = 5 * 1024 * 1024  # 5 MB


@app.route('/', methods=['POST'])
def process_document():
    # Check if a file is present in the request
    if 'file' not in request.files:
        return jsonify({'success': False, 'message': 'No file present in the request.'}), 400

    file = request.files['file']

    # Read the file contents and extract the original file name
    file_data = file.read()
    filename = file.filename

    # Check the size on the bytes actually read; content_length is often 0 for uploads
    if len(file_data) > MAX_FILE_SIZE:
        return jsonify({'success': False, 'message': 'File size exceeds the limit of 5 MB.'}), 413

    try:
        # Extract the text with the AWS Textract service.
        # detect_document_text is synchronous and accepts raw bytes (images or PDF);
        # start_document_text_detection only reads documents already stored in S3,
        # and the Textract client provides no waiters.
        result = textract_client.detect_document_text(
            Document={'Bytes': file_data}
        )

        # Collect the text of every LINE block in the Textract response
        text = ""
        for item in result['Blocks']:
            if item['BlockType'] == 'LINE':
                text += item['Text'] + "\n"

        # NOTE: this is the extracted text encoded as UTF-8, not a real PDF;
        # producing an actual PDF file requires a PDF library
        pdf_data = text.encode('utf-8')

        # Save the result to the S3 bucket under the original file name
        s3.put_object(
            Body=pdf_data,
            Bucket='<>BUCKET_NAME_HERE<>',
            Key='files/' + filename
        )

        return jsonify({'success': True, 'message': 'File uploaded and converted successfully.'})
    except Exception as e:
        return jsonify({'success': False, 'message': str(e)}), 500


if __name__ == '__main__':
    app.run()
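The loop that collects LINE blocks from the Textract response can be pulled into a small function and checked without calling AWS. This is a minimal sketch: the function name `extract_lines` is hypothetical and the sample response is a hand-written, trimmed-down stand-in (real responses carry more fields per block).

```python
def extract_lines(textract_response):
    """Join the text of every LINE block, one output line per block."""
    lines = [
        block["Text"]
        for block in textract_response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hand-written sample; PAGE and WORD blocks are ignored by the filter
    sample = {
        "Blocks": [
            {"BlockType": "PAGE"},
            {"BlockType": "LINE", "Text": "Hello world"},
            {"BlockType": "WORD", "Text": "Hello"},
            {"BlockType": "WORD", "Text": "world"},
            {"BlockType": "LINE", "Text": "Second line"},
        ]
    }
    print(extract_lines(sample))
```

Filtering on LINE only matters because the same words also appear again as WORD blocks, so including both would duplicate the text.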

Monitor failure logs in CloudWatch
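Failures are much easier to find in CloudWatch when the function logs them explicitly. A minimal sketch using Python's standard `logging` module, whose output Lambda forwards to CloudWatch Logs; the handler body and messages here are illustrative assumptions, not the code from this demo.

```python
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def lambda_handler(event, context):
    logger.info("Received event with keys: %s", sorted(event))
    try:
        filename = event["file"]["filename"]
    except KeyError:
        # logger.exception records the message plus the full traceback
        logger.exception("Event did not contain a file name")
        return {"statusCode": 400, "body": "No file present in the request."}

    logger.info("Processing %s", filename)
    return {"statusCode": 200, "body": "Processed {}".format(filename)}
```

The log lines then show up in the function's log group, which is where the "AWSLambdaBasicExecutionRole" policy from the role setup comes into play.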

Also package this Python code together with its dependencies (Flask, for example, is not included in the Lambda runtime)
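One way to build such a package is to install the dependencies into a folder next to the handler file and zip that folder. The sketch below covers only the zipping step; the function name `build_deployment_zip` and the folder layout are assumptions for illustration.

```python
import os
import zipfile


def build_deployment_zip(source_dir, zip_path):
    """Zip everything under source_dir with paths relative to source_dir,
    so the handler file sits at the root of the archive.

    zip_path should live outside source_dir, otherwise the archive
    would try to include itself.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(source_dir):
            for name in files:
                full_path = os.path.join(root, name)
                archive.write(full_path, os.path.relpath(full_path, source_dir))
    return zip_path
```

A typical flow would be to run `pip install flask --target build/`, copy the handler file into `build/`, call `build_deployment_zip("build", "function.zip")`, and upload the resulting zip to the Lambda.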
