How to connect your Python project to your Cloud data lake

If you run complex data queries in Python against data stored in an Azure or AWS data lake, read on.

Step 1 Install Python

For Windows

  • Download Python from the official Python website.
  • Run the installer and check “Add Python to PATH.”
  • Verify the installation by running python --version in Command Prompt.

For macOS

  • Open Terminal.
  • Install Python using Homebrew:
brew install python

Step 2 Connect Python with Cloud Services (Azure Data Lake, SQL, AWS S3)

  • Install the necessary Python packages with pip: azure-storage-blob, sqlalchemy, boto3, and pandas
  • Connecting to Azure Data Lake

Use the azure-storage-blob package to access data
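As a minimal sketch of the Azure step, the code below downloads one blob and parses it as CSV. It assumes the azure-storage-blob package and a storage-account connection string; the helper names (`make_blob_client`, `download_blob_bytes`, `parse_csv_bytes`) are hypothetical and chosen only for illustration. The SDK import is deferred so the parsing helpers can be tried without Azure access.

```python
import csv
import io


def make_blob_client(connection_string, container, blob_name):
    """Build a BlobClient from a storage-account connection string."""
    # Imported lazily so the other helpers work without the SDK installed.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(connection_string)
    return service.get_blob_client(container=container, blob=blob_name)


def download_blob_bytes(blob_client):
    """Return the full contents of a blob as bytes.

    `blob_client` is expected to behave like azure.storage.blob.BlobClient:
    download_blob() returns a downloader object with a readall() method.
    """
    return blob_client.download_blob().readall()


def parse_csv_bytes(raw):
    """Decode UTF-8 CSV bytes into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw.decode("utf-8"))))


# Example call (placeholder names, requires real credentials):
# client = make_blob_client("your-connection-string", "your-container", "your-file.csv")
# rows = parse_csv_bytes(download_blob_bytes(client))
```

If you prefer a DataFrame, passing `io.BytesIO(raw)` to `pd.read_csv` should work in place of `parse_csv_bytes`.
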

  • Connecting to SQL Database

Use SQLAlchemy to connect to SQL databases

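A hedged sketch of the SQLAlchemy step follows. It assumes a PostgreSQL database reached through the psycopg2 driver, but any dialect SQLAlchemy supports would follow the same URL pattern. The function names `build_db_url` and `query_to_dataframe` are hypothetical. Credentials are URL-encoded because characters such as "@" or "/" in a password would otherwise break the connection URL.

```python
from urllib.parse import quote_plus


def build_db_url(dialect, user, password, host, port, database):
    """Assemble a URL of the form dialect://user:password@host:port/database."""
    # Special characters in credentials must be URL-encoded,
    # otherwise the URL is parsed incorrectly.
    return (
        f"{dialect}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


def query_to_dataframe(db_url, sql):
    """Run a query and return the result as a pandas DataFrame."""
    # Imported lazily; sqlalchemy, pandas, and a database driver
    # must be installed before calling this function.
    import pandas as pd
    from sqlalchemy import create_engine, text

    engine = create_engine(db_url)
    with engine.connect() as conn:
        return pd.read_sql(text(sql), conn)


# Example call (placeholder values):
# url = build_db_url("postgresql+psycopg2", "user", "p@ss/word", "your-host", 5432, "your-db")
# df = query_to_dataframe(url, "SELECT * FROM your_table")
```
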
  • Connecting to AWS S3

Use Boto3 for accessing AWS S3

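import boto3
import pandas as pd

# Placeholder credentials; boto3 can also read them from environment
# variables or ~/.aws/credentials, which keeps keys out of source code
s3 = boto3.client('s3', aws_access_key_id='your-access-key', aws_secret_access_key='your-secret-key')
bucket_name = 'your-bucket'

# Fetch the object and load its streaming body into a DataFrame
obj = s3.get_object(Bucket=bucket_name, Key='your-file.csv')
data = pd.read_csv(obj['Body'])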

Step 3 Write a Mathematical Python Script Using NumPy/Pandas

  • Use NumPy or Pandas to process the data
import pandas as pd
import numpy as np

# Load data
df = pd.read_csv('your_data_file.csv')

# Perform some operations (np.sqrt returns NaN for negative values)
df['new_column'] = np.sqrt(df['some_column'])

Step 4 Test the Output

  • Write test cases for your Python script using unittest or simple print statements.
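To illustrate the unittest option, here is a small sketch. It tests a hypothetical pure-Python helper, `sqrt_column`, that stands in for the square-root step from Step 3, so the example runs without pandas installed. The same pattern should carry over to functions that take and return DataFrames.

```python
import math
import unittest


def sqrt_column(values):
    """Return the square root of each value; negatives and None become None."""
    result = []
    for v in values:
        if v is None or v < 0:
            result.append(None)
        else:
            result.append(math.sqrt(v))
    return result


class SqrtColumnTest(unittest.TestCase):
    def test_perfect_squares(self):
        self.assertEqual(sqrt_column([0, 4, 9]), [0.0, 2.0, 3.0])

    def test_negative_and_missing_values(self):
        self.assertEqual(sqrt_column([-1, None]), [None, None])

    def test_non_integer_result(self):
        self.assertAlmostEqual(sqrt_column([2])[0], 1.41421356, places=6)


def run_tests():
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SqrtColumnTest)
    return unittest.TextTestRunner(verbosity=2).run(suite)
```

Calling `run_tests()` prints one line per test. If the code lives in its own file, running `python -m unittest` from the project folder is the more usual route.
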

Step 5 Create a Webpage to Display Output

Use Flask to create a simple webpage.

pip install flask

Flask app to display the output

from flask import Flask, render_template
import pandas as pd

app = Flask(__name__)

@app.route('/')
def index():
    # Load the data and convert it to an HTML table
    df = pd.read_csv('your_data_file.csv')
    data = df.to_html()
    return render_template('index.html', tables=[data])

if __name__ == '__main__':
    # debug=True is intended for local development only
    app.run(debug=True)

HTML file (index.html) for rendering the table:
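A minimal sketch of what index.html could contain, assuming the template only needs to loop over the `tables` list passed by `render_template`. The markup, the page title, and the helper `write_index_template` are assumptions for illustration. The script writes the file into a templates folder, which is where Flask looks for templates by default.

```python
from pathlib import Path

# Jinja2 template: loops over the `tables` list passed to render_template().
# The `safe` filter stops Jinja2 from escaping the HTML produced by df.to_html().
INDEX_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>Data Output</title>
</head>
<body>
  <h1>Data Output</h1>
  {% for table in tables %}
    {{ table|safe }}
  {% endfor %}
</body>
</html>
"""


def write_index_template(project_dir="."):
    """Create templates/index.html under project_dir and return its path."""
    templates_dir = Path(project_dir) / "templates"
    templates_dir.mkdir(parents=True, exist_ok=True)
    target = templates_dir / "index.html"
    target.write_text(INDEX_TEMPLATE, encoding="utf-8")
    return target
```

Running `write_index_template()` once from the project folder creates the file. You can equally paste the template text into templates/index.html by hand.
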

 

Step 6 Deploy the Application on Amazon EC2

Set up an EC2 instance

  • Log in to the AWS Console.
  • Launch an Amazon EC2 instance with a Linux AMI.
  • Configure Security Groups to allow HTTP (port 80) and SSH (port 22) access.
  • Connect to the instance using SSH:
ssh -i "your-key.pem" ec2-user@your-ec2-public-ip

Install necessary software

  • Update the EC2 instance and install Python and Flask:
sudo yum update -y
sudo yum install python3 -y
pip3 install flask pandas numpy boto3 azure-storage-blob

Transfer your project

  • Use SCP to transfer your project files to the EC2 instance:
scp -i "your-key.pem" -r /path/to/your/project ec2-user@your-ec2-public-ip:/home/ec2-user/

Run the Flask App on EC2

  • Navigate to your project folder on the EC2 instance and run the Flask app.

Configure EC2 for public access

  • Install and configure Nginx, or use Gunicorn, to serve the app on port 80.

Access the App via Public URL

  • Your Flask app will now be live on the public IP address of the EC2 instance, e.g., http://your-ec2-public-ip/

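For the Gunicorn option mentioned above, a configuration file is itself plain Python. The sketch below is one possible gunicorn.conf.py. The file location, the port 8000, and the worker count are assumptions, not values from this guide. Binding directly to port 80 needs root privileges, so a common arrangement is to let Nginx listen on port 80 and forward requests to Gunicorn on an unprivileged port.

```python
# gunicorn.conf.py (hypothetical file placed in the project folder)
import multiprocessing

# Listen on all interfaces on an unprivileged port; a reverse proxy
# such as Nginx can forward port 80 to this port.
bind = "0.0.0.0:8000"

# Starting point suggested in the Gunicorn docs: (2 x cores) + 1 workers.
workers = multiprocessing.cpu_count() * 2 + 1

# "-" sends the access log to stdout and the error log to stderr.
accesslog = "-"
errorlog = "-"
```

Assuming the Flask file is named app.py and exposes an object called `app`, the server could then be started with `gunicorn -c gunicorn.conf.py app:app`. Unlike `app.run(debug=True)`, this setup does not enable the Flask debugger, which should stay off on a public server.
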
Summary of Steps

  1. Install Python on Windows or macOS.
  2. Connect to cloud data sources (Azure, SQL, AWS S3) using appropriate libraries.
  3. Process the data using Pandas/NumPy.
  4. Test the Python script output.
  5. Create a webpage using Flask to display the output.
  6. Deploy the Flask app on Amazon EC2 and access it via the live URL.


