Cloud Computing Basics
Mental Model
Cloud computing is renting someone else's computers by the hour instead of buying your own. You trade upfront cost and maintenance for on-demand flexibility, paying only for what you use. The key decision is how much control you need: full machines (IaaS), just a platform (PaaS), or a finished product (SaaS).
What is Cloud Computing?
Cloud computing provides on-demand access to computing resources over the internet, without owning physical hardware.
```
Traditional (On-Premises):       Cloud:

┌─────────────────────────┐      ┌─────────────────────────┐
│    Your Data Center     │      │     Cloud Provider      │
│                         │      │                         │
│  Buy servers            │      │  Rent by the hour       │
│  Maintain hardware      │      │  No maintenance         │
│  Fixed capacity         │      │  Elastic scaling        │
│  Upfront cost           │      │  Pay-as-you-go          │
│  Full control           │      │  Managed services       │
└─────────────────────────┘      └─────────────────────────┘
```
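The upfront-cost versus pay-as-you-go trade-off can be framed as a break-even point. The sketch below is a rough illustration only: the function name and the dollar figures are hypothetical, and it ignores power, staffing, and hardware depreciation, all of which push the real break-even further out.

```python
def break_even_hours(purchase_cost: float, hourly_rental: float) -> float:
    """Hours of use after which buying becomes cheaper than renting."""
    if hourly_rental <= 0:
        raise ValueError("hourly_rental must be positive")
    return purchase_cost / hourly_rental

# Hypothetical numbers: a $30,000 server versus a $3/hour cloud instance.
hours = break_even_hours(30_000, 3.0)
print(f"Break-even after {hours:,.0f} hours (~{hours / 24 / 365:.1f} years of 24/7 use)")
```

If the machine would sit idle most of the time, renting usually wins; under constant load the comparison tightens.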
Service Models
IaaS, PaaS, SaaS
```
Cloud Service Stack:

┌─────────────────────────────────────────────────────────────┐
│  SaaS (Software as a Service)                               │
│   ┌─────────────────────────────────────────────────────┐   │
│   │  Gmail, Dropbox, Salesforce, Slack                  │   │
│   │  You manage: Just use it                            │   │
│   │  Provider manages: Everything                       │   │
│   └─────────────────────────────────────────────────────┘   │
├─────────────────────────────────────────────────────────────┤
│  PaaS (Platform as a Service)                               │
│   ┌─────────────────────────────────────────────────────┐   │
│   │  Heroku, Google App Engine, AWS Elastic Beanstalk   │   │
│   │  You manage: Code, data                             │   │
│   │  Provider manages: Runtime, OS, servers             │   │
│   └─────────────────────────────────────────────────────┘   │
├─────────────────────────────────────────────────────────────┤
│  IaaS (Infrastructure as a Service)                         │
│   ┌─────────────────────────────────────────────────────┐   │
│   │  AWS EC2, Google Compute, Azure VMs                 │   │
│   │  You manage: OS, runtime, code, data                │   │
│   │  Provider manages: Virtualization, servers, network │   │
│   └─────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
```
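The division of responsibility in the stack above can also be written as a small lookup table. This is only a sketch: the layer names and the `who_manages` helper are illustrative choices, not part of any provider's API.

```python
# Layers ordered from the user-facing top down to the hardware.
LAYERS = ["data", "code", "runtime", "os", "virtualization", "servers", "network"]

# Layers the customer manages under each service model, per the diagram above.
CUSTOMER_MANAGED = {
    "IaaS": {"os", "runtime", "code", "data"},
    "PaaS": {"code", "data"},
    "SaaS": set(),
}

def who_manages(model: str, layer: str) -> str:
    """Return 'you' or 'provider' for a given service model and layer."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    return "you" if layer in CUSTOMER_MANAGED[model] else "provider"

for model in ("IaaS", "PaaS", "SaaS"):
    mine = [layer for layer in LAYERS if who_manages(model, layer) == "you"]
    print(f"{model}: you manage {', '.join(mine) if mine else 'nothing'}")
```

Moving up the stack shrinks the set of layers you manage, which is the control-versus-convenience trade-off in one line.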
Major Cloud Providers
| Provider | Strengths | Key Services |
|---|---|---|
| AWS | Largest, most services | EC2, S3, Lambda |
| Google Cloud | ML/AI, data analytics | BigQuery, TPUs |
| Azure | Enterprise, Microsoft integration | VMs, Active Directory |
Core Cloud Services
Compute
```
Compute Options:

Virtual Machines (IaaS):
  - Full control, any OS
  - AWS EC2, GCP Compute Engine, Azure VMs
  - Pay by hour, various sizes

Containers:
  - Package app + dependencies
  - AWS ECS/EKS, GCP GKE, Azure AKS
  - Kubernetes orchestration

Serverless (Functions):
  - Run code without servers
  - AWS Lambda, GCP Functions, Azure Functions
  - Pay per execution, auto-scales
```
Storage
```
Storage Types:

Object Storage (files, media):
  - AWS S3, GCP Cloud Storage, Azure Blob
  - Unlimited capacity, pay per GB
  - Access via HTTP/API

Block Storage (VM disks):
  - AWS EBS, GCP Persistent Disk
  - Attached to VMs
  - SSD or HDD options

Database:
  - Managed SQL: AWS RDS, Cloud SQL
  - NoSQL: DynamoDB, Firestore, CosmosDB
  - No maintenance, automatic backups
```
Python in the Cloud
AWS with Boto3
```python
import boto3

# S3: Upload/download files
s3 = boto3.client('s3')

# Upload file
s3.upload_file('local_file.csv', 'my-bucket', 'data/file.csv')

# Download file
s3.download_file('my-bucket', 'data/file.csv', 'downloaded.csv')

# List objects
response = s3.list_objects_v2(Bucket='my-bucket', Prefix='data/')
for obj in response.get('Contents', []):
    print(obj['Key'], obj['Size'])
```
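A single `list_objects_v2` call returns at most 1,000 keys, so listing a larger prefix needs pagination. The sketch below uses boto3's paginator for that; `iter_s3_keys` is a hypothetical helper name, and the client is passed in as a parameter so the function can be exercised without AWS credentials.

```python
def iter_s3_keys(s3_client, bucket, prefix=""):
    """Yield (key, size) for every object under a prefix, across all pages."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching objects have no 'Contents' entry.
        for obj in page.get("Contents", []):
            yield obj["Key"], obj["Size"]

# Usage with a real client (requires boto3 and configured AWS credentials):
#   import boto3
#   for key, size in iter_s3_keys(boto3.client("s3"), "my-bucket", "data/"):
#       print(key, size)
```

Writing it as a generator keeps memory use flat even when the prefix holds millions of objects.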
Google Cloud
```python
from google.cloud import storage, bigquery

# Cloud Storage
storage_client = storage.Client()
bucket = storage_client.bucket('my-bucket')

# Upload
blob = bucket.blob('data/file.csv')
blob.upload_from_filename('local_file.csv')

# BigQuery
bq_client = bigquery.Client()
query = """
SELECT * FROM project.dataset.table
WHERE date > '2024-01-01'
"""
df = bq_client.query(query).to_dataframe()
```
Azure
```python
from azure.storage.blob import BlobServiceClient
from azure.identity import DefaultAzureCredential

# Blob Storage
credential = DefaultAzureCredential()
blob_service = BlobServiceClient(
    account_url="https://myaccount.blob.core.windows.net",
    credential=credential
)

# Upload
container = blob_service.get_container_client("my-container")
with open("local_file.csv", "rb") as data:
    container.upload_blob("data/file.csv", data)
```
Cloud for Machine Learning
Managed ML Services
```
ML Platforms:

┌─────────────────────────────────────────────────────────────┐
│  AWS SageMaker                                              │
│  - Managed notebooks                                        │
│  - Built-in algorithms                                      │
│  - Model training and deployment                            │
│  - GPU instances available                                  │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│  Google Vertex AI                                           │
│  - AutoML (no-code ML)                                      │
│  - Custom training                                          │
│  - TPU access                                               │
│  - ML pipelines                                             │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│  Azure ML Studio                                            │
│  - Drag-and-drop ML                                         │
│  - Automated ML                                             │
│  - Integration with VS Code                                 │
└─────────────────────────────────────────────────────────────┘
```
GPU Instances
```python
# Example: Launch GPU instance for training

# AWS EC2 GPU instances:
# - p3.2xlarge: 1x V100 (16GB) - ~$3/hour
# - p3.8xlarge: 4x V100 - ~$12/hour
# - p4d.24xlarge: 8x A100 - ~$33/hour

# Google Cloud:
# - a2-highgpu-1g: 1x A100 - ~$3/hour
# - a2-highgpu-8g: 8x A100 - ~$25/hour

# Can also attach GPUs to regular VMs:
# gcloud compute instances create my-gpu-vm \
#     --machine-type=n1-standard-8 \
#     --accelerator=type=nvidia-tesla-v100,count=1
```
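To turn the hourly rates above into a budget, a short calculation is enough. The sketch below copies the approximate prices from the list; actual rates vary by region and change over time, and the helper names are illustrative.

```python
# Approximate on-demand rates (USD/hour) and GPU counts from the list above.
INSTANCES = {
    "p3.2xlarge": {"hourly": 3.0, "gpus": 1},
    "p3.8xlarge": {"hourly": 12.0, "gpus": 4},
    "p4d.24xlarge": {"hourly": 33.0, "gpus": 8},
    "a2-highgpu-1g": {"hourly": 3.0, "gpus": 1},
    "a2-highgpu-8g": {"hourly": 25.0, "gpus": 8},
}

def training_cost(instance: str, hours: float) -> float:
    """Estimated on-demand cost of running one instance for `hours`."""
    return INSTANCES[instance]["hourly"] * hours

def cost_per_gpu_hour(instance: str) -> float:
    """Hourly rate divided by the number of GPUs on the instance."""
    spec = INSTANCES[instance]
    return spec["hourly"] / spec["gpus"]

for name in INSTANCES:
    print(f"{name:>14}: ${training_cost(name, 24):8.2f} per day, "
          f"${cost_per_gpu_hour(name):.2f} per GPU-hour")
```

Per-GPU-hour cost is only a rough basis for comparison, since a V100 and an A100 do different amounts of work per hour.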
Serverless Computing
AWS Lambda Example
```python
# lambda_function.py
import json
from urllib.parse import unquote_plus

import boto3

def lambda_handler(event, context):
    """Process uploaded S3 file."""
    # Get bucket and key from the first record of the S3 event.
    # The key sits under 'object' and arrives URL-encoded.
    record = event['Records'][0]['s3']
    bucket = record['bucket']['name']
    key = unquote_plus(record['object']['key'])

    # Process file
    s3 = boto3.client('s3')
    response = s3.get_object(Bucket=bucket, Key=key)
    content = response['Body'].read().decode('utf-8')

    # Do something with content (splitlines ignores a trailing newline)
    line_count = len(content.splitlines())

    return {
        'statusCode': 200,
        'body': json.dumps({
            'file': key,
            'lines': line_count
        })
    }
```
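A handler like this can be checked locally by feeding it a hand-built event. In an S3 event notification the object key is nested under `Records[i].s3.object.key` and is URL-encoded (spaces arrive as `+`). The sketch below isolates that parsing step; `extract_s3_objects` and the sample event are illustrative, and the event is trimmed to the fields used here.

```python
from urllib.parse import unquote_plus

def extract_s3_objects(event):
    """Return a (bucket, key) pair for every record in an S3 event."""
    pairs = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        key = unquote_plus(s3_info["object"]["key"])  # decode '+' and %XX
        pairs.append((bucket, key))
    return pairs

# Minimal stand-in for a real S3 event (real events carry many more fields).
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "my-bucket"},
                "object": {"key": "data/my+file.csv"}}}
    ]
}
print(extract_s3_objects(sample_event))
```

Looping over all records, rather than reading only the first, matters because one invocation may deliver several records.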
When to Use Serverless
```
Good for Serverless:
✓ Event-driven processing
✓ Variable/unpredictable load
✓ Short-running tasks (<15 min)
✓ API endpoints

Not Good for Serverless:
✗ Long-running jobs
✗ GPU computation
✗ Stateful applications
✗ Constant high load (cost)
```
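The checklist above can be expressed as a simple rule-based check. This is a sketch of the decision logic only: `serverless_fit` and its parameters are hypothetical, and the 15-minute threshold is taken from the list rather than from any specific provider's limits.

```python
def serverless_fit(max_runtime_minutes, needs_gpu=False,
                   stateful=False, constant_high_load=False):
    """Return (is_good_fit, reasons_against) based on the checklist above."""
    reasons = []
    if max_runtime_minutes >= 15:
        reasons.append("task may run longer than the ~15 minute limit")
    if needs_gpu:
        reasons.append("GPU computation is a poor match for functions")
    if stateful:
        reasons.append("function instances do not keep state between calls")
    if constant_high_load:
        reasons.append("per-execution pricing gets expensive under steady load")
    return (not reasons, reasons)

# A short, event-driven task versus a long GPU training job.
print(serverless_fit(max_runtime_minutes=2))
print(serverless_fit(max_runtime_minutes=180, needs_gpu=True))
```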
Cost Management
Pricing Models
```
Cloud Pricing:

On-Demand:
  - Pay by hour/second
  - Most expensive
  - Maximum flexibility

Reserved (1-3 years):
  - 30-75% cheaper
  - Commitment required
  - Good for steady workloads

Spot/Preemptible:
  - 60-90% cheaper
  - Can be interrupted
  - Good for fault-tolerant batch jobs
```
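The discount ranges above translate into monthly bills as follows. The sketch assumes a hypothetical $3/hour on-demand instance running around the clock; the discount ranges are the ones quoted in the list, not live prices.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month

# (minimum, maximum) discount relative to on-demand, from the list above.
DISCOUNT_RANGE = {
    "on_demand": (0.0, 0.0),
    "reserved": (0.30, 0.75),
    "spot": (0.60, 0.90),
}

def monthly_cost_range(on_demand_hourly, model, hours=HOURS_PER_MONTH):
    """Return (lowest, highest) monthly cost under a pricing model."""
    min_discount, max_discount = DISCOUNT_RANGE[model]
    full_price = on_demand_hourly * hours
    return full_price * (1 - max_discount), full_price * (1 - min_discount)

for model in DISCOUNT_RANGE:
    low, high = monthly_cost_range(3.0, model)  # hypothetical $3/hour instance
    print(f"{model:>10}: ${low:,.0f} to ${high:,.0f} per month")
```

The spot figures assume the job is never interrupted; a preempted job that restarts from scratch can erase much of the saving, which is why checkpointing matters for spot workloads.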
Cost Optimization Tips
```python
# 1. Use spot instances for training
# 2. Right-size instances (don't over-provision)
# 3. Auto-scaling for variable loads
# 4. Use serverless for intermittent workloads
# 5. Choose appropriate storage tier

# Example: S3 storage tiers (approximate prices; they vary by region)
# Standard: $0.023/GB/month (frequent access)
# Infrequent Access: $0.0125/GB/month
# Glacier: $0.004/GB/month (archival)
# Glacier Deep: $0.00099/GB/month (rarely accessed)
```
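To see what the storage tiers mean for a concrete dataset, multiply the size by the per-GB rate. The sketch below uses the approximate prices listed above and covers storage charges only; colder tiers also charge for retrieval and have minimum storage durations, which this ignores. The tier keys and function name are illustrative.

```python
# Approximate storage prices in USD per GB per month, from the list above.
PRICE_PER_GB_MONTH = {
    "standard": 0.023,
    "infrequent_access": 0.0125,
    "glacier": 0.004,
    "glacier_deep": 0.00099,
}

def monthly_storage_cost(size_gb: float, tier: str) -> float:
    """Storage-only cost per month; excludes request and retrieval fees."""
    return size_gb * PRICE_PER_GB_MONTH[tier]

dataset_gb = 5_000  # hypothetical 5 TB dataset
for tier in PRICE_PER_GB_MONTH:
    print(f"{tier:>18}: ${monthly_storage_cost(dataset_gb, tier):8.2f}/month")
```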
Getting Started
Local Development → Cloud
```bash
# 1. Develop locally
python train.py --data local_data.csv

# 2. Test with cloud storage
python train.py --data s3://bucket/data.csv

# 3. Run on cloud compute
# Deploy to EC2/GCE or use managed service

# 4. Scale up
# Increase instance size or use multiple instances
```
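For the same `train.py` to accept both a local file and an `s3://` URI, the script has to tell the two apart. One way this might look, using only the standard library; `parse_data_location` is a hypothetical helper, and the actual download step (for example with boto3) is left out.

```python
from urllib.parse import urlparse

def parse_data_location(path: str) -> dict:
    """Classify a --data argument as a local path or an S3 object."""
    parsed = urlparse(path)
    if parsed.scheme == "s3":
        return {
            "kind": "s3",
            "bucket": parsed.netloc,          # the part right after s3://
            "key": parsed.path.lstrip("/"),   # the object key, no leading slash
        }
    return {"kind": "local", "path": path}

print(parse_data_location("local_data.csv"))
print(parse_data_location("s3://bucket/data.csv"))
```

Keeping this check in one function means the rest of the training code does not need to know where the data came from.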
Basic Cloud Workflow
1. Create cloud account
2. Set up credentials/authentication
3. Install the SDK for your provider (for example boto3, google-cloud-storage, azure-storage-blob)
4. Store data in cloud storage
5. Launch compute instance or use managed service
6. Run workload
7. Download results / deploy model
8. Shut down resources!
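The last step is the one most often forgotten, especially when a job crashes halfway. One way to make shutdown automatic is to wrap the resource in a context manager, so cleanup runs even on failure. The sketch below uses stand-in start and stop functions; in practice they would call the provider's SDK to launch and terminate an instance.

```python
from contextlib import contextmanager

@contextmanager
def cloud_resource(start, stop):
    """Start a resource, hand it to the caller, and always stop it afterwards."""
    handle = start()
    try:
        yield handle
    finally:
        stop(handle)  # runs on success and on exceptions alike

# Stand-ins for real SDK calls, recording what happened.
events = []

def fake_start():
    events.append("started")
    return "instance-123"

def fake_stop(handle):
    events.append(f"stopped {handle}")

try:
    with cloud_resource(fake_start, fake_stop) as instance:
        raise RuntimeError("training crashed")
except RuntimeError:
    pass

print(events)
```

This does not protect against the controlling process itself being killed, so billing alerts remain a useful backstop.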
Summary
| Service Type | What It Provides | Example |
|---|---|---|
| IaaS | Virtual machines | EC2, Compute Engine |
| PaaS | Application platform | Heroku, App Engine |
| SaaS | Complete applications | Gmail, Dropbox |
| Storage | Files and databases | S3, Cloud Storage |
| Serverless | Run code on events | Lambda, Functions |
| ML Platform | Managed ML training | SageMaker, Vertex AI |
Key takeaways:
- Cloud provides flexible, on-demand computing
- Pay-as-you-go vs large upfront investment
- Choose service level based on control needs
- Use managed services to reduce operational burden
- Monitor costs; it is easy to overspend
- Spot/preemptible instances for cost savings
- Python SDKs available for all major clouds
Exercises
Exercise 1. Explain the difference between IaaS, PaaS, and SaaS. Give an example of each relevant to data science.
Solution to Exercise 1
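```python
# Conceptual solution: the three service models with data science examples
service_models = {
    "IaaS": "You rent raw machines and manage the OS upward, "
            "e.g. an EC2 GPU instance used for model training",
    "PaaS": "You deploy code and data and the provider runs the platform, "
            "e.g. App Engine hosting a prediction API",
    "SaaS": "You use a finished product, "
            "e.g. a hosted notebook service such as Google Colab",
}
for model, description in service_models.items():
    print(f"{model}: {description}")
```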
Exercise 2. Name three major cloud providers and one service from each that is commonly used for machine learning workloads.
Solution to Exercise 2
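The three major providers covered on this page are AWS, Google Cloud, and Azure. For machine learning workloads, AWS offers SageMaker (managed notebooks, built-in algorithms, training and deployment), Google Cloud offers Vertex AI (AutoML, custom training, TPU access, ML pipelines), and Azure offers Azure ML Studio (drag-and-drop and automated ML).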
Exercise 3. Explain the concept of horizontal scaling (scaling out) vs vertical scaling (scaling up). When would you use each?
Solution to Exercise 3
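```python
# Vertical scaling (scale up): move to a bigger machine.
#   Simple and needs no code changes, but is capped by the largest instance size.
#   Use it for single-node work such as in-memory training or a primary database.
# Horizontal scaling (scale out): add more machines and split the work.
#   No hard ceiling and tolerant of node failures, but the work must be divisible.
#   Use it for stateless API servers and parallel batch jobs.
import math

requests_per_second = 4_500   # hypothetical load
capacity_per_server = 1_000   # hypothetical throughput of one server
servers_needed = math.ceil(requests_per_second / capacity_per_server)
print(f"Scaling out requires {servers_needed} servers")
```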
Exercise 4. Write Python code that demonstrates a simple client-server interaction using requests to call a REST API endpoint.
Solution to Exercise 4
```python
import requests

# Client side of a client-server exchange: send a GET request, read the JSON reply.
# httpbin.org is a public echo service, used here only as an example endpoint.
response = requests.get(
    "https://httpbin.org/get",
    params={"q": "cloud"},
    timeout=10,
)
response.raise_for_status()  # raise an exception on 4xx/5xx responses

data = response.json()
print(f"Status: {response.status_code}")
print(f"Query parameters seen by the server: {data['args']}")
```