Why Python for AI
Python is the glue of today’s AI stack. It’s where:
- Data is loaded, cleaned, and vectorized (pandas, polars, numpy).
- Models are trained and evaluated (scikit-learn, PyTorch, TensorFlow).
- Foundation models are integrated (transformers, OpenAI/Hugging Face clients).
- Apps are served (FastAPI), orchestrated (Airflow, Prefect), and monitored (MLflow, W&B).
This post shows how Python plugs into each layer and walks you through building three small, production-minded apps you can run locally today.
The AI integration map
- Data layer: pandas/polars for tabular; Pillow/OpenCV for images; PyMuPDF/pdfminer for PDFs; datasets for corpora.
- Feature/embedding layer: scikit-learn for classical features; sentence-transformers for text embeddings; torchvision for image transforms.
- Model layer: scikit-learn for classic ML; PyTorch/TensorFlow for deep learning; transformers/OpenAI API for LLMs.
- Retrieval layer: FAISS for in-memory/vector search; SQLite/Postgres + pgvector; Pinecone/Weaviate/Chroma for managed/local vectors.
- Serving layer: FastAPI + Uvicorn; Celery/RQ for background jobs; Docker for packaging; CUDA/ROCm for GPU.
- Observability & MLOps: MLflow/W&B for experiments; Prometheus/Grafana for metrics; Sentry/OpenTelemetry for tracing.
Quick environment setup
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install --upgrade pip
# Core packages used below (CPU builds)
pip install fastapi "uvicorn[standard]" pydantic numpy pandas scikit-learn joblib pillow
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
# RAG & LLM integration
pip install sentence-transformers faiss-cpu openai python-dotenv
Tips:
- Reproducibility: pin versions in requirements.txt and set seeds.
- CPUs are fine to start; add GPU wheels later for acceleration.
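Seeding is easy to scatter across a codebase; a minimal helper keeps it in one place (the torch call is guarded since torch may not be installed yet):

```python
import os
import random

import numpy as np

def set_seed(seed: int = 42) -> None:
    """Seed Python, NumPy, and (if available) PyTorch for repeatable runs."""
    random.seed(seed)
    np.random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass  # torch is optional at this point

set_seed(42)
```

Call it once at the top of every training script so reruns are comparable.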
App 1: A robust tabular ML pipeline with scikit-learn
Shows classic Python→AI flow: data prep → modeling → evaluation → persistence.
# file: tabular_train.py
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.datasets import fetch_openml
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import classification_report
from joblib import dump

np.random.seed(42)

# Example dataset: Titanic from OpenML
Xy = fetch_openml('titanic', version=1, as_frame=True)

# Drop identifiers (name, ticket, cabin), leakage columns (boat, body),
# and the high-missingness home.dest so features match the serving schema
df = Xy.frame.drop(columns=['name', 'ticket', 'cabin', 'boat', 'body', 'home.dest'])

# Target is stored as a string category; coerce to int 0/1
df = df.dropna(subset=['survived'])
df['survived'] = df['survived'].astype(int)
y = df['survived']
X = df.drop(columns=['survived'])

num_cols = X.select_dtypes(include=['number']).columns.tolist()
cat_cols = X.select_dtypes(exclude=['number']).columns.tolist()

# Impute missing values (e.g. age, embarked) before scaling/encoding,
# otherwise LogisticRegression fails on NaN
num_pipe = Pipeline([
    ('impute', SimpleImputer(strategy='median')),
    ('scale', StandardScaler()),
])
cat_pipe = Pipeline([
    ('impute', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore')),
])
pre = ColumnTransformer([
    ('num', num_pipe, num_cols),
    ('cat', cat_pipe, cat_cols),
])
model = Pipeline([
    ('prep', pre),
    ('clf', LogisticRegression(max_iter=1000)),
])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
scores = cross_val_score(model, X_train, y_train, cv=5, scoring='accuracy')
print(f"CV accuracy: {scores.mean():.3f} ± {scores.std():.3f}")
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
dump(model, 'titanic_pipeline.joblib')
print('Saved model to titanic_pipeline.joblib')
Why this matters:
- ColumnTransformer is the cleanest way to mix numeric and categorical features.
- Pipelines keep transforms and model together, making serving and retraining consistent.
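To see why this pays off, here is a self-contained toy version of the same pattern on synthetic data (column names and values are made up for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny mixed-type frame with a missing value in each kind of column
df = pd.DataFrame({
    "age": [22.0, 35.0, None, 58.0],
    "fare": [7.25, 71.3, 8.05, None],
    "sex": ["male", "female", np.nan, "female"],
    "label": [0, 1, 0, 1],
})
X, y = df.drop(columns=["label"]), df["label"]

pre = ColumnTransformer([
    ("num", Pipeline([("imp", SimpleImputer()),
                      ("sc", StandardScaler())]), ["age", "fare"]),
    ("cat", Pipeline([("imp", SimpleImputer(strategy="most_frequent")),
                      ("oh", OneHotEncoder(handle_unknown="ignore"))]), ["sex"]),
])
clf = Pipeline([("prep", pre), ("lr", LogisticRegression())]).fit(X, y)
print(clf.predict(X))  # transforms and model travel together
```

Because imputation, scaling, and encoding live inside the fitted object, the exact same `clf` can be pickled and served with no preprocessing drift.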
Serve it quickly:
# file: tabular_api.py
import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel
from joblib import load

app = FastAPI()
pipe = load('titanic_pipeline.joblib')

class Passenger(BaseModel):
    pclass: float
    age: float | None = None
    sibsp: float | None = None
    parch: float | None = None
    fare: float | None = None
    sex: str | None = None
    embarked: str | None = None

@app.post('/predict')
async def predict(p: Passenger):
    # The fitted pipeline selects columns by name, so pass a DataFrame
    X = pd.DataFrame({k: [getattr(p, k)] for k in type(p).model_fields})
    pred = pipe.predict(X)[0]
    proba = float(pipe.predict_proba(X)[0][int(pred)])
    return {"survived": int(pred), "confidence": round(proba, 4)}
Run: uvicorn tabular_api:app --reload
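A quick way to exercise the endpoint from Python, using only the standard library (the URL assumes uvicorn's default local port, and the commented-out call requires the server to be running):

```python
import json
import urllib.request

def predict_passenger(payload: dict, url: str = "http://127.0.0.1:8000/predict") -> dict:
    """POST a passenger record to the tabular API and return the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

example = {"pclass": 3, "age": 22.0, "sibsp": 1, "parch": 0,
           "fare": 7.25, "sex": "male", "embarked": "S"}
# predict_passenger(example)  # uncomment with the API running locally
```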
App 2: An image classifier API with PyTorch + FastAPI
Turn a pre-trained CNN into a usable service.
# file: vision_api.py
import io
from fastapi import FastAPI, File, UploadFile
from PIL import Image
import torch
from torchvision import models, transforms
app = FastAPI()
device = 'cpu' # switch to 'cuda' if you have a GPU build
weights = models.ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights)
model.eval().to(device)

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# ImageNet class names ship with the weights metadata; no download needed
idx_to_label = weights.meta['categories']
@app.post('/classify')
async def classify(file: UploadFile = File(...)):
    img_bytes = await file.read()
    img = Image.open(io.BytesIO(img_bytes)).convert('RGB')
    x = preprocess(img).unsqueeze(0).to(device)
    with torch.no_grad():
        logits = model(x)
    probs = torch.softmax(logits, dim=1)[0]
    top5 = torch.topk(probs, k=5)
    results = [
        {"label": idx_to_label[idx.item()], "prob": float(probs[idx])}
        for idx in top5.indices
    ]
    return {"top5": results}
Run: uvicorn vision_api:app --reload
Production tips:
- Batch requests where feasible.
- For CPU-only deployments, consider TorchScript, ONNX Runtime, or quantization for speed.
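As a taste of the quantization tip, dynamic quantization shrinks Linear layers to int8 with one call. The toy model below stands in for a classifier head; conv-heavy networks like ResNet need static quantization or an ONNX Runtime export instead:

```python
import torch
import torch.nn as nn

# Toy model standing in for the fully connected head of a larger network
net = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)).eval()

# Replace Linear weights with int8 versions; activations stay in float
qnet = torch.ao.quantization.quantize_dynamic(net, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 128)
with torch.no_grad():
    out = qnet(x)
print(out.shape)  # same interface as the float model, smaller weights
```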
App 3: A doc-chat RAG service (FAISS + sentence-transformers + optional LLM)
Use embeddings to retrieve relevant passages and optionally ask an LLM to answer with citations.
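Under the hood, retrieval with normalized embeddings is just a dot product: on unit vectors, inner product equals cosine similarity. A minimal numpy sketch of the idea (the 2-D vectors are toy stand-ins for real sentence embeddings):

```python
import numpy as np

def normalize(v: np.ndarray) -> np.ndarray:
    """Scale each row to unit length."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Toy "document" and "query" embeddings
docs = normalize(np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]))
query = normalize(np.array([[0.9, 0.1]]))

# Inner product on unit vectors == cosine similarity; argmax/argsort gives top-k
scores = docs @ query.T
best = int(np.argmax(scores))
print(best)  # index of the most similar document
```

FAISS's IndexFlatIP below does exactly this inner-product search, just faster and at scale.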
3.1 Build the index
# file: rag_build.py
import os, json, glob
from sentence_transformers import SentenceTransformer
import faiss
MODEL_NAME = 'sentence-transformers/all-MiniLM-L6-v2'
DOC_GLOB = 'docs/**/*.txt' # add .md/.pdf with your own loaders
model = SentenceTransformer(MODEL_NAME)
paths = glob.glob(DOC_GLOB, recursive=True)
chunks, meta = [], []
# naive character-based splitter with overlap
def split_text(text, size=500, overlap=100):
    out = []
    start = 0
    while start < len(text):
        out.append(text[start:start + size])
        start += size - overlap
    return [c.strip() for c in out if c.strip()]
for p in paths:
    with open(p, 'r', encoding='utf-8', errors='ignore') as f:
        text = f.read()
    for i, ch in enumerate(split_text(text)):
        chunks.append(ch)
        meta.append({"path": p, "chunk": i})
emb = model.encode(chunks, convert_to_numpy=True, normalize_embeddings=True)
index = faiss.IndexFlatIP(emb.shape[1])
index.add(emb)
faiss.write_index(index, 'rag.index')
with open('rag_meta.json', 'w', encoding='utf-8') as f:
    json.dump({"chunks": chunks, "meta": meta}, f)
print(f'Indexed {len(chunks)} chunks from {len(paths)} files')
3.2 Serve Q&A
# file: rag_api.py
import os, json
from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer
import faiss
# Optional: LLM via OpenAI
from dotenv import load_dotenv
load_dotenv()
OPENAI = os.getenv('OPENAI_API_KEY')
client = None
if OPENAI:
    from openai import OpenAI
    client = OpenAI()
EMB_MODEL = 'sentence-transformers/all-MiniLM-L6-v2'
app = FastAPI()
index = faiss.read_index('rag.index')
with open('rag_meta.json', 'r', encoding='utf-8') as f:
    store = json.load(f)
CHUNKS, META = store['chunks'], store['meta']
emb_model = SentenceTransformer(EMB_MODEL)
class Query(BaseModel):
    question: str
    k: int = 4
@app.post('/ask')
async def ask(q: Query):
    qv = emb_model.encode([q.question], normalize_embeddings=True)
    D, I = index.search(qv, q.k)
    ctx = [CHUNKS[i] for i in I[0]]
    cites = [META[i] for i in I[0]]
    if client:
        prompt = (
            "You are a helpful assistant. Answer USING ONLY the context. "
            "Cite sources as (path:chunk). If unknown, say you don't know.\n\n"
            f"Context:\n{chr(10).join(ctx)}\n\nQuestion: {q.question}"
        )
        completion = client.chat.completions.create(
            model='gpt-4o-mini',
            messages=[{"role": "user", "content": prompt}],
            temperature=0.2,
        )
        answer = completion.choices[0].message.content
    else:
        # Fallback: just return the best-matching passages
        answer = "\n\n".join(ctx)
    return {"answer": answer, "citations": cites}
Run sequence:
- Put .txt docs under docs/.
- python rag_build.py
- uvicorn rag_api:app --reload
- POST to /ask with {"question": "..."}
Production tips:
- Swap FAISS for pgvector or a managed vector DB for persistence.
- Add chunk caching and request-level timeouts.
- Add guardrails: prompt templates, max context tokens, content filtering.
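The "max context tokens" guardrail can start as simple as a character budget. Real token counting would use the model's tokenizer (e.g. tiktoken); the 4-characters-per-token rule below is a rough heuristic:

```python
def fit_context(chunks: list[str], max_tokens: int = 2000,
                chars_per_token: int = 4) -> list[str]:
    """Keep the highest-ranked chunks that fit a rough token budget."""
    budget = max_tokens * chars_per_token
    kept, used = [], 0
    for ch in chunks:  # chunks arrive in relevance order from the retriever
        if used + len(ch) > budget:
            break
        kept.append(ch)
        used += len(ch)
    return kept

# Two 3,000-char chunks fit an 8,000-char budget; the 6,000-char one does not
ctx = fit_context(["a" * 3000, "b" * 3000, "c" * 6000], max_tokens=2000)
print(len(ctx))
```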
Deployment and performance
- Containerize:
# Dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
- CPU optimization: ONNX Runtime, torch.compile (PyTorch 2.x), int8 quantization (torch dynamic quantization or ONNX Runtime quantization), smaller embedding models.
- GPU optimization: use CUDA images; enable mixed precision (fp16/bf16); batch requests; warm model on startup.
- Concurrency: use async FastAPI endpoints; prefer one model per process pinned to a device; queue for backpressure.
Observability, testing, and safety
- Metrics: prometheus-fastapi-instrumentator + Grafana dashboards.
- Tracing: OpenTelemetry.
- Experiment tracking: MLflow or Weights & Biases.
- Tests: pytest for endpoints and data contracts; seed RNG for determinism.
- Security: never log raw inputs if they may contain PII; store secrets in env vars or a secret manager; validate payload sizes and MIME types.
Example: metrics in FastAPI
from prometheus_fastapi_instrumentator import Instrumentator
Instrumentator().instrument(app).expose(app)
A simple project layout
ai-app/
app/ # FastAPI apps
tabular_api.py
vision_api.py
rag_api.py
models/ # Saved models and indexes
titanic_pipeline.joblib
rag.index
rag_meta.json
docs/ # Your source documents for RAG
scripts/
tabular_train.py
rag_build.py
requirements.txt
Dockerfile
README.md
What to build next
- A meeting-notes assistant: transcribe audio (faster-whisper), summarize (LLM), and store key decisions via a vector DB.
- A visual search feature: embed product images and instantly find similar items.
- A secure internal policy bot: RAG over your company handbook with role-based access.
With these patterns—data → embed → model → retrieve → serve—you can integrate Python into nearly any AI workflow and ship useful apps quickly.