blog
-
Understanding Floating Point Precision, BF16, and Why Deep Learning Training Still Works
An explanation of floating point spacing, BF16 precision limitations, and how mixed precision training enables modern deep learning despite low-precision formats.
-
Conv → BatchNorm → ReLU: Why This Order is Standard in CNNs
An explanation of why the Conv → BatchNorm → ReLU ordering is standard in CNNs, grounded in mathematical reasoning and empirical results from modern deep learning architectures.
-
Understanding Gradients in a Linear Layer: Why dL/dW = g xᵀ
A step-by-step derivation of the weight gradient in a linear layer, explaining why the 3D Jacobian tensor collapses into a simple outer product during backpropagation.
-
2D Convolution in Image Processing – A Complete Summary
A complete guide to 2D convolution covering kernels, padding, stride, output size, and why modern CNNs prefer 3×3 filters.
-
Optimizing Claude Token Usage: A Practical Guide to Context Management
Learn how Claude uses context tokens and how to optimize them using compacting, task isolation, and tool management.
-
Bias-Variance Tradeoff: From Theory to Ensemble Methods
A deep dive into the bias-variance decomposition, decision trees, bagging, boosting, and XGBoost – with clear math and intuition.
-
Quantization in CNNs: From FP32 Training to INT8 Deployment
A practical walkthrough of how CNN weights go from 32-bit floating point to 8-bit integers – and why it barely hurts accuracy.
-
PCA and ZCA Whitening: A Comprehensive Study Guide
Understanding whitening transforms – from eigenvalue decomposition to PCA and ZCA whitening – with clear math and intuition.
-
Depthwise and Pointwise Convolutions: A Practical Guide for Edge AI
How depthwise and pointwise convolutions enable efficient CNN deployment on edge devices.