Articles by category: machine-learning



This month, I posted an entry on Sisu’s engineering blog, where I discuss an effective strategy for lossless column reduction on sparse datasets. Check out the blog post there. Read More

Multiplicative weights is a simple, randomized algorithm for picking an option among \(n\) choices against an adversarial environment. The algorithm has widespread applications, but its analysis frequently introduces a learning rate parameter, \(\epsilon\), which we’ll be trying to get rid of. In this first post, we introduce multiplicative weights and make some practical observations. We follow Arora’s survey for the most part. Problem Setting: We play \(T\) rounds. On the \(t\)-th round, the pla... Read More
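The excerpt above cuts off before the update rule, so here is a minimal sketch of the standard formulation from Arora’s survey rather than the post’s own code. It assumes costs lie in \([-1, 1]\) and \(\epsilon \le 1/2\); the function name `multiplicative_weights` and its arguments are hypothetical, chosen only for illustration.

```python
import random


def multiplicative_weights(cost_rounds, epsilon=0.1, seed=0):
    """Play multiplicative weights against a fixed sequence of cost vectors.

    cost_rounds[t][i] is the cost of option i on round t, assumed to be in [-1, 1].
    epsilon is the learning rate, assumed to be at most 1/2.
    Returns the list of sampled choices and the final weights.
    """
    rng = random.Random(seed)
    n = len(cost_rounds[0])
    weights = [1.0] * n  # every option starts with equal weight
    choices = []
    for costs in cost_rounds:
        # Sample an option with probability proportional to its weight.
        choice = rng.choices(range(n), weights=weights, k=1)[0]
        choices.append(choice)
        # Costs are revealed after the choice; shrink the weight of costly options.
        weights = [w * (1.0 - epsilon * c) for w, c in zip(weights, costs)]
    return choices, weights


# Toy run: option 0 always costs 0, option 1 always costs 1.
choices, weights = multiplicative_weights([[0.0, 1.0]] * 50, epsilon=0.1)
```

In this toy run the weight of the costly option decays geometrically, so later rounds pick option 0 almost always. The sketch keeps \(\epsilon\) as an explicit argument, which is exactly the parameter the series sets out to remove.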

Compressed Sensing and Subgaussians

Candes and Tao came up with a broad characterization of compressed sensing solutions a while ago. Partially inspired by a past homework problem, I’d like to explore an area of this setting. This post will dive into the compressed sensing context and then focus on a proof that squared subgaussian random variables are subexponential (the relation between the two will be explained). Compressed Sensing: For context, we’re interested in the setting where we observe a... Read More

Subgaussian Concentration

This is a quick write-up of a brief conversation I had with Nilesh Tripuraneni and Aditya Guntuboyina a while ago that I thought others might find interesting. This post focuses on the interplay between two types of concentration inequalities. Concentration inequalities usually describe some random quantity \(X\) as a constant \(c\) which it’s frequently near (henceforth, \(c\) will be our stand-in for some constant which possibly changes equation-to-equation). Basical... Read More

Beating TensorFlow Training in-VRAM

In this post, I’d like to introduce a technique that I’ve found helps accelerate mini-batch SGD training in my use case. I suppose this post could also be read as a public grievance directed towards the TensorFlow Dataset API optimizing for the large vision deep learning use-case, but maybe I’m just not hitting the right incantation to get tf.Dataset working (in which case, drop me a line). The solution is to TensorFlow harder anyway, so this shouldn’t reall... Read More

Non-convex First Order Methods

This is a high-level overview of first-order, local-improvement optimization methods for non-convex, Lipschitz, (sub)differentiable, and regularized functions with efficient derivatives, with a particular focus on neural networks (NNs). \[\argmin_\vx f(\vx) = \argmin_\vx \frac{1}{n}\sum_{i=1}^nf_i(\vx)+\Omega(\vx)\] Make sure to read the general overview post first. I’d also reiterate, as Moritz Hardt has, that one should be wary of only looking at con... Read More

Neural Network Optimization Methods

The goal of this post and its related sub-posts is to explore at a high level how the theoretical guarantees of the various optimization methods interact with non-convex problems in practice, where we don’t really know Lipschitz constants, the validity of the assumptions that these methods make, or appropriate hyperparameters. Obviously, a detailed treatment would require delving into intricacies of cutting-edge research. That’s not the point of this post, w... Read More

My Princeton Senior Thesis

Submitted to the university as part of completion of Computer Science BSE degree, June 2017. Completed during the 2016-2017 academic year. A concise and more up-to-date paper version. Link to download report. Code repository. Read More

My Princeton Junior Year Research

Unpublished. Submitted to the university as part of completion of Computer Science BSE degree, January 2016. Completed during fall semester 2015-2016. Link to download report. Read More

Ad Click Prediction

17 Jul 2016

Ad Click Prediction: a View from the Trenches

Published August 2013. Paper link. Abstract. Introduction. Brief System Overview. Problem Statement: For any given query, ad, and associated interaction and metadata represented as a real feature vector \(\textbf{x}\in\mathbb{R}^d\), provide an estimate of the probability \(\mathbb{P}(\text{click}(\textbf{x}))\) that the user making the query will click on the ad. Solving this problem has beneficial implications for ad auction pricing in Google’s online adver... Read More