Jupyter Tricks

25 May 2017

Jupyter Tricks Here’s a list of my most-used Jupyter tricks and what they do. UI I find the UI to be intuitive; Help > User Interface Tour describes more. There are two modes: command mode (enter by pressing the Escape key or clicking outside of a cell) and edit mode (enter by typing in a cell). You can tell you’re in edit mode if the “pencil” indicator is present in the corner. It’s also faster to use the commands as listed in Help > Keyboard Shortcuts; with those you can also remove the toolbar wi...

My Princeton Senior Thesis Submitted to the university in partial fulfillment of the Computer Science BSE degree, June 2017. Completed during the 2016-2017 academic year. A concise and more up-to-date paper version. Link to download report. Code repository.

The Semaphore Barrier This is the answer post to the question posed here. A Useful Formalism Reasoning about parallel systems is tough, so to make sure that our solution is correct we’ll have to introduce a formalism for parallel execution. The notion is the following. Given some instructions for threads \(\{t_i\}_{i=0}^{n-1}\), we expect each thread’s individual instructions to execute in sequence, but instructions between threads can be interleaved arbitrarily. In our simplified execut...
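To make the interleaving model concrete, here is a minimal sketch that enumerates every schedule the formalism allows: each thread's instructions stay in program order, while instructions from different threads may interleave in any way. The function name `interleavings` and the string instruction labels are illustrative assumptions, not taken from the post.

```python
from typing import List, Sequence, Tuple

Step = Tuple[int, str]  # (thread index, instruction label)


def interleavings(threads: Sequence[Sequence[str]]) -> List[List[Step]]:
    """Enumerate every schedule that keeps each thread's own instructions in
    program order but interleaves instructions across threads arbitrarily."""
    results: List[List[Step]] = []
    positions = [0] * len(threads)  # index of each thread's next instruction

    def extend(schedule: List[Step]) -> None:
        # A schedule is complete once every thread has run all its instructions.
        if all(p == len(t) for p, t in zip(positions, threads)):
            results.append(list(schedule))
            return
        # Otherwise, any thread that still has work may run its next instruction.
        for i, t in enumerate(threads):
            if positions[i] < len(t):
                schedule.append((i, t[positions[i]]))
                positions[i] += 1
                extend(schedule)
                positions[i] -= 1  # backtrack
                schedule.pop()

    extend([])
    return results


for sched in interleavings([["a1", "a2"], ["b1", "b2"]]):
    print(sched)
```

Two threads of two instructions each yield \(\binom{4}{2} = 6\) schedules; a correct synchronization scheme must behave correctly under every one of them, which is why the count grows so quickly to be unmanageable by hand.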

The Semaphore Barrier

24 Jan 2017

The Semaphore Barrier I wanted to share an interview question I came up with. The idea came from my operating and distributed systems classes, where we were expected to implement synchronization primitives and reason about concurrency, respectively. Synchronization primitives can be used to coordinate across multiple threads working on a task in parallel. Most primitives can be implemented through the use of a condition variable and lock, but I was wondering about implementing other primit...
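For readers who want something to test against, below is a sketch of a reusable barrier built only from semaphores. It uses the well-known two-turnstile construction (popularized by Downey's *The Little Book of Semaphores*); it is not necessarily the solution from the answer post. The class `SemaphoreBarrier` and the helper `run_rounds` are hypothetical names.

```python
import threading
from typing import List, Tuple


class SemaphoreBarrier:
    """Reusable barrier for n threads, built only from semaphores."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.count = 0
        self.mutex = threading.Semaphore(1)       # protects self.count
        self.turnstile1 = threading.Semaphore(0)  # opens when all n arrive
        self.turnstile2 = threading.Semaphore(0)  # opens when all n have left

    def wait(self) -> None:
        # Phase 1: the last thread to arrive releases n permits on turnstile1.
        self.mutex.acquire()
        self.count += 1
        if self.count == self.n:
            for _ in range(self.n):
                self.turnstile1.release()
        self.mutex.release()
        self.turnstile1.acquire()

        # Phase 2: nobody may re-enter phase 1 until all n threads got here,
        # so a fast thread cannot steal a turnstile1 permit from the next round.
        self.mutex.acquire()
        self.count -= 1
        if self.count == 0:
            for _ in range(self.n):
                self.turnstile2.release()
        self.mutex.release()
        self.turnstile2.acquire()


def run_rounds(n_threads: int, n_rounds: int) -> Tuple[List[int], List[int]]:
    """Each thread records its arrival at round r, then waits at the barrier;
    after the barrier, every thread must observe all n arrivals for round r."""
    barrier = SemaphoreBarrier(n_threads)
    lock = threading.Lock()
    arrived = [0] * n_rounds
    violations: List[int] = []

    def worker() -> None:
        for r in range(n_rounds):
            with lock:
                arrived[r] += 1
            barrier.wait()
            with lock:
                if arrived[r] != n_threads:
                    violations.append(r)

    workers = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return arrived, violations
```

The second turnstile is what makes the barrier reusable: with only the first phase, a thread that loops back quickly could consume a permit meant for a slower thread still leaving the previous round.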

My Princeton Junior Year Research Unpublished. Submitted to the university in partial fulfillment of the Computer Science BSE degree, January 2016. Completed during the fall semester of 2015-2016. Link to download report.

MapReduce

17 Sep 2016

MapReduce: Simplified Data Processing on Large Clusters Published December 2004 Paper link Abstract MapReduce offers an abstraction for large-scale computation: it manages scheduling, distribution, parallelism, partitioning, communication, and reliability uniformly for any application that follows its template for execution. Introduction Programming Model MR offers the application-level programmer two operations through which to express their large-scale computation. Note: the type...
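As an illustration of the two-operation programming model, here is a single-process sketch of the word-count example: a user-supplied map function emits intermediate key/value pairs, the framework groups them by key, and a user-supplied reduce function combines each group. The names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical, and the driver ignores everything that makes MapReduce interesting at scale (distribution, partitioning, fault tolerance). For simplicity, the reducer here returns a single value per key.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_fn(doc_name: str, contents: str) -> Iterator[Tuple[str, int]]:
    # Emit an intermediate (word, 1) pair for every word in the document.
    for word in contents.split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> int:
    # Combine all intermediate values that share the same key.
    return sum(counts)


def run_mapreduce(
    inputs: Iterable[Tuple[str, str]],
    mapper: Callable[[str, str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], int],
) -> Dict[str, int]:
    # Map phase, with a simple in-memory "shuffle" that groups values by key.
    intermediate: Dict[str, List[int]] = defaultdict(list)
    for key, value in inputs:
        for k2, v2 in mapper(key, value):
            intermediate[k2].append(v2)
    # Reduce phase: one reducer call per distinct intermediate key, in key order.
    return {k2: reducer(k2, vs) for k2, vs in sorted(intermediate.items())}


print(run_mapreduce([("d1", "the cat"), ("d2", "the dog")], map_fn, reduce_fn))
```

Because `map_fn` and `reduce_fn` only see one record or one key group at a time, a real runtime is free to run them on many machines in parallel; that independence is what the template buys.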

Ad Click Prediction

17 Jul 2016

Ad Click Prediction: a View from the Trenches Published August 2013 Paper link Abstract Introduction Brief System Overview Problem Statement For a given query, ad, and associated interaction and metadata represented as a real feature vector \(\textbf{x}\in\mathbb{R}^d\), provide an estimate of the probability \(\mathbb{P}(\text{click}(\textbf{x}))\) that the user making the query will click on the ad. Solving this problem has beneficial implications for ad auction pricing in Google’...
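One standard way to produce such an estimate is logistic regression, \(\mathbb{P}(\text{click}(\textbf{x})) = \sigma(\textbf{w}\cdot\textbf{x})\), which is the model family the paper studies. The sketch below stores \(\textbf{x}\) sparsely as a dict of named features and trains with a plain online gradient step on log loss; this is a deliberate simplification and not the FTRL-Proximal update the paper describes. Feature names such as `"query:shoes"` and the learning rate are hypothetical.

```python
import math
from typing import Dict

SparseVec = Dict[str, float]  # feature name -> value; absent features are 0


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_click(w: SparseVec, x: SparseVec) -> float:
    # Estimated P(click | x) = sigmoid(w . x), summing only over x's nonzeros.
    z = sum(w.get(f, 0.0) * v for f, v in x.items())
    return sigmoid(z)


def sgd_update(w: SparseVec, x: SparseVec, clicked: bool, lr: float = 0.1) -> float:
    """Predict, then take one online gradient step on log loss.
    Returns the prediction made *before* the update."""
    p = predict_click(w, x)
    g = p - (1.0 if clicked else 0.0)  # d(log loss)/dz for logistic regression
    for f, v in x.items():
        w[f] = w.get(f, 0.0) - lr * g * v
    return p


weights: SparseVec = {}
example = {"query:shoes": 1.0, "ad:42": 1.0}
sgd_update(weights, example, clicked=True)
print(predict_click(weights, example))  # now slightly above 0.5
```

The sparse representation matters here: only the features present in an example are read or written, so each update costs time proportional to the number of nonzero features rather than to \(d\).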

Ligra

09 Jul 2016

Ligra: A Lightweight Graph Processing Framework for Shared Memory Published February 2013 Paper link Abstract Ligra graph processing goals: Single machine, shared memory, multicore Lightweight BFS Mapping over vertex subsets Mapping over edge subsets Adapt to varying vertex degrees Introduction Motivation for Single Machine Framework Lower communication cost allows for performance benefits. Current demands do not necessitate the distributed computation framework previou...
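To show how "mapping over edge subsets" expresses BFS, here is a sequential sketch in the style of Ligra's edge-map abstraction: an edge map applies an `update` function to every edge leaving the current frontier whose target passes `cond`, and the targets for which `update` succeeds form the next frontier. The Python names `edge_map` and `bfs` are hypothetical analogues, not Ligra's C++ API, and the sketch omits the parallelism and the sparse/dense frontier switching that Ligra performs.

```python
from typing import Callable, Dict, List, Set

Graph = Dict[int, List[int]]  # adjacency list: vertex -> out-neighbors


def edge_map(
    graph: Graph,
    frontier: Set[int],
    update: Callable[[int, int], bool],
    cond: Callable[[int], bool],
) -> Set[int]:
    # Apply update(u, v) to every edge (u, v) leaving the frontier whose
    # target passes cond; targets where update succeeds form the new frontier.
    next_frontier: Set[int] = set()
    for u in frontier:
        for v in graph.get(u, []):
            if cond(v) and update(u, v):
                next_frontier.add(v)
    return next_frontier


def bfs(graph: Graph, source: int) -> Dict[int, int]:
    """Return a BFS parent map; the source is its own parent."""
    parents: Dict[int, int] = {source: source}

    def cond(v: int) -> bool:
        return v not in parents  # only unvisited vertices can be claimed

    def update(u: int, v: int) -> bool:
        # Re-check: another frontier vertex may have claimed v this round.
        if v in parents:
            return False
        parents[v] = u
        return True

    frontier = {source}
    while frontier:
        frontier = edge_map(graph, frontier, update, cond)
    return parents


print(bfs({0: [1, 2], 1: [3], 2: [3], 3: [], 4: [0]}, 0))
```

In a parallel setting, the re-check inside `update` would become an atomic compare-and-swap, since several frontier vertices may try to claim the same target concurrently.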