Fooled by Average Customer Revenue

Let’s say you are running a business. You have just on-boarded an (expensive) new agency for acquiring customers. After a month, you want to decide if you should continue using them. You do this by comparing the average revenue of customers brought in by them agency, to average revenue of customers from your existing channels. How many customers from that agency do you need, to make an informed decision? [Read More]

QUALIFY clause is now available in BigQuery

When browsing the BigQuery docs, I discovered that BigQuery now supports the QUALIFY-clause. At time of writing, it’s in preview, but you can already try it out.

The QUALIFY-clause lets you filter on the output of analytic functions. This allows me to scratch an itch I’ve had for a while:

For each distinct value in column X, find the row with max value in column Y

A common pattern is: “For each distinct value in column X, find the row where column Y takes it’s maximum value”. You can accomplish this by using ROW_NUMBER() OVER (PARTITION BY X ORDER BY Y DESC) AS rank.

An example of this would be to find the date at which each of a set of countries reported the largest number of new confirmed covid-19 cases. We’ll be using the bigquery-public-data.covid19_open_data dataset.

[Read More]
sql  big-query  gcp  bq 

Lenovo Yoga Slim 7 Pro running Fedora 34 (with WiFi)

I’ve been on the hunt for a $1000 Linux UltraBook for personal use. I finally decided on a Lenovo Yoga Slim 7 Pro, with a Ryzen 7 5800 CPU and 16 GB RAM. I’ll leave the in-depth hardware review for others, but I’ll offer a nano-review: I’m happy with both build quality and specs. Best value for money I’ve seen in a while!

I’m less happy about the Realtek 8852 WiFi module not having built-in support in the 5.13 Linux kernel. Everything else worked out of the box with a fresh installation of Fedora 34 (my distro of choice).

Drivers for the Realtek 8852 module can be downloaded here, but the included Makefile doesn’t work with Fedora if you’re using secure boot. I’ll provide some quick notes below on how to install the drivers on Fedora 34.

You’ll still need to install a few packages, and get code on your computer. Try using your phone as a USB tethered modem. If you need to purchase additional hardware (USB network interface), consider just replacing the WiFi module at probably the same cost.

[Read More]
linux 

Mutual Information – Example with categorical variables

Mutual information and its cousin, the Uncertainty coefficient (Theil’s U) are useful tools from Information Theory for discovering dependencies between variables that are not necessary described by a linear relationship.

Plenty of good material already exists on the subject: see Section 1.6 in “Pattern Recognition and Machine Learning” by Bishop, freely available as PDF online. What I hope to contribute here are some trivial examples, to help build intuition, based on categorical/multinomial variables.

[Read More]

KL Divergence Online Demo

To try out Shiny, I created an interactive visualization for Kullback-Leibler divergence (or KL Divergence). Right now, it only supports two univariate Gaussians, which should be sufficient to build some intuition. [Read More]

Proving the set-covering problem is NP-complete (using reduction from the vertex cover problem)

In this post, we will prove that the decision version of the set-covering problem is NP-complete, using a reduction from the vertex covering problem (which is NP-complete). This is Exercise 35.3-2 in CLRS3.

We will follow the template given in an earlier post.

Problem Statement

We seek to answer whether a set cover \(\mathcal{C}\) exists for \(X\) (the set containing the elements to be covered), containing a maximum of \(k\) sets.

[Read More]

How to show a problem is NP-complete

This is the first post in a series of posts where I will attempt to give visual, easy to understand, proofs of NP-completeness for a selection of decision problems. I created these while studying them back in college. Hopefully, they’ll be helpful for someone.

In this post, I will give a “template” which can be used (and will be used for the proofs I post). [Read More]

Proving 0-1 integer programming is NP-complete (using reduction from 3-CNF-SAT)

In this post, we will prove that 0-1 integer programming is NP-complete using a reduction from 3-CNF-SAT (which is NP-complete).

We will follow the template given in an earlier post.

Problem Statement

The decision problem for 0-1 integer programming is formulated as follows:

Given an integer \(m \times n\) matrix \(A\) and an integer \(m\)-vector \(b\), determine whether there exists an integer \(n\)-vector \(x\) with elements in \(\{0, 1\}\), such that:

\[ Ax \le b \] [Read More]

Proving set-partition problem is NP-complete (using reduction from subset sum)

In this post, we will prove that the set-partition problem is NP-complete using a reduction from the subset sum problem (which is NP-complete). This is also happens to be Exercise 34.5-5 in CLRS3.

We will follow the template given in an earlier post.

Problem statement

Given a set \(S\) of numbers, determine whether \(S\) can be partitioned into \(A\) and \(\overline{A} = S - A\), such that: \[ \sum_{x \in A} x = \sum_{x \in \overline{A}} x \] [Read More]