Most developers are familiar with the FizzBuzz code test which is a quick way to filter out developers who can’t code. At Yodle, we had our own, slightly more challenging problem. The challenge was read in a text file and then print out the frequency each word appears in descending order. It’s more complicated than FizzBuzz but it assesses a variety of skills. The solution needs to do the following:

  1. Read the file
  2. Split the text into a list of words delimited by non-letter characters
  3. Convert each word to lower case
  4. Compute the frequency each word appears
  5. Sort the results in descending order by frequency
  6. Print this sorted list

I thought it would be fun to see if I could do it in PostgreSQL and was surprised by how quick and easy it was. The most challenging part was figuring out how to read the file - after that it was just using a few of the built in functions to clean and organize the text.

DROP TABLE IF EXISTS temp_t;
CREATE TABLE temp_t (c text);

COPY temp_t(c) FROM '/tmp/data.txt';

select lower(data.w) as word, count(1) as num_words
from (
  select regexp_split_to_table((select string_agg(c,' ') from temp_t), E'[^\\w]+') as w
) data
where data.w <> ''
group by word
order by num_words desc, word;

It also turns out to be very simple to do FizzBuzz in PostgreSQL. The nice part of the PostgreSQL solution is that it can easily scale to adding a 3rd combination. for example print Dozz if the number is divisible by 7. In the PostgreSQL solution, it would just require adding a row whereas in the standard solutions it would require a bit of work and would increase the chance of a bug.

DROP TABLE IF EXISTS fizzbuzz;
CREATE TABLE fizzbuzz (
  num int,
  text varchar(4),
  priority smallint
);

insert into fizzbuzz (num, text, priority) values (3, 'Fizz', 1), (5, 'Buzz', 2);

select coalesce(string_agg(fizzbuzz.text, '' order by fizzbuzz.priority asc), nums.num::text) as text
from (
  select generate_series(1,100) as num
) nums
left join fizzbuzz on nums.num % fizzbuzz.num = 0
group by nums.num
order by nums.num asc;

Clearly PostgreSQL isn’t the right tool for every task but it’s surprising how powerful it can be given the right problem. It’s also a great way to think differently about a problem - even if you end up choosing a more standard solution.


Read more!