Most developers are familiar with the FizzBuzz code test which is a quick way to filter out developers who can’t code. At Yodle, we had our own, slightly more challenging problem. The challenge was read in a text file and then print out the frequency each word appears in descending order. It’s more complicated than FizzBuzz but it assesses a variety of skills. The solution needs to do the following:
- Read the file
- Split the text into a list of words delimited by non-letter characters
- Convert each word to lower case
- Compute the frequency each word appears
- Sort the results in descending order by frequency
- Print this sorted list
I thought it would be fun to see if I could do it in PostgreSQL and was surprised by how quick and easy it was. The most challenging part was figuring out how to read the file - after that it was just using a few of the built in functions to clean and organize the text.
DROP TABLE IF EXISTS temp_t;
CREATE TABLE temp_t (c text);
COPY temp_t(c) FROM '/tmp/data.txt';
select lower(data.w) as word, count(1) as num_words
from (
select regexp_split_to_table((select string_agg(c,' ') from temp_t), E'[^\\w]+') as w
) data
where data.w <> ''
group by word
order by num_words desc, word;
It also turns out to be very simple to do FizzBuzz in PostgreSQL. The nice part of the PostgreSQL solution is that it can easily scale to adding a 3rd combination. for example print Dozz if the number is divisible by 7. In the PostgreSQL solution, it would just require adding a row whereas in the standard solutions it would require a bit of work and would increase the chance of a bug.
DROP TABLE IF EXISTS fizzbuzz;
CREATE TABLE fizzbuzz (
num int,
text varchar(4),
priority smallint
);
insert into fizzbuzz (num, text, priority) values (3, 'Fizz', 1), (5, 'Buzz', 2);
select coalesce(string_agg(fizzbuzz.text, '' order by fizzbuzz.priority asc), nums.num::text) as text
from (
select generate_series(1,100) as num
) nums
left join fizzbuzz on nums.num % fizzbuzz.num = 0
group by nums.num
order by nums.num asc;
Clearly PostgreSQL isn’t the right tool for every task but it’s surprising how powerful it can be given the right problem. It’s also a great way to think differently about a problem - even if you end up choosing a more standard solution.