Lately I came across Codility, a company that presents itself quite simply: “we test coders”. I had been thinking for some time about trying my hand at some coding competition, and was in the process of searching for a new job, so I became curious. But after a quick look at their website the curiosity turned into disdain. It all sounded like boasting about being a coarse-grained filter to quickly separate the wheat from the chaff.

Except that the chaff are people too. It sounded pretty dystopian. Probably this is a First World problem, but hey, after interviewing in a couple of places, things already seemed sufficiently bleached and dead inside without any automatic filter dehumanizing the interviewing / testing even more. Typically in my interviews I missed feeling anything that would make me want to work with the interviewers; I wanted to find someone with passion, instead of someone boringly fulfilling the recruiter role for a hollow company.

And the Codility premise didn’t seem to help.

But after a couple of days I found myself going back to the thought of what they do. I started realizing that one of the reasons that pushed me to look for a new job was that I missed having an environment where I could grow as a programmer - ideally with programmers I could look up to. My job at the time was not in a software-centered company, so I guess that’s why I was lacking inspiration; everyone wore a variety of hats there, and little by little I was feeling more sure that my hat should be a software-centered one, while most other people’s hats were mainly on the hardware side.

But that doesn’t change the fact that the software side of things was not looking right. Even my attempts to spice up my own work and make it something I could be proud of were meeting resistance - for example, I would use the preprocessor a bit too much (say, to implement a dab of generic functions without the full problem of getting into C++) and get told that “you’re working too high level, this is firmware”. Yeah, I wonder what people using these techniques in the 70’s would have thought about even our humblest 8-bit uC with 128 KB of RAM.

So then it hit me. What would have happened if Codility had been used in the recruiting process of my own company? Probably most of the problems would have disappeared. And it reminded me of that FizzBuzz article, and how an astounding proportion of programmers plainly can’t program their way out of a wet paper bag.
What if Codility was in fact something to look for? What if a company using Codility could be a sign that they are serious about programmers?
And if they might be such a force for good, what about working for Codility themselves?

I decided to investigate them a bit more – even the application procedure was interesting – and tried one of their free tests. And it turned out to be easy. I finished in half of the allotted time, and after some cursory checking I submitted my solution feeling smug. It worked. I knew because I had done it in parallel in my own IDE and could see it was working. Yeah, I'm good.

…except that I made an error controlling the index of an array, which made my solution fucking useless in most cases apart from the small test case I had used. I got a score of 20%.

Well... that was sobering.

(In my defence I’ll say that Codility’s test required big, signed integers, and their solution checker had its own problems: their compiler emitted warnings about long long only being a C99 type, but didn’t allow setting compilation flags to, for example, specify C99 mode. So I concentrated on the sign problems… only to fail to realize that the more mundane problem was right there too.)

So I took to practicing with some of their other free tests, of which they have a full page for those wanting to improve. This helpfulness seemed strange at the time, but turned out to be an interesting detail later.

Why strange? Because until then I was rather impressed by their online testing/evaluation system and was half-sold on their possible goals, but could still see lots of warning signs that made me wonder whether they were a force for good or bad. Every rough corner had a big potential to change from being just an apparent nuisance to being a showstopper - for the candidate! - while the recruiting company would just receive a bad report about the candidate and Codility would come out looking like it did its job, even when that was arguable. And the opacity of Codility didn’t help me trust them. So, to avoid the risk of a letdown in case the ugly face was the real one, I decided to try my hand with Python, even though I am mostly a novice with it: if I was going to invest my time and get tested, why not at least take the chance to practice what I want and have some fun along the way.

Turns out, I think the testing style of the tasks does in fact favour doing just that: there is no time to stop to think about all the corner cases in C. Or rather, with Python you can do the same task using half the brain cycles.

Some tries later I was reliably getting good scores and tried interviewing with them. It turned out that the official test scores were good enough, so I finally got to talk to them. Into the Death Star!

Well, what a surprise. From the first moment they seemed even warm, in the best “mythological startup”-y way. Not only were they rather nice as plain people. Not only were they well aware of the shortcomings of the testing environment - they even seemed mortified about some of the problematic details I mentioned, and asked for details to fix them, though I had expected them to brush it aside with a “yep, we know, we have a bug report somewhere, we’ll fix it when we have time”.

Not only did they no longer sound like douchebags - they in fact put such an emphasis on teaching and on making the programming community better that it left me positively dazed. It sounded much better than the best case I had imagined. Remember the “strange page full of tests”? When I found it I thought it was strange because it seemed quite helpful for a site intent on fucking you up. Now I see it in another way; the “fucking you up” part is still there, but it is trying hard to get better/not be worse than it has to be (after all they are still a filter!). And that part is needed to pay for the other side, the one that really has a greater-than-themselves potential, with a dream: to teach!

But the intriguing thing is that they have the chance to teach in a unique way. After all, they have a big and growing corpus of programming examples showing how programmers try to do a task and how/where/why they fail. Imagine all that can be learnt from there. Even in a purely "psychology of programming" way it must be terribly interesting.

For example, Peter Norvig said something to the effect that “Design patterns are bug reports against your programming language” - which is great, because now I don’t need to explain myself as much when I say that I dislike the premise of design patterns :P. But what if the kind of “typical mistake” data that Codility can gather was in fact a source of antipatterns - meaning constructions which programmers typically get wrong? Maybe you could even find a blind spot or Achilles’ heel of human thinking; some articulation of thought which will cause wrong results for a big percentage of people. What if (some?) patterns made sense as a framework to automatically avoid those dangerous spots? Or inversely, what if you could standardize some antipatterns to let programmers realize when they are treading on especially dangerous ground?

Maybe it sounds far-fetched, but don’t we already have examples of something similar in high-school level philosophy courses? P -> Q implies ~Q -> ~P, but everyone has at some moment had the impulse to think ~P -> ~Q - and lots of people still do it, of course. Didn’t the Greeks already have people specializing in convincing others by exploiting just that kind of buggy thinking?
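To make the logic point concrete, here is a minimal sketch that checks both claims by brute force over the truth table. The helper names (`implies`, `equivalent`) are made up for this illustration.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material implication: a -> b is false only when a is true and b is false."""
    return (not a) or b


def equivalent(f, g) -> bool:
    """True if f and g agree on every assignment of (p, q)."""
    return all(f(p, q) == g(p, q) for p, q in product([False, True], repeat=2))


def original(p, q):        # P -> Q
    return implies(p, q)


def contrapositive(p, q):  # ~Q -> ~P
    return implies(not q, not p)


def inverse(p, q):         # ~P -> ~Q, the tempting but invalid step
    return implies(not p, not q)


print(equivalent(original, contrapositive))  # True
print(equivalent(original, inverse))         # False
```

The single row that breaks the inverse is P true, Q false: there P -> Q is false while ~P -> ~Q is true.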

Going slightly further, what if that kind of “problematic construction detection” was built into a compiler or lint-type tool?

But the most interesting thing would be a context where it touches even more pressing and real problems. For example, in some safety systems (say, railway traffic control) a number N of equivalent-but-different versions of a program run in parallel as a way to get redundancy. The different versions can cross-check their results and shut the system down in an orderly way when a discrepancy is detected, to avoid the system working in an unexpected state; or, if more than 2 versions are running, the result can be decided by majority vote, while the version(s) in the minority can be isolated and taken away for corrective measures. The idea is that N independent implementations by N different programmers should have different bugs in different places, affording a measure of safety.
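The voting scheme described above can be sketched in a few lines. This is only an illustration under my own assumptions: the three `max_v*` "versions" are invented (one is deliberately buggy, with an off-by-one in the loop bound), and a real safety system would of course do far more than raise an exception.

```python
from collections import Counter


def vote(results):
    """Return (majority_value, dissenting_indices).

    Raises RuntimeError when no value has a strict majority, which stands in
    for the "orderly shutdown" case.
    """
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("no majority among versions: shut down safely")
    dissenters = [i for i, r in enumerate(results) if r != value]
    return value, dissenters


# Three hypothetical, independently written versions of the same task.
def max_v1(xs):
    return max(xs)


def max_v2(xs):
    best = xs[0]
    for x in xs[1:]:
        if x > best:
            best = x
    return best


def max_v3(xs):
    best = xs[0]
    for i in range(1, len(xs) - 1):  # deliberate bug: never looks at the last element
        if xs[i] > best:
            best = xs[i]
    return best


versions = [max_v1, max_v2, max_v3]
results = [v([3, 1, 7]) for v in versions]
print(vote(results))  # (7, [2])
```

With only two versions, any disagreement leaves no strict majority, so `vote([1, 2])` raises instead of picking a winner.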

…or does it? Turns out that MIT published not long ago a very interesting paper in which they show that this approach to redundancy is NOT good, because it assumes that bugs are equiprobable and uncorrelated. But they aren’t! They depend both on the programmer and, more importantly and interestingly, on the task at hand. The MIT group asked a number of programmers with different backgrounds and experience to solve some problems, and they demonstrate that bugs are much more probable in certain parts of the task, no matter whether the programmer is a student or an experienced developer. Terribly interesting, even though their sample is too small to do much more than assert that the statistical independence of bugs is false - so N-version programming gets at the very least a pretty big warning.
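A back-of-the-envelope calculation shows why correlation matters so much. The model below is my own toy assumption, not the one from the paper: a fraction `h` of inputs is "hard", versions only fail on hard inputs, and each fails there with probability `q`. The per-version failure rate is then `h * q` in both scenarios, yet the chance that at least 2 of 3 versions fail on the same input is very different.

```python
def majority_failure(q: float) -> float:
    """Probability that at least 2 of 3 versions fail, each failing independently with probability q."""
    return 3 * q**2 * (1 - q) + q**3


def independent_model(p: float) -> float:
    """Every version fails with probability p on any input, independently of the others."""
    return majority_failure(p)


def hard_input_model(h: float, q: float) -> float:
    """Toy correlated model: failures only happen on the fraction h of 'hard' inputs."""
    return h * majority_failure(q)


h, q = 0.1, 0.1
p = h * q  # same per-version failure rate (1%) in both models

print(f"independent: {independent_model(p):.6f}")
print(f"correlated:  {hard_input_model(h, q):.6f}")
```

With these made-up numbers the correlated case comes out roughly nine times worse than the independence assumption would predict, even though each individual version looks exactly as reliable.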

But now imagine what you could do to attack the bug determinism in N-version programming if you had a database of how thousands of programmers tend to fail in different programming tasks.
How BEAUTIFUL could such research be, and how terribly useful could the result be? Things like that are the ones which make me think about getting into a PhD…

Also, I think it is interesting to compare that to machine translation. For a long time it was an intractable problem, which gave useless results. And then, “all of a sudden”, we have Google Translate, which usually gives pretty good results (if you have never used 90’s style translating software, do believe me, Google Translate is just incredible). What changed? Well, the approach changed. Instead of having a machine “understand” the original text and translate it into a target language, now Google has a huge corpus of translations of the same texts into different languages, and can statistically link parts of a text in language A to parts of the same text in language B. The Rosetta Stone, a billion times over, with feedback from users and an ever-growing corpus.
Again: a big corpus, statistics, translation / finding correspondence.
What could Codility do here with its data? I am still thinking about that, but the potential has to be huge…

So. Yep, I finally interviewed with them. And though it all seemed so interesting, in the end it was not to be. But I certainly expect them to evolve into something great… if only because, if they don’t, the prospect of the nasty-looking part taking over the dreamy part sounds horrible.

So here’s hoping that they will really be a force for good! And looking forward to companies either using Codility with a measure of taste and care - or improving their recruitment processes for programmers… because if not, I can see myself fully taking advantage of the test periods :P.

[UPDATE: 2 years later, things look less wonderful… ]

1 comment

  1. Interesting :)

    About Google Translate - it is a good tool if you translate from/to English.

    If you translate from some other language to Polish, the biggest problem is English-based words. In translation to English they are left untranslated, but when you translate directly to Polish they are "unfortunately" translated from English without context...

    So, if you have a text about motorbikes in e.g. Swedish and it uses the word "bike", in Polish it will be translated as "rower". Or if you have a text about technical diving in e.g. Norwegian and it uses the term "stage", which means a set of diving tank and breathing apparatus, it will be translated to Polish as "poziom"...

    Therefore, I always translate to/via English with Google Translate ;)