Please don't use Cucumber
May 31, 2012
Cucumber is by far my least favorite thing in the Ruby ecosystem, and also the worst example of cargo cult programming. Cucumber has almost no practical benefit over acceptance testing in pure Ruby with Capybara. I understand the philosophical goals behind behavior driven development, but in the real world, Cucumber is a solution looking for a problem.
The fact that Cucumber has gained the popularity it has in the Ruby community is outright baffling to me. All the reasons to use it that people give are theoretical, and I have never seen them matter or be remotely applicable in the real world. Cucumber aims to bridge the gap between software developers and non-technical stakeholders, but the reality is that product managers don't really care about Gherkin. Their time is better spent brainstorming all the various use cases for a feature and communicating this either verbally or in free form text. Reading (and especially writing) Gherkin is a waste of their time, because Gherkin is not English. It's extremely inflexible Ruby disguised as English. The more naturally it reads, the more difficult it is to translate it into reusable code via step definitions.
There are basically two extremes of Cucumber:
- Writing Gherkin describing the feature at a very high level, and reusing few of the step definitions between features.
- Reusing step definitions, resulting in a level of detail described in the Gherkin which is not useful for any of the stakeholders.
Everything in between is just a bad compromise of one or the other.
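To make the trade-off concrete, here is a minimal sketch of how a step definition ties a line of "English" to code. `StepRegistry`, the cart, and the step wording are all hypothetical names invented for illustration; this is a toy in plain Ruby, not Cucumber's actual implementation.

```ruby
# Hypothetical miniature of a step-definition registry (not Cucumber's real internals).
class StepRegistry
  def initialize
    @steps = []
  end

  # Register a pattern and the block to run when a line matches it.
  def define(pattern, &block)
    @steps << [pattern, block]
  end

  # Run the first step whose pattern matches, passing the regex captures to its block.
  def run(line)
    @steps.each do |pattern, block|
      match = pattern.match(line)
      return block.call(*match.captures) if match
    end
    raise ArgumentError, "undefined step: #{line}"
  end
end

cart = []
steps = StepRegistry.new

# Reusable, but the wording is dictated by the regex rather than by the reader.
steps.define(/\AI add (\d+) "([^"]+)" to the cart\z/) do |count, item|
  count.to_i.times { cart << item }
end

# Reads naturally, but can only ever serve this one scenario.
steps.define(/\AI stock up on coffee for the week\z/) do
  7.times { cart << "coffee" }
end

steps.run('I add 2 "tea" to the cart')
steps.run("I stock up on coffee for the week")
```

Rephrasing the first line as `I put 2 "tea" in the cart` would raise an undefined step error in this sketch, which is the sense in which the text is code with a fixed syntax and not free form English.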
Gherkin is really just glorified comments. If you simply write free form comments above Capybara scenarios, you can convey the same high level information about what the test is doing and what the acceptance criteria are, without any of the overhead, maintenance cost, and general technical debt of Cucumber. This doesn't allow for the real red-green-refactor cycle from the outside in of the BDD philosophy, but in my experience, developers tend to avoid the test-first approach with Cucumber simply because it's so painful to use. If you're not really following BDD practices, and your non-technical stakeholders are not reading or writing Gherkin, Cucumber is wasting your developers' time and bloating your test suite.
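As a rough sketch of the commented-scenario approach, the example below puts the acceptance criteria in a free form comment directly above the steps. `FakePage`, the login flow, and `sign_in_scenario` are hypothetical stand-ins so the sketch runs without a browser or the Capybara gem; the method names mirror Capybara's `visit`, `fill_in`, `click_button`, and `has_content?`, and a real suite would call those on an actual Capybara session.

```ruby
# Hypothetical stand-in for a Capybara session; it fakes a tiny login page.
class FakePage
  def initialize
    @fields = {}
    @body = ""
  end

  def visit(path)
    @body = "Sign in" if path == "/login"
  end

  def fill_in(label, with:)
    @fields[label] = with
  end

  def click_button(_label)
    @body = @fields["Password"] == "secret" ? "Welcome back" : "Invalid password"
  end

  def has_content?(text)
    @body.include?(text)
  end
end

# Scenario: a registered user signs in.
# Acceptance criteria:
#   - the correct password leads to a welcome message
#   - the password itself is never echoed back on the page
def sign_in_scenario(page)
  page.visit "/login"
  page.fill_in "Email", with: "user@example.com"
  page.fill_in "Password", with: "secret"
  page.click_button "Sign in"
  page.has_content?("Welcome back") && !page.has_content?("secret")
end

puts sign_in_scenario(FakePage.new) ? "scenario passed" : "scenario failed"
```

The comment carries the same information a Gherkin feature would, and the steps stay ordinary Ruby, so loops, conditionals, and helper methods are available without any translation layer.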
The one advantage Cucumber offers over simply commenting Capybara scenarios is that, by tying the "English" directly to the implementation, it's impossible for the "comment" to rot. This is certainly a danger, as a misleading comment is worse than no comment at all. However, this benefit comes at an extremely heavy cost. I would argue that it should simply be the discipline of developers to make sure that any time a Capybara scenario is updated, the corresponding comment is read through and updated as necessary.
Whenever someone writes a criticism of a particular piece of software, there is always a group of people who respond by saying, "It's just a tool. If it works for you, use it. If it doesn't, don't." While I agree in theory, this is where the effect of the cargo cult becomes real and damaging. Some guy somewhere came up with this idea that seemed great in theory, and everyone jumped on the bandwagon doing it because it sounded cool and it seemed like something they should do. After a while, people choose to use it just because it became the status quo. They don't see that all the reasons a tool like Cucumber seemed like a good idea, based on some blog post they read 3 years ago, are not in tune with the real, practical needs of their project or their organization. And once that choice has been made, everyone has to live with the increasing technical debt and slowed, painful development it creates.
Comments
May 31, 2012
Thanks for this. I'm just about to set up acceptance tests for a new Ruby project, but I hadn't yet dived into the murky gherkin-filled waters.
Now that you've laid it all out, it's plain to see that Cucumber is a process disguised as a tool. Separating that process into two human-organization-friendly steps (write the scenario in English and the acceptance tests in Ruby) removes the need to write a translation between the two. As you say, the additional requirement to keep the two in sync isn't nearly as much work as keeping a brittle translator up-to-date.
May 31, 2012
The world isn't black or white. I think it's safe to assume that a lot of companies are doing ATDD/Specification by Example with Cucumber and have a lot of success with it. Especially when they do TDD and start with the features first.
There are of course other projects and settings where it's very hard to install an environment that can deal with writing feature/scenario-files upfront, especially when there is no business product owner available (e.g. in chaotic environments).
Anyhow. It depends.
Please check companies that successfully use cucumber and ask them questions about your concerns before judging.
May 31, 2012
The fact that people use it successfully is NOT testament to whether it is advantageous or even necessary. Companies use Perl successfully. Companies use Visual Basic successfully. A few companies still use COBOL successfully. I feel justified in being critical of those companies, even though they do use those tools successfully.
In my opinion, a good analogy is Haml. Haml adds nothing to the capabilities of Rails, it is less performant than plain Rails, yet some people feel it is the "Right Thing To Do". To the extent that Obie's "The Rails 3 Way" used nothing but Haml throughout. Which I feel was a tremendous mistake. Because his book was supposed to be about Rails, not Haml. Even if he felt Haml was a good thing, you don't write a book about servicing Ford Mustangs using only Shelby models as examples.
I sometimes get a real chuckle out of some of these chuckleheads who discuss half-baked theory as though it were gospel and then seldom actually follow it anyway. The other day I saw a post in which somebody put a hodgepodge of code in a model to get it out of the controller, but it wasn't coherent at all. And commenter after commenter gave him kudos for using the "Single Responsibility Principle", even though his model called methods of several other models. Meh. :o/
I have often noticed on Twitter, when people talk theory to @dhh, he often says "In this case, it's more trouble than it's worth. Why bother?"
I have the same basic opinion of Cucumber.
June 01, 2012
I completely agree. I never chose Cucumber but I was in a project that used it. Many, many times I found myself doing crazy things because my test wasn't written in Ruby and I couldn't just add a loop here or there or an if here or there.
Before I ever used Cucumber, it looked puzzling and I was skeptical, even though I'm generally an adopter of new stuff; after using it, I'm really confident I don't want to use it ever again.
June 01, 2012
Thanks for sharing your perspective on cucumber.
Unrelated to this thread- thanks for having your website enabled for mobile apps. Many times I follow a url from Twitter only to find I need to check back when I'm on a lap/desktop. (Small things matter! Great website! Thank you!)
June 01, 2012
Thanks for your post. I have a couple of thoughts:
First premise is
"Cucumber has almost no practical benefit over acceptance testing in
pure Ruby with Capybara https://github.com/jnicklas/capybara"
Evidence:
"I have never seen them matter or be remotely applicable in the real world"
This statement is relative to the person's experience but it is expressed
as a generalisation. A single person's experience does not provide
sufficient evidence of the stated premise.
"Gherkin is really just glorified comments."
This point is a misclassification. Comments do not have structure or
grammar rules and they are not executed. Comments are removed by the
compiler.
"If you simply write free form comments above Capybara scenarios, you can convey the same high level information about what the test is doing and what the acceptance criteria are "
Correct. You could replicate Cucumber in comments.
"without any of the overhead, maintenance cost, and general technical
debt of Cucumber"
Correct, but that does not exclude the maintenance cost of keeping the
comments up to date, using consistent language, etc.
"but in my experience, developers tend to avoid the test-first approach
with Cucumber simply because it's so painful to use"
Again a good point and I'm sure that could be the case with some
developers. Don't use a tool that you find painful. But again, it's an
extrapolation to generalize this.
Second premise:
"product managers don't really care about Gherkin"
Evidence: (there does not seem to be any)
"Their time is better spent brainstorming all the various use cases for a
feature and communicating this either verbally or in free form text.
Reading (and especially writing) Gherkin is a waste of their time,
because Gherkin is not English."
Maybe they could better use their time but that does not provide any
evidence of the premise.
Third premise:
"Gherkin is not English."
Correct, Gherkin is not pure, free flowing English; it is a subset of English.
"It's extremely inflexible Ruby disguised as English."
Technically incorrect. Gherkin has no tie to Ruby (and in fact uses a
Python syntax for multiline strings).
" The more naturally it reads, the more difficult it is to translate it
into reusable code via step definitions. "
A good point. This is a challenge a lot of people face and it's a
difficult one. I've seen people solve this but that does not discount
the point.
Fourth premise (and your conclusion)
"Whenever someone writes a criticism of a particular piece of software,
there is always a group of people who respond by saying, "It's just a
tool. If it works for you, use it. If it doesn't, don't." While I agree
in theory, this is where the effect of the cargo cult becomes real and
damaging. "
I personally find this one difficult to understand. Telling people to think
and evaluate their tools and why they use them feels like the opposite
of cargo culting. I would like to hear more about how telling people to think leads to cargo culting.
June 01, 2012
100% true! Else, cucumber is a good tool. Not perfect. But worth it.
Also, some people use it for executable documentation as well: https://www.relishapp.com/
June 01, 2012
You have a point about the cargo-culters, I've laid into them more than once myself[1].
I don't think it's a good idea to generalise in either direction though. Cucumber isn't a silver bullet, but it is definitely helping many teams I've worked with to build better relationships between the business and technical facing sides of the team. It might not have worked for you, but that doesn't mean it won't work for anyone.
Gojko Adzic's book, Specification by Example[2], is packed with examples of people applying BDD ideas in lots of different contexts.
[1] http://skillsmatter.com/podcast/agile-testing/refuctoring-your-cukes
[2] http://specificationbyexample.com/
June 02, 2012
The idea of having the customer write executable specifications is probably a pipe dream. As a developer writing integration tests I want to avoid the overhead of the Gherkin to Ruby translation step. You can produce very readable testing DSLs in Ruby.
June 02, 2012
@ Joseph Wilk:
Re: "Technically incorrect, Gherkin has no tie to Ruby (and in fact uses a
python syntax for multiline strings)"
Technically correct. When used in the context of Ruby (and this whole blog post is about Ruby), then Gherkin is a DSL that ultimately results in Ruby code. The syntax of the DSL has exactly zero relevance to the point he was making.
Regarding these comments:
"This statement is relative to the persons experience but it is expressed
as generalisation. A single persons experience does not provide
sufficient evidence of the stated premise."
"But again its an extrapolate to generalize this."
"Maybe they could better use their time but that does not provide any
evidence of the premise."
These comments would be entirely appropriate... if this were a scientific paper about a study of the subject.
But it is not. It is a blog post, which in most cases, by its very nature, means that the author is discussing the matter in the context of personal experience, and expressing opinion. You have been criticizing the author for expressing opinion, when the author never pretended to be doing anything else.
Further, while you criticize him for expressing his opinion, and not presenting "evidence", you have offered no contrary evidence of your own. I smell a bit of hypocrisy here.
Please get off your high horse, and use scientific rigor in its appropriate place... which is not in response to an opinion raised in a blog post. Unless you do, indeed, have some actual evidence to justify such a position.
June 03, 2012
The biggest advantage of Cucumber is that it truly facilitates outside-in development. You can start with Cucumber scenarios which rely directly on some model objects (forget about Rails in the first step). Then add ActiveRecord to the mix, then Rails, then the UI, and finally drive out the same scenarios interacting with the UI through Capybara. Throughout this whole process you are not rewriting the Gherkin features.
I highly recommend you read The Cucumber Book by Matt Wynne before you jump to conclusions.
June 04, 2012
I documented some of my explorations on my blog posts that are tagged "cucumber": http://testerstories.com/?cat=9.
I'm still forming opinions but, overall, I've found Cucumber (and tools like it) can be quite effective and even efficient if the premise is that your tests should always be expressive and always convey intent in the language of the business domain. There are ways to do this directly via code as well, I realize. However, I've had more luck getting testers, developers, and business analysts to write Gherkin than I have to write code. And it is critical (to me) that everyone should be able to write these kinds of acceptance-based tests, having to know the minimum of code constructs. [That being said: I am experimenting with DSL-like aspects where people could effectively write code and barely realize they are doing so.]
You mention: "Cucumber aims to bridge the gap between software developers and non-technical stakeholders..."
While that may be how some people word it, I prefer to look at it as a way to provide a shared notion of quality by allowing a constrained, shared business domain language to be written by various stakeholders on a project.
You mention: "Reading (and especially writing) Gherkin is a waste of their time, because Gherkin is not English."
No, I agree it's not full English but it's a constrained form of English and, as such, can be used to guide thinking and guide expression of intent. This is where I've found the opposite of your findings: namely, people like managers, developers, and business analysts do not find it a waste of their time at all. (A good example of a game system that does something similar is Inform 7 [http://inform7.com].)
You mention: "The more naturally it reads, the more difficult it is to translate it into reusable code via step definitions."
I have found this not to be true --- but, I will say, the only way I found this was by using composable steps. Meaning, where steps can refer to other steps. Some people in the Cucumber world think this is the devil's work but I've found it quite effective and I have worked in domains like clinical trials, banking, and hedge funds where the domain conditions are very complex to describe.
All this being said, I appreciate your viewpoint and, as I said, I'm still coming to terms with these tools myself. I've gone back and forth as to whether I find them cumbersome, helpful, the greatest thing since sliced bread, or the worst idea since "reality TV."