VaultWiki Development Reality Check
May 28, 2014 at 12:48 AM
No one really needs to read this. I just go on about our development mistakes and some steps we've taken to course-correct. It's more of a writing exercise than anything, an attempt to articulate some kind of order and basis for recent decisions on our part.
The last time I spoke on our development practices was almost 4 years ago now. We had just begun work on VaultWiki 4 and had no idea really what an undertaking it would be. We had grandiose plans and great new ideals for the process like offline programming and using IDEs.
We were going to code offline.
We were going to code in IDEs with built-in error checking.
We were going to deploy code only when it was confirmed working offline.
It was going to be amazing.
We had a joke going that the number 4 couldn't be cursed after vBulletin tried to use it.
We would make VaultWiki 4 and it was going to be bug free.
Using Eclipse was great -- it performed silly syntax error checking as we typed, and theoretically we could unit test code in it using our offline server.
Well the unit testing functionality never worked, no matter what IDE we used, or what testing software we tried. It had something to do with us loving symlinks, and test suites not being able to follow them properly on Windows. Or something. So we were back to online testing.
Not such a big deal. We could do that in an online but private test environment. As it turned out, Eclipse didn't actually perform error checking or code searching in a useful way when the code-base wasn't local to Eclipse's workspace. So we still had to develop offline and redeploy. And of course we found that if we were testing online, it was just faster to fix the online instance, test again, fix anything else, and so on, all online, rather than fix something offline, upload the file, see if it worked, and repeat. So we started coding online again for bug fixes, and continued offline for major features.
Oh, this was a nightmare. We couldn't keep track of which files were updated and which weren't, and we were constantly uploading and downloading the whole package. That led to iteration times of upwards of 2 hours on bad days (I don't recommend Cablevision to anyone serious about high-speed connectivity) just to see if a silly syntax error would go away.
Why weren't we using version control for all this pushing and pulling? That would have made things a lot easier, but the reason goes back to around 2009, when we hired someone to install SVN on our box. The box kept complaining, and finally they came back and said, "It's installed. But it doesn't actually work. It seems like SVN isn't compatible with the way your systems are set up." So that was that for SVN. We ended up writing an in-house VCS for archiving, but it had no branch support, no merge support, and really nothing useful other than archival and retrieval of already released versions. By the way, we used this VCS (called Arsenal), pretty much unchanged, until 2 days ago. More on that later.
So we were doing all this pushing and pulling, giving up on proper merging or branching even because SVN was a bust, and trying to manually keep track of stuff that needed to be merged and make those file edits all by hand. And we made mistakes, and some of those mistakes are still in the product today.
There was actually a whole bunch of Admin Panel functionality that we developed and a month's work was lost because a developer accidentally deleted the wrong directory after refactoring some of it. And we had no VCS to fall back on.
Around the time that VaultWiki 4 was being privately Alpha-tested, it was wonderful. We had in-house QA testers churning out a hundred reports an hour, and we had private Alpha users churning out another hundred a day. This all went great, except that the private Alpha was probably too small, and a natural disaster cost us a central office and left us with a continuing financial burden, which meant we just could not have such great QA anymore.
Eventually the time came when we released VaultWiki 4, and everything seemed okay at the time. If there were major problems that were missed by the limited QA now delegated to the programmers themselves, we had the Arsenal, which we could use to generate new builds for the current revision. Except that it had no branching support, so when we started generating fixed builds for the last release off of a development code-base that was already moving forward to the next release, we started introducing more bugs than we resolved. To resolve this problem, we eventually came up with the bright idea of not moving toward a new release until it seemed like no more major issues were being found.
The last straw was Gamma 5. We had 7 re-builds total, the most of any version to date, and it still contained showstopper issues, many brought on by the simple act of re-building.
What Now
We took a hard look at ourselves: what we had wanted to accomplish, what we had actually accomplished, and why the two didn't align. Well, the reasons are mostly all in the above paragraphs. The more development went on, the more of the Joel Test we basically failed:
- Do you use source control? No.
- Can you make a build in one step? Yes,
but the way it was implemented actually introduced more bugs than if we had just done the steps manually. For this reason, I don't think I agree with this part of the test. Perhaps if the answer to #1 is Yes, it's okay.
- Do you make daily builds? No.
- Do you have a bug database? Yes,
but it was not consistently used for programmer-discovered bugs or for certain pet projects. It was not consistently used by users, who sometimes sent bug reports as emails or PMs... or via Skype which we almost never look at.
- Do you fix bugs before writing new code? Yes,
this was one thing we actually learned from vBulletin 4's mistakes.
- Do you have an up-to-date schedule? Nope.
After failing to estimate any kind of schedule for VaultWiki 4 correctly in the first 2 years, we gave up trying.
- Do you have a spec? Not really.
If we had any sort of functional spec, it was this very brief post. There was a technical spec early on for database relationships, written in a tiny, ruled notebook. It was never typed up and no one knows where the notebook is anymore.
- Do programmers have quiet working conditions? Yes.
Unless a workstation starts bellowing like a frog or the alarm speaker goes off for no plausible reason, except that the speaker is defective and needs to be replaced whenever someone finds the time to order one.
- Do you use the best tools money can buy? Yes.
The workstation I'm using now cost over $6000. And it's not finished. Perhaps some of this money should have gone towards testers but at least my system doesn't lock up anymore.
- Do you have testers? No.
- Do new candidates write code during their interview? Yes.
I have always insisted that this be the case. I have found that even some CS graduates and seasoned professionals don't follow semantics or generally accepted specs.
- Do you do hallway usability testing? Usually no.
I have personally done this from time to time, very literally so. The passersby are generally reluctant to participate. And since I'm not in a high foot-traffic area anymore, it's been a while.
With a score of 6, Joel suggests we're in the "serious problems" range. I'd tend to agree, aside from personally not agreeing with #2. We're going to be changing our answer from Yes to No because of the problems Yes to that one has caused us.
What are we doing to correct the situation?
For starters, and this one is going to go a long way:
#1 - 3 We finally got real version control installed and working on our systems. We're using Mercurial, mainly because SVN and Git didn't have a feature or plugin to preserve timestamps, and we had an existing archive of over 50 dated releases that needed its metadata maintained. And also because I personally tried using the Git+Eclipse integration or whatever it's called, and countless other tools, and couldn't get anything pushed to the server. No such problems with the Mercurial tools.
This means we can branch and merge code now. It means we can develop offline and push to the server in seconds, rather than hours. And it means we can make daily/nightly/hourly/whenever-we-want builds just in case we accidentally delete something that took us a month to write. And now, with a distributed VCS, if the server goes down or a workstation crashes, the loss to the codebase is minimized.
This means we can write code offline, and let the IDE find stupid bugs again, as we originally intended. It means we can write code and branches of code, that we don't apply to the current release, so we avoid unintended bugs from code that's not ready yet. And we've updated our archive so that we can generate new builds based on previous revisions, rather than whatever the current working directory is at, so we can still patch old versions quickly without holding up development progress.
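As a rough illustration of building from a fixed revision instead of the working directory, here is a minimal sketch that wraps Mercurial's `hg archive` command. This is not our actual toolset: the function names, the tag `4.0.0-gamma5`, and the output file name are all made up for the example.

```python
import subprocess


def hg_archive_command(revision, destination, archive_type="zip"):
    """Build the argv for `hg archive`, which exports one revision's files.

    `revision` can be a tag, branch name, or changeset ID. The export does
    not depend on what is currently checked out in the working directory.
    """
    return ["hg", "archive", "--rev", revision, "--type", archive_type, destination]


def build_release(repo_dir, revision, destination):
    """Run the archive command inside the repository at `repo_dir`.

    Hypothetical helper: assumes `hg` is on the PATH and `repo_dir` is a
    Mercurial repository containing `revision`.
    """
    subprocess.run(hg_archive_command(revision, destination), cwd=repo_dir, check=True)


# Hypothetical tag and output file, shown only to illustrate the command shape.
print(" ".join(hg_archive_command("4.0.0-gamma5", "vaultwiki-gamma5.zip")))
```

Because the revision is named explicitly, a patched build of an old release can be cut while the default branch keeps moving toward the next one.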
#6 I've been looking into ways to create meaningful schedules for myself using some of the Evidence Based Scheduling techniques Joel recommends. We have Microsoft Project here, but for the life of me I don't see why this exists when you can do the same things in Microsoft Excel much faster with fewer headaches. Until we rewrite our bug tracker, and incorporate scheduling in it, I think I'll be setting up some Excel spreadsheets and charts to be used for the foreseeable future.
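For anyone curious what Evidence Based Scheduling boils down to before it lands in a spreadsheet, here is a minimal sketch of the core idea as I understand it: divide past estimates by actual times to get "velocities", then replay the remaining estimates against randomly chosen velocities many times. The task hours below are invented sample data, and the function names are just for this example.

```python
import random


def simulate_total_hours(history, remaining_estimates, rounds=1000, seed=0):
    """Return a sorted list of simulated totals (hours) for the remaining work.

    `history` is a list of (estimated_hours, actual_hours) for finished tasks.
    """
    rng = random.Random(seed)
    # Velocity = estimate / actual; below 1.0 means the task ran over.
    velocities = [est / act for est, act in history if est > 0 and act > 0]
    if not velocities:
        raise ValueError("need at least one finished task with positive hours")
    totals = []
    for _ in range(rounds):
        # Scale each remaining estimate by a randomly chosen past velocity.
        totals.append(sum(est / rng.choice(velocities) for est in remaining_estimates))
    totals.sort()
    return totals


def percentile(sorted_totals, fraction):
    """Pick the simulated total at the given fraction of the sorted runs."""
    index = min(len(sorted_totals) - 1, int(fraction * len(sorted_totals)))
    return sorted_totals[index]


# Invented sample data: (estimated, actual) hours for past tasks.
history = [(4, 6), (2, 2), (8, 16), (1, 3)]
remaining = [3, 5, 8]  # estimates for tasks not yet started

totals = simulate_total_hours(history, remaining)
for fraction in (0.5, 0.9):
    hours = percentile(totals, fraction)
    print(f"about {int(fraction * 100)}% of simulated runs finish within {hours:.1f} hours")
```

The output is a spread of likely totals rather than a single date, which is the part a plain sum of estimates never gives you.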
#7 We need a spec. At the very least, somebody could use it as a reference for writing Manual entries on this site, or use it directly as those entries. The Manual is still lacking for VaultWiki 4, and that definitely is not helping sales. I feel that the spec should be public, using our wiki, so that users know what to expect and we won't really have a need for threads like this anymore.
#8 Somebody is fixing my workstation (again) next week.
#10 & 12 We definitely need more QA. I should grab some people occasionally like I used to, but it might help to have a spec first. Either way, they could help find showstoppers like this (that one specifically happened because of one of the pre-VCS merges we tried in Gamma 5).
That's All for Now
Well, I spent the whole post talking about our historical mistakes and the Joel Test, so I didn't cover what I set out to when I decided to write another blog entry: our new VCS, how the toolset works with our existing archives, and how we plan on integrating it into VaultWiki (say what?). Maybe I'll find some time again soon, as it's pretty exciting stuff. Right now the editor is complaining my post is too long.