Random header image... Refresh for more!

Pizza Pizza

Visual Studio never fails to come up with amusing ways to go horribly wrong.

And yet somehow, “Append” is immune.

May 18, 2010   No Comments

That’s What I’m Talking About.

task main()
{
  TextOut(0, LCD_LINE1, "Starting in 2s");
  Wait(2000);
 
  TextOut(0, LCD_LINE1, "Rotate 45 +");
  RotateMotor(OUT_A, 75, 45);
 
  TextOut(0, LCD_LINE1, "Rotate 45 -");
  RotateMotor(OUT_A, -75, 45);

  TextOut(0, LCD_LINE1, "Rotate 360 + x2");
  RotateMotor(OUT_A, 75, 360);
  RotateMotor(OUT_A, 75, 360);

  TextOut(0, LCD_LINE1, "Rotate 180 -");
  RotateMotor(OUT_A, -75, 180);
}

February 26, 2010   No Comments

What is NXC?

What is NXC?

That’s a really good question…  I’m not exactly sure.

It’s “Not eXactly C”, a programming language developed for programming Mindstorm NXT bricks.  It’s supposed to be similar to C.  There’s also the related language NBC, which is an assembly-like language for the NXT.  I’m hoping that either one of them will be easier to deal with than the default NXT-G programming language, which reminds me of a bad symbol puzzle from a Myst game.

BricxCC is an IDE for NXC/NBC and whatever the RCX bricks used.  It’s available from here:  http://bricxcc.sourceforge.net/

NXC and NBC information is here: http://bricxcc.sourceforge.net/nbc/

February 26, 2010   No Comments

And right into the next bug…

So, I fix the bug with the waiting and I realize there’s another bug close behind.  It’s now properly waiting for the action to complete, the trouble is that the action is the last action I told it to do.  In other words, I start it up and tell it to rotate 10 degrees and it does nothing.  Then I tell it to rotate 360 degrees and it goes 10.  Then I tell it to go 180 and it goes 360.

The NXT program is supposed to wait for a “Go!” signal, then read two packets from the same mailbox for values to pass to the motor.  The mailbox is a queue, so if it’s reading the Go! signal, then racing to the motor command before the other two messages arrive, then the mailbox should be out of sync for the next Go! signal.

But that’s the scenario that makes the most sense here.  It gets a “Go!” then goes straight to the motor before the next two packets arrive.  So it reads zero and does nothing.  Then the next Go! code comes in.  By then, the values have been written from the first request, so they’re what gets sent to the motor.  But that would make it out of sync.  The second Go! code would be reading a boolean, then there’d be a number where a bool should be, and finally, the bit that should have a number will get the string “Go!”.

So…  WTF?

February 25, 2010   No Comments

Achievement Unlocked: Wild Goose Chase

So, I found out why it wasn’t waiting…

It was waiting and I’m just a moron.

I had my program set up to send a Bluetooth message to the NXT to retrieve the contents of Mailbox 10.  One the NXT, I was writing true to Mailbox 10 when I started an action and false when the action completed, so that Mailbox 10 could be used as a “Busy” signal.  My program would poll until Mailbox 10 reported false, then continue.

Thing is, I was sending the “Clear Mailbox” flag to the NXT.  So the NXT gave me the contents of Mailbox 10, then cleared Mailbox 10.  The first read returned true, because the motor was busy.  The second read returned false, not because the action had completed, but beacuse the “Clear Mailbox” flag caused Mailbox 10 to be filled with a bunch of zeroes.

February 25, 2010   No Comments

Perhaps I Don’t Understand…

The checkbox says “Wait for Completion”.  I would think that means that it would wait for completion before proceeding.  So why is it immediately continuing and sending me the “Done” signal?  The motor acts different if I have the box checked or not, so I know it’s doing something.  I’m just not sure what it’s doing.

I think it’s using a different definition of completion than I am.

February 25, 2010   No Comments

Um. Yeah. Whatever.

I’m sure this made sense to me at some point.

February 25, 2010   No Comments

This Code Doesn’t Work And Here’s Why

I am a fan of continuous integration and automated builds.  It’s a great tool to help ensure that people are always checking in clean code that can be built successfully.  However, I have found that simply having an automated build system that sends out mail isn’t enough.  Some people are too lazy to set up proper e-mail filters, so they end up deleting all mails from the build server, including the “HEY, YOU BROKE THE BUILD” e-mails that are required for the proper functioning of any CI system.  As a result, I realized that simply having the system is not enough, there needs to be some other indication of the state of a build, something completely passive, yet visible to all.  At first, I set up a red light, using X10.  This alone was not enough, so it was soon joined by a screen of shame:  A computer monitor that would display, in big text for everyone to see, who broke the build and how long they’d left it broken.  That system worked fairly well.

Over time, though, the number of builds that needed to be monitored grew unwieldy.  The original display program I’d written was no longer up to the task.  When you have over a hundred separate builds, split across multiple teams, something that can display the status of only seven builds at once doesn’t work all that well.  It was clear that I needed to start over and create a new display system for this new world.

At the core of this build monitor system is a timer.  Every 50 ms, it wakes up and sees if there are any tasks that need to be performed.  This timer function is used to control the build status updates.  Every 30 seconds, the system will contact each of the CCNet servers that are being monitored, and pull down their latest build information.  Here’s the essence of that code (Error checking, etc. removed):

protected void FireEvents(object sender, ElapsedEventArgs e)
{
    foreach (IntervalEventBase intervalEvent in _collectionList)
    {
        if (intervalEvent.NextEventTime < DateTime.Now)
        {
             ThreadPool.QueueUserWorkItem(
                 delegate
                 {
                    intervalEvent.UpdateNextEventTime();
                    intervalEvent.FireEvent(_state);
                 });
        }
    }
}

Those IntervalEvents can be any action I want to take on a timed interval.  One type of event checks a build server every thirty seconds, while another checks the time to determine if the application should enter a power saver mode which turns the screen off if it’s after hours and no one is around to look at it.  Not all events are on the same schedule, which is why it has to check whether or not it needs to fire a given event.

I set my program up to monitor a single build machine and, as expected, it updated the status every thirty seconds.  I failed a build and it showed up as failed on the next update.  Stopped a build and it displayed as stopped.  Everything was working, so I added a few other build servers to be monitored and deployed my program.  My new and improved build monitor application was now visible to all of the engineers in the company, telling them whether or not any of their builds were broken.

Almost immediately, I noticed that the screen was reporting that some of the builds were “stale”, that is, they’d missed their next expected build time by more than five hours.  Usually, this is a sign that a particular build on a server has gone sideways and is blocking all of the other builds as a result.  I didn’t think much of it at the time, especially when it cleared.  The server that was getting the stale warning was one I wasn’t very familiar with, so I figured that maybe there actually was a build that kicked off occasionally and that took six hours and blocked everything else while it ran.

I ignored the problem for a few days, until I noticed that a build box I was familiar with showed the same problem.  I knew that nothing on that box takes more than five minutes to run, so blocking for five hours was a major problem.  I went to the CCNet Dashboard and found that it was all clear.  Nothing was building and nothing was late.  So…  What the hell?  Why was the build monitor saying everything on that box was stale?  Obviously, my update timer was working right, because otherwise all of the build machines would be stale.  Network issues, perhaps?  Whatever.  I kicked the box, restarted my program, and everything showed up fine.

But still, at least once a day, individual build boxes would randomly report that everything was late to build when nothing was wrong.  I had to dig deeper and figure out what was going on.  The simplest way to begin was to put the last updated time for each server on the screen.  The servers should never be more than 30 seconds out of date, so by knowing when a particular server ran into trouble, I might be able to solve the problem.  Maybe the anti-virus software was kicking off at 11:30 every day and killing a connection to a build server somehow.  A broken connection would result in an error being displayed, but it would get displayed once for a total of ten seconds.  Very easy to miss.

I put the last updated time on the build status slides, then left the program to bake for a day so things could get in a weird state.  I checked it at 4:30 PM, and here’s the times it reported:

Build Server 1: 12:23 PM
Build Server 2: 4:27 PM
Build Server 3: 6:12 AM
Build Server 4: 3:56 PM
Build Server 5: 2:03 PM
Build Server 6: 4:30 PM
Build Server 7: 10:24 AM
Build Server 8: 3:38 PM

Blink.  Blink.  WTF?

Okay, that makes NO sense at all.  WHAT THE HELL?

If the timer is working, all of them should report a last updated time of around 4:30 PM.

If the timer isn’t working, all of them should be stale and all reporting the time I launched the program.

But this?  Scattered randomly throughout the day?  How in the hell was that happening?  I know that the .Net Timer classes aren’t meant to provide 100% realtime reliability and that using the ThreadPool will cause execution to be unpredictable, but this is well beyond the expected limits of tolerance.  This just cannot be happening.  There’s no way.

I open up the code and start looking around.  The timer is set up with an interval of 50 ms.  Clearly it’s firing properly because some of the builds are getting updated.  And if any of them are they all should be, since they’re all being queued in the same function call from the same timer tick.  Then it hit me what the problem was.

You see, using the ThreadPool doesn’t really make many guarantees about the execution.  You hand some work over to the ThreadPool and it’ll get done at some point in the future when the pool gets around to it.  It’s great for when you have something you need to do asynchronously and don’t require too much control over when or where it’s executed.  If you go back to the code, you’ll see that the function I’m putting on the pool will schedule the next execution of the event.  Trouble is, it’s the timer method that’s checking that value, and the timer is in its own asynchronous little world.  Obviously, what’s going on here is that the timer is firing and enqueuing a bunch of events and then firing again and enqueuing some of them AGAIN, before they had a chance to run and update their scheduled time.  As this happens, the number of queued events will grow larger and larger and who knows when any particular build will run.  Obviously, that will cause all sorts of problems.

Okay then, easy fix.  Wrap the body of the FireEvents method in a Monitor.TryEnter()/Monitor.Exit() block to ensure it can’t execute multiple times at once.  Then, put a flag on the events, so they won’t get re-queued if they’re already in the queue to be executed.  Here’s what the function looked like:

protected void FireEvents(object sender, ElapsedEventArgs e)
{
    if (Monitor.TryEnter(_collectionList))
    {
        foreach (IntervalEventBase intervalEvent in _collectionList)
        {
            if (intervalEvent.NextEventTime < DateTime.Now && !intervalEvent.IsQueued)
            {
                intervalEvent.IsQueued = true;
                ThreadPool.QueueUserWorkItem(
                    delegate
                    {
                        intervalEvent.UpdateNextEventTime();
                        intervalEvent.FireEvent(_state);
                        intervalEvent.IsQueued = false;
                    });
            }
        }

        Monitor.Exit(_collectionList);
    }
}

Okay, everything’s golden now.  The problem is solved.  I rebuild, fire it up and…

COMPLETE FAILURE

Now they’re not updating at all.  AT ALL.  I put in some breakpoints and watch it run.  There are 9 events in _collectionList.  The foreach iterates through all nine, sets the .IsQueued flag to true on them and enqueues them on the ThreadPool.  The next time the timer fires, the first eight events, all of which are build servers, still have their IsQueued flags set to true.  Only the last event, which is the stupid power saver check, has its flag set to false.

What the hell?  It can’t possibly be taking thirty seconds to ask the build servers for their status.  One of them had to have finished by now.  How can they all be enqueued, yet not be finished?  I set a breakpoint inside the FireEvent method on the build status updater class and start things running, then wait.

And wait.

And wait.

It’s obviously been more than thirty seconds and it’s never getting called.  Not once.

I’m watching eight build status update events getting enqueued, but they’re never running and never having their queued flags reset.  But the power saver event gets its flag reset, so it must be getting called.  I put a breakpoint inside it and start things over.

Breakpoint hit!  Well, at least something is working.

Press F5.

Breakpoint hit!  Did I wait 30 seconds?

Press F5.  Breakpoint Hit!  Press F5.  Breakpoint Hit!  It’s definitely not 30 seconds.

I keep going.  I hit the breakpoint inside the power saver event NINE TIMES.

Great.  I now have a build monitor that doesn’t monitor any builds, but at least it’ll be absolutely certain to shut off the screen at 7PM Monday through Friday and keep it off all day Saturday and Sunday.  WTF?  It’s clearly enqueuing work for build update events and somewhere along the line, they’re all magically becoming power saver events.  Did I accidentally set System.Environment.TreeHugger = true; ?

I go back and stare at the code.

Oh…  Yeah…  Now I get it…

I like anonymous delegates and lambda functions.  I think they’re a great addition to the language.  I know they utterly confuse some people, but I think they’re great.  They let you keep your logic all together in one block, instead of forcing you to write a separate function and then figure out how to get your data into that function when it gets called.  In this case, I want to set the IsQueued flag on the event, then schedule its next execution time, fire the event, and finally say that it’s not queued up anymore.  That’s the logical flow of operations and thanks to an anonymous function, it’s all in one place.  Behind the scenes, it’s the compiler that ends up writing a separate function and figures out how to get your data into that function.  In most cases, that process works fine.

This is not one of those cases.

See, when the compiler works its magic, it packages up the anonymous function and any variables your function references into what’s called a closure.  This closure has a reference to the variable, not the value of the variable.  Subtle difference.  In most cases, you’re using anonymous functions synchronously, so there’s really no practical difference.  The value of the variable cannot change by the time the anonymous function is called and the variable is referenced.  In an asynchronous context, however, this subtle difference will get all hopped up on angry juice and come back to bite you.

What went wrong here is that the foreach loop does not create a new variable with every iteration.  It’s reusing intervalEvent and simply giving it a new value every time through the loop. 1  When I’m calling UserQueueWorkItem, I’m enqueuing a function that references that variable, not the value in it at the time I enqueue.  Which means that by the time the ThreadPool actually executes my function, there’s no guarantee what intervalEvent will be pointing at.  Most of the time, it will be the last event in _collectionList.  Sometimes, the ThreadPool will fire off an event while the iteration is still happening, so it will execute with some value in the middle.  It’s even possible that the value of intervalEvent will change while my anonymous function is executing.

This is what was causing all of the wackiness with the updates.  Most of the time, none of the updates were getting through, but it would run the screen shutoff event nine times in a row every 30 seconds.  Every once in a while, the ThreadPool would have mercy on an intervalEvent and allow it to run while the foreach was still iterating, which was how many of the servers would appear to be up-to-date and also how stale servers got cured automatically.

Okay, so, that’s why everything’s all FUBAR.  Now, how do you fix it?  Well, the problem is that the iterator variable is being reused, so, what you need to do is create a local variable inside the scope of the foreach to capture the value, then use that variable inside the anonymous delegate.  Like so:

protected void FireEvents(object sender, ElapsedEventArgs e)
{
    if (Monitor.TryEnter(_collectionList))
    {
        foreach (IntervalEventBase intervalEvent in _collectionList)
        {
            IntervalEventBase localEvent = intervalEvent;
            if (localEvent.NextEventTime < DateTime.Now && !localEvent.IsQueued)
            {
                localEvent.IsQueued = true;
                ThreadPool.QueueUserWorkItem(
                    delegate
                    {
                        localEvent.UpdateNextEventTime();
                        localEvent.FireEvent(_state);
                        localEvent.IsQueued = false;
                    });
            }
        }

        Monitor.Exit(_collectionList);
    }
}

You have to love it when the fix for a big nasty bug is something that looks like a rookie coding mistake.  Which, of course, means that you have to clearly comment the fix, otherwise someone will think that it IS a rookie coding mistake and “fix” it.  The closure now has the proper value when it gets run and all of my builds are updating like clockwork every thirty seconds.

Well, at least I think they are.  I fixed it at 6:59 PM, so pretty much as soon as it started running, the power saver event fired and the screen turned off…

 

Anyway, if you want a compact repro case to play with, here you go.  “foreachString” will usually print out all “nine”s down the line (Or occasionally a “three” or something in the first spot or two), while “localString” will have a different value on every line (Although not necessarily in order).

string[] strings = new string[] { "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine" };

foreach (string foreachString in strings)
{
    string localString = foreachString;
    ThreadPool.QueueUserWorkItem(delegate { Console.WriteLine("foreachString: {0}  localString: {1}", foreachString, localString); });
}

Sample Output:

foreachString: zero  localString: zero
foreachString: nine  localString: two
foreachString: nine  localString: three
foreachString: nine  localString: four
foreachString: nine  localString: five
foreachString: nine  localString: six
foreachString: nine  localString: seven
foreachString: nine  localString: eight
foreachString: nine  localString: nine
foreachString: nine  localString: one

And for more information on the code generated by the compiler for anonymous functions and closures, including information on this very problem (called “an interesting but dangerous side effect” in section 6), check out this CodeProject article by P. Adityanand: http://www.codeproject.com/KB/cs/InsideAnonymousMethods.aspx

  1. With a foreach, this variable reuse isn’t really apparent.  If you’re using a regular for loop, it’s clear that the iterator variable is the same each time through. []

January 23, 2010   No Comments

This Code Doesn’t Work

The following is a snippet of code that reproduces a bug I came across today.

string[] strings = new string[] { "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine" };

foreach (string str in strings)
{
    ThreadPool.QueueUserWorkItem( delegate { Console.WriteLine(str); } );
}

It should print out the numbers “zero” through “nine”.  It doesn’t work.  Why not?

I’ll write up the answer tomorrow, but I want to see if anyone else can get it first.

January 22, 2010   No Comments

WPF WTF

I’ve recently been working on a small project using WPF.  It’s the first time I’ve done anything of significant worth in WPF.  I’ve done some things in Silverlight before, but nothing in full WPF.

I wrote some code, ran it, and it didn’t perform as expected.  I got a blank window instead of the colorful boxes I was expecting.  No errors, just a blank window.  I set a breakpoint and stepped through the code.

Step…

Step…

Step…

Nothing.

Right in the middle of a function, the control had jumped somewhere else.  No unintentional return, no exception, no freak goto statement.  The debugger had gone from the line it was supposed to be on to telling me that it was no longer in the function where it should have been.

I looked at the line.

CurrentSlide.Initialize();

Well, that’s where the problem is.  The property CurrentSlide is not assigned at that point.  Originally it had been, but I changed the code around so that it’s set only after I make sure that the new slide is valid and worthy of becoming the CurrentSlide.  The line should be:

newSlide.Initialize();

But hold on a minute.  That original line should throw a NullReferenceException.  I got nothing.  Obviously something was going wrong because execution was leaving the function.  That’s consistent with throwing an exception at that line, but what happened to it?  I’m in Visual Studio and debugging the application.  I should get a big window screaming about a NullReferenceException and telling me precisely what line it’s on.  There aren’t any try/catch blocks in my code yet because nothing I’m doing is expecting any exceptions, and there wouldn’t be anything I could do about them at this point, anyway.  So…  WTF?

Then I see this in the output window of VS:

A first chance exception of type 'System.NullReferenceException' occurred in BuildMonitor.dll

Well, THERE’S my exception, hanging out in the output window.  Where I’m obviously going to find it.  No stack trace.  No exception message.  Just the type and the assembly the exception was thrown from.  Ever so helpful.

It appears that WPF is swallowing exceptions in my code.  WTF?  I mean, I’d really like to know about unhandled exceptions in my application, especially when I’m debugging.  You know, since unhandled exceptions are generally indicative of problems that, you know, need to be fixed.

So, maybe there’s an unhandled exception event that I can attach to.  I know that AppDomain has one that I’ve used before, so I try it, but that doesn’t help.  The exception is being captured somewhere above that level.  So, I Dogpile around for others that have had the same problem.  I see a lot of people who are complaining that WPF isn’t throwing binding exceptions to them.  Not quite my problem, but close enough.  The recommendation is to hook up your own handler to the Dispatcher.UnhandledException event and write out the error yourself.  Not perfect, but at least I’ll get a stack trace that way.  So, I hook it up and try it out.

AND THIS TIME I GET THE VS EXCEPTION DIALOG!

WTF?  I handle an unhandled exception that was silently being handled by something, and that stops whatever was handling it from handling it and therefore makes it unhandled so that Visual Studio picks it up?  I repeat, WTF WPF?

So, here’s the code that I’ve added to my app’s constructor to make things happy:

Dispatcher.UnhandledException += (x, y) =&gt; { };

I attach to the Dispatcher.UnhandledException event and do absolutely nothing, and now I’ll get exceptions from my WPF app showing up in Visual Studio.

December 2, 2009   No Comments