Rambling around foo: November 2007

Thursday 29 November 2007

Have you ever drove your dreams?

Thomas writes about a dream of his and how his dream self failed to do the right thing.

I have been able several times, while asleep, to be aware enough to realize I was dreaming and was asleep and started to drive my dream the way I wanted.

I made myself fly superman like and fly a plane, I managed to come in second place after Ayrton Senna (this was while he was alive and I respected him too much to take his victory) I managed to do a lot of things. At some point I even managed to turn right a recursive nightmare I had and got rid of it that way.

Unfortunately, I haven't been able to do this in the last 3 or 4 years, but I know I had to set a goal before going to sleep in order to achieve it.

Has anyone else experienced this?

I thought that was Debian's 50000

Phew, Rob, I thought that was Debian's 50000th bug!

Fortunately, I haven't lost the bet!

Tuesday 27 November 2007

Updates: NSLU2, Andrew S. Tanenbaum in .ro

Last weekend was as hectic as my life has been lately: I have been trying to restore sanity into my NSLU2, I went to a lecture from Andrew S. Tanenbaum and I made a 2.5 hours drive to my parents in about 4 hours because of the fog.

First, my slug:

refuses to recognise the USB NIC I have been using until the latest incidents (it either says 'not accepting address, error -71' or 'device descriptor read/64, error -71')
sometimes reboots when I insert the USB NIC
either doesn't boot at all or boots really slowly when the USB NIC is inserted
(obviously) doesn't show the NIC in lsusb listing when is not recognised

Since the USB NIC works on my laptop, I suspect a hardware problem with the slug. Bummer!

Dear no-so-lazyweb, is there a way to install Debian on an ASUS WL-500G Premium router without loosing wireless ability? Or, is there a way to make use of my USB NIC with the ASUS router?

Second, Andrew S. Tanenbaum visited Romania and lectured Friday at the University „Politehnica” Bucharest.

He presented Minix3's architecture and the advantages it has over monolithic OSes. I attended the lecture (although I am not a student anymore) and found it quite nice and well prepared, but I had the feeling that sometimes he was trying to avoid or to bash topics that were not putting Minix into a good light or challenged its title of being the first open[0] OS based on a micro-kernel architecture[1]. In spite of that, I found him to be a really good speaker and I liked the overall presentation, although, I also expected some on the spot demos or at least some recordings.

The things that I remember:

2.4 millions subtle code alterations in drivers with only 80000 driver crashes (of course, no kernel crashes)
simulation of network driver repeated crashes at different time intervals and how it affects performance - a 30% degradation at crashes that occur once every second and an insignificant degradation at crashes occurring at each 10 seconds
every driver has a set of rights assigned to it; it was difficult for them to define this - this sounds a lot like SELinux issues
messages have a fixed length
there is no dynamic memory allocation within the kernel
the kernel is 5000 lines of code (all drivers are in user space)
really secure system
there were performance comparisons with Minix2 and the hit was about 20%; still, is said that L4 has only an approximate 2-5% performance hit because of the micro-kernel architecture
apparently the FreeBSD kernel has only 3 bugs /1000 lines of code
Minix uses a BSD license

I also got a Minix live CD (which is more like the Gentoo Linux install CD - just console in the live system) and made an installation of Minix in a qemu machine[2]. Unfortunately, I don't think I'll have the time to dwell into the source.

I was thinking, would it worth the effort to try to make a GNU/Hurd/Minix system (i.e. replace Mach with Minix's micro-kernel)? BTW, is Debian GNU/Hurd now based on L4 or does it still uses Mach?

Note: Some of my work colleagues suggested that the presentation was the same as one he made at linux.conf.au last year, but I can't confirm/infirm that since I didn't saw the recording.

I won't write about the "fog drive", but I'll just say it wasn't pleasant at all, and I felt I was in driving in The Twilight Zone for the whole Friday evening.

[0] he gave credit to QNX
[1] For instance, I tried to ask him twice if he felt that GNU Hurd was violating the micro-kernel paradigm or if he can compare it to Minix' architecture. I had the impression that both times he avoided to answer and started the usual Hurd bashing, "they have been developing it for 20+ years, but got nothing working", meanwhile "Minix is here". After the lecture/presentation somebody told me that AST shortly said that they "were similar, but different". I didn't catch that line.
[2] thanks to qemu-launcher it is trivial to create and manage multiple qemu virtual machines

Wednesday 21 November 2007

I just made mself a favour...

... and ignored a troll. I feel so good now.

Wednesday 14 November 2007

Lesson relearned: when Linux networking weirdess occurs...

My relearned lesson for the day: when Linux networking weirdness occurs in a NAT environment, remember to try MTU clamping.

Thanks to the comments by Justin and Sesse, I was fast-tracked to the core of the problems I have been experiencing since Thursday, MTU issues. What's worse (from my pov) is that I have encountered this issue before with the provider I had in Timișoara, but, since that ISP was using PPPoE and my current ISP in Bucharest doesn't, I never really made the connection. I even had a commented out iptables rule for MTU clamping in my firewall script.

The rule I am talking about looks like this:

iptables -t mangle -A POSTROUTING -p tcp --tcp-flags SYN,RST SYN -o $EXT_IF -j TCPMSS --clamp-mss-to-pmtu

or like the one I have been using (seems more logical to me):

iptables -I FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu

Note that this is not a fix, but a workaround and the real problem is over-zealous admins or weird setups[1] which think that banning TCP fragmentation (or the entire ICMP traffic) is a way to secure networks.

Once again, thanks to everybody who read and/or commented about my issue.

[1] Sesse told me that in his case there was a transparent proxy involved when he exeprienced MTU weirdness.

Linux: plain weird network behaviour; Windows is OK

Update: problem fixed, thanks for the comments; it was MTU related issues.

Note: This is a long post, but I expect it to bring in some questions for many Linux people; I advise you to read this when you have enough time.

After the last events related to connectivity at home which have been lasting since Thursday evening and the weird fact that it seems that NAT doesn't work at home, but does at work for the same setup, I called the ISP's support and wasted 15 minutes trying to convince them at least to send a technical team with another modem just to test if that is the reason for the breakage.

Of course, talking to the ISP support and trying to convince them that there might be a problem on their side since NAT works with another provider but doesn't with them, was fruitless.

Tonight I was stuck and tried another approach, convinced I would confirm my suspicion that there is something wrong on ISP's side. Still I am not yet sure what to conclude from what happened.

So, to get a better view of what is going on, I'll describe the setup I have and what are its limitations and characteristics. People in a hurry can skip to the paragraph starting with "I tested" and stare in wonder at will.

So the connection I have is done through a DSL modem which gets the MAC of the network card connected to it and exposes that as its own to the ISP's network. This MAC seems to be quite persistent and special measures must be taken in order to be able to use another NIC to connect. The modem (or probably some machine in ISP's; the IP is 10.10.0.1) offers a DHCP address and everything should work fine.

Because of this "your MAC is my MAC" issue, when I connected the first time, I used an USB NIC since a broken router or some temporary failure would have allowed me to use the Internet connection directly from my laptop. I can say this decision has proven in time to be wise.

The router I used until now is a NSLU2 with Debian installed on it. The built in network interface always faced the internal network.

The router (which I call ritter) served as a NAT router for two machines inside the network, my laptop and my apartment mate's laptop. All until last Thursday, after which it never got back properly.

I tested (doing NAT on my laptop; the laptop shouldn't have been affected by the power problems):

with ritter behind the laptop NAT, at home
with two different virtual machines as NAT "clients", at home
with ritter behind the laptop NAT, at work
with a virtual machine as NAT client, at home
directly from the laptop
NAT made through a SNAT rule
NAT made through a MASQUERADING rule
with TTL mangled (increased by one, although is was never in the ballpark of a low TTL)

Of course, I have n-checked that /proc/sys/net/ipv4/ip_forward was set to 1, the tables had policy ACCEPT and there were no extra rules, except the basic NAT-ting stuff, the routes were correctly set on both the clients and the machine doing the NAT.

All I could see is that:

the machine doing the NAT was always working fine
at work all NAT clients worked fine
at home any of the NAT "clients" were

able to resolve addresses while the DNS server was in ISP territory and another LAN
ping the outside world (if ping was available - in D-I is not)
hanging when trying to get a http page
telnet-ting directly to the port 80 was ok (but I didn't try to "GET / HTTP/1.0")

So after all of this, I was thinking of trying to see if a new client (the laptop of my apartment mate, a Windows XP machine) would work using the connection at home. It didn't work.

Then I thought of trying to do "Internet Connection Sharing" as is called in Windows. Of course, there was some pain to find Windows XP drivers for the ASIX AX88172 network card (remember, the modem needed to see the MAC of that NIC), but I managed to find the proper one.

And, almost sure the NAT wouldn't work for this case either, I configured the new connection as a shared one. I didn't even disabled the firewall, as I was thinking I could take those down gradually.

I wasn't expecting this, not even by pure chance, but my laptop which was a NAT "client" now was able to browse, ping and do whatever was normal through NAT, while the Windows machine was doing the "Connection sharing".

I was utterly flabbergasted. And that was just the beginning.

I was expecting that the problem coincidentally went away, but after a minute I was proven otherwise. It still didn't work with Linux as the NAT-ting machine. I connected back the USB NIC to the Windows machine and I saw the same thing. NAT was just working.

At that point I was to observe an even more shocking fact: the IP that the Windows machine got was different from the one the Linux machine received, in spite of the fact that the network card was the same, so it would have made sense to get the same one. More than that, the IP that the Windows machine got was from an entirely different network, although it was a valid IP belonging to my ISP.

I was thinking that one reason why it works with Windows might be that there could be some TCP protocol twist that is differently implemented in Windows and the equipment from my ISP gets along better with the Windows network stack.

As a way to test that, I am thinking of forcing somehow the IP on the Linux machine to see if anything changes. But before doing that, I felt the urge to post these, maybe some kind soul will shed some light on this issue for me or drop a hint.

Another reason might be different DHCP servers answering, but I don't know how I can see in Windows who offered the lease.

If anyone has any clue why these weird things happen, please drop a line. I would greatly appreciate it. TIA.

Tuesday 13 November 2007

good news, bad news, such is life

After a really crazy weekend in which it looked like I wasn't able to set up NAT[1], I went to work with the gear and tested the exact same thing I did at home and it worked without a glitch.

So, now I suspect that there might be something wrong with my ISP or the DSL modem, while ritter (my NSLU2) is fine. As a bonus I just installed Debian armel on it (installation report to arrive soon).

When I got home, after the laptop came back from hibernate[2] I saw these messages on all my terminals:

Message from syslogd@bounty at Mon Nov 12 23:57:04 2007 ...
bounty kernel: Uhhuh. NMI received for unknown reason b0.

Message from syslogd@bounty at Mon Nov 12 23:57:04 2007 ...
bounty kernel: You have some hardware problem, likely on the PCI bus.

Message from syslogd@bounty at Mon Nov 12 23:57:04 2007 ...
bounty kernel: Dazed and confused, but trying to continue

Is this a good time to panic?
I guess I'll have to dig into this, too.

In other news, svn-buildpackage 0.6.23 has been uploaded to unstable and it fixes yet another 7 bugs, which brings svn-bp's bug count down to 19 open valid bugs (2 more if you count wontfix bugs, too). This is the lowest bug count of svn-buildpackage since at least the end of April, according to the bug count graph.

Also, oolite 1.65-6 was also uploaded to unstable and fixes the breakage due to the gnustep build tools changes. Unfortunately, on arm is dep-waiting for libgnustep-base-dev which is broken on this arch.

Thanks to my sponsors, you know who you are ;-) .

[1] I have been trying to set up a simple NAT machine for more than 6 hours and, although everything seemed to be OK, checked and double checked it didn't work
[2] which works now only if I don't use fglrx which is broken from this PoV (bug 449095)

Saturday 10 November 2007

nslu2 broken?

Thursday evening, when I came from work I found my nslu2 not working. It is/was a router and a local mirror. I tried to understand what's going on, but I couldn't. Everything looked fine, except that it did not resolve any addresses and the ping to my provider was not giving any results.

I tried to restart dnsmasq, although it seemed it was running fine, and then, in a desperate gesture (after trying to understand what was going on) I thought of restarting networking. I got locked on the outside and was forced to restart the system.

Everything looked to come back to normal. Since it was late, I set myself to investigate the next morning what happened. But yesterday morning, the network was again down and the slug was inaccessible.

Since the only way to restore sanity in such a situation (because I don't have a serial console on the slug) was to reset and hope for the best, I did that.

The slug didn't start anymore. It seemed to cycle through boot -> reset -> reboot. I tried to see what was going on and connected the hard disk to my laptop. The filesystem was clean, although /var/log/boot.0 was a directory (and that directory had the same content as /var/lib/dpkg/info). I manually removed that, but that didn't make the slug boot.

It seems my slug is kind of dead.

Now I would like to know if is there any way to do remotely what flash-kernel does from within debian.

Monday 5 November 2007

(debian) work and the silence

I've been really busy lately in RL and Debian work took the hit.

Some news:

on the 1st of November, the compendiums on i18n.debian.net managed to waste a huge chunk of disk space and made other scripts (and itself) on churro fail miserably; now there is only a 7 days backlog of the compendiums
oolite needs an upload due to the gnustep libs transition; I'll try to prepare tonight the long due 1.65-6 version for unstable
wormux upstream is preparing for yet another beta which should be really close to the final 0.8 version; I wish I had more time to work on this game
sadly, no news on the naughtysvn front :-( from me
I have been coding now and then on svn-buildpackage 0.6.23 and I intent to make yet another drop in the bug count visible on the graph:

Rambling around foo