Thursday 2 September 2021

Stretch to Buster upgrade issues: "Grub error: symbol ‘grub_is_lockdown’ not found", missing RTL8111/8168/8411 Ethernet driver and RTL8821CE Wireless adapter on Linux Kernel 5.10 (and 4.19)

I have been running Debian Stretch on my HP Pavilion 14-ce0000nq laptop since buying it back in April 2019, just before attending Oxidizeconf, where I presented "How to Rust When Standards Are Defined in C".

Debian Buster (aka Debian 10) was released about 4 months later and I've been postponing the upgrade as my free time isn't what it used to be. I also tend to wait for the first or even second update of the release to avoid any sharp edges.

As this laptop has a Realtek 8821CE wireless card that wasn't officially supported in the Linux kernel, I had to use an out-of-tree hacked driver to get wireless working on Stretch's 4.9 kernels. It didn't even get along with DKMS, so I did all of its compilations and installations manually. All the more reason to wait for a newer release whose official kernel would contain a driver.

I was waiting for the inevitable and dreading the wireless issues, but since Bullseye became stable in mid-August, turning Stretch into oldoldstable, I decided I had to do the upgrade, at least to Buster.

The Grub error and the fix

Everything went quite smoothly, except that after the reboot the laptop failed to boot with this Grub error:

error: symbol ‘grub_is_lockdown’ not found

I looked for a solution and it seemed everyone was stuck or the solution was unclear.

There is even a bug report in Debian about this error, bug #984760.

Adding to the pile of confusion, my own confused solution: I tried supergrubdisk2/rescatux, but it didn't work for me; it might have been a combination of my using LVM and grub-efi-amd64. I also tried booting the first Buster DVD in rescue mode (to avoid the need for network). I was able to enter the root partition and mount the EFI partition, too, but since I didn't want to mess up the setup even more or depend on an external USB stick, I didn't know where I should try to write the Grub EFI config - the root partition is on NVMe storage.

When I bought the laptop it had FreeDOS and some HP rescue app installed on it, which I did not wipe when installing Debian. I had even forgotten where or how EFI was installed on the disk, and EFI, even if it should be more reliable and simpler, is something I never got the hang of.

In the end, I realized that via the BIOS I could actually select manually which EFI executable should be booted, so with some manual intervention during boot I was able to get into the regular system.

I tried regenerating the grub configuration, reinstalling it, and also restoring the proper default boot sequence (I even installed refind in the system during my fumbling), but I think that somewhere between the grub-efi-amd64 reconfiguration and its reinstallation I managed to do the right thing, as the default boot screen is now the Grub one.

Hints for anyone reading this in the hope of fixing the same issue - hopefully they will make things better, not worse (see the text below):

1) regenerate the grub config:

update-grub2

2) reinstall grub-efi-amd64 and make Debian the default:

dpkg-reconfigure -plow grub-efi-amd64

When reinstalling grub-efi-amd64 onto the disk, I think the scariest questions were these:

Force extra installation to the EFI removable media path?

Some EFI-based systems are buggy and do not handle new bootloaders correctly. If you force an extra installation of GRUB to the EFI removable media path, this should ensure that this system will boot Debian correctly despite such a problem. However, it may remove the ability to boot any other operating systems that also depend on this path. If so, you will need to make sure that GRUB is configured successfully to be able to boot any other OS installations correctly.

 and

Update NVRAM variables to automatically boot into Debian?

GRUB can configure your platform's NVRAM variables so that it boots into Debian automatically when powered on. However, you may prefer to disable this behavior and avoid changes to your boot configuration. For example, if your NVRAM variables have been set up such that your system contacts a PXE server on every boot, this would preserve that behavior.

I think the first can be safely answered "No" if you don't plan on booting via a removable USB stick, while the second is the one that does the restoring.

Answering the second question is probably safe if you don't use PXE boot or some other special boot method, at least that's how I understand it. But if you do, I suspect that by installing refind or by playing with the multiple efi*-named packages and tools you can restore that, or your BIOS might allow it directly.

I just walked through these 2 steps again on my laptop and answered "No" to the removable media question, as it leads to errors when the media is not inserted (in my case, the internal SD card reader), and "Yes" to making Debian the default.

It seems that for me this broke the FreeDOS and HP utilities boot entries in Grub, but I can still boot them via the BIOS options, and my goal was to have Debian boot correctly by default.
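One more hint: once booted into the system, the EFI boot entries and their order can be inspected and adjusted from Linux with efibootmgr (from the package with the same name). A minimal sketch - the entry numbers below are hypothetical, check the output of the first command for the real ones:

# list the existing EFI boot entries and the current boot order
efibootmgr -v
# put entry 0001 (e.g. debian) first in the boot order
efibootmgr -o 0001,0000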

Fixing the missing RTL8111/8168/8411 Ethernet card issue

As a side note for people with computers having the Realtek RTL8111/8168/8411 Gigabit Ethernet Controller who are upgrading to Buster or switching to a newer kernel: you might end up with the unpleasant surprise of even your Ethernet card disappearing, because the r8169 driver is not loaded by default.

I had to add it to /etc/modules so it is loaded by default:

eddy@aptonia:/ $ cat /etc/modules
# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.
r8169
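To also load the driver immediately, without waiting for a reboot, modprobe should do it, and lspci can confirm which kernel driver is in use for the card:

modprobe r8169
lspci -k | grep -A 3 Ethernet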

The 5.10 compatible driver for RTL8821CE wireless adapter

After the upgrade to Buster and its kernel, 4.19, the hacked version of the driver I had been using on Stretch's 4.9 kernels was no longer compatible - it failed to compile due to missing symbols.

The fix for me was to switch to the DKMS compatible driver from https://github.com/tomaspinho/rtl8821ce, as this seems to work for both 4.19 and 5.10 kernels (installed from backports).

I installed it via a modification of the manual install method, only for the 4.19 and 5.10 kernels, leaving the legacy 4.9 kernels working with the hacked driver. You can do the same if, instead of running the provided script, you perform its steps manually and install only for the kernel versions you want, instead of the default of installing for all:

I looked inside the dkms-install.sh script to do the required steps:

Copy the driver, add it to the dkms set of known drivers:

DRV_NAME=rtl8821ce
DRV_VERSION=v5.5.2_34066.20200325

cp -r . /usr/src/${DRV_NAME}-${DRV_VERSION}

dkms add -m ${DRV_NAME} -v ${DRV_VERSION}

Then you build and install it only for the kernel versions of your choice:

dkms build -m ${DRV_NAME} -v ${DRV_VERSION} -k 5.10.0-0.bpo.8-amd64
dkms install -m ${DRV_NAME} -v ${DRV_VERSION} -k 5.10.0-0.bpo.8-amd64

 Or, without the variables:

dkms build rtl8821ce/v5.5.2_34066.20200325 -k 4.19.0-17-amd64
dkms install rtl8821ce/v5.5.2_34066.20200325 -k 4.19.0-17-amd64

dkms status should confirm everything is in place and I think you need to update grub2 again after this.
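For reference, a quick verification sketch - the exact kernel and version strings will differ from system to system:

dkms status
# expect a line similar to (assuming the versions above):
#   rtl8821ce, v5.5.2_34066.20200325, 5.10.0-0.bpo.8-amd64, x86_64: installed
modinfo rtl8821ce | head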

Please note this driver is no longer maintained, and the 5.10 tree should support the RTL8821CE wireless card with the in-kernel rtw88 driver, but for me it did not work. I'll probably try this at a later time, or after I upgrade to the current Debian stable, Bullseye.

Wednesday 10 July 2019

Rust: How do we teach "Implementing traits in no_std for generics using lifetimes" without students going mad?

Update 2019-Jul-27: In the code below, my StackVec type was more complicated than it had to be: I had been using StackVec<'a, &'a mut T> instead of StackVec<'a, T> where T: 'a. I am unsure how I ended up making the type so complicated, but I suspect the lifetime mismatch errors and the attempt to implement IntoIterator were the reason I made the original mistake.

Corrected code accordingly.



I'm trying to go through Sergio Benitez's CS140E class and I am currently at Implementing StackVec. StackVec is something that currently looks like this:

/// A contiguous array type backed by a slice.
///
/// `StackVec`'s functionality is similar to that of `std::Vec`. You can `push`
/// and `pop` and iterate over the vector. Unlike `Vec`, however, `StackVec`
/// requires no memory allocation as it is backed by a user-supplied slice. As a
/// result, `StackVec`'s capacity is _bounded_ by the user-supplied slice. This
/// results in `push` being fallible: if `push` is called when the vector is
/// full, an `Err` is returned.
#[derive(Debug)]
pub struct StackVec<'a, T: 'a> {
    storage: &'a mut [T],
    len: usize,
    capacity: usize,
}
The initial skeleton did not contain the Debug derive or the capacity field; I added them myself.

Now I am trying to understand what needs to happen behind:
  1. IntoIterator
  2. when in no_std
  3. with a custom type which has generics
  4. and has to use lifetimes
I don't know what I'm doing, but I might have managed to do it:

pub struct StackVecIntoIterator<'a, T: 'a> {
    stackvec: StackVec<'a, T>,
    index: usize,
}

impl<'a, T: Clone + 'a> IntoIterator for StackVec<'a, &'a mut T> {
    type Item = &'a mut T;
    type IntoIter = StackVecIntoIterator<'a, T>;

    fn into_iter(self) -> Self::IntoIter {
        StackVecIntoIterator {
            stackvec: self,
            index: 0,
        }
    }
}

impl<'a, T: Clone + 'a> Iterator for StackVecIntoIterator<'a, T> {
    type Item = &'a mut T;

    fn next(&mut self) -> Option<Self::Item> {
        let result = self.stackvec.pop();
        self.index += 1;

        result
    }
}

Corrected code as of 2019-Jul-27:
pub struct StackVecIntoIterator<'a, T: 'a> {
    stackvec: StackVec<'a, T>,
    index: usize,
}

impl<'a, T: Clone + 'a> IntoIterator for StackVec<'a, T> {
    type Item = T;
    type IntoIter = StackVecIntoIterator<'a, T>;

    fn into_iter(self) -> Self::IntoIter {
        StackVecIntoIterator {
            stackvec: self,
            index: 0,
        }
    }
}

impl<'a, T: Clone + 'a> Iterator for StackVecIntoIterator<'a, T> {
    type Item = T;

    fn next(&mut self) -> Option<Self::Item> {
        let result = self.stackvec.pop().clone();
        self.index += 1;

        result
    }
}



I was really struggling to understand what the returned iterator type should be in my case since, obviously, std::vec is out, because a) I am trying to do a no_std implementation of b) something that should look a little like std::vec.

That was until I found this wonderful example on a custom type without using any already implemented Iterator, but defining the helper PixelIntoIterator struct and its associated impl block:

struct Pixel {
    r: i8,
    g: i8,
    b: i8,
}

impl IntoIterator for Pixel {
    type Item = i8;
    type IntoIter = PixelIntoIterator;

    fn into_iter(self) -> Self::IntoIter {
        PixelIntoIterator {
            pixel: self,
            index: 0,
        }

    }
}

struct PixelIntoIterator {
    pixel: Pixel,
    index: usize,
}

impl Iterator for PixelIntoIterator {
    type Item = i8;
    fn next(&mut self) -> Option<Self::Item> {
        let result = match self.index {
            0 => self.pixel.r,
            1 => self.pixel.g,
            2 => self.pixel.b,
            _ => return None,
        };
        self.index += 1;
        Some(result)
    }
}


fn main() {
    let p = Pixel {
        r: 54,
        g: 23,
        b: 74,
    };
    for component in p {
        println!("{}", component);
    }
}
The part in bold (defining the helper PixelIntoIterator struct and its impl block) was what I was actually missing. Once I had that missing link, I was able to struggle through the generics part.

Note that once I had only one new thing left - the generics; luckily, the lifetime part seemed to simply be considered part of the generics - everything was easier to navigate.


Still, the fact that there are so many new things at once - one of them being lifetimes, which, as @oli_obk put it, can not be taught, only experienced - makes things very confusing.

Even if I think I managed it for IntoIterator, I am similarly confused about implementing "Deref for StackVec" for the same reasons.

I think I am seeing on my own skin what Oliver Scherer was saying about how big infodumps at the beginning are not the way to go. I feel that if Sergio's class were now in its second year, things would have improved. OTOH, I am now very curious what your curriculum looks like, Oli?

All that aside, what should be the signature of the impl? Is this OK?

impl<'a, T: Clone + 'a> Deref for StackVec<'a, &'a mut T> {
    type Target = T;

    fn deref(&self) -> &Self::Target;
}
Trivial examples like wrapper structs over basic Copy types such as u8 make it more obvious what Target should be, but in this case it's so unclear, at least to me, at this point. And because of that, I am unsure what the implementation should even look like.
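For comparison, std's Vec<T> picks Target = [T] and derefs to a slice of the initialized elements. A minimal sketch along those lines for the corrected StackVec<'a, T> - my guess, not the class's official solution - assuming len tracks the initialized prefix of storage:

use core::ops::Deref;

impl<'a, T: 'a> Deref for StackVec<'a, T> {
    type Target = [T];

    // hand out only the initialized prefix of the backing slice
    fn deref(&self) -> &Self::Target {
        &self.storage[..self.len]
    }
}

With such an impl, slice methods like len(), iter() and indexing would become available on StackVec through auto-deref.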

I don't know what I'm doing, but I hope things will become clear with more exercise.

Thursday 4 July 2019

HOWTO: Rustup: Overriding the rustc compiler version just for some directory

If you need to use a specific version of the rustc compiler instead of the default, the rustup documentation tells you how to do that.


First install the desired version, e.g. nightly-2018-01-09

$ rustup install nightly-2018-01-09
info: syncing channel updates for 'nightly-2018-01-09-x86_64-pc-windows-msvc'
info: latest update on 2018-01-09, rust version 1.25.0-nightly (b5392f545 2018-01-08)
info: downloading component 'rustc'
info: downloading component 'rust-std'
info: downloading component 'cargo'
info: downloading component 'rust-docs'
info: installing component 'rustc'
info: installing component 'rust-std'
info: installing component 'cargo'
info: installing component 'rust-docs'

  nightly-2018-01-09-x86_64-pc-windows-msvc installed - rustc 1.25.0-nightly (b5392f545 2018-01-08)

info: checking for self-updates

Then override the default compiler with the desired one in the top directory of your choice:

$ rustup override set nightly-2018-01-09
info: using existing install for 'nightly-2018-01-09-x86_64-pc-windows-msvc'
info: override toolchain for 'C:\usr\src\rust\sbenitez-cs140e' set to 'nightly-2018-01-09-x86_64-pc-windows-msvc'

  nightly-2018-01-09-x86_64-pc-windows-msvc unchanged - rustc 1.25.0-nightly (b5392f545 2018-01-08)
That's it.
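If you later need to check or undo the override for that directory, rustup can do that, too:

$ rustup override list
$ rustup override unset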

Saturday 15 June 2019

How to generate a usable map file for Rust code - and related (f)rustrations

Intro


Cargo does not produce a .map file by default, and when it does, mangling makes it very hard to use. If you're searching for the TLDR, read from "How to generate a map file" near the bottom of the article.

Motivation

As a person with experience in embedded programming I find it very useful to be able to look into the map file.

Scenarios where looking at the map file is important:
  • evaluate if the code changes you made had the desired size impact or no undesired impact - recently I saw a compiler optimize a zero-initialization of an array for speed by putting long blocks of u8 arrays in the .rodata section
  • check if a particular symbol has landed in the appropriate memory section or region
  • make an initial evaluation of which functions/code could be changed to optimize either for code size or for more readability (if the size cost is acceptable)
  • check particular symbols have expected sizes and/or alignments

Rustrations 

Because these kinds of scenarios are quite frequent in my work and I am used to looking at the .map file, some "rustrations" I currently face are:
  1. No map file is generated by default via cargo, and information on how to do it is sparse
  2. If generated, the symbols are mangled, and it seems each symbol is in a section of its own, making per-section (e.g. .rodata, .text, .bss, .data) or per-file analysis more difficult than it should be
  3. I haven't found a way to disable mangling globally without editing the Rust sources. I remember there is some tool to un-mangle the output map file (see the sketch after this list), but I find the need to post-process suboptimal
  4. No default map file filename or location - ideally it should be named after the crate or app, as specified in the .toml file.
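Regarding point 3, one post-processing demangler I know of is rustfilt, which works as a plain stdin/stdout filter; a minimal sketch, assuming it is installed from crates.io:

$ cargo install rustfilt
$ rustfilt < app.map > app.demangled.map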

How to generate a map file

Generating map file for linux (and possibly other OSes)

Unfortunately, not all architectures/targets use the same linker, and on some the preferred linker can change for various reasons.

Here is how I managed to generate a map file for an AMD64/x86_64 Linux target, where it seems the linker is GLD (GNU ld):

Create a .cargo/config file with the following content:

.cargo/config:
[build]
    rustflags = ["-Clink-args=-Wl,-Map=app.map"]

This should apply to all targets which use GLD as a linker, so I suspect this is not portable to Windows integrated with MSVC compiler.

Generating a map file for thumbv7m with rust-lld


On baremetal targets such as Cortex-M7 (thumbv7m), where you might want to use the LLVM-based rust-lld, more linker options might be necessary to prevent linking with compiler-provided startup code or libraries, so the config would look something like this:
.cargo/config: 
[build]
target = "thumbv7m-none-eabi"
rustflags = ["-Clink-args=-Map=app.map"]
The thing I dislike about this is the fact that the target is forced to thumbv7m-none-eabi, so some unit tests or generic code which might run on the build computer would be harder to test.

Note: if using rustc directly, just pass the extra options on its command line.

Map file generation with some readable symbols

After the changes above are done, you'll get an app.map file with a predefined name (even if the crate is a lib). If anyone knows how to keep the crate name, or at least use lib.map for libs and app.map for apps when the original project name can't be used, please comment.

The problems with the generated linker script are that:
  1. all symbol names are mangled, so you can't easily connect them back to the code; the alternative is to force the compiler not to mangle, by adding #[no_mangle] before the interesting symbols.
  2. each symbol seems to be put in its own subsection (e.g. an initialized array in a .data.* subsection).

Dealing with mangling

For problem 1, the fix is to add in the source #[no_mangle] to symbols or functions, like this:

#[no_mangle]
pub fn sing(start: i32, end: i32) -> String {
    // code body follows
}

Dealing with mangling globally

I wasn't able to find a way to convince cargo to apply no_mangle to the entire project, so if you know how to, please comment. I was thinking that using #![no_mangle] to apply the attribute globally in a file would work, but it doesn't seem to work as expected: the subsection still contains the mangled name, while the symbol seems to be "namespaced":

Here is a section from the #![no_mangle] (global) version:
.text._ZN9beer_song5verse17h0d94ba819eb8952aE
                0x000000000004fa00      0x61e /home/eddy/usr/src/rust/learn-rust/exercism/rust/beer-song/target/release/deps/libbeer_song-d80e2fdea1de9ada.rlib(beer_song-d80e2fdea1de9ada.beer_song.5vo42nek-cgu.3.rcgu.o)
                0x000000000004fa00                beer_song::verse
 
When the #[no_mangle] attribute is attached directly to the function, the subsection is not mangled and the symbol seems to be global:

.text.verse    0x000000000004f9c0      0x61e /home/eddy/usr/src/rust/learn-rust/exercism/rust/beer-song/target/release/deps/libbeer_song-d80e2fdea1de9ada.rlib(beer_song-d80e2fdea1de9ada.beer_song.5vo42nek-cgu.3.rcgu.o)
                0x000000000004f9c0                verse
I would prefer to have a global cargo option to switch this for the entire project, so code changes would not be needed; comments welcome.

Each symbol in its section

The second issue is quite annoying, even if having each symbol in its own section can be useful for controlling every symbol's placement via the linker script. I guess that to fix this I need a custom linker script to redirect, say, all constant "subsections" into the ".rodata" section.

I haven't tried this, but it should work.
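For reference, the kind of fragment I have in mind, in standard GNU ld script syntax (an untested sketch; it would be passed to the linker via the same rustflags mechanism as above, e.g. -Clink-args=-Tcustom.ld):

SECTIONS
{
  .rodata : {
    /* collect all per-symbol .rodata.* subsections into a single .rodata */
    *(.rodata .rodata.*)
  }
}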

Tuesday 22 May 2018

rust for cortex-m7 baremetal

Update 14 December 2018: After the release of stable 1.31.0 (aka 2018 Edition), it is no longer necessary to switch to the nightly channel to get access to thumbv7em-none-eabi / Cortex-M4 and Cortex-M7 components. Updated examples and commands accordingly.
For more details on embedded development using Rust, the official Rust embedded docs site is the place to go, in particular, you can start with The embedded Rust book.
 
 
This is a reminder for myself, if you want to install Rust for a baremetal Cortex-M7 target, this seems to be a tier 3 platform:

https://forge.rust-lang.org/platform-support.html

Highlighting the relevant part:

Target                  std  rustc  cargo  notes
...
msp430-none-elf         *                  16-bit MSP430 microcontrollers
sparc64-unknown-netbsd                     NetBSD/sparc64
thumbv6m-none-eabi      *                  Bare Cortex-M0, M0+, M1
thumbv7em-none-eabi     *                  Bare Cortex-M4, M7
thumbv7em-none-eabihf   *                  Bare Cortex-M4F, M7F, FPU, hardfloat
thumbv7m-none-eabi      *                  Bare Cortex-M3
...
x86_64-unknown-openbsd                     64-bit OpenBSD

In order to enable the relevant support, use stable >= 1.31.0 (before that release, the nightly channel was needed) and add the relevant target:
eddy@feodora:~/usr/src/rust-uc$ rustup show
Default host: x86_64-unknown-linux-gnu

installed toolchains
--------------------

stable-x86_64-unknown-linux-gnu
nightly-x86_64-unknown-linux-gnu (default)

active toolchain
----------------

nightly-x86_64-unknown-linux-gnu (default)
rustc 1.28.0-nightly (cb20f68d0 2018-05-21)
eddy@feodora:~/usr/src/rust$ rustup show
Default host: x86_64-unknown-linux-gnu

stable-x86_64-unknown-linux-gnu (default)
rustc 1.31.0 (abe02cefd 2018-12-04)

Back when nightly was required, if not already using it, you had to switch to it:


eddy@feodora:~/usr/src/rust-uc$ rustup default nightly-x86_64-unknown-linux-gnu
info: using existing install for 'nightly-x86_64-unknown-linux-gnu'
info: default toolchain set to 'nightly-x86_64-unknown-linux-gnu'

  nightly-x86_64-unknown-linux-gnu unchanged - rustc 1.28.0-nightly (cb20f68d0 2018-05-21)
Add the needed target:
eddy@feodora:~/usr/src/rust$ rustup target add thumbv7em-none-eabi
info: downloading component 'rust-std' for 'thumbv7em-none-eabi'
info: installing component 'rust-std' for 'thumbv7em-none-eabi'
eddy@feodora:~/usr/src/rust$ rustup show
Default host: x86_64-unknown-linux-gnu

installed targets for active toolchain
--------------------------------------

thumbv7em-none-eabi
x86_64-unknown-linux-gnu

active toolchain
----------------

stable-x86_64-unknown-linux-gnu (default)
rustc 1.31.0 (abe02cefd 2018-12-04)
Then compile with --target.
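For example, assuming a cargo project that is already no_std-ready:

cargo build --target thumbv7em-none-eabi --release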

Thursday 10 May 2018

"Where does Unity store its launch bar items?" or "Convincing Ubuntu's Unity 7.4.5 to run the newer version of PyCharm when starting from the launcer"

I have been driving a System76 Oryx-Pro for some time now. And I am running Ubuntu 16.04 on it.
I typically try to avoid polluting global name spaces, so any apps I install from source I tend to install under a versioned directory under ~/opt, for instance, PyCharm Community Edition 2016.3.1 is installed under ~/opt/pycharm-community-2016.3.1.

Today, after Pycharm suggested I install a newer version, I downloaded the current package, and ran it, as instructed in the embedded readme.txt, via the wrapper script:
~/opt/pycharm-community-2018.1.2/bin$ ./pycharm.sh
Everything looked OK, but when wanting to lock the icon on the launch bar I realized Unity did not display a separate Pycharm Community Edition icon for the 2018.1.2 version, but showed the existing icon as active.

"I guess it's the same filename, so maybe unity confuses the older version with the new one, so I have to replace the launcher to user the newer version by default", I said.

So I closed the interface, removed the PyCharm Community Edition icon, restarted the newer PyCharm from the command line, locked the icon, closed PyCharm once more, then clicked on the launcher bar.

Surprise! Unity was launching the old version! What?!

Repeated the entire series of steps, suspecting some PEBKAC, but was surprised to see the same result.

"Damn! Unity is stupid! I guess is a good thing they decided to kill it!", I said to myself.

Well, it shouldn't be that hard to find the offending item, so I started to grep in ~/.config, then in ~/.* for the string "PyCharm Cummunity Edition" without success.
Hmm, I guess the Linux world copied a lot of bad ideas from the Windows world; the configs are probably not in ~/.* in plain text, they're probably in that simulacrum of a Windows registry called dconf. So I installed dconf-editor and searched once more for the keyword "Community", but only found one entry, in the gedit filebrowser context.

So where does Unity get its launch bar items from? Since there is no "Properties" entry in the context menu and I didn't want to try to debug the starting of my graphic environment, but Unity is open source, I had to look at the sources.

After some fighting with dead links to unity.ubuntu.com subpages, then searching for "git repository Ubuntu Unity", I realized Ubuntu loves Bazaar, so I searched for "bzr Ubuntu Unity repository" - no luck. Luckily, Wikipedia usually has those kinds of links, and I found the damn thing.

BTW, am I the only one considering some strong hits with a clue bat for developers who name projects with some generic term that has no chance to override the typical term in common parlance, such as "Unity" or "Repo"?

Finding the sources and looking a little at the repository did not make it clear which was the entry point. I was expecting at least the README or the INSTALL file to give some relevant hints about the config or the initialization. My patience was running dry.

Maybe looking on my own system would be a better approach?
eddy@feodora:~$ which unity
/usr/bin/unity
eddy@feodora:~$ ll $(which unity)
-rwxr-xr-x 1 root root 9907 feb 21 21:38 /usr/bin/unity*
eddy@feodora:~$ ^ll^file
file $(which unity)
/usr/bin/unity: Python script, ASCII text executable
BINGO! This is a Python script, not a binary, in spite of the many .cpp sources in the Unity tree.

I opened the file with less, then found this interesting bit:
 def reset_launcher_icons ():
    '''Reset the default launcher icon and restart it.'''
    subprocess.Popen(["gsettings", "reset" ,"com.canonical.Unity.Launcher" , "favorites"])
Great! So it stores that stuff in the pseudo-registry, but I have to look under com.canonical.Unity.Launcher.favorites. Firing up dconf-editor again, I found the relevant bit in the value of that key:
'application://jetbrains-pycharm-ce.desktop'
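The same value can also be read from a terminal, without dconf-editor, via gsettings:

gsettings get com.canonical.Unity.Launcher favorites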
So where is this .desktop file? I guess using find is going to bring it up:
find /home/eddy/.local/ -name 'jetbrains*' -exec vi {} \;
It did, and the content made it obvious what was happening:
[Desktop Entry]
Version=1.0
Type=Application
Name=PyCharm Community Edition
Icon=/home/eddy/opt/pycharm-community-2016.3.1/bin/pycharm.png
Exec="/home/eddy/opt/pycharm-community-2016.3.1/bin/pycharm.sh" %f

Comment=The Drive to Develop
Categories=Development;IDE;
Terminal=false
StartupWMClass=jetbrains-pycharm-ce
Probably Unity did not create a new desktop file when locking the icon; it would simply check whether the jetbrains-pycharm-ce.desktop file already existed in my .local directory, saw that it did, and skipped its recreation.

Just as somebody said, all difficult computer science problems are either caused by leaky abstractions or by caching. I guess here we're having some sort of caching issue, but it is easy to fix - just edit the file:
eddy@feodora:~$ cat /home/eddy/.local/share/applications/jetbrains-pycharm-ce.desktop

[Desktop Entry]
Version=1.0
Type=Application
Name=PyCharm Community Edition
Icon=/home/eddy/opt/pycharm-community-2018.1.2/bin/pycharm.png
Exec="/home/eddy/opt/pycharm-community-2018.1.2/bin/pycharm.sh" %f
Comment=The Drive to Develop
Categories=Development;IDE;
Terminal=false
StartupWMClass=jetbrains-pycharm-ce
Checked the start again, and now the expected splash screen appears. GREAT!

I wonder if this is a Unity issue or is it due to some broken library that could affect other desktop environments such as MATE, GNOME or XFCE?

"Only" lost 2 hours (including this post) with this stupid bug, so I can go back to what I was trying in the first place, but now is already to late, so I have to go to sleep.

Friday 26 January 2018

Detecting binary files in the history of a git repository

Git, VCSes and binary files

Git is famous and has become popular even in the enterprise/commercial environments. But Git is also infamous regarding storage of large and/or binary files that change often, in spite of the fact they can be efficiently stored. For large files there have been several attempts to fix the issue, with varying degree of success, the most successful being git-lfs and git-annex.

My personal view is that, contrary to many practices, it is a bad idea to store binaries in any VCS. Still, this practice has been and still is in use in many projects, especially closed source projects. I won't go into the reasons and how legitimate they are; let's say that we might finally convince people that binaries should be removed from the VCS - git, in particular.

Since the purpose of a VCS is to make sure all versions of the stored objects are never lost, Linus designed git in such a way that knowing the exact hash of the tip/head of your git branch, it is guaranteed the whole history of that branch hasn't changed even if the repository was stored in a non-trusted location (I will ignore hash collisions, for practical reasons).

The consequence of this is that if the history is changed one bit, all commit hashes and history after that change will change also. This is what people refer to when they say they rewrite the (git) history, most often, in the context of a rebase.

But did you know that you could use git rebase to traverse the history of a branch and do all sorts of operations such as detecting all binary files that were ever stored in the branch?

Detecting any binary files, only in the current commit

As with everything on *nix, we start with some building blocks, and construct our solution on top of them. Let's first find all files, except the ones in .git:

find . -type f -print | grep -v '^\.\/\.git\/'
Then we can use the 'file' utility to look for non-text files:
(find . -type f -print | grep -v '^\.\/\.git\/' | xargs file )| egrep -v '(ASCII|Unicode) text'
And if there are any such files, then it means the current git commit is one that needs our attention; otherwise, we're fine.
(find . -type f -print | grep -v '^\.\/\.git\/' | xargs file )| egrep -v '(ASCII|Unicode) text' && (echo 'ERROR:' && git show --oneline -s) || echo OK
Of course, we assume here that the work tree is clean.

Checking all commits in a branch

Since we want to make this an efficient process and we only care if the history contains binaries, and branches are cheap in git, we can use a temporary branch that can be thrown away after our processing is finalized.
Making a new branch for some experiments is also a good idea to avoid losing the history, in case we do some stupid mistakes during our experiment.

Hence, we first create a new branch which points to the exact same tip the branch to be checked points to, and move to it:
git checkout -b test_bins
Git has many commands that facilitate automation, and in my case I basically want to run the chain of commands on all commits. For this we can put our chain of commands in a script:

cat > ../check_file_text.sh <<'EOF'
#!/bin/sh

(find . -type f -print | grep -v '^\.\/\.git\/' | xargs file )| egrep -v '(ASCII|Unicode) text' && (echo 'ERROR:' && git show --oneline -s) || echo OK
EOF
then (ab)use 'git rebase' to execute that for us for all commits:
git rebase --exec="sh ../check_file_text.sh" -i $startcommit
After we execute this, the editor window will pop up; just save and exit. Assuming $startcommit is the hash of the first commit we know to be clean, or beyond which we don't care to search for binaries, this will look in all commits since then.

Here is an example output when checking the newest 5 commits:

$ git rebase --exec="sh ../check_file_text.sh" -i HEAD~5
Executing: sh ../check_file_text.sh
OK
Executing: sh ../check_file_text.sh
OK
Executing: sh ../check_file_text.sh
OK
Executing: sh ../check_file_text.sh
OK
Executing: sh ../check_file_text.sh
OK
Successfully rebased and updated refs/heads/test_bins.

Please note this process can change the history on the test_bins branch, but that is why we used a throw-away branch anyway, right? After we're done, we can go back to another branch and delete the test branch.

$ git co master
Switched to branch 'master'

Your branch is up-to-date with 'origin/master'
$ git branch -D test_bins
Deleted branch test_bins (was 6358b91).
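As a side note, a non-destructive variant of the same check - which leaves all branches untouched by walking the commits from a detached HEAD - could look like this (a sketch, assuming a clean work tree):

git rev-list HEAD~5..HEAD | while read commit; do
    git checkout -q "$commit"
    sh ../check_file_text.sh
done
git checkout -q master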
Enjoy!

Thursday 11 January 2018

Suppressing color output of the Google Repo tool

On Windows, in the cmd shell, the color control characters generated by the Google Repo tool (or its Windows port made by ESRLabs) or by git appear as garbage. Unfortunately, the Google Repo tool, besides having a non-google-able name, lacks documentation regarding its options, so sometimes the only way to find the option I want is to look in the code.
To avoid repeatedly looking over the code to dig this up, future self, here is how you disable color output in the repo tool with the info subcommand:
repo --color=never info
Other options are 'auto' and 'always', but for some reason 'auto' does not do the right thing (tm) on Windows, and garbage is shown with it.
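For git itself, the equivalent knob is the color.ui setting:

git config --global color.ui false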

Saturday 25 March 2017

LVM: Converting root partition from linear to raid1 leads to boot failure... and how to recover

I have a system which has 3 distinct HDDs used as physical volumes for Linux LVM. One logical volume is the root partition and it was initially created as a linear LV (vg0/OS).
Since I have PV redundancy, I thought it might be a good idea to convert the root LV from linear to raid1 with 2 mirrors.

WARNING: It seems an LVM raid1 logical volume for / is not supported with grub2, at least not with Ubuntu's 2.02~beta2-9ubuntu1.6 (14.04LTS) or Debian Jessie's grub-pc 2.02~beta2-22+deb8u1!

So I did this:
lvconvert -m2 --type raid1 vg0/OS

Then I restarted to find myself at the 'grub rescue>' prompt.

The initial problem was seen on an Ubuntu 14.04 LTS (aka trusty) system, but I reproduced it on a VM with Debian Jessie.

I downloaded the Super Grub2 Disk and tried to boot the VM. After choosing the option to load the LVM and RAID support, I was able to boot my previous system.

I tried several times to reinstall GRUB, thinking that was the issue, but I always got this kind of error:


/usr/sbin/grub-probe: error: disk `lvmid/QtJiw0-wsDf-A2zh-2v2y-7JVA-NhPQ-TfjQlN/phCDlj-1XAM-VZnl-RzRy-g3kf-eeUB-dBcgmb' not found.

In the end, after digging for answers for more than 4 hours, I decided I might be able to revert the config to the linear configuration from the (initramfs) prompt.

Initially the LV was inactive, so I activated it:

lvchange -a y /dev/vg0/OS

Then restored the LV to linear:

lvconvert -m0 vg0/OS
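To double check the layout before and after such a conversion, lvs can report the segment type of each logical volume:

lvs -o lv_name,segtype vg0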

Then I tried to reboot without reinstalling GRUB, just for kicks, which succeeded.

In order to confirm this was the issue, I redid the whole thing, and indeed, with a raid1 root, I always got the lvmid error.

I'll have to check on Monday at work if I can revert the Ubuntu 14.04 system the same way, but I suspect I will have no issues.


Is it true that root on lvm-raid1 is not supported?

Thursday 10 December 2015

HOWTO: Setting and inserting/using MS Word 2013 document properties in the body of the document

I wrote this so I won't forget it and for others to find, if confronted with the same issue.

I hate Microsoft Office in all its incarnations, but I have to use it at work for various stuff. One of them is maintaining some technical documentation. We now use Office 365 and Office 2013.

Since MS Office Word 2013 is not a technical documentation program, some of its support for this is clunky. For things such as version numbers or other strings that might repeat throughout the document, (advanced) document properties are the way to go.

To set them select File > Info > Properties > Advanced Properties > Custom then fill in the 'Name:', 'Type:' and 'Value:', then press Add, then OK.

Once the properties are set, each can be inserted in the document by selecting its name in the 'Property:' list from the menu: INSERT > Quick Parts > Field... > Categories: Document Information > DocProperty.

After updating the value of any property (from the Advanced Properties dialog), to update all the places where the properties were used in the document, press Ctrl+A, then right click > Update Field > Update entire table > OK.

And, yes, 'Update entire table' will update the values, although its name is stupid.

Saturday 23 May 2015

HOWTO: No SSH logins SFTP only chrooted server configuration with OpenSSH

If you are in a situation where you want to set up a SFTP server in a more secure way, don't want to expose anything from the server via SFTP and do not want to enable SSH login on the account allowed to sftp, you might find the information below useful.

What do we want to achieve:
  • SFTP server
  • only a specified account is allowed to connect to SFTP
  • nothing outside the SFTP directory is exposed
  • no SSH login is allowed
  • any extra security measures are welcome
To obtain all of the above we will create a dedicated account which will be chroot-ed; its home will be stored on a removable/not always mounted drive (accessing SFTP will not work when the drive is not mounted).

Mount the removable drive which will hold the SFTP area (you might need to add some entry in fstab). 

Create the account to be used for SFTP access (on a Debian system this will do the trick):
# adduser --system --home /media/Store/sftp --shell /usr/sbin/nologin sftp

This will create the account sftp which has login disabled, shell is /usr/sbin/nologin and create the home directory for this user.

Unfortunately, the default ownership of the home directory of this user is incompatible with chroot-ing in SFTP (which prevents access to other files on the server). A message like the one below will be generated in this kind of case:
$ sftp -v sftp@localhost
[..]
sftp@localhost's password:
debug1: Authentication succeeded (password).
Authenticated to localhost ([::1]:22).
debug1: channel 0: new [client-session]
debug1: Requesting no-more-sessions@openssh.com
debug1: Entering interactive session.
Write failed: Broken pipe
Couldn't read packet: Connection reset by peer
Also /var/log/auth.log will contain something like this:
fatal: bad ownership or modes for chroot directory "/media/Store/sftp"

The default permissions are visible using the 'namei -l' command on the sftp home directory:
# namei -l /media/Store/sftp
f: /media/Store/sftp
drwxr-xr-x root root    /
drwxr-xr-x root root    media
drwxr-xr-x root root    Store
drwxr-xr-x sftp nogroup sftp
We change the ownership of the sftp directory and make sure there is a place for files to be uploaded in the SFTP area:
# chown root:root /media/Store/sftp
# mkdir /media/Store/sftp/upload
# chown sftp /media/Store/sftp/upload

We isolate the sftp users from other users on the system and configure a chroot-ed environment for all users accessing the SFTP server:
# addgroup sftpusers
# adduser sftp sftpusers
Set a password for the sftp user so password authentication works:
# passwd sftp
Putting all pieces together, we restrict access only to the sftp user, allow it access via password authentication only to SFTP, but not SSH (and disallow tunneling and forwarding or empty passwords).

Here are the changes done in /etc/ssh/sshd_config:
PermitEmptyPasswords no
PasswordAuthentication yes
AllowUsers sftp
Subsystem sftp internal-sftp
Match Group sftpusers
        ChrootDirectory %h
        ForceCommand internal-sftp
        X11Forwarding no
        AllowTcpForwarding no
        PermitTunnel no
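Before reloading, it's worth validating the configuration; sshd -t prints nothing and exits successfully when the file is syntactically correct:

# sshd -t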
Reload the sshd configuration (I'm using systemd):
# systemctl reload ssh.service
Check sftp user can't login via SSH:
$ ssh sftp@localhost
sftp@localhost's password:
This service allows sftp connections only.
Connection to localhost closed.
But SFTP is working and is restricted to the SFTP area:
$ sftp sftp@localhost
sftp@localhost's password:
Connected to localhost.
sftp> ls
upload 
sftp> pwd
Remote working directory: /
sftp> put netbsd-nfs.bin
Uploading netbsd-nfs.bin to /netbsd-nfs.bin
remote open("/netbsd-nfs.bin"): Permission denied
sftp> cd upload
sftp> put netbsd-nfs.bin
Uploading netbsd-nfs.bin to /upload/netbsd-nfs.bin
netbsd-nfs.bin                                                              100% 3111KB   3.0MB/s   00:00
Now your system is ready to accept sftp connections, things can be uploaded in the upload directory and whenever the external drive is unmounted, SFTP will NOT work.

Note: Since we added 'AllowUsers sftp', you can test no local user can login via SSH. If you don't want to restrict access only to the sftp user, you can whitelist other users by adding them in the AllowUsers directive, or dropping it entirely so all local users can SSH into the system.

Wednesday 20 May 2015

Linksys NSLU2 adventures into the NetBSD land passed through JTAG highlands - part 2 - RedBoot reverse engineering and APEX hacking

(continuation of Linksys NSLU2 adventures into the NetBSD land passed through JTAG highlands - part 1; meanwhile, my article was mentioned briefly in BSDNow Episode 89 - Exclusive Disjunction around minute 36:25)

Choosing to call RedBoot from a hacked Apex


As I was saying in my previous post, in order to be able to automate the booting of the NetBSD image via TFTP, I opted for using a 2nd stage bootloader (planning to flash it in the NSLU2 instead of a Linux kernel), and since Debian was already using Apex, I chose Apex, too.

The first problem I found was that the networking support in Apex was relying on an old version of the Intel NPE library which I couldn't find on Intel's site. The new version was incompatible/not building with the old build wrapper in Apex, so I was faced with 3 options:
  1. Fight with the available Intel code and try to force it to compile in Apex
  2. Incorporate the NPE driver from NetBSD into a rump kernel to be included in Apex instead of the original Intel code, since the NetBSD driver only needed an easily compilable binary blob
  3. Hack together an Apex version that simulates typing the necessary RedBoot commands to load the netbsd image via TFTP and execute it.
After taking a look at the NPE driver build system, I concluded there were very few options less attractive than option 1, among which was hammering nails through my forehead as an improvement measure against the severe brain damage I would likely be inflicted with after dealing with the NPE "build system".

Option 2 looked like the best option I could have, given the situation, but my NetBSD foo was too close to 0 to even dream of endeavoring on such a task. In my opinion, this still remains the technically superior solution to the problem, since it is a very portable and flexible way to ensure networking works in spite of the proprietary NPE code.

But, in practice, the best option I could implement at the time was option 3. I initially planned to pre-fill from Apex my desired commands into the RedBoot buffer that stored the keyboard strokes typed by the user:

load -r -b 0x200000 -h 192.168.0.2 netbsd-nfs.bin
g
Since this was the first time ever I was going to do less-than-trivial reverse engineering in order to find the addresses and signatures of interesting functions in the RedBoot code, it wasn't bad at all that I had a version of the RedBoot source code.

When stuck with reverse engineering, apply JTAG


The bad thing was that the code Linksys published as the source of the RedBoot running inside the NSLU2 was, in fact, a different code which had some significant changes around the code pieces I was mostly interested in. That in spite of the GPL terms.

But I thought that I could manage. After all, how hard could it be to identify the 2-3 functions I was interested in and 1 buffer? Even if I only had the disassembled code from the slug, it shouldn't be that hard.

I struggled with this for about 2-3 weeks, on the few occasions I had during that time, but the excitement of learning something new kept me going. Until I got stuck somewhere between: the mismatch between the published RedBoot code and the disassembled code, the state of the system at the time of dumping the contents from RAM (for purposes of disassembly), the assembly code generated by GCC for some specific C code I didn't have at all, and the particularities of ARM assembly.

What was most likely to unblock me was to actually see the code in action, so I decided that attaching a JTAG dongle to the slug and doing a session of in-circuit debugging was in order.

Luckily, the pinout of the JTAG interface was already identified in the NSLU2 Linux project, so I only had to solder some wires to the specified places and a 2x20 header to be able to connect through JTAG to the board.


JTAG connections on Kinder (the NSLU2 targeting NetBSD)

After this was done, I immediately tried to see if I could break the execution of the code on the system when using a JTAG debugger. The answer was, sadly, no.

The chip was identified, but breaking the execution was not happening. I tried this in OpenOCD and in another proprietary debugger application I had access to, and the result was the same: breaking was not happening.
$ openocd -f interface/ftdi/olimex-arm-usb-ocd.cfg -f board/linksys_nslu2.cfg
Open On-Chip Debugger 0.8.0 (2015-04-14-09:12)
Licensed under GNU GPL v2
For bug reports, read
    http://openocd.sourceforge.net/doc/doxygen/bugs.html
Info : only one transport option; autoselect 'jtag'
adapter speed: 300 kHz
Info : ixp42x.cpu: hardware has 2 breakpoints and 2 watchpoints
0
Info : clock speed 300 kHz
Info : JTAG tap: ixp42x.cpu tap/device found: 0x29277013 (mfg: 0x009,
part: 0x9277, ver: 0x2)
[..]

$ telnet localhost 4444
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Open On-Chip Debugger
> halt
target was in unknown state when halt was requested
in procedure 'halt'
> poll
background polling: on
TAP: ixp42x.cpu (enabled)
target state: unknown
Looking into the documentation I found a bit of information on the XScale processors[X] which suggested that XScale processors might necessarily need the (otherwise optional) SRST signal on the JTAG interface to be able to single step the chip.

This confused me a lot since I was sure other people had already used JTAG on the NSLU2.

The options I saw at the time were:
  1. my NSLU2 did not have a fully working JTAG interface (either due to the missing SRST signal on the interface, or maybe due to a JTAG lock on later generation NSLU2-s, as was my second slug)
  2. nobody ever single stepped the slug using OpenOCD or other JTAG debugger, they only reflashed, and I was on totally new ground
I even contacted Rod Whitby, the project leader of the NSLU2 project to try to confirm single stepping was done before. Rod told me he never did that and he only reflashed the device.

This confused me even further because, from what I had encountered on other platforms, in order to flash some device, the code responsible for programming the flash is loaded into the RAM of the target microcontroller, a RAM buffer with the data to be flashed is preloaded via JTAG, then that code is executed on the target; the operation is repeated for all flash blocks to be reprogrammed.

I was aware it was possible to program a flash chip situated on the board, outside the chip, by only playing with the chip's pads, strictly via JTAG, but I was still hoping single stepping the execution of the code in RedBoot was possible.

Guided by that hope, and by the possibility that the newer versions of the device were locked, I decided to add a JTAG interface to my older NSLU2, too. But this time I decided I would also add the TRST and SRST signals to the JTAG interface, just in case single stepping would work.

This mod involved even more extensive changes than the ones done on the other NSLU, but I was so frustrated by the fact I was stuck that I didn't mind poking a few holes through the case and the prospect of a connector always sticking out from the other NSLU2, which was doing some small, yet useful work in my home LAN.

It turns out NOBODY single stepped the NSLU2

 

After biting the bullet and soldering a JTAG interface with the TRST and SRST signals also connected, as the pinout page from the NSLU2 Linux wiki suggested, I was disappointed to observe that I was not able to single step the older NSLU2 either, in spite of the presence of the extra signals.

I even tinkered with the reset configurations of OpenOCD, but had no success. After obtaining the same result with the proprietary debugger, digging through a presentation made by Rod back in the heyday of the project, and the conversations on the NSLU2 Linux Yahoo mailing list, I finally concluded:
Actually nobody single stepped the NSLU2, no matter the version of the NSLU2 or connections available on the JTAG interface!
So I was back to square 1: I had to either struggle with disassembly, reevaluate my initial options, find another option, or drop the idea entirely. At that point I was already committed to the project, so dropping the idea entirely didn't seem like the reasonable thing to do.

Since I felt I was really close to the finish on the route I had chosen a while ago, was not significantly more knowledgeable about the NetBSD code, and looking at the NPE code made me feel like washing my hands, the only option which seemed reasonable was to go on.

Digging a lot more through the internet, I was finally able to find another version of the RedBoot source which was modified for Intel ixp42x systems. A few checks here and there revealed this newly found code was actually almost identical to the code I had disassembled from the slug I was aiming to run NetBSD on. This was a huge step forward.

Long story short, a couple of days later I had a hacked Apex that could go through the RedBoot data structures, search for available commands in RedBoot and successfully call any of the built-in RedBoot commands!

Testing by loading this modified Apex by hand into RAM via TFTP, then jumping into it to see if things worked as expected, revealed a few small issues, which I corrected right away.

Flashing a modified RedBoot?! But why? Wasn't Apex supposed to avoid exactly that risky operation?


Since the tests when executing from RAM were successful, my custom second stage Apex bootloader for NetBSD net booting was ready to be flashed into the NSLU2.

I added two more targets in the Makefile in the code on the dedicated netbsd branch of my Apex repository to generate the images ready for flashing into the NSLU2 flash (RedBoot needs to find a Sercomm header in flash, otherwise it will crash), and the exact commands to be executed in RedBoot are also printed out after generation. This way, if the command is copy-pasted, there is no risk the NSLU2 is bricked by mistake.

After some flashing and reflashing of the apex_nslu2.flash image into the NSLU2 flash, some manual testing, tweaking and modifying of the default built-in Apex commands, and checking that the sequence of commands 'move', 'go 0x01d00000' would jump into Apex - which, in turn, would call RedBoot to transfer the netbsd-nfs.bin image from a TFTP server to RAM and then execute it successfully - it was high time to check that NetBSD would boot automatically after the NSLU2 was powered on.

It didn't. Contrary to my previous tests, no call made from Apex to the RedBoot code would return back to Apex, not even the execution of a basic command such as the 'version' command.

It turns out the default commands hardcoded into RedBoot were 'boot; exec 0x01d00000', but I had tested 'boot; go 0x01d00000', which is not the same thing.

While 'go' does a plain jump to the specified address, the 'exec' command also does some preparations that allow a jump into the Linux kernel, and those preparations break some environment the RedBoot commands expect. I don't know which those are and didn't have the mood or motivation to find out.

So the easiest solution was to change RedBoot's built-in command and turn that 'exec' into a 'go'. But that meant that this time I was actually risking bricking the NSLU2, unless I was able to reflash it via JTAG.


(to be continued - next, changing RedBoot and bisecting through the NetBSD history)

[X] Linksys NSLU2 has an XScale IXP420 processor which is compatible at ASM level with the ARMv5TEJ instruction set

Friday 8 May 2015

Linksys NSLU2 adventures into the NetBSD land passed through JTAG highlands - part 1

About 2 months ago I set a goal to run some kind of BSD on the spare Linksys NSLU2 I had. This was driven mostly by curiosity, after listening to a few BSDNow episodes and becoming a regular listener, but it was a really interesting experience (it was also somewhat frustrating, mostly due to lacking documentation or proprietary code).

Looking for documentation on how to install any BSD flavour on the Linksys NSLU2, I have found what appears to be some too-incomplete-to-be-useful-for-a-BSD-newbie information about installing FreeBSD, no information about OpenBSD and some very detailed information about NetBSD on the Linksys NSLU2.

I was very impressed by the NetBSD build.sh script, which can be used to cross-compile the entire NetBSD system - to do that, it also builds the appropriate toolchain - the NetBSD kernel and the base system, even when run on a Linux host. Having some experience with cross compilation for GNU/Linux embedded systems, I can honestly say this is immensely impressive. Well done, NetBSD!

A few failed attempts to properly follow the instructions and lots of hours of (re)building later, I had the kernel and the sets (the NetBSD system is split into several parts which are grouped by functionality; these are the sets), so I was in the position of having to set things up to be able to net boot - kernel loading via TFTP and rootfs on NFS.

But it wouldn't be challenging if the instructions were followed to the letter, so the first thing I wanted to change was that I didn't want to run dhcpd just to pass the DHCP boot configuration to the NSLU2; that seemed like a waste of resources since I already had dnsmasq running.

After some effort and struggling with missing documentation, I managed to use dnsmasq to pass DHCP boot parameters to the slug, but also use it as TFTP server - after some time I documented this for future reference on my blog and expect to refer to it in the future.

Setting up NFS wasn't a problem, but, when trying to boot, I found that I managed to misread at least 3 or 4 times some of the NSLU2 related information on the NetBSD wiki. To be able to debug what was happening I concluded the slug should have a serial console attached to it, which helped a lot.

Still the result was that I wasn't able to boot the trunk version of the NetBSD code on my NSLU2.

Long story short, with the help of some people from the #netbsd IRC channel on Freenode and from the port-arm NetBSD mailing list I found out that I might have a better chance with specific older versions. In practice what really worked was the code from the netbsd_6_1 branch.

Discussions on the port-arm mailing list, some digging into the (recently found) PR (problem reports), and a successful execution of the trunk kernel (at the time, version 7.99.4) together with 6.1.5 userspace lead me to the conclusion the NetBSD userspace for armbe was broken in the trunk branch.

And since I concluded this would be a good occasion to learn a few details about NetBSD, I set out to git bisect through the trunk history to identify when this happened. But that meant being able to easily load kernels and run them from TFTP, which was not how the RedBoot bootloader flashed into the slug behaves by default.

By default, the RedBoot bootloader flashed into the NSLU2 waits 2 seconds for a manual interaction (it waits for a ^C) on the serial console or on the telnet RedBoot prompt; then, if no such event happens, it copies the Linux image it has in flash starting at address 0x50060000 into RAM at address 0x01d00000 (after stripping the Sercomm header) and executes the copied code from RAM.

Of course, this is not a very handy way to try to boot things from TFTP, so my first idea to overcome this limitation was to use a second stage bootloader which would do the loading via TFTP of the NetBSD kernel, then execute it from RAM. Flashing this second stage bootloader instead of the Linux kernel at 0x50060000 would make sure that no manual intervention except power on would be necessary when a new kernel+userspace pair is ready to be tested.

Another advantage was that I would not risk bricking the NSLU2 since I would not be changing RedBoot, the original bootloader.

I knew Apex was used as the second stage bootloader in Debian, so I started configuring my own version of the APEX bootloader to make it work for the netbsd-nfs.bin image to be loaded via TFTP.

My first disappointment was that Apex did not support receiving the boot parameters via DHCP, only via RARP (it was clear it was less tested with BOOTP or DHCP), and TFTP was documented in the code as being problematic. That meant I would have to hard code the boot configuration or configure RARP, but that wasn't too bad.

Later I found out that I had wasted time on that avenue, because the network driver in Apex was some Intel code (the NPE Access Library) which can't be freely distributed, but could have been downloaded from Intel's site back in 2008-2009. The bad news was that current versions did not work at all with the old patchwork that was done in Apex to allow the driver made for Linux to compile in a world of its own so it could be incorporated in Apex.

I was stuck, and the only options I had were:
  1. Fight with the available Intel code and make it compile in Apex
  2. Incorporate the NPE driver from NetBSD into a rump kernel which will be included in Apex, since I knew the NetBSD driver only needed a very easily obtainable binary blob, instead of the entire driver as was in Apex before
  3. Hack together an Apex version that simulates the typing of the necessary commands to load the netbsd-nfs.bin image inside RedBoot, or in other words, call from Apex the RedBoot functions necessary to load from TFTP and execute NetBSD.
Option 1 did not look that appealing after looking into the horrible Intel build system and its endless dependencies on a specific Linux kernel version.

Option 2 was more appealing, but since I didn't know NetBSD and had only tried once to build and run a NetBSD rump kernel, it seemed like a doable project only for an experienced NetBSD developer, or at least an experienced NetBSD user, which I was not.

So I was left with option 3, which meant I had to do some reverse engineering of the code, because, although RedBoot is GPL, Linksys did not publish the source from which the running RedBoot was built.


(continues here)

Thursday 30 April 2015

Linksys NSLU2 JTAG help requested

Some time ago I embarked on a journey to install NetBSD on one of my two NSLU2-s. I ran into all sorts of hurdles and problems, which I finally managed to overcome, except one:

The NSLU I am using has a standard 20 pin ARM JTAG connector attached to it (as per this page http://www.nslu2-linux.org/wiki/Info/PinoutOfJTAGPort, only TDI, TDO, TMS, TCK, Vref and GND signals), but, although the chip is identified, I am unable to halt the CPU:
    $ openocd -f interface/ftdi/olimex-arm-usb-ocd.cfg -f board/linksys_nslu2.cfg
    Open On-Chip Debugger 0.8.0 (2015-04-14-09:12)
    Licensed under GNU GPL v2
    For bug reports, read
        http://openocd.sourceforge.net/doc/doxygen/bugs.html
    Info : only one transport option; autoselect 'jtag'
    adapter speed: 300 kHz
    Info : ixp42x.cpu: hardware has 2 breakpoints and 2 watchpoints
    0
    Info : clock speed 300 kHz
    Info : JTAG tap: ixp42x.cpu tap/device found: 0x29277013 (mfg: 0x009,
    part: 0x9277, ver: 0x2)
    [..]
    $ telnet localhost 4444
    Trying ::1...
    Trying 127.0.0.1...
    Connected to localhost.
    Escape character is '^]'.
    Open On-Chip Debugger
    > halt
    target was in unknown state when halt was requested
    in procedure 'halt'
    > poll
    background polling: on
    TAP: ixp42x.cpu (enabled)
    target state: unknown
My main goal is to make sure I can flash the device via JTAG in case I break it, but it would be ideal if I could use the JTAG to single step through the code.

I have found that other people have managed to flash the device via JTAG without the other signals, and some have even changed the bootloader (and had JTAG confirmed as a backup solution), yet I am stuck.

So if anyone can give some insights into ixp42x / Xscale / NSLU2 specific JTAG issues or hints regarding this issue on OpenOCD or other such tool, I would be really grateful.


Note: I have made a hacked second stage Apex bootloader to load the NetBSD image via TFTP, but the default RedBoot sequence 'boot; exec 0x01d00000' should be 'boot; go 0x01d00000' for NetBSD to work, so I am considering changing the RedBoot partition to alter that command. The gory details can be summed up as: my Apex is calling RedBoot functions to be network enabled (because Intel's current NPE code does not work in Apex), and I have tested this to work with go, but not with exec.