Notes on a mass upgrade to Fedora 23

Picture of Fedora 23 desktop

Fedora 23

One of the hardest parts of running Fedora in a school setting is keeping on top of the upgrades, and I ended up falling a few months behind. Fedora 23 was released back in November, and it took me until February to start the upgrade process.

For our provisioning process, we’ve switched from a custom koji instance to ansible (with our plays on GitHub), and this release was the first time I was really able to take advantage of it. I changed our default kickstart to point to the Fedora 23 repositories, installed it on a test system, ran ansible on it, and voilà, I had a working Fedora 23 setup, running perfectly with all our school’s customizations. It was the easiest upgrade experience I’ve ever had!

Well, mostly.

As usual, the moment you think everything is perfect is the moment everything goes wrong. On our multiseat systems, we have three external AMD graphics cards along with the internal Intel graphics. The first bug I noticed was that the Intel card wasn’t doing any graphics acceleration. It turns out that VGA arbitration is automatically turned on if you have more than one video card, and Intel cards don’t support it in DRI2. DRI3 does handle arbitration just fine, but it was (and still is) disabled in the latest xorg-x11-drv-intel in the updates repository. Luckily for me, there’s a build in koji that re-enables DRI3. Problem solved.
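If you want to know whether a machine is likely to run into this, a rough first check is to count its display controllers. Here's a minimal Python sketch, assuming the standard sysfs layout where every PCI device has a `class` file. Base class 0x03 covers all display controllers (VGA and 3D), while the kernel's arbitration only concerns the VGA-class ones, so treat the count as a hint rather than proof. `count_display_devices` is a made-up helper name.

```python
from pathlib import Path


def count_display_devices(pci_root="/sys/bus/pci/devices"):
    """Count PCI devices whose class code has base class 0x03 (display controller)."""
    root = Path(pci_root)
    if not root.is_dir():
        return 0
    count = 0
    for dev in root.iterdir():
        try:
            # The class file holds a 24-bit code such as "0x030000".
            pci_class = int((dev / "class").read_text().strip(), 16)
        except (OSError, ValueError):
            continue
        # The top byte of the class code is the base class.
        if pci_class >> 16 == 0x03:
            count += 1
    return count
```

On one of our multiseat boxes (internal Intel plus three AMD cards) you would expect a count of four; anything above one means arbitration is in play.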

The second bug was… odd. While we use gnome-shell as the default desktop environment in the school, we use lightdm for logging in, mainly because of its flexibility. We run xscreensaver in the login screen (and only in the login screen) to make it clear which computers are off, which are on, and which are logged in. GDM doesn’t support xscreensaver, but lightdm does. And this brings us back to the bug. On the Intel seat, moving the mouse or pressing a key would stop the screensaver as expected, but the screen would remain black except for the username control. It seems that the “VisibilityNotify” event isn’t being honored by the driver (though don’t ask me why it should be passed down to the driver). I filed a bug, and then finally figured out that fading xscreensaver back in works around the problem.

The third bug is even stranger. On the teacher’s machine, we have a small script that starts x11vnc in view-only mode (nobody connecting to it gets any control) so the teacher can give a demonstration to the students. But after installing Fedora 23 on the teacher’s machine, the demo kept showing the same three frames over and over. The teacher’s system isn’t multiseat and uses the built-in Intel graphics, so it doesn’t need DRI3 for VGA arbitration, and, oddly enough, disabling DRI3 fixed the problem. I filed another bug.
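A demo launcher along these lines could look like the following. This is a sketch, not our actual script: the display, port and the helper name `x11vnc_demo_command` are assumptions, but the flags are standard x11vnc options, and `-viewonly` is the one that keeps students from taking control.

```python
def x11vnc_demo_command(display=":0", port=5900):
    """Build an x11vnc command line for a view-only classroom demo."""
    return [
        "x11vnc",
        "-display", display,
        "-rfbport", str(port),
        "-viewonly",  # clients can watch, but their mouse/keyboard input is ignored
        "-shared",    # let the whole class connect at the same time
        "-forever",   # keep serving after the first client disconnects
        "-nopw",      # skip the warning about running without a password
    ]
```

Something like `subprocess.Popen(x11vnc_demo_command())` would then start the server in the background for the length of the lesson.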

When upgrading the staff room systems, I ran into a bug in which cups runs screaming into the night (ok, slight exaggeration) if you have a server announcing printers over both the old cups and new dnssd protocols. Since we don’t have any pre-F21 systems any more, I’ve just disabled the old cups protocol on the server.
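For reference, here's a small Python sketch of that kind of change, assuming your print server controls its announcements with a `BrowseLocalProtocols` line (as in `cups-browsed.conf`; which file applies depends on your CUPS version, so check yours). It drops the legacy `cups` protocol and leaves `dnssd` alone; `drop_cups_protocol` is a made-up name.

```python
import re


def drop_cups_protocol(config_text):
    """Remove the legacy 'cups' protocol from any BrowseLocalProtocols line."""
    out = []
    for line in config_text.splitlines():
        m = re.match(r"^(\s*BrowseLocalProtocols\s+)(.*)$", line, re.IGNORECASE)
        if m:
            protocols = [p for p in m.group(2).split() if p.lower() != "cups"]
            # If nothing is left, announce nothing rather than leave the line empty.
            line = m.group(1) + (" ".join(protocols) if protocols else "none")
        out.append(line)
    return "\n".join(out) + "\n"
```

Commented-out lines start with `#` and are left untouched, so the function is safe to run over a stock config file.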

And, finally, my principal, who teaches computer classes to grades 11 and 12, came in to ask me why LibreOffice was crashing for a couple (and only a couple) of his students when they were formatting cells on a spreadsheet that he gave them. After some fancy footwork involving rm’d .config/libreoffice directories and files saved into random odd formats and then back into ods, we finally managed to format the cells without a crash. Lovely.

All this brings me back to ansible. In each of the bugs that required changes to the workstations, all I had to do was update the ansible scripts and push the changes out. Talk about painless! Ansible has made this job so much easier!

And I do want to finish by saying that these bugs are part of the reason that I love Fedora. With Fedora, I have the freedom to fix these problems myself. For both the cups bug and the xscreensaver bug, I was able to dig into the source code to start tracking down where the problem lay and come up with a workaround. And if I can just get the LibreOffice bug to reproduce, I can get a crash dump off of it and possibly figure it out too. Hurrah for source code!

Virtualizing Windows (and simplifying my life)

Picture of fireworks

Freedom

At our school, we’ve been running Fedora on most of the desktops since Fedora 8, but the one department that’s stuck with Windows is the accounting department, mainly because their software is Windows-only.  This has long been a problem because most of our infrastructure is built around Linux and we haven’t put nearly as much energy into making sure Windows systems are maintained properly.

Obviously, this led to problems that started out small, but grew until the systems were bordering on unusable.  When it reached the point that we were considering yet another reinstall of Windows, I suggested switching the accountants over to Fedora and having them use a virtual machine for the software that required the other OS.

It took a few days to get something that worked, and another week (including one very late night) to tie down the little glitches and get the virtual machine beyond just-usable to easy-to-use.

I started with VirtualBox, but there were a number of issues with stability, so I decided to take another look at QEMU.  I thought about using libvirt, but one of my requirements was that everything needed to run under the user’s permissions, so it turned out to be easier to run qemu-kvm directly.  I used SPICE and installed the guest agent, which gave us a far better experience with QEMU than the last time I used it for a desktop OS (which, granted, was over five years ago).
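For anyone trying the same thing, here's a rough sketch of how such a `qemu-kvm` command line can be assembled; it is not the exact one we use. The memory size, port and image path are placeholders. The `virtio-serial` device plus a `spicevmc` chardev exposed as `com.redhat.spice.0` is the channel the SPICE guest agent inside Windows talks over. Binding SPICE to 127.0.0.1 with no password assumes the viewer runs on the same machine, under the same user.

```python
def qemu_spice_command(disk_image, memory_mb=2048, spice_port=5930):
    """Build a qemu-kvm command line with a SPICE display and guest agent channel."""
    return [
        "qemu-kvm",
        "-m", str(memory_mb),
        "-drive", f"file={disk_image},format=qcow2",
        "-vga", "qxl",  # QXL video works best with SPICE
        # Local-only SPICE server without a password.
        "-spice", f"port={spice_port},addr=127.0.0.1,disable-ticketing=on",
        # virtio-serial port used by the SPICE guest agent (vdagent).
        "-device", "virtio-serial-pci",
        "-chardev", "spicevmc,id=vdagent,name=vdagent",
        "-device", "virtserialport,chardev=vdagent,name=com.redhat.spice.0",
    ]
```

A client such as `remote-viewer spice://127.0.0.1:5930` can then connect to the running VM.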

Most of my time was spent fixing problems inherent to Windows 7 itself, rather than the virtualization process.  It turns out that there are bugs in how it handles network printers, causing delays every time you want to print.  Oddly enough, the fix was pretty simple, but it took a while to figure it out.  There was also the bug where network drives aren’t mapped properly if the system boots so quickly that the network isn’t up in time, which was only fixable by using a batch file for mapping the network drives.
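Our actual fix was a batch file; the sketch below only shows the same wait-then-map idea in Python, in case it's useful elsewhere. It assumes the file server answers on the SMB port (445) once the network is up. The host name, share and helper names are hypothetical; `net use ... /persistent:no` is the standard Windows command for mapping a drive without remembering it at the next logon.

```python
import socket
import time


def wait_for_host(host, port=445, timeout_s=60.0, interval_s=2.0,
                  connect=socket.create_connection):
    """Retry a TCP connection to host:port until it succeeds or timeout_s passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            connect((host, port), timeout=5.0).close()
            return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)


def net_use_command(drive, unc_path):
    """Windows command that maps a drive letter for this logon session only."""
    return ["net", "use", drive, unc_path, "/persistent:no"]
```

At logon, something like `if wait_for_host("fileserver"): subprocess.run(net_use_command("Z:", r"\\fileserver\accounts"))` would only map the drive once the server is reachable.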

One change I made was to insist that we use throw-away snapshots for day-to-day work (the data is stored on a network drive) and only keep changes when we’re updating the accounting software.  This should help protect us from viruses and malware that can’t be easily removed.
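QEMU makes the throw-away part easy with its `-snapshot` option, which sends all disk writes to temporary files that are discarded when the VM shuts down. Here's a minimal launcher sketch, assuming a hypothetical `--update` switch for the rare sessions where changes should be kept; the image path and memory size are placeholders.

```python
import argparse


def vm_command(disk_image, update=False):
    """qemu-kvm argv; use a throw-away snapshot unless we're updating the software."""
    cmd = ["qemu-kvm", "-m", "2048", "-drive", f"file={disk_image},format=qcow2"]
    if not update:
        # Writes go to temporary files and vanish on shutdown,
        # so the base image never changes during day-to-day work.
        cmd.append("-snapshot")
    return cmd


def main(argv=None):
    parser = argparse.ArgumentParser(description="Start the accounting VM")
    parser.add_argument("image")
    parser.add_argument("--update", action="store_true",
                        help="keep changes (only when updating the accounting software)")
    args = parser.parse_args(argv)
    return vm_command(args.image, update=args.update)
```

Passing the returned list to `subprocess.run` starts the VM; since the data lives on a network drive, nothing of value is lost when the snapshot is discarded.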

The best part of all this is that the new accounting VM and the scripts necessary to start it are sitting in a network folder only accessible by the accountants.  This means that they can now do their work from any computer in the school, if necessary, while the VM stays off-limits to everyone else.

And I’m no longer stuck keeping unmanaged Windows systems running.  What a way to close out the year!

Colorful Fireworks by 久留米市民(Kurume-Shimin) used under a CC BY-SA 3.0 unported license

Solving the mystery of the disappearing bluetooth device

This is a true[1] story

One of the features my laptop comes with is Bluetooth, which I’ve found to be quite handy considering all the highly important uses I have for Bluetooth (using Bluetooth tethering on my phone when traveling, controlling my presentations with my phone, and using a Wii-mote for playing SuperTuxKart, er, a portable Bluetooth controller with a built-in accelerometer to analyze the consistency of the matrices used when rendering three-dimensional objects onto a two-dimensional field).

About three months ago, I started to run into problems. Not the easy kind of problem where “BUG: unable to handle kernel paging request at 0000ffffd15ea5e” brings the laptop to an abrupt stop, but instead the kind of problem that causes real trouble.

My Bluetooth module starts to randomly reset itself. I’ll be working merrily, trying to connect my phone or the… portable Bluetooth controller… and, halfway through the process, it will hang. Kernel logs show that the Bluetooth module has been unplugged from the USB bus and then reconnected. Which, when you think about it, makes a whole lot of sense, given that the Bluetooth module is built into the WiFi card which is screwed onto the motherboard.

When faced with kernel logs that boggle the mind, the most logical thing to do is downgrade the kernel. I know that I was able to successfully… analyze the matrices used for, oh, whatever it was… back at the beginning of June, which means I had working Bluetooth on June 1. Let’s see what kernel was latest then, download and install it, boot from it, and…

kernel: usb 8-4: USB disconnect, device number 3
kernel: usb 8-4: new full-speed USB device number 4 using ohci-pci

#$@&%*!

Ok, the hardware must be dying.  Stupid Atheros card.  No idea why it’s just the Bluetooth and not the WiFi as well, but we’re in Ireland and I’m on eBay, so I’ll just order another one.  Made by a different company.  A week later, a slightly used Ralink combo card shows up. I plug it in, fire her up, and…

kernel: usb 8-4: USB disconnect, device number 3
kernel: ohci-pci 0000:00:13.0: HC died; cleaning up
kernel: ohci-pci 0000:00:13.0: frame counter not updating; disabled

Double #$@&%*! Now the Bluetooth module is completely gone and the only way to get it back is to reboot. Grrrrr.

At this point I’ve got a hammer in my hand, my laptop in front of me, and the only thing keeping me from submitting a video for a new OnePlus One is my wife warning me that we’re not going to be buying me a new laptop any time this decade.

So I take a deep breath, calmly return the hammer to the toolbox (no, dear, I have no idea how that dent got on the toolbox), and decide to instead go down the road less traveled. I open up Fedora’s bugzilla and start preparing my bug report, taking special care to only use words that I’d be willing to say in front of my children. “…so the Bluetooth module keeps getting disconnected. It’s almost like the USB bus is cutting its power for some stupid…”

Wait a minute! Just before we traveled to Ireland, I remember experimenting with PowerTOP. And PowerTOP has this cool feature that allows you to automatically enable all power saving options on boot. And I might have enabled it. So I check, and, yes, I had turned on autosuspend for my Bluetooth module. I turn it off, try to connect my… portable Bluetooth controller… and it works, first time. I do some… matrix analysis… with it and everything continues to work perfectly.
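For the record, the setting lives in sysfs: each USB device has a `power/control` file, where `auto` allows runtime autosuspend and `on` keeps the device awake. Here's a minimal Python sketch (run as root) that switches it back to `on`. The device name `8-4` comes from the kernel log above and will differ on other machines, `disable_usb_autosuspend` is a made-up name, and the change only lasts until reboot, so if PowerTOP's auto-tune still runs at boot it will flip the setting back.

```python
from pathlib import Path


def disable_usb_autosuspend(device, sysfs_root="/sys/bus/usb/devices"):
    """Keep a USB device powered by writing 'on' to its power/control file.

    Returns the previous value ('auto' means autosuspend was enabled).
    """
    control = Path(sysfs_root) / device / "power" / "control"
    previous = control.read_text().strip()
    if previous != "on":
        control.write_text("on")
    return previous
```

Calling `disable_usb_autosuspend("8-4")` would have saved me €20.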

So I am an idiot. I close the page with the half-finished bug report and go to admit to my wife that I just wasted €20 on a WiFi card that I didn’t really need.  And, uh, if any Atheros or Ralink people read this, well, I’m sorry for any negative thoughts I may have had about your WiFi cards.

[1] Well, mostly true, anyway. Some of the details might be mildly exaggerated.

Scratching an itch

Scratch

Last year I started teaching programming to my grade 10 classes. I started with Python, which is easy to understand, forces good programming practices, and is one of my favorite languages. It was a complete disaster. I had four or five in each class who understood what I was doing, and the rest were completely lost, which says a whole lot about my teaching. At DevConf.cz 2014, I chatted with Matthew Miller about my Python problem, and he suggested teaching my students Scratch.

For those who (like me) didn’t know about it, Scratch is a graphical programming language that’s designed to be easy to use while still allowing the full power of a proper programming language. The benefit of teaching programming using Scratch is that the students get quick graphical feedback on what works and what doesn’t, and syntax errors are pretty much impossible. Once they understand the basic concepts of programming, it’s then easier to switch to something like Python.

I switched to Scratch, and the students loved it. (Or, at the very least, liked it better than Python.) I ended the school year with a group assignment that was partially graded based on votes by the rest of the classes. I had great ideas for making the group assignments available online, but never did anything with them. Fast-forward to this year, where we started with Scratch and are now almost done with it and ready to move on to Python. And, since I now have a deadline, I’ve put together a simple site so they can vote on each other’s group projects.

At the moment, it has last year’s projects and is open for anyone to rate, so if you want to try out their projects, go to https://scratch.lesbg.com, give them a shot, and rate them. This was a first attempt for both the students and me, so please be gentle with the ratings.

Sometime in the next few weeks I’ll post this year’s projects. They will be available to play, but initially only students or teachers in the school will be able to rate them. Once I’ve scored them, I’ll open up the ratings to everybody.

If you have any comments or suggestions for the site itself, please leave them below.

Multiseat and anaconda bugs

Clouds over a mountain

Those look like storm clouds…

A year ago, I put together a post about the multiseat Fedora systems we’re using in our school. Over the past month, I’ve been putting together an upgrade from our Fedora 19 image to Fedora 21.

While doing the upgrade, I ran into a few bugs, and the first one was a doozy! Roughly half the time our multiseat systems started, the login screen would only show on two or three of the four seats. The only way to fix it was to restart the display manager, and even that only had a 50% chance of success.

At first I tried bodging around the bug by staggering the timing of Xorg’s startup, but that only made things worse. So I started looking at the logs and then looking at the Xorg code. It became obvious that the problem was that the first seat (seat0) would try to claim all the GPUs on the system. If it beat the other seats to their GPUs, they would, oddly enough, refuse to start. I put together a patch, filed a bug, and watched as those who know a lot more about Xorg’s internals took my ugly patch and made it beautiful. This patch has been merged into Xorg 1.17, and I’m hoping we’ll get it backported to F20 and F21, as I really don’t want to have to maintain internal Xorg packages until we switch to F22.

There do seem to be a couple of other bugs related to lightdm/xorg, but they’re far rarer and I haven’t spent much time on tracking them down, much less filing bugs. Occasionally lightdm starts the X server, but never gets a signal back saying that it’s ready, so they both sit there waiting for the other process. And far more rarely, the greeter crashes, which causes lightdm to shut down the seat. I think lightdm should retry a few times, but either it doesn’t or I haven’t found the right config option yet.

We did run into one interesting race condition in anaconda when we started mass-installing F21 on our systems. We use iPXE and Fedora’s PXE network install images with a custom kickstart to do the install (in graphical mode, because pretty installs make it less likely that a student will press the reset button while the install is progressing). On some systems, I’d get an error message that basically said that a repository that was supposed to be enabled had disappeared, which would crash anaconda.

Thanks to anaconda’s wonderful debugging tools, I was able to work out what list was being emptied and finally tracked it down to a race between the backend filling the frontend with its list of repositories and the frontend telling the backend to remove any repositories that aren’t in its list of repositories. Another ugly patch attached to the bug report, and we’ll see what happens with this one. At least I’m able to rebuild the squashfs installer image so the bug is fixed for us internally.
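The shape of the bug (and of a fix) is easier to see in a toy model. This is not anaconda's code: `RepoState`, its methods and the repository names are all made up. The pruning thread waits on a `threading.Event` that the filling thread only sets once the list is complete; take out the `wait()` and the pruner can run against a partly filled list and throw away repositories that should stay enabled.

```python
import threading


class RepoState:
    """Toy model of a frontend/backend repository list (not anaconda's real code)."""

    def __init__(self, backend_repos):
        self.backend = list(backend_repos)
        self.frontend = []
        self.populated = threading.Event()

    def backend_fill(self):
        # The backend copies its repositories into the frontend's list...
        for repo in self.backend:
            self.frontend.append(repo)
        # ...and only then announces that the list is complete.
        self.populated.set()

    def frontend_prune(self):
        # Without this wait, pruning can see a half-filled list and
        # remove repositories that are supposed to stay enabled.
        self.populated.wait()
        self.backend = [r for r in self.backend if r in self.frontend]


state = RepoState(["fedora", "updates", "school-extras"])
pruner = threading.Thread(target=state.frontend_prune)
filler = threading.Thread(target=state.backend_fill)
pruner.start()  # started first on purpose, to give the race every chance
filler.start()
pruner.join()
filler.join()
print(state.backend)  # prints ['fedora', 'updates', 'school-extras']
```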

So most of our computers have now been upgraded to Fedora 21 and the reaction from our students has been positive. Now to get some Fedora 22 test systems built…

Us Versus Them

LEGO fire being put out by team

Teamwork

I was reading the backlog of the Fedora development mailing list and came across a post in which Richard Hughes made a very interesting comment:

I know lots of Red Hat developers worn down by the low-level harassment on this mailing list, so much so, that they just stop pushing the boundaries and go work on something else cool, e.g. ChromeOS.

I’ve been following this particular mailing list for many years, and the sad thing is, I think he’s right. There’s this underlying current of “us versus them” that can pop up, especially in longer-running threads, and “them” is someone with an @redhat.com email address.

On some levels this makes sense. Red Hat is the single largest entity in Fedora and many (if not most) of the movers and shakers in Fedora are Red Hat employees. A quick glance at the Fedora 21 System Wide Changes shows many more Red Hat employees than not among the change owners. Is it any wonder that individual contributors can feel a bit like a sailboat in the way of an aircraft carrier?

So, is this some conspiracy to keep Fedora under Red Hat control? Is it something we should fight against? Or is there a reasonable explanation for Red Hat’s influence?

First off, there’s the question of whether people are hired at Red Hat to work on Fedora or whether they’re hired because of their work on Fedora. I had the opportunity at DevConf earlier this year to sit down with Patrick Uiterwijk, who did most of the work on Fedora’s OpenID provider and was then hired by Red Hat because of that work. Patrick’s is not the only story like that. While not all competent Fedora contributors are Red Hat employees, Red Hat employees who contribute to Fedora are generally pretty darn competent, and competency in Fedora is rewarded with influence.

There’s also the fact that Red Hat pays people to work on Fedora. Many individual contributors are working on Fedora in their spare time. While this doesn’t necessarily affect the quality of their work, it does tend to affect the quantity. To give an example, at DevConf, I also talked with Stephen Gallagher about joining the Fedora Server working group. After DevConf, I signed up for the mailing list and then did… nothing. I’m the sysadmin and a teacher at my school, and at home I’m a husband and father of four children under six. While I have great intentions of helping out with the Server working group, it’s just not high enough on my list of priorities for me to have the time… and I suspect I’m not the only individual contributor in that boat.

Finally, there’s the fact that Red Hat’s employees actually get to know each other, at least to some extent. One of the big things I’ve learned in my years working here in Lebanon is the importance of relationships. It’s a lot easier to work with someone after you’ve sat down with them, had a coffee (or, in my case, a Coke) and chatted. This was the main reason I enjoyed DevConf and one reason I really wish I could make it to one of the Flock conferences.

So where does this leave us? Red Hat does have a large influence on Fedora. It’s not a conspiracy, it’s life, and attacking Red Hat employees because of the company’s influence is counterproductive.

So, going back to Richard’s original message, we need to stop tearing each other down. When people speak, let’s assume good faith, and not assume that any ideas we disagree with will spell the end of Fedora, Linux or the world as we know it. Most of all, we need to make a conscious choice to value each other, even when we disagree.

Have a great 2015!

Using FreeIPA as a backend for DHCP

 

Yeah, this…

Disclaimer: This is not an official guide and in no way represents best practices for FreeIPA. It is ugly and involves the digital equivalent of bashing on screws with a hammer. Having said that, when nobody has invented the right screwdriver yet, sometimes you just have to hammer away.

First, some history. We’ve been running separate DHCP, DNS and LDAP servers since we switched from static IP addresses and a Windows NT domain somewhere around ten years ago. The DHCP server was loosely connected with the DNS server, and I had written this beautifully complex (read: messily unreadable) script that would allow you to quickly add a system to both DHCP and DNS. A few months ago, we migrated all of our users over to FreeIPA, and I started the process of migrating our DNS database over. Unfortunately, this meant that our DHCP fixed addresses were being configured separately from our DNS entries.

Last week I investigated what it would take to integrate our DHCP leases into FreeIPA. First I checked on the web to see if something like this had already been written, but the closest thing I could find was a link to a design page for a feature that’s due to appear in FreeIPA 4.x.

So here’s my (admittedly hacky) contribution:

  1. sync_dhcp – A bash script (put in /srv, chmod +x) that constantly checks whether the DNS zone’s serial number has changed, and, if it has, runs…
  2. generate_dhcp.py – A python script (put in /srv, chmod +x) that regenerates a list of fixed-addresses in /etc/dhcp/hosts.conf
  3. dhcpd.conf – A sample dhcpd.conf (put in /etc/dhcp) that uses the list generated by generate_dhcp.py
  4. sync-dhcp.service – A systemd service (put in /etc/systemd/system) to run sync_dhcp on bootup
  5. make_dns – A script (chmod +x) that allows the sysadmin to easily add new DNS entries with a MAC address
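To give an idea of what generate_dhcp.py produces, here's a minimal sketch of the rendering step only, not the script itself. It assumes the host name, MAC address and IP have already been fetched from FreeIPA (how the MAC is stored there isn't covered here), and `HostRecord` and `dhcpd_hosts` are hypothetical names. The output uses ISC dhcpd's standard `host` declaration syntax, which dhcpd.conf can pull in with `include "/etc/dhcp/hosts.conf";`.

```python
from typing import Iterable, NamedTuple


class HostRecord(NamedTuple):
    hostname: str
    mac: str
    ip: str


def dhcpd_hosts(records: Iterable[HostRecord]) -> str:
    """Render ISC dhcpd 'host' declarations for fixed addresses."""
    blocks = []
    # Sort so regenerating an unchanged zone produces an identical file.
    for rec in sorted(records, key=lambda r: r.hostname):
        blocks.append(
            f"host {rec.hostname} {{\n"
            f"  hardware ethernet {rec.mac.lower()};\n"
            f"  fixed-address {rec.ip};\n"
            f"}}\n"
        )
    return "".join(blocks)
```

Writing the result to /etc/dhcp/hosts.conf and restarting dhcpd completes the cycle.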

sync_dhcp does need to know your domain so it knows which DNS zone serial to check, but other than that, the first four files should work with little or no modification. You will need to create a dnsserver user in FreeIPA, give the user read access to DNS entries, and put its password in /etc/dhcp/dnspasswd (readable only by root).
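sync_dhcp itself is a bash script; the same polling idea, sketched in Python, looks something like this. Reading the serial over DNS with `dig +short SOA` (whose third field is the zone serial) is an assumption, not necessarily how the real script does it, and the 30-second interval and function names are placeholders. The callback is where generate_dhcp.py would be run and dhcpd restarted.

```python
import subprocess
import time


def parse_soa_serial(soa_output):
    """Pull the serial (third field) out of `dig +short SOA <zone>` output."""
    return int(soa_output.split()[2])


def dns_zone_serial(zone):
    """Ask DNS for the zone's SOA record; assumes `dig` is installed."""
    result = subprocess.run(["dig", "+short", "SOA", zone],
                            capture_output=True, text=True, check=True)
    return parse_soa_serial(result.stdout)


def watch_zone(get_serial, on_change, interval_s=30.0, max_checks=None):
    """Call on_change(serial) at startup and whenever the serial changes."""
    last = None
    checks = 0
    while max_checks is None or checks < max_checks:
        serial = get_serial()
        if serial != last:
            on_change(serial)
            last = serial
        checks += 1
        time.sleep(interval_s)
```

Run as `watch_zone(lambda: dns_zone_serial("example.lan"), regenerate)`, where `regenerate` is your own (hypothetical) function, it would sit in the background much like the systemd service does.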

make_dns makes a number of assumptions that are true of our network, but may not be true of yours. It first assumes that you’re using a 10.10.0.0/16 network (yes, I know that’s not right; it’s a long story) and that 10.10.9.x and 10.10.10.x IPs are for unrecognized systems. It also requires that you’ve installed freeipa-admintools and run kinit for a user with permissions to change DNS entries, as it’s basically just a fancy wrapper around the IPA CLI tools.
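Here's a rough sketch of the address-picking part of a make_dns-style script, using Python's ipaddress module and the network layout described above. Handing out the lowest free address is my assumption, not necessarily what make_dns does, and the helper names are hypothetical. `ipa dnsrecord-add` with `--a-rec` is the IPA CLI command for adding an A record, but check `ipa help dnsrecord-add` on your version.

```python
import ipaddress

NETWORK = ipaddress.ip_network("10.10.0.0/16")
# 10.10.9.x and 10.10.10.x are left for unrecognized systems.
RESERVED = [ipaddress.ip_network("10.10.9.0/24"),
            ipaddress.ip_network("10.10.10.0/24")]


def next_free_ip(used, network=NETWORK, reserved=RESERVED):
    """Return the lowest host address in network that is neither used nor reserved."""
    taken = {ipaddress.ip_address(u) for u in used}
    for addr in network.hosts():
        if addr in taken or any(addr in r for r in reserved):
            continue
        return str(addr)
    return None  # the network is full


def ipa_add_a_record_command(zone, name, ip):
    """argv for adding an A record with the IPA CLI (needs a valid kinit ticket)."""
    return ["ipa", "dnsrecord-add", zone, name, f"--a-rec={ip}"]
```

In practice the `used` list would come from the existing DNS records, and on a real network you would probably also want to skip the gateway and other infrastructure addresses.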

Bent Screw Hole Backyard Metal Macros by Steven Depolo used under a CC BY 2.0 license