While writing my treatise on the naming scheme I’ve adopted, a few funny stories came to mind. The first of them is the fun I had with elspeth and a dead motherboard.
Technical support: hard to do?
One of my major gripes with all telephone technical support services is the fact that there’s no geek-friendly support. Even if you are moderately technologically savvy, you have to put up with tech support who are trained to deal with people who merely need to power cycle their system, or check their cable. So, every time I have to call any technical support, I clear a few hours from my day, enter my calm place, and hope that I don’t want to kill someone afterwards.
Well, almost all. There’s one company that this doesn’t apply to, and that’s Apple.
AppleCare Technical Support are, by far, the most competent technicians I’ve ever reached by phone, although, considering the state of the rest of the industry, that’s not a hard thing to achieve. Of the dozen or so times I’ve had to call Apple’s support team over the nearly three years I used elspeth continuously, I’ve only once had what I’d call a bad experience.
elspeth starts misbehaving
So I got home after a day at school, plugged in all the assorted cables that get plugged in – power, networking, USB, sound, miniDisplayPort – and wandered off for food. By the time I got back, elspeth was unresponsive.
Ugh. I rebooted.
A quick post-mortem showed nothing that made me suspect anything other than a panic, which is somewhat unusual, but not especially so. Unlike on FreeBSD, it’s not always straightforward to determine whether the kernel, xnu, has panic’d, after it has done so. Sometimes the OS notices that it panic’d, but that’s quite rare.
Then it did it again, as it tried to recover my session. This time, I got a good look at the system when it died, and what I saw worried me significantly: neither screen was being refreshed properly, and there was digital signal corruption noise apparent on both the internal and external displays.
So I rebooted, and it failed again, only worse. Both displays looked to have missed their vertical timing points, and were showing all manner of horrible flickering and appalling line corruption.
That sounded like memory to me, so I reseated it[1], and got even more problems: it wouldn’t even get to a login screen, just sat at a wall of KMS’d nastiness.
So I called Apple.
The only design flaw of the MacBookPro8,2
In my opinion, there’s only one serious design flaw in the MacBook Pro 8,2 (Early 2011). The MacBookPro8,2 is a dual-GPU system, with Intel HD Graphics 3000 and the Radeon HD 6750M sharing the load, and so you have an Intel Sandy Bridge series processor (an i7-2820QM, apparently), which has a TDP of 45 watts, on the same heat-pipe as an AMD Radeon HD 6750M, possibly the most power-hungry and hottest-running GPU I’ve ever had the displeasure to use.
It’s probably obvious, now, that my discrete GPU had failed.
The first time I called Apple, I was advised that it was clearly a software fault. Eventually, I rebooted it enough that I got to a login prompt, and was advised to boot in “Safe Mode” (something I never knew existed, and all it seems to do is blow away any session restore data permanently and break things spectacularly) by holding a shift key (I forget which). That didn’t help.
So I called Apple back, and said, yeah, this is clearly a hardware fault. “Nah,” said a different support droid, “it has to be a software issue. Reinstall OS X and give us a call back.”
So I called them back, quoted my case ID and serial number, and pointed out that I knew it was a hardware issue, I knew it wasn’t related to OS X because I had the issue before OS X booted, and thus was booked a Genius Bar appointment.
The Genius Bar
I’ve had fun experiences with Apple’s “Genius” bar. Almost every time I’ve been there, I’ve been able to diagnose faults of people near me faster than the technicians could – they needed to “take it out the back for a technician to look at”, and I could quite clearly see that, yes, their realtime clock battery had died[2], or yes, their hard disk had died irrevocably[3].
The other thing that usually happens when I’m at a Genius bar is that people look at my screen, usually littered with multiple xterm windows in delightful reverse video, and get scared.[4] I don’t know why.
So, because there was only one Apple Store in Sydney with an available appointment time in the immediate future, I hiked out to Castle Hill, on Sydney’s outer north-west, for a 9 AM appointment on Friday. It took just over an hour and a half to get there.
I showed the Geniuses the fault, they ran diagnostics, and came up with nothing (which speaks volumes about Apple’s “diagnostics”). They said they’d run a soak test, and they’d let me know when it was resolved.
Trekked to work, then home, devoid of a Mac. I’d borrowed Peter’s iPad for the day, and typing on an iPad is nowhere near as nice as typing on a real keyboard.
Trek 2: Trek Harder
On Saturday, I got a message: my Mac was ready for collection and its fault had been resolved. That was surprising; I expected that the hardware fault I’d seen couldn’t be cured by anything other than a hardware replacement, and the person who called me had no information about what had transpired in that case.
So I made arrangements to get out there on Monday morning, bright and early, to pick up my Mac.
I should also mention: this is the first time I’d been on a modern double-decker bus, and barrelling up one of Sydney’s major arterial roads, in front top centre, is a phenomenal experience, especially heading into tunnels with only a handful of centimetres of clearance above the bus. So I arrived in a good mood.
That dissipated as soon as a Genius came out with elspeth. According to the case notes, they reseated the memory and the problem went away.
At this point, I got angry, and pointed out that there was no way that they could possibly have resolved the issue I’d seen by simply reseating the memory. It wasn’t a memory fault, and it most definitely was not because it was “third party” memory (they spent a long time bitching and whining about that); it was a fundamental hardware fault.
I demanded that they bring out a test-bench display and run a graphics soak test in front of me. They did so, presumably just to humour me, and it immediately showed exactly the same fault I reported.
How do you “soak-test” hardware, and not find a critical hardware fault which makes all displays strobe, lose sync, lose refresh, start displaying garbage, and then panic?
I demanded a 24-hour turnaround on a resolution under warranty, and stormed out, grumbling loudly about incompetent technicians who clearly have no clue.
Trek the Third
Hiked back out to Castle Towers. By this stage, I was not only used to getting to, and around, that incredibly badly laid-out shopping hell, but was getting recognised by the staff there. And atop that, I wasn’t in a good mood.
A new support droid came out with the laptop, and advised that, because I’d dropped it on its corner and that was classified as damage, the warranty may not hold.
Angrily argued my way out of that, pointed out that the fault was entirely a design flaw, and noted that the drop couldn’t’ve damaged the dGPU or its support infrastructure. And so I got elspeth back with a new motherboard, and finally (finally!) the hellish festival of failure finished.
[1] In hindsight, I probably didn’t seat the memory correctly, and that’s probably why the “Geniuses” passed it off as a memory seating issue. Nonetheless.
[2] I overheard the following exchange between a guy, probably in his 40s, and a “Genius”:
guy to Genius: “So every time I start my MacBook the clock resets to January the first, 1970 and I don’t know why.”
Genius: “I… wow, I’ve never seen anything like that before.”
me: “Your realtime clock is faulty, most likely because the clock battery is flat. Modern UNIX-like systems count time in seconds since midnight UTC, January first, 1970, and if it resets to then, its realtime clock isn’t storing the time correctly, or the OS isn’t reading that value.”
guy: “So what do I do to fix it?”
me: “Under most circumstances, that should automatically get reset by the network time service within a few minutes of the system starting. However, in some versions of OS X, if the time difference is too big, the system won’t slew the clock that far, and so you may need to set it manually.”
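For the curious, here is a minimal Python sketch of why a forgotten clock lands on that particular date. The helper names `epoch_to_utc` and `clock_looks_reset`, and the cut-off year of 2011, are my own illustrative assumptions, not anything OS X itself does.

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds: int) -> datetime:
    """Convert seconds since the Unix epoch to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def clock_looks_reset(now: datetime, earliest_plausible_year: int = 2011) -> bool:
    """Hypothetical sanity check: a date before the machine could have
    existed suggests the realtime clock lost its stored value."""
    return now.year < earliest_plausible_year


# A realtime clock that has forgotten the time typically reads as zero seconds.
print(epoch_to_utc(0).isoformat())          # 1970-01-01T00:00:00+00:00
print(clock_looks_reset(epoch_to_utc(0)))   # True
```

A counter of zero is the epoch itself, which is why the guy's MacBook kept waking up on New Year's Day, 1970.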
[3] I overheard the following exchange between an elderly lady with a first-generation MacBook Air (the ones with PATA spinning rust) and a “Genius”:
lady: “My Mac isn’t booting.”
The Genius boots it, and it panic’s on boot. I read the panic message over his shoulder.
me: “Boot it in single-user mode and check you can read the first 1024 sectors of the disk.”
Genius: “… ?”
Me to the lady: “May I?”
I boot the laptop in single-user mode, noting that it’s spewing I/O errors when I try to do anything. I turn to the genius. “Disk’s cooked.” And it’s true – Ayden Graham had this issue, and I unfortunately couldn’t get the filesystem sufficiently consistent to recover data. Yay, catastrophic disk failures.
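The “read the first 1024 sectors” check can be sketched in Python roughly as follows. This is not what I typed at the single-user prompt that day; the function name `first_unreadable_sector`, the 512-byte sector size, and the device path in the comment are all assumptions for illustration, and reading a raw disk device generally needs root.

```python
import os

SECTOR_SIZE = 512  # assumed; many newer disks use 4096-byte sectors


def first_unreadable_sector(path, sectors=1024, sector_size=SECTOR_SIZE):
    """Return the index of the first sector that cannot be read in full,
    or None if all requested sectors were read successfully."""
    fd = os.open(path, os.O_RDONLY)
    try:
        for index in range(sectors):
            try:
                data = os.pread(fd, sector_size, index * sector_size)
            except OSError:
                # The kernel reported an I/O error for this sector.
                return index
            if len(data) < sector_size:
                # Short read: the device or file ended early.
                return index
        return None
    finally:
        os.close(fd)


# Hypothetical usage; the device path is an assumption and varies by system:
#   bad = first_unreadable_sector("/dev/rdisk0")
#   print("disk's cooked" if bad is not None else "first 1024 sectors read fine")
```

If the very start of the disk throws I/O errors, the partition map and filesystem metadata living there are probably unreadable too, which is why it makes a quick first test.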
[4] This was especially evident on what I expect to be my last visit to an Apple Genius bar, diagnosing a battery fault in my Mac. I had jaenelle with me, and I was trying to get some work done; I run i3 on it, so I typically have at least one or two full-screen xterms or Emacs frames open. The number of people who looked at jaenelle’s screen and edged away from me was too damn high.