Spectre and Meltdown explained

Anton Gostev wrote a wonderful article on the Spectre and Meltdown attacks in his weekly forums digest.

I will repost it here for those who haven’t subscribed yet.

Veeam Community Forums Digest January 1 – January 7, 2018

By now, most of you
have probably already heard of the biggest disaster in the history of IT –
Meltdown and Spectre security vulnerabilities which affect all modern CPUs,
from those in desktops and servers, to ones found in smartphones.
Unfortunately, there’s much confusion about the level of threat we’re
dealing with here, because some of the impacted vendors need reasons to
explain the still-missing security patches. But even those who did release
a patch, avoid mentioning that it only partially addresses the threat. And,
there’s no good explanation of these vulnerabilities on the right level
(not for developers), something that just about anyone working in IT could
understand to make their own conclusion. So, I decided to give it a shot
and deliver just that.

First, some essential background. Both
vulnerabilities leverage the „speculative execution“ feature, which is
central to the modern CPU architecture. Without this, processors would idle
most of the time, just waiting to receive I/O results from various
peripheral devices, which are all at least 10x slower than processors. For
example, RAM – kind of the fastest thing out there in our mind – runs at
comparable frequencies with CPU, but all overclocking enthusiasts know that
RAM I/O involves multiple stages, each taking multiple CPU cycles. And hard
disks are at least a hundred times slower than RAM. So, instead of waiting
for the real result of some IF clause to be calculated, the processor
assumes the most probable result, and continues the execution according to
the assumed result. Then, many cycles later, when the actual result of said
IF is known, if it was „guessed“ right – then we’re already way ahead in
the program code execution path, and didn’t just waste all those cycles
waiting for the I/O operation to complete. However, if it appears that the
assumption was incorrect – then, the execution state of that „parallel
universe“ is simply discarded, and program execution is restarted back from
said IF clause (as if speculative execution did not exist). But, since
those prediction algorithms are pretty smart and polished, more often than
not the guesses are right, which adds a significant boost to execution
performance for some software. Speculative execution is a feature that
processors have had for two decades now, which is also why any CPU that is still
able to run these days is affected.

Now, while the two
vulnerabilities are distinctly different, they share one thing in common –
and that is, they both defeat the cornerstone of computer security,
namely process isolation. Basically, the security of all
operating systems and software is completely dependent on the native
ability of CPUs to ensure complete process isolation, meaning that processes
are unable to access each other’s memory. How exactly is such isolation achieved?
Instead of having direct physical RAM access, all processes operate in
virtual address spaces, which are mapped to physical RAM in the way that
they do not overlap. These memory allocations are performed and controlled
in hardware, in the so-called Memory Management Unit (MMU) of the CPU.

At this point, you already know enough to understand Meltdown. This
vulnerability is basically a bug in MMU logic, and is caused by skipping
address checks during the speculative execution (rumors are, there’s the
source code comment saying this was done „not to break optimizations“). So,
how can this vulnerability be exploited? Pretty easily, in fact. First, the
malicious code should trick a processor into the speculative execution
path, and from there, perform an unrestricted read of another process’s
memory. Simple as that. Now, you may rightfully wonder, wouldn’t the
results obtained from such a speculative execution be discarded completely,
as soon as CPU finds out it „took a wrong turn“? You’re absolutely correct,
they are in fact discarded… with one exception – they will remain in the
CPU cache, which is a completely dumb thing that just caches everything CPU
accesses. And, while no process can read the content of the CPU cache
directly, there’s a technique of how you can „read“ one implicitly by doing
legitimate RAM reads within your process, and measuring the response times
(anything stored in the CPU cache will obviously be served much faster).
You may have already heard that browser vendors are currently busy
releasing patches that make JavaScript timers more „coarse“ – now you know
why (but more on this later).

As far as the impact goes,
Meltdown is limited to Intel and ARM processors only, with AMD CPUs
unaffected. But for Intel, Meltdown is extremely nasty, because it is so
easy to exploit – one of our enthusiasts compiled the exploit literally
over a morning coffee, and confirmed it works on every single computer he
had access to (in his case, most are Linux-based). And the possibilities
Meltdown opens are truly terrifying. For example, how about obtaining an admin password as it is being
typed in another process running on the same OS? Or accessing your
precious bitcoin wallet? Of course, you’ll say that the exploit must first
be delivered to the attacked computer and executed there – which is fair,
but here’s the catch: JavaScript from some web site running in your browser
will do just fine too, so the delivery part is the easiest for now. By the
way, keep in mind that those 3rd party ads displayed on legitimate web
sites often include JavaScript too – so it’s really a good idea to install
an ad blocker now, if you haven’t already! And for those using Chrome, enabling Site Isolation
is also a good idea.

OK, so let’s switch to Spectre
next. This vulnerability is known to affect all modern CPUs, albeit to a
different extent. It is not based on a bug per se, but rather on a design
peculiarity of the execution path prediction logic, which is implemented by
the so-called Branch Prediction Unit (BPU). Essentially, what the BPU does is
accumulate statistics to estimate the probability of IF clause results.
For example, if certain IF clause that compares some variable to zero
returned FALSE 100 times in a row, you can predict with high probability
that the clause will return FALSE when called for the 101st time, and
speculatively move along the corresponding code execution branch even
without having to load the actual variable. Makes perfect sense, right?
However, the problem here is that while collecting these statistics, the BPU
does NOT distinguish between different processes for added „learning“
effectiveness – which makes sense too, because computer programs share much
in common (common algorithms, constructs implementation best practices and
so on). And this is exactly what the exploit is based on: this peculiarity
allows the malicious code to basically „train“ BPU by running a construct
that is identical to one in the attacked process hundreds of times,
effectively enabling it to control speculative execution of the attacked process once it
hits its own respective construct, making the victim dump „good stuff“ into the
CPU cache. Pretty awesome find, right?

But here comes the major
difference between Meltdown and Spectre, which significantly complicates
the implementation of Spectre-based exploits. While Meltdown can „scan“ the CPU cache
directly (since the sought-after value was put there from within the scope
of process running the Meltdown exploit), in case of Spectre it is the
victim process itself that puts this value into the CPU cache. Thus, only
the victim process itself is able to perform that timing-based CPU cache
„scan“. Luckily for hackers, we live in the API-first world, where every
decent app has an API you can call to make it do the things you need, again
measuring how long the execution of each API call took. However, getting
the actual value requires deep analysis of the specific application, so
this approach is only worth pursuing with open-source apps. But the
„beauty“ of Spectre is that apparently, there are many ways to make the
victim process leak its data to the CPU cache through speculative execution
in the way that allows the attacking process to „pick it up“. Google
engineers found and documented a few, but unfortunately many more are
expected to exist. Who will find them first?
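
The Spectre mechanics described above can be played through in a small toy simulation. The Python sketch below is not an exploit and touches no real CPU state: the branch predictor, the cache and the latencies are all simulated, and every name in it (ToyCPU, Victim, timed_api_call, the latency constants, the 0x2A secret) is a hypothetical stand-in. It only illustrates the three steps: train the shared predictor, let the victim speculatively touch data it should not, then recover the value by timing the victim's API calls.

```python
# Toy model only: the predictor, cache and latencies are simulated,
# and every name here is hypothetical. No real CPU state is touched.
CACHE_HIT_NS = 10    # assumed latency of a read served from the cache
CACHE_MISS_NS = 200  # assumed latency of a read served from RAM
PROBE_BASE = 1000    # hypothetical base address of the victim's lookup table


class ToyCPU:
    def __init__(self):
        self.taken = 0      # branch statistics, shared by all processes
        self.not_taken = 0
        self.cache = set()  # addresses currently held in the cache

    def predict_taken(self):
        return self.taken > self.not_taken

    def record_branch(self, taken):
        if taken:
            self.taken += 1
        else:
            self.not_taken += 1

    def read_latency(self, address):
        """Simulated timed read: fast when cached, slow otherwise."""
        latency = CACHE_HIT_NS if address in self.cache else CACHE_MISS_NS
        self.cache.add(address)
        return latency


class Victim:
    def __init__(self, cpu, secret_byte):
        self.cpu = cpu
        self.public_len = 4
        # Public data followed directly by the secret in "memory".
        self.memory = [1, 2, 3, 4, secret_byte]

    def lookup(self, index):
        """Bounds-checked read whose branch may be executed speculatively."""
        in_bounds = index < self.public_len
        result = None
        if in_bounds or self.cpu.predict_taken():
            value = self.memory[index]              # may be a speculative read
            self.cpu.cache.add(PROBE_BASE + value)  # footprint survives rollback
            if in_bounds:
                result = value                      # only real results are kept
        self.cpu.record_branch(in_bounds)
        return result

    def timed_api_call(self, value):
        """Public API; the caller only observes how long the call took."""
        return self.cpu.read_latency(PROBE_BASE + value)


def attack(cpu, victim):
    # 1. Train the shared predictor with an identical construct.
    for _ in range(100):
        cpu.record_branch(True)
    # 2. Start from a cold cache, then trigger the out-of-bounds lookup.
    cpu.cache.clear()
    returned = victim.lookup(4)
    # 3. Time the API for every possible byte; the fast one is the secret.
    timings = {guess: victim.timed_api_call(guess) for guess in range(256)}
    return returned, min(timings, key=timings.get)


cpu = ToyCPU()
direct, recovered = attack(cpu, Victim(cpu, secret_byte=0x2A))
print(direct, hex(recovered))
```

Running it prints `None 0x2a`: the out-of-bounds lookup returns nothing to the caller, yet the timing scan still recovers the secret byte from the cache footprint. Real attacks have to deal with noisy timers, cache eviction and far smarter predictors, which is where the difficulty lies.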

Of course, all of
that only sounds easy at a conceptual level – while implementations with
the real-world apps are extremely complex, and when I say „extremely“ I
really mean that. For example, Google engineers created a Spectre exploit
POC that, running inside a KVM guest, can read host kernel memory at a rate
of over 1500 bytes/second. However, before the attack can be performed, the
exploit requires initialization that takes 30 minutes! So clearly, there’s
a lot of math involved there. But if Google engineers could do that,
hackers will be able to as well – because looking at how advanced some of the
ransomware we saw last year was, one might wonder if it was written by
folks to whom Google could not offer the salary or the position they wanted.
It’s also worth mentioning here that a JavaScript-based POC also exists
already, making the browser a viable attack vector for Spectre.
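
To put those POC numbers into perspective, here is some quick arithmetic in Python. It uses only the two figures quoted above (30 minutes of initialization, roughly 1500 bytes per second) and assumes, purely for illustration, that the rate stays constant for the whole run; the function name and the example sizes are made up.

```python
INIT_SECONDS = 30 * 60  # initialization time quoted in the text
READ_RATE_BPS = 1500    # read rate quoted in the text ("over 1500 bytes/second")


def seconds_to_read(num_bytes):
    """Initialization plus transfer time, assuming a constant read rate."""
    return INIT_SECONDS + num_bytes / READ_RATE_BPS


# Example sizes are arbitrary illustration values.
for label, size in [("4 KiB page", 4096), ("1 MiB", 1024**2), ("1 GiB", 1024**3)]:
    print(f"{label}: {seconds_to_read(size) / 3600:.2f} hours")
```

Under these assumptions a single page takes about half an hour (almost all of it initialization), while a full gigabyte would take roughly eight days, so an attacker has to know fairly precisely where to look.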

Now, the most important part – what do we do about those vulnerabilities?
Well, it would appear that Intel and Google disclosed the vulnerability to
all major vendors in advance, so by now most have already released patches.
By the way, we really owe a big „thank you“ to all those dev and QC folks
who were working hard on patches while we were celebrating – just imagine
the amount of work and testing required here, when changes are made to the
holy grail of the operating system. Anyway, after reading the above, I hope
you agree that vulnerabilities do not get more critical than these two, so
be sure to install those patches ASAP. And, aside from the most obvious stuff
like your operating systems and hypervisors, be sure not to overlook any
storage, network and other appliances – as they all run on some OS that too
needs to be patched against these vulnerabilities. And don’t forget your
smartphones! By the way, here’s one good community tracker for
all security bulletins (Microsoft is not listed there, but they did push
the corresponding emergency update to
Windows Update back on January 3rd).

Having said that, there are
a couple of important things you should keep in mind about those patches.
First, they do come with a performance impact. Again, some folks will want
you to think that the impact is negligible, but it’s only true for
applications with low I/O activity. Many enterprise apps, by contrast, will
definitely take a big hit – at least, big enough to account for. For
example, installing the patch resulted in an almost 20% performance drop
in the PostgreSQL benchmark. And then, there is this major cloud
service that saw CPU usage double after
installing the patch on one of its servers. This impact is caused by
the patch adding significant overhead to so-called syscalls, which is what computer
programs must use for any interactions with the outside world.
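
Why applications with low I/O activity barely notice the patch while I/O-heavy ones do can be seen with some back-of-the-envelope arithmetic. The Python sketch below assumes a very simple model: only the time spent in syscalls gets slower, by a fixed factor, and everything else is unchanged. The function name, the workload shares and the 1.5x penalty are hypothetical illustration values, not measurements of any real patch.

```python
def relative_runtime(syscall_share, syscall_penalty):
    """Runtime after patching, relative to 1.0 before patching.

    syscall_share:   fraction of the original runtime spent in syscalls (0..1)
    syscall_penalty: assumed factor by which each syscall becomes slower
    """
    if not 0.0 <= syscall_share <= 1.0:
        raise ValueError("syscall_share must be between 0 and 1")
    return (1.0 - syscall_share) + syscall_share * syscall_penalty


# Hypothetical workloads; the 1.5x penalty is an assumption, not a measurement.
for name, share in [("compute-bound app", 0.02), ("database-like app", 0.40)]:
    print(f"{name}: {relative_runtime(share, 1.5):.2f}x runtime")
```

With these made-up numbers the compute-bound app slows down by about 1%, while the database-like app slows down by about 20%. The same per-syscall overhead hurts in proportion to how much of the workload goes through the kernel.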

Last but not least, do know that while those patches fully address
Meltdown, they only address a few currently known attack vectors that Spectre enables. Most
security specialists agree that Spectre vulnerability opens a whole slew of
„opportunities“ for hackers, and that the solid fix can only be delivered
in CPU hardware. Which in turn probably means at least two years until
the first such processor appears – and then a few more years until you replace
the last impacted CPU. But until that happens, it sounds like we should all
be looking forward to many fun years of jumping on yet another critical
patch against some newly discovered Spectre-based attack. Happy New Year!
Chinese horoscope says 2018 will be the year of the Earth Dog – but my
horoscope tells me it will be the year of the Air Gapped Backup.


