What would 128-bit computing look like?

vis4valentine@lemmy.ml · 1 year ago

mindbleach@lemmy.world · 1 year ago

Gene Amdahl himself was arguing hardware. It was never about writing better software - that’s the lesson we’ve clawed out of it, after generations of reinforcing harmful biases against parallelism.

Telling people a billion cores won’t solve their problem is bad, actually.

Human beings by default think going faster means making each step faster. How you explain that’s wrong is so much more important than explaining that it’s wrong. This approach inevitably leads to saying ‘see, parallelism is a bottleneck.’ If all they hear is that another ten slow cores won’t help but one faster core would - they’re lost.

That’s how we got needless decades of doggedly linear hardware and software. Operating systems that struggled to count to two whole cores. Games that monopolized one core, did audio on another, and left your other six untouched. We still lionize cycle-juggling maniacs like John Carmack and every Atari programmer. The trap people fall into is seeing a modern GPU and wondering how they can sort their flat-shaded triangles sooner.

What you need to teach them, what they need to learn, is that the purpose of having a billion cores isn’t to do one thing faster, it’s to do everything at once. Talking about the linear speed of the whole program is the whole problem.

Vlyn@lemmy.world · 1 year ago

You still don’t get it. This is about algorithmic complexity.

Say you have an algorithm where 90% of the work can be done in parallel and 10% can’t. No matter how many cores you throw at it, be it 4, 10, or a billion, that 10% is the part you can’t speed up with more cores. So even with an unlimited number of cores, your algorithm still has to wait on the last 10%, which runs on a single core.

Amdahl’s law is simply about that 10% you can’t speed up, no matter how many cores you have. It’s a bottleneck.
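To put numbers on the 90/10 example, here is a minimal Python sketch of the usual Amdahl speedup formula, 1 / (s + p/N), where p is the parallel fraction, s = 1 - p is the serial fraction, and N is the core count. The function name is made up for illustration.

```python
def amdahl_speedup(parallel_fraction: float, cores: float) -> float:
    """Speedup over a single core when only `parallel_fraction` of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time regardless of core count;
    # the parallel part is divided evenly across the cores.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    for cores in (4, 10, 1_000_000_000):
        print(f"{cores:>13,} cores -> {amdahl_speedup(0.9, cores):.2f}x")
```

With 90% parallel work this comes out to roughly 3.1x on 4 cores, 5.3x on 10 cores, and just under 10x on a billion cores.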

There are algorithms you can’t run in parallel, simply because the results depend on each other. For example in a cipher where you first calculate block A, then to calculate block B you rely on block A. You can’t do block A and B at the same time, it’s not possible. Yes, you can use multi-threading to calculate A, then do it again to calculate B, but overall you still have waiting times while you wait for each result, which means no matter how fast you get, you always have a minimum time that you’ll need.
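A rough Python sketch of that dependency, assuming a toy "cipher" where SHA-256 stands in for the real block operation (this is not an actual cipher, and `mix`, `chained` and `independent` are names invented for the example). In the chained version each block needs the previous output, so the loop cannot be split across cores. In the independent version every block only needs a fixed starting value, so a thread pool can take them in any order.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import List

IV = bytes(32)  # fixed starting value, an assumption of this sketch


def mix(previous: bytes, block: bytes) -> bytes:
    """Stand-in for one cipher step: the output depends on `previous`."""
    return hashlib.sha256(previous + block).digest()


def chained(blocks: List[bytes]) -> List[bytes]:
    """Block B needs the result of block A, so this loop is inherently serial."""
    out, previous = [], IV
    for block in blocks:
        previous = mix(previous, block)
        out.append(previous)
    return out


def independent(blocks: List[bytes]) -> List[bytes]:
    """Each block only needs the fixed IV, so the blocks can run concurrently."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda block: mix(IV, block), blocks))
```

Changing the first block changes every later output of `chained`, but only the first output of `independent`, which is exactly why the first one cannot be spread over more hardware.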

Throwing more hardware at this won’t help; that’s the entire point. It helps to a certain degree, but at some point the parts you can’t run in parallel will hold you back. This obviously doesn’t apply to workloads that can be done 100% in parallel (like rendering, where you can split the workload up without issues). Amdahl’s law puts no ceiling on those, since the amount of single-core work in the equation would be zero.

The whole thing is used in software development (I heard of Amdahl’s law in my university class) to decide if it makes sense to multi-thread part of the application. If the work you do is too sequential then multi-threading won’t give you much of a benefit (or will even make it run worse, as you have to spin up threads and synchronize results).

mindbleach@lemmy.world · 1 year ago

I am a computer engineer. I get the math.

This is not about the math.

Speeding up a linear program means you’ve already failed. That’s not what parallelism is for. That’s the opposite of how it works.

Parallel design has to be there from the start. But if you tell people adding more cores doesn’t help, unless!, they’re not hearing “unless.” They’re hearing “doesn’t.” So they build shitty programs and bemoan poor performance and turn to parallelism to hurry things up - and wow look at that, it doesn’t help.

I am describing a bias.

I am describing how a bias is reinforced.

That’s not even a corruption of Amdahl’s law, because again, the actual dude named Amdahl was talking to people who wanted to build parallel machines to speed up their shitty linear code. He wasn’t telling them to code better. He was telling them to build different machines.

Building different machines is what we did for thirty or forty years after that. Did we also teach people to make parallelism-friendly programs? Did we fuck. We’re still telling students about “linear portions” as if programs still get entered on a teletype and eventually halt. What should be a 300-level class about optimization is instead thrown at people barely past Hello World.

We tell them a billion processors might get them a 10% speedup. I know what it means. You know what it means. They fucking don’t.

Every student’s introduction to parallelism should be a case where parallelism works. Something graphical, why not. An edge-detect filter that crawls on a monster CPU and flies on a toy GPU. Not some archaic exercise in frustration. Not some how-to for turning two whole cores into a processor and a half. People should be thinking in workloads before they learn what a goddamn pointer is. We betray them, by using a framing of technology that’s older than disco. Amdahl’s law as she is taught is a relic of the mainframe era.

Telling kids about the limits of parallelism before they’ve started relying on it has been an excellent way to ensure they won’t.

Vlyn@lemmy.world · 1 year ago

At this point you’re just arguing to argue. Of course this is about the math.

This is Amdahl’s law, it’s always about the math:

https://upload.wikimedia.org/wikipedia/commons/thumb/e/ea/AmdahlsLaw.svg/1024px-AmdahlsLaw.svg.png

No one is telling students to use or not use parallelism, it depends on the workload. If your workload is highly sequential, multi-threading won’t help you much, no matter how many cores you have. So you might be able to switch out the algorithm and go with a different one that accomplishes the same job. Or you re-order tasks and rethink how you’re using the data you have available.

Practical example: The game Factorio. It has thousands of conveyor belts that have to move items in a deterministic way. So as not to mess things up, this part of the game ran on a single thread to calculate where everything landed (as belts can intersect, items can block each other and so on). With some clever tricks they rebuilt how it works, which allowed them to safely spread the workload over several cores (at least for groups of belts). Bit of a write-up here (under “Multithreaded belts”).

Teaching software development involves teaching the theory. Without that you would have a difficult time deciding what can and what can’t benefit from multi-threading. Absolutely no one says “never multi-thread!” or “always multi-thread!”; if you had a teacher like that then they sucked.

Learning about Amdahl’s law was a tiny part of my university course. A much bigger part was actually multi-threading programs, working around deadlocks, doing performance testing and so on. You’re acting as if the teacher shows you Amdahl’s law and then says “Obviously this means multi-threading isn’t worth it, let’s move on to the next topic”.

mindbleach@lemmy.world · 1 year ago

“The way we teach this relationship causes harm.”

“Well you don’t understand this relationship.”

“I do, and I’m saying: people plainly aren’t getting it, because of how we teach it.”

“Well lemme explain the relationship again–”

Nobody has to tell people not to use parallelism. They just… won’t. In part because of how people tend to think, by default, and in part because of how we teach them to think.

We would have to tell students to use parallelism, if we expect graduates to choose it freely. It’s hard and it’s weird and you can’t just slap it on at the end. It should become what they do first.

I am telling you in some detail how focusing on linear performance, using the language of the nineteen goddamn seventies, doesn’t need to say multi-threading isn’t worth it, to leave people thinking multi-threading isn’t worth it.

Jesus, even calling it “multi-threading” is an obstacle. It makes parallelism sound like some fancy added feature. It’s the version of parallelism that shows up in late-version changelogs, when for some reason performance has become an obstacle.

Vlyn@lemmy.world · 1 year ago

Multi-threading is difficult, you can’t just slap it on everything and call it a day.

There are languages where it’s easier (Go, Rust, …) but parallelism is an advanced feature. Do it wrong and you get race conditions or deadlocks. There is a reason you learn about this later in programming, but you do learn about it (and get to use it).

If we’re being honest, most programmers work on CRUD applications, which are highly sequential and usually wait on IO rather than CPU cycles. Saving 2ms on some operations doesn’t matter if you wait 50ms on the database (and sometimes using more threads is actually slower due to orchestration). If you’re working with highly efficient algorithms or with GPUs then parallelism has a much higher priority. But it always depends on what you’re working with.

Depending on your tech stack you might not even have the option to properly use parallelism, for example with JavaScript (if you don’t jump through hoops).

mindbleach@lemmy.world · 1 year ago

“Here’s all the ways we tell people not to use parallelism.”

I’m sorry, that’s not fair. It’s only a fraction of the ways we tell people not to use parallelism.

Multi-threading is difficult, which is why I said it’s a fucking obstacle. It’s the wrong model. The fact you’d try to “slap it on” is WHAT I AM TALKING ABOUT. You CANNOT just apply more cores to existing linear code. You MUST actively train people to write parallel-friendly code, even if it won’t necessarily run in parallel.

Javascript is a terrible language I work with regularly, and most of the things that should be parallel aren’t - and yet - it has abundant features that should be parallel. It has absorbed elements of functional programming that are excellent practice, even if for some goddamn reason they’re actually executed in-order.

Fetches are single-threaded, in Javascript. I don’t even know how they did that. Grabbing a webpage and then responding to an event using an inline function is somehow more rigidly linear than pre-emptive multitasking in Windows 95. But you should still write the damn things as though they’re going to happen in parallel. You have no control over the order they happen in. That and some caching get you halfway around most locks.
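A small sketch of writing requests as though they complete in any order. It is in Python's asyncio rather than Javascript, but the model is similar: one thread, many requests in flight. `fake_fetch` is a hypothetical stand-in with random latency, not a real network call, and `fetch_all` is likewise just a name for this example.

```python
import asyncio
import random


async def fake_fetch(url: str) -> str:
    """Hypothetical request: sleeps for an unpredictable time, then returns a body."""
    await asyncio.sleep(random.uniform(0.0, 0.05))
    return f"body of {url}"


async def fetch_all(urls):
    results = {}

    async def fetch_into(url):
        # Each completion writes only to its own key, so it does not matter
        # which request finishes first.
        results[url] = await fake_fetch(url)

    await asyncio.gather(*(fetch_into(url) for url in urls))
    return results


if __name__ == "__main__":
    print(asyncio.run(fetch_all(["/a", "/b", "/c"])))
```

Because no handler reads another handler's result, the final dictionary is the same whatever order the responses arrive in.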

Javascript, loathsome relic, also has vector processing. The kind insisted upon by that pedant in the other subthread, who thinks the 512-bit vector units in a modern Intel chip don’t qualify, but the DSP on a Super Nintendo does. Array.forEach and Array.map really fucking ought to be parallelisable. Google could use its digital imperialism to force millions of devs to adopt better standards, just by following the spec and not processing keys in a rigid order. Bad code treating it like a simplified for-loop would break. Good code… wouldn’t.
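To illustrate the good code / bad code split, here is a minimal sketch in Python rather than Javascript. `map_any_order` is a hypothetical map that visits elements back to front, standing in for an engine that is free to pick its own order. (As far as I know, the ECMAScript spec currently describes `map` and `forEach` as visiting indices in ascending order, so this is about how callbacks could be written, not what engines do today.)

```python
def square(x: int) -> int:
    """Pure callback: the result depends only on the argument."""
    return x * x


def make_running_total():
    """Returns a callback that quietly depends on the order it is called in."""
    total = 0

    def add(x: int) -> int:
        nonlocal total
        total += x
        return total

    return add


def map_in_order(fn, items):
    return [fn(x) for x in items]


def map_any_order(fn, items):
    """Visits items back to front, but stores each result at its own index."""
    out = [None] * len(items)
    for i in reversed(range(len(items))):
        out[i] = fn(items[i])
    return out


if __name__ == "__main__":
    data = [1, 2, 3]
    print(map_in_order(square, data), map_any_order(square, data))
    print(map_in_order(make_running_total(), data),
          map_any_order(make_running_total(), data))
```

The pure callback gives `[1, 4, 9]` either way. The stateful one gives `[1, 3, 6]` in order and `[6, 5, 3]` back to front, which is the "simplified for-loop" code that would break.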

We want people to write that kind of code.

Not necessarily code that will run in parallel. Just code that could.

Workload-centric thinking is the only thing that’s going to stop “let’s add a little parallelism, as a treat” from producing months of needless agony. Anything else has to be dissected, warped beyond recognition, and stitched back together, with each step taking more effort than starting over from scratch, and the end result still being slow and unreadable and fragile.

Spedwell@lemmy.world · edit-2 1 year ago

Amdahl’s isn’t the only scaling law in the books.

Gustafson’s scaling law looks at how the hypothetical maximum work a computer could perform scales with parallelism. The idea is that for certain tasks, like simulations (or, to your point, even consumer devices to some extent), which can grow to fully utilize the extra hardware, this is a real improvement.

Amdahl’s takes a fixed program, considers what portion is parallelizable, and tells you the speed up from additional parallelism in your hardware.

One tells you how much a processor might do, the other tells you how fast a program might run. Neither is wrong, but both give an incomplete picture of the colloquial “performance” of a modern device.
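A quick side-by-side sketch in Python, using the textbook forms of both laws. The function names are invented for the example, and feeding both the same fraction is a simplification: strictly, Amdahl's fraction is measured on the fixed single-core workload, while Gustafson's is measured on the scaled parallel run.

```python
def amdahl_speedup(parallel_fraction: float, cores: float) -> float:
    """Fixed problem size: how much faster does the same program finish?"""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def gustafson_speedup(parallel_fraction: float, cores: float) -> float:
    """Fixed run time: how much more work fits when the problem grows with the cores?"""
    serial_fraction = 1.0 - parallel_fraction
    return serial_fraction + parallel_fraction * cores


if __name__ == "__main__":
    for cores in (10, 100, 1000):
        print(f"{cores:>5} cores: "
              f"Amdahl {amdahl_speedup(0.9, cores):6.2f}x, "
              f"Gustafson {gustafson_speedup(0.9, cores):7.1f}x")
```

With a 0.9 parallel fraction and 100 cores, the fixed-size speedup is about 9.2x while the scaled speedup is about 90.1x, which is the gap between "how fast a program might run" and "how much a processor might do".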

Amdahl’s is the one you find emphasized by a Comp Arch 101 course, because it corrects the intuitive error of assuming you can double the cores and get half the runtime. I only encountered Gustafson’s law in a high performance architecture course, and it really only holds for certain types of workloads.