Optimising Code - Computerphile

Published 2023-12-07
You can optimise for speed, power consumption or memory use, and tiny changes can have a negligible or a huge impact. But what should you optimise, and most importantly, when? Dr Steve Bagley has an example!

www.facebook.com/computerphile
twitter.com/computer_phile

This video was filmed and edited by Sean Riley.

Computer Science at the University of Nottingham: bit.ly/nottscomputer

Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharanblog.com/

Thank you to Jane Street for their support of this channel. Learn more: www.janestreet.com/

All Comments (21)
  • @bentpen2805
    It’s always cool to compile two versions of the same algorithm to assembly, and see they’re identical
  • @mausmalone
    One fun thing in optimization is the work that Kaze Emanuar is doing on the N64. To make a long story short: he's optimizing for speed, BUT one of the biggest bottlenecks on the N64 is RAM access. The CPU doesn't have direct access to the RAM, so when it requests a read, it first goes to a local cache, and if that page of memory isn't cached it has to ask the (essentially) GPU to copy the entire page of RAM into the CPU's local cache. This goes for both data and instructions, which go to separate caches. What he's found is that one of the best ways to get good performance out of the N64 is to get your code to fit into (and be aligned to) a single page of memory. If you can do that, the CPU will hum away happily at 93 MHz (which was a lot for the time!). But if you keep calling functions that are located in different pages, you'll frequently have to stall the CPU to wait for those pages to be moved into cache.
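
The same idea can be expressed on a modern toolchain. Below is a minimal sketch (an editor's illustration, not from the video or the comment): GCC/Clang function attributes keep two hot functions contiguous and aligned so they tend to share the same instruction-cache lines. The section name ".text.hot" and the 64-byte alignment are assumptions for illustration, not anything N64-specific.

/* Keep a hot loop's code in one aligned, contiguous block so it stays
 * resident in the instruction cache.  These attributes are GCC/Clang
 * extensions; the section name ".text.hot" is an assumption. */
#include <stddef.h>
#include <stdint.h>

/* Place the hot helpers next to each other and align the first one to a
 * cache-line-sized boundary (64 bytes here; pick the real line/page size). */
__attribute__((hot, aligned(64), section(".text.hot")))
static uint32_t mix(uint32_t x)
{
    return (x ^ (x >> 13)) * 0x9E3779B1u;    /* cheap hash step */
}

__attribute__((hot, section(".text.hot")))
uint32_t checksum(const uint32_t *data, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc ^= mix(data[i]);                 /* call stays within the hot section */
    return acc;
}
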
  • @bartekkowalski
    To save others ten minutes of their life: at 3:58 the text encoded in the binary is "OK so this isn't strictly speaking the same text as the stuff on the left. Oh well :) :)" and at 5:17 the text encoded in the binary is "Right, are you really converting all this binary back to asci? Well done you! - Sean"
  • @Adeith
    As someone who has to teach juniors to optimize games, these are the types of optimization that are the least important, and it's very rare that you actually bother with them. The part he scoffed at also kinda glossed over a very important point: the reason you don't just write it "right from the beginning" is that you don't only trade off speed, memory and battery against each other, you also trade them off against maintainability, which these kinds of optimization ruin, and maintainability is the thing you should be optimizing for the first time around. Also, by far the most important optimizations are choosing the correct algorithm, data structure and architecture. Those optimizations are almost never premature and can give speed-ups of many orders of magnitude, compared to the micro-optimizations shown that might give 2x at most.
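
To illustrate the comment's point that algorithm choice dwarfs micro-optimisation, here is a small sketch (an editor's example, not from the video): counting equal pairs in an array with a nested O(n^2) loop versus sorting a copy and counting runs in O(n log n). The function names are made up for illustration; for large n the second version wins by orders of magnitude with no assembly in sight.

#include <stdlib.h>
#include <string.h>
#include <stddef.h>

/* O(n^2): count pairs (i, j), i < j, with a[i] == a[j] by comparing every pair. */
size_t equal_pairs_quadratic(const int *a, size_t n)
{
    size_t pairs = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (a[i] == a[j])
                pairs++;
    return pairs;
}

static int cmp_int(const void *x, const void *y)
{
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* O(n log n): sort a copy, then each run of k equal values contributes k*(k-1)/2 pairs. */
size_t equal_pairs_sorted(const int *a, size_t n)
{
    if (n == 0)
        return 0;
    int *tmp = malloc(n * sizeof *tmp);
    if (!tmp)
        return 0;                      /* sketch: error handling kept minimal */
    memcpy(tmp, a, n * sizeof *tmp);
    qsort(tmp, n, sizeof *tmp, cmp_int);

    size_t pairs = 0, run = 1;
    for (size_t i = 1; i <= n; i++) {
        if (i < n && tmp[i] == tmp[i - 1]) {
            run++;                     /* still inside a run of equal values */
        } else {
            pairs += run * (run - 1) / 2;
            run = 1;
        }
    }
    free(tmp);
    return pairs;
}
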
  • @Zullfix
    1:36 The quote "premature optimization is the root of all evil" is severely misused nowadays as an excuse to write very slow or just plain bad code on the first (and often only) pass. The quote was originally meant to discourage developers from inlining assembly inside their C applications before they even had an MVP. But nowadays it's an excuse for developers to write poor, inefficient code that becomes the architecture of the application, making meaningful changes require a large refactoring. If you can write good, fast code from the start without going overboard into inlined assembly, what excuse do you have not to?
  • @brunoramey50
    - Optimise for CPU speed
    - Optimise for memory usage
    - Optimise for power consumption
    - Optimise for maintainability
    - Optimise for developer time
    Make your educated choice!
  • A note on the idea of not optimizing until it's working. For context, I write embedded C for microcontrollers in time-critical systems that have safety requirements (aerospace and such). With that in mind, the software architecture needs to be designed with some degree of optimization in mind, but that aside, I agree: we do not optimize until a functional block is complete. The biggest benefit we get from doing this is readability and maintainability; it is just as important that a piece of code can be understood as it is that it works. I have worked with a lot of legacy code that was written without oversight and focused on hand-coded optimization from the start. The result is code that is hard to follow and is prone to mistakes when it is picked up by another developer.
  • @solhsa
    Modern compilers are pretty insane. Writing benchmarks is difficult because the compiler may realize what you're doing and optimize your payload away.
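
A minimal sketch of that benchmarking trap (an editor's example, not from the video): if the benchmark's result is never used, an optimising compiler is free to delete the work entirely. The empty inline-asm "sink" below is a common GCC/Clang-specific way to force the value to be kept; on other compilers a volatile variable serves the same purpose.

#include <stdint.h>
#include <stdio.h>

static inline void sink(uint64_t v)
{
    /* Tells GCC/Clang the value is "used", without emitting any instructions. */
    __asm__ volatile("" : : "r"(v) : "memory");
}

uint64_t sum_to(uint64_t n)
{
    uint64_t s = 0;
    for (uint64_t i = 0; i < n; i++)
        s += i;               /* at -O2 this loop may be folded or removed if unused */
    return s;
}

int main(void)
{
    for (int rep = 0; rep < 1000; rep++) {
        uint64_t s = sum_to(1u << 20);
        sink(s);              /* without this, the work above can vanish entirely */
    }
    puts("done");
    return 0;
}
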
  • @LunarcomplexMain
    It's also helpful to just make the program work first without thinking about optimization at all, because once it's finished (or whatever part you're working on is), any change you make while trying to optimize can be tested immediately against the code you've already written.
  • @Schnorzel1337
    One huge trick John Carmack taught: if a piece of code is written poorly and you know it, but at the moment the input size is so small that it doesn't matter, give it a nice living comment, for example an assert. Example: you have an array of unsorted numbers and you want to find whether 3 of them add up to zero (the 3-Sum problem). If your first approach takes O(n^3) instead of the optimal O(n^2), that's fine, go ahead. Then put down: assert arr.length <= 10000 : "This algorithm needs a fix, because the input got huge". The result is that if that code never gets overwhelmed, you won't ever notice. But if the code gets called with a huge array, the program stops with an error and you are forced to optimize that code.
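
Here is how that "living comment" might look in C (an editor's sketch, not Carmack's code): a deliberately naive O(n^3) 3-Sum check guarded by an assert, so the program fails loudly in debug builds the moment the input outgrows the assumption. The 10000 threshold and the function name are illustrative.

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if any three distinct elements sum to zero.  Deliberately naive. */
bool has_three_sum_zero(const int *a, size_t n)
{
    /* Living comment: this is O(n^3); rewrite it (sort + two pointers, O(n^2))
     * before anyone calls it with a large array. */
    assert(n <= 10000 && "has_three_sum_zero: input got huge, algorithm needs a fix");

    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            for (size_t k = j + 1; k < n; k++)
                if (a[i] + a[j] + a[k] == 0)
                    return true;
    return false;
}
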
  • @Slarti
    Always optimise for debugging. Someone is going to have to fix your code at some point - always write the code so it is easier to step through and fix it.
  • @rmsgrey
    Just watching the video, I can see two obvious (but mutually incompatible) optimisations (in terms of number of instructions per loop) before trying loop unrolling:
    - rather than counting up until R2 matches R3, copy R3 into R2 before the loop (if you want to preserve the original byte count for some reason) and use a single decrement-and-compare-to-0 instruction rather than the separate increment and comparison instructions;
    - rather than having a separate loop counter at all, calculate what the final value for one of the pointers should be and compare that pointer with that value.
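
Both loop shapes can be sketched in C (an editor's illustration; the video's code is in assembly, and the R2/R3 registers map only loosely onto the variables here):

#include <stddef.h>

/* Variant 1: count down to zero, so the loop test becomes a single
 * decrement-and-branch-if-nonzero rather than an increment plus a compare. */
void copy_countdown(unsigned char *dst, const unsigned char *src, size_t count)
{
    while (count != 0) {
        *dst++ = *src++;
        count--;              /* compilers fold this into one dec-and-branch */
    }
}

/* Variant 2: no counter at all; precompute where the source pointer must end
 * and compare pointers each iteration. */
void copy_to_end(unsigned char *dst, const unsigned char *src, size_t count)
{
    const unsigned char *end = src + count;
    while (src != end)
        *dst++ = *src++;
}
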
  • @johnbennett1465
    I am disappointed that he didn't even mention the memory alignment problem. At best, unaligned access is a significant performance hit. At worst, it fails. I don't know if any current computers have that limitation, but I have worked on computers that required every access to be aligned to the data size.
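
A small sketch of why alignment matters in portable C (an editor's example, not from the video): dereferencing a misaligned pointer is undefined behaviour and genuinely faults on some CPUs, while a byte-wise memcpy lets the compiler emit whatever the target actually allows.

#include <stdint.h>
#include <string.h>

/* Risky: if p is not 4-byte aligned, this may trap or be slow, depending on the CPU. */
uint32_t read_u32_direct(const void *p)
{
    return *(const uint32_t *)p;
}

/* Portable: memcpy lets the compiler pick an unaligned-safe sequence
 * (often a single load on x86, byte loads on stricter architectures). */
uint32_t read_u32_safe(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
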
  • @adamburry
    There was a missed opportunity here. Based on the example, I was expecting you to circle back to the point about making your code correct before optimising it. This code fails for a certain case of overlap between src and dst; it has the potential to overwrite your data before you've had a chance to move it.
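
A sketch of the overlap problem the comment describes, and the usual fix (an editor's example; this is essentially what memmove does, and the function names are made up): a forward byte copy clobbers the source when dst sits inside src, so the copy has to run backwards in that case.

#include <stddef.h>

void copy_forward_only(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];          /* wrong if dst > src and the ranges overlap */
}

void copy_overlap_safe(unsigned char *dst, const unsigned char *src, size_t n)
{
    if (dst < src || dst >= src + n) {
        for (size_t i = 0; i < n; i++)       /* no harmful overlap: copy forward */
            dst[i] = src[i];
    } else {
        for (size_t i = n; i-- > 0; )        /* dst overlaps the tail of src: copy backwards */
            dst[i] = src[i];
    }
}
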
  • Software can be, and most often is, orders of magnitude more complex than it was in 1974 when Knuth wrote that. I would go so far as to say optimization in software means something different now than it did then, since we can do a lot of optimization without writing assembly or messing with the hardware. You could make a wise optimization choice before you even start writing by choosing a package or library with no dependencies, for example. In other words, optimization can't be considered a single step of the process anymore. I can't fathom waiting until I'm close to finishing a project before thinking about efficiency.
  • @AnttiBrax
    Remember kids, you are only allowed to quote Knuth if you are going to profile and optimise your code later. If you don't, it's just an excuse for writing sloppy code.
  • @bigutubefan2738
    Steve Bagley is one of the most underrated people on the Internet.
  • @uuu12343
    I think everyone in the comment section has to remember: this video is meant as a general introduction to the rules and purpose of code optimization, NOT code optimization specifically for C (using GCC, gdb, and assembly). These notes are meant to apply to code optimization in, say, Rust, Go, Python, etc., so while memcpy might make optimization via manual control easier, Python and Rust don't have memcpy. The video uses manual approaches to fully explain the flow, not the tools.
  • Quite a number of really good comments here calling out how this video covers all the wrong things and is off base. Glad to see so many people understand that and are willing to speak up.
  • @fishsayhelo9872
    Speaking of unrolling, one of my favorite C "techniques" for doing so has got to be Duff's device, which makes clever use of some C language features to implement loop unrolling at runtime, similar to what's shown in the video.
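
For reference, here is the classic formulation of Duff's device adapted to a plain byte copy (an editor's sketch, not code from the video): a switch jumps into the middle of an 8-way unrolled do/while, so a count that isn't a multiple of 8 is handled on the first pass through the loop body.

#include <stddef.h>

void duff_copy(unsigned char *dst, const unsigned char *src, size_t count)
{
    if (count == 0)
        return;

    size_t n = (count + 7) / 8;      /* number of passes through the unrolled body */

    switch (count % 8) {             /* jump into the middle of the loop body */
    case 0: do { *dst++ = *src++;
    case 7:      *dst++ = *src++;
    case 6:      *dst++ = *src++;
    case 5:      *dst++ = *src++;
    case 4:      *dst++ = *src++;
    case 3:      *dst++ = *src++;
    case 2:      *dst++ = *src++;
    case 1:      *dst++ = *src++;
            } while (--n > 0);
    }
}
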