Sweat, Focus, and Fantastical

May 31, 2021

Apple’s money and resources are effectively limitless. They have the ability to create any piece of hardware or software they desire to create.

Almost.

I don’t think Apple could have made Fantastical, the calendar application that won Mac App of the Year. Fantastical is a great app, but its greatness does not come from being a revolutionary re-imagining of a calendar. It isn’t that. Its greatness does not come from one specific feature that can’t be found anywhere else either.

Fantastical is great because the people who made it sweat the details.

Flexibits, creators of Fantastical, put in the work. They carefully thought through how users would perform each action they need to perform with a calendar. They designed an interface with smooth, delightful animations, one that is always responsive to input and clearly presents the required information. Every user interaction, every interface detail, every small code change that improved performance: they all add up to Fantastical.

The word that keeps coming to mind with Fantastical is polish. This is what the most polished calendar app you could imagine looks and feels like.

Apple sweats the details on a lot of things. But “polished calendar app” is not going to sell a new iPhone (and thus won’t receive the engineering resources to make it happen). The target they’ve chosen for most of their default software is “good enough.” That leaves an opportunity to build something for users who want more than “good enough.” A subset of users want the best calendar app.

Like Apple, we can’t focus on every detail of every project. But like Flexibits, we can seize opportunities to sweat details that others don’t.

First ask where you can sweat the details. Then ask where you should.

Order Matters

May 8, 2021

I was recently reminded that integer division is…well, integer division.

Local variables, a globally defined constant, and function parameters (all holding integer values) made this problem more difficult to see in my case, but here’s a simple example:

#include <stdio.h>

int main() {
    int result = 1000 / 10 * 5;
    int reordered = 5 / 10 * 1000;
    printf("result is %d\n", result);
    printf("reordered is %d\n", reordered);
    return 0;
}

Typing the result and reordered lines into a calculator will give you an identical result: 500. It’s easy, especially during a quick code review, to ignore the types, mentally execute both lines and conclude they are functionally identical. So easy in fact, that one might forget that with integer division order matters:

result is 500
reordered is 0

Remember, both explicit and implicit values are integers in such calculations.

// int result = 1000 / 10 * 5;
// ((1000 / 10) * 5)
// ((100) * 5)
// (500)

// int reordered = 5 / 10 * 1000;
// ((5 / 10) * 1000)
// ((0) * 1000) <----- 0.5 cannot be represented as an integer
// (0)

With named values instead of integer literals in this example, it’s easy to imagine a mistake like this reordering issue slipping through.
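As a minimal sketch of how the problem hides behind names, assume a hypothetical helper that converts a tick count to milliseconds. The constant TICKS_PER_SECOND and both function names are invented for illustration; they are not from the code that prompted this post.

```c
/* Hypothetical constant, invented for this illustration. */
#define TICKS_PER_SECOND 10

/* Divides first: any tick count below TICKS_PER_SECOND truncates to 0
 * before the multiplication happens. */
int ticks_to_ms_truncating(int ticks) {
    return ticks / TICKS_PER_SECOND * 1000;
}

/* Multiplies first: the intermediate value stays exact, and only the
 * final division truncates. Assumes ticks * 1000 fits in an int. */
int ticks_to_ms(int ticks) {
    return ticks * 1000 / TICKS_PER_SECOND;
}
```

With ticks equal to 5, the first function returns 0 and the second returns 500, and nothing at either call site hints at the difference. Multiplying first trades the early truncation for a risk of overflow on large inputs, so neither ordering is free.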

This is partly a reminder of the importance of unit tests (or regression tests of some kind). Ideally an incorrect change introducing this behavior would be caught immediately. Well-written unit tests are great for pointing out simple bugs like this, though I am making an assumption that this bug would break a functional requirement of the code under test.
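A regression test for this kind of bug can be very small. The sketch below uses a hypothetical function, percent_of, and plain assert from the C standard library rather than any particular test framework; both choices are assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical function under test: returns `percent` percent of `value`,
 * truncated toward zero. Multiplying before dividing keeps the intermediate
 * value exact (assuming value * percent fits in an int). */
int percent_of(int value, int percent) {
    return value * percent / 100;
}

/* Pins down the behavior with inputs where operand order matters.
 * Reordering the body to `percent / 100 * value` would return 0 for
 * every percent below 100 and trip the first assertion. */
void test_percent_of(void) {
    assert(percent_of(1000, 50) == 500);
    assert(percent_of(200, 5) == 10);
    assert(percent_of(0, 50) == 0);
}
```

The choice of inputs matters: a test that only used a percent of 100 would pass with either ordering.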

But this is mostly just a friendly reminder that with integer division, just like many other pockets of programming, order matters.

Where to Start

April 17, 2021

If you are new or early in your programming journey, these are the things I suggest learning first. They will benefit you no matter what kind of programming you ultimately do.

Learn C

If you understand the fundamentals of computer science, namely data structures and algorithms, you have the ability to understand and use any programming language.

But while that’s true, learning C will give you a practical programming knowledge that computer science theory will not. After learning C you can mentally translate every other language’s behavior to C (for example, parameter passing). Another benefit is the C language can be learned in its entirety (something I would never say about C++).
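As a small sketch of that mental translation: C passes every argument by value, and “passing by reference” means passing the value of a pointer. The function names below are invented for illustration.

```c
/* Receives a copy of the caller's int; assigning to it changes only
 * the copy, so the caller's variable is untouched. */
void set_by_value(int n) {
    n = 42;
    (void)n; /* silence "set but not used" warnings */
}

/* Receives a copy of a pointer; writing through it changes the int
 * that the caller's pointer refers to. */
void set_through_pointer(int *n) {
    *n = 42;
}
```

Given `int a = 1;`, calling `set_by_value(a)` leaves a at 1, while `set_through_pointer(&a)` sets it to 42. Many languages that pass object references can be modeled as the second function: the callee gets a copy of a pointer, so it can change the object the caller sees but not which object the caller’s variable refers to.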

As someone who has primarily developed systems software for my entire career, perhaps I’m biased. But I believe C is essential knowledge for every programmer.

Learn a scripting language

Programmers need to write tailored but ephemeral programs on a daily basis. The ability to automate some kind of work with minimal investment to create the automation is invaluable.

That’s why I recommend deeply learning a scripting language. I mentioned that any language is on the table for someone who understands the fundamentals, and that’s true. But paying the get-up-to-speed cost every time you need to automate something is expensive.

Python, Perl, and Ruby are examples of scripting languages that can do this work. My personal favorite is Python, but give these and others a look. Pick one that suits you, and master its basics to the point that you don’t need to look much up to use it.

Learn a shell

Technically a shell script is just as capable as the scripting languages I mentioned above, but I use them differently. Using even the most basic data structure will feel more comfortable in a true scripting language. Shell scripts are useful for, well, automating the shell. But that’s incredibly useful.

The shell is the driver for so many programming tasks. Understanding Unix commands, how to chain them together, and how to automate it all will make you a more efficient programmer. (Saving your steps as a shell script is also a great way to remember how to do something.)

Depending on your environment you may not have complete freedom of choice, but I recommend learning one of: bash, zsh, or fish.

Keep learning

I recommend C, a scripting language, and a shell as some of the first things to learn. They provide a foundation for future learning, and improve your productivity.

But remember, the world of computer science and programming is too big to completely master. None of us will learn it all, and that’s fine. The much more important advice I can give is, no matter what you’re learning, to keep learning.

Keep learning.

Persistence

April 7, 2021

Persistence is a critically important quality of a software developer.

Debugging, for example, demands persistence. It’s a grind.

The first architecture of a system will never be correct. Neither will the second. It takes persistence to design and re-design to the right one.

It also takes persistence to continually show up and do your best work. If you do the best work you can every day, you are ahead of most of the industry. And if you’re not doing the best work you can, what are you doing instead?

Spicy

March 29, 2021

During my first week at my first job, I met one of the senior developers on my team.

He asked me if I liked spicy food.

“Not really,” I responded.

He paused.

“You’ll never make it in this industry.”


I love spicy food today.

Of course, you don’t have to enjoy eating any type of food to make it in software. But be it debugging a difficult problem or discovering the obvious mistake that held you up the last three days: you do need to have a tolerance for pain.

The Agenda Slide

March 27, 2021

A small tip for creating a slide deck: delete the agenda slide.

I know it’s in the company’s default template, and everyone else uses it.

But a presentation is an opportunity to build your message to its conclusion.

Why spoil that by opening with your blueprints?

AddressSanitizer Implementation Basics

March 23, 2021

AddressSanitizer (ASAN) is a compiler-based tool that detects memory bugs. It is an essential part of testing (assuming you are building compatible software with a supported compiler and platform).

It’s relatively straightforward to enable: compile and link your code with the flag -fsanitize=address, then run your binary. Plenty of documentation exists on how to use ASAN. Here I will instead explain the basics of how ASAN works.

Let’s start with the quintessential example of an error that ASAN will detect (modified to use a 64-bit integer for reasons explained later):

#include <cstdint>
int main(int argc, char** argv) {
  int64_t* array = new int64_t[100];
  delete[] array;
  return static_cast<int>(array[argc]);
}

Using the memory pointed to by array after it is deleted (freed) is illegal. ASAN will detect this error and helpfully report details about it. Those details become especially important as errors are detected in more complex code.

But how does ASAN perform such detection?

ASAN primarily relies on the concept of shadow memory. Imagine you were inside a running binary holding a giant notebook. And every time the binary allocated or freed memory, you wrote down what happened to the affected memory. At any given point, you would be able to look at your notebook and answer the question: “is the memory at a given address allocated and valid to use, or freed and invalid to use?”.

Shadow memory is the notebook where ASAN stores information about every memory address within the binary. And during runtime of a sanitized binary, the shadow memory is referenced to answer exactly that question (among others).

ASAN implements this mechanism using two parts: compiler instrumentation and a runtime library.

Instrumentation

The flag -fsanitize=address at compile time instructs the compiler to add additional instructions to the generated code for error detection. Here is the generated code for main from the example, without the ASAN flag (Ubuntu 20.10, x86_64, clang 11):

$ clang++ -g -O1 asan.cc -o asan_test
$ objdump -d asan_test | c++filt
[...]
0000000000401140 <main>:
  401140:   55                      push   %rbp
  401141:   53                      push   %rbx
  401142:   50                      push   %rax
  401143:   89 fd                   mov    %edi,%ebp
  401145:   bf 20 03 00 00          mov    $0x320,%edi
  40114a:   e8 e1 fe ff ff          callq  401030 <operator new[](unsigned long)@plt>
  40114f:   48 89 c3                mov    %rax,%rbx
  401152:   48 89 c7                mov    %rax,%rdi
  401155:   e8 e6 fe ff ff          callq  401040 <operator delete[](void*)@plt>
  40115a:   48 63 c5                movslq %ebp,%rax
  40115d:   8b 04 c3                mov    (%rbx,%rax,8),%eax
  401160:   48 83 c4 08             add    $0x8,%rsp
  401164:   5b                      pop    %rbx
  401165:   5d                      pop    %rbp
  401166:   c3                      retq
  401167:   66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
  40116e:   00 00
[...]

And here is the generated code for main with -fsanitize=address:

$ clang++ -g -O1 -fsanitize=address asan.cc -o asan_test
$ objdump -d asan_test | c++filt
[...]
00000000004c88b0 <main>:
  4c88b0: 55                    push   %rbp
  4c88b1: 53                    push   %rbx
  4c88b2: 50                    push   %rax
  4c88b3: 89 fd                 mov    %edi,%ebp
  4c88b5: bf 20 03 00 00        mov    $0x320,%edi
  4c88ba: e8 a1 d7 ff ff        callq  4c6060 <operator new[](unsigned long)>
  4c88bf: 48 89 c3              mov    %rax,%rbx
  4c88c2: 48 89 c7              mov    %rax,%rdi
  4c88c5: e8 e6 df ff ff        callq  4c68b0 <operator delete[](void*)>
  4c88ca: 48 63 c5              movslq %ebp,%rax
  4c88cd: 48 8d 3c c3           lea    (%rbx,%rax,8),%rdi
  4c88d1: 48 89 f8              mov    %rdi,%rax
  4c88d4: 48 c1 e8 03           shr    $0x3,%rax
  4c88d8: 80 b8 00 80 ff 7f 00  cmpb   $0x0,0x7fff8000(%rax)
  4c88df: 75 09                 jne    4c88ea <main+0x3a>
  4c88e1: 8b 07                 mov    (%rdi),%eax
  4c88e3: 48 83 c4 08           add    $0x8,%rsp
  4c88e7: 5b                    pop    %rbx
  4c88e8: 5d                    pop    %rbp
  4c88e9: c3                    retq
  4c88ea: e8 01 3e fd ff        callq  49c6f0 <__asan_report_load8>
  4c88ef: 90                    nop
[...]

Ok, there are more instructions, but what are they doing?

4c88d4: 48 c1 e8 03           shr    $0x3,%rax
4c88d8: 80 b8 00 80 ff 7f 00  cmpb   $0x0,0x7fff8000(%rax)

These two instructions shift the address of array[argc] right by 3 bits, then compare the byte at “shifted address plus 0x7fff8000” to zero. Why? Let’s let the AddressSanitizer paper answer that question:

Given the application memory address Addr, the address of the shadow byte is computed as (Addr>>3)+Offset.

That’s exactly what we see here. The compiler added code to compute the address of shadow memory corresponding to array[argc] and check the contents at that address to determine if the memory is valid to use (the shadow memory value will be zero if the original address is valid). In other words, the code above checks the notebook.
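The same computation can be sketched in C. This is a toy model, not ASAN’s code: SHADOW_OFFSET is the constant from the disassembly above, while toy_shadow is an ordinary array standing in for real shadow memory (with an offset of zero), and all function names are invented for illustration.

```c
#include <stdint.h>

/* Offset taken from the cmpb instruction in the disassembly above. */
#define SHADOW_OFFSET 0x7fff8000ULL

/* (Addr >> 3) + Offset: one shadow byte describes 8 application bytes. */
uint64_t shadow_address(uint64_t addr) {
    return (addr >> 3) + SHADOW_OFFSET;
}

/* Toy stand-in for shadow memory, covering toy "addresses" 0..127. */
#define TOY_SHADOW_SIZE 16
static uint8_t toy_shadow[TOY_SHADOW_SIZE];

/* Marks the 8-byte word containing addr as invalid. Any nonzero value
 * works for this sketch; the value 1 is arbitrary. */
void toy_poison(uint64_t addr) {
    toy_shadow[addr >> 3] = 1;
}

/* Mirrors the shr/cmpb/jne sequence for an aligned 8-byte access:
 * returns 1 when the shadow byte is zero (access allowed), else 0.
 * The caller must keep addr below 8 * TOY_SHADOW_SIZE. */
int toy_access8_ok(uint64_t addr) {
    return toy_shadow[addr >> 3] == 0;
}
```

For example, shadow_address(0x1000) is 0x7fff8200, and after toy_poison(64) the check toy_access8_ok(64) fails while toy_access8_ok(72) still passes.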

  4c88df: 75 09                 jne    4c88ea <main+0x3a>
  [... Execute the normal sequence of instructions ...]
  4c88ea: e8 01 3e fd ff        callq  49c6f0 <__asan_report_load8>

If the result of the comparison is zero (meaning the shadow memory value is zero), no error is detected. The memory can be used safely, so the code proceeds normally.

But if the result of the comparison is not zero (meaning the shadow memory value is not zero), an unsafe use of memory has occurred. Instead of proceeding, the binary will invoke __asan_report_load8 to indicate an invalid 8-byte read and stop execution.

(Note: the calculation of the shadow memory address is slightly more complicated for memory access of a size less than 8 bytes. I modified the example source code to use an 8-byte integer to keep detailing the instructions as straightforward as possible.)
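For the curious, the check the paper describes for smaller accesses can be sketched as follows. Again this is a toy model under stated assumptions: toy_shadow is an ordinary array standing in for real shadow memory, the function name is invented, and the access is assumed not to cross an 8-byte boundary.

```c
#include <stdint.h>

#define TOY_SHADOW_SIZE 16

/* Shadow byte encoding described in the paper:
 *   0          all 8 bytes of the word are usable
 *   k in 1..7  only the first k bytes are usable
 *   negative   none of the 8 bytes are usable */
static int8_t toy_shadow[TOY_SHADOW_SIZE];

/* Returns 1 if an access of `size` bytes (1, 2, or 4) at toy address
 * `addr` is allowed, 0 otherwise. The caller must keep addr below
 * 8 * TOY_SHADOW_SIZE. */
int toy_access_ok(uint64_t addr, int size) {
    int8_t k = toy_shadow[addr >> 3];
    /* The last byte touched is at offset (addr & 7) + size - 1 within
     * the word; it must fall inside the first k usable bytes. */
    if (k != 0 && (int)(addr & 7) + size > k) {
        return 0; /* an instrumented binary would report an error here */
    }
    return 1;
}
```

The extra work compared to the 8-byte case is the second condition: a nonzero shadow byte is no longer an automatic failure, because the access may fit entirely within the usable prefix of the word.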

Ok, so the compiler did its part by adding these instructions to our generated code. But this raises a few questions:

  • Where does the shadow memory actually live, and who manages it?
  • How is the shadow memory notified when memory is allocated or freed?
  • Where does a function such as __asan_report_load8 originate?

All of these questions are answered by the second part of ASAN: the runtime library.

Runtime Library

Alongside the instrumented code, ASAN requires its runtime library to be linked into the binary. This command is actually using clang as a compiler and a linker:

$ clang++ -g -O1 -fsanitize=address asan.cc -o asan_test

  • -fsanitize=address at the compilation step instructs the compiler to add the instructions. We covered that above.
  • -fsanitize=address at the link step instructs the linker to bundle the ASAN library (or libraries in the case of C++) into the binary. By default, clang statically links the dependencies into the binary.
  • The runtime libraries exist within the compiler installation directory. For example, /usr/lib/clang/11/lib/linux/libclang_rt.asan-x86_64.a.

Back to the questions that the runtime library answers:

Where does the shadow memory actually live, and who manages it?

The primary purpose of the runtime library is allocation and management of the shadow memory. The runtime sets everything up during binary initialization. The compiler is relying on the binary ultimately being linked with the runtime, because as we observed it inserts extra instructions that directly reference shadow memory.

How is the shadow memory notified when memory is allocated or freed?

ASAN intercepts all calls to malloc and free to keep track of what’s happening (writing in the notebook) before passing things along to the actual malloc and free. The interception function is a part of the runtime library, thus a part of the final binary:

$ nm /usr/lib/clang/11/lib/linux/libclang_rt.asan-x86_64.a 2>/dev/null | grep 'T __interceptor_malloc$'
0000000000000000 T __interceptor_malloc

$ nm asan_test | grep 'interceptor_malloc$'
0000000000496290 T __interceptor_malloc

The interception technique is used for many more functions to support far more validation than just this, but we are focused on the basics.
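The bookkeeping idea behind interception can be modeled with a pair of wrapper functions. This is only a sketch: the names toy_malloc, toy_free, and toy_is_live are invented, the “notebook” here is a small fixed-size table rather than shadow memory, and callers invoke the wrappers explicitly, whereas the real runtime supplies its own definitions so that existing calls are redirected without source changes.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_TRACKED 64

/* The toy notebook: addresses that are currently allocated (0 = empty). */
static uintptr_t live[MAX_TRACKED];

/* Calls the real malloc, then records the returned address. If the
 * table is full, the allocation simply goes untracked. */
void *toy_malloc(size_t size) {
    void *p = malloc(size);
    if (p != NULL) {
        for (int i = 0; i < MAX_TRACKED; i++) {
            if (live[i] == 0) {
                live[i] = (uintptr_t)p;
                break;
            }
        }
    }
    return p;
}

/* Erases the record for p, then calls the real free. */
void toy_free(void *p) {
    if (p != NULL) {
        for (int i = 0; i < MAX_TRACKED; i++) {
            if (live[i] == (uintptr_t)p) {
                live[i] = 0;
                break;
            }
        }
    }
    free(p);
}

/* Answers the notebook question for an address returned by toy_malloc:
 * 1 if it is currently allocated, 0 if it was freed or never seen. */
int toy_is_live(uintptr_t addr) {
    if (addr == 0) {
        return 0;
    }
    for (int i = 0; i < MAX_TRACKED; i++) {
        if (live[i] == addr) {
            return 1;
        }
    }
    return 0;
}
```

After `void *p = toy_malloc(16);` the address is reported as live, and after `toy_free(p)` it is not. The table only knows about start addresses, so unlike shadow memory it cannot answer questions about addresses in the middle of an allocation.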

Where does a function such as __asan_report_load8 originate?

It is probably not a surprise at this point, but this function is also provided by the runtime library:

$ nm /usr/lib/clang/11/lib/linux/libclang_rt.asan-x86_64.a 2>/dev/null | grep -e 'T __asan_report_load[[:digit:]]$'
0000000000000000 T __asan_report_load1
0000000000000000 T __asan_report_load2
0000000000000000 T __asan_report_load4
0000000000000000 T __asan_report_load8

Wrap-Up

ASAN’s two-part approach of compiler instrumentation and a runtime library carries a cost: building a completely separate binary (and any libraries that should also be sanitized) for testing. But the performance benefits versus a tool such as Valgrind make this well worth it.

(Valgrind is a fantastic tool that covers much of the analysis performed by ASAN and MemorySanitizer (MSAN), but it has a reversed cost structure: there is no need to re-compile, but you pay with a significant increase to runtime and memory usage. Depending on your binary and the environment in which it is tested, paying these costs may be infeasible.)

It is not a requirement to understand ASAN’s internals to benefit from using it. But when ASAN reports something that you don’t immediately understand, it is always helpful to know just a bit about what it’s doing behind the scenes.

For more information, I highly recommend reading the original AddressSanitizer paper in its entirety.

The Context and the Logic

March 9, 2021

Soroush Khanlou:

As a professional programmer, there are two main types of tasks you work on. I’ve started thinking about them as the context and the logic. […]

This framing, context vs logic, illustrates two things for me:

First, that we all tell ourselves a lie: this job is primarily about the logic, interview candidates should mainly be tested on their ability to think about the logic, a “good” programmer is someone who can write the logic really well. In fact, an overwhelming amount of the job is making the context work. […]

I’m primarily a context programmer. I wish I weren’t — I enjoy writing the logic a lot more — but it is the reality. I should embrace that and treat the context as my job, rather than as an impediment to “my real job”.

Second, if you can make your context simpler and smaller, you can spend less time on it.

(Via Michael Tsai.)

Relentlessly Patient

March 6, 2021

Throughout the first year of my first job, a senior developer on my team relentlessly shredded my code reviews. Dozens of comments, each one explaining in detail why my code was bug-prone, inefficient, or just incorrect. The comments were never mean-spirited, and almost always right.

It was an incredible gift I wouldn’t understand until years later.

It took a lot of time to carefully review that code and provide such a massive amount of constructive feedback. It took a lot of patience to explain, repeatedly, the why behind some of the more subtle suggested changes. Even more so when the review comments extended into meetings, cubicle visits, emails, and so on.

It is hard to overstate the impact that year had on my career and ability as a programmer, both technically and non-technically.

Don’t miss an opportunity to receive honest feedback, and always receive it with humility.

And down the road, don’t forget to be the patient senior developer for someone else.

Software Inventory

March 2, 2021

Joel Spolsky:

The “cost” of code inventory is huge. It might add up to six or twelve months of work that is stuck in the assembly line and not yet in customers’ hands. This could be the difference between having a cutting-edge product (iPhone) or constantly playing catchup (Windows Phone).

A great line when it was written in 2012, now aged to perfection.

The reason I’m linking to this article though, is the timeless wisdom about bug tracking that has stuck with me since the first time I read it:

At some point you realize that you’ve put too much work into the bug database and not quite enough work into the product.

  • Suggestion: use a triage system to decide if a bug is even worth recording.
  • Do not allow more than two weeks (in fix time) of bugs to get into the bug database.
  • If you have more than that, stop and fix bugs until you feel like you’re fixing stupid bugs. Then close as “won’t fix” everything left in the bug database. Don’t worry, the severe bugs will come back.

You’re Not Alone

March 1, 2021

Brent Simmons:

I don’t know about other senior developers, but I can tell you about me. I have decades of experience and an amount of wisdom. I’ve written some bad, some good, and a couple great apps over the years.

But I’m no Jedi.

Chris Liscio:

Despite my long history of writing (and debugging) pretty “hardcore” software for Mac and iOS, I still managed to miss a (now-)obvious red flag in my configuration.

We’re all just a bunch of dummies flailing about wildly out here!

Years ago a similarly “hardcore” developer told me that if he gets to lunchtime without realizing he made a colossally simple mistake, the day is a win. Count me in with him, Brent, and Chris: nobody knows it all, and sometimes we even miss things that we do know.

There’s a lot of peace to be found accepting that this happens to everybody.

Paper

March 1, 2021

There are a lot of distractions readily available on a computer, and people are easily distracted.

To stay focused while writing code, try to work on paper as long as possible.

Some of the most productive “programming” I’ve ever done was on paper. Or with a colleague at a whiteboard.

That’s because writing code is mostly thinking. Identifying the problem and its constraints; designing the solution within those constraints; considering implications past and future in light of the design: it’s all thinking. And focused thinking can be done without a computer.

At some point there will be typing. But how much more productive, how much more focused, will that typing be when the code is merely expressing an idea already formed in your notebook?