The breaking tech news this year has been the pervasive spread of "AI" (or rather, statistical modeling based on hidden layer neural networks) into everything. It's the latest hype bubble now that cryptocurrencies are no longer the freshest sucker-bait in town, and the media (who these days are mostly stenographers recycling press releases) are screaming at every business in tech to add AI to their product.
Well, Apple and Intel and Microsoft were already in there, but evidently they weren't in there enough, so now we're into the silly season with Microsoft's announcement of CoPilot plus Recall, the product nobody wanted.
CoPilot+ is Microsoft's LLM-based add-on for Windows, sort of like 2000's Clippy the Talking Paperclip only with added hallucinations. Clippy was rule-based: a huge bundle of IF ... THEN statements hooked together like a 1980s Expert System to help users accomplish what Microsoft believed to be common tasks, but which turned out to be irritatingly unlike anything actual humans wanted to accomplish. Because CoPilot+ is purportedly trained on what users actually do, it looked plausible to someone in marketing at Microsoft that it could deliver on "help the users get stuff done". Unfortunately, human beings assume that LLMs are sentient and understand the questions they're asked, rather than being unthinking statistical models that cough up the highest probability answer-shaped object generated in response to any prompt, regardless of whether it's a truthful answer or not.
Anyway, CoPilot+ is also a play by Microsoft to sell Windows on ARM. Microsoft don't want to be entirely dependent on Intel, especially as Intel's share of the global microprocessor market is rapidly shrinking, so they've been trying to boost Windows on ARM to orbital velocity for a decade now. The new CoPilot+ branded PCs going on sale later this month are marketed as being suitable for AI (spot the sucker-bait there?) and have powerful new ARM processors from Qualcomm, which are pitched as "MacBook Air killers", largely because they're playing catch-up with Apple's M-series ARM-based processors in terms of processing power per watt and having an on-device coprocessor optimized for running neural networks.
Having built the hardware and the operating system, Microsoft faces the inevitable question: why would a customer want this stuff? And being Microsoft, they took the first answer that bubbled up from their in-company echo chamber and pitched it at the market as a forced update to Windows 11. And the internet promptly exploded.
First, a word about Apple. Apple have been quietly adding AI features to macOS and iOS for the past several years. In fact, they got serious about AI in 2015, and every Apple Silicon processor they've released since 2017 has had a neural engine (an AI coprocessor) on board. Now that the older phones and laptops are hitting end of life, the most recent operating system releases are rolling out AI-based features. For example, there's on-device OCR for text embedded in any image. There's a language translation service for the OCR output, too. I can point my phone at a brochure or menu in a language I can't read, activate the camera, and immediately read a surprisingly good translation: this is an actually useful feature of AI. (The ability to tag all the photos in my Photos library with the names of people present in them, and to search for people, is likewise moderately useful: the jury is still out on the pet recognition, though.) So the Apple roll-out of AI has so far been uneventful and unobjectionable, with a focus on identifying things people want to do and making them easier.
Microsoft Recall is not that.
"Hey, wouldn't it be great if we could use AI in Windows to help our users see everything they've ever done on their computer?" is a great pitch, and Recall kinda-sorta achieves this. But the implementation is something rather different. Recall takes snapshots of all the windows on a Windows computer's screen (except the DRM'd media, because the MPAA must have their kilo of flesh) and saves them locally. The local part is good: the term for software that takes regular screenshots and saves them in the cloud is "part of a remote access trojan". It then OCRs any text in the images, and I believe also transcribes any speech, and saves the resulting output in an unencrypted SQLite database stored in:
C:\Users\$USER\AppData\Local\CoreAIPlatform.00\UKP{GUID}
And there are tools already out there to slurp through the database and see what's in it, such as TotalRecall.
Surprise! It turns out that the unencrypted database and the stored images may contain your user credentials and passwords. And other stuff. Got a porn habit? Congratulations, anyone with access to your user account can see what you've been seeing. Use a password manager like 1Password? Sorry, your 1Password passwords are probably visible via Recall, now.
Now, "unencrypted" is relative; the database is stored on a filesystem which should be encrypted using Microsoft's BitLocker. But anyone with credentials for your Microsoft account can decrypt it and poke around. Indeed, anyone with access to your PC, unlocked, has your entire world at their fingertips.
But this is an utter privacy shit-show. Victims of domestic abuse are at risk of their abuser trawling their PC for any signs that they're looking for help. Anyone who's fallen for a scam that gave criminals access to their PC is also completely at risk.
Worse: even if you don't use Recall, if you send an email or instant message to someone else who does then it will be OCRd and indexed via Recall: and preserved for posterity.
Now imagine the shit-show when this goes corporate.
And it turns out that Microsoft is pushing this feature into the latest update of Windows 11 for all compatible hardware and making it impossible to remove or disable, because that tactic has worked so well for them in the past at driving the uptake of new technologies that Microsoft wanted its ~~customers~~ victims to start using. Like, oh, Microsoft Internet Explorer back in 2001, and remember how well that worked out for them.
Suddenly every PC becomes a target for Discovery during legal proceedings. Lawyers can subpoena your Recall database and search it, no longer being limited to email but being able to search for terms that came up in Teams or Slack or Signal messages, and potentially verbally via Zoom or Skype if speech-to-text is included in Recall data.
It's a shit-show for any organization that handles medical records or has a duty of legal confidentiality; indeed, for any business that has to comply with GDPR (how does Recall handle the Right to be Forgotten? In a word: badly), or HIPAA in the US. This misfeature contravenes privacy law throughout the EU (and in the UK), and in healthcare organizations everywhere that recognizes a medical right to privacy. About the only people whose privacy it doesn't infringe are the Hollywood studios and Netflix, which tells you something about the state of things.
Recall is already attracting the attention of data protection regulators; I suspect in its current form it's going to be dead on arrival, and those CoPilot+ PCs due to launch on June 18th are going to get a hurried overhaul. It's also going to be interesting to see what Apple does, or more importantly doesn't, announce at WWDC next week, which is being trailed as the year when Apple goes all-in on AI.
More to the point, though, Windows Recall blows a hole under the waterline of Microsoft's trustworthiness. Microsoft "got serious" about security last decade, around the time Steve Ballmer stepped down as CEO, and managed to recover somewhat from having a reputation for taking a slapdash approach to its users' data. But they've been going backwards since 2020, with dick moves like disabling auto-save to local files in Microsoft Word (your autosave data only autosaves to OneDrive), slurping all incoming email for accounts accessed via Microsoft Outlook into Microsoft's own cloud for AI training purposes (ask the Department of Justice how they feel about Microsoft potentially having access to the correspondence for all their investigations in progress), and now this. Recall undermines trust, and once an institution loses trust it's really hard to regain it.
Some commentators are snarking that Microsoft really really wants to make 2025 the year of Linux on the Desktop, and it's kind of hard to refute them right now.
I've gotten a few bits of feedback asking for my thoughts and/or reactions to the whole "xz backdoor" thing that happened over the past couple of days. Most of my thoughts on the matter apply to autoconf and friends, and they aren't great.
I don't have to cross paths with those tools too often these days, but there was a point quite a while back when I was constantly building things from source, and a ./configure --with-this --with-that was a given. It was a small joy when the thing let me reuse the old configure invocation so I didn't have to dig up the specifics again.
I got that the whole reason for autoconf's derpy little "recipes" is that you want to know if the system you're on supports X, or can do Y, or exactly what flavor of Z it has, so you can #ifdef around it or whatever. It's not quite as relevant today, but sure, there was once a time when a great many Unix systems existed and they all had their own ways of handling stuff, and no two were the same.
So, okay, fine, at some point it made sense to run programs to empirically determine what was supported on a given system. What I don't understand is why we kept running those stupid little shell snippets and little bits of C code over and over. It's like, okay, we established that this particular system does <library function foobar> with two args, not three. So why the hell are we constantly testing for it over and over?
Why didn't we end up with a situation where it was just a standard thing that had a small number of possible values, and it would just be set for you somewhere? Whoever was responsible for building your system (OS company, distribution packagers, whatever) could leave something in /etc that says "X = flavor 1, Y = flavor 2" and so on down the line.
And, okay, fine, I get that there would have been all kinds of "real OS companies" that wouldn't have wanted to stoop to the level of the dirty free software hippies. Whatever. Those same hippies could have run the tests ONCE per platform/OS combo, put the results into /etc themselves, and then been done with it.
Then instead of testing all of that shit every time we built something from source, we'd just drag in the pre-existing results and go from there. It's not like the results were going to change on us. They were a reflection of the way the kernel, C libraries, APIs and userspace happened to work. Short of that changing, the results wouldn't change either.
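To make the "run the tests once, then reuse the answers" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `NAME = value` file format, the fact names, and the helpers `parse_platform_facts` and `lookup` are hypothetical, not anything autoconf or any distribution actually ships.

```python
from typing import Callable, Dict


def parse_platform_facts(text: str) -> Dict[str, str]:
    """Parse 'NAME = value' lines; '#' starts a comment, blank lines are skipped."""
    facts: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected NAME = value, got: {raw_line!r}")
        facts[name.strip()] = value.strip()
    return facts


def lookup(facts: Dict[str, str], name: str, probe: Callable[[], str]) -> str:
    """Return the recorded answer; run the (slow) probe only if nothing is recorded."""
    if name not in facts:
        facts[name] = probe()
    return facts[name]


# Hypothetical contents of a per-platform facts file, written once per OS release.
EXAMPLE = """
# how this platform behaves
foobar_args = 2
have_pthread = yes
"""

facts = parse_platform_facts(EXAMPLE)
# The answer comes from the recorded facts, so the probe is never run here.
print(lookup(facts, "foobar_args", probe=lambda: "3"))
```

The point of the design is that the expensive question gets asked at most once per platform, and every later build just reads the answer.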
But no, we never got to that point, so it's still normal to ship a .tar.gz with an absolute crap-ton of dumb little macro files that run all kinds of inscrutable tests that give you the same answers that they did the last time they ran on your machine or any other machine like yours, and WILL give the same answers going forward.
That means it's totally normal to ship all kinds of really crazy looking stuff, and so when someone noticed that and decided to use that as their mechanism for extracting some badness from a so-called "test file" that was actually laden with their binary code, is it so surprising that it happened? To me, it seems inevitable.
Incidentally, I want to see what happens if people start taking tarballs from various projects and diff them against the source code repos for those same projects. Any file that "appears" in the tarball that's allegedly due to auto[re]conf being run on the project had better match something from the actual trees of autoconf, automake, ranlib, gettext, or whatever else goofy meta-build stuff is being used these days.
$ find . -type f | sort | xargs sha1sum
7d963e5f46cd63da3c1216627eeb5a4e74a85cac ./ax_pthread.m4
c86c8f8a69c07fbec8dd650c6604bf0c9876261f ./build-to-host.m4
0262f06c4bba101697d4a8cc59ed5b39fbda4928 ./getopt.m4
e1a73a44c8c042581412de4d2e40113407bf4692 ./gettext.m4
090a271a0726eab8d4141ca9eb80d08e86f6c27e ./host-cpu-c-abi.m4
961411a817303a23b45e0afe5c61f13d4066edea ./iconv.m4
46e66c1ed3ea982b8d8b8f088781306d14a4aa9d ./intlmacosx.m4
ad7a6ffb9fa122d0c466d62d590d83bc9f0a6bea ./lib-ld.m4
7048b7073e98e66e9f82bb588f5d1531f98cd75b ./lib-link.m4
980c029c581365327072e68ae63831d8c5447f58 ./lib-prefix.m4
d2445b23aaedc3c788eec6037ed5d12bd0619571 ./libtool.m4
421180f15285f3375d6e716bff269af9b8df5c21 ./lt~obsolete.m4
f98bd869d78cc476feee98f91ed334b315032c38 ./ltoptions.m4
530ed09615ee6c7127c0c415e9a0356202dc443e ./ltsugar.m4
230553a18689fd6b04c39619ae33a7fc23615792 ./ltversion.m4
240f5024dc8158794250cda829c1e80810282200 ./nls.m4
f40e88d124865c81f29f4bcf780512718ef2fcbf ./po.m4
f157f4f39b64393516e0d5fa7df8671dfbe8c8f2 ./posix-shell.m4
4965f463ea6a379098d14a4d7494301ef454eb21 ./progtest.m4
15610e17ef412131fcff827cf627cf71b5abdb7e ./tuklib_common.m4
166d134feee1d259c15c0f921708e7f7555f9535 ./tuklib_cpucores.m4
e706675f6049401f29fb322fab61dfae137a2a35 ./tuklib_integer.m4
41f3f1e1543f40f5647336b0feb9d42a451a11ea ./tuklib_mbstr.m4
b34137205bc9e03f3d5c78ae65ac73e99407196b ./tuklib_physmem.m4
f1088f0b47e1ec7d6197d21a9557447c8eb47eb9 ./tuklib_progname.m4
86644b5a38de20fb43cc616874daada6e5d6b5bb ./visibility.m4
$
... there's no build-to-host.m4 with that sha1sum out there, *except* for the bad one in the xz release. That part was caught... but what about every other auto* blob in every other project out there? Who or what is checking those?
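For what it's worth, the check itself is not hard to sketch. The Python below walks an unpacked tarball, takes the SHA-1 of every file (same hash as the `sha1sum` listing), and reports anything that doesn't match a set of hashes you already trust. The hard part, which this sketch simply assumes you've solved, is building that trusted set from the project's repo plus the real autoconf/automake/gettext trees; `unexplained_files` and `sha1_of` are names made up for this example.

```python
import hashlib
from pathlib import Path
from typing import Iterable, List


def sha1_of(path: Path) -> str:
    """SHA-1 of a file's contents, read in chunks so large files are fine."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unexplained_files(tree: str, known_good: Iterable[str]) -> List[str]:
    """Return relative paths under `tree` whose SHA-1 is not in `known_good`."""
    known = set(known_good)
    root = Path(tree)
    return [
        path.relative_to(root).as_posix()
        for path in sorted(root.rglob("*"))
        if path.is_file() and sha1_of(path) not in known
    ]
```

Anything that comes back from `unexplained_files("xz-unpacked/m4", trusted_hashes)` (both arguments hypothetical here) is a file somebody should be able to explain.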
And finally, yes, I'm definitely biased. My own personal build system has a little file that gets installed on a machine based on how the libs and whatnot work on it. That means all of the Macs of a particular version of the OS get the same file. All of the Debian boxes running the same version get the same file, and so on down the line.
I don't keep asking the same questions every time I go to build stuff. That's just madness.
Billions of miles away at the edge of the Solar System, Voyager 1 has gone mad and has begun to die.
Let’s start with the “billions of miles”. Voyager 1 was launched in early September 1977. Jimmy Carter was a hopeful new President. Yugoslavia and the USSR were going concerns, as were American Motors, Pan Am, F.W. Woolworth, Fotomat booths, Borders bookshops, and Pier 1. Americans were watching Happy Days, M*A*S*H and Charlie’s Angels on television; their British cousins were watching George and Mildred, The Goodies, and Tom Baker as the Fourth Doctor. If you turned on the radio, “Hotel California” by The Eagles was alternating with “Dancing Queen” by Abba (and, if we want to be completely honest, “Car Wash” by Rose Royce). Most cars still ran on leaded gasoline, most phones were still rotary dial, and the Internet was a wonky idea that was still a few weeks from a working prototype.
_The Thorn Birds_ was on top of everyone’s bestseller list. The first Apple II home computer had just gone on sale. The Sex Pistols were in the studio wrapping up _Never Mind The Bollocks_; they would tour on it for just three months and then break up, and within another year Sid Vicious would be dead of a heroin overdose. Barack Obama was a high school junior living with his grandparents in Honolulu, Hawaii: his grades were okay, but he spent most of his time hanging with his pot-smoking friends in the “Choom Gang”. Boris Johnson was tucked away at the elite Ashdown House boarding school while his parents’ marriage was slowly collapsing: although he was only thirteen, he had already adopted his signature hair style. Elvis had died on the toilet just a few weeks earlier. It was the summer of Star Wars.
And Voyager 1 was blasting off for a tour of the Solar System.
There’s no way to pack the whole story of Voyager 1 into a single blog post. Here’s the TLDR: Voyager flew past Jupiter and took the first detailed close-up photos of Jupiter’s moons. It flew on past Saturn, and examined Saturn’s moon Titan, the only moon with a thick atmosphere. And then it flew onwards, on and on, for another forty years. It officially left the Solar System and entered interstellar space in 2012. It just kept going, further and further into the infinite emptiness.
(You know about the Golden Record? Come on, everybody knows about the Golden Record. It’s kind of hokey and cheesy and also kind of amazing and great.)
Voyager has grown old. It was never designed for this! Its original mission was supposed to last a bit over three years. Voyager has turned out to be much tougher than anyone ever imagined, but time gets us all. Its power source is a generator full of radioactive isotopes, and those are gradually decaying into inert lead. Year by year, the energy declines, the power levels relentlessly fall. Year by year, NASA has been switching off Voyager’s instruments to conserve that dwindling flicker. They turned off its internal heater a few years ago, and they thought that might be the end. But those 1970s engineers built to last, and the circuitry and the valves kept working even as the temperature dropped down, down, colder than dry ice, colder than liquid nitrogen, falling towards absolute zero.
(Voyager stored its internal data on a digital tape recorder. Yes, a tape recorder, storing information on magnetic tape. It wasn’t designed to function at a hundred degrees below zero. It wasn’t designed to work for decades, winding and rewinding, endlessly re-writing data. But it did.)
Voyager kept going, and kept going, until it was over 15 billion miles (about 24 billion kilometers) away. At the speed of light, the Moon is a bit over one second away. The Sun is about 8 minutes away. Voyager is twenty-two hours away. Send a radio signal to it at lunch on Monday, and you’ll get a response back Wednesday morning.
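If you want to check the arithmetic, here is a small Python sketch. The distance is an assumed round figure of 24 billion kilometers (roughly 15 billion miles), not a live tracking number; the speed of light is the standard defined value.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # defined value, kilometers per second


def light_time_hours(distance_km: float) -> float:
    """One-way travel time for a radio signal over distance_km, in hours."""
    return distance_km / SPEED_OF_LIGHT_KM_S / 3600.0


# Assumption: a round 24 billion km for Voyager 1's distance from Earth.
VOYAGER_DISTANCE_KM = 24e9

one_way = light_time_hours(VOYAGER_DISTANCE_KM)
print(f"one way: {one_way:.1f} h, round trip: {2 * one_way:.1f} h")
```

That comes out at a little over 22 hours each way, which is where "lunch on Monday, answer on Wednesday morning" comes from.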
* * *
I could go on at great length about Voyager — the discoveries it has made, how amazing it has all been, the Deep Space Network that has maintained contact over the decades, the ever shrinking crew of aging technicians keeping it alive on a shoestring budget. But I’ll restrict myself to just this: the Pale Blue Dot.
In 1990, just before Voyager’s camera shut down forever, the probe turned around and looked backwards. It zoomed in and took a picture of Earth. But by that time, it was so far away that Earth was just a single pale blue pixel. Look at the right-most band of light. A little past halfway down — see that speck? It’s not a defect. It’s not something on your screen. That’s the Earth.
“That’s here. That’s home. That’s us. On it everyone you love, everyone you know, everyone you ever heard of, every human being who ever was, lived out their lives. The aggregate of our joy and suffering, thousands of confident religions, ideologies, and economic doctrines, every hunter and forager, every hero and coward, every creator and destroyer of civilization, every king and peasant, every young couple in love, every mother and father, hopeful child, inventor and explorer, every teacher of morals, every corrupt politician, every “superstar,” every “supreme leader,” every saint and sinner in the history of our species lived there – on a mote of dust suspended in a sunbeam.” — Carl Sagan
Voyager kept going for another 34 years after that photo. It’s still going. It has left the grip of the Sun’s gravity, so it’s going to fall outward forever.
* * *
Here’s a bit of trivia: Voyager 1 currently holds the record for most distant active spacecraft. It’s not even close. The only other contender is Voyager’s little sister, Voyager 2, which had a different mission profile and so lags billions of kilometers behind its older sibling.
Here’s another bit of trivia: if you’re reading this in 2024? It’s very unlikely that you will live to see that record broken. There are only two other spacecraft outside the Solar System — Voyager 2 and New Horizons. Both of them are going to die before they get as far as Voyager 1. And nobody — not NASA, not the Chinese, not the EU — is currently planning to launch another spacecraft to those distances. In theory we could. In practice, we have other priorities.
* * *
We thought we knew how Voyager would end. The power would gradually, inevitably, run down. The instruments would shut off, one by one. The signal would get fainter. Eventually either the last instrument would fail for lack of power, or the signal would be lost.
We didn’t expect that it would go mad.
In December 2023, Voyager started sending back gibberish instead of data. A software glitch, though perhaps caused by an underlying hardware problem; a cosmic ray strike, or a side effect of the low temperatures, or just aging equipment randomly causing some bits to flip.
The problem was, the gibberish was coming from the flight direction software — the operating system, as it were. And no copy of that operating system remained in existence on Earth.
(This is a problem NASA long since solved. These days, every space probe that launches leaves a perfect duplicate back on Earth. Remember in “The Martian”, how they had another copy of Pathfinder sitting under a tarp in a warehouse? That’s accurate. It’s been standard practice for 30 years. But back in 1977, nobody had thought of that yet.)
Voyager Mission Control used to be a couple of big rooms full of busy people, computers, giant screens. Now it’s a single room in a small office building in the San Gabriel Valley, in between a dog training school and a McDonald’s. The Mission Control team is a handful of people, none of them young, several well past retirement age.
And they’re trying to fix the problem. But right now, it doesn’t look good. You can’t just download a new OS from 15 billion miles away. (For starters, there isn’t the bandwidth.) They would have to figure out the problem, figure out if a workaround is possible, and then apply it… all with a round-trip time of 45 hours for every communication with a probe that is flying away from us at a million miles a day. They’re trying, but nobody likes their odds.
So at some point — not tomorrow, not next week, but at some point in the next few months — they’ll probably have to admit defeat. And then they’ll declare Voyager 1 officially over, dead and done, the end of a long song.
And that’s all.
Hello! I always wish that command line tools came with data about how popular their various options are.
So I asked about people’s favourite git config options on Mastodon:
what are your favourite git config options to set? Right now I only really have git config push.autosetupremote true and git config init.defaultBranch main set in my ~/.gitconfig, curious about what other people set
As usual I got a TON of great answers and learned about a bunch of very popular git config options that I’d never heard of.
I’m going to list the options, starting with (very roughly) the most popular ones.
All of the options are documented in man git-config.
pull.ff only or pull.rebase true
These two were the most popular. These both have similar goals: to avoid accidentally creating a merge commit when you run git pull on a branch where the upstream branch has diverged.
- pull.rebase true is the equivalent of running git pull --rebase every time you git pull
- pull.ff only is the equivalent of running git pull --ff-only every time you git pull
I’m pretty sure it doesn’t make sense to set both of them at once, since --ff-only overrides --rebase.
Personally I don’t use either of these since I prefer to decide how to handle that situation every time, and now git’s default behaviour when your branch has diverged from the upstream is to just throw an error and ask you what to do (very similar to what git pull --ff-only does).
merge.conflictstyle zdiff3
Next: making merge conflicts more readable! merge.conflictstyle zdiff3 and merge.conflictstyle diff3 were both super popular (“totally indispensable”).
The consensus seemed to be “diff3 is great, and zdiff3 (which is newer) is even better!”.
So what’s the deal with diff3? Well, by default in git, merge conflicts look like this:
<<<<<<< HEAD
def parse(input):
    return input.split("\n")
=======
def parse(text):
    return text.split("\n\n")
>>>>>>> somebranch
I’m supposed to decide whether input.split("\n") or text.split("\n\n") is better. But how? What if I don’t remember whether \n or \n\n is right? Enter diff3!
Here’s what the same merge conflict looks like with merge.conflictstyle diff3 set:
<<<<<<< HEAD
def parse(input):
    return input.split("\n")
||||||| b9447fc
def parse(input):
    return input.split("\n\n")
=======
def parse(text):
    return text.split("\n\n")
>>>>>>> somebranch
This has extra information: now the original version of the code is in the middle! So we can see that:
- one side changed \n\n to \n
- the other side changed input to text
So presumably the correct merge conflict resolution is return text.split("\n"), since that combines the changes from both sides.
I haven’t used zdiff3, but a lot of people seem to think it’s better. The blog post Better Git Conflicts with zdiff3 talks more about it.
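The reasoning that diff3 makes possible ("who changed what, relative to the original?") can be written down as a tiny rule. This is a toy sketch of that rule applied to the example conflict, not git’s actual merge machinery; the `merge3` helper and the way the function is split into an `arg` and a `sep` piece are assumptions made for illustration.

```python
def merge3(base, ours, theirs):
    """Three-way rule: keep whichever side changed relative to the base."""
    if ours == theirs:
        return ours      # both sides agree (or neither changed)
    if ours == base:
        return theirs    # only their side changed
    if theirs == base:
        return ours      # only our side changed
    raise ValueError(f"real conflict: {ours!r} vs {theirs!r}")


# The conflict from the example, split into the two things that differ.
base = {"arg": "input", "sep": "\n\n"}    # original version (b9447fc)
ours = {"arg": "input", "sep": "\n"}      # HEAD changed the separator
theirs = {"arg": "text", "sep": "\n\n"}   # somebranch renamed the argument

resolved = {key: merge3(base[key], ours[key], theirs[key]) for key in base}
print(resolved)
```

Each piece was changed by only one side, so the rule keeps both changes: the argument is text and the separator is a single newline. Without the base version in the middle, there’s no way to apply this rule, which is the whole appeal of diff3.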
rebase.autosquash true
Autosquash was also a new feature to me. The goal is to make it easier to modify old commits.
Here’s how it works:
- You have a commit that you would like to be combined with some older commit, say add parsing code
- You commit it with git commit --fixup OLD_COMMIT_ID, which gives the new commit the commit message fixup! add parsing code
- Now, when you run git rebase --autosquash main, it will automatically combine all the fixup! commits with their targets
rebase.autosquash true means that --autosquash always gets passed automatically to git rebase.
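Here’s a rough Python model of the reordering step, to show what “combine the fixup! commits with their targets” means for the order of commits. It’s a simplified sketch: it only matches on the exact commit subject and only models the reordering, and the `autosquash_order` name is made up for this example.

```python
from typing import List

FIXUP_PREFIX = "fixup! "


def autosquash_order(subjects: List[str]) -> List[str]:
    """Move each 'fixup! X' to just after the earlier commit whose subject is X.

    `subjects` is oldest-first. Fixups whose target is not found stay in place.
    """
    result: List[str] = []
    for subject in subjects:
        target = subject[len(FIXUP_PREFIX):] if subject.startswith(FIXUP_PREFIX) else None
        if target is not None and target in result:
            position = result.index(target) + 1
            # Keep earlier fixups for the same target ahead of later ones.
            while position < len(result) and result[position] == subject:
                position += 1
            result.insert(position, subject)
        else:
            result.append(subject)
    return result


history = ["add parsing code", "add tests", "update docs", "fixup! add parsing code"]
for subject in autosquash_order(history):
    print(subject)
```

After the reordering, the fixup commit sits directly after add parsing code, which is where the rebase can squash it into its target.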
rebase.autostash true
This automatically runs git stash before a git rebase and git stash pop after. It basically passes --autostash to git rebase.
Personally I’m a little scared of this since it potentially can result in merge conflicts after the rebase, but I guess that doesn’t come up very often for people since it seems like a really popular configuration option.
push.default simple, push.default current
These push.default options tell git push to automatically push the current branch to a remote branch with the same name.
- push.default simple is the default in Git. It only works if your branch is already tracking a remote branch
- push.default current is similar, but it’ll always push the local branch to a remote branch with the same name.
- push.autoSetupRemote and push.default simple together seem to do basically the same thing as push.default current
current seems like a good setting if you’re confident that you’re never going to accidentally make a local branch with the same name as an unrelated remote branch. Lots of people have branch naming conventions (like julia/my-change) that make this kind of conflict very unlikely, or just have few enough collaborators that branch name conflicts probably won’t happen.
init.defaultBranch main
Create a main branch instead of a master branch when creating a new repo.
commit.verbose true
This adds the whole commit diff in the text editor where you’re writing your commit message, to help you remember what you were doing.
rerere.enabled true
This enables rerere (“reuse recorded resolution”), which remembers how you resolved merge conflicts during a git rebase and automatically resolves conflicts for you when it can.
help.autocorrect 10
By default git’s autocorrect tries to check for typos (like git ocmmit), but won’t actually run the corrected command.
If you want it to run the suggestion automatically, you can set help.autocorrect to 1 (run after 0.1 seconds), 10 (run after 1 second), immediate (run immediately), or prompt (run after prompting).
core.pager delta
The “pager” is what git uses to display the output of git diff, git log, git show, etc. People set it to:
- delta (a fancy diff viewing tool with syntax highlighting)
- less -x5,9 (sets tabstops, which I guess helps if you have a lot of files with tabs in them?)
- less -F -X (not sure about this one, -F seems to disable the pager if everything fits on one screen, but my git seems to do that already anyway)
- cat (to disable paging altogether)
I used to use delta but turned it off because somehow I messed up the colour scheme in my terminal and couldn’t figure out how to fix it. I think it’s a great tool though.
I believe delta also suggests that you set up interactive.diffFilter delta --color-only to syntax highlight code when you run git add -p.
diff.algorithm histogram
Git’s default diff algorithm often handles functions being reordered badly. For example look at this diff:
-.header {
+.footer {
     margin: 0;
 }
-.footer {
+.header {
     margin: 0;
+    color: green;
 }
I find it pretty confusing. But with diff.algorithm histogram, the diff looks like this instead, which I find much clearer:
-.header {
-    margin: 0;
-}
-
 .footer {
     margin: 0;
 }
+.header {
+    margin: 0;
+    color: green;
+}
Some folks also use patience, but histogram seems to be more popular. When to Use Each of the Git Diff Algorithms has more on this.
core.excludesfile: a global .gitignore
core.excludesFile = ~/.gitignore lets you set a global gitignore file that applies to all repositories, for things like .idea or .DS_Store that you never want to commit to any repo. It defaults to ~/.config/git/ignore.
includeIf: separate git configs for personal and work
Lots of people said they use this to configure different email addresses for personal and work repositories. You can set it up something like this:
[includeIf "gitdir:~/code/<work>/"]
    path = "~/code/<work>/.gitconfig"
url."git@github.com:".insteadOf 'https://github.com/'
I often accidentally clone the HTTP version of a repository instead of the SSH version and then have to manually go into the repository’s .git/config and edit the remote URL. This seems like a nice workaround: it’ll replace https://github.com in remotes with git@github.com:.
Here’s what it looks like in ~/.gitconfig since it’s kind of a mouthful:
[url "git@github.com:"]
    insteadOf = "https://github.com/"
One person said they use pushInsteadOf instead to only do the replacement for git push because they don’t want to have to unlock their SSH key when pulling a public repo.
A couple of other people mentioned setting insteadOf = "gh:" so they can git remote add gh:jvns/mysite to add a remote with less typing.
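To make the rewriting rule concrete, here’s a small Python model of it. This is a sketch, not git’s code: `rewrite_url` is a made-up helper, and mapping gh: to git@github.com: is my assumption about how people set up that shortcut. The one detail taken from git’s documentation is that when several insteadOf prefixes match, the longest one wins.

```python
from typing import Dict


def rewrite_url(url: str, instead_of: Dict[str, str]) -> str:
    """Apply insteadOf-style rewriting.

    `instead_of` maps a prefix to look for onto the base that replaces it.
    If several prefixes match, the longest match is used.
    """
    matches = [prefix for prefix in instead_of if url.startswith(prefix)]
    if not matches:
        return url
    longest = max(matches, key=len)
    return instead_of[longest] + url[len(longest):]


rules = {
    "https://github.com/": "git@github.com:",
    "gh:": "git@github.com:",  # assumed target for the "gh:" shortcut
}

print(rewrite_url("https://github.com/jvns/mysite", rules))
print(rewrite_url("gh:jvns/mysite", rules))
```

Both calls end up at the same SSH-style URL, and anything that doesn’t start with a configured prefix is left alone.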
fsckobjects: avoid data corruption
A couple of people mentioned this one. Someone explained it as “detect data corruption eagerly. Rarely matters but has saved my entire team a couple times”.
transfer.fsckobjects = true
fetch.fsckobjects = true
receive.fsckObjects = true
I’ve never understood anything about submodules but a couple of people said they like to set:
status.submoduleSummary true
diff.submodule log
submodule.recurse true
I won’t attempt to explain those but there’s an explanation on Mastodon by @unlambda here.
Here’s everything else that was suggested by at least 2 people:
- blame.ignoreRevsFile .git-blame-ignore-revs lets you specify a file with commits to ignore during git blame, so that giant renames don’t mess up your blames
- branch.sort -committerdate, makes git branch sort by most recently used branches instead of alphabetical, to make it easier to find branches. tag.sort taggerdate is similar for tags.
- color.ui false: to turn off colour
- commit.cleanup scissors: so that you can write #include in a commit message without the # being treated as a comment and removed
- core.autocrlf false: on Windows, to work well with folks using Unix
- core.editor emacs: to use emacs (or another editor) to edit commit messages
- credential.helper osxkeychain: use the Mac keychain for managing credentials
- diff.tool difftastic: use difftastic (or meld or nvimdiffs) to display diffs
- diff.colorMoved default: uses different colours to highlight lines in diffs that have been “moved”
- diff.colorMovedWS allow-indentation-change: with diff.colorMoved set, also ignores indentation changes
- diff.context 10: include more context in diffs
- fetch.prune true and fetch.prunetags: automatically delete remote tracking branches that have been deleted
- gpg.format ssh: allow you to sign commits with SSH keys
- log.date iso: display dates as 2023-05-25 13:54:51 instead of Thu May 25 13:54:51 2023
- mergetool.keepBackup false, to get rid of the .orig files that git mergetool leaves behind after a merge conflict
- merge.tool meld (or nvim, or nvimdiff) so that you can use git mergetool to help resolve merge conflicts
- push.followtags true: push new tags along with commits being pushed
- rebase.missingCommitsCheck error: don’t allow deleting commits during a rebase
- rebase.updateRefs true: makes it much easier to rebase multiple stacked branches at a time. Here’s a blog post about it.
I generally set git config options with git config --global NAME VALUE, for example git config --global diff.algorithm histogram. I usually set all of my options globally because it stresses me out to have different git behaviour in different repositories.
If I want to delete an option I’ll edit ~/.gitconfig manually, where they look like this:
[diff]
algorithm = histogram
My git config is pretty minimal, I already had:
init.defaultBranch main
push.autoSetupRemote true
merge.tool meld
diff.colorMoved default (which actually doesn’t even work for me for some reason but I haven’t found the time to debug)
and I added these 3 after writing this blog post:
diff.algorithm histogram
branch.sort -committerdate
merge.conflictstyle zdiff3
I’d probably also set rebase.autosquash if making carefully crafted pull requests with multiple commits were a bigger part of my life right now.
I’ve learned to be cautious about setting new config options – it takes me a long time to get used to the new behaviour and if I change too many things at once I just get confused. branch.sort -committerdate is something I was already using anyway (through an alias), and I’m pretty sold that diff.algorithm histogram will make my diffs easier to read when I reorder functions.
I’m always amazed by how useful it is to just ask a lot of people what stuff they like and then list the most commonly mentioned ones, like with this list of new-ish command line tools I put together a couple of years ago. Having a list of 20 or 30 options to consider feels so much more efficient than combing through a list of all 600 or so git config options.
It was a little confusing to summarize these because git’s default options have actually changed a lot over the years, so people occasionally have options set that were important 8 years ago but today are the default. Also a couple of the experimental options people were using have been removed and replaced with a different version.
I did my best to explain things accurately as of how git works right now in 2024 but I’ve definitely made mistakes in here somewhere, especially because I don’t use most of these options myself. Let me know on Mastodon if you see a mistake and I’ll try to fix it.
I might also ask people about aliases later, there were a bunch of great ones that I left out because this was already getting long.
On large platforms, it's impossible to have policies on things like moderation, spam, fraud, and sexual content that people agree on. David Turner made a simple game to illustrate how difficult this is even in a trivial case, No Vehicles in the Park. If you haven't played it yet, I recommend playing it now before continuing to read this document.
The idea behind the site is that it's very difficult to get people to agree on what moderation rules should apply to a platform. Even if you take a much simpler example, what vehicles should be allowed in a park given a rule and some instructions for how to interpret the rule, and then ask a small set of questions, people won't be able to agree. On doing the survey myself, one of the first reactions I had was that the questions aren't chosen to be particularly nettlesome and there are many edge cases Dave could've asked about if he wanted to make it a challenge. And yet, despite not making the survey particularly challenging, there isn't broad agreement on the questions. Comments on the survey also indicate another problem with rules, which is that it's much harder to get agreement than people think it will be. If you read comments on rule interpretation or moderation on lobsters, HN, reddit, etc., when people suggest a solution, the vast majority of people will suggest something that anyone who's done moderation or paid attention to how moderation works knows cannot work, the moderation equivalent of "I could build that in a weekend"1. Of course we see this on Dave's game as well. The top HN comment, and a very common sentiment elsewhere is2:
I'm fascinated by the fact that my takeaway is the precise opposite of what the author intended.
To me, the answer to all of the questions was crystal-clear. Yes, you can academically wonder whether an orbiting space station is a vehicle and whether it's in the park, but the obvious intent of the sign couldn't be clearer. Cars/trucks/motorcycles aren't allowed, and obviously police and ambulances (and fire trucks) doing their jobs don't have to follow the sign.
So if this is supposed to be an example of how content moderation rules are unclear to follow, it's achieving precisely the opposite.
And someone agreeingly replies with:
Exactly. There is a clear majority in the answers.
After going through the survey, you get a graph showing how many people answered yes and no to each question, which is where the "clear majority" comes from. First of all, I think it's not correct to say that there is a clear majority. But even supposing that there were, there's no reason to think that there being a majority means that most people agree with you even if you take the majority position in each vote. In fact, given how "wiggly" the per-question majority graph looks, it would be extraordinary if it were the case that being in the majority for each question meant that most people agreed with you or that there's any set of positions that the majority of people agree on. Although you could construct a contrived dataset where this is true, it would be very surprising if this were true in a natural dataset.
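To make the per-question vs. per-person distinction concrete, here's a minimal sketch in Python using made-up data, not the survey data. It assumes 10 yes/no questions and simulated respondents who independently give the majority answer to each question 80% of the time; the question count, the 80% figure, the independence assumption, and all of the names in the code are assumptions for illustration. Real answers are correlated, so real numbers will differ, but the basic effect is the same: every question has a clear majority, and yet the most popular full set of answers is held by only about a tenth of respondents (0.8^10 ≈ 0.107 in expectation).

```python
import random
from collections import Counter

# All of these numbers are made up for illustration; this is not the survey data.
NUM_QUESTIONS = 10        # hypothetical number of yes/no questions
P_MAJORITY = 0.8          # assumed chance of giving the majority answer to a question
NUM_RESPONDENTS = 10_000  # hypothetical number of simulated respondents

rng = random.Random(0)  # fixed seed so the run is repeatable

# Each response is a tuple of booleans, one per question.
# True means "gave the majority answer to this question".
# Answers are drawn independently, which real respondents would not do.
responses = [
    tuple(rng.random() < P_MAJORITY for _ in range(NUM_QUESTIONS))
    for _ in range(NUM_RESPONDENTS)
]

# Per-question view: share of respondents on the majority side of each question.
per_question_majority_share = [
    sum(response[q] for response in responses) / NUM_RESPONDENTS
    for q in range(NUM_QUESTIONS)
]

# Per-person view: share of respondents holding the most popular full set of answers.
position_counts = Counter(responses)
top_answers, top_count = position_counts.most_common(1)[0]
top_share = top_count / NUM_RESPONDENTS

print("lowest per-question majority share:", min(per_question_majority_share))
print("share holding the most popular full position:", top_share)
print("expected share under independence:", P_MAJORITY ** NUM_QUESTIONS)
```

The only thing that matters here is the gap between the first number and the second: a comfortable majority on every individual question is compatible with the "obvious" overall position being a small minority.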
If you look at the data (which isn't available on the site, but Dave was happy to pass it along when I asked), as of when I pulled the data, there was no set of answers which the majority of users agreed on and it was not even close. I pulled this data shortly after I posted the link to HN, when the vast majority of responses were from HN readers, who are more homogeneous than the population at large. Despite these factors making it easier to find agreement, the most popular set of answers was only selected by 11.7% of people. This is the position the top commenter says is "obvious", but it's a minority position not only in the sense that only 11.7% of people agree and 88.3% of people disagree, but also in the sense that almost no one holds a position with only a small amount of disagreement from this allegedly obvious position. The 2nd and 3rd most common positions, representing 8.5% and 6.5% of the vote, respectively, are similar and only disagree on whether or not a non-functioning WW-II era tank that's part of a memorial violates the rule. Beyond that, approximately 1% of people hold each of the 4th, 5th, 6th, and 7th most popular positions, with every less popular position having less than 1% agreement, with a fairly rapid drop from there as well. So, 27% of people find themselves in agreement with significantly more than 1% of other users (the median user agrees with 0.16% of other users). See below for a plot of what this looks like. The opinions are sorted from most popular to least popular, with the most popular on the left. A log scale is used because there's so little agreement on opinions that a linear scale plot looks like a few points above zero followed by a bunch of zeros.
In response to the same comment, Michael Chermside had the reasonable but not highly upvoted comment,
To me, the answer to all of the questions was crystal-clear.
That's not particularly surprising. But you may be asking the wrong question.
If you want to know whether the rules are clear then I think that the right question to ask is not "Are the answers crystal-clear to you?" but "Will different people produce the same answers?".
If we had a sharp drop in the graph at one point then it would suggest that most everyone has the same cutoff; instead we see a very smooth curve as if different people read this VERY SIMPLE AND CLEAR rule and still didn't agree on when it applied.
Many (and probably actually most) people are overconfident when predicting what other people think is obvious and often incorrectly assume that other people will find the same things obvious. This is more true of the highly-charged issues that result in bitter fights about moderation than the simple "no vehicles in the park" example, but even this simple example demonstrates not only the difficulty in reaching agreement, but the difficulty in understanding how difficult it is to reach agreement.
To use an example from another context that's more charged, consider any sport and whether or not a player is considered to be playing fair or is making dirty plays and should be censured. We could look at many different players from many different sports, so let's arbitrarily pick Draymond Green. If you ask any serious basketball fan who's not a Warriors fan, who's the dirtiest player in the NBA today, you'll find general agreement that it's Draymond Green (although some people will argue for Dillon Brooks, so if you want near uniform agreement, you'll have to ask for the top two dirtiest players). And yet, if you ask a Warriors fan about Draymond, most have no problem explaining away every dirty play of his. So if you want to get uniform agreement to a question that's much more straightforward than the "no vehicles in the park" question, such as, "is it ok to stomp on another player's chest and then use them as a springboard to leap into the air? on top of a hundred other dirty plays", you'll find that for many such seemingly obvious questions, a sizable group of people will have extremely strong disagreements with the "obvious" answer. When you move away from a contrived, abstract, example like "no vehicles in the park" to a real-world issue that people have emotional attachments to, it generally becomes impossible to get agreement even in cases where disinterested third parties would all agree, which we observed is already impossible even without emotional attachment. And when you move away from sports into issues people care even more strongly about, like politics, the disagreements get stronger.
While people might be able to "agree to disagree" on whether or not a non-functioning WW-II era tank that's part of a memorial violates the "no vehicles in the park" rule (resulting in a pair of positions that accounts for 15% of the vote), in reality, people often have a hard time agreeing to disagree over what outsiders would consider very small differences of opinion. Charged issues are often fractally contentious, causing disagreement among people who hold all but identical opinions, making them significantly more difficult to agree on than our "no vehicles in the park" example.
To pick a real-world example, consider Jo Freeman, who is probably best known among tech folks for writing "The Tyranny of Structurelessness" and who was also a feminist who, in 1976, wrote about her experience of being canceled for minute differences in opinion and how this was unfortunately common in the Movement (using the term "trashed" and not "canceled" because cancellation hadn't come into common usage yet and, in my opinion, "trashed" is the better term anyway). In the nearly fifty years since Jo Freeman wrote "Trashing", the propensity of humans to pick on minute differences and attempt to destroy anyone who doesn't completely agree with them hasn't changed; for a recent, parallel example, see Natalie Wynn's similar experience.
For people with opinions far away in the space of commonly held opinions, the differences in opinion between Natalie and the people calling for her to be deplatformed are fairly small. But, not only did these "small" differences in opinion result in people calling for Natalie to be deplatformed, they called for her to be physically assaulted, doxed, etc., and they suggested the same treatment for her friends and associates as well as people who didn't really associate with her, but publicly talked about similar topics and didn't cancel her. Even now, years later, she still gets calls to be deplatformed and I expect this will continue past the end of my life (when I wrote this, I did a Twitter search and found a long thread from someone ranting about what a horrible human being Natalie is for the alleged transgression discussed in the video, dated 10 days ago, and it's easy to find more of these rants). I'm not going to attempt to describe the difference in positions because the positions are close enough that, to describe them would take something like 5k to 10k words (as opposed to, say, a left-wing vs. a right-wing politician, where the difference is blatant enough that you can describe in a sentence or two); you can watch the hour in the 1h40m video that's dedicated to the topic if you want to know the full details.
The point here is just that, if you look at almost any person who has public opinions on charged issues, the opinion space is fractally contentious enough that no large platform can satisfy user preferences because users will disagree over what content should be moderated off the platform and what content should be allowed. And, of course, this problem scales up as the platform gets larger3.
If you're looking for work, Freshpaint is hiring in engineering, sales, and recruiting. Disclaimer: I may be biased since I'm an investor, but they seem to have found product-market fit and are rapidly growing.
Thanks to Peter Bhat Harkins, Dan Gackle, Laurence Tratt, Gary Bernhardt, David Turner, Kevin Burke, and Bert Muthalaly for comments/corrections/discussion.
Something I've repeatedly seen on every forum I've been on is the suggestion that we just don't need moderation after all and all our problems will be solved if we just stop this nasty censorship. If you want a small forum that's basically 4chan, then no moderation can work fine, but even if you want a big platform that's like 4chan, no moderation doesn't actually work. If we go back to those Twitter numbers, 300M users and 1M bots removed a day, if you stop doing this kind of "censorship", the platform will quickly fill up with bots to the point that everything you see will be spam/scam/phishing content or content from an account copying content from somewhere else or using LLM-generated content to post spam/scam/phishing content. Not only will most accounts be bots, bots will be a part of large engagement/voting rings that will drown out all human content.
The next most naive suggestion is to stop downranking memes, dumb jokes, etc., often thrown in with a comment like "doesn't anyone here have a sense of humor?". If you look at why forums with upvoting/ranking ban memes, it generally happens after the forum becomes totally dominated by memes/comics because people upvote those at a much higher rate than any kind of content with a bit of nuance, and not everyone wants a forum that's full of the lowest common denominator meme/comic content. And as for "having a sense of humor" in comments, if you look at forums that don't ban cheap humor, top comments will generally end up dominated by these, e.g., for maybe 3-6 months, one of the top comments on any kind of story about a man doing anything vaguely heroic on reddit forums that don't ban this kind of cheap humor was some variant of "I'm surprised he can walk with balls that weigh 900 lbs.", often repeated multiple times by multiple users, amidst a sea of the other cheap humor that was trendy during that period. Of course, some people actually want that kind of humor to dominate the comments, they actually want to see the same comment 150 times a day for months on end, but I suspect most people who grumpily claim "no one has a sense of humor here" when their cheap humor gets flagged don't actually want to read a forum that's full of the trendy quips of the month.
Nowadays, it's trendy to use "federation" as a cure-all in the same way people used "blockchain" as a cure-all five years ago, but federation doesn't solve this problem for the typical user. I actually had a conversation with someone who notes in their social media bio that they're one of the creators of the ActivityPub spec, who claimed that federation does solve this problem and that Threads adding ActivityPub would create some kind of federating panacea. I noted that fragmentation is already a problem for many users on Mastodon and whether or not Threads will be blocked is contentious and will only increase fragmentation, and the ActivityPub guy replied with something like "don't worry about that, most people won't block Threads, and it's their problem if they do."
I noted that a problem many of my non-technical friends had when they tried Mastodon was that they'd pick a server and find that they couldn't follow someone they wanted to follow due to some kind of server blocking or ban. So then they'd try another server to follow this one person and then find that another person they wanted to follow is blocked. The fundamental problem is that users on different servers want different things to be allowed, which then results in no server giving you access to everything you want to see. The ActivityPub guy didn't have a response to this and deleted his comment.