1318 stories
·
102 followers

Cruel Luxuries

1 Share
Read the whole story
wmorrell
40 days ago
reply
Share this story
Delete

Is Microsoft trying to commit suicide?

4 Shares

The breaking tech news this year has been the pervasive spread of "AI" (or rather, statistical modeling based on hidden layer neural networks) into everything. It's the latest hype bubble now that Cryptocurrencies are no longer the freshest sucker-bait in town, and the media (who these days are mostly stenographers recycling press releases) are screaming at every business in tech to add AI to their product.

Well, Apple and Intel and Microsoft were already in there, but evidently they weren't in there enough, so now we're into the silly season with Microsoft's announcement of CoPilot plus Recall, the product nobody wanted.

CoPilot+ is Microsoft's LLM-based add-on for Windows, sort of like 2000's Clippy the Talking Paperclip only with added hallucinations. Clippy was rule-based: a huge bundle of IF ... THEN statements hooked together like a 1980s Expert System to help users accomplish what Microsoft believed to be common tasks, but which turned out to be irritatingly unlike anything actual humans wanted to accomplish. Because CoPilot+ is purportedly trained on what users actually do, it looked plausible to someone in marketing at Microsoft that it could deliver on "help the users get stuff done". Unfortunately, human beings assume that LLMs are sentient and understand the questions they're asked, rather than being unthinking statistical models that cough up the highest probability answer-shaped object generated in response to any prompt, regardless of whether it's a truthful answer or not.

Anyway, CoPilot+ is also a play by Microsoft to sell Windows on ARM. Microsoft don't want to be entirely dependent on Intel, especially as Intel's share of the global microprocessor market is rapidly shrinking, so they've been trying to boost Windows on ARM to orbital velocity for a decade now. The new CoPilot+ branded PCs going on sale later this month are marketed as being suitable for AI (spot the sucker-bait there?) and have powerful new ARM processors from Qualcomm, which are pitched as "Macbook Air killers", largely because they're playing catch-up with Apple's M-series ARM-based processors in terms of processing power per watt and having an on-device coprocessor optimized for training neural networks.

Having built the hardware and the operating system Microsoft faces the inevitable question, why would a customer want this stuff? And being Microsoft, they took the first answer that bubbled up from their in-company echo chamber and pitched it at the market as a forced update to Windows 11. And the internet promptly exploded.

First, a word about Apple. Apple have been quietly adding AI features to macOS and iOS for the past several years. In fact, they got serious about AI in 2015, and every Apple Silicon processor they've released since 2016 has had a neural engine (an AI coprocessor) on board. Now that the older phones and laptops are hitting end of life, the most recent operating system releases are rolling out AI-based features. For example, there's on-device OCR for text embedded in any image. There's a language translation service for the OCR output, too. I can point my phone at a brochure or menu in a language I can't read, activate the camera, and immediately read a surprisingly good translation: this is an actually useful feature of AI. (The ability to tag all the photos in my Photos library with the names of people present in them, and to search for people, is likewise moderately useful: the jury is still out on the pet recognition, though.) So the Apple roll-out of AI has so far been uneventful and unobjectionable, with a focus on identifying things people want to do and making them easier.

Microsoft Recall is not that.

"Hey, wouldn't it be great if we could use AI in Windows to help our users see everything they've ever done on their computer?" Is a great pitch, and Recall kinda-sorta achieves this. But the implementation is soemthing rather different. Recall takes snapshots of all the windows on a Windows computer's screen (except the DRM'd media, because the MPAA must have their kilo of flesh) and saves them locally. The local part is good: the term for software that takes regular screenshots and saves them in the cloud is "part of a remote access trojan". It then OCRs any text in the images, and I believe also transcribes any speech, and saves the resulting output in an unencrypted SQLite database stored in:

C:\Users\$USER\AppData\Local\CoreAIPlatform.00\UKP{GUID}

And there are tools already out there to slurp through the database and see what's in it, such as TotalRecall.

Surprise! It turns out that the unencrypted database and the stored images may contain your user credentials and passwords. And other stuff. Got a porn habit? Congratulations, anyone with access to your user account can see what you've been seeing. Use a password manager like 1Password? Sorry, your 1Password passwords are probably visible via Recall, now.

Now, "unencrypted" is relative; the database is stored on a filesystem which should be encrypted using Microsoft's BitLocker. But anyone with credentials for your Microsoft account can decrypt it and poke around. Indeed, anyone with access to your PC, unlocked, has your entire world at their fingertips.

But this is an utter privacy shit-show. Victims of domestic abuse are at risk of their abuser trawling their PC for any signs that they're looking for help. Anyone who's fallen for a scam that gave criminals access to their PC is also completely at risk.

Worse: even if you don't use Recall, if you send an email or instant message to someone else who does then it will be OCRd and indexed via Recall: and preserved for posterity.
Now imagine the shit-show when this goes corporate.

And it turns out that Microsoft is pushing this feature into the latest update of Windows 11 for all compatible hardware and making it impossible to remove or disable, because that tactic has worked so well for them in the past at driving the uptake of new technologies that Microsoft wanted its ~~customers~~ victims to start using. Like, oh, Microsoft Internet Explorer back in 2001, and remember how well that worked out for them.

Suddenly every PC becomes a target for Discovery during legal proceedings. Lawyers can subpoena your Recall database and search it, no longer being limited to email but being able to search for terms that came up in Teams or Slack or Signal messages, and potentially verbally via Zoom or Skype if speech-to-text is included in Recall data.

It's a shit-show for any organization that handles medical records or has a duty of legal confidentiality; indeed, for any business that has to comply with GDPR (how does Recall handle the Right to be Forgotten? In a word: badly), or HIPAA in the US. This misfeature contravenes privacy law throughout the EU (and in the UK), and in healthcare organizations everywhere which has a medical right to privacy. About the only people whose privacy it doesn't infringe are the Hollywood studios and Netflix, which tells you something about the state of things.

Recall is already attracting the attention of data protection regulators; I suspect in its current form it's going to be dead on arrival, and those CoPilot+ PCs due to launch on June 18th are going to get a hurried overhaul. It's also going to be interesting to see what Apple does, or more importantly doesn't announce at WWDC next week, which is being trailed as the year when Apple goes all-in on AI.

More to the point, though, Windows Recall blows a hole under the waterline of Microsoft's trustworthiness. Microsoft "got serious" about security earlier this decade, around the time Steve Balmer stepped down as CEO, and managed to recover somwhat from having a reputation for taking a slapdash approach to its users data. But they've been going backwards since 2020, with dick moves like disabling auto-save to local files in Microsoft Word (your autosave data only autosaves to OneDrive), slurping all incoming email for accounts accessed via Microsoft Outlook into Microsoft's own cloud for AI training purposes (ask the Department of Justice how they feel about Microsoft potentially having access to the correspondence for all their investigations in progress), and now this. Recall undermines trust, and once an institution loses trust it's really hard to regain it.

Some commentators are snarking that Microsoft really really wants to make 2025 the year of Linux on the Desktop, and it's kind of hard to refute them right now.

Read the whole story
wmorrell
51 days ago
reply
denubis
52 days ago
reply
Share this story
Delete

autoconf makes me think we stopped evolving too soon

1 Comment and 2 Shares

I've gotten a few bits of feedback asking for my thoughts and/or reactions to the whole "xz backdoor" thing that happened over the past couple of days. Most of my thoughts on the matter apply to autoconf and friends, and they aren't great.

I don't have to cross paths with those tools too often these days, but there was a point quite a while back when I was constantly building things from source, and a ./configure --with-this --with-that was a given. It was a small joy when the thing let me reuse the old configure invocation so I didn't have to dig up the specifics again.

I got that the whole reason for autoconf's derpy little "recipes" is that you want to know if the system you're on supports X, or can do Y, or exactly what flavor of Z it has, so you can #ifdef around it or whatever. It's not quite as relevant today, but sure, there was once a time when a great many Unix systems existed and they all had their own ways of handling stuff, and no two were the same.

So, okay, fine, at some point it made sense to run programs to empirically determine what was supported on a given system. What I don't understand is why we kept running those stupid little shell snippets and little bits of C code over and over. It's like, okay, we established that this particular system does <library function foobar> with two args, not three. So why the hell are we constantly testing for it over and over?

Why didn't we end up with a situation where it was just a standard thing that had a small number of possible values, and it would just be set for you somewhere? Whoever was responsible for building your system (OS company, distribution packagers, whatever) could leave something in /etc that says "X = flavor 1, Y = flavor 2" and so on down the line.

And, okay, fine, I get that there would have been all kinds of "real OS companies" that wouldn't have wanted to stoop to the level of the dirty free software hippies. Whatever. Those same hippies could have run the tests ONCE per platform/OS combo, put the results into /etc themselves, and then been done with it.

Then instead of testing all of that shit every time we built something from source, we'd just drag in the pre-existing results and go from there. It's not like the results were going to change on us. They were a reflection of the way the kernel, C libraries, APIs and userspace happened to work. Short of that changing, the results wouldn't change either.

But no, we never got to that point, so it's still normal to ship a .tar.gz with an absolute crap-ton of dumb little macro files that run all kinds of inscrutable tests that give you the same answers that they did the last time they ran on your machine or any other machine like yours, and WILL give the same answers going forward.

That means it's totally normal to ship all kinds of really crazy looking stuff, and so when someone noticed that and decided to use that as their mechanism for extracting some badness from a so-called "test file" that was actually laden with their binary code, is it so surprising that it happened? To me, it seems inevitable.

Incidentally, I want to see what happens if people start taking tarballs from various projects and diff them against the source code repos for those same projects. Any file that "appears" in the tarball that's allegedly due to auto[re]conf being run on the project had better match something from the actual trees of autoconf, automake, ranlib, gettext, or whatever else goofy meta-build stuff is being used these days.

$ find . -type f | sort | xargs sha1sum
7d963e5f46cd63da3c1216627eeb5a4e74a85cac  ./ax_pthread.m4
c86c8f8a69c07fbec8dd650c6604bf0c9876261f  ./build-to-host.m4
0262f06c4bba101697d4a8cc59ed5b39fbda4928  ./getopt.m4
e1a73a44c8c042581412de4d2e40113407bf4692  ./gettext.m4
090a271a0726eab8d4141ca9eb80d08e86f6c27e  ./host-cpu-c-abi.m4
961411a817303a23b45e0afe5c61f13d4066edea  ./iconv.m4
46e66c1ed3ea982b8d8b8f088781306d14a4aa9d  ./intlmacosx.m4
ad7a6ffb9fa122d0c466d62d590d83bc9f0a6bea  ./lib-ld.m4
7048b7073e98e66e9f82bb588f5d1531f98cd75b  ./lib-link.m4
980c029c581365327072e68ae63831d8c5447f58  ./lib-prefix.m4
d2445b23aaedc3c788eec6037ed5d12bd0619571  ./libtool.m4
421180f15285f3375d6e716bff269af9b8df5c21  ./lt~obsolete.m4
f98bd869d78cc476feee98f91ed334b315032c38  ./ltoptions.m4
530ed09615ee6c7127c0c415e9a0356202dc443e  ./ltsugar.m4
230553a18689fd6b04c39619ae33a7fc23615792  ./ltversion.m4
240f5024dc8158794250cda829c1e80810282200  ./nls.m4
f40e88d124865c81f29f4bcf780512718ef2fcbf  ./po.m4
f157f4f39b64393516e0d5fa7df8671dfbe8c8f2  ./posix-shell.m4
4965f463ea6a379098d14a4d7494301ef454eb21  ./progtest.m4
15610e17ef412131fcff827cf627cf71b5abdb7e  ./tuklib_common.m4
166d134feee1d259c15c0f921708e7f7555f9535  ./tuklib_cpucores.m4
e706675f6049401f29fb322fab61dfae137a2a35  ./tuklib_integer.m4
41f3f1e1543f40f5647336b0feb9d42a451a11ea  ./tuklib_mbstr.m4
b34137205bc9e03f3d5c78ae65ac73e99407196b  ./tuklib_physmem.m4
f1088f0b47e1ec7d6197d21a9557447c8eb47eb9  ./tuklib_progname.m4
86644b5a38de20fb43cc616874daada6e5d6b5bb  ./visibility.m4
$ 

... there's no build-to-host.m4 with that sha1sum out there, *except* for the bad one in the xz release. That part was caught... but what about every other auto* blob in every other project out there? Who or what is checking those?

And finally, yes, I'm definitely biased. My own personal build system has a little file that gets installed on a machine based on how the libs and whatnot work on it. That means all of the Macs of a particular version of the OS get the same file. All of the Debian boxes running the same version get the same file, and so on down the line.

I don't keep asking the same questions every time I go to build stuff. That's just madness.

Read the whole story
wmorrell
108 days ago
reply
Inertia is real. Seems at least part of it is so many people deciding to move on to new languages and toolchains, and evolving there … instead of trying to cater to build systems trying to keep compatibility with massively obsolete systems.
denubis
109 days ago
reply
Share this story
Delete

Death, Lonely Death

1 Comment and 4 Shares

Billions of miles away at the edge of the Solar System, Voyager 1 has gone mad and has begun to die.

Let’s start with the “billions of miles”. Voyager 1 was launched in early September 1977. Jimmy Carter was a hopeful new President. Yugoslavia and the USSR were going concerns, as were American Motors, Pan Am, F.W. Woolworth, Fotomat booths, Borders bookshops, and Pier 1. Americans were watching Happy Days, M*A*S*H and Charlie’s Angels on television; their British cousins were watching George and Mildred, The Goodies, and Tom Baker as the Fourth Doctor. If you turned on the radio, “Hotel California” by The Eagles was alternating with “Dancing Queen” by Abba (and, if we want to be completely honest, “Car Wash” by Rose Royce). Most cars still ran on leaded gasoline, most phones were still rotary dial, and the Internet was a wonky idea that was still a few weeks from a working prototype.

_The Thorn Birds_ was on top of everyone’s bestseller list. The first Apple II home computer had just gone on sale. The Sex Pistols were in the studio wrapping up _Never Mind The Bollocks_; they would tour on it for just three months and then break up, and within another year Sid Vicious would be dead of a heroin overdose. Barack Obama was a high school junior living with his grandparents in Honolulu, Hawaii: his grades were okay, but he spent most of his time hanging with his pot-smoking friends in the “Choom Gang”.  Boris Johnson was tucked away at the elite Ashdown House boarding school while his parents marriage was slowly collapsing: although he was only thirteen, he had already adopted his signature hair style.  Elvis had just died on the toilet a few weeks ago.  It was the summer of Star Wars.

And Voyager 1 was blasting off for a tour of the Solar System.


There’s no way to pack the whole story of Voyager 1 into a single blog post.  Here’s the TLDR: Voyager was the first spacecraft to fly past Jupiter, and the first to take close-up photos of Jupiter’s moons.  It flew on past Saturn, and examined Saturn’s moon Titan, the only moon with an atmosphere.  And then it flew onwards, on and on, for another forty years.  It officially left the Solar System and entered interstellar space in 2012.  It just kept going, further and further into the infinite emptiness.  

(You know about the Golden Record?  Come on, everybody knows about the Golden Record.  It’s kind of hokey and cheesy and also kind of amazing and great.)

Voyager has grown old.  It was never designed for this!  Its original mission was supposed to last a bit over three years.  Voyager has turned out to be much tougher than anyone ever imagined, but time gets us all.  Its power source is a generator full of radioactive isotopes, and those are gradually decaying into inert lead.  Year by year, the energy declines, the power levels  relentlessly fall.  Year by year, NASA has been switching off Voyager’s instruments to conserve that dwindling flicker.  They turned off its internal heater a few years ago, and they thought that might be the end.  But those 1970s engineers built to last, and the circuitry and the valves kept working even as the temperature dropped down, down, colder than dry ice, colder than liquid nitrogen, falling towards absolute zero.  

(Voyager stored its internal data on a digital tape recorder.  Yes, a tape recorder, storing information on magnetic tape.  It wasn’t designed to function at a hundred degrees below zero.  It wasn’t designed to work for decades, winding and rewinding, endlessly re-writing data.  But it did.)

Voyager kept going, and kept going, until it was over 15 billion kilometers away.  At the speed of light, the Moon is one and a half seconds away.  The Sun is about 8 minutes away.  Voyager is twenty-two hours away.  Send a radio signal to it at lunch on Monday, and you’ll get a response back Wednesday morning.

* * *

I could go on at great length about Voyager — the discoveries it has made, how amazing it has all been, the Deep Space Network that has maintained contact over the decades, the ever shrinking crew of aging technicians keeping it alive on a shoestring budget. But I’ll restrict myself to just this:  the Pale Blue Dot.


Dark grey and black static with coloured vertical rays of sunlight over part of the image. A small pale blue point of light is barely visible.


In 1990, just before Voyager’s camera shut down forever, the probe turned around and looked backwards.  It zoomed in and took a picture of Earth.  But by that time, it was so far away that Earth was just a single pale blue pixel.  Look at the right-most band of light.  A little past halfway down — see that speck?  It’s not a defect.  It’s not something on your screen.  That’s the Earth.

“That’s here. That’s home. That’s us. On it everyone you love, everyone you know, everyone you ever heard of, every human being who ever was, lived out their lives. The aggregate of our joy and suffering, thousands of confident religions, ideologies, and economic doctrines, every hunter and forager, every hero and coward, every creator and destroyer of civilization, every king and peasant, every young couple in love, every mother and father, hopeful child, inventor and explorer, every teacher of morals, every corrupt politician, every “superstar,” every “supreme leader,” every saint and sinner in the history of our species lived there – on a mote of dust suspended in a sunbeam.” — Carl Sagan

Voyager kept going for another 34 years after that photo.  It’s still going.  It has left the grip of the Sun’s gravity, so it’s going to fall outward forever. 

* * *

Here’s a bit of trivia: Voyager 1 currently holds the record for most distant active spacecraft.  It’s not even close.  The only other contender is Voyager’s little sister, Voyager 2, which had a different mission profile and so lags billions of kilometers behind their older sibling.

Here’s another bit of trivia:  if you’re reading this in 2024?  It’s very unlikely that you will live to see that record broken.  There are only two other spacecraft outside the Solar System — Voyager 2 and New Horizons.  Both of them are going to die before they get as far as Voyager 1.  And nobody — not NASA, not the Chinese, not the EU — is currently planning to launch another spacecraft to those distances.  In theory we could.  In practice, we have other priorities.

* * *

We thought we knew how Voyager would end.  The power would gradually, inevitably, run down.  The instruments would shut off, one by one.  The signal would get fainter.  Eventually either the last instrument would fail for lack of power, or the signal would be lost.

We didn’t expect that it would go mad.

In December 2023, Voyager started sending back gibberish instead of data.  A software glitch, though perhaps caused by an underlying hardware problem; a cosmic ray strike, or a side effect of the low temperatures, or just aging equipment randomly causing some bits to flip.

The problem was, the gibberish was coming from the flight direction software — the operating system, as it were.  And no copy of that operating system remained in existence on Earth.

(This is a problem NASA long since solved.  These days, every space probe that launches, leaves a perfect duplicate back on Earth.  Remember in “The Martian”, how they had another copy of Pathfinder sitting under a tarp in a warehouse?  That’s accurate.  It’s been standard practice for 30 years.  But back in 1977, nobody had thought of that yet.)

Voyager Mission Control used to be a couple of big rooms full of busy people, computers, giant screens.  Now it’s a single room in a small office building in the San Gabriel Valley, in between a dog training school and a McDonalds.  The Mission Control team is a handful of people, none of them young, several well past retirement age. 

And they’re trying to fix the problem.  But right now, it doesn’t look good.  You can’t just download a new OS from 15 billion kilometers away.  (For starters, there isn’t the bandwidth.)  They would have to figure out the problem, figure out if a workaround is possible, and then apply it… all with a round-trip time of 45 hours for every communication with a probe that is flying away from us at a million miles a day.  They’re trying, but nobody likes their odds.

So at some point — not tomorrow, not next week, but at some point in the next few months — they’ll probably have to admit defeat.  And then they’ll declare Voyager 1 officially over, dead and done, the end of a long song.

And that’s all.

 

Read the whole story
wmorrell
158 days ago
reply
Share this story
Delete
1 public comment
DGA51
158 days ago
reply
Deep space 1.
Central Pennsyltucky

Popular git config options

6 Shares

Hello! I always wish that command line tools came with data about how popular their various options are, like:

  • “basically nobody uses this one”
  • “80% of people use this, probably take a look”
  • “this one has 6 possible values but people only really use these 2 in practice”

So I asked about people’s favourite git config options on Mastodon:

what are your favourite git config options to set? Right now I only really have git config push.autosetupremote true and git config init.defaultBranch main set in my ~/.gitconfig, curious about what other people set

As usual I got a TON of great answers and learned about a bunch of very popular git config options that I’d never heard of.

I’m going to list the options, starting with (very roughly) the most popular ones. Here’s a table of contents:

All of the options are documented in man git-config, or this page.

pull.ff only or pull.rebase true

These two were the most popular. These both have similar goals: to avoid accidentally creating a merge commit when you run git pull on a branch where the upstream branch has diverged.

  • pull.rebase true is the equivalent of running git pull --rebase every time you git pull
  • pull.ff only is the equivalent of running git pull --ff-only every time you git pull

I’m pretty sure it doesn’t make sense to set both of them at once, since --ff-only overrides --rebase.

Personally I don’t use either of these since I prefer to decide how to handle that situation every time, and now git’s default behaviour when your branch has diverged from the upstream is to just throw an error and ask you what to do (very similar to what git pull --ff-only does).

merge.conflictstyle zdiff3

Next: making merge conflicts more readable! merge.conflictstyle zdiff3 and merge.conflictstyle diff3 were both super popular (“totally indispensable”).

The main idea is The consensus seemed to be “diff3 is great, and zdiff3 (which is newer) is even better!”.

So what’s the deal with diff3. Well, by default in git, merge conflicts look like this:

<<<<<<< HEAD
def parse(input):
    return input.split("\n")
=======
def parse(text):
    return text.split("\n\n")
>>>>>>> somebranch

I’m supposed to decide whether input.split("\n") or text.split("\n\n") is better. But how? What if I don’t remember whether \n or \n\n is right? Enter diff3!

Here’s what teh same merge conflict look like with merge.conflictstyle diff3 set:

<<<<<<< HEAD
def parse(input):
    return input.split("\n")
||||||| b9447fc
def parse(input):
    return input.split("\n\n")
=======
def parse(text):
    return text.split("\n\n")
>>>>>>> somebranch

This has extra information: now the original version of the code is in the middle! So we can see that:

  • one side changed \n\n to \n
  • the other side renamed input to text

So presumably the correct merge conflict resolution is return text.split("\n"), since that combines the changes from both sides.

I haven’t used zdiff3, but a lot of people seem to think it’s better. The blog post Better Git Conflicts with zdiff3 talks more about it.

rebase.autosquash true

Autosquash was also a new feature to me. The goal is to make it easier to modify old commits.

Here’s how it works:

  • You have a commit that you would like to be combined with some commit that’s 3 commits ago, say add parsing code
  • You commit it with git commit --fixup OLD_COMMIT_ID, which gives the new commit the commit message fixup! add parsing code
  • Now, when you run git rebase --autosquash main, it will automatically combine all the fixup! commits with their targets

rebase.autosquash true means that --autosquash always gets passed automatically to git rebase.

rebase.autostash true

This automatically runs git stash before a git rebase and git stash pop after. It basically passes --autostash to git rebase.

Personally I’m a little scared of this since it potentially can result in merge conflicts after the rebase, but I guess that doesn’t come up very often for people since it seems like a really popular configuration option.

push.default simple, push.default current

These push.default options tell git push to automatically push the current branch to a remote branch with the same name.

  • push.default simple is the default in Git. It only works if your branch is already tracking a remote branch
  • push.default current is similar, but it’ll always push the local branch to a remote branch with the same name.
  • push.autoSetupRemote and push.default simple together seem to do basically the same thing as push.default current

current seems like a good setting if you’re confident that you’re never going to accidentally make a local branch with the same name as an unrelated remote branch. Lots of people have branch naming conventions (like julia/my-change) that make this kind of conflict very unlikely, or just have few enough collaborators that branch name conflicts probably won’t happen.

init.defaultBranch main

Create a main branch instead of a master branch when creating a new repo.

commit.verbose true

This adds the whole commit diff in the text editor where you’re writing your commit message, to help you remember what you were doing.

rerere.enabled true

This enables rerere (”reuse recovered resolution”), which remembers how you resolved merge conflicts during a git rebase and automatically resolves conflicts for you when it can.

help.autocorrect 10

By default git’s autocorrect try to check for typos (like git ocmmit), but won’t actually run the corrected command.

If you want it to run the suggestion automatically, you can set help.autocorrect to 1 (run after 0.1 seconds), 10 (run after 1 second), immediate (run immediately), or prompt (run after prompting)

core.pager delta

The “pager” is what git uses to display the output of git diff, git log, git show, etc. People set it to:

  • delta (a fancy diff viewing tool with syntax highlighting)
  • less -x5,9 (sets tabstops, which I guess helps if you have a lot of files with tabs in them?)
  • less -F -X (not sure about this one, -F seems to disable the pager if everything fits on one screen if but my git seems to do that already anyway)
  • cat (to disable paging altogether)

I used to use delta but turned it off because somehow I messed up the colour scheme in my terminal and couldn’t figure out how to fix it. I think it’s a great tool though.

I believe delta also suggests that you set up interactive.diffFilter delta --color-only to syntax highlight code when you run git add -p.

diff.algorithm histogram

Git’s default diff algorithm often handles functions being reordered badly. For example look at this diff:

-.header {
+.footer {
     margin: 0;
 }

-.footer {
+.header {
     margin: 0;
+    color: green;
 }

I find it pretty confusing. But with diff.algorithm histogram, the diff looks like this instead, which I find much clearer:

-.header {
-    margin: 0;
-}
-
 .footer {
     margin: 0;
 }

+.header {
+    margin: 0;
+    color: green;
+}

Some folks also use patience, but histogram seems to be more popular. When to Use Each of the Git Diff Algorithms has more on this.

core.excludesfile: a global .gitignore

core.excludeFiles = ~/.gitignore lets you set a global gitignore file that applies to all repositories, for things like .idea or .DS_Store that you never want to commit to any repo. It defaults to ~/.config/git/ignore.

includeIf: separate git configs for personal and work

Lots of people said they use this to configure different email addresses for personal and work repositories. You can set it up something like this:

[includeIf "gitdir:~/code/<work>/"]
path = "~/code/<work>/.gitconfig"

url."git@github.com:".insteadOf 'https://github.com/'

I often accidentally clone the HTTP version of a repository instead of the SSH version and then have to manually go into ~/.git/config and edit the remote URL. This seems like a nice workaround: it’ll replace https://github.com in remotes with git@github.com:.

Here’s what it looks like in ~/.gitconfig since it’s kind of a mouthful:

[url "git@github.com:"]
	insteadOf = "https://github.com/"

One person said they use pushInsteadOf instead to only do the replacement for git push because they don’t want to have to unlock their SSH key when pulling a public repo.

A couple of other people mentioned setting insteadOf = "gh:" so they can git remote add gh:jvns/mysite to add a remote with less typing.

fsckobjects: avoid data corruption

A couple of people mentioned this one. Someone explained it as “detect data corruption eagerly. Rarely matters but has saved my entire team a couple times”.

transfer.fsckobjects = true
fetch.fsckobjects = true
receive.fsckObjects = true

submodule stuff

I’ve never understood anything about submodules but a couple of person said they like to set:

  • status.submoduleSummary true
  • diff.submodule log
  • submodule.recurse true

I won’t attempt to explain those but there’s an explanation on Mastodon by @unlambda here.

and more

Here’s everything else that was suggested by at least 2 people:

  • blame.ignoreRevsFile .git-blame-ignore-revs lets you specify a file with commits to ignore during git blame, so that giant renames don’t mess up your blames
  • branch.sort -committerdate, makes git branch sort by most recently used branches instead of alphabetical, to make it easier to find branches. tag.sort taggerdate is similar for tags.
  • color.ui false: to turn off colour
  • commit.cleanup scissors: so that you can write #include in a commit message without the # being treated as a comment and removed
  • core.autocrlf false: on Windows, to work well with folks using Unix
  • core.editor emacs: to use emacs (or another editor) to edit commit messages
  • credential.helper osxkeychain: use the Mac keychain for managing
  • diff.tool difftastic: use difftastic (or meld or nvimdiffs) to display diffs
  • diff.colorMoved default: uses different colours to highlight lines in diffs that have been “moved”
  • diff.colorMovedWS allow-indentation-change: with diff.colorMoved set, also ignores indentation changes
  • diff.context 10: include more context in diffs
  • fetch.prune true and fetch.prunetags - automatically delete remote tracking branches that have been deleted
  • gpg.format ssh: allow you to sign commits with SSH keys
  • log.date iso: display dates as 2023-05-25 13:54:51 instead of Thu May 25 13:54:51 2023
  • merge.keepbackup false, to get rid of the .orig files git creates during a merge conflict
  • merge.tool meld (or nvim, or nvimdiff) so that you can use git mergetool to help resolve merge conflicts
  • push.followtags true: push new tags along with commits being pushed
  • rebase.missingCommitsCheck error: don’t allow deleting commits during a rebase
  • rebase.updateRefs true: makes it much easier to rebase multiple stacked branches at a time. Here’s a blog post about it.

how to set these

I generally set git config options with git config --global NAME VALUE, for example git config --global diff.algorithm histogram. I usually set all of my options globally because it stresses me out to have different git behaviour in different repositories.

If I want to delete an option I’ll edit ~/.gitconfig manually, where they look like this:

[diff]
	algorithm = histogram

config changes I’ve made after writing this post

My git config is pretty minimal, I already had:

  • init.defaultBranch main
  • push.autoSetupRemote true
  • merge.tool meld
  • diff.colorMoved default (which actually doesn’t even work for me for some reason but I haven’t found the time to debug)

and I added these 3 after writing this blog post:

  • diff.algorithm histogram
  • branch.sort -committerdate
  • merge.conflictstyle zdiff3

I’d probably also set rebase.autosquash if making carefully crafted pull requests with multiple commits were a bigger part of my life right now.

I’ve learned to be cautious about setting new config options – it takes me a long time to get used to the new behaviour and if I change too many things at once I just get confused. branch.sort -committerdate is something I was already using anyway (through an alias), and I’m pretty sold that diff.algorithm histogram will make my diffs easier to read when I reorder functions.

that’s all!

I’m always amazed by how useful to just ask a lot of people what stuff they like and then list the most commonly mentioned ones, like with this list of new-ish command line tools I put together a couple of years ago. Having a list of 20 or 30 options to consider feels so much more efficient than combing through a list of all 600 or so git config options

It was a little confusing to summarize these because git’s default options have actually changed a lot of the years, so people occasionally have options set that were important 8 years ago but today are the default. Also a couple of the experimental options people were using have been removed and replaced with a different version.

I did my best to explain things accurately as of how git works right now in 2024 but I’ve definitely made mistakes in here somewhere, especially because I don’t use most of these options myself. Let me know on Mastodon if you see a mistake and I’ll try to fix it.

I might also ask people about aliases later, there were a bunch of great ones that I left out because this was already getting long.

Read the whole story
acdha
157 days ago
reply
Washington, DC
wmorrell
161 days ago
reply
denubis
161 days ago
reply
Share this story
Delete

The impossibility of agreement on what should be allowed

1 Comment and 2 Shares

On large platforms, it's impossible to have policies on things like moderation, spam, fraud, and sexual content that people agree on. David Turner made a simple game to illustrate how difficult this is even in a trivial case, No Vehicles in the Park. If you haven't played it yet, I recommend playing it now before continuing to read this document.

The idea behind the site is that it's very difficult to get people to agree on what moderation rules should apply to a platform. Even if you take a much simpler example, what vehicles should be allowed in a park given a rule and some instructions for how to interpret the rule, and then ask a small set of questions, people won't be able to agree. On doing the survey myself, one of the first reactions I had was that the questions aren't chosen to be particularly nettlesome and there are many edge cases Dave could've asked about if he wanted to make it a challenge. And yet, despite not making the survey particularly challenging, there isn't broad agreement on the questions. Comments on the survey also indicate another problem with rules, which is that it's much harder to get agreement than people think it will be. If you read comments on rule interpretation or moderation on lobsters, HN, reddit, etc., when people suggest a solution, the vast majority of people will suggest something that anyone who's done moderation or paid attention to how moderation works knows cannot work, the moderation equivalent of "I could build that in a weekend"1. Of course we see this on Dave's game as well. The top HN comment, and a very common sentiment elsewhere is2:

I'm fascinated by the fact that my takeaway is the precise opposite of what the author intended.

To me, the answer to all of the questions was crystal-clear. Yes, you can academically wonder whether an orbiting space station is a vehicle and whether it's in the park, but the obvious intent of the sign couldn't be clearer. Cars/trucks/motorcycles aren't allowed, and obviously police and ambulances (and fire trucks) doing their jobs don't have to follow the sign.

So if this is supposed to be an example of how content moderation rules are unclear to follow, it's achieving precisely the opposite.

And someone agreeingly replies with:

Exactly. There is a clear majority in the answers.

After going through the survey, you get a graph showing how many people answered yes and no to each question, which is where the "clear majority" comes from. First of all, I think it's not correct to say that there is a clear majority. But even supposing that there were, there's no reason to think that there being a majority means that most people agree with you even if you take the majority position in each vote. In fact, given how "wiggly" the per-question majority graph looks, it would be extraordinary if it were the case that being in the majority for each question meant that most people agreed with you or that there's any sert of positions that the majority of people agree on. Although you could construct a contrived dataset where this is true, it would be very surprising if this were true in a natural dataset.

If you look at the data (which isn't available on the site, but Dave was happy to pass it along when I asked), as of when I pulled the data, there was no set of answers which the majority of users agreed on and it was not even close. I pulled this data shortly after I posted on the link to HN, when the vast majority of responses were HN readers, who are more homogeneous than the population at large. Despite these factors making it easier to find agreement, the most popular set of answers was only selected by 11.7% of people. This is the position the top commenter says is "obvious", but it's a minority position not only in the sense that only 11.7% of people agree and 88.3% of people disagree, almost no one holds a position with only a small amount of disagreement from this allegedly obvious position. The 2nd and 3rd most common positions, representing 8.5% and 6.5% of the vote, respectively, are similar and only disagree on whether or not a non-functioning WW-II era tank that's part of a memorial violates the rule. Beyond that, approximately 1% of people hold the 4th, 5th, 6th, and 7th most popular positions, with every less popular position having less than 1% agreement, with a fairly rapid drop from there as well. So, 27% of people find themselves in agreement with significantly more than 1% of other users (the median user agrees with 0.16% of other users). See below for a plot of what this looks like. The opinions are sorted from most popular to least popular, with the most popular on the left. A log scale is used because there's so little agreement on opinions that a linear scale plot looks like a few points above zero followed by a bunch of zeros.

a plot illustrating the previous paragraph

In response to the same comment, Michael Chermside had the reasonable but not highly upvoted comment,

To me, the answer to all of the questions was crystal-clear.

That's not particularly surprising. But you may be asking the wrong question.

If you want to know whether the rules are clear then I think that the right question to ask is not "Are the answers crystal-clear to you?" but "Will different people produce the same answers?".

If we had a sharp drop in the graph at one point then it would suggest that most everyone has the same cutoff; instead we see a very smooth curve as if different people read this VERY SIMPLE AND CLEAR rule and still didn't agree on when it applied.

Many (and probably actually most) people are overconfident when predicting what other people think is obvious and often incorrectly assume that other people will find the same things obvious. This is more true of the highly-charged issues that result in bitter fights about moderation than the simple "no vehicles in the park" example, but even this simple example demonstrates not only the difficulty in reaching agreement, but the difficulty in understanding how difficult it is to reach agreement.

To use an example from another context that's more charged, consider in any sport and whether or not a player is considered to be playing fair or is making dirty plays and should be censured. We could look at many different players from many different sports, so let's arbitrarily pick Draymond Green. If you ask any serious basketball fan who's not a Warriors fan, who's the dirtiest player in the NBA today, you'll find general agreement that it's Draymond Green (although some people will argue for Dillon Brooks, so if you want near uniform agreement, you'll have to ask for the top two dirtiest players). And yet, if you ask a Warriors fan about Draymond, most have no problem explaining away every dirty play of his. So if you want to get uniform agreement to a question that's much more straightforward than the "no vehicles in the park" question, such as, "is it ok to stomp on another player's just and then use them as a springboard to leap into the air? on top of a hundred other dirty plays", you'll find that for many such seemingly obvious questions, a sizable group of people will have extremely strong disagreements with the "obvious" answer. When you move away from a contrived, abstract, example like "no vehicles in the park" to a real-world issue that people have emotional attachments to, it generally becomes impossible to get agreement even in cases where disinterested third parties would all agree, which we observed is already impossible even without emotional attachment. And when you move away from sports into issues people care even more strongly about, like politics, the disagreements get stronger.

While people might be able to "agree to disagree" on whether or not a a non-functioning WW-II era tank that's part of a memorial violates the "no vehicles in the park" rule (giving resulting in a a pair of positions that accounts for 15% of the vote), in reality, people often have a hard time agreeing to disagree over what outsiders would consider very small differences of opinion. Charged issues are often fractally contentious, causing disagreement among people who hold all but identical opinions, making them significantly more difficult to agree on than our "no vehicles in the park" example.

To pick a real-world example, consider Jo Freeman, probably best known among tech folks for writing "The Tyranny of Structurelessness", was also a feminist who, in 1976, wrote about her experienced being canceled for minute differences in opinion and how this was unfortunately common in the Movement (using the term "trashed" and not "canceled" because cancellation hadn't come into common usage yet and, in my opinion, "trashed" is the better term anyway). In the nearly fifty years since Jo Freeman wrote "Trashing", the propensity of humans to pick on minute differences and attempt to destroy anyone who doesn't completely agree with them hasn't changed; for a recent, parallel, example, Natalie Wynn's similar experience.

For people with opinions far away in the space of commonly held opinions, the differences in opinion between Natalie and the people calling for her to be deplatformed are fairly small. But, not only did these "small" differences in opinion result in people calling for Natalie to be deplatformed, they called for her to be physically assaulted, doxed, etc., and they suggested the same treatment suggested for her friends and associates as well as people who didn't really associate with her, but publicly talked about similar topics and didn't cancel her. Even now, years later, she still gets calls to be deplatformed and I expect this will continue past the end of my life (when I wrote this, I did a Twitter search and found a long thread from someone ranting about what a horrible human being Natalie is for the alleged transgression discussed in the video, dated 10 days ago, and it's easy to find more of these rants). I'm not going to attempt to describe the difference in positions because the positions are close enough that, to describe them would take something like 5k to 10k words (as opposed to, say, a left-wing vs. a right-wing politician, where the difference is blatant enough that you can describe in a sentence or two); you can watch the hour in the 1h40m video that's dedicated to the topic if you want to know the full details.

The point here is just that, if you look at almost any person who has public opinions on charged issues, the opinion space is fractally contentious enough that no large platform can satisfy user preferences because users will disagree over what content should be moderated off the platform and what content should be allowed. And, of course, this problem scales up as the platform gets larger3.

If you're looking for work, Freshpaint is hiring in engineering, sales, and recruitingr. Disclaimer: I may be biased since I'm an investor, but they seem to have found product-market fit and are rapidly growing.

Thanks to Peter Bhat Harkins, Dan Gackle, Laurence Tratt, Gary Bernhardt, David Turner, Kevin Burke, and Bert Muthalaly for comments/corrections/discussion.


  1. Something I've repeatedly seen on every forum I've been on is the suggestion that we just don't need moderation after all and all our problems will be solved if we just stop this nasty censorship. If you want a small forum that's basically 4chan, then no moderation can work fine, but even if you want a big platform that's like 4chan, no moderation doesn't actually work. If we go back to those Twitter numbers, 300M users and 1M bots removed a day, if you stop doing this kind of "censorship", the platform will quickly fill up with bots to the point that everything you see will be spam/scam/phishing content or content from an account copying content from somewhere else or using LLM-generated content to post scam/scam/phishing content. Not only will most accounts be bots, bots will be a part of large engagement/voting rings that will drown out all human content.

    The next most naive suggestion is to stop downranking memes, dumb jokes, etc., often throw in with a comment like "doesn't anyone here have a sense of humor?". If you look at why forums with upvoting/ranking ban memes, it generally happens after the forum becomes totally dominated by memes/comics because people upvote those at a much higher rate than any kind of content with a bit of nuance, and not everyone wants a forum that's full of the lowest common denominator meme/comic content. And as for "having a sense of humor" in comments, if you look forums that don't ban cheap humor, top comments will generally end up dominated by these, e.g., for maybe 3-6 months, one the top comments on any kind of story about a man doing anything vaguely heroic on reddit forums that don't ban this kind of cheap was some variant of "I'm surprised he can walk with balls that weigh 900 lbs.", often repeated multiple times by multiple user, amidst a sea of the other cheap humor that was trendy during that period. Of course, some people actually want that kind of humor to dominate the comments, they actually want to see the same comment 150 times a day for months on end, but I suspect most people who grumpily claim "no one has a sense of humor here" when their cheap humor gets flagged don't actually want to read a forum that's full of the trendy quips of the month.

    [return]
  2. This particular commenter indicates that they understand that moderation is, in general, a hard problem; they just don't agree with the "no vehicles in the park" example, but many other people think that both the park example and moderation are easy. [return]
  3. Nowadays, it's trendy to use "federation" as a cure-all in the same way people used "blockchain" as a cure-all five years ago, but federation doesn't solve this problem for the typical user. I actually had a conversation with someone who notes in their social media bio that they're one of the creators of the ActivityPub spec, who claimed that federation does solve this problem and that Threads adding ActivityPub would create some kind of federating panacea. I noted that fragmentation is already a problem for many users on Mastodon and whether or not Threads will be blocked is contentious and will only increase fragmentation, and the ActivityPub guy replied with something like "don't worry about that, most people won't block Threads, and it's their problem if they do.w

    I noted that a problem many of my non-technical friends had when they tried Mastodon was that they'd pick a server and find that they couldn't follow someone they wanted to follow due to some kind of server blocking or ban. So then they'd try another server to follow this one person and then find that another person they wanted to follow is blocked. The fundamental problem is that users on different servers want different things to be allowed, which then results in no server giving you access to everything you want to see. The ActivityPub guy didn't have a response to this and deleted his comment.

    [return]
Read the whole story
wmorrell
166 days ago
reply
Share this story
Delete
1 public comment
LeMadChef
165 days ago
reply
While I generally agree with the sentiment of this article, I don't agree with the assertion that it is IMPOSSIBLE to do content moderation and I don't think the article makes that case. The case I believe this article makes (and I agree with) is that it is NOT EASY or that it is HARD to do content moderation.
Denver, CO
Next Page of Stories