The Gap Through Which We Praise the Machine

In this post I’ll lay out my current theory of agentic programming: people are amazing at adapting the tools they’re given, they vastly underestimate the extent to which they do it, and much of the skill we build doing so is an incidental consequence of how badly the tools are designed.

I’ll first cover some of the drive behind AI assistant adoption in software, the stochastic-looking divide in expectations and satisfaction with these tools, and the desire to figure out an explanation for that phenomenon.

I’ll then look at what successful users seem to do, and explore the type of scaffolding and skills they need to grow to do well with LLMs when coding or implementing features. By borrowing analytical ideas from French ergonomists, I’ll then explain how this extensive adaptive work highlights a gap in interaction design from AI tool builders, which is what makes skill acquisition so tricky.

Basically, things could be much better if we spent less time congratulating machines for the work people do and instead supported people more directly.

Money Claps for Tinkerbell, and so Must You

A few months ago, Charity Majors and I gave the closing plenary talk at SRECon Americas 2025. While we were writing the talk, trying to thread a needle between skepticism and optimism, Charity mentioned one thing I hadn’t understood until then but found enlightening: investors in the industry have already divided companies into two categories, pre-AI and post-AI, and they are asking “what are you going to do to not be beaten by the post-AI companies?”

The usefulness and success of using LLMs are axiomatically taken for granted and the mandate for their adoption can often come from above your CEO. Your execs can be as baffled as anyone else having to figure out where to jam AI into their product. Adoption may be forced to keep board members, investors, and analysts happy, regardless of what customers actually need.

It does not matter whether LLMs can or cannot deliver on what they promise: people calling the shots assume they can, so it’s gonna happen no matter what. I’m therefore going to bypass any discussion of the desirability, sustainability, and ethics of AI here, and jump directly to “well you gotta build with it anyway or find a new job” as a premise. My main focus will consequently be on people who engage with the tech based on these promises, and how they do it. There’s a wide spectrum where at one end you have “true believers,” and at the other you have people convinced of the opposite—that this is all fraudulent shit that can’t work.

In practice, what I’m seeing is a bunch of devs who derive real value from it at certain types of tasks and workflows ranging from copilot-as-autocomplete to full agentic coding, and some who don’t and keep struggling to find ways to add LLMs to their workflows (either because they must due to some top-down mandate, or because they fear they’ll be left behind if they don’t1). I can also find no obvious correlation between where someone lands on that spectrum and things like experience levels; people fall here and there regardless of where they work, how much trust I have in their ability, how good they are at communicating, how much of a hard worker they are, or how willing to learn they might be.

A Theory of Division

So where does that difference come from? It could be easy to assign dissatisfaction to “you just gotta try harder”, or “some people work differently”, or “you go fast now but you are just creating more problems for later.” These all may be true to some degree, and the reality is surely a rich multifactorial mess. We also can’t ignore broader social and non-individual elements like the type of organizational culture people evolve in,2 on top of variations that can be seen within single teams.

My gut feeling is that, on top of all the potential factors already identified, people underestimate their own situatedness (how much they know, interpret, and adjust from “thing I am told to build” and tie that to a richer contextualized “thing that makes sense to build” by being connected participants in the real world and the problem space) and how much active interpretation and steering work they do when using and evaluating coding assistants.3 Those who find the steering process taxing end up having a worse time and blame the machine for negative outcomes; those for whom it feels easy in turn praise the machine for the positive results.

This tolerance for steering is likely moderated or amplified by elements such as how much people trust themselves and how much they trust the AI, how threatened they might feel by it, their existing workflows, the support they might get, and the type of “benchmarks” they choose (also influenced by the preceding factors).4

I’m advancing this theory because the people I’ve seen most excited and effective about agentic work were deeply involved in constantly recognizing and correcting bugs, loops, or dead ends the agent was getting into, steering it away from them, while also adding a bunch of technical safeguards and markers to projects to try and make the agents more effective. When they willingly withheld these efforts, their agents’ token costs would double as the agents kept growing their context windows by repeating the same dead-end patterns; oddities and references to non-existent code would accumulate, and the agents would increasingly do unhinged stuff like removing tests they wrote but could no longer pass.

I’ve seen people take the blame for that erratic behavior on themselves (“oh I should have prompted in that way instead, my bad”), while others would just call out the agent for being stupid or useless.

The early frustration I have seen (and felt) seems to come from hitting these road blocks and sort of going “wow, this sucks and isn’t what was advertised.” If you have more adept users around you, they’ll tell you to try different models, tweak bits of what you do, suggest better prompts, and offer jargon-laden workarounds.

[Comic: a remake of the old “write a map-reduce in Erlang” strip. The first character asks “How do I make the AI learn things?” and is told “It doesn’t, it grows stateless context.” They clarify “OK, it doesn’t. How do I make it remember?” and get “You have to use the LLM as its own MCP server!”, leading to the unchanged original punchline: “Did you just tell me to go fuck myself?” / “... I believe I did, Bob.”]

That gap between “what we are told the AI can do” and “what it actually does out of the box” is significant. To bridge that gap, engineers need to do a lot of work.

The Load-bearing Scaffolding of Effective Users

There are tons of different artifacts, mechanisms, and tips and tricks required to make AI code agents work. To name a few, as suggested by vendors and multiple blog posts, you may want to do things such as:

  • Play and experiment with multiple models, figure out which to use and when, and from which interfaces, all of which can significantly change your experience.
  • Maintain agent-specific configuration files (such as CLAUDE.md, AGENTS.md, or other rule files) that specify project structure, commands, style guidelines, testing strategies, conventions, potential pitfalls, and other information. There can be one or more of them, in multiple locations, and adjusted to specific users (a hypothetical sketch appears below).
  • Optimize your prompts by adding personality or character traits and special role-play instructions, possibly relying on prompt improvers.
  • Install or create MCP servers to extend the abilities of your agents. Examples include file management or source control, but they can also do things like give access to production telemetry data or issue trackers.
  • Use files as memory storage for past efforts made by the agent.
  • Specify checkpoints and manage permissions to influence when user input may be required.
  • Monitor your usage and cost.

There are more options there, and each can branch out into lots of subtle qualitative details: workarounds for code bases too large for the model’s context, defining broader evaluation strategies, working around cut-off dates, ingesting docs, or preferences around specific coding, testing, and interaction methods. Having these artifacts in place can significantly alter someone’s experience. Needing to come up with and maintain these could be framed as increasing the effort required for successful adoption.
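
To make that rule-file bullet more concrete, here’s a minimal sketch of what such a file could contain. It is a made-up example rather than something any vendor prescribes; the filename, the sections, and the commands would all have to be adapted to your own project:

    # AGENTS.md (hypothetical example)

    ## Project layout
    - src/ holds application code, test/ holds the test suite, docs/ holds design notes.

    ## Commands
    - Run `make test` after every change; run `make lint` before committing.

    ## Conventions
    - Prefer small, pure functions; do not edit generated files under gen/.

    ## Known pitfalls
    - Database migrations are ordered by hand; never renumber existing ones.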

I’ve seen people experimenting, even with these elements in place, failing to get good results, and then being met with “yeah, of course, that’s a terrible prompt” followed by suggestions of what to improve (things like “if the current solution works, say it works so the agent does not try to change it”, asking for real examples to try and prevent fake ones, or being more or less polite).

For example, a coworker used a prompt that, among many other instructions, had one line stating “use the newest version of <component> so we can use <feature>”. The agent ignored that instruction and used an older version of the component. My coworker reacted by saying “I set myself up for refactoring by not specifying the exact version.”

From an objective point of view, asking for the newest version of the component is a very specific instruction: only one version is the newest, and the feature that was specified only existed in that version. There is no ambiguity. Saying “version $X.0” is semantically the same. But my coworker knew, from experience, that a version number would yield better results, and took it on themselves to do better next time.

These interactions show that engineers have internalized a complex set of heuristics to guide and navigate the LLM’s idiosyncrasies. That is, they’ve built a mental model of complex and hardly predictable agentic behavior (and of how it all interacts with the set of rules and artifacts and bits of scaffolding they’ve added to their repos and sessions) to best predict what will or won’t yield good results, and then do extra corrective work ahead of time through prompting variations. This is a skill that makes a difference.

That you need to do these things might in fact point at how agentic AI does not behave with cognitive fluency,5 and that the user subtly provides that fluency on its behalf in order to be productive.

Whether you will be willing to provide that skill for the machine may require a mindset or position that I’ll caricature as “I just need to get better”, as opposed to taking a stance of “the LLM needs to get better”. I suspect this stance, whether it is chosen deliberately or not, will influence how much interaction (and course-correcting) one expects to handle while still finding an agent useful or helpful.

I don’t know that engineers even realize they’re doing that type of work, that they’re essential to LLMs working for code, that the tech is fascinating but maybe not that useful without the scaffolding and constant guidance they provide. At least, people who speak of AI replacing engineers probably aren’t fully aware that an engineer assisting an agent may end up doing more work than they would alone, and that the agent would still not do good work without the engineer. AI is normal technology, in that its adoption, propagation, and the efforts to make it work all follow predictable patterns. LLMs, as a piece of tech, mainly offer some unrealized potential.

It may sound demeaning, like I’m implying people lack awareness of their own processes, but it absolutely isn’t. The process of adaptation is often not obvious, even to the people doing it. There are lots of strategies and patterns and behaviors people pick up or develop tacitly as a part of trying to meet goals. Cognitive work that gets deeply ingrained sometimes just feels effortless, natural, and obvious. Unless you’re constantly interacting with newcomers, you forget what you take for granted—you just know what you know and get results.

By extension, my supposition is that those who won’t internalize the idiosyncrasies and the motions of doing the scaffolding work are disappointed far more quickly: they may provide more assistance to the agent than the agent provides to them, and this is seen as the AI failing to improve their usual workflow and to deliver on the wonders advertised by its makers.

The Gap Highlighted Through Adaptive Work

What AI sells is vastly different from what it delivers, particularly what it delivers out of the box. Ergonomists and resilience engineers, through their study of the difference between work-as-imagined (WAI) and work-as-done (WAD), have developed a useful framing device for understanding what’s going on.

Work-as-imagined describes the work as it is anticipated or expected to happen, how it can be specified and described. The work-as-done comprises the work as it is carried out, along with the supporting tasks, deviations, meanings, and their relationships to the prescribed tasks.

By looking at how people turn artifacts they’re given into useful tools, we can make sense of that gap.6 This adjustment ends up transforming both the artifacts (by modifying and configuring them) and the people using them (through learning and by changing their behavior). The difference between the original artifact developed by the people planning the work and the forms that end up effectively used in the field offers a clue to the mismatch between WAI and WAD.

Tying this back to our LLM systems, what is imagined is powerful agents that replace engineers (at least junior ones), make everyone more productive, and turn out to be a total game changer. LLMs are artifacts. The scaffolding we put in place to control them is how we try to transform the artifacts into tools; the learning we do to get better at prompting and interacting with the LLMs is part of how they transform us. If what we have to do to be productive with LLMs is to add a lot of scaffolding and invest effort to gain important but poorly defined skills, we should be able to assume that what we’re sold and what we get are rather different things.

That gap implies that better designed artifacts could have better affordances, and be more appropriate to the task at hand. They would be easier to turn into productive tools. A narrow gap means fewer adaptations are required, and a wider gap implies more of them are needed.

Flipping it around, we have to ask whether the amount of scaffolding and skill required by coding agents is acceptable. If we think it is, then our agent workflows are on the right track. If we’re a bit baffled by all that’s needed to make it work well, we may rightfully suspect that we’re not being sold the right stuff, or at least stuff with the right design.

Bad Interaction Design Demands Greater Coping Skills

I fall in the baffled camp that thinks better designs are possible. In a fundamental sense, LLMs can be assumed to be there to impress you. Their general focus on anthropomorphic interfaces—just have a chat!—makes them charming and misguides us into attributing more agency and intelligence to them than they have, which makes it even more challenging for people to control or use them predictably. Sycophancy, for example, is just one of the many challenges here.

Coding assistants, particularly agents, are narrower in their interface, but they build on a similar interaction model. They aim to look like developers, independent entities that can do the actual work. The same anthropomorphic interface is in place, and we must work even harder to peel back that veneer of agency in order to predict them properly and apply them in a controlled manner.

You can see the outline of this when a coding agent reaches limits it has no awareness of, like when it switches from boilerplate generation (where we’re often fine letting it do its thing) to core algorithms (where we want involvement to avoid major refactors) without proper hand-offs or pauses. Either precise prompting must be done to preempt and handle the mode switch, or we find the agent went too far and we must fix (or rewrite) buggy code rather than being involved at the right time.

And maybe the issue is prompting, maybe it’s the boilerplatey nature of things, maybe it’s because there was not enough training material for your language or framework. Maybe your config files aren’t asking for the right persona, or another model could do better. Maybe it’s that we don’t even know what exactly is the boundary where our involvement is more critical. Figuring that out requires skill, but also it’s kind of painful to investigate as a self-improvement workflow.

Coding agents require scaffolding and learning, and often demand more attention than tools do, but they are built to look like teammates. This makes them both unwieldy tools and lousy teammates. We should either have agents designed to look like a teammate properly act like a teammate, or, barring that, have a tool that behaves like a tool. This is the point I make in AI: Where in the Loop Should Humans Go?, where a dozen questions are offered to evaluate how well this is done.

Key problems that arise when we’re in the current LLM landscape include:

  • AI that aims to improve us can ironically end up deskilling us;
  • Not knowing whether we are improving the computers or augmenting people can lead to unsustainable workflows and demands;
  • We risk putting people in passive supervision and monitoring roles, which is known not to work well;
  • We may artificially constrain and pigeonhole how people approach problems, and reduce the scope of what they can do;
  • We can adopt known anti-patterns in team dynamics that reduce overall system efficiency;
  • We can create structural patterns where people are forced to become accountability scapegoats.

Hazel Weakly comes up with related complaints in Stop Building AI Tools Backwards, where she argues for design centered on collaborative learning patterns (Explain, Demonstrate, Guide, Enhance) to play to the strengths that make people and teams effective, rather than one that reinforces people into being ineffective.

Some people may hope that better models will eventually meet expectations and narrow the gap on their own. My stance is that rather than anchoring coding agent design in the ideals of science fiction (magical, perfect workers granting your wishes), we should ground it in actual science; the gap would be narrowed much more effectively that way. AI tool designers should study how to integrate their solutions into existing work dynamics, and plan to align with the known strengths and limitations of automation.

We Oversell Machines by Erasing Ourselves

Being able to effectively use LLMs for programming demands a lot of scaffolding and skills. The skills needed are, however, poorly defined and highly context dependent, such that we currently don’t have great ways of improving them other than long periods of trial and error.7

The problem is that while the skills are real and important, I would argue that the level of sophistication they demand is an accidental outcome of poor interaction design. Better design, aimed more closely at how real work is done, could drastically reduce the amount of scaffolding and learning required, and improve the ease with which that learning takes place.

I don’t expect my calls to be heard. Selling sci-fi is way too effective. And as long as AI is perceived as the engine of a new industrial revolution, decision-makers will imagine it can deliver one, and task people with making it so.

Things won’t change, because people are adaptable and want the system to succeed. We consequently take on the responsibility for making things work, through ongoing effort and by transforming ourselves in the process. Through that work, we make the technology appear closer to what it promises than what it actually delivers, which in turn reinforces the pressure to adopt it.

As we take charge of bridging the gap, the machine claims the praise.


1: Dr. Cat Hicks has shared some great research on factors related to this, stating that competitive cultures that assume brilliance is innate and internal tend to lead to a much larger perceived threat from AI regarding people’s skills, whereas learning cultures with a sense of belonging lowered that threat. Upskilling can be impacted by such threats, along with other factors described in the summaries and the preprint.

2: Related to the previous footnote, Dr. Cat Hicks here once again shares research on cumulative culture, a framing that shows how collaborative innovation and learning can be, and offers an alternative construct to individualistic explanations for software developers’ problem solving.

3: A related concept might be Moravec’s Paradox. Roughly, this classic AI argument states that we tend to believe higher order reasoning like maths and logic is very difficult because it feels difficult to us, but the actually harder stuff (perception and whatnot) is very easy to us because we’re so optimized for it.

4: The concept of self-trust and AI trust is explored in The Impact of Generative AI on Critical Thinking by HPH Lee and Microsoft Research. The impact of AI skill threat is better defined in the research in footnote 1. The rest is guesswork.
The guess about “benchmarks” is based on observations that people may use heuristics like checking how it does at things you’re good at to estimate how you can trust it at things you’ve got less expertise on. This can be a useful strategy but can also raise criteria for elements where expertise may not be needed (say, boilerplate), and high expectations can lay the groundwork for easier disappointment.

5: The Law of Fluency states that “well-adapted cognitive work occurs with a facility that belies the difficulty of resolving demands and balancing dilemmas”; basically, if you’ve gotten good at something, you make it look a lot easier than it actually is.

6: This idea comes from a recent French ergonomics paper. It states that “Artifacts represent for the worker a part of the elements of WAI. These artifacts can become tools only once the workers become users, when they appropriate them. [Tools] are an aggregation of artifacts (WAI) and of usage schemas by those who use them in the field (WAD).”

7: One interesting anecdote here is hearing people say they found it challenging to switch from their personal to corporate accounts for some providers, because something in their personal sessions had made the LLMs work better with their style of prompting and this got lost when switching.
Other factors here include elements such as how updating models can significantly impact user experience, which may point to a lack of stable feedback that can also make skill acquisition more difficult.


Elixir Outreach stipend for speakers and trainers

Dashbit, Oban, and the Erlang Ecosystem Foundation (EEF) are glad to announce a new program, which we will trial over the next 12 months, called “Elixir Outreach”. Our goal is to provide funds to community members who want to present Elixir and Erlang to other ecosystems and communities, while respecting our joint values.

In a nutshell:

  • We will provide funds to community members to speak in-person about anything related to Elixir and the Erlang VM.

  • We will cover hotel and transportation costs for up to $700 USD. Please reach out, even if you expect to exceed that limit. This is our first time running the program and we’re refining the budget.

  • The event must expect at least 150 attendees and happen outside of the Elixir, overall BEAM, and functional programming communities. In other words, we won’t cover costs for attending Erlang, Elixir, or other BEAM/FP conferences nor meetups. Consider it as an opportunity to learn and bring external knowledge and experiences to the BEAM community.

  • You will be expected to send a report about your experience. The format and length are up to you. We’d prefer that you write a blog post or an article sharing your overall experience with the involved communities. However, if you would prefer to only send it privately to us, that’s fine too!

The event should take place within your area. Our overall goal is to support multiple individuals, rather than drain our budget on a few long-distance flights (such as across coasts or continents). We are flexible on event location, distance, or type. If in doubt, reach out to elixir_outreach at erlef dot org.

Our initial budget of $7000 was donated by Dashbit ($5000) and Oban ($2000) to the Erlang Ecosystem Foundation (EEF), specifically for this program. The EEF will oversee the distribution of the funds.

Requesting a stipend

To request a stipend, visit the Erlang Ecosystem Foundation website and choose “Elixir Outreach” as the stipend type.

Given we have limited funds, we cannot guarantee they will be available when you request them. We recommend reaching out to us before submitting your talk or having it accepted. By contacting us early, we can validate that the event matches the criteria above, ask questions, and earmark the funds. Once your talk is accepted, send us the itemized travel and accommodation costs so we can transfer the stipend to you (not in excess of $700 USD).

You can also request a stipend after your talk has already been accepted, but then there are no guarantees a stipend will be available.

Our goal is to make this process as simple and straightforward as possible. However, we reserve the right to refuse a request for any reason. If in doubt, reach out to elixir_outreach at erlef dot org.

Acknowledgements

This is a new effort for all involved! Please be patient while we figure out the details.

If you are looking for conferences to speak at, Dave Aronson keeps a list of CFPs closing soon and there are likely others available. Note, we don’t necessarily endorse all of the conferences listed nor guarantee they meet the requirements above, but the list may help you get the ball rolling.

Thanks to Parker Selbert, Shannon Selbert, Brian Cardarella, Alistair Woodman, and Lee Barney for feedback and helping make this a reality.


Erlang/OTP 28.0

OTP 28.0

Erlang/OTP 28 is a new major release with new features, improvements as well as a few incompatibilities. Some of the new features are highlighted below.

Many thanks to all contributors!

Starting with this release, a source Software Bill of Materials (SBOM) will describe the release on the Github Releases page. We welcome feedback on the SBOM.

New language features

  • Functionality making it possible for processes to enable reception of priority messages has been introduced in accordance with EEP 76.

  • Comprehensions have been extended with “zip generators” allowing multiple generators to be run in parallel. For example, [A+B || A <- [1,2,3] && B <- [4,5,6]] will produce [5,7,9].

  • Generators in comprehensions can now be strict, meaning that if the generator pattern does not match, an exception will be raised instead of silently ignoring the value that didn’t match (see the sketch after this list).

  • It is now possible to use any base for floating point numbers as per EEP 75: Based Floating Point Literals.
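
For a concrete picture of the comprehension changes, here is a small shell sketch. The zip-generator line repeats the example above; the strict-generator operator (written here as <:-) reflects our reading of the new syntax and should be treated as an assumption rather than a definitive reference:

    %% Zip generator: both lists are traversed in lockstep.
    1> [A + B || A <- [1,2,3] && B <- [4,5,6]].
    [5,7,9]
    %% Relaxed generator: non-matching elements are silently skipped.
    2> [X || {ok, X} <- [{ok, 1}, error, {ok, 2}]].
    [1,2]
    %% Strict generator (operator assumed): the same input raises an
    %% exception when 'error' fails to match {ok, X}.
    3> [X || {ok, X} <:- [{ok, 1}, error, {ok, 2}]].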

Compiler and JIT improvements

  • For certain types of errors, the compiler can now suggest corrections. For example, when attempting to use variable A that is not defined but A0 is, the compiler could emit the following message: variable 'A' is unbound, did you mean 'A0'?

  • The size of an atom in the Erlang source code was limited to 255 bytes in previous releases, meaning that an atom containing only emojis could contain only 63 emojis. While atoms are still only allowed to contain 255 characters, the number of bytes is no longer limited.

  • The warn_deprecated_catch option enables warnings for the use of old-style catch expressions of the form catch Expr instead of the modern try ... catch ... end (see the sketch after this list).

  • Provided that the map argument of a maps:put/3 call is known to the compiler to be a map, the compiler will replace such calls with the corresponding update using the map syntax (also sketched below).

  • Some BIFs with side effects (such as binary_to_atom/1) are optimized in try ... catch in the same way as guard BIFs in order to gain performance.

  • The compiler’s alias analysis pass is now both faster and less conservative, allowing optimizations of records and binary construction to be applied in more cases.
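
The two compiler items flagged above can be sketched with a small hypothetical module. The module is illustrative only; the compile attribute is one plausible way to enable the new warning (it can also be passed as a compiler option):

    -module(otp28_demo).
    -export([to_atom/1, put_key/2]).
    -compile(warn_deprecated_catch).

    %% Old-style catch expression: with the option above, the compiler
    %% now warns and nudges you toward try ... catch ... end.
    to_atom(Name) ->
        (catch binary_to_atom(Name)).

    %% maps:put/3 where the argument is known to be a map is rewritten
    %% by the compiler to the equivalent update syntax, i.e. this call
    %% compiles as if it were written Map#{key => Value}.
    put_key(Map, Value) when is_map(Map) ->
        maps:put(key, Value, Map).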

ERTS

  • The trace:system/3 function has been added. It has a similar interface to erlang:system_monitor/2, but it also supports trace sessions.

  • os:set_signal/2 now supports setting handlers for the SIGWINCH, SIGCONT, and SIGINFO signals.

  • The two new BIFs erlang:processes_iterator/0 and erlang:process_next/1 make it possible to iterate over the process table in a way that scales better than erlang:processes/0 (a hedged sketch follows this list).
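
As a rough illustration of the new process-table iteration, here is a sketch that counts processes. The function names come from the item above, but the assumed return shape of erlang:process_next/1 ({Pid, NextIterator} or none) should be verified against the documentation before relying on it:

    %% Count processes without building the full list that
    %% erlang:processes/0 would return.
    count_processes() ->
        count_processes(erlang:processes_iterator(), 0).

    count_processes(Iter, N) ->
        case erlang:process_next(Iter) of
            none -> N;                                  %% assumed end marker
            {_Pid, NextIter} -> count_processes(NextIter, N + 1)
        end.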

Shell and terminal

  • The erl -noshell mode has been updated to have two sub modes called raw and cooked, where cooked is the old default behaviour and raw can be used to bypass the line-editing support of the native terminal. Using raw mode it is possible to read keystrokes as they occur without the user having to press Enter. Also, the raw mode does not echo the typed characters to stdout.

  • The shell now prints a help message explaining how to interrupt a running command when a command has been executing for longer than 5 seconds.

STDLIB

  • The join(Binaries, Separator) function, which joins a list of binaries, has been added to the binary module (a shell sketch of this and the following STDLIB additions appears after this list).

  • By default, sets created by module sets will now be represented as maps.

  • Module re has been updated to use the newer PCRE2 library instead of the PCRE library.

  • There is a new zstd module that does Zstandard compression.
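
A quick shell sketch of some of the STDLIB items above. The binary:join/2 and sets examples follow directly from the descriptions; the zstd function names (compress/1 and decompress/1) are assumed entry points and should be checked against the module documentation:

    1> binary:join([<<"a">>, <<"b">>, <<"c">>], <<", ">>).
    <<"a, b, c">>
    %% sets:new/0 now defaults to the map-based representation,
    %% previously opt-in via sets:new([{version, 2}]).
    2> sets:is_element(x, sets:add_element(x, sets:new())).
    true
    %% Zstandard round-trip; function names assumed:
    3> zstd:decompress(zstd:compress(<<"hello hello hello">>)).
    <<"hello hello hello">>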

Public_key

  • The ancient ASN.1 modules used in public_key have been replaced with more modern versions, but we have strived to keep the documented Erlang API for the public_key application compatible.

Dialyzer

SSL

  • The data handling for TLS 1.3 has been optimized.

Emacs mode (in the Tools application)

  • The indent-region command in Emacs will now handle multiline strings better.

For more details about new features and potential incompatibilities see the README.


Erlang/OTP 28.0 Release Candidate 4

OTP 28.0-rc4

Erlang/OTP 28.0-rc4 is the fourth release candidate before the OTP 28.0 release.

The intention with this release is to get feedback from our users. All feedback is welcome, even if it is only to say that it works for you, and that it installs as it should. We encourage users to try it out and give us feedback either by creating an issue at https://github.com/erlang/otp/issues or by posting to Erlang Forums.

All artifacts for the release can be downloaded from the Erlang/OTP Github release and you can view the new documentation at https://erlang.org/documentation/doc-16-rc4/doc. You can also install the latest release using kerl like this:

kerl build 28.0-rc4 28.0-rc4

Starting with this release, a source Software Bill of Materials (SBOM) will describe the release on the Github Releases page. We welcome feedback on the SBOM.

Erlang/OTP 28 is a new major release with new features, improvements as well as a few incompatibilities. Some of the new features are highlighted below.

Many thanks to all contributors!

Highlights for RC4

  • The ancient ASN.1 modules used in public_key have been replaced with more modern versions, but we have strived to keep the documented Erlang API for the public_key application compatible.

Highlights for RC2

  • Functionality making it possible for processes to enable reception of priority messages has been introduced in accordance with EEP 76.

Highlights for RC1

New language features

  • Comprehensions have been extended with “zip generators” allowing multiple generators to be run in parallel. For example, [A+B || A <- [1,2,3] && B <- [4,5,6]] will produce [5,7,9].

  • Generators in comprehensions can now be strict, meaning that if the generator pattern does not match, an exception will be raised instead of silently ignoring the value that didn’t match.

  • It is now possible to use any base for floating point numbers as per EEP 75: Based Floating Point Literals.

Compiler and JIT improvements

  • For certain types of errors, the compiler can now suggest corrections. For example, when attempting to use variable A that is not defined but A0 is, the compiler could emit the following message: variable 'A' is unbound, did you mean 'A0'?

  • The size of an atom in the Erlang source code was limited to 255 bytes in previous releases, meaning that an atom containing only emojis could contain only 63 emojis. While atoms are still only allowed to contain 255 characters, the number of bytes is no longer limited.

  • The warn_deprecated_catch option enables warnings for the use of old-style catch expressions of the form catch Expr instead of the modern try ... catch ... end.

  • Provided that the map argument for a maps:put/3 call is known to the compiler to be a map, the compiler will replace such calls with the corresponding update using the map syntax.

  • Some BIFs with side effects (such as binary_to_atom/1) are optimized in try ... catch in the same way as guard BIFs in order to gain performance.

  • The compiler’s alias analysis pass is now both faster and less conservative, allowing optimizations of records and binary construction to be applied in more cases.

ERTS

  • The trace:system/3 function has been added. It has a similar interface to erlang:system_monitor/2, but it also supports trace sessions.

  • os:set_signal/2 now supports setting handlers for the SIGWINCH, SIGCONT, and SIGINFO signals.

  • The two new BIFs erlang:processes_iterator/0 and erlang:process_next/1 make it possible to iterate over the process table in a way that scales better than erlang:processes/0.

Shell and terminal

  • The erl -noshell mode has been updated to have two sub modes called raw and cooked, where cooked is the old default behaviour and raw can be used to bypass the line-editing support of the native terminal. Using raw mode it is possible to read keystrokes as they occur without the user having to press Enter. Also, the raw mode does not echo the typed characters to stdout.

  • The shell now prints a help message explaining how to interrupt a running command when a command has been executing for longer than 5 seconds.

STDLIB

  • The join(Binaries, Separator) function that joins a list of binaries has been added to the binary module.

  • By default, sets created by module sets will now be represented as maps.

  • Module re has been updated to use the newer PCRE2 library instead of the PCRE library.

  • There is a new zstd module that does Zstandard compression.

Dialyzer

SSL

  • The data handling for TLS 1.3 has been optimized.

Emacs mode (in the Tools application)

  • The indent-region command in Emacs will now handle multiline strings better.

For more details about new features and potential incompatibilities see the README.


Erlang/OTP 28.0 Release Candidate 3

OTP 28.0-rc3

Erlang/OTP 28.0-rc3 is the third release candidate before the OTP 28.0 release.

The intention with this release is to get feedback from our users. All feedback is welcome, even if it is only to say that it works for you, and that it installs as it should. We encourage users to try it out and give us feedback either by creating an issue at https://github.com/erlang/otp/issues or by posting to Erlang Forums.

All artifacts for the release can be downloaded from the Erlang/OTP Github release and you can view the new documentation at https://erlang.org/documentation/doc-16-rc3/doc. You can also install the latest release using kerl like this:

kerl build 28.0-rc3 28.0-rc3

Starting with this release, a source Software Bill of Materials (SBOM) will describe the release on the Github Releases page. We welcome feedback on the SBOM.

Erlang/OTP 28 is a new major release with new features, improvements as well as a few incompatibilities. Some of the new features are highlighted below.

Many thanks to all contributors!

Highlights for RC2

  • Functionality making it possible for processes to enable reception of priority messages has been introduced in accordance with EEP 76.

Highlights for RC1

New language features

  • Comprehensions have been extended with “zip generators” allowing multiple generators to be run in parallel. For example, [A+B || A <- [1,2,3] && B <- [4,5,6]] will produce [5,7,9].

  • Generators in comprehensions can now be strict, meaning that if the generator pattern does not match, an exception will be raised instead of silently ignoring the value that didn’t match.

  • It is now possible to use any base for floating point numbers as per EEP 75: Based Floating Point Literals.

Compiler and JIT improvements

  • For certain types of errors, the compiler can now suggest corrections. For example, when attempting to use variable A that is not defined but A0 is, the compiler could emit the following message: variable 'A' is unbound, did you mean 'A0'?

  • The size of an atom in the Erlang source code was limited to 255 bytes in previous releases, meaning that an atom containing only emojis could contain only 63 emojis. While atoms are still only allowed to contain 255 characters, the number of bytes is no longer limited.

  • The warn_deprecated_catch option enables warnings for the use of old-style catch expressions of the form catch Expr instead of the modern try ... catch ... end.

  • Provided that the map argument for a maps:put/3 call is known to the compiler to be a map, the compiler will replace such calls with the corresponding update using the map syntax.

  • Some BIFs with side effects (such as binary_to_atom/1) are optimized in try ... catch in the same way as guard BIFs in order to gain performance.

  • The compiler’s alias analysis pass is now both faster and less conservative, allowing optimizations of records and binary construction to be applied in more cases.

ERTS

  • The trace:system/3 function has been added. It has a similar interface to erlang:system_monitor/2, but it also supports trace sessions.

  • os:set_signal/2 now supports setting handlers for the SIGWINCH, SIGCONT, and SIGINFO signals.

  • The two new BIFs erlang:processes_iterator/0 and erlang:process_next/1 make it possible to iterate over the process table in a way that scales better than erlang:processes/0.

Shell and terminal

  • The erl -noshell mode has been updated to have two sub modes called raw and cooked, where cooked is the old default behaviour and raw can be used to bypass the line-editing support of the native terminal. Using raw mode it is possible to read keystrokes as they occur without the user having to press Enter. Also, the raw mode does not echo the typed characters to stdout.

  • The shell now prints a help message explaining how to interrupt a running command when a command has been executing for longer than 5 seconds.

STDLIB

  • The join(Binaries, Separator) function that joins a list of binaries has been added to the binary module.

  • By default, sets created by module sets will now be represented as maps.

  • Module re has been updated to use the newer PCRE2 library instead of the PCRE library.

  • There is a new zstd module that does Zstandard compression.

Dialyzer

SSL

  • The data handling for TLS 1.3 has been optimized.

Emacs mode (in the Tools application)

  • The indent-region command in Emacs will now handle multiline strings better.

For more details about new features and potential incompatibilities see the README.


Cyanview: Coordinating Super Bowl's visual fidelity with Elixir

How do you coordinate visual fidelity across two hundred cameras for a live event like the Super Bowl?

The answer is: by using the craft of camera shading, which involves adjusting each camera so that they all match in color, exposure, and various other visual aspects. The goal is to turn the live event broadcast into a cohesive and consistent experience. For every angle used, you want the same green grass and the same skin tones. Everything needs to be very closely tuned across a diverse set of products and brands, from large broadcast cameras, drone cameras, and PTZ cameras to gimbal-held mirrorless cameras and more. This is what Cyanview does. Cyanview is a small Belgian company that sells products for the live video broadcast industry, and its primary focus is shading.

Broadcast is a business where you only get one chance to prove that your tool is up to the task. Reliability is king. There can be no hard failures.

A small team of three built a product so powerful and effective that it spread across the industry purely on the strength of its functionality. Without any marketing, it earned a reputation among seasoned professionals and became a staple at the world’s top live events. Cyanview’s Remote Control Panel (RCP) is now used by specialist video operators at the Olympics, the Super Bowl, the NFL, the NBA, ESPN, Amazon, and many more. Even most fashion shows in Paris use Cyanview’s devices.

These devices put Elixir right in the critical path for serious broadcast operations. By choosing Elixir, Cyanview gained best-in-class networking features, state-of-the-art resilience and an ecosystem that allowed fast iteration on product features.

Operating many displays with Cyanview products.

Why Elixir?

The founding team of Cyanview primarily had experience with embedded development, and the devices they produce involve a lot of low-level C code and plenty of FPGA. This is due to the low-level details of color science and the really tight timing requirements.

If you’ve ever worked with camera software, you know it can be a mixed bag. Even after going fully digital, much of it remained tied to analog systems or relied on proprietary connectivity solutions. Cyanview has been targeting IP (as in Internet Protocol) from early on. This means Cyanview’s software can operate on commodity networks that work in well-known and well-understood ways. This has aligned well with an increase in remote production, partially due to the pandemic, where production crews operate from a central location with minimal crew on location. Custom radio frequency or serial wire protocols have a hard time scaling to cross-continent distances.

This also paved the way for Elixir, as the Erlang VM was designed to communicate and coordinate millions of devices, reliably, over the network.

Elixir was brought in by the developer Ghislain, who needed to build integrations with cameras and interact with the other bits of required video gear, with many different protocols over the network. The language comes with a lot of practical features for encoding and decoding binary data down to the individual bits. Elixir gave them a strong foundation and the tools to iterate fast.

Ghislain has been building the core intellectual property of Cyanview ever since. While the physical device naturally has to be solid, reliable, and of high quality, a lot of the secret sauce ultimately lies in the massive number of integrations and huge amounts of reverse engineering. Thus, the product is able to work with as many professional camera systems and related equipment as possible. It is designed to be compatible with everything and anything a customer is using. Plus, it offers an API to ensure smooth integration with other devices.

David Bourgeois, the founder of Cyanview, told us a story about how these technical decisions, alongside Elixir, helped them tackle real-world challenges:

“During the Olympics in China, a studio in Beijing relied on a large number of Panasonic PTZ cameras. Most of their team, however, was based in Paris and needed to control the cameras remotely to run various shows throughout the day. The problem? Panasonic’s camera protocols were never designed for internet use — they require precise timing and multiple messages for every adjustment. With network latency, that leads to timeouts, disconnects, and system failures… So they ended up placing our devices next to the cameras in Beijing and controlled them over IP from Paris — just as designed.”

Cyanview RIO device mounted on a camera at a sports field.

The devices in a given location communicate and coordinate on the network over a custom MQTT protocol. A single Remote Control Panel (RCP), implemented on top of Elixir’s network stack, handles over a hundred cameras without issue.

Technical composition

The system as a whole consists of RCP devices running a Yocto Linux system, with most of the logic built in Elixir and C. While Python is still used for scripting and tooling, its role has gradually diminished. The setup also includes multiple microcontrollers and the on-camera device, all communicating over MQTT. Additionally, cloud relays facilitate connectivity, while dashboards and controller UIs provide oversight and control. The two critical devices are the RCP offering control on the production end and the RIO handling low-latency manipulation of the camera. Both run Elixir.

The configuration UI is currently built in Elm, but, depending on priorities, it might be converted to Phoenix LiveView over time to reduce the number of languages in use. The controller web UI is already in LiveView, and it is performing quite well on a very low-spec embedded Linux machine.

The cloud part of the system is very limited today, which is unusual in a world of SaaS. There are cloud relays for distributing and sharing camera control as well as forwarding network ports between locations and some related features, also built in Elixir, but cloud is not at the core of the business. The devices running Elixir on location form a cluster over IP using a custom MQTT-based protocol suited to the task and are talking to hundreds of cameras and other video devices.

It goes without saying that integration with so much proprietary equipment comes with challenges. Some integrations are more reliable than others. Some devices are more common, and their quirks are well-known through hard-won experience. A few even have good documentation that you can reference while others offer mystery and constant surprises. In this context, David emphasizes the importance of Elixir’s mechanisms for recovering from failures:

“If one camera connection has a blip, a buggy protocol or the physical connection to a device breaks it is incredibly important that everything else keeps working. And this is where Elixir’s supervision trees provide a critical advantage.”

Growth & team composition

The team has grown over the 9 years that the company has been operating, but it did so at a slow and steady pace. On average, the company has added just one person per year. With nine employees at the time of writing, Cyanview supports some of the biggest broadcast events in the world.

There are two Elixir developers on board: Daniil who is focusing on revising some of the UI as well as charting a course into more cloud functionality, and Ghislain, who works on cameras and integration. Both LiveView and Elm are used to power device UIs and dashboards.

What’s interesting is that, overall, the other embedded developers say that they don’t know much about Elixir and they don’t use it in their day-to-day work. Nonetheless, they are very comfortable implementing protocols and encodings in Elixir. The main reason they haven’t fully learned the language is simply time — they have plenty of other work to focus on, and deep Elixir expertise hasn’t been necessary. After all, there’s much more to their work beyond Elixir: designing PCBs, selecting electronic components, reverse engineering protocols, interfacing with displays, implementing FPGAs, managing production tests, supporting real productions, and releasing firmware updates.

Innovation and customer focus

Operator using Cyanview RCP for a massive crowd in an arena.

Whether it’s providing onboard cameras in 40+ cars during the 24 hours of Le Mans, covering Ninja Warrior, the Australian Open, and the US Open, operating a studio in the Louvre, being installed in NFL pylons, or connecting over 200 cameras simultaneously – the product speaks for itself. Cyanview built a device for a world that runs on top of IP, using Elixir, a language with networking and protocols deep in its bones. This choice enabled them to do both: implement support for all the equipment and provide features no one else had.

By shifting from conventional local-area radio frequency, serial connections, and inflexible proprietary protocols to IP networking, Cyanview’s devices redefined how camera systems operate. Their feature set is unheard of in the industry: Unlimited multicam. Tally lights. Pan & Tilt control. Integration with color correctors. World-spanning remote production.

The ease and safety of shipping new functionality have allowed the company to support new features very quickly. One example is the increasing use of mirrorless cameras on gimbals to capture crowd shots. Cyanview were able to prototype gimbal control, test it with a customer and validate that it worked in a very short amount of time. This quick prototyping and validation of features is made possible by a flexible architecture that ensures that critical fundamentals don’t break.

Camera companies that don’t produce broadcast shading remotes, such as Canon or RED, recommend Cyanview to their customers. Rather than competing with most broadcast hardware companies, Cyanview considers itself a partner. The power of a small team, a quality product and powerful tools can be surprising. Rather than focusing on marketing, Cyanview works very closely with its customers by supporting the success of their events and providing in-depth customer service.

Looking back and forward

When asked if he would choose Elixir again, David responded:

“Yes. We’ve seen what the Erlang VM can do, and it has been very well-suited to our needs. You don’t appreciate all the things Elixir offers out of the box until you have to try to implement them yourself. It was not pure luck that we picked it, but we were still lucky. Elixir turned out to bring a lot that we did not know would be valuable to us. And we see those parts clearly now.”

Cyanview hopes to grow the team more, but plans to do so responsibly over time. Currently there is a lot more to do than the small team can manage.

Development is highly active, with complementary products already in place alongside the main RCP device, and the future holds even more in that regard. Cloud offerings are on the horizon, along with exciting hardware projects that build on the lessons learned so far. As these developments unfold, we’ll see Elixir play an increasingly critical role in some of the world’s largest live broadcasts.

Cyanview Remote Control Panels in a control room.

In summary

A high-quality product delivering the right innovation at the right time in an industry that’s been underserved in terms of good integration. Elixir provided serious leverage for developing a lot of integrations with high confidence and consistent reliability. In an era where productivity and lean, efficient teams are everything, Cyanview is a prime example of how Elixir empowers small teams to achieve an outsized impact.


AI: Where in the Loop Should Humans Go?

This is a re-publishing of a blog post I originally wrote for work, but wanted on my own blog as well.

AI is everywhere, and its impressive claims are leading to rapid adoption. At this stage, I’d qualify it as charismatic technology—something that under-delivers on what it promises, but promises so much that the industry still leverages it because we believe it will eventually deliver on these claims.

This is a known pattern. In this post, I’ll use the example of automation deployments to go over known patterns and risks in order to provide you with a list of questions to ask about potential AI solutions.

I’ll first cover a short list of base assumptions, and then borrow from scholars of cognitive systems engineering and resilience engineering to list said criteria. At the core of it is the idea that when we say we want humans in the loop, it really matters where in the loop they are.

My base assumptions

The first thing I’m going to say is that we currently do not have Artificial General Intelligence (AGI). I don’t care whether we have it in 2 years or 40 years or never; if I’m looking to deploy a tool (or an agent) that is supposed to do stuff to my production environments, it has to be able to do it now. I am not looking to be impressed, I am looking to make my life and the system better.

Another mechanism I want you to keep in mind is something called the context gap. In a nutshell, any model or automation is constructed from a narrow definition of a controlled environment, which can expand as it gains autonomy, but remains limited. By comparison, people in a system start from a broad situation and narrow definitions down and add constraints to make problem-solving tractable. One side starts from a narrow context, and one starts from a wide one—so in practice, with humans and machines, you end up seeing a type of teamwork where one constantly updates the other:

The optimal solution of a model is not an optimal solution of a problem unless the model is a perfect representation of the problem, which it never is.
 — Ackoff (1979, p. 97)

Because of that mindset, I will disregard all arguments of “it’s coming soon” and “it’s getting better real fast” and instead frame what current LLM solutions are shaped like: tools and automation. As it turns out, there are lots of studies about ergonomics, tool design, collaborative design, where semi-autonomous components fit into sociotechnical systems, and how they tend to fail.

Additionally, I’ll borrow from the framing used by people who study joint cognitive systems: rather than looking only at the abilities of what a single person or tool can do, we’re going to look at the overall performance of the joint system.

This is important because if you have a tool that is built to be operated like an autonomous agent, you can get weird results in your integration. You’re essentially building an interface for the wrong kind of component—like using a joystick to ride a bicycle.

This lens will assist us in establishing general criteria about where the problems will likely be without having to test for every single one and evaluate them on benchmarks against each other.

Questions you'll want to ask

The following list of questions is meant to act as reminders—abstracting away all the theory from research papers you’d need to read—to let you think through some of the important stuff your teams should track, whether they are engineers using code generation, SREs using AIOps, or managers and execs making the call to adopt new tooling.

Are you better even after the tool is taken away?

An interesting warning comes from studying how LLMs function as learning aides. The researchers found that people who trained using LLMs tended to fail tests more when the LLMs were taken away compared to people who never studied with them, except if the prompts were specifically (and successfully) designed to help people learn.

Likewise, it’s been known for decades that when automation handles standard challenges, the operators expected to take over when the automation reaches its limits end up worse off and generally require more training to keep the overall system performant.

While people can feel like they’re getting better and more productive with tool assistance, it doesn’t necessarily follow that they are learning or improving. Over time, there’s a serious risk that your overall system’s performance will be limited to what the automation can do—because without proper design, people keeping the automation in check will gradually lose the skills they had developed prior.

Are you augmenting the person or the computer?

Traditionally successful tools tend to work on the principle that they improve the physical or mental abilities of their operator: search tools let you go through more data than you could on your own and shift demands to external memory, a bicycle more effectively transmits force for locomotion, a blind spot alert on your car can extend your ability to pay attention to your surroundings, and so on.

Automation that augments users therefore tends to be easier to direct, and sort of extends the person’s abilities, rather than acting based on preset goals and framing. Automation that augments a machine tends to broaden the device’s scope and control by leveraging some known effects of their environment and successfully hiding them away. For software folks, an autoscaling controller is a good example of the latter.

Neither is fundamentally better nor worse than the other—but you should figure out what kind of automation you’re getting, because they fail differently. Augmenting the user implies that they can tackle a broader variety of challenges effectively. Augmenting the computers tends to mean that when the component reaches its limits, the challenges are worse for the operator.

Is it turning you into a monitor rather than helping build an understanding?

If your job is to watch the tool go and then say whether it was doing a good or bad job (and maybe take over when it does a bad job), you’re going to have problems. It has long been known that people adapt to their tools, and automation breeds complacency. Self-driving cars that generally drive themselves well but still require a human monitor are not, in practice, effectively monitored.

Instead, having AI that supports people or adds perspectives to the work an operator is already doing tends to yield better long-term results than patterns where the human learns to mostly delegate and focus elsewhere.

(As a side note, this is why I tend to dislike incident summarizers. Don’t make it so people stop trying to piece together what happened! Instead, I prefer seeing tools that look at your summaries to remind you of items you may have forgotten, or that look for linguistic cues that point to biases or reductive points of view.)
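As a rough sketch of that preference, here’s what a reviewer-shaped tool could look like instead of a summarizer-shaped one. The prompt and the call_llm stand-in are assumptions rather than any real product’s API; the point is that the responder still writes the summary and the tool only pokes at it:

    # A sketch of the "support the person" pattern for incident write-ups.
    # `call_llm` is a placeholder for whatever client your LLM provider gives you.

    REVIEW_PROMPT = """You are reviewing an incident summary written by a responder.
    Do not rewrite it. Instead, list:
    1. Events present in the timeline but missing from the summary.
    2. Wording that pins the outcome on a single person or component.
    3. Questions the summary leaves unanswered.

    Timeline:
    {timeline}

    Summary written by the responder:
    {summary}
    """

    def call_llm(prompt: str) -> str:
        raise NotImplementedError("plug in your provider's client here")

    def review_summary(timeline: str, summary: str) -> str:
        # The human-written summary stays authoritative; the model only critiques it.
        return call_llm(REVIEW_PROMPT.format(timeline=timeline, summary=summary))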

Does it pigeonhole what you can look at?

When evaluating a tool, you should ask questions about where the automation lands:

  • Does it let you look at the world more effectively?
  • Does it tell you where to look in the world?
  • Does it force you to look somewhere specific?
  • Does it tell you to do something specific?
  • Does it force you to do something?

This is a bit of a hybrid between “Does it extend you?” and “Is it turning you into a monitor?” The five questions above let you figure that out.

As the tool becomes a source of assertions or constraints (rather than a source of information and options), the operator becomes someone who interacts with the world from inside the tool rather than someone who interacts with the world with the tool’s help. The tool stops being a tool and becomes a representation of the whole system, which means whatever limitations and internal constraints it has are then transmitted to your users.

Is it a built-in distraction?

People tend to juggle multiple tasks across many contexts. Some automated systems are built with alarms or alerts that steal someone’s focus, and unless they truly are the most critical thing their users could be paying attention to, they are going to be an annoyance that lowers the effectiveness of the overall system.

What perspectives does it bake in?

Tools tend to embody a given perspective. For example, AIOps tools that are built to find a root cause will likely carry the conceptual framework behind root causes in their design. More subtly, these perspectives are sometimes hidden in the type of data you get: if your AIOps agent can only see alerts, your telemetry data, and maybe your code, it will rarely be a source of suggestions on how to improve your workflows because that isn’t part of its world.

In roles that are inherently about pulling context from many disconnected sources, how on earth is automation going to make the right decisions? And who’s accountable when it makes a poor decision on incomplete data? Surely not the buyer who installed it!

This is also one of the many ways in which automation can reinforce biases—not just based on what is in its training data, but also based on its own structure and what inputs were considered most important at design time. The tool can itself become a keyhole through which your conclusions are guided.

Is it going to become a hero?

A common trope in incident response is heroes—the few people who know everything inside and out, and who end up being necessary bottlenecks in all emergencies. They can’t go away for vacation, they’re too busy to train others, they develop blind spots that nobody can fix, and they can’t be replaced. To avoid this, you have to maintain a continuous awareness of who knows what, and cross-train each other to always have enough redundancy.

If you have a team of multiple engineers and you add AI to it, having it do all of the tasks of a specific kind means it becomes a de facto hero to your team. If that’s okay, be aware that any outages or dysfunction in the AI agent would likely have no practical workaround. You will essentially have offshored part of your ops.

Do you need it to be perfect?

What a thing promises to be is never what it is—otherwise AWS would be enough, and Kubernetes would be enough, and JIRA would be enough, and the software would work fine with no one needing to fix things.

That just doesn’t happen. Ever. Even if it’s really, really good, it’s gonna have outages and surprises, and it’ll mess up here and there, no matter what it is. We aren’t building an omnipotent computer god, we’re building imperfect software.

You’ll want to seriously consider whether the tradeoffs you’d make in terms of quality and cost are worth it, and that’s going to be a case-by-case call. Just be careful not to fix the problem by adding a human in the loop who only acts as a monitor!

Is it doing the whole job or a fraction of it?

We don’t notice major parts of our own jobs because they feel natural. A classic pattern here is one of AIs getting better at diagnosing patients, except the benchmarks are usually run on a patient chart where most of the relevant observations have already been made by someone else. Similarly, we often see AI pass a test with flying colors while it still can’t be productive at the job the test represents.

People in general have adopted a model of cognition based on information processing that’s very similar to how computers work (get data in, think, output stuff, rinse and repeat), but for decades, there have been multiple disciplines that looked harder at situated work and cognition, moving past that model. Key patterns of cognition are not just in the mind, but are also embedded in the environment and in the interactions we have with each other.

Be wary of acquiring a solution that solves what you think the problem is rather than what it actually is. We routinely show we don’t accurately know the latter.

What if we have more than one?

You probably know how straightforward it can be to write a toy project on your own, with full control of every refactor. You probably also know how this stops being true as your team grows.

As it stands today, a lot of AI agents are built within a snapshot of the current world: one or few AI tools added to teams that are mostly made up of people. By analogy, this would be like everyone selling you a computer assuming it were the first and only electronic device inside your household.

Problems arise when you go beyond these assumptions: maybe AI that writes code has to go through a code review process, but what if that code review is done by another unrelated AI agent? What happens when you get to operations and common mode failures impact components from various teams that all have agents empowered to go fix things to the best of their ability with the available data? Are they going to clash with people, or even with each other?

Humans run into the same coordination problems, and tend to solve them via processes and procedures, explicit coordination, announcing what they’ll do before they do it, and calling upon each other when they need help. Will multiple agents require something equivalent, and if so, do you have it in place?
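For a feel of what such an equivalent could be, here’s a toy sketch of “announce before you act” applied to automation. The shared in-memory registry and its conflict rule are assumptions; a real version might live in a lock service, a chatops channel, or both:

    import threading
    from datetime import datetime, timezone

    _registry_lock = threading.Lock()
    _in_flight: dict[str, str] = {}  # target -> description of the ongoing action

    def announce_and_act(agent: str, target: str, description: str, act) -> bool:
        """Refuse to act if another agent already announced work on the same target."""
        with _registry_lock:
            if target in _in_flight:
                print(f"{agent}: deferring, {target} is busy with {_in_flight[target]}")
                return False
            _in_flight[target] = f"{agent} ({datetime.now(timezone.utc):%H:%M:%S} UTC): {description}"
        try:
            act()
            return True
        finally:
            with _registry_lock:
                del _in_flight[target]

    # Simulate two agents acting at once: the inner call runs while the first
    # is still in flight, so the second agent backs off instead of clashing.
    announce_and_act("rollback-bot", "checkout", "rolling back v42",
                     lambda: announce_and_act("scaler-bot", "checkout", "scaling up",
                                              lambda: None))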

How do they cope with limited context?

Some changes that cause issues might be safe to roll back, some not (maybe they include database migrations, maybe it is better to be down than to corrupt data), and some may contain changes that rolling back wouldn’t fix (maybe the workload is controlled by one or more feature flags).

Knowing what to do in these situations can sometimes be understood from code or release notes, but some situations can require different workflows involving broader parts of the organization. A risk of automation without context is that if you have situations where waiting or doing little is the best option, then you’ll need either automation that requires operator input before it acts, or a way to quickly disable multiple types of automation at once.
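As a rough illustration of that second option, here’s a sketch of a shared kill switch that every piece of automation is expected to consult before acting. The file-backed store and the switch names are assumptions; a real deployment would more likely lean on your feature-flag service:

    import json
    import pathlib

    SWITCH_FILE = pathlib.Path("/tmp/automation-switches.json")  # assumed location

    def automation_enabled(kind: str) -> bool:
        """Agents call this before acting; unknown kinds default to enabled."""
        if not SWITCH_FILE.exists():
            return True
        switches = json.loads(SWITCH_FILE.read_text())
        return switches.get(kind, switches.get("all", True))

    def disable(*kinds: str) -> None:
        """One operator command that pauses whole classes of automation at once."""
        switches = json.loads(SWITCH_FILE.read_text()) if SWITCH_FILE.exists() else {}
        switches.update({kind: False for kind in kinds})
        SWITCH_FILE.write_text(json.dumps(switches))

    # During a confusing incident: pause the risky stuff, keep the rest running.
    disable("auto-rollback", "auto-remediation")
    if not automation_enabled("auto-rollback"):
        print("auto-rollback is paused; a human decides what happens next")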

Many of these may exist at the same time, and it becomes the operators’ jobs to not only maintain their own context, but also maintain a mental model of the context each of these pieces of automation has access to.

The fancier your agents, the fancier your operators’ understanding and abilities must be to properly orchestrate them. The more surprising your landscape is, the harder it can become to manage with semi-autonomous elements roaming around.

After an outage or incident, who does the learning and who does the fixing?

One way to track accountability in a system is to figure out who ends up having to learn lessons and change how things are done. It’s not always the same people or teams, and generally, learning will happen whether you want it or not.

This is more of a rhetorical question right now, because I expect that in most cases, when things go wrong, whoever is expected to monitor the AI tool is going to have to steer it in a better direction and fix it (if they can); if it can’t be fixed, then the expectation will be that the automation, as a tool, will be used more judiciously in the future.

In a nutshell, if the expectation is that your engineers are going to be doing the learning and tweaking, your AI isn’t an independent agent—it’s a tool that cosplays as an independent agent.

Do what you will—just be mindful

All in all, none of the above questions flat out say you should not use AI, nor where exactly in the loop you should put people. The key point is that you should ask that question and be aware that just adding whatever to your system is not going to substitute workers away. It will, instead, transform work and create new patterns and weaknesses.

Some of these patterns are known and well-studied. We don’t have to go rushing to rediscover them all through failures as if we were the first to ever automate something. If AI ever gets so good and so smart that it’s better than all your engineers, it won’t make a difference whether you adopted it early or waited until it got there. In the meanwhile, these things do matter and have real impacts, so please design your systems responsibly.

If you’re interested to know more about the theoretical elements underpinning this post, the following references—on top of whatever was already linked in the text—might be of interest:

  • Books:
    • Joint Cognitive Systems: Foundations of Cognitive Systems Engineering by Erik Hollnagel and David D. Woods
    • Joint Cognitive Systems: Patterns in Cognitive Systems Engineering by David D. Woods and Erik Hollnagel
    • Cognition in the Wild by Edwin Hutchins
    • Behind Human Error by David D. Woods, Sidney Dekker, Richard Cook, Leila Johannesen, and Nadine Sarter
  • Papers:
    • Ironies of Automation by Lisanne Bainbridge
    • The French-Speaking Ergonomists’ Approach to Work Activity by Daniellou
    • How in the World Did We Ever Get into That Mode? Mode Error and Awareness in Supervisory Control by Nadine Sarter
    • Can We Ever Escape from Data Overload? A Cognitive Systems Diagnosis by David D. Woods
    • Ten Challenges for Making Automation a “Team Player” in Joint Human-Agent Activity by Gary Klein and David D. Woods
    • MABA-MABA or Abracadabra? Progress on Human–Automation Co-ordination by Sidney Dekker
    • Managing the Hidden Costs of Coordination by Laura Maguire
    • Designing for Expertise by David D. Woods
    • The Impact of Generative AI on Critical Thinking by Lee et al.
