The Sand Reckoner: AI & Disruptive Tech

The 4 AI Roles of the Future

Simon Fabri — Sun, 01 Feb 2026 20:54:59 GMT

In Part 1 of this blog, we saw how Claude Code, despite its somewhat intimidating front-end, makes it feasible to create pretty compelling and useful software products, even by someone with pretty limited software experience. I built an AI bookmarking and research app, which I am now using in writing this blog. But what are the wider lessons? If Claude Code represents the state-of-the-art in AI coding assistants, what are the implications for software development going forward? And more broadly, what does this suggest about the way we will interact with AI agents in the future.

In this blog, I argue that as execution costs collapse, we will not be constrained by our ability to build stuff. Instead, I suggest that we will see the emergence of four key human roles that will shape the AI-augmented workplace of the future.

The 100x Economic Advantage

The cost structure of the software industry has been characterised by high up-front costs [software development costs and infrastructure], but very low marginal costs. Unlike physical products, which have per-unit material costs, there are practically no incremental per-unit costs associated with software. Replication is basically free, so the cost of adding incremental customers is largely driven by the cost of hosting and operating the software.

As a result, software and tech companies have been driven to rapidly scale the number of users to cover their high up-front costs. If, however, software can be created by AI, this no longer holds true. For example, the original team that created the Pocket app I tried to re-create was about 20 strong [1], and worked on a new version of the app for about a year. Thanks to Claude Code, I created my app over a couple of weeks for the cost of under $100. Of course, Pocket supported over 20 million customers - but the economics of software have truly been turned on its head.

It is now feasible to create curated experiences for a single customer! From a purely cost perspective, you are swapping a £100,000 per engineer salary with an approximately $50-100/month AI license. This is not an argument about the rights and wrongs of replacing human workers. It is about recognising that there is the potential to reduce the cost of creating software by a factor of one hundred.

The 10x engineering advantage

In my case, Claude Code created in excess of 11,000 lines of typescript code to create Weavify. (Yes, I know number of lines of code =/= quality or usefulness!) But what of its impact beyond hobbyists? Jensen Huang said that Nvidia uses it “all over” and “Anthropic made a huge leap in coding and reasoning, ” [2] while, reports claim that Microsoft engineers are using it in favour of, or together with its own product, GitHub Copilot. [3] By all intents, Claude’s Opus 4.5 represents a qualitative shift in the efficacy and reliability of AI code generation. Boris Cherny, the head of Claude Code, states that he uses it for all the code he creates. [4]

This shift is attributable to the way AI agents are now used. Unlike chatbots, they work on your behalf, and come back to you when it needs clarification, direction or to ask for permission to go to the next stage. They are no longer merely personal assistants. They are instead teams of co-workers. But what is the productivity advantage conferred by such systems?

It is not surprising that the AI frontier labs and Big Tech are at the front of the pack when it comes to the use of AI coding agents. Back to Cherny, he estimates that 95% of Claude Code is written by Claude Code - the tool literally building itself. The impact is difficult to quantify, and given that Opus 4.5 was only released in November, there is not a lot of data. But to get some sense, a team of 12 engineers was releasing 60-100 releases a day, including almost one external release a day. [5] The effectiveness can be seen in Anthropics’ Cowork tool, which was built “entirely” by a small team (around 5 engineers) with Claude Code within a week and a half. It is pretty unthinkable to go to a releasable preview in less than the time for a standard two-week sprint. This is pretty amazing. [5]

This step change is driven by AI coding tools shifting from being useful “auto-complete” tools and providing coding advice on point questions to being able to operate more autonomously. In an Ars Technica article, [6] a software engineer who worked extensively on the Linux kernel, said that he “now expects to tell an agent that ‘this test is failing, debug it and fix it for me.’ These are high-trust activities that have been traditionally very time-consuming. Not only can you ask an agent to do this, but you can also organise several agents to work in parallel, fixing, refactoring and optimising different parts of the code. A review of discussions on Reddit shows that engineers are increasingly confident in shifting to “refactoring large codebases” and taking over “upgrading libraries, fixing bugs, and performance improvements, freeing up time for innovative features.”

What can be achieved in small teams can be more difficult to achieve at scale. In their Q4 2025 financial report, Meta claims that since 2025, development teams have seen output increase by 30%, with power users seeing output by 80%. Looking across many tech firms, year-on-year productivity improvements in the order of 20-30% seems typical. But how can teams reach the productivity gains claimed by Claude Code?

“Syntax coding is dead” - bridging the human-machine gap

The most obvious takeaway from the latest evolution in AI coding is that the ability to write code syntax is no longer a useful skill. As Andrej Karpathy, former head of Tesla AI, put it, he now codes in English. Programming languages were always designed to act as a bridge between a programmer's intent and mental model and instructions interpretable by machines. The bridge was always incomplete; humans had to learn a “computer language” to give it instructions.

Now the bridge is complete. Humans can give instructions in English, and they can be confident of a good outcome. The Ars Technica report quotes a developer: “I still need to be able to read and review code,” he said, “but very little of my typing is actual Rust or whatever language I’m working in.” Coming back to Reddit, some developers are beginning to say that AI is now generating between 50-90% of code, and Anthropic’s own report into how AI is transforming work at Anthropic says that work has shifted “70%+ to being a code reviewer/reviser.” [7]

Looking to the future - the new categories of AI work:

I am not convinced that the role of humans as reviewers of AI output is durable. I think it is more representative of where we are today in AI maturity. As the AI agents become more reliable, I fully expect this role to be also be carried out more effectively by AI agents.

So what enduring roles will humans have in this new agentic world? My view is that there will be four key roles that will always have a strong human dimension. Let’s explore them.

1. Humans as Intelligent ‘AI Commissioners’

One thing we can be reasonably certain of is that AIs will never own assets. They will not own monetary resources, nor will they be accountable for what people do. Of course, human workers will take on tasks from AI systems, much as Uber drivers or workers in Amazon fulfilment centres do, but at the end of the day, the AI will not be the manager or ‘boss.’ The key point is that humans will remain accountable to their stakeholders (e.g. shareholders, taxpayers) for the conversion of resources into outcomes.

Humans will remain responsible for “how” AI is used. Consider a software example, where an AI coding agent is tasked to build a highly transactional social network feature in Python. It may then turn out, for example, that using Python for such applications is inefficient and consequently consumes too much cloud compute to be efficient. The service would have been profitable, had it been built in Rust, then it would have been profitable. Yes, the AI agent may have made the recommendation to use Python, but the decision rests with the person who commissions the task, the ‘AI Commissioner.’

So while many state that roles are “abruptly pivoting from creation/construction to supervision,” [6], I feel that this shift is more nuanced. A key enduring human role is to task AI systems intelligently, maintaining human accountability for the outcome. In software engineering, this means understanding the technical architecture choices that best match your needs. Are you building for one customer or 100 million customers? What is your business model? What are your per-unit costs? How will you market it? What will be your distribution model?

The Intelligent AI Commissioner role can be applied to all industries. In customer service roles, how do you specify and build systems of AI agents that create a great customer experience? If you create inconsistent, poor experiences, that is not the fault of the AI system, but of the people who commissioned, supervised and validated it. Think of law firms: their rules-based, document-based ways of working are natural territory for LLM disruption. However, despite how automated the system is, the legal accountability for the advice given will remain with a human. So, whether you are running a traditional law firm, where LLMs are used to help review and draft documents, or building a fully AI-based legal practice, it will always remain to humans to set the parameters of the service being built, and take legal accountability for the outcome.

2. Humans as ‘Human-AI Orchestrators’

One evening, while building my app, Claude Code got stuck in a rabbit hole. It was trying to debug a problem in the deployed version of the code, and kept to-ing and fro-ing, consuming tokens and failing to fix the problem. It was only when I established (with the help of Claude.ai) a clear workflow (see the Annex in the previous post) and mandated that Claude Code follow it, that we were able to fix the problem.

As I mentioned, much of the literature states that humans should be supervisors of AI. I don’t think that the term ‘supervisor’ does it justice. In this case, I was setting up the rules and the guidelines that I wanted my AI agents to follow when carrying out their tasks. As the person paying the monthly Anthropic bill, I am accountable for that spend. If Claude Code burns through tokens in ways I do not understand, then that is on me. By creating and enforcing a workflow and deployment strategy, I was taking ownership of the problem. I was not simply supervising; I was orchestrating the behaviour of my AI agents.

The orchestrator roles become critical when coordinating humans and agents within a single workflow. For example, the Claude Code team documents and maintains a workflow document (CLAUDE.md) that sets the rules by which the coding agents need to follow. Implicitly, this also guides the human software developers - thereby acting as a glue, a common playbook for the human-AI workforce. In software applications, this means understanding and owning your software development pipeline, ensuring that your development tools (CI/CD, repositories, test automation, etc., collaboration, documentation etc.) are designed and set up to work well for mixed human-AI teams.

CLAUDE.md for the Claude Code repo. We check it into git, and the whole team contributes multiple times a week. Anytime we see Claude do something incorrectly we add it to the CLAUDE.md, so Claude knows not to do it next ","username":"bcherny","name":"Boris Cherny","profile_image_url":"https://pbs.substack.com/profile_images/1902044548936953856/J2jeik0t_normal.jpg","date":"2026-01-02T19:59:00.000Z","photos":[{"img_url":"https://pbs.substack.com/media/G9rfKYRbkAA6Q3w.jpg","link_url":"https://t.co/zftuPx67oK"}],"quoted_tweet":{},"reply_count":63,"retweet_count":83,"like_count":2283,"impression_count":640764,"expanded_url":null,"video_url":null,"belowTheFold":true}" data-component-name="Twitter2ToDOM">

The complexity of this orchestration is what is currently holding back the rollout of agentic AI systems. In a previous post on the future of work, I wrote how less than 10% of firms were scaling AI systems outside of a single functional silo. This is not a problem that is restricted to the digital domain. A recent McKinsey report [8] describes how most human jobs consist of automatable and non-automatable tasks. The non-automatable tasks may include those that require physical activities (e.g. in construction), human-to-human interaction (e.g. in healthcare) or have a level of accountability that is difficult to delegate to an agent or robot (e.g. regulatory tasks, or human supervision). The orchestration role will involve creating workflows that work across humans, AI agents and robots.

The McKinsey report describes some imagined scenarios, including a buildings materials depot. This is very much in the ‘physical’ world end of the spectrum, but you can see how the workflow will include AI agents managing inventory and ordering materials, providing tailored customer advice, humans building customer relations and supervising the store, and robots transferring and loading materials. In another example, Jason Lemkin, the founder of SaaSTt, the largest community of SaaS (Software as a Service) events, describes how his company replaced a team of approximately 10 human sales and business development staff with a team of 2.5 people supervising 20+ agents. [9]

So irrespective of whether your workflow is fully digital or crosses into physical or human interactions, Human-AI Orchestrators will be key to your success. The AI orchestrators will be responsible for designing, integrating, managing and maintaining these complex systems. These are very much the CIO and CTO teams of the future.

3. Humans as ‘AI Validators’

We now close the “accountability cycle” by validating the output of AI or hybrid human-AI teams. I am not talking here about the act of carrying out the review and testing of AI outputs. Although we have seen how software engineering teams are now spending more time reviewing output, I believe that this is a transitory step, and before long, we will be relying on AI agents to test and validate outcomes.

I am instead referring to the socio-technical process by which an organisation ensures that its outcomes, and the processes used to create them, meet its customer expectations as well as its regulatory and compliance obligations. Customer expectations cover a broad range of considerations. In an aerospace components company, it is ensuring that the component you are creating meets the needs of the broader system (be it an aeroplane or a spacecraft), and is compliant with all the applicable and typically rigorous regulations. In this context, AI Validation is therefore an intrinsically human (and organisational) function. It ensures an AI-enabled outcome is acceptable and lawful for its intended use—i.e., it meets safety, regulatory, ethical, and customer obligations—and that there is contestability and redress when it fails. In healthcare applications, the medical professional responsible for your care remains accountable for your well-being, even if they may be using AI-supported diagnosis and treatment tools.

The key limitation of the current crop of generative AI algorithms are optimised to give statistically ‘most likely’ outcomes, which are not necessarily technically correct. There will therefore always be a role for humans to act as a person responsible for the outcome, and often this will have legal implications. I will certainly not offer any legal analysis, as I am not qualified or experienced to do so. That said, most jurisdictions, including the UK [10] and the EU, do not recognise AI or IT systems as being legally accountable entities. Instead, named individuals typically are expected to hold the role of the named “Validator”, being responsible for the outcomes of automated systems, whether it involves personal data, financial outcomes, safety-critical systems or healthcare outcomes. For example, the UK’s Information Commissioner’s Office sets the standard of “meaningful human review” when personal data is processed in a way to have significant outcomes. [11]

To be clear, people involved in AI Validator roles are not just the people who hold accountability for the systems, but those designing, maintaining and operating the systems that ensure the correctness of outputs and acceptability of outcomes. Examples are the chief engineer signing of declaration of conformity, or the Clinical Safety Officer in a healthcare setting. AI Validators are responsible for classifying risk, defining acceptance criteria, designing and maintaining verification tests, maintaining audit trails, and managing governance, including potentially review and rectification mechanisms.

4. Humans as ‘AI Innovators’ - A golden age of innovation?

AI tools are having two major disruptions on knowledge work. First, they are “work displacement” actors - replacing tasks previously carried out by humans with AI agents. Secondly, as described above, they are radically reducing the cost of producing complex outputs, such as software systems. As AI technology advances, there is no reason to believe that other fields of engineering, biotechnology, materials science, medicine, and the creative arts will not also see a collapse in the cost of ideation.

In all engineering and tech organisations that I have worked for, output has been constrained by engineering capacity. This, increasingly, is no longer the key constraint, and I believe we are entering a golden age of invention. Let me explain. The product development process is an iterative process. These can range in scale from Apollo Moon launches to the Build-Measure-Learn cycles popularised by Eric Ries’ Lean Startup methodology. The speed of product development has been traditionally constrained by the time to build and test systems. The “learning” bit - the act of figuring out what changes, pivots, optimisations, experiments to carry out to create a better outcome has, in my experience, rarely been the limiting factor. But remove that constraint, as I experienced when creating Weavify, teams can now innovate without the handbrake on.

This opportunity is largely overlooked, as most discussion on AI adoption appears to focus on the productivity and efficiency gains that hybrid human-AI systems can produce and the sheer difficulty of creating reliable human-AI workflows and systems. Even Google is focusing on metrics such as developer velocity, reducing toil and enhancing code quality. It is therefore not surprising that the prize of all this, an explosion of human creativity, is being missed. As the cost of running experiments falls, so does the cost of failure. You start to run out of reasons not to be ambitious in your ideation.

Your constraints shift “left” and “up” in the product development cycle. You are now limited by “early-stage” activities - such as product ideation, creativity, access to data, and the ability to carry out experimental A/B testing on real customers. Additionally, as the ability to move fast increases, then so does the importance of having a coherent strategy and what your core proposition is. Otherwise, it is too easy to get lost in the “noise” of multiple iterations. The risk here is not so much a lack of ideas or number of development cycles, but a lack of coherence. This makes strategic clarity, and human judgement, more, not less, critical.

AI Innovators will therefore be entrusted with the responsibility of steering these faster development cycles. Today, we typically call these roles product designers, product managers, scientists or innovators. I suspect that these roles will remain particularly well suited to human intuition for two main reasons. First, they operate in a space where there may be a lot of data about existing and previous products or iterations. However, innovation is the act of envisaging imagined futures, something on which little data exists, and consequently, prediction algorithms may struggle with. Secondly, these roles rely on understanding human needs, both expressed and unexpressed. Again, AI tools can help, but I am not sure they will be in the driving seat.

In a blog on the Product Management in the Age of AI, Brian Balfour, CEO of Reforge, an AI customer analysis company states, “The yearly strategy cycle makes no sense in this environment anymore, given how fast things are changing and speeding up… Leaders will be steering ships moving at 10X the speed before.” [13] I couldn’t put it better. If your velocity is 10X higher, then this must be matched by strategic clarity and rapid decision-making.

Conclusion

In this blog, I have tried to extrapolate what this inflection point in the performance of AI coding means for the future of work more generally. The most commonly used lenses - namely analysing which tasks cannot be automated, misses the mark. Additionally, the oft-used paradigm of humans as supervisors and AI as builders, while true, is somewhat simplistic.

What I have tried to do is ask an altogether more useful question. As the act of “building things” becomes cheaper, where will the uniquely human attributes of human connection, intuition and accountability continue to have enduring value in a world dominated by AI. My argument is that these attributes become more, or less valuable, and will be concentrated in four types of roles.

While we are seeing the first signals take shape in software development, I suggest that these four roles, or versions of them, will emerge across multiple sectors and industries. This is not a prediction on what specific jobs will emerge in the future. It is a framework of how to think about the reshaping of the workforce and the roles of humans within it

Claude Code and glimpses of the future - Part 1

Simon Fabri — Thu, 29 Jan 2026 20:46:39 GMT

Over the past few weeks, it has been impossible to miss the excitement and hyperbole over Claude Code, Anthropic’s AI coding tool over the past month or so. But is this hype pointing to a genuine inflexion point? What better way than spending some time hands on with it and try to make sense of what it tells us about the future of AI and work. Certainly, I’d hope to understand what it says about the state of AI coding today. Now, some caveats: I am at best an occasional hobbyist coder. I am not a software engineer. This is certainly not an assessment of how Claude Code can be used in production or enterprise-scale systems.

This will be a two-part blog. This post will run through my experiences with Claude Code, getting a sense of what it can do today. In Part 2, I will consider Claude Code says about the state of AI more broadly and what it means for the work of the future. Let’s go!

What others are saying

The excitement about Claude Code feels somewhat breathless. Andrej Karpathy, part of OpenAI’s founding team and previously director of AI at Tesla, said, “Given the latest lift in LLM coding capability, like many others I rapidly went from about 80% manual+autocomplete coding and 20% agents in November to 80% agent coding and 20% edits+touchups in December.” Jaana Dogan, a principal engineer at Google, claimed that her team built a distributed agent orchestrator in one hour, a problem they had been working on for a year. Similarly, Boris Cherny, the head of Claude Code at Anthropic, said, “Pretty much 100% of our code is written by Claude Code + Opus 4.5. For me personally it has been 100% for two+ months now, I don’t even make small edits by hand. I shipped 22 PRs yesterday and 27 the day before, each one 100% written by Claude.”

So clearly something is afoot. Whilst these claims cannot be independently verified, the excitement feels real, somewhat reminiscent of what happened when ChatGPT was launched in 2022.

The project - introducing Weavify

So how difficult would it be to build something genuinely useful? I settled on an AI bookmarks and research assistant app. Until a year or so ago, I used to use Mozilla’s Pocket bookmarks app to tag web content that was interesting to read later. Mozilla then discontinued this product last summer, and I had not been able to find a suitable alternative. How about if instead I ‘vibe code’ my own app?

I started by having a think of what I’d want. Yes, I’d want bookmarking capabilities and the usual features to organise content according to projects (e.g. blogs I was thinking of writing), tagging, etc. But why not add AI-enabled features? So I started to scope an app that would allow me to add bookmarks (using a Chrome extension initially). It would auto-categorise, suggest tags, and create Twitter-like summaries. Once collated, it would also create summaries of my collated bookmarks, extract key themes, suggest topics and articles for further reading, and organise the bookmarks as neat citations. Yes, I know that Google’s NotebookLM already does much of this.

First Impressions

Well, Claude Code is initially a bit intimidating. Its text-based CLI is somewhat reminiscent of the 80’s TV Teletext service. At times, however, first impressions can be deceptive. As it is a CLI, accessible through your computer’s terminal, it effectively operates on your behalf, accessing and changing files, installing software and so on. It very much looks like a programmer’s tool, because it is indeed a programmer’s tool.

Back to the 80s? The Claude Code CLI

Anyway, what did I learn from working through this?

1. AI chatbots - your trusted advisors

Claude Code is optimised for “doing” stuff. All the coding, building, testing and so is carried out by Claude Code. So, given that I wasn’t quite sure where to start, Anthropic’s AI chatbot, Claude.ai can provide a great starting point. It will help you through coding and deployment options, explain how to use Claude Code etc. By having these speculative conversations about pros and cons in a separate ‘sandboxed’ chatbot, I felt confident that I was not driving Claude Code down unintentional rabbit holes.

Using a chatbot together with Claude Code [courtesy of Nano Banana Pro]

2. Documentation over Code

The Agile Manifesto, the seminal set of principles of modern software development, famously recommends prioritising the creation of code over documentation. I’d suggest that when working with AI coding agents, the reverse is true, at least for the human supervisor. Let me explain. First, the success or failure of AI coding is driven by the quality of its inputs. Bad inputs give bad outputs, with all code generated being predicated on the quality of its inputs. I learned that it is essential to get these right. In my case, I used a Product Requirements Document and an Architecture Specification. This helped ensure that Claude Code and I were on the same page. It uses the code produced and the associated documentation as their input context. The documents effectively act as a signal of intent, and the principal way I could steer the direction of travel of my app.

3. Slop is a Human Artefact, not an AI Artefact

Much has been said, with reason, about how AI slop is taking over the world. Whether it is all those inane videos on TikTok, AI-generated self-promotion on LinkedIn, AI-generated E-books and so on. However, in coding, it is the human who is responsible for slop. Here’s why. Give Claude Code, or indeed any generative AI tool a vague, generic question, then you will get a generic output. Generative AI algorithms are designed to create outputs most likely to elicit positive feedback from humans, so they tend to create statistically “average” outcomes. They can be fluent, polished, and coherent. But they are rarely distinctive.

The same is true for AI coding. Give it a vague input, and the tool will create a middle-of-the-road, “best fit” answer. It will have no implicit understanding of the nuance of your needs nor of your customers’ drivers, and so it will create something that works, but that is also pretty generic. In other words, slop. The more specific you are, the more context you can provide Claude Code, the better it can help you, and the more distinctive and useful the output can become.

4. Planning Mode - who is the intelligent being in this relationship?

Claude Code offers three modes of operation: Ask, Plan and Code. In Planning mode, you are guided through the process of creating the input artefacts, primarily the requirements document. This is where it becomes really interesting, as you can ask Claude Code to ask you clarification questions. Claude will refine its understanding of your intent by asking you several questions, asking for your preferences, or instead to suggest alternatives. It is designed to specifically surface trade-offs. Some may be architectural - e.g. what is the authentication strategy for the app, some may be usability related -e.g. do you want a ‘one-click’ bookmark feature for the app.

This is where your fundamental user insights come in. Here you are firmly in the role of product manager, working with a product development team, guiding them through the implications of what you are asking for, whilst asking probing clarification questions. Sometimes I did not understand the implications of the tradeoffs being proposed, but a quick chat with Claude.ai quickly solved that problem.

What ensues is a very involved back-and-forth with Claude Code, where you are iteratively refining and clarifying intent. Again, this is the heart of what’s been described as “coding in English.” I must say that I found this experience pretty spooky. It is somewhat of a role reversal compared to using a normal AI chatbot, where the user is the one asking the questions and guiding the conversation. Now, the AI is the interviewer. It patiently figures out the detail of what you have failed to articulate clearly, as well as surface implications or considerations that you have not yet thought about. This was probably the most impressive aspect of the Claude Code experience, but also somewhat chastening, providing a glimpse of what interacting with truly agentic systems might feel like.

5. Plan and Iterate - Your “house rules”

As I mentioned, while getting a great plan in place is essential for success, all the principles on how to approach creating a viable piece of software remains valid. For example, Claude.AI suggested how best to chunk up the development, starting with the architectural fundamentals - getting the Google Authentication system working, creating the databases for storing the content, establishing the secure storage for secrets, and so on. This allows you and Claude Code to take controlled, incremental steps towards your intended outcome. Yes, many people indeed claim that you can create an app in a single shot, and I have no reason to doubt it. It can, however, be an expensive and time-consuming exercise to undo, refactor etc. Better, in my mind, to take it step-by-step, and build from the ground up.

And this brings us to a key point. With Claude Code, you are in control of the development approach. Detailed planning up-front or step-by-step iteration and experimentation? It is up to you. Effectively, if you wish, you are in control of how you carry out the development. The tool for doing this is claude.md, which are effectively the “house rules + how we work”. This lets you shape what effectively is your personal or organisation-wide methodology, covering how you will document your work, the approach to testing the code, how to commit code to your git repository, your approach to security etc.

6. Fast creation, slow fine-tuning. Don’t talk about cost!

My experience creating this one-off app has been that getting the functionality off the ground was really quick. Once you have clear requirements, Claude Code can make really quick progress in generating a pretty decent first stab of your product iterations. It feels like you are flying, and quite mesmerising, seeing Claude deploy multiple agents in exploring and shaping different parts of the plan, as well as seeing it go through the process of creating code, building, testing, debugging, fixing, etc. In practice, there are different subagents working on different tasks, but there is nothing stopping you from creating different agents manually.

And so all is bliss, until you get to the point where the product is nearly right, but not quite perfect. So you start fine-tuning, implementing small corrections, going back-and-forward, and then you see your token count rise. And this is where time and cost, gets burned. Not only in fine-tuning your product, but in working through the edge-case defects that stop it from being truly production-ready. Granted, this was only my first attempt, but I got to 80% to where I wanted to be in 2 or 3 evening sessions, and then spent more than that amount of time doing final debugging. My $20/month Claude Pro plan was not quite sufficient to meet my desire for progress, particularly when, in the depths of debugging, I found myself occasionally topping up, rather than waiting for my next quota of usage.

In conclusion

Here’s the web version of Weavify. It also renders quite nicely in a mobile view, though that is something I only thought of later, and so had to refactor the UI. The app works really well, and is now genuinely useful. I will reflect on what I think this means more broadly in Part 2 of this post, but for now, I feel equally excited and anxious. The experience has felt like working with a team of really-enthused expert engineers. Engineers with the tenacity to keep going and persist through errors, patiently seeking alternative approaches, and with the empathy to ask sensible questions, curious, but never sneering.

It therefore feels really invigorating. For all the talk of AI leading to cognitive laziness, I feel I have learned a lot more about the practicalities of building real-world apps in a couple of days than I have in a long time. It is empowering. Anyone, be they engineers, product managers, founders, or senior execs, can now experiment with ideas in a fraction of the time it would otherwise have taken. This is doubly true for anyone who hasn’t got the hands-on skills to create useful code themselves - much like yours truly. Yet, at the same time, I feel a slight sense of unease. Claude Code is giving glimpses of what the future of work may be, and it’s not something I believe we are prepared for. For more of that, wait for Part 2.

AI themes for 2026

Simon Fabri — Sat, 03 Jan 2026 17:43:46 GMT

Niels Bohr famously said that making predictions is hard, especially about the future. So I am definitely not going to set off on a fool’s errand and try to figure out what lies ahead in 2026 from an AI perspective. I will, however, consider the themes that may be significant as the world navigates the implications of wide-scale adoption. These views are, as always, my own, although I will draw on observations made by others, adding links for further reading.

Now with an (AI-generated) companion podcast. Give it a try.

In this post, I explore what may happen to AI demand (a rather obvious outlook) and the resulting implications for the evolution of AI compute and data centres. I then touch on a favourite topic of mine - AI interpretability and explainability (and their limits), and whether this will have any impact in the roll-out of properly autonomous agentic workflows.

1. Jevons Paradox. A model for AI demand?

In a recent blog, Aaron Levie, the CEO of Box, argued that AI adoption will follow the pattern known as Jevon’s Paradox. William Stanley Jevons was an English economist at the time of the Industrial Revolution. He noticed that improvements in the efficiency of coal use drove up demand for coal, rather than decreasing it. Where output is constant, the use of more efficient technologies should reduce the demand for raw materials (e.g. making cars more fuel efficient). However, it also reduces the cost or barriers to entry of using technology, and hence demand instead increases.

We have seen this pattern numerous times, such as the transition from mainframe computers to PCs, the adoption of faster Internet connections and the availability of cloud computing. All these technologies not only made existing processes more efficient, they were also intrinsic drivers of brand new ways of using the technologies. Faster Internet did not just mean quicker email; it created the online gaming and streaming industries.

Levie is highly likely to be right that looking at AI demand through the prism of existing use cases is to miss the point. AI agents will dramatically lower the barrier to entry to the automation of non-deterministic tasks. Earlier this year, Stanford economist Erik Brynjolfsson explored the implication of Jevons Paradox. He argued that for some professions, productivity will be transformed, which will result in lowering the cost of their output, which in turn will increase the demand to the lower prices offered. It is the last point that is critical. Take the case of software developers. If their productivity increases 10-fold, will this result in an explosion in demand for more software development? Time will tell. What it certainly will do is shape how AI compute is structured.

Levie, Aaron. “Jevons Paradox for Knowledge Work.“ LinkedIn, December 28, 2025.

• Rosalsky, Greg. “Why the AI world is suddenly obsessed with a 160-year-old economics paradox.” NPR (Planet Money), February 4, 2025.

2. What Training / Inference Mix are we heading towards?

The growth in AI data centre construction is straining industries as diverse as construction, energy supply, AI chips, networking equipment, and most recently RAM. Much of the attention has been focused on the demand required by frontier labs such as OpenAI to develop and train the latest generation of AI models. However, AI model training is only half of the AI compute mix, with the other half being inference - i.e. using AI models to solve the tasks assigned to it.

The mix of AI workloads matters -a lot. Training of large frontier models makes use of enormous data sets (at least hundreds of terabytes) and several millions of compute hours of high-end GPUs to create models with trillions of parameters. This is what has been driving the adoption of the highest-performing GPUs, as faster processors and larger clusters shorten the model training, experimentation, development and validation time. These workloads are large, complex, and require a lot of orchestration. They are typically bursty as the AI labs pass through different phases of their development.

However, as the AI industry begins to mature, the compute requirements for AI usage (i.e. inference) will overtake the demand for training. Already, reports by McKinsey and Deloitte estimate that global AI compute usage is roughly split 50/50 across compute and inference. The dynamics of inference compute are very different to training. When solving an AI task, what is important is latency (i.e. the round-trip time to process a query) and cost. Therefore, inference workloads are characterised by billions of small requests, creating a lot less bursty traffic profiles, optimised for user experience.

Mix of training/inference compute 2021-2025

So what are the implications of this? As companies will seek to shift from growth to sustainable profits, cost and latency will be the two driving factors to enable them to scale as more efficiently as possible. We are therefore likely to see a split in hardware towards GPUs optimised for large-scale training loads, and hardware optimised for real-time, low-latency inference loads.

The early stages of the generative AI boom has been optimised for development speed - how fast can the frontier labs develop, train and optimise their models and gain an edge on their competitors. Training clusters are therefore optimised for very high levels of parallelism and cluster-scale efficiency in order to train very large models, with networking and memory capacity also optimised for highest bandwidth operations possible. Similarly, as training typically operates in very large batches, reliability is a key concern. On the other hand, inference fleets run continuously, as they pool together requests from a very large number of users. They are therefore optimised for cost per token, which means high concurrency, utilisation efficiency and optimising power consumption.

This mix matters as it will shape the compute mix in future AI data centres across NVIDIA-style GPUs or Google-style TPUs, and where the data centres are located - whether closer to where power is available or closer to where the user is.

Stewart, Duncan et al., “More compute for AI, not less.” Deloitte Insights, November 18, 2025.

Arora, Chhavi et al., “The next big shifts in AI workloads and hyperscaler strategies.” McKinsey & Company, December 17, 2025

3. Will AI become more explainable?

At heart, modern generative AI models are non-deterministic ‘black box’ systems. In other words, the output for a given input (e.g. a prompt) will be different each time the model is run, and it is not possible to work through the internal reasoning. This is due to the inherent structure of the most generative AI models. Individual concepts or pieces of data are not localised in individual areas of the AI model, such as a neuron or parameter, but are instead spread in connection patterns across large numbers of parameters. Furthermore, modern AI systems are multi-layer (or multi-step) systems where the final output is produced by a long chain of intermediate non-linear steps. This means that it is intractable in practice to work backwards through the ‘reasoning’ process, or even to work back to the source training data that contributed to the response. In a nutshell, AI systems are typically “emergent” rather than designed, so that their inner workings are difficult to understand.

Why does this matter? First, for any high-stakes application, such as safety-critical applications, health applications or where there are large financial implications, even a small number of mistakes can be harmful. The very fact that you cannot explain how a system came to make a decision is a blocker to its adoption on legal, regulatory or ethical grounds. Similarly, the opacity of models means that there is no way to detect if the models or their training data have been tampered with maliciously, meaning that it is difficult to trust them. The core question is that as the capabilities of AI systems continue to improve, the risk implications of their lack of explainability will continue to grow. For example, how can we check if an AI is being devious?

In a way, we are beginning to make some progress. “Chain of Thought” (CoT) processing, which involves AI systems decomposing problems into a sequence of prompts, can give indications on the approach taken to solve a problem. e.g. “scope the problem, search for references, categorise and remove duplications, etc.” Howeve,r the inner workings of each individual step remain obfuscated.

Much current research focuses on mechanistic interpretability, in other words, mapping the internal connections of the model. Other promising approaches, such as the use of Sparse Autoencoders - i.e. decomposing groups of neurons that activate on many features to maps of neurons that activate on individual features- are starting to be applied to large LLMs. [see below for a description of how they work]. The third main category, called neurosymbolic reasoning, combines neural-network-based learning with logic-based reasoning. Whilst the neural parts remain a black box, these models have applicability in constrained, rules-based applications.

Approaches to AI Explainability

In a nutshell, while 2025 has seen some breakthroughs in interpretability and explainability, no approaches have yet made it into mainstream LLMs, due to a combination of intrinsic model architecture constraints and cost efficiency. It is clear however, that the demand for solutions in this space will only continue to grow, particularly as AI systems are allowed to act more autonomously

Dario Amodei, “The Urgency of Interpretability”, April 2025

Hussein N. et al., “Mapping LLMs with Sparse Autoencoders”, People + AI Research (PAIR), Google, October 2025

4. How should we manage autonomous AI?

2025 has seen the first implementations of AI agents - where AI instances have some autonomy in the tasks they carry out. Probably the space where their use is most widespread is in software development, where tools such as OpenAI’s Codex software development agent and GitHub Copilot Workspace are seeing AI tools take on a greater share of tasks across the software development lifecycle. However, we are still in the very early stages. A survey published by McKinsey in November found that less than 10% of large firms are scaling the use of AI agents in any given function. So while adoption of AI is growing, it is predominantly driven by individual workers using it as an assistant to carry out their individual tasks.

To be clear, the direction of travel in the development of AI agents will result in AI-enabled systems that can observe, plan, act, respond and iterate over prolonged periods with limited human interaction. When AI is used as an assistant, there is, by implication, always a human in the loop. However, in agentic implementations, humans are no longer implicitly within the workflows, a shift from the Copilot Model to the Autopilot Model. This can have a number of implications.

First, coming back to Jevons’ Paradox, we truly enter the territory of exponential growth. We are already seeing implementations in systems consisting of easy-to-characterise rules such as customer support centres, enterprise workflow and finance operations. As the technology becomes more accessible, we will see agentic deployment in a broader range of business functions. The incremental cost of adding a new agent will be low compared to the potential benefits of adding one more agent - certainly a fraction of the cost of a human worker. Companies that manage to master multi-agent orchestration will gain a compounding advantage. They will therefore have a strong incentive to deploy as many agents as technically possible. The possibility therefore, exists of an explosion in size of agentic systems.

This will likely change the nature of work, with more roles dedicated to oversight and supervision of AI agents. We are all familiar with the “span of control” problem when managing human workers. How many AI agents can a human reasonably be responsible for? What about humans supervising AI agent supervisors? This will have significant implications for organisational design - teams responsible for managing interactions between agents, monitoring interactions with external systems, suppliers and customers, as well as teams responsible for oversight, monitoring and audit.

From my perspective, as human workforces are augmented by growing agentic AI work, there will be a corresponding shift to interactions and traffic being driven by AI systems rather than by humans. This means that APIs and endpoints designed for human interaction will instead be used by autonomous agents. We are already seeing web platforms being overwhelmed by AI use. The low incremental cost is already giving rise to “Agentic Sprawl”, and this will likely accelerate, both in terms of traffic internally within enterprises (e.g. between different enterprise applications) as well as with external parties.

Bringing it all together

I wasn’t quite sure where this blog post would lead me, but I feel that all themes here are interrelated. For all the talk about agentic AI systems, rolling out agentic workflows that span across multiple functions in a business will require a redesign of those workflows to be tailored around AI agents rather than human workers. This will be a non-trivial technical and workplace design challenge, not least due to the explainability limitations of current AI models. However, should these problems be solved, then we can expect a truly Cambrian explosion of AI agent adoption.

AI and the Future of Work

Simon Fabri — Tue, 18 Nov 2025 20:22:09 GMT

A couple of weeks ago, I was lucky to spend some time in Cambridge, Massachusetts, attending seminars at MIT and Harvard. (That’s one off the bucket list). While there, we had the opportunity to attend a fascinating panel discussion on possible futures of how AI will shape the future of work. I was particularly thrilled that the panellists included Ethan Mollick, one of the best commentators on AI, and Daron Acemoglu, author of an incomparable history of technology. This blog post builds on that discussion…

The state of AI in the workplace today

Before considering the future of work, let’s briefly consider how generative AI tools are being used today. A recent review of the labour market by economists at Stanford University [1] shows that in most developed economies, between 20% and 40% of workers currently use generative AI tools.

Source: [1]

What is more striking is the impact AI is having on carrying out individual tasks – something we can refer to as “micro-productivity”. This report highlighted the impact on tasks across many disciplines, from writing to technology design, management of personnel and many more. The time cut to carry out these tasks was reduced by between 2.5 and 5 times over their unassisted duration.

Average number of minutes to complete a task with and without Generative AI. Source [1]

It is therefore clear that while AI may not (yet) be replacing most human jobs, it is certainly being used to automate individual tasks at a significant scale. The report states finds “limited evidence that Generative AI exposure significantly affects job openings or employment levels.”This is consistent with my findings in a previous blog, which showed that while there are productivity gains experienced by individual workers, most companies are not experiencing these gains at the enterprise level [2]. In the seminar, Mollick argued, quite convincingly, that practically everyone is quite clueless about how to capture value or productivity. The big Frontier AI labs are ‘simply’ in a headlong race to produce the largest models tasked at performing best across a range of AI benchmarks in the quest to be the first to achieve AGI. The premise is that this is a high-stakes, winner-takes-all race. Mollick claims that to these labs, how these models will integrate in the workplace is a secondary concern.

In the enterprise, there is no established playbook for achieving AI-powered success, although there is broad experimentation. Companies will need to be fundamentally rewired to benefit from AI agents forming a core part of their workflows, rather than simply automating workflows designed for humans. We are still in the infancy of this process, and it’s occurring in pockets, such as in support centres and in software development teams. A report by McKinsey [3] states that across most organisations, agentic AI adoption is still in the pilot phase, with less than 10% of firms currently scaling AI agents.

Stating the obvious – AI continues to get (much) better

As this blog is supposedly about the future of work, rather than its present, where are we heading to? I will start by stating the obvious – AI continues to get better. Most models are updated every few months, with generational changes (e.g. GPT-4 to GPT-5) roughly on an annual basis. Strikingly, these improvements currently continue to be exponential in nature. Although AI systems routinely outperform humans in specific tasks such as reading, this is not sufficient to fully automate a human job. In practice, the jobs carried out by humans consist of fairly complex tasks, each of which is a long chain of smaller steps. To be able to fully replace that human, an AI agent must be able to carry out this long chain of tasks at an acceptable error rate.

To give a sense of where we are with AI viably replacing a human workers, consider research [4] coming out of METR, an organisation that assesses AI frontier models. This suggests that the task length that LLMs can carry out reliably [i.e. with >80% success rate] is doubling approximately every seven months. In 2022, GPT 3.5 could only carry out tasks which would take a human software engineer ~10s to carry out (with 80% accuracy). By 2025, ChatGPT-5 had shown sufficient improvements to carry out 26min worth of work at 80% accuracy. If AI systems were to continue on the current improvement path, a somewhat speculative assumption, then by late 2027 to early 2028, AI systems can successfully complete tasks equivalent to a software engineer’s whole working day.

Rate of improvement of GenAI taking on work of software engineers [4]

How AI changes the value of work

Given that AI is already replacing tasks across a wide range of professions, what is the impact on the value of (human) labour, and hence the likely effect on wages? A recent paper by Seb Murray [5] describes why automation is not always bad for workers. The key finding is that if automation removes the simpler parts of a job, the work that remains often requires more expertise, consequently demanding a higher premium. Conversely, where automation impacts the more skilled parts of a role, wages will fall. Consider accounting clerks and bookkeepers. As spreadsheets took over the more manual parts of their roles, they evolved into financial analysts. Over the period 1980-2018, the number employed in these roles fell by a third, while salaries rose in real terms by nearly 40%. On the other hand, taxi drivers used to rely on deep local knowledge to navigate urban areas. London taxi drivers needed to pass a test called “The Knowledge”, which typically took three to four years to master. Yet Uber’s navigation software has largely nullified the value of this know-how, seeing taxi wages drop by between 30-60%.

So applying this to the advent of generative AI, we can extrapolate that in the absence of policy interventions, there can be a broad range of outcomes for different professions, both in terms of the numbers employed, as well as upon their wages. We can postulate that, for example, customer support staff with deep technical know-how of their products can easily be replaced by AI agents that have ingested the technical datasets and knowledge base of their products.

Learning 1. Combine domain expertise with AI savvy

So what suggestions can I offer to whoever is worried that AI may replace their job? Once again, I will rely on the AI adoption within software development as a preview of AI’s broader impact on the workplace. Andrew Ng, a serial AI entrepreneur and expert, explains how the most effective software engineering hires are those who combine an understanding of software engineering fundamentals with the practices required to construct AI applications (such as prompting, RAG, agentic workflows, etc.) [6]. He says experienced software engineers who retain a pre-2022 way of working risk becoming obsolete. Similarly, less experienced engineers straight out of college who don’t have a strong understanding of computer programming fundamentals have limited utility. Whether this means that newly-minted graduates are over-reliant on AI is unclear. There is anecdotal evidence on X that startups are avoiding hiring new grads for this reason. As Ng puts it, “without understanding how computers work, you can’t just ‘vibe code’ your way to greatness.”

The general lesson for the AI-augmented workplace is that unless your job is fully automatable, then you need to understand how best to augment your job with AI. This means understanding how to combine your own domain-specific expertise with an understanding of how GenAI tools can help you. This is true whether your skills are in the creative arts, commercial skills or engineering. Whilst I am not qualified to comment on the likely macro-economic outcomes, if we look at the impact robotics and automation have had on manufacturing jobs in the US and Europe over the past twenty years, there are much fewer jobs, though better paid. As the rather tasteless joke about the two runners who bump into a bear goes, you don’t have to outrun the bear; you have to outrun your friend. Though this advice is clearly not conducive to teamwork!

Lesson 2. Optimise AI for problem-solving and skills development.

And in terms of advice for those designing AI-enabled workplaces, it’s worth returning to the MIT seminar I opened this blog post with. Much of the debate was on whether the nature of AI’s impact upon work is pre-ordained or inevitable. Zana Buçinca, a researcher at Microsoft, argued that there isn’t a single foregone conclusion on how AI will impact work and skills. [7] This, instead, will depend on how AI systems are integrated into the workplace. Buçinca suggested that systems that are designed solely to fully automate activities (i.e. simply to optimise for the immediate business outcome) can degrade the human operators’ expertise over time.

The risk of deskilling is a much-researched topic in the field of medicine, where an over-reliance on AI can see an erosion of clinical skills [7]. This mirrors what has been seen in aviation for many years. An over-reliance on autopilots and automation has been linked to a decline in pilots’ manual flying and monitoring skills. The US Federal Aviation Administration consequently issued guidance to ensure that pilots regularly practice manual control. The conclusion is that AI-assisted systems should be designed not only to best automate and optimise the task at hand, but also to optimise the human experience, particularly cognitive engagement. For example, she suggested that systems that offered an AI recommendation together with a contrastive option as an alternative resulted in better outcomes than systems that made single recommendations. They also stimulated better and deeper human engagement with the AI assistants, indicating that chatbots do not offer a one-way trip to deskilling.

In my view, organisations’ long-term interests lie not only in the most effective short-term optimisation, but in nurturing and developing the bedrock of skills required to maintain the integrity of the profession. As professions such as law and management consultancy now rely on AI systems to carry out research previously assigned to junior associates, AI tools in the workplace must therefore be designed to help the professional development of workers. The alternative is a world where the integrity and know-how of entire professions risk being eroded in the long term.

Conclusion on the Future of Work

In a nutshell, I am really not quite sure what the future holds, and please accept my apologies if you feel let down by the lack of a bullish oracular prediction. I am fairly confident about a couple of things, though. First, as companies figure out how to design their workflows to integrate autonomous AI agents as well as AI-augmented workers, they will eventually see the productivity gains that have eluded them so far. As these changes take place, many jobs will go the way of bookkeepers, typists, and many office jobs of decades gone by. Those that remain will be AI-augmented in one way or another. The key discriminator for each job will be whether AI will replace the most complex tasks we carry out, or whether it will relieve us of the burden of the mundane and repetitive parts of our jobs. As AI systems become more deeply integrated into organisations’ workflow, the only way for organisations, and indeed society, to sustainably maintain and build the human skill base is to design AI systems that optimise the human experience, as well as the business outcome. While this may sound obvious, this does not appear to be the state of play today.

References and Further Reading

The post AI and the Future of Work appeared first on The Sand Reckoner.

What does AI mean for university education?

Simon Fabri — Mon, 08 Sep 2025 21:27:10 GMT

This weekend, Geoffrey Hinton, who pioneered the field of deep learning and one of the so-called ‘godfathers of AI’, gave a wide-ranging interview to the Financial Times [1] (link behind a paywall). In it, he painted a rather gloomy picture of how AI is going to change the world. He said, “Rich people are going to use AI to replace workers. It’s going to create massive unemployment and a huge rise in profits.”

We should treat these words with caution. It is well-known that experts tend to over-emphasise the importance or impact of their space. Nevertheless, you’d be hard-pressed to find any serious technologist or economist suggesting that AI will be anything other than disruptive. Like many parents, I wonder how to prepare my teenage kids for the uncertainty that lies ahead. More specifically, in this blog, I have a look at how well the UK higher education system appears to be preparing our young people for this change. Note that I am not an educator, nor claim any expertise in pedagogy, so treat these musings with healthy scepticism.

A Tale of Two Time Horizons

We can apply two very different horizons to explore the disruption AI will have on our economy. If we were to take a very long view, AI offers to radically bring down the economic cost of productive output, particularly amongst knowledge workers. In itself, this is nothing particularly novel. The advent of agriculture displaced foraging (so-called hunter-gatherers), bronze and iron tools displaced stone tool-making, the invention of the printing press made the skills of monastic scribes largely redundant, the invention of steam-powered weavers saw the onset of the industrial revolution, while mass-production in the 20^th century saw skilled artisanal labourers replaced by less-trained factory workers. In all these transitions, overall economic output increased, though at the cost of (often significant) hardship for those workers who lost their jobs.

If we then zoom into the onset of generative AI, we can see that timescales of change collapse to a handful of years. It is the speed of change, rather than its impact, that is truly revolutionary. For example, ChatGPT (based on GPT 3.5) was launched in November 2022. This year’s cohort of computer science graduates were already well into their first or second year of undergraduate studies when ChatGPT launched AI into mainstream discourse. At the time they were selecting university courses, the majority of these students would have been oblivious to the pending arrival of gen AI, yet by the time they graduated, they were facing a very different job market.

Graduate hiring in the UK and US has plummeted over the last three years [2]. A separate study by McKinsey, shows that the reduction in hiring was twice as acute in jobs more exposed to AI compared to those with low exposure [3]. The point is that the displacement in occupations took a couple of generations at the time of the Industrial Revolution. This time round, that disruption may take as little as five years. Yes, we have seen this seismic disruption before (or at least, humanity has). What is new is the speed at which it is taking place.

Trend in Graduate Job Hiring, UK & US. Source: Financial Times

The hollowing out of entry-level jobs

A paper [4] by researchers at Stanford University, helpfully titled ‘Canaries in the coal mine’, studied payroll data of 50 million workers in the US. The study found that employees aged 22-25 in AI-exposed occupations were 13% more likely to lose their jobs than others in less-exposed jobs. As I explored in my previous blog, entry-level software engineering jobs, which involve component-level programming, debugging, testing and documentation, can easily be replaced by generative AI tools such as coding copilots. The study attributed much of this job loss to areas where AI can automate work (i.e. replace a human worker), rather than augment (i.e. help a human worker) the work. In an op-ed in the New York Times, the Chief Economic Opportunity Officer (what a job title!) at LinkedIn describes [5] how the bottom rungs of the career ladder are disappearing. Outside tech jobs, legal firms, consultancy companies, retailers and so on are all now looking towards AI to take on duties once carried out by entry-level roles.

Changes in headcount in US firms by age group [4]

Software recruitment compared to other professions in the US [pragmaticengineer.com]

Looking Ahead – what does this mean for graduate jobs?

Dario Amodei, the CEO of Anthropic believes this trend will accelerate. In an interview with Axios [6] he explained that Anthropic research shows that while AI models are primarily being used for augmentation, he believes that over the next two years, their use will tip more and more towards automation.

So just think about it for a moment. This will happen 1-2 years before this year’s university undergraduate intake enters the job market. At the beginning of the year, Mark Zuckerberg said that in 2025, “AI will be effectively a sort of mid-level engineer that you have at your company that can write code.” Coming back to Amodei, he believes that AI will replace half of entry-level white-collar jobs in the US within five years, increasing unemployment to 10-20%. For context, a shift of 10% in unemployment represents a loss of 17 million US jobs, which dwarfs the approximately 6 million manufacturing jobs lost in the US through globalisation of manufacturing.

UK Universities Spring into Action. Or do they?

As I mentioned previously, I have a busy autumn ahead, visiting a host of Russell Group Universities (supposedly the best universities in the UK) with my son who is hoping to start an engineering degree in a year’s time. Surely, these august, world-leading institutions, home to the country’s best technology thinkers dedicated to moulding and nurturing the technologists of the future would have this challenge in hand?

You will therefore understand my surprise that, having sat through about ten “introduction to engineering” courses, covering mechanical, aerospace, automotive and electronics disciplines, not once did I hear anything about AI. Of course, they were all happy to talk about their wonderful facilities, their co-curricular opportunities, their partnerships with industry, and the sheer excellence of their academic credentials. But with the notable exception of a talk on a B.Sc. in Artificial Intelligence course, not a peep on AI.

So, given that AI is undoubtedly going to transform most technology roles, does this feel adequate? My previous blog suggested that all engineers will effectively become supervisors of AI, be they by using AI tools as part of the design process, optimisation, prototyping, experimentation or validation. All engineers will need to be adept at their domain of expertise, as well as in the use of AI and data. Yet none of these universities sought to mention anything about it.

I then headed over to the undergraduate prospectus web pages of the same universities and asked ChatGPT to assess the curricula for engineering degrees for subjects related to data science and AI. I selected mechanical engineering-related courses as these are technology specialisms where data and AI do not traditionally feature heavily. On looking at the data, you can see that AI typically appears as a ‘specialisation’ subject in the latter years rather than a foundation for the course. This takes the guise of subjects such as control theory, machine learning, robotics and mechatronics. For this reason, 4-year MEng courses tend to have nearly double AI/data content of 3-year MEng courses. There are exceptions, though. Southampton offers a compulsory Data Science and Computing for Engineers in year, while Cambridge unsurprisingly offers a host of rigorous-looking options, with an early focus on Computer Science, followed by later options such as Probabilistic Machine Learning.

AI content of selected UK mechanical engineering courses

UniversityMechanical BEngMechanical MEngAerospace/Aeronautics BEngAerospace/Aeronautics MEngUCL6–9%8–12%——Southampton8–12%12–18%6–10%10–15%Bath6–9%9–13%5–8%7–11%Bristol8–12%10–15%8–12%10–15%Warwick8–12%12–18%——Loughborough6–9%9–13%8–12%12–16%Cambridge (BA≈BEng for years 1–3)3–6%6–12%3–6%6–12%Imperial8–12%12–18%—12–18%

Protecting Academic Integrity and the Learning Process

UK universities seem to prioritise the integrity of their qualifications above all else. This can be seen in “Russell Group principles on generative AI in education” policy paper [7]. This paper focuses primarily on addressing the risks associated with the use of Gen AI in higher education. Privacy, bias, inaccuracy and misinterpretation of results, ethics, plagiarism, and exploitation are all (quite rightly) highlighted as concerns. However, most universities provide thoughtful guidance on how AI tools (particularly Gen AI) can be used in to support learning as well as where and how they can be used to help create assessed work. Some even use these policies to ensure that students are ready for life in an AI-enabled world.

So what should Universities learn from this?

1. A University Degree is not a guarantee of anything.

Universities need to wake up and smell the coffee. Evidence is mounting that employers are increasingly not relying on a University degree or formal qualification as evidence that candidates have the required skills to compete in the new jobs market. A recent study [8] showed that while AI roles grew by 21% between 2018 and 2023, mentions of university requirements for these roles actually decreased over this period. Employers are prioritising skills over qualifications, clear evidence that universities have work to do to remain relevant.

2. Data, Statistics and AI as core subjects for technical disciplines

Especially for technical subjects, data science, statistics, and learning models should be treated as foundational subjects and not as specialisation options. There will be no technical field where some form of AI will not be part of the toolkit used by practitioners. Coding assistants as used by programmers represent just the tip of the AI-assisted revolution, which will reach all technical fields. Progressively, AI agents will take on the tasks of individual knowledge workers, so that each practitioner will effectively become a supervisor of AI agents. As such, they will need to be fluent in the language of AI – namely, data and statistics, as well as in the structure and methodologies used by the learning models. Key skills will be orchestration of models, managing multiple agents and being able to take professional accountability on their behalf through robust validation, particularly for high-stakes outcomes such as in health and safety-critical applications.

3. Basic AI Model Literacy for all disciplines

Some of the most professions most exposed to AI are those that traditionally fall into the humanities. A report by Microsoft into the most exposed professions sees interpreters at number 1 (perhaps unsurprisingly) and historians at number 2 (more surprisingly). A paper by researchers at Harvard Law School describes how AI can see productivity gains for lawyers in excess of 100 times, in one example, reducing associate time for preparing for litigation from 16 hours to 3-4 minutes. [9] Yet, reviewing the course material for five of the UK’s top law degree courses sees AI treated as part of “technology and law regulation” subjects, rather than a core part of the lawyer’s toolkit.

All knowledge workers should be equipped with an understanding of the characteristics, structures and limitations of AI models, particularly generative AI and other deep learning models. Their inherent non-deterministic nature combined with their propensity to hallucinate means that you’d have to be foolhardy to rely on the models without a thorough understanding of how they operate, at least conceptually.

4. Reinvent the value proposition

It is difficult to make predictions about such a fast-moving space, other than to say that such changes will continue for a while and will at times feel overwhelming. This means that Universities’ once-in-a-lifetime model of learning seems increasingly ill-equipped in today’s world. Just as the Open University pioneered distance learning fifty years ago, it is time again to be radical. Universities should shift towards offering long-term part-time learning models that accompany their students throughout their professional lives. Online learning providers such as Coursera, Udacity and Deeplearning.ai, all offer content from many providers. Universities should expand their partnerships with these providers – both making use of external content, as well as using these platforms to distribute their own lifelong educational offerings. Some Universities are already doing this. MIT has long been a leader in open AI programmes, and offers free online courses as well as certified qualifications through its MicroMasters programmes, with other top-tier universities also making courses available on platforms such as EdX and Coursera.

Predicting the future is hard

When assessing technologies close to the peak of the hype cycle, it is difficult to make predictions as to the extent of the disruption they will bring and how quickly they will be adopted. I wrote a piece back in 2018 on the inevitability of safe-driving cars, proof of the pitfalls of getting carried away with the excitement of ‘revolutionary’ technologies. Nevertheless, AI will very likely replace large chunks of what knowledge workers do today, radically reshaping their jobs in ways that are hard to predict. Ignore this at your peril.

References and Further Reading

The post What does AI mean for university education? appeared first on The Sand Reckoner.

How AI is changing engineering

Simon Fabri — Fri, 01 Aug 2025 13:25:47 GMT

I’ve been writing this blog intermittently for around ten years, exploring questions that intrigued me from a professional perspective. I started off exploring the then-emerging Internet of Things space in the early 2010s, during which time I took on roles building smart home and connected car products. Over the years, it evolved into a broader exploration of themes related to leading tech teams, while occasionally keeping an eye on advancements in machine learning and artificial intelligence.

In this blog post, the professional and personal coincide for the first time. As an engineer and leader of tech teams, I am very interested in understanding what the advances in artificial intelligence mean for how tech products will be built. On the personal front, I am currently visiting universities with my son, who is looking to start an engineering course. So the question as to what engineering will look like in five, ten, or twenty years is particularly pertinent. This blog is my attempt to explore that question.

A spotlight on software engineering

Software engineering is the technical discipline that to date has been most impacted by AI. At one level, the introduction of AI is simply extending the developer’s journey away from the native execution of assembly code, providing greater levels of abstraction, just as progressively higher-level languages have done today. However, LLM-based code assistants also fundamentally change the nature of coding. For the first time, they introduce a non-deterministic element, as LLMs do not create repeatable outputs. In effect, we are outsourcing control of the logic, the very essence of our intent, to machines.

So what’s the impact of AI coding assistants upon the developer experience and overall productivity? Early research shows a mixed picture. A recent report [3] by DORA, the DevOps Research and Assessment team, describes benefits as seen by developers, including flow, job satisfaction and a reduction in burnout. However, the gains at an organisational level appear to be modest, with a 25% increase in a user’s AI adoption increasing productivity by only 2.1%. While the quality of documentation, code quality and code review time improved significantly, this came at the cost of an increase in technical debt and code complexity. These gains in productivity are mirrored by a randomised control trial of 96 developers at Google, which showed a 20-25% increase in productivity in coding tasks attributable to AI.

Impact of AI coding assistants on organisational development effectiveness – Source DORA, 2025 [3]

Overall, the real-world impact on productivity at the enterprise level appears to be mixed. The DORA report suggests there is negligible, if any, improvement in software delivery effectiveness. Another paper by METR (Model Evaluation & Threat Research), a body assessing frontier AI capabilities, found that for developers working in a large, mature open-source repository, task completion time increased by around 20%. It appears that improvements in coding productivity at the developer level are, to date, offset by increased load on oversight, review and integration.

Some insight into what’s going on can be found in the recently released, and rather excellent, annual developer survey by Stack Overflow [15]. Although 84% of developers now use AI assistants, only 3.1% of them highly trust them, and 46% actively distrust them to some extent. The two main bugbears are that “AI solutions are almost right, but not quite” (66% of respondents) and “debugging AI-generated code is more time-consuming” (45% of respondents).

Even though we are still in the early days of AI adoption in software development, some things stand out:

AI is fundamentally changing the nature of software engineering. AI tools are here to stay, and they are changing the nature of coding and of the software that is created. For example, AI tools create non-deterministic, ‘fuzzier’, less predictable codebases. They are lowering the cost of building code from scratch, reducing the perceived benefit of creating reusable, modular code. These are pretty fundamental shifts in what developers have traditionally valued.
New skills are required to master software engineering in the age of AI. Software engineers now need to master a blend of traditional software skills and AI competencies. These include prompt and context engineering, AI-assisted code review, AI output evaluation, integration into AI APIs, an understanding of AI model behaviour, and a new security and compliance paradigm. Successful developers and organisations will be those who invest in learning to take on these new skills.
This requires a mindset shift. As Gen AI models become more tightly integrated into coding tools, development pipelines and documentation systems, the boundary between human and machine contribution will blur, requiring developers to increasingly work with AI tools as a co-contributing partner, rather than as a traditional coding tool.
Organisational effectiveness gains are currently lower than expected. The use of AI is leading to higher overall coordination costs, as AI-generated code tends to require more discussion, review or coordination across different developers. Additionally, the current unreliability of AI systems requires greater investment in security, validation and compliance oversight.
The AI benefit is not consistent across all developers. Most of the research indicates that junior engineers, working on smaller components, can benefit most from AI assistants. This mirrors the way they would ask senior colleagues for help [5]. However, these are the tasks most easily automatable, so AI appears to be disproportionately impacting developers who are just starting their careers. For more senior engineers, with responsibility for a large, complex codebase, the opportunities AI provides are more limited. For example, randomised trials show larger gains in productivity for junior engineers (~27-39%) than senior engineers (~8-13%) [6]. The greater the scope, the more complex the problem, the more difficult it is to solve a problem without creating knock-on effects elsewhere.
AI does not (yet) offer tech leadership. Yes, AI can help with coding, but it does not yet act as a substitute for non-coding tasks expected from senior engineers, including the creation of complex design documentation, technology roadmaps, arbitrating with other teams, or simply providing your teams with the vision of the way forward.

And the broader learnings for other engineering disciplines?

Having cast a spotlight on software engineering, what are the learnings for other engineering disciplines? Software engineering lends itself particularly well to the use of AI, as it often operates in a ‘fully digital’ space, where all the inputs and outputs are digital. There are often (though not always) fewer safety implications than for physical engineering applications. How will this be different for real-physical applications such as manufacturing, semiconductor design or aerospace?

First, let’s not forget that AI has been used in engineering design for well over a decade. Be it in the form of image recognition and robotics on an automated assembly line, the use of machine learning to optimise engineering designs or carry out predictive maintenance, and in the use of digital twins to model real-world systems, learning systems are not new. The question is rather, what will be the effect of generative AI, and its inherently non-deterministic nature?

1. AI for initial concept generation

Let’s start at the beginning of the engineering process. Much as AI co-pilots can help blog authors suffering from writer’s block, generative AI is already being used successfully by many engineering firms in the meandering and seemingly random process of creating initial exploratory designs. The integration of LLMs into physical CAD software already allows for design concepts to be generated from natural language prompts at a much higher level of abstraction. For example, a paper by Byrne et al shows how LLMs can be used to provide CAD designs for multiple concepts for consumer products, in this case, headphone stands, by giving it instructions in natural language [10].

FlecheTech, a Swiss startup, uses a fine-tuned LLM to produce PCB [printed circuit board] prototypes for hobbyists, small companies and anyone who requires prototype designs, but lacks the in-house expertise to design it from scratch. FlecheTech claim that this reduces the time to create a working board from around 8 weeks to one week [15]. These are both examples where generative AI’s non-repeatability and variation are actually a strength, as it can produce multiple variations as designers explore early concepts.

2. AI for design automation

Early ideation occupies only a small proportion of the overall engineering development effort. If AI is really going to transform engineering, it is going to be in the more structured design and development phases. Consider semiconductor design, which for decades has relied on Electronic Design Automation (EDA) software as the only way to deal with the extremely high complexity of modern processors. Semiconductor design includes processes such as design specification, front-end design, physical design validation and test and analysis. NVIDIA estimates that up to 60% of a chip designer’s time is spent in debug or checklist-type tasks [11]. In fact, the hardware design process is not dissimilar to software development, as the chip designs are fully described in software. LLMs therefore have the same potential to design and optimise code and synthesise information as they do in the software development space, with leading EDA providers such as Synopsys already building generative AI capabilities into their tools.

AI opportunities within the semiconductor design process – Source: AWS [11]

There are several other examples of companies using AI to automate engineering processes. CloudNC, a British startup, is using AI to automate the process of programming CNC (computer numerical control) machines, which are the mainstay of precision manufacturing [14]. These machines cut metal to precise specifications required by industries such as aerospace and medical devices. Although CloudNC don’t disclose how their algorithms work, it appears to use a combination of AI-based training and physical geometry to determine the optimal cutting strategy. These examples show that, whilst not yet as pervasive as for software design, generative AI solutions are beginning to appear across multiple technology areas.

3. AI for complex engineering optimisation

One area where AI is already having a big impact in engineering design is in engineering optimisation – the process of developing and testing different designs to see which operates best in a given set of conditions. Traditionally, this has been a fairly manual process, and even when reliant on simulations, it is very computationally-intensive. Neural Concept, a Swiss engineering software company, has developed neural-network-based software to help Formula 1 teams create the most effective aerodynamic designs possible [12]. Formula One car development is a race in itself, with each team racing to create the most effective car package possible before and during a season. Neural Concept’s software helps create outcomes in seconds rather than hours, conferring a clear competitive advantage. Similarly, Airbus is making use of AI to create 3D printed structural components using generative algorithms mimicking organic structures. [17]

Recent research has also shown that the combination of reinforcement learning (RL) and LLMs can outperform engineering design as carried out by a human expert. A recent paper explored the use of commonly available LLMs, such as OpenAI’s GPT-4o and Anthropic’s Claude 3.7 to design a rocket that met a given set of performance criteria, such as achieved altitude, structural integrity and landing accuracy. The combination of LLMs and reinforcement learning was found to exceed the performance of an expert who has carried out similar rocket design tasks over many years.

Designing rockets: A human expert compared with LLMs + Reinforcement Learning – Source: Simonds et al.

So what does all this mean for the future of engineering?

Given what we have seen from how generative AI is being used in software development, and the examples in other engineering areas, some themes begin to emerge.

We have not even begun to scratch the surface. First, a caveat. We need to consider that it has been barely three years since LLMs emerged from their research labs, and their capabilities are evolving at breakneck speed. For example, this blog hasn’t considered the impact of agentic systems, or indeed of artificial general intelligence (AGI). That said, I am pretty confident that the lessons below will hold true.
Data really matters. AI-assisted systems can only work with whatever data they are provided with, be it in the form of data to refine or pre-train models, or feedback to guide their systems through reinforcement learning in tight verification loops. Therefore, no matter the engineering discipline you are involved in, you will need to be fluent in capturing, managing and processing data, not just design files, but real-world physical data, and ensuring that it is available for use with your AI-assisted platforms.
Understand the black box. When we use generative AI models, we are, for the first time, outsourcing the logic to a machine that we currently don’t fully understand. This issue, called the interpretability problem, means that it is critical that engineers at least understand the underlying limitations as well as the benefits of using such models. These characteristics include the tendency to hallucinate, difficulty in repeatedly creating consistent outcomes and an excessive sensitivity to how the prompts are crafted. The list goes on. They are undoubtedly incredibly powerful tools, but they come with significant downsides.
Oversight is critical. The time spent ‘engineering’ outcomes will shift away from developing and designing individual components or subsystems. These are tasks that, with the right instructions and data, can easily be outsourced to AI assistants. Instead, AI-assisted engineering will require greater emphasis on shaping and crafting the requirements, contexts, specifications, and other input data to be used by AI-powered tools. Similarly, it will be critical to invest in the oversight of outcomes, in other words, validation. This is to ensure that the system as a whole still works well (i.e. the system integrates correctly as a whole) and to ensure that safety, security and compliance requirements are not compromised. In other words, each engineer will effectively become the supervisor of their AI team.
Finally, domain expertise remains key. As I hope this blog has made clear, AI systems are only as good as the context that they are provided with. Without a specific and clear context, AI systems can produce great outcomes that either don’t solve the problem at hand or fail to work in their intended environment. This means that the engineer’s insight into what is really needed to succeed in a given context is critical. This engineering input, or tasking, could take different forms. It could be the requirements upon a component or sub-system for it to work well in a broader system, or the features or capabilities will meet the customer’s needs.

Finally…

Writing this blog post has been a bit of a journey, and the key learning for me is that engineering is becoming simultaneously more complex, but also much more exciting. Used well, AI tools can turbo-charge the creative potential of individual engineers. However, human engineering judgement will remain as critical as ever. Engineers will need to be multi-disciplinarians, mastering both their domain of specialism as well as the fundamentals of the AI models they use.

Finally, while I tried to write about the future of engineering, what I ended up exploring was, in effect, the present and near future. I have not even begun to consider the implications of systems of AI agents interacting with each other has upon how technology development is carried out. That, my friends, is a topic for another day.

References and Further Reading

The post How AI is changing engineering appeared first on The Sand Reckoner.

Artificial Intelligence Good Reads (Part 2)

Simon Fabri — Sun, 21 Jan 2024 20:37:41 GMT

This blog post continues a list of articles, sources and papers on artificial intelligence that I started in 2023 (see here). The post became a bit too long and unwieldy to maintain in a single page, so have broken it out for the start of 2024. Enjoy!

Some AI Forecasts for 2024 (1)

Alberto Romano, who maintains a thoughtful blog called Algorithmic Bridge, observes that the benchmark LLM at the end of 2023 was GPT-4, which was built in 2022, and other models, including Gemini, Llama or Claud have not come close yet. He therefore speculates that we may be reaching the limit of what is possible with current technology. On the other hand, he muses at what is next from Meta, who have invested $20 billion on AI compute(!) while Sam Altman hints that GPT-5 will be a lot better than GPT-4.

Clues Say Generative AI’s Future Will Be Revealed in 2024
To Zuckerberg and Altman: Alea Iacta Est

Some AI Forecasts for 2024 (2)

In a paper for VentureBeat, Gary Grossman notes that AI is at the peak of the hype cycle. For example, you’d be hard-pressed to find a CES announcement that didn’t include some form of AI spin. Nevertheless, despite headwinds in the form of concerns about compute cost, environmental implications, training and data bias, security, copyright and hallucination, AI advancements are likely to continue to build upon one another. Grossman believes that Amara’s Law applies: “We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run.”

After AI’s summer: What’s next for artificial intelligence?
The most optimistic promises of AI will likely not be realized in 2024. Hopefully, any disappointments will not result in another AI winter.

How to Rank LLMs like Chess Grandmasters

With advances in large language models improving so quickly, it is quite a challenge to rank them against standard benchmark tests. The most useful ranking system is maintained by Hugging Face, who use an approach called Elo ranking, more commonly used to rank players in zero-sum competitive games such as chess to rank LLM-powered chatbots. The current ranking, dominated by GPT-4, is put together by crowdsourcing 200,000 preferences.

LMSys Chatbot Arena Leaderboard – a Hugging Face Space by lmsys
Discover amazing ML apps made by the community

AI as a National Strategy

A couple of weeks ago, an interesting article in The Economist explored how different governments are pursuing AI industrial policy – a blend of scientific, investment and geopolitical strategy. Unsurprisingly, the USA dominates VC investment, whilst China and the United States are neck-and-neck when it comes to government investment, primarily for chip fabs. Cash-rich petrostates are ploughing money into GPUs and building their own models (such as Abu Dhabi’s Falcon LLM), though analysts are sceptical of the potential for government-sponsored AI models. (subscription required, though article may be downloaded for free)

Welcome to the era of AI nationalism
Sovereigns the world over are racing to control their technological destinies | Business

The post Artificial Intelligence Good Reads (Part 2) appeared first on The Sand Reckoner.

Artificial Intelligence Good Reads – Dec 23

Simon Fabri — Sat, 14 Oct 2023 19:28:58 GMT

A list of Artificial Intelligence Good Reads, trying to make sense of the fastest moving space in tech since the advent of the Internet. This is the smallest tip possible of an unmanageable iceberg. Here are a handful of initial articles. I will update this over the next few days. Enjoy! (Image courtesy of DALL-E-2)

December ’23

EU agrees new AI legislation

The EU Commission and European Parliament this week agreed the outline of an EU-wide AI Act that aims to provide safeguards on the use of advanced AI models. The proposed legislation forbids the use of AI algorithms in a number of application, including the use of tracking people through facial recognition (except for law enforcement) or for the purposes of ‘social scoring’. Companies will also have transparency obligations on the inner-workings of advanced models and the data that was used for training, and will be required to comply with a number of safety mechanisms including risk assessments, benchmarking and adversarial testing. It is notable that companies such as OpenAI and Google’s DeepMind have so far resisted calls for this sort of disclosure.

EU agrees landmark rules on artificial intelligence
Legislation lays out restrictive regime for emerging technology

Google announces its Gemini family of multimodal AI models

Just last week, I wrote about Mirasol3B, a multimodal AI model, and this week that news is already old hat, as Google announced their flagship generative AI model, Gemini. This will be available in three flavours. The most performant of the three, known as Ultra, is said to outperform GPT-4 on most benchmarks. It is however not available for public use, and the version that is currently integrated into Bard is based on a less performant Pro version. (See here for the full technical report, and a critique here). Having played around with Gemini Pro on Bard, the experience is fairly similar to ChatGPT (based on GPT-4), but it clearly has a more up-to-date feature set. Looking forward to the general availability og Gemini Ultra.

Introducing Gemini: our largest and most capable AI model
Gemini is our most capable and general model, built to be multimodal and optimized for three different sizes: Ultra, Pro and Nano.

MLOps – A primer

Whilst everyone involved in tech will be familiar with DevOps, the set of software engineering practices that span coding through to operation, its equivalent in the AI space, MLOps (Machine Learning Ops) is not as well known. The behaviour of machine learning and AI models is less predictable than traditional software, as it depends on the data used for training, the model structure and its parameters, the inputs used in production, as well as the software that hosts and interacts with the model. As such productising machine learning in a way that is safe, reliable and repeatable requires a structured approach to how data, models and software are managed, based very much on DevOps principles. Databricks provides a nice summary of the key elements.

What is MLOps?
MLOps is a core function of Machine Learning engineering, focused on streamlining the process of taking ML models to production, and then maintaining and monitoring them.

Older posts…

Google’s new multimodal AI model

As AI models become ever more sophisticated, one of most challenging problems is how to combine different media types together. Video, audio and text data all have very different characteristics in terms of how they are represented in data as well as the AI models used to process them. This means that creating an AI model that can manipulate all forms of media is proving to be a big challenge. A couple of weeks ago, Google DeepMind announced a new model, called Mirasol3B that implements multimodal learning across audio, video and text in an efficient way. The draw for Google is obvious – how can it combine its vast YouTube catalogue in a meaningful way with its enormous largely text-based search engine. Although benchmarking indicates that this model may have broken new ground, researchers have criticised it for the opaqueness on how it works.

Google DeepMind breaks new ground with ‘Mirasol3B’ for advanced video analysis
Google DeepMind announces Mirasol3B, a new multimodal AI system for understanding long videos, but questions remain about real-world applicability.

Using the Human Brain as a template for more efficient AI models

Although it is often claimed that neural netwrorks are modelled on human brains, the vast amounts of material that generative language or image models consume during their training bears little resemblance to how humans learn. Humans are clearly much more constrained in terms of the energy they consume when learning or problem solving. In a recent paper in Nature Machine Intelligence magazine, scientists from the University of Cambridge have sought to model an artificial neural network that contained constraints similar to those found in a human brain. The research showed that these constraints influenced the model to seek more efficient ways of solving problems. This obviously has very interesting implications for the development of AI models, particularly in designing systems that are both adaptable as well as efficient.

Physical Constraints Drive Evolution of Brain-Like AI – Unite.AI
In a groundbreaking study, Cambridge scientists have taken a novel approach to artificial intelligence, demonstrating how physical constraints can profoundly influence the development of an AI system. This research, reminiscent of the developmental and operational constraints of the human brain,

Anthropic’s ChatGPT rival sets an important benchmark

Anthropic, the startup backed by Amazon and Google, announced that Claud 2.1, its Large Language Model can process inputs with up to 200,000 tokens at once, equivalent to 500 pages of text. For comparison GPT-4 supports a token length of 8,000 or 32,000, depending on the model used. Token length, also known as the context window, is important as it represents the quantitiy of input information that it can consider when generating text. For example, this sets the upper limit on text it can summarise, or sets a limit before it can no longer ‘remember’ the previous context.

OpenAI rival Anthropic makes its Claude chatbot even more useful
Claude can now handle double the number of tokens with half the hallucinations and gets new API tools in Anthropic’s latest chatbot update.

The hidden manual effort in creating AI models

Many generative AI systems, such as OpenAI’s GPT family of Large Language Models make use of human labelling to fine-tune and improve the prediction models, in a technique called “Reinforcement Learning from Human Feedback” (RLHF). A Wired article last month explored who carries out this data labelling. The article described how workers in places such as Venezuela, Colombia, east Africa, the Philippines and Kenya manually label images, outputs from large language models as part of their training process.

Millions of Workers Are Training AI Models for Pennies
From the Philippines to Colombia, low-paid workers label training data for AI models used by the likes of Amazon, Facebook, Google, and Microsoft.

The Open Source vs Proprietary AI Models Faultlines

A row has escalated in the past few weeks over the relative threats to public safety of open-source and proprietary AI models. Meta, who famously released the inner workings of its Llama 2 models has come under criticism by some safety advocates who claim this lowers the bar for malicious third parties to use LLMs for nefarious purposes such as cybercrime or developing harmful biological or chemical agents. Unsuprisingly, OpenAI and Meta are on opposite sides of these faultlines, with Sam Altman, OpenAI’s CEO claiming that its closed proprietary model provides the best safeguards against exploitation. Yann LeCun, Meta’s head of AI and one of the godfathers of AI development, strongly makes counter-arguments that the very nature of closed AI systems makes their risks unknowable and can create a monopoly that concentrates humanity’s knowledge into black boxes is a threat to democracy and diversity of opinion.

Protesters Decry Meta’s “Irreversible Proliferation” of AI
But others say open source is the only way to make AI trustworthy

AI one-percenters seizing power forever is the real doomsday scenario, warns AI godfather
The real risk of AI isn’t that it’ll kill you. It’s that a small group of billionaires will control the tech forever.

AI’s Regulatory Outlook

As big tech argues about the merits of open-sourced vs proprietary models, governments around the world are trying to figure out the best way to keep their citizens safe. Some AI luminaries such as Geoff Hinton, Elon Musk and Sam Altman are warning against the existential risks of artificial general intelligence (AGI), while others, including Meta, are more concerned about the more prosaic risk to competition of ‘winner takes all’ economics and the cost of regulatory compliance on open source models. Last week, the US Government issued an Executive Order which places requirements on the establishment of guidelines and best practices, carrying red teaming exercises on large models and requiring disclosure on how large-scale computing clusters are used. This was folllowed by the ‘Bletchley Declaration’ at the AI Safety Summit hosted in the UK that outlines an international consensus on the need for scientific and policy collaboration in the face of AI risk, but was somewhat short on practical measures.

Global leaders scramble to regulate the future of AI
AI is now an issue of global import. Why the next few years will be crucial to balancing its promise with ethical and societal safeguards.

Joe Biden’s Sweeping New Executive Order Aims to Drag the US Government Into the Age of ChatGPT
President Joe Biden issued a wide-ranging executive order on artificial intelligence with measures to boost US tech talent and prevent AI from being used to threaten national security.

Dealing with Prompt Hacking

Despite their size and sophistication, large language models (LLMs) are particularly sensitive to the instructions, or ‘prompt’ used to generate an outcome. In its benign form, this is sometimes called ‘prompt engineering’ and a quick scour of the web throws up prompt templates for anything from creating a CV to answering a high school essay question. The darker side of prompt engineering is ‘prompt hackingexploits’ which uses carefully constructed prompts to work around safeguards and bypass protections in the model. This includes ‘indirect prompt injection’ which exploit the ability of many LLMs’ to ingest data as part of the query. The combination of LLMs’ inherent opaqueness of their inner workings, plus their ability to make use of data of potentially concerning origins, means that they should always be treated with caution, following the cybersec principle of least privileges.

Generative AI’s Biggest Security Flaw Is Not Easy to Fix
Chatbots like OpenAI’s ChatGPT and Google’s Bard are vulnerable to indirect prompt injection attacks. Security researchers say the holes can be plugged-sort of.

State of AI 2023

Nathan Benaich, of Air Street Capital, a VC focused on AI ventures, issues an annual “State of AI report”. The 2023 issue was published a couple of weeks ago and highlights that although GPT-4 currently sets the benchmark in terms of large language performance (it is actually a multi-model model, as it was trained on text and images), efforts are growing to develop open source models that match the performance of proprietary models. Interestingly, the largest models are running out of open human-generated data that can be used to train them. The report provides a nice summary of the key research highlights of the year, where industry is investing, the political and safety implications and predictions for 2024. At 163 slides, it is a fairly hefty read.

State of AI Report 2023
The State of AI Report seeks to trigger informed conversation about the state of AI and its implication for the future.

October ’23

Towards AI Safety (1) – On Mechastic Interpretability

One of the (many) challenges in understanding how neural networks really work is that there is no easily-observable logic to how they actually work. They are not governed by simple mathematical relationships, making them very difficult to diagnose problems, or explain why they predict certain outcomes. The same is true for neuroscientists, who struggle to understand the function of individual neurons in the brain. Anthropic recently published a paper that describes how a large language model can be decomposed into more coherent features, e.g. describing themes such as legal language or DNA sequences, and then use another LLM to generate short descriptions of the features, through a process called “autointerpretability”. This work is aimed to provide a mechanistic model of how neural networks work in order to overcome concerns about their use in safety or ethically-sensitive applications.

Decomposing Language Models Into Understandable Components

Towards AI Safety (2) – Watermarking AI Images

In July, the White House announced that the main large tech companies had agreed to deploy watermarks to determine in a robust way whether they have been generated by their generative AI models. Watermarking involves adding patterns that are difficult to remove from the image that attest to their origin. The Register however reports that a team at the University of Maryland demonstrated that they were able to attack such schemes, both degrading the watermark (i.e. allowing an AI-generated image to skip detection) as well as making non-watermarked images appear as though they are AI-generated. It seems that there is way to go until we have robust systems to identify deepfakes.

Academics blast holes in AI-made images’ watermark security
Basically, it’s ‘not going to work’

Towards AI Safety (3) – Tackling Algorithmic Bias

As AI models, including generative AI models such as ChatGPT, Stable Diffusion, as well as other tools such as classifiers, fraud detectors, credit scoring algorithms and so on are trained on large sets of data, often from the Internet and other public datasets, they contain and reflect the biases contained within them. So, we have seen how a recruitment tool used by Amazon a few years ago showed bias against women as it was trained on the male-dominated applicant pool, while more recently AI-generated barbies displayed all the national stereotypes you might fear (including a German SS officer Barbie!) . An Article in Vox recently provided an overview on why fixing AI bias is so difficult, why it is everywhere, who is most negatively impacted (no prizes for guessing!) and some of the initiatives underway to tackle the problem.

AI automated discrimination. Here’s how to spot it.
The next generation of AI comes with a familiar bias problem.

AI’s Environmental Impact

A recent article in the NY Times quoted an analysis that predicted that by 2027 AI servers would use between 85 to 135 TWh annually, equal to the entire electricity consumption of Sweden, and 0.5% of the world’s electricity. For context, all data centres in 2022 consumed between 1 and 1.3% of the world’s electricity (excluding crypto-mining), and in 2021 was already responsible for 10-15% of Google’s electricity consumption. As many AI training techniques are at their infancy, as AI scales further, a paper by MIT predicts that the algorithms will be optimsed for efficiency and not only for accuracy, for example by stopping underperforming models or tracks early. There is definitely a sense that so far, in the race for AI headlines, all secondary considerations have been put aside. Even if companies and cloud providers are not driven by environmental concerns, the likely constraints on GPU supply will be sufficient motivation to push for algorithmic efficiency.

The environmental impact of the AI revolution is starting to come into focus
Google’s use of AI alone could use up as much electricity as a small country – but that’s an unlikely worst-case scenario, a new analysis finds.

Generative AI copyright gets thorny

While Hollywood actors and writers are on strike, partly trying to protect themselves against AI-created derivatives of them and their work, a US federal court has rules that generative AI work will not be protected under US copyright law. In the ruling, the judge stated that although “copyright is designed to adapt with the times… (it) has never stretched so far, however, as to protect works generated by new forms of technology operating absent any guiding human hand.” In a related development, Microsoft have decided that they will underwrite and defend any customer of its Copilot AI services (which includes its generative coding tools) from any copyright infringement suits.

AI-Generated Art Is Not Copyrightable, Judge Rules
A federal court ruled on August 18 that AI-generated artwork cannot be copyrighted because copyright law only extends to human beings, per The Hollywood Reporter.

A closer look at Meta’s Open-Source alternative to Chat GPT

We have already looked at the impact of open-source large language models as an alternative to ChatGPT. These have the clear advantage that they can be used and trained on domain-specific data and thus be optimised to solve specific tasks, as exemplified by Bloomberg’s financial analysis model. Moreover, as the underlying model and training approaches are public, their strengths, limitations, biases and vulnerabilities are open to inspection. This article provides an overview of Lllama 2 as well explaining how to get your hands on the model and have a play with it yourself. The open source nature of LlaMa2 has stimulated an industry’s worth of activity tailoring it for specific applications, such as Colossal-AI’s large-scale training solution.

Llama 2: A Deep Dive into the Open-Source Challenger to ChatGPT – Unite.AI
Large Language Models (LLMs) capable of complex reasoning tasks have shown promise in specialized domains like programming and creative writing. However, the world of LLMs isn’t simply a plug-and-play paradise; there are challenges in usability, safety, and computational demands. In this article, we

Exploring an LLM software stack

An article describing Colossal AI’s large-scale LlaMa 2 model training solution also provides a pretty clear outline of a typical software stack for training and operating large language models. Worth reading just for this.

Artificial Intelligence at the Edge – Looking at TinyML

Much of the discussion relating to the future of machine learning and generative AI models focuses on how future applications will require larger, more computationally-expensive models, with more parameters and much larger training sets. There is however an alternative approach, which instead considers how best to distribute algorithms on cheap low-power edge devices. The advantages are obvious. If AI algorithms can be run on end devices, be they cars, home sensors, health monitors, agricultural sensors and so on, they do not need to be hosted and processed on a server somewhere. For large systems, this has a significant impact on system resilience, responsiveness and privacy, as devices can be intelligent themselves, rather than sending all data for processing elsewhere. Light-weight models such as TensorFlow lite are designed to operate on the smallest, most power-efficient chipsets to bring AI processing out of the data centre to the real world.

TinyML: Applications, Limitations, and It’s Use in IoT & Edge Devices – Unite.AI
In the past few years, Artificial Intelligence (AI) and Machine Learning (ML) have witnessed a meteoric rise in popularity and applications, not only in the industry but also in academia. However, today’s ML and AI models have one major limitation: they require an immense amount of computing and pro

Updated 2 July

A dissenting view. Why AI may not be as revolutionary to the world economy as we are assuming.

This essay takes a number of steps back and frames AI in the context of its likely long-term societal impact. In other words, where will it sit compared to the inventions, say, of agriculture or the steam engine? The authors, including The Economist‘s economics correspondent, take a refreshingly dissenting view. Challenges that will limit its impact include: drowning in information and content generated by AI, legal constraints of getting computers to make decisions impacting humans, the not insignificant issue of physical world interactions, and the challenge of mimicking human expertise. I don’t agree with everything that is said here, but as a counterpoint to AI hype, this is a fantastic read.

Why transformative artificial intelligence is really, really hard to achieve
A collection of the best technical, social, and economic arguments Humans have a good track record of innovation. The mechanization of agriculture, steam engines, electricity, modern medicine, computers, and the internet-these technologies radically changed the world. Still, the trend growth rate of GDP per capita in the world’s frontier

Will we survive the flood of AI content?

In a previous release of this list, we referenced a paper that touched on the value of human work where AI-generated content is assessed by other AI models. An essay this week by Alberto Romano asks what happens when AI-generated content increases to the point where most content is created by statistical models designed to mimic human coherence. Only this week an author of fiction observed that 81 out in a top 100 chart of self-published Kindle content were AI-generated. Search engines and social media algorithms were tuned to identify an promote content people will find more useful. How will this work when most content is computer generated?

How the Great AI Flood Could Kill the Internet
The web has become a place of passive consumption that AI hustlers won’t hesitate to exploit

Meta’s Transparency Quest

Meta, is in many ways a surprising AI pioneer. No only is it making some of the more interesting forays into open-source models, (see links below), but it is also in explaining how all the relevant recommender and ranking mechanisms work across Facebook and Instagram. While the algorithms themselves are not described in any detail (assume Meta consider these to be their ‘secret sauce’), you can find all the input signals that are used to rank content. This is a lot more than we have learnt from the likes of OpenAI. See here for an interview on Nick Clegg, Meta’s VP of global affairs (and former UK deputy PM).

Our approach to explaining ranking
At Meta, we define AI as the capability of computer systems to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages. As part of our ongoing commitment to transparency, we provide tools and information to help you understand how AI at Meta works.

Not so Massive. Device-optimised generative AI

Large language models and large diffusion models for image generation, are, well, large, and are typically limited to server-based deployments. This week Google published a paper that describes how they have optimised the memory needs for large diffusion models, such that they can support image creation from text input in under 12 seconds on a high-end smartphone GPU. While it still requires 2GB of RAM to hold the model parameters. Whilst this paper speaks to the optimisations required for generative adversarial networks, I suspect, we will see a greater focus in reducing the size of inference models to allow for deployment on devices.

Speed is all you need: On-device acceleration of large diffusion models via GPU-aware optimizations
The proliferation of large diffusion models for image generation has led to a significant increase in model size and inference workloads. On-device ML inference in mobile environments requires meticulous performance optimization and consideration of trade-offs due to resource constraints.

Operationalising Machine Learning

With most of the discussions online focusing on AI models, their applications and real-world impact, it is easy to ignore the engineering discipline required to keep an AI product maintainable, reliable and secure. Just as DevOps principles allow software engineers to reliably and frequently release code that works, MLOps (Machine Learning Operations) are essential for creating an auditable, testable, end-to-end ML/AI pipeline from data ingestion, through to model training and tuning, and deployment and management. The AWS blog site provides an overview of how AWS supports MLOps, while Google describes how to implement an automated AI pipeline on their Google Cloud platform.

What is MLOps and Why Does It Matter?
Discover the game-changing approach to managing machine learning workflows with MLOps, and learn why it’s essential for scalable, reliable, and reproducible AI.

AI Security Models

The OWASP (Open Worldwide Application Security Project) Foundation is a community producing open-source tools and best-practice for application software security. Recently it has published a guide on AI security and privacy, tying into broader software good practice and then tying into AI-specific attack surface areas and vectors. A good place to start exploring privacy and security considerations relating to AI models.

OWASP AI Security and Privacy Guide
Guidance on designing, creating, testing, and procuring secure and privacy-preserving AI systems

How self-learning models can outperform much larger LLMs

LLMs are often characterised by the size of the model (parameters) and their training data, with the presumption that bigger is better. Large models however come with the obvious disadvantages of computational cost and privacy protection. However a paper by an MIT professor describes how implementing self-training model, where the AI model uses its own prediction (via a process called textual entailment) to teach itself without human intervention. The resulting 350m parameter model outperformed models such as Google’s LaMDA and GPT models.

MIT researchers make language models scalable self-learners
MIT CSAIL researchers used a natural language-based logical inference dataset to create smaller language models that outperformed much larger counterparts.

The secret inside GPT-4

OpenAI has kept the inner workings of GPT-4 well under wraps, maintaining their aura as leaders in generative language models, with much speculation as to how GPT-4 is able to outperform its predecessors. A blog post this week says that there isn’t an underlying algorithmic or model breakthrough or a much larger model. Instead, GPT-4 connects 8 different models together in an unspecified way. It feels to me that OpenAI is now moving towards keeping the fundamental model structures to itself, presumably to maintain the edge in real-world performance it appears to have over its competitors.

GPT-4’s Secret Has Been Revealed
Unraveling OpenAI’s masterful ploy

The thorny issue of copyright and generative AI

The implications of the use of intellectual property being used in training sets of generative AI models, first arose with image-generating AI and has spawned a number of lawsuits, including one by Getty Images against Stable AI for copyright. It is however virtually impossible to tell with certainty whether a work was including in the training of a Large Language Model. For this reason, the EU are proposing to require companies to disclose the data used when training their models, so as to protect the creators of intellectual property, thereby creating licensing models for AI. We are beginning to see the beginnings of these commercial relationships, with Microsoft, Google, OpenAI and Adobe negotiating with media and information organisations such as News Corp, Axel Springer, New York Times and the Guardian.

Reported EU legislation to disclose AI training data could trigger copyright lawsuits
The EU is reportedly considering a provision in its upcoming AI Act that would force companies to disclose the sources of their training data. This could lead to lawsuits by owners of copyrighted data used for training.

GPT-4 is pretty good at Maths too (or Math, if you are American)

In a recent paper called “Let’s Verify Step-by-Step”, researchers at OpenAI describe how GPT-4 has improved the underlying model to solve maths problems. They applied a supervision model during training called “process supervision” in which feedback is provided on each step of the reasoning, and not simply on the outcome. The paper (see here) shows that this optimised model, based on GPT-4 can successfully solve 78% of problems in a test problem set, though does not provide information for vanilla GPT-4. Nevertheless, having experimented a bit myself, it is clear that GPT-4 currently is a lot more reliable at maths problems than GPT-3.5.

OpenAI improves GPT-4’s mathematical reasoning with a new form of supervision
OpenAI shows an AI model that reaches the state of the art in solving some mathematical problems. Is the method used the future of GPT-4?

70% of developers aim to use AI

A survey by the developer chat site Stack Overflow released a couple of days ago showed that 44% of developers already use AI tools when coding, and a further 26% plan to do so soon. GitHub Copilot, a code completion tool is by far the most popular AI-powered developer tool, while ChatGPT is unsurprisingly the most used AI-enabled search tool. This clearly has implications for AI usage policies for companies, which are largely reticent to use such tools do the risk of IPR leakage.

70% of Devs Using or Will Use AI, Says Stack Overflow Survey
ChatGPT is most popular among AI search tools. The global study of more than 90,000 developers also noted declines in cloud provider usage.

Google introduces a framework for Secure AI.

Unimaginatively titled Secure AI Framework (SAIF), this initiative by Google brings good infosec practice into the artificial intelligence space. Its six pillars include building on established cloud and infrastructure security principles, implementing perimeter monitoring of inputs and outputs of AI models to detect anomalies, building the ability to scale to deal with automated attacks and creating fast feedback loops for vulnerability detection and mitigation.

Introducing Google’s Secure AI Framework
Today Google released released the Secure AI Framework to help collaboratively secure AI technology.

GPT-4’s understanding of the world’s geography

There have been several studies exploring large language models’ ability to understand different categories of information, including software, exam curricula and literature. In a paper published on arXiv, scientists at a number of universities discover the remarkable geographic understanding within GPT-4, which is able to return basic geographic data such as socio-economic indicators, and physical geography such as topography. More impressively, it can carry out route planning and figure out routes, such as key transport routes (maritime, rail and air). Whilst subject to hallucinations and missing data, it is really quite impressive.

Click to access 2306.00020.pdf

Bloomberg’s purpose-built finance large language model

We have already seen many papers and articles on how generalist LLMs such as GPT-3 can be applied to solve problems across a broad range of domains. A couple of months ago, Bloomberg announced BloombergGPT^TM , an LLM based on BLOOM with a 363 billion token dataset created from Bloomberg’s private archive of financial data, and augmented with a 345 billion token public data training set. In a paper published on arXiv, Bloomberg claims that this model outperforms larger, more-general purpose LLMs for tasks such as financial sentiment analysis, named entity recognition, and conversational reasoning of financial data. This is a remarkable case study for anyone considering creating a domain-specific LLM.

Introducing BloombergGPT, Bloomberg’s 50-billion parameter large language model, purpose-built from scratch for finance

Will incumbents’ moats see off the waves of AI start-ups?

A few weeks ago, I explored the much-publicised clarion call of a Google researcher who claimed that Open Source will eat Google’s AI breakfast. Alberto Romero, a researcher at Cambrian AI takes a dissenting view, arguing that open-source models are tuned against closed-source models, through a process called self-instructing and consequently without incurring the prohibitive cost of training an LLM from scratch, open-source models will struggle to compete. Secondly, incumbents have access to millions of customers, which gives them an unparalleled route to market. We have already seen how Microsoft is integrating GPT into GitHub and its Office suite of productivity software, while Adobe’s FireFly has been wowing Photoshop users. Watch this space.

Open Source AI Is Not Winning-Incumbents Are
Progress in open-source (OS) generative AI (particularly language models, LMs) has exploded in recent months. As a consequence-and with the help of desperate internal confessions -people believe it has become a threat to incumbent companies like Google and Microsoft, and leading labs like OpenAI and Anthropic.

AI-assisted Writing and the Meaning of Work

Ethan Mollick, a Wharton professor, is one of the most insightful observers of the implications of AI. As Microsoft prepares to embed GPT-4 within its Office suite, it is surely only a matter of a few weeks before AI-assisted ‘Word’smithing becomes commonplace. What meaning and value do we assign to work that has only taken a few prompts to a generative AI system to create? The efficiency gains will be amazing, but expect it to be a bumpy ride.

Setting time on fire and the temptation of The Button
I saw a bit more of the future of AI at work this week, and it shows every sign of vastly boosting productivity, while also causing a crisis of meaning in many organizations. For such a dramatic statement, the actual bit of AI technology I got to experience this week is incredibly minor.

Meta’s Multilingual Model supports over 1000 different languages.

We have already seen below the strides Meta is making, including on multi-modal foundation models. The rate of innovation shows no signs of abating, having recently open-sourced a multilingual model that was trained on 1,162 languages. To overcome the dearth of labelled datasets for many of the world’s languages, the researches used textual and audio data from religious texts, including the Bible, before unsupervised learning was applied to a further 4000 languages. The model was made available through extensions to Facebook AI’s popular PyTorch library.

Meta AI Launches Massively Multilingual Speech (MMS) Project: Introducing Speech-To-Text, Text-To-Speech, And More For 1,000+ Languages
Significant advancements in speech technology have been made over the past decade, allowing it to be incorporated into various consumer items. It takes a lot of labeled data, in this case, many thousands of hours of audio with transcriptions, to train a good machine learning model for such jobs.

The genesis of ChatGPT

The MIT Technology Review has a great story on how ChatGPT was released. What really stands out is how surprised the team at OpenAI were by how it became a viral sensation, and how the importance of accuracy has increased now that it is acting effectively as a search engine.

The inside story of how ChatGPT was built from the people who made it
Skip to Content Exclusive conversations that take us behind the scenes of a cultural phenomenon. When OpenAI launched ChatGPT, with zero fanfare, in late November 2022, the San Francisco-based artificial-intelligence company had few expectations. Certainly, nobody inside OpenAI was prepared for a viral mega-hit.

Will AI veer towards open or closed-sourced models?

Although Meta (i.e. Facebook) has not quite been hitting the AI headlines, its team, led by YannLeCun, is very active. One of its most significant contributions has been the release of LLaMA, a large language model trained on 1.4 trillion parameters. By sharing the code, the Meta team is hoping to drive faster innovation, particularly in adapting it to different use cases. Likewise, Stability AI has open-sourced its text-to-image model in the hope of benefitting from innovation amongst developers. Although a researcher at Google claimed that fighting Open Source is a “losing battle”, the size of these models means that they are trained by large tech companies, and it is unclear for how much longer this openness will persist.

The open-source AI boom is built on Big Tech’s handouts. How long will it last?
Skip to Content Greater access to the code behind generative models is fueling innovation. But if top companies get spooked, they could close up shop. Last week a leaked memo reported to have been written by Luke Sernau, a senior engineer at Google, said out loud what many in Silicon Valley must have been whispering for weeks: an open-source free-for-all is threatening Big Tech’s grip on AI.

Updated 23 May

Meta’s Multimodal AI Models

Multimodal AI models link different content types (e.g. text, audio, video) into a single index or ’embedding space’ and are increasingly a subject of much research. For example, AI image generators such as Midjourney and DALL-E link text decoders with image inference models. (See here for a good overview). Meta has announced an open-sourced AI model called ImageBind that brings together text, image, video, audio and sensor data (including depth, thermal and inertial measurements). For example, Meta claim that the model could create an image of a rainy scene from the sound of rain, or conversely add appropriate audio to a video sequence.

ImageBind: Holistic AI learning across six modalities
When humans absorb information from the world, we innately use multiple senses, such as seeing a busy street and hearing the sounds of car engines. Today, we’re introducing an approach that brings machines one step closer to humans’ ability to learn simultaneously, holistically, and directly from many different forms of information – without the need for explicit supervision (the process of organizing and labeling raw data).

Updated 22 May

Large Language Model Landscape

With all the attention being heaped upon OpenAI’s ChatGPT, you’d be forgiven for thinking that GPT-3.5/4 was the only large language model in town. However, as a blog site maintained by Alan Hardman makes clear, this is an increasingly crowded space. The author helpfully also provides a list of LLMs, datasets , benchmarking and labs. An essential reference.

Inside language models (from GPT-4 to PaLM)
Hi, I’m Alan. I advise government and enterprise on post-2020 AI like OpenAI GPT-n and Google DeepMind Gemini. You definitely want to keep up with the AI revolution this year. My paid subscribers (NASA, Microsoft, Google…) receive bleeding-edge and exclusive insights on AI as it happens. Get The Memo.

Updated 18 May

Annual Stanford AI Index Report

Not exactly a quick read, and I certainly have not yet been through it all yet, but if nothing else the key takeouts provide a quick snapshot of the state of AI. Points of interest are that industry has firmly taken over from academia in creating new AI models (not coincidentally the exponential increase in training compute continues), Chinese universities lead the world in AI publications and AI models performance continues to improve, with some categories out-performing the human baseline (such as language inference). A good long read.

AI Index Report 2023 – Artificial Intelligence Index
The AI Index is an independent initiative at the Stanford Institute for Human-Centered Artificial Intelligence (HAI), led by the AI Index Steering Committee, an interdisciplinary group of experts from across academia and industry. The annual report tracks, collates, distills, and visualizes data relating to artificial intelligence, enabling decision-makers to take meaningful action to advance AI responsibly and ethically with humans in mind.

Microsoft researchers make claims on Artificial General Intelligence

It is said that one way to destroy your credibility in the AI space is to claim that you have built a system capable of Artificial General Intelligence. This is generally taken as being an AI system that can be applied to a number of unrelated fields, achieving a level of capability similar or exceeding human capabilities. Cade Metz, who recently broke the news of Geoffrey Hinton’s AI concerns reports that researchers at Microsoft recently published a paper stating that GPT-4 is demonstrating ‘sparks’ of AGI that surprised the researchers. They describe a number of tasks that it was able to carry out including (rather incredibly) write a rhyming proof that there are an infinite number of prime numbers.

Microsoft Says New A.I. Shows Signs of Human Reasoning
A provocative paper from researchers at Microsoft claims A.I. technology shows the ability to understand the way people do. Critics say those scientists are kidding themselves. Send any friend a story As a subscriber, you have 10 gift articles to give each month. Anyone can read what you share.

Updated 15 May

OpenAI Improves ChatGPT Privacy

Addressing concerns in many jurisdictions about the privacy impact of ChatGPT (such as Italy which temporarily banned its use), OpenAI has introduced features to allow individuals and organisations to request that they do not appear in answers via a Personal Data Removal Request Form. This is however only the opening gambit and is unlikely to satisfy regulators, as it has no bearing on whether data (correct or otherwise) can be restricted from training, and the risk of your chat history influencing the answers it gives other users.

How To Delete Your Data From ChatGPT
OpenAI has now introduced a Personal Data Removal Request form that allows people-primarily in Europe, although also in Japan-to ask that information about them be removed from OpenAI’s systems. It is described in an OpenAI blog post about how the company develops its language models.

Updated 13 May

How do Transformers Work?

Hugging Face, a company that provides AI models and datasets to developers offers a free online course on natural language processing (NLP) which contains a nice overview of the workings of the transformer, the AI architecture that underpins most modern large language models. For those of a more technical bent, you can read the original paper where Google scientists first described the transformer model and its attention mechanism. [Updated 22 -May] For further, more accessible, descriptions of the transformer model, see Transformers from Scratch by Peter Bloem, who also offers code in github and a few video lectures.

Updated 12 May

How do Transformers work? – Hugging Face NLP Course
In this section, we will take a high-level look at the architecture of Transformer models. Here are some reference points in the (short) history of Transformer models: The Transformer architecture was introduced in June 2017. The focus of the original research was on translation tasks.

Size does not always matter

An article by the IEEE argues that there isn’t an inexorable correlation between size and model performance and that smaller models trained on larger datasets can outperform larger models. The cost of training between these two scenarios is unclear, but this may be the start away from a ‘number of parameters’ arms race, or “my model is bigger than yours” debate. Sam Altman, CEO of OpenAI made similar point at an MIT conference last month.

When AI’s Large Language Models Shrink
Building ever larger language models has led to groundbreaking jumps in performance. But it’s also pushing state-of-the-art AI beyond the reach of all but the most well-resourced AI labs. That makes efforts to shrink models down to more manageable sizes more important than ever, say researchers.

Updated 11 May

A look at ChatGPT’s Code Interpreter – A program that creates programs

The dystopian scenario that keeps AI pessimists alive is a future where AI systems are able to generate new AI systems, getting into a runaway loop of ever-improving capability which humans are powerless to stop. I am quite sceptical of this scenario, but GPT-4’s code interpreter, a sandboxed environment where GPT-4 can create, run and improve Python code is quite amazing, a preview of a world to come, and one that is particularly well-suited for complex data analysis.

It is starting to get strange.
OpenAI may be very good at many things, but it is terrible at naming stuff. I would have hoped that the most powerful AI on the planet would have had a cool name (Bing suggested EVE or Zenon), but instead it is called GPT-4 . We need to talk about GPT-4.

Stanford University study on the impact of AI assistants on productivity

It has often been claimed that Artificial Intelligence can do for white-collar work what automation has already done for manufacturing. This is a viewpoint I subscribe to, and Stanford University’s Human-Centered Artificial Intelligence (HAI) centre has shown that call centre workers at a Fortune 500 software companies did indeed see an average of 13.8% productivity increase. Of particular note, the AI assistant was able to accelerate the up-skilling of workers, reaching productivity levels in two months that would previously have taken six months.

Updated 9 May

Will Generative AI Make You More Productive at Work? Yes, But Only If You’re Not Already Great at Your Job.
Scholars examining the impact of an AI assistant at a call center find gains for less experienced workers.

Google and OpenAI struggling to keep up with open-source AI

A Google researcher claims that the current generation of Large Language Models do not have any intrinsic insurmountable defenses, and that the threat/opportunity (depending on your vantage point) of open-sourced AI models was being overlooked. Following the leaking to the public of Meta’s open-sourced LLaMA model, a flurry of innovation has resulted in models being trained for as little as $100 worth of cloud compute.

Google and OpenAI struggling to keep up with open-source AI, senior engineer warns – SiliconANGLE
Google LLC and ChatGPT developer OpenAI LP face increasing competition from open-source developers in the field of generative artificial intelligence, which may threaten to overcome them, a senior Google engineer warned in a leaked document.

WIRED, The Hacking of ChatGPT is Just Getting Started

An insight into what it means for a large language model to be compromised, the techniques being used to bypass ChatGPT’s safeguards, and how the field of Generative AI vulnerability and security research is still in its infancy.

The Hacking of ChatGPT Is Just Getting Started
It took Alex Polyakov just a couple of hours to break GPT-4. When OpenAI released the latest version of its text-generating chatbot in March, Polyakov sat down in front of his keyboard and started entering prompts designed to bypass OpenAI’s safety systems. Soon, the CEO of security firm Adversa AI had GPT-4 spouting homophobic statements, creating phishing emails, and supporting violence.

Cade Metz, Genius Makers

For a fast-moving history of how a collection of doctoral students, researchers and academics persevered for years in relative obscurity before being suddenly cast into the spotlight as they became the most sought-after talent in tech. Charting the genesis of companies that are now household names such as DeepMind and OpenAI, Genius Makers tells the story of the multi-million tussle between Silicon Valley giants as they raced to assemble the best AI teams to build systems that can lay claim to human or super-human intelligence. Fascinating reading, especially as this book was written before generative systems exploded into the public consciousness.

Genius Makers
‘This colourful page-turner puts artificial intelligence into a human perspective . . . Metz explains this transformative technology and makes the quest thrilling.’ Walter Isaacson, author of Steve Jobs ____________________________________________________ This is the inside story of a small group of mavericks, eccentrics and geniuses who turned Artificial Intelligence from a fringe enthusiasm into a transformative technology.

The Economist – Special AI Edition

The Economist starts its special edition on artificial intelligence with an essay that takes the long view on the impact AI may have on humans’ sense of self and exceptionalism. Comparing it to the invention of printing, the dawn of the world wide web, and psychoanalysis, the essay posits that advances in AI will also lead to a reassessment of how humans understand the world. Are LLMs simply sequencing words, or is something more fundamental emerging? Less controversial but equally enlightening articles on how ChatGPT works (self-attention models), and whether they provide societal-level risks.

How to worry wisely about AI | Apr 22nd 2023 | The Economist
How to worry wisely about AI – Weekly edition of The Economist for Apr 22nd 2023. You’ve seen the news, now discover the story.

A paid-for

The post Artificial Intelligence Good Reads – Dec 23 appeared first on The Sand Reckoner.

ChatGPT according to ChatGPT

Simon Fabri — Tue, 28 Feb 2023 08:39:25 GMT

It has been impossible to miss or ignore the hype about ChatGPT in any of my LinkedIn and Twitter feeds or business magazines. Clearly at the peak of a hype cycle, much has been written about the transformative effect generative AI language models will have on industries as diverse as search, law, book-writing and software development. It was therefore time to understand a bit better the inner workings of Chat GPT, OpenAI’s language model underpinning Microsoft’s new foray into Internet Search. What better way, I thought, than simply asking ChatGPT to explain how it works? As I don’t yet have access to Bing’s AI Beta program, probably my just desserts for ignoring Bing for over a decade, I interviewed OpenAI’s ChatGPT bot.

A fantastic writing companion

The full, unedited transcript is shown below, but before diving into the interview, here are some of my own thoughts. First of all, this has been, by far, the easiest blog post I have written. Even were I not using whole sections of AI-generated text, these tools are fantastic for carrying out preliminary research. Rather than sifting through reams of Google pages, you get a neatly summarised, generally-accurate summary of what you are looking for. As the responses are structured to mimic human-generated writing, the research process feels more natural. To me, the biggest revelation was that it exposed the extent to which Google has rewired the way we think. We have learned how to think by parsing through lists, filtering out irrelevant entries, ignoring advertising, and learning how to write search queries that give you the best chance of landing the answers you are looking for. Generative AI models feel a lot more natural, and can clearly speed up the process of creating those first rough drafts of ideas, and I suspect that writer’s block (at least when it comes to blog writing) may become a thing of the past.

Give out Name, Rank and Number only

It feels like ChatGPT has been on a “Resistance to Interrogation” training course. While it is happy to give generalities on how the language model is trained and provides responses, it resists providing specifics. So any questions relating to the size of the training data set, what it included, the type of validation testing carried out, remained unanswered. ChatGPT, it seems, is a black box to itself too. We are clearly a way from introspective, sentient AI systems.

The Transformer

ChatGPT is based on a neural network architecture called a transformer, which was actually introduced by Google researchers in 2017. Explaining how this model works lies outside the ambition of this blog, though if you are interested, this article is a good starting point. Nevertheless, the strengths of the model lie in its ability to understand context, in other words, the meaning of a phrase depending on the overall phraseology. For example, humans make use of context when determining whether a jaguar is a large cat or a nice car. Context is extremely important both when training the model, in other words, taking in the large input data set it bases its responses on, as well as in understanding the questions being made of it. The second reported advantage of transformer networks is that they lend themselves to parallelisation, meaning that, given enough computing resources, they can be trained and generate responses quicker than other approaches.

Avoiding offence

In order to be viable in the real world, AI chatbots need to be reliable, providing fair responses that are free from bias, and have safeguards to prevent them from being used for harmful purposes. ChatGPT addresses this in three ways. First, the input training data set is selected so as to, as much as possible, be fair and diverse, whilst removing datasets that may be either discriminatory, offensive or biased. Although OpenAI has said that it is committed to being transparent about how the language model is developed, I was unable to find information on the training data set or training procedures used. It then applies checks on the responses it provides, including algorithms to detect “bias and toxicity” as well as human evaluation of the responses. Once again, no data is provided, so we have no way of telling how effective these guardrails are. Finally, there are checks that prohibit the chatbot from providing a response to a question that is “inappropriate and not in line with ethical and moral principles.” To figure out what these principles include, I had a look at OpenAI’s usage policies, which intriguingly also did not allow its use for “high-risk government decision-making!”

The Interview

Anyway, enough of my thoughts. Here’s what ChatGPT had to say about itself.

Good evening, ChatGPT. How would you describe yourself?

Good evening! As an artificial intelligence language model, I am a program designed to simulate human communication and generate text-based responses to user input. I was created by OpenAI using a deep learning architecture and trained on vast amounts of text data, allowing me to understand and generate natural language responses to a wide range of topics.

While I am not a sentient being and do not have emotions or personal experiences, I can use my programming to provide information, answer questions, and engage in conversations with users. My purpose is to assist and provide value to those who interact with me in a variety of ways, whether that be through entertainment, education, or problem-solving.

Nice to meet you. How does your artificial intelligence language model work?

It’s nice to meet you too!

At a high level, my artificial intelligence language model works by processing text input and generating a response based on patterns and associations learned from vast amounts of data. Here is a brief overview of how it works:

Training: To create my AI language model, OpenAI first trained me on a massive amount of text data, such as books, articles, and websites. This training data helps me learn the patterns and relationships between words and phrases in natural language.
Architecture: I use a deep learning architecture called a transformer, which was specifically designed for natural language processing. This architecture allows me to understand the meaning and context of text input and generate relevant and coherent responses.
Natural language processing: When a user inputs text, I use natural language processing techniques to analyze the input and understand its meaning, context, and intent.
Response generation: Once I understand the user’s input, I use the patterns and associations learned during training to generate a response that is relevant to the input and maintains coherence and context.
Iterative improvement: As I interact with more users and receive feedback, my creators at OpenAI use that information to continuously improve and refine my language model, ensuring that I remain up-to-date and accurate.

Overall, my AI language model is a complex system that uses machine learning and natural language processing techniques to understand and generate human-like responses to text input.

So let’s start off talking about training. What data sets, and how much data was used to train this version of ChatGPT?

Sure! The version of ChatGPT that you are interacting with was trained on a large and diverse dataset of text called the WebText dataset, which was compiled by scraping billions of web pages from the internet. The WebText dataset contains a wide variety of text, including articles, blogs, forums, and other web content, covering many different topics and styles of writing.

The WebText dataset used to train the largest version of GPT (GPT-3), which has 175 billion parameters. However, the version of ChatGPT that you are interacting with is a smaller version, and its exact size and training data may depend on the specific implementation or deployment.

Overall, the specific amount of data and the exact sources used to train ChatGPT may vary, as the model can be fine-tuned on additional domain-specific datasets depending on the intended use case. But in general, the training process for language models like ChatGPT involves using vast amounts of text data to train the model to recognize patterns and associations between words and phrases, allowing it to generate human-like responses to text input.

Aha, I thought I was interacting with GPT-3. Which version of GPT are you then?

I apologize for the confusion earlier. As a language model developed by OpenAI, I am not an individual version of GPT, but rather a separate AI language model that was trained using techniques similar to those used to train GPT. While I share many similarities with GPT, I have been specifically designed to provide assistance and generate text-based responses for a variety of use cases, ranging from entertainment and education to customer service and technical support.

The exact version of ChatGPT that you are interacting with depends on the specific implementation or deployment. However, in general, ChatGPT is a smaller and more lightweight language model than GPT-3, with fewer parameters and less computational resources required to run. This makes ChatGPT a more accessible and cost-effective option for businesses and individuals who want to incorporate AI language processing into their products or services.

So coming back to training data. Are you able to say how large the dataset was that was used to train you?

As an AI language model, I am designed to be flexible and adaptable, and the specific dataset used to train me may vary depending on the implementation and deployment. However, in general, the dataset used to train my base architecture is likely to be similar to the WebText dataset that was used to train the GPT models, which contains billions of words.

That being said, the exact size and composition of the training data used to train me may depend on the specific implementation or deployment. In some cases, I may be fine-tuned on additional domain-specific datasets to better suit a particular use case or application.

Overall, the size and quality of the training data used to train an AI language model like me can have a significant impact on the model’s performance and capabilities. The more diverse and representative the training data is, the more likely the model is to generalize well to new inputs and generate coherent, contextually appropriate responses.

Ok, so I see we are not going to get a straight answer on the size of your training dataset. How do you ensure that your training data does not include information that is discriminatory, biased, illegal or simply likely to cause offence?

As an AI language model, I do not have direct control over the training data that I am provided with, as this is typically handled by my creators at OpenAI. However, OpenAI is committed to ensuring that my training data is diverse, representative, and ethically sourced, and takes steps to address issues such as discrimination, bias, and offensive language.

To ensure the ethical use of AI language models like me, OpenAI has implemented a set of ethical guidelines and principles, which include a commitment to promoting safety, transparency, and fairness in the development and deployment of AI. These guidelines also include specific measures to mitigate the risk of bias, discrimination, and offensive language in AI language models, including:

Data selection: OpenAI takes care to select training data that is diverse and representative, and avoids datasets that contain discriminatory or offensive language. In addition, OpenAI may use techniques such as adversarial training to identify and remove biased or discriminatory patterns from the training data.
Bias monitoring: OpenAI actively monitors the performance of AI language models like me to detect and mitigate instances of bias or discrimination in the generated output.
Transparency: OpenAI is committed to being transparent about the development and capabilities of AI language models, and may release information about the training data and testing procedures used to create and evaluate these models.

Overall, while AI language models like me are not perfect and may still contain biases or errors, OpenAI is committed to ensuring that I am developed and used in an ethical and responsible manner, and takes active steps to address any issues that may arise.

Let’s move on from training. You mentioned earlier that you use a deep learning architecture called a transformer. What can you tell us about it and how it works?

Sure! The Transformer is a deep learning architecture that was first introduced in the paper “Attention Is All You Need” by Vaswani et al. in 2017. The Transformer was designed to address some of the limitations of previous sequence-to-sequence architectures, such as Recurrent Neural Networks (RNNs), that struggled with long-term dependencies and suffered from slow training and inference times.

The Transformer is based on the concept of attention, which allows the model to focus on specific parts of the input sequence when generating output. This attention mechanism is what allows the Transformer to process input sequences in parallel, rather than sequentially as in RNNs.

The Transformer architecture consists of an encoder and a decoder, which are connected by multiple layers of self-attention and feedforward neural networks. The encoder takes in the input sequence and generates a sequence of encoded representations, which are then used by the decoder to generate the output sequence. The self-attention mechanism allows the model to compute attention weights for each input element based on its relevance to the current output element, allowing the model to focus on the most relevant parts of the input when generating output.

The Transformer has been widely adopted in natural language processing tasks, including machine translation, text generation, and question-answering. The GPT models, including the version of ChatGPT that you are interacting with, are based on the Transformer architecture, and use self-attention to process input sequences and generate human-like responses.

Is this an example of a Generative Adversarial Network (or GAN)?

No, the Transformer architecture is not an example of a Generative Adversarial Network (GAN). GANs are a type of deep learning model that consists of two neural networks: a generator and a discriminator. The generator produces new data samples, such as images or text, while the discriminator tries to distinguish between the generated samples and real samples from the training data.

The Transformer, on the other hand, is a type of sequence-to-sequence model that is used for tasks such as machine translation, text generation, and question-answering. It uses the self-attention mechanism to process input sequences and generate human-like responses.

While both GANs and the Transformer are based on deep learning techniques, they have different architectures and are used for different tasks. GANs are used for generative tasks, such as image and text generation, while the Transformer is used for natural language processing tasks that require a model to generate coherent and contextually appropriate responses.

Ok, thanks. You have corrected my misunderstanding. So now coming to response generation. Can you provide some more details as to how you generate a response that best answers the question posed by the user?

Sure, I’d be happy to provide some more details on how I generate a response that best answers a user’s question.

The process of generating a response starts with the user’s input, which is first tokenized and transformed into a sequence of embeddings. The embeddings are then fed into the Transformer, which uses self-attention to encode the input sequence and generate an internal representation of the input.

Next, the internal representation of the input is used by the Transformer to generate a sequence of output embeddings, which are then transformed back into human-readable text. The output sequence is generated one token at a time, with each token being generated based on the previously generated tokens as well as the internal representation of the input.

During the generation process, the Transformer uses a combination of heuristics and learned knowledge to select the most appropriate response. This includes factors such as the semantic meaning of the input, the context of the conversation, and the likelihood of the response being grammatically correct and fluent.

In addition, the Transformer may use a variety of techniques to ensure that the generated response is relevant and informative. For example, it may use information retrieval techniques to search a large corpus of text to find relevant information, or it may use knowledge graphs or other structured data to provide more precise and accurate responses.

Overall, the process of generating a response is a complex and iterative one that involves a combination of heuristics, learned knowledge, and natural language understanding. The goal is to provide a response that is both relevant and informative, and that accurately reflects the user’s intent and the context of the conversation.

That’s been really helpful. Can you now explain how OpenAI monitors the responses you generate for accuracy, correctness and bias. Is there any data on this you can share?

OpenAI has several approaches to monitor the responses that models like mine generate for accuracy, correctness, and bias. Here are a few examples:

Human evaluations: One way to evaluate the quality of the responses generated by a language model is to ask human evaluators to rate them on various dimensions such as relevance, coherence, and fluency. OpenAI often conducts such evaluations to assess the performance of its language models.
Intrinsic evaluation metrics: OpenAI also uses several intrinsic evaluation metrics to assess the quality of its models. These metrics are designed to measure specific aspects of a model’s performance, such as its ability to generate diverse responses or to produce syntactically correct sentences.
External evaluation metrics: In addition to intrinsic evaluation metrics, OpenAI also uses external evaluation metrics to assess the quality of its models. These metrics involve comparing the performance of a model to a benchmark or a human baseline on a specific task, such as question-answering or text summarization.
Bias and toxicity detection: OpenAI uses several methods to detect bias and toxicity in the responses generated by language models. For example, it may use heuristics or machine learning models to detect offensive language or discriminatory content in the model’s output. Additionally, OpenAI has developed a tool called the “GPT-3 Data Poisoning Detector” that can help detect and prevent the use of biased or malicious training data.
Real-world testing: Finally, OpenAI also conducts real-world testing of its language models to evaluate their performance in realistic scenarios. This involves deploying the models in production environments and evaluating their performance based on real user interactions.

While I don’t have access to specific data on the performance of my model, I can tell you that OpenAI takes evaluation and monitoring very seriously and is committed to developing models that are both accurate and unbiased.

You mentioned external evaluation metrics. Is there any specific data on these that you can share with us?

I don’t have access to specific data on external evaluation metrics for my model, but I can provide some general information on how OpenAI evaluates its language models.

One commonly used external evaluation metric is the General Language Understanding Evaluation (GLUE) benchmark, which consists of a set of natural language understanding tasks such as sentiment analysis, question-answering, and text classification. The goal of the benchmark is to evaluate a model’s ability to perform a range of language understanding tasks, and to compare the performance of different models on the same tasks.

Another external evaluation metric is the SuperGLUE benchmark, which is a more challenging extension of the GLUE benchmark that includes a wider range of tasks and requires models to have a deeper understanding of language and reasoning.

In addition to these benchmarks, OpenAI also evaluates its models on specific real-world tasks, such as language translation, summarization, and conversation generation. The performance of the models is evaluated based on various metrics such as accuracy, fluency, and relevance.

It’s worth noting that while external evaluation metrics are useful for comparing the performance of different models on specific tasks, they are not always a perfect measure of a model’s overall quality or usefulness. The performance of a language model can vary depending on the task, the data it is trained on, and the specific requirements of the application it is being used for. As a result, it’s important to use a range of evaluation metrics and to evaluate a model in a variety of settings to get a comprehensive understanding of its strengths and limitations.

One final question, we have spoken about language models, but how many languages does your model support?

As a language model, I can support a wide range of languages. However, my primary language is English, and I am optimized for generating responses in that language.

In addition to English, I can also generate responses in other languages, such as Spanish, French, German, Italian, Chinese, and Japanese, among others. However, my performance in these languages may not be as good as in English, especially if I haven’t been specifically trained on them.

It’s worth noting that different languages have different grammatical structures, vocabularies, and cultural contexts, which can affect the performance of a language model. Therefore, training a model to generate responses in a specific language often requires a large amount of high-quality data in that language, as well as careful consideration of the linguistic and cultural nuances of the target language.

Ok, so on the topic of languages, I hope you don’t mind one further question. Does your model, and indeed your responses, distinguish between US English or British English?

As a language model, I’m able to generate responses in different dialects and varieties of English, including US English and British English. However, my training data primarily consists of written text from a variety of sources, and as a result, my responses may reflect a more general form of English rather than a specific dialect or accent.

That being said, I can take into account regional differences in vocabulary, spelling, and grammar when generating responses, and I can adapt to different styles of language use based on the context and the user’s input. For example, if a user inputs a question using British English spelling or vocabulary, I can generate a response that reflects that style of language use. Similarly, if a user inputs a question using American English spelling or vocabulary, I can generate a response that reflects that style of language use.

In short, while my responses may not always reflect a specific dialect or accent of English, I can adapt to different styles of language use and take into account regional differences when generating responses.

Thanks for your time. It has been very insightful.

You’re welcome! I’m glad I could help. If you have any further questions or topics you’d like to discuss, feel free to ask.

The post ChatGPT according to ChatGPT appeared first on The Sand Reckoner.

Climate Tech – Is it all Greenwashing?

Simon Fabri — Fri, 30 Dec 2022 16:35:30 GMT

This post is a work-in-progress, and is still a bit rough around the edges…

Do 1.5°C matter?

In 2021, at the UN Climate Change Conference (COP26) held in Glasgow, the world reaffirmed its ambition to limit the increase in average global temperatures to 1.5°C above pre-industrial temperatures. This goal, first set at the Paris UN conference (COP15), aimed to prevent the world’s most vulnerable countries from the worst impacts of climate change. For example, at above 1.5°C, it was estimated that 80% of the Maldives’ surface area would find itself below water. Since then, this target has largely been considered as being unachievable, with the UN estimating that this temperature will already be reached in the mid 2030s (i.e. in only 10-15 years’ time), [1] and the UK government estimates that at the current rate, temperatures are due to peak at 2.5-2.7°C above pre-industrial temperatures. [2]

So why the gloomy output? Essentially, the forecast peak warming temperature is a direct consequence of additional net carbon that is added to the atmosphere. The UN estimates that +1.5°C requires no more than a further 500bn tonnes of additional carbon dioxide to be added to the atmosphere. Considering global emissions are running at around 40bn tonnes/year, this leaves only 12-13 years of emissions at the current rate, after which all emissions will have to stop. Clearly, global emissions will not cliff-edge from the current rate to zero. Instead, a pathway to an acceptable peak temperature will require a glide path to reducing global emissions, while at the same time introducing mechanisms to remove carbon from the atmosphere. When carbon emissions and removals balance out, we will have reached ‘net zero’, and global temperatures will stop rising.

The Climate Tech Landscape

So how does the ever-burgeoning set of companies calling themselves a Climate Tech or Clean Tech company fit into this outlook? Any company describing itself as being a climate tech company should be directly or indirectly contributing towards the reduction of net emissions. There are two principal ways in which it can be achieved. The first is to directly influence the amount of carbon in the atmosphere, either by improving the carbon efficiency of existing processes, by providing a non-carbon alternative, or alternatively by removing carbon from the atmosphere. We shall call this the carbon reduction category. Then there is then a swathe of companies that provide climate or carbon intelligence, all of which provide insight into the emissions and climate impact across companies, territories or industries. Whilst these companies are not directly involved in carbon reduction, the insight is critical for appropriate targeting of efforts, auditing their impact, and pricing the cost of carbon. There is also a third category of companies aiming to help organisations adapt to the impact of climate change. Whilst crucial in coping with the impact of warming, these companies do not offer direct or indirect means to lower the peak temperature increase and so I will not consider them further in this post.

Carbon Abatement

PWC, the accounting and consultancy firm, publishes an annual ‘State of Climate Tech’ report [3]. This report looks at individual emission-generating sectors and assesses the climate tech funding attracted by each sector. These sectors are transport & mobility, energy, food & land use, industry & manufacturing, the built environment and financial services. Additionally, it classifies carbon capture, removal and storage as its own category. This categorisation is pretty common in this space, so I will use it for the remainder of this blog post.

Mobility and Transport is probably the sector that receives the most attention across the tech industry, predominantly through the activities to provide an infrastructure to electrify private and commercial road transport, though also includes efforts to decarbonise sea and air transport. Tesla is the poster child for the electric vehicle (EV) industry, but the entire automotive industry is being electrified, with VW predicting to shift more EVs than Tesla by 2024-5. Companies in this sector are involved in the entire supply chain, from batteries, their supply chain, through to in-home and public charging infrastructure. Indeed, the PWC report indicates that although mobility and transport are responsible for 15% of all global emissions, they receive 48% of global climate tech venture investment. Now, of course, VC funding is only part of the financing available, as SPAC funding, corporate venturing, innovation agencies and other forms of finance are significant contributors, but this figure is indicative of the interest that this sector provides.

Venture funding by industry sector ‘ Source: PWC State of Climate Tech, 2022’

Energy generation is going through a wholesale transition towards renewable generation, with 40% of the UK’s electricity now being produced from renewable sources. Although this is, by definition, zero-emission, many renewable energy sources have irregular generation patterns, dependent on when the wind blows or when the sun shines. This provides challenges in the creation of a steady, predictable source of electricity that matches demand, especially as gas, oil and coal power stations are progressively taken offline. In other words, they are not ‘dispatchable’ sources that can be dialled up and down to meet demand. Most climate tech companies in this space aim to address how to bridge this gap, either by (1) providing new, more controllable or predictable energy sources, such as green hydrogen or geothermal energy capture, or (2) by better managing energy generation, distribution and consumption, or finally by (3) introducing energy storage solutions, to help balance supply and demand.

Energy management is attracting a lot of attention from the tech scene, as it is the space most suited to be solved by software solutions. In a nutshell, these involve integrating consumers of energy, such as commercial premises, home heating systems, vehicle charging and so on with energy distributors and producers to adjust demand to better match the supply patterns. Examples include using electric cars to return energy to the grid (Vehicle-to-Grid) or to the home (Vehicle-to-Home), making use of smart home integration to manage electricity demand to better match supply, and introducing price signals to consumers of energy so that they can shape consumption around when energy costs are lowest. Octopus Energy, a UK-based energy utility, offers an electric vehicle tariff called Intelligent Octopus that integrates with the EV’s cloud-based charging APIs to schedule charging when demand is lowest and cost lowest. Amongst the companies supported by Y Combinator is Enode, a Norwegian company that provides energy companies, utilities and grid operators with access to a broad range of smart energy devices such as home chargers, electric vehicles, solar inverters in order to help match demand to their supply.

However, software solutions to match supply and demand will only go so far in compensating for the erratic power output patterns of renewable energy. To bridge periods when the wind stops blowing or the sky is overcast, cost-efficient long-duration storage is required. One, slightly unconventional approach is the use of gravitational storage towers that use surplus energy to lift concrete blocks, and return the energy back to the grid by releasing the blocks back down in a controlled fashion. Unlike battery storage, there is no energy loss over time (unless the blocks fall!). Other solutions being developed include solid state batteries, such as lithium-ion tech as used in EVs, flow batteries, and green hydrogen storage.

Gravity-based renewal energy storage tower

Agriculture and food production is an often under-recognised source of atmospheric carbon, and is responsible for 26% of global carbon emissions. This is largely driven by the inefficiencies of converting plant protein to animal protein, the large land areas required for livestock and its associated methane emissions. (yes, I mean flatulence). Alternative foods led much promise, but these are dropping off the menu. Even when scaled, cultured meat is estimated to cost more than $63 a kilo to produce, compared to US wholesale prices of $4 and $6 a kilo for pork and beef respectively. This has had an impact on the sector’s outlook. Beyond Meat, one of the stock markets’ darlings in this space has seen its share price drop by 95% since its peak shortly after its NASDAQ IPO in 2019. Vertical and urban farming has long been seen as the future of agriculture, as it is significantly more land-efficient than its conventional counterpart. Infarm, a European vertical farming company that had raised $600m in venture capital, laid off half of its workforce in November, largely due to the increase in energy costs triggered by Russia’s invasion of Ukraine, which has exposed a very high susceptibility to energy prices for these businesses. In truth, it does seem counter-intuitive that the future of farming should dispense with the free solar energy available on tap.

Energy used in commercial and residential buildings contributes to over 17% of greenhouse gas emissions, primarily for heating and cooling (HVAC) as well as lighting and other electricity consumption. Although a lot of attention is brought to creating green buildings, it is clearly more efficient to improve the emissions of the existing building stock built up over the past hundred-plus years. This is the space where the Internet of Things comes into its own, bringing sensors that monitor occupancy, temperature, humidity, solar irradiation, and air quality and coupled with machine learning models to reduce the energy burden of operating buildings. In countries where the pandemic has resulted in hybrid work patterns, utilisation of both offices and homes is now a lot more variable, leaving a lot of optimisation potential in how energy is used. Furthermore, the recent increase in energy costs has improved the return on investment for smart building management systems (BMS).

To be continued. Coming up next

This post has looked at how Clean Tech companies can directly help add or remove carbon from the atmosphere. Part 2 will consider the impact of carbon capture and storage technologies, as well as the role carbon and emissions analytics companies have in targeting efforts to the right area and auditing their impact. I will also touch on the contentious topic of greenwashing and the impact the increasing energy costs are likely to have on the overall Clean Tech Sector.

Web3 – The promise of a decentralised web

Simon Fabri — Sun, 04 Sep 2022 20:44:47 GMT

The history of computing has been a multi-decade case study of network effects. When a product benefits from a network effect, its value to its customers increases as the number of buyers or sellers using it increases. The dominance of the Windows PC operating system was a classic case. By providing a ‘compatible platform’ for personal computers, software developers could reach millions of users, and the personal computing revolution took off.

Network effects came to play in the current generation of the Web, sometimes referred to as Web 2.0. The first incarnation of the web, which was built on open standards, including HTML, SMTP email, this iteration was basically an interoperable global noticeboard, making information available over the Internet. There were a plethora of search engines, each quickly making its predecessor obsolete. Remember Alta Vista, HotBot, Ask Jeeves? There was no ‘stickiness’ inherent to any of these services, which meant that the moment one search engine out-performed the previous, users would immediately shift. Then the early 2000s then saw the advent of Web 2.0, sometimes referred to as the ‘Social Internet’, where many new companies such as MySpace, LinkedIn, DropBox, Blogger, Facebook, Twitter etc. began creating web experiences that were interactive and personal. By holding information about their customers, they were able to create and tailor experiences that were personal to them, thereby creating stickiness. Google then showed the world how to create a very valuable advertising business using the personalisation it was hoovering up from its customers. This marked the return of network effects, and increased market concentration and centralisation. A platform was now valuable to its users only if their friends and colleagues were on it, and was valuable to other businesses only if it could link to enough customers.

Blockchain and Web 3.0

So why this quick ramble through the Internet’s potted history? Well, the next purported evolution, referred to unoriginally as Web 3.0, or simply Web3, promises to make use of crypto technologies to break the stranglehold of concentration of the big tech players. So is this the next big thing, or simply pie in the sky?

For an outsider, the numbers relating to cryptocurrencies are somewhat bewildering. From the capitalisation of the cryptocurrency market (it reached $30 trillion at the end of 2021), to its high volatility (it has since lost two-thirds of its value), it is difficult to get one’s head as to what is really going on. Despite being variously compared to a modern-day Ponzi scheme that would make Bernie Madoff blush, smart people are still betting big money on crypto tech.

Total Cryptocurrency Market Cap (1 Year) . Down 65% in one year. Source: coinmarketcap.com

In a previous post, I looked at the applications of blockchains, the technology that underpins cryptocurrency, showing how clever cryptography can provide a single version of a ‘truth’ that does not rely on the data or certification being provided by a single party, but instead is based on a decentralised network of several parties, all of whom have a stake in the system.

Blockchain characteristics – Source: S. Fabri

The Case for Web3

Andreessen Horowitz, also known as a16z, is a legendary Silicon Valley venture capital firm, whose founders were early investors in Facebook, Twitter, GitHub, Stripe, Waymo, AirBnB and Roblox. In May, it announced that it raised $4.5 billion for a crypto fund. Chris Dixon, who leads a16z’s crypto investments says that decentralisation and crypto are the new frontiers for 16z investment, as it fuels entrepreneurship, and pushes back away from the intrinsic centralisation of Web 2.0 companies. As an example, he explains how the proceeds of virtual goods in Fortnite go to the company behind the game. Instead, the sale of Non-Fungible Tokens (NFTs) is a means for creators to sell assets directly to a fanbase, in order to monetise that fanbase. According to Dixon, Web3 allows creative people, businesses and startups to reach audiences directly and have a relationship “that is not mediated by algorithms and advertising.”

The likes of a16z and other investors are fueling a burgeoning ecosystem of tech companies that are working to provide the building blocks to deliver on the Web 3.0 vision. Some of these are busy creating the building blocks of a Web3 infrastructure, including the blockchains themselves, including their crypto algorithms, the nodes that host the ledgers, and tooling and services that interconnect between blockchains and provide a variety of crypto-as-a-service applications. On top of these are the Web 3 applications, including Circle, which provides companies with the ability to transact financially with multiple cryptocurrencies, Sky Mavis, who provide the building blocks for creating decentralised online games, where all users can transact and sell game items, and OpenSea, a market place for peer-to-peer NFT transactions.

Not so simple – the Crypto ecosystem according to CB Insights

The Opportunities

When looking at Web3, a couple of key opportunities stand out. Decentralised Finance, also known as DeFi, whilst in its infancy, clearly will act as a spur to innovation in the broader finance industry. and possibly reconfigure the finance industry. Banks, wire transfer companies and payment networks all act as centralising conduits, extracting rents of anything between 1-4% on the value transacted, with operating margins in the order of 60-80%. These fees mean that established institutions are vulnerable to disruption by lower-cost crypto networks. Already in 2021, the value of transactions underpinned by Ethereum, one of the largest blockchain networks was $6 trillion. By comparison, Visa handled $10.14 trillion worth of transactions that year. The use of smart contracts, agreements that are automatically enforced and cannot be tampered with can be used to support a wide range of transactions, the crypto equivalent of shares, loans and ‘stablecoins.’

Non Fungible Tokens are unique records of digital assets that can be exchanged or sold, usually using cryptocurrencies that prove ownership of that digital assets. As they live on open blockchains, normally Ethereum, their transaction history is visible to all. Moreover, digital creators can retain a stake in their work, for example by retaining a share of the proceeds on any future resales enforced by smart contracts. Digital assets range from tweets, digital football trading cards, magazine articles or video game assets. Although the market for NFTs has tanked recently, for as long as artists continue to create digital content, be they music, visual, or gaming content, then it is likely that NFTs, or something like them, will be used to create a market where they can be bought and sold.

The Challenges to be Overcome

Having seen a fair few tech cycles, the extent of fragmentation of the Web3 landscape means that it feels a few years at best from going mainstream. Despite a16z’s enthusiasm, the computer industry does appear to have a tendency towards centralisation, rather than decentralisation.

Consider online music. Decentralised peer-to-peer services such as Napster and Kazaa provided listeners the world over with free, nearly limitless music, shared through PC applications across the world. Although these p2p applications provided access to music, the experience was poor. Slow, unreliable, with a clunky user interface. It was almost as though there wasn’t a single product development creating the experience! Then, in 2008, Spotify demoed their fledgling PC application to Universal, based on a tech stack that they developed with the sole purpose of cloud-based music streaming. Not only had music streaming made the jump from a hacky tech enthusiast’s tool to an application that anyone could use, but the music labels now had a partner with who they could set up contracts and hopefully build a sustainable business model.

The centralised services that Web3 advocates say get in the way between creators and their fans, such as recommendation algorithms and advertising, are what generate awareness in the first place. For example, Sam Ryder, who sang his way to a run-up position in this year’s Eurovision, could certainly drive up a sustainable business through selling access and digital assets as NFTs. But would that have been possible without TikTok’s recommendation algorithm which catapulted him from being an unknown artist singing covers from home over lockdown to a viral sensation with nationwide name recognition?

Say thank you for the algorithms – Sam Ryder

Similarly, for all the decentralising architecture of DeFi, there remain centralised control points. For example, online wallets that allow users to buy, sell and manage crypto assets manage passwords and logins on their users behalf, just like traditional online services. So what does Web3 need to achieve if it is truly to become mainstream?

Energy Efficiency

This is one area where there has been significant progress lately. Until recently, the Ethereum blockchain required approximately 4.8kWh for its ‘proof of work’ algorithm, which is more than the UK’s daily household electricity consumption. Think about it. A single online transaction requires more electricity than the domestic appliances, lighting and varied electricity needed by a family for a whole day. Particularly as much of the world faces an energy cost crisis, this was unsustainable. Ethereum claims that this was addressed in an upgrade called ‘The Merge’ which took place on, moving its verification algorithms to a system called ‘proof of stake’ which is a lot more efficient (Ethereum claims that energy consumption will decrease by 99.95% per transaction). For a comparison between these two algorithms, see here.

Be accessible to a non-tech customer base

First crypto transactions need to become more user-friendly, particularly for people who don’t care about the underlying tech. You don’t for example need to care how a car works, to want to drive one. The Verge quotes an example where some NFTs priced at $198,000 sold for $1,800 because some older online listings were still active. As blockchain transactions are irreversible, there was no way for the sellers to recover or reverse the transaction. As there was no intermediary in place who would guarantee or underwrite the sale, the asset was gone. Similarly, crypto wallets that allow users to hold their own private keys have no dependency on an online service, but if the user loses them, they are gone forever, much like cash under the mattress. This leads to the question, “Who do customers go to when things go wrong?” As anyone involved in creating consumer experiences knows, you can only really delight customers if you are able to deal with all the range of ways in which your service can go wrong. In this case, blockchain immutability is a challenge.

Prevent Abuse and Harassment

As blockchain platforms effectively provide exchanges for often anonymous users, there are currently few ways by which illicit, discriminatory, offending or otherwise illegal material can be revoked. The immutability of the transactions, the anonymity behind crypto wallets, the ability to obfuscate the source and destination of funds, plus the intrinsic openness and visibility of the ledgers cause all sorts of problems. A blog on the topic explored the ramifications of the transparency of public transactions, looking at the risks from abusive partners, harassment, revenge porn and so on. There seems to be little thought across the industry on how these issues will be addressed. After all, one of the strongest advantages of centralised systems is that it is clear who legislators and law enforcement authorities can go to in order to put things right.

Interoperability

Any financial asset that has a realisable value must be ‘liquid’, in other words, it should be able to be converted to another form of asset. For example, a house can be sold for cash. If it cannot be sold, it is illiquid, and hence has no financial value. For this very fundamental reason, all financial products and transaction platforms, allow for the transfer or conversion of assets. This is the financial equivalent of interoperability in the tech world. While the Web was built on a fully standardised set of technologies that we take for granted, (albeit with limited interoperability between, different platforms – Facebook, Tiktok etc), the blockchain networks on which Web3 is built are completely siloed. A digital asset such as an NFT held on one blockchain, say Ethereum, cannot easily be transferred onto another one, such as Solana. Similarly, the applications that run on given blockchains are also their own islands. Today this is being addressed through specific bridges between different applications, but this is a non-scalable solution that also introduces intermediary choke points that undermine the decentralised vision. Until protocols and asset formats become standardised, just as emails can be sent seamlessly across different email providers, then anyone using a Web3 application will be locked-in to whatever application or platform they selected.

Conclusion

While cryptocurrencies will likely become an increasingly important part of the fintech fabric, and blockchains will continue to find a diverse range of applications, decentralisation is something that will happen by degree. Bringing creators and consumers together requires often requires multi-sided platforms and companies to operate them. The scale that TikTok, Facebook or Google provide, also provides convenience, as it makes it easier for people to find the experiences, content or people they are looking for. Fully decentralised networks are also, by definition, fragmented, which will create significant usability issues that would likely be too high a barrier for mass market adoption. As the tech improves, as networks and blockchains become increasingly interoperable, regulators will also insist that there are legal entities who can take accountability for what takes place on a blockchain. For these reasons, I think it is more likely that the crypto tech underpinning Web 3.0 will be embraced by both existing and emerging tech companies as a way of providing customers with greater control and ownership over their own data, more transparency on how it is used and allowing faster and cheaper transactions. Along the way, we also doubtlessly see existing business models perishing and new ones emerging.

The Sand Reckoner: AI & Disruptive Tech

The 4 AI Roles of the Future

The 100x Economic Advantage

The 10x engineering advantage

“Syntax coding is dead” - bridging the human-machine gap

Looking to the future - the new categories of AI work:

1. Humans as Intelligent ‘AI Commissioners’

2. Humans as ‘Human-AI Orchestrators’

3. Humans as ‘AI Validators’

4. Humans as ‘AI Innovators’ - A golden age of innovation?

Conclusion

Further Reading

Claude Code and glimpses of the future - Part 1

What others are saying

The project - introducing Weavify

First Impressions

1. AI chatbots - your trusted advisors

2. Documentation over Code

3. Slop is a Human Artefact, not an AI Artefact

4. Planning Mode - who is the intelligent being in this relationship?

5. Plan and Iterate - Your “house rules”

6. Fast creation, slow fine-tuning. Don’t talk about cost!

In conclusion

Further Reading

AI-Generated Annexe: The Technology Stack behind Weavify.

Author: Claude.ai Opus 4.5

Summary 1: The Technology Stack Behind Weavify

Summary 2: From Code to Production: Weavify’s Development Workflow

AI themes for 2026

1. Jevons Paradox. A model for AI demand?

Read more:

2. What Training / Inference Mix are we heading towards?

Read more:

3. Will AI become more explainable?

Read More:

4. How should we manage autonomous AI?

Bringing it all together

Read More:

AI and the Future of Work

The state of AI in the workplace today

Stating the obvious – AI continues to get (much) better

How AI changes the value of work

Learning 1. Combine domain expertise with AI savvy

Lesson 2. Optimise AI for problem-solving and skills development.

Conclusion on the Future of Work

References and Further Reading

What does AI mean for university education?

A Tale of Two Time Horizons

The hollowing out of entry-level jobs

Looking Ahead – what does this mean for graduate jobs?

UK Universities Spring into Action. Or do they?

Protecting Academic Integrity and the Learning Process

So what should Universities learn from this?

1. A University Degree is not a guarantee of anything.

2. Data, Statistics and AI as core subjects for technical disciplines

3. Basic AI Model Literacy for all disciplines

4. Reinvent the value proposition

Predicting the future is hard

References and Further Reading

How AI is changing engineering

A spotlight on software engineering

And the broader learnings for other engineering disciplines?

1. AI for initial concept generation

2. AI for design automation

3. AI for complex engineering optimisation

So what does all this mean for the future of engineering?

Finally…

References and Further Reading

Artificial Intelligence Good Reads (Part 2)

Some AI Forecasts for 2024 (1)

Some AI Forecasts for 2024 (2)

How to Rank LLMs like Chess Grandmasters

AI as a National Strategy

Artificial Intelligence Good Reads – Dec 23

December ’23

Older posts…

The Open Source vs Proprietary AI Models Faultlines

October ’23

AI’s Environmental Impact

Previous articles