Wednesday, March 13, 2024

AI moves from 2D to 3D

A quite remarkable achievement by DeepMind. I wrote about this in my 'Learning in the Metaverse' book and in the 2nd edition of my book on GenAI, coming out on May 4: the idea that AI accelerates the move from 2D to 3D.

This software turns language prompts into actions within 3D worlds. For the first time, the agent actually understands the 3D world in which it operates and can perform tasks just as a human would.

How it works

All it needs are screen images of the game or environment and text instructions. It can therefore interact with any virtual environment. Menus, navigation through the environments, actions and interactions with objects are all executed. DeepMind partnered with eight games companies to test it on different tasks. SIMA is the AI agent that, after training, perceives and understands a variety of environments, so that it can take actions to achieve an instructed goal.
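
DeepMind has not published SIMA's internals, but the interface it describes – screen pixels plus a text instruction in, keyboard and mouse actions out – can be sketched roughly as below. Every name here (the agent class, the action types, the environment methods) is illustrative, not DeepMind's API.

```python
# Rough, illustrative sketch of the perceive-and-act loop SIMA's interface
# implies: screen pixels + a text instruction in, keyboard/mouse actions out.
# All names are hypothetical, not DeepMind's API.
from dataclasses import dataclass
from typing import List

@dataclass
class Action:
    kind: str      # e.g. "key_press", "mouse_move", "mouse_click"
    payload: dict  # e.g. {"key": "w"} or {"x": 640, "y": 360}

class ScriptedAgent:
    """Stand-in for the trained agent: maps (screenshot, instruction) -> actions."""
    def act(self, screenshot_rgb, instruction: str) -> List[Action]:
        # A real agent would run a vision-language policy here.
        if "forward" in instruction.lower():
            return [Action("key_press", {"key": "w"})]
        return [Action("noop", {})]

def run_episode(agent, environment, instruction: str, max_steps: int = 100) -> None:
    for _ in range(max_steps):
        frame = environment.render()          # raw pixels only, nothing privileged
        for action in agent.act(frame, instruction):
            environment.apply(action)         # keyboard and mouse, just like a human
        if environment.goal_reached(instruction):
            break
```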



Transfer

Even more remarkable is the fact that the agent seems to transfer learning, so playing in one environment helps it succeed in others.



Multimodal now also 3D

Far too much debate around AI focuses on text-only LLM capabilities and not on their expansion into multimodal capabilities, now including 3D worlds. The goal is to get agents to perform tasks in virtual and/or real 3D worlds intelligently, like a human.

Applications

Its obvious application is in performing risky tasks in high-risk environments, but also in any 3D world. It can also be used in online 3D worlds to help with training. These are the early signs of a tutor within these worlds, or a buddy, patient, employee or customer in training.


Full paper

The Strange Case of Altman V Board at OpenAI revealed

The New Yorker article on the drama at OpenAI has uncovered not only the timeline but the dynamics of the drama. It was a plot worthy of an episode of Succession. Kendall Roy is Sam Altman, a charismatic, persuasive and experienced tech entrepreneur. Logan Roy is Microsoft, looking to get some zest into the business, as it has lost its mojo. Then there are the bit players, the winners and losers.

Helen Toner was the 'Effective Altruism' academic, with no real AI or technical experience, who had to apologise to the board for writing opinion pieces criticising the organisation of which she was a board member. She apologised, but Altman clearly had no time for her antics. He tried to get her ousted from the board, playing its members off against each other. It happens – I've seen it. Some on the board were inexperienced in business and couldn't cope with the pressure, clearly tangled up in academic debates about AGI. As an insider said, "Every step we get closer to A.G.I., everybody takes on, like, ten insanity points." The board felt threatened, panicked and sacked Altman. BIG MISTAKE.

With "no malfeasance" established, Microsoft went apeshit, Altman took Brockman with him, and the staff revolted. This was a battle between lightweight academics and experienced AI and business brains. Used to ruling the roost in their own world, and with more than a little of the arrogance that comes with academic status, they completely misjudged the situation and overplayed their hand. In the end it was a rout. The board "agreed to leave" (cough), Altman was reinstated, and the usual inquiry was ordered (always a sop). As one tech journalist noted, "A clod of a board stays consistent to its cloddery."

Two other fascinating characters in all this are Kevin Scott, the Microsoft AI guy, and Mira Murati, the ex-Tesla Albanian, tech-savvy and known to be unflappable. They both came from tough, poor backgrounds and hold the belief that this tech really is a leveller - we'll see. They steered all of this to its conclusion.

The board all went, apart from Adam D'Angelo, co-founder and CEO of Quora, a computer scientist and hugely successful entrepreneur.

Larry Summers was brought in. A fascinating choice: ex-academic, President of Harvard but sacked during an early salvo in the culture wars, and now soaked in economics, politics and business. He's one of the best-connected figures in America.




The board has also been considerably expanded with a range of professional expertise; Bret Taylor is the Chair, a real heavyweight:
Co-creator of Google Maps
CTO at Facebook
Chair of Twitter
Co-CEO at Salesforce. 









He has brought in:
Sue Desmond-Hellmann, former CEO of the Gates Foundation, physician and experienced corporate board member
Nicole Seligman, heavyweight global lawyer
Fidji Simo, another tech entrepreneur
....and, of course, the King is dead, long live the King!
.....Samuel Altman.

One figure lurks behind all of this, the genius that is Ilya Sutskever. He knows more about AI than anyone there and created the software yet survives as he IS OpenAI. Like the mad-genius Oracle, sitting quietly in the middle watching all of this, above all of these petty squabbles. He is now back to his day job – changing the world. 


PS
Thanks to Peter Shea for helping me with this piece.

Also, the original article - well worth a read but paywalled:
https://www.newyorker.com/magazine/2023/12/11/the-inside-story-of-microsofts-partnership-with-openai?fbclid=IwAR2kmNi0LLc3FaXY6s2C08YCaDS88hD4mBFauylAIJCgzJ4lBnHqyZ-ts6Y

Wednesday, March 06, 2024

Are the LMS & VLE dead? Accenture and Udacity draw a new line in the sand

Dead fish market

I have been saying for some time that the VLE and LMS market is in for a dramatic shift. These are two very different markets with two separate sets of products, both global and lucrative. Both are also crammed with legacy technologies, and both encourage old, 'not fit for purpose' standards, like SCORM (no longer even supported), that cripple their ability to adapt to AI-driven approaches to learning. The sector is a bit of a dead fish.

The LMS and VLE market is set for a change, as new AI platforms emerge. The investors are ready, the need is there, and we are now moving into the phase when they will be built. It will take time, as incumbents are locked in, often on 3-year licence deals, and deeply integrated, but things will change. They always do.

Investor hiatus

Investors have been in a hiatus, waiting to see how things shake out. Guess what - they’re starting to shake out. AI is not just the new kid on the block, it is the only new kid on the block. It is THE technology of the age. The top 7 tech stocks, all AI companies, now have a combined market capitalisation of $12.5 trillion, more than the collective gross domestic product of New York, Tokyo, LA, Paris, London, Seoul, Chicago, San Francisco, Osaka, Dallas and Shanghai. This is no fad, neither is it the future – it is now.

The analysts are also all at sea with their grids and lack of foresight. In truth, investors that bought into the LMS market are struggling to realise the expected revenues and profits. Some very large companies are struggling with their share price and with meeting revenue and profit expectations. Even at the medium and lower levels, there is suspicion that value is falling. The learning content creation companies should be using AI (and are), so prices will plummet. It is difficult to see why investors would put big valuations on dated content or bespoke production. Would you invest in a video production learning company having seen Sora? A major Hollywood investor has just pulled $850 million from a studio build. Investors in online learning will be thinking along the same lines.

Accenture buys Udacity

That brings us to Accenture buying Udacity (for peanuts) and saying they plan to invest $1 billion (yes, $1 BILLION) in LearnVantage – an AI-first learning platform. Interesting move. They say it will be an AI platform... then make the mistake of saying it will primarily teach AI. That makes no sense. It is the old thinking of 'let's build a pile of courses'. Consultancies don't build good tech – neither did Udacity – and if Accenture lose their objectivity as solutions consultants, they do themselves damage. You can't be a consultant and then turn round and say, by the way, the 'optimal' solution is our platform.

However, this doesn’t really matter, as this is just the first line in the sand in a major market shift. If they don’t succeed, someone else will. The huge tech companies could do this and may well enter the market but their eyes are on bigger fish - productivity tools. They are never good in the learning market. They're not looking for gold, as they make a ton from selling the shovels.

The LMS is dead, long live the LMS!

Some love them, some hate them. Some love to hate them.

1. Zombie LMS

Some organisations have a Zombie LMS. At the very mention of its name, managers and learners roll their eyes. Organisations can get locked into LMS contracts that limit their ability and agility to adopt innovations. Many an LMS lies like an old fossil, buried in the enterprise software stack, churning away like an old heating system – slow, inefficient and in constant need of repair. Long term licences, inertia and the cost of change, see the organisation locked into a barely functional world of half-dead software and courses.

2. Functional creep

Our LMS does everything. “Social?” “Yes, that as well”. Once the LMS folk get their hooks into you, they extend their reach into all sorts of areas where they don’t belong. Suddenly they have a ‘chat’ offer, that is truly awful – but part of the ‘complete LMS solution’. For a few extra bucks they solve all of your performance support, corporate comms, HR and talent management problems, locking you bit by bit into the deep dungeon they’ve built for your learners, never to see the light again.

3. Courses, of course

The LMS also encourages an obsession with courses. I'm no fan of Maslow's clichéd pyramid of needs, but he did come up with a great line: "If you only have a hammer, you tend to see every problem as a nail." That is precisely the problem with the LMS – give an organisation an LMS and every problem is solved by a 'course'. This has led to a culture of over-engineered, expensive and long-winded course production, aligned with the use of the LMS and not with organisational or business needs. What we end up with is a ton of crap leadership, DEI and compliance courses.

4. Cripples content

Throw stuff into some VLEs and LMSs and they spit out some really awful-looking content. Encouraged to load up half-baked course notes, teachers and trainers knock out stuff that conforms solidly to that great law of content production – GIGO – garbage in, garbage out. Graphic, text, graphic, text, multiple-choice question… repeat. The Disneyfication of learning has happened, with tons of hokey cartoon and speech-bubble stuff. Out go simulations and anything that doesn't conform to the simple, flat, linear content that your LMS can deliver, or even worse... gamification – some infantile game that feels as though it is designed for 10-year-olds!

5. One size fits all

With the rise of AI, adaptive and personalised learning, the LMS becomes an irritation. They don’t cope well with systems that deliver smart, personalised learning pathways. The sophisticated higher-level learning experiences are locked out by the limited ability of the LMS to cope with such innovation. The LMS becomes a sort of cardboard SCORM template through which all content must fit. But it’s the ‘learn by doing’, performance support and experiential learning that most LMSs really squeeze out of the mix.

6. Compliance hell

We all know what happened in compliance training. L&D used the fallacious argument that the law and regulators demand oodles of long courses. In fact, no law and very few regulators demand long, bad, largely useless online courses. This doesn’t work. In fact, it is counterproductive, often creating a dismissive reaction among learners. Yet the LMS encourages this glib solutionism.

7. Completion cul-de-sacs

With the LMS, along came SCORM, a ‘standard’ that in one move pushed everyone towards ‘course completion’. Learning via an LMS was no longer a joyous thing. It became an endless chore, slogging through course after course until complete. Gone is the idea that learning journeys can be interesting, personal affairs. SCORM is a completion whip that is used to march learners in lock-step towards completion.

8. Limits data

Given the constraints of most LMSs, there is the illusion that valuable data is being gathered, when in fact, it’s merely who does what course, when, and did they complete. As the world gets more data hungry, the LMS may be the very thing that stops valuable data from being gathered, managed and used.

To be fair...

To be fair, a VLE or LMS was often the prime mover for shifting people away from pure classroom delivery. This is still an issue in many organisations, but at least they effected a move, at the enterprise level, away from often lacklustre and expensive classroom courses. In fact, with blended learning, you can manage your pantheon of delivery channels, including classroom delivery, through your LMS (classroom planning is often included). As enterprise software they also scale, control what can otherwise be chaos and duplication, and provide consistency and strategic intent. You do need to identify and manage your people, store stuff, deliver stuff and manage data, and an LMS is simply a single integrated piece of software for doing that. You may want to do without one, but you'll end up integrating the other things you use – and that will be, in effect, a sort of LMS. There are also security issues, which they handle.

There will always be a need for single solutions. We can see, however, that this has descended into the all-embracing, death-clutch mess that is 'Talent management'.

Conclusion

Organisations need enterprise software. We've been through the course repository model, which got stuck in the rut of rather flat e-learning. The new model is more dialogue than monologue. The incumbent VLE and LMS models need to adapt quickly or be replaced by those who do AI well. The VLE and LMS market looks like something out of the early 2000s; that's because it is something out of that era. Many of these companies started then and, having moved from client-server architecture to the cloud, still have legacy code and lack the flexibility to work in this new world. My guess is that some stand a chance; many do not. If all you have done is add some prompted creation tools to your offer – forget it.

We have a chance to break out of this repository-of-courses model, crippled by box-ticking compliance and impoverished on data by SCORM, to create more dynamic platforms that cope with formal and informal learning, as well as performance support, tutorbots and data that informs learning and personal development. AI is the technology that appears to promise some sort of escape velocity from these repositories. You can already feel the blood draining from the old model as the new tools become available and improve so quickly.

Tuesday, March 05, 2024

Is ‘Deepfake’ hysteria mostly fake news?

Deepfakes touch a nerve. They are easy to latch on to as an issue of ethical concern. Yet despite the technology being around for many years, there has been no deepfake apocalypse. The surprising thing about deepfakes is that there are so few of them. That is not to say it cannot happen, but it is an issue that demands some cool-headed thinking.

Deepfakes have been around for a long time. Roman emperors sometimes had their predecessors' portraits altered to resemble themselves, thereby rewriting history to suit their narrative or to claim a lineage. Fakes in print and photography have been around as long as those media have existed.

In my own field, learning, a huge number have, for decades, used a deliberate fake – the 'learning pyramid'. It is entirely made up, based on a fake citation and fake numbers put on a fake pyramid. Yet I have seen a Vice Principal of a university, and no end of keynote speakers and educationalists at conferences, use it in their presentations. I have written about such fakery for years, and a lesson I learnt a long time ago is that we tend to ignore fakes when they suit our own agendas. No one complained when naked Trump images flooded the web, but if it's from the Trump camp, people go apeshit. In other words, the debate often tends to be partisan.

When did AI deepfakes start?

Deepfakes, as they're understood today, refer specifically to media that's been altered or created using deep learning, a subset of artificial intelligence (AI) technology.

The more recent worries about AI-created deepfakes have been around since 2017, when the term 'deepfake' (a portmanteau of 'deep learning' and 'fake') first appeared. It was on Reddit that a user of that name started posting videos in 2017 with celebrities' faces superimposed on other bodies.

Since then, the technology has advanced rapidly, leading to more realistic deepfakes that are increasingly difficult to detect. This has raised significant ethical, legal and social concerns regarding privacy, consent, misinformation and the potential for exploitation. Yet there is little evidence that they are having any effect on either beliefs or elections.

Deliberate deepfakes

The first widely known instance of a political AI deepfake surfaced in April 2018. This was a video of former U.S. President Barack Obama, made by Jordan Peele in collaboration with BuzzFeed and his production company, Monkeypaw Productions. In the video, Obama appears to make a series of controversial statements. However, it was actually the voice of Jordan Peele, an impressionist and comedian, with AI technology used to manipulate Obama's lip movements to match the speech. We also readily forget that it was Obama who pioneered the harvesting of social media data to target voters with political messaging.

The Obama video was actually created as a public service announcement to raise awareness about the potential misuse of deepfake technology in spreading misinformation and the importance of media literacy. It wasn't intended to deceive but rather to educate the public about the capabilities and potential dangers of deepfake technology, especially concerning its use in politics and media.

In 2019, artists created deepfake videos of UK politicians, including Boris Johnson and Jeremy Corbyn, in which they appeared to endorse each other for Prime Minister. These videos were made to raise awareness about the threat of deepfakes in elections and politics.

In 2020, a notable deepfake video of Belgian Prime Minister Sophie Wilmès showed her giving a speech in which she linked COVID-19 to environmental damage and the need to take action on climate change. The video was actually created by an environmental organisation to raise awareness about climate change.

In other words, many of the most notable deepfakes have been for awareness, satire, or educational purposes.

Debunked deepfakes

Most deepfakes are quickly debunked. In 2022, during the Russia-Ukraine conflict, a deepfake video of Ukrainian President Volodymyr Zelensky was circulated. In the video, he appeared to be making a statement asking Ukrainian soldiers to lay down their arms. Deepfakes like this are usually quickly identified and debunked, but it shows how dangerous misinformation can be at sensitive times, such as during a military conflict.

The recent images of Donald Trump were explicitly stated to be deepfakes by their author. They had missing fingers, odd teeth, a long upside-down nail on his hand and weird words on hats and clothes, so were quickly identified. At the moment such images are easy to detect and debunk. That won't always be the case, which brings us to detection.

Deepfake detection

As AI develops, deepfake production becomes easier, but so do advances in AI and digital forensics for detection. You can train models to tell the difference by analysing facial expressions, eye movement, lip sync and overall facial consistency. There are subtleties in facial movements and expressions, blood-vessel giveaways, as well as eye blinking, breathing, blood pulses and other movements that are difficult to replicate in deepfakes. Another approach is checking for consistency in lighting, reflections, shadows and backgrounds. Frame-by-frame checking can also reveal flickers and other signs of fakery. Then there's audio detection, with a whole rack of its own techniques. On top of all this are forensic checks on origins, metadata and compression artefacts that can reveal creation, tampering or an unreliable source. Let's also remember that humans can be used to check, as our brains are fine-tuned to find these tell-tale signs, so human moderation still has a role.
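
For the technically minded, here is a toy sketch of just one of those signals, frame-to-frame consistency, assuming OpenCV and NumPy are available. Real detectors combine many such signals (lip sync, blink rate, lighting, metadata) with trained classifiers; this only shows the shape of the idea.

```python
# Toy sketch of one detection signal: frame-by-frame consistency. Sudden spikes
# in inter-frame difference can flag the flicker that cheap face-swaps sometimes
# leave behind. Real systems combine many signals with trained classifiers.
import cv2
import numpy as np

def flicker_scores(video_path: str) -> list[float]:
    cap = cv2.VideoCapture(video_path)
    scores, prev = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
        if prev is not None:
            scores.append(float(np.mean(np.abs(gray - prev))))  # mean pixel change
        prev = gray
    cap.release()
    return scores

def looks_suspicious(scores: list[float], sigma: float = 4.0) -> bool:
    # Crude flag: any jump far above the video's own typical frame-to-frame change.
    arr = np.asarray(scores)
    return bool(arr.size and (arr > arr.mean() + sigma * arr.std()).any())
```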

As deepfake technology becomes more sophisticated, the challenge of detection increases, but these techniques are constantly evolving, and companies often use a combination of methods to improve accuracy and reliability. There is also a lot of knowledge sharing across companies to keep ahead of the game.

So it is easier to detect deepfakes than many think. There are plenty of tell-tale signs that AI can use to detect, police and prevent them from being shown. These techniques have been honed for years, and that is the reason so few ever actually surface on social media platforms. Facebook, Google, X and others have been working on this for years, which is why they have not been caught flat-footed on the issue.

Deepfakes in learning 

We should also remember that deepfakes can be useful. I have used them to create several avatars of myself, which speak languages I cannot speak. They have been used to recreate historical figures for educational documentaries and interactive learning experiences. You see and hear historical figures 'come to life', making the learning process more engaging. Language courses have used them to create videos and immersive language learning experiences, as the lip-sync is now superb. Even museums and educational institutions have started using deepfake technology to create more immersive exhibits. On top of this, real training projects in sectors like medicine now use deepfake technology to create realistic training videos or simulations, in which patients and healthcare staff can be represented.

Conclusion

We too readily jump to conclusions when it comes to AI and ethics; there is often a rush to simplistic moralising, when the truth is deeper and more complex. Technology almost always has multiple uses, with varying degrees of benefit and harm. We tend to lean towards the negative through confirmation and negativity bias. This needs to be countered by a more detailed discussion of the issues, not by presenting everything in apocalyptic terms.


Monday, March 04, 2024

The Mind is Flat!

Nick Chater’s ‘The Mind is Flat: The Remarkable Shallowness of the Improvising Brain’ is an astonishing work, a book that is truly challenging. He argues against the common belief that our thoughts and behaviours are deeply rooted in our subconscious. Mental depth for him is an illusion. Instead, he suggests that our minds are flat, meaning that they operate on the surface level without deep, hidden motivations or unconscious processes.

Training is post-rationalisation

For me, he explains why most training is post-rationalisation, simplistic stories we tell ourselves about cognition. We latch on to abstract words like creativity, critical thinking and resilience, then wrap them up in PowerPoints to create coherent stories that are quite simply fictions. This is why they are so ineffective in the real world. They make you think there are easy solutions, simple bromides for action, when there are not. He thinks this is all wrong, and I think he is right.

Cognition is improvisational

Chater supports his arguments fully by discussing various psychological studies and experiments. He proposes that our thoughts, feelings, and behaviours are largely improvisational and context-dependent. According to his theory, our responses to situations are not driven by inner beliefs or desires, but are rather ad-hoc constructions created on the spot. This idea challenges the traditional views of psychology and suggests that much of what we believe about our internal thought processes might be an illusion.

Attacks psychoanalytic and psychotherapeutic worlds

It is a direct attack on the whole psychoanalytic and psychotherapeutic world and if true, renders much of what passes for psychology as speculative rot. He challenges the whole notion of a complex, subconscious mind that can be unlocked or understood through psychoanalysis or similar therapeutic methods. Since our thoughts and behaviours are improvised on the surface level and are context-dependent, delving into the supposed depths of the subconscious to find hidden meanings or repressed memories, as is often the goal in psychoanalysis, is likely to be misguided. He suggests that the mind doesn't work in the way psychoanalysis proposes, with its emphasis on uncovering deep-seated, unconscious desires and motivations.

Over-rationalise

We over-rationalise when it comes to ideas about the brain, when it is fantastically complex and opaque. He touches upon Tolstoy, where Anna Karenina commits suicide – but why? The stories we tell ourselves about her motivations are, for Chater, quite wrong, as she herself would be incoherent about such things. Rationalism is the mistake here, the idea that there is a true answer for everything. Dennett has taken a similar position, where conscious rationalisation is always retrospective, delving back into the brain. The brain does huge, complex, parallel computations and has no single locus or simplistic causes. The same applies to LLMs: there is no place you can point to for the production of an answer. The brain, like an LLM, is necessarily opaque.

Stories are misleading

We are improvisers, and this is where our 'storied self' is misleading. We simply make most things up and use simple, approximate models to get through our lives. These simulations are often crude. Geoffrey Hinton, back in 1979, talked about the shallowness of human inference, using imagined cubes as an example. Our simulations of the world are momentary and not wholly coherent. We build models of it (the cone of experience), trying to see it as consistent. In fact, we deal with very localised bits of the world, a sliver of reality. We can't model the world in our brains, as the world is much larger than us! It is all a matter of approximation, analogy and past experience.

We latch onto abstract models and essences, but these are far too reductive. Human exceptionalism is a good example of this, with words like 'creativity' and, in general, 21st-century skills. Chater thinks these are misleading terms, as they are too abstract and exclude the complexity of actual cases. He and Wittgenstein are, I think, correct on this. Language is promiscuous and tends to over-produce abstractions which we think are real but turn out to be just that – misleading abstractions.

Our sense of our own psychology is almost completely wrong, as we have incredibly limited perception of the world through our senses, and our minds work very differently from how we think they work. Colours are unlikely to be essential features out there in the world, as they are mental constructions; similarly with temperature, as experienced.

Bayesian brains?

One could argue that there are fundamental models, like pure reason, mathematics and science – axiomisation does happen, often after huge amounts of effort, but very few things are, in practice, axiomised. We may have some of this axiomised knowledge in us but this is unlikely to be foundational in the way psychology or neuroscience thinks it is.

As they say, all models are wrong but some are useful. We can, for example, hypothesise that the brain is a Bayesian organ. This may be true, but it is more likely that the brain uses things similar to Bayesian approaches to cognition. Tom Griffiths and Josh Tenenbaum follow this line, but Chater thinks this is very localised and not sufficient for most cognition.

Conclusion

It has been a while since I read anything that so reversed my long-held beliefs. Heavily influenced by my reading and work in AI, I had been coming to a similar, if ill-informed and badly evidenced, belief that this was indeed the case. It changes your whole perspective on cognition and behaviour. AI is showing us that much of what passes for human behaviour can be reproduced, to a degree, by LLMs and other forms of AI. This should not be so astonishing if Chater is right that we are quite shallow thinkers.


Sunday, February 25, 2024

There’s a new Sheriff in town – AI!

Keynoting at conferences where the audience has an absolute focus on one sector is sometimes better than the general conferences. They are keen to find out how AI can help them with their specific problems and goals, and it leads to more practical discussions and questions. In the last year I have given presentations to national tax, police, waste management, recruitment, immigration, HR, military and global consultancy organisations, as well as online learning companies. They all have one thing in common – they already use AI, sometimes extensively, at an operational level, and know they need to get to grips with this technology in other areas, as it is the technology of the age. I will do a short series of articles on specific sectors, as I'm on the road in several countries giving more of these over the next few weeks.

First up, the police, as I've given three keynotes to national police colleges in the UK and the Netherlands, and for the EU. There's a new sheriff in town – AI! Well, not really, as the police are already pioneers in AI.

AI in policing

ANPR (Automatic Number Plate Recognition) was invented in the UK in 1976 and has been in use since 1979. It truly revolutionised policing, and the UK now has 60 million ANPR 'reads' a day. It acts as a deterrent to reduce crime and catches everything from stolen and uninsured vehicles to major crime and counter-terrorism. Then there are its more mundane uses, which we encounter every day in car parks, tolls and logistics tracking. It is a great example of the massive benefits that can accrue from a simple piece of AI, in this case character image recognition, something that has been around for decades.

CCTV was first used in the UK in 1960 for crime prevention and the detection of offenders. Again, with face recognition, it can be and has been used to identify serious offenders such as murderers, sex offenders and figures in organised crime. It is now essential for crowd control and public order. It is often combined with face recognition, not only from CCTV but also from mobile phone footage, dashcams and doorbells.

Face recognition is also used when you cross borders at immigration gates. I haven't spoken to a border guard coming back to the UK for many years; it has been automated. In fact, humans are now the main point of failure. My wife cannot enter the US because of a poorly trained TSA guard at LAX, who neither knew the rules nor had the ability to solve a simple problem (a long story – see the end of this article). When you slide your passport into an automated gate, it compares your face with the one stored on the chip in your passport. This has literally eliminated the need for thousands of border control agents. Why? It is accurate. Fingerprinting will be introduced across the EU this year, again using AI. The same can be done for documentation.
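
The core of that 1:1 check at an e-gate is, in essence, an embedding comparison. A minimal sketch, with the face-recognition model itself left abstract (real gates add liveness detection and carefully tuned thresholds):

```python
# Minimal sketch of the 1:1 verification an e-gate performs: compare an embedding
# of the live camera image with an embedding of the photo on the passport chip.
# The embedding network is left abstract; the threshold is illustrative only.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def faces_match(live_embedding: np.ndarray,
                chip_embedding: np.ndarray,
                threshold: float = 0.6) -> bool:
    # Operational systems tune the threshold to balance false accepts
    # against false rejects; 0.6 is just a placeholder.
    return cosine_similarity(live_embedding, chip_embedding) >= threshold
```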

There is a very long list of other uses, including crime analysis and investigation, forensic analysis, traffic management, drone surveillance, cybercrime detection and social media monitoring. I could go on but you get the idea. AI is already deeply embedded in crime deterrence and detection.

Of course, AI may create its own problems with scams and deepfakes. This will undoubtedly happen. My own view is that this is less of a threat than people think. Deepfakes are usually moderated out on social media by AI, as it polices itself. Yet audio and video are increasingly used to scam people, making the scammers seem authentic. At this level the police need training on the topic.

One of the great things about these events is the concrete projects – real projects used in training that have been applied or are underway. I have seen a range of projects that were stunningly specific and useful, in forensics and in the general training of police officers.

Productivity

You walk away from such events knowing that AI could deliver massive productivity gains, especially as policing is a text-laden process – the bit no one likes, the paperwork. Using AI to create, improve and simply do administrative tasks, not just faster but better, would be straightforward.

Transcription

Transcriptions alone could save millions. Throughout the police investigation process, and in courts, statements are taken and proceedings recorded. This is a massive opportunity for automated transcriptions. 
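
To give a sense of how little is now involved, here is a minimal sketch using the open-source Whisper model (this assumes the openai-whisper package and a placeholder audio file; forces would obviously run this on their own secure infrastructure):

```python
# Minimal transcription sketch using the open-source Whisper model
# (pip install openai-whisper). "interview.mp3" is a placeholder file name.
import whisper

model = whisper.load_model("base")            # small, CPU-friendly model
result = model.transcribe("interview.mp3")    # returns full text plus timed segments

print(result["text"])                         # the whole transcript

for seg in result["segments"]:                # timestamps matter for evidence
    print(f'{seg["start"]:8.1f}s  {seg["text"]}')
```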

Translation

Translation in police stations and out on the streets is another. Using human translators is expensive and logistically difficult. Real-time translation to and from a massive range of languages is now possible.

Training

But it is in training where they have most to gain. These are people with increasingly complex and difficult jobs who could do with all the help they deserve. From learner engagement through learner support, content creation, personalised learning, feedback, formative and summative assessment, along with performance support, almost every aspect of training could use AI.

The police have a tough job that requires a LOT of training. They have to deal with aggression, violence, abuse, mental illness, drugs, alcohol, murder, even death. This requires an astonishing array of knowledge and skills. 

Simulations and scenarios

AI could help alleviate that problem with a focus on scenario-based learning, using AI to design and build lots of good dialogue-based scenarios. This is the real interface between the police and the public. I’m told that new recruits are often ill-prepared for the situations they find themselves in, unable to talk things down, too ready to reach for the pepper spray. This type of training can be done well through lots of exposure to scenarios that give pre-training before you hit the streets.

Performance support

I had several interesting conversations afterwards around the use of simulations for driver training, 3D mixed reality projects using VR in forensics, and the possibility of AI improving administrative productivity. The one topic I felt was most interesting was the idea that AI can be used for performance support. Policing is all about being out there, doing things in the real world, difficult things. It needs a wide array of skills, a fundamental and accurate knowledge of the law, high-level interpersonal skills (especially de-escalation), physical handling skills, high-level driving skills, communication skills, medical skills… I could go on but you get the idea.

The one thing that is missing in the current model is performance support for training out there – in police stations, in cars, wherever. There can be no doubt that most police officers and back-office staff learn a lot on the job from colleagues and more experienced staff. This seems like the perfect context for an AI-driven performance support system. It could deliver, for example, usable advice whenever needed in the field – on the law, processes, procedures and so on – as real checklists, job aids and support.

Federal problem

One of the problems the police face is the federal and fragmented nature of their organisation. The United Kingdom has a total of 45 territorial police forces. This includes 43 forces in England and Wales, the Police Service of Northern Ireland (PSNI), and Police Scotland. Additionally, there are several special police forces that operate across the UK, such as the British Transport Police, the Ministry of Defence Police, and the Civil Nuclear Constabulary, among others. However, these special forces have specific jurisdictional responsibilities rather than geographic ones. This makes communal and well-funded projects difficult. There is a real need for a mechanism for at least sharing projects, or for projects that can be centrally funded by all, then distributed back out, to save time and money.

Conclusion

I wish I had had more time with these people. They know about the need for good training. What they need is help in delivering that training more effectively, lifting themselves out of classroom PowerPoint into more realistic training that results in real transfer to the job on the street.

PS
My wife has been banned from travelling into the US since 2016. We were travelling to New Zealand via Los Angeles (LAX) and had simply to transfer aircraft. I got through as my passport was renewed. My wife had an Arabic stamp that the TSA guy, with all their typical arrogance and poor training, thought was dodgy. She explained that it was a Syrian stamp, from six years earlier, when we went on a holiday to Syria with our kids, before the war. He didn't believe her, and off she was marched to the back office, where she sat for ages before finally being interviewed by an equally obnoxious person. They couldn't read the date or month on the stamp because no one could speak or read Arabic! (A problem that could have been solved in two minutes by checking the numbers on Google.) She explained that this was before the war had started, but they were dismissive, did nothing to try to clarify the matter, and we were marched through the back of the airport, put on our Air New Zealand flight and told aggressively that she was NOT allowed to return. This cost us a fortune, as we had flights booked via Vancouver to San Francisco and back to London – all Business Class, all lost. We also had to book two new flights from Vancouver to London. It was like dealing with gun-toting idiots – all bravado, poor training, poor resources and even less common sense.

She has never been back to the US but it's a big world out there so, for her, it is no great loss.


Thursday, February 22, 2024

Rather than deepfakes, censorship and surreal ethical fakes are the problem?

Google have just shot themselves in the foot with their release of Gemini. Social and mainstream media have been flooded with pieces showing that its text and image generation behaves like some crazed activist teenager.

When asked to create images of a German soldier, it created black, Asian and female faces, so keen was it to be 'diverse'. I won't give other examples, but it almost looks as if the Gemini LLM is mocking its creators for being so stupid. It's as if the language model fed back to them the craziness of their own internal ideological echo chamber and culture.

This is what happens when idiotic guardrailing overrides common sense. We get the imposition of narrow moralising on reality, and reality loses. More than this, straight-up utility loses. The tool gets a reputation for being as flaky as its moralist moderators. This is not like the six-fingers problem, a weakness in the technology itself; it is the direct result of human interference.

What is Guardrailing?

Guardrailing is a complex business and needs to be carefully calibrated. So what is 'guardrailing'? Like road barriers, it attempts to provide safeguards and constraints that stop responses producing harmful, inappropriate or biased output. Those three words are important.

Harm

No one wants child porn, porn or actual content that causes real physical and 'extreme' psychological harm to be produced. This has long been a feature in ethics around the line that should be drawn within freedom of expression (not just speech). We need to err on the side of freedom of expression as that is enshrined in our democratic society as essential. That throws the definition back to the word ‘harm’, as defined legally, not by someone who ‘feels’ as though they have been harmed or think that harm is synonymous with offence.

Inappropriate

This is different from harm in that it depends on a broad social consensus around what is appropriate, a much harder line to draw. Should you allow swearing (as in some literature), nudity in any form (think classical art) and so on? Difficult, but not the real problem, as there is, largely, a social consensus around this.

Biased

This is a dangerous term, and where things can go way off balance. I use the word 'balance' as we cannot have a small group imposing their own personal views of what constitutes balance on these systems. This is clearly what happened in Google's case. In trying to eliminate what they saw as bias, they managed to impose their own extreme biases on the system.

How is it implemented?

It always starts with policy statements and guidelines. This is where things can go badly wrong, if the people writing the guidelines apply their own personal, ideological or activist beliefs – whether from the right, the left, wherever. This is a huge lever, as it affects everything. This is clearly where things went wrong at Google: a small number of moralisers imposed their own views on the system. They need to be removed from the company.

The guidelines are then implemented using content filters, often prompts directing output towards supposedly unbiased generation. The problem here is that if your guidelines are too constrained, you eat not just into freedom of speech but also into functionality. The system simply doesn't do things, like reply or create an image. Google have gone back to the filters to recalibrate.

Prompt modification means they take your prompt, then add other criteria, like 'diverse', 'inclusive' and other positively discriminatory descriptions. This is the most likely culprit in this case, as the outputs are so obviously and crazily inappropriate.

Moderation is another technique. This is a bad idea, as it is slow, expensive and subject to the vagaries of the moderators. You are far better automating and calibrating that automation, although there are several exceptions, such as porn, child porn, extreme violence and other actually harmful content.

You can also curate the training data. This is less of a problem, as the data is so large that it tends to eliminate extremes. Indeed, some of the problems created by Google have come from ignoring clear social norms that the training data would have produced. Apply a narrow definition of identity and you destroy realism.

There is also user testing. This is what has really surprised me. I know that Gemini was tested widely among Google employees before release, as I know people who did it. The problem could be that Google tends to employ a certain narrow demographic, so that testing is massively biased. This is almost certainly true in this case. Or, more likely, the image generation wasn't actually tested, or was tested only by their own weird 'ethics' team.

You can also put in user constraints that apply to single requests and/or contexts, such as mentioning famous names in image generation. That's fine.
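
To make the mechanics concrete, here is a toy sketch of two of the levers described above, a content filter on the user's request and prompt modification that bolts on criteria the user never asked for. The wording and the blocklist are invented purely for illustration.

```python
# Toy sketch of two guardrailing levers: a content filter applied to the user's
# prompt, and prompt modification that silently appends extra criteria before
# the model sees it. Blocklist and wording are invented for illustration.
BLOCKED_TERMS = {"example-harmful-term"}       # stand-in for a real policy list
APPENDED_CRITERIA = "Depict the subject in a diverse and inclusive way."

def content_filter(user_prompt: str) -> bool:
    """Return True if the prompt is allowed through."""
    lowered = user_prompt.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)

def modify_prompt(user_prompt: str) -> str:
    """Append criteria the user never asked for."""
    return f"{user_prompt}. {APPENDED_CRITERIA}"

def guarded_request(user_prompt: str) -> str:
    if not content_filter(user_prompt):
        return "[refused]"
    return modify_prompt(user_prompt)          # this is what the model receives
```

Run a request like 'a 1943 German soldier' through a pipeline of that shape and the appended criteria override the history in the prompt – which is exactly how the absurd images get generated.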

Conclusion

I warned about this happening three years ago in a full article, and again in a talk at the University of Leeds.

Who would have thought that, rather than political deepfakes being the problem, censorship and surreal ethical fakes by flaky moralists would flood social media? Guardrailing is necessary, and a good thing, but only when it reflects a broad social consensus, not when it is controlled by a few fruitcakes in a tech company. You can't please all of the people all of the time, but pleasing a small number at the expense of the majority is suicidal. Guardrails are essential; imprisoning content, or allowing the generation of massively biased content under the guise of activism, is not.

The good news is that this has happened. I mean that. In fact, it was bound to happen, as there was always going to be a clash between the moralisers and reality. We learn from our mistakes and Google will rethink and re-release. That's how this type of innovative software works. Sure, they seem hopeless at testing, as five minutes of actual testing would have revealed the problem. But we are where we are. The great news is that it knocks a lot of the bonkers ethical guardrailing into touch. 




Wednesday, February 21, 2024

Algorithms, optimisation and football

People sceptical of AI and algorithmic power should take note of my local football team, Brighton and Hove Albion, AKA the Seagulls. For a small town, we topped our group in the Europa League, are still in the FA Cup and, despite being decimated by injuries, sit 7th in the Premiership, above Newcastle and Chelsea. That last name is critical, as they have ripped out our manager, top staff and several players.

Tony Bloom

Having splashed out hundreds of millions, many of these teams will find it difficult to splash out even more on either our manager De Zerbi or any more of our players. But the secret sauce is not the manager but Tony Bloom, the owner. It is he who finds the managers and players – one of the few local, genuine supporter-owners in football. He made his fortune gambling, then as a gambling entrepreneur, and he heads a private betting syndicate known to have been phenomenally successful in sports betting.

He has been the Chair of the club for 15 years and has built a system of sophisticated data collection and algorithmic selection for scouting new players. It remains a secret, held by a separate company called Starlizard, so that no matter which scouting staff or manager comes, Bloom literally holds the key.

This focus on recruitment is what feeds a robust organisation that buys cheap and sells for top dollar: Caicedo cost £4.5 million and was sold for £115 million – to, you guessed it, Chelsea. His current roster has several players in that league, many young and therefore more valuable. They also play the sort of football that has become popular in top-flight leagues – playing out from the back, pulling the opposition towards you and breaking fast.

What lessons can we learn from him? 

Leaders matter but not in the way leadership books and courses would suggest

Bloom is whip smart, driven and very much behind the scenes. He is wholly strategic, not tactical. His talent is in understanding that even a complex organisation, in a stochastic sport like football, needs to be run on high-quality decision making. That means decisions based on data and optimisation, not charisma or hunch. Data and algorithms are in his DNA, not vague nostrums about Leadership.

Recruitment matters but not in the way you think

Recruitment is data-driven: a long list of data types is collected and fed into an algorithmic process that flags targets for acquisition. He is interested in pure performance, not values or vague criteria and personal qualities. Actual performance: match time, successful passes, tackles, turnovers, shots, goals – and much, much more. All of this is monitored.
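
Nobody outside the club knows Starlizard's actual model, but the general shape of such a scoring pipeline is easy to sketch. The metrics and weights below are invented for illustration.

```python
# Illustrative only: the general shape of a data-driven scouting score.
# Metrics and weights are invented; the real Starlizard model is secret.
PERFORMANCE_WEIGHTS = {
    "minutes_played": 0.1,
    "successful_passes_per_90": 0.2,
    "tackles_per_90": 0.2,
    "turnovers_per_90": -0.2,   # negative: losing the ball costs you
    "shots_per_90": 0.1,
    "goals_per_90": 0.4,
}

def player_score(stats: dict[str, float]) -> float:
    return sum(weight * stats.get(metric, 0.0)
               for metric, weight in PERFORMANCE_WEIGHTS.items())

def flag_targets(players: dict[str, dict[str, float]], top_n: int = 5):
    """Rank players by score and return the best candidates for acquisition."""
    ranked = sorted(players.items(), key=lambda kv: player_score(kv[1]), reverse=True)
    return ranked[:top_n]
```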

Deal making

Deals start with early contracts, and he makes sure they are long deals with good exit fees. His promise is clever: come to Brighton, we're in the best league in the world, the Premiership, and we can showcase you so that you can get into a top club anywhere else in the world. And when it comes to selling, he's a master. As a successful international poker player, he fully understands both the fiscal and psychological moves that have to be made.

Growth 

When he became Chairman in 2009, he hired Poyet and got promotion as Champions in 2011. After a series of managers, Chris Hughton got us to 3rd place in the Championship, then promotion the following season, in 2016-2017. It was then that Bloom started to be really active in the transfer market. Hughton was eventually replaced by Graham Potter, who took us to an all-time high of 9th in the Premiership, getting us into Europe. Chelsea (yes, them again) stole Potter, but Bloom made possibly his best hire yet, De Zerbi, applying all of his data analysis and algorithmic nous even to the managerial position. In other words, he understands gradual but steady growth.

He now has a huge war chest to invest over the summer, has made some fantastic signings, especially Barco, touted as a huge talent and stolen from beneath the noses of the big boys, and is ready to take things to the next level. This is poker at the highest level, a game of probabilities, tempered by maths, data and algorithmic decision-making. You never see him blowing off on TV, yet the people of Brighton love him, as he's humble, a real supporter, a self-made man, and has put his money where his mouth is. This is no lazy, wealthy Gulf prince or Russian oligarch. This is the real deal, a real Leader.

An interesting idea also emerges here, that organisations who get optimisation right will be winners, the rest the losers. This demands our attention as it is likely to happen. It means getting with the programme now, to understand the technology of optimisation.

Seagulls!

Football was the only ‘real’ sport in my culture, at school, in pubs wherever. We played nearly every night beneath the yellow street lamps, even in the rain, on odd shapes of grass on the edge of our scheme in Craigshill (known as Crazyhill). A speciality was bouncing the ball like a cushion shot in billiards off the wall to get past a player. Some of us went week in, week out to matches, in my case Glasgow Rangers, home and away – Scotland’s a small country so it was easy.

When I ended up in Brighton, as far away from Scotland as you can get, without getting wet, a colleague at work, Clive, was a fanatical Brighton supporter, so I started going to the Goldstone. I arrived the year after they had appeared in the FA Cup Final and this was a different atmosphere, players like Frank Worthington, even the occasional Scot like Doug ‘chipped from a block of ice’ Rougvie. It was fun. But then they lost their stadium, imploded, narrowly missing relegation to the Conference League in 1997. It was desperate.

Then, despite protestations from Sussex University, a local man and Brighton supporter made good, Tony Bloom, put £90 million on the table for a new stadium. We never looked back. After 34 years out of top flight football we climbed back up to the Premier League. The promotion parade on the seafront was fantastic. At that time we had ‘Skint’ on the shirts as Fat Boy Slim was a supporter and sponsor. The stadium sits, nicely nestled in the Downs and at the start of every game, there’s always a seagull or two circling high above the pitch, Seagulls! being the club’s standard chant. The crowd always, quaintly, kick off the match with a rousing ‘Sussex by the Sea’ a First World War marching song.

Occasionally, it hardly ever happens now, the opposition would sing some homophobic chant, like ‘You’re going down on each other, you’re going down on…” to the tune of ‘Guantanamera’, actually about a Cuban woman, but there you go. Our fans would respond with ‘You’re too ugly to be gay, you’re too ugly…” to the same tune. In truth it was all a bit banterish. People forget that this is sport born of the industrial need in the 19th century for workers to have some fun at the weekend, after a week of hard labour. Going to the football is always a bit of a laugh. The beautiful game is working class Britain’s gift to the world.

Anyway. After nearly 40 years in Brighton I'm a Brightonian now and, like many, spend my Saturdays eyeing my phone for the result. It is a feeling that comes over you on a Saturday, of excitement and expectation, watching the clock for kick-off time. It turns the day into a drama.

We’re playing brilliant football and despite London clubs run by Middle Eastern and Russian billionaires stealing our manager, support staff and players, we’re flying. 

I've been in all sorts of places around the globe this year, and often the first conversation I have in the taxi, restaurant or meeting is about 'Brighton… and Hove Albion'. People talk a lot about 'culture' these days, but those who have real culture don't use that word, they live it. Seeeagulls!

Sunday, February 18, 2024

I am become life, the creator of worlds?

Can words now create worlds? Has AI suddenly acquired God-like creation qualities?

A short video of two black pirate ships sailing in a stormy sea of coffee, within a coffee cup, has caused a splash by showing what can be done with video, but it has also created a stir by suggesting something astonishing. My son is a games player and AI expert. His immediate reaction on seeing the Sora videos was to note the slightly too-perfect, gamesy feel of the images.

Could OpenAI have developed something truly astonishing here – a 3D world simulator? Is this the converse of Oppenheimer’s famous statement, 

I am become life, the creator of worlds?

To date, generative AI has been limited to text and images and has lacked a model of the world – of 3D space, time, causality and action.


This video, generated from a prompt, is astounding in itself, but what it may reveal is the following:

The possibility, in future, of a physics engine that understands how objects behave, in this case the two pirate ships that never collide. That they do not collide is relevant, as the model must track the positions of the two boats at all times in a 3D space. It is physics that grounds models in reality. Note that the way this works is not by having a physics and collision engine; it is only that the training data, from computer games, will have been created using such tools.

The behaviour of the ships on the water suggests possible future detailed knowledge of fluid dynamics, as the coffee whirls around and even creates waves and froth. Again, this is not being created from the mathematics of fluid dynamics but by a clever diffusion model.

The cup size and the limitations of the cup space show a knowledge of small objects and the ability to scale two very large objects down into a small space.

The sharp realism, with correct lighting and shadows, is also astonishing. This is not a rendering engine but, again, a trained diffusion model.

There are suggestions that this could have been trained using data from Unreal, the games engine – in particular, synthetic data from that engine. YouTube and other sources are also clearly in there. This means it is trained on a combination of real and virtually created worlds. There also seems to be a time component, which is interesting, as that variable is missing in other modes.
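
The phrase 'diffusion model' is carrying a lot of weight in the points above, so here is a heavily simplified sketch of the DDPM-style sampling loop at the heart of such models: start from pure noise and repeatedly subtract the noise a trained network predicts. The network is left as a stand-in; nothing about Sora's actual architecture or noise schedule is public.

```python
# Heavily simplified sketch of DDPM-style sampling: start from Gaussian noise
# and iteratively remove the noise predicted by a trained network. The network
# here is a stand-in; Sora's actual architecture and schedule are not public.
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)       # noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def predict_noise(x, t):
    """Stand-in for the trained noise-prediction network eps_theta(x_t, t)."""
    return np.zeros_like(x)               # placeholder so the sketch runs

def sample(shape):
    x = np.random.randn(*shape)            # x_T: pure Gaussian noise
    for t in reversed(range(T)):
        eps = predict_noise(x, t)
        coef = (1 - alphas[t]) / np.sqrt(1 - alpha_bars[t])
        x = (x - coef * eps) / np.sqrt(alphas[t])
        if t > 0:                           # add fresh noise except at the final step
            x += np.sqrt(betas[t]) * np.random.randn(*shape)
    return x                                # a generated sample (a frame, a latent, ...)
```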

If they have created such a thing, this is far more than just video creation. It is a step towards the idea of creating 3D worlds using AI, something I mentioned in my book on Learning in the Metaverse. Being able to create any 3D world is a far bigger deal than video, as it opens the way for another revolution in media and learning. We are nowhere near that yet.

In truth, there are two opposing routes to solving this problem, and both were released this week: OpenAI/Sora v Meta/V-JEPA. OpenAI has developed Sora, recognised for its text-to-video and video-to-video modelling capabilities, aiming ultimately to create a world simulator. However, Meta's AI chief, Yann LeCun, criticises this method, considering it impractical and likely to be unsuccessful. He contends that generative models are not suitable for processing sensory inputs due to the high level of prediction uncertainty associated with high-dimensional continuous sensory data.
In response, LeCun has introduced his own AI model, V-JEPA. This model utilises a non-generative approach and is designed to predict and interpret complex interactions. Its primary function is to understand the dynamics of objects and interactions, thereby enhancing the AI's comprehension of these elements.

We are 3D people, living in a 3D world, doing 3D things with other 3D people and 3D things. Yet, bizarrely, most teaching and training is from the flat, 2D page – text, books, graphics, PowerPoints and screens of e-learning. This has always been largely suboptimal and prevents the actual learning of skills and their transfer.

In the beginning was the word, and now we, like small gods, can use the word to create new worlds. We are in dialogue with the world to create new ones. The simple act of saying something can make it appear, breathing life into that world. I find that more than interesting; it is staggering.

We may have, in this tool, the ability to create worlds, any world, on any scale, in 3D, by simply asking. If so, a threshold has been crossed. We will be able to create worlds in which we work, interact and get things done; worlds in which we teach, train and learn; even worlds in which we socialise and are entertained. We may be doing what has only ever been done on a limited scale in incredibly expensive simulators and computer games – understanding and creating new worlds. Multimodal may now mean a grand convergence.



Friday, February 16, 2024

Sora - as Producer, Director, Screenwriter, Cinematographer, Casting Director, Costume designer and actors

Sora, from OpenAI, may go down in the history of movies, or moving pictures, as a pivot point. It is as significant as that filmed train thundering into La Ciotat, scaring the theatre audience, or The Jazz Singer, the first feature-length talking film.

I’ve been involved in making a feature film, way back in the 1990s; at Epic, we made ‘The Killer Tongue’. It was a schlock-horror in which a woman killed men who had wronged her with her tongue. It was all very fraught… and expensive, and we lost an eye-watering amount of money. Lesson learnt. One amusing moment was when Quentin Tarantino said we “had the best film poster at Cannes”… we dropped the word ‘poster’, as you do… in Hollywood! Didn’t work – the film bombed.

Film making has just been turned on its head, no longer requiring huge investments in production. All the components seem to be heading towards very low cost – everything – sound, lighting, worlds, people, action. A bit like painting and photography, only faster.

Sora is just so powerful. Even on its first release the shorts were stunning: the movement, lighting and reflections. There’s a moment when what looks like a Japanese woman walks down a street and turns to camera, and the whole scene is reflected in her glasses. Where the movie camera should be, it isn’t; there is no camera, as it has been replaced by a text prompt. In another, two pirate ships heave around in a black sea of coffee, prompted by the simple words “Photorealistic closeup video of two pirate ships battling each other as they sail inside a cup of coffee.” No prompting course needed.

This puts movie making in the hands of creative people who can dispense with the very high production costs of a crew, set and, at some point, perhaps actors. It may even do a good job of editing. AI is already doing colour balancing, volume setting across cuts and many other functions; it has just jumped into the Director’s seat, into a creative role.

There will be those who will baulk at this, in the same way people baulked when photography challenged painting, printmaking challenged original works, and CGI arrived in film. But this is different, as it is not technology scaling or becoming a new medium; it is technology as a creative agent. This is technology as Producer, Director, Screenwriter, Cinematographer, Casting Director, Costume Designer and actors. This is technology as movie-maker. It democratises movie-making. Some will recoil at that, especially those who make money from scarcity, the Hollywood moguls and their crew. It will reshape film making; in what way no one can be entirely sure, but things change, life changes, art changes. It is the very definition of art.

Ever since the concept of the ‘arts’ and the ‘artist’ arose in the Romantic movement in late 18th-century Germany, steeped in German idealism, and the concept of the individual artist was imported by Coleridge and taken up with enthusiasm by everyone in the arts to this day, we have worshipped the artist as an individual. Well, here we are, he/she has arrived: we now have that very concept in film. The AI auteur has arrived.

Going back to that moment when the woman turns to camera and there is no reflected camera: because film making is now free from the physical constraints of cameras, sets and costs, we may see a Renaissance of film making. Free from the tyranny of actual optics and physics, anything is possible. On the other hand, if you want realism, the actual realism of historical settings may be far easier to achieve, with more authentic props, clothes, items, weapons and so on, than was ever possible before.

If you want extinct animals, you can have them, thundering across a snow-covered landscape, without the cost of going there. CGI is now well and truly yours.

An animated character that is the product of your imagination, just tell it what you want in a prompt.

A fast moving car chase, from a helicopter or drone, with good lighting through a dirt landscape.

A busy street scene with a Chinese Dragon and large crowd moving towards you.


New Genres

We may see new genres emerge, certainly a widening of participation and of what can be done in moving images, as anything is now possible. My own view is that this will combine with other forms of AI and the shift from 2D to 3D, to create film experiences in which we participate, either as agents within the story or through our avatars as participants in it. Movies and games will combine to form a new medium.




Thursday, February 15, 2024

Sora and Gemini 1.5 - two mind blowing releases within hours of each other


No sooner had I written about how important ‘context windows’ are for using AI for teaching and learning than, within 24 hours, Google announced their next release, Gemini 1.5 Pro, which blows the whole market open – and guess what the great innovation is? A MASSIVE INCREASE IN THE ‘CONTEXT WINDOW’.

Then, within a few hours, another announcement – OpenAI’s release of Sora, an absolutely INSANE text-to-video model. It creates real and novel scenes just from text descriptions. This is a flip moment, as we all thought this was years off... implications for learning – huge... crazy good videos, lighting and movement. Not only that, we see something interesting way out on the horizon. The whole Hollywood, Netflix thing is now up for grabs. Social media may well become the new source of entertainment and art.

The context window, what the model can ingest, has just gone through the roof, in fact several roofs. They plan to start with the standard 128,000-token context window, then scale up to 1 million tokens as they improve the model. This means it can take in huge amounts of tokens and is multimodal. It eats whole books for breakfast, along with collections of documents, full movies and whole series of podcasts.

The examples are compelling, so here are just a few sets. I could have given tons more…

Text

It can ingest giant novels then find exactly what you need. They took Victor Hugo’s five-volume novel “Les Misérables”, which is an astonishing 1382 pages, sketched a scene and asked “Look at the event in this drawing. What page is this on?” Got it right.

The opportunities in learning are many:

1. Summarising any document, no matter how long

2. Finding something within an enormous text file (a minimal sketch follows this list)

3. Huge sets of HR documentation turned into an accessible resource

4. Use by a tutorbot to answer student queries and questions

5. Feedback and marking of text assessments
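To make this concrete, here is a minimal sketch of the long-document pattern behind opportunities 1 and 2 above, assuming the google-generativeai Python client and a long-context model name such as "gemini-1.5-pro-latest"; the model name, file name and API key are illustrative assumptions, not details from Google's announcement.

```python
# Minimal sketch, not production code: feed a very long document into a
# long-context model and ask it to summarise or locate something in it.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")                  # placeholder key
model = genai.GenerativeModel("gemini-1.5-pro-latest")   # assumed model name

with open("les_miserables.txt", encoding="utf-8") as f:  # any long text
    book = f.read()

prompt = (
    "Here is a complete novel. Answer using only this text.\n\n"
    f"{book}\n\n"
    "Question: Summarise the chapter in which the barricade falls, "
    "and quote its opening sentence."
)

response = model.generate_content(prompt)
print(response.text)
```

Whether a whole novel fits in a single call depends on the model's token limit and on cost, which is exactly the context window issue discussed in the next post.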

Audio

I have recorded a large series of 30 podcasts on Great Minds on Learning. They’re an hour each and the initial tests suggest these could be ingested and used for learning.

The opportunities in learning are again many:

1. Summarising tons of podcasts

2. Finding specific chunks of podcasts to answer a query

3. Interpreting communication skills

4. Feedback and marking of spoken assessments

Video

It gobbles up entire movies and you can ask questions about what happened in those movies. The Buster Keaton example interprets a pawn ticket taken from someone’s pocket. The model can answer complex questions about the video content, or even from a primitive line drawing.

The opportunities in learning are once again many:

1. Allowing search of video for performance support on a specific task then playing it back

2. Allowing the learner to ask for more detail on a specific event or task

3. Looking for a specific solution to a specific problem 

4. Interpreting a trainee’s performance from video, identifying successes and failures, with feedback on correcting and improving performance

5. Taking a lecture and annotating it with extra resources

6. Turning any video into a deeper learning experience

7. Interpreting video assessments where content & behaviour matters

Conclusion

These two releases alone will have a huge impact on learning. They bring video generation PLUS AI ingestion and interpretation of video into play. But we have to be careful. Video is an odd medium for learning. We tend to think it more powerful than it is; that is because of the transience effect. I covered this in detail in my book Learning Experience Design. This is NOT about the generation of media but about the generation of learning.


Wednesday, February 14, 2024

AI gets massive memory upgrade - implications for AI in learning


Human memory

A strong feature of intelligence is memory. In humans this is complex, with several different systems interacting – sensory, episodic, semantic – along with encoding and retrieval mechanisms. It is not as if human memory is even that good. Our sensory memory is severely limited in range and timescale. Working memory is down at three or four manipulable things within a limited timescale. Long-term memory is fallible and degrades over time, sometimes catastrophically, with dementia and Alzheimer’s. The brain could accurately be described as a forgetting machine, shown by the fact that we forget most of what we try to learn.

AI memory upgrade

The good news is that Gemini and ChatGPT have both had a memory upgrade, although Gemini’s is massive. This is really important as, especially in learning applications, knowing what the learner has said previously does matter. This is not only a context window upgrade – that has been happening for some time – it is also persistence of memory: what it remembers and what control you have over its memory.

First, it will eventually be able to remember who you are and things about you that matter for learning, such as first language, age, existing skill sets, diagnosed learning difficulties such as dyslexia, and past exchanges. Pre-existing knowledge is the big one. One can get this done up front by feeding it personal data, or the system can ‘keep in mind’ what you’ve been telling it or what it can infer. You can also harvest data from formative assessment. This can reduce redundant exchanges and increase the efficacy, speed and quality of teaching and learning using AI tutors.

You will also be able to choose from a suite of privacy controls, effectively managing memory, or what ChatGPT remembers. For example, you may want it to remember a lot for the purposes of a long learning experience or just have a throwaway chat.
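To make the idea of persistent learner memory concrete, here is a minimal sketch, with entirely hypothetical file names and fields, of a tutor that stores a small learner profile between sessions, turns it into tutoring instructions, and uses a single flag to stand in for the privacy controls described above.

```python
# Minimal sketch of persistent learner memory for an AI tutor.
# The file name, profile fields and prompt wording are illustrative assumptions.
import json
from pathlib import Path

PROFILE_PATH = Path("learner_profile.json")  # hypothetical local store

def load_profile() -> dict:
    """Load the learner profile from disk, or start a fresh one."""
    if PROFILE_PATH.exists():
        return json.loads(PROFILE_PATH.read_text())
    return {"first_language": None, "skills": [], "needs": [], "history": []}

def save_profile(profile: dict, remember: bool = True) -> None:
    """Persist the profile unless the learner has chosen a throwaway chat."""
    if remember:
        PROFILE_PATH.write_text(json.dumps(profile, indent=2))

def build_system_prompt(profile: dict) -> str:
    """Turn what is known about the learner into tutoring instructions."""
    return (
        "You are a patient tutor. "
        f"The learner's first language is {profile['first_language'] or 'unknown'}. "
        f"Known skills: {', '.join(profile['skills']) or 'none recorded'}. "
        f"Support needs: {', '.join(profile['needs']) or 'none recorded'}. "
        "Avoid re-explaining things already marked as known."
    )

profile = load_profile()
profile["first_language"] = "Spanish"                    # gathered up front
profile["needs"].append("dyslexia-friendly materials")   # declared or inferred
profile["history"].append("Formative quiz on fractions: 7/10")
print(build_system_prompt(profile))
save_profile(profile, remember=True)  # remember=False for a throwaway chat
```

The point is not the code but the separation it illustrates: what is remembered, where it lives and who controls it.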

Human v Generative AI memory

Both human memory and generative AI involve storing and retrieving information. In human memory, this process is biological and involves complex neural networks. In generative AI, information is stored digitally and retrieved through different forms of neural networks on a different substrate.

We are similar but different. For example, just as we humans recognise patterns based on past experiences stored in our memory, generative AI models recognise patterns in the data they have been trained on. This ability is crucial for tasks like image recognition, language translation and generating coherent text in dialogue, as well as the generation of images, audio and video.

Just as humans learn and adapt based on their memories and experiences, generative AI models learn from the data they are exposed to. This learning process is what enables these models to generate new content that is similar in style or content to their training data, as well as being trained by humans. Newer models, used in automated cars, for example, take video feeds showing what a driver would see over millions of miles of driving to improve performance.

Both human memory and generative AI can generalise from past experiences to new situations. Humans use their memories to apply learned concepts to new scenarios, while generative AI uses its training to generate new outputs that it has never explicitly seen before. Human memory is also associative, meaning that one memory can trigger related memories. Generative AI models can mimic this by generating content based on associations learned from their training data. Both human memory and generative AI adapt and modify their responses or outputs over time, albeit differently. Humans learn from new experiences, while AI models can be retrained or fine-tuned with new data to change their outputs. The first is actually quite haphazard, the second more difficult but defined.

Of course, just as human memory is not a perfect record and can change over time, generative AI also does not produce perfect replicas of its training data. Instead, it creates approximations that can sometimes include errors or novel creations. An interesting aspect of this flexibility, even fallibility of memory, is that just as human creativity is deeply linked to our experiences and memories, generative AI can also 'create' new content.

Context window

One concept fundamental to AI memory is the ‘context window’, the amount of text the model can consider at one time when generating a response in dialogue. It is the maximum span of recent input - words, characters or tokens - that the model can reference while generating output, like our working or short-term memory.

The size of the context window depends on the model. Early versions of GPT had small context windows: GPT-2 had a window of 1,024 tokens and GPT-3 around 2,000, while newer versions such as GPT-3.5 and GPT-4 moved to around 4,000, then 8,000, 32,000 and now much more. The size of this window impacts how much previous text the model can 'remember' and use to inform its responses.
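To see what these numbers mean in practice, here is a minimal sketch using the tiktoken tokeniser to check whether a document fits inside a given window; the window size and file name are illustrative assumptions, and "cl100k_base" is the encoding used by recent OpenAI models (other models tokenise differently).

```python
# Minimal sketch: count tokens in a document and compare against a context window.
import tiktoken

CONTEXT_WINDOW = 4_000  # illustrative limit, in tokens

enc = tiktoken.get_encoding("cl100k_base")
with open("course_notes.txt", encoding="utf-8") as f:  # any document
    text = f.read()

n_tokens = len(enc.encode(text))
print(f"{n_tokens} tokens; fits in window: {n_tokens <= CONTEXT_WINDOW}")
```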

This matters because if the input exceeds the model's context window, the model may lose track of earlier parts of the conversation or text. Conversely, a larger context window allows the model to maintain longer conversations or understand longer documents, providing more relevant and coherent responses. However, here’s the downside: processing longer context windows requires more computational power and memory, and may also affect accuracy and the quality of the response. Large context windows in Claude led to poorer performance.

All of this matters in practical applications, especially in teaching and learning, as the context window affects tasks like conversation, content generation and text completion. For example, a larger context window allows the model to reference earlier parts of the conversation, making it more effective at maintaining contextually relevant and coherent discussions, obviously useful in teaching and learning, for both the machine tutor and the learner. There are techniques one can use to mitigate these limitations, such as a ‘rolling window’ or ‘summarisation’ of previous content, but it is still a problem (a minimal sketch of the rolling-window idea follows below). However, this is similar to the problem human teachers face when trying to remember where different students are, using their own very limited working and long-term memories.
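The rolling-window and summarisation mitigations mentioned above can be sketched very simply: keep the most recent turns verbatim and compress everything older into a short summary. This is a toy illustration; the summarise() function is a placeholder that in practice would call a model or a cheap extractive summariser.

```python
# Minimal sketch of the rolling-window-plus-summary mitigation for small
# context windows. summarise() is a placeholder assumption.
from collections import deque

MAX_RECENT_TURNS = 6  # keep this many turns verbatim

def summarise(old_turns: list[str]) -> str:
    """Placeholder: compress evicted turns into one line."""
    return "Earlier discussion covered: " + "; ".join(t[:40] for t in old_turns)

recent: deque[str] = deque(maxlen=MAX_RECENT_TURNS)
archive: list[str] = []

def add_turn(turn: str) -> None:
    """Add a new conversational turn, archiving whatever falls out of the window."""
    if len(recent) == MAX_RECENT_TURNS:
        archive.append(recent[0])  # the turn about to be evicted
    recent.append(turn)

def build_context() -> str:
    """Summary of older turns plus the recent turns verbatim."""
    parts = [summarise(archive)] if archive else []
    parts.extend(recent)
    return "\n".join(parts)

for i in range(10):
    add_turn(f"Turn {i}: learner asks about topic {i}")
print(build_context())
```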

Cost

One major issue is cost. You can expand the context window, but the costs are very high, which supports the case for RAG alternatives.
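Retrieval augmented generation (RAG) avoids paying for an enormous context window by retrieving only the few chunks that matter and prompting with those. A minimal sketch, where embed() is a deliberately toy placeholder for a real embedding model:

```python
# Minimal RAG sketch: retrieve the most relevant chunks before prompting,
# rather than stuffing everything into a huge (expensive) context window.
# embed() is a toy placeholder, not a real embedding model.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy embedding: a normalised bag-of-letters vector."""
    vec = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query by cosine similarity."""
    q = embed(query)
    scores = [float(np.dot(q, embed(c))) for c in chunks]
    top = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in top]

chunks = [
    "Working memory holds only a few items at a time.",
    "Spaced practice improves long-term retention.",
    "Context windows limit how much text a model can attend to at once.",
]
context = "\n".join(retrieve("How much can a model read at once?", chunks))
prompt = f"Answer using this context only:\n{context}\n\nQuestion: ..."
print(prompt)
```

The saving comes from sending a few hundred tokens of retrieved context rather than the whole corpus on every call.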

Conclusion

Generative AI has a long history, from Hebb onwards, of mimicking the human brain, either directly or metaphorically. This is especially true of learning (a common word in AI) and the way neural networks evolved and work. They are not the same, indeed they are very different, but in both cases – the human and the machine, and humans learning from the machine – memory really matters in teaching and learning.

In one sense learning theory is memory theory, if you define learning as a relatively permanent change in long-term memory, which is a pretty good, but still partial, definition. It is a constant battle with forgetting. Keep in mind, or in your memory, however, that despite these similarities, human memory and generative AI operate on fundamentally different principles and mechanisms. Human memory is a complex, biological and messy process, deeply intertwined with consciousness and emotions, while generative AI is a technological process governed by algorithms and data. Oddly, and maybe counterintuitively, the latter approach may result in better actual performance in teaching and learning, even generally. I think this type of informed input from learning science will really improve AI tutor systems. To be fair, simply increasing context windows and functionality will most likely have the same effect.