AI in our workplaces:
Hype and reality (2026)

The impressive capabilities of "large language models" such as ChatGPT, Claude and Gemini have sparked huge interest and promise a deep, potentially disruptive impact on many sectors of society. In virtually every organisation, the board and top management are pressuring everyone to use so-called "AI", usually without specifying exactly how and without addressing the deep, still unsolved reliability issues of this technology.

MIT: The GenAI Divide: State of AI in Business 2025
Carnegie Mellon, Duke: TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
- Leaderboard
Salesforce: CRMArena-Pro: Holistic Assessment of LLM Agents Across Diverse Business Scenarios and Interactions
Princeton: Towards a Science of AI Agent Reliability
- Paper Towards a Science of AI Agent Reliability
- Interactive result dashboard HAL Reliability Dashboard
Meta's Head of AI Safety Just Made a Mistake That May Cause You a Certain Amount of Alarm
- Summer Yue on X: "... I had to RUN to my Mac mini like I was defusing a bomb."
Cybersecurity: Agents of Chaos
- Interactive dashboard
Google NotebookLM
- The link above provides some examples of ready-to-use public notebooks.
- This is a public notebook for the students of my Computer Networks and Principles of Cybersecurity course. The content is in Italian, but you can chat in English. The "Readme" notes describe how to run a self-assessment on your own (the notes are in Italian; you can translate them automatically to get an idea).
