Analysis of AI Scaling. Leaked reports citing OpenAI and Google researchers suggest that their newest models are not much better than the latest released versions, pointing to potential limits of the AI scaling laws.
Expert Advice on AI Careers. Learn from smart people but think independently; build a broad base of technical and common-sense skills.
Recommendation for Brevity. More and shorter posts might be helpful, since long content takes work and time to digest.
Content Length Feedback. Several comments have been received regarding articles being very long and detailed.
Energy Bottleneck. Among the many bottlenecks for AI data centers, energy might be the most important and difficult to address.
Hallucinations in LLMs. Large language models are susceptible to hallucinations, and distinguishing types of hallucinations is crucial for mitigation.
Nuclear Energy Agreement. Google signed the world's first corporate agreement to purchase nuclear energy from multiple small modular reactors to be developed by Kairos Power.
Community Engagement. We started an AI Made Simple Subreddit to foster community interaction.
Support Model. We follow a 'pay what you can' model, which allows you to support within your means.
Content Variety. The focus will be on AI and Tech, but the ideas might range from business, philosophy, ethics, and much more.
Gemini Model Success. Google's latest AI model, Gemini-Exp-1114, has topped the LMArena Chatbot Arena leaderboard, surpassing OpenAI's GPT-4o and its o1-preview reasoning model.
Investment in xAI. Elon Musk's artificial intelligence company, xAI, is reportedly raising up to $6 billion at a $50 billion valuation to acquire 100,000 Nvidia chips for a new supercomputer in Memphis.
AI Music Advancements. Suno V4 introduces significant advancements in AI music generation, including improved audio quality, dynamic song structures, and innovative features like the ReMi lyrics assistant.
Figure 02 Performance. Figure AI's humanoid robot, Figure 02, has achieved a 400% increase in speed and a sevenfold improvement in success rate on BMW's production line.
Nvidia Chip Challenges. Nvidia's Blackwell GPUs, initially delayed due to overheating issues in server racks, may have had the problem resolved, but the challenge of managing energy and heat in AI data centers remains significant.
Denmark AI Framework. Denmark's new framework, supported by Microsoft, provides guidelines for EU member states to responsibly implement AI in compliance with the EU's AI Act.
OpenAI Policy Blueprint. OpenAI's policy blueprint envisions a significant role for the U.S. government in AI development, emphasizing infrastructure, energy systems, and economic zones to boost productivity and counter China's influence.
Job Market Changes. Generative AI tools like ChatGPT are rapidly reducing job opportunities in automation-prone fields, but those who adapt by acquiring AI skills may find new opportunities in the evolving job market.
AI in Healthcare. ChatGPT-4 outperformed doctors in diagnosing medical conditions, highlighting both the chatbot's superior accuracy and the potential overconfidence of doctors in their own diagnoses.
Productivity with Windsurf. Windsurf Editor by Codeium integrates AI collaboration and autonomous task-handling to create a seamless development experience, enhancing productivity through its innovative Cascade feature.
Mistral Competition. French startup Mistral has launched Pixtral Large, a 124-billion-parameter model, and upgraded its chatbot, Le Chat, to compete directly with OpenAI's ChatGPT.
Agents Outlook. I believe that Agents will lead to the next major breakthrough in AI.
Gemini Critique. Gemini is tediously overengineered: it's trying to balance MoE with both mid- and post-generation alignment.
Orchestrator Choice. If your primary purpose with the LLM is to have it be the engine that ensures that everything works, then GPT-4o seems like the best choice.
Vendor Skepticism. I think they're a very sketchy company, and they consistently hide important information.
Independent Research. I'd strongly recommend doing your own research.
Precision Benefits. Precision improves the reproducibility of your experiments, which makes your system more predictable.
Gemini Disappointment. I've been bullish on Gemini/Google AI for a while now, but they have found new ways to constantly let me down.
o1 as Orchestrator. As an orchestrator, I have nothing positive to say about o1.
Claude Assessment. Claude is very good at decompositions but lacks stability and has an annoying tendency not to follow my instructions.
GPT-4o Preference. GPT-4o is by far my favorite LLM for the orchestration layer.
Orchestrator Framework. The 'LLM as an orchestrator' is my favorite framework/thinking pattern in building Agentic Systems.
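To make the orchestration pattern concrete, here is a minimal Python sketch. Everything in it is illustrative: `TOOLS`, `ROUTER_PROMPT`, and `call_llm` are hypothetical stand-ins for your own tool registry and chat-completion client, not any particular vendor's API.

```python
import json

# Hypothetical tool registry: the orchestrator LLM only decides WHICH tool
# to call; the tools themselves do the deterministic work.
TOOLS = {
    "search_docs": lambda q: f"searching docs for: {q}",
    "run_analysis": lambda q: f"running analysis on: {q}",
    "draft_email": lambda q: f"drafting email about: {q}",
}

ROUTER_PROMPT = (
    "You are an orchestrator. Given the user query, reply with JSON: "
    '{{"tool": <one of {tools}>, "input": <string>}}'
)

def orchestrate(user_query: str, call_llm) -> str:
    """Route a query to a tool, using an LLM purely as the decision layer.

    `call_llm` is a placeholder for whatever chat-completion client you
    prefer (the author favors GPT-4o here); it takes a prompt string and
    returns a string.
    """
    prompt = ROUTER_PROMPT.format(tools=list(TOOLS)) + f"\nUser: {user_query}"
    decision = json.loads(call_llm(prompt))   # assumes the LLM returns JSON
    tool = TOOLS.get(decision.get("tool", ""))
    if tool is None:
        return "No suitable tool found."      # fail closed, don't guess
    return tool(decision["input"])
```

Keeping the LLM at the routing layer, with deterministic tools underneath, is what makes the point above about precision and reproducibility actionable.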
AI Poetry Study. A new AI study claims that ChatGPT can write poetry that is 'indistinguishable' from William Shakespeare.
Skeptical View. Gary Marcus hopes he has taught you by now to never trust the hype.
Critique Link. Davis's full critique can be found here.
Appendix Highlight. Stay for the Appendix, entitled 'Particularly terrible lines'.
AI Imitation. The AI poems seem like imitations that might have been produced by a supremely untalented poet who had never read any of the poems he was tasked with imitating.
Davis's Critique. Ernest Davis had a careful look at the study, methods, materials, etc., and not just the headline.
Saudi AI Initiative. Saudi Arabia plans a $100 billion AI initiative aiming to rival UAE's tech hub, highlighting the region's escalating AI investments.
Anthropic Collaboration. Anthropic collaborates with Palantir and AWS to integrate Claude into defense environments, marking a significant policy shift for the company.
US Sanctions Challenge. U.S. penalties on GlobalFoundries for violating sanctions against SMIC underline ongoing challenges in enforcing AI-chip export controls.
OpenAI's Acquisition. OpenAI's acquisition of chat.com and internal shifts signal significant strategy pivots and challenges with model scaling and security.
Systematic Issues. The scale-first mentality is a systemic issue, and needs to be addressed at the root.
Research Culture Influence. Scaling reliably improves benchmarks, which makes it a very good match for an academic/research environment where publications are a must.
Performance Metrics Issue. Emergent abilities are created by the researcher’s choice of metrics, not fundamental changes in model family behavior on specific tasks with scale.
Funding Justifications. Big scaling projects are easy to explain to funders.
Market Share Strategy. The hope is that scaling now will build market share for the future (and prevent competitors from taking it in the future).
Corporate Research Benefits. Scaling is a very attractive option for corporate research because it is everything that middle management dreams about: reliable, easy to account for, non-disruptive, and impersonal.
Public Interest Concerns. The accompanying research dominance should be a worry for policy-makers around the world because it means that public interest alternatives for important AI tools may become increasingly scarce.
Scaling Dominance. Scaling wins because it perfectly fits how big organizations (especially corporate research) operate.
Research Incentives. The incentive/compensation structure for Researchers drives them to pursue scaling.
Scaling Challenges. Leading LLM labs have allegedly been struggling to move forward, with reports that both OpenAI and Google are having trouble pushing their models, GPT and Gemini, to the next level.
Diversity Returns. VC firms that increased their hiring of women partners by just 10% saw an average increase of 1.5% in overall fund returns and had 9.7% more profitable exits.
Women in VC. Only 15% of private equity institutional partners and managing directors are women.
Memory Personalization. Memory and personalization let LLMs generate more customized outputs for the user, increasing the friction of switching to a competitor.
Female Entrepreneur Efficacy. Female entrepreneurs have been shown to deliver more than double the revenue per dollar invested compared to their male counterparts.
Male-Dominated Funding. 98% of all venture capital dollars flow into male-founded startups.
AI Ethics Discussions. Devansh talks about his experiences advocating for safer social platforms, his controversial takes on ‘morally aligned’ LLMs, and the underlying ethical issues in tech that often go unnoticed.
High-Quality Education. We follow a “pay what you can” model, which allows you to support within your means, and support my mission of providing high-quality technical education to everyone for less than the price of a cup of coffee.
Reading Recommendations. I figured I’d start sharing whatever AI Papers/Publications, interesting books, videos, etc I came across each week.
Community Engagement. Before we begin, our cult has established itself in 190 countries.
Research Support. I put a lot of effort into creating work that is informative, useful, and independent from undue influence.
Mixture-of-Transformers. To address scaling challenges, the authors introduce Mixture-of-Transformers (MoT), a sparse multi-modal transformer architecture that significantly reduces pretraining computational costs.
Podcast Insight. We talked about a bunch of things, mainly related to ethics, morally aligned LLMs, and what needs to be done to ensure that tech works for us.
AlphaFold 3 Capabilities. AlphaFold 3 is a major upgrade from its predecessor, capable of modeling complex interactions between proteins, DNA, RNA, and small molecules, which are crucial for understanding drug discovery and disease treatment.
Judge Dismisses Lawsuit. A US judge dismissed a copyright lawsuit against OpenAI, ruling that the plaintiffs failed to demonstrate that their articles were copyrighted or that ChatGPT's responses would likely plagiarize their content.
FrontierMath Benchmark. FrontierMath is a new benchmark designed to evaluate AI's mathematical reasoning by presenting research-level problems that current models struggle to solve, highlighting the gap between AI and human mathematicians.
OpenAI's Leadership Changes. Lilian Weng's departure from OpenAI highlights ongoing concerns about the company's commitment to AI safety amid a wave of exits by key researchers and executives.
Nvidia's Market Position. Nvidia's rise to become the world's largest company highlights the significant impact and dominance of artificial intelligence in the financial markets.
Waymo Expansion News. Waymo's robotaxi service, now available in Los Angeles, has rapidly expanded due to significant funding and partnerships, offering over 150,000 weekly rides across multiple cities.
Trump's AI Policy Shift. Donald Trump's victory in the 2024 election has significant implications for the future of artificial intelligence (AI) in the United States.
AI Improvement Slowdown. OpenAI's upcoming model, code-named Orion, may not represent a significant advancement over its predecessors, as per a report in The Information.
AlphaFold 3 Open-Sourcing. Google DeepMind has released the source code and model weights of AlphaFold 3 for academic use, a move that could significantly speed up scientific discovery and drug development.
OpenAI's Custom Hardware. OpenAI partners with Broadcom and AMD to develop custom AI hardware, aiming for profitability and reducing inference costs.
AI Military Use. Meta's open-source models utilized by China's military prompt regulatory adjustments; US agencies gain access to counterbalance.
AI Regulation Alert. Anthropic warns of AI catastrophe if governments don't regulate in 18 months.
AI Benchmark by OpenAI. OpenAI Releases SimpleQA: A New AI Benchmark that Measures the Factuality of Language Models.
Llama 3.2 Release. Meta Releases Quantized Llama 3.2 with 4x Inference Speed on Android Phones.
Funding for xAI. Elon Musk's xAI in talks to raise funding valuing it at $40 billion, WSJ reports.
Meta and Reuters Deal. Meta strikes multi-year AI deal with Reuters.
US AI Regulation. New U.S. regulation mandates quarterly reporting for large AI model training and computing cluster acquisitions, aiming to bolster national security.
Robot Control Policy. Physical Intelligence unveils a generalist robot control policy with a $400M funding boost, showcasing significant advancements in zero-shot task performance.
User Understanding. Users don't (always) understand how the agent is different from LLMs/ChatGPT.
Emerging Ecosystem. We think that together they can provide a useful guide for others interested in following agents.
Continuous Innovation. The journey towards building effective and ubiquitous autonomous AI agents is still one of continuous exploration and innovation.
Improved Accuracy. We now have a scalable framework that improved upon answer accuracy significantly, from 50% perceived accuracy to up to 100% on specific high-impact use cases.
Task-Specific Agents. We are increasingly excited about task- and industry-specific agents which promise to offer tailored solutions that address specific challenges and requirements.
Agent Adoption. Because users are sometimes uncomfortable with or intimidated by an iterative way of working, they give up quickly on prompt engineering.
AI Efficiency. More comprehensive answers could, on their own, provide a reason for building agents.
Learning Challenges. We’ve learned that building useful agents is, surprise surprise… hard.
AI Agent Components. Broadly, agents are AI systems that can make decisions and take actions on their own, following general instructions from a user.
AI User Base. The Prosus AI team helps solve real problems for the 2 billion users we collectively serve across companies in the Prosus Group.
Prosus Conference. The MLOps Community and Prosus will be having a free virtual conference on November 13th with over 40 speakers who are actively working with AI agents in production.
Learning Budget. Many companies have a learning budget that you can expense this newsletter to.
Expert Insights. In the series Guests, I will invite these experts to come in and share their insights on various topics that they have studied/worked on.
Market Impact. If the rumor is verified, there could be the AI equivalent of a bank run.
AI's Critical Test. What happens if suddenly people lose faith in that hypothesis?
Diminishing Returns. I strongly suspect that, contra Altman, we have in fact reached a point of diminishing returns for pure scaling.
Scaling Laws Limitations. Scaling laws are not physical laws; they are merely empirical generalizations that held for a certain period of time.
Scaling Beliefs. Sam Altman was still selling scaling as if it were infinite, describing it as a 'religious level belief'.
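For reference, the scaling "laws" being debated have a simple empirical shape. A commonly cited form, from Kaplan et al. (2020), fits loss against parameter count N with a power law; the constants below are fitted values, quoted approximately:

```latex
% Fitted regression, not a physical law.
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N},
\qquad \alpha_N \approx 0.076,\quad N_c \approx 8.8 \times 10^{13}
```

Nothing in this functional form guarantees that it extrapolates indefinitely, which is exactly the point being made above.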
Learning Dynamics. For the MatMul-free LM, the learning dynamics differ from those of conventional models, necessitating a different learning strategy.
Ternary Weights Advantage. Using ternary weights allows for simple additions or subtractions instead of multiplications, greatly increasing computational efficiency.
MatMul-free LLMs. MatMul-free models achieve performance on-par with state-of-the-art Transformers and significantly reduce memory usage.
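As a toy illustration of why ternary weights remove multiplications entirely, here is a minimal NumPy sketch; it demonstrates the arithmetic idea only, not the paper's optimized kernels:

```python
import numpy as np

def ternary_matvec(W: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Matrix-vector product where every weight is -1, 0, or +1.

    Each output element is just a signed sum of selected inputs, so the
    whole product needs only additions and subtractions.
    """
    out = np.zeros(W.shape[0], dtype=x.dtype)
    for i, row in enumerate(W):
        out[i] = x[row == 1].sum() - x[row == -1].sum()  # no multiplies
    return out

# Sanity check against an ordinary matmul.
W = np.random.choice([-1, 0, 1], size=(4, 8)).astype(np.int8)
x = np.random.randn(8)
assert np.allclose(ternary_matvec(W, x), W @ x)
```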
LLM Functionality. Our LLM functioned as the controller, which took a user query and then called the relevant script for a particular functionality.
Neuro-Symbolic AI. AlphaGeometry combines an LLM and a symbolic engine in a 2-routine loop, significantly improving efficiency in solving geometry problems.
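Schematically, that 2-routine loop can be written down in a few lines. `symbolic_deduce` and `llm_propose` below are hypothetical placeholders for the deduction engine and the construction-proposing model; this is a sketch of the control flow, not DeepMind's implementation:

```python
def solve(problem_facts, llm_propose, symbolic_deduce, max_rounds=10):
    """AlphaGeometry-style loop: the symbolic engine does all rigorous
    deduction; the LLM only suggests auxiliary constructions when the
    engine gets stuck, and every suggestion is re-verified symbolically.
    """
    state = list(problem_facts)
    for _ in range(max_rounds):
        state, proved = symbolic_deduce(state)  # exhaustive, checkable step
        if proved:
            return state                        # proof closed symbolically
        construction = llm_propose(state)       # creative but unverified
        if construction is None:
            return None                         # no new ideas, give up
        state.append(construction)              # add the point/line, retry
    return None                                 # budget exhausted
```

The division of labor is the whole trick: the LLM never gets to assert anything, it only widens the search space that the verifier explores.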
Hype vs. Substance. Constantly emphasizing a better tomorrow won't cause people to stop using these models today. It will stop them from thinking deeply about how to address the fundamental issues today.
Misleading Claims. Turning 'there are limits to blindly scaling LLMs' into 'LLMs as a whole are hitting diminishing returns' is a huge stretch.
Hype in AI. There's a lot of hype that needs to be addressed, and we will create dangerous products if we keep pushing hype.
Limitations of Deep Learning. Due to these limitations, it's best to pair it with other techniques- especially when control/transparency are at stake.
Economics of LLMs. I think LLMs would still be economically viable in very high-cost avenues, where productivity gains can justify higher costs.
Diverse Research Directions. We really need to look beyond LLMs to explore (and celebrate) more research directions.
Agreement with Marcus. Right off the bat, I agree with the following claim: scale won't solve general intelligence.
Critique of Marcus. My goal in this article will be to talk about why I disagree heavily with Gary Marcus's claim that Deep Learning or LLMs are close to hitting a wall.
AI Skeptics React. This has gotten a lot of AI Skeptics rejoicing since they can wave around their 'I told you so's.
Gary Marcus' Claim. Gary Marcus released an article arguing that LLMs have indeed reached a point of diminishing returns.
Market Recognition. I’m glad that the market is finally recognizing that what I’ve been saying is true.
Truth Revealed. The thing is, in the long term, science isn’t majority rule. In the end, the truth generally outs.
Investment Misalignment. Meanwhile, precious little investment has been made in other approaches. If LLMs won’t get the US to trustworthy AI, and our adversaries invest in alternative approaches, we could easily be outfoxed.
Economic Concerns. The economics are likely to be grim. Sky high valuation of companies like OpenAI and Microsoft are largely based on the notion that LLMs will, with continued scaling, become artificial general intelligence.
Deep Learning Critique. In my most notorious article, in March of 2022, I argued that 'deep learning is hitting a wall'.
Scaling Limits. For years I have been warning that 'scaling' — eking out improvements in AI by adding more data and more compute, without making fundamental architectural changes — would not continue forever.
Consulting Services. I provide various consulting and advisory services.
Email Template Link. You can use the following for an email template to request reimbursement for your subscription.
Discard Boring Datasets. Scrap the boring Iris datasets, no GPT + Vector DB spin-off, and no more Wine Price predictions.
High ROI Projects. I believe that they are one of the highest-ROI investments for an early career person.
Quality Over Quantity. Not 5 or 10 mid-tier projects, but 1-3 very good ones.
Essential Side Projects. The most crucial step to getting your first ML job is to have an amazing side project.
Target Audience. In this article, I will focus my advice on the group that I am most qualified to speak to: early career students looking for their first role in Machine Learning.
Research Role Limitation. This leaves me ineligible for research roles at most Companies (which require either a MS or preferably a PhD).
Formal Education Impact. This means that I don’t ever see myself pursuing an upper-level degree.
Machine Learning Jobs. A lot of people reach out to me with this question. Answering this question is complex and relies heavily on the person’s individual goals, interests, and skills.
Synthetic Data Utilization. Synthetic data can be a way to create fake training data that 'feels like' real samples, enhancing model performance.
Amazon's Fairness Metrics. Amazon's publication showed that the usual metrics used to measure fairness reflect the biases of their datasets.
Addressing Dataset Biases. Injecting diversity into your training data can save your performance and enhance representations.
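As a minimal sketch of what "injecting diversity" can mean mechanically, here is a naive group-rebalancing pass. It is a baseline illustration only; targeted collection or per-group synthetic generation (as above) is usually preferable. `group_key` is whatever field encodes the category you are balancing:

```python
from collections import Counter
import random

def oversample_minorities(dataset, group_key, seed=0):
    """Upsample under-represented groups until each matches the majority.

    `dataset` is a list of dict-like examples; sampling is with
    replacement, so rows are duplicated rather than newly created.
    """
    rng = random.Random(seed)
    counts = Counter(ex[group_key] for ex in dataset)
    target = max(counts.values())
    balanced = list(dataset)
    for group, n in counts.items():
        members = [ex for ex in dataset if ex[group_key] == group]
        balanced += rng.choices(members, k=target - n)
    rng.shuffle(balanced)
    return balanced
```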
Audit Recommendations. Engaging independent experts to verify transparency mechanisms and documentation ensures accountability.
Open Source Benefits. Open-source can reduce your R&D costs, help you identify + solve major issues, and get more people on your ecosystem.
Embedding Models Access. Giving access to embedding models will significantly improve the transparency and development of LLM-based solutions.
Self-Generated Data Issues. Models fed on self-generated training data tend to deteriorate over time.
Compounding Bias Origins. Unchecked dataset biases also feed into the way these models learn, compounding the bias.
Bias Control Necessity. We don’t want bias-free AI. We want an AI with biases that are explicit, controllable, and agreeable.
Transparency Importance. Transparency is the most important aspect of LLMs that we should be building on now.
Gemini's Hateful Flagging. Gemini flagged an image of a Middle Eastern Camel Salesman as 'unsafe', indicating problematic implications around AI bias.
Media's Role. The media is failing. The individual incidents have all been reported, but seemingly nobody is putting it all together.
Growing Concerns. Disorientation, dysfluency, disinhibition, and challenges with motor control - we are seeing it over and over, and it’s obviously getting worse.
Unusual Behavior. Most incredibly, later yesterday, in front of a live national television audience, Trump performed simulated fellatio on a microphone stand.
Disinhibition Signs. The chilling thing is that in the twelve hours after I posted that, we saw at least four MORE incidents, including more signs of foul language and disinhibition.
Warning About Trump. Well over half a million people have viewed it on X: one of the bluntest warnings I have ever written.
Psychology Background. Before I turned full time to AI, the usual topic of this newsletter, I spent decades focused on human psychology, mostly as a full professor at NYU.
Dementia Behavior. As Meiselas put it on X, discussing the microphone incident: yes, people with dementia can experience changes in sexual behavior, including confusion about sex and inappropriate behavior, and the fellatio incident fits that pattern.
Urgency of Coverage. With the election being on Tuesday, this is the most urgent post I have ever written. If you are in the mainstream media, or know someone who is, for the love of democracy, please cover Trump’s apparent dementia.
Call to Action. Dear Media, Don’t be cowed by the Goldwater rule: Call Dr. Lance Dodes. Reach out to the group Duty2Warn.
ChatGPT Integration. The beta release also includes additional features such as Genmoji, Image Playground, Visual Intelligence, Image Wand, and ChatGPT integration.
AI-Powered Transcription Tool Concerns. AI-powered transcription tool Whisper invents things no one ever said, leading to fabrications in transcriptions used in various industries, including medical settings.
OpenAI's Chip Development. OpenAI is collaborating with Broadcom to develop custom silicon for AI workloads, while also incorporating AMD chips into its Microsoft Azure setup.
Google's AI Watermarking Tool. Google has developed an AI watermarking tool to identify AI-generated text, making it easier to distinguish between AI-generated and human-written content.
Meta AI's Math Breakthrough. AI developed by Meta can solve century-old math problems involving Lyapunov functions, which were previously unsolvable.
Waymo Funding Milestone. Waymo secures $5.6 billion in funding to expand its self-driving taxi program to more US cities, with plans to partner with Uber and a focus on safety and responsible execution.
AI Investment Boom. The AI investment boom has led to a rapid increase in US fixed investment to meet the growth in computing demand, with companies investing in high-end computers, data center facilities, power plants, and more.
Claude 3.5 Advancements. The article announces the introduction of an upgraded AI model, Claude 3.5 Sonnet, and a new model, Claude 3.5 Haiku, both of which show significant improvements in coding tasks.
GitHub Copilot Expansion. GitHub is expanding its Copilot code completion and programming tool to include models from Anthropic, Google, and OpenAI, allowing developers to choose the model that best suits their needs.
Apple Intelligence Features. Apple has released the latest developer beta versions of its operating systems, including iOS 18.2, iPadOS 18.2, and macOS Sequoia 15.2, introducing new Apple Intelligence features.
Product Adoption. 79% of survey respondents said they had tried Microsoft Copilot. That's tremendous, given how new the product is.
Stunning Statistics. I read a pair of stunning statistics from a new CNBC poll.
Market Sentiment. People aren’t ignoring GenAI; they are waiting to see if it will work.
Value Perception. Only 25% of the respondents thought it was worth it.
Influencer Reaction. AI ignored? I dashed off a quick reply: practically everyone has tried it, but they are not always satisfied with the results.
Camus on Absurdity. The absurd is born of this confrontation between the human need and the unreasonable silence of the world.
Struggle Justification. He justifies the struggle because it preserves and enhances ordinary human moments and that is worth it.
Learning Budget Utilization. Many companies have a learning budget, and you can expense your subscription through that budget.
Absurd Hero Definition. An Absurd Hero is an active participant in the world who chooses their own values without needing validation from any other source.
Moment and Awareness. Real generosity towards the future lies in giving all to the present.
Gambling and Knowledge Limits. Gambling has several personal, economic, and societal benefits and is a great way to teach people to appreciate the limitations of our knowledge.
Recommendation for Living. Once we give up our desire to find these truths, we can focus on the simpler things that we can comprehend and control.
Paths Out of Absurdity. Ultimately, there are only 3 ways out of this state: Suicide, Philosophical Suicide, and Embracing the Absurd.
Value of Camus' Philosophy. I think Camus is generally a great antidote to the hopelessness we feel when asked to confront an unending task.
OpenAI Media Generation. OpenAI researchers develop new model that speeds up media generation by 50X.
ByteDance AI GPUs. TikTok owner ByteDance taps TSMC to make its own AI GPUs to stop relying on Nvidia.
Responsible Scaling Policy. Announcing our updated Responsible Scaling Policy.
New AI Research Artifacts. Meta FAIR Releases Eight New AI Research Artifacts—Models, Datasets, and Tools to Inspire the AI Community.
xAI API Launch. Elon Musk's AI startup, xAI, launches an API.
NVIDIA Server Deployment. NVIDIA's Blackwell GB200 AI Servers Ready For Mass Deployment In December.
Canva AI Tool. Canva has a shiny new text-to-image generator.
AI Video Startup Launch. AI video startup Genmo launches Mochi 1, an open source rival to Runway, Kling, and others.
Anthropic AI Update. Anthropic's latest AI update can use a computer on its own.
AI News Summary. Our 187th episode with a summary and discussion of last week's big AI news, now with Jeremie co-hosting once again!
Investment Motivations. Investors are drawn to plausible stories with big numbers, which allows them to make substantial fees from investing other people's money.
Precision in Predictions. Any serious scientist or engineer knows you can't possibly predict the future with that kind of precision, especially when there are so many unknowns.
Criticism of Hype. The audience of investors was potentially misled into believing that AI advancements are more controlled and predictable than they are.
Questionable Intelligence Claims. Masayoshi Son stated that 'Artificial Super Intelligence' would be 10,000 times smarter than humans, predicting it would arrive in 2035.
AI Improvement Claims. Elon Musk stated, 'I feel comfortable saying that AI is getting 10 times better each year' without specifying any measure.
Futuristic Robot Predictions. Elon Musk said, 'I think by 2040 probably there are more humanoid robots than there are people.'
Preempting Rounds.
Maintain Relationships.
Fundraising Timeline.
Investor Connections.
Investment Criteria.
Identify Investors.
Founder Attraction.
Funding Challenges.
Learning Budget Support.
Expert Insights Series.
Regulatory Challenges for Tesla. Tesla's plans for 'unsupervised FSD' and robotaxis could face regulatory challenges in California and Texas due to the need for permits and exemptions.
AI Child Abuse Risks. AI chatbot service Muah.AI is being used to request and potentially generate child-sexual-abuse material, highlighting the broader issue of AI's potential for abuse.
Controversial Perplexity Lawsuit. Perplexity is facing a lawsuit from Dow Jones and the New York Post for allegedly creating fake sections of news stories and falsely attributing them to publishers.
AI Fraud Recovery. AI has helped the US Treasury Department recover $1 billion worth of check fraud in fiscal 2024, nearly triple the amount recovered in the prior fiscal year.
ChatGPT Traffic Milestone. ChatGPT's web traffic has been steadily increasing, reaching 3.1 billion visits in September 2024, marking a significant growth compared to the previous year.
ByteDance Sabotage Incident. ByteDance confirmed that an intern was fired in August for planting malicious code in its AI models.
Elon Musk's API Launch. Elon Musk's AI startup, xAI, has launched an API for its flagship generative AI model, Grok.
Perplexity Valuation Rise. Perplexity AI, an artificial intelligence search engine startup, is aiming to raise its valuation to approximately $9 billion in its upcoming funding round, a significant increase from its $3 billion valuation in June.
Meta AI Artifacts. Meta's Fundamental AI Research (FAIR) team has unveiled eight new AI research artifacts, including models, datasets, and tools, aimed at advancing machine intelligence.
Adobe AI Video Tools. Adobe's AI video model is here, and it's already inside Premiere Pro.
AI Podcast Episode. Our 186th episode with a summary and discussion of last week's big AI news!
Google's Nuclear Project. Google will help build seven nuclear reactors to power its AI systems.
LLMs' Reasoning Limitations. LLMs can't perform 'genuine logical reasoning,' Apple researchers suggest.
OpenAI Content Deal. OpenAI announces content deal with Hearst, including content from Cosmopolitan, Esquire and the San Francisco Chronicle.
YouTube AI Expansion. YouTube expands AI audio generation tool to all U.S. creators.
AI Catalyst Event. Check out Jon's upcoming agent-focused event here - AI Catalyst: Agentic Artificial Intelligence.
Future of AI Development. If the answer isn't bigger LLMs, we may have wasted half a decade.
Societal Risks. If things do fall apart, it is not just investors who stand to lose, but society. Immense resources may end up being wasted in vain, because of hype.
Reflection on AI. AI ought to be looking itself in the mirror, too, right about now.
Hype in AI. The combination of made-up graphs and outsized promises could make a person nervous.
Imaginary Data. The curve, so far as I know, is just made up. I do not know any measure that pegs the delta between GPT-4 and o1 (marked as 'today') as being triple the delta between GPT-3 and GPT-4.
Critical Graphs. As I wrote on X, the graph 'is a fantasy about the future, and not at all obvious that the 'data' plotted correspond to anything real.'
Theranos Comparison. Sometimes I have heard others compare OpenAI to Theranos, which was basically a fraud: another charismatic founder, another ridiculous valuation, and another collapse.
Comparisons to WeWork. When I think of OpenAI, I often think of WeWork: charismatic founder, immense valuation, questionable business plan, and the possibility of similar immense deflation in their valuation, if confidence wavers.
Meta's Actions on Sextortion. Meta has acted on our recommendations to protect teenagers from sextortion.
Four-Year Journey. We’ve reached over 10 Million people overall and are only growing.
Sourcing Assistance. My work has about 150-200K views a week, many of whom are founders and prospective founders of future software/AI companies.
VC Engagement. I’m looking to learn more about Venture Capital to help with my end goals of having my own AI Lab.
Victim Support Partnership. We’re also partnering with Crisis Text Line in the US to provide people with free, 24/7, confidential mental health support.
Screenshot Prevention. Soon, we’ll no longer allow people to use their device to directly screenshot or screen record ephemeral images or videos.
Follower List Restrictions. Removing access to follower lists is a very simple, but extremely effective way to stop scams.
Improved Scammer Detection. Meta will start using various signals to identify and flag accounts that might be blackmailers.
LLM Usage Risks. The more people use LLMs, the more trouble we are going to be in.
Need for AI Activism. Gary Marcus is quite concerned that nobody is really talking about tech policy, when so much is at stake.
Ethical Guardrails Failure. The possibilities are now endless for propaganda, troll farms, and rings of fake websites that degrade trust across the internet.
Predictable Issues. Jailbreaks aren’t new, but even after years of them, the tech industry has nothing like a robust response.
Civilian Threats. If the attack were carried out in the real world, people could be socially engineered into believing the unintelligible prompt might do something useful.
Vulnerable Robotics. Companies like Google, Tesla, and Figure.AI are now stuffing jailbreak-vulnerable LLMs into robots.
Imprompter Attacks. The Imprompter attacks on LLM agents start with a natural language prompt that tells the AI to extract all personal information from the user's conversation.
Jailbreaking Concerns. Both concern jailbreaking: getting LLMs to do bad things by evading often simplistic guardrails.
Sharing Personal Data. There is a temptation for some people (and some businesses) to share their very personal information with LLMs.
Neglect of Real Issues. By prioritizing moral alignment, we give LLM providers a pass from addressing the much more real concerns that plague these systems currently.
Preventing Sextortion. Instagram can mitigate a vast majority of financial sextortion cases by hiding minors' Followers and Following lists on their platform by default.
Sextortion Increase. The tenfold increase of sextortion cases in the past 18 months is a direct result of instructional videos and scripts being distributed on platforms like TikTok.
Impact of Child Labor. Child labor allows contractors to undercut their competition, leading tech companies to choose suppliers based on cheaper costs.
Child Labor in Tech. Many tech companies rely on suppliers that are known to use child labor in their supply chains.
High Error Rate. I got close to 50% error (5 out of 11 times, it didn’t match their output) when testing OpenAI’s diagnostic claims.
OpenAI's Medical Claims. OpenAI claims that their o1 model is really good at Medical Diagnosis, being able to diagnose diseases given a phenotype profile.
Morally Aligned AGI. Arguments for LLM moral alignment entail building a system focused on Peace and Love, with the belief that AGI could otherwise harm humans.
LLM Safety Series. This article is the final part of our mini-series about Language Model Alignment.
Neglect for Transparency. OpenAI could have instead been much more open about their model's limitations.
Nobel Prize for AI. John J. Hopfield and Geoffrey E. Hinton have been awarded the Nobel Prize in Physics for their groundbreaking work in the development of neural networks.
AI Safety Clock. The AI Safety Clock warns of potential doomsday scenarios and the need for global regulation and company responsibility to ensure safe AI development.
AI Chatbots Threat. AI chatbots can read and write invisible text, creating a covert channel for attackers to conceal and exfiltrate confidential data, posing a significant security threat.
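Public write-ups of this issue describe payloads hidden in Unicode "tag" characters (the U+E0000 block), which most interfaces render as nothing at all. The snippet below demonstrates only the covert channel itself, that invisible characters can round-trip through visible text:

```python
TAG_BASE = 0xE0000  # Unicode tags block: invisible in most UIs

def hide(secret: str) -> str:
    """Shift ASCII into the tags block, making it invisible."""
    return "".join(chr(TAG_BASE + ord(c)) for c in secret)

def reveal(text: str) -> str:
    """Recover anything smuggled into the tags block."""
    return "".join(chr(ord(c) - TAG_BASE) for c in text
                   if 0xE0000 < ord(c) <= 0xE007F)

visible = "Looks like a normal sentence." + hide("exfiltrated!")
print(reveal(visible))  # prints 'exfiltrated!', yet the text looks clean
```

Defensively, the countermeasure is equally simple: strip or flag any code points in this block before text enters or leaves a model.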
AI's Potential Dangers. Amodei acknowledges the potential dangers of AI to civil society and the need for discussions about economic organization in a post-AI world.
AI Predictions by CEO. Anthropic CEO Dario Amodei predicts that 'powerful AI', capable of outperforming Nobel Prize winners, will emerge by 2026.
Tesla's Advanced Vehicles. At Tesla's 'We Robot' event, Elon Musk introduced futuristic vehicles, including the Cybercab and the Robovan.
Adobe's AI Video Model. Adobe has launched its AI video model, Firefly, which includes several new tools for video generation and editing.
AI Revolutionizes Protein Understanding. Hassabis and Jumper utilized AI to predict the structure of millions of proteins, while Baker employed computer software to invent a new protein.
Transformative Impact. The Nobel committee highlighted the transformative impact of Hopfield and Hinton's work, stating that their machine learning breakthroughs have provided a new way to use computers to address societal challenges.
AI Investment Surge. Goldman Sachs estimates companies will spend $1 trillion to use AI chatbots in their operations.
Nobel Prize in Chemistry. The Nobel Prize in Chemistry has been awarded to three scientists for their groundbreaking work in predicting and creating proteins using advanced technology.
Versatile Applications of BEAST. BEAST can be used for various adversarial tasks, including jailbreaking, hallucination induction, and privacy attacks.
Success Rates Comparison. ACG achieves up to 84% success rates at attacking GPT-3.5 and GPT-4, significantly outperforming GCG.
Gradient-Free Optimization. BEAST does not rely on gradients, allowing it to be faster than traditional optimization-based attacks.
Cost-Effective Attacks. Chaining MCTS and evolutionary algorithms (EAs) together to cut down on costs can be a strategic approach to maximizing attack efficiency.
Beam Search-Based Method. BEAST uses beam search to quickly explore adversarially generated prompts, maintaining a balance between speed and effectiveness.
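In skeleton form, a BEAST-style loop looks like ordinary beam search with an attack objective plugged in. `score_fn` and `propose_tokens` below are placeholders (the target model's likelihood of an unwanted continuation, and a sampling LM, respectively), so this is a schematic of the search strategy rather than a working attack:

```python
def beam_attack(prompt, score_fn, propose_tokens, k=8, steps=20):
    """Gradient-free beam search over adversarial suffixes.

    Keeps the k highest-scoring suffixes each step; no gradients are
    needed, which is what makes the method fast and black-box friendly.
    """
    beams = [("", float("-inf"))]              # (suffix, score)
    for _ in range(steps):
        candidates = []
        for suffix, _ in beams:
            for tok in propose_tokens(prompt + suffix):
                new = suffix + tok
                candidates.append((new, score_fn(prompt, new)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    return beams[0]                            # best suffix found
```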
ACG Attack Advantage. ACG maintains a buffer of recent successful attacks, helping guide the search process and reduce noise.
Bijection Learning Utility. Bijection learning is interesting because it generalizes encoding-based jailbreaks using arbitrary mappings that are learned in-context by the target model.
MCTS Effectiveness. MCTS’s tree-based approach handles the large branching factor inherent in language generation tasks, allowing for a balance between exploitation and exploration.
Automated Red-Teaming Techniques. Haize Labs breaks LLMs by employing techniques like multiturn jailbreaks via Monte Carlo Tree Search (MCTS) and bijection learning.
Red-Teaming Importance. Red teaming has two core uses: it ensures that your model is 'morally aligned' and helps you spot weird vulnerabilities and edge cases that need to be patched/improved.
Cerebras IPO Filing. Cerebras, an A.I. Chipmaker Trying to Take On Nvidia, Files for an I.P.O.
Waymo-Hyundai Partnership. Waymo to add Hyundai EVs to robotaxi fleet under new multiyear deal.
Google AI Ads Expansion. Google brings ads to AI Overviews as it expands AI's role in search.
OpenAI's VC Round. OpenAI closes the largest VC round of all time.
California AI Policy Debate. AI policy discussions intensify as California's vetoed bill sparks debates on regulation, alongside Google's $1 billion investment to expand AI infrastructure in Thailand.
Microsoft-OpenAI Moves. Microsoft and OpenAI's strategic advancements highlight significant financial moves and AI enhancements, including Microsoft's enhanced Copilot.
Mio's Foundation Model. Mio's foundation model and Apple's Depth Pro enhance multimodal AI inputs and precise 3D imaging for AR, VR, and robotics.
Meta's MovieGen Features. Meta's MovieGen introduces innovative features in AI video generation, alongside OpenAI's real-time speech API and expanded ChatGPT capabilities.
AI Story Creation for Kids. AI reading coach startup Ello now lets kids create their own stories.
Reading Recommendations. A lot of people reach out to me for reading recommendations, so I will start sharing interesting AI Papers/Publications, books, videos, etc.
Productivity Insights from Copilot. Less experienced developers accept AI-generated suggestions more frequently than their more experienced counterparts.
Weight Loss Industry Concerns. The episode dives into the dirty business of Big Pharma and weight loss drugs, spotlighting historical distrust.
Mathematical Reasoning Limitations. The performance of all models declines significantly as the number of clauses in a question increases, highlighting a fragility in mathematical reasoning.
AI and Conspiracy Beliefs. AI is surprisingly effective in countering conspiracy beliefs, even against true believers.
Energy-efficient Models. The new L-Mul algorithm can potentially reduce 95% energy cost by applying it in tensor processing hardware.
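The exact L-Mul algorithm is in the paper, but the underlying principle, trading multiplications for additions in a log-like representation, can be illustrated with a classic trick: adding IEEE-754 bit patterns approximately multiplies the values, because a float's bit pattern is roughly a scaled-and-biased log2. This is an illustration of the principle only, not the paper's method, and it handles positive inputs only:

```python
import struct

BIAS = 127 << 23  # float32 exponent bias, in bit-pattern position

def bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]

def unbits(b: int) -> float:
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]

def approx_mul(a: float, b: float) -> float:
    """Approximate a*b with a single integer addition (positive inputs)."""
    return unbits(bits(a) + bits(b) - BIAS)

print(approx_mul(3.0, 5.0))  # 14.0: close to 15, from one add, no multiply
```

The approximation error (up to roughly 11% for this crude version) is why L-Mul adds correction terms; the energy win comes from addition being far cheaper than multiplication in hardware.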
Legislators' Role. Legislators cannot sit idly by while AI technology is further developed and distributed to the public.
Generative AI Copyright. The article presents a convincing case for why AI companies should either compensate or seek permission from copyright holders to use their data for training purposes.
Community Spotlight. Dave Farley runs the excellent YouTube channel Continuous Delivery where he shares insights on software engineering.
Hinton's Contribution. Hinton has made major contributions, but the citation seems to indicate he won it for inventing back-propagation, which, well, he didn't.
Critique of LLMs. The anti-neurosymbolic tradition is limited; its most visible manifestation, LLMs, has brought fame and money, but no robust solution to solving any particular problem with great reliability.
Need for New Approach. It's time for a new approach, and Hassabis sees that; his open-mindedness will serve the field well.
Neurosymbolic AI. It is, as far as I can tell, the first Nobel Prize for Neurosymbolic AI.
Divergent Paths. Hinton and Hassabis represent two different paths forward in AI, with Hinton favoring back-propagation and Hassabis advancing neurosymbolic AI.
AlphaFold Significance. AlphaFold is a huge contribution to both chemistry and biology and is arguably one of the two biggest contributions of AI to date.
Response to Hinton's Award. Even Steve Hanson, a long-time Hinton defender, acknowledged 'we agree on the fact that the "Scientific committee of the Nobel committee" didn't know the N[eural] N[etwork] history very well'.
Werbos's Priority. Paul Werbos developed back-propagation into its modern form for his 1974 Harvard PhD thesis.
Nobel Prize Winners. Not one but two Nobel Prizes went to AI this week.
Competition Landscape. OpenAI faces growing competition from rivals such as Google and Amazon.
California AI Law Blocked. California's new AI law, AB 2839, has been temporarily blocked by a federal judge due to concerns about its broad and potentially unconstitutional nature.
New Logo Controversy. OpenAI's staff is shocked and alarmed by the proposed new logo, preferring to keep the current hexagonal flower symbol.
Safety vs Profit. AI Safety culture confronts capitalism as leading AI labs grapple with the challenge of prioritizing safety over profit.
Corporate Moves. Durk Kingma, co-founder of OpenAI, has announced his move to Anthropic, expressing excitement to contribute to the development of responsible AI systems.
Generative AI Concerns. Despite its success, the NotebookLM tool is not immune from issues that affect generative AI, such as hallucinations and bias.
Flux Model Release. Black Forest Labs has released a new, faster text-to-image model called Flux 1.1 Pro, which is six times faster than its predecessor.
Podcast Creation. Google's study software, NotebookLM, is being utilized by users to create AI-generated podcasts, generating engaging audio in an 'upbeat, hyper-interested tone'.
DevDay Innovations. OpenAI has announced several new tools at its 2024 DevDay, including a public beta of its 'Realtime API' for building apps with low-latency, AI-generated voice responses.
User Growth. ChatGPT has gained 250 million weekly active users.
Investment Details. Microsoft contributed $750 million on top of its previous $13 billion investment.
Funding Milestone. OpenAI has raised $6.6 billion in a new funding round, led by Thrive Capital, valuing the company at $157 billion.
Focus on AI Safety. Given how important (but misunderstood) the topic is, I have decided to orient our next few pieces on safe and responsible AI.
Human-In-Loop Learning. Integrating human feedback into the LLM training process can improve the LLM's ability to align with human preferences and values.
Dynamic Benchmarking. Dynamic benchmarking platforms, where new test cases are continuously added and models are re-evaluated, can provide a more accurate and up-to-date assessment of LLM capabilities.
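A dynamic benchmark can be surprisingly little code. The sketch below shows the core loop, with `grade` and the registered model functions left as placeholders for your grading logic and inference clients:

```python
import statistics

class DynamicBenchmark:
    """Test cases keep arriving; every model is re-scored on the full,
    growing set, so reported numbers never go stale."""

    def __init__(self, grade):
        self.cases, self.models, self.grade = [], {}, grade

    def add_case(self, prompt, reference):
        self.cases.append((prompt, reference))

    def register(self, name, model_fn):
        self.models[name] = model_fn

    def leaderboard(self):
        rows = [
            (name, statistics.mean(self.grade(fn(p), ref)
                                   for p, ref in self.cases))
            for name, fn in self.models.items()
        ]
        return sorted(rows, key=lambda r: r[1], reverse=True)
```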
Gamified Evaluation. Making the evaluation process more engaging and interactive can motivate evaluators and improve their performance.
Bias Awareness Training. Providing evaluators with training on bias awareness can help them recognize and mitigate their own biases.
Diversity Matters. Ensuring a diverse pool of evaluators, representing different backgrounds, perspectives, and lived experiences, is crucial for mitigating bias in the evaluation process.
Responsible AI Evaluation. LLMs should be evaluated for bias, safety, truthfulness, and privacy to ensure responsible development and deployment.
Effective Test Sets. Test sets should be challenging enough to differentiate between various LLM capabilities and weaknesses.
New Evaluation Pillars. CTH seeks to solve issues around evaluation with six pillars: consistency, scoring criteria, differentiating, user experience, responsible practices, and scalability.
Human Evaluation Costs. Human evaluation is expensive and time-consuming, hindering wider adoption.
Fluency Misleading. LLMs are so good at generating fluent text that we often mistake it for being factually correct or useful.
Challenges in Evaluations. Current evaluation methods often neglect cognitive biases and user experience (UX) principles, leading to unreliable and inconsistent results.
CTH Framework. The authors present a new 'ConSiDERS-The-Human Framework' (CTH for conciseness) to tackle these issues.
ConSiDERS Framework Introduction. Amazon's 'ConSiDERS—the human-evaluation framework: Rethinking human evaluation for generative large language models' is a welcome departure, as it attempts to tackle a very real issue in the evaluation of Language Models.
Speculative Promises. Almost every putative virtue of AI (aside from making a small number of people rich) has been promissory.
Divert Funding. Diverting funding from chatbot and movie synthesis machines to more focused efforts around special-purpose AI addressing climate change might make more sense.
Transparency Required. We can’t, or at least shouldn’t, place massive bets like these without transparency and accountability.
AI Misalignment. LLMs are not the AI we need to address climate change; they are the AI we would use if we wanted to risk serious harm to the climate.
Environmental Harm. The case that AI will do serious harm to the environment if we continue on the current path is actually much stronger.
Potential Conflicts. Schmidt has a large stake in the companies building AI, and it is important to take those potential conflicts of interest seriously.
Climate Goals Skepticism. Schmidt concludes 'My own opinion is that we’re not going to hit the climate goals anyway because we are not organized to do it.'
AI Energy Concerns. Eric Schmidt argued that, despite AI’s rapacious energy demands, he would rather bet on AI solving the problem than try to constrain AI.
Article Access. For access to this article and all future articles, get a premium subscription below.
Tutoring Experience. I know it’s helped a lot of other people ... it was very helpful to them.
Resource Recommendation. Reading cutting-edge work that solves real problems is recommended, even if you understand very little at first.
Unstructured Learning. If you’re someone who absolutely requires structure ... you probably will have a hard time with this approach.
Commitment Required. This approach will require at least 4-5 hours weekly ... you will start to see improvements in the first 2 months.
Learning Approach. The standard advice for learning Machine Learning ... is a bad base to build your knowledge around.
Different Learning Paths. Different goals require different actions. Trying to follow one path will lead to inefficient results.
Dynamic Knowledge Filtering. Success in AI ... is more about having the ability to filter through constantly shifting ... knowledge sources.
AI Video Tools. AI is being rapidly integrated into various sectors, with examples including ChartWatch reducing unexpected hospital deaths, Snapchat and YouTube introducing AI video generation tools, and Lionsgate partnering with Runway for AI-assisted film production.
AI Assistant Upgrades. OpenAI, Meta, and Google are enhancing their AI assistants with advanced voice modes, while Meta released Llama 3.2, an open-source model capable of processing both images and text.
AI in Media Production. Lionsgate Signs Deal With AI Company Runway, Hopes That AI Can Eliminate Storyboard Artists and VFX Crews.
Deepfake Legislation. Governor Newsom signs bills to combat deepfake election content.
Effective Research Findings. Recent research shows chain-of-thought prompting is most effective for math and symbolic reasoning, while OpenAI's GPT-4 with vision capabilities is being integrated into Perplexity AI's search platform.
AI Infrastructure Advances. Significant AI infrastructure developments include Groq's partnership with Aramco for a massive data center in Saudi Arabia, and Microsoft's plan to power data centers using a reopened Three Mile Island nuclear plant.
AI in Healthcare. AI tool cuts unexpected deaths in hospital by 26%, Canadian study finds.
IDE Benefits. A good IDE is probably a much bigger, much less expensive, much less hyped improvement, and one that helps more people more reliably.
Use AI Appropriately. Use it to type faster, not as a substitute for clear thinking about algorithms + data structures.
Conceptual Understanding. 10x-ing requires deep conceptual understanding – exactly what GenAI lacks.
Hype vs. Reality. The tracks here are pointing to modest improvements, with some potential costs for security and technical debt, not 10x improvement.
Long-Term Risks. Users writing less secure code could lead to a net loss of productivity long term.
Quality Concerns. An earlier study showed 'downward pressure on code quality'.
Mixed Results. Another somewhat more positive study shows moderate (26%, not 1000%) improvement for junior developers, and only 'marginal gains' for senior developers.
Limited AI Benefits. One result with 800 programmers shows little improvement and more bugs.
Productivity Claims. The data are coming in, and they do not support the 10x productivity claims.
Job Retention Issues. Many staff have left, perhaps out of a sense that the mission had been abandoned.
Public Benefit Requirement. The code should be opened for the public benefit.
Further Reading. You can read more about their analysis here.
Transition Cost Proposal. The advocacy group Public Citizen has a proposal: the change from nonprofit should cost at least 20% of the business, perhaps more.
OpenAI's Shift. Now OpenAI wants to renege on its promises, and become a for-profit.
OpenAI's Advanced Voice Mode. OpenAI has announced the rollout of its Advanced Voice Mode (AVM) to a broader set of ChatGPT's paying customers, with the update including five new voices and enhanced speech naturalness.
AI Regulation Veto. California Governor Gavin Newsom vetoed a pioneering bill aimed to establish safety measures for large AI models, citing concerns about the bill's applicability to high-risk environments.
Meta's Llama 3.2. Meta has released Llama 3.2, the first of its large open-source models capable of processing both images and text.
OpenAI Funding Goals. OpenAI's CFO tells investors the funding round should close by next week despite the executive departures.
OpenAI Restructuring. OpenAI is undergoing a significant transition as it seeks to become more appealing to external investors, including a shift towards becoming a for-profit business and potentially raising one of the largest funding rounds in recent history.
Executive Departures. Multiple high-ranking employees resigned last week, including Chief Technical Officer Mira Murati, Chief Research Officer Bob McGrew, and VP of Research Barret Zoph, who expressed support for OpenAI despite their departure.
Concerns on AI Security. AI safety controls can be bypassed by translating malicious requests into math equations, posing a critical vulnerability.
AI Investor Interest. Middle Eastern sovereign wealth funds have increased funding for Silicon Valley's AI companies fivefold in the past year, showing strong interest in the AI sector.
Duolingo's New Features. Duolingo announced its AI-powered Adventures mini-games and Video Call feature to enhance language learning.
Meta AI Features. Meta's AI can now talk to users in the voices of celebrities like Awkwafina and John Cena, enabling a more engaging interaction experience.
Dynamic Adjustments. The model constantly receives feedback from the classifier and adjusts its output accordingly, leading to a final generated text that is both coherent and aligned with the safety/goal guidelines.
Fine-Tuning Limitations. Fine-tuning a large language model requires significant computational resources and time, making it impractical to train separate models for every desired attribute combination.
Robustness through Noise. DGLM incorporates Gaussian noise augmentation during the training of the decoder.
Decoupled Training. DGLM effectively decouples attribute control from the training of the core language model.
Toxicity Reduction. Increasing guidance reduces toxicity with minimal loss of fluency.
Improved Performance. DGLM consistently outperforms existing plug-and-play methods in tasks like toxicity mitigation and sentiment control.
Single Classifier Control. Further, controlling a new attribute in our framework is reduced to training a single logistic regression classifier.
New Approach. Their novel framework for controllable text generation combines the strengths of auto-regressive and diffusion models.
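To give a feel for how cheap per-attribute control becomes under this decoupling, here is a hedged best-of-n sketch. The actual DGLM guides a diffusion process over sentence embeddings; the version below merely reranks candidate continuations with a logistic regression classifier, but the key property is the same: adding an attribute never touches the language model. `embed` is a placeholder for any sentence-embedding model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_attribute_classifier(embeddings, labels):
    """One tiny classifier per attribute (labels: 1 = has attribute)."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(embeddings, labels)
    return clf

def guided_choice(candidates, lm_logprobs, embed, clf, weight=1.0):
    """Pick the candidate balancing fluency against the attribute.

    `weight` plays the role of guidance strength: higher values trade a
    little fluency for more of the attribute (cf. the toxicity result
    above).
    """
    X = np.vstack([embed(c) for c in candidates])
    attr = clf.predict_proba(X)[:, 1]
    scores = np.asarray(lm_logprobs) + weight * np.log(attr + 1e-9)
    return candidates[int(np.argmax(scores))]
```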
Changes in Financing. Perhaps more likely is that the company will make some fairly major concessions to get it over the line.
Industry Impact. If they stumble, it will have ripple effects.
Risk of Collapse. If the round does fall apart and investors back down, OpenAI could be in trouble.
Concerns Over Cash. OpenAI probably doesn’t have a lot of cash on hand.
Operating Loss. Their operating loss last year is said to be on the order of $5 billion.
Funding Rumors. OpenAI is trying to raise a lot of money, rumored to be $6.5 to $7 billion, apparently at a $150 billion valuation.
Adobe Video Generation. Adobe adds video generation to Firefly, Anthropic launches AI safety-focused Claude enterprise.
AI Detection Tools. YouTube is developing AI detection tools for music and faces, plus creator controls for AI training.
AI in Japan. Japan's Sakana AI partners Nvidia for research, raises $100M.
Paid Users Milestone. OpenAI Hits 1 Million Paid Users For Business Versions of ChatGPT.
OpenAI Valuation. OpenAI Fundraising Set to Vault Startup's Valuation to $150 Billion.
AI Forecasting Competitors. New AI forecasting bot competes with veteran human forecasters.
Llama 3 Performance. Llama 3 8B excels with synthetic tokens; AI-generated ideas deemed more novel.
OpenAI O1 Models. OpenAI's O1 and O1 mini models boast advanced reasoning and longer responses.
AI Bias Issues. AI tends to replicate the biases in your systems, limiting creativity and diversity in content production.
Camus and Rebellion. Albert Camus's philosophy encourages embracing life for what it is, and thus living a richer, more fulfilling life.
Navigating Hyperreality. The path of least resistance leads to passive consumption and an acceptance of superficiality in our interactions with media.
Linguistic Diversity Decline. Our findings reveal a consistent decrease in the diversity of the model outputs through successive iterations.
Diversity in Content Creation. Large language models have led to a surge in collaborative writing with model assistance, risking decreased diversity in the produced content.
Critical Thinking Outsourcing. Outsourcing critical thinking to AI systems can lead to reducing individual cognitive engagement and understanding.
Generative AI Limitations. An overreliance on GenAI will lead to people outsourcing their thinking to GPT, reducing their critical thinking abilities.
Information Overload Effects. We live in a world where there is more and more information, and less and less meaning.
Simulacra and Reality. Jean Baudrillard argued that our relationship with reality is mediated through signs and symbols, which have become detached from any underlying truth.
AI in Moderation Issues. AI-based moderation can impose arbitrary standards that limit the distribution of some kinds of content and promote others, eroding creativity and critical thinking.
Comparison to WeWork. Gary Marcus has repeatedly warned that OpenAI might someday be seen as the WeWork of AI.
Investor Caution. Investors shouldn't be pouring more money in at higher valuations; they should be asking what is going on.
Valuation Concerns. Yet people are valuing this company at $150 billion.
Absence of Product Releases. GPT-5 hasn’t dropped, Sora hasn’t shipped.
Massive Operating Loss. The company had an operating loss of $5 billion last year.
Co-founder Departures. From left to right that’s Ilya Sutskever (now gone, less than a year later), Greg Brockman (on leave, at least until the end of the year), CTO Mira Murati (departure just announced) and Sam Altman (fired, and then rehired).
Iconic Magazine Covers. This one, from last September, may soon become just as iconic.
Retracted Statement. I am retracting our earlier statement that OpenAI deliberately cherry-picked the medical diagnostic example to make o1 seem better than it is.
Over-Diagnosing Rare Condition. I also noticed that GPT seems to over-estimate the probability of (and thus over-diagnose) a very rare condition, which is a major red flag and must be studied further.
Transparency in Results. Anyone claiming to have a powerful foundation model for these tasks should be sharing their evals.
OpenAI Responsibility. I think technical solution providers have a duty to make users clearly aware of any limitations upfront.
High Probability Concerns. Given the low prior probability, I am naturally suspicious of any system that weighs it this highly.
Need for Cross-Validation. I didn’t think I would have to teach OAI folk the importance of cross-validation.
Testing Methodology Issues. B mentioned that they had tested o1 (main) on the prompt a bunch of times; o1 always returned the same output (KBG).
Judging OpenAI Performance. OAI promotes the performance of its model without acknowledging massive limitations.
Models Need Clarification. For diagnostic purposes, it is better to offer models that provide probability distributions plus deep insights so that doctors can make their own call.
Inconsistent Diagnosis. Running the prompt (making a diagnosis based on a given phenotype profile) on ChatGPT o1 leads to inconsistent diagnoses.
Floating Harbor Syndrome Rarity. Floating Harbor syndrome has been recorded in fewer than 50 people ever.
Publish Testing Outputs. In the future, I think any group making claims of great performance on Medical Diagnosis must release their testing outputs on this domain.
Caution in Medical Use. This (and its weird probability distributions for diseases) leads me to caution people against using o1 in medical diagnosis.
Luma's Dream Machine API. Text-to-video startup Luma AI has announced an API for its Dream Machine video generation model which allows users to build applications and services using Luma's video generation model.
California's Deepfake Legislation. Governor Newsom signs bills to combat deepfake election content, including legislation to protect the digital likeness of actors and performers.
White House AI Task Force. White House launches AI data center task force with industry experts to address massive infrastructure needs for artificial intelligence projects.
Lionsgate's AI Ambition. Lionsgate has announced a partnership with Runway to develop an AI model that can generate 'cinematic video' and potentially replace storyboard artists and VFX crews.
Runway's API Offerings. Runway, an AI startup that is also focused on video creation, has launched its own API that allows developers to integrate its generative models into third-party platforms, currently offering its Gen-3 Alpha Turbo model with two pricing plans.
James Earl Jones Controversy. James Earl Jones' decision to use AI to preserve his voice as Darth Vader raises concerns among actors about the potential impact on their work and the need for consent and compensation transparency.
AI Reducing Hospital Deaths. AI-based early warning system at St. Michael's Hospital in Toronto, called Chartwatch, has led to a 26% decrease in unexpected deaths among hospitalized patients.
Copilot Wave 2 Launch. Microsoft's Copilot AI chatbot, now in its 'Wave 2' phase, enhances productivity in Microsoft 365 apps by enabling collaborative document creation, narrative building in PowerPoint, and intelligent email summarization in Outlook.
1X Technologies Innovation. Norwegian startup 1X Technologies has developed an AI-based world model to serve as a virtual simulator for training robots, addressing the challenge of reliably evaluating multi-task robots in dynamic environments.
OpenAI Testimony. OpenAI whistleblower William Saunders testified that the company has 'repeatedly prioritized speed of deployment over rigor.'
Loss of Trust. A single negative encounter can drastically undermine their perception of the AI’s reliability and hinder human-AI collaboration.
Truth in AI. The solution for AI in healthcare is simple: Give clinicians the probabilities of your answers or start developing models that are capable of saying 'I don’t know!'
Benchmark Concerns. Benchmarks are gameable and aren't representative of the complexities found in real-world applications.
AgentClinic-MedQA Claims. AgentClinic-MedQA claims Strawberry is the top choice for medical diagnostics.
o1 Model Claims. What’s fascinating is that, for the first time ever, a foundational model—without any fine-tuning on medical data—is offering a medical diagnosis use case in its new release!
AI in Healthcare. This might be a viable strategy for a fledgling fintech startup, but it’s reckless and dangerous for a company like OpenAI, especially when they now promote applications in critical areas like medical diagnostics.
Lean Startup Culture. Since its inception, OpenAI has embraced a 'Lean Startup' culture—quickly developing an MVP (Minimum Viable Product), launching it into the market, and hoping something 'sticks.'
Hallucinations in Diagnosis. The o1 'Strawberry' model rationalizes misdiagnoses, presenting what is simply wrong information; this is especially dangerous in clinical decision-making.
o1 Model Diagnosis. My verdict: even the best large language models (LLMs) we have are not ready for prime time in healthcare.
Sergei Polevikov Introduction. Today’s guest author is Sergei Polevikov, a Ph.D.-trained mathematician, data scientist, AI entrepreneur, economist, and researcher with over 30 academic manuscripts.
Guest Insights. In the series Guests, I will invite these experts to come in and share their insights on various topics that they have studied/worked on.
Chocolate Milk Cult. Our chocolate milk cult has a lot of experts and prominent figures doing cool things.
Tom Hanks Warning. Tom Hanks warns followers to be wary of 'fraudulent' ads using his likeness through AI.
China's Chip Advancements. China's chip capabilities are reportedly just 3 years behind TSMC, showcasing rapid advancements.
Investment in AI Companies. Ilya Sutskever's startup, Safe Superintelligence, raises $1B, signaling strong investor confidence in AI.
AI Regulation in California. California's pending AI regulation bill highlights growing governmental interest in AI oversight.
AI Training Advances. Advances in training language models with long-context capabilities are emerging in the AI landscape.
OpenAI Hardware Move. OpenAI's move into hardware production is a significant development for the company.
Amazon AI Robotics. Amazon's strategic acquisition in AI robotics is a notable event in the industry.
Fostering Innovation. OSS leads to cheaper, safer, and more accessible products, all benefiting end users.
Diverse Contributor Benefits. OSS attracts a diverse set of contributors, leading to more efficient and innovative solutions.
Invest in Community Building. It is critical for any group to invest in creating a developer-friendly open-source project through comprehensive documentation and community engagement.
Ecosystem Development. Collaborating with other organizations to create integrated AI solutions expands market opportunities.
Training and Support. Providing training and certification in open-source AI frameworks can also generate revenue and build a community of skilled users.
OSS and Innovation. Open-source projects tend to explore more novel directions, lacking the short-term profit motives of traditional companies.
Micro and Macro Impact. OSS is really good at solving big, important problems that affect tons of people.
Benefits of Sharing. Companies that share their software get better street cred, outsource a lot of R&D to people for free, and hook more people into their ecosystem.
Cost Reduction Strategies. Adopting preexisting OS tools allows companies to reduce costs, build more secure systems, and iterate quickly.
End-User Benefits. End-users benefit from AI-powered applications that are improved through open-source collaboration.
Developer Portfolio Boost. Participation in open-source AI projects enhances career prospects as developers build public portfolios showcasing expertise in a highly competitive field.
Complementary Forces. Open and Closed Software are often complementary forces that are blended together to create a useful end product.
Open Source Investment. Companies invest significantly in open-source software (OSS) for enhanced innovation and competitive advantage.
Reasoning Capabilities. OpenAI describes this release as a 'preview,' highlighting its early-stage nature, and positioning o1 as a significant advancement in reasoning capabilities.
Autonomous AI Agents. 1,000 autonomous AI agents collaborate to build their own society in a Minecraft server, forming a merchant hub and establishing a constitution.
Humanoid Robot Development. A robotics company in Silicon Valley has made significant progress in developing humanoid robots for real-world work scenarios.
DataGemma Introduction. Google introduces DataGemma, a pair of open-source AI models that address the issue of inaccurate answers in statistical queries.
Adobe Firefly Milestone. Adobe's Firefly Services, the company's AI-driven innovation, has reached a milestone of 12 billion generations.
Runway AI Upgrade. AI video platform RunwayML has introduced a new video-to-video tool in its latest model, Gen-3 Alpha.
Corporate Structure Change. Sam Altman announced that the company's non-profit corporate structure will undergo changes in the coming year, moving away from being controlled by a non-profit.
AI Potential Advancements. These models represent a major leap forward in AI’s problem-solving potential, paving the way for new advancements in fields like medicine, engineering, and advanced coding tasks.
OpenAI o1 Model. OpenAI has introduced this new model as part of a planned series of 'reasoning' models aimed at tackling complex problems more efficiently than ever before.
API Costs High. For developers, however, it’s worth noting that the model takes much longer to produce outputs and the API costs for o1 are significantly higher than GPT-4o.
Training Approach. What sets o1 apart is its training approach—unlike previous GPT models, which were trained to mimic data patterns, o1 uses reinforcement learning to think through problems, step by step.
Microsoft's Usage Caps. Microsoft's Inflection adds usage caps for Pi, and Cerebras Systems launches new AI inference services competing with Nvidia.
AI Advancements. Google's AI advancements with Gemini 1.5 models and AI-generated avatars, along with Samsung's lithography progress.
U.S. Restrictions on China. U.S. gov't tightens China restrictions on supercomputer component sales.
Chinese GPU Access. Chinese Engineers Reportedly Accessing NVIDIA's High-End AI Chips Through Decentralized 'GPU Rental Services'.
Elon Musk's Support. Elon Musk voices support for California bill requiring safety tests on AI models.
Poll on SB1047. Poll: 7 in 10 Californians Support SB1047, Will Blame Governor Newsom for AI-Enabled Catastrophe if He Vetoes.
AI Regulation. AI regulation discussions including California's SB1047, China's AI safety stance, and new export restrictions impacting Nvidia's AI chips.
Bias in AI. Biases in AI, prompt leak attacks, and transparency in models and distributed training optimizations, including the 'distro' optimizer.
Marcus' Dream. Gary Marcus continues to dream of a day in which AI research doesn't center almost entirely around LLMs.
Update on Strawberry. OpenAI's latest model, GPT o1, code-named Strawberry, came out.
GPT-4 Prediction. “Still flawed, still limited, seem more impressive on first use”. Almost exactly what I predicted we would see with GPT-4, back on Christmas Day 2022.
Synthetic Data Dependence. The new system appears to depend heavily on synthetic data, and such data may be easier to produce in some domains (such as those in which o1 is most successful, like some aspects of math) than in others.
Altman's AGI Stance. Altman had, much to my surprise, just echoed my longstanding position that current techniques alone would not be enough to get to AGI.
Content Recommendations. I figured I’d start sharing whatever AI Papers/Publications, interesting books, videos, etc I came across each week.
AI Summary Study. The reviewers’ overall feedback was that they felt AI summaries may be counterproductive and create further work because of the need to fact-check and refer to original submissions.
Green Powders Marketing. Good video on the misleading marketing behind Green Powders.
Roaring Bitmaps Impact. By storing these indices as Roaring bitmaps, we are able to evaluate typical boolean filters efficiently, dramatically reducing latencies.
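As a toy illustration of that boolean-filter pattern, here is a minimal sketch using the pyroaring package; the document IDs and attributes are made up:

```python
# Toy illustration of evaluating boolean filters over document indices
# with Roaring bitmaps (requires `pip install pyroaring`; data is made up).
from pyroaring import BitMap

# Posting lists: which document IDs match each attribute.
has_tag_ai = BitMap([1, 5, 9, 100_000, 250_000])
is_published = BitMap(range(0, 300_000, 2))   # every even doc ID
in_english = BitMap(range(0, 300_000, 3))     # every third doc ID

# Typical filter: tag:ai AND published AND english — one bitmap intersection.
matches = has_tag_ai & is_published & in_english
print(list(matches))
```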
AI Adoption Barriers. Until the liabilities and responsibilities of AI models for medicine are clearly spelled out via regulation or a ruling, the default assumption of any doctor is that if AI makes an error, the doctor is liable for that error, not the AI.
AI in Clinical Diagnosis. Doctors bear a lot of risk for using AI, while model developers don’t.
Freedom of Speech Analysis. Tobias Jensen discusses content moderation on social media platforms and recent cases that trend toward preventing the harms that can be (and have been) caused by social media messages not being regulated properly.
Highlighting Important Works. I’m going to highlight only two since they bring up extremely important discussions, and I want to get your opinions on them.
Next Planned Articles. Boeing, DEI, and 9 USD Engineers.
Survey Participation. Fred Graver is looking into understanding the demand for content around AI and is asking people to fill out a survey.
California AI Bill. The controversial California bill SB 1047, aimed at preventing AI disasters, has passed the state's Senate and is now awaiting Governor Gavin Newsom's decision.
Waymo Collision Data. Waymo's driverless cars have been involved in fewer injury-causing crashes per million miles of driving than human-driven vehicles.
AI Image Creation. AI has led to the creation of over 15 billion images since 2022, with an average of 34 million images being created per day.
Global AI Treaty. US, EU, and UK sign the world's first international AI treaty, emphasizing human rights and democratic values as key to regulating public and private-sector AI models.
Music Producer Arrested. Music producer arrested for using AI and bots to boost streams and generate AI music, facing charges of money laundering and wire fraud.
AI in Healthcare. Google DeepMind has launched AlphaProteo, an AI system that generates novel proteins to accelerate research in drug design, disease understanding, and health applications.
Ilya Sutskever Funding. Safe Superintelligence (SSI), an AI startup co-founded by Ilya Sutskever, has successfully raised over $1 billion in funding.
OpenAI AI Chips. OpenAI is reportedly planning to build its own AI chips using TSMC's forthcoming 1.6nm A16 process node, according to United Daily News.
iPhone 16 Launch. Apple has unveiled its iPhone 16 line, which includes the iPhone 16, iPhone 16 Plus, iPhone 16 Pro, and iPhone 16 Pro Max, all designed with Apple Intelligence in mind.
AI Impacts on Society. AI is likely to change the world in coming years, affecting virtually every aspect of society, from employment to education to healthcare to national defense.
Future Responsibility. It will be our fault if candidates don’t address AI policy; they certainly aren’t going to bother to talk about it if we don’t let them know it matters.
Call for Clarity. In an ideal world, moderators would demand clarity on candidates' policies around AI.
Vulnerability of Teens. Nonconsensual deep fake porn may especially affect the already vulnerable population of teenage girls, who have been harmed by social media.
AI Policy Neglect. A total neglect of AI policy would be deeply unfortunate; our long-term future may actually be shaped more by AI policy than by tariffs.
Candidates' AI Plans. It would be a really good time to demand better [AI policies] from candidates; if we don’t, future generations may regret it.
Foundation Model Size. Aurora is a 1.3-billion-parameter foundation model for environmental forecasting.
Predictive Modeling Framework. The authors have created a fine-tuning process that allows Aurora to excel at both short-term and long-term predictions.
Replay Buffer Mechanism. Aurora implements a replay buffer, allowing the model to learn from its own predictions, improving long-term stability.
Energy-Efficient Fine-Tuning. LoRA introduces small, trainable matrices to the attention layers, allowing Aurora to fine-tune efficiently while significantly reducing memory usage.
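For readers unfamiliar with LoRA, here is a generic PyTorch sketch of the idea (illustrative, not Aurora's actual implementation): freeze the pretrained weight and learn only a small low-rank update.

```python
# Generic LoRA linear layer (illustrative; not Aurora's implementation).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # pretrained weights stay frozen
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Effective weight is W + (alpha/r) * B @ A, computed without
        # materializing the full update matrix.
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(512, 512))
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # only A and B train
```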
Variable Weighting Methodology. Aurora uses variable weighting, where different weights are assigned to different variables in the loss function to balance their contributions.
Rollout Fine-tuning Importance. Rollout fine-tuning addresses the challenge by training Aurora on sequences of multiple predictions, simulating the chain reaction of weather events over time.
Training with MAE. Mean Absolute Error (MAE) is used as the training objective, which is robust to outliers.
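Combining the variable-weighting and MAE items above, a per-variable weighted MAE objective might look like the following minimal sketch; the shapes and weights are assumptions for illustration, not Aurora's actual configuration:

```python
# Sketch of a per-variable weighted MAE loss (illustrative values, not Aurora's).
import torch

def weighted_mae(pred: torch.Tensor, target: torch.Tensor,
                 weights: torch.Tensor) -> torch.Tensor:
    """pred/target: (batch, n_vars, H, W); weights: (n_vars,)."""
    per_var = (pred - target).abs().mean(dim=(0, 2, 3))  # MAE per variable
    return (per_var * weights).sum() / weights.sum()

# Hypothetical weights for [temperature, wind_u, wind_v, pressure].
w = torch.tensor([2.0, 1.0, 1.0, 1.5])
loss = weighted_mae(torch.randn(8, 4, 32, 64), torch.randn(8, 4, 32, 64), w)
print(loss)
```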
U-Net Architecture. The U-Net architecture allows for multi-scale processing, enabling the model to simultaneously understand local weather patterns and larger-scale atmospheric phenomena.
Swin Transformer Benefits. Swin Transformers excel at capturing long-range dependencies and scaling to large datasets, which is crucial for weather modeling.
Impact of Underreporting. Aurora got almost no attention, indicating a serious misplacement of priorities in the AI Community.
Community Awareness Gap. The ability of foundation models to excel at downstream tasks with scarce data could democratize access to accurate weather and climate information in data-sparse regions, such as the developing world and polar regions.
Sandstorm Prediction. Aurora was able to predict a vicious sandstorm a day in advance, which can be used in the future for evacuations and disaster planning.
Limited Data Handling. Aurora leverages the strengths of the foundation modelling approach to produce operational forecasts for a wide variety of atmospheric prediction problems, including those with limited training data, heterogeneous variables, and extreme events.
Advanced Predictive Capabilities. In under a minute, Aurora produces 5-day global air pollution predictions and 10-day high-resolution weather forecasts that outperform state-of-the-art classical simulation tools and the best specialized deep learning models.
Energy Demand Increase. Demand is increasing, and the question is what bottlenecks will be alleviated to fulfill that demand.
Long-Term Value Creation. Value will be created in unforeseen ways.
Infrastructure Creation. AI application will not generate a net positive ROI on infrastructure buildout for some time.
AI Model Revenues. Our best indication of AI app revenue comes from model revenue (OpenAI at an estimated $1.5B in API revenue).
Data Center Demand. Theoretically, value should flow through the traditional data center value chain.
AI Total Expenditures. The cloud revenue gives us the real indication of how much value is being invested into AI applications.
AI Application Revenue. AI applications have generated a very rough estimate of $20B in revenue with multiples higher than that in value creation so far.
Nvidia Revenue. Last quarter, Nvidia did $26.3B in data center revenue, with $3.7B of that coming from networking.
Power Scarcity. Hyperscalers will build out power and data-center capacity themselves or through a developer like QTS, Vantage, or CyrusOne.
Compute Power Concerns. All three hyperscalers noted they’re capacity-constrained on AI compute power.
Application Value. ROI on AI will ultimately be driven by application value to end users.
Hyperscaler Decisions. Hyperscalers are making the right CapEx business decisions.
No Clear ROI. There’s not a clear ROI on AI investments right now.
AI ROI Debate. For the first time in a year and a half, common opinion is now shifting to the narrative 'Hyperscaler spending is crazy. AI is a bubble.'
CapEx Growth. Amazon, Google, Microsoft, and Meta have spent a combined $177B on capital expenditures over the last four quarters.
100K Readers. Help me democratize the most important ideas in AI Research and Engineering to over 100K readers weekly.
New Paradigms in NLP. Sebastian Raschka discusses recent pre-training and post-training paradigms in NLP models, highlighting significant new techniques.
LLM Performance Restrictions. Imposing formatting restrictions on LLMs leads to performance degradation, impacting reasoning abilities significantly.
Risks of Synthetic Training. Training language models on synthetic data leads to a consistent decrease in the diversity of the model outputs through successive iterations.
Standardizing Text Diversity. This work empirically investigates diversity scores on English texts and provides a diversity score package to facilitate research.
Impact of LLMs on Diversity. Writing with InstructGPT results in a statistically significant reduction in diversity.
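One simple member of the diversity-score family used in this line of work is distinct-n (unique n-grams divided by total n-grams); the following is a minimal stand-in sketch, not the paper's released package:

```python
# Minimal distinct-n lexical diversity score (a simplified stand-in for the
# fuller diversity-score package the paper provides).
def distinct_n(texts: list[str], n: int = 2) -> float:
    total, unique = 0, set()
    for text in texts:
        tokens = text.split()
        ngrams = list(zip(*(tokens[i:] for i in range(n))))
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / total if total else 0.0

outputs = ["the cat sat on the mat", "the cat sat on the rug"]
print(distinct_n(outputs, n=2))  # lower values indicate more repetitive outputs
```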
Dimension Insensitive Metric. This paper introduces the Dimension Insensitive Euclidean Metric (DIEM) which demonstrates superior robustness and generalizability across dimensions.
Previews of Articles. Upcoming articles include 'The Economics of ESports' and 'The Economics of Open Source.'
Notable Content Creator. Artem Kirsanov produces high-quality videos on computational neuroscience and AI, and offers very new ideas/perspectives for traditional Machine Learning people.
Support for Writing. Doing so helps me put more effort into writing/research, reach more people, and supports my crippling chocolate milk addiction.
Authors' Lawsuit. Authors sue Claude AI chatbot creator Anthropic for copyright infringement.
California AI Bill Weakening. California weakens bill to prevent AI disasters before final vote, taking advice from Anthropic.
Anysphere Funding. Anysphere, a GitHub Copilot rival, has raised a $60M Series A at a $400M valuation from a16z and Thrive, sources say.
OpenAI's New Deal. Ars Technica content is now available in OpenAI services.
AMD Acquisition. AMD buying server maker ZT Systems for $4.9 billion as chipmakers strengthen AI capabilities.
California Regulation. Analysis of California's AI regulation bill SB1047 and legal issues related to synthetic media, copyright, and online personhood credentials.
AI Model Scaling. Exploration of the feasibility and investment needed for scaling advanced AI models like GPT-4 and Agent Q architecture enhancements.
Perplexity Updates. Perplexity's integration of Flux image generation models and code interpreter updates for enhanced search results.
New AI Features. Ideogram AI's new features, Google's Imagine 3, Dream Machine 1.5, and Runway's Gen3 Alpha Turbo model advancements.
Episode Summary. Our 180th episode with a summary and discussion of last week's big AI news!
Social Media Trends. Advice for content creators often revolves around imitating successful content rather than fostering unique voices, contributing to conformity.
Conformity in Media. Social media and content creation platforms, initially designed for authentic expression, often lead to a relentless drive toward sameness and conformity.
Democracy and Conformity. Tocqueville observed that democratic societies foster a sense of equality among citizens, which can lead to pressure for conformity, homogenizing thought, expression, and behavior.
Need for Critical Diversity. When people lose exposure to diverse viewpoints, their capacity to visualize alternatives diminishes, reinforcing conformity.
Intellectual Homogeneity. A populace that is intellectually homogenous tends to rely on external sources for solutions, sacrificing personal agency and responsibility.
Over-Reliance on Institutions. Tocqueville noticed a tendency for citizens to increasingly rely on the government under the expectation that an elected government should solve societal problems.
Collective Action Importance. The OSS movement in tech allows people to find their communities and contribute, emphasizing the importance of collective small contributions leading to significant shifts.
Voluntary Associations. Tocqueville noted that Americans constantly form associations for various purposes, which serve as a powerful tool for collective action and public benefit.
Local Community Power. Tocqueville saw voluntary organizations and local community groups as crucial to counterbalance the negative tendencies of democracy.
Tyranny of the Majority. In modern democracies, tyranny manifests through social ostracism rather than physical oppression, leading to self-censorship and a society of self-oppressors.
Agency and Accountability. Tocqueville emphasizes the importance of people accepting agency and accountability for their information diet instead of relying on institutions.
Mental Health and Misinformation. We cry for the government or social media companies to do something about worsening mental health and the spread of misinformation, but how many of us have acted positively on these platforms?
Personal Responsibility. We often expect institutions to make systemic changes without acknowledging the importance of individual responsibility in taking actions that lead to systemic change.
AI Ethical Concerns. Google DeepMind employees are urging the company to end military contracts due to concerns about AI technology used for warfare.
Open-Source AI Definition. Open-source AI is defined as a system that can be used, inspected, modified, and shared without restrictions.
Authors Sue Anthropic. Authors are suing AI startup Anthropic for using pirated texts to train its chatbot Claude, alleging large-scale theft.
AI in Ad Creation. Creatopy, which automates ad creation using AI, has raised $10 million and now serves over 5,000 brands and agencies.
Google's AI Image Generator. Google has released a powerful AI image generator, Imagen 3, for free use in the U.S., outperforming other models.
Content Partnership. OpenAI has partnered with Condé Nast to display content from its publications within AI products like ChatGPT and SearchGPT.
OpenAI's Regulatory Stance. OpenAI has opposed the proposed AI bill SB 1047 aimed at implementing safety measures, despite public support for regulation.
California AI Regulation. Anthropic's CEO supports California's AI bill SB 1047, stating the benefits outweigh the costs, despite some concerns.
AI for Coding Tasks. Open source Dracarys models are specifically designed to optimize coding tasks and significantly improve performance of existing models.
Advanced Long-Context Models. AI21's Jamba 1.5 Large model has demonstrated superior performance in latency tests against similar models.
Outperforming Competitors. Microsoft's Phi-3.5 outperforms other small models from Google, OpenAI, Mistral, and Meta on several key metrics.
Efficient Small Models. Nvidia's Llama-3.1-Minitron 4B performs comparably to larger models while being more efficient to train and deploy.
Optimizations in Distance Measurement. FINGER significantly outperforms existing acceleration approaches and conventional libraries by 20% to 60% across different benchmark datasets.
Integration of Graph-Based Indexes. Given that we’re already working on graphs, another promising direction for us has been integrating graph-based indexes and search.
User Verification. By letting our users both verify and edit each step of the AI process, we let them make the AI adjust to their knowledge and insight, instead of asking them to change for the tool.
Focus on Transparency. Model transparency is crucial as a few trigger words/phrases can change the meaning/implication of a clause; users need to have complete insight into every step of the process.
Leveraging Control Tokens. We use control tokens, which are special tokens to indicate different types of elements, enhancing our tokenization process.
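For illustration, registering control tokens with a Hugging Face tokenizer looks roughly like this; the token names here are hypothetical, not IQIDIS's actual scheme:

```python
# Illustrative control-token setup (token names are hypothetical, not
# IQIDIS's actual scheme). Requires the `transformers` package.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
control_tokens = ["<CLAUSE>", "<CITATION>", "<DEFINITION>"]
tokenizer.add_special_tokens({"additional_special_tokens": control_tokens})

# Control tokens survive tokenization as single units, marking element types.
text = "<CLAUSE> The lessee shall... <CITATION> 42 U.S.C. 1983"
print(tokenizer.tokenize(text))
# A downstream model would then call model.resize_token_embeddings(len(tokenizer)).
```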
Flexible Indexing Approach. Updating the indexes with new information is much cheaper than retraining your entire AI model. Index-based search also allows us to see which chunks/contexts the AI picks to answer a particular query.
Hallucinations in AI. Type 1 Hallucinations are not a worry because our citations are guaranteed to be from the data source, and Type 2 Hallucinations will be reduced significantly through our unique process of constant refinement.
Focus on User Feedback. Our unique approach to involving the user in the generation process leads to a beautiful pair of massive wins against Hallucinations.
Reducing Costs. Relying on a smaller, Mixture of experts style setup instead of letting bigger models do everything reduces our costs dramatically, allowing us to do more with less.
Flexibility in Architecture. The best architecture is useless if it can't fit into your client's processes. Being Lawyer-Led, IQIDIS understands the importance of working within a lawyer's/firm's workflow.
KI-RAG Challenges. Building KI-RAG systems requires a lot more handling and constant maintenance, making them more expensive than traditional RAG.
Handling Legal Nuances. There is a lot of nuance to Law. Laws can change between regions, different sub-fields weigh different factors, and a lot of law is done in the gray areas.
High Cost of Mistakes. A mistake can cost a firm millions of dollars in settlements and serious loss of reputation. This high cost justifies the investment into better tools.
Importance of RAG. RAG is one of the most important use-cases for LLMs, and the goal is to build the best RAG systems possible.
Cost of Legal Expertise. Legal Expertise is expensive. If a law firm can cut down the time required for a project by even a few hours, they are already looking at significant savings.
Need for Higher Adaptability. Building upon this is a priority after our next round of fund-raising (or for any client that specifically requests this).
End User Engagement. Users can inspect multiple alternative paths to verify the quality of secondary/tertiary relationships.
Obsession with User Feedback. I’d be lying if I said that there is one definitive approach (or that what we’ve done is absolutely the best approach).
Machine Learning in Legal Domain. These are the main aspects of the text-based search/embedding that are promising based on research and our own experiments.
Governance Concerns. Internal governance is key; it shouldn't be just one person at the top of one company calling the shots for all humanity.
Legislation Improvement. Saunders did not think SB-1047 was perfect but said the proposed legislation was 'the best attempt I've seen to provide a check on this power.'
Employee Discontent. Promises have been made and not kept; they lost faith in Altman personally, and have lost faith in the company's commitment to AI safety.
Power Corrupts. If we don't figure out the governance problem, internal and external, before the next big AI advance, we could be in serious trouble.
Timelines for AGI. Saunders thinks it is at least somewhat plausible we will see AGI in a few years; I do not.
Need for Regulation. If OpenAI (and others in Silicon Valley) succeed in torpedoing SB-1047, self-regulation is in many ways what we will be left with.
Call for Accountable Power. Saunders described as a metaprinciple, 'Don't give power to people or structures that can't be held accountable.'
OpenAI's Opposition. OpenAI has just announced that it is opposed to California's SB-1047, despite Altman's public support for AI regulation in his Senate testimony.
Future Whistleblower Protections. One of the most important reasons for passing SB-1047 in California was its whistleblower protections.
External Oversight Needed. There should be a role for external governance, as well: companies should not be able to make decisions of potentially enormous magnitude on their own.
AI Codec Proposal. Using canonical codec representations like JPEG, this article proposes a method to directly model images and videos as compressed files, showing its effectiveness in image generation.
Deepfake Scams. Elderly retiree loses over $690,000 to digital scammers using AI-powered deepfake videos of Elon Musk to promote fraudulent investment opportunities.
Procreate Stance. Procreate vows to never incorporate generative AI into its products, taking a stand against the technology.
US AI Lead. US leads in AI investment and job postings, surpassing China and other countries.
AI Image Licensing. OpenAI CEO's warning about the use of copyrighted content in AI models is highlighted as Anthropic faces a lawsuit for training its Claude AI model using authors' work without consent.
AI Risks Repository. MIT researchers release a comprehensive AI risk repository to guide policymakers and stakeholders in understanding and addressing the diverse and fragmented landscape of AI risks.
Research Automation Phases. The AI Scientist operates in three phases: idea generation, experimental iteration, and paper write-up.
AI Scientist Development. "The AI Scientist" is a novel AI system designed to automate the entire scientific research process.
AI Artist Claim Approved. The judge allowed a copyright claim against DeviantArt, which used a model based on Stable Diffusion.
Lawsuit Progress. The lawsuit against AI companies Stability and Midjourney, filed by a group of artists alleging copyright infringement, has gained traction as Judge William Orrick approved additional claims.
Conversational Features. Gemini Live can also interpret video in real time and function in the background or when the phone is locked.
Gemini Live Introduction. Google has introduced a new voice chat mode for its AI assistant, Gemini, named Gemini Live.
AI-driven Features. The company plans to deploy Grok-2 and Grok-2 mini in AI-driven features on X, including improved search capabilities, post analytics, and reply functions.
Image Tolerance. Compared to other image generators on the market, the model is far more permissive with regard to what images it can generate.
Image Generation Capabilities. Grok has also integrated FLUX.1 by Black Forest Labs to enable users to generate images.
Premium Access. Access to Grok is currently limited to Premium and Premium+ users.
Grok-2 Release. Elon Musk's company, X, has launched Grok-2 and Grok-2 mini in beta, both of which are AI models capable of generating images on the X social network.
Research Engineer Openings. Haize Labs is looking for research scientists to join their teams based in NYC.
Shoutout.io Page. Shoutout.io is a very helpful tool that allows independent creators to gather testimonials in one place.
Case Study Articles. I’d like to do more case-study-style articles, where we look into different organizations to study how they solved their business/operational challenges with AI.
Guest Posts Initiative. I want to integrate more guest posts in this newsletter to cover a greater variety of topics and hear from experts across the board.
Encouragement to Apply. We encourage you to apply even if you do not believe you meet every single qualification: We’re open to considering a wide range of perspectives and experiences.
Prompt Caching Launch. Prompt Caching is Now Available on the Anthropic API for Specific Claude Models.
AI Search Evolution. Google's AI-generated search summaries change how they show their sources.
Risks of Unaligned AI. Overview of potential risks of unaligned AI models and skepticism around SingularityNet's AGI supercomputer claims.
Huawei's AI Chip. Huawei's Ascend 910C AI chip aims to rival NVIDIA's H100 amidst US export controls.
Grok 2 Beta Release. Grok 2's beta release features new image generation using Black Forest Labs' tech.
Google Voice Chat Feature. Google introduces Gemini Voice Chat Mode available to subscribers and integrates it into Pixel Buds Pro 2.
Deepfake Scams. How ‘Deepfake Elon Musk’ Became the Internet's Biggest Scammer.
FCC AI Robocall Rules. FCC Proposes New Rules on AI-Powered Robocalls.
MIT AI Risks Repository. MIT researchers release a repository of AI risks.
Popular AI Search Startup. Perplexity's popularity surges as AI search start-up takes on Google.
Regulatory Fight. Most or all of the major big tech companies joined a lobbying organization that fought SB-1047, despite broad public support for the bill.
Innovative Balance. Passing SB-1047 may normalize the regulation of AI while allowing for continued innovation, showing that safety precautions are compatible with industry growth.
Need for Federal Legislation. Future state and federal efforts may suffer if the bill doesn't pass, showing that comprehensive regulatory efforts are needed at all levels.
Comprehensive Approach Needed. We need a comprehensive approach to AI regulation, as SB 1047 is just a start in addressing various risks associated with AI.
Whistleblower Protections. The bill provides important whistleblower protections, which are critical for transparency and accountability in AI companies.
Deterrent Value. SB-1047's strongest utility may come as a deterrent, clarifying that the duty to take reasonable care applies to AI developers.
Weak Assurance. The 'reasonable care' standard may be too weak, as billion-dollar companies might exploit it without facing meaningful consequences.
Narrow Focus. SB 1047 seems heavily skewed toward addressing hypothetical existential risks while largely ignoring demonstrable AI risks like misinformation and discrimination.
Legal Standards. The new form of SB 1047 can basically only be used after something really bad happens, as a tool to hold companies liable, rather than prevent risks.
Bill Weakened. California's SB-1047 was significantly weakened in last-minute negotiations, affecting its ability to address catastrophic risks.
AI Adoption by Courts. The Attorney General's Office of São Paulo adopted GPT-4 last year to speed up the screening and reviewing process of lawsuits.
Cautious AI Implementation. Hallucination risks and security and data confidentiality concerns call for tremendous caution and common sense when using and implementing AI tools.
Impact on Legal Services. Legal copilots will inevitably drive down the price of legal services and make legal knowledge more accessible to non-lawyers.
Legal AI Tools' Future. The legal copilots that will succeed should be developed and branded with a focus on time-savings and productivity benefits.
Changing Nature of Legal Work. AI-driven tools will take care of routine, monotonous tasks so lawyers can focus more on strategic, high-value work.
AI Use in Legal Sector. 73% of 700 surveyed lawyers planned to use generative AI in their legal work within the next year.
AI Speed vs Court Speed. High tech runs three times faster than normal businesses, and the government runs three times slower than normal businesses.
Access to Justice Correlation. We can find a strong correlation between the fairness and independence of the court system and the general life quality and well-being of its populace.
Judicial System's Importance. The court system undertakes a vitally important function in society as a central governance mechanism.
GPT-5 Not Released. And no, GPT-5 did not drop this week as many had hoped.
Expectations Reframing. At the very least, I foresee a significant reframing of expectations.
AI Winter Speculation. As for whether there is an AI winter coming, time will tell.
Thoughts on Regulation. My thoughts on regulation are of course coming soon, in my next book (Taming Silicon Valley, now available for pre-order).
Differing Views. Interesting to see where his take and mine differ.
Audio Version Available. There is also an audio-only version here.
Keynote Video. Here's the video (well-produced by Machine Learning Street Talk (MLST)) of a talk I gave on Friday as a keynote at AGI-Summit 24.
Google Antitrust Ruling. Google Monopolized Search Through Illegal Deals, Judge Rules.
California AI Bill Impact. 'The Godmother of AI' says California's well-intended AI bill will harm the U.S. ecosystem.
New Humanoid Robot. Figure's new humanoid robot leverages OpenAI for natural speech conversations.
UK Merger Probe. Amazon faces UK merger probe over $4B Anthropic AI investment.
OpenAI Co-founder Exit. OpenAI co-founder Schulman leaves for Anthropic, Brockman takes extended leave.
Adept AI Returns. Investors in Adept AI will be paid back after Amazon hires startup's top talent.
Character.AI Founders. Google's hiring of Character.AI's founders is the latest sign that part of the AI startup world is starting to implode.
Compute Efficiency Research. Research advancements such as Google's compute-efficient inference models and self-compressing neural networks, showcasing significant reductions in compute requirements while maintaining performance.
Humanoid Robotics Advances. Rapid advancements in humanoid robotics exemplified by new models from companies like Figure in partnership with OpenAI, achieving amateur-level human performance in tasks like table tennis.
OpenAI Changes. OpenAI's dramatic changes with co-founder exits, extended leaves, and new lawsuits from Elon Musk.
Personnel Movements. Notable personnel movements and product updates, such as Character.ai leaders joining Google and new AI features in Reddit and Audible.
Adversarial Perturbations Explained. Adversarial perturbations (AP) are subtle changes to images that can deceive AI classifiers by causing misclassification.
Mass Surveillance Impact. A recent study in The Quarterly Journal of Economics suggests that fewer people protest when public safety agencies acquire AI surveillance software to complement their cameras.
Multi-modal AI Concerns. Despite the potential of multi-modal AI, there is worry regarding its use in mass surveillance and automated weapon systems.
Emerging Adversarial Techniques. Transferability of adversarial examples between models and query-based attacks are vital strategies for black-box settings.
Evolutionary Strategies Potential. Evolutionary algorithms, such as genetic algorithms and differential evolution, show promise for generating adversarial perturbations.
Norm Considerations in Perturbation. Different norms (L1, L2, and L-infinity) significantly impact the outcome and effectiveness of adversarial perturbations.
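As a concrete example, the classic FGSM attack bounds the perturbation under the L-infinity norm; here is a minimal PyTorch sketch with a toy stand-in model:

```python
# Minimal FGSM sketch: an L-infinity-bounded adversarial perturbation.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fgsm(model: nn.Module, x: torch.Tensor, label: torch.Tensor,
         eps: float = 8 / 255) -> torch.Tensor:
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    # Each pixel moves by at most eps: the L-infinity constraint.
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

# Tiny stand-in classifier and image, just to make the sketch runnable.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
x_adv = fgsm(model, torch.rand(1, 3, 8, 8), torch.tensor([3]))
# Under an L2 budget, the step would instead be eps * grad / grad.norm(),
# which distributes the change differently; the chosen norm shapes the attack.
```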
Robust Features Importance. Training on just Robust Features leads to good results, suggesting a generalized extraction of robust features is a valuable future avenue for exploration.
Infectious Jailbreak Feasibility. Feeding an adversarial image into the memory of any randomly chosen agent can achieve an infectious jailbreak, causing all agents to exhibit harmful behaviors exponentially fast.
Agent Smith Attack. The Agent Smith setup involves simulating a multi-agent environment where a single adversarial image can lead to widespread harmful behaviors across almost all agents.
Facial Recognition Use Case. In the U.K., the London Metropolitan Police admitted to using facial recognition technology on tens of thousands of people attending King Charles III's coronation in May 2023.
WeRide IPO Plans. WeRide, a Chinese autonomous vehicle company, is seeking a $5.02 billion valuation in its U.S. IPO, aiming to raise about $96 million from the offering.
Falcon Mamba 7B Launch. The Technology Innovation Institute (TII) has introduced Falcon Mamba 7B, a new large language model that uses a State Space Language Model (SSLM) architecture, marking a shift from traditional transformer-based designs.
Figure 02 Introduction. Figure has introduced its latest humanoid robot, Figure 02, which is designed to work alongside humans in a factory setting.
AI-Driven 3D Generation. A research paper by scientists from Meta and Oxford University introduces VFusion3D, an AI-driven technique capable of generating high-quality 3D models from 2D images in seconds.
New Supercomputing Initiatives. A new supercomputing network aims to accelerate the development of artificial general intelligence (AGI) through a worldwide network of powerful computers.
AI Emotional Attachment Concerns. OpenAI is concerned about users developing emotional attachments to the GPT-4o chatbot, warning of potential negative impacts on human interactions.
Performance Verification. Falcon Mamba 7B has been independently verified by Hugging Face as the top-performing open-source SSLM globally, outperforming established transformer-based models in benchmark tests.
AI Assistant at JPMorgan. JPMorgan Chase has rolled out a generative AI assistant to tens of thousands of its employees, designed to be as ubiquitous as Zoom.
Artists' Lawsuit Progress. A class action lawsuit against AI companies Stability, Runway, and DeviantArt, filed by artists alleging copyright infringement, has been partially approved to proceed by a judge.
AI Law in Europe. The world's first-ever AI law is now enforced in Europe, targeting US tech giants.
AI News Summary. Hosts Andrey Kurenkov and John Krohn dive into significant updates and discussions in the AI world.
Instagram AI Features. Instagram's new AI features allow people to create AI versions of themselves.
Waymo Rollout. Waymo's driverless cars have rolled out in San Francisco.
NVIDIA Chip Issues. Nvidia reportedly delays its next AI chip due to a design flaw.
New AI Tools. Black Forest Labs releases Open-Source FLUX.1, a 12 Billion Parameter Rectified Flow Transformer capable of generating images from text descriptions.
Open-Source AI Stance. The White House says there is no need to restrict 'open-source' artificial intelligence — at least for now.
Misinformation Impact. The impact of misinformation via deepfakes, particularly one involving Elon Musk, is also highlighted.
Common Regulatory Standards. Standards and a degree of care are commonly required across many industries; by contrast, AI systems that could pose catastrophic risks face far fewer regulations.
Clarifications Requested. Concerns about inaccuracies in the essay lead to a request for reconsideration of the stance on SB-1047.
Need for Concrete Suggestions. While the essay favors AI governance, it offers no positive, concrete suggestions for addressing risks such as mass casualties or large-scale cyberattacks.
Concerns on SB-1047. SB-1047 does not require predicting every use of an AI model, but focuses on specific, serious 'critical harms' such as mass casualties and large-scale cyberattacks.
Impact on Little Tech. Much of the bill's requirements are limited to models with training runs of $100 million+, which does not predominantly impact 'little-tech'.
Kill Switch Misunderstanding. The 'kill switch' requirement doesn't apply to open-source models once they are out of the original developer's control.
Chunking Strategy. Sentence-level chunking with a size of 512 tokens, using techniques like 'small-to-big' and 'sliding window', provides a good balance between information preservation and processing efficiency.
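A minimal sketch of that chunking strategy follows (simplified: real systems count tokens with a tokenizer rather than whitespace-split words):

```python
# Sketch of sentence-level chunking with a sliding window (simplified;
# real systems use a tokenizer rather than whitespace word counts).
def chunk_sentences(sentences: list[str], chunk_size: int = 512,
                    overlap: int = 1) -> list[str]:
    chunks, current, length = [], [], 0
    for sent in sentences:
        n = len(sent.split())             # crude proxy for token count
        if current and length + n > chunk_size:
            chunks.append(" ".join(current))
            current = current[-overlap:]  # slide: carry trailing sentences over
            length = sum(len(s.split()) for s in current)
        current.append(sent)
        length += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```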
BERT Accuracy. A BERT-based classifier achieved high accuracy (over 95%) in determining retrieval needs.
Query Classification. Decides if retrieval is needed for a given query, helping keep costs down.
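The control flow looks roughly like this sketch; the keyword heuristic is a runnable stand-in for the paper's BERT classifier, and the llm/retriever callables are assumptions:

```python
# Sketch of the retrieval-need gate (the paper uses a fine-tuned BERT
# classifier reported at >95% accuracy; a trivial keyword heuristic stands
# in here so the control flow is runnable).
def needs_retrieval(query: str) -> bool:
    # Stand-in for classifier.predict(query) == "RETRIEVE".
    knowledge_cues = ("who", "when", "where", "latest", "according to")
    return any(cue in query.lower() for cue in knowledge_cues)

def answer(query: str, llm, retriever) -> str:
    if needs_retrieval(query):
        context = retriever(query)   # hits the vector store
        return llm(f"Context: {context}\n\nQ: {query}")
    return llm(query)                # skip retrieval: cheaper and faster
```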
RAG Advantages. RAG speeds this up by having the AI find relevant contexts and aggregate them.
RAG Definition. Retrieval Augmented Generation involves using AI to search a pre-defined knowledge base to answer user queries.
Cost Considerations. While modern RAG setups (especially generator-heavy ones) are more expensive than V0, the general principle is still useful to keep in mind.
RAG System Recipes. The authors propose two distinct recipes for implementing RAG systems.
Integration Benefits. The query classification module improves the average overall score from 0.428 to 0.443 and reduces latency from 16.41 to 11.58 seconds per query.
RAG vs Fine-Tuning. RAG outperforms fine-tuning with respect to injecting new sources of information into an LLM's responses.
Fine-Tuning Focus. It's best to keep new learning/information mainly in the data indexing rather than in fine-tuning.
Retrieval Methods Findings. The authors recommend monoT5 as a comprehensive method balancing performance and efficiency.
Hybrid Retrieval Success. Hybrid search, combining sparse and dense retrieval with HyDE, achieves the best retrieval performance.
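One common way to fuse sparse and dense scores is min-max normalization followed by a weighted blend; this sketch is illustrative, and the paper's exact weighting may differ:

```python
# Sketch of hybrid retrieval score fusion (illustrative; the paper's exact
# weighting scheme may differ). Scores are min-max normalized, then blended.
def hybrid_scores(sparse: dict[str, float], dense: dict[str, float],
                  alpha: float = 0.5) -> dict[str, float]:
    def norm(d: dict[str, float]) -> dict[str, float]:
        lo, hi = min(d.values()), max(d.values())
        return {k: (v - lo) / (hi - lo) if hi > lo else 0.0 for k, v in d.items()}
    s, d = norm(sparse), norm(dense)
    return {doc: alpha * s.get(doc, 0.0) + (1 - alpha) * d.get(doc, 0.0)
            for doc in set(s) | set(d)}

bm25 = {"doc1": 12.3, "doc2": 8.1, "doc3": 4.4}      # sparse (e.g., BM25)
cosine = {"doc1": 0.62, "doc2": 0.81, "doc4": 0.77}  # dense (e.g., HyDE embedding)
print(sorted(hybrid_scores(bm25, cosine).items(), key=lambda kv: -kv[1]))
```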
Airbnb Architecture Shift. In 2018, Airbnb began its migration to a service-oriented architecture due to challenges with maintaining their Ruby on Rails 'monorail'.
Vocab Size Research. Research indicates that larger models deserve larger vocabularies, and increasing vocabulary size consistently improves downstream performance.
Confabulation Perspective. Hallucinations in large language models can be considered a potential resource instead of a categorically negative pitfall.
GitHub CI/CD Insights. GitHub runs 15,000 CI jobs within an hour across 150,000 cores of compute.
Machine Learning Applications. Software engineers building applications using machine learning need to test models in real-world scenarios before choosing the best performing model.
RAG vs. LLMs. When resourced sufficiently, long-context LLMs consistently outperform Retrieval Augmented Generation in terms of average performance.
LLM Paper Notes. Jean David Ruvini posts his notes on LLM/NLP related papers every month, providing valuable insights.
Emergent Garden. Emergent Garden puts out very interesting videos on Life simulations, neural networks, cellular automata, and other emergent programs.
Community Engagement. Devansh encourages individuals doing interesting work to drop their introduction in the comments for potential spotlight features.
Reading Recommendations. Devansh plans to share AI Papers/Publications, interesting books, videos, etc., each week.
Supporting Independent Work. Devansh puts a lot of effort into creating work that is informative, useful, and independent from undue influence.
Meta's AI Studio Launch. Meta has launched a new tool called AI Studio, allowing users in the US to create AI versions of themselves on Instagram or the web.
Autonomous Driving Milestone. Stanford Engineering and Toyota Research Institute achieve a milestone in autonomous driving by creating the world’s first autonomous Tandem Drift team, using AI to direct two driverless cars to perform synchronized maneuvers.
Concerns Over AI Alteration. Elon Musk shares deepfake video of Kamala Harris, potentially violating platform's policies against synthetic and manipulated media, sparking concerns about AI-altered content in the upcoming election.
AI Law in Europe. Europe enforces the world's first AI law, targeting US tech giants with regulations on AI development, deployment, and use.
Perplexity AI's Revenue Share. Perplexity AI plans to share advertising revenue with news publishers whose content is used by the bot, responding to accusations of plagiarism and unethical web scraping.
Funding for Black Forest Labs. Black Forest Labs, a startup founded by the creators of Stable Diffusion, has launched FLUX.1, a new text-to-image model suite for the open-source artificial intelligence community and secured $31 million in seed funding.
Musk's Revived Lawsuit. Elon Musk has reinitiated a lawsuit against OpenAI, the creator of the AI chatbot ChatGPT, reigniting a longstanding dispute that originated from a power conflict within the San Francisco-based startup.
Focus on AI Alignment. Schulman, who played a key role in creating the AI-powered chatbot platform ChatGPT and led OpenAI's alignment science efforts, stated his move was driven by a desire to focus more on AI alignment and hands-on technical work.
OpenAI Departures. OpenAI co-founder John Schulman has left the company to join rival AI startup Anthropic, while OpenAI president and co-founder Greg Brockman is taking an extended leave until the end of the year.
Monetization Intent. Altman wants to know - and monetize - everything about you.
Investment in Hardware. OpenAI just invested in a $60M fundraise for a webcam company and is planning a hardware joint venture with them.
Security Expertise. OpenAI recently put Paul Nakasone (ex NSA) on the board.
WorldCoin Connection. Sam founded WorldCoin, known for their eye-scanning orb.
Data Collection Scale. ChatGPT has gathered unprecedented amounts of personal data.
Personal Data Training. Sam Altman has acknowledged wanting to train on everyone's personal documents (Word files, email etc).
Key Staff Departures. Over the last several months they have lost Ilya Sutskever, a whole bunch of safety people, and (slightly earlier) Andrej Karpathy.
Continuous Monitoring. Gary Marcus has had his eye on OpenAI for a long time.
Future Prospects Doubted. Prospects don’t seem as strong as they once did.
Valuation Concerns. Will they earn enough to justify their $80B valuation?
Risk of WeWork Comparison. I said it before, and I will say it again: OpenAI could wind up being seen as the WeWork of AI.
Morale Issues Identified. The board, which basically said it couldn't trust Sam, may have had a point.
Election Misinformation. Five states suggested that Musk's AI chatbot has spread election misinformation.
Elon Musk's Lawsuit. Elon sued OpenAI again; the most interesting thing is that the suit could force a discussion of what AGI means – in court.
AGI Predictions. OpenAI tempered expectations for its next event, and said we wouldn't see GPT-5 then.
Nvidia Stock Decline. Nvidia dropped 6%, and is down 20% over the last month.
Market Uncertainty. It is also not out of the question that today could someday be seen as a turning point.
Google Antitrust Case. Google lost its antitrust case; it could have implications for Google's storehouse of AI training data.
Cohere's Funding. AI startup Cohere raises US$500-million, valuing company at US$5.5-billion.
Meta's New AI Model. Meta releases open-source AI model it says rivals OpenAI, Google tech.
OpenAI's SearchGPT. OpenAI announces SearchGPT, its AI-powered search engine.
Google's Gemini Model. Google gives free Gemini users access to its faster, lighter 1.5 Flash AI model.
Strike Over AI. Video game performers will go on strike over artificial intelligence concerns.
Legislative Actions. Democratic senators seek to reverse Supreme Court ruling that restricts federal agency power.
Impact of AI on Jobs. As new tech threatens jobs, Silicon Valley promotes no-strings cash aid.
AI Safety Concerns. Senators demand OpenAI detail efforts to make its AI safe.
AI in Mathematics. AI achieves silver-medal standard solving International Mathematical Olympiad problems.
Historical Predictions. In December 2022, at the height of ChatGPT's popularity, I made a series of seven predictions about GPT-4 and its limits, such as hallucinations and making stupid errors, in an essay called What to Expect When You Are Expecting GPT-4.
Strict Disbelief. I've always thought GenAI was overrated.
Consistent Predictions. In March of this year, I made a series of seven predictions about how this year would go. Every one of them has held firm, for every model produced by every developer ever since.
Warning About AI. Almost exactly a year ago, in August 2023, I was (AFAIK) the first person to warn that Generative AI could be a dud.
Investor Enthusiasm Diminishing. Investors may well stop forking out money at the rates they have, enthusiasm may diminish, and a lot of people may lose their shirts.
Generative AI Limitations. There is just one thing: Generative AI, at least as we know it now, doesn't actually work that well, and maybe never will.
Imminent Collapse. The collapse of the generative AI bubble – in a financial sense – appears imminent, likely before the end of the calendar year.
AI Bubble Prediction. I just wrote a hard-hitting essay for WIRED predicting that the AI bubble will collapse in 2025 — and now I wish I hadn't.
AGI Misconceptions. Realizing neural networks struggle with outliers makes AGI seem like sheer fantasy, as no general solution to the outlier problem exists yet.
Symbolic vs Neural Networks. Symbolic systems have always been good for outliers; neural networks have always struggled with them.
Generative AI Expectations. GenAI sucks at outliers; if things are far enough from the space of trained examples, the techniques will fail.
AI Industry Bubble. An entire industry has been built - and will collapse - because people don't get the outlier problem.
Cognitive Sciences Respect. AI researchers should have more respect for the cognitive sciences to make better advancements.
Historical Context. Machine learning had trouble with outliers in the 1990s, and it still does.
Outlier Problem Noted. Handling outliers is still the Achilles’ Heel of neural networks; this has been a constant issue for over a quarter century.
Median Split Insight. The key dividing line on the SAT math lies between those who understand fractions, and those who do not.
Machine Learning Limitations. Current approaches to machine learning are lousy at outliers, which means they often say and do things that are absurd when encountering unusual circumstances.
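To make the outlier critique concrete, here is a minimal, illustrative sketch (assuming PyTorch; not from any of the cited essays): a tiny network fit on y = 2x over [0, 1] is accurate in-range but goes badly wrong far from its training data, which is exactly the extrapolation failure described above.

```python
# Minimal sketch of the outlier/extrapolation problem (illustrative only):
# a small MLP fit on y = 2x for x in [0, 1] is accurate in-distribution
# but degrades badly on inputs far outside the training range.
import torch
import torch.nn as nn

torch.manual_seed(0)
x_train = torch.rand(256, 1)        # in-distribution inputs: x in [0, 1]
y_train = 2 * x_train

model = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(2000):
    opt.zero_grad()
    nn.functional.mse_loss(model(x_train), y_train).backward()
    opt.step()

for x in (0.5, 2.0, 10.0):          # 0.5 is in-range; 2.0 and 10.0 are outliers
    pred = model(torch.tensor([[x]])).item()
    print(f"x={x:5.1f}  predicted={pred:7.3f}  true={2 * x:7.3f}")
```

Because the Tanh units saturate, the predictions level off outside [0, 1] no matter how much in-range data the model saw.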
Burnout Society Overview. Byung-Chul Han describes how modern society primes us for burnout, reflecting on individual experiences in this context.
Limitations of Achievement. The achievement society leads to a distorted view of life, reducing relationships and experiences to mere metrics of success.
Engagement with Philosophy. The article recommends exploring philosophical perspectives like those of Nietzsche and Kierkegaard alongside Han's analysis for a broader understanding of the issues at hand.
Cultural Critique. While some critiques of Han's work resonate, there are also suggestions that engaging with craftsmanship can bring joy, countering the narrative of constant productivity.
Effects of Boredom. Han highlights that deep boredom can lead to mental relaxation, contrasting with the hectic pace of contemporary life.
Importance of Idleness. Han emphasizes the need for idle work, where tasks are done without worrying about results, to regain the right to be 'Human Beings' instead of 'Human Doings'.
Self-Destructive Pressure. The achievement-subject experiences destructive self-reproach and auto-aggression, resulting in a mental war against themselves.
Internalized Taskmaster. The internalized taskmaster becomes more insidious than any external authority, driving individuals to constantly strive for more.
Impact of Positivity. In the achievement society, positivity becomes a dominant force, pushing individuals to be happier and more successful, leading to internalized pressure.
Achievement Society Dynamics. Society has transitioned from a Discipline-based model to an Achievement-based one, driven by internal pressures to succeed.
Survey Findings. The Upwork survey highlighted during the week reflects shifting sentiments around Generative AI.
Warning on Deep Learning. Gary Marcus has been warning that deep learning was oversold since November 2012. Looks like he was right.
Opportunity for Resources. The fact that the GenAI bubble is apparently bursting sooner than expected may soon free up resources for other approaches, e.g., neurosymbolic AI.
Loss of Faith. The bubble has begun to burst. Users have lost faith, clients have lost faith, VCs have lost faith.
Canceled Deal Reported. Business Insider reported a canceled deal, exacerbating concerns for the sector.
Investor Concerns. Microsoft's Chief Financial Officer painted a picture of a much slower burn, alarming some investors.
GenAI Project Canceled. Another GenAI monetization scheme bites the dust.
Generative AI Decline. Generative AI might be a dud; I just didn't expect it to fade so fast.
Legal Complications. Deepfakes challenge the reliability of digital evidence in court, potentially slowing legal processes.
Education and Empowerment. The best regulation will, therefore, focus on equipping us with the skills needed to navigate this.
Age of Misinformation. We fail with deepfakes because we fail with social media, falling back on two equally ineffective responses: censorship and an abdication of personal responsibility.
Combatting Environmental Concerns. Investing in more energy-efficient hardware and software for deepfake creation can significantly reduce energy consumption and emissions.
Environmental Impact. The energy-intensive process of generating deepfakes will contribute to climate change.
Scams and Vulnerability. Deepfakes provide a new tool for scammers, especially in targeting emotionally vulnerable people.
Labeling AI Content. I believe that heavily AI-generated content should be labeled, and people featured in AI Ads must have given explicit approval for their appearance.
Exploitation of Public Figures. Non-consensual use of deepfakes can dilute personal brands and harm fan relationships.
Political Misinformation. The real danger lies in the lack of media literacy and critical thinking skills, exacerbated by political polarization.
Combat Information Overload. The best way to combat the information overload created by Deepfakes is to empower people to stand on their own, interact with the world, and take care of themselves.
Need for Educational Reform. The way we see education needs a rework: the emphasis on courses, books, and degrees creates learners who are too static and passive.
Cognitive Overload. The most immediate and pervasive impact of deepfakes would be the cognitive overload and information fatigue they create.
Deepfake Risks Discussion. The discussions around the risks from deepfakes are incomplete (or wrong), since they exaggerate some risks while ignoring others.
FTC AI Investigation. FTC investigates how companies use AI to implement surveillance pricing based on consumer behavior and personal data, seeking information from eight major companies.
AI Scraping Backlash. AI companies are facing a growing backlash from website owners who are blocking their scraper bots, leading to concerns about the availability of data for AI training.
Regulatory Pressure. Elon Musk's X platform is under pressure from data regulators after it emerged that users are consenting to their posts being used to build artificial intelligence systems via a default setting on the app.
OpenAI Bankruptcy Risk. OpenAI faces potential bankruptcy with projected $5 billion losses due to high operational costs and insufficient revenue from its AI ventures.
AI Funding Surge. AI startups have raised $41.5 billion worldwide in the past five years, surpassing other industries and indicating a significant role for AI in the future development and modernization of various sectors.
Adobe Generative AI. Adobe introduces new generative AI features to Illustrator and Photoshop, including tools like Generative Shape Fill and Text to Pattern in Illustrator.
Reddit Search Deal. Google has become the exclusive search engine capable of surfacing results from Reddit, one of the internet's most significant sources of user-generated content.
SearchGPT Launch. OpenAI has announced its entry into the search market with SearchGPT, an AI-powered search engine that organizes and makes sense of search results rather than just providing a list of links.
Mistral Large 2. Mistral AI has launched Mistral Large 2, a new generation of its flagship model, boasting 123 billion parameters and a 128k context window.
Study Reference. Read Bjarnason's new essay here.
Organizational Expectations. Management's expectation that AI is a magic fix for the organizational catastrophe that is the mass layoff fad is often unfounded.
General Public Sentiment. Many coders and tech aficionados may love ChatGPT for work, but much of the outside world feels quite differently.
Unusual Study Results. It's quite unusual for a study like this on a new office tool to return such a resoundingly negative sentiment.
Negative AI Impact. Over three in four (77%) say AI tools have decreased their productivity and added to their workload in at least one way.
Productivity Concerns. Nearly half (47%) of workers using AI say they have no idea how to achieve the productivity gains their employers expect.
Generative AI Bubble. I fully expect that the generative AI bubble will begin to burst within the next 12 months, for many reasons.
Neurosymbolic AI Potential. AlphaProof and AlphaGeometry are both along the lines of the first approach we discussed: using formal systems (like Cyc) to vet solutions produced by LLMs.
Limitations of Generative AI. The biggest intrinsic failings of generative AI have to do with reliability, in a way that I believe can never be solved, given their inherent nature.
Progress by Google DeepMind. To do this GDM used not one but two separate systems, a new one called AlphaProof, focused on theorem proving, and an update (AlphaGeometry 2) to an older one focused on geometry.
Confidence in AI. On balance, these systems simply cannot be counted on, which is a big part of why Fortune 500 companies have lost confidence in LLMs after the initial hype.
Frustration with LLMs. My strong intuition... is that LLMs are simply never going to work reliably, at least not in the general form that so many people last year seemed to be hoping for.
Need for Hybrid Models. What I have advocated for, my entire career, is hybrid approaches, sometimes called neurosymbolic AI, because they combine the best of the currently popular neural network approach with the symbolic approach.
Policy Issues. The U.S. is considering 'draconian' sanctions against China's semiconductor industry.
AI Video Model. Haiper 1.5 is a new AI video generation model challenging Sora and Runway.
Open Source Advancements. Mistral releases Codestral Mamba for faster, longer code generation.
GPT-4o Mini Release. OpenAI releases GPT-4o Mini, a small AI model powering ChatGPT.
Internal Controversies. Whistleblowers say OpenAI illegally barred staff from airing safety risks.
Elon Musk's Supercomputer. Elon Musk is working on a giant xAI supercomputer in Memphis.
Nvidia's Value. In the weeks leading up to Nvidia becoming the most valuable company in the world, I’ve received numerous requests for the updated math behind my analysis.
LLM Evaluation Technique. We explore the use of state-of-the-art LLMs, such as GPT-4, as surrogates for human evaluators.
Human Evaluation Challenges. While human evaluation is the gold standard for assessing human preferences, it is exceptionally slow and costly.
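A minimal sketch of that LLM-as-a-judge setup, assuming the official `openai` Python client; the rubric, model name, and scoring scale here are placeholders, not details from the paper:

```python
# Sketch of LLM-as-a-judge: use a strong LLM to score a candidate answer
# in place of slow, costly human evaluation. Assumes the `openai` Python
# client; the grading prompt below is a made-up placeholder rubric.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def judge(question: str, answer: str) -> str:
    prompt = (
        "You are grading an answer for correctness and helpfulness.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Reply with a single integer score from 1 (poor) to 10 (excellent)."
    )
    resp = client.chat.completions.create(
        model="gpt-4",                       # surrogate for the human evaluator
        messages=[{"role": "user", "content": prompt}],
        temperature=0,                       # deterministic grading
    )
    return resp.choices[0].message.content

print(judge("What is 2 + 2?", "4"))
```

The appeal is purely practical: a judge call costs cents and seconds, where a human rating costs dollars and days, at the price of inheriting the judge model's own biases.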
Future Articles. Deepfake Part 3. Exploring the true dangers of AI-generated misinformation.
Active Subreddits. We started an AI Made Simple Subreddit. Come join us over here.
AI's Investment Issues. Turns out a lot of the massive GPU purchase agreements and data center acquisitions were misguided; investing without a clear long-term vision or any understanding of revenue has led to no ROI.
Philosophy of Love. Dostoevsky's ideas about love are hopeful, optimistic, demanding, and terrifying.
Importance of Stakeholder Alignment. The impact of getting stakeholder communication right vs wrong can be immense.
Dabbawala Case Study. Mumbai’s Dabbawala service presents an interesting case study of what is required to make food delivery profitable.
Community Engagement. If you’re doing interesting work and would like to be featured in the spotlight section, just drop your introduction in the comments/by reaching out to me.
AI Health Uncut. Sergei Polevikov publishes super insightful and informative reports on AI, Healthcare, and Medicine as a business.
Reading Recommendations. I figured I’d start sharing whatever AI Papers/Publications, interesting books, videos, etc I came across each week.
Newsletter Reach. Help me democratize the most important ideas in AI Research and Engineering to over 100K readers weekly.
Semiconductor Industry Insights. The semiconductor capital equipment (semicap) industry is one of the most important industries on the planet and one that doesn’t get much love.
Research Focus. The goal is to share interesting content with y'all so that you can get a peek behind the scenes into my research process.
Potential Fatal Questions. All of these questions are hard, with no obvious answer; the last may be fatal.
Accurate Predictions. Gary Marcus’s predictions over the last couple years have been astonishingly on target.
Investor Questions. But investors really ought to ask some tough questions, such as these: What is their moat?
Cash Raising Necessity. Obviously, their only hope is to raise more cash, and they will certainly try.
LLMs as Commodities. LLMs have just become exactly the commodity I predicted they would become, at the lowest possible price.
MetaAI Competition. Yesterday was something even more dramatic: MetaAI all but pulled the rug out from under OpenAI's business, offering a viable competitor to GPT-4 for free.
Lack of Competitive Moat. OpenAI, as far as I can tell, doesn’t really have any moat whatsoever, beyond brand recognition.
Profit Predictions. That’s not great news for OpenAI, and you can see why they haven’t been, um, Open, about their financials.
OpenAI's Financial Issues. I have long suspected that OpenAI was losing money, and lots of it, but never seen an analysis, until this morning.
AI Training Data Ethics. A massive dataset containing subtitles from over 170,000 YouTube videos was used to train AI systems for major tech companies without permission, raising significant ethical and legal questions.
Hugging Face SmoLLM. Hugging Face has introduced SmoLLM, a new series of compact language models available in three sizes: 130M, 350M, and 1.7B parameters.
Llama 3.1 Parameters. With 405 billion parameters, Llama 3.1 was developed using over 16,000 Nvidia H100 GPUs, costing Meta hundreds of millions of dollars.
Meta Llama 3.1 Release. Meta has released Llama 3.1, the largest open-source AI model, claiming it outperforms top private models like GPT-4o and Claude 3.5 Sonnet.
AI Security Standards. Top tech companies form a coalition to develop cybersecurity and safety standards for AI, aiming to ensure rigorous security practices and keep malicious hackers at bay.
GPT-4o Mini Launch. OpenAI has launched GPT-4o mini, a smaller, faster, and more cost-effective AI model than its predecessors.
Market Demand for Small Models. The trend toward small language models is accelerating as Arcee AI announced its $24M Series A funding only 6 months after a $5.5M seed round in January 2024.
OpenAI Reasoning Project. OpenAI is developing a new reasoning technology called Project Strawberry, which aims to enable AI models to conduct autonomous research and improve their ability to answer difficult user queries.
GPT-4o Mini Performance. GPT-4o mini scored 82% on the MMLU reasoning benchmark and 87% on the MGSM math reasoning benchmark, outperforming other models like Gemini 1.5 Flash and Claude 3 Haiku.
Data Augmentation Strategy. We will use a policy like TrivialAugment + StyleTransfer, for its superior performance, cost, and benefits.
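For reference, a minimal sketch of such a pipeline: torchvision ships `TrivialAugmentWide`, while the style-transfer step is not a torchvision built-in, so it appears here only as a stubbed placeholder showing where it would slot in.

```python
# Sketch of the augmentation policy mentioned above. torchvision ships
# TrivialAugmentWide; the style-transfer step is NOT in torchvision, so it
# is stubbed out here as a placeholder (swap in a real style-transfer model).
import torchvision.transforms as T

style_transfer = T.Lambda(lambda img: img)  # placeholder for style transfer

train_transform = T.Compose([
    T.TrivialAugmentWide(),   # parameter-free augmentation policy
    style_transfer,           # hypothetical style-transfer augmentation
    T.ToTensor(),
])
```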
Self-Supervised Learning Application. Self-supervised clustering is elite for selecting the right samples to train on, helping to overcome scaling limits.
Audience Engagement Strategy. Every share puts me in front of a new audience, and I rely entirely on word-of-mouth endorsements to grow.
Model Performance Improvement. Our method uses a deep convolutional network trained to directly optimize the embedding itself, achieving state-of-the-art face recognition performance using only 128 bytes per face.
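Directly optimized embeddings of this kind are typically trained with a triplet loss; here is a minimal PyTorch sketch of that objective (illustrative only; the linear "backbone" is a stand-in for the deep CNN, and the 128-d output mirrors the compact embedding mentioned above):

```python
# Sketch of training an embedding directly with a triplet loss, in the
# spirit of the FaceNet-style system described above (illustrative only).
import torch
import torch.nn as nn

embed = nn.Sequential(nn.Flatten(), nn.Linear(3 * 160 * 160, 128))  # CNN stand-in
loss_fn = nn.TripletMarginLoss(margin=0.2)

anchor, positive, negative = (torch.randn(8, 3, 160, 160) for _ in range(3))
a = nn.functional.normalize(embed(anchor), dim=1)    # unit-length 128-d embeddings
p = nn.functional.normalize(embed(positive), dim=1)
n = nn.functional.normalize(embed(negative), dim=1)
loss = loss_fn(a, p, n)  # pull anchor-positive together, push anchor-negative apart
loss.backward()
```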
Sample Selection for Retraining. It’s best to add train samples based on maximizing information gain instead of simply adding more random ones.
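A rough sketch of that idea, using predictive entropy as a simple stand-in for information gain (illustrative; random probabilities take the place of real model outputs):

```python
# Sketch of information-gain-driven sample selection: rank unlabeled samples
# by predictive entropy (an uncertainty proxy) and retrain on the most
# informative ones rather than on random additions.
import numpy as np

def predictive_entropy(probs: np.ndarray) -> np.ndarray:
    """probs: (n_samples, n_classes) model outputs; returns per-sample entropy."""
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

probs = np.random.dirichlet(np.ones(3), size=1000)   # stand-in for model predictions
top_k = np.argsort(predictive_entropy(probs))[-100:] # 100 most uncertain samples
print("indices to label and add to the training set:", top_k[:5])
```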
Temporal Feature Analysis. If you want to take things up a notch, you’re best served going for temporal feature extraction.
Importance of Ensemble Modeling. Using simple models keeps inference costs low and allows an ensemble to compensate for the weakness of one model by sampling a more diverse search space.
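A minimal sketch of that design with scikit-learn (assumed here; the estimators and data are placeholders): several cheap, diverse models soft-vote so that one model's blind spot can be covered by another.

```python
# Sketch of the cheap-ensemble idea: several simple, diverse models whose
# combined vote covers each member's blind spots, keeping inference cheap.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB()),
        ("dt", DecisionTreeClassifier(max_depth=5)),
    ],
    voting="soft",  # average predicted probabilities across diverse models
)
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))
```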
Effective Feature Extraction. Feature extraction is the highest ROI decision you can make.
Record Accuracy Achieved. On the widely used Labeled Faces in the Wild (LFW) dataset, our system achieves a new record accuracy of 99.63%.
Deepfake Detection System. We hope to build a Deepfake Detection system that can classify between 3 types of inputs: real, deep-fake, and AI-generated.
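A minimal sketch of that classifier's output stage (illustrative; the 128-d features and linear head are placeholders for whatever backbone the system ends up using):

```python
# Sketch of the three-way classifier described above (real vs. deep-fake vs.
# AI-generated), as a softmax head on top of any feature extractor.
import torch
import torch.nn as nn

CLASSES = ["real", "deep-fake", "ai-generated"]
head = nn.Linear(128, len(CLASSES))          # 128-d features from any backbone

features = torch.randn(4, 128)               # stand-in for extracted features
pred = head(features).argmax(dim=1)
print([CLASSES[i] for i in pred])
```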
Global AI Regulation. Japan's Prime Minister Fumio Kishida unveils an international framework for the regulation and use of generative AI, emphasizing the need to address the potential risks and promote cooperation for safe and trustworthy AI.
AI in Healthcare. AI system trained on heart's electrical activity reduces deaths in high-risk patients by 31% in hospital trial, proving its potential to save lives.
AI Notetaking Revolution. 'I will never go back': Ontario family doctor says new AI notetaking saved her job.
Shift to Enterprise Focus. AI startups that initially garnered attention with innovative generative AI products are now shifting their focus towards enterprise customers to enhance revenue streams.
Meta's Ad Tool Issues. Meta's automated ad tool, Advantage Plus, has been overspending on ad budgets and failing to deliver sales, causing frustration among marketers and businesses.
Microsoft's AI Policy Change. Microsoft bans U.S. police from using enterprise AI tool for facial recognition due to concerns about potential pitfalls and racial biases.
Inverse Scaling Phenomenon. The authors also share their findings on the difficulty of creating and evaluating hard prompts, and the phenomenon of inverse scaling, where larger models fail tasks that smaller models can complete.
Evaluation Challenges. The authors discuss the challenges of creating hard prompts and the trade-offs between human and model-based automatic evaluation.
Vibe-Eval Suite. Reka AI introduces Vibe-Eval, a new evaluation suite designed to measure the progress of multimodal language models.
Burnout in AI Industry. AI engineers in the tech industry are experiencing burnout and rushed rollouts due to the intense competition and pressure to stay ahead in the generative AI race.
Lawsuit Against OpenAI. Eight U.S. newspaper publishers, all under the ownership of investment firm Alden Global Capital, have filed a lawsuit against Microsoft and OpenAI, alleging copyright infringement.
Deepfake Detector Release. OpenAI Releases 'Deepfake' Detector to Disinformation Researchers.
AI Content Labeling. TikTok will automatically label AI-generated content created on platforms like DALL·E 3.
AI Audiobooks. Audible's Test of AI-Voiced Audiobooks Tops 40,000 Titles.
AI Export Bill. US lawmakers unveil bill to make it easier to restrict exports of AI models.
OpenAI & Stack Overflow. OpenAI and Stack Overflow partner to bring more technical knowledge into ChatGPT.
Robotaxi Plans Delayed. Motional delays commercial robotaxi plans amid restructuring.
Funding for Autonomy. Wayve, an A.I. Start-Up for Autonomous Driving, Raises $1 Billion.
New AI Model. New Microsoft AI model may challenge GPT-4 and Google Gemini.
Siri Revamp. Apple Will Revamp Siri to Catch Up to Its Chatbot Competitors.
Mystery Chatbot. Mysterious 'gpt2-chatbot' AI model appears suddenly, confuses experts.
AI Music Generation. ElevenLabs previews music-generating AI model.
Microsoft Copilot Upgrade. Microsoft is introducing new AI features in Copilot for Microsoft 365 to help users create better prompts and become prompt engineers, aiming to improve productivity and efficiency in the workplace.
TikTok AI Labeling. TikTok has announced that it will automatically label AI-generated content created on other platforms, such as OpenAI's DALL·E 3, using a technology called Content Credentials from the Coalition for Content Provenance and Authenticity (C2PA).
DeepSeek-V2 Features. DeepSeek AI releases DeepSeek-V2, a Mixture-of-Experts (MoE) language model, that is state-of-the-art, cost-effective, and efficient with 236B total parameters, of which 21B are activated for each token.
Robot Dogs Testing. The United States Marine Forces Special Operations Command (MARSOC) is testing rifle-armed 'robot dogs' supplied by Onyx Industries.
Advancements in Drug Discovery. AlphaFold 3 is expected to be particularly beneficial for drug discovery, as it can predict where a drug binds a protein, a feature that was absent in its predecessor, AlphaFold 2.
AlphaFold 3 Overview. Google's DeepMind has unveiled AlphaFold 3, an advanced version of its protein structure prediction tool, which can now predict the structures of DNA, RNA, and essential drug discovery molecules like ligands.
AI Model Competition. Microsoft is developing a new large-scale AI language model called MAI-1, potentially rivaling state-of-the-art models from Google, Anthropic, and OpenAI.
AI Deepfake Detector. OpenAI releases a deepfake detector tool to combat the influence of AI-generated content on the upcoming elections, acknowledging that it's just the beginning of the fight against deepfakes.
Wayve's $1 Billion Raise. Wayve, a London-based AI start-up for autonomous driving, raised an eye-popping $1 billion from investors like SoftBank, Microsoft, and Nvidia.
AI and Deception. AI systems are becoming increasingly sophisticated in their capacity for deception, raising concerns about potential dangers to society and the need for AI safety laws.
Safety Tool Release. U.K. Safety Institute releases an open-source toolset called Inspect to assess AI model safety, aiming to provide a shared, accessible approach to evaluations.
Google Media Models. Google unveils Veo and Imagen 3, its latest AI media creation models.
AI in Search. Google is redesigning its search engine — and it's AI all the way down.
Google AI Astra. Project Astra is the future of AI at Google.
OpenAI GPT-4o. OpenAI releases GPT-4o, a faster model that's free for all ChatGPT users.
Listener Interaction. Email us your questions and feedback at contact@lastweekin.ai and/or hello@gladstone.ai.
Special Interview. With a special one-time interview with Andrey in the latter part of the podcast.
YouTube Version. You can watch the YouTube version of this here.
Guest Host. With guest host Daliana Liu from The Data Scientist Show!
AI News Summary. Our 167th episode with a summary and discussion of last week's big AI news!
Anthropic AI Tool. Anthropic AI Launches a Prompt Engineering Tool that Generates Production-Ready Prompts in the Anthropic Console.
AI Copyright Issues. How One Author Pushed the Limits of AI Copyright.
AI Watermark. Google's invisible AI watermark will help identify generative text and video.
AI Model Safety. U.K. agency releases tools to test AI model safety.
New AI Models. Falcon 2: UAE's Technology Innovation Institute Releases New AI Model Series, Outperforming Meta's New Llama 3.
Waymo Investigation. Waymo's robotaxis under investigation after crashes and traffic mishaps.
Zoox Probe. US agency probes Amazon-owned Zoox self-driving vehicles after two crashes.
Robotaxi Testing. GM's Cruise to start testing robotaxis in Phoenix area with human safety drivers on board.
Anthropic Leadership. Mike Krieger joins Anthropic as Chief Product Officer.
OpenAI Leadership Change. OpenAI's Chief Scientist and Co-Founder Is Leaving the Company.
AI Music Sandbox. Google Unveils Music AI Sandbox Making Loops From Prompts.
AI Emissions Concerns. Microsoft's emissions and water usage spiked due to the increased demand for AI technologies, posing challenges to meeting sustainability goals.
Investment in AI. Microsoft announces a 4 billion euro investment in cloud and AI infrastructure, AI skilling, and French Tech acceleration.
AI College Partnership. Reddit's partnership with OpenAI allows the AI company to train its models on Reddit content, leading to a surge in Reddit shares.
Waymo Investigation. The National Highway Traffic Safety Administration (NHTSA) has initiated an investigation into Alphabet's Waymo self-driving vehicles following reports of unexpected behavior and traffic safety violations.
Transparency Issues. This news came amidst the release of GPT-4o, but OpenAI's restrictive off-boarding agreement has raised concerns about the company's transparency.
Multimodal Capabilities. The new model is 'natively multimodal,' meaning it can generate content or understand commands in voice, text, or images.
OpenAI's GPT-4o Release. OpenAI has announced the release of GPT-4o, an enhanced version of the GPT-4 model that powers ChatGPT.
Astra's Functionality. Hassabis envisions AI's future to be less about the models and more about their functionality, with AI agents performing tasks on behalf of users.
Project Astra Launch. Google's Project Astra, a real-time, multimodal AI assistant, is the future of AI at Google, according to Demis Hassabis, the head of Google DeepMind.
AI in Journalism. Gannett is implementing AI-generated bullet points at the top of journalists' stories to enhance the reporting process.
AI Legislation in Colorado. Colorado lawmakers have passed a landmark AI discrimination bill, which would prohibit employers from using AI to discriminate against workers.
AI Safety Commitments. Tech giants pledge AI safety commitments — including a ‘kill switch’ if they can't mitigate risks.
Groundbreaking AI Law. World's first major law for artificial intelligence gets final EU green light.
Emotional AI Initiative. Inflection AI reveals new team and plan to embed emotional AI in business bots.
Fetch AI Assistant. Microsoft, Khan Academy provide free AI assistant for all educators in US.
AI Voice Concerns. OpenAI says Sky voice in ChatGPT will be paused after concerns it sounds too much like Scarlett Johansson.
AI Regulation Bill. Colorado governor signs sweeping AI regulation bill.
AI Likeness Management. Hollywood agency CAA aims to help stars manage their own AI likenesses.
Universal Basic Income. AI 'godfather' Geoffrey Hinton advocates for universal basic income to address AI's impact on job inequality and wealth distribution.
First AI Regulation. EU member states have approved the world's first major law for regulating artificial intelligence, emphasizing trust, transparency, and accountability.
AI and Education. AI tutors are quietly changing how kids in the US study, offering affordable and personalized assistance for school assignments.
AI-Language Model War. Tencent and iFlytek have entered a price war by slashing prices of large-language models used for chatbots.
Generative AI Upgrade. Amazon is upgrading its decade-old Alexa voice assistant with generative artificial intelligence and plans to charge a monthly subscription fee.
OpenAI's Response. OpenAI has temporarily halted the use of the Sky voice in its ChatGPT application due to its resemblance to actress Scarlett Johansson's voice.
Claude's Discoveries. One notable discovery was a feature associated with the Golden Gate Bridge, which, when activated, indicated that Claude was contemplating the landmark.
Anthropic Research. A new research paper published by Anthropic aims to demystify the 'black box' phenomenon of AI's algorithmic behavior.
AI Launch Issues. This incident continues a trend of Google facing issues with its latest AI features immediately after their launch, as seen in February 2023.
Trust Undermined. This has led to a significant backlash online, undermining trust in Google's search engine, which is used by over two billion people for reliable information.
Google's AI Errors. Google's recent unveiling of its new artificial intelligence (AI) capabilities for search has sparked controversy due to a series of errors and untruths.
Nvidia Revenue Surge. Nvidia, Powered by A.I. Boom, Reports Soaring Revenue and Profits.
Content Deals with OpenAI. Vox Media and The Atlantic sign content deals with OpenAI.
PwC and OpenAI. PwC agrees deal to become OpenAI's first reseller and largest enterprise user.
Hollywood AI Partnerships. Alphabet, Meta Offer Millions to Partner With Hollywood on AI.
AI Cloning Fines. Robocaller Who Used AI to Clone Biden's Voice Fined $6 Million.
AI Earbuds Innovation. Iyo thinks its gen AI earbuds can succeed where Humane and Rabbit stumbled.
Real-time Video Translation. Microsoft Edge will translate and dub YouTube videos as you’re watching them.
Alexa's AI Overhaul. Amazon plans to give Alexa an AI overhaul — and a monthly subscription price.
Opera's AI Integration. Opera is adding Google's Gemini AI to its browser.
Telegram Copilot Bot. Telegram gets an in-app Copilot bot.
Google AI Controversy. Google's A.I. Search Errors Cause a Furor Online.
AI News Summary. Our 169th episode with a summary and discussion of last week's big AI news!
AI Model Rankings. Scale AI publishes its first LLM Leaderboards, ranking AI model performance in specific domains.
AI Safety Concerns. OpenAI researcher who resigned over safety concerns joins Anthropic.
Training Compute Growth. Training Compute of Frontier AI Models Grows by 4-5x per Year.
xAI Funding. Elon Musk's xAI raises $6 billion in latest funding round.
ChatGPT Discounts. OpenAI launches programs making ChatGPT cheaper for schools and nonprofits.
EU AI Act Developments. The EU is establishing the AI Office to regulate AI risks, foster innovation, and influence global AI governance.
Deepfake Concerns. A deepfake video of a U.S. official discussing Ukraine's potential strikes in Russia has surfaced, raising concerns about the use of AI-powered disinformation.
AI Misuse in Influencing Campaigns. Russia and China used OpenAI's A.I. in covert campaigns to manipulate public opinion and influence geopolitics, raising concerns about the impact of generative A.I. on online disinformation.
AI Search Tool Rollback. Google's new artificial intelligence feature for its search engine, A.I. Overviews, has been significantly rolled back after it produced a series of errors and false information.
PwC as OpenAI Reseller. OpenAI has partnered with consulting giant PwC to provide ChatGPT Enterprise, the business-oriented version of its AI chatbot, to PwC employees and clients.
Vox Media and OpenAI Partnership. Vox Media has announced a strategic partnership with OpenAI, aiming to leverage AI technology to enhance its content and product offerings.
Expensive AI Training Data. AI training data is becoming increasingly expensive, putting it out of reach for all but the wealthiest tech companies.
Survey on AI Usage. AI products like ChatGPT are much hyped but not widely used, with only 2% of British respondents using such tools on a daily basis.
OpenAI Board Conflict. OpenAI is also embroiled in controversy, with former board member Helen Toner accusing CEO Sam Altman of dishonesty and manipulation during a failed coup attempt.
Musk's xAI Controversy. LeCun criticized Musk's leadership at xAI, calling him an erratic megalomaniac, following Musk's announcement of a $6 billion funding round for xAI.
AI Industry Tensions. The AI industry is seeing increasing tension, highlighted by a recent clash between Elon Musk and Yann LeCun on social media.
AI Video Generator. KLING is the latest AI video generator that could rival OpenAI's Sora.
AI Beauty Pageant. The Uncanny Rise of the World's First AI Beauty Pageant.
GPT-4 Exam Performance. GPT-4 didn't ace the bar exam after all, MIT research suggests — it didn't even break the 70th percentile.
Election Risks. Testing and mitigating elections-related risks.
OpenAI Whistleblowers. OpenAI Insiders Warn of a 'Reckless' Race for Dominance.
AGI by 2027. Former OpenAI researcher foresees AGI reality in 2027.
Tech Giants Collaboration. Google, Intel, Microsoft, AMD and more team up to develop an interconnect standard to rival Nvidia's NVLink.
Microsoft Layoffs. Microsoft Lays Off 1,500 Workers, Blames 'AI Wave'.
Zoox Self-Driving Cars. Zoox to test self-driving cars in Austin and Miami.
UAE AI Partnership. UAE seeks 'marriage' with US over artificial intelligence deals.
Saudi Investment. Saudi fund invests in China effort to create rival to OpenAI.
OpenAI Robotics Group. OpenAI is restarting its robotics research group.
Google's NotebookLM. Google's updated AI-powered NotebookLM expands to India, UK and over 200 other countries.
ElevenLabs Sound Effects. ElevenLabs’ AI generator makes explosions or other sound effects with just a prompt.
Perplexity AI Feature. Perplexity AI's new feature will turn your searches into shareable pages.
Udio 130 Model. Udio introduces new udio-130 music generation model and more advanced features.
Apple's AI Features. 'Apple Intelligence' will automatically choose between on-device and cloud-powered AI.
Amazon AI Impact. Amazon's use of AI and robotics in its warehouses isolates workers and hinders union organizing, according to a new report by Oxford University researchers.
FTC Antitrust Investigations. FTC and DOJ open antitrust investigations into Microsoft, OpenAI, and Nvidia, with the FTC looking into potential antitrust issues related to investments made by technology companies into smaller AI companies.
Microsoft's AI Investment. Microsoft plans to invest $3.2 billion in AI infrastructure in Sweden, including training 250,000 people and increasing capacity at its data centers.
AI Chatbot Accuracy. AI chatbots, including Google’s Gemini 1.0 Pro and OpenAI’s GPT-3, provided incorrect information 27% of the time when asked about voting and the 2024 election.
Kuaishou's New Product. Kuaishou, a Chinese short-video app, has launched a text-to-video service similar to OpenAI's Sora, as part of the race among Chinese Big Tech firms to catch up with US counterparts in AI applications.
Concept Storage Method. A new research paper from OpenAI introduces a method to identify how the AI stores concepts that might cause misbehavior.
Whistleblower Protections. The proposal also calls for the abolition of nondisparagement agreements that prevent insiders from voicing risk-related concerns.
Right to Warn. Thirteen current and former employees of OpenAI and Google DeepMind have published a proposal demanding the right to warn the public about the potential dangers of advanced artificial intelligence (AI).
ChatGPT Outage. OpenAI's ChatGPT experienced multiple outages, including a major one during the daytime in the US, but the issues were eventually resolved.
Anticipating AGI. Former OpenAI researcher predicts the arrival of AGI by 2027, foreseeing AI machines surpassing human intelligence and national security implications.
Regulatory Challenges. Waymo issues a voluntary software recall after a driverless vehicle collides with a telephone pole, prompting increased regulatory scrutiny of the autonomous vehicle industry.
Deepfake Impact. AI played a significant role in the Indian election, with political parties using deepfakes and AI-generated content for targeted communication, translation of speeches, and personalized voter outreach.
OpenAI Revenue Growth. OpenAI's annualized revenue has more than doubled in the last six months, reaching $3.4 billion.
OpenAI Partnership. OpenAI and Apple announce partnership to integrate ChatGPT into Apple experiences.
Generative Video Creation. Dream Machine enables users to create high-quality videos from simple text prompts such as 'a cute Dalmatian puppy running after a ball on the beach at sunset.'
Luma AI Launch. Luma AI has launched the public beta of its new AI video generation model, Dream Machine, which has garnered overwhelming user interest.
Conversations with Siri. Key features include a more conversational Siri, AI-generated 'Genmoji,' and integration with OpenAI's GPT-4o for handling complex requests.
Apple AI Features. Apple has announced 'Apple Intelligence,' a suite of AI features for iPhone, Mac, and more at WWDC 2024.
Meta's AI Models. Meta releases flurry of new AI models for audio, text and watermarking.
OpenAI Revenue Growth. Report: OpenAI Doubled Annualized Revenue in 6 Months.
Claude 3.5 Release. Anthropic just dropped Claude 3.5 Sonnet with better vision and a sense of humor.
Runway Video Model. Runway unveils new hyper realistic AI video model Gen-3 Alpha, capable of 10-second-long clips.
Luma's Dream Machine. 'We don’t need Sora anymore': Luma’s new AI video generator Dream Machine slammed with traffic after debut.
New Apple Features. Apple Intelligence: every new AI feature coming to the iPhone and Mac.
Waymo's Recall. Waymo issues software and mapping recall after robotaxi crashes into a telephone pole.
Perplexity Controversy. Buzzy AI Search Engine Perplexity Is Directly Ripping Off Content From News Outlets.
Huawei's Chip Concerns. Huawei exec concerned over China's inability to obtain 3.5nm chips, bemoans lack of advanced chipmaking tools.
Reward Tampering Research. Sycophancy to subterfuge: Investigating reward tampering in language models.
Adept and Microsoft Deal. AI startup Adept is in deal talks with Microsoft.
AI Influencer Ads. AI-generated avatars are being introduced on TikTok for brands to use in ads, allowing for customization and dubbing in multiple languages.
AI Models Comparison. Fireworks AI releases Firefunction-v2, an open-source function-calling model designed to excel in real-world applications, rivaling high-end models like GPT-4o at a fraction of the cost and with superior speed and functionality.
Brave AI Enhancement. Brave's in-browser AI assistant, Leo, now incorporates real-time Brave Search results, providing more accurate and up-to-date answers.
Revenue Loss Estimate. The publishing industry is expected to lose over $10 billion due to such practices, according to Ameet Shah, partner and SVP of publisher operations and strategy at Prohaska Consulting.
Publisher Backlash. AI search startup Perplexity, backed by Jeff Bezos and other tech giants, is facing backlash from publishers like The New York Times, The Guardian, Condé Nast, and Forbes for allegedly circumventing blocks to access and repurpose their content.
Benchmark Test Performance. Claude 3.5 Sonnet excelled in benchmark tests, outscoring GPT-4o, Gemini 1.5 Pro, and Meta's Llama 3 400B in most categories.
AI-Generated Script Backlash. London premiere of AI-generated script film cancelled after backlash from audience and industry, highlighting ongoing debate over AI's role in the film industry.
Emotion Detection Controversy. AI-powered cameras in UK train stations, including London's Euston and Waterloo, used Amazon software to scan faces and predict emotions, age, and gender for potential advertising and safety purposes, raising concerns about privacy and reliability.
Speed Improvement. The new model, which is available to Claude users on the web and iOS, and to developers, is said to be twice as fast as its predecessor and outperforms the previous top model, Claude 3 Opus.
Claude 3.5 Sonnet Launch. Anthropic has launched its latest AI model, Claude 3.5 Sonnet, which it claims can match or surpass the performance of OpenAI’s GPT-4o or Google’s Gemini across a broad range of tasks.
Gemini Side Panels. Google rolls out Gemini side panels for Gmail and other Workspace apps.
AI Music Lawsuits. Music labels sue AI music generators for copyright infringement.
AI Safety Bill. Y Combinator rallies start-ups against California's AI safety bill.
Stock Sale Policies. OpenAI walks back controversial stock sale policies, will treat current and former employees the same.
Advanced AI Chip. China's ByteDance working with Broadcom to develop advanced AI chip, sources say.
Figma AI Redesign. Figma announces big redesign with AI.
Waymo Robotaxis. Waymo ditches the waitlist and opens up its robotaxis to everyone in San Francisco.
ChatGPT for Mac. OpenAI's ChatGPT for Mac is now available to all users.
Voice Mode Delay. OpenAI delays rolling out its 'Voice Mode' to July.
Collaboration Tools. Anthropic Debuts Collaboration Tools for Claude AI Assistant.
AI News Summary. Our 172nd episode with a summary and discussion of last week's big AI news!
Formation Bio Investment. Formation Bio raises $372M in Series D funding to apply AI to drug development, aiming to streamline clinical trials and drug development processes.
Humanoid Robot Deployment. Agility Robotics' Digit humanoids have landed their first official job with GXO Logistics Inc., marking the industry's first formal commercial deployment of humanoids.
Google Translate Expansion. Google Translate has added 110 new languages, including Cantonese and Punjabi, bringing the total of supported languages to nearly 250.
AI Voice Imitations Controversy. Morgan Freeman expresses gratitude to fans for calling out unauthorized AI imitations of his voice, highlighting the growing issue of AI-generated voice imitations in the entertainment industry.
Ethical AI Positioning. Anthropic aims to enable beneficial uses of AI by government agencies, positioning itself as an ethical choice among rivals.
New Collaboration Tools. Anthropic has launched an update to enhance team collaboration and productivity, introducing a Projects feature that allows users to organize their interactions with Claude.
Kicking Off AI Usage. The company's expansion of its service to all San Francisco residents is seen as a crucial step towards the normalization of autonomous vehicles and a potential path to profitability for the historically money-losing operation.
Waymo Expansion. Waymo announced that its robotaxi service in San Francisco is now open to the public, eliminating the need for customers to sign up for a waitlist.
AI Music Lawsuits. Universal Music Group, Sony Music, and Warner Records have filed lawsuits against AI music-synthesis companies Udio and Suno, accusing them of mass copyright infringement.
Performance Improvement. CriticGPT has shown significant effectiveness, with human reviewers using CriticGPT performing 60% better in evaluating ChatGPT's code outputs than those without such assistance.
CriticGPT Introduction. OpenAI has introduced a new AI model, CriticGPT, designed to identify errors in the outputs of ChatGPT, an AI system built on the GPT-4 architecture.
AI Scaling Myths. The belief that AI scaling will lead to artificial general intelligence is based on misconceptions about scaling laws, the availability of training data, and the limitations of synthetic data.
Gaming AI Capabilities. MIT robotics pioneer Rodney Brooks believes that people are overestimating the capabilities of generative AI and that it's flawed to assign human capabilities to it.
LLaMA 3 Release. Meta is about to launch its biggest LLaMA model yet, highlighting its significance.
China's AI Competition. The conversation includes China's competition in AI and its impacts.
AI Features Discussion. The episode covers emerging AI features and legal disputes over data usage.
Workforce Development. U.S. government addresses critical workforce shortages for the semiconductor industry with a new program.
Nvidia's Revenue. Nvidia is expected to make $12 billion from AI chips in China this year despite US controls.
AI Regulation Issues. With Chevron's demise, AI regulation seems dead in the water.
AI Investment Fund. Bridgewater starts a $2 billion fund that uses machine learning for decision-making.
Runway's Gen 3 Alpha. Runway's Gen-3 Alpha AI video model is now available, but there’s a catch.
Gemini 1.5 Launch. Google releases Gemini 1.5 Flash and Pro, with 2M-token context windows, to the public.
Security Flaw Discovered. OpenAI's ChatGPT macOS app was found to be storing user conversations in plain text, making them easily accessible to potential malicious actors.
AI Model Evaluation Advocacy. Anthropic is advocating for third-party AI model evaluations to assess capabilities and risks, focusing on safety levels, advanced metrics, and efficient evaluation development.
AI Bias in Medical Imaging. AI models analyzing medical images can be biased, particularly against women and people of color, and while debiasing strategies can improve fairness, they may not generalize well to new patient populations.
Apple's Board Role. Apple Inc. has secured an observer role on OpenAI's board, with Phil Schiller, Apple's App Store head and former marketing chief, appointed to the position.
Democratizing AI Access. Mozilla's Llamafile and Builders Projects were showcased at the AI Engineer World's Fair, emphasizing democratized access to AI technology.
Integrating ChatGPT. This move follows Apple's announcement to integrate ChatGPT into its iPhone, iPad, and Mac devices.
AI Music Generation. Suno launches iPhone app — now you can make AI music on the go, which allows users to generate full songs from text prompts or sound.
New AI Model Release. Kyutai has open-sourced Moshi, a real-time native multimodal foundation AI model that can listen and speak simultaneously.
Mind-reading AI Progress. AI can accurately recreate what someone is looking at based on brain activity, and accuracy improves greatly when the AI learns which parts of the brain to focus on.
AI Coding Startup Valuation. AI coding startup Magic seeks $1.5-billion valuation in new funding round, aiming to develop AI models for writing software.
AI Lawsuits Implications. AI music lawsuits could shape the future of the music industry, as major labels sue AI firms for alleged copyright infringement.
AI Health Coach Collaboration. OpenAI and Arianna Huffington are collaborating on an 'AI health coach' that aims to provide personalized health advice and guidance based on individual data.
FlashAttention-3 Efficiency. The results show that FlashAttention-3 achieves a 1.5-2.0x speedup on H100 GPUs, reaching up to 740 TFLOPs/s with FP16 and close to 1.2 PFLOPs/s with FP8.
Antitrust Concerns. These changes occur amid growing antitrust concerns over Microsoft's partnership with OpenAI, with regulators in the UK and EU scrutinizing the deal.
Concerns Over AI Safety. OpenAI is facing safety concerns from employees and external sources, raising worries about the potential impact on society.
AI Video Model Development. Odyssey is developing an AI video model that can create Hollywood-grade visual effects and allow users to edit and control the output at a granular level.
Regulatory Scrutiny Reaction. Microsoft has relinquished its observer seat on the board of OpenAI, a move that comes less than eight months after it secured the non-voting position.
OpenAI Security Breach. In early 2022, a hacker infiltrated OpenAI's internal messaging systems, stealing information about the design of the company's AI technologies.
Perception of Progress Assessment. Despite the introduction of this system, there is no consensus in the AI research community on how to measure progress towards AGI, and some view OpenAI's five-tier system as a tool to attract investors rather than a scientific measurement of progress.
Advancements in AGI. OpenAI is reportedly close to reaching Level 2, or 'Reasoners,' which would be capable of basic problem-solving on par with a human with a doctorate degree.
Current AI Level. OpenAI's technology, such as GPT-4o that powers ChatGPT, is currently at Level 1, which includes AI that can engage in conversational interactions.
OpenAI's Five-Tier Model. OpenAI has introduced a five-tier system to track its progress towards developing artificial general intelligence (AGI).
AMD Acquisition News. AMD plans to acquire Silo AI in a $665 million deal.
AI-generated Content Labels. Vimeo joins YouTube and TikTok in launching new AI content labels.
OpenAI and Health Coach. OpenAI and Arianna Huffington are working together on an 'AI health coach.'
Mind-Reading AI. Mind-reading AI recreates what you're looking at with amazing accuracy.
New AI Features. Figma pauses its new AI feature after Apple controversy.
Content Regulation Pressure. There is a need for transparency and regulation in AI content labeling and licensing.
AI Coding Startup. AI coding startup Magic seeks a $1.5-billion valuation in new funding round, sources say.
Elon Musk's GPU Plans. Elon Musk reveals plans to make the world's 'Most Powerful' 100,000 NVIDIA GPU AI cluster.
AI Industry Challenges. We delve into the latest advancements and challenges in the AI industry, highlighting new features from Figma and Quora, regulatory pressures on OpenAI, and significant investments in AI infrastructure.
AI's Limitations. LLMs are great at clustering similar things, but end up 'regurgitating a lot of words with slight paraphrases while adding conceptually little, and understanding even less.'
Partial Regurgitation Defined. The term 'partial regurgitation' is introduced to describe AI's output not being a full reconstruction of the original source.
Regurgitation Process. The regurgitative process need not be verbatim.
Storage of Weights. Neural nets do store weights, but that doesn't mean that they know what they are talking about.
Neural Nets Critique. Gary Marcus criticizes neural nets, stating, 'Neural nets don't really understand anything they read on the web.'
Understanding Proof. Partial regurgitation, no matter how fluent, does not, and will not ever, constitute genuine comprehension.
Need for New Approach. Getting to real AI will require a different approach.
Comparison to DeepMind. By comparison, Google DeepMind devotes a lot of its energy towards projects like AlphaFold that have clear potential to help humanity.
Safety Resources. Furthermore, OpenAI apparently hasn’t even fulfilled their own promises to devote 20% resources to AI safety.
Financial Priorities. Instead, they appear to be focused precisely on financial return, and appear almost indifferent to some of the ways in which their product has already hurt large numbers of people (artists, writers, voiceover actors, etc.).
Product Focus. The first step towards that should be a question about product – are the products we are making benefiting humanity?
OpenAI's Mission. As recently as November 2023, OpenAI promised in their filing as a nonprofit exempt from income tax to make AI that 'benefits humanity … unconstrained by a need to generate financial return'.
Future of AI. Gary Marcus hopes that the most ethical company wins. And that we don’t leave our collective future entirely to self-regulation.
Ethical Concerns. The real issue isn’t whether OpenAI would win in court, it’s what happens to all of us, if a company with a track record for cutting ethical corners winds up first to AGI.
Unmet Safety Promises. OpenAI promised to devote 20% of its efforts to AI safety, but never delivered, according to a recent report.
Call for Independent Oversight. Without independent scientists in the loop, with a real voice, we are lost.
Questioning Government Trust. It's correct for the public to take everything OpenAI says with a grain of salt, especially because of their massive power and chance to potentially put humanity at risk.
Tax Status Conflict. OpenAI filed for non-profit tax exempt status, claiming that the company's mission was to 'safely benefit humanity', even as they turn over almost half their profits to Microsoft.
Governance Promises Broken. Altman once promised that outsiders would play an important role in the company's governance; that key promise has not been kept.
Restrictive Employee Contracts. OpenAI had highly unusual contractual 'clawback' clauses designed to keep employees from speaking out about any concerns about the company.
Altman's Conflicts of Interest. Altman appears to have misled people about his personal holdings in OpenAI, omitting potential conflicts of interest between his role as CEO of the nonprofit OpenAI and other companies he might do business with.
CTO's Miscommunication. CTO Mira Murati embarrassed herself and the company in her interview with Joanna Stern of the Wall Street Journal, sneakily conflating 'publicly available' with 'public domain'.
Copyright Issues. OpenAI has trained on a massive amount of copyrighted material, without consent, and in many instances without compensation.
Misuse of Artist's Voice. OpenAI proceeded to make a Scarlett Johansson-like voice for GPT-4o, even after she specifically told them not to, highlighting their overall dismissive attitude towards artist consent.
OpenAI's Misleading Name. OpenAI called itself open, and traded on the notion of being open, but even as early as May 2016 knew that the name was misleading.
Governance Representation. Sam Altman, 2016: 'We’re planning a way to allow wide swaths of the world to elect representatives to a new governance board.'
Questioning Authority. What happened to the wide swaths of the world? To quote Altman himself, 'Why do these fuckers get to decide what happens to me?'
Accountability Reminder. Gary Marcus keeps receipts.
Conflict of Interest. Sam has now divested his stake in that investment firm.
Toner's Whistleblowing. Toner was pushed out for her sin of speaking up.
Firing Consideration. The board had contemplated firing Sam over trust issues before that.
Safety Process Inaccuracy. On multiple occasions he gave inaccurate information about the small number of formal safety processes that the company did have in place.
ChatGPT Announcement. The board was not informed in advance about that [ChatGPT], we learned about ChatGPT on Twitter.
Sam's Deceit. Putting Toner's disclosures together with the other lies from OpenAI that I documented the other day, I think we can safely put Kara's picture of Sam the Innocent to bed.
Oversight Concerns. Altman is consolidating more and more power and seeming less and less on the level.
Lack of Candor. The (old) board never said that the firing of Sam was directly about safety, they said it was about candor.
Nonprofit Status. If they cannot assemble a board that respects the legal filings they made, and cannot behave in keeping with their oft-repeated promises, they must dissolve the nonprofit.
Trust Issues. If they can't trust Altman, I don't see they can do their job.
Misleading Claims. Both read to me as deeply misleading, verging on defamatory.
Board Attacks. At least two proxies have gone after Helen Toner, one in The Economist, highbrow, one low (a post on X that got around 200,000 views).
Lack of Trust. The degree to which they diverted from that core issue that led to Sam's firing is genuinely disturbing.
Alignment Problem. We are no closer to a solution to the alignment problem now than we were then.
Unmet Expectations. For all the daily claims of 'exponential progress', reliability is still a dream.
Time's Ravages. What I said then to Bach still holds, 100%, 26 months later.
Deep Learning Critique. The ridicule started with my infamous 'Deep Learning Is Hitting a Wall' essay.
Longstanding Warnings. Gary Marcus has warned people about the limits of deep learning, including hallucinations, since 2001.
Musk's Shift. Musk has switched teams, flipping from calling for a pause to going all in on a technology that remains exactly as incorrigible as it ever was.
Slowing Innovation. Christopher Mims echoed much of what I have been arguing here, writing that 'The pace of innovation in AI is slowing, its usefulness is limited, and the cost of running it remains exorbitant.'
No Breakthroughs. It has been almost two years since there’s been a bona fide GPT-4-sized breakthrough, despite the constant boasts of exponential progress.
Lackluster Fireside Chat. Melissa Heikkilä at Technology Review more or less panned Altman’s recent fireside chat at AI for Good.
Financial Conflicts. The Wall Street Journal had a long discussion of Altman’s financial holdings and possible conflicts of interest.
Bad Press for Altman. The bad press about Sam Altman and OpenAI, who once seemingly could do no wrong, just keeps coming.
Musk-LeCun Tension. Yann LeCun just pushed Elon Musk to the point of unfollowing him.
Kara Swisher's Bias. Paris Marx echoed my own feelings about Kara Swisher’s apparent lack of objectivity around Altman.
Key Contributors. The letter itself was cosigned by Bengio, Hinton, and Russell.
Informed Endorsement. I fully endorse its four recommendations.
Gift Link Provided. Roose supplied a gift link.
Common Sense Emphasis. Nowadays we both stress the absolutely essential nature of common sense, physical reasoning and world models, and the failure of current architectures to handle those well.
Future AI Development. If you want to argue that some future, as yet unknown form of deep learning will be better, fine, but with regards to what exists and is popular now, your view has come to mirror my own.
Critique Overlap. Your current critique for what is wrong with LLMs overlaps heavily with what I said repeatedly from 2018 to 2022.
Potential Alliance. The irony of all of this is that you and I are among the minority of people who have come to fully understand just how limited LLMs are, and what we need to do next. We should be allies.
Historical Dismissals. There is a clear pattern: you often initially dismiss my ideas, only to converge on the same place later — without ever citing my earlier arguments.
Funding Decline. Generative AI seed funding drops.
Data Point Validity. Every data point there is imaginary; we aren’t plotting real things here.
Read Marcus's Book. Gary Marcus wrote his new book Taming Silicon Valley in part to address regulatory issues.
Regulatory Failure. Self-regulation is a farce, and the US legislature has made almost no progress thus far.
Underprepared for AGI. We are woefully underprepared for AGI whenever it comes.
Graph Issues. The double Y-axis makes no sense, and presupposes its own conclusion.
GPT-4 Comparisons. GPT-4 is not actually equivalent to a smart high schooler.
AGI Prediction. OpenAI's internal roadmap allegedly projected that AGI would be achieved by 2027.
Industry Pushback. Both the well-known deep-learning expert Andrew Ng and the industry newspaper The Information came out against SB-1047 in vigorous terms.
Self-Regulation Skepticism. Big Tech's overwhelming message is 'Trust Us'. Should we?
Certification Requirements. Anyone training a 'covered AI model' must certify, under penalty of perjury, that their model will not be used to enable a 'hazardous capability' in the future.
Concern over Liability. Andrew Ng complains that the bill defines an unreasonable 'hazardous capability' designation that may make builders of large AI models liable if someone uses their models to do something that exceeds the bill's definition of harm.
Proposed Bill SB-1047. State Senator Scott Wiener and others in California have proposed a bill, SB-1047, that would build in some modest restraints around AI.
Serious Damage Definition. 'Hazardous' is defined here as half a billion dollars in damage; should we give the AI industry a free pass no matter how much harm might be done?
Regulation vs. Innovation. The Information's op-ed complains that 'California's effort to regulate AI would stifle innovation', but never really details how.
Demand for Stronger Regulation. We should be making SB-1047 stronger, not weaker.
Regulatory Support Lack. Not one of the companies that previously stood up and said they support AI regulation is standing up for this one.
Future Expectations. Expect more revisionism and downsized expectations throughout 2024 and 2025.
Kurzweil's New Projection. In an interview published in WIRED, Kurzweil let his predictions slip back, for the first time, to 2032.
Expectations for LLMs. The ludicrously high expectations from the last 18 ChatGPT-drenched months were never going to be met.
OpenAI's CTO Admission. OpenAI's CTO Mira Murati acknowledged that there is no mind-blowing GPT-5 behind the scenes as of yet.
Public Predictions. Nobody to my knowledge has kept systematic track of the predictions, but I took a quick and somewhat random look at X and had no trouble finding many predictions, going back to 2023, almost always optimistic.
GPT-5 Training Status. Sam just a few weeks ago officially announced that they had only just started training GPT-5.
CTO Statement. Mira Murati promised we’d someday see 'PhD-level' models, the next big advance over today’s models, but not for another 18 months.
Delayed GPT-5 Arrival. Today is June 20 and I still don’t see squat. It would now appear that Business Insider’s sources were confused, or overstating what they knew.
Hallucination Concerns. Gary Marcus is still betting that GPT-5 will continue to hallucinate and make a bunch of wacky errors, whenever it finally drops.
Future Predictions Meme. Now arriving Gate 2024, Gate 2025, ... Gate 2026.
New Meme Observed. By now there’s actually a new meme in town. This one’s got even more views.
Confidence in Predictions. A lot of them got tons of views... What stands out the most, maybe, is the confidence with which a lot of them were presented.
Interpretation Misunderstanding. Gary Marcus misunderstood Ray Kurzweil to be revising his prediction for AGI to a later year (perhaps 2032).
Opposing Views on AGI. Gary Marcus stands by his own prediction that we will not see AGI by 2029, per criteria he discussed here.
Debate Potential. Ray Kurzweil and Gary Marcus talked about having a debate, which they hope will come to pass.
AGI Prediction Clarification. Ray Kurzweil confirmed he has neither revised nor redefined his prediction of AGI, still defined as AI that can perform any cognitive task an educated human can, and he still believes it will happen by 2029.
Starting Point. Gary Marcus thinks we have maybe one shot to get AI policy right in the US, and that we aren't off to a great start.
Reality Check Needed. We need a President who can sort truth from bullshit, in order to develop AI policies that are grounded in reality.
Corporate Promises. We need a President who can recognize when corporate leaders are promising things far beyond what is currently realistic.
Tech Hype Shift. The big tech companies are hyping AI with long-term promises that are impossible to verify.
Presidential Understanding. We cannot afford to have a President in 2024 who doesn't fully grasp this.
Future AI Changes. AI is going to change everything, if not tomorrow, sometime over the next 5-20 years, some ways for good, some for bad.
Current AI Errors. Businesses are finally finding this out, too. (Headline in WSJ: 'AI Work Assistants Need a Lot of Handholding', because they are still riddled with errors.)
AI Limitations. Generative AI does in fact (still) have enormous limitations, just as I anticipated.
AI Ignored. Neither president even mentioned AI, which was a travesty of a different sort.
Debate Performance. Former President (and convicted felon) Donald Trump lied like an LLM last night, but still won the debate, because Biden's delivery was so weak.
Understanding Science. Above all else, we need a President who understands and appreciates science.
Urgent AI Policies. We need a President who can get Congress to recognize the true urgency of the moment, since Executive Orders alone are not enough.
Importance of Symbols. I don’t think metacognition can work without bringing explicit symbols back into the mix; they seem essential for high-level reflection.
Funding Concerns. Spending upwards of 100 billion dollars on the current approach seems wasteful if it's unlikely to get to AGI or ever be reliable.
Call for Metacognition. Scaling is not the most interesting dimension; instead, we need techniques, such as metacognition, that can reflect on what is needed and how to achieve it.
Skepticism on AGI. Many tech leaders have discovered that the best way to raise valuations is to hint that AGI is imminent.
Hope for Change. Gary Marcus hopes that people will take what Gates said seriously.
Neurosymbolic AI's Potential. Neurosymbolic AI has long been an underdog; in the end, I expect it to come from behind and be essential.
Need for Robust Software. Tech giants need a serious commitment to software robustness.
Distress Over Regulation. Gary Marcus is deeply distressed that certain tech leaders and investors are putting massive support behind the presidential candidate least likely to regulate software.
AI Regulation Concerns. An unregulated AI industry is a recipe for disaster.
Shortsighted Innovation. Rushing innovative tech without robust foundations seems shortsighted.
Generative AI Limitations. Leaving more and more code writing to generative AI, which grasps syntax but not meaning, is not the answer.
Black Box AI Issues. Chasing black box AI, difficult to interpret, and difficult to debug, is not the answer.
AI Engineering Techniques. As Ernie Davis and I pointed out in Rebooting AI, five years ago, part of the reason we are struggling to build reliable, complex AI systems is that we still lack adequate techniques for engineering complex systems.
Structural Integrity Lacking. Twenty years ago, Alan Kay said 'Most software today is very much like an Egyptian pyramid with millions of bricks piled on top of each other, with no structural integrity, but just done by brute force and thousands of slaves.'
Software Reliability Needed. The world needs to up its software game massively. We need to invest in improving software reliability and methodology, not rushing out half-baked chatbots.
Integrating Prompt Testing. By running prompt tests regularly, you can catch issues early and ensure that prompts continue to perform well as you make changes and as the underlying LLMs are updated.
Evaluating LLM Outputs. Promptfoo offers various ways to evaluate the quality and consistency of LLM outputs.
Time Savings. Prompt testing saves time in the long run by catching bugs early and preventing regressions.
Introduction to Prompt Testing. Prompt testing is a technique specifically designed for testing LLMs and generative AI systems, allowing developers to write meaningful tests and catch issues early.
Testing Necessity. New LLMs are released, existing models are updated, and the performance of a model can shift over time.
Importance of Testing. LLMs can generate nonsensical, irrelevant, or even biased responses.
Newsletter Growth. Help me democratize the most important ideas in AI Research and Engineering to over 100K readers weekly.
Expert Contributions. In the series Guests, I will invite experts to come in and share their insights on various topics that they have studied/worked on.
Getting Started with Prompt Testing. Integrating prompt testing into your development workflow is easy.
Conclusion on Testing. Prompt testing provides a way to write meaningful tests for these systems, helping catch issues early and save significant time in the development process.
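To make the idea concrete, here is a minimal prompt-regression sketch in Python. Note that promptfoo itself is driven by a declarative config and CLI; the `call_llm` function and the test cases below are hypothetical stand-ins, not promptfoo's actual API.

```python
# A minimal prompt-regression sketch. `call_llm` is a hypothetical stand-in
# for whatever model client you actually use (OpenAI, Anthropic, local, etc.).
TEST_CASES = [
    {"prompt": "Summarize in one line: The cat sat on the mat.",
     "must_contain": ["cat"]},
    {"prompt": "Translate to French: Hello",
     "must_contain": ["bonjour"]},
]

def run_prompt_tests(call_llm):
    failures = []
    for case in TEST_CASES:
        output = call_llm(case["prompt"]).lower()
        for needle in case["must_contain"]:
            if needle not in output:
                failures.append((case["prompt"], needle))
    return failures  # run on every prompt change and every model update
```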
Product-Centric Metrics. Evaluate models based on metrics aligned with business goals, such as click-through rate or user churn, to ensure they deliver tangible value.
Dynamic Validation. Continuously update validation datasets to reflect real-world data and capture evolving patterns, ensuring accurate performance assessments.
Active Model Evaluation. Keeping models effective requires active and rigorous evaluation processes.
Three Vs of MLOps. Success in MLOps hinges on three crucial factors: Velocity, Validation, and Versioning.
Frequent Retraining. Regularly retraining models on fresh, labeled data helps mitigate performance degradation caused by data drift and evolving user behavior.
Collaborative Success. Successful project ideas often stem from collaboration with domain experts, data scientists, and analysts.
Overemphasis on Models. A common mistake that teams make is to overemphasize the importance of models and underestimate how much the addition of simple features can contribute to performance.
ML Engineer Tasks. ML engineers engage in four key tasks: data collection and labeling, feature engineering and model experimentation, model evaluation and deployment, and ML pipeline monitoring and response.
MLOps Investment. Investing in MLOps enables the development of 10x teams, which are more powerful in the long run.
MLOps Importance. Organizations often underestimate the importance of investing in the right MLOps practices.
Sustaining Model Performance. Maintaining models post-deployment requires deliberate practices such as frequent retraining on fresh data, having fallback models, and continuous data validation.
Simplicity in Models. Prioritizing simple models and algorithms over complex ones can simplify maintenance and debugging while still achieving desired results.
Reducing Alert Fatigue. Focus on Actionable Alerts: Prioritize alerts that indicate real problems requiring immediate attention.
Alert Fatigue Awareness. A common pitfall in data quality monitoring is alert fatigue.
Data Leakage Prevention. Thorough Data Cleaning and Validation: Scrutinize your data for inconsistencies, missing values, and potential leakage points.
Risks with Jupyter Notebooks. Notebooks let you trade quality for simplicity + velocity.
Tools and Experience. Engineers like tools that enhance their experience.
Anti-Patterns in MLOps. Several anti-patterns hinder MLOps progress, including the mismatch between industry needs and classroom education.
Streamline Deployments. Streamlining deployments and tools that predict end-to-end gains could minimize wasted effort.
Long Tail of ML Bugs. Debugging ML pipelines presents unique challenges due to the unpredictable and often bespoke nature of bugs.
Data Error Handling. ML engineers face challenges in handling a spectrum of data errors, such as schema violations, missing values, and data drift.
Handling Data Errors. These can be addressed by developing/buying tools for real-time data quality monitoring and automatic tuning of alerting criteria.
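A toy version of such a monitor, to make the idea concrete (column names and thresholds here are hypothetical; a production system would tune its alerting criteria automatically):

```python
import numpy as np
from scipy.stats import ks_2samp

def check_batch(batch, reference, expected_cols, max_missing=0.05):
    """batch/reference: dicts mapping column name -> numpy array."""
    issues = []
    for col in expected_cols:
        if col not in batch:
            issues.append(f"schema violation: column {col!r} is missing")
            continue
        vals = np.asarray(batch[col], dtype=float)
        if np.isnan(vals).mean() > max_missing:
            issues.append(f"{col}: missing-value rate above {max_missing:.0%}")
            continue
        # a two-sample Kolmogorov-Smirnov test as a cheap drift signal
        _, p = ks_2samp(vals[~np.isnan(vals)], reference[col])
        if p < 0.01:
            issues.append(f"{col}: possible drift (KS p={p:.3g})")
    return issues
```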
Development-Production Mismatch. There are discrepancies between development and production environments, including data leakage; differing philosophies on Jupyter Notebook usage; and non-standardized code quality.
ML Engineering Tasks. The 4 major tasks that an ML Engineer works on.
Machine Learning Breakdown. In my series Breakdowns, I go through complicated literature on Machine Learning to extract the most valuable insights.
Tribal Knowledge Risks. Undocumented tribal knowledge can create bottlenecks and dependencies, hindering collaboration.
Documenting Knowledge. To avoid this, prioritize documentation, knowledge sharing, and cross-training.
C*-Algebraic ML. Looks like more and more people are looking to integrate Complex numbers into Machine Learning.
Saudi Arabia's Neom Project. The Saudi government had hoped to have 9 million residents living in 'The Line' by 2030, but this has been scaled back to fewer than 300,000.
Fractal Molecule Discovery. Researchers from Germany, Sweden, and the UK have discovered an enzyme produced by a single-celled organism that can arrange itself into a fractal.
Software Design Principles. During the design and implementation process, I found that the following list of 'rules' kept coming back up over and over in various scenarios.
Generative AI Insights. Some really good insights on building Gen AI from LinkedIn.
LLM Reading Notes. The May edition of my LLM reading note is out.
Drug Design Transformation. We hope AlphaFold 3 will help transform our understanding of the biological world and drug discovery.
AlphaFold 3 Predictions. In a paper published in Nature, we introduce AlphaFold 3, a revolutionary model that can predict the structure and interactions of all life’s molecules with unprecedented accuracy.
Spotlight on Aziz. Mohamed Aziz Belaweid writes the excellent, 'Aziz et al. Paper Summaries', where he summarizes recent developments in AI.
AI Education Support. Your generosity is crucial to keeping our cult free and independent, and it helps me provide high-quality AI education to everyone.
AI Made Simple Community. We started an AI Made Simple Subreddit.
Language Processing Potential. Text Diffusion might be the next frontier of LLMs, at least for specific types of tasks.
Efficient Time Series Imputation. CSDI, using score-based diffusion models, improves upon existing probabilistic imputation methods by capturing temporal correlations.
Emerging LLM Techniques. Microsoft's GENIE achieves comparable performance with state-of-the-art autoregressive models and generates more diverse text samples.
Versatility of DMs. Diffusion models are applicable to a wide range of data modalities, including images, audio, molecules, etc.
Step-by-Step Control. The step-by-step generation process in diffusion models allows users to exert greater control over the final output, enabling greater transparency.
AlphaFold 3 Innovation. Google's AlphaFold 3 is gaining a lot of attention for its potential to revolutionize bio-tech. One of the key innovations that led to its performance gains over previous methods was its utilization of diffusion models.
Diffusion Models Explained. Diffusion Models are generative models that follow 2 simple steps: First, we destroy training data by incrementally adding Gaussian noise. Training consists of recovering the data by reversing this noising process.
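For the curious, here is what that forward (noising) process and training objective look like in a generic DDPM-style sketch. The schedule values are illustrative, and this is the textbook recipe rather than AlphaFold 3's specific diffusion module.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)           # noise schedule (illustrative)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative signal retention

def q_sample(x0, t, eps):
    # destroy data incrementally: x_t = sqrt(a_bar_t)*x0 + sqrt(1-a_bar_t)*eps
    a = alphas_bar[t].sqrt().view(-1, 1)
    s = (1.0 - alphas_bar[t]).sqrt().view(-1, 1)
    return a * x0 + s * eps

def training_loss(model, x0):
    t = torch.randint(0, T, (x0.shape[0],))
    eps = torch.randn_like(x0)
    x_t = q_sample(x0, t, eps)
    # learning to reverse the noising = learning to predict the added noise
    return ((model(x_t, t) - eps) ** 2).mean()
```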
High-Quality Generation. Diffusion models generate data with exceptional quality and realism, surpassing previous generative models in many tasks.
Application in Medical Imaging. Diffusion models have shown great promise in reconstructing Medical Images.
Greenwashing Example. Europe's largest oil and gas company, Shell, was accused of selling millions of carbon credits tied to CO2 removal that never took place.
Share Interesting Content. The goal is to share interesting content with y’all so that you can get a peek behind the scenes into my research process.
Meta Llama-3 Release. Our first agent is a finetuned Meta-Llama-3-8B-Instruct model, which was recently released by the Meta GenAI team.
Deep Learning Method Spotlight. The DSDL framework significantly outperforms other dynamical and deep learning methods.
Venture Capital Overview. A great overview by Rubén Domínguez Ibar of how venture capital firms make decisions.
AI Regulation Insight. The regulation is primarily based on how risky your use case is rather than what technology you use.
Fungal Computing Potential. Unlock the secrets of fungal computing! Discover the mind-boggling potential of fungi as living computers.
Gaming and Chatbots. Limited Risk AI Systems like chatbots or content generation require transparency to inform users they are interacting with AI.
High-Risk AI Systems. High-Risk AI Systems are involved in critical sectors like healthcare, education, and employment, where there's a significant impact on people's safety or fundamental rights.
Community Spotlight Resource. Kiki's Bytes is a super fun YouTube channel that covers various System Design case studies.
Upcoming Articles Preview. Curious about what articles I’m working on? Here are the previews for the next planned articles.
Neural Networks Versatility. Thanks to their versatility, Neural Networks are a staple in most modern Machine Learning pipelines.
Credit Scoring Adaptation. Factors that predicted high creditworthiness a few years ago might not hold true today due to changing economic conditions or consumer behavior.
Evolving Language Models. Language Models trained on social media data need to adapt to constantly evolving language use, slang, and emerging topics.
Simplifying Data Augmentation. Before you decide to get too clever, consider this statement from TrivialAugment: the simplest method was so far overlooked, even though it performs comparably or better.
Gradient Reversal Layer. The gradient reversal layer acts as an identity function during the forward pass but reverses gradients during backpropagation, creating a minimax game between the feature extractor and the domain classifier.
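The gradient reversal layer is only a few lines in practice. Here is the standard PyTorch formulation (a sketch of the published technique, not the authors' exact code):

```python
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)  # identity on the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        # flip (and scale) gradients flowing back to the feature extractor
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)
```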
Impact on Sentiment Analysis. Our experiments on a sentiment analysis classification benchmark... show that our neural network for domain adaptation algorithm has better performance than either a standard neural network or an SVM.
Adversarial Training Process. Domain-Adversarial Training (DAT) involves training a neural network with two competing objectives: to accurately perform the main task and to confuse a domain classifier that tries to distinguish between source and target domain data.
The Role of DANN. DANNs theoretically attain domain invariance by learning domain-invariant features.
Mitigating Distribution Shift. Good data + adversarial augmentation + constant monitoring works wonders.
Sources of Distribution Shift. Possible sources of distribution shift include sample selection bias, non-stationary environments, domain adaptation challenges, data collection and labeling issues, adversarial attacks, and concept drift.
Understanding Distribution Shift. Distribution shift, also known as dataset shift or covariate shift, is a phenomenon in machine learning where the statistical distribution of the input data changes between the training and deployment environments.
Improving Generalization. There are several ways to improve generalization, such as implementing sparsity and/or regularization to reduce overfitting and applying data augmentation to mithridatize your models.
Challenges in Neural Networks. There are several underlying issues with the training process that scale does not fix, chief amongst them being distribution shift and generalization.
Community and Introspection. Epicurus encouraged his followers to form close-knit communities that allow their members to step back and help each other critically analyze the events around them.
Friendship Statistics. People with no friends or poor-quality friendships are twice as likely to die prematurely, according to Holt-Lunstad's meta-analysis of more than 308,000 people.
Friendship Importance. Epicurus has a particularly strong emphasis on the importance of friendship as a must for a happy life.
Social Media Awareness. Epicurean philosophy is a good reminder to keep vigilant about how we’re being influenced by the constant subliminal messaging and to only pursue the pleasures that we want for ourselves.
Epicurean Philosophy. Epicurean philosophy is based on a simple supposition: we are happy when we remove the things that make us unhappy.
Reading Recommendation. The plan is to do one of these a month as a special reading recommendation.
Self-Reflection Necessity. A good community directly benefits self-reflection.
Happiness Through Simplicity. True happiness doesn’t come from endlessly chasing pleasure, but from systematically eliminating the sources of our unhappiness.
Research Areas. A lot of current research focuses on LLM architectures, data sources, prompting, and alignment strategies.
Greater Performance Gains. AnglE consistently outperforms SBERT, achieving an absolute gain of 5.52%.
AnglE Optimization. AnglE optimizes not only the cosine similarity between texts but also the angle to mitigate the negative impact of the saturation zones of the cosine function on the learning process.
Contrastive Learning Impact. Contrastive Learning encourages similar examples to have similar embeddings and dissimilar examples to have distinct embeddings.
Modeling Relations. RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space uses complex numbers for knowledge graph embedding.
Complex Geometry Advantage. The complex plane provides a richer space to capture nuanced relationships and handle outliers.
Orthogonality Benefits. Orthogonality helps the model to capture more nuanced relationships and avoid unintended correlations between features.
Angular Representation. Focusing on angles rather than magnitudes avoids the saturation zones of the cosine function, enabling more effective learning and finer semantic distinctions.
Saturation Zones. The saturation zones of the cosine function can kill the gradient and make the network difficult to learn.
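You can see the saturation directly: the gradient of cos(θ) is -sin(θ), which vanishes as θ approaches 0 or π. A quick illustrative check:

```python
import numpy as np

theta = np.linspace(0.0, np.pi, 7)
grad = -np.sin(theta)  # d/dtheta of cos(theta)
for t, g in zip(theta, grad):
    print(f"theta={t:.2f}  |grad|={abs(g):.3f}")
# |grad| -> 0 near theta = 0 and theta = pi: pairs that are already very
# similar or very dissimilar stop contributing a useful learning signal.
```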
Challenges in Embeddings. Current Embeddings are held back by three things: Sensitivity to Outliers, Limited Relation Modeling, and Inconsistency.
Enhancing NLP. Good Embeddings allow three important improvements: Efficiency, Generalization, and Improved Performance.
Next-Gen Embeddings. Today we will primarily look at 4 publications to see how we can improve embeddings by exploring a dimension that has been left untouched: their angles.
LLMs Hitting Wall. This is what leads to the impression that "LLMs are hitting a wall".
Critical Flaws. Such developments have 3 inter-related critical flaws: They mostly work by increasing the computational costs of training and/or inference, they are a lot more fragile than people realize, and they are incredibly boring.
Mental Space for Writing. Writing/Research takes a lot of mental space, and I don’t think I could do a good job if I was constantly firefighting these issues.
Communication Efforts. I have started communicating with the reader, my company, and Stripe/the bank.
Long Review Process. I have been told the review by the bank could take up to 3 months.
Stripe's Negative Balance Policy. Stripe does not let you use future deposits to settle balances, which makes sense from their perspective but leaves me in this weird situation.
Stripe Payouts Paused. Due to all of this, Stripe has paused all my payouts.
Financial Loss. I lose money on every fraud claim. In this case, Stripe has removed 70 USD from my Stripe account: 50 for the base plan + 20 in fees.
Fraudulent Claim Issue. Unfortunately, one of the readers missed this. They signed up for a 50 USD/year plan and marked that transaction as fraudulent, causing complications.
Indefinite Pause. AI Made Simple will be going on an indefinite pause now.
Change in Payout Schedule. I’ve switched the payout schedule to monthly to ensure that I always have a buffer in my Stripe Account to handle issues like this.
Client Payment Process. I am monetizing this newsletter through my employer, SVAM International (US work laws bar me from taking money from anyone who is not my employer).
Spline Usage. KANs use B-splines to approximate activation functions, providing accuracy, local control, and interpretability.
Interactive KANs. Users can collaborate with KANs through visualization tools and symbolic manipulation functionalities.
Explainability Benefits. KANs are more explainable, which is a big plus for sectors where model transparency is critical.
Accuracy of KANs. KANs can achieve lower RMSE loss with fewer parameters compared to MLPs for various tasks.
Performance and Training. KAN training is 10x slower than NN training, which may limit their adoption in more mainstream directions that are dominated by scale.
Sparse Compositional Structures. A function has a sparse compositional structure when it can be built from a small number of simple functions, each of which only depends on a few input variables.
KAN Advantages. KANs use learnable activation functions on edges, which makes them more accurate and interpretable, especially useful for functions with sparse compositional structures.
Kolmogorov-Arnold Representation. The KART states that any continuous function with multiple inputs can be created by combining simple functions of a single input (like sine or square) and adding them together.
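A toy illustration of the idea behind a KAN layer: every edge carries its own learnable univariate function, and outputs are just sums of these. Here, Gaussian bumps stand in for the B-splines used in the actual paper; this is a simplified sketch, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def bumps(x, centers, width=0.5):
    # Gaussian basis as a simple stand-in for the paper's B-splines
    return np.exp(-((x[..., None] - centers) ** 2) / (2 * width**2))

class ToyKANLayer:
    def __init__(self, n_in, n_out, n_basis=8):
        self.centers = np.linspace(-2, 2, n_basis)
        # one learnable univariate function per (input, output) edge
        self.coef = rng.normal(0.0, 0.1, size=(n_in, n_out, n_basis))

    def __call__(self, x):                    # x: (batch, n_in)
        phi = bumps(x, self.centers)          # (batch, n_in, n_basis)
        # apply each edge's function to its input, then sum over inputs
        return np.einsum("bik,iok->bo", phi, self.coef)

y = ToyKANLayer(3, 2)(rng.normal(size=(5, 3)))  # -> shape (5, 2)
```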
KAN Overview. This article will explore KANs and their viability in the new generation of Deep Learning.
Educational Importance. Even if we find fundamental limitations that make KANs useless, studying them in detail will provide valuable insights.
Grid Extension Technique. The grid extension technique allows KANs to adapt to changes in data distribution by increasing the grid density during training.
Need for Public Dialogue. Encouraging open dialogue and debate fosters critical thinking, raising awareness about oppression and empowering individuals to resist manipulation.
Technology and Risk. The lack of risk judgment and decision-making training is prevalent across roles and professions that most need it, revealing gaps in corporate risk management.
Current Gen Z Struggles. 67% of people aged 18 to 34 feel 'consumed' by their worries about money and stress, making it hard to focus; this is part of the Gen Z mental health crisis.
Societal Symptoms. Being 'busy with work' has become a default way for people to spend their time, symptomatic of what Arendt called the 'victory of the animal laborans.'
Banality of Evil. Arendt argued that Adolf Eichmann's participation in the Holocaust was driven by thoughtlessness and blind obedience to authority, reflecting the concept of 'Banality of Evil.'
Totalitarianism Origins. Arendt argued that totalitarianism was a new form of government arising from the breakdown of traditional society and an increasingly ungrounded populace.
The Active Life Components. Hannah Arendt broke life down into 3 kinds of activities: Labor, Work, and Action, emphasizing that modern society deprioritizes the latter two.
Hannah Arendt Insights. Hannah Arendt was a 20th-century political theorist, well known for her thoughts on the nature of evil, the rise of totalitarianism, and her strong emphasis on the importance of living the 'active life.'
Challenge Comfort with Beliefs. Having good-faith conversations and the willingness to challenge deeply held beliefs is essential to fight dogma and ensure a society of free individuals.
AI Structural Concerns. The push for AI alignment by corporations may suppress inconvenient narratives, illustrating a paternalistic approach to technology.
High Cost of Red-teaming. Good red-teaming can be very expensive, since it requires a combination of domain expertise and AI expertise for crafting and testing prompts.
ACG Effectiveness. In the time that it takes ACG to produce successful adversarial attacks for 64% of the AdvBench set, GCG is unable to produce even one successful attack.
ACG Methodology. The Accelerated Coordinate Gradient (ACG) attack method combines algorithmic insights and engineering optimizations on top of GCG to yield a ~38x speedup.
Haize Labs Automation. Haize Labs seeks to rigorously test an LLM or agent with the purpose of preemptively discovering all of its failure modes.
Shift in Gender Output. The base model generates approximately 80% male and 20% female customers, while the aligned model generates nearly 100% female customers.
Bias Distribution Changes. The alignment process would likely create new, unexpected biases that were significantly different from your baseline model.
Lower Output Diversity. Aligned models exhibit lower entropy in token predictions, form distinct clusters in the embedding space, and gravitate towards 'attractor states', indicating limited output diversity.
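Entropy here is the plain Shannon entropy of the next-token distribution. If you have access to logits, it is easy to measure (a generic sketch, not the specific evaluation harness behind these findings):

```python
import torch
import torch.nn.functional as F

def mean_token_entropy(logits):
    """logits: (batch, seq_len, vocab). Lower values = less output diversity."""
    logp = F.log_softmax(logits, dim=-1)
    entropy = -(logp.exp() * logp).sum(dim=-1)  # Shannon entropy per token
    return entropy.mean().item()
```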
LLM Understanding. People often underestimate how little we understand about LLMs and the alignment process.
Adversarial Attack Generalization. The attack didn’t apply to any other model (including the base GPT).
Low Safety Checks. Many of them are too dumb: the prompts and checks for what is considered a 'safe' model set too low a bar to be meaningful.
Red-teaming Purpose. Red-teaming/Jailbreaking is a process in which AI people try to make LLMs talk dirty to them.
Python Precision Issues. Python compares the integer value against the double precision representation of the float, which may involve a loss of precision, causing these discrepancies.
Deep Learning Insight. This paper presents a framework, HypOp, that advances the state of the art for solving combinatorial optimization problems in several aspects.
AI-Relations Trend. The ratio of people who reach out to me for AIRel vs ML roles has gone up significantly over the last 2–3 months.
Model Performance Challenge. We demonstrate here a dramatic breakdown of function and reasoning capabilities of state-of-the-art models trained at the largest available scales.
Community Engagement. If you/your team have solved a problem that you’d like to share with the rest of the world, shoot me a message and let’s go over the details.
Reading Inspired. I figured I’d start sharing whatever AI Papers/Publications, interesting books, videos, etc I came across each week.
Legal AI Evaluation. We argue that this claim is not supported by the current evidence, diving into AI’s roles in various legal tasks.
TechBio Resources. We have a strong bio-tech focus this week b/c of all my reading into that space.
Performance Comparison. MatMul-Free LLMs (MMF-LLMs) achieve performance on-par with state-of-the-art Transformers that require far more memory during inference at a scale up to at least 2.7B parameters.
Training Efficiency Improvements. To counteract smaller gradients due to ternary weights, larger learning rates than those typically used for full-precision models should be employed.
Learning Rate Strategy. For the MatMul-free LM, the learning dynamics necessitate a different learning strategy, maintaining the cosine learning rate scheduler and then reducing the learning rate by half.
Memory Transfer Optimization. The Fused BitLinear Layer eliminates the need for multiple data transfers between memory levels, significantly reducing overhead.
Fused BitLinear Layer. The Fused BitLinear Layer combines operations and reduces memory accesses, significantly boosting training efficiency and lowering memory consumption.
Linear Layer Efficiency. Replacing non-linear operations with linear ones can boost your parallelism and simplify your overall operations.
Matrix Multiplication Bottleneck. Matrix multiplications (MatMul) are a significant computational bottleneck in Deep Learning, and removing them enables the creation of cheaper, less energy-intensive LLMs.
Simplified Operations. The secret to their great performance rests on a few innovations that follow two major themes- simplifying expensive computations and replacing non-linearities with linear operations.
Cost Reduction Strategies. The core idea includes restricting weights to the values {-1, 0, +1} to replace multiplications with simple additions or subtractions.
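The trick is easy to demo: with weights restricted to {-1, 0, +1}, a 'matrix multiply' reduces to gathering, adding, and subtracting activations. A numpy sketch of the idea (not the paper's fused GPU kernels):

```python
import numpy as np

rng = np.random.default_rng(0)

def ternarize(W, thresh=0.05):
    # map full-precision weights to {-1, 0, +1}
    return np.where(W > thresh, 1, np.where(W < -thresh, -1, 0)).astype(np.int8)

def ternary_matvec(W_t, x):
    # no multiplications: each output is a sum of +x terms minus a sum of -x terms
    return np.array([x[row == 1].sum() - x[row == -1].sum() for row in W_t])

W_t = ternarize(rng.normal(size=(4, 8)))
x = rng.normal(size=8)
assert np.allclose(ternary_matvec(W_t, x), W_t @ x)  # same result, no MatMul
```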
GPU Efficiency. We also provide a GPU-efficient implementation of this model which reduces memory usage by up to 61% over an unoptimized baseline during training.
Choco Milk Cult. Our chocolate milk cult has a lot of experts and prominent figures doing cool things.
AI-Human Relationship. The AI-human relationship dynamic is not something that I know much about.
Emotional Intelligence. Develop VCSAs to incorporate emotional intelligence to enhance user engagement and satisfaction.
Control Mechanisms. Ensure that VCSAs include features that give users a sense of control and the ability to communicate successfully with their devices.
Design for Imperfection. Design VCSAs to exhibit some level of imperfection to create relaxed interactions.
Managerial Implications. Encourage Partner-like interactions: use speech acts and algorithms to promote the perception of VCSAs as partners.
Partner Relationship. The perception of the relationship with the VCSA as a real partner attributes a distinct personality to the VCSA, making it an appealing entity.
Master Relationship. Some perceived the VCSA as a master, feeling like servants bound by its rules and unpredictable nature.
Servant Relationship. Young consumers frequently envisioned their VCSA as a servant that helps consumers realize their tasks.
Types of Relationships. From the results of the study three different relationships emerge: servant-master dynamic, dominant entity, and equal partners.
Controls and Preferences. Consumers may relate to anthropomorphized products either as others or as extensions of the self.
Self-extension Theory. If you think about the influence that particularly valuable products have on you, you increasingly consider them extensions of yourself.
Uncanny Valley. The Uncanny Valley represents clearly how different degrees of anthropomorphism can change our feelings and attitudes toward technologies and AI assistants.
Anthropomorphism Effects. Evidence shows that anthropomorphized products can enhance consumer preference, make products appear more vivid, and increase their perceived value.
Anthropomorphism Concept. Today's scholars focus on the broad concept of anthropomorphism: essentially, it is humans' tendency to perceive humanlike agents in nonhuman entities and events.
VCSAs Definition. Alexa, Google Home, and similar devices fall into the category of so-called 'voice-controlled smart assistants' (VCSAs).
Alexa Love. Amazon reported that half a million people told Alexa they loved her.
Marriage Proposals. A good portion of those even said they would marry her.
Human-like Interactions. When we interact with devices like Alexa or Google Home, we have different ways of thinking about ourselves and we relate to them differently from other people.
Skepticism on Technology. While I can’t imagine my life without tech, most of the activities that I enjoy are physical that would be very hard to simulate adequately.
Generational Perspective. I am a Gen Z kid who grew up with technology.
Inflection AI's Revenue Failure. Inflection AI’s revenue was, in the words of one investor, “de minimis.” Essentially zilch.
Data Contextuality in Healthcare Algorithms. A bombshell study found that a clinical algorithm many hospitals were using to decide which patients need care was showing racial bias.
AGI and Reduction of Information. The implication of this for generalized intelligence is clear. Reducing the amount of information to focus on what is important to a clearly defined problem is antithetical to generalization.
Contextual Nature of Data. Good or bad data is defined heavily by the context.
Statistical Proxy Limitations. Within any dataset is an implicit value judgment of what we consider worth measuring.
Good Data Removes Noise. Good Data Doesn’t Add Signal; it Removes Noise.
Skepticism About Generalized Intelligence. Ultimately, my skepticism around the viability of 'generalized intelligence' emerging by aggregating comes from my belief that there is a lot about the world and its processes that we can’t model within data.
Issues with Self-Driving Cars. Self-driving cars do find merges challenging.
AI Flattens Data Analysis. AI Flattens: By its very nature, AI works by abstracting the commonalities.
Data-Driven vs Mathematical Insights. My thesis can be broken into two parts. Firstly, I argue that Data-Driven Insights are a subclass of mathematical insights.
Yann LeCun's AGI Claim. Yann LeCun has made headlines with his claim that 'LLMs are an off-ramp to AGI.'
AI's PR Campaign. This has led to a massive PR campaign to rehab AI's image and prepare for the next round of fundraising.
AI's Financial Cost for Microsoft. This is costing Microsoft more than $650 million.
Generative AI Commercialization Struggles. Close to 2 years since the release of ChatGPT, organizations have struggled to capitalize on the promise of Generative AI.
Curated Insights. In issues of Updates, I will share interesting content I came across.
AI Market Hype. AI has many useful use cases, but it's important not to let yourself be manipulated by people trying to piggyback off successful projects to sell their hype.
Knowledge Distillation. Knowledge distillation is a model training method that trains a smaller model to mimic the outputs of a larger model.
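The classic recipe (due to Hinton et al.) blends a soft loss against the teacher's temperature-scaled outputs with the ordinary hard-label loss. A minimal PyTorch sketch:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # soft targets: match the teacher's softened distribution
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients stay comparable across temperatures
    # hard targets: ordinary cross-entropy on the true labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```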
Impacts of FoodTech. The impact of food-related sciences is immense, proving that food is not just a basic necessity but a pivotal element in saving lives.
Security Challenges. Demand for high-performance chips designed specifically for AI applications is spiking.
AI Tokenization Method. The tokenizer for Claude 3 and beyond handles numbers quite differently from its competitors.
Reading Interest. If you want to keep your finger on the pulse of the tech-bio space, she's an elite resource.
Technical Insight Source. Hai doesn’t shy away from talking about the Math/Technical Details, which is a rarity on LinkedIn.
Spotlight on Expertise. Hai Huang is a Senior Staff Engineer at Google, working on their AI for productivity projects.
SWE-Bench Overview. SWE-bench is a comprehensive evaluation framework comprising 2,294 software engineering problems sourced from real GitHub issues and their corresponding pull requests across 12 popular Python repositories.
Agility in Code Editing. The experiments reveal that agents are sensitive to the amount of content displayed in the file viewer, and striking the right balance is essential for performance.
Optimizing Agent Interfaces. Human user interfaces may not always be the most suitable for agent-computer interactions, calling for improved localization through faster navigation and more informative search interfaces tailored to the needs of language models.
Improving Error Recovery. Implementing guardrails, such as a code syntax checker that automatically detects mistakes, can help prevent error propagation and assist agents in identifying and correcting issues promptly.
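For Python repositories, even something as small as an `ast.parse` gate captures the spirit of this guardrail. This is a toy stand-in for the kind of linting an agent-computer interface would integrate:

```python
import ast

def syntax_guardrail(proposed_code: str):
    """Reject an agent's edit before it is written to disk."""
    try:
        ast.parse(proposed_code)
        return True, ""
    except SyntaxError as e:
        return False, f"SyntaxError on line {e.lineno}: {e.msg}"

ok, msg = syntax_guardrail("def f(:\n    pass")
print(ok, msg)  # False, with a line number and message fed back to the agent
```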
SWE-Agent Functionalities. SWE-Agent offers commands that enable models to create and edit files, streamlining the editing process into a single command that facilitates easy multi-line edits with consistent feedback.
Key ACI Properties. ACIs should prioritize actions that are straightforward and easy to understand to minimize the need for extensive demonstrations or fine-tuning.
Effective ACI Design. By designing effective ACIs, we can harness the power of language models to create intelligent agents that can interact with digital environments in a more intuitive and efficient manner.
SWE-Agent Performance. When using GPT-4 Turbo as the base LLM, SWE-agent successfully solves 12.5% of the 2,294 SWE-bench test issues, significantly outperforming the previous best resolve rate of 3.8%.
Adversarial AI Rise. Deepfakes typify the cutting edge of adversarial AI attacks, with a 3,000% increase last year alone; incidents are projected to rise by another 50% to 60% in 2024.
AI Functionality Potential. We believe this process creates artifacts or fingerprints that ML models can detect.
Early Project Insights. We were good at the main task but had terrible generalization and robustness.
Social Media Influence. AI models are starting to gain a lot of popularity online, with some influencers earning significant incomes.
Deepfake Detection Collaboration. If your organization deals with Deepfakes, reach out to customize the baseline solution to meet your specific needs.
Model Performance. Our best models scored very good results, with top models achieving 0.93 (SVC), 0.82 (RandomForest), and 0.80 (XGBoost).
Affordable Detection Solutions. Many cutting-edge Deepfake Detection setups are too costly to run at scale, severely limiting their utility in high-scale environments like Social Media.
Detection Strategy Development. Our goal is to classify an input image into one of three categories (real, deepfake, or AI-generated), which helps organizations catch deepfakes amidst enterprise fraud.
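As a sketch of the shape of such a pipeline (the feature-extraction step is hypothetical and elided; the actual detector's fingerprint features are the interesting part and aren't reproduced here):

```python
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def train_detector(X, y):
    """X: per-image feature vectors (from a hypothetical extraction step);
    y: labels in {"real", "deepfake", "ai-generated"}."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0
    )
    clf = SVC(kernel="rbf", probability=True).fit(X_tr, y_tr)
    return clf, clf.score(X_te, y_te)  # accuracy on the held-out split
```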
Enterprise Security Concerns. 60% of CISOs, CIOs, and IT leaders are afraid their enterprises are not prepared to defend against AI-powered threats and attacks.
Deepfake Market Growth. Deepfake-related losses are expected to soar from $12.3 billion in 2023 to $40 billion by 2027, growing at an astounding 32% compound annual growth rate.