OpenAI Hardware Move. OpenAI's move into hardware production is a significant development for the company.
Tom Hanks Warning. Tom Hanks warns followers to be wary of 'fraudulent' ads using his likeness through AI.
China's Chip Advancements. China's chip capabilities are reportedly just 3 years behind TSMC, showcasing rapid advancements.
Investment in AI Companies. Ilya Sutskever's startup, Safe Superintelligence, raises $1B, signaling strong investor confidence in AI.
AI Regulation in California. California's pending AI regulation bill highlights growing governmental interest in AI oversight.
AI Training Advances. Advances in training language models with long-context capabilities are emerging in the AI landscape.
Amazon AI Robotics. Amazon's strategic acquisition in AI robotics is a notable event in the industry.
Micro and Macro Impact. OSS is really good at solving big, important problems that affect tons of people.
Benefits of Sharing. Companies that share their software get better street cred, outsource a lot of R&D to people for free, and hook more people into their ecosystem.
Cost Reduction Strategies. Adopting preexisting OS tools allows companies to reduce costs, build more secure systems, and iterate quickly.
End-User Benefits. End-users benefit from AI-powered applications that are improved through open-source collaboration.
Developer Portfolio Boost. Participation in open-source AI projects enhances career prospects as developers build public portfolios showcasing expertise in a highly competitive field.
Complementary Forces. Open and closed software are often complementary forces, blended together to create a useful end product.
Learning Budget Support. Many companies have a learning budget that you can expense this newsletter to.
Open Source Investment. Companies invest significantly in open-source software (OSS) for enhanced innovation and competitive advantage.
Diverse Contributor Benefits. OSS attracts a diverse set of contributors, leading to more efficient and innovative solutions.
Fostering Innovation. OSS leads to cheaper, safer, and more accessible products, all benefiting end users.
Invest in Community Building. It is critical for any group to invest in creating a developer-friendly open-source project through comprehensive documentation and community engagement.
Ecosystem Development. Collaborating with other organizations to create integrated AI solutions expands market opportunities.
Training and Support. Providing training and certification in open-source AI frameworks can also generate revenue and build a community of skilled users.
OSS and Innovation. Open-source projects tend to explore more novel directions, lacking the short-term profit motives of traditional companies.
AI Potential Advancements. These models represent a major leap forward in AI’s problem-solving potential, paving the way for new advancements in fields like medicine, engineering, and advanced coding tasks.
Autonomous AI Agents. 1,000 autonomous AI agents collaborate to build their own society in a Minecraft server, forming a merchant hub and establishing a constitution.
Humanoid Robot Development. A robotics company in Silicon Valley has made significant progress in developing humanoid robots for real-world work scenarios.
DataGemma Introduction. Google introduces DataGemma, a pair of open-source AI models that address the issue of inaccurate answers in statistical queries.
Adobe Firefly Milestone. Adobe's Firefly Services, the company's AI-driven innovation, has reached a milestone of 12 billion generations.
Runway AI Upgrade. AI video platform RunwayML has introduced a new video-to-video tool in its latest model, Gen-3 Alpha.
Corporate Structure Change. Sam Altman announced that the company's non-profit corporate structure will undergo changes in the coming year, moving away from being controlled by a non-profit.
API Costs High. For developers, however, it’s worth noting that the model takes much longer to produce outputs and the API costs for o1 are significantly higher than GPT-4o.
Training Approach. What sets o1 apart is its training approach—unlike previous GPT models, which were trained to mimic data patterns, o1 uses reinforcement learning to think through problems, step by step.
Reasoning Capabilities. OpenAI describes this release as a 'preview,' highlighting its early-stage nature, and positioning o1 as a significant advancement in reasoning capabilities.
OpenAI o1 Model. OpenAI has introduced this new model as part of a planned series of 'reasoning' models aimed at tackling complex problems more efficiently than ever before.
Microsoft's Usage Caps. Microsoft's Inflection adds usage caps for Pi; Cerebras Systems launches new AI inference services competing with Nvidia.
U.S. Restrictions on China. U.S. gov't tightens China restrictions on supercomputer component sales.
AI Advancements. Google's AI advancements with Gemini 1.5 models and AI-generated avatars, along with Samsung's lithography progress.
Chinese GPU Access. Chinese Engineers Reportedly Accessing NVIDIA's High-End AI Chips Through Decentralized 'GPU Rental Services'.
Elon Musk's Support. Elon Musk voices support for California bill requiring safety tests on AI models.
Poll on SB1047. Poll: 7 in 10 Californians Support SB1047, Will Blame Governor Newsom for AI-Enabled Catastrophe if He Vetoes.
AI Regulation. AI regulation discussions including California's SB1047, China's AI safety stance, and new export restrictions impacting Nvidia's AI chips.
Bias in AI. Biases in AI, prompt-leak attacks, transparency in models, and distributed training optimizations, including the 'DisTrO' optimizer.
Altman's AGI Stance. Altman had, much to my surprise, just echoed my longstanding position that current techniques alone would not be enough to get to AGI.
GPT-4 Prediction. “Still flawed, still limited, seem more impressive on first use”. Almost exactly what I predicted we would see with GPT-4, back on Christmas Day 2022.
Synthetic Data Dependence. The new system appears to depend heavily on synthetic data, and such data may be easier to produce in some domains (such as those where o1 is most successful, like some aspects of math) than in others.
Update on Strawberry. OpenAI's latest model, o1 (code-named Strawberry), is out.
Marcus' Dream. Gary Marcus continues to dream of a day when AI research doesn't center almost entirely on LLMs.
Content Recommendations. I figured I’d start sharing whatever AI Papers/Publications, interesting books, videos, etc I came across each week.
AI Summary Study. The reviewers’ overall feedback was that they felt AI summaries may be counterproductive and create further work because of the need to fact-check and refer to original submissions.
Green Powders Marketing. Good video on the misleading marketing behind Green Powders.
Roaring Bitmaps Impact. By storing these indices as Roaring bitmaps, we can evaluate typical boolean filters efficiently, reducing latencies by as much as 500x.
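The bitmap-filter idea can be sketched in a few lines. This is a minimal illustration, not the production system: plain Python sets stand in for compressed bitmaps (a real deployment would use a Roaring library such as pyroaring), and the attribute names are invented.

```python
# Hypothetical per-attribute indices: the set of doc IDs satisfying each
# predicate. In production these would be compressed Roaring bitmaps.
index = {
    "jurisdiction:CA": {1, 2, 5, 8},
    "doc_type:contract": {2, 3, 5, 9},
    "status:active": {2, 4, 5, 8, 9},
}

def evaluate(all_of=(), any_of=(), none_of=(), universe=frozenset()):
    """Evaluate a typical boolean filter as set intersections/unions."""
    result = set(universe)
    for key in all_of:                 # AND: intersect posting lists
        result &= index[key]
    if any_of:                         # OR: union, then intersect
        result &= set().union(*(index[k] for k in any_of))
    for key in none_of:                # NOT: subtract
        result -= index[key]
    return sorted(result)

docs = frozenset(range(1, 10))
print(evaluate(all_of=["jurisdiction:CA", "doc_type:contract"],
               universe=docs))  # [2, 5]
```

Because each predicate is just a bitmap, AND/OR/NOT filters reduce to fast bitwise operations regardless of how many documents match.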
AI Adoption Barriers. Until the liabilities and responsibilities of AI models for medicine are clearly spelled out via regulation or a ruling, the default assumption of any doctor is that if AI makes an error, the doctor is liable for that error, not the AI.
AI in Clinical Diagnosis. Doctors bear a lot of risk for using AI, while model developers don’t.
Freedom of Speech Analysis. Tobias Jensen discusses content moderation on social media platforms and recent cases that trend towards preventing the harms that can be (and have been) caused by improperly regulated social media messages.
Highlighting Important Works. I’m going to highlight only two since they bring up extremely important discussions, and I want to get your opinions on them.
Next Planned Articles. Boeing, DEI, and 9 USD Engineers.
Survey Participation. Fred Graver is looking into understanding the demand for content around AI and is asking people to fill out a survey.
Community Engagement. We started an AI Made Simple Subreddit.
Ilya Sutskever Funding. Safe Superintelligence (SSI), an AI startup co-founded by Ilya Sutskever, has successfully raised over $1 billion in funding.
OpenAI AI Chips. OpenAI is reportedly planning to build its own AI chips using TSMC's forthcoming 1.6nm A16 process node, according to United Daily News.
California AI Bill. The controversial California bill SB 1047, aimed at preventing AI disasters, has passed the state's Senate and is now awaiting Governor Gavin Newsom's decision.
iPhone 16 Launch. Apple has unveiled its iPhone 16 line, which includes the iPhone 16, iPhone 16 Plus, iPhone 16 Pro, and iPhone 16 Pro Max, all designed with Apple Intelligence in mind.
Waymo Collision Data. Waymo's driverless cars have been involved in fewer injury-causing crashes per million miles of driving than human-driven vehicles.
AI Image Creation. AI has led to the creation of over 15 billion images since 2022, with an average of 34 million images being created per day.
Global AI Treaty. US, EU, and UK sign the world's first international AI treaty, emphasizing human rights and democratic values as key to regulating public and private-sector AI models.
Music Producer Arrested. Music producer arrested for using AI and bots to boost streams and generate AI music, facing charges of money laundering and wire fraud.
AI in Healthcare. Google DeepMind has launched AlphaProteo, an AI system that generates novel proteins to accelerate research in drug design, disease understanding, and health applications.
Call for Clarity. In an ideal world, moderators would demand clarity on candidates' policies around AI.
AI Impacts on Society. AI is likely to change the world in coming years, affecting virtually every aspect of society, from employment to education to healthcare to national defense.
Candidates' AI Plans. It would be a really good time to demand better [AI policies] from candidates; if we don’t, future generations may regret it.
AI Policy Neglect. A total neglect of AI policy would be deeply unfortunate; our long-term future may actually be shaped more by AI policy than tariffs.
Future Responsibility. It will be our fault if candidates don’t address AI policy; they certainly aren’t going to bother to talk about it if we don’t let them know it matters.
Vulnerability of Teens. Nonconsensual deepfake porn may especially affect the already vulnerable population of teenage girls, who have been harmed by social media.
Training with MAE. Mean Absolute Error (MAE) is used as the training objective, which is robust to outliers.
Predictive Modeling Framework. The authors have created a fine-tuning process that allows Aurora to excel at both short-term and long-term predictions.
Replay Buffer Mechanism. Aurora implements a replay buffer, allowing the model to learn from its own predictions, improving long-term stability.
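The replay-buffer mechanism can be sketched as follows. This is an illustrative toy, not Aurora's actual code: `ReplayBuffer` and `predict` are hypothetical stand-ins showing how a model's own predictions get mixed back into its training data.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (state, target) training pairs."""
    def __init__(self, capacity=1000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, target):
        self.buffer.append((state, target))

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

def predict(state):
    # Stand-in for the forecasting model: advance the state one step.
    return state + 1

buf = ReplayBuffer()
state = 0
for step in range(5):
    pred = predict(state)
    buf.push(state, pred)   # the model's own output becomes training data
    state = pred            # next input is the model's own prediction

print(len(buf.buffer))  # 5
```

Training on its own rollouts this way exposes the model to the kinds of slightly-off inputs it will actually see at forecast time, which is what improves long-term stability.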
Energy-Efficient Fine-Tuning. LoRA introduces small, trainable matrices to the attention layers, allowing Aurora to fine-tune efficiently while significantly reducing memory usage.
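A minimal numerical sketch of the LoRA idea (illustrative dimensions and names; not Aurora's implementation): a frozen weight matrix W is adapted by adding a low-rank product B @ A, so only r*(d_in + d_out) parameters are trained instead of d_in*d_out.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 512, 512, 8

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weights
A = rng.standard_normal((r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))                   # trainable up-projection (zero init)

def lora_forward(x):
    # B is zero at init, so the adapted layer starts identical to W @ x.
    return W @ x + B @ (A @ x)

x = rng.standard_normal(d_in)
assert np.allclose(lora_forward(x), W @ x)  # identity at initialization

full = d_in * d_out
lora = r * (d_in + d_out)
print(f"trainable params: {lora} vs {full} ({lora / full:.1%})")
```

With rank 8 on a 512x512 layer, only about 3% of the original parameter count is trainable, which is where the memory savings come from.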
Variable Weighting Methodology. Aurora uses variable weighting, where different weights are assigned to different variables in the loss function to balance their contributions.
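The MAE objective and per-variable weighting can be combined into one toy loss. The weights below are invented for illustration; Aurora's actual per-variable weights differ.

```python
import numpy as np

# Hypothetical weights balancing each variable's contribution to the loss.
weights = {"temperature": 2.0, "wind_u": 1.0, "pressure": 0.5}

def weighted_mae(pred, target):
    total = 0.0
    for var, w in weights.items():
        # MAE is robust to outliers: errors grow linearly, not quadratically.
        total += w * np.mean(np.abs(pred[var] - target[var]))
    return total / sum(weights.values())

pred = {v: np.zeros(4) for v in weights}
target = {v: np.ones(4) for v in weights}
print(weighted_mae(pred, target))  # 1.0 -- every variable off by exactly 1
```

Without the weights, variables with large natural scales (like pressure in Pascals) would dominate the gradient over small-scale variables.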
Rollout Fine-tuning Importance. Rollout fine-tuning addresses the challenge by training Aurora on sequences of multiple predictions, simulating the chain reaction of weather events over time.
U-Net Architecture. The U-Net architecture allows for multi-scale processing, enabling the model to simultaneously understand local weather patterns and larger-scale atmospheric phenomena.
Swin Transformer Benefits. Swin Transformers excel at capturing long-range dependencies and scaling to large datasets, which is crucial for weather modeling.
Impact of Underreporting. Aurora got almost no attention, indicating a serious misplacement of priorities in the AI Community.
Community Awareness Gap. The ability of foundation models to excel at downstream tasks with scarce data could democratize access to accurate weather and climate information in data-sparse regions, such as the developing world and polar regions.
Sandstorm Prediction. Aurora was able to predict a vicious sandstorm a day in advance, which can be used in the future for evacuations and disaster planning.
Limited Data Handling. Aurora leverages the strengths of the foundation modelling approach to produce operational forecasts for a wide variety of atmospheric prediction problems, including those with limited training data, heterogeneous variables, and extreme events.
Advanced Predictive Capabilities. In under a minute, Aurora produces 5-day global air pollution predictions and 10-day high-resolution weather forecasts that outperform state-of-the-art classical simulation tools and the best specialized deep learning models.
Foundation Model Size. Aurora is a 1.3-billion-parameter foundation model for environmental forecasting.
Expert Invitations. In the series Guests, I will invite these experts to come in and share their insights on various topics that they have studied/worked on.
Chocolate Milk Cult. Our chocolate milk cult has a lot of experts and prominent figures doing cool things.
Infrastructure Creation. AI applications will not generate a net positive ROI on infrastructure buildout for some time.
AI Model Revenues. Our best indication of AI app revenue comes from model revenue (OpenAI at an estimated $1.5B in API revenue).
Energy Demand Increase. Demand is increasing, and the question is what bottlenecks will be alleviated to fulfill that demand.
Data Center Demand. Theoretically, value should flow through the traditional data center value chain.
AI Total Expenditures. The cloud revenue gives us the real indication of how much value is being invested into AI applications.
AI Application Revenue. AI applications have generated a very rough estimate of $20B in revenue with multiples higher than that in value creation so far.
Nvidia Revenue. Last quarter, Nvidia did $26.3B in data center revenue, with $3.7B of that coming from networking.
Power Scarcity. Hyperscalers will build out data center power capacity themselves or through a developer like QTS, Vantage, or CyrusOne.
Compute Power Concerns. All three hyperscalers noted they’re capacity-constrained on AI compute power.
Application Value. ROI on AI will ultimately be driven by application value to end users.
Hyperscaler Decisions. Hyperscalers are making the right CapEx business decisions.
No Clear ROI. There’s not a clear ROI on AI investments right now.
AI ROI Debate. For the first time in a year and a half, common opinion is now shifting to the narrative 'Hyperscaler spending is crazy. AI is a bubble.'
CapEx Growth. Amazon, Google, Microsoft, and Meta have spent a combined $177B on capital expenditures over the last four quarters.
100K Readers. Help me democratize the most important ideas in AI Research and Engineering to over 100K readers weekly.
Long-Term Value Creation. Value will be created in unforeseen ways.
LLM Performance Restrictions. Imposing formatting restrictions on LLMs leads to performance degradation, impacting reasoning abilities significantly.
Standardizing Text Diversity. This work empirically investigates diversity scores on English texts and provides a diversity score package to facilitate research.
Impact of LLMs on Diversity. Writing with InstructGPT results in a statistically significant reduction in diversity.
Dimension Insensitive Metric. This paper introduces the Dimension Insensitive Euclidean Metric (DIEM) which demonstrates superior robustness and generalizability across dimensions.
Support for Writing. Doing so helps me put more effort into writing/research, reach more people, and supports my crippling chocolate milk addiction.
Notable Content Creator. Artem Kirsanov produces high-quality videos on computational neuroscience and AI, and offers very new ideas/perspectives for traditional Machine Learning people.
AI Content Focus. The focus will be on AI and Tech, but ideas might range from business, philosophy, ethics, and much more.
Previews of Articles. Upcoming articles include 'The Economics of ESports' and 'The Economics of Open Source.'
New Paradigms in NLP. Sebastian Raschka discusses recent pre-training and post-training paradigms in NLP models, highlighting significant new techniques.
Risks of Synthetic Training. Training language models on synthetic data leads to a consistent decrease in the diversity of the model outputs through successive iterations.
OpenAI's New Deal. Ars Technica content is now available in OpenAI services.
Authors' Lawsuit. Authors sue Claude AI chatbot creator Anthropic for copyright infringement.
California AI Bill Weakening. California weakens bill to prevent AI disasters before final vote, taking advice from Anthropic.
Anysphere Funding. Anysphere, a GitHub Copilot rival, has raised $60M Series A at $400M valuation from a16z, Thrive, sources say.
AMD Acquisition. AMD buying server maker ZT Systems for $4.9 billion as chipmakers strengthen AI capabilities.
California Regulation. Analysis of California's AI regulation bill SB1047 and legal issues related to synthetic media, copyright, and online personhood credentials.
AI Model Scaling. Exploration of the feasibility and investment needed for scaling advanced AI models like GPT-4 and Agent Q architecture enhancements.
Perplexity Updates. Perplexity's integration of Flux image generation models and code interpreter updates for enhanced search results.
New AI Features. Ideogram AI's new features, Google's Imagine 3, Dream Machine 1.5, and Runway's Gen3 Alpha Turbo model advancements.
Episode Summary. Our 180th episode with a summary and discussion of last week's big AI news!
Mental Health and Misinformation. We cry for the government or social media companies to do something about worsening mental health and the spread of misinformation, but how many of us have acted positively on these platforms?
Democracy and Conformity. Tocqueville observed that democratic societies foster a sense of equality among citizens, which can lead to pressure for conformity, homogenizing thought, expression, and behavior.
Over-Reliance on Institutions. Tocqueville noticed a tendency for citizens to increasingly rely on the government under the expectation that an elected government should solve societal problems.
Personal Responsibility. We often expect institutions to make systemic changes without acknowledging the importance of individual responsibility in taking actions that lead to systemic change.
Need for Critical Diversity. When people lose exposure to diverse viewpoints, their capacity to visualize alternatives diminishes, reinforcing conformity.
Intellectual Homogeneity. A populace that is intellectually homogenous tends to rely on external sources for solutions, sacrificing personal agency and responsibility.
Social Media Trends. Advice for content creators often revolves around imitating successful content rather than fostering unique voices, contributing to conformity.
Conformity in Media. Social media and content creation platforms, initially designed for authentic expression, often lead to a relentless drive toward sameness and conformity.
Collective Action Importance. The OSS movement in tech allows people to find their communities and contribute, emphasizing the importance of collective small contributions leading to significant shifts.
Voluntary Associations. Tocqueville noted that Americans constantly form associations for various purposes, which serve as a powerful tool for collective action and public benefit.
Local Community Power. Tocqueville saw voluntary organizations and local community groups as crucial to counterbalance the negative tendencies of democracy.
Tyranny of the Majority. In modern democracies, tyranny manifests through social ostracism rather than physical oppression, leading to self-censorship and a society of self-oppressors.
Agency and Accountability. Tocqueville emphasizes the importance of people accepting agency and accountability for their information diet instead of relying on institutions.
Efficient Small Models. Nvidia's Llama-3.1-Minitron 4B performs comparably to larger models while being more efficient to train and deploy.
Open-Source AI Definition. Open-source AI is defined as a system that can be used, inspected, modified, and shared without restrictions.
Authors Sue Anthropic. Authors are suing AI startup Anthropic for using pirated texts to train its chatbot Claude, alleging large-scale theft.
AI Ethical Concerns. Google DeepMind employees are urging the company to end military contracts due to concerns about AI technology used for warfare.
AI in Ad Creation. Creatopy, which automates ad creation using AI, has raised $10 million and now serves over 5,000 brands and agencies.
Google's AI Image Generator. Google has released a powerful AI image generator, Imagen 3, for free use in the U.S., outperforming other models.
Content Partnership. OpenAI has partnered with Condé Nast to display content from its publications within AI products like ChatGPT and SearchGPT.
OpenAI's Regulatory Stance. OpenAI has opposed the proposed AI bill SB 1047 aimed at implementing safety measures, despite public support for regulation.
California AI Regulation. Anthropic's CEO supports California's AI bill SB 1047, stating the benefits outweigh the costs, despite some concerns.
AI for Coding Tasks. Open source Dracarys models are specifically designed to optimize coding tasks and significantly improve performance of existing models.
Advanced Long-Context Models. AI21's Jamba 1.5 Large model has demonstrated superior performance in latency tests against similar models.
Outperforming Competitors. Microsoft's Phi-3.5 outperforms other small models from Google, OpenAI, Mistral, and Meta on several key metrics.
End User Engagement. Users can inspect multiple alternative paths to verify the quality of secondary/tertiary relationships.
Obsession with User Feedback. I’d be lying if I said that there is one definitive approach (or that what we’ve done is absolutely the best approach).
Machine Learning in Legal Domain. These are the main aspects of the text-based search/embedding that are promising based on research and our own experiments.
High Cost of Mistakes. A mistake can cost a firm millions of dollars in settlements and serious loss of reputation. This high cost justifies the investment into better tools.
Cost of Legal Expertise. Legal Expertise is expensive. If a law firm can cut down the time required for a project by even a few hours, they are already looking at significant savings.
Importance of RAG. RAG is one of the most important use-cases for LLMs, and the goal is to build the best RAG systems possible.
Optimizations in Distance Measurement. FINGER significantly outperforms existing acceleration approaches and conventional libraries by 20% to 60% across different benchmark datasets.
Integration of Graph-Based Indexes. Given that we’re already working on graphs, another promising direction for us has been integrating graph-based indexes and search.
User Verification. By letting our users both verify and edit each step of the AI process, we let them make the AI adjust to their knowledge and insight, instead of asking them to change for the tool.
Focus on Transparency. Model transparency is crucial as a few trigger words/phrases can change the meaning/implication of a clause; users need to have complete insight into every step of the process.
Leveraging Control Tokens. We use control tokens, which are special tokens to indicate different types of elements, enhancing our tokenization process.
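The control-token idea can be illustrated like this. The token names and element types below are hypothetical, not IQIDIS's actual vocabulary: special markers wrap each element so the model can distinguish, say, clause text from citations during tokenization.

```python
# Hypothetical control tokens marking different element types.
CONTROL_TOKENS = {
    "clause": ("<|clause|>", "<|/clause|>"),
    "citation": ("<|cite|>", "<|/cite|>"),
}

def wrap(element_type, text):
    """Surround an element with its type-specific control tokens."""
    start, end = CONTROL_TOKENS[element_type]
    return f"{start}{text}{end}"

doc = (wrap("clause", "The party shall indemnify the other party.")
       + wrap("citation", "Cal. Civ. Code Section 1542"))
print(doc)
```

Because the markers are single reserved tokens rather than ordinary words, the model learns structural boundaries without confusing them with document text.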
Flexible Indexing Approach. Updating the indexes with new information is much cheaper than retraining your entire AI model. Index-based search also allows us to see which chunks/contexts the AI picks to answer a particular query.
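A toy version of index-backed retrieval with provenance (illustrative only; the actual IQIDIS pipeline is not public): adding a document only updates the index, with no model retraining, and every answer carries the IDs of the chunks it drew from.

```python
from collections import defaultdict

index = defaultdict(set)   # term -> set of chunk IDs (inverted index)
chunks = {}                # chunk ID -> original text

def add_chunk(chunk_id, text):
    """Cheap incremental update: no retraining needed for new documents."""
    chunks[chunk_id] = text
    for term in text.lower().split():
        index[term].add(chunk_id)

def retrieve(query):
    terms = [t for t in query.lower().split() if t in index]
    hits = set.intersection(*(index[t] for t in terms))
    # Returning IDs alongside text makes the AI's chosen context inspectable.
    return [(cid, chunks[cid]) for cid in sorted(hits)]

add_chunk("c1", "indemnification clause survives termination")
add_chunk("c2", "termination requires thirty days notice")
print(retrieve("termination notice"))
```

The chunk IDs in each result are exactly the "which contexts did the AI pick" visibility the approach provides.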
Hallucinations in AI. Type 1 Hallucinations are not a worry because our citations are guaranteed to be from the data source, and Type 2 Hallucinations will be reduced significantly through our unique process of constant refinement.
Reducing Costs. Relying on a smaller, Mixture of experts style setup instead of letting bigger models do everything reduces our costs dramatically, allowing us to do more with less.
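A routing setup along those lines can be sketched as follows. The expert names and keyword rules are invented for illustration; a real system would use a learned router rather than keyword matching.

```python
# Hypothetical small specialist models, each cheaper than one large model.
EXPERTS = {
    "contract": lambda q: f"[contract-model] {q}",
    "litigation": lambda q: f"[litigation-model] {q}",
}

# Toy routing table: which specialist handles which kind of query.
KEYWORDS = {"clause": "contract", "indemnify": "contract",
            "motion": "litigation", "deposition": "litigation"}

def route(query):
    for word in query.lower().split():
        if word in KEYWORDS:
            return EXPERTS[KEYWORDS[word]](query)
    return f"[general-model] {query}"   # fall back to a generalist

print(route("Draft an indemnify clause"))
```

Only the queries no specialist can handle pay the cost of the big general model, which is where the "do more with less" savings come from.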
Focus on User Feedback. Our unique approach to involving the user in the generation process leads to a beautiful pair of massive wins against Hallucinations.
Flexibility in Architecture. The best architecture is useless if it can't fit into your client's processes. Being Lawyer-Led, IQIDIS understands the importance of working within a lawyer's/firm's workflow.
KI-RAG Challenges. Building KI-RAG systems requires a lot more handling and constant maintenance, making them more expensive than traditional RAG.
Handling Legal Nuances. There is a lot of nuance to Law. Laws can change between regions, different sub-fields weigh different factors, and a lot of law is done in the gray areas.
Need for Higher Adaptability. Building upon this is a priority after our next round of fund-raising (or for any client that specifically requests this).
OpenAI's Opposition. OpenAI has just announced that it is opposed to California's SB-1047 despite Altman's public support for AI regulation at the Senate.
Legislation Improvement. Saunders did not think SB-1047 was perfect, but called the proposed legislation 'the best attempt I've seen to provide a check on this power.'
Power Corrupts. If we don't figure out the governance problem, internal and external, before the next big AI advance, we could be in serious trouble.
Timelines for AGI. Saunders thinks it is at least somewhat plausible we will see AGI in a few years; I do not.
Need for Regulation. If OpenAI (and others in Silicon Valley) succeed in torpedoing SB-1047, self-regulation is in many ways what we will be left with.
Call for Accountable Power. Saunders described as a metaprinciple, 'Don't give power to people or structures that can't be held accountable.'
Future Whistleblower Protections. One of the most important reasons for passing SB-1047 in California was its whistleblower protections.
External Oversight Needed. There should be a role for external governance, as well: companies should not be able to make decisions of potentially enormous magnitude on their own.
Governance Concerns. Internal governance is key; it shouldn't be just one person at the top of one company calling the shots for all humanity.
Employee Discontent. Promises have been made and not kept; they lost faith in Altman personally, and have lost faith in the company's commitment to AI safety.
Image Generation Capabilities. Grok has also integrated FLUX.1 by Black Forest Labs to enable users to generate images.
Premium Access. Access to Grok is currently limited to Premium and Premium+ users.
Grok-2 Release. Elon Musk's company, X, has launched Grok-2 and Grok-2 mini in beta, both of which are AI models capable of generating images on the X social network.
Deepfake Scams. Elderly retiree loses over $690,000 to digital scammers using AI-powered deepfake videos of Elon Musk to promote fraudulent investment opportunities.
AI Codec Proposal. This article proposes directly modeling images and videos as compressed files using canonical codec representations like JPEG, and shows the method's effectiveness in image generation.
Procreate Stance. Procreate vows to never incorporate generative AI into its products, taking a stand against the technology.
US AI Lead. US leads in AI investment and job postings, surpassing China and other countries.
AI Image Licensing. OpenAI CEO's warning about the use of copyrighted content in AI models is highlighted as Anthropic faces a lawsuit for training its Claude AI model using authors' work without consent.
AI Risks Repository. MIT researchers release a comprehensive AI risk repository to guide policymakers and stakeholders in understanding and addressing the diverse and fragmented landscape of AI risks.
Research Automation Phases. The AI Scientist operates in three phases: idea generation, experimental iteration, and paper write-up.
AI Scientist Development. "The AI Scientist" is a novel AI system designed to automate the entire scientific research process.
AI Artist Claim Approved. The judge allowed a copyright claim against DeviantArt, which used a model based on Stable Diffusion.
Lawsuit Progress. The lawsuit against AI companies Stability and Midjourney, filed by a group of artists alleging copyright infringement, has gained traction as Judge William Orrick approved additional claims.
Conversational Features. Gemini Live can also interpret video in real time and function in the background or when the phone is locked.
Gemini Live Introduction. Google has introduced a new voice chat mode for its AI assistant, Gemini, named Gemini Live.
Image Tolerance. Compared to other image generators on the market, the model is far more permissive with regard to what images it can generate.
AI-driven Features. The company plans to deploy Grok-2 and Grok-2 mini in AI-driven features on X, including improved search capabilities, post analytics, and reply functions.
Shoutout.io Page. Shoutout.io is a very helpful tool that allows independent creators to gather testimonials in one place.
Research Engineer Openings. Haize Labs is looking for research scientists to join their teams based in NYC.
Encouragement to Apply. We encourage you to apply even if you do not believe you meet every single qualification: We’re open to considering a wide range of perspectives and experiences.
Guest Posts Initiative. I want to integrate more guest posts in this newsletter to cover a greater variety of topics and hear from experts across the board.
Case Study Articles. I’d like to do more case-study-style articles, where we look into different organizations to study how they solved their business/operational challenges with AI.
Prompt Caching Launch. Prompt Caching is Now Available on the Anthropic API for Specific Claude Models.
Deepfake Scams. How ‘Deepfake Elon Musk’ Became the Internet's Biggest Scammer.
FCC AI Robocall Rules. FCC Proposes New Rules on AI-Powered Robocalls.
MIT AI Risks Repository. MIT researchers release a repository of AI risks.
Popular AI Search Startup. Perplexity's popularity surges as AI search start-up takes on Google.
AI Search Evolution. Google's AI-generated search summaries change how they show their sources.
Risks of Unaligned AI. Overview of potential risks of unaligned AI models and skepticism around SingularityNet's AGI supercomputer claims.
Huawei's AI Chip. Huawei's Ascend 910C AI chip aims to rival NVIDIA's H100 amidst US export controls.
Google Voice Chat Feature. Google introduces Gemini Voice Chat Mode available to subscribers and integrates it into Pixel Buds Pro 2.
Grok 2 Beta Release. Grok 2's beta release features new image generation using Black Forest Labs' tech.
Legal Standards. The new form of SB 1047 can basically only be used after something really bad happens, as a tool to hold companies liable, rather than prevent risks.
Regulatory Fight. Most or all of the major big tech companies joined a lobbying organization that fought SB-1047, despite broad public support for the bill.
Bill Weakened. California's SB-1047 was significantly weakened in last-minute negotiations, affecting its ability to address catastrophic risks.
Need for Federal Legislation. Future state and federal efforts may suffer if the bill doesn't pass, showing that comprehensive regulatory efforts are needed at all levels.
Comprehensive Approach Needed. We need a comprehensive approach to AI regulation, as SB 1047 is just a start in addressing various risks associated with AI.
Innovative Balance. Passing SB-1047 may normalize the regulation of AI while allowing for continued innovation, showing that safety precautions are compatible with industry growth.
Whistleblower Protections. The bill provides important whistleblower protections, which are critical for transparency and accountability in AI companies.
Deterrent Value. SB-1047's strongest utility may come as a deterrent, clarifying that the duty to take reasonable care applies to AI developers.
Weak Assurance. The 'reasonable care' standard may be too weak, as billion-dollar companies might exploit it without facing meaningful consequences.
Narrow Focus. SB 1047 seems heavily skewed toward addressing hypothetical existential risks while largely ignoring demonstrable AI risks like misinformation and discrimination.
High Subscription Importance. Help me democratize the most important ideas in AI Research and Engineering to over 100K readers weekly.
Experts in Chocolate Milk. Our chocolate milk cult has a lot of experts and prominent figures doing cool things.
Access to Justice Correlation. We can find a strong correlation between the fairness and independence of the court system and the general life quality and well-being of its populace.
AI Speed vs Court Speed. High tech runs three times faster than normal businesses, and the government runs three times slower than normal businesses.
Judicial System's Importance. The court system undertakes a vitally important function in society as a central governance mechanism.
AI Adoption by Courts. The Attorney General's Office of São Paulo adopted GPT-4 last year to speed up the screening and reviewing process of lawsuits.
Cautious AI Implementation. Hallucination risks, along with security and data-confidentiality concerns, call for tremendous caution and common sense when using and implementing AI tools.
Impact on Legal Services. Legal copilots will inevitably drive down the price of legal services and make legal knowledge more accessible to non-lawyers.
Legal AI Tools' Future. The legal copilots that will succeed should be developed and branded with a focus on time-savings and productivity benefits.
Changing Nature of Legal Work. AI-driven tools will take care of routine, monotonous tasks so lawyers can focus more on the strategic, high-value work.
AI Use in Legal Sector. 73% of 700 lawyers surveyed planned to utilize generative AI in their legal work within the next year.
Keynote Video. Here’s the video (well-produced by Machine Learning Street Talk (MLST)) of a talk I gave on Friday as the keynote at AGI-Summit 24.
GPT-5 Not Released. And no, GPT-5 did not drop this week as many had hoped.
Differing Views. Interesting to see where his take and mine differ.
Thoughts on Regulation. My thoughts on regulation are of course coming soon, in my next book (Taming Silicon Valley, now available for pre-order).
AI Winter Speculation. As for whether there is an AI winter coming, time will tell.
Expectations Reframing. At the very least, I foresee a significant reframing of expectations.
Audio Version Available. There is also an audio-only version, here.
New Humanoid Robot. Figure's new humanoid robot leverages OpenAI for natural speech conversations.
UK Merger Probe. Amazon faces UK merger probe over $4B Anthropic AI investment.
Google Antitrust Ruling. Google Monopolized Search Through Illegal Deals, Judge Rules.
California AI Bill Impact. 'The Godmother of AI' says California's well-intended AI bill will harm the U.S. ecosystem.
OpenAI Co-founder Exit. OpenAI co-founder Schulman leaves for Anthropic, Brockman takes extended leave.
Adept AI Returns. Investors in Adept AI will be paid back after Amazon hires startup's top talent.
Character.AI Founders. Google's hiring of Character.AI's founders is the latest sign that part of the AI startup world is starting to implode.
Compute Efficiency Research. Research advancements such as Google's compute-efficient inference models and self-compressing neural networks showcase significant reductions in compute requirements while maintaining performance.
Humanoid Robotics Advances. Rapid advancements in humanoid robotics are exemplified by new models from companies like Figure (in partnership with OpenAI) and by systems achieving amateur-level human performance in tasks like table tennis.
OpenAI Changes. OpenAI's dramatic changes with co-founder exits, extended leaves, and new lawsuits from Elon Musk.
Personnel Movements. Notable personnel movements and product updates, such as Character.ai leaders joining Google and new AI features in Reddit and Audible.
Facial Recognition Use Case. In the U.K., the London Metropolitan Police admitted to using facial recognition technology on tens of thousands of people attending King Charles III's coronation in May 2023.
Mass Surveillance Impact. A recent study in The Quarterly Journal of Economics suggests that fewer people protest when public safety agencies acquire AI surveillance software to complement their cameras.
Multi-modal AI Concerns. Despite the potential of multi-modal AI, there is worry regarding its use in mass surveillance and automated weapon systems.
Emerging Adversarial Techniques. Transferability of adversarial examples between models and query-based attacks are vital strategies for black-box settings.
Evolutionary Strategies Potential. Evolutionary algorithms, such as genetic algorithms and differential evolution, show promise for generating adversarial perturbations.
Norm Considerations in Perturbation. Different norms (L1, L2, and L-infinity) significantly impact the outcome and effectiveness of adversarial perturbations.
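Concretely, the L1 norm bounds the total change across all pixels, L2 bounds the Euclidean length of the change, and L-infinity bounds the largest change to any single pixel. A quick pure-Python sketch on a flattened perturbation vector:

```python
import math

def l1(delta):   # total absolute change across all pixels
    return sum(abs(d) for d in delta)

def l2(delta):   # Euclidean length of the perturbation
    return math.sqrt(sum(d * d for d in delta))

def linf(delta): # largest change to any single pixel
    return max(abs(d) for d in delta)

delta = [0.01, -0.03, 0.0, 0.02]
print(l1(delta), l2(delta), linf(delta))
```

The choice matters because an attack that looks tiny under one norm (e.g. a 1-pixel change with small L1) can be large under another, so defenses tuned to a single norm can be sidestepped.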
Robust Features Importance. Training on just robust features leads to good results, suggesting that generalized extraction of robust features is a valuable avenue for future exploration.
Infectious Jailbreak Feasibility. Feeding an adversarial image into the memory of any randomly chosen agent can achieve infectious jailbreak, causing all agents to exhibit harmful behaviors exponentially fast.
Adversarial Perturbations Explained. Adversarial perturbations (AP) are subtle changes to images that can deceive AI classifiers by causing misclassification.
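A classic recipe for such perturbations is the fast gradient sign method (FGSM): nudge each input dimension by a small epsilon in the direction that increases the model's loss. A toy sketch on a hand-rolled logistic classifier (the weights, input, and epsilon are all made up for illustration):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy "classifier": a single logistic unit with fixed weights.
w = [2.0, -3.0, 1.0]
x = [0.5, 0.2, 0.8]   # a clean input the model assigns to class 1
y = 1                  # true label

def predict(x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

# Gradient of the cross-entropy loss w.r.t. the input is (p - y) * w.
def input_gradient(x, y):
    p = predict(x)
    return [(p - y) * wi for wi in w]

def fgsm(x, y, eps):
    g = input_gradient(x, y)
    # Step each input dimension by eps in the sign of the gradient
    # (an L-infinity-bounded perturbation).
    return [xi + eps * (1 if gi > 0 else -1 if gi < 0 else 0)
            for xi, gi in zip(x, g)]

x_adv = fgsm(x, y, eps=0.3)
print(predict(x), predict(x_adv))  # confidence in class 1 drops
```

The same idea scales to deep networks, where the input gradient comes from backpropagation rather than a closed form; the perturbation stays small per pixel yet flips the prediction.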
Agent Smith Attack. The Agent Smith setup involves simulating a multi-agent environment where a single adversarial image can lead to widespread harmful behaviors across almost all agents.
Artists' Lawsuit Progress. A class action lawsuit against AI companies Stability, Runway, and DeviantArt, filed by artists alleging copyright infringement, has been partially approved to proceed by a judge.
Falcon Mamba 7B Launch. The Technology Innovation Institute (TII) has introduced Falcon Mamba 7B, a new large language model that uses a State Space Language Model (SSLM) architecture, marking a shift from traditional transformer-based designs.
Performance Verification. Falcon Mamba 7B has been independently verified by Hugging Face as the top-performing open-source SSLM globally, outperforming established transformer-based models in benchmark tests.
Figure 02 Introduction. Figure has introduced its latest humanoid robot, Figure 02, which is designed to work alongside humans in a factory setting.
New Supercomputing Initiatives. A new supercomputing network aims to accelerate the development of artificial general intelligence (AGI) through a worldwide network of powerful computers.
AI Emotional Attachment Concerns. OpenAI is concerned about users developing emotional attachments to the GPT-4o chatbot, warning of potential negative impacts on human interactions.
AI Assistant at JPMorgan. JPMorgan Chase has rolled out a generative AI assistant to tens of thousands of its employees, designed to be as ubiquitous as Zoom.
WeRide IPO Plans. WeRide, a Chinese autonomous vehicle company, is seeking a $5.02 billion valuation in its U.S. IPO, aiming to raise about $96 million from the offering.
AI-Driven 3D Generation. A research paper by scientists from Meta and Oxford University introduces VFusion3D, an AI-driven technique capable of generating high-quality 3D models from 2D images in seconds.
Instagram AI Features. Instagram's new AI features allow people to create AI versions of themselves.
Misinformation Impact. The impact of misinformation via deepfakes, particularly one involving Elon Musk, is also highlighted.
Open-Source AI Stance. The White House says there is no need to restrict 'open-source' artificial intelligence — at least for now.
AI Law in Europe. The world's first-ever AI law is now enforced in Europe, targeting US tech giants.
New AI Tools. Black Forest Labs releases Open-Source FLUX.1, a 12 Billion Parameter Rectified Flow Transformer capable of generating images from text descriptions.
NVIDIA Chip Issues. Nvidia reportedly delays its next AI chip due to a design flaw.
Waymo Rollout. Waymo's driverless cars have rolled out in San Francisco.
AI News Summary. Hosts Andrey Kurenkov and John Krohn dive into significant updates and discussions in the AI world.
Clarifications Requested. Concerns about inaccuracies in the essay lead to a request for reconsideration of the stance on SB-1047.
Kill Switch Misunderstanding. The 'kill switch' requirement doesn't apply to open-source models once they are out of the original developer's control.
Concerns on SB-1047. SB-1047 does not require predicting every use of an AI model, but focuses on specific, serious 'critical harms' such as mass casualties and large-scale cyberattacks.
Impact on Little Tech. Much of the bill's requirements are limited to models with training runs of $100 million+, which does not predominantly impact 'little-tech'.
Common Regulatory Standards. Asking for standards and a degree of care in AI is common across many industries, contrasting with the fewer regulations on AI systems that could pose catastrophic risks.
Need for Concrete Suggestions. While favoring AI governance, there are no positive, concrete suggestions offered for addressing risks such as mass casualties or large-scale cyberattacks.
Cost Considerations. While modern RAG setups (especially generator-heavy ones) are more expensive than V0, the general principle is still useful to keep in mind.
RAG Definition. Retrieval Augmented Generation involves using AI to search a pre-defined knowledge base to answer user queries.
RAG Advantages. RAG speeds this up by having the AI find relevant contexts and aggregate them.
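As a toy illustration of the retrieve-then-generate loop (the corpus, the keyword-overlap scorer, and the prompt template are all invented for this sketch; real systems use BM25 or embedding search):

```python
# Minimal RAG loop: score documents by keyword overlap with the query,
# take the top-k, and pack them into a prompt for the generator.
CORPUS = [
    "RAG retrieves documents from a knowledge base before generating.",
    "Transformers use attention to weigh input tokens.",
    "Chunking splits documents into retrievable passages.",
]

def score(query: str, doc: str) -> int:
    q = set(query.lower().split())
    return len(q & set(doc.lower().split()))

def retrieve(query: str, k: int = 2):
    return sorted(CORPUS, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str) -> str:
    context = "\n".join(f"- {d}" for d in retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How does RAG use a knowledge base?"))
```

The prompt then goes to the LLM, which grounds its answer in the retrieved passages instead of relying purely on its parametric memory.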
RAG System Recipes. The authors propose two distinct recipes for implementing RAG systems.
Integration Benefits. Query Classification Module leads to an average improvement in overall score from 0.428 to 0.443 and a reduction in latency time from 16.41 to 11.58 seconds per query.
RAG vs Fine-Tuning. RAG outperforms fine-tuning with respect to injecting new sources of information into an LLM's responses.
Fine-Tuning Focus. It’s best to inject new learning/information mainly at the data-indexing stage rather than through fine-tuning.
Retrieval Methods Findings. The authors recommend monoT5 as a comprehensive method balancing performance and efficiency.
Hybrid Retrieval Success. Hybrid search, combining sparse and dense retrieval with HyDE, achieves the best retrieval performance.
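A common way to combine the two signals is a weighted fusion of per-query normalized scores; a sketch with made-up scores (the paper's exact fusion formula may differ):

```python
# Hybrid retrieval by score fusion: normalize sparse (e.g. BM25) and
# dense (e.g. embedding cosine) scores per query, then mix with alpha.
def minmax(scores):
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]

def hybrid(sparse, dense, alpha=0.5):
    s, d = minmax(sparse), minmax(dense)
    return [alpha * si + (1 - alpha) * di for si, di in zip(s, d)]

# Per-document scores for one query (illustrative numbers):
sparse = [12.1, 3.4, 7.8]    # keyword match favors doc 0
dense  = [0.21, 0.88, 0.35]  # semantic match favors doc 1
fused = hybrid(sparse, dense, alpha=0.4)
best = max(range(len(fused)), key=fused.__getitem__)
print(fused, best)
```

Normalizing before mixing matters: raw BM25 scores and cosine similarities live on different scales, so an unnormalized sum would let one retriever silently dominate.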
Chunking Strategy. Sentence-level chunking with a size of 512 tokens, using techniques like 'small-to-big' and 'sliding window', provides a good balance between information preservation and processing efficiency.
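A sliding-window chunker along these lines might look like the following sketch (the window and overlap sizes are illustrative, and word counts stand in for real tokenizer tokens):

```python
def chunk(sentences, max_tokens=512, overlap=1):
    """Greedily pack sentences into chunks of up to max_tokens words,
    carrying `overlap` trailing sentences into the next chunk."""
    chunks, current, count = [], [], 0
    for sent in sentences:
        n = len(sent.split())
        if current and count + n > max_tokens:
            chunks.append(" ".join(current))
            current = current[-overlap:]  # sliding-window overlap
            count = sum(len(s.split()) for s in current)
        current.append(sent)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks

docs = ["one two three."] * 5
print(chunk(docs, max_tokens=7, overlap=1))
```

Splitting on sentence boundaries keeps each chunk self-contained, and the overlap ensures a fact straddling a boundary still appears intact in at least one chunk.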
BERT Accuracy. A BERT-based classifier achieved high accuracy (over 95%) in determining retrieval needs.
Query Classification. Decides if retrieval is needed for a given query, helping keep costs down.
RAG vs. LLMs. When resourced sufficiently, long-context LLMs consistently outperform Retrieval Augmented Generation in terms of average performance.
Emergent Garden. Emergent Garden puts out very interesting videos on Life simulations, neural networks, cellular automata, and other emergent programs.
Reading Recommendations. Devansh plans to share AI Papers/Publications, interesting books, videos, etc., each week.
Community Engagement. Devansh encourages individuals doing interesting work to drop their introduction in the comments for potential spotlight features.
Airbnb Architecture Shift. In 2018, Airbnb began its migration to a service-oriented architecture due to challenges with maintaining their Ruby on Rails 'monorail'.
Vocab Size Research. Research indicates that larger models deserve larger vocabularies, and increasing vocabulary size consistently improves downstream performance.
Supporting Independent Work. Devansh puts a lot of effort into creating work that is informative, useful, and independent from undue influence.
Content Focus. The focus will be on AI and Tech, but ideas might range from business, philosophy, ethics, and much more.
Confabulation Perspective. Hallucinations in large language models can be considered a potential resource instead of a categorically negative pitfall.
GitHub CI/CD Insights. GitHub runs 15,000 CI jobs within an hour across 150,000 cores of compute.
Machine Learning Applications. Software engineers building applications using machine learning need to test models in real-world scenarios before choosing the best performing model.
LLM Paper Notes. Jean David Ruvini posts his notes on LLM/NLP related papers every month, providing valuable insights.
Autonomous Driving Milestone. Stanford Engineering and Toyota Research Institute achieve a milestone in autonomous driving by creating the world’s first autonomous Tandem Drift team, using AI to direct two driverless cars to perform synchronized maneuvers.
Concerns Over AI Alteration. Elon Musk shares deepfake video of Kamala Harris, potentially violating the platform's policies against synthetic and manipulated media, sparking concerns about AI-altered content in the upcoming election.
AI Law in Europe. Europe enforces the world's first AI law, targeting US tech giants with regulations on AI development, deployment, and use.
Meta's AI Studio Launch. Meta has launched a new tool called AI Studio, allowing users in the US to create AI versions of themselves on Instagram or the web.
Perplexity AI's Revenue Share. Perplexity AI plans to share advertising revenue with news publishers whose content is used by the bot, responding to accusations of plagiarism and unethical web scraping.
Funding for Black Forest Labs. Black Forest Labs, a startup founded by the creators of Stable Diffusion, has launched FLUX.1, a new text-to-image model suite for the open-source artificial intelligence community and secured $31 million in seed funding.
Musk's Revived Lawsuit. Elon Musk has reinitiated a lawsuit against OpenAI, the creator of the AI chatbot ChatGPT, reigniting a longstanding dispute that originated from a power conflict within the San Francisco-based startup.
Focus on AI Alignment. Schulman, who played a key role in creating the AI-powered chatbot platform ChatGPT and led OpenAI's alignment science efforts, stated his move was driven by a desire to focus more on AI alignment and hands-on technical work.
OpenAI Departures. OpenAI co-founder John Schulman has left the company to join rival AI startup Anthropic, while OpenAI president and co-founder Greg Brockman is taking an extended leave until the end of the year.
Data Collection Scale. ChatGPT has gathered unprecedented amounts of personal data.
Personal Data Training. Sam Altman has acknowledged wanting to train on everyone's personal documents (Word files, email, etc.).
WorldCoin Connection. Sam founded WorldCoin, known for their eye-scanning orb.
Monetization Intent. Altman wants to know - and monetize - everything about you.
Investment in Hardware. OpenAI just put money into a $60M fundraise for a webcam company and is planning a hardware joint venture with them.
Security Expertise. OpenAI recently put Paul Nakasone (ex NSA) on the board.
Future Prospects Doubted. Prospects don’t seem as strong as they once did.
Risk of WeWork Comparison. I said it before, and I will say it again: OpenAI could wind up being seen as the WeWork of AI.
Morale Issues Identified. The board, which basically said it couldn't trust Sam, may have had a point.
Key Staff Departures. Over the last several months they have lost Ilya Sutskever, a whole bunch of safety people, and (slightly earlier) Andrej Karpathy.
Continuous Monitoring. Gary Marcus has had his eye on OpenAI for a long time.
Valuation Concerns. Will they earn enough to justify their $80B valuation?
Google Antitrust Case. Google lost its antitrust case; it could have implications for Google's storehouse of AI training data.
Nvidia Stock Decline. Nvidia dropped 6%, and is down 20% over the last month.
AGI Predictions. OpenAI tempered expectations for its next event, and said we wouldn't see GPT-5 then.
Elon Musk's Lawsuit. Elon sued OpenAI again; the most interesting thing is that the suit could force a discussion of what AGI means – in court.
Market Uncertainty. It is also not out of the question that today could someday be seen as a turning point.
Election Misinformation. Five states suggested that Musk's AI chatbot has spread election misinformation.
AI in Mathematics. AI achieves silver-medal standard solving International Mathematical Olympiad problems.
Strike Over AI. Video game performers will go on strike over artificial intelligence concerns.
Legislative Actions. Democratic senators seek to reverse Supreme Court ruling that restricts federal agency power.
Impact of AI on Jobs. As new tech threatens jobs, Silicon Valley promotes no-strings cash aid.
AI Safety Concerns. Senators demand OpenAI detail efforts to make its AI safe.
Cohere's Funding. AI startup Cohere raises US$500 million, valuing the company at US$5.5 billion.
Meta's New AI Model. Meta releases open-source AI model it says rivals OpenAI, Google tech.
Google's Gemini Model. Google gives free Gemini users access to its faster, lighter 1.5 Flash AI model.
OpenAI's SearchGPT. OpenAI announces SearchGPT, its AI-powered search engine.
Investor Enthusiasm Diminishing. Investors may well stop forking out money at the rates they have, enthusiasm may diminish, and a lot of people may lose their shirts.
Generative AI Limitations. There is just one thing: Generative AI, at least as we know it now, doesn't actually work that well, and maybe never will.
AI Bubble Prediction. I just wrote a hard-hitting essay for WIRED predicting that the AI bubble will collapse in 2025 — and now I wish I hadn't.
Imminent Collapse. The collapse of the generative AI bubble – in a financial sense – appears imminent, likely before the end of the calendar year.
Strict Disbelief. I've always thought GenAI was overrated.
Consistent Predictions. In March of this year, I made a series of seven predictions about how this year would go. Every one of them has held firm, for every model produced by every developer ever since.
Warning About AI. Almost exactly a year ago, in August 2023, I was (AFAIK) the first person to warn that Generative AI could be a dud.
Historical Predictions. In December 2022, at the height of ChatGPT's popularity, I made a series of seven predictions about GPT-4 and its limits, such as hallucinations and making stupid errors, in an essay called What to Expect When You Are Expecting GPT-4.
Median Split Insight. The key dividing line on the SAT math lies between those who understand fractions, and those who do not.
AGI Misconceptions. Realizing neural networks struggle with outliers makes AGI seem like sheer fantasy, as no general solution to the outlier problem exists yet.
Symbolic vs Neural Networks. Symbolic systems have always been good for outliers; neural networks have always struggled with them.
Generative AI Expectations. GenAI sucks at outliers; if things are far enough from the space of trained examples, the techniques will fail.
AI Industry Bubble. An entire industry has been built - and will collapse - because people aren’t getting it regarding the outlier problem.
Cognitive Sciences Respect. AI researchers should have more respect for the cognitive sciences to make better advancements.
Historical Context. Machine learning had trouble with outliers in the 1990s, and it still does.
Outlier Problem Noted. Handling outliers is still the Achilles’ Heel of neural networks; this has been a constant issue for over a quarter century.
Machine Learning Limitations. Current approaches to machine learning are lousy at outliers, which means they often say and do things that are absurd when encountering unusual circumstances.
Internalized Taskmaster. The internalized taskmaster becomes more insidious than any external authority, driving individuals to constantly strive for more.
Effects of Boredom. Han highlights that deep boredom can lead to mental relaxation, contrasting with the hectic pace of contemporary life.
Cultural Critique. While some critiques of Han's work resonate, there are also suggestions that engaging with craftsmanship can bring joy, countering the narrative of constant productivity.
Engagement with Philosophy. The article recommends exploring philosophical perspectives like those of Nietzsche and Kierkegaard alongside Han's analysis for a broader understanding of the issues at hand.
Limitations of Achievement. The achievement society leads to a distorted view of life, reducing relationships and experiences to mere metrics of success.
Burnout Society Overview. Byung-Chul Han describes how modern society primes us for burnout, reflecting on individual experiences in this context.
Self-Destructive Pressure. The achievement-subject experiences destructive self-reproach and auto-aggression, resulting in a mental war against themselves.
Importance of Idleness. Han emphasizes the need for idle work, where tasks are done without worrying about results, to regain the right to be 'Human Beings' instead of 'Human Doings'.
Impact of Positivity. In the achievement society, positivity becomes a dominant force, pushing individuals to be happier and more successful, leading to internalized pressure.
Achievement Society Dynamics. Society has transitioned from a Discipline-based model to an Achievement-based one, driven by internal pressures to succeed.
Investor Concerns. Microsoft's Chief Financial Officer painted a picture of a much slower burn, alarming some investors.
GenAI Project Canceled. Another GenAI monetization scheme bites the dust.
Survey Findings. The Upwork survey highlighted during the week reflects shifting sentiments around Generative AI.
Generative AI Decline. Generative AI might be a dud; I just didn't expect it to fade so fast.
Warning on Deep Learning. Gary Marcus has been warning that deep learning was oversold since November 2012. Looks like he was right.
Opportunity for Resources. The fact that the GenAI bubble is apparently bursting sooner than expected may soon free up resources for other approaches, e.g., into neurosymbolic AI.
Loss of Faith. The bubble has begun to burst. Users have lost faith, clients have lost faith, VCs have lost faith.
Canceled Deal Reported. Business Insider reported a canceled deal, exacerbating concerns for the sector.
Combat Information Overload. The best way to combat the information overload created by Deepfakes is to empower people to stand on their own, interact with the world, and take care of themselves.
Deepfake Risks Discussion. The discussions around the risks from Deepfakes are incomplete (or wrong) since they overexaggerate some risks while ignoring others.
Cognitive Overload. The most immediate and pervasive impact of deepfakes would be the cognitive overload and information fatigue they create.
Need for Educational Reform. The way we see education needs a rework: the emphasis on courses, books, and degrees creates learners who are too static and passive.
Education and Empowerment. The best regulation will, therefore, focus on equipping us with the skills needed to navigate this.
Age of Misinformation. We fail with deepfakes because we fail with social media, resorting to the same ineffective responses for both: censorship and an abdication of personal responsibility.
Combatting Environmental Concerns. Investing in more energy-efficient hardware and software for deepfake creation can significantly reduce energy consumption and emissions.
Environmental Impact. The energy-intensive process of generating deepfakes will contribute to climate change.
Scams and Vulnerability. Deepfakes provide a new tool for scammers, especially in targeting emotionally vulnerable people.
Legal Complications. Deepfakes challenge the reliability of digital evidence in court, potentially slowing legal processes.
Labeling AI Content. I believe that heavily AI-generated content should be labeled, and people featured in AI Ads must have given explicit approval for their appearance.
Exploitation of Public Figures. Non-consensual use of deepfakes can dilute personal brands and harm fan relationships.
Political Misinformation. The real danger lies in the lack of media literacy and critical thinking skills, exacerbated by political polarization.
YouTube Search Deal. Google has become the exclusive search engine capable of surfacing results from Reddit, one of the internet's most significant sources of user-generated content.
FTC AI Investigation. FTC investigates how companies use AI to implement surveillance pricing based on consumer behavior and personal data, seeking information from eight major companies.
AI Scraping Backlash. AI companies are facing a growing backlash from website owners who are blocking their scraper bots, leading to concerns about the availability of data for AI training.
Regulatory Pressure. Elon Musk's X platform is under pressure from data regulators after it emerged that users are consenting to their posts being used to build artificial intelligence systems via a default setting on the app.
OpenAI Bankruptcy Risk. OpenAI faces potential bankruptcy with projected $5 billion losses due to high operational costs and insufficient revenue from its AI ventures.
AI Funding Surge. AI startups have raised $41.5 billion worldwide in the past five years, surpassing other industries and indicating a significant role for AI in the future development and modernization of various sectors.
Adobe Generative AI. Adobe introduces new generative AI features to Illustrator and Photoshop, including tools like Generative Shape Fill and Text to Pattern in Illustrator.
Mistral Large 2. Mistral AI has launched Mistral Large 2, a new generation of its flagship model, boasting 123 billion parameters and a 128k context window.
SearchGPT Launch. OpenAI has announced its entry into the search market with SearchGPT, an AI-powered search engine that organizes and makes sense of search results rather than just providing a list of links.
Study Reference. Read Bjarnason's new essay here.
Organizational Expectations. Management's expectation that AI is a magic fix for the organizational catastrophe that is the mass layoff fad is often unfounded.
General Public Sentiment. Many coders and tech aficionados may love ChatGPT for work, but much of the outside world feels quite differently.
Unusual Study Results. It's quite unusual for a study like this on a new office tool to return such a resoundingly negative sentiment.
Negative AI Impact. Over three in four (77%) say AI tools have decreased their productivity and added to their workload in at least one way.
Productivity Concerns. Nearly half (47%) of workers using AI say they have no idea how to achieve the productivity gains their employers expect.
Confidence in AI. On balance, these systems simply cannot be counted on, which is a big part of why Fortune 500 companies have lost confidence in LLMs after the initial hype.
Neurosymbolic AI Potential. AlphaProof and AlphaGeometry are both along the lines of the first approach we discussed, using formal systems (like Cyc) to vet solutions produced by LLMs.
Generative AI Bubble. I fully expect that the generative AI bubble will begin to burst within the next 12 months, for many reasons.
Limitations of Generative AI. The biggest intrinsic failings of generative AI have to do with reliability, in a way that I believe can never be solved, given their inherent nature.
Frustration with LLMs. My strong intuition... is that LLMs are simply never going to work reliably, at least not in the general form that so many people last year seemed to be hoping.
Need for Hybrid Models. What I have advocated for, my entire career, is hybrid approaches, sometimes called neurosymbolic AI, because they combine the best of the currently popular neural network approach with the symbolic approach.
Progress by Google DeepMind. To do this, GDM used not one but two separate systems: a new one called AlphaProof, focused on theorem proving, and an update (AlphaGeometry 2) to an older one focused on geometry.
Open Source Advancements. Mistral releases Codestral Mamba for faster, longer code generation.
AI Video Model. Haiper 1.5 is a new AI video generation model challenging Sora and Runway.
Policy Issues. The U.S. is considering 'draconian' sanctions against China's semiconductor industry.
Elon Musk's Supercomputer. Elon Musk is working on a giant xAI supercomputer in Memphis.
GPT-4o Mini Release. OpenAI's release of GPT-4o Mini is a small AI model powering ChatGPT.
Internal Controversies. Whistleblowers say OpenAI illegally barred staff from airing safety risks.
AI Health Uncut. Sergei Polevikov publishes super insightful and informative reports on AI, Healthcare, and Medicine as a business.
Dabbawala Case Study. Mumbai’s dabbawala service presents an interesting case study of what is required to make food delivery profitable.
Importance of Stakeholder Alignment. The impact of getting stakeholder communication right vs wrong can be immense.
Philosophy of Love. Dostoevsky's ideas about love are hopeful, optimistic, demanding, and terrifying.
AI's Investment Issues. Turns out a lot of the massive GPU purchase agreements and data-center acquisitions were misguided; investing without a clear long-term vision or understanding of revenue has led to no ROI.
Semiconductor Industry Insights. The semiconductor capital equipment (semicap) industry is one of the most important industries on the planet and one that doesn’t get much love.
Nvidia's Value. In weeks leading up to Nvidia becoming the most valuable company in the world, I’ve received numerous requests for the updated math behind my analysis.
LLM Evaluation Technique. We explore the use of state-of-the-art LLMs, such as GPT-4, as a surrogate for humans.
Human Evaluation Challenges. While human evaluation is the gold standard for assessing human preferences, it is exceptionally slow and costly.
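In practice, the LLM-as-judge setup reduces to a scoring prompt plus a parser for the judge's reply; a sketch (the rubric and the "Score: <n>" format are assumptions, and the actual GPT-4 call is omitted):

```python
import re

RUBRIC = (
    "You are a strict evaluator. Rate the answer to the question on a "
    "1-10 scale for correctness and helpfulness.\n"
    "Reply with a line of the form: Score: <n>"
)

def judge_prompt(question: str, answer: str) -> str:
    # Prompt that would be sent to the judge model (e.g. GPT-4).
    return f"{RUBRIC}\n\nQuestion: {question}\nAnswer: {answer}"

def parse_score(reply: str):
    # Pull the numeric score out of the judge's free-text reply.
    m = re.search(r"Score:\s*(\d+)", reply)
    return int(m.group(1)) if m else None

# Here we just parse a canned judge reply.
print(parse_score("The answer is mostly right.\nScore: 8"))
```

Forcing a fixed output format keeps the parser trivial; replies that don't match return None and can be retried rather than silently miscounted.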
Future Articles. Deepfake Part 3. Exploring the true dangers of AI-generated misinformation.
Active Subreddits. We started an AI Made Simple Subreddit. Come join us over here.
Community Engagement. If you’re doing interesting work and would like to be featured in the spotlight section, just drop your introduction in the comments/by reaching out to me.
Reading Recommendations. I figured I’d start sharing whatever AI Papers/Publications, interesting books, videos, etc I came across each week.
Research Focus. The goal is to share interesting content with y'all so that you can get a peek behind the scenes into my research process.
Profit Predictions. That’s not great news for OpenAI, and you can see why they haven’t been, um, Open, about their financials.
Potential Fatal Questions. All of these questions are hard, with no obvious answer; the last may be fatal.
OpenAI's Financial Issues. I have long suspected that OpenAI was losing money, and lots of it, but never seen an analysis, until this morning.
Accurate Predictions. Gary Marcus’s predictions over the last couple years have been astonishingly on target.
Investor Questions. But investors really ought to ask some tough questions, such as these: What is their moat?
Cash Raising Necessity. Obviously, their only hope is to raise more cash, and they will certainly try.
LLMs as Commodities. LLMs have just become exactly the commodity I predicted they would become, at the lowest possible price.
MetaAI Competition. Yesterday was something even more dramatic: MetaAI all but pulled the rug out from OpenAI's business, offering a viable competitor to GPT-4 for free.
Lack of Competitive Moat. OpenAI, as far as I can tell, doesn’t really have any moat whatsoever, beyond brand recognition.
Hugging Face SmoLLM. Hugging Face has introduced SmoLLM, a new series of compact language models available in three sizes: 130M, 350M, and 1.7B parameters.
Market Demand for Small Models. The trend toward small language models is accelerating as Arcee AI announced its $24M Series A funding only 6 months after a $5.5M seed round in January 2024.
OpenAI Reasoning Project. OpenAI is developing a new reasoning technology called Project Strawberry, which aims to enable AI models to conduct autonomous research and improve their ability to answer difficult user queries.
AI Security Standards. Top tech companies form a coalition to develop cybersecurity and safety standards for AI, aiming to ensure rigorous security practices and keep malicious hackers at bay.
AI Training Data Ethics. A massive dataset containing subtitles from over 170,000 YouTube videos was used to train AI systems for major tech companies without permission, raising significant ethical and legal questions.
Llama 3.1 Parameters. With 405 billion parameters, Llama 3.1 was developed using over 16,000 Nvidia H100 GPUs, costing Meta hundreds of millions of dollars.
Meta Llama 3.1 Release. Meta has released Llama 3.1, the largest open-source AI model, claiming it outperforms top private models like GPT-4o and Claude 3.5 Sonnet.
GPT-4o Mini Performance. GPT-4o mini scored 82% on the MMLU reasoning benchmark and 87% on the MGSM math reasoning benchmark, outperforming other models like Gemini 1.5 Flash and Claude 3 Haiku.
GPT-4o Mini Launch. OpenAI has launched GPT-4o mini, a smaller, faster, and more cost-effective AI model than its predecessors.
Data Augmentation Strategy. We will use a policy like TrivialAugment + StyleTransfer for its superior balance of performance and cost.
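The core idea behind TrivialAugment is that, per sample, you pick one augmentation op and one strength uniformly at random, with no policy search. A minimal sketch of that idea, using toy ops on a toy pixel list (the real policy operates on images, e.g. via torchvision):

```python
import random

# Minimal sketch of the TrivialAugment idea: for each sample, pick ONE
# augmentation op and ONE strength uniformly at random -- no tuning needed.
def trivial_augment(image, ops):
    op = random.choice(ops)          # uniform over the op set
    strength = random.uniform(0, 1)  # uniform over the strength range
    return op(image, strength)

# Toy "image" (a flat list of pixel values) and toy ops, for illustration only.
brighten = lambda img, s: [min(255, int(p + 50 * s)) for p in img]
contrast = lambda img, s: [int(p * (1 + s)) if p > 128 else int(p * (1 - s)) for p in img]

augmented = trivial_augment([10, 200, 130], [brighten, contrast])
```

The appeal of this policy is precisely its simplicity: no augmentation-schedule search, so near-zero tuning cost.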
Effective Feature Extraction. Feature extraction is the highest ROI decision you can make.
Record Accuracy Achieved. On the widely used Labeled Faces in the Wild (LFW) dataset, our system achieves a new record accuracy of 99.63%.
Deepfake Detection System. We hope to build a Deepfake Detection system that can classify between 3 types of inputs: real, deepfake, and AI-generated.
Self-Supervised Learning Application. Self-supervised clustering is elite for selecting the right samples to train on, helping to overcome scaling limits.
Audience Engagement Strategy. Every share puts me in front of a new audience, and I rely entirely on word-of-mouth endorsements to grow.
Model Performance Improvement. Our method uses a deep convolutional network trained to directly optimize the embedding itself, achieving state-of-the-art face recognition performance using only 128 bytes per face.
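At inference time, such an embedding is typically used by thresholding the squared L2 distance between two face vectors. A minimal sketch with toy 4-d embeddings and a hypothetical threshold (the real system uses 128-d embeddings and a tuned threshold):

```python
# Sketch of embedding-based face verification: two faces "match" when the
# squared L2 distance between their embeddings falls below a threshold.
def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def same_person(emb_a, emb_b, threshold=1.0):  # threshold value is illustrative
    return squared_l2(emb_a, emb_b) < threshold

anchor   = [0.1, 0.9, 0.0, 0.3]
positive = [0.2, 0.8, 0.1, 0.3]  # same identity -> embeddings are close
negative = [0.9, 0.1, 0.7, 0.6]  # different identity -> embeddings are far
```

Because distance comparison is this cheap, all the heavy lifting lives in training the network that produces the embeddings.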
Sample Selection for Retraining. It’s best to add train samples based on maximizing information gain instead of simply adding more random ones.
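A common proxy for "maximizing information gain" is uncertainty sampling: prefer unlabeled samples where the current model's predicted distribution has the highest entropy. A minimal sketch with toy probabilities (names like `img_a` are illustrative):

```python
import math

# Rank unlabeled samples by prediction entropy (high entropy = model is
# uncertain = labeling this sample is expected to be most informative).
def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_informative(unlabeled, model_probs, k):
    ranked = sorted(unlabeled, key=lambda s: entropy(model_probs[s]), reverse=True)
    return ranked[:k]

probs = {
    "img_a": [0.98, 0.01, 0.01],  # confident prediction -> low entropy
    "img_b": [0.34, 0.33, 0.33],  # near-uniform -> high entropy
    "img_c": [0.70, 0.20, 0.10],
}
picked = select_most_informative(list(probs), probs, k=2)  # img_b ranks first
```

This is the simplest member of the active-learning family; fancier criteria (expected model change, diversity-aware batches) follow the same select-then-label loop.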
Temporal Feature Analysis. If you want to take things up a notch, you’re best served going for temporal feature extraction.
Importance of Ensemble Modeling. Using simple models will keep inference costs low and allows an ensemble to compensate for the weakness of one model by sampling a more diverse search space.
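The cheapest way to combine simple models is majority voting, where diversity lets the group correct any single model's mistake. A minimal sketch over the three classes named above (the toy "models" are just stand-in functions):

```python
from collections import Counter

# Majority-vote ensembling over simple, cheap models. Each "model" here is
# just a function mapping an input to one of the three class labels.
def ensemble_predict(models, x):
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

# Toy models that disagree on purpose: the majority outvotes the one
# model that gets this input wrong.
m1 = lambda x: "deepfake"
m2 = lambda x: "deepfake"
m3 = lambda x: "real"

print(ensemble_predict([m1, m2, m3], x=None))  # -> deepfake
```

Since each member stays small, inference cost grows only linearly with ensemble size, which is usually a bargain for the robustness gained.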
Vibe-Eval Suite. Reka AI introduces Vibe-Eval, a new evaluation suite designed to measure the progress of multimodal language models.
Global AI Regulation. Japan's Prime Minister Fumio Kishida unveils an international framework for the regulation and use of generative AI, emphasizing the need to address the potential risks and promote cooperation for safe and trustworthy AI.
AI in Healthcare. AI system trained on heart's electrical activity reduces deaths in high-risk patients by 31% in hospital trial, proving its potential to save lives.
AI Notetaking Revolution. 'I will never go back': Ontario family doctor says new AI notetaking saved her job.
Shift to Enterprise Focus. AI startups that initially garnered attention with innovative generative AI products are now shifting their focus towards enterprise customers to enhance revenue streams.
Meta's Ad Tool Issues. Meta's automated ad tool, Advantage Plus, has been overspending on ad budgets and failing to deliver sales, causing frustration among marketers and businesses.
Lawsuit Against OpenAI. Eight U.S. newspaper publishers, all under the ownership of investment firm Alden Global Capital, have filed a lawsuit against Microsoft and OpenAI, alleging copyright infringement.
Inverse Scaling Phenomenon. The authors also share their findings on the difficulty of creating and evaluating hard prompts, and the phenomenon of inverse scaling, where larger models fail tasks that smaller models can complete.
Burnout in AI Industry. AI engineers in the tech industry are experiencing burnout and rushed rollouts due to the intense competition and pressure to stay ahead in the generative AI race.
Microsoft's AI Policy Change. Microsoft bans U.S. police from using enterprise AI tool for facial recognition due to concerns about potential pitfalls and racial biases.
Evaluation Challenges. The authors discuss the challenges of creating hard prompts and the trade-offs between human and model-based automatic evaluation.
New AI Model. New Microsoft AI model may challenge GPT-4 and Google Gemini.
Mystery Chatbot. Mysterious 'gpt2-chatbot' AI model appears suddenly, confuses experts.
AI Music Generation. ElevenLabs previews music-generating AI model.
AI Content Labeling. TikTok will automatically label AI-generated content created on platforms like DALL·E 3.
AI Audiobooks. Audible's Test of AI-Voiced Audiobooks Tops 40,000 Titles.
Deepfake Detector Release. OpenAI Releases 'Deepfake' Detector to Disinformation Researchers.
AI Export Bill. US lawmakers unveil bill to make it easier to restrict exports of AI models.
OpenAI & Stack Overflow. OpenAI and Stack Overflow partner to bring more technical knowledge into ChatGPT.
Robotaxi Plans Delayed. Motional delays commercial robotaxi plans amid restructuring.
Funding for Autonomy. Wayve, an A.I. Start-Up for Autonomous Driving, Raises $1 Billion.
Siri Revamp. Apple Will Revamp Siri to Catch Up to Its Chatbot Competitors.
Advancements in Drug Discovery. AlphaFold 3 is expected to be particularly beneficial for drug discovery, as it can predict where a drug binds a protein, a feature that was absent in its predecessor, AlphaFold 2.
TikTok AI Labeling. TikTok has announced that it will automatically label AI-generated content created on other platforms, such as OpenAI's DALL·E 3, using a technology called Content Credentials from the Coalition for Content Provenance and Authenticity (C2PA).
DeepSeek-V2 Features. DeepSeek AI releases DeepSeek-V2, a Mixture-of-Experts (MoE) language model that is state-of-the-art, cost-effective, and efficient, with 236B total parameters, of which 21B are activated for each token.
Robot Dogs Testing. The United States Marine Forces Special Operations Command (MARSOC) is testing rifle-armed 'robot dogs' supplied by Onyx Industries.
Microsoft Copilot Upgrade. Microsoft is introducing new AI features in Copilot for Microsoft 365 to help users create better prompts and become prompt engineers, aiming to improve productivity and efficiency in the workplace.
Wayve's $1 Billion Raise. Wayve, a London-based AI start-up for autonomous driving, raised an eye-popping $1 billion from investors like SoftBank, Microsoft, and Nvidia.
AI and Deception. AI systems are becoming increasingly sophisticated in their capacity for deception, raising concerns about potential dangers to society and the need for AI safety laws.
Safety Tool Release. U.K. Safety Institute releases an open-source toolset called Inspect to assess AI model safety, aiming to provide a shared, accessible approach to evaluations.
AI Deepfake Detector. OpenAI releases a deepfake detector tool to combat the influence of AI-generated content on the upcoming elections, acknowledging that it's just the beginning of the fight against deepfakes.
AlphaFold 3 Overview. Google's DeepMind has unveiled AlphaFold 3, an advanced version of its protein structure prediction tool, which can now predict the structures of DNA, RNA, and essential drug discovery molecules like ligands.
AI Model Competition. Microsoft is developing a new large-scale AI language model called MAI-1, potentially rivaling state-of-the-art models from Google, Anthropic, and OpenAI.
YouTube Version. You can watch the YouTube version of this here:
AI News Summary. Our 167th episode with a summary and discussion of last week's big AI news!
Guest Host. With guest host Daliana Liu from The Data Scientist Show!
Special Interview. With a special one-time interview with Andrey in the latter part of the podcast.
Listener Interaction. Email us your questions and feedback at contact@lastweekin.ai and/or hello@gladstone.ai.
OpenAI GPT-4o. OpenAI releases GPT-4o, a faster model that's free for all ChatGPT users.
Google AI Astra. Project Astra is the future of AI at Google.
AI in Search. Google is redesigning its search engine — and it's AI all the way down.
Google Media Models. Google unveils Veo and Imagen 3, its latest AI media creation models.
AI Music Sandbox. Google Unveils Music AI Sandbox Making Loops From Prompts.
Anthropic AI Tool. Anthropic AI Launches a Prompt Engineering Tool that Generates Production-Ready Prompts in the Anthropic Console.
OpenAI Leadership Change. OpenAI's Chief Scientist and Co-Founder Is Leaving the Company.
Anthropic Leadership. Mike Krieger joins Anthropic as Chief Product Officer.
Robotaxi Testing. GM's Cruise to start testing robotaxis in Phoenix area with human safety drivers on board.
Zoox Probe. US agency probes Amazon-owned Zoox self-driving vehicles after two crashes.
Waymo Investigation. Waymo's robotaxis under investigation after crashes and traffic mishaps.
New AI Models. Falcon 2: UAE's Technology Innovation Institute Releases New AI Model Series, Outperforming Meta's New Llama 3.
AI Model Safety. U.K. agency releases tools to test AI model safety.
AI Watermark. Google's invisible AI watermark will help identify generative text and video.
AI Copyright Issues. How One Author Pushed the Limits of AI Copyright.
Project Astra Launch. Google's Project Astra, a real-time, multimodal AI assistant, is the future of AI at Google, according to Demis Hassabis, the head of Google DeepMind.
AI Legislation in Colorado. Colorado lawmakers have passed a landmark AI discrimination bill, which would prohibit employers from using AI to discriminate against workers.
AI in Journalism. Gannett is implementing AI-generated bullet points at the top of journalists' stories to enhance the reporting process.
AI Emissions Concerns. Microsoft's emissions and water usage spiked due to the increased demand for AI technologies, posing challenges to meeting sustainability goals.
Investment in AI. Microsoft announces a 4 billion euro investment in cloud and AI infrastructure, AI skilling, and French Tech acceleration.
Reddit Data Partnership. Reddit's partnership with OpenAI allows the AI company to train its models on Reddit content, leading to a surge in Reddit shares.
Waymo Investigation. The National Highway Traffic Safety Administration (NHTSA) has initiated an investigation into Alphabet's Waymo self-driving vehicles following reports of unexpected behavior and traffic safety violations.
Transparency Issues. This news came amidst the release of ChatGPT 4o, but OpenAI's restrictive off-boarding agreement has raised concerns about the company's transparency.
Multimodal Capabilities. The new model is 'natively multimodal,' meaning it can generate content or understand commands in voice, text, or images.
OpenAI's GPT-4o Release. OpenAI has announced the release of GPT-4o, an enhanced version of the GPT-4 model that powers ChatGPT.
Astra's Functionality. Hassabis envisions AI's future to be less about the models and more about their functionality, with AI agents performing tasks on behalf of users.
Fetch AI Assistant. Microsoft, Khan Academy provide free AI assistant for all educators in US.
AI Regulation Bill. Colorado governor signs sweeping AI regulation bill.
AI Likeness Management. Hollywood agency CAA aims to help stars manage their own AI likenesses.
AI Safety Commitments. Tech giants pledge AI safety commitments — including a ‘kill switch’ if they can't mitigate risks.
Groundbreaking AI Law. World's first major law for artificial intelligence gets final EU green light.
Emotional AI Initiative. Inflection AI reveals new team and plan to embed emotional AI in business bots.
AI Voice Concerns. OpenAI says Sky voice in ChatGPT will be paused after concerns it sounds too much like Scarlett Johansson.
AI and Education. AI tutors are quietly changing how kids in the US study, offering affordable and personalized assistance for school assignments.
First AI Regulation. EU member states have approved the world's first major law for regulating artificial intelligence, emphasizing trust, transparency, and accountability.
Universal Basic Income. AI 'godfather' Geoffrey Hinton advocates for universal basic income to address AI's impact on job inequality and wealth distribution.
AI-Language Model War. Tencent and iFlytek have entered a price war by slashing prices of large-language models used for chatbots.
Generative AI Upgrade. Amazon is upgrading its decade-old Alexa voice assistant with generative artificial intelligence and plans to charge a monthly subscription fee.
OpenAI's Response. OpenAI has temporarily halted the use of the Sky voice in its ChatGPT application due to its resemblance to actress Scarlett Johansson's voice.
Claude's Discoveries. One notable discovery was a feature associated with the Golden Gate Bridge, which, when activated, indicated that Claude was contemplating the landmark.
Anthropic Research. A new research paper published by Anthropic aims to demystify the 'black box' phenomenon of AI's algorithmic behavior.
AI Launch Issues. This incident continues a trend of Google facing issues with its latest AI features immediately after their launch, as seen in February 2023.
Trust Undermined. This has led to a significant backlash online, undermining trust in Google's search engine, which is used by over two billion people for reliable information.
Google's AI Errors. Google's recent unveiling of its new artificial intelligence (AI) capabilities for search has sparked controversy due to a series of errors and untruths.
AI News Summary. Our 169th episode with a summary and discussion of last week's big AI news!
Hollywood AI Partnerships. Alphabet, Meta Offer Millions to Partner With Hollywood on AI.
AI Cloning Fines. Robocaller Who Used AI to Clone Biden's Voice Fined $6 Million.
AI Safety Concerns. OpenAI researcher who resigned over safety concerns joins Anthropic.
Training Compute Growth. Training Compute of Frontier AI Models Grows by 4-5x per Year.
AI Model Rankings. Scale AI publishes its first LLM Leaderboards, ranking AI model performance in specific domains.
xAI Funding. Elon Musk's xAI raises $6 billion in latest funding round.
Nvidia Revenue Surge. Nvidia, Powered by A.I. Boom, Reports Soaring Revenue and Profits.
ChatGPT Discounts. OpenAI launches programs making ChatGPT cheaper for schools and nonprofits.
Content Deals with OpenAI. Vox Media and The Atlantic sign content deals with OpenAI.
PwC and OpenAI. PwC agrees deal to become OpenAI's first reseller and largest enterprise user.
AI Earbuds Innovation. Iyo thinks its gen AI earbuds can succeed where Humane and Rabbit stumbled.
Real-time Video Translation. Microsoft Edge will translate and dub YouTube videos as you’re watching them.
Alexa's AI Overhaul. Amazon plans to give Alexa an AI overhaul — and a monthly subscription price.
Opera's AI Integration. Opera is adding Google's Gemini AI to its browser.
Telegram Copilot Bot. Telegram gets an in-app Copilot bot.
Google AI Controversy. Google's A.I. Search Errors Cause a Furor Online.
OpenAI Board Conflict. OpenAI is also embroiled in controversy, with former board member Helen Toner accusing CEO Sam Altman of dishonesty and manipulation during a failed coup attempt.
Expensive AI Training Data. AI training data is becoming increasingly expensive, putting it out of reach for all but the wealthiest tech companies.
Survey on AI Usage. AI products like ChatGPT are much hyped but not widely used, with only 2% of British respondents using such tools on a daily basis.
AI Industry Tensions. The AI industry is seeing increasing tension, highlighted by a recent clash between Elon Musk and Yann LeCun on social media.
EU AI Act Developments. The EU is establishing the AI Office to regulate AI risks, foster innovation, and influence global AI governance.
Deepfake Concerns. A deepfake video of a U.S. official discussing Ukraine's potential strikes in Russia has surfaced, raising concerns about the use of AI-powered disinformation.
AI Misuse in Influencing Campaigns. Russia and China used OpenAI's A.I. in covert campaigns to manipulate public opinion and influence geopolitics, raising concerns about the impact of generative A.I. on online disinformation.
AI Search Tool Rollback. Google's new artificial intelligence feature for its search engine, A.I. Overviews, has been significantly rolled back after it produced a series of errors and false information.
PwC as OpenAI Reseller. OpenAI has partnered with consulting giant PwC to provide ChatGPT Enterprise, the business-oriented version of its AI chatbot, to PwC employees and clients.
Vox Media and OpenAI Partnership. Vox Media has announced a strategic partnership with OpenAI, aiming to leverage AI technology to enhance its content and product offerings.
Musk's xAI Controversy. LeCun criticized Musk's leadership at xAI, calling him an erratic megalomaniac, following Musk's announcement of a $6 billion funding round for xAI.
AGI by 2027. Former OpenAI researcher foresees AGI reality in 2027.
AI Beauty Pageant. The Uncanny Rise of the World's First AI Beauty Pageant.
GPT-4 Exam Performance. GPT-4 didn't ace the bar exam after all, MIT research suggests — it didn't even break the 70th percentile.
Election Risks. Testing and mitigating elections-related risks.
OpenAI Whistleblowers. OpenAI Insiders Warn of a 'Reckless' Race for Dominance.
Tech Giants Collaboration. Google, Intel, Microsoft, AMD and more team up to develop an interconnect standard to rival Nvidia's NVLink.
Microsoft Layoffs. Microsoft Lays Off 1,500 Workers, Blames 'AI Wave'.
Zoox Self-Driving Cars. Zoox to test self-driving cars in Austin and Miami.
UAE AI Partnership. UAE seeks 'marriage' with US over artificial intelligence deals.
Saudi Investment. Saudi fund invests in China effort to create rival to OpenAI.
OpenAI Robotics Group. OpenAI is restarting its robotics research group.
Google's NotebookLM. Google's updated AI-powered NotebookLM expands to India, UK and over 200 other countries.
ElevenLabs Sound Effects. ElevenLabs’ AI generator makes explosions or other sound effects with just a prompt.
Perplexity AI Feature. Perplexity AI's new feature will turn your searches into shareable pages.
Udio 130 Model. Udio introduces new udio-130 music generation model and more advanced features.
Apple's AI Features. 'Apple Intelligence' will automatically choose between on-device and cloud-powered AI.
AI Video Generator. KLING is the latest AI video generator that could rival OpenAI's Sora.
Right to Warn. Thirteen current and former employees of OpenAI and Google DeepMind have published a proposal demanding the right to warn the public about the potential dangers of advanced artificial intelligence (AI).
Anticipating AGI. Former OpenAI researcher predicts the arrival of AGI by 2027, foreseeing AI machines surpassing human intelligence and national security implications.
ChatGPT Outage. OpenAI's ChatGPT experienced multiple outages, including a major one during the daytime in the US, but the issues were eventually resolved.
Amazon AI Impact. Amazon's use of AI and robotics in its warehouses isolates workers and hinders union organizing, according to a new report by Oxford University researchers.
FTC Antitrust Investigations. FTC and DOJ open antitrust investigations into Microsoft, OpenAI, and Nvidia, with the FTC looking into potential antitrust issues related to investments made by technology companies into smaller AI companies.
Microsoft's AI Investment. Microsoft plans to invest $3.2 billion in AI infrastructure in Sweden, including training 250,000 people and increasing capacity at its data centers.
AI Chatbot Accuracy. AI chatbots, including Google’s Gemini 1.0 Pro and OpenAI’s GPT-3, provided incorrect information 27% of the time when asked about voting and the 2024 election.
Kuaishou's New Product. Kuaishou, a Chinese short-video app, has launched a text-to-video service similar to OpenAI's Sora, as part of the race among Chinese Big Tech firms to catch up with US counterparts in AI applications.
Concept Storage Method. A new research paper from OpenAI introduces a method to identify how the AI stores concepts that might cause misbehavior.
Whistleblower Protections. The proposal also calls for the abolition of nondisparagement agreements that prevent insiders from voicing risk-related concerns.
Conversations with Siri. Key features include a more conversational Siri, AI-generated 'Genmoji,' and integration with OpenAI's GPT-4o for handling complex requests.
Deepfake Impact. AI played a significant role in the Indian election, with political parties using deepfakes and AI-generated content for targeted communication, translation of speeches, and personalized voter outreach.
Regulatory Challenges. Waymo issues a voluntary software recall after a driverless vehicle collides with a telephone pole, prompting increased regulatory scrutiny of the autonomous vehicle industry.
OpenAI Revenue Growth. OpenAI's annualized revenue has more than doubled in the last six months, reaching $3.4 billion.
OpenAI Partnership. OpenAI and Apple announce partnership to integrate ChatGPT into Apple experiences.
Generative Video Creation. Dream Machine enables users to create high-quality videos from simple text prompts such as 'a cute Dalmatian puppy running after a ball on the beach at sunset.'
Luma AI Launch. Luma AI has launched the public beta of its new AI video generation model, Dream Machine, which has garnered overwhelming user interest.
Apple AI Features. Apple has announced 'Apple Intelligence,' a suite of AI features for iPhone, Mac, and more at WWDC 2024.
Perplexity Controversy. Buzzy AI Search Engine Perplexity Is Directly Ripping Off Content From News Outlets.
Huawei's Chip Concerns. Huawei exec concerned over China's inability to obtain 3.5nm chips, bemoans lack of advanced chipmaking tools.
Waymo's Recall. Waymo issues software and mapping recall after robotaxi crashes into a telephone pole.
Reward Tampering Research. Sycophancy to subterfuge: Investigating reward tampering in language models.
Meta's AI Models. Meta releases flurry of new AI models for audio, text and watermarking.
Adept and Microsoft Deal. AI startup Adept is in deal talks with Microsoft.
OpenAI Revenue Growth. Report: OpenAI Doubled Annualized Revenue in 6 Months.
Claude 3.5 Release. Anthropic just dropped Claude 3.5 Sonnet with better vision and a sense of humor.
Runway Video Model. Runway unveils new hyper realistic AI video model Gen-3 Alpha, capable of 10-second-long clips.
Luma's Dream Machine. 'We don’t need Sora anymore': Luma’s new AI video generator Dream Machine slammed with traffic after debut.
New Apple Features. Apple Intelligence: every new AI feature coming to the iPhone and Mac.
Emotion Detection Controversy. AI-powered cameras in UK train stations, including London's Euston and Waterloo, used Amazon software to scan faces and predict emotions, age, and gender for potential advertising and safety purposes, raising concerns about privacy and reliability.
Claude 3.5 Sonnet Launch. Anthropic has launched its latest AI model, Claude 3.5 Sonnet, which it claims can match or surpass the performance of OpenAI’s GPT-4o or Google’s Gemini across a broad range of tasks.
AI Influencer Ads. AI-generated avatars are being introduced on TikTok for brands to use in ads, allowing for customization and dubbing in multiple languages.
AI Models Comparison. Fireworks AI releases Firefunction-v2, an open-source function-calling model designed to excel in real-world applications, rivaling high-end models like GPT-4o at a fraction of the cost and with superior speed and functionality.
Brave AI Enhancement. Brave's in-browser AI assistant, Leo, now incorporates real-time Brave Search results, providing more accurate and up-to-date answers.
Revenue Loss Estimate. The publishing industry is expected to lose over $10 billion due to such practices, according to Ameet Shah, partner and SVP of publisher operations and strategy at Prohaska Consulting.
Publisher Backlash. AI search startup Perplexity, backed by Jeff Bezos and other tech giants, is facing backlash from publishers like The New York Times, The Guardian, Condé Nast, and Forbes for allegedly circumventing blocks to access and repurpose their content.
Benchmark Test Performance. Claude 3.5 Sonnet excelled in benchmark tests, outscoring GPT-4o, Gemini 1.5 Pro, and Meta's Llama 3 400B in most categories.
AI-Generated Script Backlash. London premiere of AI-generated script film cancelled after backlash from audience and industry, highlighting ongoing debate over AI's role in the film industry.
Speed Improvement. The new model, which is available to Claude users on the web and iOS, and to developers, is said to be twice as fast as its predecessor and outperforms the previous top model, Claude 3 Opus.
Gemini Side Panels. Google rolls out Gemini side panels for Gmail and other Workspace apps.
Voice Mode Delay. OpenAI delays rolling out its 'Voice Mode' to July.
AI News Summary. Our 172nd episode with a summary and discussion of last week's big AI news!
Collaboration Tools. Anthropic Debuts Collaboration Tools for Claude AI Assistant.
AI Music Lawsuits. Music labels sue AI music generators for copyright infringement.
AI Safety Bill. Y Combinator rallies start-ups against California's AI safety bill.
Stock Sale Policies. OpenAI walks back controversial stock sale policies, will treat current and former employees the same.
Advanced AI Chip. China's ByteDance working with Broadcom to develop advanced AI chip, sources say.
Figma AI Redesign. Figma announces big redesign with AI.
Waymo Robotaxis. Waymo ditches the waitlist and opens up its robotaxis to everyone in San Francisco.
ChatGPT for Mac. OpenAI's ChatGPT for Mac is now available to all users.
Ethical AI Positioning. Anthropic aims to enable beneficial uses of AI by government agencies, positioning itself as an ethical choice among rivals.
Overestimating AI Capabilities. MIT robotics pioneer Rodney Brooks believes that people are overestimating the capabilities of generative AI and that it's flawed to assign human capabilities to it.
AI Scaling Myths. The belief that AI scaling will lead to artificial general intelligence is based on misconceptions about scaling laws, the availability of training data, and the limitations of synthetic data.
Formation Bio Investment. Formation Bio raises $372M in Series D funding to apply AI to drug development, aiming to streamline clinical trials and drug development processes.
Humanoid Robot Deployment. Agility Robotics' Digit humanoids have landed their first official job with GXO Logistics Inc., marking the industry's first formal commercial deployment of humanoids.
Google Translate Expansion. Google Translate has added 110 new languages, including Cantonese and Punjabi, bringing the total of supported languages to nearly 250.
AI Voice Imitations Controversy. Morgan Freeman expresses gratitude to fans for calling out unauthorized AI imitations of his voice, highlighting the growing issue of AI-generated voice imitations in the entertainment industry.
New Collaboration Tools. Anthropic has launched an update to enhance team collaboration and productivity, introducing a Projects feature that allows users to organize their interactions with Claude.
Kicking Off AI Usage. The company's expansion of its service to all San Francisco residents is seen as a crucial step towards the normalization of autonomous vehicles and a potential path to profitability for the historically money-losing operation.
Waymo Expansion. Waymo announced that its robotaxi service in San Francisco is now open to the public, eliminating the need for customers to sign up for a waitlist.
AI Music Lawsuits. Universal Music Group, Sony Music, and Warner Records have filed lawsuits against AI music-synthesis companies Udio and Suno, accusing them of mass copyright infringement.
Performance Improvement. CriticGPT has shown significant effectiveness, with human reviewers using CriticGPT performing 60% better in evaluating ChatGPT's code outputs than those without such assistance.
CriticGPT Introduction. OpenAI has introduced a new AI model, CriticGPT, designed to identify errors in the outputs of ChatGPT, an AI system built on the GPT-4 architecture.
China's AI Competition. The conversation includes China's competition in AI and its impacts.
AI Features Discussion. The episode covers emerging AI features and legal disputes over data usage.
Workforce Development. U.S. government addresses critical workforce shortages for the semiconductor industry with a new program.
Nvidia's Revenue. Nvidia is expected to make $12 billion from AI chips in China this year despite US controls.
AI Regulation Issues. With Chevron's demise, AI regulation seems dead in the water.
AI Investment Fund. Bridgewater starts a $2 billion fund that uses machine learning for decision-making.
Runway's Gen 3 Alpha. Runway's Gen-3 Alpha AI video model is now available, but there’s a catch.
LLaMA 3 Release. Meta is about to launch its biggest LLaMA model yet, highlighting its significance.
Gemini 1.5 Launch. Google's release of Gemini 1.5, Flash and Pro with 2M tokens to the public.
Apple's Board Role. Apple Inc. has secured an observer role on OpenAI's board, with Phil Schiller, Apple's App Store head and former marketing chief, appointed to the position.
Integrating ChatGPT. This move follows Apple's announcement to integrate ChatGPT into its iPhone, iPad, and Mac devices.
AI Bias in Medical Imaging. AI models analyzing medical images can be biased, particularly against women and people of color, and while debiasing strategies can improve fairness, they may not generalize well to new patient populations.
Democratizing AI Access. Mozilla's Llamafile and Builders Projects were showcased at the AI Engineer World's Fair, emphasizing democratized access to AI technology.
Mind-reading AI Progress. AI can accurately recreate what someone is looking at based on brain activity, greatly improved when the AI learns which parts of the brain to focus on.
AI Model Evaluation Advocacy. Anthropic is advocating for third-party AI model evaluations to assess capabilities and risks, focusing on safety levels, advanced metrics, and efficient evaluation development.
AI Coding Startup Valuation. AI coding startup Magic seeks $1.5-billion valuation in new funding round, aiming to develop AI models for writing software.
AI Music Generation. Suno launches iPhone app — now you can make AI music on the go, which allows users to generate full songs from text prompts or sound.
New AI Model Release. Kyutai has open-sourced Moshi, a real-time native multimodal foundation AI model that can listen and speak simultaneously.
Security Flaw Discovered. OpenAI's ChatGPT macOS app was found to be storing user conversations in plain text, making them easily accessible to potential malicious actors.
Concerns Over AI Safety. OpenAI is facing safety concerns from employees and external sources, raising worries about the potential impact on society.
AI Lawsuits Implications. AI music lawsuits could shape the future of the music industry, as major labels sue AI firms for alleged copyright infringement.
AI Video Model Development. Odyssey is developing an AI video model that can create Hollywood-grade visual effects and allow users to edit and control the output at a granular level.
AI Health Coach Collaboration. OpenAI and Arianna Huffington are collaborating on an 'AI health coach' that aims to provide personalized health advice and guidance based on individual data.
FlashAttention-3 Efficiency. The results show that FlashAttention-3 achieves a 1.5-2.0x speedup on H100 GPUs, reaching up to 740 TFLOPs/s with FP16 and close to 1.2 PFLOPs/s with FP8.
Antitrust Concerns. These changes occur amid growing antitrust concerns over Microsoft's partnership with OpenAI, with regulators in the UK and EU scrutinizing the deal.
Regulatory Scrutiny Reaction. Microsoft has relinquished its observer seat on the board of OpenAI, a move that comes less than eight months after it secured the non-voting position.
OpenAI Security Breach. In early 2022, a hacker infiltrated OpenAI's internal messaging systems, stealing information about the design of the company's AI technologies.
Perception of Progress Assessment. Despite the introduction of this system, there is no consensus in the AI research community on how to measure progress towards AGI, and some view OpenAI's five-tier system as a tool to attract investors rather than a scientific measurement of progress.
Advancements in AGI. OpenAI is reportedly close to reaching Level 2, or 'Reasoners,' which would be capable of basic problem-solving on par with a human with a doctorate degree.
Current AI Level. OpenAI's technology, such as GPT-4o that powers ChatGPT, is currently at Level 1, which includes AI that can engage in conversational interactions.
OpenAI's Five-Tier Model. OpenAI has introduced a five-tier system to track its progress towards developing artificial general intelligence (AGI).
AI Industry Challenges. We delve into the latest advancements and challenges in the AI industry, highlighting new features from Figma and Quora, regulatory pressures on OpenAI, and significant investments in AI infrastructure.
OpenAI and Health Coach. OpenAI and Arianna Huffington are working together on an 'AI health coach.'
Mind-Reading AI. Mind-reading AI recreates what you're looking at with amazing accuracy.
New AI Features. Figma pauses its new AI feature after Apple controversy.
AI-generated Content Labels. Vimeo joins YouTube and TikTok in launching new AI content labels.
Content Regulation Pressure. There is a need for transparency and regulation in AI content labeling and licensing.
AI Coding Startup. AI coding startup Magic seeks a $1.5-billion valuation in new funding round, sources say.
Elon Musk's GPU Plans. Elon Musk reveals plans to make the world's 'Most Powerful' 100,000 NVIDIA GPU AI cluster.
AMD Acquisition News. AMD plans to acquire Silo AI in a $665 million deal.
Regurgitation Process. The regurgitative process need not be verbatim.
Neural Nets Critique. Gary Marcus criticizes neural nets, arguing that they don't really understand anything; they merely regurgitate what they read on the web.
Need for New Approach. Getting to real AI will require a different approach.
Understanding Proof. Partial regurgitation, no matter how fluent, does not, and will not ever, constitute genuine comprehension.
AI's Limitations. LLMs are great at clustering similar things but 'regurgitating a lot of words with slight paraphrases while adding conceptually little, and understanding even less.'
Partial Regurgitation Defined. The term 'partial regurgitation' is introduced to describe AI's output not being a full reconstruction of the original source.
Storage of Weights. Neural nets do store weights, but that doesn't mean that they know what they are talking about.
Financial Priorities. Instead, they appear to be focused precisely on financial return, and appear almost indifferent to some of the ways in which their product has already hurt large numbers of people (artists, writers, voiceover actors, etc.).
OpenAI's Mission. As recently as November 2023, OpenAI promised in their filing as a nonprofit exempt from income tax to make AI that 'benefits humanity … unconstrained by a need to generate financial return'.
Future of AI. Gary Marcus hopes that the most ethical company wins. And that we don’t leave our collective future entirely to self-regulation.
Ethical Concerns. The real issue isn’t whether OpenAI would win in court, it’s what happens to all of us, if a company with a track record for cutting ethical corners winds up first to AGI.
Comparison to DeepMind. By comparison, Google DeepMind devotes a lot of its energy towards projects like AlphaFold that have clear potential to help humanity.
Safety Resources. Furthermore, OpenAI apparently hasn’t even fulfilled their own promise to devote 20% of its resources to AI safety.
Product Focus. The first step towards that should be a question about product – are the products we are making benefiting humanity?
Copyright Issues. OpenAI has trained on a massive amount of copyrighted material, without consent, and in many instances without compensation.
Call for Independent Oversight. Without independent scientists in the loop, with a real voice, we are lost.
Questioning Government Trust. It's correct for the public to take everything OpenAI says with a grain of salt, especially because of their massive power and chance to potentially put humanity at risk.
Tax Status Conflict. OpenAI filed for non-profit tax exempt status, claiming that the company's mission was to 'safely benefit humanity', even as they turn over almost half their profits to Microsoft.
Governance Promises Broken. Altman once promised that outsiders would play an important role in the company's governance; that key promise has not been kept.
Restrictive Employee Contracts. OpenAI had highly unusual contractual 'clawback' clauses designed to keep employees from speaking out about any concerns about the company.
Unmet Safety Promises. OpenAI promised to devote 20% of its efforts to AI safety, but never delivered, according to a recent report.
Altman's Conflicts of Interest. Altman appears to have misled people about his personal holdings in OpenAI, omitting potential conflicts of interest between his role as CEO of the nonprofit OpenAI and other companies he might do business with.
CTO's Miscommunication. CTO Mira Murati embarrassed herself and the company in her interview with Joanna Stern of the Wall Street Journal, sneakily conflating 'publicly available' with 'public domain'.
Misuse of Artist's Voice. OpenAI proceeded to make a Scarlett Johansson-like voice for GPT-4o, even after she specifically told them not to, highlighting their overall dismissive attitude towards artist consent.
OpenAI's Misleading Name. OpenAI called itself open, and traded on the notion of being open, but even as early as May 2016 knew that the name was misleading.
Governance Representation. Sam Altman, 2016: 'We’re planning a way to allow wide swaths of the world to elect representatives to a new governance board.'
Accountability Reminder. Gary Marcus keeps receipts.
Questioning Authority. What happened to the wide swaths of the world? To quote Altman himself, 'Why do these fuckers get to decide what happens to me?'
Toner's Whistleblowing. Toner was pushed out for her sin of speaking up.
Firing Consideration. The board had contemplated firing Sam over trust issues before that.
ChatGPT Announcement. The board was not informed in advance about that [ChatGPT]; we learned about ChatGPT on Twitter.
Safety Process Inaccuracy. On multiple occasions, he gave inaccurate information about the small number of formal safety processes that the company did have in place.
Oversight Concerns. Altman is consolidating more and more power and seeming less and less on the level.
Sam's Deceit. Putting Toner's disclosures together with the other lies from OpenAI that I documented the other day, I think we can safely put Kara's picture of Sam the Innocent to bed.
Conflict of Interest. Sam has now divested his stake in that investment firm.
Trust Issues. If they can't trust Altman, I don't see they can do their job.
Nonprofit Status. If they cannot assemble a board that respects the legal filings they made, and cannot behave in keeping with their oft-repeated promises, they must dissolve the nonprofit.
Lack of Candor. The (old) board never said that the firing of Sam was directly about safety, they said it was about candor.
Misleading Claims. Both read to me as deeply misleading, verging on defamatory.
Lack of Trust. The degree to which they diverted from that core issue that led to Sam's firing is genuinely disturbing.
Board Attacks. At least two proxies have gone after Helen Toner, one in The Economist, highbrow, one low (a post on X that got around 200,000 views).
Time's Ravages. What I said then to Bach still holds, 100%, 26 months later.
Longstanding Warnings. Gary Marcus has warned people about the limits of deep learning, including hallucinations, since 2001.
Musk's Shift. Musk has switched teams, flipping from calling for a pause to going all in on a technology that remains exactly as incorrigible as it ever was.
Alignment Problem. We are no closer to a solution to the alignment problem now than we were then.
Unmet Expectations. For all the daily claims of 'exponential progress', reliability is still a dream.
Deep Learning Critique. The ridicule started with my infamous 'Deep Learning Is Hitting a Wall' essay.
Financial Conflicts. The Wall Street Journal had a long discussion of Altman’s financial holdings and possible conflicts of interest.
Musk-LeCun Tension. Yann LeCun just pushed Elon Musk to the point of unfollowing him.
Kara Swisher's Bias. Paris Marx echoed my own feelings about Kara Swisher’s apparent lack of objectivity around Altman.
Slowing Innovation. Christopher Mims echoed much of what I have been arguing here, writing that 'The pace of innovation in AI is slowing, its usefulness is limited, and the cost of running it remains exorbitant.'
No Breakthroughs. It has been almost two years since there’s been a bona fide GPT-4-sized breakthrough, despite the constant boasts of exponential progress.
Lackluster Fireside Chat. Melissa Heikkilä at Technology Review more or less panned Altman’s recent fireside chat at AI for Good.
Bad Press for Altman. The bad press about Sam Altman and OpenAI, who once seemingly could do no wrong, just keeps coming.
Key Contributors. The letter itself, cosigned by Bengio, Hinton, and Russell.
Informed Endorsement. I fully endorse its four recommendations.
Gift Link Provided. Roose supplied a gift link.
Common Sense Emphasis. Nowadays we both stress the absolutely essential nature of common sense, physical reasoning and world models, and the failure of current architectures to handle those well.
Future AI Development. If you want to argue that some future, as yet unknown form of deep learning will be better, fine, but with regards to what exists and is popular now, your view has come to mirror my own.
Critique Overlap. Your current critique for what is wrong with LLMs overlaps heavily with what I said repeatedly from 2018 to 2022.
Potential Alliance. The irony of all of this is that you and I are among the minority of people who have come to fully understand just how limited LLMs are, and what we need to do next. We should be allies.
Historical Dismissals. There is a clear pattern: you often initially dismiss my ideas, only to converge on the same place later — without ever citing my earlier arguments.
Funding Decline. Generative AI seed funding drops.
Underprepared for AGI. We are woefully underprepared for AGI whenever it comes.
Read Marcus's Book. Gary Marcus wrote his new book Taming Silicon Valley in part for the reason of addressing regulatory issues.
Regulatory Failure. Self-regulation is a farce, and the US legislature has made almost no progress thus far.
Data Point Validity. Every data point there is imaginary; we aren’t plotting real things here.
Graph Issues. The double Y-axis makes no sense, and presupposes its own conclusion.
GPT-4 Comparisons. GPT-4 is not actually equivalent to a smart high schooler.
AGI Prediction. OpenAI's internal roadmap alleged that AGI would be achieved by 2027.
Proposed Bill SB-1047. State Senator Scott Wiener and others in California have proposed a bill, SB-1047, that would build in some modest restraints around AI.
Serious Damage Definition. 'Hazardous' is defined here as half a billion dollars in damage; should we give the AI industry a free pass no matter how much harm might be done?
Regulation vs. Innovation. The Information's op-ed complains that 'California's effort to regulate AI would stifle innovation', but never really details how.
Demand for Stronger Regulation. We should be making SB-1047 stronger, not weaker.
Concern over Liability. Andrew Ng complains that the bill defines an unreasonable 'hazardous capability' designation that may make builders of large AI models liable if someone uses their models to do something that exceeds the bill's definition of harm.
Self-Regulation Skepticism. Big Tech's overwhelming message is 'Trust Us'. Should we?
Certification Requirements. Anyone training a 'covered AI model' must certify, under penalty of perjury, that their model will not be used to enable a 'hazardous capability' in the future.
Industry Pushback. Both the well-known deep-learning expert Andrew Ng and the industry newspaper The Information came out against 1047 in vigorous terms.
Regulatory Support Lack. Not one of the companies that previously stood up and said they support AI regulation is standing up for this one.
OpenAI's CTO Admission. OpenAI's CTO Mira Murati acknowledged that there is no mind-blowing GPT-5 behind the scenes as of yet.
Kurzweil's Prediction. Ray Kurzweil confirmed he has not revised and not redefined his prediction of AGI, still believing that will happen by 2029.
Future Expectations. Expect more revisionism and downsized expectations throughout 2024 and 2025.
Expectations for LLMs. The ludicrously high expectations from the last 18 ChatGPT-drenched months were never going to be met.
Kurzweil's New Projection. In an interview published in WIRED, Kurzweil let his predictions slip back, for the first time, to 2032.
Public Predictions. Nobody to my knowledge has kept systematic track of the predictions, but I took a quick and somewhat random look at X and had no trouble finding many predictions, going back to 2023, almost always optimistic.
Hallucination Concerns. Gary Marcus is still betting that GPT-5 will continue to hallucinate and make a bunch of wacky errors, whenever it finally drops.
Future Predictions Meme. Now arriving Gate 2024, Gate 2025, ... Gate 2026.
New Meme Observed. By now there’s actually a new meme in town. This one’s got even more views.
Confidence in Predictions. A lot of them got tons of views... What stands out the most, maybe, is the confidence with which a lot of them were presented.
GPT-5 Training Status. Sam Altman officially announced just a few weeks ago that they had only just started training GPT-5.
CTO Statement. Mira Murati promised we’d someday see 'PhD-level' models, the next big advance over today’s models, but not for another 18 months.
Delayed GPT-5 Arrival. Today is June 20 and I still don’t see squat. It would now appear that Business Insider’s sources were confused, or overstating what they knew.
AGI Prediction Clarification. Ray Kurzweil confirmed he has not revised and not redefined his prediction of AGI, still defined as AI that can perform any cognitive task an educated human can, and still believes that will happen by 2029.
Opposing Views on AGI. Gary Marcus stands by his own prediction that we will not see AGI by 2029, per criteria he discussed here.
Debate Potential. Ray Kurzweil and Gary Marcus talked about having a debate, which they hope will come to pass.
Interpretation Misunderstanding. Gary Marcus misunderstood Ray Kurzweil to be revising his prediction for AGI to a later year (perhaps 2032).
Reality Check Needed. We need a President who can sort truth from bullshit, in order to develop AI policies that are grounded in reality.
Corporate Promises. We need a President who can recognize when corporate leaders are promising things far beyond what is currently realistic.
Tech Hype Shift. The big tech companies are hyping AI with long term promises that are impossible to verify.
Presidential Understanding. We cannot afford to have a President in 2024 that doesn't fully grasp this.
Future AI Changes. AI is going to change everything, if not tomorrow, sometime over the next 5-20 years, some ways for good, some for bad.
Current AI Errors. Businesses are finally finding this out, too. (Headline in WSJ: 'AI Work Assistants Need a Lot of Handholding', because they are still riddled with errors.)
AI Limitations. Generative AI does in fact (still) have enormous limitations, just as I anticipated.
Debate Performance. Former President (and convicted felon) Donald Trump lied like an LLM last night, but still won the debate, because Biden's delivery was so weak.
AI Ignored. Neither president even mentioned AI, which was a travesty of a different sort.
Starting Point. Gary Marcus thinks we have maybe one shot to get AI policy right in the US, and that we aren't off to a great start.
Understanding Science. Above all else, we need a President who understands and appreciates science.
Urgent AI Policies. We need a President who can get Congress to recognize the true urgency of the moment, since Executive Orders alone are not enough.
Call for Metacognition. Scaling is not the most interesting dimension; instead, we need techniques, such as metacognition, that can reflect on what is needed and how to achieve it.
Hope for Change. Gary Marcus hopes that people will take what Gates said seriously.
Neurosymbolic AI's Potential. Neurosymbolic AI has long been an underdog; in the end, I expect it to come from behind and be essential.
Importance of Symbols. I don’t think metacognition can work without bringing explicit symbols back into the mix; they seem essential for high-level reflection.
Funding Concerns. Spending upwards of 100 billion dollars on the current approach seems wasteful if it's unlikely to get to AGI or ever be reliable.
Skepticism on AGI. Many tech leaders have discovered that the best way to raise valuations is to hint that AGI is imminent.
Need for Robust Software. Tech giants need serious commitment to software robustness.
Distress Over Regulation. Gary Marcus is deeply distressed that certain tech leaders and investors are putting massive support behind the presidential candidate least likely to regulate software.
AI Regulation Concerns. An unregulated AI industry is a recipe for disaster.
Shortsighted Innovation. Rushing innovative tech without robust foundations seems shortsighted.
Generative AI Limitations. Leaving more and more code writing to generative AI, which grasps syntax but not meaning, is not the answer.
Black Box AI Issues. Chasing black box AI, difficult to interpret, and difficult to debug, is not the answer.
AI Engineering Techniques. As Ernie Davis and I pointed out in Rebooting AI, five years ago, part of the reason we are struggling with complex AI systems is that we still lack adequate techniques for engineering them.
Structural Integrity Lacking. Twenty years ago, Alan Kay said 'Most software today is very much like an Egyptian pyramid with millions of bricks piled on top of each other, with no structural integrity, but just done by brute force and thousands of slaves.'
Software Reliability Needed. The world needs to up its software game massively. We need to invest in improving software reliability and methodology, not rushing out half-baked chatbots.
Getting Started with Prompt Testing. Integrating prompt testing into your development workflow is easy.
Integrating Prompt Testing. By running prompt tests regularly, we can catch issues early and ensure that prompts continue to perform well as you make changes and as the underlying LLMs are updated.
Evaluating LLM Outputs. Promptfoo offers various ways to evaluate the quality and consistency of LLM outputs.
Time Savings. Prompt testing saves time in the long run by catching bugs early and preventing regressions.
Introduction to Prompt Testing. Prompt testing is a technique specifically designed for testing LLMs and generative AI systems, allowing developers to write meaningful tests and catch issues early.
Testing Necessity. New LLM models are released, existing models are updated, and the performance of a model can shift over time.
Importance of Testing. LLMs can generate nonsensical, irrelevant, or even biased responses.
Newsletter Growth. Help me democratize the most important ideas in AI Research and Engineering to over 100K readers weekly.
Expert Contributions. In the series Guests, I will invite these experts to come in and share their insights on various topics that they have studied/worked on.
Conclusion on Testing. Prompt testing provides a way to write meaningful tests for these systems, helping catch issues early and save significant time in the development process.
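To make the idea concrete, here is a minimal sketch of a prompt-test harness in plain Python. `call_llm` is a hypothetical stand-in for whatever model client you use (tools like Promptfoo wrap this pattern in configuration files), and the test cases and assertions are purely illustrative:

```python
# A hypothetical prompt-test harness. `call_llm` is a stub standing in for a
# real LLM client; swap in an actual API call in practice.
def call_llm(prompt: str) -> str:
    return "Paris is the capital of France."

PROMPT_TESTS = [
    {
        "prompt": "What is the capital of France?",
        "checks": [
            lambda out: "Paris" in out,  # must contain the expected answer
            lambda out: len(out) < 500,  # must stay concise
        ],
    },
]

def run_prompt_tests(tests, llm=call_llm):
    """Run every prompt through the model and return a list of failures."""
    failures = []
    for i, case in enumerate(tests):
        output = llm(case["prompt"])
        for j, check in enumerate(case["checks"]):
            if not check(output):
                failures.append((i, j, output))
    return failures
```

Running this in CI after every prompt or model change is what catches the regressions described above before users do.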
Overemphasis on Models. A common mistake that teams make is to overemphasize the importance of models and underestimate how much the addition of simple features can contribute to performance.
MLOps Investment. Investing in MLOps enables the development of 10x teams, which are more powerful in the long run.
ML Engineer Tasks. ML engineers engage in four key tasks: data collection and labeling, feature engineering and model experimentation, model evaluation and deployment, and ML pipeline monitoring and response.
Sustaining Model Performance. Maintaining models post-deployment requires deliberate practices such as frequent retraining on fresh data, having fallback models, and continuous data validation.
Simplicity in Models. Prioritizing simple models and algorithms over complex ones can simplify maintenance and debugging while still achieving desired results.
Product-Centric Metrics. Evaluate models based on metrics aligned with business goals, such as click-through rate or user churn, to ensure they deliver tangible value.
Dynamic Validation. Continuously update validation datasets to reflect real-world data and capture evolving patterns, ensuring accurate performance assessments.
Active Model Evaluation. Keeping models effective requires active and rigorous evaluation processes.
Three Vs of MLOps. Success in MLOps hinges on three crucial factors: Velocity, Validation, and Versioning.
MLOps Importance. Organizations often underestimate the importance of investing in the right MLOps practices.
Frequent Retraining. Regularly retraining models on fresh, labeled data helps mitigate performance degradation caused by data drift and evolving user behavior.
Collaborative Success. Successful project ideas often stem from collaboration with domain experts, data scientists, and analysts.
Anti-Patterns in MLOps. Several anti-patterns hinder MLOps progress, including the mismatch between industry needs and classroom education.
Documenting Knowledge. To avoid this, prioritize documentation, knowledge sharing, and cross-training.
Tribal Knowledge Risks. Undocumented Tribal Knowledge can create bottlenecks and dependencies, hindering collaboration.
Reducing Alert Fatigue. Focus on Actionable Alerts: Prioritize alerts that indicate real problems requiring immediate attention.
Alert Fatigue Awareness. A common pitfall in data quality monitoring is alert fatigue.
Data Leakage Prevention. Thorough Data Cleaning and Validation: Scrutinize your data for inconsistencies, missing values, and potential leakage points.
Risks with Jupyter Notebooks. Notebooks let you trade quality for simplicity and velocity.
Tools and Experience. Engineers like tools that enhance their experience.
Streamline Deployments. Streamlining deployments and tools that predict end-to-end gains could minimize wasted effort.
Long Tail of ML Bugs. Debugging ML pipelines presents unique challenges due to the unpredictable and often bespoke nature of bugs.
Handling Data Errors. These can be addressed by developing/buying tools for real-time data quality monitoring and automatic tuning of alerting criteria.
Data Error Handling. ML engineers face challenges in handling a spectrum of data errors, such as schema violations, missing values, and data drift.
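As a minimal sketch of catching two of those error classes (schema violations and missing values) before data reaches a model — the schema and field names here are illustrative, not from any particular pipeline:

```python
# Illustrative pre-ingestion validation for records arriving as dicts.
EXPECTED_SCHEMA = {"user_id": int, "score": float}  # hypothetical schema

def validate_record(record, schema=EXPECTED_SCHEMA):
    """Return a list of human-readable problems; an empty list means the
    record passes. Covers missing values and type (schema) violations."""
    problems = []
    for field, ftype in schema.items():
        if field not in record or record[field] is None:
            problems.append(f"missing value: {field}")
        elif not isinstance(record[field], ftype):
            problems.append(f"schema violation: {field} should be {ftype.__name__}")
    return problems
```

Drift, the third error class, needs statistical monitoring over time rather than per-record checks.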
Development-Production Mismatch. There are discrepancies between development and production environments, including data leakage; differing philosophies on Jupyter Notebook usage; and non-standardized code quality.
ML Engineering Tasks. The 4 major tasks that an ML Engineer works on.
Machine Learning Breakdown. In my series Breakdowns, I go through complicated literature on Machine Learning to extract the most valuable insights.
AI Made Simple Community. We started an AI Made Simple Subreddit.
Saudi Arabia's Neom Project. The Saudi government had hoped to have 9 million residents living in 'The Line' by 2030, but this has been scaled back to fewer than 300,000.
Fractal Molecule Discovery. Researchers from Germany, Sweden, and the UK have discovered an enzyme produced by a single-celled organism that can arrange itself into a fractal.
Software Design Principles. During the design and implementation process, I found that the following list of 'rules' kept coming back up over and over in various scenarios.
C*-Algebraic ML. Looks like more and more people are looking to integrate Complex numbers into Machine Learning.
Generative AI Insights. Some really good insights on building Gen AI LinkedIn.
LLM Reading Notes. The May edition of my LLM reading notes is out.
Drug Design Transformation. We hope AlphaFold 3 will help transform our understanding of the biological world and drug discovery.
AlphaFold 3 Predictions. In a paper published in Nature, we introduce AlphaFold 3, a revolutionary model that can predict the structure and interactions of all life’s molecules with unprecedented accuracy.
Spotlight on Aziz. Mohamed Aziz Belaweid writes the excellent, 'Aziz et al. Paper Summaries', where he summarizes recent developments in AI.
AI Education Support. Your generosity is crucial to keeping our cult free and independent, and in helping me provide high-quality AI Education to everyone.
AlphaFold 3 Innovation. Google's AlphaFold 3 is gaining a lot of attention for its potential to revolutionize bio-tech. One of the key innovations that led to its performance gains over previous methods was its utilization of diffusion models.
Efficient Time Series Imputation. CSDI, using score-based diffusion models, improves upon existing probabilistic imputation methods by capturing temporal correlations.
Emerging LLM Techniques. Microsoft's GENIE achieves comparable performance with state-of-the-art autoregressive models and generates more diverse text samples.
Language Processing Potential. Text Diffusion might be the next frontier of LLMs, at least for specific types of tasks.
Application in Medical Imaging. Diffusion models have shown great promise in reconstructing Medical Images.
Step-by-Step Control. The step-by-step generation process in diffusion models allows users to exert greater control over the final output, enabling greater transparency.
Versatility of DMs. Diffusion models are applicable to a wide range of data modalities, including images, audio, molecules, etc.
High-Quality Generation. Diffusion models generate data with exceptional quality and realism, surpassing previous generative models in many tasks.
Diffusion Models Explained. Diffusion Models are generative models that follow 2 simple steps: First, we destroy training data by incrementally adding Gaussian noise. Training consists of recovering the data by reversing this noising process.
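The first (noising) step has a convenient closed form: with a variance schedule beta_t, x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, where abar_t is the running product of (1 - beta_s). A toy sketch of that step — the schedule values in the test are made up for illustration:

```python
import math
import random

def forward_noise(x0, t, betas):
    """Sample x_t from q(x_t | x_0) in closed form:
    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, with eps ~ N(0, 1)
    per coordinate and abar_t the cumulative product of (1 - beta_s)."""
    abar = 1.0
    for beta in betas[:t]:
        abar *= (1.0 - beta)
    eps = [random.gauss(0.0, 1.0) for _ in x0]
    return [math.sqrt(abar) * x + math.sqrt(1.0 - abar) * e
            for x, e in zip(x0, eps)]
```

Training then amounts to teaching a network to predict `eps` from `x_t` and `t`, which is what lets generation run the process in reverse.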
Greenwashing Example. Europe’s largest oil and gas company Shell was accused of selling millions of carbon credits tied to CO2 removal that never took place.
Share Interesting Content. The goal is to share interesting content with y’all so that you can get a peek behind the scenes into my research process.
Venture Capital Overview. A great overview by Rubén Domínguez Ibar of how venture capital firms make decisions.
Meta Llama-3 Release. Our first agent is a finetuned Meta-Llama-3-8B-Instruct model, which was recently released by Meta GenAI team.
Deep Learning Method Spotlight. The DSDL framework significantly outperforms other dynamical and deep learning methods.
Fungal Computing Potential. Unlock the secrets of fungal computing! Discover the mind-boggling potential of fungi as living computers.
Gaming and Chatbots. Limited Risk AI Systems like chatbots or content generation require transparency to inform users they are interacting with AI.
High-Risk AI Systems. High-Risk AI Systems are involved in critical sectors like healthcare, education, and employment, where there's a significant impact on people's safety or fundamental rights.
AI Regulation Insight. The regulation is primarily based on how risky your use case is rather than what technology you use.
Upcoming Articles Preview. Curious about what articles I’m working on? Here are the previews for the next planned articles.
Community Spotlight Resource. Kiki's Bytes is a super fun YouTube channel that covers various System Design case studies.
Pay What You Can. We follow a 'pay what you can' model, which allows you to support within your means.
Credit Scoring Adaptation. Factors that predicted high creditworthiness a few years ago might not hold true today due to changing economic conditions or consumer behavior.
Neural Networks Versatility. Thanks to their versatility, Neural Networks are a staple in most modern Machine Learning pipelines.
Evolving Language Models. Language Models trained on social media data need to adapt to constantly evolving language use, slang, and emerging topics.
Simplifying Data Augmentation. Before you decide to get too clever, consider the statement from TrivialAugment: the simplest method was so far overlooked, even though it performs comparably or better.
Gradient Reversal Layer. The gradient reversal layer acts as an identity function during the forward pass but reverses gradients during backpropagation, creating a minimax game between the feature extractor and the domain classifier.
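A stripped-down sketch of that layer — identity on the forward pass, sign-flipped (and lambda-scaled) gradient on the backward pass. This is framework-free illustrative Python, not a real autograd module; in PyTorch you would implement it as a custom `autograd.Function`:

```python
class GradientReversal:
    """Identity in the forward pass; multiplies incoming gradients by
    -lambda in the backward pass, so the feature extractor is pushed to
    *maximize* the domain classifier's loss (the minimax game in DAT)."""

    def __init__(self, lambd: float = 1.0):
        self.lambd = lambd

    def forward(self, x):
        return x  # features pass through unchanged

    def backward(self, grad_output):
        return -self.lambd * grad_output  # flip and scale the gradient
```

Placed between the feature extractor and the domain classifier, this single sign flip is what makes the learned features domain-invariant.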
Impact on Sentiment Analysis. Our experiments on a sentiment analysis classification benchmark... show that our neural network for domain adaptation algorithm has better performance than either a standard neural network or an SVM.
Adversarial Training Process. Domain-Adversarial Training (DAT) involves training a neural network with two competing objectives: to accurately perform the main task and to confuse a domain classifier that tries to distinguish between source and target domain data.
The Role of DANN. DANNs theoretically attain domain invariance by learning domain-invariant features.
Mitigating Distribution Shift. Good data + adversarial augmentation + constant monitoring works wonders.
Sources of Distribution Shift. Possible sources of distribution shift include sample selection bias, non-stationary environments, domain adaptation challenges, data collection and labeling issues, adversarial attacks, and concept drift.
Understanding Distribution Shift. Distribution shift, also known as dataset shift (with covariate shift as a common special case), is a phenomenon in machine learning where the statistical distribution of the input data changes between the training and deployment environments.
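A crude way to make this concrete is to compare summary statistics of a feature between the training set and live traffic. This standardized-mean-difference check, and its 0.25 threshold, is an illustrative heuristic rather than a standard test:

```python
import statistics

def drift_score(train_values, live_values):
    """Standardized difference of means between training and live data.
    Larger values suggest the live distribution has moved away from what
    the model was trained on."""
    mu_train = statistics.mean(train_values)
    mu_live = statistics.mean(live_values)
    sd_train = statistics.pstdev(train_values) or 1.0  # guard constant features
    return abs(mu_live - mu_train) / sd_train

def has_drifted(train_values, live_values, threshold=0.25):
    # The threshold is an assumed, tunable heuristic, not a standard value.
    return drift_score(train_values, live_values) > threshold
```

In practice you would run a proper two-sample test (e.g. Kolmogorov–Smirnov) per feature and track the scores over time rather than alerting on a single snapshot.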
Improving Generalization. There are several ways to improve generalization such as implementing sparsity and/or regularization to reduce overfitting and applying data augmentation to mithridatize your models.
Challenges in Neural Networks. There are several underlying issues with the training process that scale does not fix, chief amongst them being distribution shift and generalization.
Social Media Awareness. Epicurean philosophy is a good reminder to keep vigilant about how we’re being influenced by the constant subliminal messaging and to only pursue the pleasures that we want for ourselves.
Reading Recommendation. The plan is to do one of these a month as a special reading recommendation.
Happiness Through Simplicity. True happiness doesn’t come from endlessly chasing pleasure, but from systematically eliminating the sources of our unhappiness.
Self-Reflection Necessity. A good community directly benefits self-reflection.
Community and Introspection. Epicurus encouraged his followers to form close-knit communities that allow their members to step back and help each other critically analyze the events around them.
Friendship Statistics. People with no friends or poor-quality friendships are twice as likely to die prematurely, according to Holt-Lunstad's meta-analysis of more than 308,000 people.
Friendship Importance. Epicurus has a particularly strong emphasis on the importance of friendship as a must for a happy life.
Epicurean Philosophy. Epicurean philosophy is based on a simple supposition: we are happy when we remove the things that make us unhappy.
Next-Gen Embeddings. Today we will primarily examine 4 publications to see how we can improve embeddings by exploring a dimension that has been left untouched- their angles.
Greater Performance Gains. AnglE consistently outperforms SBERT, achieving an absolute gain of 5.52%.
AnglE Optimization. AnglE optimizes not only the cosine similarity between texts but also the angle to mitigate the negative impact of the saturation zones of the cosine function on the learning process.
Contrastive Learning Impact. Contrastive Learning encourages similar examples to have similar embeddings and dissimilar examples to have distinct embeddings.
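This push-pull can be written as a minimal pair-based loss (a generic sketch of the idea, not the exact objective of any one paper discussed here):

```python
def contrastive_loss(distance, is_similar, margin=1.0):
    # Similar pairs are penalized for being far apart; dissimilar pairs are
    # penalized only while they sit closer than `margin`, so they get pushed
    # at least `margin` apart and then stop contributing.
    if is_similar:
        return distance ** 2
    return max(0.0, margin - distance) ** 2
```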
Modeling Relations. RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space uses complex numbers for knowledge graph embedding.
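The core scoring idea can be sketched with Python's built-in complex numbers (a simplified, one-triple version of the paper's distance-based score; the real model learns the phases):

```python
import cmath

def rotate_score(head, phases, tail):
    # RotatE models a relation as an element-wise rotation in the complex
    # plane: tail ≈ head * e^{i*phase}. Score is negative distance between
    # the rotated head and the tail, so higher (less negative) = better fit.
    rotated = [h * cmath.exp(1j * p) for h, p in zip(head, phases)]
    return -sum(abs(r - t) for r, t in zip(rotated, tail))
```

A 90-degree rotation mapping 1 onto i scores near 0 (perfect fit), while mapping it onto -i scores poorly.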
Complex Geometry Advantage. The complex plane provides a richer space to capture nuanced relationships and handle outliers.
Orthogonality Benefits. Orthogonality helps the model to capture more nuanced relationships and avoid unintended correlations between features.
Angular Representation. Focusing on angles rather than magnitudes avoids the saturation zones of the cosine function, enabling more effective learning and finer semantic distinctions.
Saturation Zones. The saturation zones of the cosine function can kill the gradient and make the network difficult to train.
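You can see the saturation numerically: the gradient of cos(θ) with respect to θ is -sin(θ), which vanishes as embeddings become nearly identical (θ→0) or nearly opposite (θ→π). A small illustrative check:

```python
import math

def cosine_gradient_magnitude(theta):
    # d/dθ cos(θ) = -sin(θ); its magnitude is the learning signal a purely
    # cosine-based similarity objective can pass back at angle θ.
    return abs(-math.sin(theta))

print(cosine_gradient_magnitude(math.pi / 2))  # ~1.0: healthy signal
print(cosine_gradient_magnitude(0.01))         # ~0.01: saturated, near-zero
```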
Challenges in Embeddings. Current Embeddings are held back by three things: Sensitivity to Outliers, Limited Relation Modeling, and Inconsistency.
Enhancing NLP. Good Embeddings allow three important improvements: Efficiency, Generalization, and Improved Performance.
LLMs Hitting Wall. This is what leads to the impression that "LLMs are hitting a wall".
Critical Flaws. Such developments have 3 inter-related critical flaws: They mostly work by increasing the computational costs of training and/or inference, they are a lot more fragile than people realize, and they are incredibly boring.
Research Areas. A lot of current research focuses on LLM architectures, data sources, prompting, and alignment strategies.
Client Payment Process. I am monetizing this newsletter through my employer- SVAM International (US work laws bar me from taking money from anyone who is not my employer).
Change in Payout Schedule. I’ve switched the payout schedule to monthly to ensure that I always have a buffer in my Stripe Account to handle issues like this.
Mental Space for Writing. Writing/Research takes a lot of mental space, and I don’t think I could do a good job if I was constantly firefighting these issues.
Communication Efforts. I have started communicating with the reader, my company, and Stripe/the bank.
Long Review Process. I have been told the review by the bank could take up to 3 months.
Stripe's Negative Balance Policy. Stripe does not let you use future deposits to settle balances, which makes sense from their perspective but leaves me in this weird situation.
Stripe Payouts Paused. Due to all of this, Stripe has paused all my payouts.
Financial Loss. I lose money on every fraud claim. In this case, Stripe has removed 70 USD from my Stripe account: 50 for the base plan + 20 in fees.
Fraudulent Claim Issue. Unfortunately, one of the readers missed this. They signed up for a 50 USD/year plan and marked that transaction as fraudulent, causing complications.
Indefinite Pause. AI Made Simple will be going on an indefinite pause now.
KAN Overview. This article will explore KANs and their viability in the new generation of Deep Learning.
Kolmogorov-Arnold Representation. The KART states that any continuous function of multiple inputs can be built by composing and summing simple functions of a single input (like sine or square).
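Formally, KART decomposes any continuous multivariate function as:

```latex
f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)
```

where every inner \phi_{q,p} and outer \Phi_q is a continuous function of a single variable; KANs make these univariate functions learnable.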
Educational Importance. Even if we find fundamental limitations that make KANs useless, studying them in detail will provide valuable insights.
Grid Extension Technique. The grid extension technique allows KANs to adapt to changes in data distribution by increasing the grid density during training.
Spline Usage. KANs use B-splines to approximate activation functions, providing accuracy, local control, and interpretability.
Interactive KANs. Users can collaborate with KANs through visualization tools and symbolic manipulation functionalities.
Explainability Benefits. KANs are more explainable, which is a big plus for sectors where model transparency is critical.
Accuracy of KANs. KANs can achieve lower RMSE loss with fewer parameters compared to MLPs for various tasks.
Performance and Training. KAN training is 10x slower than NN training, which may limit their adoption in more mainstream directions that are dominated by scale.
Sparse Compositional Structures. A function has a sparse compositional structure when it can be built from a small number of simple functions, each of which only depends on a few input variables.
KAN Advantages. KANs use learnable activation functions on edges, which makes them more accurate and interpretable, especially useful for functions with sparse compositional structures.
Need for Public Dialogue. Encouraging open dialogue and debate fosters critical thinking, raising awareness about oppression and empowering individuals to resist manipulation.
Challenge Comfort with Beliefs. Having good-faith conversations and the willingness to challenge deeply held beliefs is essential to fight dogma and ensure a society of free individuals.
AI Structural Concerns. The push for AI alignment by corporations may suppress inconvenient narratives, illustrating a paternalistic approach to technology.
Technology and Risk. The lack of risk judgment and decision-making training is prevalent across roles and professions that most need it, revealing gaps in corporate risk management.
Current Gen Z Struggles. 67% of people 18 to 34 feel 'consumed' by their worries about money and stress, making it hard to focus, as part of the Gen Z mental health crisis.
Societal Symptoms. Being 'busy with work' has become a default way for people to spend their time, symptomatic of what Arendt called the 'victory of the animal laborans.'
Banality of Evil. Arendt argued that Adolf Eichmann's participation in the Holocaust was driven by thoughtlessness and blind obedience to authority, reflecting the concept of 'Banality of Evil.'
Totalitarianism Origins. Arendt argued that totalitarianism was a new form of government arising from the breakdown of traditional society and an increasingly ungrounded populace.
The Active Life Components. Hannah Arendt broke life down into 3 kinds of activities: Labor, Work, and Action, emphasizing that modern society deprioritizes the latter two.
Hannah Arendt Insights. Hannah Arendt was a 20th-century political theorist, well known for her thoughts on the nature of evil, the rise of totalitarianism, and her strong emphasis on the importance of living the 'active life.'
Red-teaming Purpose. Red-teaming/Jailbreaking is a process in which AI people try to make LLMs talk dirty to them.
ACG Effectiveness. In the time that it takes ACG to produce successful adversarial attacks for 64% of the AdvBench set, GCG is unable to produce even one successful attack.
ACG Methodology. The Accelerated Coordinate Gradient (ACG) attack method combines algorithmic insights and engineering optimizations on top of GCG to yield a ~38x speedup.
Haize Labs Automation. Haize Labs seeks to rigorously test an LLM or agent with the purpose of preemptively discovering all of its failure modes.
Shift in Gender Output. The base model generates approximately 80% male and 20% female customers while the aligned model generates nearly 100% female customers.
Bias Distribution Changes. The alignment process would likely create new, unexpected biases that were significantly different from your baseline model.
Lower Output Diversity. Aligned models exhibit lower entropy in token predictions, form distinct clusters in the embedding space, and gravitate towards 'attractor states', indicating limited output diversity.
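Entropy here is the standard Shannon entropy over the model's next-token probabilities (a quick sketch of the measurement, assuming you can read out the per-token distribution):

```python
import math

def token_entropy(probs):
    # Shannon entropy (in nats) of a next-token distribution. Lower values
    # mean probability mass is concentrated on fewer tokens, i.e. the model
    # has fewer "live" options and less output diversity.
    return -sum(p * math.log(p) for p in probs if p > 0)
```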
LLM Understanding. People often underestimate how little we understand about LLMs and the alignment process.
Adversarial Attack Generalization. The attack didn’t apply to any other model (including the base GPT).
High Cost of Red-teaming. Good red-teaming can be very expensive since it requires a combination of domain expert knowledge and AI person knowledge for crafting and testing prompts.
Low Safety Checks. Many of them are too dumb: the bar set by the prompts and checks for what is considered a 'safe' model is too low to be meaningful.
Subscriber Growth. Help me democratize the most important ideas in AI Research and Engineering to over 100K readers weekly.
TechBio Resources. We have a strong bio-tech focus this week because of all my reading into that space.
Legal AI Evaluation. We argue that this claim is not supported by the current evidence, diving into AI’s roles in various legal tasks.
Python Precision Issues. Python compares the integer value against the double precision representation of the float, which may involve a loss of precision, causing these discrepancies.
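The effect is easiest to see with large integers, where converting to a 64-bit double silently drops precision (a minimal illustration):

```python
big = 2**53 + 1                    # smallest int with no exact double representation
as_float = float(big)              # rounds to 2**53 = 9007199254740992.0
print(as_float == float(2**53))    # True: the trailing +1 was lost in conversion
print(big == 2**53)                # False: exact integer arithmetic is unaffected
```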
Model Performance Challenge. We demonstrate here a dramatic breakdown of function and reasoning capabilities of state-of-the-art models trained at the largest available scales.
Deep Learning Insight. This paper presents a framework, HypOp, that advances the state of the art for solving combinatorial optimization problems in several aspects.
AI-Relations Trend. The ratio of people who reach out to me for AIRel vs ML roles has gone up significantly over the last 2–3 months.
Community Engagement. If you/your team have solved a problem that you’d like to share with the rest of the world, shoot me a message and let’s go over the details.
Reading Inspired. I figured I’d start sharing whatever AI Papers/Publications, interesting books, videos, etc I came across each week.
Content Focus. While the focus will be on AI and Tech, the ideas might range from business, philosophy, ethics, and much more.
GPU Efficiency. We also provide a GPU-efficient implementation of this model which reduces memory usage by up to 61% over an unoptimized baseline during training.
Training Efficiency Improvements. To counteract smaller gradients due to ternary weights, larger learning rates than those typically used for full-precision models should be employed.
Learning Rate Strategy. For the MatMul-free LM, the learning dynamics necessitate a different learning strategy, maintaining the cosine learning rate scheduler and then reducing the learning rate by half.
Matrix Multiplication Bottleneck. Matrix multiplications (MatMul) are a significant computational bottleneck in Deep Learning, and removing them enables the creation of cheaper, less energy-intensive LLMs.
Memory Transfer Optimization. The Fused BitLinear Layer eliminates the need for multiple data transfers between memory levels, significantly reducing overhead.
Fused BitLinear Layer. The Fused BitLinear Layer combines operations and reduces memory accesses, significantly boosting training efficiency and lowering memory consumption.
Linear Layer Efficiency. Replacing non-linear operations with linear ones can boost your parallelism and simplify your overall operations.
Simplified Operations. The secret to their great performance rests on a few innovations that follow two major themes- simplifying expensive computations and replacing non-linearities with linear operations.
Cost Reduction Strategies. The core idea includes restricting weights to the values {-1, 0, +1} to replace multiplications with simple additions or subtractions.
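A toy ternarization sketch (the threshold value here is an arbitrary assumption for illustration; real BitNet-style schemes also rescale by the mean absolute weight):

```python
def ternarize(weights, threshold=0.05):
    # Snap each weight to -1, 0, or +1; small magnitudes are zeroed out.
    return [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]

def ternary_dot(tern_weights, x):
    # With weights in {-1, 0, +1}, a dot product needs no multiplications:
    # add the inputs where the weight is +1, subtract where it is -1.
    pos = sum(xi for t, xi in zip(tern_weights, x) if t == 1)
    neg = sum(xi for t, xi in zip(tern_weights, x) if t == -1)
    return pos - neg
```

This is the sense in which MatMul gets "removed": the expensive multiply-accumulate collapses into pure addition/subtraction.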
Performance Comparison. MatMul-Free LLMs (MMF-LLMs) achieve performance on-par with state-of-the-art Transformers that require far more memory during inference at a scale up to at least 2.7B parameters.
Generational Perspective. I am a Gen Z kid who grew up with technology.
Emotional Intelligence. Develop VCSAs to incorporate emotional intelligence to enhance user engagement and satisfaction.
Control Mechanisms. Ensure that VCSAs include features that give users a sense of control and the ability to communicate successfully with their devices.
Design for Imperfection. Design VCSAs to exhibit some level of imperfection to create relaxed interactions.
Managerial Implications. Encourage Partner-like interactions: use speech acts and algorithms to promote the perception of VCSAs as partners.
Partner Relationship. The perception of the relationship with the VCSA as a real partner attributes a distinct personality to the VCSA, making it an appealing entity.
Master Relationship. Some perceived the VCSA as a master, feeling like servants bound by its rules and unpredictable nature.
Servant Relationship. Young consumers frequently envisioned their VCSA as a servant that helps consumers realize their tasks.
Types of Relationships. From the results of the study three different relationships emerge: servant-master dynamic, dominant entity, and equal partners.
Controls and Preferences. Consumers may relate to anthropomorphized products either as others or as extensions of their self.
Self-extension Theory. If you think about the influence that particularly valuable products have on you, you increasingly consider them extensions of yourself.
Uncanny Valley. The Uncanny Valley represents clearly how different degrees of anthropomorphism can change our feelings and attitudes toward technologies and AI assistants.
Anthropomorphism Effects. Evidence shows that anthropomorphized products can enhance consumer preference, make products appear more vivid, and increase their perceived value.
Anthropomorphism Concept. Today's scholars focus on the broad concept of anthropomorphism: essentially, it is humans' tendency to perceive humanlike agents in nonhuman entities and events.
VCSAs Definition. Alexa, Google Home, and similar devices fall into the category of so-called 'voice-controlled smart assistants' (VCSAs).
Marriage Proposals. A good portion of those even said they would marry her.
Alexa Love. Amazon reported that half a million people told Alexa they loved her.
Human-like Interactions. When we interact with devices like Alexa or Google Home, we have different ways of thinking about ourselves and we relate to them differently from other people.
Skepticism on Technology. While I can’t imagine my life without tech, most of the activities that I enjoy are physical and would be very hard to simulate adequately.
AI-Human Relationship. The AI-human relationship dynamic is not something that I know much about.
AI Expertise Invitation. In the series Guests, I will invite these experts to come in and share their insights on various topics that they have studied/worked on.
Choco Milk Cult. Our chocolate milk cult has a lot of experts and prominent figures doing cool things.
Generative AI Commercialization Struggles. Close to 2 years since the release of ChatGPT, organizations have struggled to capitalize on the promise of Generative AI.
Data Contextuality in Healthcare Algorithms. A bombshell study found that a clinical algorithm many hospitals were using to decide which patients need care was showing racial bias.
AGI and Reduction of Information. The implication of this for generalized intelligence is clear. Reducing the amount of information to focus on what is important to a clearly defined problem is antithetical to generalization.
Contextual Nature of Data. Good or bad data is defined heavily by the context.
Statistical Proxy Limitations. Within any dataset is an implicit value judgment of what we consider worth measuring.
Good Data Removes Noise. Good Data Doesn’t Add Signal; it Removes Noise.
Skepticism About Generalized Intelligence. Ultimately, my skepticism around the viability of 'generalized intelligence' emerging from aggregating data comes from my belief that there is a lot about the world and its processes that we can’t model within data.
Issues with Self-Driving Cars. Self-driving cars do find merges challenging.
AI Flattens Data Analysis. AI Flattens: By its very nature, AI works by abstracting the commonalities.
Data-Driven vs Mathematical Insights. My thesis can be broken into two parts. Firstly, I argue that Data-Driven Insights are a subclass of mathematical insights.
Yann LeCun's AGI Claim. Yann LeCun has made headlines with his claims that 'LLMs are an off-ramp to AGI.'
AI's PR Campaign. This has led to a massive PR campaign to rehab AI's image and prepare for the next round of fundraising.
AI's Financial Cost for Microsoft. This is costing Microsoft more than $650 million.
Inflection AI's Revenue Failure. Inflection AI’s revenue was, in the words of one investor, “de minimis.” Essentially zilch.
Impacts of FoodTech. The impact of food-related sciences is immense, proving that food is not just a basic necessity but a pivotal element in saving lives.
AI Market Hype. AI has many useful use cases, but it’s important to not allow yourself to get manipulated by people trying to piggyback off successful projects to sell their hype.
Knowledge Distillation. Knowledge distillation is a model training method that trains a smaller model to mimic the outputs of a larger model.
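The standard formulation (after Hinton et al.) matches temperature-softened output distributions between teacher and student; a dependency-free sketch:

```python
import math

def softmax(logits, T=1.0):
    # Convert logits to probabilities; higher temperature T flattens them.
    exps = [math.exp(l / T) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) on temperature-T softened distributions; the
    # T^2 factor keeps gradient magnitudes comparable across temperatures.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return T * T * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

In practice this soft-label term is usually blended with the ordinary hard-label cross-entropy.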
Security Challenges. Demand for high-performance chips designed specifically for AI applications is spiking.
AI Tokenization Method. The tokenizer for Claude 3 and beyond handles numbers quite differently from its competitors.
Reading Interest. If you want to keep your finger on the pulse of the tech-bio space, she’s an elite resource.
Technical Insight Source. Hai doesn’t shy away from talking about the Math/Technical Details, which is a rarity on LinkedIn.
Spotlight on Expertise. Hai Huang is a Senior Staff Engineer at Google, working on their AI for productivity projects.
Community Engagement. We started an AI Made Simple Subreddit.
Curated Insights. In issues of Updates, I will share interesting content I came across.
Key ACI Properties. ACIs should prioritize actions that are straightforward and easy to understand to minimize the need for extensive demonstrations or fine-tuning.
Improving Error Recovery. Implementing guardrails, such as a code syntax checker that automatically detects mistakes, can help prevent error propagation and assist agents in identifying and correcting issues promptly.
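A minimal version of such a guardrail for Python edits, using only the standard library (a sketch of the idea, not SWE-agent's actual implementation):

```python
import ast

def check_python_syntax(source):
    # Guardrail sketch: parse the agent's proposed file contents before
    # applying the edit. Returns None if the code parses cleanly, else a
    # message the agent can use to locate and fix its mistake.
    try:
        ast.parse(source)
        return None
    except SyntaxError as e:
        return f"SyntaxError on line {e.lineno}: {e.msg}"
```

Rejecting a malformed edit with the exact error line keeps one bad command from cascading into a broken file the agent can no longer reason about.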
SWE-Bench Overview. SWE-bench is a comprehensive evaluation framework comprising 2,294 software engineering problems sourced from real GitHub issues and their corresponding pull requests across 12 popular Python repositories.
SWE-Agent Performance. When using GPT-4 Turbo as the base LLM, SWE-agent successfully solves 12.5% of the 2,294 SWE-bench test issues, significantly outperforming the previous best resolve rate of 3.8%.
Effective ACI Design. By designing effective ACIs, we can harness the power of language models to create intelligent agents that can interact with digital environments in a more intuitive and efficient manner.
Agility in Code Editing. The experiments reveal that agents are sensitive to the amount of content displayed in the file viewer, and striking the right balance is essential for performance.
SWE-Agent Functionalities. SWE-Agent offers commands that enable models to create and edit files, streamlining the editing process into a single command that facilitates easy multi-line edits with consistent feedback.
Optimizing Agent Interfaces. Human user interfaces may not always be the most suitable for agent-computer interactions, calling for improved localization through faster navigation and more informative search interfaces tailored to the needs of language models.
Deepfake Market Growth. Deepfake-related losses are expected to soar from $12.3 billion in 2023 to $40 billion by 2027, growing at an astounding 32% compound annual growth rate.
Adversarial AI Rise. Deepfakes typify the cutting edge of adversarial AI attacks, with a 3,000% increase last year alone; incidents are projected to rise by 50% to 60% in 2024.
Enterprise Security Concerns. 60% of CISOs, CIOs, and IT leaders are afraid their enterprises are not prepared to defend against AI-powered threats and attacks.
Detection Strategy Development. Our goal is to classify an input image into one of three categories- real, deepfake, and AI-generated- which helps organizations catch Deepfakes amidst enterprise fraud.
Affordable Detection Solutions. Many cutting-edge Deepfake Detection setups are too costly to run at scale, severely limiting their utility in high-scale environments like Social Media.
Model Performance. Our top models achieved strong results: 0.93 (SVC), 0.82 (RandomForest), and 0.80 (XGBoost).
Deepfake Detection Collaboration. If your organization deals with Deepfakes, reach out to customize the baseline solution to meet your specific needs.
Social Media Influence. AI models are starting to gain a lot of popularity online, with some influencers earning significant incomes.
Early Project Insights. We were good at the main task but had terrible generalization and robustness.
AI Functionality Potential. We believe this process creates artifacts or fingerprints that ML models can detect.