Thursday, May 15, 2025

All in one AI

Controllable reasoning: a distinguishing feature of the Llama-Nemotron models is their ability to toggle between standard chat mode and reasoning mode. This "reasoning toggle" allows users to dynamically control the level of reasoning performed during inference.

The blog option of alphaXiv is great; it makes research papers more accessible.

What is Neural Architecture Search (NAS)?

As opposed to manual design, NAS employs algorithms to explore a vast search space of possible architectures and identify those that perform best on a given task.

  • Puzzle Framework: The NAS framework used is called Puzzle (Bercovich et al., 2024), which transforms large language models into hardware-efficient variants under deployment constraints. (Page 3)
  • Block-wise Local Distillation: Puzzle applies block-wise local distillation to build a library of alternative transformer blocks, each trained independently to improve computational properties. (Page 3)
  • Mixed-Integer Programming (MIP): Puzzle uses a MIP solver to select the most efficient block configuration under given constraints like hardware compatibility, latency, memory budget, or desired throughput. (Page 4)
  • Accuracy-Efficiency Tradeoff: Puzzle supports multiple block variants per layer with different accuracy-efficiency tradeoff profiles, enabling users to target specific points on the Pareto frontier. (Page 4)
  • LN-Ultra Optimization: During Puzzle's architecture search phase, LN-Ultra is constrained to achieve at least a 1.5x latency reduction over Llama 3.1-405B-Instruct. (Page 5)

In essence, NAS, through the Puzzle framework, allows the researchers to automatically find efficient model architectures within the Llama 3 structure, optimizing for metrics like latency, memory usage, and throughput while maintaining a desired level of accuracy. This is a key step in creating the Llama-Nemotron models, particularly LN-Super and LN-Ultra.                                                                                              
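
To make the MIP step concrete, here is a toy sketch of the idea (not the actual Puzzle implementation; the layer/variant library and all numbers are made up): pick one block variant per layer to maximize a quality score under a latency budget, using the open-source PuLP solver.

# Toy block-selection MIP in the spirit of Puzzle (illustrative only).
# Requires: pip install pulp
import pulp

# Hypothetical per-layer variant library: (quality_score, latency_ms) per variant,
# e.g. full block, pruned-FFN block, attention-free block.
variants = {
    0: [(1.00, 3.0), (0.97, 2.0), (0.90, 1.2)],
    1: [(1.00, 3.0), (0.98, 2.1), (0.88, 1.0)],
    2: [(1.00, 3.0), (0.95, 1.8), (0.85, 0.9)],
}
latency_budget_ms = 6.0   # deployment constraint

prob = pulp.LpProblem("block_selection", pulp.LpMaximize)
# Binary decision: x[(l, v)] = 1 if layer l uses variant v
x = {(l, v): pulp.LpVariable(f"x_{l}_{v}", cat="Binary")
     for l, opts in variants.items() for v in range(len(opts))}
# Objective: total quality across layers
prob += pulp.lpSum(variants[l][v][0] * x[l, v] for (l, v) in x)
# Exactly one variant per layer
for l, opts in variants.items():
    prob += pulp.lpSum(x[l, v] for v in range(len(opts))) == 1
# Stay under the latency budget
prob += pulp.lpSum(variants[l][v][1] * x[l, v] for (l, v) in x) <= latency_budget_ms

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print({l: v for (l, v) in x if x[l, v].value() == 1})   # chosen variant per layer

The real framework works with distilled transformer blocks and hardware-measured costs, but the selection step has this same flavor: discrete choices per layer, a budget constraint, and an accuracy-style objective.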

 

Collective bias when individual LLM agents interact

alphaxiv

Change arxiv.org to alphaxiv.org in a paper's URL and chat away with AI:

arxiv.org/pdf/2505.09343

to

https://www.alphaxiv.org/pdf/2505.09343
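
The swap is just a host substitution; for example, in Python:

# Swap the arXiv host for alphaXiv to get the chat-enabled view of the same paper.
arxiv_url = "https://arxiv.org/pdf/2505.09343"
alphaxiv_url = arxiv_url.replace("arxiv.org", "www.alphaxiv.org")
print(alphaxiv_url)  # https://www.alphaxiv.org/pdf/2505.09343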


Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures

Trying to understand the picture:

What is cross-entropy loss?

"Cross-entropy loss is a loss function used in machine learning and optimization, particularly in classification problems. It quantifies the difference between two probability distributions: the predicted distribution from your model and the true distribution of the labels."

What is the difference between a shared expert vs routed expert?

In the context of Mixture of Experts (MoE) models, "shared experts" and "routed experts" refer to different ways of organizing and utilizing the expert sub-networks within the larger model. 

  • Routed experts allow the model to specialize, with different experts handling different types of inputs.
  • Shared experts process all tokens, potentially capturing general features or providing a baseline level of processing.
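
A toy sketch of the distinction (illustrative shapes and sizes, not any particular model's MoE layer): the shared expert sees every token, while the router picks a top-k subset of routed experts per token.

# Toy MoE layer: one shared expert processes every token; a router assigns each
# token to its top-k routed experts. Dense compute is used here for clarity;
# a real implementation would dispatch only the selected tokens to each expert.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, n_routed=4, top_k=2):
        super().__init__()
        self.shared_expert = nn.Linear(d_model, d_model)   # sees all tokens
        self.routed_experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_routed))
        self.router = nn.Linear(d_model, n_routed)         # per-token expert scores
        self.top_k = top_k

    def forward(self, x):                                  # x: (tokens, d_model)
        shared_out = self.shared_expert(x)                 # shared path: every token
        gate = F.softmax(self.router(x), dim=-1)           # (tokens, n_routed)
        topk_w, topk_idx = gate.topk(self.top_k, dim=-1)   # top-k routed experts per token
        routed_out = torch.zeros_like(shared_out)
        for e, expert in enumerate(self.routed_experts):
            # weight for expert e: its gate value where it was selected, else 0
            w = torch.where(topk_idx == e, topk_w, torch.zeros_like(topk_w)).sum(dim=-1)
            routed_out = routed_out + w[:, None] * expert(x)
        return shared_out + routed_out

print(ToyMoELayer()(torch.randn(8, 64)).shape)             # torch.Size([8, 64])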

What is GQA?

GQA stands for Grouped-Query Attention. It's an attention mechanism used in transformer models, particularly designed to improve inference efficiency.
  • The Problem GQA Solves: In multi-head attention, each attention head has its own separate Q, K, and V vectors. During inference, the key and value vectors from previous tokens are stored in a cache (the KV cache) so they don't have to be recomputed at every decoding step. This KV cache can consume a significant amount of memory, especially for long sequences.

  • How GQA Works: GQA reduces memory consumption by having multiple attention heads share a single set of Key and Value (KV) pairs. Instead of maintaining separate KV pairs for each attention head, multiple heads share one. This significantly compresses KV storage.
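
A back-of-the-envelope illustration of the saving (model dimensions are hypothetical, roughly 7B-scale): the KV cache shrinks in proportion to the number of KV heads.

# KV-cache size: multi-head attention (one K/V pair per head) vs
# grouped-query attention (one K/V pair per group of heads). Sizes are illustrative.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    # 2x for keys and values, cached at every layer for every position (fp16 = 2 bytes)
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

n_layers, n_heads, head_dim, seq_len = 32, 32, 128, 8192      # hypothetical model
mha = kv_cache_bytes(n_layers, n_kv_heads=n_heads, head_dim=head_dim, seq_len=seq_len)
gqa = kv_cache_bytes(n_layers, n_kv_heads=8, head_dim=head_dim, seq_len=seq_len)  # 4 query heads per KV head
print(f"MHA KV cache: {mha / 2**30:.1f} GiB")   # 4.0 GiB
print(f"GQA KV cache: {gqa / 2**30:.1f} GiB")   # 1.0 GiB, a 4x reduction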


Wednesday, May 14, 2025

Can AI control the purse strings?

Economic agency at scale (along with tiered insurance for AI models based on risk, and AI "fat fingers" arising from the hysteresis of wrong memories): being able to sell or buy correctly, without being hoodwinked, and vice versa. For exploratory things not done before, i.e., dealing with a new agent, the robot wallet could send the money to a central repository so that the purse is not opened by mistake. Or: grant access to the purse only after verification with a human in the loop.
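
A speculative sketch of that policy (all names are hypothetical): payments to new counterparties go to escrow, and even known counterparties are paid only after human sign-off.

# Sketch of a verification-gated agent wallet (illustrative policy, not a real API).
KNOWN_COUNTERPARTIES = {"acme-parts", "cloud-compute-co"}   # previously verified agents

def route_payment(counterparty: str, amount: float, human_approved: bool) -> str:
    if counterparty not in KNOWN_COUNTERPARTIES:
        # Exploratory / first-time interaction: don't open the purse, park funds centrally.
        return f"escrow: hold {amount} pending review of new counterparty '{counterparty}'"
    if not human_approved:
        # Known counterparty, but the purse still opens only after human verification.
        return "blocked: awaiting human-in-the-loop verification"
    return f"paid: {amount} to {counterparty}"

print(route_payment("unknown-bot", 50.0, human_approved=True))
print(route_payment("acme-parts", 50.0, human_approved=False))
print(route_payment("acme-parts", 50.0, human_approved=True))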

shadow AI

from prompting to AGI


Tuesday, May 13, 2025

All things AI

 AI in universities

financial transactions go awry with implanted memory

Should the agent already earmark money for certain types of transactions, i.e., budgeting, as a way to avoid bad actors? Also: be skeptical, review previous transactions, and have a special process and a database check for who can be a one-time, first-time transactor.

Limitations of AI - The Binding problem,  The stability-plasticity dilemma ...

Data center electricity demand through 2035

Zerve launches multi-agent system support for the full enterprise AI development lifecycle

"AI agents provide a plan for a problem stated in natural language.

distributed compute engine that enables massively parallel code execution using serverless technology. 

making multiple concurrent calls to large language models, offering speed and efficiency without driving up costs."

Humanoid robots: 5 trillion in revenue by 2050

insuring AI models

autonomous supply chains of the future won’t just respond — they will think, learn and act.

"linear programs and mixed integer programs to model constraints on your supply capacity primarily."

virtue - habit of caring

What if a general AI agent could collaborate with a niche expert AI, and maybe a doubting agent that keeps questioning the approaches taken by the AI team?




Sunday, May 11, 2025

AI timeline

 ai-timeline

Know the limitations of the tool: general reasoning ability is back; expertise is niched.


“The world is so much richer than even their training data,” Lewis said.

Is hallucination a case of "wait wait don't tell me" tongue slips? (Toddlers generalize.)

Measure model belief: as one researcher said, humans have an idea of how wrong/right they are.

Rock-paper-scissors with AI and simultaneity: compare the behavioral aspects of neural networks with the book Behave by Sapolsky.

"people in various professions have a much better understanding of the limits of AI in their domains than AI researchers do."
"it’s very hard to articulate all your knowledge. Much of it is tacit or unconscious—things you don’t even realize you know. And that knowledge is often crucial to doing the job well."

midstream evaluation

Monday, May 05, 2025