Data pre-processing by CPU+LPDDR
Learning/inference with GPU+HBM
Implementing AI activation functions, i.e., nonlinear functions, for inference illustrates why this comes at a performance cost.
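As a minimal sketch of that cost (not tied to any particular accelerator; both function names here are illustrative), the exact GELU activation requires the expensive erf transcendental, while the tanh-based approximation trades a small accuracy loss for cheaper arithmetic on hardware without fast special-function units:

```python
import math

def gelu_exact(x: float) -> float:
    # Exact GELU uses erf, which is costly on edge hardware
    # lacking a fast transcendental-function unit.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh_approx(x: float) -> float:
    # Widely used tanh-based approximation (e.g., in GPT-2):
    # only multiplies, adds, and one tanh evaluation.
    return 0.5 * x * (1.0 + math.tanh(
        math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"x={x:+.1f}  exact={gelu_exact(x):+.6f}  "
              f"approx={gelu_tanh_approx(x):+.6f}")
```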
"Running LLM inference in edge hardware is crucial because it reduces latency and eliminates security concerns associated with cloud-based implementations. However, deploying LLMs in resource-constrained systems poses challenges due to their large model sizes and significant computational requirements. Consequently, edge designs require specialized hardware that can effectively address their unique resource constraints, including power, performance, area (PPA), latency, and memory requirements. Moreover, innovative software optimizations are essential, including model compression, hardware optimization, attention optimization, and the creation of dedicated frameworks to manage computational and energy constraints at the edge."