Saturday, October 30, 2021

Data center

 Things driving Data center

lower latency, faster access, less delay, lower power

“Typically, CPUs are optimized for capacity, while accelerators and GPUs are optimized for bandwidth. However, with the exponentially growing model sizes, we see constant demand for both capacity and bandwidth without tradeoffs. We are seeing more memory tiering, which includes support for software-visible HBM plus DDR, and software transparent caching that uses HBM as a DDR-backed cache. Beyond CPUs and GPUs, HBM is also popular for data center FPGAs.”

"GDDR is a very power-hungry interface, but HBM is a super power-efficient interface. "

Interposers

Interconnects

Shareable and partitionable

"Arm has developed memory system resource partitioning and monitoring ( MPAM ) framework to tie resource controls to the software that accesses the memory system."

DVFS and AVFS

“If you miss that (use case), then you’re planning for some other chip,” Mijatovic said.

Pick a set of the most common use cases to optimize the data lines and the place-and-route.

"every power domain costs wiring and isolation cells."


DRAM Low power

 Deep power down and clock stop

Logic Design Engineer

The QAT (QuickAssist Technology) hardware design team enables data center technology through a set of scalable hardware accelerators for lossless compression, network security (secure key establishment, IPsec, SSL/TLS, and firewalls), and data center virtualization.

QAT team, the CPM (Content Processing Module) front-end design team, where you will work on RTL/DFX development and integration activities within the Custom Logic

Responsibilities will include, but are not limited to:

  • Perform logic design, Register Transfer Level (RTL) coding, and simulation to generate cell libraries, functional units, and subsystems for inclusion in full chip designs.

  • Participate in the development of Architecture and Microarchitecture specifications for the Logic components.

  • Provide IP integration support to SoC customers and represent the RTL team.

  • Implement RTL in SystemVerilog, validate and synthesize the design, and close timing.

  • Work from high-level architecture through to the details of timing.

  • Work with specifications at multiple levels, including the HAS and MAS (microarchitecture spec).

  • Balance design trade-offs with modularity, scalability, DFX requirements, power, area, and performance.


Friday, October 29, 2021

Systolic arrays and beyond

 


Each PE can store multiple weights.
Weights can be selected on the fly.
Pipelined parallel programs
Pipelined file compression
DAE - Decoupled Access and Execute - Modern example Pentium 4
Queue reduces the need for registers
Professor's simplified version of Pentium 4.
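
The two systolic-array notes above (each PE stores multiple weights, and the weight is selected on the fly) can be illustrated with a small cycle-by-cycle simulation. This is only a sketch under my own assumptions, not the lecture's design: a 1-D chain of PEs, each holding a small bank of weights, where partial sums move one PE per cycle and a per-vector select picks which stored weight is used. The names `run_chain`, `weight_banks`, and `selects` are hypothetical.

```python
from typing import List


def run_chain(weight_banks: List[List[float]],
              vectors: List[List[float]],
              selects: List[int]) -> List[float]:
    """Simulate a 1-D weight-stationary systolic chain.

    weight_banks[i] -- the weights stored in PE i (several per PE)
    vectors[k]      -- input vector k, one element per PE
    selects[k]      -- which stored weight every PE uses for vector k
    Returns one dot product per input vector.
    """
    n = len(weight_banks)
    total = len(vectors)
    psum = [0.0] * n          # pipeline register at the output of each PE
    outputs = []

    # The last vector leaves the chain n - 1 cycles after it enters.
    for t in range(total + n - 1):
        new_psum = [0.0] * n
        for i in range(n):
            k = t - i         # skewed input: PE i sees vector k at cycle k + i
            if 0 <= k < total:
                upstream = psum[i - 1] if i > 0 else 0.0
                weight = weight_banks[i][selects[k]]   # selected on the fly
                new_psum[i] = upstream + weight * vectors[k][i]
        psum = new_psum

        k_out = t - (n - 1)   # vector whose result is at the last PE now
        if 0 <= k_out < total:
            outputs.append(psum[n - 1])
    return outputs


if __name__ == "__main__":
    banks = [[1, 10], [2, 20], [3, 30]]      # 3 PEs, 2 stored weights each
    xs = [[1, 1, 1], [1, 2, 3]]
    print(run_chain(banks, xs, selects=[0, 1]))
```

Because the select travels with the data, back-to-back vectors can use different weight sets without reloading the PEs, which is the point of keeping several weights per PE.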

Friday, October 22, 2021

28nm STA

 28nm STA using PBA and GBA

Understanding 28-nm SoC Design With ARM-Based Cores

   Flexible Abstract model to decrease the size of netlist.

Implementation challenges for large 28nm SoCs

Global clock channel could limit reroute of design

Memory-to-flop paths have high logic levels and a memory delay of ~400ps. 

Create placement regions near the memories to ensure less buffering in the path, and achieve timing closure.

Clock gating issues also cause high congestion in the localized area.

Other measures: different placement rules for different types of standard cells/macros, local power drop targets on top of the global targets, and a special clock tree design to take care of the skew and to minimize the number of buffers on the clock tree.

clock tree synthesis for low power

"Optimal grouping clock tree sinks for clock gating during both the RTL design and synthesis stages offer significant advantages for power saving. The grouping and clock gating that can be coded manually by the RTL designer who knows the architecture and typical application scenarios for the device is the most critical part that contributes most to the power savings. "

At the block level, you can also switch all of the gated clocks on or off at once with a single select.


Tuesday, September 07, 2021

Babble Hypothesis

 

Do you think you could do every new field that you study or like?

 The Burden of Capability

Capability trap

Saturday, September 04, 2021

Low node technology and power

 What is the CTL output for DFT?

Integrated Power Management, Leakage Control and Process Compensation Technology for Advanced Processes aka what are FF, SS corners.

Why are SF and FS skewed corners? (slide 10)

Multi corner multi mode analysis

Process Corner explosion

Power challenges at 10nm and below [“At that time the dynamic power could mainly be taken care of with I/Os, memories, and clocks. If you did something good in those areas, it was fine and you did what you could in order to get it under control. That’s changing, and the logic component (the data path logic and control logic) is becoming a pretty significant portion of the total dynamic power within the chip.”]

Design rule violation fixing in timing closure

Testing analog circuits becoming more difficult

Performance Robustness Analysis of VLSI Circuits with Process Variations Based on Kharitonov Theorem

16 plant theorem for windmills

Sunday, July 18, 2021

USB

 

Saturday, July 17, 2021

Unique Chips and Systems (Computer Engineering Series)

 



Unique Chips and Systems

The "High speed Low-power On-chip Global Signaling" paper refers to bundled-data wiring channels, sometimes called “fabrics”, used for high-bandwidth, long-distance data movement.

Sunday, July 11, 2021

Definite Integrals


Basic definite integrals

 Lamar Notes

JEE main Maths Definite integrals previous year questions with solutions

wolfram Definite Integral

Definite integrals lecture 1

 Fractional part function

Definite integral cheat sheet

Students had questions about graphs. This graph lesson cleared up the fundamentals, and they even wanted to learn more about the step function. A picture of miles versus cost, as on a taxi meter, made it easier for the kids to understand. To make it more relevant, I asked whether the taxi driver should adopt y = 2, y = x, y = x^2, or a step function. Which would net him more money?

We talked a little bit about the floor and ceiling functions too.
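
The taxi question can be tried out with a few lines of Python. This is a rough sketch with made-up pricing, not what we used in class: the step fare here is assumed to charge one unit for every started mile (a ceiling function), and the function names are my own.

```python
import math


def flat_fare(miles: float) -> float:
    """y = 2: the same price for any trip."""
    return 2.0


def linear_fare(miles: float) -> float:
    """y = x: price grows in proportion to distance."""
    return miles


def quadratic_fare(miles: float) -> float:
    """y = x^2: price grows faster and faster with distance."""
    return miles ** 2


def step_fare(miles: float) -> float:
    """Step function: pay for every started mile (ceiling of the distance)."""
    return float(math.ceil(miles))


def best_plan(miles: float) -> str:
    """Return the name of the pricing rule that earns the driver the most."""
    fares = {
        "flat": flat_fare(miles),
        "linear": linear_fare(miles),
        "quadratic": quadratic_fare(miles),
        "step": step_fare(miles),
    }
    return max(fares, key=fares.get)


if __name__ == "__main__":
    for trip in (0.5, 1.5, 3.0):
        print(trip, best_plan(trip))
    # Floor and ceiling of the same distance differ for a partial mile.
    print(math.floor(1.2), math.ceil(1.2))
```

Short trips favor the flat fare and long trips favor y = x^2, so the answer depends on how far the passengers usually go, which makes for a good class discussion.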

After the above graphs primer, kids were ready for the graph problems in 

Definite integrals lecture 2