Discussion Raja Koduri Unveils OxMiq — GPU Hardware IP and Software Startup

marees

Golden Member
Apr 28, 2024
1,327
1,911
96
OxMiq wants to be the ARM of GPUs

Raja Koduri’s new GPU software and hardware IP startup, Oxmiq Labs, is emerging from stealth with $20 million in seed funding. Former Intel GPU chief architect Koduri plans to reinvent the entire GPU technology stack, beginning with software, but stretching as far as scalable GPU hardware IP and chiplet design tools.

Oxmiq’s product portfolio is based on four key pieces of technology:
  1. Capsule, a GPU container technology;
  2. OxPython, a Python module designed to enable code portability for models;
  3. OxCore, the company’s GPU hardware IP; and
  4. OxQuilt, a set of tools and architecture elements for configuring OxCore-based chiplet designs.

Oxmiq’s product lineup is “software-first,” Koduri told EE Times.

https://www.eetimes.com/koduri-unveils-gpu-hardware-ip-and-software-startup/


I'm not sure where to post this, so I'm posting it here:

Raja Koduri just announced a new GPU startup to run into the ground.


Why post here?

Well, I wonder what will last longer: these forums or that startup ...
 

marees

Capsule is a dynamic GPU container technology that abstracts away hardware complexity when deploying on heterogeneous hardware. Koduri said the team started with the Endgame orchestration layer technology, previously licensed from Intel, but with significant changes from the gaming-oriented Endgame.

“There were some valuable lessons, but the needs for compute were very different,” he said. “There are elements of Endgame [in Capsule]: discovering a network of GPUs, understanding their capabilities and their utilization, and how to schedule them… we leverage that know-how, but it’s a complete rethink for the compute side.”

Capsule’s roadmap includes features to analyze workloads in order to assign the optimal compute in a cluster based on KPIs like speed and cost. However, in its current form, Capsule is mainly intended to do what Koduri calls “easing the access burden,” or simply enabling developers to access different types of hardware.
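Capsule's internals are not public, so purely as a sketch of the kind of KPI-based placement described above, the toy scheduler below scores devices on normalized speed and cost and picks the best fit. All device names, throughput figures, and prices are invented:

```python
# Hypothetical sketch of KPI-based placement; not Capsule's actual design.
# Device names, throughput and cost figures are made up for illustration.

def pick_device(devices, speed_weight=0.5, cost_weight=0.5):
    """Score each device on normalized throughput (higher is better)
    and normalized cost (lower is better); return the best device."""
    max_tps = max(d["tokens_per_sec"] for d in devices)
    max_cost = max(d["dollars_per_hour"] for d in devices)

    def score(d):
        speed = d["tokens_per_sec"] / max_tps           # 0..1, higher better
        cost = 1.0 - d["dollars_per_hour"] / max_cost   # 0..1, higher better
        return speed_weight * speed + cost_weight * cost

    return max(devices, key=score)

cluster = [
    {"name": "nvidia-h100", "tokens_per_sec": 1800, "dollars_per_hour": 4.0},
    {"name": "tenstorrent-n300", "tokens_per_sec": 900, "dollars_per_hour": 1.0},
]

# Weighting cost heavily makes the cheaper accelerator win.
print(pick_device(cluster, speed_weight=0.2, cost_weight=0.8)["name"])
```

Real placement would of course fold in many more KPIs (latency, batch size, availability), but the shape of the decision is the same.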

Oxmiq uses Capsule internally, Koduri said, noting the company’s 25 engineers “couldn’t live without it,” as they are developing across hardware from Nvidia, Tenstorrent, and others, whether they are on Mac or Windows platforms.
 

marees

OxPython is a key component of Capsule. Koduri wants to build a bridge from the Python-Cuda ecosystem so that models from the Nvidia ecosystem can run on other hardware without code changes.

“The virus of Cuda has spread into all layers of the stack,” Koduri said, adding that Nvidia-specific technologies continue to infiltrate ever-higher levels of the GPU software stack.

For example, the widely used torch.cuda submodule of PyTorch is required to run many Hugging Face models today, meaning they will only run on Nvidia hardware.

“OxPython provides a low-friction way for people to try hardware from different vendors,” Koduri said. Top layers of the stack, including models and libraries, do not need to be modified to work with OxPython.
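Oxmiq hasn't said how OxPython achieves this, but one generic Python mechanism for letting unmodified code import a vendor-coupled module is installing a stand-in module into sys.modules, so subsequent imports resolve to the shim. The module name `vendorlib` and its functions below are invented stand-ins, not OxPython's actual API:

```python
import sys
import types

# Illustration of a module shim; "vendorlib" stands in for a
# vendor-coupled module. OxPython's real mechanism is not public.

shim = types.ModuleType("vendorlib")
shim.is_available = lambda: True                       # pretend the device exists
shim.device_name = lambda: "other-vendor-accelerator"  # report the real backend
sys.modules["vendorlib"] = shim                        # future imports find the shim

import vendorlib  # unmodified downstream code imports as usual
print(vendorlib.device_name())
```

The key property, matching Koduri's description, is that the top of the stack needs no source changes: the import statement stays the same and the shim decides where the work actually runs.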
 

marees

Koduri’s vision for Oxmiq’s hardware IP, the OxCore, is to raise the level of abstraction at which software talks to hardware, an idea he first proposed in his Hot Chips keynote in 2020. (During this talk, Koduri, then Intel’s chief GPU architect, detailed Intel’s efforts to construct common abstractions for different hardware at every layer of the software stack as part of oneAPI. These abstractions, once robust, should enable the boundary between hardware and software to be raised, he speculated at the time. “Would we be able to increase our transistor-level utilization if the hardware-software contract is at level one [runtimes/low level libraries] instead of level sub-zero [firmware/BIOS level]? This is a fascinating question for me,” he said.)

Much has evolved since 2020, notably in the field of natural language understanding, which has brought this vision closer to reality, Koduri said.

“[The software] could say hey, I’m not talking Python, I’m not talking Cuda – this is the math equation I want to run, and here is the pointer to my data and here is the algorithm I want to run, go execute it for me – in plain English,” he said. “That’s the vision for our architecture.”

Koduri calls this functionality a “nano-agent,” which will be accelerated in hardware as part of the OxCore.
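What a nano-agent interface would actually look like is pure speculation at this point. As a loose illustration of the "state the math and the data, not the code" idea, a declarative request could be dispatched to a registered kernel like this; the request schema and operation names are invented:

```python
# Speculative sketch of a declarative "run this math on this data"
# interface, loosely in the spirit of the nano-agent idea.
# The schema and op names are invented, not Oxmiq's design.

KERNELS = {
    "dot": lambda a, b: sum(x * y for x, y in zip(a, b)),
    "scale": lambda a, k: [x * k for x in a],
}

def execute(request):
    """Look up the requested operation and apply it to the caller's data."""
    op = KERNELS[request["op"]]
    return op(*request["args"])

print(execute({"op": "dot", "args": ([1, 2, 3], [4, 5, 6])}))  # prints 32
```

In Koduri's framing, the lookup-and-dispatch step would itself be accelerated in hardware, and the request could even arrive as plain English rather than a structured record.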

Easing the software burden is key to broader adoption of non-Nvidia GPUs, Koduri said, and one way to do this is to take more of that burden into the hardware.

“Our goal is for you to take an open-source software stack with open-source compilers and get up and running with an OxCore chip in days,” he said.

While Koduri isn’t revealing the details of the OxCore architecture just yet, he hints that it will use RISC-V cores and some near-memory compute concepts. That said, it is still a GPU.

“We conform to the GPU programming model,” Koduri said. “The GPU programming model today – oneAPI, ROCm and Cuda – they are SIMT programming models. We have native SIMT acceleration, that’s what makes [the OxCore] a GPU.”
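For readers unfamiliar with the term: in the SIMT (single instruction, multiple threads) model, one kernel function is launched across many thread indices, and each instance selects its own data element by its index. A pure-Python emulation of the programming model (not the lockstep hardware execution) looks like this:

```python
# Minimal emulation of the SIMT programming model: one kernel,
# launched once per thread index over a grid. Real SIMT hardware
# runs these instances in parallel lockstep; this loop only
# mimics the programming model, not the execution.

def launch(kernel, grid_size, *buffers):
    for tid in range(grid_size):    # hardware would run these in parallel
        kernel(tid, *buffers)

def add_kernel(tid, a, b, out):
    out[tid] = a[tid] + b[tid]      # each "thread" handles one element

a, b = [1, 2, 3, 4], [10, 20, 30, 40]
out = [0] * 4
launch(add_kernel, 4, a, b, out)
print(out)  # [11, 22, 33, 44]
```

This is the same mental model CUDA, ROCm, and oneAPI expose: write the per-thread kernel, and the runtime fans it out over the grid.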

The OxCore will have variants with more or less provision for graphics, including pixel rendering and graphics datatypes, depending on demand.
 

marees


Chiplet architecture

Oxmiq has also developed a tool named OxQuilt to allow customers to configure SoCs and chiplet-based designs with different compute to memory and interconnect ratios based on the OxCore. OxQuilt is a combination of tools and architectural elements; it features an intuitive GUI for configuring chiplet/packaging designs for various use cases.

“To achieve that, we needed a modular architecture and a set of detailed modelling tools to map customers’ workloads, including their desired latency, batch size and other elements,” Koduri said. “If chiplets are already available in our library, then all they’re doing is a packaging exercise, which is 20-100 times less expensive in terms of R&D versus [taping out custom silicon].”
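OxQuilt's interface is not public. As a rough sketch of the "packaging exercise" idea Koduri describes, the snippet below composes an SoC from a made-up library of pre-built chiplets and aggregates its compute and memory bandwidth; every name and number is invented:

```python
# Illustrative sketch only; OxQuilt's actual tooling is not public.
# Models composing an SoC from a library of pre-built chiplets and
# reporting the resulting compute and memory-bandwidth totals.

from dataclasses import dataclass

@dataclass(frozen=True)
class Chiplet:
    name: str
    tflops: float     # compute throughput contributed
    mem_gbps: float   # memory bandwidth contributed

LIBRARY = {
    "oxcore-compute": Chiplet("oxcore-compute", tflops=40.0, mem_gbps=0.0),
    "hbm-stack": Chiplet("hbm-stack", tflops=0.0, mem_gbps=800.0),
}

def compose(selection):
    """selection: {chiplet_name: count}. Return aggregate (tflops, mem_gbps)."""
    tflops = sum(LIBRARY[n].tflops * c for n, c in selection.items())
    gbps = sum(LIBRARY[n].mem_gbps * c for n, c in selection.items())
    return tflops, gbps

tflops, gbps = compose({"oxcore-compute": 4, "hbm-stack": 2})
print(tflops, gbps)  # 160.0 1600.0
```

Varying the counts is how a customer would tune the compute-to-memory ratio the article mentions; if every selected chiplet already exists in the library, only the packaging changes, which is where the claimed 20-100x R&D saving comes from.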
 

marees

@marees is Raja confirmed by chain posting...
I also posted this today 👇😉

Bolt Graphics unveils Zeus, a specialized GPU targeting path tracing, CAD, and HPC applications with expandable memory up to 384GB and integrated RISC-V cores. The company claims Zeus 4c delivers 13× RTX 5090 ray-tracing performance, while Zeus 1c provides 3.25× improvement. However, these claims rely entirely on internal simulations. Zeus trades traditional shader performance (10–20 TFLOPS versus RTX 5090’s 105 TFLOPS) for ray-tracing optimization. Developer kits arrive in 2026, with mass production in 2027, creating uncertainty about real-world performance versus ambitious projections.