Intro

Seasoned AI/Hardware Co-Design Engineer with 15+ years of experience across hardware-aware software development, hardware design, and AI research. I apply deep expertise in systems programming, computer architecture, and system simulation to design and optimize large-scale machine learning infrastructure. My track record includes patented AI accelerator architectures, peer-reviewed research in distributed machine learning, FPGA-based telecom and encryption modules, and hardware-software integration for AI systems. Passionate about bridging AI, software, and hardware for next-generation computing, I am deepening my expertise in low-level systems programming and compiler technologies, learning Rust, MLIR, and LLVM to contribute to high-performance ML infrastructure.

Experience

Research Scientist (AI/HW co-design), Rain

August 2023–present, San Francisco, USA

  • Compute-in-memory (CIM): LUT-based approximations, Online Softmax, Quantization; see the online-softmax sketch after this list.
  • Multi-level simulation: performance, behavioral, cycle-accurate (PyTorch, SystemC, QEMU).
  • Custom RISC-V instructions.
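
For the online-softmax item above, a minimal sketch of the streaming recurrence (after Milakov & Gimelshein): softmax in a single pass by carrying a running max and a rescaled running sum, so no second pass over the logits is needed. The tile size and shapes here are illustrative assumptions, not the production kernel.

    import torch

    def online_softmax(x: torch.Tensor, tile: int = 4) -> torch.Tensor:
        """One-pass, numerically stable softmax over a 1-D tensor."""
        m = torch.tensor(float("-inf"))  # running max of logits seen so far
        s = torch.tensor(0.0)            # running sum of exp(x - m)
        for i in range(0, x.numel(), tile):
            chunk = x[i : i + tile]
            m_new = torch.maximum(m, chunk.max())
            # Rescale the old partial sum to the new max before accumulating.
            s = s * torch.exp(m - m_new) + torch.exp(chunk - m_new).sum()
            m = m_new
        return torch.exp(x - m) / s

    x = torch.randn(10)
    assert torch.allclose(online_softmax(x), torch.softmax(x, dim=0), atol=1e-6)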

Research Scientist, Imagia

May 2018–March 2022 (including internship period), Montreal, Canada

  • Federated Learning, Hypothesis Transfer Learning, Meta Learning, Few-Shot Learning; a FedAvg-style sketch follows this list.
  • AI experimentation orchestration.
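
A minimal FedAvg-style sketch (after McMahan et al.) of the federated-learning pattern behind this work: each client runs local SGD on its own data, and the server averages parameters weighted by client dataset size. The model, loaders, and hyperparameters are illustrative assumptions, not the systems used at Imagia.

    import copy
    import torch
    import torch.nn.functional as F

    def federated_average(global_model, client_loaders, local_epochs=1, lr=0.01):
        """One FedAvg round: local SGD per client, then size-weighted averaging."""
        states, sizes = [], []
        for loader in client_loaders:
            local = copy.deepcopy(global_model)
            opt = torch.optim.SGD(local.parameters(), lr=lr)
            for _ in range(local_epochs):
                for xb, yb in loader:
                    opt.zero_grad()
                    F.cross_entropy(local(xb), yb).backward()
                    opt.step()
            states.append(local.state_dict())
            sizes.append(sum(xb.shape[0] for xb, _ in loader))
        total = sum(sizes)
        # Weighted average of client parameters, proportional to data held.
        avg = {k: sum(s[k].float() * (n / total) for s, n in zip(states, sizes))
               for k in states[0]}
        global_model.load_state_dict(avg)
        return global_model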

Research Assistant, Institute for Big Data Analytics

May 2017–May 2018, Halifax, Canada

  • CUDA and OpenMP programming; Deep Learning research on AIS (vessel-tracking) data.

FPGA Engineer, Kara Telephone Co.

June 2013–June 2014, Tehran, Iran

  • Lead engineer for FPGA-based switches in PBX systems; communication protocols.

RTL Designer Intern, SarvNet Telecommunication Inc.

March 2012–September 2012, Isfahan, Iran

  • FPGA-based encryption modules; resource-sharing algorithms for AES in STM-4.

Education

  • Ph.D., Computer Science | Dalhousie University (2016–2023), CGPA: 4.19
  • M.Sc., Computer Architecture | University of Isfahan (2012–2015), CGPA: 4.02
  • B.Sc., Computer Engineering | Guilan University (2008–2012)

Skills

Programming Languages: Python, C++, CUDA

Systems Programming & Compiler Technologies:

  • Proficient In: RISC-V Extensions, PyTorch Dynamo Integration (see the backend sketch after this list), Low-Level Performance Optimization
  • Actively Learning: Rust, MLIR, LLVM, Triton
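
For the Dynamo item above, a minimal sketch of a custom torch.compile backend: TorchDynamo captures the decorated function into an FX graph and passes it to the backend callable, which may inspect or rewrite the graph before returning something callable. inspect_backend is a hypothetical name for illustration, not a real library entry point.

    import torch

    def inspect_backend(gm: torch.fx.GraphModule, example_inputs):
        # TorchDynamo hands the captured FX graph here; print its IR, then
        # return the unmodified forward so execution falls back to eager.
        print(gm.graph)
        return gm.forward

    @torch.compile(backend=inspect_backend)
    def fused(x):
        return torch.relu(x) * 2.0 + 1.0

    fused(torch.randn(8))  # first call triggers graph capture and compilation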

AI & Machine Learning: Distributed Machine Learning (Federated Learning), Transfer Learning, Quantization & Compression, On-Device Training, LLM & Attention Architecture Performance Modeling
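
To make the quantization item concrete, a minimal symmetric per-tensor int8 round-trip; the function names are illustrative, and production flows typically add per-channel scales and calibration data.

    import torch

    def quantize_int8(w: torch.Tensor):
        # Symmetric per-tensor scheme: map max |w| onto the int8 limit 127.
        scale = w.abs().max().clamp(min=1e-8) / 127.0
        q = (w / scale).round().clamp(-127, 127).to(torch.int8)
        return q, scale

    def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
        return q.float() * scale

    w = torch.randn(64, 64)
    q, s = quantize_int8(w)
    max_err = (w - dequantize_int8(q, s)).abs().max()  # bounded by ~scale / 2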

ML Infrastructure & HPC: System Simulation (SystemC, QEMU), AI Performance Modeling
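
For the performance-modeling item, a toy first-order roofline estimate: kernel time is bounded by whichever is slower, raw compute or memory traffic. The peak-FLOPs and bandwidth numbers are illustrative assumptions, not a specific chip.

    def roofline_time_s(flops: float, bytes_moved: float,
                        peak_flops: float = 100e12, mem_bw_bps: float = 1e12) -> float:
        # A kernel is bound by the slower of compute and memory traffic.
        return max(flops / peak_flops, bytes_moved / mem_bw_bps)

    # Example: C[M,N] = A[M,K] @ B[K,N] in fp16 (2 bytes per element).
    M = N = K = 4096
    flops = 2 * M * N * K                      # each multiply-accumulate = 2 FLOPs
    bytes_moved = 2 * (M * K + K * N + M * N)  # read A and B, write C
    print(f"~{roofline_time_s(flops, bytes_moved) * 1e3:.2f} ms (compute-bound here)")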

Hardware Design & Verification: VHDL, Verilog, SystemC

Developer & MLOps Tools: Git, GitHub Actions, Bazel, Polyaxon, MLflow

Selected Achievements

  • Patents: Lead inventor on patents covering AI accelerator architectures and Transfer Learning methods.
  • Publications: Published Federated Learning and Transfer Learning research at ECCV and ICLR.
  • Awards: Scotia Scholar Award ($45k), Best Graduate Research Award, Mitacs Accelerate Award ($56k).