Intro
Seasoned AI/Hardware Co-Design Engineer with 15+ years of experience in hardware-aware software development, hardware design, and AI research. I draw on deep expertise in systems programming, computer architecture, and system simulation to design and optimize large-scale machine learning infrastructure. My track record includes patented AI accelerator architectures, peer-reviewed research in distributed machine learning, FPGA-based telecom and encryption modules, and hardware-software integration for AI systems. Passionate about bridging AI, software, and hardware for next-generation computing, I am deepening my expertise in low-level systems programming and compiler technologies (Rust, MLIR, LLVM) to contribute to high-performance ML infrastructure.
Experience
Research Scientist (AI/HW co-design), Rain
August 2023–present, San Francisco, USA
- Compute-in-Memory (CIM): LUT-based approximations, Online Softmax, Quantization.
- Multi-level simulation: performance, behavioral, and cycle-accurate models (PyTorch, SystemC, QEMU).
- Custom RISC-V instructions.
Research Scientist, Imagia
May 2018–March 2022 (including an initial internship), Montreal, Canada
- Federated Learning, Hypothesis Transfer Learning, Meta Learning, Few-Shot Learning.
- AI experimentation orchestration.
May 2017–May 2018, Halifax, Canada
- CUDA programming, OpenMP, AIS data, Deep Learning research.
Jun 2013–Jun 2014, Tehran, Iran
- Lead engineer for FPGA-based switches for PBX systems; communication protocols.
March 2012–Sep 2012, Isfahan, Iran
- FPGA-based encryption modules; resource-sharing algorithms for AES in STEM4.
Education
- Ph.D., Computer Science | Dalhousie University (2016–2023), CGPA: 4.19
- M.Sc., Computer Architecture | University of Isfahan (2012–2015), CGPA: 4.02
- B.Sc., Computer Engineering | Guilan University (2008–2012)
Skills
Programming Languages: Python, C++, CUDA
Systems Programming & Compiler Technologies:
- Proficient In: RISC-V Extensions, PyTorch Dynamo Integration, Low-Level Performance Optimization
- Actively Learning: Rust, MLIR, LLVM, Triton
AI & Machine Learning: Distributed Machine Learning (Federated Learning), Transfer Learning, Quantization & Compression, On-Device Training, LLM & Attention Architecture Performance Modeling
ML Infrastructure & HPC: System Simulation (SystemC, QEMU), AI Performance Modeling
Hardware Design & Verification: VHDL, Verilog, SystemC
Developer & MLOps Tools: Git, GitHub Actions, Bazel, Polyaxon, MLflow
Selected Achievements
- Patents: Lead inventor on patents covering AI accelerator architectures and Transfer Learning techniques.
- Publications: Published Federated Learning and Transfer Learning research at ECCV and ICLR.
- Awards: Scotia Scholar Award ($45k), Best Graduate Research Award, Mitacs Accelerate Award ($56k).