Job Information
NVIDIA Deep Learning Performance Architect Intern in Shanghai, China
We are looking for a first-class Deep Learning Performance Architect to join us and drive the performance analysis, modeling, and optimization of top Datacenter, Automotive, and Client AI networks. You will also help build and enhance our performance analysis infrastructure. In this role, you will analyze top inference networks and identify, prototype, or model performance opportunities to guide software and architecture for NVIDIA’s current and next-generation GPU and SoC products.
What you’ll be doing:
Establish deep learning applications and use cases for performance analysis, modeling, and projections
Analyze and propose both SW and HW optimizations for deep learning applications
Specify hardware/software configurations and metrics to analyze performance, power, accuracy and resiliency in existing and future uni-processor and multiprocessor configurations
Collaborate across the company to guide the direction of next-gen deep learning HW/SW by working with architecture, library, and compiler teams
Build performance analysis infrastructure
What we need to see:
MS or PhD in relevant discipline (CS, EE, Math)
Strong background in computer architecture
Expert mathematical foundation in machine learning and deep learning
Strong programming skills in C, C++, Perl, or Python
Ways to stand out from the crowd:
Prior experience working on assembly-level performance optimization
Experience working with deep learning frameworks like TensorFlow and Torch
Familiarity with GPU computing (CUDA)
Background with systems-level performance modeling, profiling, and analysis
Experience in characterizing and modeling system-level performance, executing comparison studies, and documenting and publishing results
NVIDIA is a Learning Machine
NVIDIA pioneered accelerated computing to tackle challenges no one else can solve. Our work in AI and the metaverse is transforming the world's largest industries and profoundly impacting society.
Learn more about NVIDIA.