Research Scientist: Privacy-Preserving Large-Scale Model Training & Architecture Optimization

$156k - $316.8k

Ellis Technologies, Inc.

Location: San Jose
Employment Type: Regular
Job Code: DW1L

Responsibilities

Design and optimize large-scale training architectures for diffusion-based and unified generative models (e.g., DiT, Rectified Flow, hybrid AR + diffusion systems).
Lead GPU-centric performance optimization, including memory layout, communication overlap, kernel fusion, and throughput scaling across thousands of accelerators.
Develop and evolve distributed training strategies (DP / TP / PP / ZeRO / FSDP-style sharding) tailored to long-running, multi-stage foundation model training.
Build fault-tolerant, self-healing training systems that can sustain long-running jobs under frequent hardware, network, and software failures.
Design mechanisms for fast failure detection, recovery, and minimal training interruption, including checkpointing strategies, restart policies, and controlled rollouts.
Improve training ETTR / MFU / utilization efficiency under real-world production constraints.
Optimize Diffusion Transformer training pipelines, including noise schedules, timestep strategies, and memory-efficient attention mechanisms.
Support unified generation-and-understanding models, enabling shared context, long-sequence multimodal reasoning, and scalable training without architectural bottlenecks.
Collaborate with research teams on architecture-level tradeoffs between quality, compute efficiency, and training stability.

Qualifications

Minimum Qualifications:
Strong background in large-scale deep learning systems and distributed training.
Hands-on experience with GPU optimization, including memory management, communication/computation overlap, and performance profiling.
Experience training diffusion models, DiT-style architectures, or large foundation models at scale.
Proficiency in PyTorch and modern distributed training stacks.
Solid understanding of parallelism strategies (DP / TP / PP / ZeRO / FSDP or equivalents).
Ability to reason about training stability, numerical issues, and long-running job robustness.

Preferred Qualifications:
Experience with privacy-preserving ML, sensitive data training, or regulated environments.
Familiarity with fault-tolerant training systems, checkpointing strategies, or production GPU orchestration.
Experience with unified multimodal models (generation + understanding) or hybrid AR/diffusion systems.
Low-level performance work (CUDA kernels, custom ops, fused attention, or communication libraries).
Background in production ML infrastructure supporting thousands of GPUs.

Job Information

The base salary range for this position in the selected city is $156,000 - $316,800 annually.

Benefits

Employees have day one access to medical, dental, and vision insurance, a 401(k) savings plan with company match, paid parental leave, short-term and long-term disability coverage, life insurance, wellbeing benefits, and more. Employees also receive 10 paid holidays per year, 10 paid sick days per year, and 17 days of Paid Personal Time (prorated upon hire with increasing accruals by tenure). The Company reserves the right to modify or change these benefit programs at any time, with or without notice.

Employment Eligibility

Qualified applicants with arrest or conviction records will be considered for employment in accordance with all federal, state, and local laws including the Los Angeles County Fair Chance Ordinance for Employers and the California Fair Chance Act.

Our company believes that criminal history may have a direct, adverse and negative relationship on the following job duties, potentially resulting in the withdrawal of the conditional offer of employment:
Interacting and occasionally having unsupervised contact with internal/external clients and/or colleagues;
Appropriately handling and managing confidential information including proprietary and trade secret information and access to information technology systems;
Exercising sound judgment.


Vacancy posted 13 hours ago

