Stephen Mistele
AI Infrastructure Engineer - AWS
Distributed Model Training @ Amazon SageMaker AI
Hi! I'm Stephen. Nice to meet you!
I specialize in building large-scale ML infrastructure and GenAI platforms. With 4+ years of experience, I've developed strong skills in high-performance distributed systems, ML pipelines, and scalable compute solutions. I'm a proven leader in optimizing performance, reducing costs, and driving production-ready AI/ML features.
Professional Experience
Building what powers AI at scale: My journey engineering scalable AI, building automation, and becoming an Amazon Senior Speaker!
Amazon Web Services (AWS) - SageMaker
AI Infrastructure Engineer
Seattle, WA
Feb 2024 - Present
AWS SageMaker
Amazon SageMaker is a fully managed platform that supports the entire machine learning lifecycle, from data preparation to model deployment. I'm part of Amazon SageMaker Training, which powers large-scale model training and fine-tuning on managed infrastructure, including distributed GPU clusters. It automates compute provisioning, scaling, and monitoring, enabling high-performance training workflows at scale.

My Role
I am a core service engineer on the SageMaker Training & Processing team, architecting infrastructure to support large-scale ML workloads, including LLM training, fine-tuning, and evaluation. I regularly present as an Amazon Senior Speaker and lead technical initiatives that improve performance, scalability, and developer experience across regions.
Job Startup Optimization
Helped re-architect and rebuild the jobs platform to reduce launch latency and improve system reliability.
Skills
Key Achievements
- Reduced job startup time by 75%
- Halved job failures and operational load
Amazon Senior Speaker
Regularly chosen to present on SageMaker and Bedrock at major internal and external conferences. Catch me presenting at the AWS Summit in DC this June!
Skills
Key Achievements
- Presenting an architectural series on model training and optimization at internal and external conferences
- Blog post pending publication to the official AWS blog series
GPU Topology Optimization
Refactored networking and GPU configuration to improve training performance.
Skills
Key Achievements
- Reduced GPU communication latency by 15% in key workloads
Datacenter Service Expansion
Led buildout of new datacenter services to support regional scaling of training infrastructure. Developed automation to reduce future dev effort.
Skills
Key Achievements
- Reduced future region build engineering effort by 80%
- Saved 6 months of annual dev work
Cost Optimization Initiative
Identified and removed an unused stack component to reduce unnecessary spending.
Skills
Key Achievements
- Saved $440k in annual infrastructure costs
On-Call Automation and Support
Maintained production readiness, put out fires, and built automation to improve on-call response times.
Skills
Key Achievements
- Reduced on-call load by 12% through targeted automation
PySpark & PyTorch Tooling
Individually responsible for maintaining customer-facing containers supporting ML workloads.
Skills
Key Achievements
- Build and maintain the customer-available PySpark container and PyTorch toolkit
University AI Workshops & Hackathons
Led AI training workshops and hackathons at major universities in collaboration with AWS customers and leadership.
Skills
Key Achievements
- Led GenAI hackathons at UW and SCU with 600+ participants
INRIX
Full Stack Developer & Tech Lead
Kirkland, WA
Jan 2023 - Feb 2024
INRIX
INRIX is a global leader in transportation analytics, providing real-time and historical traffic data, mobility insights, and connected vehicle services. Their platform helps cities, businesses, and drivers make smarter decisions by analyzing movement patterns across roadways and transportation networks.
My Role
Led development of AI-powered traffic analytics solutions and improved operational efficiency across multiple projects.
INRIX Compass
Led a team of 7 in building the first iteration of INRIX Compass, a Bedrock-powered GenAI application that identified causes of traffic by using RAG to combine INRIX Traffic, Parking, Safety, and 3 other data lakes for real-time and historical transportation analysis.
Co-launched with AWS at AWS re:Invent
Skills
Key Achievements
- Presented Compass vision to leadership during 'innovation week', earned cross-org backing
- Built Compass v1, led a cross-functional team of 7 engineers
- Owned Compass MVP delivery, guiding architecture and execution
Cost Optimization Initiative
Identified and resolved a major inefficiency in Amazon EMR usage costing $100k+ annually.
Skills
Key Achievements
- Developed an alternative solution using Amazon Athena
- Led implementation of the cost-saving solution
University Recruiting Hackathons
Led three large-scale recruiting hackathons at Santa Clara University and the University of Washington in partnership with AWS, showcasing technical innovation and attracting engineering talent.
Skills
Key Achievements
- Organized and hosted 3 hackathons with ~250 attendees each
- Partnered with AWS to drive visibility, mentorship, and cloud credits
- Led GenAI workshops and judged submissions to spotlight engineering excellence
- Directly contributed to technical recruiting pipelines and brand presence at both universities
Panterix
Founder & CEO
Santa Clara, CA
2020 - 2021
My Role
Founded and led a startup focused on global road safety analytics. Spearheaded the development of a system that identified dangerous roads using INRIX APIs and other mobility data. With support from Santa Clara University faculty, the project gained traction, evolved into a venture-backed initiative, and received recognition from both academic and entrepreneurial communities.
Road Safety Identification System
Built a real-time platform to detect and analyze high-risk roadways worldwide using traffic and safety data.
Skills
Bronco Venture Accelerator
Accepted into Santa Clara University’s flagship venture accelerator program supporting student entrepreneurs.
Key Achievements
- Won entry into the Bronco Venture Accelerator and received a $5,000 grant to build the business
Ciocca Center Pitch Competition
Finalist in SCU’s university-wide business pitch competition hosted by the Ciocca Center.
Key Achievements
- Selected as a Business Pitch Competition finalist among early-stage startups
Skills & Technologies
My technical toolkit for building high-performance systems, ML infrastructure, and scalable solutions.
Programming Languages
ML/AI Technologies
Cloud & Infrastructure
Web Development
Other Skills
Get In Touch
I'm always open to discussing new projects, opportunities, or partnerships. Feel free to reach out!
Send Me a Message
Location
Greater Seattle Area, WA
About Me
I'm passionate about building high-performance systems and ML infrastructure that power the next generation of AI applications. With a background in software engineering and a focus on distributed systems, I enjoy tackling complex technical challenges.
When I'm not coding, you can find me exploring the Pacific Northwest, skiing, biking, or working on side projects.