Stephen Mistele

AI Infrastructure Engineer - AWS

Distributed Model Training @ Amazon SageMaker AI

Hi! I'm Stephen. Nice to meet you!

I specialize in building large-scale ML infrastructure and GenAI platforms. With 4+ years of experience, I've developed strong skills in high-performance distributed systems, ML pipelines, and scalable compute solutions. I'm a proven leader in optimizing performance, reducing costs, and driving production-ready AI/ML features.

Professional Experience

Building what powers AI at scale: My journey engineering scalable AI, building automation, and becoming an Amazon Senior Speaker!

Amazon Web Services (AWS) - SageMaker

AI Infrastructure Engineer

Seattle, WA

Feb 2024 - Present

AWS SageMaker

Amazon SageMaker is a fully managed platform that supports the entire machine learning lifecycle, from data preparation to model deployment. I'm part of Amazon SageMaker Training, which powers large-scale model training and fine-tuning on managed infrastructure, including distributed GPU clusters. It automates compute provisioning, scaling, and monitoring, enabling high-performance training workflows at scale.

My Role

I am a core service engineer on the SageMaker Training & Processing team, architecting infrastructure to support large-scale ML workloads, including LLM training, fine-tuning, and evaluation. I am regularly asked to present as an Amazon Senior Speaker, and I lead technical initiatives that improve performance, scalability, and developer experience across regions.

Job Startup Optimization

Helped re-architect and rebuild the jobs platform to reduce launch latency and improve system reliability.

Skills
AWS SageMaker, EC2, Docker
Key Achievements
  • Reduced job startup time by 75%
  • Halved job failures and operational load

Amazon Senior Speaker

Regularly chosen to present on SageMaker and Bedrock at major internal and external conferences. Catch me presenting at the AWS Summit in DC this June!

Skills
Public Speaking, Technical Writing
Key Achievements
  • Presenting an architectural series on model training and optimization at internal and external conferences
  • Blog post pending publication to the official AWS blog series

GPU Topology Optimization

Refactored networking and GPU configuration to improve training performance.

Skills
EC2, Distributed Training, GPU Networking
Key Achievements
  • Reduced GPU communication latency by 15% in key workloads

Datacenter Service Expansion

Led buildout of new datacenter services to support regional scaling of training infrastructure. Developed automation to reduce future dev effort.

Skills
AWS Infrastructure, CloudFormation, Networking
Key Achievements
  • Reduced future region build engineering effort by 80%
  • Saved 6 months of annual dev work

Cost Optimization Initiative

Identified and removed an unused stack component to reduce unnecessary spending.

Skills
AWS Cost Explorer, Infrastructure Review
Key Achievements
  • Saved $440k in annual infrastructure costs

On-Call Automation and Support

Maintained production readiness, put out fires, and built automation to improve on-call response times.

Skills
Python, Monitoring, Security Patching
Key Achievements
  • Reduced on-call load by 12% through targeted automation

PySpark & PyTorch Tooling

Individually responsible for maintaining customer-facing containers supporting ML workloads.

Skills
PyTorch, PySpark, Docker
Key Achievements
  • Build and maintain the customer-available PySpark container and PyTorch toolkit

University AI Workshops & Hackathons

Led AI training workshops and hackathons at major universities in collaboration with AWS customers and leadership.

Skills
Education, Workshops, Model Training, Hackathons
Key Achievements
  • Led GenAI hackathons at UW and SCU with 600+ participants

INRIX

Full Stack Developer & Tech Lead

Kirkland, WA

Jan 2023 - Feb 2024

INRIX

INRIX is a global leader in transportation analytics, providing real-time and historical traffic data, mobility insights, and connected vehicle services. Their platform helps cities, businesses, and drivers make smarter decisions by analyzing movement patterns across roadways and transportation networks.

My Role

Led development of AI-powered traffic analytics solutions and improved operational efficiency across multiple projects.

INRIX Compass

Spearheaded a team of 7 in building the first iteration of INRIX Compass, a Bedrock-powered GenAI application that identified causes of traffic by using RAG to combine INRIX Traffic, Parking, Safety, and 3 other data lakes for real-time and historical transportation analysis.

Co-launched with AWS at AWS re:Invent

Skills
AWS Bedrock, Vue.js, GraphQL, AWS Step Functions, Lambda, EMR
Key Achievements
  • Presented Compass vision to leadership during 'innovation week', earned cross-org backing
  • Built Compass v1 while leading a cross-functional team of 7 engineers
  • Owned Compass MVP delivery, guiding architecture and execution

Cost Optimization Initiative

Identified and resolved a major inefficiency in AWS EMR usage costing $100k+ annually.

Skills
AWS EMR, AWS Athena, Data Analysis
Key Achievements
  • Developed alternative solution using AWS Athena
  • Led implementation of the cost-saving solution

University Recruiting Hackathons

Led three large-scale recruiting hackathons at Santa Clara University and the University of Washington in partnership with AWS, showcasing technical innovation and attracting engineering talent.

Skills
Event Leadership, GenAI Demos, AWS Partnership, University Outreach
Key Achievements
  • Organized and hosted 3 hackathons with ~250 attendees each
  • Partnered with AWS to drive visibility, mentorship, and cloud credits
  • Led GenAI workshops and judged submissions to spotlight engineering excellence
  • Directly contributed to technical recruiting pipelines and brand presence at both universities

Panterix

Founder & CEO

Santa Clara, CA

2020 - 2021

My Role

Founded and led a startup focused on global road safety analytics. Spearheaded the development of a system that identified dangerous roads using INRIX APIs and other mobility data. With support from Santa Clara University faculty, the project gained traction, evolved into a venture-backed initiative, and received recognition from both academic and entrepreneurial communities.

Road Safety Identification System

Built a real-time platform to detect and analyze high-risk roadways worldwide using traffic and safety data.

Skills
Vue.js, C#
Key Achievements
  • Built and launched MVP using INRIX APIs and SCU-backed research
  • Presented research paper at GoodTechs Conference

Bronco Venture Accelerator

Accepted into Santa Clara University’s flagship venture accelerator program supporting student entrepreneurs.

Ciocca Center Pitch Competition

Finalist in SCU’s university-wide business pitch competition hosted by the Ciocca Center.

Skills & Technologies

My technical toolkit for building high-performance systems, ML infrastructure, and scalable solutions.

Programming Languages

🐍Python
Java
🔧C++
📝TypeScript
🌐JavaScript

ML/AI Technologies

🔥PyTorch
Spark
🏗️ML Infrastructure
🌐Distributed Training
🤖GenAI

Cloud & Infrastructure

☁️AWS SageMaker
🪨AWS Bedrock
🖥️AWS EC2
λAWS Lambda
🐳Docker

Web Development

📱Vue.js
⚛️React
📊GraphQL
🔄REST APIs

Other Skills

💰Cost Optimization
👨‍💼Technical Leadership
🎤Public Speaking
🤖Agentic Automation

Get In Touch

I'm always open to discussing new projects, opportunities, or partnerships. Feel free to reach out!

Location

Greater Seattle Area, WA

About Me

I'm passionate about building high-performance systems and ML infrastructure that power the next generation of AI applications. With a background in software engineering and a focus on distributed systems, I enjoy tackling complex technical challenges.

When I'm not coding, you can find me exploring the Pacific Northwest, skiing, biking, or working on side projects.