Get In Touch

Prefer using email? Say hi at z.stephen.davis@gmail.com

Builder of platforms, pipelines, and AI delivery systems that power modern applications and cloud infrastructure across industries.

Zachary S. Davis

AI Engineer, Platform Engineer, DevOps Engineer, and Cloud Architect

Contact Me

Hello, I'm Zach

I design and build platforms that power modern AI and cloud-native systems.

With 16+ years of experience across infrastructure, platform engineering, and distributed systems, I focus on delivering scalable, automated environments that enable teams to move faster while maintaining reliability, security, and performance. My work sits at the intersection of cloud architecture, Kubernetes, and AI infrastructure, with a strong emphasis on Infrastructure as Code, CI/CD, and GitOps.

In my current role, I architect and operate platforms across AWS and hybrid environments, building Terraform-driven infrastructure, Kubernetes-based systems (EKS/OpenShift), and standardized CI/CD pipelines across Azure DevOps, GitHub, and GitLab. I focus on creating internal developer platforms and “golden paths” that reduce friction, improve consistency, and accelerate delivery across engineering teams.

More recently, I've been focused on AI infrastructure: designing and running GPU-backed environments on NVIDIA DGX systems, optimizing LLM inference workloads, and building hybrid architectures that balance local and cloud-based execution. I'm particularly interested in how AI systems and platform engineering can converge to accelerate DevOps workflows by driving intelligent automation, reducing operational overhead, and enabling faster, more reliable software and platform delivery.

Core areas of focus:

  • AI Infrastructure & LLM Workloads (DGX, inference optimization, hybrid architectures)
  • Platform Engineering & Internal Developer Platforms
  • Terraform, Infrastructure as Code, and automation-first environments
  • CI/CD & GitOps (Azure DevOps, GitHub, GitLab)
  • Kubernetes (EKS, OpenShift) and distributed systems
  • Observability, reliability engineering, and system performance

I enjoy working on complex systems, collaborating across teams, and turning ideas into production-ready platforms. I'm especially interested in opportunities where I can help design and scale next-generation AI infrastructure.

If you're working on AI platforms, distributed systems, or cloud-native infrastructure, I'm always open to connecting.

Toolbox

Tools in daily rotation across platform, cloud, and AI delivery.

Skills

Built in real-world production environments

AI / ML Infrastructure

GPU compute (NVIDIA DGX), LLM inference, model deployment, distributed inference.

Designed GPU-backed environments that support reliable model rollout and high-throughput inference services, with distributed serving patterns for production scale.

Cloud & Distributed Systems

AWS (multi-region, multi-account), VPC architectures, hybrid environments.

Delivered cloud foundations across regions and accounts with strong network segmentation, while integrating hybrid connectivity for enterprise platform workloads.

Platform Engineering

Internal developer platforms, self-service infrastructure, golden paths.

Built internal platforms that reduce delivery friction, providing self-service infrastructure workflows and opinionated golden paths for faster, safer releases.

Infrastructure as Code

Terraform (modular design), Ansible, CloudFormation.

Implemented reusable IaC modules and automated configuration layers that make infrastructure consistent, auditable, and repeatable across environments.

CI/CD & GitOps

Azure DevOps, GitHub Actions, GitLab CI, GitOps workflows.

Built delivery pipelines with policy checks, staged promotion, and GitOps operating models to improve release confidence and shorten feedback loops.

Languages

Python, Bash, PowerShell, JavaScript, YAML, JSON.

Used cross-platform scripting and API integration to automate operations, connect systems, and standardize configuration and runtime workflows.

Containers & Orchestration

Kubernetes (EKS, OpenShift), Helm, GPU-aware scheduling.

Operated Kubernetes platforms for both general application workloads and accelerated compute, using Helm-based release patterns and GPU scheduling controls.

Observability

Prometheus, Grafana, Alertmanager, performance tuning.

Implemented observability stacks that improve signal quality, accelerate incident triage, and guide performance tuning through data-driven optimization.

Security & Compliance

Active Directory, Microsoft Entra ID, IAM, SAML, OpenID, CJIS, HIPAA, PCI DSS, DevSecOps.

Designed identity and security patterns aligned with regulated standards while embedding controls and DevSecOps guardrails into platform delivery.