Oleksandr P.
✓ Vetted Crafter
Senior Software Engineer – Performance & Bug Fixing
I optimise performance and fix bugs — fast systems that stay reliable.
About
I am a strong advocate for open-source ML tooling: not for ideological reasons, but for practical ones. In the environments I have worked in, vendor lock-in and unpredictable API costs are real operational risks. I have become an expert in the open-source MLOps stack: MLflow for experiment tracking and model registry, Qdrant for vector storage, Airflow for orchestration, and open models like Mistral and Llama for inference.
My MLOps work covers the full lifecycle: DVC for data versioning, automated training pipelines on Kubernetes with GPU scheduling, model serving with BentoML and vLLM, and comprehensive monitoring with Prometheus and Grafana. I have designed systems for clients that can retrain and redeploy models automatically in response to drift alerts with zero manual intervention.
For LLM infrastructure specifically, I specialize in self-hosted deployments: Mistral 7B and Llama models on dedicated GPU instances, quantized for cost efficiency and fronted by vLLM for high-throughput batched inference. I have helped teams cut their OpenAI spend by 80-90% by moving appropriate workloads to self-hosted open models with no perceptible quality degradation.
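As a sketch of the triage involved in that kind of migration, the function below captures the general shape of the decision: high-volume, short-completion workloads are the best candidates for a quantized 7B-class model, while tool-heavy or long-form generations stay on the managed API. The rules, names, and thresholds are illustrative assumptions, not a production policy.

```python
# Illustrative workload-triage heuristic for a managed-API -> self-hosted
# migration. All names and thresholds here are hypothetical examples.

from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    max_output_tokens: int   # typical completion length
    needs_tool_use: bool     # function calling / complex agent steps
    monthly_calls: int


def route(w: Workload) -> str:
    """Return 'self-hosted' for workloads a quantized Mistral-7B-class
    model tends to handle well, 'managed-api' otherwise."""
    # Long-form or tool-heavy generations stay on the managed API.
    if w.needs_tool_use or w.max_output_tokens > 2048:
        return "managed-api"
    # High-volume, short-completion workloads save the most when
    # self-hosted: API cost scales with tokens, dedicated GPU cost
    # is roughly fixed.
    return "self-hosted"
```

The economic intuition is in the comments: the savings come from the workloads where per-token API pricing dominates, which is exactly where a fixed-cost GPU fleet wins.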
Notable Projects
Self-Hosted LLM Platform on Kubernetes
Designed and deployed a self-hosted LLM serving platform on GKE with vLLM, hosting Mistral 7B, Mixtral 8x7B, and Llama 3 8B. Implemented autoscaling based on queue depth, model-level cost tracking, and a prompt caching layer with Redis to reduce repeat inference costs.
✓ Replaced $45K/month OpenAI spend with $8K/month GKE infrastructure cost; P95 latency improved from 3.1s to 1.4s due to dedicated GPU capacity and no rate limiting.
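The prompt caching layer mentioned above can be sketched as a normalised-hash lookup. In the deployed platform the store was Redis with a TTL; a plain dict stands in here so the example runs standalone, and `run_inference` is a stub for the vLLM call. Caching like this only pays off for deterministic sampling (temperature 0), as noted in the comments.

```python
# Minimal prompt-cache sketch: key on a hash of (model, normalised prompt,
# sampling params). A dict stands in for Redis; run_inference is a stub.

import hashlib
import json

cache: dict[str, str] = {}


def cache_key(model: str, prompt: str, params: dict) -> str:
    # Collapse whitespace so trivially different prompts share an entry,
    # and include model + sampling params since both change the output.
    payload = json.dumps(
        {"model": model, "prompt": " ".join(prompt.split()), "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()


def run_inference(model: str, prompt: str, params: dict) -> str:
    # Placeholder standing in for a vLLM completion request.
    return f"[{model} output for: {prompt[:30]}]"


def generate(model: str, prompt: str, params: dict) -> str:
    # Only safe to cache when sampling is deterministic (temperature 0).
    key = cache_key(model, prompt, params)
    if key in cache:
        return cache[key]  # cache hit: no GPU inference
    result = run_inference(model, prompt, params)
    cache[key] = result
    return result
```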
Automated Model Retraining Pipeline
Built a fully automated retraining pipeline: data drift detection with Evidently, alert routing to Airflow via webhooks, DVC-managed dataset versioning, Kubeflow Pipelines for training, MLflow for experiment comparison and registry promotion, and BentoML for deployment. Zero human intervention required for standard retraining cycles.
✓ Model degradation response time reduced from 2-3 weeks to 18 hours; data scientists freed from 20 hours/week of manual retraining coordination tasks.
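To show the shape of the trigger step, here is a hand-rolled population stability index (PSI) check on a single feature. The production pipeline used Evidently's drift reports and posted a webhook to Airflow; this is a minimal illustration of the same decision, with the common ~0.2 rule-of-thumb threshold as an assumed default.

```python
# Illustrative drift trigger: PSI between a reference sample and live data.
# Production used Evidently; this hand-rolled version shows the decision shape.

import math


def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population Stability Index between a reference and a live sample."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]
    # Open-ended outer edges so out-of-range live values still land in a bin.
    edges[0], edges[-1] = float("-inf"), float("inf")

    def frac(sample: list[float]) -> list[float]:
        counts = [0] * bins
        for x in sample:
            for i in range(bins):
                if edges[i] <= x < edges[i + 1]:
                    counts[i] += 1
                    break
        n = len(sample)
        # Small epsilon avoids log(0) for empty buckets.
        return [max(c / n, 1e-6) for c in counts]

    e, a = frac(expected), frac(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def should_retrain(reference: list[float], live: list[float],
                   threshold: float = 0.2) -> bool:
    # PSI above ~0.2 is a common rule of thumb for significant shift;
    # the real pipeline fired an Airflow webhook at this point.
    return psi(reference, live) > threshold
```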
Open-Source RAG Platform with Qdrant
Built a fully self-hosted RAG platform using LangChain, Qdrant, and self-hosted embedding models (BGE, E5): zero external API dependencies. Supports multi-tenant isolation, incremental indexing, and query routing between vector search and structured SQL queries.
✓ Deployed for a financial services client with strict data residency requirements; zero external data transfers; retrieval latency under 180ms at P99 on a 50M document corpus.
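The query-routing idea from the project above can be sketched as a classifier that sends exact-figure or date-bounded questions to SQL and everything else to semantic retrieval over the Qdrant index. The keyword patterns here are illustrative assumptions; the deployed router's actual rules are not reproduced.

```python
# Hedged sketch of routing between structured SQL and vector search.
# The patterns below are hypothetical examples, not the production rules.

import re

# Queries asking for exact counts, aggregates, or date-bounded records
# are better served by SQL over the structured store.
STRUCTURED_PATTERNS = [
    r"\bhow many\b",
    r"\b(sum|total|average|count) of\b",
    r"\bbetween \d{4} and \d{4}\b",
]


def route_query(query: str) -> str:
    q = query.lower()
    if any(re.search(p, q) for p in STRUCTURED_PATTERNS):
        return "sql"
    # Everything else goes to semantic retrieval over the vector index.
    return "vector"
```

In practice a router like this keeps aggregate questions off the embedding path entirely, which both improves answer accuracy and avoids wasting vector-search latency budget on queries it cannot answer well.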
Work Experience
Senior MLOps Engineer
Grid Dynamics
2021 – Present
Lead MLOps engineering for enterprise clients across retail, finance, and media. Specialize in open-source tooling deployments and LLM infrastructure migration from managed APIs to self-hosted solutions.
ML Engineer
Ciklum
2018 – 2021
Built ML pipelines and model serving infrastructure for European e-commerce clients. Introduced Kubernetes-based model serving and MLflow experiment tracking to standardize ML operations across 5 project teams.
Education & Certifications
M.Sc. Applied Mathematics & Computer Science
Taras Shevchenko National University of Kyiv · 2018