Oleksandr P.

Vetted Crafter

Senior Software Engineer – Performance & Bug Fixing

🇺🇦 Ukraine · 7 years experience
Available Now · Contract · Full-time

I optimise performance and fix bugs — fast systems that stay reliable.

About

I am a strong advocate for open-source ML tooling: not for ideological reasons, but for practical ones. In the environments I have worked in, vendor lock-in and unpredictable API costs are real operational risks. I have become an expert in the open-source MLOps stack: MLflow for experiment tracking and model registry, Qdrant for vector storage, Airflow for orchestration, and open models like Mistral and Llama for inference.

My MLOps work covers the full lifecycle: DVC for data versioning, automated training pipelines on Kubernetes with GPU scheduling, model serving with BentoML and vLLM, and comprehensive monitoring with Prometheus and Grafana. I have designed systems for clients that can retrain and redeploy models automatically in response to drift alerts with zero manual intervention.

For LLM infrastructure specifically, I specialize in self-hosted deployments: Mistral 7B and Llama models on dedicated GPU instances, quantized for cost efficiency, fronted by vLLM for high-throughput batched inference. I have helped teams reduce their OpenAI spend by 80-90% by transitioning appropriate workloads to self-hosted open models with no perceptible quality degradation.
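To make "fronted by vLLM" concrete: vLLM exposes an OpenAI-compatible HTTP API, so client code only needs to build standard chat-completion payloads against an internal endpoint. A minimal sketch — the endpoint URL and model name below are illustrative assumptions, not details of any specific deployment:

```python
import json

# Assumption: a self-hosted vLLM server (e.g. started with `vllm serve`),
# which exposes an OpenAI-compatible API. The host below is illustrative.
VLLM_URL = "http://llm.internal:8000/v1/chat/completions"

def build_request(prompt: str,
                  model: str = "mistralai/Mistral-7B-Instruct-v0.2",
                  max_tokens: int = 256,
                  temperature: float = 0.2) -> dict:
    """Build a payload for vLLM's OpenAI-compatible chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

payload = build_request("Summarise the Q3 incident report.")
print(json.dumps(payload, indent=2))
```

Because the API surface matches OpenAI's, migrating a workload is largely a matter of pointing the existing client at the internal URL.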

AI Expertise

Performance · Bug Fixing
TypeScript · React · Node.js · PostgreSQL · Redis · Docker · Cursor

Notable Projects

Self-Hosted LLM Platform on Kubernetes

Designed and deployed a self-hosted LLM serving platform on GKE with vLLM, hosting Mistral 7B, Mixtral 8x7B, and Llama 3 8B. Implemented autoscaling based on queue depth, model-level cost tracking, and a prompt caching layer with Redis to reduce repeat inference costs.

vLLM · Mistral · Kubernetes · GKE · Redis · Prometheus · Grafana

Replaced $45K/month OpenAI spend with $8K/month GKE infrastructure cost; P95 latency improved from 3.1s to 1.4s due to dedicated GPU capacity and no rate limiting.
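The prompt-caching layer mentioned above can be sketched as follows. This is a minimal illustration, not the production code: keys hash the model name plus the exact prompt, and any client exposing Redis-style `get`/`setex` works (a real deployment would use `redis.Redis`).

```python
import hashlib
import json

def cache_key(model: str, prompt: str) -> str:
    # Assumption: cache on model + exact prompt text; a real deployment
    # might also hash sampling parameters into the key.
    digest = hashlib.sha256(f"{model}\n{prompt}".encode()).hexdigest()
    return f"llmcache:{digest}"

def cached_generate(client, model, prompt, generate, ttl_s=3600):
    """Return (completion, cache_hit). `client` needs Redis-style get/setex."""
    key = cache_key(model, prompt)
    hit = client.get(key)
    if hit is not None:
        return json.loads(hit), True
    result = generate(prompt)                 # the expensive inference call
    client.setex(key, ttl_s, json.dumps(result))
    return result, False
```

With `redis.Redis(decode_responses=True)` this drops straight in: repeated prompts skip inference entirely until the TTL expires, which is where the savings on repeat traffic come from.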

Automated Model Retraining Pipeline

Built a full automated retraining pipeline: data drift detection with Evidently, alert routing to Airflow via webhooks, DVC-managed dataset versioning, Kubeflow Pipelines for training, MLflow for experiment comparison and registry promotion, and BentoML for deployment. Zero human intervention required for standard retraining cycles.

Airflow · Kubeflow · MLflow · DVC · BentoML · Evidently · Python

Model degradation response time reduced from 2-3 weeks to 18 hours; data scientists freed from 20 hours/week of manual retraining coordination tasks.
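To make the drift-triggered flow concrete, here is a toy version of the kind of check that sits at the front of such a pipeline: a hand-rolled Population Stability Index over two numeric samples, with the common 0.2 rule-of-thumb threshold deciding whether to fire the retraining DAG. The actual pipeline uses Evidently's richer drift reports rather than this metric.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference and a live sample."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0          # guard against a zero-width range

    def frac(sample, b):
        in_bin = sum(
            1 for x in sample
            if lo + b * width <= x < lo + (b + 1) * width
            or (b == bins - 1 and x == hi)   # close the last bin
        )
        return max(in_bin / len(sample), 1e-6)   # avoid log(0)

    return sum(
        (frac(actual, b) - frac(expected, b))
        * math.log(frac(actual, b) / frac(expected, b))
        for b in range(bins)
    )

DRIFT_THRESHOLD = 0.2  # common PSI rule of thumb for "significant" drift

def should_retrain(expected, actual):
    """True when drift is large enough to trigger the retraining DAG."""
    return psi(expected, actual) > DRIFT_THRESHOLD
```

In the pipeline described above, a `True` result is what routes an alert to Airflow via webhook and kicks off the DVC/Kubeflow/MLflow retraining cycle.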

Open-Source RAG Platform with Qdrant

Built a fully self-hosted RAG platform using LangChain, Qdrant, and self-hosted embedding models (BGE, E5): zero external API dependencies. Supports multi-tenant isolation, incremental indexing, and query routing between vector search and structured SQL queries.

LangChain · Qdrant · Mistral · FastAPI · Python · PostgreSQL · Docker

Deployed for a financial services client with strict data residency requirements; zero external data transfers; retrieval latency under 180ms at P99 on a 50M document corpus.
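The query-routing idea can be illustrated with a deliberately simple heuristic: queries that combine aggregation cues with known structured fields go to SQL, everything else to vector search. The field and keyword lists here are invented for illustration; a real router would be tuned to the client's schema and could itself be a small classifier.

```python
# Illustrative vocabularies — assumptions, not the client's actual schema.
STRUCTURED_FIELDS = {"amount", "date", "account", "status", "transaction"}
AGGREGATION_CUES = {"how many", "count", "sum", "average", "total"}

def route(query: str) -> str:
    """Pick a backend: 'sql' for structured lookups, 'vector' for semantic search."""
    q = query.lower()
    mentions_field = any(f in q for f in STRUCTURED_FIELDS)
    wants_aggregate = any(c in q for c in AGGREGATION_CUES)
    return "sql" if (mentions_field and wants_aggregate) else "vector"
```

In the deployed system the `"vector"` branch would issue a Qdrant similarity search over the embedded corpus, while the `"sql"` branch would run a parameterised PostgreSQL query.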

Work Experience

Senior MLOps Engineer

Grid Dynamics

2021 – Present

Lead MLOps engineering for enterprise clients across retail, finance, and media. Specialize in open-source tooling deployments and LLM infrastructure migration from managed APIs to self-hosted solutions.

ML Engineer

Ciklum

2018 – 2021

Built ML pipelines and model serving infrastructure for European e-commerce clients. Introduced Kubernetes-based model serving and MLflow experiment tracking to standardize ML operations across 5 project teams.

Education & Certifications

🎓

M.Sc. Applied Mathematics & Computer Science

Taras Shevchenko National University of Kyiv · 2018

🏆 Certified Kubernetes Administrator (CKA) · 🏆 Google Professional Cloud Architect · 🏆 Databricks Certified ML Professional

Interested in working with Oleksandr?

Tell us about your project and we'll facilitate an introduction.