Oleksandr P.
✓ Vetted Crafter
Senior Software Engineer – Performance & Bug Fixing
I optimise performance and fix bugs — fast systems that stay reliable.
About
I am a strong advocate for open-source ML tooling: not for ideological reasons, but for practical ones. In the environments I have worked in, vendor lock-in and unpredictable API costs are real operational risks. I have become an expert in the open-source MLOps stack: MLflow for experiment tracking and model registry, Qdrant for vector storage, Airflow for orchestration, and open models like Mistral and Llama for inference.
My MLOps work covers the full lifecycle: DVC for data versioning, automated training pipelines on Kubernetes with GPU scheduling, model serving with BentoML and vLLM, and comprehensive monitoring with Prometheus and Grafana. I have designed systems for clients that can retrain and redeploy models automatically in response to drift alerts with zero manual intervention.
For LLM infrastructure specifically, I specialize in self-hosted deployments: Mistral 7B and Llama models on dedicated GPU instances, quantized for cost efficiency and fronted by vLLM for high-throughput batched inference. I have helped teams cut their OpenAI spend by 80-90% by moving appropriate workloads to self-hosted open models with no perceptible quality degradation.
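As a sketch of the triage involved in that kind of migration, the function below captures the general shape of the decision: high-volume, short-completion workloads are the best candidates for a quantized 7B-class model, while tool-heavy or long-form generations stay on the managed API. The rules, names, and thresholds are illustrative assumptions, not a production policy.

```python
# Illustrative workload-triage heuristic for a managed-API -> self-hosted
# migration. All names and thresholds here are hypothetical examples.

from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    max_output_tokens: int   # typical completion length
    needs_tool_use: bool     # function calling / complex agent steps
    monthly_calls: int


def route(w: Workload) -> str:
    """Return 'self-hosted' for workloads a quantized Mistral-7B-class
    model tends to handle well, 'managed-api' otherwise."""
    # Long-form or tool-heavy generations stay on the managed API.
    if w.needs_tool_use or w.max_output_tokens > 2048:
        return "managed-api"
    # High-volume, short-completion workloads save the most when
    # self-hosted: API cost scales with tokens, dedicated GPU cost
    # is roughly fixed.
    return "self-hosted"
```

The economic intuition is in the comments: the savings come from the workloads where per-token API pricing dominates, which is exactly where a fixed-cost GPU fleet wins.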
Notable Projects
Self-Hosted LLM Platform on Kubernetes
Designed and deployed a self-hosted LLM serving platform on GKE with vLLM, hosting Mistral 7B, Mixtral 8x7B, and Llama 3 8B. Implemented autoscaling based on queue depth, model-level cost tracking, and a prompt caching layer with Redis to reduce repeat inference costs.
✓ Replaced $45K/month OpenAI spend with $8K/month GKE infrastructure cost; P95 latency improved from 3.1s to 1.4s due to dedicated GPU capacity and no rate limiting.
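The prompt caching layer mentioned above can be sketched as a normalised-hash lookup. In the deployed platform the store was Redis with a TTL; a plain dict stands in here so the example runs standalone, and `run_inference` is a stub for the vLLM call. Caching like this only pays off for deterministic sampling (temperature 0), as noted in the comments.

```python
# Minimal prompt-cache sketch: key on a hash of (model, normalised prompt,
# sampling params). A dict stands in for Redis; run_inference is a stub.

import hashlib
import json

cache: dict[str, str] = {}


def cache_key(model: str, prompt: str, params: dict) -> str:
    # Collapse whitespace so trivially different prompts share an entry,
    # and include model + sampling params since both change the output.
    payload = json.dumps(
        {"model": model, "prompt": " ".join(prompt.split()), "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()


def run_inference(model: str, prompt: str, params: dict) -> str:
    # Placeholder standing in for a vLLM completion request.
    return f"[{model} output for: {prompt[:30]}]"


def generate(model: str, prompt: str, params: dict) -> str:
    # Only safe to cache when sampling is deterministic (temperature 0).
    key = cache_key(model, prompt, params)
    if key in cache:
        return cache[key]  # cache hit: no GPU inference
    result = run_inference(model, prompt, params)
    cache[key] = result
    return result
```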
Automated Model Retraining Pipeline
Built a fully automated retraining pipeline: data drift detection with Evidently, alert routing to Airflow via webhooks, DVC-managed dataset versioning, Kubeflow Pipelines for training, MLflow for experiment comparison and registry promotion, and BentoML for deployment. Zero human intervention required for standard retraining cycles.
✓ Model degradation response time reduced from 2-3 weeks to 18 hours; data scientists freed from 20 hours/week of manual retraining coordination tasks.
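To show the shape of the trigger step, here is a hand-rolled population stability index (PSI) check on a single feature. The production pipeline used Evidently's drift reports and posted a webhook to Airflow; this is a minimal illustration of the same decision, with the common ~0.2 rule-of-thumb threshold as an assumed default.

```python
# Illustrative drift trigger: PSI between a reference sample and live data.
# Production used Evidently; this hand-rolled version shows the decision shape.

import math


def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population Stability Index between a reference and a live sample."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]
    # Open-ended outer edges so out-of-range live values still land in a bin.
    edges[0], edges[-1] = float("-inf"), float("inf")

    def frac(sample: list[float]) -> list[float]:
        counts = [0] * bins
        for x in sample:
            for i in range(bins):
                if edges[i] <= x < edges[i + 1]:
                    counts[i] += 1
                    break
        n = len(sample)
        # Small epsilon avoids log(0) for empty buckets.
        return [max(c / n, 1e-6) for c in counts]

    e, a = frac(expected), frac(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def should_retrain(reference: list[float], live: list[float],
                   threshold: float = 0.2) -> bool:
    # PSI above ~0.2 is a common rule of thumb for significant shift;
    # the real pipeline fired an Airflow webhook at this point.
    return psi(reference, live) > threshold
```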
Open-Source RAG Platform with Qdrant
Built a fully self-hosted RAG platform using LangChain, Qdrant, and self-hosted embedding models (BGE, E5): zero external API dependencies. Supports multi-tenant isolation, incremental indexing, and query routing between vector search and structured SQL queries.
✓ Deployed for a financial services client with strict data residency requirements; zero external data transfers; retrieval latency under 180ms at P99 on a 50M document corpus.
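The query-routing idea from the project above can be sketched as a classifier that sends exact-figure or date-bounded questions to SQL and everything else to semantic retrieval over the Qdrant index. The keyword patterns here are illustrative assumptions; the deployed router's actual rules are not reproduced.

```python
# Hedged sketch of routing between structured SQL and vector search.
# The patterns below are hypothetical examples, not the production rules.

import re

# Queries asking for exact counts, aggregates, or date-bounded records
# are better served by SQL over the structured store.
STRUCTURED_PATTERNS = [
    r"\bhow many\b",
    r"\b(sum|total|average|count) of\b",
    r"\bbetween \d{4} and \d{4}\b",
]


def route_query(query: str) -> str:
    q = query.lower()
    if any(re.search(p, q) for p in STRUCTURED_PATTERNS):
        return "sql"
    # Everything else goes to semantic retrieval over the vector index.
    return "vector"
```

In practice a router like this keeps aggregate questions off the embedding path entirely, which both improves answer accuracy and avoids wasting vector-search latency budget on queries it cannot answer well.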
Work Experience
Senior MLOps Engineer
Grid Dynamics
2021 – Present
Lead MLOps engineering for enterprise clients across retail, finance, and media. Specialize in open-source tooling deployments and LLM infrastructure migration from managed APIs to self-hosted solutions.
ML Engineer
Ciklum
2018 – 2021
Built ML pipelines and model serving infrastructure for European e-commerce clients. Introduced Kubernetes-based model serving and MLflow experiment tracking to standardize ML operations across 5 project teams.
Education & Certifications
M.Sc. Applied Mathematics & Computer Science
Taras Shevchenko National University of Kyiv · 2018