AI • Equity-Based • Featured
AI / LLM Systems Engineer
Remote • Europe • Posted March 23, 2026
About this role
Most AI systems today are inefficient. They rely on oversized models, high-latency pipelines, and expensive inference with no control over cost. At scale, this becomes unsustainable. SoftQuantus is building a different approach: optimized inference pipelines, controlled cost per token, and high-performance LLM runtime systems. The objective is not experimentation; it is to maximize tokens/sec, minimize latency, and reduce infrastructure cost.
Focus Areas
- Optimized LLM inference runtime (vLLM, TensorRT-LLM, DeepSpeed)
- Quantization strategies: INT8, FP8, 4-bit
- Distributed inference and KV cache optimization
- Cost-per-token measurement and efficiency reporting
- Production-grade AI system deployment
Responsibilities
- Build and optimize LLM inference pipelines for production environments
- Implement and benchmark quantization techniques (INT8, FP8, 4-bit)
- Optimize distributed inference with KV cache strategies
- Maximize tokens/sec throughput while minimizing latency and cost
- Collaborate with the infrastructure team to align model serving with hardware capabilities
- Design AI systems that are production-ready, cost-efficient, and enterprise-grade
Requirements
- Strong experience with PyTorch or JAX
- Hands-on experience with vLLM, TensorRT-LLM, or DeepSpeed
- Deep understanding of quantization methods (INT8, FP8, 4-bit)
- Experience with distributed inference and memory optimization
- Performance-driven, system-level mindset
- Track record of measurable impact: tokens/sec, latency, cost reduction
Nice to Have
- Experience with KV cache design and optimization
- Background in compiler or kernel-level optimization (Triton, CUDA)
- Familiarity with speculative decoding or continuous batching
- Experience deploying LLMs in enterprise or regulated environments
Skills & Keywords
PyTorch • JAX • vLLM • TensorRT-LLM • DeepSpeed • Quantization • LLM • Inference • AI Systems • Co-founder
Compensation
Type: Equity-Based (cash component after funding)
Early-stage equity participation. Core technical role at the heart of the system. Cash component after funding milestone.
Interested?
Apply now and join our team building the future of quantum computing.