AI • Equity-Based • Featured

AI / LLM Systems Engineer

Remote • Europe • Posted March 23, 2026

About this role

Most AI systems today are inefficient. They rely on oversized models, high-latency pipelines, and expensive inference with no control over cost. At scale, this becomes unsustainable. SoftQuantus is building a different approach: optimized inference pipelines, controlled cost per token, and high-performance LLM runtime systems. The objective is not experimentation. It is to maximize tokens/sec, minimize latency, and reduce infrastructure cost.

Focus Areas

  • Optimized LLM inference runtime (vLLM, TensorRT-LLM, DeepSpeed)
  • Quantization strategies: INT8, FP8, 4-bit
  • Distributed inference and KV cache optimization
  • Cost-per-token measurement and efficiency reporting
  • Production-grade AI system deployment
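
To make the cost-per-token focus area concrete, here is a minimal sketch of the underlying arithmetic, assuming a single accelerator billed by the hour and a steady measured throughput. The function name, parameter names, and example figures are hypothetical illustrations, not SoftQuantus tooling or real pricing.

```python
def cost_per_million_tokens(gpu_hourly_cost_usd: float, tokens_per_sec: float) -> float:
    """Return the serving cost in USD per one million generated tokens.

    Assumes (hypothetically) one accelerator billed at a flat hourly rate
    and a sustained throughput measured in tokens per second.
    """
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    tokens_per_hour = tokens_per_sec * 3600  # seconds in one hour
    return gpu_hourly_cost_usd / tokens_per_hour * 1_000_000


# Illustrative numbers only: a 2.00 USD/hour accelerator at two throughputs.
baseline = cost_per_million_tokens(2.0, 1000.0)
optimized = cost_per_million_tokens(2.0, 2000.0)
print(f"baseline:  {baseline:.4f} USD per 1M tokens")
print(f"optimized: {optimized:.4f} USD per 1M tokens")
```

At a fixed hourly price, doubling tokens/sec halves the cost per token, which is why throughput and cost are tracked together.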

Responsibilities

  • Build and optimize LLM inference pipelines for production environments
  • Implement and benchmark quantization techniques (INT8, FP8, 4-bit)
  • Optimize distributed inference with KV cache strategies
  • Maximize tokens/sec throughput while minimizing latency and cost
  • Collaborate with the infrastructure team to align model serving with hardware capabilities
  • Design AI systems that are production-ready, cost-efficient, and enterprise-grade

Requirements

  • Strong experience with PyTorch or JAX
  • Hands-on experience with vLLM, TensorRT-LLM, or DeepSpeed
  • Deep understanding of quantization methods (INT8, FP8, 4-bit)
  • Experience with distributed inference and memory optimization
  • Performance-driven, system-level mindset
  • Track record of measurable impact on tokens/sec, latency, and cost

Nice to Have

  • Experience with KV cache design and optimization
  • Background in compiler or kernel-level optimization (Triton, CUDA)
  • Familiarity with speculative decoding or continuous batching
  • Experience deploying LLMs in enterprise or regulated environments

Skills & Keywords

PyTorch • JAX • vLLM • TensorRT-LLM • DeepSpeed • Quantization • LLM • Inference • AI Systems • Co-founder

Compensation

Type: Equity-Based
Cash component after funding

Early-stage equity participation. Core technical role at the heart of the system. Cash component after funding milestone.

Interested?

Apply now and join our team building the future of quantum computing.

Apply for this position
