Network Engineer (42885)
Take part in building and evolving a cutting-edge AI Factory Cloud platform as a Senior Network Engineer. We are seeking a professional with expertise in InfiniBand architecture, RoCE networking, datacenter routing, FortiGate security solutions and Linux-based environments. You should be comfortable with scripting in Go, Python or Bash, infrastructure automation using Ansible, SaltStack, Terraform or Helm, and monitoring with Grafana and Prometheus. In addition to technical delivery, you will provide consultancy, support innovation initiatives and contribute to architecture and operational excellence across the platform.
🚀 Project
- build and automate the network platform for the Industrial AI Cloud, covering switches, firewalls, routers and border gateways
- provision, manage and optimize network infrastructure components across the Industrial AI Cloud environment
- implement and fine-tune monitoring, observability and automation solutions
- coordinate network lifecycle activities including installations, upgrades, firmware updates and configuration changes
- manage the high-speed network fabric built on InfiniBand, Ethernet and RoCE technologies
- design and implement PE/CE datacenter connectivity, routing and firewall solutions
- collaborate with Infrastructure, Platform, Data Center, IaaS, PaaS and AI solution teams
- follow and improve ITIL-based incident, problem and change management processes
- develop concepts, processes and methods for automation, optimization and standardization
- provide consulting, customer-specific implementations and technical guidance
- develop and implement service architecture based on AI Factory Cloud platform requirements
- mentor colleagues and act as a key technical lead across related technologies
- research new technologies and support innovation initiatives
🎯 Skills
- Master’s degree in Information Technology
- network installation, maintenance and operations
- NVIDIA/Mellanox switch configuration and UFM management
- data center routing and BGP/OSPF
- ASNs, IP Transit, peering and failover connectivity
- Linux networking (Cumulus, Ubuntu, Debian)
- configuring bridges, bonds, VLANs and routing tables
- iperf, ethtool, nvidia-smi, perfquery and NVIDIA/Mellanox diagnostics
- monitoring, incident detection and root-cause analysis in large-scale datacenter networks
- FortiGate firewall administration including policies, NAT, VPNs, IDS/IPS and HA
- security segmentation, DDoS mitigation and zero-trust networking
- switch provisioning, firmware and OS upgrades, patching and configuration backups
- VMware Tanzu Kubernetes
- Go, Python and Bash scripting
- Ansible, SaltStack, Terraform and Helm
- CI/CD in Kubernetes environments
- repository management
- GitHub, GitLab, GitHub Actions and GitLab CI
- Linux OS administration
- Software-Defined Networking (SDN)
- Grafana and Prometheus
- deep understanding of InfiniBand architecture and RoCE
- low-latency and high-throughput networking for AI/HPC
- ITIL processes including incident, problem and change management
- Industrial AI Factory Cloud platform stack and dependencies
- NVIDIA GPU-accelerated server platforms
- data engineering, transformation and migration tools
- Kubernetes or similar container-based technologies
- familiarity with NOC/SOC operations and on-call rotation models
- English at B2 level
💡 Nice to have
- good communication skills
- analytical thinking
- team cooperation
- presentation skills
- negotiation skills
- German language skills
- basic project management
- basic leadership skills
- intermediate quality management
- intermediate financial literacy