Hiring: W2 Candidates Only
🛂 Visa: Open to any visa type with valid work authorization in the USA
● System Management: Administer and maintain Linux-based servers and clusters
optimized for GPU compute workloads, ensuring high availability and performance.
● GPU Infrastructure: Configure, monitor, and troubleshoot GPU hardware (e.g., NVIDIA
GPUs) and related software stacks (e.g., CUDA, cuDNN) for optimal performance in
AI/ML and HPC applications.
● Troubleshooting: Diagnose and resolve hardware and software issues related to GPU
compute nodes and performance issues in GPU clusters.
● High-Speed Interconnects: Implement and manage high-speed networking
technologies like RDMA over Converged Ethernet (RoCE) to support low-latency,
high-bandwidth communication for GPU workloads.
● CI/CD Pipelines: Build and optimize continuous integration and deployment (CI/CD)
pipelines for testing GPU-based servers and managing deployments using tools like
GitHub Actions.
● Monitoring & Performance: Set up and maintain monitoring, logging, and alerting
systems (e.g., Prometheus, VictoriaMetrics, Grafana) to track system performance,
GPU utilization, resource bottlenecks, and uptime of GPU resources.
● Security and Compliance: Implement network security measures, including firewalls,
VLANs, VPNs, and intrusion detection systems, to protect the GPU compute
environment and comply with standards like SOC 2 or ISO 27001.
● Experience: 8 years of experience in DevOps, Site Reliability Engineering (SRE), or
cloud infrastructure management, with at least 5 years working on GPU-based compute
environments in the cloud.
● Linux Administration: Strong knowledge of Linux system administration for managing
network services and tools in a GPU compute environment.
● High-Speed Interconnects: Experience with high-performance networking technologies
like RoCE or 100GbE in compute-intensive environments.
● GPU-Specific Networking: Proficiency with NVIDIA GPU networking technologies,
such as Mellanox ConnectX adapters, and configuring Netplan to support their drivers
and firmware.
● Cloud Platforms: Hands-on experience with at least one major cloud provider (AWS,
Azure, GCP).
● Networking & Security: Knowledge of networking concepts (VPC, subnets) and
security best practices (IAM, encryption, firewall configurations).