Lead DevOps Engineer
201 Old Country Rd, Melville, NY 11747, US
Job Description
- Serve as the primary contact for support infrastructure, leading and mentoring department team members.
- Design, implement, and support Infrastructure as Code solutions for the KWI suite of cloud-hosted products.
- Ensure service stability by monitoring issue trends, performance metrics, and capacity.
- Diagnose system and software failures in depth, and develop and implement preventative solutions as needed.
- Create processes to clarify daily operations and facilitate cross-department communication.
- Document and implement configuration standards to optimize data center system utilization and manageability.
- Collaborate with the Technology team on infrastructure availability and technical considerations for planning and managing upgrades and enhancements.
Qualifications
- Over 5 years of experience deploying or managing mid- to large-scale, distributed, customer-facing OLTP Linux environments spanning hundreds of servers.
- Over 5 years of experience in designing, configuring, scaling, and supporting a 24x7x365 hosted SaaS environment.
- Deep knowledge of modern monitoring and alerting tools and practices, including OpenTelemetry standards and open-source tools such as Prometheus, InfluxDB, Grafana, ELK, Telegraf, Fluent, etc.
- Experience building and maintaining hybrid infrastructures and on-premises data centers.
- Over 5 years of experience with infrastructure automation, configuration management, and Infrastructure as Code (IaC) tools such as Ansible, Chef, Puppet, Bicep, and Terraform.
- Over 3 years of experience implementing and supporting modern infrastructure services such as Consul, Vault, Kubernetes, and application load balancers, and integrating them with both monolithic and service-based applications.
- Over 5 years of experience administering Solaris, Linux, and Windows platforms.
- Over 3 years of experience supporting complex routing/switching environments, including VPN and dynamic routing protocols.
- Experience with Oracle VM on SPARC hardware is a plus.
- Over 3 years of experience administering enterprise block and file storage platforms.
- Over 3 years of experience administering MySQL platforms (replication experience highly desirable) and Java applications.
- Experience with Windows, Active Directory, and Infrastructure as Code.
- Strong working knowledge of data center network topologies and components (routing, switching, Fibre Channel, next-gen firewalls), as well as networking and application protocols, including TCP/IP.
- Working knowledge of relational database solutions such as MySQL, as well as NoSQL and in-memory database solutions.
- Hands-on leadership with the ability to take action in high-pressure situations.
- Root-cause mentality: always seeking to understand the "why," with a bias toward action in solving problems.
- Precision and discipline in adhering to processes and standard operating procedures, while also being able to navigate ambiguity.
