Meta Network Operations Engineer in Henrico, Virginia
Meta is looking for a forward-thinking Network Operations Engineer with advanced technical skills in networking, systems, and tooling/automation to join the network operations team and improve the operational efficiency and reliability of one of the most dynamic, fast-paced networks in the world. The right candidate will be comfortable in a fast-moving organization, will enjoy digging into operational problems in order to implement the process enhancements and technical solutions that solve them, and will be able to quickly learn new domain expertise and technologies.
Network Operations Engineer Responsibilities:
Be the key Subject Matter Expert on operations of Meta production networks in a hyper-scale and heterogeneous environment.
Formulate the right metrics and definitions of success to drive quality, efficiency, cost, and timeliness, and evolve these over time to match changes to the infrastructure and business requirements.
Develop operational process improvement plans and turn those improvements into scalable, automated workflows by writing and reviewing code that increases operational efficiency.
Perform deep dives on complex technical issues across networks, ranging from automated tooling to hardware failures and network issues.
Anticipate potential operational risks and develop strategies to mitigate or minimize them.
Participate in and improve escalation and emergency response, writing detailed postmortems and addressing issues systematically to prevent future occurrences.
Build cross-functional relationships with Network Engineering, Systems Engineering, Traffic, Logistics, Program Management, and OEM partners to deliver superb operational results and manage the performance of external vendors.
Be on call to learn from real-world production challenges and apply those lessons to improve current and future generations of products.
15% travel (domestic and international).
Network Operations Engineer Minimum Qualifications:
Bachelor's degree in Computer Science, Computer Engineering, a relevant technical field, or equivalent practical experience.
7+ years of experience in network operations supporting large-scale network infrastructure.
Experience working within a global team and collaborating with cross-functional teams in a fast-paced, dynamic environment with limited supervision.
Experience implementing and maintaining monitoring, alerting, and repair systems for production networks in a DevOps environment.
Expert knowledge of TCP/IP and IPv6.
Network Operations Engineer Preferred Qualifications:
MS in Computer Science, Computer Engineering, or a related technical discipline, or equivalent experience.
Expert knowledge of InfiniBand (IB), RDMA, and RoCE networks.
Knowledge of data-driven analysis.
Experience in providing technical guidance to external vendors.
Understanding of RDMA congestion control mechanisms on IB and RoCE networks.
Understanding of AI training workloads and the demands they place on networks.
$132,000/year to $190,000/year + bonus + equity + benefits
Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics. We also consider qualified applicants with criminal histories, consistent with applicable federal, state and local law. Meta participates in the E-Verify program in certain locations, as required by law. Please note that Meta may leverage artificial intelligence and machine learning technologies in connection with applications for employment.
Meta is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process. If you need any assistance or accommodations due to a disability, please let us know at firstname.lastname@example.org.