 
Sr. Manager, Product Development
2 weeks ago
Company overview:
TraceLink's software solutions and Opus Platform help the pharmaceutical industry digitize its supply chain and enable greater compliance, visibility, and decision making. They reduce disruption to the supply of medicines to patients who need them, anywhere in the world.
Founded in 2009 with the simple mission of protecting patients, TraceLink today has 8 offices, over 800 employees, and more than 1,300 customers in over 60 countries around the world. Our expanding product suite continues to protect patients and now also enhances multi-enterprise collaboration through innovative new applications such as MINT.
TraceLink is recognized as an industry leader by Gartner and IDC, and for having a great company culture by Comparably.
Role summary
We're seeking a motivated and passionate Site Reliability Engineering (SRE) leader with strong expertise in programming, distributed systems, AWS infrastructure and services, and Kubernetes. In this role, you'll help evolve our SRE team's Kubernetes and Service Mesh architecture, while also supporting the integration of AI workloads both within Kubernetes and via managed services.
The SRE function plays a critical role in maintaining system visibility, ensuring platform scalability, and enhancing operational efficiency. As part of this, you'll help drive AIOps initiatives, leveraging AI tools and automation to proactively detect, diagnose, and remediate issues, enhancing the reliability and performance of TraceLink's global platform. As an SRE leader, you'll have the opportunity to apply your technical strengths, shape platform reliability strategies, and collaborate closely with engineering teams across the organization. You'll work as part of a globally distributed, inclusive team focused on AWS-based cloud infrastructure.
Key Responsibilities
SRE Leadership:
- Guide a team of SREs through weekly sprint planning and execution, helping them stay focused on delivery and long-term goals.
- Build a team environment centered around trust, ownership, and continuous learning.
- Partner with engineers across Platform and Application product teams to ensure what's pushed to production is stable, secure, and reliable.
- Stay directly involved in technical work, contributing to the codebase and leading by example in solving complex infrastructure challenges.
Core SRE:
- Collaborate with development teams, product owners, and stakeholders to define, enforce, and track SLOs and manage error budgets.
- Improve system reliability by designing for failure, testing edge cases, and monitoring key metrics.
- Boost performance by identifying bottlenecks, optimizing resource usage, and reducing latency across services.
- Build scalable systems that handle growth in traffic or data without compromising performance.
AI Ops:
- Design and implement scalable deployment strategies optimized for large language models like LLaMA, Claude, Cohere, and others.
- Set up continuous monitoring for model performance, ensuring robust alerting systems are in place to catch anomalies or degradation.
- Stay current with advancements in MLOps and Generative AI, proactively introducing innovative practices to strengthen AI infrastructure and delivery.
Monitoring and Alerting:
- Proactively identify and resolve issues by leveraging monitoring systems to catch early signals before they impact operations.
- Design and maintain alerting mechanisms that are clear, actionable, and tuned to avoid unnecessary noise or alert fatigue.
- Continuously improve system observability to enhance visibility, reduce false positives, and support faster incident response.
- Apply best practices for alert thresholds and monitoring configurations to ensure reliability and maintain system health.
- Incorporate agentic capabilities to monitor and proactively resolve system issues before they impact customers.
Cost Management:
- Monitor infrastructure usage to identify waste and reduce unnecessary spending.
- Optimize resource allocation by using right-sized instances, auto-scaling, and spot instances where appropriate.
- Implement cost-aware design practices during architecture and deployment planning.
- Track and analyze monthly cloud costs to ensure alignment with budget and forecast.
- Collaborate with teams to increase cost visibility and promote ownership of cloud spend.
Required Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field.
- 7+ years in SRE, DevOps, or cloud infrastructure; 3+ years managing SRE/DevOps teams responsible for large-scale, highly available, microservice-based systems.
- Deep knowledge of core operating system concepts, networking fundamentals, and systems management.
- Strong understanding of cloud-native deployment and management practices, especially in AWS.
- Strong expertise with AWS services from both a technical and cost optimization perspective.
- Hands-on experience with Terraform/OpenTofu, Helm, Docker, Kubernetes, Prometheus, and Istio.
- Proficiency in diagnosing and resolving container performance issues using modern tools and techniques.
- Hands-on experience with MLOps tools (Kubeflow, MLflow, SageMaker, Vertex AI, or equivalent).
- Familiarity with ML concepts: model lifecycle, feature stores, drift detection, and monitoring.
- Experience deploying, monitoring, and scaling AI/ML models, including LLM-based and agentic AI applications, in production.
- Skilled in modern DevOps/SRE practices, including CI/CD build and release pipelines.
- Experience with mature development processes, including source control, security best practices, and automated deployment.
- Familiarity with MLOps practices, including the deployment, monitoring, and scaling of AI/ML models in production, particularly LLM-based applications.
- Excellent written and verbal communication skills.
- Strong analytical and problem-solving abilities, with a bias for proactive issue identification and resolution.
Preferred Qualifications:
- Experience managing large-scale ML inference workloads, including LLM and agentic AI, in production.
- Knowledge of distributed training frameworks (TensorFlow, PyTorch).
- Hands-on development experience in Python and/or Golang.
- Experience managing SRE teams for 24/7, follow-the-sun operations.
- Familiarity with service mesh patterns beyond Istio (e.g., Linkerd, Consul).
- Experience managing GPU-enabled infrastructure and optimizing model-serving performance.
- Background in designing or implementing disaster recovery and business continuity plans.
- Prior experience in a regulated or compliance-heavy industry (e.g., healthcare, finance, life sciences).
Please see the TraceLink Privacy Policy for more information on how TraceLink processes your personal information during the recruitment process and, if applicable based on your location, how you can exercise your privacy rights. If you have questions about this privacy notice or need to contact us in connection with your personal data, including any requests to exercise your legal rights referred to at the end of this notice, please contact Candidate-.