
arxiv: 1712.06139 · v2 · submitted 2017-12-17 · 💻 cs.DC · cs.LG


TensorFlow-Serving: Flexible, High-Performance ML Serving

keywords: flexible · google · model · models · serving · tensorflow-serving · around · available

We describe TensorFlow-Serving, a system for serving machine learning models inside Google that is also available in the cloud and as open source. It is extremely flexible in the types of ML platforms it supports and in the ways it integrates with systems that convey new models and updated versions from training to serving. At the same time, the core code paths around model lookup and inference have been carefully optimized to avoid performance pitfalls observed in naive implementations. Google uses it in many production deployments, including a multi-tenant model hosting service called TFS².



Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Efficient Memory Management for Large Language Model Serving with PagedAttention

    cs.LG · 2023-09 · conditional · novelty 7.0

    PagedAttention achieves near-zero waste in LLM key-value cache memory and enables 2-4x higher serving throughput than prior systems.

  2. ERPPO: Entropy Regularization-based Proximal Policy Optimization

    cs.LG · 2026-05 · unverdicted · novelty 5.0

    ERPPO adds a DSA-based ambiguity estimator to MAPPO and switches between L1 and L2 entropy regularization to improve exploration and stability in non-stationary multi-dimensional observations.

  3. EdgeServing: Deadline-Aware Multi-DNN Serving at the Edge

    cs.DC · 2026-05 · unverdicted · novelty 5.0

    EdgeServing schedules multi-DNN inference on edge GPUs via time-division sharing and early exits, using a stability score to minimize system-wide SLO violations and P95 latency.