Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Model Builder API #23223

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions cmake/onnxruntime_session.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ endif()
if (onnxruntime_MINIMAL_BUILD)
set(onnxruntime_session_src_exclude
"${ONNXRUNTIME_ROOT}/core/session/provider_bridge_ort.cc"
"${ONNXRUNTIME_ROOT}/core/session/model_builder_c_api.cc"
)

list(REMOVE_ITEM onnxruntime_session_srcs ${onnxruntime_session_src_exclude})
Expand Down
6 changes: 6 additions & 0 deletions cmake/onnxruntime_unittests.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -511,6 +511,7 @@ set (onnxruntime_shared_lib_test_SRC

if (NOT onnxruntime_MINIMAL_BUILD)
list(APPEND onnxruntime_shared_lib_test_SRC ${ONNXRUNTIME_SHARED_LIB_TEST_SRC_DIR}/test_inference.cc)
list(APPEND onnxruntime_shared_lib_test_SRC ${ONNXRUNTIME_SHARED_LIB_TEST_SRC_DIR}/test_model_builder_api.cc)
endif()

if(onnxruntime_RUN_ONNX_TESTS)
Expand Down Expand Up @@ -1350,14 +1351,19 @@ if (NOT onnxruntime_ENABLE_TRAINING_TORCH_INTEROP)
LIBS ${onnxruntime_shared_lib_test_LIBS}
DEPENDS ${all_dependencies}
)

target_include_directories(onnxruntime_shared_lib_test PRIVATE ${ONNXRUNTIME_ROOT})

if (onnxruntime_USE_CUDA)
target_include_directories(onnxruntime_shared_lib_test PRIVATE ${CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES})
target_sources(onnxruntime_shared_lib_test PRIVATE ${ONNXRUNTIME_SHARED_LIB_TEST_SRC_DIR}/cuda_ops.cu)
endif()

if (onnxruntime_USE_ROCM)
target_include_directories(onnxruntime_shared_lib_test PRIVATE ${onnxruntime_ROCM_HOME}/include)
target_compile_definitions(onnxruntime_shared_lib_test PRIVATE __HIP_PLATFORM_AMD__)
endif()

if (CMAKE_SYSTEM_NAME STREQUAL "Android")
target_sources(onnxruntime_shared_lib_test PRIVATE
"${ONNXRUNTIME_ROOT}/core/platform/android/cxa_demangle.cc"
Expand Down
31 changes: 29 additions & 2 deletions include/onnxruntime/core/graph/graph.h
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@
#include "core/common/span_utils.h"
#include "core/common/status.h"
#include "core/common/logging/logging.h"
#include "core/framework/ort_value.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It introduces a circular dependency between onnxruntime_graph and onnxruntime_framework. ort_value is a core concept in onnxruntime_framework, which also depends on MemoryInfo, Allocators, etc. It means that the lifetime of a graph will be bound to an allocator. Furthermore, people may ask if the OrtValue can be put on GPU devices, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That dependency already existed with InjectExternalInitializedTensors in this file. And there are lots of places in the graph code where we use types from the framework library. If we want to fix that we might need to limit the graph library to fairly pure ONNX related types, and have ORT things built on top of those in the framework library. e.g. you'd have an ONNX Graph class as well as an onnxruntime Graph class, and things like OrtValue usage would be in the latter.

Long term I think it would be better to convert initializers to OrtValue when loading from the ONNX model so we detach from the protobuf types asap. There are many reasons for doing so. Having to add things like InjectExternalInitializedTensors to efficiently manage memory is a good sign the current setup isn't working well.

Can you elaborate on how the lifetime of the Graph is bound to an allocator? The OrtValue instances internally have a Tensor instance where the deleter is in a shared_ptr, so I would have thought the Graph instance can go away at any time, and the shared_ptr for the allocator in the Tensor deleter would also keep the allocator alive for as long as needed.

The problem I'm trying to address is that there's pre-existing memory where we want to transfer ownership to ORT. e.g. to free CPU based memory if we copy it to GPU. Because we have protobuf based initializers there's no way to attach the deleter to them, and the ORT API deals in OrtValue. So this was the best option I could find to essentially pass through that OrtValue to session state finalization.

The OrtValue could theoretically be on GPU. If you did that you could avoid a copy (if you knew for sure the value would be used on GPU) but you'd break the current setup with optimizers as they expect initializer data to be on CPU. Not clear we want to allow that.

Copy link
Member

@snnn snnn Jan 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A tensor cannot live longer than the allocator that allocated the buffer.
An allocator cannot live longer than the corresponding EP(e.g. CUDA EP). Because the EP needs to manage a lot of handlers, and the allocator may need to use a device handler to do malloc/free. All such handlers get destroyed when the EP is destroyed.
That could make things complicated. For example, in InferenceSession class, we have:

  std::shared_ptr<onnxruntime::Model> model_;

  // The file path of where the model was loaded. e.g. /tmp/test_squeezenet/model.onnx
  PathString model_location_;

  // The list of execution providers.
  ExecutionProviders execution_providers_;

  //...
  std::unique_ptr<SessionState> session_state_;

The model_ variable contains a graph, which contains OrtValues, which should be deleted before the execution_providers_ . But they are not ordered in that way. We had similar issues with "execution_providers_" and "session_state_". So, this is very subtle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok. So whilst the Tensor has a shared_ptr for the allocator, if the allocator depends on internals of the EP, and the EP goes away, it breaks due to that?

And if we add OrtValue to Graph, which is in InferenceSession::model_, which will be released after execution_providers_ it may break?

Should execution_providers_ therefore be declared prior to model_ in InferenceSession?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think so.
However, the code is ok as for now if the graph's OrtValues only use CPU allocators which are relatively simple.

#include "core/framework/prepacked_weights_container.h"
#include "core/graph/onnx_protobuf.h"
#include "core/graph/basic_types.h"
Expand All @@ -39,6 +40,9 @@
#include "core/graph/node_arg.h"
#include "core/graph/ort_format_load_options.h"

// Type from Graph API in ORT C API so can't be in a namespace
struct OrtGraph;

namespace onnxruntime {
class Graph;
struct IndexedSubGraph;
Expand Down Expand Up @@ -763,6 +767,10 @@ class Graph { // NOLINT(clang-analyzer-optin.performance.Padding): preserve exi
*/
bool GetInitializedTensor(const std::string& tensor_name, const ONNX_NAMESPACE::TensorProto*& value) const;

/** Populate `value` if an externally allocated OrtValue exists for an initializer with the given name.
*/
bool GetOrtValueInitializer(const std::string& name, OrtValue& value) const;

/** Gets all the initializer tensors in this Graph. */
const InitializedTensorSet& GetAllInitializedTensors() const noexcept { return name_to_initial_tensor_; }

Expand Down Expand Up @@ -1430,6 +1438,16 @@ class Graph { // NOLINT(clang-analyzer-optin.performance.Padding): preserve exi
const OrtFormatLoadOptions& load_options,
const logging::Logger& logger, std::unique_ptr<Graph>& graph);

static Status LoadFromModelBuilderApiModel(const OrtGraph& api_graph,
const Model& owning_model,
const std::unordered_map<std::string, int>& domain_to_version,
IOnnxRuntimeOpSchemaCollectionPtr schema_registry,
bool strict_shape_type_inference,
const logging::Logger& logger,
std::unique_ptr<Graph>& graph);

Status UpdateUsingModelBuilderApiModel(const OrtModel& api_model);

#if !defined(ORT_MINIMAL_BUILD) || defined(ORT_EXTENDED_MINIMAL_BUILD)
const RuntimeOptimizationRecordContainer& RuntimeOptimizations() const {
return runtime_optimizations_;
Expand Down Expand Up @@ -1630,7 +1648,8 @@ class Graph { // NOLINT(clang-analyzer-optin.performance.Padding): preserve exi
// Implementation for initializer replacement
Status ReplaceInitializedTensorImpl(ONNX_NAMESPACE::TensorProto new_initializer, bool is_external);

std::vector<NodeArg*> CreateNodeArgs(const google::protobuf::RepeatedPtrField<std::string>& names,
template <typename StringRange> // range-initializer returning std::string
std::vector<NodeArg*> CreateNodeArgs(const StringRange& names,
const ArgNameToTypeMap& name_to_type_map);

void ToGraphProtoInternal(ONNX_NAMESPACE::GraphProto& graph_proto) const;
Expand Down Expand Up @@ -1694,6 +1713,8 @@ class Graph { // NOLINT(clang-analyzer-optin.performance.Padding): preserve exi
return nodes_[node_index].get();
}

Status LoadFromModelBuilderApiModel(const OrtGraph& api_graph, bool updating_existing_graph = false);

const Model& owning_model_;

// GraphProto to store name, version, initializer.
Expand All @@ -1708,6 +1729,11 @@ class Graph { // NOLINT(clang-analyzer-optin.performance.Padding): preserve exi

InitializedTensorSet name_to_initial_tensor_;

// Initializers that are external to the Graph. e.g. created using Model Builder API from existing memory.
// As we need to convert to TensorProto for the optimizers to work and keep the deleter information we store them
// in the Graph instance and retrieve during session state finalization.
std::unordered_map<std::string, OrtValue> ortvalue_initializers_;

std::unordered_set<std::reference_wrapper<const std::string>,
std::hash<std::string>, std::equal_to<std::string>>
sparse_tensor_names_;
Expand Down Expand Up @@ -1744,6 +1770,7 @@ class Graph { // NOLINT(clang-analyzer-optin.performance.Padding): preserve exi
// in some case, a fused sub-graph will happens multiple times in one model, we use a map
// to store reusable-schema in lookup.
InlinedHashMap<std::string, std::reference_wrapper<ONNX_NAMESPACE::OpSchema>> reusable_fused_schema_map_;

#endif // !defined(ORT_MINIMAL_BUILD)

// Graph nodes.
Expand Down Expand Up @@ -1806,7 +1833,7 @@ class Graph { // NOLINT(clang-analyzer-optin.performance.Padding): preserve exi
std::unordered_map<std::string, std::unordered_set<NodeIndex>> node_arg_to_consumer_nodes_;
#endif // !defined(ORT_MINIMAL_BUILD) || defined(ORT_EXTENDED_MINIMAL_BUILD)

const std::unordered_map<std::string, int> domain_to_version_;
std::unordered_map<std::string, int> domain_to_version_;

// Model IR version.
Version ir_version_{ONNX_NAMESPACE::Version::IR_VERSION};
Expand Down
6 changes: 6 additions & 0 deletions include/onnxruntime/core/graph/graph_viewer.h
Original file line number Diff line number Diff line change
Expand Up @@ -193,6 +193,12 @@
IOnnxRuntimeOpSchemaCollectionPtr GetSchemaRegistry() const { return graph_->GetSchemaRegistry(); }
#endif

/** Populate `value` if an externally allocated OrtValue exists for an initializer with the given name.
*/
bool GetOrtValueInitializer(const std::string& name, OrtValue& value) const {

Check warning on line 198 in include/onnxruntime/core/graph/graph_viewer.h

View workflow job for this annotation

GitHub Actions / Optional Lint C++

[cpplint] reported by reviewdog 🐶 Add #include <string> for string [build/include_what_you_use] [4] Raw Output: include/onnxruntime/core/graph/graph_viewer.h:198: Add #include <string> for string [build/include_what_you_use] [4]
return graph_->GetOrtValueInitializer(name, value);
}

private:
ORT_DISALLOW_COPY_ASSIGNMENT_AND_MOVE(GraphViewer);
GraphViewer(const Graph& graph, const IndexedSubGraph* filter_info);
Expand Down
Loading
Loading