V4 Data Store

This page is being used to develop a design for a generic data store facility, i.e. a container factory that a user application can use to create storage for structures with DataAccess interfaces.
[Figure: Containers.jpg]

General Design Idea

DataAccess provides a user application with an interface to data. It does not provide storage, as one of DataAccess' basic ideas is to have the user applications on both ends (client and server) store data in their own format. So DataAccess, unlike its predecessor gdd, does not implement a container, only an interface.

Nevertheless, some generic applications (such as the Gateway) have to store data without knowing beforehand which structures and types will be needed. For such applications, a generic container factory would offer an easy and fast way to store data that is accessible through DataAccess.

User API

The user interface should be kept as simple and straightforward as possible.

As DataAccess will be used for accessing data, the basic functionality could be handled by two interface functions:

propertyCatalog& createContainer (const propertyCatalog& pc);
 pc - property catalog of the data to store
 returns property catalog of newly created container

createContainer will traverse pc twice:

  1. using the surveyor traverse to find out the structure and native types of the data, creating storage on the fly.
  2. using assignment, i.e. the viewer traverse of pc, to actually copy the data into the newly created container in natural format.

createContainer then returns the property catalog of the newly created container.
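
The two traversals can be sketched as follows. This is only an illustration under stated assumptions: Surveyor, Viewer, Catalog, Container and SampleRecord are hypothetical stand-ins invented for this sketch, not the DataAccess classes, and the sketch returns a std::unique_ptr instead of the propertyCatalog& proposed above to keep ownership obvious. It needs C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <variant>

// Hypothetical stand-ins for illustration; these are NOT the DataAccess classes.
enum class NativeType { Int, Double, String };
using Value = std::variant<int, double, std::string>;  // same order as NativeType

struct Surveyor {  // pass 1: structure and native types only, no data
    virtual ~Surveyor() = default;
    virtual void declare(const std::string& name, NativeType type) = 0;
};
struct Viewer {    // pass 2: the actual values
    virtual ~Viewer() = default;
    virtual void reveal(const std::string& name, const Value& value) = 0;
};
struct Catalog {
    virtual ~Catalog() = default;
    virtual void survey(Surveyor& s) const = 0;
    virtual void view(Viewer& v) const = 0;
};

// Generic container: one slot per property, held in the property's native type.
class Container : public Catalog {
public:
    void allocate(const std::string& name, NativeType type) {
        switch (type) {
        case NativeType::Int:    slots_[name] = 0; break;
        case NativeType::Double: slots_[name] = 0.0; break;
        case NativeType::String: slots_[name] = std::string(); break;
        }
    }
    void store(const std::string& name, const Value& value) {
        auto it = slots_.find(name);
        assert(it != slots_.end());                   // slot was created in pass 1
        assert(it->second.index() == value.index());  // native type must match
        it->second = value;
    }
    const Value& get(const std::string& name) const { return slots_.at(name); }
    std::size_t size() const { return slots_.size(); }

    // The container is itself a catalog, so clients can traverse it in turn.
    void survey(Surveyor& s) const override {
        for (const auto& [name, value] : slots_)
            s.declare(name, static_cast<NativeType>(value.index()));
    }
    void view(Viewer& v) const override {
        for (const auto& [name, value] : slots_) v.reveal(name, value);
    }
private:
    std::map<std::string, Value> slots_;
};

std::unique_ptr<Container> createContainer(const Catalog& pc) {
    auto container = std::make_unique<Container>();
    struct SlotMaker : Surveyor {
        Container& c;
        explicit SlotMaker(Container& c_) : c(c_) {}
        void declare(const std::string& n, NativeType t) override { c.allocate(n, t); }
    } maker(*container);
    struct SlotFiller : Viewer {
        Container& c;
        explicit SlotFiller(Container& c_) : c(c_) {}
        void reveal(const std::string& n, const Value& v) override { c.store(n, v); }
    } filler(*container);
    pc.survey(maker);  // traversal 1: create storage on the fly
    pc.view(filler);   // traversal 2: copy the data
    return container;
}

// Example source whose data lives in the application's own format.
struct SampleRecord : Catalog {
    double value = 1.5;
    int severity = 2;
    std::string units = "mm";
    void survey(Surveyor& s) const override {
        s.declare("value", NativeType::Double);
        s.declare("severity", NativeType::Int);
        s.declare("units", NativeType::String);
    }
    void view(Viewer& v) const override {
        v.reveal("value", value);
        v.reveal("severity", severity);
        v.reveal("units", units);
    }
};
```

Calling createContainer on a SampleRecord yields a container with three slots that hold copies of the record's values. Because Container implements the same catalog interface, a client could traverse it exactly as it would the original source.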

void disposeContainer (propertyCatalog& pc);
 pc - property catalog that was created by an earlier call to createContainer

disposeContainer will return the storage that was used by pc to the internal yard of unused storage.
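
One possible reading of the "yard" is a pool that keeps disposed storage around for the next create call. The sketch below is an assumption about how that could look; Yard, Storage, acquire and release are hypothetical names, and a real yard would probably sort blocks by type or size.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical storage unit; a real store would hold typed slots, not raw bytes.
struct Storage {
    std::vector<unsigned char> bytes;
};

// Yard of unused storage: disposed storage is parked here and handed out again.
class Yard {
public:
    // What createContainer might call: reuse parked storage if there is any.
    std::unique_ptr<Storage> acquire(std::size_t size) {
        std::unique_ptr<Storage> s;
        if (unused_.empty()) {
            s = std::make_unique<Storage>();
        } else {
            s = std::move(unused_.back());
            unused_.pop_back();
        }
        s->bytes.assign(size, 0);  // keeps the old capacity when it is large enough
        return s;
    }
    // What disposeContainer might call: park the storage instead of freeing it.
    void release(std::unique_ptr<Storage> s) {
        assert(s != nullptr);
        unused_.push_back(std::move(s));
    }
    std::size_t unusedCount() const { return unused_.size(); }
private:
    std::vector<std::unique_ptr<Storage>> unused_;
};
```

With this arrangement, an acquire that follows a release hands back the same Storage object, so a busy Gateway would not go to the heap for every container it creates.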

Reference Counting

In order to be used effectively in event queues or the Gateway, the Data Store might use reference counting to avoid creating multiple copies of data.

I'm not sure which is the better way to go:

  • implicit reference counting, using smart pointers with overloaded constructors and destructors, or
  • explicit reference counting, which requires the user to increment and decrement the count through explicit calls, but might be easier to debug.
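
To make the trade-off concrete, here is a minimal sketch of both styles. Payload, CountedPayload, reference and unreference are hypothetical names chosen for this illustration. The implicit variant simply uses std::shared_ptr; the explicit variant is not thread-safe as written and would need an atomic counter for use in event queues.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical payload type, standing in for a stored container.
struct Payload {
    std::vector<double> samples;
};

// Implicit counting: copying the smart pointer increments the count,
// destroying a copy decrements it, and the last owner frees the payload.
using SharedPayload = std::shared_ptr<const Payload>;

// Explicit counting: the user must pair every reference() with an unreference().
class CountedPayload {
public:
    static CountedPayload* create(std::vector<double> samples) {
        return new CountedPayload(std::move(samples));  // starts with count 1
    }
    void reference() { ++count_; }
    void unreference() {
        assert(count_ > 0);
        if (--count_ == 0) delete this;  // last user gone: free the storage
    }
    long count() const { return count_; }
    const std::vector<double>& samples() const { return samples_; }
private:
    explicit CountedPayload(std::vector<double> s)
        : samples_(std::move(s)), count_(1) {}
    ~CountedPayload() = default;  // private: only unreference() may destroy
    std::vector<double> samples_;
    long count_;
};
```

In the implicit style a forgotten release cannot happen, but the points where the count changes are hidden in copies and scope exits. In the explicit style every change is a visible call that can be logged or given a breakpoint, at the price of leaks or double frees when a call is missed.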

Implementation ideas

Storage optimization

The data store could employ any technique or mechanism to optimize the hell out of storing the data: epicsTypes, free lists of memory blocks sorted by natural type / size / color / smell, garbage collection, a relational database, a paper card reader, whatever.

  • A simple approach might be to implement a free list for each scalar primitive type and to store the properties of each PV, in their native types, on a linked list. That would take care of scalars. Arrays might be stored in non-contiguous fixed-size blocks on a linked list to improve efficiency. This could be viewed as an optimized version of GDD that saves space by using a pure virtual interface with different implementations for different types of data, which GDD did not do. With this approach, however, we would still have a linked list of properties for each PV, similar to GDD, so there would need to be some way to index properties efficiently within this linked list.
  • A more sophisticated approach might use non-contiguous fixed-size blocks to store all properties for a PV. Properties would be packed into these buffers using some sort of protocol. A header for each PV would provide an indexing scheme into the packed properties so that they could be extracted efficiently as needed. This approach might use less storage, and presumably could be just as fast when indexing properties.
  • An even more sophisticated technique would query the server for the "class name" of its PV when connecting. The gateway would then traverse all of the properties for that class of PV on that server and build only one indexing authority for that class of PV on that server. The individual PVs would still be packed into non-contiguous fixed-size blocks using a protocol, but the indexing authority would need to be stored only once for each "class" of PV.
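
The second and third ideas can be combined in a small sketch: properties are packed back to back into a chain of fixed-size blocks, and one index object per PV "class" records where each property lives. Everything here is an assumption made for illustration, including the names ClassIndex and PackedPv, the raw-byte packing "protocol", and the deliberately tiny block size that forces a property to straddle two blocks.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

constexpr std::size_t kBlockSize = 16;  // tiny on purpose, see lead-in
using Block = std::array<unsigned char, kBlockSize>;

// Indexing authority: built once per PV "class", shared by all PVs of that class.
class ClassIndex {
public:
    struct Entry { std::size_t offset, size; };
    void add(const std::string& name, std::size_t size) {
        entries_[name] = Entry{total_, size};  // properties are packed back to back
        total_ += size;
    }
    const Entry& find(const std::string& name) const { return entries_.at(name); }
    std::size_t totalSize() const { return total_; }
private:
    std::map<std::string, Entry> entries_;
    std::size_t total_ = 0;
};

// One PV: only the packed bytes, held in non-contiguous fixed-size blocks.
class PackedPv {
public:
    explicit PackedPv(std::shared_ptr<const ClassIndex> index)
        : index_(std::move(index)) {
        std::size_t n = (index_->totalSize() + kBlockSize - 1) / kBlockSize;
        for (std::size_t i = 0; i < n; ++i)
            blocks_.push_back(std::make_unique<Block>());  // zero-initialized
    }
    template <typename T> void set(const std::string& name, const T& v) {
        static_assert(std::is_trivially_copyable<T>::value, "raw-byte packing");
        const ClassIndex::Entry& e = index_->find(name);
        assert(sizeof(T) == e.size);
        const unsigned char* in = reinterpret_cast<const unsigned char*>(&v);
        for (std::size_t done = 0; done < e.size;) {  // may span several blocks
            std::size_t pos = e.offset + done;
            std::size_t n = std::min(e.size - done, kBlockSize - pos % kBlockSize);
            std::memcpy(blocks_[pos / kBlockSize]->data() + pos % kBlockSize, in + done, n);
            done += n;
        }
    }
    template <typename T> T get(const std::string& name) const {
        static_assert(std::is_trivially_copyable<T>::value, "raw-byte packing");
        const ClassIndex::Entry& e = index_->find(name);
        assert(sizeof(T) == e.size);
        T v;
        unsigned char* out = reinterpret_cast<unsigned char*>(&v);
        for (std::size_t done = 0; done < e.size;) {
            std::size_t pos = e.offset + done;
            std::size_t n = std::min(e.size - done, kBlockSize - pos % kBlockSize);
            std::memcpy(out + done, blocks_[pos / kBlockSize]->data() + pos % kBlockSize, n);
            done += n;
        }
        return v;
    }
    std::size_t blockCount() const { return blocks_.size(); }
private:
    std::shared_ptr<const ClassIndex> index_;  // shared, not copied per PV
    std::vector<std::unique_ptr<Block>> blocks_;
};
```

Two PVs of the same class would be constructed from the same std::shared_ptr<ClassIndex>, so the name-to-offset table exists once while each PV carries only its blocks. In the plain packed-block variant from the second bullet, each PV would own its own header instead of sharing one.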