V4 Requirements for Standard Types

The purpose of this page is to discuss the requirements for the standard data types that will be used and supported throughout EPICS V4.

Numeric Types

epicsBoolean

A boolean type.

typedef bool epicsBoolean;

epicsInt16

A signed 16-bit integer type.

typedef short epicsInt16;

epicsInt32

A signed 32-bit integer type.

typedef int epicsInt32;

epicsInt64

A signed 64-bit integer type.

typedef long long epicsInt64;

On some architectures, this may need to be defined as:

typedef long epicsInt64;

epicsFloat32

A 32-bit IEEE floating-point numeric type.

typedef float epicsFloat32;

epicsFloat64

A 64-bit IEEE floating-point numeric type.

typedef double epicsFloat64;
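
The typedefs above rely on the compiler's widths for short, int, long long, float and double, which the C++ standard does not fix. A minimal sketch of compile-time checks that would catch a mismatching architecture follows. It assumes a C++11 compiler (for static_assert) and repeats the typedefs only so that the block stands alone; the checks themselves are an illustration, not part of the proposal.

```cpp
#include <climits>
#include <limits>

// Typedefs repeated from the text above so this block is self-contained.
typedef bool      epicsBoolean;
typedef short     epicsInt16;
typedef int       epicsInt32;
typedef long long epicsInt64;
typedef float     epicsFloat32;
typedef double    epicsFloat64;

// Assumption: the target uses 8-bit bytes.
static_assert(CHAR_BIT == 8, "8-bit bytes expected");

// Integer widths and signedness.
static_assert(sizeof(epicsInt16) == 2, "epicsInt16 must be 16 bits");
static_assert(sizeof(epicsInt32) == 4, "epicsInt32 must be 32 bits");
static_assert(sizeof(epicsInt64) == 8, "epicsInt64 must be 64 bits");
static_assert(std::numeric_limits<epicsInt16>::is_signed, "epicsInt16 must be signed");
static_assert(std::numeric_limits<epicsInt32>::is_signed, "epicsInt32 must be signed");
static_assert(std::numeric_limits<epicsInt64>::is_signed, "epicsInt64 must be signed");

// Floating-point widths and IEEE (IEC 559) representation.
static_assert(sizeof(epicsFloat32) == 4, "epicsFloat32 must be 32 bits");
static_assert(sizeof(epicsFloat64) == 8, "epicsFloat64 must be 64 bits");
static_assert(std::numeric_limits<epicsFloat32>::is_iec559, "epicsFloat32 must be IEEE");
static_assert(std::numeric_limits<epicsFloat64>::is_iec559, "epicsFloat64 must be IEEE");
```

If one of these assertions fails on a given architecture, that is the place where an alternative typedef (such as the long variant of epicsInt64 mentioned above) would be selected.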


Character Types

epicsOctet

An 8-bit character type. This is not intended to be used for numeric operations, just for storing and manipulating raw data bytes, including use for Unicode/UTF-8 encoded strings.

typedef char epicsOctet;

EpicsString

A Unicode/UTF-8 encoded string.

Requirements for EpicsString

To be adopted widely, an EpicsString should be usable like any regular C++ type, although we don't need it to be anywhere near as rich as the C++ std::string class. This is my list of requirements for EpicsString:

Ben: Let me compare this to what I think requirements should be and also to what Cords provide.

  1. Must provide a default constructor so instances can be created without any initialization parameters.
    Ben: Yes. For Cords the default is the empty string.
  2. Must be able to support different kinds of underlying buffer storage, such as:
    • Read only - immutable, used to hold string literals.
      Ben: Yes. Cords can do that.
      • The constructor EpicsString(const char *) should create a string that just stores the given pointer as its buffer.
        Ben: Yes.
      • Attempts to modify the string will probably throw an exception.
        Ben: No. Strings should not provide operations to change an existing string's data. Instead, a new string should be created.
      • On some architectures overwriting a string literal causes a SEGV, so this protection is worth doing to avoid crashes.
    • Fixed capacity - mutable and contiguous, for things like record names that don't get changed after being set.
      Ben: Why use a mutable string if you say they "don't get changed after being set"?
      • The character buffer is intended to be allocated just once.
      • The string contents can be modified, but cannot be extended beyond the allocated capacity.
        Ben: Modifications are bad, because other parts of the IOC might refer to the data and don't expect it to change. Why should things like record names be modifiable?
    • Null-terminated contiguous - for interfacing with standard C routines such as printf()
      Ben: Cords can be easily converted to standard C strings. However, it makes much more sense to output and convert Cords with their own methods. For instance, you can put a Cord into a (writable, open) file just like that. It is much more efficient.
      • This should probably be derived from the fixed capacity implementation.
        Ben: Already built into Cords.
    • Variable capacity - mutable and segmented, for strings that might change quite often.
      Ben: Mutability is hell, because it eliminates all possibilities to share data (which makes a lot of sense if you have segmented data storage). And also because you need to lock everything against concurrent access. Much better to just create a new string each time.
      • Uses a freelist to manage its buffer as a series of string segments.
  3. Strings must be assignable (if the target string is mutable).
    Ben: Assignability and mutability are orthogonal. So-called "immutable" strings (like Cords) can still support assignment. What is immutable in them is the data inside the string. An assignment to a Cord is a pointer assignment, the data remains constant.
  4. Assignment never changes the target string's buffer type or capacity, it just copies character data from the source to the target string.
    Ben: And with Cords it is not even data copying, but just copying a pointer.
  5. We need to provide access to the underlying character data in a unified way that will work for all buffer types.
    Ben: See what mutability gives you? To use it efficiently, you can't even hide the internal structure.
    • We've designed an API that works for segmented strings, which the other types can also use.
  6. The EpicsString class must implement a comparison function that works between segmented and non-segmented strings
    Ben: Cords provide that too.
  7. Strings should also be comparable using operator == and operator !=.
    • However, these operators will be non-member functions that take two const EpicsString& parameters; this permits the LHS to undergo type conversion from a literal.
      Ben: Easy to do if we create a simple and thin C++ layer on top of the C Cords.
  8. We're probably not going to provide less-than or greater-than comparisons because the correct ordering of characters is language specific.
    • However this makes it impossible to do binary searches or create tree structures.
    • We might want to revisit this point later.
  9. This list is not yet complete...
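
To make items 2 to 4 more concrete, here is a minimal sketch of two of the buffer kinds (read-only and fixed capacity) together with an assignment that copies character data without changing the target's buffer type or capacity. All names here (SketchBuffer, ReadOnlyBuffer, FixedBuffer, copyInto) are hypothetical and chosen for illustration; the segmented, variable-capacity kind is left out, and the choice of exceptions is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>

// Hypothetical common interface for the buffer kinds in item 2.
class SketchBuffer {
public:
    virtual ~SketchBuffer() {}
    virtual const char *data() const = 0;
    virtual std::size_t length() const = 0;
    // Replace the contents; throws if this buffer cannot hold them.
    virtual void assign(const char *src, std::size_t n) = 0;
};

// Read-only: stores the literal's pointer and refuses modification.
class ReadOnlyBuffer : public SketchBuffer {
public:
    explicit ReadOnlyBuffer(const char *literal)
        : ptr_(literal), len_(std::strlen(literal)) {}
    const char *data() const { return ptr_; }
    std::size_t length() const { return len_; }
    void assign(const char *, std::size_t) {
        throw std::logic_error("read-only string buffer");
    }
private:
    const char *ptr_;
    std::size_t len_;
};

// Fixed capacity: allocated once, null-terminated, never grows.
class FixedBuffer : public SketchBuffer {
public:
    explicit FixedBuffer(std::size_t capacity)
        : buf_(new char[capacity + 1]), cap_(capacity), len_(0) { buf_[0] = '\0'; }
    ~FixedBuffer() { delete[] buf_; }
    const char *data() const { return buf_; }
    std::size_t length() const { return len_; }
    void assign(const char *src, std::size_t n) {
        if (n > cap_) throw std::length_error("exceeds fixed capacity");
        std::memcpy(buf_, src, n);
        buf_[n] = '\0';
        len_ = n;
    }
private:
    FixedBuffer(const FixedBuffer &);             // non-copyable in this sketch
    FixedBuffer &operator=(const FixedBuffer &);
    char *buf_;
    std::size_t cap_;
    std::size_t len_;
};

// Item 4: assignment copies characters; the target keeps its type and capacity.
void copyInto(SketchBuffer &target, const SketchBuffer &source) {
    target.assign(source.data(), source.length());
}
```

With this sketch, copying a read-only buffer into a fixed-capacity one succeeds when the data fits, while copying into a read-only buffer throws instead of overwriting a string literal.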

The above requirements ensure that the following code snippets (or something very like these) will do what it looks like they should do:

EpicsString hello = "World!";  // create a readonly string
EpicsString msg(...);          // create a variable length string
msg = hello;                   // copy the readonly data
if (msg == hello) { ... }      // compare strings
if ("hello" != hello) { ... }  // implicit type conversion
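
The last line of that snippet depends on item 7: because the comparison operators are non-member functions, a literal on the left-hand side can be converted implicitly. A minimal sketch of that mechanism follows. SketchString is a hypothetical stand-in that only wraps a pointer to a literal; it does not model the buffer kinds from item 2, and its assignment copies a pointer rather than character data.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for EpicsString, only to show operator lookup.
class SketchString {
public:
    SketchString() : data_("") {}
    // Deliberately not 'explicit', so literals convert implicitly.
    SketchString(const char *literal) : data_(literal) {}
    const char *c_str() const { return data_; }
private:
    const char *data_;
};

// Non-member operators: either operand may be converted from a literal.
inline bool operator==(const SketchString &lhs, const SketchString &rhs) {
    return std::strcmp(lhs.c_str(), rhs.c_str()) == 0;
}
inline bool operator!=(const SketchString &lhs, const SketchString &rhs) {
    return !(lhs == rhs);
}

// Mirrors the snippet above using the stand-in class.
inline void sketchStringExample() {
    SketchString hello = "World!";
    SketchString msg;
    msg = hello;               // copies only the pointer in this sketch
    assert(msg == hello);      // both operands are SketchString
    assert("hello" != hello);  // LHS literal is converted to SketchString
}
```

Had operator!= been a member function, the expression "hello" != hello would not compile, because member lookup never converts the left-hand operand.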

Ben: With Cords it would look like this:

EpicsString hello = "World!";  // create a string represented by a (null-terminated) char array;
                               // no data is copied.
EpicsString msg;               // create an empty string
msg = hello;                   // no copying: data is shared between 'msg' and 'hello'
if (msg == hello) { ... }      // compare strings
if ("hello" != hello) { ... }  // implicit type conversion

Note, you didn't show the ugly part, i.e. how is a string modified? Cords support easy and efficient concat and substring operations. All these operations construct a new string that (usually, internally) shares most of its data with the argument strings.

EpicsString hw = concat("Hello, ", hello); // no data is copied
msg = substr(hw, 3, 8);        // take a substring, starting at position 3, with length 8;
                               // data is probably copied, because the strings are very small
assert(msg == "lo, Worl");

There are also forward and backward traversal operations that take appropriate function arguments. For details see http://www.hpl.hp.com/personal/Hans_Boehm/gc/gc_source/cordh.txt

Cords are not only space- and time-efficient and easy to program with; they can also be shared freely between threads without any need for locking.
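
The sharing idea behind this argument can be sketched in a few lines of C++. The class below is a hypothetical, much simplified rope and is not the Cords API from cord.h: SketchCord, concat and str are illustrative names, substr is omitted, and the comparison simply flattens both sides. It assumes C++11 and that leaves are built from string literals, so only pointers are stored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical, simplified rope. This is NOT the Cords API from cord.h.
class SketchCord {
public:
    SketchCord() {}  // empty string, no node allocated

    // Assumption: 's' is a literal (or outlives the cord); only the pointer is kept.
    SketchCord(const char *s) {
        std::size_t n = std::strlen(s);
        if (n > 0) {
            std::shared_ptr<Node> leaf = std::make_shared<Node>();
            leaf->text = s;
            leaf->len = n;
            node_ = leaf;
        }
    }

    std::size_t length() const { return node_ ? node_->len : 0; }

    // Flatten into a contiguous std::string, e.g. for passing to printf().
    std::string str() const {
        std::string out;
        out.reserve(length());
        appendTo(node_.get(), out);
        return out;
    }

    friend SketchCord concat(const SketchCord &a, const SketchCord &b);

private:
    struct Node {
        const char *text = nullptr;        // set for leaf nodes only
        std::shared_ptr<const Node> left;  // set for concatenation nodes only
        std::shared_ptr<const Node> right;
        std::size_t len = 0;
    };

    static void appendTo(const Node *n, std::string &out) {
        if (!n) return;
        if (n->text) { out.append(n->text, n->len); return; }
        appendTo(n->left.get(), out);
        appendTo(n->right.get(), out);
    }

    std::shared_ptr<const Node> node_;  // nodes are never modified once built
};

// Builds one new node that points at both operands; no characters are copied.
SketchCord concat(const SketchCord &a, const SketchCord &b) {
    if (!a.node_) return b;
    if (!b.node_) return a;
    std::shared_ptr<SketchCord::Node> n = std::make_shared<SketchCord::Node>();
    n->left = a.node_;
    n->right = b.node_;
    n->len = a.node_->len + b.node_->len;
    SketchCord result;
    result.node_ = n;
    return result;
}

// Simple comparison that flattens both sides; a real rope would compare in place.
bool operator==(const SketchCord &a, const SketchCord &b) {
    return a.length() == b.length() && a.str() == b.str();
}
```

Assigning one SketchCord to another copies a single shared pointer, and since nodes are never modified after construction, several threads can read the same data without taking a lock.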