Difference between revisions of "V4 Requirements for Standard Types"

From EPICSWIKI
 
(9 intermediate revisions by one other user not shown)
Earlier revision: text removed or reworded in the latest revision

The purpose of this page is to discuss and agree on a set of standard basic data types that will be supported throughout EPICS V4. As of 2005-05-19, the latest implementation of Data Access doesn't support all the types that the V4 database is proposed to support; we should try to converge on a common set.

epicsTypes.h

This header will include a set of typedefs (OS-dependent if necessary) for the basic types.

epicsOctet

An 8-bit character type which may be signed or unsigned depending on the particular platform. This is not intended to be used for numeric operations, just for storing and manipulating raw data bytes, including use for Unicode/UTF-8 encoded strings.

epicsString

A Unicode/UTF-8 encoded string.

Since this type does not map directly to any native C/C++ type, we're going to have to discuss the implementation and facilities we'll provide. Marty has a proposal for an interface that supports both segmented and contiguous buffer management and the requirements of character encoding conversions. I'll link to that proposal from here when it's ready for public consumption.

Below here, things become more speculative...

epicsEnum

A 16-bit index and an interface to convert between index values and choice strings; something like this perhaps?

class EpicsEnumInterface {
public:
    virtual ~EpicsEnumInterface() = 0;

    virtual epicsInt16 choices() const = 0;

    virtual epicsInt16 index(const EpicsString &choice) const = 0;
    virtual void choice(epicsInt16 index, EpicsString &choice) const = 0;
};

// A pure virtual destructor still needs a definition.
inline EpicsEnumInterface::~EpicsEnumInterface() {}

class EpicsEnum {
public:
    enum {invalid = -1};

    EpicsEnum() : pif(NULL), index(invalid) {}
    EpicsEnum(EpicsEnumInterface *pi) : pif(pi), index(invalid) {}
    EpicsEnum(EpicsEnumInterface *pi, epicsInt16 in) : pif(pi), index(in) {}
    virtual ~EpicsEnum();

    void interface(EpicsEnumInterface *pif);
    EpicsEnumInterface *interface() const;

    epicsInt16 choices() const;

    epicsInt16 get() const { return index; }
    void get(EpicsString &state) const;

    void put(epicsInt16 index);
    void put(const EpicsString &state);

protected:
    EpicsEnumInterface *pif;
    epicsInt16 index;
};

epicsBits

Some way to store a collection of named bits.
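The epicsEnum proposal in the earlier revision defines only the abstract interface. As a hypothetical sketch of what a concrete interface might look like, the class below (the name ListEnumInterface is made up) maps a fixed list of choice strings to indices. It assumes std::string in place of EpicsString, gives the destructor an empty body instead of making it pure virtual, and treats an out-of-range index as yielding an empty string; none of these choices come from the proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

typedef short epicsInt16;          // as proposed on this page
typedef std::string EpicsString;   // assumption: std::string stands in for EpicsString

// The interface from the proposal (destructor given an empty body instead of "= 0").
class EpicsEnumInterface {
public:
    virtual ~EpicsEnumInterface() {}
    virtual epicsInt16 choices() const = 0;
    virtual epicsInt16 index(const EpicsString &choice) const = 0;
    virtual void choice(epicsInt16 index, EpicsString &choice) const = 0;
};

// Hypothetical implementation backed by a fixed list of choice strings.
class ListEnumInterface : public EpicsEnumInterface {
public:
    explicit ListEnumInterface(std::vector<EpicsString> names) : names_(std::move(names)) {}

    epicsInt16 choices() const override { return static_cast<epicsInt16>(names_.size()); }

    // Linear search; returns -1 (EpicsEnum::invalid in the proposal) for an unknown string.
    epicsInt16 index(const EpicsString &choice) const override {
        for (std::size_t i = 0; i < names_.size(); ++i)
            if (names_[i] == choice) return static_cast<epicsInt16>(i);
        return -1;
    }

    // Assumption: an out-of-range index yields an empty string.
    void choice(epicsInt16 index, EpicsString &choice) const override {
        if (index >= 0 && index < choices())
            choice = names_[static_cast<std::size_t>(index)];
        else
            choice.clear();
    }

private:
    std::vector<EpicsString> names_;
};
```

An EpicsEnum holding a pointer to such an object would then only need to store the 16-bit index, delegating string conversion to the shared interface.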

Latest revision as of 11:43, 8 July 2005

The purpose of this page is to discuss the requirements for the standard data types that will be used and supported throughout EPICS V4.

Numeric Types

epicsBoolean

A boolean type.

typedef bool epicsBoolean;

epicsInt16

A signed 16-bit integer type.

typedef short epicsInt16;

epicsInt32

A signed 32-bit integer type.

typedef int epicsInt32;

epicsInt64

A signed 64-bit integer type.

typedef long long epicsInt64;

On some architectures, this may need to be defined as:

typedef long epicsInt64;

epicsFloat32

A 32-bit IEEE floating-point numeric type.

typedef float epicsFloat32;

epicsFloat64

A 64-bit IEEE floating-point numeric type.

typedef double epicsFloat64;
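Because short, int and long long only have minimum widths in C and C++, a port could silently get a type of the wrong size. As a sketch, assuming a C++11 compiler (which postdates this 2005 page), the typedefs above can be paired with compile-time checks so that such a port fails to build instead:

```cpp
#include <climits>
#include <limits>

// The typedefs proposed above.
typedef bool      epicsBoolean;
typedef short     epicsInt16;
typedef int       epicsInt32;
typedef long long epicsInt64;   // some platforms may need: typedef long epicsInt64;
typedef float     epicsFloat32;
typedef double    epicsFloat64;

// Compile-time checks that the native types have the documented widths on this platform.
static_assert(sizeof(epicsInt16) * CHAR_BIT == 16, "epicsInt16 must be exactly 16 bits");
static_assert(sizeof(epicsInt32) * CHAR_BIT == 32, "epicsInt32 must be exactly 32 bits");
static_assert(sizeof(epicsInt64) * CHAR_BIT == 64, "epicsInt64 must be exactly 64 bits");
static_assert(std::numeric_limits<epicsInt16>::is_signed &&
              std::numeric_limits<epicsInt32>::is_signed &&
              std::numeric_limits<epicsInt64>::is_signed,
              "integer types must be signed");
// is_iec559 is true only when the type follows IEEE 754.
static_assert(std::numeric_limits<epicsFloat32>::is_iec559 &&
              sizeof(epicsFloat32) * CHAR_BIT == 32,
              "epicsFloat32 must be IEEE 754 single precision");
static_assert(std::numeric_limits<epicsFloat64>::is_iec559 &&
              sizeof(epicsFloat64) * CHAR_BIT == 64,
              "epicsFloat64 must be IEEE 754 double precision");
```

Where C++11's <cstdint> provides them, std::int16_t, std::int32_t and std::int64_t are exact-width alternatives that would remove the per-architecture choice between long long and long.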


Character Types

epicsOctet

An 8-bit character type. This is not intended to be used for numeric operations, just for storing and manipulating raw data bytes, including use for Unicode/UTF-8 encoded strings.

typedef char epicsOctet;
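Since epicsOctet is meant for raw bytes such as UTF-8 data, byte-level code should not depend on whether plain char is signed, which C and C++ leave to the platform. A small sketch (the function name utf8CodePoints is hypothetical) counts code points in well-formed UTF-8 by skipping continuation bytes, whose bit pattern is 10xxxxxx, after converting each octet to unsigned char:

```cpp
#include <cassert>
#include <cstddef>

typedef char epicsOctet;  // as proposed above

// Count Unicode code points in a well-formed UTF-8 byte sequence.
// Every byte that is not a continuation byte (10xxxxxx) starts a new code point.
std::size_t utf8CodePoints(const epicsOctet *bytes, std::size_t count) {
    assert(bytes != nullptr || count == 0);
    std::size_t n = 0;
    for (std::size_t i = 0; i < count; ++i) {
        // Convert first so the bit test works whether plain char is signed or not.
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        if ((b & 0xC0u) != 0x80u) ++n;
    }
    return n;
}
```

Malformed input is not detected here; a validating decoder would also check lead-byte patterns and sequence lengths.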

EpicsString

A Unicode/UTF-8 encoded string.

Requirements for EpicsString

To be adopted widely, an EpicsString should be usable like any regular C++ type, although we don't need it to be anywhere near as rich as the C++ std::string class. This is my list of requirements for EpicsString:

Ben: Let me compare this to what I think requirements should be and also to what Cords provide.

  1. Must provide a default constructor so instances can be created without any initialization parameters.
    Ben: Yes. For Cords the default is the empty string.
  2. Must be able to support different kinds of underlying buffer storage, such as:
    • Read only - immutable, used to hold string literals.
      Ben: Yes. Cords can do that.
      • The constructor EpicsString(const char *) should create a string that just stores the given pointer as its buffer.
        Ben: Yes.
      • Attempts to modify the string will probably throw an exception.
        Ben: No. Strings should not provide operations to change an existing string's data. Instead, a new string should be created.
      • On some architectures overwriting a string literal causes a SEGV, so this protection is worth doing to avoid crashes.
    • Fixed capacity - mutable and contiguous, for things like record names that don't get changed after being set.
      Ben: Why use a mutable string if you say they "don't get changed after being set"?
      • The character buffer is intended to be allocated just once.
      • The string contents can be modified, but cannot be extended beyond the allocated capacity.
        Modifications are bad, because other parts of the IOC might refer to the data and don't expect it to change. Why should things like record names be modifiable?
    • Null-terminated contiguous - for interfacing with standard C routines such as printf()
      Ben: Cords can be easily converted to standard C strings. However, it makes much more sense to output and convert Cords with their own methods. For instance, you can put a Cord into a (writable, open) file just like that. It is much more efficient.
      • This should probably be derived from the fixed capacity implementation.
        Ben: Already built into Cords.
    • Variable capacity - mutable and segmented, for strings that might change quite often.
      Ben: Mutability is hell, because it eliminates all possibilities to share data (which makes a lot of sense if you have segmented data storage). And also because you need to lock everything against concurrent access. Much better to just create a new string each time.
      • Uses a freelist to manage its buffer as a series of string segments.
  3. Strings must be assignable (if the target string is mutable).
    Ben: Assignability and mutability are orthogonal. So-called "immutable" strings (like Cords) can still support assignment. What is immutable in them is the data inside the string. An assignment to a Cord is a pointer assignment; the data remains constant.
  4. Assignment never changes the target string's buffer type or capacity; it just copies character data from the source to the target string.
    Ben: And with Cords it is not even data copying, but just copying a pointer.
  5. We need to provide access to the underlying character data in a unified way that will work for all buffer types.
    Ben: See what mutability gives you? To use it efficiently, you can't even hide the internal structure.
    • We've designed an API that works for segmented strings, which the other types can also use.
  6. The EpicsString class must implement a comparison function that works between segmented and non-segmented strings.
    Ben: Cords provide that too.
  7. Strings should also be comparable using operator == and operator !=.
    • However, these operators will be non-member functions that take two const EpicsString& parameters; this permits the LHS to undergo type conversion from a literal.
      Ben: Easy to do if we create a simple and thin C++ layer on top of the C Cords.
  8. We're probably not going to provide less-than or greater-than comparisons because the correct ordering of characters is language specific.
    • However this makes it impossible to do binary searches or create tree structures.
    • We might want to revisit this point later.
  9. This list is not yet complete...
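To make requirements 1, 2 (read-only storage), 6 and 7 concrete, here is a hypothetical sketch; it is neither Marty's interface nor the Cords design. A string is a list of (pointer, length) segments, a literal becomes a single segment that just stores the given pointer, comparison walks both segment lists without flattening them, and == and != are non-member functions so a literal on the left converts implicitly. Ownership of segment storage (freelists, fixed-capacity buffers) is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical sketch, not an agreed EPICS interface: a string is a list of
// (pointer, length) segments; a contiguous or read-only string has one segment.
class EpicsString {
public:
    struct Segment {
        const char *data;
        std::size_t length;
    };

    EpicsString() = default;                  // requirement 1: the empty string

    EpicsString(const char *literal) {        // requirement 2: read-only, stores the pointer
        assert(literal != nullptr);           // the literal must outlive this object
        segments_.push_back(Segment{literal, std::strlen(literal)});
    }

    // Segmented string built from existing pieces; storage ownership is out of scope.
    explicit EpicsString(std::vector<Segment> segs) : segments_(std::move(segs)) {}

    const std::vector<Segment> &segments() const { return segments_; }

private:
    std::vector<Segment> segments_;
};

// Requirement 6: true if a and b hold the same bytes, however each is segmented.
inline bool sameContents(const EpicsString &a, const EpicsString &b) {
    const std::vector<EpicsString::Segment> &sa = a.segments();
    const std::vector<EpicsString::Segment> &sb = b.segments();
    std::size_t ia = 0, oa = 0;  // segment index and offset within it, for a
    std::size_t ib = 0, ob = 0;  // the same for b
    for (;;) {
        // Step past exhausted (or empty) segments on each side.
        while (ia < sa.size() && oa == sa[ia].length) { ++ia; oa = 0; }
        while (ib < sb.size() && ob == sb[ib].length) { ++ib; ob = 0; }
        const bool endA = (ia == sa.size());
        const bool endB = (ib == sb.size());
        if (endA || endB) return endA && endB;  // equal only if both end together
        if (sa[ia].data[oa] != sb[ib].data[ob]) return false;
        ++oa;
        ++ob;
    }
}

// Requirement 7: non-member operators, so "literal" != str converts the left operand.
inline bool operator==(const EpicsString &a, const EpicsString &b) { return sameContents(a, b); }
inline bool operator!=(const EpicsString &a, const EpicsString &b) { return !(a == b); }
```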

The above requirements ensure that the following code snippets (or something very like these) will do what it looks like they should do:

EpicsString hello = "World!";  // create a readonly string
EpicsString msg(...);          // create a variable length string
msg = hello;                   // copy the readonly data
if (msg == hello) { ... }      // compare strings
if ("hello" != hello) { ... }  // implicit type conversion

Ben: With Cords it would look like this:

EpicsString hello = "World!";  // create a string represented by a (null-terminated) char array;
                               // no data is copied.
EpicsString msg;               // create an empty string
msg = hello;                   // no copying: data is shared between 'msg' and 'hello'
if (msg == hello) { ... }      // compare strings
if ("hello" != hello) { ... }  // implicit type conversion

Note that you didn't show the ugly part: how is a string modified? Cords support easy and efficient concat and substring operations. All these operations construct a new string that (usually, internally) shares most of its data with the argument strings.

EpicsString hw = concat("Hello, ", hello); // no data is copied
msg = substr(hw, 3, 8);        // take a substring, starting at position 3, with length 8;
                               // data is probably copied, because the strings are very small
assert(msg == "lo, Worl");

There are also forward and backward traversal operations that take appropriate function arguments. For details see http://www.hpl.hp.com/personal/Hans_Boehm/gc/gc_source/cordh.txt

Cords are not only space and time efficient and easy to program with. They can also be unrestrictedly shared between threads without any need for locking.
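Ben's description of Cords (immutable strings where assignment copies a pointer, and concat and substring build new strings that share data with their arguments) can be illustrated with a small rope. This is a hypothetical C++ sketch built on std::shared_ptr, not the Boehm cord library or its API; the names RopeString, concat and substr are illustrative, its const char * constructor copies the characters once (unlike a Cord wrapping a C string), and equality simply flattens both strings, which a real implementation would avoid by traversing the trees.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical immutable, Cord-like string; NOT the Boehm CORD API.
class RopeString {
    struct Node {
        std::shared_ptr<const std::string> buf;   // leaf: shared character data
        std::size_t offset = 0;                   // leaf: start of this leaf's window in buf
        std::shared_ptr<const Node> left, right;  // interior node: the two halves
        std::size_t length = 0;                   // characters covered by this node
    };
    std::shared_ptr<const Node> root_;            // null for the empty string

    explicit RopeString(std::shared_ptr<const Node> n) : root_(std::move(n)) {}

    // Substring of a subtree; shares leaf buffers instead of copying characters.
    static RopeString slice(const std::shared_ptr<const Node> &n, std::size_t pos, std::size_t len) {
        if (pos == 0 && len == n->length) return RopeString(n);  // whole node: share it
        if (!n->left) {                                           // leaf: narrow the window
            auto m = std::make_shared<Node>();
            m->buf = n->buf;
            m->offset = n->offset + pos;
            m->length = len;
            return RopeString(std::move(m));
        }
        std::size_t ll = n->left->length;
        if (pos + len <= ll) return slice(n->left, pos, len);
        if (pos >= ll) return slice(n->right, pos - ll, len);
        return concat(slice(n->left, pos, ll - pos), slice(n->right, 0, pos + len - ll));
    }

    static void flatten(const Node *n, std::string &out) {
        if (!n) return;
        if (!n->left) { out.append(*n->buf, n->offset, n->length); return; }
        flatten(n->left.get(), out);
        flatten(n->right.get(), out);
    }

public:
    RopeString() = default;                       // the empty string
    RopeString(const char *s) {                   // copies s once into a shared leaf
        auto n = std::make_shared<Node>();
        n->buf = std::make_shared<std::string>(s);
        n->length = n->buf->size();
        root_ = std::move(n);
    }

    std::size_t size() const { return root_ ? root_->length : 0; }

    std::string str() const {                     // flatten, e.g. for C-style output
        std::string out;
        out.reserve(size());
        flatten(root_.get(), out);
        return out;
    }

    // A new string whose tree points at both operands; no characters are copied.
    friend RopeString concat(const RopeString &a, const RopeString &b) {
        if (a.size() == 0) return b;
        if (b.size() == 0) return a;
        auto n = std::make_shared<Node>();
        n->left = a.root_;
        n->right = b.root_;
        n->length = a.size() + b.size();
        return RopeString(std::move(n));
    }

    // Characters [pos, pos + len), clamped to the end of s.
    friend RopeString substr(const RopeString &s, std::size_t pos, std::size_t len) {
        if (pos >= s.size() || len == 0) return RopeString();
        if (len > s.size() - pos) len = s.size() - pos;
        return slice(s.root_, pos, len);
    }
};

// Non-member comparisons, so a literal on either side converts implicitly.
inline bool operator==(const RopeString &a, const RopeString &b) {
    return a.size() == b.size() && a.str() == b.str();
}
inline bool operator!=(const RopeString &a, const RopeString &b) { return !(a == b); }
```

Because nodes are never modified after construction, several threads can read the same RopeString concurrently, and shared_ptr's reference counting is itself thread-safe; concurrent writes to one RopeString object (for example, assigning to it from two threads) would still need synchronization.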