This chapter covers information that is relevant to all the functions specified in System Interfaces and XBD Headers.
Each of the following statements shall apply to all functions unless explicitly stated otherwise in the detailed descriptions that follow:
If an argument to a function has an invalid value (such as a value outside the domain of the function, or a pointer outside the address space of the program, or a null pointer), the behavior is undefined.
Any function declared in a header may also be implemented as a macro defined in the header, so a function should not be declared explicitly if its header is included. Any macro definition of a function can be suppressed locally by enclosing the name of the function in parentheses, because the name is then not followed by the <left-parenthesis> that indicates expansion of a macro function name. For the same syntactic reason, it is permitted to take the address of a function even if it is also defined as a macro. The use of the C-language #undef construct to remove any such macro definition shall also ensure that an actual function is referred to.
Any invocation of a function that is implemented as a macro shall expand to code that evaluates each of its arguments exactly once, fully protected by parentheses where necessary, so it is generally safe to use arbitrary expressions as arguments.
Provided that a function can be declared without reference to any type defined in a header, it is also permissible to declare the function explicitly and use it without including its associated header.
If a function that accepts a variable number of arguments is not declared (explicitly or by including its associated header), the behavior is undefined.
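As a rough illustration of the rules above for functions that may also be macros, the following sketch uses toupper() from <ctype.h>. The wrapper function names are hypothetical and exist only for this example.

```c
#include <ctype.h>

/* Parenthesizing the name suppresses any macro definition of toupper,
   because the name is no longer followed directly by '('. */
int upper_via_function(int c)
{
    return (toupper)(c);
}

/* The bare name (without a call) designates the function, so its
   address can be taken even if a macro of the same name exists. */
int upper_via_pointer(int c)
{
    int (*fn)(int) = toupper;
    return fn(c);
}

/* After #undef, an ordinary call refers to the actual function.
   The #undef comes after every #include directive. */
#undef toupper
int upper_after_undef(int c)
{
    return toupper(c);
}
```

All three wrappers reach the same function; only the way the macro is bypassed differs.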
Each of the following statements shall apply to all macros unless explicitly stated otherwise:
Any definition of an object-like macro in a header shall expand to code that is fully protected by parentheses where necessary, so that it groups in an arbitrary expression as if it were a single identifier.
All object-like macros listed as expanding to integer constant expressions shall additionally be suitable for use in #if preprocessing directives.
Any definition of a function-like macro in a header shall expand to code that evaluates each of its arguments exactly once, fully protected by parentheses where necessary, so that it is generally safe to use arbitrary expressions as arguments.
Any definition of a function-like macro in a header can be invoked in an expression anywhere a function with a compatible return type could be called.
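The sketch below shows what these guarantees allow an application to write. It assumes CHAR_BIT from <limits.h> as an example of an integer constant expression used in #if, and isdigit() from <ctype.h> as an interface that may be provided as a macro. The function name count_leading_digits is hypothetical.

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/* An object-like macro that expands to an integer constant expression
   can be tested directly by the preprocessor. */
#if CHAR_BIT != 8
#error "this sketch assumes 8-bit bytes"
#endif

/* isdigit may be a macro. Because each argument is evaluated exactly
   once, the side effect in *s++ happens once per iteration. */
size_t count_leading_digits(const char *s)
{
    size_t n = 0;

    while (isdigit((unsigned char)*s++))
        n++;
    return n;
}
```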
Certain symbols in this volume of POSIX.1-2017 are defined in headers (see XBD Headers). Some of those headers could also define symbols other than those defined by POSIX.1-2017, potentially conflicting with symbols used by the application. Also, POSIX.1-2017 defines symbols that are not permitted by other standards to appear in those headers without some control on the visibility of those symbols.
Symbols called "feature test macros" are used to control the visibility of symbols that might be included in a header. Implementations, future versions of this standard, and other standards may define additional feature test macros.
In the compilation of an application that #defines a feature test macro specified by POSIX.1-2017, no header defined by POSIX.1-2017 shall be included prior to the definition of the feature test macro. This restriction also applies to any implementation-provided header in which these feature test macros are used. If the definition of the macro does not precede the #include, the result is undefined.
Feature test macros shall begin with the <underscore> character ( '_' ).
A POSIX-conforming application shall ensure that the feature test macro _POSIX_C_SOURCE is defined before inclusion of any header.
When an application includes a header described by POSIX.1-2017, and when this feature test macro is defined to have the value 200809L:
All symbols required by POSIX.1-2017 to appear when the header is included shall be made visible.
Symbols that are explicitly permitted, but not required, by POSIX.1-2017 to appear in that header (including those in reserved name spaces) may be made visible.
Additional symbols not required or explicitly permitted by POSIX.1-2017 to be in that header shall not be made visible, except when enabled by another feature test macro.
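A minimal sketch of the required ordering follows: the feature test macro is defined first, and only then are headers included. The function names are hypothetical. strnlen() is used as an example of a symbol that <string.h> makes visible under this setting.

```c
/* The feature test macro comes before the first #include. */
#define _POSIX_C_SOURCE 200809L

#include <string.h>
#include <unistd.h>

/* strnlen() is declared by <string.h> when _POSIX_C_SOURCE is 200809L. */
size_t bounded_length(const char *s, size_t limit)
{
    return strnlen(s, limit);
}

/* _POSIX_VERSION is provided by <unistd.h> and identifies the version
   of the standard that the implementation supports. */
long supported_posix_version(void)
{
    return (long)_POSIX_VERSION;
}
```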
Identifiers in POSIX.1-2017 may only be undefined using the #undef directive as described in Use and Implementation of Interfaces or The Name Space. These #undef directives shall follow all #include directives of any header in POSIX.1-2017.
[XSI] An XSI-conforming application shall ensure that the feature test macro _XOPEN_SOURCE is defined with the value 700 before inclusion of any header. This is needed to enable the functionality described in The _POSIX_C_SOURCE Feature Test Macro and to ensure that the XSI option is enabled.
Since this volume of POSIX.1-2017 is aligned with the ISO C standard, and since all functionality enabled by _POSIX_C_SOURCE set equal to 200809L is enabled by _XOPEN_SOURCE set equal to 700, there should be no need to define _POSIX_C_SOURCE if _XOPEN_SOURCE is so defined. Therefore, if _XOPEN_SOURCE is set equal to 700 and _POSIX_C_SOURCE is set equal to 200809L, the behavior is the same as if only _XOPEN_SOURCE is defined and set equal to 700. However, should _POSIX_C_SOURCE be set to a value greater than 200809L, the behavior is unspecified.
If _XOPEN_SOURCE is defined with the value 700 and _POSIX_C_SOURCE is undefined before inclusion of any header, then the header may define the _POSIX_C_SOURCE macro with the value 200809L.
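The following sketch defines only _XOPEN_SOURCE and then reports whether a header chose to define _POSIX_C_SOURCE as well. Since the text says a header may do so, the code treats both outcomes as valid. The function name is hypothetical.

```c
/* Only the XSI feature test macro is defined by the application. */
#define _XOPEN_SOURCE 700

#include <unistd.h>

/* Returns the value of _POSIX_C_SOURCE if a header defined it,
   or -1 if it was left undefined. Both results are permitted. */
long visible_posix_c_source(void)
{
#ifdef _POSIX_C_SOURCE
    return (long)_POSIX_C_SOURCE;
#else
    return -1L;
#endif
}
```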
All identifiers in this volume of POSIX.1-2017, except environ, are defined in at least one of the headers, as shown in XBD Headers. When [XSI] _XOPEN_SOURCE or _POSIX_C_SOURCE is defined, each header defines or declares some identifiers, potentially conflicting with identifiers used by the application. The set of identifiers visible to the application consists of precisely those identifiers from the header pages of the included headers, as well as additional identifiers reserved for the implementation. In addition, some headers may make visible identifiers from other headers as indicated on the relevant header pages.
Implementations may also add members to a structure or union without controlling the visibility of those members with a feature test macro, as long as a user-defined macro with the same name cannot interfere with the correct interpretation of the program. The identifiers reserved for use by the implementation are described below:
Each identifier with external linkage described in the header section is reserved for use as an identifier with external linkage if the header is included.
Each macro described in the header section is reserved for any use if the header is included.
Each identifier with file scope described in the header section is reserved for use as an identifier with file scope in the same name space if the header is included.
The prefixes posix_, POSIX_, and _POSIX_ are reserved for use by POSIX.1-2017 and other POSIX standards. Implementations may add symbols to the headers shown in the following table, provided the identifiers for those symbols either:
Begin with the corresponding reserved prefixes in the table, or
Have one of the corresponding complete names in the table, or
End in the string indicated as a reserved suffix in the table and do not use the reserved prefixes posix_, POSIX_, or _POSIX_, as long as the reserved suffix is in that part of the name considered significant by the implementation.
Symbols that use the reserved prefix _POSIX_ may be made visible by implementations in any header defined by POSIX.1-2017.
| Header | Prefix | Suffix | Complete Name |
|---|---|---|---|
| <aio.h> | aio_, lio_, AIO_, LIO_ | | |
| <arpa/inet.h> | inet_ | | |
| <ctype.h> | to[a-z], is[a-z] | | |
| <dlfcn.h> | RTLD_ | | |
| <dirent.h> | d_ | | |
| <fcntl.h> | l_ | | |
| [XSI] <fmtmsg.h> | MM_ | | |
| <fnmatch.h> | FNM_ | | |
| [XSI] <ftw.h> | FTW | | |
| <glob.h> | gl_, GLOB_ | | |
| <grp.h> | gr_ | | |
| <limits.h> | | _MAX, _MIN | |
| [XSI] <math.h> | M_ | | |
| [MSG] <mqueue.h> | mq_, MQ_ | | |
| [XSI] <ndbm.h> | dbm_, DBM_ | | |
| <netdb.h> | ai_, h_, n_, p_, s_ | | |
| <net/if.h> | if_, IF_ | | |
| <netinet/in.h> | in_, ip_, s_, sin_, INADDR_, IPPROTO_, [IP6] in6_, s6_, sin6_, IPV6_ | | |
| <netinet/tcp.h> | TCP_ | | |
| <nl_types.h> | NL_ | | |
| <poll.h> | pd_, ph_, ps_, POLL | | |
| <pthread.h> | pthread_, PTHREAD_ | | |
| <pwd.h> | pw_ | | |
| <regex.h> | re_, rm_, REG_ | | |
| <sched.h> | sched_, SCHED_ | | |
| <semaphore.h> | sem_, SEM_ | | |
| [CX] <signal.h> | sa_, si_, sigev_, sival_, uc_, BUS_, CLD_, FPE_, ILL_, SA_, SEGV_, SI_, SIGEV_, [XSI] ss_, sv_, SS_, TRAP_, [OB XSR] POLL_ | | |
| <stropts.h> | bi_, ic_, l_, sl_, str_, FLUSH[A-Z], I_, S_, SND[A-Z] | | |
| <stdlib.h> | str[a-z] | | |
| <string.h> | str[a-z], mem[a-z], wcs[a-z] | | |
| [XSI] <sys/ipc.h> | ipc_, IPC_ | | key, pad, seq |
| <sys/mman.h> | shm_, MAP_, MCL_, MS_, PROT_ | | |
| [XSI] <sys/msg.h> | msg, MSG_[A-Z] | | msg |
| [XSI] <sys/resource.h> | rlim_, ru_, PRIO_, RLIMIT_, RUSAGE_ | | |
| <sys/select.h> | fd_, fds_, FD_ | | |
| [XSI] <sys/sem.h> | sem, SEM_ | | sem |
| [XSI] <sys/shm.h> | shm, SHM[A-Z], SHM_[A-Z] | | |
| <sys/socket.h> | cmsg_, if_, ifc_, ifra_, ifru_, infu_, l_, msg_, sa_, ss_, [XSI] AF_, MSG_, PF_, SCM_, SHUT_, SO | | |
| <sys/stat.h> | st_ | | |
| <sys/statvfs.h> | f_, ST_ | | |
| [XSI] <sys/time.h> | it_, tv_, ITIMER_ | | |
| <sys/times.h> | tms_ | | |
| [XSI] <sys/uio.h> | iov_ | | UIO_MAXIOV |
| <sys/un.h> | sun_ | | |
| <sys/utsname.h> | uts_ | | |
| <sys/wait.h> | P_, W[A-Z] | | |
| [XSI] <syslog.h> | LOG_ | | |
| <termios.h> | c_, B[0-9], TC | | |
| [CX] <time.h> | tm_, clock_, it_, timer_, tv_, CLOCK_, TIMER_ | | |
| [XSI] <ulimit.h> | UL_ | | |
| [OB] <utime.h> | utim_ | | |
| [XSI] <utmpx.h> | ut_ | _LVL, _PROCESS, _TIME | |
| <wchar.h> | wcs[a-z] | | |
| <wctype.h> | is[a-z], to[a-z] | | |
| <wordexp.h> | we_, WRDE_ | | |
| [CX] ANY header | | _t | |
Implementations may also add symbols to the <complex.h> header with the following complete names or the same names suffixed with 'f' or 'l':
If any header in the following table is included, macros with the prefixes shown may be defined. After the last inclusion of a given header, an application may use identifiers with the corresponding prefixes for its own purpose, provided their use is preceded by a #undef of the corresponding macro.
| Header | Prefix |
|---|---|
| <errno.h> | E[0-9], E[A-Z] |
| <fcntl.h> | F_, O_ |
| <fenv.h> | FE_[A-Z] |
| <inttypes.h> | PRI[Xa-z], SCN[Xa-z] |
| <locale.h> | LC_[A-Z] |
| <math.h> | FP_[A-Z] |
| <netinet/in.h> | IMPLINK_, IN_, IP_, IPPORT_, SOCK_, [IP6] IN6_ |
| <signal.h> | SIG_, SIG[A-Z], [XSI] SV_ |
| [CX] <stdio.h> | SEEK_ |
| [OB XSR] <stropts.h> | M_, MUXID_R[A-Z], STR |
| [XSI] <sys/resource.h> | RLIM_ |
| [XSI] <sys/socket.h> | CMSG_ |
| <sys/stat.h> | S_ |
| [XSI] <sys/uio.h> | IOV_ |
| <termios.h> | I, O, V (See below.) |
| <unistd.h> | SEEK_ |
The following are used to reserve complete names for the <stdint.h> header:
INT[0-9A-Za-z_]*_MIN INT[0-9A-Za-z_]*_MAX INT[0-9A-Za-z_]*_C UINT[0-9A-Za-z_]*_MIN UINT[0-9A-Za-z_]*_MAX UINT[0-9A-Za-z_]*_C
[XSI] The following reserved names are used as exact matches for <termios.h>:
CBAUD DEFECHO ECHOCTL ECHOKE ECHOPRT EXTA EXTB FLUSHO LOBLK PENDIN SWTCH VDISCARD VDSUSP VLNEXT VREPRINT VSTATUS VWERASE
The following identifiers are reserved regardless of the inclusion of headers:
No other identifiers are reserved.
Applications shall not declare or define identifiers with the same name as an identifier reserved in the same context. Since macro names are replaced whenever found, independent of scope and name space, macro names matching any of the reserved identifier names shall not be defined by an application if any associated header is included.
Except that the effect of each inclusion of <assert.h> depends on the definition of NDEBUG, headers may be included in any order, and each may be included more than once in a given scope, with no difference in effect from that of being included only once.
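The <assert.h> exception can be sketched as follows: the same header is included three times, and the state of NDEBUG at each inclusion decides what assert() expands to. The function names are hypothetical.

```c
#include <assert.h>

/* assert() is active here: a non-positive argument aborts the program. */
int checked_positive(int x)
{
    assert(x > 0);
    return x;
}

#define NDEBUG
#include <assert.h>   /* second inclusion: assert() now has no effect */

int unchecked_positive(int x)
{
    assert(x > 0);    /* not evaluated while NDEBUG is defined */
    return x;
}

#undef NDEBUG
#include <assert.h>   /* third inclusion: assert() is active again */
```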
If used, the application shall ensure that a header is included outside of any external declaration or definition, and it shall be first included before the first reference to any type or macro it defines, or to any function or object it declares. However, if an identifier is declared or defined in more than one header, the second and subsequent associated headers may be included after the initial reference to the identifier. Prior to the inclusion of a header, the application shall not define any macros with names lexically identical to symbols defined by that header.
Most functions can provide an error number. The means by which each function provides its error numbers is specified in its description.
Some functions provide the error number in a variable accessed through the symbol errno, defined by including the <errno.h> header. The value of errno should only be examined when it is indicated to be valid by a function's return value. No function in this volume of POSIX.1-2017 shall set errno to zero. For each thread of a process, the value of errno shall not be affected by function calls or assignments to errno by other threads.
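One common consequence of these rules is sketched below with strtol(), under the assumption that the caller wants to distinguish overflow from a legitimate extreme value. Because no function sets errno to zero, the caller clears it first, and errno is consulted only when the return value indicates it may be meaningful. The function name parse_long is hypothetical.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parses a decimal long. Returns 0 on success, or an error number. */
int parse_long(const char *text, long *out)
{
    char *end;
    long value;

    errno = 0;                      /* strtol() never resets errno itself */
    value = strtol(text, &end, 10);

    if (end == text)
        return EINVAL;              /* no digits were consumed */

    /* Only an extreme return value indicates that errno is valid here. */
    if ((value == LONG_MAX || value == LONG_MIN) && errno == ERANGE)
        return ERANGE;

    *out = value;
    return 0;
}
```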
Some functions return an error number directly as the function value. These functions return a value of zero to indicate success.
If more than one error occurs in processing a function call, any one of the possible errors may be returned, as the order of detection is undefined.
Implementations may support additional errors not included in this list, may generate errors included in this list under circumstances other than those described here, or may contain extensions or limitations that prevent some errors from occurring.
The ERRORS section on each reference page specifies which error conditions shall be detected by all implementations ("shall fail") and which may be optionally detected by an implementation ("may fail"). If no error condition is detected, the action requested shall be successful. If an error condition is detected, the action requested may have been partially performed, unless otherwise stated.
Implementations may generate error numbers listed here under circumstances other than those described, if and only if all those error conditions can always be treated identically to the error conditions as described in this volume of POSIX.1-2017. Implementations shall not generate a different error number from one required by this volume of POSIX.1-2017 for an error condition described in this volume of POSIX.1-2017, but may generate additional errors unless explicitly disallowed for a particular function.
Each implementation shall document, in the conformance document, situations in which each of the optional conditions defined in POSIX.1-2017 is detected. The conformance document may also contain statements that one or more of the optional error conditions are not detected.
Certain threads-related functions are not allowed to return an error code of [EINTR]. Where this applies it is stated in the ERRORS section on the individual function pages.
The following macro names identify the possible error numbers, in the context of the functions specifically defined in this volume of POSIX.1-2017; these general descriptions are more precisely defined in the ERRORS sections of the functions that return them. Only these macro names should be used in programs, since the actual value of the error number is unspecified. All values listed in this section shall be unique, except as noted below. The values for all these macros shall be found in the <errno.h> header defined in the Base Definitions volume of POSIX.1-2017. The actual values are unspecified by this volume of POSIX.1-2017.
or: Lack of space in an output buffer.
or: Argument is greater than the system-imposed maximum.
or: Bad Message. The implementation has detected a corrupted message.
or: O_NONBLOCK is set for the socket file descriptor and the connection cannot be immediately established.
or: Inappropriate message buffer length.
or: Operation timed out. The time limit associated with the operation was exceeded before the operation completed.
A conforming implementation may assign the same values for [EWOULDBLOCK] and [EAGAIN].
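Because [EWOULDBLOCK] and [EAGAIN] may share a value, portable code usually compares against both with ordinary tests rather than with a switch statement, where two case labels with the same value would not compile. A minimal sketch follows; the function name is hypothetical.

```c
#include <errno.h>

/* Returns nonzero if err means "no data or space right now".
   Written as comparisons, not case labels, so it still compiles
   when EAGAIN and EWOULDBLOCK have the same value. */
int is_would_block(int err)
{
    return err == EAGAIN || err == EWOULDBLOCK;
}
```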
Additional implementation-defined error numbers may be defined in <errno.h>.
A signal is said to be "generated" for (or sent to) a process or thread when the event that causes the signal first occurs. Examples of such events include detection of hardware faults, timer expiration, signals generated via the sigevent structure and terminal activity, as well as invocations of the kill() and sigqueue() functions. In some circumstances, the same event generates signals for multiple processes.
At the time of generation, a determination shall be made whether the signal has been generated for the process or for a specific thread within the process. Signals which are generated by some action attributable to a particular thread, such as a hardware fault, shall be generated for the thread that caused the signal to be generated. Signals that are generated in association with a process ID or process group ID or an asynchronous event, such as terminal activity, shall be generated for the process.
Each process has an action to be taken in response to each signal defined by the system (see Signal Actions). A signal is said to be "delivered" to a process when the appropriate action for the process and signal is taken. A signal is said to be "accepted" by a process when the signal is selected and returned by one of the sigwait() functions.
During the time between the generation of a signal and its delivery or acceptance, the signal is said to be "pending". Ordinarily, this interval cannot be detected by an application. However, a signal can be "blocked" from delivery to a thread. If the action associated with a blocked signal is anything other than to ignore the signal, and if that signal is generated for the thread, the signal shall remain pending until it is unblocked, it is accepted when it is selected and returned by a call to the sigwait() function, or the action associated with it is set to ignore the signal. Signals generated for the process shall be delivered to exactly one of those threads within the process which is in a call to a sigwait() function selecting that signal or has not blocked delivery of the signal. If there are no threads in a call to a sigwait() function selecting that signal, and if all threads within the process block delivery of the signal, the signal shall remain pending on the process until a thread calls a sigwait() function selecting that signal, a thread unblocks delivery of the signal, or the action associated with the signal is set to ignore the signal. If the action associated with a blocked signal is to ignore the signal and if that signal is generated for the process, it is unspecified whether the signal is discarded immediately upon generation or remains pending.
Each thread has a "signal mask" that defines the set of signals currently blocked from delivery to it. The signal mask for a thread shall be initialized from that of its parent or creating thread, or from the corresponding thread in the parent process if the thread was created as the result of a call to fork(). The pthread_sigmask(), sigaction(), sigprocmask(), and sigsuspend() functions control the manipulation of the signal mask.
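The life cycle described above (generated, pending while blocked, then accepted) can be sketched for a single-threaded process as follows. The sketch assumes that the action for SIGUSR1 is not SIG_IGN and that no other thread exists; the function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <stddef.h>

/* Blocks SIGUSR1, generates it, confirms that it is pending, and then
   accepts it with sigwait(). Returns the accepted signal number,
   or -1 if any step fails. */
int block_raise_accept(void)
{
    sigset_t block, previous, pending;
    int accepted = 0;
    int result = -1;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);

    /* While blocked, the signal cannot be delivered to this thread. */
    if (sigprocmask(SIG_BLOCK, &block, &previous) != 0)
        return -1;

    if (raise(SIGUSR1) == 0 &&                    /* generated */
        sigpending(&pending) == 0 &&
        sigismember(&pending, SIGUSR1) == 1 &&    /* pending */
        sigwait(&block, &accepted) == 0)          /* accepted */
        result = accepted;

    /* Restore the original signal mask. */
    sigprocmask(SIG_SETMASK, &previous, NULL);
    return result;
}
```

No signal-catching function runs in this sequence, because the signal is accepted rather than delivered.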
The determination of which action is taken in response to a signal is made at the time the signal is delivered, allowing for any changes since the time of generation. This determination is independent of the means by which the signal was originally generated. If a subsequent occurrence of a pending signal is generated, it is implementation-defined as to whether the signal is delivered or accepted more than once in circumstances other than those in which queuing is required. The order in which multiple, simultaneously pending signals outside the range SIGRTMIN to SIGRTMAX are delivered to or accepted by a process is unspecified.
When any stop signal (SIGSTOP, SIGTSTP, SIGTTIN, SIGTTOU) is generated for a process or thread, all pending SIGCONT signals for that process or any of the threads within that process shall be discarded. Conversely, when SIGCONT is generated for a process or thread, all pending stop signals for that process or any of the threads within that process shall be discarded. When SIGCONT is generated for a process that is stopped, the process shall be continued, even if the SIGCONT signal is ignored by the process or is blocked by all threads within the process and there are no threads in a call to a sigwait() function selecting SIGCONT. If SIGCONT is blocked by all threads within the process, there are no threads in a call to a sigwait() function selecting SIGCONT, and SIGCONT is not ignored by the process, the SIGCONT signal shall remain pending on the process until it is either unblocked by a thread or a thread calls a sigwait() function selecting SIGCONT, or a stop signal is generated for the process or any of the threads within the process.
An implementation shall document any condition not specified by this volume of POSIX.1-2017 under which the implementation generates signals.
This section describes functionality to support realtime signal generation and delivery.
Some signal-generating functions, such as high-resolution timer expiration, asynchronous I/O completion, interprocess message arrival, and the sigqueue() function, support the specification of an application-defined value, either explicitly as a parameter to the function or in a sigevent structure parameter. The sigevent structure is defined in <signal.h> and contains at least the following members:
| Member Type | Member Name | Description |
|---|---|---|
| int | sigev_notify | Notification type. |
| int | sigev_signo | Signal number. |
| union sigval | sigev_value | Signal value. |
| void(*)(union sigval) | sigev_notify_function | Notification function. |
| (pthread_attr_t*) | sigev_notify_attributes | Notification attributes. |
The sigev_notify member specifies the notification mechanism to use when an asynchronous event occurs. This volume of POSIX.1-2017 defines the following values for the sigev_notify member:
An implementation may define additional notification mechanisms.
The sigev_signo member specifies the signal to be generated. The sigev_value member is the application-defined value to be passed to the signal-catching function at the time of the signal delivery or to be returned at signal acceptance as the si_value member of the siginfo_t structure.
The sigval union is defined in <signal.h> and contains at least the following members:
| Member Type | Member Name | Description |
|---|---|---|
| int | sival_int | Integer signal value. |
| void* | sival_ptr | Pointer signal value. |
The sival_int member shall be used when the application-defined value is of type int; the sival_ptr member shall be used when the application-defined value is a pointer.
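As an illustration of how these members fit together, the sketch below fills in a sigevent structure that requests notification by signal. It assumes the SIGEV_SIGNAL value from <signal.h>. The helper names are hypothetical, and the structure would normally be passed on to an interface such as timer_create().

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <string.h>

/* Builds a sigevent that asks for signo to be generated, carrying
   an integer application-defined value. */
struct sigevent make_int_event(int signo, int value)
{
    struct sigevent ev;

    memset(&ev, 0, sizeof ev);
    ev.sigev_notify = SIGEV_SIGNAL;
    ev.sigev_signo = signo;
    ev.sigev_value.sival_int = value;   /* value is of type int */
    return ev;
}

/* Same as above, but the application-defined value is a pointer. */
struct sigevent make_ptr_event(int signo, void *ptr)
{
    struct sigevent ev;

    memset(&ev, 0, sizeof ev);
    ev.sigev_notify = SIGEV_SIGNAL;
    ev.sigev_signo = signo;
    ev.sigev_value.sival_ptr = ptr;     /* value is a pointer */
    return ev;
}
```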
When a signal is generated by the sigqueue() function or any signal-generating function that supports the specification of an application-defined value, the signal shall be marked pending and, if the SA_SIGINFO flag is set for that signal, the signal shall be queued to the process along with the application-specified signal value. Multiple occurrences of signals so generated are queued in FIFO order. It is unspecified whether signals so generated are queued when the SA_SIGINFO flag is not set for that signal.
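The queuing behavior can be sketched as below, assuming an implementation that supports sigqueue() and sigwaitinfo() and a single-threaded process. SA_SIGINFO is set so that queuing is required; the signal is blocked, queued twice with different values, and then accepted twice. The function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Empty handler: it exists only so that SA_SIGINFO can be set. */
static void unused_handler(int signo, siginfo_t *info, void *context)
{
    (void)signo;
    (void)info;
    (void)context;
}

/* Queues the values 10 and 11 to this process and accepts them.
   Returns 0 and fills out[0] and out[1] on success, -1 on failure. */
int queue_two_values(int out[2])
{
    const int signo = SIGRTMIN;
    struct sigaction sa, previous;
    sigset_t block, old_mask;
    siginfo_t info;
    union sigval value;
    int i, status = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = unused_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, &previous) != 0)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, signo);
    if (sigprocmask(SIG_BLOCK, &block, &old_mask) != 0) {
        sigaction(signo, &previous, NULL);
        return -1;
    }

    for (i = 0; i < 2 && status == 0; i++) {
        value.sival_int = 10 + i;
        if (sigqueue(getpid(), signo, value) != 0)
            status = -1;
    }

    /* Each acceptance returns one queued occurrence, oldest first. */
    for (i = 0; i < 2 && status == 0; i++) {
        if (sigwaitinfo(&block, &info) != signo || info.si_code != SI_QUEUE)
            status = -1;
        else
            out[i] = info.si_value.sival_int;
    }

    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(signo, &previous, NULL);
    return status;
}
```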
Signals generated by the kill() function or other events that cause signals to occur, such as detection of hardware faults, alarm() timer expiration, or terminal activity, and for which the implementation does not support queuing, shall have no effect on signals already queued for the same signal number.
When multiple unblocked signals, all in the range SIGRTMIN to SIGRTMAX, are pending, the behavior shall be as if the implementation delivers the pending unblocked signal with the lowest signal number within that range. No other ordering of signal delivery is specified.
If, when a pending signal is delivered, there are additional signals queued to that signal number, the signal shall remain pending. Otherwise, the pending indication shall be reset.
Multi-threaded programs can use an alternate event notification mechanism. When a notification is processed, and the sigev_notify member of the sigevent structure has the value SIGEV_THREAD, the function sigev_notify_function is called with parameter sigev_value.
The function shall be executed in an environment as if it were the start_routine for a newly created thread with thread attributes specified by sigev_notify_attributes. If sigev_notify_attributes is NULL, the behavior shall be as if the thread were created with the detachstate attribute set to PTHREAD_CREATE_DETACHED. Supplying an attributes structure with a detachstate attribute of PTHREAD_CREATE_JOINABLE results in undefined behavior. The signal mask of this thread is implementation-defined.
There are three types of action that can be associated with a signal: SIG_DFL, SIG_IGN, or a pointer to a function. Initially, all signals shall be set to SIG_DFL or SIG_IGN prior to entry of the main() routine (see the exec functions). The actions prescribed by these values are as follows.
Signal-specific default action.
The default actions for the signals defined in this volume of POSIX.1-2017 are specified under <signal.h>. The default actions for the realtime signals in the range SIGRTMIN to SIGRTMAX shall be to terminate the process abnormally.
If the default action is to terminate the process abnormally, the process is terminated as if by a call to _exit(), except that the status made available to wait(), waitid(), and waitpid() indicates abnormal termination by the signal. [XSI] If the default action is to terminate the process abnormally with additional actions, implementation-defined abnormal termination actions, such as creation of a core file, may also occur.
If the default action is to stop the process, the execution of that process is temporarily suspended. When a process stops, a SIGCHLD signal shall be generated for its parent process, unless the parent process has set the SA_NOCLDSTOP flag. While a process is stopped, any additional signals that are sent to the process shall not be delivered until the process is continued, except SIGKILL which always terminates the receiving process. A process that is a member of an orphaned process group shall not be allowed to stop in response to the SIGTSTP, SIGTTIN, or SIGTTOU signals. In cases where delivery of one of these signals would stop such a process, the signal shall be discarded.
If the default action is to ignore the signal, delivery of the signal shall have no effect on the process.
Setting a signal action to SIG_DFL for a signal that is pending, and whose default action is to ignore the signal (for example, SIGCHLD), shall cause the pending signal to be discarded, whether or not it is blocked. Any queued values pending shall be discarded and the resources used to queue them shall be released and returned to the system for other use.
The default action for SIGCONT is to resume execution at the point where the process was stopped, after first handling any pending unblocked signals.
[XSI] When a stopped process is continued, a SIGCHLD signal may be generated for its parent process, unless the parent process has set the SA_NOCLDSTOP flag.
Ignore signal.
Delivery of the signal shall have no effect on the process. The behavior of a process is undefined after it ignores a SIGFPE, SIGILL, SIGSEGV, or SIGBUS signal that was not generated by kill(), sigqueue(), or raise().
The system shall not allow the action for the signals SIGKILL or SIGSTOP to be set to SIG_IGN.
Setting a signal action to SIG_IGN for a signal that is pending shall cause the pending signal to be discarded, whether or not it is blocked.
If a process sets the action for the SIGCHLD signal to SIG_IGN, the behavior is unspecified, [XSI] except as specified under "Consequences of Process Termination" in the description of the _Exit() function (see XSH _Exit).
Any queued values pending shall be discarded and the resources used to queue them shall be released and made available to queue other signals.
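The restriction on SIGKILL and SIGSTOP can be observed through sigaction(), which is specified to fail with [EINVAL] on an attempt to ignore a signal that cannot be ignored. The following sketch uses a hypothetical helper name and restores the previous action whenever the attempt succeeds.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>
#include <string.h>

/* Tries to set the action for signo to SIG_IGN. Returns 0 if that
   was allowed (the previous action is then restored), or the error
   number reported by sigaction() otherwise. */
int try_ignore(int signo)
{
    struct sigaction ignore, previous;

    memset(&ignore, 0, sizeof ignore);
    ignore.sa_handler = SIG_IGN;
    sigemptyset(&ignore.sa_mask);

    if (sigaction(signo, &ignore, &previous) == -1)
        return errno;           /* valid: the return value signals failure */

    sigaction(signo, &previous, NULL);
    return 0;
}
```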
Catch signal.
On delivery of the signal, the receiving process is to execute the signal-catching function at the specified address. After returning from the signal-catching function, the receiving process shall resume execution at the point at which it was interrupted.
If the SA_SIGINFO flag for the signal is cleared, the signal-catching function shall be entered as a C-language function call as follows:
void func(int signo);
If the SA_SIGINFO flag for the signal is set, the signal-catching function shall be entered as a C-language function call as follows:
void func(int signo, siginfo_t *info, void *context);
where func is the specified signal-catching function, signo is the signal number of the signal being delivered, and info is a pointer to a siginfo_t structure defined in <signal.h> containing at least the following members:
| Member Type | Member Name | Description |
|---|---|---|
| int | si_signo | Signal number. |
| int | si_code | Cause of the signal. |
| pid_t | si_pid | Sending process ID. |
| uid_t | si_uid | Real user ID of sending process. |
| void * | si_addr | Address of faulting instruction. |
| int | si_status | Exit value or signal. |
| union sigval | si_value | Signal value. |
The si_signo member shall contain the signal number. This shall be the same as the signo parameter. The si_code member shall contain a code identifying the cause of the signal. The following non-signal-specific values are defined for si_code:
Signal-specific values for si_code are also defined, as described in XBD <signal.h>.
If the signal was not generated by one of the functions or events listed above, si_code shall be set either to one of the signal-specific values described in XBD <signal.h>, or to an implementation-defined value that is not equal to any of the values defined above.
If si_code is SI_USER or SI_QUEUE, [XSI] or any value less than or equal to 0, then the signal was generated by a process and si_pid and si_uid shall be set to the process ID and the real user ID of the sender, respectively.
In addition, si_addr, si_pid, si_status, and si_uid shall be set for certain signal-specific values of si_code, as described in XBD <signal.h>.
If si_code is one of SI_QUEUE, SI_TIMER, SI_ASYNCIO, or SI_MESGQ, then si_value shall contain the application-specified signal value. Otherwise, the contents of si_value are undefined.
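A small sketch of a three-argument signal-catching function follows. It assumes a single-threaded process with SIGUSR1 unblocked, sends the signal to itself with kill(), and records what the handler saw in objects of type volatile sig_atomic_t. The function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

static volatile sig_atomic_t seen_signo;
static volatile sig_atomic_t seen_user_code;
static volatile sig_atomic_t seen_own_pid;

/* Entered with three arguments because SA_SIGINFO is set. */
static void record_sender(int signo, siginfo_t *info, void *context)
{
    (void)context;
    seen_signo = signo;
    seen_user_code = (info->si_code == SI_USER);
    seen_own_pid = (info->si_pid == getpid());   /* getpid() is async-signal-safe */
}

/* Returns 1 if the handler saw SIGUSR1 with si_code SI_USER and this
   process as the sender, 0 if not, and -1 on setup failure. */
int catch_signal_from_self(void)
{
    struct sigaction sa, previous;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = record_sender;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, &previous) != 0)
        return -1;

    seen_signo = 0;
    seen_user_code = 0;
    seen_own_pid = 0;

    /* The signal is unblocked here, so it is delivered before kill() returns. */
    if (kill(getpid(), SIGUSR1) != 0) {
        sigaction(SIGUSR1, &previous, NULL);
        return -1;
    }

    sigaction(SIGUSR1, &previous, NULL);
    return seen_signo == SIGUSR1 && seen_user_code && seen_own_pid;
}
```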
The behavior of a process is undefined after it returns normally from a signal-catching function for a SIGBUS, SIGFPE, SIGILL, or SIGSEGV signal that was not generated by kill(), sigqueue(), or raise().
The system shall not allow a process to catch the signals SIGKILL and SIGSTOP.
If a process establishes a signal-catching function for the SIGCHLD signal while it has a terminated child process for which it has not waited, it is unspecified whether a SIGCHLD signal is generated to indicate that child process.
If the process is multi-threaded, or if the process is single-threaded and a signal handler is executed other than as the result of:
The process calling abort(), raise(), kill(), pthread_kill(), or sigqueue() to generate a signal that is not blocked
A pending signal being unblocked and being delivered before the call that unblocked it returns
the behavior is undefined if the signal handler refers to any object other than errno with static storage duration other than by assigning a value to an object declared as volatile sig_atomic_t, or if the signal handler calls any function defined in this standard other than one of the functions listed in the following table.
The following table defines a set of functions that shall be async-signal-safe. Therefore, applications can call them, without restriction, from signal-catching functions. Note that, although there is no restriction on the calls themselves, for certain functions there are restrictions on subsequent behavior after the function is called from a signal-catching function (see longjmp).
Any function not in the above table may be unsafe with respect to signals. Implementations may make other interfaces async-signal-safe. In the presence of signals, all functions defined by this volume of POSIX.1-2017 shall behave as defined when called from or interrupted by a signal-catching function, with the exception that when a signal interrupts an unsafe function or equivalent (such as the processing equivalent to exit() performed after a return from the initial call to main()) and the signal-catching function calls an unsafe function, the behavior is undefined. Additional exceptions are specified in the descriptions of individual functions such as longjmp().
Operations which obtain the value of errno and operations which assign a value to errno shall be async-signal-safe, provided that the signal-catching function saves the value of errno upon entry and restores it before it returns.
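The save-and-restore requirement for errno can be sketched as follows. This is an illustration under stated assumptions, not a required pattern: the names on_usr2, handler_ran, and errno_survives_handler are hypothetical, SIGUSR2 is assumed to be otherwise unused, and the sketch assumes that a successful raise() leaves errno unchanged.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t handler_ran;   /* hypothetical flag */

static void on_usr2(int signo)
{
    int saved_errno = errno;        /* save on entry */

    (void)signo;
    /* write() is async-signal-safe. This call is made on an invalid
       descriptor on purpose, so it fails and overwrites errno. */
    if (write(-1, "x", 1) == -1) {
        /* expected failure; nothing to do */
    }
    handler_ran = 1;

    errno = saved_errno;            /* restore before returning */
}

/* Returns 1 if the handler ran and errno kept its value across it,
   0 if not, and -1 if the handler could not be installed. */
int errno_survives_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = on_usr2;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR2, &sa, NULL) == -1)
        return -1;

    handler_ran = 0;
    errno = ERANGE;                 /* arbitrary marker value */
    raise(SIGUSR2);                 /* handler completes before raise() returns */
    return handler_ran && errno == ERANGE;
}
```

Without the two saved_errno lines, the failing write() inside the handler would leave a different error number visible to the interrupted code.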
When a signal is delivered to a thread, if the action of that signal specifies termination, stop, or continue, the entire process shall be terminated, stopped, or continued, respectively.
Signals affect the behavior of certain functions defined by this volume of POSIX.1-2017 if delivered to a process while it is executing such a function. If the action of the signal is to terminate the process, the process shall be terminated and the function shall not return. If the action of the signal is to stop the process, the process shall stop until continued or terminated. Generation of a SIGCONT signal for the process shall cause the process to be continued, and the original function shall continue at the point the process was stopped. If the action of the signal is to invoke a signal-catching function, the signal-catching function shall be invoked; in this case the original function is said to be "interrupted" by the signal. If the signal-catching function executes a return statement, the behavior of the interrupted function shall be as described individually for that function, except as noted for unsafe functions. After returning from a signal-catching function, the value of errno is unspecified if the signal-catching function or any function it called assigned a value to errno and the signal-catching function did not save and restore the original value of errno. Signals that are ignored shall not affect the behavior of any function; signals that are blocked shall not affect the behavior of any function until they are unblocked and then delivered, except as specified for the sigpending() and sigwait() functions.
A stream is associated with an external file (which may be a physical device) [CX] or memory buffer by "opening" a file [CX] or buffer. This may involve "creating" a new file. Creating an existing file causes its former contents to be discarded if necessary. If a file can support positioning requests (such as a disk file, as opposed to a terminal), then a "file position indicator" associated with the stream is positioned at the start (byte number 0) of the file, unless the file is opened with append mode, in which case it is implementation-defined whether the file position indicator is initially positioned at the beginning or end of the file. The file position indicator is maintained by subsequent reads, writes, and positioning requests, to facilitate an orderly progression through the file.
The wide-character input functions shall read characters from the stream and convert them to wide characters as if they were read by successive calls to the fgetwc() function. Each conversion shall occur as if by a call to the mbrtowc() function, with the conversion state described by the stream's own mbstate_t object (see Stream Orientation and Encoding Rules). The byte input functions shall read characters from the stream as if by successive calls to the fgetc() function.
The wide-character output functions shall convert wide characters to characters and write them to the stream as if they were written by successive calls to the fputwc() function. Each conversion shall occur as if by a call to the wcrtomb() function, with the conversion state described by the stream's own mbstate_t object (see Stream Orientation and Encoding Rules). The byte output functions shall write characters to the stream as if by successive calls to the fputc() function.
The perror(), psiginfo(), and psignal() functions shall behave as described above for the byte output functions if the stream is already byte-oriented, and shall behave as described above for the wide-character output functions if the stream is already wide-oriented. If the stream has no orientation, they shall behave as described for the byte output functions except that they shall not change the orientation of the stream.
Functions other than perror(), psiginfo(), and psignal() that write to streams but are neither wide-character output nor byte output functions (getopt() and wordexp()), shall behave as described above for the byte output functions, except that if the stream has no orientation, it is unspecified whether they set the stream to byte orientation or leave it with no orientation.
When a stream is "unbuffered", bytes are intended to appear from the source or at the destination as soon as possible; otherwise, bytes may be accumulated and transmitted as a block. When a stream is "fully buffered", bytes are intended to be transmitted as a block when a buffer is filled. When a stream is "line buffered", bytes are intended to be transmitted as a block when a <newline> byte is encountered. Furthermore, bytes are intended to be transmitted as a block when a buffer is filled, when input is requested on an unbuffered stream, or when input is requested on a line-buffered stream that requires the transmission of bytes. Support for these characteristics is implementation-defined, and may be affected via setbuf() and setvbuf().
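As a hedged illustration of full buffering, the sketch below requests _IOFBF with setvbuf() on a temporary file and records the file size before and after fflush(). The function name buffered_write_sizes is hypothetical. Because support for buffering characteristics is implementation-defined, only the size after the flush can be relied upon; the size before the flush is typically 0 but need not be.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/* Write five bytes to a fully buffered temporary stream and report the
   size of the underlying file before and after fflush().
   Returns 0 on success, -1 on failure. */
int buffered_write_sizes(long *before, long *after)
{
    static char buf[BUFSIZ];    /* must outlive the stream that uses it */
    struct stat st;
    FILE *fp = tmpfile();

    if (fp == NULL)
        return -1;
    if (setvbuf(fp, buf, _IOFBF, sizeof buf) != 0)
        goto fail;
    if (fputs("hello", fp) == EOF)
        goto fail;

    /* fstat() does not affect the file offset. */
    if (fstat(fileno(fp), &st) == -1)
        goto fail;
    *before = (long)st.st_size;     /* bytes may still sit in buf */

    if (fflush(fp) != 0)
        goto fail;
    if (fstat(fileno(fp), &st) == -1)
        goto fail;
    *after = (long)st.st_size;      /* all five bytes transmitted */

    fclose(fp);
    return 0;

fail:
    fclose(fp);
    return -1;
}
```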
A file may be disassociated from a controlling stream by "closing" the file. Output streams are flushed (any unwritten buffer contents are transmitted) before the stream is disassociated from the file. The value of a pointer to a FILE object is unspecified after the associated file is closed (including the standard streams).
A file may be subsequently reopened, by the same or another program execution, and its contents reclaimed or modified (if it can be repositioned at its start). If the main() function returns to its original caller, or if the exit() function is called, all open files are closed (hence all output streams are flushed) before program termination. Other paths to program termination, such as calling abort(), need not close all files properly.
The address of the FILE object used to control a stream may be significant; a copy of a FILE object need not necessarily serve in place of the original.
At program start-up, three streams are predefined and need not be opened explicitly: standard input (for reading conventional input), standard output (for writing conventional output), and standard error (for writing diagnostic output). When opened, the standard error stream is not fully buffered; the standard input and standard output streams are fully buffered if and only if the stream can be determined not to refer to an interactive device.
[CX] A stream associated with a memory buffer shall have the same operations for text files that a stream associated with an external file would have. In addition, the stream orientation shall be determined in exactly the same fashion.
Input and output operations on a stream associated with a memory buffer by a call to fmemopen() shall be constrained by the implementation to take place within the bounds of the memory buffer. In the case of a stream opened by open_memstream() or open_wmemstream(), the memory area shall grow dynamically to accommodate write operations as necessary. For output, data is moved from the buffer provided by setvbuf() to the memory stream during a flush or close operation.
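The dynamically growing memory area of open_memstream() can be sketched as below. The helper name join_words is hypothetical. The buffer pointer and length are read only after fclose(), at which point they are up to date, and the caller is responsible for freeing the returned buffer.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Format "a b" into a memory stream that grows as needed.
   On success returns a malloc()-style buffer (caller frees) and stores
   its length, excluding the terminating null byte, in *len_out.
   Returns NULL on failure. */
char *join_words(const char *a, const char *b, size_t *len_out)
{
    char *buf = NULL;
    size_t len = 0;
    FILE *ms = open_memstream(&buf, &len);

    if (ms == NULL)
        return NULL;

    if (fprintf(ms, "%s %s", a, b) < 0) {
        fclose(ms);
        free(buf);
        return NULL;
    }

    /* Closing the stream flushes it and updates buf and len. */
    if (fclose(ms) != 0) {
        free(buf);
        return NULL;
    }

    if (len_out != NULL)
        *len_out = len;
    return buf;
}
```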
[CX] This section describes the interaction of file descriptors and standard I/O streams. The functionality described in this section is an extension to the ISO C standard (and the rest of this section is not further CX shaded).
An open file description may be accessed through a file descriptor, which is created using functions such as open() or pipe(), or through a stream, which is created using functions such as fopen() or popen(). Either a file descriptor or a stream is called a "handle" on the open file description to which it refers; an open file description may have several handles.
Handles can be created or destroyed by explicit user action, without affecting the underlying open file description. Some of the ways to create them include fcntl(), dup(), fdopen(), fileno(), and fork(). They can be destroyed by at least fclose(), close(), and the exec functions.
A file descriptor that is never used in an operation that could affect the file offset (for example, read(), write(), or lseek()) is not considered a handle for this discussion, but could give rise to one (for example, as a consequence of fdopen(), dup(), or fork()). This exception does not include the file descriptor underlying a stream, whether created with fopen() or fdopen(), so long as it is not used directly by the application to affect the file offset. The read() and write() functions implicitly affect the file offset; lseek() explicitly affects it.
The result of function calls involving any one handle (the "active handle") is defined elsewhere in this volume of POSIX.1-2017, but if two or more handles are used, and any one of them is a stream, the application shall ensure that their actions are coordinated as described below. If this is not done, the result is undefined.
A handle which is a stream is considered to be closed when either an fclose(), or freopen() with non-null filename, is executed on it (for freopen() with a null filename, it is implementation-defined whether a new handle is created or the existing one reused), or when the process owning that stream terminates with exit(), abort(), or due to a signal. A file descriptor is closed by close(), _exit(), or the exec functions when FD_CLOEXEC is set on that file descriptor.
For a handle to become the active handle, the application shall ensure that the actions below are performed between the last use of the handle (the current active handle) and the first use of the second handle (the future active handle). The second handle then becomes the active handle. All activity by the application affecting the file offset on the first handle shall be suspended until it again becomes the active file handle. (If a stream function has as an underlying function one that affects the file offset, the stream function shall be considered to affect the file offset.)
The handles need not be in the same process for these rules to apply.
Note that after a fork(), two handles exist where one existed before. The application shall ensure that, if both handles can ever be accessed, they are both in a state where the other could become the active handle first. The application shall prepare for a fork() exactly as if it were a change of active handle. (If the only action performed by one of the processes is one of the exec functions or _exit() (not exit()), the handle is never accessed in that process.)
For the first handle, the first applicable condition below applies. After the actions required below are taken, if the handle is still open, the application can close it.
If it is a file descriptor, no action is required.
If the only further action to be performed on any handle to this open file description is to close it, no action need be taken.
If it is a stream which is unbuffered, no action need be taken.
If it is a stream which is line buffered, and the last byte written to the stream was a <newline> (that is, as if a putc('\n') was the most recent operation on that stream), no action need be taken.
If it is a stream which is open for writing or appending (but not also open for reading), the application shall either perform an fflush(), or the stream shall be closed.
If the stream is open for reading and it is at the end of the file (feof() is true), no action need be taken.
If the stream is open with a mode that allows reading and the underlying open file description refers to a device that is capable of seeking, the application shall either perform an fflush(), or the stream shall be closed.
For the second handle:
If any previous active handle has been used by a function that explicitly changed the file offset, except as required above for the first handle, the application shall perform an lseek() or fseek() (as appropriate to the type of handle) to an appropriate location.
If the active handle ceases to be accessible before the requirements on the first handle, above, have been met, the state of the open file description becomes undefined. This might occur during functions such as a fork() or _exit().
The exec functions make inaccessible all streams that are open at the time they are called, independent of which streams or file descriptors may be available to the new process image.
When these rules are followed, regardless of the sequence of handles used, implementations shall ensure that an application, even one consisting of several processes, shall yield correct results: no data shall be lost or duplicated when writing, and all data shall be written in order, except as requested by seeks. It is implementation-defined whether, and under what conditions, all input is seen exactly once.
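The rules for changing the active handle can be sketched for the common case of a write-only stream followed by direct use of its file descriptor. The function name mixed_handle_write and the path template under /tmp are assumptions made for this illustration. The fflush() call is the action required on the first handle before the descriptor becomes the active handle.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Write "first " through a stream and "second" through the underlying
   file descriptor, then read the file back into out.
   Returns 0 on success, -1 on failure. */
int mixed_handle_write(char *out, size_t outsz)
{
    char path[] = "/tmp/handles_XXXXXX";    /* hypothetical location */
    int fd = mkstemp(path);
    FILE *fp;
    ssize_t n;

    if (fd == -1)
        return -1;
    unlink(path);                   /* file disappears once closed */

    fp = fdopen(fd, "w");           /* stream handle, write only */
    if (fp == NULL) {
        close(fd);
        return -1;
    }

    /* First handle: the stream. */
    if (fputs("first ", fp) == EOF || fflush(fp) != 0) {
        fclose(fp);
        return -1;
    }

    /* Second handle: the file descriptor, now the active handle. */
    if (write(fd, "second", 6) != 6) {
        fclose(fp);
        return -1;
    }

    /* pread() reads from offset 0 without moving the file offset. */
    n = pread(fd, out, outsz - 1, 0);
    fclose(fp);                     /* also closes fd */
    if (n < 0)
        return -1;
    out[n] = '\0';
    return 0;
}
```

With the flush in place the file is expected to contain "first second". Omitting it leaves the result undefined under the rules above.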
Each function that operates on a stream is said to have zero or more "underlying functions". This means that the stream function shares certain traits with the underlying functions, but does not require that there be any relation between the implementations of the stream function and its underlying functions.
For conformance to the ISO/IEC 9899:1999 standard, the definition of a stream includes an "orientation". After a stream is associated with an external file, but before any operations are performed on it, the stream is without orientation. Once a wide-character input/output function has been applied to a stream without orientation, the stream shall become "wide-oriented". Similarly, once a byte input/output function has been applied to a stream without orientation, the stream shall become "byte-oriented". Only a call to the freopen() function or the fwide() function can otherwise alter the orientation of a stream.
A successful call to freopen() shall remove any orientation. The three predefined streams standard input, standard output, and standard error shall be unoriented at program start-up.
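Stream orientation can be observed with fwide(), which queries the orientation without changing it when its mode argument is 0. The following sketch uses a hypothetical helper, probe_orientation, on a temporary stream.

```c
#include <stdio.h>
#include <wchar.h>

/* Open a temporary stream, record its orientation, apply one byte or
   wide-character output function, and record the orientation again.
   fwide(fp, 0) returns 0 for no orientation, a negative value for
   byte-oriented, and a positive value for wide-oriented.
   Returns 0 on success, -1 on failure. */
int probe_orientation(int use_wide, int *before, int *after)
{
    FILE *fp = tmpfile();

    if (fp == NULL)
        return -1;

    *before = fwide(fp, 0);         /* query only */

    if (use_wide) {
        if (fputwc(L'x', fp) == WEOF) {
            fclose(fp);
            return -1;
        }
    } else {
        if (fputc('x', fp) == EOF) {
            fclose(fp);
            return -1;
        }
    }

    *after = fwide(fp, 0);
    fclose(fp);
    return 0;
}
```

A freshly opened stream reports 0. After the first output call the result is negative for the byte case and positive for the wide-character case.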
Byte input/output functions cannot be applied to a wide-oriented stream, and wide-character input/output functions cannot be applied to a byte-oriented stream. The remaining stream operations shall not affect and shall not be affected by a stream's orientation, except for the following additional restriction:
For wide-oriented streams, after a successful call to a file-positioning function that leaves the file position indicator prior to the end-of-file, a wide-character output function can overwrite a partial character; any file contents beyond the byte(s) written are henceforth undefined.
Each wide-oriented stream has an associated mbstate_t object that stores the current parse state of the stream. A successful call to fgetpos() shall store a representation of the value of this mbstate_t object as part of the value of the fpos_t object. A later successful call to fsetpos() using the same stored fpos_t value shall restore the value of the associated mbstate_t object as well as the position within the controlled stream.
Implementations that support multiple encoding rules associate an encoding rule with the stream. The encoding rule shall be determined by the setting of the LC_CTYPE category in the current locale at the time when the stream becomes wide-oriented. As with the stream's orientation, the encoding rule associated with a stream cannot be changed once it has been set, except by a successful call to freopen() which clears the encoding rule and resets the orientation to unoriented.
Although wide-oriented streams are conceptually sequences of wide characters, the external file associated with a wide-oriented stream is a sequence of (possibly multi-byte) characters generalized as follows:
Multi-byte encodings within files may contain embedded null bytes (unlike multi-byte encodings valid for use internal to the program).
A file need not begin nor end in the initial shift state.
Moreover, the encodings used for characters may differ among files. Both the nature and choice of such encodings are implementation-defined.
The wide-character input functions read characters from the stream and convert them to wide characters as if they were read by successive calls to the fgetwc() function. Each conversion shall occur as if by a call to the mbrtowc() function, with the conversion state described by the stream's own mbstate_t object, [CX] except the encoding rule associated with the stream is used instead of the encoding rule implied by the LC_CTYPE category of the current locale.
The wide-character output functions convert wide characters to (possibly multi-byte) characters and write them to the stream as if they were written by successive calls to the fputwc() function. Each conversion shall occur as if by a call to the wcrtomb() function, with the conversion state described by the stream's own mbstate_t object, [CX] except the encoding rule associated with the stream is used instead of the encoding rule implied by the LC_CTYPE category of the current locale.
An "encoding error" shall occur if the character sequence presented to the underlying mbrtowc() function does not form a valid (generalized) character, or if the code value passed to the underlying wcrtomb() function does not correspond to a valid (generalized) character. The wide-character input/output functions and the byte input/output functions store the value of the macro [EILSEQ] in errno if and only if an encoding error occurs.
[OB XSR] STREAMS functionality is provided on implementations supporting the XSI STREAMS Option Group. The functionality described in this section is dependent on support of the XSI STREAMS option (and the rest of this section is not further marked for this option).
STREAMS provides a uniform mechanism for implementing networking services and other character-based I/O. The STREAMS function provides direct access to protocol modules. STREAMS modules are unspecified objects. Access to STREAMS modules is provided by interfaces in POSIX.1-2017. Creation of STREAMS modules is outside the scope of POSIX.1-2017.
A STREAM is typically a full-duplex connection between a process and an open device or pseudo-device. However, since pipes may be STREAMS-based, a STREAM can be a full-duplex connection between two processes. The STREAM itself exists entirely within the implementation and provides a general character I/O function for processes. It optionally includes one or more intermediate processing modules that are interposed between the process end of the STREAM (STREAM head) and a device driver at the end of the STREAM (STREAM end).
STREAMS I/O is based on messages. There are three types of message:
Data messages containing actual data for input or output
Control data containing instructions for the STREAMS modules and underlying implementation
Other messages, which include file descriptors
The interface between the STREAM and the rest of the implementation is provided by a set of functions at the STREAM head. When a process calls write(), writev(), putmsg(), putpmsg(), or ioctl(), messages are sent down the STREAM, and read(), readv(), getmsg(), or getpmsg() accepts data from the STREAM and passes it to a process. Data intended for the device at the downstream end of the STREAM is packaged into messages and sent downstream, while data and signals from the device are composed into messages by the device driver and sent upstream to the STREAM head.
When a STREAMS-based device is opened, a STREAM shall be created that contains the STREAM head and the STREAM end (driver). If pipes are STREAMS-based in an implementation, when a pipe is created, two STREAMS shall be created, each containing a STREAM head. Other modules are added to the STREAM using ioctl(). New modules are "pushed" onto the STREAM one at a time in last-in, first-out (LIFO) style, as though the STREAM was a push-down stack.
Message types are classified according to their queuing priority and may be normal (non-priority), priority, or high-priority messages. A message belongs to a particular priority band that determines its ordering when placed on a queue. Normal messages have a priority band of 0 and shall always be placed at the end of the queue following all other messages in the queue. High-priority messages are always placed at the head of a queue, but shall be discarded if there is already a high-priority message in the queue. Their priority band shall be ignored; they are high-priority by virtue of their type. Priority messages have a priority band greater than 0. Priority messages are always placed after any messages of the same or higher priority. High-priority and priority messages are used to send control and data information outside the normal flow of control. By convention, high-priority messages shall not be affected by flow control. Normal and priority messages have separate flow controls.
A process may access STREAMS messages that contain a data part, control part, or both. The data part is that information which is transmitted over the communication medium and the control information is used by the local STREAMS modules. The other types of messages are used between modules and are not accessible to processes. Messages containing only a data part are accessible via putmsg(), putpmsg(), getmsg(), getpmsg(), read(), readv(), write(), or writev(). Messages containing a control part with or without a data part are accessible via calls to putmsg(), putpmsg(), getmsg(), or getpmsg().
A process accesses STREAMS-based files using the standard functions close(), ioctl(), getmsg(), getpmsg(), open(), pipe(), poll(), putmsg(), putpmsg(), read(), or write(). Refer to the applicable function definitions for general properties and errors.
Calls to ioctl() shall perform control functions on the STREAM associated with the file descriptor fildes. The control functions may be performed by the STREAM head, a STREAMS module, or the STREAMS driver for the STREAM.
STREAMS modules and drivers can detect errors, sending an error message to the STREAM head, thus causing subsequent functions to fail and set errno to the value specified in the message. In addition, STREAMS modules and drivers can elect to fail a particular ioctl() request alone by sending a negative acknowledgement message to the STREAM head. This shall cause just the pending ioctl() request to fail and set errno to the value specified in the message.
[XSI] This section describes extensions to support interprocess communication. The functionality described in this section shall be provided on implementations that support the XSI option (and the rest of this section is not further marked).
The following message passing, semaphore, and shared memory services form an XSI interprocess communication facility. Certain aspects of their operation are common, and are defined as follows.
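IPC Functions | |
---|---|---
msgctl() | semctl() | shmat()
msgget() | semget() | shmctl()
msgrcv() | semop() | shmdt()
msgsnd() | | shmget()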
Another interprocess communication facility is provided by functions in the Realtime Option Group; see Realtime.
Each individual shared memory segment, semaphore set, and message queue shall be identified by a unique positive integer, called, respectively, a shared memory identifier, shmid, a semaphore identifier, semid, and a message queue identifier, msqid. The identifiers shall be returned by calls to shmget(), semget(), and msgget(), respectively.
Associated with each identifier is a data structure which contains data related to the operations which may be or may have been performed; see the Base Definitions volume of POSIX.1-2017, <sys/shm.h>, <sys/sem.h>, and <sys/msg.h> for their descriptions.
Each of the data structures contains both ownership information and an ipc_perm structure (see the Base Definitions volume of POSIX.1-2017, <sys/ipc.h>) which are used in conjunction to determine whether or not read/write (read/alter for semaphores) permissions should be granted to processes using the IPC facilities. The mode member of the ipc_perm structure acts as a bit field which determines the permissions.
The values of the bits are given below in octal notation.
Bit | Meaning
---|---
0400 | Read by user.
0200 | Write by user.
0040 | Read by group.
0020 | Write by group.
0004 | Read by others.
0002 | Write by others.
The name of the ipc_perm structure is shm_perm, sem_perm, or msg_perm, depending on which service is being used. In each case, read and write/alter permissions shall be granted to a process if one or more of the following are true ("xxx" is replaced by shm, sem, or msg, as appropriate):
The process has appropriate privileges.
The effective user ID of the process matches xxx_perm.cuid or xxx_perm.uid in the data structure associated with the IPC identifier, and the appropriate bit of the user field in xxx_perm.mode is set.
The effective user ID of the process does not match xxx_perm.cuid or xxx_perm.uid but the effective group ID of the process matches xxx_perm.cgid or xxx_perm.gid in the data structure associated with the IPC identifier, and the appropriate bit of the group field in xxx_perm.mode is set.
The effective user ID of the process does not match xxx_perm.cuid or xxx_perm.uid and the effective group ID of the process does not match xxx_perm.cgid or xxx_perm.gid in the data structure associated with the IPC identifier, but the appropriate bit of the other field in xxx_perm.mode is set.
Otherwise, the permission shall be denied.
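The permission conditions above can be expressed as a small decision function. This is a sketch only: struct perm_info is a hypothetical stand-in for the relevant ipc_perm members, the privileged flag stands for "the process has appropriate privileges", and WANT_READ and WANT_WRITE are illustrative names for the per-class read and write/alter bits.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical mirror of the ipc_perm members used by the check. */
struct perm_info {
    uid_t  uid;     /* owner's user ID */
    gid_t  gid;     /* owner's group ID */
    uid_t  cuid;    /* creator's user ID */
    gid_t  cgid;    /* creator's group ID */
    mode_t mode;    /* permission bits, e.g. 0640 */
};

enum {
    WANT_READ  = 04,    /* read bit within one permission class */
    WANT_WRITE = 02     /* write/alter bit within one permission class */
};

/* Decide whether a process with the given effective IDs is granted the
   requested access. Exactly one class (user, group, other) applies. */
bool ipc_access_allowed(const struct perm_info *p, uid_t euid, gid_t egid,
                        bool privileged, unsigned want)
{
    unsigned shift;

    if (privileged)
        return true;

    if (euid == p->cuid || euid == p->uid)
        shift = 6;      /* user field: 0400 / 0200 */
    else if (egid == p->cgid || egid == p->gid)
        shift = 3;      /* group field: 0040 / 0020 */
    else
        shift = 0;      /* other field: 0004 / 0002 */

    return (((unsigned)p->mode >> shift) & want) == want;
}
```

Note that a matching user ID selects the user field even when a more permissive group or other field exists, so an owner with mode 0004 is denied read access while an unrelated process is granted it.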
In addition to the ipc_perm structure, each associated data structure includes several time_t fields for recording timestamps of particular operations. When an operation is described as setting a timestamp to the current time, that particular timestamp member of the associated data structure shall be set to the largest time_t value which is not greater than the current time.
This section defines functions to support the source portability of applications with realtime requirements. The presence of some of these functions is dependent on support for implementation options described in the text.
The specific functional areas included in this section and their scope include the following. Full definitions of these terms can be found in XBD Definitions.
Semaphores
Process Memory Locking
Memory Mapped Files and Shared Memory Objects
Priority Scheduling
Realtime Signal Extension
Timers
Interprocess Communication
Synchronized Input and Output
Asynchronous Input and Output
All the realtime functions defined in this volume of POSIX.1-2017 are portable, although some of the numeric parameters used by an implementation may have hardware dependencies.
See Realtime Signal Generation and Delivery.
An asynchronous I/O control block structure aiocb is used in many asynchronous I/O functions. It is defined in the Base Definitions volume of POSIX.1-2017, <aio.h> and has at least the following members:
Member Type | Member Name | Description
---|---|---
int | aio_fildes | File descriptor.
off_t | aio_offset | File offset.
volatile void* | aio_buf | Location of buffer.
size_t | aio_nbytes | Length of transfer.
int | aio_reqprio | Request priority offset.
struct sigevent | aio_sigevent | Signal number and value.
int | aio_lio_opcode | Operation to be performed.
The aio_fildes element is the file descriptor on which the asynchronous operation is performed.
If O_APPEND is not set for the file descriptor aio_fildes and if aio_fildes is associated with a device that is capable of seeking, then the requested operation takes place at the absolute position in the file as given by aio_offset, as if lseek() were called immediately prior to the operation with an offset argument equal to aio_offset and a whence argument equal to SEEK_SET. If O_APPEND is set for the file descriptor, or if aio_fildes is associated with a device that is incapable of seeking, write operations append to the file in the same order as the calls were made, with the following exception: under implementation-defined circumstances, such as operation on a multi-processor or when requests of differing priorities are submitted at the same time, the ordering restriction may be relaxed. Since there is no way for a strictly conforming application to determine whether this relaxation applies, all strictly conforming applications which rely on ordering of output shall be written in such a way that they will operate correctly if the relaxation applies. After a successful call to enqueue an asynchronous I/O operation, the value of the file offset for the file is unspecified. The aio_nbytes and aio_buf elements are the same as the nbyte and buf arguments defined by read() and write(), respectively.
If _POSIX_PRIORITIZED_IO and _POSIX_PRIORITY_SCHEDULING are defined, then asynchronous I/O is queued in priority order, with the priority of each asynchronous operation based on the current scheduling priority of the calling process. The aio_reqprio member can be used to lower (but not raise) the asynchronous I/O operation priority and is within the range zero through {AIO_PRIO_DELTA_MAX}, inclusive. Unless both _POSIX_PRIORITIZED_IO and _POSIX_PRIORITY_SCHEDULING are defined, the order of processing asynchronous I/O requests is unspecified. When both _POSIX_PRIORITIZED_IO and _POSIX_PRIORITY_SCHEDULING are defined, the order of processing of requests submitted by processes whose schedulers are not SCHED_FIFO, SCHED_RR, or SCHED_SPORADIC is unspecified. The priority of an asynchronous request is computed as (process scheduling priority) minus aio_reqprio. The priority assigned to each asynchronous I/O request is an indication of the desired order of execution of the request relative to other asynchronous I/O requests for this file. If _POSIX_PRIORITIZED_IO is defined, requests issued with the same priority to a character special file are processed by the underlying device in FIFO order; the order of processing of requests of the same priority issued to files that are not character special files is unspecified. Numerically higher priority values indicate requests of higher priority. The value of aio_reqprio has no effect on process scheduling priority. When prioritized asynchronous I/O requests to the same file are blocked waiting for a resource required for that I/O operation, the higher-priority I/O requests shall be granted the resource before lower-priority I/O requests are granted the resource. The relative priority of asynchronous I/O and synchronous I/O is implementation-defined. If _POSIX_PRIORITIZED_IO is defined, the implementation shall define for which files I/O prioritization is supported.
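The priority arithmetic described above can be written out as a short sketch. The function name request_priority is hypothetical, and the upper bound is passed as a parameter (delta_max) rather than taken from {AIO_PRIO_DELTA_MAX}, so that the example does not depend on a particular implementation's limits.

```c
/* Compute the priority of an asynchronous request as
   (process scheduling priority) minus aio_reqprio.
   reqprio must lie in the range 0 through delta_max, inclusive, so the
   request priority can be lowered but never raised.
   Returns 0 and stores the result in *out, or -1 if reqprio is out of
   range. */
int request_priority(int sched_priority, int reqprio, int delta_max, int *out)
{
    if (reqprio < 0 || reqprio > delta_max)
        return -1;

    *out = sched_priority - reqprio;
    return 0;
}
```

For example, a process scheduling priority of 50 with aio_reqprio set to 5 yields a request priority of 45. Numerically higher results indicate requests of higher priority.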
The aio_sigevent determines how the calling process shall be notified upon I/O completion, as specified in Signal Generation and Delivery. If aio_sigevent.sigev_notify is SIGEV_NONE, then no signal shall be posted upon I/O completion, but the error status for the operation and the return status for the operation shall be set appropriately.
The aio_lio_opcode field is used only by the lio_listio() call. The lio_listio() call allows multiple asynchronous I/O operations to be submitted at a single time. The function takes as an argument an array of pointers to aiocb structures. Each aiocb structure indicates the operation to be performed (read or write) via the aio_lio_opcode field.
The address of the aiocb structure is used as a handle for retrieving the error status and return status of the asynchronous operation while it is in progress.
The aiocb structure and the data buffers associated with the asynchronous I/O operation are being used by the system for asynchronous I/O while, and only while, the error status of the asynchronous operation is equal to [EINPROGRESS]. Applications shall not modify the aiocb structure while the structure is being used by the system for asynchronous I/O.
The return status of the asynchronous operation is the number of bytes transferred by the I/O operation. If the error status is set to indicate an error completion, then the return status is set to the return value that the corresponding read(), write(), or fsync() call would have returned. When the error status is not equal to [EINPROGRESS], the return status shall reflect the return status of the corresponding synchronous operation.
[MLR] Range memory locking operations are defined in terms of pages. Implementations may restrict the size and alignment of range lockings to be on page-size boundaries. The page size, in bytes, is the value of the configurable system variable {PAGESIZE}. If an implementation has no restrictions on size or alignment, it may specify a 1-byte page size.
[ML|MLR] Memory locking guarantees the residence of portions of the address space. It is implementation-defined whether locking memory guarantees fixed translation between virtual addresses (as seen by the process) and physical addresses. Per-process memory locks are not inherited across a fork(), and all memory locks owned by a process are unlocked upon exec or process termination. Unmapping of an address range removes any memory locks established on that address range by this process.
Range memory mapping operations are defined in terms of pages. Implementations may restrict the size and alignment of range mappings to be on page-size boundaries. The page size, in bytes, is the value of the configurable system variable {PAGESIZE}. If an implementation has no restrictions on size or alignment, it may specify a 1-byte page size.
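Since implementations may require page-size alignment, an application typically rounds addresses and lengths to page boundaries before locking or mapping a range. The following is a minimal sketch; page_floor() and page_round_up() are hypothetical helper names, and the page size is passed in by the caller (in real code it would usually come from sysconf(_SC_PAGESIZE)). Modulo arithmetic is used rather than bit masks so that a 1-byte page size also works.

```c
#include <stddef.h>
#include <stdint.h>

/* Round addr down to the start of the page that contains it.
 * pagesize must be non-zero. */
uintptr_t page_floor(uintptr_t addr, size_t pagesize)
{
    return addr - (addr % pagesize);
}

/* Round len up to a whole number of pages. pagesize must be non-zero. */
size_t page_round_up(size_t len, size_t pagesize)
{
    size_t rem = len % pagesize;
    return rem == 0 ? len : len + (pagesize - rem);
}
```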
Memory mapped files provide a mechanism that allows a process to access files by directly incorporating file data into its address space. Once a file is mapped into a process address space, the data can be manipulated as memory. If more than one process maps a file, its contents are shared among them. If the mappings allow shared write access, then data written into the memory object through the address space of one process appears in the address spaces of all processes that similarly map the same portion of the memory object.
[SHM] Shared memory objects are named regions of storage that may be independent of the file system and can be mapped into the address space of one or more processes to allow them to share the associated memory.
An unlink() of a file [SHM] or shm_unlink() of a shared memory object, while causing the removal of the name, does not unmap any mappings established for the object. Once the name has been removed, the contents of the memory object are preserved as long as it is referenced. The memory object remains referenced as long as a process has the memory object open or has some area of the memory object mapped.
When an object is mapped, various application accesses to the mapped region may result in signals. In this context, SIGBUS is used to indicate an error using the mapped object, and SIGSEGV is used to indicate a protection violation or misuse of an address:
A mapping may be restricted to disallow some types of access.
Write attempts to memory that was mapped without write access, or any access to memory mapped PROT_NONE, shall result in a SIGSEGV signal.
References to unmapped addresses shall result in a SIGSEGV signal.
Reference to whole pages within the mapping, but beyond the current length of the object, shall result in a SIGBUS signal.
The size of the object is unaffected by access beyond the end of the object (even if a SIGBUS is not generated).
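The rules above can be modeled as a small classification function. This is only a sketch under simplifying assumptions: the mapping starts at offset zero of the object, map_len is a multiple of the page size, nothing else is mapped immediately after the mapping, and access protections are ignored. The names classify_access() and the enumeration constants are hypothetical.

```c
#include <stddef.h>

enum access_result {
    ACCESS_NO_SIGNAL_REQUIRED, /* inside the object, or in its last partial page */
    ACCESS_SIGBUS,             /* whole page in the mapping, beyond the object */
    ACCESS_SIGSEGV             /* address is not mapped at all */
};

/* Classify a reference at byte 'offset' from the start of the mapping. */
enum access_result classify_access(size_t offset, size_t map_len,
                                   size_t object_len, size_t pagesize)
{
    /* First byte of the first whole page that lies beyond the object. */
    size_t rem = object_len % pagesize;
    size_t object_pages_end = rem == 0 ? object_len
                                       : object_len + (pagesize - rem);

    if (offset >= map_len)
        return ACCESS_SIGSEGV;
    if (offset >= object_pages_end)
        return ACCESS_SIGBUS;
    return ACCESS_NO_SIGNAL_REQUIRED;
}
```

With a 4096-byte page, a 5000-byte object, and a 16384-byte mapping, offset 6000 falls in the object's last partial page (no signal is required by the rules above), offset 8192 is classified as SIGBUS, and offset 16384 as SIGSEGV.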
[TYM] The functionality described in this section shall be provided on implementations that support the Typed Memory Objects option (and the rest of this section is not further marked for this option).
Implementations may support the Typed Memory Objects option independently of support for memory mapped files or shared memory objects. Typed memory objects are implementation-configurable named storage pools accessible from one or more processors in a system, each via one or more ports, such as backplane buses, LANs, I/O channels, and so on. Each valid combination of a storage pool and a port is identified through a name that is defined at system configuration time, in an implementation-defined manner; the name may be independent of the file system. Using this name, a typed memory object can be opened and mapped into process address space. For a given storage pool and port, it is necessary to support both dynamic allocation from the pool as well as mapping at an application-supplied offset within the pool; when dynamic allocation has been performed, subsequent deallocation must be supported. Lastly, accessing typed memory objects from different ports requires a method for obtaining the offset and length of contiguous storage of a region of typed memory (dynamically allocated or not); this allows typed memory to be shared among processes and/or processors while being accessed from the desired port.
[PS] The functionality described in this section shall be provided on implementations that support the Process Scheduling option (and the rest of this section is not further marked for this option).
The scheduling semantics described in this volume of POSIX.1-2017 are defined in terms of a conceptual model that contains a set of thread lists. No implementation structures are necessarily implied by the use of this conceptual model. It is assumed that no time elapses during operations described using this model, and therefore no simultaneous operations are possible. This model discusses only processor scheduling for runnable threads, but it should be noted that greatly enhanced predictability of realtime applications results if the sequencing of other resources takes processor scheduling policy into account.
There is, conceptually, one thread list for each priority. A runnable thread will be on the thread list for that thread's priority. Multiple scheduling policies shall be provided. Each non-empty thread list is ordered, contains a head as one end of its order, and a tail as the other. The purpose of a scheduling policy is to define the allowable operations on this set of lists (for example, moving threads between and within lists).
The POSIX model treats a "process" as an aggregation of system resources, including one or more threads that may be scheduled by the operating system on the processor(s) it controls. Although a process has its own set of scheduling attributes, these have an indirect effect (if any) on the scheduling behavior of individual threads as described below.
Each thread shall be controlled by an associated scheduling policy and priority. These parameters may be specified by explicit application execution of the pthread_setschedparam() function. Additionally, the scheduling parameters of a thread (but not its scheduling policy) may be changed by application execution of the pthread_setschedprio() function.
Each process shall be controlled by an associated scheduling policy and priority. These parameters may be specified by explicit application execution of the sched_setscheduler() or sched_setparam() functions.
The effect of the process scheduling attributes on individual threads in the process is dependent on the scheduling contention scope of the threads (see Thread Scheduling):
For threads with system scheduling contention scope, the process scheduling attributes shall have no effect on the scheduling attributes or behavior either of the thread or an underlying kernel scheduling entity dedicated to that thread.
For threads with process scheduling contention scope, the process scheduling attributes shall have no effect on the scheduling attributes of the thread. However, any underlying kernel scheduling entity used by these threads shall at all times behave as specified by the scheduling attributes of the containing process, and this behavior may affect the scheduling behavior of the process contention scope threads. For example, a process contention scope thread with scheduling policy SCHED_FIFO and the system maximum priority H (the value returned by sched_get_priority_max(SCHED_FIFO)) in a process with scheduling policy SCHED_RR and system minimum priority L (the value returned by sched_get_priority_min(SCHED_RR)) shall be subject to timeslicing and to preemption by any thread with an effective priority higher than L.
Associated with each policy is a priority range. Each policy definition shall specify the minimum priority range for that policy. The priority ranges for each policy may but need not overlap the priority ranges of other policies.
A conforming implementation shall select the thread that is defined as being at the head of the highest priority non-empty thread list to become a running thread, regardless of its associated policy. This thread is then removed from its thread list.
Four scheduling policies are specifically required. Other implementation-defined scheduling policies may be defined. The following symbols are defined in the Base Definitions volume of POSIX.1-2017, <sched.h>: SCHED_FIFO, SCHED_RR, SCHED_SPORADIC, and SCHED_OTHER.
The values of these symbols shall be distinct.
Conforming implementations shall include a scheduling policy called the FIFO scheduling policy.
Threads scheduled under this policy are chosen from a thread list that is ordered by the time its threads have been on the list without being executed; generally, the head of the list is the thread that has been on the list the longest time, and the tail is the thread that has been on the list the shortest time.
Under the SCHED_FIFO policy, the modification of the definitional thread lists is as follows:
When a running thread becomes a preempted thread, it becomes the head of the thread list for its priority.
When a blocked thread becomes a runnable thread, it becomes the tail of the thread list for its priority.
When a running thread calls the sched_setscheduler() function, the process specified in the function call is modified to the specified policy and the priority specified by the param argument.
When a running thread calls the sched_setparam() function, the priority of the process specified in the function call is modified to the priority specified by the param argument.
When a running thread calls the pthread_setschedparam() function, the thread specified in the function call is modified to the specified policy and the priority specified by the param argument.
When a running thread calls the pthread_setschedprio() function, the thread specified in the function call is modified to the priority specified by the prio argument.
If a thread whose policy or priority has been modified other than by pthread_setschedprio() is a running thread or is runnable, it then becomes the tail of the thread list for its new priority.
If a thread whose priority has been modified by pthread_setschedprio() is a running thread or is runnable, the effect on its position in the thread list depends on the direction of the modification, as follows:
If the priority is raised, the thread becomes the tail of the thread list.
If the priority is unchanged, the thread does not change position in the thread list.
If the priority is lowered, the thread becomes the head of the thread list.
When a running thread issues the sched_yield() function, the thread becomes the tail of the thread list for its priority.
At no other time is the position of a thread with this scheduling policy within the thread lists affected.
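The conceptual thread lists and the SCHED_FIFO list operations above can be sketched as a toy model. Everything here is illustrative: the fixed sizes, the integer thread IDs, and the function names are assumptions, and the model says nothing about how a real scheduler is implemented. A preempted thread is placed with push_head(); a thread that becomes runnable after blocking, or that calls sched_yield(), is placed with push_tail().

```c
#include <string.h>

#define MAX_PRIO 4     /* illustrative: priorities 0..3 */
#define MAX_THREADS 8  /* illustrative capacity of each list */

struct thread_list { int ids[MAX_THREADS]; int count; };
static struct thread_list lists[MAX_PRIO];

void lists_reset(void) { memset(lists, 0, sizeof lists); }

/* Blocked thread becomes runnable, or running thread yields. */
void push_tail(int prio, int id)
{
    struct thread_list *l = &lists[prio];
    if (l->count < MAX_THREADS)
        l->ids[l->count++] = id;
}

/* Running thread is preempted. */
void push_head(int prio, int id)
{
    struct thread_list *l = &lists[prio];
    if (l->count >= MAX_THREADS)
        return;
    memmove(&l->ids[1], &l->ids[0], (size_t)l->count * sizeof l->ids[0]);
    l->ids[0] = id;
    l->count++;
}

/* Remove id from the list for prio; returns 1 if it was present. */
int remove_thread(int prio, int id)
{
    struct thread_list *l = &lists[prio];
    for (int i = 0; i < l->count; i++) {
        if (l->ids[i] == id) {
            memmove(&l->ids[i], &l->ids[i + 1],
                    (size_t)(l->count - i - 1) * sizeof l->ids[0]);
            l->count--;
            return 1;
        }
    }
    return 0;
}

/* Take the head of the highest-priority non-empty list; -1 if all empty. */
int select_next(void)
{
    for (int p = MAX_PRIO - 1; p >= 0; p--) {
        if (lists[p].count > 0) {
            int id = lists[p].ids[0];
            remove_thread(p, id);
            return id;
        }
    }
    return -1;
}

/* Model of pthread_setschedprio() on a runnable thread:
 * raised -> tail of new list, lowered -> head, unchanged -> no move. */
void set_prio_runnable(int id, int old_prio, int new_prio)
{
    if (new_prio == old_prio || !remove_thread(old_prio, id))
        return;
    if (new_prio > old_prio)
        push_tail(new_prio, id);
    else
        push_head(new_prio, id);
}
```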
For this policy, valid priorities shall be within the range returned by the sched_get_priority_max() and sched_get_priority_min() functions when SCHED_FIFO is provided as the parameter. Conforming implementations shall provide a priority range of at least 32 priorities for this policy.
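An application can query the valid range at run time instead of hard-coding priority values. The sketch below uses only sched_get_priority_min() and sched_get_priority_max(); the wrapper names priority_count() and priority_in_range() are hypothetical. For simplicity it treats a return value of -1 as failure without inspecting errno.

```c
#include <sched.h>
#include <stdbool.h>

/* Number of distinct priorities for 'policy', or -1 if the query fails. */
int priority_count(int policy)
{
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);
    if (lo == -1 || hi == -1)
        return -1; /* simplification: -1 is taken to mean failure */
    return hi - lo + 1;
}

/* True if 'prio' lies within the range reported for 'policy'. */
bool priority_in_range(int policy, int prio)
{
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);
    if (lo == -1 || hi == -1)
        return false;
    return prio >= lo && prio <= hi;
}
```

On a conforming implementation, priority_count(SCHED_FIFO) would be expected to return at least 32.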
Conforming implementations shall include a scheduling policy called the "round robin" scheduling policy. This policy shall be identical to the SCHED_FIFO policy with the additional condition that when the implementation detects that a running thread has been executing as a running thread for a time period of the length returned by the sched_rr_get_interval() function or longer, the thread shall become the tail of its thread list and the head of that thread list shall be removed and made a running thread.
The effect of this policy is to ensure that if there are multiple SCHED_RR threads at the same priority, one of them does not monopolize the processor. An application should not rely only on the use of SCHED_RR to ensure application progress among multiple threads if the application includes threads using the SCHED_FIFO policy at the same or higher priority levels or SCHED_RR threads at a higher priority level.
A thread under this policy that is preempted and subsequently resumes execution as a running thread completes the unexpired portion of its round robin interval time period.
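The interval accounting described above can be sketched as follows. The structure rr_thread and the function rr_account() are hypothetical names for a toy model; in a real program the interval length would be obtained from sched_rr_get_interval(). The point illustrated is that preemption does not reset the time already used, so a resumed thread only completes the unexpired portion of its interval.

```c
/* Time, in nanoseconds, that the thread has run in its current interval. */
struct rr_thread { long long used_ns; };

/* Charge ran_ns of execution to the thread. Returns 1 if the round robin
 * interval has expired (the thread moves to the tail of its list and starts
 * a fresh interval), 0 otherwise. Being preempted between calls leaves
 * used_ns untouched. */
int rr_account(struct rr_thread *t, long long ran_ns, long long interval_ns)
{
    t->used_ns += ran_ns;
    if (t->used_ns >= interval_ns) {
        t->used_ns = 0;
        return 1;
    }
    return 0;
}
```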
For this policy, valid priorities shall be within the range returned by the sched_get_priority_max() and sched_get_priority_min() functions when SCHED_RR is provided as the parameter. Conforming implementations shall provide a priority range of at least 32 priorities for this policy.
[SS|TSP] The functionality described in this section shall be provided on implementations that support the Process Sporadic Server or Thread Sporadic Server options (and the rest of this section is not further marked for these options).
If _POSIX_SPORADIC_SERVER or _POSIX_THREAD_SPORADIC_SERVER is defined, the implementation shall include a scheduling policy identified by the value SCHED_SPORADIC.
The sporadic server policy is based primarily on two time parameters: the replenishment period and the available execution capacity. The replenishment period is given by the sched_ss_repl_period member of the sched_param structure. The available execution capacity is initialized to the value given by the sched_ss_init_budget member of the same parameter. The sporadic server policy is identical to the SCHED_FIFO policy with some additional conditions that cause the thread's assigned priority to be switched between the values specified by the sched_priority and sched_ss_low_priority members of the sched_param structure.
The priority assigned to a thread using the sporadic server scheduling policy is determined in the following manner: if the available execution capacity is greater than zero and the number of pending replenishment operations is strictly less than sched_ss_max_repl, the thread is assigned the priority specified by sched_priority; otherwise, the assigned priority shall be sched_ss_low_priority. If the value of sched_priority is less than or equal to the value of sched_ss_low_priority, the results are undefined. When active, the thread shall belong to the thread list corresponding to its assigned priority level, according to the mentioned priority assignment. The modification of the available execution capacity and, consequently of the assigned priority, is done as follows:
1. When the thread at the head of the sched_priority list becomes a running thread, its execution time shall be limited to at most its available execution capacity, plus the resolution of the execution time clock used for this scheduling policy. This resolution shall be implementation-defined.
2. Each time the thread is inserted at the tail of the list associated with sched_priority (because as a blocked thread it became runnable with priority sched_priority, or because a replenishment operation was performed), the time at which this operation is done is posted as the activation_time.
3. When the running thread with assigned priority equal to sched_priority becomes a preempted thread, it becomes the head of the thread list for its priority, and the execution time consumed is subtracted from the available execution capacity. If the available execution capacity would become negative by this operation, it shall be set to zero.
4. When the running thread with assigned priority equal to sched_priority becomes a blocked thread, the execution time consumed is subtracted from the available execution capacity, and a replenishment operation is scheduled, as described in items 6 and 7. If the available execution capacity would become negative by this operation, it shall be set to zero.
5. When the running thread with assigned priority equal to sched_priority reaches the limit imposed on its execution time, it becomes the tail of the thread list for sched_ss_low_priority, the execution time consumed is subtracted from the available execution capacity (which becomes zero), and a replenishment operation is scheduled, as described in items 6 and 7.
6. Each time a replenishment operation is scheduled, the amount of execution capacity to be replenished, replenish_amount, is set equal to the execution time consumed by the thread since the activation_time. The replenishment is scheduled to occur at activation_time plus sched_ss_repl_period. If the scheduled time obtained is before the current time, the replenishment operation is carried out immediately. Several replenishment operations may be pending at the same time, each of which will be serviced at its respective scheduled time. With the above rules, the number of replenishment operations simultaneously pending for a given thread that is scheduled under the sporadic server policy shall not be greater than sched_ss_max_repl.
7. A replenishment operation consists of adding the corresponding replenish_amount to the available execution capacity at the scheduled time. If, as a consequence of this operation, the execution capacity would become larger than sched_ss_init_budget, it shall be rounded down to a value equal to sched_ss_init_budget. Additionally, if the thread was runnable or running, and had assigned priority equal to sched_ss_low_priority, then it becomes the tail of the thread list for sched_priority.
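The priority assignment and capacity bookkeeping described for the sporadic server policy can be sketched as a small state model. The structure ss_state and the functions below are hypothetical; times are plain nanosecond counts rather than struct timespec values, and the scheduling of replenishment times (activation_time plus sched_ss_repl_period) is left out to keep the sketch short.

```c
struct ss_state {
    int sched_priority;        /* normal priority */
    int sched_ss_low_priority; /* background priority */
    int sched_ss_max_repl;     /* limit on pending replenishments */
    long long init_budget_ns;  /* sched_ss_init_budget, in nanoseconds */
    long long capacity_ns;     /* available execution capacity */
    int pending_repl;          /* replenishment operations pending */
};

/* Assigned priority: sched_priority only while capacity remains and fewer
 * than sched_ss_max_repl replenishments are pending. */
int ss_assigned_priority(const struct ss_state *s)
{
    if (s->capacity_ns > 0 && s->pending_repl < s->sched_ss_max_repl)
        return s->sched_priority;
    return s->sched_ss_low_priority;
}

/* Subtract consumed execution time; capacity never goes below zero. */
void ss_consume(struct ss_state *s, long long used_ns)
{
    s->capacity_ns -= used_ns;
    if (s->capacity_ns < 0)
        s->capacity_ns = 0;
}

/* Record that a replenishment operation has been scheduled. */
void ss_schedule_replenishment(struct ss_state *s)
{
    s->pending_repl++;
}

/* Carry out a pending replenishment; capacity is capped at the initial
 * budget. */
void ss_replenish(struct ss_state *s, long long amount_ns)
{
    s->capacity_ns += amount_ns;
    if (s->capacity_ns > s->init_budget_ns)
        s->capacity_ns = s->init_budget_ns;
    if (s->pending_repl > 0)
        s->pending_repl--;
}
```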
Execution time is defined in XBD CPU Time (Execution Time).
For this policy, changing the value of a CPU-time clock via clock_settime() shall have no effect on its behavior.
For this policy, valid priorities shall be within the range returned by the sched_get_priority_min() and sched_get_priority_max() functions when SCHED_SPORADIC is provided as the parameter. Conforming implementations shall provide a priority range of at least 32 distinct priorities for this policy.
If the scheduling policy of the target process is either SCHED_FIFO or SCHED_RR, the sched_ss_low_priority, sched_ss_repl_period, and sched_ss_init_budget members of the param argument shall have no effect on the scheduling behavior. If the scheduling policy of this process is not SCHED_FIFO, SCHED_RR, or SCHED_SPORADIC, the effects of these members are implementation-defined; this case includes the SCHED_OTHER policy.
Conforming implementations shall include one scheduling policy identified as SCHED_OTHER (which may execute identically with either the FIFO or round robin scheduling policy). The effect of scheduling threads with the SCHED_OTHER policy in a system in which other threads are executing under SCHED_FIFO, SCHED_RR, [SS] or SCHED_SPORADIC is implementation-defined.
This policy is defined to allow strictly conforming applications to be able to indicate in a portable manner that they no longer need a realtime scheduling policy.
For threads executing under this policy, the implementation shall use only priorities within the range returned by the sched_get_priority_max() and sched_get_priority_min() functions when SCHED_OTHER is provided as the parameter.
The <time.h> header defines the types and manifest constants used by the timing facility.
Many of the timing facility functions accept or return time value specifications. A time value structure timespec specifies a single time value and includes at least the following members:
Member Type | Member Name | Description |
---|---|---|
time_t | tv_sec | Seconds. |
long | tv_nsec | Nanoseconds. |
The tv_nsec member is only valid if greater than or equal to zero, and less than the number of nanoseconds in a second (1000 million). The time interval described by this structure is (tv_sec * 10^9 + tv_nsec) nanoseconds.
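The validity rule and the interval formula for timespec can be expressed directly in code. This is a minimal sketch; timespec_valid() and timespec_to_ns() are hypothetical helper names, and the conversion assumes the result fits in a long long, which does not hold for very large tv_sec values.

```c
#include <stdbool.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL /* 1000 million nanoseconds per second */

/* tv_nsec is valid only in the range [0, 10^9). */
bool timespec_valid(const struct timespec *ts)
{
    return ts->tv_nsec >= 0 && ts->tv_nsec < NSEC_PER_SEC;
}

/* Interval in nanoseconds: tv_sec * 10^9 + tv_nsec.
 * Assumes no overflow of long long. */
long long timespec_to_ns(const struct timespec *ts)
{
    return (long long)ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}
```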
A time value structure itimerspec specifies an initial timer value and a repetition interval for use by the per-process timer functions. This structure includes at least the following members:
Member Type | Member Name | Description |
---|---|---|
struct timespec | it_interval | Timer period. |
struct timespec | it_value | Timer expiration. |
If the value described by it_value is non-zero, it indicates the time to or time of the next timer expiration (for relative and absolute timer values, respectively). If the value described by it_value is zero, the timer shall be disarmed.
If the value described by it_interval is non-zero, it specifies an interval which shall be used in reloading the timer when it expires; that is, a periodic timer is specified. If the value described by it_interval is zero, the timer is disarmed after its next expiration; that is, a one-shot timer is specified.
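The way it_value and it_interval combine can be summarized by a small classification function. This is an illustrative sketch only; the enumeration and the function classify_itimerspec() are hypothetical names and not part of the timer interfaces.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>

enum timer_kind { KIND_DISARMED, KIND_ONE_SHOT, KIND_PERIODIC };

static int timespec_is_zero(const struct timespec *ts)
{
    return ts->tv_sec == 0 && ts->tv_nsec == 0;
}

/* Describe what a timer armed with *its would do:
 * zero it_value            -> timer is disarmed;
 * non-zero it_value only   -> one-shot timer;
 * both non-zero            -> periodic timer reloaded from it_interval. */
enum timer_kind classify_itimerspec(const struct itimerspec *its)
{
    if (timespec_is_zero(&its->it_value))
        return KIND_DISARMED;
    if (timespec_is_zero(&its->it_interval))
        return KIND_ONE_SHOT;
    return KIND_PERIODIC;
}
```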
Per-process timers may be created that notify the process of timer expirations by queuing a realtime extended signal. The sigevent structure, defined in the Base Definitions volume of POSIX.1-2017, <signal.h>, is used in creating such a timer. The sigevent structure contains the signal number and an application-specific data value which shall be used when notifying the calling process of timer expiration events.
The following constants are defined in the Base Definitions volume of POSIX.1-2017, <time.h>:
The maximum allowable resolution for CLOCK_REALTIME and [MON] CLOCK_MONOTONIC clocks and all time services based on these clocks is represented by {_POSIX_CLOCKRES_MIN} and shall be defined as 20 ms (1/50 of a second). Implementations may support smaller values of resolution for these clocks to provide finer granularity time bases. The actual resolution supported by an implementation for a specific clock is obtained using the clock_getres() function. If the actual resolution supported for a time service based on one of these clocks differs from the resolution supported for that clock, the implementation shall document this difference.
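An application can check the actual resolution of a clock against the 20 ms limit with clock_getres(). The following is a minimal sketch; the wrapper clock_resolution_ok() and the macro CLOCKRES_LIMIT_NS are hypothetical names introduced for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>

#define CLOCKRES_LIMIT_NS 20000000L /* 20 ms, the limit stated in the text */

/* Returns 1 if the clock's resolution is 20 ms or finer, 0 if it is
 * coarser, and -1 if clock_getres() fails. */
int clock_resolution_ok(clockid_t clock)
{
    struct timespec res;

    if (clock_getres(clock, &res) != 0)
        return -1;
    if (res.tv_sec > 0)
        return 0;
    return res.tv_nsec <= CLOCKRES_LIMIT_NS;
}
```

On a conforming implementation, clock_resolution_ok(CLOCK_REALTIME) would be expected to return 1.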
The minimum allowable maximum value for CLOCK_REALTIME and [MON] CLOCK_MONOTONIC clocks and all absolute time services based on them is the same as that defined by the ISO C standard for the time_t type. If the maximum value supported by a time service based on one of these clocks differs from the maximum value supported by that clock, the implementation shall document this difference.
[CPT] If _POSIX_CPUTIME is defined, process CPU-time clocks shall be supported in addition to the clocks described in Manifest Constants.
[TCT] If _POSIX_THREAD_CPUTIME is defined, thread CPU-time clocks shall be supported.
[CPT|TCT] CPU-time clocks measure execution or CPU time, which is defined in XBD CPU Time (Execution Time). The mechanism used to measure execution time is described in XBD Measurement of Execution Time.
[CPT] If _POSIX_CPUTIME is defined, the following constant of the type clockid_t is defined in <time.h>:
[TCT] If _POSIX_THREAD_CPUTIME is defined, the following constant of the type clockid_t is defined in <time.h>:
This section defines functionality to support multiple flows of control, called "threads", within a process. For the definition of threads, see XBD Thread.
The specific functional areas covered by threads and their scope include:
Thread management: the creation, control, and termination of multiple flows of control in the same process under the assumption of a common shared address space
Synchronization primitives optimized for tightly coupled operation of multiple control flows in a common, shared address space
All functions defined by this volume of POSIX.1-2017 shall be thread-safe, except that the following functions need not be thread-safe.
The ctermid() and tmpnam() functions need not be thread-safe if passed a NULL argument. The mbrlen(), mbrtowc(), mbsnrtowcs(), mbsrtowcs(), wcrtomb(), wcsnrtombs(), and wcsrtombs() functions need not be thread-safe if passed a NULL ps argument. The getc_unlocked(), getchar_unlocked(), putc_unlocked(), and putchar_unlocked() functions need not be thread-safe unless the invoking thread owns the (FILE *) object accessed by the call, as is the case after a successful call to the flockfile() or ftrylockfile() functions.
Implementations shall provide internal synchronization as necessary in order to satisfy this requirement.
Since multi-threaded applications are not allowed to use the environ variable to access or modify any environment variable while any other thread is concurrently modifying any environment variable, any function dependent on any environment variable is not thread-safe if another thread is modifying the environment; see XSH exec.
Although implementations may have thread IDs that are unique in a system, applications should only assume that thread IDs are usable and unique within a single process. The effect of calling any of the functions defined in this volume of POSIX.1-2017 and passing as an argument the thread ID of a thread from another process is unspecified. The lifetime of a thread ID ends after the thread terminates if it was created with the detachstate attribute set to PTHREAD_CREATE_DETACHED or if pthread_detach() or pthread_join() has been called for that thread. A conforming implementation is free to reuse a thread ID after its lifetime has ended. If an application attempts to use a thread ID whose lifetime has ended, the behavior is undefined.
If a thread is detached, its thread ID is invalid for use as an argument in a call to pthread_detach() or pthread_join().
A thread that has blocked shall not prevent any unblocked thread that is eligible to use the same processing resources from eventually making forward progress in its execution. Eligibility for processing resources is determined by the scheduling policy.
A thread shall become the owner of a mutex, m, when one of the following occurs:
It calls pthread_mutex_lock() with m as the mutex argument and the call returns zero or [EOWNERDEAD].
It calls pthread_mutex_trylock() with m as the mutex argument and the call returns zero or [EOWNERDEAD].
It calls pthread_mutex_timedlock() with m as the mutex argument and the call returns zero or [EOWNERDEAD].
It calls pthread_mutex_setprioceiling() with m as the mutex argument and the call returns [EOWNERDEAD].
It calls pthread_cond_wait() with m as the mutex argument and the call returns zero or certain error numbers (see pthread_cond_timedwait).
It calls pthread_cond_timedwait() with m as the mutex argument and the call returns zero or certain error numbers (see pthread_cond_timedwait).
The thread shall remain the owner of m until one of the following occurs:
It executes pthread_mutex_unlock() with m as the mutex argument.
It blocks in a call to pthread_cond_wait() with m as the mutex argument.
It blocks in a call to pthread_cond_timedwait() with m as the mutex argument.
The implementation shall behave as if at all times there is at most one owner of any mutex.
A thread that becomes the owner of a mutex is said to have "acquired" the mutex and the mutex is said to have become "locked"; when a thread gives up ownership of a mutex it is said to have "released" the mutex and the mutex is said to have become "unlocked".
A problem can occur if a process terminates while one of its threads holds a mutex lock. Depending on the mutex type, it might be possible for another thread to unlock the mutex and recover the state of the mutex. However, it is difficult to perform this recovery reliably.
Robust mutexes provide a means to enable the implementation to notify other threads in the event of a process terminating while one of its threads holds a mutex lock. The next thread that acquires the mutex is notified about the termination by the return value [EOWNERDEAD] from the locking function. The notified thread can then attempt to recover the state protected by the mutex, and if successful mark the state protected by the mutex as consistent by a call to pthread_mutex_consistent(). If the notified thread is unable to recover the state, it can declare the state as not recoverable by a call to pthread_mutex_unlock() without a prior call to pthread_mutex_consistent().
Whether or not the state protected by a mutex can be recovered is dependent solely on the application using robust mutexes. The robust mutex support provided in the implementation provides notification only that a mutex owner has terminated while holding a lock, or that the state of the mutex is not recoverable.
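The notification and recovery sequence for robust mutexes might look like the sketch below. The wrappers robust_mutex_init() and robust_lock(), and the callback recover_noop(), are hypothetical; the recovery callback stands in for whatever application-specific repair of the protected state is needed. Depending on the platform, building this may require the compiler's threading option.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <pthread.h>

/* Initialize *m as a robust mutex. Returns 0 or an error number. */
int robust_mutex_init(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Example recovery callback (hypothetical): nothing to repair. */
int recover_noop(void *state)
{
    (void)state;
    return 0;
}

/* Lock *m. If the previous owner terminated while holding the lock, try to
 * recover the protected state. On success the caller owns the mutex and the
 * return value is 0. If recovery fails, the mutex is unlocked without being
 * marked consistent, which declares the state not recoverable. */
int robust_lock(pthread_mutex_t *m, int (*recover)(void *), void *state)
{
    int rc = pthread_mutex_lock(m);

    if (rc != EOWNERDEAD)
        return rc; /* 0, or an ordinary error */
    if (recover(state) == 0)
        return pthread_mutex_consistent(m);
    pthread_mutex_unlock(m);
    return ENOTRECOVERABLE;
}
```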
[TPS] The functionality described in this section shall be provided on implementations that support the Thread Execution Scheduling option (and the rest of this section is not further marked for this option).
In support of the scheduling function, threads have attributes which are accessed through the pthread_attr_t thread creation attributes object.
The contentionscope attribute defines the scheduling contention scope of the thread to be either PTHREAD_SCOPE_PROCESS or PTHREAD_SCOPE_SYSTEM.
The inheritsched attribute specifies whether a newly created thread is to inherit the scheduling attributes of the creating thread or to have its scheduling values set according to the other scheduling attributes in the pthread_attr_t object.
The schedpolicy attribute defines the scheduling policy for the thread. The schedparam attribute defines the scheduling parameters for the thread. The interaction of threads having different policies within a process is described as part of the definition of those policies.
If the Thread Execution Scheduling option is defined, and the schedpolicy attribute specifies one of the priority-based policies defined under this option, the schedparam attribute contains the scheduling priority of the thread. A conforming implementation ensures that the priority value in schedparam is in the range associated with the scheduling policy when the thread attributes object is used to create a thread, or when the scheduling attributes of a thread are dynamically modified. The meaning of the priority value in schedparam is the same as that of priority.
[TSP] If _POSIX_THREAD_SPORADIC_SERVER is defined, the schedparam attribute supports four new members that are used for the sporadic server scheduling policy. These members are sched_ss_low_priority, sched_ss_repl_period, sched_ss_init_budget, and sched_ss_max_repl. The meaning of these attributes is the same as in the definitions that appear under Process Scheduling.
When a process is created, its single thread has a scheduling policy and associated attributes equal to the policy and attributes of the process. The default scheduling contention scope value is implementation-defined. The default values of other scheduling attributes are implementation-defined.
The scheduling contention scope of a thread defines the set of threads with which the thread competes for use of the processing resources. The scheduling operation selects at most one thread to execute on each processor at any point in time and the thread's scheduling attributes (for example, priority), whether under process scheduling contention scope or system scheduling contention scope, are the parameters used to determine the scheduling decision.
The scheduling contention scope, in the context of scheduling a mixed scope environment, affects threads as follows:
A thread created with PTHREAD_SCOPE_SYSTEM scheduling contention scope contends for resources with all other threads in the same scheduling allocation domain relative to their system scheduling attributes. The system scheduling attributes of a thread created with PTHREAD_SCOPE_SYSTEM scheduling contention scope are the scheduling attributes with which the thread was created. The system scheduling attributes of a thread created with PTHREAD_SCOPE_PROCESS scheduling contention scope are the implementation-defined mapping into system attribute space of the scheduling attributes with which the thread was created.
Threads created with PTHREAD_SCOPE_PROCESS scheduling contention scope contend directly with other threads within their process that were created with PTHREAD_SCOPE_PROCESS scheduling contention scope. The contention is resolved based on the threads' scheduling attributes and policies. It is unspecified how such threads are scheduled relative to threads in other processes or threads with PTHREAD_SCOPE_SYSTEM scheduling contention scope.
Conforming implementations shall support the PTHREAD_SCOPE_PROCESS scheduling contention scope, the PTHREAD_SCOPE_SYSTEM scheduling contention scope, or both.
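Because an implementation may support only one of the two scopes, portable code typically needs a fallback. The following sketch (the helper name choose_scope() is hypothetical) prefers system scope and falls back to process scope when the first request is reported as unsupported.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: prefer system contention scope, fall back to
 * process contention scope. Returns 0 on success or an error number. */
int choose_scope(pthread_attr_t *attr)
{
    int rc = pthread_attr_setscope(attr, PTHREAD_SCOPE_SYSTEM);
    if (rc == ENOTSUP)
        rc = pthread_attr_setscope(attr, PTHREAD_SCOPE_PROCESS);
    return rc;
}
```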
Implementations shall support scheduling allocation domains containing one or more processors. It should be noted that the presence of multiple processors does not automatically indicate a scheduling allocation domain size greater than one. Conforming implementations on multi-processors may map all or any subset of the CPUs to one or multiple scheduling allocation domains, and could define these scheduling allocation domains on a per-thread, per-process, or per-system basis, depending on the types of applications intended to be supported by the implementation. The scheduling allocation domain is independent of scheduling contention scope, as the scheduling contention scope merely defines the set of threads with which a thread contends for processor resources, while scheduling allocation domain defines the set of processors for which it contends. The semantics of how this contention is resolved among threads for processors is determined by the scheduling policies of the threads.
The choice of scheduling allocation domain size and the level of application control over scheduling allocation domains is implementation-defined. Conforming implementations may change the size of scheduling allocation domains and the binding of threads to scheduling allocation domains at any time.
For application threads with scheduling allocation domains of size equal to one, the scheduling rules defined for SCHED_FIFO and SCHED_RR shall be used; see Scheduling Policies. All threads with system scheduling contention scope, regardless of the processes in which they reside, compete for the processor according to their priorities. Threads with process scheduling contention scope compete only with other threads with process scheduling contention scope within their process.
For application threads with scheduling allocation domains of size greater than one, the rules defined for SCHED_FIFO, SCHED_RR, [TSP] and SCHED_SPORADIC shall be used in an implementation-defined manner. Each thread with system scheduling contention scope competes for the processors in its scheduling allocation domain in an implementation-defined manner according to its priority. Threads with process scheduling contention scope are scheduled relative to other threads within the same scheduling contention scope in the process.
[TSP] If _POSIX_THREAD_SPORADIC_SERVER is defined, the rules defined for SCHED_SPORADIC in Scheduling Policies shall be used in an implementation-defined manner for application threads whose scheduling allocation domain size is greater than one.
If _POSIX_PRIORITY_SCHEDULING is defined, then any scheduling policies beyond SCHED_OTHER, SCHED_FIFO, SCHED_RR, [TSP] and SCHED_SPORADIC, as well as the effects of the scheduling policies indicated by these other values, and the attributes required in order to support such a policy, are implementation-defined. Furthermore, the implementation shall document the effect of all processor scheduling allocation domain values supported for these policies.
The thread cancellation mechanism allows a thread to terminate the execution of any other thread in the process in a controlled manner. The target thread (that is, the one that is being canceled) is allowed to hold cancellation requests pending in a number of ways and to perform application-specific cleanup processing when the notice of cancellation is acted upon.
Cancellation is controlled by the cancellation control functions. Each thread maintains its own cancelability state. Cancellation may only occur at cancellation points or when the thread is asynchronously cancelable.
The thread cancellation mechanism described in this section depends upon programs having set deferred cancelability state, which is specified as the default. Applications shall also carefully follow static lexical scoping rules in their execution behavior. For example, use of setjmp(), return, goto, and so on, to leave user-defined cancellation scopes without doing the necessary scope pop operation results in undefined behavior.
Use of asynchronous cancelability while holding resources which potentially need to be released may result in resource loss. Similarly, cancellation scopes may only be safely manipulated (pushed and popped) when the thread is in the deferred or disabled cancelability states.
The cancelability state of a thread determines the action taken upon receipt of a cancellation request. The thread may control cancellation in a number of ways.
Each thread maintains its own cancelability state, which may be encoded in two bits:
Cancelability-Enable: When cancelability is PTHREAD_CANCEL_DISABLE (as defined in the Base Definitions volume of POSIX.1-2017, <pthread.h>), cancellation requests against the target thread are held pending. By default, cancelability is set to PTHREAD_CANCEL_ENABLE (as defined in <pthread.h>).
Cancelability Type: When cancelability is enabled and the cancelability type is PTHREAD_CANCEL_ASYNCHRONOUS (as defined in <pthread.h>), new or pending cancellation requests may be acted upon at any time. When cancelability is enabled and the cancelability type is PTHREAD_CANCEL_DEFERRED (as defined in <pthread.h>), cancellation requests are held pending until a cancellation point (see below) is reached. If cancelability is disabled, the setting of the cancelability type has no immediate effect as all cancellation requests are held pending; however, once cancelability is enabled again the new type is in effect. The cancelability type is PTHREAD_CANCEL_DEFERRED in all newly created threads including the thread in which main() was first invoked.
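The following sketch shows one way an application might hold cancellation requests pending around a piece of work and then restore the previous state. The names run_uncancelable() and increment() are hypothetical and exist only for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

/* Hypothetical sample callback used by the illustration. */
void increment(void *p) { ++*(int *)p; }

/* Hypothetical helper: run work(arg) with cancelability disabled, then
 * restore the caller's previous cancelability state. Requests that arrive
 * in the meantime stay pending until a later cancellation point. */
int run_uncancelable(void (*work)(void *), void *arg)
{
    int old_state, ignored;
    int rc = pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);
    if (rc != 0)
        return rc;
    work(arg);
    return pthread_setcancelstate(old_state, &ignored);
}
```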
Cancellation points shall occur when a thread is executing the following functions:
A cancellation point may also occur when a thread is executing the following functions:
In addition, a cancellation point may occur when a thread is executing any function that this standard does not require to be thread-safe but the implementation documents as being thread-safe. If a thread is cancelled while executing a non-thread-safe function, the behavior is undefined.
An implementation shall not introduce cancellation points into any other functions specified in this volume of POSIX.1-2017.
The side-effects of acting upon a cancellation request while suspended during a call of a function are the same as the side-effects that may be seen in a single-threaded program when a call to a function is interrupted by a signal and the given function returns [EINTR]. Any such side-effects occur before any cancellation cleanup handlers are called. For functions that are explicitly required not to return when interrupted (for example, pclose()), if a thread is canceled while executing the function, the behavior is undefined.
Whenever a thread has cancelability enabled and a cancellation request has been made with that thread as the target, and the thread then calls any function that is a cancellation point (such as pthread_testcancel() or read()), the cancellation request shall be acted upon before the function returns. If a thread has cancelability enabled and a cancellation request is made with the thread as a target while the thread is suspended at a cancellation point, the thread shall be awakened and the cancellation request shall be acted upon. It is unspecified whether the cancellation request is acted upon or whether the cancellation request remains pending and the thread resumes normal execution if:
the thread is suspended at a cancellation point and the event for which it is waiting occurs, or a specified timeout expires, before the cancellation request is acted upon.
Each thread maintains a list of cancellation cleanup handlers. The programmer uses the pthread_cleanup_push() and pthread_cleanup_pop() functions to place routines on and remove routines from this list.
When a cancellation request is acted upon, or when a thread calls pthread_exit(), the thread first disables cancellation by setting its cancelability state to PTHREAD_CANCEL_DISABLE and its cancelability type to PTHREAD_CANCEL_DEFERRED. The cancelability state shall remain set to PTHREAD_CANCEL_DISABLE until the thread has terminated. The behavior is undefined if a cancellation cleanup handler or thread-specific data destructor routine changes the cancelability state to PTHREAD_CANCEL_ENABLE.
The routines in the thread's list of cancellation cleanup handlers are invoked one by one in LIFO sequence; that is, the last routine pushed onto the list (Last In) is the first to be invoked (First Out). When the cancellation cleanup handler for a scope is invoked, the storage for that scope remains valid. If the last cancellation cleanup handler returns, thread-specific data destructors (if any) associated with thread-specific data keys for which the thread has non-NULL values will be run, in unspecified order, as described for pthread_key_create().
After all cancellation cleanup handlers and thread-specific data destructors have returned, thread execution is terminated. If the thread has terminated because of a call to pthread_exit(), the value_ptr argument is made available to any threads joining with the target. If the thread has terminated by acting on a cancellation request, a status of PTHREAD_CANCELED is made available to any threads joining with the target. The symbolic constant PTHREAD_CANCELED expands to a constant expression of type (void *) whose value matches no pointer to an object in memory nor the value NULL.
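The sequence described above (handler invocation, termination, and the PTHREAD_CANCELED exit status) can be sketched as follows. The function and variable names are hypothetical; the thread blocks in pause(), which is a cancellation point, with the default deferred cancelability type.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <unistd.h>

/* Cleanup handler: records that it ran. */
static void note_cleanup(void *arg) { *(int *)arg = 1; }

static void *blocker(void *arg)
{
    pthread_cleanup_push(note_cleanup, arg);
    for (;;)
        pause();              /* cancellation point */
    pthread_cleanup_pop(0);   /* not reached; closes the lexical scope */
    return NULL;
}

/* Returns 0 if the thread's handler ran and the joiner saw PTHREAD_CANCELED. */
int cancel_and_join(void)
{
    int cleaned = 0;
    void *status = NULL;
    pthread_t t;

    if (pthread_create(&t, NULL, blocker, &cleaned) != 0)
        return -1;
    if (pthread_cancel(t) != 0)
        return -1;
    if (pthread_join(t, &status) != 0)
        return -1;
    return (status == PTHREAD_CANCELED && cleaned == 1) ? 0 : 1;
}
```

The request may arrive before the thread reaches pause(); with deferred cancelability it simply stays pending until that call, by which time the handler has been pushed.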
A side-effect of acting upon a cancellation request while in a condition variable wait is that the mutex is re-acquired before calling the first cancellation cleanup handler. In addition, the thread is no longer considered to be waiting for the condition and the thread shall not have consumed any pending condition signals on the condition.
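Because the mutex is held again when the first cleanup handler runs, a thread that waits on a condition variable usually pushes a handler that unlocks the mutex. The following is a minimal sketch under that assumption; all names are hypothetical, and the predicate is never set so that the waiter stays blocked until it is canceled.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready = PTHREAD_COND_INITIALIZER;
static int is_ready = 0;   /* never set in this sketch */

static void unlock_mutex(void *arg) { pthread_mutex_unlock(arg); }

static void *waiter(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&lock);
    pthread_cleanup_push(unlock_mutex, &lock);
    while (!is_ready)
        pthread_cond_wait(&ready, &lock);   /* cancellation point */
    pthread_cleanup_pop(1);   /* normal path: pop and run the handler */
    return NULL;
}

/* Returns 0 if the mutex is free again after the waiter was canceled. */
int cancel_waiter(void)
{
    pthread_t t;
    if (pthread_create(&t, NULL, waiter, NULL) != 0)
        return -1;
    if (pthread_cancel(t) != 0)
        return -1;
    if (pthread_join(t, NULL) != 0)
        return -1;
    int rc = pthread_mutex_trylock(&lock);
    if (rc == 0)
        pthread_mutex_unlock(&lock);
    return rc;
}
```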
A cancellation cleanup handler cannot exit via longjmp() or siglongjmp().
The pthread_cancel(), pthread_setcancelstate(), and pthread_setcanceltype() functions are defined to be async-cancel safe.
No other functions in this volume of POSIX.1-2017 are required to be async-cancel-safe.
If a thread has asynchronous cancellation enabled and is cancelled during execution of a function that is not async-cancel-safe, the behavior is undefined.
If a thread has deferred cancellation enabled, a signal-catching function is called in that thread during execution of a function that is not async-cancel-safe, and the signal-catching function calls any function that is a cancellation point while a cancellation is pending for the thread, the behavior is undefined.
Multiple readers, single writer (read-write) locks allow many threads to have simultaneous read-only access to data while allowing only one thread to have exclusive write access at any given time. They are typically used to protect data that is read more frequently than it is changed.
One or more readers acquire read access to the resource by performing a read lock operation on the associated read-write lock. A writer acquires exclusive write access by performing a write lock operation. Basically, all readers exclude any writers and a writer excludes all readers and any other writers.
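The exclusion rules can be observed from a single thread by using the non-blocking lock functions. The following sketch is illustrative only; the function name and the EXPECT macro are hypothetical, and a real application would take these locks from different threads.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper macro: leave the demo on the first surprise. */
#define EXPECT(call, want) do { if ((call) != (want)) return 1; } while (0)

/* Returns 0 if readers share the lock and writers are exclusive. */
int rwlock_exclusion_demo(void)
{
    pthread_rwlock_t rw;
    EXPECT(pthread_rwlock_init(&rw, NULL), 0);

    EXPECT(pthread_rwlock_rdlock(&rw), 0);
    EXPECT(pthread_rwlock_tryrdlock(&rw), 0);      /* second read lock admitted */
    EXPECT(pthread_rwlock_trywrlock(&rw), EBUSY);  /* readers exclude a writer */
    EXPECT(pthread_rwlock_unlock(&rw), 0);
    EXPECT(pthread_rwlock_unlock(&rw), 0);

    EXPECT(pthread_rwlock_trywrlock(&rw), 0);
    EXPECT(pthread_rwlock_tryrdlock(&rw), EBUSY);  /* writer excludes readers */
    EXPECT(pthread_rwlock_unlock(&rw), 0);

    return pthread_rwlock_destroy(&rw);
}
```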
A thread that has blocked on a read-write lock (for example, has not yet returned from a pthread_rwlock_rdlock() or pthread_rwlock_wrlock() call) shall not prevent any unblocked thread that is eligible to use the same processing resources from eventually making forward progress in its execution. Eligibility for processing resources shall be determined by the scheduling policy.
Read-write locks can be used to synchronize threads in the current process and other processes if they are allocated in memory that is writable and shared among the cooperating processes and have been initialized for this behavior.
All of the following functions shall be atomic with respect to each other in the effects specified in POSIX.1-2017 when they operate on regular files or symbolic links:
If two threads each call one of these functions, each call shall either see all of the specified effects of the other call, or none of them. The requirement on the close() function shall also apply whenever a file descriptor is successfully closed, however caused (for example, as a consequence of calling close(), calling dup2(), or of process termination).
An "application-managed thread stack" is a region of memory allocated by the application (for example, memory returned by the malloc() or mmap() functions) and designated as a stack through the act of passing the address and size of the stack, respectively, as the stackaddr and stacksize arguments to pthread_attr_setstack(). Application-managed stacks allow the application to precisely control the placement and size of a stack.
The application grants to the implementation permanent ownership of and control over the application-managed stack when the attributes object in which the stack or stackaddr attribute has been set is used, either by presenting that attribute's object as the attr argument in a call to pthread_create() that completes successfully, or by storing a pointer to the attributes object in the sigev_notify_attributes member of a struct sigevent and passing that struct sigevent to a function accepting such argument that completes successfully. The application may thereafter utilize the memory within the stack only within the normal context of stack usage within or properly synchronized with a thread that has been scheduled by the implementation with stack pointer value(s) that are within the range of that stack. In particular, the region of memory cannot be freed, nor can it be later specified as the stack for another thread.
When specifying an attributes object with an application-managed stack through the sigev_notify_attributes member of a struct sigevent, the results are undefined if the requested signal is generated multiple times (as for a repeating timer).
Until an attributes object in which the stack or stackaddr attribute has been set is used, the application retains ownership of and control over the memory allocated to the stack. It may free or reuse the memory as long as it either deletes the attributes object or, before using the attributes object, replaces the stack by making an additional call to pthread_attr_setstack(), the function that was used originally to designate the stack. There is no mechanism to retract the reference to an application-managed stack by an existing attributes object.
Once an attributes object with an application-managed stack has been used, that attributes object cannot be used again by a subsequent call to pthread_create() or any function accepting a struct sigevent with sigev_notify_attributes containing a pointer to the attributes object, without designating an unused application-managed stack by making an additional call to pthread_attr_setstack().
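The ownership rules above can be sketched as follows. The names run_on_app_stack(), store_42(), and STACK_SIZE are hypothetical, and the 1 MiB size is an assumption that is expected to exceed PTHREAD_STACK_MIN on typical systems. Note that the memory is released only on the failure path, where the attributes object was never used successfully; after a successful pthread_create() the sketch deliberately does not free it.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)   /* assumed to be >= PTHREAD_STACK_MIN */

static void *store_42(void *arg) { *(int *)arg = 42; return NULL; }

/* Runs store_42() on an application-managed stack.
 * Returns 0 on success, -1 or an error number on failure. */
int run_on_app_stack(int *result)
{
    void *stack = NULL;
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || posix_memalign(&stack, (size_t)page, STACK_SIZE) != 0)
        return -1;

    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) {
        free(stack);
        return -1;
    }

    pthread_t t;
    int rc = pthread_attr_setstack(&attr, stack, STACK_SIZE);
    if (rc == 0)
        rc = pthread_create(&t, &attr, store_42, result);
    pthread_attr_destroy(&attr);

    if (rc != 0) {
        free(stack);   /* never handed over: the application still owns it */
        return rc;
    }
    /* The implementation now owns the stack; it is not freed or reused. */
    return pthread_join(t, NULL);
}
```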
For barriers, condition variables, mutexes, and read-write locks, [TSH] if the process-shared attribute is set to PTHREAD_PROCESS_PRIVATE, only the synchronization object at the address used to initialize it can be used for performing synchronization. The effect of referring to another mapping of the same object when locking, unlocking, or destroying the object is undefined. [TSH] If the process-shared attribute is set to PTHREAD_PROCESS_SHARED, only the synchronization object itself can be used for performing synchronization; however, it need not be referenced at the address used to initialize it (that is, another mapping of the same object can be used). The effect of referring to a copy of the object when locking, unlocking, or destroying it is undefined.
For spin locks, the above requirements shall apply as if spin locks have a process-shared attribute that is set from the pshared argument to pthread_spin_init(). For semaphores, the above requirements shall apply as if semaphores have a process-shared attribute that is set to PTHREAD_PROCESS_PRIVATE if the pshared argument to sem_init() is zero and set to PTHREAD_PROCESS_SHARED if pshared is non-zero.
A socket is an endpoint for communication using the facilities described in this section. A socket is created with a specific socket type, described in Socket Types, and is associated with a specific protocol, detailed in Protocols. A socket is accessed via a file descriptor obtained when the socket is created.
All network protocols are associated with a specific address family. An address family provides basic services to the protocol implementation to allow it to function within a specific network environment. These services may include packet fragmentation and reassembly, routing, addressing, and basic transport. An address family is normally comprised of a number of protocols, one per socket type. Each protocol is characterized by an abstract socket type. It is not required that an address family support all socket types. An address family may contain multiple protocols supporting the same socket abstraction.
Use of Sockets for Local UNIX Connections, Use of Sockets over Internet Protocols Based on IPv4, and Use of Sockets over Internet Protocols Based on IPv6, respectively, describe the use of sockets for local UNIX connections, for Internet protocols based on IPv4, and for Internet protocols based on IPv6.
An address family defines the format of a socket address. All network addresses are described using a general structure, called a sockaddr, as defined in the Base Definitions volume of POSIX.1-2017, <sys/socket.h>. However, each address family imposes finer and more specific structure, generally defining a structure with fields specific to the address family. The field sa_family in the sockaddr structure contains the address family identifier, specifying the format of the sa_data area. The size of the sa_data area is unspecified.
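As an illustration, the following sketch fills in the IPv4-specific structure and then inspects it through the generic sockaddr view, reading only sa_family. The helper names fill_loopback() and family_of() are hypothetical; struct sockaddr_storage is used as a container large enough for any supported address family.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: store an IPv4 loopback address in *ss and return
 * the length to pass to functions such as bind() or connect(). */
socklen_t fill_loopback(struct sockaddr_storage *ss, in_port_t port)
{
    struct sockaddr_in sin;
    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    sin.sin_port = htons(port);
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    memset(ss, 0, sizeof *ss);
    memcpy(ss, &sin, sizeof sin);
    return (socklen_t)sizeof sin;
}

/* Generic code looks only at sa_family before choosing a specific type. */
int family_of(const struct sockaddr *sa) { return sa->sa_family; }
```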
A protocol supports one of the socket abstractions detailed in Socket Types. Selecting a protocol involves specifying the address family, socket type, and protocol number to the socket() function. Certain semantics of the basic socket abstractions are protocol-specific. All protocols are expected to support the basic model for their particular socket type, but may, in addition, provide non-standard facilities or extensions to a mechanism.
Sockets provides packet routing facilities. A routing information database is maintained, which is used in selecting the appropriate network interface when transmitting packets.
Each network interface in a system corresponds to a path through which messages can be sent and received. A network interface usually has a hardware device associated with it, though certain interfaces, such as the loopback interface, do not.
A socket is created with a specific type, which defines the communication semantics and which allows the selection of an appropriate communication protocol. Four types are defined: SOCK_DGRAM, [RS] SOCK_RAW, SOCK_SEQPACKET, and SOCK_STREAM. Implementations may specify additional socket types.
The SOCK_STREAM socket type provides reliable, sequenced, full-duplex octet streams between the socket and a peer to which the socket is connected. A socket of type SOCK_STREAM must be in a connected state before any data may be sent or received. Record boundaries are not maintained; data sent on a stream socket using output operations of one size may be received using input operations of smaller or larger sizes without loss of data. Data may be buffered; successful return from an output function does not imply that the data has been delivered to the peer or even transmitted from the local system. If data cannot be successfully transmitted within a given time then the connection is considered broken, and subsequent operations shall fail. A SIGPIPE signal is raised if a thread attempts to send data on a broken stream (one that is no longer connected), except that the signal is suppressed if the MSG_NOSIGNAL flag is used in calls to send(), sendto(), and sendmsg(). Support for an out-of-band data transmission facility is protocol-specific.
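Since record boundaries are not maintained, a receiver cannot assume that one input operation corresponds to one output operation. The following sketch (the function name stream_demo() is hypothetical, and a connected AF_UNIX pair is assumed purely for convenience) sends two chunks and loops on recv() until the expected number of octets has arrived, however they happen to be split.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends "abc" and "def" as separate output operations and collects the
 * six octets into out[], which is then null-terminated.
 * Returns 0 on success, -1 on failure. */
int stream_demo(char out[7])
{
    int sv[2];
    int rc = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    if (send(sv[0], "abc", 3, 0) == 3 && send(sv[0], "def", 3, 0) == 3) {
        size_t got = 0;
        while (got < 6) {
            /* Each call may return fewer or more octets than one send(). */
            ssize_t n = recv(sv[1], out + got, 6 - got, 0);
            if (n <= 0)
                break;
            got += (size_t)n;
        }
        if (got == 6) {
            out[6] = '\0';
            rc = 0;
        }
    }
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```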
The SOCK_SEQPACKET socket type is similar to the SOCK_STREAM type, and is also connection-oriented. The only difference between these types is that record boundaries are maintained using the SOCK_SEQPACKET type. A record can be sent using one or more output operations and received using one or more input operations, but a single operation never transfers parts of more than one record. Record boundaries are visible to the receiver via the MSG_EOR flag in the received message flags returned by the recvmsg() function. It is protocol-specific whether a maximum record size is imposed.
The SOCK_DGRAM socket type supports connectionless data transfer which is not necessarily acknowledged or reliable. Datagrams may be sent to the address specified (possibly multicast or broadcast) in each output operation, and incoming datagrams may be received from multiple sources. The source address of each datagram is available when receiving the datagram. An application may also pre-specify a peer address, in which case calls to output functions that do not specify a peer address shall send to the pre-specified peer. If a peer has been specified, only datagrams from that peer shall be received. A datagram must be sent in a single output operation, and must be received in a single input operation. The maximum size of a datagram is protocol-specific; with some protocols, the limit is implementation-defined. Output datagrams may be buffered within the system; thus, a successful return from an output function does not guarantee that a datagram is actually sent or received. However, implementations should attempt to detect any errors possible before the return of an output function, reporting any error by an unsuccessful return value.
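In contrast to stream sockets, each input operation on a datagram socket returns exactly one datagram. The following sketch illustrates this with a pre-connected AF_UNIX datagram pair; the function name dgram_demo() is hypothetical, and the local pair is an assumption chosen so that delivery does not depend on a network.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends a 3-octet and a 2-octet datagram, then records the size returned
 * by each of two recv() calls. Returns 0 on success, -1 on failure. */
int dgram_demo(ssize_t sizes[2])
{
    int sv[2];
    char buf[16];
    int rc = -1;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
        return -1;
    if (send(sv[0], "abc", 3, 0) == 3 && send(sv[0], "de", 2, 0) == 2) {
        /* The buffer has room for both, but each call yields one datagram. */
        sizes[0] = recv(sv[1], buf, sizeof buf, 0);
        sizes[1] = recv(sv[1], buf, sizeof buf, 0);
        rc = 0;
    }
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```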
[RS] The SOCK_RAW socket type is similar to the SOCK_DGRAM type. It differs in that it is normally used with communication providers that underlie those used for the other socket types. For this reason, the creation of a socket with type SOCK_RAW shall require appropriate privileges. The format of datagrams sent and received with this socket type generally includes specific protocol headers, and the formats are protocol-specific and implementation-defined.
The I/O mode of a socket is described by the O_NONBLOCK file status flag which pertains to the open file description for the socket. This flag is initially off when a socket is created, but may be set and cleared by the use of the F_SETFL command of the fcntl() function.
When the O_NONBLOCK flag is set, certain functions that would normally block until they are complete shall return immediately.
The bind() function initiates an address assignment and shall return without blocking when O_NONBLOCK is set; if the socket address cannot be assigned immediately, bind() shall return the [EINPROGRESS] error to indicate that the assignment was initiated successfully, but that it has not yet completed.
The connect() function initiates a connection and shall return without blocking when O_NONBLOCK is set; it shall return the error [EINPROGRESS] to indicate that the connection was initiated successfully, but that it has not yet completed.
Data transfer operations (the read(), write(), send(), and recv() functions) shall complete immediately, transfer only as much as is available, and then return without blocking, or return an error indicating that no transfer could be made without blocking.
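The following sketch shows the usual way to set O_NONBLOCK while preserving the other file status flags, and what a receive on an empty socket then reports. The helper names set_nonblocking() and empty_recv_errno() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: add O_NONBLOCK to the file status flags of fd. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Returns the error number reported by recv() on an empty non-blocking
 * socket, or -1 if the call did not fail as expected. */
int empty_recv_errno(void)
{
    int sv[2];
    char byte;
    int err = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    if (set_nonblocking(sv[1]) == 0 && recv(sv[1], &byte, 1, 0) == -1)
        err = errno;   /* saved before close() can change errno */
    close(sv[0]);
    close(sv[1]);
    return err;
}
```

Applications should be prepared for either [EAGAIN] or [EWOULDBLOCK] in this situation.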
The owner of a socket is unset when a socket is created. The owner may be set to a process ID or process group ID using the F_SETOWN command of the fcntl() function.
The transmit and receive queue sizes for a socket are set when the socket is created. The default sizes used are both protocol-specific and implementation-defined. The sizes may be changed using the setsockopt() function.
Errors may occur asynchronously, and be reported to the socket in response to input from the network protocol. The socket stores the pending error to be reported to a user of the socket at the next opportunity. The error is returned in response to a subsequent send(), recv(), or getsockopt() operation on the socket, and the pending error is then cleared.
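A pending error can be fetched, and thereby cleared, with the SO_ERROR socket option. The following is a minimal sketch; the helper name take_pending_error() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/socket.h>

/* Hypothetical helper: return and clear the pending error on fd.
 * Returns 0 if no error is pending; if getsockopt() itself fails,
 * returns that failure's error number instead. */
int take_pending_error(int fd)
{
    int err = 0;
    socklen_t len = sizeof err;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == -1)
        return errno;
    return err;
}
```

This is commonly used after a non-blocking connect() has been reported as writable, to learn whether the connection attempt succeeded.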
A socket has a receive queue that buffers data when it is received by the system until it is removed by a receive call. Depending on the type of the socket and the communication provider, the receive queue may also contain ancillary data such as the addressing and other protocol data associated with the normal data in the queue, and may contain out-of-band or expedited data. The limit on the queue size includes any normal, out-of-band data, datagram source addresses, and ancillary data in the queue. The description in this section applies to all sockets, even though some elements cannot be present in some instances.
The contents of a receive buffer are logically structured as a series of data segments with associated ancillary data and other information. A data segment may contain normal data or out-of-band data, but never both. A data segment may complete a record if the protocol supports records (always true for types SOCK_SEQPACKET and SOCK_DGRAM). A record may be stored as more than one segment; the complete record might never be present in the receive buffer at one time, as a portion might already have been returned to the application, and another portion might not yet have been received from the communications provider. A data segment may contain ancillary protocol data, which is logically associated with the segment. Ancillary data is received as if it were queued along with the first normal data octet in the segment (if any). A segment may contain ancillary data only, with no normal or out-of-band data. For the purposes of this section, a datagram is considered to be a data segment that terminates a record, and that includes a source address as a special type of ancillary data. Data segments are placed into the queue as data is delivered to the socket by the protocol. Normal data segments are placed at the end of the queue as they are delivered. If a new segment contains the same type of data as the preceding segment and includes no ancillary data, and if the preceding segment does not terminate a record, the segments are logically merged into a single segment.
The receive queue is logically terminated if an end-of-file indication has been received or a connection has been terminated. A segment shall be considered to be terminated if another segment follows it in the queue, if the segment completes a record, or if an end-of-file or other connection termination has been reported. The last segment in the receive queue shall also be considered to be terminated while the socket has a pending error to be reported.
A receive operation shall never return data or ancillary data from more than one segment.
The handling of received out-of-band data is protocol-specific. Out-of-band data may be placed in the socket receive queue, either at the end of the queue or before all normal data in the queue. In this case, out-of-band data is returned to an application program by a normal receive call. Out-of-band data may also be queued separately rather than being placed in the socket receive queue, in which case it shall be returned only in response to a receive call that requests out-of-band data. It is protocol-specific whether an out-of-band data mark is placed in the receive queue to demarcate data preceding the out-of-band data and following the out-of-band data. An out-of-band data mark is logically an empty data segment that cannot be merged with other segments in the queue. An out-of-band data mark is never returned in response to an input operation. The sockatmark() function can be used to test whether an out-of-band data mark is the first element in the queue. If an out-of-band data mark is the first element in the queue when an input function is called without the MSG_PEEK option, the mark is removed from the queue and the following data (if any) is processed as if the mark had not been present.
Sockets that are used to accept incoming connections maintain a queue of outstanding connection indications. This queue is a list of connections that are awaiting acceptance by the application; see listen().
One category of event at the socket interface is the generation of signals. These signals report protocol events or process errors relating to the state of the socket. The generation or delivery of a signal does not change the state of the socket, although the generation of the signal may have been caused by a state change.
The SIGPIPE signal shall be sent to a thread that attempts to send data on a socket that is no longer able to send (one that is no longer connected), except that the signal is suppressed if the MSG_NOSIGNAL flag is used in calls to send(), sendto(), and sendmsg(). Regardless of whether the generation of the signal is suppressed, the send operation shall fail with the [EPIPE] error.
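The following sketch illustrates the behavior with MSG_NOSIGNAL: the signal is suppressed, but the send still fails. The function name send_to_closed_peer() is hypothetical, and a connected AF_UNIX stream pair whose peer has been closed stands in for a broken connection.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Closes one end of a connected pair, then sends on the other end with
 * MSG_NOSIGNAL. Returns the error number from send(), 0 if send()
 * unexpectedly succeeded, or -1 if the pair could not be created. */
int send_to_closed_peer(void)
{
    int sv[2];
    int err = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    close(sv[1]);   /* the peer goes away */
    if (send(sv[0], "x", 1, MSG_NOSIGNAL) == -1)
        err = errno;   /* no SIGPIPE is generated for this call */
    close(sv[0]);
    return err;
}
```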
If a socket has an owner, the SIGURG signal is sent to the owner of the socket when it is notified of expedited or out-of-band data. The socket state at this time is protocol-dependent, and the status of the socket is specified in Use of Sockets for Local UNIX Connections, Use of Sockets over Internet Protocols Based on IPv4, and Use of Sockets over Internet Protocols Based on IPv6. Depending on the protocol, the expedited data may or may not have arrived at the time of signal generation.
If any of the following conditions occur asynchronously for a socket, the corresponding value listed below shall become the pending error for the socket:
There are a number of socket options which either specialize the behavior of a socket or provide useful information. These options may be set at different protocol levels and are always present at the uppermost "socket" level.
Socket options are manipulated by two functions, getsockopt() and setsockopt(). These functions allow an application program to customize the behavior and characteristics of a socket to provide the desired effect.
All of the options have default values. The type and meaning of these values is defined by the protocol level to which they apply. Instead of using the default values, an application program may choose to customize one or more of the options. However, in the bulk of cases, the default values are sufficient for the application.
Some of the options are used to enable or disable certain behavior within the protocol modules (for example, turn on debugging) while others may be used to set protocol-specific information (for example, IP time-to-live on all the application's outgoing packets). As each of the options is introduced, its effect on the underlying protocol modules is described.
Value of Level for Socket Options shows the value for the socket level.
Name | Description |
---|---|
SOL_SOCKET | Options are intended for the sockets level. |
Socket-Level Options lists those options present at the socket level; that is, when the level parameter of the getsockopt() or setsockopt() function is SOL_SOCKET, the types of the option value parameters associated with each option, and a brief synopsis of the meaning of the option value parameter. Unless otherwise noted, each may be examined with getsockopt() and set with setsockopt() on all types of socket. Options at other protocol levels vary in format and name.
Option | Parameter Type | Parameter Meaning |
---|---|---|
SO_ACCEPTCONN | int | Non-zero indicates that socket listening is enabled (getsockopt() only). |
SO_BROADCAST | int | Non-zero requests permission to transmit broadcast datagrams (SOCK_DGRAM sockets only). |
SO_DEBUG | int | Non-zero requests debugging in underlying protocol modules. |
SO_DONTROUTE | int | Non-zero requests bypass of normal routing; route based on destination address only. |
SO_ERROR | int | Requests and clears pending error information on the socket (getsockopt() only). |
SO_KEEPALIVE | int | Non-zero requests periodic transmission of keepalive messages (protocol-specific). |
SO_LINGER | struct linger | Specify actions to be taken for queued, unsent data on close(): linger on/off and linger time in seconds. |
SO_OOBINLINE | int | Non-zero requests that out-of-band data be placed into normal data input queue as received. |
SO_RCVBUF | int | Size of receive buffer (in bytes). |
SO_RCVLOWAT | int | Minimum amount of data to return to application for input operations (in bytes). |
SO_RCVTIMEO | struct timeval | Timeout value for a socket receive operation. |
SO_REUSEADDR | int | Non-zero requests reuse of local addresses in bind() (protocol-specific). |
SO_SNDBUF | int | Size of send buffer (in bytes). |
SO_SNDLOWAT | int | Minimum amount of data to send for output operations (in bytes). |
SO_SNDTIMEO | struct timeval | Timeout value for a socket send operation. |
SO_TYPE | int | Identify socket type (getsockopt() only). |
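As an illustration of how the int-valued options listed above might be set and examined, the following minimal sketch wraps setsockopt() and getsockopt() at the SOL_SOCKET level. The helper names set_int_option() and get_int_option() are hypothetical and are not part of any standard interface.

```c
#include <sys/socket.h>

/* Hypothetical helper: set an int-valued socket-level option.
 * Returns 0 on success, or -1 with errno set, exactly as setsockopt() does. */
int set_int_option(int fd, int option, int value)
{
    return setsockopt(fd, SOL_SOCKET, option, &value, sizeof value);
}

/* Hypothetical helper: read an int-valued socket-level option into *value.
 * Returns 0 on success, or -1 with errno set, exactly as getsockopt() does. */
int get_int_option(int fd, int option, int *value)
{
    socklen_t len = sizeof *value;

    return getsockopt(fd, SOL_SOCKET, option, value, &len);
}
```

For example, an application might call set_int_option(fd, SO_REUSEADDR, 1) before bind(). Options whose value is a structure (SO_LINGER, SO_RCVTIMEO, SO_SNDTIMEO) need the corresponding structure type instead of int.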
The SO_ACCEPTCONN option is used only on getsockopt(). When this option is specified, getsockopt() shall report whether socket listening is enabled for the socket. A value of zero shall indicate that socket listening is disabled; non-zero that it is enabled. SO_ACCEPTCONN has no default value.
The SO_BROADCAST option requests permission to send broadcast datagrams on the socket. Support for SO_BROADCAST is protocol-specific. The default for SO_BROADCAST is that the ability to send broadcast datagrams on a socket is disabled.
The SO_DEBUG option enables debugging in the underlying protocol modules. This can be useful for tracing the behavior of the underlying protocol modules during normal system operation. The semantics of the debug reports are implementation-defined. The default value for SO_DEBUG is for debugging to be turned off.
The SO_DONTROUTE option requests that outgoing messages bypass the standard routing facilities. The destination must be on a directly-connected network, and messages are directed to the appropriate network interface according to the destination address. It is protocol-specific whether this option has any effect and how the outgoing network interface is chosen. Support for this option with each protocol is implementation-defined.
The SO_ERROR option is used only on getsockopt(). When this option is specified, getsockopt() shall return any pending error on the socket and clear the error status. It shall return a value of 0 if there is no pending error. SO_ERROR may be used to check for asynchronous errors on connected connectionless-mode sockets or for other types of asynchronous errors. SO_ERROR has no default value.
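A minimal sketch of retrieving the pending error follows. The function name fetch_pending_error() is hypothetical. Because reading SO_ERROR also clears the error status, the caller should keep the returned value rather than query the option a second time.

```c
#include <sys/socket.h>

/* Hypothetical helper: fetch and clear the pending error on a socket.
 * On success, returns 0 and stores the pending error in *pending
 * (0 means no error was pending). Returns -1 with errno set if the
 * getsockopt() call itself fails. */
int fetch_pending_error(int fd, int *pending)
{
    int err = 0;
    socklen_t len = sizeof err;

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == -1)
        return -1;
    *pending = err;
    return 0;
}
```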
The SO_KEEPALIVE option enables the periodic transmission of messages on a connected socket. The behavior of this option is protocol-specific. On a connection-mode socket for which a connection has been established, if SO_KEEPALIVE is enabled and the connected socket fails to respond to the keep-alive messages, the connection shall be broken. The default value for SO_KEEPALIVE is zero, specifying that this capability is turned off.
The SO_LINGER option controls the action of the interface when unsent messages are queued on a socket and a close() is performed. The details of this option are protocol-specific. If SO_LINGER is enabled, the system shall block the calling thread during close() until it can transmit the data or until the end of the interval indicated by the l_linger member, whichever comes first. If SO_LINGER is not specified, and close() is issued, the system handles the call in a way that allows the calling thread to continue as quickly as possible. The default value for SO_LINGER is zero, or off, for the l_onoff element of the option value and zero seconds for the linger time specified by the l_linger element.
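The following sketch shows one way the linger structure might be filled in and read back. The helper names enable_linger() and read_linger() are hypothetical; the l_onoff and l_linger members are those described above.

```c
#include <sys/socket.h>

/* Hypothetical helper: ask close() to block for up to `seconds` seconds
 * while queued data is transmitted. Returns 0, or -1 with errno set. */
int enable_linger(int fd, int seconds)
{
    struct linger lg;

    lg.l_onoff = 1;         /* non-zero turns lingering on */
    lg.l_linger = seconds;  /* linger interval */
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg);
}

/* Hypothetical helper: read the current linger settings into *out. */
int read_linger(int fd, struct linger *out)
{
    socklen_t len = sizeof *out;

    return getsockopt(fd, SOL_SOCKET, SO_LINGER, out, &len);
}
```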
The SO_OOBINLINE option is valid only on protocols that support out-of-band data. The SO_OOBINLINE option requests that out-of-band data be placed in the normal data input queue as received; it is then accessible using the read() or recv() functions without the MSG_OOB flag set. The default for SO_OOBINLINE is off; that is, for out-of-band data not to be placed in the normal data input queue.
The SO_RCVBUF option requests that the buffer space allocated for receive operations on this socket be set to the value, in bytes, of the option value. Applications may wish to increase buffer size for high volume connections, or may decrease buffer size to limit the possible backlog of incoming data. The default value for the SO_RCVBUF option value is implementation-defined, and may vary by protocol.
The SO_RCVLOWAT option sets the minimum number of bytes to process for socket input operations. In general, receive calls block until any (non-zero) amount of data is received, then return the smaller of the amount available or the amount requested. The default value for SO_RCVLOWAT is 1 byte, which does not affect the general case. If SO_RCVLOWAT is set to a larger value, blocking receive calls normally wait until they have received the smaller of the low water mark value or the requested amount. Receive calls may still return less than the low water mark if an error occurs, a signal is caught, or the type of data next in the receive queue is different from that returned (for example, out-of-band data). It is implementation-defined whether the SO_RCVLOWAT option can be set.
The SO_RCVTIMEO option is an option to set a timeout value for input operations. It accepts a timeval structure with the number of seconds and microseconds specifying the limit on how long to wait for an input operation to complete. If a receive operation has blocked for this much time without receiving additional data, it shall return with a partial count or errno shall be set to [EAGAIN] or [EWOULDBLOCK] if no data were received. The default for this option is the value zero, which indicates that a receive operation will not time out. It is implementation-defined whether the SO_RCVTIMEO option can be set.
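The sketch below shows how a receive timeout might be installed and how a timed-out receive might be recognized. The helper names set_receive_timeout() and receive_timed_out() are hypothetical. Since it is implementation-defined whether the option can be set, the return value of the first helper should always be checked.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical helper: limit blocking receives on fd to `millis`
 * milliseconds. Returns 0, or -1 with errno set if the option
 * cannot be set on this implementation or socket. */
int set_receive_timeout(int fd, long millis)
{
    struct timeval tv;

    tv.tv_sec = millis / 1000;
    tv.tv_usec = (millis % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}

/* Hypothetical helper: given the errno value saved after a failed
 * receive call, report whether the failure is the timeout case
 * (no data received before the limit expired). */
int receive_timed_out(int saved_errno)
{
    return saved_errno == EAGAIN || saved_errno == EWOULDBLOCK;
}
```

The same pattern applies to SO_SNDTIMEO for output operations.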
The SO_REUSEADDR option indicates that the rules used in validating addresses supplied in a bind() should allow reuse of local addresses. Operation of this option is protocol-specific. The default value for SO_REUSEADDR is off; that is, reuse of local addresses is not permitted.
The SO_SNDBUF option requests that the buffer space allocated for send operations on this socket be set to the value, in bytes, of the option value. The default value for the SO_SNDBUF option value is implementation-defined, and may vary by protocol.
The SO_SNDLOWAT option sets the minimum number of bytes to process for socket output operations. Most output operations process all of the data supplied by the call, delivering data to the protocol for transmission and blocking as necessary for flow control. Non-blocking output operations process as much data as permitted subject to flow control without blocking, but process no data if flow control does not allow the smaller of the send low water mark value or the entire request to be processed. A select() operation testing the ability to write to a socket shall return true only if the send low water mark could be processed. The default value for SO_SNDLOWAT is implementation-defined and protocol-specific. It is implementation-defined whether the SO_SNDLOWAT option can be set.
The SO_SNDTIMEO option is an option to set a timeout value for the amount of time that an output function shall block because flow control prevents data from being sent. As noted in Socket-Level Options, the option value is a timeval structure with the number of seconds and microseconds specifying the limit on how long to wait for an output operation to complete. If a send operation has blocked for this much time, it shall return with a partial count or errno set to [EAGAIN] or [EWOULDBLOCK] if no data were sent. The default for this option is the value zero, which indicates that a send operation will not time out. It is implementation-defined whether the SO_SNDTIMEO option can be set.
The SO_TYPE option is used only on getsockopt(). When this option is specified, getsockopt() shall return the type of the socket (for example, SOCK_STREAM). This option is useful to servers that inherit sockets on start-up. SO_TYPE has no default value.
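As a sketch of the inherited-socket case mentioned above, a server might query the type of a descriptor it did not create itself. The function name socket_type() is hypothetical.

```c
#include <sys/socket.h>

/* Hypothetical helper: return the type of the socket (for example,
 * SOCK_STREAM or SOCK_DGRAM), or -1 with errno set if fd is not a
 * socket or the query fails. */
int socket_type(int fd)
{
    int type = 0;
    socklen_t len = sizeof type;

    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) == -1)
        return -1;
    return type;
}
```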
Support for UNIX domain sockets is mandatory.
UNIX domain sockets provide process-to-process communication in a single system.
The symbolic constant AF_UNIX defined in the <sys/socket.h> header is used to identify the UNIX domain address family. The <sys/un.h> header contains other definitions used in connection with UNIX domain sockets. See XBD Headers.
The sockaddr_storage structure defined in <sys/socket.h> shall be large enough to accommodate a sockaddr_un structure (see the <sys/un.h> header defined in XBD Headers) and shall be aligned at an appropriate boundary so that pointers to it can be cast as pointers to sockaddr_un structures and used to access the fields of those structures without alignment problems. When a sockaddr_storage structure is cast as a sockaddr_un structure, the ss_family field maps onto the sun_family field.
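The cast described above can be sketched as follows. The function name fill_unix_address() is hypothetical, and the code assumes a pathname address that fits, with its terminating null byte, in the sun_path member.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: store a UNIX domain pathname address in generic
 * sockaddr_storage. Returns 0 on success, or -1 if the path does not
 * fit in sun_path together with its terminating null byte. */
int fill_unix_address(struct sockaddr_storage *ss, const char *path)
{
    /* The storage is large enough and suitably aligned for sockaddr_un. */
    struct sockaddr_un *un = (struct sockaddr_un *)ss;
    size_t n = strlen(path);

    if (n >= sizeof un->sun_path)
        return -1;
    memset(ss, 0, sizeof *ss);
    un->sun_family = AF_UNIX;            /* maps onto ss_family */
    memcpy(un->sun_path, path, n + 1);   /* copy including the null byte */
    return 0;
}
```

The filled structure can then be passed to bind() or connect() as a struct sockaddr pointer, with sizeof(struct sockaddr_un) as the address length.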
When a socket is created in the Internet family with a protocol value of zero, the implementation shall use the protocol listed below for the type of socket created.
[RS] A raw interface to IP is available by creating an Internet socket of type SOCK_RAW. The default protocol for type SOCK_RAW shall be identified in the IP header with the value IPPROTO_RAW. Applications should not use the default protocol when creating a socket with type SOCK_RAW, but should identify a specific protocol by value. The ICMP control protocol is accessible from a raw socket by specifying a value of IPPROTO_ICMP for protocol.
Support for sockets over Internet protocols based on IPv4 is mandatory.
The symbolic constant AF_INET defined in the <sys/socket.h> header is used to identify the IPv4 Internet address family. The <netinet/in.h> header contains other definitions used in connection with IPv4 Internet sockets. See XBD Headers.
The sockaddr_storage structure defined in <sys/socket.h> shall be large enough to accommodate a sockaddr_in structure (see the <netinet/in.h> header defined in XBD Headers) and shall be aligned at an appropriate boundary so that pointers to it can be cast as pointers to sockaddr_in structures and used to access the fields of those structures without alignment problems. When a sockaddr_storage structure is cast as a sockaddr_in structure, the ss_family field maps onto the sin_family field.
[IP6] This section describes extensions to support sockets over Internet protocols based on IPv6. The functionality described in this section shall be provided on implementations that support the IPV6 option (and the rest of this section is not further marked for this option).
To enable smooth transition from IPv4 to IPv6, the features defined in this section may, in certain circumstances, also be used in connection with IPv4; see Compatibility with IPv4.
IPv6 overcomes the addressing limitations of earlier versions by using 128-bit addresses instead of 32-bit addresses. The IPv6 address architecture is described in RFC 2373.
There are three kinds of IPv6 address:
A unicast address can be global, link-local (designed for use on a single link), or site-local (designed for systems not connected to the Internet). Link-local and site-local addresses need not be globally unique.
An anycast address is similar to a unicast address; the nodes to which an anycast address is assigned must be explicitly configured to know that it is an anycast address.
An application can send multicast datagrams by simply specifying an IPv6 multicast address in the address argument of sendto(). To receive multicast datagrams, an application must join the multicast group (using setsockopt() with IPV6_JOIN_GROUP) and must bind to the socket the UDP port on which datagrams will be received. Some applications should also bind the multicast group address to the socket, to prevent other datagrams destined to that port from being delivered to the socket.
A multicast address can be global, node-local, link-local, site-local, or organization-local.
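The receive-side steps described above (bind the UDP port, then join the group) might be sketched as follows. The function names fill_group_request() and open_multicast_receiver() are hypothetical. The sketch assumes that an interface index of 0 lets the system select the interface, and it binds to in6addr_any rather than to the group address.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: build the ipv6_mreq for a group given in text
 * form. Returns 0, or -1 if the text is not a valid IPv6 multicast
 * address (errno is not set in that case). */
int fill_group_request(struct ipv6_mreq *req, const char *group_text,
                       unsigned ifindex)
{
    memset(req, 0, sizeof *req);
    if (inet_pton(AF_INET6, group_text, &req->ipv6mr_multiaddr) != 1)
        return -1;
    if (!IN6_IS_ADDR_MULTICAST(&req->ipv6mr_multiaddr))
        return -1;
    req->ipv6mr_interface = ifindex;   /* assumption: 0 = system chooses */
    return 0;
}

/* Hypothetical helper: create a UDP socket, bind it to `port`, and join
 * the multicast group. Returns the descriptor, or -1 on failure. */
int open_multicast_receiver(const char *group_text, in_port_t port,
                            unsigned ifindex)
{
    struct ipv6_mreq req;
    struct sockaddr_in6 local;
    int fd;

    if (fill_group_request(&req, group_text, ifindex) == -1)
        return -1;
    fd = socket(AF_INET6, SOCK_DGRAM, 0);
    if (fd == -1)
        return -1;
    memset(&local, 0, sizeof local);
    local.sin6_family = AF_INET6;
    local.sin6_port = htons(port);
    local.sin6_addr = in6addr_any;
    if (bind(fd, (struct sockaddr *)&local, sizeof local) == -1 ||
        setsockopt(fd, IPPROTO_IPV6, IPV6_JOIN_GROUP, &req, sizeof req) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```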
The following special IPv6 addresses are defined:
Two sets of IPv6 addresses are defined to correspond to IPv4 addresses:
Note that the unspecified address and the loopback address must not be treated as IPv4-compatible addresses.
The API provides the ability for IPv6 applications to interoperate with applications using IPv4, by using IPv4-mapped IPv6 addresses. These addresses can be generated automatically by the getaddrinfo() function when the specified host has only IPv4 addresses.
Applications can use AF_INET6 sockets to open TCP connections to IPv4 nodes, or send UDP packets to IPv4 nodes, by simply encoding the destination's IPv4 address as an IPv4-mapped IPv6 address, and passing that address, within a sockaddr_in6 structure, in the connect(), sendto(), or sendmsg() function. When applications use AF_INET6 sockets to accept TCP connections from IPv4 nodes, or receive UDP packets from IPv4 nodes, the system shall return the peer's address to the application in the accept(), recvfrom(), recvmsg(), or getpeername() function using a sockaddr_in6 structure encoded this way. If a node has an IPv4 address, then the implementation shall allow applications to communicate using that address via an AF_INET6 socket. In such a case, the address will be represented at the API by the corresponding IPv4-mapped IPv6 address. Also, the implementation may allow an AF_INET6 socket bound to in6addr_any to receive inbound connections and packets destined to one of the node's IPv4 addresses.
An application can use AF_INET6 sockets to bind to a node's IPv4 address by specifying the address as an IPv4-mapped IPv6 address in a sockaddr_in6 structure in the bind() function. For an AF_INET6 socket bound to a node's IPv4 address, the system shall return the address in the getsockname() function as an IPv4-mapped IPv6 address in a sockaddr_in6 structure.
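The encoding of an IPv4 address as an IPv4-mapped IPv6 address can be sketched as follows. The function name make_v4mapped() is hypothetical. The sketch relies on the mapped form consisting of 80 zero bits, 16 one bits, and then the 32-bit IPv4 address.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: build a sockaddr_in6 holding the IPv4-mapped form
 * (::ffff:a.b.c.d) of a dotted-decimal IPv4 address. `port` is in host
 * byte order. Returns 0, or -1 if the text is not a valid IPv4 address. */
int make_v4mapped(struct sockaddr_in6 *out, const char *ipv4_text,
                  in_port_t port)
{
    struct in_addr v4;

    if (inet_pton(AF_INET, ipv4_text, &v4) != 1)
        return -1;
    memset(out, 0, sizeof *out);          /* bytes 0..9 stay zero */
    out->sin6_family = AF_INET6;
    out->sin6_port = htons(port);
    out->sin6_addr.s6_addr[10] = 0xff;
    out->sin6_addr.s6_addr[11] = 0xff;
    /* v4 is already in network byte order; copy its 4 bytes as-is. */
    memcpy(&out->sin6_addr.s6_addr[12], &v4, sizeof v4);
    return 0;
}
```

The resulting structure can be passed to connect(), sendto(), or bind() on an AF_INET6 socket. Applications that resolve names with getaddrinfo() would normally obtain such addresses from that function rather than build them by hand.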
Each local interface is assigned a unique positive integer as a numeric index. Indexes start at 1; zero is not used. There may be gaps so that there is no current interface for a particular positive index. Each interface also has a unique implementation-defined name.
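The mapping between interface names and indexes might be explored with the functions declared in <net/if.h>, as in the sketch below. The helper names list_interfaces() and interface_exists() are hypothetical.

```c
#include <net/if.h>
#include <stdio.h>

/* Hypothetical helper: print each interface index and name.
 * Returns the number of interfaces, or -1 if the list is unavailable. */
int list_interfaces(void)
{
    struct if_nameindex *list = if_nameindex();
    int count = 0;

    if (list == NULL)
        return -1;
    /* The array ends with an entry whose if_index is 0. */
    for (struct if_nameindex *p = list; p->if_index != 0; p++) {
        printf("%u: %s\n", p->if_index, p->if_name);
        count++;
    }
    if_freenameindex(list);
    return count;
}

/* Hypothetical helper: non-zero if `name` identifies a current interface.
 * if_nametoindex() returns 0 when there is no such interface, which is
 * unambiguous because valid indexes start at 1. */
int interface_exists(const char *name)
{
    return if_nametoindex(name) != 0;
}
```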
The following options apply at the IPPROTO_IPV6 level:
An attempt to read this option using getsockopt() shall result in an [EOPNOTSUPP] error.
The parameter type of this option is a pointer to an ipv6_mreq structure.
An attempt to read this option using getsockopt() shall result in an [EOPNOTSUPP] error.
The parameter type of this option is a pointer to an ipv6_mreq structure.
The parameter type of this option is a pointer to an int. (Default value: 1)
The parameter type of this option is a pointer to an unsigned int. (Default value: 0)
The parameter type of this option is a pointer to an unsigned int which is used as a Boolean value. (Default value: 1)
The parameter type of this option is a pointer to an int. (Default value: Unspecified)
The parameter type of this option is a pointer to an int which is used as a Boolean value. (Default value: 0)
An [EOPNOTSUPP] error shall result if IPV6_JOIN_GROUP or IPV6_LEAVE_GROUP is used with getsockopt().
The symbolic constant AF_INET6 is defined in the <sys/socket.h> header to identify the IPv6 Internet address family. See XBD Headers.
The sockaddr_storage structure defined in <sys/socket.h> shall be large enough to accommodate a sockaddr_in6 structure (see the <netinet/in.h> header defined in XBD Headers) and shall be aligned at an appropriate boundary so that pointers to it can be cast as pointers to sockaddr_in6 structures and used to access the fields of those structures without alignment problems. When a sockaddr_storage structure is cast as a sockaddr_in6 structure, the ss_family field maps onto the sin6_family field.
The <netinet/in.h>, <arpa/inet.h>, and <netdb.h> headers contain other definitions used in connection with IPv6 Internet sockets; see XBD Headers.
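Because ss_family maps onto the family member of each address structure, code that receives a generic sockaddr_storage can inspect ss_family and then cast to the matching type. The following sketch illustrates this for the two Internet families. The function name storage_port() is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: return the port number (host byte order) held in
 * a generic address, or -1 if the family is neither AF_INET nor AF_INET6. */
long storage_port(const struct sockaddr_storage *ss)
{
    switch (ss->ss_family) {
    case AF_INET:
        return ntohs(((const struct sockaddr_in *)ss)->sin_port);
    case AF_INET6:
        return ntohs(((const struct sockaddr_in6 *)ss)->sin6_port);
    default:
        return -1;
    }
}
```

A typical use is after accept() or recvfrom() on a socket whose address family is not known in advance, where the peer address was stored in a sockaddr_storage object.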
[OB TRC] This section describes extensions to support tracing of user applications. The functionality described in this section is dependent on support of the Trace option (and the rest of this section is not further marked for this option).
The tracing facilities defined in POSIX.1-2017 allow a process to select a set of trace event types, to activate a trace stream of the selected trace events as they occur in the flow of execution, and to retrieve the recorded trace events.
The tracing operation relies on three logically different components: the traced process, the controller process, and the analyzer process. During the execution of the traced process, when a trace point is reached, a trace event is recorded into the trace streams created for that process in which the associated trace event type identifier is not being filtered out. The controller process controls the operation of recording the trace events into the trace stream. It shall be able to:
Initialize the attributes of a trace stream
Create the trace stream (for a specified traced process) using those attributes
Start and stop tracing for the trace stream
Filter the type of trace events to be recorded, if the Trace Event Filter option is supported
Shut a trace stream down
These operations can be done for an active trace stream. The analyzer process retrieves the traced events either at runtime, when the trace stream has not yet been shut down, but is still recording trace events; or after opening a trace log that had been previously recorded and shut down. These three logically different operations can be performed by the same process, or can be distributed into different processes.
A trace stream identifier can be created by a call to posix_trace_create(), posix_trace_create_withlog(), or posix_trace_open(). The posix_trace_create() and posix_trace_create_withlog() functions should be used by a controller process. The posix_trace_open() function should be used by an analyzer process.
The tracing functions can serve different purposes. One purpose is debugging the possibly pre-instrumented code, while another is post-mortem fault analysis. These two potential uses differ in that the first requires pre-filtering capabilities to avoid overwhelming the trace stream and permits focusing on expected information; while the second needs comprehensive trace capabilities in order to be able to record all types of information.
The events to be traced belong to two classes:
User trace events (generated by the application instrumentation)
System trace events (generated by the operating system)
The trace interface defines several system trace event types associated with control of and operation of the trace stream. This small set of system trace events includes the minimum required to interpret correctly the trace event information present in the stream. Other desirable system trace events for some particular application profile may be implemented and are encouraged; for example, process and thread scheduling, signal occurrence, and so on.
Each traced process shall have a mapping of the trace event names to trace event type identifiers that have been defined for that process. Each active trace stream shall have a mapping that incorporates all the trace event type identifiers predefined by the trace system plus all the mappings of trace event names to trace event type identifiers of the processes that are being traced into that trace stream. These mappings are defined from the instrumented application by calling the posix_trace_eventid_open() function and from the controller process by calling the posix_trace_trid_eventid_open() function. For a pre-recorded trace stream, the list of trace event types is obtained from the pre-recorded trace log.
The last data modification and file status change timestamps of a file associated with an active trace stream shall be marked for update every time any of the tracing operations modifies that file.
The last data access timestamp of a file associated with a trace stream shall be marked for update every time any of the tracing operations causes data to be read from that file.
Results are undefined if the application performs any operation on a file descriptor associated with an active or pre-recorded trace stream until posix_trace_shutdown() or posix_trace_close() is called for that trace stream. Results are also undefined if the analyzer process and the traced process do not share the same programming environment (see c99, Programming Environments in the Shell and Utilities volume of POSIX.1-2017).
The main purpose of this option is to define a complete set of functions and concepts that allow a conforming application to be traced from creation to termination, whatever its realtime constraints and properties.
The <trace.h> header shall define the posix_trace_status_info and posix_trace_event_info structures described below. Implementations may add extensions to these structures.
To facilitate control of a trace stream, information about the current state of an active trace stream can be obtained dynamically. This structure is returned by a call to the posix_trace_get_status() function.
The posix_trace_status_info structure defined in <trace.h> shall contain at least the following members:
Member Type | Member Name | Description |
---|---|---|
int | posix_stream_status | The operating mode of the trace stream. |
int | posix_stream_full_status | The full status of the trace stream. |
int | posix_stream_overrun_status | Indicates whether trace events were lost in the trace stream. |
If the Trace Log option is supported in addition to the Trace option, the posix_trace_status_info structure defined in <trace.h> shall contain at least the following additional members:
Member Type | Member Name | Description |
---|---|---|
int | posix_stream_flush_status | Indicates whether a flush is in progress. |
int | posix_stream_flush_error | Indicates whether any error occurred during the last flush operation. |
int | posix_log_overrun_status | Indicates whether trace events were lost in the trace log. |
int | posix_log_full_status | The full status of the trace log. |
The posix_stream_status member indicates the operating mode of the trace stream and shall have one of the following values defined by manifest constants in the <trace.h> header:
The posix_stream_full_status member indicates the full status of the trace stream, and it shall have one of the following values defined by manifest constants in the <trace.h> header:
The combination of the posix_stream_status and posix_stream_full_status members also indicates the actual status of the stream. The status shall be interpreted as follows:
The posix_stream_overrun_status member indicates whether trace events were lost in the trace stream, and shall have one of the following values defined by manifest constants in the <trace.h> header:
When the corresponding trace stream is created, the posix_stream_overrun_status member shall be set to POSIX_TRACE_NO_OVERRUN.
Whenever an overrun occurs, the posix_stream_overrun_status member shall be set to POSIX_TRACE_OVERRUN.
An overrun occurs when:
The policy is POSIX_TRACE_LOOP and a recorded trace event is overwritten.
The policy is POSIX_TRACE_UNTIL_FULL and the trace stream is full when a trace event is generated.
If the Trace Log option is supported, the policy is POSIX_TRACE_FLUSH and at least one trace event is lost while flushing the trace stream to the trace log.
The posix_stream_overrun_status member is reset to zero after its value is read.
If the Trace Log option is supported in addition to the Trace option, the posix_stream_flush_status, posix_stream_flush_error, posix_log_overrun_status, and posix_log_full_status members are defined as follows; otherwise, they are undefined.
The posix_stream_flush_status member indicates whether a flush operation is being performed and shall have one of the following values defined by manifest constants in the header <trace.h>:
The posix_stream_flush_status member shall be set to POSIX_TRACE_FLUSHING if a flush operation is in progress either due to a call to the posix_trace_flush() function (explicit or caused by a trace stream shutdown operation) or because the trace stream has become full with the stream-full-policy attribute set to POSIX_TRACE_FLUSH. The posix_stream_flush_status member shall be set to POSIX_TRACE_NOT_FLUSHING if no flush operation is in progress.
The posix_stream_flush_error member shall be set to zero if no error occurred during flushing. If an error occurred during a previous flushing operation, the posix_stream_flush_error member shall be set to the value of the first error that occurred. If more than one error occurs while flushing, error values after the first shall be discarded. The posix_stream_flush_error member is reset to zero after its value is read.
The posix_log_overrun_status member indicates whether trace events were lost in the trace log, and shall have one of the following values defined by manifest constants in the <trace.h> header:
When the corresponding trace stream is created, the posix_log_overrun_status member shall be set to POSIX_TRACE_NO_OVERRUN. Whenever an overrun occurs, this status shall be set to POSIX_TRACE_OVERRUN. The posix_log_overrun_status member is reset to zero after its value is read.
The posix_log_full_status member indicates the full status of the trace log, and it shall have one of the following values defined by manifest constants in the <trace.h> header:
The posix_log_full_status member is only meaningful if the log-full-policy attribute is either POSIX_TRACE_UNTIL_FULL or POSIX_TRACE_LOOP.
For an active trace stream without log, that is created by the posix_trace_create() function, the posix_log_overrun_status member shall be set to POSIX_TRACE_NO_OVERRUN and the posix_log_full_status member shall be set to POSIX_TRACE_NOT_FULL.
The trace event structure posix_trace_event_info contains the information for one recorded trace event. This structure is returned by the set of functions posix_trace_getnext_event(), posix_trace_timedgetnext_event(), and posix_trace_trygetnext_event().
The posix_trace_event_info structure defined in <trace.h> shall contain at least the following members:
Member Type | Member Name | Description |
---|---|---|
trace_event_id_t | posix_event_id | Trace event type identification. |
pid_t | posix_pid | Process ID of the process that generated the trace event. |
void * | posix_prog_address | Address at which the trace point was invoked. |
int | posix_truncation_status | Status about the truncation of the data associated with this trace event. |
struct timespec | posix_timestamp | Time at which the trace event was generated. |
In addition, the posix_trace_event_info structure defined in <trace.h> shall contain the following additional member:
Member Type | Member Name | Description |
---|---|---|
pthread_t | posix_thread_id | Thread ID of the thread that generated the trace event. |
The posix_event_id member represents the identification of the trace event type and its value is not directly defined by the user. This identification is returned by a call to one of the following functions: posix_trace_trid_eventid_open(), posix_trace_eventtypelist_getnext_id(), or posix_trace_eventid_open(). The name of the trace event type can be obtained by calling posix_trace_eventid_get_name().
The posix_pid is the process identifier of the traced process which generated the trace event. If the posix_event_id member is one of the implementation-defined system trace events and that trace event is not associated with any process, the posix_pid member shall be set to zero.
For a user trace event, the posix_prog_address member is the process mapped address of the point at which the associated call to the posix_trace_event() function was made. For a system trace event, if the trace event is caused by a system service explicitly called by the application, the posix_prog_address member shall be the address of the process at the point where the call to that system service was made.
The posix_truncation_status member defines whether the data associated with a trace event has been truncated at the time the trace event was generated, or at the time the trace event was read from the trace stream, or (if the Trace Log option is supported) from the trace log (see the event argument from the posix_trace_getnext_event() function). The posix_truncation_status member shall have one of the following values defined by manifest constants in the <trace.h> header:
The posix_timestamp member shall be the time at which the trace event was generated. The clock used is implementation-defined, but the resolution of this clock can be retrieved by a call to the posix_trace_attr_getclockres() function.
The posix_thread_id member is the identifier of the thread that generated the trace event. If the posix_event_id member is one of the implementation-defined system trace events and that trace event is not associated with any thread, the posix_thread_id member shall be set to zero.
Trace streams have attributes that compose the posix_trace_attr_t trace stream attributes object. This object shall contain at least the following attributes:
The generation-version attribute identifies the origin and version of the trace system.
The trace-name attribute is a character string, defined by the trace controller, that identifies the trace stream.
The creation-time attribute represents the time of the creation of the trace stream.
The clock-resolution attribute defines the clock resolution of the clock used to generate timestamps.
The stream-min-size attribute defines the minimum size in bytes of the trace stream strictly reserved for the trace events.
The stream-full-policy attribute defines the policy followed when the trace stream is full; its value is POSIX_TRACE_LOOP, POSIX_TRACE_UNTIL_FULL, or POSIX_TRACE_FLUSH.
The max-data-size attribute defines the maximum record size in bytes of a trace event.
In addition, if the Trace option and the Trace Inherit option are both supported, the posix_trace_attr_t trace stream creation attributes object shall contain at least the following attributes:
The inheritance attribute specifies whether a newly created trace stream will inherit tracing in its parent's process trace stream. It is either POSIX_TRACE_INHERITED or POSIX_TRACE_CLOSE_FOR_CHILD.
In addition, if the Trace option and the Trace Log option are both supported, the posix_trace_attr_t trace stream creation attributes object shall contain at least the following attribute:
If the file type corresponding to the trace log supports the POSIX_TRACE_LOOP or the POSIX_TRACE_UNTIL_FULL policies, the log-max-size attribute defines the maximum size in bytes of the trace log associated with an active trace stream. Other stream data (for example, trace attribute values) shall not be included in this size.
The log-full-policy attribute defines the policy of a trace log associated with an active trace stream to be POSIX_TRACE_LOOP, POSIX_TRACE_UNTIL_FULL, or POSIX_TRACE_APPEND.
The following system trace event types, defined in the <trace.h> header, track the invocation of the trace operations:
POSIX_TRACE_START shall be associated with a trace start operation.
POSIX_TRACE_STOP shall be associated with a trace stop operation.
If the Trace Event Filter option is supported, POSIX_TRACE_FILTER shall be associated with a trace event type filter change operation.
The following system trace event types, defined in the <trace.h> header, report operational trace events:
POSIX_TRACE_OVERFLOW shall mark the beginning of a trace overflow condition.
POSIX_TRACE_RESUME shall mark the end of a trace overflow condition.
If the Trace Log option is supported, POSIX_TRACE_FLUSH_START shall mark the beginning of a flush operation.
If the Trace Log option is supported, POSIX_TRACE_FLUSH_STOP shall mark the end of a flush operation.
If an implementation-defined trace error condition is reported, it shall be marked POSIX_TRACE_ERROR.
The interpretation of a trace stream or a trace log by a trace analyzer process relies on the information recorded for each trace event, and also on system trace events that indicate the invocation of trace control operations and trace system operational trace events.
The POSIX_TRACE_START and POSIX_TRACE_STOP trace events specify the time windows during which the trace stream is running.
A POSIX_TRACE_STOP trace event whose associated data is equal to zero indicates a call of the function posix_trace_stop().
A POSIX_TRACE_STOP trace event whose associated data is different from zero indicates an automatic stop of the trace stream (see the definition of the posix_trace_attr_getstreamfullpolicy() function in posix_trace_attr_getinherited).
The POSIX_TRACE_FILTER trace event indicates that a trace event type filter value changed while the trace stream was running.
The POSIX_TRACE_ERROR trace event serves to inform the analyzer process that an implementation-defined internal error of the trace system occurred.
The POSIX_TRACE_OVERFLOW trace event shall be reported with a timestamp equal to the timestamp of the first trace event overwritten. This is an indication that some generated trace events have been lost.
The POSIX_TRACE_RESUME trace event shall be reported with a timestamp equal to the timestamp of the first valid trace event reported after the overflow condition ends and shall be reported before this first valid trace event. This is an indication that the trace system is reliably recording trace events after an overflow condition.
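The interpretation rules above can be illustrated with a minimal analyzer sketch. Because <trace.h> is not available on many systems, the identifiers used here (`EV_START`, `EV_STOP`, `EV_OVERFLOW`, `EV_RESUME`, `EV_USER`, `struct ev`, `struct summary`, and `summarize()`) are hypothetical stand-ins rather than part of the trace interface; a real analyzer would compare against the POSIX_TRACE_* constants and read events with posix_trace_getnext_event().

```c
#include <stddef.h>

/* Hypothetical stand-ins for the <trace.h> system trace event constants. */
enum ev_type { EV_START, EV_STOP, EV_OVERFLOW, EV_RESUME, EV_USER };

/* Assumed, simplified event record: type, timestamp, and the int datum
 * ("auto") that accompanies stop events. */
struct ev {
    enum ev_type type;
    long         timestamp;
    int          auto_stop;   /* meaningful only for EV_STOP */
};

struct summary {
    long running_time;  /* total time covered by start/stop windows   */
    int  manual_stops;  /* stop events whose data is zero              */
    int  auto_stops;    /* stop events whose data is nonzero           */
    int  overflows;     /* overflow conditions seen (events were lost) */
    int  in_overflow;   /* nonzero if the log ends inside an overflow  */
};

/* Walks the events in order and applies the interpretation rules. */
struct summary summarize(const struct ev *evs, size_t n)
{
    struct summary s = {0, 0, 0, 0, 0};
    long started_at = 0;
    int running = 0;

    for (size_t i = 0; i < n; i++) {
        switch (evs[i].type) {
        case EV_START:
            started_at = evs[i].timestamp;
            running = 1;
            break;
        case EV_STOP:
            if (running)
                s.running_time += evs[i].timestamp - started_at;
            running = 0;
            if (evs[i].auto_stop == 0)
                s.manual_stops++;   /* stopped by a posix_trace_stop() call */
            else
                s.auto_stops++;     /* stopped automatically */
            break;
        case EV_OVERFLOW:
            s.overflows++;
            s.in_overflow = 1;      /* events after this point were lost */
            break;
        case EV_RESUME:
            s.in_overflow = 0;      /* recording is reliable again */
            break;
        case EV_USER:
            break;                  /* ordinary events do not change state */
        }
    }
    return s;
}
```

Calling `summarize()` on an array of such records yields the total running time together with counts of explicit stops, automatic stops, and overflow conditions.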
Each of these trace event types shall be defined by a constant trace event name and a trace_event_id_t constant; trace event data is associated with some of these trace events.
If the Trace option is supported and the Trace Event Filter option and the Trace Log option are not supported, the following predefined system trace events in Trace Option: System Trace Events shall be defined:
Event Name | Constant | Associated Data | Data Type |
---|---|---|---|
posix_trace_error | POSIX_TRACE_ERROR | error | int |
posix_trace_start | POSIX_TRACE_START | None. | |
posix_trace_stop | POSIX_TRACE_STOP | auto | int |
posix_trace_overflow | POSIX_TRACE_OVERFLOW | None. | |
posix_trace_resume | POSIX_TRACE_RESUME | None. | |
If the Trace option and the Trace Event Filter option are both supported, and if the Trace Log option is not supported, the following predefined system trace events in Trace and Trace Event Filter Options: System Trace Events shall be defined:
Event Name | Constant | Associated Data | Data Type |
---|---|---|---|
posix_trace_error | POSIX_TRACE_ERROR | error | int |
posix_trace_start | POSIX_TRACE_START | event_filter | trace_event_set_t |
posix_trace_stop | POSIX_TRACE_STOP | auto | int |
posix_trace_filter | POSIX_TRACE_FILTER | old_event_filter, new_event_filter | trace_event_set_t |
posix_trace_overflow | POSIX_TRACE_OVERFLOW | None. | |
posix_trace_resume | POSIX_TRACE_RESUME | None. | |
If the Trace option and the Trace Log option are both supported, and if the Trace Event Filter option is not supported, the following predefined system trace events in Trace and Trace Log Options: System Trace Events shall be defined:
Event Name | Constant | Associated Data | Data Type |
---|---|---|---|
posix_trace_error | POSIX_TRACE_ERROR | error | int |
posix_trace_start | POSIX_TRACE_START | None. | |
posix_trace_stop | POSIX_TRACE_STOP | auto | int |
posix_trace_overflow | POSIX_TRACE_OVERFLOW | None. | |
posix_trace_resume | POSIX_TRACE_RESUME | None. | |
posix_trace_flush_start | POSIX_TRACE_FLUSH_START | None. | |
posix_trace_flush_stop | POSIX_TRACE_FLUSH_STOP | None. | |
If the Trace option, the Trace Event Filter option, and the Trace Log option are all supported, the following predefined system trace events in Trace, Trace Log, and Trace Event Filter Options: System Trace Events shall be defined:
Event Name | Constant | Associated Data | Data Type |
---|---|---|---|
posix_trace_error | POSIX_TRACE_ERROR | error | int |
posix_trace_start | POSIX_TRACE_START | event_filter | trace_event_set_t |
posix_trace_stop | POSIX_TRACE_STOP | auto | int |
posix_trace_filter | POSIX_TRACE_FILTER | old_event_filter, new_event_filter | trace_event_set_t |
posix_trace_overflow | POSIX_TRACE_OVERFLOW | None. | |
posix_trace_resume | POSIX_TRACE_RESUME | None. | |
posix_trace_flush_start | POSIX_TRACE_FLUSH_START | None. | |
posix_trace_flush_stop | POSIX_TRACE_FLUSH_STOP | None. | |
The user trace event POSIX_TRACE_UNNAMED_USEREVENT is defined in the <trace.h> header. If the limit of per-process user trace event names represented by {TRACE_USER_EVENT_MAX} has already been reached, this predefined user event shall be returned when the application tries to register more events than allowed. The data associated with this trace event is application-defined.
The following predefined user trace event in Trace Option: User Trace Event shall be defined:
Event Name | Constant |
---|---|
posix_trace_unnamed_userevent | POSIX_TRACE_UNNAMED_USEREVENT |
The trace interface is built and structured to improve portability through use of trace data of opaque type. The object-oriented approach for the manipulation of trace attributes and trace event type identifiers requires definition of many constructor and selector functions which operate on these opaque types. Also, the trace interface must support several different tracing roles. To facilitate reading the trace interface, the trace functions are grouped into small functional sets supporting the three different roles:
A trace controller process requires functions to set up and customize all the resources needed to run a trace stream, including:
Attribute initialization and destruction (posix_trace_attr_init())
Identification information manipulation (posix_trace_attr_getgenversion())
Trace system behavior modification (posix_trace_attr_getinherited())
Trace stream and trace log size set (posix_trace_attr_getmaxusereventsize())
Trace stream creation, flush, and shutdown (posix_trace_create())
Trace stream and trace log clear (posix_trace_clear())
Trace event type identifier manipulation (posix_trace_trid_eventid_open())
Trace event type identifier list exploration (posix_trace_eventtypelist_getnext_id())
Trace event type set manipulation (posix_trace_eventset_empty())
Trace event type filter set (posix_trace_set_filter())
Trace stream start and stop (posix_trace_start())
Trace stream information and status read (posix_trace_get_attr())
A traced process requires functions to instrument trace points:
Trace event type identifiers definition and trace points insertion (posix_trace_event())
A trace analyzer process requires functions to retrieve information from a trace stream and trace log:
Identification information read (posix_trace_attr_getgenversion())
Trace system behavior information read (posix_trace_attr_getinherited())
Trace stream and trace log size get (posix_trace_attr_getmaxusereventsize())
Trace event type identifier manipulation (posix_trace_trid_eventid_open())
Trace event type identifier list exploration (posix_trace_eventtypelist_getnext_id())
Trace log open, rewind, and close (posix_trace_open())
Trace stream information and status read (posix_trace_get_attr())
Trace event read (posix_trace_getnext_event())
All of the data types used by various functions are defined by the implementation. The following table describes some of these types. Other types referenced in the description of a function, not mentioned here, can be found in the appropriate header for that function.
Defined Type | Description |
---|---|
cc_t | Type used for terminal special characters. |
clock_t | Integer or real-floating type used for processor times, as defined in the ISO C standard. |
clockid_t | Used for clock ID type in some timer functions. |
dev_t | Integer type used for device numbers. |
DIR | Type representing a directory stream. |
div_t | Structure type returned by the div() function. |
FILE | Structure containing information about a file. |
glob_t | Structure type used in pathname pattern matching. |
fpos_t | Type containing all information needed to specify uniquely every position within a file. |
gid_t | Integer type used for group IDs. |
iconv_t | Type used for conversion descriptors. |
id_t | Integer type used as a general identifier; can be used to contain at least the largest of a pid_t, uid_t, or gid_t. |
ino_t | Unsigned integer type used for file serial numbers. |
key_t | Arithmetic type used for XSI interprocess communication. |
ldiv_t | Structure type returned by the ldiv() function. |
mode_t | Integer type used for file attributes. |
mqd_t | Used for message queue descriptors. |
nfds_t | Integer type used for the number of file descriptors. |
nlink_t | Integer type used for link counts. |
off_t | Signed integer type used for file sizes. |
pid_t | Signed integer type used for process and process group IDs. |
pthread_attr_t | Used to identify a thread attribute object. |
pthread_cond_t | Used for condition variables. |
pthread_condattr_t | Used to identify a condition attribute object. |
pthread_key_t | Used for thread-specific data keys. |
pthread_mutex_t | Used for mutexes. |
pthread_mutexattr_t | Used to identify a mutex attribute object. |
pthread_once_t | Used for dynamic package initialization. |
pthread_rwlock_t | Used for read-write locks. |
pthread_rwlockattr_t | Used for read-write lock attributes. |
pthread_t | Used to identify a thread. |
ptrdiff_t | Signed integer type of the result of subtracting two pointers. |
regex_t | Structure type used in regular expression matching. |
regmatch_t | Structure type used in regular expression matching. |
rlim_t | Unsigned integer type used for limit values, to which objects of type int and off_t can be cast without loss of value. |
sem_t | Type used in performing semaphore operations. |
sig_atomic_t | Possibly volatile-qualified integer type of an object that can be accessed as an atomic entity, even in the presence of asynchronous interrupts. |
sigset_t | Integer or structure type of an object used to represent sets of signals. |
size_t | Unsigned integer type used for size of objects. |
speed_t | Type used for terminal baud rates. |
ssize_t | Signed integer type used for a count of bytes or an error indication. |
suseconds_t | Signed integer type used for time in microseconds. |
tcflag_t | Type used for terminal modes. |
time_t | Integer type used for time in seconds, as defined in the ISO C standard. |
timer_t | Used for timer ID returned by the timer_create() function. |
uid_t | Integer type used for user IDs. |
va_list | Type used for traversing variable argument lists. |
wchar_t | Integer type whose range of values can represent distinct codes for all members of the largest extended character set specified by the supported locales. |
wctype_t | Scalar type which represents a character class descriptor. |
wint_t | Integer type capable of storing any valid value of wchar_t or WEOF. |
wordexp_t | Structure type used in word expansion. |
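Because the exact width of these types is implementation-defined, portable code relies only on the properties the table states. The following sketch, assuming a C11 compiler, checks a few of the stated signedness properties at compile time and formats a pid_t by converting it to intmax_t first. The names `IS_SIGNED` and `format_pid()` are hypothetical helpers introduced for this illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* request POSIX.1-2008 declarations */
#endif
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>

/* Nonzero when integer type T is signed: -1 converted to a signed type
 * stays below zero, while for an unsigned type it wraps to the maximum. */
#define IS_SIGNED(T) ((T)-1 < (T)0)

/* Compile-time checks of properties stated in the table. */
_Static_assert(!IS_SIGNED(size_t),   "size_t is unsigned");
_Static_assert(IS_SIGNED(ssize_t),   "ssize_t is signed");
_Static_assert(IS_SIGNED(off_t),     "off_t is signed");
_Static_assert(IS_SIGNED(pid_t),     "pid_t is signed");
_Static_assert(IS_SIGNED(ptrdiff_t), "ptrdiff_t is signed");

/* Formats a pid_t portably. Its width is not specified, so the value is
 * converted to intmax_t and printed with the matching %jd conversion.
 * Returns what snprintf() returns. */
int format_pid(char *buf, size_t len, pid_t pid)
{
    return snprintf(buf, len, "%jd", (intmax_t)pid);
}
```

The same conversion idiom applies to the other signed integer types in the table; unsigned types such as size_t can be converted to uintmax_t and printed with `%ju`.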
The type char is defined as a single byte; see XBD Definitions (Byte and Character).
Status information is data associated with a process detailing a change in the state of the process. It shall consist of:
The state the process transitioned into (stopped, continued, or terminated)
The information necessary to populate the siginfo_t structure provided by waitid()
If the new state is terminated:
The low-order 8 bits of the status argument that the process passed to _Exit(), _exit(), or exit(), or the low-order 8 bits of the value the process returned from main()
Note that these 8 bits are part of the complete value that is used to set the si_status member of the siginfo_t structure provided by waitid()
Whether the process terminated due to the receipt of a signal that was not caught and, if so, the number of the signal that caused the termination of the process
If the new state is stopped:
The number of the signal that caused the process to stop
A process might not have any status information (such as immediately after a process has started).
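A minimal sketch of how a parent observes the termination status described above, using waitpid() and the <sys/wait.h> macros. The helper names `observed_exit_status()` and `observed_term_signal()` are hypothetical and exist only for this illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* request POSIX.1-2008 declarations */
#endif
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that exits with `status` and returns the exit status the
 * parent sees through waitpid(), or -1 on error or abnormal termination. */
int observed_exit_status(int status)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(status);               /* child: terminate immediately */

    int wstatus;
    if (waitpid(pid, &wstatus, 0) != pid)
        return -1;
    if (!WIFEXITED(wstatus))
        return -1;
    return WEXITSTATUS(wstatus);     /* only the low-order 8 bits survive */
}

/* Forks a child that sends itself `sig` and returns the number of the
 * signal that terminated it, 0 if it exited normally, or -1 on error. */
int observed_term_signal(int sig)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        raise(sig);
        _exit(0);                    /* reached only if sig did not kill us */
    }

    int wstatus;
    if (waitpid(pid, &wstatus, 0) != pid)
        return -1;
    return WIFSIGNALED(wstatus) ? WTERMSIG(wstatus) : 0;
}
```

For example, a child that calls `_exit(300)` is reported through WEXITSTATUS() as 44, because 300 modulo 256 is 44.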
Status information for a process shall be generated (made available to the parent process) when the process stops, continues, or terminates except in the following case:
If the parent process sets the action for the SIGCHLD signal to SIG_IGN, or if the parent sets the SA_NOCLDWAIT flag for the SIGCHLD signal action, process termination shall not generate new status information but shall cause any existing status information for the process to be discarded.
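The exception above can be observed with a short sketch: when SIGCHLD is set to SIG_IGN, a terminated child leaves no status information, so wait() blocks until the child is gone and then fails with [ECHILD]. The helper name `child_status_discarded()` is hypothetical, and the sketch assumes the calling process has no other children.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* request POSIX.1-2008 declarations */
#endif
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if, with SIGCHLD ignored, wait() fails with ECHILD after the
 * child terminates; 0 if status information was obtained after all; and
 * -1 if the setup itself failed. */
int child_status_discarded(void)
{
    struct sigaction ign, old;
    memset(&ign, 0, sizeof ign);
    ign.sa_handler = SIG_IGN;
    sigemptyset(&ign.sa_mask);
    if (sigaction(SIGCHLD, &ign, &old) != 0)
        return -1;

    int result = -1;
    pid_t pid = fork();
    if (pid == 0)
        _exit(3);                    /* child: terminate immediately */
    if (pid > 0) {
        int wstatus;
        pid_t r;
        /* Blocks until the child has terminated, retrying on EINTR. */
        do {
            r = wait(&wstatus);
        } while (r == -1 && errno == EINTR);
        result = (r == -1 && errno == ECHILD) ? 1 : 0;
    }

    sigaction(SIGCHLD, &old, NULL);  /* restore the previous action */
    return result;
}
```

Setting the SA_NOCLDWAIT flag for the SIGCHLD action instead of using SIG_IGN is expected to have the same effect on status information, although this sketch exercises only the SIG_IGN case.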
If new status information is generated, and the process already had status information, the existing status information shall be discarded and replaced with the new status information.
Only the process' parent process can obtain the process' status information. The parent obtains a child's status information by calling wait(), waitid(), or waitpid(). Except when waitid() is called with the WNOWAIT flag set in the options argument, the status information obtained by a wait function shall be consumed (discarded) by that wait function; no two calls to wait(), waitid() (without WNOWAIT), or waitpid() shall obtain the same status information.
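The consumption rule can be sketched as follows: waitid() with WNOWAIT inspects the status information without consuming it, a subsequent waitpid() consumes it, and a further waitpid() for the same child then fails with [ECHILD]. The helper name `peek_then_consume()` is hypothetical, and `code` is assumed to lie in the range 0 to 255.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* request POSIX.1-2008 declarations */
#endif
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the status could be peeked at, then consumed, and was gone
 * afterwards; 0 if any of those observations differed; -1 on error. */
int peek_then_consume(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);                 /* child: terminate immediately */

    /* Peek: WNOWAIT leaves the status information in place. */
    siginfo_t info;
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) != 0)
        return -1;
    int peeked = (info.si_code == CLD_EXITED) ? info.si_status : -1;

    /* Consume: the same status information is still obtainable once. */
    int wstatus;
    if (waitpid(pid, &wstatus, 0) != pid)
        return -1;
    int consumed = WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;

    /* A second consuming call finds no child left to wait for. */
    int gone = (waitpid(pid, &wstatus, 0) == -1 && errno == ECHILD);

    return peeked == code && consumed == code && gone;
}
```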
When status information becomes available to the parent process and more than one thread in the parent process is waiting for the status information (blocked in a call to wait(), waitid(), or waitpid() with arguments that would match the status information):
If none of the matching threads is in a call to waitid() with the WNOWAIT flag set in the options argument, the thread that obtains the status information is unspecified.
Otherwise (at least one of the matching threads is in a call to waitid() with the WNOWAIT flag set), the matching thread or threads that obtain the status information is unspecified except that at least one of the matching threads shall obtain the status information and at most one of the matching threads that are not calling waitid() with the WNOWAIT flag set shall obtain it.
All functions that open one or more file descriptors shall, unless specified otherwise, atomically allocate the lowest numbered available (that is, not already open in the calling process) file descriptor at the time of each allocation. Where a single function allocates two file descriptors (for example, pipe() or socketpair()), the allocations may be independent and therefore applications should not expect them to have adjacent values or depend on which has the higher value.
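A minimal sketch of the allocation rule, assuming a single-threaded caller so that no other thread opens or closes descriptors in between. The helper names `lowest_fd_is_reused()` and `pipe_fds_are_distinct()` are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* request POSIX.1-2008 declarations */
#endif
#include <fcntl.h>
#include <unistd.h>

/* Opens /dev/null, closes it, and opens it again. Returns 1 if the second
 * open() reuses the descriptor number released by close(), 0 if it does
 * not, and -1 on failure. */
int lowest_fd_is_reused(void)
{
    int first = open("/dev/null", O_RDONLY);
    if (first < 0)
        return -1;
    if (close(first) != 0)
        return -1;

    /* Every lower number was already in use at the first open(), so the
     * number just released is now the lowest one available. */
    int second = open("/dev/null", O_RDONLY);
    if (second < 0)
        return -1;

    int same = (second == first);
    close(second);
    return same;
}

/* Creates a pipe and returns 1 if both descriptors are valid and distinct,
 * 0 otherwise, and -1 on failure. It deliberately checks nothing about
 * their relative order or adjacency. */
int pipe_fds_are_distinct(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    int ok = (fds[0] >= 0 && fds[1] >= 0 && fds[0] != fds[1]);
    close(fds[0]);
    close(fds[1]);
    return ok;
}
```

Code that needs a particular descriptor number, for example when redirecting standard input, is generally better served by dup2() than by relying on the order of allocation.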