Chapter 1
Introducing C# and the .NET Framework
Contents:
The C# Language
The .NET Framework
ECMA Standardization
C# is a new programming language from Microsoft designed specifically to target the .NET Framework. Microsoft's .NET Framework is a runtime environment and class library that dramatically simplifies the development and deployment of modern, component-based applications.
When the .NET Framework and C# language compiler were shipped in final form in January 2002, both platform and programming language had already garnered much industry attention and widespread use among Microsoft-centric early adopters. Why this level of success? Certainly, the C# language and the .NET Framework address many of the technical challenges facing modern developers as they strive to develop increasingly complex distributed systems with ever-shrinking schedules and team sizes.
However, in addition to their technical merits, one of the main reasons for the success that the language and platform have enjoyed thus far is the unprecedented degree of openness that Microsoft has shown. From July 2000 to January 2002, the .NET Framework underwent an extensive public beta that allowed tens of thousands of developers to "kick the tires" of the programming environment. This allowed Microsoft to both solicit and react to developer community feedback before finalizing the new platform.
Additionally, the key specifications for both the language and the platform have been published, reviewed, and ratified by an international standards organization called the European Computer Manufacturers Association (ECMA). These standardization efforts have led to multiple third-party initiatives that bring the C# language and the .NET platform to non-Microsoft environments. They have also prompted renewed interest among academics in the use of Microsoft technologies as teaching and research vehicles.
Lastly, although the language and platform are shiny and new, the foundations for the C# language and the .NET Framework have been years in the making, reaching back more than half a decade. Understanding where the language and platform have come from gives us a better understanding of where they are headed.
The C# Language
Reports of a new language from Microsoft first started surfacing in 1998. At that time the language was called COOL, and was said to be very similar to Java. Although Microsoft consistently denied the reports of the new language, rumors persisted.
In June 2000, Microsoft ended the speculation by releasing the specifications for a new language called C# (pronounced "see-sharp"). This was rapidly followed by the release of a preview version of the .NET Framework SDK (which included a C# compiler) at the July 2000 Professional Developer's Conference (PDC) in Orlando, Florida.
The new language was designed by Anders Hejlsberg (creator of Turbo Pascal and architect of Delphi), Scott Wiltamuth, and Peter Golde. Described in the C# Language Specification as a "...simple, modern, object-oriented, and type-safe programming language derived from C and C++," C# bears many syntactic similarities to C++ and Java.
However, focusing on the syntactic similarities between C# and Java does the C# language a disservice. Semantically, C# pushes the language-design envelope substantially beyond where the Java language was circa 2001, and could rightfully be viewed as the next step in the evolution of component-oriented programming languages. While it is outside the scope of this book to perform a detailed comparison between C# and Java, we urge interested readers to read the widely cited article "A Comparative Overview of C# and Java," by co-author Ben Albahari, available at http://genamics.com/developer/csharp_comparative.htm.
Enabling Component-Based Development
Over the last 10 years, programming techniques such as object-oriented design, interface-based programming, and component-based software have become ubiquitous. However, programming language support for these constructs has always lagged behind the current state-of-the-art best practices. As a result, developers tend either to depend on programming conventions and custom code rather than direct compiler and runtime support, or to forgo the techniques altogether.
As an example, consider that C++ supported object orientation, but had no formal concept of interfaces. C++ developers resorted to abstract base classes and mix-in interfaces to simulate interface-based programming, and relied on external component programming models such as COM or CORBA to provide the benefits of component-based software.
While Java extended C++ to add language-level support for interfaces and packages (among other things), it too had very little language-level support for building long-lived component-based systems (in which one needs to develop, interconnect, deploy, and version components from various sources over an extended period of time). This is not to say that the Java community hasn't built many such systems, but rather that these needs were addressed by programming conventions and custom code: relying on naming conventions to identify common design patterns such as properties and events, requiring external metadata for deployment information, and developing custom class loaders to provide stronger component versioning semantics.
By comparison, the C# language was designed from the ground up around the assumption that modern systems are built using components. Consequently, C# provides direct language support for common component constructs such as properties, methods, and events (used by RAD tools to build applications out of components, setting properties, responding to events, and wiring components together via method calls). C# also allows developers to directly annotate and extend a component's type information to provide deployment, design, or runtime support; to integrate component versioning directly into the programming model; and to integrate XML-based documentation directly into C# source files. C# also discards the C++ and COM approach of spreading source artifacts across header files, implementation files, and type libraries in favor of a much simpler source organization and component reuse model.
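As a rough illustration of these component constructs, the following minimal sketch declares a property, an event, a method, an attribute annotation, and XML documentation comments on a single class. The Thermostat class and its members are hypothetical names invented for this example, and the Description attribute merely stands in for the kind of design-time annotation a tool might read.

```csharp
using System;
using System.ComponentModel;

/// <summary>A hypothetical thermostat component (illustrative only).</summary>
[Description("Reports and controls a temperature setting")] // declarative annotation stored in metadata
public class Thermostat
{
    private double temperature;

    /// <summary>Raised whenever the Temperature property changes.</summary>
    public event EventHandler TemperatureChanged;

    /// <summary>The current temperature in degrees Celsius.</summary>
    public double Temperature
    {
        get { return temperature; }
        set
        {
            if (value == temperature) return;   // no change, no notification
            temperature = value;
            EventHandler handler = TemperatureChanged;
            if (handler != null) handler(this, EventArgs.Empty);
        }
    }

    /// <summary>Resets the thermostat to an assumed default of 20 degrees.</summary>
    public void Reset() { Temperature = 20.0; }
}

public static class ThermostatDemo
{
    public static void Main()
    {
        Thermostat t = new Thermostat();
        int notifications = 0;
        t.TemperatureChanged += delegate { notifications++; };

        t.Temperature = 22.5;   // raises the event
        t.Temperature = 22.5;   // same value, so no event
        t.Reset();              // sets 20.0 and raises the event again

        Console.WriteLine(notifications);   // prints 2
    }
}
```

A design tool could discover the property, the event, and the attribute through the type's metadata alone, without relying on naming conventions.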
While this is by no means an exhaustive list, the enhancements in C# over Java and C++ qualify it as the next major step in the evolution of component-based development languages.
A Modern Object-Oriented Language
In addition to deeply integrated support for building component-based systems, C# is also a fully capable object-oriented language, supporting all the common concepts and abstractions that exist in languages such as C++ and Java.
As is expected of any modern object-oriented language, C# supports concepts such as inheritance, encapsulation, polymorphism, and interface-based programming. C# supports common C, C++, and Java language constructs such as classes, structs, interfaces, and enums, as well as more novel constructs such as delegates, which provide a type-safe equivalent to C/C++ function pointers, and custom attributes, which allow annotation of code elements with additional information.
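The two more novel constructs mentioned here can be sketched together. In the example below, a delegate type describes a method signature in a type-safe way, and a custom attribute attaches extra information to methods that is then read back at runtime. All names (BinaryOp, OperationNameAttribute, Calculator) are hypothetical and chosen only for illustration.

```csharp
using System;

// A delegate type: any method taking two ints and returning an int.
public delegate int BinaryOp(int x, int y);

// A hypothetical custom attribute that annotates a method with a readable name.
[AttributeUsage(AttributeTargets.Method)]
public class OperationNameAttribute : Attribute
{
    private readonly string name;
    public OperationNameAttribute(string name) { this.name = name; }
    public string Name { get { return name; } }
}

public static class Calculator
{
    [OperationName("addition")]
    public static int Add(int x, int y) { return x + y; }

    [OperationName("multiplication")]
    public static int Multiply(int x, int y) { return x * y; }

    // Invokes whatever method the delegate refers to; the compiler
    // guarantees that the signature matches BinaryOp.
    public static int Apply(BinaryOp op, int x, int y) { return op(x, y); }

    // Reads the annotation back from the delegate's target method.
    public static string Describe(BinaryOp op)
    {
        object[] attrs = op.Method.GetCustomAttributes(typeof(OperationNameAttribute), false);
        return attrs.Length > 0 ? ((OperationNameAttribute)attrs[0]).Name : "(unnamed)";
    }

    public static void Main()
    {
        BinaryOp op = new BinaryOp(Add);
        Console.WriteLine(Describe(op) + ": " + Apply(op, 3, 4));   // prints "addition: 7"

        op = new BinaryOp(Multiply);
        Console.WriteLine(Describe(op) + ": " + Apply(op, 3, 4));   // prints "multiplication: 12"
    }
}
```

Unlike a C function pointer, a BinaryOp variable cannot be made to point at a method with a different signature; the mismatch is rejected at compile time.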
In addition, C# incorporates features from C++ such as operator overloading, user-defined conversions, true rectangular arrays, and pass-by-reference semantics that are currently missing from Java.
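The four features named above can be shown in one short sketch. The Fraction struct and the helper methods are hypothetical and exist only to exercise operator overloading, user-defined conversions, a rectangular array, and pass-by-reference parameters.

```csharp
using System;

// A hypothetical value type, used only for illustration.
public struct Fraction
{
    public readonly int Num;
    public readonly int Den;

    public Fraction(int num, int den) { Num = num; Den = den; }

    // Operator overloading: adds two fractions (the result is not reduced).
    public static Fraction operator +(Fraction a, Fraction b)
    {
        return new Fraction(a.Num * b.Den + b.Num * a.Den, a.Den * b.Den);
    }

    // User-defined implicit conversion: any int can be used as a Fraction.
    public static implicit operator Fraction(int n) { return new Fraction(n, 1); }

    // User-defined explicit conversion: precision may be lost, so a cast is required.
    public static explicit operator double(Fraction f) { return (double)f.Num / f.Den; }
}

public static class LanguageFeaturesDemo
{
    // Pass-by-reference: the method swaps the caller's own variables.
    public static void Swap(ref int a, ref int b)
    {
        int temp = a;
        a = b;
        b = temp;
    }

    // Sums a true rectangular (not jagged) two-dimensional array.
    public static int Sum(int[,] grid)
    {
        int total = 0;
        for (int row = 0; row < grid.GetLength(0); row++)
            for (int col = 0; col < grid.GetLength(1); col++)
                total += grid[row, col];
        return total;
    }

    public static void Main()
    {
        Fraction sum = new Fraction(1, 2) + 1;          // the int 1 converts implicitly
        Console.WriteLine(sum.Num + "/" + sum.Den);     // prints 3/2

        int x = 1, y = 2;
        Swap(ref x, ref y);
        Console.WriteLine(x + " " + y);                 // prints 2 1

        int[,] grid = { { 1, 2, 3 }, { 4, 5, 6 } };
        Console.WriteLine(Sum(grid));                   // prints 21
    }
}
```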
Unlike most programming languages, C# has no runtime library of its own. Instead, C# relies on the vast class library in the .NET Framework for all its needs, including console I/O, network and file handling, collection data structures, and many other facilities. Implemented primarily in C# and spanning more than a million lines of code, this class library served as an excellent torture-test during the development cycle for both the C# language and the C# compiler.
The C# language strives to balance the competing needs for consistency and efficiency. Some object-oriented languages (such as Smalltalk) take the viewpoint that "everything is an object." This approach has the advantage that instances of primitive types (such as integers) are first-class objects. However, it has the disadvantage of being very inefficient. To avoid this overhead, other languages (such as Java) choose to bifurcate the type system into primitives and everything else, leading to less overhead, but also to a schism between primitive and user-defined types.
C# balances these two conflicting viewpoints by presenting a unified type system in which all types (including primitive types) are derived from a common base type, while simultaneously allowing for performance optimizations that allow primitive types and simple user-defined types to be treated as raw memory, with minimal overhead and increased efficiency.
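A small sketch may make the unified type system more concrete. It shows that an int and a simple user-defined struct can both be passed wherever an object is expected, and that a value is boxed only when it is actually treated as an object. The Point struct and the Describe method are hypothetical names used only for this example.

```csharp
using System;

// A simple user-defined value type: stored inline, with no object header.
public struct Point
{
    public int X;
    public int Y;
    public Point(int x, int y) { X = x; Y = y; }
    public override string ToString() { return "(" + X + ", " + Y + ")"; }
}

public static class TypeSystemDemo
{
    // Accepts any value at all, because every type derives from object.
    public static string Describe(object value)
    {
        return value.GetType().Name + " " + value;
    }

    public static void Main()
    {
        int i = 42;
        Console.WriteLine(Describe(i));                     // prints "Int32 42"
        Console.WriteLine(Describe(new Point(1, 2)));       // prints "Point (1, 2)"

        // The inheritance chain of a primitive: int -> ValueType -> Object.
        Console.WriteLine(typeof(int).BaseType);            // prints System.ValueType
        Console.WriteLine(typeof(int).BaseType.BaseType);   // prints System.Object

        object boxed = i;       // boxing: the value is copied into a heap object
        int j = (int)boxed;     // unboxing: the value is copied back out
        Console.WriteLine(j == i);                          // prints True
    }
}
```

As long as a value stays in a variable of its own type, as i and j do here, no boxing occurs and it is handled as raw memory.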
Building Robust and Durable Software
In a world of always-on connectivity and distributed systems, software robustness takes on new significance. Servers need to stay up and running 24×7 to service clients, and clients need to be able to download code off the network and run it locally with some guarantee that it will not misbehave. The C# language (in concert with the .NET Framework) promotes software robustness in a number of different ways.
First and foremost, C# is a type-safe language, meaning that programs are prevented from accessing objects in inappropriate ways. All code and data is associated with a type, all objects have an associated type, and only operations defined by the associated type can be performed on an object. Type-safety eliminates an entire category of errors in C and C++ programs stemming from invalid casts, bad pointer arithmetic, and even malicious code.
C# also provides automatic memory management in the form of a high-performance tracing generational garbage collector. This frees programmers from performing manual memory management or reference counting, and eliminates an entire category of errors, such as dangling pointers, memory leaks, and circular references.
Even good programs can have bad things happen to them, and it is important to have a consistent mechanism for detecting errors. Over the years, Windows developers have had to contend with numerous error reporting mechanisms, such as simple failure return codes, Win32 structured exceptions, C++ exceptions, COM error HResults, and OLE automation IErrorInfo objects. This proliferation of approaches breeds complexity and makes it difficult for designers to create standardized error-handling strategies. The .NET Framework eliminates this complexity by standardizing on a single exception-handling mechanism that is used throughout the framework, and exposed in all .NET languages including C#.
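The single exception-handling mechanism can be sketched as follows. The ParsePercentage and Classify methods are hypothetical; the exception types they use (FormatException and ArgumentOutOfRangeException) are standard framework classes.

```csharp
using System;

public static class ExceptionDemo
{
    // Parses a whole-number percentage; failures are reported as exceptions
    // rather than through return codes.
    public static int ParsePercentage(string text)
    {
        int value = int.Parse(text);    // throws FormatException for malformed input
        if (value < 0 || value > 100)
            throw new ArgumentOutOfRangeException("text", "Percentage must be between 0 and 100.");
        return value;
    }

    // Maps each failure mode to a short description using typed catch blocks.
    public static string Classify(string text)
    {
        try
        {
            return "ok: " + ParsePercentage(text);
        }
        catch (FormatException)
        {
            return "not a number";
        }
        catch (ArgumentOutOfRangeException)
        {
            return "out of range";
        }
        finally
        {
            // Cleanup code placed here runs whether or not an exception was thrown.
        }
    }

    public static void Main()
    {
        Console.WriteLine(Classify("75"));    // prints "ok: 75"
        Console.WriteLine(Classify("abc"));   // prints "not a number"
        Console.WriteLine(Classify("250"));   // prints "out of range"
    }
}
```

The same try, catch, and finally pattern applies whether the exception originates in user code or inside the framework.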
The C# language design also includes numerous other features that promote robustness, such as language-level support for independently versioning base classes (without changing derived class semantics or mandating recompilation of derived classes), detection of attempts to use uninitialized variables, array bounds checking, and support for checked arithmetic.
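Three of these features are easy to demonstrate in a few lines. The sketch below, built around a hypothetical RobustnessDemo class, contrasts checked and unchecked arithmetic, shows an out-of-range array access being trapped, and notes in a comment how reading an unassigned local variable is rejected at compile time.

```csharp
using System;

public static class RobustnessDemo
{
    // Checked arithmetic: overflow raises OverflowException instead of wrapping.
    public static int AddChecked(int a, int b) { return checked(a + b); }

    // Unchecked arithmetic: overflow wraps around silently.
    public static int AddUnchecked(int a, int b) { return unchecked(a + b); }

    // Array bounds checking: a bad index raises IndexOutOfRangeException
    // rather than reading arbitrary memory.
    public static bool TryRead(int[] data, int index, out int value)
    {
        try
        {
            value = data[index];
            return true;
        }
        catch (IndexOutOfRangeException)
        {
            value = 0;
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(AddUnchecked(int.MaxValue, 1) == int.MinValue);   // prints True

        try
        {
            AddChecked(int.MaxValue, 1);
        }
        catch (OverflowException)
        {
            Console.WriteLine("overflow detected");
        }

        int[] data = { 10, 20, 30 };
        int value;
        Console.WriteLine(TryRead(data, 3, out value));   // prints False

        // Definite assignment: uncommenting the next two lines causes a
        // compile-time error, because n is read before it is assigned.
        // int n;
        // Console.WriteLine(n);
    }
}
```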
A Pragmatic World View
Many of the design decisions in the C# language represent a pragmatic world view on the part of the designers. For example, the syntax was selected to be familiar to C, C++, and Java developers, making it easier to learn C# and aiding source code porting.
While C# provides many useful, high-level object-oriented features, it recognizes that in certain limited cases these features can work against raw performance. Rather than dismiss these concerns as unimportant, C# includes explicit support for features such as direct pointer manipulation, unsafe type casts, declarative pinning of garbage-collected objects, and direct memory allocation on the stack. Naturally, these features come at a cost, both in terms of the complexity they add and the elevated security privileges required to use them. However, the existence of these features gives C# programmers much more headroom than other, more restrictive languages do.
Lastly, the interop facilities in the .NET Framework make it easy to leverage existing DLLs and COM components from C# code, and to use C# components in classic COM applications. Although not strictly a function of the C# language, this capability reflects a similarly pragmatic world view, in which new functionality coexists peacefully with legacy code for as long as needed.
The .NET Framework
The Microsoft .NET Framework consists of two elements: a runtime environment called the Common Language Runtime (CLR), and a class library called the Framework Class Library (FCL). The FCL is built on top of the CLR and provides services needed by modern applications.
While applications targeting the .NET Framework interact directly with the FCL, the CLR serves as the underlying engine. In order to understand the .NET Framework, one first must understand the role of the CLR.
The Common Language Runtime
The CLR is a modern runtime environment that manages the execution of user code, providing services such as JIT compilation, memory management, exception management, debugging and profiling support, and integrated security and permission management.
Essentially, the CLR represents the foundation of Microsoft's computing platform for the next decade. However, it has been a long time in the making. Its origins can be traced back to early 1997, when products such as Microsoft Transaction Server (MTS) were starting to deliver on the promise of a more declarative, service-oriented programming model. This new model allowed developers to declaratively annotate their components at development time, and then rely on the services of a runtime (such as MTS) to hijack component activation and intercept method calls, transparently layering in additional services such as transactions, security, just-in-time (JIT) activation, and more. This need to augment COM type information pushed the limits of what was possible and useful with IDL and type libraries. The COM+ team set out to find a generalized solution to this problem.
The first public discussion of a candidate solution occurred at the 1997 PDC in San Diego, when Mary Kirtland and other members of the COM+ team discussed a future version of COM centered on something called the COM+ Runtime, and providing many of the services such as extensible type information, cross-language integration, implementation inheritance and automatic memory management that ultimately resurfaced in the CLR.[1]
Soon after the 1997 PDC, Microsoft stopped talking publicly about the technology, and the product known as COM+ that was released with Windows 2000 bore little resemblance to the COM+ Runtime originally described. Behind the scenes, however, work was continuing and the scope of the project was expanding significantly as it took on a much larger role within Microsoft.
Initially codenamed Lightning, the project underwent many internal (and some external) renamings, and was known at various times as COM3, COM+ 2.0, the COM+ Runtime, the NGWS Runtime, the Universal Runtime (URT), and the Common Language Runtime. This effort ultimately surfaced as the .NET Framework, announced at the July 2000 PDC in Orlando, Florida. During the following 18 months, the .NET Framework underwent an extensive public beta, culminating in the release of Version 1.0 of the Microsoft .NET Framework on January 15, 2002.
Compilation and Execution Model
To better understand the CLR, consider how compilers that target the .NET Framework differ from traditional compilers.
Traditional compilers target a specific processor, consuming source files in a specific language, and producing binary files containing streams of instructions in the native language of the target processor. These binary files may then be executed directly on the target processor.
.NET compilers function a little differently, as they do not target a specific native processor. Instead, they consume source files and produce binary files containing an intermediate representation of the source constructs, expressed as a combination of metadata and Common Intermediate Language (CIL). In order for these binaries to be executed, the CLR must be present on the target machine.
When these binaries are executed they cause the CLR to load. The CLR then takes over and manages execution, providing a range of services such as JIT compilation (converting the CIL as needed into the correct stream of instructions for the underlying processor), memory management (in the form of a garbage collector), exception management, debugger and profiler integration, and security services (stack walking and permission checks).
This compilation and execution model explains why C# is referred to as a managed language, why code running in the CLR is referred to as managed code, and why the CLR is said to provide managed execution.
Although this dependency on a runtime environment might initially appear to be a drawback, substantial benefits arise from this architecture. Since the metadata and CIL representations are processor architecture-neutral, binaries may be used on any machine in which the Common Language Runtime is present, regardless of underlying processor architecture. Additionally, since processor-specific code generation is deferred until runtime, the CLR has the opportunity to perform processor-specific optimizations based on the target architecture the code is running on. As processor technology advances, all that applications need in order to take advantage of these advances is an updated version of the CLR.
Unlike traditional binary representations, which are primarily streams of native processor instructions, the combination of metadata and CIL retains almost all of the original source language constructs. In addition, this representation is source language-neutral, which allows developers to build applications using multiple source languages. They can select the best language for a particular task, rather than being forced to standardize on a particular source language for each application or needing to rely on component technologies, such as COM or CORBA, to mask the differences between the source languages used to build the separate components of an application.
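One way to see how much of the source survives compilation is to query a compiled type's metadata through reflection. The following minimal sketch uses a hypothetical Account class and lists its declared methods and a parameter's name and type, all recovered from the binary at runtime.

```csharp
using System;
using System.Reflection;

// A hypothetical class whose metadata is inspected below.
public class Account
{
    private decimal balance;
    public decimal Balance { get { return balance; } }
    public void Deposit(decimal amount) { balance += amount; }
}

public static class MetadataDemo
{
    // Returns the names of the public instance methods declared on a type,
    // sorted ordinally so that the output order is stable.
    public static string[] DeclaredMethodNames(Type type)
    {
        MethodInfo[] methods = type.GetMethods(
            BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly);
        string[] names = new string[methods.Length];
        for (int i = 0; i < methods.Length; i++)
            names[i] = methods[i].Name;
        Array.Sort(names, StringComparer.Ordinal);
        return names;
    }

    public static void Main()
    {
        // Prints "Deposit" and then "get_Balance" (the property's accessor method).
        foreach (string name in DeclaredMethodNames(typeof(Account)))
            Console.WriteLine(name);

        // Parameter names and types are preserved in the metadata as well.
        MethodInfo deposit = typeof(Account).GetMethod("Deposit");
        ParameterInfo parameter = deposit.GetParameters()[0];
        Console.WriteLine(parameter.ParameterType.Name + " " + parameter.Name);   // prints "Decimal amount"
    }
}
```

A native binary produced by a traditional compiler would typically retain none of this information unless separate debugging symbols were available.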
The Common Type System
Ultimately, the CLR exists to safely execute managed code, regardless of source language. In order to provide for cross-language integration, to ensure type safety, and to provide managed execution services such as JIT compilation, garbage collection, exception management, etc., the CLR needs intimate knowledge of the managed code that it is executing.
To meet this requirement, the CLR defines a shared type system called the Common Type System (CTS). The CTS defines the rules by which all types are declared, defined and managed, regardless of source language. The CTS is designed to be rich and flexible enough to support a wide variety of source languages, and is the basis for cross-language integration, type safety, and managed execution services.
Compilers for managed languages that wish to be first-class citizens in the world of the CLR are responsible for mapping source language constructs onto the CTS analogs. In cases in which there is no direct analog, the language designers may decide to either adapt the source language to better match the CTS (ensuring more seamless cross-language integration), or to provide additional plumbing that preserves the original semantics of the source language (possibly at the expense of cross-language integration capabilities).
Since all types are ultimately represented as CTS types, it now becomes possible to combine types authored in different languages in new and interesting ways. For example, since managed languages ultimately declare CTS types, and the CTS supports inheritance, it follows that the CLR supports cross-language inheritance.
The Common Language Specification
Not all languages support the exact same set of constructs, and this can be a barrier to cross-language integration. Consider this example: Language A allows unsigned types (which are supported by the CTS), while Language B does not. How should code written in Language B call a method written in Language A, which takes an unsigned integer as a parameter?
The solution is the Common Language Specification (CLS). The CLS defines the reasonable subset of the CTS that should be sufficient to support cross-language integration, and specifically excludes problem areas such as unsigned integers, operator overloading, and more.
Each managed language decides how much of the CTS to support. Languages that can consume any CLS-compliant type are known as CLS Consumers. Languages that can extend any existing CLS-compliant type are known as CLS Extenders. Naturally, managed languages are free to support CTS features over and above the CLS, and most do. As an example, the C# language is both a CLS Consumer and a CLS Extender, and supports all of the important CTS features.
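In C#, a library author can ask the compiler to check CLS compliance by applying the CLSCompliant attribute. The sketch below, using a hypothetical Counter class, shows one common way to handle the unsigned-integer problem described earlier: the unsigned member is explicitly marked as non-compliant, and a compliant alternative is offered alongside it.

```csharp
using System;

// Ask the compiler to check the publicly visible surface of this assembly
// against the CLS rules.
[assembly: CLSCompliant(true)]

public class Counter
{
    private uint count;

    // uint is a valid CTS type but lies outside the CLS, so this member is
    // explicitly opted out. Without the attribute, the compiler reports a
    // CLS-compliance warning for it.
    [CLSCompliant(false)]
    public uint RawCount { get { return count; } }

    // A CLS-compliant alternative that any CLS Consumer language can use.
    public long Count { get { return count; } }

    public void Increment() { count++; }
}

public static class ClsDemo
{
    public static void Main()
    {
        Counter counter = new Counter();
        counter.Increment();
        counter.Increment();
        Console.WriteLine(counter.Count);      // prints 2
        Console.WriteLine(counter.RawCount);   // prints 2
    }
}
```

Code written in a language without unsigned types would simply use Count and ignore RawCount.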
The combination of the rich and flexible CTS and the widely supported CLS has led to many languages being adapted to target the .NET platform. At the time of this writing, Microsoft was offering compilers for six managed languages (C#, VB.NET, JScript, Managed Extensions for C++, Microsoft IL, and J#), and a host of other commercial vendors and academics were offering managed versions of languages, such as COBOL, Eiffel, Haskell, Mercury, Mondrian, Oberon, Forth, Scheme, Smalltalk, APL, several flavors of Pascal, and more.
Given the level of interest from industry and academia, one might say that .NET has spawned something of a renaissance in programming-language innovation.
The Framework Class Library
Developer needs (and Windows capabilities) have evolved much since Windows 1.0 was introduced in November 1985. As Windows has grown to meet new customer needs, the accompanying APIs have grown by orders of magnitude over time, becoming ever more complex, increasingly inconsistent, and almost impossible to comprehend in their totality.
Additionally, while modern paradigms such as object orientation, component software, and Internet standards had emerged and, in many cases, joined the mainstream, these advances had not yet been incorporated into the Windows programming model in a comprehensive and consistent manner.
Given the issues, and the degree of change already inherent in the move to a managed execution environment, the time was ripe for a clean start. As a result, the .NET Framework replaces most (though not all) of the traditional Windows API sets with a well-factored, object-oriented class library called the Framework Class Library (FCL).
The FCL provides a diverse array of higher-level software services, addressing the needs of modern applications. Conceptually, these can be grouped into several categories such as:
- Support for core functionality, such as interacting with basic data types and collections; console, network and file I/O; and interacting with other runtime-related facilities.
- Support for interacting with databases; consuming and producing XML; and manipulating tabular and tree-structured data.
- Support for building web-based (thin client) applications with a rich server-side event model.
- Support for building desktop-based (thick client) applications with broad support for the Windows GUI.
- Support for building SOAP-based XML web services.
The FCL is vast, including more than 3,500 classes. For a more detailed overview of the facilities in the FCL, see Chapter 5.
ECMA Standardization
One of the most encouraging aspects of the .NET Framework is the degree of openness that Microsoft has shown during its development. From the earliest public previews, core specifications detailing the C# language, the classes in the FCL, and the inner workings of the CLR have been freely available.
However, this openness was taken to a new level in November 2000 when Microsoft, along with co-sponsors Intel and HP, officially submitted the specifications for the C# language, a subset of the FCL, and the runtime environment to ECMA for standardization.
This action began an intense standardization process. Organizations participating in the effort included Microsoft, HP, Intel, IBM, Fujitsu, Sun, Netscape, Plum Hall, OpenWave, and others. The work was performed under the auspices of ECMA technical committee TC39, the same committee that had previously standardized the JavaScript language as ECMAScript.
TC39 chartered two new task groups to perform the actual standardization work: one to focus on the C# language, the other to focus on what became known as the Common Language Infrastructure (CLI).
The CLI consisted of the runtime engine and the subset of the FCL being standardized. Conceptually, Microsoft's CLR is intended to be a conforming commercial implementation of the runtime engine specified in the CLI, and Microsoft's FCL is intended to be a conforming commercial implementation of the class library specified in the CLI (although obviously, it is a massive superset of the 294 classes ultimately specified in the CLI).
After more than a year of intense effort, the task groups completed their standardization work and presented the specifications to the ECMA General Assembly. On December 13, 2001, the General Assembly ratified the C# and CLI specifications as international standards, assigning them the ECMA standards numbers of ECMA-334 (C#) and ECMA-335 (the CLI). Copies of the ECMA standards are available at http://www.ecma.ch.
Critics have claimed that the ECMA standardization process was merely a ploy by Microsoft to deflect Java's cross-platform advantages. However, the qualifications and seniority of the people working on the standardization effort, and their level of involvement during the lengthy standardization cycle, tell a different story. Microsoft, along with its co-sponsors and the other members of the standardization task groups, committed some of its best and brightest minds to this effort, spending a huge amount of time and attention on the standardization process. Given that this effort occurred concurrently with the development and release of the .NET Framework itself, this level of investment by Microsoft and others flies in the face of the conspiracy theories.
Of course, for standards to have an impact, there must be implementations. At the time of this writing (January 2002), no fully conformant reference implementations were shipping. However, there are several efforts already under way to provide open and shared source implementations of the C# language and the CLI, for both commercial and noncommercial use.
Microsoft is working with Corel to provide a shared source implementation of the CLI, as well as C# and ECMAScript compilers that target it. This project, codenamed Rotor, is expected to ship in 2002, and will run on both Windows and FreeBSD platforms. Although specific details of the shared source license were not available at the time of this writing, it is expected that this CLI implementation will be licensed for noncommercial use by researchers, hobbyists, and academics.
However, Microsoft's implementation is not the only game in town. Other CLI implementations include the Mono project and dotGNU.
The Mono project (http://www.go-mono.com), started by Ximian Corporation, is aiming to provide full open source (GPL and LGPL) implementations of not only the CLI platform and the C# compiler, but also a larger set of classes selected from Microsoft's .NET Framework FCL. In addition to the internal resources that Ximian has committed to the project, the Mono project has also attracted attention from the broader open source community, and appears to be gathering steam.
Finally, consider the open source (GPL) dotGNU project (http://www.dotgnu.org). While not as high-profile as Mono, dotGNU has also been making headway, and includes some interesting and unique concepts. The core of dotGNU is Portable.NET, which was originally developed by a lone developer (Rhys Weatherley), who merged his project with dotGNU in August 2000. There are unique aspects to the dotGNU project, including the fact that it was originally designed around a CIL interpreter rather than a JIT compiler (although the team plans to add JIT compilation at a later date), and the developers' plan to support directly executing Java binaries.
Beyond these three, it is very likely that more implementations of the CLI will arise over time. While it is too early to say whether the .NET Framework (in the form of the CLI) will ever be available on as many platforms as Java is, the degree of openness and the level of community interest are very encouraging.